Member of Technical Staff (Data Scientist, Evals)

Perplexity • 4 months ago • via ashby:perplexity
Full-time • Fully remote • Salary not disclosed

Job Snapshot

  • Company: Perplexity
  • Category: Data
  • Remote: Fully remote
  • Eligibility: All 50 states
  • Posted: 4 months ago
  • Salary: Not disclosed

Eligibility

Hiring in all 50 states.

About this role

Perplexity serves tens of millions of users daily with reliable, high-quality answers grounded in an LLM-first search engine and our specialized data sources. We aim to use the latest models as they are released, but the intelligence frontier is a jagged one, and popular benchmarks do not effectively cover our use cases. In this role, you will build specialized evals to improve answer quality across Perplexity, covering search-based LLM answers and other scenarios popular with our users.

Responsibilities

  • Architect and maintain automated evaluation pipelines to assess answer quality across Perplexity's products, ensuring high standards for accuracy and helpfulness
  • Design evaluation sets and methods specifically to measure the impact of tool calls (particularly web search retrieval) on the final answer's quality
  • Develop VLM-based solutions to programmatically evaluate how final answers render visually across different platforms and devices
  • Continuously review public benchmarks and academic evaluations for their applicability to the Perplexity product, adapting and incorporating them into our regular performance measurements
  • Operate within a small, high-impact team where your evaluation metrics directly shape product changes, collaborating closely with technical leadership to measure and improve Answer Quality

Qualifications

  • PhD or MS in a technical field or equivalent experience
  • 4+ years of experience in data science or machine learning
  • Strong proficiency in Python and SQL (expected to write production-grade code)
  • Experience building within a modern cloud data stack, specifically AWS and Databricks
  • Comfortable with agentic coding workflows and using AI-assisted development tools to iterate faster

Preferred Qualifications

  • 1+ years of experience working with LLMs at scale, specifically with LLM-as-a-judge setups
  • Prior experience working on customer-facing web products or consumer apps, with real user traffic at scale
  • A strong research background, with experience applying research methods to real-world ML problems
  • Experience defining evaluation metrics (e.g., factual consistency, hallucination rate, retrieval precision) and building ground truth datasets