How do I apply for the Applied Research Intern position at Labelbox and track my application?

We provide a direct link to the official application portal, and you can apply directly here. You can also track this and other applications by clicking the bookmark icon to save the listing to your YGI Kanban board.

Is the internship at Labelbox paid and what compensation can candidates expect?

Yes, this is a paid internship opportunity. Exact compensation details are typically finalized during the company interview process.

What are the working arrangements and expected duration for the Applied Research Intern role?

This position is structured as a hybrid role. The expected duration is not disclosed.

What educational background or degree level is required for the Applied Research Intern at Labelbox?

Candidates are generally expected to hold or be pursuing a Master's degree or a PhD.

Does Labelbox provide visa sponsorship or assistance for international applicants?

Regarding visa assistance and sponsorship for this role, the official status is currently listed as: Not disclosed.

Applied Research Intern

Labelbox

Found 6 months ago

Location

San Francisco, United States

Time

Not disclosed

Work Mode

Hybrid

Salary

$35 - $45 USD

Visa Help

Not disclosed

Last Verified

1 month ago

Education

Master
PhD

Skills & Qualifications

Technical Skills

Python
deep learning frameworks
PyTorch
JAX
TensorFlow
LLMs
multimodal models
AI
machine learning

Soft Skills

communication skills
collaboration skills

Job Description

As an Applied Research Intern at Labelbox, you will design, build, and productionize evaluation and post‑training systems for frontier LLMs and multimodal models. You’ll own continuous, high-quality evals and benchmarks (reasoning, code, agent/tool‑use, long‑context, vision‑language, etc.), create and curate post‑training datasets (human + synthetic), and prototype RLHF/RLAIF/RLVR/RM/DPO‑style training loops to measure and improve real‑world task and agent performance.

Build and own evaluation and benchmark suites for reasoning, code, agents, long‑context, and V/LLMs.
Create post‑training datasets at scale: design preference/critique pipelines (human + synthetic), and target hard failures surfaced by evals.
Experiment and prototype RLHF/RLAIF/RLVR/RM/DPO‑style training loops to improve real‑world task and agent performance.
Land research in product: ship improvements into Labelbox workflows, services, and customer‑facing evaluation/quality features; quantify impact with customer and internal metrics.
Engage with customer research teams: run pilots, co‑design benchmarks, and share practical findings through internal research reports, blog posts, talks, and published papers.

Requirements

A strong foundation in AI and machine learning, backed by a Ph.D. or Master’s degree in Computer Science, Machine Learning, AI, or a related field (in progress degrees are acceptable for intern positions)
A deep understanding of frontier autoregressive and diffusion multimodal models, along with the human and synthetic data strategies needed to optimize them
Passion and experience for LLM evaluation and benchmarking
Expertise in training data quality construction, measurement and refinement
The ability to bridge research and application by interpreting new findings and translating them into functional prototypes
A track record of publishing in top-tier AI/ML conferences (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP, NAACL) and contributing to the broader research community
Proficiency in Python and experience with deep learning frameworks like PyTorch, JAX, or TensorFlow
Exceptional communication and collaboration skills
Are you currently based in the SF Bay Area and open to working onsite 2 to 3 days per week?
Do you hold, or are you currently pursuing, a Master’s or Ph.D. in Computer Science, Machine Learning, AI, or a related field?
Do you have experience with Python and deep learning frameworks such as PyTorch, JAX, or TensorFlow?

AI & Machine Learning

AI Research

Languages

English

Nice to Haves

publishing in top-tier AI/ML conferences
contributing to the broader research community


4 Open Positions