© 2026 you'll get it. All rights reserved.

Internship Explorer


4 Open Positions

  • Applied Research Intern

    Labelbox
    San Francisco, United States
    Found 6 months ago
  • Working Student (f/m/d) Embedded Control

    NXP Semiconductors
    Munich, Germany
    Found 3 weeks ago
  • Product Application Engineer

    AMD
    Belfast, United Kingdom
    Found 1 month ago
  • Student assistant

    Technische Universität Berlin
    Berlin, Germany
    Found 1 month ago
  • Intern – Embedded calculation of IGBT losses in a medium-voltage inverter (M/F)

    GE Vernova
    Belfort, France
    Found 2 months ago

Applied Research Intern

Labelbox
Found 6 months ago
Location
San Francisco, United States
Time
Not disclosed
Work Mode
Hybrid
Salary
$35 - $45 USD
Visa Help
Not disclosed
Last Verified
1 month ago

Education

  • Master
  • PhD

Skills & Qualifications

Technical Skills

  • Python
  • deep learning frameworks
  • PyTorch
  • JAX
  • TensorFlow
  • LLMs
  • multimodal models
  • AI
  • machine learning

Soft Skills

  • communication skills
  • collaboration skills

Job Description

As an Applied Research intern at Labelbox, you will design, build, and productionize evaluation and post‑training systems for frontier LLMs and multimodal models. You’ll own continuous, high-quality evals and benchmarks (reasoning, code, agent/tool‑use, long‑context, vision‑language, and more), create and curate post‑training datasets (human + synthetic), and prototype RLHF/RLAIF/RLVR/RM/DPO‑style training loops to measure and improve real‑world task and agent performance.

In this role, you will:

  • Build and own evaluation and benchmark suites for reasoning, code, agents, long‑context, and V/LLMs.
  • Create post‑training datasets at scale: design preference/critique pipelines (human + synthetic), and target hard failures surfaced by evals.
  • Experiment and prototype RLHF/RLAIF/RLVR/RM/DPO‑style training loops to improve real‑world task and agent performance.
  • Land research in product: ship improvements into Labelbox workflows, services, and customer‑facing evaluation/quality features; quantify impact with customer and internal metrics.
  • Engage with customer research teams: run pilots, co‑design benchmarks, and share practical findings through internal research reports, blog posts, talks, and published papers.

Requirements

  • A strong foundation in AI and machine learning, backed by a Ph.D. or Master’s degree in Computer Science, Machine Learning, AI, or a related field (in-progress degrees are acceptable for intern positions)
  • A deep understanding of frontier autoregressive and diffusion multimodal models, along with the human and synthetic data strategies needed to optimize them
  • Passion for and experience with LLM evaluation and benchmarking
  • Expertise in constructing, measuring, and refining high-quality training data
  • The ability to bridge research and application by interpreting new findings and translating them into functional prototypes
  • A track record of publishing in top-tier AI/ML conferences (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP, NAACL) and contributing to the broader research community
  • Proficiency in Python and experience with deep learning frameworks like PyTorch, JAX, or TensorFlow
  • Exceptional communication and collaboration skills
  • Are you currently based in the SF Bay Area and open to working onsite 2-3 days per week?
  • Do you hold, or are you currently pursuing, a Master’s or Ph.D. in Computer Science, Machine Learning, AI, or a related field?
  • Do you have experience with Python and deep learning frameworks such as PyTorch, JAX, or TensorFlow?

Related Field

  • AI & Machine Learning

Related Subfield

  • AI Research

Languages

  • English

Nice to Haves

  • publishing in top-tier AI/ML conferences
  • contributing to the broader research community
