Agentic Tasker (Frontier STEM)
About the Role
This is an opportunity to work directly with researchers at a top-8 frontier lab. The core objective of the role is to improve the reasoning and problem-solving capabilities of a target frontier model on STEM topics by designing, validating, and analyzing challenging benchmark tasks.
Key Responsibilities
- Task Design and Development: Design challenging, real-world data science problems that serve as the foundation for Colab Bench tasks.
- Content Generation: Integrate the problems into an agentic development environment, preparing all necessary components in Python, including:
  - Detailed instructions and an overview of the required task.
  - A golden solution that follows the instructions.
  - The necessary environment, including datasets, Python libraries, and metadata.
  - A test notebook containing unit tests that solutions must pass.
- Evaluation and Analysis: Evaluate cross-model performance on the tasks.
- Headroom Identification: Identify tasks where the target model fails to pass all tests, specifically classifying the failure as a logical reasoning failure.
- Loss Extraction: Analyze the agent’s steps (Agent Trajectory) to observe and extract core capability loss patterns from the model.
Qualifications and Recruitment
- Expertise Focus: Applicants must have strong expertise in data science, ML, finance, and coding, with a deep background in frontier STEM.
- Target Candidates: We are actively recruiting PhD students from top schools in the US and highly skilled GitHub contributors. (Will consider a small cohort in India as well.)
Offer Details
- Rate: ~$30/hour.
- Commitment: Minimum 30 hours per week (on weekdays).
- Employment Type: Contractor (no medical/paid leave).
- Duration: 3 months (expected start date: next week).
- Locations: India.
About Turing
Turing is the world's leading research accelerator for frontier AI labs and a trusted partner for global enterprises deploying advanced AI systems. They connect elite AI professionals with frontier model training projects, offering competitive compensation for domain expertise across coding, STEM, creative writing, and more.
How To Apply
Click the Apply button to view the full job details on Turing and submit your application.