Turing is one of the world’s fastest-growing AI companies, accelerating the advancement and deployment of powerful AI systems.
Turing helps customers in two ways: working with the world’s leading AI labs to advance frontier model capabilities in thinking, reasoning, coding, agentic behavior, multimodality, multilinguality, STEM, and frontier knowledge; and leveraging that work to build real-world AI systems that address companies' mission-critical priorities.
Annotators are the core builders of SkillsBench. You will design and write AI evaluation tasks: structured challenges given to large language model (LLM) agents running inside automated environments. Each task you create tests whether an AI agent performs significantly better when given domain-specific knowledge (skills) than without it. Your tasks feed directly into Turing's commercial AI evaluation pipeline, which is used by clients.
Write clear, unambiguous task instructions that define exactly what an AI agent must produce, where to save it, and what rules to follow
Create reference solutions that demonstrate the correct approach and pass all automated checks
Write human-readable verifier descriptions listing every check the automated test suite will run
Author domain-specific skill files that teach an AI agent the conventions, workflows, and edge cases relevant to the task, without leaking expected answers
Ensure the no-skills variant of each task is identical to the with-skills variant except for the absence of skill files
Work within the task structure (instruction, environment, solution, tests) and follow Turing's task quality standards
Bachelor's degree or higher in a relevant technical or domain-specific field (Computer Science, Engineering, Finance, Data Science, Linguistics, etc.)
Experience: 1–3 years in a domain where you have hands-on practical expertise (software development, financial analysis, document processing, data science, etc.)
Strong written English; ability to write precise, unambiguous instructions
Genuine hands-on expertise in at least one of the SkillsBench domains (coding, finance, document generation, audio/ML, etc.)
Ability to think from an AI agent's perspective — what would a model get wrong without guidance?
Comfort reading and producing structured file outputs (JSON, DOCX, XLSX, Markdown)
Prior experience with LLM evaluation, prompt engineering, or AI benchmark design
Familiarity with Python scripting
Experience with Docker or containerised environments
Power Systems & Control
Cybersecurity
Network & System Engineering
Turing is the world's leading research accelerator for frontier AI labs and a trusted partner for global enterprises deploying advanced AI systems. It connects elite AI professionals with frontier model training projects, offering competitive compensation for domain expertise across coding, STEM, creative writing, and more.