
Senior Software Engineer — AI Evaluation & Benchmarks

Alignerr
Contract
Remote
Worldwide
$80–$100 USD per hour

About the Role

What if the code you write could determine how smart the next generation of AI truly is? We're looking for experienced Software Engineers to design and build the coding benchmarks and data pipelines used to evaluate frontier AI models — the systems that decide whether an AI can actually reason, debug, and write production-quality software.

This is high-impact, technically demanding work at the intersection of software engineering and AI research. You'll work with large codebases, multiple programming languages, and scalable infrastructure to create evaluation systems that push the boundaries of what AI can do.

This is a fully remote contract role. If you thrive in fast-paced engineering environments and want your work to directly shape the trajectory of AI — this is the role.

  • Organization: Alignerr
  • Type: Hourly Contract
  • Location: Remote
  • Contract Length: 3 Months
  • Commitment: Full-time availability preferred

What You'll Do

  • Design and implement coding benchmarks used to evaluate frontier AI models across real-world programming tasks
  • Build and maintain scalable data pipelines for AI evaluation workflows
  • Analyze AI-generated code for correctness, reliability, and edge-case failures
  • Create structured evaluation scenarios that rigorously test reasoning, debugging, and code quality
  • Work with large code repositories and multi-language environments
  • Collaborate on systems that improve how AI models understand and generate software
  • Provide detailed technical feedback on model performance and failure patterns
  • Contribute to the design of evaluation frameworks that set industry standards
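To give a concrete flavor of the duties above, here is a minimal, purely illustrative sketch of what a coding-benchmark harness can look like: a task pairs a prompt with hidden test cases, and a candidate (for example, model-generated) solution passes only if every case holds. All names are hypothetical and do not reflect Alignerr's actual tooling.

```python
# Illustrative sketch of a minimal coding-benchmark harness.
# A task bundles a prompt with hidden test cases; evaluate() scores a
# candidate solution against them. Names are hypothetical examples.

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class BenchmarkTask:
    prompt: str
    # Each test case maps an input-args tuple to the expected output.
    test_cases: list[tuple[tuple, object]] = field(default_factory=list)

def evaluate(task: BenchmarkTask, candidate: Callable) -> dict:
    """Run the candidate against every test case and report the pass rate."""
    passed = 0
    failures = []
    for args, expected in task.test_cases:
        try:
            result = candidate(*args)
        except Exception as exc:  # runtime errors count as failures
            failures.append((args, repr(exc)))
            continue
        if result == expected:
            passed += 1
        else:
            failures.append((args, result))
    total = len(task.test_cases)
    return {"pass_rate": passed / total if total else 0.0,
            "failures": failures}

# Example: a task whose edge case (empty list) a naive solution misses.
task = BenchmarkTask(
    prompt="Return the largest element of a list, or None if it is empty.",
    test_cases=[(([3, 1, 2],), 3), (([],), None)],
)

def naive_solution(xs):
    return max(xs)  # crashes on the empty-list edge case

report = evaluate(task, naive_solution)
print(report["pass_rate"])  # 0.5 — the empty-list case fails
```

Real evaluation pipelines add sandboxing, timeouts, and multi-language runners on top of this core loop, but the shape — structured tasks, hidden edge cases, automated scoring — is the same.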

Who You Are

  • 4+ years of professional software engineering experience — this is non-negotiable
  • Experience working at a high-growth tech company or top-tier software organization
  • Expert proficiency in Python — you write clean, performant, well-tested Python code
  • Hands-on experience with code repositories and working in large, complex codebases
  • Proven experience designing and implementing LLM coding benchmarks and data pipelines
  • Track record of working in high-performance engineering environments with large-scale products or platforms
  • Strong command of version control systems (Git) and modern development workflows
  • Fluent or native English speaker with strong written communication skills
  • Self-directed, technically rigorous, and comfortable operating with autonomy

What Makes a Perfect Match

Candidates with these additional qualifications have the highest chance of success:

  • Senior or Lead-level engineering profiles with a history of technical ownership
  • Bachelor's or Master's degree in Computer Science, Machine Learning, or a related field — or equivalent professional experience
  • Proficiency in one or more additional programming languages, such as JavaScript, Go, or C++
  • Experience with CI/CD pipelines and writing robust unit tests (pytest, Mocha, JUnit)
  • Background in security engineering or significant open-source contributions
  • Familiarity with AI/ML evaluation methodologies or model benchmarking

Why Join Us

  • Work on cutting-edge AI evaluation projects alongside world-class research teams
  • Fully remote — work from anywhere with a reliable internet connection
  • Your benchmarks directly influence how the most advanced AI systems in the world are measured and improved
  • Freelance autonomy with meaningful, high-stakes engineering work
  • Collaborate with a global community of elite engineers and researchers
  • Potential for contract extension and ongoing engagement as new evaluation challenges emerge