
Mechanistic Interpretability (LLMs) Machine Learning Expert

AfterQuery
Contract
Remote
Worldwide

This is a remote, project-based role for machine learning researchers with deep expertise in mechanistic interpretability. You will complete tasks at the frontier of interpretability research, including analyzing internal model representations, reverse-engineering learned circuits, and developing tools and techniques to understand how neural networks compute. Work runs over the next 2–3 weeks, is asynchronous, and is assigned on a project-by-project basis, with an expected commitment of 10–20 hours per week for the projects you accept. This position offers exceptional pay, exposure to cutting-edge AI safety and interpretability research, and a strong addition to your research portfolio.

Commitment: 10–20 hours/week | Pay: $150–$200/hr | Type: Contract

Responsibilities

  • Conduct mechanistic interpretability research on transformer-based and other neural network architectures
  • Identify, isolate, and analyze computational circuits responsible for specific model behaviors
  • Apply and extend techniques such as activation patching, probing, sparse autoencoders, and attention analysis
  • Develop tools and frameworks to automate or scale interpretability workflows across model families
  • Document methodologies, findings, and technical approaches clearly and reproducibly
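As a toy illustration of one technique named above, activation patching means caching an internal activation from a "clean" run and splicing it into a "corrupted" run to see which component carries the behavior-relevant information. The sketch below uses a minimal hand-rolled NumPy network; all names and shapes are illustrative, not drawn from any particular codebase or library:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer network: x -> h = relu(W1 @ x) -> y = W2 @ h
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))

def forward(x, patch_h=None):
    """Run the network; optionally overwrite the hidden
    activation with a cached one (the 'patch')."""
    h = np.maximum(W1 @ x, 0.0)
    if patch_h is not None:
        h = patch_h
    return W2 @ h, h

x_clean = np.array([1.0, 0.0, 0.0])
x_corrupt = np.array([0.0, 1.0, 0.0])

y_clean, h_clean = forward(x_clean)
y_corrupt, _ = forward(x_corrupt)

# Splice the clean hidden activation into the corrupted run.
# If the output moves back toward y_clean, the patched site
# carries the information distinguishing the two inputs.
y_patched, _ = forward(x_corrupt, patch_h=h_clean)
print(np.allclose(y_patched, y_clean))  # True: here y depends only on h
```

In real interpretability work the same idea is applied per-layer and per-position inside a transformer (e.g., via forward hooks or a library such as TransformerLens), and the recovery of the clean behavior is measured with a task-specific metric rather than exact output equality.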

Required Qualifications

  • Published researcher with at least one first-author publication in a peer-reviewed venue (e.g., NeurIPS, ICML, ICLR, or equivalent)
  • Master's or PhD in Machine Learning, Artificial Intelligence, Computer Science, or a related quantitative field
  • Demonstrated expertise in mechanistic interpretability, model analysis, or AI safety research
  • Deep familiarity with transformer architectures and modern large language model internals
  • Strong problem-solving skills and ability to work independently on open-ended research tasks

Preferred Qualifications

  • Hands-on experience with interpretability tools and libraries (e.g., TransformerLens, baukit, or similar)
  • Familiarity with sparse autoencoders, superposition, and feature geometry research
  • Teaching or TA experience in deep learning, NLP, or AI safety courses

Why Apply

  • Flexible Time Commitment – Work on your schedule while tackling meaningful research challenges
  • Startup Exposure – Work directly with an early-stage Y Combinator-backed company, gaining hands-on experience that sets you apart
  • Exceptional Pay – Project-based pay ranges from $150–$200/hour
  • Portfolio Building – Gain experience on frontier interpretability and AI safety research problems
  • Professional Growth – Sharpen your skills on varied, challenging model analysis and reverse-engineering tasks