This is a remote, project-based role for machine learning researchers with deep expertise in mechanistic interpretability. You will complete tasks at the frontier of interpretability research, including analyzing internal model representations, reverse-engineering learned circuits, and developing tools and techniques to understand how neural networks compute. Work runs over the next 2–3 weeks, is asynchronous, and is assigned on a project-by-project basis, with an expected commitment of 10–20 hours per week for the projects you accept. The position offers exceptional pay, exposure to cutting-edge AI safety and interpretability research, and a strong addition to your research portfolio.
Commitment: 10–20 hours/week | Pay: $150–$200/hr | Type: Contract
Responsibilities
- Conduct mechanistic interpretability research on transformer-based and other neural network architectures
- Identify, isolate, and analyze computational circuits responsible for specific model behaviors
- Apply and extend techniques such as activation patching, probing, sparse autoencoders, and attention analysis (an activation-patching sketch follows this list)
- Develop tools and frameworks to automate or scale interpretability workflows across model families
- Document methodologies, findings, and technical approaches clearly and reproducibly
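For a flavor of the day-to-day work, here is a minimal activation-patching sketch. It assumes GPT-2 loaded through TransformerLens (one of the libraries named under Preferred Qualifications); the prompt pair, patch position, and layer sweep are illustrative choices, not a prescribed workflow.

```python
# Minimal activation-patching sketch (illustrative, not a prescribed workflow).
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")

# Clean/corrupt prompt pair differing in one token (" France" vs " Italy" are
# each a single GPT-2 token, so positions line up across the two runs).
clean_tokens = model.to_tokens("The capital of France is")
corrupt_tokens = model.to_tokens("The capital of Italy is")

# Cache all activations on the clean run.
_, clean_cache = model.run_with_cache(clean_tokens)

SUBJECT_POS = 4  # index of the country token, given the prepended BOS

def patch_residual(resid, hook):
    # Overwrite the corrupt run's residual stream at the subject position
    # with the clean run's cached activations at the same hook point.
    resid[:, SUBJECT_POS, :] = clean_cache[hook.name][:, SUBJECT_POS, :]
    return resid

paris = model.to_single_token(" Paris")

# Sweep layers, patching resid_pre at each, and record how much of the clean
# answer's logit is restored at the final position.
for layer in range(model.cfg.n_layers):
    patched_logits = model.run_with_hooks(
        corrupt_tokens,
        fwd_hooks=[(utils.get_act_name("resid_pre", layer), patch_residual)],
    )
    print(f"layer {layer:2d}: logit(' Paris') = {patched_logits[0, -1, paris].item():.2f}")
```

Layers where patching restores the clean answer's logit are candidate locations of the circuit carrying the relevant information; isolating and analyzing such circuits is the core of the role.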
Required Qualifications
- Published researcher with at least one first-author publication in a peer-reviewed venue (e.g., NeurIPS, ICML, ICLR, or equivalent)
- Master's or PhD in Machine Learning, Artificial Intelligence, Computer Science, or a related quantitative field
- Demonstrated expertise in mechanistic interpretability, model analysis, or AI safety research
- Deep familiarity with transformer architectures and modern large language model internals
- Strong problem-solving skills and ability to work independently on open-ended research tasks
Preferred Qualifications
- Hands-on experience with interpretability tools and libraries (e.g., TransformerLens, baukit, or similar)
- Familiarity with sparse autoencoders, superposition, and feature geometry research (see the sketch after this list)
- Background in TA'ing or teaching deep learning, NLP, or AI safety courses
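The sparse-autoencoder work mentioned above is compact enough to sketch: a small PyTorch SAE trained to reconstruct cached activations under an L1 sparsity penalty. The dimensions, ReLU encoder, and penalty weight below are illustrative defaults, not any particular library's implementation.

```python
# Minimal sparse-autoencoder sketch (illustrative defaults throughout).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int, l1_coeff: float = 1e-3):
        super().__init__()
        self.enc = nn.Linear(d_model, d_features)
        self.dec = nn.Linear(d_features, d_model)
        self.l1_coeff = l1_coeff

    def forward(self, acts: torch.Tensor):
        # Encode activations into an overcomplete, non-negative feature basis.
        features = torch.relu(self.enc(acts))
        recon = self.dec(features)
        # Reconstruction loss plus an L1 sparsity penalty on the features.
        loss = (recon - acts).pow(2).mean() + self.l1_coeff * features.abs().sum(-1).mean()
        return recon, features, loss

# Usage: train on cached residual-stream activations of shape [batch, d_model].
sae = SparseAutoencoder(d_model=768, d_features=768 * 8)
acts = torch.randn(32, 768)  # stand-in for real cached activations
recon, features, loss = sae(acts)
loss.backward()
```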
Why Apply
- Flexible Time Commitment – Work on your schedule while tackling meaningful research challenges
- Startup Exposure – Work directly with an early-stage Y Combinator-backed company, gaining hands-on experience that sets you apart
- Exceptional Pay – Project-based pay ranges from $150–$200/hour
- Portfolio Building – Gain experience on frontier interpretability and AI safety research problems
- Professional Growth – Sharpen your skills on varied, challenging model analysis and reverse-engineering tasks