
Diksha Shrivastava
Diksha is an AI Safety Researcher based in India. She spent the last eight months working full-time on safety research at a capability-first lab and is now moving to independent work. Her research sits at the intersection of dynamic agency, multi-agent risks, developmental interpretability, and scalable oversight: she builds on Causal Incentives research to study how co-evolving environments shape temporal goal structures in RL agents, using regret-based Unsupervised Environment Design. She is particularly interested in what it means for an agent to model its own training process, and what that implies for oversight. Alongside her research, she volunteers in reading groups and mentors people new to AI Safety. She is always glad to talk about risks from open-endedness, agent epistemics, or alignment as an environment design problem.

Note: I'm not active on social media; the best way to reach me is by email.