MindBench.ai
Independent evaluation of AI for mental health, grounded in clinical science and lived experience.
The Problem
Millions of people already use AI chatbots for mental health support, yet the tools they rely on have not been independently or systematically evaluated. Rigorous evaluation has lagged far behind adoption.
MindBench.ai is built to address that gap.
What We Do
We are building a comprehensive, publicly available evaluation resource that examines AI systems across multiple dimensions — from the safety of their technical infrastructure to the personas they present to users, to what they know and how they reason about clinical problems.
Our work is shaped by clinicians, engineers, researchers, and people with lived experience of mental illness, because no single perspective is sufficient to determine whether these tools are safe and effective.
The goal is to give users, clinicians, developers, and policymakers the evidence they need to make informed decisions about AI in mental health.
What We Evaluate
Safety & Infrastructure
Technical safeguards, crisis response protocols, and data privacy protections.
Clinical Reasoning
Accuracy on clinically relevant benchmarks and knowledge of psychiatric care.
User Experience
Conversational style, persona, and how tools present themselves to users.
Transparency
Disclosure of AI nature, limitations, and evidence base to users.
Built on Our Prior Work
MindBench is developed in partnership with NAMI and builds on our division's long-running work with MindApps.org, our app evaluation platform that has assessed over 500 mental health apps using a framework developed with the American Psychiatric Association. MindBench extends that approach to AI, evaluating both the profile of AI tools (technical features, privacy protections, conversational style) and their performance on clinically relevant benchmarks.
Visit MindBench.ai