Peer-Reviewed Research

Our Publications

Explore our body of work across digital psychiatry, AI safety, and mental health technology.

Showing 1–20 of 435 publications

Beyond artificial intelligence psychosis: a functional typology of large language model-associated psychotic phenomena

Flathers M, Roux S, Torous J

The Lancet. Digital health·2026·Review

A man breached Windsor Castle with a crossbow after his large language model (LLM)-based companion encouraged an assassination plan. A father's question about pi evolved into more than 300 h of engagement with an LLM, leading to delusions about reality-altering mathematical formulas. An Australian woman's early-stage psychotic symptoms worsened when an LLM validated her distorted beliefs. A Florida man's grief over losing access to his ChatGPT persona culminated in a fatal police encounter. These disparate cases, often grouped together under the label of artificial intelligence psychosis, represent distinct phenomena requiring different clinical and technological interventions. This Viewpoint, co-written by a software engineer, a person with lived experience of schizophrenia, and a psychiatrist, proposes a functional typology of LLM-associated psychotic phenomena, based on the system's role: catalyst (precipitating new symptoms in previously healthy individuals), amplifier (worsening pre-existing psychiatric symptoms), coauthor (participating in the development of harmful narratives), or object (becoming the focus of delusional beliefs). By distinguishing functional roles rather than assuming a unified phenomenon, this typology allows clinicians to identify concerning LLM usage patterns and technology companies to develop specific safeguards, moving beyond sensationalised terminology towards mechanism-specific interventions.

Barriers and facilitators to usability of a smartphone-based digital mental health tool in older adults: Insights from a secondary analysis of mindLAMP

Calvert E, Hackett K, Torous J, Giovannetti T

International psychogeriatrics·2026

mindLAMP

As demand for mental healthcare access grows among older adult populations, digital mental health tools have emerged as promising tools. However, bridging the digital divide among older technology users remains critical. This post-hoc analysis evaluated potential factors influencing the adoption of a digital mental health tool in older adults. We analyzed data from 37 older adults who used a digital phenotyping app (mindLAMP) for 4 weeks to capture passive sensor data and complete nightly surveys. We examined associations between baseline participant features including demographics, cognition, mood, technology attitudes and use, and usability outcomes including app training metrics, adherence, and self-reported usability. Participants had a mean age of 72, with most identifying as female (68 %), college educated (76 %), retired (81 %), and White (59 %). The app demonstrated high usability, with baseline training averaging 20.2 (±6.5) minutes and 80 % nightly survey completion. At study completion, 30/37 participants reported finding the app easy to use. While not significant after correction, female sex, Black race, and some college education emerged as potentially promising factors associated with better usability outcomes. These findings suggest that with modest training, older adults can engage with digital health tools and report positive usability experiences. Differences in usability outcomes by sex, race, and education point to potential characteristics that may influence engagement. However, given the small, highly educated sample, these findings should be replicated in larger, more diverse cohorts to better understand which factors support the successful use of digital health tools in older adults.

Efficacy and risks of artificial intelligence chatbots for anxiety and depression: a narrative review of recent clinical studies

Bodner R, Lim K, Schneider R, Torous J

Current opinion in psychiatry·2026·Review

AIreview

The rapidly growing environment of artificial intelligence (AI) has accelerated interest in its potential use for improving the efficiency and efficacy of the healthcare industry. Specifically, there has been a growing interest in AI role mental healthcare for common disorders like anxiety and depression. However, it remains unclear whether current evidence is sufficient to determine efficacy and safety of AI chatbots in clinical practice. Most studies reported symptom reductions on validated anxiety and depression measures; however, the majority lacked appropriate active control conditions, featured small and demographically narrow samples, and used inconsistent outcome metrics, limiting generalizability and replication. Reporting of adverse events was rare, and potential risks such as emotional dependence and parasocial relationships were largely unexamined. This paper offers a review of the recent literature (February 2024 and July 2025) regarding AI effectiveness in treating anxiety and depression. While findings suggest that AI chatbots are feasible and acceptable to users, current evidence is insufficient to determine their efficacy or safety in clinical practice.

Metadata in Smartphone-Based Cognitive Assessments: Current State and Emerging Evidence in Psychiatric Disorders

Kim KW, Byun AJS, Castillo J, Youn YC, et al.

Harvard review of psychiatry·2026·Review

SHARPmindLAMP

Smartphone-based cognitive assessments have emerged as promising tools for frequent and ecologically valid monitoring of cognitive function in real-world settings. These tools enable continuous capture of cognitive and behavioral patterns, including intra-individual variability, practice-related improvement, and contextual influences. Repeated assessments offer a unique opportunity to detect subtle cognitive changes over time. The interpretability and clinical utility of the metadata generated by such assessments, however, remain underexplored. In this review, we consider the current landscape of smartphone-derived cognitive metadata in the context of cognitive and affective disorders. We focus on emerging evidence linking metadata features to functional outcomes and symptom fluctuations across conditions such as schizophrenia, bipolar disorder, and depression. Additionally, we discuss methodological considerations for optimizing metadata analysis, including test design, sampling frequency, and analytical strategies. We propose that cognitive metadata may serve as sensitive indicators of early cognitive change and support personalized mental health monitoring and targeted intervention.

Mindbench.ai: an actionable platform to evaluate the profile and performance of large language models in a mental healthcare context

Dwyer B, Flathers M, Sano A, Dempsey A, et al.

NPP - digital psychiatry and neuroscience·2025

mindbench

Individuals are increasingly utilizing large language model (LLM)-based tools for mental health guidance and crisis support in place of human experts. While AI technology has great potential to improve health outcomes, insufficient empirical evidence exists to suggest that AI technology can be deployed as a clinical replacement; thus, there is an urgent need to assess and regulate such tools. Regulatory efforts have been made and multiple evaluation frameworks have been proposed, however,field-wide assessment metrics have yet to be formally integrated. In this paper, we introduce a comprehensive online platform that aggregates evaluation approaches and serves as a dynamic online resource to simplify LLM and LLM-based tool assessment: MindBench.ai. At its core, MindBench.ai is designed to provide easily accessible/interpretable information for diverse stakeholders (patients, clinicians, developers, regulators, etc.). To create MindBench.ai, we built off our work developing MINDapps.org to support informed decision-making around smartphone app use for mental health, and expanded the technical MINDapps.org framework to encompass novel large language model (LLM) functionalities through benchmarking approaches. The MindBench.ai platform is designed as a partnership with the National Alliance on Mental Illness (NAMI) to provide assessment tools that systematically evaluate LLMs and LLM-based tools with objective and transparent criteria from a healthcare standpoint, assessing both profile (i.e. technical features, privacy protections, and conversational style) and performance characteristics (i.e. clinical reasoning skills). With infrastructure designed to scale through community and expert contributions, along with adapting to technological advances, this platform establishes a critical foundation for the dynamic, empirical evaluation of LLM-based mental health tools-transforming assessment into a living, continuously evolving resource rather than a static snapshot. AI chatbots powered by large language models are increasingly used for mental health support, yet they can give misleading or unsafe replies. To address this, our team created MindBench.ai, an open platform that helps patients, clinicians, researchers, and regulators evaluate AI systems transparently and consistently. Building on MINDapps.org, it profiles and benchmarks AI tools with metrics developed with NAMI, experts, and people with lived experience to ensure transparency, safety, and responsible use in mental health.

Contextualizing Clinical Benchmarks: A Tripartite Approach to Evaluating LLM-Based Tools in Mental Health Settings

Flathers M, Dwyer B, Rozenblit E, Torous J

Journal of psychiatric practice·2025

mindbench

The rapid proliferation of Large Language Model (LLM)-based tools in mental health care presents an urgent need for clinical evaluation frameworks. With millions already engaging with Artificial Intelligence (AI) tools, mental health disciplines require immediate, practical evaluation approaches rather than awaiting idealized methodologies. This paper introduces a practical, implementable approach to evaluating LLM-based tools in mental health settings through both theoretical analysis and actionable assessment methods. We propose a tripartite evaluation framework comprising: (1) the technical profile layer, which assesses foundational model safety and infrastructure compliance; (2) the health care knowledge layer, which validates domain-specific clinical knowledge and safety boundaries; and (3) the clinical reasoning layer, which evaluates decision-making capabilities and reasoning processes. Each proposed layer includes concrete evaluation methods that clinical teams can implement immediately, from direct model questioning to adversarial testing approaches. As health care organizations conduct and share evaluations using this approach, the field can collectively develop the specialized benchmarks and reasoning assessments essential for ensuring LLM integrations enhance rather than compromise patient care in the mental health space. The framework serves both as an immediate practical guide and a foundation for building more sophisticated evaluation resources tailored to mental health contexts.

Accelerating Digital Mental Health: The Society of Digital Psychiatry's Three-Pronged Road Map for Education, Digital Navigators, and AI

Torous J, Ledley KT, Gorban C, Strudwick G, et al.

JMIR mental health·2025·Editorial

SODP

Digital mental health tools such as apps, virtual reality, and artificial intelligence (AI) hold great promise but continue to face barriers to widespread clinical adoption. The Society of Digital Psychiatry, in partnership with JMIR Mental Health, presents a 3-pronged road map to accelerate their safe, effective, and equitable implementation. First, education: integrate digital psychiatry into core training and professional development through a global webinar series, annual symposium, newsletter, and an updated open-access curriculum addressing AI and the evolving digital navigator role. Second, AI standards: develop transparent, actionable benchmarks and consensus guidance through initiatives like MindBench.ai to assess reasoning, safety, and representativeness across populations. Third, digital navigators: expand structured, train-the-trainer programs that enhance digital literacy, engagement, and workflow integration across diverse care settings, including low- and middle-income countries. Together, these pillars bridge research and practice, advancing digital psychiatry grounded in inclusivity, accountability, and measurable clinical impact.

Mobile cognitive remote assessment of schizophrenia: a global multi-site pilot study

Castillo J, Cheong J, Choudhary S, Bondre A, et al.

Schizophrenia (Heidelberg, Germany)·2025

sharpmindlamp

Cognition in schizophrenia is difficult to assess in clinical settings due to the time required to administer traditional pen-and-paper tests, among other factors. Digital remote assessments completed on a smartphone offer an alternative that can reduce the burden on healthcare staff and patients, in addition to providing more nuanced cognitive profiles, especially when used in conjunction with smartphone data such as sleep. Building on previous work using the mindLAMP research app in international contexts, this paper presents a global multi-site pilot study to explore the validity of the app's digital cognitive assessments as proxies for traditional in-person assessments such as the gold standard MATRICS Consensus Battery (MCCB). Across one site in the U.S. (Boston) and two sites in India (Bangalore and Bhopal), a total of 56 participants with diagnoses of early-course schizophrenia or schizoaffective disorder were recruited between September 2024 and March 2025 to engage with the mindLAMP app for 30 days. Participants completed 2-3 different cognitive tasks and surveys each day; at the beginning and the end of this period, participants also took the MCCB and surveys related to their diagnosis. mindLAMP cognitive assessments were scored using different metrics that combine speed and accuracy, and correlation analyses were run on these metrics and MCCB domains. Of the scoring metrics used, the Rate-Correct Score (RCS) most consistently correlates with baseline MCCB domains corrected for age, gender, and education. Moderate test-retest reliability was observed across certain cognitive assessments such as a mobile version of Trails-Making Test A and Symbol Digit Substitution, which agrees with previous research done by Keefe et al.; poor test-retest reliability, in contrast, was observed across assessments such as Spatial Span. Additionally, we conducted exploratory mediation analyses using sleep data to see if sleep mediates between the Ecological Momentary Assessment (EMA) survey scores and performance on select digital cognitive assessments on mindLAMP. Our results support the initial accessibility, validity and reliability of using smartphones to assess cognition in schizophrenia. Future research to develop additional smartphone-based cognitive tests, as well as with larger samples and in other psychiatric populations, is warranted.

FDA-authorized software as a medical device in mental health: a perspective on evidence, device lineage, and regulatory challenges

Herpertz J, Stern AD, Opel N, Reininghaus U, et al.

Npj mental health research·2025

mindapps

FDA approval is widely regarded as a benchmark of quality for medical devices. However, concerns persist regarding its regulatory framework for digital mental health devices. This perspective article examined FDA-authorized Software as a Medical Device (SaMD) in mental health, tracing the devices' regulatory lineage through the De Novo and 510(k)-clearance pathways while assessing the quality of evidence that led to their authorization. Many 510(k)-cleared devices lacked direct evidence of effectiveness, relying solely on equivalence to predicate devices. Furthermore, we identified four FDA-authorized SaMD whose pivotal randomized controlled trials tested prototypes delivered on different digital platforms than those of the final marketed products. Strengthening regulatory standards requires randomized controlled trials evaluating the final marketed product on its intended platform and the use of context-appropriate control conditions. Sham placebo controls may be considered feasible; however, evidence supporting fully inert and fully blinding sham controls for digital interventions remains limited at present. This should occur alongside consistent application of the FDA's discretionary authority to require new 510(k) submissions when substantial product changes occur.

Social Media Detox and Youth Mental Health

Calvert E, Cipriani M, Dwyer B, Lisowski V, et al.

JAMA network open·2025

social media

The association between social media use and youth mental health remains poorly understood, with recent systematic reviews reporting inconsistent and conflicting findings. These discrepancies reflect the overreliance on self-reported estimates of use, lack of passive monitoring of behavior, and limited measurement of momentary mental health states. To examine the association between objective social media use, problematic engagement, and mental health outcomes in young adults, as well as evaluate the outcomes of a 1-week social media detox intervention on behavior and mental health symptoms. A remote cohort study conducted in the US using a national recruitment registry between March 2024 and March 2025, where participants completed a 2-week observational baseline, followed by an optional 1-week social media detox intervention. Participants were young adults (aged 18-24 years) with a smartphone and English fluency. Social media use of Facebook, Instagram, Snapchat, TikTok, and X over a 2-week baseline period, followed by an optional 1-week detox intervention. The main outcomes were detox changes in symptoms of depression (Patient Health Questionnaire-9), anxiety (Generalized Anxiety Disorder-7), insomnia (Insomnia Severity Index), and loneliness (University of California, Los Angeles Loneliness Scale), as well as within-participant changes in behavior including screen use, communication, mobility, and momentary mental health states. Of 417 enrolled participants, 373 (mean [SD] age, 21.0 [1.9] years) completed baseline assessments, with 295 (79.1%) opting into a detox intervention that reduced symptoms anxiety by 16.1% (-1.9 reduction; Cohen d, -0.44; 95% CI, -0.56 to -0.32), depression by 24.8% (-2.0 reduction; Cohen d, -0.37; 95% CI, -0.49 to -0.32) and insomnia by 14.5% (-2.1 reduction; Cohen d, -0.44; 95% CI, -0.56 to -0.32). There was no significant change in loneliness (Cohen d, -0.40; 95% CI, -0.17 to 0.06). Marginal increases were seen in home time (β, 42.8; 95% CI, 24.3 to 61.2 minutes) and screen duration (β, 15.4; 95% CI, 4.9 to 25.9 seconds), with considerable within-person variability. No other changes in behavioral or EMA-based features were observed after the detox. In this cohort of young adults, reducing social media use for 1 week was associated with reductions in symptoms of depression, anxiety, and insomnia; however, the durability of these therapeutic outcomes and their associations with behavior warrant further study, particularly in more diverse populations.

Smartphone intervention apps for schizophrenia: A review of the academic literature and app stores

Xia W, Hau C, Burns J, Ryan S, et al.

Schizophrenia research·2025·Review

mindappsschizophrenia

While the interest and use of apps for anxiety and depression accelerated with COVID-19, less is known about the status of apps for people with schizophrenia and psychosis-spectrum disorders. This study aims to offer a comprehensive overview of the research and commercial app marketplaces (Apple, Android) to assess how recent technological advances translate into tools patients can use today. In December 2024, we conducted a narrative review for apps related to schizophrenia and coded a brief overview of the studies, eligibility, outcomes and experiences, engagement and features, attrition and adherence, and app availability. We simultaneously conducted a search on the U.S. Google Play and the U.S. Apple App Store for apps denoted in the research literature and other commercially available apps. The academic literature search yielded 3753 articles, of which 34 were included. Across these 34 studies, 32 unique apps related to schizophrenia and psychosis were featured. A search of the U.S. app marketplaces yielded only one relevant app peer-reviewed in the last decade and that was broadly accessible to patients. To realize the full clinical utility of these apps, it is essential to shift the focus toward their specific features and functionalities, supported by more rigorous research and follow-up studies. There is a pressing need for greater standardization in outcome measures and record-keeping practices to ensure consistency and reliability across the field. While the number of commercially available apps has increased, the lack of robust, large-scale controlled studies and standardized controls has resulted in inconsistent findings. Findings highlight the importance of conducting more controlled studies and randomized clinical trials with appropriate controls to strengthen the evidence base and guide the effective implementation of digital health interventions in mental health care.

Is digital alliance associated with engagement & outcomes in guided digital interventions? An analysis of data from two studies

Macrynikola N, Chang S, Torous J

Journal of affective disorders·2025

digital clinic

Digital interventions have the potential to increase access to care. Despite their demonstrated efficacy in clinical trials, however, they often suffer from low sustained engagement in real-world contexts. Digital alliance (i.e., therapeutic alliance between user and app) may enhance engagement and outcomes, but its role in guided digital interventions (i.e., those with human support) is little understood. Using data from two studies involving the mental health app mindLAMP, we examined digital alliance and its association with engagement and outcomes. In Study 1, mindLAMP was offered as a standalone app with several brief check-ins by a digital navigator (aka coach). In Study 2, mindLAMP was integrated into a brief teletherapy program supported by a clinician and a digital navigator. Digital alliance was assessed near study midpoint with a validated measure. Digital alliance was associated with engagement in both studies. In Study 1, digital alliance predicted subsequent app engagement, b = 0.18, p < .01, adjusting for prior engagement, which remained significant, b = 0.62, p < .001. Similarly, in Study 2, digital alliance predicted subsequent engagement, b = 0.21, p < .01, adjusting for prior engagement, b = 0.22, p < .05. Digital alliance also predicted co-morbid anxiety and depressive symptoms at post-intervention, after adjusting for baseline symptoms, in Study 1, b = -0.27, p < .001, but not in Study 2. Participant demographics were not representative of the general population. Findings underscore the potential of digital alliance in enhancing engagement and outcomes in digital interventions.

Testing the Feasibility, Acceptability, and Potential Efficacy of an Innovative Digital Mental Health Care Delivery Model Designed to Increase Access to Care: Open Trial of the Digital Clinic

Macrynikola N, Chen K, Lane E, Nguyen N, et al.

JMIR mental health·2025

digital clinic

Mental health concerns have become increasingly prevalent; however, care remains inaccessible to many. While digital mental health interventions offer a promising solution, self-help and even coached apps have not fully addressed the challenge. There is now a growing interest in hybrid, or blended, care approaches that use apps as tools to augment, rather than to entirely guide, care. The Digital Clinic is one such model, designed to increase access to high-quality mental health services. To assess the feasibility, acceptability, and potential efficacy of the Digital Clinic model, this study aims to conduct a nonrandomized open trial with participants experiencing depression, anxiety, or both, at various levels of clinical severity. Clinicians were trained in conducting brief transdiagnostic evidence-based treatment augmented by a mental health app (mindLAMP); digital navigators were trained in supporting participants' app engagement and digital literacy while also sharing app data with both patients and clinicians. Feasibility and acceptability of this 8-week program were assessed against a range of benchmarks. Potential efficacy was assessed by calculating pre-post change in symptoms of depression (Patient Health Questionnaire-9; PHQ-9), anxiety (7-item Generalized Anxiety Disorder; GAD-7), and comorbid depression and anxiety (Patient Health Questionnaire Anxiety and Depression Scale; PHQ-ADS), as well as rates of clinically meaningful improvement and remission. Secondary outcomes included change in functional impairment, self-efficacy in managing emotions, and flourishing. Of the 258 enrolled participants, 215 (83.3%) completed the 8-week program. Most were White (n=151, 70.2%) and identified as cisgender women (n=136, 63.3%), with a mean age of 41 (SD 14) years. Feasibility and acceptability were good to excellent across a range of domains. The program demonstrated potential efficacy: the average PHQ-9 score was moderate to moderately severe at baseline (mean 13.39, SD 4.53) and decreased to subclinical (mean 7.79, SD 4.61) by the end of the intervention (t 126 =12.50, P<.001, Cohen d=1.11). Similarly, the average GAD-7 score decreased from moderate at baseline (mean 12.93, SD 3.67) to subclinical (mean 7.35, SD 4.19) by the end of the intervention (t 113 =13, P<.001, Cohen d=1.22). Participation in the program was also associated with high rates of clinically significant improvement and remission. Results suggest that the Digital Clinic model is feasible, acceptable, and potentially efficacious, warranting a future randomized controlled trial to establish the efficacy of this innovative model of care.

Transforming Digital Phenotyping Raw Data Into Actionable Biomarkers, Quality Metrics, and Data Visualizations Using Cortex Software Package: Tutorial

Burns J, Chen K, Flathers M, Currey D, et al.

Journal of medical Internet research·2024

digital phenotyping

As digital phenotyping, the capture of active and passive data from consumer devices such as smartphones, becomes more common, the need to properly process the data and derive replicable features from it has become paramount. Cortex is an open-source data processing pipeline for digital phenotyping data, optimized for use with the mindLAMP apps, which is used by nearly 100 research teams across the world. Cortex is designed to help teams (1) assess digital phenotyping data quality in real time, (2) derive replicable clinical features from the data, and (3) enable easy-to-share data visualizations. Cortex offers many options to work with digital phenotyping data, although some common approaches are likely of value to all teams using it. This paper highlights the reasoning, code, and example steps necessary to fully work with digital phenotyping data in a streamlined manner. Covering how to work with the data, assess its quality, derive features, and visualize findings, this paper is designed to offer the reader the knowledge and skills to apply toward analyzing any digital phenotyping data set. More specifically, the paper will teach the reader the ins and outs of the Cortex Python package. This includes background information on its interaction with the mindLAMP platform, some basic commands to learn what data can be pulled and how, and more advanced use of the package mixed with basic Python with the goal of creating a correlation matrix. After the tutorial, different use cases of Cortex are discussed, along with limitations. Toward highlighting clinical applications, this paper also provides 3 easy ways to implement examples of Cortex use in real-world settings. By understanding how to work with digital phenotyping data and providing ready-to-deploy code with Cortex, the paper aims to show how the new field of digital phenotyping can be both accessible to all and rigorous in methodology.

The Digital Navigator: Standardizing Human Technology Support in App-Integrated Clinical Care

Chen K, Lane E, Burns J, Macrynikola N, et al.

Telemedicine journal and e-health : the official journal of the American Telemedicine Association·2024

digital navigator

Background: Mental health apps offer scalable care, yet clinical adoption is hindered by low user engagement and integration challenges into clinic workflows. Human support staff called digital navigators, trained in mental health technology, could enhance care access and patient adherence and remove workflow burdens from clinicians. While the potential of this role is clear, training staff to become digital navigators and assessing their impact are primary challenges. Methods: We present a detailed manual/framework for implementation of the Digital Navigator within a short-term, cognitive-behavioral therapy-focused hybrid clinic. We analyze patient engagement, satisfaction, and digital phenotyping data quality outcomes. Data from 83 patients, for the period spanning September 2022 to September 2023, included Digital Navigator satisfaction, correlated with demographics, mindLAMP app satisfaction, engagement, and passive data quality. Additionally, average passive data across 33 clinic patients from November 2023 to January 2024 were assessed for missingness. Results: Digital Navigator satisfaction averaged 18.8/20. Satisfaction was not influenced by sex, race, gender, or education. Average passive data quality across 33 clinic patients was 0.82 at the time this article was written. Digital Navigator satisfaction scores had significant positive correlation with both clinic app engagement and perception of that app. Conclusions: Results demonstrate preliminary support and patient endorsement for the Digital Navigator role and positive outcomes around digital engagement and digital phenotyping data quality. Through sharing training resources and standardizing the role, we aim to enable clinicians and researchers to adapt and utilize the Digital Navigator for their own needs.

The impact of mindfulness apps on psychological processes of change: a systematic review

Macrynikola N, Mir Z, Gopal T, Rodriguez E, et al.

Npj mental health research·2024·Review

mindapps

Mindfulness-based interventions (MBIs) have demonstrated therapeutic efficacy for various psychological conditions, and smartphone apps that facilitate mindfulness practice can enhance the reach and impact of MBIs. The goal of this review was to summarize the published evidence on the impact of mindfulness apps on the psychological processes known to mediate transdiagnostic symptom reduction after mindfulness practice. A literature search from January 1, 1993, to August 7, 2023 was conducted on three databases, and 28 randomized controlled trials involving 5963 adults were included. Across these 28 studies, 67 outcome comparisons were made between a mindfulness app group and a control group. Between-group effects tended to favor the mindfulness app group over the control group in three psychological process domains: repetitive negative thinking, attention regulation, and decentering/defusion. Findings were mixed in other domains (i.e., awareness, nonreactivity, non-judgment, positive affect, and acceptance). The range of populations examined, methodological concerns across studies, and problems with sustained app engagement likely contributed to mixed findings. However, effect sizes tended to be moderate to large when effects were found, and gains tended to persist at follow-up assessments two to six months later. More research is needed to better understand the impact of these apps on psychological processes of change. Clinicians interested in integrating apps into care should consider app-related factors beyond evidence of a clinical foundation and use app databases to identify suitable apps for their patients, as highlighted at the end of this review.

An exploratory analysis of the effect size of the mobile mental health Application, mindLAMP

Chang S, Alon N, Torous J

Digital health·2023

mindlamp

Despite the proliferation of mobile mental health apps, evidence of their efficacy around anxiety or depression is inadequate as most studies lack appropriate control groups. Given that apps are designed to be scalable and reusable tools, insights concerning their efficacy can also be assessed uniquely through comparing different implementations of the same app. This exploratory analysis investigates the potential to report a preliminary effect size of an open-source smartphone mental health app, mindLAMP, on the reduction of anxiety and depression symptoms by comparing a control implementation of the app focused on self-assessment to an intervention implementation of the same app focused on CBT skills. A total of 328 participants were eligible and completed the study under the control implementation and 156 completed the study under the intervention implementation of the mindLAMP app. Both use cases offered access to the same in-app self-assessments and therapeutic interventions. Multiple imputations were utilized to impute the missing Generalized Anxiety Disorder-7 and Patient Health Questionnaire-9 survey scores of the control implementation. Post hoc analysis revealed small effect sizes of Hedge's g  = 0.34 for Generalized Anxiety Disorder-7 and Hedge's g  = 0.21 for Patient Health Questionnaire-9 between the two groups. mindLAMP shows promising results in improving anxiety and depression outcomes in participants. Though our results mirror the current literature in assessing mental health apps' efficacy, they remain preliminary and will be used to inform a larger, well-powered study to further elucidate the efficacy of mindLAMP.

Comprehensive Model for Mental health Access and service use (CoMMA): A process model for technology-enhanced mental healthcare

De Witte NAJ, Best P, Torous J, Mulvenna M, et al.

Internet interventions·2026·Review

Over recent decades, mental healthcare reforms have been proposed to facilitate deinstitutionalization, integration into primary care, task-sharing to non-specialist providers and, more recently, digital interventions. All are aimed at improving the accessibility, acceptability and effectiveness of care. However, many healthcare systems still suffer from complexity, rigidity and inequity in access. New and more integrated models of service delivery are needed to fully harness the potential of evidence-based approaches in mental healthcare, including digital or community-based interventions. The current contribution provides an overview of recent developments - in the organization of care, allocation of healthcare services, and the digital transformation of care - and presents the Comprehensive Model for Mental health Access and service use (CoMMA). The process model includes both informal support (e.g., self-help and community care) as well as formal services (e.g., diagnostics and interventions delivered by healthcare professionals). In line with the increasing digital transformation of care, CoMMA also addressed how technology can play a role in the different model components. The purpose of the model is to provide guidance to healthcare systems, professionals and trainees in shaping the provision of evidence-based psychological services and implementing interventions. It hereby aims to complement ongoing societal, regulatory, and economic changes in the healthcare field by providing a conceptual and substantive narrative. The model shows how mental health services can be organized based on current scientific frameworks, policy perspectives, and clinical practice.

An AI-based mental health guardrail and dataset for identifying psychiatric crises in text-based conversations

Nelson BW, Wong C, Silvestrini MT, Shin S, et al.

NPJ digital medicine·2026

Large language models often mishandle psychiatric emergencies, offering harmful or inappropriate advice. This study evaluated the Verily Mental Health Guardrail (VMHG) on two clinician-labeled datasets: the Verily Mental Health Crisis Dataset v1.0, containing 1800 simulated messages and the NVIDIA Aegis AI Content Safety Dataset subsetted to 794 mental health-related messages. Performance was benchmarked against OpenAI Omni Moderation Latest and NVIDIA NeMo Guardrails. The VMHG demonstrated high sensitivity (0.990) and specificity (0.992) on the Verily dataset, with an F1-score of 0.939 and high category-level sensitivity (0.917-0.992) and specificity (≥0.978). On the NVIDIA dataset, it maintained strong sensitivity (0.982) and accuracy (0.921) with reduced specificity (0.859). Compared with NVIDIA and OpenAI guardrails, the VMHG achieved significantly higher sensitivity (all p < 0.001) and comparable specificity (NVIDIA p < 0.001, OpenAI p = 0.094). Overall, the VMHG demonstrated robust, generalizable, and clinically oriented safety performance that prioritizes sensitivity to minimize missed mental health crises.

Investigating Placebos and Controls Used in Large Language Model-Based Chatbot Intervention Trials: Protocol for a Methodological Review

Druart L, Faria V, Annoni M, Torous J, et al.

JMIR research protocols·2026

Large language model (LLM)-based chatbots are rapidly being repurposed as patient-facing digital health tools. Their interactive, adaptive, and seemingly empathic behavior can heighten engagement and expectancy-nonspecific factors that complicate causal inference. Yet, comparator strategies in LLM trials are inconsistently defined and often undermatched (eg, minimal education vs highly engaging chatbots), risking biased effect estimates and poor reproducibility. The aim of this study was to systematically identify and categorize the control conditions used in interventional studies of LLM-based, patient-facing digital health interventions and to evaluate their methodological appropriateness. Secondary aims are to describe variability by health domain and study design and to explore whether control type/quality relates to the direction of reported effects. This protocol follows PRISMA-P (Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols) and is registered in PROSPERO. Eligible studies are interventional designs that evaluate LLM-based, patient-facing digital health interventions; any control condition is eligible (including no control, waitlist, treatment-as-usual, attention/education, active comparator, or sham digital control). We will search PubMed, PsycINFO, CENTRAL, CINAHL, and Scopus for records from January 1, 2023, onward. All records will be managed and screened in Rayyan by 2 independent reviewers. Dual, independent data extraction will target study context, intervention details, and control-arm characteristics (typology, rationale, matching to nonspecifics, blinding, reporting). No formal risk-of-bias assessments are planned, as the focus is on meta-research. At submission, the protocol is registered in PROSPERO and has received no specific funding. Scoping searches are complete; full screening and extraction have not yet commenced. This review will provide an empirical map of control practices in LLM chatbot trials and guidance for designing better-matched comparators, supporting more valid and interpretable evaluations as LLMs diffuse into patient care. PROSPERO CRD420251246148; https://www.crd.york.ac.uk/PROSPERO/view/CRD420251246148. PRR1-10.2196/90507.

…