Operationalizing machine-assisted translation in healthcare

Rawal, S. et al. Association between limited English proficiency and revisits and readmissions after hospitalization for patients with acute and chronic conditions in Toronto, Ontario, Canada. JAMA 322, 1605–1607 (2019).
Google Scholar
Lion, K. C., Lin, Y.-H. & Kim, T. Artificial intelligence for language translation: the equity is in the details. JAMA 332, 1427–1428 (2024).
Google Scholar
Flores, G. The impact of medical interpreter services on the quality of health care: a systematic review. Med. Care Res. Rev.62, 255–299 (2005).
Google Scholar
Schulson, L. B. & Anderson, T. S. National estimates of professional interpreter use in the ambulatory setting. J. Gen. Intern. Med. 37, 472–474 (2022).
Google Scholar
Diamond, L. C., Schenker, Y., Curry, L., Bradley, E. H. & Fernandez, A. Getting by: underuse of interpreters by resident physicians. J. Gen. Intern. Med. 24, 256–262 (2009).
Google Scholar
Detz, A. et al. Language concordance, interpersonal care, and diabetes self-care in rural Latino patients. J. Gen. Intern. Med. 29, 1650–1656 (2014).
Google Scholar
Betancourt, J. R., Green, A. R., Carrillo, J. E. & Ananeh-Firempong, O. Defining cultural competence: a practical framework for addressing racial/ethnic disparities in health and health care. Public Health Rep. 118, 293–302 (2003).
Google Scholar
Molina, R. L. & Kasper, J. The power of language-concordant care: a call to action for medical schools. BMC Med. Educ. 19, 378 (2019).
Google Scholar
Harvey, S. M., Branch, M. R., Hudson, D. & Torres, A. Listening to immigrant Latino men in rural Oregon: exploring connections between culture and sexual and reproductive health services. Am. J. Mens. Health 7, 142–154 (2013).
Google Scholar
Gavvala, S. Ensuring understanding: Language-concordant discharge instructions. Rice Univ. Baker Inst. Public Policy, Issue Brief. (2023).
Karpińska, P. Computer aided translation – possibilities, limitations and changes in the field of professional translation. J. Educ. Cult. Soc. 8, 133–142 (2017).
Google Scholar
Davis, S. H. et al. Translating discharge instructions for limited English-proficient families: strategies and barriers. Hosp. Pediatr. 9, 779–787 (2019).
Google Scholar
Choe, A. Y. et al. Improving discharge instructions for hospitalized children with limited english proficiency. Hosp. Pediatr. 11, 1213–1222 (2021).
Google Scholar
Diamond, L. C., Wilson-Stronks, A. & Jacobs, E. A. Do hospitals measure up to the national culturally and linguistically appropriate services standards?. Med. Care 48, 1080–1087 (2010).
Google Scholar
Rights (OCR), O. for C. Summary of Guidance to Federal Financial Assistance Recipients Regarding Title VI and the prohibition against national origin discrimination affecting limited English proficient persons. (2007).
Wu, Y. et al. Google’s neural machine translation system: bridging the gap between human and machine translation. Preprint at (2016).
Koehn, P. & Knowles, R. Six challenges for neural machine translation. In Proc. First Workshop on Neural Machine Translation (eds. Luong, T., Birch, A., Neubig, G. & Finch, A.) 28–39 (Association for Computational Linguistics, Vancouver, 2017). https://doi.org/10.18653/v1/W17-3204.
Vaswani, A. et al. Attention is all you need. in Advances in Neural Information Processing Systems 30 (Curran Associates, Inc., 2017).
Tu, T. et al. Towards conversational diagnostic artificial intelligence. Nature 642, 442–450 (2025).
Google Scholar
Brewster, R. C. L. et al. Performance of ChatGPT and Google Translate for Pediatric Discharge Instruction Translation. Pediatrics 154, e2023065573 (2024).
Google Scholar
Ortega, J. E., Castro Mamani, R. & Cho, K. Neural machine translation with a polysynthetic low resource language. Mach. Transl. 34, 325–346 (2020).
Google Scholar
Adebara, I., Abdul-Mageed, M. & Silfverberg, M. Linguistically-Motivated Yorùbá-English Machine Translation. In Proc. of the 29th International Conference on Computational Linguistics (eds Calzolari, N. et al.) 5066–5075 (International Committee on Computational Linguistics, Gyeongju, Republic of Korea, 2022).
Goh, E. et al. GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial. Nat. Med. 1–6 (2025).
Savage, T. et al. Fine tuning large language models for medicine: the role and importance of direct preference optimization. Preprint at (2024).
Mirza, F. N. et al. Using ChatGPT to facilitate truly informed medical consent. NEJM AI 1, AIcs2300145 (2024).
Google Scholar
Van Veen, D. et al. Adapted large language models can outperform medical experts in clinical text summarization. Nat. Med. 30, 1134–1142 (2024).
Google Scholar
Zaretsky, J. et al. Generative artificial intelligence to transform inpatient discharge summaries to patient-friendly language and format. JAMA Netw. Open 7, e240357 (2024).
Google Scholar
Nondiscrimination in Health Programs and Activities. Federal Register (2024).
Damschroder, L. J., Reardon, C. M., Widerquist, M. A. O. & Lowery, J. The updated consolidated framework for implementation research based on user feedback. Implement. Sci. 17, 75 (2022).
Google Scholar
Xu, Z., Jain, S. & Kankanhalli, M. Hallucination is inevitable: an innate limitation of large language models. Preprint at (2024).
Liu, N. F. et al. Lost in the middle: how language models use long contexts. Trans. Assoc. Comput. Linguist. 12, 157–173 (2024).
Google Scholar
Levy, A., Agrawal, M., Satyanarayan, A. & Sontag, D. Assessing the impact of automated suggestions on decision making: domain experts mediate model errors but take less initiative. In Proc. 2021 CHI Conference on Human Factors in Computing Systems 1–13 (Association for Computing Machinery, New York, NY, USA, 2021). https://doi.org/10.1145/3411764.3445522.
Kuperman, G. J. et al. Medication-related clinical decision support in computerized provider order entry systems: a review. J. Am. Med. Inform. Assoc. 14, 29–40 (2007).
Google Scholar
Data controls in the OpenAI platform – OpenAI API. https://platform.openai.com.
Ng, M. Y., Helzer, J., Pfeffer, M. A., Seto, T. & Hernandez-Boussard, T. Development of secure infrastructure for advancing generative AI research in healthcare at an academic medical center. Res. Sq. rs.3.rs-5095287 (2024).
Vedula, K. S. et al. Distilling large language models for efficient clinical information extraction. Preprint at (2024).
Woods, A. P. et al. Limited English proficiency and clinical outcomes after hospital-based care in English-speaking countries: a systematic review. J. Gen. Intern. Med. 37, 2050–2061 (2022).
Google Scholar
Manuel, S. P., Nguyen, K., Karliner, L. S., Ward, D. T. & Fernandez, A. Association of English language proficiency with hospitalization cost, length of stay, disposition location, and readmission following total joint arthroplasty. JAMA Netw. Open 5, e221842 (2022).
Google Scholar
Kojima, T., Gu, S. S., Reid, M., Matsuo, Y. & Iwasawa, Y. Large language models are zero-shot reasoners. In Proc. 36th International Conference on Neural Information Processing Systems 22199–22213 (Curran Associates Inc., Red Hook, NY, USA, 2022).
Bakken, S. AI in health: keeping the human in the loop. J. Am. Med. Inform. Assoc. 30, 1225–1226 (2023).
Google Scholar
Swaminathan, A. et al. Natural language processing system for rapid detection and intervention of mental health crisis chat messages. NPJ Digit. Med. 6, 1–9 (2023).
Google Scholar
BigQuery enterprise data warehouse. Google Cloud https://cloud.google.com/bigquery.
The Snowflake AI Data Cloud – Mobilize Data, Apps, and AI. https://www.snowflake.com/content/snowflake-site/global/en.
Create Your Azure Free Account Or Pay As You Go | Microsoft Azure. https://azure.microsoft.com/en-us/pricing/purchase-options/azure-account/search.
Zhang, X., Rajabi, N., Duh, K. & Koehn, P. Machine Translation with Large Language Models: Prompting, Few-shot Learning, and Fine-tuning with QLoRA. In Proc. Eighth Conference on Machine Translation (eds. Koehn, P., Haddow, B., Kocmi, T. & Monz, C.) 468–481 (Association for Computational Linguistics, Singapore, 2023). https://doi.org/10.18653/v1/2023.wmt-1.43.
Rafailov, R. et al. Direct preference optimization: your language model is secretly a reward model. In Proc. 37th International Conference on Neural Information Processing Systems 53728–53741 (Curran Associates Inc., Red Hook, NY, USA, 2023).
Looker Studio. Google for Developers https://developers.google.com/looker-studio.
Lommel, A. R., Burchardt, A. & Uszkoreit, H. Multidimensional quality metrics: a flexible system for assessing translation quality. In Proc. Translating and the Computer 35 (Aslib, London, UK, 2013).
Chen, X., Acosta, S. & Barry, A. E. Evaluating the accuracy of Google translate for diabetes education material. JMIR Diabetes 1, e3 (2016).
Google Scholar
Lopez, I., Haredasht, F. N., Caoili, K., Chen, J. H. & Chaudhari, A. Embedding-driven diversity sampling to improve few-shot synthetic data generation. Preprint at (2025).
Popović, M. chrF++: words helping character n-grams. In Proc. Second Conference on Machine Translation (eds. Bojar, O. et al.) 612–618 (Association for Computational Linguistics, Copenhagen, Denmark, 2017). https://doi.org/10.18653/v1/W17-4770.
Rei, R., Stewart, C., Farinha, A. C. & Lavie, A. COMET: A Neural Framework for MT Evaluation. In Proc. 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (eds. Webber, B., Cohn, T., He, Y. & Liu, Y.) 2685–2702 (Association for Computational Linguistics, Online, 2020). https://doi.org/10.18653/v1/2020.emnlp-main.213.
Papineni, K., Roukos, S., Ward, T. & Zhu, W.-J. Bleu: a method for automatic evaluation of machine translation. In Proc. 40th Annual Meeting of the Association for Computational Linguistics (eds. Isabelle, P., Charniak, E. & Lin, D.) 311–318 (Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, 2002). https://doi.org/10.3115/1073083.1073135.
Mathur, N., Baldwin, T. & Cohn, T. Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics. In Proc. 58th Annual Meeting of the Association for Computational Linguistics (eds. Jurafsky, D., Chai, J., Schluter, N. & Tetreault, J.) 4984–4997 (Association for Computational Linguistics, Online, 2020). https://doi.org/10.18653/v1/2020.acl-main.448.
Lopez, I. et al. Clinical entity augmented retrieval for clinical information extraction. NPJ Digit. Med. 8, 1–11 (2025).
Google Scholar
Swaminathan, A. et al. Selective prediction for extracting unstructured clinical data. J. Am. Med. Inform. Assoc. 31, 188–197 (2024).
Google Scholar
Bates, B. A. et al. Validity of International Classification of Diseases (ICD)-10 diagnosis codes for identification of acute heart failure hospitalization and heart failure with reduced versus preserved ejection fraction in a national medicare sample. Circ. Cardiovasc. Qual. Outcomes 16, e009078 (2023).
Google Scholar
Gothe, H. et al. Algorithms to identify COPD in health systems with and without access to ICD coding: a systematic review. BMC Health Serv. Res. 19, 737 (2019).
Google Scholar
Shoemaker, S. J., Wolf, M. S. & Brach, C. Development of the Patient Education Materials Assessment Tool (PEMAT): a new measure of understandability and actionability for print and audiovisual patient information. Patient Educ. Couns. 96, 395–403 (2014).
Google Scholar
Johnson, A. E. W. et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci. Data 10, 1 (2023).
Carrell, D. et al. Hiding in plain sight: use of realistic surrogates to reduce exposure of protected health information in clinical text. J. Am. Med. Inform. Assoc. 20, 342–348 (2013).
Google Scholar
National Standards for Culturally and Linguistically Appropriate Services (CLAS) in Health and Health Care. Federal Register (2013).
Li, Z. et al. Language Ranker: A Metric for Quantifying LLM Performance Across High and Low-Resource Languages. In Special Track on AI Alignment 28186–28194 (Association for the Advancement of Artificial Intelligence, 2025). https://doi.org/10.1609/aaai.v39i27.35038.
Xie, Y. et al. Weakly supervised scene text generation for low-resource languages. Expert Syst. Appl. 237, 121622 (2024).
Google Scholar
Khoong, E. C. & Rodriguez, J. A. A research agenda for using machine translation in clinical medicine. J. Gen. Intern. Med. 37, 1275–1277 (2022).
Google Scholar
link