We Use ChatGPT in Medical Practice -- And You Can Too

— These tools can make us better physicians

Bair and Djulbegovic are resident physicians and digital health researchers. Taylor Gonzalez is a medical student and AI researcher.

The spread of ChatGPT to near-ubiquity is extraordinary. While it's been less than a year since its release, more than 9 out of 10 companies currently recruiting are looking for competency with ChatGPT. The immense potential of this tool to transform medicine hasn't escaped notice, with the New England Journal of Medicine recently launching an article series addressing the challenges and opportunities of artificial intelligence (AI) tools like ChatGPT in medicine, and various articles in JAMA discussing issues ranging from its legal implications to diagnostic capabilities. ChatGPT is far from alone -- alternative large language models (LLMs) such as Google's Bard, Perplexity, Claude 2, and many others offer ever-expanding use cases and specialized applications.

Nevertheless, step into day-to-day, on-the-ground clinical work, and ChatGPT is all but absent.

As newly minted physicians in the first months of our internship (Bair and Djulbegovic), we find it hard to ignore the rapidly growing abilities of LLMs like ChatGPT and their tremendous potential to make us better physicians. We urge clinical trainees to experiment with implementing ChatGPT into their daily workflow, and argue that clinical training programs ought to provide trainees with the resources to do so.

Having spent the period between Match Day and the start of residency exploring ChatGPT, we began applying it soon after assuming patient care responsibilities.

From the outset, we found that one of the most straightforward ways to use ChatGPT was to have it create checklists for working up common presentations. Prompting ChatGPT with a brief one-liner encapsulating a patient's relevant past medical history and presenting symptoms yields diagnoses in addition to comprehensive lists of notable history and physical examination findings, blood tests and imaging studies to consider, consults to call, and immediate steps to stabilize the patient or provide symptom relief. The benefits are threefold. First, clearly and sequentially organizing the workup makes charting significantly more efficient. Second, catching steps we may have missed improves the quality of our workup. Third, surfacing an extensive range of possible diagnoses reduces the effects of cognitive biases.
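For readers who want to script this workflow rather than retype the same request each shift, the one-liner-to-checklist pattern can be sketched in a few lines of Python. The section headings and function name below are our own illustration, not a validated clinical template; the resulting string would be sent to the model as a single user message (for example, via the `openai` Python client, omitted here).

```python
# Sketch of the one-liner -> workup-checklist prompt described above.
# Section headings and names are illustrative, not a clinical standard.

WORKUP_SECTIONS = [
    "Most likely diagnoses, ranked",
    "Key history and physical examination findings to check",
    "Blood tests and imaging studies to consider",
    "Consults to call",
    "Immediate steps to stabilize the patient or relieve symptoms",
]

def build_workup_prompt(one_liner: str) -> str:
    """Turn a brief patient one-liner into a structured checklist prompt."""
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(WORKUP_SECTIONS, 1))
    return (
        "You are assisting a resident physician with an initial workup.\n"
        f"Patient: {one_liner}\n\n"
        "Organize your answer under these numbered headings:\n"
        f"{numbered}"
    )

prompt = build_workup_prompt(
    "67M with CHF and COPD presenting with 2 days of worsening dyspnea"
)
print(prompt)
```

As with any LLM output, the checklist that comes back is a starting point for the physician to verify, not an order set.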

With the advent of plugins such as BrowserPilot, which allows ChatGPT to access live websites in real time, and ScholarAI, which allows it to search peer-reviewed articles on PubMed and other research databases, we could retrieve management guidelines pertaining to patient-specific issues. For example, when we asked plugin-enabled ChatGPT about guidelines for venous thromboembolism prevention in cancer patients undergoing surgery, it provided recommendations from both the American Society of Hematology and the American College of Chest Physicians, including direct links to the published guidelines.

Yet another area in which ChatGPT has proven valuable is documentation. Generating templates for common clinical scenarios such as diabetic ketoacidosis and appendicitis standardizes and expedites charting. As another example, when we discharge patients, we typically write summaries of their hospital course describing our diagnostic and treatment interventions. ChatGPT can instantaneously generate explanations of disease entities and treatment plans in language easily understood by a layperson, further aiding patient education.

As powerful as ChatGPT might be at summarizing and reorganizing existing information, it has its limitations. ChatGPT currently lacks the ability to synthesize all the relevant details of a patient with a particularly complex medical history together with the mechanics of institution-specific practices. For instance, ChatGPT may provide an exhaustive list of all possible studies for working up an autoimmune disorder; but almost always, we rely on what is best described as "clinical gestalt" to determine which tests are truly worthwhile to perform -- and which are feasible given resource considerations. The reliability of ChatGPT is also limited to relatively common patient presentations, symptoms, and diseases, though we expect this to change as the corpus of knowledge on disease entities continues to grow. On a more intuitive level, ChatGPT is also incapable of integrating the psychosocial components of medical care. While the algorithm can offer textbook treatments for a certain cancer, for instance, it is up to us, the human physicians, to tease out the direction of care most consistent with a patient's beliefs about what quality of life means to him or her. These limitations mean ChatGPT must be regarded as a tool for the preliminary stages of medical decision-making, and never as its final adjudicator.

What does this mean for future doctors? First, we believe that prompt engineering is an indispensable skill when it comes to using LLMs. Prompt engineering involves designing questions, statements, or instructions to guide AI models in generating specific and valuable responses. We also believe that the effective future doctor will need to know how to quickly find reliable clinical information and craft coherent treatment plans tailored to the values and preferences of a patient. It is critical to remember that whether or not we embrace tools like ChatGPT, patients are already using them. We must understand where and how patients are utilizing health information and help them make sense of a potentially overwhelming amount of data.
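To make "prompt engineering" concrete: one widely used convention is to spell out a role, clinical context, task, and desired output format rather than asking a bare question. The sketch below is our own illustration of that pattern; the component names and the example scenario are hypothetical, not an official technique or guideline.

```python
# Illustrative role/context/task/format prompt pattern, a common
# prompt-engineering convention (component names are our own).

def engineer_prompt(role: str, context: str, task: str, out_format: str) -> str:
    """Assemble a structured prompt from four labeled components."""
    return (
        f"Role: {role}\n"
        f"Context: {context}\n"
        f"Task: {task}\n"
        f"Format: {out_format}"
    )

structured_prompt = engineer_prompt(
    role="an internal medicine attending reviewing a resident's plan",
    context="72F with stage 3 CKD admitted for community-acquired pneumonia",
    task="suggest empiric antibiotic options, flagging renal dose adjustments to verify",
    out_format="a short bulleted list, each item naming the guideline to double-check",
)
print(structured_prompt)
```

In our experience, this kind of explicit structure tends to produce more focused, checkable answers than an unstructured question, though the output still requires physician verification.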

Finally, with information discovery and retrieval becoming increasingly simple, the effective future doctor ought to focus on cultivating expertise in navigating challenging patient encounters, leading goals of care conversations particularly around serious illnesses and at the end of life, breaking bad news, and coordinating interdisciplinary care for patients. In other words, ChatGPT and other LLMs present an opportunity for us to further refine the aspects of medicine that make it a fundamentally humanistic enterprise.

Although these are still the early days of LLMs, our fellow trainees and current clinicians should begin experimenting with how these tools can facilitate their daily workflow. Prompt engineering is a learnable skill -- primarily through self-experimentation for now, given the dearth of organized frameworks for its instruction. But beyond that, we encourage open sharing of experiences among trainees and clinicians, so that new uses of LLMs are continually developed. It is our hope that graduate medical education programs, with their already built-in hours of didactic sessions and noon conferences, will create modules for the effective and ethical use of AI and ChatGPT in clinical work. The idea is not as far-fetched as it may seem. DeepLearning.ai, for example, already offers an online course on ChatGPT prompt engineering geared toward software developers; Vanderbilt University offers a course on prompt engineering for general and educational purposes. On a more systemic level, hospitals may also consider implementing ChatGPT in various clinical contexts and inviting resident physicians to participate in its iterative refinement. Hospitals from Stanford Health Care to UNC Health have already begun doing this. More recently, Microsoft announced an LLM-powered clinical note generator that integrates with Epic, the most widely used electronic health record system in the U.S.

LLMs have profound potential to augment our diagnostic processes, streamline clinical documentation, and open access to a wealth of medical knowledge. Yet, far from replacing the physician, we believe these tools free up valuable time and cognitive resources for us to focus on the most human aspects of care. The future of medicine with AI need not be a sterile, mechanized one; it can be a future where technology and the human touch coalesce, where the promise of technology is realized not in its ability to mimic human intelligence, but in its capacity to amplify our most unique capabilities. It is up to us to explore and learn these tools, ensuring that the heart of medicine beats stronger in a world increasingly driven by data and algorithms.

Henry Bair, MD, MBA, and Mak Djulbegovic, MD, MSc, are resident physicians at Wills Eye Hospital/Jefferson Health in Philadelphia. Bair is a recent graduate of Stanford University School of Medicine, where he directed several courses on digital health. Djulbegovic is a recent graduate of the University of Miami Miller School of Medicine, where his research focused on medical applications of LLMs. David Taylor Gonzalez is a medical student at the University of Miami Miller School of Medicine, where he is a researcher in the Artificial Intelligence and Computer Augmented Vision Laboratory at the Bascom Palmer Eye Institute.