Generative artificial intelligence (GenAI) has been a hard topic to avoid in the media for more than a year. But what do all of the terms mean, and what are the areas of concern with GenAI tools?
This column aims to provide a baseline explanation of the terminology and concepts that frequently appear in the media.
GenAI, GENERALLY
First off, what makes GenAI tools different? The primary leap forward for ChatGPT, Google’s Gemini, and other GenAI tools is the “generative” part of the description — basically, these tools are able to create (i.e., generate) new content in a way that artificial intelligence (AI) tools in the past were not able to.
The basis for the recent jumps in AI abilities is AI models that have been trained on vast amounts of information. In this usage, “model” means a computer program trained on a set of data to make decisions based on patterns. These models use multiple algorithms to complete tasks or respond to prompts.
And, yes, AI has been in use for years and has been part of the recent steps forward in the usefulness of platforms across the general internet, like Google Search, and in legal platforms like LexisNexis and Westlaw. The AI that has been used in these products is known as extractive AI and is focused on locating, identifying, and pulling out (i.e., “extracting”) specific data or information from a database. Extractive AI has been used by Google to improve search results and in legal databases like Bloomberg Law, Lexis, and Westlaw for brief analysis tools, predictive search results, headnote creation, and related document lists.
MACHINE LEARNING
Machine learning is a field within computer science where computers are fed incredible amounts of data and use algorithms to repeat specific tasks to become increasingly accurate at completing those tasks. This improvement in task completion is the “learning” — imitating the way that humans may learn — as opposed to traditional computer programming, where every possible action needs to be coded by a programmer. Examples of machine learning in everyday interactions include social media feeds, Amazon product result lists, and suggested shows and movies to watch on Netflix.
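To make the contrast with traditional programming concrete, here is a minimal Python sketch of a model learning from labeled examples rather than hand-written rules. The library (scikit-learn), the features, and the data are invented for illustration and do not reflect any particular product.

```python
# A toy illustration of machine learning: instead of hand-coding rules,
# we give the computer labeled examples and let it find the pattern.
# (Minimal sketch using scikit-learn; the data here is invented.)
from sklearn.linear_model import LogisticRegression

# Each example: [number of suspicious words, number of links] in an email
X = [[0, 0], [1, 0], [6, 4], [8, 7], [0, 1], [7, 5]]
# Labels: 0 = legitimate email, 1 = spam
y = [0, 0, 1, 1, 0, 1]

model = LogisticRegression()
model.fit(X, y)  # the "learning" step: the model adjusts itself to fit the examples

# The trained model can now make predictions about emails it has never seen
print(model.predict([[5, 3], [0, 0]]))  # e.g., [1 0]: likely spam, likely legitimate
```

The point of the sketch is that no programmer wrote a rule such as "more than five suspicious words means spam"; the pattern is inferred from the examples themselves.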
NATURAL LANGUAGE PROCESSING
Natural language processing is machine learning that allows computers to communicate better with humans via human language. Natural language processing is the basis for digital assistants like Apple’s Siri and Amazon’s Alexa, and it is what allows chatbots like ChatGPT to sound knowledgeable and confident, imitating human thought. Natural language processing allows chatbots to understand human speech in the form of prompt requests and to use statistical models to predict and generate appropriate responses.
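The sketch below is a deliberately simple illustration of the underlying idea: turning free-form human text into something a program can act on. Real digital assistants rely on trained statistical models rather than the hypothetical keyword lists used here.

```python
# A tiny sketch of the idea behind natural language processing:
# mapping free-form human text to an action the program understands.
# (Illustrative only; real systems use statistical models, not keyword lists.)

INTENTS = {
    "weather": ["weather", "rain", "sunny", "forecast"],
    "timer": ["timer", "remind", "alarm"],
}

def detect_intent(prompt: str) -> str:
    words = prompt.lower().split()
    # Score each intent by how many of its keywords appear in the prompt
    scores = {intent: sum(word in words for word in keywords)
              for intent, keywords in INTENTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

print(detect_intent("What is the weather forecast for tomorrow?"))  # -> "weather"
print(detect_intent("Set a timer for ten minutes"))                 # -> "timer"
```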
LARGE LANGUAGE MODELS
A large language model (LLM) is a powerful model trained on billions or trillions (or more) of pieces of data so that it is capable of understanding context and making connections to complete a variety of tasks, including generating natural language when asked a question. LLMs are, by definition, gigantic models designed to interact across a spectrum of tasks and subjects, e.g., ChatGPT. LLMs are also designed to understand text and generate responses as a human would; they can complete tasks such as summarizing documents and videos, writing drafts of documents, making recommendations based on criteria about things such as vacations or recipes, writing computer code, or translating between written languages.
When being trained for language accuracy and fluency, LLMs ingest a tremendous amount of written text. In working through that text, the LLM learns grammar, the relationships between words, and the logic, semantics, and concepts expressed in language. The LLM uses this knowledge to probabilistically predict which words and concepts should come next and to generate language responses with which humans can interact.
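The toy Python sketch below illustrates that predictive idea on a very small scale: it counts which word tends to follow which in a short sample of text and then predicts a likely next word. The sample text and the simple word-pair approach are assumptions for illustration; real LLMs are trained on vastly larger data sets with far more sophisticated models.

```python
# A toy "language model": it counts which word tends to follow which in some
# training text, then predicts a likely next word. Real LLMs do this with
# billions of parameters and much richer context, but the core idea
# (probabilistically predicting what comes next) is the same.
# (Minimal sketch; the training text is invented for illustration.)
from collections import Counter, defaultdict

training_text = (
    "the court granted the motion to dismiss "
    "the court denied the motion for summary judgment"
)

# Count how often each word follows each other word (a "bigram" model)
follows = defaultdict(Counter)
words = training_text.split()
for current_word, next_word in zip(words, words[1:]):
    follows[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Return the word most often seen after `word` in the training text."""
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))     # -> "court" (seen after "the" in the sample)
print(predict_next("motion"))  # -> "to" or "for"
```

Notice that the model has no idea what a court or a motion is; it only knows which words have tended to follow which. That limitation is central to the problems discussed next.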
BIAS AND HALLUCINATION
Bias and hallucination are two problems that concern users of AI systems. Bias in AI systems refers to the systems’ reflection and perpetuation of biases present in human society. Biased results can stem from bias in the data used to train the AI model; e.g., the training data may overrepresent or underrepresent categories in relation to the larger population that the data is meant to represent. Biased results can also be due to biases coded into the algorithm; e.g., factors may be weighted in an unfair manner that produces flawed results. Examples of AI systems that have shown bias include recruiting tools1 and predictive policing systems.2
Hallucination is when an AI program creates false information in response to a query or a task request. Hallucinations have been hotly discussed in the legal field ever since 2023, when an attorney suing Avianca Airlines on behalf of a client relied on a citation provided by ChatGPT to a case that did not exist; the story landed on the front page of the New York Times.3
Why do hallucinations happen? Simply put, AI systems don’t understand and use language in the same way as humans; they generate answers and text based on learned patterns and act as prediction engines supplying a likely next word in a sentence until that sentence is complete. To the AI, the citation could have existed, but it did not.
Wholesale creation of false information via hallucination is not the only concern when considering AI responses; sometimes AI chatbots are not hallucinating, but just plain wrong. A recent paper4 by Stanford University and Yale University researchers puts this into perspective. The researchers tested Lexis and Westlaw AI tools and found that, while the tools did hallucinate, their responses also showed varying levels of inaccuracy and incompleteness: citing sources that did not support the claims made, providing irrelevant responses, or including incorrect information.
RETRIEVAL-AUGMENTED GENERATION
Retrieval-augmented generation (RAG) is considered by many to be a way to reduce hallucinations in AI systems by adding a retrieval step to the generation process. Specifically, in RAG, the AI retrieves relevant documents or information and then uses those documents in tandem with the query to generate a response. The response should then have a lower hallucination rate because the AI draws on that smaller universe of relevant material when creating its response; a selection of the retrieved documents is also provided as citations alongside the response.
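The sketch below illustrates the retrieve-then-generate idea in simplified form. The document collection, the crude keyword-overlap retrieval, and the prompt format are assumptions for illustration; production systems typically retrieve with vector embeddings, and the final call to a generative model is omitted here.

```python
# A simplified sketch of retrieval-augmented generation (RAG): first retrieve
# the documents most relevant to the question, then hand them to the
# generative model along with the question so the answer is grounded in them.
# (Illustrative only; the "cases" below are placeholders, not real citations.)

DOCUMENTS = {
    "Case A": "Discusses the standard for dismissal under Rule 12(b)(6).",
    "Case B": "Addresses personal jurisdiction over out-of-state defendants.",
    "Case C": "Explains the pleading standard for fraud claims.",
}

def retrieve(question: str, k: int = 2) -> list[str]:
    """Rank documents by keyword overlap with the question and keep the top k."""
    q_words = set(question.lower().split())
    ranked = sorted(
        DOCUMENTS,
        key=lambda name: len(q_words & set(DOCUMENTS[name].lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question: str) -> str:
    """Combine the retrieved documents with the question for the generative step."""
    sources = retrieve(question)
    context = "\n".join(f"{name}: {DOCUMENTS[name]}" for name in sources)
    return (
        "Answer using only the sources below, and cite them.\n"
        f"Sources:\n{context}\n"
        f"Question: {question}"
    )

print(build_prompt("What is the pleading standard for fraud claims?"))
```

The key design point is that the generative step sees only the question plus the retrieved material, which is why RAG tends to reduce, but not eliminate, hallucination: the answer is only as good as what the retrieval step brings back.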
The paper cited previously discusses the shortcomings of RAG in Lexis and Westlaw.5 Problems with RAG are similar to what Lexis and Westlaw users have always had to contend with when searching those databases: the words used return documents or cases that seem relevant but are not contextually on point.