
A Guide to AI for Gonzaga Students

About This Guide

AI! It's everywhere, and seemingly everyone is using it: sometimes getting rewarded, sometimes getting busted.

This guide will explain what generative AI is, how it works under the hood, what it does well, and what it's not so good at. The goal is to demystify this powerful technology, empowering you with the knowledge needed to make informed, ethical, and reflective decisions about when and how to use it best.

What is Generative AI?

Generative artificial intelligence is AI that is designed to create (i.e., generate) new data. This differs from the AI models common before generative AI, which were designed to make predictions about existing data: for example, to state how likely a certain image is to show a cow or a dog, or how likely a borrower is to default on a loan.

Generative AI creates new information, usually in the form of text (Large Language Models such as ChatGPT) or images (image-generation models such as DALL·E).

What is a Large Language Model?

The GPT in ChatGPT stands for Generative Pre-trained Transformer. This bit of technobabble actually describes how a Large Language Model works.

Large Language Models are generative; that is, they create data. The data they create is human language. In other words, LLMs are human language simulators. They represent words as numbers, then do calculations on those numbers in order to predict which words are most likely to occur next.

Put simply, Large Language Models are very complex and very capable versions of the autocomplete feature that has been in our phones for years. Large Language Models have already been trained (pre-trained) on massive amounts of human language, allowing them to trace the patterns in human language, i.e., the probabilistic likelihood of words occurring near each other. They are then able to predictively duplicate or continue those patterns.
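If you'd like to see the autocomplete idea in action, here is a minimal Python sketch that "pre-trains" on a tiny made-up corpus by counting which word follows which, then predicts the next word. Real LLMs learn these patterns with neural networks trained on trillions of words, not a lookup table; the corpus and the predict_next helper here are invented purely for illustration.

```python
from collections import Counter, defaultdict

# A toy "pre-training" corpus. Real models train on trillions of words;
# this is just enough to show the idea.
corpus = ("the cat sat on the mat . "
          "the dog sat on the rug . "
          "the cat chased the dog .").split()

# Count how often each word follows each other word (the "patterns").
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def predict_next(word):
    """Return the most frequent next word and its probability."""
    counts = next_word_counts[word]
    best, freq = counts.most_common(1)[0]
    return best, freq / sum(counts.values())

print(predict_next("sat"))  # ('on', 1.0): "on" always follows "sat" here
print(predict_next("the"))  # ('cat', 0.333...): one likely continuation
```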

What sets LLMs apart from those old autocomplete systems is their magical-seeming ability to work with large amounts of language in context, which is made possible by a neural network architecture called a transformer. Transformers, first described in a 2017 Google paper, are how an LLM is able to work with words in relation to their context in order to more accurately predict the next word.

To use the example from the Google paper in which transformers were first introduced: if you see the sentence, "I arrived at the bank after crossing the river," what type of "bank" do you think of? If you said "a river bank," you likely noticed the proximity of the word "river" to the word "bank" in that sentence. "River" told you something about "bank." Before transformers, this kind of inference was very hard for computers to do. They had no way of knowing that "bank" was a river bank and not a financial institution or a piggy bank. They wouldn't even know whether it was a noun or a verb, like the motion a plane makes when it turns. But transformers allow LLMs to work with words in their context. A transformer step in an LLM sees the proximity of the word "bank" to the word "river" and creates a new mathematical representation of "bank" that bakes in that relationship. It then passes that new representation forward, ready for the next step in the prediction process.
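Here is a small NumPy sketch of that idea: the self-attention calculation at the heart of a transformer. The four-dimensional word vectors are made up for this example (real models learn vectors with thousands of dimensions), and the sketch leaves out the learned query, key, and value matrices a real transformer layer would apply first.

```python
import numpy as np

# Hand-made 4-dimensional word vectors, invented for illustration only.
# Pretend dimension 0 means "water-related" and dimension 1 means "money-related".
embeddings = {
    "I":        np.array([0.0, 0.0, 0.1, 1.0]),
    "arrived":  np.array([0.0, 0.1, 0.2, 0.8]),
    "at":       np.array([0.0, 0.0, 0.1, 0.4]),
    "the":      np.array([0.0, 0.0, 0.0, 0.3]),
    "bank":     np.array([1.0, 1.0, 0.0, 0.0]),  # ambiguous: equal parts water and money
    "after":    np.array([0.0, 0.0, 0.3, 0.5]),
    "crossing": np.array([0.3, 0.0, 0.9, 0.2]),
    "river":    np.array([1.2, 0.0, 0.4, 0.0]),  # strongly water-related
}

words = "I arrived at the bank after crossing the river".split()
X = np.stack([embeddings[w] for w in words])  # one row per word

def self_attention(X):
    """One simplified attention step (real layers add learned weight matrices)."""
    scores = X @ X.T / np.sqrt(X.shape[1])         # how strongly each word relates to each other
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax: each row becomes probabilities
    return weights @ X                             # each word becomes a weighted blend of all words

new_X = self_attention(X)
bank = words.index("bank")
print("before:", X[bank])                   # water 1.0, money 1.0 -- ambiguous
print("after: ", np.round(new_X[bank], 2))  # water now outweighs money: context resolved it
```

Because "bank" and "river" point in overlapping directions, "river" gets more weight in the blend than unrelated words, and the new "bank" vector leans toward the river meaning, just as the paragraph above describes.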

[Diagram: the flow of information through multiple transformer layers]

The above illustration of a similar sentence, from understandingai.org, traces the flow of information through multiple layers of transformers, from bottom to top. First, each word in the sentence is converted into a numerical representation (called a token). Given where "cash" sits in the pattern of the sentence, the transformer infers that it is a verb. The transformer then creates a new token for "cash" that locates it as the verb "cash" in vector space, and passes it forward to the next layer. That next layer now has a new version of "cash" that it can use to infer that "bank" is a financial institution.
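You can try the first step of this process, turning text into tokens, using tiktoken, OpenAI's open-source tokenizer library. A tokenizer produces integer IDs; the model then maps each ID to a vector like the ones sketched above. The example sentence here is our own.

```python
# pip install tiktoken  (OpenAI's open-source tokenizer)
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # the encoding used by GPT-4-era models

ids = enc.encode("I want my bank to cash the check.")
print(ids)                             # a list of integer token IDs
print([enc.decode([i]) for i in ids])  # the text chunk each ID stands for
```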

The most complex LLMs have hundreds of these transformer layers and vast context windows (the number of tokens they can work with at once), allowing them to do these kinds of contextual pattern analyses on very large blocks of text. This gives a magical-seeming appearance of contextual understanding to the patterns of language that they generate.
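Because a context window is measured in tokens, you can count for yourself how much of it a piece of writing would use. This sketch uses tiktoken again; the 8,192-token window is an assumed figure for illustration, since real limits vary widely by model.

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")

CONTEXT_WINDOW = 8192  # assumed for illustration; real limits vary by model

essay = "All work and no play makes Jack a dull boy. " * 500
n_tokens = len(enc.encode(essay))
print(n_tokens, "tokens:",
      "fits in the window" if n_tokens <= CONTEXT_WINDOW else "too long for the window")
```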

But remember: they are not doing language the way we are. They are doing math about patterns, to predict the next part of a sequence.

Video Overview
