AI Concepts

This page explains AI concepts that might be helpful to get started. It is not intended to be a comprehensive overview of AI.

Fundamentals

Artificial Intelligence (AI) is a subfield of computer science focused on building computer systems capable of intelligent behavior. AI encompasses techniques and approaches aimed at creating machines that can learn, reason, solve problems, perceive, and interact with their environment in intelligent ways.

Experimentation: The empirical process of improving a system through trial and error, comparing metrics along the way until a desired outcome is reached. In the ML world this is common practice, because machine learning models are probabilistic and require experimentation and tuning to reduce their error rate.
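As a rough illustration of that trial-and-error loop, the sketch below tries a few candidate values for one setting and keeps the value with the lowest error rate. The toy data, the `threshold` setting, and the helper names are hypothetical and chosen only for illustration.

```python
# Toy labeled data: (score, is_positive). The values are made up for illustration.
samples = [(0.1, False), (0.3, False), (0.45, True), (0.6, True), (0.8, True), (0.35, False)]

def error_rate(threshold, data):
    """Fraction of samples misclassified when predicting positive for score >= threshold."""
    wrong = sum((score >= threshold) != label for score, label in data)
    return wrong / len(data)

# Run one "experiment" per candidate setting and record the metric for each.
candidates = [0.2, 0.4, 0.5, 0.7]
results = {t: error_rate(t, samples) for t in candidates}

# Keep the setting with the lowest error rate.
best_threshold = min(results, key=results.get)

for t, err in results.items():
    print(f"threshold={t}: error rate={err:.2f}")
print("best threshold:", best_threshold)
```

Real experiments usually compare several metrics on held-out data, but the shape of the loop (change a setting, measure, compare) is the same.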

Machine Learning (ML) refers to a set of techniques in AI that let systems learn and act without being explicitly programmed. This involves building algorithms that fit their models to sample data to improve their ability to make predictions.

ML Model: A computer algorithm that takes data as input and returns a prediction. For example, the YouTube Recommendations ML model takes your video viewing history as input and returns a list of videos you are most likely to enjoy next.
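To make "data in, prediction out" concrete, here is a minimal sketch of a very small model: a straight line fitted to sample data by ordinary least squares. The data and the names `fit_line` and `predict` are assumptions for illustration, and this is far simpler than a real recommendation model.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept to the sample data by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Return the model's prediction for a new input x."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical sample data: hours watched vs. videos clicked.
hours = [1.0, 2.0, 3.0, 4.0]
clicks = [2.0, 4.0, 6.0, 8.0]

model = fit_line(hours, clicks)   # "learning" from sample data
print(predict(model, 5.0))        # prediction for an unseen input
```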

Terminology

Foundation model: A machine learning model trained on an extremely large corpus of data (for example, a large share of the public internet) to learn general, high-level concepts from that data.

Generative AI (GenAI): A type of AI system capable of generating text, media, or other data modalities in response to prompts. The name reflects these models' ability to generate information based on unstructured inputs.

Transformers: A model architecture that enabled many breakthroughs in language modeling and, eventually, the ability to build large models for any data modality.

Large Language Model (LLM): A foundation model with billions of parameters, trained on large amounts of text data. Given a prompt, LLMs can generate text and perform text-based tasks. GPT-3.5 (ChatGPT) and PaLM 2 (Bard) are LLMs.

GPT: Stands for Generative Pre-trained Transformer. It is a technical implementation detail of today's state-of-the-art foundation models. Due to the popularity of "ChatGPT" and "GPT-3", it has also become a marketing term that signifies an advanced AI model. In the future there will likely be other kinds of foundation models that don't rely on transformers.

Prompt Engineering: Also known as In-Context Prompting. Refers to methods for communicating with a Large Language Model to steer its behavior toward desired outcomes without updating the model weights or retraining the model. Effective methods can vary a lot among models, so prompt engineering requires heavy experimentation and heuristics.

Zero-shot Prompting: Feeding the task text to the model directly and asking for the result, without providing any worked examples first.
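A zero-shot prompt might be assembled as in the sketch below. The sentiment task, the wording of the instruction, and the function name are hypothetical; sending the prompt to a model is left out because it depends on the provider.

```python
def zero_shot_prompt(review):
    """Build a prompt that states the task directly, with no demonstrations."""
    return (
        "Classify the sentiment of this review as positive or negative.\n"
        f"Review: {review}\n"
        "Sentiment:"
    )

prompt = zero_shot_prompt("The blender is loud but works great.")
print(prompt)
```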

Few-shot Prompting: Presents the model with a set of high-quality demonstrations of the target task, each consisting of both an input and the desired output. Because the model first sees good examples, it can better understand the human intention and the criteria for what kinds of answers are wanted. As a result, few-shot prompting often leads to better performance than zero-shot prompting.
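A few-shot prompt places the demonstrations ahead of the real query. The sketch below shows one possible layout; the task, the demonstration texts, and the function name are made up for illustration.

```python
def few_shot_prompt(demonstrations, query):
    """Build a prompt with (input, desired output) demonstrations before the query."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for review, label in demonstrations:
        lines += [f"Review: {review}", f"Sentiment: {label}", ""]
    # The final entry has no label: the model is expected to fill it in.
    lines += [f"Review: {query}", "Sentiment:"]
    return "\n".join(lines)

demos = [
    ("Arrived broken and support never replied.", "negative"),
    ("Does exactly what it promises.", "positive"),
]
prompt = few_shot_prompt(demos, "The blender is loud but works great.")
print(prompt)
```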

Chain-of-thought Prompting: Prompts the model to generate a sequence of short sentences that describe its reasoning step by step, known as reasoning chains or rationales, which eventually lead to the final answer. The benefit of chain-of-thought prompting is more pronounced for complicated reasoning tasks when using large language models.
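One way to elicit a reasoning chain is to show a demonstration whose answer spells out its steps, then end the prompt with a cue to reason step by step. The demonstration text, the cue wording, and the function name below are assumptions for illustration.

```python
# A demonstration whose answer includes the intermediate reasoning (the rationale).
COT_DEMO = (
    "Q: A box holds 3 red pens and 4 blue pens. Two blue pens are removed. "
    "How many pens are left?\n"
    "A: The box starts with 3 + 4 = 7 pens. Removing 2 leaves 7 - 2 = 5. "
    "The answer is 5.\n"
)

def chain_of_thought_prompt(question):
    """Build a prompt that nudges the model to write out its reasoning first."""
    return f"{COT_DEMO}\nQ: {question}\nA: Let's think step by step."

prompt = chain_of_thought_prompt(
    "A shelf has 5 books. Three more are added, then two are lent out. How many remain?"
)
print(prompt)
```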

Parameter Tuning (tweaking): Changing the settings a model runs with, such as the input prompt format, the variability or "temperature" of responses, or the number of tokens to keep in memory or generate. Think of tweaking a model like changing the bass, treble, or volume of a speaker. You're not changing the speaker itself, just some of its settings.
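To see why temperature changes the variability of responses, the sketch below uses one common formulation: the model's raw scores (logits) are divided by the temperature before being turned into probabilities. The logits are made-up numbers and the function name is hypothetical; real systems apply this inside the model's sampling step.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens

sharp = softmax_with_temperature(logits, 0.5)  # low temperature: top token dominates
flat = softmax_with_temperature(logits, 2.0)   # high temperature: choices more even

print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
```

The model weights are identical in both calls; only the setting differs, which is the point of the speaker analogy.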

Context Injection: Sometimes referred to as memory augmentation. It means providing custom data to the model so that it has specific context to reason over, typically by inserting the relevant context into the prompt before asking a question. This is sometimes also called "prompt injection", although that term more often refers to a security attack on a prompt. For example, you can tune a model by giving it a Vitamix blender user manual and then asking it about the blender's voltage and dimensions. As with tweaking, this kind of tuning doesn't result in a new model. It just gives the model additional context to improve its performance in specific domains.
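In its simplest form, context injection is string assembly: the custom data goes into the prompt ahead of the question. The sketch below assumes a hypothetical `inject_context` helper and uses placeholder text in place of a real manual excerpt.

```python
def inject_context(context, question):
    """Place custom context before the question so the model can reason over it."""
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Placeholder standing in for a section of a user manual (not real specifications).
manual_excerpt = "Electrical rating: EXAMPLE-VOLTAGE. Product size: EXAMPLE-DIMENSIONS."

prompt = inject_context(manual_excerpt, "What voltage does the blender use?")
print(prompt)
```

For long documents, only the most relevant passages are usually inserted, since the whole text may not fit in the prompt.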

Fine-tune: With fine-tuning, you give a model new data (such as a CSV file containing prompts and expected responses) and get back a new model that has learned from this training data. Usually, you would start your experiments with tweaking (parameter tuning), see if that's sufficient, and then progress to tuning (context injection) and finally fine-tuning until you're happy with the performance of the model.
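The training data for fine-tuning might look like the CSV in the sketch below, which only loads and checks the prompt and response pairs. The column names and rows are hypothetical, and the fine-tuning step itself is not shown because it depends on the provider or framework.

```python
import csv
import io

# Hypothetical training file contents: one prompt and its expected response per row.
csv_text = """prompt,response
What is the capital of France?,Paris
Translate 'hello' to Spanish.,hola
"""

def load_examples(text):
    """Parse CSV text into a list of (prompt, expected_response) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["prompt"], row["response"]) for row in reader]

examples = load_examples(csv_text)
for prompt, response in examples:
    print(f"{prompt!r} -> {response!r}")
```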

Attention: Self-attention is a type of attention mechanism in which the model makes predictions for one part of a data sample using other parts of the same sample. Simply put, if you want to predict the next word, you look at a certain number of the words that came before it to inform your prediction.
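The sketch below shows the core arithmetic of scaled dot-product self-attention on a toy input: each vector is rebuilt as a weighted mix of all vectors in the same sample. It is a simplification under stated assumptions: the inputs serve directly as queries, keys, and values, whereas real transformers apply learned projections and, for next-word prediction, mask out later positions.

```python
import math

def softmax(xs):
    """Turn a list of scores into weights that sum to 1."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = inputs."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # Score this position against every position in the same sample.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # The output is the weighted average of all value vectors.
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        outputs.append(mixed)
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-dimensional token vectors
attended = self_attention(tokens)
for row in attended:
    print([round(x, 3) for x in row])
```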

Encoding (Positional encoding): Self-attention by itself does not keep track of word order, so a transformer model must be given encoded order information to predict the next word. The encoding for each position can be learned by the model during training or produced separately. These position encodings are represented as vectors and are often referred to as embeddings.
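As one concrete example of providing order information, the sketch below uses the fixed sinusoidal scheme from the original Transformer architecture, in which each position maps to a vector of sine and cosine values that is added to the token vector. This scheme is an assumption chosen for illustration; many models learn their position vectors instead.

```python
import math

def sinusoidal_position(position, dim):
    """Return a dim-sized vector that encodes a position with sine and cosine terms."""
    vector = []
    for i in range(dim):
        # Each pair of dimensions (even, odd) shares one frequency.
        rate = 1.0 / (10000 ** ((i - i % 2) / dim))
        angle = position * rate
        vector.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vector

def add_positions(token_vectors):
    """Add the position vector to each token vector, element by element."""
    dim = len(token_vectors[0])
    return [
        [t + p for t, p in zip(vec, sinusoidal_position(pos, dim))]
        for pos, vec in enumerate(token_vectors)
    ]

# The same token vector at two positions ends up with two different representations.
same_token_twice = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]]
encoded = add_positions(same_token_twice)
print(encoded[0])
print(encoded[1])
```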

Context Window (Memory): The number of tokens, each with its positional encoding (embedding), that a model can access in a single pass to make a prediction.
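A practical consequence is that input longer than the context window has to be shortened before the model sees it. The sketch below shows one simple strategy, keeping only the most recent tokens. The function name is hypothetical, and splitting on whitespace is a stand-in for a real tokenizer.

```python
def fit_to_context_window(tokens, max_tokens):
    """Keep only the most recent max_tokens tokens, dropping the oldest ones."""
    if max_tokens <= 0:
        return []
    return tokens[-max_tokens:]

history = "the quick brown fox jumps over the lazy dog".split()  # 9 toy tokens
window = fit_to_context_window(history, 4)
print(window)
```

Anything dropped this way is invisible to the model, which is why the context window is often described as the model's memory.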