LISRC Dubai

AI & Data Science Glossary

Name: London Institute of AI & Data Science
Address: Oaks Liwa Heights, Units 3507 & 3203, Jumeirah Lakes Towers, Dubai, Dubai, AE
Telephone: +971589583070

Plain-English definitions of the 50 most important AI, Machine Learning, and Data Science terms, written by LISRC's Dubai training team for beginners and professionals.

50 AI, ML, and Data Science terms defined in plain English
Covers LLMs, RAG, agents, prompt engineering, MLOps, and analytics tools
Maintained by LISRC, a KHDA-permitted Dubai training institute

All terms

Artificial Intelligence (AI): Computer systems that perform tasks normally requiring human intelligence, such as understanding language, recognizing images, making predictions, and generating content.
AGI (Artificial General Intelligence): A hypothetical AI capable of understanding and learning any intellectual task a human can, rather than excelling at one narrow task.
AI Agent: An AI system that can plan, take actions, and use tools (like browsers, code, or APIs) to complete multi-step tasks with limited human input.
Agentic AI: AI built around autonomous agents that decide, act, and iterate toward goals. For example, an agent might read data, draft emails, and update systems without step-by-step instructions.
Algorithm: A defined sequence of steps a computer follows to solve a problem or perform a calculation — the building block of all software and machine learning.
API (Application Programming Interface): A standardized way for software systems to talk to each other. AI models like GPT and Claude are commonly accessed through APIs.
Big Data: Datasets too large or fast-moving for traditional tools, requiring distributed technologies such as Hadoop and Spark to store and analyze.
Business Intelligence (BI): The practice of turning company data into dashboards, reports, and insights for decision-making, using tools such as Power BI and Tableau.
Chatbot: A software application that converses with users in natural language. Modern chatbots like ChatGPT are powered by large language models.
ChatGPT: OpenAI's conversational AI assistant based on GPT large language models, widely used for writing, analysis, coding, and research.
Computer Vision: The field of AI that enables machines to interpret images and video — used in face recognition, quality inspection, medical imaging, and self-driving cars.
Context Window: The amount of text (measured in tokens) a language model can consider at once. Larger context windows let models work with longer documents and conversations.
Data Analytics: Examining datasets to find patterns, answer business questions, and support decisions — typically with Excel, SQL, Python, and BI dashboards.
Data Engineering: Building the pipelines and infrastructure that collect, clean, store, and move data so analysts and data scientists can use it reliably.
Data Science: An interdisciplinary field combining statistics, programming, and domain knowledge to extract insight and build predictive models from data.
Data Visualization: Presenting data as charts, graphs, and dashboards so patterns and trends are easy to see and communicate.
Deep Learning: A branch of machine learning using multi-layered neural networks to learn complex patterns — the technology behind modern image, speech, and language AI.
Embeddings: Numerical representations of text, images, or other data that capture meaning, allowing AI systems to measure similarity and power search and recommendations.
Feature Engineering: Selecting and transforming raw data into the input variables (features) that help a machine learning model make better predictions.
Fine-tuning: Further training a pre-trained AI model on specific data so it performs better on a particular task, domain, or style.
Generative AI (GenAI): AI that creates new content — text, images, audio, video, or code — rather than only analyzing existing data. ChatGPT and Midjourney are examples.
GPU (Graphics Processing Unit): A processor designed for parallel computation, originally for graphics, now essential for training and running AI models efficiently.
Hallucination: When an AI model confidently produces information that is false or fabricated. Reducing hallucinations is a key focus of techniques like RAG.
Inference: Running a trained AI model to produce outputs (predictions or generated content), as opposed to the training phase where the model learns.
LLM (Large Language Model): An AI model trained on massive text datasets to understand and generate human language. GPT, Claude, and Gemini are large language models.
Machine Learning (ML): A subset of AI where systems learn patterns from data and improve with experience instead of being explicitly programmed with rules.
MCP (Model Context Protocol): An open standard that lets AI assistants securely connect to external tools and data sources, so they can read files, query systems, and take actions.
MLOps: The practice of deploying, monitoring, and maintaining machine learning models in production — combining ML with DevOps engineering discipline.
Model: The trained artifact produced by machine learning: a mathematical function that maps inputs (like an email) to outputs (like 'spam' or 'not spam').
Natural Language Processing (NLP): The field of AI focused on understanding and generating human language — powering translation, sentiment analysis, chatbots, and summarization.
Neural Network: A machine learning model inspired by the brain, made of layers of connected nodes (neurons) that transform inputs into predictions.
No-code AI: Building AI-powered apps and automations using visual tools (like Lovable, Bolt, or n8n) instead of writing traditional code.
Overfitting: When a model memorizes its training data instead of learning general patterns, causing it to perform poorly on new, unseen data.
Power BI: Microsoft's business intelligence platform for building interactive dashboards and reports, widely used by data analysts in the UAE job market.
Prompt Engineering: Designing effective instructions (prompts) for AI models to get accurate, useful, and consistent outputs — a core modern workplace skill.
Python: The most widely used programming language in AI and data science, known for readable syntax and libraries like Pandas, Scikit-Learn, and TensorFlow.
PyTorch: An open-source deep learning framework originally developed by Meta and now governed by the PyTorch Foundation, popular in research and production for building and training neural networks.
RAG (Retrieval-Augmented Generation): A technique where an AI model retrieves relevant documents from a knowledge base before answering, improving accuracy and reducing hallucinations.
Reinforcement Learning: A training method in which an AI agent learns by trial and error, receiving rewards for good actions and penalties for bad ones. It is used in robotics, games, and model alignment.
Scikit-Learn: A popular Python library providing ready-to-use implementations of classic machine learning algorithms for classification, regression, and clustering.
SQL (Structured Query Language): The standard language for querying and managing data in relational databases — a foundational skill for every data analyst and data scientist.
Supervised Learning: Machine learning where models train on labeled examples (inputs paired with correct answers), such as emails labeled spam or not spam.
Tableau: A leading data visualization platform for building interactive dashboards, commonly paired with Power BI skills in analytics roles.
TensorFlow: Google's open-source framework for building and deploying machine learning and deep learning models at scale.
Token: The unit of text an LLM processes, typically a word, a word fragment, or a punctuation mark. Model pricing, speed, and context windows are measured in tokens.
Training Data: The dataset used to teach a machine learning model. Its quality and coverage largely determine how well the model performs.
Transformer: The neural network architecture introduced in 2017 that underpins modern LLMs, using attention mechanisms to process language in parallel.
Unsupervised Learning: Machine learning that finds structure in unlabeled data, such as clustering customers into segments without predefined categories.
Vector Database: A database optimized for storing and searching embeddings, enabling semantic search and RAG applications. Examples include Pinecone and PostgreSQL with the pgvector extension.
Vibe Coding: Building software by describing what you want to AI coding tools in natural language and iterating on the result, rather than hand-writing every line.
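
The Embeddings, Vector Database, and RAG entries above all rely on one idea: text is turned into vectors, and similar meanings end up as similar vectors. The short Python sketch below may help make that concrete. It is only an illustration under stated assumptions: the three-number vectors, the document names, and the retrieve helper are hypothetical and hand-made, whereas a real system would obtain embeddings from a trained model and store them in a vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings". Real embeddings come from a model and
# usually have hundreds or thousands of dimensions.
DOCUMENTS = {
    "refund policy": [0.9, 0.1, 0.0],
    "course schedule": [0.1, 0.9, 0.1],
    "campus location": [0.0, 0.2, 0.9],
}


def retrieve(query_vector, documents, top_k=1):
    """Return the names of the top_k documents most similar to the query."""
    ranked = sorted(
        documents,
        key=lambda name: cosine_similarity(query_vector, documents[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical embedding of a question such as "When do classes start?"
query = [0.2, 0.8, 0.0]
print(retrieve(query, DOCUMENTS))  # prints ['course schedule']
```

In a RAG application, the text of the retrieved document would then be added to the prompt before the language model answers. A vector database performs the same kind of similarity ranking, but over far more vectors and typically with approximate search for speed.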

https://lisrc.ae/ai-glossary