LISRC Dubai

AI & Data Science Glossary

Plain-English definitions of the 50 most important AI, Machine Learning, and Data Science terms, written by LISRC's Dubai training team for beginners and professionals.

All terms

Artificial Intelligence (AI)
Computer systems that perform tasks normally requiring human intelligence, such as understanding language, recognizing images, making predictions, and generating content.
AGI (Artificial General Intelligence)
A hypothetical AI capable of understanding and learning any intellectual task a human can, rather than excelling at one narrow task.
AI Agent
An AI system that can plan, take actions, and use tools (like browsers, code, or APIs) to complete multi-step tasks with limited human input.
Agentic AI
AI built around autonomous agents that decide, act, and iterate toward goals — for example reading data, drafting emails, and updating systems without step-by-step instructions.
Algorithm
A defined sequence of steps a computer follows to solve a problem or perform a calculation — the building block of all software and machine learning.
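As a small illustration of "a defined sequence of steps", here is Euclid's algorithm for the greatest common divisor, sketched in Python. The function name `gcd` is chosen for this example only.

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm: repeat one fixed step until the remainder is zero."""
    while b != 0:
        # Replace (a, b) with (b, a mod b); the common divisors stay the same.
        a, b = b, a % b
    return a


print(gcd(48, 18))  # 6
```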
API (Application Programming Interface)
A standardized way for software systems to talk to each other. AI models like GPT and Claude are commonly accessed through APIs.
Big Data
Datasets too large or fast-moving for traditional tools, requiring distributed technologies such as Hadoop and Spark to store and analyze.
Business Intelligence (BI)
The practice of turning company data into dashboards, reports, and insights for decision-making, using tools such as Power BI and Tableau.
Chatbot
A software application that converses with users in natural language. Modern chatbots like ChatGPT are powered by large language models.
ChatGPT
OpenAI's conversational AI assistant based on GPT large language models, widely used for writing, analysis, coding, and research.
Computer Vision
The field of AI that enables machines to interpret images and video — used in face recognition, quality inspection, medical imaging, and self-driving cars.
Context Window
The amount of text (measured in tokens) a language model can consider at once. Larger context windows let models work with longer documents and conversations.
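One common consequence of a fixed context window is that older conversation turns have to be dropped. The sketch below shows one possible approach, assuming a crude token count of one token per whitespace-separated word; real models use their own tokenizers, and `fit_to_window` is a hypothetical helper name.

```python
def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the newest messages whose combined size fits the window."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest so recent context is preserved first.
    for message in reversed(messages):
        cost = len(message.split())  # crude stand-in for a real token count
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["hello there", "how can I help you today", "summarize this report please"]
print(fit_to_window(history, max_tokens=10))
```

With a budget of 10, the two newest messages fit and the oldest one is dropped.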
Data Analytics
Examining datasets to find patterns, answer business questions, and support decisions — typically with Excel, SQL, Python, and BI dashboards.
Data Engineering
Building the pipelines and infrastructure that collect, clean, store, and move data so analysts and data scientists can use it reliably.
Data Science
An interdisciplinary field combining statistics, programming, and domain knowledge to extract insight and build predictive models from data.
Data Visualization
Presenting data as charts, graphs, and dashboards so patterns and trends are easy to see and communicate.
Deep Learning
A branch of machine learning using multi-layered neural networks to learn complex patterns — the technology behind modern image, speech, and language AI.
Embeddings
Numerical representations of text, images, or other data that capture meaning, allowing AI systems to measure similarity and power search and recommendations.
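To make "measure similarity" concrete, the sketch below compares toy embeddings with cosine similarity, one widely used measure. The three-dimensional vectors are invented for illustration; real embeddings come from a trained model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings, invented for this sketch.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.75, 0.2],
    "invoice": [0.1, 0.2, 0.95],
}

print(cosine_similarity(embeddings["cat"], embeddings["kitten"]))   # close to 1
print(cosine_similarity(embeddings["cat"], embeddings["invoice"]))  # much lower
```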
Feature Engineering
Selecting and transforming raw data into the input variables (features) that help a machine learning model make better predictions.
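A minimal sketch of the idea, assuming a hypothetical order record: the raw date and item list are not directly useful to a model, so they are turned into numeric features. The field names and the choice of features are assumptions for this example.

```python
from datetime import date


def make_features(order: dict) -> dict:
    """Turn one raw order record into numeric model inputs."""
    placed = date.fromisoformat(order["placed_on"])
    return {
        "is_weekend": int(placed.weekday() >= 5),  # Monday=0 ... Sunday=6
        "items": len(order["items"]),
        "avg_item_price": order["total"] / len(order["items"]),
    }


# Hypothetical raw record, invented for this sketch.
raw = {"placed_on": "2024-03-09", "items": ["pen", "notebook"], "total": 30.0}
print(make_features(raw))
```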
Fine-tuning
Further training a pre-trained AI model on specific data so it performs better on a particular task, domain, or style.
Generative AI (GenAI)
AI that creates new content — text, images, audio, video, or code — rather than only analyzing existing data. ChatGPT and Midjourney are examples.
GPU (Graphics Processing Unit)
A processor designed for parallel computation, originally for graphics, now essential for training and running AI models efficiently.
Hallucination
When an AI model confidently produces information that is false or fabricated. Reducing hallucinations is a key focus of techniques like RAG.
Inference
Running a trained AI model to produce outputs (predictions or generated content), as opposed to the training phase where the model learns.
LLM (Large Language Model)
An AI model trained on massive text datasets to understand and generate human language. GPT, Claude, and Gemini are large language models.
Machine Learning (ML)
A subset of AI where systems learn patterns from data and improve with experience instead of being explicitly programmed with rules.
MCP (Model Context Protocol)
An open standard that lets AI assistants securely connect to external tools and data sources, so they can read files, query systems, and take actions.
MLOps
The practice of deploying, monitoring, and maintaining machine learning models in production — combining ML with DevOps engineering discipline.
Model
The trained artifact produced by machine learning: a mathematical function that maps inputs (like an email) to outputs (like 'spam' or 'not spam').
Natural Language Processing (NLP)
The field of AI focused on understanding and generating human language — powering translation, sentiment analysis, chatbots, and summarization.
Neural Network
A machine learning model inspired by the brain, made of layers of connected nodes (neurons) that transform inputs into predictions.
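The sketch below runs one input through a tiny two-layer network to show how layers of nodes transform inputs into a prediction. The weights are hand-picked for illustration; a real network learns them during training, and the sigmoid activation is just one common choice.

```python
import math


def sigmoid(x: float) -> float:
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs: list[float], weights: list[list[float]], biases: list[float]) -> list[float]:
    """One layer: each node takes a weighted sum of its inputs, then applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


# Hidden layer with two nodes, then an output layer with one node.
hidden = layer([1.0, 0.0], weights=[[2.0, -1.0], [-1.5, 0.5]], biases=[0.0, 0.5])
output = layer(hidden, weights=[[1.0, -1.0]], biases=[0.0])
print(output)  # a single prediction between 0 and 1
```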
No-code AI
Building AI-powered apps and automations using visual tools (like Lovable, Bolt, or n8n) instead of writing traditional code.
Overfitting
When a model memorizes its training data instead of learning general patterns, causing it to perform poorly on new, unseen data.
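A toy comparison can make this visible. In the sketch below, one "model" memorizes its training pairs and another applies a single general rule; the task, the data, and both models are invented for illustration.

```python
def accuracy(predict, examples):
    """Share of examples where the prediction matches the label."""
    return sum(predict(x) == y for x, y in examples) / len(examples)


# Toy task: a number is "high" when it is 50 or more.
train = [(10, "low"), (30, "low"), (60, "high"), (90, "high")]
test = [(20, "low"), (40, "low"), (70, "high"), (80, "high")]

# Overfit "model": memorizes every training pair and guesses "low" otherwise.
memory = dict(train)


def memorizer(x):
    return memory.get(x, "low")


# General "model": a single rule that captures the underlying pattern.
def threshold_rule(x):
    return "high" if x >= 50 else "low"


print(accuracy(memorizer, train), accuracy(memorizer, test))            # 1.0 0.5
print(accuracy(threshold_rule, train), accuracy(threshold_rule, test))  # 1.0 1.0
```

Both models are perfect on the training data, but only the general rule holds up on examples it has not seen.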
Power BI
Microsoft's business intelligence platform for building interactive dashboards and reports, widely used by data analysts in the UAE job market.
Prompt Engineering
Designing effective instructions (prompts) for AI models to get accurate, useful, and consistent outputs — a core modern workplace skill.
Python
The most widely used programming language in AI and data science, known for readable syntax and libraries like Pandas, Scikit-Learn, and TensorFlow.
PyTorch
An open-source deep learning framework originally developed by Meta and now governed by the PyTorch Foundation, popular in research and production for building and training neural networks.
RAG (Retrieval-Augmented Generation)
A technique where an AI model retrieves relevant documents from a knowledge base before answering, improving accuracy and reducing hallucinations.
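The retrieve-then-answer flow can be sketched as follows. This is a simplified stand-in: it ranks documents by shared words, whereas production systems usually rank by embedding similarity, and the final call to a language model is left out. The knowledge base and function names are hypothetical.

```python
def retrieve(question: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the question."""
    question_words = set(question.lower().split())

    def overlap(doc: str) -> int:
        return len(question_words & set(doc.lower().split()))

    return sorted(documents, key=overlap, reverse=True)[:top_k]


def build_prompt(question: str, documents: list[str]) -> str:
    """Place the retrieved text ahead of the question for the model to use."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical knowledge base, invented for this sketch.
knowledge_base = [
    "Refunds are processed within 14 days of the return request.",
    "Our office is open Monday to Friday from 9am to 6pm.",
]
print(build_prompt("How many days do refunds take?", knowledge_base))
```

The prompt that would be sent to the model now contains the refund policy, so the answer can be grounded in it rather than guessed.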
Reinforcement Learning
A training method where an AI agent learns by trial and error, receiving rewards for good actions. It is used in robotics, games, and model alignment.
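Trial and error with rewards can be shown with a two-action toy problem. The sketch below uses an epsilon-greedy strategy, one simple approach among many; the environment, reward values, and function names are invented for illustration.

```python
import random


def reward(action: int) -> float:
    """Toy environment: action 1 always pays 1.0, action 0 pays nothing."""
    return 1.0 if action == 1 else 0.0


def train_agent(steps: int = 100, epsilon: float = 0.1, seed: int = 0) -> list[float]:
    """Epsilon-greedy learning of the average reward of each action."""
    rng = random.Random(seed)
    totals = [0.0, 0.0]
    counts = [0, 0]
    for step in range(steps):
        if step < 2:
            action = step  # try each action once before exploiting
        elif rng.random() < epsilon:
            action = rng.randrange(2)  # explore: pick a random action
        else:
            # exploit: pick the action with the best average reward so far
            action = max(range(2), key=lambda a: totals[a] / counts[a])
        totals[action] += reward(action)
        counts[action] += 1
    return [totals[a] / counts[a] for a in range(2)]


values = train_agent()
print(values)  # [0.0, 1.0]: the agent has learned that action 1 is better
```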
Scikit-Learn
A popular Python library providing ready-to-use implementations of classic machine learning algorithms for classification, regression, and clustering.
SQL (Structured Query Language)
The standard language for querying and managing data in relational databases — a foundational skill for every data analyst and data scientist.
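A short example of a typical analyst query, run here through Python's built-in `sqlite3` module so it is self-contained. The `sales` table and its rows are hypothetical.

```python
import sqlite3

# In-memory database with a small table, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Dubai", 120.0), ("Abu Dhabi", 80.0), ("Dubai", 50.0)],
)

# Total sales per region, largest first.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY total DESC"
).fetchall()
print(rows)  # [('Dubai', 170.0), ('Abu Dhabi', 80.0)]
conn.close()
```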
Supervised Learning
Machine learning where models train on labeled examples (inputs paired with correct answers), such as emails labeled spam or not spam.
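The spam example can be sketched with a deliberately simple learner that counts which words appear under each label. This is an illustration of learning from labeled examples, not a production spam filter; the emails and function names are invented.

```python
from collections import Counter


def train(examples: list[tuple[str, str]]) -> dict[str, Counter]:
    """Count how often each word appears under each label."""
    counts: dict[str, Counter] = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts


def predict(counts: dict[str, Counter], text: str) -> str:
    """Pick the label whose training words overlap most with the new text."""
    words = text.lower().split()
    return max(counts, key=lambda label: sum(counts[label][w] for w in words))


# Hypothetical labeled emails: each input is paired with its correct answer.
labeled = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("project report attached", "not spam"),
]
model = train(labeled)
print(predict(model, "claim your free prize"))   # spam
print(predict(model, "monday project meeting"))  # not spam
```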
Tableau
A leading data visualization platform for building interactive dashboards, commonly paired with Power BI skills in analytics roles.
TensorFlow
Google's open-source framework for building and deploying machine learning and deep learning models at scale.
Token
The unit of text an LLM processes, typically a whole word, a word fragment, or a punctuation mark. Model pricing, speed, and context windows are measured in tokens.
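To show how a word can break into fragments, the sketch below splits text by greedy longest match against a tiny vocabulary. Both the vocabulary and the matching rule are assumptions for illustration; real LLM tokenizers learn their vocabularies from data and use more elaborate schemes.

```python
def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match split of text into pieces found in `vocab`."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest remaining piece first, then shrink until one matches.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character becomes its own token
            i += 1
    return tokens


# Hypothetical vocabulary, invented for this sketch.
vocab = {"un", "believ", "able", "token", "s"}
print(tokenize("unbelievable", vocab))  # ['un', 'believ', 'able']
print(tokenize("tokens", vocab))        # ['token', 's']
```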
Training Data
The dataset used to teach a machine learning model. Its quality and coverage largely determine how well the model performs.
Transformer
The neural network architecture introduced in 2017 that underpins modern LLMs, using attention mechanisms to relate every token in a sequence to every other token and process them in parallel.
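The core of an attention mechanism is a weighted average: a query is scored against a set of keys, and the scores decide how much of each value to blend into the output. The sketch below shows scaled dot-product attention for a single query in plain Python; the toy vectors are invented, and a full Transformer adds learned projections, multiple heads, and many layers.

```python
import math


def attention(query: list[float], keys: list[list[float]], values: list[list[float]]) -> list[float]:
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # Softmax turns scores into weights that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    weights = [e / sum(exps) for e in exps]
    # Output is the weighted average of the value vectors.
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(values[0]))]


# The query matches the first key far better, so the output leans toward the first value.
print(attention([4.0, 0.0], keys=[[4.0, 0.0], [0.0, 4.0]], values=[[1.0, 0.0], [0.0, 1.0]]))
```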
Unsupervised Learning
Machine learning that finds structure in unlabeled data — for example clustering customers into segments without predefined categories.
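Customer segmentation can be sketched with k-means, a classic clustering method, here reduced to one dimension. The spend figures and starting centers are invented for illustration, and `kmeans_1d` is a hypothetical helper rather than a library function.

```python
def kmeans_1d(values: list[float], centers: list[float], rounds: int = 10) -> list[float]:
    """Basic k-means on one-dimensional data with given starting centers."""
    for _ in range(rounds):
        groups: list[list[float]] = [[] for _ in centers]
        # Assignment step: each value joins its nearest center.
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


# Hypothetical monthly spend per customer; no labels are provided.
spend = [20.0, 25.0, 30.0, 200.0, 210.0, 220.0]
print(kmeans_1d(spend, centers=[0.0, 100.0]))  # [25.0, 210.0]
```

The two centers settle on a low-spend segment and a high-spend segment without anyone defining those categories in advance.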
Vector Database
A database optimized for storing and searching embeddings, enabling semantic search and RAG applications. Examples include Pinecone and PostgreSQL with the pgvector extension.
Vibe Coding
Building software by describing what you want to AI coding tools in natural language and iterating on the result, rather than hand-writing every line.

https://lisrc.ae/ai-glossary