LISRC Dubai
AI & Data Science Glossary
Plain-English definitions of the 50 most important AI, Machine Learning, and Data Science terms, written by LISRC's Dubai training team for beginners and professionals.
- 50 AI, ML, and Data Science terms defined in plain English
- Covers LLMs, RAG, agents, prompt engineering, MLOps, and analytics tools
- Maintained by LISRC, a KHDA-permitted Dubai training institute
All terms
- Artificial Intelligence (AI)
- Computer systems that perform tasks normally requiring human intelligence, such as understanding language, recognizing images, making predictions, and generating content.
- AGI (Artificial General Intelligence)
- A hypothetical AI capable of understanding and learning any intellectual task a human can, rather than excelling at one narrow task.
- AI Agent
- An AI system that can plan, take actions, and use tools (like browsers, code, or APIs) to complete multi-step tasks with limited human input.
- Agentic AI
- AI built around autonomous agents that decide, act, and iterate toward goals, for example by reading data, drafting emails, and updating systems without step-by-step instructions.
- Algorithm
- A defined sequence of steps a computer follows to solve a problem or perform a calculation — the building block of all software and machine learning.
- API (Application Programming Interface)
- A standardized way for software systems to talk to each other. AI models like GPT and Claude are commonly accessed through APIs.
- Big Data
- Datasets too large or fast-moving for traditional tools, requiring distributed technologies such as Hadoop and Spark to store and analyze.
- Business Intelligence (BI)
- The practice of turning company data into dashboards, reports, and insights for decision-making, using tools such as Power BI and Tableau.
- Chatbot
- A software application that converses with users in natural language. Modern chatbots like ChatGPT are powered by large language models.
- ChatGPT
- OpenAI's conversational AI assistant based on GPT large language models, widely used for writing, analysis, coding, and research.
- Computer Vision
- The field of AI that enables machines to interpret images and video — used in face recognition, quality inspection, medical imaging, and self-driving cars.
- Context Window
- The amount of text (measured in tokens) a language model can consider at once. Larger context windows let models work with longer documents and conversations.
- Data Analytics
- Examining datasets to find patterns, answer business questions, and support decisions — typically with Excel, SQL, Python, and BI dashboards.
- Data Engineering
- Building the pipelines and infrastructure that collect, clean, store, and move data so analysts and data scientists can use it reliably.
- Data Science
- An interdisciplinary field combining statistics, programming, and domain knowledge to extract insight and build predictive models from data.
- Data Visualization
- Presenting data as charts, graphs, and dashboards so patterns and trends are easy to see and communicate.
- Deep Learning
- A branch of machine learning using multi-layered neural networks to learn complex patterns — the technology behind modern image, speech, and language AI.
- Embeddings
- Numerical representations of text, images, or other data that capture meaning, allowing AI systems to measure similarity and power search and recommendations.
- Feature Engineering
- Selecting and transforming raw data into the input variables (features) that help a machine learning model make better predictions.
- Fine-tuning
- Further training a pre-trained AI model on specific data so it performs better on a particular task, domain, or style.
- Generative AI (GenAI)
- AI that creates new content — text, images, audio, video, or code — rather than only analyzing existing data. ChatGPT and Midjourney are examples.
- GPU (Graphics Processing Unit)
- A processor designed for parallel computation, originally for graphics, now essential for training and running AI models efficiently.
- Hallucination
- When an AI model confidently produces information that is false or fabricated. Reducing hallucinations is a key focus of techniques like RAG.
- Inference
- Running a trained AI model to produce outputs (predictions or generated content), as opposed to the training phase where the model learns.
- LLM (Large Language Model)
- An AI model trained on massive text datasets to understand and generate human language. GPT, Claude, and Gemini are large language models.
- Machine Learning (ML)
- A subset of AI where systems learn patterns from data and improve with experience instead of being explicitly programmed with rules.
- MCP (Model Context Protocol)
- An open standard that lets AI assistants securely connect to external tools and data sources, so they can read files, query systems, and take actions.
- MLOps
- The practice of deploying, monitoring, and maintaining machine learning models in production — combining ML with DevOps engineering discipline.
- Model
- The trained artifact produced by machine learning: a mathematical function that maps inputs (like an email) to outputs (like 'spam' or 'not spam').
- Natural Language Processing (NLP)
- The field of AI focused on understanding and generating human language — powering translation, sentiment analysis, chatbots, and summarization.
- Neural Network
- A machine learning model inspired by the brain, made of layers of connected nodes (neurons) that transform inputs into predictions.
- No-code AI
- Building AI-powered apps and automations using visual tools (like Lovable, Bolt, or n8n) instead of writing traditional code.
- Overfitting
- When a model memorizes its training data instead of learning general patterns, causing it to perform poorly on new, unseen data.
- Power BI
- Microsoft's business intelligence platform for building interactive dashboards and reports, widely used by data analysts in the UAE job market.
- Prompt Engineering
- Designing effective instructions (prompts) for AI models to get accurate, useful, and consistent outputs — a core modern workplace skill.
- Python
- The most widely used programming language in AI and data science, known for readable syntax and libraries like pandas, scikit-learn, and TensorFlow.
- PyTorch
- An open-source deep learning framework originally developed by Meta and now governed by the PyTorch Foundation, popular in research and production for building and training neural networks.
- RAG (Retrieval-Augmented Generation)
- A technique where an AI model retrieves relevant documents from a knowledge base before answering, improving accuracy and reducing hallucinations.
- Reinforcement Learning
- A training method in which an AI agent learns by trial and error, receiving rewards for good actions. It is used in robotics, games, and model alignment.
- scikit-learn
- A popular Python library providing ready-to-use implementations of classic machine learning algorithms for classification, regression, and clustering.
- SQL (Structured Query Language)
- The standard language for querying and managing data in relational databases — a foundational skill for every data analyst and data scientist.
- Supervised Learning
- Machine learning where models train on labeled examples (inputs paired with correct answers), such as emails labeled spam or not spam.
- Tableau
- A leading data visualization platform for building interactive dashboards, often listed alongside Power BI in analytics job requirements.
- TensorFlow
- Google's open-source framework for building and deploying machine learning and deep learning models at scale.
- Token
- The unit of text an LLM processes: a whole word, a word fragment, or a punctuation mark. Model pricing, output speed, and context windows are measured in tokens.
- Training Data
- The dataset used to teach a machine learning model. Its quality and coverage largely determine how well the model performs.
- Transformer
- The neural network architecture, introduced in 2017, that underpins modern LLMs. It uses attention mechanisms to relate every token in a sequence to every other token, processing them in parallel rather than one at a time.
- Unsupervised Learning
- Machine learning that finds structure in unlabeled data, for example by clustering customers into segments without predefined categories.
- Vector Database
- A database optimized for storing and searching embeddings, enabling semantic search and RAG applications. Examples include Pinecone and PostgreSQL with the pgvector extension.
- Vibe Coding
- Building software by describing what you want to AI coding tools in natural language and iterating on the result, rather than hand-writing every line.
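Worked example: embeddings, vector search, and RAG

Three of the terms above (Embeddings, Vector Database, and RAG) fit together, and a small sketch may make the connection easier to see. The Python below is illustrative only. The three-number vectors are hand-written stand-ins for real embeddings, which usually have hundreds or thousands of dimensions and come from an embedding model. The document titles, the `retrieve` helper, and the prompt wording are all hypothetical, and a plain dictionary stands in for a real vector database.

```python
import math


def cosine_similarity(a, b):
    """Return how closely two vectors point in the same direction (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-written "embeddings" used only for illustration.
# A dictionary plays the role of the vector database here.
DOCUMENTS = {
    "How to reset your password": [0.9, 0.1, 0.0],
    "Quarterly sales report": [0.1, 0.9, 0.2],
    "Office holiday schedule": [0.0, 0.2, 0.9],
}


def retrieve(query_vector, documents, top_k=1):
    """Return the titles of the top_k documents most similar to the query vector."""
    ranked = sorted(
        documents.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [title for title, _vector in ranked[:top_k]]


# Pretend this vector is the embedding of the question "I forgot my login".
query = [0.8, 0.2, 0.1]

# Retrieval step of RAG: find the most relevant document first...
best_matches = retrieve(query, DOCUMENTS)

# ...then include it in the prompt that would be sent to a language model.
prompt = f"Answer the question using this document: {best_matches[0]}"
print(prompt)
```

In this sketch the password article ranks first because its vector points in nearly the same direction as the query vector. A production system would typically swap the dictionary for a vector database and the hand-written vectors for the output of an embedding model, but the retrieve-then-generate flow stays the same.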