A Gentle Tour of AI Fundamentals

Goal: Provide a clear, progressive travel guide through the landscape of Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL). Each section builds on the previous one so that newcomers can read straight through without backtracking.

1 What Is Artificial Intelligence?

Artificial Intelligence (AI) is the broad scientific quest to create machines that perform tasks we regard as intelligent: reasoning, perception, language understanding, planning and learning from experience.

Classical branches

Natural Language Processing (NLP) – parse, interpret and generate human language.
Computer Vision – extract meaning from images and video.
Robotics – sense‑plan‑act loops for physical agents.
Expert Systems – rule‑based decision engines drawing on specialist knowledge.

2 From AI ➜ ML ➜ DL — How They Relate

Layer	Main Idea	Typical Output
AI	Any approach that mimics human intellect	A chatbot, a chess engine, a self‑driving car
ML	Sub‑field of AI; algorithms that learn patterns from data instead of explicit rules	Email spam filter, loan‑default predictor
DL	Sub‑field of ML; multi‑layer neural networks that learn hierarchical features from raw data	Image captioning, large language models

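The three layers are nested: DL ⊂ ML ⊂ AI. Every DL method is an ML method, and every ML method is an AI method, but not the other way round.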

3 Data Foundations (Before Any Model!)

Data Quality & Cleaning – handle missing values, fix outliers, deduplicate.
Feature Engineering & Encoding – scaling, normalisation, one‑hot encoding, domain‑specific transforms.
Data Augmentation – synthesise new examples (e.g., image flips, noise injection) to improve generalisation.
Dataset Splits – maintain train / validation / test partitions to measure generalisation honestly.
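
The four steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production pipeline: the helper names (`drop_incomplete_and_duplicates`, `min_max_scale`, `one_hot`, `split_dataset`) and the 70/15/15 split are assumptions chosen for the example.

```python
import random

def drop_incomplete_and_duplicates(rows):
    """Cleaning: skip rows with a missing (None) field and exact duplicates."""
    seen, clean = set(), []
    for row in rows:
        if None in row or row in seen:
            continue
        seen.add(row)
        clean.append(row)
    return clean

def min_max_scale(values):
    """Scaling: map numbers linearly onto the [0, 1] range."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant column
    return [(v - lo) / span for v in values]

def one_hot(categories):
    """Encoding: one 0/1 indicator column per category, columns sorted by name."""
    vocab = sorted(set(categories))
    return [[1 if c == v else 0 for v in vocab] for c in categories]

def split_dataset(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Splits: shuffle once, then cut into train / validation / test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_frac)
    n_val = round(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test

train, val, test = split_dataset(range(100))
```

In practice the scaling range would be computed on the training partition only and then reused for validation and test data, so that no information leaks from the held-out sets.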

Rule of thumb: Better data beats cleverer algorithms.

4 Core Learning Paradigms

4.1 Supervised Learning

Learns from labelled examples (feature ➜ known output).

Workflow

Collect & clean data
Split → train/val/test
Train model on train
Tune hyper‑parameters on val
Report metrics on test
Deploy & monitor
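
As a rough sketch of steps 2 to 5, the snippet below runs the workflow on synthetic data with a k-Nearest Neighbours classifier. The toy dataset, the candidate values of k and every function name are assumptions made for illustration; collection and deployment are left out.

```python
import random
from collections import Counter

def make_toy_data(n=200, seed=0):
    """Synthetic 2-D points: label 0 clusters near (0, 0), label 1 near (3, 3)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 3.0 * label
        data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return data

def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point`."""
    nearest = sorted(
        train,
        key=lambda ex: (ex[0][0] - point[0]) ** 2 + (ex[0][1] - point[1]) ** 2,
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, data, k):
    hits = sum(knn_predict(train, x, k) == y for x, y in data)
    return hits / len(data)

data = make_toy_data()

# Split: the data was generated in random order, so slicing is a random split.
train, val, test = data[:140], data[140:170], data[170:]

# Train + tune: k-NN "training" just stores the data; pick k on validation.
best_k = max([1, 3, 5, 7], key=lambda k: accuracy(train, val, k))

# Report: touch the test set once, with the chosen hyper-parameter.
test_accuracy = accuracy(train, test, best_k)
```

The point of the sketch is the order of operations: the test partition is never consulted while choosing `best_k`.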

Key Concepts

Term	One‑Line Explanation
Features	Measurable attributes (pixels, words, sensor values)
Labels	Ground‑truth answers (cat/dog, price)
Overfitting	Memorising training noise; poor generalisation
Regularisation	Penalty that discourages overly complex models
Cross‑Validation	Rotate train/val splits for robustness
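
To make the cross-validation row concrete, here is a small sketch of k-fold index rotation. The function name `k_fold_indices` is an assumption for this example; a real project would normally shuffle the data first and rely on a library implementation.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(10, 3))
```

A model is trained once per fold on `train_idx` and scored on `val_idx`; the average of the k scores is the cross-validated estimate.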

Classification vs Regression

Classification → discrete categories (spam / not‑spam)
Regression → continuous values (house price)

Popular Algorithms

Linear & Logistic Regression • Decision Trees / Random Forest • Gradient Boosting (XGBoost, LightGBM, CatBoost) • Support Vector Machines • k‑Nearest Neighbours • Neural Networks

Evaluation Cheatsheet

Problem	Top Metrics
Classification	accuracy, precision, recall, F1, ROC‑AUC
Regression	MAE, MSE, RMSE, R²
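
Most of the metrics in the cheatsheet follow directly from their definitions. The sketch below computes them in plain Python for a binary classifier and a regressor; ROC-AUC is omitted because it needs predicted scores rather than hard labels, and the function names are assumptions for this example.

```python
import math

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R² for continuous predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot if ss_tot else 0.0
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse), "r2": r2}

clf = classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
reg = regression_metrics([3, 5, 7], [2, 5, 9])
```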

4.2 Unsupervised Learning

Finds structure in unlabelled data.

Task	Goal	Example
Clustering	group similar items	customer segmentation
Dimensionality Reduction	compress data while preserving structure	visualise high‑D bio‑markers
Anomaly Detection	flag out‑of‑pattern cases	credit‑card fraud alert

Popular Algorithms
k‑Means • DBSCAN • Hierarchical Clustering • Gaussian Mixtures • PCA • t‑SNE • UMAP • Isolation Forest • One‑Class SVM • Autoencoders

FYI – Anomalies come in point, contextual and collective flavours.
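
As an illustration of clustering, the following is a bare-bones k-Means sketch in plain Python. It assumes points are tuples of numbers, picks the initial centroids at random from the data, and runs a fixed number of iterations instead of testing for convergence.

```python
import random

def k_means(points, k, iterations=20, seed=0):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move centroids; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = k_means(points, k=2)
```

No labels are supplied anywhere: the two groups are recovered from the geometry of the data alone.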

4.3 Reinforcement Learning

An agent learns via trial‑and‑error in an environment to maximise cumulative reward.

Component	Role
Agent	Learner / decision maker
Environment	The world it acts in (maze, game, traffic)
State	Environment snapshot
Action	Choice the agent can make
Reward	Scalar feedback after action
Policy	Mapping state → action

Core ideas: exploration vs exploitation, discount factor, value functions.

Algorithms: Q‑Learning • SARSA • Deep Q‑Network (DQN) • Proximal Policy Optimisation (PPO) • Actor–Critic • AlphaZero self‑play.
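
The components and core ideas above come together in tabular Q-Learning. The sketch below is a toy setup invented for this example: a five-cell corridor where the agent starts in cell 0 and receives a reward of 1 on reaching cell 4. The hyper-parameter values are assumptions, not recommendations.

```python
import random

N_STATES, GOAL = 5, 4   # corridor cells 0..4; reward only on entering cell 4
ACTIONS = (0, 1)        # 0 = step left, 1 = step right

def step(state, action):
    """Environment: move one cell (clipped at the left wall), return feedback."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def train_q_table(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _episode in range(episodes):
        state = 0
        for _step in range(200):  # step cap so every episode terminates
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[state])          # exploit, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-Learning target: reward plus discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q_table = train_q_table()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
```

Here `epsilon` controls exploration versus exploitation, `gamma` is the discount factor, `q_table` is the learned value function and `greedy_policy` is the resulting state → action mapping.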

4.4 Emerging Paradigms (2025 & Beyond)

Paradigm	Essence	Real‑World Spark
Semi‑Supervised	small labelled + large unlabelled pool	medical imaging (few radiologist labels)
Self‑Supervised	generate labels from the data itself	large language models predicting the next word
Transfer Learning	reuse pre‑trained weights on new task	fine‑tuning BERT for legal texts
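
The self-supervised row is easy to demonstrate: the "labels" are simply the next words of the raw text. The helper below is a hypothetical sketch using whitespace tokenisation, which real language models replace with sub-word tokenisers.

```python
def next_word_pairs(text, context_size=3):
    """Turn unlabelled text into (context, next word) training examples."""
    tokens = text.lower().split()
    return [
        (tuple(tokens[i - context_size:i]), tokens[i])
        for i in range(context_size, len(tokens))
    ]

pairs = next_word_pairs("The cat sat on the mat", context_size=2)
```

For this sentence the first example is the context `("the", "cat")` with target `"sat"`; no human annotation was needed to produce it.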

5 Deep Learning in Focus

Deep Learning (DL) uses stacked layers of artificial neurons (a deep neural network) to discover increasingly abstract features directly from raw signals—pixels, waveforms, tokens, tabular rows—without heavy hand‑crafted engineering.

5.1 Key Architectures & Their Superpowers

Family	Core Idea	Signature Strengths	Everyday Examples
Convolutional Neural Networks (CNNs)	Learn local spatial filters shared across the image	Image, video & audio recognition; robustness to translation	Face unlock, medical X‑ray triage, self‑driving lane detection
Recurrent Nets (LSTM / GRU)	Maintain a hidden state that rolls through time	Sequential & time‑series data; gating helps retain longer‑range context than plain RNNs	Speech‑to‑text, stock‑price forecasting, IoT sensor health
Transformers	Self‑attention weighs relationships between all tokens in parallel	Scales to massive context windows; excels at language & (now) vision	Chatbots, code completion, image captioning
Graph Neural Networks (GNNs)	Message passing over edges in a graph	Relational reasoning on irregular structures	Fraud detection in payment graphs, protein folding
Autoencoders / Diffusion Models	Compress then reconstruct data, or model noise → data trajectories	Generative tasks: synthesis, denoising, upscaling	Stable Diffusion art, image super‑resolution

Tip: Most real‑world systems combine several families (e.g., a CNN feature extractor feeding a Transformer for multi‑modal reasoning).
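
To give a feel for the Transformer row, here is a stripped-down sketch of scaled dot-product self-attention in plain Python. It is a simplification: each token vector serves directly as query, key and value, whereas real Transformers apply learned projection matrices and use several attention heads.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """tokens: list of equal-length vectors. Returns (outputs, weights)."""
    dim = len(tokens[0])
    weights = []
    for query in tokens:
        # Score this token against every token, scaled by sqrt(dimension).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights.append(softmax(scores))
    # Each output is a weighted average of all token vectors.
    outputs = [
        [sum(w * value[d] for w, value in zip(row, tokens)) for d in range(dim)]
        for row in weights
    ]
    return outputs, weights

outputs, weights = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Every row of `weights` describes how much one token attends to each of the others, and all rows can be computed independently, which is what makes the operation parallel.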

5.2 Optimising Deep Nets

Backpropagation computes the gradients; a gradient‑based optimiser (SGD, Adam, Lion) uses them to update the weights.
Regularisation tricks: dropout, batch/layer norm, weight decay, data augmentation.
Hardware: GPUs, TPUs, ASICs accelerate tensor math; quantisation & pruning shrink models for the edge.
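Scaling Laws: empirically, test loss falls roughly as a power law in model size, dataset size and training compute (for example, loss ≈ a · C^(-β) for compute C, with fitted constants a and β). Bigger is better until bandwidth or data quality becomes the limiting factor.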
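
The first two bullets can be illustrated on the smallest possible network: a single sigmoid neuron trained to compute logical AND. This is a sketch under simplifying assumptions (full-batch gradient descent, two inputs, hand-derived gradients); deep-learning frameworks perform the same chain-rule bookkeeping automatically across many layers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(data, lr=1.0, epochs=2000, weight_decay=0.0):
    """Fit one sigmoid neuron with cross-entropy loss and gradient descent."""
    w, b = [0.0, 0.0], 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in data:
            p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)  # forward pass
            delta = p - y  # backward pass: dLoss/dz for sigmoid + cross-entropy
            grad_w[0] += delta * x[0] / n
            grad_w[1] += delta * x[1] / n
            grad_b += delta / n
        # Update step; weight decay adds a pull of the weights towards zero.
        w = [wi - lr * (gi + weight_decay * wi) for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_neuron(AND_DATA)
predictions = [round(sigmoid(w[0] * x[0] + w[1] * x[1] + b)) for x, _ in AND_DATA]
```

Setting `weight_decay` above zero shows the regularisation effect: the weights stay smaller and the predicted probabilities become less extreme.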

5.3 Large Language Models (LLMs)

LLMs are very large Transformers, mostly decoder‑only, with billions to trillions of parameters. They are trained on web‑scale corpora with self‑supervised objectives: next‑token prediction for decoder models, masked‑token infilling for encoder models such as BERT.

Phase	What Happens	Outcome
Pre‑Training	Predict tokens across trillions of tokens of text, code and images	Model acquires general world & language knowledge
Alignment / RLHF	Fine‑tune with human feedback & rules	Safer, more helpful behaviour
Task‑Specific Fine‑Tuning	Smaller dataset, domain or instruction prompts	Legal‑QA bot, medical triage assistant
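
The pre-training objective can be shown with a deliberately tiny stand-in. The sketch below is a bigram count model, not a Transformer: it estimates next-token probabilities from word pairs and reports the average negative log-likelihood, the same kind of loss an LLM minimises during pre-training. Function names and the whitespace tokeniser are assumptions for this example.

```python
import math
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    tokens = corpus.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_probs(counts, prev):
    """Estimated probability of each token that was seen after `prev`."""
    total = sum(counts[prev].values())
    return {tok: c / total for tok, c in counts[prev].items()}

def average_nll(counts, corpus):
    """Mean negative log-likelihood (in nats) of each next token.
    Only valid for token pairs that were seen during training."""
    tokens = corpus.split()
    losses = [
        -math.log(next_token_probs(counts, prev)[nxt])
        for prev, nxt in zip(tokens, tokens[1:])
    ]
    return sum(losses) / len(losses)

corpus = "the cat sat on the mat"
model = train_bigram(corpus)
loss = average_nll(model, corpus)
```

After "the", this toy model assigns probability 0.5 each to "cat" and "mat"; an LLM does the same job with a far richer view of the preceding context.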

LLMs underpin chat assistants, retrieval‑augmented generation (RAG) systems, code copilots, and multimodal models like GPT‑4o that ingest text and vision.

Why here? LLMs show how capabilities such as reasoning and few‑shot learning can emerge in deep networks once scale passes certain thresholds, which illustrates the continuum from classical DL to state‑of‑the‑art AI.

6 Quick Glossary

Epoch – one full pass over the training set.
Batch Size – number of examples processed before each model update.
Learning Rate – step size in parameter optimisation.
Loss Function – objective the model tries to minimise.
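
All four glossary terms appear in one short training loop. The sketch below fits a line to noiseless synthetic data with mini-batch gradient descent; the data, the default hyper-parameters and the function name `train_linear` are assumptions made for illustration.

```python
import random

def train_linear(xs, ys, epochs=300, batch_size=4, learning_rate=0.1, seed=0):
    """Fit y = w * x + b by minimising mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    updates = 0
    for _ in range(epochs):              # epoch: one full pass over the data
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]  # batch size: examples per update
            # Gradient of the loss function (mean squared error) on this batch.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= learning_rate * grad_w  # learning rate: size of each step
            b -= learning_rate * grad_b
            updates += 1
    return w, b, updates

xs = [i / 19 for i in range(20)]         # 20 inputs spread over [0, 1]
ys = [2 * x + 1 for x in xs]             # targets from the line y = 2x + 1
w, b, updates = train_linear(xs, ys)
```

With 20 examples and a batch size of 4 there are 5 updates per epoch, so 300 epochs give 1500 updates in total.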

TL;DR

AI is the goal, ML the vehicle, DL the turbo‑charged engine inside ML.
High‑quality data is the fuel.
Supervised, Unsupervised and Reinforcement Learning cover most modelling needs; emerging paradigms close remaining gaps.
Deep Learning dominates unstructured data.
Responsible AI and lifecycle discipline turn prototypes into trusted products.