What Is Machine Learning?
- Understand what machine learning is and when to use it
- Distinguish between supervised, unsupervised, and reinforcement learning
- Map ML concepts to real infrastructure and ops problems
- Know the key terminology used throughout this series
Series Overview
This series is a practical introduction to machine learning for engineers who build and operate systems. Twelve posts, each building on the last — from fundamentals through to deploying ML in production.
The series is organised into four colour-coded branches:
- Foundations (yellow) — data preparation, tooling, and end-to-end design
- Supervised Learning (green) — classification, regression, neural networks, deep learning
- Unsupervised Learning (purple) — clustering, anomaly detection
- Applied ML (red) — infrastructure ops, time-series, and ML platforms
Why Machine Learning?
If you run infrastructure, you already have the raw ingredients for machine learning: logs, metrics, time-series data, and patterns that repeat. The difference between traditional automation and ML is how you handle the patterns.
Traditional automation is explicit. You write a rule: if CPU > 90% for 5 minutes, scale up. That works when you know the pattern in advance. Machine learning flips this — you give the system data, and it finds the patterns itself.
This matters when:
- The patterns are too complex to write rules for (predicting disk failure from SMART attributes)
- The patterns change over time (traffic profiles, market behaviour)
- You have more data than you can reason about manually (millions of log lines per day)
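To make the contrast concrete, the explicit rule above fits in a few lines. This is a hypothetical sketch; the threshold and window come from the rule stated earlier, and the sample data is invented:

```python
# Traditional automation: the pattern is written down explicitly.
def should_scale_up(cpu_samples, threshold=90, window=5):
    """Scale up if every sample in the recent window exceeds the threshold."""
    recent = cpu_samples[-window:]
    return len(recent) == window and all(s > threshold for s in recent)

# Five consecutive minutes above 90% -> scale up
print(should_scale_up([50, 92, 95, 93, 97, 96]))  # True
# One sample dips below the threshold -> do nothing
print(should_scale_up([50, 92, 95, 93, 97, 60]))  # False
```

The threshold and window are hard-coded guesses. The ML approach in the rest of this post replaces those guesses with values learned from data.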
ML is not a replacement for understanding your systems. It is a tool for extracting signal from data at a scale and speed that humans cannot match.
The Three Learning Paradigms
Machine learning breaks down into three broad categories based on what kind of data you have and what you are trying to do with it.
Supervised Learning
You have labelled data — inputs paired with known correct outputs. The model learns to map inputs to outputs so it can predict on new, unseen data.
Classification — predicting a category:
- Will this server fail in the next 24 hours? (yes/no)
- Is this log entry an error, warning, or normal? (multi-class)
- Is this network request malicious? (binary)
Regression — predicting a continuous value:
- What will CPU usage be in 30 minutes?
- How long until this disk is full?
- What will the closing price be?
The key requirement is labelled training data. If you have a dataset of past server failures with the metrics that preceded them, you can train a classifier to predict future failures.
```python
from sklearn.tree import DecisionTreeClassifier

# Features: [cpu_avg, mem_pct, disk_io, error_count]
X_train = [
    [45, 62, 120, 3],   # healthy
    [92, 88, 450, 47],  # failed
    [38, 55, 95, 1],    # healthy
    [87, 91, 520, 62],  # failed
]
y_train = [0, 1, 0, 1]  # 0 = healthy, 1 = will fail

model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# New server metrics — will it fail?
new_server = [[85, 79, 380, 35]]
prediction = model.predict(new_server)
print(f"Prediction: {'at risk' if prediction[0] else 'healthy'}")
```

This is a trivial example, but the mechanics are real. Every supervised learning problem follows this pattern: features in, labels out, train, predict.
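Regression has the same mechanics, only the label is a number rather than a category. A minimal sketch using scikit-learn's LinearRegression, with invented disk-usage figures, to forecast how full a disk will be:

```python
from sklearn.linear_model import LinearRegression
import numpy as np

# Feature: hours elapsed; label: disk usage percentage (invented numbers)
hours = np.array([[0], [6], [12], [18], [24]])
disk_pct = np.array([40, 46, 52, 58, 64])  # growing roughly 1% per hour

model = LinearRegression()
model.fit(hours, disk_pct)

# Predict disk usage 36 hours from now
forecast = model.predict([[36]])[0]
print(f"Disk usage in 36h: {forecast:.1f}%")  # ~76.0%
```

The fitted model is just a line through the observed points; real usage is rarely this linear, which is why later posts cover more flexible models and how to evaluate them.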
Unsupervised Learning
You have data but no labels. The model finds structure on its own.
Clustering — grouping similar items:
- Grouping servers by behaviour profile (which hosts behave similarly?)
- Segmenting log entries into categories without predefined types
- Finding natural groupings in network traffic
Dimensionality reduction — compressing data while preserving structure:
- Reducing 50 system metrics down to the 5 that actually matter
- Visualising high-dimensional data in 2D
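A sketch of that reduction step, using scikit-learn's PCA on invented server metrics that all rise and fall together:

```python
from sklearn.decomposition import PCA
import numpy as np

# Six servers, four correlated metrics: [cpu, mem, disk_io, net_mbps]
metrics = np.array([
    [20, 25, 100, 50],
    [25, 30, 110, 55],
    [80, 75, 400, 200],
    [85, 80, 420, 210],
    [50, 55, 250, 120],
    [55, 60, 260, 130],
])

# Compress four metrics down to two components
pca = PCA(n_components=2)
reduced = pca.fit_transform(metrics)
print(reduced.shape)                    # (6, 2)
print(pca.explained_variance_ratio_)   # first component dominates
```

Because the four metrics move together, one component captures nearly all the variance: the data was effectively one-dimensional all along, which is exactly the kind of redundancy dimensionality reduction exposes.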
Anomaly detection — finding outliers:
- Identifying unusual deployment patterns
- Detecting configuration drift across a fleet
```python
from sklearn.cluster import KMeans
import numpy as np

# Server metrics: [cpu_avg, mem_pct, network_mbps]
servers = np.array([
    [25, 30, 50],    # low utilisation
    [28, 35, 45],    # low utilisation
    [82, 75, 200],   # high utilisation
    [88, 80, 220],   # high utilisation
    [45, 90, 15],    # memory-heavy
    [40, 88, 20],    # memory-heavy
])

kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
labels = kmeans.fit_predict(servers)

for i, label in enumerate(labels):
    print(f"Server {i}: cluster {label}")
```

No one told the algorithm what the groups should be. It found them from the data.
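Anomaly detection can be sketched the same way. Here, as an illustrative assumption rather than a canonical choice, scikit-learn's IsolationForest flags the outlier in a set of made-up server metrics:

```python
from sklearn.ensemble import IsolationForest
import numpy as np

# Five similar servers plus one that behaves very differently (invented data)
servers = np.array([
    [30, 40, 50],
    [32, 42, 55],
    [28, 38, 48],
    [31, 41, 52],
    [29, 39, 51],
    [95, 97, 900],  # the odd one out
])

# contamination is our guess at the fraction of anomalies in the data
detector = IsolationForest(random_state=42, contamination=0.2)
flags = detector.fit_predict(servers)  # -1 = anomaly, 1 = normal
print(flags)
```

Again, no labels were supplied. The model scores each point by how easy it is to isolate from the rest, and the oddball server stands out on its own.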
Reinforcement Learning
An agent learns by interacting with an environment. It takes actions, receives rewards or penalties, and adjusts its strategy.
This is less common in day-to-day ops, but it shows up in:
- Auto-scaling policies that learn optimal thresholds over time
- Network routing optimisation
- Automated trading strategies (exploring which actions maximise returns)
We will not cover reinforcement learning in depth in this series, but it is worth knowing it exists as a third paradigm.
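To make the idea concrete anyway, here is a toy sketch of the simplest RL setting, a multi-armed bandit: an epsilon-greedy agent learning which scaling threshold works best. The reward function and every number in it are invented for illustration:

```python
import random

# Invented reward: pretend 80% is the ideal scaling threshold.
# In a real system this would come from observing cost/latency after acting.
def observe_reward(threshold):
    return -abs(threshold - 80)

thresholds = [60, 70, 80, 90]
estimates = {t: 0.0 for t in thresholds}  # running estimate of each reward
counts = {t: 0 for t in thresholds}
rng = random.Random(42)

for step in range(500):
    # Epsilon-greedy: mostly exploit the best-known action, sometimes explore
    if rng.random() < 0.1:
        t = rng.choice(thresholds)
    else:
        t = max(estimates, key=estimates.get)
    counts[t] += 1
    # Incremental average of observed rewards for this action
    estimates[t] += (observe_reward(t) - estimates[t]) / counts[t]

best = max(estimates, key=estimates.get)
print(f"Learned threshold: {best}%")
```

The agent was never told that 80% was best; it discovered that by trying actions and tracking the rewards. Real RL adds state, delayed rewards, and far more machinery, but the act-observe-adjust loop is the same.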
Key Terminology
Before going further, here are the terms you will see throughout this series:
| Term | Meaning |
|---|---|
| Feature | A measurable property used as input. CPU usage, memory percentage, and request count are features. |
| Label | The known correct answer in supervised learning. "Failed" or "healthy" is a label. |
| Training set | The data used to teach the model. |
| Test set | Data held back to evaluate how well the model generalises to unseen inputs. |
| Model | The learned function that maps features to predictions. |
| Overfitting | When a model memorises the training data instead of learning general patterns. It performs well on training data but poorly on new data. |
| Underfitting | When a model is too simple to capture the patterns in the data. |
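Overfitting is easy to demonstrate. A sketch on synthetic, deliberately noisy data (all parameters invented), comparing an unconstrained decision tree with a depth-limited one:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic noisy data: flip_y mislabels 20% of points on purpose
X, y = make_classification(n_samples=400, n_features=10, flip_y=0.2,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42)

# An unconstrained tree can memorise the training set, noise and all
deep = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
# A depth-limited tree is forced to learn coarser, more general patterns
shallow = DecisionTreeClassifier(max_depth=3, random_state=42).fit(
    X_train, y_train)

print("deep    train:", deep.score(X_train, y_train),
      "test:", round(deep.score(X_test, y_test), 2))
print("shallow train:", round(shallow.score(X_train, y_train), 2),
      "test:", round(shallow.score(X_test, y_test), 2))
```

The deep tree scores perfectly on its own training data but drops noticeably on the held-out test set: that gap is overfitting, and it is exactly why the test set in the table above exists.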
Common Algorithms — A Preview
This series will cover each of these in detail. Here is the landscape:
| Category | Algorithms | Good For |
|---|---|---|
| Classification | KNN, Decision Trees, Naive Bayes, Logistic Regression, SVM | Labelled categories — failure prediction, spam detection, log classification |
| Regression | Linear Regression, Polynomial Regression, Ridge/Lasso | Predicting continuous values — resource forecasting, pricing |
| Clustering | K-Means, DBSCAN, Hierarchical | Finding structure without labels — server grouping, anomaly detection |
| Neural Networks | MLP, CNN, RNN, Transformers | Complex patterns — image recognition, time-series, natural language |
| Ensemble Methods | Random Forest, XGBoost, LightGBM, CatBoost | Competition-winning accuracy — the workhorses of applied ML |
Resources
Videos:
- 3Blue1Brown — But what is a neural network? — visual introduction to how ML models work under the hood.
- StatQuest — Machine Learning Fundamentals — short, clear overview of the ML landscape.
- Google — Introduction to Machine Learning — 7 min crash course with practical framing.
Reading:
- Scikit-learn — An introduction to machine learning — the official getting-started guide.
- Mathematics for Machine Learning (free textbook) — if you want the maths behind the concepts in this series.
Exercises
1. Classify your own data. Think of a system you manage. List three things you could predict about it (classification or regression). What features would you use? What would the labels be?
2. Find the clusters. Look at your monitoring dashboard. Can you spot natural groupings among your hosts based on resource usage patterns? How many clusters would you expect?
3. Label check. For each prediction task you listed in exercise 1, do you have labelled historical data? If not, how would you create it?
Next
Part 2: Data Pre-processing and Evaluation — how to clean and prepare data, split it for training and testing, and measure whether your model is actually any good.