What Is Machine Learning?
- Understand what machine learning is and when to use it
- Distinguish between supervised, unsupervised, and reinforcement learning
- Map ML concepts to real infrastructure and ops problems
- Know the key terminology used throughout this series
Series Overview
This series is a practical introduction to machine learning for engineers who build and operate systems. Twelve posts, each building on the last — from fundamentals through to deploying ML in production.
The series is organised into four colour-coded branches:
- Foundations (yellow) — data preparation, tooling, and end-to-end design
- Supervised Learning (green) — classification, regression, neural networks, deep learning
- Unsupervised Learning (purple) — clustering, anomaly detection
- Applied ML (red) — infrastructure ops, time-series, and ML platforms
Why Machine Learning?
If you run infrastructure, you already have the raw ingredients for machine learning: logs, metrics, time-series data, and patterns that repeat. The difference between traditional automation and ML is how you handle the patterns.
Traditional automation is explicit. You write a rule: if CPU > 90% for 5 minutes, scale up. That works when you know the pattern in advance. Machine learning flips this — you give the system data, and it finds the patterns itself.
This matters when:
- The patterns are too complex to write rules for (predicting disk failure from SMART attributes)
- The patterns change over time (traffic profiles, market behaviour)
- You have more data than you can reason about manually (millions of log lines per day)
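To make the contrast concrete, the explicit rule above fits in a few lines. This is a hypothetical sketch; the threshold and window come from the rule stated earlier, and the sample data is invented:

```python
# Traditional automation: the pattern is written down explicitly.
def should_scale_up(cpu_samples, threshold=90, window=5):
    """Scale up if every sample in the recent window exceeds the threshold."""
    recent = cpu_samples[-window:]
    return len(recent) == window and all(s > threshold for s in recent)

# Five consecutive minutes above 90% -> scale up
print(should_scale_up([50, 92, 95, 93, 97, 96]))  # True
# One sample dips below the threshold -> do nothing
print(should_scale_up([50, 92, 95, 93, 97, 60]))  # False
```

The threshold and window are hard-coded guesses. The ML approach in the rest of this post replaces those guesses with values learned from data.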
ML is not a replacement for understanding your systems. It is a tool for extracting signal from data at a scale and speed that humans cannot match.
The Three Learning Paradigms
Machine learning breaks down into three broad categories based on what kind of data you have and what you are trying to do with it.
Supervised Learning
You have labelled data — inputs paired with known correct outputs. The model learns to map inputs to outputs so it can predict on new, unseen data.
Classification — predicting a category:
- Will this server fail in the next 24 hours? (yes/no)
- Is this log entry an error, warning, or normal? (multi-class)
- Is this network request malicious? (binary)
Regression — predicting a continuous value:
- What will CPU usage be in 30 minutes?
- How long until this disk is full?
- What will the closing price be?
The key requirement is labelled training data. If you have a dataset of past server failures with the metrics that preceded them, you can train a classifier to predict future failures.
```python
from sklearn.tree import DecisionTreeClassifier

# Features: [cpu_avg, mem_pct, disk_io, error_count]
X_train = [
    [45, 62, 120, 3],   # healthy
    [92, 88, 450, 47],  # failed
    [38, 55, 95, 1],    # healthy
    [87, 91, 520, 62],  # failed
]
y_train = [0, 1, 0, 1]  # 0 = healthy, 1 = will fail

model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# New server metrics — will it fail?
new_server = [[85, 79, 380, 35]]
prediction = model.predict(new_server)
print(f"Prediction: {'at risk' if prediction[0] else 'healthy'}")
```

This is a trivial example, but the mechanics are real. Every supervised learning problem follows this pattern: features in, labels out, train, predict.
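Regression has the same mechanics, only the label is a number rather than a category. A minimal sketch using scikit-learn's LinearRegression, with invented disk-usage figures, to forecast how full a disk will be:

```python
from sklearn.linear_model import LinearRegression
import numpy as np

# Feature: hours elapsed; label: disk usage percentage (invented numbers)
hours = np.array([[0], [6], [12], [18], [24]])
disk_pct = np.array([40, 46, 52, 58, 64])  # growing roughly 1% per hour

model = LinearRegression()
model.fit(hours, disk_pct)

# Predict disk usage 36 hours from now
forecast = model.predict([[36]])[0]
print(f"Disk usage in 36h: {forecast:.1f}%")  # ~76.0%
```

The fitted model is just a line through the observed points; real usage is rarely this linear, which is why later posts cover more flexible models and how to evaluate them.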
Unsupervised Learning
You have data but no labels. The model finds structure on its own.
Clustering — grouping similar items:
- Grouping servers by behaviour profile (which hosts behave similarly?)
- Segmenting log entries into categories without predefined types
- Finding natural groupings in network traffic
Dimensionality reduction — compressing data while preserving structure:
- Reducing 50 system metrics down to the 5 that actually matter
- Visualising high-dimensional data in 2D
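A sketch of that reduction step, using scikit-learn's PCA on invented server metrics that all rise and fall together:

```python
from sklearn.decomposition import PCA
import numpy as np

# Six servers, four correlated metrics: [cpu, mem, disk_io, net_mbps]
metrics = np.array([
    [20, 25, 100, 50],
    [25, 30, 110, 55],
    [80, 75, 400, 200],
    [85, 80, 420, 210],
    [50, 55, 250, 120],
    [55, 60, 260, 130],
])

# Compress four metrics down to two components
pca = PCA(n_components=2)
reduced = pca.fit_transform(metrics)
print(reduced.shape)                    # (6, 2)
print(pca.explained_variance_ratio_)   # first component dominates
```

Because the four metrics move together, one component captures nearly all the variance: the data was effectively one-dimensional all along, which is exactly the kind of redundancy dimensionality reduction exposes.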
Anomaly detection — finding outliers:
- Identifying unusual deployment patterns
- Detecting configuration drift across a fleet
```python
from sklearn.cluster import KMeans
import numpy as np

# Server metrics: [cpu_avg, mem_pct, network_mbps]
servers = np.array([
    [25, 30, 50],    # low utilisation
    [28, 35, 45],    # low utilisation
    [82, 75, 200],   # high utilisation
    [88, 80, 220],   # high utilisation
    [45, 90, 15],    # memory-heavy
    [40, 88, 20],    # memory-heavy
])

kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
labels = kmeans.fit_predict(servers)

for i, label in enumerate(labels):
    print(f"Server {i}: cluster {label}")
```

No one told the algorithm what the groups should be. It found them from the data.
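Anomaly detection can be sketched the same way. Here, as an illustrative assumption rather than a canonical choice, scikit-learn's IsolationForest flags the outlier in a set of made-up server metrics:

```python
from sklearn.ensemble import IsolationForest
import numpy as np

# Five similar servers plus one that behaves very differently (invented data)
servers = np.array([
    [30, 40, 50],
    [32, 42, 55],
    [28, 38, 48],
    [31, 41, 52],
    [29, 39, 51],
    [95, 97, 900],  # the odd one out
])

# contamination is our guess at the fraction of anomalies in the data
detector = IsolationForest(random_state=42, contamination=0.2)
flags = detector.fit_predict(servers)  # -1 = anomaly, 1 = normal
print(flags)
```

Again, no labels were supplied. The model scores each point by how easy it is to isolate from the rest, and the oddball server stands out on its own.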
Reinforcement Learning
An agent learns by interacting with an environment. It takes actions, receives rewards or penalties, and adjusts its strategy.
This is less common in day-to-day ops, but it shows up in:
- Auto-scaling policies that learn optimal thresholds over time
- Network routing optimisation
- Automated trading strategies (exploring which actions maximise returns)
We will not cover reinforcement learning in depth in this series, but it is worth knowing it exists as a third paradigm.
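To make the idea concrete anyway, here is a toy sketch of the simplest RL setting, a multi-armed bandit: an epsilon-greedy agent learning which scaling threshold works best. The reward function and every number in it are invented for illustration:

```python
import random

# Invented reward: pretend 80% is the ideal scaling threshold.
# In a real system this would come from observing cost/latency after acting.
def observe_reward(threshold):
    return -abs(threshold - 80)

thresholds = [60, 70, 80, 90]
estimates = {t: 0.0 for t in thresholds}  # running estimate of each reward
counts = {t: 0 for t in thresholds}
rng = random.Random(42)

for step in range(500):
    # Epsilon-greedy: mostly exploit the best-known action, sometimes explore
    if rng.random() < 0.1:
        t = rng.choice(thresholds)
    else:
        t = max(estimates, key=estimates.get)
    counts[t] += 1
    # Incremental average of observed rewards for this action
    estimates[t] += (observe_reward(t) - estimates[t]) / counts[t]

best = max(estimates, key=estimates.get)
print(f"Learned threshold: {best}%")
```

The agent was never told that 80% was best; it discovered that by trying actions and tracking the rewards. Real RL adds state, delayed rewards, and far more machinery, but the act-observe-adjust loop is the same.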
Key Terminology
Before going further, here are the terms you will see throughout this series:
| Term | Meaning |
|---|---|
| Feature | A measurable property used as input. CPU usage, memory percentage, and request count are features. |
| Label | The known correct answer in supervised learning. "Failed" or "healthy" is a label. |
| Training set | The data used to teach the model. |
| Test set | Data held back to evaluate how well the model generalises to unseen inputs. |
| Model | The learned function that maps features to predictions. |
| Overfitting | When a model memorises the training data instead of learning general patterns. It performs well on training data but poorly on new data. |
| Underfitting | When a model is too simple to capture the patterns in the data. |
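Overfitting is easy to demonstrate. A sketch on synthetic, deliberately noisy data (all parameters invented), comparing an unconstrained decision tree with a depth-limited one:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic noisy data: flip_y mislabels 20% of points on purpose
X, y = make_classification(n_samples=400, n_features=10, flip_y=0.2,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42)

# An unconstrained tree can memorise the training set, noise and all
deep = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
# A depth-limited tree is forced to learn coarser, more general patterns
shallow = DecisionTreeClassifier(max_depth=3, random_state=42).fit(
    X_train, y_train)

print("deep    train:", deep.score(X_train, y_train),
      "test:", round(deep.score(X_test, y_test), 2))
print("shallow train:", round(shallow.score(X_train, y_train), 2),
      "test:", round(shallow.score(X_test, y_test), 2))
```

The deep tree scores perfectly on its own training data but drops noticeably on the held-out test set: that gap is overfitting, and it is exactly why the test set in the table above exists.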
Common Algorithms — A Preview
This series will cover each of these in detail. Here is the landscape:
| Category | Algorithms | Good For |
|---|---|---|
| Classification | KNN, Decision Trees, Naive Bayes, Logistic Regression, SVM | Labelled categories — failure prediction, spam detection, log classification |
| Regression | Linear Regression, Polynomial Regression, Ridge/Lasso | Predicting continuous values — resource forecasting, pricing |
| Clustering | K-Means, DBSCAN, Hierarchical | Finding structure without labels — server grouping, anomaly detection |
| Neural Networks | MLP, CNN, RNN, Transformers | Complex patterns — image recognition, time-series, natural language |
| Ensemble Methods | Random Forest, XGBoost, LightGBM, CatBoost | Competition-winning accuracy — the workhorses of applied ML |
Resources
Videos:
- 3Blue1Brown — But what is a neural network? — visual introduction to how ML models work under the hood.
- StatQuest — Machine Learning Fundamentals — short, clear overview of the ML landscape.
- Google — Introduction to Machine Learning — 7 min crash course with practical framing.
Reading:
- Scikit-learn — An introduction to machine learning — the official getting-started guide.
- Mathematics for Machine Learning (free textbook) — if you want the maths behind the concepts in this series.
Exercises
1. Classify your own data. Think of a system you manage. List three things you could predict about it (classification or regression). What features would you use? What would the labels be?
2. Find the clusters. Look at your monitoring dashboard. Can you spot natural groupings among your hosts based on resource usage patterns? How many clusters would you expect?
3. Label check. For each prediction task you listed in exercise 1, do you have labelled historical data? If not, how would you create it?
Next
Part 2: Data Pre-processing and Evaluation — how to clean and prepare data, split it for training and testing, and measure whether your model is actually any good.