What Is Machine Learning?

🎯 What You Will Learn
  • Understand what machine learning is and when to use it
  • Distinguish between supervised, unsupervised, and reinforcement learning
  • Map ML concepts to real infrastructure and ops problems
  • Know the key terminology used throughout this series
📋 Prerequisites
None — this is Part 1. Basic programming knowledge helps but is not required. We will set up the Python toolkit in Part 3.

Series Overview

This series is a practical introduction to machine learning for engineers who build and operate systems. Twelve posts, each building on the last — from fundamentals through to deploying ML in production.

ML Fundamentals series mindmap

The four colour-coded branches map to the structure of the series:

  • Foundations (yellow) — data preparation, tooling, and end-to-end design

  • Supervised Learning (green) — classification, regression, neural networks, deep learning

  • Unsupervised Learning (purple) — clustering, anomaly detection

  • Applied ML (red) — infrastructure ops, time-series, and ML platforms

Why Machine Learning?

If you run infrastructure, you already have the raw ingredients for machine learning: logs, metrics, time-series data, and patterns that repeat. The difference between traditional automation and ML is how you handle the patterns.

Traditional automation is explicit. You write a rule: if CPU > 90% for 5 minutes, scale up. That works when you know the pattern in advance. Machine learning flips this — you give the system data, and it finds the patterns itself.

This matters when:

  • The patterns are too complex to write rules for (predicting disk failure from SMART attributes)

  • The patterns change over time (traffic profiles, market behaviour)

  • You have more data than you can reason about manually (millions of log lines per day)

ML is not a replacement for understanding your systems. It is a tool for extracting signal from data at a scale and speed that humans cannot match.

The Three Learning Paradigms

Machine learning breaks down into three broad categories based on what kind of data you have and what you are trying to do with it.

Supervised Learning

You have labelled data — inputs paired with known correct outputs. The model learns to map inputs to outputs so it can predict on new, unseen data.

Classification — predicting a category:

  • Will this server fail in the next 24 hours? (yes/no)

  • Is this log entry an error, warning, or normal? (multi-class)

  • Is this network request malicious? (binary)

Regression — predicting a continuous value:

  • What will CPU usage be in 30 minutes?

  • How long until this disk is full?

  • What will the closing price be?

The key requirement is labelled training data. If you have a dataset of past server failures with the metrics that preceded them, you can train a classifier to predict future failures.

from sklearn.tree import DecisionTreeClassifier

# Features: [cpu_avg, mem_pct, disk_io, error_count]
X_train = [
    [45, 62, 120, 3],   # healthy
    [92, 88, 450, 47],  # failed
    [38, 55, 95, 1],    # healthy
    [87, 91, 520, 62],  # failed
]
y_train = [0, 1, 0, 1]  # 0 = healthy, 1 = will fail

model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# New server metrics — will it fail?
new_server = [[85, 79, 380, 35]]
prediction = model.predict(new_server)
print(f"Prediction: {'at risk' if prediction[0] else 'healthy'}")

This is a trivial example, but the mechanics are real. Every supervised learning problem follows this pattern: features in, labels out, train, predict.
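Regression works the same way, with a continuous label instead of a category. A minimal sketch along the same lines, assuming scikit-learn; the metric values are invented for illustration:

```python
from sklearn.linear_model import LinearRegression

# Features: [hour_of_day, request_rate]
X_train = [
    [9, 1200],
    [12, 2100],
    [15, 1800],
    [18, 2600],
]
# Label: CPU usage (%) observed 30 minutes later
y_train = [42, 68, 58, 81]

model = LinearRegression()
model.fit(X_train, y_train)

# Predict CPU usage for a new snapshot of metrics
print(model.predict([[14, 2000]]))
```

Same shape: features in, labels out, train, predict. Only the target is a number rather than a class.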

Unsupervised Learning

You have data but no labels. The model finds structure on its own.

Clustering — grouping similar items:

  • Grouping servers by behaviour profile (which hosts behave similarly?)

  • Segmenting log entries into categories without predefined types

  • Finding natural groupings in network traffic

Dimensionality reduction — compressing data while preserving structure:

  • Reducing 50 system metrics down to the 5 that actually matter

  • Visualising high-dimensional data in 2D

Anomaly detection — finding outliers:

  • Identifying unusual deployment patterns

  • Detecting configuration drift across a fleet

from sklearn.cluster import KMeans
import numpy as np

# Server metrics: [cpu_avg, mem_pct, network_mbps]
servers = np.array([
    [25, 30, 50],    # low utilisation
    [28, 35, 45],    # low utilisation
    [82, 75, 200],   # high utilisation
    [88, 80, 220],   # high utilisation
    [45, 90, 15],    # memory-heavy
    [40, 88, 20],    # memory-heavy
])

kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
labels = kmeans.fit_predict(servers)

for i, label in enumerate(labels):
    print(f"Server {i}: cluster {label}")

No one told the algorithm what the groups should be. It found them from the data.
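Anomaly detection follows the same no-labels pattern: the model learns what "normal" looks like and flags the outliers. A minimal sketch using scikit-learn's IsolationForest, with invented metric values:

```python
from sklearn.ensemble import IsolationForest
import numpy as np

# Same shape of data as before: [cpu_avg, mem_pct, network_mbps]
servers = np.array([
    [25, 30, 50],
    [28, 35, 45],
    [26, 32, 48],
    [27, 31, 52],
    [95, 98, 900],   # one host behaving very differently
])

# contamination = expected fraction of anomalies in the data
clf = IsolationForest(contamination=0.2, random_state=42)
labels = clf.fit_predict(servers)  # 1 = normal, -1 = anomaly

for metrics, label in zip(servers, labels):
    status = "ANOMALY" if label == -1 else "ok"
    print(metrics, status)
```

Again, nothing told the model which host was unusual; it stands out because it does not look like the rest.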

Reinforcement Learning

An agent learns by interacting with an environment. It takes actions, receives rewards or penalties, and adjusts its strategy.

This is less common in day-to-day ops, but it shows up in:

  • Auto-scaling policies that learn optimal thresholds over time

  • Network routing optimisation

  • Automated trading strategies (exploring which actions maximise returns)

We will not cover reinforcement learning in depth in this series, but it is worth knowing it exists as a third paradigm.
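To make the action-reward loop concrete anyway, here is a toy epsilon-greedy sketch: a simplified bandit, not a full RL algorithm. The two actions and their simulated reward values are entirely invented:

```python
import random

random.seed(42)

actions = ["scale_up", "scale_out"]
# Simulated environment: average reward per action (unknown to the agent)
true_reward = {"scale_up": 0.3, "scale_out": 0.7}

q = {a: 0.0 for a in actions}    # estimated value of each action
counts = {a: 0 for a in actions}
epsilon = 0.1                    # exploration rate

for step in range(1000):
    # Explore occasionally, otherwise exploit the current best estimate
    if random.random() < epsilon:
        action = random.choice(actions)
    else:
        action = max(q, key=q.get)
    # Noisy reward from the environment
    reward = true_reward[action] + random.gauss(0, 0.1)
    counts[action] += 1
    # Incremental running average of observed rewards
    q[action] += (reward - q[action]) / counts[action]

print(q)  # the estimates converge toward the true rewards
```

The agent was never told which action is better; it learned that from the rewards it received.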

Key Terminology

Before going further, here are the terms you will see throughout this series:

  • Feature — a measurable property used as input. CPU usage, memory percentage, and request count are features.
  • Label — the known correct answer in supervised learning. "Failed" or "healthy" is a label.
  • Training set — the data used to teach the model.
  • Test set — data held back to evaluate how well the model generalises to unseen inputs.
  • Model — the learned function that maps features to predictions.
  • Overfitting — when a model memorises the training data instead of learning general patterns: it performs well on training data but poorly on new data.
  • Underfitting — when a model is too simple to capture the patterns in the data.
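In code, the training/test split is usually a single call. A sketch with toy data, assuming scikit-learn:

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy feature matrix and labels (invented values)
X = [[i, i * 2] for i in range(10)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Hold back 30% of the data as the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Accuracy on data the model never saw is the honest measure
print(f"Test accuracy: {model.score(X_test, y_test):.2f}")
```

Evaluating on the training set would reward overfitting; the held-back test set is what tells you whether the model generalises. Part 2 covers this in detail.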

Common Algorithms — A Preview

This series will cover each of these in detail. Here is the landscape:

  • Classification — KNN, Decision Trees, Naive Bayes, Logistic Regression, SVM — labelled categories: failure prediction, spam detection, log classification

  • Regression — Linear Regression, Polynomial Regression, Ridge/Lasso — predicting continuous values: resource forecasting, pricing

  • Clustering — K-Means, DBSCAN, Hierarchical — finding structure without labels: server grouping, anomaly detection

  • Neural Networks — MLP, CNN, RNN, Transformers — complex patterns: image recognition, time-series, natural language

  • Ensemble Methods — Random Forest, XGBoost, LightGBM, CatBoost — competition-winning accuracy: the workhorses of applied ML
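One thing that makes this landscape manageable: in scikit-learn, nearly all of these algorithms share the same fit/predict interface, so trying a different one is often a one-line change. A sketch with invented data:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# Toy data: [cpu_avg, mem_pct], 0 = healthy, 1 = at risk
X = [[45, 62], [92, 88], [38, 55], [87, 91], [50, 60], [90, 85]]
y = [0, 1, 0, 1, 0, 1]

# Same data, same interface, three different algorithms
for Model in (KNeighborsClassifier, DecisionTreeClassifier, LogisticRegression):
    model = Model()
    model.fit(X, y)
    print(Model.__name__, model.predict([[88, 80]]))
```

This uniformity is why the series can move quickly from one algorithm to the next: once you know the pattern, only the model (and its trade-offs) changes.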

📚 Resources

Videos:

  • 3Blue1Brown — But what is a neural network? — visual introduction to how ML models work under the hood.
  • StatQuest — Machine Learning Fundamentals — short, clear overview of the ML landscape.
  • Google — Introduction to Machine Learning — 7 min crash course with practical framing.

Reading:

  • Scikit-learn — An introduction to machine learning — the official getting-started guide.
  • Mathematics for Machine Learning (free textbook) — if you want the maths behind the concepts in this series.
🔬 Try It Yourself

1. Classify your own data. Think of a system you manage. List three things you could predict about it (classification or regression). What features would you use? What would the labels be?

2. Find the clusters. Look at your monitoring dashboard. Can you spot natural groupings among your hosts based on resource usage patterns? How many clusters would you expect?

3. Label check. For each prediction task you listed in exercise 1, do you have labelled historical data? If not, how would you create it?

Next

Part 2: Data Pre-processing and Evaluation — how to clean and prepare data, split it for training and testing, and measure whether your model is actually any good.

