==============================================================================================
RAHUL'S ML BLOG -- notes on machine learning, worked out by hand                     est. 2026
==============================================================================================
THE DECODER RING
Plain words on the left. The buzzwords they map to on the right.
============================================================================================
On this blog the idea always comes first and the label comes last -- so every post ends
with its own little "The Labels, Last" box. This page gathers all of them into one
master decoder, laid out in reading order.
Two ways to use it. If you have read the posts and want the textbook term to drop into a
resume, a paper, or an interview, scan left to right. If you arrived already fluent in
jargon and want to know what on earth a "straight-stick rule" is, scan right to left.
Either way: the plain word and the buzzword name the SAME thing. One is just wearing a
suit.
A. THE SETUP -- data, and the one honest rule
---------------------------------------------
sheet of written-down numbers               dataset / design matrix X
one measured column                         feature / predictor / independent variable
right-answer column                         target / label / dependent variable (y)
working pile                                training set
hidden / sealed / exam pile                 test set / hold-out set
rotating folds                              k-fold cross-validation
memorising the working pile                 overfitting
too stiff -- always a bit wrong             underfitting / high bias
too jumpy -- swings with the pile           high variance
mistake on rows never seen                  generalisation error
letting the exam pile leak in               data leakage
B. THE TWO GUESSERS -- ask-closest and straight-stick
-----------------------------------------------------
ask-closest rule                            k-nearest neighbours (KNN)
store the pile, look it up                  non-parametric / instance-based / lazy learner
gap between two rows                        Euclidean distance (L2); city-block = L1/Manhattan
straight-stick rule                         linear regression / ordinary least squares (OLS)
dials                                       weights / coefficients (beta)
fixed nudge                                 intercept / bias term (beta0)
put columns on the same ruler               feature scaling / standardisation / normalisation
bowl-shaped mistake                         convex loss surface
flat-point equations                        normal equations (X^T X beta = X^T y)
flat-shadow thrower                         hat matrix / orthogonal projection
right-angle leftovers                       orthogonal residuals
cut the sheet, don't flip it                QR / SVD decomposition (vs matrix inverse)
shakiness number                            condition number (kappa)
best flat honest rule                       BLUE (Gauss-Markov theorem)
add a bump to the diagonal                  ridge regression / L2-penalised least squares
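
A minimal sketch of the flat-point equations for the smallest possible straight-stick rule: one column plus a fixed nudge. The toy numbers and the name solve_flat_point are made up for illustration. One simplification to flag: the bump is added to both diagonal entries, so the fixed nudge gets leashed too, which ridge implementations usually avoid.

```python
# Hypothetical toy sheet: one measured column and its right-answer column.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]


def solve_flat_point(xs, ys, alpha=0.0):
    """Solve (X^T X + alpha*I) beta = X^T y for X = [1, x] by Cramer's rule.

    alpha = 0 is ordinary least squares. alpha > 0 adds the ridge bump
    to the diagonal (including the intercept's entry: a simplification).
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[n, sx], [sx, sxx]] and X^T y = [sy, sxy]
    a, b, c, d = n + alpha, sx, sx, sxx + alpha
    det = a * d - b * c
    beta0 = (sy * d - b * sxy) / det   # fixed nudge
    beta1 = (a * sxy - c * sy) / det   # the dial on x
    return beta0, beta1


beta0, beta1 = solve_flat_point(xs, ys)
# Right-angle leftovers: they sum to zero and are orthogonal to the column.
leftovers = [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]
ridge0, ridge1 = solve_flat_point(xs, ys, alpha=1.0)
```

For sheets with many columns, the "cut the sheet, don't flip it" row still applies: a QR or SVD solve is the safer route than writing out an inverse.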
C. GRADING A GUESS -- scoring a number
--------------------------------------
size of the miss                            error / residual
miss, squared and averaged                  mean squared error (MSE)
...with the root put back                   root mean squared error (RMSE)
...not squared, just averaged               mean absolute error (MAE)
the always-average fool                     baseline / mean predictor
total wobble                                total sum of squares (TSS)
leftover -- the bad part                    residual sum of squares (RSS)
what the stick ate -- the good part         explained sum of squares (ESS)
slice of wobble eaten                       R^2 / coefficient of determination
docked for each extra dial                  adjusted R^2
too stiff vs too jumpy                      bias-variance trade-off
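
The scores above fit in a few lines of arithmetic. This is a sketch on made-up numbers; the function name grade and the toy sheet are illustrative, and the guesses come from the least-squares stick for this particular sheet (fixed nudge 1.15, dial 1.94), which is what makes total wobble split cleanly into leftover plus eaten.

```python
import math


def grade(ys, preds, n_dials=1):
    """Score a list of guesses against the right answers."""
    n = len(ys)
    mean_y = sum(ys) / n
    misses = [y - p for y, p in zip(ys, preds)]
    mse = sum(m * m for m in misses) / n
    tss = sum((y - mean_y) ** 2 for y in ys)      # total wobble
    rss = sum(m * m for m in misses)              # leftover
    ess = sum((p - mean_y) ** 2 for p in preds)   # what the stick ate
    r2 = 1 - rss / tss
    # Adjusted R^2 docks the score for each extra dial.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_dials - 1)
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(m) for m in misses) / n,
        "tss": tss,
        "rss": rss,
        "ess": ess,
        "r2": r2,
        "adj_r2": adj_r2,
    }


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
preds = [1.15 + 1.94 * x for x in xs]        # least-squares stick for this sheet
stick = grade(ys, preds)
fool = grade(ys, [sum(ys) / len(ys)] * len(ys))  # the always-average fool
```

The fool's R^2 comes out at zero, which is the point of the baseline: R^2 measures how much of the wobble the stick eats beyond always guessing the average.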
D. READING THE DIALS -- what coefficients say
---------------------------------------------
dial on a column                            coefficient / weight
the bare row of dials                       model.coef_
dials after same-ruler                      standardised (beta) coefficients
columns moving as a pack                    collinearity / multicollinearity
hold the others fixed                       ceteris paribus / partial effect
E. SORTING INTO BINS -- the yes/no machine
------------------------------------------
sort into bins / bin-sorter                 classification / classifier
S-curve yes/no guesser                      logistic regression
squash curve                                sigmoid / logistic function
dial sum (z)                                log-odds / logit
cross-entropy leftover                      binary cross-entropy / log-loss
cutoff / fence                              decision threshold
CAUGHT                                      true positive (TP)
ALARM                                       false positive (FP)
MISSED                                      false negative (FN)
CLEAR                                       true negative (TN)
four-box table                              confusion matrix
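
A small sketch of how the pieces of the yes/no machine connect: the squash curve turns a dial sum into a chance-score, the log-odds undoes it, and the cutoff drops each row into one of the four boxes. The function names are made up for illustration, and the default cutoff of 0.5 is an assumption, not a rule.

```python
import math


def squash(z):
    """Sigmoid: map a dial sum z to a chance-score between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """Logit: the inverse of the squash curve."""
    return math.log(p / (1.0 - p))


def cross_entropy_leftover(y, p):
    """Binary cross-entropy for one row with true label y (0 or 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def box(y, p, cutoff=0.5):
    """Name the confusion-matrix box for one row, given the cutoff."""
    says_yes = p >= cutoff
    if says_yes:
        return "CAUGHT" if y == 1 else "ALARM"
    return "MISSED" if y == 1 else "CLEAR"
```

Moving the cutoff changes which box a row lands in without changing its chance-score, which is why the curves in the next section sweep the cutoff rather than refit the machine.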
F. THE FOUR SCORES AND THE CURVES
---------------------------------
catch-rate                                  recall / sensitivity / true positive rate (TPR)
when we cry sick, real sick?                precision / positive predictive value
false-alarm rate                            false positive rate (FPR) = 1 - specificity
balance of precision and recall             F1 score (harmonic mean of the two)
tilt toward recall or precision             F-beta score
chance-score                                posterior probability / predict_proba
trade curve                                 ROC curve (receiver operating characteristic)
area under the trade curve                  AUC (area under the ROC curve)
pick-higher reading                         Wilcoxon-Mann-Whitney statistic
precision-vs-recall curve                   precision-recall (PR) curve
area under the PR curve                     average precision (AP) / AUCPR
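
The four scores fall straight out of the four-box counts, and the pick-higher reading of AUC can be checked by brute force. A sketch with made-up counts and chance-scores; the names four_scores and auc_pick_higher are illustrative only.

```python
def four_scores(tp, fp, fn, tn, beta=1.0):
    """Recall, precision, false-alarm rate and F-beta from the four boxes."""
    recall = tp / (tp + fn)        # catch-rate
    precision = tp / (tp + fp)     # when we cry sick, real sick?
    fpr = fp / (fp + tn)           # false-alarm rate
    b2 = beta * beta
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return recall, precision, fpr, f_beta


def auc_pick_higher(sick_scores, well_scores):
    """Share of (sick, well) pairs where the sick row scores higher.

    Ties count as half a win. This equals the area under the ROC curve.
    """
    wins = 0.0
    for s in sick_scores:
        for w in well_scores:
            if s > w:
                wins += 1.0
            elif s == w:
                wins += 0.5
    return wins / (len(sick_scores) * len(well_scores))


# Hypothetical counts: 40 CAUGHT, 10 ALARM, 20 MISSED, 30 CLEAR.
recall, precision, fpr, f1 = four_scores(tp=40, fp=10, fn=20, tn=30)
auc = auc_pick_higher([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

With beta = 1 the F-beta score is the plain F1; a beta above 1 tilts it toward recall, a beta below 1 toward precision.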
G. HUMBLING THE MACHINE -- leash and cloud
------------------------------------------
leash on the dials                          L2 regularisation / ridge penalty
dial-size price                             penalty term, lambda * sum(beta_j^2)
leash tightness (the inverse)               C in sklearn (C = 1 / lambda)
two-cloud wall                              linear discriminant analysis (LDA)
cloud centre                                class mean (mu_k)
pooled within-class spread                  within-class scatter matrix (S_W)
which way to aim the wall                   Fisher's criterion
common-ness nudge                           log-prior term, log(pi1 / pi0)
models P(x | class)                         generative model
models P(class | x) directly                discriminative model
H. THE MESSY WORLD -- tuning, scaling, imbalance, many bins
-----------------------------------------------------------
setting I pick by hand                      hyperparameter
dial the machine tunes itself               parameter
grid hunt                                   grid search (GridSearchCV)
standard ruler                              standardisation / StandardScaler
pinch-to-fit ruler                          min-max scaling / MinMaxScaler
skewed pile                                 class imbalance / imbalanced dataset
upweight the rare class                     class_weight='balanced'
treat-all-classes-equal averaging           macro averaging
count-every-label averaging                 micro averaging
size-weighted averaging                     weighted averaging
I. HUMBLE DIALS AND WOBBLE BANDS -- regularisation and uncertainty
-----------------------------------------------------------------
the leash / a fine for big dials            regularisation / penalised regression
Ridge -- the square fine                    L2 regularisation / ridge regression
Lasso -- the absolute fine                  L1 regularisation / lasso regression
the knob a (alpha)                          regularisation strength (alpha / lambda)
bump on the diagonal                        (X^T X + alpha*I)^-1 X^T y
snap a weak dial to zero                    sparsity / automatic feature selection
count the survivors                         number of non-zero coefficients
re-deal with repeats                        bootstrap / sampling with replacement
200 versions of a dial                      the bootstrap distribution
the 95% wobble band                         95% confidence interval
chop 2.5% each end                          the 2.5th and 97.5th percentiles
band crosses zero                           not statistically distinguishable from 0
the ~37% left out                           out-of-bag (OOB) sample
the free exam                               out-of-bag error estimate
the flat-bottom solve                       the normal equations
(X^T X)^-1 X^T y                            ordinary-least-squares (OLS) solution
the untangler                               inverse of the Gram matrix (X^T X)
the LADDER (one column alone)               univariate / simple-regression slope
stubborn driver / too stiff                 high bias
panicky driver / too jumpy                  high variance
the whole trade                             the bias-variance trade-off
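
The wobble-band rows can be sketched with nothing but the standard library: re-deal the rows with repeats 200 times, refit one dial each time, chop 2.5% off each end, and along the way count how many rows each re-deal leaves out. The synthetic sheet (true dial 2, noise of spread 1), the seeds, and the names slope and wobble_band are all assumptions made for illustration; the percentile pick is the simplest one, not a careful quantile rule.

```python
import random


def slope(rows):
    """Simple-regression slope of y on x for a list of (x, y) rows."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    return sxy / sxx


def wobble_band(rows, n_versions=200, seed=0):
    """Percentile bootstrap band for the slope, plus the mean left-out share."""
    rng = random.Random(seed)
    n = len(rows)
    versions, left_out = [], []
    for _ in range(n_versions):
        picks = [rng.randrange(n) for _ in range(n)]   # re-deal with repeats
        versions.append(slope([rows[i] for i in picks]))
        left_out.append(1 - len(set(picks)) / n)       # share never picked
    versions.sort()
    chop = int(0.025 * n_versions)                     # 2.5% off each end
    return versions[chop], versions[-chop - 1], sum(left_out) / n_versions


# Hypothetical sheet: 30 rows, true dial 2.0, Gaussian noise.
data_rng = random.Random(1)
rows = [(float(x), 2.0 * x + data_rng.gauss(0, 1)) for x in range(30)]
lo, hi, oob_share = wobble_band(rows)
```

If the band from lo to hi sits entirely on one side of zero, the dial is distinguishable from zero at this level. The left-out share hovers near 37% because each row is missed with probability (1 - 1/n)^n, which approaches 1/e as the pile grows.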
J. QUESTION CHARTS AND COMMITTEES -- trees, ensembles, interpretation
---------------------------------------------------------------------
question chart                              decision tree / CART
the whole build hunt (.fit)                 recursive binary splitting
one yes/no question per room                a split / a node
the final pile at the bottom                a leaf / terminal node
badness (sum of squared misses)             residual sum of squares for a split
midpoints of the column                     candidate split thresholds
champion of champions                       the best split (lowest impurity)
depth (questions deep)                      max_depth hyperparameter
mixed-ness / how mixed                      Gini impurity
the drop in mixing                          impurity decrease / information gain
share yes / share no                        class proportion (p, q)
the alpha tax per leaf                      cost-complexity pruning (ccp_alpha)
staircase of snip-points                    effective-alpha sequence (ccp_alphas)
fire a doctor                               prune a subtree
keep the sick:well ratio even               stratified train/test split
deal with repeats x200, average             bootstrap aggregating / bagging
sticky note (pile of people)                bootstrap sample (with replacement)
vote of many charts                         ensemble prediction
bagging + hide columns per cut              random forest (max_features)
line of fixers (charts in sequence)         gradient boosting
stump (one question only)                   weak learner / decision stump
trust-each-fixer knob                       learning rate / shrinkage
where validation error bottoms out          early stopping / best iteration
scramble a column, re-grade                 permutation feature importance
sweep a column, average guesses             partial dependence plot (PDP)
one line per person                         individual conditional expectation (ICE)
credit per cut, summed over trees           mean decrease in impurity (feature_importances_)
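
One round of the build hunt, sketched for a single column with yes/no answers: measure the mixed-ness, try every midpoint, and keep the cut with the biggest drop in mixing. The toy column and the function names are made up for illustration, and the drop is measured with Gini impurity here rather than entropy.

```python
def gini(labels):
    """Gini impurity of a pile of 0/1 labels: 1 - p^2 - q^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n   # share yes
    q = 1 - p             # share no
    return 1 - p * p - q * q


def drop_in_mixing(xs, labels, threshold):
    """Impurity of the whole pile minus the size-weighted impurity after the cut."""
    left = [lab for x, lab in zip(xs, labels) if x <= threshold]
    right = [lab for x, lab in zip(xs, labels) if x > threshold]
    n = len(labels)
    after = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - after


def best_split(xs, labels):
    """Try the midpoints between neighbouring distinct values; return the champion."""
    values = sorted(set(xs))
    midpoints = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(midpoints, key=lambda t: drop_in_mixing(xs, labels, t))


# Hypothetical column: the answer flips from no to yes between 3 and 4.
xs = [1, 2, 3, 4, 5, 6]
labels = [0, 0, 0, 1, 1, 1]
cut = best_split(xs, labels)
gain = drop_in_mixing(xs, labels, cut)
```

A full question chart repeats this hunt over every column, then again inside each of the two rooms the winning cut creates, until max_depth or pure leaves stop it.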
K. FINDING PATTERNS WITHOUT ANSWERS -- unsupervised learning, distances, PCA
----------------------------------------------------------------------------
sheet with no answer column                 unlabelled data / unsupervised learning
middle of a column                          mean / average
spread of a column                          standard deviation (std / sigma)
range (max - min)                           feature range
straight-line gap between two rows          Euclidean distance (L2)
city-block gap                              Manhattan distance (L1)
sheet of gaps                               pairwise distance matrix
put every column on the same ruler          z-score / StandardScaler
fair gap                                    standardised Euclidean distance
crush a many-wall room to flat              dimensionality reduction
longest shadow / strongest direction        first principal component (PC1)
second shadow (at right angle)              second principal component (PC2)
the recipe for a shadow                     loadings / components_
each row's coordinate on the shadow         score / transformed data
how much each shadow carries                explained variance ratio (PVE)
keep shadows until 80% captured             cumulative PVE threshold
blow the shadow back up                     inverse transform / reconstruction
blurriness after blowing up                 reconstruction error (MSE)
column importance in the recipe             loading magnitude (absolute value)
put k flags, join each point to nearest     k-means clustering
move each flag to the centre of its crowd   k-means update step
pick k by the elbow                         elbow method for k
home vs next-door fit score                 silhouette coefficient
pick k at the highest fit peak              silhouette analysis
family tree of rows                         hierarchical clustering / dendrogram
glue the closest pair together              linkage (single / complete / average)
read the tree top-down, choose cut          cut the dendrogram at a height
fill in the blanks                          matrix completion / imputation
guess what a user likes                     recommender system / collaborative filtering
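
The two flag rows describe the whole k-means loop, so here is a bare sketch of it. The points, the starting flags, the fixed number of rounds, and the name plant_flags are assumptions for illustration; real implementations choose starting flags more carefully and stop once the flags settle.

```python
import math


def plant_flags(points, flags, n_rounds=10):
    """Alternate the two k-means steps for a fixed number of rounds."""
    for _ in range(n_rounds):
        # Step 1: join each point to its nearest flag (straight-line gap).
        crowds = [[] for _ in flags]
        for p in points:
            nearest = min(range(len(flags)), key=lambda j: math.dist(p, flags[j]))
            crowds[nearest].append(p)
        # Step 2: move each flag to the centre of its crowd.
        # A flag with no crowd stays where it is.
        flags = [
            tuple(sum(coord) / len(crowd) for coord in zip(*crowd)) if crowd else flag
            for crowd, flag in zip(crowds, flags)
        ]
    return flags


# Hypothetical sheet: two obvious crowds in two columns.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
flags = plant_flags(points, flags=[(0, 0), (10, 10)])
```

Because the gap is a plain Euclidean distance, columns on very different rulers will dominate it, which is why the same-ruler row sits earlier in this section.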
## A Note on the Plain Words
>> NOTE: THE PLAIN WORDS ARE NOT BABY-TALK
"Straight-stick rule" is not a dumbed-down stand-in for linear regression -- it is
linear regression, named for what it does instead of who invented it. The buzzword is
a handle: short, shared, and useful once you already hold the idea. Learn the idea by
its plain name first, pick up the handle here, and you will never again mistake the
label for the understanding.
----------------------------------------------------------------------------------------------
(c) 2026 Rahul Rai . pure HTML+CSS, no JavaScript, no trackers .
==============================================================================================