==============================================================================================
RAHUL'S ML BLOG -- notes on machine learning, worked out by hand . est. 2026
==============================================================================================
home | about | archive | glossary | contact
----------------------------------------------------------------------------------------------
ABOUT THIS BLOG
Last updated: 2026-06-06
Who Am I
--------
Hi, I'm Rahul Rai. I write about machine learning from first principles -- not the
hype version, the actual nuts-and-bolts. What *is* a rule, really? Why hide part of
the pile? What is the machine actually doing when it guesses?
One rule on this blog: no jargon until it's been earned. If I can't explain something
in words a farmer in 1953 would understand, it hasn't been explained yet. So I ban the
buzzwords, draw the idea, solve it by pencil, then write the steps -- and only at the
very end do I stick the textbook labels on.
What This Blog Is
-----------------
- ML fundamentals, derived by hand before any toolbox is opened
- ASCII diagrams and pencil math, then the real Python
- Code that actually ran, shown with the results it actually produced
- Honest about what I don't know
The 1950 Contract
-----------------
Pretend it is 1950. Print these pages. On your desk: pencils, erasers, infinite
graph paper, a blackboard, chalk -- and a room of infinite, tireless CLERKS who will
add, subtract, multiply, divide, and never complain. There is no computer and no
calculator anywhere in the teaching.
Four promises follow from that:
1. Nothing in the teaching path ever needs a machine. The Python at the end of
each post is a museum exhibit -- for the day you meet a computer.
2. Every number is recomputed where it is needed. No page assumes you remember
the previous one.
3. Every worked example is followed by a YOUR TURN drill -- do it on your slate
before reading on; the checked answer is printed right after.
4. Cost is counted in clerk-steps, not seconds: how many subtractions, how many
squarings, done by lunch or done by Christmas.
Your attention is the scarce thing; arithmetic is free. The blog is written
accordingly: short sections, heavy pencil work.
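For the day you meet a computer, here is a minimal sketch of promise 4. It assumes the job is
scoring a guesser by mean squared error over n rows (one subtraction and one squaring per
row, then a running sum and a single division). The function name and the breakdown are
illustrative assumptions, not taken from any post.

```python
def mse_clerk_steps(n_rows):
    """Count the pencil operations a room of clerks needs for one MSE.

    Illustrative assumption: each row costs one subtraction (guess minus
    truth) and one squaring; the squares are then summed and divided by n.
    """
    if n_rows < 1:
        raise ValueError("need at least one row to score")
    return {
        "subtractions": n_rows,    # guess - truth, once per row
        "squarings": n_rows,       # square each miss
        "additions": n_rows - 1,   # running sum of the squares
        "divisions": 1,            # divide the sum by n
    }


steps = mse_clerk_steps(10)
total = sum(steps.values())
print(steps, "total clerk-steps:", total)
```

The count grows in a straight line with the number of rows, which is the "done by lunch"
kind of cost rather than the "done by Christmas" kind.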
What's Here
-----------
A short book in six chapters (twenty-one posts), plus two appendices and a glossary. Read it
top to bottom -- each chapter draws its ideas by hand, then gathers the Python at the end.
- CHAPTER 1 -- Predicting House Prices: two estimators from scratch (ask-closest and
straight-stick) on the California housing pile, by hand.
part 1 .
part 2 .
part 3
- CHAPTER 2 -- Grading a Guesser: two ways to score a fitted line (MSE and R^2), and how
to read the dials the fit sets, on a sheet of cars.
part 1 .
part 2
- CHAPTER 3 -- Sorting Into Bins: sorting breast lumps into sick/well with the S-curve
guesser, the four-box table, ROC/AUC, the L2 leash, the two-cloud wall (LDA), grid
search, skewed piles, and macro/micro averaging.
part 1 .
part 2 .
part 3 .
part 4
- CHAPTER 4 -- Humble Dials and Wobble Bands: regularisation and uncertainty on the
diabetes pile: Ridge and Lasso leashes, picking the knob by cross-validation,
bootstrap wobble bands, the out-of-bag free exam, and the matrix solve by hand.
part 1 .
part 2 .
part 3
- CHAPTER 5 -- Question Charts and Committees: a machine that asks yes/no questions
instead of turning dials -- built by hand from blank sheet, then humbled by pruning.
Then 200 charts vote: bagging, random forest (hide the columns), and gradient
boosting (a line of stumps each fixing the last one's leftovers). How to make the
black box confess which columns it leaned on.
part 1 .
part 2 .
part 3
- CHAPTER 6 -- Finding Patterns Without Answers: unsupervised learning -- sheets with no
answer column. Distance and the ruler problem, PCA (the strongest direction), K-means
and hierarchical clustering, both tools on real gene data, and a recommender that fills
a sheet full of holes.
part 1 .
part 2 .
part 3 .
part 4 .
part 5 .
part 6
- APPENDIX A -- Classification Reference: all of Chapter 3's concepts in one flip-to page:
cross-entropy vs MSE, C parameter, LDA with priors, grid search fold safety,
min-max scaling, precision vs recall in business, PR curves, and averaging.
classification reference
- APPENDIX B -- Distance and Clustering Reference: Chapter 6's loose ends -- Hamming and
Mahalanobis distance, missing-data traps, why crush a room, and the ethics of sorting
people into piles (bias, privacy, transparency) plus customer segmentation.
clustering reference
- Glossary / Decoder Ring -- every plain term on this blog mapped to its textbook
buzzword, and back again. The whole "ban the jargon" idea, in one table.
glossary
What This Blog Isn't
--------------------
- AI hype coverage
- Tutorial-farm content with 30 headers and no depth
- Sponsored anything
The Site
--------
Plain-text source files wrapped in one <pre> tag each, styled by a single hand-written
stylesheet. No JavaScript, no tracking, no cookies, no frameworks, no web fonts.
Authored to 100 columns so it reads the same in a browser or piped through `less` in a
terminal. Hosted on GitHub Pages.
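A minimal sketch of how one plain-text file could be wrapped in a single <pre> tag and
checked against the 100-column limit. This is not the site's actual build script: the
function names and the `style.css` filename are hypothetical, used only for illustration.

```python
import html


def wrap_in_pre(text, title, stylesheet="style.css"):
    """Return a full HTML page holding `text` inside exactly one <pre> tag.

    `stylesheet` is a hypothetical filename. html.escape keeps characters
    such as < and & in the source from being read as markup.
    """
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title>"
        f'<link rel="stylesheet" href="{html.escape(stylesheet)}">'
        "</head>\n"
        f"<body><pre>{html.escape(text)}</pre></body></html>\n"
    )


def lines_over_width(text, width=100):
    """List the 1-based numbers of lines longer than `width` columns."""
    return [i for i, line in enumerate(text.splitlines(), start=1)
            if len(line) > width]


source = "R^2 < 1 & MSE > 0\n" + "x" * 101 + "\n"
page = wrap_in_pre(source, "demo")
too_wide = lines_over_width(source)
print(too_wide)
```

Escaping before wrapping is the one step that matters: without it, a stray `<` in the
pencil math would be treated as the start of a tag.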
Contact
-------
Email: learn.rahul.rai@gmail.com
GitHub: learnrahulrai-ui
----------------------------------------------------------------------------------------------
(c) 2026 Rahul Rai . pure HTML+CSS, no JavaScript, no trackers
home . source on GitHub
==============================================================================================