==============================================================================================
RAHUL'S ML BLOG -- notes on machine learning, worked out by hand est. 2026
==============================================================================================
home | about | archive | glossary | contact
----------------------------------------------------------------------------------------------
CHAPTER 6 . FINDING PATTERNS WITHOUT ANSWERS . PART 6 OF 6
Filling the Blanks: Recommender Systems and Matrix Factorisation
Posted: 2026-06-09 . Author: Rahul Rai . Tags: recommender-systems, matrix-factorisation, collaborative-filtering
============================================================================================
PATH . post 21 of 28
<- prev: Chapter 6, Part 5: Both Tools on NCI60
next: Chapter 7, Part 1: How a Network Computes a Guess ->
The last post in the book. Every chapter until now had a full sheet of numbers -- maybe
no answer column, but at least every cell was filled in. This post starts with a sheet
that is mostly HOLES, and the whole job is to guess what belongs in the blanks.
This is the recommender problem: a grid of users down the side, movies across the top,
and a star rating in a cell only where a user actually rated a movie. Most cells are
empty -- nobody watches everything. Fill a blank well and you can say "you, who has not
seen this film, will probably rate it 4.5 stars" -- a recommendation.
## The Sheet Full of Holes
Matrix Titanic Alien Notebook Rambo
Ann [ 5 . 4 . . ]
Ben [ . 4 . 5 . ]
Cal [ 4 . 5 . 4 ]
Dee [ . 5 . 4 . ]
Eve [ 5 . 4 . 3 ]
"." = blank (this user never rated this movie)
The grid is mostly dots. The goal: guess the dots. If we can guess that Ben would rate
Alien a 4, we recommend Alien to Ben. There is no separate "answer column" -- the answers
ARE the scattered ratings themselves, and the blanks are what we predict.
## The Old Idea in New Clothes: Hidden Tastes
Stare at the filled cells and a pattern leaks out. Ann and Eve both love Matrix and
Alien -- sci-fi. Ben and Dee both love Titanic and Notebook -- romance. Nobody told us
"sci-fi" and "romance" exist; they fall out of the ratings. These are HIDDEN TASTES, and
they are exactly PCA's shadows from Part 2, wearing a new coat.
The trick (matrix factorisation): every rating is the match between a user's taste and a
movie's flavour. Give each user a short list of hidden-taste scores, and each movie the
same short list of hidden-flavour scores. A rating is their dot-product:
rating(user, movie) = (user's taste) matched against (movie's flavour)
Ann's tastes: sci-fi 0.9, romance 0.1
Alien's flavour: sci-fi 0.8, romance 0.0
guess = 0.9 x 0.8 + 0.1 x 0.0 = 0.72
(The actual numbers the machine finds won't be tidy 0.9 and 0.8; those
are for illustration. The optimiser picks whatever scale makes the
predicted ratings match the true ones -- so if real ratings are 1-5
the factors end up in a range that produces 1-5 predictions.)
High when taste lines up with flavour, low when it does not. That is the whole engine.
## Two Skinny Sheets Instead of One Holey One
Matrix factorisation splits the big holey grid into two SKINNY full sheets -- one for
users, one for movies -- with a few hidden columns each:
big holey grid user tastes movie flavours
(users x movies) = (users x K) times (K x movies)
5 users x 5 movies ~= 5 users x 2 x 2 x 5 movies
(mostly blank) (full) (full)
K is the number of hidden tastes (2 here: sci-fi, romance). The two skinny sheets are
full -- no holes. Multiply them back together and you get a COMPLETE grid: every cell
filled, including the ones that were blank. Those filled-in blanks are the predictions.
This is the same shape PCA used -- a short list of per-row scores multiplied by a shared
recipe (loadings) rebuilds the full sheet. PCA crushed a full
sheet to a smaller one and blew it back up blurry; here we LEARN the two skinny sheets so
that blowing them back up reproduces the ratings we DO have -- and fills the ones we don't.
## How the Skinny Sheets Are Learned
We never saw the hidden tastes -- we invent them and tune them. Start both skinny sheets
with random numbers, then drive down one familiar score:
for every rating we ACTUALLY have:
guess = user-taste matched against movie-flavour
miss = real rating - guess
total wrongness = sum of miss^2 (only over the cells we have!)
nudge both skinny sheets to shrink that wrongness, and repeat
The bottom-of-the-bowl, squared-miss idea from Chapter 1 -- one last time. The one new
wrinkle: we only sum the miss over the FILLED cells. The blanks have no truth to compare
against, so they sit out the scoring entirely. Once the filled cells are well-predicted,
the blanks come along for free when you multiply the two sheets.
>> NOTE: A LEASH, BECAUSE THE SHEET IS MOSTLY HOLES
With so few filled cells and freely-spinning hidden tastes, the machine will happily
memorise the handful of ratings it has and predict nonsense in the blanks. So we add
the Chapter 4 leash -- a fine on big taste/flavour numbers -- to keep it humble. Same
leash, brand new place to need it.
## The Lab, End to End: 100 People, 50 Movies, 3,786 Holes
Rule done -- now the real sheet from the lab. 100 people down the side, 50 movies
across the top: 100 x 50 = 5,000 cells. Only 1,214 hold a rating; the other
5,000 - 1,214 = 3,786 are blank.
blank fraction = 3786 / 5000 = 0.757 <- three-quarters of the sheet is holes
The engine below fills every hole WITHOUT the nudging loop above -- no learning rate,
no gradient. It re-uses PCA's rebuild from Part 2 (crush to a few shadows, blow back
up), run in rounds, and the blanks fill themselves.
## First Move: Humble Every Movie Column
Subtract each movie's middle from every rating in its column. The middle is computed
from the PRESENT values only -- blanks do not vote, and blanks stay blank.
one movie's 4 present ratings: 5, 4, 3, 4 (made-up)
middle = 16 / 4 = 4
the person who said 3 -> 3 - 4 = -1 liked it LESS than the crowd
the person who said 5 -> 5 - 4 = +1 liked it MORE than the crowd
the person who said 4 -> 4 - 4 = 0 exactly the crowd
>> YOUR TURN
A sixth person rated this same movie a 2. Its middle is still 4 (the middle is
fixed from the present values; a new low rating does not change the worked middle
here). Humble his 2.
check your slate: 2 - 4 = -2. He liked it two stars LESS than the crowd -- the
most-below-the-middle voter so far. The minus sign is information, not a penalty.
Why by COLUMN and not by row? A movie carries a baseline fame -- some films everyone
rates high, some low. Subtracting the movie's middle strips the fame off and leaves
the pure signal: this person, more or less than the crowd, on this film. After the
subtraction, every column's middle parks at zero.
## Round 0: A Lazy Zero in Every Blank
Write 0 into all 3,786 blanks (the 5,000 - 1,214 = 3,786 cells with no rating). On the
humbled sheet, 0 claims "this person would rate
it exactly at the movie's middle" -- a guess that uses NO evidence about the person.
Dumb, harmless, temporary: it budges no column's middle, and the rounds below will
overwrite every one of these zeros with something earned.
## A Hand Recipe That Dies of Echo
Before trusting any machine, I tried to fill a blank by hand. Person 7 rated 5
movies; his cell at movie m is blank. The recipe:
1. rank everyone by closeness to person 7 -- over the movies BOTH rated, take
the differences, and average their sizes into one closeness number
2. walk down the ranking; stop at the FIRST person who actually RATED movie m
3. copy that person's humbled VALUE into person 7's blank
One catch the recipe must dodge: trust whole-row closeness over one lucky shared
movie.
shared movies: a b c d e
person 11 differences: 0 3 4 2 3 <- perfect once, far elsewhere
person 12 differences: 0.2 0.2 0.2 0.2 0.2 <- near on ALL -> copy HIM
Person 11 agrees perfectly on one movie and disagrees everywhere else; person 12 is a
little off on every movie. Person 12 is the honest twin.
And then the recipe dies -- caught by hand. Say person 12 was the donor, and his
values were copied verbatim into person 7's blanks. Now re-rank everyone for the
next blank: over those copied cells, the difference between person 7 and person 12
is 0 BY CONSTRUCTION. The ranking is no longer hearing evidence about person 7 --
it is hearing its own echo, and each copy rigs the next ranking harder. Dead end.
The machine below never falls in this hole, because no blank is ever filled from ONE
donor: every mark is blended from the entire sheet at once.
## The Machine Loop: Rebuild, Restore, Repeat
IN HAND: a lab sheet of 100 people x 50 movies = 5,000 cells, of which 5,000 - 1,214 =
3,786 are blank (3,786 / 5,000 = 0.757, three-quarters holes). Each blank is humbled by
its movie's middle, then started at a lazy zero. This section turns those zeros into
earned marks: rebuild the whole sheet, keep the blanks, restore the truth, repeat.
round 0: write the lazy zeros into the blanks
round 1: the machine re-writes the WHOLE sheet from its k strongest directions
paste the machine's marks into the BLANKS ONLY
put the rated cells back to the truth
round 2: the same, on the better sheet
...
freeze: this round's marks ~ last round's marks -> stop
Two lines carry all the danger.
First: the rebuild re-writes EVERY cell -- the rated ones included. After one rebuild
a true 0.30 comes back as 0.28. Truth is not the machine's to edit. So keep a MASK
SHEET from the start -- one true/false per cell, "was this one blank?" -- and after
every rebuild restore the rated cells from the original sheet. On the working sheet
your eye cannot tell a true value from a pencilled mark; only the mask knows.
Second: why rounds at all? Round 1's machine studied a sheet that was three-quarters
fake zeros, so its marks are crude. Round 2's machine studies a sheet whose blanks
hold those crude-but-better-than-zero marks, and spits cleaner ones. Each round eats
a cleaner sheet and returns a cleaner sheet, until the marks stop moving.
Count the clerk bill for this loop. Each rebuild re-writes every cell: 100 people x 50
movies = 5,000 cell-writes. Run it for 20 rounds and the clerks lay down 5,000 x 20 =
100,000 cell-writes before the marks freeze. Heavy, but every stroke is plain
arithmetic -- no magic, just patience.
## Inside the Machine, Knob Set to 1
The machine has one knob: how many directions to keep. Open the box at knob = 1.
The bet at knob 1: every person is an ECHO of one shared taste-row.
a made-up corner of a sheet:
+2 +4 +6 +8 <- one row of real information
+1 +2 +3 +4 <- the same row, at HALF strength
-2 -4 -6 -8 <- the same row, at MINUS-ONE strength
Three rows, but only ONE row of information. Knob 1 stores exactly that: one
BUILTROW (50 numbers, one per movie -- computed from the sheet, not stored in it,
the same way the movie middles were) plus ONE STRENGTH per person (100 numbers).
builtrow[m] = sum over people( strength x his value at m ) / sum( strength^2 )
person's strength = sum over movies( builtrow[m] x his value at m ) / sum( builtrow^2 )
Look hard at those two lines: each one is the LADDER from Chapter 4, Part 3 -- the
one-column slope sum(x*y)/sum(x^2) -- pointed at a different unknown. The builtrow
is fitted holding the strengths still; the strengths are fitted holding the builtrow
still; alternate until neither moves.
Check it by hand on the corner above. Strengths (1, 1/2, -1); first column (2, 1, -2):
top = 1x2 + (1/2)x1 + (-1)x(-2) = 4.5
bottom = 1 + 1/4 + 1 = 2.25
builtrow entry = 4.5 / 2.25 = 2 <- the first entry of the real row. correct.
And a person whose row is (1, 2, 3, 4) against builtrow (2, 4, 6, 8):
strength = (1x2 + 2x4 + 3x6 + 4x8) / (4 + 16 + 36 + 64) = 60 / 120 = 1/2 correct.
>> YOUR TURN
Same corner, same strengths (1, 1/2, -1). Now build the SECOND entry of the
builtrow from the second column (+4, +2, -4). Do the top, the bottom, and divide.
check your slate:
top = 1x4 + (1/2)x2 + (-1)x(-4) = 4 + 1 + 4 = 9
bottom = 1 + 1/4 + 1 = 2.25
entry = 9 / 2.25 = 4
The real row's second entry is +4 -- the ladder pulled it back out clean, just as
it did the first entry (2).
Once the two have settled, filling a blank costs ONE multiply:
person 7's mark at movie m = strength_7 x builtrow[m]
Knob 2 stores TWO builtrows, and every person carries his OWN two strengths. These
are recipes, not camps -- nobody gets sorted into a group; each person mixes the
builtrows in his own private proportions. The knob = how many independent base rows
you claim the sheet really has.
## Pick the Knob by Hiding What You Know
IN HAND: a 100 x 50 = 5,000-cell lab sheet, 3,786 of them blank, humbled by movie
middles and filled by the rebuild-restore loop -- whose one knob is how many builtrows
to keep. This section sets that knob honestly, without ever peeking at a true blank.
A mark written into a blank can never be graded -- a blank has no answer behind it.
So manufacture blanks whose answers you KEEP:
hide 20 of every 100 RATED cells (pretend blank; pocket their truth)
run the whole loop once per knob: 1, 2, 3, 5, 10
uncover: compare each knob's marks against the pocketed truth -> one error per knob
smallest error wins
Hide only RATED cells. Hiding a blank pockets nothing -- there is no truth to grade
against later. (Chapter 1's lock-away-the-exam habit, shrunk down to single cells.)
>> NOTE: THIS LAB WAS SYNTHETIC, SO A FULL ANSWER KEY EXISTED
The sheet was generated, and the true rating behind every blank sat in a second
file. Grading the finished marks against it: agreement around 0.89, typical miss
around 0.24, on the humbled scale. One trap inside that grading: humble the
answer key with the SAME movie middles as the working sheet, not with its own --
otherwise the two sheets sit in different frames and the comparison lies.
## Un-Humble and Recommend
IN HAND: a full 100 x 50 = 5,000-cell sheet, every one of the 3,786 blanks now holding
an earned mark, the knob picked by hiding rated cells -- but every number still in
humbled units ("more or less than the crowd"). This section turns those marks back into
stars a person can read.
The finished sheet still speaks in humbled units ("more or less than the crowd").
To speak in stars, add the movie's middle back, then clamp into the 1-to-5 range:
predicted stars = mark + that movie's middle (clamped into 1..5)
Worked once: a blank came back with mark +1, and that movie's middle is 4. Stars =
1 + 4 = 5, already inside 1..5, so it stands at 5 -- a strong recommendation.
>> YOUR TURN
Another blank for the same movie (middle 4) came back with mark -3.5. Un-humble it,
then clamp into 1..5.
check your slate: stars = -3.5 + 4 = 0.5; 0.5 is below the floor of 1, so clamp
up to 1. Predicted 1 star -- the lowest the scale allows; do not recommend it.
For one person: predict every movie he never rated, sort, and the top 5 is his
recommendation list. That single un-humble line is the point of the whole pipeline
-- everything before it existed so this addition lands on a number worth showing him.
## Cold Start: The Honest Limit
The method only works where there is something to chew on. A brand-new user who has
rated NOTHING has an all-blank row -- no tastes can be inferred, so no honest
recommendation can be made. Same for a movie nobody has rated yet. This is the COLD
START problem, and it is a real wall, not a bug you can code around: with zero ratings
there is simply no signal. Real systems paper over it with side facts (your age, the
movie's genre) until enough ratings arrive.
## Where the Book Lands
Look back at the whole arc. We began in Chapter 1 guessing a house price with a straight
stick and squared misses. Six chapters later we are guessing a missing movie rating with
two skinny sheets -- and it is the SAME squared-miss bowl, the SAME leash, the SAME
scores-times-recipe shape as PCA. The cast of tricks is small. We just kept meeting them
in new costumes: dials, leashes, wobble bands, question charts, committees, shadows,
piles, and now hidden tastes. Name the trick under the costume and the machine has no
black boxes left in it.
## Common Tripwires I Caught
TRIPWIRE 1: Only score the cells you actually have.
The wrongness sum runs over FILLED ratings only. Counting the blanks (which
have no truth) is meaningless and breaks the whole method.
TRIPWIRE 2: K (number of hidden tastes) is a knob, like everywhere else.
Too few hidden tastes and the machine cannot tell sci-fi from romance; too
many and it memorises noise in the handful of ratings. Pick K by holding out
some known ratings and checking the guesses -- the Chapter 1 habit, again.
TRIPWIRE 3: Hidden tastes are not labelled "sci-fi" by the machine.
The machine finds K useful directions; it never names them. "Sci-fi" and
"romance" are stories WE tell after looking at which movies score high. Same
as PCA loadings -- the math finds the axis; the meaning is our interpretation.
TRIPWIRE 4: Cold start is a wall, not a tuning problem.
No ratings means no tastes means no honest guess. Do not fake it with the
global average and call it personalised -- it is not.
TRIPWIRE 5: You still need a leash.
A mostly-empty sheet is the easiest thing in the world to overfit. Without
the Chapter 4 penalty on big numbers, predictions in the blanks go wild.
TRIPWIRE 6: This is unsupervised in spirit, supervised in mechanics.
There is no tidy answer column, yet we DO fit to known ratings. It sits on
the border -- learning hidden structure (unsupervised) by predicting observed
values (supervised). Do not box it too hard either way.
TRIPWIRE 7: Humbling SUBTRACTS, and the sign survives.
3 - 4 = -1 means "liked it one star LESS than the crowd" -- information, not
punishment. And nobody "gets a negative for not going": blanks are skipped,
not humbled. Only present ratings move; blanks stay blank.
TRIPWIRE 8: The closeness number never gets written into a cell.
It only ORDERS people -- most honest twin first. What gets copied into the
blank is the chosen person's VALUE. Writing the closeness number itself into
the sheet confuses the ruler with the thing it measured.
TRIPWIRE 9: Verbatim copying poisons the next ranking.
Paste a donor's values into your blanks and your difference to him over those
cells is 0 BY CONSTRUCTION -- the re-ranking now hears its own echo, not
evidence. This killed the hand recipe. The machine survives because every
mark is blended from the whole sheet, never copied from one donor.
TRIPWIRE 10: The rebuild edits EVERY cell -- restore the truth each round.
After one rebuild a true 0.30 comes back 0.28. Rated cells are not the
machine's to edit: keep a mask sheet ("was this one blank?") and put the
truth back after every rebuild. On the working sheet, the eye cannot tell
truth from pencil -- only the mask knows.
TRIPWIRE 11: Hide only RATED cells when grading a knob.
A hidden blank pockets no truth, so there is nothing to grade against later.
The 20% you hide must come from cells that actually hold ratings.
## The Labels, Last
Plain term used above Standard label
----------------------------------- ------------------------------------------
sheet full of holes sparse user-item rating matrix
guess the blanks rating prediction / recommendation
hidden tastes / flavours latent factors
number of hidden tastes (K) latent dimensionality / rank
two skinny sheets user and item factor matrices
matched against (dot-product) the inner product of factor vectors
split the grid into two sheets matrix factorisation
learn from ratings of similar users collaborative filtering
score only the filled cells loss over observed entries only
the leash on big taste numbers L2 regularisation (from Chapter 4)
no ratings yet, no guess the cold-start problem
humble a movie column centring by the item (column) mean
the movie's middle the item mean over observed ratings
the lazy zero / round-0 fill zero imputation (after centring)
the mask sheet the boolean missing-data mask
the machine's rebuild truncated SVD reconstruction
rebuild-restore rounds iterative SVD matrix completion
the marks freeze convergence of the imputation loop
the builtrow and the strengths the rank-1 singular vector and its scores
the knob (how many builtrows) the rank
hide rated cells and grade validation masking / held-out cells
un-humble (add the middle back) un-centring / adding back the item mean
## The Code, If You Want It
Nothing above needed a computer -- only pencils, clerks, and patience. This last
section is for the day you meet one: the same steps, spoken in Python.
>> NEW TO PYTHON? Each named once:
np.where(mask, a, b) -- pick a where mask is true, else b (filled vs blank)
X @ Y -- multiply the two skinny sheets back into a full grid
np.isnan(R) -- find the blank cells (NaN = "not a number" = blank)
np.nanmean(R, axis=0) -- column middles that SKIP the blanks
np.linalg.svd(...) -- the rebuild machinery (Part 2's shadows)
np.allclose(a, b) -- "are these two sheets the same, near enough?"
np.clip(x, 1, 5) -- clamp every number into a range
import numpy as np
# R = ratings grid, np.nan in the blanks
mask = ~np.isnan(R) # True where we HAVE a rating
n_users, n_movies = R.shape
K = 2 # hidden tastes
rng = np.random.RandomState(42)
U = rng.normal(size=(n_users, K)) # user tastes (skinny, full)
V = rng.normal(size=(n_movies, K)) # movie flavours (skinny, full)
lr, leash = 0.01, 0.1
for step in range(2000):
pred = U @ V.T # blow the two sheets back to a full grid
err = np.where(mask, R - pred, 0.0) # miss on FILLED cells only
U += lr * (err @ V - leash * U) # nudge tastes (with the leash)
V += lr * (err.T @ U - leash * V) # nudge flavours
full_grid = U @ V.T # every blank now filled -> recommendations
# ---- the lab's engine: rebuild-restore rounds (no gradient) ----
movie_middles = np.nanmean(R, axis=0) # present values only
C = R - movie_middles # humble; blanks stay NaN
blank = np.isnan(C) # the mask sheet
filled = np.where(blank, 0.0, C) # round-0 lazy zeros
k = 3 # the knob (picked by hiding)
for _ in range(20):
Uu, s, Vt = np.linalg.svd(filled, full_matrices=False)
rebuilt = (Uu[:, :k] * s[:k]) @ Vt[:k] # keep k builtrows
new = np.where(blank, rebuilt, C) # blanks: marks; rated: truth restored
if np.allclose(new, filled): # freeze
break
filled = new
stars = np.clip(filled + movie_middles, 1, 5) # un-humble, clamp into 1..5
# picking k: hide 20% of the RATED cells, run the loop per k in [1,2,3,5,10],
# grade each k's marks against the pocketed truth, keep the smallest error.
# the heavyweight toolbox route (the surprise package: from sklearn / surprise):
# from sklearn.decomposition import TruncatedSVD # factor a filled matrix
# or a dedicated library (e.g. `surprise`) for the holey, regularised case
----------------------------------------------------------------------------------------------
IN THIS CHAPTER (Chapter 6 -- Finding Patterns Without Answers):
Part 1 -- Looking at a Sheet With No Answers .
Part 2 -- The Strongest Direction (PCA) .
Part 3 -- Grouping by Nearest Centre (K-Means) .
Part 4 -- The Family Tree (Hierarchical Clustering) .
Part 5 -- Both Tools on NCI60 (Re-visited) .
Part 6 (this post)
That is the end of the book. For quick lookups, see the
Appendix -- Classification Reference and the
Glossary / Decoder Ring.
<- Back to all posts
----------------------------------------------------------------------------------------------
(c) 2026 Rahul Rai . pure HTML+CSS, no JavaScript, no trackers .
home . source on GitHub
==============================================================================================