The Dial by Hand: Where the Dials Really Come From

==============================================================================================
  RAHUL'S ML BLOG -- notes on machine learning, worked out by hand                    est. 2026
==============================================================================================
----------------------------------------------------------------------------------------------

  CHAPTER 4 . HUMBLE DIALS AND WOBBLE BANDS . PART 3 OF 3
  The Dial by Hand: Where the Dials Really Come From
  Posted: 2026-06-07 . Author: Rahul Rai . Tags: least-squares, collinearity, derivation
  ============================================================================================

  We have leashed the dials and measured their wobble, and through all of it we have leaned
  on a sentence we never fully cashed: "the dials are whatever makes the total miss
  smallest." Fine -- but where do those numbers actually COME from? Not astrology. This
  closing post puts the whole thing on a blank sheet with nothing but a calculator, the way
  Chapter 1, Part 3 first did it -- and then it catches one honest trap that nobody warns
  you about until it is too late.


  ## The Sheet, the Goal

    patient   age   bmi   bp    ...   answer
    ----------------------------------------
     #1      0.04  0.06  0.02  ...    151
     #2     -0.01 -0.05 -0.03  ...     75
     #10      ?     ?     ?    ...    ???    <- guess THIS number

  All I have is the sheet and a calculator, and I want one number for patient #10. The shape
  of the guess is the straight stick, unchanged from the very first chapter:

    guess = (dial1 * age) + (dial2 * bmi) + ... + (dial10 * s6) + nudge
             +-------------- 10 "dirty pieces" --------------+    +- a fixed lift

  Multiply each of #10's ten columns by its dial, add the ten pieces, add the nudge. So the
  whole machine is just eleven numbers: ten dials and one nudge. The only question worth
  this post is where those eleven numbers come from.
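
  A minimal sketch of that multiply-and-add, assuming NumPy. Every number below is made up
  for illustration (these are not the study's dials or patient #10's real columns); the
  only point is the shape of the arithmetic.

```python
import numpy as np

# Hypothetical values, for illustration only: ten dials, one nudge,
# and the ten column entries for a single patient.
dials = np.array([10.0, -5.0, 20.0, 15.0, -8.0, 4.0, -3.0, 6.0, 12.0, 2.0])
nudge = 150.0
patient = np.array([0.04, 0.05, 0.06, 0.02, -0.04, -0.03, -0.04, 0.00, 0.02, -0.02])

pieces = dials * patient          # the ten pieces, one per column
guess = pieces.sum() + nudge      # add the pieces, then add the fixed lift

# The same arithmetic written as a single dot product.
guess_dot = dials @ patient + nudge
print(round(guess, 2), round(guess_dot, 2))
```

  Both lines print the same number, because "multiply each column by its dial and add" is
  exactly what a dot product does.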


  ## What Makes a Dial "Right"

  Not a guess. The dials are the exact set that drives the TOTAL WRONGNESS to its smallest:

    for every study patient:  miss = answer - guess
    total wrongness = add up all (miss^2)      <- square so + and - misses cannot cancel
    the right dials = the ones that push this to its lowest point

    wrongness
        | \              /
        |  \            /
        |   \_        _/
        |     \__  __/
        |        \./          <- bottom of the bowl = the winning dials
        +----------------- dial values

  At the very bottom of that bowl the slope is flat in every dial direction at once. Writing
  "slope = 0" for all ten dials and solving those equations is the whole job. Chapter 1 Part
  3 ground through that solve by hand on three rows; here we just name its answer.
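
  To see the bowl rather than take it on faith, here is a rough sketch with ONE dial on a
  tiny made-up centred sheet (the four people and the helper name total_wrongness are
  assumptions for illustration). It walks a grid of candidate dials, finds the lowest
  point, and checks that the slope there is flat.

```python
import numpy as np

# Four made-up people on a one-column sheet, already centred.
xc = np.array([-0.03, -0.01, 0.01, 0.03])      # column minus its own average
yc = np.array([-12.0, -2.0, 4.0, 10.0])        # answers minus their average

def total_wrongness(dial):
    miss = yc - dial * xc          # miss = answer - guess, for every person
    return np.sum(miss ** 2)       # square each miss, then add them up

# Walk along the bowl on a grid of candidate dials (steps of 0.1).
candidates = np.linspace(0.0, 700.0, 7001)
wrongness = np.array([total_wrongness(d) for d in candidates])
best = candidates[np.argmin(wrongness)]

# Slope of the bowl at a dial: d/d(dial) of sum(miss^2) = -2 * sum(xc * miss).
slope_at_best = -2.0 * np.sum(xc * (yc - best * xc))
print(best, slope_at_best)
```

  The grid search is only there to draw the picture; setting the slope to zero and
  solving gets the same dial directly, which is what the next section names.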


  ## The EXACT Dials: One Line That Untangles Everything

  Stack the sheet as a grid X (people down, columns across) and the answers as a column y.
  The flat-bottom solve collapses to one famous line:

    all 10 dials at once  =  (X^T X)^-1  X^T y

    X^T y       = how each column lines up with the answer
    (X^T X)^-1  = the UNTANGLER -- it fixes columns that move together

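  That second piece is the hero of the post. These are the same dials that
  LinearRegression().fit lands on (the library gets there with a numerically steadier
  least-squares routine, not by forming the inverse literally), and the line is EXACT
  whether the columns overlap or not, so long as no column is an exact copy or exact mix
  of the others (in that case X^T X has no inverse at all).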
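
  A small sketch of that one line, assuming NumPy and a made-up centred sheet of six
  people and two columns that drift together (none of these numbers come from the study).
  It computes the dials three ways and checks the flat-bottom condition directly.

```python
import numpy as np

# A made-up centred sheet: 6 people, 2 columns that move together.
X = np.array([
    [-0.05, -0.04],
    [-0.03, -0.01],
    [-0.01, -0.02],
    [ 0.01,  0.02],
    [ 0.03,  0.01],
    [ 0.05,  0.04],
])
y = np.array([-30.0, -12.0, -10.0, 11.0, 13.0, 28.0])   # answers, also centred

# The famous line, written literally: (X^T X)^-1 X^T y
dials_inverse = np.linalg.inv(X.T @ X) @ X.T @ y

# The same flat-bottom equations, solved without forming the inverse.
dials_solve = np.linalg.solve(X.T @ X, X.T @ y)

# A general least-squares routine should land on the same dials.
dials_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# At the bottom of the bowl, every column is flat against the misses.
miss = y - X @ dials_solve
print(dials_solve)
print(X.T @ miss)        # both entries should be zero up to rounding
```

  All three routes agree on a sheet this small. On large or badly tangled sheets the
  solve and least-squares routes are usually the safer habit, since they avoid building
  the inverse explicitly.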


  ## The LADDER: A Crutch That Looks Like the Answer

  There is a tempting shortcut for a single dial -- just measure how that one column and the
  answer move together, on their own:

    LADDER:  one dial  ~=  sum( (x - xbar)(answer - ybar) )  /  sum( (x - xbar)^2 )
                           +-- how the column and answer move together --+ / +- column wiggle -+

  A concrete 4-person worked example, by pencil:

    person   bmi (x)   answer (y)   x - xbar   y - ybar   (x-xbar)(y-ybar)   (x-xbar)^2
    --------------------------------------------------------------------------------------
      A       0.04        97        -0.06       -29          1.74             0.0036
      B       0.06       121        -0.04        -5          0.20             0.0016
      C       0.12       135         0.02         9          0.18             0.0004
      D       0.18       151         0.08        25          2.00             0.0064
    --------------------------------------------------------------------------------------
     sum      0.40       504         0.00         0          4.12             0.0120

    xbar = (0.04+0.06+0.12+0.18)/4 = 0.40/4 = 0.10
    ybar = (97+121+135+151)/4 = 504/4 = 126

    "x - xbar" means: subtract the column's own average (0.10) from each person's bmi.
    For person A: 0.04 - 0.10 = -0.06.  For B: 0.06 - 0.10 = -0.04.  And so on.

    "y - ybar" means: subtract the answer's average (126) from each person's answer.
    For person A: 97 - 126 = -29.  For B: 121 - 126 = -5.  And so on.

    Pencil check: each deviation column must add up to zero.
      x - xbar:  -0.06 - 0.04 + 0.02 + 0.08 = 0
      y - ybar:  -29 - 5 + 9 + 25 = 0
    If either sum is not zero, the average was slipped somewhere.

    Multiply (x-xbar) by (y-ybar) for each person:
      A: -0.06 * -29 = 1.74    (minus * minus = plus)
      B: -0.04 * -5  = 0.20
      C:  0.02 *  9  = 0.18
      D:  0.08 * 25  = 2.00
    sum = 1.74 + 0.20 + 0.18 + 2.00 = 4.12   <- this is the "top"

    Square (x-xbar) for each person:
      A: (-0.06)^2 = -0.06 * -0.06 = 0.0036
      B: (-0.04)^2 = -0.04 * -0.04 = 0.0016
      C: (0.02)^2  =  0.02 *  0.02 = 0.0004
      D: (0.08)^2  =  0.08 *  0.08 = 0.0064
    sum = 0.0036 + 0.0016 + 0.0004 + 0.0064 = 0.0120   <- this is the "bottom"

    LADDER dial = 4.12 / 0.0120 ~ 343

  >> YOUR TURN
     One person on a one-column sheet (made-up) sits at x - xbar = 0.05 and
     y - ybar = 20.  Give this person's contribution to the "top" and to the "bottom".

     check your slate:  top piece = (x-xbar) x (y-ybar) = 0.05 x 20 = 1.0;  bottom
     piece = (x-xbar)^2 = 0.05 x 0.05 = 0.0025.  Sum the top pieces over everyone for
     the LADDER's numerator, the bottom pieces for its denominator.

    That is the bmi dial IF bmi were the ONLY column.  It lands in the hundreds
    because the bmi numbers are tiny (they span just 0.14, from 0.04 to 0.18)
    while the answers span 54 (from 97 to 151): the dial has to stretch a small
    wiggle into a big one.  Read it at the scale of the data: a rise of 0.01 in
    bmi lifts the guess by a little over 3.

    But this is a LIE if any other column moves together with bmi.  The real
    dial is then DIFFERENT, often smaller, and it can even flip sign, because
    the untangler (X^T X)^-1 splits the credit among all columns.  The LADDER
    gives all the credit to bmi; the EXACT gives each column only its own share.
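
  The pencil steps can be replayed in a few lines, as a sketch assuming NumPy. It uses the
  four people from the table (bmi and answer columns as printed there), includes the
  "deviations must sum to zero" check, and cross-checks the result against NumPy's own
  straight-line fit.

```python
import numpy as np

# The four people from the pencil example: bmi (x) and answer (y).
x = np.array([0.04, 0.06, 0.12, 0.18])
y = np.array([97.0, 121.0, 135.0, 151.0])

dx = x - x.mean()                 # "x - xbar" for each person
dy = y - y.mean()                 # "y - ybar" for each person

# Pencil check: deviations from a correct average always sum to zero.
assert np.isclose(dx.sum(), 0.0) and np.isclose(dy.sum(), 0.0)

top = np.sum(dx * dy)             # how the column and the answer move together
bottom = np.sum(dx ** 2)          # the column's own wiggle
ladder_dial = top / bottom

# Cross-check: a degree-1 polyfit returns (slope, intercept) for one column.
slope, intercept = np.polyfit(x, y, 1)
print(top, bottom, ladder_dial, slope)
```

  The sum-to-zero assertion is the cheapest guard there is: if the average is off by even
  a little, the "bottom" comes out wrong and drags the dial with it.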

  It is a lovely first picture of one column in isolation. It is also a TRAP, and here is
  the exact size of the trap:

  !! WARN: THE LADDER EQUALS THE REAL DIAL ONLY WHEN COLUMNS DON'T OVERLAP
     The per-column slope above matches the true dial ONLY when that column overlaps no
     other column. Real columns almost always overlap -- bmi and bp and the blood serums
     all drift together -- so these one-at-a-time numbers come out WRONG. The true dials
     need the untangler (X^T X)^-1 to share the credit between columns that move as a
     pack. We met the symptom in Chapter 2, Part 2: when columns travel together, a lone
     dial gets shaky and can even flip sign. The LADDER is that shakiness baked right in.
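
  Here is the size of the trap on two made-up centred sheets, as a sketch assuming NumPy
  (the helper names ladder and exact, and all the numbers, are illustrative assumptions).
  In the first sheet the two columns drift together; in the second their overlap is zero.

```python
import numpy as np

def ladder(x, y):
    # One column alone: sum(x*y) / sum(x*x), valid on a centred column.
    return np.sum(x * y) / np.sum(x * x)

def exact(X, y):
    # All dials at once, from the flat-bottom equations.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Made-up centred answers for six people.
y = np.array([-30.0, -12.0, -10.0, 11.0, 13.0, 28.0])

# Case 1: two centred columns that drift together (they overlap).
X_overlap = np.array([
    [-0.05, -0.04], [-0.03, -0.01], [-0.01, -0.02],
    [ 0.01,  0.02], [ 0.03,  0.01], [ 0.05,  0.04],
])

# Case 2: two centred columns with zero overlap (their products sum to zero).
X_apart = np.array([
    [-0.05,  0.02], [-0.03, -0.01], [-0.01, -0.01],
    [ 0.01, -0.01], [ 0.03, -0.01], [ 0.05,  0.02],
])

for name, X in [("overlap", X_overlap), ("apart", X_apart)]:
    ladders = np.array([ladder(X[:, j], y) for j in range(X.shape[1])])
    print(name, "| overlap:", X[:, 0] @ X[:, 1])
    print("  LADDER:", ladders)
    print("  EXACT: ", exact(X, y))
```

  On the zero-overlap sheet the LADDER and EXACT dials agree; on the overlapping sheet
  they do not. In this particular made-up case the EXACT dials come out smaller, but that
  direction is not guaranteed in general.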

  So say it out loud, every time, BEFORE building anything on a number:

    LADDER = a nice first picture of ONE dial, alone, ignoring the others
    EXACT  = (X^T X)^-1 X^T y, all dials at once, untangling the overlap   <- the real machine

  The teaching trap is to show the LADDER, call it "the dial," and bury the "only if columns
  don't overlap" as small print after you have already built on it. Don't be fooled, and
  don't fool anyone else: name it EXACT or LADDER up front.


  IN HAND: the bmi LADDER dial, worked from top 4.12 over bottom 0.0120 ~ 343 -- the
  credit bmi grabs if it stands alone -- with the warning that the EXACT dials need the
  untangler (X^T X)^-1 to share credit among overlapping columns.  This section adds the
  eleventh number, the nudge.

  ## The Nudge -- and Its Quiet Asterisk

  The fixed lift looks like it should be just as fiddly. On a centred sheet it is almost
  insultingly simple:

    nudge = the average of the study answers

  >> YOUR TURN
     On a centred sheet, four study answers (made-up) are 100, 120, 140, 160.  What is
     the nudge?

     check your slate:  nudge = (100 + 120 + 140 + 160)/4 = 520/4 = 130.  On a centred
     sheet the dials contribute nothing on average, so the only lift left to add is the
     answers' own average -- 130.

  But that tidy fact rests on a condition that is easy to forget:

  !! WARN: "NUDGE = AVERAGE OF ANSWERS" IS EXACT ONLY ON CENTRED COLUMNS
     It holds because every column on this sheet is already CENTRED -- its own average sits
     at zero, so on average the ten dirty pieces contribute nothing, and the only lift left
     to add is the answer's average. This is NOT a universal law. Un-centre the columns and
     the nudge changes. (It is the reason every chapter puts the columns on one ruler first.)
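
  A quick sketch of that asterisk, assuming NumPy and a made-up raw sheet of five people
  and two columns (the helper fit_with_nudge is hypothetical, written here only for the
  demonstration). It fits the same sheet twice, once centred and once raw.

```python
import numpy as np

# Made-up raw sheet: 5 people, 2 columns that are NOT yet centred.
X_raw = np.array([
    [20.0, 1.0],
    [25.0, 3.0],
    [30.0, 2.0],
    [35.0, 5.0],
    [40.0, 4.0],
])
y = np.array([100.0, 120.0, 125.0, 150.0, 155.0])

def fit_with_nudge(X, y):
    # Append a column of ones so the least-squares solve finds the nudge too.
    ones = np.ones((X.shape[0], 1))
    solution, *_ = np.linalg.lstsq(np.hstack([X, ones]), y, rcond=None)
    return solution[:-1], solution[-1]         # (dials, nudge)

X_centred = X_raw - X_raw.mean(axis=0)         # every column now averages zero

dials_c, nudge_c = fit_with_nudge(X_centred, y)
dials_r, nudge_r = fit_with_nudge(X_raw, y)

# Centring moves the nudge but leaves the dials alone.
# In general: nudge = mean(y) - sum(dial * column average).
print("centred:", dials_c, nudge_c, "| mean of answers:", y.mean())
print("raw:    ", dials_r, nudge_r,
      "| mean(y) - dials . column averages:", y.mean() - dials_r @ X_raw.mean(axis=0))
```

  On the centred sheet the column averages are all zero, so the general rule collapses to
  "nudge = average of the answers". On the raw sheet the same rule gives a different
  number.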


  ## Guess, Then Grade

  With eleven numbers in hand, the rest is the old ritual from Chapter 2:

    guess(#10) = sum( dial * #10's column ) + nudge        <- one number
    miss       = answer - guess
    RMSE       = sqrt( mean( miss^2 ) )      miss -> square -> mean -> root
    R^2        = (total wobble - leftover) / total wobble
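
  The grading ritual in code, as a minimal sketch assuming NumPy. The five answers and
  guesses below are made up for illustration and are not results from the study sheet.

```python
import numpy as np

# Made-up answers and guesses for five patients.
answer = np.array([151.0, 75.0, 141.0, 206.0, 135.0])
guess = np.array([160.0, 90.0, 130.0, 190.0, 140.0])

miss = answer - guess
rmse = np.sqrt(np.mean(miss ** 2))              # miss -> square -> mean -> root

leftover = np.sum(miss ** 2)                    # wobble the guesses failed to explain
total_wobble = np.sum((answer - answer.mean()) ** 2)
r2 = (total_wobble - leftover) / total_wobble   # same as 1 - leftover / total_wobble

print(round(rmse, 2), round(r2, 3))
```

  RMSE reads in the answer's own units (a typical miss), while R^2 is unitless: the share
  of the answers' wobble that the guesses account for.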


  ## The Honest One Page

    guess  = sum(dial * column) + nudge
    dials  = (X^T X)^-1 X^T y           <- EXACT (untangles overlapping columns)
    slope  = sum((x-xbar)(y-ybar)) / sum((x-xbar)^2)   <- LADDER (only if columns don't overlap)
    nudge  = mean of the answers        <- EXACT only because the columns are centred
    grade  = miss -> square -> mean -> root (RMSE) ;  R^2 = 1 - leftover/wobble

  That is the engine under everything in this book: the same eleven numbers, found by the
  same flat-bottom solve, whether we then leash them (Part 1) or measure their wobble (Part
  2). Name your crutches out loud, keep your columns on one ruler, and the machine has no
  black boxes left in it.


  ## The Labels, Last

    Plain term used above                 Standard label
    -----------------------------------   ------------------------------------------
    total wrongness                       residual sum of squares / squared loss
    the flat-bottom solve                 the normal equations
    (X^T X)^-1 X^T y                      the ordinary-least-squares (OLS) solution
    the untangler                         the inverse of the Gram matrix X^T X
    the LADDER (one column alone)         the univariate / simple-regression slope
    columns that move together            collinearity / multicollinearity
    the fixed lift                        the intercept / bias term
    centred columns                       mean-centred / zero-mean features


  ## The Code, If You Want It

  Nothing above needed a computer -- only pencils, clerks, and patience.  This last
  section is for the day you meet one: the same steps, spoken in Python.

  The toolbox hides all of this behind one line; the by-hand version is two. Worth running
  both once, to see they agree -- and to see the LADDER disagree the moment columns overlap.

  >> NEW TO PYTHON? Each named once:
       Xc, y               -- Xc = X_train_scaled (zero-mean columns); y = y_train
       np.linalg.inv(M)    -- the inverse of a grid M (the untangler)
       A @ B               -- multiply two grids together (matrix multiply)
       X.T                 -- flip a grid on its side (transpose, the "X^T" above)

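    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Xc and y are assumed to exist already, as named in the box above:
    # Xc = X_train_scaled (zero-mean columns), y = y_train.

    # the toolbox -- one line, exact, untangling and all
    # (fit_intercept=False so it solves the same nudge-free problem as the by-hand line)
    dials = LinearRegression(fit_intercept=False).fit(Xc, y).coef_

    # the same thing by hand -- the flat-bottom solve, spelled out
    dials_by_hand = np.linalg.inv(Xc.T @ Xc) @ Xc.T @ y     # (X^T X)^-1 X^T y
    nudge_by_hand = np.mean(y)                              # exact only because Xc is centred
    print(np.allclose(dials, dials_by_hand))                # the two should agree

    # the LADDER for one column, alone -- matches ONLY if columns don't overlap
    # (column 1 is bmi on the sheet as drawn above; check your own column order)
    x = Xc[:, 1]                                            # e.g. the bmi column, centred
    ladder_bmi = np.sum(x * y) / np.sum(x * x)              # sum(x*y) / sum(x*x) on centred x
    print(ladder_bmi, dials_by_hand[1])                     # these differ when columns overlap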


----------------------------------------------------------------------------------------------
  IN THIS CHAPTER (Chapter 4 -- Humble Dials and Wobble Bands):
    Part 1 -- The Leash .
    Part 2 -- One Dial Is a Lie .
    Part 3 (this post)
----------------------------------------------------------------------------------------------
  (c) 2026 Rahul Rai . pure HTML+CSS, no JavaScript, no trackers .
==============================================================================================