==============================================================================================
  RAHUL'S ML BLOG -- notes on machine learning, worked out by hand                    est. 2026
==============================================================================================

  CHAPTER 3 . SORTING INTO BINS . PART 2 OF 4
  The Trade Curve: Sliding the Cutoff and What AUC Actually Measures
  Posted: 2026-06-05 . Author: Rahul Rai . Tags: roc-curve, auc, cutoff, evaluation
  ============================================================================================

  Every score in the last post quietly agreed on one thing: draw the line at 0.5. But who
  said 0.5? It is just one opinion about where "probably well" ends and "probably sick"
  begins -- and it is an opinion you are free to overrule.

  Reach out and slide that line. Push it left and the machine turns anxious, crying "sick"
  at the faintest shadow -- it catches more real cases, but startles more healthy people
  with false alarms. Push it right and the machine turns stoic, holding its tongue until
  it is almost certain -- fewer false alarms, but more real cases slip quietly out the
  door. Nowhere on that line is there a free lunch: every spot you pick is a different
  bargain struck between lives and money. The trade curve is what you get when you stop
  arguing over a single spot and lay all the bargains out side by side, in one picture.


  IN HAND: machine gives each lump a score 0-to-1; draw a line (the cutoff) somewhere on
  that scale. Four boxes: CAUGHT (truly sick, called sick), FALSE ALARM (truly well, called
  sick), MISSED (truly sick, called well), CLEARED (truly well, called well).
  This section measures the two rates those boxes produce as the cutoff slides.

  ## Two Rates

  At any fixed cutoff, the four boxes yield two fractions:

    catch-rate   (TPR) = CAUGHT / all truly sick   ^ good
    false-alarm  (FPR) = ALARM  / all truly well   v good

  Formally (Y = true verdict: 1 sick, 0 well; yhat = machine's called verdict):

      TPR(t) = P(yhat=1 | Y=1) = TP(t) / (TP(t) + FN(t))
      FPR(t) = P(yhat=1 | Y=0) = FP(t) / (FP(t) + TN(t))

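  TPR is the same as recall. FPR is the fraction of healthy lumps that get misread as
  sick. Moving the cutoff shifts both in the same direction: lowering it can only raise
  (or hold) both rates, and raising it can only lower (or hold) both. You cannot push
  the catch-rate up and the false-alarm rate down with the same move.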
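
  The two fractions above can be tallied in a few lines. This is a minimal sketch, not
  the post's own code: the function name rates_at_cutoff and the five-lump pile are made
  up for illustration, a lump is called sick when its score is >= the cutoff, and both
  groups are assumed non-empty.

```python
def rates_at_cutoff(scores, truths, cutoff):
    """Return (TPR, FPR) when every score >= cutoff is called sick."""
    caught = missed = alarm = clear = 0
    for score, truth in zip(scores, truths):
        called_sick = score >= cutoff
        if truth == 1 and called_sick:
            caught += 1      # truly sick, called sick  (TP)
        elif truth == 1:
            missed += 1      # truly sick, called well  (FN)
        elif called_sick:
            alarm += 1       # truly well, called sick  (FP)
        else:
            clear += 1       # truly well, called well  (TN)
    tpr = caught / (caught + missed)
    fpr = alarm / (alarm + clear)
    return tpr, fpr


# Made-up pile: 2 sick, 3 well (hypothetical numbers, not from the exam data)
scores = [0.9, 0.4, 0.7, 0.3, 0.1]
truths = [1, 1, 0, 0, 0]

for cutoff in (0.8, 0.5, 0.35):
    tpr, fpr = rates_at_cutoff(scores, truths, cutoff)
    print(f"cutoff {cutoff:.2f}  TPR {tpr:.3f}  FPR {fpr:.3f}")
```

  Lowering the cutoff from 0.8 to 0.35 never lowers either rate: the catch-rate goes
  0.5, 0.5, 1.0 while the false-alarm rate goes 0.0, 0.333, 0.333.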


  ## Sliding the Cutoff

    chance 0.0 --------------- 1.0
                     ^ cutoff (slide)

    slide LEFT  (cutoff ~= 0.1): shout sick at almost everyone
        catch-rate   ^   (few sick lumps escape)
        false-alarm  ^   (healthy people caught in the net too)

    slide RIGHT (cutoff ~= 0.9): shout sick only when bone-sure
        catch-rate   v   (many sick lumps called well and sent home)
        false-alarm  v   (healthy people mostly called clear)

    -> a TRADE: cannot raise catches without raising false alarms

  There is no cutoff that maximises TPR while holding FPR at zero -- unless the machine
  perfectly separates the two groups. What exists is a continuous family of deals.


  ## Drawing All the Deals at Once

  Sweep the cutoff from 1 down to 0. At each position, compute (FPR, TPR) and drop a dot at
  those coordinates. Connect the dots: that is the trade curve, or ROC curve.

    TPR (catch-rate, lives saved)
     1 | * <- perfect machine (catch all, alarm none)
       |  ###
       |     ##
       |  - - - - \  <- diagonal = useless coin flip
     0 +--------------- FPR (false-alarm, money wasted)
       0              1

    each dot = one cutoff position
    perfect machine: hugs the top-left corner
    coin flip: follows the diagonal

  The curve is the set of dots traced out as the cutoff t moves:
  {(FPR(t), TPR(t)) : t in [0, 1]}. As t -> 0, everyone is called sick -> TPR=1, FPR=1
  (top-right). Once t rises above the highest score, everyone is called well -> TPR=0,
  FPR=0 (bottom-left).

  A concrete 6-person exam pile, by pencil:

    exam pile:  3 sick (truth=1), 3 well (truth=0)

    lump   truth   machine chance   next cutoff above its chance
    -------------------------------------------------------------
      A    sick    0.97     -> cutoff 0.99: below -> well   FN
      B    sick    0.88     -> cutoff 0.90: below -> well   FN
      C    sick    0.72     -> cutoff 0.80: below -> well   FN
      D    well    0.45     -> cutoff 0.60: below -> well   TN
      E    well    0.22     -> cutoff 0.40: below -> well   TN
      F    well    0.11     -> cutoff 0.20: below -> well   TN

    Sweep the cutoff through 7 positions (the six above, plus 0.00) and compute
    (FPR, TPR). A lump is called sick when its chance is >= the cutoff:

      cutoff   >=cut?      CAUGHT  MISSED  ALARM  CLEAR   TPR     FPR
      --------------------------------------------------------------------
      0.99     none        0/3     3/3     0/3    3/3     0.000   0.000
      0.90     A           1/3     2/3     0/3    3/3     0.333   0.000
      0.80     A,B         2/3     1/3     0/3    3/3     0.667   0.000
      0.60     A,B,C       3/3     0/3     0/3    3/3     1.000   0.000
      0.40     A,B,C,D     3/3     0/3     1/3    2/3     1.000   0.333
      0.20     A,B,C,D,E   3/3     0/3     2/3    1/3     1.000   0.667
      0.00     all         3/3     0/3     3/3    0/3     1.000   1.000

    TPR = CAUGHT / (CAUGHT + MISSED).  FPR = ALARM / (ALARM + CLEAR).
    At cutoff 0.90: 1 sick caught, 2 missed -> TPR=1/3=0.333;
    0 alarms -> FPR=0/3=0.000.  At cutoff 0.40: all 3 sick caught +
    1 well alarmed -> TPR=3/3=1.000, FPR=1/3=0.333.

    Plot the 7 dots: (0.000,0.000), (0.000,0.333), (0.000,0.667),
    (0.000,1.000), (0.333,1.000), (0.667,1.000), (1.000,1.000).
    The curve hugs the left edge (FPR=0 for the first 4 cutoffs),
    then bends right.  The curve is ABOVE the diagonal -- the machine
    is doing real work.
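
  The pencil sweep above can be replayed as a short loop. A minimal sketch using the
  same six lumps and the same cutoff positions; the variable names are made up for
  illustration, and a lump counts as called sick when its chance is >= the cutoff.

```python
scores = [0.97, 0.88, 0.72, 0.45, 0.22, 0.11]   # lumps A..F
truths = [1, 1, 1, 0, 0, 0]                       # 1 = sick, 0 = well
cutoffs = [0.99, 0.90, 0.80, 0.60, 0.40, 0.20, 0.00]

n_sick = sum(truths)
n_well = len(truths) - n_sick

dots = []
for cut in cutoffs:
    # Count sick lumps at or above the cutoff (CAUGHT) and well lumps at or above it (ALARM)
    caught = sum(1 for s, y in zip(scores, truths) if y == 1 and s >= cut)
    alarm = sum(1 for s, y in zip(scores, truths) if y == 0 and s >= cut)
    dots.append((alarm / n_well, caught / n_sick))   # one (FPR, TPR) dot per cutoff

for cut, (fpr, tpr) in zip(cutoffs, dots):
    print(f"cutoff {cut:.2f}  FPR {fpr:.3f}  TPR {tpr:.3f}")
```

  The first four dots share FPR = 0 (the climb up the left edge) and the last three
  share TPR = 1 (the run along the top).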

  >> YOUR TURN
     At some cutoff (made-up) the four-box holds, out of 4 sick and 6 well: CAUGHT 3,
     MISSED 1, ALARM 2, CLEAR 4.  Work the catch-rate and false-alarm rate.

     check your slate:  TPR = CAUGHT / (CAUGHT + MISSED) = 3 / (3 + 1) = 3/4 = 0.75;
     FPR = ALARM / (ALARM + CLEAR) = 2 / (2 + 4) = 2/6 ~= 0.333.  One dot on the
     trade curve at (0.333, 0.75) -- three sick in four caught, one well in three
     falsely alarmed.

  Count the sweep in clerk-steps: n lumps give n + 1 cutoff positions, and each
  position re-tallies the four boxes over all n lumps.  On the Wisconsin exam of 569
  lumps that is 570 x 569 = 324,330 comparisons to draw the whole curve -- a morning
  for the clerks, an afternoon's nap for the kettle.


  ## The Area Under the Curve

  IN HAND: a 6-person exam swept through 7 cutoffs into (FPR, TPR) pairs that climbed
  the left edge -- (0,0.333), (0,0.667), (0,1.0) -- then bent right along the top.
  This section crushes that whole staircase into one number.

  The whole curve collapses into one number: the area under it, AUC. A perfect machine
  that hugs the top-left corner has area 1.0. A coin flip that follows the diagonal has
  area 0.5. A machine whose scores are systematically backwards has area below 0.5.

    AUC = 1.0   perfect -- no overlap between sick and well scores
    AUC = 0.5   useless -- sick and well scores interleaved randomly
    AUC < 0.5   backwards -- sick scores lower than well scores

  AUC is computed from the curve points by the trapezoidal rule.

  >> YOUR TURN
     Two neighbouring ROC dots (made-up): (FPR 0.2, TPR 0.6) then (FPR 0.5, TPR 0.9).
     Work the area of the trapezoid strip between them: width x average height.

     check your slate:  width = 0.5 - 0.2 = 0.3;  average height = (0.6 + 0.9)/2 =
     1.5/2 = 0.75;  strip = 0.3 x 0.75 = 0.225.  Add up every such strip across the
     sweep and you have the whole area under the curve.

  From the 7-cutoff sweep above, by pencil:

      drop       FPR      TPR     (FPR_next - FPR) x (TPR_next + TPR) / 2
      ----------------------------------------------------------------------
       1         0.000    0.000   (0.000 - 0.000) x (0.333 + 0.000) / 2  = 0.000
       2         0.000    0.333   (0.000 - 0.000) x (0.667 + 0.333) / 2  = 0.000
       3         0.000    0.667   (0.000 - 0.000) x (1.000 + 0.667) / 2  = 0.000
       4         0.000    1.000   (0.333 - 0.000) x (1.000 + 1.000) / 2  = 0.333
       5         0.333    1.000   (0.667 - 0.333) x (1.000 + 1.000) / 2  = 0.333
       6         0.667    1.000   (1.000 - 0.667) x (1.000 + 1.000) / 2  = 0.333
       --
       total AUC = 0.000 + 0.000 + 0.000 + 1/3 + 1/3 + 1/3 = 1.000   (each 0.333 is 1/3, rounded)

  The AUC is 1.0 -- a perfect ranking.  Indeed the machine sorted all 3 sick chances
  (0.97, 0.88, 0.72) above all 3 well chances (0.45, 0.22, 0.11).  For ANY cutoff
  between 0.72 and 0.45, the machine catches all sick and alarms none.
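
  The strip-by-strip sum is short enough to write out. A minimal sketch of the
  trapezoidal rule over (FPR, TPR) dots; the function name trapezoid_auc is made up
  here, and it assumes the dots come from one sweep, so sorting by FPR and then TPR
  puts them in curve order.

```python
def trapezoid_auc(dots):
    """Sum width x average height over neighbouring (FPR, TPR) dots."""
    dots = sorted(dots)                       # left to right, then bottom to top
    area = 0.0
    for (fpr0, tpr0), (fpr1, tpr1) in zip(dots, dots[1:]):
        area += (fpr1 - fpr0) * (tpr0 + tpr1) / 2
    return area


# The seven dots from the six-lump sweep
dots = [(0, 0), (0, 1/3), (0, 2/3), (0, 1), (1/3, 1), (2/3, 1), (1, 1)]
area = trapezoid_auc(dots)
print(f"AUC = {area:.3f}")

# A single strip between two made-up dots: width 0.3 x average height 0.75
strip = trapezoid_auc([(0.2, 0.6), (0.5, 0.9)])
print(f"strip = {strip:.3f}")
```

  The three vertical steps at FPR = 0 have zero width and add nothing; the three
  strips along the top add 1/3 each.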


  ## What AUC Actually Measures

  Areas under curves are easy to compute and hard to feel. So here is the same number
  told as a tiny story you can actually picture -- call it the pick-higher reading:

      AUC = P( score(x+) > score(x-) )

  Pick one sick lump at random and one well lump at random. What is the probability that
  the machine gave the sick lump a higher chance-score than the well lump? That is AUC.

    GOOD machine -- scores:
      well lumps ->  ####                       (all low)
      sick lumps ->             ####            (all high)
      -> almost always: sick score > well score -> AUC ~= 1.0

    BAD machine -- scores:
      well lumps ->   # # #  #  #  #
      sick lumps ->  #  # #   #  #  #
      -> roughly half the time sick scores below well -> AUC ~= 0.5

  AUC = 0.97 means: take a random sick lump and a random well lump; 97 times out of 100
  the machine scored the sick lump higher. It answers "how cleanly do the two groups
  separate?" without committing to any one cutoff.
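
  The pick-higher reading can be checked by brute force: try every (sick, well) pair
  and count the wins. A minimal sketch; the function name pick_higher_auc and the
  second, "mixed" pile are made up for illustration, and a tie is counted as half a
  win, which is the usual convention for this statistic.

```python
def pick_higher_auc(sick_scores, well_scores):
    """Fraction of (sick, well) pairs in which the sick lump scored higher."""
    wins = 0.0
    for s in sick_scores:
        for w in well_scores:
            if s > w:
                wins += 1.0
            elif s == w:
                wins += 0.5      # assumption: a tie counts as half a win
    return wins / (len(sick_scores) * len(well_scores))


# The six-lump pile: every sick chance sits above every well chance
clean = pick_higher_auc([0.97, 0.88, 0.72], [0.45, 0.22, 0.11])

# Made-up variant: one sick lump (0.40) falls below one well lump (0.45)
mixed = pick_higher_auc([0.97, 0.40, 0.72], [0.45, 0.22, 0.11])

print(f"clean pile: {clean:.3f}")   # 9 of 9 pairs won
print(f"mixed pile: {mixed:.3f}")   # 8 of 9 pairs won
```

  One misordered pair out of nine is enough to pull the number from 1.000 down to
  0.889.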

  ** KEY: AUC IS CUTOFF-INDEPENDENT
     The four scores from Part 1 -- accuracy, recall, precision, F1 -- all answer "how
     good is this machine at threshold 0.5?" AUC answers "how good is this machine's
     RANKING of sick chances above well chances, everywhere, always?" It is the right score
     when you want to compare two machines before deciding where to set the cutoff.


  ## Why You Need Chances, Not Labels

  The ROC sweep needs a continuous score to sweep through. predict returns only 0 or 1 --
  two unique scores, so the sweep yields a single dot between the two corners and the
  curve is a crude two-segment line. predict_proba returns the raw chance output before
  any cutoff, typically giving a different number for every row.

    predict(X_test_scaled)        -> [1, 0, 1, 1, 0, ...]   (0 or 1 only)
    predict_proba(X_test_scaled)  -> [[0.03, 0.97],          (well-chance, sick-chance)
                                       [0.81, 0.19],           per row
                                       ...]

  predict_proba returns two columns per row, in the order of the machine's classes_
  attribute. With the bins coded 0 = well and 1 = sick, column 0 is the well-chance P(B)
  and column 1 is the sick-chance P(M). For the ROC curve you want the sick-chance --
  column 1 -- and the code that sweeps it into a curve is at the end of the post.
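
  To see why labels starve the sweep, count the dots each input produces. A minimal
  sketch in plain Python with no library calls; the helper roc_dots and the six
  chances are made up for illustration, and the labels are simply those chances cut
  at 0.5.

```python
def roc_dots(scores, truths):
    """One (FPR, TPR) dot per distinct score, plus the (0, 0) corner."""
    n_sick = sum(truths)
    n_well = len(truths) - n_sick
    dots = [(0.0, 0.0)]                          # cutoff above every score
    for cut in sorted(set(scores), reverse=True):
        caught = sum(1 for s, y in zip(scores, truths) if y == 1 and s >= cut)
        alarm = sum(1 for s, y in zip(scores, truths) if y == 0 and s >= cut)
        dots.append((alarm / n_well, caught / n_sick))
    return dots


truths = [1, 1, 1, 0, 0, 0]
chances = [0.97, 0.88, 0.42, 0.45, 0.22, 0.11]     # made-up sick-chances
labels = [1 if c >= 0.5 else 0 for c in chances]   # what a 0.5 cutoff leaves behind

dots_from_chances = roc_dots(chances, truths)
dots_from_labels = roc_dots(labels, truths)

print(len(dots_from_chances), "dots from chances")   # 7
print(len(dots_from_labels), "dots from labels")     # 3
```

  The labels keep only the two corners and one dot in between; everything the
  chances knew about the ordering inside each group is gone.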


  ## Plotting It

  When you plot it (code at the end), the grey 45-degree diagonal is the reference: a
  machine that assigns random scores sits right on it. Any curve above the diagonal beats
  random, and the further it bulges toward the top-left corner, the cleaner the separation.


  ## Practical Reading

    AUC ~= 0.99   the two clouds are cleanly separated; almost any cutoff works
    AUC ~= 0.85   good separation; choice of cutoff matters; tune recall vs precision
    AUC ~= 0.70   noisy; weak signal; more columns or a different machine might help
    AUC ~= 0.50   no signal at all; the 30 columns carry no information about the bin

  >> NOTE: THE TRADE CURVE AND AUC DO NOT PICK A CUTOFF FOR YOU
     They tell you how well-separated the two groups are. The choice of cutoff -- how much
     ALARM is acceptable to buy a given catch-rate -- is a clinical or business decision.
     In cancer screening, a doctor reading a curve of AUC 0.97 might pick the cutoff that
     gives 99% recall even if FPR climbs to 15%. The curve shows the available deals; the
     doctor picks one.
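
  Once someone has named the catch-rate they need, finding the matching deal is a
  lookup. A minimal sketch of one possible rule, not a recommendation: take the
  strictest cutoff whose catch-rate reaches the target and report the false-alarm
  rate that comes with it. The function name cutoff_for_recall and the eight-lump
  pile are made up for illustration.

```python
def cutoff_for_recall(scores, truths, target_tpr):
    """Highest cutoff whose catch-rate reaches target_tpr, as (cutoff, TPR, FPR)."""
    n_sick = sum(truths)
    n_well = len(truths) - n_sick
    for cut in sorted(set(scores), reverse=True):      # strictest cutoff first
        caught = sum(1 for s, y in zip(scores, truths) if y == 1 and s >= cut)
        alarm = sum(1 for s, y in zip(scores, truths) if y == 0 and s >= cut)
        tpr, fpr = caught / n_sick, alarm / n_well
        if tpr >= target_tpr:
            return cut, tpr, fpr
    return None   # only reached if target_tpr is above 1


# Made-up pile: 4 sick, 4 well, with some overlap
scores = [0.95, 0.80, 0.60, 0.35, 0.70, 0.40, 0.30, 0.10]
truths = [1, 1, 1, 1, 0, 0, 0, 0]

modest = cutoff_for_recall(scores, truths, 0.75)
strict = cutoff_for_recall(scores, truths, 1.00)
print("target 0.75:", modest)   # (0.6, 0.75, 0.25)
print("target 1.00:", strict)   # (0.35, 1.0, 0.5)
```

  Catching the last sick lump doubles the false-alarm rate in this pile, from 0.25
  to 0.5. Whether that price is worth paying is the decision the curve leaves open.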


  ## The Code, If You Want It

  Nothing above needed a computer -- only pencils, clerks, and patience.  This last
  section is for the day you meet one: the same steps, spoken in Python.

  The ROC sweep needs a continuous score, not a 0/1 label -- so grab predict_proba's
  column 1 (the sick-chance) and feed it to roc_curve. Then plot the curve against the
  diagonal.

  >> NEW TO PYTHON? Each named once:
       array[:, 1]        -- take column 1 from every row (slicing a 2-D array)
       a, b, c = func()   -- unpack several returned values into separate names
       f'AUC = {x:.3f}'   -- an f-string: drop a value into text, here to 3 decimals

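    from sklearn.metrics import roc_curve, auc

    # assumes log_reg (fitted machine), X_test_scaled and y_test (exam pile) already exist
    y_proba = log_reg.predict_proba(X_test_scaled)[:, 1]   # sick-chance per row
    fpr, tpr, thresholds = roc_curve(y_test, y_proba)      # one (FPR, TPR) per cutoff
    roc_auc = auc(fpr, tpr)                                # trapezoidal area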

  !! WARN: COLUMN 0 FLIPS THE CURVE BELOW THE DIAGONAL
     If you pass column 0 (well-chances), sick lumps score LOW and well lumps score HIGH --
     the ranking is backwards. The ROC curve dips below the diagonal and AUC < 0.5. The
     machine looks worse than a coin flip. Always use column 1.
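
  The flip can be seen on a toy pile without any library. A minimal sketch: the
  two-column rows below are made-up stand-ins for predict_proba output, and the
  helpers pick_higher and auc_from_column are hypothetical names that use the
  pick-higher reading of AUC, with ties counted as half.

```python
def pick_higher(sick_scores, well_scores):
    """Fraction of (sick, well) pairs where the sick score is higher; ties count half."""
    wins = sum(1.0 if s > w else 0.5 if s == w else 0.0
               for s in sick_scores for w in well_scores)
    return wins / (len(sick_scores) * len(well_scores))


# Made-up rows: (well-chance, sick-chance), each pair summing to 1
proba = [(0.03, 0.97), (0.60, 0.40), (0.28, 0.72),    # truly sick
         (0.55, 0.45), (0.78, 0.22), (0.89, 0.11)]    # truly well
truths = [1, 1, 1, 0, 0, 0]


def auc_from_column(col):
    """AUC when the given column is (rightly or wrongly) treated as the sick score."""
    sick = [row[col] for row, y in zip(proba, truths) if y == 1]
    well = [row[col] for row, y in zip(proba, truths) if y == 0]
    return pick_higher(sick, well)


right_way = auc_from_column(1)   # sick-chance column
wrong_way = auc_from_column(0)   # well-chance column
print(f"column 1: AUC = {right_way:.3f}")   # 0.889
print(f"column 0: AUC = {wrong_way:.3f}")   # 0.111
```

  The two numbers sum to 1: passing the wrong column does not destroy the ranking,
  it reverses it.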

  And the plot -- the curve, plus the grey diagonal for reference:

    import matplotlib.pyplot as plt

    plt.figure(figsize=(7, 5))
    plt.plot(fpr, tpr, label=f'AUC = {roc_auc:.3f}')
    plt.plot([0, 1], [0, 1], linestyle='--', color='grey', label='Random guess')
    plt.xlabel('FPR  (false-alarm rate)')
    plt.ylabel('TPR  (catch-rate)')
    plt.title('Trade Curve')
    plt.legend()
    plt.show()


  ## The Labels, Last

    Plain term used above                    Standard label
    -------------------------------------    -------------------------------------------
    catch-rate (TPR)                         true positive rate / sensitivity / recall
    false-alarm (FPR)                        false positive rate / (1 - specificity)
    trade curve                              ROC curve (receiver operating char.)
    area under trade curve                   AUC (area under the ROC curve)
    chance-score                             posterior probability / predict_proba
    pick-higher reading                      Wilcoxon-Mann-Whitney: P(score+ > score-)
    trapezoidal rule                         numerical integration used by auc()

----------------------------------------------------------------------------------------------
  IN THIS CHAPTER (Chapter 3 -- Sorting Into Bins):
    Part 1 -- The S-Curve, the Four-Box Table .
    Part 2 (this post) .
    Part 3 -- Leash and Cloud .
    Part 4 -- Picking Settings, Skewed Piles

----------------------------------------------------------------------------------------------
  (c) 2026 Rahul Rai . pure HTML+CSS, no JavaScript, no trackers .
==============================================================================================