==============================================================================================
RAHUL'S ML BLOG -- notes on machine learning, worked out by hand est. 2026
==============================================================================================
CHAPTER 3 . SORTING INTO BINS . PART 2 OF 4
The Trade Curve: Sliding the Cutoff and What AUC Actually Measures
Posted: 2026-06-05 . Author: Rahul Rai . Tags: roc-curve, auc, cutoff, evaluation
============================================================================================
Every score in the last post quietly agreed on one thing: draw the line at 0.5. But who
said 0.5? It is just one opinion about where "probably well" ends and "probably sick"
begins -- and it is an opinion you are free to overrule.
Reach out and slide that line. Push it left and the machine turns anxious, crying "sick"
at the faintest shadow -- it catches more real cases, but startles more healthy people
with false alarms. Push it right and the machine turns stoic, holding its tongue until
it is almost certain -- fewer false alarms, but more real cases slip quietly out the
door. Nowhere on that line is there a free lunch: every spot you pick is a different
bargain struck between lives and money. The trade curve is what you get when you stop
arguing over a single spot and lay all the bargains out side by side, in one picture.
IN HAND: machine gives each lump a score 0-to-1; draw a line (the cutoff) somewhere on
that scale. Four boxes: CAUGHT (truly sick, called sick), FALSE ALARM (truly well, called
sick), MISSED (truly sick, called well), CLEARED (truly well, called well).
This section measures the two rates those boxes produce as the cutoff slides.
## Two Rates
At any fixed cutoff, the four boxes yield two fractions:
catch-rate  (TPR) = CAUGHT / all truly sick    ^ = higher is better
false-alarm (FPR) = ALARM  / all truly well    v = lower is better
Formally (Y = true verdict: 1 sick, 0 well; yhat = machine's called verdict):
TPR(t) = P(yhat=1 | Y=1) = TP(t) / (TP(t) + FN(t))
FPR(t) = P(yhat=1 | Y=0) = FP(t) / (FP(t) + TN(t))
TPR is the same as recall. FPR is the fraction of healthy lumps that get misread as
sick. Moving the cutoff pushes both rates in the same direction: lowering it can only
raise (or hold) both, and raising it can only lower (or hold) both.
## Sliding the Cutoff
chance 0.0 --------------- 1.0
^ cutoff (slide)
slide LEFT (cutoff ~= 0.1): shout sick at almost everyone
catch-rate ^ (few sick lumps escape)
false-alarm ^ (healthy people caught in the net too)
slide RIGHT (cutoff ~= 0.9): shout sick only when bone-sure
catch-rate v (many sick lumps called well and sent home)
false-alarm v (healthy people mostly called clear)
-> a TRADE: cannot raise catches without raising false alarms
There is no cutoff that maximises TPR while holding FPR at zero -- unless the machine
perfectly separates the two groups. What exists is a continuous family of deals.
## Drawing All the Deals at Once
Sweep the cutoff from 1 down to 0. At each position, compute (FPR, TPR) and drop a dot at
those coordinates. Connect the dots: that is the trade curve, or ROC curve.
TPR (catch-rate, lives saved)
1 | * <- perfect machine (catch all, alarm none)
| ###
| ##
| - - - - \ <- diagonal = useless coin flip
0 +--------------- FPR (false-alarm, money wasted)
0 1
each dot = one cutoff position
perfect machine: hugs the top-left corner
coin flip: follows the diagonal
The curve is traced out by the cutoff t: {(FPR(t), TPR(t)) : t in [0, 1]}, where a lump
is called sick when its score is at or above t. As t -> 0, everyone is called sick ->
TPR=1, FPR=1 (top-right). As t -> 1, everyone is called well -> TPR=0, FPR=0
(bottom-left).
A concrete 6-person exam pile, by pencil:
exam pile: 3 sick (truth=1), 3 well (truth=0)
lump  truth  machine chance     6 cutoff positions
-------------------------------------------------------------
A     sick   0.97            -> cutoff 0.99: below -> well   FN
B     sick   0.88            -> cutoff 0.90: below -> well   FN
C     sick   0.72            -> cutoff 0.80: below -> well   FN
D     well   0.45            -> cutoff 0.60: below -> well   TN
E     well   0.22            -> cutoff 0.40: below -> well   TN
F     well   0.11            -> cutoff 0.20: below -> well   TN
Sweep the cutoff through those 6 positions, plus 0.00 (everyone called sick), and
compute (FPR, TPR) at each of the 7:
cutoff   >=cut?      CAUGHT  MISSED  ALARM  CLEAR   TPR    FPR
-----------------------------------------------------------------
0.99     none        0/3     3/3     0/3    3/3     0.000  0.000
0.90     A           1/3     2/3     0/3    3/3     0.333  0.000
0.80     A,B         2/3     1/3     0/3    3/3     0.667  0.000
0.60     A,B,C       3/3     0/3     0/3    3/3     1.000  0.000
0.40     A,B,C,D     3/3     0/3     1/3    2/3     1.000  0.333
0.20     A,B,C,D,E   3/3     0/3     2/3    1/3     1.000  0.667
0.00     all         3/3     0/3     3/3    0/3     1.000  1.000
TPR = CAUGHT / (CAUGHT + MISSED). FPR = ALARM / (ALARM + CLEAR).
At cutoff 0.90: 1 sick caught, 2 missed -> TPR=1/3=0.333;
0 alarms -> FPR=0/3=0.000. At cutoff 0.40: all 3 sick caught +
1 well alarmed -> TPR=3/3=1.000, FPR=1/3=0.333.
Plot the 7 dots: (0.000,0.000), (0.000,0.333), (0.000,0.667),
(0.000,1.000), (0.333,1.000), (0.667,1.000), (1.000,1.000).
The curve hugs the left edge (FPR=0 for the first 4 cutoffs),
then bends right. The curve is ABOVE the diagonal -- the machine
is doing real work.
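The pencil sweep above can be sketched in a few lines of plain Python. This is only an illustration, not the post's own code: the names `pile`, `cutoffs` and `roc_point` are made up here, and the rule "call a lump sick when its score is at or above the cutoff" is taken from the `>=cut?` column of the table.

```python
# The 6-lump exam pile from the table above: (name, truth, chance-score).
# truth: 1 = sick, 0 = well.
pile = [("A", 1, 0.97), ("B", 1, 0.88), ("C", 1, 0.72),
        ("D", 0, 0.45), ("E", 0, 0.22), ("F", 0, 0.11)]

# The cutoffs tried in the sweep table, from strictest to loosest.
cutoffs = [0.99, 0.90, 0.80, 0.60, 0.40, 0.20, 0.00]


def roc_point(pile, cutoff):
    """Return (FPR, TPR) when a lump is called sick if score >= cutoff."""
    n_sick = sum(1 for _, truth, _ in pile if truth == 1)
    n_well = len(pile) - n_sick
    caught = sum(1 for _, truth, s in pile if truth == 1 and s >= cutoff)
    alarm = sum(1 for _, truth, s in pile if truth == 0 and s >= cutoff)
    return alarm / n_well, caught / n_sick


points = [roc_point(pile, c) for c in cutoffs]
for c, (fpr, tpr) in zip(cutoffs, points):
    print(f"cutoff {c:.2f}: FPR = {fpr:.3f}, TPR = {tpr:.3f}")
```

Each printed line is one dot on the trade curve. The sketch assumes the pile holds at least one sick and one well lump; otherwise a rate would divide by zero.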
>> YOUR TURN
At some cutoff (made-up) the four-box holds, out of 4 sick and 6 well: CAUGHT 3,
MISSED 1, ALARM 2, CLEAR 4. Work the catch-rate and false-alarm rate.
check your slate: TPR = CAUGHT / (CAUGHT + MISSED) = 3 / (3 + 1) = 3/4 = 0.75;
FPR = ALARM / (ALARM + CLEAR) = 2 / (2 + 4) = 2/6 ~= 0.333. One dot on the
trade curve at (0.333, 0.75) -- three sick in four caught, one well in three
falsely alarmed.
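The same slate-work as a small Python sketch, for checking your own counts. The function name `rates` is invented for this illustration, and it assumes both groups are non-empty so neither division hits zero.

```python
def rates(caught, missed, alarm, clear):
    """Return (TPR, FPR) from the four-box counts at one cutoff."""
    tpr = caught / (caught + missed)  # catch-rate: share of truly sick called sick
    fpr = alarm / (alarm + clear)     # false-alarm rate: share of truly well called sick
    return tpr, fpr


# The made-up YOUR TURN counts: 4 sick, 6 well.
tpr, fpr = rates(caught=3, missed=1, alarm=2, clear=4)
print(f"TPR = {tpr:.3f}, FPR = {fpr:.3f}")  # TPR = 0.750, FPR = 0.333
```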
Count the sweep in clerk-steps: n lumps give n + 1 cutoff positions, and each
position re-tallies the four boxes over all n lumps. On the Wisconsin exam of 569
lumps that is 570 x 569 = 324,330 comparisons to draw the whole curve -- a morning
for the clerks, an afternoon's nap for the kettle.
## The Area Under the Curve
IN HAND: a 6-person exam swept through 7 cutoffs into (FPR, TPR) pairs that climbed
the left edge -- (0,0.333), (0,0.667), (0,1.0) -- then bent right along the top.
This section crushes that whole staircase into one number.
The whole curve collapses into one number: the area under it, AUC. A perfect machine
that hugs the top-left corner has area 1.0. A coin flip that follows the diagonal has
area 0.5. A machine whose scores are systematically backwards has area below 0.5.
AUC = 1.0 perfect -- no overlap between sick and well scores
AUC = 0.5 useless -- sick and well scores interleaved randomly
AUC < 0.5 backwards -- sick scores lower than well scores
AUC is computed from the curve points by the trapezoidal rule.
>> YOUR TURN
Two neighbouring ROC dots (made-up): (FPR 0.2, TPR 0.6) then (FPR 0.5, TPR 0.9).
Work the area of the trapezoid strip between them: width x average height.
check your slate: width = 0.5 - 0.2 = 0.3; average height = (0.6 + 0.9)/2 =
1.5/2 = 0.75; strip = 0.3 x 0.75 = 0.225. Add up every such strip across the
sweep and you have the whole area under the curve.
From the 7-dot sweep above, by pencil (6 strips between 7 dots):
dot   FPR    TPR    (FPR_next - FPR) x (TPR_next + TPR) / 2
----------------------------------------------------------------------
1     0.000  0.000  (0.000 - 0.000) x (0.333 + 0.000) / 2 = 0.000
2     0.000  0.333  (0.000 - 0.000) x (0.667 + 0.333) / 2 = 0.000
3     0.000  0.667  (0.000 - 0.000) x (1.000 + 0.667) / 2 = 0.000
4     0.000  1.000  (0.333 - 0.000) x (1.000 + 1.000) / 2 = 0.333
5     0.333  1.000  (0.667 - 0.333) x (1.000 + 1.000) / 2 = 0.333
6     0.667  1.000  (1.000 - 0.667) x (1.000 + 1.000) / 2 = 0.333
----------------------------------------------------------------------
total AUC = 0.000 + 0.000 + 0.000 + 0.333 + 0.333 + 0.333 = 1.000
(each 0.333 is an exact 1/3, so the three thirds sum to exactly 1)
The AUC is 1.0 -- a perfect ranking. Indeed the machine sorted all 3 sick chances
(0.97, 0.88, 0.72) above all 3 well chances (0.45, 0.22, 0.11). For ANY cutoff
between 0.72 and 0.45, the machine catches all sick and alarms none.
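The strip-by-strip sum can also be sketched in plain Python. The function name `trapezoid_auc` is made up for this illustration; it sorts the dots by FPR first, which is an assumption that suits a curve that only ever moves up and to the right.

```python
def trapezoid_auc(points):
    """Sum width x average height over neighbouring (FPR, TPR) dots."""
    pts = sorted(points)  # order by FPR, then TPR, so no width is negative
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# The 7 dots from the 6-lump sweep, written as exact thirds.
dots = [(0, 0), (0, 1/3), (0, 2/3), (0, 1), (1/3, 1), (2/3, 1), (1, 1)]
print(round(trapezoid_auc(dots), 3))  # 1.0

# The single made-up strip from the YOUR TURN box.
print(round(trapezoid_auc([(0.2, 0.6), (0.5, 0.9)]), 3))  # 0.225
```

The three strips on the left edge have zero width, so they add nothing; all of the area comes from the run along the top.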
## What AUC Actually Measures
Areas under curves are easy to compute and hard to feel. So here is the same number
told as a tiny story you can actually picture. This post calls it the pick-higher
reading:
AUC = P( score(x+) > score(x-) )
Pick one sick lump at random and one well lump at random. What is the probability that
the machine gave the sick lump a higher chance-score than the well lump? That is AUC.
GOOD machine -- scores:
well lumps -> #### (all low)
sick lumps -> #### (all high)
-> almost always: sick score > well score -> AUC ~= 1.0
BAD machine -- scores:
well lumps -> # # # # # #
sick lumps -> # # # # # #
-> roughly half the time sick scores below well -> AUC ~= 0.5
AUC = 0.97 means: take a random sick lump and a random well lump; 97 times out of 100
the machine scored the sick lump higher. It answers "how cleanly do the two groups
separate?" without committing to any one cutoff.
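The pick-higher reading can be checked by brute force: list every (sick, well) pair and count how often the sick lump wins. A minimal sketch, with the invented name `pick_higher_auc`; counting a tie as half a win is a common convention and is an assumption here, since the formula above does not say what to do with ties.

```python
from itertools import product


def pick_higher_auc(sick_scores, well_scores):
    """Fraction of (sick, well) pairs in which the sick lump scores higher.

    Ties count as half a win (assumed convention).
    """
    wins = 0.0
    for s, w in product(sick_scores, well_scores):
        if s > w:
            wins += 1.0
        elif s == w:
            wins += 0.5
    return wins / (len(sick_scores) * len(well_scores))


# The 6-lump pile: every sick score beats every well score, 9 pairs out of 9.
print(pick_higher_auc([0.97, 0.88, 0.72], [0.45, 0.22, 0.11]))  # 1.0

# A made-up overlapping pile: one sick lump scores 0.30, below the well lump at 0.45.
overlap = pick_higher_auc([0.97, 0.88, 0.30], [0.45, 0.22, 0.11])
print(round(overlap, 3))  # 0.889, i.e. 8 pairs out of 9
```

On the first pile the pair count agrees with the 1.000 that the trapezoid sum gave by pencil.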
** KEY: AUC IS CUTOFF-INDEPENDENT
The four scores from Part 1 -- accuracy, recall, precision, F1 -- all answer "how
good is this machine at threshold 0.5?" AUC answers "how good is this machine's
RANKING of sick chances above well chances, across every cutoff at once?" It is the
right score when you want to compare two machines before deciding where to set the cutoff.
## Why You Need Chances, Not Labels
The ROC sweep needs a continuous score to sweep through. predict returns only 0 or 1 --
two unique scores, so only one cutoff changes anything: a single dot joined to the two
corners, a useless curve. predict_proba returns the raw chance output before any
cutoff, so each row can carry its own number.
predict(X_test_scaled) -> [1, 0, 1, 1, 0, ...] (0 or 1 only)
predict_proba(X_test_scaled) -> [[0.03, 0.97], (well-chance, sick-chance)
[0.81, 0.19], per row
...]
predict_proba returns two columns per row, in the order given by the model's classes_
attribute. With labels coded 0 = well and 1 = sick, as in this post, column 0 is the
well-chance P(B) and column 1 is the sick-chance P(M). For the ROC curve you want the
sick-chance -- column 1 -- and the code that sweeps it into a curve is at the end of
the post.
## Plotting It
When you plot it (code at the end), the grey 45-degree diagonal is the reference: a
machine that assigns random scores sits right on it. Any curve above the diagonal beats
random, and the further it bulges toward the top-left corner, the cleaner the separation.
## Practical Reading
AUC ~= 0.99 the two clouds are cleanly separated; almost any cutoff works
AUC ~= 0.85 good separation; choice of cutoff matters; tune recall vs precision
AUC ~= 0.70 noisy; weak signal; more columns or a different machine might help
AUC ~= 0.50 no signal at all; the 30 columns carry no information about the bin
>> NOTE: THE TRADE CURVE AND AUC DO NOT PICK A CUTOFF FOR YOU
They tell you how well-separated the two groups are. The choice of cutoff -- how much
ALARM is acceptable to buy a given catch-rate -- is a clinical or business decision.
In cancer screening, a doctor reading a curve of AUC 0.97 might pick the cutoff that
gives 99% recall even if FPR climbs to 15%. The curve shows the available deals; the
doctor picks one.
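Once someone has named the deal they want, finding the matching cutoff is mechanical. The sketch below is one hedged way to do it, not a recommendation: the name `cutoff_for_recall` is invented, the candidate cutoffs are simply the observed scores, and a lump is called sick when its score is at or above the cutoff.

```python
def cutoff_for_recall(scores, truths, target_recall):
    """Return (cutoff, TPR, FPR) for the highest cutoff whose TPR >= target.

    truths: 1 = sick, 0 = well. Candidate cutoffs are the observed scores.
    """
    n_sick = sum(truths)
    n_well = len(truths) - n_sick
    for cutoff in sorted(set(scores), reverse=True):
        caught = sum(1 for s, t in zip(scores, truths) if t == 1 and s >= cutoff)
        alarm = sum(1 for s, t in zip(scores, truths) if t == 0 and s >= cutoff)
        if caught / n_sick >= target_recall:
            return cutoff, caught / n_sick, alarm / n_well
    return None  # only reached if target_recall is above 1


# The 6-lump pile from earlier in the post.
scores = [0.97, 0.88, 0.72, 0.45, 0.22, 0.11]
truths = [1, 1, 1, 0, 0, 0]
deal = cutoff_for_recall(scores, truths, target_recall=0.99)
print(deal)  # (0.72, 1.0, 0.0)
```

On this perfectly separated pile the 99% target costs no false alarms. On a real, overlapping pile the third number is the price, and whether it is worth paying stays a human decision.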
## The Code, If You Want It
Nothing above needed a computer -- only pencils, clerks, and patience. This last
section is for the day you meet one: the same steps, spoken in Python.
The ROC sweep needs a continuous score, not a 0/1 label -- so grab predict_proba's
column 1 (the sick-chance) and feed it to roc_curve. Then plot the curve against the
diagonal.
>> NEW TO PYTHON? Each named once:
array[:, 1] -- take column 1 from every row (slicing a 2-D array)
a, b, c = func() -- unpack several returned values into separate names
f'AUC = {x:.3f}' -- an f-string: drop a value into text, here to 3 decimals
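from sklearn.metrics import roc_curve, auc

# log_reg (fitted model), X_test_scaled and y_test are assumed to exist already
y_proba = log_reg.predict_proba(X_test_scaled)[:, 1] # sick-chance per row
fpr, tpr, thresholds = roc_curve(y_test, y_proba)
roc_auc = auc(fpr, tpr)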
!! WARN: COLUMN 0 FLIPS THE CURVE BELOW THE DIAGONAL
If you pass column 0 (well-chances), sick lumps score LOW and well lumps score HIGH --
the ranking is backwards. The ROC curve dips below the diagonal and AUC becomes 1 minus
its true value, so a good machine looks worse than a coin flip. Always use column 1.
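The flip is easy to see on a toy pile without any library. In this sketch the scores are made up, `pick_higher_auc` is an invented helper that counts winning (sick, well) pairs, and the "column 0" scores are built as 1 minus the sick-chance, which is how two-column chances relate to each other.

```python
def pick_higher_auc(pos_scores, neg_scores):
    """Share of (pos, neg) pairs where pos scores higher; ties count half."""
    pairs = [(p, n) for p in pos_scores for n in neg_scores]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return wins / len(pairs)


# Made-up sick-chances (column 1) for 3 sick and 3 well lumps.
sick = [0.97, 0.88, 0.30]
well = [0.45, 0.22, 0.11]

auc_col1 = pick_higher_auc(sick, well)
# Column 0 holds the well-chance, which is 1 minus the sick-chance.
auc_col0 = pick_higher_auc([1 - s for s in sick], [1 - w for w in well])

print(round(auc_col1, 3))  # 0.889
print(round(auc_col0, 3))  # 0.111
```

The two numbers add up to 1: the wrong column does not lose information, it just reads the ranking upside down.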
And the plot -- the curve, plus the grey diagonal for reference:
import matplotlib.pyplot as plt

# fpr, tpr and roc_auc come from the roc_curve step above
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label=f'AUC = {roc_auc:.3f}')
plt.plot([0, 1], [0, 1], linestyle='--', color='grey', label='Random guess')
plt.xlabel('FPR (false-alarm rate)')
plt.ylabel('TPR (catch-rate)')
plt.title('Trade Curve')
plt.legend()
plt.show()
## The Labels, Last
Plain term used above Standard label
------------------------------------- -------------------------------------------
catch-rate (TPR) true positive rate / sensitivity / recall
false-alarm (FPR) false positive rate / (1 - specificity)
trade curve ROC curve (receiver operating char.)
area under trade curve AUC (area under the ROC curve)
chance-score posterior probability / predict_proba
pick-higher reading Wilcoxon-Mann-Whitney: P(score+ > score-)
trapezoidal rule numerical integration used by auc()
----------------------------------------------------------------------------------------------
(c) 2026 Rahul Rai . pure HTML+CSS, no JavaScript, no trackers .
==============================================================================================