==============================================================================================
  RAHUL'S ML BLOG -- notes on machine learning, worked out by hand                    est. 2026
==============================================================================================
  home | about | archive | glossary | contact
----------------------------------------------------------------------------------------------

  SPECIAL . FOR HACKER NEWS
  Cheapest Walk: UCS and A*, Built From Nothing, Every Pop Shown
  Posted: 2026-06-17 . Author: Rahul Rai . Tags: search, ucs, a-star, dijkstra, by-hand
  ============================================================================================

  PATH . SPECIAL -- Cheapest Walk: UCS and A*  (standalone; no other page on this blog needed)

  Every pathfinding post draws a maze with coloured arrows, names f = g + h, and waves at
  "admissibility." This page never waves. One town, five places. Every single pop and offer
  shown with its arithmetic. The proof that the cheapest pop is already final, worked out with
  numbers, not hand-waved. And A* as the few-line change to UCS that it really is -- run on
  the same map, every pop shown, the exact moment it wins demonstrated.

  THE RULE OF THIS PAGE. Assume you remember nothing. Every word is defined where it first
  appears. Anything named more than five lines ago is named again. Every number is worked in
  front of you, never "recall that...". One idea per line. A pencil and this page are the
  whole kit. Decimals shown to two places; a final digit may sit one off from a longer division.

  TOY MAP, REAL RECIPE. One line of honesty up front, so nothing misleads. The RECIPE --
  every move, in order -- is exactly Uniform-Cost Search and A* as Dijkstra, Russell, and
  Norvig wrote them. The MAP is a toy: five letter-nodes, a handful of roads with small costs,
  chosen so every number fits on paper. Where a toy choice could mislead about the method,
  the text says so. Trust the recipe; treat the map as a sketchpad.

  The whole method, named once so you see the shape, then built piece by piece below:

      map + costs  ->  pile (heap)  ->  pop cheapest  ->  stale? toss  ->  goal? stop
           ->  seal  ->  offer each unsealed neighbour  ->  add if new, replace if cheaper
           ->  write tab + breadcrumb  ->  repeat

  Three data structures sit on your desk throughout. Named once here so you are not surprised:
  the pile (a min-heap of slips), the tab (a dict of cheapest costs found so far), and the
  breadcrumb (a dict of predecessors, used to reconstruct the path at the end). Each is fully
  spelled out where it first appears below.

  ** KEY: THE BLIND MAZE TRAP -- why a cost-only search wastes time in the wrong direction
     Uniform cost search spreads like a puddle: it explores every road with equal cost
     before committing to any direction. It WILL find the cheapest path -- but it
     explores the entire map outward in a ring, including all the roads leading away
     from the goal, before it reaches the goal.
     THE FIX: A*. Add a compass (h) that estimates the remaining cost to the goal.
     Each slip on the pile gets scored as cost-so-far + compass-estimate.
     The puddle now pushes harder in the direction the compass says is promising.
     It still finds the optimal path (as long as the compass never overestimates).


  ## A Machine That Adds Costs -- and "S→B" Is Not a Number

  A town has five places: S (start), A, B, C, G (goal). Every road has a cost painted on it
  in horses -- how dear it is to walk that road. Roads run both ways at the same cost.

      leaves S :  A(1)   B(4)
      leaves A :  S(1)   B(1)   C(5)
      leaves B :  S(4)   A(1)   C(1)   G(8)
      leaves C :  A(5)   B(1)   G(2)
      leaves G :  C(2)   B(8)

      start = S      goal = G

  Drawn on paper (each number is a road cost):

           S
          / \
         1   4
        /     \
       A --1-- B --8-- G
        \     /       /
         5   1       2
          \ /       /
           C ------+

  The cheapest walk from S to G is NOT the most direct one (S->B->G = 4+8 = 12).
  It is S->A->B->C->G = 1+1+1+2 = 5. Four roads, less than three.

  In code this is a dict of dicts: `graph['B'] = {'S': 4, 'A': 1, 'C': 1, 'G': 8}`. Nothing
  else. The map never changes. Every confusion that comes later traces back to forgetting this.

  The goal: find the walk from S to G whose road-costs add to the smallest total. Not the
  walk with the fewest roads -- the walk with the cheapest SUM.


  ## Fewest Roads Is Not Cheapest -- Therefore You Cannot Commit to a Road

  Two ways to reach B. Direct: S→B costs 4. Via A: S→A costs 1, then A→B costs 1, total 2.
  Two roads beats one road, 2 < 4. Fewer roads is more expensive here.

  That kills the obvious strategy: take the cheapest first road and follow it. A dear opening
  road might reveal a cheaper continuation later. You cannot commit to any path early, because
  you don't yet know what roads open up ahead.

  Which means you must keep ALL partial routes alive until you can PROVE a place is final.
  That proof only arrives at one specific moment -- spelled out below -- and it relies on the
  costs being ≥ 0. That condition matters; it's the floor the whole argument stands on.


  ## Three Papers On Your Desk Force Themselves Into Being

  If you can't commit, you need a running record. Three pieces of paper appear on your desk
  not by choice but by necessity:

  **The pile** (call it `q`). A heap -- a list kept so the cheapest item always sits on top.
  Each item on the pile is a slip: (cost-so-far, place). The pile holds every frontier
  candidate you haven't decided about yet. Heap means: heapq in Python; every push and pop
  keeps the smallest on top automatically.

  **The tab** (call it `states`). A dict `{place: cheapest cost-so-far found to reach it}`.
  When you find a cheaper route to a place, you overwrite the old cost. One number per place,
  always the smallest seen so far. Separate from the pile: the pile holds (cost, place) PAIRS
  you pop; the tab holds ONE cost PER PLACE you look up. Never blur them.

  **The breadcrumb** (call it `previous`). A dict `{place: predecessor}`. When place P gets
  a new cheapest route, you write `previous[P] = wherever that cheaper route came from`.
  At the end, walk it backward: G→C→B→A→S→(None) and you've recovered the path.

  At birth, you're already standing on S for free. No road was walked to reach it. So:

      q        = [ (0, S) ]        pile holds one slip: cost 0, at S
      states   = { S: 0 }         tab says S reached for 0
      previous = { S: None }      S has no predecessor (the chain stops here)
      sealed   = { }              nothing finalized yet

  `states` is NOT empty at birth. Empty would mean "I've reached nothing." But you're
  standing on S right now, so S is reached for 0.


  ## The Wheel: Pull, Discard or Stop or Open

  Pull the cheapest slip (cost, place) off the pile. Three cases, in order:

    Already sealed?  →  Discard. Pull again.  (A stale copy -- explained right after the walk.)
    Is it the goal?  →  Stop. Walk breadcrumbs back from G to S.
    Else:            →  Seal this place. Then, for every road leaving it:

        new = cost + road-cost

      For each neighbour on that road:
        skip if neighbour is already sealed
        neighbour not in tab yet         → ADD:     push (new, nbr), tab[nbr]=new, prev[nbr]=place
        nbr in tab and new < tab[nbr]    → REPLACE: push (new, nbr), tab[nbr]=new, prev[nbr]=place
        nbr in tab and new ≥ tab[nbr]    → offer loses, do nothing

  Both ADD and REPLACE push a NEW slip onto the pile. You never edit a slip already on the
  pile -- heaps have no eraser. The old pricier slip stays, rides until it surfaces, sees its
  place is already sealed, and gets tossed. Popped many times; sealed exactly once.


  ## The Full UCS Walk (Six Pops, Every Offer Shown)

  ```
    start:  q=[(0,S)]  states={S:0}  previous={S:None}  sealed={}

    POP (0, S)   S not in sealed.  S ≠ goal.  seal S.  offer roads from S:
      A: new = 0+1 = 1.  A not in states  → ADD:  states[A]=1, prev[A]=S, push (1,A)
      B: new = 0+4 = 4.  B not in states  → ADD:  states[B]=4, prev[B]=S, push (4,B)
      pile after: (1,A) (4,B)

    POP (1, A)   A not sealed.  A ≠ goal.  seal A.  offer roads from A:
      S: new = 1+1 = 2.  states[S]=0.  2 ≥ 0  → offer loses, skip
      B: new = 1+1 = 2.  states[B]=4.  2 < 4  → REPLACE: states[B]=2, prev[B]=A, push (2,B)
      C: new = 1+5 = 6.  C not in states  → ADD: states[C]=6, prev[C]=A, push (6,C)
      pile after: (2,B) (4,B)stale (6,C)

    POP (2, B)   B not sealed.  B ≠ goal.  seal B.  offer roads from B:
      S: new = 2+4 = 6.  states[S]=0.  skip.    A: new = 2+1 = 3.  states[A]=1.  skip.
      C: new = 2+1 = 3.  states[C]=6.  3 < 6  → REPLACE: states[C]=3, prev[C]=B, push (3,C)
      G: new = 2+8 = 10. G not in states  → ADD: states[G]=10, prev[G]=B, push (10,G)
      pile after: (3,C) (4,B)stale (6,C)stale (10,G)

    POP (3, C)   C not sealed.  C ≠ goal.  seal C.  offer roads from C:
      A: new = 3+5 = 8.  states[A]=1.  skip.    B: new = 3+1 = 4.  states[B]=2.  skip.
      G: new = 3+2 = 5.  states[G]=10. 5 < 10 → REPLACE: states[G]=5, prev[G]=C, push (5,G)
      pile after: (4,B)stale (5,G) (6,C)stale (10,G)stale

    POP (4, B)   B IS in sealed → STALE.  Discard.  Pull again.

    POP (5, G)   G not sealed.  G IS the goal → STOP.
  ```

  Walk breadcrumbs back from G:

      G  ← C  (prev[G] = C)
      C  ← B  (prev[C] = B)
      B  ← A  (prev[B] = A)
      A  ← S  (prev[A] = S)
      S  ← •  (prev[S] = None  →  stop)

  Path = [S, A, B, C, G].   Cost = states[G] = 5.

  The cheapest walk is not S→B→G (cost 4+8=12, two roads). Not S→B→C→G (4+1+2=7). It is
  S→A→B→C→G (1+1+1+2=5). The method found it by keeping all options alive and closing the
  cheapest box in each round.


  ## Two Things the Walk Just Did, Worth Naming

  **The tab killed the circle.** When A offered S back, 1+1=2 lost to states[S]=0. No
  "visited" set needed. The tab already holds the cheapest cost to every sealed place; any
  offer to a sealed place will always lose, because the sealed place was finalized when it
  was the global cheapest and -- since every road is ≥ 0 -- nothing can be cheaper than that.
  The tab plus the non-negative roads are the circle-breaker, built in, for free. (Drop the
  non-negative condition and this stops working: a place's tab could keep improving forever
  around a negative loop. The tab breaks circles only because no road ever lowers a total.)

  **Stale slips ride harmlessly.** When A found B for cost 2, the old (4,B) slip was already
  sitting on the pile. It couldn't be removed -- heaps have no per-item delete. So it rode
  along, three more pops later surfaced as round 5, saw B in the sealed set, and died.
  One extra discard was the price HERE; in general it's one stale pop per superseded slip, so
  a place reached cheaper k times leaves k−1 ghosts on the pile. That is the price of not
  having a decrease-key operation. In practice this is called "lazy deletion" and it's how
  most real UCS/Dijkstra/A* implementations work (the alternative, an indexed heap with
  decrease-key, trades the ghosts for bookkeeping).


  ## Why the Cheapest Pop Is Already Final (The Proof, With Numbers)

  Pop X off the pile. X is the cheapest unsealed place; its current cost is c. Why is c final?
  Why can no future route to X be cheaper?

  Any route to X that doesn't go through X directly must pass through some other unsealed
  place first -- call it Y. Y is unsealed, so states[Y] ≥ c (that's what "cheapest unsealed"
  means -- everything else costs at least as much). To then walk from Y to X costs a road ≥ 0.
  So the total via Y is ≥ c + 0 = c. Exactly c, at best. No improvement.

  With numbers from the walk: after sealing S and A, G first appeared at cost 10 (via B).
  Was it safe to leave G alone and seal C first? C was the cheapest unsealed place at cost 3.
  Any other route to C passes through an unsealed place costing ≥ 3, plus a road ≥ 0: total
  ≥ 3. C is final at 3. And from C, G dropped to 5 -- cheaper than the original 10.

  **The floor the whole argument stands on:** roads must cost ≥ 0. If a road has negative
  cost, walking it subtracts from the total. Then a route through a more expensive place could
  end up cheaper after the negative road, and the step "≥ c + 0 ≥ c" breaks. UCS is no longer
  guaranteed correct -- it CAN seal a place at a cost a later negative road would have beaten.
  (It doesn't fail on every graph with a negative road -- some happen to come out right -- but
  the guarantee is gone, which is all you need to stop trusting it.) Worse, on THIS map every
  road is two-way, so a single negative road is a negative two-cycle you can loop forever for
  unboundedly negative cost -- there is no shortest path to find at all. For directed graphs
  with negative edges but no negative cycle, Bellman-Ford handles it, at higher cost.

  This correctness argument is due to Dijkstra (1959) and is also why "this IS Dijkstra's
  algorithm" -- see the honest note at the end.


  ## A*: The Same Wheel, One Change

  UCS orders the pile by g alone (g = cost-so-far). A* orders it by f = g + h, where h is a
  straight-line guess of how far the goal still is. The guess tilts the pile toward goal-
  directed places. The wheel, the tab, the breadcrumbs, the sealed set: all identical. Only
  the number you push onto the pile changes.

  Each slip on the pile carries THREE numbers: (f, g, place). f = g + h sorts the pile; g
  is the true cost spent; place is where. The tab, the offer test, and whatever you return at
  the end all use g alone. The guess ONLY affects which slip pops first. It never touches the
  money.

  The straight-line guess -- ignoring the roads, measuring crow-flies -- is:

      h(place) = sqrt( (goal_col - place_col)^2 + (goal_row - place_row)^2 )

  A place is written (row, col). Column gap first, row gap second. Both are squared so the
  order doesn't matter for the value -- but get the mapping right deliberately: a place is
  (row, col), not (x, y).

  Here is the one thing you cannot fake: the straight-line guess is only a safe under-count
  if the map is HONEST -- if every road is at least as long as the straight line between its
  two ends. A crow-flies guess can only be a lower bound on road cost when roads don't take
  shortcuts the crow can't. So assign positions where every road's length ≤ its painted cost:

      S = (0, 0)    A = (1, 0)    B = (1, 1)    C = (2, 1)    G = (2, 3)

  Check the honesty -- each road's straight-line length vs its cost:

      S-A: length 1.00 ≤ cost 1     A-B: length 1.00 ≤ cost 1     B-C: length 1.00 ≤ cost 1
      C-G: length 2.00 ≤ cost 2     S-B: length 1.41 ≤ cost 4     A-C: length 1.41 ≤ cost 5
      B-G: length 2.24 ≤ cost 8

  Every road is at least as long as its crow-flies line. (The first attempt at this section
  used flashier coordinates -- and they LIED: a road of cost 1 spanned a straight-line gap of
  2, so the guess overshot and the under-count proof below would have been false. The numbers
  here are honest by construction.)

  h values to G = (2, 3):

      h(S) = sqrt( (3-0)^2 + (2-0)^2 ) = sqrt(13)  ≈ 3.61
      h(A) = sqrt( (3-0)^2 + (2-1)^2 ) = sqrt(10)  ≈ 3.16
      h(B) = sqrt( (3-1)^2 + (2-1)^2 ) = sqrt(5)   ≈ 2.24
      h(C) = sqrt( (3-1)^2 + (2-2)^2 ) = sqrt(4)   = 2.00
      h(G) = 0

  Cross-check against the true cheapest road costs to G (S=5, A=4, B=3, C=2): every h is ≤
  its true cost (3.61≤5, 3.16≤4, 2.24≤3, 2.00≤2). Each h is a genuine under-count -- which,
  proved below, is exactly what makes A* safe.


  ## The Full A* Walk (Five Pops -- One Fewer Than UCS)

  ```
    start:  pile = [(3.61, 0, S)]   states = {S:0}   prev = {S:None}   sealed = {}

    POP (3.61, 0, S)   S not sealed.  S ≠ goal.  seal S.  offer from S:
      A: g=0+1=1,  f=1+3.16=4.16  A not in states  → states[A]=1, prev[A]=S, push (4.16,1,A)
      B: g=0+4=4,  f=4+2.24=6.24  B not in states  → states[B]=4, prev[B]=S, push (6.24,4,B)
      pile: (4.16,1,A) (6.24,4,B)

    POP (4.16, 1, A)   A not sealed.  A ≠ goal.  seal A.  offer from A:
      S: g=1+1=2, f=2+3.61=5.61.  states[S]=0. skip.
      B: g=1+1=2, f=2+2.24=4.24.  states[B]=4. 2<4  → REPLACE: states[B]=2, prev[B]=A,
                                                                  push (4.24,2,B)
      C: g=1+5=6, f=6+2.00=8.00.  C not in states  → states[C]=6, prev[C]=A, push (8.00,6,C)
      pile: (4.24,2,B) (6.24,4,B)stale (8.00,6,C)

    POP (4.24, 2, B)   B not sealed.  B ≠ goal.  seal B.  offer from B:
      S: skip.   A: skip.
      C: g=2+1=3, f=3+2.00=5.00.  states[C]=6. 3<6  → REPLACE: states[C]=3, prev[C]=B,
                                                                  push (5.00,3,C)
      G: g=2+8=10, f=10+0=10.00.  G not in states  → states[G]=10, prev[G]=B, push (10.00,10,G)
      pile: (5.00,3,C) (6.24,4,B)stale (8.00,6,C)stale (10.00,10,G)

    POP (5.00, 3, C)   C not sealed.  C ≠ goal.  seal C.  offer from C:
      A: skip.   B: skip.
      G: g=3+2=5, f=5+0=5.00.  states[G]=10. 5<10  → REPLACE: states[G]=5, prev[G]=C,
                                                               push (5.00,5,G)
      pile: (5.00,5,G) (6.24,4,B)stale (8.00,6,C)stale (10.00,10,G)stale

    POP (5.00, 5, G)   G not sealed.  G IS the goal → STOP.
  ```

  Same path: [S, A, B, C, G]. Same cost: 5.

  Five pops total. UCS needed six (the stale (4,B) was pop 5 in UCS, wasting a round).
  Here the stale (6.24,4,B) is still on the pile when G pops -- it was never reached.
  Why? After sealing C, the pile held (5.00,5,G) and (6.24,4,B)stale. UCS would sort these
  by g: 4 < 5, so stale B would pop first. A* sorted by f: 5.00 < 6.24, so G popped first.
  The heuristic pushed G ahead of the stale copy. That is the whole A* win on this map --
  one saved discard -- but on a grid with thousands of nodes the savings compound across
  hundreds of pops.


  ## The Guess Is an Under-Count -- Which Is Why A* Returns the Same Answer as UCS

  The heuristic h(place) is admissible if it never overcounts the true remaining cost:
  h(place) ≤ true cheapest road cost from place to G, always. Here is the careful version
  of the claim, because it is easy to get wrong: Euclidean distance is admissible ONLY when
  every road costs at least its straight-line length. That is the honesty condition checked
  when the coordinates were assigned. Under it, the cheapest road from any place to G is a
  sequence of road costs, each ≥ its own straight-line leg, so the whole road sum ≥ the
  straight line from place to G -- which is exactly h. So h ≤ true remaining cost. If road
  costs do NOT dominate straight-line lengths (a cheap road across a wide gap), Euclidean
  distance can overshoot and is NOT admissible -- it is not magic, it is a consequence of the
  map being honest.

  Why admissibility matters: if h ever overcounted (h(X) > true cost X→G), it could make a
  slip for X look more expensive than it really is, pushing X down the pile past slips on a
  worse path. A* could then pop the goal by a dearer route before X's cheaper route ever
  surfaces, seal it, and return that dearer answer -- the optimality guarantee is gone. An
  admissible h can't do this -- it undersells, so it can only pull slips EARLIER, never bury
  the cheapest one. A* seals the same correct answer as UCS, in fewer pops.

  The stronger h property worth naming: **consistent** (also called monotone). h is
  consistent if for every road P→Q of cost c: h(P) ≤ c + h(Q). On an honest map Euclidean
  distance satisfies this by the triangle inequality (h(P) ≤ straight-line P→Q ≤ c, plus
  h(Q)). Consistency implies admissibility, and it is the property that lets graph-search A*
  trust the sealed set: with a consistent h, once a place is sealed its g is final and never
  needs revisiting. (With a merely-admissible-but-inconsistent h, textbook A* may need to
  reopen sealed nodes.) In practice: an honest-map Euclidean h gives you both. If you write a
  custom h, check admissibility first, consistency second.


  ## Honest Extras (Three Short)

  **UCS IS Dijkstra's algorithm.** Edsger Dijkstra published this exact method in 1959 for
  finding shortest paths in graphs with non-negative edge weights. UCS is the AI textbook
  name (Russell & Norvig). Dijkstra is the CS algorithms name. Same algorithm, same data
  structures, same correctness proof. If you knew Dijkstra's but had not seen UCS: you already
  knew this. If you knew UCS but not Dijkstra's: now you do. The algorithm has one name on
  every graph theory exam and a different name on every AI midterm.

  **Why Bellman-Ford instead of UCS for negative edges.** The proof above requires the
  cheapest unsealed place to be final. Negative edges break this: a place penned at cost 5
  might later be reached for cost 5 + (-3) = 2 via a negative road, which would have beaten
  the sealed cost. Bellman-Ford handles this by relaxing every edge |V|−1 times; it's slower
  (O(VE) vs O(E log V) for Dijkstra) but correct on directed graphs with negative edges --
  provided there is no negative cycle, which makes "shortest path" meaningless for anyone.
  (On the binary-heap with lazy deletion shown here the heap can hold up to E entries, so the
  literal bound is O(E log E); but log E ≤ 2 log V for any graph, so it collapses to the
  usual O(E log V).)

  **Why the tab and the pile hold the same fact at birth but are different tools.** Both
  start holding one cost-0 entry for S. After one pop they diverge: the pile grows by adding
  slips for A and B; the tab grows by adding entries for A and B. But the pile holds PAIRS
  (cost, place) you retrieve by popping cheapest. The tab holds ONE number per place you
  look up by name. Confusing them -- e.g. using the pile to check if a place has been reached,
  or using the tab to get the cheapest pair -- is the single most common first-implementation
  bug. The pile answers "what should I explore next?"; the tab answers "what is the cheapest
  cost to place P?"


  ## The Numbers, in Python

  Round-by-round, the same numbers from the walk above, as named variables:

  ```python
  # --- town road costs ---
  s_to_a, s_to_b = 1, 4
  a_to_s, a_to_b, a_to_c = 1, 1, 5
  b_to_s, b_to_a, b_to_c, b_to_g = 4, 1, 1, 8
  c_to_a, c_to_b, c_to_g = 5, 1, 2
  g_to_c, g_to_b = 2, 8

  # UCS round 1: pop S (cost 0)
  cost_s = 0
  new_a_r1 = cost_s + s_to_a     # 1  (A not in tab → ADD)
  new_b_r1 = cost_s + s_to_b     # 4  (B not in tab → ADD)
  tab_a, tab_b = new_a_r1, new_b_r1   # tab: A=1, B=4

  # UCS round 2: pop A (cost 1)
  cost_a = tab_a                  # 1
  new_s_r2 = cost_a + a_to_s     # 2  (tab S=0, 2 ≥ 0 → skip)
  new_b_r2 = cost_a + a_to_b     # 2  (2 < tab B=4 → REPLACE)
  new_c_r2 = cost_a + a_to_c     # 6  (C not in tab → ADD)
  tab_b    = new_b_r2             # 2  (was 4)
  tab_c    = new_c_r2             # 6

  # UCS round 3: pop B (cost 2)
  cost_b    = tab_b               # 2
  new_c_r3  = cost_b + b_to_c    # 3  (3 < tab C=6 → REPLACE)
  new_g_r3  = cost_b + b_to_g    # 10 (G not in tab → ADD)
  tab_c     = new_c_r3            # 3  (was 6)
  tab_g     = new_g_r3            # 10

  # UCS round 4: pop C (cost 3)
  cost_c   = tab_c                # 3
  new_g_r4 = cost_c + c_to_g     # 5  (5 < tab G=10 → REPLACE)
  tab_g    = new_g_r4             # 5  (was 10)

  # UCS round 5: pop (4, B) → stale: B sealed at 2, slip says 4
  stale_b_slip = 4
  b_sealed_at  = 2                # 4 ≠ 2 → toss

  # UCS round 6: pop G (cost 5) → GOAL
  final_cost_ucs = tab_g          # 5

  # A* straight-line h values to G=(2,3); coords picked so every road >= its straight line
  import math
  s_pos, a_pos, b_pos, c_pos, g_pos = (0,0), (1,0), (1,1), (2,1), (2,3)

  def h(place_pos, goal_pos):
      return math.sqrt((goal_pos[1]-place_pos[1])**2 + (goal_pos[0]-place_pos[0])**2)

  h_s = h(s_pos, g_pos)   # 3.61  (true cost to G = 5, so 3.61 <= 5: admissible)
  h_a = h(a_pos, g_pos)   # 3.16  (true 4)
  h_b = h(b_pos, g_pos)   # 2.24  (true 3)
  h_c = h(c_pos, g_pos)   # 2.00  (true 2)
  h_g = h(g_pos, g_pos)   # 0.00

  # A* f-values (what goes into the pile to sort it)
  # round 1: pop S (f=3.61, g=0) → offer A and B
  f_a_r1 = 1 + h_a        # 4.16  (pile entry for A)
  f_b_r1 = 4 + h_b        # 6.24  (pile entry for B)

  # round 2: pop A (f=4.16, g=1) → B cheaper
  f_b_r2 = 2 + h_b        # 4.24  (replaces 6.24)
  f_c_r2 = 6 + h_c        # 8.00

  # round 3: pop B (f=4.24, g=2) → C and G offered
  f_c_r3 = 3 + h_c        # 5.00  (replaces 8.00)
  f_g_r3 = 10 + h_g       # 10.00

  # round 4: pop C (f=5.00, g=3) → G offered cheaper
  f_g_r4 = 5 + h_g        # 5.00  (replaces 10.00)

  # round 5: cheapest f in pile = (5.00,5,G) < (6.24,4,B)stale → G pops first!
  # UCS would pop stale B here (g=4 < g=5). A* skips it.
  f_g_final       = f_g_r4    # 5.00  ← wins the sort over stale B's 6.24
  f_b_stale       = f_b_r1    # 6.24  ← never popped; G won first
  final_cost_astar = 5         # same answer as UCS, one fewer pop
  ```

  Working heapq implementation -- the recipe above, directly:

  ```python
  import heapq, math

  graph = {
      'S': {'A': 1, 'B': 4},
      'A': {'S': 1, 'B': 1, 'C': 5},
      'B': {'S': 4, 'A': 1, 'C': 1, 'G': 8},
      'C': {'A': 5, 'B': 1, 'G': 2},
      'G': {'C': 2, 'B': 8},
  }

  def ucs(graph, start, goal):
      pile     = [(0, start)]      # the pile: (g, place) slips
      states   = {start: 0}        # the tab: cheapest cost per place
      previous = {start: None}     # the breadcrumb: predecessor per place
      sealed   = set()
      while pile:
          cost, place = heapq.heappop(pile)
          if place in sealed: continue
          if place == goal:   break
          sealed.add(place)
          for nbr, road in graph[place].items():
              if nbr in sealed: continue
              new = cost + road
              if nbr not in states or new < states[nbr]:
                  states[nbr]   = new
                  previous[nbr] = place
                  heapq.heappush(pile, (new, nbr))
      if goal not in states:                       # goal never reached
          return None, math.inf
      path, node = [], goal
      while node is not None: path.append(node); node = previous[node]
      return list(reversed(path)), states[goal]

  pos = {'S':(0,0), 'A':(1,0), 'B':(1,1), 'C':(2,1), 'G':(2,3)}  # honest map: road >= straight line

  def astar(graph, start, goal, pos):
      def h(p): return math.sqrt((pos[goal][1]-pos[p][1])**2+(pos[goal][0]-pos[p][0])**2)
      pile     = [(h(start), 0, start)]   # the pile: (f, g, place) slips, sorted by f
      states   = {start: 0}               # the tab: cheapest g per place (NOT f)
      previous = {start: None}            # the breadcrumb: predecessor per place
      sealed   = set()
      while pile:
          _, cost, place = heapq.heappop(pile)
          if place in sealed: continue
          if place == goal:   break
          sealed.add(place)
          for nbr, road in graph[place].items():
              if nbr in sealed: continue
              new = cost + road
              if nbr not in states or new < states[nbr]:
                  states[nbr]   = new
                  previous[nbr] = place
                  heapq.heappush(pile, (new + h(nbr), new, nbr))
      if goal not in states:
          return None, math.inf
      path, node = [], goal
      while node is not None: path.append(node); node = previous[node]
      return list(reversed(path)), states[goal]

  ucs_path, ucs_cost   = ucs(graph, 'S', 'G')    # ['S','A','B','C','G'], 5
  astar_path, astar_cost = astar(graph, 'S', 'G', pos)  # ['S','A','B','C','G'], 5
  ```


  ## One Breath (Carry This Away)

  Keep a pile (min-heap of slips), a tab (cheapest cost per place), and breadcrumbs; pop the
  cheapest, discard if already sealed, stop if it's the goal, else seal it and offer every
  unsealed neighbour cost+road: add if new, replace if cheaper, ignore if not; safe because
  the cheapest unsealed place's cost can't be undercut -- any alternative passes through an
  unsealed place costing ≥ that, plus a road ≥ 0. The tab kills backward walks by itself,
  no visited-set needed; stale slips die on pop via lazy deletion. A* is the same wheel with
  the pile sorted by g + straight-line h instead of g alone, while the tab and returned cost
  stay g; admissible h (h ≤ true remaining cost, always) guarantees the same answer, in fewer
  pops. UCS and Dijkstra's algorithm are the same thing with different name tags.

  On this map: UCS found [S,A,B,C,G] cost 5 in 6 pops (5 seals + 1 stale discard).
  A* found the same path and cost in 5 pops (5 seals, stale B never reached the top).


----------------------------------------------------------------------------------------------
  SPECIAL . CHEAPEST WALK: UCS AND A*  (written for Hacker News; standalone)
  Go deeper, same blog, same method:
    LSTM From Pencil -- RNN and LSTM From Scratch, Nothing But a Pencil
    Transformer From Pencil -- Attention From Scratch, One Number at a Time
    Transformers With Pencil -- A Whole Block Worked by Hand

  <- Back to all posts
----------------------------------------------------------------------------------------------
  (c) 2026 Rahul Rai . pure HTML+CSS, no JavaScript, no trackers .
  home . source on GitHub
==============================================================================================