The Kelly Criterion: How a 1950s Gambling Formula Optimizes Your Sprint Planning and Cloud Infrastructure

Łukasz Głowacki

Imagine you walk into a casino and find a game that seems broken. Flip a coin that’s heavily weighted in your favor:

  • 90% probability you win, doubling your money.

  • 10% probability you lose, and your stake is gone.

Your eyes light up. This is it. You’re about to become filthy rich.

You start with $100. With a 90% win rate, you decide to go big. You bet your entire $100 on the first flip.

Flip 1: Win! $200. “Easy game.” You go all-in again. Flip 2: Win! $400. Flip 3: Win! $800. Flip 4: Win! $1,600.

You’re unstoppable. Why bet less when odds are this good? You place your entire $1,600 fortune on the table.

Flip 5: It lands on the 10% chance. Loss.

In an instant, your $1,600 is gone. Back to zero. The math was in your favor, but your strategy was fatally flawed.

The Rebound and Discovery

Devastated but determined, you scrape together another $100. This time, you’re terrified of going broke. You bet just $1 per flip.

You play for hours. You almost never hit zero. But after 50 flips, you’ve made about $40 on average ($0.80 of expected profit per flip). You’re safe, but you aren’t getting rich.

You realize there must be a sweet spot between reckless “all-in” and timid “$1 bet.”
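Both extremes can be checked with a few lines of arithmetic. The sketch below assumes the coin from the story (90% win, even money); the function names are illustrative and not taken from any library.

```python
# Illustrative only: the 90/10 even-money coin from the story.
P_WIN = 0.9


def all_in_survival(flips: int, p_win: float = P_WIN) -> float:
    """Probability of never losing when the whole bankroll is bet every flip."""
    return p_win ** flips


def flat_bet_expected_profit(flips: int, stake: float = 1.0, p_win: float = P_WIN) -> float:
    """Expected profit from betting a fixed stake on every flip at even money."""
    return flips * stake * (p_win - (1 - p_win))


if __name__ == "__main__":
    for n in (5, 20, 50):
        print(f"{n:>2} flips: all-in survives {all_in_survival(n):.1%}, "
              f"flat $1 expects ${flat_bet_expected_profit(n):.0f}")
```

The all-in player's chance of still being in the game shrinks with every flip, while the flat bettor's expected profit grows only linearly and never compounds.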

Enter John Kelly Jr., a scientist at Bell Labs in the 1950s. He wasn’t looking to beat the house; he was studying how signals travel over noisy communication channels. But the formula he derived, the Kelly Criterion, became a legendary tool for gamblers and investors, reportedly including Warren Buffett.

The Kelly Criterion Explained

The Kelly formula calculates the exact percentage of your bankroll you should bet to maximize long-term growth.

f* = (bp - q) / b

Where:

  • f* = Optimal fraction to bet

  • b = Odds you get on the bet (1:1, so b=1)

  • p = Probability of winning (90%, or 0.9)

  • q = Probability of losing (10%, or 0.1)

For our 90% win rate:

f* = (1 × 0.9 - 0.1) / 1 = 0.8

Kelly says: bet 80% of your bankroll on every flip.

  • Start with $100 → Bet $80

  • Win: $180 → Next bet: $144 (80% of $180)

  • Lose: $20 → Next bet: $16 (80% of $20)
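The betting sequence above can be reproduced with a small helper. This is a minimal sketch: `kelly_fraction` and `next_bet` are names made up for illustration, and flooring the fraction at zero (no bet without an edge) is an added assumption.

```python
def kelly_fraction(b: float, p: float) -> float:
    """Kelly fraction f* = (b*p - q) / b, floored at 0 (no bet without an edge)."""
    q = 1.0 - p
    return max(0.0, (b * p - q) / b)


def next_bet(bankroll: float, b: float, p: float) -> float:
    """Stake for the next flip: the Kelly fraction of the current bankroll."""
    return bankroll * kelly_fraction(b, p)


if __name__ == "__main__":
    bankroll = 100.0
    stake = next_bet(bankroll, b=1, p=0.9)   # 80% of $100
    after_win = bankroll + stake             # even money: win the stake
    after_loss = bankroll - stake            # lose the stake
    print(f"first bet: ${stake:.0f}")
    print(f"after a win:  ${after_win:.0f}, next bet ${next_bet(after_win, 1, 0.9):.0f}")
    print(f"after a loss: ${after_loss:.0f}, next bet ${next_bet(after_loss, 1, 0.9):.0f}")
```

Because the stake is always recomputed from the current bankroll, a loss shrinks the next bet automatically.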

Why is this unbeatable?

  1. Never go broke: You always bet a percentage of what you have, never your last dollar

  2. Maximize geometric growth: Over many flips, it compounds faster than any other fixed betting fraction
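For readers who want to see where the formula comes from, here is a short derivation sketch. It assumes the standard setup, which the text does not spell out: a fraction f of the bankroll is staked each round, and the quantity to maximize is the expected logarithm of wealth. The name g(f) is introduced here for that expected log-growth per bet.

```latex
\begin{align*}
g(f) &= p \ln(1 + b f) + q \ln(1 - f), \qquad q = 1 - p \\
g'(f) &= \frac{p b}{1 + b f} - \frac{q}{1 - f} = 0 \\
p b (1 - f) &= q (1 + b f) \\
p b - q &= b f (p + q) = b f \\
f^{*} &= \frac{b p - q}{b}
\end{align*}
```

With b = 1 and p = 0.9 this gives f* = 0.8, matching the calculation above.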

In the real world, our “90%” estimates are just guesses. Smart players often use Half-Kelly—cut the percentage in half (e.g., 40% instead of 80%). This sacrifices a small amount of growth for a massive reduction in volatility.
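To put numbers on that trade-off for the coin in this article, the sketch below compares expected log-growth per flip at full and half Kelly. It assumes the usual log-wealth growth measure; `log_growth` is an illustrative name.

```python
import math


def log_growth(f: float, b: float, p: float) -> float:
    """Expected log-growth per bet when staking fraction f at net odds b."""
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)


if __name__ == "__main__":
    full, half = 0.8, 0.4
    g_full = log_growth(full, b=1, p=0.9)
    g_half = log_growth(half, b=1, p=0.9)
    print(f"full Kelly: {g_full:.3f} per flip")
    print(f"half Kelly: {g_half:.3f} per flip ({g_half / g_full:.0%} of full)")
    # The loss on a single bad flip is the staked fraction itself.
    print(f"loss on one bad flip: {full:.0%} vs {half:.0%}")
```

For this particular coin, half Kelly keeps roughly two thirds of the growth rate while halving the hit from any single loss. The exact ratio depends on b and p.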

The Outstanding Insight: Kelly’s “Percentage to Bet” IS a Code Rate

Here’s where it gets intriguing. John Kelly didn’t set out to create a gambling formula — he was working at Bell Labs on signal processing problems. He discovered something really fascinating:

The optimal percentage of your bankroll to bet (f*) IS mathematically identical to the optimal code rate for transmitting information through a noisy channel.

Technically, the code rate is the fraction of a transmission that carries raw data; the rest is error correction/redundancy. In other words: Kelly’s formula is literally about how efficiently you encode your wealth growth signals through a channel of probabilistic outcomes.

From Theory to Practice: Kelly in Modern IT

Kelly discovered this principle by thinking about betting and communication channels. But the same mathematical structure applies whenever you’re making decisions under uncertainty—encoding your “signals” (decisions) for transmission through noisy environments.

Kelly’s insight might guide us across different domains in different ways. Sprint planning, cloud optimization, portfolio management — each problem has its own characteristics, but the core principle of encoding signals efficiently for noisy channels remains the same.

1. Spot Instance Bidding: Your Kelly Guide to Cloud Savings

This is one of several ways Kelly’s insights apply to modern DevOps. It transforms Spot Instance bidding from a gamble into a calculated financial portfolio strategy.

The Problem: You need to run stateless workloads (CI/CD runners, video rendering, batch data processing, ML training). On-Demand instances are expensive ($1.00/hr). Spot instances are cheap ($0.10/hr) but unreliable (can be preempted at any time).

Most teams guess (“Maybe 30% Spot, 70% On-Demand?”), play it safe with 100% On-Demand, or go all-in on Spot. The first is arbitrary, the second is timid (wasting money), and the third risks failed jobs.

The Kelly Solution: Calculate the optimal code rate (f*)—the percentage of your cluster that should be Spot Instances.

The Setup:

To use the formula, we must translate cloud economics into betting economics:

  • The Bet: Running a job on a Spot Instance.

  • The Payoff (Odds, b): The cost savings multiplier.

    • If On-Demand is $1.00/hr and Spot is $0.10/hr, you get $1.00 of value for a $0.10 bet.

    • b = 10 (10:1 payoff).

  • The Win Probability (p): The probability that the instance survives for the duration of your job.

    • Source: AWS Spot Instance Advisor or historical data (e.g., us-east-1 m5.large has a <5% interruption rate).

  • The Loss Probability (q): The chance of preemption (1 - p).

Plug these into the Kelly formula, f* = (bp - q) / b, where f* is the percentage of your cluster that should be Spot Instances. The remainder should be On-Demand to hedge your risk.

Practical Example: GPU Workload

You’re running ML training jobs that need p3.2xlarge GPUs for 6 hours.

  • Spot Price: $0.59/hr (vs $3.82/hr On-Demand) → b = 6.47

  • Reliability: Historically, 40% of instances survive a 6-hour window → p = 0.40, q = 0.60

Kelly Calculation:

f* = (6.47 × 0.40 - 0.60) / 6.47
f* = (2.59 - 0.60) / 6.47
f* = 1.99 / 6.47
f* = 31%

Kelly says: Use 31% Spot, 69% On-Demand.

Why this works: The 6.47x payoff (significant savings) justifies the 31% bet despite the 60% risk of preemption. Over time, the savings compound (code rate analogy — you’re efficiently encoding your wealth growth signals).
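The GPU calculation can be wrapped in a small function. This is a sketch under this article's mapping (b is the On-Demand/Spot price ratio, p is the survival probability); the function name and the clamping to the range 0 to 1 are assumptions added for illustration.

```python
def spot_kelly_fraction(on_demand_price: float, spot_price: float, p_survive: float) -> float:
    """Share of the cluster to run on Spot, using the article's mapping.

    b is taken as the On-Demand/Spot price ratio and p as the probability that
    an instance survives the whole job. The result is clamped to [0, 1].
    """
    b = on_demand_price / spot_price
    q = 1.0 - p_survive
    return min(1.0, max(0.0, (b * p_survive - q) / b))


if __name__ == "__main__":
    f = spot_kelly_fraction(on_demand_price=3.82, spot_price=0.59, p_survive=0.40)
    print(f"full Kelly: {f:.0%} Spot, {1 - f:.0%} On-Demand")
    print(f"half Kelly: {f / 2:.0%} Spot")
```

If the price gap narrows far enough that the formula goes negative, the function returns 0, which corresponds to staying fully On-Demand.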

Practical Implementation:

  1. Start conservative: Begin with Half-Kelly (15% Spot) to account for uncertainty in reliability data.

  2. Monitor actuals: Track your preemption rates, job failures, and total costs.

  3. Adjust your code rate: If preemption rate is higher than expected, reduce Spot percentage. If lower, increase it.

  4. Use “Checkpointing”: For stateful jobs, save progress every 5-10 minutes. If a Spot instance dies, your job continues on On-Demand. This dramatically reduces the “cost of losing.”
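Steps 1 to 3 amount to re-running the formula with measured data. Here is a minimal sketch, assuming you count completed and preempted Spot jobs yourself; the function names and the `kelly_multiplier` parameter are hypothetical, not part of any cloud SDK.

```python
def observed_survival_rate(completed: int, preempted: int) -> float:
    """Fraction of Spot jobs that ran to completion; assumes at least one job."""
    return completed / (completed + preempted)


def recommended_spot_share(b: float, completed: int, preempted: int,
                           kelly_multiplier: float = 0.5) -> float:
    """Fractional-Kelly Spot share recomputed from tracked outcomes."""
    p = observed_survival_rate(completed, preempted)
    full_kelly = max(0.0, (b * p - (1.0 - p)) / b)
    return kelly_multiplier * full_kelly


if __name__ == "__main__":
    b = 6.47  # price ratio from the GPU example
    for completed, preempted in ((30, 70), (40, 60), (50, 50)):
        share = recommended_spot_share(b, completed, preempted)
        print(f"{completed}/{completed + preempted} jobs survived -> {share:.0%} Spot")
```

With small sample sizes the observed rate is noisy, which is one more reason to stay at a fraction of full Kelly.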

When NOT to Use Kelly:

  • Critical infrastructure (databases, master nodes): The cost of preemption is infinite (data loss, service outage). Kelly would say f* < 0. Don’t bet at all.

  • Very short jobs (seconds to minutes): Preemption rate is too high, or monitoring overhead is too expensive. The code rate analysis doesn’t apply.

  • Market is too volatile or too expensive: If b is too low for the risk (bp ≤ q), Kelly will recommend 0% (don’t bet at all).

This approach isn’t gambling—it’s a mathematical framework for optimizing cloud spend with quantified risk. The “code rate” f* balances savings against the probability of preemption.

2. Sprint Planning: Your Encoding Scheme for Velocity

This is where Kelly can change your life as a Scrum Master or Product Owner.

The Problem: Teams treat capacity (velocity) as deterministic. “Our velocity is 50 points, so let’s load 50 points.” This is the “all-in” encoding scheme—using 100% of your channel with no redundancy. If any estimate is wrong (and they always are), the signal gets corrupted — you “bust,” carry over work and miss the Sprint Goal.

The Kelly Solution: Treat Sprint capacity as your channel capacity and find the optimal code rate.

With even odds (b = 1), the Kelly formula simplifies to: f* = 2p - 1 (where p = confidence in completion)

Confidence (p) | Kelly Code Rate | Commitment
50%            | 0%              | Don’t transmit (don’t commit)
70%            | 40%             | Use 40% of the capacity
90%            | 80%             | Use 80% of the capacity

Real-world example: Your team’s average velocity is 40 points.

  • High-confidence Sprint (familiar system, well-understood stories): Kelly says code rate = 90-95% (which corresponds to p ≈ 95-98%) → Commit to 36-38 points

  • Low-confidence Sprint (integrating new API, vague stories): Kelly says code rate = 50-60% (which corresponds to p = 75-80%) → Commit to 20-24 points
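The sprint rule is simple enough to put in a planning script. A minimal sketch, assuming the even-odds simplification f* = 2p - 1; `sprint_commitment` is an illustrative name and rounding to whole points is a choice made here.

```python
def sprint_commitment(velocity: float, confidence: float) -> int:
    """Points to commit: velocity times the even-odds Kelly rate f* = 2p - 1."""
    code_rate = max(0.0, 2.0 * confidence - 1.0)
    return round(velocity * code_rate)


if __name__ == "__main__":
    velocity = 40
    for confidence in (0.50, 0.75, 0.90, 0.95):
        points = sprint_commitment(velocity, confidence)
        print(f"confidence {confidence:.0%}: commit {points} of {velocity} points")
```

Note how quickly the commitment falls as confidence drops toward 50%. That sensitivity is a reason to estimate confidence from past sprints rather than gut feeling.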

The key insight: Most teams already apply error-correcting codes unconsciously—they add redundancy by padding task estimates. The question is whether to make your code rate explicit.

The alternative encoding scheme: If you prefer constant transmission rates (stable velocity), strip the redundancy from individual signals (story estimates) and apply Kelly at the system level (sprint planning). If you prefer simple signal encoding (story pointing), acknowledge the hidden redundancy and commit 100% of velocity.

Either way, unused capacity isn’t wasted. It’s redundancy in your encoding scheme that absorbs channel noise (estimation errors) and makes it far more likely that the Sprint Goal signal reaches the destination without corruption.

3. Portfolio Management: Your Code Rate Across Multiple Channels

Don’t transmit your entire annual budget (100% code rate) on one massive, high-risk “innovation” signal. Use Kelly to determine the optimal code rate across high-risk/high-reward signals vs safe, keep-the-lights-on signals.

The Problem: Innovation projects are high-bandwidth signals with lots of noise. Encoding at 100% code rate means signal corruption (project failure). Encoding at very low code rates means wasting channel capacity (idle budget).

Example: Innovation Budget = $1,000,000

Project             | Payoff (b) | Success (p) | Kelly Code Rate | Allocation
AI-powered search   | 5x         | 25%         | 0.1             | $100,000
Mobile app overhaul | 3x         | 40%         | 0.2             | $200,000
API integration     | 2x         | 80%         | 0.7             | $700,000

Assuming these signals are independent (no shared dependencies, vendors, or failure modes), it’s pretty much a ready-to-go strategy! In practice, organizations also apply hard caps, for example no single signal exceeding 20-30% of the budget, as a safeguard against signal interference (correlations) in the real world.
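The allocation table and the hard-cap safeguard can be sketched together. This example applies the single-bet formula to each project separately, as the table does, which is a simplification that assumes independence. The `Project` type, the `allocate` function and the proportional scale-down when fractions exceed 100% are assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple


class Project(NamedTuple):
    name: str
    payoff: float      # net odds b, e.g. 5.0 for a "5x" payoff
    p_success: float   # estimated probability of success


def kelly_fraction(b: float, p: float) -> float:
    """Single-bet Kelly fraction (b*p - q) / b, floored at zero."""
    return max(0.0, (b * p - (1.0 - p)) / b)


def allocate(projects: List[Project], budget: float, cap: float = 1.0) -> Dict[str, float]:
    """Per-project Kelly fractions, capped, and scaled down if they exceed 100%."""
    fractions = {pr.name: min(cap, kelly_fraction(pr.payoff, pr.p_success)) for pr in projects}
    total = sum(fractions.values())
    scale = 1.0 / total if total > 1.0 else 1.0
    return {name: budget * f * scale for name, f in fractions.items()}


if __name__ == "__main__":
    portfolio = [
        Project("AI-powered search", 5.0, 0.25),
        Project("Mobile app overhaul", 3.0, 0.40),
        Project("API integration", 2.0, 0.80),
    ]
    for cap in (1.0, 0.3):
        print(f"cap = {cap:.0%}")
        for name, dollars in allocate(portfolio, 1_000_000, cap).items():
            print(f"  {name:<20} ${dollars:,.0f}")
```

With a 30% cap, the API integration drops from $700,000 to $300,000 and the remaining $400,000 stays unallocated, available for keep-the-lights-on work.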

The reality check: Success probabilities for innovation projects are notoriously hard to estimate. Research shows experts consistently tend to be overconfident — thinking their signal-to-noise ratio is better than it actually is. In practice, you can use historical reference classes and monitor actual outcomes to refine your estimates.

The Takeaway

The Kelly Criterion teaches us that optimal isn’t about avoiding risk — it’s about sizing risk correctly.

But now we understand something deeper: Kelly’s “percentage to bet” is literally a code rate—the optimal way to encode your decisions for transmission through noisy channels.

In IT, we often oscillate between two extremes:

  • The “all-in” approach: Transmit at 100% code rate with no redundancy. High risk of signal corruption (total failure).

  • The “timid” approach: Transmit at very low code rates. Wasting channel capacity. Zero growth.

Whether you’re bidding on Spot Instances, planning a Sprint, or allocating portfolios—ask yourself:

“What’s my Kelly code rate for this decision channel?”

Encode consciously, not blindly. Think like a seasoned gambler. Think like Kelly. Your stakeholders (and your sanity) will thank you.

Curious about Kelly’s original way of thinking? I recommend checking out his original paper: https://www.princeton.edu/~wbialek/rome/refs/kelly_56.pdf Have fun!

Want to expand the topic?

Address:

Let's Go DevOps Sp z o.o.
Zamknięta Str. 10/1.5
30-554 Cracow, Poland

View our profile
desingrush.com

Let’s arrange a free consultation

Just fill out the form below and we will contact you via email to arrange a free call to discuss your project scope and share our insights from similar projects.

© 2024 Let’s Go DevOps. All rights reserved.