Tomorrow is the Kentucky Derby, the most iconic horse race in America. Last year, $273 million in bets were (legally) placed. The track ended up with $44 million of that. This isn’t a surprise. For reasons I’ll dive into below, horse racing has one of the worst odds structures of anything you can gamble on. It’s designed for bettors to lose, at zero risk to the racetrack.
Even with the inherent disadvantages, horse racing tips and strategies abound. They have fun names like “Yankee” and “Dutch”. I don’t know if they work (although I strongly doubt it). I don’t know anything about horses or the nuances of horse racing. However, eight years in health tech has taught me a lot about writing software around regulatory intricacies. When I learned about an interesting U.S. law, I figured it’d be a fun weekend project to see if it could be used as the base of a provably profitable horse racing strategy. Here’s what I found.
First, some background on horse racing and betting on races. Most importantly, race betting in America is parimutuel. When you place a race bet with FanDuel, DraftKings, or at the track, you are not actually betting against them. Instead, your money is pooled with that of all other gamblers who placed the same type of bet. The bet taker (e.g., the track) removes and keeps a fixed percentage of money from the pool (the “takeout” or “vig”), and the remaining money is paid out to the winning bettors. We’ll dive deeper into how those payouts are calculated, but this is extremely different than betting on sports. Some of the key differences:
Your odds (and thus your potential payout) change after you place your bet. Your payout is based on the pool at the time the race starts, so odds shift based on bets placed after yours.
The “takeout” percentage is brutal. It varies, but is incredibly high across the board. Churchill Downs (the host of the Kentucky Derby) takes out 17.5%, which is pretty standard. For comparison, betting on an NFL point spread (at -110 odds) has a “vig” of 4.76%. Betting on horses is like if NFL spreads were priced at around -140.
There is “breakage”. To make things even worse, payouts are rounded down to the nearest dime. 1:9 (the lowest odds you’ll see) is actually paid out at 1:10.
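The takeout comparison above can be checked with a few lines of Python. This is a minimal sketch that assumes “vig” means the overround of a two-way market (the sum of both sides’ implied probabilities, minus one); the function names are mine, not from any sportsbook API.

```python
def implied_probability(american_odds: int) -> float:
    """Break-even win probability implied by an American odds price."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def overround(american_odds: int) -> float:
    """Margin of a two-way market where both sides carry the same price."""
    return 2 * implied_probability(american_odds) - 1


def equivalent_two_way_price(margin: float) -> float:
    """Negative American price on both sides that would produce this margin."""
    p = (1 + margin) / 2  # implied probability each side must carry
    return -100 * p / (1 - p)


print(f"{overround(-110):.2%}")                  # 4.76%
print(round(equivalent_two_way_price(0.175)))    # -142
```

Under that assumption, a 17.5% margin works out to roughly -142 on both sides, in line with the -140 comparison.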
This has some big impacts:
There’s no such thing as locking in good odds early. Odds change after you bet, so even if you had insider info on a horse, you would gain no first-mover advantage. This uncertainty causes a large rush of bets right before the race starts.
To overcome the takeout and breakage, you’d need to be right an incredibly high percentage of the time. If you’re betting on a horse that is priced at even money (i.e., a horse with an implied 50% chance of winning), you’d need to be correct more than 57% of the time.
This makes horse betting impossible to win at without inside information, true divine inspiration, or legislative intervention. Let’s explore the last one.
Here’s an example pool for a “show” bet, which is betting on a horse to finish in the top 3.
Let’s assume Homer finishes in the top 3. To calculate winnings, you:
Take the total pool size ($1,030)
Take out the “vig” (17%, leaving you with $854)
Subtract the total amount bet on the horses that finish in the top 3 ($854 - $1,020, for -$166)
Divide that by the number of winners (-166 by 3, for -$55)
Divide that by the amount bet on the horse you bet on (e.g., -55 by 1000 for Homer, or -55 by 10 for Marge)
Wait a minute, that’s a negative number.
In certain situations where there’s an extremely heavy favorite, the calculated payout is negative. If you bet a dollar on Homer, the math has you receiving 95 cents back, even though your bet won. This is called a minus pool.
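Here is a minimal sketch of that show-pool arithmetic in Python, using the numbers from the example. The function name and the “third horse” label are placeholders of mine; the $10 on that horse is inferred from the $1,020 total bet on the top three. Breakage is ignored.

```python
def show_profit_per_dollar(pool_total, takeout, bets_on_top3, horse):
    """Profit per $1 bet on `horse` in a show pool, before breakage or minimums."""
    net_pool = pool_total * (1 - takeout)                 # pool after the vig
    profit_pool = net_pool - sum(bets_on_top3.values())   # what is left after returning stakes
    share = profit_pool / len(bets_on_top3)               # split evenly across the top 3
    return share / bets_on_top3[horse]                    # spread over that horse's bettors


# Amounts bet on the three horses that finished in the top 3.
# "third horse" is a placeholder name (assumption).
top3 = {"Homer": 1000, "Marge": 10, "third horse": 10}

profit = show_profit_per_dollar(1030, 0.17, top3, "Homer")
print(f"profit per $1 on Homer: {profit:.3f}")   # -0.055
print(f"returned per $1: {1 + profit:.2f}")      # 0.94
print("minus pool:", profit < 0)                 # minus pool: True
```

The unrounded result is about 94.5 cents per dollar, which rounds to the 95 cents mentioned above.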
Losing money on a winning bet doesn’t sound fair. The government agrees. U.S. gaming commissions impose minimum payout rules. A $1 bet needs to pay at least $1.05. For tracks in West Virginia, it’s a minimum of $1.10. The difference between the pool and the minimum payout has to be paid by the track.
This sidesteps the aforementioned issues that make horse racing so hard to beat. When you bet a favorite in a minus pool, you are getting fixed odds (-1000 in West Virginia, -2000 elsewhere), inverting the “vig”, and forcing the track to take a side. This doesn’t inherently make betting the favorite in a minus pool profitable, but it opens up the potential for it to be.
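To see where those fixed odds come from, here is a small sketch that converts a minimum payout per $1 into American odds. The helper name is my own.

```python
def american_odds_from_payout(payout_per_dollar: float) -> int:
    """American odds for a bet that returns `payout_per_dollar` (stake included) per $1."""
    profit = payout_per_dollar - 1
    if profit >= 1:
        return round(profit * 100)   # underdog-style price, e.g. +150
    return round(-100 / profit)      # favorite-style price: amount risked to win $100


print(american_odds_from_payout(1.10))  # -1000
print(american_odds_from_payout(1.05))  # -2000
```

A $1.10 minimum means risking $1,000 to win $100, and a $1.05 minimum means risking $2,000 to win $100.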
For this strategy to be profitable and usable, a couple of things need to be true.
Minus pools need to exist in real life. If they only occur once in a blue moon, then trying to identify and bet on them is impractical and effectively pointless. We need them to happen regularly for us to have enough historical data to calculate expected value, and also to be able to actually have something to bet on.
Favorites need to win, and win a lot. In West Virginia races (which have the most favorable laws), your “show” bets need to win more than 90.91% of the time.
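The break-even win rate follows directly from the minimum payout: a bet that returns a fixed amount per $1 breaks even when win rate times payout equals the $1 staked. A minimal sketch, with a function name of my choosing:

```python
def break_even_win_rate(payout_per_dollar: float) -> float:
    """Win rate at which a fixed-payout $1 bet has zero expected profit."""
    return 1 / payout_per_dollar


print(f"{break_even_win_rate(1.10):.2%}")  # 90.91%
print(f"{break_even_win_rate(1.05):.2%}")  # 95.24%
```

The $1.05 minimum used elsewhere pushes the bar up to about 95.24%, which is why the West Virginia rule is the more favorable one.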
So, are those two things true? I had no clue, but the fun is in finding out!
Quick aside: the current state of horse racing is problematic. While this strategy would potentially be taking money from the tracks, race betting is fun but ultimately pointless. There are much more important problems to solve, and we’re tackling some of them at Healthie. Come join our team!
To answer those two questions, we need data, and a lot of it. Does horse racing generate enough data points, and is that data accessible? The answer, surprisingly to me, was a resounding yes.
I was blown away by the breadth, maturity, and openness of race betting in the U.S. The first surprise was the number of bettable horse races. There are tracks all over the world, and with the range of time zones, races are happening 24/7. No matter what site you bet on, or even if you bet in person, you are betting into the same pool. You can watch any of these races online, completely for free. There are 1,230 games in the six-month NBA regular season. There are more than 3,000 horse races every week.
Data is powerful, and horse racing generates a ton of data. The data is also surprisingly easy to access. There are big businesses built on buying and selling clean and accurate historical sports datasets. You can write simple code to ingest years of racing data directly from the primary source.
I like writing code, so I did that. I spent a few hours building a data set of tens of thousands of past results. With that, these questions become straightforward to answer.
Minus pools happen with decent frequency. There have been more than 130 in the past seven days.
Favorites win a lot, but not enough. Looking at all minus pools in the data set, the favorite finishes in the top 3 (i.e., wins the “Show” bet) about 80% of the time. That’s far below the 90.91% we need to be winners.
The good news is, we have tons of data! That 80% number is across all races. We can further segment the races to try and identify which ones would be profitable. There are a couple key factors that stood out.
How many horses are running. The Kentucky Derby has up to 20 horses in the race. That’s a lot! The number of horses varies between races, and it is common to have just five or sometimes even four horses in a race. A “show” bet is a bet on a top-three finish, which should be much easier to predict in a race with fewer horses.
How much has been bet in the pool. Some races have under $100 bet on them. The efficiency of horse racing markets is a whole separate topic, but while they can be quite inefficient, a ten-thousand-dollar pool seems like it would be a stronger signal than a fifty-dollar one.
How much of the pool is concentrated on the top three favorites. It feels easier to predict a top three finish in a five horse race with two extreme long-shots.
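Segmenting on those three factors could look something like the sketch below. The `MinusPoolRace` record and its field names are a hypothetical schema for illustration, not the structure of the actual data set.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class MinusPoolRace:
    """One historical minus-pool race (hypothetical schema)."""
    field_size: int          # number of horses that ran
    show_pool: float         # dollars in the show pool
    top3_share: float        # fraction of the pool on the three biggest favorites
    favorite_showed: bool    # True if the favorite finished in the top three


def show_hit_rate(
    races: Iterable[MinusPoolRace],
    max_field: int,
    min_pool: float = 0.0,
    min_top3_share: float = 0.0,
) -> Optional[float]:
    """Share of matching races where the favorite showed, or None if none match."""
    matching = [
        r for r in races
        if r.field_size <= max_field
        and r.show_pool >= min_pool
        and r.top3_share >= min_top3_share
    ]
    if not matching:
        return None
    return sum(r.favorite_showed for r in matching) / len(matching)
```

Each filter is optional, so the same function can compare a loose segment (small fields only) with a stricter one (small fields, large pools, concentrated money) against the break-even rate.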
Playing around with these variables, I determined two situations where betting the favorite in a minus pool at a West Virginia track is profitable.
The first is when a race only has four horses. This makes sense, since only one horse will finish outside of the top three. Four-horse races that offer “Show” bets are rare, and are normally the result of a horse being removed from the race at the last minute. Historically, the minus pool “show” bet wins 93.75% of the time (above the all-important 90.91%). This percentage goes up when you only look at races with a pool over $1,000.
The second is five-horse races with a large pool (greater than $2,000) where more than 90% is concentrated on the top three favorites. This bet wins 91.08% of the time, which, while lower, is still above the break-even point. That’s pretty neat.
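To put those win rates in dollar terms, here is a rough expected-value sketch for a $1 bet at the West Virginia minimum payout of $1.10. It assumes every winning bet pays exactly the minimum and uses the win rates quoted in this post; the function name is mine.

```python
def expected_profit_per_dollar(win_rate: float, payout_per_dollar: float = 1.10) -> float:
    """Expected profit on a $1 bet: collect the payout when it wins, lose the $1 stake either way."""
    return win_rate * payout_per_dollar - 1


for label, win_rate in [
    ("all minus pools", 0.80),
    ("four-horse races", 0.9375),
    ("five-horse, large concentrated pool", 0.9108),
]:
    print(f"{label}: {expected_profit_per_dollar(win_rate):+.1%}")
# all minus pools: -12.0%
# four-horse races: +3.1%
# five-horse, large concentrated pool: +0.2%
```

The edge in both profitable segments is thin: about three cents per dollar in the best case and a fraction of a cent in the other.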
There’s almost certainly room to improve those percentages (or find more scenarios) by digging deeper into other variables. Races vary in length, riding style, track surfaces, and horse ages all in ways that could potentially impact outcomes.
At the end of the day, researching and writing code to dig into this is a lot more fun than using the strategy would be. This is the definition of picking pennies up in front of a steamroller. You need ten straight wins to offset one loss. Even if executed perfectly, you’d be dealing with an immense amount of variance, just to earn less money than you would by keeping a bankroll in a high-yield savings account.
I’ve only been to a horse race once. My granddad took me when I was sixteen, and I had a lot of fun getting him to place $2 bets based on which horse name I liked the most. This strategy is a little more data-driven, but of pretty similar utility. Diving deep here likely won’t lead to any money, but it will make you a great conversationalist on West Virginia gaming regulation. What’s better than that?
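The steamroller math can be sketched in a few lines. This assumes a $1.10 payout per $1 and the 93.75% win rate from the four-horse segment; the function names are mine.

```python
import math


def wins_to_offset_one_loss(payout_per_dollar: float) -> int:
    """Winning $1 bets needed to recover a single lost $1 stake."""
    return round(1 / (payout_per_dollar - 1))


def profit_std_dev(win_rate: float, payout_per_dollar: float) -> float:
    """Standard deviation of profit on one $1 bet.

    The two outcomes (+payout-1 and -1) are `payout_per_dollar` apart,
    so this is a scaled Bernoulli standard deviation.
    """
    return payout_per_dollar * math.sqrt(win_rate * (1 - win_rate))


print(wins_to_offset_one_loss(1.10))             # 10
print(f"{profit_std_dev(0.9375, 1.10):.2f}")     # 0.27
```

A standard deviation of about 27 cents per dollar against an expected profit of about 3 cents is what the variance looks like on a single bet.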