For the last couple of months I’ve been working on developing my own suite of online poker tools. It’s something that I think a lot of software developers who play the game try at some stage of other – the premise of poker is so simple but it almost invites efforts at simulation and modelling, yet it is difficult to actually get it right. I’ve made starts on similar programs before but it’s only recently that I have become serious about producing something to a commercial grade.
Right back when I started, one of the first things that I wanted to be able to do was a simple probability calculation; if I have these two cards and my opponent has these two cards and there are these three cards already on the board, what is the probability that I win the hand?
Of course, there are plenty of free tools online which already do this, but that wasn’t really the point of my efforts – I had to start somewhere and if I couldn’t manage to create something as simple as an odds calculator then I had no chance of building anything more advanced.
It’s not as easy as it sounds. The obvious approach is iteration – create a calculator that works out the value of a hand and then iterate through all the possible outcomes and keep count of how many times each player wins. Simple, right?
Yet my first attempt at this was dreadful. It did the job, just not fast enough. It would return the value of an individual hand in less than a tenth of a millisecond – but to evaluate the probability of an outcome took it several minutes.
Why? Because if two players are holding two cards each preflop, there are 1,712,304 possible 5 card combinations for the board. That’s almost 3.5 million calculations required (since there are 2 players). Even at a hundredth of a millisecond per calculation, that would still take over half a minute – which is not an acceptable response time. So the hand calculation algorithm needs to be faster.
Fortunately, there are a number of ways in which we can speed things up.
Using bitwise logic to create a fast poker hand evaluator
The problem with my initial calculator was that it was doing a lot of internal iteration. I stored each card as a value in an array and calculated how many cards of a particular denomination or suit the array contained. I used standard arithmetic and analysis via cascading ‘if’ statements e.g. “if there are 4 cards of the same value, then…”. This is tedious to code and time-consuming in processing terms.
But these calculations can be abstracted. There are 52 cards in a deck. An unsigned long integer has 64 bits (a least, it does in C#, my development language of choice). What if you assigned one bit per card? So the first bit represents 2 of clubs, the second bit represents the 3 of clubs… all the way through to ace of spades.
That way, you can store a hold hand in a single unique number. If binary 100000000 is the 10 of clubs, then 111110000 is a straight flush: or Tc-9c-8c-7c-6c to use standard poker notation.
If you have a full deck of cards (111111111… etc), you can ‘deal’ a cardfrom the deck using the ^ (NOR) operator (e.g. “deck ^= card”), and when you want to add that card to the board you can combine it using the | (OR) operator (e.g. “board |= card”).
You can even use the shift operator to repeat operations for different suits. Consider the code below:
int DIAMONDS = 13;
uint sd = (uint)((deck >> DIAMONDS) & 0x1fffUL);
Assuming that ‘deck’ is an unsigned long int whose bits represent all cards remaining in the deck, this simple operation will return to you the denominations of all diamonds remaining in the pack – e.g. 1101000110011 would represent the Ad, Kd, Jd, 7d, 6d, 3d and 2d.
Storing precalculated values to save on run-time operations
Since you can store a set of cards in a single number, you can also store the results to common calculations in advance for quick and easy access. In the example above, the binary representation of 7 cards evaluates to 6539 in standard decimal. You can save this information by creating an array of length 2^13 and storing the value ’7′ at index 6539. Do this for every other variation and then when you want to know if you have a flush, you simple use the code:
uint sd = (uint)((deck >> DIAMONDS) & 0x1fffUL);
bool isFlushDiamons = (bitCount[sd] >= 5);
You can perform similar operations to tell you the value of the highest card and the number of consecutive denominations (i.e. do you have a straight?).
In theory you can even take this further and create a hash for each unique 7 card value and pre-calculate the result. However there are 133 million possible combinations which presents a couple of problems. Firstly, is retrieving a value from such a large pool going to be any faster than doing the calculation? Secondly, assuming each hand value takes up 4 bytes of data (as you have to store information on kickers as well), do you really want the application to take up 633 MB of disk space?
I decided I didn’t. Unless…
Avoiding duplicate calculations by removing suits from the equation
Another step which will vastly improve the speed of calculation (and significantly reduce the amount of space required to store the results on disk) is to avoid iterating over every card combination and instead iterate only over unique numerical combinations. That means that rather than dealing with 52 different items that can be dealt only once, we are dealing with 13 different items that can be deal up to 4 times. The total number of iterations remarkably drops from 1,7 million to a little over 6,000 – or about 0.35%.
We still need to take into account flushes and straight flushes, but we can do this using simple mathematics (rather than iteration). Take the combination 5,6,7,8,9. Assuming there are 4 cards of each left in the deck, there are 256 (4 ^ 5) different ways of dealing these denominations (in order). Say we have two diamonds. The probability of dealing at least 3 diamonds can be calculated using the binomial distribution function which, in this case, gives a result of around 10.35%.
So we know that on 10.35% of 256 occasions we will get a flush, and the remainder of that 256 we get a straight. When we do the probability calculation, we calculate the number of wins for each player based on these figures.
This last tip is particularly useful when attempting to calculate hands in games like Omaha. In Omaha, due to having 4 hole cards and the requirement to use exactly 2 of them in your hand, there are 10 different board combinations and 6 hole combinations that need to be evaluated for each iteration. This results in about 100 million calculations if you do things purely by iteration. By abstracting the cards to be values only and calculating the flush and straight values as probability percentages, you can reduce the number of calculations required down to close to 600,000 and keep the calculation at under a second.
When all else fails… hand sampling
For all of this, there are still some calculations for which I could not devise a fast algorithm. Up to this point I have discussed calculations for Texas Holdem, but having just mentioned Omaha I should mention other games which I have considered – specifically, 7 card stud.
In 7 card stud each player is dealt 7 cards. If each player has one known card and 6 unknowns then the number of possible outcomes is several orders of magnitude more than in Texas Holdem poker – somewhere in the trillions, in fact. To try an iterate through this many possibilities, even with the shortcuts mentioned, is madness – and I discounted a purely mathematical approach as overly complex and not worth the effort.
This left me with hand sampling. By generating a number of random hands and tallying the results we can calculate an estimated probability with a given confidence level. The bigger the sample, the more accurate our result. I settled on a sample size of 3 million, as this gave accuracy to within around 0.3% while still running fast enough to perform the calculation in less than a second. I tested my end product against some already available calculators and discovered that it was actually more accurate – I can only assume that they also used a sampling approach but based results on a smaller sample size.
So there you have it. My code for the calculation has evolved somewhat beyond the tenets above as, ultimately, I wanted to do more than just calculate hand value probabilities – but hopefully the principles outlined are enough to get a casual developer started.Read
To draw a poly line between two points on a blank canvas.
1. The line can only move in the vertical or horizontal direction.
2. The canvas may have other objects, such as rectangles, circles which must not be crossed.
3. The poly line must be as short as possible.
The problem was posed to me by a potential client based in Australia. At the time I assumed it was a real world logistics issue abstracted into a geometric puzzle. In truth, it was an interview question – he was testing the mental agility and communication skills of potential recruits for his remote development team. I had limited time to devise the solution but thankfully I passed the interview.
I created a test environment using the .Net Drawing library on a C# Winforms Canvas, which would allow me to visually represent shapes, nodes and vectors. Then I set about building the algorithm, which could be divided into 3 more or less distinct steps
1) Create a perpendicular grid that traces the edges of the polygon obstacles
The smallest possible rectangle that fully encapsulates a polygon without actually touching (intersecting it) is given by the points (Xmin – 1, Ymin -1), (Xmin – 1, Ymax + 1), (Xmax + 1, Ymax + 1), (Xmax +1, Ymin – 1), where Xmin and Xmax are the smallest and largest X co-ordinate of the polygon, and Ymin and Ymax are the smallest and largest Y co-ordinates.
If the polygon presents an obstacle to an otherwise straight line between A and B, then that line will intersect with the given rectangle at 2 points. Tracing a route from these 2 points along the sides of this rectangle will give the shortest possible route to bypass the polygon. (Note, that it will not be the ONLY shortest possible route – there will be other routes equally as short – but there will be no shorter route)
Once the points defining these rectangles have been plotted, you can extend the lines all the way to the edge of the grid, creating a lattice like the one in the diagram below.
2) ‘Reflect’ any lines which intersect with a polygon along the perpendicular
Each of the polygons is defined by a series of vectors, so we can test where a line enters/exits the bounds of a polygon by testing the point of intersection with each of the vectors defining a polygon. Two infinitely long, straight, non perpendicular lines will always have an intersection point and – assuming each line is defined by Ax + By = C – you can find this with the code:
float delta = A*line.B – line.A*B;
if (delta == 0) return null;
int x = (int)((line.B*C – B*line.C)/delta);
int y = (int)((A*line.C – line.A*C)/delta);
return new Point(x, y);
Of course, you also need to check that the intersection point falls within the two corners that define the endpoints of the polygon’s side. The most efficient way to do this is a greater than/less than statement that’s not nearly so pretty:
return (((point.X <= Point1.X && point.X >= Point2.X) ||
(point.X >= Point1.X && point.X <= Point2.X)) &&
((point.Y <= Point1.Y && point.Y >= Point2.Y) ||
(point.Y >= Point1.Y && point.Y <= Point2.Y)))
Once the algorithm has calculated the intersection point of the line with any polygons, it diverts the line along the perpendicular until it encounters another obstacle, and records any points on intersection with other lines already traced by the lattice. To do this, as well as the point of intersection, the code needs to know which side of the line represents inside the polygon and which side represent outside.
To this, assuming that the line starts on opposite edges of the canvas and each start point is outside the polygon (conditions that you can enforce by ensuring that any polygon which falls partially outside the bounds of the canvas is cropped along the canvas edge), you simply count the number of points of intersection with the polygon on each side. If there are an odd number of intersections, that direction leads into the polygon; if there are an even number, that direction leads out of the polygon. (NB: this approach doesn't work 100% - corners and perpendicular edges can cause problems. See endnote).
Knowing this, we can chop the line into two, erasing the section that falls inside the polygon, and creating a new line that runs perpendicular to the original and away from the polygon. This creates a grid like the one below:
3) Use Djisktra’s algorithm to find the shortest route along the grid
The resultant grid can then be divided into a series of vectors which run between points of intersection between the lines. In the diagram below, each point of intersection between two black lines or the edge of the grid is considered a node and a single vector runs between edge pair of nodes connected by a black line that does not cross another node.
We can now abstract the problem into a node-web that can be traversed using Dkisktra’s algorithm, where the differences in X and Y co-ordinates can be translated into distances. My implementation of the algorithm is below:
And the result?
Improving the algorithm
I tested the solution for 10 to 12 problems and, after tweaking phase 2 a little, had something which was running well enough for most arrangements but which still potentially contained a couple of bugs. However, I had limited time in which to work and it had been made clear to me that my first submission did not need to be perfect, so I went ahead and submitted it.
The main issues that the algorithm could encounter are with the second phase, where the arrangement of polygons has has the potential to throw it off.
Firstly, where a line intersects a corner without passing directly into the polygon, there will be only 1 point of intersection with the polygon, which will create confusion with the code that determines whether a line is inside or outside the polygon’s bounds.
Secondly, there is an assumption that all shapes are polygons which can be defined as a series of nodes and vectors. If you look at the original problem description, this is not the case – shapes could be circles or simple contain curved surfaces. More sophisticated logic is required to calculate the points of intersection.
Thirdly, there are some arrays of shapes which, even if the intersection logic is perfect, could lead to vector lines tracing routes being ‘reflected’ away from openings. For example, consider two interlocking crescents or spirals, spaced so that there is a path between them but, due to their concave surfaces, the line that the algorithm attempts to trace between them is constantly diverted away.Read