http://www.powerball.com/powerball/pb_nbr_history.asp
I downloaded the drawing history to a flat text file. Unfortunately, numerical analysis of all 1900 drawings is not easily done, because the rules, including the number of balls used for the drawing, have changed quite frequently.
https://en.wikipedia.org/wiki/Powerball
If you wanted to use the full sample for analysis, you would have to go through all these rule changes and weight the balls accordingly throughout the history of the game, which is something I am not going to do on a Sunday morning. The last rule change was October 7th of 2015, which leaves me a meager sample of 28 drawings. Nonetheless, I generated the numbers from this set of drawings. I whacked together a C++ program to count the number of times each ball was drawn.
// powerball.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <cstdio>   // printf, sscanf_s
#include <cstring>  // strlen
#include <fstream>
#include <map>

int _tmain(int argc, _TCHAR* argv[])
{
    std::ifstream lStream( _T("c:\\temp\\powerball2.txt"), std::ios::in );
    // Check for successful open.
    if ( ! lStream.is_open () )
    {
        printf ("open failed\n");
        return 1;
    }
    // Draw counts: white balls are numbered 1-69, Powerballs 1-26.
    std::map<int, int> lFreq;
    std::map<int, int> lPowerFreq;
    for (int i = 1; i < 70; i++)
    {
        lFreq[i] = 0;
    }
    for (int i = 1; i < 27; i++)
    {
        lPowerFreq[i] = 0;
    }
    // Read one drawing per line. Looping on getline itself (rather than
    // testing eof() first) also stops cleanly if a read fails.
    char lLineBuf[_MAX_PATH];
    while (lStream.getline (lLineBuf, sizeof(lLineBuf)))
    {
        if (strlen(lLineBuf))
        {
            int lBall1, lBall2, lBall3, lBall4, lBall5, lBall6;
            char lDateBuff[_MAX_PATH];
            // Expect a date, five white balls, and the Powerball. The header
            // line fails this parse and is skipped; the PP column is ignored.
            if (sscanf_s(lLineBuf, "%s %d %d %d %d %d %d", lDateBuff, (unsigned)sizeof(lDateBuff), &lBall1, &lBall2, &lBall3, &lBall4, &lBall5, &lBall6) == 7)
            {
                lFreq[lBall1]++;
                lFreq[lBall2]++;
                lFreq[lBall3]++;
                lFreq[lBall4]++;
                lFreq[lBall5]++;
                lPowerFreq[lBall6]++;
            }
        }
    }
    // Print the tallies in a form that is easy to load into a spreadsheet.
    for (int i = 1; i < 70; i++)
    {
        printf ("freq %d %d\n", i, lFreq[i]);
    }
    for (int i = 1; i < 27; i++)
    {
        printf ("powerfreq %d %d\n", i, lPowerFreq[i]);
    }
    return 0;
}
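The program above leans on Visual C++ specifics (stdafx.h, _tmain, sscanf_s). For anyone on another compiler, here is a rough portable sketch of the same tally using only standard streams. The names Tally, addDrawing and tallyStream are my own for illustration, and the range checks (1-69 and 1-26) are an addition the original program does not make.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Tallies for the white balls (1-69) and the Powerball (1-26).
struct Tally {
    std::map<int, int> white;
    std::map<int, int> power;
    int drawings = 0;
};

// Parse one line of the form "MM/DD/YYYY WB1 WB2 WB3 WB4 WB5 PB [PP]".
// Returns false, leaving the tally unchanged, for header lines, blank
// lines, or out-of-range values. The trailing PP column is ignored.
bool addDrawing(const std::string& line, Tally& tally)
{
    std::istringstream in(line);
    std::string date;
    int wb[5];
    int pb = 0;
    if (!(in >> date >> wb[0] >> wb[1] >> wb[2] >> wb[3] >> wb[4] >> pb))
        return false;
    for (int b : wb)
        if (b < 1 || b > 69) return false;
    if (pb < 1 || pb > 26) return false;

    for (int b : wb) ++tally.white[b];
    ++tally.power[pb];
    ++tally.drawings;
    return true;
}

// Tally every line of an input stream (a file, std::cin, or a string stream).
Tally tallyStream(std::istream& in)
{
    Tally tally;
    std::string line;
    while (std::getline(in, line))
        addDrawing(line, tally);

    // Sanity check: every accepted drawing contributes exactly one Powerball.
    int powerTotal = 0;
    for (const auto& kv : tally.power) powerTotal += kv.second;
    assert(powerTotal == tally.drawings);
    return tally;
}
```

Passing a std::ifstream opened on powerball2.txt to tallyStream should give the same counts as the original program, since the header line fails to parse and is skipped in both versions.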
Input for the program in powerball2.txt was as follows:
Draw Date WB1 WB2 WB3 WB4 WB5 PB PP
01/09/2016 32 16 19 57 34 13 3
01/06/2016 47 02 63 62 11 17 3
01/02/2016 42 15 06 05 29 10 2
12/30/2015 12 61 54 38 36 22 3
12/26/2015 65 40 44 59 27 20 2
12/23/2015 67 16 63 38 55 25 4
12/19/2015 30 68 59 41 28 10 2
12/16/2015 09 42 10 55 32 06 2
12/12/2015 62 02 30 19 14 22 2
12/09/2015 16 46 10 56 07 01 2
12/05/2015 47 33 68 27 13 13 2
12/02/2015 14 18 19 64 32 09 2
11/28/2015 47 02 66 67 06 02 3
11/25/2015 53 16 69 58 29 21 2
11/21/2015 37 57 47 50 52 21 3
11/18/2015 40 17 46 69 41 06 2
11/14/2015 66 37 22 14 45 05 3
11/11/2015 26 04 32 55 64 18 3
11/07/2015 50 53 07 16 25 15 2
11/04/2015 12 02 17 20 65 17 4
10/31/2015 09 47 20 25 68 07 2
10/28/2015 56 62 54 63 04 10 2
10/24/2015 20 31 56 64 60 02 3
10/21/2015 57 32 30 42 56 11 4
10/17/2015 57 62 69 49 48 19 3
10/14/2015 20 15 31 40 29 01 2
10/10/2015 27 68 12 43 29 01 2
10/07/2015 52 40 48 18 30 09 3
I wrote the program's output to a text file and loaded it into Excel. Here is the graph of the white balls and the number of times each has been picked.
Here is the graph of the Powerball picks.
With my sample of 28 drawings, each of the 69 white balls should have been picked about 2.03 times on average (28 * 5 / 69), and each of the 26 Powerball numbers about 1.08 times (28 / 26).
Counting yesterday's drawing, here are the results.
16 - picked 5 times. Well above the 2.03 average.
19 - picked 3 times. Above average.
32 - picked 5 times.
34 - picked 1 time. Below average, finally.
57 - picked 4 times.
Powerball 13 has been picked 2 times. Above average.
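The averages used above are simple arithmetic, and a few lines of C++ make them easy to recompute as the sample grows. This is only a sketch: expectedCount and report are names I made up, and the pool sizes (69 white balls, 26 Powerballs) are the ones the counting program assumes.

```cpp
#include <cassert>
#include <cstdio>

// Expected number of times one particular ball shows up when
// `picksPerDrawing` distinct balls are drawn from a pool of `poolSize`
// balls, repeated over `drawings` drawings.
double expectedCount(int drawings, int picksPerDrawing, int poolSize)
{
    assert(poolSize > 0 && picksPerDrawing <= poolSize);
    return static_cast<double>(drawings) * picksPerDrawing / poolSize;
}

// Print how an observed count compares with the expected count.
void report(const char* label, int observed, double expected)
{
    std::printf("%s: picked %d, expected %.2f, ratio %.2f\n",
                label, observed, expected, observed / expected);
}
```

For example, expectedCount(28, 5, 69) is 140/69, or roughly 2.03, and expectedCount(28, 1, 26) is 28/26, or roughly 1.08. Calling report("16", 5, expectedCount(28, 5, 69)) shows ball 16 at about two and a half times its expected count.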
The strategy of picking numbers that haven't been picked before clearly would not have worked yesterday. That said, it would be interesting to analyze the full data set and see if any real trends can be observed. This simple exercise opens the door for all sorts of numerical analysis. I am not sure if I want to go down that hole.
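One possible first step down that hole, which I have not actually run on the data, would be a Pearson chi-square statistic comparing the observed counts against a uniform expectation. The sketch below is an assumption about how one might start, not something the program above does, and chiSquareUniform is a made-up name. Two caveats: with an expected count of only about 2 per white ball, the usual chi-square approximation is unreliable, and the five white balls in a drawing are picked without replacement, so this is a rough screen at best.

```cpp
#include <cassert>
#include <vector>

// Pearson chi-square statistic for observed counts against a uniform
// expectation: the sum over all cells of (observed - expected)^2 / expected,
// where expected is the total count divided by the number of cells.
// The result would be compared against a chi-square distribution with
// (number of cells - 1) degrees of freedom.
double chiSquareUniform(const std::vector<int>& observed)
{
    assert(!observed.empty());
    long total = 0;
    for (int count : observed) total += count;
    const double expected = static_cast<double>(total) / observed.size();
    assert(expected > 0.0);

    double statistic = 0.0;
    for (int count : observed) {
        const double diff = count - expected;
        statistic += diff * diff / expected;
    }
    return statistic;
}
```

Feeding it the 69 white-ball counts (or the 26 Powerball counts) from the frequency output gives a single number summarizing how far the sample strays from perfectly even picks. A perfectly even set of counts yields 0.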
