Author |
Message |
Sandy

Joined: 03 Feb 2006 Posts: 673 Location: San Diego, CA
|
Posted: Mon Feb 22, 2010 12:34 am Post subject: Show 257 |
|
|
The conversation continues here... _________________ Cachin' with my sweetie...
Sandy of Team PodCacher |
|
Back to top |
|
 |
batsgonemad

Joined: 30 Jan 2008 Posts: 279 Location: Northern Ireland
|
Posted: Mon Feb 22, 2010 1:36 am Post subject: |
|
|
FTP, sweet, downloading show now, making coffee, then back to bed for a little while still _________________
 |
|
|
 |
batsgonemad

Joined: 30 Jan 2008 Posts: 279 Location: Northern Ireland
|
Posted: Mon Feb 22, 2010 4:29 am Post subject: |
|
|
Ended up listening to the show on Stitcher _________________
 |
|
|
 |
ePeterso2

Joined: 30 Jun 2009 Posts: 143 Location: 26.1ºN 80.1ºW 98.6ºF
|
Posted: Mon Feb 22, 2010 8:20 am Post subject: |
|
|
If anyone's interested in the math behind the One Million Caches forecast page, here's how it works ...
I started collecting data on Feb 8 after I first heard about the contest. I kept a record of the number of active caches each day, and I began plotting them on a graph (date on the horizontal axis, caches on the vertical axis). I figured that once I had enough data, I'd see if there was any kind of pattern.
When I had a week's worth of data, it seemed to be fairly linear - that is, the plot on the graph ended up pretty close to a straight line. But it's not exactly a straight line - there is a bit of variation from day to day. (The R^2 number is a measure of how well the line fits the data - the closer that value is to 1.0, the better the line matches the actuals.)
I used a mathematical technique called linear regression to find the straight line that most closely fits the observed data and minimizes the variation. I then project that line into the future to find the point where it crosses the 1,000,000 mark. That's how I arrive at the most likely date.
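A minimal Python sketch of the regression step described above: fit a least-squares line to (day, cache count) pairs, then solve for the day on which that line reaches 1,000,000. The function names and the sample counts are made up for illustration; they are not the real data or the code behind the forecast page.

```python
from datetime import date, timedelta

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit, R^2 is the squared correlation coefficient.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

def crossing_day(slope, intercept, target):
    """Day offset (possibly fractional) at which the fitted line reaches target."""
    return (target - intercept) / slope

# Made-up counts for illustration only (not the real geocaching data).
start = date(2010, 2, 8)
days = [0, 1, 2, 3, 4, 5, 6]
caches = [980000, 980520, 981010, 981490, 982030, 982500, 983040]

slope, intercept, r2 = fit_line(days, caches)
offset = crossing_day(slope, intercept, 1_000_000)
forecast = start + timedelta(days=round(offset))
print(f"slope={slope:.1f}/day  R^2={r2:.4f}  forecast={forecast}")
```

The slope is the estimated number of new caches per day, and the crossing day is simply (target - intercept) / slope.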
Since the data seems to be linear, that means that the difference in the number of caches from one day to the next should be about the same. So I calculated the difference in the cache count each day, then used that to derive the standard deviation in the daily difference, which gives you an idea of how much the future data is likely to vary (assuming, of course, that the future data are measures of the same processes that generated the past data).
The upshot of all this is that the forecast date range is calculated by taking the linear regression line, then adding and subtracting 3 standard deviations of the daily difference data. I don't know if that's a perfectly solid statistical basis for the prediction, but it's probably a reasonable estimate for what the true outcome will be.
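One possible reading of the date-range calculation, sketched in Python: take the standard deviation of the day-to-day differences, shift the fitted line up and down by 3 of those standard deviations, and see where each shifted line reaches the target. The post does not spell out exactly how the 3 standard deviations are applied, so this interpretation is an assumption, as are the function names, the sample counts, and the slope and intercept values.

```python
from statistics import stdev

def daily_differences(counts):
    """Change in the cache count from each day to the next."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]

def crossing_range(slope, intercept, sigma, target, k=3):
    """Days at which the fitted line, shifted up and down by k*sigma, hits target.

    Shifting the line up makes it reach the target sooner (earliest);
    shifting it down makes it reach the target later (latest).
    """
    earliest = (target - (intercept + k * sigma)) / slope
    latest = (target - (intercept - k * sigma)) / slope
    return earliest, latest

# Made-up counts for illustration only (not the real geocaching data).
caches = [980000, 980520, 981010, 981490, 982030, 982500, 983040]
sigma = stdev(daily_differences(caches))

# Hypothetical slope and intercept, standing in for a prior regression fit.
slope, intercept = 505.0, 980000.0

earliest, latest = crossing_range(slope, intercept, sigma, 1_000_000)
print(f"sigma={sigma:.1f}  earliest day={earliest:.2f}  latest day={latest:.2f}")
```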
(No, I didn't run control charts or a Monte Carlo simulation. I figured that by the time I got the Perl code for that to work, the contest entry date would be long gone.)
(FWIW - I ran an XmR chart on the data as of this posting and found that the data does display statistical control, passing all of the run tests. X-CL is 510, X-UCL is 1002, and X-LCL is 18.) _________________ Check out Puzzlehead.org
Information, resources, stories and fun for puzzle solvers and creators
Join the Puzzleheads group on Facebook! |
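For readers unfamiliar with the XmR chart mentioned in the post above, here is a small Python sketch of how its limits are conventionally computed: the centre line is the mean of the individual values, and the limits sit 2.66 average moving ranges on either side. It assumes the charted values are the daily differences in cache count, which the post does not state outright; the sample values and the function name are made up.

```python
def xmr_limits(values):
    """Return (lower limit, centre line, upper limit) for an individuals chart."""
    # Moving range: absolute change between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    centre = sum(values) / len(values)
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    # 2.66 is the standard XmR constant (3 / d2, with d2 = 1.128 for n = 2).
    spread = 2.66 * mr_bar
    return centre - spread, centre, centre + spread

# Made-up daily differences for illustration only.
daily_new_caches = [520, 490, 480, 540, 470, 540, 610, 430]

lcl, cl, ucl = xmr_limits(daily_new_caches)
print(f"X-LCL={lcl:.0f}  X-CL={cl:.0f}  X-UCL={ucl:.0f}")
```

The limits quoted in the post (18, 510, 1002) are symmetric about the centre line, which is consistent with this formula and would imply an average moving range of about 185.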
|
|
 |
CoronaKid

Joined: 04 Aug 2006 Posts: 908 Location: Corona, CA
|
|
|
 |
ePeterso2

Joined: 30 Jun 2009 Posts: 143 Location: 26.1ºN 80.1ºW 98.6ºF
|
Posted: Mon Feb 22, 2010 9:20 am Post subject: |
|
|
Most puzzle-solving is all about pattern recognition. It's a very handy and underappreciated skill, one I wish was emphasized in school more ... _________________ Check out Puzzlehead.org
Information, resources, stories and fun for puzzle solvers and creators
Join the Puzzleheads group on Facebook! |
|
|
 |
Sonny Site Admin

Joined: 03 Aug 2006 Posts: 1375 Location: San Diego, California
|
|
|
 |
ePeterso2

Joined: 30 Jun 2009 Posts: 143 Location: 26.1ºN 80.1ºW 98.6ºF
|
Posted: Mon Feb 22, 2010 7:19 pm Post subject: |
|
|
Okay, you talked me into it - I switched the app to use a Monte Carlo simulation instead. I wasn't comfortable with the linear regression results ... they didn't "feel" right ...
If you really want to know all of the details, Wikipedia has an excellent article on it. Here's the synopsis:
The web page takes the historical data, figures out the average growth rate as well as the variability (standard deviation), then runs a simulation of how growth will happen in the future. It picks growth numbers at random based upon the average and the variability, then remembers how long it took the simulation to reach 1,000,000 caches.
If you run just a single simulation, it doesn't tell you a whole lot. But if you run a lot of simulations and add up the results, a pattern begins to appear in the results. In the web app's case, it runs 10,000 simulations every time you load the page.
What it reports is a table that shows the frequency with which each date ended up as the result of the simulation. So if a particular date shows up as the solution 2,700 times out of 10,000, then that date is estimated to have a 27% chance of being the actual date.
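The simulation described above can be sketched in Python roughly as follows. The post only says that growth numbers are picked at random based on the average and the variability; drawing them from a normal distribution is an assumption here, as are the function names and all the numbers in the example.

```python
import random
from collections import Counter

def simulate_once(current, target, mean_growth, sd_growth, rng):
    """Return how many days one random run takes to reach target."""
    days = 0
    total = current
    while total < target:
        # Assumption: daily growth is normally distributed.
        total += rng.gauss(mean_growth, sd_growth)
        days += 1
    return days

def run_simulations(current, target, mean_growth, sd_growth, runs=10_000, seed=None):
    """Run many simulations and count how often each day count comes up."""
    if mean_growth <= 0:
        # Without positive average growth the target might never be reached.
        raise ValueError("mean_growth must be positive")
    rng = random.Random(seed)
    return Counter(
        simulate_once(current, target, mean_growth, sd_growth, rng)
        for _ in range(runs)
    )

# Made-up starting count, growth rate and variability for illustration only.
counts = run_simulations(983_000, 1_000_000, mean_growth=510, sd_growth=160, seed=1)
total = sum(counts.values())
for days, hits in sorted(counts.items()):
    print(f"day +{days}: {hits} of {total} runs ({100 * hits / total:.1f}%)")
```

Passing a fixed seed makes a run repeatable; leaving it as None gives slightly different tables each time, which matches the behavior of a page that re-runs the simulation on every load.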
The result is something that looks a lot more reasonable to me ... and will be more accurate as we get closer and closer to the actual date.
The simulation runs from scratch each time you load the page, which is why there's a slight delay, and your results will look a tad different each time. But there's not really any benefit to running it repeatedly - the pattern is basically the same (which is the entire point of a Monte Carlo simulation).
Percentages are rounded off, so dates with likelihoods shown as 0% are actually nonzero, just really really small. If a date doesn't appear, then it never came up as a result of the simulation run.
The page also shows the mode, or the most likely outcome. This date has the best chance of being the actual date. However, a date that has a 25% chance of being correct still has a 75% chance of being incorrect. As we get closer to the real date, the likelihood of the real date should increase more and more.
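A short Python sketch of how the mode and the rounded percentages could be read off a table of simulation results. The counts below are invented, and day offsets stand in for calendar dates; the function name is hypothetical.

```python
from collections import Counter

def summarize(counts):
    """Return (mode, probability of the mode, rounded percentage table)."""
    total = sum(counts.values())
    mode_day, mode_hits = counts.most_common(1)[0]
    table = {day: round(100 * hits / total) for day, hits in sorted(counts.items())}
    return mode_day, mode_hits / total, table

# Invented results of 10,000 runs: day offset -> number of runs ending on that day.
counts = Counter({33: 2700, 34: 4100, 35: 3170, 36: 30})

mode_day, mode_prob, table = summarize(counts)
print(f"mode: day +{mode_day}, chance {mode_prob:.0%}, chance of a miss {1 - mode_prob:.0%}")
for day, pct in table.items():
    print(f"day +{day}: {pct}%")
```

In this made-up table, day 36 came up 30 times, so it is listed, but its share rounds down to 0%. A day that never came up would not appear in the table at all.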
http://www.epeterso2.com/geocaching/onemillioncaches/
(Yes, this actually is fun for me.)
-eP |
|
|
 |
CoronaKid

Joined: 04 Aug 2006 Posts: 908 Location: Corona, CA
|
Posted: Tue Feb 23, 2010 8:28 am Post subject: |
|
|
Great show Triple S! Thanks for the nice interview with the German cachers. It's always interesting to get a global perspective of the sport. I'm glad to hear that Geothief was caught and I'm hopeful that it will help curtail such activity. I'm just worried that this slap on the wrist might only infuriate him and cause him to lash out even more. I guess time will tell.
BTW, am I the only one that has absolutely no idea what Sean said??  |
|
|
 |
batsgonemad

Joined: 30 Jan 2008 Posts: 279 Location: Northern Ireland
|
|
|
 |
addisonbr
Joined: 01 Mar 2008 Posts: 82 Location: New York, New York
|
Posted: Tue Feb 23, 2010 2:10 pm Post subject: |
|
|
| ePeterso2 wrote: |
| The result is something that looks a lot more reasonable to me ... and will be more accurate as we get closer and closer to the actual date. |
It'll be interesting to see if human behavior mucks up the models. I know that when interesting waypoints roll around, a lot of people rapid-register a bunch of new caches to try to capture it... I'm wondering if people trying to publish the "1,000,000th Cache" will somehow cause new cache listing frequencies to deviate from the historical.
It's not obvious to me how it would, only that... it's always the non-obvious things that break my models. |
|
|
 |
CoronaKid

Joined: 04 Aug 2006 Posts: 908 Location: Corona, CA
|
Posted: Tue Feb 23, 2010 2:19 pm Post subject: |
|
|
| batsgonemad wrote: |
| CoronaKid wrote: |
BTW, am I the only one that has absolutely no idea what Sean said??  |
uh no i have no idea either, maybe it's cause we don't have our own kids |
Well, I have two young kids so I don't have that excuse. I do remember that my wife and I were the only ones that understood what my daughter was saying.
I'll be amazed if anyone gets all 5. |
|
|
 |
BuckeyeBeth

Joined: 27 Apr 2009 Posts: 8 Location: Columbus, Ohio
|
Posted: Tue Feb 23, 2010 7:03 pm Post subject: |
|
|
Quick, somebody loan me a toddler so I can figure out what on earth Sean is saying!  |
|
|
 |
Sonny Site Admin

Joined: 03 Aug 2006 Posts: 1375 Location: San Diego, California
|
Posted: Tue Feb 23, 2010 7:35 pm Post subject: |
|
|
| BuckeyeBeth wrote: |
Quick, somebody loan me a toddler so I can figure out what on earth Sean is saying!  |
We've gotten lots of responses from various sources that people can't figure out what Sean is saying. Yes, it might be a good idea to go find a toddler and play the show for them and ask for an interpretation!
Realize, this one is a toughie. Sandy and I have to really pay attention and even then our only hope is that we can see what he's talking about and it's in context.
Bonne chance, Buena suerte and Sana swertehin ka _________________ Have you found it yet? |
|
|
 |
Sonny Site Admin

Joined: 03 Aug 2006 Posts: 1375 Location: San Diego, California
|
Posted: Tue Feb 23, 2010 7:37 pm Post subject: |
|
|
| batsgonemad wrote: |
| Ended up listening to the show on Stitcher |
We were going to ask this on a show: How do you listen to the show? Some at computers, some on MP3 players, others ... ? _________________ Have you found it yet? |
|