14 September 2024


This question was part of the GJ Open Forecasting website.

***How many transit calls will the Panama Canal have between and including 15 July 2024 and 14 August 2024, according to the IMF? Github Link***

My first approach to this question was to take the earlier data and forecast from its average. That turned out to be the right approach (in a sense 🥲), but this question was a typical case where what the data was telling us was slightly misleading given the options I had to choose from. Allow me to explain…

The first dataset I analyzed covered data from July 1st to July 23rd. My approach was to calculate the average value for this period, which came out to be 26.17. Using this average, I then attempted to forecast the remaining days. To do this, I took the sum of the values from July 15th to July 23rd, which totaled 243. With 22 days remaining in the question window (July 24th to August 14th), I multiplied the average (26.17) by the number of remaining days and added it to the existing sum: 243 + (22 * 26.17) ≈ 818.74.
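The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the run-rate projection described in the paragraph: the function name `project_total` and the variable names are hypothetical, and the inputs are the figures quoted above rather than the raw dataset.

```python
from datetime import date

def project_total(observed_sum, daily_avg, days_remaining):
    """Observed total so far plus the daily average extended over the remaining days."""
    return observed_sum + daily_avg * days_remaining

# Question window: 15 July 2024 to 14 August 2024, both ends included.
window_start = date(2024, 7, 15)
window_end = date(2024, 8, 14)
last_observed = date(2024, 7, 23)

window_days = (window_end - window_start).days + 1   # inclusive day count
days_remaining = (window_end - last_observed).days   # 24 July .. 14 August

# 243 is the observed sum for 15-23 July; 26.17 is the 1-23 July average.
estimate = project_total(observed_sum=243, daily_avg=26.17,
                         days_remaining=days_remaining)
print(window_days, days_remaining, round(estimate, 2))
```

Counting the days with `date` objects rather than by hand is a small guard against off-by-one mistakes at the window edges.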

In the second dataset, I shifted my focus to a more recent period, analyzing data from July 15th to July 29th. The average value for this period was slightly higher at 26.533. To make a new estimate, I took the current cumulative sum of 398 and projected the remaining 16 days (July 30th to August 14th). Using the average value, my estimate was 398 + 16 * 26.533 ≈ 822. Alternatively, by rounding the average up to 27 for simplicity, I calculated another estimate: 398 + 16 * 27 = 830. Given the standard deviation of 2.29 for this period, I decided to split the probabilities between two potential ranges: 750-825 and 825-900. I assigned a higher weight to the 825-900 bucket. This was influenced mostly by the crowd’s opinion on the website and my own estimates.

The primary reason the question got tricky has to do with the probability buckets given on the GJ Open website.

The challenge arose from the fact that the buckets were relatively close in range, especially in the middle. With the estimates hovering around the 822-830 mark, they straddled two adjacent buckets: 'at least 750, but fewer than 825' and 'at least 825, but fewer than 900.' This made it difficult to confidently choose one range over the other, especially considering the slight variations in averages and the standard deviation. The narrow margins between these buckets meant that even a minor deviation in the forecast could significantly shift the probability distribution, complicating the decision-making process.
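One way to see how thin the margin was is to turn the question around and ask what daily average the remaining days would need for the total to reach 825. The snippet below is a rough sketch using the July 29th figures quoted earlier (cumulative sum 398, 16 days remaining); the helper name `required_daily_average` is hypothetical, and 825 is simply the boundary between the two buckets named above.

```python
def required_daily_average(threshold, observed_sum, days_remaining):
    """Average needed over the remaining days for the final total to reach threshold."""
    return (threshold - observed_sum) / days_remaining

BUCKET_BOUNDARY = 825  # boundary between the two adjacent buckets

needed = required_daily_average(BUCKET_BOUNDARY, observed_sum=398, days_remaining=16)
observed_avg = 398 / 15  # 15 July .. 29 July is 15 days

print(round(needed, 4))                 # 26.6875
print(round(observed_avg, 3))           # 26.533
print(round(needed - observed_avg, 3))  # gap between the two, in transits per day
```

The required average sits only about 0.15 transits per day above the observed one, which is small next to the daily standard deviation of 2.29, so neither bucket could be ruled out at that point.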

But that wasn’t the end of the surprises. The real twist came when I calculated the average from July 15th to August 11th: it turned out to be exactly 26, for a cumulative total of 728 over those 28 days. With 3 days remaining, this new average put my final prediction at around 728 + 3 * 26 = 806. This outcome was quite different from my earlier estimates, which hovered closer to 822-830.
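To put the three snapshots side by side, here is a small sketch that recomputes each projection from the figures quoted in this post. The tuple layout and the names `snapshots` and `projections` are hypothetical; nothing here comes from the raw data.

```python
snapshots = [
    # (last observed day, sum since 15 July, daily average used, days remaining)
    ("23 July", 243, 26.17, 22),     # average taken over 1-23 July
    ("29 July", 398, 398 / 15, 16),  # average taken over 15-29 July
    ("11 August", 728, 26.0, 3),     # average taken over 15 July - 11 August
]

projections = {}
for label, observed_sum, daily_avg, days_remaining in snapshots:
    projections[label] = observed_sum + daily_avg * days_remaining
    side = "below" if projections[label] < 825 else "at or above"
    print(f"{label:>10}: {projections[label]:6.1f} ({side} 825)")
```

Using the unrounded averages, all three projections fall below 825; only the version with the average rounded up to 27 crossed into the higher bucket.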