I posted my code that runs a Monte Carlo simulation of how the benchmark statistic behaves. Feel free to plug your version in and see how it does!

https://github.com/mindplay-dk/benchpress

I tried a number of different things while searching for an algorithm that, in a reasonable amount of time, would arrive at close to the same result on every run, even with a fairly small number of samples. This is what I ended up with.

A weighted average seemed to work best, but my approach was completely experimental, so I can’t explain why it seems to work better than min…
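For anyone who wants to try this kind of comparison themselves, here is a rough sketch of a Monte Carlo check of how stable a statistic is from run to run. To be clear, this is not the benchpress algorithm. The noise model (a fixed cost plus exponential noise), the `1 / rank` weighting, and all the function names are my own assumptions, picked only to make the example runnable. Swap in your own statistic and timing source.

```python
import random
import statistics


def simulate_timings(n, rng, base=1.0):
    # ASSUMED noise model: a fixed "true" cost plus non-negative,
    # right-skewed noise. Real benchmark noise may look quite different.
    return [base + rng.expovariate(10.0) for _ in range(n)]


def weighted_average(samples):
    # HYPOTHETICAL weighting, not taken from benchpress:
    # sort ascending and weight each sample by 1 / rank,
    # so faster samples count more than slower ones.
    ordered = sorted(samples)
    weights = [1.0 / (i + 1) for i in range(len(ordered))]
    return sum(w * x for w, x in zip(weights, ordered)) / sum(weights)


def spread(statistic, trials=2000, n=20, seed=42):
    # Run the simulated benchmark many times and report how much the
    # statistic varies between runs (lower means more repeatable).
    rng = random.Random(seed)
    estimates = [statistic(simulate_timings(n, rng)) for _ in range(trials)]
    return statistics.pstdev(estimates)


if __name__ == "__main__":
    candidates = [
        ("min", min),
        ("mean", statistics.fmean),
        ("median", statistics.median),
        ("weighted", weighted_average),
    ]
    for name, stat in candidates:
        print(f"{name:>8}: {spread(stat):.4f}")
```

Which statistic comes out ahead depends heavily on the noise model you assume, so treat the numbers as a way to compare ideas rather than as a verdict.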

