Benchmarks and honesty

…like oil and water, right? Not necessarily; we should hope not, for otherwise we’re all in trouble. Last week, someone anonymously posted a comment to a previous entry of mine. In a nutshell, he or she implied that the benchmarks we publish comparing PDFTextStream text extraction performance to that of other Java PDF libraries were rubbish. Here’s the comment in its entirety:
If the product is so good, why are your speed comparisons using your latest version against 2 year old products.
Wow, that hurt. I responded with a comment to the same entry, but the original implication was serious enough that I felt compelled to make a more visible statement about the benchmark that we publish. The core complaint in the comment was that we’re tilting the playing field by comparing PDFTextStream to other years-old Java PDF libraries. That was and is fundamentally untrue, except in the case of Etymon’s PJ library. Here, I’ll quote the relevant part of my response from that entry:
Etymon PJ was abandoned in favor of PJx years ago; PJx hasn’t been under active development since April of 2004 though (see http://sourceforge.net/projects/pjx/), and in its current state provides no API for text extraction that we can see. However, our original benchmarks nevertheless showed the older PJ library to be the fastest of the available libraries (second to PDFTextStream), so we included it even though Etymon doesn’t appear to support it anymore.
Our perspective on this is that we have been trying to be as transparent and honest as possible with these benchmarks from day one; therefore, when searching out Java PDF libraries to compare to PDFTextStream, we wanted to find the toughest competition possible. We found Etymon’s PJ library to be the fastest text extraction library (second to PDFTextStream), so we included it in the benchmark. I think that’s very fair, and very honest. Frankly, given the sometimes rabid nature of skepticism in some developer circles, we would likely have been suspected of hiding something if we had originally decided to exclude the PJ library because it’s no longer supported.

Benchmarks have long been viewed with suspicion by technologists of all stripes, but being a publisher of a benchmark has provided me with some perspective. Yes, benchmarks can be gamed; yes, internally conducted benchmarks can be more vendor fantasy than reality. We knew this from the start, which is why we made extraordinary efforts to make the benchmark as transparent as possible (by publishing the benchmark code, test files, and methodology along with the bottom-line results). Any skeptics are free to run the benchmarks themselves and report any discrepancies they observe. If that’s not the gold standard of honesty when it comes to benchmarking, and if a benchmark conducted and published in this manner cannot be trusted by the broader developer community, then we’re all in trouble.

There are thousands of software products out there, all of which claim a particular advantage over their competition. Some advantages are qualitative and cannot be measured; that’s fine. However, other advantages are quantifiable, and for those claims we should all welcome a transparent, published benchmark. Otherwise, the process of selecting software products descends into a matter of who has the better marketing and PR game (not that that hasn’t already happened to a very large extent, but that’s a different post!).

Fundamentally, I hope the benchmark doesn’t matter. In the end, I would hope that every developer who is looking for a PDF extraction solution for Java would download all of the available libraries and do some real due diligence to determine which library delivers the best features and throughput in their environment. Voilà, everyone wins.

There’s little left to say, except that, if you find our benchmark to be unconvincing, we remain open and receptive to feedback. If there’s a way we can improve the benchmark, whether through changing methodology, swapping test files, or tweaking timing code, we’ll do it.
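For the curious, the kind of due diligence I’m describing doesn’t require much machinery: wall-clock timing of full-text extraction over the same corpus, repeated for each library you care about. The sketch below shows that idea in plain Java. It is not our published benchmark code; the "testfiles" directory and the extractor lambdas are placeholders you would wire up to whichever libraries you want to compare.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Minimal sketch of a text-extraction timing harness.
 * The extractor entries are placeholders, not real library calls.
 */
public class ExtractionBenchmark {

    public static void main(String[] args) {
        // Corpus of PDFs to extract; "testfiles" is an assumed directory name.
        File[] testFiles = new File("testfiles")
                .listFiles((dir, name) -> name.endsWith(".pdf"));

        // Map each library name to a function that extracts all text from a file.
        // Replace the lambda bodies with calls into the libraries under test.
        Map<String, Function<File, String>> extractors = new LinkedHashMap<>();
        extractors.put("library-A", f -> { /* call library A's extraction API */ return ""; });
        extractors.put("library-B", f -> { /* call library B's extraction API */ return ""; });

        for (Map.Entry<String, Function<File, String>> entry : extractors.entrySet()) {
            long totalNanos = 0;
            long totalChars = 0;
            for (File pdf : testFiles) {
                long start = System.nanoTime();
                String text = entry.getValue().apply(pdf);
                totalNanos += System.nanoTime() - start;
                totalChars += text.length();
            }
            System.out.printf("%s: %.2f s total, %d chars extracted%n",
                    entry.getKey(), totalNanos / 1e9, totalChars);
        }
    }
}
```

Run something like this against your own documents, on your own hardware, and the numbers you get will mean far more to you than any vendor’s chart, including ours.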