Memory-mapping Files in Java Causes Problems

Today we released PDFTextStream v2.0.1, a minor patch release that contains a workaround for an interesting and unfortunate bug: on Windows, if you open a PDF file on disk with PDFTextStream and then close the PDFTextStream instance (using PDFTextStream.close()), the PDF file remains locked. It can’t be moved or deleted.

This is actually not a bug in PDFTextStream, but in Java, documented as Sun bug #4724038. In short, a memory-mapped file cannot reliably be “closed”: the DirectByteBuffer (or perhaps some native proxy) that holds the OS-level file handle does not release those resources, even after the FileChannel that was used to create the DirectByteBuffer has been closed. The comments on that bug report show a great deal of frustration, and rightly so: regardless of the technical reasons for the behavior, memory-mapping files isn’t rocket science (or hasn’t been for 20 years or so), and this kind of thing shouldn’t happen.
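The behavior can be reproduced without PDFTextStream at all. The following is a minimal sketch using only standard java.nio calls; the class and method names (MmapLockDemo, deleteAfterMapping) are ours, made up for illustration. It maps a file, closes the FileChannel, and then reports whether the file could be deleted. On Windows we would expect the delete to fail while the mapping is alive; on most *nix systems it should succeed.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapLockDemo {

    /** Maps the file, closes the channel, then reports whether the file could be deleted. */
    static boolean deleteAfterMapping(Path file) throws IOException {
        MappedByteBuffer buffer;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
        // The channel is closed here, but the mapping itself is still valid.
        boolean deleted;
        try {
            Files.delete(file);
            deleted = true;
        } catch (IOException e) {
            // Typical outcome on Windows while the mapping has not been released.
            deleted = false;
        }
        // Touch the buffer after the delete attempt so it stays reachable
        // (and therefore mapped) while the delete is attempted.
        buffer.get(0);
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-demo", ".bin");
        Files.write(file, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("deleted after channel close: " + deleteAfterMapping(file));
    }
}
```

Note that there is no close() or unmap() call on MappedByteBuffer to add here; the mapping is only released when the buffer is garbage-collected, which is the heart of the complaint in the bug report.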

Since we can’t fix the bug, we devised a workaround: if you set the pdfts.mmap.disable system property to Y, PDFTextStream won’t memory-map PDF files. Simple enough fix. FYI, there appears to be no performance degradation associated with using PDFTextStream in this mode.
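As a sketch of how the property might be set: the usual route for any Java system property is the JVM command line (java -Dpdfts.mmap.disable=Y ...), but it can also be set programmatically. The class and method names below are hypothetical, and we are assuming the property should be in place before the first PDFTextStream instance is created.

```java
public class DisableMmap {

    /** Sets the workaround property; call this before opening any PDF files (our assumption). */
    static void disablePdftsMmap() {
        System.setProperty("pdfts.mmap.disable", "Y");
    }

    public static void main(String[] args) {
        disablePdftsMmap();
        // Confirm the property is visible to the rest of the JVM.
        System.out.println("pdfts.mmap.disable=" + System.getProperty("pdfts.mmap.disable"));
    }
}
```

Setting the property on the command line is probably the safer choice, since it avoids any question of ordering relative to class loading.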

Of course, this is only a problem on Windows, which does not allow a file to be moved or deleted while a process holds an open handle to it. We have a number of customers that deploy on Windows Server (although far fewer than deploy on a variety of *nix), but until last week, they hadn’t reported any problems. Our best guess is that, given the systems we know those customers are running, they are probably using PDFTextStream’s in-memory mode (where PDF data is in memory, and provided to PDFTextStream as a ByteBuffer). In that case, PDFTextStream never opens a file handle, so all is well.
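For illustration, here is one way PDF data could end up in a ByteBuffer without any memory-mapping, using only standard library calls. The class and method names (InMemoryLoad, loadFully) are hypothetical, and handing the buffer to PDFTextStream is deliberately not shown. Because Files.readAllBytes closes its file handle before returning, the file can be moved or deleted immediately afterwards, on Windows as well.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;

public class InMemoryLoad {

    /** Reads the whole file onto the heap and wraps it; no file handle stays open. */
    static ByteBuffer loadFully(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return ByteBuffer.wrap(bytes);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("in-memory-demo", ".pdf");
        Files.write(file, new byte[] {'%', 'P', 'D', 'F'});

        ByteBuffer data = loadFully(file);
        // The file is no longer in use, so deleting it right away is fine.
        Files.delete(file);
        System.out.println("bytes held in memory: " + data.remaining());
    }
}
```

The trade-off is heap usage: the entire file lives in memory, which may not suit very large PDFs.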