Update 2018-02-06: Command-line tools for comparing result files to the ground truth are now available. They can be downloaded as a JAR or ZIP with source code. The GitHub page is here; we would appreciate it if you would flag any issues or bugs. Thanks!
Update 2013-04-05: The competition dataset has now been released and can be downloaded from the competition home page.
Update 2013-03-17: Tools for automatic visualization and comparison to ground truth are now online.
New deadlines for registration (31 March) and result submission.
Click here to get back to the previous page describing the dataset format.
We have now made a GUI tool available, which enables a visual comparison between PDF, ground truth and result table regions. It also enables automatic adjustment of coordinates by setting offsets and scale factors (which can also be negative). We hope that this tool will make it easier for participants to prepare their results for submission in our format.
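The same kind of coordinate adjustment can be reproduced outside the GUI when preparing result files in bulk. The following is a minimal sketch, not the tool's own code: the `Box` type, the `adjust` function and its parameter names are hypothetical, and it assumes that scaling is applied before the offset, which may differ from the order the tool uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned region; field names are illustrative, not the competition schema."""
    x1: float
    y1: float
    x2: float
    y2: float


def adjust(box, scale_x=1.0, scale_y=1.0, offset_x=0.0, offset_y=0.0):
    """Scale each coordinate, then add the offset (assumed order).

    The corners are re-sorted afterwards so that a negative scale factor,
    e.g. to flip the y-axis, still yields a box with x1 <= x2 and y1 <= y2.
    """
    xs = sorted((box.x1 * scale_x + offset_x, box.x2 * scale_x + offset_x))
    ys = sorted((box.y1 * scale_y + offset_y, box.y2 * scale_y + offset_y))
    return Box(xs[0], ys[0], xs[1], ys[1])


# Example: convert from a top-left origin to a bottom-left origin on a page
# that is assumed to be 792 units high (scale_y = -1, offset_y = 792).
flipped = adjust(Box(100, 50, 300, 200), scale_y=-1, offset_y=792)
print(flipped)
```

A negative scale factor combined with an offset equal to the page height is the usual way to reconcile a coordinate system whose origin is at the top of the page with one whose origin is at the bottom.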
The tool is written in Java and has been built for Linux and Windows, in both 32-bit and 64-bit versions. It can be downloaded here. No installation is required; simply unzip the archive and run the pdfAnnotator executable.
You can open either a single PDF file (File | Open PDF) or a ZIP file (File | Open ZIP). In either case, the tool will automatically load the corresponding ground truth and result files, as long as they conform to the following naming conventions:
The tabs "Ground truth" and "Result" show the GT and result data and enable it to be edited and saved. If either GT or result is missing, it can be loaded by clicking on the "+" button.
It is possible to open several documents at once and generate results for a set of documents. These documents are displayed in the top-left frame. It is also possible to close an open document by right-clicking on it and selecting "Close". If working on a complete dataset, we recommend putting everything in a ZIP file, which can be loaded by the tool at once.
The document window on the right-hand side shows a preview of the PDF (the tabs on the bottom switch between character elements and rendition). On the character element view, it is possible to show and hide the GT and result boxes as well as adjust offsets and scale factors to ensure that the coordinate systems match. The completeness and purity scores are also displayed for the current page; in order to generate them for the complete document as well as all other open documents, click on "Tools | Create Report".
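To give a rough idea of what completeness and purity mean for a single table region, here is a minimal sketch. It assumes the common definition in which a result region is complete if it contains every character of the matching ground-truth region, and pure if it contains no characters from outside it. The function name and the use of character IDs are hypothetical, and the tool's actual computation may differ in its details.

```python
def region_flags(gt_chars, result_chars):
    """Return (complete, pure) for one result region against one GT region.

    gt_chars and result_chars are iterables of character identifiers
    (hypothetical; any hashable ID that is unique per character on the page).
    """
    gt, res = set(gt_chars), set(result_chars)
    complete = gt <= res   # every GT character was found
    pure = res <= gt       # nothing outside the GT region was included
    return complete, pure


# The result region covers the whole table but also picks up one extra character.
print(region_flags({1, 2, 3}, {1, 2, 3, 4}))
```

Comparing at word level rather than character level amounts to using word identifiers in place of character identifiers, which is why the two granularities can disagree in borderline cases.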
Anssi Nurminen has kindly released Python scripts for comparing GT and result files for both sub-competitions. Please note that these scripts have not been written or verified by the competition organizers, and therefore may not behave in exactly the same way as the algorithms that we will use to compare each participant's results. In particular, the table region comparison currently uses word-level instead of character-level granularity, which, in special cases, can lead to slightly different numerical results.
You may wish to try out these scripts to test your algorithm's result. You can download them here:
We are grateful to Anssi for making these scripts available.
If you have any further questions, please feel free to get in touch.
Click here to get back to the previous page describing the dataset format.
Back to the competition homepage.