In our research, we developed a tool that assigns place markers to their labels on old maps. What is that good for? Well, as you can see above, historical maps often contain a surprising number of labeled places. This is valuable content that can be extracted from the map.
Now, supposing you know the positions of these elements in the map, our tool can figure out which markers and labels belong together, and do so very quickly.
As input, our tool requires a bounding box for each place marker and each label in the map. This information can be acquired using image processing, OCR, or by manually drawing boxes.
Using a minimum-cost matching approach, our tool assigns pairs of markers and labels based on their distance on the map and on the surrounding elements. This only takes a couple of seconds on most maps.
Some situations in the maps are quite unclear, even to humans without domain knowledge. In this last step, our algorithm identifies such situations and presents them to the user for verification.
At the moment, our tool requires bounding boxes for each place marker and label in the map as input. By bounding boxes, we mean rectangles drawn around these features in the map.
For testing and evaluation, this input can be created by hand (although it took us quite a while, even for just a few maps). For production use in, say, a library, this manual process is not acceptable. For this reason, we are currently working on tools that automatically identify place markers as well as text in historical maps.
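To make the input concrete, here is a minimal sketch in Python of what a bounding box per marker and label could look like. The field names and structure are illustrative assumptions; the tool's actual input format may differ.

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    """Axis-aligned rectangle around a map feature, in pixel coordinates.

    Illustrative sketch; the tool's real input format may differ.
    """
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float

    @property
    def center(self) -> tuple[float, float]:
        """Center point, a natural anchor for distance computations."""
        return (self.x + self.width / 2, self.y + self.height / 2)

# One box per place marker and one per label, e.g. from OCR output
# or from manual annotation:
marker_boxes = [BoundingBox(x=120, y=340, width=14, height=14)]
label_boxes = [BoundingBox(x=138, y=336, width=80, height=18)]
```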
We have developed an algorithm that assigns pairs of markers and labels in the map. Its decisions are based on the distances between these elements. However, it does not simply assign each label to its nearest marker. Instead, it computes a minimum-cost matching, optimizing a global objective function.
With this approach, we are able to calculate assignments with very low error rates – even in really difficult situations like in the figure on the right.
Our algorithm runs in polynomial time; in practice it only takes a couple of seconds to find an assignment, even if the map contains thousands of markers and labels.
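To illustrate the idea (not the authors' exact implementation), the sketch below computes a globally optimal marker-to-label matching with SciPy's Hungarian-style solver, using plain Euclidean distance between box centers as the cost. The real tool's cost function additionally accounts for surrounding elements.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_markers_to_labels(marker_centers, label_centers):
    """Assign markers to labels via a minimum-cost matching.

    Sketch only: the cost here is the Euclidean distance between
    center points, whereas the actual tool's cost function also
    considers surrounding elements.
    """
    markers = np.asarray(marker_centers, dtype=float)
    labels = np.asarray(label_centers, dtype=float)
    # cost[i, j] = distance between marker i and label j
    cost = np.linalg.norm(markers[:, None, :] - labels[None, :, :], axis=2)
    # Hungarian-style solver: minimizes the *total* cost over all pairs,
    # which is what makes the matching globally optimal.
    rows, cols = linear_sum_assignment(cost)
    pairs = list(zip(rows.tolist(), cols.tolist()))
    return pairs, cost[rows, cols].sum()

# Greedy nearest-neighbor would assign both markers to the label at
# (1, 0); the global matching avoids that conflict:
pairs, total = match_markers_to_labels([(0, 0), (1.5, 0)], [(1, 0), (3, 0)])
# pairs == [(0, 0), (1, 1)], total cost 2.5
```

The global objective is what keeps the error rate low in crowded regions: a locally tempting pair is rejected when it would force a much worse assignment elsewhere.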
Although the matching algorithm has a very low error rate, some incorrect and unclear assignments remain. Of course, we only want to show those few interesting situations to the user (and not thousands of clear ones).
For that reason, we have developed a classifier that ranks the computed assignments by the confidence our algorithm has in them. These confidence values are based on the concept of sensitivity analysis (and can be visualized in different colors, for instance).
This allows us to show only a small fraction of all assignments to the user, while still making sure we get user verification where it is needed.
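The sensitivity idea can be sketched as follows. This is a simplified stand-in for the authors' method, not their actual classifier: the score of an assigned pair is how much the total matching cost would increase if that pair were forbidden and the matching re-solved. Ambiguous pairs, for which a nearly equally good alternative exists, get low scores.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assignment_confidences(cost):
    """Score each assigned pair by a simple sensitivity measure.

    Simplified sketch, not the authors' actual classifier: the score of
    pair (i, j) is the increase in total matching cost when that pair is
    forbidden and the matching re-solved. Low score = ambiguous pair
    worth showing to the user; high score = clear-cut assignment.
    """
    rows, cols = linear_sum_assignment(cost)
    base = cost[rows, cols].sum()
    scores = {}
    for i, j in zip(rows.tolist(), cols.tolist()):
        forbidden = cost.copy()
        forbidden[i, j] = 1e9  # effectively forbid this pair
        r2, c2 = linear_sum_assignment(forbidden)
        scores[(i, j)] = forbidden[r2, c2].sum() - base
    return scores

# Two markers, three labels: marker 0 is torn between labels 0 and 1,
# while marker 1 clearly belongs to label 2.
cost = np.array([[1.0, 1.05, 10.0],
                 [10.0, 10.0, 1.0]])
scores = assignment_confidences(cost)
# scores[(0, 0)] is tiny (ambiguous), scores[(1, 2)] is large (clear)
```

Thresholding such scores is one way to surface only the low-confidence assignments for manual verification.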
Feel free to share your thoughts, ideas, and complaints, and to contact the authors at any time. If you are working on something similar, go ahead and say hi; we are certainly interested in your project. You can find contact information on our websites (see boxes on the right).
This project was developed at the Chair for Computer Science I, University of Würzburg, Germany.
We'd like to thank the University Library in Würzburg for providing the beautiful historical maps for this project!