
Researching and creating a historical data layer from scratch

How a hobby turned into a 6-year commitment and what I learned along the way

I am a map nerd and have been since I was a child. When I was 10, I drew my own world atlas (now sadly lost). After a detour through a completely unrelated field, I ended up in cartography and GIS and, eventually, at Esri Canada, where I have the pleasure of working with maps and geospatial data every day. You’d think that would be enough but, no, I also make maps in my spare time as a hobby.

My latest project has kept me busy in my off hours for the past 6 years. I didn’t think it would take so long and, had I known that before I started, I might not have begun. But recently I published an ArcGIS dashboard that shows the locations of the ships sunk during the Second World War. About 18,000 ships were sunk by enemy action during the conflict, and the dashboard shows the locations of about 13,000 of them; the rest don’t have a location (yet). For each ship, the app and its database record the ship’s name and location, its tonnage (about 40 million tons in total), the number of casualties (about 500,000 in total), how it was sunk, and links to online references.

6 years of data gathering, an hour or two of dashboard building

My interest in naval history began when I was a child and read C. S. Forester’s Sink the Bismarck! It tells the story of the sleek, powerful new battleship Bismarck that threatened to wreak havoc on the North Atlantic convoys, the almost immediate destruction of HMS Hood, which was sent to chase her down, the frantic days when Britain’s Royal Navy lost track of the Bismarck, and her destruction, which in hindsight seems inevitable. It was a thrilling story that, in retrospect, played a little fast and loose with the facts but was highly entertaining to my boyhood imagination.

The KMS Bismarck sets sail on her fateful journey (Source: Wikipedia)

Fast forward to 2014, when I stumbled across uboat.net, a website that lists all of the German submarines of the Second World War and the ships they sank. The record for each sinking included a simple Google Map. What the site didn’t have was a single comprehensive map showing the locations of all the sinkings. Why not make something like that? I thought. Better yet, why not map all the ships that sank in the war? Then you could actually start to see some of the spatial and temporal patterns. But surely someone had already done that?

Well, yes and no. I discovered this small map of (part of) the world showing many, but not all, of the ships sunk.

A map of some of the ships sunk in the Second World War: not quite complete (Source: SeaAustralia)

But it wasn’t what I was after: it was incomplete, it wasn’t interactive and its resolution was too low. So, I embarked on the project myself.

I began with uboat.net but soon discovered a number of other great internet sources that focused on a particular navy (e.g., Combined Fleet, which covers Japan’s navy), a particular geographic area, or a particular period or battle of the war.

These were all of great use, but I soon realized that if I wanted to develop a comprehensive database of sunken ships, I would need a more systematic approach. So, I started with the first day of the war, September 1, 1939*, and worked through the war day by day, all 2,193 of them. Wikipedia lists ships sunk by day, but that content is written by volunteers and contains numerous errors and omissions. The same goes for Wrecksite, a website that tracks sunken ships (and that now limits access to the coordinates of the wrecks). To improve the quality of the data, I cross-referenced each ship against multiple sources. Often, the information was conflicting or incomplete, and the data I created was a best guess.

With famous sinkings such as that of the Bismarck, it was easy. For some events, such as the attack on Pearl Harbor, the scuttling of the French fleet in Toulon or the multiple naval battles off Guadalcanal in the South Pacific, there were multiple maps available online. These were of great help, but even then, the cartographers of the day sometimes got it wrong. Since I’m a map and data nerd, the hunt for accurate data was a pleasure for me (and a source of frustration).

Just one of the many map sources used to compile the data (Source: Naval History and Heritage Command)
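Working through the war day by day amounts to keeping a fixed checklist of calendar dates and resuming from the first one not yet researched. The Python sketch below is one hypothetical way to do that; it is not the tooling used for this project, and the function names are illustrative. The endpoints are an assumption: counting from 1 September 1939 through 1 September 1945 inclusive gives 2,193 days, matching the count in the text (1940 and 1944 were leap years).

```python
from datetime import date, timedelta
from typing import List, Optional, Set

# Assumed endpoints for the checklist, both inclusive.
WAR_START = date(1939, 9, 1)
WAR_END = date(1945, 9, 1)


def war_days(start: date = WAR_START, end: date = WAR_END) -> List[date]:
    """Return every calendar day from start to end, inclusive."""
    span = (end - start).days + 1
    return [start + timedelta(days=i) for i in range(span)]


def next_unreviewed(days: List[date], reviewed: Set[date]) -> Optional[date]:
    """Return the earliest day not yet researched, or None when all are done."""
    for d in days:
        if d not in reviewed:
            return d
    return None


if __name__ == "__main__":
    days = war_days()
    print(len(days))  # 2193
    # Hypothetical progress: the first ten days have been worked through.
    reviewed = set(days[:10])
    print(next_unreviewed(days, reviewed))  # 1939-09-11
```

Keeping the checklist explicit makes gaps visible: any day never marked as reviewed stands out, which is harder to see when research jumps between sources, battles and theatres.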

Most of the work for this project went into researching and creating the data. It took about 6 years of part-time work to compile the entire database, and even now I continue to find corrections and additions. The dashboard itself took less than a day to put together. Apart from the fact that I work for Esri Canada, the main reason I used ArcGIS Online to showcase my work was that its basemaps are completely customizable. I combined two out-of-the-box basemaps, the Firefly imagery basemap and the Nova basemap, and then tweaked them using the Esri Vector Tile Style Editor. I added and labelled graticules to finish the look.

I’ve learned a few things along the way that you might find helpful if you undertake your own data-creation project, whether as part of your job or as a hobby.

  1. Spend some time with your source material and study the topic you want to map before you begin any work on it. See what is already available and decide what you want to include in your database before you start assembling the content. Envision how you want your final product to look and how it will be used. The time you spend planning will be more than repaid by the time you save by not having to go back and revise things later.
  2. Get to know your data sources. This is always true, but it is especially true of online data sources. By repeatedly comparing sources, you’ll get a feel for their strengths and weaknesses. And you’ll discover that seemingly reliable sources can sometimes be wrong (even original source material, particularly if it is historical).
  3. Don’t rely on just one or two data sources. Unfortunately, this isn’t always possible: many online sources turn out to draw on the same original source, so they are not truly independent of each other.
  4. Depending on your topic, original sources might not always be your best source. In my case, ship locations recorded in the heat of battle were not always accurate; later surveys of wreck sites usually provide more accurate locations.
  5. Be patient. Depending on the size and complexity of your project, preparation, data gathering, creation and verification can take time. As I discovered shortly after starting this project, the Second World War was a very long war.
  6. Be prepared to get distracted, especially on the Internet. That doesn’t mean it will be a waste of time. You may come across some interesting stories, as I did with my project.
  7. And, finally, you’re likely going to be wrong at some point. This project will be with me for a long time as I continue to edit, add and correct records as I come across more sources.
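Tips 2 to 4 come down to comparing the positions that different sources give for the same sinking and deciding which one to trust. The Python sketch below illustrates that idea under stated assumptions; it is not the method used for the dashboard. It measures how far apart the reported positions are (great-circle distance via the haversine formula) and prefers a wreck-survey position over a contemporary report, following tip 4. The `Report` structure, the two source categories and the 50 km review threshold are hypothetical choices, and the coordinates in the demo are invented.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Hypothetical preference order (lower wins): later wreck surveys over
# positions recorded at the time of sinking, as suggested by tip 4.
PREFERENCE = {"wreck_survey": 0, "contemporary": 1}


@dataclass
class Report:
    source: str  # hypothetical label for where the position came from
    lat: float   # decimal degrees, north positive
    lon: float   # decimal degrees, east positive
    kind: str    # "wreck_survey" or "contemporary" (hypothetical categories)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def max_disagreement_km(reports: List[Report]) -> float:
    """Largest pairwise distance between the reported positions."""
    worst = 0.0
    for i in range(len(reports)):
        for j in range(i + 1, len(reports)):
            a, b = reports[i], reports[j]
            worst = max(worst, haversine_km(a.lat, a.lon, b.lat, b.lon))
    return worst


def reconcile(reports: List[Report], threshold_km: float = 50.0) -> Tuple[Optional[Report], bool]:
    """Pick the preferred report and flag the ship for manual review.

    Review is flagged when there is only one source (tip 3) or when the
    sources disagree by more than threshold_km.
    """
    if not reports:
        return None, True
    chosen = min(reports, key=lambda r: PREFERENCE.get(r.kind, 99))
    needs_review = len(reports) < 2 or max_disagreement_km(reports) > threshold_km
    return chosen, needs_review


if __name__ == "__main__":
    # Invented positions for a hypothetical ship, for illustration only.
    reports = [
        Report("contemporary log", 50.0, -20.0, "contemporary"),
        Report("wreck survey", 50.1, -20.0, "wreck_survey"),
    ]
    chosen, review = reconcile(reports)
    print(chosen.source, review)  # wreck survey False
```

Flagging disagreements rather than silently averaging them keeps the final call with the researcher, which fits the point that the data for a given ship is often a best guess.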

What’s next for me? More mapping of naval events and battles of the Second World War, with the hope that, in the end, I’ll have created a nice collection of interactive online maps that others interested in Second World War naval history will find a useful resource. Stay tuned!

*Though this is the commonly accepted start date of the Second World War, it can be argued that the war began with the Second Sino-Japanese War in 1937.

About the Author

Paul Heersink is a cartographer and Program Manager of Esri Canada’s Roads & Addresses Program, an initiative that aims to build a seamless topographic basemap using contributor data. He has over 25 years of cartographic experience, working in both the public and private sectors. Paul has always been interested in mapping and drew his own atlas at the age of 10. He took a detour in his career through the fields of psychology and social work before returning to cartography.
