WhoWasInCommand shows you all the sources that evidence every piece of data – but you probably missed the way it does this

WhoWasInCommand shows you all the sources used to evidence every piece of data it provides.

When you’re browsing your favourite units and commanders on WhoWasInCommand.com – like Operation Lafiya Dole, for example – just hover your mouse over (or tap on, if you’re using a mobile phone or tablet) the bit of data you’re interested in and this happens:

[Image: the pop-over listing the sources for a piece of data on WhoWasInCommand.com]

This interaction gives you a lot of useful information:

  • Because the little circle is green, it tells you that we have rated this bit of data as “High Confidence” (which means it is drawn from a wide variety of sources of different types)
  • The pop-over that appears when you click tells you how many sources there are
  • You can scroll to see all the sources, along with a link to each source’s URL (even if it’s now dead) and a link to a copy of the source we made by submitting it to the Internet Archive
  • The little question mark icon links off to the page in our Research Handbook that answers questions about this widget.

Now, I think this feature is pretty cool (well, I designed it, so I would say that). We did some research into how websites manage citations, references and footnotes, and our hunch was that this would be a good start.

But it’s not my view that counts – it’s your view as a user that matters.

We get a lot of questions about our sources, and whilst this feature is clearly a practical way to deliver the information that answers them, I suspect a lot of users don’t use it, either because it isn’t immediately apparent that it’s there or because it doesn’t match how they would think to look for sources.

We could do with sitting down with people who use WhoWasInCommand, watching how they use the site, and asking them for ideas about how we can make features like this clearer.

Any volunteers?


OpenStreetMap is (sometimes) a handy database of military and police locations – here’s how to see them

[Image: OpenStreetMap – 70,641 objects are tagged with “landuse=military”. Source: TagInfo, 6 July 2018]

Most of the time we use OpenStreetMap (OSM) as a gazetteer; that is, a means of representing the geographical aspects of Security Force Monitor’s data.

For example, our research indicates that the Mexican army unit 105 Batallón de Infantería had a base in Frontera, Coahuila, Mexico from 24 February 2014. To geocode this data we search OSM for the nearest “object” to the named settlement (in this case a “node” called Frontera, ID number 215400772) and link it to the unit as a base. Our Research Handbook contains the rules we use for doing this.

When we publish the data on WhoWasInCommand.com it will be displayed in the “Sites” section of the record for 105 Batallón de Infantería along with all the sources that evidence it:

[Image: WhoWasInCommand.com: sites for 105 Batallón de Infantería, Mexico]

So far we have found OSM to be a good enough gazetteer. And it’s free. And it’s open licensed. And we can fix it if we need to. So you won’t find us moaning and whinging.

However, OSM has a number of issues with accuracy, coverage and change over time, so we do not use it as a primary source of information. Instead we use it as one of a number of sources of lead information that help us piece together the geographical footprint of a security force. This is why, for example, we don’t place 105 Batallón de Infantería directly at Venustiano Carranza International Airport, even though OpenStreetMap does. We don’t (yet) have other sources to evidence this, but OSM gives us a useful prompt to investigate it further.

I’ll cover the pros and cons of using OSM in our research in a future blog post, but for now I’d like to talk about how we use OSM in the early stages of research into a security force.

OSM is a useful tool for getting an impression of a security force’s physical infrastructure: lead information about where it may have bases and facilities, and the terrain that may be reserved for use by security forces (like firing ranges and training areas). How do we do this?

OpenStreetMap is a database

The points, lines and polygons (“objects”) you see on OSM are described with “tags”: for example, a tag can define a line as a “road” or a shape as a “building”, and give it a name. Incredibly, on OSM there are over 70,000 different ways to describe an object, but the tag we’re interested in is “landuse=military”.

OSM currently has 70,641 objects to which the tag “landuse=military” has been applied. OSM’s own documentation about this tag is on the OSM wiki. The tag can be refined further by applying another tag, “military=[something]”, where the [something] can be values like these:

  • military=airfield
  • military=barracks
  • military=bunker
  • military=checkpoint
  • military=training_area

There are currently over 290 additional tags used on OSM to describe the type of military land use more specifically.

How can we use this information to aid our research? Usually we need a BIG LIST that we can go through one by one, using each entry as a starting point for searches or to cross-reference data we get from other sources. Although we can view these items on the OSM map, we can’t get such a BIG LIST from it. To do this we need to use a way of accessing OSM’s data called the Overpass API. This is mostly used by programmers, but for us patient non-programmers there is a slightly easier way to use this API: it’s called Overpass Turbo.
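For anyone who would rather skip the web interface, the Overpass API can also be called from a script. Below is a minimal Python sketch, using only the standard library, that sends a “landuse=military” query and tallies the “military=[something]” values it finds. It assumes the public endpoint at https://overpass-api.de/api/interpreter (one of several Overpass instances). Because the {{geocodeArea:...}} shortcut used later in this post only works inside Overpass Turbo, the sketch selects the country by its ISO 3166-1 code instead. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

# Assumption: the public Overpass API instance; other instances exist.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_military_query(iso_code: str, timeout: int = 60) -> str:
    """Build an Overpass QL query for objects tagged landuse=military in a country.

    {{geocodeArea:...}} is an Overpass Turbo shortcut, so the raw API query
    selects the national boundary area by its ISO 3166-1 code instead.
    """
    return f"""
[out:json][timeout:{timeout}];
area["ISO3166-1"="{iso_code}"][admin_level=2]->.searchArea;
(
  node["landuse"="military"](area.searchArea);
  way["landuse"="military"](area.searchArea);
  relation["landuse"="military"](area.searchArea);
);
out center;
"""


def fetch_elements(query: str) -> list[dict]:
    """POST the query to the Overpass API and return its list of elements."""
    data = urllib.parse.urlencode({"data": query}).encode("utf-8")
    with urllib.request.urlopen(OVERPASS_URL, data=data, timeout=120) as response:
        return json.load(response)["elements"]


def tally_military_values(elements: list[dict]) -> Counter:
    """Count military=* refinements; objects without one count as '(none)'."""
    return Counter(el.get("tags", {}).get("military", "(none)") for el in elements)
```

Usage, with a network connection: `tally_military_values(fetch_elements(build_military_query("MX"))).most_common()` should list the refinements in use in Mexico, most frequent first. The public server is shared, so large countries may hit the timeout.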

Using Overpass Turbo to show military land use on OSM

So, here goes. Let’s ask OSM what objects in Mexico are tagged with “landuse=military”. Head over to Overpass Turbo:

https://overpass-turbo.eu/

After opening that link, copy the query below into the editor on the left-hand side and then hit the “Run” button (top left):

// Limit the search to “Mexico”
{{geocodeArea:Mexico}}->.searchArea;
// Pull together the results that we want
(
 // Ask for the objects we want, and the tags we want
 way["landuse"="military"](area.searchArea);
 relation["landuse"="military"](area.searchArea);
 node["landuse"="military"](area.searchArea);
);
// Print out the results
out body;
>;
out skel qt;

What’s this then? Yes, it’s a map of just those objects tagged with “landuse=military”:

[Image: Overpass Turbo – map of objects tagged “landuse=military” in Mexico (live)]

Exciting! You can export this into a common geographical format (like KML or GeoJSON). But I said we needed a list, so let’s alter the query a bit. Try putting this into the editor:

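// Get a CSV output with a header row, comma-separated
// (tag columns are named by their plain key, e.g. "name:es")
[out:csv(name, "name:es", "name:en", ::"type", ::"id", ::"lat", ::"lon";true;",")][timeout:25];

// Limit the search to “Mexico”
{{geocodeArea:Mexico}}->.searchArea;
// Pull together the results that we want
(
 // Ask for the objects we want, and the tags we want
 way["landuse"="military"](area.searchArea);
 relation["landuse"="military"](area.searchArea);
 node["landuse"="military"](area.searchArea);
);
// Print one row per object; "center" gives ways and relations a single lat/lon
// (printing their member nodes too would fill the list with unnamed points)
out center;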

Same data, but in a list that we can throw into a spreadsheet to work on further:

[Image: Overpass Turbo – CSV list of objects tagged “landuse=military” in Mexico (live)]

Even the snippet above gives us some unit and facility names to research further, as well as the locations of possible facilities that someone with local knowledge has perhaps flagged as being used for military purposes.
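Once the CSV output has been saved from Overpass Turbo, a few lines of Python can pull out just the named objects for follow-up research. This is a sketch that assumes the header row is switched on (the `true` in the [out:csv(...)] setting), a comma separator, and name columns called name, name:es and name:en; if your header uses different column names, adjust the keys. The function name is hypothetical.

```python
import csv
import io


def named_objects(csv_text: str) -> list[dict]:
    """Return rows that carry a name in any name column, sorted by that name.

    Assumes a header row and a comma separator, as set in the [out:csv(...)] line.
    Rows with no name at all (e.g. bare member nodes) are dropped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        # Fall back to the Spanish, then English, name when the plain name is empty.
        label = row.get("name") or row.get("name:es") or row.get("name:en")
        if label:
            rows.append({**row, "label": label})
    return sorted(rows, key=lambda r: r["label"].casefold())
```

Each returned row keeps all of its original columns plus a `label`, so the object type, ID and coordinates stay alongside the name you will be searching for.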

The queries above can be altered to search within different countries or other defined areas, examine different tags (like “amenity=police”… give it a try), and export more data (such as an object’s history).
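Since only the area name and the tag change between these variations, the query text can be generated rather than edited by hand each time. A minimal Python sketch, assuming you paste its output into Overpass Turbo (the {{geocodeArea:...}} shortcut only works there) and that the area name is one Overpass Turbo can geocode; the function name is hypothetical.

```python
def turbo_query(area: str, key: str, value: str) -> str:
    """Return an Overpass Turbo query for nodes, ways and relations tagged key=value."""
    # One selector line per OSM object type, all limited to the search area.
    selectors = "\n".join(
        f' {kind}["{key}"="{value}"](area.searchArea);'
        for kind in ("node", "way", "relation")
    )
    return (
        f"// Limit the search to {area}\n"
        f"{{{{geocodeArea:{area}}}}}->.searchArea;\n"  # renders as {{geocodeArea:...}}
        "(\n"
        f"{selectors}\n"
        ");\n"
        "// Print out the results\n"
        "out body;\n"
        ">;\n"
        "out skel qt;\n"
    )
```

For example, `print(turbo_query("Nigeria", "amenity", "police"))` prints a query in the same shape as the map query above, ready to paste into the editor.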

Wrapping up

  • As well as being a map that we can search, OpenStreetMap is a database that we can query in depth.
  • Historical and contemporary military and police locations may be identified inside OpenStreetMap using the “landuse” tag. More information about the tagging system can be found on OSM’s own TagInfo service.
  • Using Overpass Turbo we can pull out that information and use it as lead information during our research. Overpass Turbo is free to use, and can output maps and lists. The Overpass query language is documented online, and there are some super examples on the OSM wiki.

I’m sure there are more elegant ways to use Overpass Turbo than my basic code, so if anyone wishes to help us out, I’m all ears (tom [at] securityforcemonitor.org). We’re also interested in improving the data on military and police facilities that exists in OSM… but that’s another post.

I hope this has been a helpful read, and do comment, respond and correct as needed.

Not all snapshots are created equal – a time-saving Wayback Machine technique

We’re going to write about our daily work more often. I’ll go first with a nerdy research tip:

The Internet Archive’s Wayback Machine (the awesomeness of which I won’t bang on about) can show you when captures of the same page differ in some way from each other.

So what?

Here’s a long-dead page that Mexico’s Secretaría de la Defensa Nacional (SEDENA) used to list the commanding officers of Zonas Militares (a major tier of the army in Mexico).

http://www.sedena.gob.mx:80/ejercito/comandancias/zon_mil.htm

It exists only in the Internet Archive’s Wayback Machine now. Here are two captures of that URL, made in 2004 and 2005 respectively. The screenshots below show only the first 10 entries (of over 40 in each). Can you spot the difference?

[Image: Clipping from 8 February 2004 Wayback Machine snapshot of SEDENA army commanders page]
[Image: Clipping from 3 October 2005 Wayback Machine snapshot of SEDENA army commanders page]

Although the archived URL is the same, the content is not. For example, in the February 2004 snapshot SEDENA lists “Noe Antonio Ordoñez Herran” as the commander of 1/a Z.M. However, by October 2005 SEDENA lists “Germán Redondo Azuara” as the commanding officer. This is a substantive difference that we want to capture; there are also other differences between these two snapshots.

How do we approach it? First, we establish the total number of snapshots. Helpfully, the Wayback Machine tells us this for any URL it holds snapshots of. For example, this SEDENA page was captured 57 times:

[Image: Wayback Machine capture count and bar chart for the SEDENA page]

A page like this was probably updated regularly: the little bar chart shows differences in the sizes of the snapshots, indicating that something changed. The changes could be an update to the text in the list of commanders, or a design change of some sort that affects the page size.

Do we have to wade through all of them to find out what the differences are? No. The Wayback Machine can tell us which snapshots differ from the previous ones. Therefore, we can just go to those that differ in some way from the others and extract information from those.

To do this, we have to use another way to ask the Wayback Machine questions: the Wayback CDX server. The CDX server is a more advanced way to query the Wayback Machine, and you can still use it from your browser. It doesn’t have a graphical user interface for browsing the archived pages; instead it provides metadata about the snapshots.

Here’s Wayback Machine data about our URL, but viewed from the CDX server:

[Image: Some output from the Wayback Machine CDX server.]

Here’s the URL that gives you those results:

https://web.archive.org/cdx/search/cdx?url=http://www.sedena.gob.mx:80/ejercito/comandancias/zon_mil.htm

These are the first few rows of the same 57 results, shown as metadata rather than as navigable, graphical versions of the web captures themselves. I’m sure you can figure out how to turn this list into a spreadsheet that you can use to organise your research (hint: copy-paste it into your favourite spreadsheet, then use text-to-columns with a space as the separator).
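If you would rather script that step than use text-to-columns, the Python sketch below splits each CDX line on spaces into named fields and renders the result as CSV text a spreadsheet can open. It assumes the CDX server’s default seven columns (urlkey, timestamp, original, mimetype, statuscode, digest, length), which is what a query without an `fl=` parameter returns; any extra columns get generic names unless you pass your own field list. The function names are hypothetical.

```python
import csv
import io

# Default column order returned by the Wayback CDX server when no fl= parameter is given.
CDX_FIELDS = ("urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length")


def parse_cdx(text: str, fields: tuple[str, ...] = CDX_FIELDS) -> list[dict]:
    """Split each space-separated CDX line into a dict keyed by field name."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        # Columns beyond `fields` (e.g. from showDupeCount=true) get generic names.
        names = list(fields) + [f"extra{i}" for i in range(len(parts) - len(fields))]
        rows.append(dict(zip(names, parts)))
    return rows


def to_csv(rows: list[dict]) -> str:
    """Render parsed rows as CSV text that a spreadsheet can open."""
    # Collect every column name in first-seen order so extra columns are kept.
    columns = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Paste the CDX output into a string (or read it from a saved text file), then save `to_csv(parse_cdx(text))` to a .csv file. The urlkey column contains commas, which the csv module quotes automatically.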

By changing the URL a bit we can filter out snapshots that are the same as the preceding one:

https://web.archive.org/cdx/search/cdx?url=http://www.sedena.gob.mx:80/ejercito/comandancias/zon_mil.htm&showDupeCount=true&collapse=digest

We’ve tacked on two new bits to the end of the query URL:

&showDupeCount=true

This shows which of the snapshots have duplicates. And then:

&collapse=digest

This has the effect of removing data about snapshots that are the same as the previous one.
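To make these two parameters less of a black box, here is a Python sketch that builds the same kind of query URL with the standard library and, separately, mimics what collapse=digest does to a list of already-parsed rows: a snapshot is dropped when its digest (a hash of the captured content) matches the snapshot immediately before it, while identical captures that are not adjacent are kept. The local function illustrates the behaviour described above; it is not the CDX server’s own code, and the function names are hypothetical.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(target: str, collapse_digest: bool = True) -> str:
    """Build a Wayback CDX server URL listing the captures of `target`."""
    params = {"url": target}
    if collapse_digest:
        params["showDupeCount"] = "true"
        params["collapse"] = "digest"
    # urlencode percent-encodes the target URL; the server decodes it again.
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def collapse_adjacent_digests(rows: list[dict]) -> list[dict]:
    """Keep a row only if its digest differs from the row just before it."""
    kept = []
    previous = None
    for row in rows:
        if row["digest"] != previous:
            kept.append(row)
        previous = row["digest"]
    return kept
```

The generated URL looks different from the hand-written one above because the target is percent-encoded, but it asks for the same thing.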

Overall, our results are filtered from 57 down to 31 snapshots. It’s removed 26 that were the same as the preceding one and saved us a good hour of work.

As it happens, of those 31 snapshots only 12 hold content that is useful to us. The remainder are captures of server errors, because SEDENA changed its official website (and URL structure) four times between 2004 and 2017. But that, my friends, is another blogpost.
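Some of those error captures can be set aside automatically. The sketch below keeps only rows whose recorded HTTP status code is 200 and turns each remaining capture into a Wayback Machine replay link of the form https://web.archive.org/web/[timestamp]/[original URL]. It is only a first pass: an error page that was served with a 200 status will slip through and still has to be spotted by eye. (The CDX server can also filter on its side with a filter=statuscode:200 parameter.) The function name is hypothetical, and the rows are assumed to carry the default CDX field names.

```python
def replay_links(rows: list[dict]) -> list[str]:
    """Return Wayback replay URLs for captures the server recorded as HTTP 200."""
    return [
        # Replay URLs combine the 14-digit capture timestamp with the original URL.
        f"https://web.archive.org/web/{row['timestamp']}/{row['original']}"
        for row in rows
        if row.get("statuscode") == "200"
    ]
```

Opening these links one by one is then the remaining manual step: reading each distinct capture and extracting the commanders it lists.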

So, to wrap up:

  • The Wayback Machine has the equivalent of an advanced query that helps us find out when snapshots of the same page differ from each other.
  • It’s called the Wayback CDX server, and you can read more about what it does on its GitHub page.
  • Using it at the beginning of a bit of research can save you a lot of time.

I hope this helps some of you save time when trawling the Wayback Machine, and encourages you to experiment a bit with the more obscure features of well-known tools. It certainly helps us create the rich data you see on WhoWasInCommand.com.

Cheers!