This weekend, security and human rights wonks received an early Christmas present from the U.S. Department of State: the Foreign Military Training and DoD Engagement Activities of Interest, 2018-2019.
Published annually since 2000, this extensive report contains detailed information about unclassified training activities that the Department of State and Department of Defense funded and delivered to non-US security forces in the preceding fiscal year. The 2018-2019 report, then, delivers detailed data on trainings that took place in the 2017-2018 period, along with a less detailed list of those planned and ongoing in the 2018-2019 period.
The value of this data is high, but its accessibility is very low. This is because of the backwards way the data are published: thousands of pages of tables in PDFs […] Our hope is that when the next report arrives in a few months, we will be able to turn it into machine-readable data and pass it around the sector in minutes rather than months.
So, over this weekend we grabbed the 462-page PDF of the 2018-2019 report and extracted 11,904 rows of new training data from Volume I: Section IV, which details trainings that took place in the 2017-2018 period. We’ve added this data to our online database so you can search and download it.
(To download the data for use in a spreadsheet, scroll down until you see “Advanced export”, tick “Download file” and “Stream all rows”, and then select “Export CSV”.)
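If you would rather work with the exported CSV in code than in a spreadsheet, a few lines of Python are enough to filter it. The sketch below is only illustrative: it assumes the export has a header row with a column called `country` (a hypothetical name; check the actual header of your download) and that each row records one training.

```python
import csv
from collections import Counter

# Hypothetical column name: check it against the header row of your export.
COUNTRY_COLUMN = "country"


def rows_for_country(csv_file, country, column=COUNTRY_COLUMN):
    """Return every row of an exported CSV whose country column equals `country`."""
    reader = csv.DictReader(csv_file)
    # `or ""` guards against short rows, where DictReader fills missing fields with None.
    return [row for row in reader if (row.get(column) or "").strip() == country]


def trainings_per_country(csv_file, column=COUNTRY_COLUMN):
    """Count rows per country, assuming one row per training."""
    reader = csv.DictReader(csv_file)
    return Counter((row.get(column) or "").strip() for row in reader)
```

For example, with a downloaded file saved under the hypothetical name `export.csv`, you could open it with `open("export.csv", newline="", encoding="utf-8")` and pass the file object to `rows_for_country(f, "Ukraine")`. Reading with the standard `csv` module rather than splitting lines by hand keeps quoted fields containing commas intact.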
With the inclusion of this new 2018-2019 data, we’re publishing data on 213,603 trainings in 185 countries between 1 October 2000 and 30 September 2018.
The data publishing platform is flexible and powerful, and gives you tools to search, sort, facet, filter and download all or part of the dataset. Its user interface lets you build queries; you can also query the data directly using Structured Query Language (SQL), though this takes a bit of practice to master. I’ll write up fuller instructions and add them to our Research Handbook in due course. In the meantime, here are some queries to help you get started:
- Filter the 2018-2019 data release by country: for example, here are the training and assistance activities delivered to Ukrainian security forces in the 2017-2018 period.
- Create a spending overview for the year: here’s a query showing the expenditure and number of trainings by country in 2017 and 2018.
- Examine which units delivered and received training: this query prints a table showing the top 20 US units that delivered training in the 2017-2018 period. By adapting that query a little, we can show which units the US spent the most money training. This last query is interesting because it exposes some of the limits of the data: the generic and not particularly precise “Army” is the number one result!
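The exact SQL behind the queries above isn't reproduced in this post. As a rough sketch of the kind of SQL involved, the Python below loads an exported CSV into an in-memory database using the standard `sqlite3` module and runs two aggregate queries in the spirit of the unit queries. The table name `trainings` and the columns `us_unit`, `receiving_unit` and `cost_usd` are hypothetical placeholders, not the online database's actual schema, so adjust them to match the headers in your export.

```python
import csv
import sqlite3

# Hypothetical schema: table `trainings` with columns us_unit, receiving_unit, cost_usd.
TOP_DELIVERING_UNITS = """
SELECT us_unit, COUNT(*) AS trainings
FROM trainings
GROUP BY us_unit
ORDER BY trainings DESC
LIMIT :limit
"""

# Assumes cost_usd holds plain numbers; values like "$1,234" would cast to 0.
TOP_FUNDED_RECIPIENTS = """
SELECT receiving_unit, SUM(CAST(cost_usd AS REAL)) AS total_cost
FROM trainings
GROUP BY receiving_unit
ORDER BY total_cost DESC
LIMIT :limit
"""


def load_csv(conn, csv_file):
    """Copy an exported CSV into a `trainings` table, storing every value as text."""
    reader = csv.DictReader(csv_file)
    fields = reader.fieldnames
    # Quote column names so headers with spaces or punctuation are still valid SQL.
    cols = ", ".join('"' + name.replace('"', '""') + '"' for name in fields)
    marks = ", ".join("?" for _ in fields)
    conn.execute(f"CREATE TABLE trainings ({cols})")
    conn.executemany(
        f"INSERT INTO trainings VALUES ({marks})",
        ([row[name] for name in fields] for row in reader),
    )
    return conn


def run_query(conn, sql, limit=20):
    """Run one of the aggregate queries, returning (label, value) tuples."""
    return conn.execute(sql, {"limit": limit}).fetchall()
```

Because `GROUP BY` works on the raw text of each cell, a loosely recorded value such as “Army” collects every row that uses it into a single group, which is one way a generic label can end up at the top of a ranking like the one described above.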
Because the data were extracted automatically, we can’t guarantee that they are error-free. You can see the complete extraction code and process over on GitHub, and we’ve done our best to ensure that it accurately turns the content of the original PDF report into data. Before using this data, however, please check it against the source.
Happy data wrangling!
Image: Tom Longley, CC-BY-4.0