Analysis of sources for data on security forces: can computers help us out?

Image: Security Force Monitor uses sources like news articles, NGO reports and official press releases to create data on the structure, leadership and behaviour of security forces

Can computers help Security Force Monitor’s researchers increase their speed and accuracy when extracting relevant data about security forces from the text of news articles and reports?

Over the last few months Yue “Ulysses” Chang, a master’s student at the Data Science Institute at Columbia University, has interned with us to help us explore this question. The quick answer is “yes, 79% of the time.” The longer (and hopefully nice and readable) answer starts in this blog post, the first of a short series.

How do we create data about security forces?

Each week at Security Force Monitor we identify and read hundreds of news articles, reports, maps and datasets – these are the “sources” out of which we pull thousands of little details about organizations, names and locations. We stitch these together to create the rich view of security force structures and commanders that you can search through on WhoWasInCommand.com.

This is time-consuming work. In most cases, after we’ve found a useful source we just have to read through it, identify the snippets of information we need and then copy, paste or re-type them into our databases. Here are a few paragraphs from a typical source – “Police IG Redeploys AIGs, CPs For April 11 Polls” – published by Channels TV on 10 April 2015:

Image: excerpt from “Police IG Redeploys AIGs, CPs For April 11 Polls”, Channels TV (Nigeria), 10 April 2015.

We can use the information in this source to support the statements below, and enter the relevant values into a database:

  • On 10 April 2015 (the publication date of the source) Inspector General of Police, Assistant Inspector General, and Deputy Inspector-General of Police are ranks in the police force in Nigeria.
  • On 10 April 2015 Force Public Relations Officer is a title in the police force in Nigeria.
  • On 10 April 2015 Suleiman Abba is a person holding the rank of Inspector General of Police (IGP) in Nigeria.
  • On 10 April 2015 Aigusman Gwary is a person holding the rank of Assistant Inspector General of Police (AIG) in Nigeria.
  • On 10 April 2015 six Deputy Inspectors-General of Police “coordinate activities” in six geo-political zones.

Every one of these data points has at least one source. For example, to make our data on units – distinct parts of security forces such as army battalions or police divisions – we looked at around 3,500 unique sources taken from over 200 different publications. From these sources we were able to evidence 25,505 data points with a single source, 2,086 data points with more than 10 distinct sources, and 59 data points with over 40 distinct sources.

Table: How many data points about the units on WhoWasInCommand.com are evidenced by more than one unique source?
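Tallies like these come from counting the distinct sources behind each data point. The sketch below shows one way to do that count. The record identifiers and the list of citation pairs are hypothetical, made up for illustration, and are not our actual data model.

```python
from collections import defaultdict

# Hypothetical (data point, source) citation pairs, invented for this sketch.
citations = [
    ("unit-001:name", "src-a"),
    ("unit-001:name", "src-b"),
    ("unit-001:name", "src-a"),  # the same source cited twice counts once
    ("unit-002:name", "src-c"),
    ("unit-002:base", "src-c"),
    ("unit-002:base", "src-d"),
    ("unit-002:base", "src-e"),
]


def distinct_sources(pairs):
    """Map each data point to the number of distinct sources that evidence it."""
    sources = defaultdict(set)
    for data_point, source in pairs:
        sources[data_point].add(source)
    return {data_point: len(found) for data_point, found in sources.items()}


def count_with_more_than(counts, threshold):
    """Count the data points evidenced by more than `threshold` distinct sources."""
    return sum(1 for n in counts.values() if n > threshold)


counts = distinct_sources(citations)
print(counts)
print("Evidenced by more than one source:", count_with_more_than(counts, 1))
```

Using a set per data point means a source that is cited repeatedly for the same value is only counted once.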

We presently cover branches of the security forces of Nigeria, Mexico and Egypt.  As we expand our coverage to other countries we will need to consider ways of reducing the time spent and risk of error inherent in this part of our research process. If we can reduce the time we spend searching, cutting and pasting bits of text, then we can spend more time cross-referencing and producing interesting analysis from the data. Could we get more help from computers than we currently do?

“NLP”, “NER” … ?

Computers can read too. Sort of.

Natural Language Processing (NLP) is a long-established field of computer science that looks at how machines relate to people’s speech and writing, and ultimately how they can comprehend information passed to them by a person. The fruits of NLP research power everything from the recommendations you get on search engines to those (irritating) automated voice call systems and the (less irritating) digital voice assistants. Named Entity Recognition (NER) is the sub-field of NLP that gives computers the capability to pick out things that people can recognise in text, like the names of persons, organizations and locations, or dates. Could these techniques be applied to our work?

We can start exploring this question very quickly by using one of the numerous “off the shelf” NLP and NER toolsets. To test our ideas out we have chosen a toolkit called spaCy. It has a wide range of functions and is free and open source, so we can use it without direct cost.

Without any modification spaCy can assess text and identify persons, organizations, locations, dates and lots of other types of entities. It can also be trained to improve its ability to detect the above entities (for example, by adding a new geographical model), and to identify new entities such as rank and role, or connections between entities. What’s not to love?
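To give a flavour of what adding new entity types could look like, here is a minimal sketch that assumes spaCy version 3. It uses spaCy’s rule-based entity ruler on a blank pipeline rather than statistical training, so it only matches exact phrases that we list ourselves. The labels RANK and TITLE and the sample sentence are our own assumptions for illustration.

```python
import spacy

# A blank English pipeline: tokenizer only, so no pretrained model is needed.
nlp = spacy.blank("en")

# The entity ruler matches exact phrases; this is rule-based, not trained.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    # RANK and TITLE are labels invented for this sketch.
    {"label": "RANK", "pattern": "Inspector General of Police"},
    {"label": "RANK", "pattern": "Assistant Inspector General of Police"},
    {"label": "TITLE", "pattern": "Force Public Relations Officer"},
])


def find_entities(text):
    """Return (text, label) pairs for every entity the pipeline finds."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


# An illustrative sentence written for this sketch, not a quote from a source.
sample = "The Assistant Inspector General of Police met the Force Public Relations Officer."
entities = find_entities(sample)
print(entities)
```

When two patterns overlap, the entity ruler keeps the longer match, so the longer rank is reported once rather than also matching the shorter rank inside it.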

NER and real sources

Let’s give it a try. We can take the text from the sample news article we analysed above, and place it into spaCy. It will highlight different parts of the text that it considers to be entities:

Image: Use of unmodified, untrained spaCy NER algorithm to identify people, places and organizations in text (see an interactive version of this example).

The performance here is OK, but it is not without problems. For example, spaCy correctly picked out all but one of the people mentioned in the article (“Aigusman Gwary”, who it tagged as an Organization rather than a Person). It has also successfully identified Lagos and Bauchi as geo-political entities (“GPE”) but misses “Akwa Ibom” and “Rivers”, and mis-categorizes “Jigawa” as an organization. There are other misses in there too.
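One common way to put numbers on hits and misses like these is to compare the entities a tool found against a hand-checked list, then compute precision, recall and F1. The sketch below does this using only the handful of entities named in the paragraph above. The article contains other entities, so the resulting figures are purely illustrative, and they are unrelated to the 79% figure mentioned at the start of this post.

```python
def precision_recall_f1(gold, predicted):
    """Score predicted (text, label) pairs against a hand-checked gold set."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# What a researcher would expect, limited to the entities discussed above.
gold = {
    ("Suleiman Abba", "PERSON"),
    ("Aigusman Gwary", "PERSON"),
    ("Lagos", "GPE"),
    ("Bauchi", "GPE"),
    ("Akwa Ibom", "GPE"),
    ("Rivers", "GPE"),
    ("Jigawa", "GPE"),
}

# What the untrained model returned for those same entities.
predicted = {
    ("Suleiman Abba", "PERSON"),
    ("Aigusman Gwary", "ORG"),  # wrong label
    ("Lagos", "GPE"),
    ("Bauchi", "GPE"),
    ("Jigawa", "ORG"),          # wrong label
}

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here an entity only counts as correct when both the text and the label match, which is a strict choice. A looser scheme could give partial credit for finding the right text with the wrong label.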

Bringing this into the work of Security Force Monitor

In this post we’ve outlined the challenges we face, and in broad terms the way that we see this set of technologies offering an opportunity to address them. The intriguing question for us is how to take the raw capabilities of NER and have them benefit our research work in specific and effective ways. We have a long list of things on our mind, including:

  • The skills and financial costs that will be required to develop, implement and maintain such a system in a way that is reliable and effective.
  • Whether we can improve the performance of the algorithm by using the data we have already collected to train spaCy to better pick out what we are looking for in a source.
  • How to reconcile the stream of information coming in from NER with the data we already have – for example, what process will we use to figure out if “Jane S Smith” and “Jayne S Smith” are the same person?
  • How we evaluate NLP and NER systems so we know whether they are getting better (or worse!).
  • The type of workflow and user interface that would be needed to bring these capabilities effectively into our research work so they are actually helpful.
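As a first rough idea of how the name reconciliation question in the list above could be approached, the sketch below scores how similar two names are using Python’s standard difflib module. This is only one possible technique and not something we have settled on. The 0.9 threshold is an arbitrary assumption that would need tuning against real data.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case a name, drop full stops and collapse repeated spaces."""
    return " ".join(name.lower().replace(".", " ").split())


def name_similarity(a, b):
    """Return a similarity score between 0.0 and 1.0 for two names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def likely_same_person(a, b, threshold=0.9):
    """Flag a pair of names as a candidate match; the threshold is a guess."""
    return name_similarity(a, b) >= threshold


for pair in [("Jane S Smith", "Jayne S Smith"), ("Jane S Smith", "Suleiman Abba")]:
    print(pair, round(name_similarity(*pair), 2), likely_same_person(*pair))
```

A high score should only flag a pair for a researcher to review. Two different people can have near-identical names, so string similarity alone cannot confirm a match.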

In the next post in this series, Ulysses and I will start digging into these questions and revealing some of the work that we have done so far.

(Article edited on 22 February 2018 to correct typos and clarify wording)

February data update on WhoWasInCommand.com – SARS Nigeria, Mexico military garrisons, new Egypt units

Since December 2017 we have published two updates to WhoWasInCommand.com, adding a large number of new records, expanding others and making some corrections. Cumulatively, these updates increase the data available on WhoWasInCommand.com by 25%. In this blog post we’ll look in depth at a recent restructure of the Special Anti-Robbery Squads (SARS) of the Nigeria Police Force and give a brief overview of other updates.

Special Anti-Robbery Squads (SARS) – Nigeria Police Force

Changes in the chain of command of Special Anti-Robbery Squads (SARS) of the Nigeria Police Force

SARS are a specialised type of unit of the Nigeria Police Force. They were established in each state and the Federal Capital Territory (FCT) to combat violent crime. Civil society groups have reported on allegations of human rights abuses by SARS for at least 15 years. In its September 2016 report “You Have Signed Your Death Warrant” Amnesty International documented numerous allegations against SARS across Nigeria, including acts of torture and other cruel, inhuman or degrading treatment or punishment. We have carefully extracted these incidents from Amnesty’s report and made them searchable on WhoWasInCommand.com.

In December 2017 Nigerian citizens rallied around the #EndSARS hashtag on social media, using it to make allegations and share experiences of violence and corruption by SARS personnel. #EndSARS culminated in a number of protests during which the movement’s leadership demanded the squads be disbanded. In response, the Inspector-General of Police did not disband SARS but restructured the units… twice. What, if anything, changed?

For a long stretch between 2010 and 3 December 2017 the SARS in each state and the FCT of Nigeria had two different and simultaneous chains of command. Each state/FCT SARS was under the Criminal Investigation Division (CID) for its state/FCT while also being “coordinated” by a Commissioner of Police for SARS who was under the Federal Criminal Investigation Department/“D” Department of the Nigeria Police Force. Ultimately both chains of command ended at the Inspector General of Police (IGP) at Force Headquarters.

On 4 December 2017 the IGP announced a dramatic reshuffle: SARS in each state/FCT would report to the Federal SARS, which itself would be moved under the “B” Department/Operations Department at Force Headquarters in Abuja. Thus for a brief moment all of the SARS units in each state had a single chain of command.

It may be that this was a mistake because just over a fortnight later on 22 December 2017 the IGP made another announcement: SARS would return to having two simultaneous chains of command. SARS in each state/FCT would be under the command of the state/FCT Commissioner of Police (through the CP’s deputies in charge of operations) as well as continuing to report to the CP in charge of Federal SARS who was still under the “B” Department/Operations.

So, the overall effect on the SARS chain of command is the removal of State CID, along with a shift in reporting from “D” Department (Investigations) to the “B” Department (Operations) at Force Headquarters. The impact of these restructurings on SARS themselves is difficult to assess. A past reorganization announced by the IGP in November 2015 – which split SARS in each state into “operations” and “investigations” branches – apparently was never actually implemented on the ground. Amnesty International reported SARS officers they interviewed in June 2016 were “unaware of the IGP’s announcement [in November 2015] that SARS ha[d] been split into two units for operations purposes.” For now, SARS is also still listed as under the “D” Department on the Nigeria Police Force’s website. We will continue to watch developments closely, and will update and extend our data on SARS as more information becomes available.
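Changes like these can be recorded as parent-child links that each carry a start and end date, which makes it possible to ask who a unit reported to on any given day. The sketch below is a simplified illustration and not the WhoWasInCommand.com data model. The class and function names are hypothetical, the federal-level coordination is treated as a single “Federal SARS” node throughout, and 1 January 2010 is a placeholder because we only know the year the earlier arrangement began.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CommandLink:
    child: str
    parent: str
    start: date
    end: Optional[date]  # None means no known end date


# Placeholder start date: the source only gives the year 2010.
ASSUMED_START = date(2010, 1, 1)

links = [
    CommandLink("State SARS", "State CID", ASSUMED_START, date(2017, 12, 3)),
    CommandLink("State SARS", "Federal SARS", ASSUMED_START, None),
    CommandLink("State SARS", "State Commissioner of Police", date(2017, 12, 22), None),
    CommandLink("Federal SARS", "D Department", ASSUMED_START, date(2017, 12, 3)),
    CommandLink("Federal SARS", "B Department", date(2017, 12, 4), None),
]


def parents_on(child, on, all_links=links):
    """Return the sorted parents of `child` whose link covers the date `on`."""
    return sorted(
        link.parent
        for link in all_links
        if link.child == child
        and link.start <= on
        and (link.end is None or on <= link.end)
    )


for day in (date(2017, 12, 1), date(2017, 12, 10), date(2017, 12, 25)):
    print(day, parents_on("State SARS", day), parents_on("Federal SARS", day))
```

Keeping every link rather than overwriting the old one preserves the history, so the brief period with a single chain of command remains visible alongside the arrangements before and after it.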

You can view the updated data on the Special Anti-Robbery Squads (SARS) of the Nigerian Police on WhoWasInCommand.com.

Other updates to data on security forces in Nigeria, Mexico and Egypt

Nigeria

As well as our close look at SARS above we have updated WhoWasInCommand.com with data on police units in Delta and Bauchi States in Nigeria. Further, we have now added allegations of human rights abuses by security forces against pro-Biafra protesters in the south-eastern states of Nigeria. In its November 2016 report “Bullets Were Raining Everywhere” Amnesty International reports numerous allegations of extrajudicial killing, torture and arbitrary arrest and detention committed by security forces against pro-Biafran protesters between August 2015 and August 2016 in Nigeria’s Anambra, Abia and Rivers States.

View the updated Nigeria data on WhoWasInCommand.com.

Mexico

We have extended the Mexico dataset on WhoWasInCommand.com to cover Military Garrisons (“guarniciones militares”) and their commanders. Garrisons can play an active role in military operations and often command smaller units as well. One example of this is Guarnición Militar de Ciudad Juárez, which participated in a major joint military operation, Operación Conjunta Chihuahua, and commanded both the 9 and 20 Regimientos de Caballería Motorizado (motorized cavalry regiments).

In an earlier upload of data we had omitted full descriptions of a number of alleged human rights abuses in Mexico. We have now corrected this.

View the updated Mexico data on WhoWasInCommand.com.

Egypt

For our data on Egypt, we have added initial data on top level military structures and entries for police units in Aswan and Al Sharqia governorates. We’ve also included a small number of allegations of human rights abuses by police in Egypt as reported by Human Rights Watch (HRW) in September 2017.

View the updated Egypt data on WhoWasInCommand.com.

Launching WhoWasInCommand.com – a power tool for investigating security forces

It’s a big day here at Security Force Monitor. We’re excited to reveal our first official product: WhoWasInCommand.com.

WhoWasInCommand.com shows the composition of security forces, their commanders, and the locations of operations and bases

WhoWasInCommand.com makes it fast and easy to find detailed information about the chain of command, areas of operation, commanders and bases of the police, military and other security forces of a country, and to discover links to alleged human rights violations.

This platform is a unique resource containing a level of detailed data about security forces that has never existed before. It’s the result of an enormous amount of work – and would not have been possible without extensive advice and help from civil society partners. We hope that you find this new tool useful.

10 reasons to use WhoWasInCommand.com

We’d like to point out some of the things that we think make WhoWasInCommand.com a powerful and effective research tool:

  1. Unique, high grade research: WhoWasInCommand.com contains thousands of units and commanders from the security forces of Egypt, Mexico and Nigeria going back over 10 years. We are committed to expanding our coverage for those and other countries. Expect more data soon!
  2. Start with search; find things fast: It’s easy to find what you want, no need to navigate unnecessarily.
  3. Refine your search with powerful filters: Your search results can be refined using nearly 30 different dimensions about location, time, organizational attributes and relationships, and biographical details of personnel.
  4. Crystal clear views of the data: We’ve designed simple maps, tables and tree charts to present the data we have in the clearest ways possible.
  5. Check out where every bit of data comes from: You can take a look and get at the sources used to evidence every single datapoint on WhoWasInCommand.com. Also the methods we have used to create the data are fully documented in our Research Handbook.
  6. Take your findings home with you: Search results, along with any dossier on WhoWasInCommand.com can be downloaded into a spreadsheet along with all their sources.
  7. Get help when you need it: WhoWasInCommand.com contains help and tips throughout, explaining how different bits of the site work, and what the data presented means.
  8. Use it in your language: WhoWasInCommand.com is currently translated into English and Spanish, with several more languages to come.
  9. Use it on mobiles and tablets! WhoWasInCommand.com is mobile friendly.
  10. Get your own WhoWasInCommand: The software powering WhoWasInCommand.com is open source, which means you can set up and run your own copy of the platform.

Security Force Monitor has partnered with DataMade to create WhoWasInCommand.com. DataMade has operationalized and refined Security Force Monitor’s data structure, created a powerful open source platform to put the data online, and made a significant contribution to the concept and design of WhoWasInCommand.com.

We hope that WhoWasInCommand.com aids the work of journalists, human rights researchers, advocates, litigators and others working to make security forces accountable to the public they serve.

We’re keen to hear what you think about WhoWasInCommand.com. Email us at technical@securityforcemonitor.org.

Learning from our users – first feedback on our prototype

In mid-April we publicly released the first version of our web application for feedback. We sought out advice from human rights researchers, international criminal litigators, investigative journalists, and policy advocates – spending over an hour with more than 45 users. Our whole team was struck by the willingness of our colleagues to dedicate time to giving us feedback. To everyone that took the time to talk with us – thank you!

This post will cover what we’ve learned from our users, how their feedback helps our mission, and why we’ll be reaching out again shortly.

Interviewees really liked…
  • The content itself
  • Charts showing command structures
  • Being able to access all the sources
  • Ways of going forward and backwards in time
  • Background information about security forces in a country
  • Showing analysis rather than just data

Interviewees had questions about…
  • Visual clutter and data density
  • How to find data quickly
  • Downloading our information
  • Not knowing how data was selected for inclusion
  • Completeness of the data
  • Slow application responsiveness

We are answering the right questions, but not always in the right way

Users strongly validated the premise of our work – they very much wanted (and generally found it hard to find) information on the organizational structure, command personnel, location and areas of operation of security forces tracked through time. Interviewees could see the value of the dataset – in part or as a whole – to their own work. They were also intrigued by the visuals on offer, liking our ambition.

Our visualization of the command chart was a huge success: users loved being able to see that information in a visual way. Users had extremely positive views about the actual data. They got glimpses of its value, in particular where we included the analysis we can produce using the data. For example, for each alleged human rights incident we included a list of nearby units, which numerous interviewees found clarified the overall purpose of our research. They also liked the ability to see sources for each datapoint, which would help them appraise the data for inclusion in their own work. While users did not find the timeline functionality intuitive to use, when we explained how they could “time travel” through our data to see the command tree, commanders and other data at a particular point in time they were thrilled.

The biggest issue for users was that they were often overwhelmed with information, particularly in the initial map view of a country. This made it difficult to grasp what they were looking at, or where to go next. There were many questions related to our terminology (what do we mean by area of operations, affiliated person, etc.). Users often had difficulty navigating around the application from page to page, and our tools to sort through the visuals (like the filters on the map page) were not intuitive.

Images: “Users Want More of This” and “Users Want Less of This”

What we will do next

The 45 people we interviewed told us straight what they liked and did not like about our work so far. They have given us good guidance on the direction we need to take.

  • Make it far, far simpler to use – less of a stand-alone application and more of a webpage.
  • Simplify the presentation of key information, using visualization more sparingly and offering contextual guidance where needed.
  • Make it ‘search first’, giving lots of ways to find, sort and filter throughout the application.
  • Speed it up, and give lots of cues about how things change when the user does something.
  • Let people take the data home.

We have taken the feedback users have given us and will be launching a radically new and improved application in the coming months – so stay tuned for more updates!

Why I started the Security Force Monitor

by Tony Wilson.

Today, I’m sharing something with you that I’m proud of.  It’s a new research tool we’ve created – an early version that shows our research, and embodies our reason for being.  It brings to light something we have never seen so clearly in one place before: the structure and operations of security forces, surfaced and arranged from thousands of publicly-available sources.

You can visit our prototype online here. Currently it covers security forces in Mexico and Nigeria – over 1,000 discrete organizations and nearly 900 affiliated persons.

How could all this information not already exist?

I started the Security Force Monitor in order to address a simple problem – the lack of detailed information about the police, military and other security forces of a country. At the time, I was trying to help advocates raise human rights concerns about U.S. security assistance to Bahrain. Since the protests that began in February 2011, rampant human rights abuses had been documented, but it was incredibly difficult for human rights researchers to identify specific perpetrators because the security forces were not transparent. This was extremely frustrating since it made advocacy supporting human rights conditionality on security assistance even harder. So, the task was clear: find detailed information about the security forces of Bahrain.

After several weeks of pilot research together with a colleague we had found data from hundreds of sources, and compiled them into an ever-lengthening Word document. This initial research demonstrated the information gap could be filled, but just as quickly a second problem arose – making sense of large amounts of quite detailed data. Using the limited skills I had, I created a rough Google Map and a rudimentary organizational chart, in an effort to make sense of all that data.

Even with these basic tools, I could see compelling connections between alleged human rights abuses and specific units and commanders. But the limitations were also evident. It was clear to me that in order to be a sustained effort the Security Force Monitor could not just work off of a text document or even a spreadsheet. It would need professionally-developed tools that a team could use to make accurate data easy to create, and a way of publishing it that would aid others in their own investigations.

Armed with some sketches of what a potential platform could look like, I interviewed almost 90 journalists, advocates, human rights researchers and others engaged in public interest efforts and asked them what would be useful to them and their work.

A capture from initial sketches for an application, showing a command tree, and a time slider

Their feedback guided the genesis of Security Force Monitor.  I have been fortunate to gain the support of the Open Society Foundations and the Oak Foundation, win the Knight News Challenge on Data, and pull together a great team – Tom Longley and Michel E. Manzur. The Security Force Monitor also found a welcoming institutional home at Columbia Law School Human Rights Institute and has begun to build an exceptional Advisory Council for the project. To move from concept to tool we have worked with the creative civic technologists at DataMade, FFunction and OpenNorth. Together we worked to create the datasets and produce our first attempt at a product, the prototype that we are releasing today.

The picture it shows of security forces is rich and detailed, and changes with time. The prototype platform shows the changes that occurred over time as units were created, moved or disbanded; and as commanders were promoted, retired or fired. Finally, all our work is transparent: every data point is sourced with citations back to where we got our data.

This prototype is a big first step. When we were getting our idea off the ground we talked with almost 90 journalists, activists, researchers and policy experts. Now, with this prototype in hand, we will be conducting even more interviews over the next several weeks on what does and does not work in order to develop an even better version.

You may be getting an email from us very soon. In fact, if you have thoughts and feedback, email me directly – tony [at] securityforcemonitor.org.

Our mission is simple but bold. The Security Force Monitor will organize every piece of public information about security forces. We’ll produce research of the highest quality that will help make security forces more transparent. I believe that producing this research will aid journalists, civil society, human rights researchers, oversight efforts and others in making security forces more accountable.

You can help us do this. Take the first step by using our prototype and telling us what you think. Working together we can make the Security Force Monitor an indispensable digital service for transparency, accountability and human rights.