Analysis of sources for data on security forces: can computers help us out?

Image: Security Force Monitor uses sources like news articles, NGO reports and official press releases to create data on the structure, leadership and behaviour of security forces

Can computers help Security Force Monitor’s researchers increase their speed and accuracy when extracting relevant data about security forces from the text of news articles and reports?

Over the last few months, Yue “Ulysses” Chang, a master’s student at the Data Science Institute at Columbia University, has interned with us to help us explore this question. The quick answer is “yes, 79% of the time.” The longer (and hopefully nice and readable) answer starts in this blog post, the first of a short series.

How do we create data about security forces?

Each week at Security Force Monitor we identify and read hundreds of news articles, reports, maps and datasets – these are the “sources” out of which we pull thousands of little details about organizations, names and locations. We stitch these together to create the rich view of security force structures and commanders that you can search through on

This is time-consuming work. In most cases, after we’ve found a useful source, we just have to read through it, identify the snippets of information we need and then copy, paste or re-type them into our databases. Here are a few paragraphs from a typical source – “Police IG Redeploys AIGs, CPs For April 11 Polls” – published by Channels TV on 10 April 2015:

Image: excerpt from “Police IG Redeploys AIGs, CPs For April 11 Polls”, Channels TV (Nigeria), 10 April 2015.

We can use the information in this source to support the statements below, and enter the relevant values into a database:

  • On 10 April 2015 (the publication date of the source) Inspector General of Police, Assistant Inspector General, and Deputy Inspector-General of Police are ranks in the police force in Nigeria.
  • On 10 April 2015 Force Public Relations Officer is a title in the police force in Nigeria.
  • On 10 April 2015 Suleiman Abba is a person holding the rank of Inspector General of Police (IGP) in Nigeria.
  • On 10 April 2015 Aigusman Gwary is a person holding the rank of Assistant Inspector General of Police (AIG) in Nigeria.
  • On 10 April 2015 six Deputy Inspectors of Police “coordinate activities” in six geo-political zones.

Every one of these data points has at least one source. For example, to make our data on units – distinct parts of security forces such as army battalions or police divisions – we looked at around 3,500 unique sources taken from over 200 different publications. From these sources we were able to evidence 25,505 data points with a single source, 2,086 data points with more than 10 distinct sources, and 59 data points with over 40 distinct sources.
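The tallies above come down to counting how many distinct sources sit behind each data point. A minimal sketch of that arithmetic follows. The record layout and every identifier in it are invented for illustration; this is not how our database is actually structured.

```python
from collections import Counter

# Hypothetical records: each data point maps to the set of source IDs that
# evidence it. All identifiers here are made up for illustration only.
evidence = {
    ("unit-001", "name"): {"src-a"},
    ("unit-001", "base"): {"src-a", "src-b", "src-c"},
    ("unit-002", "name"): {"src-b"},
    ("unit-002", "parent"): {"src-a", "src-d"},
}


def sources_per_data_point(evidence):
    """Map 'number of distinct sources' to 'number of data points with that many'."""
    return Counter(len(sources) for sources in evidence.values())


def count_with_more_than(evidence, threshold):
    """Count data points evidenced by more than `threshold` distinct sources."""
    return sum(1 for sources in evidence.values() if len(sources) > threshold)


tally = sources_per_data_point(evidence)
print("single-source data points:", tally[1])
print("data points with more than one source:", count_with_more_than(evidence, 1))
```

Using a set per data point means the same source cited twice is only counted once, which is what “distinct sources” requires.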

Table: How many data points about the units on are evidenced by more than one unique source?

We presently cover branches of the security forces of Nigeria, Mexico and Egypt.  As we expand our coverage to other countries we will need to consider ways of reducing the time spent and risk of error inherent in this part of our research process. If we can reduce the time we spend searching, cutting and pasting bits of text, then we can spend more time cross-referencing and producing interesting analysis from the data. Could we get more help from computers than we currently do?

“NLP”, “NER” … ?

Computers can read too. Sort of.

Natural Language Processing (NLP) is a long-established field of computer science that looks at how machines relate to people’s speech and writing, and ultimately how they can comprehend information passed to them by a person. The fruits of NLP research power everything from the recommendations you get on search engines to those (irritating) automated voice call systems and the (less irritating) digital voice assistants. Named Entity Recognition (NER) is the sub-field of NLP that gives computers the capability to pick out things that people can recognise in text – like persons, organizations, locations and dates. Could these techniques be applied to our work?

We can start exploring this question very quickly by using one of numerous “off the shelf” NLP and NER toolsets. To test our ideas out we have chosen a toolkit called spaCy.  This has the benefit of having a wide range of functions, and being free and open source – this enables us to use the toolset without direct cost.

Without any modification spaCy can assess text and identify persons, organizations, locations, dates and lots of other types of entities. It can also be trained to improve its ability to detect the above entities (for example by adding in a new geographical model), and to identify new entities such as rank and role, or connections between entities. What’s not to love?
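For readers who want to see what “without any modification” looks like in practice, here is a minimal sketch. It assumes spaCy is installed along with one of its pretrained English models; the model name "en_core_web_sm" is our choice for this illustration and not necessarily the one used in our experiments. The helper names extract_entities and group_by_label are hypothetical.

```python
from collections import defaultdict


def extract_entities(text, model_name="en_core_web_sm"):
    """Run a pretrained spaCy pipeline and return (entity text, label) pairs.

    Assumes spaCy and the named model are installed. The import sits inside
    the function so the grouping helper below can be used without spaCy.
    """
    import spacy

    nlp = spacy.load(model_name)
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


def group_by_label(pairs):
    """Group (text, label) pairs into {label: [texts, ...]}, keeping order."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


# Usage (needs spaCy and a downloaded model, so it is left commented out):
#   pairs = extract_entities("Suleiman Abba is the Inspector General of Police in Nigeria.")
#   print(group_by_label(pairs))
```

What the model returns for any given sentence depends on the model and its version, so no expected output is shown here.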

NER and real sources

Let’s give it a try. We can take the text from the sample news article we analysed above, and place it into spaCy. It will highlight different parts of the text that it considers to be entities:

Image: Use of unmodified, untrained spaCy NER algorithm to identify people, places and organizations in text (see an interactive version of this example).

The performance here is ok, but it is not without problems. For example, spaCy correctly picked out all but one of the people mentioned in the article (“Aigusman Gwary”, which it tagged as an Organization rather than a Person). It has also successfully identified Lagos and Bauchi as geo-political entities (“GPE”) but misses “Akwa Ibom” and “Rivers”, and mis-categorizes “Jigawa” as an organization. There are other misses in there too.
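One way to put a number on “ok” is to compare what the tool tagged against what a researcher would have tagged. The sketch below does this for only the six entities named in the paragraph above, so it is a toy subset rather than a score for the whole article. The “gold” labels are our own reading of that paragraph, and matching on exact strings is a simplification.

```python
# What a researcher would expect (our reading of the paragraph above).
gold = {
    "Aigusman Gwary": "PERSON",
    "Lagos": "GPE",
    "Bauchi": "GPE",
    "Akwa Ibom": "GPE",
    "Rivers": "GPE",
    "Jigawa": "GPE",
}

# What the untrained model did with those same strings, as described above.
# "Akwa Ibom" and "Rivers" are absent because they were missed entirely.
predicted = {
    "Aigusman Gwary": "ORG",
    "Lagos": "GPE",
    "Bauchi": "GPE",
    "Jigawa": "ORG",
}


def score(gold, predicted):
    """Return (correct, precision, recall) using exact text-and-label matches."""
    correct = sum(1 for text, label in predicted.items() if gold.get(text) == label)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return correct, precision, recall


correct, precision, recall = score(gold, predicted)
print(f"correct={correct} precision={precision:.2f} recall={recall:.2f}")
```

Precision asks “of what the tool tagged, how much was right?” while recall asks “of what it should have found, how much did it find?”. Both matter to us: a wrong label costs checking time, and a missed entity is data we never see.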

Bringing this into the work of Security Force Monitor

In this post we’ve outlined the challenges we face, and in broad terms the way that we see this set of technologies offering an opportunity to address them. The intriguing question for us is how to take the raw capabilities of NER and have them benefit our research work in specific and effective ways. We have a long list of things on our mind, including:

  • The skills and financial costs that will be required to develop, implement and maintain such a system in a way that is reliable and effective.
  • Whether we can improve the performance of the algorithm by using the data we have already collected to train spaCy to better pick out what we are looking for in a source.
  • How to reconcile the stream of information coming in from NER with the data we already have – for example, what process will we use to figure out if “Jane S Smith” and “Jayne S Smith” are the same person?
  • How we evaluate NLP and NER systems so we know whether they are getting better (or worse!).
  • The type of workflow and user interface that would be needed to bring these capabilities effectively into our research work so they are actually helpful.
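To make the “Jane S Smith” versus “Jayne S Smith” question concrete, here is one very simple way of flagging likely duplicates, using the string comparison tools in Python’s standard library. This is a sketch of one possible approach and not a method we have settled on. The function names and the 0.9 threshold are assumptions chosen for illustration, and a high score should only ever queue a pair for a researcher to review.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case and collapse whitespace so trivial differences do not count."""
    return " ".join(name.lower().split())


def name_similarity(a, b):
    """Return a ratio between 0 and 1; 1.0 means the normalised names match exactly."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def probably_same_person(a, b, threshold=0.9):
    """Flag a pair for human review when similarity reaches the (assumed) threshold."""
    return name_similarity(a, b) >= threshold


print(probably_same_person("Jane S Smith", "Jayne S Smith"))
print(probably_same_person("Jane S Smith", "Suleiman Abba"))
```

String similarity alone cannot tell two different people with near-identical names apart, so in practice it would need to be combined with other details such as rank, unit and date.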

In the next post in this series, Ulysses and I will start digging into these questions and revealing some of the work that we have done so far.

(Article edited on 22 February 2018 to correct typos and clarify wording)

February data update on – SARS Nigeria, Mexico military garrisons, new Egypt units

Since December 2017 we have published two updates to, adding a large number of new records, expanding others and making some corrections. Cumulatively, these updates increase the data available on by 25%. In this blog post we’ll look in depth at a recent restructure of the Special Anti-Robbery Squads (SARS) of the Nigeria Police Force and give a brief overview of other updates.

Special Anti-Robbery Squads (SARS) – Nigeria Police Force

Changes in the chain of command of Special Anti-Robbery Squads (SARS) of the Nigeria Police Force

SARS are a specialised type of unit of the Nigeria Police Force. They were established in each state and the Federal Capital Territory (FCT) to combat violent crime. Civil society groups have reported on allegations of human rights abuses by SARS for at least 15 years. In its September 2016 report “You Have Signed Your Death Warrant” Amnesty International documented numerous allegations against SARS across Nigeria, including acts of torture and other cruel, inhuman or degrading treatment or punishment. We have carefully extracted these incidents from Amnesty’s report and made them searchable on

In December 2017 Nigerian citizens rallied around the #EndSARS hashtag on social media, using it to make allegations and share experiences of violence and corruption by SARS personnel. #EndSARS culminated in a number of protests during which the movement’s leadership demanded the squads be disbanded. In response, the Inspector-General of Police did not disband SARS but restructured the units… twice. What, if anything, changed?

For a long stretch between 2010 and 3 December 2017 the SARS in each state and the FCT of Nigeria had two different and simultaneous chains of command. Each state/FCT SARS was under the Criminal Investigation Division (CID) for their state/FCT while also being “coordinated” by a Commissioner of Police for SARS who was under the Federal Criminal Investigation Department/“D” Department of the Nigeria Police Force. Ultimately both chains of command end at the Inspector General of Police (IGP) at Force Headquarters.

On 4 December 2017 the IGP announced a dramatic reshuffle: SARS in each state/FCT would report to the Federal SARS, which itself would be moved under the “B” Department/Operations Department at Force Headquarters in Abuja. Thus for a brief moment all of the SARS units in each state had a single chain of command.

It may be that this was a mistake because just over a fortnight later on 22 December 2017 the IGP made another announcement: SARS would return to having two simultaneous chains of command. SARS in each state/FCT would be under the command of the state/FCT Commissioner of Police (through the CP’s deputies in charge of operations) as well as continuing to report to the CP in charge of Federal SARS who was still under the “B” Department/Operations.

So, the overall effect on the SARS chain of command is the removal of State CID, along with a shift in reporting from “D” Department (Investigations) to the “B” Department (Operations) at Force Headquarters. The impact of these restructurings on SARS themselves is difficult to assess. A past reorganization announced by the IGP in November 2015 – which split SARS in each state into “operations” and “investigations” branches – apparently was never actually implemented on the ground. Amnesty International reported that SARS officers they interviewed in June 2016 were “unaware of the IGP’s announcement [in November 2015] that SARS ha[d] been split into two units for operations purposes.” For now, SARS is also still listed as under the “D” Department on the Nigeria Police Force’s website. We will continue to watch developments closely, and will update and extend our data on SARS as more information becomes available.

You can view the updated data on the Special Anti-Robbery Squads (SARS) of the Nigerian Police on

Other updates to data on security forces in Nigeria, Mexico and Egypt


As well as our close look at SARS above we have updated with data on police units in Delta and Bauchi States in Nigeria. Further, we have now added allegations of human rights abuses by security forces against pro-Biafra protesters in the south-eastern states of Nigeria. In its November 2016 report “Bullets Were Raining Everywhere” Amnesty International reports numerous allegations of extrajudicial killing, torture and arbitrary arrest and detention committed by security forces against pro-Biafran protesters between August 2015 and August 2016 in Nigeria’s Anambra, Abia and Rivers States.

View the updated Nigeria data on


We have extended the Mexico dataset on to cover Military Garrisons (“guarniciones militares”) and their commanders. Garrisons can play an active role in military operations and often command smaller units as well. One example of this is Guarnición Militar de Ciudad Juárez, which participated in a major joint military operation, Operación Conjunta Chihuahua, and commanded both the 9 and 20 Regimientos de Caballería Motorizado (motorized cavalry regiments).

In an earlier upload of data we had omitted full descriptions of a number of alleged human rights abuses in Mexico. We have now corrected this.

View the updated Mexico data on


For our data on Egypt, we have added initial data on top level military structures and entries for police units in Aswan and Al Sharqia governorates. We’ve also included a small number of allegations of human rights abuses by police in Egypt as reported by Human Rights Watch (HRW) in September 2017.

View the updated Egypt data on