Global Terrorism Database: Interactive Analysis

Final project assignment for the course Social Data (2017) at DTU

Project maintained by polakowo. This project is released under the terms of the MIT license.


1. Introduction

The scope of this project is to drill down into terrorist events around the world from 1970 through 2015.

The primary objectives are

The idea behind the project is to find out how terrorism has developed in the Western world and whether we need to build tall walls to protect ourselves against future threats. We chose a more globally oriented topic because

You can find more general information about the project in the video below.

SocialData2017 from polakowo on Vimeo.

2. Dataset

The dataset is very comprehensive and contains a lot of terrorism-related information. We downloaded the entire Global Terrorism Database, available from the GTD homepage. It contains 156,772 terrorist attacks x 137 features and takes up 142.3 MB of disk space. It is worth mentioning that the dataset is almost completely encoded (strings and long numbers are mapped to short numbers). To decode it, we used the codebook available here. After exploring the codebook, we found some columns to be redundant or irrelevant and removed them. See the corresponding notebook Cleaning Data for further details on our approach.

We ended up working with 23 columns, which contain the quantitative and qualitative information of main interest. After the decoding, cleaning, filtering, and encoding steps, we were left with 156,772 rows x 23 columns, taking up 26.8 MB of disk space.
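The column selection and integer encoding described above can be sketched in plain Python. This is a minimal sketch under stated assumptions. It is not the project's Cleaning Data notebook. The helper name `clean_and_encode`, the four-column subset in `KEEP`, and the toy rows are all hypothetical. Integer codes are simply assigned in order of first appearance.

```python
import csv
import io

# Hypothetical column selection for illustration; the project itself keeps 23 columns.
KEEP = ["year", "region", "country", "nkilled"]
CATEGORICAL = {"region", "country"}
NUMERIC = {"year": int, "nkilled": int}


def clean_and_encode(rows):
    """Drop unused columns and replace categorical strings with small integer codes.

    Returns (encoded_rows, lookup), where lookup[column][code] gives back the string.
    """
    lookup = {col: {} for col in CATEGORICAL}   # column -> {code: string}
    to_code = {col: {} for col in CATEGORICAL}  # column -> {string: code}
    encoded = []
    for row in rows:
        out = {}
        for col in KEEP:
            value = row[col]
            if col in CATEGORICAL:
                codes = to_code[col]
                if value not in codes:  # first time this string appears
                    codes[value] = len(codes)
                    lookup[col][codes[value]] = value
                out[col] = codes[value]
            else:
                out[col] = NUMERIC[col](value)  # CSV fields arrive as strings
        encoded.append(out)
    return encoded, lookup


# Toy input (made-up rows, not real GTD records); the "city" column is dropped.
raw = io.StringIO(
    "year,region,country,city,nkilled\n"
    "1970,Region A,Country X,City 1,0\n"
    "1971,Region A,Country Y,City 2,3\n"
    "1971,Region B,Country Z,City 3,1\n"
)
encoded, lookup = clean_and_encode(csv.DictReader(raw))
print(encoded[1])        # {'year': 1971, 'region': 0, 'country': 1, 'nkilled': 3}
print(lookup["region"])  # {0: 'Region A', 1: 'Region B'}
```

The `lookup` dictionary produced here plays the same role as the string map used for decoding on the front end.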

You can download the cleaned dataset from this link to try it out yourself!

2.1. Columns

Below you will find basic information on the columns we used in the charts.

The first table contains numeric data:

| Column name | Type | Min | Max | Missing (NaN) | Description |
|---|---|---|---|---|---|
| year | int | 1970 | 2015 | none | Year |
| nkilled | int | 0 | 1500 | none | Total Number of Fatalities |
| nkilledter | int | 0 | 500 | none | Number of Perpetrator Fatalities |
| nwounded | int | 0 | 5500 | none | Total Number of Injured |
| nwoundedter | int | 0 | 200 | none | Number of Perpetrators Injured |
| lat | float | -53.1546 | 74.6336 | none | Latitude (of city) |
| lon | float | -176.176 | 179.367 | none | Longitude (of city) |

The second table contains categorical data:

| Column name | Unique values | Most frequent value | Missing (NaN) | Description |
|---|---|---|---|---|
| region | 12 | Middle East & North Africa | none | Region |
| country | 204 | Iraq | none | Country |
| weapontype | 12 | Explosives/Bombs/Dynamite | 7.59% | Weapon Type |
| attacktype | 9 | Bombing/Explosion | 3.38% | Attack Type |
| targettype | 22 | Private Citizens & Property | 2.46% | Target/Victim Type |
| gname | 3216 | Unknown | 46.41% | Perpetrator Group Name |

Hint: Every categorical feature is encoded with integers to save a lot of space. We therefore provide a global JSON dictionary called strings.json that maps every integer to its corresponding string. Decoding takes place after the data has been successfully loaded into the front end, so none of the charts has to take care of it.
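In the project, this decoding runs in the browser. To keep all examples in one language, the sketch here expresses the same lookup in Python. The assumed layout of `strings.json` is hypothetical: one object per column, mapping codes to labels. The code assignments and the function name `decode_rows` are hypothetical too. Note that JSON object keys are always strings, so integer codes have to be converted before the lookup. Missing values are left untouched.

```python
import json

# Hypothetical contents of strings.json: {column: {code_as_string: label}}.
# The real file's layout and code assignments may differ.
STRINGS_JSON = """
{
  "region": {"0": "Western Europe", "1": "Middle East & North Africa"},
  "weapontype": {"0": "Explosives/Bombs/Dynamite"}
}
"""


def decode_rows(rows, strings):
    """Replace integer codes with their labels for every column listed in `strings`.

    Missing values (None) are kept as they are; the input rows are not modified.
    """
    decoded = []
    for row in rows:
        out = dict(row)  # copy so the encoded data stays intact
        for col, mapping in strings.items():
            if col in out and out[col] is not None:
                out[col] = mapping[str(out[col])]  # JSON keys are strings
        decoded.append(out)
    return decoded


strings = json.loads(STRINGS_JSON)
rows = [{"year": 1970, "region": 1, "weapontype": 0, "nkilled": 2}]
print(decode_rows(rows, strings))
# [{'year': 1970, 'region': 'Middle East & North Africa', 'weapontype': 'Explosives/Bombs/Dynamite', 'nkilled': 2}]
```

Decoding once, right after loading, means the individual charts can work with readable labels directly.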

You may have already noticed that the tables above list fewer than 23 columns. We limited the amount of information so that it can be delivered quickly and the charts stay responsive (less laggy). Below are the optional columns we skipped.

| Column name | Description |
|---|---|
| state | Province / Administrative Region / State |
| city | City |
| extended | Extended Incident? |
| multiple | Part of Multiple Incident? |
| success | Successful Attack? |
| suicide | Suicide Attack? |
| nter | Number of Perpetrators |
| claimed | Claim of Responsibility? |
| property | Property Damage |
| propertyextent | Extent of Property Damage |

To learn more about these columns, feel free to jump to the Explainer Notebook. There you will find additional charts (made with Python and Plotly) that cover them as well.

3. Charts

Charts are vital for presenting data. They are used in both exploratory and descriptive analysis. Because most aggregations are time-intensive, we move them to the back end. We perform every major task in two steps:

  1. Use IPython to process the data, e.g., apply filters and perform aggregations.
  2. On the client side, use d3.js, a library for generating and manipulating web documents with data, to build beautiful interactive data visualizations.
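Here is a minimal sketch of the back-end half of this split. It assumes the aggregated result is handed to d3 as a JSON list of records. The function name `aggregate_to_json` and the record layout are assumptions, not the project's actual file format. On the client, a file like this could be loaded with `d3.json()`.

```python
import json
from collections import defaultdict


def aggregate_to_json(rows, key, value):
    """Sum `value` per distinct `key` and serialize the totals as a list of records."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    # One record per key, sorted so the output is deterministic.
    records = [{key: k, value: v} for k, v in sorted(totals.items())]
    return json.dumps(records)


# Toy encoded rows (region codes as integers).
rows = [{"region": 0, "nkilled": 2}, {"region": 1, "nkilled": 5}, {"region": 0, "nkilled": 1}]
print(aggregate_to_json(rows, "region", "nkilled"))
# [{"region": 0, "nkilled": 3}, {"region": 1, "nkilled": 5}]
```

Doing the heavy grouping once on the back end leaves the browser with only a small, pre-aggregated file to draw.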

3.1. Histogram

This histogram explores temporal patterns of terrorism from 1970 through 2015.

The main question we address is:

How has terrorism developed over time from the perspective of geographical units, types, or terrorist groups?

We are interested in the temporal development of terrorism, which touches on many interesting attributes:

  1. Geographical units in form of regions, countries, states and cities
  2. Types, such as weapon, attack and target types
  3. Terrorist groups
In fact, we can aggregate over every categorical column, and that is what we did.
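For the histogram, one natural aggregation is a count of attacks per year for every value of the chosen categorical column. The sketch below shows one assumed way to build such series; it is not the project's code, and `yearly_counts` is a hypothetical name. Years without attacks are filled with zeros, so every series covers the same 1970 to 2015 axis. Rows with a missing value in the chosen column are skipped.

```python
from collections import Counter


def yearly_counts(rows, column, first_year=1970, last_year=2015):
    """Count attacks per year for every value of a categorical `column`.

    Returns {value: [count_first_year, ..., count_last_year]}; years with no
    attacks get 0, so all series have the same length.
    """
    known = [row for row in rows if row[column] is not None]  # skip missing values
    counts = Counter((row[column], row["year"]) for row in known)
    values = sorted({row[column] for row in known})
    years = range(first_year, last_year + 1)
    return {v: [counts[(v, y)] for y in years] for v in values}


# Toy encoded rows.
rows = [
    {"year": 1970, "region": 0},
    {"year": 1972, "region": 0},
    {"year": 1972, "region": 1},
    {"year": 1972, "region": 0},
]
series = yearly_counts(rows, "region", 1970, 1972)
print(series)  # {0: [1, 0, 2], 1: [0, 0, 1]}
```

Swapping `"region"` for another categorical column such as `"weapontype"` or `"gname"` yields the corresponding histogram series.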

We implemented the following cool features:

3.2. Scatterplot

The second chart is a scatterplot, which encodes three numeric attributes at a time. We use the scatterplot to compare categorical attributes (such as Weapon Type) by numeric metrics. Many of these attributes compare differently across regions or even countries, so we added the ability to aggregate by geographical unit.

The main question we address is:

What are
  • the most lethal weapon types,
  • the most effective attack types, and
  • the most vulnerable target types
in the selected geographical unit?
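The sketch below shows one way to compute per-category metrics for such a scatterplot. The three numeric attributes the chart actually encodes are not listed here, so the metrics chosen are assumptions: number of attacks, total fatalities, and fatalities per attack as a rough measure of lethality. The helper name `scatter_metrics` is also an assumption.

```python
from collections import defaultdict


def scatter_metrics(rows, category, region=None):
    """Per value of `category` (e.g. weapontype), compute three numeric metrics:
    number of attacks, total fatalities, and mean fatalities per attack.

    `region` optionally restricts the rows to one geographical unit.
    """
    attacks = defaultdict(int)
    killed = defaultdict(int)
    for row in rows:
        if region is not None and row["region"] != region:
            continue  # outside the selected geographical unit
        attacks[row[category]] += 1
        killed[row[category]] += row["nkilled"]
    return {
        value: {
            "attacks": attacks[value],
            "killed": killed[value],
            "killed_per_attack": killed[value] / attacks[value],
        }
        for value in attacks
    }


# Toy rows: two weapon types in two regions.
rows = [
    {"region": 0, "weapontype": "A", "nkilled": 4},
    {"region": 0, "weapontype": "A", "nkilled": 0},
    {"region": 0, "weapontype": "B", "nkilled": 3},
    {"region": 1, "weapontype": "B", "nkilled": 10},
]
m = scatter_metrics(rows, "weapontype", region=0)
print(m["A"])  # {'attacks': 2, 'killed': 4, 'killed_per_attack': 2.0}
```

Each category becomes one circle, and ranking by `killed_per_attack` gives a simple answer to "most lethal" within the selected unit.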

We implemented the following cool features:

3.3. Choropleth

The scatterplot above has one big issue: it can display only about 30 circles before it runs out of space. But what if we want to compare countries? Even on a rectangular map with a Mercator projection, we would need some kind of zoom. To tackle this problem, we chose a solution that is more difficult but also more interesting: map the countries onto a virtual globe and let the user rotate it!
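Rotating a globe amounts to an orthographic projection whose centre moves as the user drags. The sketch below implements the standard orthographic projection formulas in Python; the function name and signature are hypothetical. In d3, the same projection is available as `d3.geoOrthographic()`, with `projection.rotate()` controlling the centre. Whether the project uses exactly that API is an assumption.

```python
import math


def orthographic(lon, lat, center_lon, center_lat, radius=1.0):
    """Project a point (degrees) onto a globe centred at (center_lon, center_lat).

    Returns (x, y, visible). y points up (screen coordinates usually flip it);
    visible is False for points on the far side of the globe.
    """
    lam, phi = math.radians(lon), math.radians(lat)
    lam0, phi0 = math.radians(center_lon), math.radians(center_lat)
    dlam = lam - lam0
    x = radius * math.cos(phi) * math.sin(dlam)
    y = radius * (math.cos(phi0) * math.sin(phi)
                  - math.sin(phi0) * math.cos(phi) * math.cos(dlam))
    # Cosine of the angular distance from the centre; negative means "behind".
    cos_c = (math.sin(phi0) * math.sin(phi)
             + math.cos(phi0) * math.cos(phi) * math.cos(dlam))
    return x, y, cos_c >= 0


# Rotating the globe = changing the centre and re-projecting every point.
print(orthographic(0, 0, 0, 0))          # (0.0, 0.0, True)
print(orthographic(180, 0, 0, 0)[2])     # False (far side of the globe)
```

Skipping points with `visible == False` is what keeps countries on the back of the globe from being drawn through the front.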

The main question we address is:

How do the countries compare with each other in terms of terrorism?

We implemented the following cool features:

Figure: Temporal Distribution (1970 to 2015)

3.4. K-Means

K-Means is the first pattern recognition algorithm we use in our analysis. With K-Means, we can (at least in principle) partition terrorist attacks into groups to see how terrorism is distributed geographically.

The main question we address is:

Do terrorist attacks form geographical groups? Are there visual patterns to be found?
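Here is a minimal K-Means (Lloyd's algorithm) sketch on attack coordinates, written in plain Python. Several choices are assumptions of this sketch, not the project's method:

- Clustering on 3-D unit vectors rather than raw latitude/longitude, so that clusters are not split at the ±180° meridian.
- The function names.
- The fixed random seed.
- The toy coordinates.

```python
import math
import random


def to_xyz(lat, lon):
    """Map degrees to a point on the unit sphere (avoids the +/-180 meridian seam)."""
    phi, lam = math.radians(lat), math.radians(lon)
    return (math.cos(phi) * math.cos(lam), math.cos(phi) * math.sin(lam), math.sin(phi))


def nearest(point, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))


def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's algorithm on tuples of coordinates. Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    labels = []
    for _ in range(iterations):
        # Assignment step: every point joins its nearest centroid.
        labels = [nearest(pt, centroids) for pt in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster where it was
        if new_centroids == centroids:  # converged: assignments will not change
            break
        centroids = new_centroids
    return labels, centroids


# Toy coordinates (lat, lon): three points near Paris, three near Sydney.
coords = [(48.8, 2.3), (48.9, 2.4), (48.7, 2.2), (-33.9, 151.2), (-33.8, 151.1), (-34.0, 151.3)]
labels, centroids = kmeans([to_xyz(lat, lon) for lat, lon in coords], k=2)
print(labels)  # the first three points share one label, the last three the other
```

The centroids lie slightly inside the sphere. Normalising them back to unit length would give a representative point on the globe if one is needed for drawing.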

We implemented the following cool features:

3.5. K-Nearest Neighbors

The second pattern recognition algorithm is k-Nearest Neighbors (kNN), which can be used for both classification and regression. With kNN, we can classify any point on the globe based on its k nearest neighbors.

The main question we address is:

If we knew that a terrorist attack was going to happen at a certain place on the globe, what type of attack would it most likely be?
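Here is a minimal kNN classifier sketch. It finds the k recorded attacks closest to a query point by great-circle (haversine) distance and returns the most common attack type among them. The distance metric, tie-breaking rule, function names, and toy data are assumptions; the project's implementation may differ.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))  # clamp rounding error


def knn_predict(attacks, lat, lon, k):
    """Majority vote among the k attacks nearest to (lat, lon).

    `attacks` is a list of (lat, lon, label) tuples, e.g. label = attack type.
    On equal vote counts, the label met first (i.e. closest) wins.
    """
    nearest = sorted(attacks, key=lambda a: haversine_km(lat, lon, a[0], a[1]))[:k]
    votes = Counter(label for _, _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy attacks (made-up positions).
attacks = [
    (0.0, 0.0, "Bombing/Explosion"),
    (0.1, 0.1, "Bombing/Explosion"),
    (0.2, 0.0, "Armed Assault"),
    (40.0, 40.0, "Armed Assault"),
    (40.1, 40.0, "Armed Assault"),
]
print(knn_predict(attacks, 0.05, 0.05, k=3))  # Bombing/Explosion
```

A brute-force sort is fine for a sketch. For all 156,772 attacks and many query points, a spatial index such as a ball tree would be the usual choice.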

We implemented the following cool features:

Figure: k-Nearest Neighbors (2 to 100)

4. Further Information

Some optional information was left out to keep the webpage small, so you are welcome to continue reading in the Explainer Notebook. If you would like to try things out yourself, clone the repository, download and import the data, and enjoy your analysis.