Global Terrorism Database: Interactive Analysis

As the final project assignment for the course Social Data (2017) at DTU

Project maintained by polakowo. This project is released under the terms of the MIT license.

The table of contents follows. Click on a header to jump to the corresponding section.

1. Introduction
2. Dataset

2.1. Columns

3. Charts

3.1. Histogram
3.2. Scatterplot
3.3. Choropleth
3.4. K-Means
3.5. K-Nearest Neighbors

4. Further Information

1. Introduction

The scope of this project is to drill down into terrorist events around the world from 1970 through 2015.

The primary objectives are:

to identify and highlight the geographical and temporal patterns of terrorism,
to discover the main parameters of a successful terrorist attack, and
to allow the user to customize the analysis and to explore the data in the most interactive way.

The idea behind the project is to find out how terrorism has developed in the Western world and whether we need to build tall walls to protect ourselves against future threats. We chose a more globally oriented topic because:

It enables aggregation on many geographical levels, including the globe, regions, countries, states, and cities
It is very diversified and encapsulates many interesting attributes
It has both temporal and geographical data

You can find more general information about the project in the video below.

SocialData2017 from polakowo on Vimeo.

2. Dataset

The dataset is very comprehensive and contains a lot of terrorism-related information. We downloaded the entire Global Terrorism Database, available from the GTD homepage. It contains 156,772 terrorist attacks × 137 features and takes up 142.3 MB of disk space. It is worth mentioning that the dataset is almost completely encoded (strings and long numbers are mapped to short numeric codes). To decode it, we consulted the GTD codebook. After exploring the codebook, we found some columns to be redundant or irrelevant, and removed them. See the corresponding notebook Cleaning Data for further details on our approach.

We ended up working with 23 columns, which contain the quantitative and qualitative information of main interest. After the decoding, cleaning, filtering, and encoding steps, we obtained 156,772 rows × 23 columns, taking up 26.8 MB of disk space.

You can download the cleaned dataset from this link to try it out yourself!

2.1. Columns

Below you will find basic information on the columns used in the charts.

The first table contains numeric data:

Column name	Type	Min	Max	NaN	Description
`year`	int	1970	2015	None	Year
`nkilled`	int	0	1500	None	Total Number of Fatalities
`nkilledter`	int	0	500	None	Number of Perpetrator Fatalities
`nwounded`	int	0	5500	None	Total Number of Injured
`nwoundedter`	int	0	200	None	Number of Perpetrators Injured
`lat`	float	-53.1546	74.6336	None	Latitude (of city)
`lon`	float	-176.176	179.367	None	Longitude (of city)

The second table contains categorical data:

Column name	Unique	Top	NaN	Description
`region`	12	Middle East & North Africa	None	Region
`country`	204	Iraq	None	Country
`weapontype`	12	Explosives/Bombs/Dynamite	7.59%	Weapon Type
`attacktype`	9	Bombing/Explosion	3.38%	Attack Type
`targettype`	22	Private Citizens & Property	2.46%	Target/Victim Type
`gname`	3216	Unknown	46.41%	Perpetrator Group Name

Hint: Every categorical feature is encoded with integers to save space. We therefore introduce a global JSON dictionary, strings.json, which maps all integers to their corresponding strings. Decoding takes place after the data has been loaded into the front end, so none of the charts need to handle it.
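A minimal sketch of this integer encoding, written in plain Python rather than the project's actual notebook code. The function name `encode_column` and the layout of the resulting map (column name → {code: string}) are hypothetical; the real strings.json may be organized differently.

```python
import json


def encode_column(values):
    """Replace each string with a small integer code.

    Returns the encoded list and a {code: string} lookup table.
    Missing values (None) are kept as None so they stay NaN-like.
    """
    codes = {}  # string -> integer code, assigned in order of first appearance
    encoded = []
    for value in values:
        if value is None:
            encoded.append(None)
            continue
        if value not in codes:
            codes[value] = len(codes)
        encoded.append(codes[value])
    # Invert the table so the front end can decode code -> string.
    lookup = {code: string for string, code in codes.items()}
    return encoded, lookup


# Hypothetical sample column; the real dataset has 156,772 rows.
regions = ["Middle East & North Africa", "Western Europe", "Middle East & North Africa", None]
encoded, region_lookup = encode_column(regions)

# Global map in the spirit of strings.json: column name -> {code: string}.
# json.dumps turns the integer keys into strings, as JSON requires.
strings_json = json.dumps({"region": region_lookup}, indent=2)
print(encoded)  # [0, 1, 0, None]
```

Because JSON object keys are always strings, the front end has to look codes up by their string form (e.g., `"1"` rather than `1`).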

You may have noticed that the tables above list fewer than 23 columns. We limited the amount of information so that the data can be delivered quickly and the charts stay responsive (less laggy). Below are the optional columns we skipped.

Column name	Description
`state`	Province / Administrative Region / State
`city`	City
`extended`	Extended Incident?
`multiple`	Part of Multiple Incident?
`success`	Successful Attack?
`suicide`	Suicide Attack?
`nter`	Number of Perpetrators
`claimed`	Claim of Responsibility?
`property`	Property Damage
`propertyextent`	Extent of Property Damage

For more information on those columns, see the Explainer Notebook, which contains additional charts (built with Python and Plotly) covering them as well.

3. Charts

Charts are vital in the presentation of data. They are used in both exploratory and descriptive analysis. As most aggregations are time-intensive, we move them to the back end. We perform every major task in two steps:

Use IPython to process the data, e.g., apply filters, perform aggregations, etc.
On the client side, use d3.js, a library for generating and manipulating web documents based on data, to build interactive data visualizations.
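To illustrate the first (back-end) step, here is a minimal sketch in plain Python of a group-and-sum aggregation whose output the front end could load as JSON. The function name `aggregate`, the record layout, and the sample codes are hypothetical; the project's own notebooks may organize this differently (for example with pandas).

```python
import json
from collections import defaultdict


def aggregate(rows, group_keys, metric):
    """Sum `metric` for every combination of `group_keys` values.

    rows: one dict per attack, e.g. {"year": 1988, "region": 8, "nkilled": 2}.
    Returns a list of flat records, sorted by the group values.
    """
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[k] for k in group_keys)
        totals[key] += row[metric]
    return [
        {**dict(zip(group_keys, key)), metric: total}
        for key, total in sorted(totals.items())
    ]


# Hypothetical sample rows; region code 8 is made up for illustration.
rows = [
    {"year": 1988, "region": 8, "nkilled": 2},
    {"year": 1988, "region": 8, "nkilled": 5},
    {"year": 1989, "region": 8, "nkilled": 1},
]
records = aggregate(rows, ["region", "year"], "nkilled")
payload = json.dumps(records)  # written to a file or served to the front end
```

Flat records like these can be bound directly to SVG elements with d3's data join, so the browser only draws and never has to scan all attacks.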

3.1. Histogram

This histogram explores temporal patterns of terrorism from 1970 through 2015.

The main question we address is:

How has terrorism developed over time from the perspective of geographical units, types, or terrorist groups?

We are interested in the temporal development of terrorism, which touches on many interesting attributes:

Geographical units in form of regions, countries, states and cities
Types, such as weapon, attack and target types
Terrorist groups

In principle, we can aggregate on every categorical column, which is what we did.

We implemented the following cool features:

Choose the main category (e.g., Region)
Choose an item of the selected category (e.g., Western Europe)
Choose a metric (e.g., Killed)
Hover over a bar to display the year's share in % (e.g., 7% of all victims in Western Europe were killed in 1988).
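The hover value is each year's fraction of the selected item's total for the chosen metric. A minimal sketch, assuming the per-year totals have already been aggregated; the function name `yearly_shares` and the numbers are hypothetical, not taken from the GTD.

```python
def yearly_shares(totals_by_year):
    """Convert per-year totals into each year's share of the overall total, in %.

    totals_by_year: {year: value} for one selected item (e.g. Western Europe)
    and one metric (e.g. Killed).
    """
    grand_total = sum(totals_by_year.values())
    if grand_total == 0:
        # Avoid division by zero when the item has no events for this metric.
        return {year: 0.0 for year in totals_by_year}
    return {
        year: 100.0 * value / grand_total
        for year, value in totals_by_year.items()
    }


# Hypothetical numbers, not taken from the GTD.
killed = {1987: 30, 1988: 50, 1989: 20}
shares = yearly_shares(killed)
print(round(shares[1988]))  # 50
```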

3.2. Scatterplot

The second chart is a scatterplot, which encodes three numeric attributes at a time:

X: Killed,
Y: Wounded, and
Size: Count.

We use the scatterplot to compare the values of a categorical attribute (such as Weapon Type) by numeric metrics. Because these values compare differently across regions and even countries, we added the ability to aggregate by geographical unit.

The main question we address is:

What are the most lethal weapon types, the most effective attack types, and the most vulnerable target types in the selected geographical unit?

We implemented the following cool features:

Choose the level of aggregation (e.g., Country)
Choose an item on the selected level (e.g., Germany)
Choose the categorical variable (e.g., Target Type)
Choose the role (either Victims or Terrorists)
Switch between Relative (average per attack) and Absolute (sum) metrics
Hover over a circle to display the label and the number of attacks.
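The difference between the Relative and Absolute modes can be sketched as follows: both sum the metrics per category value, and Relative additionally divides by the number of attacks, which also drives the circle size. This is a minimal sketch in plain Python; the function name `scatter_points` and the role-to-column mapping are assumptions, and the category values are shown as strings for readability although the dataset stores them as integer codes.

```python
from collections import defaultdict

# Assumed mapping from the selected role to the numeric columns listed above.
ROLE_COLUMNS = {
    "victims": ("nkilled", "nwounded"),
    "terrorists": ("nkilledter", "nwoundedter"),
}


def scatter_points(attacks, category, role="victims", relative=False):
    """Build one scatterplot point per category value (e.g. per weapon type).

    Absolute mode sums the metrics; relative mode divides by the attack count.
    The count itself is the circle size in both modes.
    """
    killed_col, wounded_col = ROLE_COLUMNS[role]
    sums = defaultdict(lambda: {"killed": 0, "wounded": 0, "count": 0})
    for attack in attacks:
        acc = sums[attack[category]]
        acc["killed"] += attack[killed_col]
        acc["wounded"] += attack[wounded_col]
        acc["count"] += 1
    points = {}
    for label, acc in sums.items():
        divisor = acc["count"] if relative else 1
        points[label] = {
            "x": acc["killed"] / divisor,
            "y": acc["wounded"] / divisor,
            "size": acc["count"],
        }
    return points


# Hypothetical sample attacks.
attacks = [
    {"weapontype": "Firearms", "nkilled": 2, "nwounded": 1, "nkilledter": 0, "nwoundedter": 0},
    {"weapontype": "Firearms", "nkilled": 4, "nwounded": 3, "nkilledter": 1, "nwoundedter": 0},
    {"weapontype": "Explosives/Bombs/Dynamite", "nkilled": 10, "nwounded": 20, "nkilledter": 1, "nwoundedter": 0},
]
absolute = scatter_points(attacks, "weapontype")
relative = scatter_points(attacks, "weapontype", relative=True)
```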


3.3. Choropleth

The scatterplot above has one big limitation: it can display only about 30 circles before running out of space. But what if we want to compare countries? Even on a rectangular map with a Mercator projection, we would need some kind of zoom. To tackle this problem, we chose a more difficult but also more interesting solution: map the countries onto a virtual globe and let the user rotate it!

The main question we address is:

How do the countries compare with each other in terms of terrorism?

We implemented the following cool features:

Choose the category (e.g., Terrorist Group)
Choose an item of the selected category (e.g., Taliban)
Choose a metric (e.g., Killed)
Choose the year on the slider, or move the slider slowly to animate the change in global terrorism
Click on Time Machine to auto-play the animation from 1970 through 2015
Drag the globe to rotate it, or click on a country to jump to it
Hover over a country to display its name and the metric.
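A rotating globe is commonly drawn with an orthographic projection (d3 provides one as `d3.geoOrthographic`); whether the project uses exactly this is an assumption. The sketch below shows the underlying math in plain Python: dragging changes the view center, clicking a country would set the center to that country, and points on the far hemisphere are hidden. The function name `orthographic` is hypothetical.

```python
import math


def orthographic(lon, lat, center_lon, center_lat, radius=1.0):
    """Project (lon, lat) onto a globe whose view is centered on (center_lon, center_lat).

    Angles are in degrees. Returns (x, y) in the projection plane, or None
    if the point lies on the far side of the globe and should be hidden.
    """
    lam, phi = math.radians(lon), math.radians(lat)
    lam0, phi0 = math.radians(center_lon), math.radians(center_lat)
    # Cosine of the angular distance from the view center; negative means far side.
    cos_c = (math.sin(phi0) * math.sin(phi)
             + math.cos(phi0) * math.cos(phi) * math.cos(lam - lam0))
    if cos_c < 0:
        return None
    x = radius * math.cos(phi) * math.sin(lam - lam0)
    y = radius * (math.cos(phi0) * math.sin(phi)
                  - math.sin(phi0) * math.cos(phi) * math.cos(lam - lam0))
    return x, y


# The view center always lands in the middle of the disc.
center = orthographic(10, 20, center_lon=10, center_lat=20)
```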


3.4. K-Means

K-Means is the first pattern recognition algorithm we use for the analysis. With K-Means, we attempt to partition terrorist attacks into groups to see how terrorism is distributed geographically.

The main question we address is:

Do terrorist attacks form geographical groups? Are there visual patterns to be found?

We implemented the following cool features:

Choose the category (e.g., Terrorist Group)
Choose an item of the selected category (e.g., Taliban)
Choose a metric (e.g., Killed)
Choose K, the number of groups (e.g., 3)
Hover over a hexbin/centroid to display the number of enclosed attacks
Each big circle represents a centroid, the center of a partition of attacks
Hexagons of the same color belong to the same partition.
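The partitioning can be sketched with Lloyd's algorithm, the standard way to fit K-Means. This is a minimal pure-Python sketch, assuming attacks are given as (lon, lat) pairs and compared with plain Euclidean distance on degrees; the project's actual initialization, distance measure, and any weighting by the chosen metric are not specified here, and the function name `kmeans` is hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm on (lon, lat) pairs; returns (centroids, labels).

    Euclidean distance on degrees is only a rough approximation of distance
    on the globe, but it is enough to show the idea.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every attack to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return centroids, labels


# Hypothetical attacks forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (50, 50), (50, 51), (51, 50)]
centroids, labels = kmeans(points, k=2)
```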

3.5. K-Nearest Neighbors

The second pattern recognition algorithm is k-Nearest Neighbors (kNN), which can be used for both classification and regression. Using kNN, we can classify any point on the globe based on its k nearest neighbors.

The main question we address is:

If we knew that a terrorist attack was going to happen at a given location on the globe, what type would it likely be?

We implemented the following cool features:

Choose the type (e.g., Weapon Type)
Choose the number of neighbors (k) on the slider (e.g., 3)
You can also adjust k by scrolling up and down
Hover over the map to classify the underlying point and display the related neighbors
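The classification on hover can be sketched as a majority vote among the k attacks closest to the cursor position. This is a minimal pure-Python sketch; the use of great-circle (haversine) distance and the names `haversine_km` and `knn_classify` are assumptions, since the text does not state how distances are measured.

```python
import math
from collections import Counter


def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def knn_classify(point, attacks, k):
    """Predict the most likely type at `point` by majority vote of its k nearest attacks.

    attacks: list of ((lon, lat), label) pairs, e.g. label = weapon type.
    Returns (predicted_label, neighbors) so the neighbors can be highlighted.
    """
    neighbors = sorted(attacks, key=lambda a: haversine_km(point, a[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    # On a tie, most_common keeps first-encountered order, i.e. the label
    # whose first occurrence among the distance-sorted neighbors is closest.
    predicted, _ = votes.most_common(1)[0]
    return predicted, neighbors


# Hypothetical attacks.
attacks = [
    ((0, 0), "Firearms"),
    ((0, 1), "Firearms"),
    ((1, 0), "Explosives"),
    ((100, 50), "Explosives"),
]
label, neighbors = knn_classify((0.1, 0.1), attacks, k=3)
```

Sorting every attack on each hover is fine for a sketch; with the full dataset, a spatial index would keep the lookup fast.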


4. Further Information

Some optional information was omitted to keep the webpage small, so you are welcome to continue reading in the Explainer Notebook. If you would like to try things out yourself, clone the repository, download and import the data, and enjoy your analysis.
