Analysis of fatalities in the ACLED India dataset

 

The plots above give a basic description of the fatalities in the ACLED India dataset.

 

1. Fatalities Prediction (Regression)
Key Metric: MAE = 0.06
What This Means:
– On average, the model’s predictions are off by **0.06 fatalities per event**
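– For reference, the fatality statistics of the original data:
– Mean = 0.048
– At least 75% of events have 0 fatalities
– Critical Insight:
The model is slightly worse than simply predicting 0 for all events. The MAE of that baseline equals the mean fatality count (0.048, or about 0.05), which is below the model MAE of 0.06. This suggests: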
– The model struggles to predict rare high-fatality events
– Most predictions cluster near 0 fatalities
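The baseline comparison above can be reproduced on toy data. The sketch below is only an illustration: the fatality counts and the constant "model" predictions are made up, not taken from the actual model. It shows why an always-zero predictor is hard to beat on MAE when most targets are 0.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy fatality counts (illustrative only): mostly zeros, a few deadly events.
y_true = [0] * 95 + [1, 1, 1, 2, 5]

# Hypothetical model output: a small positive prediction for every event.
y_model = [0.06] * len(y_true)

# Baseline: always predict zero fatalities.
y_zero = [0] * len(y_true)

mae_model = mean_absolute_error(y_true, y_model)
mae_zero = mean_absolute_error(y_true, y_zero)

print(f"model MAE:    {mae_model:.3f}")
print(f"baseline MAE: {mae_zero:.3f}")
```

For non-negative targets, the MAE of the zero baseline is exactly the mean of the targets, which is why the dataset mean is the number to beat.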

 

Questions that came up while running tests on the ACLED India dataset

  1. What is the distribution of disorder types across the events?

  2. How many events occurred in each location mentioned in the dataset?

  3. What is the breakdown of event types and sub-event types?

  4. Are there any patterns in the geographical distribution of events (using latitude and longitude)?

  5. What is the most common actor type involved in these events?

  6. Is there a correlation between the type of event and the crowd size (where reported)?

  7. How do the events differ in terms of geo_precision, and what might this indicate about data reliability?

  8. What is the distribution of events across different source scales (National vs. Subnational)?

  9. Are there any trends in the fatalities reported across different event types?

  10. How do the associated actors vary across different types of protests or demonstrations?

  11. What insights can be drawn from the time_precision column in relation to the events reported?

  12. Is there any correlation between the location of events and the type of source reporting them?

     

 1. Dataset Overview
– Shape: 65,535 events (rows) with 31 features (columns)
– First 5 rows: protest/rally events from December 2024, all with 0 fatalities
– Key Insight: The first entries suggest that many events are non-violent protests

 2. Fatalities Analysis
Basic Statistics:
– Mean: 0.048 fatalities/event
– Median & Mode: 0 fatalities
– Range: 0-35 fatalities
– Std Dev: 0.386 (small in absolute terms, but about 8 times the mean)
– Skewness: 29.14 (extreme right skew)
– Kurtosis: 1767 (extreme peakedness with heavy tail)
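As a rough illustration of how these statistics behave on zero-heavy counts, the sketch below computes them with population formulas (no bias correction) on a made-up sample. Libraries such as pandas or SciPy may apply corrections, so their numbers can differ slightly; the original analysis does not say which formulas it used.

```python
import statistics

def describe(values):
    """Basic moments of a sample, using population (uncorrected) formulas."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    z = [(v - mean) / std for v in values]  # standardized values
    return {
        "mean": mean,
        "median": statistics.median(values),
        "std": std,
        "skewness": sum(x ** 3 for x in z) / n,
        "excess_kurtosis": sum(x ** 4 for x in z) / n - 3,
    }

# Toy sample (not ACLED data): 97 events without deaths, three with deaths.
toy_fatalities = [0] * 97 + [1, 2, 35]

summary_stats = describe(toy_fatalities)
for name, value in summary_stats.items():
    print(f"{name}: {value:.3f}")
```

A single extreme event is enough to push skewness and kurtosis to very large values while the median stays at 0.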

Interpretation:
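– At least 75% of events have 0 fatalities (Q3=0)
– If the 2,183 outliers listed below are all of the non-zero events, about 96.7% of events have 0 fatalities, so well over 95% have ≤1 fatality
– Extreme outliers exist (max=35 deaths)
– Distribution is non-normal (Shapiro-Wilk p-value reported as 0.000, i.e. p < 0.001)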

Outliers:
– 2,183 events (3.3%) fall outside the normal range; with Q1 = Q3 = 0, an IQR-based rule flags every event with at least one fatality
– Outliers range 1-35 fatalities (mean=1.45)
– Indicates rare but severe violent incidents
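The outlier count above is consistent with an IQR-based rule, although the original analysis does not state which rule was used. Assuming the common 1.5 × IQR fences, a minimal sketch on made-up data looks like this:

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Toy sample (not ACLED data): 97 zero-fatality events and three others.
toy_fatalities = [0] * 97 + [1, 2, 35]

outliers = iqr_outliers(toy_fatalities)
print(f"{len(outliers)} outliers: {outliers}")
```

Because both quartiles are 0, the fences collapse to 0 and every non-zero event is flagged. On data like this, "outlier" effectively means "any event with a fatality".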

 3. Event-Type Analysis
By Event Type:
1. Battles: Most deadly (mean=0.93/event)
2. Violence against civilians: Second deadliest (mean=0.46)
3. Riots: Most frequent violent event (6,818 cases) but low lethality

By Sub-Event Type (total fatalities):
1. Armed clashes: 1,173 fatalities
2. Attacks: 1,036 fatalities
3. Mob violence: 700 fatalities

Key Insight: Organized violence (battles, attacks) is deadlier than spontaneous violence (riots, mob violence)
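The event-type figures above are means per event, while the sub-event figures are totals, and the two can rank categories differently. The sketch below uses made-up (event type, fatalities) pairs to show both side by side; the tuple layout is an assumption for illustration, not the dataset's schema.

```python
from collections import defaultdict

# Made-up events: (event_type, fatalities). Not ACLED data.
events = [
    ("Battles", 2), ("Battles", 0),
    ("Riots", 0), ("Riots", 1), ("Riots", 0), ("Riots", 1), ("Riots", 1),
    ("Protests", 0), ("Protests", 0), ("Protests", 0),
]

totals = defaultdict(int)
counts = defaultdict(int)
for event_type, fatalities in events:
    totals[event_type] += fatalities
    counts[event_type] += 1

summary = {
    etype: {
        "events": counts[etype],
        "total": totals[etype],
        "mean": totals[etype] / counts[etype],
    }
    for etype in counts
}

# Sort by per-event lethality, highest first.
for etype, row in sorted(summary.items(), key=lambda kv: kv[1]["mean"], reverse=True):
    print(f"{etype:10s} events={row['events']} total={row['total']} mean={row['mean']:.2f}")
```

In this toy data, riots cause more deaths in total than battles, but battles are deadlier per event, which is the distinction the lists above rely on.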

4. Temporal Patterns
– Yearly Analysis: The data contains 2024 entries only, and the year is incomplete
– Monthly Analysis: A monthly time series plot (not shown) would need full-year data to be meaningful

 5. Spatial Patterns
– Top Locations: Plot shows specific hotspots (exact locations not listed)
– Geo Analysis: Latitude/longitude data available for mapping clusters

 6. Correlation Analysis
– Matrix shows relationships between fatalities and:
– Year (temporal correlation)
– Latitude/Longitude (spatial patterns)
– Exact correlation values are not shown, but the methodology is correct
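The original does not say which correlation coefficient was used. For heavily skewed counts, a rank-based coefficient is one reasonable option, so the sketch below implements Spearman correlation (Pearson correlation on average ranks) with the standard library. The latitude and fatality values are invented for illustration.

```python
import statistics

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman rank correlation, with ties handled by average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Made-up values (not ACLED data).
latitude = [10, 12, 15, 20, 25, 28, 30, 34]
fatalities = [0, 0, 0, 0, 1, 0, 2, 35]

rho = spearman(latitude, fatalities)
print(f"Spearman rho = {rho:.3f}")
```

With so many tied zeros, any coefficient should be read cautiously; the point is only that ranks keep a single 35-fatality event from dominating the result.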

 7. Data Quality Notes
– No missing values in fatalities column
– High precision: zero-fatality events are well documented
– Source Scale: Mix of national/subnational sources

Key Conclusions
1. Conflict Nature:
– Mostly non-lethal protests (51,409 protest events, about 78% of all events)
– Occasional high-casualty outbreaks

2. Violence Profile:
– Battles → Highest per-event lethality
– Riots → Most frequent violence type
– Sexual violence is present but rare (20 fatalities)

3. Data Characteristics:
– Zero-inflated distribution
– Requires non-parametric statistical methods
– Outliers represent critical security events

4. Research Implications:
– Focus on armed clashes for casualty prevention
– Low protest fatalities are consistent with effective protest management, though the data alone cannot establish the cause
– Spatial analysis needed for hotspot identification
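The zero-inflation claim in the data characteristics above can be sanity-checked from the reported summary numbers alone. The sketch below compares the observed share of zero-fatality events with what a plain Poisson distribution with the same mean would predict, and computes the variance-to-mean ratio. It assumes that the 2,183 reported outliers are exactly the events with at least one fatality, which the reported range (1-35) suggests but does not prove.

```python
import math

# Figures reported in the analysis above.
n_events = 65_535
mean_fatalities = 0.048
std_fatalities = 0.386
n_nonzero = 2_183  # assumption: the reported outliers are all non-zero events

# Share of events with zero fatalities, as observed.
observed_zero_share = 1 - n_nonzero / n_events

# Share of zeros a Poisson distribution with the same mean would produce.
poisson_zero_share = math.exp(-mean_fatalities)

# Variance-to-mean ratio; a Poisson distribution has a ratio of 1.
dispersion_index = std_fatalities ** 2 / mean_fatalities

print(f"observed zero share: {observed_zero_share:.4f}")
print(f"Poisson zero share:  {poisson_zero_share:.4f}")
print(f"dispersion index:    {dispersion_index:.2f}")
```

More zeros than the Poisson model predicts, together with a variance well above the mean, is the usual signature of zero-inflated, overdispersed counts.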

 

Police Shootings Dataset: Project Questions

  • What are the cumulative distribution functions (CDFs) of age for different racial groups in police shootings?
  • How do these distributions compare, and are the differences statistically significant?
  • Can the size of the age differences be measured with Cohen’s d?
  • Do the ages of people who were fleeing differ significantly from the ages of those who were not?
  • Does the chance of being shot while fleeing vary by race?
  • Which statistical tests, such as Monte Carlo methods or t-tests, support or contradict these trends?
  • Is the percentage of shootings involving unarmed people noticeably lower when a body camera is in use?
  • How does body camera use differ across racial groups?
  • How do police shootings by state relate to the use of body cameras?
  • What proportion of people shot by police were armed versus unarmed?
  • Is there a statistically significant association between the victim’s race and whether they were armed?
  • Does body camera use differ depending on whether a weapon was present?
  • Which cities or states have the highest per capita rates of police shootings?
  • Do some states show greater racial disparity in shootings than others?
  • Can clustering methods identify high-risk areas where police encounters lead to more fatalities?
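Several of the questions above name specific tools: empirical CDFs, Cohen's d, and Monte Carlo tests. The sketch below shows one possible way to compute each with the standard library. The two age samples are invented, and the permutation test on the difference in means is just one Monte Carlo approach, not necessarily the one the project will use.

```python
import random
import statistics

def ecdf(sample, x):
    """Empirical CDF: share of the sample that is <= x."""
    return sum(v <= x for v in sample) / len(sample)

def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_sd = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / pooled_sd

def permutation_p_value(a, b, n_resamples=5000, seed=0):
    """Two-sided Monte Carlo p-value for the difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabeling of the two groups
        diff = abs(statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)

# Made-up ages for two hypothetical groups (not real data).
group_a = [22, 25, 27, 30, 31, 33, 36, 40, 45, 52]
group_b = [19, 21, 23, 24, 26, 28, 29, 31, 34, 38]

d = cohens_d(group_a, group_b)
p = permutation_p_value(group_a, group_b)
print(f"share of group A aged <= 30: {ecdf(group_a, 30):.2f}")
print(f"Cohen's d = {d:.2f}, permutation p-value = {p:.3f}")
```

Cohen's d describes how large a difference is, while the p-value describes how surprising it would be under random group labels, so the two answer different questions and are worth reporting together.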

Washington Post – Police shootings data

Some important questions I wanted to explore to better understand the dataset:

  1. What is the age distribution of individuals involved in these incidents?
  2. Are there any notable patterns in urban vs. rural areas?
  3. Are there any patterns in weapon types across different demographic groups?
  4. Is there any correlation between race and the type of weapon involved?
  5. Is there any relationship between body camera usage and specific police departments? What percentage of incidents had body cameras present?
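Questions 3 to 5 above mostly come down to counting and cross-tabulating. A minimal sketch, using made-up records and hypothetical field names (`race`, `weapon`, `body_camera`) that may not match the real columns, could look like this:

```python
from collections import Counter

# Made-up records with hypothetical field names; the real columns may differ.
records = [
    {"race": "group_1", "weapon": "gun", "body_camera": True},
    {"race": "group_1", "weapon": "unarmed", "body_camera": False},
    {"race": "group_2", "weapon": "knife", "body_camera": False},
    {"race": "group_2", "weapon": "gun", "body_camera": False},
    {"race": "group_2", "weapon": "gun", "body_camera": True},
]

# Share of incidents with a body camera present.
camera_share = sum(r["body_camera"] for r in records) / len(records)

# Cross-tabulation of race by weapon type, plus per-race totals.
weapon_by_race = Counter((r["race"], r["weapon"]) for r in records)
race_totals = Counter(r["race"] for r in records)

print(f"incidents with body camera: {camera_share:.0%}")
for (race, weapon), count in sorted(weapon_by_race.items()):
    share = count / race_totals[race]
    print(f"{race:8s} {weapon:8s} {count} ({share:.0%} of group)")
```

Reporting shares within each group, rather than raw counts, keeps groups of different sizes comparable.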

The dataset also raises further questions: whether the number of events follows temporal patterns across seasons or months, whether certain jurisdictions show higher rates of particular incident types, and how body camera usage relates to incident outcomes.