Submitted Project 2: Fatalities in U.S. Protests

Posted on April 28, 2025 (updated May 5, 2025) by pgattamaneni

I wrapped up Project 2 by combining all analysis into one comprehensive report. It includes high-risk zone clustering, media bias testing, and fatality trend visualizations. The report concludes with actionable policy insights on law enforcement preparation, protest safety strategies, and the need for unbiased media representation.

Analysis on the fatalities in ACLED India Dataset

Posted on April 8, 2025 by pgattamaneni

The plots above give a basic description of fatalities in the ACLED India dataset.

1. Fatalities Prediction (Regression)
Key Metric: MAE = 0.06
What This Means:
– On average, the model's predictions are off by 0.06 fatalities per event
– For comparison, the data's original fatality statistics:
– Mean = 0.048
– 75% of events have 0 fatalities
– Critical Insight:
The model is slightly worse than simply predicting 0 for all events. Because fatality counts are never negative, that all-zero baseline has an MAE equal to the mean, 0.048 (about 0.05). This suggests:
– The model struggles to predict rare high-fatality events
– Most predictions cluster near 0 fatalities
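
The baseline comparison above can be checked with a few lines of Python. This is a minimal sketch on a made-up sample, not the ACLED data; the helper name `mean_absolute_error` and the toy counts are assumptions for illustration only.

```python
from statistics import mean

def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

# Hypothetical toy sample: 96 events with no fatalities, 4 with some.
fatalities = [0] * 96 + [1, 1, 2, 4]

# All-zero baseline: predict 0 fatalities for every event.
baseline_mae = mean_absolute_error(fatalities, [0] * len(fatalities))

# For non-negative counts, the all-zero MAE equals the mean of the data.
data_mean = mean(fatalities)

print(f"baseline MAE = {baseline_mae:.3f}, mean = {data_mean:.3f}")
```

Any model whose MAE is above this number is doing no better, on this metric, than always guessing zero.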

Basic questions and findings on the ACLED USA dataset

Posted on April 8, 2025 (updated May 1, 2025) by pgattamaneni

Issue 1: Do violent protests occur more frequently in summer months?

Goal: To find out if violent protests—like riots, battles, or civilian attacks—happen more often during the summer season.
Test Used: Chi-Square Test of Independence
If violence tends to spike in summer, it can help cities prepare better during these months with more resources, monitoring, and crowd control measures.
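
As a rough illustration of the test named above, here is a minimal sketch of a Chi-Square Test of Independence on a 2x2 table (summer vs. rest of year, violent vs. non-violent). The counts are invented, the function name `chi_square_2x2` is hypothetical, and the post does not say how the actual contingency table was laid out, so treat the table shape as an assumption.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (chi2, p_value). No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if season and violence were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected

    # With 1 degree of freedom the chi-square tail has a closed form.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts, NOT taken from the ACLED data:
# rows = summer / rest of year, columns = violent / non-violent events.
counts = [[30, 70],
          [20, 180]]
chi2, p_value = chi_square_2x2(counts)
print(f"chi2 = {chi2:.2f}, p = {p_value:.6f}")
```

In practice a library routine such as `scipy.stats.chi2_contingency` would normally be used instead; note that it applies Yates' continuity correction to 2x2 tables by default, so its statistic can differ slightly from this uncorrected version.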

Issue 2: Are protests in cities more likely to involve fatalities than in rural areas?

Goal: To see if protests held in cities are more likely to result in deaths compared to those in non-urban or rural areas.
Test Used: Chi-Square Test of Independence (urban vs. rural against fatal vs. non-fatal)
Understanding whether city-based protests pose higher risks helps local authorities and emergency services plan ahead and deploy preventive measures.

Issue 3: Which U.S. regions show clustering of high fatality events?

Goal: To identify specific geographic zones where fatal protest events are concentrated.
Method Used: KMeans Clustering (Latitude and Longitude)
This helps map out hotspots where deadly protests happen more often, so local governments or NGOs can focus safety efforts and outreach there.
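
The clustering idea can be sketched as follows. This is a bare-bones version of the K-Means (Lloyd's) algorithm written with the standard library only; the post does not say which implementation was used, and the coordinates below are made-up points near two cities rather than real event locations.

```python
import math
import random

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm on (latitude, longitude) pairs.

    Uses straight-line distance in degrees, which is only a rough
    approximation of distance on the Earth's surface.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels

# Hypothetical coordinates: three points near Los Angeles, three near New York.
points = [
    (34.05, -118.24), (34.10, -118.30), (33.95, -118.20),
    (40.71, -74.00), (40.75, -73.98), (40.68, -74.05),
]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Each centroid is a candidate hotspot centre. The number of clusters `k` has to be chosen up front, and a library implementation such as scikit-learn's `KMeans` would be the more usual choice for a full dataset.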

Results and Output Plots:

2. Findings and Discussion

Violence Peaks in Summer Months

The first Chi-Square test checked whether violent protests were more likely in the summer months. With a Chi² value of 49.96 and a p-value reported as 0.000000 (that is, p < 0.000001), the result was statistically significant. The heatmap shows a spike in violent events during June, July, and August.

Summer may bring larger crowds due to holidays and outdoor activities, increasing protest frequency and tension. Law enforcement and city planning may need to anticipate and prepare for unrest during this period.

Fatality Risk is Not Tied to Urban vs Rural

The second Chi-Square test examined whether fatal events occur more often in cities than in rural areas. With a Chi² of 0.11 and p = 0.745, the result was not statistically significant: the data show no evidence that fatalities depend on whether the location is urban or rural.

This goes against the common assumption that cities are more dangerous. It implies that rural protests may be just as volatile, and safety measures must be equally distributed.

High Fatality Clusters Found in Specific U.S. Regions

KMeans clustering was used to detect regional hotspots of deadly protests. The scatter plot shows 4 clusters, with some clearly centered in regions like Southern California, Texas, and the Northeast. These hotspots help identify zones where civil unrest is consistently deadly.

Targeted policy, rapid-response teams, or awareness campaigns could reduce the risk in these zones. It allows for smarter, data-driven deployment of resources instead of blanket policies.

Worked on model evaluation and media influence

Posted on April 2, 2025 (updated May 5, 2025) by pgattamaneni

I looked at which media outlets reported on fatalities. According to a chi-square test, Fox News, CNN, and AP appeared regularly as sources in reports involving fatalities. This analysis gave my thesis a new perspective by showing how media coverage might distort the public’s perception of protest violence.

ACLED US Data And K Means Clustering

Posted on March 25, 2025 by pgattamaneni

Does the K-Means clustering algorithm work well on datasets that are mainly categorical (neither numerical nor geographic)?
Can K-Means be used to assign unseen data points to clusters?
How well does K-Means handle outliers in the data?

Questions while trying to process tests on ACLED-India dataset

Posted on March 25, 2025 (updated April 8, 2025) by pgattamaneni

What is the distribution of disorder types across the events?
How many events occurred in each location mentioned in the dataset?
What is the breakdown of event types and sub-event types?
Are there any patterns in the geographical distribution of events (using latitude and longitude)?
What is the most common actor type involved in these events?
Is there a correlation between the type of event and the crowd size (where reported)?
How do the events differ in terms of geo_precision, and what might this indicate about data reliability?
What is the distribution of events across different source scales (National vs. Subnational)?
Are there any trends in the fatalities reported across different event types?
How do the associated actors vary across different types of protests or demonstrations?
What insights can be drawn from the time_precision column in relation to the events reported?
Is there any correlation between the location of events and the type of source reporting them?

1. Dataset Overview
– Shape: 65,535 events (rows) with 31 features (columns)
– First 5 rows: Shows protest/rally events with 0 fatalities from December 2024
– Key Insight: Early entries suggest many non-violent protest events

2. Fatalities Analysis
Basic Statistics:
– Mean: 0.048 fatalities/event
– Median & Mode: 0 fatalities
– Range: 0-35 fatalities
– Std Dev: 0.386 (small in absolute terms, but about eight times the mean)
– Skewness: 29.14 (extreme right skew)
– Kurtosis: 1767 (extreme peakedness with heavy tail)

Interpretation:
– 75% of events have 0 fatalities (Q3=0)
– 95%+ events likely have ≤1 fatality
– Extreme outliers exist (max=35 deaths)
– Distribution is non-normal (confirmed by Shapiro-Wilk, p < 0.001)

Outliers:
– 2,183 events (3.3%) fall outside the normal range; because Q3 = 0, every event with at least one fatality is flagged
– Outliers range 1-35 fatalities (mean=1.45)
– Indicates rare but severe violent incidents
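
The post does not name the outlier rule it used, but the figures above are consistent with the common 1.5 × IQR rule, so the sketch below assumes that rule. It shows why, on zero-inflated counts, every non-zero event ends up flagged. The sample values and the helper name `iqr_outliers` are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values):
    """Return (lower_fence, upper_fence, outliers) using the 1.5 * IQR rule."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

# Hypothetical zero-inflated sample: 96 events with 0 fatalities, 4 with some.
fatalities = [0] * 96 + [1, 1, 2, 35]
lower, upper, outliers = iqr_outliers(fatalities)

# Q1 and Q3 are both 0, so both fences collapse to 0 and any
# non-zero count is reported as an outlier.
print(lower, upper, outliers)
```

When more than 75% of the values are zero, this rule separates "any fatalities" from "no fatalities" rather than picking out unusually deadly events, which is worth keeping in mind when reading the outlier count.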

3. Event-Type Analysis
By Event Type:
1. Battles: Most deadly (mean=0.93/event)
2. Violence vs Civilians: Second deadliest (mean=0.46)
3. Riots: Most frequent violent event (6,818 cases) but low lethality

By Sub-Event Type:
1. Armed Clashes: 1,173 fatalities
2. Attacks: 1,036 fatalities
3. Mob Violence: 700 fatalities

Key Insight: Organized violence (battles/attacks) deadlier than spontaneous violence

4. Temporal Patterns
– Yearly Analysis: Data shows 2024 entries only (partial year data)
– Monthly Analysis: Time series plot (not shown) would require full-year data

5. Spatial Patterns
– Top Locations: Plot shows specific hotspots (exact locations not listed)
– Geo Analysis: Latitude/longitude data available for mapping clusters

6. Correlation Analysis
– Matrix shows relationships between fatalities and:
– Year (temporal correlation)
– Latitude/Longitude (spatial patterns)
– Exact correlations not shown but methodology correct

7. Data Quality Notes
– No missing values in fatalities column
– High precision: 0-mortality events well-documented
– Source Scale: Mix of national/subnational sources

Key Conclusions
1. Conflict Nature:
– Mostly non-lethal protests (51,409 protest events)
– Occasional high-casualty outbreaks

2. Violence Profile:
– Battles → Highest per-event lethality
– Riots → Most frequent violence type
– Sexual violence exists but is rare (20 fatalities)

3. Data Characteristics:
– Zero-inflated distribution
– Requires non-parametric statistical methods
– Outliers represent critical security events

4. Research Implications:
– Focus on armed clashes for casualty prevention
– Protest management appears effective (low fatalities)
– Spatial analysis needed for hotspot identification

Statistical tests and clustering analysis

Posted on March 25, 2025 (updated May 5, 2025) by pgattamaneni

This week I used the Kruskal-Wallis and Chi-square tests to see whether the number of fatalities differed by protest type or location. To pinpoint high-risk areas, I also started clustering protests by latitude and longitude. I found that violent event types, such as “Violence against Civilians,” had a much higher death toll, and that Texas and California were statistical outliers with exceptionally high death tolls.
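
For readers unfamiliar with the Kruskal-Wallis test, here is a minimal sketch of the H statistic with the usual tie correction, which matters for data with many zeros. The group values are invented and the function name `kruskal_wallis_h` is hypothetical; a library call such as `scipy.stats.kruskal` would normally be used for real analysis.

```python
import math
from collections import Counter

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard correction for ties.

    Raises ZeroDivisionError if every pooled value is identical.
    """
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    counts = Counter(pooled)

    # Average rank for each distinct value (tied values share the mean rank).
    rank_of = {}
    position = 0
    for value, count in sorted(counts.items()):
        rank_of[value] = position + (count + 1) / 2
        position += count

    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)

    ties = sum(c ** 3 - c for c in counts.values())
    return h / (1 - ties / (n ** 3 - n))

# Hypothetical fatality counts for three event types (not ACLED data).
protests = [0, 0, 0, 0, 1]
riots = [0, 0, 1, 0, 2]
violence_against_civilians = [1, 2, 0, 3, 5]

h = kruskal_wallis_h(protests, riots, violence_against_civilians)

# With three groups there are 2 degrees of freedom, where the chi-square
# tail is exp(-h / 2). This is a large-sample approximation.
p_value = math.exp(-h / 2)
print(f"H = {h:.3f}, approximate p = {p_value:.3f}")
```

The test compares ranks rather than raw values, which is why it suits the heavily skewed fatality counts better than a one-way ANOVA.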

Police Shooting Dataset Project Issues

Posted on March 18, 2025 by pgattamaneni

In police shootings, what are the age cumulative distribution functions (or CDFs) for various racial groups?
Are there statistically significant differences between these distributions, and how do they compare?
Is it possible to measure the impact of age differences using Cohen’s d?
Are the ages of those who were fleeing and those who were not significantly different?
Does the chance of being shot while fleeing vary by race?
Which statistical tests—such as Monte Carlo methods or t-tests—confirm or disprove these trends?
Does the percentage of shootings involving unarmed people drop noticeably when body camera footage is used?
What ethnic differences exist in body camera use?
How do police shootings by state relate to the use of body cameras?
What proportion of people who were shot by police had weapons as opposed to none?
Does the victim’s race have a statistically significant impact on whether they were armed?
Does body camera use differ depending on whether a weapon was present?
Which cities or states have the greatest per capita rates of police shootings?
Are there trends of greater racial disparity in shootings in some states?
Can high-risk areas where police encounters lead to more fatalities be identified using clustering methods?

Started Project 2 with ACLED Dataset on U.S. Protests

Posted on March 18, 2025 (updated May 5, 2025) by pgattamaneni

I began looking into the ACLED dataset for Project 2 after finishing Project 1. After cleaning the dataset, I produced derived variables such as violence_level and fatal_event. I started classifying fatal versus non-fatal protests in different U.S. states and protest categories using this new framework.
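
A rough sketch of how such derived variables might be built is shown below. The post does not give the actual definitions, so both rules here are assumptions: `fatal_event` is taken to mean "at least one fatality", and the `VIOLENCE_LEVEL` mapping is a hypothetical grouping of ACLED event types.

```python
# Hypothetical mapping from ACLED event types to a coarse violence level;
# the post does not say how violence_level was actually defined.
VIOLENCE_LEVEL = {
    "Protests": "low",
    "Riots": "medium",
    "Violence against civilians": "high",
    "Battles": "high",
}

def add_derived_fields(event):
    """Return a copy of an event record with the two derived variables."""
    enriched = dict(event)
    # Assumed rule: an event is fatal if it has at least one fatality.
    enriched["fatal_event"] = event["fatalities"] > 0
    enriched["violence_level"] = VIOLENCE_LEVEL.get(event["event_type"], "unknown")
    return enriched

# Made-up records using the ACLED column names event_type and fatalities.
events = [
    {"event_type": "Protests", "fatalities": 0},
    {"event_type": "Riots", "fatalities": 2},
]
enriched_events = [add_derived_fields(e) for e in events]
print(enriched_events)
```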

Submitted Project 1 on Police Shootings

Posted on February 25, 2025 (updated May 5, 2025) by pgattamaneni

The completed report was turned in. It featured visualizations such as clustering maps, cumulative plots, and bar charts. Monte Carlo simulations were also used to confirm the results on age differences. In the discussion section, I offered policy-level proposals after coming to the conclusion that there is a notable racial and age bias in the way police shootings take place throughout the United States.