- Does the K-means clustering algorithm work well with datasets that are mainly categorical (neither numerical nor geographic)?
- Can K-means be used to predict cluster membership for unseen data points, as well as to form the clusters?
- How well does K-means handle outliers in the data?
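A minimal NumPy sketch may help frame these three questions. It assumes purely numeric features. `kmeans` and `assign` are hypothetical helper names, and the farthest-first initialization is a simplification chosen here for deterministic output (scikit-learn's `KMeans`, for example, defaults to k-means++). Assigning a new point to its nearest learned centroid is the usual way K-means "predicts" unseen data. Because every centroid is a mean, a single extreme point can drag a centroid toward it or claim a centroid for itself. Categorical columns would first need an encoding such as one-hot, where Euclidean means are hard to interpret; k-modes is a commonly used alternative.

```python
import numpy as np

def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with a farthest-first initialization (assumed here for determinism)."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    while len(centroids) < k:
        # Next seed: the point whose nearest existing centroid is farthest away.
        d2 = ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centroids.append(X[d2.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Update step: each centroid moves to the mean of its points (unchanged if empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, assign(X, centroids)

# Two well-separated synthetic blobs (made-up data for illustration).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
centroids, labels = kmeans(X, k=2)

# "Prediction" for unseen points is nearest-centroid assignment.
new_points = np.array([[0.2, -0.1], [4.8, 5.3]])
print(assign(new_points, centroids))

# Outlier sensitivity: refit after adding one extreme point.
X_out = np.vstack([X, [[60.0, 60.0]]])
centroids_out, _ = kmeans(X_out, k=2)
print(np.round(centroids, 2))
print(np.round(centroids_out, 2))
```

In the second fit the point at (60, 60) becomes a cluster of its own, so both blobs end up sharing a single centroid near their combined mean; this is one concrete way an outlier distorts K-means.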
Questions that came up while running tests on the ACLED-India dataset
- What is the distribution of disorder types across the events?
- How many events occurred in each location mentioned in the dataset?
- What is the breakdown of event types and sub-event types?
- Are there any patterns in the geographical distribution of events (using latitude and longitude)?
- What is the most common actor type involved in these events?
- Is there a correlation between the type of event and the crowd size (where reported)?
- How do the events differ in terms of geo_precision, and what might this indicate about data reliability?
- What is the distribution of events across different source scales (National vs. Subnational)?
- Are there any trends in the fatalities reported across different event types?
- How do the associated actors vary across different types of protests or demonstrations?
- What insights can be drawn from the time_precision column in relation to the events reported?
- Is there any correlation between the location of events and the type of source reporting them?
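Several of these questions reduce to counts and cross-tabulations. The pandas sketch below reuses column names printed in the output log (disorder_type, event_type, sub_event_type, location, source_scale, fatalities, tags), but the rows and the location labels are made up. The crowd-size parsing assumes tags are semicolon-separated `key=value` strings like `crowd size=no report`.

```python
import pandas as pd

# Made-up rows that reuse column names from the ACLED-India output log.
df = pd.DataFrame({
    "disorder_type": ["Demonstrations", "Demonstrations", "Demonstrations", "Political violence"],
    "event_type": ["Protests", "Protests", "Protests", "Riots"],
    "sub_event_type": ["Peaceful protest", "Peaceful protest", "Peaceful protest", "Mob violence"],
    "location": ["Loc A", "Loc B", "Loc B", "Loc C"],
    "source_scale": ["Subnational", "Subnational", "National", "Subnational"],
    "fatalities": [0, 0, 0, 2],
    "tags": ["crowd size=no report", "crowd size=large scale", None, "crowd size=no report"],
})

# Distribution of disorder types, as proportions of all events.
disorder_share = df["disorder_type"].value_counts(normalize=True)

# Number of events per location.
events_per_location = df["location"].value_counts()

# Event types broken down by sub-event type.
breakdown = pd.crosstab(df["event_type"], df["sub_event_type"])

# Fatalities per event type: number of events, total and mean fatalities.
fatal_by_type = df.groupby("event_type")["fatalities"].agg(["count", "sum", "mean"])

# Crowd size pulled out of the tags string; rows without tags stay NaN.
df["crowd_size"] = df["tags"].str.extract(r"crowd size=([^;]+)", expand=False).str.strip()
crowd_by_type = pd.crosstab(df["event_type"], df["crowd_size"])

# Source scale by location.
scale_by_location = pd.crosstab(df["location"], df["source_scale"])

print(disorder_share, breakdown, fatal_by_type, crowd_by_type, sep="\n\n")
```

A contingency table like `crowd_by_type` is a natural starting point for the crowd-size question; a chi-square test of independence on the full table would be one way to test it formally.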
Output log:
Dataset Overview:
event_id_cnty event_date year time_precision disorder_type \
0 IND173617 13-Dec-24 2024 1 Demonstrations
1 IND173619 13-Dec-24 2024 1 Demonstrations
2 IND173620 13-Dec-24 2024 1 Demonstrations
3 IND173643 13-Dec-24 2024 1 Demonstrations
4 IND173670 13-Dec-24 2024 2 Political violence

event_type sub_event_type actor1 \
0 Protests Peaceful protest Protesters (India)
1 Protests Peaceful protest Protesters (India)
2 Protests Peaceful protest Protesters (India)
3 Protests Peaceful protest Protesters (India)
4 Riots Mob violence Rioters (India)

assoc_actor_1 inter1 … \
0 Health Workers (India); Students (India) Protesters …
1 General Caste Group (India); Hindu Group (Indi… Protesters …
2 Muslim Group (India) Protesters …
3 Unknown Protesters …
4 Labor Group (India) Rioters …

location latitude longitude geo_precision source \
0 Baramulla 34.2090 74.3429 1 Daily Excelsior
1 Jammu 32.7357 74.8691 1 Northlines
2 Srinagar 34.0857 74.8056 3 Kashmir News Service
3 Suktabari 26.3011 89.4058 1 Millennium Post (India)
4 Katahkuchi 26.4846 91.4778 1 Pratidin Time

source_scale notes fatalities \
0 Subnational On 13 December 2024, medical students held a p… 0
1 Subnational On 13 December 2024, locals and religious outf… 0
2 Subnational On 13 December 2024, people (likely from the M… 0
3 National On 13 December 2024, locals held a protest mar… 0
4 Subnational Around 13 December 2024 (as reported), locals … 0

tags timestamp
0 crowd size=no report 1734479873
1 crowd size=large scale 1734479873
2 crowd size=no report 1734479873
3 crowd size=no report 1734479873
4 crowd size=no report 1734479874

[5 rows x 31 columns]
Null values in the dataset:
event_id_cnty 0
event_date 0
year 0
time_precision 0
disorder_type 0
event_type 0
sub_event_type 0
actor1 0
assoc_actor_1 0
inter1 0
actor2 0
assoc_actor_2 0
inter2 50097
interaction 0
civilian_targeting 0
iso 0
region 0
country 0
admin1 0
admin2 0
admin3 0
location 0
latitude 0
longitude 0
geo_precision 0
source 0
source_scale 0
notes 0
fatalities 0
tags 6903
timestamp 0
dtype: int64
After handling missing values:
event_id_cnty 0
event_date 0
year 0
time_precision 0
disorder_type 0
event_type 0
sub_event_type 0
actor1 0
assoc_actor_1 0
inter1 0
actor2 0
assoc_actor_2 0
inter2 50097
interaction 0
civilian_targeting 0
iso 0
region 0
country 0
admin1 0
admin2 0
admin3 0
location 0
latitude 0
longitude 0
geo_precision 0
source 0
source_scale 0
notes 0
fatalities 0
tags 6903
timestamp 0
dtype: int64
GeoDataFrame:
event_id_cnty event_date year time_precision disorder_type \
0 IND173617 13-Dec-24 2024 1 Demonstrations
1 IND173619 13-Dec-24 2024 1 Demonstrations
2 IND173620 13-Dec-24 2024 1 Demonstrations
3 IND173643 13-Dec-24 2024 1 Demonstrations
4 IND173670 13-Dec-24 2024 2 Political violence

event_type sub_event_type actor1 \
0 Protests Peaceful protest Protesters (India)
1 Protests Peaceful protest Protesters (India)
2 Protests Peaceful protest Protesters (India)
3 Protests Peaceful protest Protesters (India)
4 Riots Mob violence Rioters (India)

assoc_actor_1 inter1 … \
0 Health Workers (India); Students (India) Protesters …
1 General Caste Group (India); Hindu Group (Indi… Protesters …
2 Muslim Group (India) Protesters …
3 Unknown Protesters …
4 Labor Group (India) Rioters …

latitude longitude geo_precision source source_scale \
0 34.2090 74.3429 1 Daily Excelsior Subnational
1 32.7357 74.8691 1 Northlines Subnational
2 34.0857 74.8056 3 Kashmir News Service Subnational
3 26.3011 89.4058 1 Millennium Post (India) National
4 26.4846 91.4778 1 Pratidin Time Subnational

notes fatalities \
0 On 13 December 2024, medical students held a p… 0
1 On 13 December 2024, locals and religious outf… 0
2 On 13 December 2024, people (likely from the M… 0
3 On 13 December 2024, locals held a protest mar… 0
4 Around 13 December 2024 (as reported), locals … 0

tags timestamp geometry
0 crowd size=no report 1734479873 POINT (74.3429 34.209)
1 crowd size=large scale 1734479873 POINT (74.8691 32.7357)
2 crowd size=no report 1734479873 POINT (74.8056 34.0857)
3 crowd size=no report 1734479873 POINT (89.4058 26.3011)
4 crowd size=no report 1734479874 POINT (91.4778 26.4846)

[5 rows x 32 columns]
KS-test p-value for uniform randomness: 0.00e+00
KS-test p-value for Poisson randomness: 1.12e-08
Real Data Nearest Neighbor Distances Statistics:
count 10956.000000
mean 0.067921
std 0.067383
min 0.000922
25% 0.028562
50% 0.050649
75% 0.087821
max 2.987769
dtype: float64
Random Data Nearest Neighbor Distances Statistics:
count 600.000000
mean 0.749501
std 0.382952
min 0.030353
25% 0.469101
50% 0.726676
75% 0.996883
max 2.831957
dtype: float64
KS-test p-value comparing real vs random data: 0.00e+00
Theoretical Variance (Clark & Evans 1954): 1.01
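The nearest-neighbor comparison above is closely related to the Clark and Evans (1954) ratio R: the observed mean nearest-neighbor distance divided by its expected value 1/(2√λ) under complete spatial randomness with density λ, where R well below 1 suggests clustering. The NumPy sketch below is illustrative only. It uses synthetic points in an assumed 10 x 10 square rather than the ACLED data, ignores edge effects, uses brute-force distances that only suit a few thousand points, and computes the two-sample KS statistic D by hand (`scipy.stats.ks_2samp` also returns a p-value). For latitude/longitude data, projecting coordinates first would avoid treating degrees as equal distances, and a like-for-like random comparison would usually use the same number of points over the same area.

```python
import numpy as np

def nearest_neighbor_distances(points):
    """Brute-force distance from each point to its nearest other point (O(n^2) memory)."""
    diff = points[:, None, :] - points[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(d, np.inf)  # exclude each point's zero distance to itself
    return d.min(axis=1)

def clark_evans(points, area):
    """Clark-Evans ratio R and z-score under complete spatial randomness, no edge correction."""
    n = len(points)
    density = n / area
    r_obs = nearest_neighbor_distances(points).mean()
    r_exp = 1.0 / (2.0 * np.sqrt(density))   # expected mean NN distance under CSR
    se = 0.26136 / np.sqrt(n * density)       # standard error of that mean under CSR
    return r_obs / r_exp, (r_obs - r_exp) / se

def ks_statistic(a, b):
    """Two-sample KS statistic D: the largest gap between the two empirical CDFs."""
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.abs(cdf_a - cdf_b).max()

rng = np.random.default_rng(0)
side = 10.0                                               # synthetic 10 x 10 study area
uniform = rng.uniform(0, side, size=(400, 2))             # complete spatial randomness
centers = rng.uniform(0, side, size=(8, 2))
clustered = centers[rng.integers(8, size=400)] + rng.normal(0, 0.1, size=(400, 2))

R_u, z_u = clark_evans(uniform, side ** 2)
R_c, z_c = clark_evans(clustered, side ** 2)
D = ks_statistic(nearest_neighbor_distances(clustered), nearest_neighbor_distances(uniform))
print(f"uniform:   R = {R_u:.2f}, z = {z_u:.1f}")
print(f"clustered: R = {R_c:.2f}, z = {z_c:.1f}, KS D vs. uniform = {D:.2f}")
```

Under these assumptions R lands near 1 for the uniform points and far below 1 for the clustered ones, mirroring how the real nearest-neighbor distances in the log sit far below the random ones.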
Police Shooting Dataset Project Issues
- In police shootings, what are the cumulative distribution functions (CDFs) of age for different racial groups?
- Are there statistically significant differences between these distributions, and how do they compare?
- Can the size of these age differences be quantified using Cohen’s d?
- Do the ages of people who were fleeing differ significantly from those who were not?
- Does the share of victims who were fleeing when shot vary by race?
- Which statistical tests, such as Monte Carlo methods or t-tests, support or refute these trends?
- Is the percentage of shootings involving unarmed people noticeably lower when a body camera was in use?
- Does body camera use differ by the victim’s race or ethnicity?
- How does the number of police shootings in each state relate to body camera use?
- What proportion of people shot by police were armed, as opposed to unarmed?
- Does the victim’s race have a statistically significant impact on whether they were armed?
- Does body camera use differ depending on whether a weapon was present?
- Which cities or states have the highest per capita rates of police shootings?
- Do some states show greater racial disparities in shootings than others?
- Can clustering methods identify high-risk areas where police encounters lead to more fatalities?
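The CDF, Cohen's d and Monte Carlo questions can each be sketched in a few lines of NumPy. The ages below are synthetic draws for two hypothetical groups, not values from the dataset. `cohens_d` uses the pooled-standard-deviation form, and `permutation_test` is one common Monte Carlo approach (randomly relabeling the groups), not necessarily the test used in the project.

```python
import numpy as np

def ecdf(x):
    """Sorted values and their empirical CDF heights (e.g. to plot age CDFs per group)."""
    xs = np.sort(x)
    return xs, np.arange(1, len(xs) + 1) / len(xs)

def cohens_d(a, b):
    """Cohen's d: difference in means divided by the pooled standard deviation (ddof=1)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2)
    return (np.mean(a) - np.mean(b)) / np.sqrt(pooled_var)

def permutation_test(a, b, n_perm=5000, seed=0):
    """Monte Carlo p-value for |mean(a) - mean(b)| under random relabeling of the groups."""
    rng = np.random.default_rng(seed)
    observed = abs(np.mean(a) - np.mean(b))
    pooled = np.concatenate([a, b])  # a fresh array, so shuffling leaves a and b intact
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(pooled[:len(a)].mean() - pooled[len(a):].mean())
        count += int(diff >= observed)
    return (count + 1) / (n_perm + 1)

# Synthetic ages for two hypothetical groups (not real data).
rng = np.random.default_rng(42)
group_a = rng.normal(40, 12, 300)
group_b = rng.normal(33, 11, 300)

d = cohens_d(group_a, group_b)
p = permutation_test(group_a, group_b)
print(f"Cohen's d = {d:.2f}, permutation p-value = {p:.4f}")
```

By the usual rule of thumb, |d| around 0.2, 0.5 and 0.8 reads as a small, medium and large effect, which keeps the effect size separate from the p-value question.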
Washington Post – Police shootings data
Some important questions I wanted to discuss to better understand the dataset:
- What is the age distribution of individuals involved in these incidents?
- Are there any notable patterns in urban vs. rural areas?
- Are there any patterns in weapon types across different demographic groups?
- Is there any correlation between race and the type of weapon involved?
- Is there any relationship between body camera usage and specific police departments? What percentage of incidents had body cameras present?
The dataset also raises interesting questions: whether the number of events follows temporal patterns across seasons or months, whether certain jurisdictions show higher rates of particular incident types, and how body camera usage relates to incident outcomes.
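A pandas sketch of the body camera, armed-status and month-by-month questions. The column names (`date`, `race`, `armed`, `body_camera`) are assumed to match the Washington Post CSV, and the six rows are invented. In the real file, missing or `undetermined` values in `armed` would need separate handling before treating everything other than `unarmed` as armed.

```python
import pandas as pd

# Invented rows; column names assumed to match the Washington Post CSV.
df = pd.DataFrame({
    "date": ["2015-01-02", "2015-01-15", "2015-02-03", "2015-07-20", "2015-07-28", "2015-12-01"],
    "race": ["W", "B", "H", "W", "B", "W"],
    "armed": ["gun", "unarmed", "knife", "gun", "gun", "unarmed"],
    "body_camera": [False, True, False, False, True, False],
})
df["date"] = pd.to_datetime(df["date"])

# Collapse the weapon description into armed / unarmed (a simplification).
df["armed_status"] = df["armed"].eq("unarmed").map({True: "unarmed", False: "armed"})

# What percentage of incidents had a body camera present?
pct_body_cam = 100 * df["body_camera"].mean()

# Armed status by race, as row proportions.
armed_by_race = pd.crosstab(df["race"], df["armed_status"], normalize="index")

# Share of unarmed victims with vs. without a body camera.
unarmed_by_cam = df["armed_status"].eq("unarmed").groupby(df["body_camera"]).mean()

# Incidents per calendar month, a first look at seasonal patterns.
per_month = df.groupby(df["date"].dt.month).size()

print(f"body camera present: {pct_body_cam:.1f}%")
print(armed_by_race, unarmed_by_cam, per_month, sep="\n\n")
```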
MTH 522 Jan 27 Week 1
Week 1 demo post for MTH 522 AMS
MTH 522 Demo
This is my MTH 522 Advanced Mathematical Statistics site