Questions that came up while running tests on the ACLED-India dataset

  1. What is the distribution of disorder types across the events?

  2. How many events occurred in each location mentioned in the dataset?

  3. What is the breakdown of event types and sub-event types?

  4. Are there any patterns in the geographical distribution of events (using latitude and longitude)?

  5. What is the most common actor type involved in these events?

  6. Is there a correlation between the type of event and the crowd size (where reported)?

  7. How do the events differ in terms of geo_precision, and what might this indicate about data reliability?

  8. What is the distribution of events across different source scales (National vs. Subnational)?

  9. Are there any trends in the fatalities reported across different event types?

  10. How do the associated actors vary across different types of protests or demonstrations?

  11. What insights can be drawn from the time_precision column in relation to the events reported?

  12. Is there any correlation between the location of events and the type of source reporting them?
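
Question 6 depends on getting the crowd size out of the free-text tags column first. The following is a minimal sketch of one way to do that, assuming the tag always has the form "crowd size=..." as in the sample rows of the output log, and that multiple tags (if any) are separated by ";". The toy rows and the "no tag" placeholder are my own illustration, not real ACLED records.

```python
import pandas as pd

# Toy rows shaped like the head() output in the log; not real ACLED records.
df = pd.DataFrame({
    "event_type": ["Protests", "Protests", "Protests", "Riots"],
    "tags": ["crowd size=no report", "crowd size=large scale",
             "crowd size=no report", None],
})

# Pull the text after "crowd size=" up to the next ";" (assumed separator).
# Rows with a missing tag get the hypothetical placeholder "no tag".
df["crowd_size"] = (
    df["tags"]
    .str.extract(r"crowd size=([^;]+)", expand=False)
    .str.strip()
    .fillna("no tag")
)

# Cross-tabulate event type against the reported crowd size.
table = pd.crosstab(df["event_type"], df["crowd_size"])
print(table)
```

On the full dataset the same crosstab could feed a chi-square test of independence, though the large share of "no report" values would likely need to be handled separately.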

    Output log:

    Dataset Overview:
    event_id_cnty event_date year time_precision disorder_type \
    0 IND173617 13-Dec-24 2024 1 Demonstrations
    1 IND173619 13-Dec-24 2024 1 Demonstrations
    2 IND173620 13-Dec-24 2024 1 Demonstrations
    3 IND173643 13-Dec-24 2024 1 Demonstrations
    4 IND173670 13-Dec-24 2024 2 Political violence

    event_type sub_event_type actor1 \
    0 Protests Peaceful protest Protesters (India)
    1 Protests Peaceful protest Protesters (India)
    2 Protests Peaceful protest Protesters (India)
    3 Protests Peaceful protest Protesters (India)
    4 Riots Mob violence Rioters (India)

    assoc_actor_1 inter1 … \
    0 Health Workers (India); Students (India) Protesters …
    1 General Caste Group (India); Hindu Group (Indi… Protesters …
    2 Muslim Group (India) Protesters …
    3 Unknown Protesters …
    4 Labor Group (India) Rioters …

    location latitude longitude geo_precision source \
    0 Baramulla 34.2090 74.3429 1 Daily Excelsior
    1 Jammu 32.7357 74.8691 1 Northlines
    2 Srinagar 34.0857 74.8056 3 Kashmir News Service
    3 Suktabari 26.3011 89.4058 1 Millennium Post (India)
    4 Katahkuchi 26.4846 91.4778 1 Pratidin Time

    source_scale notes fatalities \
    0 Subnational On 13 December 2024, medical students held a p… 0
    1 Subnational On 13 December 2024, locals and religious outf… 0
    2 Subnational On 13 December 2024, people (likely from the M… 0
    3 National On 13 December 2024, locals held a protest mar… 0
    4 Subnational Around 13 December 2024 (as reported), locals … 0

    tags timestamp
    0 crowd size=no report 1734479873
    1 crowd size=large scale 1734479873
    2 crowd size=no report 1734479873
    3 crowd size=no report 1734479873
    4 crowd size=no report 1734479874

    [5 rows x 31 columns]

    Null values in the dataset:
    event_id_cnty 0
    event_date 0
    year 0
    time_precision 0
    disorder_type 0
    event_type 0
    sub_event_type 0
    actor1 0
    assoc_actor_1 0
    inter1 0
    actor2 0
    assoc_actor_2 0
    inter2 50097
    interaction 0
    civilian_targeting 0
    iso 0
    region 0
    country 0
    admin1 0
    admin2 0
    admin3 0
    location 0
    latitude 0
    longitude 0
    geo_precision 0
    source 0
    source_scale 0
    notes 0
    fatalities 0
    tags 6903
    timestamp 0
    dtype: int64

    After handling missing values:
    event_id_cnty 0
    event_date 0
    year 0
    time_precision 0
    disorder_type 0
    event_type 0
    sub_event_type 0
    actor1 0
    assoc_actor_1 0
    inter1 0
    actor2 0
    assoc_actor_2 0
    inter2 50097
    interaction 0
    civilian_targeting 0
    iso 0
    region 0
    country 0
    admin1 0
    admin2 0
    admin3 0
    location 0
    latitude 0
    longitude 0
    geo_precision 0
    source 0
    source_scale 0
    notes 0
    fatalities 0
    tags 6903
    timestamp 0
    dtype: int64

    GeoDataFrame:
    event_id_cnty event_date year time_precision disorder_type \
    0 IND173617 13-Dec-24 2024 1 Demonstrations
    1 IND173619 13-Dec-24 2024 1 Demonstrations
    2 IND173620 13-Dec-24 2024 1 Demonstrations
    3 IND173643 13-Dec-24 2024 1 Demonstrations
    4 IND173670 13-Dec-24 2024 2 Political violence

    event_type sub_event_type actor1 \
    0 Protests Peaceful protest Protesters (India)
    1 Protests Peaceful protest Protesters (India)
    2 Protests Peaceful protest Protesters (India)
    3 Protests Peaceful protest Protesters (India)
    4 Riots Mob violence Rioters (India)

    assoc_actor_1 inter1 … \
    0 Health Workers (India); Students (India) Protesters …
    1 General Caste Group (India); Hindu Group (Indi… Protesters …
    2 Muslim Group (India) Protesters …
    3 Unknown Protesters …
    4 Labor Group (India) Rioters …

    latitude longitude geo_precision source source_scale \
    0 34.2090 74.3429 1 Daily Excelsior Subnational
    1 32.7357 74.8691 1 Northlines Subnational
    2 34.0857 74.8056 3 Kashmir News Service Subnational
    3 26.3011 89.4058 1 Millennium Post (India) National
    4 26.4846 91.4778 1 Pratidin Time Subnational

    notes fatalities \
    0 On 13 December 2024, medical students held a p… 0
    1 On 13 December 2024, locals and religious outf… 0
    2 On 13 December 2024, people (likely from the M… 0
    3 On 13 December 2024, locals held a protest mar… 0
    4 Around 13 December 2024 (as reported), locals … 0

    tags timestamp geometry
    0 crowd size=no report 1734479873 POINT (74.3429 34.209)
    1 crowd size=large scale 1734479873 POINT (74.8691 32.7357)
    2 crowd size=no report 1734479873 POINT (74.8056 34.0857)
    3 crowd size=no report 1734479873 POINT (89.4058 26.3011)
    4 crowd size=no report 1734479874 POINT (91.4778 26.4846)

    [5 rows x 32 columns]

    KS-test p-value for uniform randomness: 0.00e+00
    KS-test p-value for Poisson randomness: 1.12e-08
    Real Data Nearest Neighbor Distances Statistics:
    count 10956.000000
    mean 0.067921
    std 0.067383
    min 0.000922
    25% 0.028562
    50% 0.050649
    75% 0.087821
    max 2.987769
    dtype: float64

    Random Data Nearest Neighbor Distances Statistics:
    count 600.000000
    mean 0.749501
    std 0.382952
    min 0.030353
    25% 0.469101
    50% 0.726676
    75% 0.996883
    max 2.831957
    dtype: float64

    KS-test p-value comparing real vs random data: 0.00e+00
    Theoretical Variance (Clark & Evans 1954): 1.01
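
The nearest-neighbor statistics above can be reproduced in outline with a short sketch. This is not the script that produced the log; it is a brute-force illustration on synthetic points, assuming plain Euclidean distance on the coordinates (which the magnitudes in the log suggest, but I have not confirmed). The function names and the 10 x 10 study area are hypothetical. The Clark and Evans ratio divides the observed mean nearest-neighbor distance by the value expected under complete spatial randomness, 1 / (2 * sqrt(density)); values well below 1 indicate clustering.

```python
import numpy as np

def nearest_neighbor_distances(points):
    """Distance from each point to its closest other point (brute force)."""
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # ignore each point's distance to itself
    return dist.min(axis=1)

def clark_evans_ratio(points, area):
    """Observed mean NN distance over the expectation under spatial randomness."""
    density = len(points) / area
    expected = 1.0 / (2.0 * np.sqrt(density))
    return nearest_neighbor_distances(points).mean() / expected

rng = np.random.default_rng(0)
# Synthetic stand-ins inside a 10 x 10 box; not the ACLED coordinates.
clustered = rng.normal(loc=5.0, scale=0.2, size=(300, 2))
uniform = rng.uniform(0.0, 10.0, size=(300, 2))

r_clustered = clark_evans_ratio(clustered, area=100.0)
r_uniform = clark_evans_ratio(uniform, area=100.0)
print(f"clustered R = {r_clustered:.2f}, uniform R = {r_uniform:.2f}")
```

The pairwise distance matrix grows with the square of the number of points, so for the roughly 11,000 points in the log a KD-tree (for example `scipy.spatial.cKDTree`) would be a more practical choice than this brute-force version.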

Police Shooting Dataset: Project Questions

  • In police shootings, what do the cumulative distribution functions (CDFs) of age look like for different racial groups?
  • Are there statistically significant differences between these distributions, and how do they compare?
  • Can the size of the age differences be quantified using Cohen’s d?
  • Are the ages of those who were fleeing and those who were not significantly different?
  • Does the chance of being shot while fleeing vary by race?
  • Which statistical tests, such as Monte Carlo methods or t-tests, confirm or refute these trends?
  • Is the percentage of shootings involving unarmed people noticeably lower when a body camera was in use?
  • How does body camera use differ across racial and ethnic groups?
  • How do police shootings by state relate to the use of body cameras?
  • What proportion of people who were shot by police were armed, and what proportion were unarmed?
  • Does the victim’s race have a statistically significant association with whether they were armed?
  • Does body camera use differ depending on whether a weapon was present?
  • Which cities or states have the greatest per capita rates of police shootings?
  • Do some states show greater racial disparities in shootings than others?
  • Can high-risk areas where police encounters lead to more fatalities be identified using clustering methods?
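
Several of the questions above come down to comparing the ages of two groups. Here is a minimal sketch of Cohen’s d and a Monte Carlo permutation test using only the standard library. The ages are made up for illustration and the function names are my own; nothing here is a result from the dataset.

```python
import random
import statistics

def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) +
                  (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5

def permutation_p_value(a, b, n_iter=2000, seed=0):
    """Two-sided Monte Carlo p-value for the difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # randomly reassign ages to the two groups
        diff = abs(statistics.mean(pooled[:len(a)]) -
                   statistics.mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_iter + 1)

# Made-up ages for two hypothetical groups; not taken from the dataset.
group_a = [22, 25, 27, 30, 31, 33, 35, 38, 41, 45]
group_b = [28, 32, 35, 37, 40, 42, 44, 47, 52, 58]

d = cohens_d(group_a, group_b)
p = permutation_p_value(group_a, group_b)
print(f"Cohen's d = {d:.2f}, permutation p-value = {p:.3f}")
```

The permutation test makes no normality assumption, which is why it is a useful cross-check on a t-test when the age distributions are skewed.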

Washington Post – Police shootings data

Some important questions I wanted to explore to get a better understanding of the dataset:

  1. What is the age distribution of individuals involved in these incidents?
  2. Are there any notable patterns in urban vs. rural areas?
  3. Are there any patterns in weapon types across different demographic groups?
  4. Is there any correlation between race and the type of weapon involved?
  5. Is there any relationship between body camera usage and specific police departments? What percentage of incidents had body cameras present?

The dataset also raises interesting questions: whether the number of incidents follows temporal patterns across seasons or months, whether certain jurisdictions show higher rates of particular kinds of incident, and how body camera usage relates to incident outcomes.
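
As a starting point for the temporal and body camera questions, here is a small sketch with pandas. The column names "date" and "body_camera" are assumptions about the file layout and may differ between versions of the data; the five rows are toy values, not real incidents.

```python
import pandas as pd

# Toy records; the column names "date" and "body_camera" are assumed.
df = pd.DataFrame({
    "date": ["2020-01-05", "2020-01-20", "2020-02-11", "2020-07-04", "2021-01-15"],
    "body_camera": [True, False, False, True, False],
})
df["date"] = pd.to_datetime(df["date"])

# Incidents per calendar month, pooled across years, to look for seasonality.
per_month = df.groupby(df["date"].dt.month).size()

# Share of incidents with a body camera recorded, as a percentage.
body_camera_pct = 100.0 * df["body_camera"].mean()

print(per_month)
print(f"Body camera present in {body_camera_pct:.1f}% of incidents")
```

Grouping by a department or state column instead of the month would give the per-jurisdiction version of the same counts.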