Crime analysis using K-Means clustering

Author: Palak Kumar

Criminal activities are a major cause for concern for law enforcement officials. Existing strategies to control crime are usually reactive, responding to the crime scene after the crimes have occurred. However, with the advent of technology and data analytics, it is now possible to recognize patterns in criminal activities using historical data and help law enforcement officers do a better job in crime prevention and control.

There are certain questions that law enforcement officers often ask: Is there any correlation between crime type, the weapon used, and location? What are the demographics of the people committing a certain crime? Which weapons do criminals most typically possess? Can the reports help us predict future criminal activities?

To answer these types of questions, we can use historical data about past criminal activities and mine this data for specific patterns. Historical data such as the date, time, and location of the crime, the type of crime committed, gender, weapons used etc. are now easily available. This prior crime information can be framed as a data mining problem, and any information gathered from this analysis can help law enforcement officials do a better job.

Data analysts help speed up the process of solving crimes and help in law enforcement. Criminal data analytics works to create a geospatial plot of criminal activities. The plots can be analysed to predict the instances of crime.

Steps of crime pattern analysis

K-Means Geospatial Plot
  • Determine the geospatial plot of crimes in the city: The first step is the collection of crime information in a given city. This is usually available from multiple places such as law enforcement reports, victimization statistical surveys, collation of newspaper articles etc. This data can be plotted on a geographical map such as the one shown above.
  • Use clustering techniques to identify patterns: Clustering is a method of dividing the dataset into subsets called clusters so that the observations in the same cluster are more similar to each other than to observations in other clusters. It is a method of unsupervised learning and is used for statistical data analysis.

    The K-means data mining approach helps us identify patterns, since it is very difficult for humans to process large amounts of data and detect patterns in it, especially when some of the information is missing.

    Clusters are useful in identifying a crime spree committed by a single suspect or by the same group of suspects. These clusters are then presented to the detectives, who drill down using their domain expertise to solve the cases.

    K-means clustering is one of the methods of cluster analysis. In the K-means algorithm, each point is assigned to the cluster whose centroid is the closest. K-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. It can be applied to relatively large sets of data.

    Use the following steps for cluster analysis:
    • Sorting of the records - the first sorting will be done on the most important characteristics based on the detective’s experience.
    • Data mining is then used to detect more complex patterns as in real life there are many attributes associated with the crime and we often have partial information available.
    • Identification of significant attributes for clustering.
    • Placing different weights on different attributes dynamically based on the crime types being clustered.
    • Cluster the dataset for crime patterns and present the results to the detective or the domain expert along with the statistics of the important attributes.
    • The detective looks at the clusters and gives recommendations.
    • Unsolved crimes are clustered based on significant attributes and the result is given to the detective for inspection.

    In this article, we will use the K-means approach for generating the clusters. The K-means algorithm consists of the following steps:
    • Decide the number of clusters, K. The K-means cluster analysis requires you to know how many clusters to generate before the start of the algorithm.
    • Initialize the K cluster centres, either by choosing them or by generating them randomly. Different starting points may yield different results.
    • Assign each observation to the nearest cluster centre. This is an iterative technique which builds the clusters as we progress.
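    • Re-compute the new cluster centres as the mean of the observations assigned to each cluster. Note that you need to specify how the distance between an observation and a cluster centre is measured (Euclidean distance is the usual choice).
    • Repeat the assignment and re-computation steps until no observation changes its cluster membership between iterations.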
    An example of the K-means cluster analysis is shown in the figure below. In this example, we show the creation of 3 clusters (each in a different colour).

    K-Means Construction
  • Analysing patterns and drawing conclusions: This involves the analysis of each cluster formed. The computer is unable to understand what is unique about each cluster. This is where human expertise comes into play. For example, all the crimes in the red cluster may have been committed using a similar gun, or all the crimes in the blue cluster may be jewellery thefts where the victims were walking on the road and the assailants were travelling on a motorbike. This helps to find crime patterns and trend correlations. Once a specific pattern is detected, the law enforcement officers can deploy additional and suitable resources for the detection and suppression of criminal activities.
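
The K-means steps listed above, together with the idea of placing different weights on different attributes, can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used by any particular tool: the function names (standardise, weighted_distance, k_means), the weights, and the six sample observations are all assumptions made up for this example. It uses a weighted Euclidean distance and stops when no observation changes its cluster.

```python
import math
import random


def standardise(rows):
    """Rescale every attribute (column) to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two observations."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def k_means(rows, k, weights, seed=0, max_iter=100):
    """Return (labels, centres, passes) for a basic K-means run."""
    rng = random.Random(seed)
    # Initialise: pick k observations at random as the starting centres.
    centres = [list(r) for r in rng.sample(rows, k)]
    labels = [None] * len(rows)
    passes = 0
    for passes in range(1, max_iter + 1):
        # Assignment: each observation goes to its nearest centre.
        new_labels = [
            min(range(k), key=lambda j: weighted_distance(row, centres[j], weights))
            for row in rows
        ]
        if new_labels == labels:  # no membership changed: converged
            break
        labels = new_labels
        # Update: each centre becomes the mean of its assigned observations.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres, passes


# Made-up observations with two numeric attributes each (illustration only).
rows = [[1, 2], [2, 1], [1, 1], [9, 10], [10, 9], [10, 10]]
# Assumed weights: the second attribute counts twice as much as the first.
labels, centres, passes = k_means(standardise(rows), k=2, weights=[1.0, 2.0])
print(labels, passes)
```

Note that the returned pass count includes the final pass that only confirms nothing changed. Standardising first matters because attributes on large scales would otherwise dominate the distance regardless of the weights chosen.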

Advantages of clustering for crime pattern analysis

There are several advantages to using this approach for crime pattern analysis:
  • Analyse historical crime rates to improve the resolution rate of current cases.
  • Take actions to prevent future incidents by using preventive mechanisms based on observed patterns.
  • Reduce the training time of officers who are assigned to a new location and have no prior knowledge of site-specific crimes.
  • Increase operational efficiency by optimally redeploying limited resources to the right places at the right times.

Limitations of crime pattern detection

There are a few limitations to using this approach for crime pattern detection:
  • Crime pattern analysis can only help the detectives and not replace them. It is up to the human experts to interpret what the clusters are telling us.
  • Data mining is sensitive to the quality of the input data, which can sometimes be inaccurate. Missing information can also cause errors.
  • Mapping data mining attributes is a difficult task and hence it requires a skilled data miner and a crime data analyst with good domain knowledge.

Example using K-means on Sigma Magic software

Here is a K-means analysis of crimes in India. This example uses randomly generated data to illustrate the concepts and there is no correlation with real data. The data includes the places, the numbers of murders, thefts, and cybercrimes, and the percentage of the population living in urban areas. The number of clusters K is 4, and the algorithm took 3 iterations to converge.

K-Means Example
This cluster data can now be given to detectives to find a pattern, identify strategies to attack the problem, share best practices between different organizations and put policies and procedures in place to prevent the occurrence of future crime.
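
Before the clusters are handed over, it can help to attach the statistics of the important attributes to each cluster. The sketch below shows one possible way to do that in plain Python. Everything in it is an assumption made for illustration: the place names, the numbers, the cluster labels, and the helper name summarise are invented and are not taken from the Sigma Magic example or from any real data.

```python
from collections import defaultdict
from statistics import mean

# Invented records: (place, murders, thefts, cybercrimes, urban_pct).
records = [
    ("Place A", 10, 300, 20, 30.0),
    ("Place B", 40, 900, 80, 70.0),
    ("Place C", 20, 500, 40, 50.0),
    ("Place D", 60, 1100, 120, 90.0),
]
# Assumed output of a clustering step: one cluster label per record.
labels = [0, 1, 0, 1]
attribute_names = ["murders", "thefts", "cybercrimes", "urban_pct"]


def summarise(records, labels, names):
    """Group records by cluster label and average each attribute."""
    groups = defaultdict(list)
    for record, label in zip(records, labels):
        groups[label].append(record)
    summary = {}
    for label, members in sorted(groups.items()):
        row = {"places": [m[0] for m in members]}
        for i, name in enumerate(names):
            # Attribute i sits at position i + 1, after the place name.
            row[name] = mean(m[i + 1] for m in members)
        summary[label] = row
    return summary


summary = summarise(records, labels, attribute_names)
for label, row in summary.items():
    print(label, row)
```

A table like this gives the domain expert a quick view of what distinguishes one cluster from another, while the interpretation still rests with them.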
