Sunday, November 18, 2018

Segment Neighborhoods in New York and Toronto

Introduction

In this project, I try to answer the following question:
Given a neighborhood in Toronto (identified by its postcode), how can we find similar neighborhoods in New York City?
The answer can help a person who wants to move from Toronto to New York City, as well as business owners who want to start a business in a similar New York City neighborhood.
We can use Foursquare location data and the k-means clustering algorithm to find the New York City neighborhoods with similar nearby venues.

This project was done in a Jupyter Notebook, which can be accessed in a GitHub repository through this link:


Data

New York's neighborhoods can be downloaded from the NYU Spatial Data Repository through this link: https://geo.nyu.edu/catalog/nyu_2451_34572 The dataset includes the Borough, Latitude and Longitude of each Neighborhood.
The New York dataframe has 5 boroughs and 306 neighborhoods.

Postcodes and neighborhoods of Canada can be downloaded from the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M The Latitude and Longitude of each postcode can be downloaded from https://cocl.us/Geospatial_data The string -Toronto is appended to each Toronto neighborhood name to differentiate it from the neighborhoods of New York.
The Toronto dataframe has 4 boroughs and 38 neighborhoods.

The combined neighborhoods dataframe has 9 boroughs and 344 neighborhoods.
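
A minimal sketch of the suffixing and combining step, assuming both cities are held in pandas dataframes with Borough, Neighborhood, Latitude and Longitude columns. The function name combine_neighborhoods and the sample rows (with approximate coordinates) are illustrative only and are not taken from the notebook.

```python
import pandas as pd

def combine_neighborhoods(ny_df, toronto_df, suffix="-Toronto"):
    """Tag Toronto neighborhood names with a suffix, then stack both cities."""
    toronto = toronto_df.copy()  # leave the caller's dataframe untouched
    toronto["Neighborhood"] = toronto["Neighborhood"] + suffix
    return pd.concat([ny_df, toronto], ignore_index=True)

# Illustrative rows only, not the real datasets.
ny = pd.DataFrame({
    "Borough": ["Bronx", "Manhattan"],
    "Neighborhood": ["Wakefield", "Chinatown"],
    "Latitude": [40.89, 40.72],
    "Longitude": [-73.85, -74.00],
})
toronto = pd.DataFrame({
    "Borough": ["Central Toronto"],
    "Neighborhood": ["Roselawn"],
    "Latitude": [43.71],
    "Longitude": [-79.42],
})

combined = combine_neighborhoods(ny, toronto)
print(combined[["Borough", "Neighborhood"]])
```

Because the suffix is part of the name, it can later be used to tell the two cities apart inside the combined dataframe.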

The Foursquare API can return the nearby venues for a given latitude and longitude.
I set the radius to 500 meters and the limit to 100, then find the nearby venues for each neighborhood.
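
The request can be sketched as follows. This assumes the legacy Foursquare v2 "explore" endpoint and its client_id, client_secret, v, ll, radius and limit query parameters; the post does not say which endpoint was used, and the helper name build_explore_url, the version date and the credentials are placeholders.

```python
from urllib.parse import urlencode

# Assumption: the legacy Foursquare v2 venue-explore endpoint.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Build the request URL for venues near one latitude/longitude pair."""
    params = {
        "client_id": client_id,          # placeholder credentials
        "client_secret": client_secret,
        "v": version,                    # API version date (assumed value)
        "ll": f"{lat},{lng}",            # "latitude,longitude"
        "radius": radius,                # search radius in meters
        "limit": limit,                  # maximum number of venues returned
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"

url = build_explore_url(40.7128, -74.0060, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(url)
```

The same URL would be built once per neighborhood from its coordinates and then fetched with an HTTP client.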

Methodology

From the Foursquare API's results, we can find the category of each venue. There are 444 unique categories across the combined neighborhoods.

I create dummy variables for each venue's category for each neighborhood:
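
A minimal sketch of this step, assuming one row per venue with Neighborhood and Venue Category columns and using pandas get_dummies for the encoding. The column names and the toy rows are assumptions for illustration.

```python
import pandas as pd

# Toy venue list: one row per venue returned for a neighborhood.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B-Toronto"],
    "Venue Category": ["Pizza Place", "Park", "Park"],
})

# One 0/1 column per category, one row per venue.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)

# Put the neighborhood name back as the first column.
# Note: this would fail if a category were itself named "Neighborhood".
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
print(onehot)
```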


Then I group by Neighborhood and take the mean of the frequency of occurrence of each category, which will be used to run the k-means algorithm:
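
As a sketch, assuming a one-hot dataframe with a Neighborhood column plus one 0/1 column per category (toy values below), the grouping can be written with pandas groupby:

```python
import pandas as pd

# Toy one-hot rows: four venues in A, two venues in B-Toronto.
onehot = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B-Toronto", "B-Toronto"],
    "Park":         [1, 0, 0, 0, 1, 0],
    "Pizza Place":  [0, 1, 1, 1, 0, 1],
})

# The mean of a 0/1 column is the share of that category among the
# neighborhood's venues, so every row of the result sums to 1.
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

Using frequencies rather than raw counts keeps neighborhoods with many venues comparable to neighborhoods with few.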


I can then find the 10 most common venues for each neighborhood:
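
One way to build such a ranking, sketched under the assumption that the grouped frequencies sit in a dataframe with a Neighborhood column. The helper name most_common_venues, the Rank column labels and the toy values are hypothetical.

```python
import pandas as pd

def most_common_venues(grouped, top_n=10):
    """Return the top_n categories per neighborhood, most frequent first."""
    freq = grouped.set_index("Neighborhood")
    rows = {
        name: row.sort_values(ascending=False).index[:top_n].tolist()
        for name, row in freq.iterrows()
    }
    # There cannot be more ranks than there are categories.
    n_ranks = min(top_n, freq.shape[1])
    columns = [f"Rank {i + 1}" for i in range(n_ranks)]
    return pd.DataFrame.from_dict(rows, orient="index", columns=columns)

# Toy frequencies (each row sums to 1).
grouped = pd.DataFrame({
    "Neighborhood": ["A", "B-Toronto"],
    "Park":         [0.2, 0.6],
    "Pizza Place":  [0.5, 0.1],
    "Cafe":         [0.3, 0.3],
})

top = most_common_venues(grouped, top_n=2)
print(top)
```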


Then I run k-means to cluster the neighborhoods into 10 clusters, each of which has similar nearby venues:
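
A minimal sketch of the clustering step, assuming scikit-learn's KMeans is applied to the per-neighborhood category frequencies. The helper name cluster_neighborhoods, the random seed and the n_init value are assumptions, and the toy example uses 2 clusters instead of 10 because it has only four rows.

```python
import pandas as pd
from sklearn.cluster import KMeans

def cluster_neighborhoods(grouped, n_clusters=10, seed=0):
    """Attach a k-means cluster label to each neighborhood."""
    features = grouped.drop(columns="Neighborhood")  # frequencies only
    model = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    labels = model.fit_predict(features)
    return grouped[["Neighborhood"]].assign(Cluster=labels)

# Toy frequencies: A and B are park-heavy, C and D are pizza-heavy.
grouped = pd.DataFrame({
    "Neighborhood": ["A", "B", "C-Toronto", "D-Toronto"],
    "Park":         [0.9, 0.8, 0.1, 0.2],
    "Pizza Place":  [0.1, 0.2, 0.9, 0.8],
})

labeled = cluster_neighborhoods(grouped, n_clusters=2)
print(labeled)
```

The cluster numbers themselves are arbitrary; only which neighborhoods share a label is meaningful.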


Result

I use the folium library to show the clusters of New York on a map:


This is the map for the clusters of Toronto:


Discussion

Of the 10 clusters, New York has neighborhoods in 9, but Toronto has neighborhoods in only 4, so by this measure New York is more diversified than Toronto. New York has many neighborhoods in cluster 2, but Toronto has none. Most of the neighborhoods in Toronto are in cluster 1 (32 of 38). The New York neighborhoods in cluster 1 are mostly in the Manhattan and Brooklyn boroughs. One Toronto neighborhood, Roselawn, is in cluster 7, which has no similar neighborhoods in New York.
Since most of the neighborhoods in Toronto are in cluster 1, we may need additional data, such as house prices or rents, to segment the neighborhoods further. Toronto also has only 38 neighborhoods (versus 306 in New York), so including more neighborhoods from the Toronto area may give better results.
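
Per-city counts like the ones discussed above can be tallied with a cross-tabulation. The sketch below assumes a dataframe of cluster labels in which Toronto neighborhoods carry the -Toronto suffix; the helper name clusters_by_city and the toy labels are hypothetical and do not reproduce the project's results.

```python
import pandas as pd

def clusters_by_city(labeled, suffix="-Toronto"):
    """Count neighborhoods per cluster, split by city."""
    is_toronto = labeled["Neighborhood"].str.endswith(suffix)
    city = is_toronto.map({True: "Toronto", False: "New York"}).rename("City")
    return pd.crosstab(labeled["Cluster"], city)

# Toy labels only.
labeled = pd.DataFrame({
    "Neighborhood": ["A", "B", "C-Toronto", "D-Toronto"],
    "Cluster": [1, 2, 1, 7],
})

counts = clusters_by_city(labeled)
print(counts)

# Number of distinct clusters in which each city appears.
print((counts > 0).sum())
```

A zero in a city's column marks a cluster with no counterpart in that city.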

Conclusions

Based on the nearby venues, most of Toronto's neighborhoods are in cluster 1. For a person who wants to move from Toronto to New York City, or a business owner who wants to start a business there, the similar neighborhoods in New York will most likely be in the Manhattan or Brooklyn boroughs. They can also avoid the neighborhoods in other clusters that have no similar neighborhoods in Toronto.
