New Neighbourhood Recommender Engine for Movers

Access the GitHub repository for the project here

A brief presentation summarising the project and results can be found here

Consider the following scenario: you live on the west side of the city of Toronto. One day, a company on the east side of the city offers you a lucrative position. The pay is better and the benefits are fantastic, but there is one catch. To make it to work on time each day without spending 3-4 hours of your valuable time traveling, you have to move. The problem is that you quite like the neighborhood you currently live in. The area is well connected, has plenty of grocery stores, and your favorite café is just a stone’s throw away. So, what do you do? Do you give up the job opportunity, along with the better pay and career prospects that come with it, or do you blindly risk moving to a neighborhood you know next to nothing about?

Most people would agree that what a person needs most in such a situation, in addition to the courage to give up morning coffee, is more information about prospective neighborhoods. The most common solution is to contact a realtor and provide them with a list of your requirements. Although this is common practice, it can be unreliable. The realtor may not be an expert on the matter. Moreover, they are unlikely to be intimately acquainted with every neighborhood, and they tend to be more concerned with the accommodation itself than with its surroundings. A natural alternative is to turn to technology. Since data science and machine learning are concerned with extracting insights from raw data, we shall turn to them in this difficult time.

Given an appropriate dataset, the task is to segment the neighborhoods of the city of Toronto and cluster them into groups with similar properties. Given the location coordinates of the center of each neighborhood, the Foursquare API is queried to collect a list of popular venues in that neighborhood. Machine learning and data science techniques are then applied to these venue lists to arrive at a grouped dataset in which each area carries a cluster label, so that the data can be queried for neighborhoods similar to a desired one. The ultimate goal of the project is therefore a labeled dataset that can be queried directly: the user enters the coordinates of a preferred neighborhood and receives in return a list of all the other neighborhoods in the city with similar properties.
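The pipeline described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the neighborhood names, coordinates, and venue lists are invented toy data standing in for Foursquare results, the function names are hypothetical, and a small hand-written k-means stands in for whichever clustering method the project used (the text does not name one).

```python
import math
from collections import Counter

# Assumed venue categories; real Foursquare results have many more.
CATEGORIES = ["cafe", "grocery", "park", "bar"]

# Invented toy data: (name, latitude, longitude, venue categories).
NEIGHBORHOODS = [
    ("Area A", 43.65, -79.45, ["cafe"] * 6 + ["grocery"] * 3 + ["park"]),
    ("Area B", 43.66, -79.40, ["cafe", "grocery"] + ["park"] * 2 + ["bar"] * 6),
    ("Area C", 43.70, -79.35, ["cafe", "grocery"] + ["park"] * 8),
    ("Area D", 43.72, -79.25, ["cafe"] * 5 + ["grocery"] * 4 + ["park"]),
    ("Area E", 43.75, -79.20, ["grocery"] * 2 + ["park"] + ["bar"] * 7),
]


def venue_frequencies(venues):
    """Turn a list of venue categories into a vector of relative frequencies."""
    counts = Counter(venues)
    return [counts[c] / len(venues) for c in CATEGORIES]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain k-means. The first k vectors seed the centroids so the example
    is deterministic; a real run would use a better initialisation."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its closest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def nearest_index(lat, lon):
    """Index of the neighborhood whose center is closest to (lat, lon),
    using an equirectangular approximation that is adequate within a city."""
    def approx_dist(n):
        dlat = n[1] - lat
        dlon = (n[2] - lon) * math.cos(math.radians(lat))
        return math.hypot(dlat, dlon)
    return min(range(len(NEIGHBORHOODS)), key=lambda i: approx_dist(NEIGHBORHOODS[i]))


def similar_neighborhoods(lat, lon, labels):
    """Names of all other neighborhoods sharing the cluster label of the
    neighborhood nearest to the given coordinates."""
    home = nearest_index(lat, lon)
    return [
        n[0] for i, n in enumerate(NEIGHBORHOODS)
        if i != home and labels[i] == labels[home]
    ]


features = [venue_frequencies(n[3]) for n in NEIGHBORHOODS]
labels = kmeans(features, k=3)
print(similar_neighborhoods(43.651, -79.449, labels))  # prints ['Area D']
```

In this toy run the query point falls closest to Area A, whose venue mix is dominated by cafés and grocery stores, so the only other neighborhood returned is Area D, which has a similar mix.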

The project was also an opportunity to learn a new skill: interacting with the Foursquare API, and managing and manipulating the results it returns to draw meaningful conclusions. By the end of the development cycle, the result was a highly informative database that is flexible enough to be refreshed with up-to-date information and can easily be queried to extract useful information.