How to pick the right location for your store.
Even today, some industries still need a physical presence close to the public, and mail/logistics offices are one of them.
Let’s picture the following scenario. You are the biggest logistics company in your country, with operations all over the place, even in neighbouring countries. You decide that you want to be present in every city in Argentina with more than 30,000 inhabitants.
So, you just need to:
- Grab two databases: one with the population by city and another with all the cities where you already have a store (plus their geoposition).
- Normalize the names (Spanish has some annoying characters, such as the “tilde” accent mark and “ñ”).
- Check how far each city is from the nearest office with customer service, and plot it.
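The original scripts are not reproduced here. The following is only a minimal Python sketch of the last two steps, and it assumes that each city and each office is a (latitude, longitude) pair in decimal degrees. `normalize_name`, `haversine_km` and `nearest_office_km` are hypothetical helpers. The 6,371 km Earth radius is the usual mean-radius approximation.

```python
import math
import unicodedata


def normalize_name(name):
    """Lower-case a name, strip accents and collapse spaces ('Neuquén' -> 'neuquen').

    NFKD splits 'ñ' into 'n' + a combining tilde, so 'ñ' also becomes 'n'.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.lower().split())


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_office_km(city, offices):
    """Distance from a city (lat, lon) to the closest office in a list of (lat, lon)."""
    return min(haversine_km(city[0], city[1], lat, lon) for lat, lon in offices)


print(normalize_name("Añatuya"))  # anatuya
```

With this normalization, names written with or without accents end up matching when you join the two databases.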
Easy-peasy: you end up with something like the next image to impress your boss, plus an ugly list of the selected cities where you can start checking for available rental locations.
Then, after a quick analysis, you realize that the Argentine capital has more than 900 times the country’s average population density (15,069 people/km² vs. 16.26 people/km²).
So… how do we pick the right location for the next office downtown?
To do that, I built four different programs that can be used individually or in tandem, depending on your needs.
01- HeatMap
From scratch:
Libraries we are going to use from now on:
Folium: just for plotting our maps. It’s going to help us understand what is going on in the script.
Shapely: an easy way to check whether a point is inside a polygon.
json, Pandas and NumPy need no introduction.
Inputs:
- Population by neighbourhood
- GeoJson by neighbourhood
GeoJSON is a JSON file that stores the border coordinates of a place. You can get GeoJSON files from several sources, or you can make your own with this resource:
1- Load the datasets
We load both of our inputs. We also add a new column to the population dataset. It holds the % of the city’s population that lives in each neighbourhood.
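Here is a minimal sketch of step 1 that uses only the standard library. The original files are not shown, so the column names `neighbourhood` and `population` and the toy values are hypothetical. With Pandas, as in the article, the new column is a one-liner: `df["pct"] = df["population"] / df["population"].sum() * 100`.

```python
import csv
import io

# Toy stand-in for the population file (hypothetical names and numbers);
# the real input would be read from disk with open(...) instead of io.StringIO.
TOY_POPULATION_CSV = """neighbourhood,population
Barrio A,300
Barrio B,100
"""


def load_population(csv_text):
    """Parse neighbourhood,population rows and add 'pct': % of the city total."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    total = sum(int(row["population"]) for row in rows)
    for row in rows:
        row["population"] = int(row["population"])
        row["pct"] = 100.0 * row["population"] / total
    return rows


population = load_population(TOY_POPULATION_CSV)
for row in population:
    print(row["neighbourhood"], row["pct"])  # Barrio A 75.0, then Barrio B 25.0
```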
2- GeoJSON to Polygon + fitting it all into one list
In the script for this step, we iterate over the GeoJSON and grab the main attributes of each neighbourhood. In the end, we get a list containing, for each neighbourhood:
- Name of the neighbourhood, % of people living in it, Polygon (Shapely type), Bounds
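Here is a standard-library sketch of step 2. It assumes that each neighbourhood is a GeoJSON `Polygon` feature whose name is stored in a property called `name`. Real files often use a different key, so the key is a parameter. For brevity, the sketch keeps only the outer ring as a list of (lon, lat) tuples and skips `MultiPolygon` features. With Shapely, you would call `shapely.geometry.shape(feature["geometry"])` and read the polygon's `.bounds` instead. `.bounds` uses the same (min_x, min_y, max_x, max_y) order as the hypothetical `ring_bounds` helper below.

```python
import json

# Toy GeoJSON with one square neighbourhood (hypothetical data). Real files
# hold one feature per neighbourhood and may use another property key.
TOY_GEOJSON = """{
  "type": "FeatureCollection",
  "features": [{
    "type": "Feature",
    "properties": {"name": "Barrio A"},
    "geometry": {"type": "Polygon", "coordinates": [[
      [-58.40, -34.60], [-58.39, -34.60], [-58.39, -34.59],
      [-58.40, -34.59], [-58.40, -34.60]]]}
  }]
}"""


def ring_bounds(ring):
    """(min_lon, min_lat, max_lon, max_lat) of a ring of (lon, lat) points."""
    lons = [lon for lon, _ in ring]
    lats = [lat for _, lat in ring]
    return (min(lons), min(lats), max(lons), max(lats))


def geojson_to_list(geojson_text, pct_by_name, name_key="name"):
    """Return [name, pct, outer_ring, bounds] for every Polygon feature."""
    result = []
    for feature in json.loads(geojson_text)["features"]:
        geometry = feature["geometry"]
        if geometry["type"] != "Polygon":
            continue  # MultiPolygon handling is left out of this sketch
        ring = [tuple(point) for point in geometry["coordinates"][0]]  # outer ring
        name = feature["properties"][name_key]
        result.append([name, pct_by_name.get(name, 0.0), ring, ring_bounds(ring)])
    return result


# pct_by_name would come from the population step (e.g. {"Barrio A": 75.0}).
neighbourhoods = geojson_to_list(TOY_GEOJSON, {"Barrio A": 75.0})
print(neighbourhoods[0][0], neighbourhoods[0][3])
```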
3- Geoposition each point.
Next, we place points inside our city according to the number of people in each neighbourhood, one point per person.
We randomly generate the coordinates of each “person”. As you can see, we can only generate a geoposition inside a square (defined by the bounds of the neighbourhood). To put each point in the correct position, we check that it actually lies inside the polygon. If it doesn’t, we just generate it again.
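Here is a minimal sketch of this rejection-sampling step. The article uses Shapely for the inside test (`Polygon(ring).contains(Point(x, y))`). To keep the example dependency-free, the hypothetical `point_in_polygon` below is a hand-written ray-casting test instead. Coordinates are (x, y) = (lon, lat) pairs, and the bounds tuple follows Shapely's (min_x, min_y, max_x, max_y) order.

```python
import random


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the closed ring [(x, y), ...]."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        # Count crossings of a horizontal ray going right from (x, y).
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def random_points_in_polygon(ring, bounds, count, rng=random):
    """Rejection sampling: draw inside the bounding box, keep only points in the ring."""
    min_x, min_y, max_x, max_y = bounds
    points = []
    while len(points) < count:
        x, y = rng.uniform(min_x, max_x), rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, ring):  # outside the polygon? just try again
            points.append((x, y))
    return points


# Usage on a toy polygon: a right triangle covering half of the unit square.
triangle = [(0, 0), (1, 0), (0, 1), (0, 0)]
people_points = random_points_in_polygon(triangle, (0, 0, 1, 1), 1000, random.Random(42))
```

In the article's flow, `count` would be the number of people in the neighbourhood, and `bounds` would be the value stored in step 2.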
The number of times you need to redo the random positioning depends on the ratio between the area of the square and the area of the actual polygon. Let:
Pol = area of the polygon
Sq = area of the square
The probability of a random point landing inside the polygon is:
P = Pol/Sq
So the expected number of attempts (N) needed to place H points is:
N = H * (Sq/Pol)
For example, if the polygon has half the area of the square, P = 0.5 and you need about 2H attempts.
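This quick simulation checks N = H * (Sq/Pol) empirically. It uses a toy polygon: the triangle x + y <= 1, which covers half of the unit square, so Sq/Pol = 2. `attempts_to_place` is a hypothetical helper.

```python
import random


def attempts_to_place(h, inside, rng):
    """Draw uniform points in the unit square until h of them land inside."""
    attempts = placed = 0
    while placed < h:
        attempts += 1
        if inside(rng.random(), rng.random()):
            placed += 1
    return attempts


H = 10_000
n = attempts_to_place(H, lambda x, y: x + y <= 1, random.Random(0))
ratio = n / H  # should be close to Sq/Pol = 2
print(ratio)
```

Each attempt succeeds with probability P = Pol/Sq, so the number of attempts per accepted point has mean 1/P. A thin neighbourhood inside a large bounding box would push the ratio much higher.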
4- PLOT
Finally, we plot our map and get an HTML file to analyse.
02- Process
Now that we have our heat map with ALL the people living in the city, we need to filter out the people who are already inside the coverage area of a nearby store.
The estimated coverage area of a store is a circle with r = 1.2 km (based on experience). If customers have to travel farther than that from home, they will probably choose another company. Therefore, we need to focus on the potential customers who live more than 1.2 km from a store.
To do that, we remove from our dataset all the people who are inside the “coverage area” and redo the HeatMap.
Long story short, we get:
- Redistributed HeatMap.
- A file with all the people far from a store
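Here is a minimal sketch of that filter. It assumes that people and stores are (lat, lon) pairs in decimal degrees and reuses the 1.2 km radius from the text. The helper names are hypothetical. The brute-force loop checks every person against every store, so for a whole city you may prefer a spatial index (for example, a k-d tree on projected coordinates).

```python
import math

COVERAGE_RADIUS_KM = 1.2  # coverage radius from the article (based on experience)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def split_by_coverage(people, stores, radius_km=COVERAGE_RADIUS_KM):
    """Return (covered, uncovered): covered means within radius_km of any store."""
    covered, uncovered = [], []
    for lat, lon in people:
        if any(haversine_km(lat, lon, s_lat, s_lon) <= radius_km for s_lat, s_lon in stores):
            covered.append((lat, lon))
        else:
            uncovered.append((lat, lon))
    return covered, uncovered


# Toy data: 0.005° of latitude is about 0.56 km, 0.02° is about 2.2 km.
stores = [(0.0, 0.0)]
people = [(0.0, 0.0), (0.005, 0.0), (0.02, 0.0)]
covered, uncovered = split_by_coverage(people, stores)
print(len(covered), len(uncovered))  # 2 1
```

The `uncovered` list is what feeds the redistributed heat map.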
03- Optimization
Now we need to find the best spot for our new store. We are going to use an optimization algorithm based on greedy best-first search. In practice, it behaves like hill climbing: keep moving to a better neighbour until there is none.
What are you talking about?
Let’s imagine that we are in a maze. Specifically, this maze:
Each square of the maze has a specific value, and we need to pick the square that maximizes our value.
Let’s get rid of the 10-point squares, because they have the lowest values in the maze.
Now that the maze is cleaner, we start at position X (the seed), and we can only move left, right, up or down.
We check our current value:
Our current value is 20. We check our surroundings and decide to “explore” up and to “explore” left, the neighbours with higher values.
We now iterate until we reach a square whose value is bigger than those of its four neighbours.
We end up with 2 dead ends, and apparently 90 is the best we can get from this maze. We pick that one.
Why don’t we get to the 100 square?!?!
Well, with only one seed we only get a relative (local) maximum.
To have a good chance of finding the absolute maximum, we need a few more seeds.
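Here is a minimal sketch of the maze search on a hypothetical grid. The article's maze is only shown as an image, so these values are made up. From the seed, the search follows every neighbour with a higher value, like the two branches above, and keeps the best dead end (local maximum) it reaches. `climb` and `best_of_seeds` are hypothetical names. Strictly speaking, this is a branching hill climb rather than a textbook greedy best-first search driven by a priority queue.

```python
def climb(grid, seed):
    """From seed (row, col), follow every strictly better up/down/left/right
    neighbour and return (value, cell) of the best dead end (local maximum)."""
    rows, cols = len(grid), len(grid[0])
    stack, seen = [seed], {seed}
    best = (grid[seed[0]][seed[1]], seed)
    while stack:
        r, c = stack.pop()
        value = grid[r][c]
        dead_end = True
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] > value:
                dead_end = False
                if (nr, nc) not in seen:
                    seen.add((nr, nc))
                    stack.append((nr, nc))
        if dead_end and value > best[0]:
            best = (value, (r, c))
    return best


def best_of_seeds(grid, seeds):
    """Run climb() from several seeds and keep the highest local maximum."""
    return max(climb(grid, seed) for seed in seeds)


MAZE = [
    [10, 30, 10, 100],
    [50, 90, 10, 60],
    [20, 40, 10, 70],
    [10, 20, 30, 10],
]
print(climb(MAZE, (2, 1)))                    # (90, (1, 1)): stuck on a local max
print(best_of_seeds(MAZE, [(2, 1), (1, 3)]))  # (100, (0, 3))
```

With one seed, the search stops at 90 because every neighbour of that square is lower. The second seed starts on the slope of the 100 square and finds it. More seeds raise the odds of reaching the absolute maximum, but they do not guarantee it.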
OK, but how can this method help us pick the right location for the store?
Let’s pick a seed on our map.
In the script, the value is the number of people inside the coverage area (we got that information from the previous program).
The value of my seed is 308. We check the surroundings, and only the bottom point has a bigger value than the seed. So, we move to that position and repeat.
The value reached is 363, and that is the maximum for that seed.
If we look at the bigger picture, we realize that we didn’t explore many possible locations for our store.
When we add a few more seeds, this is what happens:
We still have 363 as a relative maximum (blue), but we also find an absolute maximum with a value of 464 (green).
The absolute maximum is not the exact point where you have to set up your new store. You could search inside that coverage area and probably find a spot with an even bigger value. But, being realistic, the rental locations available for a store are limited. Once you have an approximation, you are better off comparing other features (like rent cost or accessibility to the store).
The code and its comments are self-explanatory.
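Here is a minimal sketch of the same idea applied to the map. It assumes that the value of a candidate spot is the number of still-uncovered people (the output of the second program) within 1.2 km. It also assumes that the search moves in fixed steps of 0.01° north, south, east or west. The step size, function names and toy data are all hypothetical. As in the walkthrough, the search moves to the best neighbour that beats the current value and stops at a relative maximum. `best_of_seeds` keeps the best result over several seeds.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def coverage_value(spot, people, radius_km=1.2):
    """Number of uncovered people within radius_km of a candidate spot."""
    return sum(1 for lat, lon in people if haversine_km(spot[0], spot[1], lat, lon) <= radius_km)


def hill_climb(seed, people, step_deg=0.01, radius_km=1.2):
    """Move to the best N/S/E/W neighbour while it beats the current value."""
    spot, value = seed, coverage_value(seed, people, radius_km)
    while True:
        lat, lon = spot
        neighbours = [(lat + step_deg, lon), (lat - step_deg, lon),
                      (lat, lon + step_deg), (lat, lon - step_deg)]
        best_value, best_spot = max((coverage_value(n, people, radius_km), n) for n in neighbours)
        if best_value <= value:
            return spot, value  # relative maximum for this seed
        spot, value = best_spot, best_value


def best_of_seeds(seeds, people, **kwargs):
    """Run hill_climb from each seed and keep the spot with the highest value."""
    return max((hill_climb(s, people, **kwargs) for s in seeds), key=lambda result: result[1])


# Toy data: 21 uncovered people along one street, 0.002° of longitude apart.
people = [(-34.60, -58.40 + k * 0.002) for k in range(-10, 11)]
spot, value = hill_climb((-34.60, -58.37), people)
print(spot, value)  # ends at roughly (-34.60, -58.40) with 13 people covered
```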
04- Indicator
We are ready to start setting up new stores all over the place. But how are we going to measure how well we are doing?
For that, there is a fourth script that, combined with the second one, gives us two main indicators:
- Amount of people reached
- Area of the city where we have coverage (% of the total)
Working out the combined area of all those overlapping coverage circles by hand could be pretty annoying. However, we can sort this out by using the Monte Carlo method.
The essence of the Monte Carlo method is to place a large number of points, uniformly at random, in a space of known area and count how many of them fall inside the area being analysed. That fraction, multiplied by the known area, estimates the analysed area.
The script consists of two main parts:
1- Create a list of uniformly random points that lie inside the city.
2- Check how many of these points actually fall inside the coverage area.
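Here is a minimal sketch of both parts. It assumes that the city boundary is a single polygon ring and that coordinates are planar (for example, projected and in km), so plain Euclidean distance can be used. With raw lat/lon you would swap in a haversine distance. The helper names and the toy square city are hypothetical. Each sample point is counted once, no matter how many circles contain it, so overlapping coverage areas are handled automatically.

```python
import math
import random


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the closed ring."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def covered_area_share(city_ring, stores, radius, samples=100_000, seed=0):
    """Monte Carlo estimate of the fraction of the city within radius of any store."""
    rng = random.Random(seed)
    xs, ys = [p[0] for p in city_ring], [p[1] for p in city_ring]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    inside_city = covered = 0
    while inside_city < samples:
        # Part 1: uniform random point in the bounding box, kept only if inside the city.
        x, y = rng.uniform(min_x, max_x), rng.uniform(min_y, max_y)
        if not point_in_polygon(x, y, city_ring):
            continue
        inside_city += 1
        # Part 2: is it inside at least one store's coverage circle?
        if any(math.hypot(x - sx, y - sy) <= radius for sx, sy in stores):
            covered += 1
    return covered / inside_city


# Toy city: a 10 km x 10 km square with one store in the middle.
city = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
share = covered_area_share(city, [(5, 5)], radius=1.2)
print(share)  # close to pi * 1.2**2 / 100, about 0.045
```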
The results:
- 37% of the people are inside the coverage area of the stores.
- 30% of the city AREA is covered by the stores.