Design a proximity server that implements local search

Michael Tong
3 min read · Jun 2, 2023

With this problem, we have to design a system that handles “places nearby” queries: given a latitude and longitude, find all the places within a given distance.

Well…how do we go about this?

Solution 1: Given the latitude and longitude bounds of a square around the query point, grab all the locations within that square.

Here is what the query would look like:

SELECT * FROM places WHERE lat >= X - d AND lat <= X + d AND long >= Y - d AND long <= Y + d

With separate indices on lat and long, this fetches two sets of matching rows, one per index. While this works, one or both of these sets can be big, and intersecting them can be tremendously draining for the system.
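Note also that the square over-selects: a point in the corner of the square can be farther than the requested distance. One common fix is to re-filter the candidates by true great-circle distance. A minimal sketch, assuming rows come back as (lat, lon) tuples; the haversine formula and the function names here are illustrative, not from the article.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two coordinates, in kilometers.
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def filter_by_distance(rows, lat, lon, d_km):
    # Keep only the candidates inside the true radius, not just the square.
    return [row for row in rows if haversine_km(lat, lon, row[0], row[1]) <= d_km]
```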

Solution 2: Separate everything into grids

Given a latitude, longitude and distance, let’s compute all the grids that the search area touches and find all the locations within those grids. After fetching the candidates, we can filter out any locations that are outside the requested distance.
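One simple way to compute those grids is a fixed-size cell scheme: map every coordinate to an integer cell, then enumerate the cells overlapping the search square. A sketch under assumed parameters; the cell size here is a tunable I picked for illustration, not a value from the article.

```python
CELL_DEG = 0.01  # assumed cell size in degrees (~1 km of latitude per cell)

def cell_of(lat, lon):
    # Map a coordinate to its integer grid cell.
    return (int(lat // CELL_DEG), int(lon // CELL_DEG))

def cells_in_range(lat, lon, d_deg):
    # Enumerate every cell overlapping the (2d x 2d) search square.
    r0, c0 = cell_of(lat - d_deg, lon - d_deg)
    r1, c1 = cell_of(lat + d_deg, lon + d_deg)
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
```

A query then only touches the locations indexed under these cell IDs, instead of every row matching one latitude or longitude band.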

While this effectively eliminates the huge candidate pool we dealt with in solution 1, there is a possibility that one grid holds a disproportionate number of restaurants while other grids hold few or none.


Hmmm…how do we solve this problem?

What we can do is design the grids dynamically. For grids with a disproportionate number of locations, we can assign finer-grained grids specifically. For the purpose of this problem, we do not have to go into too much detail on how the dynamic grids are implemented.
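One common way to get dynamic grids is a quadtree: a cell splits into four children once it holds more than some threshold of places, so dense areas end up with small cells and sparse areas with large ones. A minimal sketch under assumed names and threshold; it is not the article's prescribed implementation.

```python
MAX_PER_NODE = 4  # assumed split threshold

class QuadNode:
    def __init__(self, lat0, lon0, lat1, lon1):
        self.bounds = (lat0, lon0, lat1, lon1)
        self.points = []      # (lat, lon) pairs held while this node is a leaf
        self.children = None  # four QuadNodes once the leaf splits

    def insert(self, lat, lon):
        if self.children is not None:
            self._child_for(lat, lon).insert(lat, lon)
            return
        self.points.append((lat, lon))
        if len(self.points) > MAX_PER_NODE:
            self._split()

    def _split(self):
        # Turn this leaf into four quadrants and push its points down.
        lat0, lon0, lat1, lon1 = self.bounds
        mlat, mlon = (lat0 + lat1) / 2, (lon0 + lon1) / 2
        self.children = [
            QuadNode(lat0, lon0, mlat, mlon), QuadNode(lat0, mlon, mlat, lon1),
            QuadNode(mlat, lon0, lat1, mlon), QuadNode(mlat, mlon, lat1, lon1),
        ]
        for p in self.points:
            self._child_for(*p).insert(*p)
        self.points = []

    def _child_for(self, lat, lon):
        lat0, lon0, lat1, lon1 = self.bounds
        mlat, mlon = (lat0 + lat1) / 2, (lon0 + lon1) / 2
        return self.children[(2 if lat >= mlat else 0) + (1 if lon >= mlon else 0)]
```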

The next part of this question involves deciding where to store all the indices for these locations. The question is: how do we design a system that keeps 100k locations readily available in 200 milliseconds or less?

That can vary depending on where we store the data. Before we decide, let’s understand what the worst-case scenario looks like. Let’s say there are 50M places and each of these places takes 2 bytes of data. That’s around 100 MB of data, which is not a lot in today’s computer world.
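As a quick sanity check, the arithmetic above works out like this:

```python
# Back-of-envelope estimate using the numbers above.
places = 50_000_000
bytes_per_place = 2  # the article's assumed per-place footprint
total_bytes = places * bytes_per_place
print(total_bytes / 1e6)  # -> 100.0 (about 100 MB)
```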

Let’s think from another perspective: we can store this information either in a database or in memory. If the amount of data we store is massive (maybe billions or trillions of bytes), then a database is probably the better option, since we can shard it.

However, in this case memory should be sufficient for the data storage. Not to mention, it is also faster to access memory compared to a database.


What if the amount of locations doesn’t fit into memory?

Well…we can partition all these locations by some sort of place index or by region. When an update happens to a location, we locate the right region and update the location there.
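Routing an update under region-based partitioning can be as simple as hashing a coarse region key onto a shard, so all updates for the same region land on the same server. A sketch with assumed names and an assumed shard count:

```python
NUM_SHARDS = 16  # assumed shard count

def region_of(lat, lon):
    # Coarse region key: one region per whole degree of lat/lon.
    return (int(lat // 1), int(lon // 1))

def shard_for(lat, lon):
    # Hash the region key onto a shard; every update in the same
    # region deterministically lands on the same shard.
    return hash(region_of(lat, lon)) % NUM_SHARDS
```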

How about reads? There are two ways to go about it:

  • region-based sharding: pick the shard based on region and query its index. How do we handle locations that are non-uniformly distributed among grids? We can try to dynamically redistribute the popular regions again, but doing so brings great complexity.
  • place-id-based sharding: send the query to all the servers. Each server returns a list of locations, and a central server aggregates this data and returns a combined list.
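The second option is a classic scatter-gather: fan the query out, then merge the per-shard results at a coordinator. A minimal sketch where in-process callables stand in for shard servers (in reality these would be network calls), assuming each shard returns its results sorted by distance:

```python
import heapq

def scatter_gather(shards, query, k):
    # Fan the query out to every shard, then merge the per-shard
    # (distance, place) lists into one globally sorted top-k.
    partials = [shard(query) for shard in shards]  # network calls in reality
    return heapq.nsmallest(k, heapq.merge(*partials))
```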

What if there are 100k queries per second? With either sharding scheme, you need at least 10–100 shards to handle all these queries; let’s say each of them handles 10k queries per second. If that is still too much for these servers to handle, you would need to increase the number of servers handling the requests.
