Creating a GeoSpatial Index with Python and MongoDB

It’s a fairly common use case: given some latitude and longitude coordinates, find results close to those points. Fortunately for us application developers, MongoDB makes these sorts of queries relatively painless. It has built-in support for geospatial querying that enables applications to search a dataset for points near a given set of coordinates.

In order to support geospatial queries, our data needs to be in a specific format, and we need to create an index that supports those queries. I will show you how to use PyMongo to create our collection, add some data, and then create the index.

PyMongo

PyMongo is the de facto standard for connecting to and performing operations on a MongoDB database from a Python application. I am going to assume that you are comfortable installing a Python module, so I will skip those steps and go straight into writing our Python code.

Connecting to MongoDB

To start, we need to import the PyMongo module and create a client to connect to our MongoDB database. I will use environment variables to configure the client and provide some reasonable defaults for local development.

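import sys
from os import environ
from urllib.parse import quote_plus

import pymongo

# Connection settings, with defaults suitable for local development
DB_USER = environ.get("DB_USER")
DB_PASS = environ.get("DB_PASS")
DB_NAME = environ.get("DB_NAME")
DB_HOST = environ.get("DB_HOST", "localhost")
DB_PORT = environ.get("DB_PORT", "27017")
DB_COLLECTION = environ.get("DB_COLLECTION", "locations")

# Credentials and database name have no sensible defaults, so require them
if not (DB_USER and DB_PASS and DB_NAME):
    sys.exit("DB_USER, DB_PASS, and DB_NAME must be set in environment")

# Escape the credentials so characters such as '@' or ':' don't break the URI
client = pymongo.MongoClient(
    "mongodb://%s:%s@%s:%s" % (quote_plus(DB_USER), quote_plus(DB_PASS), DB_HOST, DB_PORT)
)

db = client[DB_NAME]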
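If the username or password contains characters such as '@', ':' or '/', they need to be percent-escaped before going into the connection string, otherwise the URI is parsed incorrectly. Here is a minimal sketch of building the URI in isolation; build_mongo_uri is a hypothetical helper (not part of PyMongo) that relies only on urllib.parse.quote_plus from the standard library.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user: str, password: str, host: str = "localhost", port: str = "27017") -> str:
    """Build a mongodb:// URI with percent-escaped credentials (hypothetical helper)."""
    # quote_plus escapes reserved characters such as '@', ':' and '/'
    return "mongodb://%s:%s@%s:%s" % (quote_plus(user), quote_plus(password), host, port)


if __name__ == "__main__":
    # A password containing '@' and ':' would otherwise confuse the URI parser
    print(build_mongo_uri("app", "p@ss:word"))
```

The resulting string can be passed directly to pymongo.MongoClient.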

Loading Data from CSV

I am going to load data from a CSV file with three columns: name, latitude, and longitude. For each row, we will insert a document into the collection named by the DB_COLLECTION variable. The filename comes from another environment variable, CSV_FILE.

import csv
import sys
from os import environ

import pymongo

CSV_FILE = environ.get("CSV_FILE") or sys.exit("CSV_FILE must be set in environment")
...

db = client[DB_NAME]

# newline="" is what the csv module expects for files it reads
with open(CSV_FILE, newline="") as file:
    for name, lat, lon in csv.reader(file):
        pass  # TODO: insert document
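Because csv.reader yields every field as a string, the latitude and longitude need converting to numbers before they can go into a GeoJSON point, and it is worth checking that they fall in the valid ranges (latitude between -90 and 90, longitude between -180 and 180). Below is a minimal sketch of that parsing step; read_locations is a hypothetical helper, and it assumes the file has no header row, matching the loop above.

```python
import csv
import io
from typing import Iterable, Iterator, Tuple


def read_locations(lines: Iterable[str]) -> Iterator[Tuple[str, float, float]]:
    """Yield (name, lat, lon) tuples from name,latitude,longitude CSV rows (hypothetical helper)."""
    for line_no, (name, lat, lon) in enumerate(csv.reader(lines), start=1):
        lat_f, lon_f = float(lat), float(lon)
        # GeoJSON points require latitude in [-90, 90] and longitude in [-180, 180]
        if not (-90 <= lat_f <= 90 and -180 <= lon_f <= 180):
            raise ValueError("row %d: coordinates out of range: %s, %s" % (line_no, lat, lon))
        yield name, lat_f, lon_f


if __name__ == "__main__":
    sample = io.StringIO("Eiffel Tower,48.8584,2.2945\nSydney Opera House,-33.8568,151.2153\n")
    for row in read_locations(sample):
        print(row)
```

In the loader, the loop could iterate over read_locations(file) instead of csv.reader(file), so bad rows fail loudly before anything is inserted.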

Creating the Documents

In order to support geospatial querying, the documents we store in our collection must have a field that holds a supported GeoJSON object. This example uses the Point type, since each row gives us only a single pair of latitude and longitude coordinates, but the MongoDB documentation on GeoJSON objects lists other types (such as LineString and Polygon) that can support a wide range of geospatial queries.

Our documents will be formatted as follows:

{
    name: <name>,
    point: {
        type: 'Point', 
        coordinates: [longitude,latitude]
    }
}

The type field must be a valid GeoJSON type (see the documentation mentioned above for other types), and note that GeoJSON lists longitude before latitude in the coordinates array. Now that we know the shape of our documents, we can create them and add them one at a time to our collection.

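...
with open(CSV_FILE, newline="") as file:
    for name, lat, lon in csv.reader(file):
        document = {
            'name': name,
            'point': {
                'type': 'Point',
                # csv yields strings, but GeoJSON coordinates must be numbers;
                # longitude comes first
                'coordinates': [float(lon), float(lat)]
            }
        }
        db[DB_COLLECTION].insert_one(document)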
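Inserting one document per row means one round trip to the server per row. For larger files, an alternative is to build the documents first and send them in batches with the collection's insert_many method. The sketch below covers only the batching, which is plain Python; the helper names make_point_document and batched, and the batch size of 1000, are hypothetical choices rather than anything PyMongo requires.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def make_point_document(name: str, lat: float, lon: float) -> Dict:
    """Build a document with a GeoJSON Point; note longitude comes first."""
    return {"name": name, "point": {"type": "Point", "coordinates": [lon, lat]}}


def batched(docs: Iterable[Dict], size: int = 1000) -> Iterator[List[Dict]]:
    """Yield lists of at most `size` documents from an iterable (hypothetical helper)."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    rows = [("A", 1.0, 2.0), ("B", 3.0, 4.0), ("C", 5.0, 6.0)]
    docs = (make_point_document(*row) for row in rows)
    for batch in batched(docs, size=2):
        # With a live connection this would be: db[DB_COLLECTION].insert_many(batch)
        print(len(batch), batch[0]["point"]["coordinates"])
```

Because batched consumes a generator lazily, only one batch of documents is held in memory at a time, which keeps memory use flat for large CSV files.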

Adding an Index

There are two types of geospatial indexes to consider for our queries: 2d and 2dsphere. The difference comes down to geometry: a 2d index treats points as legacy coordinate pairs on a flat, two-dimensional plane, while a 2dsphere index calculates geometries on an Earth-like sphere and supports GeoJSON objects. This example uses the latter, since our points lie on the Earth’s surface and are stored as GeoJSON.
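To make the flat-versus-spherical difference concrete, the sketch below compares a great-circle (haversine) distance with a naive flat-plane distance measured in degrees. It only illustrates the geometry and is not a description of how MongoDB computes distances internally; the mean Earth radius of 6371 km is an assumed constant.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def flat_degrees(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Euclidean distance treating (lon, lat) as points on a flat plane."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


if __name__ == "__main__":
    # One degree of longitude at the equator and at 60 degrees north
    for lat in (0.0, 60.0):
        print(lat, flat_degrees(lat, 0, lat, 1), round(haversine_km(lat, 0, lat, 1), 1))
```

The flat distance is 1 degree in both cases, while the actual ground distance at 60° north is roughly half of what it is at the equator. That distortion is why spherical geometry matters for points on the Earth’s surface.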

PyMongo provides constants for the index types, so we don’t need to worry about typos. We just pass a list of (field, index type) pairs to the collection’s create_index method:

...
db[DB_COLLECTION].create_index([('point', pymongo.GEOSPHERE)])
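With the index in place, the original use case of finding results close to a point can be expressed with MongoDB's $near operator, which requires a geospatial index. Below is a minimal sketch that only builds the query filter as a plain dictionary; the helper name near_filter and the 5 km default are hypothetical. When $near is given a GeoJSON point, $maxDistance is measured in meters.

```python
from typing import Dict


def near_filter(lat: float, lon: float, max_distance_m: float = 5000, field: str = "point") -> Dict:
    """Build a $near filter for documents whose `field` holds a GeoJSON Point (hypothetical helper)."""
    return {
        field: {
            "$near": {
                # GeoJSON order again: longitude first, then latitude
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                # With a GeoJSON $geometry, $maxDistance is in meters
                "$maxDistance": max_distance_m,
            }
        }
    }


if __name__ == "__main__":
    print(near_filter(48.8584, 2.2945, max_distance_m=1000))
```

Passing this dictionary to db[DB_COLLECTION].find(...) should return matching documents ordered from nearest to farthest.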

Summary

Using PyMongo to create a geospatial index in MongoDB really can’t get much easier. One of the great things about Python is how little code you have to write to get these sorts of tasks done. With only a couple dozen lines of code we have connected to our Mongo database, imported points from a CSV file, added the items to our Mongo collection, and created the index we will need to support our use case.
