MongoDB: Connecting to NoSQL Databases

What's MongoDB?

MongoDB is a NoSQL database that stores data in JSON-like documents with flexible schemas instead of the traditional table-based structure. The document model maps to objects in application code, which makes it easier to work with. It has a rich query language that supports dynamic queries on documents. MongoDB also has its own aggregation pipeline (plus legacy map-reduce support, deprecated since MongoDB 5.0), which can reduce the need for separate data-processing pipelines. PyMongo is a Python library that contains tools for interacting with MongoDB databases. To install PyMongo from PyPI:

python -m pip install pymongo

For this article, we will be working with a local MongoDB instance. Instructions for downloading and installing can be found here. Additionally, I recommend installing MongoDB Compass to have a GUI to explore the data and see the changes made by the code.

Making a Connection with the MongoDB instance

When working with MongoDB databases, or any database for that matter, the first thing we need to do is make a connection. You can do so by creating a MongoClient instance:

import pymongo
client = pymongo.MongoClient()

This connects to the default host and port (localhost:27017). We can also specify the host and port explicitly:

client = pymongo.MongoClient('localhost', 27017)

Managing NoSQL Databases With MongoDB

MongoDB is a document-oriented database classified as NoSQL. It’s become popular throughout the industry in recent years and integrates extremely well with Python. Unlike traditional SQL RDBMSs, MongoDB uses collections of documents instead of tables of rows to organize and store data.

MongoDB stores data in schemaless and flexible JSON-like documents. Here, schemaless means that documents in the same collection can have different sets of fields, without having to satisfy a rigid table schema.

You can change the structure of your documents and data over time, which results in a flexible system that allows you to quickly adapt to requirement changes without a complex data-migration process. However, the trade-off of changing the structure of new documents is that existing documents become inconsistent with the updated schema, so this needs to be managed with care.

Note:

JSON stands for JavaScript Object Notation. It's a text-based data format with a human-readable structure consisting of key-value pairs that can be nested arbitrarily deep.

MongoDB is written in C++ and actively developed by MongoDB Inc. It runs on all major platforms, such as macOS, Windows, and most Linux distributions. In general, there are three main development goals behind the MongoDB database:

  • Scale well
  • Store rich data structures
  • Provide a sophisticated query mechanism

MongoDB is a distributed database, so high availability, horizontal scaling, and geographic distribution are built into the system. It stores data in flexible JSON-like documents. You can model these documents to map the objects in your applications, which makes it possible to work with your data effectively.
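To illustrate how documents can mirror application objects, here is a sketch assuming a hypothetical Customer dataclass: dataclasses.asdict() turns an instance into a plain dict that PyMongo can insert as-is, and a document read back can be turned into an instance again. The helper names to_document and from_document are also hypothetical.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Customer:
    """Hypothetical application object mirrored by a MongoDB document."""
    name: str
    address: str
    tags: list = field(default_factory=list)


def to_document(customer):
    # asdict() recursively converts the dataclass into plain dicts and lists,
    # which is the shape insert_one() expects.
    return asdict(customer)


def from_document(document):
    # Drop MongoDB's _id (not a Customer field) before rebuilding the object.
    fields = {k: v for k, v in document.items() if k != "_id"}
    return Customer(**fields)


john = Customer("John", "Highway 37", ["vip"])
doc = to_document(john)
print(doc)  # {'name': 'John', 'address': 'Highway 37', 'tags': ['vip']}
print(from_document({"_id": 1, **doc}) == john)  # True
```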

MongoDB provides a powerful query language that supports ad hoc queries, indexing, aggregation, geospatial search, text search, and a lot more. This presents you with a powerful tool kit to access and work with your data. Finally, MongoDB is freely available and has great Python support.

Reviewing MongoDB’s Features

As for the database management side, MongoDB offers the following features:

  • Query support: You can use many standard query types, such as matching (==), comparison (<, >), and regular expressions.
  • Data accommodation: You can store virtually any kind of data, be it structured, partially structured, or even polymorphic.
  • Scalability: It handles more queries just by adding more machines to the server cluster.
  • Flexibility and agility: You can develop applications with it quickly.
  • Document orientation and schemalessness: You can store all the information regarding a data model in a single document.
  • Adjustable schema: You can change the schema of the database on the fly, which reduces the time needed to provide new features or fix existing problems.
  • Relational database functionalities: You can perform actions common to relational databases, like indexing.

As for the operations side, MongoDB provides the following tools and features:

  • Scalability: Whether you need a stand-alone server or complete clusters of independent servers, you can scale MongoDB to whatever size you need it to be.
  • Load-balancing support: In a sharded cluster, MongoDB's balancer automatically migrates data between shards to keep it evenly distributed.
  • Automatic failover support: In a replica set, if the primary server goes down, the remaining members automatically elect a new primary.
  • Management tools: You can track your machines using the cloud-based MongoDB Cloud Manager (formerly the MongoDB Management Service, MMS).
  • Memory efficiency: Early versions relied on memory-mapped files (the MMAPv1 storage engine); since MongoDB 3.2 the default WiredTiger storage engine manages its own cache and supports compression.

Creating a Collection

To create a collection in MongoDB, use the database object and specify the name of the collection you want to create.

MongoDB will create the collection if it does not exist.

import pymongo

myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]

mycol = mydb["customers"]

Important

In MongoDB, a collection is not created until it gets content!

Insert Into Collection

To insert a record, or document as it is called in MongoDB, into a collection, we use the insert_one() method.

The first parameter of the insert_one() method is a dictionary containing the name(s) and value(s) of each field in the document you want to insert.

import pymongo

myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]

mydict = { "name": "John", "address": "Highway 37" }

x = mycol.insert_one(mydict)

Return the _id Field

The insert_one() method returns an InsertOneResult object, whose inserted_id attribute holds the _id of the inserted document.

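import pymongo

myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mycol = myclient["mydatabase"]["customers"]

mydict = { "name": "Peter", "address": "Lowstreet 27" }
x = mycol.insert_one(mydict)
print(x.inserted_id)  # the _id assigned to the new document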

If you do not specify an _id field, then MongoDB will add one for you and assign a unique id for each document.

In the example above no _id field was specified, so MongoDB assigned a unique _id for the record (document).

Insert Multiple Documents

To insert multiple documents into a collection in MongoDB, we use the insert_many() method.

The first parameter of the insert_many() method is a list containing dictionaries with the data you want to insert:

import pymongo

myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]

mylist = [
  { "name": "Amy", "address": "Apple st 652"},
  { "name": "Hannah", "address": "Mountain 21"},
  { "name": "Michael", "address": "Valley 345"},
  { "name": "Sandy", "address": "Ocean blvd 2"},
  { "name": "Betty", "address": "Green Grass 1"},
  { "name": "Richard", "address": "Sky st 331"},
  { "name": "Susan", "address": "One way 98"},
  { "name": "Vicky", "address": "Yellow Garden 2"},
  { "name": "Ben", "address": "Park Lane 38"},
  { "name": "William", "address": "Central st 954"},
  { "name": "Chuck", "address": "Main Road 989"},
  { "name": "Viola", "address": "Sideway 1633"}
]

x = mycol.insert_many(mylist)

#print list of the _id values of the inserted documents:
print(x.inserted_ids)

The insert_many() method returns an InsertManyResult object, whose inserted_ids attribute holds the _id values of the inserted documents.

Insert Multiple Documents, with Specified IDs

If you do not want MongoDB to assign unique ids to your documents, you can specify the _id field when you insert the document(s).

Remember that the values have to be unique. Two documents cannot have the same _id.

import pymongo

myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]

mylist = [
  { "_id": 1, "name": "John", "address": "Highway 37"},
  { "_id": 2, "name": "Peter", "address": "Lowstreet 27"},
  { "_id": 3, "name": "Amy", "address": "Apple st 652"},
  { "_id": 4, "name": "Hannah", "address": "Mountain 21"},
  { "_id": 5, "name": "Michael", "address": "Valley 345"},
  { "_id": 6, "name": "Sandy", "address": "Ocean blvd 2"},
  { "_id": 7, "name": "Betty", "address": "Green Grass 1"},
  { "_id": 8, "name": "Richard", "address": "Sky st 331"},
  { "_id": 9, "name": "Susan", "address": "One way 98"},
  { "_id": 10, "name": "Vicky", "address": "Yellow Garden 2"},
  { "_id": 11, "name": "Ben", "address": "Park Lane 38"},
  { "_id": 12, "name": "William", "address": "Central st 954"},
  { "_id": 13, "name": "Chuck", "address": "Main Road 989"},
  { "_id": 14, "name": "Viola", "address": "Sideway 1633"}
]

x = mycol.insert_many(mylist)

#print list of the _id values of the inserted documents:
print(x.inserted_ids)

Find One

In PyMongo we use the find() and find_one() methods to find data in a collection (the MongoDB shell calls them find() and findOne()).

They play the same role as the SELECT statement used to find data in a table in a MySQL database.

To select data from a collection in MongoDB, we can use the find_one() method.

The find_one() method returns the first matching document in the selection, or None if no document matches.

import pymongo

myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]

x = mycol.find_one()

print(x)

Find All

To select data from a collection in MongoDB, we can also use the find() method.

The find() method returns a Cursor that iterates over all matching documents.

The first parameter of the find() method is a query object. Passing no query object, as in the example below, or an empty one ({}) selects all documents in the collection.

Calling find() with no parameters gives the same result as SELECT * in MySQL.

import pymongo

myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]

for x in mycol.find():
  print(x)

Return Only Some Fields

The second parameter of the find() method is a dictionary, called a projection, describing which fields to include in the result.

This parameter is optional, and if omitted, all fields will be included in the result.

import pymongo

myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]

for x in mycol.find({},{ "_id": 0, "name": 1, "address": 1 }):
  print(x)