Documents are addressed in the database via an (indexed) unique key.
Enable retrieving documents based on content, not only keys.
Schema-free organization of data:
Documents are flexible, semistructured, and have a hierarchical nature.
Any sort of document can be stored, and those documents can change in form at any time.
No schema update is required and no database downtime is necessary.
Ability to add additional metadata outside of the content of the document.
Enable flexible indexing, powerful ad hoc queries, and analytics over collections of documents.
Use cases:
Where documents evolve over time (catalogs, user profiles, content management)
When prototyping (web apps)
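The ideas above (key-based addressing, content-based retrieval, and schema-free storage) can be sketched with a toy in-memory store. This is a minimal illustration only: the `DocumentStore` class and its methods are hypothetical names, not the API of any real database.

```python
import uuid


class DocumentStore:
    """Toy in-memory document store (illustrative only, not a real database API)."""

    def __init__(self):
        self._docs = {}  # unique key -> document (a dict of any shape)

    def insert(self, doc):
        key = uuid.uuid4().hex  # generated unique key
        self._docs[key] = doc
        return key

    def get(self, key):
        # Key-based addressing: a direct lookup.
        return self._docs[key]

    def find(self, predicate):
        # Content-based retrieval: scan documents and test their content.
        return [doc for doc in self._docs.values() if predicate(doc)]


store = DocumentStore()
book_key = store.insert({"type": "book", "title": "Dune", "tags": ["sci-fi"]})
# A document with a different shape goes into the same store; no schema change.
store.insert({"type": "user", "name": "Ada", "address": {"city": "London"}})

by_key = store.get(book_key)
londoners = store.find(lambda d: d.get("address", {}).get("city") == "London")
```

A real document database would answer `find` from indexes rather than a full scan; the scan here only keeps the sketch short.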
MongoDB
MongoDB is a NoSQL document database that stores data as JSON-like documents.
The Community Server is free to use: releases since October 2018 are licensed under the Server Side Public License (SSPL), and earlier releases under the GNU AGPL v3.
A commercial license is also available from the vendor, MongoDB, Inc.
Supports a flexible data model for storing data of any structure:
For example, hierarchical relationships, arrays, and other complex structures.
BSON format adds support for data types like date that aren’t supported in JSON.
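To see why a native date type matters, the sketch below uses only Python's standard `json` module (it does not use BSON or any MongoDB driver). It shows that plain JSON has no encoding for a date value, so applications typically fall back to a string convention such as ISO 8601. The document contents are made up for illustration.

```python
import json
from datetime import datetime, timezone

doc = {"title": "release", "published": datetime(2020, 1, 15, tzinfo=timezone.utc)}

try:
    json.dumps(doc)
    json_accepts_dates = True
except TypeError:
    # The json module has no native encoding for datetime values.
    json_accepts_dates = False

# Common JSON workaround: store the date as an ISO 8601 string and parse it back.
as_json = json.dumps({**doc, "published": doc["published"].isoformat()})
restored = datetime.fromisoformat(json.loads(as_json)["published"])
```

With a format that carries a date type, the conversion to and from strings is not needed, and range comparisons on dates do not depend on a string convention.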
Provides high-performance data persistence.
Indexes are created to improve the performance of searches.
Supports primary and secondary indexes.
Secondary indexes can include keys from embedded documents and arrays.
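As a rough illustration of how a secondary index can cover keys inside embedded documents and arrays, the sketch below builds a value-to-document-id map in plain Python. The helper names `get_path` and `build_index` are hypothetical, and real index structures (typically B-trees) are considerably more involved.

```python
from collections import defaultdict


def get_path(doc, dotted_path):
    """Follow a dotted path such as 'address.city' into embedded documents."""
    value = doc
    for part in dotted_path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def build_index(docs, dotted_path):
    """Map each indexed value to the ids of the documents that contain it."""
    index = defaultdict(list)
    for doc_id, doc in docs.items():
        value = get_path(doc, dotted_path)
        # An array field yields one index entry per element.
        values = value if isinstance(value, list) else [value]
        for v in values:
            if v is not None:
                index[v].append(doc_id)
    return index


docs = {
    1: {"name": "Ada", "address": {"city": "London"}, "tags": ["math", "code"]},
    2: {"name": "Linus", "address": {"city": "Helsinki"}, "tags": ["code"]},
}
city_index = build_index(docs, "address.city")  # key from an embedded document
tag_index = build_index(docs, "tags")           # key from an array
```

A lookup such as `city_index["London"]` then returns matching ids without scanning every document.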
Ad hoc queries:
Provides a rich query language for CRUD operations (comparable in role to SQL).
Supports field queries, range queries, and regular-expression searches.
Provides the aggregation pipeline, a framework for data aggregation modeled on the concept of data-processing pipelines, as an alternative to MapReduce.
Supports query operations that perform a text search of string content.
Allows for geospatial indexing to efficiently execute spatial queries.
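The field, range, and regular-expression queries listed above can be sketched with a small matcher written in plain Python. The operator names `$gt`, `$lt`, and `$regex` follow the MongoDB query language, but the `matches` and `find` functions are hypothetical stand-ins that cover only this tiny subset, and the product data is invented.

```python
import re


def matches(doc, query):
    """Return True if doc satisfies every condition in query."""
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict):  # operator form, e.g. {"$gt": 10}
            for op, arg in cond.items():
                if op == "$gt":
                    ok = value is not None and value > arg
                elif op == "$lt":
                    ok = value is not None and value < arg
                elif op == "$regex":
                    ok = isinstance(value, str) and re.search(arg, value) is not None
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
        elif value != cond:  # plain equality on a field
            return False
    return True


products = [
    {"name": "laptop", "price": 1200, "category": "computers"},
    {"name": "laptop bag", "price": 40, "category": "accessories"},
    {"name": "mouse", "price": 25, "category": "accessories"},
]


def find(query):
    return [p["name"] for p in products if matches(p, query)]


field_hits = find({"category": "accessories"})         # field query
range_hits = find({"price": {"$gt": 30, "$lt": 100}})  # range query
regex_hits = find({"name": {"$regex": "^lap"}})        # regular expression
```

The queries are ordinary data (nested dictionaries), which is what makes ad hoc querying convenient: an application can build them at run time without a fixed schema.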
Replica set (group of servers) provides automatic failover and data redundancy:
The primary replica interacts with the client and performs CRUD operations.
The secondary replicas maintain a copy of the data of the primary replica.
This is one of the key features that make MongoDB production-ready.
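The primary/secondary roles and automatic failover can be pictured with the toy model below. It is a deliberately simplified sketch: replication is an immediate copy, and the "election" just picks a surviving member by name, which is an assumption for illustration and not MongoDB's actual election protocol. The `ReplicaSet` class and node names are hypothetical.

```python
class ReplicaSet:
    """Toy model: one primary takes writes, secondaries keep copies."""

    def __init__(self, names):
        self.data = {name: {} for name in names}  # each member's copy of the data
        self.alive = set(names)
        self.primary = names[0]

    def write(self, key, value):
        # Clients write to the primary only.
        self.data[self.primary][key] = value
        # Replication, simplified to an immediate copy to live secondaries.
        for name in self.alive - {self.primary}:
            self.data[name][key] = value

    def read(self, key):
        return self.data[self.primary][key]

    def fail(self, name):
        self.alive.discard(name)
        if name == self.primary:
            # Stand-in for an election: pick any surviving member.
            self.primary = sorted(self.alive)[0]


rs = ReplicaSet(["node-a", "node-b", "node-c"])
rs.write("user:1", {"name": "Ada"})
rs.fail("node-a")             # the primary goes down
new_primary = rs.primary      # a secondary has taken over
survived = rs.read("user:1")  # the data is still available
```

Because every secondary already holds a copy, promoting one of them restores service without restoring from backup, which is the redundancy the replica set provides.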
Uses the concept of sharding to scale horizontally:
Sharding distributes data across a cluster of machines.
Load balancing: Supports creating zones of data based on the shard key.
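A minimal sketch of distributing documents by shard key follows. It assumes a simple hash-modulo placement, which is a simplification: MongoDB itself partitions data into chunks covering ranges of (optionally hashed) shard key values and can move chunks between shards. The shard names and the `shard_for` helper are hypothetical.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]


def shard_for(shard_key_value):
    """Pick a shard from a stable hash of the shard key value."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


cluster = {name: [] for name in SHARDS}
for user_id in range(1000):
    doc = {"user_id": user_id}  # user_id plays the role of the shard key
    cluster[shard_for(doc["user_id"])].append(doc)

sizes = {name: len(docs) for name, docs in cluster.items()}
```

The same key always maps to the same shard, so a query that includes the shard key can be routed to one machine instead of being broadcast to all of them.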
Multiple clients can read and write the same data concurrently; locking keeps concurrent operations consistent.
Uses internal memory for storing the (windowed) working set, enabling faster access of data.
Supports multiple storage engines.
Has BI connector for SQL tools.
Atlas Data Lake allows users to query data stored on AWS S3 with the MongoDB Query Language, regardless of its format (JSON, BSON, CSV, TSV, Parquet, or Avro).
Documents (i.e. objects) correspond to native data types in many programming languages.
Documents are stored in collections:
Collections are analogous to tables in relational databases.
But they do not enforce any document structure.
Data modeling should be based on data retrieval patterns.
For example, heavy query usage benefits from the use of indexes.
A write operation is atomic on the level of a single document, even with embedded documents.
An operation that modifies multiple documents is not atomic as a whole.
Since version 4.0, however, MongoDB supports multi-document ACID transactions.
Dynamic schema supports fluent polymorphism.
To make schema stricter, define validation rules (on a per-collection basis)
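Per-collection validation rules can be illustrated as follows. The rule is written in the shape of a MongoDB `$jsonSchema` validator, but the `is_valid` checker and the `PY_TYPES` mapping are hypothetical helpers that handle only `required` and `bsonType`; a real server-side validator supports many more keywords.

```python
# Rule written in the shape of a MongoDB $jsonSchema validator.
user_rule = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "age"],
        "properties": {
            "name": {"bsonType": "string"},
            "age": {"bsonType": "int"},
        },
    }
}

# Hypothetical mapping from a few BSON type names to Python types.
PY_TYPES = {"string": str, "int": int, "object": dict}


def is_valid(doc, rule):
    """Check only 'required' and 'bsonType'; real validators support far more."""
    schema = rule["$jsonSchema"]
    if any(field not in doc for field in schema.get("required", [])):
        return False
    for field, spec in schema.get("properties", {}).items():
        if field in doc and not isinstance(doc[field], PY_TYPES[spec["bsonType"]]):
            return False
    return True


ok = is_valid({"name": "Ada", "age": 36, "nickname": "countess"}, user_rule)
missing = is_valid({"name": "Ada"}, user_rule)
wrong_type = is_valid({"name": "Ada", "age": "36"}, user_rule)
```

Note that the extra `nickname` field is accepted: the rule tightens only the fields it names, so the rest of the document stays flexible.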
Design your database in a way that the most common queries can be satisfied by querying a single collection, even when this means that you will have some redundancy in your database.
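The trade-off in that design rule can be sketched with plain dictionaries standing in for collections. The collection names, fields, and helper functions below are invented for illustration; the point is only that the denormalized form answers the common query from one collection at the cost of a redundant copy.

```python
# Normalized: "show a post with its author's name" needs two lookups.
authors = {"a1": {"name": "Ada"}}
posts_normalized = {"p1": {"title": "On Engines", "author_id": "a1"}}


def post_view_normalized(post_id):
    post = posts_normalized[post_id]
    author = authors[post["author_id"]]  # lookup in a second collection
    return {"title": post["title"], "author": author["name"]}


# Denormalized: the author's name is copied into each post (redundant),
# so the common query touches a single collection.
posts_embedded = {
    "p1": {"title": "On Engines", "author": {"id": "a1", "name": "Ada"}},
}


def post_view_embedded(post_id):
    post = posts_embedded[post_id]
    return {"title": post["title"], "author": post["author"]["name"]}


same_answer = post_view_normalized("p1") == post_view_embedded("p1")
```

The cost of the redundancy shows up on writes: if an author's name changes, every post that embeds it has to be updated as well.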