← Back to blogBackend

PostgreSQL or MongoDB: how I actually decide

July 9, 2023·6 min read

The PostgreSQL vs MongoDB debate is usually framed as relational vs document, SQL vs NoSQL, structured vs flexible. These distinctions are technically accurate and practically useless. Both databases can handle most workloads. The question is which one handles your specific workload with less friction.

I've used both in production across different projects. The decision framework I use isn't based on database theory. It's based on three practical questions about the project.

Question 1: what do your queries look like?

If you need to join data from multiple sources in a single query, PostgreSQL is almost always the right choice. Relational joins are what PostgreSQL was built for. They're performant, well-optimized, and the query planner is extraordinarily good at them.

MongoDB has a $lookup aggregation stage that functions like a join, but it isn't the same. It doesn't benefit from the decades of join optimization that PostgreSQL has. For simple lookups, it works. For complex multi-table joins with conditions and aggregations, PostgreSQL is faster and the query is more readable.

If your queries are primarily "give me this document by its ID" or "give me all documents matching these field values," MongoDB handles this with minimal overhead. The document is stored as a single blob and returned without assembly. There's no join because all the data is in one place.

For the healthcare application I work on, the core data model is sessions: a session connects a hospital staff member, an interpreter, and a patient. A session has metadata, messages, duration tracking, and status history. In PostgreSQL, this would be a sessions table joined with users, messages, status_changes, and languages tables. In MongoDB, the session document contains everything: participant references, inline messages, embedded status history.

We use PostgreSQL for the financial and audit data (billing records, compliance logs) and MongoDB for session data. The session data benefits from document storage because a single read returns everything needed to render the session view.

Question 2: how much will your schema change?

If the schema is well-defined and stable, PostgreSQL's strict schema is an advantage. Migrations are explicit. Schema changes are reviewed and tested. You can't accidentally store a number where a string should be.

If the schema is evolving rapidly, or if different documents in the same collection legitimately have different structures, MongoDB's schema flexibility reduces friction. You can add fields without migrations. Different documents can have different shapes.

The nuance: MongoDB's flexibility can become a problem. Without schema validation, it's easy to end up with documents that have inconsistent field names, missing fields, or wrong types. MongoDB supports schema validation rules, and using them is strongly recommended for any production system.

db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["email", "name", "createdAt"],
      properties: {
        email: { bsonType: "string" },
        name: { bsonType: "string" },
        createdAt: { bsonType: "date" },
        preferences: { bsonType: "object" }
      }
    }
  }
});

This gives you some of PostgreSQL's safety while keeping the flexibility for unstructured sub-documents like preferences.

Question 3: what are your operational requirements?

PostgreSQL has ACID transactions that span multiple tables. If you need to update a user's balance and record a transaction and send a notification as an atomic operation, PostgreSQL handles this natively.

MongoDB added multi-document transactions in version 4.0, but the implementation has higher overhead than PostgreSQL's. For transaction-heavy workloads, PostgreSQL is more efficient.

PostgreSQL's replication and backup ecosystem is mature. Point-in-time recovery, streaming replication, logical replication for zero-downtime upgrades. The tooling is well-documented and battle-tested.

MongoDB's replication (replica sets) is straightforward to set up and provides automatic failover. For teams without dedicated database administrators, MongoDB's operational simplicity is a real advantage.

The project where the choice was unclear

A fintech project needed to store transaction records (clearly relational) and user-generated documents (clearly document-oriented). The transaction records had fixed schemas, needed joins for reporting, and required ACID guarantees. The user documents were freeform, nested, and varied by document type.

We considered using both (PostgreSQL for transactions, MongoDB for documents). The operational overhead of maintaining two database systems for a small team wasn't worth it.

We chose PostgreSQL with JSONB columns for the semi-structured data:

CREATE TABLE documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID REFERENCES users(id),
  doc_type VARCHAR(50) NOT NULL,
  content JSONB NOT NULL,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_documents_content ON documents USING GIN (content);

JSONB columns in PostgreSQL give you document-style flexibility within a relational database. You can query into the JSON structure, index specific paths, and still use joins and transactions across the relational schema.

This worked well for our use case. The transaction records stayed in normalized tables. The freeform documents lived in JSONB columns. One database, one set of backups, one connection pool.

The costs that tutorials don't mention

MongoDB's memory usage is higher than PostgreSQL's for the same data volume. Documents are stored with field names repeated in every document. A field called transactionTimestamp uses 20 bytes per document just for the field name. In PostgreSQL, column names are defined once in the schema.

PostgreSQL's connection model (one process per connection) uses more memory per connection than MongoDB's threaded model. For applications with many concurrent connections, PostgreSQL needs a connection pooler like PgBouncer.

MongoDB Atlas (the managed service) is expensive at scale. PostgreSQL managed services (RDS, Cloud SQL) are generally cheaper for equivalent workloads. The pricing difference narrows at smaller scales.

Neither database is universally better. The right choice depends on your query patterns, your schema stability, and your operational capacity. For most web applications, PostgreSQL is the safer default because its feature set covers more use cases without requiring a second database. MongoDB is the right choice when document storage is genuinely a better fit for your data model and you're willing to manage the schema discipline that flexibility demands.

RESPONSES

Leave a response