A client just said the words "we want Cassandra-scale" and now you, the freelancer, have to decide whether to nod, quote the work, push back toward something simpler, or steer them at a managed service. This guide is the honest freelance-developer answer. Apache Cassandra is a free Apache 2.0 distributed NoSQL wide-column database that genuinely powers Netflix, Apple, and Instagram-scale write workloads, but it is also operationally heavy enough that most freelance clients are better served by ScyllaDB Cloud, DataStax Astra DB, AWS Keyspaces, or DynamoDB. Below: when Cassandra is the right pick, when it is the wrong pick, the five-minute hello-world to show the client on your laptop, the data-modeling fundamentals you have to teach them, the consistency-level decision your client needs to own, the managed-alternative pricing breakdown, the handoff runbook their ops team needs from you, and how to scope the engagement so you do not undercharge a 20-to-80-hour Cassandra job as if it were a four-hour migration.
## Cassandra for freelance contracts: quick reference
| Client scenario | Is Cassandra the right pick? | What to recommend | Realistic scope hours |
|---|---|---|---|
| Time-series ingest, ≥10,000 writes per second sustained, predictable shape queries | Yes, Cassandra fits the use case | Cassandra 5.x with NetworkTopologyStrategy, 3-node minimum, weekly repair | 40 to 80 hours setup, then retainer |
| IoT or sensor ingest where data lifespan is bounded by TTL | Yes, Cassandra fits | Cassandra 5.x with per-table TTL + TimeWindowCompactionStrategy | 30 to 60 hours setup |
| Event log / audit log that never updates rows | Yes, but check budget first | Cassandra OSS if client owns infra; Astra DB if not | 20 to 50 hours |
| Real-time chat or messaging fan-out | Yes, this is a classic Cassandra workload | Cassandra 5.x or Astra DB | 30 to 60 hours |
| OLTP web app with relational joins | No | Postgres or MySQL | 8 to 24 hours |
| Analytics dashboards, ad-hoc SQL queries | No | Postgres + a warehouse (Snowflake, BigQuery, DuckDB) | varies |
| Single-node small app, under 100 GB total | No | Postgres or SQLite | 4 to 16 hours |
| AI / vector search workload, write-heavy | Yes, Cassandra 5.x added native Vector Search and SAI | Cassandra 5.x SAI vector index, or Astra DB | 40 to 80 hours |
| Client has no DBA budget and no ops team | No, not self-hosted | DataStax Astra DB or AWS Keyspaces (managed) | 12 to 30 hours integration |
| Client wants "Cassandra-scale" but only does 200 req/sec | Probably no | Postgres until you have actual scale data, then revisit | 8 to 16 hours |
The freelancer math: if the workload is genuinely write-heavy time-series at real scale, Cassandra earns its operational cost. If it is anything else, the honest move is to push the client toward Postgres, ScyllaDB Cloud, or DynamoDB. The conversation pays off twice: you save the client from a budget hole, and you earn the trust that gets you hired for the next project.
## Who this page is for (and the conversation that brought you here)
You are a freelance backend or full-stack developer. A client emailed, slacked, or sat across the table from you and used a phrase like one of these: "we want Cassandra-scale," "we heard Cassandra is what Netflix uses," "our previous architect mentioned Cassandra," or "we have a write-heavy workload and need NoSQL." You are now in the awkward middle: you can technically learn or already know Cassandra, but you also know that recommending Cassandra to the wrong client is a setup for a six-month support tail you will not be paid for.
The point of this guide is not to convince you Cassandra is amazing. The official Apache documentation does that job for free. The point is to give you the freelance-business framework for the conversation: when to say yes, when to say no, what to charge when you say yes, and what to recommend instead when you say no. This is the section every other "Cassandra tutorial" online skips, because most tutorials are written by enthusiasts who want you to use the tool. Your job is different. Your job is to ship the client the right outcome and bill it correctly.
Throughout the rest of this guide you will find real CQL code you can run on your laptop in five minutes, a managed-alternatives pricing comparison your client will actually read, a handoff runbook the client's ops team needs from you, and a scope-to-quote table that prevents the most common freelance billing failure on a Cassandra job, which is quoting it like a Postgres migration when it is actually a multi-week distributed-systems engagement.
## When Cassandra is the right pick for a freelance gig
Five tests have to pass before you recommend Cassandra to a freelance client. If any one of them fails, you are looking at the wrong tool. Walk the client through these the way an honest contractor walks a homeowner through whether they actually need a load-bearing wall removed or whether a partial opening will do.
Test one: write rate. Cassandra was built for write-heavy workloads. The honest threshold is sustained ingestion in the tens of thousands of writes per second across a cluster, or projected to be there within 12 months. If the client is doing 200 requests per second, that is a Postgres workload all day long. The published benchmark numbers for Cassandra are easy to look up on the Apache Cassandra documentation, and the gap between "I read Netflix uses Cassandra" and "we sustain 10K writes per second" is usually two orders of magnitude. Ask the client for their actual current peak QPS and their realistic 12-month projection. If they cannot answer, the client is not ready for Cassandra; they are ready for a smaller database that gets out of the way while they figure out their traffic shape.
Test two: query shape. Cassandra is query-first, not relational-first. You design tables to match your queries, not your data. If your client's app has a fixed set of read patterns (look up the last 50 events for user X, fetch the time-series window between timestamps T1 and T2 for sensor S), Cassandra fits. If your client's app requires ad-hoc joins, business-intelligence-style aggregations, or "tell me which 5% of users did Y in the last 24 hours" analytical queries, Cassandra is the wrong tool and the answer is Postgres with a warehouse layer for the analytics workload.
Test three: data lifespan. Cassandra excels when you can model your data with a clear time dimension and ideally a TTL. Time-series, event logs, IoT readings, audit trails, session data — these are textbook fits. If the data is mostly mutable, gets rewritten frequently, and has complex foreign-key relationships, you are fighting Cassandra's data model and you will lose.
Test four: client operational budget. This is the test most freelancers fail to apply, and it is the one that kills more Cassandra projects than any technical mismatch. A production Cassandra cluster has a real ongoing operational cost: three nodes minimum for any production setup, weekly nodetool repair scheduling, occasional node replacement when hardware fails, version-upgrade work as 5.x ships point releases, monitoring with Prometheus exporters, and someone on call when an incident happens. If your client does not have an ops team or budget for a managed service, you are signing them up for a system they cannot maintain after you leave. Push them to managed (Astra DB, AWS Keyspaces, ScyllaDB Cloud) every time.
Test five: you have time to deliver it correctly. Cassandra is not a four-hour migration. The smallest realistic production-ready engagement (data model design, three-node cluster provisioning, replication strategy configuration, application driver integration, basic monitoring, handoff documentation) is 20 hours. A more typical full delivery (data model review, schema migration from an existing store, multi-DC replication, monitoring, backup configuration, repair scheduling, handoff README, runbook for the ops team) is 40 to 80 hours. If the client wants this done in a sprint and you have already committed to two other projects, the honest answer is to defer or hand the project off rather than ship a half-built cluster.
If all five tests pass, you have a real Cassandra job. If even one fails, the next section is for you.
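The five tests can be written down as a checklist you fill in during discovery. The sketch below is one way to do that, not a formal rubric: the `ClientProfile` field names are made up for illustration, the 10,000 writes-per-second threshold is taken from the quick-reference table, and the 20-hour floor comes from test five.

```python
from dataclasses import dataclass


@dataclass
class ClientProfile:
    """Answers gathered during discovery. All field names are illustrative."""
    projected_writes_per_sec: int    # realistic 12-month peak, not a hope
    fixed_query_patterns: bool       # read paths are known up front
    time_bounded_data: bool          # time-series or TTL-friendly, rarely rewritten
    has_ops_or_managed_budget: bool  # ops team, DBA, or managed-service budget
    hours_available: int             # hours you can actually commit


# Thresholds lifted from the guide; tune them per engagement.
MIN_WRITES_PER_SEC = 10_000
MIN_ENGAGEMENT_HOURS = 20


def failed_tests(p: ClientProfile) -> list[str]:
    """Return the names of the tests that fail; an empty list means Cassandra fits."""
    checks = {
        "write rate": p.projected_writes_per_sec >= MIN_WRITES_PER_SEC,
        "query shape": p.fixed_query_patterns,
        "data lifespan": p.time_bounded_data,
        "operational budget": p.has_ops_or_managed_budget,
        "delivery time": p.hours_available >= MIN_ENGAGEMENT_HOURS,
    }
    return [name for name, passed in checks.items() if not passed]


# A client who wants "Cassandra-scale" at 200 req/sec with no ops budget.
aspirational = ClientProfile(200, True, True, False, 40)
print(failed_tests(aspirational))
```

Any non-empty result is the list of talking points for the "here is what we should do instead" conversation.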
## When Cassandra is the wrong pick (and what to recommend instead)
The honest section. The one no other Cassandra tutorial writes because the rest of the internet wants you to pick Cassandra. Here are five client scenarios where the correct freelance answer is "Cassandra is not right for this, here is what we should do instead."
Scenario one: an OLTP web app with relational joins. The client has users, orders, products, and a checkout flow. They asked about Cassandra because they read that NoSQL scales better. The honest answer is Postgres. Postgres in 2026 scales further than most freelance clients will ever need (single-node Postgres comfortably handles tens of thousands of QPS for OLTP work; Postgres with read replicas and connection pooling handles the next tier; managed Postgres on AWS RDS or Google Cloud SQL takes care of the ops). When the client outgrows Postgres in three years, you can revisit. Until then, Postgres saves your client from a problem they do not have yet.
Scenario two: analytical dashboards and ad-hoc queries. The client wants to slice data by 12 dimensions and let business users build dashboards. Cassandra is openly bad at this; its data model rewards predictable access patterns and punishes ad-hoc queries with ALLOW FILTERING warnings that scream "I am doing a full table scan." Recommend Postgres for the source-of-truth OLTP store plus a warehouse (Snowflake, BigQuery, ClickHouse, DuckDB) for the analytical workload. The combined cost is almost always lower than running a Cassandra cluster that fights its own design.
Scenario three: a small app that will never cross 100 GB. The client has a side project or a small SaaS. They asked about Cassandra because someone in their network mentioned it. Recommend SQLite for read-heavy small apps, or Postgres for anything writing meaningfully. The reason is that Cassandra's three-node minimum is a fixed operational cost that does not amortize against a small data set. You are paying for distributed-systems machinery you do not need.
Scenario four: the client has no DBA budget and no managed-service budget either. This is the one to be honest about. If the client cannot pay for either a DBA (full-time or consulting) or a managed Cassandra service, a self-hosted production Cassandra cluster will outlast the contract by months and start failing in ways the client cannot debug. Recommend a managed alternative even if it is not Cassandra: DynamoDB on AWS, Firestore on Google Cloud, MongoDB Atlas. Or steer them back to managed Postgres on RDS. The client's outcome matters more than the resume bullet.
Scenario five: the client wants "Cassandra-scale" but is currently doing 200 requests per second. This is the most common case. The client has read about Cassandra from a conference talk or a competitor's engineering blog. They are not at scale; they are aspiring to scale. The freelance-honest move is to build the right thing for today (Postgres, almost always) and document the migration path to Cassandra or another scale-out store for when the client actually crosses the threshold. Doing this earns the client's trust, prevents premature optimization, and positions you as the engineer who will be called back when the scale problem becomes real.
When you say no to Cassandra for one of these reasons, you are not losing the project. You are earning the next one.
## Five-minute Cassandra hello-world your client can see
When Cassandra is the right pick, the demo matters. A working hello-world on your laptop, with the client looking over your shoulder, is the most efficient way to validate the engagement before you bill any hours. Here is the five-minute sequence using Docker and the official cassandra:5 image.

```bash
# Pull and run a single Cassandra 5.x node (development only; production wants 3 nodes minimum)
docker run --name cass-demo \
  -p 9042:9042 \
  -d cassandra:5

# Wait ~30 seconds for the node to start (check with: docker logs -f cass-demo)
# Then drop into cqlsh, the Cassandra Query Language shell
docker exec -it cass-demo cqlsh
```

Inside cqlsh you create a keyspace, a table, insert rows, and read them back. This is the script that demonstrates the language and the data model:

```sql
-- Create a keyspace. In dev we use SimpleStrategy with replication_factor 1.
-- IMPORTANT: production uses NetworkTopologyStrategy. We will fix this below.
CREATE KEYSPACE freelance_demo
WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': 1
};

USE freelance_demo;

-- A typical time-series table: events bucketed by sensor and ordered by time.
-- The partition key (sensor_id) decides which node owns the data.
-- The clustering column (event_time DESC) decides the on-disk row order.
CREATE TABLE sensor_events (
  sensor_id text,
  event_time timestamp,
  reading double,
  metadata map<text, text>,
  PRIMARY KEY (sensor_id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

-- Insert some rows.
INSERT INTO sensor_events (sensor_id, event_time, reading, metadata)
VALUES ('s-001', toTimestamp(now()), 72.4, {'unit': 'F', 'firmware': 'v2.1'});

INSERT INTO sensor_events (sensor_id, event_time, reading, metadata)
VALUES ('s-001', toTimestamp(now()), 72.7, {'unit': 'F', 'firmware': 'v2.1'});

-- Read the last 10 events for sensor s-001; the partition key drives the query.
SELECT sensor_id, event_time, reading
FROM sensor_events
WHERE sensor_id = 's-001'
LIMIT 10;
```

That is the whole hello-world. The client sees a working database, sees the CQL syntax (which looks like SQL but is not SQL: there are no joins, no subqueries, and no GROUP BY in the ways they expect), and gets a feel for how Cassandra rewards partition-key thinking.
Now show them the production version of the same keyspace. This is the version you ship:

```sql
-- Drop the dev keyspace and recreate it with NetworkTopologyStrategy.
-- This is the keyspace shape you use in production.
-- 'us-east-1' is a placeholder: the name must match the data center name the
-- cluster reports in `nodetool status` (the stock Docker image typically
-- reports datacenter1, so adjust the name if you run this on the demo node).
DROP KEYSPACE freelance_demo;

CREATE KEYSPACE freelance_demo
WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'us-east-1': 3
};
```

The difference matters. SimpleStrategy is for a single-data-center development cluster and is the wrong choice in production. NetworkTopologyStrategy lets you declare replication per data center, which is how multi-DC Cassandra works and how every real production deployment is shaped. Even a single-DC production cluster uses NetworkTopologyStrategy with one DC named, because the day the client wants a second region, you do not have to migrate the keyspace; you just add the new DC's replication factor and run a rebuild. This is one of the most common freelancer mistakes: copy-pasting a SimpleStrategy keyspace from a blog post into production. Do not be that freelancer.
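To make the "just add the new DC" point concrete, here is a small sketch that renders the keyspace statement from a map of data center names to replication factors. The helper name `keyspace_cql` and the `eu-west-1` data center are hypothetical; the function only builds a string and does not talk to a cluster.

```python
def keyspace_cql(keyspace: str, dc_replication: dict[str, int], create: bool = True) -> str:
    """Render a CREATE or ALTER KEYSPACE statement for NetworkTopologyStrategy."""
    if not dc_replication:
        raise ValueError("at least one data center is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    # One map entry per data center: name -> replication factor.
    options += [f"'{dc}': {rf}" for dc, rf in dc_replication.items()]
    verb = "CREATE" if create else "ALTER"
    return verb + " KEYSPACE " + keyspace + " WITH replication = {" + ", ".join(options) + "};"


# Day one: a single data center.
print(keyspace_cql("freelance_demo", {"us-east-1": 3}))

# Later: the client adds a second region. Same keyspace, one more map entry.
print(keyspace_cql("freelance_demo", {"us-east-1": 3, "eu-west-1": 3}, create=False))
```

The second statement is the whole schema change; the data still has to be streamed to the new data center with a rebuild, as noted above.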
The whole sequence — pull image, start container, open cqlsh, create keyspace, create table, insert, select — takes less than five minutes on a laptop with a reasonable network connection. The client sees the work. You bill the demo as the first hour of the engagement.
## Data modeling fundamentals: query-first, not relational-first
The single most important shift you have to teach the client is that Cassandra data modeling is query-first. In Postgres, the rule is "model your tables after your data, then write the queries you need." In Cassandra, the rule is "model your tables after your queries." If the client has 12 distinct read patterns, the right answer is often 12 distinct Cassandra tables, each with a partition key and clustering columns tuned to one read pattern. Duplicate data across tables is normal and correct. Cassandra writes are cheap; storage is cheap; coordinator-side joins do not exist.
The partition key is the single most important decision in a Cassandra schema. It does three things at once: it determines which node stores the row, it determines how big any single partition gets, and it determines what queries are possible without doing a cluster-wide scan. The freelancer rule: every read path should know the exact partition key before the query is issued.

```sql
-- BAD: partition key is too coarse. All events for all sensors land on one
-- partition; this partition grows without bound and one node carries all the load.
CREATE TABLE bad_sensor_events (
  bucket text,          -- always the literal string 'all'
  event_time timestamp,
  sensor_id text,
  reading double,
  PRIMARY KEY (bucket, event_time, sensor_id)
);

-- BETTER: partition key includes sensor_id so each sensor has its own partition.
-- Now the cluster distributes load by sensor, and a single sensor's history
-- lives on one node (one replica set, technically).
CREATE TABLE better_sensor_events (
  sensor_id text,
  event_time timestamp,
  reading double,
  PRIMARY KEY (sensor_id, event_time)
);

-- BEST for high-volume sensors: composite partition key with a time bucket
-- so a single sensor's history is split across day-sized partitions.
-- This prevents any single partition from growing past Cassandra's healthy
-- partition-size guidance (target under 100 MB per partition; hard avoid above 1 GB).
CREATE TABLE best_sensor_events (
  sensor_id text,
  day_bucket date,      -- YYYY-MM-DD
  event_time timestamp,
  reading double,
  PRIMARY KEY ((sensor_id, day_bucket), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
```

The double parentheses in the primary key `((sensor_id, day_bucket), event_time)` are the syntax that makes `(sensor_id, day_bucket)` together the partition key. Without those inner parentheses, `sensor_id` is the partition key and `day_bucket` is a clustering column, which is a different physical layout.
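Whether a sensor needs the day bucket is a back-of-envelope calculation you can do with the client in the room. The sketch below assumes a hypothetical sensor writing 10 readings per second at roughly 60 bytes per row; both numbers are illustrative, and real on-disk size depends on compression and per-row overhead, so treat the result as an order-of-magnitude check.

```python
from datetime import datetime, timezone

TARGET_PARTITION_BYTES = 100 * 1024 * 1024  # the ~100 MB guidance quoted above


def day_bucket(event_time: datetime) -> str:
    """Return the UTC calendar day used as the day_bucket partition-key component."""
    return event_time.astimezone(timezone.utc).date().isoformat()


def partition_bytes(rows_per_second: float, bytes_per_row: int, bucket_seconds: int) -> float:
    """Rough partition size: rows written during one bucket times the row size."""
    return rows_per_second * bucket_seconds * bytes_per_row


# Hypothetical sensor: 10 readings per second, about 60 bytes per row.
per_day = partition_bytes(10, 60, 86_400)
per_year = partition_bytes(10, 60, 365 * 86_400)

print(f"day bucket:  {per_day / 1024**2:.1f} MiB, fits: {per_day < TARGET_PARTITION_BYTES}")
print(f"no bucket, one year: {per_year / 1024**3:.1f} GiB, fits: {per_year < TARGET_PARTITION_BYTES}")
print(day_bucket(datetime(2026, 5, 20, 23, 30, tzinfo=timezone.utc)))
```

Under these assumptions a day-sized bucket stays under the target, while an unbucketed partition for the same sensor passes it within a few days. If even the daily figure is too large, the same arithmetic tells you whether an hourly bucket is needed.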
Clustering columns determine the on-disk order of rows within a partition. They make range queries fast (WHERE sensor_id = 's-001' AND event_time > '2026-05-01'), but they only work for the leading clustering columns. You cannot skip a clustering column. If your schema has clustering columns (event_time, event_type), you can query by event_time alone or by (event_time, event_type) together, but you cannot query by event_type alone without scanning the partition. The freelance lesson: enumerate the queries first, then pick clustering columns that serve them in declared order.
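The "no skipping" rule is easy to check mechanically while you enumerate the client's queries. This is a deliberately simplified sketch: the function name is made up, it only checks that the restricted columns form a leading prefix of the declared clustering columns, and it ignores the further CQL rule that a range restriction has to be on the last restricted column, as well as indexes and ALLOW FILTERING.

```python
def uses_clustering_prefix(clustering_columns: list[str], restricted: set[str]) -> bool:
    """True when the restricted columns are exactly a leading prefix of the
    declared clustering columns, meaning no clustering column is skipped."""
    prefix = clustering_columns[:len(restricted)]
    return restricted == set(prefix)


cols = ["event_time", "event_type"]

print(uses_clustering_prefix(cols, {"event_time"}))                # leading column only
print(uses_clustering_prefix(cols, {"event_time", "event_type"}))  # both, in declared order
print(uses_clustering_prefix(cols, {"event_type"}))                # skips event_time
```

A query that fails this check is a signal that it needs its own table, or a different clustering order, before any code is written.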
Cassandra 5.x added Storage Attached Indexes (SAI), which change the calculus a little. SAI lets you index non-partition-key columns with much better performance than the older secondary-index implementation. You can now do this in production with reasonable cost:

```sql
-- Storage Attached Index (Cassandra 5.x feature) on a non-partition-key column.
-- Use sparingly; SAI is great but it is not free.
-- This indexes the entries of the metadata map on the sensor_events demo table
-- (the bucketed tables above do not have a metadata column), so a query can
-- filter on metadata['firmware'] = 'v2.1'.
CREATE INDEX events_by_firmware ON sensor_events (ENTRIES(metadata))
USING 'sai';
```

SAI is genuinely useful for ad-hoc filtering needs that did not justify a second table. The honest warning: do not use SAI to convert a relational schema you brought from Postgres. Use SAI to add one or two ad-hoc indexes on top of a properly modeled Cassandra schema. The data model still has to be query-first.
## Read/write patterns and consistency levels (what to recommend per client risk tolerance)
Cassandra's consistency model is the thing junior engineers and non-Cassandra clients get most wrong. Cassandra is tunably consistent: every read and every write declares its own consistency level, and you trade latency for guarantee on a per-operation basis. Here are the levels a freelancer needs to know and the freelance-appropriate recommendation for each.
ONE means "ack as soon as one replica confirms the write" for writes, and "return as soon as one replica returns the data" for reads. Lowest latency, highest availability, weakest guarantee. Recommend for high-volume telemetry where occasional staleness is acceptable.
QUORUM means "ack as soon as a strict majority of replicas confirm." If replication_factor = 3, QUORUM needs 2 replicas. This is the default safety net for most production reads and writes. Recommend as the freelance baseline.
LOCAL_QUORUM means "QUORUM within the local data center only." On a multi-DC cluster, this gives you QUORUM-strength consistency without paying the cross-DC latency penalty. Recommend for any multi-region cluster you ship to a client.
EACH_QUORUM means "QUORUM in every data center." Strong consistency across regions; the strictest guarantee that is still typically practical. Recommend only for financial or audit data where cross-DC consistency is non-negotiable.
ALL means "every replica must respond." This is rarely the right choice in production because a single failed replica blocks the operation. Use almost never.
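The arithmetic behind these levels is short enough to show the client directly. The sketch below computes the quorum size for a replication factor and applies the standard overlap rule: a read is guaranteed to touch at least one replica that acknowledged the latest write when read replicas plus write replicas exceed the replication factor. The function names are illustrative.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: a strict majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(rf: int, write_replicas: int, read_replicas: int) -> bool:
    """Read and write replica sets are guaranteed to overlap when R + W > RF."""
    return read_replicas + write_replicas > rf


rf = 3
q = quorum(rf)

print(f"QUORUM at RF={rf} needs {q} replicas and tolerates {rf - q} replica down")
print("QUORUM write + QUORUM read overlap:", reads_see_latest_write(rf, q, q))
print("ONE write + ONE read overlap:", reads_see_latest_write(rf, 1, 1))
```

With a replication factor of 3, QUORUM on both sides overlaps and ONE on both sides does not, which is the trade the client is making when they pick ONE for telemetry.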
The freelance billing implication: the client's consistency-level choice is a business decision, not a technical one, and you have to walk the founder or product owner through it explicitly. "If a sensor reading is lost during a node failure, is that a critical event or a minor blip?" determines whether the client picks ONE or QUORUM for writes. Get the answer in writing during scope. The wrong default is one of the most common post-launch fights between freelancer and client; it is also one of the most preventable.

```python
# Python driver (cassandra-driver) example showing per-query consistency.
# Other languages have the same pattern; this is illustrative.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(['10.0.0.1', '10.0.0.2', '10.0.0.3'])
session = cluster.connect('freelance_demo')

# Telemetry write: cheap, low-stakes, prefer availability.
write = SimpleStatement(
    "INSERT INTO sensor_events (sensor_id, event_time, reading) VALUES (%s, %s, %s)",
    consistency_level=ConsistencyLevel.ONE,
)
session.execute(write, ('s-001', '2026-05-20 10:00:00', 72.4))

# Billing-relevant read: prefer correctness, accept extra latency.
read = SimpleStatement(
    "SELECT sensor_id, event_time, reading FROM sensor_events WHERE sensor_id = %s LIMIT 10",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)
rows = session.execute(read, ('s-001',))
```

The application driver is where consistency levels live. Document the chosen consistency level in the codebase comments, in the handoff README, and in the runbook you ship to the client's ops team. When someone six months from now asks "why is this read so slow?" the documented `LOCAL_QUORUM` decision is the answer that saves you a support call.
## Managed alternatives: Astra DB vs AWS Keyspaces vs ScyllaDB Cloud vs self-hosted
This is the section that pays the client back for hiring you. Most freelance clients do not need self-hosted Cassandra; they need Cassandra-compatible managed infrastructure. The 2026 landscape has four real options, and your job is to help the client pick the right one for their workload and budget.
**DataStax Astra DB** is the managed Cassandra offering from DataStax, a long-time commercial backer of Apache Cassandra. It exposes the standard CQL interface, supports the modern Cassandra 5.x features including SAI and Vector Search, and runs on AWS, Google Cloud, and Azure. The pricing model has a free tier (small enough that production workloads will exceed it within weeks but generous enough for prototyping) and a pay-as-you-go model based on storage, read operations, and write operations. Recommend Astra DB when the client is a startup, is already deployed on one of those three clouds, wants a managed service from a vendor that works closely with the Cassandra project, and values Vector Search or other Cassandra 5.x features.
**AWS Keyspaces** is Amazon's Cassandra-compatible serverless offering. It speaks CQL and the Cassandra wire protocol, so most application code Just Works, but it is not literally Apache Cassandra under the hood. It is priced per request capacity unit (read and write) and per gigabyte stored. Recommend AWS Keyspaces when the client is already deep into AWS (uses VPC, IAM, CloudWatch, S3 for backups, et cetera), prefers serverless billing over capacity planning, and does not need the absolute newest Cassandra features.
**ScyllaDB Cloud** is the managed offering from ScyllaDB, which is a Cassandra-compatible database written in C++ from the ground up for higher performance per node. ScyllaDB Cloud runs on AWS and GCP and offers a free trial tier and a paid tier priced by cluster size. Recommend ScyllaDB Cloud when raw performance per dollar matters more than ecosystem stickiness, when the workload is throughput-bound, and when the client is comfortable with a non-Apache implementation that nonetheless speaks the CQL protocol.
**Self-hosted Apache Cassandra OSS** is the free Apache 2.0 option that runs on whatever VMs the client owns. Recommend self-hosted only when the client genuinely has a DBA or an ops team that can carry the operational load. The published pricing comparison every freelancer should walk a client through looks roughly like this in 2026; check the official pricing pages before you quote anything:
| Option | Pricing model | Best for | Operational load on client |
|---|---|---|---|
| Apache Cassandra OSS self-hosted | Free software; client pays for VMs, storage, ops team | Client with existing DBA / ops team | High; client owns repair scheduling, backups, monitoring, upgrades |
| DataStax Astra DB | Free tier + pay-as-you-go (storage + read/write ops) | Startups and teams who want the official managed Cassandra | Low; DataStax handles the ops |
| AWS Keyspaces | Per read/write request unit (on-demand) or read/write capacity unit (provisioned), plus per GB stored | Teams already on AWS who want serverless | Low; AWS handles the ops |
| ScyllaDB Cloud | Per-cluster (instance-size + storage) | Throughput-bound workloads, cost-per-throughput priority | Low; ScyllaDB handles the ops |
The honest freelance recommendation in 80% of cases is one of the managed options. The 20% where self-hosted is correct: the client already runs Cassandra somewhere, has a DBA, or has compliance requirements that prevent using a third-party managed service (some healthcare and government workloads). In every other case, the conversation starts and ends with "let me show you Astra DB's free tier, and we can be running tomorrow."
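Per-request pricing is easier for a client to reason about once the sustained rate is turned into a monthly volume. The sketch below does only that arithmetic. The price of 1.00 per million requests is a placeholder, not any vendor's real rate, and the model ignores storage charges and the way real services count larger rows as multiple units, so pull the actual numbers from the pricing pages before quoting.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def monthly_requests(requests_per_second: float) -> float:
    """Requests in a month at a sustained rate."""
    return requests_per_second * SECONDS_PER_MONTH


def monthly_request_cost(requests_per_second: float, price_per_million: float) -> float:
    """Request charges only, under simple per-request pricing."""
    return monthly_requests(requests_per_second) / 1_000_000 * price_per_million


PLACEHOLDER_PRICE = 1.00  # hypothetical price per million requests

for rate in (200, 10_000):
    volume = monthly_requests(rate)
    cost = monthly_request_cost(rate, PLACEHOLDER_PRICE)
    print(f"{rate:>6} req/s -> {volume:,.0f} requests/month -> {cost:,.2f} at the placeholder price")
```

The 50x gap between the two rows is the useful part: at 200 requests per second pay-per-request is almost always the cheap option, while at a sustained 10,000 it is worth comparing against provisioned capacity or a per-cluster price.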
For a full head-to-head between Cassandra and ScyllaDB specifically, see [Apache Cassandra vs ScyllaDB](/launch-school/comparisons/apache-cassandra-vs-scylladb) for the technical comparison and [migrating from Cassandra to ScyllaDB](/launch-school/comparisons/migrate-apache-cassandra-to-scylladb) for the migration walkthrough.
## The freelance handoff: what your client's ops team needs from you
You will leave. The cluster has to keep running. The deliverable that makes you a professional, rather than a contractor who built a system the client now cannot maintain, is the handoff package. Here is the minimum viable handoff README and runbook content you ship to the client's ops team at project close.
### The handoff README
```markdown
# Cassandra cluster operations
This document is the operations runbook for the production Cassandra cluster
deployed for [Client Name]. The cluster is Apache Cassandra 5.x on three nodes
(or whichever shape you actually shipped) in [region].
## Cluster topology
- 3 nodes, all in data center us-east-1
- Keyspace: app_production with NetworkTopologyStrategy and replication_factor 3
- Authentication: PasswordAuthenticator enabled
- Client driver: cassandra-driver (Python) version 3.29.x
## Daily operations
- Monitor: Prometheus exporter on each node, scraped to the central
monitoring stack. Dashboards live at [Grafana URL].
- Alert on: read latency 99th percentile > 100ms, write latency 99th
percentile > 50ms, any node down, any pending compactions > 1000.
## Weekly operations
- Run `nodetool repair --full` on each node, staggered across the week
so that only one node is repairing at any time. The cron is in
`/etc/cron.d/cassandra-repair` on each node.
## Monthly operations
- Snapshot the data directory and copy to S3 / GCS / [your offsite].
Script: `/usr/local/bin/cassandra-snapshot.sh`.
- Review pending Cassandra 5.x point releases. Apply security patches
within the same week.
## When a node fails
- Replacement runbook: [link to the dedicated runbook below].
- Page [escalation contact] if more than one node is down simultaneously.
```

### The node-replacement runbook

```markdown
# Replacing a failed Cassandra node
A node has failed and needs to be replaced. This runbook walks through
the standard procedure for Cassandra 5.x.
1. Identify the failed node's address from `nodetool status` on a
   surviving node. Note the host_id of the failed node.
2. Provision a new VM with the same Cassandra version and configuration.
   Use the configuration-management tool of choice (Ansible, Chef,
   Terraform; whatever the client already uses).
3. Set the replace_address_first_boot system property in the new node's
   cassandra-env.sh:

       JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=<failed_node_ip>"

4. Start Cassandra on the new node. It will stream data from surviving
   replicas. Monitor with `nodetool netstats` on the new node.
5. After streaming completes (can take hours; depends on data size),
   confirm with `nodetool status` that the cluster shows the new
   node Up Normal and the old node is gone.
6. Run `nodetool repair --full` on the new node within 24 hours of
   completion to catch any writes the cluster missed during the
   bootstrap window.
```

That is the minimum handoff. The expanded handoff (which I include on any engagement worth more than 40 hours) adds backup-restore procedures, a CQL schema diff workflow, an upgrade procedure for cross-version moves, and a quarterly capacity review checklist. The expanded handoff is also where you place the language that justifies the maintenance retainer in the next section.
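The README's weekly-repair rule (one node repairing at a time, staggered across the week) can be turned into cron schedule fields with a few lines. This is a sketch under simple assumptions: each node gets its own weekday, the 02:00 start hour is arbitrary, the node addresses are the ones from the driver example, and nothing here checks that a repair actually finishes before the next one starts.

```python
def stagger_repairs(nodes: list[str], hour: int = 2) -> dict[str, str]:
    """Give each node its own weekday; return cron schedule fields per node.

    Cron's day-of-week field runs 0 to 6 with 0 meaning Sunday.
    """
    if not nodes:
        return {}
    if len(nodes) > 7:
        # Two nodes would share a day, so this scheme can no longer
        # promise that only one node repairs at a time.
        raise ValueError("more than 7 nodes: use a finer-grained schedule")
    step = 7 // len(nodes)  # spread nodes out instead of using consecutive days
    return {node: f"0 {hour} * * {i * step}" for i, node in enumerate(nodes)}


schedule = stagger_repairs(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
for node, cron_fields in schedule.items():
    print(f"{node}: {cron_fields}")
```

For three nodes this lands on Sunday, Tuesday, and Thursday, leaving a free day between repairs. Put the resulting line, followed by the repair command, in each node's cron file and record the schedule in the README.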
## Billing and scope: how to charge for Cassandra work in 2026
Cassandra is not a four-hour migration. Most of the freelance billing failures I see on Cassandra projects come from quoting the work as if it were a Postgres setup. Here is how to scope and bill so the engagement stays profitable and the client stays happy.
### Scope-to-quote table
| Scope item | Realistic effort | How to bill |
|---|---|---|
| Initial discovery and data-model review | 4 to 8 hours | Fixed-price phase 1 |
| Cassandra schema design (keyspace, tables, primary keys, clustering columns) | 8 to 20 hours | Fixed-price phase 1 deliverable |
| 3-node cluster provisioning (self-hosted) including configuration management | 12 to 24 hours | Fixed-price phase 2 |
| Migration to Astra DB or AWS Keyspaces from another store | 8 to 30 hours | Fixed-price phase 2 |
| Application driver integration and consistency-level decisions | 4 to 12 hours | Fixed-price phase 3 |
| Monitoring setup (Prometheus exporters, dashboards, alerts) | 6 to 16 hours | Fixed-price phase 3 |
| Backup and restore configuration and tested rehearsal | 4 to 10 hours | Fixed-price phase 4 |
| Handoff documentation and runbooks | 4 to 12 hours | Bundle into final phase |
| Weekly repair scheduling and ongoing monitoring | 1 to 4 hours per month | Monthly retainer |
| Cassandra 5.x point-release security patching | 2 to 6 hours per quarter | Retainer or hourly emergency |
| Node replacement when hardware fails | 2 to 6 hours per incident | Retainer or hourly emergency |
| Schema migration / new query pattern requested | 4 to 16 hours per request | Hourly change order |
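Turning the table into a quote is a matter of summing the rows that apply to the engagement. The sketch below copies the one-off line items and their hour ranges from the table; the dictionary keys are shorthand invented here, and the hourly rate of 100 is a placeholder rather than a recommendation.

```python
# (low, high) hour ranges copied from the scope-to-quote table above.
SCOPE_HOURS = {
    "discovery": (4, 8),
    "schema_design": (8, 20),
    "self_hosted_provisioning": (12, 24),
    "managed_migration": (8, 30),
    "driver_integration": (4, 12),
    "monitoring": (6, 16),
    "backup_restore": (4, 10),
    "handoff": (4, 12),
}


def quote_range(items: list[str], hourly_rate: float) -> tuple[int, int, float, float]:
    """Return (low hours, high hours, low price, high price) for the chosen items."""
    low = sum(SCOPE_HOURS[item][0] for item in items)
    high = sum(SCOPE_HOURS[item][1] for item in items)
    return low, high, low * hourly_rate, high * hourly_rate


RATE = 100  # placeholder hourly rate

# Smallest managed-infrastructure build: no cluster to provision, no migration.
managed = ["discovery", "schema_design", "driver_integration", "handoff"]
# Self-hosted build that includes every one-off setup item except the migration row.
self_hosted = ["discovery", "schema_design", "self_hosted_provisioning",
               "driver_integration", "monitoring", "backup_restore", "handoff"]

for name, items in (("managed", managed), ("self-hosted", self_hosted)):
    low, high, low_price, high_price = quote_range(items, RATE)
    print(f"{name}: {low} to {high} hours, {low_price:,.0f} to {high_price:,.0f}")
```

The managed combination bottoms out at 20 hours. The self-hosted combination, with every row included at its upper bound, runs past 80 hours, so treat the high end of any headline range as something to confirm item by item rather than a cap.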
### Why the floor is 20 hours, not 4
Cassandra has irreducible setup time that does not exist in Postgres. The data model has to match the queries, which means real time spent walking the client's product owner through their read paths. The replication strategy has to be configured for production (NetworkTopologyStrategy with the right data centers), which means real time spent understanding the client's target deployment topology. The application drivers have to be configured with the right consistency levels per query, which means real time spent on business-decision conversations with the client. The handoff has to include monitoring, repair scheduling, and a node-replacement runbook, because the cluster will outlive the engagement and the client's ops team has to keep it running. Each of these is hours, and they compound.
A 20-hour engagement is the floor for a single-DC deployment on managed infrastructure (Astra DB or AWS Keyspaces), with a simple schema, a single application using one driver, and a basic handoff. A 40-to-80-hour engagement is realistic for a multi-DC, self-hosted cluster serving multiple applications and drivers, with a full handoff including runbooks. If a client wants Cassandra delivered in a week, you are either using managed infra and a tiny schema (possible) or setting the project up to fail (likely).
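As a quick sanity check on those floors, the one-time line items from the scoping table can be summed. This is a minimal sketch: the hour ranges are copied from the table, but the grouping of items into a "managed, minimal" and a "self-hosted, full" engagement is an assumption made for illustration, and the names `LINE_ITEMS`, `MANAGED_MINIMAL`, `SELF_HOSTED_FULL`, and `scope_range` are hypothetical.

```python
# One-time line items copied from the scoping table: (low hours, high hours).
LINE_ITEMS = {
    "discovery": (4, 8),
    "schema_design": (8, 20),
    "self_hosted_provisioning": (12, 24),
    "managed_migration": (8, 30),
    "driver_integration": (4, 12),
    "monitoring": (6, 16),
    "backup_restore": (4, 10),
    "handoff": (4, 12),
}

# Assumption: which items belong to each engagement shape is illustrative,
# not a rule taken from the table itself.
MANAGED_MINIMAL = ["discovery", "schema_design", "driver_integration", "handoff"]
SELF_HOSTED_FULL = [
    "discovery",
    "schema_design",
    "self_hosted_provisioning",
    "driver_integration",
    "monitoring",
    "backup_restore",
    "handoff",
]


def scope_range(items):
    """Return (low, high) total hours for the named line items."""
    low = sum(LINE_ITEMS[name][0] for name in items)
    high = sum(LINE_ITEMS[name][1] for name in items)
    return low, high


if __name__ == "__main__":
    print("managed, minimal:", scope_range(MANAGED_MINIMAL))
    print("self-hosted, full:", scope_range(SELF_HOSTED_FULL))
```

Under these groupings the low ends come out at 20 and 42 hours, in line with the floors quoted above. The high ends (52 and 102 hours) show how far a job can stretch if every item hits its ceiling, which is one reason to quote by phase instead of as a single lump sum.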
The maintenance retainer framing
Add a recurring retainer to every Cassandra engagement. Cassandra needs ongoing operational attention even on managed services (schema evolution, query-pattern review, point-release upgrades), and on self-hosted clusters the retainer is genuinely load-bearing. The retainer pays for itself the first time a nodetool repair schedule slips, the first time a Cassandra 5.x point release ships a security patch, or the first time the client's application team wants to add a new query pattern that requires schema work.
A reasonable retainer structure: a fixed monthly fee covering 1 to 4 hours of routine ops work (repair scheduling check, monitoring review, point-release awareness), plus an hourly rate for incident response and change orders. The freelance-business benefit is predictable monthly revenue across a portfolio of clients, which smooths the project-based volatility of pure delivery work.
The change-order conversation when the client says "just add this query"
A specific moment that pays the retainer back: a client adds a feature that needs a new read pattern. In Postgres they would just write a new SELECT. In Cassandra they need a new table (because the query shape has to match the partition key), which means a schema migration, an application driver change, a data backfill if historical rows need to be queryable the new way, and a handoff update. This is a 4-to-16-hour change order, not a one-line code edit. Document the change-order workflow in the handoff README and reference it the first time it comes up. The client respects the predictability; you respect the value of your time.
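To make the "new query means new table" point concrete for a client, a sketch like the following can help. Everything here is hypothetical: the `iot` keyspace, both tables, and the helper names are invented for illustration. The `session` argument is assumed to be a session from the DataStax Python driver (`cassandra.cluster.Cluster(...).connect()`), whose `execute(query, parameters)` accepts `%s` placeholders for simple statements. A real backfill would also need paging, prepared statements, and throttling, which are left out.

```python
import datetime
import uuid

# Existing table: answers "readings for one sensor, newest first".
READINGS_BY_SENSOR = """
CREATE TABLE IF NOT EXISTS iot.readings_by_sensor (
    sensor_id  uuid,
    reading_ts timestamp,
    site_id    text,
    value      double,
    PRIMARY KEY ((sensor_id), reading_ts)
) WITH CLUSTERING ORDER BY (reading_ts DESC)
"""

# New feature: "all readings for a site on a given day". The partition key
# has to match that query shape, so the change order adds a second table.
READINGS_BY_SITE_DAY = """
CREATE TABLE IF NOT EXISTS iot.readings_by_site_day (
    site_id     text,
    reading_day date,
    reading_ts  timestamp,
    sensor_id   uuid,
    value       double,
    PRIMARY KEY ((site_id, reading_day), reading_ts, sensor_id)
) WITH CLUSTERING ORDER BY (reading_ts DESC, sensor_id ASC)
"""

BACKFILL_INSERT = (
    "INSERT INTO iot.readings_by_site_day "
    "(site_id, reading_day, reading_ts, sensor_id, value) "
    "VALUES (%s, %s, %s, %s, %s)"
)

# Illustrative historical row, shaped like a dict read from the old table.
EXAMPLE_ROW = {
    "sensor_id": uuid.UUID(int=1),
    "reading_ts": datetime.datetime(2024, 1, 2, 3, 4, 5),
    "site_id": "site-a",
    "value": 1.5,
}


def backfill_params(row):
    """Map one readings_by_sensor row to the new table's column order."""
    return (
        row["site_id"],
        row["reading_ts"].date(),  # derive the day bucket from the timestamp
        row["reading_ts"],
        row["sensor_id"],
        row["value"],
    )


def apply_change_order(session, historical_rows):
    """Create the new table, then copy historical rows into it."""
    session.execute(READINGS_BY_SITE_DAY)
    for row in historical_rows:
        session.execute(BACKFILL_INSERT, backfill_params(row))
```

The application then writes to both tables going forward, which is the driver change the change order has to budget for.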
Frequently asked questions
Is Apache Cassandra free?
Yes. Apache Cassandra is released under the Apache License 2.0, which is the same permissive license used by Apache Kafka, Apache Spark, and most Apache Software Foundation projects. There is no per-node fee, no per-seat fee, and no commercial license required for production use. The cost of running Cassandra is the cost of the infrastructure (VMs, storage, network) and the operational labor (DBA, ops engineer, or managed-service fees). DataStax Astra DB, AWS Keyspaces, and ScyllaDB Cloud are paid managed services built on or compatible with Cassandra; the underlying open-source software is free.
Cassandra vs DynamoDB for freelance clients?
DynamoDB wins for AWS-native clients who want serverless billing and do not care about portability; Cassandra (via Astra DB or AWS Keyspaces) wins for clients who want CQL portability across cloud providers and who value the Cassandra ecosystem of drivers and tools. Both are wide-column-style stores; both are tunable for consistency; both scale horizontally. The freelance decision usually comes down to where the client is already deployed and how multi-cloud-aware they want to be.
Should I run Cassandra in production for a client?
Usually no. Use DataStax Astra DB, AWS Keyspaces, or ScyllaDB Cloud unless the client has a DBA on staff or compliance requirements that block managed services. The operational load of self-hosted Cassandra (repair scheduling, monitoring, node replacement, version upgrades) is substantial and is not the freelance-friendly default. Recommend managed first, and justify self-hosted only when the client has the ops capacity to run it.
What is the smallest Cassandra cluster?
In production, three nodes is the minimum. Replication factor 3 needs at least three nodes to hold its three replicas, and running with replication_factor=1 in production defeats Cassandra's entire reason for existing (it is a distributed database; a single-node Cassandra is just a slower Postgres). For development and demos, a single-node cassandra:5 Docker container is fine and is what you use during the discovery phase.
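The case for three can be sketched with the arithmetic behind Cassandra's QUORUM consistency level, where a quorum is floor(RF / 2) + 1 replicas. The function names below are hypothetical helpers for a client conversation, not part of any driver.

```python
def quorum(replication_factor):
    """Replicas that must respond at QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor):
    """Replicas that can be down while QUORUM requests still succeed."""
    return replication_factor - quorum(replication_factor)


def reads_see_latest_write(read_replicas, write_replicas, replication_factor):
    """True when the read and write replica sets must overlap (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    for rf in (1, 2, 3, 5):
        print(rf, quorum(rf), tolerated_replica_failures(rf))
```

With RF 1 or RF 2, a QUORUM request cannot survive a single replica being down. RF 3 is the smallest setting that tolerates one failure, and QUORUM reads plus QUORUM writes at RF 3 satisfy R + W > RF (2 + 2 > 3).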
How do I learn Cassandra fast?
Start with the official Apache Cassandra documentation at cassandra.apache.org, run the Docker cassandra:5 container locally, and work through a real client-shaped data model with three or four query patterns. The DataStax Academy material at docs.datastax.com is excellent for the managed-service angle and includes practical CQL examples. Pair that reading with a small personal project (an event log, an IoT simulator, a time-series ingestor) so the data-modeling muscle gets exercised. Two weeks of focused study plus a small project will get a freelancer to "I can deliver a real engagement" competence.
Does Cassandra 5.x support vector search for AI workloads?
Yes. Cassandra 5.0 added native vector search via Storage-Attached Indexing (SAI), with a vector data type and similarity search functions. This makes Cassandra a viable option for AI workloads that combine vector similarity with the wide-column, write-heavy patterns Cassandra is already known for (semantic search over event logs, AI agent memory stores, RAG systems with high write throughput). The honest caveat: dedicated vector databases (Pinecone, Weaviate, Qdrant) are still more feature-rich for vector-only workloads. Cassandra 5.x vector search is the right pick when you already have a Cassandra cluster doing the write-heavy work and want vector search alongside it without a second data store.
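For a demo on the single-node container, the CQL involved looks roughly like the sketch below. The `demo` keyspace, the `agent_memory` table, and the `ann_query` helper are hypothetical, and the three-dimensional vector is only for readability (real embeddings usually have hundreds of dimensions). The statements use the Cassandra 5.0 `vector<float, N>` type, an SAI index, and the `ORDER BY ... ANN OF` clause; they are held as Python strings so they can be passed to a driver session.

```python
# Hypothetical table for an agent memory store with a 3-dimensional embedding.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS demo.agent_memory (
    memory_id uuid PRIMARY KEY,
    note      text,
    embedding vector<float, 3>
)
"""

# ANN queries need a Storage-Attached Index on the vector column.
CREATE_INDEX = """
CREATE INDEX IF NOT EXISTS agent_memory_embedding_idx
ON demo.agent_memory (embedding) USING 'sai'
"""


def ann_query(query_vector, limit=5):
    """Build an approximate-nearest-neighbour SELECT for the table above."""
    # Coerce every component to float so the literal is always numeric.
    literal = "[" + ", ".join(repr(float(x)) for x in query_vector) + "]"
    return (
        "SELECT memory_id, note FROM demo.agent_memory "
        f"ORDER BY embedding ANN OF {literal} LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(ann_query([0.1, 0.2, 0.3], limit=2))
```

The query vector must have the same dimension as the column, and ANN queries require a LIMIT, so the helper always emits one.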
Related reading and authoritative references
Internal cross-links to dig deeper into the Cassandra ecosystem on Solomon Signal:
- The full Apache Cassandra tool profile with features, pricing, integrations, and ratings
- The companion use-case page Apache Cassandra for Freelancers covering the generic freelancer workflow
- The head-to-head Apache Cassandra vs ScyllaDB comparison for the C++ alternative
- The Cassandra to ScyllaDB migration guide when the client wants to switch
- The persona-sibling guide esbuild for Freelancers demonstrating the freelance-narrative pattern across a different tool
External authority references for client-facing conversations:
- Apache Cassandra official documentation for the canonical project documentation, current as of Cassandra 5.x
- DataStax docs for the Astra DB managed-service documentation and CQL reference
The freelance move on a Cassandra job is the move on every job: tell the client the truth about what they actually need, scope the work honestly, ship the deliverable plus the runbook, and bill the retainer for the maintenance the client will absolutely require. Apache Cassandra is one of the most powerful distributed databases ever built. It is also operationally heavy enough that the freelance-honest answer in most engagements is "let's start with the managed alternative and grow into self-hosted Cassandra only if we genuinely need to." That answer wins the engagement, protects the client's budget, and earns the call back six months from now when the scale problem becomes real.