I was inserting the contents of Wikipedia, so the column values were
multi-kilobyte strings. It's a good data source to run tests with, as the
records and relationships vary somewhat in size.
My main point was that the best way to benchmark cassandra is with multiple
server nodes and multiple client processes driving the load.
Since each row in my column family has 30 columns, wouldn't this translate
to ~8,000 rows per second... or am I misunderstanding something?
Talking in terms of columns, my load test would seem to perform as follows:
100,000 rows / 26 sec * 30 columns/row = 115K columns per second.
That's on a dual-core laptop.
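Just to spell out the arithmetic (a quick sketch in plain Python; the row
count, elapsed time, and columns-per-row figures are simply the numbers
quoted above):

    # Throughput figures from the load test quoted above.
    rows = 100000           # rows inserted
    elapsed_s = 26.0        # wall-clock seconds for the run
    cols_per_row = 30       # columns per row in the column family

    rows_per_s = rows / elapsed_s               # ~3,846 rows/sec
    cols_per_s = rows_per_s * cols_per_row      # ~115,000 columns/sec
    print("%.0f rows/sec, %.0f columns/sec" % (rows_per_s, cols_per_s))

    # The 250K columns/sec cluster figure, expressed as 30-column rows:
    print("%.0f rows/sec" % (250000 / cols_per_row))    # ~8,333 rows/sec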
To give an idea, last March (2010) I ran a much older Cassandra on 10 HP
blades (dual socket, 4 core, 16GB RAM, 2.5" laptop HDD) and was writing around
250K columns per second, with 500 Python processes loading the data from
Wikipedia running on another 10 HP blades.
This was my first out-of-the-box run.
You don't give many details, but I would guess:
- your benchmark is not multithreaded
- mongodb is not configured for durable writes, so you're really only
measuring the time for it to buffer the write in memory
- you haven't loaded enough data to hit "mongo's index doesn't fit in
memory anymore"
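To illustrate the first two points, here is a rough sketch of a multithreaded
insert benchmark that also requests acknowledged, journaled writes from
MongoDB. It uses the modern pymongo API (WriteConcern / insert_one); the
thread count, document shape, and collection name are made up for
illustration, and drivers of that era exposed durability through "safe"
writes instead, so treat this as a sketch rather than a recipe:

    import threading, time
    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    N_THREADS = 8            # hypothetical number of client threads
    DOCS_PER_THREAD = 10000  # hypothetical per-thread document count

    client = MongoClient("localhost", 27017)
    # Ask for acknowledged, journaled writes (j=True needs journaling enabled
    # on the server) so the benchmark times durable inserts rather than the
    # time to buffer them in memory.
    coll = client.test.get_collection(
        "bench", write_concern=WriteConcern(w=1, j=True))

    def worker(tid):
        for i in range(DOCS_PER_THREAD):
            coll.insert_one({"thread": tid, "seq": i, "payload": "x" * 1024})

    start = time.time()
    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(N_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.time() - start
    print("%.0f docs/sec" % (N_THREADS * DOCS_PER_THREAD / elapsed))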
Use more nodes to increase your write throughput. Testing on a single
machine is not really a viable benchmark for what you can achieve with
cassandra.
I am working for a client that needs to persist 100K-200K records per second
for later querying. As a proof of concept, we are looking at several
options, including NoSQL stores (Cassandra and MongoDB).
I have been running some tests on my laptop (MacBook Pro, 4GB RAM, 2.66 GHz,
dual core / 4 logical cores).