Hello everyone,

we are seeing Cassandra overload symptoms (dropped mutations and nodes 
dropping out) with the following workload:

create table test (year bigint, spread bigint, time bigint, batchid bigint, 
value set<text>, primary key ((year, spread), time, batchid));

Data is inserted using an update statement (the "+" operator merges the sets). 
The data _is_ordered_ before each mutation is executed on the session. The 
number of inserts ranges from 400k to a few million.
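
A minimal sketch of the write described above, for readers who want something 
concrete. The statement text is reconstructed from the schema and the "+" 
description, not copied from the job. Row, WriteSketch, UPDATE_CQL and 
sortForWrite are hypothetical names, and sorting by the full primary key is an 
assumption (the post only says that the data is ordered). The DataStax driver 
call is shown as a comment so the block stays self-contained.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical holder for one mutation; field names follow the table definition.
final class Row {
    final long year, spread, time, batchid;
    final Set<String> value;

    Row(long year, long spread, long time, long batchid, Set<String> value) {
        this.year = year;
        this.spread = spread;
        this.time = time;
        this.batchid = batchid;
        this.value = value;
    }
}

final class WriteSketch {
    // Assumed statement: "value = value + ?" merges the bound set into the stored set.
    static final String UPDATE_CQL =
        "UPDATE test SET value = value + ? "
      + "WHERE year = ? AND spread = ? AND time = ? AND batchid = ?";

    // Assumed ordering: partition key (year, spread) first, then the
    // clustering columns (time, batchid). Returns a sorted copy.
    static List<Row> sortForWrite(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.<Row>comparingLong(r -> r.year)
            .thenComparingLong(r -> r.spread)
            .thenComparingLong(r -> r.time)
            .thenComparingLong(r -> r.batchid));
        return sorted;
    }

    // With the DataStax Java driver each row would then be written roughly as:
    //   PreparedStatement ps = session.prepare(UPDATE_CQL);
    //   session.execute(ps.bind(r.value, r.year, r.spread, r.time, r.batchid));
}
```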

Originally we were using scalding/summingbird and assumed the problem was in 
our Cassandra storage code. To test that, I wrote a simple cascading-hadoop job 
(not using BulkOutputFormat, but the DataStax driver). I was a little 
surprised to still see Cassandra _overload_ (with 3 reducers/Hadoop writers and 
3 co-located Cassandra nodes, and also with a 4/4 setup). The internal 
reason seems to be that many worker threads go into state BLOCKED in 
AtomicBTreeColumns.addAllWithSizeDelta, because something called "waste" is 
used up and Cassandra switches to pessimistic locking.
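
To make the observation above concrete, here is a toy model of an optimistic 
update that falls back to locking once too much work has been thrown away. It 
is not Cassandra's code: ContendedCell, recordWaste and the byte budget are 
invented for illustration, and the budget is a fixed number with no notion of 
time, which is a simplification.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Toy model only: compare-and-set updates that switch to a lock once the
// cost of failed (discarded) attempts exceeds a budget.
final class ContendedCell<T> {
    private final AtomicReference<T> state;
    private final long wasteBudgetBytes;   // hypothetical budget, not a Cassandra setting
    private long wastedBytes;              // guarded by "this"
    private volatile boolean pessimistic;

    ContendedCell(T initial, long wasteBudgetBytes) {
        this.state = new AtomicReference<>(initial);
        this.wasteBudgetBytes = wasteBudgetBytes;
    }

    boolean isPessimistic() {
        return pessimistic;
    }

    // Charge the cost of one discarded attempt; switch modes when the budget is used up.
    synchronized void recordWaste(long bytes) {
        wastedBytes += bytes;
        if (wastedBytes > wasteBudgetBytes) {
            pessimistic = true;
        }
    }

    // Apply fn to the current state. costBytes is the caller's estimate of
    // what one attempt allocates, i.e. what is lost if the CAS fails.
    T update(UnaryOperator<T> fn, long costBytes) {
        while (!pessimistic) {
            T current = state.get();
            T next = fn.apply(current);
            if (state.compareAndSet(current, next)) {
                return next;
            }
            recordWaste(costBytes);        // another writer won; our copy is garbage
        }
        synchronized (this) {              // contended writers queue here (BLOCKED)
            while (true) {                 // still CAS: optimistic writers may be in flight
                T current = state.get();
                T next = fn.apply(current);
                if (state.compareAndSet(current, next)) {
                    return next;
                }
            }
        }
    }
}
```

In this model a single writer never fails a compare-and-set and therefore never 
records waste; only concurrent writers on the same cell can use up the budget.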

However, I rewrote the job using plain Hadoop mapred (without cascading) but 
with the same storage abstraction for writing, and Cassandra _did_not_overload_: 
the job has the great write performance I'm used to, and threads do not 
go into state BLOCKED. We are completely puzzled.

So I have a few questions:
1. What is this "waste" used for? Is it a way of braking or load shedding? Why 
is locking used in AtomicBTreeColumns?
2. Is it OK to order columns before the inserts are performed?
3. What could be the reason that "waste" is used up in the cascading job 
but not in the plain Hadoop job (sort order?)?
4. Is there any way to avoid using up "waste" (other than adding nodes, 
which does not seem to be the answer, since the plain Hadoop job runs 
Cassandra-"friendly")?
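
Regarding question 3, a small diagnostic that might help compare the write 
order of the two jobs. It assumes, without evidence so far, that the jobs 
differ in how many consecutive mutations hit the same partition. 
PartitionRuns and longestRun are hypothetical helper names.

```java
import java.util.List;

final class PartitionRuns {
    // keys: one {year, spread} pair per mutation, in the order the writer
    // emits them. Returns the longest streak of consecutive mutations that
    // target the same partition (0 for an empty list).
    static int longestRun(List<long[]> keys) {
        int best = 0;
        int run = 0;
        long[] prev = null;
        for (long[] k : keys) {
            boolean same = prev != null && prev[0] == k[0] && prev[1] == k[1];
            run = same ? run + 1 : 1;
            best = Math.max(best, run);
            prev = k;
        }
        return best;
    }
}
```

Logging this value per writer in both jobs would show whether the sort order 
really differs between them.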

thanks in advance,
regards,
Andi