> Any other ideas?

Sounds like a nasty heisenbug, can you replace or rebuild the machine?
Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/05/2013, at 9:36 PM, Michal Michalski <mich...@opera.com> wrote:

> I've finally had some time to experiment a bit with this problem (it has occurred twice more) and here's what I found:
>
> 1. So far (three occurrences in total), *when* it happened, it happened only for streaming to *one* specific C* node (but it works on this node too for 99.9% of the time)
> 2. It happens with compression turned on (cassandra.output.compression.class set to org.apache.cassandra.io.compress.DeflateCompressor; it doesn't matter what the chunk length is)
> 3. Everything works fine when compression is turned off.
>
> So it looks like I have a workaround for now, but I don't really understand the root cause of this problem, or what the "right" solution is if we want to keep using compression.
>
> Anyway, the thing that interests me the most is why it fails so rarely and - assuming it's not a coincidence - why only for one C* node...
>
> Could it be a DeflateCompressor bug?
> Any other ideas?
>
> Regards,
> Michał
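For reference, the compression toggle described in points 2-3 comes down to a single Hadoop property. Below is a minimal sketch; only the property name cassandra.output.compression.class and the DeflateCompressor class name come from this thread, while the wrapper class, the method names, and the assumption that an unset property means "no compression" are illustrative.

    import org.apache.hadoop.conf.Configuration;

    // Sketch of the compression toggle only; the rest of the BulkOutputFormat
    // job setup (keyspace, column family, endpoints) is assumed to be unchanged.
    public final class BulkLoadCompression
    {
        private static final String OUTPUT_COMPRESSION_CLASS = "cassandra.output.compression.class";

        // The configuration that coincides with the rare streaming failures described above.
        public static void useDeflate(Configuration conf)
        {
            conf.set(OUTPUT_COMPRESSION_CLASS,
                     "org.apache.cassandra.io.compress.DeflateCompressor");
        }

        // The workaround: leave the property unset so the SSTables are built and
        // streamed uncompressed (assumption: an absent property means no compression).
        public static void useNoCompression(Configuration conf)
        {
            // intentionally nothing to set
        }
    }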
> On 31.03.2013 at 12:01, aaron morton wrote:
>>> but yesterday one of 600 mappers failed
>>
>> :)
>>
>>> From what I can understand by looking into the C* source, it seems to me that the problem is caused by an empty (or unexpectedly exhausted?) input buffer, causing the token to be set to -1, which is invalid for RandomPartitioner:
>>
>> Yes, there is a zero-length key, which has a -1 token.
>>
>>> However, I can't figure out what the root cause of this problem is.
>>> Any ideas?
>>
>> mmm, the BulkOutputFormat uses a SSTableSimpleUnsortedWriter and neither of them checks for zero-length row keys. I would look there first.
>>
>> There is no validation in the AbstractSSTableSimpleWriter; not sure if that is by design or an oversight. Can you catch the zero-length key in your map job?
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 28/03/2013, at 2:26 PM, Michal Michalski <mich...@opera.com> wrote:
>>
>>> We're streaming data to Cassandra directly from a MapReduce job using BulkOutputFormat. It's been working for more than a year without any problems, but yesterday one of 600 mappers failed and we got a strange-looking exception on one of the C* nodes.
>>>
>>> IMPORTANT: It happens on one node and on one cluster only. We've loaded the same data to a test cluster and it worked.
>>>
>>> ERROR [Thread-1340977] 2013-03-28 06:35:47,695 CassandraDaemon.java (line 133) Exception in thread Thread[Thread-1340977,5,main]
>>> java.lang.RuntimeException: Last written key DecoratedKey(5664330507961197044404922676062547179, 302c6461696c792c32303133303332352c312c646f6d61696e2c756e6971756575736572732c633a494e2c433a6d63635f6d6e635f636172726965725f43656c6c4f6e655f4b61726e6174616b615f2842616e67616c6f7265295f494e2c643a53616d73756e675f47542d49393037302c703a612c673a3133) >= current key DecoratedKey(-1, ) writing into /cassandra/production/IndexedValues/production-IndexedValues-tmp-ib-240346-Data.db
>>>     at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:133)
>>>     at org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(SSTableWriter.java:209)
>>>     at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:179)
>>>     at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
>>>     at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:226)
>>>     at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:166)
>>>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
>>>
>>> From what I can understand by looking into the C* source, it seems to me that the problem is caused by an empty (or unexpectedly exhausted?) input buffer, causing the token to be set to -1, which is invalid for RandomPartitioner:
>>>
>>>     public BigIntegerToken getToken(ByteBuffer key)
>>>     {
>>>         if (key.remaining() == 0)
>>>             return MINIMUM; // Which is -1
>>>         return new BigIntegerToken(FBUtilities.hashToBigInteger(key));
>>>     }
>>>
>>> However, I can't figure out what the root cause of this problem is.
>>> Any ideas?
>>>
>>> Of course I can't exclude a bug in my code which streams these data, but - as I said - it works when loading the same data to a test cluster (which has a different number of nodes, and thus different token assignment, which might be relevant too).
>>>
>>> Michał
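Aaron's suggestion of catching the zero-length key in the map job, before it is ever streamed, could look roughly like the sketch below. It assumes the job emits a ByteBuffer row key and a List<Mutation> value to BulkOutputFormat; the class name and the buildRowKey/buildMutations helpers are illustrative, not taken from the original job.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.util.List;

    import org.apache.cassandra.thrift.Mutation;
    import org.apache.hadoop.mapreduce.Mapper;

    // Illustrative base mapper: the zero-length-key guard is the only point here.
    public abstract class GuardedBulkLoadMapper<KI, VI>
            extends Mapper<KI, VI, ByteBuffer, List<Mutation>>
    {
        // Subclasses plug in the job's real key and mutation construction.
        protected abstract ByteBuffer buildRowKey(KI key, VI value);
        protected abstract List<Mutation> buildMutations(KI key, VI value);

        @Override
        protected void map(KI key, VI value, Context context)
                throws IOException, InterruptedException
        {
            ByteBuffer rowKey = buildRowKey(key, value);

            // The guard: never emit an empty row key. RandomPartitioner maps an
            // empty key to its minimum token (-1), and the receiving node's
            // SSTableWriter.beforeAppend() then fails with the
            // "Last written key >= current key DecoratedKey(-1, )" error above.
            if (rowKey == null || !rowKey.hasRemaining())
            {
                context.getCounter("bulkload", "empty-row-keys").increment(1);
                return; // drop the record instead of streaming it
            }

            context.write(rowKey, buildMutations(key, value));
        }
    }

If the empty-row-keys counter stays at zero while the streaming error still appears, that would point away from the map job and back towards the compression path.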