sstableloader prints nothing
Hi everyone,

I'm trying to use sstableloader to import large amounts of data into my cassandra-2.0.3 instance (single node). I've created the sstables directory and am now running

> sstableloader -d localhost

but the process just starts and prints nothing at all! I have no idea whether it is doing anything - is there a way to see the output and progress? Or is the process just hanging?

Thanks,
Andrey
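[Editor's note] A common cause of silent sstableloader runs in 2.0 is pointing it at the wrong directory level: the loader infers the keyspace and table from the last two components of the path you pass it. A minimal sketch, assuming placeholder names `myks`/`mytable` (not taken from this thread):

```shell
#!/bin/sh
# sstableloader derives <keyspace> and <table> from the last two path
# components, so the *-Data.db / *-Index.db files must sit in
# <keyspace>/<table>/, and the loader must be given that directory,
# not its parent. "myks" and "mytable" are placeholders.
mkdir -p myks/mytable
# (copy your sstable files into myks/mytable/ here)

# Guarded so this is a no-op where the Cassandra tools are not installed.
if command -v sstableloader >/dev/null 2>&1; then
    sstableloader -d localhost myks/mytable
fi
```

With the correct layout, 2.0's sstableloader prints a progress summary per session; whether a given build prints anything before streaming starts may vary, so checking the server log is also worthwhile.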
Crash with TombstoneOverwhelmingException
Hi all,
One of my cassandra nodes crashes with the following exception periodically -

ERROR [HintedHandoff:33] 2013-12-25 20:29:22,276 SliceQueryFilter.java (line 200) Scanned over 10 tombstones; query aborted (see tombstone_fail_threshold)
ERROR [HintedHandoff:33] 2013-12-25 20:29:22,278 CassandraDaemon.java (line 187) Exception in thread Thread[HintedHandoff:33,1,main]
org.apache.cassandra.db.filter.TombstoneOverwhelmingException
        at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:201)
        at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
        at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
        at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72)
        at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297)
        at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1487)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1306)
        at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:351)
        at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:309)
        at org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:92)
        at org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:530)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

Why does this happen? Does this relate to any incorrect config value?

The Cassandra version I'm running is ReleaseVersion: 2.0.3

- Sanjeeth
Re: Crash with TombstoneOverwhelmingException
Sanjeeth,

Looks like the error is being populated from the hinted handoff - what is the size of your hints cf?

Thanks
Rahul

On Wed, Dec 25, 2013 at 8:54 PM, Sanjeeth Kumar wrote:
> [quoted message and stack trace snipped - see above]
Re: sstableloader prints nothing
OK, I figured that out - turns out that my sstables were in directory but not in /. Would be great to have a proper error message here. However, I still can't import the data. The exception I get on the server now looks like this:

WARN [STREAM-IN-/127.0.1.1] 2013-12-25 18:20:09,686 StreamSession.java (line 519) [Stream #4ec06a70-6d6e-11e3-85ae-9b0764b01181] Retrying for following error
java.lang.IllegalArgumentException
        at java.nio.Buffer.limit(Buffer.java:267)
        at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:55)
        at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:64)
        at org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:130)
        at org.apache.cassandra.io.sstable.ColumnNameHelper.minComponents(ColumnNameHelper.java:103)
        at org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(SSTableWriter.java:255)
        at org.apache.cassandra.streaming.StreamReader.writeRow(StreamReader.java:134)
        at org.apache.cassandra.streaming.StreamReader.read(StreamReader.java:88)
        at org.apache.cassandra.streaming.messages.FileMessage$1.deserialize(FileMessage.java:55)
        at org.apache.cassandra.streaming.messages.FileMessage$1.deserialize(FileMessage.java:45)
        at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:54)
        at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:287)
        at java.lang.Thread.run(Thread.java:724)

Googling didn't show me the way. Could anyone help me with this please?

Thanks,
Andrey

2013/12/25 Andrey Razumovsky
> [original message quoted above - snipped]
Re: Crash with TombstoneOverwhelmingException
I have to hijack this thread. There seem to be many problems with the 2.0.3 release.

If this exception is being generated by hinted handoff, I can understand where it is coming from. If you have many hints and many tombstones, this new feature interacts with the hint delivery process in a bad way. If I understand the feature correctly, it should always be off for hints, because regardless of how many tombstones are in the hints, this rule should not apply.

I want to bring up these questions: Outside of passing all unit tests, what factors into the release voting process? What other types of extended real-world testing should be done to find bugs like this one that unit testing won't?

I'm not trying to call anyone out over this feature/bug. I totally understand why you would want a warning, or want to opt out of a read scanning over a massive number of tombstones, and I think it is a smart feature. But what I want more is to trust that every release is battle tested.

Here is a whacky idea that I am half serious about: make a CMS for http://cassandra.apache.org that backends its data and reporting into Cassandra. No release unless the Cassandra db that serves the site is upgraded first. :)

On Wed, Dec 25, 2013 at 12:17 PM, Rahul Menon wrote:
> [quoted reply and stack trace snipped - see above]
Re: Crash with TombstoneOverwhelmingException
It's a feature. In the stock cassandra.yaml file for 2.0.3, see:

> # When executing a scan, within or across a partition, we need to keep the
> # tombstones seen in memory so we can return them to the coordinator, which
> # will use them to make sure other replicas also know about the deleted rows.
> # With workloads that generate a lot of tombstones, this can cause performance
> # problems and even exhaust the server heap.
> # (http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets)
> # Adjust the thresholds here if you understand the dangers and want to
> # scan more tombstones anyway. These thresholds may also be adjusted at runtime
> # using the StorageService mbean.
> tombstone_warn_threshold: 1000
> tombstone_failure_threshold: 100000

You are hitting the failure threshold.

ml

On Wed, Dec 25, 2013 at 12:17 PM, Rahul Menon wrote:
> [quoted reply and stack trace snipped - see above]
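[Editor's note] The yaml comment above mentions that the thresholds can also be changed at runtime through the StorageService mbean. One hedged way to do that from the shell is jmxterm, a third-party JMX CLI that is not bundled with Cassandra; the jar path, attribute name, and value below are assumptions to verify against your Cassandra version before use:

```shell
#!/bin/sh
# Write the jmxterm commands to a script file. "TombstoneFailureThreshold"
# is the assumed StorageService mbean attribute; check it in jconsole first.
cat > set_tombstone_threshold.jmx <<'EOF'
bean org.apache.cassandra.db:type=StorageService
set TombstoneFailureThreshold 200000
EOF

# Run it non-interactively against the node's JMX port (7199 by default),
# only if the jmxterm jar is actually present (placeholder path).
if [ -f jmxterm.jar ]; then
    java -jar jmxterm.jar -l localhost:7199 -n < set_tombstone_threshold.jmx
fi
```

A runtime change like this does not survive a restart; a permanent change still belongs in cassandra.yaml.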
Re: Cassandra unit testing becoming nearly impossible: suggesting alternative.
Hi Ed,

my opinion on unit testing with C* is: Use the real database, not any embedded crap :-)

All you need are fast truncates, by which I mean:
JVM_OPTS="$JVM_OPTS -Dcassandra.unsafesystem=true"
and
auto_snapshot: false

This setup works really nicely for me (C* 1.1 and 1.2, have not tested 2.0 yet).

Imho this setup is better for multiple reasons:
- No extra classpath issues
- Faster: Running JUnits and C* in one JVM would require a really large heap (for me at least).
- Faster: No Cassandra startup every time I run my tests.

The only downside is that developers must change the properties in their configs.

cheers,
Christian

On Tue, Dec 24, 2013 at 9:31 PM, Edward Capriolo wrote:
> I am not sure how many people have been around developing Cassandra for as long as I have, but the state of all the client libraries and the cassandra server is WORD_I_DONT_WANT_TO_SAY.
>
> Here is an example of something I am seeing:
> ERROR 14:59:45,845 Exception in thread Thread[Thrift:5,5,main]
> java.lang.AbstractMethodError: org.apache.thrift.ProcessFunction.isOneway()Z
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:51)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>         at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:194)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:722)
> DEBUG 14:59:51,654 retryPolicy for schema_triggers is 0.99
>
> In short: If you are new to cassandra and only using the newest client, I am sure everything is peachy for you.
>
> For people that have been using Cassandra for a while, it is harder to "jump ship" when something better comes along. You sometimes need to support both hector and astyanax; it happens.
>
> For a while I have been using hector. Even without using hector as an API, the one nice thing I got from hector was a simple EmbeddedServer that would clean up after itself. Hector seems badly broken at the moment. I have no idea how the current versions track with anything out there in the cassandra world.
>
> For a while I played with https://github.com/Netflix/astyanax, which has its own versions and schemes and dependent libraries. (astyanax has some packaging error that forces me into maven3)
>
> Enter cassandra 2.0, which forces you into java 1.7. Besides that, it has its own kit of things it seems to want.
>
> I am guessing that since hector's embedded server does not work, I should go to https://github.com/jsevellec/cassandra-unit - not sure, really, how anyone does this anymore. I am sure I could dive into the source code and figure this out, but I would just rather have a stable piece of code that brings up the embedded server that "just works" and "continues working".
>
> I can not seem to get this working right either. (since it includes hector, I see from the pom)
>
> Between thrift, cassandra, and client x, it is almost impossible to build a sane classpath, and that is not even counting the fact that people have their own classpath issues (with guava mismatches etc).
>
> I think the only sane thing to do is start shipping cassandra-embedded like this:
>
> https://github.com/kstyrc/embedded-redis
>
> In other words, package embedded-cassandra as a binary. Don't force the client/application developer to bring cassandra onto the classpath and fight with mismatches in thrift/guava etc. That, or provide a completely shaded cassandra server for embedded testing. As it stands now, trying to support a setup that uses more than one client or works with multiple versions of cassandra is a major pita. (aka library x compiled against 1.2.0, library y compiled against 2.0.3)
>
> Does anyone have any thoughts on this, or tried something similar?
>
> Edward
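[Editor's note] Christian's two settings live in different config files. A minimal sketch of applying them, written against copies in a placeholder `conf-sketch` directory so nothing real is modified (point the path at your actual conf dir when adapting):

```shell
#!/bin/sh
# Placeholder conf directory; swap in e.g. /etc/cassandra for real use.
CONF=./conf-sketch
mkdir -p "$CONF"

# 1) cassandra-env.sh: the cassandra.unsafesystem property skips durable
#    writes for system tables, which is what makes truncates fast in tests.
echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.unsafesystem=true"' >> "$CONF/cassandra-env.sh"

# 2) cassandra.yaml: do not snapshot on truncate, so per-test cleanup
#    does not accumulate snapshot directories or slow down.
echo 'auto_snapshot: false' >> "$CONF/cassandra.yaml"
```

Both settings trade durability for speed, which is exactly why they belong only in a test node's config, never in production.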
Re: Cassandra unit testing becoming nearly impossible: suggesting alternative.
I have been using vagrant (e.g. https://github.com/stealthly/scala-cassandra/ ), which is 100% reproducible across devs and test systems (prod in some cases). Also have a Docker setup too: https://github.com/pegasussolutions/docker-cassandra . I have been doing this more and more with clients to better mimic production before production and to smooth the release process from development. I also use packer (scripts released soon) to build images too (http://packer.io).

Love vagrant, packer and docker!!! Apache Mesos too :)

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop
*******************************************/

On Dec 25, 2013, at 3:28 PM, horschi wrote:
> [quoted reply and original message snipped - see above]
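[Editor's note] In the spirit of the docker-cassandra repo Joe links, the throwaway-node approach boils down to one docker invocation. The image name, tag, and port list below are assumptions to check against the repo's README; this sketch prints the command rather than running it, so it is safe to copy and adapt:

```shell
#!/bin/sh
# Sketch: a disposable single-node Cassandra for integration tests.
# "cassandra:2.0" is a placeholder image:tag; substitute whatever image
# your docker-cassandra build produces.
start_test_cassandra() {
    # 9042 = native protocol port, 9160 = thrift port
    echo "docker run -d --name cassandra-test -p 9042:9042 -p 9160:9160 cassandra:2.0"
}

start_test_cassandra
```

The appeal over an in-JVM embedded server is exactly the classpath isolation Edward asks for: the test process talks to the node over the wire, and the server's thrift/guava versions never touch the client's classpath.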