sstableloader prints nothing

2013-12-25 Thread Andrey Razumovsky
Hi everyone,

I'm trying to use sstableloader to import large amounts of data into my
cassandra-2.0.3 instance (single node). I've created an sstables directory and
am now running
> sstableloader -d localhost 

but the process just starts and prints nothing at all! I've no idea whether
it is doing anything - is there a way to see the output and progress? Or is
the process just hanging?

Thanks,
Andrey


Crash with TombstoneOverwhelmingException

2013-12-25 Thread Sanjeeth Kumar
Hi all,
  One of my cassandra nodes crashes with the following exception
periodically -
ERROR [HintedHandoff:33] 2013-12-25 20:29:22,276 SliceQueryFilter.java (line 200) Scanned over 10 tombstones; query aborted (see tombstone_fail_threshold)
ERROR [HintedHandoff:33] 2013-12-25 20:29:22,278 CassandraDaemon.java (line 187) Exception in thread Thread[HintedHandoff:33,1,main]
org.apache.cassandra.db.filter.TombstoneOverwhelmingException
    at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:201)
    at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
    at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
    at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72)
    at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297)
    at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1487)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1306)
    at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:351)
    at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:309)
    at org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:92)
    at org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:530)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

Why does this happen? Does this relate to any incorrect config value?

The Cassandra Version I'm running is
ReleaseVersion: 2.0.3

- Sanjeeth


Re: Crash with TombstoneOverwhelmingException

2013-12-25 Thread Rahul Menon
Sanjeeth,

It looks like the error is coming from hinted handoff. What is the
size of your hints CF?

Thanks
Rahul




Re: sstableloader prints nothing

2013-12-25 Thread Andrey Razumovsky
OK, I  figured that out - turns out that my sstables were in directory
 but not in /. Would be great to
have a proper error message here..
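
For anyone hitting the same silent start: as far as I know, sstableloader takes the keyspace and table names from the last two components of the directory path it is given, so the sstables have to sit in a directory laid out as keyspace/table. Below is a minimal pre-flight sketch in Python, assuming that layout rule and the usual `-Data.db` file suffix. The function name is made up for illustration and is not part of any Cassandra tool.

```python
import os


def inspect_sstable_dir(path):
    """Return (keyspace, table, data_files) for a directory meant for sstableloader.

    Assumption: the loader takes the keyspace and table names from the last
    two path components, and sstable data files end in "-Data.db".
    """
    path = os.path.abspath(path)
    if not os.path.isdir(path):
        raise ValueError("not a directory: %s" % path)
    table = os.path.basename(path)
    keyspace = os.path.basename(os.path.dirname(path))
    if not table or not keyspace:
        raise ValueError("expected a keyspace/table layout, got: %s" % path)
    # Only the data components are listed; index, filter, etc. are ignored here.
    data_files = sorted(f for f in os.listdir(path) if f.endswith("-Data.db"))
    if not data_files:
        raise ValueError("no *-Data.db files found in %s" % path)
    return keyspace, table, data_files
```

Running this against the directory before invoking the loader at least produces an explicit error when the layout is wrong or the directory holds no data files.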

However, I still can't import the data. The exception I get on server now
looks like this:
 WARN [STREAM-IN-/127.0.1.1] 2013-12-25 18:20:09,686 StreamSession.java (line 519) [Stream #4ec06a70-6d6e-11e3-85ae-9b0764b01181] Retrying for following error
java.lang.IllegalArgumentException
    at java.nio.Buffer.limit(Buffer.java:267)
    at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:55)
    at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:64)
    at org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:130)
    at org.apache.cassandra.io.sstable.ColumnNameHelper.minComponents(ColumnNameHelper.java:103)
    at org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(SSTableWriter.java:255)
    at org.apache.cassandra.streaming.StreamReader.writeRow(StreamReader.java:134)
    at org.apache.cassandra.streaming.StreamReader.read(StreamReader.java:88)
    at org.apache.cassandra.streaming.messages.FileMessage$1.deserialize(FileMessage.java:55)
    at org.apache.cassandra.streaming.messages.FileMessage$1.deserialize(FileMessage.java:45)
    at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:54)
    at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:287)
    at java.lang.Thread.run(Thread.java:724)


Googling didn't show me the way.. Could anyone help me with this please?

Thanks,
Andrey




Re: Crash with TombstoneOverwhelmingException

2013-12-25 Thread Edward Capriolo
I have to hijack this thread. There seem to be many problems with the 2.0.3
release. If this exception is being generated by hinted handoff, I can
understand where it is coming from: if you have many hints and many
tombstones, this new feature interacts badly with the hint delivery process.

If I understand the feature correctly, it should always be off for the
hints, because regardless of how many tombstones are in the hints, this rule
should not apply.

I want to bring up these questions:

Outside of passing all unit tests, what factors into the release voting process?
What other type of extended real-world testing should be done to find bugs
like this one that unit testing won't?

I am not trying to call anyone out over this feature/bug. I totally understand
why you would want a warning, or want to opt out of a read scanning over a
massive number of tombstones, and I think it is a smart feature. But what I
want more is to trust that every release is battle tested.

Here is a wacky idea that I am half serious about. Make a CMS for
http://cassndra.apache.org that back-ends its data and reporting into
Cassandra. No release unless the Cassandra db that serves the site is upgraded
first. :)




Re: Crash with TombstoneOverwhelmingException

2013-12-25 Thread Laing, Michael
It's a feature:

In the stock cassandra.yaml file for 2.0.3 see:

> # When executing a scan, within or across a partition, we need to keep the
> # tombstones seen in memory so we can return them to the coordinator, which
> # will use them to make sure other replicas also know about the deleted rows.
> # With workloads that generate a lot of tombstones, this can cause performance
> # problems and even exhaust the server heap.
> # (http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets)
> # Adjust the thresholds here if you understand the dangers and want to
> # scan more tombstones anyway.  These thresholds may also be adjusted at runtime
> # using the StorageService mbean.
> tombstone_warn_threshold: 1000
> tombstone_failure_threshold: 10


You are hitting the failure threshold.

ml
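
To make the two thresholds concrete, here is a rough Python sketch of the behaviour the yaml comment describes: a read counts tombstones as it scans, logs a warning once the warn threshold is passed, and aborts once the failure threshold is passed. This is an illustration only, not Cassandra's code. The exception class, the function name and the idea of marking a tombstone with `None` are all invented for the example.

```python
import logging

log = logging.getLogger("tombstone_sketch")


class TombstoneOverwhelming(Exception):
    """Stand-in for the exception a real node raises; hypothetical name."""


def scan(cells, warn_threshold, failure_threshold):
    """Return the live values from cells, where None marks a tombstone.

    Aborts as soon as more than failure_threshold tombstones were scanned.
    """
    live, tombstones, warned = [], 0, False
    for cell in cells:
        if cell is not None:
            live.append(cell)
            continue
        tombstones += 1
        if tombstones > failure_threshold:
            raise TombstoneOverwhelming(
                "scanned over %d tombstones; query aborted" % failure_threshold)
        if tombstones > warn_threshold and not warned:
            # Warn once per scan rather than once per tombstone.
            log.warning("read %d live and %d tombstoned cells", len(live), tombstones)
            warned = True
    return live
```

If I read the stack trace right, the scan in this thread is the hint delivery read, so a hints table holding many tombstones would take the abort path in the same way.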




Re: Cassandra unit testing becoming nearly impossible: suggesting alternative.

2013-12-25 Thread horschi
Hi Ed,

my opinion on unit testing with C* is: Use the real database, not any
embedded crap :-)

All you need are fast truncates, by which I mean:
JVM_OPTS="$JVM_OPTS -Dcassandra.unsafesystem=true"
and
auto_snapshot: false

This setup works really well for me (C* 1.1 and 1.2; I have not tested 2.0
yet).

Imho this setup is better for multiple reasons:
- No extra classpath issues
- Faster: Running JUnits and C* in one JVM would require a really large
heap (for me at least).
- Faster: No Cassandra startup every time I run my tests.

The only downside is that developers must change the properties in their
configs.
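
If the manual config change is the sticking point, a test bootstrap could check for both settings before running. Here is a small Python sketch, assuming you pass in the text of cassandra.yaml and of the file where JVM_OPTS is set (typically cassandra-env.sh). The function name is made up, and the yaml check is a plain line match rather than a real YAML parse.

```python
import re


def missing_fast_truncate_settings(yaml_text, env_text):
    """Return the fast-truncate settings from this mail that are not active."""
    missing = []
    # Plain line match: "auto_snapshot: false" on an uncommented line.
    pattern = r"^\s*auto_snapshot\s*:\s*false\s*(#.*)?$"
    if not re.search(pattern, yaml_text, re.MULTILINE):
        missing.append("auto_snapshot: false")
    # Look for the system property on any line that is not a shell comment.
    active = [ln for ln in env_text.splitlines()
              if not ln.lstrip().startswith("#")]
    if not any("-Dcassandra.unsafesystem=true" in ln for ln in active):
        missing.append("-Dcassandra.unsafesystem=true")
    return missing
```

An empty list means both settings were found; anything else can be printed as a hint to the developer before the tests start.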

cheers,
Christian



On Tue, Dec 24, 2013 at 9:31 PM, Edward Capriolo wrote:

> I am not sure how many people have been around developing Cassandra
> for as long as I have, but the state of all the client libraries and the
> cassandra server is WORD_I_DONT_WANT_TO_SAY.
>
> Here is an example of something I am seeing:
> ERROR 14:59:45,845 Exception in thread Thread[Thrift:5,5,main]
> java.lang.AbstractMethodError: org.apache.thrift.ProcessFunction.isOneway()Z
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:51)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>     at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:194)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:722)
> DEBUG 14:59:51,654 retryPolicy for schema_triggers is 0.99
>
> In short: If you are new to cassandra and only using the newest client I
> am sure everything is peachy for you.
>
> For people that have been using Cassandra for a while it is harder to
> "jump ship" when something better comes along. You sometimes need to
> support both hector and astyanax; it happens.
>
> For a while I have been using hector. Not even to use hector as an API;
> the one nice thing I got from hector was a simple EmbeddedServer that
> would clean up after itself. Hector seems badly broken at the moment. I
> have no idea how the current versions track with anything out there in the
> cassandra world.
>
> For a while I played with https://github.com/Netflix/astyanax, which has
> its own version and schemes and dependent libraries. (astyanax has some
> packaging error that forces me into maven3)
>
> Enter cassandra 2.0, which forces you into Java 7. Besides that it has
> its own kit of things it seems to want.
>
> I am guessing, since hector's embedded server does not work, that I should go
> to https://github.com/jsevellec/cassandra-unit. I am not sure, really, how
> anyone does this anymore. I am sure I could dive into the source code and
> figure this out, but I would rather have a stable piece of code that
> brings up the embedded server that "just works" and "continues working".
>
> I can not seem to get this working right either (since it includes hector,
> as I see from the pom).
>
> Between thrift, cassandra, and client x, it is almost impossible to build a
> sane classpath, and that is not even counting the fact that people have
> their own classpath issues (with guava mismatches etc).
>
> I think the only sane thing to do is start shipping cassandra-embedded
> like this:
>
> https://github.com/kstyrc/embedded-redis
>
> In other words package embedded-cassandra as a binary. Don't force the
> client/application developer to bring cassandra on the classpath and fight
> with mismatches in thrift/guava etc. That or provide a completely shaded
> cassandra server for embedded testing. As it stands now, trying to support a
> setup that uses more than one client or works with multiple versions of
> cassandra is a major pita (e.g. library x compiled against 1.2.0, library y
> compiled against 2.0.3).
>
> Does anyone have any thoughts on this, or tried something similar?
>
> Edward
>
>


Re: Cassandra unit testing becoming nearly impossible: suggesting alternative.

2013-12-25 Thread Joe Stein
I have been using vagrant (e.g. https://github.com/stealthly/scala-cassandra/ ),
which is 100% reproducible across devs and test systems (and prod in some cases).
I also have a Docker setup at
https://github.com/pegasussolutions/docker-cassandra . I have been doing this
more and more with clients to better mimic production before production and to
smooth the release process from development. I also use packer (scripts
to be released soon) to build images (http://packer.io).

Love vagrant, packer and docker!!!  Apache Mesos too :)


/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop
/

