Secondary index data gone after restart (1.1.1)

2012-06-26 Thread Ivo Meißner
Hi,

I am running into some problems with secondary indexes that I am unable to 
track down. When I restart the cassandra service, the secondary index data 
won't load and I get the following error during startup: 

INFO 08:29:42,127 Opening 
/var/myproject/cassandra/data/mykeyspace/group_admin/mykeyspace-group_admin.group_admin_groupId_idx-hd-1
 (20808 bytes)
ERROR 08:29:42,159 Exception in thread Thread[SSTableBatchOpen:1,5,main]
java.lang.ClassCastException: java.math.BigInteger cannot be cast to 
java.nio.ByteBuffer
at 
org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:37)
at org.apache.cassandra.dht.LocalToken.compareTo(LocalToken.java:45)
at org.apache.cassandra.db.DecoratedKey.compareTo(DecoratedKey.java:89)
at org.apache.cassandra.db.DecoratedKey.compareTo(DecoratedKey.java:38)
at java.util.TreeMap.getEntry(TreeMap.java:328)
at java.util.TreeMap.containsKey(TreeMap.java:209)
at java.util.TreeSet.contains(TreeSet.java:217)
at 
org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:396)
at 
org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:187)
at 
org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:225)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

When the service starts I can still select data from the column family, but not 
using the secondary index. 
After I execute "nodetool rebuild_index" the secondary index works fine again 
until the next restart. 
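For anyone searching the archives later, the rebuild step looks roughly like
this; the host flag and the index-name argument format are assumptions that
may vary by Cassandra version:

nodetool -h localhost rebuild_index mykeyspace group_admin group_admin.group_admin_groupId_idx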

The error only seems to occur on the column groupId (TimeUUIDType). The other 
index on userId seems to work. 

I have the following column family definition: 

create column family group_admin with
  comparator = UTF8Type and
  key_validation_class = UTF8Type and
  column_metadata = [
{column_name: id, validation_class: UTF8Type},
{column_name: added, validation_class: LongType},
{column_name: userId, validation_class: BytesType, index_type: KEYS},
{column_name: requestMessage, validation_class: UTF8Type},
{column_name: status, validation_class: LongType},
{column_name: groupId, validation_class: TimeUUIDType, index_type: KEYS}
  ];

Thank you very much for your help!

Ivo

Re: Secondary index data gone after restart (1.1.1)

2012-06-26 Thread Fei Shan
Hi

   Please refer to the JDK's java.nio ByteBuffer: I don't think a BigInteger
can be cast to a ByteBuffer directly; it seems you need to make some
conversion before putting it into a ByteBuffer.
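A minimal sketch of the kind of conversion I mean, using only the JDK (the
helper class name is made up):

import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper: packs a time-based UUID into the 16 raw bytes
// that a TimeUUIDType value is expected to contain.
public final class TimeUUIDBytes {
    public static ByteBuffer toByteBuffer(UUID uuid) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        buf.flip(); // rewind so the buffer reads from position 0
        return buf;
    }
}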

Thanks
Fei

On Tue, Jun 26, 2012 at 12:07 AM, Ivo Meißner  wrote:

> [snip]


How to use row caching to enable faster retrieval of rows in Cassandra

2012-06-26 Thread Prakrati Agrawal
Dear all,

I am trying to understand whether I can speed up the retrieval process using 
caching. Could you please help me write the code for setting the cache 
properties in Cassandra?
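For context, in Cassandra 1.1 caching is configured per column family, with a
global size cap in cassandra.yaml; a minimal sketch of the sort of thing I am
after (the column family name is hypothetical, and the option names may differ
in other versions). In cassandra-cli:

update column family mycf with caching = 'ROWS_ONLY';

and the corresponding cap in cassandra.yaml:

row_cache_size_in_mb: 200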

Thanks and Regards
Prakrati






Re: Secondary index data gone after restart (1.1.1)

2012-06-26 Thread Ivo Meißner
Hi,

But if the data must be converted, that is something that should be fixed 
inside Cassandra… Is this a bug? Should I file a bug report?
Or is there some kind of setting I can change to make it work for now?

Maybe it is related to this issue, but this should have been fixed in 1.1.0:

https://issues.apache.org/jira/browse/CASSANDRA-3954

Thanks
Ivo


On 26.06.2012 at 09:26, Fei Shan wrote:

> [snip]




Re: Request Timeout with Composite Columns and CQL3

2012-06-26 Thread Sylvain Lebresne
On Mon, Jun 25, 2012 at 11:10 PM, Henning Kropp  wrote:
> Hi,
>
> I am running into timeout issues using composite columns in cassandra 1.1.1
> and cql 3.
>
> My keyspace and table is defined as the following:
>
> create keyspace bn_logs
>     with strategy_options = [{replication_factor:1}]
>     and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';
>
> CREATE TABLE logs (
>   id text,
>   ref text,
>   time bigint,
>   datum text,
>   PRIMARY KEY(id, ref, time)
> );
>
> I import some data to the table by using a combination of the thrift
> interface and the hector Composite.class by using its serialization as the
> column name:
>
> Column col = new Column(composite.serialize());
>
> This all seems to work fine until I try to execute the following query which
> leads to a request timeout:
>
> SELECT datum FROM logs WHERE id='861' and ref = 'raaf' and time > '3000';

If it times out, the likely reason is that this query selects more data
than the machine is able to fetch before the timeout. You can either
add a limit to the query or increase the timeout.
If that doesn't fix it, it might be worth checking the server
log to see if there isn't an error.
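For example (the LIMIT value is arbitrary; rpc_timeout_in_ms in
cassandra.yaml is the knob for the server-side timeout):

SELECT datum FROM logs WHERE id='861' and ref = 'raaf' and time > '3000' LIMIT 100;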

> I really would like to figure out, why running this query on my laptop
> (single node, for development) will not finish. I also would like to know if
> the following query would actually work
>
> SELECT datum FROM logs WHERE id='861' and ref = 'raaf*' and time > '3000';

It won't. You can perform the following query:

SELECT datum FROM logs WHERE id='861' and ref >= 'raaf';

which will select every datum whose ref sorts at or after 'raaf'
(including every ref that starts with 'raaf'), but then you cannot
restrict the time parameter, so you will also get rows where the time
is <= 3000. Of course you can always filter client side if that is an
option.

> or how else there is a way to define a range for the second component of the
> column key?

As described above, you can define a range on the second component, but then you
won't be able to restrict on the 3rd component.
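Concretely, a prefix match on ref can only be approximated by a two-sided
range on the second component, at the cost of dropping the time restriction
(the upper bound here is illustrative):

SELECT datum FROM logs WHERE id='861' and ref >= 'raaf' and ref < 'raag';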

>
> Any thoughts?
>
> Thanks in advance and kind regards
> Henning
>


Re: Removing a counter columns using Thrift interface

2012-06-26 Thread Patrik Modesto
On Mon, Jun 25, 2012 at 9:28 AM, Sylvain Lebresne  wrote:
> On Mon, Jun 25, 2012 at 9:06 AM, Patrik Modesto
>  wrote:
>> I'm used to using Mutation for everything, so the first thing I tried
>> was Deletion on Counter column. Well, nothing happened. No error and
>> the Counter column was still there.
>
> That shouldn't happen.
>
>> The second try was the remove_counter() method. When I set just the
>> column_family of ColumnPath, nothing happened. No error and the
>> Counter column was still there. I supposed it would work like the
>> remove() method which would remove whole row.
>
> It should. If it doesn't, that would be a bug. If you can reproduce
> such a bug, then please do open a ticket.

I've tried again today and found a bug in my test program. Now
Deletion works as expected. remove_counter() works as well; I had
misinterpreted the results.
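For anyone else hitting this, a minimal sketch of the remove_counter() path
via the raw Thrift API (the column family, row key and column name are made
up):

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.utils.ByteBufferUtil;

public final class CounterRemoval {
    // Removes a single counter column; omitting setColumn() removes the
    // whole counter row, like remove() does for regular columns.
    static void removeCounter(Cassandra.Client client) throws Exception {
        ColumnPath path = new ColumnPath("Counters");
        path.setColumn(ByteBufferUtil.bytes("hits"));
        client.remove_counter(ByteBufferUtil.bytes("row1"), path,
                              ConsistencyLevel.QUORUM);
    }
}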

Regards,
P.


Create column family fail

2012-06-26 Thread Juan Ezquerro
Hi,

I create this column family:


CREATE COLUMN FAMILY Clients
WITH column_type='Super'
AND key_validation_class = LongType -- master_id
AND comparator = LongType -- client_id
AND subcomparator = UTF8Type
AND column_metadata = [
{column_name: client_name, validation_class: UTF8Type}
];

But column metadata is not saved, as you can see with cassandra-cli:


create column family Clients
  with column_type = 'Super'
  and comparator = 'BytesType'
  and subcomparator = 'BytesType'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'LongType'
  and read_repair_chance = 0.1
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy =
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'KEYS_ONLY'
  and compression_options = {'sstable_compression' :
'org.apache.cassandra.io.compress.SnappyCompressor'}

This only happens with that column family and I don't know why. The
comparator and subcomparator look wrong too.

Can anyone help, please?

-- 
Juan Ezquerro LLanes 

Telf: 618349107/964051479


Re: Create column family fail

2012-06-26 Thread Juan Ezquerro
OK, the '--' comments were the problem ... LOL
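For the archives, the same definition works once the inline '--' comments are
removed:

CREATE COLUMN FAMILY Clients
WITH column_type='Super'
AND key_validation_class = LongType
AND comparator = LongType
AND subcomparator = UTF8Type
AND column_metadata = [
{column_name: client_name, validation_class: UTF8Type}
];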

2012/6/26 Juan Ezquerro 

> [snip]


-- 
Juan Ezquerro LLanes 

Telf: 618349107/964051479


Re: Request Timeout with Composite Columns and CQL3

2012-06-26 Thread Henning Kropp
Thanks for the reply. Should have thought about looking into the log files 
sooner. An AssertionError happens at execution. I haven't figured out yet why. 
Any input is very much appreciated:

ERROR [ReadStage:1] 2012-06-26 15:49:54,481 AbstractCassandraDaemon.java (line 
134) Exception in thread Thread[ReadStage:1,5,main]
java.lang.AssertionError: Added column does not sort as the last column
at 
org.apache.cassandra.db.ArrayBackedSortedColumns.addColumn(ArrayBackedSortedColumns.java:130)
at 
org.apache.cassandra.db.AbstractColumnContainer.addColumn(AbstractColumnContainer.java:107)
at 
org.apache.cassandra.db.AbstractColumnContainer.addColumn(AbstractColumnContainer.java:102)
at 
org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:141)
at 
org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:139)
at 
org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:283)
at 
org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:63)
at 
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1321)
at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1183)
at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1118)
at org.apache.cassandra.db.Table.getRow(Table.java:374)
at 
org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:69)
at 
org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:816)
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1250)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)


BTW: I would really love to understand why the combined comparator will not 
allow ranges to be specified for two key parts. Obviously I still lack a 
profound enough understanding of Cassandra's architecture to have a clue.
And while client-side filtering might seem like a valid option, I am still 
trying to get my head around a Cassandra data model that would allow this.

best regards
 

From: Sylvain Lebresne [sylv...@datastax.com]
Sent: Tuesday, 26 June 2012 10:21
To: user@cassandra.apache.org
Subject: Re: Request Timeout with Composite Columns and CQL3

[snip]


Cassandra and massive TTL expirations cause HEAP issue

2012-06-26 Thread Nils Pommerien
Hello,
I am evaluating Cassandra in a log retrieval application.  My ring consists of 3
m2.xlarge instances (17.1 GB memory, 6.5 ECU (2 virtual cores with 3.25 EC2 
Compute Units each), 420 GB of local instance storage, 64-bit platform) and I 
am writing at roughly 220 writes/sec.  Per day I am adding roughly 60GB of 
data.  All of this sounds simple and easy and all three nodes are humming along 
with basically no load.

The issue is that I am writing all my data with a TTL of 10 days.  After 10 
days my cluster crashes due to a java.lang.OutOfMemoryError during compaction 
of the big column family that contains roughly 95% of the data.  So after 10 
days my data set is 600GB, and from then on Cassandra has to tombstone and 
purge 60GB of data per day at the same rate of roughly 220 deletes/second.  I 
am not sure whether Cassandra should be able to handle this, whether I should 
take a partitioning approach (one CF per day), or whether there are simply 
some tweaks I need to make in the yaml file.  I have tried:

 1.  Decreasing flush_largest_memtables_at to 0.4
 2.  Setting reduce_cache_sizes_at and reduce_cache_capacity_to to 1

Now, the issue remains the same:

WARN [ScheduledTasks:1] 2012-06-11 19:39:42,017 GCInspector.java (line 145) 
Heap is 0.9920103380107628 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically.

Eventually it will just die with this message.  This affects all nodes in the 
cluster, not just one.

Dump file is incomplete: file size limit
ERROR 19:39:39,695 Exception in thread Thread[ReadStage:134,5,main]
java.lang.OutOfMemoryError: Java heap space
ERROR 19:39:39,724 Exception in thread Thread[MutationStage:57,5,main]
java.lang.OutOfMemoryError: Java heap space
  at 
org.apache.cassandra.utils.FBUtilities.hashToBigInteger(FBUtilities.java:213)
  at 
org.apache.cassandra.dht.RandomPartitioner.getToken(RandomPartitioner.java:154)
  at 
org.apache.cassandra.dht.RandomPartitioner.decorateKey(RandomPartitioner.java:47)
  at org.apache.cassandra.db.RowPosition.forKey(RowPosition.java:54)

Any help is highly appreciated.  It would be great to tune this so that I 
can keep a moving window of 10 days in Cassandra while dropping the old data… 
Or, if there is any other recommended way to deal with such sliding time 
windows, I am open to ideas.
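For what it's worth, a rough sketch of the partitioning approach mentioned
above, in cassandra-cli (names hypothetical): the application writes each
day's data to that day's column family and reads fan out over the ten live
ones, so expiring a day becomes a cheap schema operation instead of 60GB of
tombstones:

create column family logs_20120626 with comparator = UTF8Type;
drop column family logs_20120616;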

Thank you for your help!


Multi datacenter, WAN hiccups and replication

2012-06-26 Thread Karthik N
My Cassandra ring spans two DCs. I use local quorum with replication
factor=3. I do a write in DC1 with local quorum. Data gets written to
multiple nodes in DC1. For the same write to propagate to DC2 only one
copy is sent from the coordinator node in DC1 to a coordinator node in
DC2 for optimizing traffic over the WAN (from what I have read in the
Cassandra documentation)

Will a WAN hiccup result in a Hinted Handoff (HH) being created on
DC1's coordinator for DC2, to be delivered when the WAN link is up
again?


Re: Multi datacenter, WAN hiccups and replication

2012-06-26 Thread Mohit Anchlia
On Tue, Jun 26, 2012 at 7:52 AM, Karthik N  wrote:

> [snip]

I have seen hinted handoff messages in the log files when the remote DC is
unreachable. But this mechanism is only used for the time window defined in
the cassandra.yaml file.
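The relevant cassandra.yaml settings (the values shown are, I believe, the
defaults around 1.1; check your version):

hinted_handoff_enabled: true
max_hint_window_in_ms: 3600000  # stop collecting hints after an hour of outage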


Re: Request Timeout with Composite Columns and CQL3

2012-06-26 Thread Sylvain Lebresne
On Tue, Jun 26, 2012 at 4:00 PM, Henning Kropp  wrote:
> Thanks for the reply. Should have thought about looking into the log files 
> sooner. An AssertionError happens at execution. I haven't figured out yet 
> why. Any input is very much appreciated:
>
> ERROR [ReadStage:1] 2012-06-26 15:49:54,481 AbstractCassandraDaemon.java 
> (line 134) Exception in thread Thread[ReadStage:1,5,main]
> java.lang.AssertionError: Added column does not sort as the last column
> [snip]

Obviously that shouldn't happen. You didn't happen to change the
comparator for the column family, or something like that, from the
Hector side?
Are you able to reproduce from a blank DB?

--
Sylvain

> [snip]

Re: Multi datacenter, WAN hiccups and replication

2012-06-26 Thread Karthik N
Since Cassandra optimizes and sends only one copy over the WAN, can I opt
in to HH only for WAN replication and avoid HH for the local quorum
(since I know I have more local copies)?

On Tuesday, June 26, 2012, Mohit Anchlia wrote:

> [snip]



-- 
Thanks, Karthik


Re: Multi datacenter, WAN hiccups and replication

2012-06-26 Thread Mohit Anchlia
On Tue, Jun 26, 2012 at 8:16 AM, Karthik N  wrote:

> Since Cassandra optimizes and sends only one copy over the WAN, can I opt
> in only for HH for WAN replication and avoid HH for the local quorum?
> (since I know I have more copies)
>
>
>
I am not sure I understand your question. In general I don't think you
can selectively decide on HH. Besides, HH should only be used when the
outage lasts minutes; for longer outages, using HH would only create memory
pressure.

>  On Tuesday, June 26, 2012, Mohit Anchlia wrote:
>> [snip]


Re: Consistency Problem with Quorum consistencyLevel configuration

2012-06-26 Thread Jason Tang
Hi
  After enabling the Cassandra debug log, I got the following output; it
shows the delete mutation being sent to the other two nodes rather than the
local node.
  Then the read command comes to the local node.
  And the local one finds the mismatch.
  But I don't know why the local node returns the local dirty data. Isn't it
supposed to repair the data and return the correct one?

192.168.0.6:
DEBUG [MutationStage:61] 2012-06-26 23:09:00,036
RowMutationVerbHandler.java (line 60) RowMutation(keyspace='drc',
key='33323130537570657254616e6730', modifications=[ColumnFamily(queue
-deleted at 1340723340044000- [])]) applied.  Sending response to 3555@/
192.168.0.5

192.168.0.4:
DEBUG [MutationStage:40] 2012-06-26 23:09:00,041
RowMutationVerbHandler.java (line 60) RowMutation(keyspace='drc',
key='33323130537570657254616e6730', modifications=[ColumnFamily(queue
-deleted at 1340723340044000- [])]) applied.  Sending response to 3556@/
192.168.0.5

192.168.0.5 (local one):
DEBUG [pool-2-thread-20] 2012-06-26 23:09:00,105 StorageProxy.java (line
705) Digest mismatch: org.apache.cassandra.service.DigestMismatchException:
Mismatch for key DecoratedKey(7649972972837658739074639933581556,
33323130537570657254616e6730) (b20ac6ec0d29393d70e200027c094d13 vs
d41d8cd98f00b204e9800998ecf8427e)



2012/6/25 Jason Tang 

> Hi
>
> I hit a consistency problem when we have Quorum for both read and
> write.
>
> I use MultigetSubSliceQuery to query rows from a super column, limited to
> size 100, then read them, then delete them, and start another round.
>
> But I found that a row which should have been deleted by the last query
> was still returned by the next round's query.
>
> Also, in a normal Column Family, I updated the value of one column
> from status='FALSE' to status='TRUE', and the next time I queried it, the
> status was still 'FALSE'.
>
> More detail:
>
>- It does not happen every time (about 1 in 10,000)
>- The time between two rounds of queries is around 500 ms (but we found
>queries that ran 2 seconds after the first one and still hit this
>consistency problem)
>- We use ntp as our cluster time synchronization solution.
>- We have 6 nodes, and the replication factor is 3
>
> Some people say Cassandra is expected to have such problems, because a
> read may be processed before an earlier write inside Cassandra. But for two
> seconds?! And if so, it is meaningless to have Quorum or other consistency
> level configurations.
>
>So first of all, is this the correct behavior for Cassandra, and if
> not, what data do we need to analyze for further investigation?
>
> BRs
> Ares
>


Re: Multi datacenter, WAN hiccups and replication

2012-06-26 Thread Karthik N
Let me attempt to articulate my question a little better.

Say I choose LOCAL_QUORUM with a Replication Factor of 3. Cassandra
stores three copies in my local datacenter. Therefore the cost
associated with "losing" one node is not very high locally, and I
usually disable HH, and use read repair/nodetool repair instead.

However over the WAN network blips are quite normal and HH really
helps. More so because for WAN replication Cassandra sends only one
copy to a coordinator in the remote datacenter.

Therefore I was wondering if Cassandra already intelligently optimizes
for HH-over-WAN (since this is common) or alternately if there's a way
to enable HH for WAN replication?

Thank you.

On Tue, Jun 26, 2012 at 9:22 AM, Mohit Anchlia  wrote:
> [snip]


Re: Multi datacenter, WAN hiccups and replication

2012-06-26 Thread Karthik N
I re-read my last post and didn't think I had done a good job articulating.

Sorry! I'll try again...

Say I choose LOCAL_QUORUM with a Replication Factor of 3. Cassandra
stores three copies in my local datacenter. Therefore the cost
associated with "losing" one node is not very high locally, and I
usually disable HH, and use read repair/nodetool repair instead.

However over the WAN, network blips are quite normal and HH really
helps. More so because for WAN replication Cassandra sends only one
copy to a coordinator in the remote datacenter, and it's rather vital
for that copy to make it over to keep the two datacenters in sync.

Therefore I was wondering if Cassandra already intelligently special-cases
HH-over-WAN (since this is common) even if HH is disabled, or alternately
if there's a way to enable HH for WAN replication only while disabling it
for the local quorum?

Thank you.
Thanks, Karthik


On Tue, Jun 26, 2012 at 10:14 AM, Karthik N  wrote:
> [snip]


Re: cassandra 1.0.9 error - "Read an invalid frame size of 0"

2012-06-26 Thread Guy Incognito

I have seen this as well; is it a known issue?

On 18/06/2012 19:38, Gurpreet Singh wrote:


I found a fix for this one, or rather a workaround.

I changed the rpc_server_type in cassandra.yaml from hsha to sync, 
and the error went away. I guess there is some issue with the Thrift 
nonblocking server.
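In cassandra.yaml, that change is just:

rpc_server_type: sync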


Thanks
Gurpreet

On Wed, May 16, 2012 at 7:04 PM, Gurpreet Singh  wrote:


Thanks Aaron. will do!


On Mon, May 14, 2012 at 1:14 PM, aaron morton  wrote:

Are you using framed transport on the client side ?

Try the Hector user list for hector specific help
https://groups.google.com/forum/?fromgroups#!searchin/hector-users


Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 12/05/2012, at 5:44 AM, Gurpreet Singh wrote:


This is hampering our testing of cassandra a lot, and our
move to cassandra 1.0.9.
Has anyone seen this before? Should I be trying a different
version of cassandra?

/G

On Thu, May 10, 2012 at 11:29 PM, Gurpreet Singh  wrote:

Hi,
I have created a 1-node cluster of Cassandra 1.0.9. I am
setting this up for testing reads/writes.

I am seeing the following error in the server system.log

ERROR [Selector-Thread-7] 2012-05-10 22:44:02,607
TNonblockingServer.java (line 467) Read an invalid frame
size of 0. Are you using TFramedTransport on the client
side?

Initially I was using an old Hector 0.7.x, but even after
switching to Hector 1.0-5 and Thrift version 0.6.1, I
still see this error.
I am using 20 threads writing/reading from Cassandra. The
max write batch size is 10, with a constant payload size of
600 bytes per key.

On the client side, I see Hector exceptions happening,
coinciding with these messages on the server.

Any ideas why these errors are happening?

Thanks
Gurpreet











Ball is rolling on High Performance Cassandra Cookbook second edition

2012-06-26 Thread Edward Capriolo
Hello all,

It has not been very long since the first book was published but
several things have been added to Cassandra and a few things have
changed. I am putting together a list of changed content, for example
features like the old per-column-family memtable flush settings versus
the new system with a global setting.

My editors have given me the green light to grow the second edition
from ~200 pages currently up to 300 pages! This gives us the ability
to add more items/sections to the text.

Some things were missing from the first edition, such as Hector
support. Nate has offered to help me in this area. Please feel free to
contact me with any ideas and suggestions for recipes you would like to
see in the book. Also get in touch if you want to write a recipe. Several
people added content to the first edition and it would be great to see
that type of participation again.

Thank you,
Edward


Amazingly bad compaction performance

2012-06-26 Thread Dustin Wenz
We occasionally see fairly poor compaction performance on random nodes in our 
7-node cluster, and I have no idea why. This is one example from the log:

[CompactionExecutor:45] 2012-06-26 13:40:18,721 CompactionTask.java 
(line 221) Compacted to 
[/raid00/cassandra_data/main/basic/main-basic.basic_id_index-hd-160-Data.db,].  
26,632,210 to 26,679,667 (~100% of original) bytes for 2 keys at 0.006250MB/s.  
Time: 4,071,163ms.

That particular event took over an hour to compact only 25 megabytes. During 
that time, there was very little disk IO, and the java process (OpenJDK 7) was 
pegged at 200% CPU. The node was also completely unresponsive to network 
requests until the compaction was finished. Most compactions run just over 
7MB/s. This is an extreme outlier, but users definitely notice the hit when it 
occurs.

I grabbed a sample of the process using jstack, and this was the only thread in 
CompactionExecutor:

"CompactionExecutor:54" daemon prio=1 tid=41247522816 nid=0x99a5ff740 
runnable [140737253617664]
   java.lang.Thread.State: RUNNABLE
at org.xerial.snappy.SnappyNative.rawCompress(Native Method)
at org.xerial.snappy.Snappy.rawCompress(Snappy.java:358)
at 
org.apache.cassandra.io.compress.SnappyCompressor.compress(SnappyCompressor.java:80)
at 
org.apache.cassandra.io.compress.CompressedSequentialWriter.flushData(CompressedSequentialWriter.java:89)
at 
org.apache.cassandra.io.util.SequentialWriter.flushInternal(SequentialWriter.java:196)
at 
org.apache.cassandra.io.util.SequentialWriter.reBuffer(SequentialWriter.java:260)
at 
org.apache.cassandra.io.util.SequentialWriter.writeAtMost(SequentialWriter.java:128)
at 
org.apache.cassandra.io.util.SequentialWriter.write(SequentialWriter.java:112)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
- locked <36527862064> (a java.io.DataOutputStream)
at 
org.apache.cassandra.db.compaction.PrecompactedRow.write(PrecompactedRow.java:142)
at 
org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:156)
at 
org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:159)
at 
org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at 
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)

Is it possible that there is an issue with Snappy compression? Based on the 
lousy compression ratio, I think we could get by without it just fine. Can 
compression be changed or disabled on-the-fly with Cassandra?
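From what I can tell, compression is a per-column-family schema setting, so
it should be changeable live; a sketch with cassandra-cli, assuming this
syntax (please verify against your version's docs), and keeping in mind that
already-written sstables stay compressed until they are rewritten, e.g. by
compaction or nodetool scrub:

update column family basic with compression_options = null;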

- .Dustin

bulk load problem

2012-06-26 Thread James Pirz
Dear all,

I am trying to use "sstableloader" in cassandra 1.1.1, to bulk load some
data into a single node cluster.
I am running the following command:

bin/sstableloader -d 192.168.100.1 /data/ssTable/tpch/tpch/

from "another" node (other than the node on which cassandra is running),
while the data should be loaded into a keyspace named "tpch". I made sure
that the 2nd node, from which I run sstableloader, have the same copy of
cassandra.yaml as the destination node.
I have put

tpch-cf0-hd-1-Data.db
tpch-cf0-hd-1-Index.db

under the path, I have passed to sstableloader.

But I am getting the following error:

Could not retrieve endpoint ranges:

Any hint ?

Thanks in advance,

James


Re: Migrate keyspace from version 1.0.8 to 1.1.1

2012-06-26 Thread aaron morton
There is nothing listed in the News file 
https://github.com/apache/cassandra/blob/cassandra-1.1/NEWS.txt

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 26/06/2012, at 3:16 AM, Thierry Templier wrote:

> Hello,
> 
> What is the correct way to migrate a keyspace version 1.0.8 to 1.1.1? Is 
> there a documentation on this subject?
> 
> Thanks for your help.
> Thierry



Re: Amazingly bad compaction performance

2012-06-26 Thread Igor

Hello

Too much GC? Check JVM heap settings and real usage.

On 06/27/2012 01:37 AM, Dustin Wenz wrote:

[snip]




Re: Amazingly bad compaction performance

2012-06-26 Thread Derek Andree
Last I heard, only Oracle's JDK was officially supported with Cassandra.
Possibly nitpicky, but is this still the case?

On Jun 26, 2012, at 3:37 PM, Dustin Wenz wrote:

> (OpenJDK 7) was pegged at 200% CPU