Re: Composite Column

2012-05-17 Thread samal
Cassandra's data model is best known for denormalized data. You must have an
entry for the column name in the other column family.

The example is just for illustration; you should not put everything in a single
row, that is very inefficient.

I do not use composite columns. What I (and others) do is use complex
column names.

rowkey => username
columns => subsets of user data

employee1={
   name:
.
.
   previous_job::xxx_company:null
   previous_job::yyy_company:null
}
employee2={
   name:
.
.
   previous_job::aaa_company:null
   previous_job::yyy_company:null
}

Here you get the entire row (since the row size is small) and filter the
related details by the marker [*previous_job::* is the marker here and
*xxx_company* is the real value which we need; the column value is not
required (that depends on the requirement)].
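
For illustration, here is a minimal, hedged sketch of that marker-prefix read
using Astyanax (the Java client Abhijit mentions below). The cluster, keyspace
and column family names are hypothetical; adjust them to your schema:

import com.netflix.astyanax.AstyanaxContext;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
import com.netflix.astyanax.model.Column;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ColumnList;
import com.netflix.astyanax.serializers.StringSerializer;
import com.netflix.astyanax.thrift.ThriftFamilyFactory;

public class MarkerRead {
    public static void main(String[] args) throws Exception {
        // Standard Astyanax boilerplate; names are hypothetical.
        AstyanaxContext<Keyspace> ctx = new AstyanaxContext.Builder()
            .forCluster("Test Cluster")
            .forKeyspace("MyKeyspace")
            .withAstyanaxConfiguration(new AstyanaxConfigurationImpl())
            .withConnectionPoolConfiguration(
                new ConnectionPoolConfigurationImpl("pool")
                    .setPort(9160)
                    .setMaxConnsPerHost(1)
                    .setSeeds("127.0.0.1:9160"))
            .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
            .buildKeyspace(ThriftFamilyFactory.getInstance());
        ctx.start();
        Keyspace keyspace = ctx.getEntity();

        // CF with UTF8 row keys and UTF8 column names.
        ColumnFamily<String, String> CF_EMPLOYEES = new ColumnFamily<String, String>(
            "employees", StringSerializer.get(), StringSerializer.get());

        // Slice only the columns whose names start with the marker prefix.
        ColumnList<String> cols = keyspace.prepareQuery(CF_EMPLOYEES)
            .getKey("employee1")
            .withColumnRange("previous_job::", "previous_job::\uffff", false, 1000)
            .execute().getResult();
        for (Column<String> c : cols) {
            // Strip the marker to recover the real value (e.g. "xxx_company").
            System.out.println(c.getName().substring("previous_job::".length()));
        }
    }
}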

There is a very good presentation by the DataStax folks,
http://www.datastax.com/2011/07/video-data-modeling-workshop-from-cassandra-sf-2011
and one by Joe,
http://www.youtube.com/watch?v=EBjWlH4NPMA ; they will help you understand the
data model.

@samalgorai
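
Abhijit's follow-up below asks how to retrieve, say, employee1's name when
composite column names are used instead. Here is a minimal sketch under
assumed names (the EmpField class and "empByComposite" column family are
hypothetical; keyspace is set up as in the sketch above). The CF is assumed
to be declared with comparator=CompositeType(UTF8Type, UTF8Type):

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.annotations.Component;
import com.netflix.astyanax.model.Column;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.serializers.AnnotatedCompositeSerializer;
import com.netflix.astyanax.serializers.StringSerializer;

public class CompositeRead {
    // Composite column name: (employee id, field name).
    public static class EmpField {
        @Component(ordinal = 0) public String employee;
        @Component(ordinal = 1) public String field;
        public EmpField() {}
        public EmpField(String e, String f) { employee = e; field = f; }
    }

    static final AnnotatedCompositeSerializer<EmpField> SER =
        new AnnotatedCompositeSerializer<EmpField>(EmpField.class);
    static final ColumnFamily<String, EmpField> CF = new ColumnFamily<String, EmpField>(
        "empByComposite", StringSerializer.get(), SER);

    public static void run(Keyspace keyspace) throws Exception {
        // Write: employee1's name is "Smith".
        MutationBatch m = keyspace.prepareMutationBatch();
        m.withRow(CF, "empKey")
            .putColumn(new EmpField("employee1", "name"), "Smith", null);
        m.execute();

        // Read back exactly that one composite column.
        Column<EmpField> col = keyspace.prepareQuery(CF)
            .getKey("empKey")
            .getColumn(new EmpField("employee1", "name"))
            .execute().getResult();
        System.out.println(col.getStringValue()); // "Smith"

        // Or slice all of employee1's columns by composite prefix.
        keyspace.prepareQuery(CF)
            .getKey("empKey")
            .withColumnRange(SER.buildRange().withPrefix("employee1").build())
            .execute();
    }
}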

On Thu, May 17, 2012 at 12:29 PM, Abhijit Chanda
wrote:

> Samal,
>
> Thanks buddy for interpreting. Now suppose I am inserting data into a column
> family using this data model dynamically; as a result the column names will be
> dynamic. Now consider there is an entry for *employee1* *name*d "Smith",
> and I want to retrieve that value. How do I do that?
>
> Regards,
> Abhijit
>
> On Thu, May 17, 2012 at 12:03 PM, samal  wrote:
>
>> It is like using your super column name inside the column name.
>>
>> empKey{
>>   employee1+name:XX,
>>   employee1+addr:X,
>>   employee2+name:X,
>>   employee2+addr:X
>> }
>>
>> Here all of your employee details are attached to one domain, i.e. all of
>> employee1's details will be *"employee1+[anything, for n numbers of
>> columns]"*
>>
>> comparator=CompositeType(UTF8Type, UTF8Type, ...)
>>
>> /Samal
>>
>> On Thu, May 17, 2012 at 10:40 AM, Abhijit Chanda <
>> abhijit.chan...@gmail.com> wrote:
>>
>>> Aaron,
>>>
>>> Actually Aaron i am looking for a scenario on super columns being
>>> replaced by composite column.
>>> Say this is a data model using super column
>>> rowKey{
>>>   superKey1 {
>>> Name,
>>> Address,
>>> City,.
>>>   }
>>> }
>>>
>>> Actually i am having confusion how exactly the data model will look if
>>> we use composite column instead of super column.
>>>
>>> Thanks,
>>> Abhijit
>>>
>>>
>>>
>>> On Wed, May 16, 2012 at 2:56 PM, aaron morton 
>>> wrote:
>>>
 Abhijit,
 Can you explain the data model a bit more.

 Cheers

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 15/05/2012, at 10:32 PM, samal wrote:

 It is just a column with a JSON value

 On Tue, May 15, 2012 at 4:00 PM, samal  wrote:

> I have not used CC but yes, you can.
> Below is not a composite column. It is just a column with a JSON hash
> value. The column value can be anything you like.
> Data inside the value is not indexed.
>
>
> On Tue, May 15, 2012 at 9:27 AM, Abhijit Chanda <
> abhijit.chan...@gmail.com> wrote:
>
>> Is it possible to create this data model with the help of composite
>> column.
>>
>> User_Keys_By_Last_Name = {
>>   "Engineering" : { ("anderson", 1) : "ac1263", ("anderson", 2) : "724f02", ... },
>>   "Sales" : { ("adams", 1) : "b32704", ("alden", 1) : "1553bd", ... },
>> }
>>
>> I am using Astyanax. Please suggest...
>> --
>> Abhijit Chanda
>> Software Developer
>> VeHere Interactive Pvt. Ltd.
>> +91-974395
>>
>>
>


>>>
>>>
>>> --
>>> Abhijit Chanda
>>> Software Developer
>>> VeHere Interactive Pvt. Ltd.
>>> +91-974395
>>>
>>>
>>
>
>
> --
> Abhijit Chanda
> Software Developer
> VeHere Interactive Pvt. Ltd.
> +91-974395
>
>


RE: cassandra read latency help

2012-05-17 Thread Viktor Jevdokimov
> Gurpreet Singh wrote:
> Any ideas on what could help here bring down the read latency even more ?

Avoid Cassandra forwarding requests to other nodes:
- Use consistency level ONE;
- Create a data model that allows a single request with a single key, since different keys 
may belong to different nodes and require forwarding requests to them;
- Use a smart client to calculate the token for a key and select the appropriate node 
(primary or replica) by token range (see the sketch after this list);
- Turn off the Dynamic Snitch (it may forward a request to another replica even if this node 
has the data);
- Have all or hot data in the page cache (no HDD disk IO) or use SSDs;
- If you do regular updates to keys, do not use the row cache; otherwise you may try it.
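
As an illustration of the smart-client point above, here is a minimal sketch
of computing a token client-side. It mirrors Cassandra's md5-based
RandomPartitioner hashing (absolute value of the MD5 digest as a BigInteger);
it assumes RandomPartitioner and is not an official client API:

import java.math.BigInteger;
import java.security.MessageDigest;

public class TokenCalc {
    // RandomPartitioner token: abs(MD5(key)) as a BigInteger.
    public static BigInteger token(byte[] key) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        return new BigInteger(md5.digest(key)).abs();
    }

    public static void main(String[] args) throws Exception {
        BigInteger t = token("employee1".getBytes("UTF-8"));
        // Route the request to the node whose range covers t, i.e. the node
        // owning the first ring token >= t (wrapping around past the last one).
        System.out.println(t);
    }
}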




Best regards / Pagarbiai

Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063
Fax: +370 5 261 0453

J. Jasinskio 16C,
LT-01112 Vilnius,
Lithuania





Re: Inconsistent dependencies

2012-05-17 Thread Sylvain Lebresne
On Thu, May 17, 2012 at 2:14 AM, Rob Coli  wrote:
> On Tue, Apr 24, 2012 at 12:56 PM, Matthias Pfau  wrote:
>> we just noticed that cassandra is currently published with inconsistent
>> dependencies. The inconsistencies exist between the published pom and the
>> published distribution (tar.gz). I compared hashes of the libs of several
>> versions and the inconsistencies are different each time. However, I have
>> not found a single cassandra release without inconsistencies.
>
> Was there ever any answer to this question or resolution to this issue?

https://issues.apache.org/jira/browse/CASSANDRA-4183


>
> If not, I suggest that Matthias file a JIRA ticket on the Apache
> Cassandra JIRA.
>
> =Rob
>
> --
> =Robert Coli
> AIM>ALK - rc...@palominodb.com
> YAHOO - rcoli.palominob
> SKYPE - rcoli_palominodb


Exception when truncate

2012-05-17 Thread ruslan usifov
Hello

I have the following situation on our test server:

From cassandra-cli I try to run

truncate purchase_history;

Three times I got:

[default@township_6waves] truncate purchase_history;
null
UnavailableException()
at 
org.apache.cassandra.thrift.Cassandra$truncate_result.read(Cassandra.java:20212)
at 
org.apache.cassandra.thrift.Cassandra$Client.recv_truncate(Cassandra.java:1077)
at 
org.apache.cassandra.thrift.Cassandra$Client.truncate(Cassandra.java:1052)
at 
org.apache.cassandra.cli.CliClient.executeTruncate(CliClient.java:1445)
at 
org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:272)
at 
org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:220)
at org.apache.cassandra.cli.CliMain.main(CliMain.java:348)


So it looks like the truncate is very slow, taking longer than
rpc_timeout_in_ms: 1 (this can happen because we have a very slow
disk on the test machine).

But in the cassandra system log I see the following exception:


ERROR [MutationStage:7022] 2012-05-17 12:19:14,356
AbstractCassandraDaemon.java (line 139) Fatal exception in thread
Thread[MutationStage:7022,5,main]
java.io.IOError: java.io.IOException: unable to mkdirs
/home/cassandra/1.0.0/data/township_6waves/snapshots/1337242754356-purchase_history
at 
org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1433)
at 
org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1462)
at 
org.apache.cassandra.db.ColumnFamilyStore.truncate(ColumnFamilyStore.java:1657)
at 
org.apache.cassandra.db.TruncateVerbHandler.doVerb(TruncateVerbHandler.java:50)
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: unable to mkdirs
/home/cassandra/1.0.0/data/township_6waves/snapshots/1337242754356-purchase_history
at 
org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:140)
at 
org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:131)
at 
org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1409)
... 7 more


Also I see that the 1337242754356-purchase_history directory already exists
in the snapshots dir, so I think that the snapshot names cassandra generates
are not unique.

PS: We use cassandra 1.0.10 on Ubuntu 10.04 LTS


Re: Exception when truncate

2012-05-17 Thread ruslan usifov
Also I don't understand why truncate heavily loads the disk on an
empty CF (no SSTables at all)?

2012/5/17 ruslan usifov :
> Hello
>
> I have follow situation on our test server:
>
> from cassandra-cli i try to use
>
> truncate purchase_history;
>
> 3 times i got:
>
> [default@township_6waves] truncate purchase_history;
> null
> UnavailableException()
>        at 
> org.apache.cassandra.thrift.Cassandra$truncate_result.read(Cassandra.java:20212)
>        at 
> org.apache.cassandra.thrift.Cassandra$Client.recv_truncate(Cassandra.java:1077)
>        at 
> org.apache.cassandra.thrift.Cassandra$Client.truncate(Cassandra.java:1052)
>        at 
> org.apache.cassandra.cli.CliClient.executeTruncate(CliClient.java:1445)
>        at 
> org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:272)
>        at 
> org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:220)
>        at org.apache.cassandra.cli.CliMain.main(CliMain.java:348)
>
>
> So this looks that truncate goes very slow and too long, than
> rpc_timeout_in_ms: 1 (this can happens because we have very slow
> disck on test machine)
>
> But in in cassandra system log i see follow exception:
>
>
> ERROR [MutationStage:7022] 2012-05-17 12:19:14,356
> AbstractCassandraDaemon.java (line 139) Fatal exception in thread
> Thread[MutationStage:7022,5,main]
> java.io.IOError: java.io.IOException: unable to mkdirs
> /home/cassandra/1.0.0/data/township_6waves/snapshots/1337242754356-purchase_history
>        at 
> org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1433)
>        at 
> org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1462)
>        at 
> org.apache.cassandra.db.ColumnFamilyStore.truncate(ColumnFamilyStore.java:1657)
>        at 
> org.apache.cassandra.db.TruncateVerbHandler.doVerb(TruncateVerbHandler.java:50)
>        at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>        at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.IOException: unable to mkdirs
> /home/cassandra/1.0.0/data/township_6waves/snapshots/1337242754356-purchase_history
>        at 
> org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:140)
>        at 
> org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:131)
>        at 
> org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1409)
>        ... 7 more
>
>
> Also i see that in snapshort dir already exists
> 1337242754356-purchase_history directory, so i think that snapshort
> names that generate cassandra not uniquely.
>
> PS: We use cassandra 1.0.10 on Ubuntu 10.0.4-LTS


RE: Exception when truncate

2012-05-17 Thread Viktor Jevdokimov
Truncate flushes all memtables to free up commit logs, and it does that on all nodes, 
so it takes time. This was discussed on this list not long ago.

Watch for:
https://issues.apache.org/jira/browse/CASSANDRA-3651
https://issues.apache.org/jira/browse/CASSANDRA-4006



Best regards / Pagarbiai

Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063
Fax: +370 5 261 0453

J. Jasinskio 16C,
LT-01112 Vilnius,
Lithuania



> -Original Message-
> From: ruslan usifov [mailto:ruslan.usi...@gmail.com]
> Sent: Thursday, May 17, 2012 13:06
> To: user@cassandra.apache.org
> Subject: Re: Exception when truncate
>
> Also I don't understand why truncate heavily loads the disk on an
> empty CF (no SSTables at all)?
>
> 2012/5/17 ruslan usifov :
> [...]


Matthew Dennis's "Cassandra On EC2"

2012-05-17 Thread Tamar Fraenkel
Hi!

I found the slides of the lecture
http://www.slideshare.net/mattdennis/cassandra-on-ec2
I wonder if there is a way to get a video of the lecture.
Thanks,

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956

Re: Exception when truncate

2012-05-17 Thread ruslan usifov
Maybe something changed in the cassandra 1.0.x truncate mechanism,
because in cassandra 0.8 truncate executed much faster on the same
data

2012/5/17 Viktor Jevdokimov :
> Truncate flushes all memtables to free up commit logs, and that on all nodes. 
> So this takes time. Discussed on this list not so long ago.
>
> Watch for:
> https://issues.apache.org/jira/browse/CASSANDRA-3651
> https://issues.apache.org/jira/browse/CASSANDRA-4006
>
> [...]


Re: 1.0.6 -> 1.1.0 nodetool ownership report, and other anomalies

2012-05-17 Thread Eric Evans
On Wed, May 16, 2012 at 5:08 PM, Ron Siemens  wrote:
>
> I upgraded from 1.0.6 to 1.1.0, and I noticed the effective ownership report 
> changed.
>
> I have a 3-node cluster, with evenly divided tokens and RF=2.  The node tool 
> report on 1.0.6 was:
>
> 33.33%  0
> 33.33%  56713727820156410577229101238628035243
> 33.33%  113427455640312821154458202477256070485
>
> Under 1.1.0 it is
>
> 66.67%  0
> 66.67%  56713727820156410577229101238628035243
> 66.67%  113427455640312821154458202477256070485
>
> Does the updated reporting in 1.1.0 include the replicated data and before it 
> didn't?

Yes. With RF=2 each node holds its own range plus a replica of its
neighbor's, so effective ownership is 2 x 33.33% = 66.67%.

> As long as I'm posting, I'll report other anomalies that I witnessed but 
> overcame.
>
> I eventually gave up trying to do a live rolling update because it complained 
> certain column families didn't exist.  Like this:  This doesn't appear to be 
> any CF I am using.
>
> ERROR [ReadStage:69680] 2012-05-02 01:54:14,995 AbstractCassandraDaemon.java 
> (line 133) Fatal exception
> in thread Thread[ReadStage:69680,5,main]
> java.lang.RuntimeException: java.lang.IllegalArgumentException: Unknown CF 
> 1851
>        at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1238)
>        at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>        at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>        at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.IllegalArgumentException: Unknown CF 1851
>        at org.apache.cassandra.db.Table.getColumnFamilyStore(Table.java:167)
>        at org.apache.cassandra.db.Table.getColumnFamilyStore(Table.java:160)
>        at org.apache.cassandra.db.Table.getRow(Table.java:374)
>        at 
> org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:58)
>        at 
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:766
> )
>        at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1234)
>        ... 3 more
>
> I tried repeatedly after various nodetool incantations but never succeeded,
> although I know it -can- work.  It worked at least once, but I had to
> roll back to 1.0.6, because I found 1.1.0 was using the new
> SnappyCompression.  But my target OS is Solaris, and there are only 3 included
> native libraries in the jar for SnappyCompression - Solaris not included.  So I had
> to update all my column family creation code to explicitly set compression to
> the previous JavaDeflate default.  That new default was annoying.

It doesn't ring a bell; for something like this, it would be great to
have a ticket with details on your environment, steps to reproduce,
the full output of the logs, and where possible, the data.

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu


Re: Exception when truncate

2012-05-17 Thread Jeremy Hanna
when doing a truncate, it has to talk to all of the nodes in the ring to 
perform the operation.  by the error, it looks like one of the nodes was 
unreachable for some reason.  you might do a nodetool ring or, in the cli, do a 
'describe cluster;' and see if your ring is okay.

So I think the operation is just as fast; it just looks like it times out (20 
seconds or something) when trying to perform the command against all of the 
nodes in the cluster.

On May 17, 2012, at 9:36 AM, ruslan usifov wrote:

> Maybe something changed in the cassandra 1.0.x truncate mechanism,
> because in cassandra 0.8 truncate executed much faster on the same
> data
>
> [...]

Re: Matthew Dennis's "Cassandra On EC2"

2012-05-17 Thread Jeremy Hanna
Sorry - it was at the Austin cassandra meetup and we didn't record the 
presentation.  I wonder if this would be a popular topic to have at the 
upcoming Cassandra SF event which would be recorded...

On May 17, 2012, at 6:51 AM, Tamar Fraenkel wrote:

> Hi!
> 
> I found the slides of the lecture  
> http://www.slideshare.net/mattdennis/cassandra-on-ec2 
> I wonder if there is a way to get a video of the lecture.
> Thanks,
> 
> Tamar Fraenkel 
> Senior Software Engineer, TOK Media 
> 
> 
> 
> ta...@tok-media.com
> Tel:   +972 2 6409736 
> Mob:  +972 54 8356490 
> Fax:   +972 2 5612956 
> 
> 
> 



Zurich / Swiss / Alps meetup

2012-05-17 Thread Sasha Dolgy
All,

A year ago I made a simple query to see if there were any users based in
and around Zurich, Switzerland or the Alps region, interested in
participating in some form of Cassandra User Group / Meetup.  At the time,
1-2 replies happened.  I didn't do much with that.

Let's try this again.  Who all is interested?  I am often jealous of all
the fun I miss out on with the regular meetups that happen stateside ...

Regards,
-sd

-- 
Sasha Dolgy
sasha.do...@gmail.com


Re: Matthew Dennis's "Cassandra On EC2"

2012-05-17 Thread Sasha Dolgy
Although, probably inappropriate, I would be willing to contribute some
funds for someone to recreate it with animated stick-figures.

thanks. ;)

On Thu, May 17, 2012 at 6:02 PM, Jeremy Hanna wrote:

> Sorry - it was at the austin cassandra meetup and we didn't record the
> presentation.  I wonder if this would be a popular topic to have at the
> upcoming Cassandra SF event which would be recorded...
>
>


Re: Matthew Dennis's "Cassandra On EC2"

2012-05-17 Thread Tamar Fraenkel
I think the topic is very interesting :)
I can't attend the SF event (as I am in Israel) and will appreciate a video!
Thanks,
*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Thu, May 17, 2012 at 7:20 PM, Sasha Dolgy  wrote:

> Although, probably inappropriate, I would be willing to contribute some
> funds for someone to recreate it with animated stick-figures.
>
> thanks. ;)
>
>
> On Thu, May 17, 2012 at 6:02 PM, Jeremy Hanna 
> wrote:
>
>> Sorry - it was at the austin cassandra meetup and we didn't record the
>> presentation.  I wonder if this would be a popular topic to have at the
>> upcoming Cassandra SF event which would be recorded...
>>
>>

Re: Data modeling for read performance

2012-05-17 Thread Aaron Turner
On Thu, May 17, 2012 at 8:55 AM, jason kowalewski
 wrote:
> We have been attempting to change our data model to provide more
> performance in our cluster.
>
> Currently there are a couple ways to model the data and i was
> wondering if some people out there could help us out.
>
> We are storing time-series data currently keyed by a user id. This
> current approach is leading to some hot-spotting of nodes likely due
> to the key distribution not being representative of the usage pattern.
> Currently we are using super columns (the super column name is the
> timestamp), which we intend to dispose of as well with this datamodel
> redesign.
>
> The first idea we had is that we can shard the data using composite row
> keys into time buckets:
>
> UserId:<time_bucket> : {
>   <timestamp>:<field> = <value>,
>   <timestamp>:<field> = <value>,
> ... and so on.
> }
>
> We can then use a wide row index for tracking these in the future:
> <index_row_key> : {
>   <UserId:time_bucket> = null
> }
>
> This first approach would always have the data be retrieved by the composite
> row key.
>
> Alternatively we could just do wide rows using composite columns:
>
> UserId : {
>   <timestamp>:<field> = <value>,
>   <timestamp>:<field> = <value>
>
> ... and so on
> }
>
>
> The second approach would have less granular keys, but is easier to group
> historical timeseries rather than sharding the data into buckets. This second
> approach also will depend solely on Range Slices of the columns to retrieve
> the data.
>
> Is there a speed advantage in doing a Row point get in the first approach vs
> range scans on these columns  in the second approach? In the first approach
> each bucket would have no more than 200 events. In the second approach we
> would expect the number of columns to be in the thousands to hundreds of
> thousands... Our reads currently (using supercolumns) are PAINFULLY slow -
> the cluster is constantly timing out on many nodes and disk i/o is very high.
>
> Also, instead of having each column name as a new composite column, is it
> better to serialize the multiple values into some format (json, binary, etc)
> to reduce the amount of disk seeks when paging over this timeseries data?
>
> Thanks for any ideas out there!


You didn't say what your queries look like, but the way I did it was:

<userid>|<statname>|<bucket> : {
   <unix_timestamp> = <value>
}

This provides very efficient reads for a given user/stat combination.
If I need to get multiple stats per user, I just use more threads on
the client side.  I'm not using composite row keys (it's just
AsciiType) as that can lead to hotspots on disk.  My timestamps are
also just plain unix epochs as that takes less space than something
like TimeUUID.
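
A minimal sketch of reading one such row over a time window with Astyanax
(column family, row key format, and window values are hypothetical; assumes
long column names holding unix epochs, and a keyspace set up as shown earlier
in this digest):

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.model.Column;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ColumnList;
import com.netflix.astyanax.serializers.LongSerializer;
import com.netflix.astyanax.serializers.StringSerializer;

public class TimeSeriesRead {
    // Ascii row keys like "user42|cpu_load|2012", long column names (unix epochs).
    static final ColumnFamily<String, Long> CF_TS = new ColumnFamily<String, Long>(
        "timeseries", StringSerializer.get(), LongSerializer.get());

    public static void run(Keyspace keyspace) throws Exception {
        long start = 1337212800L; // window start (unix epoch)
        long end   = 1337299200L; // window end
        ColumnList<Long> points = keyspace.prepareQuery(CF_TS)
            .getKey("user42|cpu_load|2012")
            .withColumnRange(start, end, false, 10000)
            .execute().getResult();
        for (Column<Long> c : points) {
            System.out.println(c.getName() + " = " + c.getDoubleValue());
        }
    }
}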



-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"


Re: cassandra read latency help

2012-05-17 Thread Gurpreet Singh
Thanks Viktor for the advice.
Right now, i just have 1 node that i am testing against and i am using CL
one.
Are you suggesting that the page cache might be doing better than the row
cache?
I am getting row cache hit of 0.66 right now.

/G

On Thu, May 17, 2012 at 12:26 AM, Viktor Jevdokimov <
viktor.jevdoki...@adform.com> wrote:

> > Gurpreet Singh wrote:
> > Any ideas on what could help here bring down the read latency even more ?
>
> Avoid Cassandra forwarding requests to other nodes:
> - Use consistency level ONE;
> - Create a data model that allows a single request with a single key, since different
> keys may belong to different nodes and require forwarding requests to them;
> - Use a smart client to calculate the token for a key and select the appropriate node
> (primary or replica) by token range;
> - Turn off the Dynamic Snitch (it may forward a request to another replica even if this
> node has the data);
> - Have all or hot data in the page cache (no HDD disk IO) or use SSDs;
> - If you do regular updates to keys, do not use the row cache; otherwise you
> may try it.
>
>
>
>
> Best regards / Pagarbiai
>
> Viktor Jevdokimov
> Senior Developer
>
> Email: viktor.jevdoki...@adform.com
> Phone: +370 5 212 3063
> Fax: +370 5 261 0453
>
> J. Jasinskio 16C,
> LT-01112 Vilnius,
> Lithuania
>
>
>
>


Connection Reset By Peer (CQL over SSH Tunnel)

2012-05-17 Thread Montgomery Burns
Hi,

I am trying to connect to a Cassandra (1.0.8 or 1.1.0) server with the
standard Thrift/CQL client, over an SSH tunnel, but keep getting this
exception:

org.apache.thrift.transport.TTransportException:
java.net.SocketException: Connection reset by peer: socket write error

The CQL query itself, as well as the Java code surrounding it is (or
at least should be) valid, since it runs fine when server and client
are both on the same machine, or at least in the same LAN.

The network setup is as follows:

Client machine with localhost-1, Internal-IP-1, Public-IP-A
SSH server machine with Internal-IP-2, Public-IP-B
Cassandra server machine with localhost-3, Internal-IP-3

The SSH server and the Cassandra server are both on the same LAN and
thus can communicate directly with their internal IPs (Internal-IP-2
to Internal-IP-3). The client machine connects to the SSH server
(Public-IP-A to Public-IP-B) and opens a tunnel forwarding the data
sent to localhost-1 on to Internal-IP-3 on ports 9160, 7000 and 7001.
This seems to be working fine as I can connect to the Cassandra server
with telnet (with telnet localhost 9160), but when starting the
Cassandra client program it throws the exception mentioned above.

Thus I think it is safe to assume that this is neither a connection
refused nor a timeout issue - indicating that the initial
connection attempt succeeded, and thus no firewall was interfering, but
something still went wrong later on.
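
For reference, a minimal sketch of how a raw Thrift client would connect
through the local end of the tunnel (assuming libthrift's framed transport;
Cassandra's rpc server uses framed transport by default, and a framing
mismatch is one thing that can surface as a connection reset, though I cannot
say that is the cause here):

import org.apache.cassandra.thrift.Cassandra;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class TunnelClient {
    public static void main(String[] args) throws Exception {
        // Connect to localhost-1:9160, which the SSH tunnel forwards
        // to Internal-IP-3:9160.
        TTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
        transport.open();
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        System.out.println(client.describe_version()); // simple liveness check
        transport.close();
    }
}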

Do you have any idea what the issue could be? Or what I would need to change?


Thank you!


Re: need some clarification on recommended memory size

2012-05-17 Thread Yiming Sun
Hi Aaron,

Thank you for guiding us by breaking down the issue.  Please see my answers
embedded

> Is this a single client ?

Yes

> How many columns is it asking for ?

the client knows a list of all row keys, and it randomly picks 100, and
loops 100 times.  It first reads a metadata column to figure out how many
columns to read, and it then reads these columns

> What sort of query are you sending, slice or named columns?

currently all queries are slice queries.  so the first slice query reads
the metadata columns (actually 2 metadata columns: one is the number of
columns to read, the other is other information which is not needed for
the purpose of the performance test, but I kept it in there to make it similar
to the real situation).  It then generates the column name array and
sends the second slice query.

The timing for the queries is completely isolated, and excludes the time
spent generating column name array etc.
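
That two-step pattern looks roughly like this in Astyanax (a hedged sketch;
the column family and metadata column names are hypothetical, and the
keyspace is set up as in the earlier sketches in this digest):

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ColumnList;
import com.netflix.astyanax.serializers.StringSerializer;

public class TwoStepRead {
    static final ColumnFamily<String, String> CF = new ColumnFamily<String, String>(
        "documents", StringSerializer.get(), StringSerializer.get());

    public static void run(Keyspace keyspace, String rowKey) throws Exception {
        // Step 1: slice the two metadata columns by name.
        ColumnList<String> meta = keyspace.prepareQuery(CF)
            .getKey(rowKey)
            .withColumnSlice("num_columns", "other_meta")
            .execute().getResult();
        int n = meta.getColumnByName("num_columns").getIntegerValue();

        // Step 2: generate the column names and fetch them in a second slice.
        String[] names = new String[n];
        for (int i = 0; i < n; i++) names[i] = "col_" + i; // hypothetical naming
        ColumnList<String> data = keyspace.prepareQuery(CF)
            .getKey(rowKey)
            .withColumnSlice(names)
            .execute().getResult();
    }
}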


>  From the client side how long is a single read taking ?

I am not 100% sure what you are asking... do you mean how long it
takes for SliceQuery.execute()?  The average we are getting is between
50-70 ms, and nodetool reports similar latency, differing by 5-10ms at most.


> What is the write workload like?  it sounds like it's write once read
many.

Indeed it is like a WORM environment. For the performance test, we don't have
any writes.

> memory speed > network speed

yes.  right now, our data is only a sample of about 250K rows, so the default
200,000-entry key cache hits above 90%.  But we will soon be hosting the real
deal with about 3M rows, so I am not sure our memory size will be able to
keep up with it.

In any case, Aaron, please let us know if you have any
suggestions/comments/insights.  Thanks!

-- Y.


On Thu, May 17, 2012 at 1:04 AM, aaron morton wrote:

> The read rate that I have been seeing is about 3MB/sec, and that is
> reading the raw bytes... using string serializer the rate is even lower,
> about 2.2MB/sec.
>
> Can we break this down a bit:
>
> Is this a single client ?
> How many columns is it asking for ?
> What sort of query are you sending, slice or named columns?
> From the client side how long is a single read taking ?
> What is the write workload like?  it sounds like it's write once read
> many.
>
> Use nodetool cfstats to see what the read latency is on a single node.
> (see http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/) Is there
> much difference between this and the latency from the client perspective ?
>
>
>
> Using JNA may help, but a blog article seems to say it only increases it by 13%,
> which is not very significant when the base performance is in single-digit
> MBs.
>
> There are other reasons to have JNA installed: more efficient snapshots
> and advising the OS when file operations should not be cached.
>
>  Our environment is virtualized, and the disks are actually SAN through
> fiber channels, so I don't know if that has impact on performance as well.
>
> memory speed > network speed
>
>   -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
>
>


Re: Exception when truncate

2012-05-17 Thread ruslan usifov
It's our test machine, with one node in the cluster :-)

2012/5/17 Jeremy Hanna :
> when doing a truncate, it has to talk to all of the nodes in the ring to 
> perform the operation.  by the error, it looks like one of the nodes was 
> unreachable for some reason.  you might do a nodetool ring in the cli do a 
> 'describe cluster;' and see if your ring is okay.
>
> So I think the operation is just as fast, it just looks like it times out (20 
> seconds or something) when trying to perform the command against all of the 
> nodes in the cluster.
>
> On May 17, 2012, at 9:36 AM, ruslan usifov wrote:
>
> [...]

Re: 1.0.6 -> 1.1.0 nodetool ownership report, and other anomalies

2012-05-17 Thread Ron Siemens
>> 
>> Does the updated reporting in 1.1.0 include the replicated data and before 
>> it didn't?
> 
> Yes.
> 

Thanks for verifying that.

Ron



row cache -- does it have data from other nodes?

2012-05-17 Thread Maxim Potekhin

Hello,

when I chose to have a rowcache -- will it contain data that is owned by 
other nodes?


Thanks

Maxim



Re: row cache -- does it have data from other nodes?

2012-05-17 Thread Edward Capriolo
No, a cache can only have data from the local node, since write operations
need to be able to evict items.

On Thu, May 17, 2012 at 4:42 PM, Maxim Potekhin  wrote:
> Hello,
>
> when I chose to have a rowcache -- will it contain data that is owned by
> other nodes?
>
> Thanks
>
> Maxim
>


unsubscribe

2012-05-17 Thread casablinca126.com
unsubscribe




Re: sstableloader 1.1 won't stream

2012-05-17 Thread sj.climber
Pieter, Aaron,

Any further progress on this?  I'm running into the same issue, although in
my case I'm trying to stream from Ubuntu 10.10 to a 2-node cluster (also
Cassandra 1.1.0, and running on separate Ubuntu 10.10 hosts).

Thanks in advance!



Re: Using EC2 ephemeral 4disk raid0 cause high iowait trouble

2012-05-17 Thread koji Lin
Hi

We use amazon ami 3.2.12-3.2.4.amzn1.x86_64

and some of our data files are more than 10G

thanks

koji
On 2012-5-16 at 6:00 PM, "aaron morton" wrote:

> On Ubuntu ? Sounds like http://wiki.apache.org/cassandra/FAQ#ubuntu_hangs
>
> Cheers
>
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 16/05/2012, at 2:13 PM, koji Lin wrote:
>
> Hi
>
> Our service already runs cassandra 1.0 on 1x ec2 instances (with ebs), and
> we saw lots of discussion about using ephemeral raid for better and more
> consistent performance.
>
> So we want to create a new instance using 4 ephemeral disks in raid0, copy the
> data from ebs, and finally replace the old instance and reduce some .
>
> we create the xlarge instance with -b '/dev/sdb=ephemeral0' -b
> '/dev/sdc=ephemeral1' -b '/dev/sdd=ephemeral2' -b '/dev/sde=ephemeral3',
>
> and use mdadm command like this  mdadm --create /dev/md0 --level=0 -c256
> --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
>
> after copying the files and starting cassandra (same token as the old instance it
> replaced),
>
> we saw the reads were really fast, always around 2xx MB/sec, but system load
> exceeded 40, with high iowait, and lots of clients got timeout results. We
> guessed it was maybe a problem with the ec2 instance, so we created another one with
> the same setting to replace another machine, but the result was the same. Then we
> rolled back to ebs with a single disk; read speed stays at 1x MB/sec but the system
> behaves well. (using ebs with 2 disks in raid0 stays at 2x MB/sec with
> higher iowait than a single disk, but it still works)
>
> Has anyone met the same problem too? or did we forget to configure
> something?
>
> thank you
>
> koji
>
>
>


Re: unsubscribe

2012-05-17 Thread Dave Brosius

On 05/17/2012 09:49 PM, casablinca126.com wrote:

unsubscribe




send that message to


user-unsubscr...@cassandra.apache.org


Re: Safely Disabling Compaction

2012-05-17 Thread Vijay
I would rather set the keyspace settings min_compaction_threshold
and max_compaction_threshold to a higher number and, once I am ready, I
will put the values back... This way I don't need to restart.
Having said that, why not set the compaction throughput to 1 (low enough to
not have contention) and complete the stream?

Regards,




On Wed, May 16, 2012 at 2:43 PM, sj.climber  wrote:

> Hi,
>
> In an effort to minimize IO contention, I'd like to disable compactions
> while I'm streaming SSTables to the cluster.  When done streaming, I intend
> on forcing a major compaction through nodetool.
>
> Elsewhere in the forums, various folks suggest setting
> max_compaction_threshold = 0 to disable compaction.  While this works
> sometimes (via 'update column family  with
> max_compaction_threshold=0'), I've observed a number of serious issues with
> this approach:
>
> 1) You can't create a column family with max_compaction_threshold = 0.  The
> CLI reports that min_compaction_threshold must have a value >= 2, and
> max_compaction_threshold can't be lower than it.  Worse yet, trying to
> create a column family with max_compaction_threshold = 0 gets the cluster
> into a Schema Disagreement Exception (since the node on which you issue the
> migration command fails with a fatal error).
>
> 2) Cassandra will allow me to update an existing column family with
> max_compaction_threshold = 0.  But if I restart the node, it will crash on
> startup.
> java.lang.reflect.InvocationTargetException
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>at java.lang.reflect.Method.invoke(Method.java:597)
>at
> org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:160)
> Caused by: java.lang.RuntimeException:
> java.lang.reflect.InvocationTargetException
> ...
>
> org.apache.cassandra.config.CFMetaData.createCompactionStrategyInstance(CFMetaData.java:839)
>... 14 more
> Caused by: java.lang.RuntimeException: The max_compaction_threshold cannot
> be smaller than the min.
>at
>
> org.apache.cassandra.db.ColumnFamilyStore.setMaximumCompactionThreshold(ColumnFamilyStore.java:1740)
>at org.apache.
>
>
> Is there another solution for more safely enabling/disabling compaction?
>
> Thanks!
>
>


RE: cassandra read latency help

2012-05-17 Thread Viktor Jevdokimov
Row cache is ok as long as keys are not heavily updated; otherwise it frequently 
invalidates and pressures GC.

The high latency is from your batch of 100 keys. Review your data model to 
avoid such reads if you need low latency.

500M rows on one node, or on the cluster? Reading 100 random rows at a total of 
40KB of data from a data set of 180GB uncompressed in under 30ms is not an easy task.




Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
Follow us on Twitter: @adforminsider
What is Adform: watch this short video




From: Gurpreet Singh [mailto:gurpreet.si...@gmail.com]
Sent: Thursday, May 17, 2012 20:24
To: user@cassandra.apache.org
Subject: Re: cassandra read latency help

Thanks Viktor for the advice.
Right now, i just have 1 node that i am testing against and i am using CL one.
Are you suggesting that the page cache might be doing better than the row cache?
I am getting row cache hit of 0.66 right now.

/G

On Thu, May 17, 2012 at 12:26 AM, Viktor Jevdokimov
<viktor.jevdoki...@adform.com> wrote:
> Gurpreet Singh wrote:
> Any ideas on what could help here bring down the read latency even more ?
Avoid Cassandra forwarding requests to other nodes:
- Use consistency level ONE;
- Create a data model that allows a single request with a single key, since different keys 
may belong to different nodes and require forwarding requests to them;
- Use a smart client to calculate the token for a key and select the appropriate node 
(primary or replica) by token range;
- Turn off the Dynamic Snitch (it may forward a request to another replica even if this node 
has the data);
- Have all or hot data in the page cache (no HDD disk IO) or use SSDs;
- If you do regular updates to keys, do not use the row cache; otherwise you may try it.




Best regards / Pagarbiai

Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063
Fax: +370 5 261 0453

J. Jasinskio 16C,
LT-01112 Vilnius,
Lithuania





Cassandra 1.0.6 multi data center read question

2012-05-17 Thread Roshan
Hi 

I have set up a Cassandra cluster in production and a separate cluster in
our DR environment. The setup is basically a 2 data center setup.

I want to create a separate keyspace on production (production has some
other keyspaces) and only that keyspace will sync its data with DR.

If I do a read operation on production, will that read operation go to
DR as well? If so, can I disable that?

My primary purpose is to keep the DR up to date without having production
communicate with DR.

Thanks.

/Roshan 



Re: cassandra read latency help

2012-05-17 Thread Piavlo

  
  
On 05/18/2012 08:49 AM, Viktor Jevdokimov wrote:
> Row cache is ok as long as keys are not heavily updated; otherwise it
> frequently invalidates and pressures GC.

According to http://www.datastax.com/docs/1.0/operations/tuning :
"As of Cassandra 1.0, column family row caches are stored in native
memory by default (outside of the Java heap).
This results in both a smaller per-row memory footprint and reduced
JVM heap requirements, which helps keep the heap size manageable for
good JVM garbage collection performance."
AFAIU it's outside of the Java heap only if JNA is used.

Then I tried the row cache for a few CFs (with cassandra 1.0.9), and to
my surprise it just killed read latency and caused very high cpu usage;
the row cache hit rate was ~20% and reads/writes were ~50/50.
The CFs are compressed (does it matter? does the row cache keep rows
compressed or not?)
AFAIU with JNA the off-heap cache stores the rows in serialized form, so
where does the high cpu come from?

> The high latency is from your batch of 100 keys. Review your data model
> to avoid such reads if you need low latency.
>
> 500M rows on one node, or on the cluster? Reading 100 random rows at a
> total of 40KB of data from a data set of 180GB uncompressed in under
> 30ms is not an easy task.
>
> [...]