Cassandra implement in two different data-center

2012-08-30 Thread Adeel Akbar

Dear All,

I am going to implement Apache Cassandra in two different data centers 
with 2 nodes in each ring. I also need to set a replication factor of 2 within each 
data center, and data should be replicated between the rings in both 
data centers. Please help me or point me to any document that describes how to 
implement this model.

--


Thanks & Regards

Adeel Akbar



How to set LeveledCompactionStrategy for an existing table

2012-08-30 Thread Jean-Armel Luce
Hello,

I am using Cassandra 1.1.1 and CQL3.
I have a cluster with 1 node (test environment).
Could you tell me how to set the compaction strategy to Leveled Strategy for
an existing table?

I have a table pns_credentials

jal@jal-VirtualBox:~/cassandra/apache-cassandra-1.1.1/bin$ ./cqlsh -3
Connected to Test Cluster at localhost:9160.
[cqlsh 2.2.0 | Cassandra 1.1.1 | CQL spec 3.0.0 | Thrift protocol 19.32.0]
Use HELP for help.
cqlsh> use test1;
cqlsh:test1> describe table pns_credentials;

CREATE TABLE pns_credentials (
  ise text PRIMARY KEY,
  isnew int,
  ts timestamp,
  mergestatus int,
  infranetaccount text,
  user_level int,
  msisdn bigint,
  mergeusertype int
) WITH
  comment='' AND
  comparator=text AND
  read_repair_chance=0.10 AND
  gc_grace_seconds=864000 AND
  default_validation=text AND
  min_compaction_threshold=4 AND
  max_compaction_threshold=32 AND
  replicate_on_write='true' AND
  compaction_strategy_class='SizeTieredCompactionStrategy' AND
  compression_parameters:sstable_compression='SnappyCompressor';

I want to set the LeveledCompaction strategy for this table, so I execute
the following ALTER TABLE :

cqlsh:test1> alter table pns_credentials
 ... WITH compaction_strategy_class='LeveledCompactionStrategy'
 ... AND compaction_strategy_options:sstable_size_in_mb=10;

In the Cassandra logs, I see some information:
 INFO 10:23:52,532 Enqueuing flush of
Memtable-schema_columnfamilies@965212657(1391/1738 serialized/live bytes,
20 ops)
 INFO 10:23:52,533 Writing Memtable-schema_columnfamilies@965212657(1391/1738
serialized/live bytes, 20 ops)
 INFO 10:23:52,629 Completed flushing
/var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-hd-94-Data.db
(1442 bytes) for commitlog position ReplayPosition(segmentId=3556583843054,
position=1987)


However, when I look at the description of the table, the table is still
using SizeTieredCompactionStrategy:
cqlsh:test1> describe table pns_credentials ;

CREATE TABLE pns_credentials (
  ise text PRIMARY KEY,
  isnew int,
  ts timestamp,
  mergestatus int,
  infranetaccount text,
  user_level int,
  msisdn bigint,
  mergeusertype int
) WITH
  comment='' AND
  comparator=text AND
  read_repair_chance=0.10 AND
  gc_grace_seconds=864000 AND
  default_validation=text AND
  min_compaction_threshold=4 AND
  max_compaction_threshold=32 AND
  replicate_on_write='true' AND
  compaction_strategy_class='SizeTieredCompactionStrategy' AND
  compression_parameters:sstable_compression='SnappyCompressor';

In the schema_columnfamilies table (in system keyspace), the table
pns_credentials is still using the SizeTieredCompactionStrategy
cqlsh:test1> use system;
cqlsh:system> select * from schema_columnfamilies ;
...
 test1 |   pns_credentials |   null | KEYS_ONLY
|[] | |
org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
|  {}
|
org.apache.cassandra.db.marshal.UTF8Type |
{"sstable_compression":"org.apache.cassandra.io.compress.SnappyCompressor"}
|  org.apache.cassandra.db.marshal.UTF8Type |   864000 |
1029 |   ise | org.apache.cassandra.db.marshal.UTF8Type
|0 |   32
|4 |0.1 |   True
|  null | Standard |null
...


I stopped/started the Cassandra node, but the table is still using
SizeTieredCompactionStrategy.

I tried using cassandra-cli, but the alter is still unsuccessful.

Is there anything I am missing?


Thanks.

Jean-Armel


Store a timeline with uniques properties

2012-08-30 Thread Morgan Segalis
Hi everyone,

I'm trying to use Cassandra to store a "timeline", but with values
that must be unique (old entries replaced). (So not really a timeline, but I didn't
find a better word for it.)

Let me give you an example:

- A user has a list of friends
- Friends can change their nickname, status, profile picture, etc...

At the beginning the CF will look like this for user1:

lte = latest-timestamp-entry, which is the timestamp of the entry (-1, -2, -3
mean that the timestamps are older)

user1 row : |        lte        |      lte -1      |        lte -2       |      lte -3      |        lte -4       |
values :    | user2-name-change | user3-pic-change | user4-status-change | user2-pic-change | user2-status-change |

If, for example, user2 changes its picture, the row should look like this:

user1 row : |        lte       |       lte -1      |      lte -2      |        lte -3       |        lte -4       |
values :    | user2-pic-change | user2-name-change | user3-pic-change | user4-status-change | user2-status-change |

Notice that user2-pic-change, which was at (lte -3) in the first representation,
has "moved" to (lte) in the second representation.

That way, when user1 connects again, he can retrieve only the changes that
occurred since the last time he connected.

e.g.: if user1's last connection date is between "lte -2" and "lte -3",
then he will only be notified that:

- user2 has changed his picture
- user2 has changed his name
- user3 has changed his picture

I would not keep the old data since the "timeline" is saved locally on the
client, and not on the server.
I would really like to avoid searching through every column to find
"user2-pic-change", which can take long, especially if the user has many friends.

Is there a simple way to do that with Cassandra, or am I bound to create
another CF, with the column name holding the action (e.g. "user2-pic-change") and
the timestamp at which it appears as the value?

Thanks,

Morgan.



Re: Store a timeline with uniques properties

2012-08-30 Thread Morgan Segalis
Sorry for the diagram that did not keep the right tabulation for some people...
Here is a space-based version instead of tabs.

user1 row : |        lte        |      lte -1      |        lte -2       |      lte -3      |        lte -4       |
values :    | user2-name-change | user3-pic-change | user4-status-change | user2-pic-change | user2-status-change |

If, for example, user2 changes its picture, the row should look like this:

user1 row : |        lte       |       lte -1      |      lte -2      |        lte -3       |        lte -4       |
values :    | user2-pic-change | user2-name-change | user3-pic-change | user4-status-change | user2-status-change |

On 30 August 2012, at 13:22, Morgan Segalis wrote:

> Hi everyone,
> 
> I'm trying to use cassandra in order to store a "timeline", but with values 
> that must be unique (replaced). (So not really a timeline, but didn't find a 
> better word for it)
> 
> Let's me give you an example :
> 
> - An user have a list of friends
> - Friends can change their nickname, status, profile picture, etc...
> 
> at the beginning the CF will look like that for user1: 
> 
> lte = latest-timestamp-entry, which is the timestamp of the entry (-1 -2 -3 
> means that the timestamp are older)
> 
> user1 row :   |   lte |   
> lte -1  |   lte -2  |   lte 
> -3  |   lte -4  |
>   values :| user2-name-change | user3-pic-change  
> | user4-status-change | user2-pic-change| user2-status-change |
> 
> If for example, user2 changes it's picture, the row should look like that : 
> 
> user1 row :   |   lte |   
> lte -1  |   lte -2  |   lte 
> -3  |   lte -4   |
>   values :|   user2-pic-change| 
> user2-name-change | user3-pic-change  | user4-status-change | 
> user2-status-change |
> 
> notice that user2-pic-change in the first representation (lte -3) has "moved" 
> to the (lte) on the second representation.
> 
> That way when user1 connects again, It can retrieve only informations that 
> occurred between the last time he connected.
> 
> e.g. : if the user1's last connexion date it between "lte -2" and "lte -3", 
> then he will only be notified that :
> 
> - user2 has changed his picture
> - user2 has changed his name
> - user3 has changed his picture
> 
> I would not keep the old data since the "timeline" is saved locally on the 
> client, and not on the server.
> I really would like not to search for each column in order to find the 
> "user2-pic-change", that can be long especially if the user has many friends.
> 
> Is there a simple way to do that with cassandra, or I am bound to create 
> another CF, with column title holding the action e.g. "user2-pic-change" and 
> for value the timestamp when it appears ?
> 
> Thanks,
> 
> Morgan.
> 



Re: performance is drastically degraded after 0.7.8 --> 1.0.11 upgrade

2012-08-30 Thread Илья Шипицин
We are running a somewhat queue-like workload with aggressive write-read patterns.
I was looking for a way to capture/script queries from a live Cassandra
installation, but I didn't find anything.

Is there something like a Thrift proxy or some other query logging/scripting
engine?

2012/8/30 aaron morton 

> in terms of our high-rate write load cassandra1.0.11 is about 3 (three!!)
> times slower than cassandra-0.7.8
>
> We've not had any reports of a performance drop off. All tests so far have
> show improvements in both read and write performance.
>
> I agree, such digests save some network IO, but they seem to be very bad
> in terms of CPU and disk IO.
>
> The sha1 is created so we can diagnose corruptions in the -Data component
> of the SSTables. They are not used to save network IO.
> It is calculated while streaming the Memtable to disk so has no impact on
> disk IO. While not the fasted algorithm I would assume it's CPU overhead in
> this case is minimal.
>
>  there's already relatively small Bloom filter file, which can be used for
> saving network traffic instead of sha1 digest.
>
> Bloom filters are used to test if a row key may exist in an SSTable.
>
> any explanation ?
>
> If you can provide some more information on your use case we may be able
> to help.
>
> Cheers
>
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 30/08/2012, at 5:18 AM, Илья Шипицин  wrote:
>
> in terms of our high-rate write load cassandra1.0.11 is about 3 (three!!)
> times slower than cassandra-0.7.8
> after some investigation carried out I noticed files with "sha1" extension
> (which are missing for Cassandra-0.7.8)
>
> in maybeWriteDigest() function I see no option fot switching sha1 digests
> off.
>
> I agree, such digests save some network IO, but they seem to be very bad
> in terms of CPU and disk IO.
> why to use one more digest (which have to be calculated), there's already
> relatively small Bloom filter file, which can be used for saving network
> traffic instead of sha1 digest.
>
> any explanation ?
>
> Ilya Shipitsin
>
>
>


Re: performance is drastically degraded after 0.7.8 --> 1.0.11 upgrade

2012-08-30 Thread Edward Capriolo
If you move from 0.7.X to 0.8.X or 1.0.X you have to rebuild sstables as
soon as possible. If you have large bloom filters you can hit a bug
where the bloom filters will not work properly.


On Thu, Aug 30, 2012 at 9:44 AM, Илья Шипицин  wrote:
> we are running somewhat queue-like with aggressive write-read patterns.
> I was looking for scripting queries from live Cassandra installation, but I
> didn't find any.
>
> is there something like thrift-proxy or other query logging/scripting engine
> ?
>
> 2012/8/30 aaron morton 
>>
>> in terms of our high-rate write load cassandra1.0.11 is about 3 (three!!)
>> times slower than cassandra-0.7.8
>>
>> We've not had any reports of a performance drop off. All tests so far have
>> show improvements in both read and write performance.
>>
>> I agree, such digests save some network IO, but they seem to be very bad
>> in terms of CPU and disk IO.
>>
>> The sha1 is created so we can diagnose corruptions in the -Data component
>> of the SSTables. They are not used to save network IO.
>> It is calculated while streaming the Memtable to disk so has no impact on
>> disk IO. While not the fasted algorithm I would assume it's CPU overhead in
>> this case is minimal.
>>
>>  there's already relatively small Bloom filter file, which can be used for
>> saving network traffic instead of sha1 digest.
>>
>> Bloom filters are used to test if a row key may exist in an SSTable.
>>
>> any explanation ?
>>
>> If you can provide some more information on your use case we may be able
>> to help.
>>
>> Cheers
>>
>>
>> -
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 30/08/2012, at 5:18 AM, Илья Шипицин  wrote:
>>
>> in terms of our high-rate write load cassandra1.0.11 is about 3 (three!!)
>> times slower than cassandra-0.7.8
>> after some investigation carried out I noticed files with "sha1" extension
>> (which are missing for Cassandra-0.7.8)
>>
>> in maybeWriteDigest() function I see no option fot switching sha1 digests
>> off.
>>
>> I agree, such digests save some network IO, but they seem to be very bad
>> in terms of CPU and disk IO.
>> why to use one more digest (which have to be calculated), there's already
>> relatively small Bloom filter file, which can be used for saving network
>> traffic instead of sha1 digest.
>>
>> any explanation ?
>>
>> Ilya Shipitsin
>>
>>
>


Re: Spring - cassandra

2012-08-30 Thread Radim Kolar




You looking for the author of Spring Data Cassandra?
https://github.com/boneill42/spring-data-cassandra

If so, I guess that is me. =)
Did you get in touch with the Spring guys? They have Cassandra support on 
their Spring Data todo list. They might have some todo or feature list 
they want to implement for Cassandra; I am willing to code something to 
make official Spring Cassandra support happen faster.


Re: Spring - cassandra

2012-08-30 Thread Brian O'Neill

Yes.  I'm in contact with Oliver Gierke and Erez Mazor of Spring Data.

We are working on two fronts:
1) Spring Data support via JPA (using Kundera underneath)
- Initial attempt here:
http://brianoneill.blogspot.com/2012/07/spring-data-w-cassandra-using-jpa.html
- Most recently (an hour ago): The issues w/ MetaModel are fixed, now
waiting on an enhancement to the EntityManager to fully support type
queries.

For this one, we're in a holding pattern until Kundera is fully JPA
compliant.

2) Spring Data support via Astyanax
- The project I'm working on below should mimic Spring Data MongoDB's
approach and capabilities, allowing people to use Spring Data with
Cassandra without the constraints of JPA.  I'd love some help working on
the project.  Once we have it functional we should be able to push it to
Spring. (with Oliver's help)

Go ahead and fork.  Feel free to email me directly so we don't spam this
list.
(or setup a googlegroup just in case others want to contribute)

-brian


---
Brian O'Neill
Lead Architect, Software Development
Apache Cassandra MVP
 
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 •
healthmarketscience.com

This information transmitted in this email message is for the intended
recipient only and may contain confidential and/or privileged material. If
you received this email in error and are not the intended recipient, or
the person responsible to deliver it to the intended recipient, please
contact the sender at the email above and delete this email and any
attachments and destroy any copies thereof. Any review, retransmission,
dissemination, copying or other use of, or taking any action in reliance
upon, this information by persons or entities other than the intended
recipient is strictly prohibited.
 






On 8/30/12 9:01 AM, "Radim Kolar"  wrote:

>
>
>> You looking for the author of Spring Data Cassandra?
>> https://github.com/boneill42/spring-data-cassandra
>>
>> If so, I guess that is me. =)
>Did you get in touch with spring guys? They have cassandra support on
>their spring data todo list. They might have some todo or feature list
>they want to implement for cassandra, i am willing to code something to
>make official spring cassandra support happen faster.




Re: Cassandra implement in two different data-center

2012-08-30 Thread Aaron Turner
On Thu, Aug 30, 2012 at 1:14 AM, Adeel Akbar
 wrote:
> Dear All,
>
> I am going to implement Apache Cassandra in two different data-center with 2
> nodes in each ring.  I also need to set replica 2 factor in same data
> center. Over the data center data should be replicates between both data
> center rings. Please help me or provide any document which help to implement
> this model.

http://www.datastax.com/docs/1.1/initialize/cluster_init_multi_dc

has good info on building a multi-DC cluster.

That said, 2 nodes per-DC means you can't use LOCAL_QUORUM/QUORUM for
read & writes.  I would strongly suggest 3 nodes per DC if you care
about consistent reads.  Generally speaking, 3 nodes per-DC is
considered the recommended minimum number of nodes for a production
system.
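
For reference, a keyspace with 2 replicas in each of two data centers is
usually defined with NetworkTopologyStrategy. The following is only a rough
sketch using pycassa; the host and keyspace names are made up, and the data
center names ('DC1', 'DC2') must match whatever your configured snitch reports:

    from pycassa.system_manager import SystemManager, NETWORK_TOPOLOGY_STRATEGY

    # connect to any node in the cluster (hypothetical host name)
    sys_mgr = SystemManager('node1.dc1.example.com:9160')
    sys_mgr.create_keyspace(
        'my_keyspace',                                   # hypothetical keyspace
        replication_strategy=NETWORK_TOPOLOGY_STRATEGY,
        strategy_options={'DC1': '2', 'DC2': '2'})       # 2 replicas per data center
    sys_mgr.close()

The same keyspace definition can also be created from cassandra-cli or cqlsh;
the DataStax page above shows the exact syntax for 1.1.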



-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
"carpe diem quam minimum credula postero"


Cassandra - cqlsh

2012-08-30 Thread Morantus, James (PCLN-NW)
Hello all,

This is my first setup of Cassandra and I'm having some issues running the 
cqlsh tool.
Have any of you come across this error before? If so, please help.

/bin/cqlsh -h localhost -p 9160
No appropriate python interpreter found. 

Thanks
James


adding node to cluster

2012-08-30 Thread Casey Deccio
All,

I'm adding a new node to an existing cluster that uses
ByteOrderedPartitioner.  The documentation says that if I don't configure a
token, then one will be automatically generated to take load from an
existing node.  What I'm finding is that when I add a new node, (super)
column lookups begin failing (not sure if it was the row lookup failing or
the supercolumn lookup failing), and I'm not sure why.  I assumed that
while the existing node is transitioning data to the new node the affected
rows and (super) columns would still be found in the right place.  Any idea
why these lookups might be failing?  When I decommissioned the new
node, the lookups began working again.  Any help is appreciated.

Regards,
Casey


Re: How to set LeveledCompactionStrategy for an existing table

2012-08-30 Thread feedly team
In cassandra-cli, I did something like:

update column family xyz with
compaction_strategy='LeveledCompactionStrategy'

On Thu, Aug 30, 2012 at 5:20 AM, Jean-Armel Luce  wrote:

>
> Hello,
>
> I am using Cassandra 1.1.1 and CQL3.
> I have a cluster with 1 node (test environment)
> Could you tell how to set the compaction strategy to Leveled Strategy for
> an existing table ?
>
> I have a table pns_credentials
>
> jal@jal-VirtualBox:~/cassandra/apache-cassandra-1.1.1/bin$ ./cqlsh -3
> Connected to Test Cluster at localhost:9160.
> [cqlsh 2.2.0 | Cassandra 1.1.1 | CQL spec 3.0.0 | Thrift protocol 19.32.0]
> Use HELP for help.
> cqlsh> use test1;
> cqlsh:test1> describe table pns_credentials;
>
> CREATE TABLE pns_credentials (
>   ise text PRIMARY KEY,
>   isnew int,
>   ts timestamp,
>   mergestatus int,
>   infranetaccount text,
>   user_level int,
>   msisdn bigint,
>   mergeusertype int
> ) WITH
>   comment='' AND
>   comparator=text AND
>   read_repair_chance=0.10 AND
>   gc_grace_seconds=864000 AND
>   default_validation=text AND
>   min_compaction_threshold=4 AND
>   max_compaction_threshold=32 AND
>   replicate_on_write='true' AND
>   compaction_strategy_class='SizeTieredCompactionStrategy' AND
>   compression_parameters:sstable_compression='SnappyCompressor';
>
> I want to set the LeveledCompaction strategy for this table, so I execute
> the following ALTER TABLE :
>
> cqlsh:test1> alter table pns_credentials
>  ... WITH compaction_strategy_class='LeveledCompactionStrategy'
>  ... AND compaction_strategy_options:sstable_size_in_mb=10;
>
> In Cassandra logs, I see some informations :
>  INFO 10:23:52,532 Enqueuing flush of
> Memtable-schema_columnfamilies@965212657(1391/1738 serialized/live bytes,
> 20 ops)
>  INFO 10:23:52,533 Writing Memtable-schema_columnfamilies@965212657(1391/1738
> serialized/live bytes, 20 ops)
>  INFO 10:23:52,629 Completed flushing
> /var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-hd-94-Data.db
> (1442 bytes) for commitlog position ReplayPosition(segmentId=3556583843054,
> position=1987)
>
>
> However, when I look at the description of the table, the table is still
> with the SizeTieredCompactionStrategy
> cqlsh:test1> describe table pns_credentials ;
>
> CREATE TABLE pns_credentials (
>   ise text PRIMARY KEY,
>   isnew int,
>   ts timestamp,
>   mergestatus int,
>   infranetaccount text,
>   user_level int,
>   msisdn bigint,
>   mergeusertype int
> ) WITH
>   comment='' AND
>   comparator=text AND
>   read_repair_chance=0.10 AND
>   gc_grace_seconds=864000 AND
>   default_validation=text AND
>   min_compaction_threshold=4 AND
>   max_compaction_threshold=32 AND
>   replicate_on_write='true' AND
>   compaction_strategy_class='SizeTieredCompactionStrategy' AND
>   compression_parameters:sstable_compression='SnappyCompressor';
>
> In the schema_columnfamilies table (in system keyspace), the table
> pns_credentials is still using the SizeTieredCompactionStrategy
> cqlsh:test1> use system;
> cqlsh:system> select * from schema_columnfamilies ;
> ...
>  test1 |   pns_credentials |   null | KEYS_ONLY
> |[] | |
> org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
> |  {}
> |
> org.apache.cassandra.db.marshal.UTF8Type |
> {"sstable_compression":"org.apache.cassandra.io.compress.SnappyCompressor"}
> |  org.apache.cassandra.db.marshal.UTF8Type |   864000 |
> 1029 |   ise | org.apache.cassandra.db.marshal.UTF8Type
> |0 |   32
> |4 |0.1 |   True
> |  null | Standard |null
> ...
>
>
> I stopped/started the Cassandra node, but the table is still with
> SizeTieredCompactionStrategy
>
> I tried using cassandra-cli, but the alter is still unsuccessfull.
>
> Is there anything I am missing ?
>
>
> Thanks.
>
> Jean-Armel
>


Re: Cassandra - cqlsh

2012-08-30 Thread Tyler Hobbs
What OS are you using?

On Thu, Aug 30, 2012 at 12:09 PM, Morantus, James (PCLN-NW) <
james.moran...@priceline.com> wrote:

> Hello all,
>
> This is my first setup of Cassandra and I'm having some issues running the
> cqlsh tool.
> Have any of you come across this error before? If so, please help.
>
> /bin/cqlsh -h localhost -p 9160
> No appropriate python interpreter found.
>
> Thanks
> James
>



-- 
Tyler Hobbs
DataStax 


Re: adding node to cluster

2012-08-30 Thread Rob Coli
On Thu, Aug 30, 2012 at 10:18 AM, Casey Deccio  wrote:
> I'm adding a new node to an existing cluster that uses
> ByteOrderedPartitioner.  The documentation says that if I don't configure a
> token, then one will be automatically generated to take load from an
> existing node.
> What I'm finding is that when I add a new node, (super)
> column lookups begin failing (not sure if it was the row lookup failing or
> the supercolumn lookup failing), and I'm not sure why.

1) You almost never actually want BOP.
2) You never want Cassandra to pick a token for you. IMO and the
opinion of many others, the fact that it does this is a bug. Specify a
token with initial_token.
3) You never want to use Supercolumns. The project does not support
them but currently has no plan to deprecate them. Use composite row
keys.
4) Unless your existing cluster consists of one node, you almost never
want to add only a single new node to a cluster. In general you want
to double it.

In summary, you are Doing It just about as Wrong as possible... but on
to your actual question ... ! :)

In what way are the lookups "failing"? Is there an exception?

=Rob

-- 
=Robert Coli
AIM>ALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


RE: Cassandra - cqlsh

2012-08-30 Thread Morantus, James (PCLN-NW)
Red Hat Enterprise Linux Server release 5.8 (Tikanga)

Linux nw-mydb-s05 2.6.18-308.8.2.el5 #1 SMP Tue May 29 11:54:17 EDT 2012 x86_64 
x86_64 x86_64 GNU/Linux

Thanks


From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Thursday, August 30, 2012 2:21 PM
To: user@cassandra.apache.org
Subject: Re: Cassandra - cqlsh

What OS are you using?
On Thu, Aug 30, 2012 at 12:09 PM, Morantus, James (PCLN-NW) 
<james.moran...@priceline.com> wrote:
Hello all,

This is my first setup of Cassandra and I'm having some issues running the 
cqlsh tool.
Have any of you come across this error before? If so, please help.

/bin/cqlsh -h localhost -p 9160
No appropriate python interpreter found.

Thanks
James



--
Tyler Hobbs
DataStax


Re: Cassandra - cqlsh

2012-08-30 Thread Tyler Hobbs
RHEL 5 only ships with Python 2.4, which is pretty ancient and below what
cqlsh will accept.  You can install Python 2.6 with EPEL enabled:
http://blog.nexcess.net/2011/02/25/python-2-6-for-centos-5/

On Thu, Aug 30, 2012 at 1:34 PM, Morantus, James (PCLN-NW) <
james.moran...@priceline.com> wrote:

> Red Hat Enterprise Linux Server release 5.8 (Tikanga)
>
> Linux nw-mydb-s05 2.6.18-308.8.2.el5 #1 SMP Tue May 29 11:54:17 EDT 2012
> x86_64 x86_64 x86_64 GNU/Linux
>
> Thanks
>
> From: Tyler Hobbs [mailto:ty...@datastax.com]
> Sent: Thursday, August 30, 2012 2:21 PM
> To: user@cassandra.apache.org
> Subject: Re: Cassandra - cqlsh
>
>
> What OS are you using?
>
> On Thu, Aug 30, 2012 at 12:09 PM, Morantus, James (PCLN-NW) <
> james.moran...@priceline.com> wrote:
>
> Hello all,
>
> This is my first setup of Cassandra and I'm having some issues running the
> cqlsh tool.
> Have any of you come across this error before? If so, please help.
>
> /bin/cqlsh -h localhost -p 9160
> No appropriate python interpreter found.
>
> Thanks
> James
>
>
>
>
> --
> Tyler Hobbs
> DataStax 
>



-- 
Tyler Hobbs
DataStax 


Re: Why Cassandra secondary indexes are so slow on just 350k rows?

2012-08-30 Thread Tyler Hobbs
pycassa already breaks up the query into smaller chunks, but you should try
playing with the buffer_size kwarg for get_indexed_slices, perhaps lowering
it to ~300, as Aaron suggests:
http://pycassa.github.com/pycassa/api/pycassa/columnfamily.html#pycassa.columnfamily.ColumnFamily.get_indexed_slices
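
For example, roughly (a sketch based on the code in the original message;
keyspace/CF names and the handle_row() call are placeholders):

    from pycassa import ConnectionPool, ColumnFamily
    from pycassa.index import create_index_expression, create_index_clause

    pool = ConnectionPool('my_keyspace', ['localhost:9160'])    # placeholder names
    cf = ColumnFamily(pool, 'my_cf', read_consistency_level=2)
    expr = create_index_expression('is_exported', 'false')
    clause = create_index_clause([expr], count=5000)

    # buffer_size controls how many rows are fetched per Thrift round trip
    for key, columns in cf.get_indexed_slices(clause, buffer_size=300):
        handle_row(key, columns)    # hypothetical per-row export step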

On Wed, Aug 29, 2012 at 11:40 PM, aaron morton wrote:

>  *from 12 to 20 seconds (!!!) to find 5000 rows*.
>
> More is not always better.
>
> Cassandra must materialise the full 5000 rows and send them all over the
> wire to be materialised on the other side. Try asking for a few hundred at
> a time and see how it goes.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 29/08/2012, at 6:46 PM, Robin Verlangen  wrote:
>
> @Edward: I think you should consider a queue for exporting the new rows.
> Just store the rowkey in a queue (you might want to consider looking at
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Distributed-work-queues-td5226248.html
>  )
> and process that row every couple of minutes. Then manually delete columns
> from that queue-row.
>
> With kind regards,
>
> Robin Verlangen
> *Software engineer*
> *
> *
> W http://www.robinverlangen.nl
> E ro...@us2.nl
>
> Disclaimer: The information contained in this message and attachments is
> intended solely for the attention and use of the named addressee and may be
> confidential. If you are not the intended recipient, you are reminded that
> the information remains the property of the sender. You must not use,
> disclose, distribute, copy, print or rely on this e-mail. If you have
> received this message in error, please contact the sender immediately and
> irrevocably delete this message and any copies.
>
>
>
> 2012/8/29 Robin Verlangen 
>
>> "What this means is that eventually you will have 1 row in the secondary
>> index table with 350K columns"
>>
>> Is this really true? I would have expected that Cassandra used internal
>> index sharding/bucketing?
>>
>> With kind regards,
>>
>> Robin Verlangen
>> *Software engineer*
>> *
>> *
>> W http://www.robinverlangen.nl
>> E ro...@us2.nl
>>
>> Disclaimer: The information contained in this message and attachments is
>> intended solely for the attention and use of the named addressee and may be
>> confidential. If you are not the intended recipient, you are reminded that
>> the information remains the property of the sender. You must not use,
>> disclose, distribute, copy, print or rely on this e-mail. If you have
>> received this message in error, please contact the sender immediately and
>> irrevocably delete this message and any copies.
>>
>>
>>
>> 2012/8/29 Dave Brosius 
>>
>>> If i understand you correctly, you are only ever querying for the rows
>>> where is_exported = false, and turning them into trues. What this means is
>>> that eventually you will have 1 row in the secondary index table with 350K
>>> columns that you will never look at.
>>>
>>> It seems to me you that perhaps you should just hold your own "manual
>>> index" cf that points to non exported rows, and just delete those columns
>>> when they are exported.
>>>
>>>
>>>
>>> On 08/28/2012 05:23 PM, Edward Kibardin wrote:
>>>
 I have a column family with the secondary index. The secondary index is
 basically a binary field, but I'm using a string for it. The field called
 *is_exported* and can be *'true'* or *'false'*. After request all loaded
 rows are updated with *is_exported = 'false'*.

 I'm polling this column table each ten minutes and exporting new rows
 as they appear.

 But here the problem: I'm seeing that time for this query grows pretty
 linear with amount of data in column table, and currently it takes *from 12
 to 20 seconds (!!!) to find 5000 rows*. From my understanding, indexed
 request should not depend on number of rows in CF but from number of rows
 per one index value (cardinality), as it's just another hidden CF like:

 "true" : rowKey1 rowKey2 rowKey3 ...
 "false": rowKey1 rowKey2 rowKey3 ...

 I'm using Pycassa to query the data, here the code I'm using:

 column_family = pycassa.ColumnFamily(cassandra_pool,
 column_family_name, read_consistency_level=2)
 is_exported_expr = create_index_expression('is_exported', 'false')
 clause = create_index_clause([is_exported_expr], count=5000)
 column_family.get_indexed_slices(clause)

 Am I doing something wrong, but I expect this operation to work MUCH
 faster.

 Any ideas or suggestions?

 Some config info:
  - Cassandra 1.1.0
  - RandomPartitioner
  - I have 2 nodes and replication_factor = 2 (each server has a full
 data copy)
  - Using AWS EC2, large instances
  - Software raid0 on ephemeral drives

 Thanks in advance!


>>>
>>
>
>


-- 
Tyler Hobbs
DataStax

RE: Cassandra - cqlsh

2012-08-30 Thread Morantus, James (PCLN-NW)
Ah... Thanks

From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Thursday, August 30, 2012 2:42 PM
To: user@cassandra.apache.org
Subject: Re: Cassandra - cqlsh

RHEL 5 only ships with Python 2.4, which is pretty ancient and below what cqlsh 
will accept.  You can install Python 2.6 with EPEL enabled: 
http://blog.nexcess.net/2011/02/25/python-2-6-for-centos-5/
On Thu, Aug 30, 2012 at 1:34 PM, Morantus, James (PCLN-NW) 
<james.moran...@priceline.com> wrote:
Red Hat Enterprise Linux Server release 5.8 (Tikanga)

Linux nw-mydb-s05 2.6.18-308.8.2.el5 #1 SMP Tue May 29 11:54:17 EDT 2012 x86_64 
x86_64 x86_64 GNU/Linux

Thanks


From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Thursday, August 30, 2012 2:21 PM
To: user@cassandra.apache.org
Subject: Re: Cassandra - cqlsh

What OS are you using?
On Thu, Aug 30, 2012 at 12:09 PM, Morantus, James (PCLN-NW) 
<james.moran...@priceline.com> wrote:
Hello all,

This is my first setup of Cassandra and I'm having some issues running the 
cqlsh tool.
Have any of you come across this error before? If so, please help.

/bin/cqlsh -h localhost -p 9160
No appropriate python interpreter found.

Thanks
James



--
Tyler Hobbs
DataStax



--
Tyler Hobbs
DataStax


Re: How to set LeveledCompactionStrategy for an existing table

2012-08-30 Thread Jean-Armel Luce
I tried as you said with cassandra-cli, but still unsuccessfully.

[default@unknown] use test1;
Authenticated to keyspace: test1
[default@test1] UPDATE COLUMN FAMILY pns_credentials with
compaction_strategy='LeveledCompactionStrategy';
8ed12919-ef2b-327f-8f57-4c2de26c9d51
Waiting for schema agreement...
... schemas agree across the cluster

And then, when I check the compaction strategy, it is still
SizeTieredCompactionStrategy
[default@test1] describe pns_credentials;
ColumnFamily: pns_credentials
  Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
  Default column value validator:
org.apache.cassandra.db.marshal.UTF8Type
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 0.1
  DC Local Read repair chance: 0.0
  Replicate on write: true
  Caching: KEYS_ONLY
  Bloom Filter FP chance: default
  Built indexes: []
  Column Metadata:
Column Name: isnew
  Validation Class: org.apache.cassandra.db.marshal.Int32Type
Column Name: ts
  Validation Class: org.apache.cassandra.db.marshal.DateType
Column Name: mergestatus
  Validation Class: org.apache.cassandra.db.marshal.Int32Type
Column Name: infranetaccount
  Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Column Name: user_level
  Validation Class: org.apache.cassandra.db.marshal.Int32Type
Column Name: msisdn
  Validation Class: org.apache.cassandra.db.marshal.LongType
Column Name: mergeusertype
  Validation Class: org.apache.cassandra.db.marshal.Int32Type
  Compaction Strategy:
org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
  Compression Options:
sstable_compression:
org.apache.cassandra.io.compress.SnappyCompressor



I also tried to create a new table with LeveledCompactionStrategy (using
cqlsh), and when I check the compaction strategy,
SizeTieredCompactionStrategy is set for this table.

cqlsh:test1> CREATE TABLE pns_credentials3 (
 ...   ise text PRIMARY KEY,
 ...   isnew int,
 ...   ts timestamp,
 ...   mergestatus int,
 ...   infranetaccount text,
 ...   user_level int,
 ...   msisdn bigint,
 ...   mergeusertype int
 ... ) WITH
 ...   comment='' AND
 ...   read_repair_chance=0.10 AND
 ...   gc_grace_seconds=864000 AND
 ...   compaction_strategy_class='LeveledCompactionStrategy' AND
 ...
compression_parameters:sstable_compression='SnappyCompressor';
cqlsh:test1> describe table pns_credentials3

CREATE TABLE pns_credentials3 (
  ise text PRIMARY KEY,
  isnew int,
  ts timestamp,
  mergestatus int,
  infranetaccount text,
  user_level int,
  msisdn bigint,
  mergeusertype int
) WITH
  comment='' AND
  comparator=text AND
  read_repair_chance=0.10 AND
  gc_grace_seconds=864000 AND
  default_validation=text AND
  min_compaction_threshold=4 AND
  max_compaction_threshold=32 AND
  replicate_on_write='true' AND
  compaction_strategy_class='SizeTieredCompactionStrategy' AND
  compression_parameters:sstable_compression='SnappyCompressor';

Maybe something is wrong with my server.
Any idea?

Thanks.
Jean-Armel


2012/8/30 feedly team 

> in cassandra-cli, i did something like:
>
> update column family xyz with
> compaction_strategy='LeveledCompactionStrategy'
>
>
> On Thu, Aug 30, 2012 at 5:20 AM, Jean-Armel Luce wrote:
>
>>
>> Hello,
>>
>> I am using Cassandra 1.1.1 and CQL3.
>> I have a cluster with 1 node (test environment)
>> Could you tell how to set the compaction strategy to Leveled Strategy for
>> an existing table ?
>>
>> I have a table pns_credentials
>>
>> jal@jal-VirtualBox:~/cassandra/apache-cassandra-1.1.1/bin$ ./cqlsh -3
>> Connected to Test Cluster at localhost:9160.
>> [cqlsh 2.2.0 | Cassandra 1.1.1 | CQL spec 3.0.0 | Thrift protocol 19.32.0]
>> Use HELP for help.
>> cqlsh> use test1;
>> cqlsh:test1> describe table pns_credentials;
>>
>> CREATE TABLE pns_credentials (
>>   ise text PRIMARY KEY,
>>   isnew int,
>>   ts timestamp,
>>   mergestatus int,
>>   infranetaccount text,
>>   user_level int,
>>   msisdn bigint,
>>   mergeusertype int
>> ) WITH
>>   comment='' AND
>>   comparator=text AND
>>   read_repair_chance=0.10 AND
>>   gc_grace_seconds=864000 AND
>>   default_validation=text AND
>>   min_compaction_threshold=4 AND
>>   max_compaction_threshold=32 AND
>>   replicate_on_write='true' AND
>>   compaction_strategy_class='SizeTieredCompactionStrategy' AND
>>   compression_parameters:sstable_compression='SnappyCompressor';
>>
>> I want to set the LeveledCompaction strategy for this table, so I execute
>> the following ALTER TABLE :
>>
>> cqlsh:test1> alter table pns_credentials
>>  ... WITH compaction_strategy_class='LeveledCompactionStrategy'
>>  ... AND compaction_strategy_options:sstable_si

Re: Why Cassandra secondary indexes are so slow on just 350k rows?

2012-08-30 Thread Edward Kibardin
Thanks Guys for the answers...

The main issue here seems to be not the secondary index, but the speed of
searching for random keys in the column family.
I ran an experiment and queried the same 5000 rows without the index,
providing a list of keys to Pycassa... the speed was the same.

However, using SuperColumns I can get the same 5000 rows (SuperColumns) in
about 1-2 seconds... That is understandable, as the columns are stored sequentially.

So here is the question: is it normal for Cassandra in general to take 20
seconds to fetch 5000 rows, or is something just wrong with my instance?

Ed


On Thu, Aug 30, 2012 at 7:45 PM, Tyler Hobbs  wrote:

> pycassa already breaks up the query into smaller chunks, but you should
> try playing with the buffer_size kwarg for get_indexed_slices, perhaps
> lowering it to ~300, as Aaron suggests:
> http://pycassa.github.com/pycassa/api/pycassa/columnfamily.html#pycassa.columnfamily.ColumnFamily.get_indexed_slices
>
>
> On Wed, Aug 29, 2012 at 11:40 PM, aaron morton wrote:
>
>>  *from 12 to 20 seconds (!!!) to find 5000 rows*.
>>
>> More is not always better.
>>
>> Cassandra must materialise the full 5000 rows and send them all over the
>> wire to be materialised on the other side. Try asking for a few hundred at
>> a time and see how it goes.
>>
>> Cheers
>>
>>   -
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 29/08/2012, at 6:46 PM, Robin Verlangen  wrote:
>>
>> @Edward: I think you should consider a queue for exporting the new rows.
>> Just store the rowkey in a queue (you might want to consider looking at
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Distributed-work-queues-td5226248.html
>>  )
>> and process that row every couple of minutes. Then manually delete columns
>> from that queue-row.
>>
>> With kind regards,
>>
>> Robin Verlangen
>> *Software engineer*
>> *
>> *
>> W http://www.robinverlangen.nl
>> E ro...@us2.nl
>>
>> Disclaimer: The information contained in this message and attachments is
>> intended solely for the attention and use of the named addressee and may be
>> confidential. If you are not the intended recipient, you are reminded that
>> the information remains the property of the sender. You must not use,
>> disclose, distribute, copy, print or rely on this e-mail. If you have
>> received this message in error, please contact the sender immediately and
>> irrevocably delete this message and any copies.
>>
>>
>>
>> 2012/8/29 Robin Verlangen 
>>
>>> "What this means is that eventually you will have 1 row in the
>>> secondary index table with 350K columns"
>>>
>>> Is this really true? I would have expected that Cassandra used internal
>>> index sharding/bucketing?
>>>
>>> With kind regards,
>>>
>>> Robin Verlangen
>>> *Software engineer*
>>> *
>>> *
>>> W http://www.robinverlangen.nl
>>> E ro...@us2.nl
>>>
>>> Disclaimer: The information contained in this message and attachments is
>>> intended solely for the attention and use of the named addressee and may be
>>> confidential. If you are not the intended recipient, you are reminded that
>>> the information remains the property of the sender. You must not use,
>>> disclose, distribute, copy, print or rely on this e-mail. If you have
>>> received this message in error, please contact the sender immediately and
>>> irrevocably delete this message and any copies.
>>>
>>>
>>>
>>> 2012/8/29 Dave Brosius 
>>>
 If i understand you correctly, you are only ever querying for the rows
 where is_exported = false, and turning them into trues. What this means is
 that eventually you will have 1 row in the secondary index table with 350K
 columns that you will never look at.

 It seems to me you that perhaps you should just hold your own "manual
 index" cf that points to non exported rows, and just delete those columns
 when they are exported.



 On 08/28/2012 05:23 PM, Edward Kibardin wrote:

> I have a column family with the secondary index. The secondary index
> is basically a binary field, but I'm using a string for it. The field
> called *is_exported* and can be *'true'* or *'false'*. After request all
> loaded rows are updated with *is_exported = 'false'*.
>
> I'm polling this column table each ten minutes and exporting new rows
> as they appear.
>
> But here the problem: I'm seeing that time for this query grows pretty
> linear with amount of data in column table, and currently it takes *from 
> 12
> to 20 seconds (!!!) to find 5000 rows*. From my understanding, indexed
> request should not depend on number of rows in CF but from number of rows
> per one index value (cardinality), as it's just another hidden CF like:
>
> "true" : rowKey1 rowKey2 rowKey3 ...
> "false": rowKey1 rowKey2 rowKey3 ...
>
> I'm using Pycassa to query the data, here the code I'm using:
>
> column_family = pycas

Re: Why Cassandra secondary indexes are so slow on just 350k rows?

2012-08-30 Thread Hiller, Dean
It seems to me you may want to revisit the design a bit (though I'm not 100% sure,
as I am not sure I understand the entire context). I could see having partitions,
with a few clients polling each partition, so you can scale out basically with
no issues.  If you are doing all this polling from one machine, it just won't
scale very well.

playOrm does this for you, but the basic pattern you can do yourself without
playOrm would be:

Row 1
Row 2
Row 3
Row 4

Index row for partition 1 - .row1, .row4
Index row for partition 2 - .row2, .row3
…

Now each server is responsible for polling / scanning its partitions' index
rows above.  If you have 2 servers and 2 partitions, each one would column-scan
the above index rows and then look up the actual rows.  If it is unbalanced, like
5 servers and 28 partitions, you can use the hash code of the partition and the
number of servers to figure out whether a server owns that partition for
polling; a rough sketch of this follows below.
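
A rough pycassa sketch of that pattern (CF names, partition count, and server
numbering are made-up values for illustration):

    from pycassa import ConnectionPool, ColumnFamily

    NUM_PARTITIONS = 28     # assumed values
    NUM_SERVERS = 5
    SERVER_ID = 3           # which poller this process is

    pool = ConnectionPool('my_keyspace', ['localhost:9160'])
    index_cf = ColumnFamily(pool, 'PartitionIndex')   # one index row per partition
    data_cf = ColumnFamily(pool, 'MyData')

    for partition in range(NUM_PARTITIONS):
        # hash the partition id against the number of servers to decide ownership
        if hash('partition-%d' % partition) % NUM_SERVERS != SERVER_ID:
            continue
        index_row = index_cf.get('partition-%d' % partition, column_count=1000)
        rows = data_cf.multiget(list(index_row.keys()))   # look up the actual rows
        # ... export/process the rows, then delete their index columns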

All of this is automatic in playOrm with S-JQL (Scalable-JQL – one minor change 
to SQL to make it scalable).

Later,
Dean



From: Edward Kibardin <infa...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, August 30, 2012 2:14 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Why Cassandra secondary indexes are so slow on just 350k rows?

t should not depend on number of rows in CF but from number of rows per one 
index value


Re: Store a timeline with uniques properties

2012-08-30 Thread aaron morton
Consider trying…

UserTimeline CF

row_key: <user_id>
column_names: <timestamp, friend_id, action_type>
column_values: action details

To get the changes between two times specify the start and end timestamps and 
do not include the other components of the column name. 

e.g. from <1234, NULL, NULL> to <6789, NULL, NULL>
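
A minimal pycassa sketch of that layout (keyspace/CF names and the second and
third composite components are illustrative assumptions; this shows only the
time-range slice, not how to replace an older entry for the same friend/action):

    from pycassa import ConnectionPool, ColumnFamily
    from pycassa.system_manager import SystemManager
    from pycassa.types import CompositeType, LongType, UTF8Type

    # hypothetical keyspace/CF; column name = <timestamp, friend_id, action_type>
    sys_mgr = SystemManager('localhost:9160')
    sys_mgr.create_column_family(
        'my_ks', 'UserTimeline',
        comparator_type=CompositeType(LongType(), UTF8Type(), UTF8Type()))
    sys_mgr.close()

    pool = ConnectionPool('my_ks', ['localhost:9160'])
    timeline = ColumnFamily(pool, 'UserTimeline')

    # "user2 changed his picture" at timestamp 6789, on user1's timeline
    timeline.insert('user1', {(6789, 'user2', 'pic-change'): 'details'})

    # all changes between two timestamps, ignoring the other components
    changes = timeline.get('user1', column_start=(1234,), column_finish=(6789,))

To get the "replace the old entry" behaviour Morgan asked about, the old column
would still have to be deleted, e.g. by keeping a second CF that maps
friend+action to its latest timestamp (essentially the extra CF described at the
end of the original message).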

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 30/08/2012, at 11:32 PM, Morgan Segalis  wrote:

> Sorry for the scheme that has not keep the right tabulation for some people...
> Here's a space-version instead of a tabulation.
> 
> user1 row :|   lte|  
> lte -1|   lte -2|  
> lte -3   |   lte -4   |
>  values :| user2-name-change | user3-pic-change   | 
> user4-status-change | user2-pic-change | user2-status-change |
> 
> If for example, user2 changes it's picture, the row should look like that : 
> 
> user1 row :|lte   |   
> lte -1   |   lte -2   |   
>  lte -3  |  lte -4|
>values :  |   user2-pic-change| user2-name-change 
> | user3-pic-change   | user4-status-change | user2-status-change |
> 
> On 30 August 2012, at 13:22, Morgan Segalis wrote:
> 
>> Hi everyone,
>> 
>> I'm trying to use cassandra in order to store a "timeline", but with values 
>> that must be unique (replaced). (So not really a timeline, but didn't find a 
>> better word for it)
>> 
>> Let's me give you an example :
>> 
>> - An user have a list of friends
>> - Friends can change their nickname, status, profile picture, etc...
>> 
>> at the beginning the CF will look like that for user1: 
>> 
>> lte = latest-timestamp-entry, which is the timestamp of the entry (-1 -2 -3 
>> means that the timestamp are older)
>> 
>> user1 row :  |   lte |   
>> lte -1  |   lte -2  |   lte 
>> -3  |   lte -4  |
>>  values :| user2-name-change | user3-pic-change  
>> | user4-status-change | user2-pic-change| user2-status-change |
>> 
>> If for example, user2 changes it's picture, the row should look like that : 
>> 
>> user1 row :  |   lte |   
>> lte -1  |   lte -2  |   lte 
>> -3  |   lte -4   |
>>  values :|   user2-pic-change| 
>> user2-name-change | user3-pic-change  | user4-status-change | 
>> user2-status-change |
>> 
>> notice that user2-pic-change in the first representation (lte -3) has 
>> "moved" to the (lte) on the second representation.
>> 
>> That way when user1 connects again, It can retrieve only informations that 
>> occurred between the last time he connected.
>> 
>> e.g. : if the user1's last connexion date it between "lte -2" and "lte -3", 
>> then he will only be notified that :
>> 
>> - user2 has changed his picture
>> - user2 has changed his name
>> - user3 has changed his picture
>> 
>> I would not keep the old data since the "timeline" is saved locally on the 
>> client, and not on the server.
>> I really would like not to search for each column in order to find the 
>> "user2-pic-change", that can be long especially if the user has many friends.
>> 
>> Is there a simple way to do that with cassandra, or I am bound to create 
>> another CF, with column title holding the action e.g. "user2-pic-change" and 
>> for value the timestamp when it appears ?
>> 
>> Thanks,
>> 
>> Morgan.
>> 
> 



Re: performance is drastically degraded after 0.7.8 --> 1.0.11 upgrade

2012-08-30 Thread aaron morton
>> we are running somewhat queue-like with aggressive write-read patterns.
We'll need some more details…

How much data ?
How many machines ?
What is the machine spec ?
How many clients ?
Is there an example of a slow request ? 
How are you measuring that it's slow ? 
Is there anything unusual in the log ? 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 31/08/2012, at 3:30 AM, Edward Capriolo  wrote:

> If you move from 7.X to 0.8X or 1.0X you have to rebuild sstables as
> soon as possible. If you have large bloomfilters you can hit a bug
> where the bloom filters will not work properly.
> 
> 
> On Thu, Aug 30, 2012 at 9:44 AM, Илья Шипицин  wrote:
>> we are running somewhat queue-like with aggressive write-read patterns.
>> I was looking for scripting queries from live Cassandra installation, but I
>> didn't find any.
>> 
>> is there something like thrift-proxy or other query logging/scripting engine
>> ?
>> 
>> 2012/8/30 aaron morton 
>>> 
>>> in terms of our high-rate write load cassandra1.0.11 is about 3 (three!!)
>>> times slower than cassandra-0.7.8
>>> 
>>> We've not had any reports of a performance drop off. All tests so far have
>>> show improvements in both read and write performance.
>>> 
>>> I agree, such digests save some network IO, but they seem to be very bad
>>> in terms of CPU and disk IO.
>>> 
>>> The sha1 is created so we can diagnose corruptions in the -Data component
>>> of the SSTables. They are not used to save network IO.
>>> It is calculated while streaming the Memtable to disk so has no impact on
>>> disk IO. While not the fasted algorithm I would assume it's CPU overhead in
>>> this case is minimal.
>>> 
>>> there's already relatively small Bloom filter file, which can be used for
>>> saving network traffic instead of sha1 digest.
>>> 
>>> Bloom filters are used to test if a row key may exist in an SSTable.
>>> 
>>> any explanation ?
>>> 
>>> If you can provide some more information on your use case we may be able
>>> to help.
>>> 
>>> Cheers
>>> 
>>> 
>>> -
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> 
>>> On 30/08/2012, at 5:18 AM, Илья Шипицин  wrote:
>>> 
>>> in terms of our high-rate write load cassandra1.0.11 is about 3 (three!!)
>>> times slower than cassandra-0.7.8
>>> after some investigation carried out I noticed files with "sha1" extension
>>> (which are missing for Cassandra-0.7.8)
>>> 
>>> in maybeWriteDigest() function I see no option fot switching sha1 digests
>>> off.
>>> 
>>> I agree, such digests save some network IO, but they seem to be very bad
>>> in terms of CPU and disk IO.
>>> why to use one more digest (which have to be calculated), there's already
>>> relatively small Bloom filter file, which can be used for saving network
>>> traffic instead of sha1 digest.
>>> 
>>> any explanation ?
>>> 
>>> Ilya Shipitsin
>>> 
>>> 
>> 



Re: How to set LeveledCompactionStrategy for an existing table

2012-08-30 Thread aaron morton
Looks like a bug. 

Can you please create a ticket on 
https://issues.apache.org/jira/browse/CASSANDRA and update the email thread ?

Can you include this: CFPropDefs.applyToCFMetadata() does not set the 
compaction class on CFM

Thanks


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 31/08/2012, at 7:05 AM, Jean-Armel Luce  wrote:

> I tried as you said with cassandra-cli, and still unsuccessfully
> 
> [default@unknown] use test1;
> Authenticated to keyspace: test1
> [default@test1] UPDATE COLUMN FAMILY pns_credentials with 
> compaction_strategy='LeveledCompactionStrategy';
> 8ed12919-ef2b-327f-8f57-4c2de26c9d51
> Waiting for schema agreement...
> ... schemas agree across the cluster
> 
> And then, when I check the compaction strategy, it is still  
> SizeTieredCompactionStrategy
> [default@test1] describe pns_credentials;
> ColumnFamily: pns_credentials
>   Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>   Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
>   Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>   GC grace seconds: 864000
>   Compaction min/max thresholds: 4/32
>   Read repair chance: 0.1
>   DC Local Read repair chance: 0.0
>   Replicate on write: true
>   Caching: KEYS_ONLY
>   Bloom Filter FP chance: default
>   Built indexes: []
>   Column Metadata:
> Column Name: isnew
>   Validation Class: org.apache.cassandra.db.marshal.Int32Type
> Column Name: ts
>   Validation Class: org.apache.cassandra.db.marshal.DateType
> Column Name: mergestatus
>   Validation Class: org.apache.cassandra.db.marshal.Int32Type
> Column Name: infranetaccount
>   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
> Column Name: user_level
>   Validation Class: org.apache.cassandra.db.marshal.Int32Type
> Column Name: msisdn
>   Validation Class: org.apache.cassandra.db.marshal.LongType
> Column Name: mergeusertype
>   Validation Class: org.apache.cassandra.db.marshal.Int32Type
>   Compaction Strategy: 
> org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
>   Compression Options:
> sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
> 
> 
> 
> I tried also to create a new table with LeveledCompactionStrategy (using 
> cqlsh), and when I check the compaction strategy, the 
> SizeTieredCompactionStrategy is set for this table.
> 
> cqlsh:test1> CREATE TABLE pns_credentials3 (
>  ...   ise text PRIMARY KEY,
>  ...   isnew int,
>  ...   ts timestamp,
>  ...   mergestatus int,
>  ...   infranetaccount text,
>  ...   user_level int,
>  ...   msisdn bigint,
>  ...   mergeusertype int
>  ... ) WITH
>  ...   comment='' AND
>  ...   read_repair_chance=0.10 AND
>  ...   gc_grace_seconds=864000 AND
>  ...   compaction_strategy_class='LeveledCompactionStrategy' AND
>  ...   compression_parameters:sstable_compression='SnappyCompressor';
> cqlsh:test1> describe table pns_credentials3
> 
> CREATE TABLE pns_credentials3 (
>   ise text PRIMARY KEY,
>   isnew int,
>   ts timestamp,
>   mergestatus int,
>   infranetaccount text,
>   user_level int,
>   msisdn bigint,
>   mergeusertype int
> ) WITH
>   comment='' AND
>   comparator=text AND
>   read_repair_chance=0.10 AND
>   gc_grace_seconds=864000 AND
>   default_validation=text AND
>   min_compaction_threshold=4 AND
>   max_compaction_threshold=32 AND
>   replicate_on_write='true' AND
>   compaction_strategy_class='SizeTieredCompactionStrategy' AND
>   compression_parameters:sstable_compression='SnappyCompressor';
> 
> Maybe something is wrong in my server.
> Any idea ?
> 
> Thanks.
> Jean-Armel
> 
> 
> 2012/8/30 feedly team 
> in cassandra-cli, i did something like: 
> 
> update column family xyz with compaction_strategy='LeveledCompactionStrategy'
> 
> 
> On Thu, Aug 30, 2012 at 5:20 AM, Jean-Armel Luce  wrote:
> 
> Hello,
> 
> I am using Cassandra 1.1.1 and CQL3.
> I have a cluster with 1 node (test environment)
> Could you tell how to set the compaction strategy to Leveled Strategy for an 
> existing table ?
> 
> I have a table pns_credentials
> 
> jal@jal-VirtualBox:~/cassandra/apache-cassandra-1.1.1/bin$ ./cqlsh -3
> Connected to Test Cluster at localhost:9160.
> [cqlsh 2.2.0 | Cassandra 1.1.1 | CQL spec 3.0.0 | Thrift protocol 19.32.0]
> Use HELP for help.
> cqlsh> use test1;
> cqlsh:test1> describe table pns_credentials;
> 
> CREATE TABLE pns_credentials (
>   ise text PRIMARY KEY,
>   isnew int,
>   ts timestamp,
>   mergestatus int,
>   infranetaccount text,
>   user_level int,
>   msisdn bigint,
>   mergeusertype int
> ) WITH
>   comment='' AND
>   comparator=text AND
>   read_repair_chance=0.10 AND
>   gc_grace_seconds=864000 AND
>   

Re: performance is drastically degraded after 0.7.8 --> 1.0.11 upgrade

2012-08-30 Thread Илья Шипицин
We are using functional tests (~500 tests at a time).
It is hard to tell which query is slower; it is "slower in general".

Same hardware: 1 node, 32 GB RAM, 8 GB heap, default Cassandra settings.
As we are talking about functional tests, we recreate the keyspace just before
the tests are run.

I do not know how to record the queries (there are a lot of them); if you are
interested, I can set up a dedicated test environment for you.

2012/8/31 aaron morton 

> we are running somewhat queue-like with aggressive write-read patterns.
>
> We'll need some more details...
>
> How much data ?
> How many machines ?
> What is the machine spec ?
> How many clients ?
> Is there an example of a slow request ?
> How are you measuring that it's slow ?
> Is there anything unusual in the log ?
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 31/08/2012, at 3:30 AM, Edward Capriolo  wrote:
>
> If you move from 7.X to 0.8X or 1.0X you have to rebuild sstables as
> soon as possible. If you have large bloomfilters you can hit a bug
> where the bloom filters will not work properly.
>
>
> On Thu, Aug 30, 2012 at 9:44 AM, Илья Шипицин 
> wrote:
>
> we are running somewhat queue-like with aggressive write-read patterns.
> I was looking for scripting queries from live Cassandra installation, but I
> didn't find any.
>
> is there something like thrift-proxy or other query logging/scripting
> engine
> ?
>
> 2012/8/30 aaron morton 
>
>
> in terms of our high-rate write load cassandra1.0.11 is about 3 (three!!)
> times slower than cassandra-0.7.8
>
> We've not had any reports of a performance drop off. All tests so far have
> show improvements in both read and write performance.
>
> I agree, such digests save some network IO, but they seem to be very bad
> in terms of CPU and disk IO.
>
> The sha1 is created so we can diagnose corruptions in the -Data component
> of the SSTables. They are not used to save network IO.
> It is calculated while streaming the Memtable to disk so has no impact on
> disk IO. While not the fasted algorithm I would assume it's CPU overhead in
> this case is minimal.
>
> there's already relatively small Bloom filter file, which can be used for
> saving network traffic instead of sha1 digest.
>
> Bloom filters are used to test if a row key may exist in an SSTable.
>
> any explanation ?
>
> If you can provide some more information on your use case we may be able
> to help.
>
> Cheers
>
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 30/08/2012, at 5:18 AM, Илья Шипицин  wrote:
>
> in terms of our high-rate write load cassandra1.0.11 is about 3 (three!!)
> times slower than cassandra-0.7.8
> after some investigation carried out I noticed files with "sha1" extension
> (which are missing for Cassandra-0.7.8)
>
> in maybeWriteDigest() function I see no option fot switching sha1 digests
> off.
>
> I agree, such digests save some network IO, but they seem to be very bad
> in terms of CPU and disk IO.
> why to use one more digest (which have to be calculated), there's already
> relatively small Bloom filter file, which can be used for saving network
> traffic instead of sha1 digest.
>
> any explanation ?
>
> Ilya Shipitsin
>
>
>
>
>


Re: Memory Usage of a connection

2012-08-30 Thread rohit bhatia
PS: everything above is in bytes, not bits.

On Fri, Aug 31, 2012 at 11:03 AM, rohit bhatia  wrote:

> I was wondering how much would be the memory usage of an established
> connection in cassandra's heap space.
>
> We are noticing extremely frequent young generation garbage collections
> (3.2gb young generation, ParNew gc every 2 seconds) at a traffic of
> 20,000qps for 8 nodes.
> We do connection pooling but with 1 connection for 6 requests with
> phpcassa.
> So, essentially every node has on an average 500 connections
> created/destroyed every second.
> Could these 500 connections/second cause (on average) 2600Mb memory usage
> per 2 second ~ 1300Mb/second.
> or For 1 connection around 2-3Mb.
>
> Is this value expected? (our write requests are simple counter increments
> and cannot take up 500KB per request as calculation suggests, rather should
> take up only a few hundred bytes).
>
> Thanks
> Rohit
>
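
For reference, the arithmetic behind the ~500 KB/request figure in the quoted
message, assuming the 20,000 qps are spread evenly across the 8 nodes:

    allocated_per_gc = 2600 * 1024 * 1024     # ~2600 MB allocated every ~2 seconds
    requests_per_node_per_gc = 20000 / 8 * 2  # ~5000 requests per node per 2 seconds
    per_request_kb = allocated_per_gc / float(requests_per_node_per_gc) / 1024
    print(per_request_kb)                     # ~530 KB apparently allocated per request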


Re: adding node to cluster

2012-08-30 Thread Casey Deccio
On Thu, Aug 30, 2012 at 11:21 AM, Rob Coli  wrote:

> On Thu, Aug 30, 2012 at 10:18 AM, Casey Deccio  wrote:
> > I'm adding a new node to an existing cluster that uses
> > ByteOrderedPartitioner.  The documentation says that if I don't
> configure a
> > token, then one will be automatically generated to take load from an
> > existing node.
> > What I'm finding is that when I add a new node, (super)
> > column lookups begin failing (not sure if it was the row lookup failing
> or
> > the supercolumn lookup failing), and I'm not sure why.
>
> 1) You almost never actually want BOP.
> 2) You never want Cassandra to pick a token for you. IMO and the
> opinion of many others, the fact that it does this is a bug. Specify a
> token with initial_token.
> 3) You never want to use Supercolumns. The project does not support
> them but currently has no plan to deprecate them. Use composite row
> keys.
> 4) Unless your existing cluster consists of one node, you almost never
> want to add only a single new node to a cluster. In general you want
> to double it.
>
> In summary, you are Doing It just about as Wrong as possible... but on
> to your actual question ... ! :)
>
>
Well, at least I'm consistent :)  Thanks for the hints.  Unfortunately,
when I first brought up my system--with the goal of getting it up
quickly--I thought BOP and Supercolumns were the way to go.  Plus, the
small "cluster" of nodes I was using was on a hodgepodge of hardware.  I've
since had a chance to think somewhat about redesigning and rearchitecting,
but it seems like there's no "easy" way to convert it properly.  Step one
was to migrate everything over to a single dedicated node on reasonable
hardware, so I could begin the process, which brought me to the issue I
initially posted about.  But the problem is that this is a live system, so
data loss is an issue I'd like to avoid.


> In what way are the lookups "failing"? Is there an exception?
>
>
No exception--just failing in that the data should be there, but isn't.

Casey


Re: Memory Usage of a connection

2012-08-30 Thread Peter Schuller
> Could these 500 connections/second cause (on average) 2600Mb memory usage
> per 2 second ~ 1300Mb/second.
> or For 1 connection around 2-3Mb.

In terms of garbage generated it's much less about number of
connections as it is about what you're doing with them. Are you for
example requesting large amounts of data? Large or many columns (or
both), etc. Essentially all "working" data that your request touches
is allocated on the heap and contributes to allocation rate and ParNew
frequency.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: Memory Usage of a connection

2012-08-30 Thread rohit bhatia
On Fri, Aug 31, 2012 at 11:27 AM, Peter Schuller <
peter.schul...@infidyne.com> wrote:

> > Could these 500 connections/second cause (on average) 2600Mb memory usage
> > per 2 second ~ 1300Mb/second.
> > or For 1 connection around 2-3Mb.
>
> In terms of garbage generated it's much less about number of
> connections as it is about what you're doing with them. Are you for
> example requesting large amounts of data? Large or many columns (or
> both), etc. Essentially all "working" data that your request touches
> is allocated on the heap and contributes to allocation rate and ParNew
> frequency.
>
>
"write requests are simple counter increments" and in memtables existing in
memory.
There is negligible read traffic (100/200 reads/second).
Also, increasing write traffic si the one that increases gc frequency while
keeping read traffic constant.
So the gc should be independent of reads.


> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
>