Re: Multi-type column values in single CF

2011-07-04 Thread Silvère Lestang
We do pretty much the same thing here: dynamic columns with a timestamp as the
column name and a different value type for each row. We use the
serialization/deserialization classes provided with Hector and store the
type of the value in the row key. Example row keys:
"b6c8a1e7281761e62230ea76daa3d841#INT" => all values are Integers
"7f30a6a2bbb1b921afc8216d8c5d9257#DOUBLE" => all values are Doubles

If I had to do it again, I would try to use (Dynamic)CompositeType for the
value, or an equivalent mechanism as suggested by Roland.

On 3 July 2011 15:07, Roland Gude  wrote:

> You could do the serialization for all your supported datatypes yourself
> (many libraries for serialization are available, and a pretty thorough
> benchmark of them can be found here:
> https://github.com/eishay/jvm-serializers/wiki) and prepend the serialized
> bytes with an identifier for your datatype.
> This would not avoid casting, but it would still perform better
> than serializing to strings as is done in your example.
> Prepending the values with the id seems better to me, because you can
> be sure that a new insertion to some field overwrites the correct column
> even if its type changed.
>
> -Original Message-
> From: osishkin osishkin [mailto:osish...@gmail.com]
> Sent: Sunday, 3 July 2011 13:52
> To: user@cassandra.apache.org
> Subject: Multi-type column values in single CF
>
> Hi all,
>
> I need to store column values of various data types in a
> single column family, i.e. I have column values that are integers,
> others that are strings, and maybe more later. All column names are
> strings (no comparator problem for me).
> The thing is, I need to store unstructured data - I do not have fixed,
> known-in-advance column names, so I cannot use a fixed static map
> for casting the values back to their original type on retrieval from
> Cassandra.
>
> My immediate naive thought is to simply prefix every column name with
> the type the value needs to be cast back to.
> For example, I'll do the following conversion to the columns of some key -
> {'attr1': 'val1', 'attr2': 100}  ~> {'str_attr1': 'val1', 'int_attr2':
> '100'}
> and only then send it to Cassandra. This way I know what to cast it
> back to.
>
> But all this casting back and forth on the client side seems to me to
> be very bad for performance.
> Another option is to split the columns across dedicated column families
> with matching validation types - a column family for integer values,
> one for strings, one for timestamps, etc.
> But that does not seem very efficient either (and worse for any
> rollback mechanism), since now I have to perform several get calls on
> multiple CFs where once I had only one.
>
> I thought perhaps someone has encountered a similar situation in the
> past, and can offer some advice on the best course of action.
>
> Thank you,
> Osi
>
>
>
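Roland's suggestion - serialize each value yourself and prepend the bytes with a type identifier - can be sketched as follows. This is an illustrative Python sketch of the general technique (the thread itself concerns Hector's Java serializers); the tag values and helper names are made up for the example:

```python
import struct

# Hypothetical one-byte type tags; any stable mapping would do.
TAG_INT, TAG_DOUBLE, TAG_STR = 0x01, 0x02, 0x03

def pack_value(value):
    """Serialize a value and prepend a tag byte naming its type."""
    if isinstance(value, bool):
        raise TypeError("bool not supported in this sketch")
    if isinstance(value, int):
        return bytes([TAG_INT]) + struct.pack(">q", value)
    if isinstance(value, float):
        return bytes([TAG_DOUBLE]) + struct.pack(">d", value)
    if isinstance(value, str):
        return bytes([TAG_STR]) + value.encode("utf-8")
    raise TypeError("unsupported type: %r" % type(value))

def unpack_value(blob):
    """Dispatch on the tag byte to deserialize the remaining bytes."""
    tag, body = blob[0], blob[1:]
    if tag == TAG_INT:
        return struct.unpack(">q", body)[0]
    if tag == TAG_DOUBLE:
        return struct.unpack(">d", body)[0]
    if tag == TAG_STR:
        return body.decode("utf-8")
    raise ValueError("unknown type tag: %d" % tag)
```

Because the tag travels with the value rather than with the column name, a later write of a different type to the same column simply overwrites it - the property Roland points out.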


Re: Multi-type column values in single CF

2011-07-04 Thread osishkin osishkin
I appreciate both your answers.
I'll use them soon.

Thanks!



Re: Row cache

2011-07-04 Thread Shay Assulin
Hi,

The row cache capacity > 0.


After reading a row, the Caches..KeyCache.Requests attribute
gets incremented, but the ColumnFamilies...ReadCount attribute
remains zero, and the Caches..RowCache.Size and Requests
attributes remain zero as well.

It looks like the row cache is disabled although the capacity is not zero.
In addition, ColumnFamilies...ReadCount does not reflect
the fact that the row was fetched from an SSTable.

Besides the CF rows_cached parameter (in the yaml), should I configure
anything else to enable the row cache?


10x

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Row-cache-tp6532887p6545416.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


How to scale Cassandra?

2011-07-04 Thread Subscriber
Hi there, 

I have read a lot about Cassandra's high-scalability features: seamless
addition of nodes, no downtime, etc.
But I wonder how one does this in practice in an operational system.

In the system we're going to implement, we're expecting a huge number of writes
with uniformly distributed keys
(the keys are given and cannot be generated). That means using
RandomPartitioner will (more or less) result in
the same work-load per node as any OrderPreservingPartitioner - right?

But how do you scale a (more or less) balanced Cassandra cluster? I think that
in the end
you always have to double the number of nodes (adding just a handful of nodes
disburdens only the split regions; the
work-load of the untouched regions keeps growing at the same speed).

This seems to be ok for small clusters. But what do you do when you have
several hundreds of nodes in your cluster?
It seems to me that a balanced cluster is a blessing for performance but a
curse for scalability...

What are the alternatives? One could re-distribute the token ranges, but this
would cause
downtime (AFAIK); not an option!

Is there anything that I didn't understand, or am I missing something else? Is
the only remaining strategy to make sure that
the cluster grows unbalanced, so one can add nodes to the hotspots? However,
in this case you have to make sure
that this strategy lasts. Could be too optimistic...

Best Regards
Udo
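The balancing arithmetic Udo describes can be made concrete for RandomPartitioner: a balanced n-node ring places token i at i*2^127/n, and keeping the ring balanced while growing means bisecting every range, i.e. doubling. A sketch under that assumption (the 0..2^127 RandomPartitioner token space):

```python
RING = 2 ** 127  # RandomPartitioner's token space is 0 .. 2**127

def balanced_tokens(n):
    """Evenly spaced initial tokens for an n-node balanced ring."""
    return [i * RING // n for i in range(n)]

def doubling_tokens(tokens):
    """Tokens for new nodes that bisect each existing range.

    Inserting one node per range is exactly a doubling of the
    cluster, which is why balanced clusters tend to grow in
    powers of two."""
    n = len(tokens)
    new = []
    for i in range(n):
        lo = tokens[i]
        # the last range wraps around the ring back to tokens[0]
        hi = tokens[(i + 1) % n] + (RING if i == n - 1 else 0)
        new.append(((lo + hi) // 2) % RING)
    return new
```

Adding fewer nodes than one per range relieves only the ranges that get split, which is the imbalance Udo is pointing at.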

Re: How to scale Cassandra?

2011-07-04 Thread Paul Loy
That's basically how I understand it.

However, I think it gets better with larger clusters as the proportion of
the ring you move around at any time is much lower.





-- 
-
Paul Loy
p...@keteracel.com
http://uk.linkedin.com/in/paulloy


flushing issue

2011-07-04 Thread Vivek Mishra
Hi,
I know I might be missing something here.
I am currently facing one issue.

I have two Cassandra clients (1. using CassandraServer, 2. using
Cassandra.Client), both connecting to the same host.

I have created keyspaces K1 and K2 using client 1 (CassandraServer), but
somehow those keyspaces are not available to client 2 (Cassandra.Client).

I have also tried flushing the tables via StorageService.instance.ForceFlush,
but that didn't work either.



Any help/suggestion?






Cassandra memory problem

2011-07-04 Thread Daniel Doubleday
Hi all,

we have a memory problem with Cassandra: RES grows without bound (well, until
the OS kills the process, because we don't have swap).

I found a thread about the same problem, but on OpenJDK:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Very-high-memory-utilization-not-caused-by-mmap-on-sstables-td5840777.html

We are on Debian with the Sun JDK.

Resident memory is 7.4G while the heap is restricted to 3G.

Is anyone else seeing this with the Sun JDK?

Cheers,
Daniel

:/home/dd# java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

:/home/dd# ps aux |grep java
cass 28201  9.5 46.8 372659544 7707172 ?   SLl  May24 5656:21 /usr/bin/java 
-ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms3000M -Xmx3000M 
-Xmn400M ...

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND

 
28201 cass  20   0  355g 7.4g 1.4g S8 46.9   5656:25 java





Re: Row cache

2011-07-04 Thread Daniel Doubleday
Just to make sure:

The yaml doesn't matter. The cache config is stored in the system tables. It's
the "CREATE ... WITH ..." stuff you did via cassandra-cli to create the CF.

In JConsole, do you see that the cache capacity is > 0?
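For reference, in 0.7 the row-cache size is per column family and lives in the schema, not in cassandra.yaml; a sketch of the cassandra-cli command (the CF name "MyCF" is hypothetical, and the exact attribute syntax should be checked against your 0.7.x cli's `help update column family`):

```
update column family MyCF with rows_cached = 10000;
```

Afterwards the RowCache capacity for that CF, as seen in JConsole, should be > 0.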




Re: Cassandra memory problem

2011-07-04 Thread Jonathan Ellis
mmap'd data is attributed to RES, but the OS can page it out
instead of killing the process.
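The effect Jonathan describes can be illustrated with a small sketch: pages of a memory-mapped file become resident once touched, yet unlike heap they stay reclaimable by the kernel under memory pressure. A minimal Python illustration (file size and page-step are arbitrary choices for the example):

```python
import mmap
import os
import tempfile

# Write a file, map it, and fault its pages in by reading them.
# The touched pages count toward the process's resident set (RES
# in top), but the kernel can page them out under memory pressure
# rather than invoking the OOM killer.
SIZE = 8 * 1024 * 1024  # 8 MB, arbitrary for the sketch

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\0" * SIZE)
    path = f.name

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # read one byte per 4 KB page so every page is faulted in
    touched = sum(mm[i] for i in range(0, SIZE, 4096))
    mm.close()

os.unlink(path)
```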




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Cassandra memory problem

2011-07-04 Thread Sebastien Coutu
We had an issue like that a short while ago here. It was mainly happening
under heavy load, and we managed to stabilize it by tweaking the Young/Old
space ratio of the JVM and also the tenuring thresholds/survivor
ratios. What kind of load do you have on your systems? Mostly reads, writes?

SC



RE: How to scale Cassandra?

2011-07-04 Thread Dan Hendry
Moving nodes does not result in downtime, provided you use proper replication
factors and read/write consistencies. The typical recommendation is RF=3 and
QUORUM reads/writes.

 

Dan

 




Re: How to scale Cassandra?

2011-07-04 Thread Paul Loy
Well, by issuing a nodetool move when a node is under high load, you
basically make that node unresponsive. That's fine, but a nodetool move on
one node also means that that node's replica data needs to move around the
ring and possibly some replica data from the next (or previous) node in the
ring. So how does this affect other nodes wrt RF and quorum? Will quorum
fail until the replicas have moved also?

>



-- 
-
Paul Loy
p...@keteracel.com
http://uk.linkedin.com/in/paulloy


Re: Cassandra memory problem

2011-07-04 Thread Daniel Doubleday
Just to make sure:
You were seeing that resident memory was more than twice the max Java heap,
and that changed after you tweaked the GC settings?

Note that I am not having a heap/GC problem. The VM itself thinks everything
is golden.




copy data from multi-node cluster to single node

2011-07-04 Thread Ross Black
Hi,

I am using Cassandra 0.7.5 on Linux machines.

I am trying to back up data from a multi-node cluster (3 nodes) and restore
it into a single-node cluster that has a different name (for development
testing).

The multi-node cluster is backed up using clustertool global_snapshot, and
then I copy the snapshot from a single node and replace the data directory
on the single node.
The multi-node cluster has a replication factor of 3, so I assume that
restoring from any node of the multi-node cluster will be the same.
When started up, this fails with a cluster name mismatch.

I have tried removing all the Location* files in the data directory (as per
http://wiki.apache.org/cassandra/FAQ#clustername_mismatch) but the single
node then fails with an error message:
org.apache.cassandra.config.ConfigurationException: Found system table
files, but they couldn't be loaded. Did you change the partitioner?


How do you change the name of a cluster?  The FAQ instructions do not seem
to work for me - are they still valid for 0.7.5?
Is the backup/restore mechanism going to work, or is there a
better/simpler way to copy data from multi-node to single-node?

Thanks,
Ross


Re: How to scale Cassandra?

2011-07-04 Thread Edward Capriolo
On Mon, Jul 4, 2011 at 10:21 AM, Paul Loy  wrote:

> [...] So how does this affect other nodes wrt RF and quorum? Will quorum
> fail until the replicas have moved also?

No. If you are using nodetool move (or any of the nodetool operations),
quorum and replication factor are properly maintained.


Re: Cassandra memory problem

2011-07-04 Thread Sebastien Coutu
It was one of the issues we had. One of our hosts was using OpenJDK,
and we switched it to the Sun JDK; this part of the issue stabilized. The
other issue we had was the heap going through the roof and then OOM under
load.


>


Re: Cassandra memory problem

2011-07-04 Thread Daniel Doubleday
Yes, thank you.

I have read about the OpenJDK issue, but unfortunately we are already on the
Sun JDK.




Re: Cassandra memory problem

2011-07-04 Thread Daniel Doubleday
Ok - thanks, but maybe some kernel guy can help or point to some good resource 
to get educated, because I don't really get it.

The following is from our other small log cluster: 2 nodes with 8GB RAM; 
Cassandra has a 4GB max heap.

- We have disabled swap on all Cassandra servers
- On the machine where I got the system OOM (not a Java OOM) I looked at dmesg 
and saw

In the Normal zone:

free:6604kB shows we have a problem
unevictable:4176332kB because we have JNA

and page cache info (DMA32 zone):

active_anon:3118624kB 
active_file:0kB

If I understand it correctly, active_anon and active_file correspond to 
anonymous (non-file-backed) and file-backed pages, respectively. 

Question 1:
So if the res memory that's non-heap is actually mmap'd files, shouldn't it 
show up in active_file?

- I compare /proc/meminfo's of the node that was restarted and the other one 
that still survived

Active(anon) on the restarted server is ~100MB; on the other it's > 1GB.

Question 2:
Could anyone point me to a resource that explains how file-system-cache usage 
and process usage of the page cache are orchestrated?
I understand that the swap daemon only checks the page cache if the number of 
free pages is getting low. So if res memory is used up by the Java process 
(which is not controllable by -Xmx settings) it seems to compete with the file 
system cache. How is memory usage optimized so that the right parts of files are 
in memory?
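One way to probe Question 1 empirically is to classify a process's resident memory (Rss) by mapping type from /proc/<pid>/smaps: mmap'd SSTables show up under file-backed mappings, JVM heap and other anonymous memory do not. A minimal Python sketch; the sample data and file path below are illustrative, not taken from this thread:

```python
# Split resident memory (Rss) into file-backed vs anonymous by parsing
# /proc/<pid>/smaps-style text. A mapping header line has a pathname as its
# 6th field when the mapping is file-backed (e.g. an mmap'd SSTable).
SAMPLE = """\
7f10a0000000-7f10a4000000 r--s 00000000 08:01 123 /var/lib/cassandra/data/ks/cf-f-1-Data.db
Rss:              131072 kB
7f20b0000000-7f20b0c00000 rw-p 00000000 00:00 0
Rss:               12288 kB
"""

def split_rss(smaps_text):
    file_kb = anon_kb = 0
    file_backed = False
    for line in smaps_text.splitlines():
        tok = line.split()
        if not tok:
            continue
        if "-" in tok[0] and not tok[0].endswith(":"):
            # mapping header: a 6th field (pathname) means file-backed
            file_backed = len(tok) >= 6
        elif tok[0] == "Rss:":
            if file_backed:
                file_kb += int(tok[1])
            else:
                anon_kb += int(tok[1])
    return file_kb, anon_kb

print(split_rss(SAMPLE))  # (131072, 12288)
```

Running this against the live PID (read /proc/<pid>/smaps instead of SAMPLE) would show directly whether the unexplained resident memory is file-backed (page-outable mmap) or anonymous.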

Thanks,
Daniel


snip from: dmesg

[7226178.039658] Node 0 DMA32 free:23420kB min:4816kB low:6020kB high:7224kB 
active_anon:3118624kB inactive_anon:20048kB active_file:0kB inactive_file:764kB 
unevictable:268156kB isolated(anon):0kB isolated(file):0kB present:3463804kB 
mlocked:268156kB dirty:0kB writeback:32kB mapped:1944kB shmem:120kB 
slab_reclaimable:1124kB slab_unreclaimable:1740kB kernel_stack:1368kB 
pagetables:9636kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:160 
all_unreclaimable? yes
[7226178.039672] Node 0 Normal free:6604kB min:6652kB low:8312kB high:9976kB 
active_anon:412072kB inactive_anon:68800kB active_file:100kB 
inactive_file:340kB unevictable:4176332kB isolated(anon):0kB isolated(file):0kB 
present:4783356kB mlocked:4176332kB dirty:0kB writeback:28kB mapped:13600kB 
shmem:1376kB slab_reclaimable:5052kB slab_unreclaimable:12172kB 
kernel_stack:1400kB pagetables:11156kB unstable:0kB bounce:0kB 
writeback_tmp:0kB pages_scanned:747 all_unreclaimable? yes
[7226178.039682] lowmem_reserve[]: 0 0 0 0
[7226178.039686] Node 0 DMA: 2*4kB 2*8kB 1*16kB 1*32kB 2*64kB 0*128kB 1*256kB 
0*512kB 1*1024kB 1*2048kB 3*4096kB = 15816kB
[7226178.039696] Node 0 DMA32: 214*4kB 166*8kB 165*16kB 77*32kB 42*64kB 
30*128kB 14*256kB 0*512kB 0*1024kB 1*2048kB 1*4096kB = 23544kB
[7226178.039705] Node 0 Normal: 731*4kB 3*8kB 0*16kB 2*32kB 2*64kB 1*128kB 
2*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 6852kB
[7226178.039714] 4707 total pagecache pages
[7226178.039715] 0 pages in swap cache
[7226178.039717] Swap cache stats: add 0, delete 0, find 0/0
[7226178.039719] Free swap  = 0kB
[7226178.039720] Total swap = 0kB
[7226178.064611] 2097135 pages RAM
[7226178.064614] 47927 pages reserved
[7226178.064615] 14100 pages shared
[7226178.064616] 2031896 pages non-shared
[7226178.064620] Out of memory: kill process 11670 (java) score 13723 or a child


On Jul 4, 2011, at 2:42 PM, Jonathan Ellis wrote:

> mmap'd data will be attributed to res, but the OS can page it out
> instead of killing the process.
> 
> On Mon, Jul 4, 2011 at 5:52 AM, Daniel Doubleday
>  wrote:
>> Hi all,
>> we have a mem problem with cassandra. res goes up without bounds (well until
>> the OS kills the process because we don't have swap)
>> I found a thread that's about the same problem but on OpenJDK:
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Very-high-memory-utilization-not-caused-by-mmap-on-sstables-td5840777.html
>> We are on Debian with Sun JDK.
>> Resident mem is 7.4G while heap is restricted to 3G.
>> Anyone else is seeing this with Sun JDK?
>> Cheers,
>> Daniel
>> :/home/dd# java -version
>> java version "1.6.0_24"
>> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
>> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>> :/home/dd# ps aux |grep java
>> cass 28201  9.5 46.8 372659544 7707172 ?   SLl  May24 5656:21
>> /usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42
>> -Xms3000M -Xmx3000M -Xmn400M ...
>>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
>> 
>> 
>> 28201 cass  20   0  355g 7.4g 1.4g S8 46.9   5656:25 java
>> 
>> 
>> 
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com



Re: Cassandra memory problem

2011-07-04 Thread Daniel Doubleday
Hi Sebastian,

one question: do you use jna.jar, and do you see "JNA mlockall successful" in 
your logs?
There's a wild theory here that our problem might be related to mlockall and 
no swap. 
Maybe the JVM does some realloc stuff and the pinned pages are not cleared ... 

but that's really only wild guessing.

Also you are saying that on your servers res mem is not > max heap and the java 
process is not swapping?

Thanks,
Daniel

On Jul 4, 2011, at 6:04 PM, Sebastien Coutu wrote:

> It was among one of the issues we had. One of our hosts was using OpenJDK and 
> we've switched it to Sun and this part of the issue stabilized. The other 
> issues we had were Heap going through the roof and then OOM under load.
> 
> 
> On Mon, Jul 4, 2011 at 11:01 AM, Daniel Doubleday  
> wrote:
> Just to make sure: 
> You were seeing that res mem was more than twice of max java heap and that 
> did change after you tweaked GC settings?
> 
> Note that I am not having a heap / gc problem. The VM itself thinks 
> everything is golden.
> 
> On Jul 4, 2011, at 3:41 PM, Sebastien Coutu wrote:
> 
>> We had an issue like that a short while ago here. This was mainly happening 
>> under heavy load and we managed to stabilize it by tweaking the Young/Old 
>> space ratio of the JVM and by also tweaking the tenuring thresholds/survivor 
>> ratios. What kind of load do you have on your systems? Mostly reads, writes?
>> 
>> SC
>> 
>> On Mon, Jul 4, 2011 at 6:52 AM, Daniel Doubleday  
>> wrote:
>> Hi all,
>> 
>> we have a mem problem with cassandra. res goes up without bounds (well until 
>> the OS kills the process because we don't have swap)
>> 
>> I found a thread that's about the same problem but on OpenJDK: 
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Very-high-memory-utilization-not-caused-by-mmap-on-sstables-td5840777.html
>> 
>> We are on Debian with Sun JDK.
>> 
>> Resident mem is 7.4G while heap is restricted to 3G.
>> 
>> Anyone else is seeing this with Sun JDK?
>> 
>> Cheers,
>> Daniel
>> 
>> :/home/dd# java -version
>> java version "1.6.0_24"
>> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
>> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>> 
>> :/home/dd# ps aux |grep java
>> cass 28201  9.5 46.8 372659544 7707172 ?   SLl  May24 5656:21 
>> /usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 
>> -Xms3000M -Xmx3000M -Xmn400M ...
>> 
>>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND 
>>  
>>
>> 28201 cass  20   0  355g 7.4g 1.4g S8 46.9   5656:25 java
>> 
>> 
>> 
>> 
> 
> 



Re: How to scale Cassandra?

2011-07-04 Thread Paul Loy
Do you mean the ring does not change until the move has completed?

On Mon, Jul 4, 2011 at 4:49 PM, Edward Capriolo wrote:

>
>
> On Mon, Jul 4, 2011 at 10:21 AM, Paul Loy  wrote:
>
>> Well, by issuing a nodetool move when a node is under high load, you
>> basically make that node unresponsive. That's fine, but a nodetool move on
>> one node also means that that node's replica data needs to move around the
>> ring and possibly some replica data from the next (or previous) node in the
>> ring. So how does this affect other nodes wrt RF and quorum? Will quorum
>> fail until the replicas have moved also?
>>
>> On Mon, Jul 4, 2011 at 3:08 PM, Dan Hendry wrote:
>>
>>> Moving nodes does not result in downtime provided you use proper
>>> replication factors and read/write consistencies. The typical recommendation
>>> is RF=3 and QUORUM reads/writes.
>>>
>>>
>>> Dan
>>>
>>>
>>> *From:* Paul Loy [mailto:ketera...@gmail.com]
>>> *Sent:* July-04-11 5:59
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: How to scale Cassandra?
>>>
>>>
>>> That's basically how I understand it.
>>>
>>> However, I think it gets better with larger clusters as the proportion of
>>> the ring you move around at any time is much lower.
>>>
>>> On Mon, Jul 4, 2011 at 10:54 AM, Subscriber 
>>> wrote:
>>>
>>> Hi there,
>>>
>>> I read a lot of Cassandra's high scalability feature: allowing seamless
>>> addition of nodes, no downtime etc.
>>> But I wonder how one will do this in practice in an operational system.
>>>
>>> In the system we're going to implement we're expecting a huge number of
>>> writes with uniformly distributed keys
>>> (the keys are given and cannot be generated). That means using
>>> RandomPartitioner will (more or less) result in
>>> the same work-load per node as any other OrderPreservingPartitioner -
>>> right?
>>>
>>> But how do you scale a (more or less) balanced Cassandra cluster? I think
>>> that in the end
>>> you always have to double the number of nodes (adding just a handful of
>>> nodes disburdens only the split regions, the
>>> work-load of untouched regions will grow with unchanged speed).
>>>
>>> This seems to be ok for small clusters. But what do you do when you
>>> have several 100s of nodes in your cluster?
>>> It seems to me that a balanced cluster is a blessing for performance but a
>>> curse for scalability...
>>>
>>> What are the alternatives? One could re-distribute the token ranges, but
>>> this would cause
>>> downtimes (AFAIK); not an option!
>>>
>>> Is there anything that I didn't understand or do I miss something else?
>>> Is the only left strategy to make sure that
>>> the cluster grows unbalanced so one can add nodes to the hotspots?
>>> However in this case you have to make sure
>>> that this strategy is lasting. Could be too optimistic...
>>>
>>> Best Regards
>>> Udo
>>>
>>>
>>>
>>>
>>> --
>>> -
>>> Paul Loy
>>> p...@keteracel.com
>>> http://uk.linkedin.com/in/paulloy
>>>
>>>
>>
>>
>>
>> --
>> -
>> Paul Loy
>> p...@keteracel.com
>> http://uk.linkedin.com/in/paulloy
>>
>
> No. If you are using nodetool move (or any of the nodetool operations)
> quorum and replication factor is properly maintained.
>



-- 
-
Paul Loy
p...@keteracel.com
http://uk.linkedin.com/in/paulloy
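The "in the end you always have to double the number of nodes" intuition from this thread can be checked numerically. A small sketch, assuming RandomPartitioner's 0..2**127 token space and evenly spaced initial tokens (the common convention, not anything specific to the posters' clusters):

```python
# Why doubling keeps a RandomPartitioner ring balanced: balanced initial
# tokens for n nodes are i * 2**127 // n, and the tokens for 2n nodes are a
# superset -- each new node simply bisects an existing node's range, so no
# token held by an old node has to move.
RING = 2 ** 127

def balanced_tokens(n):
    return [i * RING // n for i in range(n)]

old, new = balanced_tokens(4), balanced_tokens(8)
assert set(old) <= set(new)          # existing nodes keep their tokens
added = sorted(set(new) - set(old))  # one new token per old range
print(len(added))  # 4
```

Adding fewer than n nodes to an n-node ring necessarily leaves some old ranges unsplit, which is exactly the imbalance the original poster describes.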


Re: Cassandra memory problem

2011-07-04 Thread Sebastien Coutu
Hi Daniel,

Yes, we do see it. Since I've added the JNA libraries, it takes a bit more
time at that step and locks all the memory. We're using JNA 3.3.0, which we
downloaded from here:

https://github.com/twall/jna#readme

Our servers currently have 32GB of
memory and we've assigned 12GB of memory to the Cassandra JVM. We're seeing
the following in the logs:

 INFO [main] 2011-06-27 11:43:14,605 AbstractCassandraDaemon.java (line 97)
Heap size: 11811160064/11811160064
 INFO [main] 2011-06-27 11:43:21,272 CLibrary.java (line 106) JNA mlockall
successful
 INFO [main] 2011-06-27 11:43:21,292 DatabaseDescriptor.java (line 121)
Loading settings from
file:/home/hadoop/bin/cassandra/yul01fct/conf/cassandra.yaml
 INFO [main] 2011-06-27 11:43:21,404 DatabaseDescriptor.java (line 181)
DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap

On the servers, we're seeing a lot of system memory assigned to cache that
is "reassigned" to used memory when the applications running on the system
really need it. We're not seeing any swapped memory because we've tweaked
swappiness and tuned every application running on the system. We're monitoring
the performance of that cluster with Ganglia and can see the memory "movement"
in the standard graphs produced.

Regards,

SC

On Mon, Jul 4, 2011 at 12:33 PM, Daniel Doubleday
wrote:

> Hi Sebastian,
>
> one question: do you use jna.jar and do you see JNA mlockall successful in
> your logs.
> There's that wild theory here that our problem might be related to mlockall
> and no swap.
> Maybe the JVM does some realloc stuff and the pinned pages are not cleared
> ...
>
> but that's really only wild guessing.
>
> Also you are saying that on your servers res mem is not > max heap and the
> java process is not swapping?
>
> Thanks,
> Daniel
>
> On Jul 4, 2011, at 6:04 PM, Sebastien Coutu wrote:
>
> It was among one of the issues we had. One of our hosts was using OpenJDK
> and we've switched it to Sun and this part of the issue stabilized. The
> other issues we had were Heap going through the roof and then OOM under
> load.
>
>
> On Mon, Jul 4, 2011 at 11:01 AM, Daniel Doubleday <
> daniel.double...@gmx.net> wrote:
>
>> Just to make sure:
>> You were seeing that res mem was more than twice of max java heap and that
>> did change after you tweaked GC settings?
>>
>> Note that I am not having a heap / gc problem. The VM itself thinks
>> everything is golden.
>>
>> On Jul 4, 2011, at 3:41 PM, Sebastien Coutu wrote:
>>
>> We had an issue like that a short while ago here. This was mainly
>> happening under heavy load and we managed to stabilize it by tweaking the
>> Young/Old space ratio of the JVM and by also tweaking the tenuring
>> thresholds/survivor ratios. What kind of load do you have on your systems?
>> Mostly reads, writes?
>>
>> SC
>>
>> On Mon, Jul 4, 2011 at 6:52 AM, Daniel Doubleday <
>> daniel.double...@gmx.net> wrote:
>>
>>> Hi all,
>>>
>>> we have a mem problem with cassandra. res goes up without bounds (well
>>> until the OS kills the process because we don't have swap)
>>>
>>> I found a thread that's about the same problem but on OpenJDK:
>>>
>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Very-high-memory-utilization-not-caused-by-mmap-on-sstables-td5840777.html
>>>
>>> We are on Debian with Sun JDK.
>>>
>>> Resident mem is 7.4G while heap is restricted to 3G.
>>>
>>> Anyone else is seeing this with Sun JDK?
>>>
>>> Cheers,
>>> Daniel
>>>
>>> :/home/dd# java -version
>>> java version "1.6.0_24"
>>> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
>>> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>>>
>>> :/home/dd# ps aux |grep java
>>> cass 28201  9.5 46.8 372659544 7707172 ?   SLl  May24 5656:21
>>> /usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42
>>> -Xms3000M -Xmx3000M -Xmn400M ...
>>>
>>>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
>>>
>>>
>>> 28201 cass  20   0  355g 7.4g 1.4g S8 46.9   5656:25 java
>>>
>>>
>>>
>>>
>>
>>
>
>


Re: How to scale Cassandra?

2011-07-04 Thread ZFabrik Subscriber
Let's assume you have 50 nodes and their work-load grows simultaneously. You 
discover that the nodes are about to reach their limits (btw. what is the 
actual limit of a Cassandra node? 100GB? 500GB? 1TB?) 
You decide to add another 50 nodes. Do you do this within one step? Or one 
after the other? Or in several rounds, always every RF-th node?
Or you add 20 nodes and move the token ranges. Again, in one step? 20 steps? 4 
steps 5 nodes each?
This could take a while (in terms of days, if not weeks) in larger clusters!

Does anybody have experience with real-life scale-outs?

Regards
Udo

Am 04.07.2011 um 16:21 schrieb Paul Loy:

> Well, by issuing a nodetool move when a node is under high load, you 
> basically make that node unresponsive. That's fine, but a nodetool move on 
> one node also means that that node's replica data needs to move around the 
> ring and possibly some replica data from the next (or previous) node in the 
> ring. So how does this affect other nodes wrt RF and quorum? Will quorum fail 
> until the replicas have moved also?
> 
> On Mon, Jul 4, 2011 at 3:08 PM, Dan Hendry  wrote:
> Moving nodes does not result in downtime provided you use proper replication 
> factors and read/write consistencies. The typical recommendation is RF=3 and 
> QUORUM reads/writes.
> 
>  
> 
> Dan
> 
>  
> 
> From: Paul Loy [mailto:ketera...@gmail.com] 
> Sent: July-04-11 5:59
> To: user@cassandra.apache.org
> Subject: Re: How to scale Cassandra?
> 
>  
> 
> That's basically how I understand it.
> 
> However, I think it gets better with larger clusters as the proportion of the 
> ring you move around at any time is much lower.
> 
> On Mon, Jul 4, 2011 at 10:54 AM, Subscriber  wrote:
> 
> Hi there,
> 
> I read a lot of Cassandra's high scalability feature: allowing seamless 
> addition of nodes, no downtime etc.
> But I wonder how one will do this in practice in an operational system.
> 
> In the system we're going to implement we're expecting a huge number of 
> writes with uniformly distributed keys
> (the keys are given and cannot be generated). That means using 
> RandomPartitioner will (more or less) result in
> the same work-load per node as any other OrderPreservingPartitioner - right?
> 
> But how do you scale a (more or less) balanced Cassandra cluster? I think 
> that in the end
> you always have to double the number of nodes (adding just a handful of nodes 
> disburdens only the split regions, the
> work-load of untouched regions will grow with unchanged speed).
> 
> This seems to be ok for small clusters. But what do you do when you have 
> several 100s of nodes in your cluster?
> It seems to me that a balanced cluster is a blessing for performance but a curse 
> for scalability...
> 
> What are the alternatives? One could re-distribute the token ranges, but this 
> would cause
> downtimes (AFAIK); not an option!
> 
> Is there anything that I didn't understand or do I miss something else? Is 
> the only left strategy to make sure that
> the cluster grows unbalanced so one can add nodes to the hotspots? However in 
> this case you have to make sure
> that this strategy is lasting. Could be too optimistic...
> 
> Best Regards
> Udo
> 
> 
> 
> 
> -- 
> -
> Paul Loy
> p...@keteracel.com
> http://uk.linkedin.com/in/paulloy
> 
> 
> 
> 
> 
> -- 
> -
> Paul Loy
> p...@keteracel.com
> http://uk.linkedin.com/in/paulloy



Re: How to scale Cassandra?

2011-07-04 Thread Sebastien Coutu
Hi Udo,

I didn't read the whole thread but can you define the type of workload
you're looking at? Do you have jobs that require reading all the data
stored in your database? For example, one big column family that needs to be
read entirely by a job? Because the amount of time required to read a whole
disk (SATA II) of 1TB is roughly 2-2.5 hours. Now add RAID to this and you
can modify this amount of time but your bottleneck will pretty much always
be your disks. On our cluster, we currently have more than 1TB per node and
it holds but we find that our sweet spot should be around 400-500GB per
node.
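The "roughly 2-2.5 hours" figure above is consistent with a back-of-the-envelope calculation; the 110-140 MB/s sustained sequential throughput used below is an assumed typical SATA II range, not a measurement from this cluster:

```python
# Rough check: how long a full sequential read of a 1 TB disk takes at
# typical SATA II sustained throughput (assumed 110-140 MB/s).
DISK_MB = 1_000_000  # 1 TB expressed in MB

def read_hours(rate_mb_per_s):
    return DISK_MB / rate_mb_per_s / 3600.0

print(round(read_hours(140), 1), round(read_hours(110), 1))  # 2.0 2.5
```

This is why per-node data volume, not just total cluster size, bounds operations like compaction, repair, and full-CF scans.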

Regards,

SC


On Mon, Jul 4, 2011 at 3:01 PM, ZFabrik Subscriber wrote:

> Let's assume you have 50 nodes and their work-load grows simultaneously.
> You discover that the nodes are about to reach their limits (btw. what is
> the actual limit of a Cassandra node? 100GB? 500GB? 1TB?)
> You decide to add another 50 nodes. Do you do this within one step? Or one
> after the other? Or in several rounds, always every RF-th node?
> Or you add 20 nodes and move the token ranges. Again, in one step? 20
> steps? 4 steps 5 nodes each?
> This could take a while (in terms of days, if not weeks) in larger
> clusters!
>
> Does anybody have experience with real-life scale-outs?
>
> Regards
> Udo
>
> Am 04.07.2011 um 16:21 schrieb Paul Loy:
>
> Well, by issuing a nodetool move when a node is under high load, you
> basically make that node unresponsive. That's fine, but a nodetool move on
> one node also means that that node's replica data needs to move around the
> ring and possibly some replica data from the next (or previous) node in the
> ring. So how does this affect other nodes wrt RF and quorum? Will quorum
> fail until the replicas have moved also?
>
> On Mon, Jul 4, 2011 at 3:08 PM, Dan Hendry wrote:
>
>> Moving nodes does not result in downtime provided you use proper
>> replication factors and read/write consistencies. The typical recommendation
>> is RF=3 and QUORUM reads/writes.
>>
>>
>> Dan
>>
>>
>> *From:* Paul Loy [mailto:ketera...@gmail.com]
>> *Sent:* July-04-11 5:59
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: How to scale Cassandra?
>>
>>
>> That's basically how I understand it.
>>
>> However, I think it gets better with larger clusters as the proportion of
>> the ring you move around at any time is much lower.
>>
>> On Mon, Jul 4, 2011 at 10:54 AM, Subscriber 
>> wrote:
>>
>> Hi there,
>>
>> I read a lot of Cassandra's high scalability feature: allowing seamless
>> addition of nodes, no downtime etc.
>> But I wonder how one will do this in practice in an operational system.
>>
>> In the system we're going to implement we're expecting a huge number of
>> writes with uniformly distributed keys
>> (the keys are given and cannot be generated). That means using
>> RandomPartitioner will (more or less) result in
>> the same work-load per node as any other OrderPreservingPartitioner - right?
>>
>> But how do you scale a (more or less) balanced Cassandra cluster? I think
>> that in the end
>> you always have to double the number of nodes (adding just a handful of
>> nodes disburdens only the split regions, the
>> work-load of untouched regions will grow with unchanged speed).
>>
>> This seems to be ok for small clusters. But what do you do when you
>> have several 100s of nodes in your cluster?
>> It seems to me that a balanced cluster is a blessing for performance but a
>> curse for scalability...
>>
>> What are the alternatives? One could re-distribute the token ranges, but
>> this would cause
>> downtimes (AFAIK); not an option!
>>
>> Is there anything that I didn't understand or do I miss something else? Is
>> the only left strategy to make sure that
>> the cluster grows unbalanced so one can add nodes to the hotspots? However
>> in this case you have to make sure
>> that this strategy is lasting. Could be too optimistic...
>>
>> Best Regards
>> Udo
>>
>>
>>
>>
>> --
>> -
>> Paul Loy
>> p...@keteracel.com
>> http://uk.linkedin.com/in/paulloy
>>
>>
>
>
>
> --
> -
> Paul Loy
> p...@keteracel.com
> http://uk.linkedin.com/in/paulloy
>
>
>


Re: How to scale Cassandra?

2011-07-04 Thread ZFabrik Subscriber
Hi SC, 

I'm just talking about workload in general. The point is that sooner or later 
you reach the point where you need to scale out. The question is: what's the 
best strategy here, especially when your cluster is almost balanced?

500 GB seems to be a good ball-park figure, I think I read this number 
somewhere else recently.

Regards
Udo

Am 04.07.2011 um 21:10 schrieb Sebastien Coutu:

> Hi Udo,
> 
> I didn't read the whole thread but can you define the type of workload you're 
> looking at? Do you have jobs that require reading the whole data stored in 
> your database? For example one big column family that needs to be read 
> entirely by a job? Because the amount of time required to read a whole disk 
> (SATA II) of 1TB is roughly 2-2.5 hours. Now add RAID to this and you can 
> modify this amount of time but your bottleneck will pretty much always be 
> your disks. On our cluster, we currently have more than 1TB per node and it 
> holds but we find that our sweet spot should be around 400-500GB per node.
> 
> Regards,
> 
> SC
> 
> 
> On Mon, Jul 4, 2011 at 3:01 PM, ZFabrik Subscriber  
> wrote:
> Let's assume you have 50 nodes and their work-load grows simultaneously. You 
> discover that the nodes are about to reach their limits (btw. what is the 
> actual limit of a Cassandra node? 100GB? 500GB? 1TB?) 
> You decide to add another 50 nodes. Do you do this within one step? Or one 
> after the other? Or in several rounds, always every RF-th node?
> Or you add 20 nodes and move the token ranges. Again, in one step? 20 steps? 
> 4 steps 5 nodes each?
> This could take a while (in terms of days, if not weeks) in larger clusters!
> 
> Does anybody have experience with real-life scale-outs?
> 
> Regards
> Udo
> 
> Am 04.07.2011 um 16:21 schrieb Paul Loy:
> 
>> Well, by issuing a nodetool move when a node is under high load, you 
>> basically make that node unresponsive. That's fine, but a nodetool move on 
>> one node also means that that node's replica data needs to move around the 
>> ring and possibly some replica data from the next (or previous) node in the 
>> ring. So how does this affect other nodes wrt RF and quorum? Will quorum 
>> fail until the replicas have moved also?
>> 
>> On Mon, Jul 4, 2011 at 3:08 PM, Dan Hendry  wrote:
>> Moving nodes does not result in downtime provided you use proper replication 
>> factors and read/write consistencies. The typical recommendation is RF=3 and 
>> QUORUM reads/writes.
>> 
>>  
>> 
>> Dan
>> 
>>  
>> 
>> From: Paul Loy [mailto:ketera...@gmail.com] 
>> Sent: July-04-11 5:59
>> To: user@cassandra.apache.org
>> Subject: Re: How to scale Cassandra?
>> 
>>  
>> 
>> That's basically how I understand it.
>> 
>> However, I think it gets better with larger clusters as the proportion of 
>> the ring you move around at any time is much lower.
>> 
>> On Mon, Jul 4, 2011 at 10:54 AM, Subscriber  wrote:
>> 
>> Hi there,
>> 
>> I read a lot of Cassandra's high scalability feature: allowing seamless 
>> addition of nodes, no downtime etc.
>> But I wonder how one will do this in practice in an operational system.
>> 
>> In the system we're going to implement we're expecting a huge number of 
>> writes with uniformly distributed keys
>> (the keys are given and cannot be generated). That means using 
>> RandomPartitioner will (more or less) result in
>> the same work-load per node as any other OrderPreservingPartitioner - right?
>> 
>> But how do you scale a (more or less) balanced Cassandra cluster? I think 
>> that in the end
>> you always have to double the number of nodes (adding just a handful of 
>> nodes disburdens only the split regions, the
>> work-load of untouched regions will grow with unchanged speed).
>> 
>> This seems to be ok for small clusters. But what do you do when you 
>> have several 100s of nodes in your cluster?
>> It seems to me that a balanced cluster is a blessing for performance but a 
>> curse for scalability...
>> 
>> What are the alternatives? One could re-distribute the token ranges, but 
>> this would cause
>> downtimes (AFAIK); not an option!
>> 
>> Is there anything that I didn't understand or do I miss something else? Is 
>> the only left strategy to make sure that
>> the cluster grows unbalanced so one can add nodes to the hotspots? However 
>> in this case you have to make sure
>> that this strategy is lasting. Could be too optimistic...
>> 
>> Best Regards
>> Udo
>> 
>> 
>> 
>> 
>> -- 
>> -
>> Paul Loy
>> p...@keteracel.com
>> http://uk.linkedin.com/in/paulloy
>> 
>> 
>> 
>> 
>> 
>> -- 
>> -
>> Paul Loy
>> p...@keteracel.com
>> http://uk.linkedin.com/in/paulloy
> 
> 



RowKey in hexadecimal in CLI

2011-07-04 Thread Sébastien Druon
Hello!

Since we installed cassandra 0.8, the RowKeys are displayed in hexadecimal
in the CLI.
Any idea why and how to fix that?

Thanks in advance

Sebastien


Re: RowKey in hexadecimal in CLI

2011-07-04 Thread Jonathan Ellis
Because you haven't declared a key_validation_class.

On Mon, Jul 4, 2011 at 4:19 PM, Sébastien Druon  wrote:
> Hello!
> Since we installed cassandra 0.8, the RowKeys are displayed in hexadecimal
> in the CLI.
> Any idea why and how to fix that?
> Thanks in advance
> Sebastien



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
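A sketch of both ways to act on this in the 0.8 CLI, assuming a column family named Users (a hypothetical name; substitute your own). `assume` only changes how the current CLI session decodes keys for display, while `update column family` records the key type in the schema:

```text
# one-off, per CLI session only:
[default@MyKeyspace] assume Users keys as utf8;

# persistent, stored in the schema:
[default@MyKeyspace] update column family Users with key_validation_class = UTF8Type;
```

With either in place, the CLI displays row keys as UTF-8 strings instead of raw hex.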


Re: secondary index performance

2011-07-04 Thread aaron morton
> Is the assumption that rows/keys cached is inherited correct?  Is there any 
> way to see cfstats on secondary index sub-column families?

They are inherited, but AFAIK only at the time the secondary index is created. 
You would need to drop and re-create the secondary index to see it change. 

cfstats for secondary index CF's are available via JMX / JConsole. 

Cheers  
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 4 Jul 2011, at 10:12, Jeremy Hanna wrote:

> 
> On Jul 3, 2011, at 4:29 PM, Jeremy Hanna wrote:
> 
>> Anyone know if secondary index performance should be in the 100-500 ms 
>> range.  That's what we're seeing right now when doing lookups on a single 
>> value.  We've increased keys_cached and rows_cached to 100% for that column 
>> family and assume that the secondary index gets the same attributes.  I've 
>> also reduced read_repair_chance to 0.2 because it doesn't get overwritten 
>> very frequently.
>> 
>> Is the assumption that rows/keys cached is inherited correct?  Is there any 
>> way to see cfstats on secondary index sub-column families?
> 
> the answer appears to be no and no.
> 
> Trying some other stuff with tools mentioned here: 
> http://spyced.blogspot.com/2010/01/linux-performance-basics.html but not 
> seeing anything particularly disk bound, though await (from iostat -x) seems 
> high on one of the devices.
> 
> One of our guys said he pointed at our realtime nodes (instead of analytic 
> nodes) but said the performance was worse.  Granted our analytic nodes are 
> m4.xl and our realtime nodes are currently large, but still with no load on 
> them, it should be quite fast I would think.
> 
>> 
>> Thanks,
>> 
>> Jeremy
> 



Re: flushing issue

2011-07-04 Thread aaron morton
When you say using CassandraServer, do you mean an embedded Cassandra server? 
What process did you use to add the keyspaces? Adding a KS via the Thrift API 
should take care of everything.

The simple test is: stop the server and the clients, start the server again, and 
see if the KS is defined using nodetool cfstats. 

Cheers 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 4 Jul 2011, at 22:28, Vivek Mishra wrote:

> Hi,
> I know, I might be missing something here.
> I am currently facing 1 issue.
>  
> I have 2 cassandra clients(1. Using CassandraServer 2. Using 
> Cassandra.Client) running connecting to same host.
>  
> I have created Keyspace K1, K2 using client1(e.g. CassandraServer), but 
> somehow those keyspaces are not available with Client2(e.g. Cassandra.Client).
>  
> I have also tried flushing via StorageService.instance.ForceFlush on the tables. 
> But that also didn’t work.
>  
>  
>  
> Any help/Suggestion?
>  
> 
> 



Re: copy data from multi-node cluster to single node

2011-07-04 Thread aaron morton
> How do you change the name of a cluster?  The FAQ instructions do not seem to 
> work for me - are they still valid for 0.7.5?
> Is the backup / restore mechanism going to work, or is there a better/simpler 
> way to copy data from multi-node to single-node?

Bug fixed on 0.7.6 
https://github.com/apache/cassandra/blob/cassandra-0.7.6-2/CHANGES.txt#L21

Also you should move to 0.7.6 to get the Gossip fix 
https://github.com/apache/cassandra/blob/cassandra-0.7.6-2/CHANGES.txt#L6

When it comes to moving the data back to a single node I would:
- run repair
- snapshot prod node
- clear all data including the system KS data from the dev node
- copy the snapshot data for only your KS to the dev node into the correct 
directory, e.g. data/.
- start the dev node
- add your KS, the node will now load the data

Ignoring the system data means the dev node can sort out its cluster name and 
token using the yaml file. 

Even with 3 nodes and RF 3 it's impossible to ever say that one node has a 
complete copy of the data. Running repair will make it more likely, but the 
node could drop a mutation message during the repair or drop off gossip for a 
few seconds. If you really want to have *everything* from the prod cluster then 
copy the data from all 3 nodes onto the dev node and compact it down. 
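The steps above might look roughly like this (a sketch only: the host name, keyspace name, snapshot tag, JMX port, and data paths are all assumptions; adjust to your cassandra.yaml and the 0.7 directory layout):

```sh
# 1. On a prod node: repair, then snapshot (JMX port assumed to be 8080)
nodetool -h prod1 -p 8080 repair
nodetool -h prod1 -p 8080 snapshot

# 2. On the dev node, with Cassandra stopped: clear ALL data, including the
#    system keyspace, and the commitlog
rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/*

# 3. Copy only your keyspace's snapshot SSTables into the dev data directory
#    ("MyKeyspace" and the snapshot tag are placeholders)
scp 'prod1:/var/lib/cassandra/data/MyKeyspace/snapshots/<tag>/*' \
    /var/lib/cassandra/data/MyKeyspace/

# 4. Start the dev node and re-create the keyspace via the API/CLI; the node
#    then loads the copied SSTables.
```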

Hope that helps. 
  
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 5 Jul 2011, at 03:05, Ross Black wrote:

> Hi,
> 
> I am using Cassandra 0.7.5 on Linux machines.
> 
> I am trying to backup data from a multi-node cluster (3 nodes) and restore it 
> into a single node cluster that has a different name (for development 
> testing).
> 
> The multi-node cluster is backed up using clustertool global_snapshot, and 
> then I copy the snapshot from a single node and replace the data directory in 
> the single node.
> The multi-node cluster has a replication factor of 3, so I assume that 
> restoring from any single node of the multi-node cluster will be the same.
> When started up, this fails with a cluster name mismatch.
> 
> I have tried removing all the Location* files in the data directory (as per 
> http://wiki.apache.org/cassandra/FAQ#clustername_mismatch) but the single 
> node then fails with an error message:
> org.apache.cassandra.config.ConfigurationException: Found system table files, 
> but they couldn't be loaded. Did you change the partitioner?
> 
> 
> How do you change the name of a cluster?  The FAQ instructions do not seem to 
> work for me - are they still valid for 0.7.5?
> Is the backup / restore mechanism going to work, or is there a better/simpler 
> way to copy data from multi-node to single-node?
> 
> Thanks,
> Ross
> 



Re: copy data from multi-node cluster to single node

2011-07-04 Thread Zhu Han
On Tue, Jul 5, 2011 at 8:58 AM, aaron morton wrote:

> How do you change the name of a cluster?  The FAQ instructions do not seem
> to work for me - are they still valid for 0.7.5?
> Is the backup / restore mechanism going to work, or is there a
> better/simpler way to copy data from multi-node to single-node?
>
>
> Bug fixed on 0.7.6
> https://github.com/apache/cassandra/blob/cassandra-0.7.6-2/CHANGES.txt#L21
>
>
> Also you should move to 0.7.6 to get the Gossip fix
> https://github.com/apache/cassandra/blob/cassandra-0.7.6-2/CHANGES.txt#L6
>
> When it comes to moving the data back to a single node I would:
> - run repair
> - snapshot prod node
> - clear all data including the system KS data from the dev node
> - copy the snapshot data for only your KS to the dev node into the correct
> directory, e.g. data/ .
> - start the dev node
> - add your KS, the node will now load the data
>
> Ignoring the system data means the dev node can sort out its cluster name and
> token using the yaml file.
>
> Even with 3 nodes and RF 3 it's impossible to ever say that one node has a
> complete copy of the data. Running repair will make it more likely, but the
> node could drop a mutation message during the repair or drop off gossip for
> a few seconds. If you really want to have *everything* from the prod cluster
> then copy the data from all 3 nodes onto the dev node and compact it down.
>

Is it possible that the snapshot files from the different nodes have the same names?


>
> Hope that helps.
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5 Jul 2011, at 03:05, Ross Black wrote:
>
> Hi,
>
> I am using Cassandra 0.7.5 on Linux machines.
>
> I am trying to backup data from a multi-node cluster (3 nodes) and restore
> it into a single node cluster that has a different name (for development
> testing).
>
> The multi-node cluster is backed up using clustertool global_snapshot, and
> then I copy the snapshot from a single node and replace the data directory
> in the single node.
> The multi-node cluster has a replication factor of 3, so I assume that
> restoring any node from the multi-node cluster will be the same.
> When started up this fails with a node name mismatch.
>
> I have tried removing all the Location* files in the data directory (as per
> http://wiki.apache.org/cassandra/FAQ#clustername_mismatch) but the single
> node then fails with an error message:
> org.apache.cassandra.config.ConfigurationException: Found system table
> files, but they couldn't be loaded. Did you change the partitioner?
>
>
> How do you change the name of a cluster?  The FAQ instructions do not seem
> to work for me - are they still valid for 0.7.5?
> Is the backup / restore mechanism going to work, or is there a
> better/simpler way to copy data from multi-node to single-node?
>
> Thanks,
> Ross
>
>
>


connection issue

2011-07-04 Thread Aayush Jain
Hi,
When using multithreading with the Cassandra Query Language, I have to create a 
connection for each thread; a single connection object shared across the whole 
thread pool does not work. I am using JDBC for connectivity.

I know I may be missing something.

Any help/suggestions?
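For what it's worth, java.sql.Connection implementations are generally not thread-safe, so one connection per thread is expected rather than a bug. A common pattern is a ThreadLocal; here is a minimal sketch (the driver class name and JDBC URL are assumptions based on the cassandra-jdbc project, and "MyKeyspace" is a placeholder; verify against your driver version):

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class PerThreadConnection {
    // One Connection per thread: sharing a single java.sql.Connection across
    // a thread pool is not safe and typically fails under concurrency.
    private static final ThreadLocal<Connection> CONN = new ThreadLocal<Connection>() {
        @Override
        protected Connection initialValue() {
            try {
                // Driver class and URL are assumptions (cassandra-jdbc);
                // adjust host, port, and keyspace for your cluster.
                Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
                return DriverManager.getConnection(
                        "jdbc:cassandra://localhost:9160/MyKeyspace");
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    };

    public static Connection get() {
        return CONN.get();
    }
}
```

Each worker thread calls PerThreadConnection.get() and reuses its own connection; remember to close it when the thread finishes.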





RE: RowKey in hexadecimal in CLI

2011-07-04 Thread Aayush Jain
As of Cassandra 0.8, we need to declare a key_validation_class for the column 
family:

For example:
 update column family User with key_validation_class=UTF8Type;
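If you only want to change how the CLI displays keys for the current session, without altering the schema, the CLI also has an assume command; a sketch, assuming the 0.8 cassandra-cli syntax:

```
assume User keys as utf8;
```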

From: Sébastien Druon [mailto:sdr...@spotuse.com]
Sent: 05 July 2011 02:50
To: user@cassandra.apache.org
Subject: RowKey in hexadecimal in CLI

Hello!

Since we installed Cassandra 0.8, the row keys are displayed in hexadecimal in 
the CLI.
Any idea why and how to fix that?

Thanks in advance

Sebastien


