Re: Cassandra freezes under load when using libc6 2.11.1-0ubuntu7.5

2011-01-17 Thread Erik Onnen
Unfortunately, the previous AMI we used to provision the 7.5 version is no longer available. More unfortunately, the two test nodes we spun up in each AZ did not get Nehalem architectures so the only things I can say for certain after running Mike's test 10x on each test node are: 1) I could not r

Re: about the consistency level

2011-01-17 Thread aaron morton
The ConsistencyLevel is passed with each read and write command. How you set it will depend on the client you are using. Which one are you using? Aaron On 17/01/2011, at 8:50 PM, raoyixuan (Shandy) wrote: > How to set the consistency level in Cassandra 0.7? I mean what command? > > > 华为技

RE: about the consistency level

2011-01-17 Thread raoyixuan (Shandy)
Both Hector and cassandra-cli. Can you tell me for each, respectively? Thanks a lot. From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Monday, January 17, 2011 4:17 PM To: user@cassandra.apache.org Subject: Re: about the consistency level The ConsistencyLevel is passed with each read and write com

Re: about the consistency level

2011-01-17 Thread aaron morton
The cassandra-cli works at CL.ONE; currently it cannot be changed. I'm not sure if there is a reason for this, but if it's a feature you would like, add a request to JIRA https://issues.apache.org/jira/browse/CASSANDRA In Hector it's part of the m.p.h.api.Keyspace interface as setConsistencyLe

RE: about the consistency level

2011-01-17 Thread raoyixuan (Shandy)
Thanks a lot. From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Monday, January 17, 2011 5:01 PM To: user@cassandra.apache.org Subject: Re: about the consistency level The cassandra-cli works at CL.ONE; currently it cannot be changed. I'm not sure if there is a reason for this, but if i

Between Clause

2011-01-17 Thread kh jo
What is the best way to model a query with between clause.. given that you have a large number of entries... thanks Jo

Re: Between Clause

2011-01-17 Thread aaron morton
Can you provide some more information ? Aaron On 17/01/2011, at 11:55 PM, kh jo wrote: > What is the best way to model a query with between clause.. given that you > have a large number of entries... > > thanks > Jo > >

Re: Between Clause

2011-01-17 Thread Donal Zang
On 17/01/2011 11:55, kh jo wrote: What is the best way to model a query with between clause.. given that you have a large number of entries... thanks Jo In my experience, for the row based 'between clause' with a random partition, you should design the column family carefully, so that you

Re: Between Clause

2011-01-17 Thread kh jo
example: finding country from IP address Mysql: I have table with 140,000 rows each with ipNumStart, IpNumEnd, Country so to find the country I use: WHERE ipNum BETWEEN ipNumStart AND ipNumEnd ipNumStart   ipNumEnd    Country 16777216 17301503  Australia 18939904 19005439   
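
A rough sketch of one way to think about the IP example above, not a Cassandra client call: if the ranges do not overlap and are kept sorted by ipNumStart, the BETWEEN test becomes "find the greatest start that is <= ipNum, then check the end". In Cassandra terms that would roughly correspond to a single ordered lookup with a count of 1, which is an assumption about the data model rather than something stated in the thread. The first row is taken from the example; the second row and the names RANGES, STARTS and country_for are made up for illustration.

```python
from bisect import bisect_right

# (ipNumStart, ipNumEnd, country), sorted by ipNumStart.
# Row 1 is from the example in the thread; row 2 is a hypothetical placeholder.
RANGES = [
    (16777216, 17301503, "Australia"),
    (20000000, 20999999, "Hypothetical-Land"),
]
STARTS = [start for start, _end, _country in RANGES]


def country_for(ip_num):
    """Return the country whose range contains ip_num, or None."""
    # Index of the last range whose start is <= ip_num.
    i = bisect_right(STARTS, ip_num) - 1
    if i < 0:
        return None  # ip_num is below every known range
    _start, end, country = RANGES[i]
    # The start matched; the range only applies if ip_num is also <= end.
    return country if ip_num <= end else None
```

The point of the design is that only one ordered lookup is needed per query, so the cost does not grow with a scan over all 140,000 rows.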

Re: Between Clause

2011-01-17 Thread kh jo
another example:  generating visit statistics given that start and end date are dynamic --- On Mon, 1/17/11, kh jo wrote: From: kh jo Subject: Re: Between Clause To: user@cassandra.apache.org Date: Monday, January 17, 2011, 12:40 PM example: finding country from IP address Mysql: I have tab

Re: balancing load

2011-01-17 Thread Edward Capriolo
On Mon, Jan 17, 2011 at 2:44 AM, aaron morton wrote: > The nodes will not automatically delete stale data, to do that you need to > run nodetool cleanup. > > See step 3 in the Range Changes > Bootstrap > http://wiki.apache.org/cassandra/Operations#Range_changes > > If you are feeling paranoid be

Re: balancing load

2011-01-17 Thread Peter Schuller
> Just to head off the next possible problem. If you run 'nodetool cleanup' > on each node and some of your nodes still have more data than others, > then it probably means you are writing the majority of data to a few > keys. ( you probably do not want to do that ) It may also be that a compact is n

Re: Cassandra-Maven-Plugin

2011-01-17 Thread Stephen Connolly
https://issues.apache.org/jira/browse/CASSANDRA-1997 On 16 January 2011 19:59, Stephen Connolly wrote: > it will be an attachment to an as yet un raised jira. look out for it > tomorrow/tuesday > > - Stephen > > --- > Sent from my Android phone, so random spelling mistakes, random nonsense > word

quorum calculation seems to depend on previous selected nodes

2011-01-17 Thread Samuel Benz
Dear List, I found strange behavior on our mini cluster during an update with consistency level QUORUM. We have a cluster with 4 nodes. ReplicationFactor is 2, ReplicaPlacement is the RackAwareStrategy and the EndpointSnitch is the PropertyFileEndpointSnitch (with two data centers and two racks each)

Re: balancing load

2011-01-17 Thread Edward Capriolo
On Mon, Jan 17, 2011 at 10:51 AM, Peter Schuller wrote: >> Just to head off the next possible problem. If you run 'nodetool cleanup' >> on each node and some of your nodes still have more data than others, >> then it probably means you are writing the majority of data to a few >> keys. ( you probably

Re: quorum calculation seems to depend on previous selected nodes

2011-01-17 Thread Jonathan Ellis
On Mon, Jan 17, 2011 at 9:55 AM, Samuel Benz wrote: > We have a cluster with 4 nodes. ReplicationFactor is 2, ReplicaPlacement > is the RackAwareStrategy and the EndpointSnitch is the > PropertyFileEndpointSnitch (with two data centers and two racks each). > > My understanding is, that with this par

Cassandra GC Settings

2011-01-17 Thread Dan Hendry
I am having some reliability problems in my Cassandra cluster which I am almost certain is due to GC. I was about to start delving into the guts of the problem by turning on GC logging but I have never done any serious java GC tuning before (time to learn I guess). As a first step however, I was ho

Internal error when using SimpleSnitch and dynamic_snitch: true

2011-01-17 Thread Jim Ancona
We accidentally configured our cluster with SimpleSnitch (instead of PropertyFileSnitch) and dynamic_snitch: true. This is with version 0.7.0. We saw the errors below on get_slice and batch_mutate calls. The errors went away when we switched to PropertyFileSnitch. Should dynamic_snitch work with Si

Re: balancing load

2011-01-17 Thread Peter Schuller
> @Peter Isn't cleanup a special case of compaction? IE it works as a > major compaction + removes data not belonging to the node? Yes, sorry. Brain lapse. Ignore me. -- / Peter Schuller

Re: Cassandra GC Settings

2011-01-17 Thread SriSatish Ambati
Dan, Please kindly attach your: 1) java -version 2) full commandline settings, heap sizes. 3) gc log from one of the nodes via: -XX:+PrintTenuringDistribution \ -XX:+PrintGCDetails \ -XX:+PrintGCTimeStamps \ -Xloggc:/var/log/cassandra/gc.log \ 4) number of cores on your system. How busy is the s

Re: quorum calculation seems to depend on previous selected nodes

2011-01-17 Thread David Boxenhorn
I think you should just tell everybody that if you want to use QUORUM you need RF >= 3 for it to be meaningful. No one would use QUORUM with RF < 3 except in error. On Mon, Jan 17, 2011 at 6:08 PM, Jonathan Ellis wrote: > On Mon, Jan 17, 2011 at 9:55 AM, Samuel Benz > wrote: > > We have a clus
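
A small worked example of the arithmetic behind the RF >= 3 remark, assuming the usual quorum definition of floor(RF / 2) + 1 replicas. The function names are made up for illustration.

```python
def quorum(replication_factor):
    """Replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor):
    """Replicas that can be down while QUORUM still succeeds."""
    return replication_factor - quorum(replication_factor)


# Print the quorum size and failure tolerance for a few replication factors.
for rf in (1, 2, 3, 5):
    print(rf, quorum(rf), tolerated_failures(rf))
```

With RF=2 the quorum is 2, so QUORUM behaves like ALL and a single unavailable replica makes the operation fail. With RF=3 the quorum is still 2, which is the first setting where one replica can be down.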

Re: Cassandra GC Settings

2011-01-17 Thread Peter Schuller
> very quickly from the young generation to the old generation". Furthermore, > the CMSInitiatingOccupancyFraction of 75 (from a JVM default of 68) means > "start gc in the old generation later", presumably to allow Cassandra to use > more of the old generation heap without needlessly trying to fre

Re: Cassandra in less than 1G of memory?

2011-01-17 Thread Victor Kabdebon
Peter: What do you recommend? Using Aaron Morton's solution and using JNA, or just disabling mmap? (Or is it the same and I missed something?) Thank you all for your advice. I am surprised to be the only one to have this problem even though I'm using a pretty standard distribution. Best regards, Victo

Re: Cassandra in less than 1G of memory?

2011-01-17 Thread Peter Schuller
> Peter: What do you recommend? Using Aaron Morton's solution and using JNA, or > just disabling mmap? (Or is it the same and I missed something?) I suggested disabling mmap() just to give you confidence in what the actual memory usage is, without it being "seemingly" higher than it is due to mmap()

Re: quorum calculation seems to depend on previous selected nodes

2011-01-17 Thread Peter Schuller
> I think you should just tell everybody that if you want to use QUORUM you > need RF >= 3 for it to be meaningful. > > No one would use QUORUM with RF < 3 except in error. Well, strictly speaking you could have an application designed to talk to Cassandra at QUORUM and an operator may choose to d

Re: balancing load

2011-01-17 Thread Karl Hiramoto
On 01/17/11 15:54, Edward Capriolo wrote: > Just to head off the next possible problem. If you run 'nodetool cleanup' > on each node and some of your nodes still have more data than others, > then it probably means you are writing the majority of data to a few > keys. ( you probably do not want to do

Re: balancing load

2011-01-17 Thread Edward Capriolo
On Mon, Jan 17, 2011 at 1:20 PM, Karl Hiramoto wrote: > On 01/17/11 15:54, Edward Capriolo wrote: >> Just to head off the next possible problem. If you run 'nodetool cleanup' >> on each node and some of your nodes still have more data than others, >> then it probably means you are writing the majorit

Re: Internal error when using SimpleSnitch and dynamic_snitch: true

2011-01-17 Thread Jonathan Ellis
Already fixed for 0.7.1 in CASSANDRA-1530. On Mon, Jan 17, 2011 at 11:29 AM, Jim Ancona wrote: > We accidentally configured our cluster with SimpleSnitch (instead of > PropertyFileSnitch) and dynamic_snitch: true. This is with version > 0.7.0. > > We saw the errors below on get_slice and batch_muta

Re: Cassandra GC Settings

2011-01-17 Thread Jonathan Ellis
On Mon, Jan 17, 2011 at 11:58 AM, Peter Schuller wrote: > 45 seconds is pretty significant even for a 12 gig heap Note that you really need to uncomment the -XX:PrintGC* arguments to get a detailed GC log from the jvm before taking guesses at this; the numbers GCInspector can get are NOT pause ti

Re: quorum calculation seems to depend on previous selected nodes

2011-01-17 Thread Jonathan Ellis
Adding CL.TWO would be easy enough. :) On Mon, Jan 17, 2011 at 12:12 PM, Peter Schuller wrote: >> I think you should just tell everybody that if you want to use QUORUM you >> need RF >= 3 for it to be meaningful. >> >> No one would use QUORUM with RF < 3 except in error. > > Well, strictly speaki

Re: quorum calculation seems to depend on previous selected nodes

2011-01-17 Thread Peter Schuller
> Adding CL.TWO would be easy enough. :) True, but the obvious generalization is to be able to select an arbitrary replica count and that seemed like a bigger change to the API. But if CL.TWO would be considered clean enough... I may submit a jira/patch. -- / Peter Schuller

Super CF or two CFs?

2011-01-17 Thread Steven Mac
How can I best map an object containing two maps, one of which is updated very frequently and the other only occasionally? a) As one super CF, with each map in a separate supercolumn and the map entries being the subcolumns? b) As two CFs, one for each map. I'd like to discuss the why behind

Re: quorum calculation seems to depend on previous selected nodes

2011-01-17 Thread Samuel Benz
On 01/17/2011 05:08 PM, Jonathan Ellis wrote: > On Mon, Jan 17, 2011 at 9:55 AM, Samuel Benz wrote: >> We have a cluster with 4 nodes. ReplicationFactor is 2, ReplicaPlacement >> is the RackAwareStrategy and the EndpointSnitch is the >> PropertyFileEndpointSnitch (with two data centers and two racks

Re: about the consistency level

2011-01-17 Thread Aaron Morton
Have you added a Jira for this? Or does anyone else want or not want this feature ? I'll try to add it as practice. Aaron On 17/01/2011, at 10:15 PM, "raoyixuan (Shandy)" wrote: > Thanks a lot. > > From: aaron morton [mailto:aa...@thelastpickle.com] > Sent: Monday, January 17, 2011 5:01 PM >

Re: Super CF or two CFs?

2011-01-17 Thread Dave Viner
can you give an example of the data and how you'd access it? what would your expected columns (and/or supercolumns) be? Dave Viner On Mon, Jan 17, 2011 at 11:05 AM, Steven Mac wrote: > How can I best map an object containing two maps, one of which is updated > very frequently and the other onl

Re: quorum calculation seems to depend on previous selected nodes

2011-01-17 Thread Jonathan Ellis
On Mon, Jan 17, 2011 at 2:10 PM, Samuel Benz wrote: >>> Case1: >>> If 'TEST' was previous stored on Node1, Node2, Node3 -> The update will >>> succeed. >>> >>> Case2: >>> If 'TEST' was previous stored on Node2, Node3, Node4 -> The update will >>> not work. >> >> If you have RF=2 then it will be st

Re: Cassandra GC Settings

2011-01-17 Thread Dan Hendry
Thanks for all the info, I think I have been able to sort out my issue. The new settings I am using are: -Xmn512M (Very important I think) -XX:SurvivorRatio=5 (Not very important I think) -XX:MaxTenuringThreshold=5 -XX:ParallelGCThreads=8 -XX:CMSInitiatingOccupancyFraction=75 Since applying these
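
As a back-of-the-envelope check on the -Xmn512M and -XX:SurvivorRatio=5 settings above: on HotSpot the young generation is split into eden plus two survivor spaces, and SurvivorRatio is the ratio of eden to one survivor space. The sketch below only does that arithmetic; the JVM rounds the actual sizes to its own alignment, so treat the numbers as approximate. The function name is made up for illustration.

```python
def young_gen_layout(xmn_mb, survivor_ratio):
    """Approximate (eden_mb, survivor_mb) for a fixed-size young generation.

    Assumes young = eden + 2 * survivor and eden / survivor = survivor_ratio.
    """
    survivor_mb = xmn_mb / (survivor_ratio + 2)
    eden_mb = survivor_mb * survivor_ratio
    return eden_mb, survivor_mb


eden, survivor = young_gen_layout(512, 5)
print(f"eden ~ {eden:.1f} MB, each survivor space ~ {survivor:.1f} MB")
```

For 512 MB and a ratio of 5 this works out to roughly 366 MB of eden and about 73 MB per survivor space.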

Re: Do you have a site in production environment with Cassandra? What client do you use?

2011-01-17 Thread Colin Vipurs
Java + Pelops On Sat, Jan 15, 2011 at 10:58 PM, Dave Viner wrote: > Perl using the thrift interface directly. > On Sat, Jan 15, 2011 at 6:10 AM, Daniel Lundin wrote: >> >> python + pycassa >> scala + Hector >> >> On Fri, Jan 14, 2011 at 6:24 PM, Ertio Lew wrote: >> > Hey, >> > >> > If you have

Re: Cassandra GC Settings

2011-01-17 Thread SriSatish Ambati
Thanks, Dan: Yes, -Xmn512MB/1G sizes the Young Generation explicitly and takes the adaptive resizing out of the picture. (If at all possible send your gc log over & we can analyze the promotion failure a little bit more finely.) The low load implies that you are able to use the parallel thr

Re: Cassandra GC Settings

2011-01-17 Thread Peter Schuller
> Now, a full stop of the application was what I was seeing extensively before > (100-200 times over the course of a major compaction as reported by > gossipers on other nodes). I have also just noticed that the previous > instability (ie application stops) correlated with the compaction of a few >

RE: Super CF or two CFs?

2011-01-17 Thread Steven Mac
Sure, consider stock data, where the stock symbol is the row key. The stock data consists of a rather stable part and a very volatile part, both of which would be a super column. The stable super column would contain subcolumns such as company name, address, and some annual or quarterly data. T

Re: Super CF or two CFs?

2011-01-17 Thread Stephen Connolly
On 17 January 2011 22:36, Steven Mac wrote: > Sure, consider stock data, where the stock symbol is the row key. The stock > data consists of a rather stable part and a very volatile part, both of > which would be a super column. The stable super column would contain > subcolumns such as company na

RE: Super CF or two CFs?

2011-01-17 Thread Steven Mac
I guess I was maybe trying to simplify the question too much. In reality I do not have one volatile part, but multiple ones (say all trading data of day). Each would be a supercolumn identified by the time slot, with the individual fields as subcolumns. Of course, I could prefix the time slot

Re: Super CF or two CFs?

2011-01-17 Thread Brandon Williams
On Mon, Jan 17, 2011 at 5:12 PM, Steven Mac wrote: > I guess I was maybe trying to simplify the question too much. In reality I > do not have one volatile part, but multiple ones (say all trading data of > day). Each would be a supercolumn identified by the time slot, with the > individual field

please help with multiget

2011-01-17 Thread Shu Zhang
Here's the method declaration for quick reference: map<string,list<ColumnOrSuperColumn>> multiget_slice(string keyspace, list<string> keys, ColumnParent column_parent, SlicePredicate predicate, ConsistencyLevel consistency_level) It looks like you must have the same SlicePredicate for every key in your batch retrieval, so what are you

Re: please help with multiget

2011-01-17 Thread Brandon Williams
On Mon, Jan 17, 2011 at 6:53 PM, Shu Zhang wrote: > Here's the method declaration for quick reference: > map<string,list<ColumnOrSuperColumn>> multiget_slice(string keyspace, > list<string> keys, ColumnParent column_parent, SlicePredicate predicate, > ConsistencyLevel consistency_level) > > It looks like you must have the same SlicePred

Re: please help with multiget

2011-01-17 Thread Aaron Morton
If you can provide some more information on a specific use case we may be able to help with the modelling. The general approach is to denormalise the data to the point where each request/activity/feature in your application results in a call to get data from one or more rows in one CF. It's not alw

What is be the best possible client option available to a PHP developer for implementing an application ready for production environments ?

2011-01-17 Thread Ertio Lew
What would be the best client option to go with in order to use Cassandra through an application to be implemented in PHP? It seems that PHP developers have a high barrier of entry to Cassandra's world because of the unavailability of relatively mature, developed and well proven client options (li

Re: What is be the best possible client option available to a PHP developer for implementing an application ready for production environments ?

2011-01-17 Thread Brandon Williams
On Mon, Jan 17, 2011 at 7:22 PM, Ertio Lew wrote: > What would be the best client option to go with in order to use > Pycassa. https://github.com/thobbs/pycassa > Cassandra through an application to be implemented in PHP. > Oh. Then https://github.com/thobbs/phpcassa

Re: What is be the best possible client option available to a PHP developer for implementing an application ready for production environments ?

2011-01-17 Thread Rajkumar Gupta
Hey Brandon, 1) Is it developed enough to support all the necessary features to take full advantage of Cassandra? 2) Is it used in production by anyone? 3) What are its limitations? Thanks. On Tue, Jan 18, 2011 at 7:11 AM, Brandon Williams wrote: > On Mon, Jan 17, 2011 a

Re: What is be the best possible client option available to a PHP developer for implementing an application ready for production environments ?

2011-01-17 Thread Tyler Hobbs
> > 1) Is it developed enough to support all the > necessary features to take full advantage of Cassandra? > Yes. It lacks some of the niceties of pycassa so far, but you can do everything that Cassandra offers with it. > 2) Is it used in production by anyone? > Yes, I've

Re: quorum calculation seems to depend on previous selected nodes

2011-01-17 Thread Samuel Benz
On 01/17/2011 09:28 PM, Jonathan Ellis wrote: > On Mon, Jan 17, 2011 at 2:10 PM, Samuel Benz wrote: Case1: If 'TEST' was previous stored on Node1, Node2, Node3 -> The update will succeed. Case2: If 'TEST' was previous stored on Node2, Node3, Node4 -> The update will >

Re: Tombstone lifespan after multiple deletions

2011-01-17 Thread Ryan King
On Sun, Jan 16, 2011 at 6:53 AM, David Boxenhorn wrote: > If I delete a row, and later on delete it again, before GCGraceSeconds has > elapsed, does the tombstone live longer? Each delete is a new tombstone, which should answer your question. -ryan > In other words, if I have the following scen