Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-24 Thread Safdar Kureishy
Hi Jake, Thanks. Yes, I also forgot to mention that I had raised the solandra.shards.at.once param from 4 to 5 (to match the # of nodes). Should I have raised it to 10 or 15 (a multiple of 5)? I have now added all the documents that I needed to the index. It appears the distribution became more even

Re: Weird behavior in Cassandra 1.1.0 - throwing unconfigured CF exceptions when the CF is present

2012-06-24 Thread Tharindu Mathew
Yes, it seems to have been an error on our side. Sorry for the noise. On Sun, Jun 24, 2012 at 11:38 PM, aaron morton wrote: > I would check if the schemas have diverged; run describe cluster in the > cli. > > Cheers > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thela

Re: Limited row cache size

2012-06-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
Sorry, I meant the 1.1.1 build. On Mon, Jun 25, 2012 at 10:40 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote: > I was using the DataStax build. Do they also have a 1.1 build? > > On Mon, Jun 18, 2012 at 9:05 AM, aaron morton wrote: >> cassandra 1.1.1 ships with concurrentlinkedhashmap-lru-1.3.jar >> >> row_cach

Re: Limited row cache size

2012-06-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
I was using the DataStax build. Do they also have a 1.1 build? On Mon, Jun 18, 2012 at 9:05 AM, aaron morton wrote: > cassandra 1.1.1 ships with concurrentlinkedhashmap-lru-1.3.jar > > row_cache_size_in_mb starts life as an int but the byte size is stored as a > long > https://github.com/apache/c
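
aaron's note that row_cache_size_in_mb starts as an int while the byte size is held in a long comes down to 32-bit overflow: a signed Java int tops out at 2^31 - 1 bytes, just under 2048 MB. The Python sketch below emulates Java int wraparound to show the difference. The helper names are hypothetical and this is not Cassandra's actual code. Note that in Java, assigning an int product to a long does not help by itself; the multiplication has to be done in long arithmetic (widen first, then multiply).

```python
INT32_MAX = 2**31 - 1


def to_int32(x: int) -> int:
    """Wrap an arbitrary Python int to a signed 32-bit value, like Java int overflow."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x > INT32_MAX else x


def bytes_as_int(size_in_mb: int) -> int:
    # Hypothetical: MB -> bytes computed entirely in 32-bit int arithmetic.
    return to_int32(size_in_mb * 1024 * 1024)


def bytes_as_long(size_in_mb: int) -> int:
    # Hypothetical: value widened to 64 bits before multiplying, so no wraparound here.
    return size_in_mb * 1024 * 1024


for mb in (1024, 2047, 2048, 4096):
    print(mb, bytes_as_int(mb), bytes_as_long(mb))
```

Anything up to 2047 MB fits in an int; from 2048 MB upward the int version wraps to negative or zero values, which is why the byte count needs long arithmetic.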

Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-24 Thread Jake Luciani
Hi Safdar, If you want to get better utilization of the cluster, raise the solandra.shards.at.once param in solandra.properties. -Jake On Sun, Jun 24, 2012 at 11:00 AM, Safdar Kureishy wrote: > Hi, > > I've searched online but was unable to find any leads for the problem > below. This mailing

Consistency Problem with Quorum consistencyLevel configuration

2012-06-24 Thread Jason Tang
Hi, I ran into a consistency problem when using QUORUM for both reads and writes. I use MultigetSubSliceQuery to query rows from a super column with a limit of 100, read them, then delete them, and then start another round. But I found that the row which should have been deleted by the last query still sh
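
For reference, here is the reasoning behind using QUORUM on both sides. With replication factor RF, QUORUM means floor(RF/2) + 1 replicas, and a read is guaranteed to include at least one replica that acknowledged the latest successful write (or delete) whenever R + W > RF. The small Python sketch below shows that arithmetic. quorum() and overlaps() are hypothetical helpers, not part of Hector or Cassandra.

```python
def quorum(rf: int) -> int:
    """Number of replicas that make up a QUORUM for a given replication factor."""
    return rf // 2 + 1


def overlaps(read_replicas: int, write_replicas: int, rf: int) -> bool:
    """True when every read replica set must intersect every write replica set (R + W > RF)."""
    return read_replicas + write_replicas > rf


for rf in (1, 2, 3, 5):
    q = quorum(rf)
    print(f"RF={rf}: QUORUM={q}, QUORUM/QUORUM overlap: {overlaps(q, q, rf)}")

print("ONE/ONE at RF=3 overlap:", overlaps(1, 1, 3))
```

With these definitions, QUORUM/QUORUM always satisfies R + W > RF, while ONE/ONE at RF=3 does not.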

Re: Starting cassandra with -D option

2012-06-24 Thread Greg Fausak
I did something similar for my installation, but I used ENV variables: I created a directory on a machine (call this the master) with directories for all of the distributions (call them slaves). So, consider: /master/slave1 /master/slave2 ... /master/slaven Then I rdist this to all of my slaves.

Re: how to reduce latency?

2012-06-24 Thread Safdar Kureishy
Hi Yan, Did you manage to figure out what was causing the increasing latency on your cluster? Was the resolution just to add more nodes, or something else? Thanks, Safdar On Jun 13, 2012 2:40 PM, "Yan Chunlu" wrote: > I have three nodes running cassandra 0.7.4 for about two years, as shown > below:

Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-24 Thread Safdar Kureishy
Thanks. Oh, I forgot to mention that I'm using Cassandra 1.1.0-beta2, in case that question comes up. Hoping someone can offer some more feedback on the likelihood of this behavior ... Thanks again, Safdar On Jun 24, 2012 9:22 PM, "Dave Brosius" wrote: > Well it sounds like this doesn't apply t

Re: Fat Client Commit Log

2012-06-24 Thread aaron morton
The fat client would still have some information in the system CF. Are the files big? Are they continually created? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 23/06/2012, at 8:07 AM, Frank Ng wrote: > Hi All, > > We are using the

Re: Column names overhead

2012-06-24 Thread aaron morton
> What is the penalty for using longer column names? Each column name is stored in each -Data file where a value is stored for it. So if you have muchos overwrites, the column name may be stored in many places. > Should I sacrifice longer self-explanatory names for shorter cryptic ones to > save the
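
To put a rough number on aaron's point: the name is written alongside every stored value, so the bytes spent on a name scale with rows × the number of -Data files (SSTables) holding a copy. The back-of-the-envelope Python sketch below ignores compression and Cassandra's fixed per-column overhead. The column names, row count and copy count are made-up inputs, not measurements.

```python
def column_name_bytes(name: str, rows: int, copies_per_row: int = 1) -> int:
    """Rough bytes spent on one column's name across all rows and SSTable copies.

    Assumes the name is stored once per value written (per row, per -Data file
    holding a version of it); ignores compression and fixed per-column overhead.
    """
    return len(name.encode("utf-8")) * rows * copies_per_row


long_name = "user_last_login_timestamp"  # hypothetical self-explanatory name
short_name = "llt"                       # hypothetical cryptic name
rows = 10_000_000
copies = 3  # hypothetical: overwrites leave 3 versions on disk until compaction merges them

saved = column_name_bytes(long_name, rows, copies) - column_name_bytes(short_name, rows, copies)
print(f"~{saved / 1024**2:.0f} MiB saved on this one column")
```

Whether that saving matters depends on how it compares with the size of the values themselves and on how often compaction collapses the overwritten copies.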

Re: Cassandra 1.0.6 data flush query

2012-06-24 Thread aaron morton
> memtable_total_space_in_mb: 200 This means Cassandra tries to use less than 200MB of real memory to hold memtables. The problem is that Java takes a lot more memory to hold data than it takes to store it on disk. You can see the ratio of serialized to live bytes logged from the Memtable with messages
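
Here is some rough arithmetic for aaron's point. The 200MB budget is shared across all memtables and counts in-heap (live) bytes. If the logged live-to-serialized ratio is, say, 10 (a hypothetical value; read the real one from the Memtable log messages), the memtables hold only about 20MB of serialized data before flushing. The function name below is hypothetical.

```python
def serialized_capacity_mb(memtable_total_space_in_mb: float, live_ratio: float) -> float:
    """Approximate serialized (on-disk) MB that fit in the memtable heap budget.

    live_ratio is the ratio of live in-heap bytes to serialized bytes, as
    reported in the Memtable log messages; the value used below is an assumption.
    """
    if live_ratio <= 0:
        raise ValueError("live_ratio must be positive")
    return memtable_total_space_in_mb / live_ratio


print(serialized_capacity_mb(200, 10.0))
```

A higher live ratio means smaller flushes for the same setting, which is why a low memtable_total_space_in_mb can produce many small SSTables.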

Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-24 Thread Dave Brosius
Well, it sounds like this doesn't apply to you. If you had set up your column family in CQL as PRIMARY KEY (domain_name, path) or something like that, and were looking at lots and lots of URL pages (domain_name + path) but from a very small number of domain_names, then the partitioner be
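
Here is a sketch of Dave's point. Under CQL 3, PRIMARY KEY (domain_name, path) makes domain_name alone the partition key. The partitioner therefore hashes only the domain, and every page of a domain lands on the same token. The Python below approximates RandomPartitioner's MD5 token and uses made-up domains. It is only an illustration, not how Solandra builds its keys.

```python
import hashlib
from collections import Counter


def md5_token(key: str) -> int:
    """Approximate RandomPartitioner token: abs(MD5 digest read as a signed 128-bit int)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


# Hypothetical data: many pages, but only three domain names.
pages = [(f"domain{d}.example.com", f"/page/{p}") for d in range(3) for p in range(10_000)]

# Partition key = domain_name only: every page of a domain shares one token.
tokens_by_partition_key = Counter(md5_token(domain) for domain, _ in pages)
# Hashing the full URL instead gives one token per page.
tokens_by_full_url = Counter(md5_token(domain + path) for domain, path in pages)

print(len(tokens_by_partition_key), len(tokens_by_full_url))
```

With only three distinct tokens, at most three nodes can own any of this data, however the ring is balanced.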

Re: Strange behavior ¿data corruption?

2012-06-24 Thread aaron morton
If you are using more than one node, make sure you have set the Consistency Level of the request to QUORUM. Otherwise, check your code for errors. On 22/06/2012, at 5:30 AM, Juan Ezquerro wrote

Re: Starting cassandra with -D option

2012-06-24 Thread aaron morton
> Idea is to avoid having the copies of cassandra code in each node, If you run Cassandra from the NAS, you are adding a single point of failure to the system. It is better to use some form of deployment automation and install all the required components onto each node.

Re: Weird behavior in Cassandra 1.1.0 - throwing unconfigured CF exceptions when the CF is present

2012-06-24 Thread aaron morton
I would check if the schemas have diverged; run describe cluster in the CLI. On 22/06/2012, at 12:22 AM, Tharindu Mathew wrote: > Hi, > > I'm having issues with Hector 1.1.0 and Cassandra 1.1.

Re: Tiered compaction on two disks

2012-06-24 Thread aaron morton
> I have a Cassandra installation where we plan to store 1TB of data, split > between two 1TB disks. In general, it's a good idea to limit per-node storage to 300GB to 400GB. This has more to do with operational issues than with any particular issue with Cassandra. However, storing a very large num
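
Here is the 300GB to 400GB guideline applied to the 1TB in the question. The node count follows from total data × replication factor / per-node target. The replication factor of 3 in the Python sketch below is an assumption, not something stated in the thread. 1TB is taken as 1000GB, and the free space needed for compaction is ignored.

```python
import math


def nodes_needed(total_data_gb: float, replication_factor: int, per_node_gb: float) -> int:
    """Minimum node count that keeps each node's load at or below per_node_gb.

    Assumes an even token distribution and ignores compaction headroom.
    """
    return math.ceil(total_data_gb * replication_factor / per_node_gb)


for target in (300, 400):
    print(f"{target}GB per node -> {nodes_needed(1000, 3, target)} nodes")
```

Compaction needs free disk on top of this, so the real node count or disk size would need to be somewhat larger.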

Re: wildcards at both ends

2012-06-24 Thread aaron morton
> I'm wondering how or if it's possible to implement efficient wildcards at > both ends, e.g. *string* No. > - if I can get another equality constraint which narrows down potential > result set significantly, I can do a scan. I'm not sure how feasible this is > without benchmarks. Does any one

Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-24 Thread Safdar Kureishy
Hi Dave, Would you mind elaborating a bit more on that, preferably with an example? AFAIK, Solandra uses the unique id of the Solr document as the input for calculating the MD5 hash for shard/node assignment. In this case, the ids are just millions of varied web URLs that do *not* adhere to any reg
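
One way to check whether varied URL ids should spread evenly is a small simulation of RandomPartitioner placement. To my understanding, RandomPartitioner takes the MD5 of the key, reads it as a signed 128-bit integer and uses its absolute value as the token, giving the range 0 to 2^127. Each node owns the range ending at its own token. The five evenly spaced tokens and the synthetic URLs below are assumptions. Solandra's own shard assignment is not modeled; only Cassandra-style row placement is.

```python
import bisect
import hashlib
from collections import Counter

RING_SIZE = 2**127  # RandomPartitioner tokens fall in [0, 2**127]


def md5_token(key: str) -> int:
    """Approximate RandomPartitioner token: abs(MD5 digest read as a signed 128-bit int)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def balanced_tokens(n_nodes: int) -> list[int]:
    """Evenly spaced initial tokens: i * 2**127 / N."""
    return [i * RING_SIZE // n_nodes for i in range(n_nodes)]


def owner(token: int, ring: list[int]) -> int:
    """Index of the owning node: first node token >= token, wrapping to node 0."""
    return bisect.bisect_left(ring, token) % len(ring)


def distribution(keys, ring: list[int]) -> Counter:
    """Count how many keys each node index owns."""
    return Counter(owner(md5_token(k), ring) for k in keys)


ring = balanced_tokens(5)
urls = [f"http://site{i % 5000}.example.com/page/{i}" for i in range(50_000)]
print(sorted(distribution(urls, ring).items()))
```

With evenly spaced tokens and varied keys, each node gets roughly a fifth of the keys. A strongly skewed load may therefore point at the token assignment (as shown by nodetool ring) or at what is actually being hashed, rather than at MD5 itself.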

Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-24 Thread Dave Brosius
If I read what you are saying correctly, you are _not_ using composite keys? That's one thing that could do it: if the first part of the composite key had very, very low cardinality. On 06/24/2012 11:00 AM, Safdar Kureishy wrote: Hi, I've searched online but was unable to find any leads for the proble

Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-24 Thread Safdar Kureishy
An additional detail is that the CPU utilization on those nodes is proportional to the load below, so machines 9.9.9.1 and 9.9.9.3 experience only a fraction of the CPU load of the remaining 3 nodes. This might further point to the possibility that the keys are hashing minimally to the token ran