Re: Number of client connections

2010-06-02 Thread Ran Tavory
as far as I know, only the os level limitations, e.g. typically ~60k On Thu, Jun 3, 2010 at 9:34 AM, Lev Stesin wrote: > Hi, > > Is there a limit on the number of client connections to a node? Thanks. > > -- > Lev >

Number of client connections

2010-06-02 Thread Lev Stesin
Hi, Is there a limit on the number of client connections to a node? Thanks. -- Lev

Re: nodetool cleanup isn't cleaning up?

2010-06-02 Thread Ran Tavory
getRangeToEndpointMap is very useful, thanks, I didn't know about it... however, I've reconfigured my cluster since (moved some nodes and tokens) so now the problem is gone. I guess I'll use getRangeToEndpointMap next time I see something like this... On Thu, Jun 3, 2010 at 9:15 AM, Jonathan Ellis

Re: Effective cache size

2010-06-02 Thread Jonathan Ellis
On Wed, Jun 2, 2010 at 10:39 PM, David King wrote: > If I go to fetch some row given the rack-unaware placement strategy, the > default snitch and CL==ONE, the node that is asked is the first node in the > ring with the datum that is currently up, then a checksum is sent to the > replicas to tr

Re: ColumnFamilyInputFormat with super columns

2010-06-02 Thread Jonathan Ellis
We don't support supercolumns in CFIF yet. Peng Guo added this in his patchset at http://files.cnblogs.com/gpcuster/CassandraInputFormat.rar but it's mixed in with a ton of other changes. Honestly it's probably easier to start fresh, but it might be useful to look at his code for inspiration. On

Re: Nodes dropping out of cluster due to GC

2010-06-02 Thread Jonathan Ellis
remember: you get concurrent mode failures, when the old gen fills up with garbage before it can finish the CMS. so adding capacity = reducing load per machine is the easiest way to make this a non-issue. On Wed, Jun 2, 2010 at 12:45 PM, Eric Halpern wrote: > > > Ryan King wrote: >> >> Why run w

Re: Heterogeneous Cassandra Cluster

2010-06-02 Thread Jonathan Ellis
No. And if we did it would be a bad idea: good ops practice is to _minimize_ variability. On Wed, Jun 2, 2010 at 3:18 AM, David Boxenhorn wrote: > Is it possible to make a heterogeneous Cassandra cluster, with both Linux > and Windows nodes? I tried doing it and got > > Error in ThreadPoolExecut

Re: Handling disk-full scenarios

2010-06-02 Thread Jonathan Ellis
this is why JBOD configuration is contraindicated for cassandra. http://wiki.apache.org/cassandra/CassandraHardware On Tue, Jun 1, 2010 at 1:08 PM, Ian Soboroff wrote: > My nodes have 5 disks and are using them separately as data disks.  The > usage on the disks is not uniform, and one is nearly

Re: Start key must sort before (or equal to) finish key in your partitioner

2010-06-02 Thread Jonathan Ellis
that would be reasonable On Wed, Jun 2, 2010 at 6:41 AM, David Boxenhorn wrote: > Would it be better to use an SQL-style timestamp ("YYYY-MM-DD HH:MM:SS.MMM") > + unique id, then? They sort lexically the same as they sort > chronologically. > > On Wed, Jun 2, 2010 at 4:37 PM, Leslie Viljoen > wr

Re: Is there any way to detect when a node is down so I can failover more effectively?

2010-06-02 Thread Jonathan Ellis
you're overcomplicating things. just connect to *a* node, and if it happens to be down, try a different one. nodes being down should be a rare event, not a normal condition. no need to optimize for it so much. also see http://wiki.apache.org/cassandra/FAQ#node_clients_connect_to 2010/6/1 Patri
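The "connect to *a* node, and if it happens to be down, try a different one" advice can be sketched roughly as follows. This is only an illustration: `NodeFailover`, `connectToAny`, and the caller-supplied `connect` function are hypothetical names, not part of any Cassandra client, and the sketch assumes a failed connection surfaces as a `RuntimeException`.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper: try each node in turn and return the first connection that opens. */
final class NodeFailover {
    /**
     * @param hosts   candidate node addresses known to the client
     * @param connect caller-supplied function that opens a connection or throws
     * @return the connection to the first node that accepted it
     */
    static <C> C connectToAny(List<String> hosts, Function<String, C> connect) {
        RuntimeException last = null;
        for (String host : hosts) {
            try {
                return connect.apply(host);
            } catch (RuntimeException e) {
                last = e; // this node looks down; move on to the next one
            }
        }
        // Every candidate failed (or the list was empty).
        throw new IllegalStateException("no node reachable", last);
    }
}
```

A client would pass its own connection-opening code as `connect`; no up-front liveness check is made, which matches the point that down nodes should be the rare case.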

Re: nodetool cleanup isn't cleaning up?

2010-06-02 Thread Jonathan Ellis
Then the next step is to check StorageService.getRangeToEndpointMap via jmx On Tue, Jun 1, 2010 at 11:56 AM, Ran Tavory wrote: > I'm using RackAwareStrategy. But it still doesn't make sense I think... > let's see what did I miss... > According to http://wiki.apache.org/cassandra/Operations > > Ra

Re: Monitoring compaction

2010-06-02 Thread Jonathan Ellis
Sure, patching CM stats into nodetool is fine. On Tue, Jun 1, 2010 at 9:50 AM, Ian Soboroff wrote: > Regarding compaction thresholds... the BMT example says to set the threshold > to 0 during an import.  Is this advisable during any bulk import (say using > batch mutations or just lots and lots o

Effective cache size

2010-06-02 Thread David King
If I go to fetch some row given the rack-unaware placement strategy, the default snitch and CL==ONE, the node that is asked is the first node in the ring with the datum that is currently up, then a checksum is sent to the replicas to trigger read repair as appropriate. So with the row cache, tha

Re: Continuously increasing RAM usage

2010-06-02 Thread Jake Luciani
I've started seeing this issue as well. Running 0.6.2. One interesting thing I happened upon, I explicitly called the GC via jconsole and the heap dropped completely fixing the issue. When you explicitly call System.gc() it does a full sweep. I'm wondering if this issue is to do with the GC fla

Re: Read operation with CL.ALL, not yet supported?

2010-06-02 Thread Yuki Morishita
Gary, Thanks for reply. I've opened an issue at https://issues.apache.org/jira/browse/CASSANDRA-1152 Yuki 2010/6/3 Gary Dusbabek : > Yuki, > > Can you file a jira ticket for this > (https://issues.apache.org/jira/browse/CASSANDRA)?  The wiki indicates > that this should be allowed:  http://wiki

ColumnFamilyInputFormat with super columns

2010-06-02 Thread Torsten Curdt
I have a super column along the lines of => { => { att: value }} Now I would like to process a set of rows [from_time..until_time] with Hadoop. I've set up the hadoop job like this job.setInputFormatClass(ColumnFamilyInputFormat.class); ConfigHelper.setColumnFamil

Re: Changing replication factor from 2 to 3

2010-06-02 Thread Rob Coli
On 6/2/10 12:49 PM, Eric Halpern wrote: We'd like to double our cluster size from 4 to 8 and increase our replication factor from 2 to 3. Is there any special procedure we need to follow to increase replication? Is it sufficient to just start the new nodes with the replication factor of 3 and t

Changing replication factor from 2 to 3

2010-06-02 Thread Eric Halpern
We'd like to double our cluster size from 4 to 8 and increase our replication factor from 2 to 3. Is there any special procedure we need to follow to increase replication? Is it sufficient to just start the new nodes with the replication factor of 3 and then reconfigure the existing nodes to the

Re: Nodes dropping out of cluster due to GC

2010-06-02 Thread Eric Halpern
Ryan King wrote: > > Why run with so few nodes? > > -ryan > > On Tue, Jun 1, 2010 at 4:20 PM, Eric Halpern wrote: >> >> Hello, >> >> We're running a 4 node cluster on beefy EC2 virtual instances (8 core, 32 >> GB) using EBS storage with 8 GB of heap allocated to the JVM. >> >> Every couple of

Re: Nodes dropping out of cluster due to GC

2010-06-02 Thread Eric Halpern
Oleg Anastasjev wrote: > >> >> Has anyone experienced this sort of problem? It would be great to hear >> from >> anyone who has had experience with this sort of issue and/or suggestions >> for >> how to deal with it. >> >> Thanks, Eric > > Yes, i did. Symptoms you described point to concur

Re: Continuously increasing RAM usage

2010-06-02 Thread Torsten Curdt
We've also seen something like this. Will soon investigate and try again with 0.6.2 On Wed, Jun 2, 2010 at 20:27, Paul Brown wrote: > > FWIW, I'm seeing similar issues on a cluster.  Three nodes, Cassandra 0.6.1, > SUN JDK 1.6.0_b20.  I will try to get some heap dumps to see what's building > u

Re: Giant sets of ordered data

2010-06-02 Thread Jonathan Shook
Insert "if you want to use long values for keys and column names" above paragraph 2. I forgot that part. On Wed, Jun 2, 2010 at 1:29 PM, Jonathan Shook wrote: > If you want to do range queries on the keys, you can use OPP to do this: > (example using UTF-8 lexicographic keys, with bursts split ac

Re: Giant sets of ordered data

2010-06-02 Thread Jonathan Shook
If you want to do range queries on the keys, you can use OPP to do this: (example using UTF-8 lexicographic keys, with bursts split across rows according to row size limits) Events: { "20100601.05.30.003": { "20100601.05.30.003": "20100601.05.30.007": ... } } With a future version

Re: Continuously increasing RAM usage

2010-06-02 Thread Paul Brown
FWIW, I'm seeing similar issues on a cluster. Three nodes, Cassandra 0.6.1, SUN JDK 1.6.0_b20. I will try to get some heap dumps to see what's building up. I've seen this sort of issue in systems that make heavy use of java.util.concurrent queues/executors, e.g.: http://bugs.sun.com/bugdatab

Capacity planning and Re: Handling disk-full scenarios

2010-06-02 Thread Ian Soboroff
Reading some more (someone break in when I lose my clue ;-) Reading the streams page in the wiki about anticompaction, I think the best approach to take when a node gets its disks overfull, is to set the compaction thresholds to 0 on all nodes, decommission the overfull node, wait for stuff to get

Re: Read operation with CL.ALL, not yet supported?

2010-06-02 Thread Gary Dusbabek
Yuki, Can you file a jira ticket for this (https://issues.apache.org/jira/browse/CASSANDRA)? The wiki indicates that this should be allowed: http://wiki.apache.org/cassandra/API Regards, Gary. On Tue, Jun 1, 2010 at 21:50, Yuki Morishita wrote: > Hi, > > I'm testing several read operations(

Re: Error during startup

2010-06-02 Thread Gary Dusbabek
I was able to reproduce the error by starting up a node using RandomPartitioner, kill it, switch to OrderPreservingPartitioner, restart, kill, switch back to RandomPartitioner, BANG! So it looks like you tinkered with the partitioner at some point. This has the unfortunate effect of corrupting your s

Re: Heterogeneous Cassandra Cluster

2010-06-02 Thread David Boxenhorn
Our replication factor was 1, so that wasn't the problem. (We tried other replication factors too, just in case, but it didn't help.) On Wed, Jun 2, 2010 at 7:51 PM, Nahor > wrote: > On 2010-06-02 3:18, David Boxenhorn wrote: > >> Is it possible to make a heterogeneous Cassandra cluster, with b

Re: Giant sets of ordered data

2010-06-02 Thread Ben Browning
With a traffic pattern like that, you may be better off storing the events of each burst (I'll call them group) in one or more keys and then storing these keys in the day key. EventGroupsPerDay: { "20100601": { 123456789: "group123", // column name is timestamp group was received, column val

Re: Heterogeneous Cassandra Cluster

2010-06-02 Thread Nahor
On 2010-06-02 3:18, David Boxenhorn wrote: Is it possible to make a heterogeneous Cassandra cluster, with both Linux and Windows nodes? I tried doing it and got Error in ThreadPoolExecutor java.lang.NullPointerException Not sure if this is due to the Linux/Windows mix or something else. Det

Re: Nodes dropping out of cluster due to GC

2010-06-02 Thread Ryan King
Why run with so few nodes? -ryan On Tue, Jun 1, 2010 at 4:20 PM, Eric Halpern wrote: > > Hello, > > We're running a 4 node cluster on beefy EC2 virtual instances (8 core, 32 > GB) using EBS storage with 8 GB of heap allocated to the JVM. > > Every couple of hours, each of the nodes does a concur

Re: Giant sets of ordered data

2010-06-02 Thread David Boxenhorn
Let's say you're logging events, and you have billions of events. What if the events come in bursts, so within a day there are millions of events, but they all come within microseconds of each other a few times a day? How do you find the events that happened on a particular day if you can't store t

Re: Giant sets of ordered data

2010-06-02 Thread Jonathan Shook
Either OPP by key, or within a row by column name. I'd suggest the latter. If you have structured data to stick under a column (named by the timestamp), then you can serialize and unserialize it yourself, or you can use a supercolumn. It's effectively the same thing. Cassandra only provides the su

Re: Giant sets of ordered data

2010-06-02 Thread Ben Browning
inute/hour/day/year depending on the volume of your data. Something like the following: SomeTimeData: { // columnfamily "20100601": { // key, yyyymmdd 123456789: "value1", // column name is milliseconds since epoch 123456799: "value2" }, "20100602"
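A rough sketch of how a client might derive the per-day row keys for this layout, so that a time range turns into a short list of rows to read. `DayBuckets` and its methods are hypothetical names, and bucketing by UTC day is an assumption; any fixed time zone would do as long as writers and readers agree.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for one-row-per-day bucketing with yyyymmdd row keys. */
final class DayBuckets {
    private static LocalDate day(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
    }

    /** Row key for the day (UTC) containing the timestamp, e.g. "20100601". */
    static String rowKey(long epochMillis) {
        return day(epochMillis).format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    /** Row keys for every day touched by the inclusive range [fromMillis, untilMillis]. */
    static List<String> rowKeysBetween(long fromMillis, long untilMillis) {
        List<String> keys = new ArrayList<>();
        LocalDate last = day(untilMillis);
        for (LocalDate d = day(fromMillis); !d.isAfter(last); d = d.plusDays(1)) {
            keys.add(d.format(DateTimeFormatter.BASIC_ISO_DATE));
        }
        return keys;
    }
}
```

Within each returned row, the millisecond column names can then be sliced by range, since columns are kept sorted regardless of partitioner.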

Giant sets of ordered data

2010-06-02 Thread David Boxenhorn
How do I handle giant sets of ordered data, e.g. by timestamps, which I want to access by range? I can't put all the data into a supercolumn, because it's loaded into memory at once, and it's too much data. Am I forced to use an order-preserving partitioner? I don't want the headache. Is there an

Re: Handling disk-full scenarios

2010-06-02 Thread Ian Soboroff
Ok, answered part of this myself. You can stop a node, move files around on the data disks, as long as they stay in the right keyspace directories, and all is fine. Now, I have a single Data.db file which is 900GB and is compacted. The drive it's on is only 1.5TB, so it can't anticompact at all.

Re: Range search on keys not working?

2010-06-02 Thread Jonathan Shook
Can you clarify what you mean by 'random between nodes' ? On Wed, Jun 2, 2010 at 8:15 AM, David Boxenhorn wrote: > I see. But we could make this work if the random partitioner was random only > between nodes, but was still ordered within each node. (Or if there were > another partitioner that did

Re: Start key must sort before (or equal to) finish key in your partitioner

2010-06-02 Thread David Boxenhorn
Would it be better to use an SQL-style timestamp ("YYYY-MM-DD HH:MM:SS.MMM") + unique id, then? They sort lexically the same as they sort chronologically. On Wed, Jun 2, 2010 at 4:37 PM, Leslie Viljoen wrote: > On Mon, May 31, 2010 at 8:52 PM, Jonathan Ellis wrote: > > OPP uses lexical ordering
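To illustrate why a fixed-width SQL-style timestamp plus a unique id sorts lexically in chronological order, here is a small sketch. `SortableKeys`, the `/` separator, and the choice of UTC are assumptions made for the example; the pattern is taken to be year, month, day, then time to the millisecond.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical key builder: fixed-width UTC timestamp, then a unique id as tie-breaker. */
final class SortableKeys {
    // Every field is zero-padded to a fixed width, so string order equals time order
    // (for four-digit years).
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS").withZone(ZoneOffset.UTC);

    static String key(long epochMillis, String uniqueId) {
        return FMT.format(Instant.ofEpochMilli(epochMillis)) + "/" + uniqueId;
    }
}
```

By contrast, plain decimal strings do not have this property: `"999"` sorts after `"1000"` lexically, which is the same kind of mismatch that bites time-based UUIDs under a lexically ordered partitioner.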

Re: Start key must sort before (or equal to) finish key in your partitioner

2010-06-02 Thread Leslie Viljoen
On Mon, May 31, 2010 at 8:52 PM, Jonathan Ellis wrote: > OPP uses lexical ordering on the keys, which isn't going to be the > same as the natural order for a time-based uuid. *palmface*

Re: Range search on keys not working?

2010-06-02 Thread David Boxenhorn
I see. But we could make this work if the random partitioner was random only between nodes, but was still ordered within each node. (Or if there were another partitioner that did this.) That way we could get everything we need from each node separately. The results would not be ordered, but they wo

Re: Range search on keys not working?

2010-06-02 Thread Sylvain Lebresne
> So why do the "start" and "finish" range parameters exist? Because especially if you want to iterate over all your keys (which as stated by Ben above is the only meaningful way to use get_range_slices() with the random partitioner), you'll want to paginate that. And that's where the 'start' and

Re: Range search on keys not working?

2010-06-02 Thread Ben Browning
They exist because when using OPP they are useful and make sense. On Wed, Jun 2, 2010 at 8:59 AM, David Boxenhorn wrote: > So why do the "start" and "finish" range parameters exist? > > On Wed, Jun 2, 2010 at 3:53 PM, Ben Browning wrote: >> >> Martin, >> >> On Wed, Jun 2, 2010 at 8:34 AM, Dr. Ma

Re: Range search on keys not working?

2010-06-02 Thread David Boxenhorn
So why do the "start" and "finish" range parameters exist? On Wed, Jun 2, 2010 at 3:53 PM, Ben Browning wrote: > Martin, > > On Wed, Jun 2, 2010 at 8:34 AM, Dr. Martin Grabmüller > wrote: > > I think you can specify an end key, but it should be a key which does > exist > > in your column family

Re: Range search on keys not working?

2010-06-02 Thread Ben Browning
Martin, On Wed, Jun 2, 2010 at 8:34 AM, Dr. Martin Grabmüller wrote: > I think you can specify an end key, but it should be a key which does exist > in your column family. Logically, it doesn't make sense to ever specify an end key with random partitioner. If you specified a start key of "aaa"

Re: Range search on keys not working?

2010-06-02 Thread David Boxenhorn
Here is the relevant part of the previous thread: Thank you. That is very good news. I can sort the results myself - what is important is that I get them! On Thu, May 13, 2010 at 2:42 AM, Vijay wrote: If you use Random partitioner, You will *NOT* get RowKey's sorted. (Columns are sorted always).

Re: Range search on keys not working?

2010-06-02 Thread David Boxenhorn
That's crazy! I could artificially insert a key with just the prefix, as a placeholder, but why can't Cassandra do that virtually? On Wed, Jun 2, 2010 at 3:34 PM, Dr. Martin Grabmüller < martin.grabmuel...@eleven.de> wrote: > I think you can specify an end key, but it should be a key which does

RE: Range search on keys not working?

2010-06-02 Thread Dr . Martin Grabmüller
I think you can specify an end key, but it should be a key which does exist in your column family. But maybe I'm off the track here and someone else here knows more about this key range stuff. Martin From: David Boxenhorn [mailto:da...@lookin2.com]

Re: Range search on keys not working?

2010-06-02 Thread Ben Browning
The keys will not be in any specific order when not using OPP, so, you will never "get out of range" - you have to iterate over every single key to find all keys that start with "CATEGORY". If you don't iterate over every single key you run a chance of missing some. Obviously, this kind of key rang
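The "iterate over every single key" approach might look roughly like this when paged. This is a sketch only: `PrefixScan` and the `fetchPage` callback are hypothetical stand-ins for a range query issued with an empty end key and a row limit, not the actual Thrift signature. The callback is assumed to return keys in the store's own (token) order, starting at the given key inclusive, or at the beginning when the start key is empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Hypothetical full scan that pages through all keys and filters by prefix on the client. */
final class PrefixScan {
    static List<String> keysWithPrefix(String prefix, int pageSize,
                                       BiFunction<String, Integer, List<String>> fetchPage) {
        if (pageSize < 2) {
            throw new IllegalArgumentException("pageSize must be at least 2 to make progress");
        }
        List<String> matches = new ArrayList<>();
        String start = "";
        boolean firstPage = true;
        while (true) {
            List<String> page = fetchPage.apply(start, pageSize);
            // Pages after the first begin with the key that ended the previous page.
            for (int i = firstPage ? 0 : 1; i < page.size(); i++) {
                if (page.get(i).startsWith(prefix)) {
                    matches.add(page.get(i));
                }
            }
            if (page.size() < pageSize) {
                return matches; // short page: no more keys to read
            }
            start = page.get(page.size() - 1);
            firstPage = false;
        }
    }
}
```

The matches come back in token order, not key order, and the scan cannot stop early: a matching key may appear anywhere in the sequence.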

Re: Range search on keys not working?

2010-06-02 Thread David Boxenhorn
In other words, I should check the values as I iterate, and stop iterating when I get out of range? I'll try that! On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller < martin.grabmuel...@eleven.de> wrote: > When not using OPP, you should not use something like 'CATEGORY/' as the > end key. >

RE: Range search on keys not working?

2010-06-02 Thread Dr . Martin Grabmüller
When not using OPP, you should not use something like 'CATEGORY/' as the end key. Use the empty string as the end key and limit the number of returned keys, as you did with the 'max' value. If I understand correctly, the end key is used to generate an end token by hashing it, and there is not

Re: Range search on keys not working?

2010-06-02 Thread David Boxenhorn
The previous thread where we discussed this is called, "key is sorted?" On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn wrote: > I'm not using OPP. But I was assured on earlier threads (I asked several > times to be sure) that it would work as stated below: the results would not > be ordered, b

Re: Range search on keys not working?

2010-06-02 Thread David Boxenhorn
I'm not using OPP. But I was assured on earlier threads (I asked several times to be sure) that it would work as stated below: the results would not be ordered, but they would be correct. On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt wrote: > Sounds like you are not using an order preserving par

Re: Range search on keys not working?

2010-06-02 Thread Torsten Curdt
Sounds like you are not using an order preserving partitioner? On Wed, Jun 2, 2010 at 13:48, David Boxenhorn wrote: > Range search on keys is not working for me. I was assured in earlier threads > that range search would work, but the results would not be ordered. > > I'm trying to get all the ro

Range search on keys not working?

2010-06-02 Thread David Boxenhorn
Range search on keys is not working for me. I was assured in earlier threads that range search would work, but the results would not be ordered. I'm trying to get all the rows that start with "CATEGORY." I'm doing: String start = "CATEGORY."; . . . keyspace.getSuperRangeSlice(columnParent, slice

Heterogeneous Cassandra Cluster

2010-06-02 Thread David Boxenhorn
Is it possible to make a heterogeneous Cassandra cluster, with both Linux and Windows nodes? I tried doing it and got Error in ThreadPoolExecutor java.lang.NullPointerException Not sure if this is due to the Linux/Windows mix or something else. Details below: [r...@iqdev01 cassandra]# bin/c

Re: [***SPAM*** ] Re: writing speed test

2010-06-02 Thread Shuai Yuan
Thanks Peter! In my test application, for each record, rowkey -> rand() * 4, about 64B column * 20 -> rand() * 20, about 320B I use batch_insert(rowkey, col*20) in thrift. Kevin Yuan From: Peter Schüller To: user@cassandra.apache.org Subject: [***SPAM*** ] Re:

Re: writing speed test

2010-06-02 Thread Peter Schüller
Since this thread has now gone on for a while... As far as I can tell you never specify the characteristics of your writes. Evaluating expected write throughput in terms of "MB/s to disk" is pretty impossible if one does not know anything about the nature of the writes. If you're expecting 50 MB,

Re: [***SPAM*** ] Re: [***SPAM*** ] Re: [***SPAM*** ] Re: [***SPAM*** ] Re: writing speed test

2010-06-02 Thread Shuai Yuan
Still seems MEM. However it's hard to be convinced that constantly writing (even a great amount of data) needs so much MEM (16GB). The process is quite simple, input_data -> memtable -> flush to disk, right? What does cassandra need so much MEM for? Thanks! On 2010-06-02 16:24 +0800, lwl wrote: > N

Re: Nodes dropping out of cluster due to GC

2010-06-02 Thread Oleg Anastasjev
> > Has anyone experienced this sort of problem? It would be great to hear from > anyone who has had experience with this sort of issue and/or suggestions for > how to deal with it. > > Thanks, Eric Yes, i did. Symptoms you described point to concurrent GC FAILURE. During this failure concurr

Re: [***SPAM*** ] Re: [***SPAM*** ] Re: [***SPAM*** ] Re: writing speed test

2010-06-02 Thread Shuai Yuan
Hi, I tried, 1-consistency level ZERO 2-JVM heap 4GB 3-normal Memtable cache and now I have about 30% improvement. However I want to know if you have also done w/r benchmark and what's the result? On 2010-06-02 11:35 +0800, lwl wrote: > and, why did you set "JVM has 8G heap"? > 8g, seems t