As far as I know, only the OS-level limitations, e.g. typically ~60k.
On Thu, Jun 3, 2010 at 9:34 AM, Lev Stesin wrote:
> Hi,
>
> Is there a limit on the number of client connections to a node? Thanks.
>
> --
> Lev
>
Hi,
Is there a limit on the number of client connections to a node? Thanks.
--
Lev
getRangeToEndpointMap is very useful, thanks, I didn't know about it...
however, I've reconfigured my cluster since (moved some nodes and tokens) so
now the problem is gone. I guess I'll use getRangeToEndpointMap next time I
see something like this...
On Thu, Jun 3, 2010 at 9:15 AM, Jonathan Ellis wrote:
On Wed, Jun 2, 2010 at 10:39 PM, David King wrote:
> If I go to fetch some row given the rack-unaware placement strategy, the
> default snitch and CL==ONE, the node that is asked is the first node in the
> ring with the datum that is currently up, then a checksum is sent to the
> replicas to trigger read repair as appropriate.
We don't support supercolumns in CFIF yet.
Peng Guo added this in his patchset at
http://files.cnblogs.com/gpcuster/CassandraInputFormat.rar but it's
mixed in with a ton of other changes. Honestly it's probably easier
to start fresh, but it might be useful to look at his code for
inspiration.
On
Remember: you get concurrent mode failures when the old gen fills up
with garbage before the CMS collector can finish. So adding capacity
(i.e. reducing load per machine) is the easiest way to make this a non-issue.
On Wed, Jun 2, 2010 at 12:45 PM, Eric Halpern wrote:
>
>
> Ryan King wrote:
>>
>> Why run w
No. And if we did it would be a bad idea: good ops practice is to
_minimize_ variability.
On Wed, Jun 2, 2010 at 3:18 AM, David Boxenhorn wrote:
> Is it possible to make a heterogeneous Cassandra cluster, with both Linux
> and Windows nodes? I tried doing it and got
>
> Error in ThreadPoolExecut
This is why JBOD configuration is contraindicated for Cassandra.
http://wiki.apache.org/cassandra/CassandraHardware
On Tue, Jun 1, 2010 at 1:08 PM, Ian Soboroff wrote:
> My nodes have 5 disks and are using them separately as data disks. The
> usage on the disks is not uniform, and one is nearly
That would be reasonable.
On Wed, Jun 2, 2010 at 6:41 AM, David Boxenhorn wrote:
> Would it be better to use an SQL-style timestamp ("YYYY-MM-DD HH:MM:SS.MMM")
> + unique id, then? They sort lexically the same as they sort
> chronologically.
>
> On Wed, Jun 2, 2010 at 4:37 PM, Leslie Viljoen wrote:
You're overcomplicating things.
Just connect to *a* node, and if it happens to be down, try a different one.
Nodes being down should be a rare event, not a normal condition. There's no
need to optimize for it so much.
also see http://wiki.apache.org/cassandra/FAQ#node_clients_connect_to
2010/6/1 Patri
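A minimal sketch of that approach, assuming the raw 0.6 Thrift bindings; the host list and port 9160 are placeholders:

import java.util.List;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class SimpleFailover {
    // Try each known node in turn; the first one that accepts a
    // connection wins. No per-request load balancing, just failover.
    static Cassandra.Client connect(List<String> hosts) throws Exception {
        for (String host : hosts) {
            try {
                TTransport transport = new TSocket(host, 9160);
                transport.open();
                return new Cassandra.Client(new TBinaryProtocol(transport));
            } catch (Exception e) {
                // node down or unreachable; move on to the next one
            }
        }
        throw new Exception("no node reachable");
    }
}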
Then the next step is to check StorageService.getRangeToEndpointMap via jmx
On Tue, Jun 1, 2010 at 11:56 AM, Ran Tavory wrote:
> I'm using RackAwareStrategy. But it still doesn't make sense I think...
> let's see what I missed...
> According to http://wiki.apache.org/cassandra/Operations
>
> Ra
Sure, patching CM stats into nodetool is fine.
On Tue, Jun 1, 2010 at 9:50 AM, Ian Soboroff wrote:
> Regarding compaction thresholds... the BMT example says to set the threshold
> to 0 during an import. Is this advisable during any bulk import (say using
> batch mutations or just lots and lots o
If I go to fetch some row given the rack-unaware placement strategy, the
default snitch and CL==ONE, the node that is asked is the first node in the
ring with the datum that is currently up, then a checksum is sent to the
replicas to trigger read repair as appropriate. So with the row cache, tha
I've started seeing this issue as well. Running 0.6.2.
One interesting thing I happened upon: I explicitly called the GC via
jconsole and the heap dropped completely, fixing the issue. When you
explicitly call System.gc() it does a full sweep. I'm wondering if this
issue is to do with the GC fla
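For reference, the same full collection can be triggered remotely over JMX (it is the operation behind jconsole's "Perform GC" button); a sketch, assuming Cassandra 0.6's default JMX port 8080 and localhost:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ForceGc {
    public static void main(String[] args) throws Exception {
        // Host and port are assumptions; 8080 is the 0.6 default JMX port.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        MBeanServerConnection conn = jmxc.getMBeanServerConnection();
        // Invoke the no-arg gc() operation on the Memory MBean.
        conn.invoke(new ObjectName("java.lang:type=Memory"), "gc", null, null);
        jmxc.close();
    }
}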
Gary,
Thanks for reply. I've opened an issue at
https://issues.apache.org/jira/browse/CASSANDRA-1152
Yuki
2010/6/3 Gary Dusbabek:
> Yuki,
>
> Can you file a jira ticket for this
> (https://issues.apache.org/jira/browse/CASSANDRA)? The wiki indicates
> that this should be allowed: http://wiki
I have a super column along the lines of
<key> => { <super column> => { att: value }}
Now I would like to process a set of rows [from_time..until_time] with Hadoop.
I've set up the Hadoop job like this
job.setInputFormatClass(ColumnFamilyInputFormat.class);
ConfigHelper.setColumnFamil
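For reference, a sketch of a complete job setup against regular columns (per the reply above, CFIF does not read supercolumns yet); it follows the pattern of the 0.6 word_count contrib, and the keyspace/column family names are placeholders:

import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class EventJobSetup {
    public static Job configure() throws Exception {
        Job job = new Job(new Configuration(), "event-processing");
        job.setInputFormatClass(ColumnFamilyInputFormat.class);
        // Keyspace and column family names are placeholders.
        ConfigHelper.setColumnFamily(job.getConfiguration(), "Keyspace1", "Events");
        // Unbounded slice: hand every (regular) column of each row to the mapper.
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(
            new SliceRange(new byte[0], new byte[0], false, Integer.MAX_VALUE));
        ConfigHelper.setSlicePredicate(job.getConfiguration(), predicate);
        return job;
    }
}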
On 6/2/10 12:49 PM, Eric Halpern wrote:
We'd like to double our cluster size from 4 to 8 and increase our replication
factor from 2 to 3.
Is there any special procedure we need to follow to increase replication?
Is it sufficient to just start the new nodes with the replication factor of
3 and t
We'd like to double our cluster size from 4 to 8 and increase our replication
factor from 2 to 3.
Is there any special procedure we need to follow to increase replication?
Is it sufficient to just start the new nodes with the replication factor of
3 and then reconfigure the existing nodes to the
Ryan King wrote:
>
> Why run with so few nodes?
>
> -ryan
>
> On Tue, Jun 1, 2010 at 4:20 PM, Eric Halpern wrote:
>>
>> Hello,
>>
>> We're running a 4 node cluster on beefy EC2 virtual instances (8 core, 32
>> GB) using EBS storage with 8 GB of heap allocated to the JVM.
>>
>> Every couple of
Oleg Anastasjev wrote:
>
>>
>> Has anyone experienced this sort of problem? It would be great to hear
>> from
>> anyone who has had experience with this sort of issue and/or suggestions
>> for
>> how to deal with it.
>>
>> Thanks, Eric
>
> Yes, I did. The symptoms you described point to concur
We've also seen something like this. Will soon investigate and try
again with 0.6.2
On Wed, Jun 2, 2010 at 20:27, Paul Brown wrote:
>
> FWIW, I'm seeing similar issues on a cluster. Three nodes, Cassandra 0.6.1,
> SUN JDK 1.6.0_b20. I will try to get some heap dumps to see what's building
> u
Insert "if you want to use long values for keys and column names"
above paragraph 2. I forgot that part.
On Wed, Jun 2, 2010 at 1:29 PM, Jonathan Shook wrote:
> If you want to do range queries on the keys, you can use OPP to do this:
> (example using UTF-8 lexicographic keys, with bursts split ac
If you want to do range queries on the keys, you can use OPP to do this:
(example using UTF-8 lexicographic keys, with bursts split across rows
according to row size limits)
Events: {
  "20100601.05.30.003": {
    "20100601.05.30.003":
    "20100601.05.30.007":
    ...
  }
}
With a future version
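A sketch of how such keys might be built so the lexicographic ordering holds; the field meanings (hour, minute, burst sequence) are an assumption read off the example keys above:

public class BurstKeys {
    // Fixed-width, zero-padded fields keep UTF-8 lexicographic order equal
    // to chronological order under OPP.
    static String burstKey(String yyyymmdd, int hour, int minute, int seq) {
        return String.format("%s.%02d.%02d.%03d", yyyymmdd, hour, minute, seq);
    }
}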
FWIW, I'm seeing similar issues on a cluster. Three nodes, Cassandra 0.6.1,
SUN JDK 1.6.0_b20. I will try to get some heap dumps to see what's building up.
I've seen this sort of issue in systems that make heavy use of
java.util.concurrent queues/executors, e.g.:
http://bugs.sun.com/bugdatab
Reading some more (someone break in when I lose my clue ;-)
Reading the streams page in the wiki about anticompaction, I think the best
approach to take when a node gets its disks overfull is to set the
compaction thresholds to 0 on all nodes, decommission the overfull node,
wait for stuff to get
Yuki,
Can you file a jira ticket for this
(https://issues.apache.org/jira/browse/CASSANDRA)? The wiki indicates
that this should be allowed: http://wiki.apache.org/cassandra/API
Regards,
Gary.
On Tue, Jun 1, 2010 at 21:50, Yuki Morishita wrote:
> Hi,
>
> I'm testing several read operations(
I was able to reproduce the error by starting up a node using
RandomPartitioner, kill it, switch to OrderPreservingPartitioner,
restart, kill, switch back to RandomPartitioner, BANG!
So it looks like you tinkered with the partitioner at some point.
This has the unfortunate effect of corrupting your s
Our replication factor was 1, so that wasn't the problem. (We tried other
replication factors too, just in case, but it didn't help.)
On Wed, Jun 2, 2010 at 7:51 PM, Nahor wrote:
> On 2010-06-02 3:18, David Boxenhorn wrote:
>
>> Is it possible to make a heterogeneous Cassandra cluster, with b
With a traffic pattern like that, you may be better off storing the
events of each burst (I'll call them group) in one or more keys and
then storing these keys in the day key.
EventGroupsPerDay: {
  "20100601": {
    123456789: "group123", // column name is timestamp group was
                           // received, column val
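A sketch of the read side of that layout, assuming the 0.6 Thrift API; the column family names and the open client are placeholders:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.cassandra.thrift.*;

public class ReadDayGroups {
    static Map<String, List<ColumnOrSuperColumn>> eventsFor(
            Cassandra.Client client, String day) throws Exception {
        SlicePredicate all = new SlicePredicate();
        all.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 10000));

        // Step 1: the day row lists the group keys for that day.
        ColumnParent dayCf = new ColumnParent();
        dayCf.setColumn_family("EventGroupsPerDay");
        List<String> groupKeys = new ArrayList<String>();
        for (ColumnOrSuperColumn c :
                client.get_slice("Keyspace1", day, dayCf, all, ConsistencyLevel.ONE))
            groupKeys.add(new String(c.getColumn().getValue(), "UTF-8"));

        // Step 2: fetch the group rows themselves in one round trip.
        ColumnParent groupCf = new ColumnParent();
        groupCf.setColumn_family("EventGroups"); // placeholder CF name
        return client.multiget_slice("Keyspace1", groupKeys, groupCf, all,
                                     ConsistencyLevel.ONE);
    }
}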
On 2010-06-02 3:18, David Boxenhorn wrote:
Is it possible to make a heterogeneous Cassandra cluster, with both
Linux and Windows nodes? I tried doing it and got
Error in ThreadPoolExecutor
java.lang.NullPointerException
Not sure if this is due to the Linux/Windows mix or something else.
Det
Why run with so few nodes?
-ryan
On Tue, Jun 1, 2010 at 4:20 PM, Eric Halpern wrote:
>
> Hello,
>
> We're running a 4 node cluster on beefy EC2 virtual instances (8 core, 32
> GB) using EBS storage with 8 GB of heap allocated to the JVM.
>
> Every couple of hours, each of the nodes does a concur
Let's say you're logging events, and you have billions of events. What if
the events come in bursts, so within a day there are millions of events, but
they all come within microseconds of each other a few times a day? How do
you find the events that happened on a particular day if you can't store
t
Either OPP by key, or within a row by column name. I'd suggest the latter.
If you have structured data to stick under a column (named by the
timestamp), then you can serialize and unserialize it yourself, or you
can use a supercolumn. It's effectively the same thing. Cassandra
only provides the su
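A sketch of the suggested approach (columns named by timestamp, sliced by range within one row), assuming the 0.6 Thrift API and string-comparable column names; keyspace and CF names are placeholders:

import java.util.List;
import org.apache.cassandra.thrift.*;

public class SliceByTime {
    static List<ColumnOrSuperColumn> between(Cassandra.Client client,
            String dayKey, String from, String to) throws Exception {
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(new SliceRange(
            from.getBytes("UTF-8"),  // range start, e.g. "20100601.05"
            to.getBytes("UTF-8"),    // range end,   e.g. "20100601.06"
            false,                   // not reversed
            10000));                 // max columns returned
        ColumnParent parent = new ColumnParent();
        parent.setColumn_family("Events");
        return client.get_slice("Keyspace1", dayKey, parent, predicate,
                                ConsistencyLevel.ONE);
    }
}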
minute/hour/day/year depending on the volume of your data.
Something like the following:
SomeTimeData: { // columnfamily
  "20100601": { // key, yyyymmdd
    123456789: "value1", // column name is milliseconds since epoch
    123456799: "value2"
  },
  "20100602"
How do I handle giant sets of ordered data, e.g. by timestamps, which I want
to access by range?
I can't put all the data into a supercolumn, because it's loaded into memory
at once, and it's too much data.
Am I forced to use an order-preserving partitioner? I don't want the
headache. Is there an
Ok, answered part of this myself. You can stop a node, move files around on
the data disks, as long as they stay in the right keyspace directories, and
all is fine.
Now, I have a single Data.db file which is 900GB and is compacted. The
drive it's on is only 1.5TB, so it can't anticompact at all.
Can you clarify what you mean by 'random between nodes'?
On Wed, Jun 2, 2010 at 8:15 AM, David Boxenhorn wrote:
> I see. But we could make this work if the random partitioner was random only
> between nodes, but was still ordered within each node. (Or if there were
> another partitioner that did
Would it be better to use an SQL-style timestamp ("YYYY-MM-DD HH:MM:SS.MMM")
+ unique id, then? They sort lexically the same as they sort
chronologically.
On Wed, Jun 2, 2010 at 4:37 PM, Leslie Viljoen wrote:
> On Mon, May 31, 2010 at 8:52 PM, Jonathan Ellis wrote:
> > OPP uses lexical ordering
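A sketch of such a key, with a random UUID as the unique id (the exact suffix scheme is an assumption): zero-padded fields make the lexical (OPP) sort match chronological order, and the suffix only breaks same-millisecond ties.

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.UUID;

public class TimeKeys {
    // Note: SimpleDateFormat is not thread-safe; use one per thread.
    static String timeKey(Date when) {
        return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS").format(when)
               + "-" + UUID.randomUUID();
    }
}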
On Mon, May 31, 2010 at 8:52 PM, Jonathan Ellis wrote:
> OPP uses lexical ordering on the keys, which isn't going to be the
> same as the natural order for a time-based uuid.
*palmface*
I see. But we could make this work if the random partitioner was random only
between nodes, but was still ordered within each node. (Or if there were
another partitioner that did this.) That way we could get everything we need
from each node separately. The results would not be ordered, but they wo
> So why do the "start" and "finish" range parameters exist?
Because especially if you want to iterate over all your keys (which, as
stated by Ben above, is the only meaningful way to use get_range_slices()
with the random partitioner), you'll
want to paginate that. And that's where the 'start' and
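A sketch of that pagination loop, assuming the 0.6 Thrift API; the names are placeholders, and the prefix check mirrors the CATEGORY example elsewhere in this thread:

import java.util.List;
import org.apache.cassandra.thrift.*;

public class KeyPager {
    // Walk every row with the random partitioner: keys arrive in token
    // order, so the only end condition is a short page. Each page restarts
    // from the last key seen (which comes back again and is skipped).
    static void scan(Cassandra.Client client) throws Exception {
        SlicePredicate keysOnly = new SlicePredicate();
        keysOnly.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 1));
        ColumnParent parent = new ColumnParent();
        parent.setColumn_family("Events");  // placeholder CF name
        String start = "";
        while (true) {
            KeyRange range = new KeyRange();
            range.setStart_key(start);
            range.setEnd_key("");           // empty end key = unbounded
            range.setCount(1000);           // page size
            List<KeySlice> page = client.get_range_slices(
                "Keyspace1", parent, keysOnly, range, ConsistencyLevel.ONE);
            for (KeySlice row : page) {
                if (row.getKey().equals(start))
                    continue;               // first row repeats the resume key
                if (row.getKey().startsWith("CATEGORY."))
                    System.out.println(row.getKey()); // client-side prefix filter
            }
            if (page.size() < 1000)
                break;                      // short page: we have seen everything
            start = page.get(page.size() - 1).getKey();
        }
    }
}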
They exist because when using OPP they are useful and make sense.
On Wed, Jun 2, 2010 at 8:59 AM, David Boxenhorn wrote:
> So why do the "start" and "finish" range parameters exist?
>
> On Wed, Jun 2, 2010 at 3:53 PM, Ben Browning wrote:
>>
>> Martin,
>>
>> On Wed, Jun 2, 2010 at 8:34 AM, Dr. Ma
So why do the "start" and "finish" range parameters exist?
On Wed, Jun 2, 2010 at 3:53 PM, Ben Browning wrote:
> Martin,
>
> On Wed, Jun 2, 2010 at 8:34 AM, Dr. Martin Grabmüller wrote:
> > I think you can specify an end key, but it should be a key which does
> > exist in your column family
Martin,
On Wed, Jun 2, 2010 at 8:34 AM, Dr. Martin Grabmüller wrote:
> I think you can specify an end key, but it should be a key which does exist
> in your column family.
Logically, it doesn't make sense to ever specify an end key with
random partitioner. If you specified a start key of "aaa"
Here is the relevant part of the previous thread:
Thank you. That is very good news. I can sort the results myself - what is
important is that I get them!
On Thu, May 13, 2010 at 2:42 AM, Vijay wrote:
If you use Random partitioner, you will *NOT* get row keys sorted. (Columns
are always sorted.)
That's crazy! I could artificially insert a key with just the prefix, as a
placeholder, but why can't Cassandra do that virtually?
On Wed, Jun 2, 2010 at 3:34 PM, Dr. Martin Grabmüller <martin.grabmuel...@eleven.de> wrote:
> I think you can specify an end key, but it should be a key which does
I think you can specify an end key, but it should be a key which does exist in
your column family.
But maybe I'm off the track here and someone else here knows more about this
key range stuff.
Martin
From: David Boxenhorn [mailto:da...@lookin2.com]
The keys will not be in any specific order when not using OPP, so, you
will never "get out of range" - you have to iterate over every single
key to find all keys that start with "CATEGORY". If you don't iterate
over every single key you run a chance of missing some. Obviously,
this kind of key rang
In other words, I should check the values as I iterate, and stop iterating
when I get out of range?
I'll try that!
On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller <martin.grabmuel...@eleven.de> wrote:
> When not using OPP, you should not use something like 'CATEGORY/' as the
> end key.
>
When not using OPP, you should not use something like 'CATEGORY/' as the end
key.
Use the empty string as the end key and limit the number of returned keys, as
you did with
the 'max' value.
If I understand correctly, the end key is used to generate an end token by
hashing it, and
there is not
The previous thread where we discussed this is called, "key is sorted?"
On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn wrote:
> I'm not using OPP. But I was assured on earlier threads (I asked several
> times to be sure) that it would work as stated below: the results would not
> be ordered, b
I'm not using OPP. But I was assured on earlier threads (I asked several
times to be sure) that it would work as stated below: the results would not
be ordered, but they would be correct.
On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt wrote:
> Sounds like you are not using an order preserving par
Sounds like you are not using an order preserving partitioner?
On Wed, Jun 2, 2010 at 13:48, David Boxenhorn wrote:
> Range search on keys is not working for me. I was assured in earlier threads
> that range search would work, but the results would not be ordered.
>
> I'm trying to get all the ro
Range search on keys is not working for me. I was assured in earlier threads
that range search would work, but the results would not be ordered.
I'm trying to get all the rows that start with "CATEGORY."
I'm doing:
String start = "CATEGORY.";
...
keyspace.getSuperRangeSlice(columnParent, slice
Is it possible to make a heterogeneous Cassandra cluster, with both Linux
and Windows nodes? I tried doing it and got
Error in ThreadPoolExecutor
java.lang.NullPointerException
Not sure if this is due to the Linux/Windows mix or something else.
Details below:
[r...@iqdev01 cassandra]# bin/c
Thanks Peter!
In my test application, for each record,
rowkey -> rand() * 4, about 64B
column * 20 -> rand() * 20, about 320B
I use batch_insert(rowkey, col*20) in thrift.
Kevin Yuan
From: Peter Schüller
To: user@cassandra.apache.org
Subject: [***SPAM***] Re:
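A sketch of the write path that test describes (one row key plus 20 columns per batch_insert call), assuming the 0.6 Thrift API; names and value sizes are placeholders, and batch_mutate supersedes batch_insert in later releases:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import org.apache.cassandra.thrift.*;

public class BatchWriter {
    static void writeRecord(Cassandra.Client client, String rowKey)
            throws Exception {
        Random rand = new Random();
        long now = System.currentTimeMillis();
        List<ColumnOrSuperColumn> columns = new ArrayList<ColumnOrSuperColumn>();
        for (int i = 0; i < 20; i++) {
            byte[] value = new byte[16];        // placeholder value size
            rand.nextBytes(value);
            ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
            cosc.setColumn(new Column(("col" + i).getBytes("UTF-8"), value, now));
            columns.add(cosc);
        }
        Map<String, List<ColumnOrSuperColumn>> cfmap =
            new HashMap<String, List<ColumnOrSuperColumn>>();
        cfmap.put("Standard1", columns);        // placeholder CF name
        client.batch_insert("Keyspace1", rowKey, cfmap, ConsistencyLevel.ZERO);
    }
}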
Since this thread has now gone on for a while...
As far as I can tell you never specify the characteristics of your
writes. Evaluating expected write throughput in terms of "MB/s to
disk" is pretty impossible if one does not know anything about the
nature of the writes. If you're expecting 50 MB,
Still seems to be memory.
However, it's hard to believe that constant writing (even of a great amount
of data) needs so much memory (16GB). The process is quite simple:
input_data -> memtable -> flush to disk
right? What does Cassandra need so much memory for?
Thanks!
On 2010-06-02 16:24 +0800, lwl wrote:
> N
>
> Has anyone experienced this sort of problem? It would be great to hear from
> anyone who has had experience with this sort of issue and/or suggestions for
> how to deal with it.
>
> Thanks, Eric
Yes, I did. The symptoms you described point to concurrent mode failure of
the GC. During this failure, concurr
Hi,
I tried,
1-consistency level ZERO
2-JVM heap 4GB
3-normal Memtable cache
and now I have about a 30% improvement.
However, I want to know if you have also done write/read benchmarks and
what the results were.
On 2010-06-02 11:35 +0800, lwl wrote:
> and, why did you set "JVM has 8G heap"?
> 8g, seems t