Re: High read latency

2010-06-04 Thread Sylvain Lebresne
As written in the third point of http://wiki.apache.org/cassandra/CassandraLimitations, right now, super columns are not indexed and are deserialized fully when you access them. Another way to put it is, you'll want to use super columns with only a relatively small number of columns in them. Because i

Re: Cassandra training Jun 18 in SF

2010-06-04 Thread Oleg Anastasjev
Jonathan Ellis writes: > > This will be Riptano's 6th training session (including the four we've > done that were on-site with a specific customer), and in my humble > opinion the material's really solid at this point. > > We are actively working on lining up other locations. > Do y

Re: High CPU Usage since 0.6.2

2010-06-04 Thread Lu Ming
I notice that there are more than 100 "CLOSE_WAIT" incoming connections on storage port 7000. On my two Cassandra nodes: 126 of 146 storage connections are "CLOSE_WAIT" and 196 of 217 storage connections are "CLOSE_WAIT". Is this normal? -- From: "Chris

Re: High read latency

2010-06-04 Thread Ma Xiao
It's really a pain to modify the data model. The problem is how to handle a "one-to-many" relation in Cassandra: the row size limit makes it impossible to store them all as columns. On Fri, Jun 4, 2010 at 4:13 PM, Sylvain Lebresne wrote: > As written in the third point of > http://w

Fatal exception in with compaction

2010-06-04 Thread casablinca126.com
Hi, I get a fatal exception with my Cassandra cluster: java.lang.NoClassDefFoundError: org/apache/cassandra/db/CompactionManager$4 at org.apache.cassandra.db.CompactionManager.submitMajor(CompactionManager.java:156) at org.apache.cassandra.db.Compa

Re: High CPU Usage since 0.6.2

2010-06-04 Thread Lu Ming
I took a thread dump on each Cassandra node and counted the "thread-xxx" threads whose call stack contains "at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)"; then I find a

Re: Fatal exception in with compaction

2010-06-04 Thread casablinca126.com
hi, I have not used nodetool repair or nodetool compact. So how is MajorCompaction triggered? -- casablinca126.com 2010-06-04 - From: casablinca126.com Sent: 2010-06-04 18:05:11 To: u

Re: High CPU Usage since 0.6.2

2010-06-04 Thread Lu Ming
Does the following code in IncomingStreamReader.read cause 100% CPU??? while (bytesRead < pendingFile.getExpectedBytes()) { bytesRead += fc.transferFrom(socketChannel, bytesRead, FileStreamTask.CHUNK_SIZE); pendingFile.update(bytesRead); } BTW: our cas
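
One possible reading of the question above, not confirmed in this thread: if `fc.transferFrom` returns 0 (its return value is documented as "possibly zero", for example when the source has no more bytes to give), `bytesRead` never advances and the loop spins at full CPU. The sketch below is a Python simulation of that loop shape with a guard added. `FakeSource`, `receive`, and the chunk size are hypothetical stand-ins, not Cassandra code.

```python
class FakeSource:
    """Hypothetical source that yields `available` bytes, then behaves as closed."""

    def __init__(self, available):
        self.available = available

    def transfer(self, max_bytes):
        # Returns the number of bytes moved, which is 0 once the source is drained.
        moved = min(max_bytes, self.available)
        self.available -= moved
        return moved


def receive(source, expected_bytes, chunk_size=64 * 1024):
    """Copy until expected_bytes arrive; fail instead of spinning if the source dries up."""
    bytes_read = 0
    while bytes_read < expected_bytes:
        moved = source.transfer(min(chunk_size, expected_bytes - bytes_read))
        if moved == 0:
            # Without this check the loop would repeat forever without progress.
            raise EOFError(
                "source closed after %d of %d bytes" % (bytes_read, expected_bytes)
            )
        bytes_read += moved
    return bytes_read
```

With the guard, a sender that disappears halfway through produces an error on the receiving side rather than a thread that never leaves the loop.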

Are 6..8 seconds to read 23.000 small rows - as it should be?

2010-06-04 Thread Per Olesen
Are 6..8 seconds to read 23.000 small rows - as it should be? I have a quick question on what I think is bad read performance for this simple setup: SCF:Dashboard key:username1 -> { SC:uniqStr1 -> { col1:val1, col2: val2, ... col8:val8 }, SC:uniqStr2 -> { col1:val1, col2: v

Re: Cassandra training Jun 18 in SF

2010-06-04 Thread S Ahmed
Nice! Would it be possible to give more than 2 weeks notice for the following events? Preferably a month, it's not that easy to get off work etc. On Fri, Jun 4, 2010 at 4:22 AM, Oleg Anastasjev wrote: > Jonathan Ellis writes: > > > > > This will be Riptano's 6th training session (in

RE: Cassandra Cluster Setup

2010-06-04 Thread Stephan Pfammatter
Tx Nahor/Gary/Ben. I blew everything away, adjusted the seed to Cassandra-ca and changed the replication factor. Now I can see the sample keys being spread across the 3 nodes: Cassandra-ca holds rows (keys: 1,5,8) Cassandra-az holds rows (keys: 11,55) Cassandra-or holds rows (keys: 88) I don't se

Re: High CPU Usage since 0.6.2

2010-06-04 Thread Gary Dusbabek
Chris, Can you get me a stack dump of one of the busy nodes (kill -3)? Gary On Thu, Jun 3, 2010 at 22:50, Chris Goffinet wrote: > We're seeing this as well. We were testing with a 40+ node cluster on the > latest 0.6 branch from few days ago. > > -Chris > > On Jun 3, 2010, at 9:55 PM, Lu Ming

RE: Cassandra training Jun 18 in SF

2010-06-04 Thread Parsacala Jr, Nazario R. [Tech]
Do you have details on this ..? From: S Ahmed [mailto:sahmed1...@gmail.com] Sent: Friday, June 04, 2010 9:50 AM To: user@cassandra.apache.org Subject: Re: Cassandra training Jun 18 in SF Nice! Would it be possible to give more than 2 weeks notice for the following events? Preferably a month, i

Re: Are 6..8 seconds to read 23.000 small rows - as it should be?

2010-06-04 Thread Jonathan Ellis
get_slice reads a single row. do you mean there are 23,000 columns, or are you running get_slice in a loop 23000 times? On Fri, Jun 4, 2010 at 4:59 AM, Per Olesen wrote: > Are 6..8 seconds to read 23.000 small rows - as it should be? > > I have a quick question on what I think is bad read perfor

Re: Are 6..8 seconds to read 23.000 small rows - as it should be?

2010-06-04 Thread Per Olesen
On Jun 4, 2010, at 4:46 PM, Jonathan Ellis wrote: > get_slice reads a single row. do you mean there are 23,000 columns, > or are you running get_slice in a loop 23000 times? Hi Jonathan, thanks for answering! No, I do only one get_slice call. There are 23.000 SUPER columns, which I read using

http://voltdb.com/ ?

2010-06-04 Thread Denis Haskin
Anybody looked at VoltDB? I haven't dug into it, but curious about it. dwh

Re: [***SPAM*** ] Re: question about class SlicePredicate

2010-06-04 Thread David Boxenhorn
It works for Random Partitioner only if you want to get all keys. 2010/6/4 Shuai Yuan > It's documented that get_range_slice() supports all partitioners in 0.6 > > Kevin > > Original message > From: Olivier Mallassi > To: user@cassandra.apache.org > Subject: [***SPAM*** ] Re: question about cl

RE: http://voltdb.com/ ?

2010-06-04 Thread Jones, Nick
I saw a tweet claiming far better performance than Cassandra. After following up, I found out it requires the entire DB to reside in memory across the nodes. Nick Jones From: Denis Haskin [mailto:de...@haskinferguson.net] Sent: Friday, June 04, 2010 10:17 AM To: user Subject: http://voltdb

Re: Are 6..8 seconds to read 23.000 small rows - as it should be?

2010-06-04 Thread Ben Browning
How many subcolumns are in each supercolumn and how large are the values? Your example shows 8 subcolumns, but I didn't know if that was the actual number. I've been able to read columns out of Cassandra at an order of magnitude higher than what you're seeing here but there are too many variables t

Re: Cassandra Cluster Setup

2010-06-04 Thread Gary Dusbabek
Great. It looks as if the replication factor is still 1, not 3. This means that each key lives only on one node. By increasing it to 3, data will be replicated across all 3 nodes. Gary. On Fri, Jun 4, 2010 at 06:51, Stephan Pfammatter wrote: > Tx Nahor/Gary/Ben. I blew everything away, adjus
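
To illustrate the point about replication factor, here is a minimal sketch of ring placement. It assumes the simplest strategy: the first node at or after the key's token, then the next nodes clockwise. The token values are invented for illustration, and `replicas_for` is a hypothetical helper, not a Cassandra API.

```python
def replicas_for(key_token, node_tokens, replication_factor):
    """Nodes holding a key: first node at or after its token, then the next ones clockwise."""
    ring = sorted(node_tokens.items(), key=lambda item: item[1])  # (name, token) pairs
    # Wrap around to the first node when the key token is past the last node token.
    start = next((i for i, (_, tok) in enumerate(ring) if tok >= key_token), 0)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][0] for i in range(count)]


# Hypothetical tokens for the three nodes named in this thread.
nodes = {"Cassandra-ca": 0, "Cassandra-az": 100, "Cassandra-or": 200}
one_copy = replicas_for(50, nodes, 1)     # a single node holds the key
three_copies = replicas_for(50, nodes, 3)  # every node holds the key
```

With a factor of 1 each key appears on exactly one node, which matches the key spread reported in the quoted message; with 3 on a 3-node ring, every node holds every key.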

RE: Cassandra Cluster Setup

2010-06-04 Thread Stephan Pfammatter
Nice catch Gary. Just realized that replication factor is granular to key space. -Original Message- From: Gary Dusbabek [mailto:gdusba...@gmail.com] Sent: Friday, June 04, 2010 11:49 AM To: user@cassandra.apache.org Subject: Re: Cassandra Cluster Setup Great. It looks as if the replic

Re: Fatal exception in with compaction

2010-06-04 Thread Stu Hood
A "major" compaction is any compaction that sees all of the sstables for a column family. In the context of the method you edited, that means that all of the SSTables fall into a single bucket, and can be compacted together. -Original Message- From: "casablinca126.com" Sent: Friday, Jun

Embedded usage

2010-06-04 Thread Sten Roger Sandvik
Hi. I have looked at cassandra before and now I'm revisiting the project :-) At the project I am working on we need a fast storage for blobs and lucene indexes that is available on each node in the cluster. Cassandra seems to fit very well for the blob storage and cassandra/lucandra for the indexi

Re: Handling disk-full scenarios

2010-06-04 Thread Ian Soboroff
Story continued, in hopes this experience is useful to someone... I shut down the node, removed the huge file, restarted the node, and told everybody to repair. Two days later, AE stages are still running. Ian On Thu, Jun 3, 2010 at 2:21 AM, Jonathan Ellis wrote: > this is why JBOD configurat

Re: Embedded usage

2010-06-04 Thread Jonathan Ellis
look at o.a.c.service.EmbeddedCassandraService On Fri, Jun 4, 2010 at 9:29 AM, Sten Roger Sandvik wrote: > Hi. > > I have looked at cassandra before and now I'm revisiting the project :-) At > the project I am working on we need a fast storage for blobs and lucene > indexes that is available on e

Re: Embedded usage

2010-06-04 Thread Sten Roger Sandvik
2010/6/4 Jonathan Ellis > look at o.a.c.service.EmbeddedCassandraService > > Yes. Looked at this and the CassandraDaemon. But it seems that it's not possible to create the configuration programmatically (or create the actual config file and pass it in). /srs

Re: Giant sets of ordered data

2010-06-04 Thread Benjamin Black
Use index rows named for time intervals that contain columns named for the row keys of the base data rows from each interval. b On Wed, Jun 2, 2010 at 8:32 AM, David Boxenhorn wrote: > How do I handle giant sets of ordered data, e.g. by timestamps, which I want > to access by range? > > I can't
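
The index-row pattern described above can be sketched with plain dictionaries standing in for two column families. This is an illustration under assumptions: hourly intervals, integer timestamps, and hypothetical names (`base_rows`, `index_rows`, `insert`, `read_range`), none of which come from the thread.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two column families.
base_rows = {}                  # base row key -> data
index_rows = defaultdict(dict)  # interval name -> {base row key: timestamp}


def interval_name(ts, width=3600):
    """Name of the index row covering timestamp ts (hourly buckets by default)."""
    return str(ts - ts % width)


def insert(row_key, ts, data):
    base_rows[row_key] = data
    # In the index row, the column name is the base row key.
    index_rows[interval_name(ts)][row_key] = ts


def read_range(start_ts, end_ts, width=3600):
    """Return base rows with start_ts <= ts < end_ts, oldest first."""
    hits = []
    bucket = start_ts - start_ts % width
    while bucket < end_ts:
        # Each bucket is one bounded index row, so no single row grows without limit.
        for row_key, ts in index_rows.get(str(bucket), {}).items():
            if start_ts <= ts < end_ts:
                hits.append((ts, row_key))
        bucket += width
    return [base_rows[key] for _, key in sorted(hits)]
```

A range read touches only the index rows whose intervals overlap the range, then fetches the named base rows. The interval width trades index row size against the number of rows read per query.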

Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Philip Stanhope
Here's the scenario: would like R = N where N is the number of nodes. Let's say 8. 1. Create first node, modify storage-conf.xml and change the to be the ip of the node. Change replication factor to 8 for CF of interest. Start the puppy up. 2. Create 2nd node, modify storage-conf.xml and ch

Re: Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Benjamin Black
On Fri, Jun 4, 2010 at 10:36 AM, Philip Stanhope wrote: > > Here's the scenario: would like R = N where N is the number of nodes. Let's > say 8. > > 1. Create first node, modify storage-conf.xml and change the to be > the ip of the node. Change replication factor to 8 for CF of interest. Start

Re: Embedded usage

2010-06-04 Thread Ran Tavory
Cassandra expects a config file and does not expose an alternative API, for this file, that's correct. I think it's not hard to add such API but so far the demand for it didn't exist. On Jun 4, 2010 8:01 PM, "Sten Roger Sandvik" wrote: 2010/6/4 Jonathan Ellis > > look at o.a.c.service.Embedd

Re: Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Philip Stanhope
Thanks on the correction about Keyspace versus ColumnFamily ... I knew that just mis-typed. I guess it should be stated (to be obvious) ... that when you are auto bootstrapping a node ... the seed better be alive. The scenario I'm dealing with is that it might not be (reasons for that are tange

Re: Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Benjamin Black
On Fri, Jun 4, 2010 at 11:04 AM, Philip Stanhope wrote: > > I am contemplating a situation where there may be 2N servers ... but only N > online at any one time. But, for operational purposes, N+n (where n is 1 or > 2), N may be occasionally greater than R. > Then Cassandra is probably not the

Keyspace with single CF or Keyspace per CF

2010-06-04 Thread Philip Stanhope
This is a data modeling question, not operational like my previous ones today. I have a data model where I'm going to have some 1:1 relationship between a CF1/Key1/Col value and another CF2/Key where the value in CF1/Key1/Col is the CF2/Key. CF1 will grow to have 1B+ keys. CF1 will have 10...N

Re: Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Philip Stanhope
I guess I'm thick ... What would be the right choice? Our data demands have already been proven to scale beyond what RDB can handle for our purposes. We are quite pleased with Cassandra read/write/scale out. Just trying to understand the operational considerations. On Jun 4, 2010, at 2:11 PM,

Re: Are 6..8 seconds to read 23.000 small rows - as it should be?

2010-06-04 Thread Per Olesen
On Jun 4, 2010, at 5:19 PM, Ben Browning wrote: > How many subcolumns are in each supercolumn and how large are the > values? Your example shows 8 subcolumns, but I didn't know if that was > the actual number. I've been able to read columns out of Cassandra at > an order of magnitude higher than

Re: Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Benjamin Black
On Fri, Jun 4, 2010 at 11:14 AM, Philip Stanhope wrote: > I guess I'm thick ... > > What would be the right choice? Our data demands have already been proven to > scale beyond what RDB can handle for our purposes. We are quite pleased with > Cassandra read/write/scale out. Just trying to underst

Re: Embedded usage

2010-06-04 Thread Sten Roger Sandvik
2010/6/4 Ran Tavory > Cassandra expects a config file and does not expose an alternative API, for > this file, that's correct. > I think it's not hard to add such API but so far the demand for it didn't > exist. > I see that making a config api is not that hard. Will probably take a stab at it :-

Expected wait while bootstrapping?

2010-06-04 Thread Aaron Lav
I'm trying to bring up a new node. The log on the new node says: INFO 19:00:47,794 Joining: getting bootstrap token INFO 19:00:49,003 New token will be 98546264352590127933376074587615077134 to assume load from /xx.xxx.xx.xxx I had expected to see messages about the beginning of anticompaction

Re: Expected wait while bootstrapping?

2010-06-04 Thread Gary Dusbabek
Most of the streaming messages are DEBUG, so you'll have to amp up logging. Gary. On Fri, Jun 4, 2010 at 12:26, Aaron Lav wrote: > I'm trying to bring up a new node.  The log on the new node says: > > INFO 19:00:47,794 Joining: getting bootstrap token > INFO 19:00:49,003 New token will be 98546

Re: Are 6..8 seconds to read 23.000 small rows - as it should be?

2010-06-04 Thread Torsten Curdt
> Yes, I know. And I might end up doing this in the end. I do though have > pretty hard upper limits of how many rows I will end up with for each key, > but anyways it might be a good idea none the less. Thanks for the advice on > that one. You set count to Integer.MAX. Did you try with say 300

Re: Expected wait while bootstrapping?

2010-06-04 Thread Aaron Lav
On Fri, Jun 04, 2010 at 12:35:51PM -0700, Gary Dusbabek wrote: > Most of the streaming messages are DEBUG, so you'll have to amp up logging. I've upped logging on the bootstrapping node, and I realize that it's trying to assume load from two nodes. The other node (ie the one not mentioned in the

Re: Are 6..8 seconds to read 23.000 small rows - as it should be?

2010-06-04 Thread Mike Malone
> > Yes, I know. And I might end up doing this in the end. I do though have > pretty hard upper limits of how many rows I will end up with for each key, > but anyways it might be a good idea none the less. Thanks for the advice on > that one. > > You set count to Integer.MAX. Did you try with say 3

Re: Column or SuperColumn

2010-06-04 Thread Jonathan Ellis
if you have a relatively small, static set of subcolumns, that you read as a group, then using supercolumns is reasonable On Tue, Jun 1, 2010 at 7:33 PM, Peter Hsu wrote: > I have a pretty simple data modeling question.  I don't know whether or not > to use a CF or SCF in one instance. > > Here'

Row Time range

2010-06-04 Thread Nicholas Sun
Is there a mechanism to select a time range within a row range query? Is this planned? For example, return to me the last 10 posts starting at 7:00pm yesterday? Nick

strange load balancing across three nodes

2010-06-04 Thread Mike Subelsky
Hello everyone, One of my nodes has a much higher load (10x) than the other two nodes. I don't think it's because a few keys have a lot more columns than others -- the keys are well distributed and I'm using the random partitioner. Could someone point me in the direction of what should I be chec

Conditional get

2010-06-04 Thread Lev Stesin
Hi, I am not sure how to implement multiget or slice_range based on a conditional predicate. For example, what if I want to get only keys containing certain columns? Thanks. -- Lev

Strage Read Perfoamnce 1xN column slice or N column slice

2010-06-04 Thread Arya Goudarzi
Hi Fellows, I have the following design for a system which holds basically key->value pairs (aka Columns) for each user (SuperColumn Key) in different namespaces (SuperColumnFamily row key). Like this: Namespace->user->column_name = column_value; keyspaces: - name: NKVP replica_placeme

Re: Expected wait while bootstrapping?

2010-06-04 Thread Aaron Lav
On Fri, Jun 04, 2010 at 12:35:51PM -0700, Gary Dusbabek wrote: > Most of the streaming messages are DEBUG, so you'll have to amp up logging. I upped the logging to DEBUG on the bootstrapping node and the nodes being bootstrapped from, and the bootstrap completed fine, so I'm not sure what was goin

Re: Fatal exception in with compaction

2010-06-04 Thread Jonathan Ellis
as the stacktrace suggests, HintedHandoffManager does major compactions of just the hints columnfamily after hint delivery 2010/6/4 casablinca126.com : > hi, >I have not used nodetool repair or nodetool compact . So how is > MajorCompaction triggered? > > -- > casablinca12

Re: strange load balancing across three nodes

2010-06-04 Thread Jonathan Ellis
The sections on ring management and token selection on http://wiki.apache.org/cassandra/Operations will help. On Fri, Jun 4, 2010 at 2:27 PM, Mike Subelsky wrote: > Hello everyone, > > One of my nodes has a much higher load (10x) than the other two nodes. >  I don't think it's because a few keys

Re: Row Time range

2010-06-04 Thread Benjamin Black
That's entirely up to you. If you make row keys that are time ordered and include the time as a prefix in the key, you just use get_range() as usual, start now, end 7pm yesterday, count of 10. On Fri, Jun 4, 2010 at 2:23 PM, Nicholas Sun wrote: > Is there a mechanism to select a time range withi
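
A minimal sketch of the time-prefixed key idea, using a sorted Python list in place of an ordered key range. It assumes keys are stored in sorted order (which in Cassandra would require an order-preserving partitioner) and a zero-padded timestamp prefix; `make_key` and `last_n_before` are hypothetical helpers, not Cassandra calls.

```python
import bisect


def make_key(ts, post_id):
    # Zero-padded timestamp prefix so that string order matches time order.
    return "%012d:%s" % (ts, post_id)


def last_n_before(sorted_keys, ts, n):
    """Up to n keys with timestamp <= ts, newest first."""
    # ';' sorts immediately after ':', so this bounds every key whose prefix is ts.
    upper = "%012d;" % ts
    end = bisect.bisect_left(sorted_keys, upper)
    return sorted_keys[max(0, end - n):end][::-1]


keys = sorted(make_key(t, "p%d" % t) for t in [10, 20, 30, 40])
recent = last_n_before(keys, 30, 2)
```

Here `recent` holds the keys for timestamps 30 and 20, in that order. The zero padding matters: without it, a key for timestamp 9 would sort after one for timestamp 10.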

Performance Characteristics of CASSANDRA-16 (Memory Efficient Compactions)

2010-06-04 Thread Jeremy Davis
https://issues.apache.org/jira/browse/CASSANDRA-16 Can someone (Jonathan?) help me understand the performance characteristics of this patch? Specifically: If I have an open ended CF, and I keep inserting with ever increasing column names (for example current Time), will things generally work out

Re: Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Jonathan Shook
If I may ask, why the need for frequent topology changes? On Fri, Jun 4, 2010 at 1:21 PM, Benjamin Black wrote: > On Fri, Jun 4, 2010 at 11:14 AM, Philip Stanhope wrote: >> I guess I'm thick ... >> >> What would be the right choice? Our data demands have already been proven to >> scale beyond

Re: Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread pstanhope
I never said it would be frequent. That was an assumption made by Ben. I am trying to understand how to "set the dials" to ensure availability and durability ... And understand the cost when the inevitable hardware failure occurs. Sent via BlackBerry from T-Mobile -Original Message-

Re: Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Benjamin Black
On Fri, Jun 4, 2010 at 6:51 PM, wrote: > I never said it would be frequent. That was an assumption made by Ben. > You indicated in an earlier email that you expected half the nodes to be offline at any time. It is unclear how you expected that to work for either the consistency processes or the

Re: Is there any way to detect when a node is down so I can failover more effectively?

2010-06-04 Thread Patricio Echagüe
Thanks Jonathan On Wed, Jun 2, 2010 at 11:17 PM, Jonathan Ellis wrote: > you're overcomplicating things. > > just connect to *a* node, and if it happens to be down, try a different > one. > > nodes being down should be a rare event, not a normal condition. no > need to optimize for it so much.
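
The quoted advice (connect to a node, and if it is down try a different one) fits in a few lines. In this sketch `connect` is a hypothetical callable supplied by the caller, for example one that opens a client connection to a host, and it is assumed to raise `OSError` when the node is unreachable.

```python
def connect_to_any(hosts, connect):
    """Try each host in turn and return the first connection that succeeds."""
    last_error = None
    for host in hosts:
        try:
            return connect(host)
        except OSError as err:
            # Node down or unreachable: remember the error and try the next one.
            last_error = err
    raise ConnectionError("no node reachable") from last_error
```

There is no health tracking and no background detection here, in keeping with the point that a down node should be a rare event rather than something to optimize for.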

Can't start up Cassnadra service.

2010-06-04 Thread Ma Xiao
Cassandra can't start its service and fails with the following error. What's wrong with it? ERROR 14:28:22,631 Exception encountered during startup. java.lang.StringIndexOutOfBoundsException: String index out of range: -1 at java.lang.String.substring(String.java:1937) at org.apache.cassandra.d