Re: 0.6.1 insert 1B rows, crashed when using py_stress

2010-04-20 Thread Benjamin Black
Not so reasonable, given what you are trying to accomplish. A 1GB heap (on a 2GB machine) is fine for development and functional testing, but I wouldn't try to deal with the number of rows you are describing with less than 8GB/node with 4-6GB heap. b On Mon, Apr 19, 2010 at 7:32 PM, Ken Sandney

How to increase cassandra's performance in read?

2010-04-20 Thread yangfeng
I fetch 10 column families by key, and one column family has 30 columns. I use multigetSlice once to get the 10 column families, but the performance is very poor. Does anyone have other ideas for improving the performance?

RE: Cassandra Java Client

2010-04-20 Thread Dop Sun
Hi, I have downloaded hector-0.6.0-10.jar. As you mentioned, it has a good implementation of connection pooling and JMX counters. What I’m doing is: using Hector to create the Cassandra client (to be specific: borrow_client(url, port)). And my understanding is: in this way, the Jassandra wi

Re: 0.6.1 insert 1B rows, crashed when using py_stress

2010-04-20 Thread Eric Evans
On Tue, 2010-04-20 at 10:39 +0800, Ken Sandney wrote: > Sorry I just don't know how to resolve this :) > On Tue, Apr 20, 2010 at 10:37 AM, Jonathan Ellis > wrote: > > > Ken, I linked you to the FAQ answering your problem in the

Re: tcp CLOSE_WAIT bug

2010-04-20 Thread Ingram Chen
I traced the IncomingStreamReader source and found that the incoming socket comes from MessagingService$SocketThread, but there is no close() call on either the accepted socket or the socketChannel. Should I file a bug report? On Tue, Apr 20, 2010 at 11:02, Ingram Chen wrote: > this happened after several hour

RE: How to increase cassandra's performance in read?

2010-04-20 Thread Mark Jones
I too am seeing very slow performance while testing worst case scenarios of 1 key leading to 1 supercolumn and 1 column beyond that. Key -> SuperColumn -> 1 Column (of ~ 500 bytes) Drive utilization is 80-90% and I'm only dealing with 50-70 million rows. (With NO swapping) So far, I've found

Tool for managing cluster nodes?

2010-04-20 Thread Joost Ouwerkerk
What are people using to manage Cassandra cluster nodes? i.e. to start, stop, copy config files, etc. I'm using cssh and wondering if there is a better way... Joost.

Re: Tool for managing cluster nodes?

2010-04-20 Thread Roger Schildmeijer
dancer's shell / distributed shell On 20 apr 2010, at 17.18em, Joost Ouwerkerk wrote: > What are people using to manage Cassandra cluster nodes? i.e. to start, > stop, copy config files, etc. I'm using cssh and wondering if there is a > b

Re: How to increase cassandra's performance in read?

2010-04-20 Thread Jonathan Ellis
How many columns are in the supercolumn total? "in super columnfamilies there is a third level of subcolumns; these are not indexed, and any request for a subcolumn deserializes _all_ the subcolumns in that supercolumn" On Tue, Apr 20, 2010 a

RE: How to increase cassandra's performance in read?

2010-04-20 Thread Mark Jones
When I first read this, it bothered me because it seemed like it couldn't be so. So I read the link, and it says the whole thing, so I have to ask for some clarification here. I had always assumed a super column was similar to a local keyspace, and that the SubColumns under it were similar to

RE: How to increase cassandra's performance in read?

2010-04-20 Thread Mark Jones
Sorry, I didn't answer your question in my response, I have at this point: Key(ID) When/Where SuperColumn Tag: and Columns {Data: One Value (not yet written, tags, flags)} Under some keys (very small #) there will be 2 values like: Key(ID) When/Where SuperColumn Tag: and Columns {Da

Re: How to increase cassandra's performance in read?

2010-04-20 Thread Jonathan Ellis
Not all the data associated w/ the key is brought into memory, just all the data associated w/ the supercolumns being queried. Supercolumns are so you can update a smallish number of subcolumns independently (e.g. when denormalizing an entire narrow row, usually with a finite set of columns). If

RE: How to increase cassandra's performance in read?

2010-04-20 Thread Mark Jones
To make sure I'm clear on what you are saying: Are the "Individual Emails" in the example below, Supercolumns and the {body, header, tags...} the subcolumns? Is that a sane data layout for an email system? Where the Supercolumn identifier is the "conversation label" Sorry to be so daft, but

Re: Modelling assets and user permissions

2010-04-20 Thread tsuraan
> Suppose I have a CF that holds some sort of assets that some users of > my program have access to, and that some do not.  In SQL-ish terms it > would look something like this: > > TABLE Assets ( >  asset_id serial primary key, >  ... > ); > > TABLE Users ( >  user_id serial primary key, >  user_n


Filters

2010-04-20 Thread Christian Torres
Hello! Is there any way to do filters (WHEREs) in Cassandra, or do I have to manage it myself? For example: I have a ColumnFamily with a column in each row whose value is a state, Public or Private, so I want to filter all rows that are private and, separately, the public ones... Beside in

Re: 0.6.1 insert 1B rows, crashed when using py_stress

2010-04-20 Thread Tatu Saloranta
On Mon, Apr 19, 2010 at 7:12 PM, Brandon Williams wrote: > On Mon, Apr 19, 2010 at 9:06 PM, Schubert Zhang wrote: >> >> 2. Reject the request when be short of resource, instead of throws OOME >> and exit (crash). > > Right, that is the crux of the problem  It will be addressed here: > https://iss

RE: Filters

2010-04-20 Thread Mark Jones
You will have to pull the columns and filter yourself. From: Christian Torres [] Sent: Tuesday, April 20, 2010 11:50 AM To: Cc: Subject: Filters Hello! Is there any way to make filters (WHEREs) in cassandra? Or I have t

RE: 0.6.1 insert 1B rows, crashed when using py_stress

2010-04-20 Thread Mark Jones
I would think this is on the roadmap, just not available yet. It can be managed by adjusting the Heap size (to a large degree). -Original Message- From: Tatu Saloranta [] Sent: Tuesday, April 20, 2010 12:18 PM To: Subject: Re: 0.6.1 in

Re: Re: Modelling assets and user permissions

2010-04-20 Thread charleswoerner
The short answer as to what people normally do is that they use a relational database for something like this. I'm curious as to how you would have so many asset / user permissions that you couldn't use a standard relational database to model them. Is this some sort of multi-tenant system w

Re: Tool for managing cluster nodes?

2010-04-20 Thread B. Todd Burruss
Roger Schildmeijer wrote: dancer's shell / distributed shell On 20 apr 2010, at 17.18em, Joost Ouwerkerk wrote: What are people using to manage Cassandra cluster nodes? i.e. to s

Re: Filters

2010-04-20 Thread Christian Torres
Mmmm... According to this doc that a developer mailed to me, it's possible!! I'm sending it to you as a reference. On Tue, Apr 20, 2010 at 11:17 AM, Mark Jones wrote: > You will have to pull the columns and filter yourself. > > > > *From:* Christian Torres [m

Re: Cassandra Java Client

2010-04-20 Thread Nathan McCall
Dop, Thank you for trying out hector. I think you have the right approach for using it with your project. Feel free to ping us directly regarding Hector on either of these mailing lists as appropriate: Cheers, -Nate On Tue, Apr 20, 2010 at 7:11

Re: Cassandra Java Client

2010-04-20 Thread Ran Tavory
great, I'm happy you found hector useful and reused it in your client. On Tue, Apr 20, 2010 at 5:11 PM, Dop Sun wrote: > Hi, > > > > I have downloaded hector-0.6.0-10.jar. As you mentioned, it has good > implementation for the connection pooling, JMX counters. > > > > What I’m doing is: using H

RE: Filters

2010-04-20 Thread Mark Jones
If you notice, the SlicePredicate accepts column names, but not values. You can tell it to pull these 3 columns, but there is no "if/where" in there. SliceRange is, I think, based on the key, since it doesn't have a way to pair up column names/values From: Christian Torres [

Re: Filters

2010-04-20 Thread Miguel Verde
get_slice retrieves the values for either (a) a list of column names or (b) a range of columns, depending on the SlicePredicate you use. It does not allow you to filter a la SQL's WHERE. You would need to create your own index to do so, at least unti

Re: Filters

2010-04-20 Thread Roger Schildmeijer
My bad. Missed your one-to-one relationship (row key <-> column ) On 20 apr 2010, at 19.24em, Christian Torres wrote: > Mmmm... > > According with this doc that a > developer mailed to me It's possible!! > > I sent you as reference > > On Tue, Ap

cleaning house

2010-04-20 Thread B. Todd Burruss
i'm trying to draw some correlation between the size of my data and the space used on disk. i have set 1 so there isn't any reason to keep data around. my approach is this: after only doing "puts" to cassandra for a while i stop my client and want to perform the proper "cleanup" and/or "comp

Re: 0.6 insert performance .... Re: [RELEASE] 0.6.1

2010-04-20 Thread Masood Mortazavi
You're welcome Schubert. I look forward to any new results you may come up with. { It would also be interesting, when you run your tests again, to look at the GC logs and see to what extent is the culprit for what you will see. Identifying any ot

Re: Re: Modelling assets and user permissions

2010-04-20 Thread tsuraan
> I'm curious as to how you would have so many asset / user permissions that > you couldn't use a standard relational database to model them. Is this some > sort of multi-tenant system where you're providing some generalized asset > check-out mechanism to many, many customers? Even so, I'm not sure

Delete row

2010-04-20 Thread Sonny Heer
How do I delete a row using the BMT method? Do I simply do a mutate with the column delete flag set to true? Thanks.

Re: cleaning house

2010-04-20 Thread Benjamin Black
Are you deleting data through the API or just doing a bunch of inserts and then running a compaction? The latter will not result in anything to clean up since data must be explicitly deleted. b On Tue, Apr 20, 2010 at 10:33 AM, B. Todd Burruss wrote: > i'm trying to draw some correlation betwe

Re: cleaning house

2010-04-20 Thread Jonathan Ellis
Added to SSTables that are obsoleted by a compaction are deleted asynchronously when the JVM performs a GC. You can force a GC from jconsole if necessary but this is not necessary; Cassandra will force one itself if it detects that it is low on sp

Re: cleaning house

2010-04-20 Thread B. Todd Burruss
i have done no deletes, just inserts. so you are correct, there isn't any "data" to cleanup. however when i run some of the cleanup and/or compaction tasks the space used on disk actually grows, and i would like to force any unneeded files to be removed. as i write this, jonathan has respond

Re: cleaning house

2010-04-20 Thread B. Todd Burruss
thx, that did the trick. Jonathan Ellis wrote: Added to SSTables that are obsoleted by a compaction are deleted asynchronously when the JVM performs a GC. You can force a GC from jconsole if necessary but this is not necessary; Cassandra will f

Re: How to increase cassandra's performance in read?

2010-04-20 Thread Benjamin Black
I can't answer for its sanity, but I would not do it that way. I'd have a CF for Emails, with 1 email per row, and another CF for UserEmails with per-user index rows referencing the Emails rows. b On Tue, Apr 20, 2010 at 9:44 AM, Mark Jones wrote: > To make sure I'm clear on what you are sayin
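The two-column-family layout suggested above can be sketched in plain Python. This is only an illustration under stated assumptions: the dicts `emails` and `user_emails` and the helpers `store_email` and `emails_for` are hypothetical stand-ins for the Emails and UserEmails column families, not Cassandra or Thrift calls.

```python
# Hypothetical in-memory stand-ins for two column families.
emails = {}       # Emails CF: email id -> {"header": ..., "body": ...}
user_emails = {}  # UserEmails CF: user id -> {email id: ""} (index row)


def store_email(email_id, header, body, recipients):
    """Write one row per email, then add it to each recipient's index row."""
    emails[email_id] = {"header": header, "body": body}
    for user in recipients:
        # The column name carries the email id; the value is unused.
        user_emails.setdefault(user, {})[email_id] = ""


def emails_for(user):
    """Read the user's index row, then look up each referenced email row."""
    index_row = user_emails.get(user, {})
    return [emails[email_id] for email_id in sorted(index_row)]
```

Reading a mailbox in this sketch costs one read of the index row plus one lookup per referenced email.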

Re: get_range_slices in hector

2010-04-20 Thread Ran Tavory
We haven't gotten around to implementing this yet and so far no one needed that badly enough to write it. We accept contributions or forks and we use github, so feel free to diy (forks are preferable). On Tue, Apr 20, 2010 at 3:25 AM, Chris Dean wrote: > Ok, thank

RE: How to increase cassandra's performance in read?

2010-04-20 Thread Mark Jones
When I look at this arrangement, I see one lookup by key for the user, followed by a large read for all the "email indexes" (these are all columns in the same row, right?) Then one lookup by key for each email. Seems very seek intensive. Would a better way be to index each email with a ke

Re: Re: Modelling assets and user permissions

2010-04-20 Thread Vick Khera
On Tue, Apr 20, 2010 at 1:37 PM, tsuraan wrote: > The assets are binary files on a document tracking system.  Our > current platform is postgres-backed; the entire system we've written > is fairly easily distributed across multiple computers, but postgres > isn't.  There are reliable databases tha

Re: Filters

2010-04-20 Thread Christian Torres
So the suggestion would be to create a column family with the values or states, and save the matches as columns? On Tue, Apr 20, 2010 at 11:27 AM, Roger Schildmeijer wrote: > My bad. Missed your one-to-one relationship (row key <-> column > > ) > On 20 apr 2010, at 19.24em, Christian Torres wrote:

Re: Filters

2010-04-20 Thread Christian Torres
And the key would be the state or value matched; am I understanding it correctly? On Tue, Apr 20, 2010 at 2:46 PM, Christian Torres wrote: > So the suggestion would be create a column family with the values or states > and with columns save the matches? > > > On Tue, Apr 20, 2010 at 11:27 AM, Roger Schildmeij
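The hand-built index discussed in this thread (a second column family keyed by the state, whose columns name the matching rows) might look roughly like the following. It is a minimal sketch using plain dicts; `items`, `items_by_state`, `put_item` and `keys_with_state` are hypothetical names for illustration, not part of any Cassandra client.

```python
# Hypothetical stand-ins for two column families, modelled as dicts.
items = {}           # data CF: row key -> {"state": "public" | "private"}
items_by_state = {}  # index CF: state -> {row key: ""}


def put_item(row_key, state):
    """Write the data row and keep the index row for its state in step."""
    old = items.get(row_key)
    if old is not None:
        # Drop the stale index column when a row changes state.
        items_by_state.get(old["state"], {}).pop(row_key, None)
    items[row_key] = {"state": state}
    items_by_state.setdefault(state, {})[row_key] = ""


def keys_with_state(state):
    """The 'WHERE state = ...' query becomes a read of one index row."""
    return sorted(items_by_state.get(state, {}))
```

The application has to update both structures on every write; nothing here maintains the index automatically.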

Re: Re: Modelling assets and user permissions

2010-04-20 Thread tsuraan
> It seems to me you might get by with putting the actual assets into > cassandra (possibly breaking them up into chunks depending on how big > they are) and storing the pointers to them in Postgres along with all > the other metadata.  If it were me, I'd split each file into a fixed > chunksize an
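As a rough illustration of the fixed-size chunking idea quoted above, the sketch below splits a byte string into chunks and reassembles it. The key scheme (`file_id:chunk_number`, zero-padded so keys sort in order) and the function names are assumptions made for this example; the truncated message does not say which key format was proposed.

```python
def split_into_chunks(file_id, data, chunk_size):
    """Map hypothetical chunk keys to consecutive slices of `data`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = {}
    for offset in range(0, len(data), chunk_size):
        # Zero-padded chunk number keeps lexical and numeric order equal.
        key = f"{file_id}:{offset // chunk_size:08d}"
        chunks[key] = data[offset:offset + chunk_size]
    return chunks


def reassemble(chunks):
    """Concatenate chunks in key order to recover the original bytes."""
    return b"".join(chunks[key] for key in sorted(chunks))
```

The pointer kept in the relational database would then only need the file id and the chunk count.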

Using get_range_slices

2010-04-20 Thread Chris Dean
I'd like to use get_range_slices to pull all the keys from a small CF with 10,000 keys. I'd also like to get them in chunks of 100 at a time. Is there a way to do that? I thought I could set start_token and end_token in KeyRange, but I can't figure out what the initial start_token should be. Chee

Re: Using get_range_slices

2010-04-20 Thread Jonathan Ellis
you should use keys, not tokens. start with empty string. On Tue, Apr 20, 2010 at 5:12 PM, Chris Dean wrote: > I'd like to use get_range_slices to pull all the keys from a small CF > with 10,000 keys.  I'd also like to get them in chunks of 100 at a time. > Is there a way to do that? > > I thoug
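The advice above (page by key, starting from the empty string) can be sketched as a loop. This is not the Thrift API: `fake_range_slice` is a hypothetical in-memory stand-in that returns keys in sorted order with an inclusive start key, which is an assumption of this example.

```python
from bisect import bisect_left

# Hypothetical stand-in for a small column family: sorted row keys only.
ALL_KEYS = sorted(f"key{i:05d}" for i in range(10_000))


def fake_range_slice(start_key, count):
    """Return up to `count` keys >= start_key (start is inclusive)."""
    begin = bisect_left(ALL_KEYS, start_key)
    return ALL_KEYS[begin:begin + count]


def iter_all_keys(page_size=100):
    """Yield every key, one page at a time, starting from ''."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2")
    start = ""
    first_page = True
    while True:
        page = fake_range_slice(start, page_size)
        # Later pages begin with the last key already seen, so drop it.
        fresh = page if first_page else page[1:]
        if not fresh:
            return
        yield from fresh
        start = fresh[-1]
        first_page = False
```

Because the start key is inclusive, every page after the first yields one key fewer than `page_size`.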

Big Data Workshop 4/23 was Re: Cassandra Hackathon in SF @ Digg - 04/22 6:30pm

2010-04-20 Thread Joseph Boyle
Reminder - price goes up after tonight at We now have enough people interested in a bus or van from SF to Mountain View to offer one. Check the interested box when you register and we will send you pickup point information. We will have people from the Cass

Re: How to increase cassandra's performance in read?

2010-04-20 Thread Benjamin Black
On Tue, Apr 20, 2010 at 11:54 AM, Mark Jones wrote: > When I look at this arrangement, I see one lookup by key for the user, > followed by a large read for all the "email indexes"  (these are all columns > in the same row, right?) > > Then one lookup by key for each email  Seems very seek in

TimeoutException when I put very large value

2010-04-20 Thread Jeff Zhang
Hi all, When I insert a very large value, Thrift throws a TimeOutException, even if I set the socket timeout to 10 minutes. I believe 10 minutes is enough for inserting the large value and spreading the replicas to other machines; the ConsistencyLevel I chose is DCQUORUM. So is there any

Re: TimeoutException when I put very large value

2010-04-20 Thread Ryan King
what's your RPC timeout in storage-conf? -ryan On Tue, Apr 20, 2010 at 6:46 PM, Jeff Zhang wrote: > Hi all, > > When I insert very large value, the thrift will throw TimeOutException, > even If I set the socket timeout as 10 minutes.  I believe the 10 minutes > is enough for inserting the large

Re: TimeoutException when I put very large value

2010-04-20 Thread acrd seek
Thanks Ryan, I also noticed this parameter in storage-conf just now. I am going to increase this number to test whether it will work. 2010/4/21 Ryan King > what's your RPC timeout in storage-conf? > > -ryan > > On Tue, Apr 20, 2010 at 6:46 PM, Jeff Zhang wrote: > > Hi all, > > > > When I insert

Batch row deletion

2010-04-20 Thread Carlos Sanchez
All, Is there or will there be a feature to batch delete rows? (KeyRange delete?) Thanks Carlos

Re: Batch row deletion

2010-04-20 Thread Jonathan Ellis
This will be done in On Tue, Apr 20, 2010 at 10:45 PM, Carlos Sanchez wrote: > All, > > Is there or will there be a feature to batch delete rows? (KeyRange delete?) > > Thanks > > Carlos

RE: Batch row deletion

2010-04-20 Thread Carlos Sanchez
Awesome thx.. Carlos From: Jonathan Ellis [] Sent: Tuesday, April 20, 2010 10:52 PM To: Subject: Re: Batch row deletion This will be done in On Tue, Apr 20, 201

new hector version and updates

2010-04-20 Thread Ran Tavory
A few recent changes made at hector: 1. We keep several branches in parallel: 0.5.0, 0.5.1, 0.6.0 and master. We've now changed master to be at version 0.6.0. 0.6.1 is compatible with 0.6.0 as the API didn't change, so practically master is now at the latest released cassandra version. 2. We added