CF that is like a non-clustered index, are key lookups that fast?

2010-06-15 Thread S Ahmed
If you store only the key mappings in a column family, for custom ordering of rows etc. for things like: friends = { user_id : { friendid1, friendid2, } } or topForumPosts = { forum_id1 : { post2343, post32343, post32223, ...} } Now on friends page or on the top_forum_posts page

How to get previous / next data?

2010-06-15 Thread Bram van der Waaij
Hello, We want to use Cassandra to store and retrieve time-related data. Storing the time-value pairs is easy and works perfectly. The problem arises when retrieving the data. We want not only to retrieve data from within a time range, but also to be able to get the previous and/or next data sample

Re: How to get previous / next data?

2010-06-15 Thread Sylvain Lebresne
You want to use 'reversed' in SliceRange (and a start with whatever you want and a count of 2). -- Sylvain On Tue, Jun 15, 2010 at 12:01 PM, Bram van der Waaij wrote: > Hello, > > We want to use cassandra to store and retrieve time related data. Storing > the time-value pairs is easy and works p
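
The suggestion above can be illustrated with a minimal sketch that models one row's columns as a sorted list of timestamps in plain Python. It does not call the Thrift API: `get_slice` and its parameters are hypothetical stand-ins that only mimic the start, reversed, and count fields of SliceRange, and the slice is assumed to run to the end of the row.

```python
import bisect


def get_slice(columns, start, reversed_=False, count=100):
    """Mimic a SliceRange over sorted column names (hypothetical helper).

    columns:   sorted list of column names (here, integer timestamps)
    start:     column name to begin at (inclusive if it exists)
    reversed_: walk toward smaller names instead of larger ones
    count:     maximum number of columns to return
    """
    if reversed_:
        # Take everything <= start, then walk backwards.
        end = bisect.bisect_right(columns, start)
        return columns[:end][::-1][:count]
    # Take everything >= start, walking forwards.
    begin = bisect.bisect_left(columns, start)
    return columns[begin:][:count]


timestamps = [100, 200, 300, 400, 500]

# The sample at 300 plus the one before it.
previous = get_slice(timestamps, 300, reversed_=True, count=2)
# The sample at 300 plus the one after it.
following = get_slice(timestamps, 300, reversed_=False, count=2)
```

With a count of 2, the second element of each result is the neighbouring sample when the start column exists; if the start falls between two samples, the first element is already the nearest neighbour in that direction.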

Re: CF that is like a non-clustered index, are key lookups that fast?

2010-06-15 Thread Gary Dusbabek
On Tue, Jun 15, 2010 at 04:29, S Ahmed wrote: > If you store only the key mappings in a column family, for custom ordering > of rows etc. for things like: > friends = { > >    user_id : { friendid1, friendid2, } > } > or > topForumPosts = { > >     forum_id1 : { post2343, post32343, post32223,

Re: JVM Options for Production

2010-06-15 Thread Ted Zlatanov
On Mon, 14 Jun 2010 16:01:57 -0700 Anthony Molinaro wrote: AM> Now I would assume that for 'production' you want to remove AM>-ea AM> and AM>-XX:+HeapDumpOnOutOfMemoryError AM> as well as adjust -Xms and Xmx accordingly, but are there any others AM> which should be tweaked? Is there a

Re: CF that is like a non-clustered index, are key lookups that fast?

2010-06-15 Thread S Ahmed
well it won't be a range, it will be random key lookups. On Tue, Jun 15, 2010 at 8:44 AM, Gary Dusbabek wrote: > On Tue, Jun 15, 2010 at 04:29, S Ahmed wrote: > > If you store only the key mappings in a column family, for custom > ordering > > of rows etc. for things like: > > friends = { > > >

Re: CF that is like a non-clustered index, are key lookups that fast?

2010-06-15 Thread Jonathan Ellis
In a read-mostly workload it will be better to denormalize the post contents into subcolumns of the top_posts rows. On Tue, Jun 15, 2010 at 2:29 AM, S Ahmed wrote: > If you store only the key mappings in a column family, for custom ordering > of rows etc. for things like: > friends = { > >    use
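
A rough sketch of the trade-off described above, using in-memory dictionaries as stand-ins for column families. All names and sample values here are hypothetical, and the "read" counts only model round trips; they are not measurements of Cassandra.

```python
# Hypothetical stand-ins for two column families.
posts = {
    "post2343": "First post body",
    "post32343": "Second post body",
}

# Index-only layout: the row holds just the post keys.
top_posts_index = {"forum_id1": ["post2343", "post32343"]}

# Denormalized layout: the post contents are copied into subcolumns.
top_posts_denormalized = {
    "forum_id1": {
        "post2343": {"body": "First post body"},
        "post32343": {"body": "Second post body"},
    }
}


def read_via_index(forum_id):
    """One read for the index row, then one random key lookup per post."""
    post_ids = top_posts_index[forum_id]
    bodies = [posts[post_id] for post_id in post_ids]
    reads = 1 + len(post_ids)
    return bodies, reads


def read_denormalized(forum_id):
    """A single row read returns every post body."""
    row = top_posts_denormalized[forum_id]
    bodies = [subcolumns["body"] for subcolumns in row.values()]
    return bodies, 1


index_bodies, index_reads = read_via_index("forum_id1")
denorm_bodies, denorm_reads = read_denormalized("forum_id1")
```

The denormalized layout pays for the single read with duplicated data that has to be rewritten whenever a post changes, which is why it suits a read-mostly workload.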

Re: How to get previous / next data?

2010-06-15 Thread Bram van der Waaij
Perfect! Thanks :-) 2010/6/15 Sylvain Lebresne > You want to use 'reversed' in SliceRange (and a start with whatever > you want and a count of 2). > > -- > Sylvain > > On Tue, Jun 15, 2010 at 12:01 PM, Bram van der Waaij > wrote: > > Hello, > > > > We want to use cassandra to store and retrieve

Re: java.lang.OutofMemoryerror: Java heap space

2010-06-15 Thread Jonathan Ellis
If you are reading 500MB per Thrift request from each of 3 threads, then yes, simple arithmetic indicates that a 1GB heap is not enough. On Mon, Jun 14, 2010 at 6:13 PM, Caribbean410 wrote: > Hi, > > I wrote 200k records to db with each record 5MB. Get this error when I uses > 3 threads (each threa
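
The arithmetic referred to above, written out as a small sketch. It uses only the figures from the message (500MB per request, 3 threads, 1GB heap) and assumes all three responses are held in memory at the same time. It ignores every other use of the heap, so it is a lower bound rather than a sizing rule.

```python
MB = 1024 * 1024
GB = 1024 * MB


def concurrent_read_bytes(bytes_per_request, threads):
    """Bytes held in memory if every thread has one response in flight."""
    return bytes_per_request * threads


needed = concurrent_read_bytes(500 * MB, 3)  # 1500MB across 3 threads
heap = 1 * GB                                # 1024MB
fits = needed <= heap
```

Here `needed` exceeds `heap` by 476MB before any other allocation is counted, so `fits` is false.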

Cassandra timeouts under low load

2010-06-15 Thread Drew Dahlke
Hi, I'm running Cassandra 0.6.2 on a dedicated 4-node cluster and I also have a dedicated 4-node Hadoop cluster. I'm trying to run a simple MapReduce job against a single column family and it only takes 32 map tasks before I get floods of Thrift timeouts. That would make sense to me if the cassandr

RE: java.lang.OutofMemoryerror: Java heap space

2010-06-15 Thread caribbean410
Sorry, the record size should be 5KB, not 5MB, because 4KB is still OK. I will try Benjamin's suggestion. -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Tuesday, June 15, 2010 8:09 AM To: user@cassandra.apache.org Subject: Re: java.lang.OutofMemoryerror: Java heap spa

RE: java.lang.OutofMemoryerror: Java heap space

2010-06-15 Thread caribbean410
Today I retried with the 2GB heap and now it's working. No more out-of-memory error. It looks like I have to restart Cassandra several times before the new changes take effect. -Original Message- From: Benjamin Black [mailto:b...@b3k.us] Sent: Monday, June 14, 2010 7:46 PM To: user@cassandra.apache.or

Re: java.lang.OutofMemoryerror: Java heap space

2010-06-15 Thread Benjamin Black
You should only have to restart once per node to pick up config changes. On Tue, Jun 15, 2010 at 9:41 AM, caribbean410 wrote: > Today I retry the 2GB heap now it's working. No that out of memory error. > Looks like I have to restart Cassandra several times before the new changes > take effect. >

Re: Replication Factor and Data Centers

2010-06-15 Thread Jonathan Ellis
(moving to user@) On Mon, Jun 14, 2010 at 10:43 PM, Masood Mortazavi wrote: > Is the clearer interpretation of this statement (in > conf/datacenters.properties) given anywhere else? > > # The sum of all the datacenter replication factor values should equal > # the replication factor of the keyspa

Re: help for designing a cassandra

2010-06-15 Thread Jonathan Ellis
http://wiki.apache.org/cassandra/ArticlesAndPresentations might help. On Mon, Jun 14, 2010 at 1:13 PM, Johannes Weissensel wrote: > Hi everyone, > i am new to nosql databases and especially column-oriented Databases > like cassandra. > I am a student on information-systems and i evaluate a fittin

java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Julie
I am running a 10 node cassandra 0.6.1 cluster with a replication factor of 3. To populate the database to perform my read benchmarking, I have 8 applications using Thrift, each connecting to a different cassandra server and writing 100,000 rows of data (100 KB each row), using a consistencyLev

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Benjamin Black
You are likely exhausting your heap space (probably still at the very small 1G default?), and maximizing the amount of resource consumption by using CL.ALL. Why are you using ALL? On Tue, Jun 15, 2010 at 11:58 AM, Julie wrote: > I am running a 10 node cassandra 0.6.1 cluster with a replication f

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Julie
Benjamin Black b3k.us> writes: > > You are likely exhausting your heap space (probably still at the very > small 1G default?), and maximizing the amount of resource consumption > by using CL.ALL. Why are you using ALL? > > On Tue, Jun 15, 2010 at 11:58 AM, Julie nextcentury.com> wrote: ... >

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Phil Stanhope
How are you doing your inserts? I draw a clear line between 1) bootstrapping a cluster with data and 2) simulating expected/projected read/write behavior. If you are bootstrapping then I would look into the batch_mutate APIs. They allow you to improve your performance on writes dramatically. I

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Benjamin Black
On Tue, Jun 15, 2010 at 1:40 PM, Julie wrote: > Thanks for your reply.  Yes, my heap space is 1G.  My vms have only 1.7G of > memory so I hesitate to use more. Then write slower. There is no free lunch. b

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Jonathan Ellis
On Tue, Jun 15, 2010 at 1:58 PM, Julie wrote: > Coinciding with my write timeouts, all 10 of my cassandra servers are getting > the following exception written to system.log: "Value too large for defined data type" looks like a bug found in older JREs. Upgrade to u19 or later. > Another thing t

stalled streaming

2010-06-15 Thread aaron
Hello,

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Julie
Phil Stanhope wimba.com> writes: > > How are you doing your inserts? > > I draw a clear line between 1) bootstrapping a cluster with data and 2) simulating expected/projected > read/write behavior. > > If you are bootstrapping then I would look into the batch_mutate APIs. They allow you to imp

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Jonathan Ellis
On Tue, Jun 15, 2010 at 5:15 PM, Julie wrote: > I'm also baffled that after all compactions are done on every one of the 10 > servers, about 5 out of 10 servers are still at 40% CPU usage, although they > are doing 0 disk IO. I am not running anything else running on these server > nodes except fo

[OT] Real Time Open source solutions for aggregation and stream processing

2010-06-15 Thread Ian Holsman
Firstly, my apologies for the off-topic message, but I thought most people on this list would be knowledgeable and interested in this kind of thing. We are looking to find an open source, scalable solution to do RT aggregation and stream processing (similar to what the 'hop' project http://cod

stalled streaming

2010-06-15 Thread aaron
Hello, I have a 4-node Cassandra cluster with 0.6.1 installed. We've been running a mixed read/write workload to test how it works in our environment. We run about 4M batch mutations and 40M get_range_slice requests over 6 to 8 hours that load about 10 to 15 GB of data. Yesterday while there was

Re: stalled streaming

2010-06-15 Thread Benjamin Black
Known bug, fixed in latest 0.6 release. On Tue, Jun 15, 2010 at 3:29 PM, aaron wrote: > hello, > > I have a 4 node cassandra cluster with 0.6.1 installed. We've been running > a mixed read / write workload test how it works in our environment, we run > about 4M bath mutations and 40M get_range_sl

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Charles Butterfield
Benjamin Black b3k.us> writes: > > Then write slower. There is no free lunch. > > b Are you implying that clients need to throttle their collective load on the server to avoid causing the server to fail? That seems undesirable. Is this a side effect of a server bug, or is it part of the int

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Benjamin Black
On Tue, Jun 15, 2010 at 3:55 PM, Charles Butterfield wrote: > Benjamin Black b3k.us> writes: > >> >> Then write slower.  There is no free lunch. >> >> b > > Are you implying that clients need to throttle their collective load on the > server to avoid causing the server to fail?  That seems undesi

Re: stalled streaming

2010-06-15 Thread aaron
Thanks, will move to 0.6.2. Aaron On Tue, 15 Jun 2010 15:55:46 -0700, Benjamin Black wrote: > Known bug, fixed in latest 0.6 release. > > On Tue, Jun 15, 2010 at 3:29 PM, aaron wrote: >> hello, >> >> I have a 4 node cassandra cluster with 0.6.1 installed. We've been >> running >> a mixed read

RE: read operation is slow

2010-06-15 Thread Dop Sun
Thanks for your updates, good to know that your performance is better now. Actually, if users ask for one record at a time, it will usually be handled with multi-threading, since the requests most likely come from different users. If a single user wants 200k, and there is no difference to get 1

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Charles Butterfield
Benjamin Black b3k.us> writes: > > I am only saying something obvious: if you don't have sufficient > resources to handle the demand, you should reduce demand, increase > resources, or expect errors. Doing lots of writes without much heap > space is such a situation (whether or not it is happen

PPA for Cassandra Ubuntu Packages

2010-06-15 Thread Clint Byrum
I've set up a Launchpad project, team, and a PPA (https://launchpad.net/ppa) for Cassandra packages on Ubuntu here: https://launchpad.net/cassandra-packages https://launchpad.net/~cassandra-ubuntu This team is currently made up of a few members of the Ubuntu Server Team. We'd like to appeal to y

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Jonathan Shook
Actually, you shouldn't expect errors in the general case, unless you are simply trying to use data that can't fit in available heap. There are some practical limitations, as always. If there aren't enough resources on the server side to service the clients, the expectation should be that the serv

Some questions about using Cassandra

2010-06-15 Thread Anthony Ikeda
We are currently looking at a distributed database option and so far Cassandra ticks all the boxes. However, I still have some questions. Is there any need for archiving of Cassandra and what backup options are available? As it is a no-data-loss system I'm guessing archiving is not exactly rele

Re: Some questions about using Cassandra

2010-06-15 Thread Jonathan Shook
There is JSON import and export, of you want a form of external backup. No, you can't hook event subscribers into the storage engine. You can modify it to do this, however. It may not be trivial. An easier way to do this would be to have a boundary system (or dedicated thread, for example) consum

Re: Some questions about using Cassandra

2010-06-15 Thread Jonathan Shook
Doh! Replace "of" with "if" in the top line. On Tue, Jun 15, 2010 at 7:57 PM, Jonathan Shook wrote: > There is JSON import and export, of you want a form of external backup. > > No, you can't hook event subscribers into the storage engine. You can > modify it to do this, however. It may not be t

Re: stalled streaming

2010-06-15 Thread Rob Coli
On Tue, 15 Jun 2010 15:55:46 -0700, Benjamin Black wrote: Known bug, fixed in latest 0.6 release. >>> On 6/15/10 4:06 PM, aaron wrote: >>> Thanks, will move to 0.6.2. I believe that this thread refers to CASSANDRA-1169, and fix version for that is the (unreleased) cassandra 0.6.3, not ("the

RE: Some questions about using Cassandra

2010-06-15 Thread Anthony Ikeda
Thanks Jonathan, I was only asking about the event listeners because an alternative we are considering is TIBCO Active Spaces which draws quite a lot of parallels to Cassandra. I guess it would be interesting to find out how other people use Cassandra, i.e., is it your one stop shop for data st

Re: stalled streaming

2010-06-15 Thread Benjamin Black
This is not the bug to which I was referring. I don't recall the number, perhaps someone else can assist on that front? I just know I specifically upgraded to 0.6 trunk a bit before 0.6.2 to pick up the fix (and it worked). b On Tue, Jun 15, 2010 at 6:07 PM, Rob Coli wrote: > >>> On Tue, 15 J

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Benjamin Black
On Tue, Jun 15, 2010 at 4:44 PM, Charles Butterfield wrote: > > I guess my point is that I have rarely run across database servers that die > from either too many client connections, or too rapid client requests.  They > generally stop accepting incoming connections when there are too many > conn

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Benjamin Black
On Tue, Jun 15, 2010 at 4:44 PM, Charles Butterfield wrote: > To clarify the history here -- initially we were writing with CL=0 and had > great performance but ended up killing the server.  It was pointed out that > we were really asking the server to accept and acknowledge an unbounded > number

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-15 Thread Benjamin Black
On Tue, Jun 15, 2010 at 4:58 PM, Jonathan Shook wrote: > If there aren't enough resources on the server side to service the > clients, the expectation should be that the servers have a graceful > performance degradation, or in the worst case throw an error specific > to resource exhaustion or expl

Re: Some questions about using Cassandra

2010-06-15 Thread Benjamin Black
On Tue, Jun 15, 2010 at 6:07 PM, Anthony Ikeda wrote: > > Thanks Jonathan, I was only asking about the event listeners because an > alternative we are considering is TIBCO Active Spaces which draws quite a lot > of parallels to Cassandra. > > Based on painful production experience, I would not

Re: Some questions about using Cassandra

2010-06-15 Thread Benjamin Black
https://issues.apache.org/jira/browse/CASSANDRA-1016 'Plugins', excuse me. b

Re: Some questions about using Cassandra

2010-06-15 Thread Rob Coli
On 6/15/10 6:35 PM, Benjamin Black wrote: jmhodges contributed a patch (I remain incompetent at Jira searches) for 'coprocessors' to do what you want. That'd be where I'd start looking. https://issues.apache.org/jira/browse/CASSANDRA-1016 =Rob

RE: Some questions about using Cassandra

2010-06-15 Thread Anthony Ikeda
Thanks Benjamin. Looking at the 'plugins' now :) -Original Message- From: Benjamin Black [mailto:b...@b3k.us] Sent: Wednesday, 16 June 2010 11:35 AM To: user@cassandra.apache.org Subject: Re: Some questions about using Cassandra On Tue, Jun 15, 2010 at 6:07 PM, Anthony Ikeda wrote: > >

Re: stalled streaming

2010-06-15 Thread Jonathan Ellis
I think the one you're referring to is https://issues.apache.org/jira/browse/CASSANDRA-1076 On Tue, Jun 15, 2010 at 8:16 PM, Benjamin Black wrote: > This is not the bug to which I was referring.  I don't recall the > number, perhaps someone else can assist on that front?  I just know I > specific

Re: stalled streaming

2010-06-15 Thread Benjamin Black
Yes! On Tue, Jun 15, 2010 at 6:44 PM, Jonathan Ellis wrote: > I think the one you're referring to is > https://issues.apache.org/jira/browse/CASSANDRA-1076 > > On Tue, Jun 15, 2010 at 8:16 PM, Benjamin Black wrote: >> This is not the bug to which I was referring.  I don't recall the >> number, p

Re: JVM Options for Production

2010-06-15 Thread Jonathan Ellis
The main change you'd commonly make is decreasing the max new gen size on large heaps (say to 2GB) from the default of 1/3 of the heap. IMO keeping heap dump on OOM around is a good idea in production; it doesn't cost much (you're already screwed at the point where it starts writing a dump, so why

RE: read operation is slow

2010-06-15 Thread caribbean410
Thank you for the update. For the select issue, right now we are just focusing on reads and writes; later we may test the delete operation, which needs to query all keys. From: Dop Sun [mailto:su...@dopsun.com] ks Sent: Tuesday, June 15, 2010 4:14 PM To: user@cassandra.apache.org Subject: RE: read operation i