supercolumn in supercolumns

2010-06-11 Thread bujjigadu made in bangalore
Hi all, can anybody show the interface for creating a supercolumn inside a supercolumn and filling it with data in Cassandra, with an example? Thanks, bujji

Re: supercolumn in supercolumns

2010-06-11 Thread Sylvain Lebresne
> can anybody give interface to create supercolumn in supercolumn and fill the > data...in cassandra with example That would be hard since you cannot have supercolumns in supercolumns. Supercolumns give you one more level of nesting but that's it. -- Sylvain
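
The nesting limit Sylvain describes can be pictured with nested dictionaries. This is only a sketch of the data model, not a client API: the row key, supercolumn names, and the `insert` helper are all made up for illustration.

```python
# Hypothetical model of a super column family as nested dicts:
# row key -> supercolumn name -> column name -> value
super_cf = {
    "user42": {
        "address": {"city": "Bangalore", "zip": "560001"},
        "phone": {"home": "555-0100"},
    }
}


def insert(cf, row_key, supercolumn, column, value):
    """Store a value at the deepest level this model allows."""
    if isinstance(value, dict):
        # A column holds a plain value; there is no further level of nesting.
        raise TypeError("columns hold plain values; supercolumns cannot nest")
    cf.setdefault(row_key, {}).setdefault(supercolumn, {})[column] = value


insert(super_cf, "user42", "phone", "work", "555-0101")
```

A common workaround, if deeper structure is needed, is to encode the extra level in the column name or to serialize a structure into the column value; whether that suits a given application is a design decision.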

Re: Running Cassandra as a Windows Service

2010-06-11 Thread Gary Dusbabek
Sure. Please create a jira ticket (https://issues.apache.org/jira/browse/CASSANDRA) and attach the files you wish to contribute. One of the committers (probably myself) will review them and decide how to integrate them into the project. If it's not too much trouble, an ant build script would be

Pelops - a new Java client library paradigm

2010-06-11 Thread Dominic Williams
Pelops is a new high-quality Java client library for Cassandra. It has a design that: * reveals the full power of Cassandra through an elegant "Mutator and Selector" paradigm * generates better, cleaner, less bug-prone code * reduces the learning curve for new users * drives rapid application deve

Re: Pelops - a new Java client library paradigm

2010-06-11 Thread Ian Soboroff
Sounds nice. Can you say something about the scales at which you've used this library? Both write and read load? Size of clusters and size of data? Ian On Fri, Jun 11, 2010 at 9:41 AM, Dominic Williams < thedwilli...@googlemail.com> wrote: > Pelops is a new high quality Java client library fo

cassandra crashed

2010-06-11 Thread hive13 Wong
One of our Cassandra nodes suddenly crashed, then the other 2... Exceptions found in the system.log are attached below. Any ideas? Does it mean that we've got some bad data running around in the cluster? Many thanks The exception on the node that crashed first was like ERROR [RESPONSE-STAGE:669] 20

Re: Pelops - a new Java client library paradigm

2010-06-11 Thread Dominic Williams
Hi, good question. The scalability of Pelops is dependent on Cassandra, not the library itself. The library aims to provide a more effective access layer on top of the Thrift API. The library does perform connection pooling, and you can control the size of the pool and other parameters using a po

Re: Quick help on Cassandra please: cluster access and performance

2010-06-11 Thread Julie
li wei yahoo.com> writes: > > Thanks you very much, Per! > > - Original Message > From: Per Olesen trifork.com> > To: "user cassandra.apache.org" cassandra.apache.org> > Sent: Wed, June 9, 2010 4:02:52 PM > Subject: Re: Quick help on Cassandra please: cluster access and performanc

Re: Pelops - a new Java client library paradigm

2010-06-11 Thread Riyad Kalla
Dominic, I like the API; reads clearly and fairly intuitive. I think Ian was asking about what large-scale production deployments Pelops has been deployed in that you could speak to -- he's trying to get a confidence index and I am interested as well ;) Best, Riyad On Fri, Jun 11, 2010 at 7:04

Re: cassandra out of heap space crash

2010-06-11 Thread Julie
Ran Tavory gmail.com> writes: > > I can't say exactly how much memory is the correct amount, but surely 1G is very little. By replicating 3 times your cluster now makes 3 times more work than it used to do, both on reads and on writes while the readers/writers continue hammering it the same pace

Re: read operation is slow

2010-06-11 Thread Riyad Kalla
Caribbean410, This comes up on the Redis list a lot as well -- what you are actually measuring is the client sending a request over the network to the Cassandra server and it replying -- so the performance numbers you are getting can easily be 70% network wait time and not necessarily hardcore read/write serve

Re: Quick help on Cassandra please: cluster access and performance

2010-06-11 Thread Julie
li wei yahoo.com> writes: > > Thanks you very much, Per! > > - Original Message > From: Per Olesen trifork.com> > To: "user cassandra.apache.org" cassandra.apache.org> > Sent: Wed, June 9, 2010 4:02:52 PM > Subject: Re: Quick help on Cassandra please: cluster access and performance

Re: cassandra out of heap space crash

2010-06-11 Thread Gary Dusbabek
On Fri, Jun 11, 2010 at 10:14, Julie wrote: > Ran Tavory gmail.com> writes: > >> >> I can't say exactly how much memory is the correct amount, but surely 1G is > very little. By replicating 3 times your cluster now makes 3 times more work > than it used to do, both on reads and on writes while th

Re: Cassandra Write Performance, CPU usage

2010-06-11 Thread Mike Malone
Jonathan, while I agree with you re: this being an unusual load for the system, it is interesting that he's found at least one use-case where Cassandra is CPU-bound, not IO-bound. I'd definitely be interested in learning what his critical path is and seeing if there's some low-hanging fruit that ma

Re: cassandra out of heap space crash

2010-06-11 Thread William Ashley
Would it be reasonable to have (possibly configurable) caps on the maximum size of any internal Cassandra queues that are directly populated by client requests? I understand this might mean sometimes breaking the API contract for writers using CL.ZERO by blocking on those calls, but on the other

Re: read operation is slow

2010-06-11 Thread Caribbean410
Thanks Riyad. Right now I am just testing Cassandra on single node. The server and client are running on the same machine. I tried the read test again on two machines, on one machine the cpu usage is around 30% most of the time and another is 90%. Pelops is one way to access Cassandra, there are

Timestamp of an entire row?

2010-06-11 Thread Steven Haar
What is the best way to determine the timestamp of a row (i.e., the most recent time when any of the columns of a row were modified)? Correct me if I am wrong, but I believe a row does not have a timestamp; only individual columns have timestamps. So for instance if I need to know when a row was la

Re: Timestamp of an entire row?

2010-06-11 Thread Sylvain Lebresne
> Correct me if I am wrong, but I believe a row does not have a timestamp, > only individual columns have timestamps. You are correct. > So for instance if I need to know when a row was last modified, would I have > to perform a get_slice and get all columns pertaining to that row, and then > iter
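
The approach described in the question (fetch every column of the row, then keep the largest column timestamp) can be sketched as follows. The `Column` tuple is a hypothetical stand-in for whatever a slice query returns in a given client, and the sample timestamps are invented.

```python
from collections import namedtuple

# Hypothetical stand-in for the columns returned by a slice query.
Column = namedtuple("Column", ["name", "value", "timestamp"])


def row_last_modified(columns):
    """Return the newest column timestamp, or None for an empty row."""
    return max((c.timestamp for c in columns), default=None)


row = [
    Column("email", "a@example.com", 1276250000000),
    Column("name", "Steven", 1276260000000),
]
print(row_last_modified(row))
```

One caveat under this sketch's assumptions: it only sees columns that are returned by the read, so a column deletion would not show up as the most recent modification.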

RackAwareStrategy Documentation Disparity

2010-06-11 Thread Blew, Aaron
Hello, I was browsing the docs for the RackAware replication strategy and saw this on the wiki: "RackAwareStrategy: replica 2 is placed in the first node along the ring that belongs in another data center than the first; the remaining N-2 replicas, if any, are placed on the first nodes along the

Re: cassandra out of heap space crash

2010-06-11 Thread Jonathan Ellis
We give you enough rope to hang yourself. Don't use ZERO if that's not what you want. :) On Fri, Jun 11, 2010 at 9:23 AM, William Ashley wrote: > Would it be reasonable to have (possibly configurable) caps on the maximum > size of any internal Cassandra queues that are directly populated by cli

Re: RackAwareStrategy Documentation Disparity

2010-06-11 Thread Jonathan Ellis
I'm not sure what parts of these you think are in conflict. On Fri, Jun 11, 2010 at 10:34 AM, Blew, Aaron wrote: > Hello, > I was browsing the docs for the RackAware replication strategy and saw this > on the wiki: > > "RackAwareStrategy: replica 2 is placed in the first node along the ring the

Re: read operation is slow

2010-06-11 Thread Jonathan Ellis
you need to look at cfstats to see what the latency is internal to cassandra, vs what your client is introducing then you should probably read the comments in the configuration file about caching On Fri, Jun 11, 2010 at 9:38 AM, Caribbean410 wrote: > > Thanks Riyad. > > Right now I am just testi

Re: RackAwareStrategy Documentation Disparity

2010-06-11 Thread Blew, Aaron
The first text says the N-2 replicas are placed in the same rack while the second text indicates that the N-2 replicas will be placed in different racks. Am I missing something? -Aaron --- Aaron Blew - Sr. Infrastructure Engineer | The Automator iovation - The Power of Reputation™ aaron.b...@io

Re: Cassandra won't start after node crash

2010-06-11 Thread Lucas Di Pentima
Hello Jonathan, El 08/06/2010, a las 19:15, Jonathan Ellis escribió: > Sounds like you had some bad hardware take down your index files. > (Cassandra fsyncs them after writing them and before renaming them to > being live, so if it's missing pieces then it's always been hardware > at fault that I

Re: RackAwareStrategy Documentation Disparity

2010-06-11 Thread Jonathan Ellis
You're mis-parsing the first. It says "place one replica in a different datacenter, and the others on different racks in the same one," where the antecedent of "one" is "datacenter," not rack. On Fri, Jun 11, 2010 at 11:09 AM, Blew, Aaron wrote: > The first text says the N-2 replicas are placed
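
A literal reading of the sentence Jonathan quotes ("one replica in a different datacenter, and the others on different racks in the same one") can be sketched like this. It is an illustration of that reading only, not Cassandra's RackAwareStrategy code; the `Node` tuple, the ring layout, and `place_replicas` are hypothetical.

```python
from collections import namedtuple

Node = namedtuple("Node", ["name", "datacenter", "rack"])


def place_replicas(ring, primary_index, n):
    """Pick up to n nodes following the quoted sentence (illustrative only)."""
    # Walk the ring clockwise starting at the node that owns the key.
    ordered = ring[primary_index:] + ring[:primary_index]
    primary = ordered[0]
    replicas = [primary]
    # One replica goes to the first node found in a different datacenter.
    for node in ordered[1:]:
        if len(replicas) < n and node.datacenter != primary.datacenter:
            replicas.append(node)
            break
    # The others stay in the first replica's datacenter, on a different rack.
    for node in ordered[1:]:
        if len(replicas) >= n:
            break
        if node.datacenter == primary.datacenter and node.rack != primary.rack:
            replicas.append(node)
    return replicas


ring = [
    Node("A", "dc1", "r1"),
    Node("B", "dc1", "r1"),
    Node("C", "dc2", "r1"),
    Node("D", "dc1", "r2"),
    Node("E", "dc1", "r3"),
]
print([node.name for node in place_replicas(ring, 0, 3)])
```

The sketch does not handle the case where too few suitable nodes exist; it simply returns fewer than n replicas.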

Re: Cassandra won't start after node crash

2010-06-11 Thread Jonathan Ellis
Hmm. That seems to be saying that sstable2json is using the index file, and erroring out there the same way the Cassandra server does. So it doesn't necessarily mean the data files are corrupt. On Fri, Jun 11, 2010 at 11:20 AM, Lucas Di Pentima wrote: > Hello Jonathan, > > El 08/06/2010, a las 1

Re: Cassandra Write Performance, CPU usage

2010-06-11 Thread Rishi Bhardwaj
I think it would be a good exercise to know what the CPU bottleneck is on the write path. The fact that Cassandra optimizes disk I/O for writes would only go so far if the CPU becomes a big bottleneck on continuous writes. I am fairly new to Java ecosystem performance profiling but I would give

Re: RackAwareStrategy Documentation Disparity

2010-06-11 Thread Aaron Blew
That makes sense. It still seems to conflict with what's stated in the wiki: "...the remaining N-2 replicas, if any, are placed on the first nodes along the ring in the *same* rack as the first" -Aaron On Fri, Jun 11, 2010 at 11:20 AM, Jonathan Ellis wrote: > You're mis-parsing the first. I

Snapshot Location

2010-06-11 Thread Stephan Pfammatter
Hello, Is the snapshot location configurable anywhere in 0.6.2? (There is a ticket, http://www.mail-archive.com/cassandra-comm...@incubator.apache.org/msg09432.html, but it seems not to be implemented yet.) Tx.

Re: cassandra out of heap space crash

2010-06-11 Thread Ran Tavory
Gary, FWIW I get OOM with CL.ONE quite commonly if I'm not careful with my writes On Jun 11, 2010 8:48 PM, "Jonathan Ellis" wrote: We give you enough rope to hang yourself. Don't use ZERO if that's not what you want. :) On Fri, Jun 11, 2010 at 9:23 AM, William Ashley wrote: > Would it be reas

Re: RackAwareStrategy Documentation Disparity

2010-06-11 Thread Jonathan Ellis
Then the wiki is incorrect. On Fri, Jun 11, 2010 at 11:33 AM, Aaron Blew wrote: > That makes sense.  It still seems to conflict with what's stated in the > wiki: > "...the remaining N-2 replicas, if any, are placed on the first nodes along > the ring in the *same* rack as the first" > -Aaron > >

Re: Snapshot Location

2010-06-11 Thread Jonathan Ellis
That ticket was closed wontfix because it doesn't make sense to make this configurable, which would allow users to shoot themselves in the foot very easily (by mis-configuring snapshot location to a different volume) for no real benefit. On Fri, Jun 11, 2010 at 11:37 AM, Stephan Pfammatter wrote:

Re: Cassandra Write Performance, CPU usage

2010-06-11 Thread Jonathan Ellis
yes, it is expected that writes are cpu-bound. On Fri, Jun 11, 2010 at 11:29 AM, Rishi Bhardwaj wrote: > I think it would be a good exercise to know what the CPU bottleneck is on > the write path. The fact that Cassandra optimizes disk I/O for writes would > only go so far if the CPU becomes a bi

Read latency on one ColumnFamily greater than the rest of Column Families ..

2010-06-11 Thread Nazario Parsacala
So I have set up some tests with Cassandra (with OCM). Though not very impressed with the single read speeds, I have observed that it does scale very well even with a large number of concurrent readers. However, I have observed that no matter what I do, I am somehow limited to around 4 ms of rea

Re: File Descriptor leak

2010-06-11 Thread Matthew Conway
Thanks, I just tried apache-cassandra-2010-06-11_12-30-33 (hudson 462) but my tests are still reporting a leak (though not as bad). I do the following (ruby tests using cassandra_object/cassandra, but you should be able to get the idea): should "not leak file descriptors" do cassa

CLI failure due to default constructor in Types

2010-06-11 Thread Eben Hewitt
Hi Everyone I am wondering why default visibility constructors were added to each of the *Type classes in the db.marshal package? If I'm understanding right, this breaks the CLI. Explanation: This is the original UTF8Type class from 0.6.2, which worked: import java.io.UnsupportedEncodingExcepti

Re: Read latency on one ColumnFamily greater than the rest of Column Families ..

2010-06-11 Thread Jonathan Ellis
you're probably reading more or larger columns from that CF. I would consider using row cache on that CF and on Deployment based on the query volumes seen here. On Fri, Jun 11, 2010 at 12:32 PM, Nazario Parsacala wrote: > So I have setup some test with Cassandra (with OCM). Though not very > imp

Data format stability

2010-06-11 Thread Matthew Conway
Hi All, I'd like to start using trunk for something real, but am concerned about stability of the data format. That is, will I be able to upgrade a running system to a newer version of trunk and eventually to the 0.7 release, or are there any changes planned to the format of the data stored on

Re: Cassandra Write Performance, CPU usage

2010-06-11 Thread Rishi Bhardwaj
Can we configure/optimize the write path to lower CPU bound and improve performance? I am wondering if I should investigate and see what is eating up so much of CPU (memory/data copying? bloom filters? etc.). Would this be a worthwhile investigation to see if we can improve on things or is there

Re: read operation is slow

2010-06-11 Thread Caribbean410
This is the cfstats. Right now I use three threads to read 200k records. I only use Keyspace1 and Column family Standard2. For other unused column families, do I need to comment them out in the storage configuration file? The latency is 0.2576ms per record, is this a regular number (we are reading from ssd

Re: CLI failure due to default constructor in Types

2010-06-11 Thread Jonathan Ellis
The types should be singletons. Jeremy has submitted a patch fixing the CLI. On Fri, Jun 11, 2010 at 2:01 PM, Eben Hewitt wrote: > Hi Everyone > I am wondering why default visibility constructors were added to each of the > *Type classes in the db.marshal package? If I'm understanding right, thi

Re: File Descriptor leak

2010-06-11 Thread Jonathan Ellis
it goes up by exactly 2000, which is the number of loop iterations exactly? are you sure this isn't just counting your open sockets? On Fri, Jun 11, 2010 at 1:53 PM, Matthew Conway wrote: > Thanks, I just tried apache-cassandra-2010-06-11_12-30-33 (hudson 462) but my > tests ares still reportin

Re: Data format stability

2010-06-11 Thread Jonathan Ellis
If you're comfortable following comm...@cassandra.apache.org, it should be pretty obvious which changes are going to break things temporarily or require a commitlog drain. Otherwise, we recommend sticking with the stable branch until a beta is released. On Fri, Jun 11, 2010 at 2:24 PM, Matthew Co

Re: Cassandra won't start after node crash

2010-06-11 Thread Brandon Williams
On Fri, Jun 11, 2010 at 1:23 PM, Jonathan Ellis wrote: > Hmm. That seems to be saying that sstable2json is using the index > file, and erroring out there the same way the Cassandra server does. > So it doesn't necessarily mean the data files are corrupt. I believe you can confirm this with sst

Re: Cassandra won't start after node crash

2010-06-11 Thread Jonathan Ellis
Other way around: sstablekeys _only_ reads the index. On Fri, Jun 11, 2010 at 4:51 PM, Brandon Williams wrote: > On Fri, Jun 11, 2010 at 1:23 PM, Jonathan Ellis wrote: >> >> Hmm.  That seems to be saying that sstable2json is using the index >> file, and erroring out there the same way the Cassan

Re: read operation is slow

2010-06-11 Thread Caribbean410
I removed some unnecessary column families and changed the sizes of the row cache and key cache; now the latency has dropped from 0.25ms to 0.09ms. In essence 0.09ms*200k=18s. I don't know why it takes more than 400s total. Here is the client code and cfstats. There are not many operations here, why is the extra

RE: read operation is slow

2010-06-11 Thread Dop Sun
Jassandra is used here: Map> map = criteria.select(); The select here basically is a call to Thrift API: get_range_slices From: Caribbean410 [mailto:caribbean...@gmail.com] Sent: Saturday, June 12, 2010 8:00 AM To: user@cassandra.apache.org Subject: Re: read operation is slow I

Re: read operation is slow

2010-06-11 Thread Caribbean410
Hi, do you mean this one should not introduce much extra delay? To read a record, I need select here, not sure where the extra delay comes from. On Fri, Jun 11, 2010 at 5:29 PM, Dop Sun wrote: > Jassandra is used here: > > > > Map> map = criteria.select(); > > > > The select here basically is a

RE: read operation is slow

2010-06-11 Thread Dop Sun
You mean after you "I remove some unnecessary column family and change the size of rowcache and keycache, now the latency changes from 0.25ms to 0.09ms. In essence 0.09ms*200k=18s.", it still takes 400 seconds to return? From: Caribbean410 [mailto:caribbean...@gmail.com] Sent: Saturday, Jun

Re: read operation is slow

2010-06-11 Thread Caribbean410
Hi, previously it is 438s. Now it is 399s. Still large. On Fri, Jun 11, 2010 at 5:56 PM, Dop Sun wrote: > You mean after you “I remove some unnecessary column family and change > the size of rowcache and keycache, now the latency changes from 0.25ms to > 0.09ms. In essence 0.09ms*200k=18s.”, it
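
The numbers in this exchange can be worked through directly. The sketch below assumes the 0.09 ms figure is the per-read latency measured inside Cassandra and that 399 s is the wall-clock time for all 200k reads; the variable names are made up.

```python
records = 200_000
server_latency_ms = 0.09   # assumed: per-read latency measured inside Cassandra
total_elapsed_s = 399      # wall-clock time reported for the whole test

# Time that the server-side latency alone accounts for.
server_time_s = records * server_latency_ms / 1000
# Average time per read as seen by the client.
per_read_ms = total_elapsed_s / records * 1000
# Everything not explained by the server-side figure.
outside_server_s = total_elapsed_s - server_time_s

print(f"time inside Cassandra: {server_time_s:.0f} s")
print(f"observed per read:     {per_read_ms:.3f} ms")
print(f"time spent elsewhere:  {outside_server_s:.0f} s")
```

Under these assumptions the server accounts for roughly 18 s of the run, while each read costs the client about 2 ms, so most of the elapsed time lies outside the measured server latency.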

RE: read operation is slow

2010-06-11 Thread Dop Sun
And also, you are only selecting 1 key and 10 columns? criteria.keyList(Lists.newArrayList(userName)).columnRange(nameFirst, nameFirst, 10); Then, if you have 200k keys, you have 200k Thrift calls. If this is the case, you may need to optimize the way you do the query (to combine multiple key
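
A minimal sketch of the batching idea, assuming the client offers some call that reads several keys in one round trip. `fetch_many` is a hypothetical callable standing in for such a call, and the batch size of 100 is arbitrary; neither comes from the thread.

```python
def chunked(keys, size):
    """Yield successive lists of at most `size` keys."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def read_all(keys, fetch_many, batch_size=100):
    """Read every key using one call per batch instead of one call per key.

    fetch_many is a hypothetical callable: list of keys -> dict of results.
    """
    results = {}
    for batch in chunked(keys, batch_size):
        results.update(fetch_many(batch))
    return results


# Stand-in for a real multi-key read; it only records the batch sizes.
calls = []


def fake_fetch(batch):
    calls.append(len(batch))
    return {key: None for key in batch}


keys = [f"user{i}" for i in range(200_000)]
rows = read_all(keys, fake_fetch)
print(len(rows), len(calls))
```

With these made-up numbers, 200k keys turn into 2,000 round trips instead of 200,000. The right batch size depends on row size and timeouts, so it would need to be measured.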

RE: read operation is slow

2010-06-11 Thread caribbean410
Thanks for the suggestion. For the test case, it is 1 key and 1 column. I once changed 10 to 1, and as I remember there was not much difference. I have 200k keys and each key is randomly generated. I will try the optimized query next week. But maybe you still have to face the case that each time a cl

using cassandra w/django

2010-06-11 Thread S Ahmed
When using Cassandra with Django, can you still use the rapid development features of Django, or are you basically just using the framework while the models and ORM features are up to you to implement, since you are using Cassandra?

Re: using cassandra w/django

2010-06-11 Thread Jeremy Dunck
There's no direct support for cassandra in django, but there are a couple starts. http://www.allbuttonspressed.com/projects/django-nonrel http://github.com/enki/tragedy http://code.djangoproject.com/wiki/SummerOfCode2010 All of the features which Django has and which build on the ORM are out, of

Re: using cassandra w/django

2010-06-11 Thread S Ahmed
I see, well I am new to python + django so I wasn't sure what I really meant :) So basically I am using django for its framework related features, but excluding the ORM/autogen admin pages. That's reasonable and understandable, thanks. On Fri, Jun 11, 2010 at 10:38 PM, Jeremy Dunck wrote: > There'

Re: read operation is slow

2010-06-11 Thread Jonathan Ellis
sounds like most of the latency is in your client code, or waiting for the network On Fri, Jun 11, 2010 at 6:02 PM, Caribbean410 wrote: > Hi, previously it is 438s. Now it is 399s. Still large. > > On Fri, Jun 11, 2010 at 5:56 PM, Dop Sun wrote: >> >> You mean after you “I remove some unnecessar

Re: using cassandra w/django

2010-06-11 Thread Jeremy Dunck
I think you'll find the django-users mailing list pretty helpful. If you'd like to contribute use-cases to the GSoC nonrel work, that'd be django-developers. Good luck. :-) On Fri, Jun 11, 2010 at 9:43 PM, S Ahmed wrote: > I see, well I am new to python + django so I wasn't sure what I really

file corruption since 0.6.2

2010-06-11 Thread Lu Ming
Many files were corrupted when our Cassandra was updated to 0.6.2. COMPACTION-POOL is down, caused by the following error, and some nodes can NOT start up because of this error. Is it caused by the issue CASSANDRA-1169? Did the node get a wrong or corrupted stream file? ERROR [COMPACTION-POOL:1] 2010-06