Re: Cassandra 1.x and proper JNA setup

2011-11-03 Thread Maciej Miklas
According to the source code, JNA is being used to call malloc and free. In this case each cached row will be serialized into RAM. We must be really careful when defining cache size - too large a size would cause an out-of-memory error. Previous Cassandra releases had logic that would decrease cache size if heap i

Re: Second Cassandra users survey

2011-11-03 Thread Peter Tillotson
I'm using Cassandra as a big graph database, loading large volumes of data live and linking on the fly.  The number of edges grows geometrically with data added, and the edges need to be read to continue linking the graph on the fly.  Consequently, my problem is constrained by:  * Predominantly read - esp

Re: Second Cassandra users survey

2011-11-03 Thread Radim Kolar
* Compaction is expensive Yes, it is. That's why I decided not to go with Hadoop HDFS backed by Cassandra.

Re: Second Cassandra users survey

2011-11-03 Thread Mohit Anchlia
On Thu, Nov 3, 2011 at 5:46 AM, Peter Tillotson wrote: > I'm using Cassandra as a big graph database, loading large volumes of data > live and linking on the fly. Not sure if Cassandra is the right fit to model complex vertices and edges. > The number of edges grow geometrically with data added, and

Re: Second Cassandra users survey

2011-11-03 Thread Peter Tillotson
>>  * Indexing dynamic colnames (eg Lucene TermEnum against rowkey:colkey) >>    I do a lot of checking against dynamic colnames > >I agree, some kind of integration with search engine is required to >support adhoc queries as well and searching on column names. This will >be really helpful. > >Curr

Re: Cassandra 1.x and proper JNA setup

2011-11-03 Thread Jonathan Ellis
Relying on that was always a terrible idea because you could easily OOM before it could help. There's no substitute for "don't make the caches too large" in the first place. We're working on https://issues.apache.org/jira/browse/CASSANDRA-3143 to make cache sizing easier. On Thu, Nov 3, 2011 at

Re: Second Cassandra users survey

2011-11-03 Thread Ertio Lew
Provide an option to sort columns by timestamp, i.e., in the order they have been added to the row, with the facility to use any column names. On Wed, Nov 2, 2011 at 4:29 AM, Jonathan Ellis wrote: > Hi all, > > Two years ago I asked for Cassandra use cases and feature requests. > [1] The results

Re: data model for unique users in a time period

2011-11-03 Thread David Jeske
On Wed, Nov 2, 2011 at 7:26 PM, David Jeske wrote: > - make sure the summarizer does try to do it's job for a batch of counters > until they are fully replicated and 'static' (no new increments will appear) > Apologies, that should read: make sure the summarizer doesn't try to do its job...

Re: Second Cassandra users survey

2011-11-03 Thread Konstantin Naryshkin
I realize that it is not realistic to expect it, but it would be good to have a Partitioner that supports both range slices and automatic load balancing. On Thu, Nov 3, 2011 at 13:57, Ertio Lew wrote: > Provide an option to sort columns by timestamp i.e, in the order they have > been added to the

Problem after upgrade to 1.0.1

2011-11-03 Thread Bryce Godfrey
I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go just fine with the rolling upgrade. But now I'm having extreme load growth on one of my nodes (and others are growing faster than usual also). I attempted to run a cfstats against the extremely large node that was seeing 2x th

Re: Problem after upgrade to 1.0.1

2011-11-03 Thread Jonathan Ellis
Just to rule it out: you didn't do anything tricky like update HintsColumnFamily to use compression? On Thu, Nov 3, 2011 at 1:39 PM, Bryce Godfrey wrote: > I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go just > fine with the rolling upgrade.  But now I’m having extreme load gr

RE: Problem after upgrade to 1.0.1

2011-11-03 Thread Dan Hendry
Regarding load growth, presumably you are referring to the load as reported by JMX/nodetool. Have you actually looked at the disk utilization on the nodes themselves? Potential issue I have seen: http://www.mail-archive.com/user@cassandra.apache.org/msg18142.html Dan From: Bryce Godfrey [ma

RE: Problem after upgrade to 1.0.1

2011-11-03 Thread Bryce Godfrey
Nope. I did alter two of my own column families to use Leveled compaction and then ran scrub on each node; that is the only change I have made since the upgrade. Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T: 206.926.1978 | M: 206.849.2477 -Original Message- From: Jonathan

RE: Problem after upgrade to 1.0.1

2011-11-03 Thread Bryce Godfrey
Disk utilization is actually about 80% higher than what is reported by nodetool ring across all my nodes on the data drive. Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T: 206.926.1978 | M: 206.849.2477 From: Dan Hendry [mailto:dan.hendry.j...@gmail.com

Re: Problem after upgrade to 1.0.1

2011-11-03 Thread Jonathan Ellis
Does restarting the node fix this? On Thu, Nov 3, 2011 at 1:51 PM, Bryce Godfrey wrote: > Disk utilization is actually about 80% higher than what is reported for > nodetool ring across all my nodes on the data drive > > > > Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T: > 206.92

Retrieving column by names Vs by range, which is more performant ?

2011-11-03 Thread Ertio Lew
Retrieving columns by name vs. by range: which is more performant when you have the option to do both?

Re: Debian package jna bug workaround

2011-11-03 Thread paul cannon
I can't reproduce this. What version of the cassandra deb are you using, exactly, and why are you symlinking or copying jna.jar into /usr/share/cassandra? The initscript should be adding /usr/share/java/jna.jar to the classpath, and that should be all you need. The failure you see with o.a.c.cach

RE: Problem after upgrade to 1.0.1

2011-11-03 Thread Bryce Godfrey
A restart fixed the load numbers; they are back to where I expect them to be now, but disk utilization is double the load #. I'm also still getting the cfstats exception from any node. -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Thursday, November 03, 2011 11:5

Re: Retrieving column by names Vs by range, which is more performant ?

2011-11-03 Thread Brandon Williams
On Thu, Nov 3, 2011 at 2:05 PM, Ertio Lew wrote: > Retrieving columns by names vs by range which is more performant , when you > have the options to do both ? Assuming the columns have never been overwritten, range has a small advantage. However, in the face of frequently updated (overwritten) c

Concatenating ids with extension to keep multiple rows related to an entity in a single CF

2011-11-03 Thread Aditya Narayan
I am concatenating two Integer ids through bitwise operations (as described below) to create a single primary key of type long. I wanted to know if this is a good practice. This would help me in keeping multiple rows of an entity in a single column family by appending different extensions to the en
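The message above is cut off before the poster's own bitwise scheme is shown, so the following is only a minimal sketch of one common way to pack two 32-bit ids into a single 64-bit key, not the poster's actual method. The names pack_ids, unpack_ids, entity_id, and extension are hypothetical, and the entity id is assumed to fit in 31 bits so the result stays within a signed 64-bit long.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF       # low 32 bits
MAX_ENTITY = 0x7FFFFFFF   # assumption: keep the key inside a signed 64-bit long


def pack_ids(entity_id: int, extension: int) -> int:
    """Place entity_id in the high 32 bits and extension in the low 32 bits."""
    if not 0 <= entity_id <= MAX_ENTITY:
        raise ValueError("entity_id must fit in 31 bits")
    if not 0 <= extension <= MASK32:
        raise ValueError("extension must fit in 32 bits")
    return (entity_id << 32) | extension


def unpack_ids(key: int) -> Tuple[int, int]:
    """Recover (entity_id, extension) from a key built by pack_ids."""
    return (key >> 32) & MASK32, key & MASK32


if __name__ == "__main__":
    key = pack_ids(7, 2)
    print(key, unpack_ids(key))
```

With this layout, all keys for one entity share the same high 32 bits and differ only in the extension, which is what makes the original id recoverable from the key.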

Re: Debian package jna bug workaround

2011-11-03 Thread Peter Tillotson
Cassandra 1.0.1 and only seemed to happen with * JAVA_HOME=/usr/lib/jvm/java-6-sun and jna.jar copied into /usr/share/cassandra(/lib) I then saw the detail in the init script and how it was being linked Is there a way I can verify which provider is being used? I want to make sure Off heap is bein

Re: Second Cassandra users survey

2011-11-03 Thread Todd Burruss
- Better performance when accessing random columns in a wide row - caching subsets of wide rows - possibly on the same boundaries as the index - some sort of notification architecture when data is inserted. This could be co-processors, triggers, plugins, etc - auto load balance when adding new nodes

Read perf investigation

2011-11-03 Thread Ian Danforth
All, I've done a bit more homework, and I continue to see long 200ms to 300ms read times for some keys. Test Setup EC2 M1Large sending requests to a 5 node C* cluster also in EC2, also all M1Large. RF=3. ReadConsistency = ONE. I'm using pycassa from python for all communication. Data Model On

RE: Read perf investigation

2011-11-03 Thread Dan Hendry
Uh, so look at your await time of *107.3*. From the iostat man page: "await: The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them." If the key you are reading from is not
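To make the await definition quoted above concrete, here is a small sketch of how an average of that kind can be derived from two samples of cumulative per-device counters: time spent on I/O divided by the number of I/Os completed in the interval. The function name average_await_ms, the dictionary keys, and the sample numbers are all hypothetical; they are chosen only so that the result matches the 107.3 figure mentioned in the message and are not taken from the poster's system.

```python
def average_await_ms(prev: dict, curr: dict) -> float:
    """Average time per completed I/O (queue + service) between two samples.

    Each sample holds cumulative counters (hypothetical keys):
      'ios'      - total completed read and write requests
      'ticks_ms' - total milliseconds spent on those requests
    """
    completed = curr["ios"] - prev["ios"]
    if completed == 0:
        return 0.0  # no I/O finished in the interval
    return (curr["ticks_ms"] - prev["ticks_ms"]) / completed


if __name__ == "__main__":
    # Hypothetical samples taken some seconds apart.
    before = {"ios": 1000, "ticks_ms": 50000}
    after = {"ios": 1100, "ticks_ms": 60730}
    print(average_await_ms(before, after))
```

Because the figure includes time spent waiting in the queue, a high value can reflect a saturated device rather than slow individual requests.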

Benchmarking Cassandra scalability to over 1M writes/s on AWS

2011-11-03 Thread Adrian Cockcroft
Hi folks, we just posted a detailed Netflix technical blog entry on this http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html Hope you find it interesting/useful Cheers Adrian

Re: Problem after upgrade to 1.0.1

2011-11-03 Thread Jonathan Ellis
I found the problem and posted a patch on https://issues.apache.org/jira/browse/CASSANDRA-3451. If you build with that patch and rerun scrub the exception should go away. On Thu, Nov 3, 2011 at 2:08 PM, Bryce Godfrey wrote: > A restart fixed the load numbers, they are back to where I expect them

Re: Benchmarking Cassandra scalability to over 1M writes/s on AWS

2011-11-03 Thread Jonathan Ellis
<3 the straight line. Fantastic! On Thu, Nov 3, 2011 at 6:41 PM, Adrian Cockcroft wrote: > Hi folks, > > we just posted a detailed Netflix technical blog entry on this > http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html > > Hope you find it interesting/useful > > Che

RE: Problem after upgrade to 1.0.1

2011-11-03 Thread Bryce Godfrey
Thanks for the help so far. Is there any way to find out why my HintsColumnFamily is so large now, since it wasn't this way before the upgrade and it seems to just keep climbing? I've tried invoking o.a.c.db.HintedHandOffManager.countPendingHints() thinking I have a bunch of stale hints from upg

Re: Concatenating ids with extension to keep multiple rows related to an entity in a single CF

2011-11-03 Thread Tyler Hobbs
On Thu, Nov 3, 2011 at 3:48 PM, Aditya Narayan wrote: > I am concatenating two Integer ids through bitwise operations(as > described below) to create a single primary key of type long. I wanted to > know if this is a good practice. This would help me in keeping multiple > rows of an entity in a

Upcoming Apache Cassandra trainings from DataStax

2011-11-03 Thread Nate McCall
As an FYI for folks interested in quickly gaining an in-depth understanding of developing for and operating Apache Cassandra clusters, DataStax has the following training courses scheduled: Austin, TX (Nov. 14th): http://datastaxaustin.eventbrite.com/ San Mateo, CA (Dec. 13th): http://datastaxsf.

Re: Concatenating ids with extension to keep multiple rows related to an entity in a single CF

2011-11-03 Thread Aditya Narayan
The data in different rows of an entity is all of a similar type but serves different features, and it has almost the same storage and retrieval needs, so I wanted to put them in one CF and reduce the number of column families. From my knowledge, I believe compositeType existed for columns as an alternative c