Re: ReplicateOnWrite issues

2011-07-12 Thread Yang
interesting, first just to make sure: since replicateOnWrite is for Counters, you are using counters (you use the word "insert" instead of "add/increment" ) right? if you are using counters, supposedly the leader runs replicateOnWrite, somehow all your adds find the one box as leader, that's prob

Re: ReplicateOnWrite issues

2011-07-12 Thread David Hawthorne
It's definitely for counters, and some of the rows I'm inserting are long-ish, if 1.3MB is long. Maybe it would help if I said I was using counter super columns. I'm also writing to only a handful of rows at a time, until they are full. It looks like the counter super column code in addReadCo

Kundera 2.0.1 Released

2011-07-12 Thread Amresh Singh
We are happy to announce the release of Kundera 2.0.1. Kundera 2.0.1 is a JPA-compliant Object-Datastore Mapping Library for NoSQL Datastores. The idea behind Kundera is to make working with NoSQL Databases drop-dead simple and fun. Some salient features of Kundera are: 1. Fully JPA 1.0 compliant.

R: Re: AntiEntropy?

2011-07-12 Thread cbert...@libero.it
From "Cassandra the definitive guide" - Basic Maintenance - Repair Running nodetool repair causes Cassandra to execute a Major Compaction [...] AntiEntropyService implements the Singleton pattern and defines the static Differencer class as well, which is used to compare two trees. If it finds a

Re: CassandraFS in 1.0?

2011-07-12 Thread Rustam Aliyev
Hi David, This is an interesting topic and it would be good to hear from someone who is using it in prod. In particular: how does your fs implementation behave for medium/large files, e.g. > 1MB? If you store large files, how large is your store per node and how does it handle compactions

Secondary Index over Super Column

2011-07-12 Thread Vivek Mishra
Recently we released Kundera-2.0.1. One feature included in this release is secondary index support over Super Column. I have written a blog post about it, which can be found for reference at: http://mevivs.wordpress.com/2011/07/12/cassandra-play-kundera-orm/ -Vivek

Predefined columns for Secondary index

2011-07-12 Thread osishkin osishkin
I'm thinking about Secondary indexes as an alternative for using column families to index my data. I'd appreciate some answers to basic questions about this new feature. 1. Can you insert a row to a column family with a predefined column metadata (and secondary index for these columns) that does n

Survey: Cassandra/JVM Resident Set Size increase

2011-07-12 Thread Chris Burroughs
### Preamble There have been several reports on the mailing list of the JVM running Cassandra using "too much" memory. That is, the resident set size is >>(max java heap size + mmaped segments) and continues to grow until the process swaps, kernel oom killer comes along, or performance just degra

Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'

2011-07-12 Thread A J
From "Cassandra the definitive guide" - Basic Maintenance - Repair "Running nodetool repair causes Cassandra to execute a major compaction. During a major compaction (see “Compaction” in the Glossary), the server initiates a TreeRequest/TreeReponse conversation to exchange Merkle trees with ne

Re: Re: AntiEntropy?

2011-07-12 Thread Peter Schuller
> So now I'm confused ... Cassandra doc says that I have to run it by myself, > Cassandra book says I don't have to. > Did I misunderstand something? The book is wrong, at least by current versions of Cassandra (I'm basing that on the quote you pasted, I don't know the context). nodetool repair m

Re: Strong Consistency with ONE read/writes

2011-07-12 Thread AJ
Yang, I'm not sure I understand what you mean by "prefix of the HLog". Also, can you explain what failure scenario you are talking about? The major failure that I see is when the leader node confirms to the client a successful local write, but then fails before the write can be replicated to

Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'

2011-07-12 Thread Peter Schuller
> From "Cassandra the definitive guide" - Basic Maintenance - Repair > "Running nodetool repair causes Cassandra to execute a major compaction. > During a major compaction (see “Compaction” in the Glossary), the > server initiates a > TreeRequest/TreeReponse conversation to exchange Merkle tree

Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'

2011-07-12 Thread A J
Just confirming. Thanks for the clarification. On Tue, Jul 12, 2011 at 10:53 AM, Peter Schuller wrote: >> From "Cassandra the definitive guide" - Basic Maintenance - Repair >> "Running nodetool repair causes Cassandra to execute a major compaction. >> During a major compaction (see “Compactio

Anyone using Facebook's flashcache?

2011-07-12 Thread AJ
With big data requirements pressuring me to pack up to a terabyte on one node, I suspect that even 32 GB of RAM just will not be large enough for Cass' various memory caches to be effective. 32/1000 is a tiny working set to data store ratio... even assuming non-random reads. So, I'm investiga

Re: CassandraFS in 1.0?

2011-07-12 Thread David Strauss
We actually cap the maximum file size pretty low (30-50MB) because this system is mainly designed to handle images and audio from browser uploads. Most files are accessed via HTTP requests, which get cached in our edge layer. So, mostly the design aims to be distributed and available more than blaz

Re: Anyone using Facebook's flashcache?

2011-07-12 Thread Peter Schuller
> Do any Cass developers have any thoughts on this and whether or not it would > be helpful considering Cass' architecture and operation? A well-functioning L2 cache should definitely be very useful with Cassandra for read-intensive workloads where the request distribution is such that additional

Re: Strong Consistency with ONE read/writes

2011-07-12 Thread Ryan King
If you're interested in this idea, you should read up about Spinnaker: http://www.vldb.org/pvldb/vol4/p243-rao.pdf -ryan On Mon, Jul 11, 2011 at 2:48 PM, Yang wrote: > I'm not proposing any changes to be done, but this looks like a very > interesting topic for thought/hack/learning, so the follo

Write placement questions: ringIterator() and firstTokenIndex()

2011-07-12 Thread Eric tamme
I have been reading through some code in TokenMetadata.java, specifically ringIterator() and firstTokenIndex(). I am trying to get a very firm grasp on how nodes are collected for writes. I have run into a bit of confusion about what happens when the data's token is larger than the larg

Re: Strong Consistency with ONE read/writes

2011-07-12 Thread Yang
for example, coord writes record 1,2 ,3 ,4,5 in sequence if u have replica A, B, C currently A can have 1 , 3 B can have 1,3,4, C can have 2345 by "prefix", I mean I want them to have only 1---n where n is some number between 1 and 5, for example A having 1,2,3 B having 1,2,3,4 C having 1,2,3,4,
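[Editor's sketch] The "prefix" property described in this message can be checked mechanically. The following is a minimal illustration, assuming records are numbered 1..n in write order and each replica is modelled as a plain list of record numbers; the function name is_prefix is hypothetical and is not part of Cassandra.

```python
def is_prefix(replica_records, n_written):
    """Return True if the replica holds exactly records 1..k for some k <= n_written.

    An empty replica counts as the empty prefix (k = 0) in this sketch.
    """
    k = len(replica_records)
    return k <= n_written and sorted(replica_records) == list(range(1, k + 1))


# States from the message: the coordinator wrote records 1..5.
print(is_prefix([1, 3], 5))        # A has a gap at 2
print(is_prefix([1, 3, 4], 5))     # B has a gap at 2
print(is_prefix([2, 3, 4, 5], 5))  # C is missing record 1
print(is_prefix([1, 2, 3], 5))     # a valid prefix
```

The first three calls describe the "currently possible" states and are not prefixes; the last one is the kind of state the message asks for.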

Re: Strong Consistency with ONE read/writes

2011-07-12 Thread Yang
thanks , let me read it... On Tue, Jul 12, 2011 at 9:27 AM, Ryan King wrote: > If you're interested in this idea, you should read up about Spinnaker: > http://www.vldb.org/pvldb/vol4/p243-rao.pdf > > -ryan > > On Mon, Jul 11, 2011 at 2:48 PM, Yang wrote: >> I'm not proposing any changes to be do

Off-heap Cache

2011-07-12 Thread Raj N
Do we need to do anything special to turn off-heap cache on? https://issues.apache.org/jira/browse/CASSANDRA-1969 -Raj

Re: ReplicateOnWrite issues

2011-07-12 Thread Yang
what do you mean by "until they are full" ? right now I guess a quick black-box testing method for this problem is to try inserting only shorter rows , and see if that persists. as you said, it could be that addReadCommandFromColumnFamily is taking a lot of time to read, if that's from disk, it's

R: Re: Re: AntiEntropy?

2011-07-12 Thread cbert...@libero.it
>The book is wrong, at least by current versions of Cassandra (I'm >basing that on the quote you pasted, I don't know the context). To be sure that I didn't misunderstand (English is not my mother tongue) here is what the entire "repair paragraph" says ... Basic Maintenance There are a few tasks

Re: node stuck "leaving"

2011-07-12 Thread Brandon Williams
On Mon, Jul 11, 2011 at 11:51 PM, Casey Deccio wrote: > java.lang.RuntimeException: Cannot recover SSTable with version f (current > version g). You need to scrub before any streaming is performed. -Brandon

Re: Anyone using Facebook's flashcache?

2011-07-12 Thread AJ
On 7/12/2011 10:19 AM, Peter Schuller wrote: Do any Cass developers have any thoughts on this and whether or not it would be helpful considering Cass' architecture and operation? A well-functioning L2 cache should definitely be very useful with Cassandra for read-intensive workloads where the re

Re: Out of memory error in cassandra

2011-07-12 Thread Jonathan Ellis
Have you seen http://www.datastax.com/docs/0.8/troubleshooting/index#nodes-are-dying-with-oom-errors ? On Mon, Jul 11, 2011 at 1:55 PM, Anurag Gujral wrote: > Hi All, >    I am getting following error from cassandra: > ERROR [ReadStage:23] 2011-07-10 17:19:18,300 > DebuggableThreadPoolEx

Re: Secondary Index doesn't work with LOCAL_QUORUM

2011-07-12 Thread Jonathan Ellis
Sounds like https://issues.apache.org/jira/browse/CASSANDRA-2870 to me. You can disable the dynamic snitch as a workaround, or use a different consistencylevel. On Mon, Jul 11, 2011 at 11:38 AM, Hefeng Yuan wrote: > Hi, > We're using Cassandra with 2 DC > - one OLTP Cassandra, 6 nodes, with RF3

Re: Off-heap Cache

2011-07-12 Thread Jonathan Ellis
You need to set row_cache_provider=SerializingCacheProvider on the columnfamily definition (via the cli) On Tue, Jul 12, 2011 at 9:57 AM, Raj N wrote: > Do we need to do anything special to turn off-heap cache on? > https://issues.apache.org/jira/browse/CASSANDRA-1969 > -Raj -- Jonathan Ellis

Re: R: Re: Re: AntiEntropy?

2011-07-12 Thread aaron morton
> Running nodetool repair causes Cassandra to execute a major compaction This is not what I would call factually accurate. Repair does not run a major compaction. Major compaction is when all SSTables for a CF are compacted down to one SSTable. Cheers - Aaron Morton Freelance C

Re: ReplicateOnWrite issues

2011-07-12 Thread Sylvain Lebresne
When you do a counter increment at CL.ONE, the write is acknowledged as soon as the first replica getting the write has pushed the increment into its memtable. However, there is a read happening for the replication to the other replicas (this is necessary to the counter design). What is happening

Re: Write placement questions: ringIterator() and firstTokenIndex()

2011-07-12 Thread Jonathan Ellis
You're going to be mad at how simple the answer turns out to be. :) Nodes "own" the range from (previous, token], NOT from [token, next). So, the last node will get from (50, 75] and the first will get from (75, 0]. On Tue, Jul 12, 2011 at 9:38 AM, Eric tamme wrote: > I have been reading through
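[Editor's sketch] The (previous, token] ownership rule in this reply can be illustrated with a small lookup. This is not the code in TokenMetadata.java; it is a simplified model that assumes a four-node ring with tokens 0, 25, 50 and 75 (inferred from the ranges quoted in the reply), and owner_token is a hypothetical helper name.

```python
from bisect import bisect_left


def owner_token(sorted_tokens, key_token):
    """Return the token of the node that owns key_token.

    Each node owns the range (previous_token, its_token], so the owner is the
    first node whose token is >= key_token. A key beyond the largest token
    wraps around to the node with the smallest token.
    """
    i = bisect_left(sorted_tokens, key_token)
    return sorted_tokens[i % len(sorted_tokens)]


tokens = [0, 25, 50, 75]  # assumed ring, sorted ascending
print(owner_token(tokens, 60))  # falls in (50, 75]
print(owner_token(tokens, 75))  # upper bound is inclusive
print(owner_token(tokens, 80))  # larger than the largest token: wraps to the first node
```

Using bisect_left rather than bisect_right is what makes the upper bound inclusive: a key equal to a node's token belongs to that node, not to the next one.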

Re: Write placement questions: ringIterator() and firstTokenIndex()

2011-07-12 Thread Eric tamme
On Tue, Jul 12, 2011 at 3:32 PM, Jonathan Ellis wrote: > You're going to be mad at how simple the answer turns out to be. :) > > Nodes "own" the range from (previous, token], NOT from [token, next). > So, the last node will get from (50, 75] and the first will get from > (75, 0]. > Okay i figured

sstabletojson

2011-07-12 Thread Stephen Pope
Hey there. I'm trying to convert one of my sstables to json, but it doesn't appear to be escaping quotes. As a result, I've got a line in my resulting json like this: "3230303930373139313734303236efbfbf3331313733": [["6d6573736167655f6964", ""<66AA9165386616028BD3FECF893BBAC204347F3BAF@CONFLIC
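[Editor's sketch] For readers unfamiliar with the problem being reported: a string value that itself contains double quotes must have those quotes escaped to form valid JSON. The snippet below only illustrates that requirement using Python's standard json module; it is not the sstable2json code, and the message-id value is a made-up placeholder rather than the truncated value from the message.

```python
import json

# Hypothetical column value that begins and ends with a literal double quote.
raw_value = '"<msg-id@example.invalid>"'

# json.dumps escapes the embedded quotes so the output is a valid JSON string.
escaped = json.dumps(raw_value)
print(escaped)

# Parsing the escaped form recovers the original value unchanged.
assert json.loads(escaped) == raw_value
```

Without the escaping step, the embedded quote would terminate the JSON string early, which matches the symptom described above.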

Re: ReplicateOnWrite issues

2011-07-12 Thread David Hawthorne
Well, I was using a large number of clients: I tried configuring a hector pool of 20-200 to see what effect that had on throughput. There's definitely a point after which there's no gain, so I dialed it back down. To clarify a few other things, when I say inserts I mean increments, as this te

Re: Write placement questions: ringIterator() and firstTokenIndex()

2011-07-12 Thread Jonathan Ellis
the intended meaning of "initial" is "use this the first time you start up; it will be ignored after that, if you use nodetool to move it around." Sorry for the confusion. On Tue, Jul 12, 2011 at 12:53 PM, Eric tamme wrote: > On Tue, Jul 12, 2011 at 3:32 PM, Jonathan Ellis wrote: >> You're goin

Re: sstabletojson

2011-07-12 Thread Jonathan Ellis
You can upgrade to 0.8.1 to fix this. :) On Tue, Jul 12, 2011 at 1:03 PM, Stephen Pope wrote: >  Hey there. I'm trying to convert one of my sstables to json, but it doesn't > appear to be escaping quotes. As a result, I've got a line in my resulting > json like this: > > "3230303930373139313734

Re: ReplicateOnWrite issues

2011-07-12 Thread Sylvain Lebresne
On Tue, Jul 12, 2011 at 11:42 PM, David Hawthorne wrote: > Well, I was using a large number of clients:  I tried configuring a hector > pool of 20-200 to see what affect that had on throughput. There's definitely > a point after which there's no gain, so I dialed it back down.  To clarify a > f

CL.ALL & counters

2011-07-12 Thread Philippe
Hello, I'm testing using a 2 node cluster. My CF only has counters and I have replicate_on_write=true. I loaded data into the CF using CL=ALL. At some point, it couldn't write anymore (looked like a GC froze one of the machines) which makes sense. However, I happened to run nodetool repair it is h

Re: Questions about Cassandra reads

2011-07-12 Thread Philippe
Hi Jonathan, Thanks for the answer, I wanted to report on the improvements I got because someone else is bound to run into the same questions... > > C) I want to access a key that is at the 50th position in that table, > > Cassandra will seek position 0 and then do a sequential read of the file >

Re: ReplicateOnWrite issues

2011-07-12 Thread David Hawthorne
On Jul 12, 2011, at 3:02 PM, Sylvain Lebresne wrote: > On Tue, Jul 12, 2011 at 11:42 PM, David Hawthorne > wrote: >> Well, I was using a large number of clients: I tried configuring a hector >> pool of 20-200 to see what affect that had on throughput. There's definitely >> a point after whic

Re: Out of memory error in cassandra

2011-07-12 Thread Anurag Gujral
Hi Jonathan, Thanks for your mail. But none of the things mentioned in the link pertains to the OOM error we are seeing. thanks Anurag On Tue, Jul 12, 2011 at 10:42 AM, Jonathan Ellis wrote: > Have you seen > http://www.datastax.com/docs/0.8/troubleshooting/index#nodes-are-d

Re: Survey: Cassandra/JVM Resident Set Size increase

2011-07-12 Thread Sasha Dolgy
I'll post more tomorrow ... However, we set up one node in a single node cluster and have left it with no data ... reviewing memory consumption graphs ... it increased daily until it gobbled (highly technical term) all memory ... the system is now running just below 100% memory usage ... which i find pec

Re: ReplicateOnWrite issues

2011-07-12 Thread Sylvain Lebresne
On Wed, Jul 13, 2011 at 12:18 AM, David Hawthorne wrote: > > On Jul 12, 2011, at 3:02 PM, Sylvain Lebresne wrote: > >> On Tue, Jul 12, 2011 at 11:42 PM, David Hawthorne >> wrote: >>> Well, I was using a large number of clients:  I tried configuring a hector >>> pool of 20-200 to see what affect

Re: node stuck "leaving"

2011-07-12 Thread Casey Deccio
On Tue, Jul 12, 2011 at 10:10 AM, Brandon Williams wrote: > On Mon, Jul 11, 2011 at 11:51 PM, Casey Deccio wrote: > > java.lang.RuntimeException: Cannot recover SSTable with version f > (current > > version g). > > You need to scrub before any streaming is performed. > > Okay, turns out that my

Re: ReplicateOnWrite issues

2011-07-12 Thread Sylvain Lebresne
On Wed, Jul 13, 2011 at 1:00 AM, David Hawthorne wrote: > Thanks for looking at that. > > Our use case involves supercolumns that have 2-20,000 counters within them.   > For a set of continuous updates to one supercolumn, the behavior you're > describing is: Here's your problem. Don't do that. I

Re: Questions about Cassandra reads

2011-07-12 Thread Jonathan Ellis
Thanks for the update, that is very useful! On Tue, Jul 12, 2011 at 3:16 PM, Philippe wrote: > Hi Jonathan, > Thanks for the answer, I wanted to report on the improvements I got because > someone else is bound to run into the same questions... > >> >> > C) I want to access a key that is at the 50

Re: Out of memory error in cassandra

2011-07-12 Thread Jonathan Ellis
Then you'll want to use MAT to analyze the dump the JVM gave you of the heap at OOM time. (http://www.eclipse.org/mat/) On Tue, Jul 12, 2011 at 3:22 PM, Anurag Gujral wrote: > Hi Jonathan, >     Thanks for  your mail. But no-one of the things > mentioned in the link pertains to O

CQL + Counters = bad request

2011-07-12 Thread Aaron Turner
Using Cassandra 0.8.1 and cql 1.0.3 and following the syntax mentioned in https://issues.apache.org/jira/browse/CASSANDRA-2473 cqlsh> UPDATE RouterAggWeekly SET 1310367600 = 1310367600 + 17 WHERE KEY = '1_20110728_ifoutmulticastpkts'; Bad Request: line 1:51 no viable alternative at character '+'

Re: CQL + Counters = bad request

2011-07-12 Thread Jonathan Ellis
Try quoting the column name. On Tue, Jul 12, 2011 at 5:30 PM, Aaron Turner wrote: > Using Cassandra 0.8.1 and cql 1.0.3 and following the syntax mentioned > in https://issues.apache.org/jira/browse/CASSANDRA-2473 > > cqlsh> UPDATE RouterAggWeekly SET 1310367600 = 1310367600 + 17 WHERE > KEY = '1_

Re: CQL + Counters = bad request

2011-07-12 Thread Aaron Turner
Doesn't seem to help: cqlsh> UPDATE RouterAggWeekly SET '1310367600' = '1310367600' + 17 WHERE KEY = '1_20110728_ifoutmulticastpkts'; Bad Request: line 1:55 no viable alternative at character '+' cqlsh> UPDATE RouterAggWeekly SET 1310367600 = '1310367600' + 17 WHERE KEY = '1_20110728_ifoutmultica

Re: Cassandra scaling problem in virtualized environment

2011-07-12 Thread Jonathan Ellis
As Ryan said, Cassandra's really designed to manage DAS. That's probably a big part of the problem. If you have to use a SAN, I recommend checking out this thread: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-on-iSCSI-td5945217.html On Tue, Jun 14, 2011 at 8:16 AM,

Re: Anyone using Facebook's flashcache?

2011-07-12 Thread Peter Schuller
> Thanks Peter, but... hmmm, are you saying that even after a cache miss which > results in a disk read and blocks being moved to the ssd, that by the next > cache miss for the same data and subsequent same file blocks, that the ssd > is unlikely to have those same blocks present anymore? I am say

Re: Re: Re: AntiEntropy?

2011-07-12 Thread Peter Schuller
> To be sure that I didn't misunderstand (English is not my mother tongue) here > is what the entire "repair paragraph" says ... Read it, I maintain my position - the book is wrong or at the very least strongly misleading. You *definitely* need to run nodetool repair periodically for the reasons

AssertionError: No data found for NamesQueryFilter

2011-07-12 Thread Kyle Gibson
Running version 0.7.6-2, recently upgraded from 0.7.3. I am getting a timeout exception when I run a particular get_indexed_slices, which results in the following error showing up on a few nodes: ERROR [ReadStage:16] 2011-07-12 23:01:31,424 AbstractCassandraDaemon.java (line 114) Fatal exception in

Re: Anyone using Facebook's flashcache?

2011-07-12 Thread AJ
On 7/12/2011 9:02 PM, Peter Schuller wrote: Thanks Peter, but... hmmm, are you saying that even after a cache miss which results in a disk read and blocks being moved to the ssd, that by the next cache miss for the same data and subsequent same file blocks, that the ssd is unlikely to have those

Re: Strong Consistency with ONE read/writes

2011-07-12 Thread AJ
On 7/12/2011 10:48 AM, Yang wrote: for example, coord writes record 1,2 ,3 ,4,5 in sequence if u have replica A, B, C currently A can have 1 , 3 B can have 1,3,4, C can have 2345 by "prefix", I mean I want them to have only 1---n where n is some number between 1 and 5, for example A having 1,2

Re: Strong Consistency with ONE read/writes

2011-07-12 Thread Yang
that is not an important issue, it's separate from the replication question I'm thinking about. for now I'll just think about the case where every node owns the same key range , or N=RF. > Are you saying:  All replicas will receive the value whether or not they > actually own the key range for th