Re: Help for creating a custom partitioner

2012-10-01 Thread Clement Honore
Hi, thanks for your answer. We plan to use manual indexing too (with native C* indexing for other cases). So, for one index, we will get plenty of FK and a MultiGet call to get all the associated entities, with RP, would then spread all the cluster. As we don't know the cluster size yet, and as i

Re: Help for creating a custom partitioner

2012-10-01 Thread Tim Wintle
On Mon, 2012-10-01 at 10:45 +0200, Clement Honore wrote: > We plan to use manual indexing too (with native C* indexing for other > cases). > So, for one index, we will get plenty of FK and a MultiGet call to get all > the associated entities, with RP, would then spread all the cluster. > As we don'
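To illustrate the point about RP spreading a MultiGet, here is a rough Python sketch of RandomPartitioner-style placement. It assumes two things. First, a key's token is the absolute value of its MD5 digest read as a signed 128-bit integer. Second, a key's primary replica is the node holding the first token at or after the key's token, wrapping around the ring. The 6-node ring and the `fk-N` key names are hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # assumed RandomPartitioner token range: 0 .. 2**127


def rp_token(key: bytes) -> int:
    """Sketch of a RandomPartitioner-style token: MD5 digest of the key,
    read as a signed big-endian 128-bit integer, then made non-negative."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def primary_owner(token: int, sorted_tokens: list[int]) -> int:
    """Index (into sorted_tokens) of the node owning `token`: the first node
    token >= `token`, wrapping to index 0 past the highest token."""
    i = bisect.bisect_left(sorted_tokens, token)
    return i % len(sorted_tokens)


if __name__ == "__main__":
    # Hypothetical 6-node ring with evenly spaced tokens.
    ring = [i * RING_SIZE // 6 for i in range(6)]
    # Neighbouring foreign keys hash to unrelated tokens, so one MultiGet
    # over 100 of them usually has to contact most or all of the nodes.
    owners = {primary_owner(rp_token(f"fk-{n}".encode()), ring) for n in range(100)}
    print(f"{len(owners)} of {len(ring)} nodes hold at least one of the 100 keys")
```

The trade-off this shows is that hashing buys even load at the cost of locality. Related keys do not sit on the same node unless the partitioner preserves key order.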

Re: Prevent queries from OOM nodes

2012-10-01 Thread Віталій Тимчишин
It's not about columns, it's about rows; see the example statement. In QueryProcessor#processStatement it reads rows into a list, then does list.size(). 2012/10/1 aaron morton > CQL will read everything into List to make latter a count. > > > From 1.0 onwards count paginated reading the columns. What v
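The problem Віталій points out, holding every row in a list just to call list.size(), could be avoided by counting page by page. Below is a minimal Python sketch of that idea. It assumes a hypothetical `fetch_page(start_key, limit)` that behaves like a Thrift range slice: it returns up to `limit` keys in partitioner order, starting at and including `start_key`. This is not Cassandra's QueryProcessor code.

```python
from typing import Callable, List

# Hypothetical range-slice call: up to `limit` keys >= start_key, in ring
# order; the start key is inclusive, and "" means "from the beginning".
FetchPage = Callable[[str, int], List[str]]


def paged_count(fetch_page: FetchPage, page_size: int = 1000) -> int:
    """Count rows one page at a time, so at most one page is held in memory."""
    if page_size < 2:
        raise ValueError("page_size must be >= 2 because each page repeats the start key")
    total = 0
    start = ""
    first_page = True
    while True:
        page = fetch_page(start, page_size)
        # Every page after the first begins with the previous page's last key.
        new_rows = page if first_page else page[1:]
        total += len(new_rows)
        if len(page) < page_size:
            return total
        start = page[-1]
        first_page = False
```

Memory use is bounded by `page_size` instead of by the number of rows. The cost is one extra round trip per page.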

SST Inconsistency

2012-10-01 Thread Daniel Doubleday
Hi all we are running c* 1.0.8 and found some strange row level tombstone problems. Some rows (~50 in around 2B keys) have markedForDeleteAt timestamps in the future (so they 'drop' all writes) and 0 values as localDeletionTime. A non-thorough check didn't bring up any code paths that could l
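The Python model below is a simplified sketch of timestamp reconciliation. It shows why a row tombstone whose markedForDeleteAt lies in the future "drops" all writes. It assumes the rule that a row-level deletion shadows every column whose write timestamp is less than or equal to markedForDeleteAt, so ties go to the delete. The `Column` class and the `visible` helper are illustrative only, not Cassandra's own classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    name: str
    value: bytes
    timestamp: int  # client-supplied write timestamp, microseconds by convention


def visible(column: Column, marked_for_delete_at: Optional[int]) -> bool:
    """Simplified reconciliation: a row tombstone hides any column whose
    timestamp is <= markedForDeleteAt; None means the row is not deleted."""
    if marked_for_delete_at is None:
        return True
    return column.timestamp > marked_for_delete_at


if __name__ == "__main__":
    now_us = 1_349_049_600_000_000  # 2012-10-01 00:00:00 UTC in microseconds
    bogus_tombstone = now_us * 10  # a markedForDeleteAt far in the future
    fresh = Column("email", b"new value", now_us)
    print(visible(fresh, bogus_tombstone))  # False: the fresh write is shadowed
```

Under this model, every normal write to such a row stays hidden until writes carry a timestamp larger than the bogus markedForDeleteAt.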

Re: downgrade from 1.1.4 to 1.0.X

2012-10-01 Thread Daniel Doubleday
Since I was just fiddling around with sstable2json: if you have row level deletes you might get problems, since row level deletion info is not exported in at least 1.0. But if you're not using those you might be fine. Віталій Тимчишин wrote: I suppose the way is to convert all SST to json, then i

Re: Help for creating a custom partitioner

2012-10-01 Thread Hiller, Dean
I would be surprised if random partitioner hurt your performance. In general, doing performance tests on a 6 node cluster with PlayOrm Scalable SQL, even join queries ended up faster, as reading all the rows from parallel disks was way faster than reading from a single machine (remember, one d

Re: Rebalancing cluster

2012-10-01 Thread Darvin Denmian
Hi, 1) Yes. I'm using Random Partitioner 2) All 3 nodes in the cluster have used "auto_bootstrap" to acquire their part of data (I really don't know if this is correct). Regards. On Sat, Sep 29, 2012 at 5:56 AM, Bradford Toney wrote: > Are you using the random partitioner? Did you set up the tok

Re: Rebalancing cluster

2012-10-01 Thread Hiller, Dean
You should check the cassandra.yaml file. There is an initial_token in that file that you should have set. The comment above that property reads # You should always specify InitialToken when setting up a production # cluster for the first time, and often when adding capacity later. # The princi
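For RandomPartitioner, the usual way to fill in initial_token is to space the tokens evenly over the 0 .. 2**127 range, so that node i of N gets i * 2**127 / N. The small Python sketch below covers a single-datacenter ring only. Multi-datacenter layouts need per-DC offsets and are not covered here.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner token range is 0 .. 2**127


def balanced_tokens(node_count: int) -> list[int]:
    """Evenly spaced initial_token values for a single-DC RandomPartitioner ring."""
    if node_count < 1:
        raise ValueError("node_count must be positive")
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    for i, token in enumerate(balanced_tokens(3)):
        print(f"node {i}: initial_token: {token}")
```

For three nodes this gives 0, 56713727820156410577229101238628035242 and 113427455640312821154458202477256070485.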

Advice on correct storage configuration

2012-10-01 Thread Lewis John Mcgibbney
Hi, I wish to confirm whether the current mapping (storage) configuration I have is suited to storing commonly extracted field data from Web Pages. My mapping can be seen here [0] which basically specifies three column families, i.e. parse (p), fetch (f) and super columns (sc) within the webpag

Re: Rebalancing cluster

2012-10-01 Thread Darvin Denmian
When I execute the command "nodetool ring" the following output appears: Address DC Rack Status State Load Effective-Ownership Token 15977425217612630204561194765242499 10.36.214.118 datacenter1 rack1 Up Normal 16.62 GB 54.36% 8211980577784

Re: Rebalancing cluster

2012-10-01 Thread Hiller, Dean
Nodetool has a move command so you can move to a new, better token. Read up on the documentation there. I have not used it yet myself… Good idea to test it on your test cluster first. Dean On 10/1/12 8:03 AM, "Darvin Denmian" wrote: >as you can see there is no "Zero Token". Maybe I did somethi
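Before choosing targets for nodetool move, it can help to check how unevenly the current tokens split the ring. The Python sketch below computes each node's primary range as a fraction of the RandomPartitioner ring, assuming node i owns (previous token, token i]. It ignores replication, so it will not match nodetool's RF-aware Effective-Ownership column. The tokens in the example are made up, since the ones in the ring output above are cut off.

```python
RING_SIZE = 2 ** 127  # assumed RandomPartitioner token range


def ownership(tokens: list[int]) -> list[tuple[int, float]]:
    """Return (token, fraction of ring) pairs in token order. Each node owns
    the range (previous token, its token]; the lowest token's range wraps
    around from the highest token. Replication is not taken into account."""
    ring = sorted(tokens)
    if len(ring) == 1:
        return [(ring[0], 1.0)]
    shares = []
    for i, token in enumerate(ring):
        prev = ring[i - 1]  # for i == 0 this is ring[-1], the wrap-around
        shares.append((token, ((token - prev) % RING_SIZE) / RING_SIZE))
    return shares


if __name__ == "__main__":
    # Hypothetical, unbalanced 3-node ring.
    for token, share in ownership([0, 10 * 2 ** 120, 90 * 2 ** 120]):
        print(f"{token:>40} {share:7.2%}")
```

For the hypothetical ring this reports roughly 29.7%, 7.8% and 62.5%. Evenly spaced tokens would give about 33.3% each.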

Re: Advice on correct storage configuration

2012-10-01 Thread Hiller, Dean
What is really going to matter is what the applications are trying to read. That is really the critical piece of context. Without knowing what the application needs to read, it is very hard to design. One example from a previous post that was a great question was… 1. I need to get the last 100 r

Getting serialized Rows from CommitLogSegment file

2012-10-01 Thread Felipe Schmidt
Hello. I'm trying to catch the serialized RowMutations from a CommitLogSegment to capture the data change, but I don't have much idea about how to proceed. Does anyone know how to do it? I assumed that it would be fairly simple. Regards, Felipe Mathias Schmidt *(Computer Science UFRGS, RS

Re: Cassandra supercolumns with same name

2012-10-01 Thread Cyril Auburtin
Yep, Tyler is right. It seems I have trailing *\u* (null) characters (one column name is mymed_embrun.ma...@gmail.com, the other mymed_embrun.ma...@gmail.com\u\u, for example). I'm trying to find out at what point they are created... Thx 2012/9/21 Tyler Hobbs > If you're seeing that in cas

[solved] Cassandra columns with same name

2012-10-01 Thread Cyril Auburtin
Well, I don't know why I couldn't reply to the initial thread... Yep, Tyler is right: I have columns with trailing null chars. mycolumn and mycolumn\u\u print the same in cassandra-cli, but not if you write the output to a file. I haven't found where they come from, but no matter, the architecture we

Cassandra vs Couchbase benchmarks

2012-10-01 Thread Andy Cobley
There are some interesting results in the benchmarks below: http://www.slideshare.net/renatko/couchbase-performance-benchmarking Without starting a flame war etc, I'm interested if these results should be considered "Fair and Balanced" or if the methodology is flawed in some way ? (for instance i

Re: Cassandra vs Couchbase benchmarks

2012-10-01 Thread Peter Lin
Here is my own experience testing couchdb versus cassandra for an internal application. My test wasn't some dummy test case, it was realistic workloads that is 95% write and 5% read. We insert data in batches to maximize throughput. The critical thing for my use case was to answer "when does the s

Re: 1000's of column families

2012-10-01 Thread Brian O'Neill
Dean, We have the same question... We have thousands of separate feeds of data as well (20,000+). To date, we've been using a CF per feed strategy, but as we scale this thing out to accommodate all of those feeds, we're trying to figure out if we're going to blow out the memory. The initial doc

Re: Cassandra vs Couchbase benchmarks

2012-10-01 Thread Michael Kjellman
From their wiki: "The replication is an incremental one way process involving two databases (a source and a destination). The aim of the replication is that at the end of the process, all active documents on the source database are also in the destination database and all documents that were delete

Re: Collections in native protocol

2012-10-01 Thread Jonathan Rudenberg
Thanks. Do you want me to open a JIRA ticket? On Oct 1, 2012, at 2:45 AM, Sylvain Lebresne wrote: > Ok, I'll look what's wrong. > > -- > Sylvain

Re: Cassandra vs Couchbase benchmarks

2012-10-01 Thread horschi
Hi Andy, things I find odd: - Replicacount=1 for mongo and couchdb. How is that a realistic benchmark? I always want at least 2 replicas for my data. Maybe that's just me. - On the Mongo Config slide they said they disabled journaling. Why do you disable all safety mechanisms that you would want i

Re: Ball is rolling on High Performance Cassandra Cookbook second edition

2012-10-01 Thread Edward Capriolo
Hello all, Work has begun on the second edition! Keep hitting me up with ideas. In particular I am looking for someone who has done work with flume+Cassandra and pig+Cassandra. Both of these topics will be covered to some extent in the second edition, but these are two instances in which I

Re: Collections in native protocol

2012-10-01 Thread Sylvain Lebresne
Sure. On Mon, Oct 1, 2012 at 5:33 PM, Jonathan Rudenberg wrote: > Thanks. Do you want me to open a JIRA ticket? > > On Oct 1, 2012, at 2:45 AM, Sylvain Lebresne wrote: > >> Ok, I'll look what's wrong. >> >> -- >> Sylvain

Re: 1000's of column families

2012-10-01 Thread Hiller, Dean
Well, I am now thinking of adding a virtual capability to PlayOrm which we currently use to allow grouping entities into one column family. Right now the CF creation comes from a single entity so this then may change for those entities that define they are in a single CF group… This should not be

read-repair and deletes / forgotten deletes

2012-10-01 Thread Hiller, Dean
I know there is a 10 day limit if you have a node out of the cluster where you better be running read-repair or you end up with forgotten deletes, but what about on a clean cluster with all nodes always available? Shouldn't the deletes eventually take place or does one have to keep running read

Re: read-repair and deletes / forgotten deletes

2012-10-01 Thread Aaron Turner
The 10 days is actually configurable... look into gc_grace. Basically, you always need to run repair once per gc_grace period. You won't see empty/deleted rows go away until they're compacted away. On Mon, Oct 1, 2012 at 6:32 PM, Hiller, Dean wrote: > I know there is a 10 day limit if you have a
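Here is a simplified Python model of the gc_grace rule Aaron describes. It assumes that compaction may purge a tombstone once gc_grace_seconds have elapsed since its local deletion time, and it uses the default of 864000 seconds (10 days). Real compaction can add further conditions, and the helper names are hypothetical. The model shows why repair matters even on a healthy cluster. Suppose a replica never saw the delete, for example because a write or hint was dropped, and is not repaired before the tombstone is purged elsewhere. Nothing is then left to shadow its old value, and the value can come back.

```python
DEFAULT_GC_GRACE_SECONDS = 864_000  # 10 days, the default gc_grace_seconds


def tombstone_purgeable(local_deletion_time: int, now: int,
                        gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    """Simplified rule: compaction may drop a tombstone once more than
    gc_grace_seconds have passed since its local deletion time (epoch seconds)."""
    return local_deletion_time + gc_grace_seconds < now


def repair_interval_safe(repair_interval_seconds: int,
                         gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    """Every node must be repaired within gc_grace_seconds, so a replica that
    missed a delete receives the tombstone before other replicas purge it."""
    return repair_interval_seconds < gc_grace_seconds


if __name__ == "__main__":
    deleted_at = 1_349_049_600  # hypothetical deletion time (epoch seconds)
    print(tombstone_purgeable(deleted_at, deleted_at + 9 * 86_400))   # False
    print(tombstone_purgeable(deleted_at, deleted_at + 11 * 86_400))  # True
    print(repair_interval_safe(7 * 86_400))   # True: weekly repair
    print(repair_interval_safe(14 * 86_400))  # False: fortnightly is too slow
```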

Re: Rebalancing cluster

2012-10-01 Thread Darvin Denmian
Hey, now I got it. I'll try it tonight :) Thanks for all replies. Regards. On Mon, Oct 1, 2012 at 11:08 AM, Hiller, Dean wrote: > Nodetool has a move command so you can move to a new better token. Read > up on the documentation there. I have not used it yet myself… good idea > to test it o

Re: read-repair and deletes / forgotten deletes

2012-10-01 Thread Hiller, Dean
Thanks, (actually knew it was configurable) BUT what I don't get is why I have to run a repair. IF all nodes became consistent on the delete, it should not be possible to get a forgotten delete, correct? The forgotten delete will only occur if I have a node down and out for 10 days and it comes ba

Re: read-repair and deletes / forgotten deletes

2012-10-01 Thread Hiller, Dean
Oh, and I have been reading Aaron Morton's article here http://thelastpickle.com/2011/05/15/Deletes-and-Tombstones/ On 10/1/12 12:46 PM, "Hiller, Dean" wrote: >Thanks, (actually knew it was configurable) BUT what I don't get is why I >have to run a repair. IF all nodes became consistent on the

Re: read-repair and deletes / forgotten deletes

2012-10-01 Thread Aaron Turner
inline... On Mon, Oct 1, 2012 at 7:46 PM, Hiller, Dean wrote: > Thanks, (actually knew it was configurable) BUT what I don't get is why I > have to run a repair. IF all nodes became consistent on the delete, it > should not be possible to get a forgotten delete, correct. The forgotten > delete w

Re: cassandra key cache question

2012-10-01 Thread Tamar Fraenkel
Created https://issues.apache.org/jira/browse/CASSANDRA-4742 Any clue regarding the first question in this thread (key cache > number of keys in CF, and not many deletes on that CF)? Thanks, *Tamar Fraenkel * Senior Software Engineer, TOK Media [image: Inline image 1] ta...@tok-media.com Tel:

Re: 1000's of column families

2012-10-01 Thread Ben Hood
Brian, On Mon, Oct 1, 2012 at 4:22 PM, Brian O'Neill wrote: > We haven't committed either way yet, but given Ed Anuff's presentation > on virtual keyspaces, we were leaning towards a single column family > approach: > http://blog.apigee.com/detail/building_a_mobile_data_platform_with_cassandra_-_

Re: 1000's of column families

2012-10-01 Thread Brian O'Neill
It's just a convenient way of prefixing: http://hector-client.github.com/hector/build/html/content/virtual_keyspaces.html -brian On Mon, Oct 1, 2012 at 4:22 PM, Ben Hood <0x6e6...@gmail.com> wrote: > Brian, > > On Mon, Oct 1, 2012 at 4:22 PM, Brian O'Neill wrote: >> We haven't committed either wa
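Here is a minimal Python sketch of the prefixing idea behind virtual keyspaces. It assumes one shared column family, modelled as a plain dict, and a fixed-length tenant prefix on every row key. `PrefixedKeyspace` is a hypothetical name; this is not Hector's VirtualKeyspace implementation.

```python
class PrefixedKeyspace:
    """Hypothetical tenant view over one shared column family: every row key
    is stored as prefix + key, and the prefix is stripped on the way out."""

    def __init__(self, prefix: bytes, store: dict):
        self.prefix = prefix  # should be fixed-length or delimited, see below
        self.store = store    # stands in for the single shared column family

    def insert(self, key: bytes, columns: dict) -> None:
        self.store.setdefault(self.prefix + key, {}).update(columns)

    def get(self, key: bytes) -> dict:
        return self.store.get(self.prefix + key, {})

    def keys(self) -> list:
        # Under RandomPartitioner prefixed keys are scattered around the ring,
        # so listing one tenant's rows means scanning the whole CF, as here.
        n = len(self.prefix)
        return [k[n:] for k in self.store if k.startswith(self.prefix)]
```

The prefix format matters. If one tenant's prefix could be the start of another's (say `feed1` and `feed10`), `keys()` would mix their rows. Fixed-length prefixes or a delimiter avoid that.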

Re: pig and widerows

2012-10-01 Thread aaron morton
That looks like it may be a bug, can you create a ticket on https://issues.apache.org/jira/browse/CASSANDRA ? Thanks - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 28/09/2012, at 7:50 AM, William Oberman wrote: > I don't want to switch my cassa

Re: How to stop streaming

2012-10-01 Thread aaron morton
I assume you have resolved this, but if someone else is looking… You can stop a validation compaction with nodetool stop. AFAIK the only way to stop streaming is to restart the node. You can also use nodetool setstreamthroughput to reduce the streaming throughput. Cheers - Aa

Re: Remove node from cluster and have it run as a single node cluster by itself

2012-10-01 Thread aaron morton
> The other nodes may be trying to connect to it - it may be listed as a > seed node on the other machines? The other nodes will be looking for it. Change the Cluster Name in the yaml file. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On

Re: 1000's of column families

2012-10-01 Thread Ben Hood
On Mon, Oct 1, 2012 at 9:38 PM, Brian O'Neill wrote: > It's just a convenient way of prefixing: > http://hector-client.github.com/hector/build/html/content/virtual_keyspaces.html So given that it is possible to use a CF per tenant, should we assume that at sufficient scale there is less

Re: Data Modeling: Comments with Voting

2012-10-01 Thread aaron morton
You cannot (and probably do not want to) sort continually when the voting is going on. You can store the votes using CounterColumnTypes in column values. When someone votes you then (somehow) queue a job that will read the vote counts for the post / comment, pivot and sort on the vote count, a
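The pivot-and-sort step Aaron mentions can be sketched in Python as follows, assuming the per-comment counter values have already been read into a dict. `ranked_comments` and `ranking_column_name` are hypothetical helpers. The second helper shows one way to encode the ranking in column names, assuming an ascending UTF8/ASCII comparator, so that a plain slice returns the most-voted comment first.

```python
def ranked_comments(vote_counts: dict, limit: int = 10) -> list:
    """Pivot counter values into a ranking: highest vote count first, ties
    broken by comment id so the order is stable."""
    ordered = sorted(vote_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [comment_id for comment_id, _ in ordered[:limit]]


def ranking_column_name(votes: int, comment_id: str, max_votes: int = 10 ** 9) -> str:
    """Column name whose ascending string order is descending vote order:
    a zero-padded inverted count, then the comment id as a tie-breaker."""
    if not 0 <= votes <= max_votes:
        raise ValueError("votes must be between 0 and max_votes")
    return f"{max_votes - votes:010d}:{comment_id}"
```

Rebuilding the ranking row from a queued job, as the message suggests, keeps sorting off the vote path. Readers may see a slightly stale order in exchange.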

Re: Prevent queries from OOM nodes

2012-10-01 Thread aaron morton
> It's not about columns, it's about rows, see example statement. My bad, misread the CQL. My jira search fu is poor, but I could not find an open ticket for paging row counts. Could you create one ? https://issues.apache.org/jira/browse/CASSANDRA Cheers - Aaron Morton Freela

Read latency issue

2012-10-01 Thread Arindam Barua
We are trying to setup a Cassandra cluster and have low read latency requirements. Running some tests, we do not see the performance that we were hoping for. Wanted to check if anyone has thoughts on: 1. If these are expected latency times for the data/machine config, etc 2. If not

RE: Data Modeling: Comments with Voting

2012-10-01 Thread Roshni Rajagopal
Hi, To explain my suggestions - my thoughts were a) you need to store entity type information about a comment like date created, comment text, commented by etc. I can't think of any other master information for a comment, but in general one starts with entities in a standard static column fam

Re: How to stop streaming

2012-10-01 Thread Senthilvel Rangaswamy
Thanks Aaron. That's what I did. Also, how do I look up the current value of setstreamthroughput? On Mon, Oct 1, 2012 at 2:32 PM, aaron morton wrote: > I assume you have resolved this but if someone else is looking… > > You can stop a validation compaction with nodetool stop > > AFAIK the onl

Re: Why data tripled in size after repair?

2012-10-01 Thread Peter Schuller
> It looks like what I need. Couple questions. > Does it work with RandomPartitioner only? I use ByteOrderedPartitioner. I believe it should work with BOP based on cursory re-examination of the patch. I could be wrong. > I don't see it as part of any release. Am I supposed to build my own > versi