Re: any ways to have compaction use less disk space?

2012-09-24 Thread Віталій Тимчишин
Why so? What are the pluses and minuses? As for me, I am looking at the number of files in a directory. 700GB/512MB*5(files per SST) = 7000 files, that is OK from my view. 700GB/5MB*5 = 700000 files, that is too much for a single directory, too much memory used for SST data, too huge a compaction queue (that le
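
The file-count estimate in this message can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function name is made up for illustration, the "5 files per SSTable" figure is taken from the message, and 1 GB is assumed to be 1024 MB, so the 5 MB case comes out at roughly 700,000 files rather than a round number.

```python
def sstable_file_count(data_gb, sstable_mb, files_per_sstable=5):
    """Estimate how many files end up in a data directory.

    data_gb           -- total data size in GB (assumed 1 GB = 1024 MB)
    sstable_mb        -- target size of one SSTable in MB
    files_per_sstable -- component files per SSTable (5, per the message)
    """
    sstables = (data_gb * 1024) // sstable_mb
    return sstables * files_per_sstable


if __name__ == "__main__":
    for size_mb in (512, 5):
        print(f"{size_mb} MB SSTables: {sstable_file_count(700, size_mb)} files")
```

With 700 GB of data, the 512 MB setting gives a few thousand files, while the 5 MB setting gives several hundred thousand, which is the gap the poster is worried about.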

DunDDD NoSQL and Big Data

2012-09-24 Thread Andy Cobley
Hi All, I'm organising the NoSQL and Big Data track at Developer Day Dundee: This is a free mini conference at Dundee University, Dundee, Scotland. For the past 2 years we've had a track on NoSQL and had some great speakers. However I don't believe we've had anyone f

Cassandra failures while moving token

2012-09-24 Thread Shashilpi Krishan
Hi, Actually the problem is that while we move the token in a 12-node cluster we observe Cassandra misses (no data as per Cassandra for the requested row key). As per our understanding, we expect that when we move a token, that node will first sync up the data as per the newly assigned token & only after

Re: compression

2012-09-24 Thread Tamar Fraenkel
Hi! I ran UPDATE COLUMN FAMILY cf_name WITH compression_options={sstable_compression:SnappyCompressor, chunk_length_kb:64}; I then ran on all my nodes (3) sudo nodetool -h localhost scrub tok cf_name I have replication factor 3. The size of the data on disk was cut in half in the first node and i

workarounds for

2012-09-24 Thread Radim Kolar
Are there any tested patches around for fixing this issue in 1.0 branch? I have to do keyspace wide flush every 30 seconds to survive delete-only workload. This is very inefficient.

Nodetool repair and Leveled Compaction

2012-09-24 Thread Sergey Tryuber
Hi Guys, We've noticed a strange behavior on our 3-node staging Cassandra cluster with RF=2 and LeveledCompactionStrategy. When we run "nodetool repair -pr" on a node, the other nodes start a "validation" process and when this process is finished one of the other 2 nodes reports that there are app

Re: Varchar indexed column and IN(...)

2012-09-24 Thread Sylvain Lebresne
On Sun, Sep 23, 2012 at 11:30 PM, aaron morton wrote: > If this is intended behavior, could somebody please point me to where this > is > documented? > > It is intended. It is not in fact. We should either refuse the query as "yet unsupported" or we should do the right thing, but returning nothin

Re: downgrade from 1.1.4 to 1.0.X

2012-09-24 Thread Arend-Jan Wijtzes
On Thu, Sep 20, 2012 at 10:13:49AM +1200, aaron morton wrote: > No. > They use different minor file versions which are not backwards compatible. Thanks Aaron. Is upgradesstables capable of downgrading the files to 1.0.8? Looking for a way to make this work. Regards, Arend-Jan > On 18/09/2012

[BETA RELEASE] Apache Cassandra 1.2.0-beta1 released

2012-09-24 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of the first beta for the future Apache Cassandra 1.2.0. Let me first stress that this is beta software and as such is *not* ready for production use. The goal of this release is to give a preview of what will become Cassandra 1.2 and to get w

Re: Correct model

2012-09-24 Thread Marcelo Elias Del Valle
2012/9/23 Hiller, Dean > You need to split data among partitions or your query won't scale as more > and more data is added to table. Having the partition means you are > querying a lot less rows. > This will happen in case I can query just one partition. But if I need to query things in multipl

Re: Nodetool repair and Leveled Compaction

2012-09-24 Thread Radim Kolar
> Repair process by itself is going well in the background, but the issue I'm concerned about is a lot of unnecessary compaction tasks The number in the compaction tasks counter is overestimated. For example, I have 1100 tasks left, and if I stop inserting data, all tasks will finish within 30 minutes. I

Re: Correct model

2012-09-24 Thread Hiller, Dean
I am confused. In this email you say you want "get all requests for a user" and in a previous one you said "Select all the users which has new requests, since date D" so let me answer both… For the latter, you make ONE query into the latest partition (ONE partition) of the GlobalRequestsCF which gi

Re: Correct model

2012-09-24 Thread Marcelo Elias Del Valle
2012/9/24 Hiller, Dean > I am confused. In this email you say you want "get all requests for a > user" and in a previous one you said "Select all the users which has new > requests, since date D" so let me answer both… > I have both needs. These are the two queries I need to perform on the mode

RE: Cassandra Counters

2012-09-24 Thread Roshni Rajagopal
Hi folks, I looked at my mail below, and I'm rambling a bit, so I'll try to re-state my queries pointwise. a) What are the performance tradeoffs on reads & writes between creating a standard column family and manually doing the counts by a lookup on a key, versus using counters? b) what's the


2012-09-24 Thread Shailesh Bagad

Re: Correct model

2012-09-24 Thread Hiller, Dean
> Oh, ok, you were talking about the wide row pattern, right? Yes. > But playORM is compatible with Aaron's model, isn't it? Not yet; PlayOrm supports partitioning one table multiple ways as it indexes the columns (in your case, the userid FK column and the time column). > Can I map exactly this using

RE: Cassandra Counters

2012-09-24 Thread Milind Parikh
IMO You would use Cassandra Counters (or other variation of distributed counting) in case of having determined that a centralized version of counting is not going to work. You'd determine the non_feasibility of centralized counting by figuring the speed at which you need to sustain writes and rea

Re: Correct model

2012-09-24 Thread Marcelo Elias Del Valle
Dean, There is one last thing I would like to ask about playOrm on this list; the next questions will come via stackOverflow. Just because of the context, I prefer asking this here: When you say playOrm indexes a table (which would be a CF behind the scenes), what do you mean? PlayOrm will

Re: Correct model

2012-09-24 Thread Hiller, Dean
> PlayOrm will automatically create a CF to index my CF? It creates 3 CFs for all indices, IntegerIndice, DecimalIndice, and StringIndice, such that the ad-hoc tool that is in development can display the indices, as it knows the prefix of the composite column name is of Integer, Decimal or String

Re: Is it possible to create a schema before a Cassandra node starts up ?

2012-09-24 Thread Rob Coli
On Fri, Sep 14, 2012 at 7:05 AM, Xu, Zaili wrote: > I am pretty new to Cassandra. I have a script that needs to set up a schema > first before starting up the cassandra node. Is this possible ? Can I create > the schema directly on cassandra storage and then when the node starts up it > will pick

Re: Correct model

2012-09-24 Thread Marcelo Elias Del Valle
Dean, this sounds like magic :D I don't know details about the performance on the index implementations you chose, but it would pay the way to use it in my case, as I don't need the best performance in the world when reading, but I need to assure scalability and have a simple model to maintain. I l

Prevent queries from OOM nodes

2012-09-24 Thread Bryce Godfrey
Is there anything I can do on the configuration side to prevent nodes from going OOM due to queries that will read large amounts of data and exceed the heap available? For the past few days we have had some nodes consistently freezing/crashing with OOM. We got a heap dump into MAT and figured ou

Cassandra compression not working?

2012-09-24 Thread Michael Theroux
Hello, We are running into an unusual situation that I'm wondering if anyone has any insight on. We've been running a Cassandra cluster for some time, with compression enabled on one column family in which text documents are stored. We enabled compression on the column family, utilizing the S

Re: Cassandra compression not working?

2012-09-24 Thread Mike
I forgot to mention we are running Cassandra 1.1.2. Thanks, -Mike On Sep 24, 2012, at 5:00 PM, Michael Theroux wrote: > Hello, > > We are running into an unusual situation that I'm wondering if anyone has any > insight on. We've been running a Cassandra cluster for some time, with > compres

performance for different kinds of row keys

2012-09-24 Thread Marcelo Elias Del Valle
Suppose two cases: 1. I have a Cassandra column family with non-composite row keys = incremental id 2. I have a Cassandra column family with composite row keys = incremental id 1 : group id Which one will be faster to insert? And which one will be faster to read by incremental

Re: Code example for CompositeType.Builder and SSTableSimpleUnsortedWriter

2012-09-24 Thread Edward Kibardin
Hey... From my understanding, there are several ways to use composites with SSTableSimpleUnsortedWriter but which is the best? And as usual, code examples are welcome ;) Thanks in advance! On Thu, Sep 20, 2012 at 11:23 PM, Edward Kibardin wrote: > Hi Everyone, > > I'm writing a conversion too

Re: any ways to have compaction use less disk space?

2012-09-24 Thread Aaron Turner
On Mon, Sep 24, 2012 at 10:02 AM, Віталій Тимчишин wrote: > Why so? > What are the pluses and minuses? > As for me, I am looking at the number of files in a directory. > 700GB/512MB*5(files per SST) = 7000 files, that is OK from my view. > 700GB/5MB*5 = 700000 files, that is too much for a single directory,

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-24 Thread Edward Capriolo
Haha Ok. It is not a total waste, but practically your time is better spent in other places. The problem is just about everything is a moving target, schema, request rate, hardware. Generally tuning nudges a couple variables in one direction or the other and you see some decent returns. But each nu


2012-09-24 Thread Siddiqui, Akmal
Thanks, Akmal


2012-09-24 Thread Fred Groen

Re: Secondary index loss on node restart

2012-09-24 Thread aaron morton
Can you contribute your experience to this ticket ? Thanks - Aaron Morton Freelance Developer @aaronmorton On 24/09/2012, at 6:22 AM, Michael Theroux wrote: > Hello, > > We have been noticing

Re: any ways to have compaction use less disk space?

2012-09-24 Thread Edward Capriolo
If you are using ext3 there is a hard limit on the number of files in a directory of 32K. ext4 has a much higher limit (can't remember exactly, IIRC). So true that having many files is not a problem for the file system, though your VFS cache could be less efficient since you would have a higher inode->dat
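
To see how close a data directory is to a per-directory limit like the one mentioned in this message, the files can simply be counted. The following is a minimal sketch: the function names are hypothetical, and the default limit of 32,000 is only the "32K" figure quoted by the poster, not a value verified against any particular file system.

```python
import os


def count_files(directory):
    """Count regular files directly inside `directory` (non-recursive)."""
    with os.scandir(directory) as entries:
        return sum(1 for entry in entries if entry.is_file())


def exceeds_limit(directory, limit=32_000):
    """Return True if the directory holds more files than `limit`.

    The default limit is an assumption taken from the message above.
    """
    return count_files(directory) > limit
```

A call such as `count_files("/var/lib/cassandra/data/my_keyspace")` (the path is only an example) would give the current number of SSTable component files for one keyspace directory.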

Re: [problem with OOM in nodes]

2012-09-24 Thread aaron morton
> What exactly is the problem with big rows? During compaction the row will be passed through a slower two-pass processing; this adds to IO pressure. Counting big rows requires that the entire row be read. Repairing big rows requires that the entire row be repaired. I generally avoid rows abo

Re: Cassandra compression not working?

2012-09-24 Thread Fred Groen
You are going to need a fully optimized flux-capacitor for that. On Tue, Sep 25, 2012 at 5:00 AM, Michael Theroux wrote: > Hello, > > We are running into an unusual situation that I'm wondering if anyone has > any insight on. We've been running a Cassandra cluster for some time, with > compressi

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-24 Thread Peter Schuller
> It is not a total waste, but practically your time is better spent in other > places. The problem is just about everything is a moving target, schema, > request rate, hardware. Generally tuning nudges a couple variables in one > direction or the other and you see some decent returns. But each nud


2012-09-24 Thread Vijay
Hi Manu, Glad that you have the issue resolved. If I understand the issue correctly, your Cassandra installation had RandomPartitioner but the bulk loader configuration (cassandra.yaml) had Murmur3Partitioner? By fixing the cassandra.yaml for the bulk loader the issue got resolved? If not then

RE: Cassandra Counters

2012-09-24 Thread Roshni Rajagopal
Thanks Milind, Has anyone implemented counting in a standard column family in Cassandra, where you can have increments and decrements to the count? Any comparisons in performance to using counter column families? Regards, Roshni Date: Mon, 24 Sep 2012 11:02:51 -0700 Subject: RE: Cassandra Counters


2012-09-24 Thread Manu Zhang
I had Murmur3Partitioner for both of them, otherwise bulk loader would have complained since I put them under the same project. I saw some negative token issues of Murmur3Partitioner on JIRA recently so I moved back to RandomPartitioner. Thanks for your concern On Tue, Sep 25, 2012 at 12:49 PM,

Re: Cassandra Counters

2012-09-24 Thread Oleksandr Petrov
Maybe I'm missing the point, but counting in a standard column family would be a little overkill. I assume that "distributed counting" here was more of a map/reduce approach, where Hadoop (+ Cascading, Pig, Hive, Cascalog) would help you a lot. We're doing some more complex counting (e.g. based on

Re: Nodetool repair and Leveled Compaction

2012-09-24 Thread Sergey Tryuber
Hi Radim, Unfortunately the number of compaction tasks is not overestimated. The number is decremented one-by-one and this process takes several hours for our 40GB node(( Also, when a lot of compaction tasks appear, we see that total disk space used (via JMX) is doubled and Cassandra really tries to c

RE: Cassandra Counters

2012-09-24 Thread Roshni Rajagopal
Thanks for the reply and sorry for being bull-headed. Once you're past the stage where you've decided it's distributed, and NoSQL, and Cassandra out of all the NoSQL options, now to count something, you can do it in different ways in Cassandra. In all the ways you want to use cassandra's best f