Re: Newbie question about writer/reader consistency

2011-12-27 Thread Vladimir Mosgalin
Hi R. Verlangen! On 2011.12.27 at 15:50:24 +0100, R. Verlangen wrote next: > You might consider a hybrid solution with a transactional db for all data > that should be ACID compliant and Cassandra for the huge amounts of data > you want to store. > > 2011/12/27 Radim Kolar > > > > > makes me

Re: Presentations from NYC?

2011-12-27 Thread Brian O'Neill
Yep. They put them up here: http://www.datastax.com/events/cassandranyc2011/presentations -brian On Dec 27, 2011, at 4:52 AM, Alain RODRIGUEZ wrote: > Anything new about this ? > > I'm specifically interested in the Joe Stein (Medialets) talk about how to > manage real-time multidimensional

Re: better anti OOM

2011-12-27 Thread Radim Kolar
I don't know what you are basing that on. It seems unlikely to me that the working set of a compaction is 600 MB. However, it may very well be that the allocation rate is such that it contributes to an additional 600 MB average heap usage after a CMS phase has completed. I will investigate situa

Previously deleted rows resurrected by repair?

2011-12-27 Thread Jonas Borgström
Hi, I have a 3-node cluster running Cassandra 1.0.3 and using replication factor=3. Recently I've noticed that some previously deleted rows have started to reappear for some reason. And now I wonder if this is a known issue with 1.0.3? Repairs have been running every weekend (gc_grace is 1

Re: Newbie question about writer/reader consistency

2011-12-27 Thread R. Verlangen
You might consider a hybrid solution with a transactional db for all data that should be ACID compliant and Cassandra for the huge amounts of data you want to store. 2011/12/27 Radim Kolar > > makes me feel disappointed about consistency in Cassandra, but I wonder if >> there is a way to work a

Re: Peregrine: A new map reduce framework for iterative/pipelined jobs.

2011-12-27 Thread Brian O'Neill
Kevin, I just pulled the code and read through the design. Great stuff. Any thought to potentially using this for real-time processing as well? Right now, we have a set of Hadoop M/R jobs that operate against Cassandra for ETL. We were looking at using Storm for the real-time processing side

Re: Presentations from NYC?

2011-12-27 Thread Alain RODRIGUEZ
Anything new about this ? I'm specifically interested in the Joe Stein (Medialets) talk about how to manage real-time multidimensional metrics. 2011/12/10 Jonathan Ellis > Not yet -- we're working on it. > > On Fri, Dec 9, 2011 at 1:48 PM, Brian O'Neill > wrote: > > > > I may have missed it..

index sampling

2011-12-27 Thread Radim Kolar
> That is a good reason for both to be configurable IMO. Index sampling is currently configurable only per node; it would be better to have it per keyspace, because we are using OLTP-like and OLAP keyspaces in the same cluster. OLAP keyspaces have about 1000x more rows. But it's difficult to estimate

simplest example of a query by date range

2011-12-27 Thread Michael Cetrulo
I want to store an ID and a date and I want to retrieve all entries from dateA up to dateB, what exactly do I need to be able to perform: select from my_column_family where date >= dateA and date < dateB; @so: http://stackoverflow.com/q/8638646/226201
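A minimal sketch of the idea behind such a range query, not Cassandra client code: one common pattern is to keep the dates as sorted column names within a row, so that "dateA up to dateB" becomes a slice over a sorted sequence. The `DateIndex` class and its method names below are hypothetical and only model that sorted-slice behaviour in plain Python.

```python
from bisect import bisect_left, insort
from datetime import date


class DateIndex:
    """Toy model of one wide row whose columns are kept sorted by date.

    Hypothetical helper for illustration only; it is not a Cassandra API.
    """

    def __init__(self):
        # Sorted list of (date, entry_id) tuples.
        self._columns = []

    def insert(self, when, entry_id):
        # insort keeps the list ordered by date, then by entry_id.
        insort(self._columns, (when, entry_id))

    def range_query(self, date_a, date_b):
        """Return entries with date_a <= date < date_b."""
        # A 1-tuple (d,) sorts before every (d, entry_id), so bisect_left
        # lands on the first entry whose date is >= d.
        lo = bisect_left(self._columns, (date_a,))
        hi = bisect_left(self._columns, (date_b,))
        return self._columns[lo:hi]


index = DateIndex()
index.insert(date(2011, 12, 27), "c")
index.insert(date(2011, 12, 25), "a")
index.insert(date(2011, 12, 26), "b")
print(index.range_query(date(2011, 12, 25), date(2011, 12, 27)))
```

The half-open interval mirrors the `date >= dateA and date < dateB` condition in the question.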

Re: Peregrine: A new map reduce framework for iterative/pipelined jobs.

2011-12-27 Thread Kevin Burton
> A key innovation here is a partitioning layout algorithm that can support >> fast >> many to many recovery similar to HDFS but still support partitioned >> operation >> with deterministic key placement. >> > > Thanks for your contribution. > > Is there more detailed info on this point? > yes... our

Re: Peregrine: A new map reduce framework for iterative/pipelined jobs.

2011-12-27 Thread Zhu Han
On Tue, Dec 27, 2011 at 2:31 PM, Kevin Burton wrote: > > I'm pleased to announce Peregrine 0.5.0 - a new map reduce framework > optimized > for iterative and pipelined map reduce jobs. > > http://peregrine_mapreduce.bitbucket.org/ > > This originally started off with some internal work at Spinn3r

Re: Newbie question about writer/reader consistency

2011-12-27 Thread Radim Kolar
makes me feel disappointed about consistency in Cassandra, but I wonder if there is a way to work around it. Cassandra is not suitable for this kind of program. CouchDB is slightly better; it has transactions but no locking, and I am not sure if transaction isolation is supported now. mongodb

Re: better anti OOM

2011-12-27 Thread Peter Schuller
> I will investigate situation more closely using gc via jconsole, but isn't > bloom filter for new sstable entirely in memory? On disk there are only 2 > files Index and Data. > -rw-r--r--  1 root  wheel   1388969984 Dec 27 09:25 > sipdb-tmp-hc-4634-Index.db > -rw-r--r--  1 root  wheel  1096522137

Re: Newbie question about writer/reader consistency

2011-12-27 Thread Radim Kolar
> But is there any way of implementing minimum required ACID subset on top of Cassandra? Try this; it's a NoSQL store that is ACID compliant. I haven't tested it; it will most likely have pretty slow writes and a lot of bugs, like any other Oracle application. http://www.oracle.com/technetwork/database/nosqld

Re: index sampling

2011-12-27 Thread Peter Schuller
> on node with 300m rows (small node), it will be 585937 index sample entries > with 512 sampling. lets say 100 bytes per entry this will be 585 MB, bloom > filters are 884 MB. With default sampling 128, sampled entries will use > majority of node memory. Index sampling should be reworked like bloo
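The estimate quoted above can be checked with a few lines of arithmetic. This is only a sketch: the 100 bytes per sampled entry is the poster's assumption, and the function name is hypothetical. Under that assumption, 585,937 entries come to roughly 58.6 MB; a figure of 585 MB would correspond to about 1,000 bytes per entry.

```python
def index_sample_bytes(row_count, sampling_interval, bytes_per_entry):
    """Return (sampled entries, estimated bytes) for one node.

    One index entry is sampled for every `sampling_interval` rows.
    """
    entries = row_count // sampling_interval
    return entries, entries * bytes_per_entry


rows = 300_000_000  # "300m rows" from the message
for interval in (512, 128):
    entries, size = index_sample_bytes(rows, interval, 100)
    print(f"interval={interval}: {entries} entries, {size / 1e6:.1f} MB")
```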

will compaction delete empty rows after all columns expired?

2011-12-27 Thread Feng Qu
Compaction should delete empty rows once gc_grace_seconds is passed, right?    Feng Qu

Re: will compaction delete empty rows after all columns expired?

2011-12-27 Thread Peter Schuller
> Compaction should delete empty rows once gc_grace_seconds is passed, right? Yes. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: will compaction delete empty rows after all columns expired?

2011-12-27 Thread Peter Schuller
>> Compaction should delete empty rows once gc_grace_seconds is passed, right? > > Yes. But just to be extra clear: Data will not actually be removed until the row in question participates in compaction. Compactions will not be actively triggered by Cassandra for tombstone processing reasons. --
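The two conditions discussed in this thread can be written as a small predicate. This is an illustrative sketch rather than Cassandra's actual code: the function name, its parameters, and the exact boundary comparison are assumptions.

```python
def tombstone_purgeable(deleted_at, gc_grace_seconds, now, in_compaction):
    """Return True when a deleted row could be dropped from disk.

    Both conditions must hold: the grace period has elapsed, and the row
    is part of a compaction that is actually running. Times are in seconds.
    """
    grace_elapsed = now >= deleted_at + gc_grace_seconds
    return grace_elapsed and in_compaction


# Grace period passed, but no compaction has touched the row yet.
print(tombstone_purgeable(0, 864000, 900000, in_compaction=False))
# Grace period passed and the row participates in a compaction.
print(tombstone_purgeable(0, 864000, 900000, in_compaction=True))
```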

Re: will compaction delete empty rows after all columns expired?

2011-12-27 Thread Radim Kolar
But just to be extra clear: Data will not actually be removed until the row in question participates in compaction. Compactions will not be actively triggered by Cassandra for tombstone processing reasons. Leveled compaction is really good for this because it compacts often

improving cassandra-vs-mongodb-vs-couchdb-vs-redis

2011-12-27 Thread Igor Lino
Hi! I was trying to get an understanding of the real strengths of Cassandra against other competitors. It's actually not that simple and depends a lot on the details of the actual requirements. Reading the following comparison: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis It felt like

Re: improving cassandra-vs-mongodb-vs-couchdb-vs-redis

2011-12-27 Thread Edward Capriolo
This is not really a comparison of anything because each NoSQL has its own bullet points like: Boats great for traveling on water Cars great for traveling on land So the conclusion I should gather is? Also as for the Cassandra bullet points, they are really thin (and wrong). Such as: Cassan

Re: improving cassandra-vs-mongodb-vs-couchdb-vs-redis

2011-12-27 Thread Peter Schuller
> Also when comparing these technologies very subtle differences in design > have profound effects on operation and performance. Thus someone trying > to paper over 6 technologies and compare them with a few bullet points is > really doing the world an injustice. +1. Same goes for 99% of all be

Restart for change of endpoint_snitch ?

2011-12-27 Thread A J
If I change endpoint_snitch from SimpleSnitch to PropertyFileSnitch, does it require restart of cassandra on that node ? Thanks.

new configurable bloom filters - coming soon

2011-12-27 Thread Radim Kolar
demo, it will be in cassandra 1.0.7 standard cassa bloom filter -rw-r--r-- 1 root wheel 19307376721 Dec 27 20:06 sipdb-hc-4634-Data.db -rw-r--r-- 1 root wheel 63 Dec 27 20:06 sipdb-hc-4634-Digest.sha1 -rw-r--r-- 1 root wheel 770714896 Dec 27 20:06 sipdb-hc-4634-Filter.db -rw

Re: better anti OOM

2011-12-27 Thread Radim Kolar
> How large are the bloom filters in total? I.e., sizes of the *-Filter.db files. On a moderate node about 6.5 GB, index sampling will be about 4 GB, heap 12 GB. > In general, don't expect to be able to run at close to heap capacity; there *will* be spikes. I try to tune for 80% of heap.

Re: better anti OOM

2011-12-27 Thread Peter Schuller
>> In general, don't expect to be able to run at close to heap capacity; >> there *will* be spikes. > I try to tune for 80% of heap. Just FYI, my guess is that at 80% target heap usage you're likely to see fallbacks to full compacting GCs. If you are doing analytics only and aren't latency critical,

Re: improving cassandra-vs-mongodb-vs-couchdb-vs-redis

2011-12-27 Thread Igor Lino
You are totally right. I'm far from being an expert on the subject, but the comparison felt inconsistent and incomplete. (I could not express that in my 1st email, not to bias the opinion) Do you know of any similar comparison, which is not biased towards some particular technology or solution

Re: improving cassandra-vs-mongodb-vs-couchdb-vs-redis

2011-12-27 Thread CharSyam
Don't trust NoSQL benchmarks. They are not lies, but NoSQL stores perform differently in different environments. Benchmark in your real environment, then choose. Thank you. 2011/12/28 Igor Lino > You are totally right. I'm far from being an expert on the subject, but > the comparison fe

Re: Restart for change of endpoint_snitch ?

2011-12-27 Thread Peter Schuller
> If I change endpoint_snitch from SimpleSnitch to PropertyFileSnitch, > does it require restart of cassandra on that node ? Yes. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: better anti OOM

2011-12-27 Thread Edward Capriolo
I do major compactions and I have run into bloom filters causing OOM. One trick I did was using nodetool to lower the size of row/key caches before triggering the compaction and raising them after the compaction finished. As suggested, running with spare heap is a very good idea; it lowers the chance of a sto

cassandra site wsod's /mysql site functions

2011-12-27 Thread Tim Dunphy
Hello, I am new to the world of non-relational databases. Cassandra is refreshingly easy to set up and has a great command line environment. I genuinely like the command line tools and look forward to learning more. However I have been asked to set up a PHP/Cassandra site that also has some MySQL