On 2010-05-05 04:50, Denis Haskin wrote:
> I've been reading everything I can get my hands on about Cassandra and
> it sounds like a possibly very good framework for our data needs; I'm
> about to take the plunge and do some prototyping, but I thought I'd
> see if I can get a reality check here on whether it makes sense.
I've been reading everything I can get my hands on about Cassandra and
it sounds like a possibly very good framework for our data needs; I'm
about to take the plunge and do some prototyping, but I thought I'd
see if I can get a reality check here on whether it makes sense.
Our schema should be fai
I would lightly hack sstable2json to write rows to the other cluster,
instead of spitting them out as json. That would be a pretty simple
modification.
On Tue, May 4, 2010 at 9:21 PM, Joost Ouwerkerk wrote:
> I want to export data from one Cassandra cluster (production) to
> another (development).
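For concreteness, a minimal sketch of the import half of that hack, assuming
the stock 0.6 Thrift API; the host, keyspace, and column family names are
placeholders, and the code that decodes rows from the sstable (or its JSON
dump) is elided:

    // Sketch only: push one decoded row's columns into the target cluster
    // through the 0.6 Thrift client. Host, keyspace, and CF are made up.
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import java.util.Map;

    public class RowPusher {
        public static void pushRow(String key, Map<byte[], byte[]> columns)
                throws Exception {
            TSocket socket = new TSocket("dev-node", 9160); // hypothetical target
            Cassandra.Client client =
                    new Cassandra.Client(new TBinaryProtocol(socket));
            socket.open();
            for (Map.Entry<byte[], byte[]> col : columns.entrySet()) {
                ColumnPath path = new ColumnPath();
                path.setColumn_family("Standard1");         // hypothetical CF
                path.setColumn(col.getKey());
                // Microsecond timestamps are the usual convention.
                client.insert("Keyspace1", key, path, col.getValue(),
                              System.currentTimeMillis() * 1000,
                              ConsistencyLevel.QUORUM);
            }
            socket.close();
        }
    }

One insert per column keeps the sketch simple; batch_mutate (or batch_insert)
would cut round trips if volume matters.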
On Tue, May 4, 2010 at 8:09 PM, Weijun Li wrote:
> Does anyone use binary memtable to import data into Cassandra?
Yes.
> When you do
> this, how do you determine the destination node that should own the data?
You let the StorageProxy API figure that out.
> Is replication factor taken into consideration when you import binary memtable?
The Streaming service is what moves data around for load balancing,
bootstrap, and decommission operations.
On Tue, May 4, 2010 at 8:08 PM, Weijun Li wrote:
> A dumb question: what is the use of Cassandra streaming service? Any use
> case or example?
>
> Thanks,
> -Weijun
>
--
Jonathan Ellis
On Tue, May 4, 2010 at 4:55 PM, David Rosenstrauch wrote:
> I've had some neat ideas that I'd like to tinker with for a distributed DB
> that implements a very different data model than Cassandra. However, I
> obviously don't want to reinvent the wheel - particularly because in the
> case of dist
get_range_slices with an empty list of column names should work
On Tue, May 4, 2010 at 3:02 PM, Chris Dean wrote:
> I have a ColumnFamily with a small number of keys, but each key has a
> large number of columns.
>
> What's the best way to get just the keys back? I don't want to load all
> the columns if I don't have to.
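Spelled out, a sketch of that call against the 0.6 Thrift API; "client" is an
already-open Cassandra.Client, and the keyspace/CF names are placeholders.
An empty column_names list asks for zero columns per row, so only keys come back:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.cassandra.thrift.*;

    SlicePredicate predicate = new SlicePredicate();
    predicate.setColumn_names(new ArrayList<byte[]>()); // empty: keys only

    ColumnParent parent = new ColumnParent();
    parent.setColumn_family("Standard1");

    KeyRange range = new KeyRange();
    range.setStart_key("");
    range.setEnd_key("");
    range.setCount(100);                                // page size

    List<KeySlice> slices = client.get_range_slices(
            "Keyspace1", parent, predicate, range, ConsistencyLevel.ONE);
    for (KeySlice slice : slices)
        System.out.println(slice.getKey());

Note that under the RandomPartitioner the keys come back in token order, not
sorted key order.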
I have a situation where I need to accumulate values in Cassandra on an
ongoing basis. Atomic increments are still in the works apparently (see
https://issues.apache.org/jira/browse/CASSANDRA-721) so for the time being
I'll be using Hadoop, and attempting to feed in both the existing values and
th
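The interim pattern that message describes is a plain read-modify-write; a
sketch against the 0.6 Thrift API, safe only while a single writer owns each
key (e.g. one reducer per key in the Hadoop job). "client", "rowKey", and
"delta" are assumed to exist; the keyspace and CF names are placeholders:

    // Non-atomic accumulate: read the old value, add, write back.
    long current = 0;
    ColumnPath path = new ColumnPath();
    path.setColumn_family("Counters");              // hypothetical CF
    path.setColumn("total".getBytes());
    try {
        ColumnOrSuperColumn cosc =
                client.get("Keyspace1", rowKey, path, ConsistencyLevel.QUORUM);
        current = Long.parseLong(new String(cosc.getColumn().getValue()));
    } catch (NotFoundException e) {
        // first write for this key
    }
    client.insert("Keyspace1", rowKey, path,
                  Long.toString(current + delta).getBytes(),
                  System.currentTimeMillis() * 1000, ConsistencyLevel.QUORUM);

Two concurrent writers would race here, which is exactly what CASSANDRA-721
is about.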
On Tue, May 4, 2010 at 8:50 PM, Jonathan Ellis wrote:
> Yes, although when and where are TBD.
>
Having it the day before/after the Velocity conference at the end of June
would be ideal (hint, hint). I'm sure a lot of people with interest
in Cassandra will be in the area.
Did removing Trove collections have a noticeable effect on performance
or memory use at the time?
On Tuesday, May 4, 2010, Avinash Lakshman wrote:
> Hahaha, Jeff - I remember scampering to remove those references to the Trove
> maps, I think around 2 years ago.
> Avinash
>
> On Tue, May 4, 2010
I want to export data from one Cassandra cluster (production) to
another (development). This is not a case of replication, because I
just want a snapshot, not a continuous synchronization. I guess my
options include 'nodetool snapshot' and 'sstable2json'. In our case,
however, the development cl
Ah! Thank you.
Explained better here:
http://www.slideshare.net/benjaminblack/introduction-to-cassandra-replication-and-consistency
On Tue, May 4, 2010 at 8:38 PM, Robert Coli wrote:
> On 5/4/10 7:16 AM, Jonathan Shook wrote:
>
>> I may be wrong here. Someone please correct me if I am.
>> ...
>>
On 5/4/10 7:16 AM, Jonathan Shook wrote:
I may be wrong here. Someone please correct me if I am.
...
The ability to set the replication factor on inserts and gets allows
you to decide when (if) and how much (little) to pay the price for
consistency.
You mean "Consistency Level", not "Replicati
Does anyone use binary memtable to import data into Cassandra? When you do
this, how do you determine the destination node that should own the data?
Is replication factor taken into consideration when you import binary
memtable?
Thanks,
-Weijun
A dumb question: what is the use of Cassandra streaming service? Any use
case or example?
Thanks,
-Weijun
Yes, although when and where are TBD.
On Tue, May 4, 2010 at 7:38 PM, Mark Greene wrote:
> Jonathan,
> Awesome! Any plans to offer this training again in the future for those of
> us who can't make it this time around?
> -Mark
>
> On Tue, May 4, 2010 at 5:07 PM, Jonathan Ellis wrote:
>>
>> I'll
Jonathan,
Awesome! Any plans to offer this training again in the future for those of
us who can't make it this time around?
-Mark
On Tue, May 4, 2010 at 5:07 PM, Jonathan Ellis wrote:
> I'll be running a day-long Cassandra training class on Friday, May 21.
> I'll cover
>
> - Installation and configuration
;) yeah, it was painful
On Tue, May 4, 2010 at 10:53 AM, Avinash Lakshman <
avinash.laksh...@gmail.com> wrote:
> Hahaha, Jeff - I remember scampering to remove those references to the
> Trove maps, I think around 2 years ago.
>
> Avinash
>
>
> On Tue, May 4, 2010 at 2:34 AM, Jeff Hammerbacher wrote:
You didn't miss anything. There aren't many .bat files yet.
On Tue, May 4, 2010 at 6:29 PM, Dop Sun wrote:
> Hi,
>
>
>
> As of 0.6.1, I don’t find sstable2json.bat. I don’t know if I missed
> anything?
>
>
>
> It would be good if we could have one, which can help import/export data in/out of a
> development machine.
Hi,
As of 0.6.1, I don't find sstable2json.bat. I don't know if I missed
anything?
It would be good if we could have one, which can help import/export data in/out of a
development machine.
Thanks,
Regards,
Dop
I've had some neat ideas that I'd like to tinker with for a distributed
DB that implements a very different data model than Cassandra. However,
I obviously don't want to reinvent the wheel - particularly because in
the case of distributed systems, the wheel is quite complicated and hard
to get
On Tue, May 4, 2010 at 4:17 PM, aaron wrote:
> I was noticing cases under the random partitioner where keys I expected to
> be returned
> were not. Can you give a little advice on the expected behaviour of
> get_range_slices
> with the RP and I'll try to write a JUnit for it. e.g. Is it essentiall
More insight for this sstable: the ArrayList for IndexSummary has 644195
entries, so the total number of entries for this sstable is 644195*128 ≈ 82 mil.
The problem is that the total number of bits for its BloomFilter (long[19400551]
inside BitSet) is 19400551*64 = 1241635264, which means each key is taking
~15 bits.
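For the record, the arithmetic behind those numbers: 644195 index entries *
128 keys/entry = 82,456,960 keys, and 19400551 longs * 64 bits/long =
1,241,635,264 bits, so 1,241,635,264 / 82,456,960 ≈ 15 bits per key.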
Thanks Jonathan.
After looking at the Lucandra code I realized my confusion has to do with
get_range_slices
and the RandomPartitioner. When I switched to the OPP I got the expected
behaviour.
I was noticing cases under the random partitioner where keys I expected to
be returned
were not.
On 4 May 2010 23:07, "Jonathan Ellis" wrote:
I'll be running a day-long Cassandra training class on Friday, May 21.
I'll cover
- Installation and configuration
- Application design
- Basics of Cassandra internals
- Operations
- Tuning and troubleshooting
Details at http://riptanobayarea20100521.eventbrite.com/
I'll be running a day-long Cassandra training class on Friday, May 21.
I'll cover
- Installation and configuration
- Application design
- Basics of Cassandra internals
- Operations
- Tuning and troubleshooting
Details at http://riptanobayarea20100521.eventbrite.com/
--
Jonathan Ellis
Project C
BloomFilter is not redundant, because it stores information about
_all_ keys, while the index summary stores only every 128th key.
On Tue, May 4, 2010 at 3:47 PM, Weijun Li wrote:
> Hello,
>
> We stored about 47mil keys in one Cassandra node and what a memory dump
> shows for one of the SSTableReaders:
Hello,
We stored about 47mil keys in one Cassandra node; here is what a memory dump
shows for one of the SSTableReaders:
SSTableReader: 386MB. Among this 386MB, IndexSummary takes about 231MB
but BloomFilter takes 155MB with an embedded huge array long[19.4mil].
It seems that BloomFilter is taking
I have a ColumnFamily with a small number of keys, but each key has a
large number of columns.
What's the best way to get just the keys back? I don't want to load all
the columns if I don't have to. There also aren't necessarily any column
names in common between the different rows.
Cheers,
Chri
I'm using Ubuntu 8.04 on 64-bit hosts on Rackspace Cloud.
I'm in the middle of repeating some perf tests, but so far, I get as-good or
slightly better read perf by using standard disk access mode vs mmap. So far,
consecutive tests are returning consistent numbers.
I'm not sure how to explain it...
It's a 64-bit host.
When I cancel mmap I see less memory used and zero swapping, but it's slowly
growing, so I'll have to wait and see.
Performance isn't much better, not sure what's the bottleneck now (could
also be the application).
Now on the same host I see:
top - 15:43:59 up 12 days, 4:23, 1
Are you using 32-bit hosts? If not, don't be scared of mmap using a
lot of address space; you have plenty. It won't make you swap more
than using buffered i/o.
On Tue, May 4, 2010 at 1:57 PM, Ran Tavory wrote:
> I canceled mmap and indeed memory usage is sane again. So far performance
> hasn't been great, but I'll wait and see.
You could try mmap_index_only - this would restrict mmap usage to the
index files.
-Nate
On Tue, May 4, 2010 at 11:57 AM, Ran Tavory wrote:
> I canceled mmap and indeed memory usage is sane again. So far performance
> hasn't been great, but I'll wait and see.
> I'm also interested in a way to cap mmap so I can take advantage of it but
> not swap the host to death...
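For anyone hunting for the knob: this is the DiskAccessMode setting in
storage-conf.xml, e.g.

    <DiskAccessMode>mmap_index_only</DiskAccessMode>

The other accepted values are auto, mmap, and standard.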
On Tue, May 4, 2010 at 2:57 PM, Ran Tavory wrote:
> I'm also interested in a way to cap mmap so I can take advantage of it but
> not swap the host to death...
>
Isn't the point of mmap() to just directly access a file as if it were
memory? I can see how it would fool the reporting tools into thi
I canceled mmap and indeed memory usage is sane again. So far performance
hasn't been great, but I'll wait and see.
I'm also interested in a way to cap mmap so I can take advantage of it but
not swap the host to death...
On Tue, May 4, 2010 at 9:38 PM, Kyusik Chung wrote:
> This sounds just like the slowness I was asking about in another thread.
This sounds just like the slowness I was asking about in another thread - after
a lot of reads, the machine uses up all available memory on the box and then
starts swapping.
My understanding was that mmap helps greatly with read and write perf (until
the box starts swapping I guess)...is there
Hahaha, Jeff - I remember scampering to remove those references to the Trove
maps, I think around 2 years ago.
Avinash
On Tue, May 4, 2010 at 2:34 AM, Jeff Hammerbacher wrote:
> Hey,
>
> History repeating itself a bit, here: one delay in getting Cassandra into
> the open source world was removing its use of the Trove collections library.
We went through this with Ode w.r.t. Hibernate. Note that Ode still ships with
Hibernate support there, just not with Hibernate libraries in the distribution
or with a strong dependence on Hibernate.
So, if you made Trove maps optional and provided an adapter, you'd be OK. You
just can't bun
On May 4, 2010, at 6:24 PM, Tatu Saloranta wrote:
> But of course Apache can impose their own, however misguided silly
> rules on projects under their umbrella. :-)
I smell an -ac'esque patch to Cassandra brewing. ;)
--Joe
Oh boy... that stupid, stupid bickering about the true nature of the LGPL.
Both the Apache Foundation and the FSF came across like little kids arguing over
whose dad is stronger (this was a few years back, when it was discussed
whether LGPL components could be used for Apache License projects)
Almost made me explicitly
1. When you initially start up your nodes, plan the InitialToken of each
node evenly (see the note after this message).
2. standard
On Tue, May 4, 2010 at 9:09 PM, Boris Shulman wrote:
> I think that the extra (more than 4GB) memory usage comes from the
> mmaped io, that is why it happens only for reads.
>
> On Tue, May 4, 20
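For reference, the token is set per node in storage-conf.xml and is only
honored the first time a node starts; the value below is just an example
(the midpoint of the RandomPartitioner range, i.e. 2^126, for node i=1 of 2):

    <InitialToken>85070591730234615865843651857942052864</InitialToken>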
Thanks Jonathan!
Yeah, I will just wait until we are ready for the upgrade and hold off on that
project for now.
Erik
One would use batch processes (e.g. through Hadoop) or client-side
aggregation, yes. In theory it would be possible to introduce runtime
sharding across rows into the Cassandra server side, but it's not part
of its design.
In practice, one would want to model their data such that the 'row h
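A sketch of what that client-side sharding can look like, with an
application-defined key convention (all names here are hypothetical):

    // Client-side row sharding: spread one logical day over a fixed
    // number of physical rows, so no single row grows without bound.
    static final int SHARDS_PER_DAY = 4;

    static String rowKeyFor(String day, String columnName) {
        int shard = (columnName.hashCode() & 0x7fffffff) % SHARDS_PER_DAY;
        return day + ":" + shard;   // e.g. "2010-05-04:2"
    }

Reads then fan out over "2010-05-04:0" .. "2010-05-04:3" and merge the
slices client-side.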
Thanks for the prompt reply.
As per your reply, my configuration should be like:
Node 1: Configuration
43.193.211.215 43.193.213.160
Node 2: Configuration
43.193.211.215 43.193.213.160
About replication: in my case it should be 2, as I have two cluster nodes. Am I
right? In C
I may be wrong here. Someone please correct me if I am.
There may be a race condition if you aren't increasing your replication
factor.
If you insert to node A with replication factor 1, and then get from node B
with replication factor 1, it should be possible (and even more likely in
uneven loading
> All other parameters are identical on both servers. I have added some data
> from both nodes,
> but I am confused about which node the data is stored on. Does it get stored on both nodes,
> or only on the one node where it was added? I can retrieve data
> from both nodes,
> but sometimes I cannot. Not sur
Hi,
I am very new to Cassandra 0.6.1. I have set up two nodes on two different
servers. I would like to know how data distribution and replication work.
Node 1 IP: 43.193.211.215
Node 2 IP: 43.193.213.160
Node 1: Configuration 43.193.211.215
Node 2: Configuration 43.193.213.160
4
LGPL is listed as one of the forbidden licenses for Apache projects
(see Excluded Licenses in http://www.apache.org/legal/3party.html)...
On Tue, May 4, 2010 at 12:34 PM, Jeff Hammerbacher wrote:
> Hey,
>
> History repeating itself a bit, here: one delay in getting Cassandra into
> the open source world was removing its use of the Trove collections library.
I think that the extra (more than 4GB) memory usage comes from the
mmaped io, that is why it happens only for reads.
On Tue, May 4, 2010 at 2:02 PM, Jordan Pittier wrote:
> I'm facing the same issue with swap. It only occurs when I perform read
> operations (writes are very fast :)). So I can't help you with the memory
> problem.
Hi Miguel,
I'd like to ask: is it possible to have runtime sharding of rows in
Cassandra, i.e. if a row has too many new columns inserted, then create
another row (let's say if the original time-sharding is one day per row,
then we would have two rows for that day). Maybe batch processes could
2. I have used the same configuration (3 machines with 4GB RAM) and I
got an out-of-memory error on compaction each time when trying to compact 4
x 128MB sstables. Tried different configurations, incl. Java opts, with the same
result. When I used a 16GB RAM machine, everything worked like a charm.
On 04.05
If R + W > N, where R, W, and N are respectively the read replica count, the
write replica count, and the replication factor, all client reads will see
the most recent write.
On Tue, May 4, 2010 at 4:39 PM, vineet daniel wrote:
> Hi
>
> In a Cassandra cluster, if we are updating any key/value and perform the
> fetch query on that same key, we get old/stale data.
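Concretely: with replication factor N = 3, writing at QUORUM (W = 2) and
reading at QUORUM (R = 2) gives 2 + 2 > 3, so every read overlaps every write
on at least one replica and sees the latest value. W = 1 and R = 1 gives
1 + 1 < 3; no overlap is guaranteed, hence the possibility of stale reads.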
Hi
In a Cassandra cluster, if we are updating any key/value and perform the
fetch query on that same key, we get old/stale data. This can be because of
Read Repair.
Is there any way to fetch the latest updated data from the cluster, as the old
data has no significance and showing it to the client is
I'm facing the same issue with swap. It only occurs when I perform read
operations (writes are very fast :)). So I can't help you with the memory
problem.
But to balance the load evenly between nodes in a cluster, just manually fix
their tokens (the "formula" is i * 2^127 / nb_nodes).
Jordzn
On Tue,
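A small helper for that formula (a sketch; run it once and paste each value
into the matching node's InitialToken):

    import java.math.BigInteger;

    public class TokenGen {
        public static void main(String[] args) {
            int nbNodes = Integer.parseInt(args[0]);
            BigInteger range = BigInteger.ONE.shiftLeft(127); // 2^127
            for (int i = 0; i < nbNodes; i++)
                // token(i) = i * 2^127 / nb_nodes
                System.out.println(range.multiply(BigInteger.valueOf(i))
                                        .divide(BigInteger.valueOf(nbNodes)));
        }
    }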
Hey,
History repeating itself a bit, here: one delay in getting Cassandra into
the open source world was removing its use of the Trove collections library,
as the license (LGPL) is not compatible with the Apache 2.0 license.
Later,
Jeff
On Sat, Apr 24, 2010 at 11:28 PM, Tatu Saloranta wrote:
>
As you haven't specified all the details pertaining to filters and your data
layout (structure), at a very high level what I can suggest is that you need
to create a separate CF for each filter.
On Sat, May 1, 2010 at 5:04 PM, Rakesh Rajan wrote:
> I am evaluating cassandra to implement activity
Reduce GCGraceSeconds in storage-conf.xml; that should work.
On Tue, May 4, 2010 at 2:31 PM, vineet daniel wrote:
> Only major compactions can clean out obsolete tombstones.
>
> On Tue, May 4, 2010 at 9:59 AM, Jonathan Ellis wrote:
>
>> On Mon, May 3, 2010 at 8:45 PM, Kauzki Aranami
>> wrote:
>> >
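For reference, the setting lives in storage-conf.xml; the default of 864000
seconds is ten days:

    <GCGraceSeconds>3600</GCGraceSeconds>

Just don't lower it below the time it takes to run repair on every node, or
deleted data can resurrect.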
Only major compactions can clean out obsolete tombstones.
On Tue, May 4, 2010 at 9:59 AM, Jonathan Ellis wrote:
> On Mon, May 3, 2010 at 8:45 PM, Kauzki Aranami
> wrote:
> > Let me rephrase my question.
> >
> > How does Cassandra deal with bloom filter's false positives on deleted
> records?
>
:) I think this is simpler and I am just stupid
I retried with clean data and commit log directories and everything works
well.
I must have missed something (maybe when I upgraded from 0.5.1 to 0.6), but
anyway, I am just testing.
On Tue, May 4, 2010 at 8:47 AM, Jonathan Shook wrote:
> I