Re: Trove maps

2010-05-05 Thread Avinash Lakshman
Well it wasn't used for any critical operations. So there is no way to have figured what impact it did or did not have. Avinash On Tue, May 4, 2010 at 7:49 PM, Cagatay Kavukcuoglu wrote: > Did removing Trove collections have a noticeable effect on performance > or memory use at the time? > > On

RE: performance tuning - where does the slowness come from?

2010-05-05 Thread Mark Jones
~ 70 million keys (20 bytes each using Random Partitioner) = 1.4GB of key data + the structures to support it. Which seems a good bit smaller than the 32GB of RAM available on the 4 machines. How many machines should it take to handle 2-3000 lookups/second? From: Brandon Williams [mailto:dri...@gmail.
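The sizing estimate in this message can be checked with a few lines of arithmetic. The sketch below only reproduces the numbers quoted in the thread (70 million keys, 20 bytes each, 4 machines with 32GB of RAM in total); the constant and function names are illustrative, and the result covers raw key bytes only, not the index or Bloom filter structures the author mentions.

```python
# Back-of-the-envelope check of the figures quoted in the message.
# All names here are illustrative; only the numbers come from the thread.
KEYS = 70_000_000        # ~70 million keys
KEY_BYTES = 20           # 20 bytes per key
TOTAL_RAM_GB = 32        # RAM available across the 4 machines


def raw_key_data_gb(keys, key_bytes):
    """Raw key payload in decimal gigabytes, excluding supporting structures."""
    return keys * key_bytes / 1e9


key_gb = raw_key_data_gb(KEYS, KEY_BYTES)
fraction_of_ram = key_gb / TOTAL_RAM_GB

print(f"raw key data: {key_gb:.1f} GB")
print(f"share of cluster RAM: {fraction_of_ram:.1%}")
```

The raw keys are a small share of total RAM, which is consistent with the author's point that the key data alone should not be the bottleneck.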

Re: Appropriate use for Cassandra?

2010-05-05 Thread philip andrew
http://www.youtube.com/watch?v=eaCCkfjPm0o 3.30 song begins 4.00 starfish loves you and Cassandra loves you! On Thu, May 6, 2010 at 11:03 AM, Denis Haskin wrote: > i can haz hints pleez? > > On Wed, May 5, 2010 at 9:28 PM, philip andrew > wrote: > > Starfish loves you. > > > > On Wed, May 5, 201

Re: Updating (as opposed to just setting) Cassandra data via Hadoop

2010-05-05 Thread Mark Schnitzius
Apologies, Hadoop recently deprecated a whole bunch of classes and I misunderstood how the new ones work. What I'll be doing is creating an InputFormat class that uses ColumnFamilyInputFormat to get splits from the existing Cassandra data, and merges them with splits from a SequenceFileInputFormat

Re: Appropriate use for Cassandra?

2010-05-05 Thread Denis Haskin
i can haz hints pleez? On Wed, May 5, 2010 at 9:28 PM, philip andrew wrote: > Starfish loves you. > > On Wed, May 5, 2010 at 1:16 PM, David Strauss > wrote: >> >> On 2010-05-05 04:50, Denis Haskin wrote: >> > I've been reading everything I can get my hands on about Cassandra and >> > it sounds l

Re: Appropriate use for Cassandra?

2010-05-05 Thread philip andrew
Starfish loves you. On Wed, May 5, 2010 at 1:16 PM, David Strauss wrote: > On 2010-05-05 04:50, Denis Haskin wrote: > > I've been reading everything I can get my hands on about Cassandra and > > it sounds like a possibly very good framework for our data needs; I'm > > about to take the plunge and

Re: performance tuning - where does the slowness come from?

2010-05-05 Thread Jonathan Ellis
How many columns are in the rows you are reading from? 30ms is quite high, so I suspect you have relatively large rows, in which case decreasing the column index threshold may help. On Wed, May 5, 2010 at 4:59 PM, Ran Tavory wrote: > let's see if I can make some assertions, feel free to correct

Re: Appropriate use for Cassandra?

2010-05-05 Thread Denis Haskin
Hmm... I was actually thinking of the inverse of that: 20K rows (one per entity), with one supercolumn per time-series sample... it would be something like 700,000 supercolumns (1.5 years, to start with) growing to maybe 2,400,000 supercolumns. That may be an issue for our access path needs, howev

Re: About SStable Writer

2010-05-05 Thread Anty
THX schubert. On Thu, May 6, 2010 at 6:56 AM, Jonathan Ellis wrote: > Yes, this is a bug. Patch attached to > https://issues.apache.org/jira/browse/CASSANDRA-1056 > > On Wed, May 5, 2010 at 2:09 AM, Anty wrote: > > HI:All > > > > In source code of 0.6.1 ,in SSTableWriter, > > private void afte

Re: performance tuning - where does the slowness come from?

2010-05-05 Thread Brandon Williams
On Wed, May 5, 2010 at 6:59 PM, Mark Jones wrote: > My data is single row/key to a 500 byte column and I’m reading ALL random > keys (worst case read scenario) Cache has minimal effectiveness, so the > Bloom trees and indexes are getting a real work out. I’m on 8GB Ubuntu 9.10 > boxes (64bit).

RE: performance tuning - where does the slowness come from?

2010-05-05 Thread Mark Jones
My data is single row/key to a 500 byte column and I'm reading ALL random keys (worst case read scenario) Cache has minimal effectiveness, so the Bloom trees and indexes are getting a real work out. I'm on 8GB Ubuntu 9.10 boxes (64bit). Yea, I was griping about the performance earlier, disk i

Re: performance tuning - where does the slowness come from?

2010-05-05 Thread Brandon Williams
On Wed, May 5, 2010 at 6:36 PM, Mark Jones wrote: > Have you actually managed to get 10K reads/second, or are you just > estimating that you can? I’ve run into similar issues, but I never got > reads to scale when searching for unique keys even using 40 threads, I did > discover that using 80+

Re: Is SuperColumn necessary?

2010-05-05 Thread Mike Malone
Nice, Ed, we're doing something very similar but less generic. Now replace all of the various methods for querying with a simple query interface that takes a Predicate, allow the user to specify (in storage-conf) which levels of the nested Columns should be indexed, and completely remove Comparato

Re: About SStable Writer

2010-05-05 Thread Jonathan Ellis
Yes, this is a bug. Patch attached to https://issues.apache.org/jira/browse/CASSANDRA-1056 On Wed, May 5, 2010 at 2:09 AM, Anty wrote: > HI:All > > In source code of 0.6.1 ,in SSTableWriter, > private void afterAppend(DecoratedKey decoratedKey, long dataPosition, int > dataSize) throws IOExcepti

Re: performance tuning - where does the slowness come from?

2010-05-05 Thread Ran Tavory
I haven't tried it. On May 6, 2010 1:22 AM, "Mark Greene" wrote: Ran, Did you find differing results from stress.py? -Mark On Wed, May 5, 2010 at 5:59 PM, Ran Tavory wrote: > > let's see if I can make s...

replacing columns via remove and insert

2010-05-05 Thread Jonathan Shook
When I try to replace a set of columns, like this: 1) remove all columns under a CF/row 2) batch insert columns into the same CF/row .. the columns cease to exist. Is this expected? This is just across 2 nodes with Replication Factor 2 and Consistency Level QUORUM.
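One possible explanation, not confirmed in this snippet: Cassandra resolves conflicts by client-supplied timestamp, and a column only survives a row deletion if its timestamp is strictly newer than the deletion's. If the remove and the batch insert are sent with the same timestamp, the deletion wins and the new columns stay hidden. The toy model below is not Cassandra code; `visible_columns` and its arguments are hypothetical names used only to illustrate that rule.

```python
# Toy model of timestamp reconciliation between a row deletion and inserts.
# This is an illustration of the rule, not Cassandra's implementation.

def visible_columns(row_deleted_at, columns):
    """Return the columns a read would see.

    row_deleted_at: timestamp of the row-level remove, or None if never removed.
    columns: dict mapping column name -> (value, timestamp).
    A column is visible only if it is strictly newer than the deletion.
    """
    if row_deleted_at is None:
        return {name: value for name, (value, ts) in columns.items()}
    return {
        name: value
        for name, (value, ts) in columns.items()
        if ts > row_deleted_at
    }


# Remove and insert issued with the same timestamp: the deletion wins.
same_ts = visible_columns(100, {"a": ("1", 100), "b": ("2", 100)})

# Insert issued with a later timestamp than the remove: the columns survive.
later_ts = visible_columns(100, {"a": ("1", 101), "b": ("2", 101)})

print(same_ts)
print(later_ts)
```

Under this assumption, giving the insert a timestamp greater than the one used for the remove would be the thing to try.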

Re: Is SuperColumn necessary?

2010-05-05 Thread Jonathan Ellis
Very interesting, thanks! On Wed, May 5, 2010 at 1:31 PM, Ed Anuff wrote: > Follow-up from last week's discussion, I've been playing around with a simple > column comparator for composite column names that I put up on github. I'd > be interested to hear what people think of this approach. > > htt

Re: Is SuperColumn necessary?

2010-05-05 Thread Stu Hood
Hey Ed, I've been working on a similar approach for arbitrarily nested/compound column names in #998. See: http://github.com/stuhood/cassandra/blob/998/src/java/org/apache/cassandra/db/ColumnKey.java The goal is to provide native support and potentially (in the very long term), API support for

Re: Building on top of Cassandra's core layer

2010-05-05 Thread David Rosenstrauch
On 05/05/2010 12:13 AM, Jonathan Ellis wrote: So I'm wondering: * Anyone know if such a thing has been attempted before? (And, if so, links to any stories about success / failure / tips.) I believe Jun Rao and Sandeep Tata built a kind of chain replication starting from Cassandra 0.4-ish. I

Re: Is SuperColumn necessary?

2010-05-05 Thread Ed Anuff
Follow-up from last week's discussion, I've been playing around with a simple column comparator for composite column names that I put up on github. I'd be interested to hear what people think of this approach. http://github.com/edanuff/CassandraCompositeType Ed On Wed, Apr 28, 2010 at 12:52 PM,

Re: About SStable Writer

2010-05-05 Thread Schubert Zhang
Hi Jonathan, Could you please have a check this? On Wed, May 5, 2010 at 6:19 PM, Schubert Zhang wrote: > Include d...@cassandra.apache.org > > > On Wed, May 5, 2010 at 3:09 PM, Anty wrote: > >> HI:All >> >> In source code of 0.6.1 ,in SSTableWriter, >> private void afterAppend(DecoratedKey deco

Re: how to get apache cassandra version with thrift client ?

2010-05-05 Thread Ted Zlatanov
On Tue, 27 Apr 2010 19:06:11 -0500 Jonathan Ellis wrote: JE> On Mon, Apr 26, 2010 at 9:46 PM, Shuge Lee wrote: >> I know I can get thrift API version. >> However, I writing a CLI for Cassandra in Python with readline support, >> and it will supports one-key deploy/upgrade cassandra+thrift remot

Re: Anti compaction and readonly compaction?

2010-05-05 Thread Stu Hood
Readonly Compactions are used to hash column families for http://wiki.apache.org/cassandra/ArchitectureAntiEntropy . Roger's link refers to anticompaction specifically. -Original Message- From: "Weijun Li" Sent: Wednesday, May 5, 2010 11:29am To: user@cassandra.apache.org Subject: Re: A

Re: Use binary memtable to load data

2010-05-05 Thread Jonathan Ellis
Yes. On Wed, May 5, 2010 at 11:06 AM, Weijun Li wrote: > So when you bundle list of keys into a binary table you don't need to worry > about grouping them by nodes or whatever, and StorageProxy will take care of > routing of these keys to the proper nodes. Is this correct? > > Thanks, > > -Weijun

Re: Anti compaction and readonly compaction?

2010-05-05 Thread Weijun Li
Thanks Roger! It helps a lot! On Wed, May 5, 2010 at 9:20 AM, Roger Schildmeijer wrote: > http://wiki.apache.org/cassandra/Streaming > > On 5 maj 2010, at 18.18em, Weijun Li wrote: > > What's the purpose of anti-compaction? In what scenario does Cassandra need > to split big sstables into smalle

Re: Anti compaction and readonly compaction?

2010-05-05 Thread Roger Schildmeijer
http://wiki.apache.org/cassandra/Streaming On 5 maj 2010, at 18.18em, Weijun Li wrote: > What's the purpose of anti-compaction? In what scenario does Cassandra need > to split big sstables into smaller pieces? Also I noticed readonly compaction > in the code. What's the use of this compaction t

Anti compaction and readonly compaction?

2010-05-05 Thread Weijun Li
What's the purpose of anti-compaction? In what scenario does Cassandra need to split big sstables into smaller pieces? Also I noticed readonly compaction in the code. What's the use of this compaction type? Thanks, -Weijun

Re: performance tuning - where does the slowness come from?

2010-05-05 Thread Jonathan Ellis
- your key cache isn't warm. capacity 17M, size 0.5M, 468083 reads sounds like most of your reads have been for unique keys. - the kind of reads you are doing can have a big effect (mostly number of columns you are asking for). column index granularity plays a role (for non-rowcached reads); so

Re: Use binary memtable to load data

2010-05-05 Thread Weijun Li
So when you bundle list of keys into a binary table you don't need to worry about grouping them by nodes or whatever, and StorageProxy will take care of routing of these keys to the proper nodes. Is this correct? Thanks, -Weijun On Tue, May 4, 2010 at 9:14 PM, Jonathan Ellis wrote: > On Tue, M

Re: Cassandra Streaming Service

2010-05-05 Thread Weijun Li
Thank you Jonathan! Good to know. On Tue, May 4, 2010 at 9:13 PM, Jonathan Ellis wrote: > The Streaming service is what moves data around for load balancing, > bootstrap, and decommission operations. > > On Tue, May 4, 2010 at 8:08 PM, Weijun Li wrote: > > A dumb question: what is the use of Ca

Re: Cassandra training on May 21 in Palo Alto

2010-05-05 Thread Jonathan Ellis
We're exploring other locations, yes. I'm skeptical that we can achieve comparable quality online as with in-person training, though. On Wed, May 5, 2010 at 9:53 AM, Scott Mann wrote: > I'd certainly be interested in attending such a course, but I'd never > get the company to pay for a trip to t

Re: Cassandra training on May 21 in Palo Alto

2010-05-05 Thread Scott Mann
I'd certainly be interested in attending such a course, but I'd never get the company to pay for a trip to the bay area for a one day class. Any thought about taking this on the road? Or running it online? On Tue, May 4, 2010 at 8:56 PM, Vick Khera wrote: > On Tue, May 4, 2010 at 8:50 PM, Jonatha

RE: sstable2jason bat script on windows

2010-05-05 Thread Dop Sun
CASSANDRA-1051 created and files attached. I'm happy to follow this if any changes required. Thanks, Regards, Dop -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Wednesday, May 05, 2010 10:28 PM To: user@cassandra.apache.org Subject: Re: sstable2jason bat script

Re: Add a new keyspace in cassandra cluster

2010-05-05 Thread Yu-Chun Chang
Thank you very much for your help! Best regards, Yu-Chun On Wed, May 5, 2010 at 8:51 PM, Gary Dusbabek wrote: > Yes. You need to edit the configuration of every machine in the cluster. > > Gary. > > On Wed, May 5, 2010 at 06:27, Yu-Chun Chang wrote: > > Dear all, > > I'm a starter for playing

Re: sstable2jason bat script on windows

2010-05-05 Thread Jonathan Ellis
Can you create a ticket at https://issues.apache.org/jira/browse/CASSANDRA and attach your .bat files? On Wed, May 5, 2010 at 8:32 AM, Dop Sun wrote: > Thanks for your reply. > > Since I need to import several hundred lines of data into my laptop instance > (single node), I tried to create two ba

Re: UUIDs must be exactly 16 bytes ?

2010-05-05 Thread Jonathan Ellis
to elaborate, your row key is a uuid, but not your column name On Wed, May 5, 2010 at 4:46 AM, roger schildmeijer wrote: > "'CompareWith' tells Cassandra how to sort the columns for slicing > operations". It looks like your column name isn't using the correct type > (TimeUUIDType). > > // Roger S

Re: cassandra jvm crash in GCTaskThread

2010-05-05 Thread Jonathan Ellis
Yes, those are the recommended versions (_20 is just _19 with some security fix that doesn't affect Cassandra). On Wed, May 5, 2010 at 2:33 AM, Ran Tavory wrote: > has anyone tried jvm _20 or _19 with cassandra? > > On Wed, May 5, 2010 at 10:00 AM, Nathan McCall > wrote: >> >> Ran, >> You may wa

Re: Updating (as opposed to just setting) Cassandra data via Hadoop

2010-05-05 Thread Jonathan Ellis
I'm a little confused. CombineFileInputFormat is designed to combine multiple small input splits into one larger one. It's not for merging data (that needs to be part of the reduce phase). Maybe I'm misunderstanding what you're saying. On Tue, May 4, 2010 at 10:53 PM, Mark Schnitzius wrote: >

RE: sstable2jason bat script on windows

2010-05-05 Thread Dop Sun
Thanks for your reply. Since I need to import several hundred lines of data into my laptop instance (single node), I tried to create two bat files based on sstable2jason and jason2sstable, roughly tested them, and they work for me so far. I have uploaded these two bat files: http://code.google.com/p/jassa

Re: Add a new keyspace in cassandra cluster

2010-05-05 Thread Gary Dusbabek
Yes. You need to edit the configuration of every machine in the cluster. Gary. On Wed, May 5, 2010 at 06:27, Yu-Chun Chang wrote: > Dear all, > I'm a starter for playing Cassandra. I've set up three machines (called > PC_A, PC_B, PC_C) and run them as a Cassandra cluster. > I've inserted a reco

Re: Appropriate use for Cassandra?

2010-05-05 Thread David Strauss
Given that your current schema has ~18 small columns per row, adding a level by using supercolumns may make sense for you because the limitation of unserializing a whole supercolumn at once isn't going to be a problem for you. 20K supercolumns per row with ~18 small subcolumns each is completely r

Add a new keyspace in cassandra cluster

2010-05-05 Thread Yu-Chun Chang
Dear all, I'm new to Cassandra. I've set up three machines (called PC_A, PC_B, PC_C) and run them as a Cassandra cluster. I've inserted a record in the running cluster and saw that it worked across the three machines. Then I shut down Cassandra on PC_A, and modified storage-conf.xml to a

Re: Appropriate use for Cassandra?

2010-05-05 Thread Denis Haskin
David -- thanks for the thoughts. In re: your question > Does the random partitioner support what you need? I guess my answer is "I'm not sure yet", but also my initial thought was that we'd use the (or a) OrderPreservingPartitioner so that we could use range scans and that rows for a given entit

Re: eventuality

2010-05-05 Thread Даниел Симеонов
Hi, You are right, but I have the feeling that this is a different use case, i.e. we have a happy case (eventuality when everything is up and working) and a not so happy one, so to say. Best regards, Daniel. 2010/5/5 Peter Schüller > >I have one question about the eventuality, i.e. do you know

Re: eventuality

2010-05-05 Thread Peter Schüller
> I have one question about the eventuality, i.e. do you know what are the > variables from which it depends. Well the most obvious is the > ConsistencyLevel, so let's assume it is set to ONE. The question is that the > eventuality is the relative time to spread changes across the cassandra > no

Re: Login failure with SimpleAuthenticator

2010-05-05 Thread Julio Carlos Barrera Juez
Thank you very much! It works! 2010/5/3 roger schildmeijer > You need to define two more properties: passwd.properties and > access.properties (hint > -Dpasswd.properties=/user/schildmeijer/cassandra/conf/passwd.properties and > analogous for access.properties) > > > > // Roger Schildmeijer > >

eventuality

2010-05-05 Thread Даниел Симеонов
Hi, I have one question about the eventuality, i.e. do you know which variables it depends on? Well, the most obvious is the ConsistencyLevel, so let's assume it is set to ONE. The question is that the eventuality is the relative time to spread changes across the cassandra nodes. I

Re: About SStable Writer

2010-05-05 Thread Schubert Zhang
Include d...@cassandra.apache.org On Wed, May 5, 2010 at 3:09 PM, Anty wrote: > HI:All > > In source code of 0.6.1 ,in SSTableWriter, > private void afterAppend(DecoratedKey decoratedKey, long dataPosition, int > dataSize) throws IOException > { > String diskKey = partitioner.convert

Re: Skip large size (Configurable) SSTable in minor or/and major compaction

2010-05-05 Thread Schubert Zhang
Replace the CASSANDRA-1041-0.6.1.patch. We found it is difficult to distinguish major and minor compaction in the current codebase. So, only one optional attribute for ColumnFamily is provided here: CompactSkipInGB. Maybe users whose applications do not need delete operations can use this patch. On Tue,

Re: Best way to store millisecond-accurate data

2010-05-05 Thread Даниел Симеонов
Hi "In practice, one would want to model their data such that the 'row has too much columns' scenario is prevented." I am curious how really to prevent this, if the data is sharded with one day granularity, nothing stops the client to insert enormous amount of new columns (very often it is n

Re: UUIDs must be exactly 16 bytes ?

2010-05-05 Thread roger schildmeijer
"'CompareWith' tells Cassandra how to sort the *columns* for slicing operations". It looks like your column name isn't using the correct type (TimeUUIDType). // Roger Schildmeijer On Wed, May 5, 2010 at 11:34 AM, Shuge Lee wrote: > Hi all: > > in storage-conf.xml > ... > > > > > org.apache.
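To illustrate the point about column names and TimeUUIDType: the error in the thread subject suggests the comparator expects the raw 16-byte form of a UUID, not its 36-character text form. The sketch below uses only Python's standard `uuid` module; `looks_like_timeuuid` is a hypothetical helper, and how a particular client library (pycassa in the original post) wants the name passed in is not shown in this snippet.

```python
import uuid

u = uuid.uuid1()      # version 1 (time-based) UUID
raw = u.bytes         # binary form: 16 bytes
text = str(u)         # text form: 32 hex digits plus 4 dashes


def looks_like_timeuuid(name):
    """Hypothetical check: True if name is a 16-byte version 1 UUID."""
    return (
        isinstance(name, bytes)
        and len(name) == 16
        and uuid.UUID(bytes=name).version == 1
    )


print(len(raw), len(text))
print(looks_like_timeuuid(raw))            # binary form passes the check
print(looks_like_timeuuid(text.encode()))  # text form is too long
```

A row key can still be any string, including a UUID in text form; the 16-byte requirement here applies only to whatever the CompareWith type is asked to sort, which is the column name.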

UUIDs must be exactly 16 bytes ?

2010-05-05 Thread Shuge Lee
Hi all: in storage-conf.xml ... org.apache.cassandra.locator.RackUnawareStrategy 1 org.apache.cassandra.locator.EndPointSnitch ... Python code #!/usr/bin/env python import uuid from pprint import pprint as pp import pycassa from cassandra.ttypes import ConsistencyLevel keyspace =

Write Cassandra Books - Packt Publishing

2010-05-05 Thread Kshipra Singh
Hi All, I am writing to you for Packt Publishing, the publishers of computer related books. We are planning to expand the catalogue of our books on databases and are looking forward to publish some books on NoSQL. Currently we are inviting people interested in writing NoSQL books for Packt.

Re: cassandra jvm crash in GCTaskThread

2010-05-05 Thread Ran Tavory
has anyone tried jvm _20 or _19 with cassandra? On Wed, May 5, 2010 at 10:00 AM, Nathan McCall wrote: > Ran, > You may want to upgrade your jvm. I had a similar core with a heavily > loaded tomcat recently and found the following on the release notes > page for 1.6.0_18: > "Card-Marking Optimizat

Re: cassandra jvm crash in GCTaskThread

2010-05-05 Thread Ran Tavory
I'll try that, thanks Nate On Wed, May 5, 2010 at 10:00 AM, Nathan McCall wrote: > Ran, > You may want to upgrade your jvm. I had a similar core with a heavily > loaded tomcat recently and found the following on the release notes > page for 1.6.0_18: > "Card-Marking Optimization Issue > > A flaw

About SStable Writer

2010-05-05 Thread Anty
HI:All In source code of 0.6.1 ,in SSTableWriter, private void afterAppend(DecoratedKey decoratedKey, long dataPosition, int dataSize) throws IOException { String diskKey = partitioner.convertToDiskFormat(decoratedKey); bf.add(diskKey); lastWrittenKey = decoratedKey;

Re: cassandra jvm crash in GCTaskThread

2010-05-05 Thread Nathan McCall
Ran, You may want to upgrade your jvm. I had a similar core with a heavily loaded tomcat recently and found the following on the release notes page for 1.6.0_18: "Card-Marking Optimization Issue A flaw in the implementation of a card-marking performance optimization in the JVM can cause heap corru