Re: nodetool repair does not return...

2011-08-25 Thread Boris Yen
We dumped the thread stack traces and noticed this one: "manual-repair-d08349af-189f-47cb-9cc3-452538ce04d1" daemon prio=10 tid=0x406a3000 nid=0x1890 waiting on condition [0x7f5c97be8000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

Re: 4/20 nodes get disproportionate amount of mutations

2011-08-25 Thread Jeremy Hanna
As somewhat of a conclusion to this thread, we have resolved the major issue having to do with the hotspots. The number of nodes in our cluster was balanced across the AWS/EC2 us-east availability zones (a, b, c). However, we didn't alternate racks in token order. We are using t
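A minimal sketch of the fix described above, assuming the RandomPartitioner (token space 0 to 2**127) and a hypothetical six-node cluster split evenly over three availability zones: compute evenly spaced tokens, then hand them out round-robin so that neighbouring tokens always sit in different racks. Node names are made up; the resulting numbers would go into each node's `initial_token`.

```python
# Evenly spaced RandomPartitioner tokens, assigned so racks alternate in token
# order. Node and rack names are hypothetical.

RING_SIZE = 2 ** 127  # RandomPartitioner token space (assumed)


def balanced_tokens(n):
    """Evenly spaced tokens for n nodes."""
    return [i * RING_SIZE // n for i in range(n)]


def interleave_by_rack(nodes_by_rack):
    """Round-robin across racks so consecutive tokens land in different racks.

    nodes_by_rack: dict of rack -> list of node names (equal sizes assumed).
    """
    racks = sorted(nodes_by_rack)
    per_rack = len(nodes_by_rack[racks[0]])
    order = []
    for i in range(per_rack):
        for rack in racks:
            order.append((nodes_by_rack[rack][i], rack))
    return order


def plan(nodes_by_rack):
    """Return (node, rack, token) triples in ring order."""
    order = interleave_by_rack(nodes_by_rack)
    tokens = balanced_tokens(len(order))
    return [(node, rack, tok) for (node, rack), tok in zip(order, tokens)]


if __name__ == "__main__":
    cluster = {"us-east-1a": ["a1", "a2"],
               "us-east-1b": ["b1", "b2"],
               "us-east-1c": ["c1", "c2"]}
    for node, rack, token in plan(cluster):
        print(f"{node:3} {rack:11} initial_token: {token}")
```

Balancing node counts per zone is not enough on its own; the rack order around the ring matters as well, which is what the interleaving step enforces.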

Re: help creating data model

2011-08-25 Thread Helder Oliveira
Hello, thanks for your time. I have suggested an SCF, but I am still testing the system with a standard CF, running some tests on the data flow (insert/select). Storing the subdata as JSON already crossed my mind, but it's not possible because later I will need to apply filters to that data, and if

Re: For multi-tenant, is it good to have a key space for each tenant?

2011-08-25 Thread Ryan Lowe
I've been doing multi-tenant with cassandra for a while, and from what I have found, it is better to keep your keyspaces down in number. That said, I have been using composite keys for my multi-tenancy now and it works great: Column Family: User Key: [AccountId]/[UserId] This makes it super han
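A small sketch of the `[AccountId]/[UserId]` row-key scheme mentioned above. The "/" separator follows the post; the helper names are hypothetical. Note that with the RandomPartitioner, keys sharing a tenant prefix are not stored next to each other, so this scheme scopes keys per tenant but does not give a cheap "all rows for one account" range scan.

```python
# Tenant-scoped composite row keys: "<AccountId>/<UserId>".
# Helper names are hypothetical; the separator comes from the post.

from typing import Tuple

SEP = "/"


def user_key(account_id: str, user_id: str) -> str:
    """Build a tenant-scoped row key for the User column family."""
    if SEP in account_id:
        raise ValueError("account id must not contain the separator")
    return f"{account_id}{SEP}{user_id}"


def split_user_key(key: str) -> Tuple[str, str]:
    """Recover (account_id, user_id); the user id may itself contain '/'."""
    account_id, user_id = key.split(SEP, 1)
    return account_id, user_id


def belongs_to(key: str, account_id: str) -> bool:
    """Cheap tenant check before returning a row to a caller."""
    return key.startswith(account_id + SEP)
```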

Is Cassandra suitable for this use case?

2011-08-25 Thread Ruby Stevenson
hi, all - I am very new to Cassandra, please bear with me if this is really a FAQ. We are exploring whether Cassandra is suitable for a data management project. The basic characteristics of the data are the following: - it centers around data files, each data file's size can be very small to very

Re: For multi-tenant, is it good to have a key space for each tenant?

2011-08-25 Thread Terje Marthinussen
It depends a lot, of course, on how many tenants you have. Hopefully the new off-heap memtables in 1.0 will help as well, as Java GC on large heaps is becoming a much bigger issue than memory cost. Regards, Terje On 25 Aug 2011, at 14:20, Himanshi Sharma wrote: > > I am working on similar sort of st

Re: Customized Secondary Index Schema

2011-08-25 Thread Alvin UW
Yes, this is what I am worrying about. 2011/8/24 Ryan King > On Tue, Aug 23, 2011 at 10:03 AM, Alvin UW wrote: > > Hello, > > > > As mentioned by Ed Anuff in his blog and slides, one way to build > customized > > secondary index is: > > We use one CF, each row to represent a secondary index, wi

Removal of old data files

2011-08-25 Thread yuki watanabe
We are using Cassandra 0.8.0 with an 8-node ring and only one CF. Every column has a TTL of 86400 (24 hours). We also set 'GC grace seconds' to 43200 (12 hours). We have to store a massive amount of data for one day now, and eventually for five days if we get more disk space. Even for one day, we do run ou
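With these settings the arithmetic matters for disk planning: an expiring column turns into a tombstone when its TTL runs out, and that tombstone can only be dropped once gc_grace_seconds have also passed, and only when a compaction actually rewrites the SSTable holding it. A minimal sketch of the earliest possible purge time, using the two values from the post:

```python
# Earliest time an expiring column's data can leave disk, assuming it becomes
# a tombstone at write_time + TTL and the tombstone is purgeable gc_grace
# seconds later (during a subsequent compaction).

from datetime import datetime, timedelta

TTL_SECONDS = 86400       # 24 hours, from the post
GC_GRACE_SECONDS = 43200  # 12 hours, from the post


def earliest_purge(write_time: datetime) -> datetime:
    expires = write_time + timedelta(seconds=TTL_SECONDS)
    return expires + timedelta(seconds=GC_GRACE_SECONDS)


if __name__ == "__main__":
    print(earliest_purge(datetime(2011, 8, 25, 0, 0)))
```

So a column written at midnight on the 25th cannot be purged before noon on the 26th (36 hours later), and in practice it stays on disk until compaction reaches its SSTable, which for large, rarely compacted SSTables can be considerably later.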

Column Family names

2011-08-25 Thread Stephen Pope
Using 0.8.2, I've created a column family called "_Schema" (without the quotes). For some reason, I can't seem to list the rows in it from the cli: I've tried: [default@BIM] list _Schema; Syntax error at position 5: unexpected "_" for `list _Schema;`. [default@BIM] list '_Schema'; Syntax error a

Re: Is Cassandra suitable for this use case?

2011-08-25 Thread Evgeniy Ryabitskiy
Hi, If you want to store files with partition/replication, you could use a Distributed File System (DFS), like http://hadoop.apache.org/hdfs/ or any other: http://en.wikipedia.org/wiki/Distributed_file_system You could still use Cassandra to store any metadata and the file paths in the DFS. So: Cassandra + H

Re: question about cassandra.in.sh

2011-08-25 Thread Koert Kuipers
hey eric, the one thing i do not agree with is that it is the element of least surprise. i would argue that the default behavior for *nix applications is that they find out what their home directory is and operate relative to that. something like: script_dir="$(dirname "$(readlink -f ${BASH_SOURCE[0]})")

RE: Column Family names

2011-08-25 Thread Stephen Pope
Hmm...I've tried changing my column family name to "MySchema" instead. Now the cli is behaving normally, but the OOM error still occurs when I get_range_slices from my code. From: Stephen Pope [mailto:stephen.p...@quest.com] Sent: Thursday, August 25, 2011 11:10 AM To: user@cassandra.apache.org

RE: Column Family names

2011-08-25 Thread Stephen Pope
Never mind. I've got a hard-coded Count on the KeyRange set to 2 billion, which is apparently beyond the maximum allowable. From: Stephen Pope [mailto:stephen.p...@quest.com] Sent: Thursday, August 25, 2011 11:15 AM To: user@cassandra.apache.org Subject: RE: Column Family names Hmm...I've tried
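Rather than a single `get_range_slices` call with a huge count, the common approach is to page: request a modest count (1000 here, an arbitrary choice), then restart from the last key returned. Because the start key is inclusive, every page after the first begins with a key already seen, which is skipped. In this sketch `fetch_page(start_key, count)` is a hypothetical stand-in for the real client call, backed by a sorted list; with the RandomPartitioner rows come back in token order rather than key order, but the paging loop is the same.

```python
import bisect
from typing import Callable, Iterator, List

FetchPage = Callable[[str, int], List[str]]


def iter_all_keys(fetch_page: FetchPage, page_size: int = 1000) -> Iterator[str]:
    """Yield every row key by paging with a bounded count.

    fetch_page(start_key, count) must return up to `count` keys in ring order,
    starting at start_key inclusive ("" means the start of the ring).
    """
    if page_size < 2:
        raise ValueError("page_size must be >= 2, since each later page repeats one key")
    start, first_page = "", True
    while True:
        keys = fetch_page(start, page_size)
        # start key is inclusive, so drop the repeated key on later pages
        yield from (keys if first_page else keys[1:])
        if len(keys) < page_size:
            return
        start, first_page = keys[-1], False


def make_fake_fetch(all_keys: List[str]) -> FetchPage:
    """Hypothetical stand-in for a get_range_slices call, backed by a sorted list."""
    ordered = sorted(all_keys)

    def fetch_page(start: str, count: int) -> List[str]:
        i = bisect.bisect_left(ordered, start)
        return ordered[i:i + count]

    return fetch_page
```

Usage: `for key in iter_all_keys(make_fake_fetch(keys)): ...` visits every key exactly once while never asking for more than `page_size` rows at a time.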

Re: multi-node cassandra config doubt

2011-08-25 Thread Thamizh
Hi Aaron, Thanks a lot for your suggestions. I am stuck on the error below. It would be great if you could point out what went wrong with my approach. I wanted to install cassandra-0.8.4 on 3 nodes and run a Map/Reduce job that uploads data from HDFS to Cassandra. I have installed Cassandra on

Re: 4/20 nodes get disproportionate amount of mutations

2011-08-25 Thread mcasandra
Thanks for the update Jeremy Hanna wrote: > > It appears though that when choosing the non-local replicas, it looks for > the next token in the ring of the same rack and the next token of a > different rack (depending on which it is looking for). Can you please explain this a little more? -- V
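To make the quoted explanation concrete, here is a simplified model of rack-aware replica selection within one data center: starting at the node that owns a key's range, walk the ring clockwise, take the next node in a rack that holds no replica yet, and skip nodes in racks already used (falling back to skipped nodes only if there are fewer racks than replicas). This approximates NetworkTopologyStrategy's behaviour; it is not its actual code. Counting how many ranges each node replicates shows why grouping racks together in token order creates hotspots.

```python
from collections import Counter
from typing import List, Tuple

Ring = List[Tuple[str, str]]  # (node, rack) pairs in token order


def replicas_for(ring: Ring, start: int, rf: int) -> List[str]:
    """Replicas for the range owned by ring[start] under the simplified rule."""
    chosen, seen_racks, skipped = [], set(), []
    for step in range(len(ring)):
        node, rack = ring[(start + step) % len(ring)]
        if rack in seen_racks:
            skipped.append(node)  # same rack as an existing replica
        else:
            chosen.append(node)
            seen_racks.add(rack)
        if len(chosen) == rf:
            return chosen
    # fewer distinct racks than rf: fill from skipped nodes in ring order
    return chosen + skipped[: rf - len(chosen)]


def replica_load(ring: Ring, rf: int) -> Counter:
    """How many ranges each node holds a replica of."""
    load = Counter()
    for start in range(len(ring)):
        load.update(replicas_for(ring, start, rf))
    return load
```

With six nodes and RF=3, a grouped ring (racks a, a, b, b, c, c in token order) gives the first node of each rack 5 replicas and the second node only 1, while the alternating ring (a, b, c, a, b, c) gives every node exactly 3.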

Re: Is Cassandra suitable for this use case?

2011-08-25 Thread Ruby Stevenson
hi Evgeny, I appreciate the input. The concern with HDFS is that it has its own share of problems: its name node, which is essentially a metadata server, loads all file information into memory (roughly 300 MB per million files), and its failure handling is far less attractive ... on top of configuring an

Re: Is Cassandra suitable for this use case?

2011-08-25 Thread Sasha Dolgy
You can chunk the files into pieces and store the pieces in Cassandra... Munge all the pieces back together when delivering back to the client... On Aug 25, 2011 6:33 PM, "Ruby Stevenson" wrote: > hi Evgeny > > I appreciate the input. The concern with HDFS is that it has own > share of problems -
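A minimal sketch of the chunking idea above: split a file into fixed-size pieces, store each piece as its own column, and join them back in order on read. The 1 MiB chunk size and the `<file_id>:<index>` column naming are assumptions; zero-padding the index keeps the names sorting in chunk order. Keeping chunks small matters because a column value is read into memory in one piece.

```python
from typing import Dict

CHUNK_SIZE = 1024 * 1024  # 1 MiB per chunk (assumed)


def split_file(file_id: str, data: bytes, chunk_size: int = CHUNK_SIZE) -> Dict[str, bytes]:
    """Return {column_name: chunk_bytes}; zero-padded indexes sort in order."""
    return {
        f"{file_id}:{i // chunk_size:08d}": data[i:i + chunk_size]
        for i in range(0, len(data), chunk_size)
    }


def join_file(chunks: Dict[str, bytes]) -> bytes:
    """Reassemble the chunks of one file in index order."""
    return b"".join(chunks[name] for name in sorted(chunks))
```

The dict returned by `split_file` maps naturally onto one row per file with one column per chunk; very large files could instead be spread over several rows.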

Re: Is Cassandra suitable for this use case?

2011-08-25 Thread Robert Jackson
I believe this is conceptually similar to what Brisk is doing under CassandraFS (HDFS compliant file system on top of cassandra). Robert Jackson [1] - https://github.com/riptano/brisk - Original Message - From: "Sasha Dolgy" To: user@cassandra.apache.org Sent: Thursday, August

Re: Is Cassandra suitable for this use case?

2011-08-25 Thread Ruby Stevenson
hi Sasha - Yes indeed. This solution was in the second part of my original question; it just seems "out of the norm" compared to what people usually use Cassandra for. I guess I am looking for some reassurance before I roll up my sleeves and try it. Thanks Ruby On Thu, Aug 25, 2011 at 12:36 PM, Sasha Dol

Re: Customized Secondary Index Schema

2011-08-25 Thread Ed Anuff
How many unique last names do you anticipate having? How many characters in the last name do you anticipate keeping in your index? You can easily do the math to figure out how many you could fit on a node. I think you'll find that the ceiling might be quite a bit higher than you think. If you h
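One way to "do the math" suggested above: estimate the size of an index row as entries × (indexed name bytes + referenced key bytes + per-column overhead). The roughly 15-byte per-column overhead is an approximation taken from Cassandra storage-sizing guidance, not an exact figure, and the example inputs are made up. The check against about 2 billion columns reflects Cassandra's documented per-row column limit.

```python
COLUMN_OVERHEAD_BYTES = 15           # rough per-column overhead (assumption)
MAX_COLUMNS_PER_ROW = 2_000_000_000  # documented per-row column limit


def index_row_bytes(num_entries: int, name_bytes: int, key_bytes: int,
                    overhead: int = COLUMN_OVERHEAD_BYTES) -> int:
    """Approximate size of one index row.

    Ignores row-level overhead, bloom filters, column index and compression.
    """
    if num_entries > MAX_COLUMNS_PER_ROW:
        raise ValueError("too many columns for a single row; split the index row")
    return num_entries * (name_bytes + key_bytes + overhead)


if __name__ == "__main__":
    # hypothetical: 10M users, 12-byte last-name prefix, 16-byte user key
    print(index_row_bytes(10_000_000, 12, 16))
```

With those hypothetical inputs the row comes to about 430 MB, which fits on a single node but is large enough that splitting it across several rows may be worth considering.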

Re: slides for "Testing out a slab allocator for Cassandra to reduce GC promotion failures by @stuhood "?

2011-08-25 Thread Ryan King
On Thu, Aug 25, 2011 at 9:33 AM, Yang wrote: > http://twitoaster.com/country-us/lenn0x/testing-out-a-slab-allocator-for-cassandra-to-reduce-gc-promotion-failures-by-stuhood-cassandra-memtables-gc-cc-jointheflock/ > > hi:  I'm interested in learning more about the slaballocator, anyone > has a copy

Re: Is Cassandra suitable for this use case?

2011-08-25 Thread mcasandra
Ruby Stevenson wrote: > > hi Sasha - > > Yes indeed. this solution was in the second part of my original > question - it just seems "out of norm" on what people usually use > Cassandra for, I guess I am looking for some reassurance before I roll > up the sleeve of trying it. > > Thanks > > Rub

Re: slides for "Testing out a slab allocator for Cassandra to reduce GC promotion failures by @stuhood "?

2011-08-25 Thread Yang
hmmm, I somehow came across some links that mention the Cassandra SF conference in connection with this one; maybe I was wrong. Anyway, I found this link that gives very good background (on HBase, though): http://www.cloudera.com/blog/2011/03/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-3

Re: slides for "Testing out a slab allocator for Cassandra to reduce GC promotion failures by @stuhood "?

2011-08-25 Thread Ryan King
On Thu, Aug 25, 2011 at 10:26 AM, Yang wrote: > hmmm, I somehow came across some links that mentions cassandra SF > conference with this one, maybe I was wrong. Yes there was a conference and this topic was discussed a bit. > anyway, found this link that gives a very good background (on Hbase th

Re: For multi-tenant, is it good to have a key space for each tenant?

2011-08-25 Thread Nate McCall
We have a 'virtual keyspaces' feature baked into the Hector client that might be of interest: https://github.com/rantav/hector/wiki/Virtual-Keyspaces On Thu, Aug 25, 2011 at 8:23 AM, Terje Marthinussen wrote: > > Depends of course a lot on how many tenants you have. > Hopefully the new off heap m

Re: Is Cassandra suitable for this use case?

2011-08-25 Thread Ruby Stevenson
hi Robert - This is quite interesting. CassandraFS on Google Code seems inactive now; I don't see any release out of it. Do you know if Brisk is considered stable at all, or is it still very experimental? thanks Ruby On Thu, Aug 25, 2011 at 12:44 PM, Robert Jackson wrote: > I believe this is

Re: Is Cassandra suitable for this use case?

2011-08-25 Thread Robert Jackson
As far as I know the CassandraFS google code project has nothing to do with the current implementation in Brisk (although I really have no idea about that). Some additional information about CFS in Brisk can be found in the presentations from Cassandra SF 2011 [1]. There is a nice presentation

Re: Customized Secondary Index Schema

2011-08-25 Thread Konstantin Naryshkin
Why are you keeping all your indexes in the same row? We do a similar thing (maintain several indexes over the same data) and we just have an index column family with keys like "dest192.168.0.1" which means destination index of 192.168.0.1. You can do rows like User_Keys_By_Last_Name_adams and

Re: Customized Secondary Index Schema

2011-08-25 Thread Ed Anuff
Agreed, that's what I meant by "there are a lot of simple ways to split it up over multiple rows", assuming it's necessary. On Thu, Aug 25, 2011 at 4:24 PM, Konstantin Naryshkin wrote: > Why are you keeping all your indexes in the same row? We do a similar thing > (maintain several indexes over the

Re: Customized Secondary Index Schema

2011-08-25 Thread Alvin UW
Thanks. Assume I use this approach: use the last names as the row keys of the secondary index, and use the base column family key as the column name. There may be a duplicate key issue. We may solve it with a composite key, like "adams_1", "adams_2". Then, we can query these index by range query starting

Re: cassandra unexpected shutdown

2011-08-25 Thread Adi
Ernst, Can you share the logs from just before the crash, especially the GCInspector logs? Check the last reported used heap space and whether it was close to the threshold for full GC. Also, how frequent are your OOM crashes? The Cassandra default for kicking in full GC is 75% ( -XX:CMSInitiati
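The 75% threshold mentioned above is a percentage of the old (tenured) generation, not of the whole heap, so the heap usage at which a CMS collection starts is lower than 75% of the heap size. A rough sketch, assuming a hypothetical 6 GB heap with a 400 MB young generation and `-XX:+UseCMSInitiatingOccupancyOnly`, so the JVM uses the fixed fraction rather than its own adaptive estimate:

```python
def cms_start_mb(heap_mb: int, young_mb: int, fraction_pct: int = 75) -> float:
    """Old-generation occupancy (MB) at which a CMS cycle is triggered.

    young_mb is the young generation size (e.g. HEAP_NEWSIZE / -Xmn),
    which includes the survivor spaces.
    """
    old_gen_mb = heap_mb - young_mb
    return old_gen_mb * fraction_pct / 100


if __name__ == "__main__":
    print(cms_start_mb(6144, 400))  # hypothetical 6 GB heap, 400 MB young gen
```

Comparing the last heap usage GCInspector reported before a crash against this number shows whether the node was already collecting continuously when it ran out of memory.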

Re: Could Not connect to cassandra-cli on windows

2011-08-25 Thread aaron morton
With a fresh cassandra install and a pre built client what error do you get ? Can you connect with node tool ? If not what error ? What about the cassandra CLI ? - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 25/08/2011, at 2:51 AM,

Re: Commit log fills up in less than a minute

2011-08-25 Thread aaron morton
Could you put together some information on this in a ticket and reference this one https://issues.apache.org/jira/browse/CASSANDRA-3071 The short term fix is to disable HH. You will still get consistent reads. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton ht

Re: Cassandra-cli not able to find CF after fresh CF insert.

2011-08-25 Thread aaron morton
What about if you do a get ? What happens if you re-start the cassandra-cli ? If you can reproduce the fault with a cli script please create a jira ticket here https://issues.apache.org/jira/browse/CASSANDRA Thanks - Aaron Morton Freelance Cassandra Developer @aaronmorton http:

Re: how to migrate?

2011-08-25 Thread aaron morton
> One final question: should I add new nodes as Brisk instances instead of my > home brew cassandra + hadoop nodes? I've obviously already put in the > pain/effort of learning how to run hadoop + cassandra… yes, make your life easier. > create keyspace civicscience with replication_factor=3 an

Re: Customized Secondary Index Schema

2011-08-25 Thread Konstantin Naryshkin
Well you could group all the duplicate adams as columns in the same row. This has several advantages: * one, I am not sure what partitioner you plan to use, but if you plan to do key range queries over all the same last names, you cannot use a RandomPartitioner since it does not support key ran
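An in-memory sketch of the layout suggested above: one index row per last name, keyed like `User_Keys_By_Last_Name_adams`, with each matching user key stored as a column name. Duplicate last names simply become more columns in the same row, so no `adams_1`, `adams_2` suffixes are needed. Plain dicts stand in for the column family; the class name and the lower-casing of last names are assumptions.

```python
from collections import defaultdict
from typing import Dict, List

INDEX = "User_Keys_By_Last_Name"


class LastNameIndex:
    """Dict-backed model of an index CF: row key -> {user key: empty value}."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, bytes]] = defaultdict(dict)

    @staticmethod
    def row_key(last_name: str) -> str:
        # normalising case is an assumption of this sketch
        return f"{INDEX}_{last_name.lower()}"

    def add(self, last_name: str, user_key: str) -> None:
        self.rows[self.row_key(last_name)][user_key] = b""

    def remove(self, last_name: str, user_key: str) -> None:
        row = self.rows.get(self.row_key(last_name))
        if row is not None:
            row.pop(user_key, None)

    def lookup(self, last_name: str) -> List[str]:
        # one single-row slice; columns come back sorted by name
        return sorted(self.rows.get(self.row_key(last_name), {}))
```

Because a lookup is a slice of a single row, it works the same way under the RandomPartitioner, which, as noted above, does not support key range queries.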

Re: nodetool repair does not return...

2011-08-25 Thread aaron morton
That's a thread waiting for other threads / activities to complete. Nothing unusual there. Work out how far the repair gets. Is there a validation compaction listed in nodetool compactionstats ? Are there any streams running in nodetool netstats ? Look through the logs on the machine you st

Re: help creating data model

2011-08-25 Thread aaron morton
> later i will need to apply filter to that data, Sounds like a read query you should support by denormalising the data. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 25/08/2011, at 10:50 PM, Helder Oliveira wrote: > Hello, >

Re: Removal of old data files

2011-08-25 Thread aaron morton
If cassandra does not have enough disk space to create a new file it will provoke a JVM GC, which should result in compacted SSTables that are no longer needed being deleted. Otherwise they are deleted at some time in the future. Compacted SSTables have a file written out with a "compacted" extens
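To see which SSTables are only waiting for deletion, one can look for the marker files described above. This sketch assumes 0.8-style file names of the form `<cf>-<version>-<generation>-<Component>` (for example `Standard1-g-12-Data.db` alongside a `Standard1-g-12-Compacted` marker); check the names in your own data directory before relying on it. It only reports files and deletes nothing.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# <cf>-<version>-<generation>-<component>, e.g. "Standard1-g-12-Data.db" (assumed layout)
SSTABLE_FILE = re.compile(r"^(?P<prefix>.+-[a-z]+-\d+)-(?P<component>.+)$")


def compacted_sstables(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Map each SSTable prefix that has a Compacted marker to all of its files."""
    files: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        m = SSTABLE_FILE.match(name)
        if m:
            files[m.group("prefix")].append(name)
    return {
        prefix: sorted(names)
        for prefix, names in files.items()
        if f"{prefix}-Compacted" in names
    }
```

Usage: `compacted_sstables(os.listdir(data_dir))` on a column family's data directory lists the SSTables whose space should come back after the next GC or restart.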

Live migrating data from 2 separate cassandra clusters

2011-08-25 Thread Josep Blanquer
Hi, I am looking for an efficient way to migrate a portion of the data existing in a Cassandra cluster to another, separate Cassandra cluster. What I need is to solve the typical live migration problem that appears in any "DB sharding" where you need to transfer "ownership" of certain rows from DB1 to D

Re: cassandra unexpected shutdown

2011-08-25 Thread Ernst D Schoen-René
Thanks. The only logs I have are system and cassandra. I've included those. I don't have gcinspector logs. I log gc via munin on other machines, but I need to install it on these. On 8/25/11 2:22 PM, Adi wrote: Ernst, Can you share the logs just before the crash. Specially the GC

Re: how to migrate?

2011-08-25 Thread William Oberman
> > > create keyspace civicscience with replication_factor=3 and > strategy_options = [{us-east:3}] and > placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy'; > > FYI the replication_factor property with the NTS is incorrect, the next(?) > revision of 0.8 will raise an error

Re: nodetool repair does not return...

2011-08-25 Thread Boris Yen
No pending tasks for compactionstats and netstats. On Fri, Aug 26, 2011 at 6:07 AM, aaron morton wrote: > That's a thread waiting for other threads / activities to complete. Nothing > unusual there. > > Work out how fair the repair gets. Is there a validation compaction listed > in nodetool compa

Cassandra 082 - Large swap memory

2011-08-25 Thread King JKing
Dear all, My Cassandra 0.8.2 server has a very large swap memory footprint. JConsole shows just 2.9GB of memory used, but htop (top) shows the Cassandra process taking 8700MB. Here is my config: MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="400M" JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC" JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC" JVM_O

Re: question about cassandra.in.sh

2011-08-25 Thread Eric Evans
On Thu, Aug 25, 2011 at 10:13 AM, Koert Kuipers wrote: > hey eric, the one thing i do not agree that it is the element of least > surprise. i would argue that the default behavior for *nix appplications is > that they find out what their home directory is and operate relative to > that. something

Re: Cassandra 082 - Large swap memory

2011-08-25 Thread Jonathan Ellis
Sounds like http://wiki.apache.org/cassandra/FAQ#mmap On Thu, Aug 25, 2011 at 10:36 PM, King JKing wrote: > Dear all, > My Cassandra 082 server had very large swap memory. > JConsole show memory used just 2.9GB. But htop (top) show Cassandra process > take 8700MB. > Here is my config: > MAX_HEAP_

Re: Is Cassandra suitable for this use case?

2011-08-25 Thread Eric Evans
On Thu, Aug 25, 2011 at 6:31 AM, Ruby Stevenson wrote: > - Although Cassandra (and other decentralized NoSQL data store) has > been reported to handle very large data in total, my preliminary > understanding is the individual "column value" is quite limited. I > have read some posts saying you sho

Re: Cassandra 082 - Large swap memory

2011-08-25 Thread King JKing
Dear Jonathan, The Cassandra process has a 63.5 GB virtual size. I am referring to the RES column in top: RES is 8.3G, much larger than the 2.5G of used memory shown in JConsole. On Fri, Aug 26, 2011 at 11:04 AM, Jonathan Ellis wrote: > Sounds like http://wiki.apache.org/cassandra/FAQ#mmap > > On Thu, Aug 25,

Re: multi-node cassandra config doubt

2011-08-25 Thread Thamizh
Hi All, It looks like this is a known issue with Cassandra-0.8.4. So either I have to wait until 0.8.5 is released, or switch to 0.7.8 if it has been resolved there. Ref: https://issues.apache.org/jira/browse/CASSANDRA-3044 Regards, Thamizhannal P --- On Thu, 25/8/11, Thamizh wrote: Fr