Re: OutOfMemory on count on cassandra 0.6.8 for large number of columns

2010-12-12 Thread Dave Martin
Thanks Tyler. I was unaware of counters. The use case for column counts is really from an operational perspective, to allow a sysadmin to do ad hoc checks on columns to see if something has gone wrong in software outside of cassandra. I think running a cassandra-cli command such as count, which mak

Re: N to N relationships

2010-12-12 Thread David Boxenhorn
You want to store every value twice? That would be a pain to maintain, and possibly lead to inconsistent data. On Fri, Dec 10, 2010 at 3:50 AM, Nick Bailey wrote: > I would also recommend two column families. Storing the key as NxN would > require you to hit multiple machines to query for an ent

Quorum and Datacenter loss

2010-12-12 Thread Jonathan Colby
Hi cassandra experts - We're planning a cassandra cluster across 2 datacenters (datacenter-aware, random partitioning) with QUORUM consistency. It seems to me that with 2 datacenters, if one datacenter is lost, the reads/writes to cassandra will fail in the surviving datacenter because of the N

Re: Quorum and Datacenter loss

2010-12-12 Thread Peter Schuller
> Is my logic wrong here?  Is there a way to ensure the nodes in the > alive datacenter respond successfully if the second datacenter is > lost?  Anyone have experience with this kind of problem? It's impossible to achieve the consistency and availability at the same time. See: http://en.wikip

Re: Quorum and Datacenter loss

2010-12-12 Thread Peter Schuller
>> Is my logic wrong here?  Is there a way to ensure the nodes in the >> alive datacenter respond successfully if the second datacenter is >> lost?  Anyone have experience with this kind of problem? > > It's impossible to achieve the consistency and availability at the > same time. See: (Assuming

Unsubscribe

2010-12-12 Thread Colin
Unsubscribe Please Sent from my iPad On Dec 12, 2010, at 1:26 AM, Dave Martin wrote: > Hi there, > > I see the following: > > 1) Add 8,000,000 columns to a single row. Each column name is a UUID. > 2) Use cassandra-cli to run count keyspace.cf['myGUID'] > > The following is reported in the

Re: Unsubscribe

2010-12-12 Thread Peter Schuller
> Unsubscribe http://wiki.apache.org/cassandra/FAQ#unsubscribe -- / Peter Schuller

Re: Quorum and Datacenter loss

2010-12-12 Thread Jonathan Colby
Thanks a lot Peter. So basically we would need to choose a consistency other than QUORUM. I think in our case consistency is not necessarily an issue since our data is write-once, read-many (immutable data). I suppose having a replication factor of 4 would result in two nodes in each datacen
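The arithmetic behind this thread can be sketched as follows. This is a minimal illustration, not Cassandra code: the function names are hypothetical, and the only fact assumed is that QUORUM requires a strict majority of replicas, i.e. floor(RF / 2) + 1. The even split of replicas across the two datacenters is taken from the scenario in the message above.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for QUORUM: a strict majority."""
    return replication_factor // 2 + 1


def quorum_reachable(replication_factor: int, live_replicas: int) -> bool:
    """True if the replicas still alive are enough to satisfy QUORUM."""
    return live_replicas >= quorum(replication_factor)


if __name__ == "__main__":
    rf = 4                 # replication factor from the message above
    replicas_per_dc = 2    # assumed even split across two datacenters
    print("quorum size:", quorum(rf))
    # After losing one datacenter only replicas_per_dc replicas remain.
    print("QUORUM still possible:", quorum_reachable(rf, replicas_per_dc))
```

With an even split, the surviving datacenter never holds a majority of replicas on its own, which is why QUORUM operations would fail there.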

Re: Memory leak with Sun Java 1.6 ?

2010-12-12 Thread Timo Nentwig
On Dec 10, 2010, at 19:37, Peter Schuller wrote: > To cargo cult it: Are you running a modern JVM? (Not e.g. openjdk b17 > in lenny or some such.) If it is a JVM issue, ensuring you're using a > reasonably recent JVM is probably much easier than to start tracking > it down... I had OOM problems

Re: Memory leak with Sun Java 1.6 ?

2010-12-12 Thread Jonathan Ellis
http://www.riptano.com/docs/0.6/troubleshooting/index#nodes-are-dying-with-oom-errors On Sun, Dec 12, 2010 at 9:52 AM, Timo Nentwig wrote: > > On Dec 10, 2010, at 19:37, Peter Schuller wrote: > > > To cargo cult it: Are you running a modern JVM? (Not e.g. openjdk b17 > > in lenny or some such.) I

Dynamic Snitch / Read Path Questions

2010-12-12 Thread Daniel Doubleday
Hi again. It would be great if someone could comment whether the following is true or not. I tried to understand the consequences of using -Dcassandra.dynamic_snitch=true for the read path and that's what I came up with: 1) If using CL > 1 then using the dynamic snitch will result in a dat

Re: Quorum and Datacenter loss

2010-12-12 Thread Peter Schuller
> Thanks a lot Peter.   So basically we would need to choose a > consistency other than QUORUM.    I think in our case consistency is > not necessarily an issue since our data is write-once, read-many > (immutable data).   I suppose having a replication factor of 4 would > result in two nodes in ea

iterate over all the rows with RP

2010-12-12 Thread shimi
Is the same connection required when iterating over all the rows with Random Partitioner, or is it possible to use a different connection for each iteration? Shimi

Re: iterate over all the rows with RP

2010-12-12 Thread Peter Schuller
> Is the same connection required when iterating over all the rows with > Random Partitioner, or is it possible to use a different connection for each > iteration? In general, the choice of RPC connection (I assume you mean the underlying thrift connection) does not affect the semantics of the RP

Re: iterate over all the rows with RP

2010-12-12 Thread shimi
So if I use a different connection (Thrift via Hector), will I get the same results? It makes sense when you use OPP and I assume it is the same with RP. I just wanted to make sure this is the case and there is no state which is kept. Shimi On Sun, Dec 12, 2010 at 8:14 PM, Peter Schuller w

Re: N to N relationships

2010-12-12 Thread Edward Capriolo
On Sun, Dec 12, 2010 at 3:20 AM, David Boxenhorn wrote: > You want to store every value twice? That would be a pain to maintain, and > possibly lead to inconsistent data. > > On Fri, Dec 10, 2010 at 3:50 AM, Nick Bailey wrote: >> >> I would also recommend two column families. Storing the key as N

Re: OutOfMemory on count on cassandra 0.6.8 for large number of columns

2010-12-12 Thread Tyler Hobbs
Well, in this case I would say you probably need about 300MB of space in the heap, since that's what you've calculated. The APIs are designed to let you do what you think is best and they definitely won't stop you from shooting yourself in the foot. Counting a huge row, or trying to grab every ro

Re: iterate over all the rows with RP

2010-12-12 Thread Ran Tavory
This should be the case, yes: semantics aren't affected by the connection and state isn't kept. What might happen, if you read/write with low consistency levels, is that a different host on the ring may have an inconsistent state in case of a partition. On Sunday, December 12, 2010, shim
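The point that no state is kept between calls can be illustrated with a small simulation. This is only a sketch under stated assumptions: `FakeConnection`, `get_range`, and `iterate_all` are hypothetical names, not the Thrift or Hector API, and the ordering by an MD5-derived token merely mimics how Random Partitioner orders rows. Each page is fetched over a fresh "connection"; the only thing carried between pages is the last key seen, which becomes the inclusive start key of the next page.

```python
import hashlib


def token(key: str) -> int:
    """MD5-derived token, mimicking Random Partitioner ordering."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class FakeConnection:
    """Hypothetical stand-in for a client connection; holds no scan state."""

    def __init__(self, rows):
        self.rows = rows

    def get_range(self, start_key: str, count: int):
        # Return up to `count` keys in token order, starting at start_key
        # (inclusive). An empty start_key means "from the beginning".
        ordered = sorted(self.rows, key=token)
        if start_key:
            ordered = [k for k in ordered if token(k) >= token(start_key)]
        return ordered[:count]


def iterate_all(rows, page_size: int = 3):
    if page_size < 2:
        raise ValueError("page_size must be at least 2 to make progress")
    seen, start = [], ""
    while True:
        conn = FakeConnection(rows)      # a different connection per page
        page = conn.get_range(start, page_size)
        # The start key is inclusive, so skip the repeated first key.
        seen.extend(page[1:] if start else page)
        if len(page) < page_size:
            return seen
        start = page[-1]


if __name__ == "__main__":
    keys = {f"row{i}" for i in range(10)}
    print(sorted(iterate_all(keys)) == sorted(keys))
```

Because the position in the scan lives entirely in the start key, any connection can serve the next page; the consistency caveat in the message above still applies.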

Re: Cassandra for Ad-hoc Aggregation and formula calculation

2010-12-12 Thread Aaron Morton
Nice email Dan. I would also add if you are still in the initial stages take a look at Hadoop+Pig. If your source data is write once read many it may be a better fit, but then you would also need to calculate the aggregates and store them somewhere. So Cassandra *may* be just what you want. T

Re: N to N relationships

2010-12-12 Thread Aaron Morton
RE: storing every value twice. Cassandra is not an RDBMS; the goal is not to achieve fifth normal form. The goal is to design your storage schema to support the queries you wish to run. Storage is cheap. And it's really not a pain to store the values more than once. Use the batch_mutate() funct

Re: Quorum and Datacenter loss

2010-12-12 Thread Dave Viner
I think there's a flaw in your logic. Take the following scenario: - you use QUORUM for reads and QUORUM for writes - you have 2 datacenters (DC1, DC2), with 3 servers in each (so 6 nodes total). - you set replication factor to 3 - you use RackAwareStrategy So, you have DC1-S1, DC1-S2, DC1-S3, DC

unable to start cassandra-0.7r2

2010-12-12 Thread Liangzhao Zeng
I am trying to run cassandra-0.7r2 in Eclipse by following http://wiki.apache.org/cassandra/RunningCassandraInEclipse. There are no compile errors; however, I got the error message: Bad configuration; unable to start server. Any idea? Liangzhao