Re: understanding tombstones

2011-03-10 Thread Wangpei (Peter)
My question: what the client would get, when following happens:(RF=3, N=3) 1, write with timestamp T and succeed in all nodes. 2, delete with timestamp T+1, CL=Q, and succeed in node1 and node2 but failed in node3. 3, force flush + compaction 4, read CL=Q Does the client will get the row and rea

Re: understanding tombstones

2011-03-10 Thread Sylvain Lebresne
2011/3/10 Wangpei (Peter) > My question: > what the client would get, when following happens:(RF=3, N=3) > 1, write with timestamp T and succeed in all nodes. > 2, delete with timestamp T+1, CL=Q, and succeed in node1 and node2 but > failed in node3. > 3, force flush + compaction > 4, read CL=Q >
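
For readers following the scenario quoted above, here is a minimal sketch of timestamp-based reconciliation on a quorum read. It is not Cassandra's code. The names `Cell`, `reconcile` and `quorum_read` are made up for illustration, and it assumes the tombstones on node1 and node2 are still present after the compaction (i.e. they have not yet been garbage-collected) and that a tombstone wins a timestamp tie.

```python
from itertools import combinations
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    """Hypothetical stand-in for one replica's answer for a column."""
    timestamp: int
    value: Optional[str]  # None marks a tombstone (a delete marker)


def reconcile(responses):
    # Highest timestamp wins. Assumption: on a tie the tombstone is preferred.
    return max(responses, key=lambda c: (c.timestamp, c.value is None))


T = 1000
replicas = {
    "node1": Cell(T + 1, None),  # delete succeeded here
    "node2": Cell(T + 1, None),  # delete succeeded here
    "node3": Cell(T, "v"),       # delete failed, original write remains
}


def quorum_read(nodes):
    """Return the value the client would see from the given replicas."""
    return reconcile([replicas[n] for n in nodes]).value


# With RF=3 a quorum is 2 replicas, so try every possible pair.
results = {pair: quorum_read(pair) for pair in combinations(sorted(replicas), 2)}
for pair, value in results.items():
    print(pair, "->", "deleted" if value is None else value)
```

Under those assumptions every pair of replicas includes at least one node holding the tombstone, so the sketch reports the row as deleted whichever two nodes answer the read.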

On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer

2011-03-10 Thread Jedd Rashbrooke
Howdi, Assortment of questions relating to an upgrade combined with a possible migration between Data Centers (or perhaps a multi-DC redesign). Apologies if some of these have been asked before - I have kept half an eye on the list in recent times but haven't seen anything covering these pa

mutator.execute() timings - big variance noted - pointers needed on understanding/improving it

2011-03-10 Thread Roshan Dawrani
Hi, I am in the middle of some load testing on a 1-node Cassandra setup. We are not on very high loads yet. We have recorded the timings taken up by mutator.execute() calls and we see this kind of variation during the test run: So, 25% of the time, execute() calls come back in 25 milliseconds,

Re: problem with bootstrap

2011-03-10 Thread Patrik Modesto
Hi, I'm still fighting the Exception in thread "main" java.lang.IllegalStateException: replication factor (3) exceeds number of endpoints (2). When I have a 2-server cluster, create Keyspace with RF 3, I'm able to add (without auto_bootstrap) another node but cluster nodetool commands don't work a

Re: mutator.execute() timings - big variance noted - pointers needed on understanding/improving it

2011-03-10 Thread sridhar basam
Sounds like GC from your description of fast->slow->fast. Collect GC times from both the client and server side and plot against your application timing. If you uncomment the verbose GC entries in the cassandra-env.sh file you should get timing for the server side, pass in the same arguments for

Cassandra LongType data insertion problem for secondary index usage

2011-03-10 Thread Adi
Environment: Cassandra 0.7.0 , C++ Thrift client on windows I have a column family with a secondary index ColumnFamily: Page Columns sorted by: org.apache.cassandra.db.marshal.BytesType Built indexes: [Page.index_domain, Page.index_content_size] Column Metadata: Column N

Re: FW: Very slow batch insert using version 0.7.2

2011-03-10 Thread Ryan King
Why use such a large batch size? -ryan On Thu, Mar 10, 2011 at 6:31 AM, Desimpel, Ignace wrote: > > > Hello, > > I had a demo application with embedded cassandra version 0.6.x, inserting > about 120 K  row mutations in one call. > > In version 0.6.x that usually took about 5 seconds, and I could

Re: Cassandra LongType data insertion problem for secondary index usage

2011-03-10 Thread Tyler Hobbs
I looked again at the original email and noticed that besides the bit-shift issue that gets corrected in the next email in the thread, there is another probl

Re: Understanding index builds (updated: crashed cluster)

2011-03-10 Thread Matt Kennedy
Well it looks like the index creation job crashed the cluster. All of the nodes were down having dumped out .hprof files. I brought the cluster back up and when I do "describe keyspace ks" it looks like the index build process has started over again. Is it safe to attempt to stop that by running

Re: Nodes frozen in GC

2011-03-10 Thread Peter Schuller
I think it would be very useful to get to the bottom of this but without further details (like the asked for GC logs) I'm not sure what to do/suggest. It's clear that a single CF with a 64 MB memtable flush threshold and without key cache and row cache and some bulk insertion, should not be causin

Re: Understanding index builds (updated: crashed cluster)

2011-03-10 Thread Jonathan Ellis
If you read the bugs I linked, you would see that this is expected behavior with 0.7.3 once you get more data than you can index in-memory. You should wait for the next Hudson build (which will include 2295) and use that. Or, create your indexes before adding the data. On Thu, Mar 10, 2011 at 12

Re: Cassandra LongType data insertion problem for secondary index usage

2011-03-10 Thread Adi
That was it. Thanks thobbs :-) The queries work as expected now. -Adi On Thu, Mar 10, 2011 at 1:01 PM, Tyler Hobbs wrote: > I looked again at the original > email

Re: problem with bootstrap

2011-03-10 Thread mcasandra
mcasandra wrote: > > > aaron morton wrote: >> >> >> The issue I think you and Patrik are seeing occurs when you *remove* >> nodes from the ring. The ring does not know if they are up or down. E.g. >> you have a ring of 3 nodes, and add a keyspace with RF 3. Then for >> whatever reason 2 nodes

how to force a GC in cronjob to free up disk space?

2011-03-10 Thread Karl Hiramoto
Reading the FAQ http://wiki.apache.org/cassandra/FAQ "SSTables that are obsoleted by a compaction are deleted asynchronously when the JVM performs a GC. You can force a GC from jconsole if necessary" How can i force the GC with a simple java commandline? Is

Re: Modeling Multi-Valued Fields

2011-03-10 Thread aaron morton
Two approaches here. First the "many columns" approach. Have a super column called Email, for each email address store the type as the column name and the email address as the column value. In Cassandra you can store information in the column names as well as the column values. And you do not ne

Re: mutator.execute() timings - big variance noted - pointers needed on understanding/improving it

2011-03-10 Thread aaron morton
http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts Aaron On 11 Mar 2011, at 05:08, sridhar basam wrote: > > Sounds like GC from your description of fast->slow->fast. Collect GC times > from both the client and server side and plot against your application timing. > > If you

Re: Modeling Multi-Valued Fields

2011-03-10 Thread Sasha Dolgy
hm. i use this approach and have secondary indexes configured on the columns if i need to do a specific search for an address. alternately, in the user cf, if you wanted to be very uncool, but optimized for always retrieving the user email addresses, you could have the uuid for the user record an

Re: problem with bootstrap

2011-03-10 Thread aaron morton
Can you include this info... - output from nodetool ring for all nodes so we can see what's in the ring - what you've run on the node you are trying to bring in - the nodetool command you are trying to run - error logs In general asking the cluster to replicate data more times than the number of

Re: Understanding index builds (updated: crashed cluster)

2011-03-10 Thread Matt Kennedy
Sorry, I wasn't clear on the timeline of events. I started the index build and then posted this message to the list. Once I read the links you posted, I did expect the cluster to crash, but I let it run until it blew up anyway, since I didn't really know how to stop the index build. Which is sort

RE: Nodes frozen in GC

2011-03-10 Thread Gregory Szorc
I do believe there is a fundamental issue with compactions allocating too much memory and incurring too many garbage collections (at least with 0.6.12). On nearly every Cassandra node I operate, garbage collections simply get out of control during compactions of any reasonably sized CF (>1GB). I

Re: Understanding index builds (updated: crashed cluster)

2011-03-10 Thread Jonathan Ellis
Drop the index, then restart once more. It shouldn't try to rebuild the index after that. On Thu, Mar 10, 2011 at 3:36 PM, Matt Kennedy wrote: > Sorry, I wasn't clear on the timeline of events.  I started the index build > and then posted this message to the list. Once I read the links you poste

Re: problem with bootstrap

2011-03-10 Thread mcasandra
I am completely confused. I repeated the same test after setting auto_bootstrap to true and it worked this time. I did it exactly the same way, where I killed 2 nodes, and this time it started with no issues. Could it be because once auto_bootstrap is off it's off forever? I am using hector and upgraded

Re: problem with bootstrap

2011-03-10 Thread Peter Schuller
> Could it be because once auto_bootstrap is off it's off forever? I am not entirely sure if this answers your question (I revisited the thread history but I'm a bit confused myself): If by that you mean that given a node which was started with auto_bootstrap=false, and it successfully joined the

Re: problem with bootstrap

2011-03-10 Thread Peter Schuller
> Bootstrapping uses the same mechanisms as a repair to stream data from other > nodes. This can be a heavyweight process and you may want to control when it > starts. > > Joining the ring just tells the other nodes you exist and this is your token. And in general, except when initially setti

Re: Understanding index builds (updated: crashed cluster)

2011-03-10 Thread Matt Kennedy
Great, that worked, thanks for your time. On Thu, Mar 10, 2011 at 4:57 PM, Jonathan Ellis wrote: > Drop the index, then restart once more. It shouldn't try to rebuild > the index after that. > > On Thu, Mar 10, 2011 at 3:36 PM, Matt Kennedy > wrote: > > Sorry, I wasn't clear on the timeline of

Re: Exception when running a clean up

2011-03-10 Thread Stu King
I have upgraded from 0.7.0 to 0.7.3. I then run nodetool scrub on my keyspace and now see this exception: Exception in thread "main" java.io.IOError: java.io.IOException: Cannot run program "ln": java.io.IOException: error=24, Too many open files at org.apache.cassandra.db.ColumnFamilyStore.snapsh

Re: Cassandra LongType data insertion problem for secondary index usage

2011-03-10 Thread buddhasystem
Tyler, as a collateral issue - I've been wondering for a while what advantage if any it buys me, if I declare a value 'long' (which it roughly is) as opposed to passing around strings. String is flattened onto a replica of itself, I assume? No conversion? Maybe it even means better speed. Thanks,

Re: Is secondary index consistent with its base table?

2011-03-10 Thread Alvin UW
Thanks. Why are secondary indexes recommended only for attributes with low cardinality, and why are they not very useful for high cardinality values? 2011/3/7 Jonathan Ellis > It does, but this is an implementation detail subject to change (e.g., > the bitmap indexes being added do not). > > On Mon,

Re: mutator.execute() timings - big variance noted - pointers needed on understanding/improving it

2011-03-10 Thread Roshan Dawrani
Hi All, Thanks for the inputs. I will start investigating this morning with the help of these. Regards, Roshan On Fri, Mar 11, 2011 at 2:49 AM, aaron morton wrote: > http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts > >

Cassandra startup port problem, apache-cassandra-0.7.3 on Snow Leopard.

2011-03-10 Thread Bob Futrelle
After a reboot, cassandra spits out many lines on startup but then appears to stall. Worse, trying to run cassandra a second time stops immediately because of a port problem: apache-cassandra-0.7.3: sudo ./bin/cassandra -f -p pidfile Password: Error: Exception thrown by the agent : java.rmi.serve

Re: Cassandra startup port problem, apache-cassandra-0.7.3 on Snow Leopard.

2011-03-10 Thread Jeremy Hanna
Comments in-line. On Mar 10, 2011, at 8:10 PM, Bob Futrelle wrote: > After a reboot, cassandra spits out many lines on startup but then appears to > stall. > > Worse, trying to run cassandra a second time stops immediately because of a > port problem: > > apache-cassandra-0.7.3: sudo ./bin/c

memory utilization

2011-03-10 Thread Bill Hastings
Hi All Memory utilization reported by JConsole for Cassandra seems to be much lower than that reported by top ("RES" memory). Can someone explain this? Maybe off topic but would appreciate a response. -- Cheers Bill

Re: Exception when running a clean up

2011-03-10 Thread Jonathan Ellis
Unrelated to either upgrade or scrub. That just means you need to install JNA to get native linking instead of having to fork to run ln. On Thu, Mar 10, 2011 at 5:54 PM, Stu King wrote: > I have upgraded from 0.7.0 to 0.7.3. I then run nodetool scrub on my > keyspace and now see this exception:

Re: memory utilization

2011-03-10 Thread Jonathan Ellis
http://wiki.apache.org/cassandra/FAQ#mmap On Thu, Mar 10, 2011 at 8:26 PM, Bill Hastings wrote: > Hi All > > Memory utilization reported by JCOnsole for Cassandra seems to be much > lesser than that reported by top ("RES" memory). Can someone explain this? > Maybe off topic but would appreciate a

Re: FW: Very slow batch insert using version 0.7.2

2011-03-10 Thread Erik Forkalsrud
I see the same behavior with smaller batch sizes. It appears to happen when starting Cassandra with the defaults on relatively large systems. Attached is a script I created to reproduce the problem. (usage: mutate.sh /path/to/apache-cassandra-0.7.3-bin.tar.gz) It extracts a stock cassand

Fatal configuration error, so how to change listen_address:storage_port in cassandra.yaml ?

2011-03-10 Thread Bob Futrelle
Now that I've made the JMX_PORT change cassandra will attempt to run. (Dumb me, I didn't need to ask - the answer about changing JMX_PORT was already in the archives. I'm getting with it now, so I know to look there first. Just finding my way around cassandra) Made the change: JMX_PORT="808

Secondary Index not working?

2011-03-10 Thread Rommel Garcia
I tried the tutorial on this site - http://www.datastax.com/docs/0.7/data_model/secondary_indexes and worked on creating an index on a new column. That went well. But when I indexed an existing column, my query below returns 0 rows when in fact it should return 1. Query: get users where state

Re: Secondary Index not working?

2011-03-10 Thread Jonathan Ellis
https://issues.apache.org/jira/browse/CASSANDRA-2244 On Thu, Mar 10, 2011 at 9:28 PM, Rommel Garcia wrote: > I tried the tutorial on this site > - http://www.datastax.com/docs/0.7/data_model/secondary_indexes and worked > on creating an index on a new column. That went good. But when I indexed an

How long will all nodes data sync.

2011-03-10 Thread Vincent Lu (ECL)
Hi all, I have a question about eventual consistency. If there are 3 nodes and RF=3, Write-C=Quorum, how long will it take for the data on all 3 nodes to sync? Can any configuration change that? Thanks in advance. Vincent

Pig output to Cassandra

2011-03-10 Thread Mark
I thought I read somewhere that Pig has an output format that can write to Cassandra but I am unable to find any documentation on this. Is this possible and if so can someone please point me in the right direction. Thanks

Re: Pig output to Cassandra

2011-03-10 Thread Matt Kennedy
On its way... https://issues.apache.org/jira/browse/CASSANDRA-1828 On Mar 10, 2011, at 11:17 PM, Mark wrote: > I thought I read somewhere that Pig has an output format that can write to > Cassandra but I am unable to find any documentation on this. Is this possible > and if so can someone pleas

Re: Fatal configuration error, so how to change listen_address:storage_port in cassandra.yaml ?

2011-03-10 Thread Aaron Morton
Something else is using the port, perhaps an existing Cassandra process? Use "lsof -i | grep 7000" to see what is. If you need to change it, you are looking for storage_port in the config. Aaron On 11/03/2011, at 3:43 PM, Bob Futrelle wrote: > Now that I've made the JMX_PORT change cassandra
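
As a complement to the lsof suggestion above, a rough way to check from a script whether a port is already taken is to try binding to it. This is only a sketch: `port_in_use` is a made-up helper name, 7000 is the port mentioned in the thread, and the 127.0.0.1 default is an assumption (use whatever listen_address your node is configured with).

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if binding to (host, port) fails, i.e. something holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically "address already in use"; binding a privileged port
            # without permission also ends up here.
            return True
        return False


if __name__ == "__main__":
    # 7000 is the storage port discussed in the thread.
    print("port 7000 in use:", port_in_use(7000))
```

Unlike lsof this does not tell you which process holds the port, only that the bind fails, so it is best used as a quick pre-start check.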

Re: Pig output to Cassandra

2011-03-10 Thread Mark
Sweet! This is exactly what I was looking for and it looks like it was just resolved. Are there any working examples or documentation on this feature? Thanks On 3/10/11 8:57 PM, Matt Kennedy wrote: On its way... https://issues.apache.org/jira/browse/CASSANDRA-1828 On Mar 10, 2011, at 11:17 P

Secondary indices: Why low cardinality?

2011-03-10 Thread Kevin
There's pretty limited information on Cassandra's built-in secondary index facility as is, but trying to find out why the secondary index has to have low cardinality has been like finding a needle in a haystack..that is floating somewhere in the Atlantic. Can someone explain why low cardinality