Re: How to remove/add node

2011-07-13 Thread Abdul Haq Shaik
Thanks a lot dear. I will try it out and will let you know if the problem persists. On Thu, Jul 14, 2011 at 5:52 AM, Sameer Farooqui wrote: > As long as you have no data in this cluster, try clearing out the > /var/lib/cassandra directory from all nodes and restart Cassandra. > > The only way to

Re: cassandra goes infinite loop and data lost.....

2011-07-13 Thread Yan Chunlu
Okay, I am not sure if it is an infinite loop. I changed log4j to "DEBUG" only because Cassandra never came online after I ran it; it seemed to just halt. Once I enabled debug it started showing those messages very fast and never stopped. I have just run nodetool cleanup, and it started reading the commitlog, seem

RE: JDBC CQL Driver unable to locate cassandra.yaml

2011-07-13 Thread Vivek Mishra
Setting server.config -> $SERVER_PATH/cassandra.yaml as a system property should resolve this? -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Thursday, July 14, 2011 3:53 AM To: user@cassandra.apache.org Subject: Re: JDBC CQL Driver unable to locate cassandra.yam
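For reference, Cassandra's own config loader looks for a cassandra.config system property (a URL to the yaml file) before falling back to searching the classpath, so another workaround may be to point the JVM running the driver at the server's file directly; the path and class names below are only placeholders:

    java -Dcassandra.config=file:///opt/cassandra/conf/cassandra.yaml -cp <etl-classpath> <etl-main-class>

Whether this helps depends on the driver version honouring that lookup, so treat it as a sketch rather than a confirmed fix.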

Re: cassandra goes infinite loop and data lost.....

2011-07-13 Thread Jonathan Ellis
That says "I'm collecting data to answer requests." I don't see anything here that indicates an infinite loop. I do see that it's saying "N of 2147483647" which looks like you're doing slices with a much larger limit than is advisable (good way to OOM the way you already did). On Wed, Jul 13, 20

Re: Survey: Cassandra/JVM Resident Set Size increase

2011-07-13 Thread Zhu Han
On Wed, Jul 13, 2011 at 9:45 PM, Konstantin Naryshkin wrote: > Do you mean that it is using all of the available heap? That is the > expected behavior of most long running Java applications. The JVM will not > GC until it needs memory (or you explicitly ask it to) and will only free up > a bit of

Re: Replicating to all nodes

2011-07-13 Thread Maki Watanabe
Consistency and availability trade off against each other. If you use RF=7 + CL=ONE, your reads/writes will succeed as long as you have one node alive, while the data is replicated to 7 nodes. Of course you then have a chance of reading stale data. If you need strong consistency, you must use CL=QUORUM. maki
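A quick worked example of the arithmetic behind that advice, assuming the RF=7 cluster under discussion:

    QUORUM = floor(RF / 2) + 1 = floor(7 / 2) + 1 = 4
    write at QUORUM (4) + read at QUORUM (4): 4 + 4 > 7, so every read overlaps the latest write
    write at ONE (1) + read at ONE (1): 1 + 1 <= 7, so stale reads are possible until replication catches up

So QUORUM on both reads and writes still tolerates 3 of the 7 nodes being down, while ONE maximises availability at the cost of consistency.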

Re: cassandra goes infinite loop and data lost.....

2011-07-13 Thread Yan Chunlu
The problem is I can't bring Cassandra back. Is that because there is not enough memory for Cassandra? On Thu, Jul 14, 2011 at 11:29 AM, Bret Palsson wrote: > How much total memory does your machine have? > > -- > Bret > > On Wednesday, July 13, 2011 at 9:27 PM, Yan Chunlu wrote: > > I gave cassandra 8GB

Re: cassandra goes infinite loop and data lost.....

2011-07-13 Thread Yan Chunlu
16GB On Thu, Jul 14, 2011 at 11:29 AM, Bret Palsson wrote: > How much total memory does your machine have? > > -- > Bret > > On Wednesday, July 13, 2011 at 9:27 PM, Yan Chunlu wrote: > > I gave cassandra 8GB heap size and somehow it run out of memory and > crashed. after I start it, it just run

Re: cassandra goes infinite loop and data lost.....

2011-07-13 Thread Bret Palsson
How much total memory does your machine have? -- Bret On Wednesday, July 13, 2011 at 9:27 PM, Yan Chunlu wrote: > I gave cassandra 8GB heap size and somehow it run out of memory and crashed. > after I start it, it just runs in to the following infinite loop, the last > line: > DEBUG [main] 2

Re: cassandra goes infinite loop and data lost.....

2011-07-13 Thread Yan Chunlu
I gave Cassandra an 8GB heap size and somehow it ran out of memory and crashed. After I start it, it just runs into the following infinite loop, the last line: DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 100zs:false:14@1310168625866434 goes for e

cassandra goes infinite loop and data lost.....

2011-07-13 Thread Yan Chunlu
DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 100zs:false:14@1310168625866434

Re: Replicating to all nodes

2011-07-13 Thread Kyle Gibson
Thanks for the reply Peter. The goal is to configure a cluster in which reads and writes can complete successfully even if only 1 node is online. For this to work, each node would need the entire dataset. Your example of a 3 node ring with RF=3 would satisfy this requirement. However, if two nodes

Re: How to remove/add node

2011-07-13 Thread Sameer Farooqui
As long as you have no data in this cluster, try clearing out the /var/lib/cassandra directory from all nodes and restart Cassandra. The only way to change tokens after they've been set is using a nodetool move or clearing /var/lib/cassandra. On Wed, Jul 13, 2011 at 7:41 AM, Abdul Haq Shaik < a
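A minimal sketch of that procedure, assuming the default /var/lib/cassandra location and that it is genuinely acceptable to lose whatever is on disk:

    # on every node, with the Cassandra process stopped
    sudo rm -rf /var/lib/cassandra/data /var/lib/cassandra/commitlog /var/lib/cassandra/saved_caches
    # restart Cassandra, then confirm the ring looks right
    nodetool -h localhost ring
    # alternatively, to change a token on a live node without wiping data:
    nodetool -h localhost move <new_token>

Here <new_token> is a placeholder for the token you actually want the node to own.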

Question about compaction

2011-07-13 Thread Sameer Farooqui
Running Cassandra 0.8.1. Ran major compaction via: sudo /home/ubuntu/brisk/resources/cassandra/bin/nodetool -h localhost compact & From what I'd read about Cassandra, I thought that after compaction all of the different SSTables on disk for a Column Family would be merged into one new file. How

Re: Replicating to all nodes

2011-07-13 Thread Peter Schuller
> Read and write operations should succeed even if only 1 node is online. > > When a read is performed, it is performed against all active nodes. Using QUORUM is the closest thing you get for reads without modifying Cassandra. You can't make it wait for all nodes that happen to be up. > When a wr

Re: JDBC CQL Driver unable to locate cassandra.yaml

2011-07-13 Thread Jonathan Ellis
The current version of the driver does require having the server's cassandra.yaml on the classpath. This is a bug. On Wed, Jul 13, 2011 at 3:13 PM, Derek Tracy wrote: > I am trying to integrate the Cassandra JDBC CQL driver with my companies ETL > product. > We have an interface that performs da

JDBC CQL Driver unable to locate cassandra.yaml

2011-07-13 Thread Derek Tracy
I am trying to integrate the Cassandra JDBC CQL driver with my company's ETL product. We have an interface that performs database queries using their respective JDBC drivers. When I try to use the Cassandra CQL JDBC driver I keep getting a stacktrace: Unable to locate cassandra.yaml I am using Ca

Replicating to all nodes

2011-07-13 Thread Kyle Gibson
I am wondering if the following cluster configuration is possible with Cassandra, and if so, how it could be achieved. Please also feel free to point out any issues that may make this configuration undesirable that I may not have thought of. Suppose a cluster of N nodes. Each node replicates the data

R: Re: Re: Re: Re: AntiEntropy?

2011-07-13 Thread cbert...@libero.it
>Note that if GCGraceSeconds is 10 days, you want to run repair often >enough that there will never be a moment where there is more than >exactly 10 days since the last successfully completed repair >*STARTED*. >When scheduling repairs, factor in things like - what happens if >repair fails? Who ge

Re: commitlog replay missing data

2011-07-13 Thread Peter Schuller
> # wait for a bit until no one is sending it writes anymore More accurately, until all other nodes have realized it's down (nodetool ring on each respective host). -- / Peter Schuller (@scode on twitter)

Re: commitlog replay missing data

2011-07-13 Thread Peter Schuller
> What are the other ways to stop Cassandra? nodetool disablegossip nodetool disablethrift # wait for a bit until no one is sending it writes anymore nodetool flush # only relevant if in periodic mode # then kill it > What's the difference between batch vs periodic? Search for "batch" on http://
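Laid out one step per line, that drain-then-kill sequence looks roughly like this (the sleep time and pid are illustrative):

    nodetool -h localhost disablegossip
    nodetool -h localhost disablethrift
    sleep 30                      # wait until the other nodes have marked this one down and stop sending writes
    nodetool -h localhost flush   # only matters in periodic commitlog sync mode
    kill <cassandra_pid>          # finally stop the process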

Re: commitlog replay missing data

2011-07-13 Thread mcasandra
Peter Schuller wrote: > >> Recently upgraded to 0.8.1 and noticed what seems to be missing data >> after a >> commitlog replay on a single-node cluster. I start the node, insert a >> bunch >> of stuff (~600MB), stop it, and restart it. There are log messages > > If you stop by a kill, make sure

Re: commitlog replay missing data

2011-07-13 Thread Peter Schuller
> Recently upgraded to 0.8.1 and noticed what seems to be missing data after a > commitlog replay on a single-node cluster. I start the node, insert a bunch > of stuff (~600MB), stop it, and restart it. There are log messages If you stop by a kill, make sure you use batched commitlog synch mode in
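For context, the commitlog sync mode is set in cassandra.yaml; a sketch of the relevant lines (the batch window value is only an example):

    # in cassandra.yaml
    commitlog_sync: batch
    commitlog_sync_batch_window_in_ms: 50
    # the default is periodic mode, e.g.:
    # commitlog_sync: periodic
    # commitlog_sync_period_in_ms: 10000

In periodic mode a hard kill can lose the last few seconds of acknowledged but not-yet-synced writes, which is consistent with the behaviour described in this thread.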

Re: CQL + Counters = bad request

2011-07-13 Thread Aaron Turner
Thanks. Looks like we tracked the problem down: the DataStax 0.8.1 rpm is actually 0.8.0. rpm -qa | grep cassandra apache-cassandra08-0.8.1-1 grep ' Cassandra version:' /var/log/cassandra/system.log | tail -1 INFO [main] 2011-07-13 12:04:31,039 StorageService.java (line 368) Cassandra version:

Re: Storing counters in the standard column families along with non-counter columns ?

2011-07-13 Thread Aaron Morton
If you can provide some more details on the use case we may be able to provide some data model help. You can always use a dedicated CF for the counters, and use the same row key. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 1

Re: commitlog replay missing data

2011-07-13 Thread Aaron Morton
Have you verified that the data you expect to see is not in the server after shutdown? WRT the difference between the Memtable data size and SSTable live size, don't believe everything you read :) Memtable live size is increased by the serialised byte size of every column inserted,

Re: Off-heap Cache

2011-07-13 Thread Raj N
How do I ensure it is indeed using the SerializingCacheProvider? Thanks -Rajesh On Tue, Jul 12, 2011 at 1:46 PM, Jonathan Ellis wrote: > You need to set row_cache_provider=SerializingCacheProvider on the > columnfamily definition (via the cli) > > On Tue, Jul 12, 2011 at 9:57 AM, Raj N wrote:
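One way to check, sketched from cassandra-cli (keyspace and column family names are placeholders, and the exact attribute spelling should be verified against your cli's help):

    update column family Users with row_cache_provider = 'SerializingCacheProvider';
    describe keyspace MyKeyspace;   # the per-CF output should list the row cache provider in use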

Re: Re: Re: Re: AntiEntropy?

2011-07-13 Thread Peter Schuller
> In the company I work for I suggested many times to run repair at least 1 > every 10 days (gcgraceseconds is set approx to 10 days in our config) -- but > this book has been used against me :-) I will ask to run repair asap Note that if GCGraceSeconds is 10 days, you want to run repair often eno
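A rough sketch of scheduling that leaves headroom inside a 10-day GCGraceSeconds window, so a failed run can be retried before the deadline (the timing and log path are illustrative, and nodes are normally staggered rather than repaired simultaneously):

    # crontab entry on each node, offset per node
    0 2 */7 * *  nodetool -h localhost repair >> /var/log/cassandra/repair.log 2>&1

Whatever the exact schedule, the key point is to alarm on repairs that fail or never run, since the quoted advice counts the 10 days from when the last successfully completed repair started.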

Re: BulkLoader

2011-07-13 Thread Sylvain Lebresne
I'll have to apologize on that one. Just saw that the JMX call I was talking about doesn't work as it should. I'll fix that for 0.8.2, but in the meantime you'll want to use sstableloader on a different IP as pointed out by Jonathan. -- Sylvain On Wed, Jul 13, 2011 at 5:11 PM, Sylvain Lebresne wrote:

Re: Escaping characters in cqlsh

2011-07-13 Thread Jonathan Ellis
You can escape quotes but I don't think you can escape semicolons. Can you create a ticket for us to fix this? On Wed, Jul 13, 2011 at 10:16 AM, Blake Visin wrote: > I am trying to get all the columns named "fmd:" in cqlsh. > I am using: > select 'fmd:'..'fmd;' from feeds where; > But I am gettin

Re: One node down but it thinks its fine...

2011-07-13 Thread Ray Slakinski
And fixed! A co-worker put in a bad host line entry last night that threw it all off :( Thanks for the assist guys. -- Ray Slakinski On Wednesday, July 13, 2011 at 1:32 PM, Ray Slakinski wrote: > Was all working before, but we ran out of file handles and ended up > restarting the nodes. No

Re: CQL + Counters = bad request

2011-07-13 Thread samal
> > >>> cqlsh> UPDATE RouterAggWeekly SET 1310367600 = 1310367600 + 17 WHERE > >>> KEY = '1_20110728_ifoutmulticastpkts'; > >>> Bad Request: line 1:51 no viable alternative at character '+' > I am able to insert it. ___ cqlsh> cqlsh> UPDATE counts SET 1310367600 = 1310367600 +

Re: CQL + Counters = bad request

2011-07-13 Thread Aaron Turner
I've tried using the Thrift/execute_cql_query() API as well, and it doesn't work either. I've also tried using a CF where the column names are of AsciiType to see if that was the problem (quoted and unquoted column names) and I get the exact same error of: no viable alternative at character '+' F

Re: One node down but it thinks its fine...

2011-07-13 Thread Ray Slakinski
Was all working before, but we ran out of file handles and ended up restarting the nodes. No yaml changes have occurred. Ray Slakinski On 2011-07-13, at 12:55 PM, Sasha Dolgy wrote: > any firewall changes? ping is fine ... but if you can't get from > node(a) to nodes(n) on the specific ports

Escaping characters in cqlsh

2011-07-13 Thread Blake Visin
I am trying to get all the columns named "fmd:" in cqlsh. I am using: select 'fmd:'..'fmd;' from feeds where; But I am getting errors (as expected). Is there any way to escape the colon or semicolon in cqlsh? Thanks, Blake

Re: BulkLoader

2011-07-13 Thread Sylvain Lebresne
Also note that if you have a Cassandra node running on the local machine from which you want to bulk load sstables, there is a JMX (StorageService->bulkLoad) call to do just that. May be simpler than using sstableloader if that is what you want to do. -- Sylvain On Wed, Jul 13, 2011 at 3:46 PM, Step

Re: JSR-347

2011-07-13 Thread Yang
"data grids", it seems that this really does not have much relationship to "java", since all major noSQL solutions explicitly create interfaces in almost all languages and try to be language-agnostic by using RPC like thrift,avro etc. On Wed, Jul 13, 2011 at 9:06 AM, Pete Muir wrote: > Hi, > > I

Re: One node down but it thinks its fine...

2011-07-13 Thread Sasha Dolgy
any firewall changes? ping is fine ... but if you can't get from node(a) to nodes(n) on the specific ports... On Wed, Jul 13, 2011 at 6:47 PM, samal wrote: > Check seed ip is same in all node and should not be loopback ip on cluster. > > On Wed, Jul 13, 2011 at 8:40 PM, Ray Slakinski > wrote: >

JSR-347

2011-07-13 Thread Pete Muir
Hi, I am looking to "round out" the EG membership of JSR-347 so that we can get going with discussions. It would be great if someone from the Cassandra community could join to represent the experiences of developing HBase :-) We'll be communicating using https://groups.google.com/forum/#!forum/

Re: One node down but it thinks its fine...

2011-07-13 Thread samal
Check that the seed IP is the same on all nodes and is not a loopback IP on the cluster. On Wed, Jul 13, 2011 at 8:40 PM, Ray Slakinski wrote: > One of our nodes, which happens to be the seed thinks its Up and all the > other nodes are down. However all the other nodes thinks the seed is down > instead. The l
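A quick way to check both points on each node (paths assume a typical package install):

    grep -A 3 'seeds' /etc/cassandra/cassandra.yaml        # the seed list should be identical on every node
    grep 'listen_address' /etc/cassandra/cassandra.yaml    # should be the node's real IP, not 127.0.0.1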

Re: Why do Digest Queries return hash instead of timestamp?

2011-07-13 Thread David Boxenhorn
Got it. Thanks! On Wed, Jul 13, 2011 at 6:05 PM, Jonathan Ellis wrote: > (1) the hash calculation is a small amount of CPU -- MD5 is > specifically designed to be efficient in this kind of situation > (2) we compute one hash per query, so for multiple columns the > advantage over timestamp-per-

RE: BulkLoader

2011-07-13 Thread Stephen Pope
Ahhh..ok. Thanks. -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Wednesday, July 13, 2011 11:35 AM To: user@cassandra.apache.org Subject: Re: BulkLoader Because it's hooking directly into gossip, so the local instance it's ignoring is the bulkloader process, no

Re: BulkLoader

2011-07-13 Thread Jonathan Ellis
Because it's hooking directly into gossip, the local instance it's ignoring is the bulkloader process, not Cassandra. You'd need to run the bulkloader from a different IP than Cassandra. On Wed, Jul 13, 2011 at 8:22 AM, Stephen Pope wrote: >  Fair enough. My original question stands then. :)

RE: BulkLoader

2011-07-13 Thread Stephen Pope
Fair enough. My original question stands then. :) Why aren't you allowed to talk to a local installation using BulkLoader? -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Wednesday, July 13, 2011 11:06 AM To: user@cassandra.apache.org Subject: Re: BulkLoader

One node down but it thinks its fine...

2011-07-13 Thread Ray Slakinski
One of our nodes, which happens to be the seed, thinks it's up and all the other nodes are down. However, all the other nodes think the seed is down instead. The logs for the seed node show everything is running as it should be. I've tried restarting the node, turning on/off gossip and thrift and

Re: BulkLoader

2011-07-13 Thread Jonathan Ellis
Sure, that will work fine with a single machine. The advantage of bulkloader is it handles splitting the sstable up and sending each piece to the right place(s) when you have more than one. On Wed, Jul 13, 2011 at 7:47 AM, Stephen Pope wrote: >  I think I've solved my own problem here. After gen

Re: Why do Digest Queries return hash instead of timestamp?

2011-07-13 Thread Jonathan Ellis
(1) the hash calculation is a small amount of CPU -- MD5 is specifically designed to be efficient in this kind of situation (2) we compute one hash per query, so for multiple columns the advantage over timestamp-per-column gets large quickly. On Wed, Jul 13, 2011 at 7:31 AM, David Boxenhorn wrote

Re: AssertionError: No data found for NamesQueryFilter

2011-07-13 Thread Jonathan Ellis
This (https://issues.apache.org/jira/browse/CASSANDRA-2653) is fixed in 0.7.7, which will be out soon. On Tue, Jul 12, 2011 at 9:13 PM, Kyle Gibson wrote: > Running version 0.7.6-2, recently upgraded from 0.7.3. > > I am get a time out exception when I run a particular > get_indexed_slices, which

RE: BulkLoader

2011-07-13 Thread Stephen Pope
I think I've solved my own problem here. After generating the sstable using json2sstable it looks like I can simply copy the created sstable into my data directory. Can anyone think of any potential problems with doing it this way? -Original Message- From: Stephen Pope [mailto:stephen
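A hedged sketch of that approach (keyspace, column family and file names are placeholders; on this vintage the node generally has to be restarted before it notices sstables dropped straight into the data directory):

    json2sstable -K MyKeyspace -c MyColumnFamily dump.json MyColumnFamily-g-1-Data.db
    sudo cp MyColumnFamily-g-1-* /var/lib/cassandra/data/MyKeyspace/
    # restart the node so the new sstable is loaded

The caveat raised later in the thread still applies: this only works for a single node, since nothing splits the data across the ring the way sstableloader does.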

How to remove/add node

2011-07-13 Thread Abdul Haq Shaik
Hi, I have deleted the data, commitlog and saved caches directories. I have removed one of the nodes from the seeds in cassandra.yaml. When I tried to use nodetool, it is showing the removed node as up. Thanks, Abdul

Re: Why do Digest Queries return hash instead of timestamp?

2011-07-13 Thread David Boxenhorn
Is that the actual reason? This seems like a big inefficiency to me. For those of us who don't worry about this extreme edge case (that probably will NEVER happen in real life, for most applications), is there a way to turn this off? Or am I wrong about this making the operation MUCH more expensi

Re: insert a super column

2011-07-13 Thread Konstantin Naryshkin
A ColumnPath can contain a super column, so you should be fine inserting a super column family (in fact I do that). Quoting cassandra.thrift: struct ColumnPath { 3: required string column_family, 4: optional binary super_column, 5: optional binary column, } - Original Message ---

Re: Survey: Cassandra/JVM Resident Set Size increase

2011-07-13 Thread Konstantin Naryshkin
Do you mean that it is using all of the available heap? That is the expected behavior of most long running Java applications. The JVM will not GC until it needs memory (or you explicitly ask it to) and will only free up a bit of memory at a time. That is very good behavior from a performance sta

BulkLoader

2011-07-13 Thread Stephen Pope
I'm trying to figure out how to use the BulkLoader, and it looks like there's no way to run it against a local machine, because of this: Set<InetAddress> hosts = Gossiper.instance.getLiveMembers(); hosts.remove(FBUtilities.getLocalAddress()); if (hosts.isEmpty(

RE: sstabletojson

2011-07-13 Thread Stephen Pope
Perfect, thanks! -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Tuesday, July 12, 2011 5:53 PM To: user@cassandra.apache.org Subject: Re: sstabletojson You can upgrade to 0.8.1 to fix this. :) On Tue, Jul 12, 2011 at 1:03 PM, Stephen Pope wrote: >  Hey there.

Re: Why do Digest Queries return hash instead of timestamp?

2011-07-13 Thread Boris Yen
For a specific column, if there are two versions with the same timestamp, the value of the column is used to break the tie. If v1.value().compareTo(v2.value()) < 0, it means that v2 wins. On Wed, Jul 13, 2011 at 7:13 PM, David Boxenhorn wrote: > How would you know which data is correct, if they

Re: Why do Digest Queries return hash instead of timestamp?

2011-07-13 Thread David Boxenhorn
How would you know which data is correct, if they both have the same timestamp? On Wed, Jul 13, 2011 at 12:40 PM, Boris Yen wrote: > I can only say, "data" does matter, that is why the developers use hash > instead of timestamp. If hash value comes from other node is not a match, a > read repair

Re: Why do Digest Queries return hash instead of timestamp?

2011-07-13 Thread Boris Yen
I can only say that the "data" does matter; that is why the developers use a hash instead of a timestamp. If the hash value that comes from another node does not match, a read repair is performed so that the correct data can be returned. On Wed, Jul 13, 2011 at 5:08 PM, David Boxenhorn wrote: > If you have to pieces of

Re: Why do Digest Queries return hash instead of timestamp?

2011-07-13 Thread David Boxenhorn
If you have two pieces of data that are different but have the same timestamp, how can you resolve consistency? This is a pathological situation to begin with; why should you waste effort to (not) solve it? On Wed, Jul 13, 2011 at 12:05 PM, Boris Yen wrote: > I guess it is because the timestamp

Re: Why do Digest Queries return hash instead of timestamp?

2011-07-13 Thread Boris Yen
I guess it is because the timestamp does not guarantee data consistency, but hash does. Boris On Wed, Jul 13, 2011 at 4:27 PM, David Boxenhorn wrote: > I just saw this > > http://wiki.apache.org/cassandra/DigestQueries > > and I was wondering why it returns a hash of the data. Wouldn't it be >

Why do Digest Queries return hash instead of timestamp?

2011-07-13 Thread David Boxenhorn
I just saw this http://wiki.apache.org/cassandra/DigestQueries and I was wondering why it returns a hash of the data. Wouldn't it be better and easier to return the timestamp? You don't really care what the data is, you only care whether it is more or less recent than another piece of data.

Re: Key_Cache @ Row_Cache

2011-07-13 Thread samal
> > Can you give me a bit idea how key_cache and row_cache effects on > performance of cassandra. How these things works in different scenario > depending upon the data size? > > While reading, if row_cache is set, it checks the row_cache first, then the key_cache, memtable & disk. row_cache stores al

Re: Re: Re: AntiEntropy?

2011-07-13 Thread Maki Watanabe
I'll write a FAQ for this topic :-) maki 2011/7/13 Peter Schuller : >> To be sure that I didn't misunderstand (English is not my mother tongue) here >> is what the entire "repair paragraph" says ... > > Read it, I maintain my position - the book is wrong or at the very > least strongly misleading

Re: insert a super column

2011-07-13 Thread yulinyen
For batch_insert, I think you could use batch_mutate instead. For multi_get, I think you could use multiget_slice instead. Boris 魏金仙 wrote: insert(key, column_path, column, consistency_level) can only insert a standard column.Is batch_mutate the only API to insert a super column? and also

R: Re: Re: Re: AntiEntropy?

2011-07-13 Thread cbert...@libero.it
Thanks for the confirmation, Peter. In the company I work for I have suggested many times to run repair at least once every 10 days (gcgraceseconds is set to approx 10 days in our config) -- but this book has been used against me :-) I will ask to run repair asap >Original Message >From: peter.s

insert a super column

2011-07-13 Thread 魏金仙
insert(key, column_path, column, consistency_level) can only insert a standard column. Is batch_mutate the only API to insert a super column? And also, can someone tell me why batch_insert and multi_get were removed in version 0.7.4?

Re:Key_Cache @ Row_Cache

2011-07-13 Thread 魏金仙
row_cache caches a whole row; key_cache caches the key and the row location. Thus, if the request hits in the row_cache, the result can be returned without a disk seek. If it hits in the key_cache, the result can be obtained after one disk seek. Without the key_cache or row_cache, it will check the index file f
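For completeness, both caches are per-column-family settings in this era; a sketch from cassandra-cli (names and sizes are placeholders, and values can be absolute counts or fractions):

    update column family Users with keys_cached = 200000 and rows_cached = 10000;
    nodetool -h localhost cfstats    # reports key cache and row cache hit rates per column family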

Key_Cache @ Row_Cache

2011-07-13 Thread Nilabja Banerjee
Hi All, Can you give me a bit of an idea of how key_cache and row_cache affect the performance of Cassandra? How do these things work in different scenarios depending upon the data size? Thank You Nilabja Banerjee