Re: simple erlang example

2011-02-18 Thread Sasha Dolgy
there is a current strategy to use cassandra for data storage and it makes sense to have user management and roster management exist in the same place for all the different services that we provide. specific to user interaction, i started looking at ejabberd because Apache Vysper is not as featur

Re: simple erlang example

2011-02-18 Thread Joshua Partogi
Is there any reason why you would be interested in using erlang with cassandra instead of another erlang-based database [e.g. Couchbase, Riak]? I am interested to know the reason. Kind regards, Joshua On Sat, Feb 19, 2011 at 9:39 AM, Sasha Dolgy wrote: > hi, > does anyone have an erlang example for

Re: Virtues and pitfall of using TYPES?

2011-02-18 Thread buddhasystem
Dude, I never mentioned the server side, sorry if it wasn't obvious. As for python being slow, I'm not going away from it. It performs amazingly well in other circumstances. Jonathan Ellis-3 wrote: > > That doesn't make sense to me. IntegerType validation is a no-op and > LongType validation i

Re: Virtues and pitfall of using TYPES?

2011-02-18 Thread Jonathan Ellis
That doesn't make sense to me. IntegerType validation is a no-op and LongType validation is pretty close (just a size check). If you meant that the conversion is killing performance on your client, you should switch to a more performant client language. :) On Fri, Feb 18, 2011 at 9:56 PM, buddha

Virtues and pitfall of using TYPES?

2011-02-18 Thread buddhasystem
I've been too smart for my own good trying to type columns, on the theory that it would later increase performance by having more efficient comparators in place. So if a string represents an integer, I would convert it to an integer and declare the column as such. Same for LONG. What I found is t

Re: Timeout

2011-02-18 Thread mcasandra
Forgot to mention replication factor is 1 and I am running Cassandra 0.7.0. It's using SimpleStrategy

Re: Timeout

2011-02-18 Thread mcasandra
This is a test cluster of 3 nodes. This is test code that does the following: 1) The first 4 lines physically drop and create the keyspace, then create the CF and column definition on the server 2) Right after, from the 5th line onwards, it gets the reference to the keyspace and tries to insert a row and colu

Re: Timeout

2011-02-18 Thread Javier Canillas
Why don't you post some details about your Cassandra Cluster, version, information about the keyspace you are creating (for example which is the replication factor within)? It might be of help. Besides, I don't fully understand your code. First you drop KEYSPACE, then create it again with a column

Timeout

2011-02-18 Thread mcasandra
I have the code below, and when I run it a timeout occurs when I try to insert a column. But when I comment out the first 4 lines (drop to display) it works without any issues. I am trying to understand why. If required I can sleep and then insert. Is

Re: Error when bringing up 3rd node

2011-02-18 Thread Ching-Cheng Chen
If you know you will have 3 nodes, you should set the initial token inside the cassandra.yaml for each node. Then you won't need to run nodetool move. Regards, Chen www.evidentsoftware.com On Fri, Feb 18, 2011 at 5:24 PM, mcasandra wrote: > > Thanks! I feel so horrible after realizing what m

simple erlang example

2011-02-18 Thread Sasha Dolgy
hi, does anyone have an erlang example for connecting to cassandra and performing an operation like a get? I'm not having much luck with: \thrift-0.5.0\test\erl\src\* as a reference point. I generated all of the erlang files using thrift and have successfully compiled them but am having a pretty

Cassandra as write-behind, Cassandra as Cache

2011-02-18 Thread Benson Margulies
Cassandra as dessert topping? Cassandra as floor-wax? I do apologize for this basket of clueless questions, but I'm exploring new territory for me. Overall problem has two datasets with distinct storage characteristics. The first is a set of data that can fit in memory, but which needs reliable

Re: Error when bringing up 3rd node

2011-02-18 Thread mcasandra
Thanks! I feel so horrible after realizing what mistake I made :) After I bring up the new node, do I just need to run the following on the old nodes? 1) New node set the initial token to 56713727820156410577229101238628035242 2) start new node 3) On second node run nodetool move 1134274556403128211544

Re: Are row-keys sorted by the compareWith?

2011-02-18 Thread Michal Augustýn
Hi, I see "The CompareWith attribute tells Cassandra how to sort the columns for slicing operations." on the wiki ( http://wiki.apache.org/cassandra/StorageConfiguration). So CompareWith defines how to sort columns (or super-columns) within the scope of one row. So this option is related to (multi)get_slice

Re: Error when bringing up 3rd node

2011-02-18 Thread Ching-Cheng Chen
try this BigInteger bi = new BigInteger("2"); BigInteger or = new BigInteger("2"); for (int i=1;i<127;i++) { or = or.multiply(bi); } or = or.divide(new BigInteger("3")); for (int i=0;i<3;i++) { System.out.println(or.multiply(new BigInteger(""+i))); } which generates 0 56713727820156410577229101
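The calculation in the message above can be sketched a little more compactly. This is only an illustration, assuming a token space of 2^127 split evenly across the nodes, as in the thread; the class name `TokenCalc` is borrowed from the thread, while `RING_SIZE` and `initialTokens` are made-up names, not part of Cassandra.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

final class TokenCalc {
    // Assumed size of the token space: 2^127, as used in the thread.
    static final BigInteger RING_SIZE = BigInteger.valueOf(2).pow(127);

    // Returns evenly spaced initial tokens, starting at 0.
    static List<BigInteger> initialTokens(int nodes) {
        if (nodes <= 0) {
            throw new IllegalArgumentException("nodes must be positive");
        }
        // Same order of operations as the thread's code: divide first, then multiply.
        BigInteger step = RING_SIZE.divide(BigInteger.valueOf(nodes));
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < nodes; i++) {
            tokens.add(step.multiply(BigInteger.valueOf(i)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (BigInteger token : initialTokens(3)) {
            System.out.println(token);
        }
    }
}
```

For three nodes the second token is 56713727820156410577229101238628035242, the value quoted elsewhere in this thread.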

Re: Error when bringing up 3rd node

2011-02-18 Thread Jonathan Ellis
Also, ^ means xor in Java, not exponentiation. Just use the Python Eric linked. :) On Fri, Feb 18, 2011 at 3:24 PM, Ching-Cheng Chen wrote: > 41 > 82 > 123 > These certainly not correct.  Can't just use 2 ^ 127, will overflow > You can't use Java's primitive type to do this calculation.   long o
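A small sketch of why the earlier snippet printed 41, 82 and 123: in Java, `2 ^ 127` is a bitwise XOR on ints and evaluates to 125, so the rest is plain integer arithmetic. The class and method names here are made up for illustration.

```java
final class XorVersusPow {
    // Reproduces the expression from the thread; ^ is XOR, so (2 ^ 127) == 125.
    static int xorBasedToken(int nodes, int i) {
        return (2 ^ 127) / nodes * i;
    }

    public static void main(String[] args) {
        System.out.println(2 ^ 127); // 125, not 2 to the power 127
        for (int i = 1; i <= 3; i++) {
            System.out.println(xorBasedToken(3, i)); // 41, 82, 123
        }
    }
}
```

Exponentiation on numbers of this size needs `BigInteger.pow`, since a `long` holds only 64 bits.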

Re: Error when bringing up 3rd node

2011-02-18 Thread Eric Gilmore
I'm not sure I can say exactly why, but I'm sure those numbers can't be correct. One node should be zero and the other values should be very long numbers like 85070591730234615865843651857942052863. We need another Java expert's opinion here, but it looks like your snippet may have "integer overf

Re: Async write

2011-02-18 Thread Anthony John
Facts as I understand them: - A write call to the db triggers a number of async writes to all nodes where the particular write should be recorded (and the nodes are up per Gossip and so on) - Once the desired CL number of writes acknowledge - the call returns So your issue is moot. That is what is happeni

Re: Error when bringing up 3rd node

2011-02-18 Thread Ching-Cheng Chen
41 82 123 These are certainly not correct. You can't just use 2 ^ 127; it will overflow. You can't use Java's primitive types to do this calculation: long only uses 64 bits. You'd need to use the BigInteger class to do this calculation. Regards, Chen www.evidentsoftware.com On Fri, Feb 18, 2011 at 4:04 PM,

Re: Are row-keys sorted by the compareWith?

2011-02-18 Thread Jonathan Ellis
No. CompareWith is for columns. On Fri, Feb 18, 2011 at 3:16 PM, cbert...@libero.it wrote: > Hi all, > I created a CF in which i need to get, sorted by time, the Rows inside. Each > Row represents a comment. > > > > I've created a few rows using as Row Key a generated TimeUUID but when I call >

Re: Async write

2011-02-18 Thread A J
W always stands for number of sync writes. N-W is the number of async writes. Note, N decides number of replicas. W only decides out of those N replicas, how many should be written synchronously before returning success of write to client. All writes always happen to a total of N nodes (W right awa

Are row-keys sorted by the compareWith?

2011-02-18 Thread cbert...@libero.it
Hi all, I created a CF in which I need to get the Rows inside sorted by time. Each Row represents a comment. I've created a few rows using as Row Key a generated TimeUUID but when I call the Pelops method "GetColumnsFromRows" I don't get the data back as I expect: rows are not sorted by Tim

Re: frequent client exceptions on 0.7.0

2011-02-18 Thread Andy Skalet
On Thu, Feb 17, 2011 at 12:22 PM, Aaron Morton wrote: > Messages been dropped means the machine node is overloaded. Look at the > thread pool stats to see which thread pools have queues. It may be IO > related, so also check the read and write latency on the CF and use iostat. > > i would try th

Re: Async write

2011-02-18 Thread mcasandra
So does it mean there is no way to say use sync + async? I am thinking that if I have to write across data centers, doing it synchronously is going to be very slow and will be bad for clients that have to wait. What are my options or alternatives? Use N=3 and W=2? And the 3rd one (assuming will be a

Re: Error when bringing up 3rd node

2011-02-18 Thread mcasandra
Thanks! This is what I got. Is this right? public class TokenCalc{ public static void main(String ...args){ int nodes=3; for(int i = 1 ; i <= nodes; i++) { System.out.println( (2 ^ 127) / nodes * i); } } } 41 82 123

Re: Async write

2011-02-18 Thread Anthony John
This is transparent! Essentially - when enough writes are acknowledged to meet the desired Consistency Level - it returns. On Fri, Feb 18, 2011 at 2:48 PM, mcasandra wrote: > > I am still trying to understand how writes work. Is there any concept of > sync > and async writes? For eg: > > If I w

Re: Error when bringing up 3rd node

2011-02-18 Thread Eric Gilmore
A Java program should work fine. The Wiki and the DataStax documentation use a python program for the same purpose: http://www.datastax.com/docs/0.7/operations/clustering#calculating-tokens On Fri, Feb 18, 2011 at 12:45 PM, mcasandra wrote: > > Yes I had set the first node to token 0. I think

Async write

2011-02-18 Thread mcasandra
I am still trying to understand how writes work. Is there any concept of sync and async writes? For eg: If I want to have W=2 but 1 write as sync and the 2nd as async. Or say I want to have W=3 with networktopology with DC1 getting 1 sync write + 1 async write and DC2 always getting async write

Re: Error when bringing up 3rd node

2011-02-18 Thread mcasandra
Yes, I had set the first node to token 0. I think I read that somewhere in the docs. What should I do? Should I write a Java program to calculate the hash for 3 nodes and distribute it across 3 nodes?

Re: Error when bringing up 3rd node

2011-02-18 Thread Eric Gilmore
It sounds like one of your existing nodes already has the initial token zero. Did you set the initial token of the first node you brought online to zero? On Fri, Feb 18, 2011 at 12:35 PM, mcasandra wrote: > > I see following error. Is it because I have initial token defined? What > token > shoul

Error when bringing up 3rd node

2011-02-18 Thread mcasandra
I see following error. Is it because I have initial token defined? What token should I use as initial token? INFO 12:31:36,689 Finished hinted handoff of 0 rows to endpoint /172.16.208.12 INFO 12:32:58,448 Joining: getting bootstrap token ERROR 12:32:58,451 Fatal error: Bootstraping to existing

Re: cluster size, several cluster on one node for multi-tenancy

2011-02-18 Thread Mimi Aluminium
Nick, Assuming I have a tenant that has only one CF, and I am using NetworkAware replication strategy where the keys of this CF are replicated 3 times, each copy in a different DC (DC1,DC2,DC3) Now let's assume the cluster holds 5 DCs. As far as I understand only the servers that belong to the thre

Metadata

2011-02-18 Thread A J
If I wish to find name of all the keys in all the column families along with other related metadata (such as last updated, size of column value field), is there an additional solution that caches this metadata OR do I have to always perform range queries and get the information ? I am not interest

Re: managing a limited-length list as a value

2011-02-18 Thread Norman Maurer
Hi there, there is not such an operation in cassandra. The only thing which comes "close" is the TTL support which will "delete" columns after a given time. See: http://www.datastax.com/dev/blog/whats-new-cassandra-07-expiring-columns Bye, Norman 2011/2/18 Benson Margulies : > The following is

Re: Understand eventually consistent

2011-02-18 Thread A J
#1, R=2, so if only one machine is up, by definition R cannot be satisfied. So it will not return. #2, consistency is an involved topic with no quick and easy explanation and answers. my 2 cents, Question of eventual consistency comes in distributed systems, where you can write to one machine but

Re: Frequent updates of freshly written columns

2011-02-18 Thread Sylvain Lebresne
On Fri, Feb 18, 2011 at 6:19 PM, Aklin_81 wrote: > Sylvain, > I also need to store data that is frequently updated, same column > being updated several times during each user session, at each action > by user, But, this data is not very fresh and hence when I update this > column frequently, ther

managing a limited-length list as a value

2011-02-18 Thread Benson Margulies
The following is derived from the redis list operations. The data model is that a key maps to an list of items. The operation is to push a new item into the front, and discard any items from the end above a threshold number of items. of course, this can be done by reading a value, fiddling with i
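The push-and-trim operation described above can be sketched in memory as follows. This only illustrates the data model (push to the front, discard from the end past a threshold); it is not a Cassandra operation, and `BoundedList`, `pushFront` and `snapshot` are made-up names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class BoundedList<T> {
    private final int maxItems;
    private final Deque<T> items = new ArrayDeque<>();

    BoundedList(int maxItems) {
        if (maxItems <= 0) {
            throw new IllegalArgumentException("maxItems must be positive");
        }
        this.maxItems = maxItems;
    }

    // Push a new item at the front, then drop items from the end above the threshold.
    void pushFront(T item) {
        items.addFirst(item);
        while (items.size() > maxItems) {
            items.removeLast();
        }
    }

    // Copy of the current contents, newest item first.
    List<T> snapshot() {
        return new ArrayList<>(items);
    }
}
```

Doing the same thing against a remote store by reading the value, modifying it and writing it back costs a round trip per step, which is presumably why a server-side operation is being asked about.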

Re: Understand eventually consistent

2011-02-18 Thread Anthony John
Again, my understanding! 1. Writes will go thru w/hinted handoff, read will fail 2. Yes - but Oracle and others have no partition tolerance and lower levels of availability. To build in partition tolerance and high availability and still be shared nothing to avoid SPOF (to cover the RAC implementa

Re: Understand eventually consistent

2011-02-18 Thread mcasandra
I have a couple more questions: 1. What happens when RF = 3, R = 2 and W = 2 and 2 machines go down? Would read and write fail or get the results from that one machine that is up? 2. Someone in this thread mentioned that write is eventually consistent. Is it because response is returned to the c

Re: R and N

2011-02-18 Thread Anthony John
K - let me state the facts first (as I know them) - I do not know the inner workings, so interpret my response with that caveat. Although, at an architectural level, one should be able to keep detailed implementation at bay - Quorum is (N+1)/2 where N is the Replication Factor (RF) - And consis
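As a rough sketch of the quorum arithmetic discussed here: Cassandra's documentation defines quorum as floor(RF / 2) + 1, which agrees with the formula quoted in the message for odd replication factors. The class and method names below are made up for illustration.

```java
final class Quorum {
    // Quorum size for a given replication factor: floor(RF / 2) + 1.
    static int quorum(int replicationFactor) {
        if (replicationFactor <= 0) {
            throw new IllegalArgumentException("replicationFactor must be positive");
        }
        return replicationFactor / 2 + 1; // integer division rounds down
    }

    // A request needing `required` replica responses can only succeed if that many are up.
    static boolean canSatisfy(int required, int liveReplicas) {
        return liveReplicas >= required;
    }

    public static void main(String[] args) {
        System.out.println(quorum(3));                 // 2
        System.out.println(canSatisfy(quorum(3), 1));  // false: RF=3 with two nodes down
    }
}
```

With RF = 3 and two machines down, a read or write that requires two replicas cannot be satisfied, which matches the answers given elsewhere in this thread.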

Re: Frequent updates of freshly written columns

2011-02-18 Thread Aklin_81
Sylvain, I also need to store data that is frequently updated, same column being updated several times during each user session, at each action by user, But, this data is not very fresh and hence when I update this column frequently, there would be many versions of the same column in several sst fi

Re: Replacing Redis

2011-02-18 Thread Benson Margulies
typical experiment. Redis 2.0.4 deployed on my macbook pro. Saves enabled. appendfsync off. vm enabled, 1g max memory. 72 databases. Each database asked to store 13*N key-value pairs with lpush, bucket size not very big, N -> 500,000. Client jredis. Start running against a stream of inputs.

RE: cassandra & php

2011-02-18 Thread David Quattlebaum
John, Just wondering what you are using if not phpcassa? Thanks! David From: John Lennard [mailto:j...@gravitate.co.nz] Sent: Thursday, February 17, 2011 6:41 PM To: user@cassandra.apache.org Subject: Re: cassandra & php Hi, How does this connection pooling fit in with the

Re: Replacing Redis

2011-02-18 Thread Jonathan Shook
Benson, I was considering using Redis for a specific project. Can you elaborate a bit on your problem with it? What were the circumstances, loading factors, etc? On Fri, Feb 18, 2011 at 9:19 AM, Benson Margulies wrote: > redis times out at random regardless of what we configure for client > timeo

Re: Schema init 'best practice'

2011-02-18 Thread Jonathan Ellis
On Fri, Feb 18, 2011 at 9:59 AM, Benson Margulies wrote: > I want to package some schema with a library. > > I could use the hector API to create the schema if not found. That's probably simplest for your users. (This is what stress.java does, for instance.) Otherwise, I'd recommend bundling a

Re: R and N

2011-02-18 Thread A J
A couple more related questions: 5. For reads, does Cassandra first read N nodes or just the R nodes it selects ? I am thinking unless it reads all the N nodes, how will it know which node has the latest write. 6. Who decides the timestamp that gets inserted into the timestamp field of every col

Schema init 'best practice'

2011-02-18 Thread Benson Margulies
I want to package some schema with a library. I could use the hector API to create the schema if not found. Or I could, what, stuff a yaml file into something? Is there an API for that, or do I end up where I started?

Re: Understand eventually consistent

2011-02-18 Thread Jonathan Ellis
On Fri, Feb 18, 2011 at 12:00 AM, Stu Hood wrote: > But, the reason that it isn't safe to say that we are a strongly consistent > store is that if 2 of your 3 replicas were to die and come back with no > data, QUORUM might return the wrong result. Not so. If you allow vaporizing arbitrary numbe

Re: Coordinator node

2011-02-18 Thread A J
Hi, Are there any blogs/writeups anyone is aware of that talk about using the primary replica as coordinator node (rather than a random coordinator node) in production scenarios ? Thank you. On Wed, Feb 16, 2011 at 10:53 AM, A J wrote: > Thanks for the confirmation. Interesting alternatives to avoid

Re: Frequent updates of freshly written columns

2011-02-18 Thread James Churchman
ok great, thanks for the exact clarification On 18 Feb 2011, at 14:11, Aklin_81 wrote: > Compaction does not 'mutate' the sst files, it 'merges' several sst files > into one with new indexes, merged data rows & deleting tombstones. Thus you > reclaim your disk space. > > > On Fri, Feb 18, 201

R and N

2011-02-18 Thread A J
Questions about R and N (and W): 1. If I set R to Quorum and cassandra identifies a need for read repair before returning, would the read repair happen on R nodes (I mean subset of R that needs repair) or N nodes before the data is delivered to the client ? 2. Also does the repair happen at level o

Re: Replacing Redis

2011-02-18 Thread Benson Margulies
redis times out at random regardless of what we configure for client timeouts; the platform-sensitive binaries are painful for us since we support many platforms; just to name two reasons. On Fri, Feb 18, 2011 at 10:04 AM, Joshua Partogi wrote: > Any reason why you want to do that? > > On Sat, Feb

Re: Replacing Redis

2011-02-18 Thread Joshua Partogi
Any reason why you want to do that? On Sat, Feb 19, 2011 at 1:32 AM, Benson Margulies wrote: > I'm about to launch off on replacing redis with cassandra. I wonder if > anyone else has ever been there and done that. > -- http://twitter.com/jpartogi

Replacing Redis

2011-02-18 Thread Benson Margulies
I'm about to launch off on replacing redis with cassandra. I wonder if anyone else has ever been there and done that.

Re: Frequent updates of freshly written columns

2011-02-18 Thread Aklin_81
Compaction does not 'mutate' the sst files, it 'merges' several sst files into one with new indexes, merged data rows & deleting tombstones. Thus you reclaim your disk space. On Fri, Feb 18, 2011 at 7:34 PM, James Churchman wrote: > but a compaction will mutate the sstables and reclaim the > spa

Re: Frequent updates of freshly written columns

2011-02-18 Thread James Churchman
but a compaction will mutate the sstables and reclaim the space (eventually) ? james On 18 Feb 2011, at 08:36, Sylvain Lebresne wrote: > On Fri, Feb 18, 2011 at 8:14 AM, Aklin_81 wrote: > Are the very freshly written columns to a row in memtables, efficiently > updated/overwritten by edited

Re: Understand eventually consistent

2011-02-18 Thread Markus Klems
Related question: Is it a good idea to specify ConsistencyLevels on a per-operation basis? For example: Read ONE Write ALL would deliver consistent read results, just like Read ALL Write ONE. However, if you specify Read ONE Write QUORUM you cannot give such guarantees anymore. Should there be (is
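The combinations mentioned above follow the usual overlap rule for replicated stores: a read is guaranteed to touch at least one replica that holds the latest acknowledged write when R + W > N. A minimal sketch, with made-up names and N = 3 assumed in the examples:

```java
final class OverlapCheck {
    // True when every read set of size r must intersect every write set of size w
    // among n replicas, i.e. r + w > n.
    static boolean readsSeeLatestWrite(int r, int w, int n) {
        if (n <= 0 || r <= 0 || w <= 0 || r > n || w > n) {
            throw new IllegalArgumentException("need 1 <= r, w <= n");
        }
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;
        System.out.println(readsSeeLatestWrite(1, 3, n)); // Read ONE, Write ALL: true
        System.out.println(readsSeeLatestWrite(3, 1, n)); // Read ALL, Write ONE: true
        System.out.println(readsSeeLatestWrite(1, 2, n)); // Read ONE, Write QUORUM: false
        System.out.println(readsSeeLatestWrite(2, 2, n)); // QUORUM both ways: true
    }
}
```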

Re: Understand eventually consistent

2011-02-18 Thread Anthony John
At Quorum - if 2 of 3 nodes are down, a read should not be returned, right ? But yes - if single node READs are opted for, it will go through. The original question was - "Why is Cassandra called eventually consistent data store?" Because at write time, there is not a guarantee that all replicas

Queries on secondary indexes

2011-02-18 Thread Rauan Maemirov
With this schema: create column family Userstream with comparator=UTF8Type and rows_cached = 1 and keys_cached = 10 and column_metadata=[{column_name:account_id, validation_class:IntegerType, index_type: 0, index_name:UserstreamAccountidIdx}, {column_name:from_id, validation_class:IntegerT

How to use NetworkTopologyStrategy

2011-02-18 Thread Héctor Izquierdo Seliva
Hi! Can somebody give me some hints about how to configure a keyspace with NetworkTopologyStrategy via cassandra-cli? Or what is the preferred method to do so? Thanks!

Re: cluster size, several cluster on one node for multi-tenancy

2011-02-18 Thread Mimi Aluminium
Thanks a lot for your suggestions, I will check the virtual keyspace solution - btw, currently I am using Thrift client with Pycassa, I am not familiar with Hector - does it mean we'll need to move to Hector client? I thought of using keyspaces for each tenant, but I don't understand how to define t

Re: Frequent updates of freshly written columns

2011-02-18 Thread Sylvain Lebresne
On Fri, Feb 18, 2011 at 8:14 AM, Aklin_81 wrote: > Are the very freshly written columns to a row in memtables, efficiently > updated/overwritten by edited/new column values. > > After flushing of memtable, are those(edited + unedited ones) columns > stored together on disk (in same blocks!?) as i

Re: memory consumption

2011-02-18 Thread Peter Schuller
> main argument for using mmap() instead of standard I/O is the fact > that reading entails just touching memory - in the case of the memory > being resident, you just read it - you don't even take a page fault > (so no overhead in entering the kernel and doing a semi-context > switch). Oh and in

Re: memory consumption

2011-02-18 Thread Peter Schuller
> Jonathan, > When you get time could you please explain that a little more. Got a feeling > I'm about to learn something :) I'm not Jonathan, but: The operating system's virtual memory system supports mapping files into a process' address space. This will "use" virtual memory; i.e. address space.
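To make the file-mapping idea above concrete, here is a small sketch using the standard `java.nio` API. It writes a temporary file, maps it read-only and reads the bytes straight from the mapped region. It only illustrates the mechanism and says nothing about how Cassandra itself uses mmap; the class and method names are made up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MmapRead {
    // Writes text to a temp file, maps the file into memory, and reads it back.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("mmap-demo", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, text.getBytes(StandardCharsets.UTF_8));
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                // The mapping consumes address space; pages are loaded when touched.
                MappedByteBuffer mapped =
                        channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
                byte[] bytes = new byte[mapped.remaining()];
                mapped.get(bytes); // reading is just touching memory
                return new String(bytes, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello from a mapped file"));
    }
}
```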