Data modelling for range retrieval. Was: Re: Hadoop/Cassandra for data transformation (rather than analysis)?

2013-08-12 Thread Jan Algermissen
Aaron, On 12.08.2013, at 23:17, Aaron Morton wrote: >> As I do not have billions of input records (but a max of 10 million) the >> added benefit of scaling out the per-line processing is probably not worth >> the additional setup and operations effort of Hadoop. > I would start with a regul

Re: Custom commands in cassandra

2013-08-12 Thread Robert Coli
On Mon, Jul 29, 2013 at 12:42 PM, Nulik Nol wrote: > > Embedding the server will add *a lot* of complexity. > > that's a conjecture one would come at first sight, but if you analyze > it , it is the opposite. Complexity increases with code, and > communication between processes (like via socket o

Re: Having 2 nodes with 100% Ownership ?

2013-08-12 Thread Robert Coli
On Mon, Aug 12, 2013 at 4:57 PM, Morgan Segalis wrote: > So I should not touch the cassandra-topology.properties file ? > > And the fact that the node 1 and node 2 are both DC1 RACK1 does not bother > cassandra ? > Nope, with the simple snitch and RF=N, Cassandra has no option when choosing repl

Re: Having 2 nodes with 100% Ownership ?

2013-08-12 Thread Morgan Segalis
On 13 Aug 2013, at 01:50, Robert Coli wrote: > On Mon, Aug 12, 2013 at 4:41 PM, Morgan Segalis wrote: > So I fetched the whole apache-cassandra (not the /var/lib/cassandra) folder > from my first server to my second server. > > Including the data directory for your keyspace? That's the sim

Re: Decommission an entire DC

2013-08-12 Thread Robert Coli
On Wed, Jul 24, 2013 at 12:26 PM, Lanny Ripple wrote: > That one is documented -- > http://www.datastax.com/documentation/cassandra/1.2/index.html#cassandra/operations/ops_add_dc_to_cluster_t.html > That doc is incomplete, IMO. It (and the doc for adding a node) should contain an instruction to

Re: Having 2 nodes with 100% Ownership ?

2013-08-12 Thread Robert Coli
On Mon, Aug 12, 2013 at 4:41 PM, Morgan Segalis wrote: > So I fetched the whole apache-cassandra (not the /var/lib/cassandra) > folder from my first server to my second server. > Including the data directory for your keyspace? That's the simplest way to do this operation in your case. > So I'm

Re: Having 2 nodes with 100% Ownership ?

2013-08-12 Thread Morgan Segalis
Hi Robert, Thanks for helping me (again). As you know, I'm a real newbie. So I fetched the whole apache-cassandra (not the /var/lib/cassandra) folder from my first server to my second server. So I'm sure to use the exact same version. I have changed the token of the second node to 8507059173023

Re: Having 2 nodes with 100% Ownership ?

2013-08-12 Thread Robert Coli
On Mon, Aug 12, 2013 at 4:19 PM, Morgan Segalis wrote: > It is still a little fuzzy when it comes to calculating a token for a 50% > distribution… How do I do that? It is not like I wanted to have 10.23% on > one node and 89.77% on the other ;-) > The "feature" which picks a random token and resul
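For readers following the token question, here is a minimal sketch of how evenly spaced tokens are usually computed. It assumes the cluster uses RandomPartitioner, whose token space runs from 0 to 2**127 - 1; the function name `balanced_tokens` is made up for illustration and is not part of Cassandra or any tool named in the thread. Clusters on Murmur3Partitioner use a different range, so this arithmetic would not apply there.

```python
def balanced_tokens(node_count, ring_size=2**127):
    """Return evenly spaced initial tokens, one per node.

    Assumption: RandomPartitioner, whose ring has 2**127 positions.
    Node i gets the token i * ring_size / node_count (integer division).
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * ring_size // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Two nodes: each ends up owning half of the ring.
    for index, token in enumerate(balanced_tokens(2)):
        print("node %d: initial_token = %d" % (index, token))
```

For two nodes the second token is 2**126, the halfway point of the ring, which is what gives each node 50% of the token range.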

Re: Having 2 nodes with 100% Ownership ?

2013-08-12 Thread Morgan Segalis
It is still a little fuzzy when it comes to calculating a token for a 50% distribution… How do I do that? It is not like I wanted to have 10.23% on one node and 89.77% on the other ;-) I have found this website http://blog.milford.io/cassandra-token-calculator/ not sure if I should get the token

Re: Lost everything after topology change

2013-08-12 Thread Morgan Segalis
Thanks Aaron, but Robert helped me with every step on freenode #cassandra! Regards, Morgan. On 12 Aug 2013, at 23:30, Aaron Morton wrote: > I think you need to get the DOWN node out of there, run nodetool removenode > > Then let us know what the ring looks like and what you want to change,

Re: Lost everything after topology change

2013-08-12 Thread Aaron Morton
I think you need to get the DOWN node out of there: run nodetool removenode. Then let us know what the ring looks like and what you want to change, we should be able to help. Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 13

Re: cql throw error when secondary timeuuid field use dateof()

2013-08-12 Thread Aaron Morton
> Anyway to solve those kind of issue? Like ignore if field is null or not null > fields set 0 date while altering table? Sounds like a bug, I could not find an existing ticket so can you raise a ticket at https://issues.apache.org/jira/browse/CASSANDRA ? Please include the version you are usi

Re: Handling quorum writies fails

2013-08-12 Thread Aaron Morton
> So when I am performing a QUORUM write on a cluster with RF=3 and one node > fails, I will get a write error status and one successful write on another node. > If you lost one node during or before a write at QUORUM and RF 3 the write would succeed without any error to the client. > write will b
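The arithmetic behind that answer can be sketched in a few lines. This is an illustration only, not Cassandra code: the function names are hypothetical, and it assumes the usual definition of QUORUM as a majority of the replicas, i.e. floor(RF / 2) + 1.

```python
def quorum(replication_factor):
    """Number of replica acknowledgements needed at QUORUM (a majority)."""
    return replication_factor // 2 + 1


def quorum_write_can_succeed(replication_factor, live_replicas):
    """True if enough replicas are up to acknowledge a QUORUM write."""
    return live_replicas >= quorum(replication_factor)


if __name__ == "__main__":
    # RF=3 with one replica down leaves two live replicas.
    print(quorum(3))
    print(quorum_write_can_succeed(3, live_replicas=2))
    print(quorum_write_can_succeed(3, live_replicas=1))
```

With RF=3 the quorum is two replicas, so losing a single node still leaves enough replicas to acknowledge the write; losing two does not.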

Re: Hadoop/Cassandra for data transformation (rather than analysis)?

2013-08-12 Thread Aaron Morton
> As I do not have billions of input records (but a max of 10 million) the > added benefit of scaling out the per-line processing is probably not worth > the additional setup and operations effort of Hadoop. I would start with a regular app and then go to hadoop if needed, assuming you are on

Re: cassandra 1.2.5- virtual nodes (num_token) pros/cons?

2013-08-12 Thread Aaron Morton
> Aaron - I read about the virtual nodes at > http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2 Thanks, I did not see anything in there about making repair smoother / faster. Cheers A - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastp

Re: JmxReporter.java (line 388) Error processing

2013-08-12 Thread Aaron Morton
Are you running more than one node? If so, make sure they have the same seed list and cluster name, and can see each other on port 7000. Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 9/08/2013, at 10:08 AM, kohanm wrote: >

Re: Distributed lock for cassandra

2013-08-12 Thread srmore
On Mon, Aug 12, 2013 at 2:49 PM, Robert Coli wrote: > On Mon, Aug 12, 2013 at 12:31 PM, srmore wrote: > >> There are some operations that demand the use of a lock and I was wondering >> whether Cassandra has a built-in locking mechanism. After hunting the web >> for a while it appears that the answer

Re: Distributed lock for cassandra

2013-08-12 Thread Robert Coli
On Mon, Aug 12, 2013 at 12:31 PM, srmore wrote: > There are some operations that demand the use of a lock and I was wondering > whether Cassandra has a built-in locking mechanism. After hunting the web > for a while it appears that the answer is no, although I found this > outdated wiki page which des

Distributed lock for cassandra

2013-08-12 Thread srmore
All, There are some operations that demand the use of a lock and I was wondering whether Cassandra has a built-in locking mechanism. After hunting the web for a while it appears that the answer is no, although I found this outdated wiki page which describes the algorithm http://wiki.apache.org/cassandra

Re: Having 2 nodes with 100% Ownership ?

2013-08-12 Thread Francisco Andrades Grassi
Hi, You should use a 50% token distribution as Mohit pointed out, but configure a replication factor of 2, so all your rows will be effectively in both nodes. -- Francisco Andrades Grassi www.bigjocker.com @bigjocker On Aug 12, 2013, at 2:44 PM, Morgan Segalis wrote: > Hi, thank you for you a

Re: Having 2 nodes with 100% Ownership ?

2013-08-12 Thread Robert Coli
On Mon, Aug 12, 2013 at 12:14 PM, Morgan Segalis wrote: > Hi, thank you for your answer… > > I don't want 50%, I would like 100% so if one is down the second can take > over. > In order to have 2 nodes with each having 100% ownership and 100% of data, you need to increase replication factor to 2. T
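A rough way to see why a replication factor of 2 gives 100% on both nodes is sketched below. It rests on simplifying assumptions that are not stated in the thread: a single datacenter, evenly balanced tokens, and a simple replication strategy in which each row is stored on RF consecutive nodes. The function `effective_ownership` is hypothetical, written only for this illustration.

```python
from fractions import Fraction


def effective_ownership(node_count, replication_factor):
    """Fraction of all rows that each node stores.

    Assumptions: one datacenter, evenly balanced tokens, and each row
    replicated to `replication_factor` distinct nodes. A node can never
    hold more than 100% of the data, hence the cap at 1.
    """
    if node_count < 1 or replication_factor < 1:
        raise ValueError("node_count and replication_factor must be >= 1")
    return min(Fraction(1), Fraction(replication_factor, node_count))


if __name__ == "__main__":
    # Two nodes, RF=1: each node holds half of the rows.
    print(float(effective_ownership(2, 1)))
    # Two nodes, RF=2: every row lives on both nodes.
    print(float(effective_ownership(2, 2)))
```

Under these assumptions the token split stays at 50% per node, and it is the replication factor that raises the effective ownership to 100%.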

Re: Having 2 nodes with 100% Ownership ?

2013-08-12 Thread Morgan Segalis
Hi, thank you for your answer… I don't want 50%, I would like 100% so if one is down the second can take over. Thank you. On 12 Aug 2013, at 21:09, Mohit Anchlia wrote: > You need to get it to 50% on each to equally distribute the hash range. You > need to 1) Calculate new token 2) move nodes t

Re: Having 2 nodes with 100% Ownership ?

2013-08-12 Thread Mohit Anchlia
You need to get it to 50% on each to equally distribute the hash range. You need to 1) Calculate new token 2) move nodes to that token or use vnodes For the first option see: http://www.datastax.com/docs/0.8/install/cluster_init On Mon, Aug 12, 2013 at 12:06 PM, Morgan Segalis wrote: > Hi ever

Having 2 nodes with 100% Ownership ?

2013-08-12 Thread Morgan Segalis
Hi everyone, I would like to have 100% Effective-Owership on both cassandra nodes… I just have created the second node now… ./nodetool ring gives me : Address DC RackStatus State Load Effective-Owership Token

Re: understanding memory footprint

2013-08-12 Thread Robert Coli
On Mon, Aug 12, 2013 at 11:14 AM, Paul Ingalls wrote: > I don't really need exact numbers, just a rough cost would be sufficient. > I'm running into memory problems on my cluster, and I'm trying to decide > if reducing the number of column families would be worth the effort. > Looking at the rul

Re: understanding memory footprint

2013-08-12 Thread Paul Ingalls
I don't really need exact numbers, just a rough cost would be sufficient. I'm running into memory problems on my cluster, and I'm trying to decide if reducing the number of column families would be worth the effort. Looking at the rule of thumb from the wiki entry made it seem like reducing th

Re: For Multi Datacenter Geo redundancy which snitch works better

2013-08-12 Thread Robert Coli
On Mon, Aug 12, 2013 at 7:06 AM, Ken Schell wrote: > I'm building a two datacenter cluster for Geo redundancy, each with a > minimum of 12 nodes. > > Which Snitch would you recommend, PropertyFileSnitch or > RackInferringSnitch? > I'd probably use GossipingPropertyFileSnitch before PropertyFile

Re: understanding memory footprint

2013-08-12 Thread Robert Coli
On Mon, Aug 12, 2013 at 10:22 AM, Paul Ingalls wrote: > At the core, my question really is: > > "Does the number of column families still significantly impact the memory > footprint? If so, what is the incremental cost of a column family/table?" > This question has been asked about a kabillion ti

understanding memory footprint

2013-08-12 Thread Paul Ingalls
I'm trying to get a handle on how newer cassandra handles memory. Most of what I am seeing via google, on the wiki etc. appears old. For example, this wiki article appears out of date relative to post 1.0: http://wiki.apache.org/cassandra/MemtableThresholds specifically this is the section I'

Re: Lost everything after topology change

2013-08-12 Thread Morgan Segalis
Actually I was creating my second node… Since I wanted to have a full replication I have changed the topology… I have reverted the topology by getting the one from the tgz file since it was the first time I messed with it… now that I reverted the file back it still does not get me my data : nodet

Re: Custom data type class in pycassa

2013-08-12 Thread Tyler Hobbs
You can't specify that sort of custom type as part of the schema; instead, use BytesType and tell pycassa to interpret those columns as your custom type through the 'column_validators' attribute on your ColumnFamily object. For example: mycf = ColumnFamily(...) mycf.column_validators["email_addre

Re: Lost everything after topology change

2013-08-12 Thread Robert Coli
On Mon, Aug 12, 2013 at 9:36 AM, Morgan Segalis wrote: > I'm coming to you because I'm quite in a pickle, and need to get the > Cassandra database working asap… > First, #cassandra on freenode is usually better for emergent cases like this. > I tried to change the topology file and tried a nod

Lost everything after topology change

2013-08-12 Thread Morgan Segalis
Hi everyone, I'm coming to you because I'm quite in a pickle, and need to get the Cassandra database working asap… I tried to change the topology file and tried a nodetool repair… in cassandra-cli when I tried to list a column family, it tells me null UnavailableException() at org.a

For Multi Datacenter Geo redundancy which snitch works better

2013-08-12 Thread Ken Schell
Hello Everyone, I'm building a two datacenter cluster for Geo redundancy, each with a minimum of 12 nodes. Which Snitch would you recommend, PropertyFileSnitch or RackInferringSnitch? Our network addressing will support either method. All Keyspaces have a replication factor of 3 to 5, with in

Re: Any good GUI based tool to manage data in Casandra?

2013-08-12 Thread Andrew Cobley
Actually, is that true? Certainly I guess most versions of C* run on Linux, but do most C* db admins use Linux on the desktop? Or are they connecting from Mac/PC for management purposes? Andy On 9 Aug 2013, at 21:11, Keith Freeman <8fo...@gmail.com> wrote: Sounds like a