Re: Any good GUI based tool to manage data in Casandra?

2013-08-06 Thread Aaron Morton
There is a list here. 

http://wiki.apache.org/cassandra/Administration%20Tools

Cheers

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 3/08/2013, at 6:19 AM, Tony Anecito  wrote:

> Hi All,
> 
> Is there a GUI tool for managing data in a Cassandra database? I have googled 
> and seen tools, but they seem to be for schema management or explorers that just 
> view data. It would be great to delete/insert rows or update values for a column 
> via a GUI.
> 
> Thanks,
> -Tony



Re: cassandra 1.2.5- virtual nodes (num_token) pros/cons?

2013-08-06 Thread Aaron Morton
> The reason for me looking at virtual nodes is because of terrible experiences 
> we had with 0.8 repairs, and as per the documentation (and logically) virtual 
> nodes seem like they will help repairs run more smoothly. Is this true?
I've not thought too much about how they help repair run more smoothly; what 
was the documentation you read? 

> Also, how do we get the right number of virtual nodes?
Use the default of 256.


Hope that helps. 
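For anyone looking for where this is set: vnodes are configured per node in cassandra.yaml. A minimal sketch (1.2-era settings; values illustrative):

```yaml
# cassandra.yaml (sketch, 1.2-era)
num_tokens: 256       # the 1.2 default; each node owns 256 small token ranges
# initial_token:      # leave unset when num_tokens is in use
```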

 
-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 3/08/2013, at 7:39 AM, rash aroskar  wrote:

> Thanks for the helpful responses. The upgrade from 0.8 to 1.2 is not direct; we 
> have set up a test cluster where we did the upgrade from 0.8 to 1.1 and then 1.2. 
> Also, we will build a whole different cluster with 1.2; the 0.8 cluster will not 
> be upgraded, but the data will be moved from the 0.8 cluster to the 1.2 cluster. 
> The reason for me looking at virtual nodes is because of terrible experiences 
> we had with 0.8 repairs, and as per the documentation (and logically) virtual 
> nodes seem like they will help repairs run more smoothly. Is this true? Also, how 
> do we get the right number of virtual nodes? David suggested 64 vnodes for 20 
> machines. Is there a formula or a thought process to be followed to get this 
> number right?
> 
> 
> On Mon, Jul 29, 2013 at 4:15 AM, aaron morton  wrote:
> I would *strongly* recommend against upgrading from 0.8 directly to 1.2. 
> Skipping a major version is generally not recommended; skipping three would seem 
> like carelessness. 
> 
>> I second Romain, do the upgrade and make sure the health is good first.
> 
> +1 but I would also recommend deciding if you actually need to use virtual 
> nodes. The shuffle process can take a long time and people have had mixed 
> experiences with it. 
> 
> If you wanted to move to 1.2 and get vnodes I would consider spinning up a 
> new cluster and bulk loading into it. You could do an initial load and then 
> do delta loads using snapshots; there would, however, be a period of stale data 
> in the new cluster until the last delta snapshot is loaded. 
> 
> Cheers
> 
> -
> Aaron Morton
> Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 27/07/2013, at 3:36 AM, David McNelis  wrote:
> 
>> I second Romain, do the upgrade and make sure the health is good first.
>> 
>> If you have or plan to have a large number of nodes, you might consider 
>> using fewer than 256 as your initial vnodes amount.  I think that number in 
>> the docs is inflated beyond reasonable, as we've had some people talk about 
>> potential performance degradation if you have a large number of nodes and a 
>> very high number of vnodes. If I had it to do over again, I'd have used 64 
>> vnodes as my default (across 20 nodes).
>> 
>> Another thing to be very cognizant of before shuffle is disk space.  You 
>> *must* have less than 50% used in order to do the shuffle successfully 
>> because no data is removed (cleaned) from a node during the shuffle process 
>> and the shuffle process essentially doubles the amount of data until you're 
>> able to run a clean.
>> 
>> 
>> On Fri, Jul 26, 2013 at 11:25 AM, Romain HARDOUIN 
>>  wrote:
>> Vnodes are a great feature. More nodes are involved during operations such 
>> as bootstrap, decommission, etc. 
>> DataStax documentation is definitely a must read. 
>> That said, if I were you, I'd wait a while before shuffling the ring. I'd 
>> focus on the cluster upgrade and monitoring the nodes (number of file handles, 
>> memory usage, latency, etc.). 
>> Upgrading from 0.8 to 1.2 can be tricky; there are so many changes since 
>> then. Be careful about the compaction strategies you choose and double check 
>> the options. 
>> 
>> Regards, 
>> Romain 
>> 
>> rash aroskar  a écrit sur 25/07/2013 23:25:11 :
>> 
>> > De : rash aroskar  
>> > A : user@cassandra.apache.org, 
>> > Date : 25/07/2013 23:25 
>> > Objet : cassandra 1.2.5- virtual nodes (num_token) pros/cons? 
>> > 
>> > Hi, 
>> > I am upgrading my cassandra cluster from 0.8 to 1.2.5.  
>> > In cassandra 1.2.5 the 'num_token' attribute confuses me.  
>> > I understand that it distributes multiple tokens per node but I am 
>> > not clear how that is helpful for performance or load balancing. Can
>> > anyone elaborate? has anyone used this feature  and knows its 
>> > advantages/disadvantages? 
>> > 
>> > Thanks, 
>> > Rashmi
>> 
> 
> 



Re: Better to have lower or greater cardinality for partition key in CQL3?

2013-08-06 Thread Aaron Morton
> So from anyones experience, is it better to use a low cardinality
> partition key or a high cardinality.
IMHO go with whatever best supports the read paths. 
If you have lots of rows (e.g. north of 1 billion) per node there are extra 
considerations that come into play. Cassandra 1.2 helps a lot by moving the bloom 
filters and compression metadata off heap. Basically you may need to pay more 
attention to memory usage at that scale. 

This is one place where LCS can help. It allows you to have a higher bloom 
filter FP chance, which results in a lower memory overhead for a given number 
of rows. Remember that LCS uses roughly twice the IO though, so make sure you 
can handle the throughput. 
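As a sketch of the two knobs just mentioned (1.2-era CQL3; the table name is hypothetical):

```sql
-- Sketch (table name hypothetical): raising the bloom filter FP chance
-- trades a few extra disk seeks for less heap per row.
ALTER TABLE mytable
  WITH compaction = {'class': 'LeveledCompactionStrategy'}
  AND bloom_filter_fp_chance = 0.1;
```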

Otherwise your update workflow sounds like a perfect match for Size Tiered 
compaction.   

Hope that helps.

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 5/08/2013, at 4:57 PM, David Ward  wrote:

> Hello,
> Was curious what people had found to be better for
> structuring/modeling data into C*?   With my data I have two primary
> keys, one 64-bit int that's 0 - 50 million ( it's unlikely to ever go higher
> than 70 million ) and another 64-bit that's probably close to
> hitting a trillion in the next year or so.   Looking at how the data
> is going to behave, for the first few months each row/record will be
> updated but after that it's practically written in stone.  Still I was
> leaning toward leveled compaction as it gets updated anywhere from
> once an hour to at least once a day for the first 7 days.
> 
> So from anyones experience, is it better to use a low cardinality
> partition key or a high cardinality.   Additionally data organized by
> the low cardinality set is probably 1-6B ( and growing ) but the high
> cardinality would be 1-6MB only 2-3x a year.
> 
> 
> Thanks,
>   Dave
> 
> 
> new high cardinality keys in 1 year ~15,768,000,000
> new low cardinality keys in 1 year = 10,000-30,000
> 
> low cardinality key set size ~1-6GB
> high cardinality key set size 1-5MB



Re: Which of these VPS configurations would perform better for Cassandra ?

2013-08-06 Thread Aaron Morton
> how many nodes to start with(2 ok?) ?
I'd recommend 3, that will give you some redundancy see 
http://thelastpickle.com/2011/06/13/Down-For-Me/

Cheers

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 5/08/2013, at 1:41 AM, Rajkumar Gupta  wrote:

> okay, so what should a workable VPS configuration to start with & minimum how 
> many nodes to start with(2 ok?) ?  Seriously I cannot afford the tensions of 
> colocation setup.  My hosting provider provides SSD drives with KVM 
> virtualization.



Re: Reducing the number of vnodes

2013-08-06 Thread Aaron Morton
Repair runs in two phases, first it works out the differences then it streams 
the data. The length of the first depends on the size of the data and the 
second on the level of inconsistency. 

To track the first, use nodetool compactionstats or look in the logs for 
messages about requesting or receiving Merkle trees. 

To track the second, use nodetool netstats or look in the logs for messages 
about streaming X number of ranges.

Hope that helps. 
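A hypothetical session tying the two checks together (the log path varies by install):

```
$ nodetool compactionstats    # phase 1: pending validation (Merkle tree) work
$ nodetool netstats           # phase 2: streaming progress per range
$ grep -i merkle /var/log/cassandra/system.log
```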
 
-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/08/2013, at 1:41 AM, Christopher Wirt  wrote:

> 1.2.4. Really hesitant to upgrade versions due to the inevitable issues it 
> will cause.
>  
> Guess I could upgrade a single node and let that run for a while before 
> upgrading all nodes.
>  
> From: Haithem Jarraya [mailto:a-hjarr...@expedia.com] 
> Sent: 05 August 2013 13:04
> To: user@cassandra.apache.org
> Subject: Re: Reducing the number of vnodes
>  
> Chris,
> Which C* version are you running? 
> You might want to do an upgrade to the latest version before reducing the 
> vnode count; a lot of fixes and improvements went in lately, and it might help 
> your repairs run faster.
>  
> H
>  
> On 5 Aug 2013, at 12:30, Christopher Wirt  wrote:
> 
> 
> Hi,
>  
> I’m thinking about reducing the number of vnodes per server.
>  
> We have 3 DC setup – one with 9 nodes, two with 3 nodes each.
>  
> Each node has 256 vnodes. We’ve found that repair operations are beginning to 
> take too long.
>  
> Is reducing the number of vnodes to 64/32 likely to help our situation?
> What options do I have for achieving this in a live cluster?
>  
>  
> Thanks,
>  
> Chris



Re: Unable to bootstrap node

2013-08-06 Thread Aaron Morton
> Caused by: java.io.FileNotFoundException: 
> /data/1/cassandra/data/rts/40301_feedProducts/rts-40301_feedProducts-ib-1-Data.db
>  (No such file or directory)
> at java.io.RandomAccessFile.open(Native Method)
> at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
> at 
> org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:67)
> at 
> org.apache.cassandra.io.compress.CompressedRandomAccessReader.
This is somewhat serious, especially if it's from a bug in dropping tables, 
though I would expect that to show up for a lot of people. 

Does the file exist on disk?
Are the permissions correct? 

IMHO you need to address this issue on the existing nodes before worrying about 
the new node. 

Cheers
 
-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/08/2013, at 1:25 PM, sankalp kohli  wrote:

> Let me know if this fixes the problem?
> 
> 
> On Mon, Aug 5, 2013 at 6:24 PM, sankalp kohli  wrote:
> So the problem is that when you dropped and recreated the table with the same 
> name, somehow the old CFStore object was not purged. So now there were two 
> objects, which caused the same sstable to have 2 SSTableReader objects. 
> 
> The fix is to find all nodes which are emitting this FileNotFoundException 
> and restart them. 
> 
> In your case, restart the node which is serving the data and emitting 
> FileNotFound exception. 
> 
> Once this is up, again restart the bootstrapping node with bootstrap 
> argument. Now it will successfully stream the data. 
> 
> 
> On Mon, Aug 5, 2013 at 6:08 PM, Keith Wright  wrote:
> Yes we likely dropped and recreated tables.  If we stop the sending node, 
> what will happen to the bootstrapping node?
> 
> sankalp kohli  wrote:
> 
> Hi,
> The problem is that the node sending the stream is hitting this 
> FileNotFound exception. You need to restart this node and it should fix the 
> problem. 
> 
> Are you seeing lot of FileNotFoundExceptions? Did you do any schema change 
> recently?
> 
> Sankalp
> 
> 
> On Mon, Aug 5, 2013 at 5:39 PM, Keith Wright  wrote:
> Hi all,
> 
>I have been trying to bootstrap a new node into my 7 node 1.2.4 C* cluster 
> with vnodes, RF 3, with no luck.  It gets close to completing and then the 
> streaming just stalls at 99% from 1 or 2 nodes.  Nodetool 
> netstats shows the items that have yet to stream but the logs on the new node 
> do not show any errors.  I tried shutting down the node, clearing all 
> data/commit logs/caches, and re-bootstrapping with no luck.  The nodes that 
> are hanging sending the data only have the error below but that's related to 
> compactions (see below) although it is one of the files that is waiting to be 
> sent.  I tried nodetool scrub on the column family with the missing item but 
> got an error indicating it could not get a hard link.  Any ideas?  We were 
> able to bootstrap one of the new nodes with no issues but this other one has 
> been a real pain.  Note that when the new node is joining the cluster, it 
> does not appear in nodetool status.  Is that expected?
> 
> Thanks all, my next step is to try getting a new IP for this machine, my 
> thought being that the cluster doesn't like me continuing to attempt to 
> bootstrap the node repeatedly each time getting a new host id.
> 
> [kwright@lxpcas008 ~]$ nodetool netstats | grep 
> rts-40301_feedProducts-ib-1-Data.db
>rts: 
> /data/1/cassandra/data/rts/40301_feedProducts/rts-40301_feedProducts-ib-1-Data.db
>  sections=73 progress=0/1884669 - 0%
> 
> ERROR [ReadStage:427] 2013-08-05 23:23:29,294 CassandraDaemon.java (line 174) 
> Exception in thread Thread[ReadStage:427,5,main]
> java.lang.RuntimeException: java.io.FileNotFoundException: 
> /data/1/cassandra/data/rts/40301_feedProducts/rts-40301_feedProducts-ib-1-Data.db
>  (No such file or directory)
> at 
> org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:46)
> at 
> org.apache.cassandra.io.util.CompressedSegmentedFile.createReader(CompressedSegmentedFile.java:57)
> at 
> org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:41)
> at 
> org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:976)
> at 
> org.apache.cassandra.db.columniterator.SSTableNamesIterator.createFileDataInput(SSTableNamesIterator.java:98)
> at 
> org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:117)
> at 
> org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:64)
> at 
> org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:81)
> at 
> org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:68)
> at 
> org.apache.cassandra.db.CollationController.collectTimeOrdered

Re: Question about 'duplicate' columns

2013-08-06 Thread Aaron Morton
Yes. If you overwrite much older data with new data, both "versions" of the 
column will remain on disk until compaction gets to work on both fragments of 
the row.

Cheers
 
-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com
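A toy sketch of the reconciliation rule described above: the same column can exist in several sstables after an overwrite, and both reads and compaction keep the version with the newest write timestamp. Everything here (the dict-based "sstables", the timestamps) is illustrative only:

```python
# Toy model: the same column can live in several sstables after an
# overwrite; reads and compaction both keep the newest-timestamped cell.
sstable_old = {"col": ("old-value", 100)}   # (value, write timestamp)
sstable_new = {"col": ("new-value", 200)}   # overwrite, written later

def read(column, *sstables):
    """A read merges every fragment and returns the newest value."""
    versions = [t[column] for t in sstables if column in t]
    return max(versions, key=lambda v: v[1])[0]

def compact(*sstables):
    """Compaction merges fragments, keeping only the newest cell."""
    merged = {}
    for t in sstables:
        for col, (val, ts) in t.items():
            if col not in merged or ts > merged[col][1]:
                merged[col] = (val, ts)
    return merged

assert read("col", sstable_old, sstable_new) == "new-value"
assert compact(sstable_old, sstable_new) == {"col": ("new-value", 200)}
```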

On 6/08/2013, at 6:48 PM, Franc Carter  wrote:

> 
> I've been thinking through some cases that I can see happening at some point 
> and thought I'd ask on the list to see if my understanding is correct.
> 
> Say a bunch of columns have been loaded 'a long time ago', i.e. long enough in 
> the past that they have been compacted. My understanding is that if some of 
> these columns get reloaded then they are likely to sit in additional sstables 
> until the larger sstable is called up for compaction, which might be a while.
> 
> The case that springs to mind is filling small gaps in data by doing bulk 
> loads around the gap to make sure that the gap is filled.
> 
> Have I understood correctly ?
> 
> thanks
> 
> -- 
> Franc Carter | Systems architect | Sirca Ltd
> franc.car...@sirca.org.au | www.sirca.org.au
> Tel: +61 2 8355 2514 
> Level 4, 55 Harrington St, The Rocks NSW 2000
> PO Box H58, Australia Square, Sydney NSW 1215
> 



Re: Question about 'duplicate' columns

2013-08-06 Thread Franc Carter
On Tue, Aug 6, 2013 at 6:10 PM, Aaron Morton wrote:

> Yes. If you overwrite much older data with new data, both "versions" of the
> column will remain on disk until compaction gets to work on both fragments
> of the row.
>

thanks


>
> Cheers
>
>  -
> Aaron Morton
> Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6/08/2013, at 6:48 PM, Franc Carter  wrote:
>
>
> I've been thinking through some cases that I can see happening at some
> point and thought I'd ask on the list to see if my understanding is correct.
>
> Say a bunch of columns have been loaded 'a long time ago', i.e. long enough
> in the past that they have been compacted. My understanding is that if some of
> these columns get reloaded then they are likely to sit in additional
> sstables until the larger sstable is called up for compaction, which might
> be a while.
>
> The case that springs to mind is filling small gaps in data by doing bulk
> loads around the gap to make sure that the gap is filled.
>
> Have I understood correctly ?
>
> thanks
>
> --
> *Franc Carter* | Systems architect | Sirca Ltd
>  
> franc.car...@sirca.org.au | www.sirca.org.au
> Tel: +61 2 8355 2514
>  Level 4, 55 Harrington St, The Rocks NSW 2000
> PO Box H58, Australia Square, Sydney NSW 1215
>
>
>


-- 

*Franc Carter* | Systems architect | Sirca Ltd
 

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 8355 2514

Level 4, 55 Harrington St, The Rocks NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


Is there update-in-place on maps?

2013-08-06 Thread Jan Algermissen
Hi,

I think it does not fit the model of how C* does writes, but just to verify:

Is there an update-in-place possibility on maps? That is, could I do an atomic 
increment on a value in a map?

Jan

Effect of TTL on collection updates

2013-08-06 Thread Jan Algermissen
Hi,

after seeing Patrick's truly excellent 3-part series on modeling, this question 
pops up:

When I do an update on a collection, using a TTL in the update statement (like 
Patrick does in the example with the login-location time series example), does 
the TTL apply to the update only, or to the row as a whole?

Jan

Re: cassandra 1.2.5- virtual nodes (num_token) pros/cons?

2013-08-06 Thread Richard Low
On 6 August 2013 08:40, Aaron Morton  wrote:

> The reason for me looking at virtual nodes is because of terrible
> experiences we had with 0.8 repairs, and as per the documentation (and
> logically) virtual nodes seem like they will help repairs run more smoothly.
> Is this true?
>
> I've not thought too much about how they help repair run more smoothly;
> what was the documentation you read?
>

There might be a slight improvement but I haven't observed any.  The
difference might be that, because every node shares replicas with every
other (with high probability), a single repair operation does the same work
on the node it was called on, but the rest is spread out over the cluster,
rather than just the RF nodes either side of the repairing node.  This
means the post-repair compaction work will take less time and the length of
time a node is loaded for during repair is less.

However, the other benefits of vnodes are likely to be much more useful.

Richard.


Re: Effect of TTL on collection updates

2013-08-06 Thread Alain RODRIGUEZ
Hi Jan

"TTLs if used only apply to the newly inserted/updated values", from :
http://cassandra.apache.org/doc/cql3/CQL.html#collections

This manual is updated often enough to stay current, and so is useful; you
should keep it bookmarked.

Alain




2013/8/6 Jan Algermissen 

> Hi,
>
> after seeing Patrick's truly excellent 3-part series on modeling, this
> question pops up:
>
> When I do an update on a collection, using a TTL in the update statement
> (like Patrick does in the example with the login-location time series
> example), does the TTL apply to the update only, or to the row as a whole?
>
> Jan
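A sketch of what the per-cell TTL looks like in practice (table and column names are hypothetical; 1.2-era CQL3):

```sql
-- Sketch (names hypothetical): the 24h TTL applies only to the map
-- entry written here; previously written entries and the rest of the
-- row keep whatever TTL they already had.
UPDATE user_logins USING TTL 86400
   SET locations = locations + { '2013-08-06 10:00:00' : 'nyc' }
 WHERE uid = 550e8400-e29b-41d4-a716-446655440000;
```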


Re: Is there update-in-place on maps?

2013-08-06 Thread Alain RODRIGUEZ
Once again, this should answer your question :
http://cassandra.apache.org/doc/cql3/CQL.html#collections

Alain


2013/8/6 Jan Algermissen 

> Hi,
>
> I think it does not fit the model of how C* does writes, but just to
> verify:
>
> Is there an update-in-place possibility on maps? That is, could I do an
> atomic increment on a value in a map?
>
> Jan


Re: Is there update-in-place on maps?

2013-08-06 Thread Jan Algermissen
Alain,

On 06.08.2013, at 11:17, Alain RODRIGUEZ  wrote:

> Once again, this should answer your question : 
> http://cassandra.apache.org/doc/cql3/CQL.html#collections

yup, I understand the hint :-) However, since I am about to base application 
architecture on these capabilities, I wanted to make sure I do not read 
anything into the docs that isn't there.

As for the atomic increment, I take it the answer is 'no, there is no atomic 
increment; I have to pull the value to the client and send an update with the 
new value'.



Jan



> 
> Alain
> 
> 
> 2013/8/6 Jan Algermissen 
> Hi,
> 
> I think it does not fit the model of how C* does writes, but just to verify:
> 
> Is there an update-in-place possibility on maps? That is, could I do an 
> atomic increment on a value in a map?
> 
> Jan
> 



Re: Is there update-in-place on maps?

2013-08-06 Thread Andy Twigg
Store pointers to counters as map values?


Re: Is there update-in-place on maps?

2013-08-06 Thread Jan Algermissen

On 06.08.2013, at 11:36, Andy Twigg  wrote:

> Store pointers to counters as map values?

Sorry, but this fits into nothing I know about C* so far - can you explain?

Jan



Re: Is there update-in-place on maps?

2013-08-06 Thread Andy Twigg
Counters can be atomically incremented (
http://wiki.apache.org/cassandra/Counters). Pick a UUID for the counter,
and use that: c=map.get(k); c.incr()


On 6 August 2013 11:01, Jan Algermissen  wrote:

>
> On 06.08.2013, at 11:36, Andy Twigg  wrote:
>
> > Store pointers to counters as map values?
>
> Sorry, but this fits into nothing I know about C* so far - can you explain?
>
> Jan
>
>


-- 
Dr Andy Twigg
Junior Research Fellow, St Johns College, Oxford
Room 351, Department of Computer Science
http://www.cs.ox.ac.uk/people/andy.twigg/
andy.tw...@cs.ox.ac.uk | +447799647538
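If I understand Andy's suggestion, it would look roughly like this (sketch; all names hypothetical, 1.2-era CQL3). Counter columns must live in a dedicated table, so the map stores references to counter rows:

```sql
-- Sketch (all names hypothetical): counters live in their own table;
-- the map stores the uuid keys of those counter rows.
CREATE TABLE counters ( cid uuid PRIMARY KEY, ct counter );
CREATE TABLE things ( id uuid PRIMARY KEY, slots map<text, uuid> );

-- To "increment slot 'a' of a thing": read slots['a'] client-side,
-- then atomically increment the counter it points to:
UPDATE counters SET ct = ct + 1 WHERE cid = ?;
```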


Re: Any good GUI based tool to manage data in Casandra?

2013-08-06 Thread Tony Anecito
Thanks Aaron. I found that before I asked the question, and Helenos seems the 
closest, but it does not allow you to easily do CRUD like, say, SQL Server 
management tools, where you can get a list of say 1,000 records in a grid 
control and select rows for deletion, insertion, or update.
 
I will look closer at that one since this is the reply from the team, but if 
users on this email list have other suggestions please do not hesitate to reply.
 
Many Thanks,
-Tony

From: Aaron Morton 
To: Cassandra User  
Sent: Tuesday, August 6, 2013 1:38 AM
Subject: Re: Any good GUI based tool to manage data in Casandra?



There is a list here.  

http://wiki.apache.org/cassandra/Administration%20Tools

Cheers


-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com/

On 3/08/2013, at 6:19 AM, Tony Anecito  wrote:

> Hi All,
> 
> Is there a GUI tool for managing data in a Cassandra database? I have googled 
> and seen tools, but they seem to be for schema management or explorers that 
> just view data. It would be great to delete/insert rows or update values for 
> a column via a GUI.
> 
> Thanks,
> -Tony

clarification of token() in CQL3

2013-08-06 Thread Keith Freeman
I've seen in several places the advice to use queries like this to page 
through lots of rows:

select id from mytable where token(id) > token(last_id)


But it's hard to find detailed information about how this works (at 
least that I can understand -- the description in the Cassandra manual 
is pretty brief).


One thing I'd like to know is if new rows are always guaranteed to have 
token(new_id) > token(ids-of-all-previous-rows)?  E.g. if I have one 
process that adds rows to a table, and another that processes rows from 
the table, can the "processor" save the id of the last row processed and 
when he wakes up use:


   select * from mytable where token(id) > token(last_processed_id)


to process only new rows?  Will this always work to get only new rows?


Re: clarification of token() in CQL3

2013-08-06 Thread Richard Low
On 6 August 2013 15:12, Keith Freeman <8fo...@gmail.com> wrote:

>  I've seen in several places the advice to use queries like this to page
> through lots of rows:
>
>
> select id from mytable where token(id) > token(last_id)
>
>
> But it's hard to find detailed information about how this works (at least
> that I can understand -- the description in the Cassandra manual is pretty
> brief).
>
> One thing I'd like to know is if new rows are always guaranteed to have
> token(new_id) > token(ids-of-all-previous-rows)?  E.g. if I have one
> process that adds rows to a table, and another that processes rows from the
> table, can the "processor" save the id of the last row processed and when
> he wakes up use:
>
> select * from mytable where token(id) > token(last_processed_id)
>
>
> to process only new rows?  Will this always work to get only new rows?
>

No, unfortunately not.  The tokens are generated by the partitioner - they
are the hash of the row key.  New tokens could be anywhere in the range of
tokens so you can't use token ordering to find new rows.

The query you suggest works to page through all the data in your column
family.  Rows will be returned regardless of when they were added (as long
as they were added before the query started).  Finding rows that have been
added since a certain time is hard in Cassandra since they are stored in
token order.  In general you have to read through all the data and work out
from e.g. a date field if they should be treated as new.

Richard.
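Richard's point can be seen with a toy model: tokens come from hashing the key, so token order is unrelated to insertion order, yet paging on token(id) still visits every row exactly once. The hash below is a stand-in for the real Murmur3 partitioner; all names are illustrative:

```python
import hashlib

def token(key: str) -> int:
    """Stand-in for Cassandra's partitioner hash (Murmur3 on real clusters)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8],
                          "big", signed=True)

def fetch_page(keys, after_token, limit):
    """Rows are stored and scanned in token order, so a page is the next
    `limit` keys whose token is greater than the last one already seen."""
    ordered = sorted(keys, key=token)
    return [k for k in ordered if token(k) > after_token][:limit]

keys = [f"row-{i}" for i in range(10)]   # inserted in this order
visited, last = [], -2**63
while True:
    page = fetch_page(keys, last, 3)
    if not page:
        break
    visited.extend(page)
    last = token(page[-1])

# Every row is visited exactly once, but in token order, not insert order.
assert sorted(visited) == sorted(keys)
```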


RE: Counters and replication

2013-08-06 Thread Christopher Wirt
Hi Richard,

Thanks for your reply.

The uid value is a generated guid and should distribute nicely. I've just
checked the data: as of yesterday there are only 3 uids out of millions for
which there would have been more than 1000 increments. We started with 256
num_tokens. Client and server side I can see the writes being balanced.

Anyway, I think I've got things under control now.

It appears I hadn't set an sstable size on the cf compaction strategy (LCS).
I guess this was defaulting to 10MB.

After setting this to 256MB one of the 'bad' nodes fixed itself. The other
two appeared to stall mid-compaction, but after a quick restart both resumed
compaction with acceptable CPU utilization.

Any insight into how this caused the issue is welcome.

Thanks

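For reference, explicitly setting the size Christopher mentions would look something like this (1.2-era CQL3 sketch):

```sql
-- Sketch: setting the LCS sstable target size explicitly rather than
-- relying on the small default.
ALTER TABLE cf1
  WITH compaction = {'class': 'LeveledCompactionStrategy',
                     'sstable_size_in_mb': 256};
```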
From: Richard Low [mailto:rich...@wentnet.com] 
Sent: 05 August 2013 20:30
To: user@cassandra.apache.org
Subject: Re: Counters and replication

On 5 August 2013 20:04, Christopher Wirt  wrote:

Hello,

Question about counters, replication and the ReplicateOnWriteStage.

I've recently turned on a new CF which uses a counter column.

We have a three DC setup running Cassandra 1.2.4 with vNodes, hex core
processors, 32Gb memory.
DC 1 - 9 nodes with RF 3
DC 2 - 3 nodes with RF 2
DC 3 - 3 nodes with RF 2

DC 1 receives most of the updates to this counter column. ~3k per sec.

I've disabled any client reads while I sort out this issue.
Disk utilization is very low.
Memory is aplenty (while not reading).

Schema:

CREATE TABLE cf1 (
  uid uuid,
  id1 int,
  id2 int,
  id3 int,
  ct counter,
  PRIMARY KEY (uid, id1, id2, id3)
) WITH .

Three of the machines in DC 1 are reporting very high CPU load.
Looking at tpstats there is a large number of pending ReplicateOnWriteStage
just on those machines.

Why would only three of the machines be reporting this?
Assuming it's distributed by uuid value there should be an even load across
the cluster, yea?
Am I missing something about how distributed counters work?

If you have many different uid values and your cluster is balanced then you
should see even load.  Were your tokens chosen randomly?  Did you start out
with num_tokens set high or upgrade from num_tokens=1 or an earlier
Cassandra version?  Is it possible your workload is incrementing the counter
for one particular uid much more than the others?

The distribution of counters works the same as for non-counters in terms of
which nodes receive which values.  However, there is a read on the
coordinator (randomly chosen for each inc) to read the current value and
replicate it to the remaining replicas.  This makes counter increments much
more expensive than normal inserts, even if all your counters fit in cache.
This is done in the ReplicateOnWriteStage, which is why you are seeing that
queue build up.

Is changing CL to ONE fine if I'm not too worried about 100% consistency?

Yes, but to make the biggest difference you will need to turn off
replicate_on_write (alter table cf1 with replicate_on_write = false;) but
this *guarantees* your counts aren't replicated, even if all replicas are
up.  It avoids doing the read, so makes a huge difference to performance,
but means that if a node is unavailable later on, you *will* read
inconsistent counts.  (Or, worse, if a node fails, you will lose counts
forever.)  This is in contrast to CL.ONE inserts for normal values when
inserts are still attempted on all replicas, but only one is required to
succeed.

So you might be able to get a temporary performance boost by changing
replicate_on_write if your counter values aren't important.  But this won't
solve the root of the problem.

Richard.
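A toy model of the cost difference described above: a plain write just sends the value to every replica, while a counter increment applies the delta on one replica, reads the resulting value, and replicates what it read (the ReplicateOnWriteStage). Names and structure are illustrative only:

```python
# Toy model (illustrative only) of why counter increments cost more
# than plain writes in this era of Cassandra.
replicas = [dict(), dict(), dict()]  # RF = 3

def plain_insert(key, value):
    # A normal write just sends the value to every replica: no read needed.
    for r in replicas:
        r[key] = value

def counter_increment(key, delta, leader=0):
    # One replica applies the delta locally...
    local = replicas[leader]
    local[key] = local.get(key, 0) + delta
    # ...then must READ the resulting value (the expensive extra step)...
    current = local[key]
    # ...and replicate what it read to the remaining replicas.
    for i, r in enumerate(replicas):
        if i != leader:
            r[key] = current

counter_increment("ct", 1)            # one coordinator/leader...
counter_increment("ct", 1, leader=2)  # ...a different one next time
assert all(r["ct"] == 2 for r in replicas)
```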



Re: clarification of token() in CQL3

2013-08-06 Thread Keith Freeman

Ok, I get that, I'll have to find another way to sort out new rows.

Your description makes me think that if new rows are added during the 
paging (i.e. between one select with token()'s and another), they might 
show up in the query results, right?  (because the hash of the new row 
keys might fall sequentially after token(last_processed_row))


On 08/06/2013 08:18 AM, Richard Low wrote:
On 6 August 2013 15:12, Keith Freeman <8fo...@gmail.com> wrote:


I've seen in several places the advice to use queries like this to
page through lots of rows:

select id from mytable where token(id) > token(last_id)


But it's hard to find detailed information about how this works
(at least that I can understand -- the description in the
Cassandra manual is pretty brief).

One thing I'd like to know is if new rows are always guaranteed to
have token(new_id) > token(ids-of-all-previous-rows)?  E.g. if I
have one process that adds rows to a table, and another that
processes rows from the table, can the "processor" save the id of
the last row processed and when he wakes up use:

select * from mytable where token(id) > token(last_processed_id)


to process only new rows?  Will this always work to get only new rows?


No, unfortunately not.  The tokens are generated by the partitioner - 
they are the hash of the row key.  New tokens could be anywhere in the 
range of tokens so you can't use token ordering to find new rows.


The query you suggest works to page through all the data in your 
column family.  Rows will be returned regardless of when they were 
added (as long as they were added before the query started).  Finding 
rows that have been added since a certain time is hard in Cassandra 
since they are stored in token order.  In general you have to read 
through all the data and work out from e.g. a date field if they 
should be treated as new.


Richard.




Re: clarification of token() in CQL3

2013-08-06 Thread Richard Low
On 6 August 2013 16:56, Keith Freeman <8fo...@gmail.com> wrote:

> Your description makes me think that if new rows are added during the
> paging (i.e. between one select with token()'s and another), they might
> show up in the query results, right?  (because the hash of the new row keys
> might fall sequentially after token(last_processed_row))
>

Yes, new rows will appear if their hash is greater than last_processed_row.

Richard.


CQL3 select between is broken?

2013-08-06 Thread Keith Freeman
I've been looking at examples about modeling series data in Cassandra, 
and in one experiment created a table like this:

 create table vvv (k text, t bigint, value text, primary key (k, t));
After inserting some data with identical k values and differing t 
values, I tried this query (which is nearly identical to another example 
I found on the mailing list):

cqlsh:smdbxp> select * from vvv where k = 'a' and t between 111 and 222;
Bad Request: line 1:54 no viable alternative at input '222'
Why doesn't that work?  Is the syntax of the select wrong for CQL3? (I'm 
running 1.2.8)


Re: CQL3 select between is broken?

2013-08-06 Thread David Ward
http://cassandra.apache.org/doc/cql3/CQL.html#selectStmt

try `and t > 111 and t < 222`, or `>=` and `<=` if you want inclusive.

On Tue, Aug 6, 2013 at 10:35 AM, Keith Freeman <8fo...@gmail.com> wrote:
> I've been looking at examples about modeling series data in Cassandra, and
> in one experiment created a table like this:
>>
>>  create table vvv (k text, t bigint, value text, primary key (k, t));
>
> After inserting some data with identical k values and differing t values, I
> tried this query (which is nearly identical to another example I found on
> the mailing list):
>>
>> cqlsh:smdbxp> select * from vvv where k = 'a' and t between 111 and 222;
>> Bad Request: line 1:54 no viable alternative at input '222'
>
> Why doesn't that work?  Is the syntax of the select wrong for CQL3? (I'm
> running 1.2.8)


Re: Unable to bootstrap node

2013-08-06 Thread Keith Wright
The file does not appear on disk and the permissions are definitely correct.  
We have seen the file in snapshots.   This is completely blocking us from 
adding the new node.  How can we recover?  Just run repairs?

Thanks

From: Aaron Morton <aa...@thelastpickle.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, August 6, 2013 4:06 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Unable to bootstrap node

Caused by: java.io.FileNotFoundException: 
/data/1/cassandra/data/rts/40301_feedProducts/rts-40301_feedProducts-ib-1-Data.db
 (No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at 
org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:67)
at org.apache.cassandra.io.compress.CompressedRandomAccessReader.
This is somewhat serious, especially if it's from a bug in dropping tables, 
though I would expect that it would show up for a lot of people.

Does the file exist on disk?
Are the permissions correct?

IMHO you need to address this issue on the existing nodes before worrying about 
the new node.

Cheers

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/08/2013, at 1:25 PM, sankalp kohli <kohlisank...@gmail.com> wrote:

Let me know if this fixes the problem?


On Mon, Aug 5, 2013 at 6:24 PM, sankalp kohli <kohlisank...@gmail.com> wrote:
So the problem is that when you dropped and recreated the table with the same 
name, somehow the old CFStore object was not purged. So now there were two 
objects, which caused the same sstable to have two SSTableReader objects.

The fix is to find all nodes which are emitting this FileNotFoundException and 
restart them.

In your case, restart the node which is serving the data and emitting the 
FileNotFoundException.

Once this is up, again restart the bootstrapping node with the bootstrap 
argument. Now it will successfully stream the data.


On Mon, Aug 5, 2013 at 6:08 PM, Keith Wright <kwri...@nanigans.com> wrote:

Yes we likely dropped and recreated tables.  If we stop the sending node, what 
will happen to the bootstrapping node?

sankalp kohli <kohlisank...@gmail.com> wrote:



Hi,
The problem is that the node sending the stream is hitting this 
FileNotFound exception. You need to restart this node and it should fix the 
problem.

Are you seeing a lot of FileNotFoundExceptions? Did you do any schema change 
recently?

Sankalp


On Mon, Aug 5, 2013 at 5:39 PM, Keith Wright <kwri...@nanigans.com> wrote:
Hi all,

   I have been trying to bootstrap a new node into my 7 node 1.2.4 C* cluster 
with vnodes and RF=3, with no luck.  It gets close to completing and then the 
streaming just stalls at 99% from 1 or 2 nodes.  Nodetool netstats shows the 
items that have yet to stream, but the logs on the new node do not show any 
errors.  I tried shutting down the node, clearing all data/commit logs/caches, 
and re-bootstrapping, with no luck.  The nodes that are hanging sending the 
data only have the error below, but that's related to compactions (see below) 
although it is one of the files that is waiting to be sent.  I tried nodetool 
scrub on the column family with the missing item but got an error indicating 
it could not get a hard link.  Any ideas?  We were able to bootstrap one of 
the new nodes with no issues but this other one has been a real pain.  Note 
that when the new node is joining the cluster, it does not appear in nodetool 
status.  Is that expected?

Thanks all, my next step is to try getting a new IP for this machine, my 
thought being that the cluster doesn't like me continuing to attempt to 
bootstrap the node repeatedly each time getting a new host id.

[kwright@lxpcas008 ~]$ nodetool netstats | grep 
rts-40301_feedProducts-ib-1-Data.db
   rts: 
/data/1/cassandra/data/rts/40301_feedProducts/rts-40301_feedProducts-ib-1-Data.db
 sections=73 progress=0/1884669 - 0%

ERROR [ReadStage:427] 2013-08-05 23:23:29,294 CassandraDaemon.java (line 174) 
Exception in thread Thread[ReadStage:427,5,main]
java.lang.RuntimeException: java.io.FileNotFoundException: 
/data/1/cassandra/data/rts/40301_feedProducts/rts-40301_feedProducts-ib-1-Data.db
 (No such file or directory)
at 
org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:46)
at 
org.apache.cassandra.io.util.CompressedSegmentedFile.createReader(CompressedSegmentedFile.java:57)
at 
org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:41)
at 
org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:976)
at 
org.apache.cassandra.db.columniterator.SSTableNamesIterator.createFileDataInput(SSTableN

Re: Unable to bootstrap node

2013-08-06 Thread sankalp kohli
@Aaron
This problem happens when you drop and recreate a keyspace with the same
name and you do it very quickly. I have also filed a JIRA for it

https://issues.apache.org/jira/browse/CASSANDRA-5843


On Tue, Aug 6, 2013 at 10:31 AM, Keith Wright  wrote:

> The file does not appear on disk and the permissions are definitely
> correct.  We have seen the file in snapshots.   This is completely blocking
> us from adding the new node.  How can we recover?  Just run repairs?
>
> Thanks
>
> From: Aaron Morton 
> Reply-To: "user@cassandra.apache.org" 
> Date: Tuesday, August 6, 2013 4:06 AM
> To: "user@cassandra.apache.org" 
> Subject: Re: Unable to bootstrap node
>
> Caused by: java.io.FileNotFoundException:
> /data/1/cassandra/data/rts/40301_feedProducts/rts-40301_feedProducts-ib-1-Data.db
> (No such file or directory)
> at java.io.RandomAccessFile.open(Native Method)
> at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
> at
> org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:67)
> at
> org.apache.cassandra.io.compress.CompressedRandomAccessReader.
>
> This is somewhat serious, especially if it's from a bug in dropping
> tables, though I would expect that it would show up for a lot of people.
>
> Does the file exist on disk?
> Are the permissions correct ?
>
> IMHO you need to address this issue on the existing nodes before worrying
> about the new node.
>
> Cheers
>
> -
> Aaron Morton
> Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6/08/2013, at 1:25 PM, sankalp kohli  wrote:
>
> Let me know if this fixes the problem?
>
>
> On Mon, Aug 5, 2013 at 6:24 PM, sankalp kohli wrote:
>
>> So the problem is that when you dropped and recreated the table with the
>> same name, somehow the old CFStore object was not purged. So now there
>> were two objects, which caused the same sstable to have two SSTableReader objects.
>>
>> The fix is to find all nodes which are emitting this FileNotFoundException
>> and restart them.
>>
>> In your case, restart the node which is serving the data and emitting
>> FileNotFound exception.
>>
>> Once this is up, again restart the bootstrapping node with bootstrap
>> argument. Now it will successfully stream the data.
>>
>>
>> On Mon, Aug 5, 2013 at 6:08 PM, Keith Wright wrote:
>>
>>>
>>> Yes we likely dropped and recreated tables.  If we stop the sending node, 
>>> what will happen to the bootstrapping node?
>>>
>>> sankalp kohli  wrote:
>>>
>>>
>>> Hi,
>>> The problem is that the node sending the stream is hitting this
>>> FileNotFound exception. You need to restart this node and it should fix the
>>> problem.
>>>
>>> Are you seeing a lot of FileNotFoundExceptions? Did you do any schema
>>> change recently?
>>>
>>> Sankalp
>>>
>>>
>>> On Mon, Aug 5, 2013 at 5:39 PM, Keith Wright wrote:
>>>
 Hi all,

I have been trying to bootstrap a new node into my 7 node 1.2.4 C*
 cluster with Vnodes RF3 with no luck.  It gets close to completing and then
 the streaming just stalls with  streaming at 99% from 1 or 2 nodes.
  Nodetool netstats shows the items that have yet to stream but the logs on
 the new node do not show any errors.  I tried shutting down the node,
 clearing all data/commit logs/caches, and re-bootstrapping with no luck.
  The nodes that are hanging sending the data only have the error below but
 that's related to compactions (see below) although it is one of the files
 that is waiting to be sent.  I tried nodetool scrub on the column family
 with the missing item but got an error indicating it could not get a hard
 link.  Any ideas?  We were able to bootstrap one of the new nodes with no
 issues but this other one has been a real pain.  Note that when the new
 node is joining the cluster, it does not appear in nodetool status.  Is
 that expected?

 Thanks all, my next step is to try getting a new IP for this machine,
 my thought being that the cluster doesn't like me continuing to attempt to
 bootstrap the node repeatedly each time getting a new host id.

 [kwright@lxpcas008 ~]$ nodetool netstats | grep
 rts-40301_feedProducts-ib-1-Data.db
rts:
 /data/1/cassandra/data/rts/40301_feedProducts/rts-40301_feedProducts-ib-1-Data.db
 sections=73 progress=0/1884669 - 0%

 ERROR [ReadStage:427] 2013-08-05 23:23:29,294 CassandraDaemon.java
 (line 174) Exception in thread Thread[ReadStage:427,5,main]
 java.lang.RuntimeException: java.io.FileNotFoundException:
 /data/1/cassandra/data/rts/40301_feedProducts/rts-40301_feedProducts-ib-1-Data.db
 (No such file or directory)
 at
 org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:46)
 at
 org.apache.cassandra.io.util.CompressedSegmentedFile.createReader(CompressedSegmentedFile.java:57)
 at
 org.apache.cassa

Large number of pending gossip stage tasks in nodetool tpstats

2013-08-06 Thread Faraaz Sareshwala
I'm running cassandra-1.2.8 in a cluster with 45 nodes across three racks. All
nodes are well behaved except one. Whenever I start this node, it starts
churning CPU. Running nodetool tpstats, I notice that the number of pending
gossip stage tasks is constantly increasing [1]. When looking at nodetool
gossipinfo, I notice that this node has updated to the latest schema hash, but
that it thinks other nodes in the cluster are on the older version. I've tried
to drain, decommission, wipe node data, bootstrap, and repair the node. However,
the node just started doing the same thing again.

Has anyone run into this issue before? Can anyone provide any insight into why
this node is the only one in the cluster having problems? Are there any easy
fixes?

Thank you,
Faraaz

[1] $ /cassandra/bin/nodetool tpstats
Pool Name                  Active  Pending  Completed  Blocked  All time blocked
ReadStage                       0        0          8        0                 0
RequestResponseStage            0        0      49198        0                 0
MutationStage                   0        0     224286        0                 0
ReadRepairStage                 0        0          0        0                 0
ReplicateOnWriteStage           0        0          0        0                 0
GossipStage                     1     2213         18        0                 0
AntiEntropyStage                0        0          0        0                 0
MigrationStage                  0        0         72        0                 0
MemtablePostFlusher             0        0        102        0                 0
FlushWriter                     0        0         99        0                 0
MiscStage                       0        0          0        0                 0
commitlog_archiver              0        0          0        0                 0
InternalResponseStage           0        0         19        0                 0
HintedHandoff                   0        0          2        0                 0

Message type   Dropped
RANGE_SLICE  0
READ_REPAIR  0
BINARY   0
READ 0
MUTATION 0
_TRACE   0
REQUEST_RESPONSE 0
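A backed-up pool like the GossipStage queue above can be spotted programmatically from the tpstats output; a rough sketch (the `backed_up` helper and the threshold are illustrative, not a standard tool):

```python
# A small sketch for flagging backed-up thread pools in `nodetool tpstats`
# output, like the growing GossipStage pending count above.
TPSTATS = """\
ReadStage                 0     0       8  0  0
GossipStage               1  2213      18  0  0
MutationStage             0     0  224286  0  0
"""

def backed_up(tpstats_text: str, pending_threshold: int = 100):
    flagged = []
    for line in tpstats_text.splitlines():
        parts = line.split()
        # Pool rows have: name, active, pending, completed, blocked, all-time blocked
        if len(parts) >= 6 and parts[1].isdigit():
            pool, pending = parts[0], int(parts[2])
            if pending > pending_threshold:
                flagged.append((pool, pending))
    return flagged

print(backed_up(TPSTATS))  # GossipStage is the outlier here
```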


Re: Which of these VPS configurations would perform better for Cassandra ?

2013-08-06 Thread S Ahmed
From what I understood "tons" of people are running things on EC2, but it
could be that the instance sizes are large enough to compare to a dedicated
server (especially if you go with SSD, it is like $1K/month!)


On Tue, Aug 6, 2013 at 3:54 AM, Aaron Morton wrote:

> how many nodes to start with(2 ok?) ?
>
> I'd recommend 3, that will give you some redundancy see
> http://thelastpickle.com/2011/06/13/Down-For-Me/
>
> Cheers
>
> -
> Aaron Morton
> Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5/08/2013, at 1:41 AM, Rajkumar Gupta  wrote:
>
> okay, so what should a workable VPS configuration to start with & minimum
> how many nodes to start with(2 ok?) ?  Seriously I cannot afford the
> tensions of colocation setup.  My hosting provider provides SSD drives with
> KVM virtualization.
>
>
>


Re: Which of these VPS configurations would perform better for Cassandra ?

2013-08-06 Thread David Ward
A 3 node EC2 m1.xlarge cluster is ~$1000/month + any incidental costs (S3
backups, transfer out of the AZ, etc.), or ~$300/month after a ~$1400
upfront 1 year reservation fee.

There are some uncomfortable spots when compaction kicks in concurrently
for several large CFs, but otherwise it's been performant and so far stable
using ephemeral RAID0 (à la DataStax's 2.4 AMI).
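Putting those numbers side by side (figures from this thread only; AWS pricing changes, so treat this as back-of-the-envelope arithmetic):

```python
# 3-node m1.xlarge cluster, on-demand vs 1-year reserved (thread figures).
on_demand_monthly = 1000.0   # ~$1000/month on demand
reserved_monthly = 300.0     # ~$300/month after reservation
upfront_1yr = 1400.0         # ~$1400 upfront reservation fee

# Amortize the upfront fee over the 12-month term:
effective_reserved = reserved_monthly + upfront_1yr / 12
print(round(effective_reserved, 2))  # ~417/month vs ~1000 on demand
```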


On Tue, Aug 6, 2013 at 2:15 PM, S Ahmed  wrote:

> From what I understood "tons" of people are running things on EC2, but it
> could be that the instance sizes are large enough to compare to a dedicated
> server (especially if you go with SSD, it is like $1K/month!)
>
>
> On Tue, Aug 6, 2013 at 3:54 AM, Aaron Morton wrote:
>
>> how many nodes to start with(2 ok?) ?
>>
>> I'd recommend 3, that will give you some redundancy see
>> http://thelastpickle.com/2011/06/13/Down-For-Me/
>>
>> Cheers
>>
>> -
>> Aaron Morton
>> Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 5/08/2013, at 1:41 AM, Rajkumar Gupta  wrote:
>>
>> okay, so what should a workable VPS configuration to start with & minimum
>> how many nodes to start with(2 ok?) ?  Seriously I cannot afford the
>> tensions of colocation setup.  My hosting provider provides SSD drives with
>> KVM virtualization.
>>
>>
>>
>


Re: Which of these VPS configurations would perform better for Cassandra ?

2013-08-06 Thread Ertio Lew
Amazon seems to overprice its services by a lot. If you look for a similar
size deployment elsewhere, like Linode or DigitalOcean (very competitive
pricing), you'll notice huge differences. OK, some services & features are
extra, but maybe we don't all need them, & when you can host on
non-dedicated virtual servers on Amazon you can also do it with similar
configuration nodes elsewhere.

IMO these huge costs associated with a Cassandra deployment are too heavy for
small startups just starting out. I believe, if you consider a deployment
for a similar application using MySQL, it should be quite a bit cheaper/more
affordable (though I'm not exactly sure). At least you don't usually create a
cluster from the beginning. Probably we made a wrong decision to choose
Cassandra considering only its technological advantages.


Re: Which of these VPS configurations would perform better for Cassandra ?

2013-08-06 Thread Janne Jalkanen

Well, Amazon is expensive. Hetzner will sell you dedicated SSD RAID-1 servers 
with 32GB RAM and 4 cores with HT for €59/mth.  However, if pricing is an 
issue, you could start with:

1 server : read at ONE, write at ONE, RF=1. You will have consistency, but not 
high availability. This is the same as with MySQL or any other single-server 
solution - if the db server goes down, your service goes down.  You will need 
to be extra careful with backups here, because if your node blows, you will 
need to restore.

then you upgrade to

2 servers: read at ONE, write at ONE, RF=2. You can now tolerate one node going 
down with automatic failover, but you won't get consistency.  This is kinda like 
having MySQL master/slave replication (yes, I know, it's not really the same, 
but it's pretty close as an effect)

then you upgrade to

3 servers: read at QUORUM, write at QUORUM, RF=3. You can tolerate one node 
going down, and you will have consistent data. This is where Cassandra starts 
to shine.

then you get a big heap-o-money, and keep adding servers and you realize that 
with pretty much everything else you would be spending a LOT of time just to 
keep sure that your cluster is up and running and performing.

It's always a question of tradeoffs. Cassandra is cool 'cos it gives you the 
ability to run a lot of different configurations and will go up-up-up when you 
need it without a lot of special magic.

/Janne

On Aug 7, 2013, at 07:36 , Ertio Lew  wrote:

> Amazon seems to overprice its services by a lot. If you look for a similar 
> size deployment elsewhere, like Linode or DigitalOcean (very competitive 
> pricing), you'll notice huge differences. OK, some services & features are 
> extra, but maybe we don't all need them, & when you can host on 
> non-dedicated virtual servers on Amazon you can also do it with similar 
> configuration nodes elsewhere.
> 
> IMO these huge costs associated with a Cassandra deployment are too heavy for 
> small startups just starting out. I believe, if you consider a deployment for 
> a similar application using MySQL, it should be quite a bit cheaper/more 
> affordable (though I'm not exactly sure). At least you don't usually create a 
> cluster from the beginning. Probably we made a wrong decision to choose 
> Cassandra considering only its technological advantages.