RE: mysql based columnar DB to Cassandra DB - Migration

2014-11-25 Thread Akshay Ballarpure
Thanks, Andy, for the quick reply. I will have a look at the link below and get back to you. Best regards, Akshay Ballarpure

RE: mysql based columnar DB to Cassandra DB - Migration

2014-11-25 Thread Andreas Finke
Hi Akshay, this depends heavily on your data model. There is no general way to do it. It includes several steps: 1) Migrating the applications that use MySQL to Cassandra 2) Migrating the MySQL database itself to Cassandra. Keep in mind that there are no such things as relations or joins in Ca

Re: mysql based columnar DB to Cassandra DB - Migration

2014-11-25 Thread Akshay Ballarpure
Hello folks, I have a MySQL-based columnar DB that I want to migrate to Cassandra. How can this be done? Best regards, Akshay Ballarpure

Re: Partial replication to a DC

2014-11-25 Thread Robert Coli
On Tue, Nov 25, 2014 at 8:52 PM, Robert Wille wrote: > Is it possible to replicate a subset of the keyspaces to a data center? > For example, if I want to run reports without impacting my production > nodes, can I put the relevant column families in a keyspace and create a DC > for reporting that

Partial replication to a DC

2014-11-25 Thread Robert Wille
Is it possible to replicate a subset of the keyspaces to a data center? For example, if I want to run reports without impacting my production nodes, can I put the relevant column families in a keyspace and create a DC for reporting that replicates only that keyspace? Robert
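Replication in Cassandra is configured per keyspace, and NetworkTopologyStrategy takes a replica count per data center, so the question comes down to which data centers each keyspace names. The sketch below only builds the CQL text and does not contact a cluster; the keyspace names (`prod`, `reporting`) and data center names (`DC_MAIN`, `DC_REPORTS`) are hypothetical, as are the replica counts.

```python
def create_keyspace_cql(name, dc_replicas):
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy.

    dc_replicas maps a data center name to its replica count. A data
    center that is not listed holds no replicas of the keyspace.
    """
    if not dc_replicas:
        raise ValueError("at least one data center is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc, count in sorted(dc_replicas.items()):
        options.append("'%s': %d" % (dc, int(count)))
    return "CREATE KEYSPACE %s WITH replication = {%s};" % (name, ", ".join(options))


# Hypothetical layout: production data stays in DC_MAIN only, while the
# reporting keyspace is also replicated to the reporting data center.
PROD_CQL = create_keyspace_cql("prod", {"DC_MAIN": 3})
REPORTING_CQL = create_keyspace_cql("reporting", {"DC_MAIN": 3, "DC_REPORTS": 2})
```

With this layout, nodes in `DC_REPORTS` would store only the `reporting` keyspace; whether report queries stay off the production nodes also depends on the client using a DC-local consistency level and load-balancing policy.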

Re: High cpu usage & segfaulting

2014-11-25 Thread Otis Gospodnetic
Hi Stan, Put some monitoring on this. The first thing I think of when I hear "chewing up CPU" for Java apps is GC. In SPM you can easily see individual JVM memory pools and see if any of them are at (close to) 100%. You can typically correlate that to increased GC tim

Re: High cpu usage & segfaulting

2014-11-25 Thread Robert Coli
On Tue, Nov 25, 2014 at 8:07 PM, Stan Lemon wrote: > We are using v2.0.11 and have seen several instances in our 24 node > cluster where the node becomes unresponsive, when we look into it we find > that there is a cassandra process chewing up a lot of CPU. There are no > other indications in log

Re: RAM vs SSD for real world performance?

2014-11-25 Thread Kevin Burton
I imagine I’d generally be happy if we were CPU bound :-) … as long as the number of transactions per second is generally reasonable. On Tue, Nov 25, 2014 at 7:35 PM, Robert Coli wrote: > On Tue, Nov 25, 2014 at 5:31 PM, Kevin Burton wrote: > >> Curious what other people have seen here in pract

High cpu usage & segfaulting

2014-11-25 Thread Stan Lemon
We are using v2.0.11 and have seen several instances in our 24 node cluster where the node becomes unresponsive, when we look into it we find that there is a cassandra process chewing up a lot of CPU. There are no other indications in logs or anything as to what might be happening, however if we st

Re: RAM vs SSD for real world performance?

2014-11-25 Thread Robert Coli
On Tue, Nov 25, 2014 at 5:31 PM, Kevin Burton wrote: > Curious what other people have seen here in practice. Are they getting > comparable performance to RAM in practice? Latencies would be higher of > course but we’re fine with that. > My understanding is that when one runs Cassandra with SSDs

RAM vs SSD for real world performance?

2014-11-25 Thread Kevin Burton
The new SSDs that we have (as well as Fusion IO) can, in theory, saturate the gigabit ethernet port. The 4k random read and write IOs they’re doing now can easily add up quickly, and they’re faster than gigabit and even two gigabit. However, not all of that 4k is actually used. I suspect that on aver

Re: large range read in Cassandra

2014-11-25 Thread Dan Kinder
Thanks, very helpful Rob, I'll watch for that. On Tue, Nov 25, 2014 at 11:45 AM, Robert Coli wrote: > On Tue, Nov 25, 2014 at 10:45 AM, Dan Kinder wrote: > >> To be clear, I expect this range query to take a long time and perform >> relatively heavy I/O. What I expected Cassandra to do was use

Re: Data synchronization between 2 running clusters on different availability zone

2014-11-25 Thread Robert Coli
On Tue, Nov 25, 2014 at 7:09 AM, Spico Florin wrote: > 1. For ensuring high availability I would like to install one Cassandra > cluster on one availability zone > (on Amazon EC2 US-east) and one Cassandra cluster on other AZ (Amazon EC2 > US-west). > One cluster, replication factor of 2, cluste

Re: Keyspace and table/cf limits

2014-11-25 Thread Robert Coli
On Tue, Nov 25, 2014 at 9:07 AM, Raj N wrote: > What's the latest on the maximum number of keyspaces and/or tables that > one can have in Cassandra 2.1.x? > Most relevant changes lately would be : https://issues.apache.org/jira/browse/CASSANDRA-6689 and https://issues.apache.org/jira/browse/CAS

Re: Cassandra version 1.0.10 Data Loss upon restart

2014-11-25 Thread Robert Coli
On Tue, Nov 25, 2014 at 6:40 AM, Ankit Patel wrote: > The JIRA https://issues.apache.org/jira/browse/CASSANDRA-4446 refers to > the problem after we've invoked drain. However, we did not invoke drain or > flush. We are running one node cassandra within one data center and it is > being replicated

Unsubscribe

2014-11-25 Thread Kevin Daly

Re: large range read in Cassandra

2014-11-25 Thread Robert Coli
On Tue, Nov 25, 2014 at 10:45 AM, Dan Kinder wrote: > To be clear, I expect this range query to take a long time and perform > relatively heavy I/O. What I expected Cassandra to do was use auto-paging ( > https://issues.apache.org/jira/browse/CASSANDRA-4415, > http://stackoverflow.com/questions/1

Re: Compaction Strategy guidance

2014-11-25 Thread Andrei Ivanov
Ah, clear then. SSD usage imposes a different bias in terms of costs;-) On Tue, Nov 25, 2014 at 9:48 PM, Nikolai Grigoriev wrote: > Andrei, > > Oh, yes, I have scanned the top of your previous email but overlooked the > last part. > > I am using SSDs so I prefer to put extra work to keep my syste

Re: Compaction Strategy guidance

2014-11-25 Thread Nikolai Grigoriev
Andrei, Oh, yes, I have scanned the top of your previous email but overlooked the last part. I am using SSDs so I prefer to put extra work to keep my system performing and save expensive disk space. So far I've been able to size the system more or less correctly so these LCS limitations do not ca

Re: large range read in Cassandra

2014-11-25 Thread Dan Kinder
Thanks Rob. To be clear, I expect this range query to take a long time and perform relatively heavy I/O. What I expected Cassandra to do was use auto-paging ( https://issues.apache.org/jira/browse/CASSANDRA-4415, http://stackoverflow.com/questions/17664438/iterating-through-cassandra-wide-row-with

Re: Compaction Strategy guidance

2014-11-25 Thread Andrei Ivanov
Nikolai, Just in case you've missed my comment in the thread (guess you have) - increasing sstable size does nothing (in our case at least). That is, it's not worse but the load pattern is still the same - doing nothing most of the time. So, I switched to STCS and we will have to live with extra s

Re: max ttl for column

2014-11-25 Thread Rajanish GJ
Mark / Philip - Thanks a lot. This is really helpful. BTW, it was my mistake: I thought the TTL was in milliseconds rather than seconds. Regards Rajanish GJ apigee | rajan...@apigee.com On Fri, Nov 21, 2014 at 9:42 AM, Philip Thompson < philip.thomp...@datastax.com> wrote: > With the newest ve
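Since the confusion here was about units, a small helper that converts a duration to whole seconds before it is used in `USING TTL` may help avoid the same mistake. This is a sketch only: the function name is hypothetical, and the 20-year cap is my understanding of Cassandra's maximum TTL rather than something stated in this thread.

```python
from datetime import timedelta

# Assumed upper bound: 20 years expressed in seconds (630,720,000).
MAX_TTL_SECONDS = 20 * 365 * 24 * 60 * 60


def ttl_seconds(duration):
    """Convert a timedelta to a TTL in whole seconds, rejecting bad values."""
    seconds = int(duration.total_seconds())
    if seconds <= 0:
        raise ValueError("TTL must be a positive number of seconds")
    if seconds > MAX_TTL_SECONDS:
        raise ValueError("TTL exceeds the assumed maximum of %d seconds" % MAX_TTL_SECONDS)
    return seconds
```

For example, `ttl_seconds(timedelta(days=1))` yields the value to place after `USING TTL` for a one-day expiry; passing a millisecond count directly would make the data live a thousand times longer than intended.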

Keyspace and table/cf limits

2014-11-25 Thread Raj N
What's the latest on the maximum number of keyspaces and/or tables that one can have in Cassandra 2.1.x? -Raj

Re: Rule of thumb for concurrent asynchronous queries?

2014-11-25 Thread Nikolai Grigoriev
I think it all depends on how many machines will be involved in the query (read consistency is also a factor) and how long is a typical response in bytes. Large responses will put more pressure on the GC, which will result in more time spent in GC and possibly long(er) GC pauses. Cassandra can tol

Re: Rule of thumb for concurrent asynchronous queries?

2014-11-25 Thread Jack Krupansky
Great question. The safe answer is to do a proof of concept implementation and try various rates to determine where the bottleneck is. It will also depend on the row size. Hard to say if you will be limited by the cluster load or network bandwidth. Is there only one client talking to your clus

Rule of thumb for concurrent asynchronous queries?

2014-11-25 Thread Robert Wille
Suppose I have the primary keys for 10,000 rows and I want them all. Is there a rule of thumb for the maximum number of concurrent asynchronous queries I should execute?

Re: Issues in moving data from cassandra to elasticsearch in java.

2014-11-25 Thread William Arbaugh
Sounds like you're trying to use C* as a message broker. Perhaps try using Kafka or RabbitMQ as a front-end. Then have two subscribers - one pulls and places into elasticsearch and the other inserts into C*. Yes it is more complex front-end, but it will give you the functionality you want. > O

Re: Issues in moving data from cassandra to elasticsearch in java.

2014-11-25 Thread Eric Stevens
Consider adding log_bucket timestamp, and then indexing that column. Your data loader can SELECT * FROM logs WHERE log_bucket = ?. The value you supply there would be the timestamp log bucket you're processing - in your case logged_at % 5. However, I'll caution against writing data to Cassandra
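The bucketing idea can be sketched as follows, assuming `logged_at` is an epoch timestamp in seconds and that the intent of the modulo is to round each timestamp down to the start of its 5-second window (the remainder alone would only give the offset within the window). The function and constant names are hypothetical.

```python
BUCKET_SECONDS = 5  # matches the 5-second loader interval mentioned in the thread


def log_bucket(logged_at, width=BUCKET_SECONDS):
    """Return the start of the fixed-width bucket that contains logged_at."""
    return logged_at - (logged_at % width)


def previous_bucket(now, width=BUCKET_SECONDS):
    """Return the most recent bucket that is already complete at time now."""
    return log_bucket(now, width) - width
```

The writer would store `log_bucket(logged_at)` alongside each row, and each loader run would bind `previous_bucket(now)` to the `WHERE log_bucket = ?` query, so that a bucket is read only after it can no longer receive new rows.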

Data synchronization between 2 running clusters on different availability zone

2014-11-25 Thread Spico Florin
Hello! I have the following scenario: 1. To ensure high availability, I would like to install one Cassandra cluster in one availability zone (on Amazon EC2 US-east) and one Cassandra cluster in another AZ (Amazon EC2 US-west). 2. I have a pipeline that is running on Amazon EC2-EAST and is feeding t

Re: Compaction Strategy guidance

2014-11-25 Thread Nikolai Grigoriev
Hi Jean-Armel, I am using the latest and greatest DSE 4.5.2 (4.5.3 in another cluster, but there are no relevant changes between 4.5.2 and 4.5.3) - thus, Cassandra 2.0.10. I have about 1.8 TB of data per node now in total, which falls into that range. As I said, it is really a problem with large amoun

Re: Cassandra version 1.0.10 Data Loss upon restart

2014-11-25 Thread Ankit Patel
Rob, The JIRA https://issues.apache.org/jira/browse/CASSANDRA-4446 refers to the problem after we've invoked drain. However, we did not invoke drain or flush. We are running one node cassandra within one data center and it is being replicated with another node in another data center. We are using

Re: Getting the counters with the highest values

2014-11-25 Thread Eric Stevens
> We have too many documents per day to materialize in memory, so querying per day and aggregating the results isn’t really possible. You don't really need to, that's part of the point. You can paginate across a partition with most client drivers, and materializing this view is just copying data

Re: Cassandra schema migrator

2014-11-25 Thread Phil Wise
On 25.11.2014 10:22, Jens Rantil wrote: > Anyone who is using, or could recommend, a tool for versioning > schemas/migrating in Cassandra? I've recently written a tool to solve schema migration at our company which may be useful: https://github.com/advancedtelematic/cql-migrate > My list of re

Re: Compaction Strategy guidance

2014-11-25 Thread Andrei Ivanov
Yep, Marcus, I know. It's mainly a question of the cost of those extra x2 disks, you know. Our "final" setup will be more like 30TB, so doubling it is still some cost. But I guess we will have to live with it. On Tue, Nov 25, 2014 at 1:26 PM, Marcus Eriksson wrote: > If you are that write-heavy you s

Re: Cassandra schema migrator

2014-11-25 Thread Jan Kesten
Hi Jens, maybe you should have a look at mutagen for Cassandra: https://github.com/toddfast/mutagen-cassandra It has been a little quiet around it for some months, but it may still be worth a look. Cheers, Jan On 25.11.2014 at 10:22, Jens Rantil wrote: Hi, Anyone who is using, or could recommend, a too

Re: Compaction Strategy guidance

2014-11-25 Thread Marcus Eriksson
If you are that write-heavy you should definitely go with STCS; LCS optimizes for reads by doing more compactions. /Marcus On Tue, Nov 25, 2014 at 11:22 AM, Andrei Ivanov wrote: > Hi Jean-Armel, Nikolai, > > 1. Increasing sstable size doesn't work (well, I think, unless we > "overscale" - add mo

Re: Compaction Strategy guidance

2014-11-25 Thread Andrei Ivanov
Hi Jean-Armel, Nikolai, 1. Increasing sstable size doesn't work (well, I think, unless we "overscale" - add more nodes than really necessary, which is prohibitive for us in a way). Essentially there is no change. I gave up and will go for STCS;-( 2. We use 2.0.11 as of now 3. We are running on EC

Cassandra schema migrator

2014-11-25 Thread Jens Rantil
Hi, Is anyone using, or could anyone recommend, a tool for versioning schemas/migrating in Cassandra? My list of requirements is: * Support for adding tables. * Support for versioning of table properties. All our tables are to be defaulted to LeveledCompactionStrategy. * Support for adding non-

Fwd: Issues in moving data from cassandra to elasticsearch in java.

2014-11-25 Thread Vinod Joseph
Hi, I am working on a Java plugin which moves data from Cassandra to Elasticsearch. This plugin must run on the server every 5 seconds. The data is getting moved, but the issue is that every time the plugin runs (i.e., every 5 seconds) all the data, including data which has been

Re: Cassandra backup via snapshots in production

2014-11-25 Thread Jens Rantil
> Truncate does trigger snapshot creation though Doesn’t it? With “auto_snapshot: true” it should. Jens Rantil On Tue, Nov 25, 2014 at 9:21 AM, DuyHai Doan wrote:

Re: Compaction Strategy guidance

2014-11-25 Thread Jean-Armel Luce
Hi Andrei, Hi Nikolai, Which version of C* are you using? There are some recommendations about the max storage per node: http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2 "For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to handle 10x (3-5TB)". I have

Re: Cassandra backup via snapshots in production

2014-11-25 Thread DuyHai Doan
True. Delete in CQL just creates a tombstone, so from the storage engine's point of view it is just adding some physical columns. Truncate does trigger snapshot creation, though. On 21 Nov 2014 at 19:29, "Robert Coli" wrote: > On Fri, Nov 21, 2014 at 8:40 AM, Jens Rantil wrote: > >> > The main purpose is to prote