Re: counter column family

2012-04-03 Thread Tamar Fraenkel
Hi! So, if I am using Hector, I need to do: cassandraHostConfigurator.setRetryDownedHosts(false)? How will this affect my application generally? Thanks

Re: data size difference between supercolumn and regular column

2012-04-03 Thread Tamar Fraenkel
Do you have a good reference for maintenance scripts for a Cassandra ring? Thanks, On Tue, Apr 3, 2012 at 4:37 AM, aaron morton wr

Re: cassandra 1.08 on java7 and win7

2012-04-03 Thread puneet loya
Thank you, Gopala :) There is no issue with it. Maybe I was typing something wrong. Minor mistake :) On Tue, Apr 3, 2012 at 11:51 PM, Gopala wrote: > puneet loya gmail.com> writes: > > > > > > > create keyspace DEMO > > > > with placement_strategy = > 'org.apache.cassandra.locator.Network

Re: size tiered compaction - improvement

2012-04-03 Thread Igor
Here is a small Python script I run once per day. You have to adjust the size and/or age limits in the 'if' statement. Also, I use the mx4j interface for JMX calls.

    #!/usr/bin/env python
    import sys,os,glob,time,urllib2
    CASSANDRA_DATA='/spool1/cassandra/data'
    DONTTOUCH=('system',)
    now = time.time()
    def
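Igor's script is cut off, so only its selection step can be sketched. The following Python 3 sketch assumes the Cassandra 1.0 on-disk layout <data_dir>/<keyspace>/<cf>-hc-<generation>-Data.db, matching the transaction-hc-1024-Data.db example in this thread. The thresholds and the find_candidates name are hypothetical, and the mx4j/JMX call that actually triggers the user-defined compaction is deliberately left out. Each returned pair is (keyspace name, sstable name), which are the two arguments the thread says the call takes.

```python
import os
import time
from typing import List, Optional, Tuple

# Hypothetical limits; adjust them just as the original script's 'if' expects.
MIN_SIZE_BYTES = 1 * 1024 ** 3      # only sstables of at least 1 GiB...
MIN_AGE_SECONDS = 7 * 24 * 3600     # ...that have not been modified for a week
DONT_TOUCH = ("system",)            # same exclusion as DONTTOUCH above


def find_candidates(data_dir: str,
                    now: Optional[float] = None,
                    min_size: int = MIN_SIZE_BYTES,
                    min_age: float = MIN_AGE_SECONDS) -> List[Tuple[str, str]]:
    """Return (keyspace, sstable Data.db name) pairs that are big and old enough.

    Assumes the 1.0-style layout <data_dir>/<keyspace>/<cf>-hc-<gen>-Data.db.
    """
    now = time.time() if now is None else now
    candidates = []
    for keyspace in sorted(os.listdir(data_dir)):
        ks_dir = os.path.join(data_dir, keyspace)
        if keyspace in DONT_TOUCH or not os.path.isdir(ks_dir):
            continue
        for name in sorted(os.listdir(ks_dir)):
            if not name.endswith("-Data.db"):   # skip Index.db, Filter.db, etc.
                continue
            st = os.stat(os.path.join(ks_dir, name))
            if st.st_size >= min_size and now - st.st_mtime >= min_age:
                candidates.append((keyspace, name))
    return candidates
```

Run from cron, a call such as find_candidates('/spool1/cassandra/data') would produce the list. Each pair would then be passed to whatever JMX client you use.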

Re: size tiered compaction - improvement

2012-04-03 Thread igor
The first is the keyspace name, the second is the sstable name (like transaction-hc-1024-Data.db).   -Original Message- From: Radim Kolar To: user@cassandra.apache.org Sent: Wed, 04 Apr 2012 3:14 Subject: Re: size tiered compaction - improvement On 3.4.2012 23:04, i...@4friends.od.ua wrote: >

Re: System keyspace leak?

2012-04-03 Thread David Leimbach
Well I just found this: http://wiki.apache.org/cassandra/LiveSchemaUpdates which explains a ton... It looks like this particular Column Family will grow infinitely (it's just one row with a column per migration), so if I'm pounding on my Cassandra node with CREATE/DROP activity, I'm going to mak

Re: Largest 'sensible' value

2012-04-03 Thread Franc Carter
On Wed, Apr 4, 2012 at 8:56 AM, Jonathan Ellis wrote: > We use 2MB chunks for our CFS implementation of HDFS: > http://www.datastax.com/dev/blog/cassandra-file-system-design > thanks > > On Mon, Apr 2, 2012 at 4:23 AM, Franc Carter > wrote: > > > > Hi, > > > > We are in the early stages of th

System keyspace leak?

2012-04-03 Thread David Leimbach
I've been trying to understand the overhead of create/drop keyspace on Cassandra 1.0.8. It's not free, especially when I've managed to drive up the LiveDiskSpaceUsed for the Migrations CF in the "system" keyspace up to over 12 MB of disk. I've tried doing "nodetool -h localhost repair system" and

Re: size tiered compaction - improvement

2012-04-03 Thread Radim Kolar
On 3.4.2012 23:04, i...@4friends.od.ua wrote: If you know for sure that you will free a lot of space by compacting some old table, then you can call UserdefinedCompaction for this table (you can do this from cron). There is also a ticket in JIRA with discussion on per-sstable expired column an

Re: tombstones problem with 1.0.8

2012-04-03 Thread Jonathan Ellis
Removing expired columns actually requires two compaction passes: one to turn the expired column into a tombstone; one to remove the tombstone after gc_grace_seconds. (See https://issues.apache.org/jira/browse/CASSANDRA-1537.) Perhaps CASSANDRA-2786 was causing things to (erroneously) be cleaned u
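The two-pass behaviour can be illustrated with a toy model, which is not Cassandra's code. In a single compaction pass, a column either changes from expired to tombstone, or, if it is already a tombstone older than gc_grace_seconds, gets purged. It never does both in the same pass. The Column type and the compact function are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Column:
    name: str
    expires_at: Optional[int] = None   # absolute expiry time of a TTL'd column
    tombstone: bool = False
    deleted_at: Optional[int] = None   # local deletion time once tombstoned


def compact(columns: List[Column], now: int, gc_grace_seconds: int) -> List[Column]:
    """One simplified compaction pass over the columns of a row."""
    out = []
    for c in columns:
        if c.tombstone:
            # Second pass: only tombstones older than gc_grace_seconds are dropped.
            if now - c.deleted_at >= gc_grace_seconds:
                continue
            out.append(c)
        elif c.expires_at is not None and now >= c.expires_at:
            # First pass: an expired column becomes a tombstone and is kept for now.
            out.append(replace(c, tombstone=True, deleted_at=c.expires_at))
        else:
            out.append(c)
    return out
```

Even when the expiry is far older than gc_grace_seconds, the first pass only produces the tombstone. The column disappears only in a later pass.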

Re: key cache size calculation

2012-04-03 Thread Shoaib Mir
On Wed, Apr 4, 2012 at 8:04 AM, aaron morton wrote: > It depends on the workload. > > Increase the cache size until you see the hit rate decrease, or see it > create memory pressure. Watch the logs for messages that the caches have > been decreased. > > Take a look at the Recent Read Latency for t

Re: really bad select performance

2012-04-03 Thread Jonathan Ellis
Secondary indexes can generate a lot of random i/o. iostat -x can confirm if that's your problem. On Thu, Mar 29, 2012 at 5:52 PM, Chris Hart wrote: > Hi, > > I have the following cluster: > > 136112946768375385385349842972707284580 >  MountainViewRAC1        Up     Normal  1.86 GB         20.0

Re: column’s timestamp

2012-04-03 Thread Jonathan Ellis
That would work, with the caveat that you'd have to delete it and re-insert if you want to preserve that relationship on update. On Mon, Apr 2, 2012 at 12:18 PM, Pierre Chalamet wrote: > Hi, > > What about using a ts as column name and do a get sliced instead ? > > > --Original Message--
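Here is a minimal sketch of the timestamp-as-column-name idea. It assumes timestamps are unique within the row. TimestampOrderedRow and its methods are hypothetical stand-ins for a row whose comparator keeps column names sorted, plus a get_slice-style range read. The sketch shows why an update has to be a delete followed by a re-insert.

```python
import bisect
from typing import Dict, List, Tuple


class TimestampOrderedRow:
    """Toy row whose column names are timestamps, kept in sorted order."""

    def __init__(self) -> None:
        self._names: List[int] = []           # sorted column names (timestamps)
        self._values: Dict[int, str] = {}     # column name -> value
        self._ts_of_item: Dict[str, int] = {} # hypothetical index: value -> its column

    def insert(self, item: str, ts: int) -> None:
        bisect.insort(self._names, ts)
        self._values[ts] = item
        self._ts_of_item[item] = ts

    def update(self, item: str, new_ts: int) -> None:
        # The column name *is* the timestamp, so updating means delete + re-insert.
        old_ts = self._ts_of_item[item]
        self._names.remove(old_ts)
        del self._values[old_ts]
        self.insert(item, new_ts)

    def get_slice(self, start: int, finish: int) -> List[Tuple[int, str]]:
        # Inclusive range read over the sorted column names.
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        return [(ts, self._values[ts]) for ts in self._names[lo:hi]]
```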

Re: Largest 'sensible' value

2012-04-03 Thread Jonathan Ellis
We use 2MB chunks for our CFS implementation of HDFS: http://www.datastax.com/dev/blog/cassandra-file-system-design On Mon, Apr 2, 2012 at 4:23 AM, Franc Carter wrote: > > Hi, > > We are in the early stages of thinking about a project that needs to store > data that will be accessed by Hadoop. On
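The following sketch illustrates storing a large value as fixed-size pieces. It splits a blob into 2 MB chunks keyed by a zero-padded chunk index. That key scheme is an assumption made for illustration and is not how CFS actually names its blocks.

```python
from typing import Dict

CHUNK_SIZE = 2 * 1024 * 1024  # 2 MB, the chunk size mentioned for CFS


def to_chunks(blob: bytes, chunk_size: int = CHUNK_SIZE) -> Dict[str, bytes]:
    """Split a blob into chunks keyed by a hypothetical zero-padded index."""
    return {f"{i:08d}": blob[off:off + chunk_size]
            for i, off in enumerate(range(0, len(blob), chunk_size))}


def from_chunks(chunks: Dict[str, bytes]) -> bytes:
    # Zero-padded names sort in chunk order, like a comparator on column names.
    return b"".join(chunks[name] for name in sorted(chunks))
```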

Re: size tiered compaction - improvement

2012-04-03 Thread Jonathan Ellis
Twitter tried a timestamp-based compaction strategy in https://issues.apache.org/jira/browse/CASSANDRA-2735. The conclusion was, "this actually resulted in a lot more compactions than the SizeTieredCompactionStrategy. The increase in IO was not acceptable for our use and therefore stopped working

Re: Error Replicate on write

2012-04-03 Thread aaron morton
What is logged when it cannot find JNA? What is passed to the Java service? Check with ps aux | grep cassandra. Cheers - Aaron Morton On 3/04/2012, at 7:28 PM, Carlos Juzarte Rolo wrote: > It is, but it doesn't lo

Re: key cache size calculation

2012-04-03 Thread aaron morton
It depends on the workload. Increase the cache size until you see the hit rate decrease, or see it create memory pressure. Watch the logs for messages that the caches have been decreased. Take a look at the Recent Read Latency for the CF. This is how long it takes to actually read data on th
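To see why the answer depends on the workload, here is a rough simulation of an LRU cache's hit rate. Cassandra's key cache is only approximated by a plain LRU here, and lru_hit_rate is hypothetical. When a cyclic access pattern is slightly larger than the cache, LRU gets no hits at all until the cache covers the whole working set.

```python
from collections import OrderedDict
from typing import Hashable, Iterable


def lru_hit_rate(keys: Iterable[Hashable], capacity: int) -> float:
    """Replay an access sequence through an LRU cache and return the hit rate."""
    cache: "OrderedDict[Hashable, None]" = OrderedDict()
    hits = total = 0
    for k in keys:
        total += 1
        if k in cache:
            hits += 1
            cache.move_to_end(k)           # mark as most recently used
        else:
            cache[k] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / total if total else 0.0
```

For example, cycling ten keys through a 5-entry cache yields a hit rate of 0.0, and a 10-entry cache yields 0.9.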

Re: size tiered compaction - improvement

2012-04-03 Thread igor
If you know for sure that you will free a lot of space by compacting some old table, then you can call UserdefinedCompaction for this table (you can do this from cron). There is also a ticket in JIRA with discussion on per-sstable expired column and tombstone counters.   -Original Message

size tiered compaction - improvement

2012-04-03 Thread Radim Kolar
There is a problem with the size-tiered compaction design. It compacts together tables of similar size. Sometimes it might happen that you will have some sstables sitting on disk forever (Feb 23) because no other similarly sized tables were created, and probably never will be, because a flushed sstable is abo
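Below is a simplified sketch of size-tiered bucketing. It assumes an sstable joins a bucket when its size falls between 0.5x and 1.5x of that bucket's average, and that a bucket becomes eligible for compaction once it holds at least 4 sstables (min_threshold). Real Cassandra also lumps very small sstables together, which this sketch omits. The sketch shows how one large sstable can sit alone in its own bucket indefinitely.

```python
from typing import List, Sequence


def bucket_by_size(sizes: Sequence[int], low: float = 0.5, high: float = 1.5) -> List[List[int]]:
    """Group sstable sizes into buckets of 'similar' size (simplified)."""
    buckets: List[List[int]] = []
    for size in sorted(sizes):
        for b in buckets:
            avg = sum(b) / len(b)
            if avg * low <= size <= avg * high:
                b.append(size)
                break
        else:
            buckets.append([size])  # nothing similar yet: start a new bucket
    return buckets


def compaction_candidates(sizes: Sequence[int], min_threshold: int = 4) -> List[List[int]]:
    # Only buckets with enough similar sstables get compacted.
    return [b for b in bucket_by_size(sizes) if len(b) >= min_threshold]
```

With sizes in MB of [10, 11, 12, 9, 500], the four small sstables are compacted together. The 500 MB one waits until three more sstables of roughly its size exist.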

Re: Write performance compared to Postgresql

2012-04-03 Thread Віталій Тимчишин
Hello. We are using the Java async Thrift client. As for Ruby, it seems you need to use something like http://www.mikeperham.com/2010/02/09/cassandra-and-eventmachine/ (not sure, as I know nothing about Ruby). Best regards, Vitalii Tymchyshyn 2012/4/3 Jeff Williams > Vitalii, > > Yep, that sounds l

RE: Counter Column

2012-04-03 Thread Jeremiah Jordan
Right, it affects every version of Cassandra from 0.8 beta 1 until the Fix Version, which right now is None, so it isn't fixed yet... From: Avi-h [avih...@gmail.com] Sent: Tuesday, April 03, 2012 5:23 AM To: cassandra-u...@incubator.apache.org Subject: Re:

Re: cassandra 1.08 on java7 and win7

2012-04-03 Thread Gopala
puneet loya gmail.com> writes: > > > create keyspace DEMO  > >     with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' > >     and strategy_options=[{datacenter1:1}]; > > try it n check if it executes Hi Puneet, I have the same issue. Running the command you menti

Re: composite query performance depends on component ordering

2012-04-03 Thread Alexandru Sicoe
Hi Sylvain and Aaron, Thanks for the comment, Sylvain; what you say makes sense. I have microsecond-precision timestamps, and looking at some row printouts I see everything is happening at a different timestamp, which means that it won't compare the second 100-byte component. As for the methodology
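Here is a sketch of component-by-component comparison. It assumes a composite of (microsecond timestamp, 100-byte value). The compare_composites function is hypothetical and also reports how many components it examined. That count is 1 whenever the timestamps differ, so the 100-byte component is never touched.

```python
from typing import Sequence, Tuple


def compare_composites(a: Sequence, b: Sequence) -> Tuple[int, int]:
    """Compare two composites component by component.

    Returns (sign, components_examined); stops at the first differing component.
    """
    examined = 0
    for x, y in zip(a, b):
        examined += 1
        if x != y:
            return (-1 if x < y else 1), examined
    # All shared components equal: the shorter composite sorts first.
    return (len(a) > len(b)) - (len(a) < len(b)), examined
```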

Re: 2 questions DataStax Enterprise

2012-04-03 Thread Jake Luciani
Hi reply inline. On Tue, Apr 3, 2012 at 12:18 PM, Alexandru Sicoe wrote: > Hi guys, > I'm trying out DSE and looking for the best way to arrange the cluster. I > have 9 nodes: 3 behind a gateway taking in writes from my collectors and 6 > outside the gateway that are supposed to take replicas f

RE: Write performance compared to Postgresql

2012-04-03 Thread Collard, David L (Dave)
Where is your client running? -Original Message- From: Jeff Williams [mailto:je...@wherethebitsroam.com] Sent: Tuesday, April 03, 2012 11:09 AM To: user@cassandra.apache.org Subject: Re: Write performance compared to Postgresql Vitalii, Yep, that sounds like a good idea. Do you have any

2 questions DataStax Enterprise

2012-04-03 Thread Alexandru Sicoe
Hi guys, I'm trying out DSE and looking for the best way to arrange the cluster. I have 9 nodes: 3 behind a gateway taking in writes from my collectors and 6 outside the gateway that are supposed to take replicas from the other 3 and serve reads and analytics jobs. 1. Is it ok to run the 3 nodes

RE: Write performance compared to Postgresql

2012-04-03 Thread Jeremiah Jordan
So Cassandra may or may not be faster than your current system when you have a couple of connections. Where it is faster, and scales, is when you get hundreds of clients across many nodes. See: http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html With 60 clients running

Re: Write performance compared to Postgresql

2012-04-03 Thread Jeff Williams
Vitalii, Yep, that sounds like a good idea. Do you have any more information about how you're doing that? Which client? Because even with 3 concurrent client nodes, my single PostgreSQL server is still outperforming my 2-node Cassandra cluster, although the gap is narrowing. Jeff On Apr 3, 2

Re: Compression on client side vs server side

2012-04-03 Thread Віталій Тимчишин
We are using client-side compression because of the following points. Can you confirm they are valid? 1) Server-side compression uses replication-factor times more CPU (3 times more with a replication factor of 3). 2) Network usage is higher by the compression factor (as you are sending uncompressed data over the wire). 4
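The following is a toy cost model of points 1) and 2) exactly as stated. It does not reflect how Cassandra's sstable compression is implemented. WriteCost and write_cost are hypothetical, and zlib stands in for whatever compressor the client uses.

```python
import zlib
from dataclasses import dataclass


@dataclass
class WriteCost:
    bytes_on_wire: int   # client -> coordinator payload size
    compress_calls: int  # how many times the value gets compressed


def write_cost(value: bytes, replication_factor: int, client_side: bool) -> WriteCost:
    if client_side:
        # Compress once on the client; replicas store these bytes as-is.
        return WriteCost(len(zlib.compress(value)), 1)
    # Server-side (as modelled in point 1): raw bytes travel, each replica compresses.
    return WriteCost(len(value), replication_factor)
```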

Re: Write performance compared to Postgresql

2012-04-03 Thread Vitalii Tymchyshyn
Note that having tons of TCP connections is not good. We are using an async client to issue multiple calls over a single connection at the same time. You can do the same. Best regards, Vitalii Tymchyshyn. 03.04.12 16:18, Jeff Williams wrote: Ok, so you think the write speed is limited by the cli
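Here is a sketch of keeping many requests in flight over one connection with Python's asyncio. FakeMultiplexedConnection is a hypothetical stand-in that only sleeps to simulate a round trip; it is not a Thrift client. The semaphore caps how many calls are outstanding at once.

```python
import asyncio
import itertools
from typing import Dict, List


class FakeMultiplexedConnection:
    """Stand-in for one connection that tags requests so many can be in flight."""

    def __init__(self, latency: float = 0.01) -> None:
        self._latency = latency
        self._ids = itertools.count()
        self.in_flight = 0
        self.max_in_flight = 0

    async def insert(self, key: str, value: str) -> str:
        req_id = next(self._ids)  # value is ignored by this fake
        self.in_flight += 1
        self.max_in_flight = max(self.max_in_flight, self.in_flight)
        try:
            await asyncio.sleep(self._latency)  # simulated server round trip
            return f"ok:{req_id}:{key}"
        finally:
            self.in_flight -= 1


async def write_all(conn: FakeMultiplexedConnection, rows: Dict[str, str],
                    concurrency: int = 32) -> List[str]:
    sem = asyncio.Semaphore(concurrency)  # cap on outstanding requests

    async def one(k: str, v: str) -> str:
        async with sem:
            return await conn.insert(k, v)

    return await asyncio.gather(*(one(k, v) for k, v in rows.items()))
```

For example, asyncio.run(write_all(conn, rows, concurrency=10)) keeps up to ten inserts outstanding, instead of waiting for each one serially.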

Re: Write performance compared to Postgresql

2012-04-03 Thread Jeff Williams
Ok, so you think the write speed is limited by the client and protocol, rather than the cassandra backend? This sounds reasonable, and fits with our use case, as we will have several servers writing. However, a bit harder to test! Jeff On Apr 3, 2012, at 1:27 PM, Jake Luciani wrote: > Hi Jeff,

Re: Counter Column

2012-04-03 Thread Alain RODRIGUEZ
Sylvain explained a lot of things about counters at Cassandra SF 2011: http://blip.tv/datastax/counters-in-cassandra-5497678 (video), http://www.datastax.com/wp-content/uploads/2011/07/cassandra_sf_counters.pdf (slides). I think it is always important to know how things work. Alain 2012/4/3

Re: Repair in loop?

2012-04-03 Thread Sylvain Lebresne
On Tue, Apr 3, 2012 at 1:55 PM, Nuno Jordao wrote: > Ok, Thank you! :) > > One last question then, is "nodetool repair -pr" enough to recover a failed > node? It's not. It's more for repairing the full cluster (to ensure that all nodes are in sync), in which case you'd want to run "nodetool re
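Because -pr repairs only each node's primary range, covering the whole ring means running it on every node. The following sketch builds those commands and can optionally run them, one node at a time. The host names are hypothetical, and the flag order is an assumption worth checking against nodetool help for your version.

```python
import subprocess
from typing import List, Sequence


def primary_range_repair(hosts: Sequence[str], keyspace: str = "",
                         dry_run: bool = True) -> List[List[str]]:
    """Build (and, unless dry_run, run) 'nodetool -h HOST repair -pr [KEYSPACE]' per node."""
    commands = []
    for host in hosts:
        cmd = ["nodetool", "-h", host, "repair", "-pr"]
        if keyspace:
            cmd.append(keyspace)
        commands.append(cmd)
        if not dry_run:
            # Sequential on purpose: wait for each node's repair before the next.
            subprocess.run(cmd, check=True)
    return commands
```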

RE: Repair in loop?

2012-04-03 Thread Nuno Jordao
Ok, Thank you! :) One last question then, is "nodetool repair -pr" enough to recover a failed node? Nuno -Original Message- From: Sylvain Lebresne [mailto:sylv...@datastax.com] Sent: Tuesday, 3 April 2012 12:38 To: user@cassandra.apache.org Subject: Re: Repair in loop? Import

Re: Repair in loop?

2012-04-03 Thread Sylvain Lebresne
On Tue, Apr 3, 2012 at 12:52 PM, Nuno Jordao wrote: > Thank you for your response. > My question is that it is repeating the same column family: > > INFO 19:12:24,656 [repair #69c95b50-7cee-11e1--6b5cbd036faf] BlockData_b6 > is fully synced (255 remaining column family to sync for this sessio

Re: Write performance compared to Postgresql

2012-04-03 Thread Jake Luciani
Hi Jeff, Writing serially over one connection will be slower. If you run many threads hitting the server at once you will see throughput improve. Jake On Apr 3, 2012, at 7:08 AM, Jeff Williams wrote: > Hi, > > I am looking at cassandra for a logging application. We currently log to a >

Write performance compared to Postgresql

2012-04-03 Thread Jeff Williams
Hi, I am looking at Cassandra for a logging application. We currently log to a PostgreSQL database. I set up 2 Cassandra servers for testing. I did a benchmark where I had 100 hashes representing log entries, read from a JSON file. I then looped over these to do 10,000 log inserts. I repeated

RE: Repair in loop?

2012-04-03 Thread Nuno Jordao
Thank you for your response. My question is that it is repeating the same column family: INFO 19:12:24,656 [repair #69c95b50-7cee-11e1--6b5cbd036faf] BlockData_b6 is fully synced (255 remaining column family to sync for this session) [...] INFO 10:03:50,269 [repair #a66c8240-7d6a-11e1--6b

Re: Repair in loop?

2012-04-03 Thread Sylvain Lebresne
It just means that you have lots of column families and repair does one column family at a time. Each line is just saying it's done with one of the column families. There is nothing wrong, but it does mean the repair is *not* done yet. -- Sylvain On Tue, Apr 3, 2012 at 12:28 PM, Nuno Jordao wrote: > H

Re: Counter Column

2012-04-03 Thread Sylvain Lebresne
Again, it will be relevant until CASSANDRA-2495 is fixed. Until then (then being undefined so far), it affects all versions that have counters (including 1.0.8). -- Sylvain On Tue, Apr 3, 2012 at 12:23 PM, Avi-h wrote: > this bug is for 0.8 beta 1, is it also relevant for 1.0.8? > > > -- > View t

Repair in loop?

2012-04-03 Thread Nuno Jordao
Hello, I'm doing some tests with Cassandra 1.0.8 using multiple data directories with individual disks in a three-node cluster (replica=3). One of the tests was to replace a couple of disks and start a repair process. It started ok and refilled the disks but I noticed that after the recovery proc

Re: Error Replicate on write

2012-04-03 Thread Carlos Juzarte Rolo
It is, but it doesn't load it. I tried the default package manager version (3.2), and versions 3.3 and 3.4, and this node always says it was unable to load JNA. I put the jna.jar inside /.../cassandra/lib/ where the other .jar files are. I have other nodes with the same config (without

Re: Counter Column

2012-04-03 Thread Sylvain Lebresne
On Tue, Apr 3, 2012 at 9:11 AM, Avi-h wrote: > I have encountered the following piece of information regarding the use of > ‘Counter Column’ in Cassandra: “If a write fails unexpectedly (timeout or > loss of connection to the coordinator node) the client will not know if the > operation has been p

Counter Column

2012-04-03 Thread Avi-h
I have encountered the following piece of information regarding the use of ‘Counter Column’ in Cassandra: “If a write fails unexpectedly (timeout or loss of connection to the coordinator node) the client will not know if the operation has been performed. A retry can result in an over count” (- quot
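Here is a toy reproduction of the over-count described in the quote. The increment is applied, the acknowledgement is lost, and a blind retry applies the increment a second time. LossyCoordinator and increment_with_retry are hypothetical.

```python
class LossyCoordinator:
    """Toy counter store whose acknowledgement can be lost after the write applies."""

    def __init__(self) -> None:
        self.value = 0
        self.drop_next_ack = False

    def add(self, delta: int) -> None:
        self.value += delta                 # the increment IS applied...
        if self.drop_next_ack:
            self.drop_next_ack = False
            raise TimeoutError("ack lost")  # ...but the client only sees a timeout


def increment_with_retry(store: LossyCoordinator, delta: int, retries: int = 1) -> None:
    # Counter increments are not idempotent, so a blind retry can double-count.
    for attempt in range(retries + 1):
        try:
            store.add(delta)
            return
        except TimeoutError:
            if attempt == retries:
                raise
```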