Re: Migration from 0.7 to 1.0
Aaron, first of all thanks for your great support.

> I'm paranoid, so I would upgrade 1 node and let it soak in for a few hours. Nothing like upgrading an entire cluster and then discovering a problem.

OK, but as far as my application is concerned, is it safe to keep a cluster that is part 1.0 and part 0.7? I've read that they can communicate, but will it lead to "strange" situations? Will my application continue working (java/pelops)?

> You can take some extra steps when doing a rolling restart, see http://blog.milford.io/2011/11/rolling-upgrades-for-cassandra/

This is what I was looking for! :-) Thanks for the repair tips ...

Best regards,
Carlo

Original message
From: aa...@thelastpickle.com
Date: 04/01/2012 22.00
Subject: Re: Migration from 0.7 to 1.0

> Sounds good.
>
> You can take some extra steps when doing a rolling restart, see http://blog.milford.io/2011/11/rolling-upgrades-for-cassandra/
>
> Also make sure repair *does not* run until all the nodes have been upgraded.
>
>> Do I miss something (I will backup everything before the upgrade)?
>
> I'm paranoid, so I would upgrade 1 node and let it soak in for a few hours. Nothing like upgrading an entire cluster and then discovering a problem.
>
>> As far as maintenance is concerned, is it enough to run a repair every x? (x < GCGraceSeconds)
>
> Once for each node within that time frame: http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
>
> Cheers
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5/01/2012, at 2:47 AM, cbert...@libero.it wrote:
>
>> Hi,
>> I'm going to migrate from Cassandra 0.7 to 1.0 in production and I'd like to know the best way to do it ...
>>
>> "Upgrading from version 0.7.1+ or 0.8.2+ can be done with a rolling restart, one node at a time. (0.8.0 or 0.8.1 are NOT network-compatible with 1.0: upgrade to the most recent 0.8 release first.) You do not need to bring down the whole cluster at once. - After upgrading, run nodetool scrub against each node before running repair, moving nodes, or adding new ones."
>>
>> So what I'd do, for each node, is ...
>>
>> 1 - run nodetool drain
>> 2 - stop the cassandra process
>> 3 - start the new cassandra 1.0
>> 4 - run nodetool scrub on the node
>>
>> Is that right? Do I miss something (I will backup everything before the upgrade)? Should I worry about any particular/known problems? As far as maintenance is concerned, is it enough to run a repair every x? (x < GCGraceSeconds)
>>
>> Best regards,
>> Carlo
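Put together as a shell session, the per-node procedure might look like the sketch below; the service name, install command and host are assumptions about this particular setup, not part of the original advice:

    # Rolling upgrade of one node (a sketch; adapt service/install commands)
    NODE=node1.example.com
    nodetool -h $NODE snapshot                  # backup first
    nodetool -h $NODE drain                     # flush memtables, stop accepting writes
    ssh $NODE 'sudo service cassandra stop'     # stop the 0.7 process
    ssh $NODE 'sudo apt-get install cassandra'  # install 1.0 (your package manager here)
    ssh $NODE 'sudo service cassandra start'    # start the new version
    nodetool -h $NODE scrub                     # rewrite sstables before repair/move/bootstrap
    # Let the node soak, watch the logs, then move to the next node.
    # Do NOT run 'nodetool repair' until every node is on 1.0.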
Re: Should I throttle deletes?
> I use a batch mutator in Pycassa to delete ~1M rows based on a longish list of keys I'm extracting from an auxiliary CF (with no problem of any sort).

What is the size of the deletion batches?

> Now, it appears that such heads-on delete puts a temporary but large load on the cluster. I have SSD's and they go to 100% utilization, and the CPU spikes to significant loads.

Does the load spike during the deletion or after it? Do any of the thread pools back up in nodetool tpstats during the load?

I can think of a few general issues you may want to avoid:

* Each row in a batch mutation is handled by a task in a thread pool on the nodes. So if you send a batch to delete 1,000 rows it will put 1,000 tasks in the Mutation stage. This will reduce the query throughput.
* Lots of deletes in a row will add overhead to reads on that row.

You may want to check for excessive memtable flushing, but if you have default automatic memory management running, lots of deletes should not result in extra flushing.

Hope that helps
Aaron

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/01/2012, at 10:13 AM, Maxim Potekhin wrote:

> Now that my cluster appears to run smoothly and after a few successful repairs and compacts, I'm back in the business of deletion of portions of data based on their date of insertion. For reasons too lengthy to be explained here, I don't want to use TTL.
>
> I use a batch mutator in Pycassa to delete ~1M rows based on a longish list of keys I'm extracting from an auxiliary CF (with no problem of any sort).
>
> Now, it appears that such heads-on delete puts a temporary but large load on the cluster. I have SSD's and they go to 100% utilization, and the CPU spikes to significant loads.
>
> Does anyone do throttling on such a mass-delete procedure?
>
> Thanks in advance,
>
> Maxim
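For reference, a throttled version of the mass delete Maxim describes might look like the pycassa sketch below; the keyspace/CF names, batch size, sleep interval and the key-loading helper are illustrative assumptions, not his actual code:

    import time
    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', ['host1:9160'])
    cf = pycassa.ColumnFamily(pool, 'MyCF')

    keys_to_delete = load_keys_from_auxiliary_cf()  # hypothetical helper returning the ~1M keys
    BATCH_SIZE = 100  # small batches => fewer tasks queued in the Mutation stage

    batch = cf.batch(queue_size=BATCH_SIZE)
    for i, key in enumerate(keys_to_delete):
        batch.remove(key)             # whole-row tombstone
        if (i + 1) % BATCH_SIZE == 0:
            batch.send()
            time.sleep(0.05)          # crude throttle between batches
    batch.send()                      # flush the remainder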
Writes slower than reads
Hi there,

I'm running a cassandra 0.8.6 cluster with 2 nodes (in 2 DC's), RF = 2. Actual data on the nodes is only 1GB. Disk latency < 1ms. Disk throughput ~ 0.4MB/s. OS load always below 1 (on an 8 core machine with 16GB ram).

When I'm running my writes against the cluster with cl = ONE, all reads appear to be faster than the writes.

Average write speed = 1600us/operation
Average read speed = 200us/operation

I'm really wondering why this is the case. Anyone got a clue?

With kind regards,
Robin
Re: Writes slower than reads
What can you see in vmstat/dstat?

On 5 Jan 2012 11:58, "R. Verlangen" wrote:

> Hi there,
>
> I'm running a cassandra 0.8.6 cluster with 2 nodes (in 2 DC's), RF = 2. Actual data on the nodes is only 1GB. Disk latency < 1ms. Disk throughput ~ 0.4MB/s. OS load always below 1 (on an 8 core machine with 16GB ram).
>
> When I'm running my writes against the cluster with cl = ONE, all reads appear to be faster than the writes.
>
> Average write speed = 1600us/operation
> Average read speed = 200us/operation
>
> I'm really wondering why this is the case. Anyone got a clue?
>
> With kind regards,
> Robin
Re: Writes slower than reads
CPU is idle (< 10% usage). Disk reads occasionally block over 32/64K. Writes around 0-5MB per second. Network traffic 0.1 / 0.1 MB/s (in / out). Paging 0. System int ~ 1300, csw ~ 2500.

2012/1/5 Philippe:

> What can you see in vmstat/dstat?
Re: Writes slower than reads
As I posted this I noticed that the other node's CPU is running high on some other cronjobs (every couple of minutes up to 60% usage). Is the lack of more CPU cycles a problem in this case?

Robin

2012/1/5 R. Verlangen:

> CPU is idle (< 10% usage). Disk reads occasionally block over 32/64K. Writes around 0-5MB per second. Network traffic 0.1 / 0.1 MB/s (in / out). Paging 0. System int ~ 1300, csw ~ 2500.
Re: Consistency Level
I missed a ! in the code :) The query will break the token ring into ranges based on the node tokens and then find the UP nodes for each range. I've taken another walk through the code; the logs helped.

In short, you do not have enough UP nodes to support an indexed get at CL ONE. It is working by design and you *should* have gotten an UnavailableException returned.

There must be CL up replicas for each token range. In your test, node 200.190 is down and so is the next node; with RF 2 this means there are no replicas for the range. The log line below is logged just before the UnavailableException is raised:

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 203) Live nodes do not satisfy ConsistencyLevel (1 required)

You will need at least every RF'th node UP. Another way to look at it: if you have RF contiguous nodes DOWN, you cannot perform an indexed get.

If you are interested, this is what the logs are saying…

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,869 StorageProxy.java (line 976) scan ranges are [-1,0],(0,42535295865117307932921825928971026432],(42535295865117307932921825928971026432,85070591730234615865843651857942052864],(85070591730234615865843651857942052864,127605887595351923798765477786913079296],(127605887595351923798765477786913079296,-1]

There are 4 token ranges to query, i.e. we have to make 4 reads to query over the whole cluster.

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,881 ReadCallback.java (line 76) Blockfor/repair is 1/false; setting up requests to /172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line 1003) reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line 1003) reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /172.16.202.118

Starting to read for the first token range. A bug in 0.8.6 makes it read from 202.118 when it does not need to.

> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line 1550) Primary scan clause is member
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line 1563) Expanding slice filter to entire row to cover additional expressions
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line 1605) Scanning index 'Audit_Log.member EQ kamal' starting with
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 SliceQueryFilter.java (line 123) collecting 0 of 100: 7a7a32323636373030303438303031:false:0@1325704860925009
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 ColumnFamilyStore.java (line 1617) fetched ColumnFamily(Audit_Log.Audit_Log_member_idx [7a7a32323636373030303438303031:false:0@1325704860925009,])

Scanned the secondary index on 200.130 and found an entry where the row key 7a7a32323636373030303438303031 matched the index expression.

> DEBUG [ReadStage:2] 2012-01-04 13:44:00,894 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=} to 171@/172.16.200.130

Returning ZERO rows for the query result, because the row key we read above has the token 111413491371349413596553235966977111575, which is not in the first token range from above, (0,42535295865117307932921825928971026432], and this is the range we are interested in now.
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,895 ReadCallback.java (line 76) Blockfor/repair is 1/false; setting up requests to /172.16.202.118
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,896 StorageProxy.java (line 1003) reading org.apache.cassandra.db.IndexScanCommand@10eeb26 from /172.16.202.118

Processing the second range now. There is only one node up for this range, 202.118.

> DEBUG [RequestResponseStage:3] 2012-01-04 13:44:00,913 ResponseVerbHandler.java (line 48) Processing response on a callback from 172@/172.16.202.118
> DEBUG [RequestResponseStage:2] 2012-01-04 13:44:00,913 ResponseVerbHandler.java (line 48) Processing response on a callback from 173@/172.16.202.118

Got the callback from 202.118 for both of the query ranges. The logs on 202.118 show the same local query. But I'm a little confused as to why the row exists on node 2 at all.

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 76) Blockfor/repair is 1/false; setting up requests to

Moving on, time to process the third token range (85070591730234615865843651857942052864,127605887595351923798765477786913079296]

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 203) Live nodes do not satisfy ConsistencyLevel (1 required)

Oh noes, there are no nodes available for that token range. Throw UnavailableException.

Hope that helps.

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/01/2012, at 10:52 AM, Kamal Bahadur wrote:

> Hi Aaron,
>
> Thanks for your response!
>
> I re-ran the test case # 5. (Node 1 & 2 running, Node 3 & 4 down, Node 1 conta
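As an aside, you can check which scan range a given key falls into by recomputing its token client-side; a minimal sketch, assuming the default MD5-based RandomPartitioner (the helper name is mine, and you should verify the result against your own partitioner):

    from hashlib import md5

    def random_partitioner_token(key_bytes):
        # RandomPartitioner token: absolute value of the signed 128-bit MD5 of the key
        return abs(int.from_bytes(md5(key_bytes).digest(), 'big', signed=True))

    row_key = bytes.fromhex('7a7a32323636373030303438303031')  # the key from the logs above
    print(random_partitioner_token(row_key))  # compare against the scan ranges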
Re: is it bad to have lots of column families?
Sort of. Depends. In Cassandra, automatic memory management means the server can support more CF's, and it has apparently been tested with 100's or 1000's of CF's. Having lots of CF's will impact performance by putting memory and IO under pressure, though. If you have 10's you should not have to worry too much. Best thing is to test and post your findings.

Hope that helps.

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/01/2012, at 11:49 AM, Michael Cetrulo wrote:

> in a traditional database it's not a good idea to have hundreds of tables, but is it also bad to have hundreds of column families in cassandra? thank you.
Re: Migration from 0.7 to 1.0
> Ok but as far as my application is concerned is it safe to keep a cluster with part of 1.0 and part of 0.7?

I *think* so, as long as it's for a short time and you do not run any repairs. If 1.0 creates any new files, via mutations or compaction, they will not be readable by 0.7. So a rollback to 0.7 will require going back to the snapshot.

> I've read that they can communicate but will it lead to "strange" situations? Will my application continue working (java/pelops)?

Again, I think so (framed transport is there). But this should be an easy test to do against a dev server.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/01/2012, at 9:33 PM, cbert...@libero.it wrote:

> Aaron, first of all thanks for your great support. OK, but as far as my application is concerned, is it safe to keep a cluster that is part 1.0 and part 0.7? I've read that they can communicate, but will it lead to "strange" situations? Will my application continue working (java/pelops)? This is what I was looking for! :-) Thanks for the repair tips ...
>
> Best regards,
> Carlo
Re: emptying my cluster
Hi,

On Wed, Jan 4, 2012 at 9:54 PM, aaron morton wrote:

> Some thoughts on the plan:
> * You are monkeying around with things, do not be surprised when surprising things happen.

I am just trying to explore different solutions for solving my problem.

> * Deliberately unbalancing the cluster may lead to Bad Things happening.

I will take your advice on this. I would have liked to have an extra node so as to have 2 nodes in each DC.

> * In the design discussed it is perfectly reasonable for data not to be on the archive node.

You mean when having the 2 DC setup I mentioned and using TTL? In case I have the 2 DC setup but don't use TTL, I don't understand why data wouldn't be on the archive node?

> * Truncate is a cluster wide operation and all nodes must be online before it will start.
> * Truncate will snapshot before deleting data, you could use this snapshot.
> * TTL for a column is for a column no matter which node it is on.

Thanks for clarifying these!

> * IMHO Cassandra data files (sstables or JSON dumps) are not a good format for a historical archive, nothing against Cassandra. You need the lowest common format.

So what data format should I use for historical archiving?

> If you have the resources for a second cluster could you put the two together and just have one cluster with a very large retention policy? One cluster is easier than two.

I am constrained to have limited retention on the Cassandra cluster that is collecting the data. Once I archive the data for long term storage, I cannot bring it back into the same Cassandra cluster that collected it in the first place, because it's in an enclosed network with strict rules. I have to load it into another cluster outside the enclosed network. It's not that I have the resources for a second cluster; I am forced to use a second cluster.

> Assuming there is no business case for this, consider either:
> * Dumping the historical data into a Hadoop (with or without HDFS) cluster with high compression. If needed you could then run Hive / Pig to fill a companion Cassandra cluster with data on demand. Or just query using Hadoop.
> * Dumping the historical data to files with high compression and a roll your own solution to fill a cluster.

Ok, thanks for these suggestions, I will have to investigate further.

> Also consider talking to Data Stax about DSE.

Cheers,
Alex

> On 5/01/2012, at 1:41 AM, Alexandru Sicoe wrote:
>> Hi,
>> On Tue, Jan 3, 2012 at 8:19 PM, aaron morton wrote:
>>> Running a time based rolling window of data can be done using the TTL. Backing up the nodes for disaster recovery can be done using snapshots. Restoring any point in time will be tricky because you may restore columns where the TTL has expired.
>> Yeah, that's the thing... if I want to use the system as I explain further below, I cannot do backing up of data (for later restoration) if I'm using TTLs.
>>>> Will I get a single copy of the data in the remote storage or will it be twice the data (data + replica)?
>>> You will have RF copies of the data. (By the way, there is no original copy.)
>> Well, if I organize the cluster as I mentioned in the first email, I will get one copy of each row at a certain point in time on node2 if I take it offline, perform a major compaction and GC, won't I? I don't want to send duplicated data to the mass storage!
>>> Can you share a bit more about the use case? How much data and what sort of read patterns?
>> I have several applications that feed into Cassandra about 2 million different variables (each representing a different monitoring value/channel). The system receives updates for each of these monitoring values at different rates. For each new update, the timestamp and value are recorded in a Cassandra name-value pair. The schema of Cassandra is built using one CF for data and 4 other CFs for metadata (the metadata CFs are static - they barely grow at all once they've been loaded). The data CF uses a row for each variable. Each row acts as a 4 hour time bin. I achieve this by creating the row key as a concatenation of the first 6 digits of the timestamp at which the data is inserted + the unique ID of the variable. After the time bin expires, a new row will be created for the same variable ID.
>> The system can currently sustain the insertion load. Now I'm looking into organizing the flow of data out of the cluster and retrieval performance for random queries:
>> Why do I need to organize the data out? Well, my requirement is to keep all the data coming into the system at the highest granularity for long term (several years). The 3 node cluster I mentioned is the online cluster which is supposed to be able to absorb the input load for a relatively short period of time, a few weeks (I a
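The time-binned row key scheme described in the quoted message can be sketched in a couple of lines; the names are illustrative, and note that truncating a 10-digit epoch timestamp to its first 6 digits gives bins of 10,000 seconds (roughly 2.8 hours):

    import time

    def binned_row_key(variable_id, ts=None):
        ts = int(ts if ts is not None else time.time())  # e.g. 1325761647
        return str(ts)[:6] + variable_id                 # e.g. '132576' + 'var42'

    print(binned_row_key('var42', 1325761647))  # -> '132576var42'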
Composite column docs
Is there a doc for using composite columns with thrift? Is https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/marshal/CompositeType.java the only doc? Does the client need to add the length to the get / get_slice... queries, or is it taken care of on the server side?

Shimi
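For what it's worth, the comments in CompositeType.java describe each component as <2-byte big-endian length><value bytes><1 end-of-component byte>, so a raw Thrift client does have to build the lengths itself. A minimal Python sketch of that layout (the function name and example components are mine):

    import struct

    def pack_composite(components, eoc=b'\x00'):
        # each component: 2-byte big-endian length, the bytes, then an end-of-component byte
        out = b''
        for c in components:
            out += struct.pack('>H', len(c)) + c + eoc
        return out

    # e.g. a (long, utf8) composite column name for get/get_slice
    name = pack_composite([struct.pack('>q', 42), b'username'])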
Re: is it bad to have lots of column families?
My 0.8 production cluster contains around 150 CFs spread across 5 keyspaces. I haven't found that to be an issue (yet?). Some of them are huge (dozens of GB), some are tiny (some MB).

Cheers

2012/1/5 aaron morton:

> Sort of. Depends. In Cassandra, automatic memory management means the server can support more CF's, and it has apparently been tested with 100's or 1000's of CF's. Having lots of CF's will impact performance by putting memory and IO under pressure, though. If you have 10's you should not have to worry too much. Best thing is to test and post your findings.
Re: Writes slower than reads
Depending on the CL you're reading at, yes: if the CL requires that the "slow" node create a digest of the data and send it to the coordinator, then it might explain the poor performance on reads. What is your read CL?

2012/1/5 R. Verlangen:

> As I posted this I noticed that the other node's CPU is running high on some other cronjobs (every couple of minutes up to 60% usage). Is the lack of more CPU cycles a problem in this case?
Re: Writes slower than reads
I'm also reading with CL = ONE.

2012/1/5 Philippe:

> Depending on the CL you're reading at, yes: if the CL requires that the "slow" node create a digest of the data and send it to the coordinator, then it might explain the poor performance on reads. What is your read CL?
Re: Writes slower than reads
What if you shut down the cassandra service on the slow node, does that improve your read performance? If it does, then that sole node is responsible for the slowdown because it can't act as a coordinator fast enough.

2012/1/5 R. Verlangen:

> I'm also reading with CL = ONE.
Re: Writes slower than reads
It does not appear to affect the response time, certainly not in a positive way.

2012/1/5 Philippe:

> What if you shut down the cassandra service on the slow node, does that improve your read performance? If it does, then that sole node is responsible for the slowdown because it can't act as a coordinator fast enough.
Re: Writes slower than reads
You may be overloading the cluster though...

My hypothesis is that your traffic is being spread across your nodes and that one slow node is slowing down the fraction of traffic that goes to that node (when it's acting as coordinator). So what I would do is reduce the read load a lot to make sure I don't overload the cluster, and measure whether I see a 1/RF improvement in response time, which would validate my hypothesis.

2012/1/5 R. Verlangen:

> It does not appear to affect the response time, certainly not in a positive way.
libQtCassandra minus Qt
Good afternoon,

I am curious whether anyone here has taken the libQtCassandra high-level client and stripped out the Qt pieces to make it Qt independent?

Thanks,
David Gosselin
Senior Software Engineer
Acme Packet
(781) 328-2604
Re: Writes slower than reads
The write and read load is very minimal at the moment. Roughly 10 writes + 10 reads / second, so 20 operations per second. Don't think that overloads my cluster, does it?

2012/1/5 Philippe:

> You may be overloading the cluster though... My hypothesis is that your traffic is being spread across your nodes and that one slow node is slowing down the fraction of traffic that goes to that node (when it's acting as coordinator). So what I would do is reduce the read load a lot to make sure I don't overload the cluster, and measure whether I see a 1/RF improvement in response time, which would validate my hypothesis.
Hector and CQL
Hi folks,

I am a beginner with Cassandra. I have a question about installing and using Hector in the Eclipse IDE. I tried to find the answer by googling, but I could not find proper guidance. Would you help me by telling me how to do it, or by pointing me to a proper guide on the internet?

Thank you.
Re: is it bad to have lots of column families?
2012/1/5 Michael Cetrulo:

> in a traditional database it's not a good idea to have hundreds of tables, but is it also bad to have hundreds of column families in cassandra? thank you.

As far as I can see, this may raise memory requirements for you, since you need to have an index/bloom filter for each column family in memory.

--
Best regards,
Vitalii Tymchyshyn
Integration Error between Cassandra and Eclipse
Hi there,

I am a beginner with Cassandra. I have heard from many people that Cassandra is a powerful database which is used by Facebook, Twitter, Digg, etc., so I became interested in studying it further.

When I tried to integrate Cassandra with the Eclipse IDE (in this case using Java), I ran into trouble. I have already followed all the instructions from http://wiki.apache.org/cassandra/RunningCassandraInEclipse, but the tutorial did not work properly: I got a lot of errors and warnings while creating the Java project in Eclipse.

These are the errors and warnings:

Error (1 item):
The method rangeSet(Range...) in the type Range is not applicable for the arguments (Range[]) - RangeTest.java, line 178

Warnings (100 of 2916 items):
AbstractType is a raw type. References to generic type AbstractType should be parameterized - AbstractColumnContainer.java, line 72 (and many similar warnings)

This is what I've done:
1. I checked out cassandra-trunk from the given link using SlikSvn as the svn client.
2. I moved to the cassandra-trunk folder and built with ant using the "ant build" command.
3. I generated eclipse files with ant using the "ant generate-eclipse-files" command.
4. I created a new Java project in Eclipse, set the project name to "cassandra-trunk", and browsed to the cassandra-trunk folder as the location.

Did I make any mistakes? Or is there something wrong with the tutorial at http://wiki.apache.org/cassandra/RunningCassandraInEclipse?

I have already googled for a solution to this problem, but unfortunately found no results. Would you help me by giving me a guide on how to solve this problem?

Thank you very much for your help.

Best Regards,
Wira Saputra
RE: java.lang.AssertionError
Thanks Aaron.

Michael

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Wednesday, January 04, 2012 10:06 PM
To: user@cassandra.apache.org
Subject: Re: java.lang.AssertionError

Will be fixed in 1.0.7: https://issues.apache.org/jira/browse/CASSANDRA-3656

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/01/2012, at 11:26 PM, Michael Vaknine wrote:

Hi,
I have a 4 node cluster on version 1.0.3 which was upgraded from 0.7.6 in 2 stages:
Upgrade to 1.0.0, run scrub on all nodes
Upgrade to 1.0.3

I keep getting these errors from time to time on all 4 nodes. Is there any maintenance I can do to fix the problem? I tried to run repair on the cluster a few times but it did not help. Thanks in advance for your help.

Michael

The kind of errors I get:

NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread
NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 java.lang.AssertionError
NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 at org.apache.cassandra.service.GCInspector.logGCResults(GCInspector.java:103)
NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 at org.apache.cassandra.service.GCInspector.access$000(GCInspector.java:41)
NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 at org.apache.cassandra.service.GCInspector$1.run(GCInspector.java:85)
NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 at java.lang.Thread.run(Thread.java:619)
Re: Writes slower than reads
Unless you are doing huge batches, no... I don't have any other ideas for now...

2012/1/5 R. Verlangen:

> The write and read load is very minimal at the moment. Roughly 10 writes + 10 reads / second, so 20 operations per second. Don't think that overloads my cluster, does it?
Deciding on CF
Hello,

We are working on some new Cassandra requirements and I wanted to get your recommendations on how to put a schema in place, in terms of how many CFs one should have for the scenario below:

1 - There are 10 applications. Out of these, 1 or 2 applications are very active, generating 90%+ of the load.
2 - Every application has 10-15 defined transaction types.
3 - Transaction data needs to be stored in Cassandra, categorized by application (#1), transaction type (#2) and originating server. The size of each transaction record is 5KB. There can be a maximum of 250 million transactions per day. Transaction data can be purged after 60 days. (There are no updates, only inserts.)
4 - Finally, a transaction data report needs to be generated that can be rolled up over a timeline (from the past 5 minutes up to a max of 60 days) by application, transaction type and/or originating server.

I wanted to get the user group's suggestions on how to decide on the number of CFs and the indexing options.
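One possible layout, sketched with pycassa: a single wide-row CF keyed by (application, transaction type, server, day bucket), TimeUUID column names for time ordering, and a 60-day TTL so purging happens automatically. Every name here is an illustrative assumption, not a recommendation from the thread:

    import time
    import pycassa
    from pycassa.util import convert_time_to_uuid

    pool = pycassa.ConnectionPool('Metrics', ['host1:9160'])
    tx = pycassa.ColumnFamily(pool, 'Transactions')  # comparator: TimeUUIDType

    SIXTY_DAYS = 60 * 24 * 3600

    def record(app, tx_type, server, payload):
        day = time.strftime('%Y%m%d')
        key = '%s:%s:%s:%s' % (app, tx_type, server, day)  # day-sized row buckets
        col = convert_time_to_uuid(time.time())            # time-ordered column name
        tx.insert(key, {col: payload}, ttl=SIXTY_DAYS)     # expires after 60 days

A report for a time window then becomes column slices over the day rows that cover the window; whether day-sized buckets are the right granularity depends on how hot the busiest (app, type, server) combinations are.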
Re: Hector and CQL
Hector is a library. It needs to be added to your Eclipse project's "build classpath" somehow before you can begin using it in Eclipse.

On Thu, Jan 05, 2012 at 11:25:16PM +0700, dir dir wrote:

> Hi folks, I am a beginner with Cassandra. I have a question about installing and using Hector in the Eclipse IDE. I tried to find the answer by googling, but I could not find proper guidance. Would you help me by telling me how to do it, or by pointing me to a proper guide on the internet?
> Thank you.
Re: emptying my cluster
>> * In the design discussed it is perfectly reasonable for data not to be on the archive node.
> You mean when having the 2 DC setup I mentioned and using TTL? In case I have the 2 DC setup but don't use TTL, I don't understand why data wouldn't be on the archive node?

Originally you were talking about taking the archive node down, and then having HH write hints back. HH is not considered a reliable mechanism for obtaining consistency; it's better in 1.0, but repair is AFAIK still considered the way to achieve consistency. For example, HH only collects hints for a down node for 1 hour. Also, a read operation will check consistency and may repair it; snapshots do not do that. Finally, if you write into the DC with 2 nodes at a CL other than QUORUM or EACH_QUORUM, there is no guarantee that the write will be committed in the other DC.

> So what data format should I use for historical archiving?

Plain text file, with documentation. So that anyone who follows you can work with the data.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
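A "plain text with documentation" archive could be as simple as the sketch below, which streams a CF through pycassa into a gzipped tab-separated file; the keyspace/CF names and the file layout are assumptions, not part of the advice above:

    import gzip
    import pycassa

    pool = pycassa.ConnectionPool('Monitoring', ['host1:9160'])
    cf = pycassa.ColumnFamily(pool, 'RawData')

    with gzip.open('archive-2012-01.tsv.gz', 'wt') as out:
        out.write('# row_key\tcolumn\tvalue\n')  # document the format inline
        for row_key, cols in cf.get_range():     # streams over all rows
            for col, val in cols.items():
                out.write('%s\t%s\t%s\n' % (row_key, col, val))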
Re: Composite column docs
What client are you using? For example, pycassa has some sweet documentation: http://pycassa.github.com/pycassa/assorted/composite_types.html

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/01/2012, at 12:48 AM, Shimi Kiviti wrote:

> Is there a doc for using composite columns with thrift? Is https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/marshal/CompositeType.java the only doc? Does the client need to add the length to the get / get_slice... queries, or is it taken care of on the server side?
>
> Shimi
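As an example of what such a client gives you, pycassa handles the length-prefixed composite encoding itself and lets you use plain tuples as column names; a minimal sketch with assumed keyspace/CF names:

    import pycassa
    from pycassa.system_manager import SystemManager
    from pycassa.types import CompositeType, LongType, UTF8Type

    sys_mgr = SystemManager('host1:9160')
    sys_mgr.create_column_family('Keyspace1', 'Events',
                                 comparator_type=CompositeType(LongType(), UTF8Type()))

    pool = pycassa.ConnectionPool('Keyspace1', ['host1:9160'])
    events = pycassa.ColumnFamily(pool, 'Events')
    events.insert('row1', {(42, 'name'): 'value'})  # a tuple is a composite column name
    print(events.get('row1'))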
Re: Writes slower than reads
What happens when you turn off the cron jobs?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/01/2012, at 6:57 AM, Philippe wrote:

> Unless you are doing huge batches, no... I don't have any other ideas for now...
Re: Should I throttle deletes?
Hello Aaron,

On 1/5/2012 4:25 AM, aaron morton wrote:

>> I use a batch mutator in Pycassa to delete ~1M rows based on a longish list of keys I'm extracting from an auxiliary CF (with no problem of any sort).
> What is the size of the deletion batches?

2000 mutations.

>> Now, it appears that such heads-on delete puts a temporary but large load on the cluster. I have SSD's and they go to 100% utilization, and the CPU spikes to significant loads.
> Does the load spike during the deletion or after it?

During.

> Do any of the thread pools back up in nodetool tpstats during the load?

Haven't checked, thank you for the lead.

> I can think of a few general issues you may want to avoid:
> * Each row in a batch mutation is handled by a task in a thread pool on the nodes. So if you send a batch to delete 1,000 rows it will put 1,000 tasks in the Mutation stage. This will reduce the query throughput.

Aah. I didn't know that. I was under the impression that batching saves the communication overhead, and that's it. Then I do have a question: what do people generally use as the batch size?

Thanks,
Maxim
Re: Writes slower than reads
I turned off 1 large cronjob that had been pushing the CPU to ~60% usage once every 10 minutes. Both writes and reads are fast now. I just think I was overloading the node. Weird though that shutting down the node did not improve the speed.

Thank you all for your time!
Robin

2012/1/5 aaron morton:

> What happens when you turn off the cron jobs?
Re: Hector and CQL
I hate to admit it, but I use maven to get the classpaths right in Eclipse:

<dependency>
  <groupId>org.apache.cassandra</groupId>
  <artifactId>cassandra-all</artifactId>
  <version>1.0.6</version>
  <type>jar</type>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>org.cassandraunit</groupId>
  <artifactId>cassandra-unit</artifactId>
  <version>1.0.1.1</version>
  <type>jar</type>
  <scope>compile</scope>
</dependency>

Chris Gerken

On Jan 5, 2012, at 12:51 PM, rektide wrote:

> Hector is a library. It needs to be added to your Eclipse project's "build classpath" somehow before you can begin using it in Eclipse.
Re: Hector and CQL
If you are looking to add hector, you'll need:

    <dependency>
      <groupId>me.prettyprint</groupId>
      <artifactId>hector</artifactId>
      <version>1.0-2</version>
    </dependency>

-brian

Brian O'Neill
Lead Architect, Software Development
Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406
p: 215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/

On 1/5/12 3:04 PM, "Chris Gerken" wrote:

>I hate to admit it, but I use maven to get the classpaths right in
>Eclipse:
>
>    <dependency>
>      <groupId>org.apache.cassandra</groupId>
>      <artifactId>cassandra-all</artifactId>
>      <version>1.0.6</version>
>      <type>jar</type>
>      <scope>compile</scope>
>    </dependency>
>    <dependency>
>      <groupId>org.cassandraunit</groupId>
>      <artifactId>cassandra-unit</artifactId>
>      <version>1.0.1.1</version>
>      <type>jar</type>
>      <scope>compile</scope>
>    </dependency>
>
>Chris Gerken
>
>On Jan 5, 2012, at 12:51 PM, rektide wrote:
>
>> Hector is a library. It needs to be added to your Eclipse project's
>> "build classpath" somehow before you can begin using it in Eclipse.
>>
>> On Thu, Jan 05, 2012 at 11:25:16PM +0700, dir dir wrote:
>>> Hi Folks,
>>> I am a beginner user of Cassandra. I have a question about the usage
>>> and integration (or installation) of hector in the Eclipse IDE. I have
>>> tried to find the answer by googling, but I cannot find a proper guide.
>>> Would you help me by telling me how to do it, or by pointing me to a
>>> proper guide on the internet?
>>> Thank you.
Re: Should I throttle deletes?
> Then I do have a question: what do people generally use as the batch size?

I used to do batches from 500 to 2000, like you do. After investigating issues such as the one you've encountered, I've moved to batches of 20 for writes and 256 for reads. Everything is a lot smoother: no more timeouts. The downside, though, is that I have to run more client threads in parallel to maximize throughput.

Cheers
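[Editor's note: for anyone reading along, here is a minimal sketch of the "small batches" approach in pycassa. The keyspace, column family, and host names (MyKeyspace, MyCF, host1) are placeholders, not names from this thread; the batch size of 20 is the figure quoted above.]

    # Hedged sketch: delete many rows in small batches instead of one
    # huge mutation, so the Mutation stage on each node is not flooded.
    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', ['host1:9160'])  # placeholder names
    cf = pycassa.ColumnFamily(pool, 'MyCF')

    def delete_rows(cf, keys, batch_size=20):
        # queue_size tells pycassa how many mutations to buffer before
        # it sends them to the cluster as one batch_mutate call
        b = cf.batch(queue_size=batch_size)
        for key in keys:
            b.remove(key)  # queues a row-level deletion (tombstone)
        b.send()           # flush whatever is still queued

Once the queue fills, pycassa sends automatically, so no single request carries more than batch_size row deletions.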
Re: is it bad to have lots of column families?
Must the index for each CF fit in the node's memory?

2012/1/5 Віталій Тимчишин

> 2012/1/5 Michael Cetrulo
>
>> In a traditional database it's not a good idea to have hundreds of
>> tables, but is it also bad to have hundreds of column families in
>> cassandra? Thank you.
>
> As far as I can see, this may raise memory requirements for you, since
> you need to have an index/bloom filter for each column family in memory.
>
> --
> Best regards,
> Vitalii Tymchyshyn

--
Carlo Pires
62 8209-1444 TIM
62 3251-1383
Skype: carlopires
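[Editor's note: to put rough numbers on that overhead, a back-of-the-envelope sketch follows. The false-positive rate, row count, and CF count are assumptions for illustration, not figures from this thread.]

    # Hedged estimate of bloom filter memory across many column families,
    # using the standard sizing formula bits/key = -ln(p) / ln(2)^2.
    import math

    def bloom_bits_per_key(p):
        return -math.log(p) / (math.log(2) ** 2)

    p = 0.01                # assumed target false-positive rate
    rows_per_cf = 10000000  # assumed 10M rows per column family
    n_cfs = 300             # "hundreds of column families"

    bytes_per_cf = bloom_bits_per_key(p) * rows_per_cf / 8.0
    total_mb = bytes_per_cf * n_cfs / 2.0 ** 20
    print("~%.0f MB per CF, ~%.0f MB across %d CFs"
          % (bytes_per_cf / 2.0 ** 20, total_mb, n_cfs))

Under these assumptions that is roughly 12 MB per CF and several GB across the cluster of column families, which is why per-CF structures add up.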
Re: Should I throttle deletes?
Thanks, that's quite helpful. I'm wondering, though, if multiplying the number of clients will end up doing the same thing.

On 1/5/2012 3:29 PM, Philippe wrote:
>> Then I do have a question: what do people generally use as the batch size?
>
> I used to do batches from 500 to 2000, like you do. After investigating
> issues such as the one you've encountered, I've moved to batches of 20 for
> writes and 256 for reads. Everything is a lot smoother: no more timeouts.
> The downside, though, is that I have to run more client threads in
> parallel to maximize throughput.
>
> Cheers
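[Editor's note: a hedged sketch of the distinction being drawn here: more clients do repeat the total load, but each individual request stays small, which is what keeps the timeouts away. It builds on the pycassa snippet earlier in this thread; worker counts and the load_keys_from_auxiliary_cf helper are hypothetical.]

    # Hedged sketch: spread small delete batches across worker threads.
    # Total work is unchanged, but no single request is large enough to
    # monopolize a node's Mutation stage.
    from concurrent.futures import ThreadPoolExecutor
    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', ['host1:9160'], pool_size=8)
    cf = pycassa.ColumnFamily(pool, 'MyCF')

    def delete_chunk(chunk):
        b = cf.batch(queue_size=len(chunk))
        for key in chunk:
            b.remove(key)
        b.send()

    def chunks(keys, size=20):
        # split the key list into batches of `size`
        for i in range(0, len(keys), size):
            yield keys[i:i + size]

    keys = load_keys_from_auxiliary_cf()  # hypothetical helper returning row keys
    with ThreadPoolExecutor(max_workers=8) as ex:
        list(ex.map(delete_chunk, chunks(keys)))  # list() surfaces any exceptions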
Re: Integration Error between Cassandra and Eclipse
I wouldn't worry about the warnings. Eclipse's Java support defaults to fairly restrictive warning settings. You can go into the preferences for Java->Compiler and change the 'warning' settings to 'ignore' for any of those problems that you don't or shouldn't really care about.

As for the error, is that a test class or part of the main source body?

Chris Gerken

On Jan 5, 2012, at 11:04 AM, bobby saputra wrote:

> Hi There,
>
> I am a beginner user of Cassandra. I have heard from many people that
> Cassandra is a powerful database used by Facebook, Twitter, Digg, etc.,
> so I am interested in studying it further.
>
> When I tried to integrate Cassandra with the Eclipse IDE (in this case I
> use Java as the programming language), I ran into trouble and many
> problems. I followed all the instructions at
> http://wiki.apache.org/cassandra/RunningCassandraInEclipse, but the
> tutorial did not work properly: I got a lot of errors and warnings while
> creating the Java project in eclipse.
>
> These are the errors and warnings:
>
> Error (1 item):
> Description: The method rangeSet(Range...) in the type Range is not
> applicable for the arguments (Range[])
> Resource: RangeTest.java, line 178
>
> Warnings (100 of 2916 items):
> Description: AbstractType is a raw type. References to generic type
> AbstractType should be parameterized
> Resource: AbstractColumnContainer.java, line 72
> (and many similar warnings)
>
> This is what I've done:
> 1. I checked out cassandra-trunk from the given link using SlikSvn as the
> svn client.
> 2. I moved to the cassandra-trunk folder and built with ant using the
> "ant build" command.
> 3. I generated eclipse files with ant using the "ant
> generate-eclipse-files" command.
> 4. I created a new java project in eclipse, entered "cassandra-trunk" as
> the project name, and browsed to the cassandra-trunk folder as the
> location.
>
> Did I make any mistakes? Or is there something wrong with the tutorial at
> http://wiki.apache.org/cassandra/RunningCassandraInEclipse?
>
> I have already googled for a solution to this problem, but unfortunately
> found no results. Would you help me by giving me a guide on how to solve
> this problem? Please.
>
> Thank you very much for your help.
>
> Best Regards,
> Wira Saputra
Re: Integration Error between Cassandra and Eclipse
How about using "File->Import..." rather than "File->New Java Project"?

After extracting the source, ant build, and ant generate-eclipse-files:
1. File->Import...
2. Choose "Existing Project into Workspace..."
3. Choose your source directory as the root directory and then push "Finish"

2012/1/6 bobby saputra:
> Hi There,
>
> I am a beginner user of Cassandra. I have heard from many people that
> Cassandra is a powerful database used by Facebook, Twitter, Digg, etc.,
> so I am interested in studying it further.
>
> When I tried to integrate Cassandra with the Eclipse IDE (in this case I
> use Java as the programming language), I ran into trouble and many
> problems. I followed all the instructions at
> http://wiki.apache.org/cassandra/RunningCassandraInEclipse, but the
> tutorial did not work properly: I got a lot of errors and warnings while
> creating the Java project in eclipse.
>
> These are the errors and warnings:
>
> Error (1 item):
> Description: The method rangeSet(Range...) in the type Range is not
> applicable for the arguments (Range[])
> Resource: RangeTest.java, line 178
>
> Warnings (100 of 2916 items):
> Description: AbstractType is a raw type. References to generic type
> AbstractType should be parameterized
> Resource: AbstractColumnContainer.java, line 72
> (and many similar warnings)
>
> This is what I've done:
> 1. I checked out cassandra-trunk from the given link using SlikSvn as the
> svn client.
> 2. I moved to the cassandra-trunk folder and built with ant using the
> "ant build" command.
> 3. I generated eclipse files with ant using the "ant
> generate-eclipse-files" command.
> 4. I created a new java project in eclipse, entered "cassandra-trunk" as
> the project name, and browsed to the cassandra-trunk folder as the
> location.
>
> Did I make any mistakes? Or is there something wrong with the tutorial at
> http://wiki.apache.org/cassandra/RunningCassandraInEclipse?
>
> I have already googled for a solution to this problem, but unfortunately
> found no results. Would you help me by giving me a guide on how to solve
> this problem? Please.
>
> Thank you very much for your help.
>
> Best Regards,
> Wira Saputra

--
w3m
Re: Integration Error between Cassandra and Eclipse
Sorry, ignore my reply. I got the same result with import (1 error in unit test code & many warnings).

2012/1/6 Maki Watanabe:
> How about using "File->Import..." rather than "File->New Java Project"?
>
> After extracting the source, ant build, and ant generate-eclipse-files:
> 1. File->Import...
> 2. Choose "Existing Project into Workspace..."
> 3. Choose your source directory as the root directory and then push
> "Finish"
>
> 2012/1/6 bobby saputra:
>> Hi There,
>>
>> I am a beginner user of Cassandra. I have heard from many people that
>> Cassandra is a powerful database used by Facebook, Twitter, Digg, etc.,
>> so I am interested in studying it further.
>>
>> When I tried to integrate Cassandra with the Eclipse IDE (in this case I
>> use Java as the programming language), I ran into trouble and many
>> problems. I followed all the instructions at
>> http://wiki.apache.org/cassandra/RunningCassandraInEclipse, but the
>> tutorial did not work properly: I got a lot of errors and warnings while
>> creating the Java project in eclipse.
>>
>> These are the errors and warnings:
>>
>> Error (1 item):
>> Description: The method rangeSet(Range...) in the type Range is not
>> applicable for the arguments (Range[])
>> Resource: RangeTest.java, line 178
>>
>> Warnings (100 of 2916 items):
>> Description: AbstractType is a raw type. References to generic type
>> AbstractType should be parameterized
>> Resource: AbstractColumnContainer.java, line 72
>> (and many similar warnings)
>>
>> This is what I've done:
>> 1. I checked out cassandra-trunk from the given link using SlikSvn as
>> the svn client.
>> 2. I moved to the cassandra-trunk folder and built with ant using the
>> "ant build" command.
>> 3. I generated eclipse files with ant using the "ant
>> generate-eclipse-files" command.
>> 4. I created a new java project in eclipse, entered "cassandra-trunk" as
>> the project name, and browsed to the cassandra-trunk folder as the
>> location.
>>
>> Did I make any mistakes? Or is there something wrong with the tutorial
>> at http://wiki.apache.org/cassandra/RunningCassandraInEclipse?
>>
>> I have already googled for a solution to this problem, but unfortunately
>> found no results. Would you help me by giving me a guide on how to solve
>> this problem? Please.
>>
>> Thank you very much for your help.
>>
>> Best Regards,
>> Wira Saputra

--
w3m
Re: Integration Error between Cassandra and Eclipse
Also note that the Cassandra project switched from svn to git. See the "Source control" section of http://cassandra.apache.org/download/ .

Regards,
Yuki

--
Yuki Morishita

On Thursday, January 5, 2012 at 7:59 PM, Maki Watanabe wrote:
> Sorry, ignore my reply. I got the same result with import (1 error in
> unit test code & many warnings).
>
> 2012/1/6 Maki Watanabe:
>> How about using "File->Import..." rather than "File->New Java Project"?
>>
>> After extracting the source, ant build, and ant generate-eclipse-files:
>> 1. File->Import...
>> 2. Choose "Existing Project into Workspace..."
>> 3. Choose your source directory as the root directory and then push
>> "Finish"
>>
>> 2012/1/6 bobby saputra:
>>> Hi There,
>>>
>>> I am a beginner user of Cassandra. I have heard from many people that
>>> Cassandra is a powerful database used by Facebook, Twitter, Digg,
>>> etc., so I am interested in studying it further.
>>>
>>> When I tried to integrate Cassandra with the Eclipse IDE (in this case
>>> I use Java as the programming language), I ran into trouble and many
>>> problems. I followed all the instructions at
>>> http://wiki.apache.org/cassandra/RunningCassandraInEclipse, but the
>>> tutorial did not work properly: I got a lot of errors and warnings
>>> while creating the Java project in eclipse.
>>>
>>> These are the errors and warnings:
>>>
>>> Error (1 item):
>>> Description: The method rangeSet(Range...) in the type Range is not
>>> applicable for the arguments (Range[])
>>> Resource: RangeTest.java, line 178
>>>
>>> Warnings (100 of 2916 items):
>>> Description: AbstractType is a raw type. References to generic type
>>> AbstractType should be parameterized
>>> Resource: AbstractColumnContainer.java, line 72
>>> (and many similar warnings)
>>>
>>> This is what I've done:
>>> 1. I checked out cassandra-trunk from the given link using SlikSvn as
>>> the svn client.
>>> 2. I moved to the cassandra-trunk folder and built with ant using the
>>> "ant build" command.
>>> 3. I generated eclipse files with ant using the "ant
>>> generate-eclipse-files" command.
>>> 4. I created a new java project in eclipse, entered "cassandra-trunk"
>>> as the project name, and browsed to the cassandra-trunk folder as the
>>> location.
>>>
>>> Did I make any mistakes? Or is there something wrong with the tutorial
>>> at http://wiki.apache.org/cassandra/RunningCassandraInEclipse?
>>>
>>> I have already googled for a solution to this problem, but
>>> unfortunately found no results. Would you help me by giving me a guide
>>> on how to solve this problem? Please.
>>>
>>> Thank you very much for your help.
>>>
>>> Best Regards,
>>> Wira Saputra
>
> --
> w3m
RE: Integration Error between Cassandra and Eclipse
Hi,
Can you post the error (you mention there is only 1 error)? That will make things clearer.

Thanks
Kuldeep Singh Sengar
Opera Solutions
Tech Boulevard, 8th floor, Tower C, Sector 127, Plot No 6, Noida 201 301
+91 (120) 4642424 facsimile, Ext: 2418
+91 8800595878 (M)

-----Original Message-----
From: Maki Watanabe [mailto:watanabe.m...@gmail.com]
Sent: Friday, January 06, 2012 7:30 AM
To: user@cassandra.apache.org
Subject: Re: Integration Error between Cassandra and Eclipse

Sorry, ignore my reply. I got the same result with import (1 error in unit test code & many warnings).

2012/1/6 Maki Watanabe:
> How about using "File->Import..." rather than "File->New Java Project"?
>
> After extracting the source, ant build, and ant generate-eclipse-files:
> 1. File->Import...
> 2. Choose "Existing Project into Workspace..."
> 3. Choose your source directory as the root directory and then push
> "Finish"
>
> 2012/1/6 bobby saputra:
>> Hi There,
>>
>> I am a beginner user of Cassandra. I have heard from many people that
>> Cassandra is a powerful database used by Facebook, Twitter, Digg, etc.,
>> so I am interested in studying it further.
>>
>> When I tried to integrate Cassandra with the Eclipse IDE (in this case I
>> use Java as the programming language), I ran into trouble and many
>> problems. I followed all the instructions at
>> http://wiki.apache.org/cassandra/RunningCassandraInEclipse, but the
>> tutorial did not work properly: I got a lot of errors and warnings while
>> creating the Java project in eclipse.
>>
>> These are the errors and warnings:
>>
>> Error (1 item):
>> Description: The method rangeSet(Range...) in the type Range is not
>> applicable for the arguments (Range[])
>> Resource: RangeTest.java, line 178
>>
>> Warnings (100 of 2916 items):
>> Description: AbstractType is a raw type. References to generic type
>> AbstractType should be parameterized
>> Resource: AbstractColumnContainer.java, line 72
>> (and many similar warnings)
>>
>> This is what I've done:
>> 1. I checked out cassandra-trunk from the given link using SlikSvn as
>> the svn client.
>> 2. I moved to the cassandra-trunk folder and built with ant using the
>> "ant build" command.
>> 3. I generated eclipse files with ant using the "ant
>> generate-eclipse-files" command.
>> 4. I created a new java project in eclipse, entered "cassandra-trunk" as
>> the project name, and browsed to the cassandra-trunk folder as the
>> location.
>>
>> Did I make any mistakes? Or is there something wrong with the tutorial
>> at http://wiki.apache.org/cassandra/RunningCassandraInEclipse?
>>
>> I have already googled for a solution to this problem, but unfortunately
>> found no results. Would you help me by giving me a guide on how to solve
>> this problem? Please.
>>
>> Thank you very much for your help.
>>
>> Best Regards,
>> Wira Saputra

--
w3m
Re: Integration Error between Cassandra and Eclipse
This works for me: http://wiki.apache.org/cassandra/HowToDebug

On 01/06/2012 01:18 AM, Kuldeep Sengar wrote:
> Hi,
> Can you post the error (you mention there is only 1 error)? That will
> make things clearer.
>
> Thanks
> Kuldeep Singh Sengar
> Opera Solutions
> Tech Boulevard, 8th floor, Tower C, Sector 127, Plot No 6, Noida 201 301
> +91 (120) 4642424 facsimile, Ext: 2418
> +91 8800595878 (M)
>
> -----Original Message-----
> From: Maki Watanabe [mailto:watanabe.m...@gmail.com]
> Sent: Friday, January 06, 2012 7:30 AM
> To: user@cassandra.apache.org
> Subject: Re: Integration Error between Cassandra and Eclipse
>
> Sorry, ignore my reply. I got the same result with import (1 error in
> unit test code & many warnings).
>
> 2012/1/6 Maki Watanabe:
>> How about using "File->Import..." rather than "File->New Java Project"?
>>
>> After extracting the source, ant build, and ant generate-eclipse-files:
>> 1. File->Import...
>> 2. Choose "Existing Project into Workspace..."
>> 3. Choose your source directory as the root directory and then push
>> "Finish"
>>
>> 2012/1/6 bobby saputra:
>>> Hi There,
>>>
>>> I am a beginner user of Cassandra. I have heard from many people that
>>> Cassandra is a powerful database used by Facebook, Twitter, Digg,
>>> etc., so I am interested in studying it further.
>>>
>>> When I tried to integrate Cassandra with the Eclipse IDE (in this case
>>> I use Java as the programming language), I ran into trouble and many
>>> problems. I followed all the instructions at
>>> http://wiki.apache.org/cassandra/RunningCassandraInEclipse, but the
>>> tutorial did not work properly: I got a lot of errors and warnings
>>> while creating the Java project in eclipse.
>>>
>>> These are the errors and warnings:
>>>
>>> Error (1 item):
>>> Description: The method rangeSet(Range...) in the type Range is not
>>> applicable for the arguments (Range[])
>>> Resource: RangeTest.java, line 178
>>>
>>> Warnings (100 of 2916 items):
>>> Description: AbstractType is a raw type. References to generic type
>>> AbstractType should be parameterized
>>> Resource: AbstractColumnContainer.java, line 72
>>> (and many similar warnings)
>>>
>>> This is what I've done:
>>> 1. I checked out cassandra-trunk from the given link using SlikSvn as
>>> the svn client.
>>> 2. I moved to the cassandra-trunk folder and built with ant using the
>>> "ant build" command.
>>> 3. I generated eclipse files with ant using the "ant
>>> generate-eclipse-files" command.
>>> 4. I created a new java project in eclipse, entered "cassandra-trunk"
>>> as the project name, and browsed to the cassandra-trunk folder as the
>>> location.
>>>
>>> Did I make any mistakes? Or is there something wrong with the tutorial
>>> at http://wiki.apache.org/cassandra/RunningCassandraInEclipse?
>>>
>>> I have already googled for a solution to this problem, but
>>> unfortunately found no results. Would you help me by giving me a guide
>>> on how to solve this problem? Please.
>>>
>>> Thank you very much for your help.
>>>
>>> Best Regards,
>>> Wira Saputra
>
> --
> w3m
Re: Dealing with "Corrupt (negative) value length encountered"
Thanks Aaron, I was able to complete the repair by scrubbing the column family on all three replicas.

Cheers

2012/1/4 aaron morton

>> I was able to scrub the node the failed repair was running on. Are you
>> saying the error could be displayed on that node but the bad data could
>> be coming from another node?
>
> Yes. The error occurred while the node was receiving a data stream from
> another node, so you will need to clean the source of the data. You can
> either crawl through the logs or scrub the entire cluster.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 4/01/2012, at 9:15 AM, Philippe wrote:
>
>> I was able to scrub the node the failed repair was running on. Are you
>> saying the error could be displayed on that node but the bad data could
>> be coming from another node?
>>
>> Log inspection also showed many of these; they seem to happen around
>> when a stream transfer finishes.
>>
>> ERROR [Thread-550876] 2012-01-03 16:35:31,922 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Thread-550876,5,main]
>> java.lang.IllegalArgumentException
>>         at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:586)
>>         at org.apache.cassandra.streaming.IncomingStreamReader.readnwrite(IncomingStreamReader.java:110)
>>         at org.apache.cassandra.streaming.IncomingStreamReader.readFile(IncomingStreamReader.java:85)
>>         at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:61)
>>         at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:189)
>>         at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117)
>>
>> Thanks
>>
>> 2012/1/2 aaron morton
>>
>>> I would try running nodetool scrub on the node that sent the bad data
>>> in the stream. You may be able to work out which node it was from the
>>> logs, or it may be easier to just scrub them all.
>>>
>>> Hope that helps.
>>>
>>> -
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 31/12/2011, at 12:20 AM, Philippe wrote:
>>>
>>>> Hello,
>>>> Running a combination of 0.8.6 and 0.8.8 with RF=3, I am getting the
>>>> following while repairing one node (all other nodes completed
>>>> successfully). Can I just stop the instance, erase the SSTable and
>>>> restart cleanup?
>>>> Thanks
>>>>
>>>> ERROR [Thread-402484] 2011-12-29 14:51:03,687 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Thread-402484,5,main]
>>>> java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.io.IOError: java.io.IOException: Corrupt (negative) value length encountered
>>>>         at org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:154)
>>>>         at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:63)
>>>>         at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:189)
>>>>         at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117)
>>>> Caused by: java.util.concurrent.ExecutionException: java.io.IOError: java.io.IOException: Corrupt (negative) value length encountered
>>>>         at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>>>>         at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>>>>         at org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:138)
>>>>         ... 3 more