Re: Commit log on USB flash disk?

2013-11-16 Thread Philippe
Hi David, we tried it two years ago and the performance of the USB stick was so dismal we stopped. Cheers On 16 Nov 2013 15:13, "David Tinker" wrote: > Our hosting provider has a cost effective server with 2 x 4TB disks > with a 16G (or 64G) USB thumb drive option. Would it make sense to put

Troubleshooting IO performance ?

2011-06-04 Thread Philippe
I run raises the IOs to 80% utilization of the SSD drives even though I'm running the same query over and over (no cache??) Any ideas on how to troubleshoot this, or better, how to solve this ? thanks Philippe

Re: Troubleshooting IO performance ?

2011-06-06 Thread Philippe
gain. On 5 Jun 2011 16:55, "Jonathan Ellis" wrote: > You may be swapping. > > http://spyced.blogspot.com/2010/01/linux-performance-basics.html > explains how to check this as well as how to see what threads are busy > in the Java process. > > On Sat, Jun 4, 2011

Re: Troubleshooting IO performance ?

2011-06-06 Thread Philippe
ery in a rollup table that was originally in MySQL and it doesn't look like the performance to query by key is better. So I'm betting I'm doing something wrong here... but what ? Any ideas ? Thanks 2011/6/6 Philippe > hum..no, it wasn't swapping. cassandra was the only

Re: Troubleshooting IO performance ?

2011-06-07 Thread Philippe
howing during the slow down ? >> - exactly how much data are you asking for ? how many rows and what sort of >> slice >> - have there been a lot of deletes or TTL columns used ? >> >> Hope that helps. >> Aaron >> >> - >> Aaron Mort

Re: Troubleshooting IO performance ?

2011-06-07 Thread Philippe
ng to be a read-heavy, update-heavy cluster. No TTL columns, no counter columns One question : when nodetool cfstats says the average read latency is 5ms, is that counted once the query is being executed or does that include the time spent "pending" ? Thanks Philippe > > Ho

Re: Troubleshooting IO performance ?

2011-06-10 Thread Philippe
right ? So what are my options ? My rows are very small at the moment (well under 4 KB). Should I reduce the read buffer ? Should I reduce the number of SSTables ? Thanks Philippe > > Hope that helps. > > - > Aaron Morton > Freelance Cassandra D

Re: Where is the Overview Documentation on Counters?

2011-06-10 Thread Philippe
Ian, Have you been able to measure the performance penalty of running at CL=ALL ? Right now I'm spreading updates over such counter columns across workers so they don't overlap keys and that way I don't go to CL=ALL but maybe that's not worth it? Any input? Thanks Philippe 2

Re: Troubleshooting IO performance ?

2011-06-11 Thread Philippe
More info below > I just loaded 4.8GB of similar data in another keyspace and ran the same > process as in my previous tests but on that data. > I started with three threads hitting cassandra. No I/O, hardly any CPU (15% > on a 4 core server) > After an hour or so, I raised it to 6 threads in par

Read performance vs. vmstat + your experience with read optimizations

2011-06-20 Thread Philippe
Hi all, I am having trouble reconciling various metrics regarding reads so I'm hoping someone here can help me understand what's going on. I am running tests on a single node cluster with 16GB of RAM. I'm testing on the following column family: Column Family: PUBLIC_MONTHLY

Questions about Cassandra reads

2011-06-24 Thread Philippe
Hello, I am trying to understand the way Cassandra reads data. I've been reading a lot and here is what I understand. Can I get some feedback on the following claims ? Which are right and which are wrong? A) Upon opening an SSTable for read, Cassandra samples one key in 100 to speed up disk acce

Re: Decorator Algorithm

2011-06-27 Thread Philippe
A quick follow-up on this: when using the ByteOrderedPartitioner, how does a short key get mapped to the 128-bit token ? What about keys longer than 128 ? Does Cassandra just pad and truncate ? Thanks On 24 Jun 2011 04:53, "Maki Watanabe" wrote: > A little addendum > > Key := Your data to identi

Re: Counter Column

2011-06-27 Thread Philippe
If I write at ALL and read at ONE, is that setting required ? Thanks On 27 Jun 2011 17:22, "Donal Zang" wrote: > On 27/06/2011 17:04, Artem Orobets wrote: >> >> Hi! >> >> As I know, we use counter column only with replication factor ALL, so >> is it mean that we can't read data while any replic

CL.ALL & counters

2011-07-12 Thread Philippe
Hello, I'm testing using a 2 node cluster. My CF only has counters and I have replicate_on_write=true. I loaded data into the CF using CL=ALL. At some point, it couldn't write anymore (looked like a GC froze one of the machines) which makes sense. However, I happened to run nodetool repair it is h

Re: Questions about Cassandra reads

2011-07-12 Thread Philippe
Hi Jonathan, Thanks for the answer, I wanted to report on the improvements I got because someone else is bound to run into the same questions... > > C) I want to access a key that is at the 50th position in that table, > > Cassandra will seek position 0 and then do a sequential read of the file >

Re: Slow Reads

2011-07-26 Thread Philippe
I believe it's because it needs to read the whole row to get to your super column. You might have to reconsider your model. On 26 Jul 2011 17:39, "Priyanka" wrote: > > Hello All, > > I am doing some read tests on Cassandra on a single node.But they > are turning up to be very slow. > Here is

Read Repairs

2011-07-30 Thread Philippe
Hello, I have a 3-node ring at RF=3 that is doing reads & writes. I am using two types of consistency levels. - write ALL,read ONE for one set of column families - write QUORUM, read ONE for another set of families Every day, I have a cron job that runs a nodetool repair on each node. The n
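The two consistency-level combinations above can be compared with the usual overlap rule: a read is guaranteed to see the latest acknowledged write only when the number of replicas read plus the number written exceeds RF. A minimal sketch, assuming the standard replica counts (ONE = 1, QUORUM = RF // 2 + 1, ALL = RF); the helper names are hypothetical:

```python
def replicas_for(level: str, rf: int) -> int:
    """Replicas that must respond for a consistency level (simplified)."""
    counts = {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}
    return counts[level]


def reads_see_latest_write(write_cl: str, read_cl: str, rf: int) -> bool:
    # Read and write replica sets are guaranteed to overlap when R + W > RF.
    return replicas_for(write_cl, rf) + replicas_for(read_cl, rf) > rf


for write_cl, read_cl in [("ALL", "ONE"), ("QUORUM", "ONE")]:
    print(write_cl, read_cl, reads_see_latest_write(write_cl, read_cl, rf=3))
```

With RF=3, write ALL / read ONE gives 3 + 1 > 3, while write QUORUM / read ONE gives 2 + 1 = 3, so reads on the second set of column families can return stale data until read repair or nodetool repair brings the replicas back in sync.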

Write everywhere, read anywhere

2011-08-03 Thread Philippe
Hello, I have a 3-node, RF=3, cluster configured to write at CL.ALL and read at CL.ONE. When I take one of the nodes down, writes fail which is what I expect. When I run a repair, I see data being streamed from those column families... that I didn't expect. How can the nodes diverge ? Does this mea

Dropped messages

2011-08-05 Thread Philippe
Hi, I see lines like this in my log file INFO [ScheduledTasks:1] 2011-08-06 00:51:57,650 MessagingService.java (line 586) 358 MUTATION messages dropped in server lifetime INFO [ScheduledTasks:1] 2011-08-06 00:51:57,658 MessagingService.java (line 586) 297 READ messages dropped in server lifetime
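The counts in these MessagingService lines are cumulative ("in server lifetime"), so comparing successive values shows whether messages are still being dropped. A minimal sketch for extracting them from log lines of exactly the form quoted above; the regex and the `dropped_counts` helper are assumptions, not part of Cassandra:

```python
import re

# Matches the tail of lines such as
# "... MessagingService.java (line 586) 358 MUTATION messages dropped in server lifetime"
DROPPED = re.compile(r"(\d+) (\w+) messages dropped in server lifetime")


def dropped_counts(log_lines):
    """Return {message type: most recent lifetime drop count}."""
    counts = {}
    for line in log_lines:
        m = DROPPED.search(line)
        if m:
            # Counts are cumulative, so the latest line for a type wins.
            counts[m.group(2)] = int(m.group(1))
    return counts


sample = [
    "INFO [ScheduledTasks:1] 2011-08-06 00:51:57,650 MessagingService.java (line 586) "
    "358 MUTATION messages dropped in server lifetime",
    "INFO [ScheduledTasks:1] 2011-08-06 00:51:57,658 MessagingService.java (line 586) "
    "297 READ messages dropped in server lifetime",
]
print(dropped_counts(sample))  # {'MUTATION': 358, 'READ': 297}
```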

Re: Dropped messages

2011-08-07 Thread Philippe
tp://www.thelastpickle.com > > On 6 Aug 2011, at 10:53, Philippe wrote: > > Hi, > I see lines like this in my log file > INFO [ScheduledTasks:1] 2011-08-06 00:51:57,650 MessagingService.java > (line 586) 358 MUTATION messages dropped in server lifetime > INFO [ScheduledTas

batch mutates & throughput

2011-08-07 Thread Philippe
A question regarding batch mutates and how others might be throttling the system to prevent timeouts. My 3-node, RF=3 cluster has been performing ok while bulk loading data (applying counter updates). I've been able to run 16 threads in parallel that each perform about 400 mutates/s on a loaded cl

Replicate on write stage errors

2011-08-07 Thread Philippe
Hello, I've got new errors showing up in my Cassandra log file since I started testing batch mutates (and it failed). I have done a rolling restart and they are not disappearing. How can I fix this ? What is this really saying about my data and my cluster ? Thanks ERROR [ReplicateOnWriteStage:35

Re: batch mutates & throughput

2011-08-07 Thread Philippe
Quick followup. I have pushed the RPC timeout to 30s. Using Hector, I'm doing 1 thread doing batches of 10 mutates at a time so that's even slower than when I was doing 16 threads in parallel doing non-batched mutations. After a couple hundred execute() calls, I get a timeout for every node; I have

Re: batch mutates & throughput

2011-08-07 Thread Philippe
setting "cassandraThriftSocketTimeout" > of hector. https://github.com/rantav/hector/wiki/User-Guide > > > On Mon, Aug 8, 2011 at 6:54 AM, Philippe wrote: > >> Quick followup. >> I have pushed the RPC timeout to 30s. Using Hector, I'm doing 1 thread >> doing batches of 10

Re: batch mutates & throughput

2011-08-08 Thread Philippe
le weeks now so it will take a while to sanitize stuff before posting it. This is now hitting me in another part of the app where I had batched stuff... oh well On Mon, Aug 8, 2011 at 12:34 AM, Philippe wrote: > > Hi Boris, > > Thanks for the suggestion, I didn't know there was on

Re: batch mutates & throughput

2011-08-08 Thread Philippe
he.cassandra.service.StorageProxy$5$1.runMayThrow(StorageProxy.java:455) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) ... 3 more > > On Mon, Aug 8, 2011 at 12:34 AM, Philippe wrote: > > Hi Boris, > > Thanks for the suggesti

Re: Replicate on write stage errors

2011-08-08 Thread Philippe
Thanks Sylvain. I'd ended up finding that issue, and you answered my subsequent questions. Thanks again. On 8 Aug 2011 19:32, "Sylvain Lebresne" wrote:

Unable to repair a node

2011-08-14 Thread Philippe
Hello, I've been fighting with my cluster for a couple days now... Running 0.8.1.3, using Hector and load balancing requests across all nodes. My question is : how do I get my node back under control so that it runs like the other two nodes ? It's a 3 node, RF=3 cluster with reads & writes at CL=QUO

Re: Unable to repair a node

2011-08-14 Thread Philippe
SSTables went back to normal for the other keyspace. The first repair is still not over. 2011/8/14 Philippe > Hello, I've been fighting with my cluster for a couple days now... Running > 0.8.1.3, using Hector and load balancing requests across all nodes. > My question is : how do

Merged counter shard with a count != 0

2011-08-14 Thread Philippe
Hi, I'm getting the following at startup on one of the nodes on my 3 node cluster with RF=3. I have 6 keyspaces each with 10 column families that contain supercolumns that contain only counter columns. Looking at http://www.datastax.com/dev/blog/whats-new-in-cassandra-0-8-part-2-counters I see that

Re: Unable to repair a node

2011-08-14 Thread Philippe
No it depends on the consistency level. It's different : for example, QUORUM = 2 for RF=3 Anyway, anyone have an answer to my real issue ? Thanks 2011/8/14 Stephen Connolly > oh i know you can run rf 3 on a 3 node cluster. more i thought that if you > have one fail you have less nodes than the
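The QUORUM = 2 figure follows from a quorum being a strict majority of the replicas, floor(RF / 2) + 1. A minimal sketch of that arithmetic, which also shows how many replicas can be down while QUORUM requests still succeed:

```python
def quorum(rf: int) -> int:
    # A quorum is a strict majority of the replicas: floor(RF / 2) + 1.
    return rf // 2 + 1


for rf in (1, 2, 3, 5):
    # Replicas that may be unavailable while QUORUM operations still succeed.
    print(f"RF={rf}: QUORUM={quorum(rf)}, tolerates {rf - quorum(rf)} replica(s) down")
```

On a 3-node, RF=3 cluster this means one node can fail and QUORUM reads and writes keep working, whereas CL.ALL needs all three replicas.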

Scalability question

2011-08-14 Thread Philippe
to me like that's independent from the write throughput, just a question of how long it takes. What am I missing ? Thanks Philippe

Re: Unable to repair a node

2011-08-14 Thread Philippe
@Teijo : thanks for the procedure, I hope I won't have to do that Peter, I'll answer inline. Thanks for the detailed answer. > > the number of SSTables for some keyspaces goes dramatically up (from 3 or > 4 > > to several dozens). > > Typically with a long running compaction, such as that trigge

Re: Merged counter shard with a count != 0

2011-08-15 Thread Philippe
data because of out of sync nodes. P > > On Sun, Aug 14, 2011 at 12:28 PM, Philippe wrote: > > Hi I'm getting the following at startup on one of the nodes on my 3 node > > cluster with RF=3. > > I have 6 keyspaces each with 10 column families that contain superco

Re: Merged counter shard with a count != 0

2011-08-15 Thread Philippe
by doing that ! > > Cheers > > - > Aaron Morton > Freelance Cassandra Developer > @aaronmorton > http://www.thelastpickle.com > > On 15 Aug 2011, at 05:28, Philippe wrote: > > Hi I'm getting the following at startup on one of the nodes on

Re: Scalability question

2011-08-15 Thread Philippe
Adding nodes every couple of weeks ? Philippe

Re: Scalability question

2011-08-15 Thread Philippe
Forgot to mention that stopping & restarting the server brought the data directory down to 283GB in less than 1 minute. Philippe 2011/8/15 Philippe > It's another reason to avoid major / manual compactions which create a >> single big SSTable. Minor compactions keep things

Truncate column families

2011-08-16 Thread Philippe
Hello, what are the guarantees regarding truncates issued through the CLI ? I have a 3 node ring at RF=3. No writes going to the keyspace at issue here. I go to the CLI on one of the nodes and issue a truncate on all CF of the keyspace. I run a list [CF] and make sure there is no data. When I run

Re: Scalability question

2011-08-16 Thread Philippe
ONTHLY_20 with column_type = Super with comparator = UTF8Type with subcomparator = BytesType and min_compaction_threshold=2 and read_repair_chance=0 and keys_cached = 20 and rows_cached = 50 and default_validation_class = CounterColumnType and replicate_on_write=true; Philippe 2011/8/16 Teij

Re: Truncate column families

2011-08-16 Thread Philippe
t 8:45 AM, Philippe wrote: > > Hello, what are the guarantees regarding truncates issued through the CLI > ? > > I have a 3 node ring at RF=3. No writes going to the keyspace at issue > here. > > I go to the CLI on one of the nodes and issue a truncate on all CF of the > > ke

Re: Truncate column families

2011-08-16 Thread Philippe
tes > successfully on all nodes. Is this 0.8.4? Can you reproduce with a > toy cluster using https://github.com/pcmanus/ccm ? > > On Tue, Aug 16, 2011 at 9:44 AM, Philippe wrote: > > The title and the comments describe a node restarting. This is not my > case. > > could it

Re: Unable to repair a node

2011-08-16 Thread Philippe
g up my compactions hence the huge pile up of compactions until the disk fills up. I know there's an issue related to failed streams & repairs, could I be hitting it ? Thanks 2011/8/14 Philippe > @Teijo : thanks for the procedure, I hope I won't have to do that > > Peter

Re: Unable to repair a node

2011-08-16 Thread Philippe
acknowledged, node A doesn't ? 2011/8/16 Philippe > I'm still trying different stuff. Here are my latest findings, maybe > someone will find them useful: > >- I have been able to repair some small column families by issuing a >repair [KS] [CF]. When testing on the ri

Re: Unable to repair a node

2011-08-16 Thread Philippe
Thanks for the pointers, responses inline. On Tue, Aug 16, 2011 at 3:48 PM, Philippe wrote: > > I have been able to repair some small column families by issuing a repair > > [KS] [CF]. When testing on the ring with no writes at all, it still takes > > about 2 repairs to get "

Re: Unable to repair a node

2011-08-16 Thread Philippe
One last thought : what happens when you ctrl-c a nodetool repair ? Does it stop the repair on the server ? If not, then I think I have multiple repairs still running. Is there any way to check this ? Thanks 2011/8/16 Philippe > Even more interesting behavior : a repair on a CF

Re: Unable to repair a node

2011-08-16 Thread Philippe
> Aaron Morton > Freelance Cassandra Developer > @aaronmorton > http://www.thelastpickle.com > > On 17/08/2011, at 10:09 AM, Philippe wrote: > > One last thought : what happens when you ctrl-c a nodetool repair ? Does it > stop the repair on the server ? If not, then I think

Bulk loading into live data

2011-08-16 Thread Philippe
http://www.datastax.com/dev/blog/bulk-loading indicates that "it is perfectly reasonable to load data into a live, active cluster." So lets say my cluster has a single KS & CF and it contains a key "test" with a SC named "Cass" and a normal subcolumn named "Data" that has value 1. If I SSTLoad da

Re: Bulk loading into live data

2011-08-17 Thread Philippe
> > What if the column is a counter ? Does it overwrite or increment ? Ie if > the SST I am loading has the exact same setup but value 2, will my value > change to 3 ? > > Counter columns only know how to increment (assuming no deletes), so you > will get 3. See > https://github.com/apache/cassandr
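The answer quoted above (loading a counter value of 2 on top of a live value of 1 gives 3) can be illustrated with a deliberately simplified model. This sketch assumes a counter cell is identified by (row key, super column, subcolumn) and that merging two copies adds their values; it does not reproduce Cassandra's shard-based counter implementation:

```python
from collections import Counter

# Simplified model (an assumption): merging two copies of the same counter
# cell adds their values instead of keeping the most recent one.
live = Counter({("test", "Cass", "Data"): 1})
loaded = Counter({("test", "Cass", "Data"): 2})  # hypothetical SSTable contents

merged = Counter(live)   # copy the live data
merged.update(loaded)    # update() adds counts and, unlike +, keeps negative values
print(merged[("test", "Cass", "Data")])  # 3
```

A regular (non-counter) column would instead be resolved by timestamp, with one value replacing the other.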

Re: nodetool repair caused high disk space usage

2011-08-17 Thread Philippe
Look at my last two or three threads. I've encountered the same thing and got some pointers/answers. On Aug 17, 2011 4:03 PM, "Huy Le" wrote: > Hi, > > After upgrading to cass 0.8.4 from cass 0.6.11. I ran scrub. That worked > fine. Then I ran nodetool repair on one of the nodes. The disk usage on

Repairs are both ways ?

2011-08-17 Thread Philippe
Looking at the logs, I see that repairs stream data TO and FROM a node to its replicas. So on a 3-node RF=3 cluster, one only needs to launch repairs on a single node right ? Thanks

Re: Unable to repair a node

2011-08-17 Thread Philippe
, I'm guessing it's neither the hardware nor the network. I could provide the data directories privately to a committer if that helps... I assume an eighth repair would also stream stuff around. The data directories are : 8.3GB, 3.3GB and 3.1GB Thanks 2011/8/17 Philippe > ctrl-c

Re: nodetool repair caused high disk space usage

2011-08-17 Thread Philippe
t in one of the threads that the >> issue is not reproducible, but multiple users have the same issue. Is there >> anything that I should do to determine the cause of this issue before I do a >> rolling restart and try to run repair again? Thanks! >> >> Huy >> >>

Re: Repairs are both ways ?

2011-08-17 Thread Philippe
> > Almost, but not quite: if you have nodes A,B,C and repair A, it will > transfer A<->B, A<->C, but not B<->C. > But on a 3 node cluster once you do A<->B & A<->C, why don't you transitively get B<->C ? Thanks
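The quoted description can be modelled for a single range replicated on A, B and C: repairing A compares A with each other replica, and B and C are never compared with each other directly. A minimal sketch of that pairing; `repair_pairs` is a hypothetical helper, not a Cassandra API:

```python
from itertools import combinations


def repair_pairs(repaired, replicas):
    """Replica pairs compared when `repaired` is repaired, per the quoted description."""
    return {frozenset((repaired, other)) for other in replicas if other != repaired}


replicas = ["A", "B", "C"]
compared = repair_pairs("A", replicas)
all_pairs = {frozenset(p) for p in combinations(replicas, 2)}

print(sorted(tuple(sorted(p)) for p in compared))              # [('A', 'B'), ('A', 'C')]
print(sorted(tuple(sorted(p)) for p in all_pairs - compared))  # [('B', 'C')]
```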

Re: Repairs are both ways ?

2011-08-18 Thread Philippe
> > Because they are occurring in parallel. > So if a range is out of sync between A<->B and A<->C, A will receive the repairing stream from both (in any order) and will apply mutations based on that and the usual overwrite rules so necessarily exclude one of the repairing stream and that data will

Re: Repairs are both ways ?

2011-08-18 Thread Philippe
> > Because they are occurring in parallel. >> > So if a range is out of sync between A<->B and A<->C, A will receive the > repairing stream from both (in any order) and will apply mutations based on > that and the usual overwrite rules so necessarily exclude one of the > repairing stream and that

Re: nodetool repair caused high disk space usage

2011-08-18 Thread Philippe
Unfortunately repairing one cf at a time didn't help in my case because it still streams all CF and that triggers lots of compactions On Aug 18, 2011 3:48 PM, "Huy Le" wrote:

Re: nodetool repair caused high disk space usage

2011-08-20 Thread Philippe
Péter, In our case they get created exclusively during repairs. Compactionstats showed a huge number of sstable build compactions On Aug 20, 2011 1:23 AM, "Peter Schuller" wrote: >> Is there any chance that the entire file from source node got streamed to >> destination node even though only smal

Re: nodetool repair caused high disk space usage

2011-08-20 Thread Philippe
> > Do you have an indication that at least the disk space is in fact > consistent with the amount of data being streamed between the nodes? I > think you had 90 -> ~ 450 gig with RF=3, right? Still sounds like a > lot assuming repairs are not running concurrently (and compactions are > able to run

Re: Different Load values after stress test runs....

2011-08-23 Thread Philippe
Have you run repair on the nodes ? Maybe some data was lost and not repaired yet ? Philippe 2011/8/23 Chris Marino > Hi, we're running some performance tests against some clusters and I'm > curious about some of the numbers I see. > > I'm running the stress t

Re: Cassandra Node Requirements

2011-08-26 Thread Philippe
> > Sort of. There's some fine print, such as the 50% number is only if > you're manually forcing major compactions, which is not recommended, > but a bigger thing to know is that 1.0 will introduce "leveled > compaction" [1] inspired by leveldb. The free space requirement will > then be a small

Moving to a new cluster

2011-09-21 Thread Philippe
Hello, We're currently running on a 3-node RF=3 cluster. Now that we have a better grip on things, we want to replace it with a 12-node RF=3 cluster of "smaller" servers. So I wonder what the best way to move the data to the new cluster would be. I can afford to stop writing to the current cluster

Re: Moving to a new cluster

2011-09-22 Thread Philippe
> Aaron Morton > Freelance Cassandra Developer > @aaronmorton > http://www.thelastpickle.com > > On 22/09/2011, at 10:27 AM, Philippe wrote: > >> Hello, >> We're currently running on a 3-node RF=3 cluster. Now that we have a better grip on things, we want t

Re: Moving to a new cluster

2011-09-22 Thread Philippe
Developer > @aaronmorton > http://www.thelastpickle.com > > On 22/09/2011, at 7:23 PM, Philippe wrote: > >> Hi Aaron >> Thanks for the reply >> >> I should have mentioned that all current nodes are running 0.8.4. >> All current and future services have 2TB disks of which i

Re: is it possible for light-traffic CF to hold down many commit logs?

2011-09-22 Thread Philippe
It sure looks like what I'm seeing on my cluster where a 100G commit log partition fills up in 12 hours (0.8.x) On 23 Sep 2011 03:45, "Yang" wrote: > in 1.0.0 we don't have memtable_throughput for each individual CF , > and instead > which memtable/CF to flush is determined by "largest > getT

Token != DecoratedKey assertion

2011-09-25 Thread Philippe
Hello, I've seen a couple of these in my logs, running 0.8.4. This is a RF=3, 3-node cluster. 2 nodes including this one are on 0.8.4 and one is on 0.8.5 The node is still functioning hours later. Should I be worried ? Thanks ERROR [ReadStage:94911] 2011-09-24 22:40:30,043 AbstractCassandraDaem

Seed vs non-seed in YAML

2011-09-25 Thread Philippe
Hello, I'm deploying my cluster with Puppet so it's actually easier for me to add all cassandra nodes to the seed list in the YAML file than to choose a few. Would there be any reason NOT to do this ? Thanks

Re: frequent node UP/Down?

2011-09-25 Thread Philippe
I have this happening on 0.8.x. It looks to me as if this happens when the node is under heavy load such as unthrottled compactions or a huge GC. 2011/9/24 Yang > I'm using 1.0.0 > > > there seems to be too many node Up/Dead events detected by the failure > detector. > I'm using a 2 node cluster on

Re: Token != DecoratedKey assertion

2011-09-25 Thread Philippe
> On Sun, Sep 25, 2011 at 2:27 AM, Philippe wrote: >> Hello, >> I've seen a couple of these in my logs, running 0.8.4. >> This is a RF=3, 3-node cluster. 2 nodes including this one are on 0.8.4 and >> one is on 0.8.5 >> >> The node is still functionn

GC for ParNew on 0.8.6

2011-09-26 Thread Philippe
Ever since upgrading to 0.8.6, my nodes' system.log is littered with GCInspector logs such as these INFO [ScheduledTasks:1] 2011-09-26 21:23:40,468 GCInspector.java (line 122) GC for ParNew: 209 ms for 1 collections, 4747932608 used; max is 16838033408 INFO [ScheduledTasks:1] 2011-09-26 21:23:43,7
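To check whether these ParNew pauses line up with heap occupancy, the GCInspector lines can be parsed mechanically. A minimal sketch, assuming every line follows the exact format quoted above; the regex and `parse_gc` helper are hypothetical, not a Cassandra tool:

```python
import re

GC_LINE = re.compile(
    r"GC for (\w+): (\d+) ms for (\d+) collections, (\d+) used; max is (\d+)"
)


def parse_gc(line):
    """Extract collector, pause (ms), collection count and heap usage (%)."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    collector, ms, count, used, max_heap = m.groups()
    return {
        "collector": collector,
        "pause_ms": int(ms),
        "collections": int(count),
        "heap_used_pct": 100 * int(used) / int(max_heap),
    }


line = ("INFO [ScheduledTasks:1] 2011-09-26 21:23:40,468 GCInspector.java (line 122) "
        "GC for ParNew: 209 ms for 1 collections, 4747932608 used; max is 16838033408")
info = parse_gc(line)
print(info["collector"], info["pause_ms"], round(info["heap_used_pct"], 1))  # ParNew 209 28.2
```

For the sample line, about 28% of the 16838033408-byte (roughly 16.8 GB) maximum heap was reported as used.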

Assertion error in AntiEntropyService.rendezvous()

2011-09-27 Thread Philippe
Hello, I have just run into a new assertion error, again after upgrading a 2-month-old cluster to 0.8.6. Can someone explain what this means and the possible consequences ? Thanks ERROR [AntiEntropyStage:2] 2011-09-27 06:07:41,960 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thre

Re: [RELEASE CANDIDATE] Apache Cassandra 1.0.0-rc1 released

2011-09-27 Thread Philippe
Congrats. Is there a target date for the release ? If not, is it likely to be in October ? On 27 Sep 2011 18:57, "Sylvain Lebresne" wrote: > The Cassandra team is pleased to announce the release of the first release > candidate for the future Apache Cassandra 1.0. > > The warnings first: this i

Partitioner per keyspace

2011-09-28 Thread Philippe
Hi is there any reason why configuring a partitioner per keyspace wouldn't be possible technically ? Thanks.

Re: GC for ParNew on 0.8.6

2011-09-28 Thread Philippe
No, it was an upgrade from 0.8.4 or 0.8.5 depending on the nodes. No cassandra-env files were changed during the update. Any other ideas? The cluster has just been weird ever since running 0.8.6 : has anyone else upgraded and not run into this? On 28 Sep 2011 09:32, "Peter Schuller" wrote: >>

Why is mutation stage increasing ??

2011-10-05 Thread Philippe
Hello, I have my 3-node, RF=3 cluster acting strangely. Can someone shed some light on what is going on ? It was stuck for a couple of hours (all clients TimedOut). nodetool tpstats showed huge increasing MutationStages (in the hundreds of thousands). I restarted one node and it took a while to rep

Re: Why is mutation stage increasing ??

2011-10-05 Thread Philippe
l what client are you using? And can you give a hint to your node > hardware? > > Sent from my BlackBerry® wireless device > -- > *From: * Philippe > *Date: *Wed, 5 Oct 2011 10:33:21 +0200 > *To: *user > *ReplyTo: * user@cassandra.apache.org > *Subject: *Why is mutation

Re: Token != DecoratedKey assertion

2011-10-05 Thread Philippe
ween nodes, scrub fixes local > issues with data. ) > > Cheers > > - > Aaron Morton > Freelance Cassandra Developer > @aaronmorton > http://www.thelastpickle.com > > On 26/09/2011, at 12:53 PM, Philippe wrote: > > Just did > Could there be da

Re: Why is mutation stage increasing ??

2011-10-05 Thread Philippe
s my cluster won't be too out of sync. Thanks 2011/10/5 Philippe > Thanks for the quick responses. > > @Yi > Using Hector 0.8.0-1 > Hardware is : > >- AMD Opteron 4174 6x 2.30+ GHz >- 32 GB DDR3 >- 1 Gbps Lossless > > > @aaron > I

12-node cluster mystery

2011-10-08 Thread Philippe
Dear all, I've just fired up our production cluster : 12 nodes, RF=3 and I've run into something I don't understand at all. Our test cluster was 3 nodes, RF=3 Test cluster was AMD opteron CPUs (6x2.33) w/ 32GB RAM while the production cluster is core i5 (4x2.66) w/ 16 GB RAM. I'm running the same

What does a cluster throttled by the network look like ?

2011-10-30 Thread Philippe
Dear all, I'm working with a 12-node, RF=3 cluster on low-end hardware (core i5 with 16GB of RAM & SATA disks). I'm using a BOP and each node has a load between 50GB and 100GB (yes, I apparently did not set my tokens right... I'll fix that later). I'm hitting the cluster with a little over 100 con

Moving experiences ?

2011-11-09 Thread Philippe
Hello, I am going to need to move some nodes to rebalance my cluster. How safe is this to do on a cluster with writes & reads ? Thanks

Data model for counting uniques online

2011-11-09 Thread Philippe
Hello, I'd like to get some ideas on how to model counting uniques with cassandra. My use-case is that I have various counters that I increment based on data received from multiple devices. I'd like to be able to know if at least X unique devices contributed to a counter value. I've thought of the

Network traffic patterns

2011-11-15 Thread Philippe
Hello, I'm trying to understand the network usage I am seeing in my cluster, can anyone shed some light? It's an RF=3, 12-node, cassandra 0.8.6 cluster. The nodes are p13,p14,p15...p24 and are consecutive in that order on the ring. Each node is only a cassandra database. I am hitting the cluster fr
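Because p13 through p24 are consecutive on the ring, it can help to list which nodes hold each range when reading per-node traffic. A minimal sketch, assuming SimpleStrategy-style placement (the range owner plus the next RF - 1 nodes clockwise); the message does not state the replication strategy, and `replicas` is a hypothetical helper:

```python
def replicas(ring, owner, rf=3):
    """Nodes holding a range owned by `owner`, assuming the owner plus
    the next rf - 1 nodes clockwise on the ring (SimpleStrategy-style)."""
    i = ring.index(owner)
    return [ring[(i + k) % len(ring)] for k in range(rf)]


# p13 .. p24, consecutive in that order on the ring, as described above.
ring = [f"p{n}" for n in range(13, 25)]
print(replicas(ring, "p13"))  # ['p13', 'p14', 'p15']
print(replicas(ring, "p24"))  # ['p24', 'p13', 'p14']
```

Under this assumption, replication traffic for a range is confined to three neighbouring nodes, with the last ranges wrapping around to p13 and p14.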

Re: Network traffic patterns

2011-11-15 Thread Philippe
chooses matching values and sends back data to p4 So if p13-p15 are outputting 80Mb/s why am I not seeing 80Mb/s coming into p4 which is on the receiving end ? Thanks 2011/11/15 Philippe > Hello, > I'm trying to understand the network usage I am seeing in my cluster, can > anyon

Re: Network traffic patterns

2011-11-17 Thread Philippe
ware? Since those machines are sending > data somewhere, maybe they are behind in replicating and are continuously > catching up? > > Use a tool like tcpdump to find out where the data is going > > From: Philippe > Reply-To: "user@cassandra.apache.org" > Date: Tue,

nodetool move in parallel

2011-11-18 Thread Philippe
Running on cassandra 0.8.(6|7), I have issued two moves in the same cluster at the same time, on two different nodes. There are no writes being issued to the cluster. I saw a mailing post mentioning doing moves one node at a time. Did I just trash my cluster ? Thanks

Re: Network traffic patterns

2011-11-20 Thread Philippe
I'm using BOP. On 20 Nov 2011 13:09, "Boris Yen" wrote: > I am just curious about which partitioner you are using? > > On Thu, Nov 17, 2011 at 4:30 PM, Philippe wrote: > >> Hi Todd >> Yes all equal hardware. Nearly no CPU usage and no memory issues. &g

Re: What sort of load do the tombstones create on the cluster?

2011-11-21 Thread Philippe
I don't remember your exact situation but could it be your network connectivity? I know I've been upgrading mine because I'm maxing out Fast Ethernet on a 12 node cluster. On 20 Nov 2011 22:54, "Jahangir Mohammed" wrote: > Mostly, they are I/O and CPU intensive during major compaction. If gang

Re: Token != DecoratedKey assertion

2011-11-24 Thread Philippe
Oct 5, 2011 12:22 PM, "Philippe" wrote: > A little feedback, > I scrubbed on each server and I haven't seen this error again. The load on > each server seems to be correct. > nodetool compactionstats shows boat-load of "Scrub" at 100% on my 3rd > node bu

Corrupt (negative) value length encountered

2011-11-28 Thread Philippe
Hello, While running a cleanup, Cassandra stopped with the following exception and inspecting the logs revealed several exceptions such as below over the past 3 days. Given that it's dying on compactions, I'm really worried. If a row was trashed, will the error propagate from node to node or will i

Re: Corrupt (negative) value length encountered

2011-11-29 Thread Philippe
I didn't mention that this is a 0.8.6 cluster with 5 nodes and RF=3. I bootstrapped a new node 2 days ago. What's weird is that it didn't pick up the token I provided in the yaml file so I had to move it. It looks like CASSANDRA-1992 but I'm not on 0.7.x On 29 Nov 2011 08:00,

CPU bound workload

2011-12-07 Thread Philippe
Hello, I've got a batch process running every so often that issues a bunch of counter increments. I have noticed that when this process runs without being throttled it will raise the CPU to 80-90% utilization on the nodes handling the requests. This in turn causes timeouts and general lag on queries runn

CPU -> 0% on nodes

2011-12-08 Thread Philippe
low). This happens every single time. And I can see the second process gets paused during this timeout Any ideas why this might be happening ? Thanks Philippe 0 0128 165472 8304 1434741200 0 0 289 281 0 0 100 0 5 0128 166452 8288 1433783600 232 4

Meaning of values in tpstats

2011-12-10 Thread Philippe
Hello, Here's an example tpstats on one node in my cluster. I only issue multigetslice reads to counter columns Pool Name Active Pending Completed Blocked All time blocked ReadStage 27 2166 3565927301 0 0 MutationStag

Re: Meaning of values in tpstats

2011-12-11 Thread Philippe
Wasn't that on the 1.0 branch ? I'm still running 0.8.x. @Peter: investigating a little more before answering. Thanks 2011/12/10 Edward Capriolo > There was a recent patch that fixed an issue where counters were hitting > the same natural endpoint rather then being randomized across all of them.

Re: Meaning of values in tpstats

2011-12-11 Thread Philippe
Answer below > > Pool Name Active Pending Completed Blocked > All > > time blocked > > ReadStage 27 2166 3565927301 0 > With the slicing, I'm not sure off the top of my head. I'm sure > someone else can chime in. For e.g. a mult

Re: CPU bound workload

2011-12-11 Thread Philippe
Hi Peter, I'm going to mix the response to your email along with my other email from yesterday since they pertain to the same issue. Sorry this is a little long, but I'm stumped and I'm trying to describe what I've investigated. In a nutshell, in case someone has encountered this and won't read it

Re: CPU bound workload

2011-12-11 Thread Philippe
for me is : only batch if you gain something because it might break stuff. Given this work-around, can anyone explain to me why this was happening ? 2011/12/11 Philippe > Hi Peter, > I'm going to mix the response to your email along with my other email from > yesterday since they p

Re: Meaning of values in tpstats

2011-12-12 Thread Philippe
given the better results I've seen on my workload with smaller batches, I'm going to do just that. Philippe

Re: CPU bound workload

2011-12-12 Thread Philippe
hiccup (e.g. dirty buffer flushing by linux) this should be visibly > correlated with 'iostat -x -k 1'. > some CPU correlation in some cases (on one node) no iostat correlation ever Thanks Philippe

Re: a query that's killing cassandra

2011-12-12 Thread Philippe
You've got at least one row over 1GB, compacted ! Have you checked whether you are running out of heap ? 2011/12/12 Wojtek Kulik > Hello everyone! > > I have a strange problem with Cassandra (v1.0.5, one node, 8GB, 2xCPU): a > query asking for each key from a certain (super) CF results in timeou
