RE: RE 200TB in Cassandra ?

2012-04-19 Thread Dan Hendry
> The bit I am trying to understand is whether my figure of 400GB/node in practice for Cassandra is correct, or whether we can push the GB/node higher and if so how high Our cluster runs with up to 2TB/node (that's the compressed size) and an RF=2. The figure of 400GB/node is by no means a maximum

RE: WARN [Memtable] live ratio

2012-01-31 Thread Dan Hendry
http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/ - Gives some background information (specific to 0.8 but still valid for 1.0 I believe). Not quite sure why a warning message is logged but a ratio of < 1 may occur for column families with a very high update to insert ratio. Dan -

RE: Slow Compactions - CASSANDRA-3592

2011-12-13 Thread Dan Hendry
user@cassandra.apache.org Subject: Re: Slow Compactions - CASSANDRA-3592 Does your issue look similar to this one? https://issues.apache.org/jira/browse/CASSANDRA-3532 It is also dealing with compaction taking 10X longer in 1.0.X On 12/13/2011 09:00 AM, Dan Hendry wrote: I have been observing that

Slow Compactions - CASSANDRA-3592

2011-12-13 Thread Dan Hendry
://issues.apache.org/jira/browse/CASSANDRA-3592 Dan Hendry (403) 660-2297

RE: split large sstable

2011-11-21 Thread Dan Hendry
mber-19-11 19:42 To: user@cassandra.apache.org Subject: Re: split large sstable On 17.11.2011 17:42, Dan Hendry wrote: > What do you mean by ' better file offset caching'? Presumably you mean > 'better page cache hit rate'? fs metadata used to find blocks in smaller f

RE: The use of SuperColumns

2011-11-18 Thread Dan Hendry
My understanding is that support for super columns will remain within the thrift API for the foreseeable future (perhaps indefinitely?) but under the covers, super column families will get transitioned to regular column families with composite column names (

RE: Data Model Design for Login Service

2011-11-18 Thread Dan Hendry
I they are not limited to repeating values but the Datastax docs[1] on secondary indexes certainly seem to indicate they would be a poor fit for this case (high read load, many unique values). [1] http://www.datastax.com/docs/1.0/ddl/indexes Dan From: Maciej Miklas [mailto:mac.mik...@

RE: Data Model Design for Login Service

2011-11-17 Thread Dan Hendry
Your first approach, skinny rows, will almost certainly be a better solution although it never hurts to experiment for yourself. Even for low end hardware (for sake of argument, EC2 m1.smalls), a few million rows is basically nothing (again though, I encourage you to verify for yourself). For re

RE: split large sstable

2011-11-17 Thread Dan Hendry
What do you mean by ' better file offset caching'? Presumably you mean 'better page cache hit rate'? Out of curiosity, why do you think this? What data are you seeing which makes you think it's better? I am certainly not even close to a virtual memory or page caching expert but I am pretty sure fil

RE: Compaction -> CPU load 100% -> time out

2011-11-15 Thread Dan Hendry
I really don't recommend using t1.micros. The problem with them is that they have CPU bursting, basically meaning you get lots of CPU resources for a short time but if you use more than you have been allocated you get basically nothing for 10+ seconds afterwards. By 'basically nothing' I really mea

RE: 0.8.6 - Cannot Restart Node, Commit Log Exception

2011-11-15 Thread Dan Hendry
After removing the offending commit log, the node was able to restart successfully. Suffice it to say that's not something I really wanted to do. Dan From: Dan Hendry [mailto:dan.hendry.j...@gmail.com] Sent: November-15-11 10:44 To: 'user@cassandra.apache.org' Subject:

0.8.6 - Cannot Restart Node, Commit Log Exception

2011-11-15 Thread Dan Hendry
Last night one of my nodes died inexplicably, with no log entries anywhere indicating the reason (Cassandra's log, /var/log/messages, etc). I tried to restart it but the node will not restart as it fails with the errors below when replaying the commit log. I should point out that the cluster is curr

RE: 1.0.2 Assertion Error

2011-11-10 Thread Dan Hendry
that this block the MemtablePostFlusher is unfortunately related. Restarting the node would fix but we need to make that more solid too. -- Sylvain On Thu, Nov 10, 2011 at 9:04 PM, Dan Hendry wrote: > Just happened again, seems to be with the same column family (at least on a > flusher thread f

RE: 1.0.2 Assertion Error

2011-11-10 Thread Dan Hendry
Active Pending Completed Blocked All time blocked MemtablePostFlusher 118 16 0 0 Dan From: Dan Hendry [mailto:dan.hendry.j...@gmail.com] Sent: November-10-11 14:49 To: 'user@cassandra.apache.org' Subject: 1.0.2 Assertion Erro

1.0.2 Assertion Error

2011-11-10 Thread Dan Hendry
es and the node seems to be behaving normally. Thoughts? Dan Hendry

RE: shutdown by KILL

2011-11-08 Thread Dan Hendry
I *thought* Cassandra was supposed to have a crash only design[1]. My understanding is that it is safe to simply kill the process and, with the regular TERM signal, shutdown would be blocked on fsyncing the commit log but nothing else (obviously not true if you kill -9 the sucker) even when u

RE: Read perf investigation

2011-11-03 Thread Dan Hendry
Uh, so look at your await time of *107.3*. From the iostat man page: "await: The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them." If the key you are reading from is not

RE: Problem after upgrade to 1.0.1

2011-11-03 Thread Dan Hendry
Regarding load growth, presumably you are referring to the load as reported by JMX/nodetool. Have you actually looked at the disk utilization on the nodes themselves? Potential issue I have seen: http://www.mail-archive.com/user@cassandra.apache.org/msg18142.html Dan From: Bryce Godfrey [ma

RE: Cassandra cluster HW spec (commit log directory vs data file directory)

2011-10-25 Thread Dan Hendry
> 2. ... So I am going to use rotational disk for the commit log and an SSD for data. Does this make sense? Yes, just keep in mind however that the primary characteristic of SSDs is lower seek times which translates into faster random access. We have a similar Cassandra use case (time series da

RE: Specific Question, General Problem

2011-10-25 Thread Dan Hendry
> of 2-300ms for getting a single row. This seems slow, but is it unusual? What are those numbers? 2 ms being average? 300 ms a 95th/99th percentile? A value you saw once? Yes, this *seems* slow given your row definition but without knowing what the value represents it's almost impossible to ju

Cassandra 1.0.0 - Node Load Bug

2011-10-20 Thread Dan Hendry
this a known issue or has anybody else observed a similar pattern? Dan Hendry (403) 660-2297

RE: Immutable CFs and read consistency

2011-10-07 Thread Dan Hendry
The R+W > RF requirement for strong consistency applies regardless of whether your data is 'immutable' or is being updated. A W=1, R=1 approach will not guarantee consistency between reads and writes. > R=1 might cassandra look on one of the two nodes, find no data there, and prematurely give
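
The R+W > RF rule can be checked with simple arithmetic. The following is a minimal sketch, not Cassandra code: the function names `quorum` and `is_strongly_consistent` are hypothetical, and the RF values in the two examples are assumptions chosen for illustration.

```python
def quorum(rf: int) -> int:
    """Size of a majority quorum for replication factor rf."""
    return rf // 2 + 1


def is_strongly_consistent(r: int, w: int, rf: int) -> bool:
    """True when every set of r read replicas overlaps every set of w write replicas."""
    return r + w > rf


# W=1, R=1 with an assumed RF=2: the read may hit the replica the write missed.
print(is_strongly_consistent(r=1, w=1, rf=2))  # False

# QUORUM reads and writes with an assumed RF=3: 2 + 2 > 3, so the sets overlap.
print(is_strongly_consistent(r=quorum(3), w=quorum(3), rf=3))  # True
```

The check says nothing about whether the data is ever updated, which is consistent with the point that the rule applies to 'immutable' data as well.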

RE: Unable to link C library (for jna.jar) on 0.7.5

2011-09-01 Thread Dan Hendry
Have you installed 'jna'? On RHEL (6 at least) it should be possible using the default yum repos. You need the native code and the JAR in Cassandra's classpath from what I understand. Dan From: eczec...@gmail.com [mailto:eczec...@gmail.com] On Behalf Of Eric Czech Sent: September-01-11 11:13

RE: Disk usage for CommitLog

2011-08-30 Thread Dan Hendry
> 86GB in commitlog and 42GB in data Whoa, that seems really wrong, particularly given your data spans 13 months. Have you changed any of the default cassandra.yaml settings? What is the maximum memtable_flush_after across all your CFs? Any warnings/errors in the Cassandra log? > Out of curi

Re: Disk usage for CommitLog

2011-08-29 Thread Dan Hendry
First off, what version of Cassandra are you using? > We've noticed that when we restart cassandra disk utilization decreases dramatically Presumably you mean 'utilization' as in free space. Specifically on a restart, this type of behavior is likely due to Cassandra deleting compacted SSTables. C

RE: Memtable flush thresholds - what am I missing?

2011-08-18 Thread Dan Hendry
ct: Re: Memtable flush thresholds - what am I missing? See http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/, specifically the section on memtable_total_space_in_mb On Thu, Aug 18, 2011 at 2:43 PM, Dan Hendry wrote: > I am in the process of trying to tune the memtable flush thresho

Memtable flush thresholds - what am I missing?

2011-08-18 Thread Dan Hendry
p. . There are no major compactions running on this machine and no repairs running across the cluster . Hinted handoff is disabled Any insight would be appreciated. Dan Hendry

RE: How to scale Cassandra?

2011-07-04 Thread Dan Hendry
Moving nodes does not result in downtime provided you use proper replication factors and read/write consistencies. The typical recommendation is RF=3 and QUORUM reads/writes. Dan From: Paul Loy [mailto:ketera...@gmail.com] Sent: July-04-11 5:59 To: user@cassandra.apache.org Subject: Re: How

Re: Propose new ConsistencyLevel.ALL_AVAIL for reads

2011-06-16 Thread Dan Hendry
, Jun 16, 2011 at 11:00 PM, AJ wrote: > On 6/16/2011 7:56 PM, Dan Hendry wrote: > >> How would your solution deal with complete network partitions? A node >> being 'down' does not actually mean it is dead, just that it is unreachable >> from whatever is making the

Re: Propose new ConsistencyLevel.ALL_AVAIL for reads

2011-06-16 Thread Dan Hendry
How would your solution deal with complete network partitions? A node being 'down' does not actually mean it is dead, just that it is unreachable from whatever is making the decision to mark it 'down'. Following from Ryan's example, consider nodes A, B, and C but within a fully partitioned network

RE: Propose new ConsistencyLevel.ALL_AVAIL for reads

2011-06-16 Thread Dan Hendry
I think this would add a lot of complexity behind the scenes and be conceptually confusing, particularly for new users. The Cassandra consistency model is pretty elegant and this type of approach breaks that elegance in many ways. It would also only really be useful when the value has a high pro

RE: Schemas diverging while dynamically creating CF.

2011-04-15 Thread Dan Hendry
Uh... don't create a column family per user. Column families are meant to be fairly static; conceptually equivalent to a table in a relational database. Why do you need (or even want) a CF per user? Reconsider your data model, a single column family with an inverted index for a 'user' column is pro

RE: Consistency model

2011-04-15 Thread Dan Hendry
So Cassandra does not use an atomic commit protocol at the cluster level. Strong consistency on a quorum read is only guaranteed *after* a successful quorum write. The behaviour you are seeing is possible if you are reading in the middle of a write or the write failed (which should be reported to y

RE: recurring EOFException exception in 0.7.4

2011-04-15 Thread Dan Hendry
Try running nodetool scrub on the cf: it's pretty good at detecting and fixing most corruption problems. Dan -Original Message- From: Jonathan Colby [mailto:jonathan.co...@gmail.com] Sent: April-15-11 15:41 To: user@cassandra.apache.org Subject: recurring EOFException exception in 0.7.4

RE: Compaction threshold does not save with nodetool

2011-04-06 Thread Dan Hendry
There are two layers of settings: the default, cluster-wide settings, which are part of the schema and exposed/modifiable via the CLI, and individual per-node settings exposed/modifiable via JMX and nodetool. Using nodetool, you are only modifying the in-memory settings for a single node, changes to those settings are

RE: batch_mutate failed: out of sequence response

2011-04-05 Thread Dan Hendry
I too have seen the out of sequence response problem. My solution has just been to retry and it seems to work. None of my mutations are THAT large (< 200 columns). The only related information I could find points to a thrift/ubuntu bug of some kind (http://markmail.org/message/xc3tskhhvsf5awz7
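
The retry workaround can be sketched as a small generic wrapper. This is an illustration under stated assumptions, not the code used in the thread: `retry` and `flaky_mutation` are hypothetical names, and the attempt count and delay are arbitrary. Retrying is only safe when repeating the operation has no extra side effects.

```python
import time


def retry(operation, attempts=3, delay_seconds=0.5, retry_on=(Exception,)):
    """Call operation() until it succeeds or the attempts are used up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            # Re-raise the error from the final attempt; otherwise wait and retry.
            if attempt == attempts - 1:
                raise
            time.sleep(delay_seconds)


# Hypothetical stand-in for a client call that fails twice, then succeeds.
calls = {"count": 0}


def flaky_mutation():
    calls["count"] += 1
    if calls["count"] < 3:
        raise IOError("out of sequence response")
    return "ok"


print(retry(flaky_mutation, attempts=5, delay_seconds=0))  # ok
```

In practice `retry_on` would be narrowed to the specific exception type the client library raises for this failure, so that unrelated errors are not silently retried.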

RE: question about replicas & dynamic response to load

2011-03-03 Thread Dan Hendry
To some extent, the boot-strapping problem will be an issue with most solutions: the data has to be duplicated from somewhere. Bootstrapping should not cause much performance degradation unless you are already pushing capacity limits. It's the decommissioning problem which makes Cassandra somewhat

RE: frequent client exceptions on 0.7.0

2011-02-17 Thread Dan Hendry
Try turning on GC logging in cassandra-env.sh, specifically: -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/cassandra/gc.log Look for things like: "Total time for which application threads were stopped: 52.8795600 seconds". Anything over a few seconds may be causing you
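
Scanning the GC log for long pauses can be automated. The sketch below is a minimal example, assuming the log lines contain the exact phrase quoted above; the function name `long_pauses` and the default threshold of 1.0 second are assumptions for illustration, not values from the thread.

```python
import re

# Matches the stopped-time message printed by -XX:+PrintGCApplicationStoppedTime.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: "
    r"(\d+(?:\.\d+)?) seconds"
)


def long_pauses(lines, threshold_seconds=1.0):
    """Return pause durations (in seconds) that meet or exceed the threshold."""
    pauses = []
    for line in lines:
        match = STOPPED.search(line)
        if match:
            seconds = float(match.group(1))
            if seconds >= threshold_seconds:
                pauses.append(seconds)
    return pauses


sample = [
    "Total time for which application threads were stopped: 0.0123450 seconds",
    "Total time for which application threads were stopped: 52.8795600 seconds",
]
print(long_pauses(sample))  # [52.87956]
```

To run it against a real log, pass an open file instead of the sample list, for example `long_pauses(open("/var/log/cassandra/gc.log"))`.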

RE: Possible EOFException regression in 0.7.1

2011-02-15 Thread Dan Hendry
I have been having plenty of problems (on 0.7.0, http://www.mail-archive.com/user@cassandra.apache.org/msg09341.html, http://www.mail-archive.com/user@cassandra.apache.org/msg09230.html, http://www.mail-archive.com/user@cassandra.apache.org/msg09122.html, http://www.mail-archive.com/dev@cassandra.a

RE: Data distribution

2011-02-14 Thread Dan Hendry
> 1) If I insert a key and want to verify which node it went to then how do I > do that? I don't think you can and there should be no reason to care. Cassandra abstracts where data is being stored, think in terms of consistency levels not actual nodes. > 2) How can I verify if the replication is

RE: per-connection "read-after-my-write" consistency

2011-02-12 Thread Dan Hendry
same connection. I can handle the current issue in application but I'm sure that I will not be able to handle some future situation in application. So the suggestion is to use at least 3 nodes with RF=3 and CL.QUORUM for both write and reads where high consistency is required, right? T

RE: per-connection "read-after-my-write" consistency

2011-02-12 Thread Dan Hendry
Are you using a higher level client (hector/pelops/pycassa/etc) or the actual thrift API? Higher level clients often pool connections and two subsequent operations (read then write) may be performed with connections to different nodes. If you are sure you are using the same connection (the actu

RE: Exceptions on 0.7.0

2011-02-09 Thread Dan Hendry
Out of curiosity, do you really have on the order of 1,986,622,313 elements (I believe elements=keys) in the cf? Dan From: shimi [mailto:shim...@gmail.com] Sent: February-09-11 15:06 To: user@cassandra.apache.org Subject: Exceptions on 0.7.0 I have a 4 node test cluster where I test the

Argh: Data Corruption (LOST DATA) (0.7.0)

2011-01-29 Thread Dan Hendry
.java:313) at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:180) at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:119) ... 22 more Dan Hendry (403) 660-2297

Re: Node going down when streaming data, what next?

2011-01-26 Thread Dan Hendry
When this has happened to me, restarting the node you are trying to move works. I can't remember the exact conditions but I have also had to restart all nodes in the cluster simultaneously once or twice as well. I would love to know if there is a better way of doing it. On Wednesday, January 26,

My new nemesis: EOFException (0.7.0)

2011-01-26 Thread Dan Hendry
I am having yet another issue on one of my Cassandra nodes. Last night, one of my nodes ran out of memory and crashed after flooding the logs with the same type of errors I am seeing below. After restarting, they are popping up again. My solution has been to drop the consistency from ALL to ONE for

RE: Repair on single CF not working (0.7)

2011-01-26 Thread Dan Hendry
...@gmail.com] Sent: January-24-11 19:19 To: user@cassandra.apache.org Subject: Re: Repair on single CF not working (0.7) On Mon, Jan 24, 2011 at 4:15 PM, Dan Hendry wrote: I am trying to repair a single CF using nodetool. It seems like the request to limit the repair to one CF is not being respected

RE: Errors During Compaction

2011-01-25 Thread Dan Hendry
with this? More joy, less joy or a continuation of the current level of joy? Aaron On 24/01/2011, at 9:38 AM, Dan Hendry wrote: I have run into a strange problem and was hoping for suggestions on how to fix it (0.7.0). When compaction occurs on one node for what appears to be one

Repair on single CF not working (0.7)

2011-01-24 Thread Dan Hendry
g of file /var/lib/cassandra/data/kikmetrics/UserEventsByEvent-e-1348-Data.db/(29530620905,58066315307) progress=18898300928/28535694402 - 66% from org.apache.cassandra.streaming.StreamInSession@8df7e0c failed: requesting a retry. Dan Hendry (403) 660-2297

Errors During Compaction

2011-01-23 Thread Dan Hendry
at org.apache.cassandra.io.sstable.IndexHelper.skipBloomFilter(IndexHelper.java:52) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:69) ... 20 more Dan Hendry (403) 660-2297

Re: Cassandra GC Settings

2011-01-17 Thread Dan Hendry
Thanks for all the info, I think I have been able to sort out my issue. The new settings I am using are:
-Xmn512M (Very important I think)
-XX:SurvivorRatio=5 (Not very important I think)
-XX:MaxTenuringThreshold=5
-XX:ParallelGCThreads=8
-XX:CMSInitiatingOccupancyFraction=75
Since applying these

Cassandra GC Settings

2011-01-17 Thread Dan Hendry
I am having some reliability problems in my Cassandra cluster which I am almost certain are due to GC. I was about to start delving into the guts of the problem by turning on GC logging but I have never done any serious Java GC tuning before (time to learn I guess). As a first step however, I was ho

RE: Having trouble getting cassandra to stay up

2010-12-24 Thread Dan Hendry
s there a way to fully remove cassandra and start off with a fully fresh copy? Thanks Alex On Fri, Dec 24, 2010 at 1:42 PM, Dan Hendry wrote: Hum, very strange. More what I was trying to get at was: did the process truly die or was it just non-responsive and looking like it was dead? It would

Re: Having trouble getting cassandra to stay up

2010-12-24 Thread Dan Hendry
ype":"array","items":{"type":"record","name":"ColumnDef","fields":[{"name":"name","type":"bytes"},{"name":"validation_class","type":"string"}

Re: Having trouble getting cassandra to stay up

2010-12-23 Thread Dan Hendry
Your details are rather vague, what do you mean by killed? Is the Cassandra java process still running? Any other warning or error log messages (from either node)? Could you provide the last few Cassandra log lines from each machine? Can you connect to the node via JMX? What is the output of nodeto

RE: Cassandra Node Routinely Goes Down - 0.7 RC2

2010-12-22 Thread Dan Hendry
The main diagnosing feature of the problem I was seeing is very high system CPU with no user CPU utilization (check with top or sar -u), vmstat showing one process waiting for run-time but never seeming to get it, a high page scan rate, and no Cassandra error messages (although nodes dying did *seem

RE: Severe Reliability Problems - 0.7 RC2

2010-12-20 Thread Dan Hendry
n are you running? I have seen with I/O intense nodes with 2.6.18 to 2.6.24 the kernel has a bug where it locks the JVM and spins to 100%. On Mon, Dec 20, 2010 at 1:14 PM, Brandon Williams wrote: On Mon, Dec 20, 2010 at 2:13 PM, Dan Hendry wrote: Yes, I have tried that (although only twice).

RE: Severe Reliability Problems - 0.7 RC2

2010-12-20 Thread Dan Hendry
one of my own application (java running over linux box) i tried that command to see what the application was doing or trying to. Kani On Mon, Dec 20, 2010 at 3:48 PM, Dan Hendry wrote: I have been having severe and strange reliability problems within my Cassandra cluster. This weekend

Severe Reliability Problems - 0.7 RC2

2010-12-20 Thread Dan Hendry
I have been having severe and strange reliability problems within my Cassandra cluster. This weekend, all four of my nodes were down at once. Even now I am losing one every few hours. I have attached output from all the system monitoring commands I can think of. What seems to happen is that the j

RE: Errors when decommissioning - 0.7 RC1

2010-12-15 Thread Dan Hendry
ommand looks like on both *.19 and *.17 after the decommission is run. I assume you were running the ring command on another node? I'll look into the logs more and see if anything jumps out. On Wed, Dec 15, 2010 at 6:37 AM, Dan Hendry wrote: I am seeing very strange things when trying t

Errors when decommissioning - 0.7 RC1

2010-12-15 Thread Dan Hendry
before (I believe 0.7 b1 or later) but wrote it off as being caused by my misguided tinkering and/or other Cassandra bugs. This time around, I have done very little with JMX/CLI/nodetool and I can find no related Cassandra bugs. Help/suggestions? Dan Hendry (403) 660-2297 Report

RE: Cassandra for Ad-hoc Aggregation and formula calculation

2010-12-10 Thread Dan Hendry
Perhaps other, more experienced and reputable contributors to this list can comment but to be frank: Cassandra is probably not for you (at least for now). I personally feel Cassandra is one of the stronger NoSQL options out there and has the potential to become the de facto standard; but it's not

RE: How do you implement pagination?

2010-12-10 Thread Dan Hendry
Or you can just start at the 1 + nth id given ids must be unique (you don't have to specify an existing id as the start of a slice). You don't HAVE to load the n + 1 record. This (slightly) more optimal approach has the disadvantage that you don't know with certainty when you have reached the
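
The "start at the 1 + nth id" approach can be sketched against an in-memory sorted list. This is an illustration only: `get_slice` is a hypothetical stand-in for a slice query whose start key need not exist, and the ids are assumed to be unique non-negative integers.

```python
from bisect import bisect_left


def get_slice(sorted_ids, start, count):
    """Return up to count ids that are >= start; start need not be an existing id."""
    index = bisect_left(sorted_ids, start)
    return sorted_ids[index:index + count]


def paginate(sorted_ids, page_size):
    """Yield pages, starting each one just past the last id of the previous page."""
    start = 0  # assumption: ids are non-negative integers
    while True:
        page = get_slice(sorted_ids, start, page_size)
        if not page:
            return
        yield page
        start = page[-1] + 1  # the 1 + nth id; valid because ids are unique


ids = [3, 7, 8, 15, 21]
print(list(paginate(ids, 2)))  # [[3, 7], [8, 15], [21]]
```

Each fetch loads exactly one page rather than page_size + 1 records, and this sketch only learns that it has finished when a fetch comes back empty.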

Estimate Keys - JMX

2010-12-10 Thread Dan Hendry
luster (RF=2) and am getting wildly different estimates from different nodes. For example, one says ~50 million and another ~75 million. Dan Hendry (403) 660-2297

RE: Various exceptions on 0.7

2010-12-06 Thread Dan Hendry
nodes is sending garbage to the others. Either there's a bug in the bleeding edge code you are running (did you try rc1?) or you do have nodes on different versions or you have a hardware problem. On Sat, Dec 4, 2010 at 5:51 PM, Dan Hendry wrote: > Here are two other errors which app

Re: Various exceptions on 0.7

2010-12-04 Thread Dan Hendry
) On Sat, Dec 4, 2010 at 6:29 PM, Dan Hendry wrote: > No, all nodes are running very recent (< 2 day old) code out of the 0.7 > branch. This cluster has always had 0.7 RC1(+) code running on it > > > On Sat, Dec 4, 2010 at 6:24 PM, Jonathan Ellis wrote: > >> Are you

Re: Various exceptions on 0.7

2010-12-04 Thread Dan Hendry
No, all nodes are running very recent (< 2 day old) code out of the 0.7 branch. This cluster has always had 0.7 RC1(+) code running on it On Sat, Dec 4, 2010 at 6:24 PM, Jonathan Ellis wrote: > Are you mixing different Cassandra versions? > > On Sat, Dec 4, 2010 at 4:58 PM, Dan Hen

Re: Various exceptions on 0.7

2010-12-04 Thread Dan Hendry
tays alive but there seem to be problems reading from the node. At the very least, read performance is massively degraded. On Sat, Dec 4, 2010 at 5:52 PM, Dan Hendry wrote: > One of my Cassandra nodes is giving me a number of errors then effectively > dying. I think it was somehow caused

Various exceptions on 0.7

2010-12-04 Thread Dan Hendry
One of my Cassandra nodes is giving me a number of errors then effectively dying. I think it was somehow caused by interrupting a nodetool clean operation. Running a recent 0.7 build out of svn. ERROR [MutationStage:26] 2010-12-04 16:23:04,395 RowMutationVerbHandler.java (line 83) Error in row mut

Re: Confused about consistency

2010-12-03 Thread Dan Hendry
you weren't > guaranteed to see it on the first. > > This was fixed in 0.6.4 but apparently I botched the merge to the 0.7 > branch. I corrected that just now, so when you update, you should be > good to go. > > On Fri, Dec 3, 2010 at 9:19 PM, Dan Hendry > wrote: >

Confused about consistency

2010-12-03 Thread Dan Hendry
I am seeing fairly strange behavior in my Cassandra cluster.
Setup
- 3 nodes (let's call them nodes 1, 2 and 3)
- RF=2
- A set of servers (producers) which write data to the cluster at consistency level ONE
- A set of servers (consumers/processors) which read data from the cluster at cons

Hung Repair

2010-10-22 Thread Dan Hendry
simply kill the node given the "14 outstanding" log message and as doing so has caused me problems in the past when using beta versions. Dan Hendry

Out of Memory Issues - SERIOUS

2010-10-07 Thread Dan Hendry
ite frankly nearly unbelievable, is the fact Cassandra can't seem to recover from the error and I am losing data. Dan Hendry

RE: Column TTL

2010-10-06 Thread Dan Hendry
: @param ttl. An optional, positive delay (in seconds) after which the column will be automatically deleted. Augi 2010/10/6 Dan Hendry Hi, I have a quick and quite frankly ridiculous question regarding the column TTL value; what are the time units? Milliseconds/seconds or something else

Column TTL

2010-10-06 Thread Dan Hendry
appreciated. Thanks, Dan Hendry