Re: Blog post on Cassandra's inner workings and performance - feedback welcome

2016-07-10 Thread Graham Sanderson
2) “why more memory makes things worse” - I’d be interested to see you argue that - it really isn’t true with big boxes (but yes, off heap is good) - we run 24 gig JVMs with 8g new gen and never see more than a second or so STW, and that is rare (but we do have a lot of -XX: options) > On Jul 9, 2

Re: large system hint partition

2016-09-19 Thread Graham Sanderson
The reason for large partitions is that the partition key is just the uuid of the target node. More recent versions (I think 2.2) don't have this problem since they write hints to the file system as per the commit log. Sadly the large partitions make things worse when you are hinting hence presumably und

Re: JVM safepoints, mmap, and slow disks

2016-10-08 Thread Graham Sanderson
We don’t use Azul’s Zing, but it does have the nice feature that all threads don’t have to reach safepoints at the same time. That said we make heavy use of Cassandra (with off heap memtables - not directly related but allows us a lot more GC headroom) and SOLR where we switched to mmap because

Re: JVM safepoints, mmap, and slow disks

2016-10-08 Thread Graham Sanderson
I haven’t studied the read path that carefully, but there might be a spot at the C* level rather than JVM level where you could effectively do a JNI touch of the mmap region you’re going to need next. > On Oct 8, 2016, at 7:17 PM, Graham Sanderson wrote: > > We don’t use Azul’s Zin

Re: JVM safepoints, mmap, and slow disks

2016-10-08 Thread Graham Sanderson
a lot of cache and TLB misses > with out prefetching though. > > There is a system call to page the memory in which might be better for > larger reads. Still no guarantee things stay cached though. > > Ariel > > > On Sat, Oct 8, 2016, at 08:21 PM, Graham Sanderson wrote: >

Re: Do partition keys create skinny or wide rows?

2016-10-08 Thread Graham Sanderson
Nomenclature is tricky, but PRIMARY KEY((organization_id, employee_id)) will make organization_id, employee_id the partition key which equates roughly to your latter sentence (I’m not sure about the 4 billion limit - that may be the new actual limit, but probably not a good idea). > On Oct 8, 2

Re: Do partition keys create skinny or wide rows?

2016-10-08 Thread Graham Sanderson
ct ... where organization_id = x, to get all employees in a > particular organization? > > And, this will put all those employees in the same node, right? > > On Sun, Oct 9, 2016 at 9:17 AM, Graham Sanderson <mailto:gra...@vast.com>> wrote: > Nomenclature is t

Re: Java GC pauses, reality check

2016-11-25 Thread Graham Sanderson
If you are seeing 25-30 second GC pauses then (unless you are very badly configured) you are seeing full GC under CMS (though G1 may have similar problems). With CMS, eventual fragmentation causing promotion failure is inevitable (unless you cycle your nodes before it happens). Either your heap has way too

Re: Java GC pauses, reality check

2016-11-26 Thread Graham Sanderson
ra/browse/CASSANDRA-10969> when we restarted some nodes for other reasons. > On Nov 26, 2016, at 12:07 AM, Oleksandr Shulgin > wrote: > > On Nov 25, 2016 23:47, "Graham Sanderson" <mailto:gra...@vast.com>> wrote: > If you are seeing 25-30 second GC paus

Re: Java GC pauses, reality check

2016-11-26 Thread Graham Sanderson
It was removed in the 3.0.x line, but not in the 3.x line (post 9472) as far as I can tell. It looks to be available in 3.11 and in 3.X branches > On Nov 26, 2016, at 1:17 PM, Oleksandr Shulgin > wrote: > > On Nov 26, 2016 20:04, "Graham Sanderson" <mailto:gra..

Re: How to measure disk space used by a keyspace?

2015-07-01 Thread graham sanderson
If you are pushing metric data to graphite, there is org.apache.cassandra.metrics.keyspace..LiveDiskSpaceUsed.value … for each node; Easy enough to graph the sum across machines. Metrics/JMX are tied together in C*, so there is an equivalent value exposed via JMX… I don’t know what it is called

Re: What are problems with schema disagreement

2015-07-02 Thread graham sanderson
What version of C* are you running? Some versions of 2.0.x might occasionally fail to propagate schema changes in a timely fashion (though they would fix themselves eventually - in the order of a few minutes) > On Jul 2, 2015, at 9:37 PM, John Wong wrote: > > Hi. > > Here is a schema disagree

Re: Bulk loading performance

2015-07-13 Thread Graham Sanderson
Ironically in my experience the fastest ways to get data into C* are considered “anti-patterns” by most (but I have no problem saturating multiple gigabit network links if I really feel like inserting fast) It’s been a while since I tried some of the newer approaches though (my fast load code i

Re: Slow performance because of used-up "Waste" in AtomicBTreeColumns

2015-07-23 Thread Graham Sanderson
Multiple writes to a single partition key are guaranteed to be atomic. Therefore there has to be some protection. First rule of thumb, don’t write at insanely high rates to the same partition key concurrently (you can probably avoid this, but hints as currently implemented suffer because the p

Re: High CPU usage on some of nodes

2015-09-10 Thread Graham Sanderson
Haven’t been following this thread, but we run beefy machines with 8gig new gen, 12 gig old gen (down from 16g since moving memtables off heap, we can probably go lower)… Apart from making sure you have all the latest -XX: flags from cassandra-env.sh (and MALLOC_ARENA_MAX), I personally would r

Re: High CPU usage on some of nodes

2015-09-11 Thread Graham Sanderson
> > > On Thu, Sep 10, 2015 at 12:00 PM, Graham Sanderson <mailto:gra...@vast.com>> wrote: > Haven’t been following this thread, but we run beefy machines with 8gig new > gen, 12 gig old gen (down from 16g since moving memtables off heap, we can > probably go lower)…

Re: To batch or not to batch: A question for fast inserts

2015-09-27 Thread Graham Sanderson
We are about to prototype upgrading our batch inserts, so I’m really glad about this thread… we are able to saturate our dedicated network links from hadoop when inserting via thrift API (Astyanax) - at the time we wrote that code CQL wasn’t there. Reasons to replace our current solution: 1) W

Re: Running Cassandra on Java 8 u60..

2015-09-27 Thread Graham Sanderson
IMHO G1 is still buggy on JDK8 (based solely on being subscribed to the gc-dev mailing list)… I think JDK9 will be the one. > On Sep 25, 2015, at 7:14 PM, Stefano Ortolani wrote: > > I think those were referring to Java7 and G1GC (early versions were buggy). > > Cheers, > Stefano > > > On Fr

Re: addition of nodes with auth enabled on a datacenter causes existing nodes to lose their permissions

2015-10-01 Thread Graham Sanderson
You are seeing https://issues.apache.org/jira/browse/CASSANDRA-9519 > On Oct 1, 2015, at 9:16 PM, K F wrote: > > Hi, > > I have 3 DCs out of which in one of the DC, I added 20 nodes. All of the DCs > had auth enabled, it was functioning

Re: Realtime data and (C)AP

2015-10-09 Thread Graham Sanderson
Most of our writes are not user facing so local_quorum is good... We also read at local_quorum because we prefer guaranteed consistency... But we very quickly fall back to local_one in the cases where some data fast is better than a failure. Currently we do that on a per read basis but we could
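
The per-read fallback described in this message can be sketched as below. This is a minimal illustration, not the poster's code: `read_with_fallback` and the `read_at` callable are hypothetical names, and a real client would catch only the driver's timeout/unavailable errors rather than every exception.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def read_with_fallback(
    read_at: Callable[[str], T],
    levels: Sequence[str] = ("LOCAL_QUORUM", "LOCAL_ONE"),
) -> T:
    """Try each consistency level in order and return the first successful read.

    `read_at` is a hypothetical callable that performs one read at the given
    consistency level and raises on failure.
    """
    last_error: Optional[Exception] = None
    for level in levels:
        try:
            return read_at(level)
        except Exception as exc:  # assumption: narrow this to timeout/unavailable errors
            last_error = exc
    if last_error is None:
        raise ValueError("levels must not be empty")
    raise last_error
```

The stronger level is tried first, so a weaker (possibly stale) answer is only returned when the consistent read has already failed.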

Re: Realtime data and (C)AP

2015-10-09 Thread Graham Sanderson
from my iPhone > On Oct 9, 2015, at 8:02 PM, Graham Sanderson wrote: > > Most of our writes are not user facing so local_quorum is good... We also > read at local_quorum because we prefer guaranteed consistency... But we very > quickly fall back to local_one in the cases where so

Re: Realtime data and (C)AP

2015-10-10 Thread Graham Sanderson
ed the Java driver's DowngradingConsistencyRetryPolicy for that in > cases where it makes sense. > > Ref: > http://docs.datastax.com/en/drivers/java/2.1/com/datastax/driver/core/policies/DowngradingConsistencyRetryPolicy.html > > Steve > > > >> On Fri, Oct 9, 2015 at 6:06

Re: Realtime data and (C)AP

2015-10-11 Thread Graham Sanderson
things like integrate Zipkin tracing at a driver level, and add other utility > like token aware batches, and concurrent token aware batch selects. > > On Sat, Oct 10, 2015 at 2:49 PM Graham Sanderson <mailto:gra...@vast.com>> wrote: > Cool - yeah we are still on astyanax

BEWARE https://issues.apache.org/jira/browse/CASSANDRA-9504

2015-10-19 Thread Graham Sanderson
If you had Cassandra 2.0.x (possibly before) and upgraded to Cassandra 2.1, you may have had commitlog_sync: batch and commitlog_sync_batch_window_in_ms: 25 in your cassandra.yaml. It turned out that this was pretty much broken in 2.0 (i.e. fsyncs just happened immediately), but fixed in 2.1, which

Re: BEWARE https://issues.apache.org/jira/browse/CASSANDRA-9504

2015-10-19 Thread Graham Sanderson
from starving. The suggested default is now 2ms. was added retroactively to NEWS.txt in 2.1.6 which is why it is not obvious > On Oct 19, 2015, at 11:03 AM, Michael Shuler wrote: > > On 10/19/2015 10:55 AM, Graham Sanderson wrote: >> If you had Cassandra 2.0.x (possibly before)

Re: BEWARE https://issues.apache.org/jira/browse/CASSANDRA-9504

2015-10-19 Thread Graham Sanderson
issue > On Oct 19, 2015, at 11:37 AM, Graham Sanderson wrote: > > - commitlog_sync_batch_window_in_ms behavior has changed from the > maximum time to wait between fsync to the minimum time. We are > working on making this more user-friendly (see CASSANDRA-9533) but in the >

Re: unusual GC log

2015-10-20 Thread Graham Sanderson
What version of C* are you running? any special settings in cassandra.yaml; are you running with stock GC settings in cassandra-env.sh? what JDK/OS? > On Oct 19, 2015, at 11:40 PM, 曹志富 wrote: > > INFO [Service Thread] 2015-10-20 10:42:47,854 GCInspector.java:252 - ParNew > GC in 476ms. CMS O

Re: High cpu usage when the cluster is idle

2015-10-24 Thread Graham Sanderson
I would imagine you are running on fairly slow machines (given the CPU usage), but 2.0.12 and 2.1 use a fairly old version of the yammer/codahale metrics library. It is waking up every 5 seconds, and updating Meters… there are a bunch of these Meters per table (embedded in Timers), so your larg

Re: Cassandra stalls and dropped messages not due to GC

2015-10-29 Thread Graham Sanderson
You didn’t say what you upgraded from, but if it is 2.0.x, then look at CASSANDRA-9504. If so, and you use commitlog_sync: batch, then you probably want to set commitlog_sync_batch_window_in_ms: 1 (or 2). Note I’m only slightly convinced this is the cause because of your READ_REPAIR issues (though i

Re: Cassandra stalls and dropped messages not due to GC

2015-10-29 Thread Graham Sanderson
, > delivering Apache Cassandra to the world’s most innovative enterprises. > Datastax is built to be agile, always-on, and predictably scalable to any > size. With more than 500 customers in 45 countries, DataStax is the database > technology and transactional backbone of choice fo

Re: compression cpu overhead

2015-11-03 Thread Graham Sanderson
On read or write? https://issues.apache.org/jira/browse/CASSANDRA-7039 and friends in 2.2 should make some difference, I didn’t immediately find perf numbers though. > On Nov 3, 2015, at 5:42 PM, Dan Kinder wrote: > > Hey all, > > Just w

Re: why cassandra max is 20000/s on a node ?

2015-11-05 Thread Graham Sanderson
Agreed too. It also matters what you are inserting… if you are inserting to the same (or small set of) partition key(s) you will be limited because writes to the same partition key on a single node are atomic and isolated. > On Nov 5, 2015, at 8:49 PM, Venkatesh Arivazhagan > wrote: > > I agr

Re: why cassandra max is 20000/s on a node ?

2015-11-05 Thread Graham Sanderson
Also it sounds like you are reading the data from a single file - the problem could easily be with your load tool; try (as someone suggested) using cassandra-stress > On Nov 5, 2015, at 9:06 PM, Graham Sanderson wrote: > > Agreed too. It also matters what you are inserting… i

Re: Behavior difference between 2.0 and 2.1

2015-12-03 Thread Graham Sanderson
You didn’t specify which version of 2.0 you were on. There were a number of inconsistencies with static columns fixed in 2.0.10 for example CASSANDRA-7490, and CASSANDRA-7455, but there were others, and the same bugs may have caused a bunch of other issues. It very much depends exactly how you

Re: Cassandra Tuning Issue

2015-12-06 Thread Graham Sanderson
What version of C* are you using; what JVM version - you showed a partial GC config but if that is still CMS (not G1) then you are going to have insane GC pauses... Depending on C* version, are you using on/off heap memtables, and what type? Those are the sorts of issues related to fat nodes; I'

Re: cassandra full gc too often

2015-12-31 Thread Graham Sanderson
If you are lucky that might mask the real issue, but I doubt it… that is an insane number of compaction tasks and indicative of another problem. I would check release notes of 2.0.6+, if I recall that was not a stable version and may have had leaks. Aside from that, just FYI, if you use native_

Re: update static column using partition key

2014-09-07 Thread graham sanderson
Presumably you meant unread_ids to be a static column (it isn’t in your table definition) On Sep 7, 2014, at 10:14 AM, tommaso barbugli wrote: > Hi, > I am trying to use a couple of static columns; I am using cassandra 2.0.7 and > when I try to set a value using the partition key only, I get a

Re: update static column using partition key

2014-09-07 Thread graham sanderson
Note also (though you are likely not hitting them) there were a bunch of static column related edge cases fixed in 2.0.10 On Sep 7, 2014, at 1:18 PM, graham sanderson wrote: > Presumably you meant unread_ids to be a static column (it isn’t in your table > definition) > > On Sep

Re: Storage: upsert vs. delete + insert

2014-09-10 Thread graham sanderson
delete inserts a tombstone which is likely smaller than the original record (though it still (currently) has the overhead cost of the full key/column name). The data for the insert after a delete would be identical to the data if you just inserted/updated. No real benefit I can think of for doing the dele

Re: Storage: upsert vs. delete + insert

2014-09-10 Thread graham sanderson
ing. > Moreover, it needs one op more to compute resulting row. > cheers, > Olek > > 2014-09-10 22:18 GMT+02:00 graham sanderson : >> delete inserts a tombstone which is likely smaller than the original record >> (though still (currently) has overhead of cost for full key/

Re: java.lang.OutOfMemoryError: unable to create new native thread

2014-09-17 Thread graham sanderson
Are you running on a 32 bit JVM? On Sep 17, 2014, at 9:43 AM, Yatong Zhang wrote: > Hi there, > > I am using leveled compaction strategy and have many sstable files. The error > was during the startup, so any idea about this? > > ERROR [FlushWriter:4] 2014-09-17 22:36:59,383 CassandraDaemon.

Re: Unable to query with token range.. unable to make long from ‘...'

2014-09-28 Thread graham sanderson
It is expecting a 64 bit value … Murmur3 partitioner uses 64 bit long tokens… where did you get your 128 bit long from, and what partitioner are you using? On Sep 28, 2014, at 1:39 PM, Kevin Burton wrote: > I’m trying to query an entire table in parallel by splitting it up in token > ranges. >
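
As a rough illustration of the point about 64 bit tokens: with the Murmur3 partitioner, tokens are signed 64 bit longs, so a parallel scan splits the range from -2^63 to 2^63 - 1. The sketch below assumes that range; the helper names, and the table and key names passed to `range_query`, are hypothetical.

```python
MIN_TOKEN = -(2**63)     # smallest signed 64 bit value
MAX_TOKEN = 2**63 - 1    # largest signed 64 bit value


def split_token_ranges(splits):
    """Return `splits` contiguous (start, end) pairs covering the whole token range."""
    if splits < 1:
        raise ValueError("splits must be >= 1")
    total = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + (total * i) // splits for i in range(splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def range_query(table, key, start, end, first=False):
    """Build a CQL string for one sub-range; the first range includes its start token."""
    op = ">=" if first else ">"
    return (f"SELECT * FROM {table} WHERE token({key}) {op} {start} "
            f"AND token({key}) <= {end}")


if __name__ == "__main__":
    for i, (start, end) in enumerate(split_token_ranges(4)):
        print(range_query("ks.tbl", "id", start, end, first=(i == 0)))
```

Each sub-range is exclusive at its start and inclusive at its end, so consecutive queries do not read the same token twice.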

Re: Unable to query with token range.. unable to make long from ‘...'

2014-09-28 Thread graham sanderson
igned to your nodes need to be distributed throughout the > > entire possible range of tokens (0 to 2^127 -1) > > so it would need to be 2^63 -1 or 2^127-1 > > > > On Sun, Sep 28, 2014 at 1:19 PM, graham sanderson wrote: > It is expecting a 64 bit value … murmur3 partit

Re: best practice for waiting for schema changes to propagate

2014-09-30 Thread graham sanderson
Also be aware of https://issues.apache.org/jira/browse/CASSANDRA-7734 if you are using C* 2.0.6+ (2.0.6 introduced a change that can sometimes cause initial schema propagation not to happen, introducing potentially long delays until some other code path repairs it later) On Sep 30, 2014, at 1:

Re: Bitmaps

2014-10-06 Thread graham sanderson
You certainly have plenty of freedom to trade off size vs access granularity using multiple blobs. It really depends on how mutable the data is, how you intend to read it, whether it is highly sparse and/or highly dense (in which case you perhaps don’t need to store every bit) etc. On Oct 6, 20

Re: describe tables… and vertical formatting?

2014-10-12 Thread graham sanderson
select keyspace_name, columnfamily_name from system.schema_columns; ? On Oct 12, 2014, at 10:29 AM, Kevin Burton wrote: > It seems annoying that I can’t get “describe tables” to vertical. > > maybe there’s some option I’m missing? > > Kevin > > -- > > Founder/CEO Spinn3r.com > Location: S

Re: LOCAL_* consistency levels

2014-10-14 Thread graham sanderson
There were some versions of C* that didn’t allow you to use LOCAL_* with a single DC NetworkTopologyStrategy, or with SimpleStrategy (https://issues.apache.org/jira/browse/CASSANDRA-6238 I think). You should use a NetworkTopologyStrategy with one DC for now. On Oct 14, 2014, at 7:39 AM, Ro

Re: describe tables… and vertical formatting?

2014-10-14 Thread graham sanderson
is that there are multiple entries > per table... > > On Sun, Oct 12, 2014 at 10:39 AM, graham sanderson wrote: > select keyspace_name, columnfamily_name from system.schema_columns; > ? > > On Oct 12, 2014, at 10:29 AM, Kevin Burton wrote: > >> It seems annoyin

Re: Intermittent long application pauses on nodes

2014-10-24 Thread graham sanderson
This certainly sounds like a JVM bug. We are running C* 2.0.9 on pretty high end machines with pretty large heaps, and don’t seem to have seen this (note we are on 7u67, so that might be an interesting data point, though since the old thread predated that probably not) 1) From the app/java side

Re: Intermittent long application pauses on nodes

2014-10-24 Thread graham sanderson
Actually - there is -XX:+SafepointTimeout which will print out offending threads (assuming you reach a 10 second pause)… That is probably your best bet. > On Oct 24, 2014, at 2:38 PM, graham sanderson wrote: > > This certainly sounds like a JVM bug. > > We are running C*

Re: Intermittent long application pauses on nodes

2014-10-24 Thread graham sanderson
And -XX:SafepointTimeoutDelay=xxx to set how long before it dumps output (defaults to 1 I believe)… Note it doesn’t actually timeout by default, it just prints the problematic threads after that time and keeps on waiting > On Oct 24, 2014, at 2:44 PM, graham sanderson wrote: > >

Re: Intermittent long application pauses on nodes

2014-10-31 Thread graham sanderson
Dan van Kley <mailto:dvank...@salesforce.com>> wrote: > Excellent, thanks for the tips, Graham. I'll give SafepointTimeout a try and > see if that gives us anything to act on. > > On Fri, Oct 24, 2014 at 3:52 PM, graham sanderson <mailto:gra...@vast.com>>

Re: Client-side compression, cassandra or both?

2014-11-03 Thread graham sanderson
I wouldn’t do both. Unless a little server CPU or (and you’d have to measure it - I imagine it is probably not significant - as you say C* has more context, and hopefully most things can compress “0, “ repeatedly) disk space are an issue, I wouldn’t bother to compress yourself. Compression acros

Re: Why is one query 10 times slower than the other?

2014-11-05 Thread graham sanderson
In your “lookup_code” example “type” is not a clustering column, it is the partition key, and hence the first query only hits one partition. The second query is a range slice across all possible keys, so the sub-ranges are farmed out to nodes with the data. You are likely at CL_ONE, so it only needs re

Re: What actually causing java.lang.OutOfMemoryError: unable to create new native thread

2014-11-10 Thread graham sanderson
First question are you running 32bit or 64bit… on 32bit you can easily run out of virtual address space for thread stacks. > On Nov 10, 2014, at 8:25 AM, Jason Wee wrote: > > Hello people, below is an extraction from cassandra system log. > > ERROR [Thread-273] 2012-04-10 16:33:18,328 Abstract

Re: Trying to build Cassandra for FreeBSD 10.1

2014-11-17 Thread graham sanderson
Only thing I can see from looking at the exception, is that it looks like - I didn’t disassemble the code from hex - that the “peer” value in the RefCountedMemory object is probably 0 Given that Unsafe.allocateMemory should not return 0 even on allocation failure (which should throw OOM) - thou

Re: Nodes get stuck in crazy GC loop after some time, leading to timeouts

2014-11-28 Thread graham sanderson
Your GC settings would be helpful, though you can guesstimate by eyeballing (assuming settings are the same across all 4 images). Bursty load can be a big cause of old gen fragmentation (as small working set objects tend to get spilled (promoted) along with memtable slabs which aren’t flush

Re: Nodes get stuck in crazy GC loop after some time, leading to timeouts

2014-11-28 Thread graham sanderson
Nov 28, 2014, at 6:54 PM, graham sanderson wrote: > > Your GC settings would be helpful, though you can see guesstimate by > eyeballing (assuming settings are the same across all 4 images) > > Bursty load can be a big cause of old gen fragmentation (as small working set >

Re: Error when dropping keyspaces; One row required, 0 found

2014-12-02 Thread graham sanderson
I don’t know what it is but I also saw “empty” keyspaces via CQL while migrating an existing test cluster from 2.0.9 to 2.1.0 (final release bits prior to labelling). Since I was doing this manually (and had cqlsh problems due to python change) I figured it might have been me. My observation w

Re: Startup failure (Core dump) in Solaris 11 + JDK 1.8.0

2015-01-13 Thread graham sanderson
This might well be https://issues.apache.org/jira/browse/CASSANDRA-8325 try the latest patch for that if you can. > On Jan 13, 2015, at 4:50 AM, Bernardino Mota > wrote: > > Hi, > > Yes, with JDK1.7 it works but only in 32bits mode. It

Re: Versioning in cassandra while indexing ?

2015-01-21 Thread graham sanderson
I believe you can use “USING TIMESTAMP XXX” with your inserts which will set the actual cell write times to the timestamp you provide. Then at least on read you’ll get the “latest” value… you may or may not incur an actual write of the old data to disk, but either way it’ll get cleaned up for yo

Re: No schema agreement from live replicas?

2015-02-03 Thread graham sanderson
What version of C* are you using; you could be seeing https://issues.apache.org/jira/browse/CASSANDRA-7734 which I think affects 2.0.7 thru 2.0.10 > On Feb 3, 2015, at 9:47 AM, Clint Kelly wrote: > > FWIW increasing the threshold for with

Re: Fastest way to map/parallel read all values in a table?

2015-02-09 Thread graham sanderson
Depending on whether you have deletes/updates, if this is an ad-hoc thing, you might want to just read the sstables directly. > On Feb 9, 2015, at 12:56 PM, Kevin Burton wrote: > > I had considered using spark for this but: > > 1. we tried to deploy spark only to find out that it was missing

Re: OOM and high SSTables count

2015-03-04 Thread graham sanderson
We can confirm a problem on 2.1.3 (sadly our beta sstable state obviously did not match our production ones in some critical way) We have about 20k sstables on each of 6 nodes right now; actually a quick glance shows 15k of those are from OpsCenter, which may have something to do with beta/prod

Re: Upgrade from 2.0.9 to 2.1.3

2015-03-06 Thread graham sanderson
I would definitely wait for at least 2.1.4 > On Mar 6, 2015, at 8:13 AM, Fredrik Larsson Stigbäck > wrote: > > So no upgradeSSTables are required? > /Fredrik > >> 6 mar 2015 kl. 15:11 skrev Carlos Rolo > >: >> >> I would not recommend an upgrade to 2.1.x for now. Do y

Re: best practices for time-series data with massive amounts of records

2015-03-06 Thread graham sanderson
Note that using static column(s) for the “head” value, and trailing TTLed values behind is something we’re considering. Note this is especially nice if your head state includes say a map which is updated by small deltas (individual keys) We have not yet studied the effect of static columns on s

Re: Upgrade from 2.0.9 to 2.1.3

2015-03-06 Thread graham sanderson
2015, at 3:15 PM, Robert Coli wrote: > > On Fri, Mar 6, 2015 at 6:25 AM, graham sanderson <mailto:gra...@vast.com>> wrote: > I would definitely wait for at least 2.1.4 > > +1 > > https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/ >

Re: What are the reasons for holding off on 2.1.x at this point?

2015-03-09 Thread graham sanderson
2.1.3 has a few memory leaks/issues, resource management race conditions. That is horribly vague, however looking at some of the fixes in 2.1.4 I’d be tempted to wait on that. 2.1.3 is fine for testing though. > On Mar 9, 2015, at 6:42 PM, Jacob Rhoden wrote: > > I notice some of the discussi

Re: Disastrous profusion of SSTables

2015-03-26 Thread graham sanderson
you may be seeing https://issues.apache.org/jira/browse/CASSANDRA-8860 https://issues.apache.org/jira/browse/CASSANDRA-8635 related issues (which ends up with excessive numbers of sstab

Re: Huge number of sstables after adding server to existing cluster

2015-04-03 Thread graham sanderson
As does 2.1.3 > On Apr 3, 2015, at 5:36 PM, Robert Coli wrote: > > On Fri, Apr 3, 2015 at 1:04 PM, Thomas Borg Salling > wrote: > I agree with Pranay. I have experienced exactly the same on C* 2.1.2. > > 2.1.2 had a serious bug which resulted in extra files, whic

Re: Astyanax Thrift Frame Size Hardcoded - Breaks Ring Describe

2015-04-03 Thread graham sanderson
It is very stable for us; we don’t use it in many cases (generally older stuff where it was the best choice), but I think it is a little harsh to write it off > On Apr 3, 2015, at 1:55 PM, Robert Coli wrote: > > On Fri, Apr 3, 2015 at 11:16 AM, Eric Stevens > wrote: >

Re: Huge number of sstables after adding server to existing cluster

2015-04-04 Thread graham sanderson
I understand correctly 32 is the max > number for sstables for normally operating cassandra node? > > > Best regards > Mantas > > On Sat, Apr 4, 2015 at 4:47 AM, graham sanderson <mailto:gra...@vast.com>> wrote: > As does 2.1.3 > >> On

Re: Understanding Read after update

2015-04-13 Thread Graham Sanderson
Yes, it will look in each sstable that, according to the bloom filter, may have data for that partition key, and use timestamps to figure out the latest version (or none in case of a newer tombstone) to return for each clustering key. Sent from my iPhone > On Apr 12, 2015, at 11:18 PM, Anishek Agarwa
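
The read-time reconciliation described here (the newest timestamp wins, and a newer tombstone hides older data) can be sketched as follows. This is a simplified model, not Cassandra's actual read path: `Cell` and `merge_partition` are hypothetical names, each sstable is modelled as a plain list of cells for one partition, and timestamp ties are ignored.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class Cell(NamedTuple):
    clustering_key: str
    timestamp: int           # write time; larger means newer
    value: Optional[str]     # None models a tombstone (delete marker)


def merge_partition(sstables: Iterable[Iterable[Cell]]) -> Dict[str, str]:
    """Keep the newest cell per clustering key, then drop keys whose newest cell is a tombstone."""
    newest: Dict[str, Cell] = {}
    for sstable in sstables:
        for cell in sstable:
            current = newest.get(cell.clustering_key)
            if current is None or cell.timestamp > current.timestamp:
                newest[cell.clustering_key] = cell
    return {key: cell.value for key, cell in newest.items() if cell.value is not None}


if __name__ == "__main__":
    older = [Cell("a", 1, "x"), Cell("b", 1, "y")]
    newer = [Cell("a", 2, "z"), Cell("b", 3, None)]   # "b" was deleted later
    print(merge_partition([older, newer]))
```

The result does not depend on the order in which sstables are visited, because only the timestamps decide which version survives.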

DateTieredCompactionStrategy and static columns

2015-04-30 Thread graham sanderson
I have a potential use case I haven’t had a chance to prototype yet, which would normally be a good candidate for DTCS (i.e. data delivered in order and a fixed TTL), however with every write we’d also be updating some static cells (namely a few key/values in a static map CQL column). There coul

cassandra 2.2

2015-05-11 Thread graham sanderson
I think vast may have changed the release schedule of cassandra. I talk a lot with one of their key developers, and 3.0 was going to drop off heap memtables for several releases due to a rewrite of the storage engine to be more CQL friendly. 2.2 will take all of the improvements in 3.0 but not

Re: 10000+ CF support from Cassandra

2015-05-26 Thread graham sanderson
Are the CFs different, or all the same schema? Are you contractually obligated to actually separate data into separate CFs? It seems like you’d have a lot simpler time if you could use part of the partition key to separate data. Note also, I don’t know what disks you are using, but disk cach

Re: 10000+ CF support from Cassandra

2015-05-28 Thread Graham Sanderson
Depending on your use case and data types (for example if you can have a minimally nested JSON representation of the objects), then you could go with a common map representation where keys are top level object fields and values are valid JSON literals as strings; e.g. unquoted primitives, quoted str

Re: GC pauses affecting entire cluster.

2015-06-01 Thread graham sanderson
Yes native_objects is the way to go… you can tell if memtables are your problem because you’ll see promotion failures of objects sized 131074 dwords. If your h/w is fast enough make your young gen as big as possible - we can collect 8G in sub second always, and this gives you your best chance of

Re: 10000+ CF support from Cassandra

2015-06-01 Thread graham sanderson
> > I strongly advise against this approach. > Jon, I think so too. But so you actually foresee any problems with this > approach? > I can think of a few. [I want to evaluate if we can live with this problem] Just to be clear, I’m not saying this is a great approach, I AM saying that it may be be

Re: Throttle Heavy Read / Write Loads

2015-06-05 Thread Graham Sanderson
Are you doing large batch inserts via thrift - you need to be careful there Sent from my iPhone > On Jun 4, 2015, at 11:37 PM, Anishek Agarwal wrote: > > may be just increase the read and write timeouts at cassandra currently at 5 > sec i think. i think the datastax java client driver provides

Re: Question about consistency in cassandra 2.0.9

2015-06-11 Thread graham sanderson
It looks (I’m guessing with entirely not enough info) that you only have two nodes in DC4, and are probably writing at QUORUM reading at LOCAL_ONE. But please specify your configuration > On Jun 11, 2015, at 7:01 PM, K F wrote: > > Hi, > > I am running a cassandra cluster with 4 dcs. Out of 4

Re: Cassandra 2.2, 3.0, and beyond

2015-06-11 Thread graham sanderson
I think the point is that 2.2 will replace 2.1.x + (i.e. the done/safe bits of 3.0 are included in 2.2).. so 2.2.x and 2.1.x are somewhat synonymous. > On Jun 11, 2015, at 8:14 PM, Mohammed Guller wrote: > > Considering that 2.1.6 was just released and it is the first “stable” release > ready

Question about consistency levels

2013-11-09 Thread graham sanderson
I’m trying to be more succinct this time since no answers on my last attempt. We are currently using 2.0.2 in test (no C* in production yet), and use (LOCAL_)QUORUM CL on read and writes which guarantees (if successful) that we read latest data. That said, it is highly likely that (LOCAL_)ONE w
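
The guarantee mentioned above (QUORUM on both reads and writes means a successful read sees the latest successful write) follows from simple replica arithmetic: the read set and the write set must overlap in at least one replica. A small sketch, with hypothetical helper names:

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a quorum: a strict majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(write_replicas: int, read_replicas: int,
                           replication_factor: int) -> bool:
    """True when any read set must overlap any write set (R + W > RF)."""
    return write_replicas + read_replicas > replication_factor


if __name__ == "__main__":
    rf = 3
    q = quorum(rf)
    print(q, reads_see_latest_write(q, q, rf))   # quorum writes and quorum reads
    print(reads_see_latest_write(1, 1, rf))      # ONE writes and ONE reads
```

With a replication factor of 3, quorum is 2 and 2 + 2 > 3, so the sets always overlap; at ONE/ONE, 1 + 1 is not greater than 3, so a read may land on a replica that has not yet received the write.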

Re: Question about consistency levels

2013-11-10 Thread graham sanderson
erent version of Cassandra, a different > client and likely have different read/write/delete usage. > > Hope that helps. > > -Original Message- > From: graham sanderson [mailto:gra...@vast.com] > Sent: 10 November 2013 06:12 > To: user@cassandra.apache.org >

Disaster recovery question

2013-11-16 Thread graham sanderson
We are currently looking to deploy on the 2.0 line of cassandra, but obviously are watching for bugs (we are currently on 2.0.2) - we are aware of a couple of interesting known bugs to be fixed in 2.0.3 and one in 2.1, but none have been observed (in production use cases) or are likely to affect

Re: Disaster recovery question

2013-11-16 Thread graham sanderson
:13 PM, Mikhail Stepura wrote: > Looks like someone has the same (1-4) questions: > https://issues.apache.org/jira/browse/CASSANDRA-6364 > > -M > > "graham sanderson" wrote in message > news:7161e7e0-cf24-4b30-b9ca-2faafb0c4...@vast.com... > > We are curren

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-10 Thread graham sanderson
Perhaps not the way forward, however I can bulk insert data via astyanax at a rate that maxes out our (fast) networks. That said for our next release (of this part of our product - our other current is node.js via binary protocol) we will be looking at insert speed via java driver, and also alte

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-10 Thread graham sanderson
I should probably give you a number: it is about 300 meg/s via the thrift API, using 1mb batches. On Dec 10, 2013, at 5:14 AM, graham sanderson wrote: > Perhaps not the way forward, however I can bulk insert data via astyanax at a > rate that maxes out our (fast) networks. That said f

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-10 Thread graham sanderson
he CQL interface is in. Where does that leave Astyanax? > > On Tue, Dec 10, 2013 at 1:14 PM, graham sanderson wrote: >> Perhaps not the way forward, however I can bulk insert data via astyanax at >> a rate that maxes out our (fast) networks. That said for our next release >>

Re: Clarification on how multi-DC replication works

2014-02-11 Thread graham sanderson
slightly off topic, but does anyone know off the top of their head what happens if data is being written at LOCAL_QUORUM to a multi data center setup faster than the inter data center link can handle… something has to block, throw an exception, die, or have unbounded growth (memory, threads, on

Re: Consistency Level One Question

2014-02-20 Thread graham sanderson
Writing at a consistency level of ONE means that your write will be acknowledged as soon as one replica confirms that it has made the write to memtable and the commit log (might not be quite synced to disk, but that’s a separate issue). All the writes are submitted in parallel, so it is very pos

Re: Consistency Level One Question

2014-02-20 Thread graham sanderson
Note also that when reading at ONE there will be no read repair, since the coordinator does not know that another replica has stale data (remember at ONE, basically only one node is asked for the answer). In practice for our use cases, we always write at LOCAL_QUORUM (failing the whole update if th

Re: Consistency Level One Question

2014-02-21 Thread graham sanderson
carry out read repair by getting data from all the nodes. */ On Feb 21, 2014, at 3:10 AM, Duncan Sands wrote: > Hi Graham, > > On 21/02/14 07:54, graham sanderson wrote: >> Note also; that reading at ONE there will be no read repair, since the >> coordinator does not know

Re: read one -- internal behavior

2014-03-08 Thread graham sanderson
Note that article pretty much covers it all; the nice thing about rapid-read protection is that the dynamic snitch works on a per node statistics level to pick which node(s) (in this case one), so a single poorly performing table (perhaps corrupted SSTables on that node causing no responses and

binary protocol server side sockets

2014-04-08 Thread graham sanderson
Is there a way to configure KEEPALIVE on the server end sockets of the binary protocol? rpc_keepalive only affects thrift. This is on 2.0.5. Thanks, Graham

Re: binary protocol server side sockets

2014-04-09 Thread graham sanderson
built and maybe the connection is already established... > > Regards > > Duy Hai DOAN > > > On Wed, Apr 9, 2014 at 12:59 AM, graham sanderson wrote: > Is there a way to configure KEEPALIVE on the server end sockets of the binary > protocol. > > rpc_keepalive

Re: binary protocol server side sockets

2014-04-09 Thread graham sanderson
VPNs fault in this case… that said and maybe this is a dev list question, it seems like the option to set keepalive should exist. On Apr 9, 2014, at 12:25 PM, Michael Shuler wrote: > On 04/09/2014 11:39 AM, graham sanderson wrote: >> Thanks, but I would think that just sets keep alive

Re: binary protocol server side sockets

2014-04-09 Thread graham sanderson
particular harm to setting keepalive) On Apr 9, 2014, at 1:34 PM, Michael Shuler wrote: > On 04/09/2014 12:41 PM, graham sanderson wrote: >> Michael, it is not that the connections are being dropped, it is that >> the connections are not being dropped. > > Thanks for the clarificatio

Re: Question about READS in a multi DC environment.

2014-05-11 Thread graham sanderson
You have a read_repair_chance of 1.0, which is probably why your query is hitting all data centers. On May 11, 2014, at 3:44 PM, Mark Farnan wrote: > I’m trying to understand READ load in Cassandra across a multi-datacenter > cluster. (Specifically why it seems to be hitting more than one DC)

Re: Cyclop - CQL web based editor has been released!

2014-05-11 Thread graham sanderson
Looks cool - giving it a try now (note FYI when building, TestDataConverter.java line 46 assumes a specific time zone) On May 11, 2014, at 12:41 AM, Maciej Miklas wrote: > Hi everybody, > > I am aware that this mailing list is meant for Cassandra users, but I’ve > developed something that is
