Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Carl Mueller
"large/giant clusters and admins are the target audience for the value we select" There are reasons aside from massive scale to pick cassandra, but the primary reason cassandra is selected technically is to support vertically scaling to large clusters. Why pick a value that once you reach scale y

Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Carl Mueller
edit: 4 is bad at small cluster sizes and could scare off adoption On Fri, Jan 31, 2020 at 12:15 PM Carl Mueller wrote: > "large/giant clusters and admins are the target audience for the value we > select" > > There are reasons aside from massive scale to pick cassan

Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Carl Mueller
So why even have virtual nodes at all? Why not work on improving single-token approaches so that we can support cluster doubling, which IMO would enable cassandra to scale more quickly for volatile loads? It's my guess/understanding that vnodes eliminate the token rebalancing that existed back in
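
For context, the knob under debate lives in cassandra.yaml. A minimal sketch of the lower-token configuration being discussed (values illustrative; 3.x shipped num_tokens: 256, and the allocation setting below is the 4.0-era companion knob; 3.x has allocate_tokens_for_keyspace instead):

    # cassandra.yaml
    num_tokens: 16
    # bias random token selection toward even ownership for a target RF
    allocate_tokens_for_local_replication_factor: 3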

gossip tuning

2019-09-26 Thread Carl Mueller
We have three datacenters (EU, AP, US) in AWS and have problems bootstrapping new nodes in the AP datacenter: java.lang.RuntimeException: A node required to move the data consistently is down (/SOME_NODE). If you wish to move the data from a potentially inconsistent replica, restart the node with
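
For reference, the standard (if blunt) way past that exception is the startup flag the error message goes on to name; it disables consistent range movement for the bootstrap, so the streamed data may need a repair afterwards:

    # cassandra-env.sh (or the command line) on the bootstrapping node
    JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false"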

Re: Bootstrapping process questions (CASSANDRA-15155)

2019-06-17 Thread Carl Mueller
We are in conversations with AWS, hopefully with an IPv6 expert, to examine what happened. On Thu, Jun 13, 2019 at 11:19 AM Carl Mueller wrote: > Our cassandra.ring_delay_ms is currently around 30 to get nodes to bootstrap. > On Wed, Jun 12, 2019 at 5:56 PM Carl Mueller wr

Re: Bootstrapping process questions (CASSANDRA-15155)

2019-06-13 Thread Carl Mueller
Our cassandra.ring_delay_ms is currently around 30 to get nodes to bootstrap. On Wed, Jun 12, 2019 at 5:56 PM Carl Mueller wrote: > We're seeing nodes bootstrapping but not streaming and joining a cluster in 2.2.13. > I have been looking through the MigrationMan
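
For reference, ring_delay_ms is a JVM system property controlling how long a joining node waits for gossip to settle; the default is 30000 ms. A sketch with an illustrative raised value:

    # cassandra-env.sh -- give gossip longer to settle before bootstrap
    JVM_OPTS="$JVM_OPTS -Dcassandra.ring_delay_ms=300000"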

Bootstrapping process questions (CASSANDRA-15155)

2019-06-12 Thread Carl Mueller
We're seeing nodes bootstrapping but not streaming and joining a cluster in 2.2.13. I have been looking through the MigrationManager code and the StorageService code that seems relevant based on the Bootstrap status messages that are coming through. I'll be referencing line numbers from the 2.2.13

Upgrading 2.1.x with EC2MRS problems: CASSANDRA-15068

2019-03-26 Thread Carl Mueller
Can someone do a quick check of https://issues.apache.org/jira/browse/CASSANDRA-15068?jql=text%20~%20%22EC2MRS%22 I think we may do a custom old-behavior snitch class that doesn't have the broadcast_rpc_address==null check and see if that works. But that seems extreme; can someone who knows the
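
A guess at the less invasive alternative, assuming the failing check fires because broadcast_rpc_address is unset: set it explicitly per node rather than shipping a custom snitch class (the address below is illustrative):

    # cassandra.yaml
    endpoint_snitch: Ec2MultiRegionSnitch
    broadcast_rpc_address: 203.0.113.10   # this node's public IP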

Re: SSTable exclusion from read path based on sstable metadata marked by custom compaction strategies

2019-02-01 Thread Carl Mueller
I'd still need a "all events for app_id" query. We have seconds-level events :-( On Fri, Feb 1, 2019 at 3:02 PM Jeff Jirsa wrote: > On Fri, Feb 1, 2019 at 12:58 PM Carl Mueller > wrote: > > > Jeff: so the partition key with timestamp would then need a separate

Re: SSTable exclusion from read path based on sstable metadata marked by custom compaction strategies

2019-02-01 Thread Carl Mueller
ecated". On Fri, Feb 1, 2019 at 2:53 PM Carl Mueller wrote: > Interesting. Now that we have semiautomated upgrades, we are going to > hopefully get everything to 3.11X once we get the intermediate hop to 2.2. > > I'm thinking we could also use sstable metadata markings + c

Re: SSTable exclusion from read path based on sstable metadata marked by custom compaction strategies

2019-02-01 Thread Carl Mueller
tly, so I’m not sure if it’s current state, but worth considering that this may be much better on 3.0+ -- Jeff Jirsa > On Jan 31, 2019, at 1:56 PM, Carl Mueller <carl.muel...@smartthings.co

SSTable exclusion from read path based on sstable metadata marked by custom compaction strategies

2019-01-31 Thread Carl Mueller
Situation: We use TWCS for a task history table (partition key is user, column key is timeuuid of task; TWCS is used because tombstone TTLs rotate out the tasks every month or so). However, if we want to get a "slice" of tasks (say, tasks in the last two days) and we are using TWCS sstable blocks o
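
A sketch of the kind of table being described, with illustrative names (user partition key, timeuuid clustering, TTL-driven rotation under TWCS):

    CREATE TABLE scheduler.task_history (
        user_id  text,
        task_id  timeuuid,
        payload  text,
        PRIMARY KEY ((user_id), task_id)
    ) WITH CLUSTERING ORDER BY (task_id DESC)
      AND default_time_to_live = 2592000   -- ~one month rotation
      AND compaction = {
          'class': 'TimeWindowCompactionStrategy',
          'compaction_window_unit': 'DAYS',
          'compaction_window_size': '1'
      };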

Re: SEDA, queues, and a second lower-priority queue set

2019-01-16 Thread Carl Mueller
additionally, a certain number of the threads in each stage could be restricted from serving the low-priority queues at all, say 8/32 or 16/32 threads, to further ensure processing availability to the higher-priority tasks. On Wed, Jan 16, 2019 at 3:04 PM Carl Mueller wrote: > At a theoreti

SEDA, queues, and a second lower-priority queue set

2019-01-16 Thread Carl Mueller
At a theoretical level, assuming it could be implemented with a magic wand: would there be value in having a dual set of queues/threadpools at each of the SEDA stages inside cassandra, for a two-tier priority scheme? Such that you could mark queries that return pages and pages of data as lower-priority w
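
A minimal Java sketch of the magic-wand version: two queues per stage, workers draining high-priority first, with some workers reserved for high-priority only (per the 8/32 idea above). Names are illustrative, not Cassandra's actual SEDA classes:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.TimeUnit;

    class TwoTierStage {
        private final BlockingQueue<Runnable> high = new LinkedBlockingQueue<>();
        private final BlockingQueue<Runnable> low  = new LinkedBlockingQueue<>();

        void submit(Runnable task, boolean lowPriority) {
            (lowPriority ? low : high).add(task);
        }

        // Worker loop; "reserved" workers never serve the low-priority queue,
        // guaranteeing capacity for high-priority tasks under load.
        void workerLoop(boolean reservedForHigh) throws InterruptedException {
            while (!Thread.currentThread().isInterrupted()) {
                Runnable task = high.poll(10, TimeUnit.MILLISECONDS);
                if (task == null && !reservedForHigh) {
                    task = low.poll(10, TimeUnit.MILLISECONDS);
                }
                if (task != null) task.run();
            }
        }
    }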

Re: EOL 2.1 series?

2019-01-16 Thread Carl Mueller
A second vote for any bugfixes we can get for 2.1; we'd probably use it. We are finally getting traction behind upgrades with an automated upgrade/rollback for 2.1 --> 2.2, but it's going to be a while. Aaron, if you want to use our 2.1-2.2 migration tool, we can talk about it at the next MPLS meetu

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-11-03 Thread Carl Mueller
IMO slightly bigger memory requirements in exchange for substantial improvements is a good trade, especially for a 4.0 release of the database. Optane and lots of other memory tiers are coming down the hardware pipeline, and risk-wise almost all cassandra people know to testbed the major versions, so major versio
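
The knob is also settable per table today; a sketch of the DDL (ks.tbl illustrative; 64 KB was the long-standing default, and the ticket's direction was 16 KB or lower). The memory cost comes from compressed-chunk offset metadata: smaller chunks mean more offsets held per sstable.

    ALTER TABLE ks.tbl WITH compression = {
        'class': 'LZ4Compressor',
        'chunk_length_in_kb': '16'
    };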

Re: Built in trigger: double-write for app migration

2018-10-19 Thread Carl Mueller
s), medium-term/less common (days to weeks), long/years), with the aim of avoiding having to do compaction at all and just truncating buckets as they "expire", for a nice O(1) compaction process. On Fri, Oct 19, 2018 at 9:57 AM Carl Mueller wrote: > new DC and then split is one way, b

Re: Built in trigger: double-write for app migration

2018-10-19 Thread Carl Mueller
or forwards a copy of any write request regarding tokens that are being transferred to the new node. [1] Incremental Elasticity for NoSQL Data Stores, SRDS’17, https://ieeexplore.ieee.org/document/8069080 > On 18 Oct 2018, at 18:53, Carl Mueller

Re: Built in trigger: double-write for app migration

2018-10-18 Thread Carl Mueller
pling is adding an extra instance with the same schema to test things like yaml params or compaction without impacting reads or correctness - it’s different than what you describe -- Jeff Jirsa > On Oct 18, 2018, at 5:57 PM, Carl Mueller

Re: Built in trigger: double-write for app migration

2018-10-18 Thread Carl Mueller
I guess there is also write-survey-mode from cass 1.1: https://issues.apache.org/jira/browse/CASSANDRA-3452 Were triggers intended to supersede this capability? I can't find a lot of "user level" info on it. On Thu, Oct 18, 2018 at 10:53 AM Carl Mueller wrote: > tl;dr: a

Built in trigger: double-write for app migration

2018-10-18 Thread Carl Mueller
tl;dr: a generic trigger on TABLES that will mirror all writes to facilitate data migrations between clusters or systems. What is necessary to ensure full write mirroring/coherency? When cassandra clusters have several "apps" aka keyspaces serving applications colocated on them, but the app/keyspa
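
A hedged sketch of what such a trigger could look like against the 3.x trigger API (ITrigger and its augment signature are real; the forwarding logic is an illustrative stub):

    import java.util.Collection;
    import java.util.Collections;
    import org.apache.cassandra.db.Mutation;
    import org.apache.cassandra.db.partitions.Partition;
    import org.apache.cassandra.triggers.ITrigger;

    public class MirrorWritesTrigger implements ITrigger {
        // Invoked for every write to a table this trigger is attached to.
        // Returning no extra mutations leaves the local write unchanged.
        public Collection<Mutation> augment(Partition update) {
            forwardToMirror(update);
            return Collections.emptyList();
        }

        // Illustrative stub: serialize the update and ship it to the peer
        // cluster or system (e.g. via a second CQL session or a queue).
        private void forwardToMirror(Partition update) { }
    }

It attaches with plain CQL: CREATE TRIGGER mirror ON ks.tbl USING 'MirrorWritesTrigger';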

Re: Proposing an Apache Cassandra Management process

2018-10-16 Thread Carl Mueller
I too have built a framework over the last year, similar to what cstar does, but for our purposes at smartthings. The intention is to OSS it, but it needs a round of polish, since it really is more of a utility toolbox for our small cassandra group. It relies heavily on ssh, performing work on th

Re: Java 11 Z garbage collector

2018-09-06 Thread Carl Mueller
Thanks Jeff. On Fri, Aug 31, 2018 at 1:01 PM Jeff Jirsa wrote: > Read heavy workload with wider partitions (like 1-2gb) and disable the key cache will be worst case for GC -- Jeff Jirsa > On Aug 31, 2018, at 10:51 AM, Carl Mueller

Re: Transient Replication 4.0 status update

2018-08-31 Thread Carl Mueller
at least for now. I think we want a design that changes the operational, availability, and consistency story as little as possible when it's completed. Ariel On Fri, Aug 31, 2018, at 2:27 PM, Carl Mueller wrote: > Sorry to spam this with two messages...

Re: Transient Replication 4.0 status update

2018-08-31 Thread Carl Mueller
if I understand the paper/protocol to RF3+1 transient. On Fri, Aug 31, 2018 at 1:07 PM Carl Mueller wrote: > I put these questions on the ticket too... Sorry if some of them are stupid. > So are these transient nodes basically serving as centralized hinted handof

Re: Transient Replication 4.0 status update

2018-08-31 Thread Carl Mueller
I put these questions on the ticket too... Sorry if some of them are stupid. So are these transient nodes basically serving as centralized hinted handoff caches, rather than having the hinted handoffs cluttering up full replicas, especially nodes that have no concern for the token range
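
For reference, 4.0 surfaces the feature through the replication options, using a total/transient count per DC (and, if I recall the yaml correctly, gates it behind enable_transient_replication as experimental). A sketch matching the RF3+1 shape above, so four replicas with one transient:

    CREATE KEYSPACE ks WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'dc1': '4/1'
    };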

Re: Java 11 Z garbage collector

2018-08-31 Thread Carl Mueller
I'm assuming the p99 that Rocksandra tries to target is caused by GC pauses. Does anyone have data patterns or datasets that will generate GC pauses in Cassandra, to highlight the abilities of Rocksandra (and... Scylla?) and perhaps this GC approach? On Thu, Aug 30, 2018 at 8:11 PM Carl Mu

Re: Java 11 Z garbage collector

2018-08-30 Thread Carl Mueller
l. > That said, I didn't try it with a huge heap (I think it was 16 or 24GB), so maybe it'll do better if I throw 50 GB RAM at it. > On Thu, Aug 30, 2018 at 8:42 AM Carl Mueller wrote: > https://www.opsian.com/blog/javas-new-zgc-is-very-exci

Java 11 Z garbage collector

2018-08-30 Thread Carl Mueller
https://www.opsian.com/blog/javas-new-zgc-is-very-exciting/ ... max of 4ms for stop-the-world, large terabyte heaps; seems promising. Will this be a major boon to cassandra p99 times? Anyone know the aspects of cassandra that cause the most churn and lead to StopTheWorld GC? I was under the impres
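
For anyone wanting to try it, the switch is small; ZGC shipped as experimental in Java 11 (Linux/x64 only at the time):

    # JVM options, e.g. appended to conf/jvm.options
    -XX:+UnlockExperimentalVMOptions
    -XX:+UseZGC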

Re: replicated data in different sstables

2018-07-25 Thread Carl Mueller
Oh duh, RACS does this already. But it would be nice to get some education on the bloom filter memory use vs # sstables question. On Wed, Jul 25, 2018 at 10:41 AM Carl Mueller wrote: > It would seem to me that if the replicated data managed by a node is in > separate sstables from the
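
On the bloom filter memory question, the standard sizing formula gives a first-order answer: a filter for n keys at false-positive rate p needs about

    m = -n * ln(p) / (ln 2)^2  bits

which is roughly 9.6 bits per key at p = 0.01 (so ~120 MB for 10^8 keys). Since each sstable's filter is sized by its own key count, splitting the same rows across more sstables grows total filter memory with the number of sstables each partition lands in.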

replicated data in different sstables

2018-07-25 Thread Carl Mueller
It would seem to me that if the replicated data managed by a node is in separate sstables from the "main" data it manages, when a new node came online it would be easier to discard the data it no longer is responsible for since it was shifted a slot down the ring. Generally speaking I've been aski

RangeAwareCompaction for manual token management

2018-07-19 Thread Carl Mueller
I don't want to comment on the 10540 ticket since it seems very well focused on vnode-aligned sstable partitioning and compaction. I'm pretty excited about that ticket. RACS should enable: - smaller-scale LCS, more constrained I/O consumption - fewer sstables to hit in the read path - multithreaded/mul

Re: Quantifying Virtual Node Impact on Cassandra Availability

2018-04-17 Thread Carl Mueller
; fairly confident there’s a JIRA for this, if not it’s been discussed in person among various operators for years as an obvious future improvement. -- Jeff Jirsa > On Apr 17, 2018, at 8:17 AM, Carl Mueller <carl.muel

Re: Quantifying Virtual Node Impact on Cassandra Availability

2018-04-17 Thread Carl Mueller
particular - I’m fairly confident there’s a JIRA for this, if not it’s been discussed in person among various operators for years as an obvious future improvement. -- Jeff Jirsa > On Apr 17, 2018, at 8:17 AM, Carl Mueller wrote:

Re: Quantifying Virtual Node Impact on Cassandra Availability

2018-04-17 Thread Carl Mueller
Do vnodes address anything besides alleviating cluster planners from doing token range management on nodes manually? Do we have a centralized list of advantages they provide beyond that? There seem to be lots of downsides: 2i performance, the availability issue above, etc. I also wonder if in vno

Re: Repair scheduling tools

2018-04-16 Thread Carl Mueller
16, 2018 at 12:21 PM, Carl Mueller wrote: > Is the fundamental nature of sstable fragmentation the big wrench here? I've been trying to imagine aids like an offline repair resolver or a gradual node replacement/regenerator process that could serve as a backstop/insurance for

Re: Repair scheduling tools

2018-04-16 Thread Carl Mueller
Is the fundamental nature of sstable fragmentation the big wrench here? I've been trying to imagine aids like an offline repair resolver or a gradual node replacement/regenerator process that could serve as a backstop/insurance for compaction and repair problems. After all, some of the "we don't ev

Re: Repair scheduling tools

2018-04-03 Thread Carl Mueller
The Last Pickle's Reaper should be the starting point of any discussion on repair scheduling. On Tue, Apr 3, 2018 at 12:48 PM, Blake Eggleston wrote: > Hi dev@, > The question of the best way to schedule repairs came up on CASSANDRA-14346, and I thought it would be good to bring up the idea o

Re: [DISCUSS] java 9 and the future of cassandra on the jdk

2018-03-25 Thread Carl Mueller
sions > may need to fiddle with yum and apt sources to get OpenJDK 8, but this is a relatively solved problem.) > Users have the ability to deviate and set a JAVA_HOME env var to use a custom-installed JDK of their liking, or go dow

Re: [DISCUSS] java 9 and the future of cassandra on the jdk

2018-03-23 Thread Carl Mueller
I am now thinking that aligning to the major JDK releases that carry paid long-term support (three years, if you want it) is the best strategy. What I think will happen is that a consortium will maintain/backport that release level independent of oracle, if only to spite them. I'm thinking IBM, Azul, etc

Re: [DISCUSS] java 9 and the future of cassandra on the jdk

2018-03-22 Thread Carl Mueller
Is OpenJDK really not addressing this at all? Is that because OpenJDK is beholden to Oracle somehow? This is a major disservice to Apache and the java ecosystem as a whole. When java was fully open sourced, it was supposed to free the ecosystem to a large degree from Oracle. Why is OpenJDK being s

Re: [DISCUSS] java 9 and the future of cassandra on the jdk

2018-03-20 Thread Carl Mueller
So this is basically Oracle imposing a rapid upgrade path on free users to force them to buy commercial support for LTS stability? This will probably shake out in the community somehow. Cassandra is complex, but we are small fry in the land of IT support and enterprise upgrades. Something will organize

Re: Making RF4 useful aka primary and secondary ranges

2018-03-15 Thread Carl Mueller
cassandra.apache.org%3E > Which led to the creation of this JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13645 > On Wed, Mar 14, 2018 at 4:23 PM, Carl Mueller <carl.muel...@smartthings.com> wrote: > Since this is basically driver syntacti

Re: Making RF4 useful aka primary and secondary ranges

2018-03-14 Thread Carl Mueller
Wed, Mar 14, 2018 at 3:47 PM Carl Mueller wrote: > But we COULD have CL2 write (for RF4) > The extension to this idea is multiple backup/secondary replicas. So you have RF5 or RF6 or higher, but still are performing CL2 against the pr

Re: Making RF4 useful aka primary and secondary ranges

2018-03-14 Thread Carl Mueller
datacenters, but are doing local_quorum on the one datacenter. Well, except switchover is a bit more granular if you run out of replicas in the local. On Wed, Mar 14, 2018 at 5:17 PM, Jeff Jirsa wrote: > Write at CL 3 and read at CL 2 -- Jeff Jirsa > On Mar 14

Re: Making RF4 useful aka primary and secondary ranges

2018-03-14 Thread Carl Mueller
I also wonder if the state of hinted handoff can inform the validity of extra replicas. Repair is mentioned in 7168. On Wed, Mar 14, 2018 at 4:55 PM, Carl Mueller wrote: > For my reference: https://issues.apache.org/jira/browse/CASSANDRA-7168 > On Wed, Mar 14, 2018 at 4:

Re: Making RF4 useful aka primary and secondary ranges

2018-03-14 Thread Carl Mueller
> jira/browse/CASSANDRA-13442 > It's been discussed quite a bit offline and I did a presentation on it at NGCC. Hopefully we will see some movement on it soon. > Ariel > On Wed, Mar 14, 2018, at 5:40 PM, Carl Mueller wrote: > Currently there is little use for RF4

Making RF4 useful aka primary and secondary ranges

2018-03-14 Thread Carl Mueller
Currently there is little use for RF4. You're paying the requirements of a quorum of 3 but getting only one extra backup. I'd like to propose something that would make RF4 a sort of more heavily backed-up RF3. A lot of this is probably achievable with strictly driver-level logic, so perhaps it would belong m
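
The consistency arithmetic behind the thread: reads and writes overlap when R + W > RF, so RF4 run at quorum is 3 + 3, hence "the requirements of a quorum of 3". Jeff's suggestion later in the thread (write at CL 3, read at CL 2) keeps the overlap with 3 + 2 = 5 > 4. A sketch with the 3.x-era DataStax Java driver (ks.tbl and the session wiring are illustrative):

    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;
    import com.datastax.driver.core.Statement;

    class Rf4Consistency {
        // RF4: write at THREE, read at TWO -> 3 + 2 > 4, replica sets overlap
        static ResultSet writeThenRead(Session session, String key, String value) {
            Statement write = new SimpleStatement(
                    "INSERT INTO ks.tbl (pk, v) VALUES (?, ?)", key, value)
                    .setConsistencyLevel(ConsistencyLevel.THREE);
            session.execute(write);

            Statement read = new SimpleStatement(
                    "SELECT v FROM ks.tbl WHERE pk = ?", key)
                    .setConsistencyLevel(ConsistencyLevel.TWO);
            return session.execute(read);
        }
    }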

Re: Why isn't there a separate JVM per table?

2018-02-22 Thread Carl Mueller
Alternative: JVM per vnode. On Thu, Feb 22, 2018 at 4:52 PM, Carl Mueller wrote: > Bloom filters... nevermind > On Thu, Feb 22, 2018 at 4:48 PM, Carl Mueller <carl.muel...@smartthings.com> wrote: > Is the current reason for a large starting heap due to t

Re: Why isn't there a separate JVM per table?

2018-02-22 Thread Carl Mueller
Bloom filters... nevermind. On Thu, Feb 22, 2018 at 4:48 PM, Carl Mueller wrote: > Is the current reason for a large starting heap due to the memtable? > On Thu, Feb 22, 2018 at 4:44 PM, Carl Mueller <carl.muel...@smartthings.com> wrote: > ... compact

Re: Why isn't there a separate JVM per table?

2018-02-22 Thread Carl Mueller
Is the current reason for a large starting heap due to the memtable? On Thu, Feb 22, 2018 at 4:44 PM, Carl Mueller wrote: > ... compaction on its own jvm was also something I was thinking about, but then I realized even more JVM sharding could be done at the table level. > On

Re: Why isn't there a separate JVM per table?

2018-02-22 Thread Carl Mueller
> orgs who run multiple cassandra instances on the same node (multiple gossipers in that case is at least a little wasteful). > I've also played around with using domain sockets for IPC inside of cassandra. I never ran a proper benchmark, but

Why isn't there a separate JVM per table?

2018-02-22 Thread Carl Mueller
GC pauses may have been improved in newer releases (we are on 2.1.x), but I was wondering why cassandra uses one jvm for all tables and keyspaces, intermingling the heap for on-JVM objects. ... so why doesn't cassandra spin off a jvm per table, so each jvm can be tuned per table and gc tuned a

penn state academic paper - "scalable" bloom filters

2018-02-22 Thread Carl Mueller
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.62.7953&rep=rep1&type=pdf looks to be an adaptive approach where the "initial guess" bloom filters are enhanced with more filter layers generated after usage stats are gained. Disclaimer: I suck at reading academic papers.

Memtable flush -> SSTable: customizable or same for all compaction strategies?

2018-02-20 Thread Carl Mueller
When memtables/CommitLogs are flushed to disk/sstable, does the sstable go through organization specific to each compaction strategy, or is sstable creation the same for all compaction strategies, with it left to the compaction strategy to recompact the sstable later if desired?

Re: scheduled work compaction strategy

2018-02-17 Thread Carl Mueller
wrote: > There’s a company using TWCS in this config - I’m not going to out them, but I think they do it (or used to) with aggressive tombstone sub-properties. They may have since extended/enhanced it somewhat. -- Jeff Jirsa > On Feb 16, 2018, at 2

Re: scheduled work compaction strategy

2018-02-16 Thread Carl Mueller
sstable. This strategy assumes TTLs would be cleaning up these row fragments, so that the distribution of the data across many sstables wouldn't pollute the bloom filters too much. On Fri, Feb 16, 2018 at 4:24 PM, Carl Mueller wrote: > Oh and as a further refinement outside of our

Re: scheduled work compaction strategy

2018-02-16 Thread Carl Mueller
fault, and as the near term comes into play, that is considered a different "level". Of course all this relies on the ability to look at the data in the rowkey or the TTL associated with the row. On Fri, Feb 16, 2018 at 4:17 PM, Carl Mueller wrote: > We have a scheduler app here at sm

scheduled work compaction strategy

2018-02-16 Thread Carl Mueller
We have a scheduler app here at smartthings, where we track per-second tasks to be executed. These are all TTL'd to be destroyed once the second they were registered for has passed. If the scheduling window were sufficiently small, say 1 day, we could probably use a time window compaction s

Re: row tombstones as a separate sstable citizen

2018-02-16 Thread Carl Mueller
for these than the general row cache difficulties for cassandra data. Those caches could only be loaded during compaction operations too. On Thu, Feb 15, 2018 at 11:24 AM, Jeff Jirsa wrote: > Worth a JIRA, yes > On Wed, Feb 14, 2018 at 9:45 AM, Carl Mueller <carl.muel

Re: row tombstones as a separate sstable citizen

2018-02-14 Thread Carl Mueller
So is this at least a decent candidate for a feature request ticket? On Tue, Feb 13, 2018 at 8:09 PM, Carl Mueller wrote: > I'm particularly interested in getting the tombstones to "promote" up the levels of LCS more quickly. Currently they get attached at the low level

Re: row tombstones as a separate sstable citizen

2018-02-13 Thread Carl Mueller
gic from 7109 would probably go a long way. Though if you are bulk inserting deletes, that is what you would end up with, so maybe it already works. -Jeremiah > On Feb 13, 2018, at 6:04 PM, Jeff Jirsa wrote: > On Tue, Feb 13, 2018 at 2:38 PM, Carl Mu

row tombstones as a separate sstable citizen

2018-02-13 Thread Carl Mueller
In the process of doing my second major data purge from a cassandra system. Almost all of my purging is done via row tombstones. While performing this the second time, and trying to cajole compaction (LeveledCompaction in 2.1.x) to goddamn actually compact the data, I've been thinking as t
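
For concreteness, the pattern in question and the knobs usually reached for when coaxing purges along (ks.tbl illustrative; the subproperties are real and apply to LCS as to the other strategies):

    -- a "row tombstone" (partition-level delete): one small marker that
    -- must eventually be compacted together with every fragment of the
    -- partition before any of the data can actually be dropped
    DELETE FROM ks.tbl WHERE pk = 'some-key';

    -- let single-sstable tombstone compactions run more eagerly
    ALTER TABLE ks.tbl WITH compaction = {
        'class': 'LeveledCompactionStrategy',
        'tombstone_threshold': '0.05',
        'unchecked_tombstone_compaction': 'true'
    };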