"large/giant clusters and admins are the target audience for the value we
select"
There are reasons aside from massive scale to pick cassandra, but the
primary technical reason cassandra is selected is to support scaling out
to large clusters.
Why pick a value that once you reach scale y
edit: 4 is bad at small cluster sizes and could scare off adoption
On Fri, Jan 31, 2020 at 12:15 PM Carl Mueller
wrote:
> "large/giant clusters and admins are the target audience for the value we
> select"
>
> There are reasons aside from massive scale to pick cassan
So why even have virtual nodes at all? Why not work on improving
single-token approaches so that we can support cluster doubling, which IMO
would enable cassandra to scale more quickly for volatile loads?
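A toy illustration of why doubling is attractive with single tokens (this is a sketch, not Cassandra code, and assumes an evenly spaced ring in a simplified unsigned token space): each new node takes the midpoint of exactly one existing range, so it streams from exactly one existing owner.

```python
# Toy sketch (not Cassandra code) of why single-token rings make doubling
# attractive: each new node takes the midpoint of one existing range, so
# every new node streams from exactly one existing owner.

RING = 2**64  # simplified unsigned token space [0, 2^64)

def evenly_spaced_tokens(n):
    """Initial single-token-per-node assignment for n nodes."""
    return [i * (RING // n) for i in range(n)]

def doubled_ring(tokens):
    """Insert one midpoint token per existing range, doubling the cluster."""
    tokens = sorted(tokens)
    mids = []
    for i, t in enumerate(tokens):
        nxt = tokens[(i + 1) % len(tokens)]
        span = (nxt - t) % RING  # modulo handles the wrap-around range
        mids.append((t + span // 2) % RING)
    return sorted(tokens + mids)

# Doubling an evenly spaced 4-node ring yields an evenly spaced 8-node ring.
assert doubled_ring(evenly_spaced_tokens(4)) == evenly_spaced_tokens(8)
```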
It's my guess/understanding that vnodes eliminate the token rebalancing
that existed back in
We have three datacenters (EU, AP, US) in aws and have problems
bootstrapping new nodes in the AP datacenter:
java.lang.RuntimeException: A node required to move the data
consistently is down (/SOME_NODE). If you wish to move the data from a
potentially inconsistent replica, restart the node with
We are in conversations with AWS, hopefully with an IPv6 expert, to examine
what happened.
On Thu, Jun 13, 2019 at 11:19 AM Carl Mueller
wrote:
> Our cassandra.ring_delay_ms is currently around 30 to get nodes to
> bootstrap.
>
> On Wed, Jun 12, 2019 at 5:56 PM Carl Mueller
> wr
Our cassandra.ring_delay_ms is currently around 30 to get nodes to
bootstrap.
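For reference, ring delay is passed as a JVM system property, with the value in milliseconds; the exact value in use above is ambiguous, so the number below is only a placeholder:

```
# jvm.options / startup flag (placeholder value, in milliseconds)
-Dcassandra.ring_delay_ms=30000
```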
On Wed, Jun 12, 2019 at 5:56 PM Carl Mueller
wrote:
> We're seeing nodes bootstrapping but not streaming and joining a cluster
> in 2.2.13.
>
> I have been looking through the MigrationMan
We're seeing nodes bootstrapping but not streaming and joining a cluster in
2.2.13.
I have been looking through the MigrationManager code and the
StorageService code that seems relevant based on the Bootstrap status
messages that are coming through. I'll be referencing line numbers from the
2.2.13
Can someone do a quick check of
https://issues.apache.org/jira/browse/CASSANDRA-15068?jql=text%20~%20%22EC2MRS%22
I think we may write a custom old-behavior snitch class that doesn't have
the broadcast_rpc_address == null check and see if that works.
But that seems extreme, can someone that knows the
I'd still need an "all events for app_id" query. We have seconds-level
events :-(
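A common workaround for this (sketched below with hypothetical table and column names): with seconds-level events, a raw `(app_id)` partition grows unbounded, so the partition key becomes a compound `(app_id, day_bucket)`, and "all events for app_id over a range" becomes one query per bucket.

```python
# Hypothetical bucketing sketch: the compound partition key
# (app_id, day_bucket) bounds partition size; a range query is then
# issued once per bucket, e.g.
#   SELECT * FROM events WHERE app_id = ? AND day = ?
from datetime import date, timedelta

def day_buckets(start: date, end: date):
    """Yield the day buckets a range query must touch (inclusive)."""
    d = start
    while d <= end:
        yield d.isoformat()
        d += timedelta(days=1)

buckets = list(day_buckets(date(2019, 1, 30), date(2019, 2, 1)))
assert buckets == ["2019-01-30", "2019-01-31", "2019-02-01"]
```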
On Fri, Feb 1, 2019 at 3:02 PM Jeff Jirsa wrote:
> On Fri, Feb 1, 2019 at 12:58 PM Carl Mueller
> wrote:
>
> > Jeff: so the partition key with timestamp would then need a separate
ecated".
On Fri, Feb 1, 2019 at 2:53 PM Carl Mueller
wrote:
> Interesting. Now that we have semiautomated upgrades, we are going to
> hopefully get everything to 3.11X once we get the intermediate hop to 2.2.
>
> I'm thinking we could also use sstable metadata markings + c
tly, so I’m not sure if it’s current state, but worth considering
> that
> > this may be much better on 3.0+
> >
> >
> >
> > --
> > Jeff Jirsa
> >
> >
> > > On Jan 31, 2019, at 1:56 PM, Carl Mueller <
> carl.muel...@smartthings.co
Situation:
We use TWCS for a task history table (partition is user, column key is
timeuuid of task; TWCS is used due to tombstone TTLs that rotate out the
tasks every month or so).
However, if we want to get a "slice" of tasks (say, tasks in the last two
days) and we are using TWCS sstable blocks o
additionally, a certain number of the threads in each stage could be
restricted from serving the low-priority queues at all, say 8/32 or 16/32
threads, to further ensure processing availability to the higher-priority
tasks.
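The "reserve some of the threads" idea above can be sketched with a semaphore (the pool size and limit are illustrative numbers, not anything Cassandra actually does): low-priority tasks may occupy at most a fixed share of the pool, so high-priority work can never be fully starved.

```python
# Sketch of reserving threads for high priority: low-priority tasks must
# acquire one of LOW_LIMIT slots before running, so at least
# POOL_SIZE - LOW_LIMIT threads always remain free for high-priority work.
import threading
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 32
LOW_LIMIT = 16  # low-priority tasks may occupy at most half the pool
low_slots = threading.BoundedSemaphore(LOW_LIMIT)

def run_low_priority(task):
    with low_slots:  # blocks if LOW_LIMIT low-priority tasks already running
        return task()

pool = ThreadPoolExecutor(POOL_SIZE)
f = pool.submit(run_low_priority, lambda: "done")
assert f.result() == "done"
pool.shutdown()
```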
On Wed, Jan 16, 2019 at 3:04 PM Carl Mueller
wrote:
> At a theoreti
At a theoretical level assuming it could be implemented with a magic wand,
would there be value to having a dual set of queues/threadpools at each of
the SEDA stages inside cassandra for a two-tier of priority? Such that you
could mark queries that return pages and pages of data as lower-priority
w
A second vote for any bugfixes we can get for 2.1, we'd probably use it.
We are finally getting traction behind upgrades with an automated
upgrade/rollback for 2.1 --> 2.2, but it's going to be a while.
Aaron, if you want to use our 2.1-2.2 migration tool, we can talk about at
the next MPLS meetu
IMO slightly bigger memory requirements in exchange for substantial
improvements is a good trade, especially for a 4.0 release of the database.
Optane and lots of other memory options are coming down the hardware
pipeline, and risk-wise almost all cassandra people know to testbed the
major versions, so major versio
s), medium-term/less common (days
to weeks), long (years)), with the aim of avoiding having to do compaction
at all and just truncating buckets as they "expire" for a nice O(1)
compaction process.
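The truncate-as-you-expire idea can be sketched as follows (a toy in-memory model, not Cassandra internals): data lands in a bucket per time window, and expiry drops whole windows at once instead of writing and later compacting per-row tombstones.

```python
# Sketch of O(1) expiry by bucket truncation: writes go into a bucket per
# time window; expiry drops (truncates) any window wholly past the TTL,
# with no per-row tombstones to compact away.

class BucketedStore:
    def __init__(self, window_secs):
        self.window = window_secs
        self.buckets = {}  # window start timestamp -> {key: value}

    def _bucket(self, ts):
        return int(ts) - int(ts) % self.window

    def put(self, key, value, ts):
        self.buckets.setdefault(self._bucket(ts), {})[key] = value

    def expire(self, now, ttl):
        cutoff = now - ttl
        for start in [b for b in self.buckets if b + self.window <= cutoff]:
            del self.buckets[start]  # drop the whole window: O(1) per window

store = BucketedStore(window_secs=3600)
store.put("a", 1, ts=0)       # lands in window starting at 0
store.put("b", 2, ts=7200)    # lands in window starting at 7200
store.expire(now=10800, ttl=3600)
assert list(store.buckets) == [7200]  # the 0 window was truncated whole
```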
On Fri, Oct 19, 2018 at 9:57 AM Carl Mueller
wrote:
> new DC and then split is one way, b
or forwards a copy of any write
> request regarding tokens that are being transferred to the new node.
>
> [1] Incremental Elasticity for NoSQL Data Stores, SRDS’17,
> https://ieeexplore.ieee.org/document/8069080
>
>
> > On 18 Oct 2018, at 18:53, Carl Mueller
> >
>
pling is adding an extra instance with the same schema to
> test things like yaml params or compaction without impacting reads or
> correctness - it’s different than what you describe
>
>
>
> --
> Jeff Jirsa
>
>
> > On Oct 18, 2018, at 5:57 PM, Carl Mueller
> >
I guess there is also write-survey-mode from cass 1.1:
https://issues.apache.org/jira/browse/CASSANDRA-3452
Were triggers intended to supersede this capability? I can't find a lot of
"user level" info on it.
On Thu, Oct 18, 2018 at 10:53 AM Carl Mueller
wrote:
> tl;dr: a
tl;dr: a generic trigger on TABLES that will mirror all writes to
facilitate data migrations between clusters or systems. What is necessary
to ensure full write mirroring/coherency?
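One piece of the coherency question can be sketched generically (this is a hypothetical illustration, not the Cassandra trigger API): a naive dual-write diverges if the mirror target is down, so mirrored writes typically need a replay log plus idempotent, last-write-wins application so that replays cannot clobber newer data.

```python
# Hypothetical illustration of mirrored writes with a replay log.
# Failed mirror writes are queued and replayed later; last-write-wins
# by timestamp makes replays safe (an old mutation can't clobber a newer one).

class MirroredTable:
    def __init__(self, primary, mirror):
        self.primary, self.mirror = primary, mirror
        self.pending = []  # replay log for writes the mirror missed

    def write(self, key, value, ts, mirror_up=True):
        self._apply(self.primary, key, value, ts)
        if mirror_up:
            self._apply(self.mirror, key, value, ts)
        else:
            self.pending.append((key, value, ts))

    def replay(self):
        while self.pending:
            self._apply(self.mirror, *self.pending.pop(0))

    @staticmethod
    def _apply(store, key, value, ts):
        # Idempotent last-write-wins application.
        if key not in store or store[key][1] <= ts:
            store[key] = (value, ts)

primary, mirror = {}, {}
t = MirroredTable(primary, mirror)
t.write("k", "v1", ts=1, mirror_up=False)  # mirror missed this write
t.write("k", "v2", ts=2)                   # newer write reaches both
t.replay()                                 # old write replays harmlessly
assert mirror["k"] == ("v2", 2)
```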
When cassandra clusters have several "apps" aka keyspaces serving
applications colocated on them, but the app/keyspa
I too have built a framework over the last year, similar to what cstar
does, but for our purposes at smartthings. The intention is to OSS it, but it
needs a round of polish, since it really is more of a utility toolbox for
our small cassandra group.
It relies on ssh heavily and performing work on th
Thanks Jeff.
On Fri, Aug 31, 2018 at 1:01 PM Jeff Jirsa wrote:
> A read-heavy workload with wide partitions (like 1-2gb) and the key
> cache disabled will be the worst case for GC
>
>
>
>
> --
> Jeff Jirsa
>
>
> > On Aug 31, 2018, at 10:51 AM, Carl Mueller
> &g
at least for now. I think we want a design that
> changes the operational, availability, and consistency story as little as
> possible when it's completed.
>
> Ariel
> On Fri, Aug 31, 2018, at 2:27 PM, Carl Mueller wrote:
> > Sorry to spam this with two messages...
> >
if I understand the paper/protocol to
RF3+1 transient.
On Fri, Aug 31, 2018 at 1:07 PM Carl Mueller
wrote:
> I put these questions on the ticket too... Sorry if some of them are
> stupid.
>
> So are these transient nodes basically serving as centralized
> hinted handof
I put these questions on the ticket too... Sorry if some of them are
stupid.
So are these transient nodes basically serving as centralized
hinted handoff caches rather than having the hinted handoffs cluttering up
full replicas, especially nodes that have no concern for the token range
I'm assuming that the p99 latencies Rocksandra tries to target are caused
by GC pauses. Does anyone have data patterns or datasets that will generate GC
pauses in Cassandra to highlight the abilities of Rocksandra (and...
Scylla?) and perhaps this GC approach?
On Thu, Aug 30, 2018 at 8:11 PM Carl Mu
l.
>
> That said, I didn't try it with a huge heap (i think it was 16 or 24GB), so
> maybe it'll do better if I throw 50 GB RAM at it.
>
>
>
> On Thu, Aug 30, 2018 at 8:42 AM Carl Mueller
> wrote:
>
> > https://www.opsian.com/blog/javas-new-zgc-is-very-exci
https://www.opsian.com/blog/javas-new-zgc-is-very-exciting/
.. max of 4ms for stop the world, large terabyte heaps, seems promising.
Will this be a major boon to cassandra p99 times? Anyone know the aspects
of cassandra that cause the most churn and lead to StopTheWorld GC? I was
under the impres
Oh duh, RACS does this already. But it would be nice to get some education
on the bloom filter memory use vs # sstables question.
On Wed, Jul 25, 2018 at 10:41 AM Carl Mueller
wrote:
> It would seem to me that if the replicated data managed by a node is in
> separate sstables from the
It would seem to me that if the replicated data managed by a node is in
separate sstables from the "main" data it manages, when a new node came
online it would be easier to discard the data it is no longer responsible
for since it was shifted a slot down the ring.
Generally speaking I've been aski
I don't want to comment on the 10540 ticket since it seems very well
focused on vnode-aligned sstable partitioning and compaction. I'm pretty
excited about that ticket. RACS should enable:
- smaller scale LCS, more constrained I/O consumption
- fewer sstables to hit in the read path
- multithreaded/mul
; fairly confident there’s a JIRA for this, if not it’s been discussed in
> > person among various operators for years as an obvious future
> improvement.
> >
> > --
> > Jeff Jirsa
> >
> >
> > > On Apr 17, 2018, at 8:17 AM, Carl Mueller <
> carl.muel
particular - I’m
> fairly confident there’s a JIRA for this, if not it’s been discussed in
> person among various operators for years as an obvious future improvement.
>
> --
> Jeff Jirsa
>
>
> > On Apr 17, 2018, at 8:17 AM, Carl Mueller
> wrote:
> >
> &g
Do vnodes address anything besides relieving cluster planners of manual
token range management on nodes? Do we have a centralized list of
advantages they provide beyond that?
There seem to be lots of downsides: secondary index (2i) performance, the
above availability concerns, etc.
I also wonder if in vno
16, 2018 at 12:21 PM, Carl Mueller wrote:
> Is the fundamental nature of sstable fragmentation the big wrench here?
> I've been trying to imagine aids like an offline repair resolver or a
> gradual node replacement/regenerator process that could serve as a
> backstop/insurance for
Is the fundamental nature of sstable fragmentation the big wrench here?
I've been trying to imagine aids like an offline repair resolver or a
gradual node replacement/regenerator process that could serve as a
backstop/insurance for compaction and repair problems. After all, some of
the "we don't ev
LastPickle's reaper should be the starting point of any discussion on
repair scheduling.
On Tue, Apr 3, 2018 at 12:48 PM, Blake Eggleston
wrote:
> Hi dev@,
>
>
>
> The question of the best way to schedule repairs came up on
> CASSANDRA-14346, and I thought it would be good to bring up the idea o
sions
>>> may need to fiddle with yum and apt sources to get OpenJDK 8, but this
>>> is a relatively solved problem.)
>>>
>>> Users have the ability to deviate and set a JAVA_HOME env var to use a
>>> custom-installed JDK of their liking, or go dow
I am now thinking that aligning to the major JDK releases that carry three
years of paid support is the best strategy. What I think will happen
is that there will be a consortium that maintains/backports that release
level independent of oracle, if only to spite them. I'm thinking IBM, Azul,
etc
Is OpenJDK really not addressing this at all? Is that because OpenJDK is
beholden to Oracle somehow? This is a major disservice to Apache and the
java ecosystem as a whole.
When java was fully open sourced, it was supposed to free the ecosystem to
a large degree from Oracle. Why is OpenJDK being s
So this is basically Oracle imposing a rapid upgrade path on free users to
force them to buy commercial to get LTS stability?
This will probably shake out in the community somehow. Cassandra is complex,
but we are small fry in the land of IT support and enterprise upgrades.
Something will organize
cassandra.apache.org%3E
>
> Which led to the creation of this JIRA:
> https://issues.apache.org/jira/browse/CASSANDRA-13645
>
>
> On Wed, Mar 14, 2018 at 4:23 PM, Carl Mueller <
> carl.muel...@smartthings.com>
> wrote:
>
> > Since this is basically driver syntacti
Wed, Mar 14, 2018 at 3:47 PM Carl Mueller >
> wrote:
>
> > But we COULD have CL2 write (for RF4)
> >
> > The extension to this idea is multiple backup/secondary replicas. So you
> > have RF5 or RF6 or higher, but still are performing CL2 against the
> > pr
datacenters, but are doing
local_quorum on the one datacenter. Well, except switchover is a bit more
granular if you run out of replicas in the local.
On Wed, Mar 14, 2018 at 5:17 PM, Jeff Jirsa wrote:
> Write at CL 3 and read at CL 2
>
> --
> Jeff Jirsa
>
>
> > On Mar 14
I also wonder if the state of hinted handoff can inform the validity of
extra replicas. Repair is mentioned in 7168.
On Wed, Mar 14, 2018 at 4:55 PM, Carl Mueller
wrote:
> For my reference: https://issues.apache.org/jira/browse/CASSANDRA-7168
>
>
> On Wed, Mar 14, 2018 at 4:
> jira/browse/CASSANDRA-13442
>
> It's been discussed quite a bit offline and I did a presentation on it at
> NGCC. Hopefully we will see some movement on it soon.
>
> Ariel
>
> On Wed, Mar 14, 2018, at 5:40 PM, Carl Mueller wrote:
> > Currently there is little use for RF4
Currently there is little use for RF4: you pay the requirements of a
quorum of 3 but gain only one extra backup.
I'd like to propose something that would make RF4 a sort of more heavily
backed up RF3.
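The consistency arithmetic here (including the write-at-CL3 / read-at-CL2 option suggested downthread) can be checked mechanically: a read is guaranteed to intersect the latest write whenever write CL + read CL > RF.

```python
# Quorum overlap check: at least one replica is in both the write set and
# the read set whenever write_cl + read_cl > rf.

def quorum(rf):
    return rf // 2 + 1

def overlaps(rf, write_cl, read_cl):
    return write_cl + read_cl > rf

RF = 4
assert quorum(RF) == 3                           # plain QUORUM at RF4 costs 3
assert overlaps(RF, write_cl=3, read_cl=2)       # CL3 write / CL2 read: safe
assert not overlaps(RF, write_cl=2, read_cl=2)   # CL2/CL2 at RF4: no guarantee
```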
A lot of this is probably achievable with strictly driver-level logic, so
perhaps it would belong m
Alternative: JVM per vnode.
On Thu, Feb 22, 2018 at 4:52 PM, Carl Mueller
wrote:
> Bloom filters... nevermind
>
>
> On Thu, Feb 22, 2018 at 4:48 PM, Carl Mueller <
> carl.muel...@smartthings.com> wrote:
>
>> Is the current reason for a large starting heap due to t
Bloom filters... nevermind
On Thu, Feb 22, 2018 at 4:48 PM, Carl Mueller
wrote:
> Is the current reason for a large starting heap due to the memtable?
>
> On Thu, Feb 22, 2018 at 4:44 PM, Carl Mueller <
> carl.muel...@smartthings.com> wrote:
>
>> ... compact
Is the current reason for a large starting heap due to the memtable?
On Thu, Feb 22, 2018 at 4:44 PM, Carl Mueller
wrote:
> ... compaction on its own jvm was also something I was thinking about,
> but then I realized even more JVM sharding could be done at the table level.
>
> On
;> orgs who run multiple cassandra instances on the same node (multiple
> >> gossipers in that case is at least a little wasteful).
> >>
> >> I've also played around with using domain sockets for IPC inside of
> >> cassandra. I never ran a proper benchmark, but
GC pauses may have been improved in newer releases, since we are in 2.1.x,
but I was wondering why cassandra uses one jvm for all tables and
keyspaces, intermingling the heap for on-JVM objects.
... so why doesn't cassandra spin off a jvm per table so each jvm can be
tuned per table and gc tuned a
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.62.7953&rep=rep1&type=pdf
looks to be an adaptive approach where the "initial guess" bloom filters
are enhanced with additional layers generated once usage stats are
gathered.
Disclaimer: I suck at reading academic papers.
When memtables/CommitLogs are flushed to disk/sstable, does the sstable go
through sstable organization specific to each compaction strategy, or is
the sstable creation the same for all compaction strategies, leaving it to the
compaction strategy to recompact the sstable if desired?
wrote:
> There’s a company using TWCS in this config - I’m not going to out them,
> but I think they do it (or used to) with aggressive tombstone sub
> properties. They may have since extended/enhanced it somewhat.
>
> --
> Jeff Jirsa
>
>
> > On Feb 16, 2018, at 2
sstable. This strategy assumes TTLs would be cleaning up these row
fragments, so that the distribution of the data across many many sstables
wouldn't pollute the bloom filters too much.
On Fri, Feb 16, 2018 at 4:24 PM, Carl Mueller
wrote:
> Oh and as a further refinement outside of our
fault, and as the
near term comes into play, that is considered a different "level".
Of course all this relies on the ability to look at the data in the rowkey
or the TTL associated with the row.
On Fri, Feb 16, 2018 at 4:17 PM, Carl Mueller
wrote:
> We have a scheduler app here at sm
We have a scheduler app here at smartthings, where we track per-second
tasks to be executed.
These are all TTL'd to be destroyed after the second the event was
registered with has passed.
If the scheduling window was sufficiently small, say, 1 day, we could
probably use a time window compaction s
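A back-of-envelope on window sizing for such a setup (the numbers are hypothetical; the rule of thumb of keeping the window count modest is commonly cited TWCS guidance, not a hard limit): the window size follows from the retention period divided by a target number of windows.

```python
# Rough TWCS window sizing sketch: pick a window so that
# retention / window stays at a modest number of sstable windows.

def window_hours(retention_hours, target_windows):
    """Smallest whole-hour window giving at most target_windows windows."""
    return max(1, retention_hours // target_windows)

# 1 day of per-second, TTL'd events kept in ~24 windows -> 1-hour windows
assert window_hours(retention_hours=24, target_windows=24) == 1
```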
for these than the general row cache
difficulties for cassandra data. Those caches could only be loaded during
compaction operations too.
On Thu, Feb 15, 2018 at 11:24 AM, Jeff Jirsa wrote:
> Worth a JIRA, yes
>
>
> On Wed, Feb 14, 2018 at 9:45 AM, Carl Mueller <
> carl.muel
So is this at least a decent candidate for a feature request ticket?
On Tue, Feb 13, 2018 at 8:09 PM, Carl Mueller
wrote:
> I'm particularly interested in getting the tombstones to "promote" up the
> levels of LCS more quickly. Currently they get attached at the low level
gic from
> 7109 would probably go a long ways. Though if you are bulk inserting
> deletes that is what you would end up with, so maybe it already works.
>
> -Jeremiah
>
> > On Feb 13, 2018, at 6:04 PM, Jeff Jirsa wrote:
> >
> > On Tue, Feb 13, 2018 at 2:38 PM, Carl Mu
In the process of doing my second major data purge from a cassandra system.
Almost all of my purging is done via row tombstones. While performing this
the second time while trying to cajole compaction to occur (in 2.1.x,
LevelledCompaction) to goddamn actually compact the data, I've been
thinking as t