Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library

Benedict Tue, 17 Sep 2024 02:03:31 -0700

All of these options are managed by us; the only property passed through to Chronicle is the “RollCycle”, which we can trivially replicate or simply deprecate.
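To show how small the roll-cycle piece is, here is a minimal sketch of an hourly roll cycle: map each record's timestamp to a cycle name, and roll to a new file when that name changes. The `HourlyRollCycle` class, the `yyyyMMdd-HH` naming pattern and the use of UTC are assumptions chosen for illustration. This is not Cassandra's or Chronicle's actual code.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

/** Hypothetical stand-in for the one Chronicle setting we pass through: an hourly roll cycle. */
public final class HourlyRollCycle {
    // Assumed file-name pattern, evaluated in UTC so every node names cycles the same way.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMdd-HH").withZone(ZoneOffset.UTC);

    /** Name of the cycle (and therefore the file) that a record written at {@code t} belongs to. */
    public static String cycleName(Instant t) {
        return FORMAT.format(t);
    }

    /** A writer rolls to a new file when the next record falls into a different hour. */
    public static boolean shouldRoll(Instant current, Instant next) {
        return !current.truncatedTo(ChronoUnit.HOURS).equals(next.truncatedTo(ChronoUnit.HOURS));
    }

    public static void main(String[] args) {
        Instant a = Instant.parse("2024-09-17T09:57:00Z");
        Instant b = Instant.parse("2024-09-17T10:03:31Z");
        System.out.println(cycleName(a));      // 20240917-09
        System.out.println(shouldRoll(a, b));  // true
    }
}
```

Daily or minutely cycles would only need a different pattern and truncation unit.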

On 17 Sep 2024, at 09:57, Štefan Miklošovič <[email protected]> wrote:

There are configuration properties that control what the bin log does at runtime, so if we completely changed the vehicle it operates on, the only things that would stay in common are the name of the command and the logical operation it performs (enable / disable, get the config if there is any) ...

If we ever build another solution, I think it would be better to keep the old one in place, develop the new one in parallel, and ditch the old one once the new one is stable enough.

BTW, I have one technical question here, directed not at Benedict (although I am replying to him) but at the broader audience:

If this statement in the javadocs, which I already linked above, is true:

"Performance safety is accomplished by feeding items to the binary log using a weighted queue and dropping records if the binary log falls sufficiently far behind."

then how can FQL work? If some events can be dropped, meaning that a query which was actually executed never makes it into the log, then when we replay the logs (the FQL framework can replay them against an empty database), is there any guarantee that we end up with the same database state? Is FQL, in this sense, a "best effort" kind of tool?
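The trade-off behind this question can be sketched with a bounded buffer between the query path and the log writer. A producer that drops on a full buffer never stalls but can lose records, so a replay is best effort. A producer that blocks never loses records but can stall the request path. `BoundedLog` and its `block` flag are hypothetical names for a generic illustration. This is not Cassandra's BinLog code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical bounded log buffer: drop-on-full (best effort) or block-on-full (complete). */
public final class BoundedLog {
    private final BlockingQueue<String> queue;
    private final boolean block;
    private final AtomicLong dropped = new AtomicLong();

    public BoundedLog(int capacity, boolean block) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.block = block;
    }

    /** Called on the query path. Returns true if the record was enqueued. */
    public boolean log(String record) {
        if (block) {
            try {
                queue.put(record);              // waits for the consumer: no loss, but the caller can stall
                return true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                dropped.incrementAndGet();      // interrupted while waiting: the record is lost
                return false;
            }
        }
        if (queue.offer(record))                // never waits: the caller is unaffected
            return true;
        dropped.incrementAndGet();              // ...but the record is gone, so a replay cannot reproduce it
        return false;
    }

    public long droppedCount() { return dropped.get(); }

    /** Consumer side (the log writer thread would call this). */
    public String poll() { return queue.poll(); }

    public static void main(String[] args) {
        BoundedLog bestEffort = new BoundedLog(2, false);
        for (String q : new String[] {"INSERT 1", "INSERT 2", "INSERT 3"})
            bestEffort.log(q);                  // nothing drains the queue, so the third offer fails
        System.out.println(bestEffort.droppedCount()); // 1
    }
}
```

With dropping enabled, an exact replay is only possible if `droppedCount()` stayed at zero for the whole capture.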

On Tue, Sep 17, 2024 at 10:37 AM Benedict <[email protected]> wrote:
My point is only that AFAICT we use it for something incredibly basic that we do all the time elsewhere without it.

I’m not proposing we remove it; I don’t have a position on that. But if we don’t _trust_ ourselves to replace it, we should get out of the database game.

The fact that it would break compatibility between releases is suboptimal, but IMO not at all a dealbreaker, because these files are not required to be compatible between versions: they’re offline logs, and I think it would be fine to require different viewers for files produced by different versions of Cassandra.
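As an illustration of how basic the storage requirement is, here is a minimal sketch of a versioned, append-only, length-prefixed record file. The `SimpleRecordLog` class, the magic number and the version byte are hypothetical. The version header is what would let each Cassandra version ship its own viewer and reject files written in another layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical append-only log layout: [magic][version], then repeated [length][UTF-8 bytes]. */
public final class SimpleRecordLog {
    static final int MAGIC = 0xCA55_B10C;   // hypothetical file marker
    static final byte VERSION = 1;          // bumped whenever the layout changes

    /** Written once at file creation; records are only ever appended after it. */
    public static void writeHeader(DataOutputStream out) throws IOException {
        out.writeInt(MAGIC);
        out.writeByte(VERSION);
    }

    public static void append(DataOutputStream out, String record) throws IOException {
        byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    /** A viewer for this version refuses files written with any other version byte. */
    public static List<String> readAll(DataInputStream in) throws IOException {
        if (in.readInt() != MAGIC) throw new IOException("not a record log");
        byte version = in.readByte();
        if (version != VERSION) throw new IOException("unsupported version " + version);
        List<String> records = new ArrayList<>();
        while (true) {
            int len;
            try { len = in.readInt(); } catch (EOFException e) { break; } // end of file
            byte[] bytes = new byte[len];
            in.readFully(bytes);            // a torn final record surfaces as EOFException
            records.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            writeHeader(out);
            append(out, "SELECT * FROM ks.t");
            append(out, "INSERT INTO ks.t (k) VALUES (1)");
        }
        List<String> back = readAll(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(back.size()); // 2
    }
}
```

A real replacement would also need rolling, fsync policy and a richer record schema, but none of those change the basic shape.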

I do not think any of the nodetool methods would be affected by this, as they do not appear to touch the contents of the log files.

On 17 Sep 2024, at 09:28, Štefan Miklošovič <[email protected]> wrote:

to Benedict:

well ... I was not around when the decision to use Chronicle Queue was made. I think that, at the time, given its features and capabilities, it was the most obvious candidate that avoided reinventing the wheel, so taking something off the shelf was a natural conclusion.

Josh / Jordan:

it is not only FQL but also audit logging; these are two separate things. There is also quite a "rich" ecosystem around them.

1) nodetool commands like

enableauditlog
enablefullquerylog
disableauditlog
disablefullquerylog
getauditlog
getfullquerylog

Also, because the files it produces are binary, we need special tooling to inspect them: it lives in tools/fqltool as a bunch of classes, and there is also AuditLogViewer for reviewing audit logs.

There are also MBean methods backing these nodetool commands.

We have also shipped this in two major releases (4.0 and now 5.0), so the community is quite used to it and has processes built around it, etc.

I mention all this because, in any case, it is just not that easy to replace it with something else if somebody wanted to. How would we even go about deprecating this if we are indeed going to replace it?

Regarding their release model: I think you are right that the latest ea is as close as possible to, if not the same as, what they release privately. But if we want to stick to the rule that we only upgrade to the latest ea release before their next minor, then

1) we will always be at least one minor behind
2) we do not know when they will decide to move to a new minor, which is what we would need to know in order to upgrade to the last ea of the previous minor
3) if something is broken, we need a fix, and we are on an ea, then the only thing we can update to is the latest ea at that time, which might fix the issue but will also bring in new changes that may introduce instability. So we update to fix bugs but might unknowingly pull in new ones.
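The upgrade rule above can be made concrete with a small helper that picks the newest ea build of a given minor from the published versions. This is a minimal sketch. It assumes ea versions look like `2.24ea7` (major.minor, then `ea` and a build number), and that format should be checked against what is actually on Maven Central. `EaVersions` and `latestEa` are hypothetical names.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for the "latest ea of a given minor" upgrade rule. */
public final class EaVersions {
    // Assumed version shape, e.g. "2.24ea7": major.minor + "ea" + build number.
    private static final Pattern EA = Pattern.compile("(\\d+)\\.(\\d+)ea(\\d+)");

    /** Newest ea build of major.minor among the published versions, if any. */
    public static Optional<String> latestEa(List<String> published, int major, int minor) {
        String best = null;
        int bestBuild = -1;
        for (String v : published) {
            Matcher m = EA.matcher(v);
            if (!m.matches()) continue;                 // skip regular releases such as 2.23.37
            if (Integer.parseInt(m.group(1)) != major || Integer.parseInt(m.group(2)) != minor) continue;
            int build = Integer.parseInt(m.group(3));
            if (build > bestBuild) {                    // compare numerically: "ea10" is newer than "ea9"
                bestBuild = build;
                best = v;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<String> published = List.of("2.23.37", "2.24ea9", "2.24ea10", "2.25ea0");
        // Under the rule, 2.24's last ea is only safe to take once 2.25 has started: always one minor behind.
        System.out.println(latestEa(published, 2, 24).orElse("none")); // 2.24ea10
    }
}
```

The helper cannot fix point 2): it can only tell us the last ea of a minor after the next minor has appeared.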

Anyway, I don't think there is a silver-bullet solution here; we might just stick to the latest "ea" and be done with it. I do not expect this project to evolve wildly or unpredictably: it solves just "one problem", and there is basically nothing new coming in.

Brandon:

I understand your concerns about phoning home but

1) we already resolved this by setting the respective property
2) I do not think Chronicle will mess with this now that they have introduced it. There is nothing to "improve" or "change" there: it either phones home or it does not, and that is driven by one property. If they made a change so that we could not turn it off, we would really be in trouble, but for now we are not, and practically speaking I don't expect this to change.

I know this might sound like wishful thinking, but in practical terms I really don't expect this phoning-home issue to ever come back.

Speaking of alternatives, I think the primary reason Chronicle was used is this (1).

"It's goal is good enough performance, predictable footprint, simplicity in terms of implementation and configuration and most importantly minimal impact on producers of log records."

While I understand English (I guess, well enough :D), I just don't understand what "good enough performance" is. How is this measured? What is a "predictable footprint"? Was that measured too? How did we quantify that?

" Performance safety is accomplished by feeding items to the binary log using a weighted queue and dropping records if the binary log falls sufficiently far behind."

This is interesting. If I understand correctly, the messages are weighted, and the heavier they are, the more likely they are to be dropped when the queue is overloaded? Or vice versa, are the lighter ones dropped first?

Have we _ever_ experienced in production that some log events were really dropped? Has anybody ever hit that?
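On the weighting question, here is a minimal sketch of the general technique. It assumes a queue in which each record reserves its weight from a shared budget and the incoming record is rejected when the remaining budget is too small. In this scheme nothing already queued is evicted. A heavier record is simply more likely to be refused because it needs more of what is left. `WeightBoundedQueue` and the length-based weight function are hypothetical and not Cassandra's actual BinLog code. The drop counter shows one way drops could be made visible to operators.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical weight-bounded queue: records reserve their weight; records that do not fit are dropped. */
public final class WeightBoundedQueue {
    private final Semaphore budget;                     // remaining weight capacity
    private final ConcurrentLinkedQueue<String> items = new ConcurrentLinkedQueue<>();
    private final AtomicLong dropped = new AtomicLong();

    public WeightBoundedQueue(int maxWeight) { this.budget = new Semaphore(maxWeight); }

    // Assumed weight function: payload length (a real implementation might use serialized size).
    private static int weight(String record) { return Math.max(1, record.length()); }

    /** Non-blocking: enqueue if the record's weight fits in the remaining budget, otherwise drop it. */
    public boolean offer(String record) {
        if (!budget.tryAcquire(weight(record))) {
            dropped.incrementAndGet();                  // counted so operators can tell that drops happened
            return false;
        }
        items.add(record);
        return true;
    }

    /** Consumer side: dequeuing a record returns its weight to the budget. */
    public String poll() {
        String r = items.poll();
        if (r != null) budget.release(weight(r));
        return r;
    }

    public long droppedCount() { return dropped.get(); }

    public static void main(String[] args) {
        WeightBoundedQueue q = new WeightBoundedQueue(10);
        System.out.println(q.offer("abcdef"));   // true: 6 of 10 used
        System.out.println(q.offer("abcdefgh")); // false: needs 8, only 4 left
        System.out.println(q.offer("abc"));      // true: a lighter record still fits
        System.out.println(q.droppedCount());    // 1
    }
}
```

Exposing a counter like `droppedCount()` (for example through a metric or an MBean) would be one way to answer empirically whether drops ever happen in production.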

When it comes to alternatives, what about logback + slf4j? It has appenders for wherever we want to write, it supports both sync and async logging, we could probably write an NIO appender too, and it logs as text into a file, so we do not need any special tooling to review the output. For tailing, which Chronicle also offers, I guess "tail -f that.log" just does the job? logback also rolls files after a configured period or size, just as Chronicle does (it even compresses the rolled logs).
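To keep the sketch dependency-free, it uses the JDK's `java.util.logging.FileHandler` instead of logback. It shows the same idea: plain-text query logs, one query per line, rotated by size, so `tail -f` and ordinary text tools work on them. The `TextQueryLog` class, the `fql.%g.log` file pattern, the size limits and the tab-separated line format are assumptions for illustration. Unlike logback, this sketch does not do time-based rolling or compression.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.FileHandler;
import java.util.logging.Formatter;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical plain-text query log with size-based rotation, using only the JDK. */
public final class TextQueryLog implements AutoCloseable {
    private final Logger logger;
    private final FileHandler handler;

    public TextQueryLog(Path dir) throws IOException {
        // "%g" is the generation: fql.0.log is the current file, fql.1.log the previous one, and so on.
        // Rotate at ~1 MB and keep 5 files; append to an existing fql.0.log on restart.
        handler = new FileHandler(dir.resolve("fql.%g.log").toString(), 1_000_000, 5, true);
        handler.setFormatter(new Formatter() {
            @Override public String format(LogRecord r) {
                // Raw message (no parameter substitution), one query per line.
                return r.getMillis() + "\t" + r.getMessage() + System.lineSeparator();
            }
        });
        logger = Logger.getLogger("fql." + dir.toAbsolutePath());
        logger.setUseParentHandlers(false);             // keep queries out of the main system log
        logger.addHandler(handler);
    }

    public void logQuery(String cql) { logger.log(Level.INFO, cql); }

    @Override public void close() {
        logger.removeHandler(handler);
        handler.close();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("fql");
        try (TextQueryLog log = new TextQueryLog(dir)) {
            log.logQuery("SELECT * FROM ks.t WHERE k = 1");
        }
        System.out.println(Files.readString(dir.resolve("fql.0.log")).contains("ks.t")); // true
    }
}
```

`FileHandler` flushes after every record, which keeps the sketch simple. It also puts file I/O on the caller's thread, which is exactly the producer-side cost that an async appender (or the queues discussed above) is meant to avoid.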

Do we log so much that battle-tested logback is absolutely not enough for us? Come on, this is not rocket science; we do not need a library from the realm of "high frequency trading" just to append queries and audit logs as they are executed. logback can handle our load just fine imo ...

Or maybe I am completely wrong and we just HAVE TO use Chronicle?

(1) https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/binlog/BinLog.java#L58-L69

On Tue, Sep 17, 2024 at 3:12 AM Brandon Williams <[email protected]> wrote:
My concern is that we have to keep making sure it's not phoning home(1,2).

(1) https://issues.apache.org/jira/browse/CASSANDRA-18538
(2) https://issues.apache.org/jira/browse/CASSANDRA-19656

Kind Regards,
Brandon

On Mon, Sep 16, 2024 at 7:53 PM Josh McKenzie <[email protected]> wrote:
>
> I think it's FQLTool only right now; I bumped into it recently doing the JDK21 compat work.
>
> I'm not concerned about current usage / dependency, but if our usage expands, this could start to become a problem, and that's going to be hard to track and manage.
>
> So reading through those issues, Stefan, I think it boils down to:
>
> - The latest ea is code-identical to the stable release
> - Subsequent bugfixes get applied to the customer-only stable branch and one release forward
> - Projects running ea releases would need to cherry-pick those bugfixes back or run on the next branch's ea, which could introduce the project to API changes or other risks
>
> Assuming that's the case... blech. Our exposure is low, but that seems like a real pain.
>
> On Mon, Sep 16, 2024, at 5:16 PM, Benedict wrote:
>
>
> Don’t we essentially just use it as a file format for storing a couple of kinds of append-only data?
>
> I was never entirely clear on the value it brought to the project.
>
>
> On 16 Sep 2024, at 22:11, Jordan West <[email protected]> wrote:
>
>
> Thanks for the sleuthing, Stefan! This definitely is a bit unfortunate. It sounds like a replacement is not really practical, so I'll ignore that option for now, until a viable alternative is proposed. I am -1 on us writing our own without strong, strong justification -- primarily because I think the likelihood is we introduce more bugs before getting to something stable.
>
> Regarding the remaining options, mostly some thoughts:
>
> - it would be nice to have some specific evidence of other projects using the EA versions and what their developers have said about it.
> - it sounds like if we go with the EA route, the onus to test for correctness / compatibility increases. They do test, but I think anything marked "early access" deserves more scrutiny from the C* community before release. That could come in the form of more tests (or showing that we already have good coverage of where it's used).
> - I assume each time we upgrade we would pick the most recently released EA version
>
> Jordan
>
>
> On Mon, Sep 16, 2024 at 1:46 PM Štefan Miklošovič <[email protected]> wrote:
>
> We use a library called Chronicle Queue (1), along with its dependencies, and we ship them in the distribution tarball.
>
> The version we use in 5.0 / trunk as I write this is 2.23.36. If you look closely here (2), there is one more release like this, 2.23.37, and after that all releases have "ea" in their name.
>
> "ea" stands for "early access". The project has changed its versioning / development model so that "ea" releases act, more or less, as glorified snapshots: they are released to Maven Central, but the "regular" releases are not. The reason is that "regular" releases are published only for paying customers of the company behind the project, which offers commercial support for them.
>
> "Regular" releases receive all the bug fixes made after the corresponding "ea" is published, and they are the official stable releases. "ea" releases, on the other hand, are where development happens, and every now and then, once the developers think it is time to cut a new 2.x, they publish it privately.
>
> I was investigating how this all works here (3), and while they said, and I quote (4):
>
> "In my experience this is consumed by a large number of open source projects reliably (for our other artifacts too). This development/ea branch still goes through an extensive test suite prior to release. Releases from this branch will contain the latest features and bug fixes."
>
> I am not completely sure whether we are OK with this. For the record, Mick is not overly comfortable with it, and Brandon would prefer to just replace it / get rid of this dependency (see the comments / reasons / discussion from (5) to the end).
>
> The question is whether we are OK with how things are and, if we are, what the rules should be for upgrading this project in Cassandra given the "ea" versions they publish.
>
> If we are not OK with this, then the question is what we are going to replace it with.
>
> As for replacing it, I took a very brief look and there is practically nothing out there that would tick all the boxes for us. Chronicle is just perfect for this job, and I am not a fan of rewriting this at all.
>
> I would like to have this resolved because I plan to deliver CEP-12, I ran into this while working on it, and I do not want to base that work on something we might eventually abandon. There are some ideas for how CEP-12 could avoid using Chronicle, but I would like to hear your opinions first.
>
> Regards
>
> (1) https://github.com/OpenHFT/Chronicle-Queue
> (2) https://repo1.maven.org/maven2/net/openhft/chronicle-core/
> (3) https://github.com/OpenHFT/Chronicle-Core/issues/668
> (4) https://github.com/OpenHFT/Chronicle-Core/issues/668#issuecomment-2322038676
> (5) https://issues.apache.org/jira/browse/CASSANDRA-18712?focusedCommentId=17878254&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17878254
>
>
