Re: CommitLogReadHandler for Cassandra 4

Stefan Miklosovic Thu, 05 May 2022 03:15:57 -0700

Hi Sanal,

This was quite unknown territory for me as well. The Debezium
connector was implemented in such a way that it loaded the schema, but
the handler implementation did not see any schema updates that
happened after the schema was loaded. The Debezium connector is quite
special because it runs as a standalone program (a different JVM), so
if you go and change your schema on your node, the changes are applied
in the context of the Cassandra JVM, but the Debezium connector does
not know anything about them because it was never notified. The
obvious result was that when the connector detected a new commit log
file to process, it would still see the table's "cdc_enabled" as
false, because whether a table is CDC-enabled is not serialised as
part of a Mutation. It is somewhere in the table metadata in
PartitionUpdate or similar, but from the connector's point of view it
never changed. This is a somewhat hard concept to grasp, so feel free
to go over it mentally multiple times.


Because of the complexity of this problem, I wrote a document for the
Debezium team so they could fully understand what is going on. You can
read more about it in depth here (1) and here (2).

So, I load schemas only on the connector's startup, but after that I
need to be notified somehow of the changes that have happened in the
Cassandra JVM so I can act accordingly in the connector. The solution
I came up with is a schema change listener implemented in the driver.
It reacts to changes made in Cassandra, and I apply them to my
"local", connector-side Cassandra schema, so that the schemas stay
updated in the connector's JVM and the metadata contains the changes I
am interested in.
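
To make the idea above more concrete, here is a minimal sketch in plain
Java. It is not the connector's actual code and it does not use the
real driver API: LocalSchemaCache, SchemaChangeListener and all of
their methods are hypothetical names made up for illustration. It only
shows the pattern: a schema snapshot is loaded once at startup, and a
listener later applies changes reported from the Cassandra side to
that local copy.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical connector-side record of which tables have CDC enabled. */
public class LocalSchemaCache {

    /** Hypothetical callback; a real driver exposes its own listener interface. */
    public interface SchemaChangeListener {
        void onTableChanged(String keyspace, String table, boolean cdcEnabled);
        void onTableDropped(String keyspace, String table);
    }

    private final Map<String, Boolean> cdcByTable = new ConcurrentHashMap<>();

    private static String key(String keyspace, String table) {
        return keyspace + "." + table;
    }

    /** Called once at connector startup with the schema as it exists at that moment. */
    public void loadAtStartup(String keyspace, String table, boolean cdcEnabled) {
        cdcByTable.put(key(keyspace, table), cdcEnabled);
    }

    /** Tables the cache has never heard of are treated as not CDC-enabled. */
    public boolean isCdcEnabled(String keyspace, String table) {
        return cdcByTable.getOrDefault(key(keyspace, table), false);
    }

    /** Listener that applies remote schema changes to the local copy. */
    public SchemaChangeListener listener() {
        return new SchemaChangeListener() {
            @Override
            public void onTableChanged(String keyspace, String table, boolean cdcEnabled) {
                cdcByTable.put(key(keyspace, table), cdcEnabled);
            }

            @Override
            public void onTableDropped(String keyspace, String table) {
                cdcByTable.remove(key(keyspace, table));
            }
        };
    }

    public static void main(String[] args) {
        LocalSchemaCache cache = new LocalSchemaCache();
        cache.loadAtStartup("ks", "events", false);

        // Without a notification the connector keeps seeing the startup value.
        System.out.println(cache.isCdcEnabled("ks", "events")); // false

        // The schema was altered in the Cassandra JVM; the listener relays the change.
        cache.listener().onTableChanged("ks", "events", true);
        System.out.println(cache.isCdcEnabled("ks", "events")); // true
    }
}
```

In the real connector the local copy is Cassandra's own schema metadata
rather than a simple map, but the flow of "load once, then patch on
every notification" is the same.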

If you somehow manage to run your connector in the same JVM as
Cassandra, I think you would not have this kind of problem. I guess
the same would hold if you ran your handler as a JVM agent attached to
Cassandra.

(1) https://github.com/debezium/debezium-connector-cassandra/blob/ac43b7797c084c3e67cedde3662af1e58de8a4c2/REPORT.adoc
(2) https://github.com/debezium/debezium-connector-cassandra/blob/ac43b7797c084c3e67cedde3662af1e58de8a4c2/REPORT_2.adoc

On Wed, 4 May 2022 at 14:30, Sanal Vasudevan <get2sa...@gmail.com> wrote:
>
> Hi Stefan,
>
> First of all, many thanks for responding to my email.
> Let me explain my journey so far with this. I could not find any 
> documentation for this, so it is good to have someone to discuss this :)
>
> The program which I had earlier for version 3.9 did the following:
> 3.9:
> Config.setClientMode(true);
>
> Porting to 3.11, I used the following:
> DatabaseDescriptor.clientInitialization();
>
> Now with 4.0, when I use DatabaseDescriptor.clientInitialization(), it
> throws an error along the following lines:
> Caused by: java.lang.NullPointerException
>         at org.apache.cassandra.config.DatabaseDescriptor.getMaxMutationSize(DatabaseDescriptor.java:1959)
>         at org.apache.cassandra.db.IMutation.<clinit>(IMutation.java:29)
>         ... 3 more
>
> Then I tried
>         DatabaseDescriptor.daemonInitialization()
> with system property, -Dcassandra.config=file:///path/to/cassandra.yaml
>
> After this, it errored out for property cassandra.storagedir not set. I set 
> this to a dummy value,
> System.setProperty("cassandra.storagedir","/tmp");
>
> With this, I was able to run the standalone program without errors but I was 
> not able to read mutations from user tables.
> After loading Schema using Schema.instance.load(keyspace), I was able to read 
> mutations from the commit logs.
>
> I looked at the code that you've implemented, I have some questions:
> 1) For Cassandra 3 and Cassandra 4, you have used 
> DatabaseDescriptor.toolInitialization()
>         May I ask if external applications should always use 
> DatabaseDescriptor.toolInitialization() ?
>
> 2) In your code, keyspace metadata (table metadata and column metadata) is 
> not constructed and loaded into the Schema instance.
>      You are using Schema.instance.loadFromDisk(false)
>       Is this the preferred way to load the schema?
>
> I will try out your approach and get back soon.
>
> Again, many thanks.
>
> Best regards
> Sanal
>
> On Wed, May 4, 2022 at 2:44 PM Stefan Miklosovic 
> <stefan.mikloso...@instaclustr.com> wrote:
>>
>> Hi Sanal,
>>
>> I have recently updated a project called Debezium and its Cassandra
>> connector to work with Cassandra 4 (1)
>>
>> The implementation of CommitLogReadHandler is here (2)
>>
>> (1) https://github.com/debezium/debezium-connector-cassandra
>> (2) 
>> https://github.com/debezium/debezium-connector-cassandra/blob/main/cassandra-4/src/main/java/io/debezium/connector/cassandra/Cassandra4CommitLogReadHandlerImpl.java
>>
>> Feel free to reach me privately or here on ML if you have any specific
>> questions.
>>
>> Regards
>>
>> Stefan
>>
>> On Wed, 4 May 2022 at 01:40, Sanal Vasudevan <get2sa...@gmail.com> wrote:
>> >
>> > Hi Folks,
>> >
>> > I have a standalone Java application that implements the interface 
>> > CommitLogReadHandler to read cassandra commit log files generated by 
>> > Cassandra 3.11.
>> > I recently tried to use this to read the commit logs generated by 
>> > Cassandra 4, but it does not work.
>> > Has anyone tried to implement CommitLogReadHandler for Cassandra 4 or is 
>> > there a better way to read/parse Cassandra 4 commit logs?
>> > Any help would be appreciated.
>> >
>> > Thanks!
>> >
>> > Best regards
>> > Sanal
>
>
>
> --
> Sanal Vasudevan Nair
