Iftach,

This is a very useful finding. While I don't know the answer to your question below, I would like to take this opportunity to encourage you to write a blog about this finding =)

Thanks,

-- Ricardo

On 7/7/20 2:48 AM, Iftach Ben-Yosef wrote:
I believe I got it to work with "source->dest.producer.compression.type = gzip". Is there a way to set this globally for the mm2 process rather than per mirroring flow?
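
For reference, a minimal sketch of the relevant lines in connect-mirror-maker.properties. The per-flow line is the one reported to work above; the cluster-prefixed line is an assumption based on MM2's per-cluster client overrides and has not been verified in this thread:

    # per-flow override (reported to work above)
    source->dest.producer.compression.type = gzip

    # possible "global" form for everything produced to dest -- an assumption, not verified here
    dest.producer.compression.type = gzip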

Thanks,
Iftach


On Tue, Jul 7, 2020 at 9:34 AM Iftach Ben-Yosef <iben-yo...@outbrain.com> wrote:

    Upon further investigation, the issue is indeed compression, as
    in the logs I see 'compression.type = none'.
    Does anyone know how to configure gzip compression in
    the connect-mirror-maker.properties file?

    I tried "producer.override.compression.type = gzip" but that
    doesn't seem to work.

    Thanks,
    Iftach


    On Mon, Jul 6, 2020 at 8:03 AM Iftach Ben-Yosef <iben-yo...@outbrain.com> wrote:

        Ricardo,

        Thanks for the reply. I did some more testing: I tried
        mirroring a different topic from one of the 3 source clusters
        used in the previous test into the same destination
        cluster. Again, the resulting topic on the destination cluster
        is about 2 times larger than the source, with the same config
        and retention (both have compression.type=producer).

        Regarding my configuration: other than the clusters and the
        mirroring direction/topic whitelist configs, I have the
        following (all prefixes shortened to ".." for brevity; a
        sketch with them spelled out follows the list):

        ..tasks.max = 128
        ..fetch.max.wait.ms = 150
        ..fetch.min.bytes = 10485760
        ..fetch.max.bytes = 52428800
        ..max.request.size = 10485760
        ..enable.idempotence = true
        ..sync.topic.configs.enabled=false (played with this as true
        and as false)
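
        With the prefixes spelled out, these would look roughly as
        follows; the exact prefixes (per-flow vs. per-cluster, and
        consumer vs. producer) are an assumption about how the ".."
        shorthand expands, not a copy of the actual file:

        source->dest.tasks.max = 128
        source->dest.consumer.fetch.max.wait.ms = 150
        source->dest.consumer.fetch.min.bytes = 10485760
        source->dest.consumer.fetch.max.bytes = 52428800
        source->dest.producer.max.request.size = 10485760
        source->dest.producer.enable.idempotence = true
        source->dest.sync.topic.configs.enabled = false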

        I don't see how anything other than perhaps the idempotence
        setting could affect the topic size. I have also tried without
        it, but the result looks the same - and in any case I would
        expect idempotence to decrease the topic size, if anything,
        not increase it...

        Thanks,
        Iftach



        On Thu, Jul 2, 2020 at 5:30 PM Ricardo Ferreira <rifer...@riferrei.com> wrote:

            Iftach,

            I think you should try to observe whether this happens with
            other topics. Something unrelated may already have happened
            to the topic that currently has ~3TB of data, which makes
            things even harder to troubleshoot.

            I would recommend creating a new topic with a few partitions
            and adding that topic to the whitelist. Then observe whether
            the same behavior occurs. If it does, then something might
            be wrong with MM2 -- likely a bug or a misconfiguration. If
            not, then you can eliminate MM2 as the cause and work at a
            smaller scale to see if something went south with the topic.
            It could even be something not related to MM2 at all, such
            as network failures that forced the internal producer of MM2
            to retry multiple times and hence produce more data than it
            should.
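
            A minimal sketch of that test, assuming a hypothetical topic
            name "mm2-size-test", placeholder broker addresses, and the
            source->dest aliases used elsewhere in this thread:

            # create a small test topic on the source cluster
            kafka-topics.sh --bootstrap-server source-broker:9092 \
              --create --topic mm2-size-test --partitions 3 --replication-factor 3

            # add it to the flow's whitelist in connect-mirror-maker.properties
            source->dest.topics = mm2-size-test

            Then compare the on-disk size of the topic on both clusters
            once some data has flowed through.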

            The bottom line is that some troubleshooting exercises are
            hard or even impossible to carry out when the case at hand
            may have been an outlier.

            -- Ricardo

            On 7/1/20 10:02 AM, Iftach Ben-Yosef wrote:
            Hi Ryanne, thanks for the quick reply.

            I had the thought it might be compression. I see that the
            topics have the following config: "compression.type=producer".
            This is for both the source and destination topics. Should I
            check something else regarding compression?

            Also, the destination topics are larger than the same topic
            being mirrored using mm1 - the sum of the 3 topics mirrored
            by mm2 is much larger than the 1 topic that mm1 produced
            (they have the same 3 source topics, only mm1 aggregates
            them into 1 destination topic). Retention is again the same
            between the mm1 destination topic and the mm2 destination
            topics.
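
            Note that "compression.type=producer" means the broker keeps
            whatever codec the producing client used, so if MM2's
            producer sends uncompressed batches, the mirrored copy is
            stored uncompressed. One way to check the effective
            topic-level setting on both clusters (broker address and
            topic name are placeholders):

            kafka-configs.sh --bootstrap-server broker:9092 \
              --entity-type topics --entity-name my-topic --describe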

            Thanks,
            Iftach


            On Wed, Jul 1, 2020 at 4:54 PM Ryanne Dolan <ryannedo...@gmail.com> wrote:

            Iftach, is it possible the source topic is compressed?

            Ryanne

            On Wed, Jul 1, 2020, 8:39 AM Iftach Ben-Yosef <iben-yo...@outbrain.com> wrote:

            Hello everyone.

            I'm testing mm2 for our cross-dc topic replication. We used
            to do it using mm1 but faced various issues.

            So far, mm2 is working well, but I have 1 issue which I
            can't really explain: the destination topic is larger than
            the source topic.

            For example, we have 1 topic which on the source cluster is
            around 2.8-2.9TB with retention.ms=86400000.

            I added the "sync.topic.configs.enabled=false" config to our
            mm2 cluster, and edited the retention.ms of the destination
            topic to be 57600000. Other than that, I haven't touched the
            topic created by mm2 on the destination cluster.
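
            For reference, a retention override like that can be applied
            with kafka-configs.sh; the broker address and topic name
            below are placeholders (MM2's default naming would prefix
            the remote topic with the source cluster alias):

            kafka-configs.sh --bootstrap-server dest-broker:9092 \
              --entity-type topics --entity-name source.my-topic \
              --alter --add-config retention.ms=57600000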

            By logic I'd say that if I shortened the retention on the
            destination, the topic size should decrease, but in practice
            I see that it is larger than the source topic (it's about
            4.6TB).
            This same behaviour is seen on all 3 topics which I am
            currently mirroring (all 3 from different source clusters,
            into the same destination cluster).
            Does anyone have any idea as to why mm2 acts this way for me?

            Thanks,
            Iftach
