Re: [DISCUSS] KIP-916: MM2 distributed mode flow log context

Chris Egerton Mon, 05 Jun 2023 07:46:18 -0700

Hi Daniel,

Thanks for the updates! A few more thoughts:

1. The "Proposed Changes" section seems a bit like a pseudocode
implementation of the KIP. We don't really need this level of detail; most
of the time, we're just looking for implementation details that don't
directly affect the user-facing changes proposed in the "Public Interfaces"
section but are worth mentioning because they (1) demonstrate how the
user-facing changes are made possible, (2) indirectly affect user-facing
behavior, or (3) go into more detail (like providing examples) about the
user-facing behavior. For this KIP, I think examples of user-facing
behavior (like before/after of thread names and log messages) and possibly
an official description of the scope of the changes (which threads are
going to be renamed and/or include the new MDC key, and which aren't?) are
all that we'd really need in this section; everything else is fairly clear
IMO. FWIW, the reason we want to discourage going into too much detail with
KIPs is that it can quickly devolve into code review over mailing list,
which can hold KIPs up for longer than necessary when the core design
changes they contain are already basically accepted by everyone.

2. The "MM2 distributed mode client.id and log change" header seems like it
may no longer be accurate; the contents do not mention any changes to
client IDs. I might be missing something though; please correct me if I am.

3. Can you provide some before/after examples of what thread names and log
messages will look like? I'm wondering about the thread that the
distributed herder runs on, threads for connectors and tasks, and threads
for polling internal topics (which we currently manage with the
KafkaBasedLog class). It's fine if some of these are unchanged, I just want
to better understand the scope of the proposed changes and get an idea of
how they may appear to users.

4. There's no mention of changes to the default Log4j config files that we
ship. Is this intentional? I feel like we need some way for users to easily
discover this feature; if we're not going to touch our default Log4j config
files, is there another way that we can expect users to find out about the
new MDC key?

5. RE the "Test Plan" section: can you provide a little more detail of
which cases we'll be covering with the proposed unit tests? Will we be
verifying that the MDC context is set in various places? If so, which
places? And the same with thread names? (There doesn't have to be a ton of
detail, but a little more than "unit tests" would be nice 😄)

Cheers,

Chris

On Mon, Jun 5, 2023 at 9:45 AM Dániel Urbán <urb.dani...@gmail.com> wrote:

> I updated the KIP accordingly. Tried to come up with more generic terms in
> the Connect code instead of referring to "flow" anywhere.
> Daniel
>
> Dániel Urbán <urb.dani...@gmail.com> ezt írta (időpont: 2023. jún. 5., H,
> 14:49):
>
> > Hi Chris,
> >
> > Thank you for your comments!
> >
> > I agree that the toString based logging is not ideal, and I believe all
> > occurrences are within a proper logging context, so they can be ignored.
> > If thread names can be changed unconditionally, I agree, using a new MDC
> > key is the ideal solution.
> >
> > Will update the KIP accordingly.
> >
> > Thanks,
> > Daniel
> >
> > Chris Egerton <chr...@aiven.io.invalid> ezt írta (időpont: 2023. jún.
> 2.,
> > P, 22:23):
> >
> >> Hi Daniel,
> >>
> >> Are there any cases of Object::toString being used by internal classes
> for
> >> logging context that can't be covered by MDC contexts? For example,
> >> anything logged by WorkerSinkTask (or any WorkerTask subclass) already
> has
> >> the MDC set for the task [1]. Since the Object::toString-prefixed style
> of
> >> logging is a bit obsolete after the introduction of MDC contexts it'd
> feel
> >> a little strange to continue trying to accommodate it, especially if the
> >> changes from this KIP are going to be opt-in regardless.
> >>
> >> As far as thread names go: unlike log statements, I don't believe
> changing
> >> them requires a KIP, and would be fine with merging a PR for that
> without
> >> worrying about compatibility.
> >>
> >> With that in mind, I'm still wondering if we can add a separate MDC key
> >> for
> >> MM2 replication flows (perhaps "mirror.maker.flow") and unconditionally
> >> add
> >> that to the MDC contexts for every thread that gets spun up by each
> >> DistributedHerder instance that MM2 creates. This would be different
> from
> >> the draft PR and KIP, which involve altering the content of the existing
> >> "connector.context" MDC key. Users would opt in to the change by
> altering
> >> their Log4j config instead of altering their MM2 config, which matches
> >> precedent set with the introduction of the "connector.context" key.
> >>
> >> Thoughts?
> >>
> >> [1] -
> >>
> >>
> https://github.com/apache/kafka/blob/146a6976aed0d9f90c70b6f21dca8b887cc34e71/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerTask.java#L255
> >>
> >> Cheers,
> >>
> >> Chris
> >>
> >> On Tue, May 23, 2023 at 11:36 AM Dániel Urbán <urb.dani...@gmail.com>
> >> wrote:
> >>
> >> > Hi Chris,
> >> >
> >> > Thank you for your comments!
> >> > 1. This approach is OK for me. I thought that the "sample" configs in
> >> the
> >> > repo do not fall into the same category as the default of a new
> config.
> >> > Adding a commented line instead should be ok, and the future opt-out
> >> change
> >> > sounds good to me.
> >> >
> >> > 2. Besides the MDC, there are 2 other places where the flow context
> >> > information could be added:
> >> > - Some of the Connect internal classes use their .toString methods
> when
> >> > logging (e.g. WorkerSinkTask has a line like this: log.info("{}
> >> Executing
> >> > sink task", this);). These toString implementations contain the
> >> connector
> >> > name, and nothing else, so in MM2 dedicated mode, adding the flow
> would
> >> > make these lines unique.
> >> > - Connect worker thread names. Currently, they contain the connector
> >> > name/task ID, but to make them unique, the flow should be added in MM2
> >> > distributed mode.
> >> > In my draft PR, I changed both of these, but for the sake of backward
> >> > compatibility, I made the new toString/thread name dependent on the
> new,
> >> > suggested configuration flag.
> >> > If we go with a new MDC key, but we still want to change the toString
> >> > methods and the thread names, we still might need an extra flag to
> turn
> >> on
> >> > the new behavior.
> >> > AFAIK the toString calls are all made inside a proper logging context,
> >> so
> >> > changing the toString methods don't add much value. I think that the
> >> thread
> >> > name changes are useful, giving more context for log lines written
> >> outside
> >> > of a log context.
> >> >
> >> > In short, I think that MDC + thread name changes are necessary to make
> >> MM2
> >> > dedicated logs useful for diagnostics. If we keep both, then maybe
> >> having a
> >> > single config to control both at the same time is better than having a
> >> new
> >> > MDC key (configured separately in the log pattern) and a config flag
> >> (set
> >> > separately in the properties file) for the thread name change.
> >> >
> >> > Thanks,
> >> > Daniel
> >> >
> >>
> >
>

Re: [DISCUSS] KIP-916: MM2 distributed mode flow log context

Reply via email to