Updated the KIP with a few example log lines before/after the change. Daniel
Dániel Urbán <urb.dani...@gmail.com> ezt írta (időpont: 2023. jún. 7., Sze, 13:59): > Hi Chris, > > Thank you for your comments! I updated the KIP. I still need to add the > example before/after log lines, will do that soon, but I addressed all the > other points. > 1. Added more details on thread renaming under Public Interfaces, removed > pseudo code. > 2. Removed the stale header - originally, client.id related changes were > part of the KIP, and I failed removing all leftovers of that version. > 3. Threads listed under Public Interfaces with current/proposed names. > 4. Added a comment in the connect-log4j.properties, similar to the one > introduced in KIP-449. We don't have a dedicated MM2 log4j config, not sure > if we should introduce it here. > 5. Clarified testing section - I think thread names should not be tested > (they never were), but testing will focus on the new MDC context value. > > Thanks, > Daniel > > Chris Egerton <chr...@aiven.io.invalid> ezt írta (időpont: 2023. jún. 5., > H, 16:46): > >> Hi Daniel, >> >> Thanks for the updates! A few more thoughts: >> >> 1. The "Proposed Changes" section seems a bit like a pseudocode >> implementation of the KIP. We don't really need this level of detail; most >> of the time, we're just looking for implementation details that don't >> directly affect the user-facing changes proposed in the "Public >> Interfaces" >> section but are worth mentioning because they (1) demonstrate how the >> user-facing changes are made possible, (2) indirectly affect user-facing >> behavior, or (3) go into more detail (like providing examples) about the >> user-facing behavior. For this KIP, I think examples of user-facing >> behavior (like before/after of thread names and log messages) and possibly >> an official description of the scope of the changes (which threads are >> going to be renamed and/or include the new MDC key, and which aren't?) are >> all that we'd really need in this section; everything else is fairly clear >> IMO. FWIW, the reason we want to discourage going into too much detail >> with >> KIPs is that it can quickly devolve into code review over mailing list, >> which can hold KIPs up for longer than necessary when the core design >> changes they contain are already basically accepted by everyone. >> >> 2. The "MM2 distributed mode client.id and log change" header seems like >> it >> may no longer be accurate; the contents do not mention any changes to >> client IDs. I might be missing something though; please correct me if I >> am. >> >> 3. Can you provide some before/after examples of what thread names and log >> messages will look like? I'm wondering about the thread that the >> distributed herder runs on, threads for connectors and tasks, and threads >> for polling internal topics (which we currently manage with the >> KafkaBasedLog class). It's fine if some of these are unchanged, I just >> want >> to better understand the scope of the proposed changes and get an idea of >> how they may appear to users. >> >> 4. There's no mention of changes to the default Log4j config files that we >> ship. Is this intentional? I feel like we need some way for users to >> easily >> discover this feature; if we're not going to touch our default Log4j >> config >> files, is there another way that we can expect users to find out about the >> new MDC key? >> >> 5. RE the "Test Plan" section: can you provide a little more detail of >> which cases we'll be covering with the proposed unit tests? Will we be >> verifying that the MDC context is set in various places? If so, which >> places? And the same with thread names? (There doesn't have to be a ton of >> detail, but a little more than "unit tests" would be nice 😄) >> >> Cheers, >> >> Chris >> >> On Mon, Jun 5, 2023 at 9:45 AM Dániel Urbán <urb.dani...@gmail.com> >> wrote: >> >> > I updated the KIP accordingly. Tried to come up with more generic terms >> in >> > the Connect code instead of referring to "flow" anywhere. >> > Daniel >> > >> > Dániel Urbán <urb.dani...@gmail.com> ezt írta (időpont: 2023. jún. 5., >> H, >> > 14:49): >> > >> > > Hi Chris, >> > > >> > > Thank you for your comments! >> > > >> > > I agree that the toString based logging is not ideal, and I believe >> all >> > > occurrences are within a proper logging context, so they can be >> ignored. >> > > If thread names can be changed unconditionally, I agree, using a new >> MDC >> > > key is the ideal solution. >> > > >> > > Will update the KIP accordingly. >> > > >> > > Thanks, >> > > Daniel >> > > >> > > Chris Egerton <chr...@aiven.io.invalid> ezt írta (időpont: 2023. jún. >> > 2., >> > > P, 22:23): >> > > >> > >> Hi Daniel, >> > >> >> > >> Are there any cases of Object::toString being used by internal >> classes >> > for >> > >> logging context that can't be covered by MDC contexts? For example, >> > >> anything logged by WorkerSinkTask (or any WorkerTask subclass) >> already >> > has >> > >> the MDC set for the task [1]. Since the Object::toString-prefixed >> style >> > of >> > >> logging is a bit obsolete after the introduction of MDC contexts it'd >> > feel >> > >> a little strange to continue trying to accommodate it, especially if >> the >> > >> changes from this KIP are going to be opt-in regardless. >> > >> >> > >> As far as thread names go: unlike log statements, I don't believe >> > changing >> > >> them requires a KIP, and would be fine with merging a PR for that >> > without >> > >> worrying about compatibility. >> > >> >> > >> With that in mind, I'm still wondering if we can add a separate MDC >> key >> > >> for >> > >> MM2 replication flows (perhaps "mirror.maker.flow") and >> unconditionally >> > >> add >> > >> that to the MDC contexts for every thread that gets spun up by each >> > >> DistributedHerder instance that MM2 creates. This would be different >> > from >> > >> the draft PR and KIP, which involve altering the content of the >> existing >> > >> "connector.context" MDC key. Users would opt in to the change by >> > altering >> > >> their Log4j config instead of altering their MM2 config, which >> matches >> > >> precedent set with the introduction of the "connector.context" key. >> > >> >> > >> Thoughts? >> > >> >> > >> [1] - >> > >> >> > >> >> > >> https://github.com/apache/kafka/blob/146a6976aed0d9f90c70b6f21dca8b887cc34e71/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerTask.java#L255 >> > >> >> > >> Cheers, >> > >> >> > >> Chris >> > >> >> > >> On Tue, May 23, 2023 at 11:36 AM Dániel Urbán <urb.dani...@gmail.com >> > >> > >> wrote: >> > >> >> > >> > Hi Chris, >> > >> > >> > >> > Thank you for your comments! >> > >> > 1. This approach is OK for me. I thought that the "sample" configs >> in >> > >> the >> > >> > repo do not fall into the same category as the default of a new >> > config. >> > >> > Adding a commented line instead should be ok, and the future >> opt-out >> > >> change >> > >> > sounds good to me. >> > >> > >> > >> > 2. Besides the MDC, there are 2 other places where the flow context >> > >> > information could be added: >> > >> > - Some of the Connect internal classes use their .toString methods >> > when >> > >> > logging (e.g. WorkerSinkTask has a line like this: log.info("{} >> > >> Executing >> > >> > sink task", this);). These toString implementations contain the >> > >> connector >> > >> > name, and nothing else, so in MM2 dedicated mode, adding the flow >> > would >> > >> > make these lines unique. >> > >> > - Connect worker thread names. Currently, they contain the >> connector >> > >> > name/task ID, but to make them unique, the flow should be added in >> MM2 >> > >> > distributed mode. >> > >> > In my draft PR, I changed both of these, but for the sake of >> backward >> > >> > compatibility, I made the new toString/thread name dependent on the >> > new, >> > >> > suggested configuration flag. >> > >> > If we go with a new MDC key, but we still want to change the >> toString >> > >> > methods and the thread names, we still might need an extra flag to >> > turn >> > >> on >> > >> > the new behavior. >> > >> > AFAIK the toString calls are all made inside a proper logging >> context, >> > >> so >> > >> > changing the toString methods don't add much value. I think that >> the >> > >> thread >> > >> > name changes are useful, giving more context for log lines written >> > >> outside >> > >> > of a log context. >> > >> > >> > >> > In short, I think that MDC + thread name changes are necessary to >> make >> > >> MM2 >> > >> > dedicated logs useful for diagnostics. If we keep both, then maybe >> > >> having a >> > >> > single config to control both at the same time is better than >> having a >> > >> new >> > >> > MDC key (configured separately in the log pattern) and a config >> flag >> > >> (set >> > >> > separately in the properties file) for the thread name change. >> > >> > >> > >> > Thanks, >> > >> > Daniel >> > >> > >> > >> >> > > >> > >> >