[ https://issues.apache.org/jira/browse/KAFKA-18066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900609#comment-17900609 ]
Peter Lee commented on KAFKA-18066:
-----------------------------------

Hi [~ableegoldman] I’m working on this and would like to hear your thoughts. :D

Currently, if we move the *creation* logic directly into the {{StreamThread}} constructor, it becomes harder to refactor tests that use mocks. For reference, see the [current test cases|https://github.com/peterxcli/kafka/blob/2519e4af0c19d2540093c283f14dfe4111a5a21e/streams/src/test/java/org/apache/kafka/streams/processor/internals/StreamThreadTest.java#L1391-L1461].

To address this, I’m considering splitting the initialization process as follows:
# Keep the {{StreamThread}} constructor focused on mandatory, static dependencies:
{code:java}
final StreamThread streamThread = new StreamThread(
    time,
    config,
    adminClient,
    streamsMetrics,
    topologyMetadata,
    threadId,
    logContext,
    referenceContainer.assignmentErrorCode,
    referenceContainer.nextScheduledRebalanceMs,
    referenceContainer.nonFatalExceptionsToHandle,
    shutdownErrorHook,
    streamsUncaughtExceptionHandler,
    cache::resize
);
{code}
# Add an {{initializeComponents}} method for setting up additional components:
{code:java}
streamThread.initializeComponents(
    mainConsumer,
    restoreConsumer,
    changelogReader,
    originalReset,
    taskManager,
    stateUpdater
);
{code}

However, this approach requires removing the {{final}} modifier from the properties set in {{initializeComponents}}. While it simplifies testing with mocks, it might introduce potential mutability concerns.

I’d appreciate any suggestions or insights! Thanks!

> Misleading/mismatched StreamThread id in logging
> ------------------------------------------------
>
>                 Key: KAFKA-18066
>                 URL: https://issues.apache.org/jira/browse/KAFKA-18066
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: A. Sophie Blee-Goldman
>            Assignee: Peter Lee
>            Priority: Minor
>              Labels: newbie, newbie++
>
> While debugging a test application I was confused to see a number of log lines where the StreamThread name appeared twice but had a different thread id/index in the same message. For example:
> {code:java}
> [INFO ] 2024-11-19 04:59:14.541 [e2e-963c5b74-0353-4253-bdf2-b71881d9d9f2-StreamThread-1] StreamThread - stream-thread [e2e-963c5b74-0353-4253-bdf2-b71881d9d9f2-StreamThread-3] Creating thread producer client{code}
> Generally you would expect that the actual Logger prefix (the first thread name, in this case StreamThread-1) is the same as the LogContext prefix (the second thread name, ie the StreamThread-3 in this example). I dug into it and figured out that this happens for all of the messages logged during the StreamThread#create method, ie before the new thread is actually created.
> What happened was StreamThread-1 had actually died, and started up a new thread (StreamThread-3) to replace itself before shutting down. So we were logging things _about_ StreamThread-3, but _from_ StreamThread-1.
> While this doesn't necessarily harm anyone, it's quite confusing to see and requires extensive knowledge of Streams to understand (a) that it's not a bug, and (b) which thread the messages are actually referring to. It also makes things harder to parse and read – for example I often filter logs on the Logger prefix to gather everything related to a particular thread and eg the clients it owns. The name of the currently executing thread is more reliable and gathers everything whereas not every logger is configured with the LogContext prefix (eg `stream-thread [e2e-963c5b74-0353-4253-bdf2-b71881d9d9f2-StreamThread-3]`).
> We should move things out of the static StreamThread#create method and into the thread constructor to make the logging consistent and reliable.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)