John, the "dynamic routing" case is certainly interesting. However, your argument about "simulating brokers" and verify which topics are created does not really hold up IMHO, because the names of all internal topics are contained in the TopologyDescription, too.
To be fair, the `TopologyDescription` does not make is easy to get all those topic names, as one would need to iterator over all sub-topologies, all nodes (filter sources for output/internal-repartition topics) and all nodes to check if they have a state store etc. However, if this is the main use case, we could simply add methods to `TopologyDescription` like `inputTopics(), `outputTopics()`, and `internalTopics()` (or similar). Details could be discussed further. I agree that the motivation of the KIP is unclear: which topics should actually be returned? The proposed method name is `getOutputTopicNames()` while the current PR exposes all topics the application did write into. Adding the input topic names is even more confusing. Maybe Andy can clarify. The method name "allProducedTopics()" might work, but it would (by its name) exclude input topic (with the exception that one has a loop in the topology and for "through()" topics). It would also not contain "unused" topics (eg, a filter() prevents writing into an output topic). Not sure if there is a use case to get only the topic names of "used topics"? Overall, besides the "dynamic routing" case, I still no 100% sure if we should expose topic names at the TTD level but would rather extend the TopologyDescription. For the dynamic routing case, we could add a method to TTD that _only_ exposes those dynamic topic names, like `dynamicOutputTopic()`? But I guess it's best to wait for Andy to clarify. The discussion is getting somewhat speculative :) -Matthias On 4/15/20 9:13 PM, John Roesler wrote: > Hi Andy, > > Thanks for the KIP! > > To Matthias’s concern, I have to agree that the motivation doesn’t build a > particularly strong case for the KIP, and people often look back to these > documents to understand why something was done, and done a particular way. > So, it would be nice to flesh it out a bit more. I have some thoughts below > on how to do this. > > Matthias also raised a good point that many applications can get the same > information just by looking in the TopologyDescription. At the least, this > alternative deserves to be called out in the “Rejected Alternatives” section, > with an explanation of why the chosen approach is preferable. > > For my part, I can think of a couple of reasons to add the method to > TopologyTestDriver instead of TopologyDescription. > > First, although it’s rare, some output topics are determined dynamically by > the data. In particular, we offer sink nodes with a TopicNameExtractor. Such > topics could only be verified by looking at the actual topics used, rather > than the ones specified in the topology. > > Second is a more philosophical justification, that the TopologyTestDriver > performs some functions of KafkaStreams (processing, serving IQ), but also > some functions of the brokers (simulating input and output topics). A common > integration testing strategy is to run an application and then query the > brokers for the list of topics that the app creates/uses, to make sure the > app won’t run afoul of ACLs in production, or just to make sure it doesn’t > contain a bug that produces to unexpected topics, or any number of other > reasons. Your proposal allows the TTD to partly fill this role, which seems > natural considering its present broker-esque capabilities. > > As for my own feedback, I’m mostly concerned about the method name and > contract. The proposed name mentions “output” topics, but I think most people > would associate this with sink topics, probably not with repartition topics, > and certainly not with changelog topics. I’m sure this also crossed your > mind, and I’m afraid I don’t have a brilliant suggestion. Maybe > getProducedTopicNames? Or, we could include the input topics as well and go > with getTopicNames() as a listing of all the topics that the app “touches”? > > One other point, I’m actually mildly concerned about what you say regarding > the outputs to the repartition and changelog topics being captured and > verified later. I would consider the data in these topics to be a private > interface, and would strongly discourage user code to depend on it in any > way, including in tests. The reason is that we need to maintain the > flexibility to change how those topics are used at any time, and frequently > do to implement features, fix bugs, etc. if we needed a KIP ad deprecation > period for that data, we would be in for a lot of trouble. The only reason I > wouldn’t consider actually removing those topics from TTD is that they’re > exposed in the brokers anyway. But seeing this in the motivation gives me a > little heartburn. > > Finally, a minor point: the Javadoc you proposed contradicts your proposal > for Phase 2, in that the doc says it prints all the topics to which records > have been output, but Phase 2 says we’ll include all the topics from the > description up front, before any outputs happen. > > Anyway, thanks again for the KIP. It does seem useful to me, and I hope my > feedback helps speed you to a successful vote! > > Thanks, > John > > On Tue, Apr 14, 2020, at 19:49, Matthias J. Sax wrote: >> Andy, >> >> thanks for the KIP. The motivation is a little unclear to me: >> >>> This information allows all the outputs of a test run to be captured >>> without prior knowledge of the output topics. >> >> Given that the TTD users writes the `Topology` themselves, they should >> always have prior knowledge what output topics they use. So why would >> this be useful? >> >> Also, there is `Topology#describe()` to get all topic names (even the >> name of internal topics -- to be fair, changelog topic names are not >> exposed, only store names, but those can we used to infer changelog >> topic names, too). >> >> Can you elaborate about the motivation? So far, it's not convincing to me. >> >> >> >> -Matthias >> >> >> On 4/14/20 8:09 AM, Andy Coates wrote: >>> Hey all, >>> I would like to start off the discussion for KIP-594: >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-594%3A+Expose+output+topic+names+from+TopologyTestDriver >>> >>> This KIP proposes to expose the names of the topics a topology produces >>> records during a test run from the TopologyTestDriver class. >>> >>> Let me know your thoughts! >>> Andy >>> >> >> >> Attachments: >> * signature.asc
signature.asc
Description: OpenPGP digital signature