Thanks for the discussion! After skipping over KIP-456, I was also wondering what the overlap of both KIPs are, and if KIP-456 might actually subsume KIP-451.
If we all agree on this, I would recommend to discard KIP-451 in favor of KIP-456. I'll also follow up on KIP-456 DISCUSS thread to get some input about the overlap from their point of view. -Matthias On 4/25/19 2:35 AM, Patrik Kleindl wrote: > Hi > > As discussed, if the preferred option is to consume the records always I > will change both methods in KIP-451 accordingly and also switch them to > return a List. > This would be a bit redundant with Jukkas proposal in KIP-456 so the > question is if KIP-451 should be scraped in favor of KIP-456 which has more > powerful solution but will also need a bit more changes in tests. > On the other hand both are useful and wouldn't conflict as far as I can see. > > Any opinions? > > best regards > > Patrik > > On Thu, 25 Apr 2019 at 08:55, Jukka Karvanen <jukka.karva...@jukinimi.com> > wrote: > >> Hi, >> >> I played around with Patrick's KAFKA-8200 branch and I tested it with >> combined with my draft version of KIP-456. >> >> Some comments: >> These two version of iterableOutput methods are working now differently, if >> you reuse same fetched Iterable object after piping in new inputs. >> Version without serde will see the new input, but version with serdes has >> streamed the converted items already to new list and that's why >> not seeing the new item. Maybe it is intended to to fetch new Iterable each >> time, but the implementation is not mandating it. >> >> See example: >> >> https://github.com/jukkakarvanen/kafka/blob/KAFKA-8200withKIP-456/streams/test-utils/src/test/java/org/apache/kafka/streams/TopologyTestDriverIterableTest.java >> >> >> I have a lot of tests where I pipe the list of input and check list of >> output items, pipe more and check the new list. >> Now with this Iterable from the beginning is not very usable if you test >> like this in multiple batches. You need to reiterate same again. >> >> In KIP-456 readKeyValuesToList returns List same way ConsumerRecordFactory >> and that way this TestInputTopic is accepting List as input. >> This collection methods in TestOutputTopic are also consuming the messages. >> So you can mix the reading individual rows and collections. >> With List it is also easier to get the number of outputs compared to >> Iterable. >> >> Please, check out also DISCUSSion of KIP.456. I will post there the link to >> the current version of implementation and you can see if it fulfill also >> your need. >> >> Jukka >> >> >> la 20. huhtik. 2019 klo 1.11 Patrik Kleindl (pklei...@gmail.com) >> kirjoitti: >> >>> Hi Matthias >>> Seems I got a bit ahead of myself. >>> With option C my aim was a simple alternative which gives back all output >>> records that have happened up to this point (and which have not been >>> removed by calls to readOutput). >>> Based on that the user can decide how to step through or compare the >>> records. >>> >>> If you see it as more consistent if the new methods removed all returned >>> records then this can easily be done. >>> >>> But maybe the pick of Iterable was too narrow. >>> It would probably be a good fit to return a List or just a Collection >>> >>> Picking up John's naming suggestion this would make this: >>> >>> public Collection<ProducerRecord<byte[], byte[]>> readAllOutput(final >>> String topic) { >>> final Collection<ProducerRecord<byte[], byte[]>> outputRecords = >>> outputRecordsByTopic.get(topic); >>> if (outputRecords == null) { >>> return Collections.emptyList(); >>> } >>> outputRecordsByTopic.put(topic, new LinkedList<>()); >>> return outputRecords; >>> } >>> >>> With the semantics the same as readOutput = removing everything. >>> >>> Can be changed to a List if you think it matters that a user can query >>> some index directly. >>> >>> What do you think? >>> >>> best regards >>> >>> Patrik >>> >>> >>> >>> On Fri, 19 Apr 2019 at 03:04, Matthias J. Sax <matth...@confluent.io> >>> wrote: >>> >>>> I am not sure if (C) is the best option to pick. >>>> >>>> What is the reasoning to suggest (C) over the other options? >>>> >>>> It seems that users cannot clear buffered output using option (C). This >>>> might it make difficult to write tests. >>>> >>>> The original Jira tickets suggest: >>>> >>>>> which returns either an iterator or list over the records that are >>>> currently available in the topic >>>> >>>> This implies that the current buffer would be cleared when getting the >>>> iterator. >>>> >>>> Also, from my understanding, the idea of iterating in general, is to >>>> step through a finite collection of objects/elements. Hence, if >>>> `hasNext()` returns `false` is will never return `true` later on. >>>> >>>> As John mentioned, Java also has support for streams, that offer >>>> different semantics, that would align with option (C). However, I am >> not >>>> sure if this would be the test API to write tests? >>>> >>>> Thoughts? >>>> >>>> In any way: whatever semantics we pick, the KIP should explain them. >>>> Atm, this part is missing in the KIP. >>>> >>>> >>>> -Matthias >>>> >>>> On 4/18/19 12:47 PM, Patrik Kleindl wrote: >>>>> Hi John >>>>> >>>>> Thanks for your feedback >>>>> It's C, it does not consume the messages in contrast to the >> readOutput. >>>>> Is it a requirement to do so? >>>>> That's why I picked a different name so the difference is more >>>> noticeable. >>>>> I will add that to the JavaDoc. >>>>> >>>>> I see your point regarding future changes, that's why I linked >> KIP-456 >>>>> where such a method is proposed and would maybe allow to deprecate my >>>>> version in favor of a bigger solution. >>>>> >>>>> Hope that answers your questions >>>>> >>>>> best regards >>>>> Patrik >>>>> >>>>> >>>>> On Thu, 18 Apr 2019 at 19:46, John Roesler <j...@confluent.io> >> wrote: >>>>> >>>>>> Hi, Patrik, >>>>>> >>>>>> Thanks for this proposal! >>>>>> >>>>>> I have one question, which I didn't see addressed by the KIP. >>> Currently, >>>>>> when you call `readOutput`, it consumes the result (removes it from >>> the >>>>>> test driver's output). Does your proposed method: >>>>>> A: consume the whole output stream for that topic "atomically" when >> it >>>>>> returns the iterable? (i.e., two calls in a row would guarantee the >>>> second >>>>>> call is always an empty iterable?) >>>>>> B: consume each record when we iterate over it? (i.e., this is like >> a >>>>>> stream) If this is the case, is the returned object iterable once >>>> (uncached >>>>>> stream), or could we iterate over it repeatedly (cached stream)? >>>>>> C: not consume at all? (i.e., this is a view on the output topic, >> but >>> we >>>>>> need a separate method to consume/clear the output) >>>>>> D: something else? >>>>>> >>>>>> Also, one suggestion: maybe name the method "readAllOutput" or >>>> something. >>>>>> Specifically naming it "iterable" makes it awkward if we do want to >>>> tighten >>>>>> the return type (e.g., to List) in the future. This is something we >>> may >>>>>> actually want to do, if there's an easy way to say, "assert that the >>>> output >>>>>> equals [...some literal list...]". >>>>>> >>>>>> Thanks again! >>>>>> -John >>>>>> >>>>>> On Wed, Apr 17, 2019 at 4:01 PM Patrik Kleindl <pklei...@gmail.com> >>>> wrote: >>>>>> >>>>>>> Hi all >>>>>>> >>>>>>> Unless someone has objections I will start a VOTE thread tomorrow. >>>>>>> The KIP adds two methods to the TopologyTestDriver and has no >>> conflicts >>>>>> for >>>>>>> existing users. >>>>>>> PR https://github.com/apache/kafka/pull/6556 is already being >>>> reviewed. >>>>>>> >>>>>>> Side-note: >>>>>>> >>>>>>> >>>>>> >>>> >>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-456%3A+Helper+classes+to+make+it+simpler+to+write+test+logic+with+TopologyTestDriver >>>>>>> will >>>>>>> provide a much larger solution for the TopologyTestDriver, but is >>> just >>>>>>> starting the discussion. >>>>>>> >>>>>>> best regards >>>>>>> >>>>>>> Patrik >>>>>>> >>>>>>> On Thu, 11 Apr 2019 at 22:14, Patrik Kleindl <pklei...@gmail.com> >>>> wrote: >>>>>>> >>>>>>>> Hi Matthias >>>>>>>> >>>>>>>> Thanks for the questions. >>>>>>>> >>>>>>>> Regarding the return type: >>>>>>>> Iterable offers the option of being used in a foreach loop >> directly >>>> and >>>>>>> it >>>>>>>> gives you access to the .iterator method, too. >>>>>>>> (ref: >>>>>>>> >>>>>>> >>>>>> >>>> >>> >> https://www.techiedelight.com/differences-between-iterator-and-iterable-in-java/ >>>>>>>> ) >>>>>>>> >>>>>>>> To return a List object would require an additional conversion >> and I >>>>>>> don't see the immediate benefit. >>>>>>>> >>>>>>>> Regarding the ordering: >>>>>>>> outputRecordsByTopic gives back a Queue >>>>>>>> >>>>>>>> private final Map<String, Queue<ProducerRecord<byte[], byte[]>>> >>>>>>> outputRecordsByTopic = new HashMap<>(); >>>>>>>> >>>>>>>> which has a LinkedList behind it >>>>>>>> >>>>>>>> outputRecordsByTopic.computeIfAbsent(record.topic(), k -> new >>>>>>> LinkedList<>()).add(record); >>>>>>>> >>>>>>>> So the order is handled by the linked list and should not be >>> modified >>>>>> by >>>>>>>> my changes, >>>>>>>> not even the .stream.map etc. (ref: >>>>>>>> >>>>>>> >>>>>> >>>> >>> >> https://stackoverflow.com/questions/30258566/java-stream-map-and-collect-order-of-resulting-container >>>>>>>> ) >>>>>>>> >>>>>>>> >>>>>>>> Then again, I am open to change it if people have some strong >>>>>> preference >>>>>>>> >>>>>>>> best regards >>>>>>>> >>>>>>>> Patrik >>>>>>>> >>>>>>>> >>>>>>>> On Thu, 11 Apr 2019 at 17:45, Matthias J. Sax < >>> matth...@confluent.io> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Thanks for the KIP! >>>>>>>>> >>>>>>>>> Overall, this makes sense and can simplify testing. >>>>>>>>> >>>>>>>>> What I am wondering is, why you suggest to return an `Iterable`? >>>> Maybe >>>>>>>>> returning an `Iterator` would make more sense? Or a List? Note >> that >>>>>> the >>>>>>>>> order of emits matters, thus returning a generic `Collection` >> would >>>>>> not >>>>>>>>> seem to be appropriate. >>>>>>>>> >>>>>>>>> Can you elaborate on the advantages to use `Iterable` compared to >>> the >>>>>>>>> other options? >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -Matthias >>>>>>>>> >>>>>>>>> On 4/11/19 2:09 AM, Patrik Kleindl wrote: >>>>>>>>>> Hi everyone, >>>>>>>>>> >>>>>>>>>> I would like to start the discussion on this small enhancement >> of >>>>>>>>>> the TopologyTestDriver. >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>> >>>>>> >>>> >>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-451%3A+Make+TopologyTestDriver+output+iterable >>>>>>>>>> >>>>>>>>>> Pull request is available at >>>>>>> https://github.com/apache/kafka/pull/6556 >>>>>>>>>> >>>>>>>>>> Any feedback is welcome >>>>>>>>>> >>>>>>>>>> best regards >>>>>>>>>> >>>>>>>>>> Patrik >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>>> >>> >> >
signature.asc
Description: OpenPGP digital signature