Hi As discussed, if the preferred option is to consume the records always I will change both methods in KIP-451 accordingly and also switch them to return a List. This would be a bit redundant with Jukkas proposal in KIP-456 so the question is if KIP-451 should be scraped in favor of KIP-456 which has more powerful solution but will also need a bit more changes in tests. On the other hand both are useful and wouldn't conflict as far as I can see.
Any opinions? best regards Patrik On Thu, 25 Apr 2019 at 08:55, Jukka Karvanen <jukka.karva...@jukinimi.com> wrote: > Hi, > > I played around with Patrick's KAFKA-8200 branch and I tested it with > combined with my draft version of KIP-456. > > Some comments: > These two version of iterableOutput methods are working now differently, if > you reuse same fetched Iterable object after piping in new inputs. > Version without serde will see the new input, but version with serdes has > streamed the converted items already to new list and that's why > not seeing the new item. Maybe it is intended to to fetch new Iterable each > time, but the implementation is not mandating it. > > See example: > > https://github.com/jukkakarvanen/kafka/blob/KAFKA-8200withKIP-456/streams/test-utils/src/test/java/org/apache/kafka/streams/TopologyTestDriverIterableTest.java > > > I have a lot of tests where I pipe the list of input and check list of > output items, pipe more and check the new list. > Now with this Iterable from the beginning is not very usable if you test > like this in multiple batches. You need to reiterate same again. > > In KIP-456 readKeyValuesToList returns List same way ConsumerRecordFactory > and that way this TestInputTopic is accepting List as input. > This collection methods in TestOutputTopic are also consuming the messages. > So you can mix the reading individual rows and collections. > With List it is also easier to get the number of outputs compared to > Iterable. > > Please, check out also DISCUSSion of KIP.456. I will post there the link to > the current version of implementation and you can see if it fulfill also > your need. > > Jukka > > > la 20. huhtik. 2019 klo 1.11 Patrik Kleindl (pklei...@gmail.com) > kirjoitti: > > > Hi Matthias > > Seems I got a bit ahead of myself. > > With option C my aim was a simple alternative which gives back all output > > records that have happened up to this point (and which have not been > > removed by calls to readOutput). > > Based on that the user can decide how to step through or compare the > > records. > > > > If you see it as more consistent if the new methods removed all returned > > records then this can easily be done. > > > > But maybe the pick of Iterable was too narrow. > > It would probably be a good fit to return a List or just a Collection > > > > Picking up John's naming suggestion this would make this: > > > > public Collection<ProducerRecord<byte[], byte[]>> readAllOutput(final > > String topic) { > > final Collection<ProducerRecord<byte[], byte[]>> outputRecords = > > outputRecordsByTopic.get(topic); > > if (outputRecords == null) { > > return Collections.emptyList(); > > } > > outputRecordsByTopic.put(topic, new LinkedList<>()); > > return outputRecords; > > } > > > > With the semantics the same as readOutput = removing everything. > > > > Can be changed to a List if you think it matters that a user can query > > some index directly. > > > > What do you think? > > > > best regards > > > > Patrik > > > > > > > > On Fri, 19 Apr 2019 at 03:04, Matthias J. Sax <matth...@confluent.io> > > wrote: > > > > > I am not sure if (C) is the best option to pick. > > > > > > What is the reasoning to suggest (C) over the other options? > > > > > > It seems that users cannot clear buffered output using option (C). This > > > might it make difficult to write tests. > > > > > > The original Jira tickets suggest: > > > > > > > which returns either an iterator or list over the records that are > > > currently available in the topic > > > > > > This implies that the current buffer would be cleared when getting the > > > iterator. > > > > > > Also, from my understanding, the idea of iterating in general, is to > > > step through a finite collection of objects/elements. Hence, if > > > `hasNext()` returns `false` is will never return `true` later on. > > > > > > As John mentioned, Java also has support for streams, that offer > > > different semantics, that would align with option (C). However, I am > not > > > sure if this would be the test API to write tests? > > > > > > Thoughts? > > > > > > In any way: whatever semantics we pick, the KIP should explain them. > > > Atm, this part is missing in the KIP. > > > > > > > > > -Matthias > > > > > > On 4/18/19 12:47 PM, Patrik Kleindl wrote: > > > > Hi John > > > > > > > > Thanks for your feedback > > > > It's C, it does not consume the messages in contrast to the > readOutput. > > > > Is it a requirement to do so? > > > > That's why I picked a different name so the difference is more > > > noticeable. > > > > I will add that to the JavaDoc. > > > > > > > > I see your point regarding future changes, that's why I linked > KIP-456 > > > > where such a method is proposed and would maybe allow to deprecate my > > > > version in favor of a bigger solution. > > > > > > > > Hope that answers your questions > > > > > > > > best regards > > > > Patrik > > > > > > > > > > > > On Thu, 18 Apr 2019 at 19:46, John Roesler <j...@confluent.io> > wrote: > > > > > > > >> Hi, Patrik, > > > >> > > > >> Thanks for this proposal! > > > >> > > > >> I have one question, which I didn't see addressed by the KIP. > > Currently, > > > >> when you call `readOutput`, it consumes the result (removes it from > > the > > > >> test driver's output). Does your proposed method: > > > >> A: consume the whole output stream for that topic "atomically" when > it > > > >> returns the iterable? (i.e., two calls in a row would guarantee the > > > second > > > >> call is always an empty iterable?) > > > >> B: consume each record when we iterate over it? (i.e., this is like > a > > > >> stream) If this is the case, is the returned object iterable once > > > (uncached > > > >> stream), or could we iterate over it repeatedly (cached stream)? > > > >> C: not consume at all? (i.e., this is a view on the output topic, > but > > we > > > >> need a separate method to consume/clear the output) > > > >> D: something else? > > > >> > > > >> Also, one suggestion: maybe name the method "readAllOutput" or > > > something. > > > >> Specifically naming it "iterable" makes it awkward if we do want to > > > tighten > > > >> the return type (e.g., to List) in the future. This is something we > > may > > > >> actually want to do, if there's an easy way to say, "assert that the > > > output > > > >> equals [...some literal list...]". > > > >> > > > >> Thanks again! > > > >> -John > > > >> > > > >> On Wed, Apr 17, 2019 at 4:01 PM Patrik Kleindl <pklei...@gmail.com> > > > wrote: > > > >> > > > >>> Hi all > > > >>> > > > >>> Unless someone has objections I will start a VOTE thread tomorrow. > > > >>> The KIP adds two methods to the TopologyTestDriver and has no > > conflicts > > > >> for > > > >>> existing users. > > > >>> PR https://github.com/apache/kafka/pull/6556 is already being > > > reviewed. > > > >>> > > > >>> Side-note: > > > >>> > > > >>> > > > >> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-456%3A+Helper+classes+to+make+it+simpler+to+write+test+logic+with+TopologyTestDriver > > > >>> will > > > >>> provide a much larger solution for the TopologyTestDriver, but is > > just > > > >>> starting the discussion. > > > >>> > > > >>> best regards > > > >>> > > > >>> Patrik > > > >>> > > > >>> On Thu, 11 Apr 2019 at 22:14, Patrik Kleindl <pklei...@gmail.com> > > > wrote: > > > >>> > > > >>>> Hi Matthias > > > >>>> > > > >>>> Thanks for the questions. > > > >>>> > > > >>>> Regarding the return type: > > > >>>> Iterable offers the option of being used in a foreach loop > directly > > > and > > > >>> it > > > >>>> gives you access to the .iterator method, too. > > > >>>> (ref: > > > >>>> > > > >>> > > > >> > > > > > > https://www.techiedelight.com/differences-between-iterator-and-iterable-in-java/ > > > >>>> ) > > > >>>> > > > >>>> To return a List object would require an additional conversion > and I > > > >>> don't see the immediate benefit. > > > >>>> > > > >>>> Regarding the ordering: > > > >>>> outputRecordsByTopic gives back a Queue > > > >>>> > > > >>>> private final Map<String, Queue<ProducerRecord<byte[], byte[]>>> > > > >>> outputRecordsByTopic = new HashMap<>(); > > > >>>> > > > >>>> which has a LinkedList behind it > > > >>>> > > > >>>> outputRecordsByTopic.computeIfAbsent(record.topic(), k -> new > > > >>> LinkedList<>()).add(record); > > > >>>> > > > >>>> So the order is handled by the linked list and should not be > > modified > > > >> by > > > >>>> my changes, > > > >>>> not even the .stream.map etc. (ref: > > > >>>> > > > >>> > > > >> > > > > > > https://stackoverflow.com/questions/30258566/java-stream-map-and-collect-order-of-resulting-container > > > >>>> ) > > > >>>> > > > >>>> > > > >>>> Then again, I am open to change it if people have some strong > > > >> preference > > > >>>> > > > >>>> best regards > > > >>>> > > > >>>> Patrik > > > >>>> > > > >>>> > > > >>>> On Thu, 11 Apr 2019 at 17:45, Matthias J. Sax < > > matth...@confluent.io> > > > >>>> wrote: > > > >>>> > > > >>>>> Thanks for the KIP! > > > >>>>> > > > >>>>> Overall, this makes sense and can simplify testing. > > > >>>>> > > > >>>>> What I am wondering is, why you suggest to return an `Iterable`? > > > Maybe > > > >>>>> returning an `Iterator` would make more sense? Or a List? Note > that > > > >> the > > > >>>>> order of emits matters, thus returning a generic `Collection` > would > > > >> not > > > >>>>> seem to be appropriate. > > > >>>>> > > > >>>>> Can you elaborate on the advantages to use `Iterable` compared to > > the > > > >>>>> other options? > > > >>>>> > > > >>>>> > > > >>>>> > > > >>>>> -Matthias > > > >>>>> > > > >>>>> On 4/11/19 2:09 AM, Patrik Kleindl wrote: > > > >>>>>> Hi everyone, > > > >>>>>> > > > >>>>>> I would like to start the discussion on this small enhancement > of > > > >>>>>> the TopologyTestDriver. > > > >>>>>> > > > >>>>>> > > > >>>>> > > > >>> > > > >> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-451%3A+Make+TopologyTestDriver+output+iterable > > > >>>>>> > > > >>>>>> Pull request is available at > > > >>> https://github.com/apache/kafka/pull/6556 > > > >>>>>> > > > >>>>>> Any feedback is welcome > > > >>>>>> > > > >>>>>> best regards > > > >>>>>> > > > >>>>>> Patrik > > > >>>>>> > > > >>>>> > > > >>>>> > > > >>> > > > >> > > > > > > > > > > > > >