Re: [DISCUSS] KIP-451: Make TopologyTestDriver output iterable

Patrik Kleindl Thu, 25 Apr 2019 02:37:11 -0700

Hi

As discussed, if the preferred option is to consume the records always I
will change both methods in KIP-451 accordingly and also switch them to
return a List.
This would be a bit redundant with Jukkas proposal in KIP-456 so the
question is if KIP-451 should be scraped in favor of KIP-456 which has more
powerful solution but will also need a bit more changes in tests.
On the other hand both are useful and wouldn't conflict as far as I can see.


Any opinions?

best regards

Patrik

On Thu, 25 Apr 2019 at 08:55, Jukka Karvanen <[email protected]>
wrote:

> Hi,
>
> I played around with Patrick's KAFKA-8200 branch and I tested it with
> combined with my draft version of KIP-456.
>
> Some comments:
> These two version of iterableOutput methods are working now differently, if
> you reuse same fetched Iterable object after piping in new inputs.
> Version without serde will see the new input, but version with serdes has
> streamed the converted items already to new list and that's why
> not seeing the new item. Maybe it is intended to to fetch new Iterable each
> time, but the implementation is not mandating it.
>
> See example:
>
> https://github.com/jukkakarvanen/kafka/blob/KAFKA-8200withKIP-456/streams/test-utils/src/test/java/org/apache/kafka/streams/TopologyTestDriverIterableTest.java
>
>
> I have a lot of tests where I pipe the list of input and check list of
> output items, pipe more and check the new list.
> Now with this Iterable from the beginning is not very usable if you test
> like this in multiple batches. You need to reiterate same again.
>
> In KIP-456 readKeyValuesToList returns List same way ConsumerRecordFactory
> and that way this TestInputTopic is accepting List as input.
> This collection methods in TestOutputTopic are also consuming the messages.
> So you can mix the reading individual rows and collections.
> With List it is also easier to get the number of outputs compared to
> Iterable.
>
> Please, check out also DISCUSSion of KIP.456. I will post there the link to
> the current version of implementation and you can see if it fulfill also
> your need.
>
> Jukka
>
>
> la 20. huhtik. 2019 klo 1.11 Patrik Kleindl ([email protected])
> kirjoitti:
>
> > Hi Matthias
> > Seems I got a bit ahead of myself.
> > With option C my aim was a simple alternative which gives back all output
> > records that have happened up to this point (and which have not been
> > removed by calls to readOutput).
> > Based on that the user can decide how to step through or compare the
> > records.
> >
> > If you see it as more consistent if the new methods removed all returned
> > records then this can easily be done.
> >
> > But maybe the pick of Iterable was too narrow.
> > It would probably be a good fit to return a List or just a Collection
> >
> > Picking up John's naming suggestion this would make this:
> >
> > public Collection<ProducerRecord<byte[], byte[]>> readAllOutput(final
> > String topic) {
> >     final Collection<ProducerRecord<byte[], byte[]>> outputRecords =
> > outputRecordsByTopic.get(topic);
> >     if (outputRecords == null) {
> >         return Collections.emptyList();
> >     }
> >     outputRecordsByTopic.put(topic, new LinkedList<>());
> >     return outputRecords;
> > }
> >
> > With the semantics the same as readOutput = removing everything.
> >
> > Can be changed to a List if you think it matters that a user can query
> > some index directly.
> >
> > What do you think?
> >
> > best regards
> >
> > Patrik
> >
> >
> >
> > On Fri, 19 Apr 2019 at 03:04, Matthias J. Sax <[email protected]>
> > wrote:
> >
> > > I am not sure if (C) is the best option to pick.
> > >
> > > What is the reasoning to suggest (C) over the other options?
> > >
> > > It seems that users cannot clear buffered output using option (C). This
> > > might it make difficult to write tests.
> > >
> > > The original Jira tickets suggest:
> > >
> > > > which returns either an iterator or list over the records that are
> > > currently available in the topic
> > >
> > > This implies that the current buffer would be cleared when getting the
> > > iterator.
> > >
> > > Also, from my understanding, the idea of iterating in general, is to
> > > step through a finite collection of objects/elements. Hence, if
> > > `hasNext()` returns `false` is will never return `true` later on.
> > >
> > > As John mentioned, Java also has support for streams, that offer
> > > different semantics, that would align with option (C). However, I am
> not
> > > sure if this would be the test API to write tests?
> > >
> > > Thoughts?
> > >
> > > In any way: whatever semantics we pick, the KIP should explain them.
> > > Atm, this part is missing in the KIP.
> > >
> > >
> > > -Matthias
> > >
> > > On 4/18/19 12:47 PM, Patrik Kleindl wrote:
> > > > Hi John
> > > >
> > > > Thanks for your feedback
> > > > It's C, it does not consume the messages in contrast to the
> readOutput.
> > > > Is it a requirement to do so?
> > > > That's why I picked a different name so the difference is more
> > > noticeable.
> > > > I will add that to the JavaDoc.
> > > >
> > > > I see your point regarding future changes, that's why I linked
> KIP-456
> > > > where such a method is proposed and would maybe allow to deprecate my
> > > > version in favor of a bigger solution.
> > > >
> > > > Hope that answers your questions
> > > >
> > > > best regards
> > > > Patrik
> > > >
> > > >
> > > > On Thu, 18 Apr 2019 at 19:46, John Roesler <[email protected]>
> wrote:
> > > >
> > > >> Hi, Patrik,
> > > >>
> > > >> Thanks for this proposal!
> > > >>
> > > >> I have one question, which I didn't see addressed by the KIP.
> > Currently,
> > > >> when you call `readOutput`, it consumes the result (removes it from
> > the
> > > >> test driver's output). Does your proposed method:
> > > >> A: consume the whole output stream for that topic "atomically" when
> it
> > > >> returns the iterable? (i.e., two calls in a row would guarantee the
> > > second
> > > >> call is always an empty iterable?)
> > > >> B: consume each record when we iterate over it? (i.e., this is like
> a
> > > >> stream) If this is the case, is the returned object iterable once
> > > (uncached
> > > >> stream), or could we iterate over it repeatedly (cached stream)?
> > > >> C: not consume at all? (i.e., this is a view on the output topic,
> but
> > we
> > > >> need a separate method to consume/clear the output)
> > > >> D: something else?
> > > >>
> > > >> Also, one suggestion: maybe name the method "readAllOutput" or
> > > something.
> > > >> Specifically naming it "iterable" makes it awkward if we do want to
> > > tighten
> > > >> the return type (e.g., to List) in the future. This is something we
> > may
> > > >> actually want to do, if there's an easy way to say, "assert that the
> > > output
> > > >> equals [...some literal list...]".
> > > >>
> > > >> Thanks again!
> > > >> -John
> > > >>
> > > >> On Wed, Apr 17, 2019 at 4:01 PM Patrik Kleindl <[email protected]>
> > > wrote:
> > > >>
> > > >>> Hi all
> > > >>>
> > > >>> Unless someone has objections I will start a VOTE thread tomorrow.
> > > >>> The KIP adds two methods to the TopologyTestDriver and has no
> > conflicts
> > > >> for
> > > >>> existing users.
> > > >>> PR https://github.com/apache/kafka/pull/6556 is already being
> > > reviewed.
> > > >>>
> > > >>> Side-note:
> > > >>>
> > > >>>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-456%3A+Helper+classes+to+make+it+simpler+to+write+test+logic+with+TopologyTestDriver
> > > >>> will
> > > >>> provide a much larger solution for the TopologyTestDriver, but is
> > just
> > > >>> starting the discussion.
> > > >>>
> > > >>> best regards
> > > >>>
> > > >>> Patrik
> > > >>>
> > > >>> On Thu, 11 Apr 2019 at 22:14, Patrik Kleindl <[email protected]>
> > > wrote:
> > > >>>
> > > >>>> Hi Matthias
> > > >>>>
> > > >>>> Thanks for the questions.
> > > >>>>
> > > >>>> Regarding the return type:
> > > >>>> Iterable offers the option of being used in a foreach loop
> directly
> > > and
> > > >>> it
> > > >>>> gives you access to the .iterator method, too.
> > > >>>> (ref:
> > > >>>>
> > > >>>
> > > >>
> > >
> >
> https://www.techiedelight.com/differences-between-iterator-and-iterable-in-java/
> > > >>>> )
> > > >>>>
> > > >>>> To return a List object would require an additional conversion
> and I
> > > >>> don't see the immediate benefit.
> > > >>>>
> > > >>>> Regarding the ordering:
> > > >>>> outputRecordsByTopic gives back a Queue
> > > >>>>
> > > >>>> private final Map<String, Queue<ProducerRecord<byte[], byte[]>>>
> > > >>> outputRecordsByTopic = new HashMap<>();
> > > >>>>
> > > >>>> which has a LinkedList behind it
> > > >>>>
> > > >>>> outputRecordsByTopic.computeIfAbsent(record.topic(), k -> new
> > > >>> LinkedList<>()).add(record);
> > > >>>>
> > > >>>> So the order is handled by the linked list and should not be
> > modified
> > > >> by
> > > >>>> my changes,
> > > >>>> not even the .stream.map etc. (ref:
> > > >>>>
> > > >>>
> > > >>
> > >
> >
> https://stackoverflow.com/questions/30258566/java-stream-map-and-collect-order-of-resulting-container
> > > >>>> )
> > > >>>>
> > > >>>>
> > > >>>> Then again, I am open to change it if people have some strong
> > > >> preference
> > > >>>>
> > > >>>> best regards
> > > >>>>
> > > >>>> Patrik
> > > >>>>
> > > >>>>
> > > >>>> On Thu, 11 Apr 2019 at 17:45, Matthias J. Sax <
> > [email protected]>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> Thanks for the KIP!
> > > >>>>>
> > > >>>>> Overall, this makes sense and can simplify testing.
> > > >>>>>
> > > >>>>> What I am wondering is, why you suggest to return an `Iterable`?
> > > Maybe
> > > >>>>> returning an `Iterator` would make more sense? Or a List? Note
> that
> > > >> the
> > > >>>>> order of emits matters, thus returning a generic `Collection`
> would
> > > >> not
> > > >>>>> seem to be appropriate.
> > > >>>>>
> > > >>>>> Can you elaborate on the advantages to use `Iterable` compared to
> > the
> > > >>>>> other options?
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> -Matthias
> > > >>>>>
> > > >>>>> On 4/11/19 2:09 AM, Patrik Kleindl wrote:
> > > >>>>>> Hi everyone,
> > > >>>>>>
> > > >>>>>> I would like to start the discussion on this small enhancement
> of
> > > >>>>>> the TopologyTestDriver.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-451%3A+Make+TopologyTestDriver+output+iterable
> > > >>>>>>
> > > >>>>>> Pull request is available at
> > > >>> https://github.com/apache/kafka/pull/6556
> > > >>>>>>
> > > >>>>>> Any feedback is welcome
> > > >>>>>>
> > > >>>>>> best regards
> > > >>>>>>
> > > >>>>>> Patrik
> > > >>>>>>
> > > >>>>>
> > > >>>>>
> > > >>>
> > > >>
> > > >
> > >
> > >
> >
>

Re: [DISCUSS] KIP-451: Make TopologyTestDriver output iterable

Reply via email to