Hi Jukka, thanks for the reply!

I think this is a good summary (the discussion was getting a little
unwieldy. I'll reply inline.

Also, thanks for clarify about your library vs. this KIP. That makes
perfect sense to me.
>
> 1. Add JavaDoc for KIP
>
> Is there a good example of KIP where Javadoc is included, so I can follow?
> I create this KIP based on this as an example::
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-247%3A+Add+public+test+utils+for+Kafka+Streams
>
>
> Now added some comments to KIP page to clarify timestamp handling, but I
> did not want to add full JavaDoc of each methods.
> Is that enough?

That's fine. I was just trying to make the review process more
efficient for other reviewers (which makes getting your KIP accepted
more efficient). I reviewed a few recent KIPs, and, indeed, I see that
javadocs are not actually as common as I thought.

> 2. TTD usability changes and swapping ConsumerRecord and ProducerRecord in
> APIs
>
> To my point of view only:
> - changing readRecord to return ConsumerRecord would cause we cannot use
> OutputVerifier

Yes, we'd likely have to provide new methods in OutputVerifier to work
with ConsumerRecord. If you buy into the plan of deprecating most of
the current-style interactions, this wouldn't be that confusing, since
all the ProducerRecord verifications would be deprecated, and only the
ConsumerRecord verifications would remain "live".

> - changing pipeInput to take in ProducerRecord, but not providing easy way
> to contruct those like ConsumerRecordFactory

I didn't follow this as well. The ConsumerRecordFactory is there
because it's a pain to construct ConsumerRecords. Conversely,
ProducerRecord has many convenience constructors, so we wouldn't need
a factory at all. This is a net win for users, since there's less
surface area for them to deal with. Under my proposal, we'd deprecate
the whole ConsumerRecordFactory.

Note that there's an "idea parity check" here: ConsumerRecords are
hard to construct because developers aren't meant to ever construct
them. They are meant to construct ProducerRecords, which is why it's
made easy. TTD has inverted the relationships of these classes, which
is why the ConsumerRecordFactory is necessary, but if we correct it,
and return to a "normal" interaction with the Client library, then we
don't need special support classes.

> - if initializing ConsumerRecord to/from  ProducerRecord  in these classes
> field by field contructor, there are risk new fields are not added to this
> classes if there are changes in ProducerRecord or ConsumerRecord

This risk seems pretty low, to be honest. We will have tests that
exercise this testing framework, so if anyone changes ProducerRecord
or ConsumerRecord, our tests will break. Since both libraries are
build together, the problem would be fixed before the change is ever
merged to trunk.

> I would propose a separate KIP for these and probably other enhanchements:
> -superclass or common interface for ConsumerRecord and ProducerRecord
> -contructors to ConsumerRecord and ProducerRecord to initialize with this
> superclass
> -modify OutputVerifier to work with both ConsumerRecord and ProducerRecord
> -create new RecordFactory to replace ConsumerRecordFactory

I understand your motivation to control the scope of this change, but
I actually think that it's better for user-facing design changes to
occur in fewer, bigger, chunks, rather than many small changes. People
will get fatigued if multiple releases in a row change the
test-support library from under their feet. Better to do it in one
shot.

Plus, this is a design discussion. We need to include the whole scope
of the system in the design, or we may realize in Phase 3 that there
was some design error in Phase 1, since we were only designing
locally. This doesn't mean that we only need one Jira ticket, there
can be many in support of this KIP, or that we only need one PR, it's
certainly better to send multiple small PRs to decrease risk and ease
reviews. But the design discussion doesn't need to be fragmented
similarly.

> 3. return null vs NoSuchElementException when empty queue
>
> Should this be also included to the above TTD usability changes?
> If single row read methods is changed to throw expectiong, it would require
> addition of hasRecords to able to verified the empty queue scenarios.
> I do not know how to implement it currently without modifying TTD to
> provide some kind way to get the queue size or peak items.

Yes, it's absolutely within bounds to propose changes to TTD to
support the ergonomic API you're proposing.

> 4. IllegalArgumentException("topic doesn't exist")
> Is this worth separate ticket?

This is your call. That was just an idea in response to your experience.

> 5. org.apache.kafka.streams.test vs org.apache.kafka.streams
>
> I was thinking org.apache.kafka.streams.test where also OutputVerifier and
> ConsumerRecordFactory exist would be more logical place, but
> I do not know is there some technical reasons why TTD are in
> org.apache.kafka.streams, not in org.apache.kafka.streams.test where other
> classes are.
>
> Did I skip something?

Ah, no, you're right. I'm not sure why that is. I admit it's
confusing. I don't think the package matters *that* much, just keep it
wherever you think is appropriate.


That's all! Thanks for entertaining my thoughts.
-John

Reply via email to