+1

On Sat, May 16, 2020 at 9:55 AM Konstantine Karantasis <
konstant...@confluent.io> wrote:

> Hi Arjun,
>
> I think I agree with you that subject is interesting. Yet, I feel it
> belongs to a separate future KIP. Reading the proposal in the KIP format
> will help, at least myself, to understand it better.
>
> Having said that, for the purpose of simplifying error handling for sink
> tasks, the discussion on KIP-610 has made some good progress on the mailing
> list. If the few open items are reflected on the proposal, maybe it'd be
> even worthwhile to consider it for inclusion in the upcoming release with
> its current scope.
>
> Konstantine
>
>
> On Fri, May 15, 2020 at 7:44 PM Arjun Satish <arjun.sat...@gmail.com>
> wrote:
>
> > I'm kinda hoping that we get to an approach on how to extend the Connect
> > framework. Adding parameters in the put method is nice, and maybe works
> for
> > now, but I'm not sure how scalable it is. It'd great to be able to add
> more
> > functionality in the future. Couple of examples:
> >
> > - make the metrics registry available to a task, so they can report task
> > level metrics or
> > - be able to pass in a RestExtension handle to the task, so the task can
> > provide a rest endpoint which users can hit to get some task level
> > information (about its status, health, for example)
> >
> > In such scenarios, maybe adding new parameters to existing methods may
> not
> > be immediately acceptable.
> >
> > Since we are very close to a deadline, I wanted to check if the one more
> > possibility is acceptable :-)
> >
> > What if we could create a library that could be used by connector to
> > independently integrated by connector developers in their connectors. The
> > library would be packaged and shipped with their connector like any other
> > library on maven (and other similar repositories). The new module would
> be
> > in the AK project, but its jars will *not* be added to classpath for
> > Connect worker.
> >
> > The library would provide a public interface for an error reporter, which
> > provides both synchronous and asynchronous functionalities (as was
> brought
> > up above).
> >
> > This would be an independent library, they can be easily bundled and
> loaded
> > with the other connectors. The connect framework will be decoupled from
> > this utility.
> >
> > I understand that a similar option is in the rejected alternatives,
> mostly
> > because of configuration overhead, but the configuration required here
> can
> > come directly from the worker properties (and just be copy pasted from
> > there, maybe with a prefix). and I wonder (if maybe part as a future
> KIP),
> > we can evaluate a strategy where certain worker configs can be passed to
> a
> > connector (for example, the producer/consume/admin ones), so end users do
> > not have to.
> >
> > Overall, we would get clean APIs, contracts and developers get freedom to
> > use these libraries and functionalities however they want. The only
> > drawback is how this is configured (end-users will have to add more lines
> > in the json/properties files). But all configs can simply come from
> worker,
> > I believe this is relatively minor issue. We should be able to work out
> > compatibility issues in the implementations, so that the library can
> safely
> > run (and degrade functionality if needed) with old workers.
> >
> >
> > On Fri, May 15, 2020 at 7:04 PM Aakash Shah <as...@confluent.io> wrote:
> >
> > > Just wanted to clarify that I am on board with adding the overloaded
> > > put(...) method.
> > >
> > > Thanks,
> > > Aakash
> > >
> > > On Fri, May 15, 2020 at 7:00 PM Aakash Shah <as...@confluent.io>
> wrote:
> > >
> > > > Hi Randall and Konstantine,
> > > >
> > > > As Chris and Arjun mentioned, I think the main concern is the
> potential
> > > > gap in which developers don't implement the deprecated method due to
> a
> > > > misunderstanding of use cases. Using the setter method approach
> ensures
> > > > that the developer won't break backwards compatibility when using the
> > new
> > > > method due to a mistake. That being said, I think the value added in
> > > > clarity of contract of when the error reporter will be invoked and
> > > overall
> > > > aesthetic while maintaining backwards compatibility outweighs the
> > > potential
> > > > mistake of a developer in not implementing the original put(...)
> > method.
> > > >
> > > > With respect to synchrony, I agree with Konstantine's point, that we
> > > > should make it an opt-in feature of making the reporter only
> > synchronous.
> > > > At the same time, I believe it is important to relieve as much of the
> > > > burden of implementation as possible from the developer in this case,
> > and
> > > > thus I think using a Callback rather than a Future would be easier on
> > the
> > > > developer, while adding asynchronous functionality with the ability
> to
> > > > opt-in synchronous functionality. I also believe making it opt-in
> > > > synchronous vs. the other way simplifies implementation for the
> > developer
> > > > (blocking vs creating a new thread).
> > > >
> > > > Please let me know your thoughts. I would like to come to a consensus
> > > soon
> > > > due to the AK 2.6 deadlines; I will then shortly update the KIP and
> > > start a
> > > > vote.
> > > >
> > > > Thanks,
> > > > Aakash
> > > >
> > > > On Fri, May 15, 2020 at 2:24 PM Randall Hauch <rha...@gmail.com>
> > wrote:
> > > >
> > > >> On Fri, May 15, 2020 at 3:13 PM Arjun Satish <
> arjun.sat...@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > Couple of thoughts:
> > > >> >
> > > >> > 1. If we add new parameters to put(..), and new connectors
> implement
> > > >> only
> > > >> > this method, it makes them backward incompatible with older
> > workers. I
> > > >> > think newer connectors may only choose to only implement the
> latest
> > > >> method,
> > > >> > and we are passing the compatibility problems back to the
> connector
> > > >> > developers.
> > > >> >
> > > >>
> > > >> New connectors would have to implement both if they want to run in
> > older
> > > >> runtimes.
> > > >>
> > > >>
> > > >> > 2. if we deprecate the older put() method and eventually remove
> it,
> > > then
> > > >> > old connectors are forward incompatible. If we are not going to
> > remove
> > > >> it,
> > > >> > then maybe we should not deprecate it?
> > > >> >
> > > >>
> > > >> I don't think we'll ever remove deprecated methods -- there's no
> > reason
> > > to
> > > >> cut off older connectors.
> > > >>
> > > >>
> > > >> > 3. if a record is realized to be erroneous outside put() (say, in
> > > flush
> > > >> or
> > > >> > preCommit), how will it be reported?
> > > >> >
> > > >>
> > > >> This is a concern no matter how the reporter is passed to the task.
> > > >> Actually, I think it's more clear that the reporter passed through
> > > >> `put(...)` should be used to record errors on the SinkRecords passed
> > in
> > > >> the
> > > >> same method call.
> > > >>
> > > >>
> > > >> >
> > > >> > I do think the concern over aesthetics is an important one, but
> the
> > > >> > trade-off here is to exclude many connectors that are out there
> from
> > > >> > running on worker versions. there may be production deployments
> that
> > > >> need
> > > >> > one old and one new connector that now cannot work on any version
> > of a
> > > >> > single worker. Building connectors is complex, and it's kinda
> unfair
> > > to
> > > >> > expect folks to make changes over aesthetic reasons alone. This is
> > > >> probably
> > > >> > the reason why popular framework APIs very rarely (and probably
> > never)
> > > >> > change.
> > > >> >
> > > >>
> > > >> I don't see how passing the reporter through an overloaded
> `put(...)`
> > is
> > > >> less backward compatible. Because the runtime provides the SinkTask
> > base
> > > >> class, the runtime has control over what the methods do by default.
> > > >>
> > > >>
> > > >> >
> > > >> > Overall, yes, the "public void
> > > >> errantRecordReporter(BiConsumer<SinkRecord,
> > > >> > Throwable> reporter) {}" proposal in the original KIP is somewhat
> > of a
> > > >> > mouthful, but are there are any simpler alternatives that do not
> > > exclude
> > > >> > existing connectors, adding operational burdens and yet provide a
> > > clean
> > > >> > contract?
> > > >> >
> > > >>
> > > >> IMO, overloading `put(...)` is cleaner and easier to understand --
> > plus
> > > >> the
> > > >> other benefits in my earlier email.
> > > >>
> > > >>
> > > >> >
> > > >> > Best,
> > > >> >
> > > >> > PS: Apologies if the language is incorrect or some points are
> > unclear.
> > > >> >
> > > >> > On Fri, May 15, 2020 at 12:02 PM Randall Hauch <rha...@gmail.com>
> > > >> wrote:
> > > >> >
> > > >> > > On Fri, May 15, 2020 at 1:45 PM Konstantine Karantasis <
> > > >> > > konstant...@confluent.io> wrote:
> > > >> > >
> > > >> > > > Thanks for the quick response Aakash.
> > > >> > > >
> > > >> > > > To your last point, modern APIs like this tend to be
> > asynchronous
> > > >> (see
> > > >> > > > admin, producer in Kafka) and such definition results in more
> > > >> > expressive
> > > >> > > > and well defined APIs.
> > > >> > > >
> > > >> > >
> > > >> > > +1
> > > >> > >
> > > >> > >
> > > >> > > > What you describe is easily an opt-in feature for the
> connector
> > > >> > > developer.
> > > >> > > > At the same time, the latest description above, gives us
> better
> > > >> chances
> > > >> > > for
> > > >> > > > this API to remain like this for longer, because it covers
> both
> > > the
> > > >> > sync
> > > >> > > > and async per `put` user cases.
> > > >> > >
> > > >> > >
> > > >> > > +1
> > > >> > >
> > > >> > >
> > > >> > > > Given how simple the sync implementation
> > > >> > > > is, just by complying with the return type of the method, I
> > still
> > > >> think
> > > >> > > the
> > > >> > > > BiFunction definition that returns a Future makes sense.
> > > >> > > >
> > > >> > > > Konstantine
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > On Fri, May 15, 2020 at 11:27 AM Aakash Shah <
> > as...@confluent.io>
> > > >> > wrote:
> > > >> > > >
> > > >> > > > > Thanks for the additional feedback.
> > > >> > > > >
> > > >> > > > > I see the benefits of adding an overloaded put(...) over
> > > >> alternatives
> > > >> > > > and I
> > > >> > > > > am on board going forward with this approach. It will
> > definitely
> > > >> set
> > > >> > > > forth
> > > >> > > > > a contract of where the reporter will be used with better
> > > >> aesthetics.
> > > >> > > > >
> > > >> > > > > The original idea of going with a synchronous approach for
> the
> > > >> error
> > > >> > > > > reporter was to ease the connector developer's job
> interacting
> > > >> with
> > > >> > and
> > > >> > > > > handling the error reporter. The tradeoff for having a
> > > >> > synchronous-only
> > > >> > > > > reporter would be lower throughput on the reporter; this was
> > > >> thought
> > > >> > to
> > > >> > > > be
> > > >> > > > > fine since arguably most circumstances would not include
> > > >> consistently
> > > >> > > > large
> > > >> > > > > amounts of records being sent to the error reporter. Even if
> > > this
> > > >> was
> > > >> > > the
> > > >> > > > > case, an argument can be made that the lower throughput
> would
> > be
> > > >> of
> > > >> > > > > assistance in this case, as it would allow more time for the
> > > user
> > > >> to
> > > >> > > > > realize the connector is having records sent to the error
> > > reporter
> > > >> > > before
> > > >> > > > > many are sent. However, if we are strongly in favor of
> having
> > > the
> > > >> > > option
> > > >> > > > of
> > > >> > > > > asynchronous functionality available for the developer,
> then I
> > > am
> > > >> > fine
> > > >> > > > with
> > > >> > > > > that as well.
> > > >> > > > >
> > > >> > > > > Lastly, I am on board with changing the name to
> > > >> failedRecordReporter,
> > > >> > > > >
> > > >> > > > > Please let me know your thoughts.
> > > >> > > > >
> > > >> > > > > Thanks,
> > > >> > > > > Aakash
> > > >> > > > >
> > > >> > > > > On Fri, May 15, 2020 at 9:10 AM Randall Hauch <
> > rha...@gmail.com
> > > >
> > > >> > > wrote:
> > > >> > > > >
> > > >> > > > > > Konstantine said:
> > > >> > > > > >
> > > >> > > > > > > I notice Randall also used BiFunction in his example, I
> > > >> wonder if
> > > >> > > > it's
> > > >> > > > > > for
> > > >> > > > > > > similar reasons.
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > > > Nope. Just a typo on my part.
> > > >> > > > > >
> > > >> > > > > > There appear to be three outstanding questions.
> > > >> > > > > >
> > > >> > > > > > First, Konstantine suggested calling this
> > > >> "failedRecordReporter". I
> > > >> > > > think
> > > >> > > > > > this is minor, but using this new name may be a bit more
> > > precise
> > > >> > and
> > > >> > > > I'd
> > > >> > > > > be
> > > >> > > > > > fine with this.
> > > >> > > > > >
> > > >> > > > > > Second, should the reporter method be synchronous? I think
> > the
> > > >> two
> > > >> > > > > options
> > > >> > > > > > are:
> > > >> > > > > >
> > > >> > > > > > 2a. Use `BiConsumer<SinkRecord, Throwable>` that returns
> > > nothing
> > > >> > and
> > > >> > > > > blocks
> > > >> > > > > > (at this time).
> > > >> > > > > > 2b. Use `BiFunction<SinkRecord, Throwable, Future<Void>>`
> > that
> > > >> > > returns
> > > >> > > > a
> > > >> > > > > > future that the user can optionally use to be synchronous.
> > > >> > > > > >
> > > >> > > > > > I do agree with Konstantine that option 2b gives us more
> > room
> > > >> for
> > > >> > > > future
> > > >> > > > > > semantic changes, and since the producer write is already
> > > >> > > asynchronous
> > > >> > > > > this
> > > >> > > > > > should be straightforward to implement. I think the
> concern
> > > >> here is
> > > >> > > > that
> > > >> > > > > if
> > > >> > > > > > the sink task does not *use* the future to make this
> > > >> synchronous,
> > > >> > it
> > > >> > > is
> > > >> > > > > > very possible that the error records could be written out
> of
> > > >> order
> > > >> > > (due
> > > >> > > > > to
> > > >> > > > > > retries). But this won't be an issue if the implementation
> > > uses
> > > >> > > > > > `max.in.flight.requests.per.connection=1` for writing the
> > > error
> > > >> > > > records.
> > > >> > > > > > It's a little less clear, but honestly IMO passing the
> > > reporter
> > > >> in
> > > >> > > the
> > > >> > > > > > `put(...)` method helps make this lambda easier to
> > understand,
> > > >> for
> > > >> > > some
> > > >> > > > > > strange reason. So unless there are good reasons to avoid
> > > this,
> > > >> I'd
> > > >> > > be
> > > >> > > > in
> > > >> > > > > > favor of 2b and returning a Future.
> > > >> > > > > >
> > > >> > > > > > Third, how do we pass the reporter lambda / method
> reference
> > > to
> > > >> the
> > > >> > > > task?
> > > >> > > > > > My proposal to pass the reporter via an overload
> `put(...)`
> > > >> still
> > > >> > is
> > > >> > > > the
> > > >> > > > > > most attractive to me, for several reasons:
> > > >> > > > > >
> > > >> > > > > > 3a. There's no need to pass the reporter separately *and*
> to
> > > >> > describe
> > > >> > > > the
> > > >> > > > > > changes in method call ordering.
> > > >> > > > > > 3b. As mentioned above, for some reason passing it via
> > > >> `put(...)`
> > > >> > > makes
> > > >> > > > > the
> > > >> > > > > > intent more clear that it be used when processing the
> > > >> SinkRecord,
> > > >> > and
> > > >> > > > > that
> > > >> > > > > > it shouldn't be used in `start(...)`, `preCommit(...)`,
> > > >> > > > > > `onPartitionsAssigned(...)`, or any of the other task
> > methods.
> > > >> As
> > > >> > > > Andrew
> > > >> > > > > > pointed out earlier, *describing* this in the KIP and in
> > > JavaDoc
> > > >> > will
> > > >> > > > be
> > > >> > > > > > tough to be exact yet succinct.
> > > >> > > > > > 3c. There is already precedence for evolving
> > > >> > > > > > `SourceTask.commitRecord(...)`, and the pattern is
> > identical.
> > > >> > > > > > 3d. Backward compatibility is easy to understand, and at
> the
> > > >> same
> > > >> > > time
> > > >> > > > > it's
> > > >> > > > > > pretty easy to describe what implementations that want to
> > take
> > > >> > > > advantage
> > > >> > > > > of
> > > >> > > > > > this feature should do.
> > > >> > > > > > 3e. Minimal changes to the interface: we're just *adding*
> > one
> > > >> > default
> > > >> > > > > > method that calls the existing method and deprecating the
> > > >> existing
> > > >> > > > > > `put(...)`.
> > > >> > > > > > 3f. Deprecating the existing `put(...)` makes it more
> clear
> > > in a
> > > >> > > > > > programmatic sense that new sink implementations should
> use
> > > the
> > > >> > > > reporter,
> > > >> > > > > > and that we recommend old sinks evolve to use it.
> > > >> > > > > >
> > > >> > > > > > Some of these benefits apply to some of the other
> > suggestions,
> > > >> but
> > > >> > I
> > > >> > > > > think
> > > >> > > > > > none of the other suggestions have all of these benefits.
> > For
> > > >> > > example,
> > > >> > > > > > overloading `initialize(...)` is more difficult since most
> > > sink
> > > >> > > > > connectors
> > > >> > > > > > don't override it and therefore would be less subject to
> > > >> > deprecations
> > > >> > > > > > warnings. Overloading `start(...)` is less attractive.
> > Adding
> > > a
> > > >> > > method
> > > >> > > > > IMO
> > > >> > > > > > shares the fewest of these benefits.
> > > >> > > > > >
> > > >> > > > > > The one disadvantage of this approach is that sink task
> > > >> > > implementations
> > > >> > > > > > can't rely upon the reporter upon startup. IMO that's an
> > > >> acceptable
> > > >> > > > > > tradeoff to get the cleaner and more explicit API,
> > especially
> > > if
> > > >> > the
> > > >> > > > API
> > > >> > > > > > contract is that Connect will pass the same reporter
> > instance
> > > to
> > > >> > each
> > > >> > > > > call
> > > >> > > > > > to `put(...)` on a single task instance.
> > > >> > > > > >
> > > >> > > > > > Best regards,
> > > >> > > > > >
> > > >> > > > > > Randall
> > > >> > > > > >
> > > >> > > > > > On Fri, May 15, 2020 at 6:59 AM Andrew Schofield <
> > > >> > > > > > andrew_schofi...@live.com>
> > > >> > > > > > wrote:
> > > >> > > > > >
> > > >> > > > > > > Hi,
> > > >> > > > > > > Randall's suggestion is really good. I think it gives
> the
> > > >> > > flexibility
> > > >> > > > > > > required and also
> > > >> > > > > > > keeps the interface the right way round.
> > > >> > > > > > >
> > > >> > > > > > > Thanks,
> > > >> > > > > > > Andrew Schofield
> > > >> > > > > > >
> > > >> > > > > > > On 15/05/2020, 02:07, "Aakash Shah" <
> as...@confluent.io>
> > > >> wrote:
> > > >> > > > > > >
> > > >> > > > > > > > Hi Randall,
> > > >> > > > > > > >
> > > >> > > > > > > > Thanks for the feedback.
> > > >> > > > > > > >
> > > >> > > > > > > > 1. This is a great suggestion, but I find that adding
> an
> > > >> > > overloaded
> > > >> > > > > > > > put(...) which essentially deprecates the old put(...)
> > to
> > > >> only
> > > >> > be
> > > >> > > > > used
> > > >> > > > > > > when
> > > >> > > > > > > > a connector is deployed on older versions of Connect
> > adds
> > > >> > enough
> > > >> > > > of a
> > > >> > > > > > > > complication that could cause connectors to break if
> the
> > > old
> > > >> > > > put(...)
> > > >> > > > > > > > doesn't correctly invoke the overloaded put(...);
> either
> > > >> that,
> > > >> > or
> > > >> > > > it
> > > >> > > > > > will
> > > >> > > > > > > > add duplication of functionality across the two
> put(...)
> > > >> > > methods. I
> > > >> > > > > > think
> > > >> > > > > > > > the older method simplifies things with the idea that
> a
> > > >> > DLQ/error
> > > >> > > > > > > reporter
> > > >> > > > > > > > will or will not be passed into the method depending
> on
> > > the
> > > >> > > version
> > > >> > > > > of
> > > >> > > > > > > AK.
> > > >> > > > > > > > However, I also understand the aesthetic advantage of
> > this
> > > >> > method
> > > >> > > > vs
> > > >> > > > > > the
> > > >> > > > > > > > setter method, so I am okay with going in this
> direction
> > > if
> > > >> > > others
> > > >> > > > > > agree
> > > >> > > > > > > > with adding the overloaded put(...).
> > > >> > > > > > > >
> > > >> > > > > > > > 2. Yes, your assumption is correct. Yes, we can remove
> > the
> > > >> > "Order
> > > >> > > > of
> > > >> > > > > > > > Operations" if we go with the overloaded put(...)
> > > direction.
> > > >> > > > > > > >
> > > >> > > > > > > > 3. Great point, I will remove them from the KIP.
> > > >> > > > > > > >
> > > >> > > > > > > > 4. Yeah, accept(...) will be synchronous. I will
> change
> > it
> > > >> to
> > > >> > be
> > > >> > > > > > clearer,
> > > >> > > > > > > > thanks.
> > > >> > > > > > > >
> > > >> > > > > > > > 5. This KIP will use existing metrics as well
> introduce
> > > new
> > > >> > > > metrics.
> > > >> > > > > I
> > > >> > > > > > > will
> > > >> > > > > > > > update this section to fully specify the metrics.
> > > >> > > > > > > >
> > > >> > > > > > > > Please let me know what you think.
> > > >> > > > > > > >
> > > >> > > > > > > > Thanks,
> > > >> > > > > > > > Aakash
> > > >> > > > > > > >
> > > >> > > > > > > > On Thu, May 14, 2020 at 3:52 PM Randall Hauch <
> > > >> > rha...@gmail.com>
> > > >> > > > > > wrote:
> > > >> > > > > > > >
> > > >> > > > > > > > > Hi, Aakash.
> > > >> > > > > > > > >
> > > >> > > > > > > > > Thanks for the KIP. Connect does need an improved
> > > ability
> > > >> for
> > > >> > > > sink
> > > >> > > > > > > > > connectors to report individual records as being
> > > >> problematic,
> > > >> > > and
> > > >> > > > > > this
> > > >> > > > > > > > > integrates nicely with the existing DLQ feature.
> > > >> > > > > > > > >
> > > >> > > > > > > > > I also appreciate the desire to maintain
> compatibility
> > > so
> > > >> > that
> > > >> > > > > > > connectors
> > > >> > > > > > > > > can take advantage of this feature when deployed in
> a
> > > >> runtime
> > > >> > > > that
> > > >> > > > > > > supports
> > > >> > > > > > > > > this feature, but can safely and easily do without
> the
> > > >> > feature
> > > >> > > > when
> > > >> > > > > > > > > deployed to an older runtime. But I do understand
> > > Andrew's
> > > >> > > > concern
> > > >> > > > > > > about
> > > >> > > > > > > > > the aesthetics. Have you considered overloading the
> > > >> > `put(...)`
> > > >> > > > > method
> > > >> > > > > > > and
> > > >> > > > > > > > > adding the `reporter` as a second parameter?
> > Essentially
> > > >> it
> > > >> > > would
> > > >> > > > > add
> > > >> > > > > > > the
> > > >> > > > > > > > > one method (with proper JavaDoc) to `SinkTask` only:
> > > >> > > > > > > > >
> > > >> > > > > > > > > ```
> > > >> > > > > > > > >     public void put(Collection<SinkRecord> records,
> > > >> > > > > > > BiFunction<SinkRecord,
> > > >> > > > > > > > > Throwable> reporter) {
> > > >> > > > > > > > >         put(records);
> > > >> > > > > > > > >     }
> > > >> > > > > > > > > ```
> > > >> > > > > > > > > and the WorkerSinkTask would be changed to call
> > > >> > > `put(Collection,
> > > >> > > > > > > > > BiFunction)` instead.
> > > >> > > > > > > > >
> > > >> > > > > > > > > Sink connector implementations that don't do
> anything
> > > >> > different
> > > >> > > > can
> > > >> > > > > > > still
> > > >> > > > > > > > > override `put(Collection)`, and it still works as
> > > before.
> > > >> > > > > Developers
> > > >> > > > > > > that
> > > >> > > > > > > > > want to change their sink connector implementations
> to
> > > >> > support
> > > >> > > > this
> > > >> > > > > > new
> > > >> > > > > > > > > feature would do the following, which would work in
> > > older
> > > >> and
> > > >> > > > newer
> > > >> > > > > > > Connect
> > > >> > > > > > > > > runtimes:
> > > >> > > > > > > > > ```
> > > >> > > > > > > > >     public void put(Collection<SinkRecord> records)
> {
> > > >> > > > > > > > >         put(records, null);
> > > >> > > > > > > > >     }
> > > >> > > > > > > > >     public void put(Collection<SinkRecord> records,
> > > >> > > > > > > BiFunction<SinkRecord,
> > > >> > > > > > > > > Throwable> reporter) {
> > > >> > > > > > > > >         // the normal `put(Collection)` logic goes
> > here,
> > > >> but
> > > >> > > can
> > > >> > > > > > > optionally
> > > >> > > > > > > > > use `reporter` if non-null
> > > >> > > > > > > > >     }
> > > >> > > > > > > > > ```
> > > >> > > > > > > > >
> > > >> > > > > > > > > I think this has all the same benefits of the
> current
> > > KIP,
> > > >> > but
> > > >> > > > > > > > > it's noticeably simpler and hopefully more
> > aesthetically
> > > >> > > > pleasing.
> > > >> > > > > > > > >
> > > >> > > > > > > > > As for Andrew's second concern about "the task can
> > send
> > > >> > errant
> > > >> > > > > > records
> > > >> > > > > > > to
> > > >> > > > > > > > > it within put(...)" being too restrictive. My guess
> is
> > > >> that
> > > >> > > this
> > > >> > > > > was
> > > >> > > > > > > more
> > > >> > > > > > > > > an attempt at describing the basic behavior, and
> less
> > > >> about
> > > >> > > > > requiring
> > > >> > > > > > > the
> > > >> > > > > > > > > reporter only being called within the `put(...)`
> > method
> > > >> and
> > > >> > not
> > > >> > > > by
> > > >> > > > > > > methods
> > > >> > > > > > > > > to which `put(...)` synchronously or asynchronously
> > > >> > delegates.
> > > >> > > > Can
> > > >> > > > > > you
> > > >> > > > > > > > > confirm whether my assumption is correct? If so,
> then
> > > >> perhaps
> > > >> > > my
> > > >> > > > > > > suggestion
> > > >> > > > > > > > > helps work around this issue as well, since there
> > would
> > > >> be no
> > > >> > > > > > > restriction
> > > >> > > > > > > > > on when the reporter is called, and the whole "Order
> > of
> > > >> > > > Operations"
> > > >> > > > > > > section
> > > >> > > > > > > > > could potentially be removed.
> > > >> > > > > > > > >
> > > >> > > > > > > > > Third, it's not clear to me why the "Error Reporter
> > > >> Object"
> > > >> > > > > > subsection
> > > >> > > > > > > in
> > > >> > > > > > > > > the "Proposal" section lists the worker
> configuration
> > > >> > > properties
> > > >> > > > > that
> > > >> > > > > > > were
> > > >> > > > > > > > > previously introduced with
> > > >> > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-298%3A+Error+Handling+in+Connect
> > > >> > > > > > > > > .
> > > >> > > > > > > > > Maybe it's worth mentioning that the error reporter
> > > >> > > functionality
> > > >> > > > > > will
> > > >> > > > > > > > > reuse or build upon KIP-298, including reusing the
> > > >> > > configuration
> > > >> > > > > > > properties
> > > >> > > > > > > > > defined in KIP-298. But IIUC, the KIP does not
> propose
> > > >> > changing
> > > >> > > > any
> > > >> > > > > > > > > technical or semantic aspect of these configuration
> > > >> > properties,
> > > >> > > > and
> > > >> > > > > > > > > therefore the KIP would be more clear and succinct
> > > without
> > > >> > > them.
> > > >> > > > > > > *That* the
> > > >> > > > > > > > > error reporter will use these properties is part of
> > the
> > > UX
> > > >> > and
> > > >> > > > > > > therefore
> > > >> > > > > > > > > necessary to mention, but *how* it uses those
> > properties
> > > >> is
> > > >> > > > really
> > > >> > > > > up
> > > >> > > > > > > to
> > > >> > > > > > > > > the implementation.
> > > >> > > > > > > > >
> > > >> > > > > > > > > Fourth, the "Synchrony" section has a sentence that
> is
> > > >> > > confusing,
> > > >> > > > > or
> > > >> > > > > > > not as
> > > >> > > > > > > > > clear as it could be.
> > > >> > > > > > > > >
> > > >> > > > > > > > >     "If a record is sent to the error reporter,
> > > >> processing of
> > > >> > > the
> > > >> > > > > > next
> > > >> > > > > > > > > errant record in accept(...) will not begin until
> the
> > > >> > producer
> > > >> > > > > > > successfully
> > > >> > > > > > > > > sends the errant record to Kafka."
> > > >> > > > > > > > >
> > > >> > > > > > > > > This sentence is a bit difficult to understand, but
> > IIUC
> > > >> this
> > > >> > > > > really
> > > >> > > > > > > just
> > > >> > > > > > > > > means that "accept(...)" will be synchronous and
> will
> > > >> block
> > > >> > > until
> > > >> > > > > the
> > > >> > > > > > > > > errant record has been successfully written to
> Kafka.
> > If
> > > >> so,
> > > >> > > > let's
> > > >> > > > > > say
> > > >> > > > > > > > > that. The rest of the paragraph is fine.
> > > >> > > > > > > > >
> > > >> > > > > > > > > Finally, is this KIP proposing new metrics, or that
> > > >> existing
> > > >> > > > > metrics
> > > >> > > > > > > would
> > > >> > > > > > > > > be used to track the error reporter usage? If the
> > > former,
> > > >> > then
> > > >> > > > > please
> > > >> > > > > > > > > fully-specify what these metrics will be, similarly
> to
> > > how
> > > >> > > > metrics
> > > >> > > > > > are
> > > >> > > > > > > > > specified in
> > > >> > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-196%3A+Add+metrics+to+Kafka+Connect+framework
> > > >> > > > > > > > > .
> > > >> > > > > > > > >
> > > >> > > > > > > > > Thoughts?
> > > >> > > > > > > > >
> > > >> > > > > > > > > Best regards,
> > > >> > > > > > > > >
> > > >> > > > > > > > > Randall
> > > >> > > > > > > > >
> > > >> > > > > > > > > On Mon, May 11, 2020 at 4:49 PM Andrew Schofield <
> > > >> > > > > > > > > andrew_schofi...@live.com>
> > > >> > > > > > > > > wrote:
> > > >> > > > > > > > >
> > > >> > > > > > > > > > Hi Aakash,
> > > >> > > > > > > > > > Thanks for sorting out the replies to the mailing
> > > list.
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > First, I do like the idea of improving error
> > reporting
> > > >> in
> > > >> > > sink
> > > >> > > > > > > > > connectors.
> > > >> > > > > > > > > > I'd like a simple
> > > >> > > > > > > > > > way to put bad records onto the DLQ.
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > I think this KIP is considerably more complicated
> > than
> > > >> it
> > > >> > > > seems.
> > > >> > > > > > The
> > > >> > > > > > > > > > guidance on the
> > > >> > > > > > > > > > SinkTask.put() method is that it should send the
> > > records
> > > >> > > > > > > asynchronously
> > > >> > > > > > > > > > and immediately
> > > >> > > > > > > > > > return, so the task is likely to want to report
> > errors
> > > >> > > > > > asynchronously
> > > >> > > > > > > > > > too.  Currently the KIP
> > > >> > > > > > > > > > states that "the task can send errant records to
> it
> > > >> within
> > > >> > > > > > put(...)"
> > > >> > > > > > > and
> > > >> > > > > > > > > > that's too restrictive.
> > > >> > > > > > > > > > The task ought to be able to report any unflushed
> > > >> records,
> > > >> > > but
> > > >> > > > > the
> > > >> > > > > > > > > > synchronisation of this is going
> > > >> > > > > > > > > > to be tricky. I suppose the connector author needs
> > to
> > > >> make
> > > >> > > sure
> > > >> > > > > > that
> > > >> > > > > > > all
> > > >> > > > > > > > > > errant records have
> > > >> > > > > > > > > > been reported before returning control from
> > > >> > > SinkTask.flush(...)
> > > >> > > > > or
> > > >> > > > > > > > > perhaps
> > > >> > > > > > > > > > SinkTask.preCommit(...).
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > I think the interface is a little strange too. I
> can
> > > see
> > > >> > that
> > > >> > > > > this
> > > >> > > > > > > was
> > > >> > > > > > > > > > done so it's possible to deliver a connector
> > > >> > > > > > > > > > that supports error reporting but it can also work
> > in
> > > >> > earlier
> > > >> > > > > > > versions of
> > > >> > > > > > > > > > the KC runtime. But, the
> > > >> > > > > > > > > > pattern so far is that the task uses the methods
> of
> > > >> > > > > SinkTaskContext
> > > >> > > > > > > to
> > > >> > > > > > > > > > access utilities in the Kafka
> > > >> > > > > > > > > > Connect runtime, and I suggest that reporting a
> bad
> > > >> record
> > > >> > is
> > > >> > > > > such
> > > >> > > > > > a
> > > >> > > > > > > > > > utility. SinkTaskContext has
> > > >> > > > > > > > > > changed before when the configs() methods was
> added,
> > > so
> > > >> I
> > > >> > > think
> > > >> > > > > > > there is
> > > >> > > > > > > > > > precedent for adding a method.
> > > >> > > > > > > > > > The way the KIP adds a method to SinkTask that the
> > KC
> > > >> > runtime
> > > >> > > > > calls
> > > >> > > > > > > to
> > > >> > > > > > > > > > provide the error reporting utility
> > > >> > > > > > > > > > seems not to match what has gone before.
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > Thanks,
> > > >> > > > > > > > > > Andrew
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > On 11/05/2020, 19:05, "Aakash Shah" <
> > > as...@confluent.io
> > > >> >
> > > >> > > > wrote:
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >     I wasn't previously added to the dev mailing
> > list,
> > > >> so
> > > >> > I'd
> > > >> > > > > like
> > > >> > > > > > to
> > > >> > > > > > > > > post
> > > >> > > > > > > > > > my
> > > >> > > > > > > > > >     discussion with Andrew Schofield below for
> > > >> visibility
> > > >> > and
> > > >> > > > > > further
> > > >> > > > > > > > > >     discussion:
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >     Hi Andrew,
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >     Thanks for the reply. The main concern with
> this
> > > >> > approach
> > > >> > > > > would
> > > >> > > > > > > be
> > > >> > > > > > > > > its
> > > >> > > > > > > > > >     backward compatibility. I’ve highlighted the
> > > >> thoughts
> > > >> > > > around
> > > >> > > > > > the
> > > >> > > > > > > > > > backwards
> > > >> > > > > > > > > >     compatibility of the initial approach, please
> > let
> > > me
> > > >> > know
> > > >> > > > > what
> > > >> > > > > > > you
> > > >> > > > > > > > > > think.
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >     Thanks,
> > > >> > > > > > > > > >     Aakash
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> ____________________________________________________________________________________________________________________________
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >     Hi,
> > > >> > > > > > > > > >     By adding a new method to the SinkContext
> > > interface
> > > >> in
> > > >> > > say
> > > >> > > > > > Kafka
> > > >> > > > > > > > > 2.6, a
> > > >> > > > > > > > > >     connector that calls it would require a Kafka
> > 2.6
> > > >> > connect
> > > >> > > > > > > runtime. I
> > > >> > > > > > > > > > don't
> > > >> > > > > > > > > >     quite see how that's a backward compatibility
> > > >> problem.
> > > >> > > It's
> > > >> > > > > > just
> > > >> > > > > > > that
> > > >> > > > > > > > > > new
> > > >> > > > > > > > > >     connectors need the latest interface. I might
> > not
> > > >> quite
> > > >> > > be
> > > >> > > > > > > > > > understanding,
> > > >> > > > > > > > > >     but I think it would be fine.
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >     Thanks,
> > > >> > > > > > > > > >     Andrew
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> ____________________________________________________________________________________________________________________________
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >     Hi Andrew,
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >     I apologize for the way the reply was sent. I
> > just
> > > >> > > > subscribed
> > > >> > > > > > to
> > > >> > > > > > > the
> > > >> > > > > > > > > > dev
> > > >> > > > > > > > > >     mailing list so it should be resolved now.
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >     You are correct, new connectors would simply
> > > require
> > > >> > the
> > > >> > > > > latest
> > > >> > > > > > > > > > interface.
> > > >> > > > > > > > > >     However, we want to remove that requirement -
> in
> > > >> other
> > > >> > > > words,
> > > >> > > > > > we
> > > >> > > > > > > want
> > > >> > > > > > > > > > to
> > > >> > > > > > > > > >     allow the possibility that someone wants the
> > > latest
> > > >> > > > > > connector/to
> > > >> > > > > > > > > > upgrade to
> > > >> > > > > > > > > >     the latest version, but deploys it on an older
> > > >> version
> > > >> > of
> > > >> > > > AK.
> > > >> > > > > > > > > > Basically, we
> > > >> > > > > > > > > >     don't want to enforce the necessity of
> upgrading
> > > AK
> > > >> to
> > > >> > > get
> > > >> > > > > the
> > > >> > > > > > > latest
> > > >> > > > > > > > > >     interface. In the current approach, there
> would
> > be
> > > >> no
> > > >> > > issue
> > > >> > > > > of
> > > >> > > > > > > > > > deploying a
> > > >> > > > > > > > > >     new connector on an older version of AK, as
> the
> > > >> Connect
> > > >> > > > > > framework
> > > >> > > > > > > > > would
> > > >> > > > > > > > > >     simply not invoke the new method.
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >     Please let me know what you think and if I
> need
> > to
> > > >> > > clarify
> > > >> > > > > > > anything.
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >     Thanks,
> > > >> > > > > > > > > >     Aakash
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >
>

Reply via email to