re serialization, my concern is that serialization often accounts for a lot of the cycles spent before returning the future. It's not blocking per se, but it's the same effect from the caller's perspective.
Moreover, serde impls often block themselves, e.g. when fetching schemas from a registry. I suppose it's also possible to block in Interceptors (e.g. writing audit events or metrics), which happens before serdes iiuc. So any blocking in either of those plugins would block the send unless we queue first. So I think we want to queue first and do everything off-thread when using the new API, whatever that looks like. I just want to make sure we don't do that for clients that wouldn't expect it. Another consideration is exception handling. If we queue right away, we'll defer some exceptions that currently are thrown to the caller (before the future is returned). In the new API, the send() wouldn't throw any exceptions, and instead the future would fail. I think that might mean that a new method signature is required. Ryanne On Thu, May 13, 2021, 2:57 PM Nakamura <nakamura.mo...@gmail.com> wrote: > Hey Ryanne, > > I agree we should add an additional constructor (or else an additional > overload in KafkaProducer#send, but the new constructor would be easier to > understand) if we're targeting the "user provides the thread" approach. > > From looking at the code, I think we can keep record serialization on the > user thread, if we consider that an important part of the semantics of this > method. It doesn't seem like serialization depends on knowing the cluster, > I think it's incidental that it comes after the first "blocking" segment in > the method. > > Best, > Moses > > On Thu, May 13, 2021 at 2:38 PM Ryanne Dolan <ryannedo...@gmail.com> > wrote: > > > Hey Moses, I like the direction here. My thinking is that a single > > additional work queue, s.t. send() can enqueue and return, seems like the > > lightest touch. However, I don't think we can trivially process that > queue > > in an internal thread pool without subtly changing behavior for some > users. > > For example, users will often run send() in multiple threads in order to > > serialize faster, but that wouldn't work quite the same if there were an > > internal thread pool. > > > > For this reason I'm thinking we need to make sure any such changes are > > opt-in. Maybe a new constructor with an additional ThreadFactory > parameter. > > That would at least clearly indicate that work will happen off-thread, > and > > would require opt-in for the new behavior. > > > > Under the hood, this ThreadFactory could be used to create the worker > > thread that process queued sends, which could fan-out to per-partition > > threads from there. > > > > So then you'd have two ways to send: the existing way, where serde and > > interceptors and whatnot are executed on the calling thread, and the new > > way, which returns right away and uses an internal Executor. As you point > > out, the semantics would be identical in either case, and it would be > very > > easy for clients to switch. > > > > Ryanne > > > > On Thu, May 13, 2021, 9:00 AM Nakamura <nny...@gmail.com> wrote: > > > > > Hey Folks, > > > I just posted a new proposal > > > < > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=181306446 > > > > > > > in the wiki. I think we have an opportunity to improve the > > > KafkaProducer#send user experience. It would certainly make our lives > > > easier. Please take a look! There are two subproblems that I could > use > > > guidance on, so I would appreciate feedback on both of them. > > > Best, > > > Moses > > > > > >