Hi Ismael, Yes, it makes sense to do benchmark. My concern was based on the observation in KAFKA-3994 where we saw GC problem when creating new lists in the purgatory.
Thanks, Jiangjie (Becket) Qin On Fri, Mar 10, 2017 at 8:54 AM, Ismael Juma <ism...@juma.me.uk> wrote: > Hi Becket, > > Sorry for the delay and thanks for your comments. Comments inline. > > On Wed, Mar 1, 2017 at 8:59 PM, Becket Qin <becket....@gmail.com> wrote: > > > > The difference is that the user chooses the value type. They are free > to > > > choose a mutable or immutable type. A generic interceptor cannot mutate > > the > > > value because it doesn't know the type (and it could be immutable). One > > > could write an interceptor that checked the type of the value at > runtime > > > and did things based on that. But again, the user who creates the > record > > is > > > in control. > > > > > But there is no generic interceptor, right? The interceptor always takes > > specific K, V type. > > > > No, generic interceptors exist, are very useful and we use them at > Confluent. > > If we want to let the users control the mutability, users can always call > > headers.close() before calling producer.send() and that will force the > > interceptor to create new ProducerRecord object. > > > > If we expect users to call `close()`, then this is an important point to > document because interceptors would have to catch an exception or check if > the record has already been closed. I'm not sure if we should allow or > encourage that. > > > > If we do not allow the users to add headers on existing ProducerRecord > > objects, each interceptor who wants to add headers will have to create a > > new ProducerRecord. So we will have to create NUM_INTERCEPTOR times of > > ProducerRecord, if a producer is sending 100K messages per second, it > would > > be a lot of new objects, right? > > > > Note we are allocating one Header instance per header. If you are concerned > about the number of ProducerRecord instances, then it seems like you should > also be concerned about the number of Header instances, which is likely to > be a multiple of the former. In addition, if we use a Map to hold the > headers, we'll also need to allocate the map entry instances. This is not > to say that we should be unnecessarily wasteful, but it's worth > understanding what we're trying to achieve to make sure that we are not > solving the wrong bottleneck. > > It may make sense to do some benchmarking during the implementation phase > to verify that the solution achieves whatever performance goal we're aiming > for. > > Thanks, > Ismael >