Hey Lincong,

Thanks for the explanation. Here are my concern with the current proposal:

1) The current APIs of RequestInfo/ResponseInfo only provide byte and count
number of ProduceRequest/FetchRespnse. With these limited AIPs, developers
will likely have to create new KIP and make change in Apache Kafka source
code in order to implement more advanced observer plugin, which would
considerably reduces the extensibility and customizability of observer
plugins:

Here are two use-cases that can be made possible if we can provide the raw
request/response to the observer interface:

- Get the number of bytes produced per source host. This is doable if
plugin can get the original ProduceRequest, deserialize request into Kafka
messages, and parse messages based on the schema of the message payload.

- Get the ApiVersion supported per producer/consumer IPs. In the future we
can add the version of client library in ApiVersionsRequest and observer
can monitor whether there is still client library that is using very old
version, and if so, what is their IP addresses.

2) It requires extra maintenance overhead for Apache Kafka developer to
maintain implementation of RequestInfo (e.g. bytes produced per topic),
which would not be necessary if we can just provide ProduceRequest to the
observer interface.

3) It is not clear why we need RequestInfo/ResponseInfo needs to be
interface rather than class. In general interface is needed when we expect
multiple different implementation of the interface. Can you provide some
idea why we need multiple implementations for RequestInfo?


Thanks,
Dong


On Sun, Nov 18, 2018 at 12:50 AM Lincong Li <andrewlinc...@gmail.com> wrote:

> Hi Dong and Patrick,
>
> Thank you very much for feedbacks! Firstly I want to clarify that the
> implementations of RequestInfo and ResponseInfo are *not* provided by
> user. Instead they will be provided as part of this KIP. In other words,
> "RequestAdapter" and "ResponseAdapter" are implementations of "RequestInfo"
> and "ResponseInfo" respectively. Users can figure out what information they
> can use in their implementation of the observer whereas how to extract
> those information is provided so that no internal interface is provided. In
> the future, if implementation of internal class (Response/Request) changes,
> "RequestAdapter" and "ResponseAdapter" should be changed accordingly.
>
> For Patrick,
> Yes, it would take some effort to figure out exactly what information
> should be exposed in order to find the best balance point between making
> getters generic enough so that it is not hindering future changes in
> internal classes and also specific enough so that most user's requirements
> should be satisfied. With that being said, I think the current set of
> getters should let users extract information that is very unlikely to be
> removed in the future. Moreover, "getProduceToTopicSizeInBytes/
> getProduceToTopicRecordCount/getFetchFromTopicPartitionSizeInBytes/
> getFetchFromTopicPartitionRecordCount" represent information of produce
> and consume which are probably the two most important, frequent, auditable
> and observable operations and most observer users might be interested in
> observing/auditing produce and consume operations. Those getters will need
> to be changed if producer or/and consumer changed their way of interacting
> with brokers drastically, which is not likely to happen. That's why I think
> those getters also both generic enough and specific enough. Still, which
> getters should be added to or subtracted from the current list could be
> discussed further.
>
> To response "Assume we choose to pre-define the info,  we don't need the
> RequestInfo/ResponseInfo to interface and then implement the adaptors
> because there will be only one implementation for the interfaces...", I
> think this is the same pattern as this Java Servlet
> <https://tomcat.apache.org/tomcat-5.5-doc/servletapi/javax/servlet/http/HttpServletRequest.html>
>  which
> exposes "HttpServletRequest" to let user know what information are exposed
> and still provides its implementation.
>
> For Dong,
> Your proposed approach is not obvious to me and I will need further study
> and reading through the source to understand. However, with my
> clarification on the fact that internal classes are not exposed, do you
> think my proposed approach has other major issues?
>
> Thanks again!
>
> Best regards,
> Lincong Li
>
>
>
>
>
> On Sat, Nov 17, 2018 at 6:01 PM Patrick Huang <hzx...@hotmail.com> wrote:
>
>> Hi Lincong,
>>
>> Thanks for the KIP.
>>
>> If I understand correctly, the reason for having RequestInfo and
>> ResponseInfo is that you want to pre-define what information can be exposed
>> to the observer. Users cannot implement RequestAdapter/ResponseAdapter
>> and they are just internal helper classes to extract pre-defined info for
>> the observer. This does decouple the internal classes
>> (AbstractResponse/RequestChannel.request) but my concerns are:
>>
>>    1.  Pre-defining the information that can be exposed to the observer
>>    is good for hiding the internal classes but may limit the use cases of the
>>    observer. Also, it makes it harder to evolve the RequestInfo/ResponseInfo
>>    if there is a need for the observer to access more information in the
>>    future because we will need to change the implementation of
>>    RequestAdapter/ResponseAdaptor and probably break the existing observers.
>>    In this case, I think Dong's suggestion is the right way to go.
>>    2.  If we know (for sure) that the information that can be exposed to
>>    the observer will never change, it is possible to pre-define them but I
>>    think you need to explain more about why this is the case. Assume we 
>> choose
>>    to pre-define the info,  we don't need the RequestInfo/ResponseInfo to
>>    interface and then implement the adaptors because there will be only one
>>    implementation for the interfaces provided internally to extract the info
>>    from the requests. We also don't need to separate RequestInfo and
>>    ResponseInfo. What we need is just one class (e.g. ObservedInfo) with
>>    getters for the pre-defined info and pass it to the observer (e.g. use
>>     public void record(ObserverdInfo observedInfo)). Also,
>>    getProduceToTopicSizeInBytes/getProduceToTopicRecordCount/
>>    
>> getFetchFromTopicPartitionSizeInBytes/getFetchFromTopicPartitionRecordCount
>>    are very specific to one type of request and we should make the
>>    pre-define info more generic if we choose to go for this route.
>>
>>
>>
>> Best,
>> Zhanxiang (Patrick) Huang
>>
>> ------------------------------
>> *From:* Dong Lin <lindon...@gmail.com>
>> *Sent:* Saturday, November 17, 2018 18:37
>> *To:* dev
>> *Subject:* Re: [DISCUSS] KIP-388 Add observer interface to record
>> request and response
>>
>> Hey Lincong,
>>
>> The main concern for the original version of the KIP is that, if we expose
>> internal classes such as RequestChannel.Request and AbstractResponse, it
>> will be difficult to make change to those internal classes in the future.
>>
>> In the latest version of the KIP, it seems that user is going to provide
>> implementation of classes such as RequestAdapter, ResponseAdapter, which
>> again interact with internal classes RequestChannel.Request and
>> AbstractResponse directly. So it seems that we still have the same issue
>> of
>> exposing the same internal classes to the user plugin implementation?
>>
>> Here is some high level idea of how to make this work without exposing
>> internal classes. Basically we want to have the same information as
>> currently contained in AbstractRequest, AbstractResponse and
>> RequestHeader.
>> These classes can be replaced with the following fields. The
>> transformation
>> between ByteBuffer and AbstractRequest/AbstractResponse/RequestHeader is
>> totally specified by the schema of each request/response which is already
>> public interface in Kafka. So this approach will not expose internal
>> classes.
>>
>> AbstractRequest -> int apiKeyId, short apiVersion, ByteBuffer buffer
>> AbstractResponse -> int apiKeyId, short apiVersion, ByteBuffer buffer
>> RequestHeader -> short requsetHeaderSchemaVersion, ByteBuffer buffer
>>
>> And user are also free to compile their plugin together with the Kafka
>> source code so that they can just do the following without re-writing the
>> serialization/deserialization logic:
>>
>> apiKey.parseRequest(apiVersion, buffer)
>> Struct struct = apiKey.parseRequest(apiVersion, buffer);
>> AbstractRequest requset = AbstractRequest.parseRequest(apiKey, apiVersion,
>> struct);
>>
>> Currently I am not sure whether it is going to be expensive to do
>> conversion between Struct and ByteBuffer. My understanding is that this is
>> not relatively expensive because the main CPU overhead in broker appears
>> to
>> be e.g. decompression, re-compression, SSL. But if it is expensive, it may
>> be reasonable to replace ByteBuffer with Struct, where Struct already
>> contain fields such as Schema. I have not carefully considered this
>> directly. The final solution probably needs to explicitly specify whether
>> we want to keep binary compatibility and source compatibility. Hopefully
>> this is good direction to move this KIP forward.
>>
>> Thanks,
>> Dong
>>
>> On Wed, Nov 14, 2018 at 10:41 AM Lincong Li <andrewlinc...@gmail.com>
>> wrote:
>>
>> > Hi Wesley,
>> >
>> > Thank you very much for your feedback. The concern on memory pressure is
>> > definitely valid. However it should be the user's job to keep this
>> concern
>> > in mind and implement the observer in the most reasonable way for their
>> use
>> > case. In other words, implement it at their own risks.
>> >
>> > The alternative approach to mitigate this concern is to implement
>> adapters
>> > in a way that it extracts all information required by all its getters at
>> > initialization when a reference to request/response is given so that
>> > references to request/response could be garbaged collected. However,
>> this
>> > proactive initialization might be wasteful. For example, methods such
>> as "
>> > public Map<TopicPartition, Long>
>> getFetchFromTopicPartitionSizeInBytes()"
>> > takes non-constant time.
>> >
>> > Best regards,
>> > Lincong Li
>> >
>> > On Tue, Nov 13, 2018 at 5:56 PM xiongqi wu <xiongq...@gmail.com> wrote:
>> >
>> > > Lincong,
>> > >
>> > > Thanks for the KIP.
>> > > I have a question about the lifecycle of request and response.
>> > > With the current (requestAdapter, responseAdapter) implementation,
>> > > the observer can potentially extend the lifecycle of request and
>> response
>> > > through adapter.
>> > > Anyone can implement own observer, and some observers may want to do
>> > async
>> > > process or batched processing.
>> > >
>> > > Could you clarify how could we make sure this do not increase the
>> memory
>> > > pressure on potentially holding large request/response object?
>> > >
>> > >
>> > >
>> > > Xiongqi (Wesley) Wu
>> > >
>> > >
>> > > On Mon, Nov 12, 2018 at 10:23 PM Lincong Li <andrewlinc...@gmail.com>
>> > > wrote:
>> > >
>> > > > Thanks Mayuresh, Ismael and Colin for your feedback!
>> > > >
>> > > > I updated the KIP basing on your feedback. The change is basically
>> that
>> > > two
>> > > > interfaces are introduced to prevent the internal classes from being
>> > > > exposed. These two interfaces contain getters that allow user to
>> > extract
>> > > > information from request and response in their own
>> implementation(s) of
>> > > the
>> > > > observer interface and they would not constraint future
>> implementation
>> > > > changes in neither RequestChannel.Request nor AbstractResponse.
>> There
>> > > could
>> > > > be more getters defined in these two interfaces. The implementation
>> of
>> > > > these two interfaces will be provided as part of the KIP.
>> > > >
>> > > > I also expanded on "Add code to the broker (in KafkaApis) to allow
>> > Kafka
>> > > > servers to invoke any
>> > > > observers defined. More specifically, change KafkaApis code to
>> invoke
>> > all
>> > > > defined observers, in the order in which they were defined, for
>> every
>> > > > request-response pair" by providing a sample code block which shows
>> how
>> > > > these interfaces are used in the KafkaApis class.
>> > > >
>> > > > Let me know if you have any question, concern, or comments. Thank
>> you
>> > > very
>> > > > much!
>> > > >
>> > > > Best regards,
>> > > > Lincong Li
>> > > >
>> > > > On Fri, Nov 9, 2018 at 10:34 AM Mayuresh Gharat <
>> > > > gharatmayures...@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Hi Lincong,
>> > > > >
>> > > > > Thanks for the KIP.
>> > > > >
>> > > > > As Colin pointed out, would it better to expose certain specific
>> > pieces
>> > > > of
>> > > > > information from the request/response like api key, request
>> headers,
>> > > > record
>> > > > > counts, client ID instead of the entire request/response objects ?
>> > This
>> > > > > enables us to change the request response apis independently of
>> this
>> > > > > pluggable public API, in future, unless you think we have a strong
>> > > reason
>> > > > > that we need to expose the request, response objects.
>> > > > >
>> > > > > Also, it would be great if you can expand on :
>> > > > > "Add code to the broker (in KafkaApis) to allow Kafka servers to
>> > invoke
>> > > > any
>> > > > > observers defined. More specifically, change KafkaApis code to
>> invoke
>> > > all
>> > > > > defined observers, in the order in which they were defined, for
>> every
>> > > > > request-response pair."
>> > > > > probably with an example of how you visualize it. It would help
>> the
>> > KIP
>> > > > to
>> > > > > be more concrete and easier to understand the end to end workflow.
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Mayuresh
>> > > > >
>> > > > > On Thu, Nov 8, 2018 at 7:44 PM Ismael Juma <ism...@juma.me.uk>
>> > wrote:
>> > > > >
>> > > > > > I agree, the current KIP doesn't discuss the public API that we
>> > would
>> > > > be
>> > > > > > exposing and it's extensive if the normal usage would allow for
>> > > casting
>> > > > > > AbstractRequest into the various subclasses and potentially even
>> > > > > accessing
>> > > > > > Records and related for produce request.
>> > > > > >
>> > > > > > There are many use cases where this could be useful, but it
>> > requires
>> > > > > quite
>> > > > > > a bit of thinking around the APIs that we expose and the
>> expected
>> > > > usage.
>> > > > > >
>> > > > > > Ismael
>> > > > > >
>> > > > > > On Thu, Nov 8, 2018, 6:09 PM Colin McCabe <cmcc...@apache.org
>> > wrote:
>> > > > > >
>> > > > > > > Hi Lincong Li,
>> > > > > > >
>> > > > > > > I agree that server-side instrumentation is helpful.
>> However, I
>> > > > don't
>> > > > > > > think this is the right approach.
>> > > > > > >
>> > > > > > > The problem is that RequestChannel.Request and
>> AbstractResponse
>> > are
>> > > > > > > internal classes that should not be exposed.  These are
>> > > > implementation
>> > > > > > > details that we may change in the future.  Freezing these
>> into a
>> > > > public
>> > > > > > API
>> > > > > > > would really hold back the project.  For example, for really
>> > large
>> > > > > > > responses, we might eventually want to avoid materializing the
>> > > whole
>> > > > > > > response all at once.  It would make more sense to return it
>> in a
>> > > > > > streaming
>> > > > > > > fashion.  But if we need to support this API forever, we
>> can't do
>> > > > that.
>> > > > > > >
>> > > > > > > I think it's fair to say that this is, at best, half a
>> solution
>> > to
>> > > > the
>> > > > > > > problem of tracing requests.  Users still need to write the
>> > plugin
>> > > > code
>> > > > > > and
>> > > > > > > arrange for it to be on their classpath to make this work.  I
>> > think
>> > > > the
>> > > > > > > alternative here is not client-side instrumentation, but
>> simply
>> > > > making
>> > > > > > the
>> > > > > > > change to the broker without using a plugin interface.
>> > > > > > >
>> > > > > > > If a public interface is absolutely necessary here we should
>> > expose
>> > > > > only
>> > > > > > > things like the API key, client ID, time, etc. that don't
>> > constrain
>> > > > the
>> > > > > > > implementation a lot in the future.  I think we should also
>> use
>> > > java
>> > > > > here
>> > > > > > > to avoid the compatibility issues we have had with Scala APIs
>> in
>> > > the
>> > > > > > past.
>> > > > > > >
>> > > > > > > best,
>> > > > > > > Colin
>> > > > > > >
>> > > > > > >
>> > > > > > > On Thu, Nov 8, 2018, at 11:34, radai wrote:
>> > > > > > > > another downside to client instrumentation (beyond the
>> number
>> > of
>> > > > > > > > client codebases one would need to cover) is that in a large
>> > > > > > > > environments you'll have a very long tail of applications
>> using
>> > > > older
>> > > > > > > > clients to upgrade - it would be a long and disruptive
>> process
>> > > (as
>> > > > > > > > opposed to updating broker-side instrumentation)
>> > > > > > > > On Thu, Nov 8, 2018 at 11:04 AM Peter M. Elias <
>> > > > > petermel...@gmail.com>
>> > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > I know we have a lot of use cases for this type of
>> > > functionality
>> > > > at
>> > > > > > my
>> > > > > > > > > enterprise deployment. I think it's helpful for
>> maintaining
>> > > > > > > reliability of
>> > > > > > > > > the cluster especially and identifying clients that are
>> not
>> > > > > properly
>> > > > > > > tuned
>> > > > > > > > > and therefore applying excessive load to the brokers.
>> > > > Additionally,
>> > > > > > > there
>> > > > > > > > > is a bit of a dark spot without something like as
>> currently.
>> > > For
>> > > > > > > example,
>> > > > > > > > > if a client is not using a consumer group, there is no
>> direct
>> > > way
>> > > > > to
>> > > > > > > query
>> > > > > > > > > the state of the consumer without looking at raw network
>> > > > > connections
>> > > > > > to
>> > > > > > > > > determine the extent of the traffic generated by that
>> > > particular
>> > > > > > > consumer.
>> > > > > > > > >
>> > > > > > > > > While client instrumentation can certainly help with this
>> > > > > currently,
>> > > > > > > given
>> > > > > > > > > that Kafka is intended to be a shared service across a
>> > > > potentially
>> > > > > > very
>> > > > > > > > > large surface area of clients, central observation of
>> client
>> > > > > activity
>> > > > > > > is in
>> > > > > > > > > my opinion an essential feature.
>> > > > > > > > >
>> > > > > > > > > Peter
>> > > > > > > > >
>> > > > > > > > > On Thu, Nov 8, 2018 at 12:13 PM radai <
>> > > > radai.rosenbl...@gmail.com>
>> > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > bump.
>> > > > > > > > > >
>> > > > > > > > > > I think the proposed API (Observer) is useful for any
>> sort
>> > of
>> > > > > > > > > > multi-tenant environment for chargeback and reporting
>> > > purposes.
>> > > > > > > > > >
>> > > > > > > > > > if no one wants to comment, can we initiate a vote?
>> > > > > > > > > > On Mon, Nov 5, 2018 at 6:31 PM Lincong Li <
>> > > > > andrewlinc...@gmail.com
>> > > > > > >
>> > > > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > Hi everyone. Here
>> > > > > > > > > > > <
>> > > > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-388%3A+Add+observer+interface+to+record+request+and+response
>> > > > > > > > > > >
>> > > > > > > > > > > is
>> > > > > > > > > > > my KIP. Any feedback is appreciated.
>> > > > > > > > > > >
>> > > > > > > > > > > Thanks,
>> > > > > > > > > > > Lincong Li
>> > > > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > -Regards,
>> > > > > Mayuresh R. Gharat
>> > > > > (862) 250-7125
>> > > > >
>> > > >
>> > >
>> >
>>
>

Reply via email to