Re: [DISCUSS] KIP-349 Priorities for Source Topics

Jan Filipiak Thu, 06 Sep 2018 05:25:01 -0700


On 05.09.2018 17:18, Colin McCabe wrote:

Hi all,

I agree that DISCUSS is more appropriate than VOTE at this point, since I don't 
remember the last discussion coming to a definite conclusion.

I guess my concern is that this will add complexity and memory consumption on 
the server side.  In the case of incremental fetch requests, we will have to 
track at least two extra bytes per partition, to know what the priority of each 
partition is within each active fetch session.

It would be nice to hear more about the use-cases for this feature.  I think 
Gwen asked about this earlier, and I don't remember reading a response.  The 
fact that we're now talking about Samza interfaces is a bit of a red flag.  
After all, Samza didn't need partition priorities to do what it did.  You can 
do a lot with muting partitions and using appropriate threading in your code.

to show a usecase, I linked 353, especially since the threading model ispretty fixed there.

No clue why Samza should be a red flag. They handle purely on theconsumer side. Which I think is reasonable. I would not try to implementany broker side support for this, if I were todo it. Just don't supportincremental fetch then.In the end if you have broker side support, you would need to ship thelogic of the message chooser to the broker. I don't think that willallow for the flexibility I had in mind on purely consumer basedimplementation.


For example, you can hand data from a partition off to a work queue with a 
fixed size, which is handled by a separate service thread.  If the queue gets 
full, you can mute the partition until some of the buffered data is processed.  
Kafka Streams uses a similar approach to avoid reading partition data that 
isn't immediately needed.

There might be some use-cases that need priorities eventually, but I'm 
concerned that we're jumping the gun by trying to implement this before we know 
what they are.

best,
Colin


On Wed, Sep 5, 2018, at 01:06, Jan Filipiak wrote:

On 05.09.2018 02:38, n...@afshartous.com wrote:

On Sep 4, 2018, at 4:20 PM, Jan Filipiak <jan.filip...@trivago.com> wrote:

what I meant is litterally this interface:

https://samza.apache.org/learn/documentation/0.7.0/api/javadocs/org/apache/samza/system/chooser/MessageChooser.html
 
<https://samza.apache.org/learn/documentation/0.7.0/api/javadocs/org/apache/samza/system/chooser/MessageChooser.html>

Hi Jan,

Thanks for the reply and I have a few questions.  This Samza doc

    https://samza.apache.org/learn/documentation/0.14/container/streams.html 
<https://samza.apache.org/learn/documentation/0.14/container/streams.html>

indicates that the chooser is set via configuration.  Are you suggesting adding 
a new configuration for Kafka ?  Seems like we could also have a method on 
KafkaConsumer

      public void register(MessageChooser messageChooser)

I don't have strong opinions regarding this. I like configs, i also
don't think it would be a problem to have both.

to make it more dynamic.

Also, the Samza MessageChooser interface has method

    /* Notify the chooser that a new envelope is available for a processing. */
void update(IncomingMessageEnvelope envelope)

and I’m wondering how this method would be translated to Kafka API.  In 
particular what corresponds to IncomingMessageEnvelope.

I think Samza uses the envelop abstraction as they support other sources
besides kafka aswell. They are more
on the spark end of things when it comes to different input types. I
don't have strong opinions but it feels like
we wouldn't need such a thing in the kafka consumer but just use a
regular ConsumerRecord or so.

Best,
--
        Nick

Re: [DISCUSS] KIP-349 Priorities for Source Topics

Reply via email to