[jira] [Resolved] (KAFKA-7363) How num.stream.threads in streaming application influence memory consumption?

2018-09-02 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-7363.

Resolution: Invalid

Please ask questions on the mailing list. We use Jira to track bugs only. Thanks.

> How num.stream.threads in streaming application influence memory consumption?
> -
>
> Key: KAFKA-7363
> URL: https://issues.apache.org/jira/browse/KAFKA-7363
> Project: Kafka
>  Issue Type: Task
>  Components: streams
>Reporter: Seweryn Habdank-Wojewodzki
>Priority: Major
>
> Dears,
> How does the option _num.stream.threads_ in a streaming application influence memory 
> consumption?
> I see that by increasing num.stream.threads my application needs more memory.
> This is obvious, but it is not obvious how much I need to give it. The trial-and-error 
> method does not work, as it seems to be highly dependent on the forced 
> throughput.
> I mean: the higher the load, the more memory is needed.
> Thanks for help and regards,
> Seweryn.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] KIP-365: Materialized, Serialized, Joined, Consumed and Produced with implicit Serde

2018-09-02 Thread Matthias J. Sax
+1 (binding)

On 9/1/18 2:40 PM, Guozhang Wang wrote:
> +1 (binding).
> 
> On Mon, Aug 27, 2018 at 5:20 PM, Dongjin Lee  wrote:
> 
>> +1 (non-binding)
>>
>> On Tue, Aug 28, 2018 at 8:53 AM Bill Bejeck  wrote:
>>
>>> +1
>>>
>>> -Bill
>>>
>>> On Mon, Aug 27, 2018 at 3:24 PM Ted Yu  wrote:
>>>
 +1

 On Mon, Aug 27, 2018 at 12:18 PM John Roesler 
>> wrote:

> +1 (non-binding)
>
> On Sat, Aug 25, 2018 at 1:16 PM Joan Goyeau  wrote:
>
>> Hi,
>>
>> We want to make sure that we always have a serde for all
>>> Materialized,
>> Serialized, Joined, Consumed and Produced.
>> For that we can make use of the implicit parameters in Scala.
>>
>> KIP:
>>
>>
>

>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> 365%3A+Materialized%2C+Serialized%2C+Joined%2C+Consumed+and+Produced+with+
>> implicit+Serde
>>
>> Github PR: https://github.com/apache/kafka/pull/5551
>>
>> Please make your votes.
>> Thanks
>>
>

>>>
>>> --
>>> *Dongjin Lee*
>>>
>>> *A hitchhiker in the mathematical world.*
>>>
>>> *github:  github.com/dongjinleekr
>>> linkedin: kr.linkedin.com/in/
>> dongjinleekr
>>> slideshare: www.slideshare.net/
>> dongjinleekr
>>> *
>>>
>>
> 
> 
> 



signature.asc
Description: OpenPGP digital signature
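
A minimal sketch of the kind of code the KIP enables, assuming the
org.apache.kafka.streams.scala module's ImplicitConversions and Serdes imports
(topic names are placeholders):

import org.apache.kafka.streams.scala._
import org.apache.kafka.streams.scala.kstream._
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.Serdes._

object ImplicitSerdeSketch {
  // With implicit serdes in scope, Consumed, Serialized, Materialized and
  // Produced are summoned automatically, so no serde can be forgotten at
  // compile time.
  val builder = new StreamsBuilder()
  val lines: KStream[String, String] = builder.stream[String, String]("input-topic")

  lines
    .flatMapValues(_.toLowerCase.split("\\W+"))
    .groupBy((_, word) => word)   // implicit Serialized[String, String]
    .count()                      // implicit Materialized for the count store
    .toStream
    .to("word-counts")            // implicit Produced[String, Long]
}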


Re: [VOTE] KIP-363: Make FunctionConversions private

2018-09-02 Thread Matthias J. Sax
Why do we not deprecate the class? Seems like a breaking change to me --
at least technically. Are we 100% sure that nobody is using it and that we
won't break someone's application? (I don't think we can ever be 100% sure.)

I would prefer to deprecate the class and make it private (i.e., remove it
from the public API) in the 3.0 release. Or do you see a major disadvantage in
following this pattern that is usually applied?


-Matthias

On 8/31/18 12:27 PM, Joan Goyeau wrote:
> Ah OK, I didn't know we need multiple binding votes.
> Should I send a new email again with the updated KIP-366 title?
> 
> Thanks
> 
> On Wed, 29 Aug 2018 at 21:14 John Roesler  wrote:
> 
>> Hey Joan,
>>
>> It looks like you've updated the KIP to "Accepted", but I only count one
>> binding vote (Guozhang). Ted, Attila, Bill, and myself are all non-binding
>> votes.
>>
>> For reference, these are all folks who hold binding votes:
>> https://kafka.apache.org/committers . Obviously, they don't all take note
>> of every KIP, so we sometimes have to keep pinging the thread with
>> reminders that we are waiting on binding votes.
>>
>> Also, people muddied the water by responding "+1" to this thread, but it's
>> customary to start a new thread entitled "[VOTE] KIP-366: Make
>> FunctionConversions private" to let people know when the voting has
>> actually started.
>>
>> Thanks,
>> -John
>>
>> On Mon, Aug 27, 2018 at 3:44 PM Joan Goyeau  wrote:
>>
>>> John, no this is for internal use only.
>>> In fact I expect this object to go away with the drop of Scala 2.11 since
>> in
>>> Scala 2.12 we have support for SAM.
>>>
>>> Thanks
>>>
>>> On Mon, 27 Aug 2018 at 15:41 John Roesler  wrote:
>>>
 Hey Joan,

 I was thinking more about this... Do any of the conversions in
 FunctionConversions convert to types that are used in the public Scala
 interface?

 If you've already checked, then carry on.

 Otherwise, we should leave public any that might be in use.

 Thanks,
 -John

 On Sat, Aug 25, 2018 at 12:19 PM Joan Goyeau  wrote:

> Thanks Attila, it's done.
>
> On Sat, 25 Aug 2018 at 02:57 Ted Yu  wrote:
>
>> +1
>>
>> On Fri, Aug 24, 2018 at 5:17 PM Attila Sasvári <
>> asasv...@apache.org>
>> wrote:
>>
>>> Hi there,
>>>
>>> There is a conflicting KIP with the same number, see
>>>
>>>
>>
>

>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-363%3A+Allow+performance+tools+to+print+final+results+to+output+file
>>>
>>> Its discussion was started earlier, on August 23
>>> https://www.mail-archive.com/dev@kafka.apache.org/msg91132.html
>>> and
> KIP
>>> page already includes it:
>>>
>>>
>>
>

>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>>>
>>> Please update KIP number to resolve the conflict.
>>>
>>> Apart from this, +1 (non-binding) and thanks for the KIP!
>>>
>>> Regards,
>>> - Attila
>>>
>>>
>>> Guozhang Wang  (időpont: 2018. aug. 24., P,
 20:26)
>> ezt
>>> írta:
>>>
 +1 from me (binding).

 On Fri, Aug 24, 2018 at 11:24 AM, Joan Goyeau >>
> wrote:

> Hi,
>
> As pointed out in this comment #5539 (comment)
> <
>>> https://github.com/apache/kafka/pull/5539#discussion_r212380648
>
>>> "This
> class was already defaulted to public visibility, and we
>> can't
>> retract
>>> it
> now, without a KIP.", the object FunctionConversions is only
>> of
>>> internal
> use and therefore should be private to the lib only so that
>> we
 can
> do
> changes without going through KIP like this one.
>
> Please make your vote.
>
> On Fri, 24 Aug 2018 at 19:14 John Roesler >>
> wrote:
>
>> I'm also in favor of this. I don't think it's controversial
> either.
> Should
>> we just move to a vote?
>>
>> On Thu, Aug 23, 2018 at 7:01 PM Guozhang Wang <
> wangg...@gmail.com>
> wrote:
>>
>>> +1.
>>>
>>> On Thu, Aug 23, 2018 at 12:47 PM, Ted Yu <
 yuzhih...@gmail.com>
 wrote:
>>>
 +1

 In the Motivation section, you can quote the comment
>> from
> pull
> request
>> so
 that reader doesn't have to click through.

 Cheers

 On Thu, Aug 23, 2018 at 12:13 PM Joan Goyeau <
> j...@goyeau.com>
> wrote:

> Hi,
>
> As pointed out in this comment #5539 (comment)
> <
>>> https://github.com/apache/kafka/pull/5539#discussion_r212380648
>
>> the
> object FunctionConversions is only of internal use
>> and
>>> therefore
>> should
 be
>>>
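
As background for the SAM point above, a rough sketch with a hypothetical
single-abstract-method trait (not the actual Kafka Streams interfaces) of why
an adapter object like FunctionConversions is no longer needed once Scala
2.12's SAM conversion is available:

import scala.language.implicitConversions

object SamConversionSketch {
  // Hypothetical stand-in for a Java-style functional interface from the
  // Streams Java API (e.g. a two-argument joiner).
  trait ValueJoinerLike[V1, V2, VR] { def apply(v1: V1, v2: V2): VR }

  // Scala 2.12+: a plain lambda converts to the SAM type directly.
  val joiner: ValueJoinerLike[String, Int, String] = (v1, v2) => s"$v1:$v2"

  // Scala 2.11 needs an adapter; FunctionConversions provides implicit
  // conversions of roughly this shape, which is why it can go away once
  // 2.11 support is dropped.
  implicit def functionToJoiner[V1, V2, VR](f: (V1, V2) => VR): ValueJoinerLike[V1, V2, VR] =
    new ValueJoinerLike[V1, V2, VR] { def apply(v1: V1, v2: V2): VR = f(v1, v2) }
}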

Re: [VOTE] KIP-359: Verify leader epoch in produce requests

2018-09-02 Thread Matthias J. Sax
+1 (binding)

On 8/30/18 11:30 PM, Dongjin Lee wrote:
> Thanks for the KIP. I'm +1 (non-binding).
> 
> Best,
> Dongjin
> 
> On Fri, Aug 31, 2018 at 9:25 AM Dong Lin  wrote:
> 
>> Thanks for the KIP!
>>
>> +1 (binding)
>>
>> On Thu, Aug 30, 2018 at 3:56 PM, Jason Gustafson 
>> wrote:
>>
>>> Hi All,
>>>
>>> I'd like to start the vote on KIP-359: https://cwiki.apache.org/
>>> confluence/display/KAFKA/KIP-359%3A+Verify+leader+epoch+in+
>>> produce+requests.
>>> Thanks in advance for reviewing.
>>>
>>> -Jason
>>>
>>
> 
> 



signature.asc
Description: OpenPGP digital signature


Re: [DISCUSS] KIP-369 Alternative Partitioner to Support "Always Round-Robin" Selection

2018-09-02 Thread M. Manna
Thanks for all your comments and taking it easy on me for my first KIP :)

I am trying to check whether it's okay for us to start a vote on this. As per
a recent comment, I'll change the name to RoundRobinPartitioner.

I'll need to put some effort into writing Scala tests etc., since I'm a novice
with Scala.

Please let me know your thoughts, and I'll update the status accordingly
(and start working on the JIRA once it's approved).

Regards,

On Fri, 31 Aug 2018, 10:22 M. Manna,  wrote:

> Yes I’m more than happy to change it to a more appropriate name.
>
> The issue with RoundRobinPartitioner is that the DefaultPartitioner already
> has round-robin behaviour associated with it. But if the community doesn’t mind the name,
> I don’t either.
>
> Thanks for reading the KIP btw.
>
> Regards,
>
> On Fri, 31 Aug 2018 at 05:47, Magesh Nandakumar 
> wrote:
>
>> +1 for this. The only small suggestion would be to possibly call this
>> RoundRobinPartitioner, which makes the intent obvious.
>>
>> On Thu, Aug 30, 2018 at 5:31 PM Stephen Powis 
>> wrote:
>>
>> > Neat, this would be super helpful! I submitted this ages ago:
>> > https://issues.apache.org/jira/browse/KAFKA-
>> >
>> > On Fri, Aug 31, 2018 at 5:04 AM, Satish Duggana <
>> satish.dugg...@gmail.com>
>> > wrote:
>> >
>> > > +including both dev and user mailing lists.
>> > >
>> > > Hi,
>> > > Thanks for the KIP.
>> > >
>> > > "* For us, the message keys represent some metadata which we use to
>> > either
>> > > ignore messages (if a loop-back to the sender), or log some
>> > information.*"
>> > >
>> > > Above statement was mentioned in the KIP about how key value is used.
>> I
>> > > guess the topic is not configured to be compacted and you do not want
>> to
>> > > have partitioning based on that key. IMHO, it qualifies more as a
>> header
>> > > than a key. What do you think about building records with a specific
>> > header
> > > > and having consumers execute the logic of whether to process or ignore the
> > > > messages based on that header value?
>> > >
>> > > Thanks,
>> > > Satish.
>> > >
>> > >
>> > > On Fri, Aug 31, 2018 at 1:32 AM, Satish Duggana <
>> > satish.dugg...@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi,
>> > > > Thanks for the KIP.
>> > > >
>> > > > "* For us, the message keys represent some metadata which we use to
>> > > > either ignore messages (if a loop-back to the sender), or log some
>> > > > information.*"
>> > > >
>> > > > Above statement was mentioned in the KIP about how key value is
>> used. I
>> > > > guess the topic is not configured to be compacted and you do not
>> want
>> > to
>> > > > have partitioning based on that key. IMHO, it qualifies more as a
>> > header
>> > > > than a key. What do you think about building records with a specific
>> > > header
> > > > > and having consumers execute the logic of whether to process or ignore the
> > > > > messages based on that header value?
>> > > >
>> > > > Thanks,
>> > > > Satish.
>> > > >
>> > > >
>> > > > On Fri, Aug 31, 2018 at 12:02 AM, M. Manna 
>> wrote:
>> > > >
>> > > >> Hi Harsha,
>> > > >>
>> > > >> thanks for reading the KIP.
>> > > >>
>> > > >> The intent is to use the DefaultPartitioner logic for round-robin
>> > > >> selection
>> > > >> of partition regardless of key type.
>> > > >>
> > > >> Implementing the Partitioner interface isn’t the issue here; you would
> > have
> > > > to
> > > >> do that anyway if you are implementing your own. But we also want
> > > this
> > > > to
> > > >> be part of the formal codebase.
>> > > >>
>> > > >> Regards,
>> > > >>
>> > > >> On Thu, 30 Aug 2018 at 16:58, Harsha  wrote:
>> > > >>
>> > > >> > Hi,
>> > > >> >   Thanks for the KIP. I am trying to understand the intent of
>> > the
> > > >> > KIP.  Can the use case you specified not be achieved by
>> > implementing
>> > > >> the
>> > > >> > Partitioner interface here?
>> > > >> > https://github.com/apache/kafka/blob/trunk/clients/src/main/
>> > > >> java/org/apache/kafka/clients/producer/Partitioner.java#L28
>> > > >> > .
>> > > >> > Use your custom partitioner to be configured in your producer
>> > clients.
>> > > >> >
>> > > >> > Thanks,
>> > > >> > Harsha
>> > > >> >
>> > > >> > On Thu, Aug 30, 2018, at 1:45 AM, M. Manna wrote:
>> > > >> > > Hello,
>> > > >> > >
>> > > >> > > I opened a very simple KIP and there exists a JIRA for it.
>> > > >> > >
>> > > >> > > I would be grateful if any comments are available for action.
>> > > >> > >
>> > > >> > > Regards,
>> > > >> >
>> > > >>
>> > > >
>> > > >
>> > >
>> >
>>
>
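
For concreteness, a rough Scala sketch of the kind of partitioner the KIP
describes: the key is ignored entirely and partitions are chosen round-robin.
The class name and the fallback to all partitions are illustrative, not the
final proposal:

import java.util.concurrent.atomic.AtomicInteger
import org.apache.kafka.clients.producer.Partitioner
import org.apache.kafka.common.Cluster

class AlwaysRoundRobinPartitioner extends Partitioner {
  private val counter = new AtomicInteger(0)

  override def partition(topic: String, key: AnyRef, keyBytes: Array[Byte],
                         value: AnyRef, valueBytes: Array[Byte], cluster: Cluster): Int = {
    // Ignore the key: cycle over the available partitions, falling back to
    // all partitions of the topic if none are currently reported available.
    val available = cluster.availablePartitionsForTopic(topic)
    val partitions = if (available.isEmpty) cluster.partitionsForTopic(topic) else available
    val index = (counter.getAndIncrement() & Int.MaxValue) % partitions.size
    partitions.get(index).partition()
  }

  override def close(): Unit = ()
  override def configure(configs: java.util.Map[String, _]): Unit = ()
}

Such a class would then be enabled per producer via the existing
partitioner.class configuration (ProducerConfig.PARTITIONER_CLASS_CONFIG).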


Re: [VOTE] KIP-359: Verify leader epoch in produce requests

2018-09-02 Thread Satish Duggana
Thanks for the KIP,
+1 (non-binding)

On Sun, Sep 2, 2018 at 7:59 PM, Matthias J. Sax 
wrote:

> +1 (binding)
>
> On 8/30/18 11:30 PM, Dongjin Lee wrote:
> > Thanks for the KIP. I'm +1 (non-binding).
> >
> > Best,
> > Dongjin
> >
> > On Fri, Aug 31, 2018 at 9:25 AM Dong Lin  wrote:
> >
> >> Thanks for the KIP!
> >>
> >> +1 (binding)
> >>
> >> On Thu, Aug 30, 2018 at 3:56 PM, Jason Gustafson 
> >> wrote:
> >>
> >>> Hi All,
> >>>
> >>> I'd like to start the vote on KIP-359: https://cwiki.apache.org/
> >>> confluence/display/KAFKA/KIP-359%3A+Verify+leader+epoch+in+
> >>> produce+requests.
> >>> Thanks in advance for reviewing.
> >>>
> >>> -Jason
> >>>
> >>
> >
> >
>
>


KAFKA-1194

2018-09-02 Thread Stephane Maarek
Hi,

I've seen that Dong has done some work on
https://issues.apache.org/jira/browse/KAFKA-7278, which, according to the
comments, could have possibly fixed
https://issues.apache.org/jira/browse/KAFKA-1194.

I tested and it is unfortunately not the case...
I have posted in KAFKA-1194 a way to reproduce the issue in a deterministic
way:
===
C:\kafka_2.11-2.1.0-SNAPSHOT>bin\windows\kafka-server-start.bat
config\server.properties

C:\kafka_2.11-2.1.0-SNAPSHOT>kafka-topics.bat --zookeeper 127.0.0.1:2181
--topic second_topic --create --partitions 3 --replication-factor 1

C:\kafka_2.11-2.1.0-SNAPSHOT>kafka-console-producer.bat --broker-list
127.0.0.1:9092 --topic second_topic
>hello
>world
>hello
>Terminate batch job (Y/N)? Y

C:\kafka_2.11-2.1.0-SNAPSHOT>kafka-topics.bat --zookeeper 127.0.0.1:2181
--topic second_topic --delete



I usually wouldn't push for Windows-only issues to be fixed, but this
one is triggered by basic CLI usage and causes a full broker failure
that can't be recovered from.
It completely destroys the faith of Kafka newcomers learning on
Windows (who make up 50% of the users in my course).

I'm happy to help out in any way I can to test any patch.

Thanks
Stephane


Re: [DISCUSS] KIP-347: Enable batching in FindCoordinatorRequest

2018-09-02 Thread Yishun Guan
Hi Guozhang,

Yes, you are right. I didn't notice T build() is bounded to <T extends AbstractRequest>.
I was originally thinking T could be an AbstractRequest or List<>
but I was wrong. Now the return type has to change from T build() to
List<T> build()
where <T extends AbstractRequest>. As you mentioned,
this requires a change for all the requests, and probably needs
a new KIP too, what do you think? I will update this KIP accordingly first.

However, do you see other benefits of changing the return type for build()?
The original improvement that we want is this:
https://issues.apache.org/jira/browse/KAFKA-6788.
It seems like we have to make a lot of structural changes for this
small improvement.
I think changing the return type might benefit the project in the future,
but I don't know the project well enough to say so. I would love to keep
working on this,
but do you think all these changes are worth it for this story,
and if not, is there another way out?

Thanks,
Yishun
On Sat, Sep 1, 2018 at 11:04 AM Guozhang Wang  wrote:
>
> Hello Yishun,
>
> Thanks for the updated KIP. I think option 1), i.e. return multiple
> requests from build() call is okay. Just to clarify: you are going to
> change `AbstractRequest#build(short version)` signature, and hence all
> requests need to be updated accordingly, but only FindCoordinator may
> return multiple requests in the list, while all others will return
> singleton list, right?
>
>
> Guozhang
>
>
> On Fri, Aug 31, 2018 at 10:51 AM, Yishun Guan  wrote:
>
> > @Guozhang Wang Could you review this again when you have time? Thanks!
> > -Yishun
> > On Wed, Aug 29, 2018 at 11:57 AM Yishun Guan  wrote:
> > >
> > > Hi, because I have made some significant changes on this design, so I
> > > want to reopen the discussion on this KIP:
> > > https://cwiki.apache.org/confluence/x/CgZPBQ
> > >
> > > Thanks,
> > > Yishun
> > > On Thu, Aug 16, 2018 at 5:06 PM Yishun Guan  wrote:
> > > >
> > > > I see! Thanks!
> > > >
> > > > On Thu, Aug 16, 2018, 4:35 PM Guozhang Wang 
> > wrote:
> > > >>
> > > >> It is not implemented, but should not be hard to do so (and again you
> > do
> > > >> NOT have to do that in this KIP, I'm bringing this up so that you can
> > help
> > > >> thinking about the process).
> > > >>
> > > >> Quoting from Colin's comment:
> > > >>
> > > >> "
> > > >> The pattern is that you would try to send a request for more than one
> > > >> group, and then you would get an UnsupportedVersionException (nothing
> > would
> > > >> be sent on the wire, this is purely internal to the code).
> > > >> Then your code would handle the UVE by creating requests with an older
> > > >> version that only had one group each.
> > > >> "
> > > >>
> > > >>
> > > >> Guozhang
> > > >>
> > > >>
> > > >> On Wed, Aug 15, 2018 at 4:44 PM, Yishun Guan 
> > wrote:
> > > >>
> > > >> > Hi, I am looking into AdminClient.scala and AdminClient.java, and
> > also
> > > >> > looking into ApiVersionRequest.java and ApiVersionResponse.java,
> > but I
> > > >> > don't see anywhere that contains the logic of the one-to-one mapping from
> > version
> > > >> > to version; am I looking at the right place?
> > > >> >
> > > >> > On Mon, Aug 13, 2018 at 1:30 PM Guozhang Wang 
> > wrote:
> > > >> >
> > > >> > > Regarding 3): Today we do not have this logic with the existing
> > client,
> > > >> > > because we defer the decision about the version to use (we always
> > assume
> > > >> > that
> > > >> > > a new versioned request needs to be down-converted to a single old
> > > >> > > versioned request: i.e. a one-to-one mapping), but in principle,
> > we
> > > >> > should
> > > >> > > be able to modify the client to make it work.
> > > >> > >
> > > >> > > Again this is not necessarily need to be included in this KIP,
> > but I'd
> > > >> > > recommend you to look into AdminClient implementations around the
> > > >> > > ApiVersionRequest / Response and think about how that logic can be
> > > >> > modified
> > > >> > > in the follow-up PR of this KIP.
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > Guozhang
> > > >> > >
> > > >> > > On Mon, Aug 13, 2018 at 12:55 PM, Yishun Guan 
> > wrote:
> > > >> > >
> > > >> > > > @Guozhang, thank you so much!
> > > >> > > > 1. I agree, fixed.
> > > >> > > > 2. Added.
> > > >> > > > 3. I see, that is something that I hadn't thought about. How
> > does Kafka
> > > >> > > > handle other api's different version problem now? So we have a
> > specific
> > > >> > > > converter that converts a new version request to an old version
> > one for
> > > >> > > each
> > > >> > > > API (is this what the ApiVersionsRequest is supposed to do, does
> > it only
> > > >> > > > handle the detection or should it handle the conversion too)?
> > What will
> > > >> > > be
> > > >> > > > the consequence of not having such a transformer but the
> > version is
> > > >> > > > incompatible?
> > > >> > > >
> > > >> > > > Best,
> > > >> > > > Yishun
> > > >> > > >
> > > >> > > > On Sat, Aug 11, 2018 at 11:27 AM Guozhang Wang <
> > wangg...@gmail.com>
> > > >> > > wrote:
> > > >> > > >
> > > >> > > 
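
To make the quoted fallback pattern concrete, a rough Scala sketch; the send
functions are placeholders, and only UnsupportedVersionException is the real
client class:

import org.apache.kafka.common.errors.UnsupportedVersionException

object FindCoordinatorFallbackSketch {
  // sendBatched / sendSingle stand in for whatever the network layer would
  // do; the exception is raised internally before anything hits the wire.
  def findCoordinators(groups: List[String],
                       sendBatched: List[String] => Unit,
                       sendSingle: String => Unit): Unit =
    try {
      sendBatched(groups)            // newer protocol version: one request for all groups
    } catch {
      case _: UnsupportedVersionException =>
        groups.foreach(sendSingle)   // older version: one request per group
    }
}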

Re: KAFKA-1194

2018-09-02 Thread Ismael Juma
Hi Stephane,

Does https://github.com/apache/kafka/pull/4947/files fix it for you?

Ismael

On Sun, Sep 2, 2018 at 11:25 AM Stephane Maarek <
steph...@simplemachines.com.au> wrote:

> Hi,
>
> I've seen Dong has done some work on
> https://issues.apache.org/jira/browse/KAFKA-7278 which is said from the
> comments that it could have possibly fixed
> https://issues.apache.org/jira/browse/KAFKA-1194.
>
> I tested and it is unfortunately not the case...
> I have posted in KAFKA-1194 a way to reproduce the issue in a deterministic
> way:
> ===
> C:\kafka_2.11-2.1.0-SNAPSHOT>bin\windows\kafka-server-start.bat
> config\server.properties
>
> C:\kafka_2.11-2.1.0-SNAPSHOT>kafka-topics.bat --zookeeper 127.0.0.1:2181
> --topic second_topic --create --partitions 3 --replication-factor 1
>
> C:\kafka_2.11-2.1.0-SNAPSHOT>kafka-console-producer.bat --broker-list
> 127.0.0.1:9092 --topic second_topic
> >hello
> >world
> >hello
> >Terminate batch job (Y/N)? Y
>
> C:\kafka_2.11-2.1.0-SNAPSHOT>kafka-topics.bat --zookeeper 127.0.0.1:2181
> --topic second_topic --delete
> 
>
>
> I usually wouldn't push for any issues for windows to be fixed, but this
> one is triggered from basic CLI usage and triggers a full broker failure
> that can't be recovered.
> It actually completely destroys the faith of Kafka newcomers learning using
> Windows (which are 50% of the users I have in my course).
>
> I'm happy to help out in any way I can to test any patch.
>
> Thanks
> Stephane
>


Re: [DISCUSS] KIP-354 Time-based log compaction policy

2018-09-02 Thread Brett Rann
+1 (non-binding) from me on the interface. I'd like to see someone familiar
with
the code comment on the approach, and note there are a couple of different
approaches: what's documented in the KIP, and what Xiaohe Dong was working
on
here:
https://github.com/dongxiaohe/kafka/tree/dongxiaohe/log-cleaner-compaction-max-lifetime-2.0

If you have working code already, Xiongqi Wu, could you share a PR? I'd be
happy
to start testing.

On Tue, Aug 28, 2018 at 5:57 AM xiongqi wu  wrote:

> Hi All,
>
> Do you have any additional comments on this KIP?
>
>
> On Thu, Aug 16, 2018 at 9:17 PM, xiongqi wu  wrote:
>
> > on 2)
> > The offsetmap is built starting from dirty segment.
> > The compaction starts from the beginning of the log partition. That's how
> > it ensure the deletion of tomb keys.
> > I will double check tomorrow.
> >
> > Xiongqi (Wesley) Wu
> >
> >
> > On Thu, Aug 16, 2018 at 6:46 PM Brett Rann 
> > wrote:
> >
> >> To just clarify a bit on 1. whether there's an external storage/DB isn't
> >> relevant here.
> >> Compacted topics allow a tombstone record to be sent (a null value for a
> >> key) which
> >> currently will result in old values for that key being deleted if some
> >> conditions are met.
> >> There are existing controls to make sure the old values will stay around
> >> for a minimum
> >> time at least, but no dedicated control to ensure the tombstone will
> >> delete
> >> within a
> >> maximum time.
> >>
> >> One popular reason that maximum time for deletion is desirable right now
> >> is
> >> GDPR with
> >> PII. But we're not proposing any GDPR awareness in kafka, just being
> able
> >> to guarantee
> >> a max time where a tombstoned key will be removed from the compacted
> >> topic.
> >>
> >> on 2)
> >> Huh, I thought it kept track of the first dirty segment and didn't
> >> recompact older "clean" ones.
> >> But I didn't look at code or test for that.
> >>
> >> On Fri, Aug 17, 2018 at 10:57 AM xiongqi wu 
> wrote:
> >>
> >> > 1. The owner of the data (in this sense, Kafka is not the owner of the data)
> >> > should keep track of the lifecycle of the data in some external
> storage/DB.
> >> > The owner determines when to delete the data and sends the delete
> >> request to
> >> > Kafka. Kafka doesn't know about the content of the data; it only provides a
> >> means
> >> > for deletion.
> >> >
> >> > 2. Each time compaction runs, it will start from the first segment (no
> >> > matter whether it is compacted or not). The time estimation here is only
> used
> >> > to determine whether we should run compaction on this log partition.
> So
> >> we
> >> > only need to estimate uncompacted segments.
> >> >
> >> > On Thu, Aug 16, 2018 at 5:35 PM, Dong Lin 
> wrote:
> >> >
> >> > > Hey Xiongqi,
> >> > >
> >> > > Thanks for the update. I have two questions for the latest KIP.
> >> > >
> >> > > 1) The motivation section says that one use case is to delete PII
> >> > (Personal
> >> > > Identifiable information) data within 7 days while keeping non-PII
> >> > > indefinitely in compacted format. I suppose the use-case depends on
> >> the
> >> > > application to determine when to delete those PII data. Could you
> >> explain
> >> > > how can the application reliably determine the set of keys that should
> be
> >> > > deleted? Is the application required to always read messages from the topic
> >> after
> >> > > every restart and determine the keys to be deleted by looking at
> >> message
> >> > > timestamp, or is the application supposed to persist the key -> timestamp
> >> > > information in a separate persistent storage system?
> >> > >
> >> > > 2) It is mentioned in the KIP that "we only need to estimate
> earliest
> >> > > message timestamp for un-compacted log segments because the deletion
> >> > > requests that belong to compacted segments have already been
> >> processed".
> >> > > Not sure if it is correct. If a segment is compacted before user
> sends
> >> > > message to delete a key in this segment, it seems that we still need
> >> to
> >> > > ensure that the segment will be compacted again within the given
> time
> >> > after
> >> > > the deletion is requested, right?
> >> > >
> >> > > Thanks,
> >> > > Dong
> >> > >
> >> > > On Thu, Aug 16, 2018 at 10:27 AM, xiongqi wu 
> >> > wrote:
> >> > >
> >> > > > Hi Xiaohe,
> >> > > >
> >> > > > Quick note:
> >> > > > 1) Use minimum of segment.ms and max.compaction.lag.ms
> >> > > >
> >> > > > 2) I am not sure if I get your second question. First, we have
> >> jitter
> >> > > when
> >> > > > we roll the active segment. Second, on each compaction, we compact
> >> up to
> >> > > > what the offsetmap allows. Those will not lead to a perfect
> compaction
> >> > > storm
> >> > > > over time. In addition, I expect we are setting
> >> max.compaction.lag.ms
> >> > on
> >> > > > the order of days.
> >> > > >
> >> > > > 3) I don't have access to the confluent community slack for now. I
> >> am
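
As background for the tombstone discussion above: a "delete request" is simply
a record with a null value for the key to be purged, as in this minimal Scala
sketch (topic and key are placeholders; the KIP's proposed max.compaction.lag.ms
would bound how long such a tombstone can take to trigger removal):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object TombstoneSketch extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

  val producer = new KafkaProducer[String, String](props)
  // A null value marks earlier values of "user-42" for removal once the
  // compacted topic is cleaned.
  producer.send(new ProducerRecord[String, String]("pii-compacted-topic", "user-42", null))
  producer.flush()
  producer.close()
}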

[jira] [Created] (KAFKA-7373) GetOffsetShell doesn't work when SSL authentication is enabled

2018-09-02 Thread Andy Bryant (JIRA)
Andy Bryant created KAFKA-7373:
--

 Summary: GetOffsetShell doesn't work when SSL authentication is 
enabled
 Key: KAFKA-7373
 URL: https://issues.apache.org/jira/browse/KAFKA-7373
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Andy Bryant


GetOffsetShell doesn't provide a mechanism to pass additional configuration 
to the underlying KafkaConsumer as the `ConsoleConsumer` does. Passing SSL 
config as system properties doesn't propagate to the consumer either.
{code:java}
10:47 $ ./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list 
${BROKER_LIST} --topic cld-dev-sor-crods-crodsdba_contact

Exception in thread "main" org.apache.kafka.common.errors.TimeoutException: 
Timeout expired while fetching topic metadata{code}
Editing {{GetOffsetShell.scala}} to include the SSL properties in the 
KafkaConsumer configuration resolved the issue.

Providing {{consumer-property}} and {{consumer-config}} configuration options 
for {{kafka-run-class.sh}} or creating a separate run script for offsets and 
using these properties in {{GetOffsetShell.scala}} seems like a good solution.
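
For illustration, a rough sketch of the suggested {{consumer-config}} handling 
(a hypothetical helper, not the actual GetOffsetShell code): load the 
user-supplied properties file, e.g. one containing {{security.protocol=SSL}} and 
the {{ssl.*}} settings, and overlay the tool's own settings before constructing 
the KafkaConsumer.
{code:scala}
import java.io.FileInputStream
import java.util.Properties

object ConsumerConfigSketch {
  // Hypothetical helper: merge a user-supplied properties file (SSL settings
  // etc.) with the defaults the tool needs, then pass the result to the
  // KafkaConsumer constructor.
  def buildConsumerProps(brokerList: String, consumerConfigFile: Option[String]): Properties = {
    val props = new Properties()
    consumerConfigFile.foreach { path =>
      val in = new FileInputStream(path)
      try props.load(in) finally in.close()
    }
    props.put("bootstrap.servers", brokerList)
    props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")
    props
  }
}
{code}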



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-347: Enable batching in FindCoordinatorRequest

2018-09-02 Thread Guozhang Wang
Hi Yishun,

I was actually not suggesting we should immediately make such a dramatic
change on the AbstractRequest APIs which would affect all request types,
just clarifying whether it is your intent or not, since your code snippet in the
KIP has "@Override"  :)

I think an alternative way is to add such a function for
FindCoordinator only, i.e. besides the overridden `public
FindCoordinatorRequest build(short version)` we can have one more function
(note the function name needs to be different since Java type erasure causes
it to not be able to differentiate these two otherwise, but we can consider a
better name: buildMulti is only for illustration)

public List<FindCoordinatorRequest> buildMulti(short version)


It does mean that we now need to special-handle
FindCoordinatorRequestBuilder in all callers from other requests, which is
also a bit "ugly and dirty", but the change scope may be smaller. General
changes on the AbstractRequestBuilder could be delayed until we realize
this is a common usage for some other requests in their newer versions as
well.


Guozhang


On Sun, Sep 2, 2018 at 4:10 PM, Yishun Guan  wrote:

> Hi Guozhang,
>
> Yes, you are right. I didn't notice T build() is bounded to <T extends AbstractRequest>.
> I was originally thinking T could be an AbstractRequest or List<>
> but I was wrong. Now the return type has to change from T build() to
> List<T> build()
> where <T extends AbstractRequest>. As you mentioned,
> this requires a change for all the requests, and probably needs
> a new KIP too, what do you think? I will update this KIP accordingly first.
>
> However, do you see other benefits of changing the return type for build()?
> The original improvement that we want is this:
> https://issues.apache.org/jira/browse/KAFKA-6788.
> It seems like we have to make a lot of structural changes for this
> small improvement.
> I think changing the return type might benefit the project in the future,
> but I don't know the project well enough to say so. I would love to keep
> working on this,
> but do you think all these changes are worth it for this story,
> and if not, is there another way out?
>
> Thanks,
> Yishun
> On Sat, Sep 1, 2018 at 11:04 AM Guozhang Wang  wrote:
> >
> > Hello Yishun,
> >
> > Thanks for the updated KIP. I think option 1), i.e. return multiple
> > requests from build() call is okay. Just to clarify: you are going to
> > change `AbstractRequest#build(short version)` signature, and hence all
> > requests need to be updated accordingly, but only FindCoordinator may
> > return multiple requests in the list, while all others will return
> > singleton list, right?
> >
> >
> > Guozhang
> >
> >
> > On Fri, Aug 31, 2018 at 10:51 AM, Yishun Guan  wrote:
> >
> > > @Guozhang Wang Could you review this again when you have time? Thanks!
> > > -Yishun
> > > On Wed, Aug 29, 2018 at 11:57 AM Yishun Guan 
> wrote:
> > > >
> > > > Hi, because I have made some significant changes on this design, so I
> > > > want to reopen the discussion on this KIP:
> > > > https://cwiki.apache.org/confluence/x/CgZPBQ
> > > >
> > > > Thanks,
> > > > Yishun
> > > > On Thu, Aug 16, 2018 at 5:06 PM Yishun Guan 
> wrote:
> > > > >
> > > > > I see! Thanks!
> > > > >
> > > > > On Thu, Aug 16, 2018, 4:35 PM Guozhang Wang 
> > > wrote:
> > > > >>
> > > > >> It is not implemented, but should not be hard to do so (and again
> you
> > > do
> > > > >> NOT have to do that in this KIP, I'm bringing this up so that you
> can
> > > help
> > > > >> thinking about the process).
> > > > >>
> > > > >> Quoting from Colin's comment:
> > > > >>
> > > > >> "
> > > > >> The pattern is that you would try to send a request for more than
> one
> > > > >> group, and then you would get an UnsupportedVersionException
> (nothing
> > > would
> > > > >> be sent on the wire, this is purely internal to the code).
> > > > >> Then your code would handle the UVE by creating requests with an
> older
> > > > >> version that only had one group each.
> > > > >> "
> > > > >>
> > > > >>
> > > > >> Guozhang
> > > > >>
> > > > >>
> > > > >> On Wed, Aug 15, 2018 at 4:44 PM, Yishun Guan 
> > > wrote:
> > > > >>
> > > > >> > Hi, I am looking into AdminClient.scala and AdminClient.java,
> and
> > > also
> > > > >> > looking into ApiVersionRequest.java and ApiVersionResponse.java,
> > > but I
> > > > >> > don't see anywhere that contains the logic of the one-to-one mapping
> from
> > > version
> > > > >> > to version; am I looking at the right place?
> > > > >> >
> > > > >> > On Mon, Aug 13, 2018 at 1:30 PM Guozhang Wang <
> wangg...@gmail.com>
> > > wrote:
> > > > >> >
> > > > >> > > Regarding 3): Today we do not have this logic with the
> existing
> > > client,
> > > > >> > > because we defer the decision about the version to use (we always
> > > assume
> > > > >> > that
> > > > >> > > a new versioned request needs to be down-converted to a
> single old
> > > > >> > > versioned request: i.e. a one-to-one mapping), but in
> principle,
> > > we
> > > > >> > should
> > > > >> > > be able to modify the client to make it work.
> > > > >> > >
> > > > >
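
Purely as an illustration of the buildMulti idea sketched above (the class
shapes and the version cutoff are hypothetical, not the actual
org.apache.kafka.common.requests types):

object BuildMultiSketch {
  // Stand-ins for AbstractRequest / AbstractRequest.Builder, just to show the
  // extra method alongside the usual single-request build().
  final case class FindCoordinatorRequest(version: Short, coordinatorKeys: List[String])

  class FindCoordinatorRequestBuilder(keys: List[String]) {
    // Existing contract: build exactly one request.
    def build(version: Short): FindCoordinatorRequest =
      FindCoordinatorRequest(version, keys.take(1))

    // Proposed addition: a newer version batches every key into one request,
    // while older versions fan out to one request per key. The ">= 3" cutoff
    // is made up for the sketch.
    def buildMulti(version: Short): List[FindCoordinatorRequest] =
      if (version >= 3) List(FindCoordinatorRequest(version, keys))
      else keys.map(k => FindCoordinatorRequest(version, List(k)))
  }
}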

Re: KAFKA-1194

2018-09-02 Thread Stephane Maarek
Hi Ismael, thanks for having a look!

https://github.com/apache/kafka/pull/4947/files does not fix it for me.

But this one does: https://github.com/apache/kafka/pull/4431
It produces the following log: https://pastebin.com/NHeAcc6v
As you can see the folders do get renamed, it gives the user a few odd
WARNs, but I checked and the directories are fully gone nonetheless.

I think the ideas of the PRs go back to
https://github.com/apache/kafka/pull/154/files or https://github.com/apache/kafka/pull/1757

I couldn't find how to map both PRs to the new Kafka codebase,
unfortunately.

On 3 September 2018 at 03:26, Ismael Juma  wrote:

> Hi Stephane,
>
> Does https://github.com/apache/kafka/pull/4947/files fix it for you?
>
> Ismael
>
> On Sun, Sep 2, 2018 at 11:25 AM Stephane Maarek <
> steph...@simplemachines.com.au> wrote:
>
> > Hi,
> >
> > I've seen Dong has done some work on
> > https://issues.apache.org/jira/browse/KAFKA-7278 which is said from the
> > comments that it could have possibly fixed
> > https://issues.apache.org/jira/browse/KAFKA-1194.
> >
> > I tested and it is unfortunately not the case...
> > I have posted in KAFKA-1194 a way to reproduce the issue in a
> deterministic
> > way:
> > ===
> > C:\kafka_2.11-2.1.0-SNAPSHOT>bin\windows\kafka-server-start.bat
> > config\server.properties
> >
> > C:\kafka_2.11-2.1.0-SNAPSHOT>kafka-topics.bat --zookeeper 127.0.0.1:2181
> > --topic second_topic --create --partitions 3 --replication-factor 1
> >
> > C:\kafka_2.11-2.1.0-SNAPSHOT>kafka-console-producer.bat --broker-list
> > 127.0.0.1:9092 --topic second_topic
> > >hello
> > >world
> > >hello
> > >Terminate batch job (Y/N)? Y
> >
> > C:\kafka_2.11-2.1.0-SNAPSHOT>kafka-topics.bat --zookeeper 127.0.0.1:2181
> > --topic second_topic --delete
> > 
> >
> >
> > I usually wouldn't push for any issues for windows to be fixed, but this
> > one is triggered from basic CLI usage and triggers a full broker failure
> > that can't be recovered.
> > It actually completely destroys the faith of Kafka newcomers learning
> using
> > Windows (which are 50% of the users I have in my course).
> >
> > I'm happy to help out in any way I can to test any patch.
> >
> > Thanks
> > Stephane
> >
>