Re: [DISCUSS] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

2019-12-26 Thread Stanislav Kozlovski
Hey Brian,

1. Could we more explicitly clarify the behavior of the algorithm when `|T|
> TARGET_METADATA_FETCH_SIZE`? I assume we ignore the config in that
scenario.
2. Should `targetMetadataFetchSize = Math.max(topicsPerSec / 10, 20)` be
`topicsPerSec * 10` ?
3. When is this new algorithm applied? To confirm my understanding - what
is the behavior of `metadata.max.age.ms` after this KIP? Are we adding a
new, more proactive metadata fetch for topics in |U|?
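For concreteness, the two readings of the formula in question 2 compare as
follows (a sketch only; `topicsPerSec` and the method names are illustrative,
not the KIP's actual code):

```java
public class FetchSizeSketch {
    // As written in the KIP: divides, so small workloads collapse to the floor.
    static int asWritten(int topicsPerSec) {
        return Math.max(topicsPerSec / 10, 20);
    }

    // As suggested in question 2: multiplies, batching roughly ten seconds
    // of refresh work into each metadata RPC.
    static int asSuggested(int topicsPerSec) {
        return Math.max(topicsPerSec * 10, 20);
    }

    public static void main(String[] args) {
        System.out.println(asWritten(5));   // 20 (5 / 10 = 0, floored to 20)
        System.out.println(asSuggested(5)); // 50
    }
}
```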

Thanks,
Stanislav

On Thu, Dec 19, 2019 at 11:37 PM Brian Byrne  wrote:

> Hello everyone,
>
> For all interested, please take a look at the proposed algorithm as I'd
> like to get more feedback. I'll call for a vote once the break is over.
>
> Thanks,
> Brian
>
> On Mon, Dec 9, 2019 at 10:18 PM Guozhang Wang  wrote:
>
> > Sounds good, I agree that should not make a big difference in practice.
> >
> > On Mon, Dec 9, 2019 at 2:07 PM Brian Byrne  wrote:
> >
> > > Hi Guozhang,
> > >
> > > I see, we agree on the topic threshold not applying to urgent topics,
> but
> > > differ slightly on what should be considered urgent. I would argue that
> > we
> > > should consider topics nearing the metadata.max.age.ms to be urgent
> > since
> > > they may still be well within the metadata.expiry.ms. That is, the
> > client
> > > still considers these topics to be relevant (not expired), but doesn't
> > want
> > > to incur the latency bubble of having to wait for the metadata to be
> > > re-fetched if it's stale. This could be a frequent case if the
> > > metadata.max.age.ms << metadata.expiry.ms.
> > >
> > > In practice, I wouldn't expect this to make a noticeable difference so
> I
> > > don't have a strong leaning, but the current behavior today is to
> > > aggressively refresh the metadata of stale topics by ensuring a refresh
> > is
> > > triggered before that metadata.max.age.ms duration elapses.
> > >
> > > Thanks,
> > > Brian
> > >
> > >
> > > On Mon, Dec 9, 2019 at 11:57 AM Guozhang Wang 
> > wrote:
> > >
> > > > Hello Brian,
> > > >
> > > > Thanks for your explanation, could you then update the wiki page for
> > the
> > > > algorithm part since when I read it, I thought it was different from
> > the
> > > > above, e.g. urgent topics should not be added just because of max.age
> > > > expiration, but should only be added if there is send data
> pending.
> > > >
> > > >
> > > > Guozhang
> > > >
> > > > On Mon, Dec 9, 2019 at 10:57 AM Brian Byrne 
> > wrote:
> > > >
> > > > > Hi Guozhang,
> > > > >
> > > > > Thanks for the feedback!
> > > > >
> > > > > On Sun, Dec 8, 2019 at 6:25 PM Guozhang Wang 
> > > wrote:
> > > > >
> > > > > > 1. The addition of *metadata.expiry.ms <
> http://metadata.expiry.ms>
> > > > > *should
> > > > > > be included in the public interface. Also its semantics needs
> more
> > > > > > clarification (since previously it is hard-coded internally we do
> > not
> > > > > need
> > > > > > to explain it publicly, but now with the configurable value we do
> > > > need).
> > > > > >
> > > > >
> > > > > This was an oversight. Done.
> > > > >
> > > > >
> > > > > > 2. There are a couple of hard-coded parameters like 25 and 0.5 in
> > the
> > > > > > proposal, maybe we need to explain why these magic values make
> > sense
> > > > in
> > > > > > common scenarios.
> > > > > >
> > > > >
> > > > > So these are pretty fuzzy numbers, and seemed to be a decent
> balance
> > > > > between trade-offs. I've updated the target size to account for
> > setups
> > > > with
> > > > > a large number of topics or a shorter refresh time, as well as
> added
> > > some
> > > > > light rationale.
> > > > >
> > > > >
> > > > > > 3. In the Urgent set condition, do you actually mean "with no
> > cached
> > > > > > metadata AND there are existing data buffered for the topic"?
> > > > > >
> > > > >
> > > > > Yes, fixed.
> > > > >
> > > > >
> > > > >
> > > > > > One concern I have is whether or not we may introduce a
> regression,
> > > > > > especially during producer startup such that since we only
> require
> > up
> > > > to
> > > > > 25
> > > > > > topics each request, it may cause the send data to be buffered
> more
> > > > time
> > > > > > than now due to metadata not being available. I understand this is an
> > > > > acknowledged
> > > > > > trade-off in our design but any regression that may surface to
> > users
> > > > need
> > > > > > to be very carefully considered. I'm wondering, e.g. if we can
> > tweak
> > > > our
> > > > > > algorithm for the Urgent set, e.g. to consider those with
> > non-cached
> > > > > > metadata as having higher priority than those that have elapsed max.age
> > but
> > > > not
> > > > > > yet have been called for sending. More specifically:
> > > > > >
> > > > > > Urgent: topics that have been requested for sending but no cached
> > > > > metadata,
> > > > > > and topics that have send request failed with e.g. NOT_LEADER.
> > > > > > Non-urgent: topics that are not in Urgent but have expired
> max.age.
> > > > > >
> > > > > > Then when sending me
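A rough sketch of the urgent/non-urgent split Guozhang describes above (all
type and method names here are illustrative, not the producer's actual
internals):

```java
import java.util.ArrayList;
import java.util.List;

public class MetadataPrioritySketch {
    // Illustrative per-topic state; not the producer's real bookkeeping.
    static class TopicState {
        final String name;
        final boolean cachedMetadata;   // do we hold metadata for this topic?
        final boolean bufferedData;     // has send() been called for it?
        final boolean retriableFailure; // e.g. a NOT_LEADER send failure
        final boolean maxAgeElapsed;    // metadata.max.age.ms exceeded

        TopicState(String name, boolean cachedMetadata, boolean bufferedData,
                   boolean retriableFailure, boolean maxAgeElapsed) {
            this.name = name;
            this.cachedMetadata = cachedMetadata;
            this.bufferedData = bufferedData;
            this.retriableFailure = retriableFailure;
            this.maxAgeElapsed = maxAgeElapsed;
        }

        // Urgent: requested for sending with no cached metadata, or a send
        // failed with a retriable metadata error.
        boolean isUrgent() {
            return (bufferedData && !cachedMetadata) || retriableFailure;
        }
    }

    // Non-urgent: not urgent, but metadata.max.age.ms has elapsed.
    static List<String> nonUrgent(List<TopicState> topics) {
        List<String> out = new ArrayList<>();
        for (TopicState t : topics)
            if (!t.isUrgent() && t.maxAgeElapsed)
                out.add(t.name);
        return out;
    }
}
```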

[jira] [Created] (KAFKA-9335) java.lang.IllegalArgumentException: Number of partitions must be at least 1.

2019-12-26 Thread Nitay Kufert (Jira)
Nitay Kufert created KAFKA-9335:
---

 Summary: java.lang.IllegalArgumentException: Number of partitions 
must be at least 1.
 Key: KAFKA-9335
 URL: https://issues.apache.org/jira/browse/KAFKA-9335
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 2.4.0
Reporter: Nitay Kufert


Hey,

When trying to upgrade our Kafka Streams client to 2.4.0 (from 2.3.1), we 
encountered the following exception: 
{code:java}
java.lang.IllegalArgumentException: Number of partitions must be at least 1.
{code}
It's important to notice that the exact same code works just fine at 2.3.1.

 

I have created a "toy" example which reproduced this exception:

[https://gist.github.com/nitayk/50da33b7bcce19ad0a7f8244d309cb8f]

and I would love to get some insight into why it's happening and possible ways 
to work around it.

 

Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

2019-12-26 Thread Brian Byrne
Hi Stanislav,

Appreciate the feedback!

1. You're correct. I've added notes to the KIP to clarify.

2. Yes it should. Fixed.

3. So I made a mistake when generalizing the target refresh size, which
should have been using `metadata.max.age.ms` instead of `metadata.evict.ms`.
Therefore, `metadata.max.age.ms` still controls the target refresh time of
active topic metadata, and doesn't change in that regard. You're right in
that the U set will make metadata refreshing more proactive in some cases,
which is done to smooth out the metadata RPCs over time and lessen the
potential bulk of some of the RPC responses by working in smaller units.
The T set is what logically exists today and doesn't change.

Thanks,
Brian





On Thu, Dec 26, 2019 at 2:16 AM Stanislav Kozlovski 
wrote:

> Hey Brian,
>
> 1. Could we more explicitly clarify the behavior of the algorithm when `|T|
> > TARGET_METADATA_FETCH_SIZE`? I assume we ignore the config in that
> scenario.
> 2. Should `targetMetadataFetchSize = Math.max(topicsPerSec / 10, 20)` be
> `topicsPerSec * 10` ?
> 3. When is this new algorithm applied? To confirm my understanding - what
> is the behavior of `metadata.max.age.ms` after this KIP? Are we adding a
> new, more proactive metadata fetch for topics in |U|?
>
> Thanks,
> Stanislav
>
> On Thu, Dec 19, 2019 at 11:37 PM Brian Byrne  wrote:
>
> > Hello everyone,
> >
> > For all interested, please take a look at the proposed algorithm as I'd
> > like to get more feedback. I'll call for a vote once the break is over.
> >
> > Thanks,
> > Brian
> >
> > On Mon, Dec 9, 2019 at 10:18 PM Guozhang Wang 
> wrote:
> >
> > > Sounds good, I agree that should not make a big difference in practice.
> > >
> > > On Mon, Dec 9, 2019 at 2:07 PM Brian Byrne 
> wrote:
> > >
> > > > Hi Guozhang,
> > > >
> > > > I see, we agree on the topic threshold not applying to urgent topics,
> > but
> > > > differ slightly on what should be considered urgent. I would argue
> that
> > > we
> > > > should consider topics nearing the metadata.max.age.ms to be urgent
> > > since
> > > > they may still be well within the metadata.expiry.ms. That is, the
> > > client
> > > > still considers these topics to be relevant (not expired), but
> doesn't
> > > want
> > > > to incur the latency bubble of having to wait for the metadata to be
> > > > re-fetched if it's stale. This could be a frequent case if the
> > > > metadata.max.age.ms << metadata.expiry.ms.
> > > >
> > > > In practice, I wouldn't expect this to make a noticeable difference
> so
> > I
> > > > don't have a strong leaning, but the current behavior today is to
> > > > aggressively refresh the metadata of stale topics by ensuring a
> refresh
> > > is
> > > > triggered before that metadata.max.age.ms duration elapses.
> > > >
> > > > Thanks,
> > > > Brian
> > > >
> > > >
> > > > On Mon, Dec 9, 2019 at 11:57 AM Guozhang Wang 
> > > wrote:
> > > >
> > > > > Hello Brian,
> > > > >
> > > > > Thanks for your explanation, could you then update the wiki page
> for
> > > the
> > > > > algorithm part since when I read it, I thought it was different
> from
> > > the
> > > > > above, e.g. urgent topics should not be added just because of
> max.age
> > > > > expiration, but should only be added if there is send data
> > pending.
> > > > >
> > > > >
> > > > > Guozhang
> > > > >
> > > > > On Mon, Dec 9, 2019 at 10:57 AM Brian Byrne 
> > > wrote:
> > > > >
> > > > > > Hi Guozhang,
> > > > > >
> > > > > > Thanks for the feedback!
> > > > > >
> > > > > > On Sun, Dec 8, 2019 at 6:25 PM Guozhang Wang  >
> > > > wrote:
> > > > > >
> > > > > > > 1. The addition of *metadata.expiry.ms <
> > http://metadata.expiry.ms>
> > > > > > *should
> > > > > > > be included in the public interface. Also its semantics needs
> > more
> > > > > > > clarification (since previously it is hard-coded internally we
> do
> > > not
> > > > > > need
> > > > > > > to explain it publicly, but now with the configurable value we
> do
> > > > > need).
> > > > > > >
> > > > > >
> > > > > > This was an oversight. Done.
> > > > > >
> > > > > >
> > > > > > > 2. There are a couple of hard-coded parameters like 25 and 0.5
> in
> > > the
> > > > > > > proposal, maybe we need to explain why these magic values make
> > > sense
> > > > > in
> > > > > > > common scenarios.
> > > > > > >
> > > > > >
> > > > > > So these are pretty fuzzy numbers, and seemed to be a decent
> > balance
> > > > > > between trade-offs. I've updated the target size to account for
> > > setups
> > > > > with
> > > > > > a large number of topics or a shorter refresh time, as well as
> > added
> > > > some
> > > > > > light rationale.
> > > > > >
> > > > > >
> > > > > > > 3. In the Urgent set condition, do you actually mean "with no
> > > cached
> > > > > > > metadata AND there are existing data buffered for the topic"?
> > > > > > >
> > > > > >
> > > > > > Yes, fixed.
> > > > > >
> > > > > >
> > > > > >
> > > > > > > One concern I have is whether or not we may introduce a
> > regress

Re: Please give a link where I can explain myself what this can be ...

2019-12-26 Thread Matthias J. Sax
Docs ->
https://kafka.apache.org/24/documentation/streams/developer-guide/dsl-api.html#aggregating



On 12/23/19 8:30 AM, Aurel Sandu wrote:
> I'm reading the following code:
> ...
> KTable<String, Long> wordCounts = textLines
> .flatMapValues(textLine ->
> Arrays.asList(textLine.toLowerCase().split("\\W+")))
> .groupBy((key, word) -> word)
> .count(Materialized.<String, Long, KeyValueStore<Bytes,
> byte[]>>as("counts-store"));
> ...
> 
> I do not understand the last line ... what is this?
> Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store") ?
> Materialized is from this 
> https://kafka.apache.org/21/javadoc/org/apache/kafka/streams/kstream/Materialized.html
> Please give me a link where I can find explanations.
> Many thanks,
> Aurel 
> 
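The puzzling part is Java's explicit type-witness syntax: `Materialized.<K, V,
S>as(name)` supplies the generic parameters (the KTable's key type, value type,
and the backing state-store type) that the compiler cannot infer from the store
name alone. A self-contained analogue, with all names invented for
illustration:

```java
public class TypeWitnessSketch {
    // A generic static factory in the same shape as Materialized.as(...):
    // the type parameters cannot be inferred from the String argument, so
    // the caller supplies them explicitly via "Class.<T1, T2>method(...)" --
    // exactly what Materialized.<String, Long, KeyValueStore<Bytes,
    // byte[]>>as("counts-store") does in the quoted Streams snippet.
    static class Store<K, V> {
        final String name;
        Store(String name) { this.name = name; }
        static <K, V> Store<K, V> as(String name) { return new Store<>(name); }
    }

    public static void main(String[] args) {
        // Explicit type witness, as in the Streams DSL call:
        Store<String, Long> counts = Store.<String, Long>as("counts-store");
        System.out.println(counts.name); // counts-store
    }
}
```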



signature.asc
Description: OpenPGP digital signature


Re: [DISCUSS] KIP-552: Add interface to handle unused config

2019-12-26 Thread John Roesler
Thanks for the KIP, Artur!

For reference, here is the kip: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-552%3A+Add+interface+to+handle+unused+config

I agree, these warnings are kind of a nuisance. Would it be feasible just to 
leverage log4j in some way to make it easy to filter these messages? For 
example, we could move those warnings to debug level, or even use a separate 
logger for them. 

Thanks for starting the discussion. 
-John 
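One possible shape of the log4j approach (the logger names below are an
assumption based on the warning being emitted through the concrete config
class's logger, e.g. `ProducerConfig`/`ConsumerConfig`; verify against your
client version):

```properties
# Hypothetical log4j.properties fragment: silence the "supplied but isn't a
# known config" warnings by raising the level of the config-class loggers.
log4j.logger.org.apache.kafka.clients.producer.ProducerConfig=ERROR
log4j.logger.org.apache.kafka.clients.consumer.ConsumerConfig=ERROR
log4j.logger.org.apache.kafka.streams.StreamsConfig=ERROR
```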

On Tue, Dec 24, 2019, at 07:23, Artur Burtsev wrote:
> Hi,
> 
> This KIP provides a way to deal with the warning "The configuration {}
> was supplied but isn't a known config." when it is not relevant.
> 
> Cheers,
> Artur
>


[jira] [Created] (KAFKA-9336) Connecting to Kafka using forwarded Kerberos credentials

2019-12-26 Thread Grzegorz Kokosinski (Jira)
Grzegorz Kokosinski created KAFKA-9336:
--

 Summary: Connecting to Kafka using forwarded Kerberos credentials
 Key: KAFKA-9336
 URL: https://issues.apache.org/jira/browse/KAFKA-9336
 Project: Kafka
  Issue Type: Improvement
  Components: clients
Reporter: Grzegorz Kokosinski


My application is using forwarded Kerberos tickets, see: 
[https://web.mit.edu/kerberos/krb5-latest/doc/user/tkt_mgmt.html].

Users authenticate to my JVM-based remote service using KRB; my service would 
then like to connect to Kafka (via KafkaProducer or KafkaConsumer) using the 
user's KRB credentials. It looks like this scenario is currently impossible to 
implement, because the only option to authenticate to Kafka with KRB is via a 
JVM system property: 
-Djava.security.auth.login.config=/etc/kafka/kafka_client_jaas.conf.

Notice that I don't have a keytab file but only: 
[https://docs.oracle.com/javase/7/docs/api/org/ietf/jgss/GSSCredential.html.|https://docs.oracle.com/javase/7/docs/api/org/ietf/jgss/GSSCredential.html]
 GSSCredential allows me to use 
[https://docs.oracle.com/javase/7/docs/api/javax/security/auth/Subject.html#doAs(javax.security.auth.Subject,%20java.security.PrivilegedAction)]
 which typically works in other systems like Postgres to authenticate the user 
with KRB using a forwarded ticket.
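What the reporter is after can be sketched with the JDK's own `Subject.doAs` (a
hedged illustration of the desired call shape only; Kafka's clients do not
currently pick up the Subject this way, which is the point of this issue):

```java
import java.security.PrivilegedAction;
import javax.security.auth.Subject;

public class DoAsSketch {
    // Desired shape: run client construction under a Subject carrying the
    // forwarded Kerberos credentials (as e.g. the Postgres JDBC driver
    // allows), instead of pointing the whole JVM at a static JAAS file.
    static String connectAs(Subject subject) {
        return Subject.doAs(subject, (PrivilegedAction<String>) () ->
            // A KafkaProducer/KafkaConsumer would ideally be created here and
            // inherit the Subject's credentials; we just report the principals.
            "connected with principals: " + subject.getPrincipals());
    }

    public static void main(String[] args) {
        System.out.println(connectAs(new Subject()));
    }
}
```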

 

 




--
This message was sent by Atlassian Jira
(v8.3.4#803005)