Re: [DISCUSS] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics
Hey Brian,

1. Could we more explicitly clarify the behavior of the algorithm when `|T| > TARGET_METADATA_FETCH_SIZE`? I assume we ignore the config in that scenario.
2. Should `targetMetadataFetchSize = Math.max(topicsPerSec / 10, 20)` be `topicsPerSec * 10`?
3. When is this new algorithm applied? To confirm my understanding - what is the behavior of `metadata.max.age.ms` after this KIP? Are we adding a new, more proactive metadata fetch for topics in |U|?

Thanks,
Stanislav

On Thu, Dec 19, 2019 at 11:37 PM Brian Byrne wrote:
> Hello everyone,
>
> For all interested, please take a look at the proposed algorithm as I'd like to get more feedback. I'll call for a vote once the break is over.
>
> Thanks,
> Brian
>
> On Mon, Dec 9, 2019 at 10:18 PM Guozhang Wang wrote:
> > Sounds good, I agree that should not make a big difference in practice.
> >
> > On Mon, Dec 9, 2019 at 2:07 PM Brian Byrne wrote:
> > > Hi Guozhang,
> > >
> > > I see, we agree on the topic threshold not applying to urgent topics, but differ slightly on what should be considered urgent. I would argue that we should consider topics nearing the metadata.max.age.ms to be urgent since they may still be well within the metadata.expiry.ms. That is, the client still considers these topics to be relevant (not expired), but doesn't want to incur the latency bubble of having to wait for the metadata to be re-fetched if it's stale. This could be a frequent case if the metadata.max.age.ms << metadata.expiry.ms.
> > >
> > > In practice, I wouldn't expect this to make a noticeable difference so I don't have a strong leaning, but the current behavior today is to aggressively refresh the metadata of stale topics by ensuring a refresh is triggered before that metadata.max.age.ms duration elapses.
> > >
> > > Thanks,
> > > Brian
> > >
> > > On Mon, Dec 9, 2019 at 11:57 AM Guozhang Wang wrote:
> > > > Hello Brian,
> > > >
> > > > Thanks for your explanation, could you then update the wiki page for the algorithm part since when I read it, I thought it was different from the above, e.g. urgent topics should not be added just because of max.age expiration, but should only be added if there are sending data pending.
> > > >
> > > > Guozhang
> > > >
> > > > On Mon, Dec 9, 2019 at 10:57 AM Brian Byrne wrote:
> > > > > Hi Guozhang,
> > > > >
> > > > > Thanks for the feedback!
> > > > >
> > > > > On Sun, Dec 8, 2019 at 6:25 PM Guozhang Wang wrote:
> > > > > > 1. The addition of *metadata.expiry.ms <http://metadata.expiry.ms>* should be included in the public interface. Also its semantics needs more clarification (since previously it is hard-coded internally we do not need to explain it publicly, but now with the configurable value we do need).
> > > > >
> > > > > This was an oversight. Done.
> > > > >
> > > > > > 2. There are a couple of hard-coded parameters like 25 and 0.5 in the proposal, maybe we need to explain why these magic values makes sense in common scenarios.
> > > > >
> > > > > So these are pretty fuzzy numbers, and seemed to be a decent balance between trade-offs. I've updated the target size to account for setups with a large number of topics or a shorter refresh time, as well as added some light rationale.
> > > > >
> > > > > > 3. In the Urgent set condition, do you actually mean "with no cached metadata AND there are existing data buffered for the topic"?
> > > > >
> > > > > Yes, fixed.
> > > > >
> > > > > > One concern I have is whether or not we may introduce a regression, especially during producer startup such that since we only require up to 25 topics each request, it may cause the send data to be buffered more time than now due to metadata not available. I understand this is a acknowledged trade-off in our design but any regression that may surface to users need to be very carefully considered. I'm wondering, e.g. if we can tweak our algorithm for the Urgent set, e.g. to consider those with non cached metadata have higher priority than those who have elapsed max.age but not yet have been called for sending. More specifically:
> > > > > >
> > > > > > Urgent: topics that have been requested for sending but no cached metadata, and topics that have send request failed with e.g. NOT_LEADER.
> > > > > > Non-urgent: topics that are not in Urgent but have expired max.age.
> > > > > >
> > > > > > Then when sending me
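For illustration only, here is a minimal Java sketch of the Urgent / Non-urgent split that Guozhang proposes in the quoted thread above. The TopicState interface, its accessor names, and the NOT_LEADER check are hypothetical placeholders for whatever per-topic state the producer keeps; this is not KIP-526's actual implementation.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical view of per-topic producer state; not a real Kafka client class.
interface TopicState {
    String name();
    boolean hasCachedMetadata();
    boolean hasBufferedData();          // data has been requested for sending
    boolean lastSendFailedNotLeader();  // e.g. a NOT_LEADER error on the last send
    long millisSinceLastRefresh();
}

final class MetadataRefreshSets {

    // Urgent: requested for sending with no cached metadata, or a send failed with e.g. NOT_LEADER.
    static Set<String> urgent(List<TopicState> topics) {
        Set<String> result = new HashSet<>();
        for (TopicState t : topics) {
            if ((t.hasBufferedData() && !t.hasCachedMetadata()) || t.lastSendFailedNotLeader()) {
                result.add(t.name());
            }
        }
        return result;
    }

    // Non-urgent: not in Urgent, but metadata older than metadata.max.age.ms.
    static List<String> nonUrgent(List<TopicState> topics, long maxAgeMs) {
        Set<String> urgent = urgent(topics);
        List<String> result = new ArrayList<>();
        for (TopicState t : topics) {
            if (!urgent.contains(t.name()) && t.millisSinceLastRefresh() >= maxAgeMs) {
                result.add(t.name());
            }
        }
        return result;
    }
}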
[jira] [Created] (KAFKA-9335) java.lang.IllegalArgumentException: Number of partitions must be at least 1.
Nitay Kufert created KAFKA-9335:
---

Summary: java.lang.IllegalArgumentException: Number of partitions must be at least 1.
Key: KAFKA-9335
URL: https://issues.apache.org/jira/browse/KAFKA-9335
Project: Kafka
Issue Type: Bug
Components: streams
Affects Versions: 2.4.0
Reporter: Nitay Kufert

Hey,

When trying to upgrade our Kafka Streams client to 2.4.0 (from 2.3.1), we encountered the following exception:
{code:java}
java.lang.IllegalArgumentException: Number of partitions must be at least 1.
{code}
It's important to note that the exact same code works just fine on 2.3.1. I have created a "toy" example which reproduces this exception: https://gist.github.com/nitayk/50da33b7bcce19ad0a7f8244d309cb8f
I would love some insight into why it's happening and ways to get around it.

Thanks
Re: [DISCUSS] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics
Hi Stanislav,

Appreciate the feedback!

1. You're correct. I've added notes to the KIP to clarify.
2. Yes it should. Fixed.
3. So I made a mistake when generalizing the target refresh size, which should have been using `metadata.max.age.ms` instead of `metadata.evict.ms`. Therefore, `metadata.max.age.ms` still controls the target refresh time of active topic metadata, and doesn't change in that regard. You're right in that the U set will make metadata refreshing more proactive in some cases, which is done to smooth out the metadata RPCs over time and lessen the potential bulk of some of the RPC responses by working in smaller units. The T set is what logically exists today and doesn't change.

Thanks,
Brian

On Thu, Dec 26, 2019 at 2:16 AM Stanislav Kozlovski wrote:
> Hey Brian,
>
> 1. Could we more explicitly clarify the behavior of the algorithm when `|T| > TARGET_METADATA_FETCH_SIZE`? I assume we ignore the config in that scenario.
> 2. Should `targetMetadataFetchSize = Math.max(topicsPerSec / 10, 20)` be `topicsPerSec * 10`?
> 3. When is this new algorithm applied? To confirm my understanding - what is the behavior of `metadata.max.age.ms` after this KIP? Are we adding a new, more proactive metadata fetch for topics in |U|?
>
> Thanks,
> Stanislav
>
> On Thu, Dec 19, 2019 at 11:37 PM Brian Byrne wrote:
> > Hello everyone,
> >
> > For all interested, please take a look at the proposed algorithm as I'd like to get more feedback. I'll call for a vote once the break is over.
> >
> > Thanks,
> > Brian
> >
> > On Mon, Dec 9, 2019 at 10:18 PM Guozhang Wang wrote:
> > > Sounds good, I agree that should not make a big difference in practice.
> > >
> > > On Mon, Dec 9, 2019 at 2:07 PM Brian Byrne wrote:
> > > > Hi Guozhang,
> > > >
> > > > I see, we agree on the topic threshold not applying to urgent topics, but differ slightly on what should be considered urgent. I would argue that we should consider topics nearing the metadata.max.age.ms to be urgent since they may still be well within the metadata.expiry.ms. That is, the client still considers these topics to be relevant (not expired), but doesn't want to incur the latency bubble of having to wait for the metadata to be re-fetched if it's stale. This could be a frequent case if the metadata.max.age.ms << metadata.expiry.ms.
> > > >
> > > > In practice, I wouldn't expect this to make a noticeable difference so I don't have a strong leaning, but the current behavior today is to aggressively refresh the metadata of stale topics by ensuring a refresh is triggered before that metadata.max.age.ms duration elapses.
> > > >
> > > > Thanks,
> > > > Brian
> > > >
> > > > On Mon, Dec 9, 2019 at 11:57 AM Guozhang Wang wrote:
> > > > > Hello Brian,
> > > > >
> > > > > Thanks for your explanation, could you then update the wiki page for the algorithm part since when I read it, I thought it was different from the above, e.g. urgent topics should not be added just because of max.age expiration, but should only be added if there are sending data pending.
> > > > >
> > > > > Guozhang
> > > > >
> > > > > On Mon, Dec 9, 2019 at 10:57 AM Brian Byrne wrote:
> > > > > > Hi Guozhang,
> > > > > >
> > > > > > Thanks for the feedback!
> > > > > >
> > > > > > On Sun, Dec 8, 2019 at 6:25 PM Guozhang Wang wrote:
> > > > > > > 1. The addition of *metadata.expiry.ms <http://metadata.expiry.ms>* should be included in the public interface. Also its semantics needs more clarification (since previously it is hard-coded internally we do not need to explain it publicly, but now with the configurable value we do need).
> > > > > >
> > > > > > This was an oversight. Done.
> > > > > >
> > > > > > > 2. There are a couple of hard-coded parameters like 25 and 0.5 in the proposal, maybe we need to explain why these magic values makes sense in common scenarios.
> > > > > >
> > > > > > So these are pretty fuzzy numbers, and seemed to be a decent balance between trade-offs. I've updated the target size to account for setups with a large number of topics or a shorter refresh time, as well as added some light rationale.
> > > > > >
> > > > > > > 3. In the Urgent set condition, do you actually mean "with no cached metadata AND there are existing data buffered for the topic"?
> > > > > >
> > > > > > Yes, fixed.
> > > > > >
> > > > > > > One concern I have is whether or not we may introduce a regress
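To make the arithmetic in points 1-2 concrete, here is a hedged sketch of the corrected target fetch size as agreed above (multiply by 10 rather than divide, with a floor of 20). The derivation of topicsPerSec from the number of active topics and metadata.max.age.ms is an assumption for illustration, not a quote from the KIP, which is still being revised.

// Sketch of the corrected target metadata fetch size discussed above.
// Assumption: topicsPerSec is the rate at which topic metadata must be
// refreshed to stay within metadata.max.age.ms (activeTopics / maxAgeSeconds).
final class TargetFetchSize {
    static int targetMetadataFetchSize(double activeTopics, double maxAgeMs) {
        double topicsPerSec = activeTopics / (maxAgeMs / 1000.0);
        return (int) Math.max(topicsPerSec * 10, 20);
    }

    public static void main(String[] args) {
        // 10,000 active topics refreshed within a 5-minute metadata.max.age.ms is
        // roughly 33 topics/sec, i.e. a target of about 333 topics per request.
        System.out.println(targetMetadataFetchSize(10_000, 300_000));
    }
}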
Re: Please give a link where I can explain myself what this can be ...
Docs -> https://kafka.apache.org/24/documentation/streams/developer-guide/dsl-api.html#aggregating

On 12/23/19 8:30 AM, Aurel Sandu wrote:
> I'm reading the following code:
> ...
> KTable<String, Long> wordCounts = textLines
>     .flatMapValues(textLine -> Arrays.asList(textLine.toLowerCase().split("\\W+")))
>     .groupBy((key, word) -> word)
>     .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store"));
> ...
>
> I do not understand the last line. What is this?
> Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store") ?
> Materialized is from https://kafka.apache.org/21/javadoc/org/apache/kafka/streams/kstream/Materialized.html
> Please give me a link where I can find explanations.
> Many thanks,
> Aurel
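For context on the snippet quoted above, a self-contained sketch of the standard word-count topology, showing where the Materialized call fits: it names the local state store that backs the count() KTable so the store can be queried later. Topic names are placeholders; the code follows the public Kafka Streams 2.x DSL.

import java.util.Arrays;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

public class WordCountTopology {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> textLines = builder.stream("text-lines-topic");

        KTable<String, Long> wordCounts = textLines
            // split each line into lower-cased words
            .flatMapValues(textLine -> Arrays.asList(textLine.toLowerCase().split("\\W+")))
            // re-key by word so identical words land in the same group
            .groupBy((key, word) -> word)
            // count per word, materializing the result into a queryable local state
            // store named "counts-store"; the three type parameters are the key type,
            // value type, and store type of that store
            .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store"));

        wordCounts.toStream().to("word-counts-topic", Produced.with(Serdes.String(), Serdes.Long()));
        // building and starting the KafkaStreams instance is omitted here
    }
}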
Re: [DISCUSS] KIP-552: Add interface to handle unused config
Thanks for the KIP, Artur!

For reference, here is the KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-552%3A+Add+interface+to+handle+unused+config

I agree, these warnings are kind of a nuisance. Would it be feasible just to leverage log4j in some way to make it easy to filter these messages? For example, we could move those warnings to debug level, or even use a separate logger for them.

Thanks for starting the discussion.
-John

On Tue, Dec 24, 2019, at 07:23, Artur Burtsev wrote:
> Hi,
>
> This KIP provides a way to deal with a warning "The configuration {} was supplied but isn't a known config." when it is not relevant.
>
> Cheers,
> Artur
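As an illustration of the log4j idea John mentions, here is a minimal workaround sketch. The logger names below and the presence of a log4j 1.x binding on the classpath are assumptions; this is a user-side filter, not what the KIP proposes.

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

// Assumption: the "supplied but isn't a known config" warning is emitted under the
// producer/consumer config logger names below; adjust for your logging backend.
// Note this also hides any other WARN-level messages from those two loggers.
public final class SilenceUnknownConfigWarnings {
    public static void apply() {
        Logger.getLogger("org.apache.kafka.clients.producer.ProducerConfig").setLevel(Level.ERROR);
        Logger.getLogger("org.apache.kafka.clients.consumer.ConsumerConfig").setLevel(Level.ERROR);
    }
}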
[jira] [Created] (KAFKA-9336) Connecting to Kafka using forwarded Kerberos credentials
Grzegorz Kokosinski created KAFKA-9336:
--

Summary: Connecting to Kafka using forwarded Kerberos credentials
Key: KAFKA-9336
URL: https://issues.apache.org/jira/browse/KAFKA-9336
Project: Kafka
Issue Type: Improvement
Components: clients
Reporter: Grzegorz Kokosinski

My application is using forwarded Kerberos tickets, see https://web.mit.edu/kerberos/krb5-latest/doc/user/tkt_mgmt.html. Users authenticate to my JVM-based remote service using KRB, and then from that service I would like to connect to Kafka (via KafkaProducer or KafkaConsumer) using the user's KRB credentials.

It looks like this scenario is currently impossible to implement, because the only option to authenticate to Kafka with KRB is via the JVM system property -Djava.security.auth.login.config=/etc/kafka/kafka_client_jaas.conf. Notice that I don't have a keytab file, but only a GSSCredential (https://docs.oracle.com/javase/7/docs/api/org/ietf/jgss/GSSCredential.html). GSSCredential allows me to use Subject.doAs (https://docs.oracle.com/javase/7/docs/api/javax/security/auth/Subject.html#doAs(javax.security.auth.Subject,%20java.security.PrivilegedAction)), which typically works in other systems like Postgres to authenticate the user with KRB using a forwarded ticket. Kafka requires to use
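A minimal sketch of the Subject.doAs pattern the report describes (as used with other systems such as Postgres). The subjectWithForwardedTicket parameter is assumed to wrap the forwarded GSSCredential, and the properties are assumed to carry the usual serializer and SASL settings; as the report notes, today's Kafka client expects a JAAS login config, so this illustrates the desired usage rather than a working workaround.
{code:java}
import java.security.PrivilegedAction;
import java.util.Properties;
import javax.security.auth.Subject;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;

public final class ForwardedTicketProducerFactory {
    // Runs the producer construction under the caller-supplied Subject, which is
    // assumed to carry the forwarded Kerberos ticket / GSSCredential.
    public static Producer<String, String> create(Subject subjectWithForwardedTicket, Properties props) {
        return Subject.doAs(subjectWithForwardedTicket,
                (PrivilegedAction<Producer<String, String>>) () -> new KafkaProducer<>(props));
    }
}
{code}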