A1 / BC+1
What do you think of these options for taking internal topics and
state store names into consideration?
* ensure.explicit.internal.names
* ensure.explicit.resource.names
* ensure.consistent.resource.naming
* ensure.invariant.persistence.naming
BC+2
To not rely only on a pattern, this is what can be done:
- For repartition topics, they are all created in the
KStreamImpl.createRepartitionedSource(...) static method.
The method either receives a name explicitly provided by the user or
null and builds the final repartition topic name.
Here, I can easily determine if a name has been provided.
- For changelog topics and state store names, I identified two
situations where they are created:
1. In the MaterializedInternal constructor, which receives a
Materialized object with a name or not.
2. When a KStream/KStream join is made. In this case, stores are
not built using Materialized but are instead created in the
KStreamImplJoin.join() method.
In all cases, I can access the InternalTopologyBuilder from the
InternalStreamBuilder object and add the unprovided topics or store
names to a list property:
List<String> UnprovidedInternalNames.
I will then be able to check the content of this list to enforce the
explicit naming requirement if needed.
Thanks for your feedback.
regards,
Sébastien
________________________________
De : Lucas Brutschy <lbruts...@confluent.io.INVALID>
Envoyé : jeudi 28 novembre 2024 11:51
À : dev@kafka.apache.org <dev@kafka.apache.org>
Objet : [SUSPICIOUS EXTERNAL MESSAGE] Re: [DISCUSS] KIP-1111:
Enforcing Explicit Naming for Kafka Streams Internal Topics
Warning This might be a fraudulent message! When clicking REPLY, your
answers will NOT go to the sender (lbruts...@confluent.io.invalid).
Instead, replies will be sent to dev@kafka.apache.org. Be cautious!
Hi all,
Thanks for the KIP! Super useful.
No new comments, let me just voice my opinion on the suggestions being
made.
A1: When I read `require.auto.generated.topic.names`, it sounds like
explicit naming is not allowed if the config is true. This is not what
we are doing here. So to avoid the negative, I'd use
`require.explicit.topic.names`, default false.
BC+1: I think `invariant` is quite indirect, because it depends on
what changes are being made for the naming to change or not. Also,
it's not only about changelog topics but also repartition topics, so
`persistance` seems misleading as well. I agree, IQ can benefit, but
it seems that it is more of a side effect of the feature?
BC+2: Big +1 on this.
Cheers,
Lucas
This email was screened for spam and malicious content but exercise
caution anyway.
On Tue, Nov 26, 2024 at 9:37 PM Sebastien Viale
<sebastien.vi...@michelin.com> wrote:
Thanks Bruno for your comments:
A1 .
BC+1.
I let other people give their advice, and each of them makes sense.
BC+2.
I believe you're right. I think I can manage to check whether names
are set for repartition or changelog topics across different DSL
operators in KStreamImpl. The complexity arises because repartition
topic names can depend on the operator, such as selectKey before a join.
If I detect unnamed topics, I could introduce a property like
unNamedInternalTopics in InternalStreamBuilder to maintain a list of
unnamed topics that I can check when topology is built. This approach
would allow me to focus not only on sequence numbers but also on
ensuring explicit topic naming for improved stability. I hope this is
what you imagine.
Let me know if you need further refinements!
Sébastien,
________________________________
De : Bruno Cadonna <cado...@apache.org>
Envoyé : mardi 26 novembre 2024 19:12
À : dev@kafka.apache.org <dev@kafka.apache.org>
Objet : [EXT] Re: [DISCUSS] KIP-1111: Enforcing Explicit Naming for
Kafka Streams Internal Topics
Warning External sender Do not click on any links or open any
attachments unless you trust the sender and know the content is safe.
Hi,
Thanks for the KIP!
A1.
I find the "auto.generated.topic.names" part a bit misleading, because
actually the topic names are always auto-generated no matter if the
processors and state stores are explicitly named or not.
which leads me to
BC+1.
IMO this is not only about topic names, explicitly naming state stores
would also benefit existing IQ queries. So the config should not
necessarily be only focused on internal topics.
Maybe a config name like invariant.persistence.naming or
ensure.invariant.persistence.naming or
enabled.invariant.persistence.naming. I am not too happy with the names
but I hope you get the idea.
BC+2.
Can we not keep track of explicit naming within the DSL instead of
relying on the 10-digit sequence number?
Best,
Bruno
This email was screened for spam and malicious content but exercise
caution anyway.
On 22.11.24 22:04, Sebastien Viale wrote:
hi,
A1. Personally, I prefer the suggestion:
require.auto.generated.topic.names.
If everyone agrees, I will update the KIP accordingly.
MJS-1. Sophie, I reviewed the ticket and the sub-tasks, and
everything seems clear to me. I must say that I sometimes felt
confused with the constructors, but now everything is much clearer.
Based on my experience and discussions with my teammates, I believe
this is a good opportunity to simplify and clarify the implementation.
The question is whether this should be part of a separate KIP.
I don’t see any issue with integrating it into the current one, but
if necessary, I’m happy to volunteer to open a new KIP.
In both cases, would it be okay for you if I validate it with you
before publishing?
cheers !
________________________________
De : Sophie Blee-Goldman <sop...@responsive.dev>
Envoyé : vendredi 22 novembre 2024 01:33
À : dev@kafka.apache.org <dev@kafka.apache.org>
Objet : [EXT] Re: [DISCUSS] KIP-1111: Enforcing Explicit Naming for
Kafka Streams Internal Topics
Warning External sender Do not click on any links or open any
attachments unless you trust the sender and know the content is safe.
First off, thanks for the KIP! I think this is a great idea as it's
super
easy to miss naming one thing and end up with a topology that isn't
upgradeable.
A1. I actually had the same reaction as Almog to the name, I feel it's
slightly clearer as a positive instead of a negative, though I think
the
rest of it is fine -- what about "require.auto.generated.topic.names"
instead?
MJS-1. While I can see the advantage of having this stuff be
programmatically configured, I personally would prefer to keep this and
other topology-related configurations part of StreamsConfig. For one
thing,
I think they would be much more discoverable as true configs. For
another,
this still wouldn't solve the problem for configs that need to be
passed in
up front, that is, to the Topology/StreamsBuilder constructor rather
than
to StreamsBuilder#build. And imo whatever we do should work for all
such
configs that are required for building a topology.
Sebastien -- have you read through that ticket and its subtasks yet?
I'd be
interested in your take on what I'm proposing there, as I think it
could be
mutually beneficial. For context, I'm currently implementing a KIP that
introduces another such config in this category which needs to be
passed
into the topology.
This email was screened for spam and malicious content but exercise
caution anyway.
On Thu, Nov 21, 2024 at 12:42 PM Matthias J. Sax <mj...@apache.org>
wrote:
Yes, the ticket is related. Also just saw it today.
Did leave a comment on the ticket for Sophie to hopefully chime in on
this KIP discussion.
-Matthias
On 11/21/24 9:06 AM, Sebastien Viale wrote:
Thank you very much for your reviews!
A1 I will keep the disallow.auto.generated.topic.names
configuration in
the KIP for now while waiting for other suggestions.
A2 I thought about making the implementation typesafe; it would, of
course, complicate the implementation, but for me,
this feature is intended for new Kafka Streams applications or those
after a reset. I agree with Matthias.
If a user wants to enable the check, they will simply need to avoid
naming topics like the auto-generated ones.
MJS-1 It might be a good idea to add your proposal to the KIP. I just
wonder how to distinguish configurations that must be set
'programmatically' from others (e.g., topology optimization and
this one)
I am open to any suggestions.
Is this ticket related to what you are proposing
https://lists.apache.org/thread/
dfgd2vcco7d1omjptfqp92kdocnlf3cq<https://lists.apache.org/thread/
dfgd2vcco7d1omjptfqp92kdocnlf3cq><https://lists.apache.org/thread/
dfgd2vcco7d1omjptfqp92kdocnlf3cq<https://lists.apache.org/thread/
dfgd2vcco7d1omjptfqp92kdocnlf3cq>><https://lists.apache.org/
thread/dfgd2vcco7d1omjptfqp92kdocnlf3cq<https://lists.apache.org/
thread/dfgd2vcco7d1omjptfqp92kdocnlf3cq><https://lists.apache.org/
thread/dfgd2vcco7d1omjptfqp92kdocnlf3cq<https://lists.apache.org/
thread/dfgd2vcco7d1omjptfqp92kdocnlf3cq>>>
cheers,
Sébastien
________________________________
De : Matthias J. Sax <mj...@apache.org>
Envoyé : jeudi 21 novembre 2024 02:48
À : dev@kafka.apache.org <dev@kafka.apache.org>
Objet : [EXT] Re: [DISCUSS] KIP-1111: Enforcing Explicit Naming for
Kafka Streams Internal Topics
Warning External sender Do not click on any links or open any
attachments unless you trust the sender and know the content is safe.
Thanks for the KIP. Overall this does sound useful.
About Almog's comments
(A1) -- I don't think that `topics.internal.require.explicit.naming`
would be a good name, as we use `topic.` prefix for actual topic
configs. Thus, KS would pickup `internal.require.explicit.naming` and
try to apply is as topic config what would either crash or log a
spurious WARN what would be annoying?
(A2) -- We would need to check the code if such a strategy would
work as
expected? If users pass in a name for a previously un-named topic, we
might get an cascading index shift which might be undesired (and
could
"break" processor level metrics)?
But I am also not sure if we actually need a migration path? If
might be
ok if this new feature only work for new deployments? (Or use can
re-set
their application.)
MJS-1: I have another question though about the risk such a config
implies? Given that configs are often managed from "outside", one
could
easily break an exiting application which does not have this config
enable, by enable the config. Of course, we already have similar
config
which are equally dangerous; however, most people don't like that one
need to pass in the config into `StreamsBuilder.build(configs)`
method
anyway, so maybe we could make a first step to get rid of this via
this
KIP?
Thus, I am wondering if a config is actually the right way to go?
Should
we instead make it feature of `StreamsBuilder` that one can enable
programmatically? Not 100% how we would do this, but maybe we
could use
a builder pattern (to allow us to add similar thing in the
future), and
deprecate the current constructor of `StreamsBuilder`?
// new way to build a `StreamsBuilder`
StreamsBuilder builder = StreamsBuilder.build();
// if we don't extend the scope of this KIP, we might also need:
StreamsBuilder builder = StreamsBuilder.build(Properties);
Or we do extend this KIP and deprecate existing config (like topology
optimization) if favor of the new builder pattern:
// enable the new feature
StreamsBuilder builder =
StreamsBuilder.requireExplicitNaming().build();
// or
StreamsBuilder builder =
StreamsBuilder.disableNameGeneration().build();
Thoughts? I am not 100% sure if this is a good idea or not, but
thought
it cannot hurt to throw it out.
-Matthias
This email was screened for spam and malicious content but exercise
caution anyway.
On 11/18/24 12:57 PM, Almog Gavra wrote:
Hi Sebastien,
Thanks for the KIP! In general, I'm a fan of giving users the
tools they
need to protect their organization so I'm supportive of this
proposal. A
few nits and comments:
A1. [nit] consider 'topics.internal.require.explicit.naming' so that
(a) we
can group anything else we introduce for "topics.internal" with
the same
prefix and (b) it's not a double negative (don't disallow is the
default,
instead of don't require).
A2. I think we can improve on the implementation by making it
typesafe
instead of checking whether the topic matches some pattern. I
think a
migration path for users that want to turn this flag on, but already
have
some auto-generated names, is to manually specify the auto-generated
names
for preexisting topics. This would enforce future topics naming,
but not
penalize them for having used auto generated names in the past. This
makes
the implementation a little more challenging, but I think it's
worthwhile.
Cheers,
Almog
On Fri, Nov 15, 2024 at 12:20 AM Sebastien Viale <
sebastien.vi...@michelin.com> wrote:
Hi Everyone,
I would like to start a discussion on KIP-1111: Enforcing Explicit
Naming
for Kafka Streams Internal Topics<
https://cwiki.apache.org/confluence/display/KAFKA/
KIP-1111%3A+Enforcing+Explicit+Naming+for+Kafka+Streams+Internal+Topics<https://cwiki.apache.org/confluence/display/KAFKA/KIP-1111%3A+Enforcing+Explicit+Naming+for+Kafka+Streams+Internal+Topics><https://cwiki.apache.org/confluence/display/KAFKA/KIP-1111%3A+Enforcing+Explicit+Naming+for+Kafka+Streams+Internal+Topics<https://cwiki.apache.org/confluence/display/KAFKA/KIP-1111%3A+Enforcing+Explicit+Naming+for+Kafka+Streams+Internal+Topics>><https://cwiki.apache.org/confluence/display/KAFKA/KIP-1111%3A+Enforcing+Explicit+Naming+for+Kafka+Streams+Internal+Topics<https://cwiki.apache.org/confluence/display/KAFKA/KIP-1111%3A+Enforcing+Explicit+Naming+for+Kafka+Streams+Internal+Topics><https://cwiki.apache.org/confluence/display/KAFKA/KIP-1111%3A+Enforcing+Explicit+Naming+for+Kafka+Streams+Internal+Topics<https://cwiki.apache.org/confluence/display/KAFKA/KIP-1111%3A+Enforcing+Explicit+Naming+for+Kafka+Streams+Internal+Topics>>>
<
https://cwiki.apache.org/confluence/display/KAFKA/
KIP-1111%3A+Enforcing+Explicit+Naming+for+Kafka+Streams+Internal+Topics<https://cwiki.apache.org/confluence/display/KAFKA/KIP-1111%3A+Enforcing+Explicit+Naming+for+Kafka+Streams+Internal+Topics><https://cwiki.apache.org/confluence/display/KAFKA/KIP-1111%3A+Enforcing+Explicit+Naming+for+Kafka+Streams+Internal+Topics<https://cwiki.apache.org/confluence/display/KAFKA/KIP-1111%3A+Enforcing+Explicit+Naming+for+Kafka+Streams+Internal+Topics>><https://cwiki.apache.org/confluence/display/KAFKA/KIP-1111%3A+Enforcing+Explicit+Naming+for+Kafka+Streams+Internal+Topics<https://cwiki.apache.org/confluence/display/KAFKA/KIP-1111%3A+Enforcing+Explicit+Naming+for+Kafka+Streams+Internal+Topics><https://cwiki.apache.org/confluence/display/KAFKA/KIP-1111%3A+Enforcing+Explicit+Naming+for+Kafka+Streams+Internal+Topics<https://cwiki.apache.org/confluence/display/KAFKA/KIP-1111%3A+Enforcing+Explicit+Naming+for+Kafka+Streams+Internal+Topics>>>
This proposal aims to add a configuration that prevents a Kafka
Streams
application from starting if any of its internal topics have
auto-generated
names.
Regards,
Sébastien