Re: [VOTE] 2.0.1 RC0

2018-11-02 Thread Ewen Cheslack-Postava
+1

-Ewen

On Thu, Nov 1, 2018 at 10:10 AM Manikumar  wrote:

> We were waiting for the system test results. There were a few failures:
> KAFKA-7579, KAFKA-7559, KAFKA-7561.
> They are not blockers for the 2.0.1 release. We need more votes from
> PMC/committers :)
>
> Thanks, Stanislav, for the system test results!
>
> Thanks,
> Manikumar
>
> On Thu, Nov 1, 2018 at 10:20 PM Eno Thereska 
> wrote:
>
> > Anything else holding this up?
> >
> > Thanks
> > Eno
> >
> > On Thu, Nov 1, 2018 at 10:27 AM Jakub Scholz  wrote:
> >
> > > +1 (non-binding) ... I used the staged binaries and ran tests with
> > > different clients.
> > >
> > > On Fri, Oct 26, 2018 at 4:29 AM Manikumar 
> > > wrote:
> > >
> > > > Hello Kafka users, developers and client-developers,
> > > >
> > > > This is the first candidate for release of Apache Kafka 2.0.1.
> > > >
> > > > This is a bug fix release closing 49 tickets:
> > > > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+2.0.1
> > > >
> > > > Release notes for the 2.0.1 release:
> > > > http://home.apache.org/~manikumar/kafka-2.0.1-rc0/RELEASE_NOTES.html
> > > >
> > > > *** Please download, test and vote by Tuesday, October 30, end of
> day
> > > >
> > > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > > http://kafka.apache.org/KEYS
> > > >
> > > > * Release artifacts to be voted upon (source and binary):
> > > > http://home.apache.org/~manikumar/kafka-2.0.1-rc0/
> > > >
> > > > * Maven artifacts to be voted upon:
> > > > https://repository.apache.org/content/groups/staging/
> > > >
> > > > * Javadoc:
> > > > http://home.apache.org/~manikumar/kafka-2.0.1-rc0/javadoc/
> > > >
> > > > * Tag to be voted upon (off 2.0 branch) is the 2.0.1 tag:
> > > > https://github.com/apache/kafka/releases/tag/2.0.1-rc0
> > > >
> > > > * Documentation:
> > > > http://kafka.apache.org/20/documentation.html
> > > >
> > > > * Protocol:
> > > > http://kafka.apache.org/20/protocol.html
> > > >
> > > > * Successful Jenkins builds for the 2.0 branch:
> > > > Unit/integration tests:
> > > https://builds.apache.org/job/kafka-2.0-jdk8/177/
> > > >
> > > > /**
> > > >
> > > > Thanks,
> > > > Manikumar
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-158: Kafka Connect should allow source connectors to set topic-specific settings for new topics

2018-11-27 Thread Ewen Cheslack-Postava
re: AdminClient vs this proposal, one consideration is that AdminClient
exposes a lot more surface area and probably a bunch of stuff we actually
don't want Connectors to be able to do, such as deleting topics. You can
always lock down by ACLs, but what the framework enables directly vs
requiring the user to opt in via connector-specific config is an important
distinction.

I'm not a fan of how complex the config is (same deal with
transformations), and agree with Ryanne that any case requiring multiple
rules is probably an outlier. A cleaner option for the common case might be
worth it. One option that's still aligned with the current state of the KIP
would be to change the default for topic.creation to a fixed value (e.g.
'default'), change the topic.creation.default.regex default to .*, and then
the 99% use case would just be specifying the number of partitions with a
single config and relying on cluster defaults for the
rest. (I would suggest the same thing for transformations if we added a
simple scripting transformation such that most use cases wouldn't need to
compose multiple transformations.)
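
In connector-config terms, the common case might then collapse to a single
property, something like this (a sketch only, assuming the defaults
suggested above; the partitions property name is illustrative):

    # Sketch: topic.creation defaults to 'default' and
    # topic.creation.default.regex defaults to .*, so only the
    # partition count needs to be set; everything else falls
    # back to cluster defaults.
    topic.creation.default.partitions=5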

Along related lines, is there actually a need for TopicSettings class? We
already have NewTopic in the AdminClient APIs. Does that not suffice?
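
For reference, here is roughly what creating a topic with the existing
NewTopic API looks like (an illustrative fragment; the topic name,
settings, and 'adminProps' are made up for the example):

    import java.util.Collections;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    // Illustrative only: the existing AdminClient surface that a new
    // TopicSettings class would largely duplicate.
    // 'adminProps' is assumed to hold the usual client configs.
    AdminClient admin = AdminClient.create(adminProps);
    NewTopic topic = new NewTopic("my-connector-topic", 5, (short) 3);
    topic.configs(Collections.singletonMap("cleanup.policy", "compact"));
    admin.createTopics(Collections.singleton(topic)).all().get();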

-Ewen

On Mon, Sep 24, 2018 at 11:56 AM Andrew Otto  wrote:

> FWIW, I’d find this feature useful.
>
> On Mon, Sep 24, 2018 at 2:42 PM Randall Hauch  wrote:
>
> > Ryanne,
> >
> > If your connector is already using the AdminClient, then you as the
> > developer have a choice of switching to the new Connect-based
> functionality
> > or keeping the existing use of the AdminClient. If the connector uses
> both
> > mechanisms (which I wouldn't recommend, simply because of the complexity
> of
> > it for a user), then the topic will be created by the first mechanism to
> > actually attempt and successfully create the topic(s) in the Kafka
> cluster
> > that the Connect worker uses. As mentioned in the KIP, "This feature ...
> > does not change the topic-specific settings on any existing topics." IOW,
> > if the topic already exists, it can't be created again and therefore the
> > `topic.creation.*` properties will not apply for that existing topic.
> >
> > > Do these settings apply to internal topics created by the framework on
> > > behalf of a connector, e.g. via KafkaConfigBackingStore?
> >
> > No, they don't, and I'm happy to add a clarification to the KIP if you
> feel
> > it is necessary.
> >
> > > I'd have the same questions if e.g. transformations could be ignored or
> > > overridden by connectors or if there were multiple places to specify
> what
> > > serde to use.
> >
> > There are multiple places that converters can be defined: the worker
> config
> > defines the key and value converters that will be used for all
> connectors,
> > except when a connector defines its own key and value converters.
> >
> > > I don't see how controlling topic creation based on topic name is
> > something
> > > we should support across all connectors, as if it is some established
> > > pattern or universally useful.
> >
> > Topics are identified by name, and when you create a topic with specific
> > settings or change a topic's settings you identify the topic by name. The
> > fact that this KIP uses regular expressions to match topic names doesn't
> > seem surprising, since we use regexes elsewhere.
> >
> > Best regards
> >
> > On Mon, Sep 24, 2018 at 1:24 PM Ryanne Dolan 
> > wrote:
> >
> > > Randall,
> > >
> > > Say I've got a connector that needs to control topic creation. What I
> > need
> > > is an AdminClient s.t. my connector can do what it knows it needs to
> do.
> > > This KIP doesn't address the issues that have been brought up wrt
> > > configuration, principals, ACL etc, since I'll still need to construct
> my
> > > own AdminClient.
> > >
> > > Should such a connector ignore your proposed configuration settings?
> > Should
> > > it use its own principal and its own configuration properties? How
> does
> > > my AdminClient's settings interact with your proposed settings and the
> > > existing cluster settings?
> > >
> > > What happens when a user specifies topic creation settings in a
> connector
> > > config, but the connector then applies its own topic creation logic?
> Are
> > > the configurations silently ignored? If not, how can a connector honor
> > your
> > > proposed settings?
> > >
> > > Do these settings apply to internal topics created by the framework on
> > > behalf of a connector, e.g. via KafkaConfigBackingStore?
> > >
> > > When do the cluster settings get applied? Only after 3 layers of
> > > fall-through?
> > >
> > > I'd have the same questions if e.g. transformations could be ignored or
> > > overridden by connectors or if there were multiple places to specify
> what
> > > serde to use.
> > >
> > > I don't see how controlling topic creation based on topic name is
> > something
> > > we should support across all connectors, as if it is

Re: [DISCUSS] KIP-411: Add option to make Kafka Connect task client ID values unique

2019-01-04 Thread Ewen Cheslack-Postava
Hi Paul,

Thanks for the KIP. A few comments.

To me, the biggest question here is whether we can fix this behavior without adding
a config. In particular, today, we don't even set the client.id for the
producer and consumer at all, right? The *only* way it is set is if you
include an override in the worker config, but in that case you need to be
explicitly opting in with a `producer.` or `consumer.` prefix, i.e. the
settings are `producer.client.id` and `consumer.client.id`. Otherwise, I
think we're getting the default behavior where we generate unique,
per-process IDs, i.e. via this logic
https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L662-L664

If that's the case, would it maybe be possible to compatibly change the
default to use task IDs in the client ID, but only if we don't see an
existing override from the worker config? This would only change the
behavior when someone is using the default, but since the default would
just use what is effectively a random ID that is useless for monitoring
metrics, presumably this wouldn't affect any existing users. I think that
would avoid having to introduce the config, give better out of the box
behavior, and still be a safe, compatible change to make.
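
A rough sketch of that fallback (purely illustrative; these are not the
actual Connect internals or final config key names):

    import java.util.Map;

    // Illustrative sketch: prefer an explicit worker-level override,
    // otherwise derive a stable, task-scoped client ID that is
    // actually useful for monitoring metrics.
    static String taskClientId(Map<String, String> workerProps,
                               String connectorName, int taskId) {
        String override = workerProps.get("producer.client.id");
        if (override != null && !override.isEmpty())
            return override;                   // existing behavior preserved
        return connectorName + "-" + taskId;   // e.g. "my-sink-0"
    }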


Other than that, just two minor comments. On the config naming, not sure
about a better name, but I think the config name could be a bit clearer if
we need to have it. Maybe something including "task" like
"task.based.client.ids" or something like that (or change the type to be an
enum and make it something like task.client.ids=[default|task] and leave it
open for extension in the future if needed).

Finally, you have this:

*"Allow overriding client.id  on a per-connector basis"*
>
> This is a much more complex change, and would require individual
> connectors to be updated to support the change. In contrast, the proposed
> approach would immediately allow detailed consumer/producer monitoring for
> all existing connectors.
>

I don't think this is quite accurate. I think the reason to reject is that
for your particular requirement for metrics, it simply doesn't give enough
granularity (there's only one value per entire connector), but it doesn't
require any changes to connectors. The framework allocates all of these and
there are already framework-defined config values that all connectors share
(some for only sinks or sources), so the framework can handle all of this
without changes to connectors. Further, with connector-specific overrides,
you could get task-specific values if interpolation were supported in the
config value (as we now do with managed secrets). For example, it could
support something like client.id=connector-${taskId} and the task ID would
be substituted automatically into the string.
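
In connector-config form that might look like the following (hypothetical
-- neither a per-connector client.id override nor ${taskId} interpolation
exists today):

    # Hypothetical connector config:
    name=my-sink
    tasks.max=4
    client.id=my-sink-${taskId}   # task 2 would get "my-sink-2"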

I don't necessarily like that solution (seems complicated and not a great
user experience), but it could work.

-Ewen




On Thu, Dec 20, 2018 at 5:05 PM Paul Davidson 
wrote:

> Hi everyone,
>
> I would like to start a discussion around the following KIP:
> *
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-411%3A+Add+option+to+make+Kafka+Connect+task+client+ID+values+unique
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-411%3A+Add+option+to+make+Kafka+Connect+task+client+ID+values+unique
> >*
>
> This proposes a small change to allow Kafka Connect the option to
> auto-generate unique client IDs for each task. This enables granular
> monitoring of the producer / consumer client in each task.
>
> Feedback is appreciated, thanks in advance!
>
> Paul
>


Re: [DISCUSS] KIP-382: MirrorMaker 2.0

2019-01-04 Thread Ewen Cheslack-Postava
Hey Ryanne,

Sorry, late to the game here.

On the ACL management, can you explain how things are supposed to work when
you need to migrate to the new cluster? Or is this purely for mirroring,
not for DR and failover cases? In particular, the rules outlined state that
only MM2 would be able to write on the new cluster. But if you have a DR
case, at some point you need to adjust ACLs for the failed-over apps to
write. And presumably you want the original set of write ACLs to apply, so
it'd need to apply them during some failover event?

On the compatibility story, you mention running a compatibility mode. What
does this mode do with ensuring settings match and offsets are reused? I
see in the proposal that we define new settings, but is an existing MM1
config guaranteed to continue working? Does that include custom extensions
like MessageHandlers? I'm not sure I entirely understand the compatibility
story here (which could also be that we just don't provide one -- just want
to make sure it is clear).

I may have missed something in this proposal since it's pretty long, let me
know if there was something obvious I overlooked.

Thanks,
-Ewen

On Mon, Dec 31, 2018 at 12:57 PM Ryanne Dolan  wrote:

> > transactional messages [...] could result in frequent writes to the
> offset mapping topic.
>
> Becket, I think we could limit writes to a max frequency to ameliorate this
> problem.
>
> >  I am wondering if users can just seek by timestamp and get a more
> precise mapping [...]
> >  I assume that MM2 will mirror the timestamps from source to target
> without being changed
>
> Yes, MM2 passes timestamps along, and also any headers. The timestamps are
> useful for measuring replication lag etc, but they are not particularly
> useful for consumer migration (failover etc). I can expound on this further
> if you like, but in practice offsets are better if you have them.
>
> > I don't think this should block MM2 given there are a lot of other
> benefits already
>
> Thanks! I appreciate your support.
> Ryanne
>
> On Fri, Dec 28, 2018 at 9:54 PM Becket Qin  wrote:
>
> > Hi Ryanne,
> >
> > Thanks for the reply. You are right. The topic naming and ACL sanity
> check
> > should probably be a separate discussion.
> >
> > Regarding the offset mapping. Thought about this a bit more. We may need
> to
> > consider the cases such as logs including transactional messages. In that
> > case, the offset mapping may not be contiguous due to the existence of
> the
> > transaction control messages. It could result in frequent writes to the
> > offset mapping topic.
> >
> > I don't think this should block MM2 given there are a lot of other
> benefits
> > already. That said, if we use periodic offset mapping records, I am
> > wondering if users can just seek by timestamp and get a more precise
> > mapping (at millisecond granularity) from the source cluster to the
> target
> > cluster. But admittedly, this approach has its own limitations such as
> > users are expected to use LogAppendTime for the original topic. (BTW, I
> > assume that MM2 will mirror the timestamps from source to target without
> > being changed)
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Thu, Dec 27, 2018 at 1:16 AM Ryanne Dolan 
> > wrote:
> >
> > > Becket, this is great feedback, thanks.
> > >
> > > > having a reserved character for the topics is probably something
> worth
> > > doing in general
> > >
> > > Agreed, but we shouldn't make this a requirement for MM2, or else it
> > > wouldn't work with older versions of Kafka, complicating adoption,
> > testing
> > > etc. In particular, we'll want to prove MM2 against production
> workloads
> > > without first upgrading Kafka brokers.
> > >
> > > I think we should 1) make the separator configurable in MM2, defaulting
> > to
> > > a period for now, 2) in a separate KIP, propose a special separator
> > > character as you suggest, 3) maybe update the default at some point.
> > >
> > > We can endeavor to do this all in the same release, which would have
> the
> > > effect you want.
> > >
> > > > It might be better to add a config like allowAclMismatch to let user
> > > decide what should be the right behavior, i.e. either fail a mirror if
> > ACL
> > > mismatch, or mirror it with different ACLs.
> > >
> > > What would it mean to "fail a mirror"? I think it would be strange if
> > > replication suddenly stopped after someone changes an ACL somewhere.
> > >
> > > I think for users that want ACLs to mismatch, they'd just disable
> > > sync.topic.acls and manage ACLs themselves. I want MM2 to do the right
> > > thing by default, but I don't think it should be responsible for
> > enforcing
> > > policies or protecting against changes beyond its control and purview.
> > >
> > > > seems possible to achieve per message granularity, given that there
> is
> > a
> > > single writer to the remote topic [...] if the returned target offset
> is
> > > different from the expectation [...], MM2 emit a new mapping message to

Re: [DISCUSS] KIP-285: Connect Rest Extension Plugin

2018-05-16 Thread Ewen Cheslack-Postava
Hey,

Sorry for the late follow up. I just had a couple of minor questions about
details:

* Some of the public API being added is under a runtime package, but that
would be new for public API -- currently only internal classes in the
runtime module use that package name. I think changing the package name to just be
under o.a.k.connect.rest or something like that would better keep this
distinction clear and would also help shorten it a bit -- the packages are
getting quite deeply nested with some of the new naming.
* The cluster state classes probably shouldn't be under a rest package.
That's where we're exposing them for public APIs currently, but it's not
really specific to REST stuff in any way. I think we should house those
somewhere more generic so they won't be awkward to reuse if we decided to
(e.g. you could imagine extensions that provide this directly for metrics).
* Currently we have the State classes nested inside ConnectorHealth class.
I think this makes those classes more annoying to use. Is there a reason
for them to be nested or can we just pull them out to the same level as
ConnectorHealth?

-Ewen

On Tue, May 15, 2018 at 9:30 AM Magesh Nandakumar 
wrote:

> Randall- I think I have addressed all the comments. Let me know if we can
> take this to Vote.
>
> Thanks
> Magesh
>
> On Tue, May 8, 2018 at 10:12 PM, Magesh Nandakumar 
> wrote:
>
> > Hi All,
> >
> > I have updated the KIP to reflect changes based on the PR
> > https://github.com/apache/kafka/pull/4931. It mostly has minor changes
> > to the interfaces and includes details on packages for the interfaces and
> > the classes. Let me know your thoughts.
> >
> > Thanks
> > Magesh
> >
> > On Fri, Apr 27, 2018 at 12:03 PM, Randall Hauch 
> wrote:
> >
> >> Great work, Magesh. I like the overall approach a lot, so I left some
> >> pretty nuanced comments about specific details.
> >>
> >> Best regards,
> >>
> >> Randall
> >>
> >> On Wed, Apr 25, 2018 at 3:03 PM, Magesh Nandakumar <
> mage...@confluent.io>
> >> wrote:
> >>
> >> > Thanks Randall for your thoughts. I have created a replica of the
> >> required
> >> > entities in the draft implementation. If you can take a look at the PR
> >> and
> >> > let me know your thoughts, I will update the KIP to reflect the same
> >> >
> >> > https://github.com/apache/kafka/pull/4931
> >> >
> >> > On Tue, Apr 24, 2018 at 11:44 AM, Randall Hauch 
> >> wrote:
> >> >
> >> > > Magesh, I think our last emails crossed in mid-stream.
> >> > >
> >> > > We definitely want to put the new public interfaces/classes in the
> API
> >> > > module, and implementation in the runtime module. Yes, this will
> >> affect
> >> > the
> >> > > design, since for example we don't want to expose runtime types to
> the
> >> > API,
> >> > > and we want to prevent breaking changes. We don't really want to
> move
> >> the
> >> > > REST entities if we don't have to, since that may break projects
> that
> >> are
> >> > > extending the runtime module -- even though the runtime module is
> not
> >> a
> >> > > public API we still want to _try_ not to change things.
> >> > >
> >> > > Do you want to try to create a prototype to see what kind of impact
> >> and
> >> > > choices we'll have to make?
> >> > >
> >> > > Best regards,
> >> > >
> >> > > Randall
> >> > >
> >> > > On Tue, Apr 24, 2018 at 12:48 PM, Randall Hauch 
> >> > wrote:
> >> > >
> >> > > > Thanks for updating the KIP, Magesh. You've resolved all of my
> >> > concerns,
> >> > > > though I have one more: we should specify the package names for
> all
> >> new
> >> > > > interfaces/classes.
> >> > > >
> >> > > > I'm looking forward to more feedback from others.
> >> > > >
> >> > > > Best regards,
> >> > > >
> >> > > > Randall
> >> > > >
> >> > > > On Fri, Apr 20, 2018 at 12:17 AM, Magesh Nandakumar <
> >> > > mage...@confluent.io>
> >> > > > wrote:
> >> > > >
> >> > > >> Hi All,
> >> > > >>
> >> > > >> I have updated the KIP with following changes
> >> > > >>
> >> > > >>1. Expanded the Motivation section
> >> > > >>2. Included details about the interface in the public
> interface
> >> > > section
> >> > > >>3. Modified the config name to rest.extension.classes
> >> > > >>4. Modified the ConnectRestExtension to include Configurable
> >> > instead
> >> > > of
> >> > > >>ResourceConfig
> >> > > >>5. Modified the "Rest Extension Integration with Connect" in
> >> > > "Proposed
> >> > > >>Approach" to include a new Custom implementation for
> >> Configurable
> >> > > >>6. Provided examples for the Java Service provider mechanism
> >> > > >>7. Included a reference implementation in scope
> >> > > >>
> >> > > >> Kindly let me know your thoughts on the updates.
> >> > > >>
> >> > > >> Thanks
> >> > > >> Magesh
> >> > > >>
> >> > > >> On Thu, Apr 19, 2018 at 10:39 AM, Magesh Nandakumar <
> >> > > mage...@confluent.io
> >> > > >> >
> >> > > >> wrote:
> >> > > >>
> >> > > >> > Hi Randall,
> >> > > >> >
> >> > > >> > Thanks for your feedback. I also would like to go with

Re: [DISCUSS] KIP-305: Add Connect primitive number converters

2018-05-17 Thread Ewen Cheslack-Postava
Just a couple of minor points that don't really affect the implementation:

* For nulls, let's just mention the underlying serializers already support
this. I'm actually not sure why they should/need to, but given they do,
let's just defer to that implementation.
* I'm not sure where Float and Double converters are actually useful. The
use cases I know for integer serdes is for keys, but floats seem like a bad
choice for keys. These aren't a lot of overhead to build and maintain, but
if we don't know use cases for the specific types, it might be silly to
spend time and effort building and maintaining them.
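
For example, with the converter classes proposed in the KIP, integer keys
might be configured like this (key converter class name as proposed in the
KIP; the JSON value converter already exists):

    key.converter=org.apache.kafka.connect.converters.IntegerConverter
    value.converter=org.apache.kafka.connect.json.JsonConverter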

Otherwise, this seems simple and straightforward. Generally +1 on the
proposal.

-Ewen

On Thu, May 17, 2018 at 6:04 PM Magesh Nandakumar 
wrote:

> Thanks Randall for the KIP. I think it will be super useful and looks
> pretty straightforward to me.
>
> Thanks
> Magesh
>
> On Thu, May 17, 2018 at 4:15 PM, Randall Hauch  wrote:
>
> > I'd like to start discussion of a very straightforward proposal for
> Connect
> > to add converters for the basic primitive number types: integer, short,
> > long, double, and float. Here is the KIP:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 305%3A+Add+Connect+primitive+number+converters
> >
> > As mentioned in the KIP, I've created a pull request (
> > https://github.com/apache/kafka/pull/5034) for those looking for
> > implementation details.
> >
> > Any feedback is appreciated.
> >
> > Best regards,
> >
> > Randall
> >
>


Re: [VOTE] KIP-297: Externalizing Secrets for Connect Configurations

2018-05-17 Thread Ewen Cheslack-Postava
Thanks for addressing this Robert, it's a pretty common user need.

First, +1 (binding) generally.

Two very minor comments that I think could be clarified but wouldn't affect
votes:

* Let's list in the KIP what package the ConfigProvider,
ConfigChangeCallback, ConfigData and ConfigTransformer interfaces are
defined in. Very, very minor, but given the aim to possibly reuse these
elsewhere, and the fact that they'll likely end up in the common packages,
devs focused more on the common/core packages may have strong opinions about
where they should be. I think it'd definitely be good to get input from
folks focusing on the broker on where they think it should go since I think
it would be very natural to extend this to security settings there. (Also,
I think ConfigData is left out of the list of new interfaces by accident,
but I think it's clear what's being added anyway.)
* I may have glanced past it, but we're not shipping any ConfigProviders
out of the box? This mentions file and vault, but just as examples. Just
want to make sure everyone knows up front that this is a pluggable API, but
you need to add more jars to take advantage of it. I think this is fine as
I don't think there are truly common secrets provider
formats/apis/protocols, just want to make sure it is clear.
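
For illustration, wiring up the file-based example from the KIP might look
like this in a worker config (following the KIP's provider/variable syntax;
a custom provider would be registered the same way, with its jar added to
the worker's classpath):

    # Register a provider under the alias "file":
    config.providers=file
    config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider

    # A connector config can then reference an externalized secret:
    db.password=${file:/etc/connect/secrets.properties:db.password}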

Thanks,
Ewen

On Thu, May 17, 2018 at 6:19 PM Ted Yu  wrote:

> +1
>  Original message From: Magesh Nandakumar <
> mage...@confluent.io> Date: 5/17/18  6:05 PM  (GMT-08:00) To:
> dev@kafka.apache.org Subject: Re: [VOTE] KIP-297: Externalizing Secrets
> for Connect Configurations
> Thanks Robert, this looks great
>
> +1 (non-binding)
>
> On Thu, May 17, 2018 at 5:35 PM, Colin McCabe  wrote:
>
> > Thanks, Robert!
> >
> > +1 (non-binding)
> >
> > Colin
> >
> >
> > On Thu, May 17, 2018, at 14:15, Robert Yokota wrote:
> > > Hi Colin,
> > >
> > > I've changed the KIP to have a composite object returned from get().
> > It's
> > > probably the most straightforward option.  Please let me know if you
> have
> > > any other concerns.
> > >
> > > Thanks,
> > > Robert
> > >
> > > On Thu, May 17, 2018 at 11:44 AM, Robert Yokota 
> > wrote:
> > >
> > > >
> > > >
> > > > Hi Colin,
> > > >
> > > > My last response was not that clear, so let me back up and explain a
> > bit
> > > > more.
> > > >
> > > > Some secret managers, such as Vault (and maybe Keywhiz) have the
> > notion of
> > > > a lease duration or a TTL for a path.  Every path can have a
> different
> > > > TTL.  This is period after which the value of the keys at the given
> > path
> > > > may be invalid.  It can be used to indicate a rotation will be done.
> > In
> > > > the cause of the Vault integration with AWS, Vault will actually
> > delete the
> > > > secrets from AWS at the moment the TTL expires.  A TTL could be used
> by
> > > > other ConfigProviders, such as a FileConfigProvider, to indicate that
> > all
> > > > the secrets at a given path (file), will be rotated on a regular
> basis.
> > > >
> > > > I would like to expose the TTL in the APIs somewhere.  The TTL can be
> > made
> > > > available at the time get() is called.  Connect already has a built
> in
> > > > ScheduledExecutor, so Connect can just use the TTL to schedule a
> > Connector
> > > > restart.  Originally, I had exposed the TTL in a ConfigContext
> > interface
> > > > passed to the get() method.  To reduce the number of APIs, I placed
> it
> > on
> > > > the onChange() method.  This means at the time of get(), onChange()
> > would
> > > > be called with a TTL.  The Connector's implementation of the callback
> > would
> > > > use onChange() with the TTL to schedule a restart.
> > > >
> > > > If you think this is overloading onChange() too much, I could add the
> > > > ConfigContext back to get():
> > > >
> > > >
> > > > Map<String, String> get(ConfigContext ctx, String path);
> > > >
> > > > public interface ConfigContext {
> > > >
> > > > void willExpire(String path, long ttl);
> > > >
> > > > }
> > > >
> > > >
> > > >
> > > > or I could separate out the TTL method in the callback:
> > > >
> > > >
> > > > public interface ConfigChangeCallback {
> > > >
> > > > void willExpire(String path, long ttl);
> > > >
> > > > void onChange(String path, Map<String, String> values);
> > > > }
> > > >
> > > >
> > > >
> > > > Or we could return a composite object from get():
> > > >
> > > > ConfigData get(String path);
> > > >
> > > > public class ConfigData {
> > > >
> > > >   Map<String, String> data;
> > > >   long ttl;
> > > >
> > > > }
> > > >
> > > >
> > > > Do you have a preference Colin?
> > > >
> > > > Thanks,
> > > > Robert
> > > >
> > > >
> > > > On Thu, May 17, 2018 at 9:27 AM, Colin McCabe 
> > wrote:
> > > >
> > > >> Hi Robert,
> > > >>
> > > >> Hmm.  I thought that if you're using ConfigChangeCallback, you are
> > > >> relying on the ConfigProvider to make a callback to you when the
> > > >> configuration has changed.  So isn't that always the "push model"
> > (where
> > > >> the ConfigProvider pushes changes to Connect).  If you want the

Re: [DISCUSS] KIP-285: Connect Rest Extension Plugin

2018-05-17 Thread Ewen Cheslack-Postava
Yup, thanks for the changes. The 'health' package in particular feels like
a nice fit given the way we expect it to be used.

-Ewen

On Wed, May 16, 2018 at 7:02 PM Randall Hauch  wrote:

> Looks good to me. Thanks for quickly making the changes! Great work!
>
> Best regards,
>
> Randall
>
> > On May 16, 2018, at 8:07 PM, Magesh Nandakumar 
> wrote:
> >
> > Randall,
> >
> > I have adjusted the package names per Ewen's suggestions and also made
> some
> > minor edits per your suggestions. Since there are no major outstanding
> > issues, i'm moving this to vote.
> >
> > Thanks
> > Magesh
> >
> >> On Wed, May 16, 2018 at 4:38 PM, Randall Hauch 
> wrote:
> >>
> >> A few very minor suggestions:
> >>
> >>
> >>   1. There are a few formatting issues with paragraphs that use a
> >>   monospace font. Minor, but it would be nice to fix.
> >>   2. Would be nice to link to the PR
> >>   3. Do we need the org.apache.kafka.connect.rest.extension.entities
> >>   package? Could we just move the two classes into the parent
> >>   org.apache.kafka.connect.rest.extension package?
> >>   4. This sentence "The above approach helps alleviate any issues that
> >>   could arise if Extension accidentally reregister the" is cut off.
> >>   5. The "ConnectRestExtensionContext.configure(...)" method's JavaDoc
> >>   should describe the behaviors that are mentioned in the "Rest
> Extension
> >>   Integration with Connect" section; e.g., behavior when an extension
> >> adds a
> >>   resource that is already registered, whether unregistering works, etc.
> >>   Also, ideally the "close()" method would have JavaDoc that explained
> >> when
> >>   it is called (e.g., no other methods will be called on the extension
> >> after
> >>   this, etc.).
> >>   6. Packaging requirements are different for this component vs
> >>   connectors, transformations, and converters, since this now mandates
> the
> >>   Service Loader manifest file. This should be called out more
> explicitly.
> >>   7. It'd be nice if the example included how extension-specific config
> >>   properties are to be defined in the worker configuration file.
> >>
> >> As I said, these are all minor suggestions that only affect the KIP
> >> document. Once these are fixed, I think this is ready to move to voting.
> >>
> >> Best regards,
> >>
> >> Randall
> >>
> >> On Tue, May 15, 2018 at 11:30 AM, Magesh Nandakumar <
> mage...@confluent.io>
> >> wrote:
> >>
> >>> Randall- I think I have addressed all the comments. Let me know if we
> can
> >>> take this to Vote.
> >>>
> >>> Thanks
> >>> Magesh
> >>>
> >>> On Tue, May 8, 2018 at 10:12 PM, Magesh Nandakumar <
> mage...@confluent.io
> >>>
> >>> wrote:
> >>>
>  Hi All,
> 
>  I have updated the KIP to reflect changes based on the PR
>  https://github.com/apache/kafka/pull/4931. It mostly has minor
> >> changes
>  to the interfaces and includes details on packages for the interfaces
> >> and
>  the classes. Let me know your thoughts.
> 
>  Thanks
>  Magesh
> 
>  On Fri, Apr 27, 2018 at 12:03 PM, Randall Hauch 
> >>> wrote:
> 
> > Great work, Magesh. I like the overall approach a lot, so I left some
> > pretty nuanced comments about specific details.
> >
> > Best regards,
> >
> > Randall
> >
> > On Wed, Apr 25, 2018 at 3:03 PM, Magesh Nandakumar <
> >>> mage...@confluent.io>
> > wrote:
> >
> >> Thanks Randall for your thoughts. I have created a replica of the
> > required
> >> entities in the draft implementation. If you can take a look at the
> >> PR
> > and
> >> let me know your thoughts, I will update the KIP to reflect the same
> >>
> >> https://github.com/apache/kafka/pull/4931
> >>
> >> On Tue, Apr 24, 2018 at 11:44 AM, Randall Hauch 
> > wrote:
> >>
> >>> Magesh, I think our last emails crossed in mid-stream.
> >>>
> >>> We definitely want to put the new public interfaces/classes in the
> >>> API
> >>> module, and implementation in the runtime module. Yes, this will
> > affect
> >> the
> >>> design, since for example we don't want to expose runtime types to
> >>> the
> >> API,
> >>> and we want to prevent breaking changes. We don't really want to
> >>> move
> > the
> >>> REST entities if we don't have to, since that may break projects
> >>> that
> > are
> >>> extending the runtime module -- even though the runtime module is
> >>> not
> > a
> >>> public API we still want to _try_ not to change things.
> >>>
> >>> Do you want to try to create a prototype to see what kind of
> >> impact
> > and
> >>> choices we'll have to make?
> >>>
> >>> Best regards,
> >>>
> >>> Randall
> >>>
> >>> On Tue, Apr 24, 2018 at 12:48 PM, Randall Hauch  >>>
> >> wrote:
> >>>
>  Thanks for updating the KIP, Magesh. You've resolved all of my
> >> concerns,
>  though I have one more: we should specify the pa

Re: [VOTE] KIP-285: Connect Rest Extension Plugin

2018-05-17 Thread Ewen Cheslack-Postava
+1 (binding)

Thanks,
Ewen

On Thu, May 17, 2018 at 12:16 PM Ted Yu  wrote:

> +1
>  Original message From: Gwen Shapira 
> Date: 5/17/18  12:02 PM  (GMT-08:00) To: dev 
> Subject: Re: [VOTE] KIP-285: Connect Rest Extension Plugin
> LGTM. +1.
>
> On Wed, May 16, 2018 at 8:19 PM, Magesh Nandakumar 
> wrote:
>
> > Hello everyone,
> >
> > After a good round of discussions with excellent feedback and no major
> > objections, I would like to start a vote on KIP-285: Connect Rest
> Extension
> > Plugin.
> >
> > KIP: <
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 285%3A+Connect+Rest+Extension+Plugin
> > >
> >
> >
> > JIRA: https://issues.apache.org/jira/browse/KAFKA-6776
> >
> > Discussion thread: <
> > https://www.mail-archive.com/dev@kafka.apache.org/msg86910.html>
> >
> > Thanks,
> > Magesh
> >
>
>
>
> --
> *Gwen Shapira*
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter  | blog
> 
>


Re: [DISCUSS] KIP-298: Error Handling in Connect

2018-05-17 Thread Ewen Cheslack-Postava
A few more thoughts -- might not change things enough to affect a vote, but
still some things to consider:

* errors.retry.delay.max.ms -- this defines the max, but I'm not seeing
where we define the actual behavior. Is this intentional, or should we just
say that it is something like exponential, based on a starting delay value?
* I'm not sure I understand tolerance vs retries? They sound generally the
same -- tolerance sounds like # of retries since it is defined in terms of
failures.
* errors.log.enable -- it's unclear why this shouldn't just be
errors.log.include.configs
|| errors.log.include.messages (and include clauses for any other flags).
If there's just some base info, that's fine, but the explanation of the
config should make that clear.
* errors.deadletterqueue.enable - similar question here about just enabling
based on other relevant configs. seems like additional config complexity
for users when the topic name is absolutely going to be a basic requirement
anyway.
* more generally related to dlq, it seems we're trying to support multiple
clusters here -- is there a reason for this? it's not that costly, but one
thing supporting this requires is an entirely separate set of configs,
ACLs, etc. in contrast, assuming an additional topic on the same cluster
we're already working with keeps things quite simple. do we think this
complexity is worth it? elsewhere, we've seen the complexity of multiple
clusters result in a lot of config confusion.
* It's not obvious throughout that the format is JSON, and I assume in many
cases it uses JsonConverter. This should be clear at the highest level, not
just in the case of things like SchemaAndValue fields. This also seems to
introduce possible complications for DLQs -- instead of delivering the raw
data, we potentially lose raw data & schema info because we're rendering it
as JSON. Not sure that's a good idea...

I think that last item might be the biggest concern to me -- DLQ formats
and control over content & reprocessing seems a bit unclear to me here, so
I'd assume users could also end up confused.
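
For concreteness, here is roughly how the draft configs discussed above
might appear in a sink connector config (names from the KIP draft at this
point in the discussion; several were still in flux, and the DLQ topic-name
key is a hypothetical placeholder):

    errors.retries.limit=10
    errors.retry.delay.max.ms=60000
    errors.tolerance.rate.limit=5
    errors.log.enable=true
    errors.log.include.messages=true
    errors.deadletterqueue.enable=true
    # hypothetical name for the (required) DLQ topic setting:
    errors.deadletterqueue.topic.name=my-connector-errors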

-Ewen


On Thu, May 17, 2018 at 8:53 PM Arjun Satish  wrote:

> Konstantine,
>
> Thanks for pointing out the typos. Fixed them.
>
> I had added the JSON schema which should now include key and header configs
> in there too. This should have been in the public interfaces section.
>
> Thanks very much,
>
> On Thu, May 17, 2018 at 9:13 AM, Konstantine Karantasis <
> konstant...@confluent.io> wrote:
>
> > Thanks Arjun for your quick response.
> >
> > Adding an example for the failure log improves things, but I think it'd
> be
> > better to also add the schema definition of these Json entries. And I'll
> > agree with Magesh that this format should be public API.
> >
> > Also, does the current example have a copy/paste typo? Seems that the
> > TRANSFORMATION stage in the end has the config of a converter.
> > Similar to the above, fields for 'key' and 'headers' (and their
> conversion
> > stages) are skipped when they are not defined? Or should they present and
> > empty? A schema definition would help to know what a consumer of such
> logs
> > should expect.
> >
> > Also, thanks for adding some info for error on the source side. However,
> I
> > feel the current description might be a little bit ambiguous. I read:
> > "For errors in a source connector, the process is similar, but care needs
> > to be taken while writing back to the source." and sounds like it's
> > suggested that Connect will write records back to the source, which can't
> > be correct.
> >
> > Finally, a nit: " adds store the row information "... typo?
> >
> > Thanks,
> > - Konstantine
> >
> >
> >
> > On Thu, May 17, 2018 at 12:48 AM, Arjun Satish 
> > wrote:
> >
> > > On Wed, May 16, 2018 at 7:13 PM, Matt Farmer  wrote:
> > >
> > > > Hey Arjun,
> > > >
> > > > I like deadletterqueue all lower case, so I'm +1 on that.
> > > >
> > >
> > > Super! updated the KIP.
> > >
> > >
> > > >
> > > > Yes, in the case we were seeing there were external system failures.
> > > > We had issues connecting to S3. While the connector does include
> > > > some retry functionality, however setting these values sufficiently
> > > > high seemed to cause us to hit timeouts and cause the entire
> > > > task to fail anyway. (I think I was using something like 100 retries
> > > > during the brief test of this behavior?)
> > > >
> > >
> > > I am guessing these issues come up when trying to write to S3. Do you
> > think
> > > the S3 connector can detect the safe situations where it can throw
> > > RetriableExceptions instead of ConnectExceptions here (when the
> connector
> > > think it is safe to do so)?
> > >
> > >
> > > >
> > > > Yeah, totally understand that there could be unintended concequences
> > > > from this. I guess the use case I'm trying to optimize for is to give
> > > > folks some bubblegum to keep a high volume system limping
> > > > along until the software engineers get time to address it. So I'm
> > > > im

Re: [DISCUSS] KIP-305: Add Connect primitive number converters

2018-05-18 Thread Ewen Cheslack-Postava
Yeah, the usefulness of short seems questionable, but int is probably a
large enough range for some identifiers (e.g. we use an int in schema
registry). But yeah, I don't really have a problem with having Converters
for each of the existing serdes just for symmetry and since presumably
somebody finds them useful for something if they exist.

-Ewen

On Fri, May 18, 2018 at 11:55 AM Randall Hauch  wrote:

> Thanks, Ewen.
>
> You make several good points, and I've updated the KIP to hopefully address
> your comments. I think the symmetry with the Kafka serdes is useful, so
> I've kept all 5 converters in the KIP.
>
> Interestingly, perhaps the short and int converters (with the reduced
> ranges) are not necessarily that useful for keys either.
>
> Regards,
>
> Randall
>
> On Thu, May 17, 2018 at 10:08 PM, Ewen Cheslack-Postava  >
> wrote:
>
> > Just a couple of minor points that don't really affect the
> implementation:
> >
> > * For nulls, let's just mention the underlying serializers already
> support
> > this. I'm actually not sure why they should/need to, but given they do,
> > let's just defer to that implementation.
> > * I'm not sure where Float and Double converters are actually useful. The
> > use cases I know for integer serdes is for keys, but floats seem like a
> bad
> > choice for keys. These aren't a lot of overhead to build and maintain,
> but
> > if we don't know use cases for the specific types, it might be silly to
> > spend time and effort building and maintaining them.
> >
> > Otherwise, this seems simple and straightforward. Generally +1 on the
> > proposal.
> >
> > -Ewen
> >
> > On Thu, May 17, 2018 at 6:04 PM Magesh Nandakumar 
> > wrote:
> >
> > > Thanks Randall for the KIP. I think it will be super useful and looks
> > > pretty straightforward to me.
> > >
> > > Thanks
> > > Magesh
> > >
> > > On Thu, May 17, 2018 at 4:15 PM, Randall Hauch 
> wrote:
> > >
> > > > I'd like to start discussion of a very straightforward proposal for
> > > Connect
> > > > to add converters for the basic primitive number types: integer,
> short,
> > > > long, double, and float. Here is the KIP:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 305%3A+Add+Connect+primitive+number+converters
> > > >
> > > > As mentioned in the KIP, I've created a pull request (
> > > > https://github.com/apache/kafka/pull/5034) for those looking for
> > > > implementation details.
> > > >
> > > > Any feedback is appreciated.
> > > >
> > > > Best regards,
> > > >
> > > > Randall
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-298: Error Handling in Connect

2018-05-21 Thread Ewen Cheslack-Postava
Arjun,

Understood on retries vs tolerance -- though I suspect this will end up
being a bit confusing to users as well. It's two levels of error handling
which is what tripped me up.

One last comment on KIP (which otherwise looks good): for the tolerance
setting, do we want it to be an absolute value or something like a
percentage? Given the current way of setting things, I'm not sure I'd ever
set it to anything but -1 or 0, with maybe 1 as an easy option for
restarting a connector to get it past one bad message, then reverting back
to -1 or 0.

-Ewen

On Mon, May 21, 2018 at 11:01 AM Arjun Satish 
wrote:

> Hey Jason,
>
> Thanks for your comments. Please find answers inline:
>
> On Mon, May 21, 2018 at 9:52 AM, Jason Gustafson 
> wrote:
>
> > Hi Arjun,
> >
> > Thanks for the KIP. Just a few comments/questions:
> >
> > 1. The proposal allows users to configure the number of retries. I
> usually
> > find it easier as a user to work with timeouts since it's difficult to
> know
> > how long a retry might take. Have you considered adding a timeout option
> > which would retry until the timeout expires?
> >
>
> Good point. Updated the KIP.
>
> 2. The configs are named very generically (e.g. errors.retries.limit). Do
> > you think it will be clear to users what operations these configs apply
> to?
> >
>
> As of now, these configs are applicable to all operations in the connector
> pipeline (as mentioned in the proposed changes section). We decided not to
> have per operation limit because of the additional config complexity.
>
>
> > 3. I wasn't entirely clear what messages are stored in the dead letter
> > queue. It sounds like it includes both configs and messages since we have
> > errors.dlq.include.configs? Is there a specific schema you have in mind?
> >
>
> This has been addressed in the KIP. The DLQ will now contain only raw
> messages (no additional context). We are also supporting DLQs only for sink
> connectors now.
>
>
> > 4. I didn't see it mentioned explicitly in the KIP, but I assume the
> > tolerance metrics are reset after every task rebalance?
> >
>
> Great question. Yes, we will reset the tolerance metrics on every
> rebalance.
>
>
> > 5. I wonder if we can do without errors.tolerance.limit. You can get a
> > similar effect using errors.tolerance.rate.limit if you allow longer
> > durations. I'm not sure how useful an absolute counter is in practice.
> >
>
> Yeah, the rate limit does subsume the features offered by the absolute
> counter. Removed it.
>
>
> >
> > Thanks,
> > Jason
> >
> >
>


Re: [VOTE] KIP-298: Error Handling in Connect kafka

2018-05-21 Thread Ewen Cheslack-Postava
+1 binding. I had one last comment in the DISCUSS thread, but not really a
blocker.

-Ewen

On Mon, May 21, 2018 at 9:48 AM Matthias J. Sax 
wrote:

> +1 (binding)
>
>
>
> On 5/21/18 9:30 AM, Randall Hauch wrote:
> > Thanks, Arjun. +1 (non-binding)
> >
> > Regards,
> > Randall
> >
> > On Mon, May 21, 2018 at 11:14 AM, Guozhang Wang 
> wrote:
> >
> >> Thanks for the KIP. +1 (binding)
> >>
> >>
> >> Guozhang
> >>
> >> On Fri, May 18, 2018 at 3:36 PM, Gwen Shapira 
> wrote:
> >>
> >>> +1
> >>>
> >>> Thank you! Error handling in Connect will be a huge improvement.
> >>>
> >>> On Thu, May 17, 2018, 1:58 AM Arjun Satish 
> >> wrote:
> >>>
>  All,
> 
>  Many thanks for all the feedback on KIP-298. Highly appreciate the
> time
> >>> and
>  effort you all put into it.
> 
>  I've updated the KIP accordingly, and would like to start to start a
> >> vote
>  on it.
> 
>  KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>  298%3A+Error+Handling+in+Connect
>  JIRA: https://issues.apache.org/jira/browse/KAFKA-6738
>  Discussion Thread: https://www.mail-archive.com/
>  dev@kafka.apache.org/msg87660.html
> 
>  Thanks very much!
> 
> >>>
> >>
> >>
> >>
> >> --
> >> -- Guozhang
> >>
> >
>
>


Re: [DISCUSS] KIP-298: Error Handling in Connect

2018-05-21 Thread Ewen Cheslack-Postava
On Mon, May 21, 2018 at 12:39 PM Arjun Satish 
wrote:

> Thanks a lot, Ewen! I'll make sure the documentation is clear on the
> differences between retries and tolerance.
>
> Do you think percentage would have the same problem as the one you brought
> up? Also, if we say 10% tolerance, do we have to wait for the duration to
> finish before failing the task, or should we fail as soon as we hit 10%
> error.
>
Yeah, percent might not be right either. I'd probably only reasonably set
it to 0% or 100% as something in between seems difficult to justify.


>
> Alternatively, do you think making tolerance an Enum would be simpler?
> Where it's values are NONE (any errors kill), ALL (tolerate all errors and
> skip records) and FIRST (tolerate the first error, but fail after that)?
>

I do think the values I would ever use are limited enough to just be an
enum. Not sure if anyone has use cases for larger positive values.
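
A minimal sketch of that enum option (purely illustrative; names and values
were not settled):

    // Illustrative only -- not part of the KIP as written.
    public enum ErrorTolerance {
        NONE,   // any error fails the task
        FIRST,  // tolerate the first error, fail after that
        ALL     // tolerate all errors, skipping bad records
    }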

-Ewen


>
> Best,
>
>
> On Mon, May 21, 2018 at 11:28 AM, Ewen Cheslack-Postava  >
> wrote:
>
> > Arjun,
> >
> > Understood on retries vs tolerance -- though I suspect this will end up
> > being a bit confusing to users as well. It's two levels of error handling
> > which is what tripped me up.
> >
> > One last comment on KIP (which otherwise looks good): for the tolerance
> > setting, do we want it to be an absolute value or something like a
> > percentage? Given the current way of setting things, I'm not sure I'd
> ever
> > set it to anything but -1 or 0, with maybe 1 as an easy option for
> > restarting a connector to get it past one bad message, then reverting
> back
> > to -1 or 0.
> >
> > -Ewen
> >
> > On Mon, May 21, 2018 at 11:01 AM Arjun Satish 
> > wrote:
> >
> > > Hey Jason,
> > >
> > > Thanks for your comments. Please find answers inline:
> > >
> > > On Mon, May 21, 2018 at 9:52 AM, Jason Gustafson 
> > > wrote:
> > >
> > > > Hi Arjun,
> > > >
> > > > Thanks for the KIP. Just a few comments/questions:
> > > >
> > > > 1. The proposal allows users to configure the number of retries. I
> > > usually
> > > > find it easier as a user to work with timeouts since it's difficult
> to
> > > know
> > > > how long a retry might take. Have you considered adding a timeout
> > option
> > > > which would retry until the timeout expires?
> > > >
> > >
> > > Good point. Updated the KIP.
> > >
> > > 2. The configs are named very generically (e.g. errors.retries.limit).
> Do
> > > > you think it will be clear to users what operations these configs
> apply
> > > to?
> > > >
> > >
> > > As of now, these configs are applicable to all operations in the
> > connector
> > > pipeline (as mentioned in the proposed changes section). We decided not
> > to
> > > have per operation limit because of the additional config complexity.
> > >
> > >
> > > > 3. I wasn't entirely clear what messages are stored in the dead
> letter
> > > > queue. It sounds like it includes both configs and messages since we
> > have
> > > > errors.dlq.include.configs? Is there a specific schema you have in
> > mind?
> > > >
> > >
> > > This has been addressed in the KIP. The DLQ will now contain only raw
> > > messages (no additional context). We are also supporting DLQs only for
> > sink
> > > connectors now.
> > >
> > >
> > > > 4. I didn't see it mentioned explicitly in the KIP, but I assume the
> > > > tolerance metrics are reset after every task rebalance?
> > > >
> > >
> > > Great question. Yes, we will reset the tolerance metrics on every
> > > rebalance.
> > >
> > >
> > > > 5. I wonder if we can do without errors.tolerance.limit. You can get
> a
> > > > similar effect using errors.tolerance.rate.limit if you allow longer
> > > > durations. I'm not sure how useful an absolute counter is in
> practice.
> > > >
> > >
> > > Yeah, the rate limit does subsume the features offered by the absolute
> > > counter. Removed it.
> > >
> > >
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-304: Connect runtime mode improvements for container platforms

2018-05-21 Thread Ewen Cheslack-Postava
Hey all,

I think think this is a great discussion, and is helping to clarify the
real pain points as well as explore a few more options than just what was
initially proposed.

Stephane, I think why you're ending up in "grand redesign" state is because
you're highlighting (and the KIP's motivation section examples of pain
points) are actually a set of a couple of different high level challenges:


   - Impact of shared resources, especially rebalancing but also just
   shared process space/memory/etc
   - Kafka security credentials / shared client configs
   - How configs are specified in the two modes

I actually think there isn't much in the identified problems that is
unique to containers or k8s/mesos; those are just environments where you'll
commonly run into these challenges.

I want to cover each of these issues to see if we can improve them in a
more incremental way.

*Impact of shared resources, especially rebalancing but also just shared
process space/memory/etc*
There are 2 aspects to this. The first is rebalancing. There's actually
nothing that fundamentally requires every rebalance operation to be
stop-the-world. I've had some thoughts about allowing partial and
incremental rebalancing floating around for awhile. As Connect clusters are
growing larger, this is becoming increasingly important. I don't have a
full write up ready, but it's fixable by making a straightforward change
(and we planned for the ability to make changes like this, so it absolutely
should be possible to do and apply to existing clusters). The basic idea is
to change what starting a rebalance means for ownership of resources --
instead of assuming everything might be rearranged, you assume by default
that you will continue to own everything you currently do and continue
processing through the rebalance. The leader who handles assignment can
then decide what set of rebalances really are required and include that
info in the data it sends back. If any rebalancing is required, do a
subsequent round of rebalancing where you actually give up those resources
if they were assigned to you. This gives you a way to do partial rebalances
and only as needed. You can further extend this in a variety of ways, e.g.
only rebalancing one resource per rebalance, doing a bit of "scheduling" to
spread out the impact, etc. This is definitely not a trivial
change/addition to make and will require very thorough testing, but it's
definitely feasible for 1 person to implement themselves pretty quickly.

The second aspect is shared resources with no control over different
connectors' usage (and the fact that someone doing something bad might mean
all connectors crash with the workers). Containers are great for this,
giving you more granularity than VMs or physical hosts, but either one works.
If you want to use any of these solutions, you are going to have to have
separate Connect clusters. There are some pain points if you go this route,
but I think some you can just live with (e.g. 3 topics per connector isn't
actually that insane), some are fixable and just not there yet (e.g. the
upgrade example affecting the whole cluster is also fixable by just
supporting dynamically loading in new jars instead of requiring a restart).

*Kafka security credentials / shared client configs*
This is definitely one we hear from time to time. In general, there's no
reason we couldn't just expose security configs in the connector config. I
have pushed back on just exposing all configs because a) I don't think
users *should* have control over them, b) in a shared cluster, some
represent potentially serious stability threats (e.g. I override buffer
sizes and OOM everything), and c) it's actually a compatibility concern.
However, I don't think that means we shouldn't consider exposing some of
them if there are good use cases (such as here, where you might want to
lock down different connectors from each other via ACLs).

For the purposes of this discussion, I don't really think this is the thing
to focus on -- given you're already committing to having separate security
configs for each connector, it seems like whether you configure multiple
clusters vs configure them per connector isn't that much different.

*How configs are specified in the two modes*
This one is, I think, the most interesting because it has been requested a
few times before and I'm not sure anyone has actually worked through
semantics that are consistent and handle all the different requirements. Of
all the stuff mentioned so far in this discussion, this is the piece that
is really the biggest pain point that's not easy to work around today --
even if you split out to separate containers per connector, you still want
distributed mode because a single connector may need to be spread across
multiple containers. However, the default way for apps to be configured is
for them to get config files/parameters directly with each process, and so
it seems nice if you are only running 1 connector per cluster, to trea

Re: [DISCUSS] KIP-305: Add Connect primitive number converters

2018-05-21 Thread Ewen Cheslack-Postava
Sorry, probably not ideal to just be seeing this now given the KIP deadline,
but in general, for straightforward ones where we expect things to be
uncontroversial, I think it's fine to kick off a vote thread. Worst case we
have people re-file votes if something substantial changes in the proposal.

-Ewen

On Sat, May 19, 2018 at 6:20 AM Randall Hauch  wrote:

> Considering this KIP is straightforward, what do you think about kicking
> off a vote? Or does it need more discussion time?
>
> Regards,
> Randall
>
> > On May 18, 2018, at 4:30 PM, Ewen Cheslack-Postava 
> wrote:
> >
> > Yeah, the usefulness of short seems questionable, but int is probably a
> > large enough range for some identifiers (e.g. we use an int in schema
> > registry). But yeah, I don't really have a problem with having Converters
> > for each of the existing serdes just for symmetry and since presumably
> > somebody finds them useful for something if they exist.
> >
> > -Ewen
> >
> >> On Fri, May 18, 2018 at 11:55 AM Randall Hauch 
> wrote:
> >>
> >> Thanks, Ewen.
> >>
> >> You make several good points, and I've updated the KIP to hopefully
> address
> >> your comments. I think the symmetry with the Kafka serdes is useful, so
> >> I've kept all 5 converters in the KIP.
> >>
> >> Interestingly, perhaps the short and int converters (with the reduced
> >> ranges) are not necessarily that useful for keys either.
> >>
> >> Regards,
> >>
> >> Randall
> >>
> >> On Thu, May 17, 2018 at 10:08 PM, Ewen Cheslack-Postava <
> e...@confluent.io
> >>>
> >> wrote:
> >>
> >>> Just a couple of minor points that don't really affect the
> >> implementation:
> >>>
> >>> * For nulls, let's just mention the underlying serializers already
> >> support
> >>> this. I'm actually not sure why they should/need to, but given they do,
> >>> let's just defer to that implementation.
> >>> * I'm not sure where Float and Double converters are actually useful.
> The
> >>> use cases I know for integer serdes is for keys, but floats seem like a
> >> bad
> >>> choice for keys. These aren't a lot of overhead to build and maintain,
> >> but
> >>> if we don't know use cases for the specific types, it might be silly to
> >>> spend time and effort building and maintaining them.
> >>>
> >>> Otherwise, this seems simple and straightforward. Generally +1 on the
> >>> proposal.
> >>>
> >>> -Ewen
> >>>
> >>> On Thu, May 17, 2018 at 6:04 PM Magesh Nandakumar <
> mage...@confluent.io>
> >>> wrote:
> >>>
> >>>> Thanks Randall for the KIP. I think it will be super useful and looks
> >>>> pretty straightforward to me.
> >>>>
> >>>> Thanks
> >>>> Magesh
> >>>>
> >>>> On Thu, May 17, 2018 at 4:15 PM, Randall Hauch 
> >> wrote:
> >>>>
> >>>>> I'd like to start discussion of a very straightforward proposal for
> >>>> Connect
> >>>>> to add converters for the basic primitive number types: integer,
> >> short,
> >>>>> long, double, and float. Here is the KIP:
> >>>>>
> >>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-305%3A+Add+Connect+primitive+number+converters
> >>>>>
> >>>>> As mentioned in the KIP, I've created a pull request (
> >>>>> https://github.com/apache/kafka/pull/5034) for those looking for
> >>>>> implementation details.
> >>>>>
> >>>>> Any feedback is appreciated.
> >>>>>
> >>>>> Best regards,
> >>>>>
> >>>>> Randall
> >>>>>
> >>>>
> >>>
> >>
>
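
For readers skimming the archive: the converters discussed above boil down
to a one-line Connect worker or connector setting. A hedged usage sketch
(class names follow the KIP's proposal):

    key.converter=org.apache.kafka.connect.converters.IntegerConverter
    value.converter=org.apache.kafka.connect.json.JsonConverter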


Re: [VOTE] KIP-305: Add Connect primitive number converters

2018-05-22 Thread Ewen Cheslack-Postava
+1 (binding)

-Ewen

On Tue, May 22, 2018 at 9:29 AM Ted Yu  wrote:

> +1
>
> On Tue, May 22, 2018 at 9:19 AM, Randall Hauch  wrote:
>
> > I'd like to start a vote of a very straightforward proposal for Connect
> to
> > add converters for the basic primitive number types: integer, short,
> long,
> > double, and float that reuse Kafka's corresponding serdes. Here is the
> KIP:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-305%3A+Add+Connect+primitive+number+converters
> >
> >
> > Best regards,
> >
> > Randall
> >
>


Re: [EXTERNAL] Kafka Connect: New Kafka Source Connector

2018-05-22 Thread Ewen Cheslack-Postava
Sorry for the delay, didn't see this until now. I've given you edit
permissions on the wiki.

-Ewen

On Tue, May 15, 2018 at 2:21 PM McCaig, Rhys 
wrote:

> Hi Team,
>
> Would someone be able to provide me with Confluence permission in order to
> write a KIP for the below code.
>
> User: https://cwiki.apache.org/confluence/display/~mccaig
>
> Cheers,
> Rhys
>
> On May 11, 2018, at 4:45 PM, McCaig, Rhys  wrote:
>
> Hi there,
>
> Over at Comcast we just open sourced a Kafka source connector for Kafka
> Connect. (https://github.com/Comcast/MirrorTool-for-Kafka-Connect) We’ve
> used this as an alternative to MirrorMaker on a couple of projects.
> While discussing open sourcing the project, we realized that the
> functionality is similar to a connector that was suggested in the original
> Kafka Connect KIP (
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767#KIP-26-AddKafkaConnectframeworkfordataimport/export-MirrorMaker
> ).
>
> Given this - we we’re wondering if there would be interest from the Kafka
> community in adopting the connector into the main Kafka codebase. We’d be
> more than happy to donate the code and help get it integrated.
>
> Cheers,
> Rhys McCaig
>
>


Re: [VOTE] KIP-176: Remove deprecated new-consumer option for tools

2018-05-25 Thread Ewen Cheslack-Postava
+1 (binding)

Just a follow up on the existing version of the KIP, so nothing new here.
Possibly a bit disruptive given how quickly the 1.0 -> 2.0 jump happened, but
it's the right time to remove it.

-Ewen

On Thu, May 24, 2018 at 8:13 AM Viktor Somogyi 
wrote:

> +1 (non-binding)
>
> I like this KIP. The new-consumer option is actually sometimes confusing,
> commands could work without it and 2.0.0 seems like a good time to do it.
> We should make sure to add enough tests to provide coverage.
>
> On Wed, May 23, 2018 at 7:32 PM, Ted Yu  wrote:
>
> > lgtm
> >
> > On Wed, May 23, 2018 at 9:04 AM, Paolo Patierno 
> > wrote:
> >
> > > Sorry ... I hope it's not too late but I created the KIP-176 on
> September
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-176%3A+Remove+deprecated+new-consumer+option+for+tools
> > >
> > > but due to be a breaking change, I needed to wait for a major release
> ...
> > > and the right time is now.
> > > Can you vote for that adding to the release plan, please ?
> > >
> > > Thanks,
> > >
> > > Paolo Patierno
> > > Principal Software Engineer (IoT) @ Red Hat
> > > Microsoft MVP on Azure & IoT
> > > Microsoft Azure Advisor
> > >
> > > Twitter : @ppatierno
> > > Linkedin : paolopatierno
> > > Blog : DevExperience
> > >
> >
>


Re: [Vote] KIP-321: Update TopologyDescription to better represent Source and Sink Nodes

2018-07-31 Thread Ewen Cheslack-Postava
Generally +1 (binding)

It would be helpful to just provide the full, updated interfaces in the KIP
and mark things as new with comments if needed. I had to go back and read
the discussion thread to make sure I was understanding the intent correctly.

Damian -- if we make that Optional, shouldn't the methods on Source also be
Optional types?

-Ewen
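
To make the question concrete, a hedged sketch of the two API shapes under
discussion (names follow the KIP thread; this is not the final interface):

    import java.util.Optional;
    import org.apache.kafka.streams.processor.TopicNameExtractor;

    // As proposed in the KIP: nullable returns on the Sink description.
    interface Sink {
        String topic();                                // null if an extractor is used
        TopicNameExtractor<?, ?> topicNameExtractor(); // null if a static topic is used
    }

    // Damian's suggestion, which would imply the same treatment on Source:
    interface SinkWithOptional {
        Optional<String> topic();
        Optional<TopicNameExtractor<?, ?>> topicNameExtractor();
    }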

On Mon, Jul 30, 2018 at 11:13 PM Damian Guy  wrote:

> Hi Nishanth,
>
> I have one nit on the KIP. I think the topicNameExtractor method should
> return Optional rather than null.
> Sorry I'm late here.
>
> Thanks,
> Damian
>
> On Tue, 31 Jul 2018 at 01:14 Nishanth Pradeep 
> wrote:
>
> > We need one more binding vote.
> >
> > Binding Votes:
> >
> >- Matthias J. Sax
> >- Guozhang Wong
> >
> > Community Votes:
> >
> >- Bill Bejeck
> >- Ted Yu
> >
> > Best,
> > Nishanth Pradeep
> >
> > On Fri, Jul 27, 2018 at 10:02 AM Bill Bejeck  wrote:
> >
> > > Thanks for the KIP!
> > >
> > > +1
> > >
> > > -Bill
> > >
> > > On Thu, Jul 26, 2018 at 2:39 AM Guozhang Wang 
> > wrote:
> > >
> > > > +1
> > > >
> > > > On Wed, Jul 25, 2018 at 11:13 PM, Matthias J. Sax <
> > matth...@confluent.io
> > > >
> > > > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > -Matthias
> > > > >
> > > > > On 7/25/18 7:47 PM, Ted Yu wrote:
> > > > > > +1
> > > > > >
> > > > > > On Wed, Jul 25, 2018 at 7:24 PM Nishanth Pradeep <
> > > > nishanth...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > >> Hello,
> > > > > >>
> > > > > >> I'm calling a vote for KIP-321:
> > > > > >>
> > > > > >>
> > > > > >>
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-321%3A+Update+TopologyDescription+to+better+represent+Source+and+Sink+Nodes
> > > > > >>
> > > > > >> Best,
> > > > > >> Nishanth Pradeep
> > > > > >>
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
>


Re: [VOTE] KIP-243: Make ProducerConfig and ConsumerConfig constructors public

2018-01-08 Thread Ewen Cheslack-Postava
+1 (binding), though I still think the Map should be  instead of
.

I also think it's better to just expose the defaults as constants on the
class. Apparently there was discussion of this before and the concern is
that people write code that relies on the default configs and we might break
their code if we change it. I don't really buy this, as using the constant
allows you to symbolically reference the value rather than making your
own copy of it. Usually if we change a default like that there is an
important reason why and having the old copied value might be worse than
having the value change out from under you. Having the defaults explicitly
exposed can also be helpful when writing tests sometimes.
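
A hedged sketch of the difference; the DEFAULT_* constant below is
hypothetical (it does not exist today -- exposing it is the suggestion):

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;

    Properties props = new Properties();
    // Symbolic reference: tracks the default if it changes upstream.
    // DEFAULT_MAX_POLL_RECORDS is an assumed constant name for illustration.
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG,
        ConsumerConfig.DEFAULT_MAX_POLL_RECORDS);
    // Copied literal: silently diverges when the upstream default changes.
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500);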

-Ewen

On Wed, Jan 3, 2018 at 9:30 AM, Colin McCabe  wrote:

> On Thu, Dec 21, 2017, at 10:28, Jason Gustafson wrote:
> > Hey Matthias,
> >
> > Let me suggest an alternative. As you have mentioned, these config
> classes
> > do not give users much benefit currently. Maybe we change that? I think
> > many users would appreciate having a builder for configuration since it
> > provides type safety and is generally a much friendlier pattern to work
> > with programmatically. Users could then do something like this:
> >
> > ConsumerConfig config = ConsumerConfig.newBuilder()
> > .setBootstrapServers("localhost:9092")
> > .setGroupId("group")
> > .setRequestTimeout(15, TimeUnit.SECONDS)
> > .build();
> >
> > Consumer consumer = new KafkaConsumer(config);
> >
> > An additional benefit of this is that it gives us a better way to expose
> > config deprecations. In any case, it would make it less odd to expose the
> > public constructor without giving users anything useful to do with the
> > class.
>
> Yeah, that would be good.  The builder idea would definitely make it a lot
> easier to configure clients programmatically.
>
> I do wonder if there are some cross-version compatibility issues here.  If
> there's any configuration that needs to be set by the client, but then
> propagated to the broker to be applied, the validation of that
> configuration really needs to be done by the broker itself.  The client
> code doesn't know the broker version, so it can't validate these configs.
> One example is topic configurations (although those are not set by
> ProducerConfig).  I'm not sure how big of an issue this is with our current
> configurations.
>
> Another problem here is that all these builder functions become API, and
> cannot easily be changed.  So if we want to change a configuration key that
> formerly accepted an int to accept a long, it will be difficult to do
> that.  We would have to add a new function with a separate name.
>
> best,
> Colin
>
>
> >
> > What do you think?
> >
> > -Jason
> >
> > On Wed, Dec 20, 2017 at 5:59 PM, Matthias J. Sax 
> > wrote:
> >
> > > It's tailored for internal usage. I think client constructors don't
> > > benefit from accepting those config objects. We just want to be able to
> > > access the default values for certain parameters.
> > >
> > > From a user point of view, it's actually boiler plate code if you pass
> > > in a config object instead of a plain Properties object because the
> > > config object itself is immutable.
> > >
> > > I actually create a JIRA to remove the constructors from KafkaStreams
> > > that do accept StreamsConfig for exact this reason:
> > > https://issues.apache.org/jira/browse/KAFKA-6386
> > >
> > >
> > > -Matthias
> > >
> > >
> > > On 12/20/17 3:33 PM, Jason Gustafson wrote:
> > > > Hi Matthias,
> > > >
> > > > Isn't it a little weird to make these constructors public but not
> also
> > > > expose the corresponding client constructors that use them?
> > > >
> > > > -Jason
> > > >
> > > > On Tue, Dec 19, 2017 at 9:30 AM, Bill Bejeck 
> wrote:
> > > >
> > > >> +1
> > > >>
> > > >> On Tue, Dec 19, 2017 at 12:09 PM, Guozhang Wang  >
> > > >> wrote:
> > > >>
> > > >>> +1
> > > >>>
> > > >>> On Tue, Dec 19, 2017 at 1:49 AM, Tom Bentley <
> t.j.bent...@gmail.com>
> > > >>> wrote:
> > > >>>
> > >  +1
> > > 
> > >  On 18 December 2017 at 23:28, Vahid S Hashemian <
> > > >>> vahidhashem...@us.ibm.com
> > > >
> > >  wrote:
> > > 
> > > > +1
> > > >
> > > > Thanks for the KIP.
> > > >
> > > > --Vahid
> > > >
> > > >
> > > >
> > > > From:   Ted Yu 
> > > > To: dev@kafka.apache.org
> > > > Date:   12/18/2017 02:45 PM
> > > > Subject:Re: [VOTE] KIP-243: Make ProducerConfig and
> > >  ConsumerConfig
> > > > constructors public
> > > >
> > > >
> > > >
> > > > +1
> > > >
> > > > nit: via "copy and past" an 'e' is missing at the end.
> > > >
> > > > On Mon, Dec 18, 2017 at 2:38 PM, Matthias J. Sax <
> > > >>> matth...@confluent.io>
> > > > wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> I want to propose the following KIP:
> > > >>
> > > > https://cwiki.apache.or

Re: [DISCUSS] KIP-208: Add SSL support to Kafka Connect REST interface

2018-01-08 Thread Ewen Cheslack-Postava
On Mon, Jan 8, 2018 at 11:39 AM, Randall Hauch  wrote:

> Nice feedback, Ewen. Thanks!
>
> On Thu, Jan 4, 2018 at 5:11 PM, Ewen Cheslack-Postava 
> wrote:
>
> > Hey Jakub,
> >
> > Sorry for not getting to this sooner. Overall the proposal looks good to
> > me, I just had a couple of questions.
> >
> > 1. For the configs/overrides, does this happen on a per-setting basis or
> if
> > one override is included do we not use any of the original settings? I
> > suspect that if you need to override one setting, it probably means
> you're
> > using an entirely different config and so the latter behavior seems
> better
> > to me. We've talked a bit about doing something similar for the
> > producer/consumer security settings as well so you don't have to specify
> > security configs in 3 places in your worker config.
> >
>
> Not sure if you were referring to
> https://issues.apache.org/jira/browse/KAFKA-6387, but I just withdrew that
> proposal (and the corresponding KIP-246) because behavior with existing
> configurations was not backward compatible, so existing configs might have
> very different behavior after the "inheritance" was implemented.
>
> But regardless, I do think that in this case if you have to override one of
> the settings you probably need to override multiple. So I'd be in favor of
> requiring all configs to be specified in the overridden `listeners.*`
> properties.
>

Yeah, a related case I was thinking of is how key.converter and
value.converter overrides work in connectors. It's not exactly the same,
but in that case, if you include the key.converter setting in the connector
config, then nothing with the key.converter prefix from the worker is passed
along. It just might be worth clarifying the all-or-nothing behavior, and
also how we apply it in this case (e.g. is there one key setting whose
presence means we do not inherit any security configs from the worker?).
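
To spell the analogy out, a hedged sketch of the all-or-nothing prefix
behavior (the override converter class and registry URL are placeholders,
not taken from this thread):

    # worker.properties
    key.converter=org.apache.kafka.connect.json.JsonConverter
    key.converter.schemas.enable=false

    # connector config: because key.converter is overridden here, none of
    # the worker's key.converter.* settings above are inherited -- the
    # connector must restate everything it needs.
    key.converter=io.confluent.connect.avro.AvroConverter
    key.converter.schema.registry.url=http://schema-registry:8081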


>
>
> >
> > 2. For using default values from the worker config, I am wondering how
> > convinced we are that it will be common for them to be the same? I really
> > don't have enough experience w/ these setups to know, so just a question
> > here. I think the other thing to take into account here is that even
> though
> > we're not dealing with authorization in this KIP, we will eventually want
> > it for these APIs. Would we expect to be using the same principal for
> Kafka
> > and the Connect REST API? In a case where a company has a Connect cluster
> > that, e.g., an ops team manages and they are the only ones that are
> > supposed to make changes, that would make sense to me. But for a setup
> > where some dev team is allowed to use the REST API to create new
> connectors
> > but the cluster is managed by an ops team, I would think the Kafka
> > credentials would be different. I'm not sure how frequent each case would
> > be, so I'm a bit unsure about the default of using the worker security
> > configs by default. Thoughts?
> >
> > 3. We should probably specify the default in the table for
> > rest.advertised.security.protocol because in ConfigDef if you don't
> > specify
> > a default value it becomes a required config. The HTTP default will
> > probably need to be in there anyway.
> >
> > 4. Do we want to list the existing settings as deprecated and just move
> to
> > using listeners for consistency? We don't need to remove them anytime
> soon,
> > but given that the broker is doing the same, maybe we should just do that
> > in this KIP?
> >
>
> Marking them as deprecated in this KIP sounds good to me.
>
> >
> > I think these are mostly small details, overall it looks like a good
> plan!
> >
>
> +1
>
> Randall
>
>
> >
> > Thanks,
> > Ewen
> >
> > On Tue, Oct 24, 2017 at 5:19 AM, Jakub Scholz  wrote:
> >
> > > There has been no discussion since my last update week ago. Unless
> > someone
> > > has some further comments in the next 48 hours, I will start the voting
> > for
> > > this KIP.
> > >
> > > Thanks & Regards
> > > Jakub
> > >
> > > On Tue, Oct 17, 2017 at 5:54 PM, Jakub Scholz  wrote:
> > >
> > > > Ok, so I updated the KIP according to what we discussed. Please have
> a
> > > > look at the updates. Two points I'm not 100% sure about:
> > > >
> > > > 1) Should we mark the rest.host.name and rest.port options as
> > > deprecated?
> > > >
> > > > 2) I needed to also addres

Re: [VOTE] KIP-174 Deprecate and remove internal converter configs in WorkerConfig

2018-01-08 Thread Ewen Cheslack-Postava
+1 binding. Thanks for the KIP!

-Ewen

On Mon, Jan 8, 2018 at 8:34 AM, Ted Yu  wrote:

> +1
>
> On Mon, Jan 8, 2018 at 4:27 AM, UMESH CHAUDHARY 
> wrote:
>
> > Hello All,
> > Since there are no outstanding comments on this, so I'd like to start a
> > vote.
> >
> > Please find the KIP here
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-174+-+Deprecate+and+remove+internal+converter+configs+in+WorkerConfig>
> > and the related JIRA.
> >
> > The KIP suggests deprecating and removing the configs:
> > internal.key.converter, internal.value.converter
> >
> > Appreciate your comments.
> >
> > Regards,
> > Umesh
> >
>


Re: [DISCUSS] KIP-212: Enforce set of legal characters for connector names

2018-01-09 Thread Ewen Cheslack-Postava
Great point - I'm always for exclusions where they make sense. I just prefer
including by default, with exclusions when necessary, over listing explicit
inclusions and being restrictive (and shipping security updates immediately
as needed).

If you have a set of characters you think we should exclude, I think it
would be good to add them here or in a subsequent KIP!
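
One concrete note on the escaping point: arbitrary UTF-8 names are workable
over a REST API as long as they are escaped before being embedded in a
path. A minimal sketch (uses the Java 10+ overload; URLEncoder does
form-style encoding, which is close enough for illustration):

    import java.net.URLEncoder;
    import java.nio.charset.StandardCharsets;

    String name = "my connector é";  // arbitrary UTF-8 is fine
    // Escape before embedding in a REST path such as /connectors/{name}.
    String escaped = URLEncoder.encode(name, StandardCharsets.UTF_8);
    // escaped == "my+connector+%C3%A9"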

-Ewen

On Tue, Jan 9, 2018 at 1:30 PM, Colin McCabe  wrote:

> On Sat, Jan 6, 2018, at 16:00, Ewen Cheslack-Postava wrote:
> > re: whitespace characters, I'm fine with the restriction since I don't
> see
> > it becoming an issue in practice. I just don't see any reason to restrict
> > it so it seems like we're going out of our way and doing extra work to be
> > restrictive, but without clear motivation.
>
> There are very good reasons not to support control characters in file
> names, topic names, log files, etc.
>
> See http://seclists.org/fulldisclosure/2003/Feb/att-341/Termulation.txt
>
> There are a bunch of CVEs about this, too.  Because of the (in my opinion,
> mistaken) decision to allow control characters in UNIX filenames, even
> echoing a file name to your terminal is a security vulnerability.
>
> best,
> Colin
>
>
> >
> > In general my default approach (without context of a specific system)
> would
> > be to accept anything that we can encode in UTF-8 and only apply
> > restrictions where it becomes necessary (e.g. we need to define a
> delimiter
> > for some reason). The constraints of URLs introduce some complexity (you
> > need escaping), but probably generally still allow this. If I can use an
> > emoji when naming things, then I'm probably happy :) Whitespace
> characters
> > definitely have some other issues (e.g. you can have non-visible
> whitespace
> > which obscures which connector you're actually working with), but despite
> > the JIRA linked, I wasn't really convinced they need special handling. It
> > seems like a really weird issue to encounter in the first place.
> >
> > -Ewen
> >
> > On Fri, Jan 5, 2018 at 8:10 AM, Randall Hauch  wrote:
> >
> > > Sönke, I'm happy with the current proposal.
> > >
> > > Ewen, the proposal allows any characters in the name as long as they
> are
> > > properly escaped/encoded. That seems to adhere to the robustness
> principle.
> > > The only exception is that the proposal trims leading and trailing
> > > whitespace characters in an attempt to reduce user errors. Can you
> please
> > > clarify that you're okay with this behavior? I agree that technically
> we
> > > can (and currently do) support whitespace-only names, but users have
> > > reported this as problematic, and it also would be confusing for most
> user
> > > interfaces.
> > >
> > > Best regards,
> > >
> > > Randall
> > >
> > > On Thu, Jan 4, 2018 at 10:31 PM, Ewen Cheslack-Postava <
> e...@confluent.io>
> > > wrote:
> > >
> > > > Very late to the game here, but a few thoughts:
> > > >
> > > > 1. Regarding whether KIP is necessary, I don't mind doing it for
> > > > documentation sake, but I would classify any mishandling of connector
> > > names
> > > > here as a bug. Which doesn't require a KIP to fix.
> > > >
> > > > 2. For support of characters, Kafka has some history of just being
> > > > restrictive (e.g., see topic name restrictions), but I personally
> > > disagree
> > > > with this approach. I think it is better to be liberal in what we
> accept
> > > > and just document limitations. I think our default should be to
> accept
> > > any
> > > > user input and document why we can't handle certain inputs and how
> the
> > > user
> > > > should adapt if we can't. In general I try to work under the
> robustness
> > > > principle: *Be conservative in what you do, be liberal in what you
> accept
> > > > from others*
> > > >
> > > > 3. Related to 2, there were some cases like whitespace-only connector
> > > > names. This seems extremely weird and not critical, so I'm fine not
> > > > supporting it officially, but technically I don't see any reason it
> > > > shouldn't be supported with any appropriate escaping (i.e. what
> would it
> > > > break for us?).
> > > >
> > > > But in general, I think just being more explicit about expectations
> is
> > > > great and it

[DISCUSS] Release Plan for 1.0.1

2018-01-16 Thread Ewen Cheslack-Postava
Hi all,

I'd like to start the process for doing a 1.0.1 bug fix release. 1.0.0 was
released Nov 1, 2017; about 2.5 months have passed and 32 bug fixes have
accumulated so far. A few of the more notable fixes that we've merged so
far:

https://issues.apache.org/jira/browse/KAFKA-6269 - KTable restore fails
after rebalance
https://issues.apache.org/jira/browse/KAFKA-6185 - Selector memory leak
with high likelihood of OOM in case of down conversion
https://issues.apache.org/jira/browse/KAFKA-6167 - Timestamp on streams
directory contains a colon, which is an illegal character
https://issues.apache.org/jira/browse/KAFKA-6190 - GlobalKTable never
finishes restoring when consuming transactional messages
https://issues.apache.org/jira/browse/KAFKA-6252 - A fix for cleaning up
new Connect metrics when a connector does not shut down properly

These represent important fixes across all components -- core, Connect and
Streams.

I've prepared
https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+1.0.1 to
help track the fixes for 1.0.1. As you can see we have only 1 item
currently marked as a blocker for 1.0.1. If folks with outstanding bugs
could take a pass and a) make sure anything they think should go into 1.0.1
is marked as such, with the appropriate priority, and b) make sure anything
marked for 1.0.1 should really be there.

Once people have taken a pass we can work on a VOTE thread and getting any
outstanding PRs reviewed.

-Ewen


Re: [DISCUSS] KIP-208: Add SSL support to Kafka Connect REST interface

2018-01-16 Thread Ewen Cheslack-Postava
On Sun, Jan 14, 2018 at 1:20 PM, Jakub Scholz  wrote:

> Hi Ewen,
>
> Thanks for your comments / questions.
>
> re 1) I was using the valuesWithPrefixOverride(...) method from
> AbstractConfig class. That takes the overrides setting by setting. It
> should not be hard to change the code to use only the overrides if one
> value has changed. But I think that the broker is using the same for the
> SSL configuration. So I guess the question is whether we want to keep thins
> behave the same within Connect (e.g. with the key.converter /
> value.converter) or with the broker configuration. I would be ultimately
> fine with both, so if you want I can change it to the approach you
> suggested.
>
>
This is a fair point. Mainly my concern is that the "effective"
configuration would become confusing -- if you don't remember that it is
per-field, your config files could look fine (separate blocks of configs if
you need them to be different), but still end up with an effective config
that would be a mishmash of the two.

I'm fine with consistency, but maybe the thing to do here then is to ensure
that we definitely log the "effective" or "derived" config before using it
so there is at least some useful trace of what we actually used that can be
helpful in debugging.
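
A minimal sketch of what logging the effective config could look like; the
method, logger, and masking rule here are assumptions, not the KIP's
implementation:

    import java.util.Locale;
    import java.util.Map;

    // 'log' is assumed to be an slf4j Logger on the enclosing class.
    void logEffectiveListenerConfig(String listener, Map<String, ?> effective) {
        // Trace the per-listener config actually used after overrides are
        // applied, masking anything password-like, so a per-field "mishmash"
        // is at least visible when debugging.
        effective.forEach((key, value) ->
            log.info("listener {}: {} = {}", listener, key,
                key.toLowerCase(Locale.ROOT).contains("password") ? "[hidden]" : value));
    }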


> re 2) I think the dangerous thing is that we are mixing different purposes
> for the same fields. The truststore is used for server authentication when
> connecting to Kafka broker and for user authentication when used for the
> REST API. The keystore is used for client authentication when connecting to
> Kafka brokers but as server certificate for the REST API. It can work fine,
> but it might be confusing, and it is not obvious what all the
> functions of a given truststore / keystore are. And that might not be a good
> thing for security related configuration.
>
> However because they have different roles, using the same truststore
> configuration for both doesn't mean that every Kafka Connect user is
> automatically able to connect directly to the Kafka broker as well. You can
> have several public keys in the truststore - each for a different purpose.
>
> I think the original KIP actually proposed to use always separate fields.
> It was modified later after feedback from the discussion. I think that the
> current way is good because it is most flexible. But I'm fine with
> implementing it both ways.
>

Sure, I wasn't sure how common it would be to want to inherit the
keystore/truststore settings but override others, i.e. whether you would
have included both principals together or, if they were going to be
different anyway, possibly ship them separately.

I don't feel strongly about this, it's mostly a question from ignorance on
how people would be managing these principals/permissions/configurations.
Mainly I'm trying to make sure we get a good balance of ease of use,
encouraging good security practices, and being able to lock down strongly &
carefully.


>
> re 3) What I'm actually using right now is the default value of "null"
> (although in the KIP it was actually empty - I added the null only now).
> That makes it not required in ConfigDef. At the same time, it makes it
> possible in Kafka Connect, if the protocol is not specified, to
> define it based on the listeners field. Thanks to that, when the user has
> only one listener specified in the "listeners" field, he does not need to
> specify the protocol. If I set the default value to HTTP, users would need
> to change it if they want to use HTTPS for the communication, which would
> make it more complicated for them. So I think my current version is
> better. What do you think?
>

I see, this sounds fine then. I had thought it was just no default value at
all.
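
A hedged example of the behavior Jakub describes (config names are from the
KIP; the inference rule is as stated in this thread, not independently
verified):

    # rest.advertised.security.protocol left unset (defaults to null)
    listeners=https://connect-worker.example.com:8443
    # With a single listener, the advertised protocol can be inferred as
    # HTTPS, so the user does not have to set it explicitly.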


>
> re 4) Yeah, I was thinking about it as well. I changed the KIP.
>
> I will rebase the work I have done so far, finish it and open a WIP PR, so
> that you can see the code as well.
>
>
Sounds good, thanks!

If we think we've converged, we should probably revive the vote thread
again and try to get this passed through. KIP deadline and then feature
freeze are coming up soon. Not a big deal if we can't make it for this
release, but I know quite a few folks are looking forward to this feature.

-Ewen



> Thanks & regards
> Jakub
>
> On Wed, Jan 10, 2018 at 10:39 AM, Jakub Scholz  wrote:
>
> > I'm sorry guys, I'm a bit busy with something else this week. But I will
> > get back to his till the end of the week.
> >
> > Thanks & Regards
> > Jakub
> >
> > On Tue, Jan 9, 2018 at 1:19 AM, Ewen Cheslack-Postava  >
> > wrote:
> >
> >> On Mon, Jan 8, 2018 at 11:39 AM, Randal

Re: [DISCUSS] KIP-212: Enforce set of legal characters for connector names

2018-01-16 Thread Ewen Cheslack-Postava
Sonke,

I'm fine filtering some control characters. The trimming also seems like it
might be *somewhat* moot because the way connector names work in standalone
mode is limited by ConfigDef, which already does trimming of settings. Not
a great reason to be restrictive, but we'd partly just be codifying what's
there.
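
For concreteness, a sketch of the kind of check being discussed -- trim,
then reject control characters. This illustrates the thread's direction,
not necessarily the rule the KIP finally adopted:

    static String validateConnectorName(String rawName) {
        // Standalone mode effectively trims already via ConfigDef.
        String name = rawName.trim();
        if (name.isEmpty())
            throw new IllegalArgumentException(
                "Connector name must not be empty or whitespace-only");
        name.codePoints().forEach(cp -> {
            if (Character.isISOControl(cp))
                throw new IllegalArgumentException(
                    "Connector name must not contain control characters");
        });
        return name;
    }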

I just generally have a distaste for being restrictive without a clear
reason. In this case I don't think it has any significant impact.

KIP freeze is nearing and this seems like a simple improvement and a PR is
already available (modulo any changes re: control characters). I'll start
reviewing the PR; do you want to make any last updates about control
characters in the KIP and kick off a VOTE thread?

-Ewen

On Fri, Jan 12, 2018 at 1:43 PM, Colin McCabe  wrote:

> On Fri, Jan 12, 2018, at 08:03, Sönke Liebau wrote:
> > Hi everybody,
> >
> > from reading the discussion I understand that we have two things still
> > open for discussion.
> >
> >  Ewen is still a bit on the fence about whether or not we trim
> > whitespace characters but seems to favor not doing it due to there not
> > being a real issue with them. I think it mostly boils down to a
> > question of general preference. I am happy to change the code to allow
> > leading and trailing whitespace characters again. If someone has a use
> > case for these, so be it - I don't see a technical reason against
> > them. Personally I think it is more likely that someone accidentally
> > gets a whitespace character in his connector name somehow and
> > subsequently has a confusing time figuring it out, but it shouldn't be
> > that tough to spot and is correct behavior, so no issue with it.
> > TL/DR: I'm happy either way :)
> >
> > Colin brought up control characters and that we should disallow these
> > in connector names. The specific one that is mentioned in his link is
> > Ascii 27 - ESC - \e so one possibility would be to explicitly
> > blacklist this. The rest of the control characters (Ascii 0 through 31
> > and 127) should be less critical I think, but since there is no way of
> > knowing all software that might look at these strings and interpret
> > them there is no real way of being certain. I think there is a case to
> > be made for disallowing all control characters (and their respective
> > escape sequences where applicable) in connector names - perhaps with
> > the possible exclusion of /n /r /t .
>
> +1
>
> Colin
>
> >
> > Thoughts?
> >
> > Kind regards,
> > Sönke
> >
> >
> >
> > On Wed, Jan 10, 2018 at 7:23 AM, Ewen Cheslack-Postava
> >  wrote:
> > > great point, I'm always for exclusions where they make sense. i just
> prefer
> > > to include by default w/ exclusions when necessary to listing explicit
> > > inclusions and being restrictive. (and security updates immediately as
> > > needed).
> > >
> > > If you have a set of characters you think we should exclude, I think it
> > > would be good to add them here or in a subsequent KIP!
> > >
> > > -Ewen
> > >
> > > On Tue, Jan 9, 2018 at 1:30 PM, Colin McCabe 
> wrote:
> > >
> > >> On Sat, Jan 6, 2018, at 16:00, Ewen Cheslack-Postava wrote:
> > >> > re: whitespace characters, I'm fine with the restriction since I
> don't
> > >> see
> > >> > it becoming an issue in practice. I just don't see any reason to
> restrict
> > >> > it so it seems like we're going out of our way and doing extra work
> to be
> > >> > restrictive, but without clear motivation.
> > >>
> > >> There are very good reasons not to support control characters in file
> > >> names, topic names, log files, etc.
> > >>
> > >> See http://seclists.org/fulldisclosure/2003/Feb/att-341/Termulation.txt
> > >>
> > >> There are a bunch of CVEs about this, too.  Because of the (in my
> opinion,
> > >> mistaken) decision to allow control characters in UNIX filenames, even
> > >> echoing a file name to your terminal is a security vulnerability.
> > >>
> > >> best,
> > >> Colin
> > >>
> > >>
> > >> >
> > >> > In general my default approach (without context of a specific
> system)
> > >> would
> > >> > be to accept anything that we can encode in UTF-8 and only apply
> > >> > restrictions where it becomes necessary (e.g. we need to define a
> > >> delimiter
> > 

Re: [DISCUSS] KIP-242: Mask password fields in Kafka Connect REST response

2018-01-16 Thread Ewen Cheslack-Postava
Vincent,

I think with the addition of a configuration to control this for
compatibility, people would generally be ok with it. If you want to start a
VOTE thread, the KIP deadline is coming up and the PR looks pretty small. I
will take a pass at reviewing the PR so we'll be ready to merge if we can
get the KIP voted through.

Thanks,
Ewen

On Fri, Jan 12, 2018 at 10:18 AM, Vincent Meng  wrote:

> @Ted: The issue is kinda hard to reproduce. It's just something we observe
> over time.
>
> @Ewen: I agree. Opt-in seems to be a good solution to me. To your question,
> if there is no ConfDef that defines which fields are Passwords we can just
> return the config as is.
>
> There is a PR for this KIP already. Comments/Discussions are welcome.
> https://github.com/apache/kafka/pull/4269
>
> On Tue, Jan 2, 2018 at 8:52 PM, Ewen Cheslack-Postava 
> wrote:
>
> > Vincent,
> >
> > Thanks for the KIP. This is definitely an issue we know is a problem for
> > some users.
> >
> > I think the major problem with the KIP as-is is that it makes it
> impossible
> > to get the original value back out of the API. This KIP probably ties in
> > significantly with ideas for securing the REST API (SSL) and adding ACLs
> to
> > it. Both are things we know people want, but haven't happened yet.
> However,
> > it also interacts with other approaches to adding those features, e.g.
> > layering proxies on top of the existing API (e.g. nginx, apache, etc).
> Just
> > doing a blanket replacement of password values with a constant would
> likely
> > break things for people who secure things via a proxy (and may just not
> > allow reads of configs unless the user is authorized for the particular
> > connector). These are the types of concerns we like to think through in
> the
> > compatibility section. One option to get the masking functionality in
> > without depending on a bunch of other security improvements might be to
> > make this configurable so users that need this (and can forgo seeing a
> > valid config via the API) can opt-in.
> >
> > Regarding your individual points:
> >
> > * I don't think the particular value for the masked content matters much.
> > Any constant indicating a password field is good. Your value seems fine
> to
> > me.
> > * I don't think ConnectorInfo has enough info on its own to do proper
> > masking. In fact, I think you need to parse the config enough to get the
> > Connector-specific ConfigDef out in order to determine which fields are
> > Password fields. I would probably try to push this to be as central as
> > possible, maybe adding a method to AbstractHerder that can get configs
> with
> > a boolean indicating whether they need to have sensitive fields removed.
> > That method could deal with parsing the config to get the right
> connector,
> > getting the connector config, and then sanitizing any configs that are
> > sensitive. We could have this in one location, then have the relevant
> REST
> > APIs just use the right flag to determine if they get sanitized or
> > unsanitized data.
> >
> > That second point raises another interesting point -- what happens if the
> > connector configuration references a connector which the worker serving
> the
> > REST request *does not know about*? In that case, there will be no
> > corresponding ConfigDef that defines which fields are Passwords and need
> to
> > be sensitized. Does it return an error? Or just return the config as is?
> >
> > -Ewen
> >
> > On Thu, Dec 28, 2017 at 3:34 AM, Ted Yu  wrote:
> >
> > > For the last point you raised, can you come up with a unit test that
> > shows
> > > what you observed ?
> > >
> > > Cheers
> > >
> > > On Mon, Dec 18, 2017 at 11:14 AM, Vincent Meng  wrote:
> > >
> > > > Hi all,
> > > >
> > > > I've created KIP-242, a proposal to secure credentials in kafka
> connect
> > > > rest endpoint.
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 242%3A+Mask+password+in+Kafka+Connect+Rest+API+response
> > > >
> > > > Here are something I'd like to discuss:
> > > >
> > > >- The "masked" value is set to "*" (9 stars) currently.
> It's
> > > an
> > > >arbitrary value I picked. Are there any better options?
> > > >- The proposal change is in the
> > > >*org.apache.kafka.connect.runtime.rest.resources.
> > ConnectorsRes
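
A minimal sketch of the ConfigDef-driven masking approach discussed above;
the helper name, placement, and mask constant are assumptions, not the
merged implementation:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.common.config.ConfigDef;

    static Map<String, String> maskSensitiveConfigs(ConfigDef connectorDef,
                                                    Map<String, String> config) {
        Map<String, String> masked = new HashMap<>(config);
        for (ConfigDef.ConfigKey key : connectorDef.configKeys().values()) {
            // Only fields the connector's ConfigDef declares as Type.PASSWORD
            // are masked, which is why an unknown connector class is the hard
            // case raised above.
            if (key.type == ConfigDef.Type.PASSWORD && masked.containsKey(key.name))
                masked.put(key.name, "*********");
        }
        return masked;
    }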

Re: [VOTE] KIP-145: Expose Record Headers in Kafka Connect

2018-01-21 Thread Ewen Cheslack-Postava
+1 (binding)

Thanks for the work on this -- not a small upgrade to the Connect APIs!

-Ewen

On Fri, Jan 19, 2018 at 3:37 PM, Randall Hauch  wrote:

> Hi everyone,
>
> I'd like to start the voting on this KIP to add support for headers in
> Connect.:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-145+-+Expose+Record+Headers+in+Kafka+Connect
>
> This does add a fair number of interfaces to our public API, and defines
> some behavioral changes as well.
>
> Thanks! Your feedback is highly appreciated.
>
> Randall
>


Re: [DISCUSS] Release Plan for 1.0.1

2018-01-22 Thread Ewen Cheslack-Postava
Jeff,

The initial email was just to get people looking at JIRAs. We still have
one outstanding blocker, and Guozhang's PR has already been merged.
Otherwise we have no blockers, but there are a couple marked "Critical"
that are unresolved. I'll be following up on those JIRAs to get them either
moved to a later release or upgraded to blockers with folks working on
resolving them.

We'll probably do an RC either late this week or early next week. Since
this is a bug fix release, I am guessing it will only require 1 RC.

-Ewen

On Mon, Jan 22, 2018 at 11:23 AM, Jeff Widman  wrote:

> Any update on this release?
>
> Haven't seen anything other than Guozhang's email...
>
> Waiting for it to drop so we can upgrade to 1.0 series...
>
> On Tue, Jan 16, 2018 at 1:32 PM, Guozhang Wang  wrote:
>
> > Hello Ewen,
> >
> > Could you include one more notable changes in 1.0.1:
> > https://issues.apache.org/jira/browse/KAFKA-6398 ?
> >
> > My PR is ready for reviews and should be mergable at any time.
> >
> >
> > Guozhang
> >
> > On Tue, Jan 16, 2018 at 10:54 AM, Ewen Cheslack-Postava <
> e...@confluent.io
> > >
> > wrote:
> >
> > > Hi all,
> > >
> > > I'd like to start the process for doing a 1.0.1 bug fix release. 1.0.0
> > was
> > > released Nov 1, 2017, and about 2.5 mos have passed and 32 bug fixes
> have
> > > accumulated so far. A few of the more notable fixes that we've merged
> so
> > > far:
> > >
> > > https://issues.apache.org/jira/browse/KAFKA-6269 - KTable restore
> fails
> > > after rebalance
> > > https://issues.apache.org/jira/browse/KAFKA-6185 - Selector memory
> leak
> > > with high likelihood of OOM in case of down conversion
> > > https://issues.apache.org/jira/browse/KAFKA-6167 - Timestamp on
> streams
> > > directory contains a colon, which is an illegal character
> > > https://issues.apache.org/jira/browse/KAFKA-6190 - GlobalKTable never
> > > finishes restoring when consuming transactional messages
> > > https://issues.apache.org/jira/browse/KAFKA-6252 - A fix for cleaning
> up
> > > new Connect metrics when a connector does not shut down properly
> > >
> > > These represent important fixes across all components -- core, Connect
> > and
> > > Streams.
> > >
> > > I've prepared
> > > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+1.0.1
> to
> > > help track the fixes for 1.0.1. As you can see we have only 1 item
> > > currently marked as a blocker for 1.0.1. If folks with outstanding bugs
> > > could take a pass and a) make sure anything they think should go into
> > 1.0.1
> > > is marked as such, with the appropriate priority, and b) make sure
> > anything
> > > marked for 1.0.1 should really be there.
> > >
> > > Once people have taken a pass we can work on a VOTE thread and getting
> > any
> > > outstanding PRs reviewed.
> > >
> > > -Ewen
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>
>
>
> --
>
> *Jeff Widman*
> jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J (943-6265)
> <><
>


Re: [VOTE] KIP-212: Enforce set of legal characters for connector names

2018-01-23 Thread Ewen Cheslack-Postava
+1 (binding)

On Tue, Jan 23, 2018 at 7:28 AM, Matt Farmer  wrote:

> +1 from me (non-binding) =)
>
> > On Jan 22, 2018, at 7:35 PM, Sönke Liebau  wrote:
> >
> > All,
> >
> > this KIP has been discussed for quite some time now and I believe we
> > addressed all major concerns in the current revision, so I'd like to
> > start a vote.
> >
> > KIP can be found here:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-212%3A+Enforce+set+of+legal+characters+for+connector+names
> >
> > Let me know what you think.
> >
> > Kind regards,
> > Sönke
>
>


Re: [VOTE] KIP-206: Add support for UUID serialization and deserialization

2018-01-29 Thread Ewen Cheslack-Postava
+1 (binding)

On Fri, Jan 26, 2018 at 9:16 AM, Colin McCabe  wrote:

> +1 (non-binding)
>
>
>
> On Fri, Jan 26, 2018, at 08:29, Ted Yu wrote:
> > +1
> >
> > On Fri, Jan 26, 2018 at 7:00 AM, Brandon Kirchner <
> > brandon.kirch...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I would like to (re)start the voting process for KIP-206:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-206%3A+Add+support+for+UUID+serialization+and+deserialization
> > >
> > > The KIP adds a UUID serializer and deserializer. Possible
> implementation
> > > can be seen here --
> > >
> > > https://github.com/apache/kafka/pull/4438
> > >
> > > Original discussion and voting thread can be seen here --
> > > http://search-hadoop.com/m/Kafka/uyzND1dlgePJY7l9?subj=+DISCUSS+KIP+206+Add+support+for+UUID+serialization+and+deserialization
> > >
> > >
> > > Thanks!
> > > Brandon K.
> > >
>


[VOTE] 1.0.1 RC0

2018-02-05 Thread Ewen Cheslack-Postava
Hello Kafka users, developers and client-developers,

Sorry for a bit of a delay, but I've now prepared the first candidate for
release of Apache Kafka 1.0.1.

This is a bugfix release for the 1.0 branch that was first released with
1.0.0 about 3 months ago. We've fixed 46 significant issues since that
release. Most of these are non-critical, but in aggregate these fixes will
have significant impact. A few of the more significant fixes include:

* KAFKA-6277: Make loadClass thread-safe for class loaders of Connect
plugins
* KAFKA-6185: Selector memory leak with high likelihood of OOM in case of
down conversion
* KAFKA-6269: KTable state restore fails after rebalance
* KAFKA-6190: GlobalKTable never finishes restoring when consuming
transactional messages

Release notes for the 1.0.1 release:
http://home.apache.org/~ewencp/kafka-1.0.1-rc0/RELEASE_NOTES.html

*** Please download, test and vote by Thursday, Feb 8, 12pm PT ***

Kafka's KEYS file containing PGP keys we use to sign the release:
http://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
http://home.apache.org/~ewencp/kafka-1.0.1-rc0/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/

* Javadoc:
http://home.apache.org/~ewencp/kafka-1.0.1-rc0/javadoc/

* Tag to be voted upon (off 1.0 branch) is the 1.0.1 tag:
https://github.com/apache/kafka/tree/1.0.1-rc0


* Documentation:
http://kafka.apache.org/10/documentation.html

* Protocol:
http://kafka.apache.org/10/protocol.html


Please test and verify the release artifacts and submit a vote for this RC,
or report any issues so we can fix them and get a new RC out ASAP! Although
this release vote requires PMC votes to pass, testing, votes, and bug
reports are valuable and appreciated from everyone.

Thanks,
Ewen


Re: [VOTE] 1.0.1 RC0

2018-02-05 Thread Ewen Cheslack-Postava
Not sure, seems to be variable. I "closed" the upload on the Apache staging
nexus repo per the release process, but I've never seen any indication of a
reliable amount of time for it to show up. Didn't want to hold up the
process waiting for it so I sent the email, but if it doesn't show up for
too long or we need a follow up RC, I'll just extend/restart the process.

-Ewen

On Mon, Feb 5, 2018 at 8:21 PM, Ted Yu  wrote:

> Ewen:
> Do you know how long it takes for maven repo to be populated ?
>
> Looking at:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/kafka-clients/
>
> I don't see 1.0.1 version.
>
> Cheers
>
> On Mon, Feb 5, 2018 at 7:48 PM, Ewen Cheslack-Postava 
> wrote:
>
> > Hello Kafka users, developers and client-developers,
> >
> > Sorry for a bit of delay, but I've now prepared the first candidate for
> > release of Apache Kafka 1.0.1.
> >
> > This is a bugfix release for the 1.0 branch that was first released with
> > 1.0.0 about 3 months ago. We've fixed 46 significant issues since that
> > release. Most of these are non-critical, but in aggregate these fixes
> will
> > have significant impact. A few of the more significant fixes include:
> >
> > * KAFKA-6277: Make loadClass thread-safe for class loaders of Connect
> > plugins
> > * KAFKA-6185: Selector memory leak with high likelihood of OOM in case of
> > down conversion
> > * KAFKA-6269: KTable state restore fails after rebalance
> > * KAFKA-6190: GlobalKTable never finishes restoring when consuming
> > transactional messages
> >
> > Release notes for the 1.0.1 release:
> > http://home.apache.org/~ewencp/kafka-1.0.1-rc0/RELEASE_NOTES.html
> >
> > *** Please download, test and vote by Thursday, Feb 8, 12pm PT ***
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > http://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > http://home.apache.org/~ewencp/kafka-1.0.1-rc0/
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/
> >
> > * Javadoc:
> > http://home.apache.org/~ewencp/kafka-1.0.1-rc0/javadoc/
> >
> > * Tag to be voted upon (off 1.0 branch) is the 1.0.1 tag:
> > https://github.com/apache/kafka/tree/1.0.1-rc0
> >
> >
> > * Documentation:
> > http://kafka.apache.org/10/documentation.html
> >
> > * Protocol:
> > http://kafka.apache.org/10/protocol.html
> >
> >
> > Please test and verify the release artifacts and submit a vote for this
> RC,
> > or report any issues so we can fix them and get a new RC out ASAP!
> Although
> > this release vote requires PMC votes to pass, testing, votes, and bug
> > reports are valuable and appreciated from everyone.
> >
> > Thanks,
> > Ewen
> >
>


Re: [VOTE] 1.0.1 RC0

2018-02-05 Thread Ewen Cheslack-Postava
@Ted Check now. I went back and double checked the details. Seems like the
release details may have changed in a way that was not clearly documented
because we have two separate uploads now -- the gradle process for most of
the artifacts and the maven uploads for the streams quickstart examples.
This requires "closing" two separate staging repos to get everything
released.

If things look good now, I can update the release script/instructions to
make this clearer.

-Ewen

On Mon, Feb 5, 2018 at 9:59 PM, Ewen Cheslack-Postava 
wrote:

> Not sure, seems to be variable. I "closed" the upload on the Apache
> staging nexus repo per the release process, but I've never seen any
> indication of a reliable amount of time for it to show up. Didn't want to
> hold up the process waiting for it so I sent the email, but if it doesn't
> show up for too long or we need a follow up RC, I'll just extend/restart
> the process.
>
> -Ewen
>
> On Mon, Feb 5, 2018 at 8:21 PM, Ted Yu  wrote:
>
>> Ewen:
>> Do you know how long it takes for maven repo to be populated ?
>>
>> Looking at:
>> https://repository.apache.org/content/groups/staging/org/apache/kafka/kafka-clients/
>>
>> I don't see 1.0.1 version.
>>
>> Cheers
>>
>> On Mon, Feb 5, 2018 at 7:48 PM, Ewen Cheslack-Postava 
>> wrote:
>>
>> > Hello Kafka users, developers and client-developers,
>> >
>> > Sorry for a bit of delay, but I've now prepared the first candidate for
>> > release of Apache Kafka 1.0.1.
>> >
>> > This is a bugfix release for the 1.0 branch that was first released with
>> > 1.0.0 about 3 months ago. We've fixed 46 significant issues since that
>> > release. Most of these are non-critical, but in aggregate these fixes
>> will
>> > have significant impact. A few of the more significant fixes include:
>> >
>> > * KAFKA-6277: Make loadClass thread-safe for class loaders of Connect
>> > plugins
>> > * KAFKA-6185: Selector memory leak with high likelihood of OOM in case
>> of
>> > down conversion
>> > * KAFKA-6269: KTable state restore fails after rebalance
>> > * KAFKA-6190: GlobalKTable never finishes restoring when consuming
>> > transactional messages
>> >
>> > Release notes for the 1.0.1 release:
>> > http://home.apache.org/~ewencp/kafka-1.0.1-rc0/RELEASE_NOTES.html
>> >
>> > *** Please download, test and vote by Thursday, Feb 8, 12pm PT ***
>> >
>> > Kafka's KEYS file containing PGP keys we use to sign the release:
>> > http://kafka.apache.org/KEYS
>> >
>> > * Release artifacts to be voted upon (source and binary):
>> > http://home.apache.org/~ewencp/kafka-1.0.1-rc0/
>> >
>> > * Maven artifacts to be voted upon:
>> > https://repository.apache.org/content/groups/staging/
>> >
>> > * Javadoc:
>> > http://home.apache.org/~ewencp/kafka-1.0.1-rc0/javadoc/
>> >
>> > * Tag to be voted upon (off 1.0 branch) is the 1.0.1 tag:
>> > https://github.com/apache/kafka/tree/1.0.1-rc0
>> >
>> >
>> > * Documentation:
>> > http://kafka.apache.org/10/documentation.html
>> >
>> > * Protocol:
>> > http://kafka.apache.org/10/protocol.html
>> >
>> >
>> > Please test and verify the release artifacts and submit a vote for this
>> RC,
>> > or report any issues so we can fix them and get a new RC out ASAP!
>> Although
>> > this release vote requires PMC votes to pass, testing, votes, and bug
>> > reports are valuable and appreciated from everyone.
>> >
>> > Thanks,
>> > Ewen
>> >
>>
>
>


Documentation build system

2018-02-06 Thread Ewen Cheslack-Postava
Hi all,

I just wrote a note in https://issues.apache.org/jira/browse/KAFKA-2967
with a proposal for changing how docs are written. I want to move on this
soon if possible and normally would just leave the discussion to the JIRA,
but as I think this is something everyone has an opinion on and affects
everyone contributing to the project, I figured I'd send this quick note to
increase the likelihood people see it and have a chance to weigh in.

Thanks,
-Ewen


Re: [VOTE] 1.0.1 RC0

2018-02-09 Thread Ewen Cheslack-Postava
Just a heads up that a fix for KAFKA-6529 landed to address a file
descriptor leak. So this RC is dead and I'll be generating RC1 soon.

Thanks,
Ewen

On Wed, Feb 7, 2018 at 11:06 AM, Vahid S Hashemian <
vahidhashem...@us.ibm.com> wrote:

> Hi Ewen,
>
> +1
>
> Building from source and running the quickstart were successful on Ubuntu
> and Windows 10.
>
> Thanks for running the release.
> --Vahid
>
>
>
> From:   Ewen Cheslack-Postava 
> To: dev@kafka.apache.org, us...@kafka.apache.org,
> kafka-clie...@googlegroups.com
> Date:   02/05/2018 07:49 PM
> Subject:[VOTE] 1.0.1 RC0
>
>
>
> Hello Kafka users, developers and client-developers,
>
> Sorry for a bit of delay, but I've now prepared the first candidate for
> release of Apache Kafka 1.0.1.
>
> This is a bugfix release for the 1.0 branch that was first released with
> 1.0.0 about 3 months ago. We've fixed 46 significant issues since that
> release. Most of these are non-critical, but in aggregate these fixes will
> have significant impact. A few of the more significant fixes include:
>
> * KAFKA-6277: Make loadClass thread-safe for class loaders of Connect
> plugins
> * KAFKA-6185: Selector memory leak with high likelihood of OOM in case of
> down conversion
> * KAFKA-6269: KTable state restore fails after rebalance
> * KAFKA-6190: GlobalKTable never finishes restoring when consuming
> transactional messages
>
> Release notes for the 1.0.1 release:
> http://home.apache.org/~ewencp/kafka-1.0.1-rc0/RELEASE_NOTES.html
>
>
> *** Please download, test and vote by Thursday, Feb 8, 12pm PT ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~ewencp/kafka-1.0.1-rc0/
>
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
>
>
> * Javadoc:
> http://home.apache.org/~ewencp/kafka-1.0.1-rc0/javadoc/
>
>
> * Tag to be voted upon (off 1.0 branch) is the 1.0.1 tag:
> https://github.com/apache/kafka/tree/1.0.1-rc0
>
>
>
> * Documentation:
> http://kafka.apache.org/10/documentation.html
>
>
> * Protocol:
> http://kafka.apache.org/10/protocol.html
>
>
>
> Please test and verify the release artifacts and submit a vote for this RC,
> or report any issues so we can fix them and get a new RC out ASAP! Although
> this release vote requires PMC votes to pass, testing, votes, and bug
> reports are valuable and appreciated from everyone.
>
> Thanks,
> Ewen
>


[VOTE] 1.0.1 RC1

2018-02-12 Thread Ewen Cheslack-Postava
Hello Kafka users, developers and client-developers,

This is the second candidate for release of Apache Kafka 1.0.1.

This is a bugfix release for the 1.0 branch that was first released with
1.0.0 about 3 months ago. We've fixed 49 significant issues since that
release. Most of these are non-critical, but in aggregate these fixes will
have significant impact. A few of the more significant fixes include:

* KAFKA-6277: Make loadClass thread-safe for class loaders of Connect
plugins
* KAFKA-6185: Selector memory leak with high likelihood of OOM in case of
down conversion
* KAFKA-6269: KTable state restore fails after rebalance
* KAFKA-6190: GlobalKTable never finishes restoring when consuming
transactional messages
* KAFKA-6529: Stop file descriptor leak when client disconnects with staged
receives

Release notes for the 1.0.1 release:
http://home.apache.org/~ewencp/kafka-1.0.1-rc1/RELEASE_NOTES.html

*** Please download, test and vote by Thursday, Feb 15, 5pm PT ***

Kafka's KEYS file containing PGP keys we use to sign the release:
http://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
http://home.apache.org/~ewencp/kafka-1.0.1-rc1/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/

* Javadoc:
http://home.apache.org/~ewencp/kafka-1.0.1-rc1/javadoc/

* Tag to be voted upon (off 1.0 branch) is the 1.0.1 tag:
https://github.com/apache/kafka/tree/1.0.1-rc1

* Documentation:
http://kafka.apache.org/10/documentation.html

* Protocol:
http://kafka.apache.org/10/protocol.html


Thanks,
Ewen Cheslack-Postava


Re: [VOTE] 1.0.1 RC1

2018-02-12 Thread Ewen Cheslack-Postava
And of course I'm +1 since I've already done normal release validation
before posting this.

-Ewen

On Mon, Feb 12, 2018 at 10:15 AM, Ewen Cheslack-Postava 
wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the second candidate for release of Apache Kafka 1.0.1.
>
> This is a bugfix release for the 1.0 branch that was first released with
> 1.0.0 about 3 months ago. We've fixed 49 significant issues since that
> release. Most of these are non-critical, but in aggregate these fixes will
> have significant impact. A few of the more significant fixes include:
>
> * KAFKA-6277: Make loadClass thread-safe for class loaders of Connect
> plugins
> * KAFKA-6185: Selector memory leak with high likelihood of OOM in case of
> down conversion
> * KAFKA-6269: KTable state restore fails after rebalance
> * KAFKA-6190: GlobalKTable never finishes restoring when consuming
> transactional messages
> * KAFKA-6529: Stop file descriptor leak when client disconnects with
> staged receives
>
> Release notes for the 1.0.1 release:
> http://home.apache.org/~ewencp/kafka-1.0.1-rc1/RELEASE_NOTES.html
>
> *** Please download, test and vote by Thursday, Feb 15, 5pm PT ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~ewencp/kafka-1.0.1-rc1/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
>
> * Javadoc:
> http://home.apache.org/~ewencp/kafka-1.0.1-rc1/javadoc/
>
> * Tag to be voted upon (off 1.0 branch) is the 1.0.1 tag:
> https://github.com/apache/kafka/tree/1.0.1-rc1
>
> * Documentation:
> http://kafka.apache.org/10/documentation.html
>
> * Protocol:
> http://kafka.apache.org/10/protocol.html
>
>
> Thanks,
> Ewen Cheslack-Postava
>


Re: [VOTE] 1.0.1 RC1

2018-02-20 Thread Ewen Cheslack-Postava
Just a heads up that an issue with upgrades was found:
https://issues.apache.org/jira/browse/KAFKA-6238
There's a PR in progress
to address the underlying issue. There was a workaround, but it looked like
this would cause more pain than doing one more RC, so we'll have an RC2 up
soon after the PR is merged.

-Ewen

On Fri, Feb 16, 2018 at 3:53 AM, Mickael Maison 
wrote:

> Ran tests from source and quickstart with binaries
>
> +1 (non-binding)
>
> On Fri, Feb 16, 2018 at 6:05 AM, Jason Gustafson 
> wrote:
> > +1. Verified artifacts and ran quickstart. Thanks Ewen!
> >
> > -Jason
> >
> > On Thu, Feb 15, 2018 at 1:42 AM, Rajini Sivaram  >
> > wrote:
> >
> >> +1
> >>
> >> Ran quickstart with binaries, built source and ran tests,
> >>
> >> Thank you for running the release, Ewen.
> >>
> >> Regards,
> >>
> >> Rajini
> >>
> >> On Thu, Feb 15, 2018 at 2:31 AM, Guozhang Wang 
> wrote:
> >>
> >> > +1
> >> >
> >> > Ran tests, verified web docs.
> >> >
> >> > On Wed, Feb 14, 2018 at 6:00 PM, Satish Duggana <
> >> satish.dugg...@gmail.com>
> >> > wrote:
> >> >
> >> > > +1 (non-binding)
> >> > >
> >> > > - Ran testAll/releaseTarGzAll on 1.0.1-rc1
> >> > > <https://github.com/apache/kafka/tree/1.0.1-rc1> tag
> >> > > - Ran through quickstart of core/streams
> >> > >
> >> > > Thanks,
> >> > > Satish.
> >> > >
> >> > >
> >> > > On Tue, Feb 13, 2018 at 11:30 PM, Damian Guy 
> >> > wrote:
> >> > >
> >> > > > +1
> >> > > >
> >> > > > Ran tests, verified streams quickstart works
> >> > > >
> >> > > > On Tue, 13 Feb 2018 at 17:52 Damian Guy 
> >> wrote:
> >> > > >
> >> > > > > Thanks Ewen - i had the staging repo set up as profile that i
> >> forgot
> >> > to
> >> > > > > add to my maven command. All good.
> >> > > > >
> >> > > > > On Tue, 13 Feb 2018 at 17:41 Ewen Cheslack-Postava <
> >> > e...@confluent.io>
> >> > > > > wrote:
> >> > > > >
> >> > > > >> Damian,
> >> > > > >>
> >> > > > >> Which quickstart are you referring to? The streams quickstart
> only
> >> > > > >> executes
> >> > > > >> pre-built stuff afaict.
> >> > > > >>
> >> > > > >> In any case, if you're building a maven streams project, did
> you
> >> > > modify
> >> > > > it
> >> > > > >> to point to the staging repository at
> >> > > > >> https://repository.apache.org/content/groups/staging/ in
> addition
> >> > to
> >> > > > the
> >> > > > >> default repos? During rc it wouldn't fetch from maven central
> >> since
> >> > it
> >> > > > >> hasn't been published there yet.
> >> > > > >>
> >> > > > >> If that is configured, more complete maven output would be
> helpful
> >> to
> >> > > > track
> >> > > > >> down where it is failing to resolve the necessary archetype.
> >> > > > >>
> >> > > > >> -Ewen
> >> > > > >>
> >> > > > >> On Tue, Feb 13, 2018 at 3:03 AM, Damian Guy <
> damian@gmail.com
> >> >
> >> > > > wrote:
> >> > > > >>
> >> > > > >> > Hi Ewen,
> >> > > > >> >
> >> > > > >> > I'm trying to run the streams quickstart and I'm getting:
> >> > > > >> > [ERROR] Failed to execute goal
> >> > > > >> > org.apache.maven.plugins:maven-archetype-plugin:3.0.1:
> generate
> >> > > > >> > (default-cli) on project standalone-pom: The desired
> archetype
> >> > does
> >> > > > not
> >> > > > >> > exist (org.apache.kafka:streams-quickstart-java:1.0.1)
> >> > > > >> >
> >> > > > >> > Something I'm missing?

[VOTE] 1.0.1 RC2

2018-02-21 Thread Ewen Cheslack-Postava
Hello Kafka users, developers and client-developers,

This is the third candidate for release of Apache Kafka 1.0.1.

This is a bugfix release for the 1.0 branch that was first released with
1.0.0 about 3 months ago. We've fixed 49 issues since that release. Most of
these are non-critical, but in aggregate these fixes will have significant
impact. A few of the more significant fixes include:

* KAFKA-6277: Make loadClass thread-safe for class loaders of Connect
plugins
* KAFKA-6185: Selector memory leak with high likelihood of OOM in case of
down conversion
* KAFKA-6269: KTable state restore fails after rebalance
* KAFKA-6190: GlobalKTable never finishes restoring when consuming
transactional messages
* KAFKA-6529: Stop file descriptor leak when client disconnects with staged
receives
* KAFKA-6238: Issues with protocol version when applying a rolling upgrade
to 1.0.0

Release notes for the 1.0.1 release:
http://home.apache.org/~ewencp/kafka-1.0.1-rc2/RELEASE_NOTES.html

*** Please download, test and vote by Saturday Feb 24, 9pm PT ***

Kafka's KEYS file containing PGP keys we use to sign the release:
http://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
http://home.apache.org/~ewencp/kafka-1.0.1-rc2/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/

* Javadoc:
http://home.apache.org/~ewencp/kafka-1.0.1-rc2/javadoc/

* Tag to be voted upon (off 1.0 branch) is the 1.0.1 tag:
https://github.com/apache/kafka/tree/1.0.1-rc2

* Documentation:
http://kafka.apache.org/10/documentation.html

* Protocol:
http://kafka.apache.org/10/protocol.html

/**

Thanks,
Ewen Cheslack-Postava


Re: Question about developer documentation

2018-02-27 Thread Ewen Cheslack-Postava
The web page is more about general project info and might be of interest to
people beyond just developers. But I agree the wiki landing page could use
some updating. Even more than just the developer section as we're missing
several releases, the oldest ones are listed at the top, etc.

-Ewen

On Fri, Feb 9, 2018 at 10:53 AM, Ray Chiang  wrote:

> There is some documentation for developers at:
>
>   http://kafka.apache.org/project
>
> There's also another set of links at the bottom of this wiki page:
>
>   https://cwiki.apache.org/confluence/display/KAFKA/Index
>
> There's some minor duplication of information, but it's definitely not
> quite presented in a clean "step by step" manner.
>
> I think it could benefit from a reorganization of how the information is
> presented.  Before I start making suggestions, does anyone have any
> thoughts on the subject?
>
> -Ray
>
>


Re: Kafka 0.11.0.1 and filebeat 6.1 compatibility

2018-02-27 Thread Ewen Cheslack-Postava
filebeat is implemented using sarama last I checked, so presumably they are
on a version that doesn't know about Kafka 0.11.0.1 and therefore it
doesn't know which API versions to use.

Not sure if they support leaving it blank or exactly how the sarama config
works, but as far as I know sarama supports the ApiVersionsRequest that was
added back in 0.10.0.0, meaning it shouldn't need version info specified for
anything newer than that unless they specifically chose not to support some
API versions.

In either case, this is probably better answered by folks from Elastic or
the sarama project.

-Ewen

On Fri, Feb 9, 2018 at 7:11 AM, Sandeep Sarkar 
wrote:

> Hi All,
>
>
>
> I am using filebeat 6.1 and kafka 0.11.0.1 to push logs. From filebeat
> logs I could see that communication is getting established but then logs
> are not getting pushed.
>
> Also when I updated filebeat's output.kafka.version to 0.11.0.1 I get
> error message :
>
>
>
> 2018/02/08 09:41:15.417691 beat.go:635: CRIT Exiting: error initializing
> publisher:  unknown/unsupported kafka version '0.11.0.1' accessing
> 'output.kafka' (source:'/etc/filebeat/filebeat.yml')
>
> Exiting: error initializing publisher: unknown/unsupported kafka version
> '0.11.0.1' accessing 'output.kafka' (source:'/etc/filebeat/filebeat.yml')
>
>
>
> So I used kafka 0.10.2.0 and updated filebeat's output.kafka.version to
> 0.10.2.0 and the error message is gone but I do not get any logs in kafka.
> Filebeate throws
>
> Kafka publish failed with: circuit breaker is open
>
>
>
> And after some time it throws below message in a loop:
>
> 2018/02/09 15:08:13.075932 client.go:234: DBG [kafka] Kafka publish failed
> with: circuit breaker is open
>
> 2018/02/09 15:08:13.110793 client.go:220: DBG [kafka] finished kafka batch
>
> 2018/02/09 15:08:13.110814 client.go:234: DBG [kafka] Kafka publish failed
> with: circuit breaker is open
>
> 2018/02/09 15:08:13.149482 metrics.go:39: INFO Non-zero metrics in the
> last 30s: beat.info.uptime.ms=3 beat.memstats.gc_next=18583840
> beat.memstats.memory_alloc=17667464 beat.memstats.memory_total=3883962925856
> filebeat.harvester.open_files=11 filebeat.harvester.running=10
> libbeat.config.module.running=0 libbeat.output.events.batches=832
> libbeat.output.events.failed=1703936 libbeat.output.events.total=1703936
> libbeat.pipeline.clients=2 libbeat.pipeline.events.active=4118
> libbeat.pipeline.events.retry=1703936 registrar.states.current=21
>
>
>
>
>
> Please tell me how to debug this.
>
>
>
> I am using logstash 6.1 as consumer.
>
>
>
> --
> Thanks
>
> http://www.oracle.com/
> Sandeep Sarkar | Member of Technical Staff
> Phone: +918067445685
> Oracle CGBU PM&C
>
> Oracle India Bangalore
>
> http://www.oracle.com/commitment
>
> Oracle is committed to developing practices and products that help protect
> the environment
>
>
>
>
>


Re: [VOTE] 1.0.1 RC2

2018-03-02 Thread Ewen Cheslack-Postava
Thanks everyone for voting. This passes with 3 binding +1, 5 non-binding
+1, and no dissenting votes.

I'll work on getting the release finalized and send out an announcement
when it is ready.

-Ewen

On Tue, Feb 27, 2018 at 11:18 PM, Jason Gustafson 
wrote:

> +1. Verified artifacts and ran the basic quickstart.
>
> -Jason
>
> On Mon, Feb 26, 2018 at 1:08 AM, Manikumar 
> wrote:
>
> > +1 (non-binding)
> > Built src and ran tests
> > Ran core quick start
> >
> > On Sat, Feb 24, 2018 at 8:44 PM, Jakub Scholz  wrote:
> >
> > > +1 (non-binding) ... I used the Scala 2.12 binaries and run my tests
> with
> > > producers / consumers.
> > >
> > > On Thu, Feb 22, 2018 at 1:06 AM, Ewen Cheslack-Postava <
> > e...@confluent.io>
> > > wrote:
> > >
> > > > Hello Kafka users, developers and client-developers,
> > > >
> > > > This is the third candidate for release of Apache Kafka 1.0.1.
> > > >
> > > > This is a bugfix release for the 1.0 branch that was first released
> > with
> > > > 1.0.0 about 3 months ago. We've fixed 49 issues since that release.
> > Most
> > > of
> > > > these are non-critical, but in aggregate these fixes will have
> > > significant
> > > > impact. A few of the more significant fixes include:
> > > >
> > > > * KAFKA-6277: Make loadClass thread-safe for class loaders of Connect
> > > > plugins
> > > > * KAFKA-6185: Selector memory leak with high likelihood of OOM in
> case
> > of
> > > > down conversion
> > > > * KAFKA-6269: KTable state restore fails after rebalance
> > > > * KAFKA-6190: GlobalKTable never finishes restoring when consuming
> > > > transactional messages
> > > > * KAFKA-6529: Stop file descriptor leak when client disconnects with
> > > staged
> > > > receives
> > > > * KAFKA-6238: Issues with protocol version when applying a rolling
> > > upgrade
> > > > to 1.0.0
> > > >
> > > > Release notes for the 1.0.1 release:
> > > > http://home.apache.org/~ewencp/kafka-1.0.1-rc2/RELEASE_NOTES.html
> > > >
> > > > *** Please download, test and vote by Saturday Feb 24, 9pm PT ***
> > > >
> > > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > > http://kafka.apache.org/KEYS
> > > >
> > > > * Release artifacts to be voted upon (source and binary):
> > > > http://home.apache.org/~ewencp/kafka-1.0.1-rc2/
> > > >
> > > > * Maven artifacts to be voted upon:
> > > > https://repository.apache.org/content/groups/staging/
> > > >
> > > > * Javadoc:
> > > > http://home.apache.org/~ewencp/kafka-1.0.1-rc2/javadoc/
> > > >
> > > > * Tag to be voted upon (off 1.0 branch) is the 1.0.1 tag:
> > > > https://github.com/apache/kafka/tree/1.0.1-rc2
> > > >
> > > > * Documentation:
> > > > http://kafka.apache.org/10/documentation.html
> > > >
> > > > * Protocol:
> > > > http://kafka.apache.org/10/protocol.html
> > > >
> > > > /**
> > > >
> > > > Thanks,
> > > > Ewen Cheslack-Postava
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-186: Increase offsets retention default to 7 days

2018-03-05 Thread Ewen Cheslack-Postava
Reviving this thread.

@Stevo I think there are more refinements we can make, but I'd like to get
at least this pushed through because this keeps biting people unexpectedly.
How about we bump the default now just to address the immediate issue and
we can follow up with additional refinements in future KIPs?

It's also worth pointing out we also have
https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets.
That KIP dovetails nicely with this one -- this addresses the immediate
problem but can still expire offsets while the group is still active (which
is probably unintuitive). However, even if/when KIP-211 makes it in, this
will simply extend the period of the group being empty before removing
offsets to 7 days by default. This is still a significant improvement since
it helps with, e.g., cases where you have an outage and need to debug and
it takes longer than 24h. 1w is still a much better default even in the
case we wait for the group to be inactive before starting the expiration
timer.

I've also updated the KIP with the trivial PR:
https://github.com/apache/kafka/pull/4648

I'm going to kick off the vote thread so we can get this fixed for the next
version.
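
For reference, the whole change boils down to one broker default. A minimal
sketch of the override, expressed as Java Properties purely for illustration
(in practice the line lives in server.properties):

import java.util.Properties;

// The broker-side setting KIP-186 changes; 10080 minutes = 7 days.
Properties brokerProps = new Properties();
brokerProps.put("offsets.retention.minutes", String.valueOf(7 * 24 * 60));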

-Ewen


On Fri, Oct 6, 2017 at 11:00 AM, Manikumar 
wrote:

> looks like VOTE thread is *NOT* started for this KIP.
>
> On Fri, Oct 6, 2017 at 11:23 PM, Manikumar 
> wrote:
>
> > looks like VOTE thread is started for this KIP.
> >
> >
> > On Wed, Aug 16, 2017 at 5:39 PM, Stevo Slavić  wrote:
> >
> >> +1 for making consistent default log and offsets retention time.
> >> I like Stephane's suggestion too, log retention override should override
> >> offset retention too if not explicitly configured.
> >>
> >> Please consider additionally:
> >> - introducing offsets.retention.hours config property
> >> - syncing log and offsets retention.check.interval.ms, if there's no
> real
> >> reason for the two to differ
> >> -- consider making retention check interval by default (if not
> explicitly
> >> configured) a fraction of retention time
> >> - name all "offsets" configs with "offsets" prefix (now it's a mix of
> >> singular/"offset" and plural/"offsets")
> >>
> >>
> >> On Fri, Aug 11, 2017 at 2:01 AM, Guozhang Wang 
> >> wrote:
> >>
> >> > +1 from me
> >> >
> >> > On Wed, Aug 9, 2017 at 9:40 AM, Jason Gustafson 
> >> > wrote:
> >> >
> >> > > +1 on the bump to 7 days. Wanted to mention one minor point. The
> >> > > OffsetCommit RPC still provides the ability to set the retention
> time
> >> > from
> >> > > the client, but we do not use it in the consumer. Should we consider
> >> > adding
> >> > > a consumer config to set this? Given the problems people had with
> the
> >> old
> >> > > default, such a config would probably have gotten a fair bit of use.
> >> > Maybe
> >> > > it's less necessary with the new default, but there may be
> situations
> >> > where
> >> > > you don't want to keep the offsets for too long. For example, the
> >> console
> >> > > consumer commits offsets with a generated group id. We might want to
> >> set
> >> > a
> >> > > low retention time to keep it from filling the offset cache with
> >> garbage
> >> > > from such groups.
> >> > >
> >> >
> >> > I agree with Jason here, but maybe itself deserves a separate KIP
> >> > discussion.
> >> >
> >> >
> >> > >
> >> > > -Jason
> >> > >
> >> > > On Wed, Aug 9, 2017 at 5:24 AM, Sönke Liebau <
> >> > > soenke.lie...@opencore.com.invalid> wrote:
> >> > >
> >> > > > Just had this create issues at a customer as well, +1
> >> > > >
> >> > > > On Wed, Aug 9, 2017 at 11:46 AM, Mickael Maison <
> >> > > mickael.mai...@gmail.com>
> >> > > > wrote:
> >> > > >
> >> > > > > Yes the current default is too short, +1
> >> > > > >
> >> > > > > On Wed, Aug 9, 2017 at 8:56 AM, Ismael Juma 
> >> > wrote:
> >> > > > > > Thanks for the KIP, +1 from me.
> >> > > > > >
> >> > > > > > Ismael
> >> > > > > >
> >> > > > > > On Wed, Aug 9, 2017 at 1:24 AM, Ewen Cheslack-Postava <
> >> > > > e...@confluent.io
> >> > > > > >
> >> > > > > > wrote:
> >> > > > > >
> >> > > > > >> Hi all,
> >> > > > > >>
> >> > > > > >> I posted a simple new KIP for a problem we see with a lot of
> >> > users:
> >> > > > > >> KIP-186: Increase offsets retention default to 7 days
> >> > > > > >>
> >> > > > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> > > > > >> 186%3A+Increase+offsets+retention+default+to+7+days
> >> > > > > >>
> >> > > > > >> Note that in addition to the KIP text itself, the linked JIRA
> >> > > already
> >> > > > > >> existed and has a bunch of discussion on the subject.
> >> > > > > >>
> >> > > > > >> -Ewen
> >> > > > > >>
> >> > > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > Sönke Liebau
> >> > > > Partner
> >> > > > Tel. +49 179 7940878
> >> > > > OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel -
> >> Germany
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > -- Guozhang
> >> >
> >>
> >
> >
>


[VOTE] KIP-186: Increase offsets retention default to 7 days

2018-03-05 Thread Ewen Cheslack-Postava
I'd like to kick off voting for KIP-186:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-186%3A+Increase+offsets+retention+default+to+7+days

This is the trivial fix that people in the DISCUSS thread were in favor of.
There are some ideas for further refinements, but I think we can follow up
with those in subsequent KIPs, see the discussion thread for details. Also
note this is related, but complementary, to
https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets
.

And of course +1 (binding) from me.

Thanks,
Ewen


[ANNOUNCE] Apache Kafka 1.0.1 Released

2018-03-06 Thread Ewen Cheslack-Postava
The Apache Kafka community is pleased to announce the release for Apache Kafka
1.0.1.

This is a bugfix release for the 1.0 branch that was first released with 1.0.0 
about 4 months ago. We've fixed 49 issues since that release. Most of these are 
non-critical, but in aggregate these fixes will have significant impact. A few 
of the more significant fixes include:

* KAFKA-6277: Make loadClass thread-safe for class loaders of Connect plugins
* KAFKA-6185: Selector memory leak with high likelihood of OOM in case of down 
conversion
* KAFKA-6269: KTable state restore fails after rebalance
* KAFKA-6190: GlobalKTable never finishes restoring when consuming 
transactional messages
* KAFKA-6529: Stop file descriptor leak when client disconnects with staged 
receives
* KAFKA-6238: Issues with protocol version when applying a rolling upgrade to 
1.0.0


All of the changes in this release can be found in the release notes:


https://dist.apache.org/repos/dist/release/kafka/1.0.1/RELEASE_NOTES.html



You can download the source release from:


https://www.apache.org/dyn/closer.cgi?path=/kafka/1.0.1/kafka-1.0.1-src.tgz


and binary releases from:


https://www.apache.org/dyn/closer.cgi?path=/kafka/1.0.1/kafka_2.11-1.0.1.tgz
(Scala 2.11)

https://www.apache.org/dyn/closer.cgi?path=/kafka/1.0.1/kafka_2.12-1.0.1.tgz
(Scala 2.12)

---


Apache Kafka is a distributed streaming platform with four core APIs:


** The Producer API allows an application to publish a stream of records to
one or more Kafka topics.


** The Consumer API allows an application to subscribe to one or more
topics and process the stream of records produced to them.


** The Streams API allows an application to act as a stream processor,
consuming an input stream from one or more topics and producing an output
stream to one or more output topics, effectively transforming the input
streams to output streams.


** The Connector API allows building and running reusable producers or
consumers that connect Kafka topics to existing applications or data
systems. For example, a connector to a relational database might capture
every change to a table.



With these APIs, Kafka can be used for two broad classes of application:


** Building real-time streaming data pipelines that reliably get data
between systems or applications.


** Building real-time streaming applications that transform or react to the
streams of data.



Apache Kafka is in use at large and small companies worldwide, including 
Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank, 
Target, The New York Times, Uber, Yelp, and Zalando, among others.



A big thank you to the following 36 contributors to this release!

Alex Good, Andras Beni, Andy Bryant, Arjun Satish, Bill Bejeck, Colin P. 
Mccabe, Colin Patrick McCabe, ConcurrencyPractitioner, Damian Guy, Daniel 
Wojda, Dong Lin, Edoardo Comar, Ewen Cheslack-Postava, Filipe Agapito, fredfp, 
Guozhang Wang, huxihx, Ismael Juma, Jason Gustafson, Jeremy Custenborder, 
Jiangjie (Becket) Qin, Joel Hamill, Konstantine Karantasis, lisa2lisa, Logan 
Buckley, Manjula K, Matthias J. Sax, Nick Chiu, parafiend, Rajini Sivaram, 
Randall Hauch, Robert Yokota, Ron Dagostino, tedyu, Yaswanth Kumar, Yu.


We welcome your help and feedback. For more information on how to
report problems,
and to get involved, visit the project website at http://kafka.apache.org/


Thank you!
Ewen


Re: Exposing additional metadata in Kafka Connect schema parameters

2018-03-06 Thread Ewen Cheslack-Postava
I can only speak to my view on it, but I view logical types as truly
generic. The point is that they can be handled as the underlying type
safely, processed as such as needed, but if the logical type is properly
preserved they can properly translate through as the more "complicated"
type.

To me, this would mean that we try to preserve logical type names and
parameters where possible, such that use cases like you're suggesting would
be feasible -- we might just handle them as int32, but by carrying along
the relevant type/parameter info, we would enable other apps/systems to
handle them in a more nuanced way.

If you want to encode the column type, I would recommend using the same
approach Connect does for logical types (specifying the schema name as the
class that implements it), but in general using any name & parameters
*should* work, modulo any bugs in converters that fail to preserve that
information.
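
As a concrete illustration, a sketch using the Connect SchemaBuilder API --
the logical type name and parameter key below are made up, and Debezium
would choose its own:

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;

// Hypothetical: an int32 field that also records the original MySQL column
// type as a schema parameter.
Schema mediumIntSchema = SchemaBuilder.int32()
        .name("io.debezium.data.MediumInt")           // hypothetical logical type name
        .parameter("source.column.type", "MEDIUMINT") // hypothetical parameter key
        .optional()
        .build();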

-Ewen

On Tue, Mar 6, 2018 at 3:29 AM, Gunnar Morling  wrote:

> Hi,
>
> A user of the Debezium CDC Kafka Connect connectors has asked whether we
> could provide information about the original source type of captured table
> columns.
>
> Usually the type info we provide by using the Kafka Connect types and some
> custom semantic types is good enough. But there are some cases where
> additional type info would help: e.g. in case of MySQL, MEDIUMINT and INT
> columns are transmitted as Connect Int32 (as that's the smallest type which
> covers their value range). But from that, a consumer can't tell wether an
> INT or MEDIUMINT column should be created in a downstream database.
>
> Now my question is: would it be a reasonable thing for us to encode the
> original column type as an additional parameter of the Kafka Connect
> schemas (using a special parameter name), or would this be bending the
> concept of schema parameters too much? Admittedly, this metadata would be
> kind of source-specific, but I can see how it'd be beneficial in some use
> cases.
>
> Thanks for any advice,
>
> --Gunnar
>


Re: [VOTE] KIP-186: Increase offsets retention default to 7 days

2018-03-09 Thread Ewen Cheslack-Postava
Thanks for voting everyone, KIP is accepted with 5 binding +1 and 5
non-binding +1s.

PR is already merged and there are upgrade notes about the potential memory
impact.

-Ewen

On Wed, Mar 7, 2018 at 9:25 PM, Ted Yu  wrote:

> +1
>  Original message From: Guozhang Wang 
> Date: 3/7/18  9:17 PM  (GMT-08:00) To: dev@kafka.apache.org Subject: Re:
> [VOTE] KIP-186: Increase offsets retention default to 7 days
> +1 (binding).
>
> On Wed, Mar 7, 2018 at 5:04 PM, James Cheng  wrote:
>
> > +1 (non-binding)
> >
> > -James
> >
> > > On Mar 7, 2018, at 1:20 PM, Jay Kreps  wrote:
> > >
> > > +1
> > >
> > > I think we can improve this in the future, but this simple change will
> > > avoid a lot of pain. Thanks for reviving it Ewen.
> > >
> > > -Jay
> > >
> > > On Mon, Mar 5, 2018 at 11:35 AM, Ewen Cheslack-Postava <
> > e...@confluent.io>
> > > wrote:
> > >
> > >> I'd like to kick off voting for KIP-186:
> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >> 186%3A+Increase+offsets+retention+default+to+7+days
> > >>
> > >> This is the trivial fix that people in the DISCUSS thread were in
> favor
> > of.
> > >> There are some ideas for further refinements, but I think we can
> follow
> > up
> > >> with those in subsequent KIPs, see the discussion thread for details.
> > Also
> > >> note this is related, but complementary, to
> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >> 211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets
> > >> .
> > >>
> > >> And of course +1 (binding) from me.
> > >>
> > >> Thanks,
> > >> Ewen
> > >>
> >
> >
>
>
> --
> -- Guozhang
>


Re: Seeking Feedback on Kafka Connect Issues

2018-03-19 Thread Ewen Cheslack-Postava
Responses inline.

On Mon, Mar 19, 2018 at 3:02 PM, Matt Farmer  wrote:

> Hi everyone,
>
> We’ve been experimenting recently with some limited use of Kafka Connect
> and are hoping to expand to wider use cases soon. However, we had some
> internal issues that gave us a well-timed preview of error handling
> behavior in Kafka Connect. I think the fixes for this will require at least
> three different KIPs, but I want to share some thoughts to get the initial
> reaction from folks in the dev community. If these ideas seem reasonable, I
> can go ahead and create the required KIPs.
>
> Here are the three things specifically we ran into…
>
> ---
>
> (1) Kafka Connect only retries tasks when certain exceptions are thrown
> Currently, Kafka Connect only retries tasks when certain exceptions are
> thrown - I believe the logic checks to see if the exception is specifically
> marked as “retryable” and if not, fails. We’d like to bypass this behavior
> and implement a configurable exponential backoff for tasks regardless of
> the failure reason. This is probably two changes: one to implement
> exponential backoff retries for tasks if they don’t already exist and a
> change to implement a RetryPolicy interface that evaluates the Exception to
> determine whether or not to retry.
>

This has definitely come up before. The likely "fix" for this is to provide
general "bad data handling" options within the framework itself. The
obvious set would be

1. fail fast, which is what we do today (assuming connector actually fails
and doesn't eat errors)
2. retry (possibly with configs to limit)
3. drop data and move on
4. dead letter queue

This needs to be addressed in a way that handles errors from:

1. The connector itself (e.g. connectivity issues to the other system)
2. Converters/serializers (bad data, unexpected format, etc)
3. SMTs
4. Ideally the fmwk as well (though I don't think we have any known bugs
where this would be a problem, and we'd be inclined to just fix them
anyway).

I think we understand the space of problems and how to address them pretty
well already; this issue is really just a matter of someone finding the
time to KIP, implement, and review/commit. (And that review/commit step
realistically means we need multiple people's time.) Happy to guide anyone
interested on next steps. If not addressed by general community, Confluent
will get to this at some point, but I couldn't say when that would be --
Randall might know better than I would.
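
To make the shape of that concrete, a rough sketch of the RetryPolicy idea
floated above -- purely hypothetical, nothing like this exists in
connect-api today:

// Hypothetical sketch only.
public interface RetryPolicy {
    /** Decide whether the failed operation should be retried. */
    boolean shouldRetry(Exception error, int previousAttempts);

    /** Backoff before the next attempt, e.g. exponential with a cap. */
    long backoffMs(int previousAttempts);
}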


> (2) Kafka Connect doesn’t permit Connectors to smartly reposition after
> rebalance
> We’re using the S3 connector to dump files with a large number of records
> into an S3 bucket. About 100,000 records per file. Unfortunately, every
> time a task fails, the consumer rebalance causes all partitions to get
> re-shuffled amongst the various partitions. To compensate for this, the
> connector gets stopped and started from what I can tell from the logs? And
> then picks up from the last consumer position that was committed to the
> brokers.
>
> This doesn’t work great if you’re batching things into large numbers for
> archival.
>
> For the S3 connector, for example: Let’s say I have two partitions and the
> connector has two tasks to process each of those. Task 0 is at 5,000
> records read from the last commit and Task 1 is at 70,000 records read from
> the last commit. Then, boom, something goes wrong with Task 0 and it falls
> over. This triggers a rebalance and Task 1 has to take over the workload.
> Task 1 will, at this point, discard the 70,000 records in its buffer and
> start from the last commit point. This failure mode is brutal for the
> archival system we’re building.
>
>
Yes, this is a known pain point. Usually it shows up as more of an issue
for running a lot of connectors (where you don't want a tasks failure to
unnecessarily affect unrelated work), but the concern for connectors which
do relatively infrequent commits is valid as well. I'll make a point on the
first solution then see below for more complete answer.


> There are two solutions that I can think of to this:
>
> (A) Provide an interface for connectors to define their own rebalance
> listener. This listener could compare the newly assigned list of partitions
> with a previously assigned list. For all partitions that this connector was
> already working on prior to the rebalance, it could manually seek to the
> last position it locally processed before resuming. So, in the scenario
> above Task 1 could keep an accounting file locally and seek over the first
> 70,000 records without reprocessing them. It would then wait until after it
> confirms the S3 upload to commit those offsets back to Kafka. This ensures
> that if the machine running Task 1 dies a new consumer can take its place,
> but we’ll still benefit from a local cache if one is present.
>

For sink tasks, this actually already exists -- see
http://kafka.apache.org/10/javadoc/org/apache/kafka/connect/sink/SinkTask.html#open-jav
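
A minimal sketch of that hook in use -- everything here is the real SinkTask
API except lastLocallyFlushedOffset(), a hypothetical helper that would read
the task's own accounting file:

import java.util.Collection;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkTask;

public abstract class RepositioningSinkTask extends SinkTask {
    @Override
    public void open(Collection<TopicPartition> partitions) {
        for (TopicPartition tp : partitions) {
            Long resumeOffset = lastLocallyFlushedOffset(tp); // hypothetical
            if (resumeOffset != null)
                context.offset(tp, resumeOffset); // rewinds the underlying consumer
        }
    }

    protected abstract Long lastLocallyFlushedOffset(TopicPartition tp);
}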

Re: [DISCUSS] KIP-242: Mask password fields in Kafka Connect REST response

2018-03-19 Thread Ewen Cheslack-Postava
SSL authentication was added in KIP-208, which will be included in Kafka
1.1.0:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-208%3A+Add+SSL+support+to+Kafka+Connect+REST+interface

Connect isn't much different from the core Kafka/client configs today: in
some security setups you need to pass in passwords directly, and with
various dynamic broker config improvements in the works, the fact that
Connect exposes these in a REST API doesn't change that. I think the real
long-term solution is to add pluggable password support, where you could,
e.g., get these values out of a separate secrets management system instead
of specifying them directly. Masking passwords as described in this
solution feels more like a temporary workaround, and to be able to edit and
update these connector configs through the REST API we'd have to address
these issues anyway.
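
Purely as illustration of the pluggable idea (the reference syntax below is
invented -- no such resolver exists in Connect today), the stored config
would hold an indirect reference instead of the literal secret:

import java.util.HashMap;
import java.util.Map;

Map<String, String> connectorConfig = new HashMap<>();
connectorConfig.put("name", "jdbc-sink"); // hypothetical connector
// Resolved at runtime by a pluggable provider rather than stored verbatim:
connectorConfig.put("connection.password", "${secrets:prod/db:password}");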

-Ewen

On Mon, Mar 19, 2018 at 2:33 PM, Matt Farmer  wrote:

> What’s the status of this? This is a pretty hard blocker for us to meet
> requirements internally to deploy connect in a distributed fashion.
>
> @Ewen - Regarding the concern of accessing information securely - has
> there been any consideration of adding authentication to the connect api?
>
> > On Jan 17, 2018, at 3:55 PM, Randall Hauch  wrote:
> >
> > Vincent,
> >
> > Can the KIP more explicitly say that this is opt-in, and that by default
> > nothing will change?
> >
> > Randall
> >
> > On Tue, Jan 16, 2018 at 11:18 PM, Ewen Cheslack-Postava <
> e...@confluent.io>
> > wrote:
> >
> >> Vincent,
> >>
> >> I think with the addition of a configuration to control this for
> >> compatibility, people would generally be ok with it. If you want to
> start a
> >> VOTE thread, the KIP deadline is coming up and the PR looks pretty
> small. I
> >> will take a pass at reviewing the PR so we'll be ready to merge if we
> can
> >> get the KIP voted through.
> >>
> >> Thanks,
> >> Ewen
> >>
> >> On Fri, Jan 12, 2018 at 10:18 AM, Vincent Meng  wrote:
> >>
> >>> @Ted: The issue is kinda hard to reproduce. It's just something we
> >> observe
> >>> over time.
> >>>
> >>> @Ewen: I agree. Opt-in seems to be a good solution to me. To your
> >> question,
> >>> if there is no ConfDef that defines which fields are Passwords we can
> >> just
> >>> return the config as is.
> >>>
> >>> There is a PR for this KIP already. Comments/Discussions are welcome.
> >>> https://github.com/apache/kafka/pull/4269
> >>>
> >>> On Tue, Jan 2, 2018 at 8:52 PM, Ewen Cheslack-Postava <
> e...@confluent.io
> >>>
> >>> wrote:
> >>>
> >>>> Vincent,
> >>>>
> >>>> Thanks for the KIP. This is definitely an issue we know is a problem
> >> for
> >>>> some users.
> >>>>
> >>>> I think the major problem with the KIP as-is is that it makes it
> >>> impossible
> >>>> to get the original value back out of the API. This KIP probably ties
> >> in
> >>>> significantly with ideas for securing the REST API (SSL) and adding
> >> ACLs
> >>> to
> >>>> it. Both are things we know people want, but haven't happened yet.
> >>> However,
> >>>> it also interacts with other approaches to adding those features, e.g.
> >>>> layering proxies on top of the existing API (e.g. nginx, apache, etc).
> >>> Just
> >>>> doing a blanket replacement of password values with a constant would
> >>> likely
> >>>> break things for people who secure things via a proxy (and may just
> not
> >>>> allow reads of configs unless the user is authorized for the
> particular
> >>>> connector). These are the types of concerns we like to think through
> in
> >>> the
> >>>> compatibility section. One option to get the masking functionality in
> >>>> without depending on a bunch of other security improvements might be
> to
> >>>> make this configurable so users that need this (and can forgo seeing a
> >>>> valid config via the API) can opt-in.
> >>>>
> >>>> Regarding your individual points:
> >>>>
> >>>> * I don't think the particular value for the masked content matters
> >> much.
> >>>> Any constant ind

Re: Kafka Connect task re-balance repeatedly

2018-03-22 Thread Ewen Cheslack-Postava
The log is showing that the Connect worker is trying to make sure it has
read the entire log and gets to offset 119, but some other worker says it
has read to offset 169. The two are in inconsistent states, so the one that
seems to be behind will not start work with potentially outdated
configuration info.

Since it is logging that it *finished* reading to the end of the log, the
steps to check the offset and validate we've read those offsets have been
met. That seems to indicate the two workers aren't looking at the same data.

One case I'm aware of where this can happen is if you have two connect
workers that are configured to be in the same Connect worker group (their
worker configs have the same group.id) but they are actually looking at
different config topics (config.storage.topic is different for the two
workers). Sometimes people make this mistake when they start running
multiple Connect clusters but forget to make some of the default settings
unique. I'd probably start by looking into that possibility to debug this
issue.
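
Concretely, that failure mode corresponds to two workers configured roughly
like this (values are illustrative, shown as Java Properties for brevity):

import java.util.Properties;

// Both workers share group.id, so they join the same Connect group, but
// they read different config topics -- their states can never agree.
Properties workerA = new Properties();
workerA.put("group.id", "kafka-connect-NA-sink");
workerA.put("config.storage.topic", "connect-configs-a");

Properties workerB = new Properties();
workerB.put("group.id", "kafka-connect-NA-sink");
workerB.put("config.storage.topic", "connect-configs-b");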

-Ewen

On Wed, Mar 21, 2018 at 10:15 PM, Ziliang Chen  wrote:

> Hi,
>
> I have 2 Kafka Connect instances runs in 2 boxes which forms a Kafka
> Connect Cluster. One of the instance seems doing the re-balance repeatedly
> in a dead loop without running the actual Sink task, the other works fine.
> The following is the output message in the console.
>
> May I ask if you have ever encountered similar issue before ?
>
> Thank you so much !
>
>
> [2018-03-21 14:53:15,671] WARN Catching up to assignment's config offset.
> > (org.apache.kafka.connect.runtime.distributed.DistributedHerder:762)
> > [2018-03-21 14:53:15,671] INFO Current config state offset 119 is behind
> > group assignment 169, reading to end of config log
> > (org.apache.kafka.connect.runtime.distributed.DistributedHerder:807)
> > [2018-03-21 14:53:16,046] INFO Finished reading to end of log and updated
> > config snapshot, new config log offset: 119
> > (org.apache.kafka.connect.runtime.distributed.DistributedHerder:811)
> > [2018-03-21 14:53:16,046] INFO Current config state offset 119 does not
> > match group assignment 169. Forcing rebalance.
> > (org.apache.kafka.connect.runtime.distributed.DistributedHerder:786)
> > [2018-03-21 14:53:16,046] INFO Rebalance started
> > (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1214)
> > [2018-03-21 14:53:16,046] INFO Wasn't unable to resume work after last
> > rebalance, can skip stopping connectors and tasks
> > (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1246)
> > [2018-03-21 14:53:16,046] INFO [Worker clientId=connect-1,
> > groupId=kafka-connect-NA-sink] (Re-)joining group
> > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:336)
> > [2018-03-21 14:53:16,172] INFO [Worker clientId=connect-1,
> > groupId=kafka-connect-NA-sink] Successfully joined group with generation
> 14
> > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
> > [2018-03-21 14:53:16,173] INFO Joined group and got assignment:
> > Assignment{error=0,
> > leader='connect-1-3b12c02b-070e-431b-9ec9-924cffd2b6dc', leaderUrl='
> > http://1.1.1.1:8083/', offset=169, connectorIds=[Send_To_NA],
> >  taskIds=[Send_To_NA-1, Send_To_NA-3, Send_To_NA-5, Send_To_NA-7,
> > Send_To_NA-9, Send_To_NA-11, Send_To_NA-13, Send_To_NA-15, Send_To_NA-17,
> > Send_To_NA-19, Send_To_NA-21]}
> > (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1192
>
>
>
> --
> Regards, Zi-Liang
>
> Mail:zlchen@gmail.com
>


Re: [DISCUSSION] KIP-266: Add TimeoutException to KafkaConsumer#position()

2018-03-23 Thread Ewen Cheslack-Postava
Regarding the flexibility question, has someone tried to dig up the
discussion of the new consumer APIs when they were being written? I vaguely
recall these exact questions about using APIs vs configs and flexibility vs
bloating the API surface area having already been discussed. (Not that we
shouldn't revisit, just that it might also be a faster way to get to a full
understanding of the options, concerns, and tradeoffs).
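
For readers skimming the thread, the two shapes being weighed look roughly
like this (both hypothetical at this point; the final API may differ):

import java.util.Properties;

// (a) One config governing every blocking call, mirroring the producer's
//     max.block.ms (hypothetical consumer config name):
Properties props = new Properties();
props.put("max.block.ms", "30000");

// (b) Per-call timeout overloads (hypothetical signature):
// long offset = consumer.position(partition, java.time.Duration.ofSeconds(30));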

-Ewen

On Thu, Mar 22, 2018 at 7:19 AM, Richard Yu 
wrote:

> I do have one question though: in the current KIP, throwing
> TimeoutException to mark
> that time limit is exceeded is applied to all new methods introduced in
> this proposal.
> However, how would users respond when a TimeoutException is thrown (since
> it is considered a RuntimeException)?
>
> Thanks,
> Richard
>
>
>
> On Mon, Mar 19, 2018 at 6:10 PM, Richard Yu 
> wrote:
>
> > Hi Ismael,
> >
> > You have a great point. Since most of the methods in this KIP have
> similar
> > callbacks (position() and committed() both use fetchCommittedOffsets(),
> > and
> > commitSync() is similar to position(), except just updating offsets), the
> > amount of time
> > they block should be also about equal.
> >
> > However, I think that we need to take into account a couple of things.
> For
> > starters,
> > if the new methods were all reliant on one config, there is likelihood
> > that the
> > shortcomings for this approach would be similar to what we faced if we
> let
> > request.timeout.ms control all method timeouts. In comparison, adding
> > overloads
> > does not have this problem.
> >
> > If you have further thoughts, please let me know.
> >
> > Richard
> >
> >
> > On Mon, Mar 19, 2018 at 5:12 PM, Ismael Juma  wrote:
> >
> >> Hi,
> >>
> >> An option that is not currently covered in the KIP is to have a separate
> >> config max.block.ms, which is similar to the producer config with the
> >> same
> >> name. This came up during the KAFKA-2391 discussion. I think it's clear
> >> that we can't rely on request.timeout.ms, so the decision is between
> >> adding
> >> overloads or adding a new config. People seemed to be leaning towards
> the
> >> latter in KAFKA-2391, but Jason makes a good point that the overloads
> are
> >> more flexible. A couple of questions from me:
> >>
> >> 1. Do we need the additional flexibility?
> >> 2. If we do, do we need it for every blocking method?
> >>
> >> Ismael
> >>
> >> On Mon, Mar 19, 2018 at 5:03 PM, Richard Yu  >
> >> wrote:
> >>
> >> > Hi Guozhang,
> >> >
> >> > I made some clarifications to KIP-266, namely:
> >> > 1. Stated more specifically that commitSync will accept user input.
> >> > 2. fetchCommittedOffsets(): Made its role in blocking more clear to
> the
> >> > reader.
> >> > 3. Sketched what would happen when time limit is exceeded.
> >> >
> >> > These changes should make the KIP easier to understand.
> >> >
> >> > Cheers,
> >> > Richard
> >> >
> >> > On Mon, Mar 19, 2018 at 9:33 AM, Guozhang Wang 
> >> wrote:
> >> >
> >> > > Hi Richard,
> >> > >
> >> > > I made a pass over the KIP again, some more clarifications /
> comments:
> >> > >
> >> > > 1. seek() call itself is not blocking, only the following poll()
> call
> >> may
> >> > > be blocking as the actually metadata rq will happen.
> >> > >
> >> > > 2. I saw you did not include Consumer.partitionFor(),
> >> > > Consumer.OffsetAndTimestamp() and Consumer.listTopics() in your KIP.
> >> > After
> >> > > a second thought, I think this may be a better idea to not tackle
> >> them in
> >> > > the same KIP, and probably we should consider whether we would
> change
> >> the
> >> > > behavior or not in another discussion. So I agree to not include
> them.
> >> > >
> >> > > 3. In your wiki you mentioned "Another change shall be made to
> >> > > KafkaConsumer#poll(), due to its call to updateFetchPositions()
> which
> >> > > blocks indefinitely." This part may a bit obscure to most readers
> >> who's
> >> > not
> >> > > familiar with the KafkaConsumer internals, could you please add more
> >> > > elaborations. More specifically, I think the root causes of the
> public
> >> > APIs
> >> > > mentioned are a bit different while the KIP's explanation sounds
> like
> >> > they
> >> > > are due to the same reason:
> >> > >
> >> > > 3.1 fetchCommittedOffsets(): this internal call will block forever
> if
> >> the
> >> > > committed offsets cannot be fetched successfully and affect
> position()
> >> > and
> >> > > committed(). We need to break out of its internal while loop.
> >> > > 3.2 position() itself will while loop when offsets cannot be
> >> retrieved in
> >> > > the underlying async call. We need to break out this while loop.
> >> > > 3.3 commitSync() passed Long.MAX_VALUE as the timeout value, we
> should
> >> > take
> >> > > the user specified timeouts when applicable.
> >> > >
> >> > >
> >> > >
> >> > > Guozhang
> >> > >
> >> > > On Sat, Mar 17, 2018 at 4:44 PM, Richard Yu <
> >> yohan.richard...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Actually, what

Re: [VOTE] KIP-272: Add API version tag to broker's RequestsPerSec metric

2018-03-30 Thread Ewen Cheslack-Postava
+1 (binding)

The incompatibility is unfortunate, but seems unlikely to cause a problem
in practice. Let's just make sure there's a note in the upgrade notes about
the incompatibility when we have a PR for this.
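
For the record, the change amounts to adding an API version tag to the
existing per-request-type metric, i.e. something along these lines (the
exact tag name is as specified in the KIP):

kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce,version=5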

-Ewen

On Fri, Mar 30, 2018 at 10:22 AM, Jun Rao  wrote:

> Hi, Allen,
>
> Thanks for the explanation. That's a great point. Could you add that to the
> rejected section of the KIP.  +1 on this KIP.
>
> Jun
>
> On Fri, Mar 30, 2018 at 10:06 AM, Allen Wang  wrote:
>
> > Hi Jun,
> >
> > I think this KIP helps to gain insight into how many clients have been
> > upgraded *before* the we decide to upgrade the message format. In other
> > words, users will know if they will pay down conversion cost before
> message
> > format upgrade. It is about making an informed decision. To be
> > conservative, users can wait until all clients have been upgraded before
> > changing message format, which will guarantee zero down conversion cost.
> >
> > While for the KIP you mentioned, users will only know the cost of down
> > conversion after format upgrade, which may be too late.
> >
> > Thanks,
> > Allen
> >
> >
> >
> > On Thu, Mar 29, 2018 at 6:16 PM, Jun Rao  wrote:
> >
> > > Hi, Allen,
> > >
> > > Thanks for the KIP.
> > >
> > > It seems the main motivation of the KIP is to estimate the ratio of the
> > > clients doing down conversion. I am wondering if we can do that by
> > relying
> > > on the metrics we added in 1.0.0 (
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 188+-+Add+new+metrics+to+support+health+checks#KIP-188-
> > > Addnewmetricstosupporthealthchecks-Messageconversionrateandtime).
> > > This metric reports the message rate of down conversion. By comparing
> > this
> > > to the message rate of all consumers, you can roughly estimate the
> ratio
> > of
> > > consumers still needing down conversion. Does that cover the main thing
> > > that you want from this KIP?
> > >
> > > Jun
> > >
> > >
> > >
> > > On Wed, Mar 28, 2018 at 9:55 AM, Allen Wang 
> > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I would like to start voting for KIP-272:  Add API version tag to
> > > broker's
> > > > RequestsPerSec metric.
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 272%3A+Add+API+version+tag+to+broker%27s+RequestsPerSec+metric
> > > >
> > > > Thanks,
> > > > Allen
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-276: Add StreamsConfig prefix for different consumers

2018-04-04 Thread Ewen Cheslack-Postava
I think this model is more confusing than it needs to be.

We end up with 4 prefixes despite only having 3 types of consumers. We have
prefixes for: "base", "main", "global", and "restore". However, we only
instantiate consumers of type "main", "global", and "restore".

Until now, we've only had two types of configs mapping to two types of
consumers, despite internally having some common shared configs as a
baseline to bootstrap the two "public" ones (see
StreamsConfig.getCommonConsumerConfigs). Do we want to complicate this to 4
levels of "public" configs when there are only 3 types of concrete configs
we instantiate?

More generally, I worry that we're optimizing too much to avoid copy/paste
in less common cases to the point that we would confuse users with yet more
concepts before they can even write their configuration. What if we took a
(perhaps modified) all-or-nothing approach to inheriting from the "main"
consumer properties? In that mode, if you override *anything*, you
should specify *everything*. But if it was just, e.g. group.id, client.id,
and offset.reset that needed adjustment for a restoreConsumer, that would
be the default and everything else is inherited. Same deal for a clearly
specified set of configs for global consumers that required override.

I feel like I'm also concurrently seeing the opposite side of this problem
in Connect where we namespaced and didn't proactively implement
inheritance; and while people find the config duplication annoying (and
confusing!), we inevitably find cases where they need it. I think
inheritance in these configuration setups needs to be approached very
carefully. Admittedly, some of the challenges in Connect don't appear here
(e.g. conflicts in producer/consumer config naming, since this is a
Consumer-only KIP), but similar problems arise.

-Ewen

On Wed, Apr 4, 2018 at 10:56 PM, Boyang Chen  wrote:

> Thanks Guozhang! I already updated the pull request and KIP to deprecate
> getConsumerConfigs() function. Do you think we could move to a voting stage
> now?
>
>
> 
> From: Guozhang Wang 
> Sent: Thursday, April 5, 2018 9:52 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-276: Add StreamsConfig prefix for different
> consumers
>
> I agree that renaming the method in this case may not worth it. Let's keep
> the existing function names.
>
> On Wed, Apr 4, 2018 at 6:06 PM, Matthias J. Sax 
> wrote:
>
> > Thanks for updating the KIP.
> >
> > One more comment. Even if we don't expect users to call
> > `StreamsConfig#getConsumerConfigs()` it is still public API. Thus, we
> > cannot just rename the method.
> >
> > I think we have two options: either we keep the current name or we
> > deprecate the method and introduce `getMainConsumerConfigs()` in
> > parallel. The deprecated method would just call the new method.
> >
> > I am not sure if we gain a lot renaming the method, thus I have a slight
> > preference in just keeping the method name as is. (But I am also ok to
> > rename it, if people prefer this)
> >
> > -Matthias
> >
> >
> > On 4/3/18 1:59 PM, Bill Bejeck wrote:
> > > Boyang,
> > >
> > > Thanks for making changes to the KIP,  I'm +1 on the updated version.
> > >
> > > -Bill
> > >
> > > On Tue, Apr 3, 2018 at 1:14 AM, Boyang Chen 
> wrote:
> > >
> > >> Hey friends,
> > >>
> > >>
> > >> both the KIP
> > >> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-276+Add+StreamsConfig+prefix+for+different+consumers>
> > >> and the pull request <https://github.com/apache/kafka/pull/4805> are
> > >> updated. Feel free to take another look.
> > >>
> > >>
> > >>
> > >> Thanks for your valuable feedback!
> > >>
> > >>
> > >> Best,
> > >>
> > >> Boyang
> > >>
> > >> 
> > >> From: Boyang Chen 
> > >> Sent: Tuesday, April 3, 2018 11:39 AM
> > >> To: dev@kafka.apache.org
> > >> Subject: Re: [DISCUSS] KIP-276: Add StreamsConfig prefix for different
> > >> consumers
> > >>
> > >> Thanks Matthias, Ted and Guozhang for the inputs. I shall address them
> > in
> > >> next round.
> > >>
> > >>
> > >> 
> > >> From: Matthias J. Sax 
> > >> Sent: Tuesday, April 3, 2018 4:43 AM
> > >> To: dev@kafka.apache.org
> > >> Subject: Re: [DISCUSS] KIP-276: Add StreamsConfig prefix for different
> > >> consumers
> > >>
> > >> Yes, your examples make sense to me. That was the idea behind the
> > proposal.
> > >>
> > >>
> > >> -Matthias
> > >>
> > >> On 4/2/18 11:25 AM, Guozhang Wang wrote:
> > >>> @Matthias
> > >>>
> > >>> That's a good question: I think adding another config for the main
> > >> consumer
> > >>> makes good tradeoffs between compatibility and new user convenience.
> > Just
> > >>> to clarify, from user's pov on upgrade:
> > >>>
> > >>> 1) I'm already overriding some consumer configs, and now I want to
> > >> override
> > >>> these values differently for restore consumers, I'd add one new line
> > for
> > >>> the restore consumer prefix.
> > >>>
> > >>> 2) I'm already

Re: [DISCUSS] KIP-276: Add StreamsConfig prefix for different consumers

2018-04-11 Thread Ewen Cheslack-Postava
On Thu, Apr 5, 2018 at 3:28 PM, Matthias J. Sax 
wrote:

> Ewen,
>
> I cannot completely follow your argument. Can you elaborate a little
> bit? After reading your mail, I am not sure if you prefer config
> inheritance or not? And if so, to what extent?
>

So we've had issues with config complexity in a couple of places. On the
one hand I think having the ability to specify configs separately for
different clients is important. In that regard namespacing the different
configs is important. This is particularly important when you can have
conflicting configs. This comes up frequently with interceptors because the
producer and consumer use the same config name, but require classes that
implement different interfaces. If you have both a producer and consumer,
you need to have this namespacing.
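
A concrete instance of the conflict, since a Streams app configures both
clients through one StreamsConfig (the interceptor class names below are
hypothetical):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
// Same config name, different required interfaces (ProducerInterceptor vs
// ConsumerInterceptor), so the existing prefixes keep them apart:
props.put(StreamsConfig.producerPrefix("interceptor.classes"),
          "com.example.MyProducerInterceptor");
props.put(StreamsConfig.consumerPrefix("interceptor.classes"),
          "com.example.MyConsumerInterceptor");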

On the other hand, in the most common case where the user wants the same
configs for everything (i.e. just pass the bootstrap brokers, or share
common authentication configs), the namespacing becomes a pain. We have
this issue in connect for worker, producer and consumer settings, where if
you want to connect to a secured cluster you duplicate the info.

So because of that I like the *idea* of inheritance, but all the proposals
I've seen so far basically just say "use base configuration and add
overrides". See notes below about why I don't think this is a good idea.


>
>
> > In that mode if you override *anything*, you
> >> should specify *everything*.
>
> Can you give an example for this?
>

Security is the most obvious example of this. If you're overriding any
security config, you clearly have a different security setup than the
parent and inheriting stuff is just confusing and requires digging through
logs to see the "effective" config if something isn't working.

One way to keep this simpler to understand is that you either use the base
config entirely or, if you specify even a single config with the special
prefix, you *only* use the prefixed configs. There might be cases where you
duplicate the same settings, but the all-or-nothing approach makes it easy
to understand what is happening, and you still only pay the cost of larger
configs if you need the advanced functionality of customizing different
clients.

The unfortunate side effect of this is that some things, e.g. bootstrap
brokers, probably don't make sense to customize under the different types
of clients but you'd still need to duplicate those settings. This is why I
was saying I haven't really seen a clean solution to this.



>
>
> > But if it was just, e.g. group.id, client.id,
> >> and offset.reset that needed adjustment for a restoreConsumer, that
> would
> >> be the default and everything else is inherited.
>
> Don't understand this part.
>
>
> > I think
> >> inheritance in these configuration setups needs to be approached very
> >> carefully.
>
> Agreed. I think that our approach is carefully designed though.
>
>
> For follow up on Guozhang's argument, I agree with him, that for most
> users the existing prefix would be good enough and the three new
> prefixes are for advanced users.
>

This is maybe where the biggest gap in the proposal is. It just says:

> For some use cases, it's required to set different configs for different
consumers.

Why? what are those use cases? What case isn't handled by the configs we
have today? It seems weird that different consumers within the same app
would need different client-side settings.

-Ewen


>
>
>
> -Matthias
>
> On 4/5/18 11:37 AM, Guozhang Wang wrote:
> > Ewen, thanks for your thoughts.
> >
> > Just to clarify the current KIP does not propose for four new prefixes,
> but
> > three plus the existing one. So it is
> >
> > "consumer"
> > "main.consumer"
> > "restore.consumer"
> > "global.consumer"
> >
> >
> > If we design the configs for the first time, I'd be in favor of only
> > keeping the last three prefixes. But as of now we have a compatibility
> > issue to consider. So the question is if it is worthwhile to break the
> > compatibility and enforce users to make code changes. My rationale is
> that:
> >
> > 1) for normal users they would not bother overriding configs for
> different
> > types of consumers, where "consumer" prefix is good enough for them; and
> > today they probably have already made those overrides via "consumer".
> >
> > 2) for advanced users they would need some additional overrides for
> > different types of consumers, and they would go ahead and learn about the
> > other three prefixes and set them there.
> >
> >
> > I

Re: [DISCUSS] add connect-related packages to "What is considered a "major change" that needs a KIP?"

2018-08-07 Thread Ewen Cheslack-Postava
First, I agree, updating that list would be a good idea. It's likely it
will always be a little divergent from any new additions -- the last update
was probably when the KIP page was originally created, before either
Connect or Streams existed.

However, note that we also document the exact set of public APIs, or at
least try to, in the user-facing javadocs
http://kafka.apache.org/20/javadoc/index.html?overview-summary.html If its
not in there, it likely is not public API.

In the case of Connect, the only public Java APIs are in their own module,
connect-api. Everything in the other connect modules is considered an
internal implementation detail.

-Ewen


On Fri, Jul 27, 2018 at 7:15 PM Chia-Ping Tsai  wrote:

> hi Kafka
>
> There is a section[1] listing the packages which have public interfaces.
> However, it doesn't include any connect-related packages. It would be
> better to have a list highlighting the public interfaces for connectors.
> Otherwise, connector users may misuse internal interfaces in their code
> base. For example, TopicAdmin[2] is a nice wrapper around AdminClient[3],
> so I have used TopicAdmin in production code. However, I'm not sure whether
> TopicAdmin is a public interface. If it isn't, I should remove it from my
> code base since the source compatibility of TopicAdmin may be broken in a
> minor release.
>
> I'm not familiar with the process for updating "Kafka Improvement
> Proposals", so I'm starting this thread. All suggestions are welcome!
>
> [1]
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#KafkaImprovementProposals-Whatisconsidereda%22majorchange%22thatneedsaKIP
> ?
>
> [2]
> https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/util/TopicAdmin.java
>
> [3]
> https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/admin/AdminClient.java
>
> --
> Chia-Ping
>


Re: Apache Kafka project charter

2018-09-30 Thread Ewen Cheslack-Postava
Hey all,

Sorry I haven't been closely following the threads on this, but I think I
can provide a bit more color.

Jakub, re: general policy, I'll take the blame that the relevant "rejected
alternatives" section in the KIP
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767#KIP-26-AddKafkaConnectframeworkfordataimport/export-Maintainconnectorsintheprojectalongwithframework
never made it into documentation. That means everything related to that
decision is currently locked up in the KIP or, possibly worse, the
difficult-to-search mailing list archives.

The reasoning for that decision was to control the scope of what gets added
to AK itself, to set expectations around maintenance, and frankly to make
cases like this more clear cut rather than subjective case-by-case decisions.
There are plenty of examples of other projects with a similar pluggable
structure where ownership, maintenance, responsibility for quality, and more
become *really* hard to reason about because *some* things are upstreamed
into the main project, others aren't, maintainers come and go, etc. (see
Logstash, Flume, etc.). Pushing basically everything except for a
simple example out to be community maintained helps make many of these
characteristics clear: look to the maintainer of the connector for guidance
on support level, compatibility, commitment of the maintainer, etc. Apache
Kafka will maintain the core framework.

KIP-310 is, admittedly, a bit of an odd case as it is all about Kafka. It's
much harder for the Apache Kafka project to commit to maintaining, for
example, a MongoDB connector than one that deals specifically with Kafka.
The fact that MM has been included and maintained certainly muddies the
waters here.

On the flip side, consider that although it's generally implicit in KIP
discussions, all KIPs come with the consideration of whether something
makes sense to be supported by the Apache Kafka project vs. by the
broader community. e.g. KIP-10 (Mesos integration), KIP-69 (schema
registry), KIP-80 (REST server), KIP-127 (Pluggable JAAS LoginModule, which
was better implemented by a more general KIP), etc. There's a legitimate
concern wrt whether the Apache project can provide the throughput to
maintain substantial new chunks of code, in addition to the large surface
area we already have, maintain quality, provide the same level of
compatibility and testing we provide today, etc. Sometimes it makes sense
for a community member or team to maintain these projects; note that
connectors are far from the first example of this -- non-Java clients have
never been maintained by the Apache project for this same reason -- there
wasn't sufficient expertise or throughput to do so.

wrt a single connector, I actually don't care much where it is maintained
as long as:

1. Policy for in/out for other connectors in the project is clear. To date,
we haven't really had a question about this until now.
2. There's clear, committed maintainership to keep the component healthy;
this includes committer throughput for the feature itself, subsequent
bugfixes, KIPs, etc. (and tbh, this is thin currently on Connect features,
something I'm very aware of)
3. If it lands in Apache Kafka, there's an immediate compatibility
commitment (compatibility across versions of AK broker & Connect, as well
as upgrade, API, and config compatibility for the connector itself).

-Ewen

On Sun, Sep 30, 2018 at 1:44 PM Matthias J. Sax 
wrote:

> I am not aware of anything like this. And I also think, it's difficult
> to generalize. So far, each feature is discussed on a per-case basis.
>
> Because it's hard to draw the border line, we might be too restrictive
> or too loose in a "project charter", thus scaring people from starting
> KIPs, which would be bad for the community and the project IMHO.
>
> I also think that the overhead of writing a KIP is not too large, and
> thus the risk (and "wasted time") that a KIP is rejected because "not
> part of the project" is rather small. Also, anybody could suggest a
> feature and collect feedback on the mailing list even before a concrete
> KIP is proposed.
>
> Just my 2 cents.
>
>
> -Matthias
>
>
>
> On 9/29/18 4:31 AM, Jakub Scholz wrote:
> > Hi community,
> >
> > I noticed following argument in the discussion about KIP-310.
> >
> >> However, I don't think the apache/kafka repository is the right place to
> > host such a Connector.
> >
> > I was wondering whether there is some project charter describing what
> does
> > and what does not belong to the Apache Kafka project. I tried to search
> for
> > it, but I haven't found anything.
> >
> > If nothing like that exists, I wonder if we should write something. I
> think
> > it's not very community friendly to let people write the KIP just to get a
> > feedback like this. By that I do not mean that the point raised by
> > Konstantine is necessarily wrong. All I'm trying to say is that I think
> > there should be some project charter which would describe what does 

Re: Anyone interested in helping out a little brother Apache project with their Kafka integration?

2018-09-30 Thread Ewen Cheslack-Postava
Chris,

Can you point at the starting point for your Connector? Happy to do a quick
review of code.

I'll admit, I'm a bit confused by the PLC4X positioning -- it seems to
claim to be a universal IoT protocol, but then I see, e.g., nothing about
MQTT on the front page, a protocol that Kafka Connect already has a couple
of Connectors for. Could you clarify what format/serialization/use case
PLC4X has?

-Ewen


On Wed, Aug 15, 2018 at 7:43 AM Christofer Dutz 
wrote:

> Hi all,
>
> I am one of the Apache PLC4X (incubating) committers and am looking for
> people willing to help out with a little thing.
>
> PLC4X is aiming at industrial programmable logic controllers. So what we
> are doing is similar to what JDBC did in the late 90s.
>
> We’ve implemented a universal API with which we can write software to
> access industrial PLCs in a unified way.
> In contrast to OPC-UA we do not require any changes on the devices as we
> implement the drivers for the individual protocols as main part of the
> PLC4X project.
>
> We hope that this will help us use all our cool Apache software for
> building the next generation of Industry 4.0 solutions,
> as currently it’s a real problem to communicate with these little gray
> boxes directly.
>
> We not only work hard on the API and the matching drivers, but also on
> integration modules to easily use PLC4X in other frameworks.
> We already have full Apache Camel and Apache Edgent support.
>
> Even though we do have examples for reading and writing data from and to
> Kafka, we would really like to provide a Kafka-Connect adapter.
> This would make it super easy to communicate with industrial controllers
> from inside the Kafka ecosystem.
>
> I have already implemented a stub of a Kafka Connect plugin with Source
> and Sink partially implemented.
> So this stub has all the code for reading and writing via PLC4X but I am
> not deep enough into Kafka to think I should finish this
> implementation without support from people who actually know
> what they are doing on the Kafka side.
>
> So is anyone here able and willing to help with this Kafka Connect Plugin
> for PLC4X?
>
> Looking forward to positive responses ;-)
>
> Chris
>


Re: [DISCUSS] KIP-190: Handle client-ids consistently between clients and brokers

2017-09-22 Thread Ewen Cheslack-Postava
Hi all,

In working on the patch for KIP-196: Add metrics to Kafka Connect
framework, we realized that we have worker and connector/task IDs that are
to be included in metrics and those don't currently have constraints on
naming. I'd prefer to avoid adding naming restrictions or mangling names
unnecessarily, and for users that define a custom metrics reporter the name
mangling may be unexpected since their metrics system may not have the same
limitations as JMX.

The text of the KIP is pretty JMX specific and doesn't really define where
this mangling happens. Currently, it is being applied essentially as early
as possible. I would like to propose moving the name mangling into the
JmxReporter itself so the only impact is on JMX metrics, but other metrics
reporters would see the original. In other words, leave system-specific
name mangling up to that system.
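
As a sketch of what that would enable, a custom reporter could then apply
exactly (and only) the sanitization its backend needs -- something like the
following (the backend and its rules are hypothetical, but MetricsReporter is
the real pluggable interface):

    import java.util.List;
    import java.util.Map;
    import org.apache.kafka.common.metrics.KafkaMetric;
    import org.apache.kafka.common.metrics.MetricsReporter;

    public class MyBackendReporter implements MetricsReporter {
        @Override
        public void configure(Map<String, ?> configs) {}

        @Override
        public void init(List<KafkaMetric> metrics) {
            for (KafkaMetric metric : metrics)
                metricChange(metric);
        }

        @Override
        public void metricChange(KafkaMetric metric) {
            // With the proposed change this sees the original, unmangled
            // name and sanitizes only as much as this backend requires.
            String name = sanitize(metric.metricName().name());
            // ... register/update the metric in the backend ...
        }

        @Override
        public void metricRemoval(KafkaMetric metric) {}

        @Override
        public void close() {}

        private String sanitize(String name) {
            // Example backend-specific rule; JMX would keep its own rules
            // inside JmxReporter instead.
            return name.replaceAll("[^a-zA-Z0-9._-]", "_");
        }
    }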

In the JmxReporter, the mangling could remain the same (though I think the
mangling for principals is an internal implementation detail, whereas this
name mangling is user-visible). The quota metrics on the broker would now
be inconsistent with the others, but I think trying to be less JMX-specific
given that we support pluggable reporters is the right direction to go.

I think in practice this has no publicly visible impact and definitely no
compatibility concerns, it just moves where we're doing the JMX name
mangling. However, since discussion about metric naming/character
substitutions had happened here recently I wanted to raise it here and make
sure there would be agreement on this direction.

(Long term I'd also like to get the required instantiation of JmxReporter
removed as well, but that requires its own KIP.)

Thanks,
Ewen

On Thu, Sep 14, 2017 at 2:09 AM, Tom Bentley  wrote:

> Hi Mickael,
>
> I was just wondering why the restriction was imposed for Java clients in
> the first place, do you know?
>
> Cheers,
>
> Tom
>
> On 14 September 2017 at 09:16, Ismael Juma  wrote:
>
> > Thanks for the KIP Mickael. I suggest starting a vote.
> >
> > Ismael
> >
> > On Mon, Aug 21, 2017 at 2:51 PM, Mickael Maison <
> mickael.mai...@gmail.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > I have created a KIP to cleanup the way client-ids are handled by
> > > brokers and clients.
> > >
> > > Currently the Java clients have some restrictions on the client-ids
> > > that are not enforced by the brokers. Using 3rd party clients,
> > > client-ids containing any characters can be used causing some strange
> > > behaviours in the way brokers handle metrics and quotas.
> > >
> > > Feedback is appreciated.
> > >
> > > Thanks
> > >
> >
>


Re: [DISCUSS] KIP-190: Handle client-ids consistently between clients and brokers

2017-09-25 Thread Ewen Cheslack-Postava
On Mon, Sep 25, 2017 at 3:35 AM, Mickael Maison 
wrote:

> Hi Ewen,
>
> I understand your point of view and ideally we'd have one convention
> for handling all user provided strings. This KIP reused the
> sanitization mechanism we had in place for user principals.
>
> I think both ways have pros and cons but what I like about early
> sanitization (as is currently) is that it's consistent. While
> monitoring tools have to know about the sanitization, all metrics are
> sanitized the same way before being passed to reporters and that
> includes all "fields" in the metric name (client-id, user).
>

How is just passing the actual values to the reporters any less consistent?
The "sanitization" process that was in place was really only for internal
purposes, right? i.e. so that we'd have a path for ZK that ZK could handle?

I think the real question is why JmxReporter is being special-cased?


>
> Moving the sanitization into JMXReporter is a publicly visible change
> as it would affect the metrics we pass to other reporters.
>

How would this be any more publicly visible than the change already being
made? In fact, since we haven't really specified what reporters should
accept, if anything the change to the sanitized strings is more of a
publicly visible change (you need to understand exactly what transformation
is being applied) than the change I am suggesting (which could be
considered a bug fix that now just fixes support for certain client IDs and
only affects JMX metric names because of JMX limitations).

-Ewen


>
>
>
>
> On Fri, Sep 22, 2017 at 8:53 PM, Ewen Cheslack-Postava
>  wrote:
> > Hi all,
> >
> > In working on the patch for KIP-196: Add metrics to Kafka Connect
> > framework, we realized that we have worker and connector/task IDs that
> are
> > to be included in metrics and those don't currently have constraints on
> > naming. I'd prefer to avoid adding naming restrictions or mangling names
> > unnecessarily, and for users that define a custom metrics reporter the
> name
> > mangling may be unexpected since their metrics system may not have the
> same
> > limitations as JMX.
> >
> > The text of the KIP is pretty JMX specific and doesn't really define
> where
> > this mangling happens. Currently, it is being applied essentially as
> early
> > as possible. I would like to propose moving the name mangling into the
> > JmxReporter itself so the only impact is on JMX metrics, but other
> metrics
> > reporters would see the original. In other words, leave system-specific
> > name mangling up to that system.
> >
> > In the JmxReporter, the mangling could remain the same (though I think
> the
> > mangling for principals is an internal implementation detail, whereas
> this
> > name mangling is user-visible). The quota metrics on the broker would now
> > be inconsistent with the others, but I think trying to be less
> JMX-specific
> > given that we support pluggable reporters is the right direction to go.
> >
> > I think in practice this has no publicly visible impact and definitely no
> > compatibility concerns, it just moves where we're doing the JMX name
> > mangling. However, since discussion about metric naming/character
> > substitutions had happened here recently I wanted to raise it here and
> make
> > sure there would be agreement on this direction.
> >
> > (Long term I'd also like to get the required instantiation of JmxReporter
> > removed as well, but that requires its own KIP.)
> >
> > Thanks,
> > Ewen
> >
> > On Thu, Sep 14, 2017 at 2:09 AM, Tom Bentley 
> wrote:
> >
> >> Hi Mickael,
> >>
> >> I was just wondering why the restriction was imposed for Java clients
> >> in the first place, do you know?
> >>
> >> Cheers,
> >>
> >> Tom
> >>
> >> On 14 September 2017 at 09:16, Ismael Juma  wrote:
> >>
> >> > Thanks for the KIP Mickael. I suggest starting a vote.
> >> >
> >> > Ismael
> >> >
> >> > On Mon, Aug 21, 2017 at 2:51 PM, Mickael Maison <
> >> mickael.mai...@gmail.com>
> >> > wrote:
> >> >
> >> > > Hi all,
> >> > >
> >> > > I have created a KIP to cleanup the way client-ids are handled by
> >> > > brokers and clients.
> >> > >
> >> > > Currently the Java clients have some restrictions on the client-ids
> >> > > that are not enforced by the brokers. Using 3rd party clients,
> >> > > client-ids containing any characters can be used causing some
> strange
> >> > > behaviours in the way brokers handle metrics and quotas.
> >> > >
> >> > > Feedback is appreciated.
> >> > >
> >> > > Thanks
> >> > >
> >> >
> >>
>


Re: [DISCUSS] KIP-190: Handle client-ids consistently between clients and brokers

2017-09-26 Thread Ewen Cheslack-Postava
It's a fair point that it would undo that sanitization. It's possible that
for compatibility reasons, doing so would require a bit more work and care
(e.g. supporting both sanitized and unsanitized for a while so users have a
chance to migrate). But I guess my point is that I view the location where
sanitization is happening as more of a bug, and I actually haven't followed
the whole history of that discussion so I'm not entirely sure whether it
was intentional or just sort of happened along with the sanitization
required for generating ZK paths.

Anyway, I think we shouldn't choose to cripple all metrics just because the
metrics involving those user principals weren't handled ideally. If you're
open to this, we could always make the change to the code affected by this
KIP, leave the user principal metrics sanitized as they are today, and then
follow up with another KIP so that all metrics are unsanitized when they hit
the metrics reporters, with JMX-specific sanitization applied, as it is
today, only when characters are encountered that JMX cannot support.

-Ewen

On Tue, Sep 26, 2017 at 2:24 AM, Mickael Maison 
wrote:

> Hi Ewen,
>
> By consistency, I meant having all fields sanitized the same way we
> were previously doing for user principal.
>
> But re-reading your previous email, I'm guessing you meant to also
> remove the current user principal sanitization from the metrics (only
> use that internally for ZK) and have all metrics use the actual
> values. If so, yes that's an option.
>
> On Mon, Sep 25, 2017 at 5:58 PM, Ewen Cheslack-Postava
>  wrote:
> > On Mon, Sep 25, 2017 at 3:35 AM, Mickael Maison <
> mickael.mai...@gmail.com>
> > wrote:
> >
> >> Hi Ewen,
> >>
> >> I understand your point of view and ideally we'd have one convention
> >> for handling all user provided strings. This KIP reused the
> >> sanitization mechanism we had in place for user principals.
> >>
> >> I think both ways have pros and cons but what I like about early
> >> sanitization (as is currently) is that it's consistent. While
> >> monitoring tools have to know about the sanitization, all metrics are
> >> sanitized the same way before being passed to reporters and that
> >> includes all "fields" in the metric name (client-id, user).
> >>
> >
> > How is just passing the actual values to the reporters any less
> consistent?
> > The "sanitization" process that was in place was really only for internal
> > purposes, right? i.e. so that we'd have a path for ZK that ZK could
> handle?
> >
> > I think the real question is why JmxReporter is being special-cased?
> >
> >
> >>
> >> Moving the sanitization into JMXReporter is a publicly visible change
> >> as it would affect the metrics we pass to other reporters.
> >>
> >
> > How would this be any more publicly visible than the change already being
> > made? In fact, since we haven't really specified what reporters should
> > accept, if anything the change to the sanitized strings is more of a
> > publicly visible change (you need to understand exactly what
> transformation
> > is being applied) than the change I am suggesting (which could be
> > considered a bug fix that now just fixes support for certain client IDs
> and
> > only affects JMX metric names because of JMX limitations).
> >
> > -Ewen
> >
> >
> >>
> >>
> >>
> >>
> >> On Fri, Sep 22, 2017 at 8:53 PM, Ewen Cheslack-Postava
> >>  wrote:
> >> > Hi all,
> >> >
> >> > In working on the patch for KIP-196: Add metrics to Kafka Connect
> >> > framework, we realized that we have worker and connector/task IDs that
> >> are
> >> > to be included in metrics and those don't currently have constraints
> on
> >> > naming. I'd prefer to avoid adding naming restrictions or mangling
> names
> >> > unnecessarily, and for users that define a custom metrics reporter the
> >> name
> >> > mangling may be unexpected since their metrics system may not have the
> >> same
> >> > limitations as JMX.
> >> >
> >> > The text of the KIP is pretty JMX specific and doesn't really define
> >> where
> >> > this mangling happens. Currently, it is being applied essentially as
> >> early
> >> > as possible. I would like to propose moving the name mangling into the
> >> > JmxReporter itself so the only impact is on JMX metrics, but other
> >> metrics
> >> > repo

Re: [DISCUSS] KIP-215: Add topic regex support for Connect sinks

2017-10-26 Thread Ewen Cheslack-Postava
It's fine to be more detailed, but ConfigException is already implied for
all other config issues as well.

Default could be either null or just empty string. re: alternatives, if you
wanted to be slightly more detailed (though still a bit vague) re:
supported syntax, you could just say that while Pattern is used, we only
guarantee support for common regular expression syntax. Not sure if there's
a good way of defining what "common" syntax is.
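
To illustrate what I mean (my reading, not a formal definition of "common"):

    import java.util.regex.Pattern;

    // Almost certainly portable across regex engines:
    Pattern.compile("metrics\\.(cpu|mem)\\..*");
    // Java Pattern-specific features we may not want to promise forever,
    // e.g. embedded flags and possessive quantifiers:
    Pattern.compile("(?i)metrics\\..*+");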

Otherwise LGTM, and thanks for helping fill in a longstanding gap!

-Ewen

On Thu, Oct 26, 2017 at 7:56 PM, Ted Yu  wrote:

> bq. Users may specify only one of 'topics' or 'topics.pattern'.
>
> Can you fill in which exception would be thrown if both of them are
> specified
> ?
>
> Cheers
>
> On Thu, Oct 26, 2017 at 6:27 PM, Jeff Klukas  wrote:
>
> > Looking for feedback on
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 215%3A+Add+topic+regex+support+for+Connect+sinks
> >
>


Re: Permission to create a KIP

2017-10-30 Thread Ewen Cheslack-Postava
Not sure if someone added you and just forgot to reply, but it looks like
you already have permissions on the wiki. Just follow the process described
here:
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

-Ewen

On Thu, Oct 26, 2017 at 8:40 AM, Elizabeth Bennett <
elizabeth.benn...@stitchfix.com> wrote:

> Hi Kafka Devs,
> I'd like permission to create a KIP. This is following up on a discussion
> from the #connect slack channel regarding overrides for producer/consumer
> configs on a per-connector level.
>
> My confluence id is: elizabeth.bennett
>
>
> Thanks,
> Liz
>


Re: [DISCUSS] KIP-215: Add topic regex support for Connect sinks

2017-10-30 Thread Ewen Cheslack-Postava
I took a quick pass at the PR, looks good so far. ConfigException would
still be fine in the case you're highlighting as it's inside the framework
anyway and we'd expect a ConfigException from configure() if connectors try
to use their ConfigDef to parse an invalid config. But here I don't feel
strongly about which to use since the error message is clear anyway and
will just end up in logs / the REST API for the user to sort out.
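
For what it's worth, the framework-side check should just be a few lines
along these lines (a sketch, not necessarily how the PR implements it):

    import java.util.Map;
    import org.apache.kafka.common.config.ConfigException;

    static void validateTopicsConfig(Map<String, String> props) {
        boolean hasTopics = !props.getOrDefault("topics", "").isEmpty();
        boolean hasRegex = !props.getOrDefault("topics.regex", "").isEmpty();
        if (hasTopics == hasRegex) // both set, or neither set
            throw new ConfigException(
                "Sink connectors must specify exactly one of 'topics' or 'topics.regex'");
    }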

-Ewen

On Fri, Oct 27, 2017 at 6:39 PM, Jeff Klukas  wrote:

> I've updated the KIP to use the topics.regex name and opened a WIP PR with
> an implementation that shows some additional complexity in how the
> configuration option gets passed through, affecting various public function
> signatures.
>
> I would appreciate any eyes on that for feedback on whether more design
> discussion needs to happen in the KIP.
>
> https://github.com/apache/kafka/pull/4151
>
> On Fri, Oct 27, 2017 at 7:50 AM, Jeff Klukas  wrote:
>
> > I added a note in the KIP about ConfigException being thrown. I also
> > changed the proposed default for the new config to empty string rather
> than
> > null.
> >
> > Absent a clear definition of what "common" regex syntax is, it seems an
> > undue burden to ask the user to guess at what Pattern features are safe.
> If
> > we do end up implementing a different regex style, I think it will be
> > necessary to still support the Java Pattern style long-term as an option.
> > If we want to use a different regex style as default down the road, we
> > could require "power users" of Java Pattern to enable an additional
> config
> > option to maintain compatibility.
> >
> > One additional change I might make to the KIP is that 'topics.regex'
> might
> > be a better choice for config name than 'topics.pattern'. That would be
> in
> > keeping with RegexRouter that has a 'regex' configuration option rather
> > than 'pattern'.
> >
> > On Thu, Oct 26, 2017 at 11:00 PM, Ewen Cheslack-Postava <
> e...@confluent.io
> > > wrote:
> >
> >> It's fine to be more detailed, but ConfigException is already implied
> for
> >> all other config issues as well.
> >>
> >> Default could be either null or just empty string. re: alternatives, if
> >> you
> >> wanted to be slightly more detailed (though still a bit vague) re:
> >> supported syntax, you could just say that while Pattern is used, we only
> >> guarantee support for common regular expression syntax. Not sure if
> >> there's
> >> a good way of defining what "common" syntax is.
> >>
> >> Otherwise LGTM, and thanks for helping fill in a longstanding gap!
> >>
> >> -Ewen
> >>
> >> On Thu, Oct 26, 2017 at 7:56 PM, Ted Yu  wrote:
> >>
> >> > bq. Users may specify only one of 'topics' or 'topics.pattern'.
> >> >
> >> > Can you fill in which exception would be thrown if both of them are
> >> > specified
> >> > ?
> >> >
> >> > Cheers
> >> >
> >> > On Thu, Oct 26, 2017 at 6:27 PM, Jeff Klukas  wrote:
> >> >
> >> > > Looking for feedback on
> >> > >
> >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> > > 215%3A+Add+topic+regex+support+for+Connect+sinks
> >> > >
> >> >
> >>
> >
> >
>


Re: [VOTE] KIP-215: Add topic regex support for Connect sinks

2017-11-03 Thread Ewen Cheslack-Postava
+1 binding

Thanks Jeff!

On Wed, Nov 1, 2017 at 5:21 PM, Randall Hauch  wrote:

> +1 (non-binding)
>
> Thanks for pushing this through. Great work!
>
> Randall Hauch
>
> On Wed, Nov 1, 2017 at 9:40 AM, Jeff Klukas  wrote:
>
> > I haven't heard any additional concerns over the proposal, so I'd like to
> > get the voting process started for:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 215%3A+Add+topic+regex+support+for+Connect+sinks
> >
> > It adds a topics.regex option for Kafka Connect sinks as an alternative
> to
> > the existing required topics option.
> >
>


Re: [VOTE] KIP-215: Add topic regex support for Connect sinks

2017-11-03 Thread Ewen Cheslack-Postava
Jeff,

Just FYI re: process, I think you're pretty much definitely in the clear
here since this one is a straightforward design I doubt anybody would
object to, but voting normally stays open 72h to ensure everyone has a
chance to weigh in.

Again thanks for the KIP and we can move any final discussion over to the
PR!

-Ewen

On Fri, Nov 3, 2017 at 4:43 PM, Jeff Klukas  wrote:

> Looks like we've achieved lazy majority, so I'll move the KIP to approved.
>
> Thanks all for looking this over.
>
> On Fri, Nov 3, 2017 at 7:31 PM, Jason Gustafson 
> wrote:
>
> > +1. Thanks for the KIP!
> >
> > On Fri, Nov 3, 2017 at 2:15 PM, Guozhang Wang 
> wrote:
> >
> > > +1 binding
> > >
> > > On Fri, Nov 3, 2017 at 1:25 PM, Ewen Cheslack-Postava <
> e...@confluent.io
> > >
> > > wrote:
> > >
> > > > +1 binding
> > > >
> > > > Thanks Jeff!
> > > >
> > > > On Wed, Nov 1, 2017 at 5:21 PM, Randall Hauch 
> > wrote:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > Thanks for pushing this through. Great work!
> > > > >
> > > > > Randall Hauch
> > > > >
> > > > > On Wed, Nov 1, 2017 at 9:40 AM, Jeff Klukas 
> > > wrote:
> > > > >
> > > > > > I haven't heard any additional concerns over the proposal, so I'd
> > > like
> > > > to
> > > > > > get the voting process started for:
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > 215%3A+Add+topic+regex+support+for+Connect+sinks
> > > > > >
> > > > > > It adds a topics.regex option for Kafka Connect sinks as an
> > > alternative
> > > > > to
> > > > > > the existing required topics option.
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>


Re: [VOTE] KIP-218: Make KafkaFuture.Function java 8 lambda compatible

2017-12-11 Thread Ewen Cheslack-Postava
+1 (binding)

-Ewen

On Mon, Dec 11, 2017 at 12:40 PM, Gwen Shapira  wrote:

> +1 (binding) - nice API improvement, thanks for driving it!
>
> On Mon, Dec 11, 2017 at 11:52 AM Xavier Léauté 
> wrote:
>
> > Thanks Steven, I believe I addressed all the comments. If the it looks
> good
> > to you let's move forward on the vote.
> >
> > On Sat, Dec 9, 2017 at 12:50 AM Steven Aerts 
> > wrote:
> >
> > > Hello Xavier,
> > >
> > > for me it is perfect to take it along.
> > > I made a few small remarks in your PR.
> > >
> > > Thanks
> > >
> > > > On Sat, Dec 9, 2017 at 01:29, Xavier Léauté wrote:
> > >
> > > > Hi Steve, I just posted in the discussion thread, there's just one
> tiny
> > > fix
> > > > I think would be useful to add while we're making changes to this
> API.
> > > > Do you mind having a look?
> > > >
> > > > On Fri, Dec 8, 2017 at 11:37 AM Mickael Maison <
> > mickael.mai...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > +1 (non binding)
> > > > > Thanks for the KIP
> > > > >
> > > > > On Fri, Dec 8, 2017 at 6:53 PM, Tom Bentley  >
> > > > wrote:
> > > > > > +1
> > > > > >
> > > > > > On 8 December 2017 at 18:34, Ted Yu  wrote:
> > > > > >
> > > > > >> +1
> > > > > >>
> > > > > >> On Fri, Dec 8, 2017 at 3:49 AM, Steven Aerts <
> > > steven.ae...@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Hello everybody,
> > > > > >> >
> > > > > >> >
> > > > > >> > I think KIP-218 is crystallized enough to start voting.
> > > > > >> >
> > > > > >> > KIP documentation:
> > > > > >> >
> > > > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > >> > 218%3A+Make+KafkaFuture.Function+java+8+lambda+compatible
> > > > > >> >
> > > > > >> >
> > > > > >> > Thanks,
> > > > > >> >
> > > > > >> >
> > > > > >> >Steven
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
>


[DISCUSS] KIP-238: Expose Kafka cluster ID in Connect REST API

2017-12-11 Thread Ewen Cheslack-Postava
I'd like to start discussion on a simple KIP to expose Kafka cluster ID
info in the Connect REST API:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-238%3A+Expose+Kafka+cluster+ID+in+Connect+REST+API

Hopefully straightforward, though there are some details on how this
affects startup behavior that might warrant discussion.

-Ewen


Re: [DISCUSS] KIP-238: Expose Kafka cluster ID in Connect REST API

2017-12-11 Thread Ewen Cheslack-Postava
I did, but it doesn't seem to gain much. In order to still avoid having
these intermediate states, you'd still need a latch and then to block any
calls to the root resource until you could connect. It would allow starting
up the rest of the worker, but if it's just going to fail and put the
worker into a bad state anyway, that doesn't seem to help much.

The alternative of just returning incomplete info didn't seem worth the
hassle for users since if you can't connect to the cluster to get the
cluster ID, none of the other APIs would be useful either (you're not going
to be able to write new connector configs, asking for connector state will
give you empty data since you wouldn't be able to load the configs or
status topics, etc).
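
For reference, the lookup itself is roughly just the following (a sketch
using the AdminClient API; the actual patch may structure this differently):

    import java.util.Map;
    import java.util.concurrent.ExecutionException;
    import org.apache.kafka.clients.admin.AdminClient;

    static String lookupKafkaClusterId(Map<String, Object> workerConfig) {
        try (AdminClient admin = AdminClient.create(workerConfig)) {
            try {
                // Blocks until the cluster responds (or the request times
                // out), which is what gates worker startup here.
                return admin.describeCluster().clusterId().get();
            } catch (InterruptedException | ExecutionException e) {
                throw new RuntimeException("Failed to look up Kafka cluster ID", e);
            }
        }
    }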

-Ewen

On Mon, Dec 11, 2017 at 4:35 PM, Ted Yu  wrote:

> Looks good overall.
>
> Currently lookupKafkaClusterId() is called synchronously. Have you
> considered making the call asynchronous (normally the GET / request comes
> sometime after worker start) ?
>
> Thanks
>
> On Mon, Dec 11, 2017 at 3:40 PM, Ewen Cheslack-Postava 
> wrote:
>
> > I'd like to start discussion on a simple KIP to expose Kafka cluster ID
> > info in the Connect REST API:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 238%3A+Expose+Kafka+cluster+ID+in+Connect+REST+API
> >
> > Hopefully straightforward, though there are some details on how this
> > affects startup behavior that might warrant discussion.
> >
> > -Ewen
> >
>


Re: [DISCUSS] KIP-234: add support for getting topic defaults from AdminClient

2017-12-11 Thread Ewen Cheslack-Postava
I think the key point is when the Kafka admin and the user creating topics
differ. I think a more realistic example of Dan's point (2) is for
retention. I know that realistically, admins aren't just going to randomly
drop the broker defaults from 1w to 1d without warning anyone (they'd
likely be fired...). But as a user, I may not know the broker configs, if
admins have overridden them, etc. I may want a *minimum* of, e.g., 2d. But
if the broker defaults are higher such that the admins are confident the
cluster can handle 1w, I'd rather just fall back on the default value.
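
In code, the decision that user wants to make looks something like this --
note `describeTopicDefaults` is a hypothetical stand-in for whatever API
shape this KIP settles on:

    import java.util.HashMap;
    import java.util.Map;

    long minRetentionMs = 2L * 24 * 60 * 60 * 1000; // my minimum: 2 days
    // Hypothetical: fetch the broker's default configs for new topics.
    long defaultRetentionMs = describeTopicDefaults().retentionMs();

    Map<String, String> overrides = new HashMap<>();
    if (defaultRetentionMs < minRetentionMs) {
        // Only override when the cluster default doesn't meet the minimum;
        // otherwise fall back to the (possibly larger) broker default.
        overrides.put("retention.ms", Long.toString(minRetentionMs));
    }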

Now, there's arguably a better solution for that case -- allow topic
configs to express a *minimum* value (or maximum depending on the
particular config), with the broker config taking precedence if it has a
smaller value (or larger in the case of maximums). This lets you express
your minimum requirements but allows the cluster to do more if that's the
default. However, that would represent a much more significant and invasive
change, and honestly I think it is more likely to confuse users.

@Dan, regarding compatibility, this changes behavior without revving the
request version number, which normally we only do for things that are
reasonably considered bugfixes or where it has no compatibility
implications. In this case, older brokers talking to newer AdminClients
will presumably return some error. Do we know what the non-null assertion
gets converted to and if we're happy with the behavior (i.e. will
applications be able to do something reasonable, distinguish it from some
completely unrelated error, etc)? Similarly, it's obviously only one
implementation using the KIP-4 APIs, but do we know what client-side
validation AdminClient is already doing and whether old AdminClients
talking to new brokers will see a change in behavior (or do they do
client-side validation such that old clients simply wouldn't have access to
this new functionality)?

-Ewen

On Mon, Dec 11, 2017 at 2:11 PM, dan  wrote:

> Dong,
>
> I agree that it *may* be better for a user to be explicit, however there
> are a couple reasons they may not.
> 1) a user doesn't even know what the options are. imagine writing a tool
> for users to create topics that steps them through things:
>
> $ kafka-topics.sh --create
> Give your topic a name: my-fav-topic
> How many partitions do you want [12]:
> What is the minimum in set replica size [2]:
> What is the maximum message size [1MB]:
> ...
>
> 2) a user wants to use broker defaults within reason. say they think they
> want min.cleanable.dirty.ratio=.5 and the default is .6. maybe that's fine,
> or even better for them, since the person maintaining the actual cluster
> has put thought into this config. and as the maintainer keeps working on
> making the cluster run better they can change and tune things on the
> cluster level as needed.
>
> dan
>
>
> On Wed, Dec 6, 2017 at 11:51 AM, Dong Lin  wrote:
>
> > Hey Dan,
> >
> > I think you are saying that, if the user can read the default configs
> > before creating the topic, then the user can make a better decision about
> > which configs need to be overridden. The question here is how the user is
> > going to use this information to make the decision.
> >
> > In my understanding, the user will compare the default value with the
> > expected value, and override the config to the expected value if they are
> > different. If this is the only way that the default value can affect the
> > user's decision, then it seems OK for the user to directly override the
> > config to the expected value. I am wondering if this solution has some
> > drawback.
> >
> > On the other hand, maybe there is a more advanced way that the default
> > value can affect the user's decision. It may be useful to understand this
> > use-case more specifically. Could you help provide a specific example
> here?
> >
> > Thanks,
> > Dong
> >
> >
> > On Wed, Dec 6, 2017 at 11:12 AM, dan  wrote:
> >
> > > Rajini,
> > >
> > > that was not my intent, the intent was to give a user of this api an
> > > insight into what their topic will look like once created. as things
> > stand
> > > now a user is unable to (easily) have any knowledge of what their topic
> > > configs will be before doing a `CREATE_TOPICS`. as i mentioned in the
> > KIP,
> > > another option would be to have the `CreateTopicsOptions.
> > > validateOnly=true`
> > > version return data, but seems more invasive/questionable.
> > >
> > > dan
> > >
> > > On Wed, Dec 6, 2017 at 5:10 AM, Rajini Sivaram <
> rajinisiva...@gmail.com>
> > > wrote:
> > >
> > > > Hi Dan,
> > > >
> > > > Thank you for the KIP. KIP-226 (https://cwiki.apache.org/
> > > > confluence/display/KAFKA/KIP-226+-+Dynamic+Broker+Configuration)
> > > proposes
> > > > to add an option to include all synonyms of a config option when
> > > describing
> > > > configs. This includes any defaults. For example (using Dong's
> > example),
> > > if
> > > > you have topicA with min.cleanable.dirty.ratio=0.6 as an override and
> > the
> > 

Re: [DISCUSS] KIP-238: Expose Kafka cluster ID in Connect REST API

2017-12-11 Thread Ewen Cheslack-Postava
On Mon, Dec 11, 2017 at 4:01 PM, Gwen Shapira  wrote:

> Thanks, Ewen :)
>
> One thing that wasn't clear to me from the wiki: Will standalone connect
> also have a Kafka cluster ID? While it is true that only tasks have
> producers and consumers, I think we assumed that all tasks on one
> stand-alone will use one Kafka cluster?
>

Yeah, maybe not clear enough in the KIP, but this is what I was getting at
-- while I think it's possible to use different clusters for worker,
producer, and consumer, I don't think this is really expected or a use case
worth bending over backwards to support perfectly. In standalone mode,
technically a value is not required because a default is included and we
only utilize the value currently for the producers/consumers in tasks. But
I don't think it is unreasonable to require a valid setting at the worker
level, even if you override the bootstrap.servers for producer and consumer.


>
> Another suggestion is not to block the REST API on the connection, but
> rather not return the cluster ID until we know it (return null instead). So
> clients will need to poll rather than block. Not sure this is better, but
> you didn't really discuss this, so wanted to raise the option.
>

It's mentioned briefly in
https://cwiki.apache.org/confluence/display/KAFKA/KIP-238%3A+Expose+Kafka+cluster+ID+in+Connect+REST+API#KIP-238:ExposeKafkaclusterIDinConnectRESTAPI-ProposedChanges
I think the tradeoff of blocking the server from being "started" until we
can at least make one request to the cluster isn't unreasonable since if
you can't do that, you're not going to be able to do any useful work
anyway. Anyone who might otherwise be using this endpoint to monitor health
(which it is useful for since it doesn't require any other external
services to be running just to give a response) can just interpret
connection refused or timeouts as an unhealthy state, as they should anyway.

-Ewen


>
> Gwen
>
>
> On Mon, Dec 11, 2017 at 3:42 PM Ewen Cheslack-Postava 
> wrote:
>
> > I'd like to start discussion on a simple KIP to expose Kafka cluster ID
> > info in the Connect REST API:
> >
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 238%3A+Expose+Kafka+cluster+ID+in+Connect+REST+API
> >
> > Hopefully straightforward, though there are some details on how this
> > affects startup behavior that might warrant discussion.
> >
> > -Ewen
> >
>


Re: [DISCUSS] KIP-238: Expose Kafka cluster ID in Connect REST API

2017-12-11 Thread Ewen Cheslack-Postava
And to clarify a bit further: the goal is for both standalone and
distributed mode to display the same basic information. This hasn't
*strictly* been required before because standalone had no worker-level
interaction with the cluster (configs stored in memory, offsets on disk,
and statuses in memory). However, we've always *expected* that a reasonable
configuration was available for the worker and that any overrides were just
that -- customizations on top of the existing config. Although it could
have been *possible* to leave an invalid config for the worker yet provide
valid configs for producers and consumers, this was never the intent.

Therefore, the argument here is that we *should* be able to rely on a valid
config to connect to the Kafka cluster, whether in standalone or
distributed mode. There should always be a valid "fallback" even if
overrides are provided. We haven't been explicit about this before, but
unless someone objects, I don't think it is unreasonable.

Happy to update the KIP w/ these details if someone feels they would be
valuable.

-Ewen

On Mon, Dec 11, 2017 at 8:21 PM, Ewen Cheslack-Postava 
wrote:

>
> On Mon, Dec 11, 2017 at 4:01 PM, Gwen Shapira  wrote:
>
>> Thanks, Ewen :)
>>
>> One thing that wasn't clear to me from the wiki: Will standalone connect
>> also have a Kafka cluster ID? While it is true that only tasks have
>> producers and consumers, I think we assumed that all tasks on one
>> stand-alone will use one Kafka cluster?
>>
>
> Yeah, maybe not clear enough in the KIP, but this is what I was getting at
> -- while I think it's possible to use different clusters for worker,
> producer, and consumer, I don't think this is really expected or a use case
> worth bending backwards to support perfectly. In standalone mode,
> technically a value is not required because a default is included and we
> only utilize the value currently for the producers/consumers in tasks. But
> I don't think it is unreasonable to require a valid setting at the worker
> level, even if you override the bootstrap.servers for producer and consumer.
>
>
>>
>> Another suggestion is not to block the REST API on the connection, but
>> rather not return the cluster ID until we know it (return null instead).
>> So
>> clients will need to poll rather than block. Not sure this is better, but
>> you didn't really discuss this, so wanted to raise the option.
>>
>
> It's mentioned briefly in
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-238%3A+Expose+Kafka+cluster+ID+in+Connect+REST+API#KIP-238:ExposeKafkaclusterIDinConnectRESTAPI-ProposedChanges
> I think the tradeoff of blocking the server from
> being "started" until we can at least make one request to the cluster isn't
> unreasonable since if you can't do that, you're not going to be able to do
> any useful work anyway. Anyone who might otherwise be using this endpoint
> to monitor health (which it is useful for since it doesn't require any
> other external services to be running just to give a response) can just
> interpret connection refused or timeouts as an unhealthy state, as they
> should anyway.
>
> -Ewen
>
>
>>
>> Gwen
>>
>>
>> On Mon, Dec 11, 2017 at 3:42 PM Ewen Cheslack-Postava 
>> wrote:
>>
>> > I'd like to start discussion on a simple KIP to expose Kafka cluster ID
>> > info in the Connect REST API:
>> >
>> >
>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-238%
>> 3A+Expose+Kafka+cluster+ID+in+Connect+REST+API
>> >
>> > Hopefully straightforward, though there are some details on how this
>> > affects startup behavior that might warrant discussion.
>> >
>> > -Ewen
>> >
>>
>
>


Re: [DISCUSS] KIP-234: add support for getting topic defaults from AdminClient

2017-12-12 Thread Ewen Cheslack-Postava
re: API versions, I actually wasn't sure if we needed it or not. I'm fine
if people would prefer just bumping it, but I was actually curious if we
could get away without bumping it. I don't know the behavior of the broker
code paths for this well enough to know what types of errors those non-null
assertions get converted into.

For the return type, NewTopic seems reasonable and kind of intuitive --
basically a description of the NewTopic you would get. The only reason I
would be wary of reusing it is that we don't want people taking it and
passing it directly into AdminClient.createTopics, since we
don't want them explicitly overriding all the defaults.

-Ewen

On Tue, Dec 12, 2017 at 2:32 PM, dan  wrote:

> Colin/Ewen,
>
> i will add changes to bump the API version.
>
> any preferences on the return type for the new method? tbh it seems like
> returning a NewTopic could make sense because the ConfigResource for a
> TOPIC type does not let me encode `numPartitions`
>
> thanks
> dan
>
> On Mon, Dec 11, 2017 at 7:22 PM, Colin McCabe  wrote:
>
> > Hi Dan,
> >
> > The KIP looks good overall.
> >
> > On Mon, Dec 11, 2017, at 18:28, Ewen Cheslack-Postava wrote:
> > > I think the key point is when the kafka admin and user creating topics
> > > differ. I think a more realistic example of Dan's point (2) is for
> > > retention. I know that realistically, admins aren't just going to
> > > randomly
> > > drop the broker defaults from 1w to 1d without warning anyone (they'd
> > > likely be fired...). But as a user, I may not know the broker configs,
> if
> > > admins have overridden them, etc. I may want a *minimum* of, e.g., 2d.
> > > But if the broker defaults are higher such that the admins are
> confident
> > the
> > > cluster can handle 1w, I'd rather just fall back on the default value.
> >
> > Right.  I think this API addresses a similar set of use-cases as adding
> > the "validateOnly" boolean for createTopics.  You shouldn't have to
> > create a topic to know whether it was possible to create it, or what the
> > retention will end up being, etc. etc.
> >
> > > Now, there's arguably a better solution for that case -- allow topic
> > > configs to express a *minimum* value (or maximum depending on the
> > > particular config), with the broker config taking precedence if it has
> a
> > > smaller value (or larger in the case of maximums). This lets you
> express
> > > your minimum requirements but allows the cluster to do more if that's
> the
> > > default. However, that would represent a much more significant and
> > > invasive change, and honestly I think it is more likely to confuse
> users.
> >
> > There always need to be topic defaults, though.  If we add a foobar
> > configuration for topics, existing topics will need to get grandfathered
> > in with a default foobar.  And they won't be able to set min and max
> > ranges, because foobars didn't exist back when the old topics were
> > created.
> >
> > >
> > > @Dan, regarding compatibility, this changes behavior without revving
> the
> > > request version number, which normally we only do for things that are
> > > reasonably considered bugfixes or where it has no compatibility
> > > implications. In this case, older brokers talking to newer AdminClients
> > > will presumably return some error. Do we know what the non-null
> assertion
> > > gets converted to and if we're happy with the behavior (i.e. will
> > > applications be able to do something reasonable, distinguish it from
> some
> > > completely unrelated error, etc)? Similarly, it's obviously only one
> > > implementation using the KIP-4 APIs, but do we know what client-side
> > > validation AdminClient is already doing and whether old AdminClients
> > > talking to new brokers will see a change in behavior (or do they do
> > > client-side validation such that old clients simply wouldn't have
> access
> > > to this new functionality)?
> >
> > I think we should bump the API version for this or add a new API key.
> > Nothing good is going to happen by pretending like this is compatible
> > with existing brokers.
> >
> > Also, I think it would be better just to have a separate function in
> > AdminClient rather than overloading the behavior of NULL in
> > describeConfigs.  It's not really that much more effort to have another
> > AdminClient function, and it's a lot simpler for 

[VOTE] KIP-238: Expose Kafka cluster ID in Connect REST API

2017-12-15 Thread Ewen Cheslack-Postava
Discussion seems to have tapered off, so I'd like to start the vote on
https://cwiki.apache.org/confluence/display/KAFKA/KIP-238%3A+Expose+Kafka+cluster+ID+in+Connect+REST+API

Obviously +1 (binding) from me :)

-Ewen


Re: [VOTE] KIP-238: Expose Kafka cluster ID in Connect REST API

2017-12-18 Thread Ewen Cheslack-Postava
Seems we've passed 72h and have the necessary 3 +1 binding votes, 2
non-binding, and 0 otherwise, so this KIP passes.

I'll follow up w/ another committer about landing the patch.

Thanks everyone!

-Ewen

On Mon, Dec 18, 2017 at 11:27 AM, Jason Gustafson 
wrote:

> +1 (binding)
>
> On Sun, Dec 17, 2017 at 8:44 PM, Konstantine Karantasis <
> konstant...@confluent.io> wrote:
>
> > +1
> >
> > Nice KIP. Thanks!
> >
> > - Konstantine
> >
> > On Sat, Dec 16, 2017 at 6:19 AM, Gwen Shapira  wrote:
> >
> > > +1 (binding). Thanks!
> > > On Fri, Dec 15, 2017 at 10:55 AM Ted Yu  wrote:
> > >
> > > > +1
> > > >
> > > > On Fri, Dec 15, 2017 at 10:49 AM, Ewen Cheslack-Postava <
> > > e...@confluent.io
> > > > >
> > > > wrote:
> > > >
> > > > > Discussion seems to have tapered off, so I'd like to start the vote
> > on
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > 238%3A+Expose+Kafka+cluster+ID+in+Connect+REST+API
> > > > >
> > > > > Obviously +1 (binding) from me :)
> > > > >
> > > > > -Ewen
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-228 Negative record timestamp support

2017-12-18 Thread Ewen Cheslack-Postava
I think the trivial change -- recognizing that using -1 as a sentinel value
was a mistake, special-casing it, and allowing other negative values through
-- is the most practical, reasonable change.

Realistically, the scope of impact for that -1 is pretty tiny, as has been
pointed out. A single millisecond gap in available timestamps in 1969. For
producers that really want to be careful (as the NYT data might want to
be), having the producer layer adjust accordingly is unlikely to be an
issue (you can't assume these timestamps are unique anyway, so they cannot
reasonably be used for ordering; adjusting by 1ms is a practical tradeoff).
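
i.e., on the producer side this amounts to something like (sketch):

    // -1 stays reserved as the no-timestamp sentinel, so a record genuinely
    // timestamped 1969-12-31T23:59:59.999Z is shifted forward by 1ms to 0.
    long safeTimestamp = (timestampMs == -1L) ? 0L : timestampMs;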

Other approaches where we modify the semantics of the timestamp from the
two existing modes require eating up valuable flags in the message format,
or revving the message format version, all of which make things
significantly messier. Hell, timezones, leap seconds, and ms granularity
probably make that 1ms window pretty much moot for any practical
applications, and for the extremely rare case that an application might
care, they are probably willing to pay the cost of a secondary index if
they needed to store timestamp values in the payload rather than in the
metadata.

Given that we have the current system in place, I suspect that any
translation to using Long.MIN_VALUE as the sentinel is probably just more
confusing to users, adds more implementation overhead to client libraries,
and is more likely to introduce bugs.

Warts like these always feel wrong when approached from pure design
principles, but the fact is that the constraints are already there. To me,
none of the proposals to move to an encoding we'd prefer seem to add enough
value to outweigh the migration, compatibility, and implementation costs.

@Dong -- your point about special timestamp values is a very good one. The
issue may extend to other cases in the protocol where we use timestamps. Is
this the scope we need to worry about (2 values instead of just 1) or are
there others? This also might be something we want to look out for in the
future -- using special values relative to Long.MIN_VALUE
instead of relative to 0.

-Ewen

On Tue, Dec 12, 2017 at 11:12 AM, Dong Lin  wrote:

> Hey Konstantin,
>
> Thanks for updating the KIP.
>
> If we were to support negative timestamp in the message, we probably also
> want to support negative timestamp in ListOffsetRequest. Currently in
> ListOffsetRequest, timestamp value -2 is used to indicate earliest
> timestamp and timestamp value -1 is used to indicate latest timestamp. It
> seems that we should make changes accordingly so that -1 and -2 can be
> supported as valid timestamp in ListOffsetRequest. What do you think?
>
> Thanks,
> Dong
>
>
>
> On Mon, Dec 11, 2017 at 12:55 PM, Konstantin Chukhlomin <
> chuhlo...@gmail.com
> > wrote:
>
> > Hi all,
> >
> > I've updated the KIP with a few more details:
> > Added (proposed) Changes in binary message format
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-228+Negative+record+timestamp+support#KIP-228Negativerecordtimestampsupport-Changesinbinarymessageformat>
> > Added Changes from producer perspective
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-228+Negative+record+timestamp+support#KIP-228Negativerecordtimestampsupport-Changesfromproducerperspective>
> > Added Changes from consumer perspective
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-228+Negative+record+timestamp+support#KIP-228Negativerecordtimestampsupport-Changesfromconsumerperspective>
> >
> >
> > Let me know if it makes sense to you.
> >
> > -Konstantin
> >
> > > On Dec 7, 2017, at 2:46 PM, Konstantin Chukhlomin  >
> > wrote:
> > >
> > > Hi Matthias,
> > >
> > > Indeed for consumers it will not be obvious what −1 means: an actual
> > > timestamp or no timestamp. Nevertheless, it's just −1 millisecond, so I
> > > thought it would not be a big deal to leave it (not clean, but
> > > acceptable).
> > >
> > > I agree that it would be much cleaner to have a different type of topic
> > > that supports negative timestamps and/or treats Long.MIN_VALUE as
> > > no-timestamp. I'll update the KIP to make it the proposed solution.
> > >
> > > Thanks,
> > > Konstantin
> > >
> > >> On Dec 5, 2017, at 7:06 PM, Matthias J. Sax 
> > wrote:
> > >>
> > >> Thanks for the KIP Konstantin.
> > >>
> > >> From my understanding, you propose to just remove the negative
> timestamp
> > >> check in KafkaProducer and KafkaStreams. If topics are configured with
> > >> `CreateTime` brokers also write negative timestamps if they are
> embedded
> > >> in the message.
> > >>
> > >> However, I am not sure about the overlapping semantics for -1
> timestamp.
> > >> My concern is that this ambiguity might result in issues. Assume
> that
> > >> there is a topic (configured with `CreateTime`) for which an old and a
> > >> new producer are writing. The old producer uses old message format and
> > >> does not include any timestamp in the message. The broker

Re: [DISCUSS] KIP 145 - Expose Record Headers in Kafka Connect

2018-01-02 Thread Ewen Cheslack-Postava
 > >>>> > think this is a safe assumption for the short term, and if we need
> > >>>> more
> > >>>> > control to (de)serialize named headers differently for the same
> > >>>> connector,
> > >>>> > we can always implement a different `HeaderConverter` that gives
> > >>>> users more
> > >>>> > control.
> > >>>> >
> > >>>> > So that would solve the serialization problem. How about
> connectors
> > >>>> and
> > >>>> > transforms that are implemented to expect a certain type of header
> > >>>> value,
> > >>>> > such as an integer or boolean or timestamp? We could solve this
> > >>>> problem
> > >>>> > (for the most part) by adding methods to the `Header` interface to
> > >>>> get the
> > >>>> > value in the desired type, and to support all of the sensible
> > >>>> conversions
> > >>>> > between Connect's primitives and logical types. So, a connector or
> > >>>> > transform could always call `header.valueAsObject()` to get the
> raw
> > >>>> > representation from the converter, but a connector or transform
> > could
> > >>>> also
> > >>>> > get the string representation by calling `header.valueAsString()`,
> > or
> > >>>> the
> > >>>> > INT64 representation by calling `header.valueAsLong()`, etc. We
> > could
> > >>>> even
> > >>>> > have converting methods for the built-in logical types (e.g.,
> > >>>> > `header.valueAsTimestamp()` to return a java.util.Date value that
> is
> > >>>> > described by Connect's Timestamp logical type). We can convert
> > >>>> between most
> > >>>> > primitive and logical types (e.g., anything to a STRING, INT32 to
> > >>>> FLOAT32,
> > >>>> > etc.), but there are a few that don't make sense (e.g., ARRAY to
> > >>>> FLOAT32,
> > >>>> > INT32 to STRUCT, BYTE_ARRAY to anything, etc.), so these can
> throw a
> > >>>> > `DataException`.
> > >>>> >
> > >>>> > I've refined this approach over the last few months, and have a PR
> > >>>> for a
> > >>>> > complete prototype that demonstrates these concepts and
> techniques:
> > >>>> > https://github.com/apache/kafka/pull/4319
> > >>>> >
> > >>>> > This PR does *not* update the documentation, though I can add that
> > if
> > >>>> we
> > >>>> > approve of this approach. And, we probably want to define (at
> least
> > >>>> on the
> > >>>> > KIP) some relatively obvious SMTs for copying header values into
> > >>>> record
> > >>>> > key/value fields, and extracting record key/value fields into
> header
> > >>>> values.
> > >>>> >
> > >>>> > @Michael, would you mind if I edited KIP-145 to reflect this
> > >>>> proposal? I
> > >>>> > would be happy to keep the existing proposal at the end of the
> > >>>> document (or
> > >>>> > remove it if you prefer, since it's already in the page history),
> > and
> > >>>> we
> > >>>> > can revise as we choose a direction.
> > >>>> >
> > >>>> > Comments? Thoughts?
> > >>>> >
> > >>>> > Best regards,
> > >>>> >
> > >>>> > Randall
> > >>>> >
> > >>>> >
> > >>>> > On Thu, Oct 19, 2017 at 2:10 PM, Michael André Pearce <
> > >>>> > michael.andre.pea...@me.com> wrote:
> > >>>> >
> > >>>> >> @rhauch
> > >>>> >>
> > >>>> >> Here is the previous discussion thread, just reigniting so we can
> > >>>> discuss
> > >>>> >> against the original kip thread
> > >>>> >>
> > >>>> >>
> > >>>> >> Cheers
> > >>>> >>
> > >>>> >> Mike
> > >>>> >>

Re: [VOTE] KIP-239 Add queryableStoreName() to GlobalKTable

2018-01-02 Thread Ewen Cheslack-Postava
+1 binding

The idea seems reasonable. Looking at it implementation-wise, seems there
is a bit of awkwardness because GlobalKTableImpl uses a
KTableValueGetterSupplier which seems to possibly have multiple stores, but
maybe using the more specific KTableSourceValueGetterSupplier
implementation instead can resolve that.
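
For context, here is a minimal usage sketch once the method lands (API names
per the KIP; serde configuration omitted for brevity):

    import java.util.Properties;
    import org.apache.kafka.common.utils.Bytes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.GlobalKTable;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.state.KeyValueStore;
    import org.apache.kafka.streams.state.QueryableStoreTypes;
    import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "global-table-demo");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

    StreamsBuilder builder = new StreamsBuilder();
    GlobalKTable<String, Long> table = builder.globalTable(
            "input-topic",
            Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("global-store"));

    KafkaStreams streams = new KafkaStreams(builder.build(), props);
    streams.start();

    // The method added by this KIP: callers no longer need to track the store name.
    String storeName = table.queryableStoreName();
    ReadOnlyKeyValueStore<String, Long> store =
            streams.store(storeName, QueryableStoreTypes.keyValueStore());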

-Ewen

On Mon, Jan 1, 2018 at 6:22 PM, Ted Yu  wrote:

> Gentle reminder: one more binding vote is needed for the KIP to pass.
>
> Cheers
>
> On Thu, Dec 21, 2017 at 4:13 AM, Damian Guy  wrote:
>
> > +1
> >
> > On Wed, 20 Dec 2017 at 21:09 Ted Yu  wrote:
> >
> > > Ping for more (binding) votes.
> > >
> > > The pull request is ready.
> > >
> > > On Fri, Dec 15, 2017 at 12:57 PM, Guozhang Wang 
> > > wrote:
> > >
> > > > +1 (binding), thanks!
> > > >
> > > > On Fri, Dec 15, 2017 at 11:56 AM, Ted Yu 
> wrote:
> > > >
> > > > > Hi,
> > > > > Here is the discussion thread:
> > > > >
> > > > > http://search-hadoop.com/m/Kafka/uyzND12QnH514pPO9?subj=
> > > > > Re+DISCUSS+KIP+239+Add+queryableStoreName+to+GlobalKTable
> > > > >
> > > > > Please vote on this KIP.
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
>


Re: [VOTE] KIP-233: Simplify StreamsBuilder#addGlobalStore

2018-01-02 Thread Ewen Cheslack-Postava
+1 binding, seems like a nice simplification.

Regarding the source and processor name, do they actually need to be unique
or could they use the same value? Since these use incrementing integers, it
could be nice for debuggability/understanding to have them use the same
name if possible instead of generating 2 separate names.

-Ewen

On Tue, Jan 2, 2018 at 9:12 AM, Guozhang Wang  wrote:

> On a side note, could you update the "compatibility and upgrade" section
> to describe what kind of upgrade steps users would need to take when they
> start making code changes to leverage the new API? I feel they may need to
> rename topics / etc.
>
> On Tue, Jan 2, 2018 at 9:10 AM, Guozhang Wang  wrote:
>
> > +1, thanks!
> >
> > On Wed, Dec 27, 2017 at 6:01 PM, Ted Yu  wrote:
> >
> >> +1
> >>
> >> On Wed, Dec 27, 2017 at 12:15 PM, Bill Bejeck 
> wrote:
> >>
> >> > +1
> >> >
> >> > On Wed, Dec 27, 2017 at 3:07 PM, Matthias J. Sax <
> matth...@confluent.io
> >> >
> >> > wrote:
> >> >
> >> > > +1
> >> > >
> >> > > On 12/26/17 9:00 PM, Panuwat Anawatmongkhon wrote:
> >> > > > Hi all,
> >> > > > I would like to start the vote thread.
> >> > > > This is link for the kip.
> >> > > >
> >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-233%
> >> 3A+Simplify+
> >> > > StreamsBuilder%23addGlobalStore
> >> > > >
> >> > > > Cheers
> >> > > >
> >> > >
> >> > >
> >> >
> >>
> >
> >
> >
> > --
> > -- Guozhang
> >
>
>
>
> --
> -- Guozhang
>


Re: [DISCUSS] KIP-228 Negative record timestamp support

2018-01-02 Thread Ewen Cheslack-Postava
For `allow.negative.timestamps`, do you mean this as a broker config? I'm
not entirely clear on what the proposal would entail.

I think taking into account whether we're talking about compatibility with
existing data in Kafka vs enabling use of negative timestamps is important
here. If they're effectively not supported today (though admittedly this is
really client-specific), then we need only concern ourselves with data that
hasn't been produced into Kafka yet. In that case, we can always handle
sentinel values in special ways if we really want to. For example, the Java
producer does not accept any values < 0 and the API supports passing null
rather than the sentinels. The implementation could easily be made to map
those values into a range that is less likely to be utilized (e.g. use the
values near Long.MIN_VALUE and have the consumer convert back as needed).
The sentinel for NO_TIMESTAMP could be changed between versions as long as
it is handled consistently between client versions.

IMO we already have way too many configs, so we should think about where
the impact is and whether a not-ideal but also not-costly compromise can
be made that avoids most of the additional complexity. Introducing the new
config seems like it has significant compatibility concerns that need to be
sorted out. In contrast, I suspect the use cases we need to support that
have come up so far can handle 1 or 2 special cases and the necessary
munging could be handled safely by interceptors such that it is trivial to
make sure all your apps do the right thing. I appreciate the pain of a ton
of mailing list questions about an issue like this, but given the
likelihood of encountering that particular value, I just find it unlikely
it would be that common and I think it's a reasonable tradeoff to tell a
user they might need to handle that one special case.
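
To make the interceptor idea concrete, here is a minimal sketch. Note that it
assumes the relaxed validation proposed in this KIP is in place -- today the
Java producer's ProducerRecord rejects negative timestamps outright -- and the
1ms shift is exactly the tradeoff discussed above:

    import java.util.Map;
    import org.apache.kafka.clients.producer.ProducerInterceptor;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;

    public class PreEpochTimestampInterceptor<K, V> implements ProducerInterceptor<K, V> {

        @Override
        public ProducerRecord<K, V> onSend(ProducerRecord<K, V> record) {
            // Nudge a genuine -1ms timestamp to -2ms so it cannot be confused
            // with the NO_TIMESTAMP sentinel.
            if (record.timestamp() != null && record.timestamp() == -1L)
                return new ProducerRecord<>(record.topic(), record.partition(), -2L,
                        record.key(), record.value(), record.headers());
            return record;
        }

        @Override
        public void onAcknowledgement(RecordMetadata metadata, Exception exception) { }

        @Override
        public void close() { }

        @Override
        public void configure(Map<String, ?> configs) { }
    }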

-Ewen

On Thu, Dec 28, 2017 at 12:58 PM, Matthias J. Sax 
wrote:

> I agree that changing message format or using a flag bit might not be
> worth it.
>
> However, just keeping -1 as "unknown" while leaving a time gap gives me a
> lot of headache, too. Your arguments about "not an issue in practice" kinda
> make sense to me, but I can already see the number of questions on the
> mailing list if we really follow this path... It will confuse users that
> don't pay attention, and "lose" data if Kafka Streams drops records with
> timestamp -1 but processes other records with negative timestamps.
>
> Thus, I was wondering if a new topic config (maybe
> `allow.negative.timestamps` with default `false`) that allows for enabling
> negative timestamps would be the better solution? With this new config,
> we would not have any sentinel value for "unknown" and all timestamps
> would be valid. Old producers, can't write to those topics if they are
> configured with CREATE_TIME though; APPEND_TIME would still work for
> older producers but with APPEND_TIME no negative timestamps are possible
> in the first place, so this config would not have any impact anyway.
>
> Kafka Streams could check the topic config and only drop negative
> timestamps if they are not enabled. Of course, existing topics should not
> enable negative timestamps if there are records with -1 in them already
> -- otherwise, semantics break down -- but this would be a config error
> we cannot prevent. However, I would expect that mostly newly created
> topics would enable this config anyway.
>
>
> -Matthias
>
> On 12/18/17 10:47 PM, Ewen Cheslack-Postava wrote:
> > I think the trivial change of just recognizing that using -1 as a
> > sentinel value was a mistake, and special casing it while allowing other
> > negative values through, is the most practical, reasonable change.
> >
> > Realistically, the scope of impact for that -1 is pretty tiny, as has
> been
> > pointed out. A single millisecond gap in available timestamps in 1969.
> For
> > producers that really want to be careful (as the NYT data might want to
> > be), having the producer layer adjust accordingly is unlikely to be an
> > issue (you can't assume these timestamps are unique anyway, so they
> > cannot reasonably be used for ordering; adjusting by 1ms is a practical
> > tradeoff).
> >
> > Other approaches where we modify the semantics of the timestamp from the
> > two existing modes require eating up valuable flags in the message
> format,
> > or ramping the message format version, all of which make things
> > significantly messier. Hell, timezones, leap seconds, and ms granularity
> > probably make that 1ms window pretty much moot for any practical
> > applications, and for the extremely rare case that an application might
> > care, they are probably willing to pay the cost of a secondary index.

Re: [DISCUSS] KIP-228 Negative record timestamp support

2018-01-02 Thread Ewen Cheslack-Postava
On Tue, Jan 2, 2018 at 8:04 PM, Matthias J. Sax 
wrote:

> I was thinking about a broker/topic config.
>
> However, I am not sure if we only need to worry about data written in
> the future (this would only be true if there were no records with
> -1 timestamps already). Assume that you have an existing topic that
> contains data with -1 = UNKNOWN records -- for this case, we would give
> those timestamps new semantics if we suddenly allowed negative
> timestamps. (Assuming that we don't allow -1 as a gap in the timeline,
> which I would rather not do.)
>

Using the Java producer you cannot have a negative timestamp today. So
(modulo comment about being dependent on the client implementation), no
existing data should have -1 timestamp unless it is NO_TIMESTAMP.

When you say you'd rather not like to have -1 as a gap in the timeline, can
you explain the potential scale of impact? I view it as a relatively
unlikely value and something that people who are really concerned with
negative timestamps can easily work around. Probably many users won't care
as they will not be using pre-1970s data where they actually set the Kafka
timestamp (rather than having timestamps embedded in the data) anyway. I
agree it isn't ideal, but to me it looks like a reasonable tradeoff. What
are the effects/use cases that make you concerned that we'd see significant
user pain as a result?


>
> Also note, that it's not really client specific IMHO, as one could
> implement their own clients. There are many third party clients and we
> don't know if they check for negative timestamps (applications could
> even assign their own special meaning to negative timestamps as those
> are unused atm) -- furthermore, all older client not embedding a
> timestamp default to -1 on the broker side...
>

I said "client-specific" because some of the checks are done on the
client-side, which means they are dependent on the specific client
implementation being used. Based on the rest of your comment, I think we're
in agreement except for how we are naming things :) I'd have to double
check if the same level of enforcement is done broker-side. I only mention
that because we tend to discuss these proposals in the context of only the
Java clients, but it is worth thinking through the impact to other clients
as well.


>
> > The implementation could easily be made to map
> > those values into a range that is less likely to be utilized (e.g. use
> the
> > values near Long.MIN_VALUE and have the consumer convert back as needed).
> > The sentinel for NO_TIMESTAMP could be changed between versions as long
> as
> > it is handled consistently between client versions.
>
> This opens Pandora's box IMHO.
>

Why? There should be a small number of values that need to be mapped and
someone could think through the different compatibility issues that are
possible to determine if there are any significant issues/drawbacks.


>
> > Introducing the new
> > config seems like it has significant compatibility concerns that need to
> be
> > sorted out.
>
> I cannot follow here -- from my point of view, it relaxes compatibility
> concerns. If we only allow new topics to enable negative timestamps, old
> behavior and new behavior are not mixed. IMHO, mixing both would be a
> real issue. Thus, for new topics we can change "unknown" from -1 to
> Long.MIN_VALUE and don't mix two different approaches within a single
> topic.
>

What's the mechanism for this? Is the new config only allowed in
CreateTopics requests? If you use existing tooling to set topic configs,
you would just be able to set any valid config. Are the semantics just
undefined if you do? Unless it is impossible to do certain things, we have
to deal with the compatibility concerns regardless of intended use. Might
be fine to just say the behavior is undefined, but there's still work to be
done there. Regardless, I didn't (and probably still don't) have a concrete
understanding of the proposed setting, so hard for me to reason about it.

-Ewen


>
> I see your point that we do have too many configs -- we could also make
> it a new value for existing `message.timestamp.type`.
>
>
> -Matthias
>
>
> On 1/2/18 7:48 PM, Ewen Cheslack-Postava wrote:
> > For `allow.negative.timestamps`, do you mean this as a broker config? I'm
> > not entirely clear on what the proposal would entail.
> >
> > I think taking into account whether we're talking about compatibility
> with
> > existing data in Kafka vs enabling use of negative timestamps is
> important
> > here. If they're effectively not supported today (though admittedly this
> is
> > really client-specific), then we need only concern ourselves wi

Re: [DISCUSS] KIP-242: Mask password fields in Kafka Connect REST response

2018-01-02 Thread Ewen Cheslack-Postava
Vincent,

Thanks for the KIP. This is definitely an issue we know is a problem for
some users.

I think the major problem with the KIP as-is is that it makes it impossible
to get the original value back out of the API. This KIP probably ties in
significantly with ideas for securing the REST API (SSL) and adding ACLs to
it. Both are things we know people want, but haven't happened yet. However,
it also interacts with other approaches to adding those features, e.g.
layering proxies on top of the existing API (e.g. nginx, apache, etc). Just
doing a blanket replacement of password values with a constant would likely
break things for people who secure things via a proxy (and may just not
allow reads of configs unless the user is authorized for the particular
connector). These are the types of concerns we like to think through in the
compatibility section. One option to get the masking functionality in
without depending on a bunch of other security improvements might be to
make this configurable so users that need this (and can forgo seeing a
valid config via the API) can opt-in.

Regarding your individual points:

* I don't think the particular value for the masked content matters much.
Any constant indicating a password field is good. Your value seems fine to
me.
* I don't think ConnectorInfo has enough info on its own to do proper
masking. In fact, I think you need to parse the config enough to get the
Connector-specific ConfigDef out in order to determine which fields are
Password fields. I would probably try to push this to be as central as
possible, maybe adding a method to AbstractHerder that can get configs with
a boolean indicating whether they need to have sensitive fields removed.
That method could deal with parsing the config to get the right connector,
getting the connector config, and then sanitizing any configs that are
sensitive. We could have this in one location, then have the relevant REST
APIs just use the right flag to determine if they get sanitized or
unsanitized data.
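
To sketch what that central helper could look like (the method name and its
placement on AbstractHerder are assumptions, and the mask constant is the one
proposed in the KIP):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.common.config.ConfigDef;

    static Map<String, String> maskSensitiveConfigs(ConfigDef connectorConfigDef,
                                                    Map<String, String> config) {
        Map<String, String> sanitized = new HashMap<>(config);
        for (ConfigDef.ConfigKey key : connectorConfigDef.configKeys().values()) {
            // Only values declared as Type.PASSWORD in the connector's ConfigDef
            // are treated as sensitive.
            if (key.type == ConfigDef.Type.PASSWORD && sanitized.containsKey(key.name))
                sanitized.put(key.name, "*********");
        }
        return sanitized;
    }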

That second point raises another interesting point -- what happens if the
connector configuration references a connector which the worker serving the
REST request *does not know about*? In that case, there will be no
corresponding ConfigDef that defines which fields are Passwords and need to
be sensitized. Does it return an error? Or just return the config as is?

-Ewen

On Thu, Dec 28, 2017 at 3:34 AM, Ted Yu  wrote:

> For the last point you raised, can you come up with a unit test that shows
> what you observed ?
>
> Cheers
>
> On Mon, Dec 18, 2017 at 11:14 AM, Vincent Meng  wrote:
>
> > Hi all,
> >
> > I've created KIP-242, a proposal to secure credentials in kafka connect
> > rest endpoint.
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 242%3A+Mask+password+in+Kafka+Connect+Rest+API+response
> >
> > Here are some things I'd like to discuss:
> >
> >- The "masked" value is set to "*********" (9 stars) currently. It's an
> >arbitrary value I picked. Are there any better options?
> >- The proposed change is in the
> >*org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource*
> >class, where before the response is returned we go through config and
> > mask
> >the password. This has been proven to work. However I think it's
> > cleaner if
> >we do the masking in
> >*org.apache.kafka.connect.runtime.rest.entities.ConnectorInfo* where
> >config() method can return the masked config, so that we don't have to
> > mask
> >the value in each endpoint (and new endpoints if added in the
> future). I
> >ran into some issues with this. So after a while, I started seeing an
> >incorrect password being used for the connector. My conjecture is that
> >the value stored in Kafka has been changed to the mask value. Can someone
> >confirm this might happen with Kafka Connect? It feels like
> >*ConnectorInfo.Config()* is used somewhere to update the connect config
> >storage topic.
> >
> > If there's any comments on the KIP let me know. Thank you very much.
> >
> > -Vincent
> >
>


Re: [VOTE] KIP-237: More Controller Health Metrics

2018-01-02 Thread Ewen Cheslack-Postava
Dong,

Seems like a reasonable addition. +1 (binding) from me.

There were some final naming issues in the DISCUSS thread that didn't get a
follow-up. I'm fine with either version of the naming, as these are details
that mostly advanced users will be monitoring, and both naming arguments
are reasonable here.

-Ewen

On Thu, Dec 21, 2017 at 2:57 PM, Dong Lin  wrote:

> Bump up the thread so that we can have these sensors to monitor our Kafka
> service sooner.
>
> On Mon, Dec 18, 2017 at 2:03 PM, Dong Lin  wrote:
>
> > Hi all,
> >
> > Since there are no more outstanding comments, I would like to start
> voting
> > thread for KIP-237: https://cwiki.apache.org/confluence/display/KAFKA/
> > KIP-237%3A+More+Controller+Health+Metrics
> >
> > The KIP proposes to add a few more metrics to help monitor Kafka
> > Controller health.
> >
> > Thanks,
> > Dong
> >
>


Re: [VOTE] KIP-243: Make ProducerConfig and ConsumerConfig constructors public

2018-01-02 Thread Ewen Cheslack-Postava
Builders have historically seen some resistance in the project (by some
notable original authors...) but they've been increasingly making their way
in -- SchemaBuilder in Connect, I believe some small streams APIs,
AdminClient stuff. The general trend seems to be towards more fluent APIs.

Regarding the KIP, I'm not sure why the APIs are currently Map<?, ?>, but
this seems wrong. We have a mess of old Properties compatibility (which we
seem to have dragged into the new producer/consumer APIs), but we could at
least make these Map<String, Object>. (I prefer just doing Map<String, String>
since we go through a parse phase regardless of the types, but didn't
manage to make that happen within Connect.)

Otherwise, the general idea seems fine to me.

-Ewen

On Thu, Dec 21, 2017 at 2:28 PM, Jason Gustafson  wrote:

> I didn't sense much resistance in that thread, just an effort to keep the
> streams and core client config APIs consistent ;).
>
> I'd prefer seeing a KIP for a more general improvement, but this change
> seems harmless and improves consistency between the clients, so +1 from me.
>
> -Jason
>
> On Thu, Dec 21, 2017 at 11:19 AM, Matthias J. Sax 
> wrote:
>
> > I personally love the builder pattern idea. There was some push back in
> > the past though from some people.
> >
> > cf https://issues.apache.org/jira/browse/KAFKA-4436
> >
> > Happy to propose the builder pattern but than we should have a proper
> > DISCUSS thread. Maybe we do this as a follow up and just do this KIP
> as-is?
> >
> >
> > -Matthias
> >
> > On 12/21/17 10:28 AM, Jason Gustafson wrote:
> > > Hey Matthias,
> > >
> > > Let me suggest an alternative. As you have mentioned, these config
> > classes
> > > do not give users much benefit currently. Maybe we change that? I think
> > > many users would appreciate having a builder for configuration since it
> > > provides type safety and is generally a much friendlier pattern to work
> > > with programmatically. Users could then do something like this:
> > >
> > > ConsumerConfig config = ConsumerConfig.newBuilder()
> > > .setBootstrapServers("localhost:9092")
> > > .setGroupId("group")
> > > .setRequestTimeout(15, TimeUnit.SECONDS)
> > > .build();
> > >
> > > Consumer consumer = new KafkaConsumer(config);
> > >
> > > An additional benefit of this is that it gives us a better way to
> expose
> > > config deprecations. In any case, it would make it less odd to expose
> the
> > > public constructor without giving users anything useful to do with the
> > > class.
> > >
> > > What do you think?
> > >
> > > -Jason
> > >
> > > On Wed, Dec 20, 2017 at 5:59 PM, Matthias J. Sax <
> matth...@confluent.io>
> > > wrote:
> > >
> > >> It's tailored for internal usage. I think client constructors don't
> > >> benefit from accepting those config objects. We just want to be able
> to
> > >> access the default values for certain parameters.
> > >>
> > >> From a user point of view, it's actually boiler plate code if you pass
> > >> in a config object instead of a plain Properties object because the
> > >> config object itself is immutable.
> > >>
> > >> I actually create a JIRA to remove the constructors from KafkaStreams
> > >> that do accept StreamsConfig for exact this reason:
> > >> https://issues.apache.org/jira/browse/KAFKA-6386
> > >>
> > >>
> > >> -Matthias
> > >>
> > >>
> > >> On 12/20/17 3:33 PM, Jason Gustafson wrote:
> > >>> Hi Matthias,
> > >>>
> > >>> Isn't it a little weird to make these constructors public but not
> also
> > >>> expose the corresponding client constructors that use them?
> > >>>
> > >>> -Jason
> > >>>
> > >>> On Tue, Dec 19, 2017 at 9:30 AM, Bill Bejeck 
> > wrote:
> > >>>
> >  +1
> > 
> >  On Tue, Dec 19, 2017 at 12:09 PM, Guozhang Wang  >
> >  wrote:
> > 
> > > +1
> > >
> > > On Tue, Dec 19, 2017 at 1:49 AM, Tom Bentley <
> t.j.bent...@gmail.com>
> > > wrote:
> > >
> > >> +1
> > >>
> > >> On 18 December 2017 at 23:28, Vahid S Hashemian <
> > > vahidhashem...@us.ibm.com
> > >>>
> > >> wrote:
> > >>
> > >>> +1
> > >>>
> > >>> Thanks for the KIP.
> > >>>
> > >>> --Vahid
> > >>>
> > >>>
> > >>>
> > >>> From:   Ted Yu 
> > >>> To: dev@kafka.apache.org
> > >>> Date:   12/18/2017 02:45 PM
> > >>> Subject:Re: [VOTE] KIP-243: Make ProducerConfig and
> > >> ConsumerConfig
> > >>> constructors public
> > >>>
> > >>>
> > >>>
> > >>> +1
> > >>>
> > >>> nit: via "copy and past" an 'e' is missing at the end.
> > >>>
> > >>> On Mon, Dec 18, 2017 at 2:38 PM, Matthias J. Sax <
> > > matth...@confluent.io>
> > >>> wrote:
> > >>>
> >  Hi,
> > 
> >  I want to propose the following KIP:
> > 
> > >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-243%3A+Make+ProducerConfig+and+ConsumerConfig+constructors+public

Re: [DISCUSS] KIP-231: Improve the Required ACL of ListGroups API

2018-01-02 Thread Ewen Cheslack-Postava
Late to the game here, but I'm +1 on this. Some of the ConsumerGroup
permissions are weird, but this KIP brings the describe ACLs into better
alignment with everything else and makes things more functional for clients
with more locked down permissions.

-Ewen

On Fri, Dec 15, 2017 at 12:57 PM, Vahid S Hashemian <
vahidhashem...@us.ibm.com> wrote:

> If there are no other feedback or suggestion on this KIP, I'll start a
> vote early next week.
>
> Thanks.
> --Vahid
>
>
>
> From:   "Vahid S Hashemian" 
> To: dev@kafka.apache.org
> Date:   11/29/2017 03:18 PM
> Subject:Re: [DISCUSS] KIP-231: Improve the Required ACL of
> ListGroups API
>
>
>
> Completing the subject line :)
>
>
>
> From:   "Vahid S Hashemian" 
> To: dev 
> Date:   11/29/2017 03:17 PM
> Subject:[DISCUSS] KIP-231:
>
>
>
> Hi everyone,
>
> I started KIP-231 to propose a small change to the required ACL of
> ListGroups API (in response to KAFKA-5638):
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-231%3A+Improve+the+Required+ACL+of+ListGroups+API
>
>
> Your feedback and suggestions are welcome!
>
> Thanks.
> --Vahid
>
>
>
>
>
>
>
>
>
>
>


Re: [VOTE] KIP-231: Improve the Required ACL of ListGroups API

2018-01-02 Thread Ewen Cheslack-Postava
+1 (binding)

Thanks for the KIP Vahid, nice improvement!

-Ewen

On Tue, Dec 19, 2017 at 11:30 AM, Vahid S Hashemian <
vahidhashem...@us.ibm.com> wrote:

> I believe the concerns on this KIP have been addressed so far.
> Therefore, I'd like to start a vote.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 231%3A+Improve+the+Required+ACL+of+ListGroups+API
>
> Thanks.
> --Vahid
>
>


Re: [VOTE] KIP-239 Add queryableStoreName() to GlobalKTable

2018-01-02 Thread Ewen Cheslack-Postava
Oh, the KIP passes w/ the required votes. My comment was just on
implementation details. I will leave comments about that up to the
subsequent PR and to the Kafka Streams folks that are much better suited
than me to comment on them :)

-Ewen

On Tue, Jan 2, 2018 at 9:28 PM, Richard Yu 
wrote:

> After investigation, I have found that the
> InternalStreamsBuilder#globalTable method is the only instance where the
> constructor for GlobalKTableImpl is called.
> The KTableValueGetterSupplier parameter used in this particular constructor
> is an instance of KTableSourceValueGetterSupplier. Hence, your requirement
> is satisfied.
>
> Since this is the vote thread, if you have further comments, please comment
> on the pull request.
>
> On Tue, Jan 2, 2018 at 6:38 PM, Ewen Cheslack-Postava 
> wrote:
>
> > +1 binding
> >
> > The idea seems reasonable. Looking at it implementation-wise, seems there
> > is a bit of awkwardness because GlobalKTableImpl uses a
> > KTableValueGetterSupplier which seems to possibly have multiple stores,
> but
> > maybe using the more specific KTableSourceValueGetterSupplier
> > implementation instead can resolve that.
> >
> > -Ewen
> >
> > On Mon, Jan 1, 2018 at 6:22 PM, Ted Yu  wrote:
> >
> > > Gentle reminder: one more binding vote is needed for the KIP to pass.
> > >
> > > Cheers
> > >
> > > On Thu, Dec 21, 2017 at 4:13 AM, Damian Guy 
> > wrote:
> > >
> > > > +1
> > > >
> > > > On Wed, 20 Dec 2017 at 21:09 Ted Yu  wrote:
> > > >
> > > > > Ping for more (binding) votes.
> > > > >
> > > > > The pull request is ready.
> > > > >
> > > > > On Fri, Dec 15, 2017 at 12:57 PM, Guozhang Wang <
> wangg...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > +1 (binding), thanks!
> > > > > >
> > > > > > On Fri, Dec 15, 2017 at 11:56 AM, Ted Yu 
> > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > > Here is the discussion thread:
> > > > > > >
> > > > > > > http://search-hadoop.com/m/Kafka/uyzND12QnH514pPO9?subj=
> > > > > > > Re+DISCUSS+KIP+239+Add+queryableStoreName+to+GlobalKTable
> > > > > > >
> > > > > > > Please vote on this KIP.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -- Guozhang
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-208: Add SSL support to Kafka Connect REST interface

2018-01-04 Thread Ewen Cheslack-Postava
Hey Jakub,

Sorry for not getting to this sooner. Overall the proposal looks good to
me, I just had a couple of questions.

1. For the configs/overrides, does this happen on a per-setting basis or if
one override is included do we not use any of the original settings? I
suspect that if you need to override one setting, it probably means you're
using an entirely different config and so the latter behavior seems better
to me. We've talked a bit about doing something similar for the
producer/consumer security settings as well so you don't have to specify
security configs in 3 places in your worker config.

2. For using default values from the worker config, I am wondering how
convinced we are that it will be common for them to be the same? I really
don't have enough experience w/ these setups to know, so just a question
here. I think the other thing to take into account here is that even though
we're not dealing with authorization in this KIP, we will eventually want
it for these APIs. Would we expect to be using the same principal for Kafka
and the Connect REST API? In a case where a company has a Connect cluster
that, e.g., an ops team manages and they are the only ones that are
supposed to make changes, that would make sense to me. But for a setup
where some dev team is allowed to use the REST API to create new connectors
but the cluster is managed by an ops team, I would think the Kafka
credentials would be different. I'm not sure how frequent each case would
be, so I'm a bit unsure about defaulting to the worker security configs.
Thoughts?

3. We should probably specify the default in the table for
rest.advertised.security.protocol because in ConfigDef if you don't specify
a default value it becomes a required config. The HTTP default will
probably need to be in there anyway.

4. Do we want to list the existing settings as deprecated and just move to
using listeners for consistency? We don't need to remove them anytime soon,
but given that the broker is doing the same, maybe we should just do that
in this KIP?
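
As a concrete (purely illustrative) example of how 1, 2, and 4 could fit
together -- the property names follow the listener-prefix style discussed in
this thread and are not final:

    import java.util.Properties;

    Properties workerProps = new Properties();
    // Two REST listeners, mirroring the broker's "listeners" convention:
    workerProps.put("listeners", "http://myhost:8083,https://myhost:8443");
    // Worker-level SSL settings reused for the HTTPS listener by default...
    workerProps.put("ssl.keystore.location", "/var/private/ssl/worker.keystore.jks");
    workerProps.put("ssl.keystore.password", "worker-secret");
    // ...unless overridden; if the overrides are treated as a group (the
    // behavior suggested in question 1), any "listeners.https." entry means
    // only the prefixed settings are used for that listener:
    workerProps.put("listeners.https.ssl.keystore.location", "/var/private/ssl/rest.keystore.jks");
    workerProps.put("listeners.https.ssl.keystore.password", "rest-secret");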

I think these are mostly small details, overall it looks like a good plan!

Thanks,
Ewen

On Tue, Oct 24, 2017 at 5:19 AM, Jakub Scholz  wrote:

> There has been no discussion since my last update a week ago. Unless someone
> has some further comments in the next 48 hours, I will start the voting for
> this KIP.
>
> Thanks & Regards
> Jakub
>
> On Tue, Oct 17, 2017 at 5:54 PM, Jakub Scholz  wrote:
>
> > Ok, so I updated the KIP according to what we discussed. Please have a
> > look at the updates. Two points I'm not 100% sure about:
> >
> > 1) Should we mark the rest.host.name and rest.port options as
> deprecated?
> >
> > 2) I needed to also address the advertised hostname / port. With multiple
> > listeners it is not clear anymore which one should be used. I saw as one
> > option to add advertised.listeners option and some modified version of
> > inter.broker.listener.name option to follow what is done in Kafka
> > brokers. But for the Connect REST interface, we do not advertise the
> > address to the clients like in Kafka broker. So we only need to tell
> other
> > workers how to connect - and for that we need only one advertised
> address.
> > So I decided to reuse the existing rest.advertised.host.name and
> > rest.advertised.port options and add additional option
> > rest.advertised.security.protocol to specify whether HTTP or HTTPS
> should
> > be used. Does this make sense to you? DO you think this is the right
> > approach?
> >
> > Thanks & Regards
> > Jakub
> >
> > On Mon, Oct 16, 2017 at 6:34 PM, Randall Hauch  wrote:
> >
> >> The broker's configuration options are "listeners" (plural) and
> >> "listeners.security.protocol.map". I agree that following the pattern
> set
> >> by the broker is better, so these are really good ideas. However, at
> this
> >> point I don't see a need for the "listeners.security.protocol.map",
> which
> >> for the broker must be set if the listener name is not a security
> >> protocol.
> >> Can we not simply just allow "HTTP" and "HTTPS" as the names of the
> >> listeners (rather than the broker's "PLAINTEXT", "SSL", etc.)? If so,
> then
> >> for example "listeners" might be set to "http://myhost:8081,
> >> https://myhost:80", which seems to work out nicely without needing
> >> listener
> >> names other than security protocols.
> >>
> >> I also like using the worker's SSL and SASL security configs by default
> if
> >> "https" is included in the listener, but allowing the overriding of this
> >> via other additional properties. Here, I'm not a big fan of
> >> "listeners.name.https.*" prefix, which I think is pretty verbose, but I
> >> could see "listener.https.*" as a prefix. This allows us to add other
> >> security protocols at some point, if that ever becomes necessary.
> >>
> >> +1 for continuing down this road. Nice work.
> >>
> >> On Mon, Oct 16, 2017 at 9:51 AM, Ted Yu  wrote:
> >>
> >> > +1 to this proposal.
> >> >
> >> > On Mon, Oct 16

Re: [VOTE] KIP-208: Add SSL support to Kafka Connect REST interface

2018-01-04 Thread Ewen Cheslack-Postava
Jakub,

I left a few comments in the discuss thread, but I'll also reply here just
to bump the VOTE thread's visibility. I would like to resolve the few
comments I left, but I am effectively +1 on this, the comments I left were
mainly details.

Committers that could help with the necessary votes would probably be Gwen
and Jason (but others more than welcome to help out too :)

-Ewen

On Mon, Nov 6, 2017 at 1:52 AM, Jakub Scholz  wrote:

> Hi all,
>
> Just a reminder that this is still up for a vote. I think this is an
> important feature which deserves your votes.
>
> Regards
> Jakub
>
> On Mon, Oct 30, 2017 at 9:24 PM, Jakub Scholz  wrote:
>
> > Hi,
> >
> > It seems there are no more comments for this KIP, so I would like to
> start
> > the voting .
> >
> > For more details about the KIP-208 go to
> > *https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 208%3A+Add+SSL+support+to+Kafka+Connect+REST+interface
> >  208%3A+Add+SSL+support+to+Kafka+Connect+REST+interface>*
> >
> > Thanks & Regards
> > Jakub
> >
>


Re: [DISCUSS] KIP-174 - Deprecate and remove internal converter configs in WorkerConfig

2018-01-04 Thread Ewen Cheslack-Postava
Sorry I lost track of this thread. If things are in good shape we should
probably vote on this and get the deprecation commit through. It seems like
a good idea as this has been confusing to users from day one.
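
The deprecation step itself should be mechanical -- roughly something like
this in WorkerConfig (a sketch; exact wording and placement are up to the PR):

    import java.util.Arrays;
    import java.util.Map;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    private static final Logger log = LoggerFactory.getLogger(WorkerConfig.class);

    static void warnOnDeprecatedInternalConverters(Map<String, String> workerProps) {
        for (String prop : Arrays.asList("internal.key.converter", "internal.value.converter")) {
            if (workerProps.containsKey(prop))
                log.warn("Worker config '{}' is deprecated and will be removed in a future "
                        + "release; the JsonConverter will always be used internally.", prop);
        }
    }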

-Ewen

On Wed, Aug 9, 2017 at 5:18 AM, UMESH CHAUDHARY  wrote:

> Thanks Ewen,
> I just edited the KIP to reflect the changes.
>
> Regards,
> Umesh
>
> On Wed, 9 Aug 2017 at 11:00 Ewen Cheslack-Postava 
> wrote:
>
>> Great, looking good. I'd probably be a bit more concrete about the
>> Proposed Changes (e.g., "will log a warning if the config is specified"
>> and "since the JsonConverter is the default, the configs will be removed
>> immediately from the example worker configuration files").
>>
>> Other than that this LGTM and I'll be happy to get rid of those settings!
>>
>> -Ewen
>>
>> On Tue, Aug 8, 2017 at 2:54 AM, UMESH CHAUDHARY 
>> wrote:
>>
>>> Hi Ewen,
>>> Sorry, I am a bit late in responding to this.
>>>
>>> Thanks for your inputs and I've updated the KIP by adding more details
>>> to it.
>>>
>>> Regards,
>>> Umesh
>>>
>>> On Mon, 31 Jul 2017 at 21:51 Ewen Cheslack-Postava 
>>> wrote:
>>>
>>>> On Sun, Jul 30, 2017 at 10:21 PM, UMESH CHAUDHARY 
>>>> wrote:
>>>>
>>>>> Hi Ewen,
>>>>> Thanks for your comments.
>>>>>
>>>>> 1) Yes, there are some test and java classes which refer these
>>>>> configs, so I will include them as well in "public interface" section of
>>>>> KIP. What should be our approach to deal with the classes and tests which
>>>>> use these configs: we need to change them to use JsonConverter when
>>>>> we plan for removal of these configs right?
>>>>>
>>>>
>>>> I actually meant the references in config/connect-standalone.properties
>>>> and config/connect-distributed.properties
>>>>
>>>>
>>>>> 2) I believe we can target the deprecation in 1.0.0 release as it is
>>>>> planned in October 2017 and then removal in next major release. Let
>>>>> me know your thoughts as we don't have any information for next major
>>>>> release (next to 1.0.0) yet.
>>>>>
>>>>
>>>> That sounds fine. Tough to say at this point what our approach to major
>>>> version bumps will be since the approach to version numbering is changing a
>>>> bit.
>>>>
>>>>
>>>>> 3) Thats a good point and mentioned JIRA can help us to validate the
>>>>> usage of any other converters. I will list this down in the KIP.
>>>>>
>>>>> Let me know if you have some additional thoughts on this.
>>>>>
>>>>> Regards,
>>>>> Umesh
>>>>>
>>>>>
>>>>>
>>>>> On Wed, 26 Jul 2017 at 09:27 Ewen Cheslack-Postava 
>>>>> wrote:
>>>>>
>>>>>> Umesh,
>>>>>>
>>>>>> Thanks for the KIP. Straightforward and I think it's a good change.
>>>>>> Unfortunately it is hard to tell how many people it would affect
>>>>>> since we
>>>>>> can't tell how many people have adjusted that config, but I think
>>>>>> this is
>>>>>> the right thing to do long term.
>>>>>>
>>>>>> A couple of quick things that might be helpful to refine:
>>>>>>
>>>>>> * Note that there are also some references in the example configs
>>>>>> that we
>>>>>> should remove.
>>>>>> * It's nice to be explicit about when the removal is planned. This
>>>>>> lets us
>>>>>> set expectations with users for timeframe (especially now that we
>>>>>> have time
>>>>>> based releases), allows us to give info about the removal timeframe
>>>>>> in log
>>>>>> error messages, and lets us file a JIRA against that release so we
>>>>>> remember
>>>>>> to follow up. Given the update to 1.0.0 for the next release, we may
>>>>>> also
>>>>>> need to adjust how we deal with deprecations/removal if we don't want
>>>>>> to
>>>>>> have to wait all the way until 2.0 to remove (though it is unclear how

Re: [DISCUSS] KIP-212: Enforce set of legal characters for connector names

2018-01-04 Thread Ewen Cheslack-Postava
Very late to the game here, but a few thoughts:

1. Regarding whether a KIP is necessary, I don't mind doing it for
documentation's sake, but I would classify any mishandling of connector
names here as a bug, which doesn't require a KIP to fix.

2. For support of characters, Kafka has some history of just being
restrictive (e.g., see topic name restrictions), but I personally disagree
with this approach. I think it is better to be liberal in what we accept
and just document limitations. I think our default should be to accept any
user input and document why we can't handle certain inputs and how the user
should adapt if we can't. In general I try to work under the robustness
principle: *Be conservative in what you do, be liberal in what you accept
from others*

3. Related to 2, there were some cases like whitespace-only connector
names. This seems extremely weird and not critical, so I'm fine not
supporting it officially, but technically I don't see any reason it
shouldn't be supported with any appropriate escaping (i.e. what would it
break for us?).
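
For what it's worth, the escaping needed on the client side is cheap; a
sketch (URLEncoder does form encoding, so '+' needs a fix-up for URL paths):

    import java.io.UnsupportedEncodingException;
    import java.net.URLEncoder;
    import java.nio.charset.StandardCharsets;

    static String connectorPath(String connectorName) throws UnsupportedEncodingException {
        // Percent-encode the name so spaces and other unusual characters survive
        // the trip through the REST API, e.g. "my connector" -> "my%20connector".
        String encoded = URLEncoder.encode(connectorName, StandardCharsets.UTF_8.name())
                .replace("+", "%20");
        return "/connectors/" + encoded;
    }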

But in general, I think just being more explicit about expectations is
great and it'd be great to set baseline expectations.

-Ewen



On Mon, Nov 20, 2017 at 12:33 AM, Sönke Liebau <
soenke.lie...@opencore.com.invalid> wrote:

> @Randall: are you happy with the KIP as it stands so I can call for a vote,
> or are there any outstanding items still to discuss?
>
> Same question to anybody else who'd like to participate of course :)
>
> On Thu, Nov 16, 2017 at 5:35 PM, Sönke Liebau 
> wrote:
>
> > Sounds good. I've added a few sentences to this effect to the KIP.
> >
> > On Thu, Nov 16, 2017 at 5:02 PM, Randall Hauch  wrote:
> >
> >> Nice job updating the KIP. The PR (
> >> https://github.com/apache/kafka/pull/2755/files) for the proposed
> >> implementation does prevent names from being empty, and it trims
> >> whitespace
> >> from the name only when creating a new connector. However, the KIP's
> >> "Proposed Change" section should probably be very clear about this, and
> >> the
> >> migration section should address how a connector that was created with
> >> leading and/or trailing whitespace characters will still be able to be
> >> updated and deleted. I think that decreases the likelihood of this
> change
> >> negatively impacting existing users. Basically, going forward, the names
> >> of
> >> new connectors will be trimmed.
> >>
> >> WDYT?
> >>
> >> On Thu, Nov 16, 2017 at 9:32 AM, Sönke Liebau <
> >> soenke.lie...@opencore.com.invalid> wrote:
> >>
> >> > I've added some more detail to the KIP [1] around current scenarios
> that
> >> > might break in the future. I actually came up with a second limitation
> >> that
> >> > we'd impose on users and also documented this.
> >> >
> >> > Let me know what you think.
> >> >
> >> > Kind regards,
> >> > Sönke
> >> >
> >> > [1]
> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> > 212%3A+Enforce+set+of+legal+characters+for+connector+names
> >> >
> >> >
> >> > On Thu, Nov 16, 2017 at 2:59 PM, Sönke Liebau <
> >> soenke.lie...@opencore.com>
> >> > wrote:
> >> >
> >> > > Hi Randall,
> >> > >
> >> > > I had mentioned this edge case in the KIP, but will add some further
> >> > > detail to further clarify all changing scenarios post pull request.
> >> > >
> >> > > Kind regards,
> >> > > Sönke
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > On Thu, Nov 16, 2017 at 2:06 PM, Randall Hauch 
> >> wrote:
> >> > >
> >> > >> No, we need to keep the KIP since we want to change/correct the
> >> existing
> >> > >> behavior. But we do need to clarify in the KIP these edge cases
> that
> >> > will
> >> > >> change.
> >> > >>
> >> > >> Thanks for the continued work on this, Sönke.
> >> > >>
> >> > >> Regards,
> >> > >>
> >> > >> Randall
> >> > >>
> >> > >> > On Nov 16, 2017, at 1:56 AM, Sönke Liebau <
> >> soenke.lie...@opencore.com
> >> > >> .INVALID> wrote:
> >> > >> >
> >> > >> > Hi Randall,
> >> > >> >
> >> > >> > zero length definitely works, that's what sent me down this hole
> in
> >> > the
> >> > >> > first place. I had a customer accidentally create a connector
> >> without
> >> > a
> >> > >> > name in his environment and then be unable to delete it. No
> >> connector
> >> > >> name
> >> > >> > doesn't work, as this throws a null pointer exception due to
> >> > KAFKA-4938
> >> > >> ,
> >> > >> > but once that is fixed would create a connector named "null" I
> >> think.
> >> > >> Have
> >> > >> > not retested this, but seen it in the past.
> >> > >> >
> >> > >> > Also, it is possible to create connectors with trailing and
> leading
> >> > >> > whitespaces, this errors out on the create request (which will be
> >> > fixed
> >> > >> > when KAFKA-4827 is merged), but correctly creates the connector
> and
> >> > you
> >> > >> can
> >> > >> > access it if you percent-escape the curl call. This for me is the
> >> main
> >> > >> > reason why a KIP might be needed, as we are changing public
> facing
> >> > >> behavior
> 

Re: [DISCUSS] KIP-212: Enforce set of legal characters for connector names

2018-01-06 Thread Ewen Cheslack-Postava
re: whitespace characters, I'm fine with the restriction since I don't see
it becoming an issue in practice. I just don't see any reason to restrict
it so it seems like we're going out of our way and doing extra work to be
restrictive, but without clear motivation.

In general my default approach (without context of a specific system) would
be to accept anything that we can encode in UTF-8 and only apply
restrictions where it becomes necessary (e.g. we need to define a delimiter
for some reason). The constraints of URLs introduce some complexity (you
need escaping), but probably generally still allow this. If I can use an
emoji when naming things, then I'm probably happy :) Whitespace characters
definitely have some other issues (e.g. you can have non-visible whitespace
which obscures which connector you're actually working with), but despite
the JIRA linked, I wasn't really convinced they need special handling. It
seems like a really weird issue to encounter in the first place.

-Ewen

On Fri, Jan 5, 2018 at 8:10 AM, Randall Hauch  wrote:

> Sönke, I'm happy with the current proposal.
>
> Ewen, the proposal allows any characters in the name as long as they are
> properly escaped/encoded. That seems to adhere to the robustness principle.
> The only exception is that the proposal trims leading and trailing
> whitespace characters in an attempt to reduce user errors. Can you please
> clarify that you're okay with this behavior? I agree that technically we
> can (and currently do) support whitespace-only names, but users have
> reported this as problematic, and it also would be confusing for most user
> interfaces.
>
> Best regards,
>
> Randall
>
> On Thu, Jan 4, 2018 at 10:31 PM, Ewen Cheslack-Postava 
> wrote:
>
> > Very late to the game here, but a few thoughts:
> >
> > 1. Regarding whether KIP is necessary, I don't mind doing it for
> > documentation sake, but I would classify any mishandling of connector
> names
> > here as a bug. Which doesn't require a KIP to fix.
> >
> > 2. For support of characters, Kafka has some history of just being
> > restrictive (e.g., see topic name restrictions), but I personally
> disagree
> > with this approach. I think it is better to be liberal in what we accept
> > and just document limitations. I think our default should be to accept
> any
> > user input and document why we can't handle certain inputs and how the
> user
> > should adapt if we can't. In general I try to work under the robustness
> > principle: *Be conservative in what you do, be liberal in what you accept
> > from others*
> >
> > 3. Related to 2, there were some cases like whitespace-only connector
> > names. This seems extremely weird and not critical, so I'm fine not
> > supporting it officially, but technically I don't see any reason it
> > shouldn't be supported with any appropriate escaping (i.e. what would it
> > break for us?).
> >
> > But in general, I think just being more explicit about expectations is
> > great and it'd be great to set baseline expectations.
> >
> > -Ewen
> >
> >
> >
> > On Mon, Nov 20, 2017 at 12:33 AM, Sönke Liebau <
> > soenke.lie...@opencore.com.invalid> wrote:
> >
> > > @Randall: are you happy with the KIP as it stands so I can call for a
> > vote,
> > > or are there any outstanding items still to discuss?
> > >
> > > Same question to anybody else who'd like to participate of course :)
> > >
> > > On Thu, Nov 16, 2017 at 5:35 PM, Sönke Liebau <
> > soenke.lie...@opencore.com>
> > > wrote:
> > >
> > > > Sounds good. I've added a few sentences to this effect to the KIP.
> > > >
> > > > On Thu, Nov 16, 2017 at 5:02 PM, Randall Hauch 
> > wrote:
> > > >
> > > >> Nice job updating the KIP. The PR (
> > > >> https://github.com/apache/kafka/pull/2755/files) for the proposed
> > > >> implementation does prevent names from being empty, and it trims
> > > >> whitespace
> > > >> from the name only when creating a new connector. However, the KIP's
> > > >> "Proposed Change" section should probably be very clear about this,
> > and
> > > >> the
> > > >> migration section should address how a connector that was created
> with
> > > >> leading and/or trailing whitespace characters will still be able to
> be
> > > >> updated and deleted. I think that decreases the likelihood of this
> > > change

Re: [EXTERNAL] [VOTE] KIP-158: Kafka Connect should allow source connectors to set topic-specific settings for new topics

2019-01-23 Thread Ewen Cheslack-Postava
+1 (binding)

My only final comment would be that the topic.creation.enable setting could
potentially be left out. We're still backwards compatible with what we
promised before (or at least compatibility is debatable). You could enforce
not being able to create topics with ACLs anyway, so avoiding the extra
config might be worth the not-quite-perfect compatibility story.
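
For reference, a sketch of the ACL alternative (the principal name is made
up; the point is just that CREATE on the cluster resource can be denied to
whatever principal the Connect worker uses):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.common.acl.AccessControlEntry;
    import org.apache.kafka.common.acl.AclBinding;
    import org.apache.kafka.common.acl.AclOperation;
    import org.apache.kafka.common.acl.AclPermissionType;
    import org.apache.kafka.common.resource.PatternType;
    import org.apache.kafka.common.resource.ResourcePattern;
    import org.apache.kafka.common.resource.ResourceType;

    static void denyTopicCreation(Properties adminProps) throws Exception {
        try (AdminClient admin = AdminClient.create(adminProps)) {
            AclBinding denyCreate = new AclBinding(
                    new ResourcePattern(ResourceType.CLUSTER, "kafka-cluster", PatternType.LITERAL),
                    new AccessControlEntry("User:connect-worker", "*",
                            AclOperation.CREATE, AclPermissionType.DENY));
            admin.createAcls(Collections.singleton(denyCreate)).all().get();
        }
    }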

-Ewen

On Tue, Sep 11, 2018 at 7:01 AM Stephane Maarek <
steph...@simplemachines.com.au> wrote:

> +1 (non binding)
>
> On Tue., 11 Sep. 2018, 10:48 am Mickael Maison, 
> wrote:
>
> > +1 (non-binding)
> > Thanks for the KIP!
> > On Tue, Sep 11, 2018 at 8:40 AM McCaig, Rhys 
> > wrote:
> > >
> > > Looks great Randall
> > > +1 (non-binding)
> > >
> > > > On Sep 9, 2018, at 7:17 PM, Gwen Shapira  wrote:
> > > >
> > > > +1
> > > > Useful improvement, thanks Randall.
> > > >
> > > >
> > > > On Fri, Sep 7, 2018, 3:28 PM Randall Hauch  wrote:
> > > >
> > > >> I believe the feedback on KIP-158 has been addressed. I'd like to
> > start a
> > > >> vote.
> > > >>
> > > >> KIP:
> > > >>
> > > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-158%3A+Kafka+Connect+should+allow+source+connectors+to+set+topic-specific+settings+for+new+topics
> > > >>
> > > >> Discussion Thread:
> > > >> https://www.mail-archive.com/dev@kafka.apache.org/msg73775.html
> > > >>
> > > >> Thanks!
> > > >>
> > > >> Randall
> > > >>
> > >
> >
>


Re: [VOTE] KIP-421: Support resolving externalized secrets in AbstractConfig

2019-01-24 Thread Ewen Cheslack-Postava
> It allows _all_ existing clients of the class, e.g. those in Apache
Kafka or in applications written by other people that use the class, to get
this functionality for free, i.e. without any code changes.  (I realize
this is probably where the 'unexpected effects' comes from).

> Personally, I think we should do our best to make this work seamlessly /
transparently, because we're likely going to have this functionality for a
long time.

Connect (and connectors that may also use AbstractConfig for themselves
since they are supposed to expose a ConfigDef anyway) could definitely be
an issue. I'd imagine formats like this are rare, but we do know there are
some cases where people add new syntax, e.g. the Landoop connectors support
some sort of inline SQL-like transformation. I don't know of any cases that
would specifically conflict with the syntax, but there is some risk.

I agree getting it automated would be ideal, and it is probably more
reasonable to claim any issues would be unlikely if unresolvable cases don't
result in an exception. On the other hand, I think the vast majority of the
benefit would come from making this work for brokers, Connect, and Streams
(and in most applications making this work is pretty trivial, given the
answer to question (1) is that it works by passing the same config to the
static method and then to the constructor).
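
Concretely, the two-step pattern looks something like this
(`loadConfigProviders` is the name floated in this thread, not a shipped API;
FileConfigProvider is the real class from KIP-297):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.common.config.AbstractConfig;

    Map<String, Object> raw = new HashMap<>();
    raw.put("config.providers", "file");
    raw.put("config.providers.file.class",
            "org.apache.kafka.common.config.provider.FileConfigProvider");
    raw.put("ssl.keystore.password", "${file:/mnt/secrets/ssl.properties:keystore.password}");
    // ...plus the usual client configs...

    // Hypothetical static helper: instantiate the providers named in the map
    // and resolve every ${provider:path:key} variable it finds.
    Map<String, Object> resolved = AbstractConfig.loadConfigProviders(raw);
    ConsumerConfig config = new ConsumerConfig(resolved);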

Tying this discussion also back to the question about subscribing for
updates, apps would commonly need modification to support that, and I think
ideally you want to be using some sort of KMS where rotation is done
automatically and you need to subscribe to updates. Since it's a pretty
common pattern to only look up configs once instead of always going back to
the AbstractConfig, you'd really only be able to get some of the long term
intended benefit of this improvement. We should definitely have a follow up
to this that deals with the subscriptions, but I think the current scope is
still a useful improvement -- Connect got this implemented because exposure
of secrets via REST API was such a big problem. Making the changes in
AbstractConfig is a better long term solution so we can get this working
with all components.
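
For the subscription side, the hook already exists on the provider interface
from KIP-297; what's missing is wiring it through AbstractConfig and the
components. A sketch of the application side (the vault provider and the
reload callback are made up):

    import java.util.Collections;
    import org.apache.kafka.common.config.ConfigData;
    import org.apache.kafka.common.config.provider.ConfigProvider;

    ConfigProvider provider = new VaultConfigProvider();  // hypothetical provider
    provider.subscribe("secret/kafka", Collections.singleton("keystore.password"),
            (String path, ConfigData data) -> {
                // Invoked when the secret rotates in the backing store.
                String rotated = data.data().get("keystore.password");
                reloadKeystore(rotated);  // hypothetical application callback
            });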

Regarding brokers, I think if we want to avoid exposing secrets over
DescribeConfigs responses, we'd probably need changes similar to those we
needed to make for the Connect REST API. Also agree we'd need to think
about how to make this work with dynamic configs (which would also be a
nice thing to extend to, e.g., Connect).

As a practical suggestion, while it doesn't give you the update for free,
we could consider also deprecating the existing constructor to encourage
people to update. Further, if we're worried about confusion about how to
load the two files, we could have a constructor that does that default
pattern for you.

-Ewen

On Thu, Jan 24, 2019 at 11:36 AM Colin McCabe  wrote:

> On Thu, Jan 24, 2019, at 11:25, TEJAL ADSUL wrote:
> >
> >
> > On 2019/01/24 17:26:02, Andy Coates  wrote:
> > > I'm wondering why we've rejected changing AbstractConfig to
> automatically
> > > resolve the variables?
> > >
> > > > 1. Change AbstractConfig to *automatically* resolve variables of the
> form
> > > specified in KIP-297. This was rejected because it would change the
> > > behavior of existing code and might cause unexpected effects.
> > >
> > > Doing so seems to me to have two very large benefits:
> > >
> > > 1. It allows the config providers to be defined within the same file
> as the
> > > config that uses the providers, e.g.
> > >
> > > config.providers=file,vault
> > > config.providers.file.class=org.apache.kafka.connect.configs.FileConfigProvider
> > > config.providers.file.param.path=/mnt/secrets/passwords
> > >
> > > foo.baz=/usr/temp/
> > > foo.bar=${file:/path/to/variables.properties:foo.bar}
> > >
> > > Is this possible with what's currently being proposed? i.e could you
> load
> > > the file and pass the map first to `loadConfigProviders` and then
> again to
> > > the constructor?
> > >
> > > 2. It allows _all_ existing clients of the class, e.g. those in Apache
> > > Kafka or in applications written by other people that use the class,
> to get
> > > this functionality for free, i.e. without any code changes.  (I realize
> > > this is probably where the 'unexpected effects' comes from).
> > >
> > > I'm assuming the unexpected side effects come about if an existing
> > > properties file already contains 

Re: [DISCUSS] KIP-415: Incremental Cooperative Rebalancing in Kafka Connect

2019-01-25 Thread Ewen Cheslack-Postava
I was going to make a related comment about connect.protocol. Even if we
have the config, we should default it to compatible given v0 state is small
and we believe v1 is better and people should migrate to it.

While I like getting rid of configs, I'm not sure whether we should remove it
here. If we think the protocol might continue to evolve, setting ourselves up
with a method to do this cleanly, without just defaulting to including all
versions, would be better. That said, it took 3+ years to get to v1 and I'm
not sure we're aware of any blatant deficiencies in this version, so maybe
we won't ever get to v2 anyway...

-Ewen

On Fri, Jan 25, 2019 at 9:38 AM Jason Gustafson  wrote:

> Hey Konstantine,
>
> Thanks for the reply. Just one response below:
>
> In 'compatible' mode, the worker sends both protocols to the broker
> > coordinator during the Join request. The field is already a list of
> > ProtocolMetadata. The broker, after gathering all the requests, follows a
> > process of selecting the first choice that is common among all the
> workers.
>
>
> Discussed briefly offline. The point I was trying to make is that the
> consumer doesn't know when a rebalance begins whether the coordinator will
> decide to use the "eager" or "cooperative" protocol. The question is how
> that affects cooperative semantics. In other words, is it safe to allow
> tasks to continue executing when a rebalance begins even if the coordinator
> ultimate decides to use the "eager" protocol? If it is, then I think the
> value of the `connect.protocol` configuration is diminished. We can just
> implement the "compatible" behavior and we won't need the two rolling
> restart upgrade. The size of the join state in v0 is small, so I don't see
> much downside to retaining compatibility and our users will thank us for
> it.
>
> Best,
> Jason
>
>
> On Thu, Jan 24, 2019 at 7:04 PM Konstantine Karantasis <
> konstant...@confluent.io> wrote:
>
> > Thank you for the questions Cyrus! Replies inline:
> >
> > On Thu, Jan 24, 2019 at 6:03 PM Cyrus Vafadari 
> wrote:
> >
> > > "scheduled.rebalance.max.delay.ms" -- "...the actual delay used by the
> > > leader to hold off redistribution of connectors and tasks and maintain
> > > imbalance may be less or equal to this value."   How is the actual
> delay
> > > determined and specified in the AssignmentFlatBuffer? I see this
> defaults
> > > to a max of 5 minutes -- how will a 5 minute delay for rebalancing
> affect
> > > current users who don't explicitly set this config?
> > >
> >
> > Current users that don't choose to enable the new protocol are not
> > affected. They keep running Connect with the current behavior.
> > The leader of the group sets the actual delay. It's a hint to the other
> > workers not to join for rebalance before the delay expires.
> > But it's not a hard requirement. A new worker might join in the meantime,
> > or another worker might leave the group while a delay is in effect.
> >
> >
> > >
> > > If both this KIP and "Static Membership" KIP are merged, is there a
> case
> > > where any user would use static membership over incremental rebalancing
> > > (which has less configuration/user-input)? If yes, can you elaborate
> why
> > a
> > > user would use both side-by-side, or if there is situation where one is
> > an
> > > obvious "best" choice? I raise the concern with usability/config-sprawl
> > in
> > > mind, that if this KIP deprecates a config property, we should note
> > > that explicitly.
> > >
> >
> >
> > As I mentioned above, KIP-345 and KIP-415 conceptually share some common
> > goals. They avoid unnecessary re-assignments of resources to the members
> of
> > the group.
> > In their current form, they can't be combined immediately, even after the
> > code is merged, because KIP-345 is targeting Consumer groups and KIP-415
> > will be implemented for Connect worker groups. KIP-345 is more precise in
> > re-assignment by requiring a unique id for each group member. KIP-415
> > tolerates temporary imbalances and is broader in the sense that it
> > addresses group resizes as well. Static membership could be used along
> with
> > incremental cooperative rebalancing in the future to improve assignment
> of
> > resources (e.g. topic-partitions or connectors and tasks). Therefore,
> even
> > if they are combined for the same group at some point, I don't foresee an
> > immediate need to deprecate one over the other.
> >
> > Cheers,
> > Konstantine
> >
> >
> >
> > > On Tue, Jan 22, 2019 at 6:26 PM Konstantine Karantasis <
> > > konstant...@confluent.io> wrote:
> > >
> > > > Hi Randall, thanks for your comments! Replying inline below:
> > > >
> > > >
> > > > On Sat, Jan 19, 2019 at 4:26 PM Randall Hauch 
> > wrote:
> > > >
> > > > > Thanks for all this work, Konstantine.
> > > > >
> > > > > I have a question about when a member leaves. Here's the partial
> > > > scenario,
> > > > > repeated from the KIP:
> > > > >
> > > > >
> > > > > Initial group and assignment: W1([AC0, AT1]), W2([

Re: [VOTE] KIP-415: Incremental Cooperative Rebalancing in Kafka Connect

2019-03-14 Thread Ewen Cheslack-Postava
+1 (binding)

-Ewen

On Wed, Mar 13, 2019 at 2:04 PM Randall Hauch  wrote:

> Excellent work, Konstantine!
>
> +1 (binding)
>
> On Mon, Mar 11, 2019 at 8:05 PM Konstantine Karantasis <
> konstant...@confluent.io> wrote:
>
> > Thanks Jason!
> > That makes perfect sense. The change is reflected in the KIP now.
> > "compatible" will be the default mode for "connect.protocol"
> >
> > Cheers,
> > Konstantine
> >
> >
> > On Mon, Mar 11, 2019 at 4:31 PM Jason Gustafson 
> > wrote:
> >
> > > +1 Thanks for all the work on this. My only minor comment is that
> > > `connect.protocol` probably should be `compatible` by default. The cost
> > is
> > > low and it will save upgrade confusion.
> > >
> > > Best,
> > > Jason
> > >
> > > On Fri, Mar 8, 2019 at 10:37 AM Robert Yokota 
> > wrote:
> > >
> > > > Thanks for the great KIP Konstantine!
> > > >
> > > > +1 (non-binding)
> > > >
> > > > Robert
> > > >
> > > > On Thu, Mar 7, 2019 at 2:56 PM Guozhang Wang 
> > wrote:
> > > >
> > > > > Thanks Konstantine, I've read the updated section on
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-415%3A+Incremental+Cooperative+Rebalancing+in+Kafka+Connect
> > > > > and it lgtm.
> > > > >
> > > > > I'm +1 on the KIP.
> > > > >
> > > > >
> > > > > Guozhang
> > > > >
> > > > >
> > > > > On Thu, Mar 7, 2019 at 2:35 PM Konstantine Karantasis <
> > > > > konstant...@confluent.io> wrote:
> > > > >
> > > > > > Thanks Guozhang. This is a valid observation regarding the
> current
> > > > status
> > > > > > of the PR.
> > > > > >
> > > > > > I updated the KIP to explicitly call out how the downgrade process
> > > > > > should work in the section Compatibility, Deprecation, and
> > > > > > Migration.
> > > > > >
> > > > > > Additionally, I reduced the configuration modes for
> > > > > > connect.protocol to only two: eager and compatible. That's because
> > > > > > there's currently no way to select a protocol based on a simple
> > > > > > majority; selection requires unanimity on at least one option for
> > > > > > the sub-protocol. Therefore there's no way at the moment to lock a
> > > > > > group of workers into a cooperative-only mode, if we account for
> > > > > > accidental joins of workers running an older version.
> > > > > >
> > > > > > The changes have been reflected in the KIP doc and will be
> > reflected
> > > in
> > > > > the
> > > > > > PR in a subsequent commit.
> > > > > >
> > > > > > Thanks,
> > > > > > Konstantine
> > > > > >
> > > > > >
> > > > > > On Thu, Mar 7, 2019 at 1:17 PM Guozhang Wang  >
> > > > wrote:
> > > > > >
> > > > > > > Hi Konstantine,
> > > > > > >
> > > > > > > Thanks for the updated KIP and the PR as well (which is huge :) I
> > > > > > > briefly looked through it as well as the KIP, and I have one minor
> > > > > > > comment to add (otherwise I'm +1 (binding) on it as well) about
> > > > > > > the backward compatibility. I'll use one example to illustrate the
> > > > > > > issue:
> > > > > > >
> > > > > > > 1) Suppose you have workers A and B on the newer version with
> > > > > > > connect.protocol configured as "compatible". They will send both
> > > > > > > V0/V1 to the leader (say it's worker A), who will choose V1 as the
> > > > > > > current protocol. This choice is sent back to A and B, who will
> > > > > > > remember that the current protocol version is already V1. So after
> > > > > > > this rebalance everyone remembers that V1 can be used, which means
> > > > > > > that upon prepareJoin they will not revoke all the assigned tasks.
> > > > > > >
> > > > > > > 2) Now let's say a new worker joins but with old version V0
> > > > > > > (practically this is rare, but for illustration purposes some
> > > > > > > common scenarios may fall into this, e.g. an existing worker being
> > > > > > > downgraded, which is essentially the same as being kicked out of
> > > > > > > the group and then rejoining as a new member on the older
> > > > > > > version). The leader realizes that at least one of the members
> > > > > > > does not know V1 and hence falls back to version V0 to perform the
> > > > > > > assignment. The V0 algorithm does an eager rebalance, which may
> > > > > > > immediately move some tasks from the existing members to the
> > > > > > > newcomer, as it assumes that everyone revokes everything before
> > > > > > > joining (a.k.a. the sync barrier). But this is actually not true,
> > > > > > > since everyone other than the old-versioned newcomer still follows
> > > > > > > the V1 behavior --- not revoking anything --- before sending the
> > > > > > > join group request.
> > > > > > >
> > > > > > > This could be solvable though, e.g. when leader 

Re: [VOTE] 2.2.0 RC2

2019-03-21 Thread Ewen Cheslack-Postava
+1

-Ewen

On Thu, Mar 21, 2019 at 10:33 AM Harsha  wrote:

> +1 (non-binding)
> - Downloaded artifacts, set up a 3-node cluster
> - Ran producer/consumer clients
>
> Thanks,
> Harsha
>
> On Thu, Mar 21, 2019, at 5:54 AM, Andrew Schofield wrote:
> > +1 (non-binding)
> >
> > - Downloaded the artifacts
> > - Ran Kafka Connect connectors
> >
> > Thanks,
> > Andrew Schofield
> > IBM Event Streams
> >
> > On 19/03/2019, 19:13, "Manikumar"  wrote:
> >
> > +1 (non-binding)
> >
> > - Verified the artifacts, built from source, ran tests
> > - Verified the quickstart, ran producer/consumer performance tests.
> >
> > Thanks for running the release!
> >
> > Thanks,
> > Manikumar
> >
> > On Wed, Mar 20, 2019 at 12:19 AM David Arthur <
> davidart...@apache.org>
> > wrote:
> >
> > > +1
> > >
> > > Validated signatures, and ran through quick-start.
> > >
> > > Thanks!
> > >
> > > On Mon, Mar 18, 2019 at 4:00 AM Jakub Scholz 
> > wrote:
> > >
> > > > +1 (non-binding). I used the staged binaries and run some of my
> > tests
> > > > against them. All seems to look good to me.
> > > >
> > > > On Sat, Mar 9, 2019 at 11:56 PM Matthias J. Sax
> > 
> > > > wrote:
> > > >
> > > > > Hello Kafka users, developers and client-developers,
> > > > >
> > > > > This is the third candidate for release of Apache Kafka 2.2.0.
> > > > >
> > > > >  - Added SSL support for custom principal name
> > > > >  - Allow SASL connections to periodically re-authenticate
> > > > >  - Command line tool bin/kafka-topics.sh adds AdminClient support
> > > > >  - Improved consumer group management
> > > > >    - default group.id is `null` instead of empty string
> > > > >  - API improvements
> > > > >    - Producer: introduce close(Duration)
> > > > >    - AdminClient: introduce close(Duration)
> > > > >    - Kafka Streams: new flatTransform() operator in Streams DSL
> > > > >    - KafkaStreams (and other classes) now implement AutoCloseable to
> > > > >      support try-with-resources
> > > > >    - New Serdes and default method implementations
> > > > >  - Kafka Streams exposed internal client.id via ThreadMetadata
> > > > >  - Metric improvements: all `-min`, `-avg` and `-max` metrics will now
> > > > >    output `NaN` as default value
> > > > > Release notes for the 2.2.0 release:
> > > > > https://home.apache.org/~mjsax/kafka-2.2.0-rc2/RELEASE_NOTES.html
> > > > >
> > > > > *** Please download, test, and vote by Thursday, March 14, 9am PST.
> > > > >
> > > > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > > > https://kafka.apache.org/KEYS
> > > > >
> > > > > * Release artifacts to be voted upon (source and binary):
> > > > > https://home.apache.org/~mjsax/kafka-2.2.0-rc2/
> > > > >
> > > > > * Maven artifacts to be voted upon:
> > > > > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > > > >
> > > > > * Javadoc:
> > > > > https://home.apache.org/~mjsax/kafka-2.2.0-rc2/javadoc/
> > > > >
> > > > > * Tag to be voted upon (off 2.2 branch) is the 2.2.0 tag:
> > > > > https://github.com/apache/kafka/releases/tag/2.2.0-rc2
> > > > >
> > > > > * Documentation:
> > > > > https://kafka.apache.org/22/documentation.html

Re: [DISCUSS] KIP-172 Add regular-expression topic support for sink connector

2017-07-10 Thread Ewen Cheslack-Postava
Kenji,

This looks great. I'd probably make a couple of minor changes for clarity:

1. In the public interfaces section, I'd either round it out with client
configuration via the REST proxy or just simplify to "Adds an optional
field to sink connector configuration".
2. Maybe clarify that if you specify both you'll throw a ConfigException
(see the sketch below).
3. For proposed changes part #4, maybe specify that this just gets passed
to the underlying consumer. That way, any details around the performance of
the feature, how fast it sees new topics, etc., can be passed off to the
consumer docs.
4. Fill out the compatibility section, which should basically just say that
since this is only adding a new config, there are no compatibility
concerns. (Technically there is a risk if some connector uses topic.regex
already, but I highly doubt that will be an issue.)
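
To make #2 and #3 above concrete, here's a sketch of a sink connector
config under the proposal (the topic.regex name follows this KIP
discussion; the connector and values are just an illustration):

    name=local-file-sink
    connector.class=org.apache.kafka.connect.file.FileStreamSinkConnector
    file=/tmp/sink-output.txt
    # New optional field: subscribe to every topic matching the pattern.
    # The pattern is handed to the underlying consumer's regex subscription.
    topic.regex=metrics-.*
    # Specifying 'topics' as well would throw a ConfigException:
    # topics=metrics-1,metrics-2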

These are all pretty minor details just to clean up and finalize this; from
my perspective, this is probably ready for a vote after these tweaks, since
it should be a pretty uncontroversial KIP.

-Ewen

On Tue, Jun 27, 2017 at 12:57 AM, 郭健  wrote:

> Great idea! We have similar demands, so I have implemented this function
> in a MySQL source connector; see:
> https://github.com/Aegeaner/kafka-connector-mysql/blob/master/src/main/java/org/apache/kafka/connect/mysql/EventRecordFactory.java#L347
>
> Thanks,
> Aegeaner
>
>
> On 6/27/17, 11:45, "Kenji Hayashida"  wrote:
>
> Hi all,
>
> I have raised a new KIP at
https://cwiki.apache.org/confluence/display/KAFKA/KIP+172%3A+Add+regular-expression+topic+support+for+sink+connector
>
The corresponding JIRA is at
https://issues.apache.org/jira/browse/KAFKA-3073
>
> I look forward to your feedback.
>
> Thanks,
> Kenji Hayashida
>
>
>


Re: [DISCUSS] 2017 October release planning and release version

2017-07-20 Thread Ewen Cheslack-Postava
Did we deprecate anything in 0.11.0? The one concern with bumping major
versions in consecutive releases is that you may not give people the room
for transition if you deprecate and then immediately remove in the next
release.

-Ewen

On Thu, Jul 20, 2017 at 4:51 AM, Damian Guy  wrote:

> +1 on 1.0!
> Are we also going to move to Java 8?
> I also think we should drop the Unstable annotations completely.
>
> Cheers,
> Damian
>
> On Wed, 19 Jul 2017 at 21:36 Guozhang Wang  wrote:
>
> > Hi Stevo,
> >
> > Just trying to add to what Ismael has already replied you:
> >
> >
> > > Practice/"features" like protocol version being a parameter, and
> > defaulting
> > > to latest so auto updated with dependency update which introduces new
> > > protocol/behavior should not be used in public client APIs. To switch
> > > between backward incompatible APIs (contract and behaviors), ideally
> user
> > > should explicitly have to change code and not dependency only, but at
> > least
> > > it should be clearly communicated that there are breaking changes to
> > expect
> > > even with just dependency update by e.g. giving major version release
> > clear
> > > meaning. If app dependency on Kafka client library minor.patch on same
> > > major is updated, and if there's a change in behavior or API requiring
> > app
> > > code change - it's a bug.
> > >
> > > A change introduced contrary to the SLO is OK to be reported as a
> > > bug. Everything else is an improvement or a feature request.
> > >
> > > If this were the case, and 1.0.0 were released today with the APIs as
> > > they are now, the Scala client APIs, even though deprecated, would not
> > > break and require refactoring with every 1.* minor/patch release, and
> > > would only be allowed to be broken or removed in a future major
> > > release, like 2.0.0.
> >
> > Just to clarify, my proposal is that moving forward beyond the next
> > release we will not make any breaking public API changes in any of the
> > major or minor releases, but will only mark APIs as "deprecated", and
> > deprecated public APIs will be considered for removal at the earliest in
> > the next major release: so if we mark the Scala consumer APIs as
> > deprecated in 1.0.0, we should only consider removing them in 2.0.0 or
> > even later.
> >
> > > It should also be clear how long each version is supported - e.g. if
> > > minor.patch meant that there are no backward-incompatible changes, it
> > > would be OK to file a bug only against the current major.minor.patch;
> > > the previous major and its last minor.patch could only have patches
> > > released for some period, like 1 to 3 months.
> >
> > Currently, in practice, we have never done, for example, a bugfix
> > release on an older major/minor release: i.e. once we released, say,
> > 0.10.2.0, we did not release 0.10.1.2 anymore. So practically speaking
> > we do not have a "support period" for older versions yet, and in the
> > coming release I do not plan to propose a concrete policy for that
> > matter.
> >
> >
> > Guozhang
> >
> >
> >
> > On Wed, Jul 19, 2017 at 2:12 AM, Ismael Juma  wrote:
> >
> > > Hi Stevo,
> > >
> > > Thanks for your feedback. We should definitely do a better job of
> > > documenting things. We basically follow semantic versioning, but it's
> > > currently a bit confusing because:
> > >
> > > 1. There are 4 segments in the version. The "0." part should be ignored
> > > when deciding what is major, minor and patch at the moment, but many
> > people
> > > don't know this. Once we move to 1.0.0, that problem goes away.
> > >
> > > 2. To know what is a public API, you must check the Javadoc (
> > > https://kafka.apache.org/0110/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html).
> > > If it's not listed there, it's not public API. Ideally, it would be
> > obvious
> > > from the package name (i.e. there would be "internals" in the name),
> but
> > we
> > > are not there yet. The exception are the old Scala APIs, but they have
> > all
> > > been deprecated and they will be removed eventually (the old Scala
> > > consumers won't be removed until the June 2018 release at the earliest
> in
> > > order to give people time to migrate).
> > >
> > > 3. Even though we are following reasonably common practices, we haven't
> > > documented them all in one place. It would be great to do it during the
> > > next release cycle.
> > >
> > > A few comments below.
> > >
> > > On Wed, Jul 19, 2017 at 1:31 AM, Stevo Slavić 
> wrote:
> > >
> > > > - APIs not labeled or labeled as stable
> > > > -- a change in major version is the only one that can break backward
> > > > compatibility (client APIs or behavior)
> > > >
> > >
> > > To clarify, stable APIs should not be changed in an incompatible way
> > > without a deprecation cycle, independently of whether it's a major
> > > release or not.
> > >
> > >
> > > > -- change in minor version can introduce new features, but not break
> > > > backward compatibility
> > > > -- change in pat
