Re: [VOTE] KIP-382 MirrorMaker 2.0

2018-12-24 Thread Stephane Maarek
+1 ! Great stuff

Stephane

On Mon., 24 Dec. 2018, 12:07 pm Edoardo Comar wrote:

> +1 non-binding
>
> thanks for the KIP
> --
>
> Edoardo Comar
>
> IBM Event Streams
>
>
> Harsha  wrote on 21/12/2018 20:17:03:
>
> > From: Harsha 
> > To: dev@kafka.apache.org
> > Date: 21/12/2018 20:17
> > Subject: Re: [VOTE] KIP-382 MirrorMaker 2.0
> >
> > +1 (binding).  Nice work Ryan.
> > -Harsha
> >
> > On Fri, Dec 21, 2018, at 8:14 AM, Andrew Schofield wrote:
> > > +1 (non-binding)
> > >
> > > Andrew Schofield
> > > IBM Event Streams
> > >
> > > On 21/12/2018, 01:23, "Srinivas Reddy" wrote:
> > >
> > > +1 (non binding)
> > >
> > > Thank you Ryan for the KIP, let me know if you need support in
> > implementing
> > > it.
> > >
> > > -
> > > Srinivas
> > >
> > > - Typed on tiny keys. pls ignore typos.{mobile app}
> > >
> > >
> > > On Fri, 21 Dec, 2018, 08:26 Ryanne Dolan  wrote:
> > >
> > > > Thanks for the votes so far!
> > > >
> > > > Due to recent discussions, I've removed the high-level REST
> > API from the
> > > > KIP.
> > > >
> > > > On Thu, Dec 20, 2018 at 12:42 PM Paul Davidson wrote:
> > > >
> > > > > +1
> > > > >
> > > > > Would be great to see the community build on the basic
> > approach we took
> > > > > with Mirus. Thanks Ryanne.
> > > > >
> > > > > On Thu, Dec 20, 2018 at 9:01 AM Andrew Psaltis wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > Really looking forward to this and to helping in any way
> > I can. Thanks
> > > > > for
> > > > > > kicking this off Ryanne.
> > > > > >
> > > > > > On Thu, Dec 20, 2018 at 10:18 PM Andrew Otto wrote:
> > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > This looks like a huge project! Wikimedia would be
> > very excited to
> > > > have
> > > > > > > this. Thanks!
> > > > > > >
> > > > > > > On Thu, Dec 20, 2018 at 9:52 AM Ryanne Dolan wrote:
> > > > > > >
> > > > > > > > Hey y'all, please vote to adopt KIP-382 by replying +1
> to this
> > > > > thread.
> > > > > > > >
> > > > > > > > For your reference, here are the highlights of the
> proposal:
> > > > > > > >
> > > > > > > > - Leverages the Kafka Connect framework and ecosystem.
> > > > > > > > - Includes both source and sink connectors.
> > > > > > > > - Includes a high-level driver that manages connectors
> in a
> > > > dedicated
> > > > > > > > cluster.
> > > > > > > > - High-level REST API abstracts over connectors
> > between multiple
> > > > > Kafka
> > > > > > > > clusters.
> > > > > > > > - Detects new topics, partitions.
> > > > > > > > - Automatically syncs topic configuration between
> clusters.
> > > > > > > > - Manages downstream topic ACL.
> > > > > > > > - Supports "active/active" cluster pairs, as well as
> > any number of
> > > > > > active
> > > > > > > > clusters.
> > > > > > > > - Supports cross-data center replication,
> > aggregation, and other
> > > > > > complex
> > > > > > > > topologies.
> > > > > > > > - Provides new metrics including end-to-end
> > replication latency
> > > > > across
> > > > > > > > multiple data centers/clusters.
> > > > > > > > - Emits offsets required to migrate consumers
> > between clusters.
> > > > > > > > - Tooling for offset translation.
> > > > > > > > - MirrorMaker-compatible legacy mode.
> > > > > > > >
> > > > > > > > Thanks, and happy holidays!
> > > > > > > > Ryanne
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Paul Davidson
> > > > > Principal Engineer, Ajna Team
> > > > > Big Data & Monitoring
> > > > >
> > > >
> > >
> > >
> >
>
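
For readers wanting to map the highlights above onto configuration: the KIP's dedicated-cluster driver is configured with a single properties file, roughly as sketched below (based on the KIP's examples; the cluster names and file name are illustrative):

# mm2.properties -- two clusters, replicating primary -> backup
clusters = primary, backup
primary.bootstrap.servers = primary-kafka:9092
backup.bootstrap.servers = backup-kafka:9092
# enable the flow; MM2 detects new topics/partitions matching the pattern
primary->backup.enabled = true
primary->backup.topics = .*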


Re: [DISCUSS] KIP-304: Connect runtime mode improvements for container platforms

2018-05-17 Thread Stephane Maarek
Hi Saulius

I think you're on the money, but you're not pushing things far enough.
This is something I've hoped for a long time.
Let's talk Kafka Connect v2

Kafka Connect clusters, as you said, are not convenient to work with (the
KIP details the drawbacks well). I'm all for containerisation, just like
Streams apps support (and boast!).

Now, here's the problem with Kafka Connect. There are three backing topics.
Here's the analysis of how they can evolve:
- Config topic: this one is irrelevant if each connect cluster comes with a
config bundled with the corresponding JAR, as you mentioned in your KIP
- Status topic: this is something I wish were gone too. The consumers have a
coordinator, and I believe the connect workers should have a coordinator
too, for task rebalancing.
- Source Offset topic: only relevant for sources. I wish there were a
__connect_offsets global topic just like for consumers, and a
"ConnectOffsetCoordinator" to talk to in order to retrieve the latest
committed offset.

If we look above, with a few back-end fundamental transformations, we can
probably make Connect "cluster-less".

What the community would get out of it is huge:
- Connect workers for a specific connector are independent and isolated,
measurable (in CPU and Mem) and auto-scalable
- CI/CD is super easy to integrate, as it's just another container / jar.
- You can rolling-restart a specific connector and upgrade a JAR without
interrupting your other connectors, while keeping the connector itself
running.
- The topics backing connect are removed except the global one, which
allows you to scale easily in terms of number of connectors
- Running a connector in dev or prod (for people offering connectors) is as
easy as doing a simple "docker run".
- Each consumer / producer settings can be configured at the container
level.
- Each connect process is immutable in configuration.
- Each connect process has its own security identity (right now, you need a
connect cluster per service role, which is a lot of overhead in terms of
backing topics)
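
To make the "docker run" point below concrete, here is a purely hypothetical sketch of launching one isolated connector (the image name, env var names, and config mechanism are all invented for illustration; no such interface exists today):

docker run \
  -e CONNECT_BOOTSTRAP_SERVERS=kafka:9092 \
  -e CONNECT_CONNECTOR_CONFIG=/config/my-connector.properties \
  -v /local/config:/config \
  mycompany/my-connector:1.2.3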

Now, I don't have the Kafka expertise to know exactly which changes to make
in the code, but I believe the final idea is achievable.
The change would be breaking for how Kafka Connect is run, but I think
there's a chance to make the change non breaking to how Connect is
programmed. I believe the same public API framework can be used.

Finally, the REST API can be used for monitoring, or the JMX metrics as
usual.

I may be completely wrong, but I can see such a change driving the adoption
and easing the management of Connect considerably, while lowering the
barrier to entry.

This change may be big to implement but probably worthwhile. I'd be happy
to provide more "user feedback" on a PR, but probably won't be able to
implement a PR myself.

More than happy to discuss this

Best,
Stephane


Kind regards,
Stephane


Stephane Maarek | Developer

+61 416 575 980
steph...@simplemachines.com.au
simplemachines.com.au
Level 2, 145 William Street, Sydney NSW 2010

On 17 May 2018 at 14:42, Saulius Valatka  wrote:

> Hi,
>
> the only real use case for the REST interface I can see is providing
> health/liveness checks for mesos/kubernetes. It's also true that the API
> can be left as is and e.g. not exposed publicly on the platform level, but
> this would still leave opportunities to accidentally mess something up
> internally, so it's mostly a safety concern.
>
> Regarding the option renaming: I agree that it's not necessary as it's not
> clashing with anything, my reasoning is that assuming some other offset
> storage appears in the future, having all config properties at the root
> level of offset.storage.* _MIGHT_ introduce clashes in the future, so this
> is just a suggestion for introducing a convention of
> offset.storage.<type>.<property>, which the existing
> property offset.storage.file.filename already adheres to. But in general,
> yes -- this can be left as is.
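
To illustrate the convention being suggested: the first property below is the existing, already-compliant one; the kafka-backed property is a hypothetical future example, not a real config key.

offset.storage.file.filename = /var/connect/offsets
offset.storage.kafka.topic   = connect-offsets
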
>
>
>
> 2018-05-17 1:20 GMT+03:00 Jakub Scholz :
>
> > Hi,
> >
> > What do you plan to use the read-only REST interface for? Is there
> > something you cannot get through the metrics interface? Otherwise it might
> > be easier to just disable the REST interface (either in the code, or just
> > on the platform level - e.g. in Kubernetes).
> >
> > Also, I do not know what the usual approach in Kafka is ... but do we
> > really have to rename the offset.storage.* options? The current names do
> > not seem to have any collision with what you are adding, and they would get
> > "out of sync" with the other options used in connect (status.storage.* and
> > config.storage.*). So it seems a bit of an unnecessary change to me.
> >
> > Jakub
> >
> >
> >
> > On

Re: [DISCUSS] KIP-304: Connect runtime mode improvements for container platforms

2018-05-17 Thread Stephane Maarek
Say you have 50 connectors, all with different ACLs and service accounts.
That's 50 connect clusters to maintain, so 50*3 internal connect topics to
maintain (they can't share the same connect topics because they're
different clusters). At default config we're talking about 1,500 partitions,
which is a lot for a Kafka cluster.
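
(Checking the arithmetic against Connect's defaults, which I believe are 25
partitions for the offset topic, 5 for the status topic and 1 for the config
topic: 50 * (25 + 5 + 1) = 1,550 partitions, i.e. roughly 1,500.)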

The parallel is with consumer groups. Are all consumer groups backing their
offset in their own topic or in a central topic ? Connect should be able to
achieve the same centrality of Configs.

Finally, configs should go along with the JAR, and not be stored in Kafka,
especially for connectors that have secrets. There's no reason Kafka needs
to have a database secret on its own disk.

On Thu., 17 May 2018, 5:55 pm Rahul Singh, 
wrote:

> First sentence fat fingered.
>
> “Just curious as to why there’s an issue with the backing topics for Kafka
> Connect.”
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On May 17, 2018, 6:17 AM -0400, Stephane Maarek <
> steph...@simplemachines.com.au>, wrote:
> > Hi Saulius
> >
> > I think you're on the money, but you're not pushing things far enough.
> > This is something I've hoped for a long time.
> > Let's talk Kafka Connect v2
> >
> > Kafka Connect Cluster, as you said, are not convenient to work with (the
> > KIP details drawbacks well). I'm all about containerisation just like
> > stream apps support (and boasts!).
> >
> > Now, here's the problem with Kafka Connect. There are three backing
> topics.
> > Here's the analysis of how they can evolve:
> > - Config topic: this one is irrelevant if each connect cluster comes
> with a
> > config bundled with the corresponding JAR, as you mentioned in your KIP
> > - Status topic: this is something I wish was gone too. The consumers
> have a
> > coordinator, and I believe the connect workers should have a coordinator
> > too, for task rebalancing.
> > - Source Offset topic: only relevant for sources. I wish there was a
> > __connect_offsets global topic just like for consumers and an
> > "ConnectOffsetCoordinator" to talk to to retrieve latest committed
> offset.
> >
> > If we look above, with a few back-end fundamental transformations, we can
> > probably make Connect "cluster-less".
> >
> > What the community would get out of it is huge:
> > - Connect workers for a specific connector are independent and isolated,
> > measurable (in CPU and Mem) and auto-scalable
> > - CI/CD is super easy to integrate, as it's just another container / jar.
> > - You can roll restart a specific connector and upgrade a JAR without
> > interrupting your other connectors and while keeping the current
> connector
> > from running.
> > - The topics backing connect are removed except the global one, which
> > allows you to scale easily in terms of number of connectors
> > - Running a connector in dev or prod (for people offering connectors) is
> as
> > easy as doing a simple "docker run".
> > - Each consumer / producer settings can be configured at the container
> > level.
> > - Each connect process is immutable in configuration.
> > - Each connect process has its own security identity (right now, you
> need a
> > connect cluster per service role, which is a lot of overhead in terms of
> > backing topic)
> >
> > Now, I don't have the Kafka expertise to know exactly which changes to
> make
> > in the code, but I believe the final idea is achievable.
> > The change would be breaking for how Kafka Connect is run, but I think
> > there's a chance to make the change non breaking to how Connect is
> > programmed. I believe the same public API framework can be used.
> >
> > Finally, the REST API can be used for monitoring, or the JMX metrics as
> > usual.
> >
> > I may be completely wrong, but I would see such a change drive the
> > utilisation, management of Connect by a lot while lowering the barrier to
> > adoption.
> >
> > This change may be big to implement but probably worthwhile. I'd be happy
> > to provide more "user feedback" on a PR, but probably won't be able to
> > implement a PR myself.
> >
> > More than happy to discuss this
> >
> > Best,
> > Stephane
> >
> >
> > Kind regards,
> > Stephane
> >
> >
> > Stephane Maarek | Developer
> >
> > +61 416 575 980
> > steph...@simplemachines.com.au
> > simplemachines.com.au
> > Level 2, 145 William Stree

Re: [DISCUSS] KIP-304: Connect runtime mode improvements for container platforms

2018-05-17 Thread Stephane Maarek
Fat fingered too... "Connect source should be able to achieve the same
centrality of offsets "

On Thu., 17 May 2018, 10:27 pm Stephane Maarek, <
steph...@simplemachines.com.au> wrote:

> Say you have 50 connectors all with different ACLs and service account.
> That's 50 connect clusters to maintain. So 50*3 internal connect topics to
> maintain (they can't share the same connect topics because they're
> different clusters). At default config we're talking 1500 partitions which
> is a lot for a Kafka cluster.
>
> The parallel is with consumer groups. Are all consumer groups backing
> their offset in their own topic or in a central topic ? Connect should be
> able to achieve the same centrality of Configs.
>
> Finally , Configs should go along with the jar, and not be stored in
> Kafka, especially for connectors that have secrets. There's no reason Kafka
> needs to have a database secret on its own disk
>
> On Thu., 17 May 2018, 5:55 pm Rahul Singh, 
> wrote:
>
>> First sentence fat fingered.
>>
>> “Just curious as to why there’s an issue with the backing topics for
>> Kafka Connect.”
>>
>> --
>> Rahul Singh
>> rahul.si...@anant.us
>>
>> Anant Corporation
>>
>> On May 17, 2018, 6:17 AM -0400, Stephane Maarek <
>> steph...@simplemachines.com.au>, wrote:
>> > Hi Saulius
>> >
>> > I think you're on the money, but you're not pushing things far enough.
>> > This is something I've hoped for a long time.
>> > Let's talk Kafka Connect v2
>> >
>> > Kafka Connect Cluster, as you said, are not convenient to work with (the
>> > KIP details drawbacks well). I'm all about containerisation just like
>> > stream apps support (and boasts!).
>> >
>> > Now, here's the problem with Kafka Connect. There are three backing
>> topics.
>> > Here's the analysis of how they can evolve:
>> > - Config topic: this one is irrelevant if each connect cluster comes
>> with a
>> > config bundled with the corresponding JAR, as you mentioned in your KIP
>> > - Status topic: this is something I wish was gone too. The consumers
>> have a
>> > coordinator, and I believe the connect workers should have a coordinator
>> > too, for task rebalancing.
>> > - Source Offset topic: only relevant for sources. I wish there was a
>> > __connect_offsets global topic just like for consumers and an
>> > "ConnectOffsetCoordinator" to talk to to retrieve latest committed
>> offset.
>> >
>> > If we look above, with a few back-end fundamental transformations, we
>> can
>> > probably make Connect "cluster-less".
>> >
>> > What the community would get out of it is huge:
>> > - Connect workers for a specific connector are independent and isolated,
>> > measurable (in CPU and Mem) and auto-scalable
>> > - CI/CD is super easy to integrate, as it's just another container /
>> jar.
>> > - You can roll restart a specific connector and upgrade a JAR without
>> > interrupting your other connectors and while keeping the current
>> connector
>> > from running.
>> > - The topics backing connect are removed except the global one, which
>> > allows you to scale easily in terms of number of connectors
>> > - Running a connector in dev or prod (for people offering connectors)
>> is as
>> > easy as doing a simple "docker run".
>> > - Each consumer / producer settings can be configured at the container
>> > level.
>> > - Each connect process is immutable in configuration.
>> > - Each connect process has its own security identity (right now, you
>> need a
>> > connect cluster per service role, which is a lot of overhead in terms of
>> > backing topic)
>> >
>> > Now, I don't have the Kafka expertise to know exactly which changes to
>> make
>> > in the code, but I believe the final idea is achievable.
>> > The change would be breaking for how Kafka Connect is run, but I think
>> > there's a chance to make the change non breaking to how Connect is
>> > programmed. I believe the same public API framework can be used.
>> >
>> > Finally, the REST API can be used for monitoring, or the JMX metrics as
>> > usual.
>> >
>> > I may be completely wrong, but I would see such a change drive the
>> > utilisation, management of Connect by a lot while lowering the barrier
>> to
>> > adoption.

Re: [DISCUSS] KIP-290: Support for wildcard suffixed ACLs

2018-05-21 Thread Stephane Maarek
+1 non binding

On Mon., 21 May 2018, 2:44 pm Rajini Sivaram, 
wrote:

> Hi Piyush, Thanks for the KIP!
>
> +1 (binding)
>
> Regards,
>
> Rajini
>
> On Sun, May 20, 2018 at 2:53 PM, Andy Coates  wrote:
>
> > Awesome last minute effort Piyush.
> >
> > Really appreciate your time and input,
> >
> > Andy
> >
> > Sent from my iPhone
> >
> > > On 19 May 2018, at 03:43, Piyush Vijay  wrote:
> > >
> > > Updated the KIP.
> > >
> > > 1. New enum field 'ResourceNameType' in Resource and ResourceFilter
> > classes.
> > > 2. modify getAcls() and rely on ResourceNameType' field in Resource to
> > > return either exact matches or all matches based on wildcard-suffix.
> > > 3. CLI changes to identify if resource name is literal or
> wildcard-suffix
> > > 4. Escaping doesn't work and isn't required if we're keeping a separate
> > > path on ZK (kafka-wildcard-acl) to store wildcard-suffix ACLs.
> > > 5. New API keys for Create / Delete / Describe Acls request with a new
> > > field in schemas for 'ResourceNameType'.
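
To illustrate point 3 above, the CLI distinction could surface as an explicit switch, roughly like this (the --resource-name-type flag and its values are an assumption derived from the new ResourceNameType field, not final syntax from the KIP):

bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 \
  --add --allow-principal User:Bob --operation Read \
  --group my-app- --resource-name-type wildcard-suffixed
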
> > >
> > > Looks ready to me for the vote, will start voting thread now. Thanks
> > > everyone for the valuable feedback.
> > >
> > >
> > >
> > >
> > > Piyush Vijay
> > >
> > >
> > > Piyush Vijay
> > >
> > >> On Fri, May 18, 2018 at 6:07 PM, Andy Coates 
> wrote:
> > >>
> > >> Hi Piyush,
> > >>
> > >> We're fast approaching the KIP deadline. Are you actively working on
> > this?
> > >> If you're not I can take over.
> > >>
> > >> Thanks,
> > >>
> > >> Andy
> > >>
> > >>> On 18 May 2018 at 14:25, Andy Coates  wrote:
> > >>>
> > >>> OK I've read it now.
> > >>>
> > >>> 1. I see you have an example:
> >  For example: If I want to fetch all ACLs that match ’topicA*’, it’s
> > not
> > >>> possible without introducing new API AND maintaining backwards
> > >>> compatibility.
> > >>> getAcls takes a Resource, right, which would be either a full
> resource
> > >>> name or 'ALL', i.e. '*', right?  The point of the call is to get all
> > ACLs
> > >>> relating to a specific resource, not a partial resource like
> 'topicA*'.
> > >>> Currently, I'm guessing / half-remembering that if you ask it for
> ACLs
> > >> for
> > >>> topic 'foo' it doesn't include global 'ALL' ACLs in the list - that
> > would
> > >>> be a different call.  With the introduction of partial wildcards I
> > think
> > >>> the _most_ backwards compatible change would be to have
> > >>> getAcls("topic:foo") to return all the ACLs, including that affect
> this
> > >>> topic. This could include any '*'/ALL Acls, (which would be a small
> > >>> backwards compatible change), or exclude them as it current does.
> > >>> Excluding any matching partial wildcard acl, e.g. 'f*' would break
> > >>> compatibility IMHO.
> > >>>
> > >>> 2. Example command lines, showing how to add ACLs to specific
> resources
> > >>> that *end* with an asterisk char and adding wildcard-suffixed ACLs,
> > would
> > >>> really help clarify the KIP. e.g.
> > >>>
> > >>> bin/kafka-acls.sh --authorizer-properties
> zookeeper.connect=localhost:
> > >> 2181
> > >>> --add --allow-principal User:Bob --allow-principal User:Alice
> > >> --allow-host
> > >>> 198.51.100.0 --allow-host 198.51.100.1 --operation Read --group
> > my-app-*
> > >>>
> > >>> With the above command I can't see how the code can know if the user
> > >> means
> > >>> a literal group called 'my-app-*', or a wildcard suffix for any group
> > >>> starting with 'my-app-'. Escaping isn't enough as the escape char can
> > >> clash
> > >>> too, e.g. escaping a literal to 'my-app-\*' can still clash with
> > someone
> > >>> wanting a wildcard sufiix matching any group starting with
> 'my-app-\'.
> > >>>
> > >>> So there needs to be a syntax change here, I think.  Maybe some new
> > >>> command line switch to either explicitly enable or disable
> > >>> 'wildcard-suffix' support?  Probably defaulting to wildcard-suffix
> > being
> > >>> on, (better experience going forward), though off is more backwards
> > >>> compatible.
> > >>>
> > >>>
> > >>> 3. Again, examples of how to store ACLs for specific resources that
> > >> *end* with
> > >>> an asterisk and wildcard-suffix ACLs, with any escaping would really
> > >> help.
> > >>>
> > >>>
> > >>>
> >  On 18 May 2018 at 13:55, Andy Coates  wrote:
> > 
> >  Hey Piyush,
> > 
> >  Thanks for getting this in! :D
> > 
> >  About to read now. But just quickly...
> > 
> >  1. I'll read up on the need for getMatchingAcls - but just playing
> > >> devils
> >  advocate for a moment - if a current caller of getAcls() expects it
> to
> >  return the full set of ACLs for a given resource, would post this
> > change
> >  only returning a sub set and requiring them to return
> getMatchingAcls
> > to
> >  get the full set not itself be a break in compatibility? I'm
> thinking
> > >> about
> >  any tooling / UI / etc people may have built on top of this.  If Im
> > >> missing
> >  the point, then maybe a concrete example, (if you've not already
>

Re: [DISCUSS] KIP-310: Add a Kafka Source Connector to Kafka Connect

2018-06-06 Thread Stephane Maarek
Hi Rhys,

I think this will be a great addition.

Are there any performance benchmarks against Mirror Maker available? I'm
interested to know if this is more performant / scalable.
Regarding the implementation, here's some feedback:

- I think it's worth mentioning that this solution does not rely on
consumer groups, and therefore tracking progress may be tricky. Can you
think of a way to expose that?

- Some code can be in config Validator I believe:
https://github.com/Comcast/MirrorTool-for-Kafka-Connect/blob/master/src/main/java/com/comcast/kafka/connect/kafka/KafkaSourceConnector.java#L47

- I think your kip mentions `source.admin.` and `source.consumer.` but I
don't see it reflected yet in the code

- Is there a way to be flexible and merge list and regex, or offer the two
simultaneously? e.g. source_topics=my_static_topic,prefix.*
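
A sketch of what a resulting connector config might look like (the connector class comes from the linked repo; the source.* prefixes are from the KIP as I read it, and the combined list+regex value is my suggestion rather than anything the KIP defines):

name = remote-kafka-source
connector.class = com.comcast.kafka.connect.kafka.KafkaSourceConnector
source.bootstrap.servers = remote-kafka:9092
source.consumer.max.poll.records = 500
source_topics = my_static_topic,prefix.*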

Hope that helps
Stephane

Kind regards,
Stephane


Stephane Maarek | Developer

+61 416 575 980
steph...@simplemachines.com.au
simplemachines.com.au
Level 2, 145 William Street, Sydney NSW 2010

On 5 June 2018 at 09:04, McCaig, Rhys  wrote:

> Hi All,
>
> As I didn’t get any comment on this KIP and there has since been an
> additional 2 KIP’s created numbered 308 since, I'm bumping this and
> renaming the KIP to 310 to remove the duplication:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-310%3A+Add+a+Kafka+Source+Connector+to+Kafka+Connect
>
> Let me know if you have any comments or feedback, would love to hear them.
>
> Cheers,
> Rhys
>
> > On May 28, 2018, at 10:23 PM, McCaig, Rhys 
> wrote:
> >
> > Sorry for the bad link to the KIP, here it is: https://cwiki.apache.org/confluence/display/KAFKA/KIP-308%3A+Add+a+Kafka+Source+Connector+to+Kafka+Connect
> >
> >> On May 28, 2018, at 10:19 PM, McCaig, Rhys 
> wrote:
> >>
> >> Hi All,
> >>
> >> I added a KIP to include a Kafka Source Connector with Kafka Connect.
> >> Here is the KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-308%3A+Add+a+Kafka+Source+Connector+to+Kafka+Connect
> >>
> >> Looking forward to your feedback and suggestions.
> >>
> >> Cheers,
> >> Rhys
> >>
> >>
> >
>
>


Re: [DISCUSS] KIP-304: Connect runtime mode improvements for container platforms

2018-06-06 Thread Stephane Maarek
 more depth (maybe someone more familiar with the
> code-base
> > > could offer some insight?).
> > >
> > > Regarding Rahul's questions as to why these topics are a problem, I can
> > > also re-iterate what Stephane said: it's a needless burden to manage.
> > > Though I'm pretty sure different connectors can use the same topics, it's
> > > still not nice, since if you're using ACLs to control access, all
> > > connectors will have to be granted access to these topics and there can
> > > potentially be trouble from misbehaving/malicious connectors. Also, I
> > > don't think these topics offer any advantages when it comes to
> > > centralization: the only reason they exist is that the distributed mode
> > > has no other way to store information apart from inside kafka topics,
> > > whereas if you're running connectors in containers on kubernetes, you'd
> > > store this configuration in the image/env-vars/configmaps or some other
> > > mechanism you use, the point being that this once again becomes a concern
> > > for the container platform -- once again, just like with Kafka Streams.
> > >
> > > Given all of the above, I'm now inclined to either extend the standalone
> > > mode and enable scaling/offset storage in Kafka OR, better yet, introduce
> > > a new runtime mode ("container"? not sure what it should be called). The
> > > key points of this runtime mode would be:
> > >
> > >  - offset storage in a kafka topic
> > >  - only one statically configured connector launched during startup
> > >  - scaling happens via the consumer group protocol
> > >  - only a lightweight "health" REST API that simply informs if the
> > > connector is running
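
A sketch of what a worker config for such a mode might boil down to (bootstrap.servers, group.id and offset.storage.topic are existing Connect/consumer property names; the keys marked hypothetical are invented here purely to make the four points concrete):

runtime.mode = container                             # hypothetical
bootstrap.servers = kafka:9092
group.id = my-connector                              # scaling via the consumer group protocol
offset.storage.topic = __connect_offsets
connector.config = /config/my-connector.properties   # hypothetical
rest.health.check.only = true                        # hypothetical
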
> > >
> > > Obviously this would extend the scope of the KIP, but I'd be willing to
> > > give this a shot.
> > > Waiting for your feedback, once a more clear vision is in place I could
> > > update the KIP.
> > >
> > > Thanks
> > >
> > > 2018-05-17 13:17 GMT+03:00 Stephane Maarek <steph...@simplemachines.com.au>:
> > >
> > > > Hi Saulius
> > > >
> > > > I think you're on the money, but you're not pushing things far enough.
> > > > This is something I've hoped for a long time.
> > > > Let's talk Kafka Connect v2
> > > >
> > > > Kafka Connect Cluster, as you said, are not convenient to work with
> > (the
> > > > KIP details drawbacks well). I'm all about containerisation just like
> > > > stream apps support (and boasts!).
> > > >
> > > > Now, here's the problem with Kafka Connect. There are three backing
> > > topics.
> > > > Here's the analysis of how they can evolve:
> > > > - Config topic: this one is irrelevant if each connect cluster comes
> > > with a
> > > > config bundled with the corresponding JAR, as you mentioned in your
> KIP
> > > > - Status topic: this is something I wish was gone too. The consumers
> > > have a
> > > > coordinator, and I believe the connect workers should have a
> > coordinator
> > > > too, for task rebalancing.
> > > > - Source Offset topic: only relevant for sources. I wish there was a
> > > > __connect_offsets global topic just like for consumers and an
> > > > "ConnectOffsetCoordinator" to talk to to retrieve latest committed
> > > offset.
> > > >
> > > > If we look above, with a few back-end fundamental transformations, we
> > can
> > > > probably make Connect "cluster-less".
> > > >
> > > > What the community would get out of it is huge:
> > > > - Connect workers for a specific connector are independent and
> > isolated,
> > > > measurable (in CPU and Mem) and auto-scalable
> > > > - CI/CD is super easy to integrate, as it's just another container /
> > jar.
> > > > - You can roll restart a specific connector and upgrade a JAR without
> > > > interrupting your other connectors and while keeping the current
> > > connector
> > > > from running.
> > > > - The topics backing connect are removed except the global one, which
> > > > allows you to scale easily in terms of number of connectors
> > > > - Ru

Are default serdes in Kafka Streams doing more harm than good?

2018-06-12 Thread Stephane Maarek
Hi

Coming from a user perspective, I see a lot of beginners not understanding
the need for serdes and misusing the default serde settings.

I believe default serdes do more harm than good. At best, they save a bit
of boilerplate code but hide the complexity of serde happening at each
step. At worst, they generate confusion and make debugging tremendously
hard as the errors thrown at runtime don't indicate that the serde being
used is the default one.

What do you think of deprecating them as well as any API that does not use
explicit serde?

I know this may be a "tough change", but in my opinion it'll allow for more
explicit development and easier debugging.

Regards
Stéphane
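
A minimal Java illustration of the difference (topic name and types invented). Both streams behave identically here, but with the first one the serde in use is invisible at the call site, and a mismatch only surfaces as a runtime serialization error:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;

Properties props = new Properties();
// the defaults live far away from the topology, in the config
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

StreamsBuilder builder = new StreamsBuilder();
// implicit: silently falls back to whatever the defaults happen to be
KStream<String, String> implicit = builder.stream("input-topic");
// explicit: the serdes are visible to the reader at the call site
KStream<String, String> explicit =
    builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()));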


Re: Are default serdes in Kafka Streams doing more harm than good?

2018-06-13 Thread Stephane Maarek
Thanks Matthias and Guozhang

1) Regarding having JSON, Protobuf or Avro across the entire topology: this
makes sense. I still wish the builder could take a 'defaultSerde' for keys
and values to make types explicit throughout the topology, versus a class
name as a string in the properties. That might also help with Java types
through the topology, as we could then infer that the default serde implies
T as the operators are chained

1*) I still think that as soon as a 'count' or any 'window' happens, the
user needs to override the default serde, which can be confusing for end
users (see the sketch after point 3)
2) I very much agree a type and serde map could be very useful.

2*) Big Scala user here, but this will affect maybe 10 percent of users,
unfortunately. Java is still where people try most things out. Still very
excited for that release!

3) I haven't dug through the code, but how easy would it be to indicate to
the end user that a default serde was used during a runtime error? This
could be a very quick KIP-less win for the developers
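
The sketch promised in point 1* (types invented; written against the 1.x/2.0-era Streams API): the windowed count compiles without any serde, but at runtime the repartition topic and the window store fall back to the default serdes from the config unless the overrides below are passed:

import java.util.concurrent.TimeUnit;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.*;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> words =
    builder.stream("words", Consumed.with(Serdes.String(), Serdes.String()));
KTable<Windowed<String>, Long> counts = words
    // without Serialized.with(...) the grouping uses the default serdes
    .groupByKey(Serialized.with(Serdes.String(), Serdes.String()))
    .windowedBy(TimeWindows.of(TimeUnit.MINUTES.toMillis(5)))
    // without Materialized.with(...) the window store uses the default serdes
    .count(Materialized.with(Serdes.String(), Serdes.Long()));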

On Thu., 14 Jun. 2018, 12:28 am Guozhang Wang,  wrote:

> Hello Stéphane,
>
> Good question :) And there have been some discussions about the default
> serdes in the past in the community, my two cents about this:
>
> 1) When a user tries out Streams for the first time she is likely to use
> some primitive typed data as her first POC app, in which case the data
> types of the intermediate streams can change frequently and hence a default
> serde would not help much but may introduce confusions; on the other hand,
> in real production environment users are likely to use some data schema
> system like Avro / Protobuf, and hence their declared serde may well be
> consistent. For example if you are using Avro with GenericRecord, then all
> the value types throughout your topology may be of the same type, so just
> declaring a `Serde<GenericRecord>` would help. Over time,
> this is indeed what we have seen from practical user scenarios.
>
> 2) So to me the question is for top-of-the-funnel adoptions, could we make
> the OOTB experience better with serdes for users. We've discussed some
> ideas around this topic, like improving our typing systems so that users
> can specify some serdes per type (for primitive types we can pre-register a
> list of default ones as well), and the library can infer the data types and
> choose which serde to use automatically. However for Java type erasure
> makes it tricky (I think it is still the case in Java8), and we cannot
> always make it work. And that's where we paused on investigating further.
> Note that in the coming 2.0 release we have a Scala API for Streams where
> default serdes are indeed dropped since with Scala we can safely rely on
> implicit typing inference to override the serdes automatically.
>
>
>
> Guozhang
>
>
> On Tue, Jun 12, 2018 at 6:32 PM, Stephane Maarek <
> steph...@simplemachines.com.au> wrote:
>
> > Hi
> >
> > Coming from a user perspective, I see a lot of beginners not
> understanding
> > the need for serdes and misusing the default serde settings.
> >
> > I believe default serdes do more harm than good. At best, they save a bit
> > of boilerplate code but hide the complexity of serde happening at each
> > step. At worst, they generate confusion and make debugging tremendously
> > hard as the errors thrown at runtime don't indicate that the serde being
> > used is the default one.
> >
> > What do you think of deprecating them as well as any API that does not
> use
> > explicit serde?
> >
> > I know this may be a "tough change", but in my opinion it'll allow for
> more
> > explicit development and easier debugging.
> >
> > Regards
> > Stéphane
> >
>
>
>
> --
> -- Guozhang
>


Re: Are default serdes in Kafka Streams doing more harm than good?

2018-06-16 Thread Stephane Maarek
Thank Guozhang!

Let's pick this up in
https://issues.apache.org/jira/projects/KAFKA/issues/KAFKA-7066 and
https://github.com/apache/kafka/pull/5239

Cheers
Stephane

Kind regards,
Stephane


Stephane Maarek | Developer

+61 416 575 980
steph...@simplemachines.com.au
simplemachines.com.au
Level 2, 145 William Street, Sydney NSW 2010

On 16 June 2018 at 07:27, Guozhang Wang  wrote:

> Hi Stephane,
>
> The serdes only happen in the following case:
>
> 1. when sending to an external topic or repartition topic, this is covered
> in SinkNode.
> 2. when reading from external topic, we cover deserialization errors in
> the DeserializationExceptionHandler interface, customizable in config.
> 3. when writing into the store, which accepts only serialized bytes (note
> it includes sending to the changelog topic as well if the store is logging
> enabled).
>
> So as of now only case 3) is not captured, and the serde calls happen in
> the MeteredXXStores, i.e. not centralized in one class. We can add logic
> similar to SinkNode's to capture ClassCastException in the serde calls
> there.
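
(For completeness, the handler in case 2 is configured like this -- the config key and the LogAndContinueExceptionHandler class, in org.apache.kafka.streams.errors, shipped with KIP-161:

props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
          LogAndContinueExceptionHandler.class);

)
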
>
>
> Guozhang
>
>
> On Fri, Jun 15, 2018 at 2:05 AM, Stephane Maarek <
> steph...@simplemachines.com.au> wrote:
>
>> 3) I've had trouble finding the proper place to catch the exception as
>> the stack trace is huge.
>>
>> I've found some "wanted" behaviour is implemented in SinkNode but not
>> elsewhere: https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/SinkNode.java#L93
>>
>> Overall it'd be ideal to catch that in the Serde classes, but they don't
>> expose the correct types.
>>
>> I'm happy to propose a PR but not sure where the correct try / catch
>> should go... too high in the trace and I lose the "key value serde"
>> information, and too low in the trace I don't encompass all the cases (just
>> like SinkNode).
>>
>> If you have any pointers, much appreciated :)
>> Stephane
>>
>>
>> On Fri., 15 Jun. 2018, 4:20 am Guozhang Wang,  wrote:
>>
>>> 2) Pre-registering serdes and data types for Kafka topics as well as
>>> state
>>> stores could be a good feature to add.
>>>
>>> 3) For this, we can consider capturing the ClassCastException in serde
>>> callers and returns a more informative error.
>>>
>>>
>>> Guozhang
>>>
>>> On Wed, Jun 13, 2018 at 8:34 PM, Stephane Maarek <
>>> steph...@simplemachines.com.au> wrote:
>>>
>>> > Thanks Matthias and Guozhang
>>> >
>>> > 1) regarding having json protobuf or avro across the entire topology
>>> this
>>> > makes sense. I still wish the builder could take a 'defaultSerde' for
>>> value
>>> > and keys to make types explicit throughout the topology vs a class as
>>> > string in a properties. That might also help with Java types through
>>> the
>>> > topology as now we can infer that the default serde implies T as the
>>> > operators are chained
>>> >
>>> > 1*) I still think as soon as a 'count' or any 'window' happens the user
>>> > needs to override the default serde which can be confusing for end
>>> users
>>> >
>>> > 2) I very much agree a type and serde map could be very useful.
>>> >
>>> > 2*) big scala user here but this will affect maybe 10 percent of the
>>> user
>>> > unfortunately. Java is still where people try most things out. Still
>>> very
>>> > excited for that release !
>>> >
>>> > 3) haven't dug through the code, but how easy would it be to indicate
>>> to
>>> > the end user that a default serde was used during a runtime error ?
>>> This
>>> > could be a very quick kip-less win for the developers
>>> >
>>> > On Thu., 14 Jun. 2018, 12:28 am Guozhang Wang, 
>>> wrote:
>>> >
>>> > > Hello Stéphane,
>>> > >
>>> > > Good question :) And there have been some discussions about the
>>> default
>>> > > serdes in the past in the community, my two cents about this:
>>> > >
>>> > > 1) When a user tries out Streams for the first time she is likely to
>>> use
>>> > > some primitive typed data as her first POC app, in which case the
>>> data
>>> > > types of the inte

Re: [DISCUSS] KIP-317: Transparent Data Encryption

2018-06-18 Thread Stephane Maarek
Hi Sönke

Very much needed feature and discussion. FYI the image links seem broken.

My 2 cents (if I understood correctly): you say "This process will be
implemented after Serializer and Interceptors are done with the message
right before it is added to the batch to be sent, in order to ensure that
existing serializers and interceptors keep working with encryption just
like without it."

I think encryption should happen AFTER a batch is created, right before it
is sent. The reason is that if we want to keep the benefit of compression,
encryption needs to happen after it, since ciphertext is effectively
incompressible (and I believe compression happens at the batch level).
So to me, for a producer: serializer / interceptors => batching =>
compression => encryption => send,
and the inverse for a consumer.

Regards
Stephane

On 19 June 2018 at 06:46, Sönke Liebau 
wrote:

> Hi everybody,
>
> I've created a draft version of KIP-317 which describes the addition
> of transparent data encryption functionality to Kafka.
>
> Please consider this as a basis for discussion - I am aware that this
> is not at a level of detail sufficient for implementation, but I
> wanted to get some feedback from the community on the general idea
> before spending more time on this.
>
> Link to the KIP is:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+transparent+data+encryption+functionality
>
> Best regards,
> Sönke
>


[DISCUSS] KIP-318: Make Kafka Connect Source idempotent

2018-06-19 Thread Stephane Maarek
KIP link:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-318%3A+Make+Kafka+Connect+Source+idempotent


By looking at the code, it seems Worker.java is where the magic happens,
but do we also need to make changes to KafkaBasedLog.java (~line 241)?

Love to hear your thoughts!


Re: [DISCUSS] KIP-318: Make Kafka Connect Source idempotent

2018-06-20 Thread Stephane Maarek
Hi Randall

Thanks for your feedback

1) Users can override: yes, they can, but they most surely won't know about
it. I didn't know about this improvement until I got on Twitter and
exchanged with Ismael. I would say most users don't even know Kafka Connect
is running with acks=all. My understanding of the philosophy behind Kafka
Connect is that users only worry about writing a connector and the
framework makes the whole ETL safe. In that regard, I think it's important
to increase the level of safety by preventing network duplicates (I don't
think anyone is in favour of duplicates) and at the same time to increase
performance by having more in-flight requests while keeping ordering
guarantees (I don't think anyone is against that either). So the behaviour
changes, but I don't see any drawbacks to it.
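
For reference, the opt-in that exists today is just producer overrides in the
worker config (these are real pass-through properties; the worker forwards
everything under the producer. prefix to its internal producer):

producer.enable.idempotence = true
producer.acks = all
producer.max.in.flight.requests.per.connection = 5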

1*) I'm very much allergic to introducing more configs, but if the
community desires, we can control this behaviour explicitly with a new
config and default it to false. That would give users an easier opt-in, and
eventually we'd flip the default to true.


2) Is there an easy way to clearly detect whether a broker is running a
specific version of the API? If so, I don't mind including an if statement
for a conditional worker configuration, and that would solve backwards
compatibility.

Thanks
Stephane




On Wed., 20 Jun. 2018, 10:54 pm Randall Hauch,  wrote:

> Thanks for starting this conversation, Stephane. I have a few questions.
>
> The worker already accepts nearly all producer properties already, and all
> `producer.*` properties override any hard-coded properties defined in
> `Worker.java`. So isn't it currently possible for a user to define these
> properties in their worker configuration if they want?
>
> Second, wouldn't this change the default behavior for existing worker
> configurations that have not overridden these properties? IOW, we would
> need to address the migration path to ensure backward compatibility.
>
> Third, the KIP mentions but does not really address the problem of running
> workers against pre-1.0 Kafka clusters. That definitely is something that
> happens frequently, so what is the planned approach for addressing this
> compatibility concern?
>
> All of these factors are likely why this has not yet been addressed to
> date: it's already possible for users to enable this feature, but doing it
> by default has compatibility concerns.
>
> Thoughts?
>
> Best regards,
>
> Randall
>
>
> On Wed, Jun 20, 2018 at 1:17 AM, Stephane Maarek <
> steph...@simplemachines.com.au> wrote:
>
> > KIP link:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-318%3A+Make+Kafka+Connect+Source+idempotent
> >
> >
> > By looking at the code, it seems Worker.java is where the magic happens,
> > but do we need to also operate changes to KafkaBasedLog.java (~line 241)
> ?
> >
> > Love to hear your thoughts!
> >
>


Re: KIP-327: Add describe all topics API to AdminClient

2018-07-14 Thread Stephane Maarek
Why not paginate? Then one could retrieve as many topics as desired.
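
Purely as a sketch of what pagination could look like on the AdminClient (ListTopicsOptions, ListTopicsResult and listTopics() are real AdminClient types; the pageSize/pageToken methods below are invented for illustration and do not exist today):

// hypothetical pagination options -- pageSize()/pageToken() are invented
ListTopicsOptions options = new ListTopicsOptions()
    .pageSize(1000)            // invented
    .pageToken(previousToken); // invented
ListTopicsResult page = adminClient.listTopics(options); // real call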

On Sat., 14 Jul. 2018, 4:15 pm Colin McCabe,  wrote:

> Good point.  We should probably have a maximum number of results like
> 1000 or something.  That can go in the request RPC as well...
> Cheers,
> Colin
>
> On Fri, Jul 13, 2018, at 18:15, Ted Yu wrote:
> > bq. describe topics by a regular expression on the server side
> >
> > Should caution be taken if the regex doesn't filter ("*") ?
> >
> > Cheers
> >
> > On Fri, Jul 13, 2018 at 6:02 PM Colin McCabe wrote:
> >
> > > As Jason wrote, this won't scale as the number of partitions increases.
> > > We already have users who have tens of thousands of topics, or more. If
> > > you multiply that by 100x over the next few years, you end up with this
> > > API returning full information about millions of topics, which clearly
> > > doesn't work.
> > >
> > > We discussed this a lot in the original KIP-117 DISCUSS thread which
> > > added the Java AdminClient. ListTopics and DescribeTopics were
> > > deliberately kept separate because we understood that eventually a
> > > single RPC would not be able to return information about all the topics
> > > in the cluster. So I have to vote -1 for this proposal as it stands.
> > >
> > > I do agree that adding a way to describe topics by a regular expression
> > > on the server side would be very useful. This would also fix a major
> > > scalability problem we have now, which is that when subscribing via a
> > > regular expression, clients need to fetch the full list of all topics
> > > in the cluster and filter locally.
> > >
> > > I think a regular expression library like re2 would be ideal for this
> > > purpose. re2 is standardized and language-agnostic (it's not tied only
> > > to Java). In contrast, Java regular expressions change with different
> > > releases of the JDK (there were some changes in Java 8, for example).
> > > Also, re2 regular expressions are linear time, never exponential time.
> > > See https://github.com/google/re2j
> > >
> > > regards,
> > > Colin
> > >
> > >
> > > On Fri, Jul 13, 2018, at 05:00, Andras Beni wrote:
> > > > The KIP looks good to me.
> > > > However, if there is willingness in the community to work on metadata
> > > > request with patterns, the feature proposed here and filtering by '*'
> > > > or '.*' would be redundant.
> > > >
> > > > Andras
> > > >
> > > >
> > > >
> > > > On Fri, Jul 13, 2018 at 12:38 AM Jason Gustafson wrote:
> > > >
> > > > > Hey Manikumar,
> > > > >
> > > > > As Kafka begins to scale to larger and larger numbers of
> > > > > topics/partitions, I'm a little concerned about the scalability of
> > > > > APIs such as this. The API looks benign, but imagine you have a few
> > > > > million partitions. We already expose similar APIs in the producer
> > > > > and consumer, so probably not much additional harm to expose it in
> > > > > the AdminClient, but it would be nice to put a little thought into
> > > > > some longer term options. We should be giving users an efficient
> > > > > way to select a smaller set of the topics they are interested in.
> > > > > We have always discussed adding some filtering support to the
> > > > > Metadata API. Perhaps now is a good time to reconsider this? We now
> > > > > have a convention for wildcard ACLs, so perhaps we can do something
> > > > > similar. Full regex support might be ideal given the consumer's
> > > > > subscription API, but that is more challenging. What do you think?
> > > > >
> > > > > Thanks,
> > > > > Jason
> > > > >
> > > > > On Thu, Jul 12, 2018 at 2:35 PM, Harsha wrote:
> > > > >
> > > > > > Very useful. LGTM.
> > > > > >
> > > > > > Thanks,
> > > > > > Harsha
> > > > > >
> > > > > > On Thu, Jul 12, 2018, at 9:56 AM, Manikumar wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I have created a KIP to add describe all topics API to
> > > > > > > AdminClient.
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-327%3A+Add+describe+all+topics+API+to+AdminClient
> > > > > > >
> > > > > > > Please take a look.
> > > > > > >
> > > > > > > Thanks,
> > > > > >
> > > > >
> > >
>
>


Re: [ANNOUNCE] Apache Kafka 2.0.0 Released

2018-07-30 Thread Stephane Maarek
nt to move secrets out of
> connector
> > > > configurations and integrate with any external key management system.
> > > > The placeholders in connector configurations are only resolved before
> > > > sending the configuration to the connector, ensuring that secrets are
> > > > stored
> > > > and managed securely in your preferred key management system and
> > > > not exposed over the REST APIs or in log files.
> > > >
> > > > ** We have added a thin Scala wrapper API for our Kafka Streams DSL,
> > > > which provides better type inference and better type safety during
> > > compile
> > > > time. Scala users can have less boilerplate in their code, notably
> > > > regarding
> > > > Serdes with new implicit Serdes.
> > > >
> > > > ** Message headers are now supported in the Kafka Streams Processor
> > API,
> > > > allowing users to add and manipulate headers read from the source
> > topics
> > > > and propagate them to the sink topics.
> > > >
> > > > ** Windowed aggregations performance in Kafka Streams has been
> largely
> > > > improved (sometimes by an order of magnitude) thanks to the new
> > > > single-key-fetch API.
> > > >
> > > > ** We have further improved unit testibility of Kafka Streams with
> the
> > > > kafka-streams-testutil artifact.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > All of the changes in this release can be found in the release notes:
> > > >
> > > > https://www.apache.org/dist/kafka/2.0.0/RELEASE_NOTES.html
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > You can download the source and binary release (Scala 2.11 and Scala
> > > 2.12)
> > > > from:
> > > >
> > > > https://kafka.apache.org/downloads#2.0.0
> > > > <https://kafka.apache.org/downloads#2.0.0>
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> ---
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Apache Kafka is a distributed streaming platform with four core APIs:
> > > >
> > > >
> > > >
> > > > ** The Producer API allows an application to publish a stream records
> > to
> > > >
> > > > one or more Kafka topics.
> > > >
> > > >
> > > >
> > > > ** The Consumer API allows an application to subscribe to one or more
> > > >
> > > > topics and process the stream of records produced to them.
> > > >
> > > >
> > > >
> > > > ** The Streams API allows an application to act as a stream
> processor,
> > > >
> > > > consuming an input stream from one or more topics and producing an
> > > >
> > > > output stream to one or more output topics, effectively transforming
> > the
> > > >
> > > > input streams to output streams.
> > > >
> > > >
> > > >
> > > > ** The Connector API allows building and running reusable producers
> or
> > > >
> > > > consumers that connect Kafka topics to existing applications or data
> > > >
> > > > systems. For example, a connector to a relational database might
> > > >
> > > > capture every change to a table.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > With these APIs, Kafka can be used for two broad classes of
> > application:
> > > >
> > > >
> > > >
> > > > ** Building real-time streaming data pipelines that reliably get data
> > > >
> > > > between systems or applications.
> > > >
> > > >
> > > >
> > > > ** Building real-time streaming applications that transform or react
> > > >
> > > > to the streams of data.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Apache Kafka is in use at large and small companies worldwide,
> > including
> > > >
> > > > Capital One, Goldman Sa

Re: [DISCUSS] KIP-253: Support in-order message delivery with partition expansion

2018-02-27 Thread Stephane Maarek
Sounds awesome!
Are you planning to have auto-scaling of partitions in a follow-up KIP?
That would be the holy grail

On 28 Feb. 2018 5:13 pm, "Dong Lin"  wrote:

> Hey Jan,
>
> I am not sure if it is acceptable for producer to be stopped for a while,
> particularly for online application which requires low latency. I am also
> not sure how consumers can switch to a new topic. Does user application
> needs to explicitly specify a different topic for producer/consumer to
> subscribe to? It will be helpful for discussion if you can provide more
> detail on the interface change for this solution.
>
> Thanks,
> Dong
>
> On Mon, Feb 26, 2018 at 12:48 AM, Jan Filipiak 
> wrote:
>
> > Hi,
> >
> > just want to throw my thought in. In general the functionality is very
> > useful; we should, though, not try to nail the architecture down too hard
> > while implementing.
> >
> > The manual steps would be to
> >
> > create a new topic
> > then mirror-make from the old topic to the new topic
> > wait for mirror making to catch up.
> > then put the consumers onto the new topic
> > (having mirrormaker spit out a mapping from old offsets to new offsets:
> > if topic is increased by factor X there is gonna be a clean
> > mapping from 1 offset in the old topic to X offsets in the new topic,
> > if there is no factor then there is no chance to generate a
> > mapping that can be reasonably used for continuing)
> > make consumers stop at appropriate points and continue consumption
> > with offsets from the mapping.
> > have the producers stop for a minimal time.
> > wait for mirrormaker to finish
> > let producer produce with the new metadata.
> >
> >
> > Instead of implementing the approach suggested in the KIP, which will
> > leave log-compacted topics completely crumbled and unusable, I would much
> > rather try to build infrastructure to support the above-mentioned
> > operations more smoothly.
> > Especially having producers stop and use another topic is difficult, and
> > it would be nice if one could trigger "invalid metadata" exceptions for
> > them, and if one could give topics aliases so that their produce requests
> > against the old topic will arrive in the new topic.
> >
> > The downsides are obvious I guess (having the same data twice for the
> > transition period, but Kafka tends to scale well with data size), so it's
> > a nicer fit into the architecture.
> >
> > I further want to argue that the functionality targeted by the KIP can
> > be implemented completely in "userland" with a custom partitioner that
> > handles the transition as needed. I would appreciate it if someone could
> > point out what a custom partitioner couldn't handle in this case.
> >
> > With the above approach, shrinking a topic becomes the same steps,
> > without losing keys in the discontinued partitions.
> >
> > Would love to hear what everyone thinks.
> >
> > Best Jan
> >
> >
> >
> > On 11.02.2018 00:35, Dong Lin wrote:
> >
> >> Hi all,
> >>
> >> I have created KIP-253: Support in-order message delivery with partition
> >> expansion. See
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-253%3A+Support+in-order+message+delivery+with+partition+expansion
> >> .
> >>
> >> This KIP provides a way to allow messages of the same key from the same
> >> producer to be consumed in the same order they are produced even if we
> >> expand partition of the topic.
> >>
> >> Thanks,
> >> Dong
> >>
> >>
> >
>


Re: [VOTE] KIP-171 - Extend Consumer Group Reset Offset for Stream Application

2018-02-27 Thread Stephane Maarek
Bootstrap servers vs broker list is the biggest hurdle when teaching
beginners. A standardized set of parameters would help immensely.
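
The inconsistency in question, using two real commands shipped in the same
Kafka distribution:

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test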

On 27 Feb. 2018 9:44 am, "Matthias J. Sax"  wrote:

I agree on consistency, too.

However, I am not sure if we should introduce an explicit --execute
option. Anybody familiar with Linux tools will expect a command to
execute by default.

Thus, I would suggest to remove --execute for all tools that use this
option atm.

Btw: there is a related Jira:
https://issues.apache.org/jira/browse/KAFKA-1299

Furthermore, this also affect arguments like

--bootstrap-servers
vs
--broker-list

and maybe others.

IMHO, all tools should use the same names. Thus, it's a larger change...
But totally worth doing it.


-Matthias

On 2/26/18 10:09 AM, Guozhang Wang wrote:
> Hi Jorge,
>
> I agree on being consistent across our tools.
>
> Besides the kafka-consumer-groups and kafka-streams-application-reset, a
> couple of other classes to consider adding the --execute options for the
> next major release:
>
> 1. kafka-preferred-replica-elections
> 2. kafka-reassign-partitions
> 3. kafka-delete-records
> 4. kafka-topics
> 5. kafka-acls
> 6. kafka-configs
> 7. kafka-delegation-tokens
>
>
> Guozhang
>
> On Mon, Feb 26, 2018 at 3:03 AM, Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
>> Hi all,
>>
>> Thanks for the feedback.
>>
>> I have updated the "Compatibility, Deprecation, and Migration Plan"
section
>> to document this to support the rollback. I probably should have handled
>> this change, as small as it looks, as a new KIP to avoid this issue.
>>
>> I like Colin's idea about asking for confirmation, although I'm not sure if
>> another tool already has this behavior, which could create more confusion
>> (e.g. why does this command ask for confirmation when others don't). Maybe
>> we will require a broader look at the CLI tools to agree on this?
>>
>> Jorge.
>>
>> On Thu., 22 Feb. 2018 at 21:09, Guozhang Wang wrote:
>>
>>> Yup, agreed.
>>>
>>> On Thu, Feb 22, 2018 at 11:46 AM, Ismael Juma  wrote:
>>>
 Hi Guozhang,

 To clarify my comment: any change with a backwards compatibility impact
 should be mentioned in the "Compatibility, Deprecation, and Migration
>>> Plan"
 section (in addition to the deprecation period and only happening in a
 major release as you said).

 Ismael

 On Thu, Feb 22, 2018 at 11:10 AM, Guozhang Wang 
 wrote:

> Just to clarify, the KIP itself has mentioned the change, so the PR
> was not unintentional:
>
> "
>
> 3. Keep execution parameters uniform between both tools: It will
> execute by default, and have a `dry-run` parameter to just show the
> results. This will involve changing the current `ConsumerGroupCommand`
> execution options.
>
> "
>
> We agreed that the proposed change is better than the current status,
> since many people not using "--execute" on the consumer reset tool were
> actually surprised that nothing got executed. What we are concerned
> about, in hindsight, is that instead of making such a change in a minor
> release like 1.1, we should consider only doing it in the next major
> release, as it breaks compatibility. In the past, when we were going to
> remove / replace a certain option, we would first add a
> going-to-be-deprecated warning in the previous releases until it was
> finally removed. So Jason's suggestion is to do the same: we are not
> reverting this change forever, but trying to delay it until after 1.1.
>
>
> Guozhang
>
>
> On Thu, Feb 22, 2018 at 10:56 AM, Colin McCabe 
 wrote:
>
>> Perhaps, if the user doesn't pass the --execute flag, the tool should
>> print a prompt like "would you like to perform this reset?" and wait
>> for a Y / N (or yes or no) input from the command-line.  Then, if the
>> --execute flag is passed, we skip this.  That seems 99% compatible, and
>> also accomplishes the goal of making the tool less confusing.
>>
>> best,
>> Colin
>>
>>
>> On Thu, Feb 22, 2018, at 10:23, Ismael Juma wrote:
>>> Yes, let's revert the incompatible changes. There was no mention
>> of
>>> compatibility impact on the KIP and we should ensure that is the
>>> case
> for
>>> 1.1.0.
>>>
>>> Ismael
>>>
>>> On Thu, Feb 22, 2018 at 9:55 AM, Jason Gustafson <
>>> ja...@confluent.io
>
>> wrote:
>>>
 I know it's been a while since this vote passed, but I think we need to
 reconsider the incompatible changes to the consumer reset tool.
 Specifically, we have removed the --execute option without deprecating
 it first, and we have changed the default behavior to execute rather
 than do a dry run. The latter in particular
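
A minimal sketch of the confirmation prompt Colin suggests above
(illustrative only, not the actual ConsumerGroupCommand code; class and
message names are made up):

import java.util.Arrays;
import java.util.Scanner;

public class ResetOffsetsPrompt {
    public static void main(String[] args) {
        boolean execute = Arrays.asList(args).contains("--execute");
        if (!execute) {
            // interactive Y / N confirmation when --execute is absent
            System.out.print("Would you like to perform this reset? (y/n): ");
            String answer = new Scanner(System.in).nextLine().trim().toLowerCase();
            if (!answer.equals("y") && !answer.equals("yes")) {
                System.out.println("Aborted; no offsets were changed.");
                return;
            }
        }
        System.out.println("Resetting offsets...");  // actual reset would happen here
    }
}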

Re: [DISCUSS] KIP-263: Allow broker to skip sanity check of inactive segments on broker startup

2018-02-27 Thread Stephane Maarek
This is great and definitely needed. I'm not exactly sure what goes into
the process of checking log files at startup, but is there something like
a signature check of files (especially closed, immutable ones) that could
be saved on disk and checked against at startup? Wouldn't that help speed
up boot time for all segments?
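
A minimal sketch of the signature-check idea, assuming a checksum is
persisted next to each closed segment (the .crc sidecar file is entirely
hypothetical; this is not how the broker works today):

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

public class SegmentChecksum {

    // stream through the file once; the CheckedInputStream updates the CRC
    static long crcOf(Path file) throws IOException {
        CRC32 crc = new CRC32();
        try (InputStream in = new CheckedInputStream(Files.newInputStream(file), crc)) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // nothing else to do; reading drives the checksum
            }
        }
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        Path segment = Paths.get("00000000000000000000.log");
        Path sidecar = Paths.get("00000000000000000000.crc");  // hypothetical
        long expected = Long.parseLong(new String(Files.readAllBytes(sidecar)).trim());
        System.out.println(crcOf(segment) == expected
                ? "checksum matches: the sanity check could be skipped"
                : "mismatch: run the full sanity check");
    }
}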

On 26 Feb. 2018 5:28 pm, "Dong Lin"  wrote:

> Hi all,
>
> I have created KIP-263: Allow broker to skip sanity check of inactive
> segments on broker startup. See
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 263%3A+Allow+broker+to+skip+sanity+check+of+inactive+
> segments+on+broker+startup
> .
>
> This KIP provides a way to significantly reduce time to rolling bounce a
> Kafka cluster.
>
> Comments are welcome!
>
> Thanks,
> Dong
>


Kafka version and JDK version

2018-03-20 Thread Stephane Maarek
Hi

If I remember correctly, Kafka 2.0 is targeted for this summer, as it'll
drop support for Java 7, and dropping a Java version is supposed to imply
a major version bump in Kafka.

Now that Java has a very quick release cycle for the JDK (version 10
today), my question is: how fast will Kafka versioning go?

My point of view is that we shouldn't increment the Kafka version as fast
as Java, but that's currently the way it seems it'll go.

My perspective, as someone who teaches Kafka, is that students expect
major version bumps to have major effects on how they program. But it's a
tough sell to explain that Kafka 2.0 has nothing major in the functioning
or programming style except the underlying Java version.

I just want to hear thoughts and opinions and start a discussion.

Thanks !
Stéphane


Re: Kafka version and JDK version

2018-03-20 Thread Stephane Maarek
Sounds good ! Thanks for the detailed explanation :)

On Wed., 21 Mar. 2018, 11:40 am Ismael Juma,  wrote:

> Hi Stephane,
>
> I don't see why we would increment Kafka versions as quick as Java
> versions. The way I think it should work is that we support LTS versions
> for a long time and only support the most recent non LTS version. The
> latter is to ensure that we catch any issues with newer Java releases
> quickly, but people are encouraged to use Java LTS versions in production.
> Given that, I don't think major version bumps in Kafka will happen often.
> The bump to 2.0 also gives us an opportunity to drop the old Scala clients.
> This will be a huge win in tech debt reduction so the project will be able
> to move faster after that.
>
> Ismael
>
> On Tue, 20 Mar 2018, 20:24 Stephane Maarek, <
> steph...@simplemachines.com.au>
> wrote:
>
> > Hi
> >
> > If I remember correctly, Kafka 2.0 is targeted for this summer, as
> > it'll drop support for Java 7, and dropping a Java version is supposed
> > to imply a major version bump in Kafka.
> >
> > Now that Java has a very quick release cycle for the JDK (version 10
> > today), my question is: how fast will Kafka versioning go?
> >
> > My point of view is that we shouldn't increment the Kafka version as
> > fast as Java, but that's currently the way it seems it'll go.
> >
> > My perspective, as someone who teaches Kafka, is that students expect
> > major version bumps to have major effects on how they program. But
> > it's a tough sell to explain that Kafka 2.0 has nothing major in the
> > functioning or programming style except the underlying Java version.
> >
> > I just want to hear thoughts and opinions and start a discussion.
> >
> > Thanks !
> > Stéphane
> >
>


Re: [DISCUSS] KIP-277 - Fine Grained ACL for CreateTopics API

2018-03-29 Thread Stephane Maarek
Not against, but this needs to support regex, to support Kafka Streams
applications that create many topics with complex names.

On Thu., 29 Mar. 2018, 7:21 pm Edoardo Comar,  wrote:

> Hi all,
>
> We have submitted KIP-277 to give users permission to manage the lifecycle
> of a defined set of topics;
> the current ACL checks are for permission to create *any* topic and on
> delete for permission against the *named* topics.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-277+-+Fine+Grained+ACL+for+CreateTopics+API
>
> Feedback and suggestions are welcome, thanks.
>
> Edo & Mickael
> --
>
> Edoardo Comar
>
> IBM Message Hub
>
> IBM UK Ltd, Hursley Park, SO21 2JN
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>


Dead link in Kafka Streams documentation

2018-04-06 Thread Stephane Maarek
https://kafka.apache.org/documentation/streams/developer-guide/

"Testing a Streams application" points to
https://kafka.apache.org/documentation/streams/developer-guide/testing.html

Which is a dead link.

The working link is
https://kafka.apache.org/11/documentation/streams/developer-guide/testing.html
But I believe the working link should change to be consistent with the rest
of the doc

Unsure where to make a PR for the docs, but if you can do a quick fix
it'll be appreciated!

Thanks :)
Stephane


Re: Dead link in Kafka Streams documentation

2018-04-07 Thread Stephane Maarek
Thanks Guozhang, everything works as expected now!

Kind regards,
Stephane

[image: Simple Machines]

Stephane Maarek | Developer

+61 416 575 980
steph...@simplemachines.com.au
simplemachines.com.au
Level 2, 145 William Street, Sydney NSW 2010

On 8 April 2018 at 06:38, Guozhang Wang  wrote:

> I have pushed a hotfix PR for fixing the default version links in streams
> dev-guide.
>
> https://github.com/apache/kafka-site/commit/94f59b43d013dca2832fecfa1efe91
> 9f3b66ca90
>
>
> Guozhang
>
> On Sat, Apr 7, 2018 at 5:55 PM, Guozhang Wang  wrote:
>
> > Thanks for reporting the issue Stephane, I will fix it.
> >
> >
> > Guozhang
> >
> > On Fri, Apr 6, 2018 at 10:15 PM, Stephane Maarek <
> > steph...@simplemachines.com.au> wrote:
> >
> >> https://kafka.apache.org/documentation/streams/developer-guide/
> >>
> >> "Testing a Streams application" points to
> >> https://kafka.apache.org/documentation/streams/developer-
> >> guide/testing.html
> >>
> >> Which is a dead link.
> >>
> >> The working link is
> >> https://kafka.apache.org/11/documentation/streams/developer-
> >> guide/testing.html
> >> But I believe the working link should change to be consistent with the
> >> rest
> >> of the doc
> >>
> >> Unsure where to make a PR for the docs, but if you can do a quick fix
> >> it'll be appreciated!
> >>
> >> Thanks :)
> >> Stephane
> >>
> >
> >
> >
> > --
> > -- Guozhang
> >
>
>
>
> --
> -- Guozhang
>


Re: [VOTE] Kafka 2.0.0 in June 2018

2018-04-19 Thread Stephane Maarek
+1 (non binding). Thanks Ismael!

On Thu., 19 Apr. 2018, 2:47 pm Gwen Shapira,  wrote:

> +1 (binding)
>
> On Wed, Apr 18, 2018 at 11:35 AM, Ismael Juma  wrote:
>
> > Hi all,
> >
> > I started a discussion last year about bumping the version of the June
> 2018
> > release to 2.0.0[1]. To reiterate the reasons in the original post:
> >
> > 1. Adopt KIP-118 (Drop Support for Java 7), which requires a major
> version
> > bump due to semantic versioning.
> >
> > 2. Take the chance to remove deprecated code that was deprecated prior to
> > 1.0.0, but not removed in 1.0.0 (e.g. old Scala clients) so that we can
> > move faster.
> >
> > One concern that was raised is that we still do not have a rolling
> upgrade
> > path for the old ZK-based consumers. Since the Scala clients haven't been
> > updated in a long time (they don't support security or the latest message
> > format), users who need them can continue to use 1.1.0 with no loss of
> > functionality.
> >
> > Since it's already mid-April and people seemed receptive during the
> > discussion last year, I'm going straight to a vote, but we can discuss
> more
> > if needed (of course).
> >
> > Ismael
> >
> > [1]
> > https://lists.apache.org/thread.html/dd9d3e31d7e9590c1f727ef5560c93
> > 3281bad0de3134469b7b3c4257@%3Cdev.kafka.apache.org%3E
> >
>
>
>
> --
> *Gwen Shapira*
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter  | blog
> 
>


Re: [VOTE] KIP-277 - Fine Grained ACL for CreateTopics API

2018-04-25 Thread Stephane Maarek
+1

On Thu., 26 Apr. 2018, 4:19 am Ted Yu,  wrote:

> +1
>
> On Wed, Apr 25, 2018 at 3:24 PM, Guozhang Wang  wrote:
>
> > Thanks Edo, +1 from me.
> >
> >
> > Guozhang
> >
> > On Wed, Apr 25, 2018 at 2:45 AM, Edoardo Comar 
> wrote:
> >
> > > Hi,
> > >
> > > The discuss thread on KIP-277 (
> > > https://www.mail-archive.com/dev@kafka.apache.org/msg86540.html )
> > > seems to have been fruitful and concerns have been addressed, please
> > allow
> > > me start a vote on it:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 277+-+Fine+Grained+ACL+for+CreateTopics+API
> > >
> > > I will update the small PR to the latest KIP semantics if the vote
> passes
> > > (as I hope :-).
> > >
> > > cheers
> > > Edo
> > > --
> > >
> > > Edoardo Comar
> > >
> > > IBM Message Hub
> > >
> > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > Unless stated otherwise above:
> > > IBM United Kingdom Limited - Registered in England and Wales with
> number
> > > 741598.
> > > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
> > 3AU
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>


Re: [VOTE] KIP-235 Add DNS alias support for secured connection

2018-04-26 Thread Stephane Maarek
+1 as a user
BUT

I am no security expert. I have experienced that issue while setting up a
cluster, and while I would have liked a feature like that (I opened a JIRA
at the time), I always guessed that the reason was some security
protection.

Now from a setup point of view this helps a ton, but I really want to make
sure this doesn't introduce any security risk by relaxing a constraint.

Is there a security assessment possible by someone accredited?

Sorry for raising these questions; I just want to make sure this is
addressed.
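
For reference, the reverse lookup the KIP relies on can be sketched with
plain JDK calls (the alias below is made up); the canonical host name is
what a SASL/Kerberos principal needs to match:

import java.net.InetAddress;
import java.net.UnknownHostException;

public class BootstrapAliasResolution {
    public static void main(String[] args) throws UnknownHostException {
        String alias = "kafka.mycompany.example";  // DNS alias for the cluster
        for (InetAddress addr : InetAddress.getAllByName(alias)) {
            // getCanonicalHostName() performs the reverse DNS lookup
            System.out.println(addr.getHostAddress() + " -> "
                    + addr.getCanonicalHostName());
        }
    }
}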

On Thu., 26 Apr. 2018, 5:32 pm Gwen Shapira,  wrote:

> +1 (binding)
>
> This KIP is quite vital to running secured clusters in cloud/container
> environment. Would love to see more support from the community to this (or
> feedback...)
>
> Gwen
>
> On Mon, Apr 16, 2018 at 4:52 PM, Skrzypek, Jonathan <
> jonathan.skrzy...@gs.com> wrote:
>
> > Hi,
> >
> > Could anyone take a look ?
> > Does the proposal sound reasonable ?
> >
> > Jonathan Skrzypek
> >
> >
> > From: Skrzypek, Jonathan [Tech]
> > Sent: 23 March 2018 19:05
> > To: dev@kafka.apache.org
> > Subject: [VOTE] KIP-235 Add DNS alias support for secured connection
> >
> > Hi,
> >
> > I would like to start a vote for KIP-235
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 235%3A+Add+DNS+alias+support+for+secured+connection
> >
> > This is a proposition to add an option for reverse dns lookup of
> > bootstrap.servers hosts, allowing the use of dns aliases on clusters
> using
> > SASL authentication.
> >
> >
> >
> >
>
>
> --
> *Gwen Shapira*
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter  | blog
> 
>


Re: [DISCUSS] KIP-290: Support for wildcard suffixed ACLs

2018-05-01 Thread Stephane Maarek
Hi, thanks for this badly needed feature.

1) Why introduce two new APIs in the authorizer instead of replacing the
implementation of the simple ACL authorizer by adding the wildcard
capability?

2) Is there an impact on performance, as we're now evaluating more rules?
A while back I evaluated the concept of a cached ACL result, to swallow
the cost of computing an authorisation decision once and then do
in-memory lookups. Cf. https://issues.apache.org/jira/browse/KAFKA-5261

3) Is there any need to also extend this to consumer group resources?

4) The CreateTopics KIP has recently moved permissions out of Cluster into
Topic. Will wildcards be supported for this action too?

Thanks a lot for this!
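
For readers skimming the thread, the matching semantics discussed below
(a DENY prefix overriding a longer, more specific ALLOW prefix) can be
sketched like this; the types and names are made up, this is not the
proposed Authorizer API:

import java.util.Arrays;
import java.util.List;

public class PrefixAclDemo {
    enum Type { ALLOW, DENY }

    static class Acl {
        final Type type;
        final String prefix;
        Acl(Type type, String prefix) { this.type = type; this.prefix = prefix; }
    }

    // a matching DENY always wins, regardless of prefix length
    static boolean authorize(String topic, List<Acl> acls) {
        boolean allowed = false;
        for (Acl acl : acls) {
            if (topic.startsWith(acl.prefix)) {
                if (acl.type == Type.DENY) {
                    return false;
                }
                allowed = true;
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        List<Acl> acls = Arrays.asList(new Acl(Type.DENY, "some."),
                                       new Acl(Type.ALLOW, "some.prefix."));
        System.out.println(authorize("some.prefix.topic", acls));  // false
    }
}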

On Wed., 2 May 2018, 1:37 am Ted Yu,  wrote:

> w.r.t. naming, we can keep wildcard and drop 'prefixed' (or 'suffixed')
> since the use of regex would always start with non-wildcard portion.
>
> Cheers
>
> On Tue, May 1, 2018 at 12:13 PM, Andy Coates  wrote:
>
> > Hi Piyush,
> >
> > Can you also document in the Compatibility section what would happen
> should
> > the cluster be upgraded, wildcard-suffixed ACLs are added, and then the
> > cluster is rolled back to the previous version.  On downgrade the partial
> > wildcard ACLs will be treated as literals and hence never match anything.
> > This is fine for ALLOW ACLs, but some might consider this a security hole
> > if a DENY ACL is ignored, so this needs to be documented, both in the KIP
> > and the final docs.
> >
> > For some reason I find the term 'prefixed wildcard ACLs' easier to grok
> > than 'wildcard suffixed ACLs'. Probably because in the former the
> > 'wildcard' term comes after the positional adjective, which matches the
> > position of the wildcard char in the resource name, i.e. "some*".  It's
> > most likely a personal thing, but I thought I'd mention it, as naming is
> > important when it comes to making this intuitive for users.
> >
> > On 1 May 2018 at 19:57, Andy Coates  wrote:
> >
> > > Hi Piyush,
> > >
> > > Thanks for raising this KIP - it's very much appreciated.
> > >
> > > I've not had chance to digest it yet, but...
> > >
> > > 1. you might want to add details of how the internals of the
> > > `getMatchingAcls` is implemented. We'd want to make sure the complexity
> > of
> > > the operation isn't adversely affected.
> > > 2. You might want to be more explicit that the length of a prefix does
> > not
> > > play a part in the `authorize` call, e.g. given ACLs {DENY, some.*},
> > {ALLOW,
> > > some.prefix.*}, the longer, i.e. more specific, allow ACL does _not_
> > > override the more general deny ACL.
> > >
> > > Thanks,
> > >
> > > Andy
> > >
> > > On 1 May 2018 at 16:59, Ron Dagostino  wrote:
> > >
> > >> Hi Piyush.  I appreciated your talk at Kafka Summit and appreciate the
> > KIP
> > >> -- thanks.
> > >>
> > >> Could you explain these mismatching references?  Near the top of the
> KIP
> > >> you refer to these proposed new method signatures:
> > >>
> > >> def getMatchingAcls(resource: Resource): Set[Acl]
> > >> def getMatchingAcls(principal: KafkaPrincipal): Map[Resource,
> Set[Acl]]
> > >>
> > >> But near the bottom of the KIP you refer to different method
> > >> signatures that don't seem to match the above ones:
> > >>
> > >> getMatchingAcls(topicRegex)
> > >> getMatchingAcls(principalRegex)
> > >>
> > >> Ron
> > >>
> > >>
> > >> On Tue, May 1, 2018 at 11:48 AM, Ted Yu  wrote:
> > >>
> > >> > The KIP was well written. Minor comment on formatting:
> > >> >
> > >> > https://github.com/apache/kafka/blob/trunk/core/src/
> > >> > main/scala/kafka/admin/AclCommand.scala
> > >> > to
> > >> >
> > >> > Leave space between the URL and 'to'
> > >> >
> > >> > Can you describe changes for the AdminClient ?
> > >> >
> > >> > Thanks
> > >> >
> > >> > On Tue, May 1, 2018 at 8:12 AM, Piyush Vijay <
> piyushvij...@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > Hi all,
> > >> > >
> > >> > > I just opened a KIP to add support for wildcard suffixed ACLs.
> This
> > is
> > >> > one
> > >> > > of the feature I talked about in my Kafka summit talk and we
> > promised
> > >> to
> > >> > > upstream it :)
> > >> > >
> > >> > > The details are here -
> > >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >> > > 290%3A+Support+for+wildcard+suffixed+ACLs
> > >> > >
> > >> > > There is an open question about the way to add the support in the
> > >> > > AdminClient, which I can discuss here in more detail once everyone
> > has
> > >> > > taken a first look at the KIP.
> > >> > >
> > >> > > Looking forward to discuss the change.
> > >> > >
> > >> > > Best,
> > >> > > Piyush Vijay
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>


Re: [EXTERNAL] [VOTE] KIP-310: Add a Kafka Source Connector to Kafka Connect

2018-08-10 Thread Stephane Maarek
Hi Rhys,

Overall I'm +1 (non binding), but you're going to need 3 binding votes for
this KIP to pass.
I don't feel there has been enough discussion on this from the community.
Can we get some input from other people?

Thanks for starting the vote nonetheless :)
Stephane

On 8 August 2018 at 20:28, McCaig, Rhys  wrote:

> Hi
>
> Could we get a couple of votes on this KIP - voting closes in 24 hours.
>
> Thanks,
>
> Rhys
>
> > On Aug 6, 2018, at 11:51 AM, McCaig, Rhys 
> wrote:
> >
> > Hi All,
> >
> > I’m starting a vote on KIP-310: Add a Kafka Source Connector to Kafka
> Connect
> >
> > KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-310%3A+Add+a+Kafka+Source+Connector+to+Kafka+Connect
> > Discussion Thread: http://mail-archives.apache.org/mod_mbox/kafka-dev/201808.mbox/%3c17E8D696-E51C-4BEB-bd70-9324d4b53...@comcast.com%3e
> >
> > Cheers,
> > Rhys
>
>


Re: [EXTERNAL] [DISCUSS] KIP-158: Kafka Connect should allow source connectors to set topic-specific settings for new topics

2018-08-27 Thread Stephane Maarek
@Randall it's more of a "safety net": making sure an application either
runs against a topic that's properly configured, or crashes. Not
absolutely needed.

On Mon., 27 Aug. 2018, 6:04 pm McCaig, Rhys, 
wrote:

> Randall,
>
> This KIP looks great to me. As for _updating_ topic configs - It’s a nice
> to have but certainly something that I could live without in order to get
> this KIP implemented. (Its not something I would use in my current setup
> but I can see some cases where it could be part of the workflow for
> mirrored topics).
> If it were to be included, I’d be happier to see it hidden behind a config
> flag - (if topic already exists, can be an option to WARN/FAIL or change
> the topic, where the default would be warn?)
>
> Cheers,
> Rhys
>
> > On Aug 21, 2018, at 10:58 PM, Randall Hauch  wrote:
> >
> > Okay, after much delay let's try this again for AK 2.1. Has anyone found
> > any concerns? Stephane suggested that we allow updating topic
> > configurations (everything but partition count). I'm unconvinced that
> it's
> > worth the additional complexity in the implementation and the
> documentation
> > to explain the behavior. Changing several of the topic-specific
> > configurations have significant impact on broker behavior /
> functionality,
> > so IMO we need to proceed more cautiously.
> >
> > Stephane, do you have a particular use case in mind for updating topic
> > configurations on an existing topic?
> >
> > Randall
> >
> >
> > On Fri, Jan 26, 2018 at 4:20 PM Randall Hauch  wrote:
> >
> >> The KIP deadline for 1.1 has already passed, but I'd like to restart
> this
> >> discussion so that we make the next release. I've not yet addressed the
> >> previous comment about *existing* topics, but I'll try to do that over
> the
> >> next few weeks. Any other comments/suggestions/questions?
> >>
> >> Best regards,
> >>
> >> Randall
> >>
> >> On Thu, Oct 5, 2017 at 12:13 AM, Randall Hauch 
> wrote:
> >>
> >>> Oops. Yes, I meant “replication factor”.
> >>>
> >>>> On Oct 4, 2017, at 7:18 PM, Ted Yu  wrote:
> >>>>
> >>>> Randall:
> >>>> bq. AdminClient currently allows changing the replication factory.
> >>>>
> >>>> By 'replication factory' did you mean 'replication factor' ?
> >>>>
> >>>> Cheers
> >>>>
> >>>>> On Wed, Oct 4, 2017 at 9:58 AM, Randall Hauch 
> >>> wrote:
> >>>>>
> >>>>> Currently the KIP's scope is only topics that don't yet exist, and we
> >>> have
> >>>>> to cognizant of race conditions between tasks with the same
> connector.
> >>> I
> >>>>> think it is worthwhile to consider whether the KIP's scope should
> >>> expand to
> >>>>> also address *existing* partitions, though it may not be appropriate
> to
> >>>>> have as much control when changing the topic settings for an existing
> >>>>> topic. For example, changing the number of partitions (which the KIP
> >>>>> considers a "topic-specific setting" even though technically it is
> not)
> >>>>> shouldn't be done blindly due to the partitioning impacts, and IIRC
> you
> >>>>> can't reduce them (which we could verify before applying). Also, I
> >>> don't
> >>>>> think the AdminClient currently allows changing the replication
> >>> factory. I
> >>>>> think changing the topic configs is less problematic both from what
> >>> makes
> >>>>> sense for connectors to verify/change and from what the AdminClient
> >>>>> supports.
> >>>>>
> >>>>> Even if we decide that it's not appropriate to change the settings on
> >>> an
> >>>>> existing topic, I do think it's advantageous to at least notify the
> >>>>> connector (or task) prior to the first record sent to a given topic
> so
> >>> that
> >>>>> the connector can fail or issue a warning if it doesn't meet its
> >>>>> requirements.
> >>>>>
> >>>>> Best regards,
> >>>>>
> >>>>> Randall
> >>>>>
> >>>>> On Wed, Oct 4, 2017 at 12:52 AM, Stephane Maarek <
>

KAFKA-1194

2018-09-02 Thread Stephane Maarek
Hi,

I've seen that Dong has done some work on
https://issues.apache.org/jira/browse/KAFKA-7278, which, according to the
comments, could have possibly fixed
https://issues.apache.org/jira/browse/KAFKA-1194.

I tested and it is unfortunately not the case...
I have posted in KAFKA-1194 a way to reproduce the issue in a deterministic
way:
===
C:\kafka_2.11-2.1.0-SNAPSHOT>bin\windows\kafka-server-start.bat
config\server.properties

C:\kafka_2.11-2.1.0-SNAPSHOT>kafka-topics.bat --zookeeper 127.0.0.1:2181
--topic second_topic --create --partitions 3 --replication-factor 1

C:\kafka_2.11-2.1.0-SNAPSHOT>kafka-console-producer.bat --broker-list
127.0.0.1:9092 --topic second_topic
>hello
>world
>hello
>Terminate batch job (Y/N)? Y

C:\kafka_2.11-2.1.0-SNAPSHOT>kafka-topics.bat --zookeeper 127.0.0.1:2181
--topic second_topic --delete



I usually wouldn't push for Windows-specific issues to be fixed, but this
one is triggered by basic CLI usage and causes a full broker failure that
can't be recovered from.
It completely destroys the faith of Kafka newcomers learning on Windows
(who are 50% of the users in my course).

I'm happy to help out in any way I can to test any patch.

Thanks
Stephane


Re: KAFKA-1194

2018-09-02 Thread Stephane Maarek
Hi Ismael, thanks for having a look!

https://github.com/apache/kafka/pull/4947/files does not fix it for me.

But this one does: https://github.com/apache/kafka/pull/4431
It produces the following log: https://pastebin.com/NHeAcc6v
As you can see, the folders do get renamed; it gives the user a few odd
WARNs, but I checked and the directories are fully gone nonetheless.

I think the ideas of the PRs go all the way back to
https://github.com/apache/kafka/pull/154/files or
https://github.com/apache/kafka/pull/1757

I couldn't find how to map both PRs to the new Kafka codebase,
unfortunately.
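
The common idea in those PRs can be sketched as rename-first,
delete-later (the path is made up and the unmap step is elided; on
Windows both operations fail while any segment file is still
memory-mapped):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.stream.Stream;

public class RenameThenDelete {
    public static void main(String[] args) throws IOException {
        Path dir = Paths.get("C:\\kafka-logs\\second_topic-0");
        Path doomed = dir.resolveSibling(dir.getFileName() + ".deleted");

        // 1. (not shown) close/unmap the partition's index and segment files
        // 2. move the directory out of the log dir namespace
        Files.move(dir, doomed, StandardCopyOption.ATOMIC_MOVE);

        // 3. delete the renamed directory's contents, children before parents
        try (Stream<Path> paths = Files.walk(doomed)) {
            paths.sorted(Comparator.reverseOrder())
                 .forEach(p -> p.toFile().delete());
        }
    }
}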

On 3 September 2018 at 03:26, Ismael Juma  wrote:

> Hi Stephane,
>
> Does https://github.com/apache/kafka/pull/4947/files fix it for you?
>
> Ismael
>
> On Sun, Sep 2, 2018 at 11:25 AM Stephane Maarek <
> steph...@simplemachines.com.au> wrote:
>
> > Hi,
> >
> > I've seen Dong has done some work on
> > https://issues.apache.org/jira/browse/KAFKA-7278 which is said from the
> > comments that it could have possibly fixed
> > https://issues.apache.org/jira/browse/KAFKA-1194.
> >
> > I tested and it is unfortunately not the case...
> > I have posted in KAFKA-1194 a way to reproduce the issue in a
> deterministic
> > way:
> > ===
> > C:\kafka_2.11-2.1.0-SNAPSHOT>bin\windows\kafka-server-start.bat
> > config\server.properties
> >
> > C:\kafka_2.11-2.1.0-SNAPSHOT>kafka-topics.bat --zookeeper 127.0.0.1:2181
> > --topic second_topic --create --partitions 3 --replication-factor 1
> >
> > C:\kafka_2.11-2.1.0-SNAPSHOT>kafka-console-producer.bat --broker-list
> > 127.0.0.1:9092 --topic second_topic
> > >hello
> > >world
> > >hello
> > >Terminate batch job (Y/N)? Y
> >
> > C:\kafka_2.11-2.1.0-SNAPSHOT>kafka-topics.bat --zookeeper 127.0.0.1:2181
> > --topic second_topic --delete
> > 
> >
> >
> > I usually wouldn't push for any issues for windows to be fixed, but this
> > one is triggered from basic CLI usage and triggers a full broker failure
> > that can't be recovered.
> > It actually completely destroys the faith of Kafka newcomers learning
> using
> > Windows (which are 50% of the users I have in my course).
> >
> > I'm happy to help out in any way I can to test any patch.
> >
> > Thanks
> > Stephane
> >
>


Re: KAFKA-1194

2018-09-03 Thread Stephane Maarek
Sorry for the double email. I've combined the fixes made in the other PRs
and it fixes the Windows problem: https://github.com/apache/kafka/pull/5603

It triggers a few errors in the unit tests, which is probably due to
changes in the order in which files are deleted. Let's continue the
discussion there?


On 3 September 2018 at 08:24, Stephane Maarek <
steph...@simplemachines.com.au> wrote:

> Hi Ismael, thanks for having a look!
>
> https://github.com/apache/kafka/pull/4947/files does not fix it for me.
>
> But this one does: https://github.com/apache/kafka/pull/4431
> It produces the following log: https://pastebin.com/NHeAcc6v
> As you can see the folders do get renamed, it gives the user a few odd
> WARNs, but I checked and the directories are fully gone nonetheless.
>
> I think the ideas of the PRs go back all along to
> https://github.com/apache/kafka/pull/154/files or
> https://github.com/apache/kafka/pull/1757
>
> I couldn't find how to map both PRs to the new Kafka codebase,
> unfortunately.
>
> On 3 September 2018 at 03:26, Ismael Juma  wrote:
>
>> Hi Stephane,
>>
>> Does https://github.com/apache/kafka/pull/4947/files fix it for you?
>>
>> Ismael
>>
>> On Sun, Sep 2, 2018 at 11:25 AM Stephane Maarek <
>> steph...@simplemachines.com.au> wrote:
>>
>> > Hi,
>> >
>> > I've seen Dong has done some work on
>> > https://issues.apache.org/jira/browse/KAFKA-7278 which is said from the
>> > comments that it could have possibly fixed
>> > https://issues.apache.org/jira/browse/KAFKA-1194.
>> >
>> > I tested and it is unfortunately not the case...
>> > I have posted in KAFKA-1194 a way to reproduce the issue in a
>> deterministic
>> > way:
>> > ===
>> > C:\kafka_2.11-2.1.0-SNAPSHOT>bin\windows\kafka-server-start.bat
>> > config\server.properties
>> >
>> > C:\kafka_2.11-2.1.0-SNAPSHOT>kafka-topics.bat --zookeeper
>> 127.0.0.1:2181
>> > --topic second_topic --create --partitions 3 --replication-factor 1
>> >
>> > C:\kafka_2.11-2.1.0-SNAPSHOT>kafka-console-producer.bat --broker-list
>> > 127.0.0.1:9092 --topic second_topic
>> > >hello
>> > >world
>> > >hello
>> > >Terminate batch job (Y/N)? Y
>> >
>> > C:\kafka_2.11-2.1.0-SNAPSHOT>kafka-topics.bat --zookeeper
>> 127.0.0.1:2181
>> > --topic second_topic --delete
>> > 
>> >
>> >
>> > I usually wouldn't push for any issues for windows to be fixed, but this
>> > one is triggered from basic CLI usage and triggers a full broker failure
>> > that can't be recovered.
>> > It actually completely destroys the faith of Kafka newcomers learning
>> using
>> > Windows (which are 50% of the users I have in my course).
>> >
>> > I'm happy to help out in any way I can to test any patch.
>> >
>> > Thanks
>> > Stephane
>> >
>>
>
>


Re: KAFKA-1194

2018-09-03 Thread Stephane Maarek
Hi Manna

If you check the first email in this thread, I mentioned that I had
already tested Dong's fix and that it unfortunately didn't work. I've been
building from trunk.

I've instead opened a PR based on various other people's fixes; it passed
Jenkins and also fixes the Windows issues encountered in KAFKA-1194:
https://github.com/apache/kafka/pull/5603
Tested for topic deletion.

Would love your pair of eyes on this too, and testing if you can.

Cheers
Stephane

On Mon., 3 Sep. 2018, 11:42 am M. Manna,  wrote:

> Hi Stephan,
>
> https://jira.apache.org/jira/browse/KAFKA-7278
>
> This might (according to Dong) be a better fix for the issue. Could you
> kindly apply to patch to your kafka (assuming that your version is
> reasonably close to the latest), and see if that helps?
>
> Regards,
>
> On Mon, 3 Sep 2018 at 02:27, Ismael Juma  wrote:
>
> > Hi Stephane,
> >
> > Does https://github.com/apache/kafka/pull/4947/files fix it for you?
> >
> > Ismael
> >
> > On Sun, Sep 2, 2018 at 11:25 AM Stephane Maarek <
> > steph...@simplemachines.com.au> wrote:
> >
> > > Hi,
> > >
> > > I've seen Dong has done some work on
> > > https://issues.apache.org/jira/browse/KAFKA-7278 which is said from
> the
> > > comments that it could have possibly fixed
> > > https://issues.apache.org/jira/browse/KAFKA-1194.
> > >
> > > I tested and it is unfortunately not the case...
> > > I have posted in KAFKA-1194 a way to reproduce the issue in a
> > deterministic
> > > way:
> > > ===
> > > C:\kafka_2.11-2.1.0-SNAPSHOT>bin\windows\kafka-server-start.bat
> > > config\server.properties
> > >
> > > C:\kafka_2.11-2.1.0-SNAPSHOT>kafka-topics.bat --zookeeper
> 127.0.0.1:2181
> > > --topic second_topic --create --partitions 3 --replication-factor 1
> > >
> > > C:\kafka_2.11-2.1.0-SNAPSHOT>kafka-console-producer.bat --broker-list
> > > 127.0.0.1:9092 --topic second_topic
> > > >hello
> > > >world
> > > >hello
> > > >Terminate batch job (Y/N)? Y
> > >
> > > C:\kafka_2.11-2.1.0-SNAPSHOT>kafka-topics.bat --zookeeper
> 127.0.0.1:2181
> > > --topic second_topic --delete
> > > 
> > >
> > >
> > > I usually wouldn't push for any issues for windows to be fixed, but
> this
> > > one is triggered from basic CLI usage and triggers a full broker
> failure
> > > that can't be recovered.
> > > It actually completely destroys the faith of Kafka newcomers learning
> > using
> > > Windows (which are 50% of the users I have in my course).
> > >
> > > I'm happy to help out in any way I can to test any patch.
> > >
> > > Thanks
> > > Stephane
> > >
> >
>


Re: [EXTERNAL] [VOTE] KIP-158: Kafka Connect should allow source connectors to set topic-specific settings for new topics

2018-09-11 Thread Stephane Maarek
+1 (non binding)

On Tue., 11 Sep. 2018, 10:48 am Mickael Maison, 
wrote:

> +1 (non-binding)
> Thanks for the KIP!
> On Tue, Sep 11, 2018 at 8:40 AM McCaig, Rhys 
> wrote:
> >
> > Looks great Randall
> > +1 (non-binding)
> >
> > > On Sep 9, 2018, at 7:17 PM, Gwen Shapira  wrote:
> > >
> > > +1
> > > Useful improvement, thanks Randall.
> > >
> > >
> > > On Fri, Sep 7, 2018, 3:28 PM Randall Hauch  wrote:
> > >
> > >> I believe the feedback on KIP-158 has been addressed. I'd like to
> start a
> > >> vote.
> > >>
> > >> KIP:
> > >>
> > >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-158%3A+Kafka+Connect+should+allow+source+connectors+to+set+topic-specific+settings+for+new+topics
> > >>
> > >> Discussion Thread:
> > >> https://www.mail-archive.com/dev@kafka.apache.org/msg73775.html
> > >>
> > >> Thanks!
> > >>
> > >> Randall
> > >>
> >
>


Re: [DISCUSS] KIP-158: Kafka Connect should allow source connectors to set topic-specific settings for new topics

2018-09-12 Thread Stephane Maarek
> > >> >> > configured. Is it limited to the setting exposed in the
> > > TopicSettings
> > > >> >> > interface?
> > > >> >> >
> > > >> >> > Thanks
> > > >> >> > Magesh
> > > >> >> >
> > > >> >> > On Tue, Aug 21, 2018 at 7:59 PM Randall Hauch <
> rha...@gmail.com>
> > > >> wrote:
> > > >> >> >
> > > >> >> > > Okay, after much delay let's try this again for AK 2.1. Has
> > > anyone
> > > >> >> found
> > > >> >> > > any concerns? Stephane suggested that we allow updating topic
> > > >> >> > > configurations (everything but partition count). I'm
> > unconvinced
> > > >> that
> > > >> >> > it's
> > > >> >> > > worth the additional complexity in the implementation and the
> > > >> >> > documentation
> > > >> >> > > to explain the behavior. Changing several of the
> topic-specific
> > > >> >> > > configurations have significant impact on broker behavior /
> > > >> >> > functionality,
> > > >> >> > > so IMO we need to proceed more cautiously.
> > > >> >> > >
> > > >> >> > > Stephane, do you have a particular use case in mind for
> > updating
> > > >> topic
> > > >> >> > > configurations on an existing topic?
> > > >> >> > >
> > > >> >> > > Randall
> > > >> >> > >
> > > >> >> > >
> > > >> >> > > On Fri, Jan 26, 2018 at 4:20 PM Randall Hauch <
> > rha...@gmail.com>
> > > >> >> wrote:
> > > >> >> > >
> > > >> >> > > > The KIP deadline for 1.1 has already passed, but I'd like
> to
> > > >> restart
> > > >> >> > this
> > > >> >> > > > discussion so that we make the next release. I've not yet
> > > >> addressed
> > > >> >> the
> > > >> >> > > > previous comment about *existing* topics, but I'll try to
> do
> > > that
> > > >> >> over
> > > >> >> > > the
> > > >> >> > > > next few weeks. Any other comments/suggestions/questions?
> > > >> >> > > >
> > > >> >> > > > Best regards,
> > > >> >> > > >
> > > >> >> > > > Randall
> > > >> >> > > >
> > > >> >> > > > On Thu, Oct 5, 2017 at 12:13 AM, Randall Hauch <
> > > rha...@gmail.com
> > > >> >
> > > >> >> > wrote:
> > > >> >> > > >
> > > >> >> > > >> Oops. Yes, I meant “replication factor”.
> > > >> >> > > >>
> > > >> >> > > >> > On Oct 4, 2017, at 7:18 PM, Ted Yu  >
> > > >> wrote:
> > > >> >> > > >> >
> > > >> >> > > >> > Randall:
> > > >> >> > > >> > bq. AdminClient currently allows changing the
> replication
> > > >> >> factory.
> > > >> >> > > >> >
> > > >> >> > > >> > By 'replication factory' did you mean 'replication
> > factor' ?
> > > >> >> > > >> >
> > > >> >> > > >> > Cheers
> > > >> >> > > >> >
> > > >> >> > > >> >> On Wed, Oct 4, 2017 at 9:58 AM, Randall Hauch <
> > > >> rha...@gmail.com
> > > >> >> >
> > > >> >> > > >> wrote:
> > > >> >> > > >> >>
> > > >> >> > > >> >> Currently the KIP's scope is only topics that don't yet
> > > >> exist,
> > > >> >> and
> > > >> >> > we
> > > >> >> > > >> have
> > > >> >> > > >> >> to cognizant

Re: [ANNOUNCE] New committer: Colin McCabe

2018-09-25 Thread Stephane Maarek
Congrats Colin !

On Tue., 25 Sep. 2018, 3:33 pm Bill Bejeck,  wrote:

> Congrats Colin!
>
> On Tue, Sep 25, 2018 at 8:11 AM Manikumar 
> wrote:
>
> > Congrats Colin!
> >
> > On Tue, Sep 25, 2018 at 4:39 PM Mickael Maison  >
> > wrote:
> >
> > > Congratulations Colin!
> > >
> > > On Tue, Sep 25, 2018, 11:17 Viktor Somogyi-Vass <
> viktorsomo...@gmail.com
> > >
> > > wrote:
> > >
> > > > Congrats Colin, well deserved! :)
> > > >
> > > > On Tue, Sep 25, 2018 at 10:58 AM Stanislav Kozlovski <
> > > > stanis...@confluent.io>
> > > > wrote:
> > > >
> > > > > Congrats Colin!
> > > > >
> > > > > On Tue, Sep 25, 2018 at 9:51 AM Edoardo Comar 
> > > wrote:
> > > > >
> > > > > > Congratulations Colin !
> > > > > > --
> > > > > >
> > > > > > Edoardo Comar
> > > > > >
> > > > > > IBM Event Streams
> > > > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > From:   Ismael Juma 
> > > > > > To: Kafka Users , dev <
> > > > dev@kafka.apache.org>
> > > > > > Date:   25/09/2018 09:40
> > > > > > Subject:[ANNOUNCE] New committer: Colin McCabe
> > > > > >
> > > > > >
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > The PMC for Apache Kafka has invited Colin McCabe as a committer
> > and
> > > we
> > > > > > are
> > > > > > pleased to announce that he has accepted!
> > > > > >
> > > > > > Colin has contributed 101 commits and 8 KIPs including
> significant
> > > > > > improvements to replication, clients, code quality and testing. A
> > few
> > > > > > highlights were KIP-97 (Improved Clients Compatibility Policy),
> > > KIP-117
> > > > > > (AdminClient), KIP-227 (Incremental FetchRequests to Increase
> > > Partition
> > > > > > Scalability), the introduction of findBugs and adding Trogdor
> > (fault
> > > > > > injection and benchmarking tool).
> > > > > >
> > > > > > In addition, Colin has reviewed 38 pull requests and participated
> > in
> > > > more
> > > > > > than 50 KIP discussions.
> > > > > >
> > > > > > Thank you for your contributions Colin! Looking forward to many
> > more.
> > > > :)
> > > > > >
> > > > > > Ismael, for the Apache Kafka PMC
> > > > > >
> > > > > >
> > > > > >
> > > > > > Unless stated otherwise above:
> > > > > > IBM United Kingdom Limited - Registered in England and Wales with
> > > > number
> > > > > > 741598.
> > > > > > Registered office: PO Box 41, North Harbour, Portsmouth,
> Hampshire
> > > PO6
> > > > > 3AU
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best,
> > > > > Stanislav
> > > > >
> > > >
> > >
> >
>


Speeding up Kafka Start cold startup time

2017-09-26 Thread Stephane Maarek
Hi,

 

We have our Kafka brokers in AWS backed by st1 EBS drives. These are
optimised for throughput.

On a Kafka restart (warm restart), if some of the appropriate data is in
the page cache, everything goes well and the broker boots in just a few
seconds (1000 partitions).

On a cold start (new EC2 instance which just got the volume attached),
nothing is in the page cache and the first boot can take up to 10 minutes.

Question: what does the Kafka broker do and read on start? I would assume
index files and so on, but not the actual segments, does it?

If so, would it be worth considering a KIP that would allow optimising
Kafka files this way:
- Kafka index files, or any small files, would be placed on a faster
  drive (say a gp2 SSD drive)
- Kafka segment files, or any huge files, would be placed on a
  throughput-optimised drive

I am not sure how the KIP would shape up, or the code change, but:
- Is that a possible strategy to drastically improve the broker start-up
  time?
- Would that also improve shutdown time?
 

Thanks,

Stephane
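
One interim workaround, independent of any KIP: warm the page cache for
just the small index files before starting the broker. A sketch (the log
dir path is made up; this assumes index reads dominate startup I/O, which
is the open question above):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class WarmIndexFiles {
    public static void main(String[] args) throws IOException {
        Path logDir = Paths.get("/var/lib/kafka/data");
        try (Stream<Path> files = Files.walk(logDir)) {
            files.filter(p -> p.toString().endsWith(".index")
                           || p.toString().endsWith(".timeindex"))
                 .forEach(p -> {
                     try {
                         Files.readAllBytes(p);  // pulls the file into the page cache
                     } catch (IOException e) {
                         System.err.println("could not warm " + p + ": " + e);
                     }
                 });
        }
    }
}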



Re: [DISCUSS] KIP-158: Kafka Connect should allow source connectors to set topic-specific settings for new topics

2017-10-03 Thread Stephane Maarek
Hi Randall,

Thanks for the KIP. I like it
What happens when the target topic is already created but the configs do not 
match? 
i.e. wrong RF, num partitions, or missing / additional configs? Will you 
attempt to apply the necessary changes or throw an error?

Thanks!
Stephane
 

On 24/5/17, 5:59 am, "Mathieu Fenniak"  wrote:

Ah, yes, I see you highlighted a part that should've made this clear
to me on the first read. :-)  Much clearer now!

By the way, enjoyed your Debezium talk in NYC.

Looking forward to this Kafka Connect change; it will allow me to
remove a post-deployment tool that I hacked together for the purpose
of ensuring auto-created topics have the right config.

Mathieu


On Tue, May 23, 2017 at 11:38 AM, Randall Hauch  wrote:
> Thanks for the quick feedback, Mathieu. Yes, the first configuration rule
> whose regex matches will be applied, and no other rules will be used. I've
> updated the KIP to try to make this more clear, but let me know if it's
> still not clear.
>
> Best regards,
>
> Randall
>
> On Tue, May 23, 2017 at 10:07 AM, Mathieu Fenniak <
> mathieu.fenn...@replicon.com> wrote:
>
>> Hi Randall,
>>
>> Awesome, very much looking forward to this.
>>
>> It isn't 100% clear from the KIP how multiple config-based rules would
>> be applied; it looks like the first configuration rule whose regex
>> matches the topic name will be used, and no other rules will be
>> applied.  Is that correct?  (I wasn't sure if it might cascade
>> together multiple matching rules...)
>>
>> Looks great,
>>
>> Mathieu
>>
>>
>> On Mon, May 22, 2017 at 1:43 PM, Randall Hauch  wrote:
>> > Hi, all.
>> >
>> > We recently added the ability for Kafka Connect to create *internal*
>> topics
>> > using the new AdminClient, but it still would be great if Kafka Connect
>> > could do this for new topics that result from source connector records.
>> > I've outlined an approach to do this in "KIP-158 Kafka Connect should
>> allow
>> > source connectors to set topic-specific settings for new topics".
>> >
>> > *https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> 158%3A+Kafka+Connect+should+allow+source+connectors+to+
>> set+topic-specific+settings+for+new+topics
>> > > 158%3A+Kafka+Connect+should+allow+source+connectors+to+
>> set+topic-specific+settings+for+new+topics>*
>> >
>> > Please take a look and provide feedback. Thanks!
>> >
>> > Best regards,
>> >
>> > Randall
>>





Re: [DISCUSS] KIP-158: Kafka Connect should allow source connectors to set topic-specific settings for new topics

2017-10-04 Thread Stephane Maarek
I agree. I'm personally against increasing the partition number, but RF
would make sense.
Same for configs, I'm okay with them being overridden.

Maybe a "conflict" setting would make sense? Options: do nothing, throw
exception, or apply? (default: do nothing - for safety)

It'd be worth including this in the scope of that KIP, in my opinion.
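
The "conflict" knob could look roughly like this sketch (the OnConflict
enum and the desired values are made up; only the partition count is
checked here, for brevity):

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.TopicDescription;

public class TopicConflictCheck {
    enum OnConflict { DO_NOTHING, FAIL, APPLY }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        OnConflict policy = OnConflict.FAIL;  // default would be DO_NOTHING
        int desiredPartitions = 6;

        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription desc = admin
                    .describeTopics(Collections.singleton("my-topic"))
                    .all().get().get("my-topic");
            if (desc.partitions().size() != desiredPartitions) {
                switch (policy) {
                    case FAIL:
                        throw new IllegalStateException("partition count mismatch");
                    case APPLY:
                        // admin.createPartitions(...) could be attempted here
                        break;
                    case DO_NOTHING:
                        System.out.println("WARN: topic settings mismatch, continuing");
                        break;
                }
            }
        }
    }
}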

On 5/10/17, 3:58 am, "Randall Hauch"  wrote:

Currently the KIP's scope is only topics that don't yet exist, and we have
to be cognizant of race conditions between tasks with the same connector. I
think it is worthwhile to consider whether the KIP's scope should expand to
also address *existing* partitions, though it may not be appropriate to
have as much control when changing the topic settings for an existing
topic. For example, changing the number of partitions (which the KIP
considers a "topic-specific setting" even though technically it is not)
shouldn't be done blindly due to the partitioning impacts, and IIRC you
can't reduce them (which we could verify before applying). Also, I don't
think the AdminClient currently allows changing the replication factory. I
think changing the topic configs is less problematic both from what makes
sense for connectors to verify/change and from what the AdminClient
supports.

Even if we decide that it's not appropriate to change the settings on an
existing topic, I do think it's advantageous to at least notify the
connector (or task) prior to the first record sent to a given topic so that
the connector can fail or issue a warning if it doesn't meet its
requirements.
    
    Best regards,

Randall

On Wed, Oct 4, 2017 at 12:52 AM, Stephane Maarek <
steph...@simplemachines.com.au> wrote:

> Hi Randall,
>
> Thanks for the KIP. I like it
> What happens when the target topic is already created but the configs do
> not match?
> i.e. wrong RF, num partitions, or missing / additional configs? Will you
> attempt to apply the necessary changes or throw an error?
>
> Thanks!
> Stephane
>
>
> On 24/5/17, 5:59 am, "Mathieu Fenniak" 
> wrote:
>
> Ah, yes, I see you highlighted a part that should've made this clear
> to me on the first read. :-)  Much clearer now!
>
> By the way, enjoyed your Debezium talk in NYC.
>
> Looking forward to this Kafka Connect change; it will allow me to
> remove a post-deployment tool that I hacked together for the purpose
> of ensuring auto-created topics have the right config.
>
> Mathieu
>
>
> On Tue, May 23, 2017 at 11:38 AM, Randall Hauch 
> wrote:
> > Thanks for the quick feedback, Mathieu. Yes, the first configuration
> rule
> > whose regex matches will be applied, and no other rules will be
> used. I've
> > updated the KIP to try to make this more clear, but let me know if
> it's
> > still not clear.
> >
> > Best regards,
> >
> > Randall
> >
> > On Tue, May 23, 2017 at 10:07 AM, Mathieu Fenniak <
> > mathieu.fenn...@replicon.com> wrote:
> >
> >> Hi Randall,
> >>
> >> Awesome, very much looking forward to this.
> >>
> >> It isn't 100% clear from the KIP how multiple config-based rules
> would
> >> be applied; it looks like the first configuration rule whose regex
> >> matches the topic name will be used, and no other rules will be
> >> applied.  Is that correct?  (I wasn't sure if it might cascade
> >> together multiple matching rules...)
> >>
> >> Looks great,
> >>
> >> Mathieu
> >>
> >>
> >> On Mon, May 22, 2017 at 1:43 PM, Randall Hauch 
> wrote:
> >> > Hi, all.
> >> >
> >> > We recently added the ability for Kafka Connect to create
> *internal*
> >> topics
> >> > using the new AdminClient, but it still would be great if Kafka
> Connect
> >> > could do this for new topics that result from source connector
> records.
> >> > I've outlined an approach to do this in "KIP-158 Kafka Connect
> should
> >> allow
> >> > source connectors

Kafka Connect Source questions

2017-10-11 Thread Stephane Maarek
Hi,

 

I had a look at the Connect Source Worker code and have two questions:

1) When a Source Task commits offsets, does it perform compaction /
optimisation before sending them off? E.g. I read from 1 source partition,
and I read 1000 messages. Will the offset flush send 1000 messages to the
offset storage, or just 1 (the last one)?

2) I don't really understand why WorkerSourceTask tries to flush
outstanding messages before committing the offsets (cf.
https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L328
).
I would believe that committing the offsets would just commit the offsets
for the messages we know for sure have been flushed at the moment the
commit is requested. That would remove one massive timeout from happening
if the source task pulls a lot of messages and the producer is overwhelmed
/ can't complete the message flush in the 5-second timeout.
 

Thanks a lot for the responses. I may open JIRAs based on the answers of the 
questions, if that would help bring some performance improvements. 

 

Stephane
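
On question 1): if offsets are keyed by source partition in a map, only
the latest value per key survives, so a flush would write one record per
source partition rather than one per message. A toy sketch (the
partition/offset maps are made up):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class OffsetDedup {
    public static void main(String[] args) {
        // source partition -> latest source offset; overwriting the entry
        // is the compaction
        Map<Map<String, String>, Map<String, Long>> offsets = new HashMap<>();
        Map<String, String> sourcePartition = Collections.singletonMap("file", "input.txt");
        for (long position = 1; position <= 1000; position++) {
            offsets.put(sourcePartition, Collections.singletonMap("position", position));
        }
        System.out.println(offsets.size());  // 1 -- only the last offset survives
    }
}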



Re: [DISCUSS] KIP-217: Expose a timeout to allow an expired ZK session to be re-created

2017-10-28 Thread Stephane Maarek
Hi Jun,

I think this is very helpful. Restarting Kafka brokers in case of a
ZooKeeper host change is not a well-known operation.

A few questions:
1) Would it not be worth fixing the problem at the source? This has been
stuck for a while though, maybe a little push would help:
https://issues.apache.org/jira/plugins/servlet/mobile#issue/ZOOKEEPER-2184

2) Upon recreating the ZooKeeper object, is it not possible to invalidate
the DNS cache so that it resolves the new hostname?

3) Could the cluster be down in this situation: one migrates an entire
ZooKeeper cluster to new machines (one by one). The quorum is still alive
without downtime, but now every broker in the cluster can't resolve
ZooKeeper at the same time. They all shut down at the same time after the
new time-out setting.

Thanks !
Stéphane

On 28 Oct. 2017 9:42 am, "Jun Rao"  wrote:

> Hi, Everyone,
>
> We created "KIP-217: Expose a timeout to allow an expired ZK session to be
> re-created".
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 217%3A+Expose+a+timeout+to+allow+an+expired+ZK+session+to+be+re-created
>
> Please take a look and provide your feedback.
>
> Thanks,
>
> Jun
>


Re: [DISCUSS] KIP-217: Expose a timeout to allow an expired ZK session to be re-created

2017-10-31 Thread Stephane Maarek
Hi Jun,

Thanks for the reply.

1) The reason I'm asking about it is I wonder if it's not worth focusing the 
development efforts on taking ownership of the existing PR 
(https://github.com/apache/zookeeper/pull/150)  to fix ZOOKEEPER-2184, rebase 
it and have it merged into the ZK codebase shortly.  I feel this KIP might 
introduce a setting that could be deprecated shortly and confuse the end user a 
bit further with one more knob to turn.

3) I'm not sure if I fully understand, sorry for the beginner's question: if
the default timeout is infinite, then it won't change anything about how Kafka
works today, will it? (unless I'm missing something, sorry). If it's not set to
infinite, then don't we introduce the risk of a whole cluster shutting down at
once?

Thanks,
Stephane

On 31/10/17, 1:00 pm, "Jun Rao"  wrote:

Hi, Stephane,

Thanks for the reply.

1) Fixing the issue in ZK will be ideal. Not sure when it will happen
though. Once it's fixed, we can probably deprecate this config.

2) That could be useful. Is there a java api to do that at runtime? Also,
invalidating DNS cache doesn't always fix the issue of unresolved host. In
some of the cases, human intervention is needed.

3) The default timeout is infinite though.

Jun

    
    On Sat, Oct 28, 2017 at 11:48 PM, Stephane Maarek <
steph...@simplemachines.com.au> wrote:

> Hi Jun,
>
> I think this is very helpful. Restarting Kafka brokers in case of 
zookeeper
> host change is not a well known operation.
>
> Few questions:
> 1) would it not be worth fixing the problem at the source ? This has been
> stuck for a while though, maybe a little push would help :
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/ZOOKEEPER-2184
>
> 2) upon recreating the zookeeper object , is it not possible to invalidate
> the DNS cache so that it resolves the new hostname?
>
> 3) could the cluster be down in this situation: one migrates an entire
> zookeeper cluster to new machines (one by one). The quorum is still alive
> without downtime, but now every broker in a cluster can't resolve 
zookeeper
> at the same time. They all shut down at the same time after the new
> time-out setting.
>
> Thanks !
> Stéphane
>
> On 28 Oct. 2017 9:42 am, "Jun Rao"  wrote:
>
> > Hi, Everyone,
> >
> > We created "KIP-217: Expose a timeout to allow an expired ZK session to
> be
> > re-created".
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 217%3A+Expose+a+timeout+to+allow+an+expired+ZK+session+to+be+re-created
> >
> > Please take a look and provide your feedback.
> >
> > Thanks,
> >
> > Jun
> >
>





Re: [DISCUSS] KIP-217: Expose a timeout to allow an expired ZK session to be re-created

2017-11-01 Thread Stephane Maarek
Thanks Jun for the clarification

It sounds like this KIP is complementary to ZOOKEEPER-2184 and can move
forward without it. We should still push hard for ZOOKEEPER-2184 to go
through (I saw you commented on it earlier).

LGTM!
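
Jun's retry behavior can be sketched as a loop that keeps re-creating the
ZooKeeper handle until it succeeds, re-resolving DNS on each attempt (the
connect string and backoff are made up; assumes the org.apache.zookeeper
client on the classpath):

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZkRecreateLoop {

    static ZooKeeper recreate(String connectString, int sessionTimeoutMs,
                              Watcher watcher) throws InterruptedException {
        while (true) {
            try {
                // a fresh ZooKeeper instance resolves the host names again
                return new ZooKeeper(connectString, sessionTimeoutMs, watcher);
            } catch (Exception e) {
                System.err.println("ZooKeeper re-creation failed, retrying: " + e);
                Thread.sleep(1000);  // back off before the next attempt
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Watcher watcher = (WatchedEvent event) -> { };  // handle state changes here
        ZooKeeper zk = recreate("zk1.example:2181", 30_000, watcher);
        System.out.println("session: 0x" + Long.toHexString(zk.getSessionId()));
    }
}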

On 2 Nov. 2017 12:34 pm, "Jun Rao"  wrote:

> Hi, Stephane,
>
> 3) The difference is that currently, there is no retry when re-creating the
> Zookeeper object when a ZK session expires. So, if the re-creation of
> Zookeeper fails, the broker just logs the error and the Zookeeper object
> will never be created again. With this KIP, we will keep retrying the
> creation of Zookeeper until success.
>
> Thanks,
>
> Jun
>
> On Tue, Oct 31, 2017 at 3:28 PM, Stephane Maarek <
> steph...@simplemachines.com.au> wrote:
>
> > Hi Jun,
> >
> > Thanks for the reply.
> >
> > 1) The reason I'm asking about it is I wonder if it's not worth focusing
> > the development efforts on taking ownership of the existing PR (
> > https://github.com/apache/zookeeper/pull/150)  to fix ZOOKEEPER-2184,
> > rebase it and have it merged into the ZK codebase shortly.  I feel this
> KIP
> > might introduce a setting that could be deprecated shortly and confuse
> the
> > end user a bit further with one more knob to turn.
> >
> > 3) I'm not sure if I fully understand, sorry for the beginner's question:
> > if the default timeout is infinite, then it won't change anything to how
> > Kafka works from today, does it? (unless I'm missing something sorry). If
> > not set to infinite, then we introduce the risk of a whole cluster
> shutting
> > down at once?
> >
> > Thanks,
> > Stephane
> >
> > On 31/10/17, 1:00 pm, "Jun Rao"  wrote:
> >
> > Hi, Stephane,
> >
> > Thanks for the reply.
> >
> > 1) Fixing the issue in ZK will be ideal. Not sure when it will happen
> > though. Once it's fixed, we can probably deprecate this config.
> >
> > 2) That could be useful. Is there a java api to do that at runtime?
> > Also,
> > invalidating DNS cache doesn't always fix the issue of unresolved
> > host. In
> > some of the cases, human intervention is needed.
> >
> > 3) The default timeout is infinite though.
> >
> > Jun
> >
> >
> > On Sat, Oct 28, 2017 at 11:48 PM, Stephane Maarek <
> > steph...@simplemachines.com.au> wrote:
> >
> > > Hi Jun,
> > >
> > > I think this is very helpful. Restarting Kafka brokers in case of
> > zookeeper
> > > host change is not a well known operation.
> > >
> > > Few questions:
> > > 1) would it not be worth fixing the problem at the source ? This
> has
> > been
> > > stuck for a while though, maybe a little push would help :
> > > https://issues.apache.org/jira/plugins/servlet/mobile#
> > issue/ZOOKEEPER-2184
> > >
> > > 2) upon recreating the zookeeper object , is it not possible to
> > invalidate
> > > the DNS cache so that it resolves the new hostname?
> > >
> > > 3) could the cluster be down in this situation: one migrates an
> > entire
> > > zookeeper cluster to new machines (one by one). The quorum is still
> > alive
> > > without downtime, but now every broker in a cluster can't resolve
> > zookeeper
> > > at the same time. They all shut down at the same time after the new
> > > time-out setting.
> > >
> > > Thanks !
> > > Stéphane
> > >
> > > On 28 Oct. 2017 9:42 am, "Jun Rao"  wrote:
> > >
> > > > Hi, Everyone,
> > > >
> > > > We created "KIP-217: Expose a timeout to allow an expired ZK
> > session to
> > > be
> > > > re-created".
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 217%3A+Expose+a+timeout+to+allow+an+expired+ZK+session+
> > to+be+re-created
> > > >
> > > > Please take a look and provide your feedback.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > >
> >
> >
> >
> >
>


Re: [DISCUSS] KIP-217: Expose a timeout to allow an expired ZK session to be re-created

2017-11-02 Thread Stephane Maarek
Hi Jun

I think this is a better option. Would that change still require a KIP,
then, as it's not a change in a public API?

@Ted it was marked as a blocker for 3.4.11 but they pushed it back. It
seems that the owner of the PR hasn't acted in over a year, and I think
someone needs to take ownership of that. Additionally, this would be a
change in Kafka's ZooKeeper client dependency, so there's no need to
update your ZooKeeper quorum to benefit from the change.

Thanks
Stéphane


On 3 Nov. 2017 8:45 am, "Jun Rao"  wrote:

Stephane, Jeff,

Another option is to not expose the reconnect timeout config and just retry
the creation of Zookeeper forever. This is an improvement from the current
situation and if zookeeper-2184 is fixed in the future, we don't need to
deprecate the config.

Thanks,

Jun

On Thu, Nov 2, 2017 at 9:02 AM, Ted Yu  wrote:

> ZOOKEEPER-2184 is scheduled for 3.4.12 whose release is unknown.
>
> I think adding the session recreation on Kafka side should benefit Kafka
> users, especially those who don't plan to move to 3.4.12+ in the near
> future.
>
> On Wed, Nov 1, 2017 at 6:34 PM, Jun Rao  wrote:
>
> > Hi, Stephane,
> >
> > 3) The difference is that currently, there is no retry when re-creating
> the
> > Zookeeper object when a ZK session expires. So, if the re-creation of
> > Zookeeper fails, the broker just logs the error and the Zookeeper object
> > will never be created again. With this KIP, we will keep retrying the
> > creation of Zookeeper until success.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Oct 31, 2017 at 3:28 PM, Stephane Maarek <
> > steph...@simplemachines.com.au> wrote:
> >
> > > Hi Jun,
> > >
> > > Thanks for the reply.
> > >
> > > 1) The reason I'm asking about it is I wonder if it's not worth
> focusing
> > > the development efforts on taking ownership of the existing PR (
> > > https://github.com/apache/zookeeper/pull/150)  to fix ZOOKEEPER-2184,
> > > rebase it and have it merged into the ZK codebase shortly.  I feel
this
> > KIP
> > > might introduce a setting that could be deprecated shortly and confuse
> > the
> > > end user a bit further with one more knob to turn.
> > >
> > > 3) I'm not sure if I fully understand, sorry for the beginner's
> question:
> > > if the default timeout is infinite, then it won't change anything to
> how
> > > Kafka works from today, does it? (unless I'm missing something sorry).
> If
> > > not set to infinite, then we introduce the risk of a whole cluster
> > shutting
> > > down at once?
> > >
> > > Thanks,
> > > Stephane
> > >
> > > On 31/10/17, 1:00 pm, "Jun Rao"  wrote:
> > >
> > > Hi, Stephane,
> > >
> > > Thanks for the reply.
> > >
> > > 1) Fixing the issue in ZK will be ideal. Not sure when it will
> happen
> > > though. Once it's fixed, we can probably deprecate this config.
> > >
> > > 2) That could be useful. Is there a java api to do that at
runtime?
> > > Also,
> > > invalidating DNS cache doesn't always fix the issue of unresolved
> > > host. In
> > > some of the cases, human intervention is needed.
> > >
> > > 3) The default timeout is infinite though.
> > >
> > > Jun
> > >
> > >
> > > On Sat, Oct 28, 2017 at 11:48 PM, Stephane Maarek <
> > > steph...@simplemachines.com.au> wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > I think this is very helpful. Restarting Kafka brokers in case
of
> > > zookeeper
> > > > host change is not a well known operation.
> > > >
> > > > Few questions:
> > > > 1) would it not be worth fixing the problem at the source ? This
> > has
> > > been
> > > > stuck for a while though, maybe a little push would help :
> > > > https://issues.apache.org/jira/plugins/servlet/mobile#
> > > issue/ZOOKEEPER-2184
> > > >
> > > > 2) upon recreating the zookeeper object , is it not possible to
> > > invalidate
> > > > the DNS cache so that it resolves the new hostname?
> > > >
> > > > 3) could the cluster be down in this situation: one migrates an
> > > entire
> > > > zookeeper cluster to new machines (one by one). The quorum is
> still
> > > alive
> > > > without downtime, but now every broker in a cluster can't
resolve
> > > zookeeper
> > > > at the same time. They all shut down at the same time after the
> new
> > > > time-out setting.
> > > >
> > > > Thanks !
> > > > Stéphane
> > > >
> > > > On 28 Oct. 2017 9:42 am, "Jun Rao"  wrote:
> > > >
> > > > > Hi, Everyone,
> > > > >
> > > > > We created "KIP-217: Expose a timeout to allow an expired ZK
> > > session to
> > > > be
> > > > > re-created".
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > 217%3A+Expose+a+timeout+to+allow+an+expired+ZK+session+
> > > to+be+re-created
> > > > >
> > > > > Please take a look and provide your feedback.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > >
> > >
> > >
> > >
> > >
> >
>


Re: [VOTE] KIP-215: Add topic regex support for Connect sinks

2017-11-03 Thread Stephane Maarek
+1! The S3 or HDFS connectors will now be super powerful!
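
For illustration, a sink connector configuration can then use a pattern
instead of an explicit topic list (the connector class below is hypothetical;
topics.regex is the new option):

# Sink connector sketch: topics.regex replaces the explicit topics list.
name=logs-sink
# Hypothetical connector class, for illustration only.
connector.class=com.example.ExampleSinkConnector
tasks.max=2
topics.regex=logs-.*

New topics matching the pattern should then be picked up automatically,
without editing the connector config.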

On 4 Nov. 2017 11:27 am, "Konstantine Karantasis" 
wrote:

> Nice addition!
>
> +1 (non-binding)
>
> Konstantine
>
> On Fri, Nov 3, 2017 at 4:52 PM, Jeff Klukas  wrote:
>
> > So sorry for skirting the process there. I wasn't aware of the 72 hour
> > window and I don't see that mentioned in
> > https://cwiki.apache.org/confluence/display/KAFKA/Bylaws#Bylaws-Voting
> >
> > Should I feel free to update that wiki page with a note about the window?
> >
> > On Fri, Nov 3, 2017 at 7:49 PM, Ewen Cheslack-Postava  >
> > wrote:
> >
> > > Jeff,
> > >
> > > Just FYI re: process, I think you're pretty much definitely in the
> clear
> > > here since this one is a straightforward design I doubt anybody would
> > > object to, but voting normally stays open 72h to ensure everyone has a
> > > chance to weigh in.
> > >
> > > Again thanks for the KIP and we can move any final discussion over to
> the
> > > PR!
> > >
> > > -Ewen
> > >
> > > On Fri, Nov 3, 2017 at 4:43 PM, Jeff Klukas 
> wrote:
> > >
> > > > Looks like we've achieved lazy majority, so I'll move the KIP to
> > > approved.
> > > >
> > > > Thanks all for looking this over.
> > > >
> > > > On Fri, Nov 3, 2017 at 7:31 PM, Jason Gustafson 
> > > > wrote:
> > > >
> > > > > +1. Thanks for the KIP!
> > > > >
> > > > > On Fri, Nov 3, 2017 at 2:15 PM, Guozhang Wang 
> > > > wrote:
> > > > >
> > > > > > +1 binding
> > > > > >
> > > > > > On Fri, Nov 3, 2017 at 1:25 PM, Ewen Cheslack-Postava <
> > > > e...@confluent.io
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > +1 binding
> > > > > > >
> > > > > > > Thanks Jeff!
> > > > > > >
> > > > > > > On Wed, Nov 1, 2017 at 5:21 PM, Randall Hauch <
> rha...@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > +1 (non-binding)
> > > > > > > >
> > > > > > > > Thanks for pushing this through. Great work!
> > > > > > > >
> > > > > > > > Randall Hauch
> > > > > > > >
> > > > > > > > On Wed, Nov 1, 2017 at 9:40 AM, Jeff Klukas <
> > jklu...@simple.com>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > I haven't heard any additional concerns over the proposal,
> so
> > > I'd
> > > > > > like
> > > > > > > to
> > > > > > > > > get the voting process started for:
> > > > > > > > >
> > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > > > 215%3A+Add+topic+regex+support+for+Connect+sinks
> > > > > > > > >
> > > > > > > > > It adds a topics.regex option for Kafka Connect sinks as an
> > > > > > alternative
> > > > > > > > to
> > > > > > > > > the existing required topics option.
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -- Guozhang
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] KIP-201: Rationalising policy interfaces

2017-11-07 Thread Stephane Maarek
Hi Tom,

What's the status of this? I was about to create a KIP to implement a
SimpleCreateTopicPolicy (and Alter, etc.). These policies would have some
basic parameters to check replication factor and min in-sync replicas
(mostly), so that end users can leverage them out of the box. This KIP
obviously changes the interface, so I'd like it to be in before I propose
my KIP.
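
For concreteness, here is a sketch of the kind of policy I have in mind,
written against the existing CreateTopicPolicy interface (the
"simple.policy.min.replication.factor" config key is hypothetical):

import java.util.Map;
import org.apache.kafka.common.errors.PolicyViolationException;
import org.apache.kafka.server.policy.CreateTopicPolicy;

public class SimpleCreateTopicPolicy implements CreateTopicPolicy {

    private short minReplicationFactor = 3;

    @Override
    public void configure(Map<String, ?> configs) {
        // Hypothetical key, read from the broker configuration.
        Object min = configs.get("simple.policy.min.replication.factor");
        if (min != null)
            minReplicationFactor = Short.parseShort(min.toString());
    }

    @Override
    public void validate(RequestMetadata metadata) throws PolicyViolationException {
        // Depending on the broker version, replicationFactor() may be null
        // when explicit replica assignments are supplied instead.
        Short rf = metadata.replicationFactor();
        if (rf != null && rf < minReplicationFactor)
            throw new PolicyViolationException("Replication factor " + rf
                    + " is below the required minimum of " + minReplicationFactor);
        // A similar check could inspect metadata.configs()
        // for min.insync.replicas.
    }

    @Override
    public void close() {}
}

It would be wired in on the broker via create.topic.policy.class.name.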

I'll add my +1 to this, and hopefully we get quick progress so I can
propose my KIP.

Finally, have the packages been discussed?
I find it extremely awkward to have the current CreateTopicPolicy part of
the kafka-clients package, and would love to see the next classes you're
implementing appear in core/src/main/scala/kafka/policy or server/policy.
Unless I'm missing something?

Thanks for driving this
Stephane

Kind regards,
Stephane


Stephane Maarek | Developer

+61 416 575 980
steph...@simplemachines.com.au
simplemachines.com.au
Level 2, 145 William Street, Sydney NSW 2010

On 25 October 2017 at 19:45, Tom Bentley  wrote:

> It's been two weeks since I started the vote on this KIP and although there
> are two votes so far there are no binding votes. Any feedback from
> committers would be appreciated.
>
> Thanks,
>
> Tom
>
> On 12 October 2017 at 10:09, Edoardo Comar  wrote:
>
> > Thanks Tom, with the last additions (changes to the protocol) it now
> > supersedes KIP-170
> >
> > +1 non-binding
> > --
> >
> > Edoardo Comar
> >
> > IBM Message Hub
> >
> > IBM UK Ltd, Hursley Park, SO21 2JN
> >
> >
> >
> > From:   Tom Bentley 
> > To: dev@kafka.apache.org
> > Date:   11/10/2017 09:21
> > Subject:[VOTE] KIP-201: Rationalising policy interfaces
> >
> >
> >
> > I would like to start a vote on KIP-201, which proposes to replace the
> > existing policy interfaces with a single new policy interface that also
> > extends policy support to cover new and existing APIs in the AdminClient.
> >
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.
> > apache.org_confluence_display_KAFKA_KIP-2D201-253A-
> > 2BRationalising-2BPolicy-2Binterfaces&d=DwIBaQ&c=jf_
> iaSHvJObTbx-siA1ZOg&r=
> > EzRhmSah4IHsUZVekRUIINhltZK7U0OaeRo7hgW4_tQ&m=tE3xo2lmmoCoFZAX60PBT-
> > J8TBDWcv-tarJyAlgwfJY&s=puFqZ3Ny4Xcdil5A5huwA5WZtS3WZpD9517uJkCgrCk&e=
> >
> >
> > Thanks for your time.
> >
> > Tom
> >
> >
> >
> > Unless stated otherwise above:
> > IBM United Kingdom Limited - Registered in England and Wales with number
> > 741598.
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
> 3AU
> >
>


Re: [VOTE] KIP-201: Rationalising policy interfaces

2017-11-07 Thread Stephane Maarek
Hi Tom,

Regarding the Java / Scala compilation, I believe this is fine (the
compiler will know), but is there any reason why you don't want the policy
to be implemented in Scala, like the Authorizer?
It's usually not best practice to mix Scala and Java code.

Thanks!
Stephane

Kind regards,
Stephane


Stephane Maarek | Developer

+61 416 575 980
steph...@simplemachines.com.au
simplemachines.com.au
Level 2, 145 William Street, Sydney NSW 2010

On 7 November 2017 at 20:27, Tom Bentley  wrote:

> Hi Stephane,
>
> The vote on this KIP is on-going.
>
> I think it would be OK to make minor changes, but Edoardo and Mickael would
> have to not disagree with them.
>
> The packages have not been brought up as a problem before now. I don't know
> the reason they're in the client's package, but I agree that it's not
> ideal. To me the situation with the policies is analogous to the situation
> with the Authorizer which is in core: They're both broker-side extensions
> points which users can provide their own implementations of. I don't know
> whether the scala compiler is OK compiling interdependent scala and java
> code (maybe Ismael knows?), but if it is, I would be happy if these
> server-side policies were moved.
>
> Cheers,
>
> Tom
>
> On 7 November 2017 at 08:45, Stephane Maarek  au
> > wrote:
>
> > Hi Tom,
> >
> > What's the status of this? I was about to create a KIP to implement a
> > SimpleCreateTopicPolicy
> > (and Alter, etc...)
> > These policies would have some most basic parameter to check for
> > replication factor and min insync replicas (mostly) so that end users can
> > leverage them out of the box. This KIP obviously changes the interface so
> > I'd like this to be in before I propose my KIP
> >
> > I'll add my +1 to this, and hopefully we get quick progress so I can
> > propose my KIP.
> >
> > Finally, have the packages been discussed?
> > I find it extremely awkward to have the current CreateTopicPolicy part of
> > the kafka-clients package, and would love to see the next classes you're
> > implementing appear in core/src/main/scala/kafka/policy or
> server/policy.
> > Unless I'm missing something?
> >
> > Thanks for driving this
> > Stephane
> >
> > Kind regards,
> > Stephane
> >
> >
> > Stephane Maarek | Developer
> >
> > +61 416 575 980
> > steph...@simplemachines.com.au
> > simplemachines.com.au
> > Level 2, 145 William Street, Sydney NSW 2010
> >
> > On 25 October 2017 at 19:45, Tom Bentley  wrote:
> >
> > > It's been two weeks since I started the vote on this KIP and although
> > there
> > > are two votes so far there are no binding votes. Any feedback from
> > > committers would be appreciated.
> > >
> > > Thanks,
> > >
> > > Tom
> > >
> > > On 12 October 2017 at 10:09, Edoardo Comar  wrote:
> > >
> > > > Thanks Tom with the last additions (changes to the protocol) it now
> > > > supersedes KIP-170
> > > >
> > > > +1 non-binding
> > > > --
> > > >
> > > > Edoardo Comar
> > > >
> > > > IBM Message Hub
> > > >
> > > > IBM UK Ltd, Hursley Park, SO21 2JN
> > > >
> > > >
> > > >
> > > > From:   Tom Bentley 
> > > > To: dev@kafka.apache.org
> > > > Date:   11/10/2017 09:21
> > > > Subject:[VOTE] KIP-201: Rationalising policy interfaces
> > > >
> > > >
> > > >
> > > > I would like to start a vote on KIP-201, which proposes to replace
> the
> > > > existing policy interfaces with a single new policy interface that
> also
> > > > extends policy support to cover new and existing APIs in the
> > AdminClient.
> > > >
> > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.
> > > > apache.org_confluence_display_KAFKA_KIP-2D201-253A-
> > > > 2BRationalising-2BPolicy-2Binterfaces&d=DwIBaQ&c=jf_
> > > iaSHvJObTbx-siA1ZOg&r=
> > > > EzRhmSah4IHsUZVekRUIINhltZK7U0OaeRo7hgW4_tQ&m=tE3xo2lmmoCoFZAX60PBT-
> > > > J8TBDWcv-tarJyAlgwfJY&s=puFqZ3Ny4Xcdil5A5huwA5WZtS3WZp
> D9517uJkCgrCk&e=
> > > >
> > > >
> > > > Thanks for your time.
> > > >
> > > > Tom
> > > >
> > > >
> > > >
> > > > Unless stated otherwise above:
> > > > IBM United Kingdom Limited - Registered in England and Wales with
> > number
> > > > 741598.
> > > > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire
> PO6
> > > 3AU
> > > >
> > >
> >
>


Re: [VOTE] KIP-201: Rationalising policy interfaces

2017-11-07 Thread Stephane Maarek
Thanks!

How about a java source folder in core, then? It's not a separate jar and
it's still Java.

Nonetheless, I agree these are details. I just got really confused when
trying to write my policy, and I would hope that confusion is not shared by
others, because it's a "client" class although it should only reside within
a broker.

On 7 Nov. 2017 9:04 pm, "Ismael Juma"  wrote:

The location of the policies is fine. Note that the package _does not_
include clients in the name. If we ever have enough server side only code
to merit a separate JAR, we can do that and it's mostly compatible (users
would only have to update their build dependency). Generally, all public
APIs going forward will be in Java.

Ismael

On Tue, Nov 7, 2017 at 9:44 AM, Stephane Maarek <
steph...@simplemachines.com.au> wrote:

> Hi Tom,
>
> Regarding the java / scala compilation, I believe this is fine (the
> compiler will know), but any reason why you don't want the policy to be
> implemented using Scala ? (like the Authorizer)
> It's usually not best practice to mix in scala / java code.
>
> Thanks!
> Stephane
>
> Kind regards,
> Stephane
>
>
> Stephane Maarek | Developer
>
> +61 416 575 980
> steph...@simplemachines.com.au
> simplemachines.com.au
> Level 2, 145 William Street, Sydney NSW 2010
>
> On 7 November 2017 at 20:27, Tom Bentley  wrote:
>
> > Hi Stephane,
> >
> > The vote on this KIP is on-going.
> >
> > I think it would be OK to make minor changes, but Edoardo and Mickael
> would
> > have to to not disagree with them.
> >
> > The packages have not been brought up as a problem before now. I don't
> know
> > the reason they're in the client's package, but I agree that it's not
> > ideal. To me the situation with the policies is analogous to the
> situation
> > with the Authorizer which is in core: They're both broker-side
extensions
> > points which users can provide their own implementations of. I don't
know
> > whether the scala compiler is OK compiling interdependent scala and java
> > code (maybe Ismael knows?), but if it is, I would be happy if these
> > server-side policies were moved.
> >
> > Cheers,
> >
> > Tom
> >
> > On 7 November 2017 at 08:45, Stephane Maarek <
> steph...@simplemachines.com.
> > au
> > > wrote:
> >
> > > Hi Tom,
> > >
> > > What's the status of this? I was about to create a KIP to implement a
> > > SimpleCreateTopicPolicy
> > > (and Alter, etc...)
> > > These policies would have some most basic parameter to check for
> > > replication factor and min insync replicas (mostly) so that end users
> can
> > > leverage them out of the box. This KIP obviously changes the interface
> so
> > > I'd like this to be in before I propose my KIP
> > >
> > > I'll add my +1 to this, and hopefully we get quick progress so I can
> > > propose my KIP.
> > >
> > > Finally, have the packages been discussed?
> > > I find it extremely awkward to have the current CreateTopicPolicy part
> of
> > > the kafka-clients package, and would love to see the next classes
> you're
> > > implementing appear in core/src/main/scala/kafka/policy or
> > server/policy.
> > > Unless I'm missing something?
> > >
> > > Thanks for driving this
> > > Stephane
> > >
> > > Kind regards,
> > > Stephane
> > >
> > >
> > > Stephane Maarek | Developer
> > >
> > > +61 416 575 980
> > > steph...@simplemachines.com.au
> > > simplemachines.com.au
> > > Level 2, 145 William Street, Sydney NSW 2010
> > >
> > > On 25 October 2017 at 19:45, Tom Bentley 
> wrote:
> > >
> > > > It's been two weeks since I started the vote on this KIP and
although
> > > there
> > > > are two votes so far there are no binding votes. Any feedback from
> > > > committers would be appreciated.
> > > >
> > > > Thanks,
> > > >
> > > > Tom
> > > >
> > > > On 12 October 2017 at 10:09, Edoardo Comar 
> wrote:
> > > >
> > > > > Thanks Tom with the last additions (changes to the protocol) it
now
> > > > > supersedes KIP-170
> > > > >
> > > > > +1 non-binding
> > > > > --
> > >

Re: [VOTE] KIP-201: Rationalising policy interfaces

2017-11-07 Thread Stephane Maarek
Okay, makes sense, thanks! As you said, maybe in the future (or now) it's
worth starting a server-side Java dependency jar that's not called "client".
Probably a debate for another day.

Tom, crossing fingers to see more votes on this! Good stuff
 

On 7/11/17, 9:51 pm, "Ismael Juma"  wrote:

The idea is that you only depend on a Java jar. The core jar includes the
Scala version in the name and you should not care about that when
implementing a Java interface.

Ismael

On Tue, Nov 7, 2017 at 10:37 AM, Stephane Maarek <
steph...@simplemachines.com.au> wrote:

> Thanks !
>
> How about a java folder package in the core then ? It's not a separate jar
> and it's still java?
>
> Nonetheless I agree these are details. I just got really confused when
> trying to write my policy and would hope that confusion is not shared by
> others because it's a "client " class although should only reside within a
> broker
>
> On 7 Nov. 2017 9:04 pm, "Ismael Juma"  wrote:
>
> The location of the policies is fine. Note that the package _does not_
> include clients in the name. If we ever have enough server side only code
> to merit a separate JAR, we can do that and it's mostly compatible (users
> would only have to update their build dependency). Generally, all public
> APIs going forward will be in Java.
>
> Ismael
>
> On Tue, Nov 7, 2017 at 9:44 AM, Stephane Maarek <
> steph...@simplemachines.com.au> wrote:
>
> > Hi Tom,
> >
> > Regarding the java / scala compilation, I believe this is fine (the
> > compiler will know), but any reason why you don't want the policy to be
> > implemented using Scala ? (like the Authorizer)
> > It's usually not best practice to mix in scala / java code.
> >
> > Thanks!
> > Stephane
> >
> > Kind regards,
> > Stephane
> >
> >
> > Stephane Maarek | Developer
> >
> > +61 416 575 980
> > steph...@simplemachines.com.au
> > simplemachines.com.au
> > Level 2, 145 William Street, Sydney NSW 2010
> >
> > On 7 November 2017 at 20:27, Tom Bentley  wrote:
> >
> > > Hi Stephane,
> > >
> > > The vote on this KIP is on-going.
> > >
> > > I think it would be OK to make minor changes, but Edoardo and Mickael
> > would
> > > have to to not disagree with them.
> > >
> > > The packages have not been brought up as a problem before now. I don't
> > know
> > > the reason they're in the client's package, but I agree that it's not
> > > ideal. To me the situation with the policies is analogous to the
> > situation
> > > with the Authorizer which is in core: They're both broker-side
> extensions
> > > points which users can provide their own implementations of. I don't
> know
> > > whether the scala compiler is OK compiling interdependent scala and
> java
> > > code (maybe Ismael knows?), but if it is, I would be happy if these
> > > server-side policies were moved.
> > >
> > > Cheers,
> > >
> > > Tom
> > >
> > > On 7 November 2017 at 08:45, Stephane Maarek <
> > steph...@simplemachines.com.
> > > au
> > > > wrote:
> > >
> > > > Hi Tom,
> > > >
> > > > What's the status of this? I was about to create a KIP to implement 
a
> > > > SimpleCreateTopicPolicy
> > > > (and Alter, etc...)
> > > > These policies would have some most basic parameter to check for
> > > > replication factor and min insync replicas (mostly) so that end 
users
> > can
> > > > leverage them out of the box. This KIP obviously changes the
> interface
> > so
> > > > I'd like this to be in before I propose my KIP
> > > >
> > > > I'll add my +1 to this, and hopefully we get quick progress so I can
> > > > propose my KIP.
> > > >
> > > > Finally, have the packages been discussed?
> > > > I find it extremely awkward to have the current CreateTopicPolicy
> part
> > of
> > > > the kafka-clients package, and w

Re: [DISCUSS] Kafka 2.0.0 in June 2018

2017-11-09 Thread Stephane Maarek
I'm very happy with the milestones but worried about the version number.
It seems the release will mostly remove deprecated components rather than
actually bring in breaking features. A 2.0 to me should bring something major
to the table, possibly breaking, which would justify a big version bump. I'm
still new to open-source software development, but that's my two cents.

On 9 Nov. 2017 8:44 pm, "Ismael Juma"  wrote:

> Hi all,
>
> I'm starting this discussion early because of the potential impact.
>
> Kafka 1.0.0 was just released and the focus was on achieving the original
> project vision in terms of features provided while maintaining
> compatibility for the most part (i.e. we did not remove deprecated
> components like the Scala clients).
>
> This was the right decision, in my opinion, but it's time to start thinking
> about 2.0.0, which is an opportunity for us to remove major deprecated
> components and to benefit from Java 8 language enhancements (so that we can
> move faster). So, I propose the following for Kafka 2.0.0:
>
> 1. It should be released in June 2018
> 2. The Scala clients (Consumer, SimpleConsumer, Producer, SyncProducer)
> will be removed
> 3. Java 8 or higher will be required, i.e. support for Java 7 will be
> dropped.
>
> Thoughts?
>
> Ismael
>


Re: [DISCUSS] KIP-227: Introduce Incremental FetchRequests to Increase Partition Scalability

2017-11-22 Thread Stephane Maarek
We have seen clusters with a few thousand topics sitting at idle CPU of 20
percent or more. This may be due to these fetch requests. It seems your KIP
would improve scalability a lot with respect to (dormant) partitions, so I'm
excited for this change.
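
For readers following the design discussion below, here is a toy model of the
session idea (a concept sketch with invented names, not the actual protocol
or broker code): the broker remembers which partitions a session cares about,
incremental requests only carry changes, and responses only carry partitions
with new data.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class FetchSessionSketch {
    // Partitions the broker remembers for this session,
    // mapped to the last fetch offset the client sent.
    private final Map<String, Long> sessionPartitions = new HashMap<>();

    // Full fetch: (re-)establishes the complete session state.
    void fullFetch(Map<String, Long> partitionOffsets) {
        sessionPartitions.clear();
        sessionPartitions.putAll(partitionOffsets);
    }

    // Incremental fetch: the request only carries partitions whose state changed.
    void incrementalFetch(Map<String, Long> changedPartitions) {
        sessionPartitions.putAll(changedPartitions);
    }

    // The response only includes partitions that actually have new data,
    // so thousands of dormant partitions cost nothing per request.
    Set<String> partitionsWithData(Map<String, Long> logEndOffsets) {
        Set<String> withData = new HashSet<>();
        for (Map.Entry<String, Long> entry : sessionPartitions.entrySet()) {
            Long logEnd = logEndOffsets.get(entry.getKey());
            if (logEnd != null && logEnd > entry.getValue())
                withData.add(entry.getKey());
        }
        return withData;
    }
}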

On 22 Nov. 2017 10:01 pm, "Mickael Maison"  wrote:

That's an interesting idea. In our clusters, we definitely feel the
cost of unused partitions, and I think it's one of those areas where
Kafka could improve.

On Wed, Nov 22, 2017 at 6:11 AM, Jun Rao  wrote:
> Hi, Jay,
>
> I guess in your proposal the leader has to cache the last offset given
back
> for each partition so that it knows from which offset to serve the next
> fetch request. This is doable but it means that the leader needs to do an
> additional index lookup per partition to serve a fetch request. Not sure
if
> the benefit from the lighter fetch request obviously offsets the
additional
> index lookup though.
>
> Thanks,
>
> Jun
>
> On Tue, Nov 21, 2017 at 7:03 PM, Jay Kreps  wrote:
>
>> I think the general thrust of this makes a ton of sense.
>>
>> I don't love that we're introducing a second type of fetch request. I
think
>> the motivation is for compatibility, right? But isn't that what
versioning
>> is for? Basically to me although the modification we're making makes
sense,
>> the resulting protocol doesn't really seem like something you would
design
>> this way from scratch.
>>
>> I think I may be misunderstanding the semantics of the partitions in
>> IncrementalFetchRequest. I think the intention is that the server
remembers
>> the partitions you last requested, and the partitions you specify in the
>> request are added to this set. This is a bit odd though because you can
add
>> partitions but I don't see how you remove them, so it doesn't really let
>> you fully make changes incrementally. I suspect I'm misunderstanding that
>> somehow, though. You'd also need to be a little bit careful that there
was
>> no way for the server's idea of what the client is interested in and the
>> client's idea to ever diverge as you made these modifications over time
>> (due to bugs or whatever).
>>
>> It seems like an alternative would be to not add a second request, but
>> instead change the fetch api and implementation
>>
>>1. We save the partitions you last fetched on that connection in the
>>session for the connection (as I think you are proposing)
>>2. It only gives you back info on partitions that have data or have
>>changed (no reason you need the others, right?)
>>3. Not specifying any partitions means "give me the usual", as defined
>>by whatever you requested before attached to the session.
>>
>> This would be a new version of the fetch API, so compatibility would be
>> retained by retaining the older version as is.
>>
>> This seems conceptually simpler to me. It's true that you have to resend
>> the full set whenever you want to change it, but that actually seems less
>> error prone and that should be rare.
>>
>> I suspect you guys thought about this and it doesn't quite work, but
maybe
>> you could explain why?
>>
>> -Jay
>>
>> On Tue, Nov 21, 2017 at 1:02 PM, Colin McCabe  wrote:
>>
>> > Hi all,
>> >
>> > I created a KIP to improve the scalability and latency of FetchRequest:
>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > 227%3A+Introduce+Incremental+FetchRequests+to+Increase+
>> > Partition+Scalability
>> >
>> > Please take a look.
>> >
>> > cheers,
>> > Colin
>> >
>>


RE: [DISCUSS]KIP-235 DNS alias and secured connections

2017-12-06 Thread Stephane Maarek
Hi Jonathan

I think this will be very useful. I reported something similar here:
https://issues.apache.org/jira/browse/KAFKA-4781

Can you please confirm your KIP will address it?

Stéphane
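
For later readers: the client-side mechanism this discussion fed into is a
DNS lookup mode on the clients. A configuration sketch (host names
illustrative; the exact option name, client.dns.lookup, shipped later, in
Kafka 2.1):

# One DNS alias in front of all brokers, expanded to canonical FQDNs
# so that SASL/Kerberos principals match the actual hosts.
bootstrap.servers=kafka-bootstrap.example.com:9093
security.protocol=SASL_SSL
sasl.mechanism=GSSAPI
client.dns.lookup=resolve_canonical_bootstrap_servers_only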

On 6 Dec. 2017 8:20 pm, "Skrzypek, Jonathan" 
wrote:

> True, amended the KIP, thanks.
>
> Jonathan Skrzypek
> Middleware Engineering
> Messaging Engineering
> Goldman Sachs International
>
>
> -Original Message-
> From: Tom Bentley [mailto:t.j.bent...@gmail.com]
> Sent: 05 December 2017 18:19
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS]KIP-235 DNS alias and secured connections
>
> Hi Jonathan,
>
> It might be worth mentioning in the KIP that this is necessary only for
> *Kerberos* on SASL, and not other SASL mechanisms. Reading the JIRA it
> makes sense, but I was confused up until that point.
>
> Cheers,
>
> Tom
>
> On 5 December 2017 at 17:53, Skrzypek, Jonathan 
> wrote:
>
> > Hi,
> >
> > I would like to discuss a KIP I've submitted :
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_
> > confluence_display_KAFKA_KIP-2D&d=DwIBaQ&c=7563p3e2zaQw0AB1wrFVgyagb2I
> > E5rTZOYPxLxfZlX4&r=nNmJlu1rR_QFAPdxGlafmDu9_r6eaCbPOM0NM1EHo-E&m=GWKXA
> > ILbqxFU2j7LtoOx9MZ00uy_jJcGWWIG92CyAuc&s=fv5WAkOgLhVOmF4vhEzq_39CWnEo0
> > q0AJbqhAuDFDT0&e= 235%3A+Add+DNS+alias+support+for+secured+connection
> >
> > Feedback and suggestions welcome !
> >
> > Regards,
> > Jonathan Skrzypek
> > Middleware Engineering
> > Messaging Engineering
> > Goldman Sachs International
> > Christchurch Court - 10-15 Newgate Street London EC1A 7HD
> > Tel: +442070512977
> >
> >
>


Re: [VOTE] 2.2.0 RC0

2019-02-24 Thread Stephane Maarek
Hi Matthias

Thanks for this.
Running through the list of KIPs, I think this one is not actually included in 2.2:

- Allow clients to suppress auto-topic-creation

Regards
Stephane

On Sun, Feb 24, 2019 at 1:03 AM Matthias J. Sax 
wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the first candidate for the release of Apache Kafka 2.2.0.
>
> This is a minor release with the follow highlight:
>
>  - Added SSL support for custom principal name
>  - Allow SASL connections to periodically re-authenticate
>  - Improved consumer group management
>- default group.id is `null` instead of empty string
>  - Add --under-min-isr option to describe topics command
>  - Allow clients to suppress auto-topic-creation
>  - API improvement
>- Producer: introduce close(Duration)
>- AdminClient: introduce close(Duration)
>- Kafka Streams: new flatTransform() operator in Streams DSL
>- KafkaStreams (and other classes) now implement AutoCloseable to
> support try-with-resources
>- New Serdes and default method implementations
>  - Kafka Streams exposed internal client.id via ThreadMetadata
>  - Metric improvements:  All `-min`, `-avg` and `-max` metrics will now
> output `NaN` as default value
>
>
> Release notes for the 2.2.0 release:
> http://home.apache.org/~mjsax/kafka-2.2.0-rc0/RELEASE_NOTES.html
>
> *** Please download, test and vote by Friday, March 1, 9am PST.
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~mjsax/kafka-2.2.0-rc0/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> * Javadoc:
> http://home.apache.org/~mjsax/kafka-2.2.0-rc0/javadoc/
>
> * Tag to be voted upon (off 2.2 branch) is the 2.2.0 tag:
> https://github.com/apache/kafka/releases/tag/2.2.0-rc0
>
> * Documentation:
> https://kafka.apache.org/22/documentation.html
>
> * Protocol:
> https://kafka.apache.org/22/protocol.html
>
> * Successful Jenkins builds for the 2.2 branch:
> Unit/integration tests: https://builds.apache.org/job/kafka-2.2-jdk8/31/
>
> * System tests:
> https://jenkins.confluent.io/job/system-test-kafka/job/2.2/
>
>
>
>
> Thanks,
>
> -Matthias
>
>


Kafka-1194

2019-03-20 Thread Stephane Maarek
Hi all

KAFKA-1194 has been a longstanding issue, and now there seems to be a viable
fix that also passes the tests. I know Windows isn't a priority, but as I
mentioned before, avoiding a fatal, unrecoverable crash for newcomers
learning Kafka is very important.

I don't have enough core Kafka expertise to judge the quality of the code,
but I'm sure some of you would. Please have a look at:
https://github.com/apache/kafka/pull/6329 if you have some time

Thanks!
Stephane


Re: [VOTE] KIP-392: Allow consumers to fetch from the closest replica

2019-03-26 Thread Stephane Maarek
It's going to change quite a few things for learners, but this is an
awesome idea!
+1 (non-binding)
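
For anyone wanting to picture the end state, the configuration shape the KIP
proposes looks roughly like this (values illustrative, names as proposed):

# Broker side: plug in a rack-aware replica selector and tag each broker.
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector
broker.rack=us-east-1a

# Consumer side: declare where the client runs so the closest replica is used.
client.rack=us-east-1a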

On Tue, Mar 26, 2019 at 3:35 PM Viktor Somogyi-Vass 
wrote:

> +1 (non-binding) very good proposal.
>
> On Mon, Mar 25, 2019 at 7:19 PM David Arthur  wrote:
>
> > +1
> >
> > Thanks, Jason!
> >
> > On Mon, Mar 25, 2019 at 1:23 PM Eno Thereska 
> > wrote:
> >
> > > +1 (non-binding)
> > > Thanks for updating the KIP and addressing my previous comments.
> > >
> > > Eno
> > >
> > > On Mon, Mar 25, 2019 at 4:35 PM Ryanne Dolan 
> > > wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > Great stuff, thanks.
> > > >
> > > > Ryanne
> > > >
> > > > On Mon, Mar 25, 2019, 11:08 AM Jason Gustafson 
> > > wrote:
> > > >
> > > > > Hi All, discussion on the KIP seems to have died down, so I'd like
> to
> > > go
> > > > > ahead and start a vote. Here is a link to the KIP:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica
> > > > > .
> > > > >
> > > > > +1 from me (duh)
> > > > >
> > > > > -Jason
> > > > >
> > > >
> > >
> >
> >
> > --
> > David Arthur
> >
>


Re: [ANNOUNCE] Apache Kafka 2.2.0

2019-03-26 Thread Stephane Maarek
Congratulations on this amazing release! Lots of cool new features :)

I've also released a YouTube video that will hopefully help the community
get up to speed: https://www.youtube.com/watch?v=kaWbp1Cnfo4&t=5s

Happy watching!
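
P.S. One of the API improvements listed below deserves a quick sketch: the
new Duration-based close() (illustrative example, error handling omitted):

import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CloseDurationExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.send(new ProducerRecord<>("demo-topic", "key", "value"));
        // New in 2.2: bound the shutdown wait with a Duration,
        // replacing the deprecated close(long, TimeUnit).
        producer.close(Duration.ofSeconds(5));
    }
}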

On Tue, Mar 26, 2019 at 7:02 PM Matthias J. Sax  wrote:

> The Apache Kafka community is pleased to announce the release for Apache
> Kafka 2.2.0
>
>  - Added SSL support for custom principal name
>  - Allow SASL connections to periodically re-authenticate
>  - Command line tool bin/kafka-topics.sh adds AdminClient support
>  - Improved consumer group management
>- default group.id is `null` instead of empty string
>  - API improvement
>- Producer: introduce close(Duration)
>- AdminClient: introduce close(Duration)
>- Kafka Streams: new flatTransform() operator in Streams DSL
>- KafkaStreams (and other classes) now implement AutoCloseable to
> support try-with-resources
>- New Serdes and default method implementations
>  - Kafka Streams exposed internal client.id via ThreadMetadata
>  - Metric improvements:  All `-min`, `-avg` and `-max` metrics will now
> output `NaN` as default value
>
> All of the changes in this release can be found in the release notes:
> https://www.apache.org/dist/kafka/2.2.0/RELEASE_NOTES.html
>
>
> You can download the source and binary release (Scala 2.11 and 2.12)
> from: https://kafka.apache.org/downloads#2.2.0
>
>
> ---
>
>
> Apache Kafka is a distributed streaming platform with four core APIs:
>
>
> ** The Producer API allows an application to publish a stream records to
> one or more Kafka topics.
>
> ** The Consumer API allows an application to subscribe to one or more
> topics and process the stream of records produced to them.
>
> ** The Streams API allows an application to act as a stream processor,
> consuming an input stream from one or more topics and producing an
> output stream to one or more output topics, effectively transforming the
> input streams to output streams.
>
> ** The Connector API allows building and running reusable producers or
> consumers that connect Kafka topics to existing applications or data
> systems. For example, a connector to a relational database might
> capture every change to a table.
>
>
> With these APIs, Kafka can be used for two broad classes of application:
>
> ** Building real-time streaming data pipelines that reliably get data
> between systems or applications.
>
> ** Building real-time streaming applications that transform or react
> to the streams of data.
>
>
> Apache Kafka is in use at large and small companies worldwide, including
> Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,
> Target, The New York Times, Uber, Yelp, and Zalando, among others.
>
> A big thank you for the following 98 contributors to this release!
>
> Alex Diachenko, Andras Katona, Andrew Schofield, Anna Povzner, Arjun
> Satish, Attila Sasvari, Benedict Jin, Bert Roos, Bibin Sebastian, Bill
> Bejeck, Bob Barrett, Boyang Chen, Bridger Howell, cadonna, Chia-Ping
> Tsai, Chris Egerton, Colin Hicks, Colin P. Mccabe, Colin Patrick McCabe,
> cwildman, Cyrus Vafadari, David Arthur, Dhruvil Shah, Dong Lin, Edoardo
> Comar, Flavien Raynaud, forficate, Gardner Vickers, Guozhang Wang, Gwen
> (Chen) Shapira, hackerwin7, hejiefang, huxi, Ismael Juma, Jacek
> Laskowski, Jakub Scholz, Jarek Rudzinski, Jason Gustafson, Jingguo Yao,
> John Eismeier, John Roesler, Jonathan Santilli, jonathanskrzypek, Jun
> Rao, Kamal Chandraprakash, Kan Li, Konstantine Karantasis, lambdaliu,
> Lars Francke, layfe, Lee Dongjin, linyli001, lu.ke...@berkeley.edu,
> Lucas Bradstreet, Magesh Nandakumar, Manikumar Reddy, Manikumar Reddy O,
> Manohar Vanam, Mark Cho, Mathieu Chataigner, Matthias J. Sax, Matthias
> Wessendorf, matus-cuper, Max Zheng, Mayuresh Gharat, Mickael Maison,
> mingaliu, Nikolay, occho, Pasquale Vazzana, Radai Rosenblatt, Rajini
> Sivaram, Randall Hauch, Renato Mefi, Richard Yu, Robert Yokota, Ron
> Dagostino, ryannatesmith, Samuel Hawker, Satish Duggana, Sayat, seayoun,
> Shawn Nguyen, slim, Srinivas Reddy, Stanislav Kozlovski, Stig Rohde
> Døssing, Suman, Tom Bentley, u214578, Vahid Hashemian, Viktor Somogyi,
> Viktor Somogyi-Vass, Xi Yang, Xiongqi Wu, ying-zheng, Yishun Guan,
> Zhanxiang (Patrick) Huang
>
> We welcome your help and feedback. For more information on how to
> report problems, and to get involved, visit the project website at
> https://kafka.apache.org/
>
> Thank you!
>
>
> Regards,
>
> Matthias
>


Re: Preliminary blog post for the Apache Kafka 2.3.0 release

2019-06-23 Thread Stephane Maarek
The video is ready :) I'm waiting for the release of Kafka 2.3 to make it
public. @Colin, if you want to link it in the blog at some point, that'd be
great!

On Wed., 19 Jun. 2019, 4:03 pm Ron Dagostino,  wrote:

> Looks great, Colin.
>
> I have also enjoyed Stephane Maarek's "What's New in Kafka..." series of
> videos (e.g. https://www.youtube.com/watch?v=kaWbp1Cnfo4&t=10s).  Having
> summaries like this in both formats -- blog and video -- for every release
> would be helpful as different people have different preferences.
>
> Ron
>
> On Tue, Jun 18, 2019 at 8:20 PM Colin McCabe  wrote:
>
> > Thanks, Konstantine.  I reworked the wording a bit -- take a look.
> >
> > best,
> > C.
> >
> > On Tue, Jun 18, 2019, at 14:31, Konstantine Karantasis wrote:
> > > Thanks Colin.
> > > Great initiative!
> > >
> > > Here's a small correction (between **) for KIP-415 with a small
> > suggestion
> > > as well (between _ _):
> > >
> > > In Kafka Connect, worker tasks are distributed among the available
> worker
> > > nodes. When a connector is reconfigured or a new connector is deployed
> > _as
> > > well as when a worker is added or removed_, the *tasks* must be
> > rebalanced
> > > across the Connect cluster to help ensure that all of the worker nodes
> > are
> > > doing a fair share of the Connect work. In 2.2 and earlier, a Connect
> > > rebalance caused all worker threads to pause while the rebalance
> > proceeded.
> > > As of KIP-415, rebalancing is no longer a stop-the-world affair, making
> > > configuration changes a more pleasant thing.
> > >
> > > Cheers,
> > > Konstantine
> > >
> > > On Tue, Jun 18, 2019 at 1:50 PM Swen Moczarski <
> swen.moczar...@gmail.com
> > >
> > > wrote:
> > >
> > > > Nice overview!
> > > >
> > > > I found some typos:
> > > > rbmainder
> > > > bmits
> > > > implbmentation
> > > >
> > > > Am Di., 18. Juni 2019 um 22:43 Uhr schrieb Boyang Chen <
> > > > bche...@outlook.com
> > > > >:
> > > >
> > > > > One typo:
> > > > > KIP-428: Add in-mbmory window store
> > > > > should be
> > > > > KIP-428: Add in-memory window store
> > > > >
> > > > >
> > > > > 
> > > > > From: Colin McCabe 
> > > > > Sent: Wednesday, June 19, 2019 4:22 AM
> > > > > To: dev@kafka.apache.org
> > > > > Subject: Re: Preliminary blog post for the Apache Kafka 2.3.0
> release
> > > > >
> > > > > Sorry, I copied the wrong URL at first.  Try this URL instead:
> > > > >
> > > >
> >
> https://blogs.apache.org/preview/kafka/?previewEntry=what-s-new-in-apache
> > > > >
> > > > > best,
> > > > > Colin
> > > > >
> > > > > On Tue, Jun 18, 2019, at 13:17, Colin McCabe wrote:
> > > > > > Hmm.  I'm looking to see if there's any way to open up the
> > > > > permissions... :|
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Jun 18, 2019, at 13:12, M. Manna wrote:
> > > > > > > It’s asking for credentials...?
> > > > > > >
> > > > > > > On Tue, 18 Jun 2019 at 15:10, Colin McCabe  >
> > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I've written up a preliminary blog post about the upcoming
> > Apache
> > > > > Kafka
> > > > > > > > 2.3.0 release.  Take a look and let me know what you think.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > >
> > > >
> >
> https://blogs.apache.org/roller-ui/authoring/preview/kafka/?previewEntry=what-s-new-in-apache
> > > > > > > >
> > > > > > > > cheers,
> > > > > > > > Colin
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Preliminary blog post for the Apache Kafka 2.3.0 release

2019-06-25 Thread Stephane Maarek
Here it is: https://youtu.be/YutjYKSGd64

Thanks in advance!!

On Tue., 25 Jun. 2019, 12:39 am Colin McCabe,  wrote:

> Hi Stephane,
>
> Sounds interesting!  Do you have a link to the video you made for 2.3?
>
> best,
> Colin
>
>
> On Sun, Jun 23, 2019, at 15:10, Stephane Maarek wrote:
> > The video is ready :) waiting for the release of Kafka 2.3 to make it
> > public. @colin if you want to link it in the blog at some point that'd be
> > great!
> >
> > On Wed., 19 Jun. 2019, 4:03 pm Ron Dagostino,  wrote:
> >
> > > Looks great, Colin.
> > >
> > > I have also enjoyed Stephane Maarek's "What's New in Kafka..." series
> of
> > > videos (e.g. https://www.youtube.com/watch?v=kaWbp1Cnfo4&t=10s).
> Having
> > > summaries like this in both formats -- blog and video -- for every
> release
> > > would be helpful as different people have different preferences.
> > >
> > > Ron
> > >
> > > On Tue, Jun 18, 2019 at 8:20 PM Colin McCabe 
> wrote:
> > >
> > > > Thanks, Konstantine.  I reworked the wording a bit -- take a look.
> > > >
> > > > best,
> > > > C.
> > > >
> > > > On Tue, Jun 18, 2019, at 14:31, Konstantine Karantasis wrote:
> > > > > Thanks Colin.
> > > > > Great initiative!
> > > > >
> > > > > Here's a small correction (between **) for KIP-415 with a small
> > > > suggestion
> > > > > as well (between _ _):
> > > > >
> > > > > In Kafka Connect, worker tasks are distributed among the available
> > > worker
> > > > > nodes. When a connector is reconfigured or a new connector is
> deployed
> > > > _as
> > > > > well as when a worker is added or removed_, the *tasks* must be
> > > > rebalanced
> > > > > across the Connect cluster to help ensure that all of the worker
> nodes
> > > > are
> > > > > doing a fair share of the Connect work. In 2.2 and earlier, a
> Connect
> > > > > rebalance caused all worker threads to pause while the rebalance
> > > > proceeded.
> > > > > As of KIP-415, rebalancing is no longer a stop-the-world affair,
> making
> > > > > configuration changes a more pleasant thing.
> > > > >
> > > > > Cheers,
> > > > > Konstantine
> > > > >
> > > > > On Tue, Jun 18, 2019 at 1:50 PM Swen Moczarski <
> > > swen.moczar...@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Nice overview!
> > > > > >
> > > > > > I found some typos:
> > > > > > rbmainder
> > > > > > bmits
> > > > > > implbmentation
> > > > > >
> > > > > > Am Di., 18. Juni 2019 um 22:43 Uhr schrieb Boyang Chen <
> > > > > > bche...@outlook.com
> > > > > > >:
> > > > > >
> > > > > > > One typo:
> > > > > > > KIP-428: Add in-mbmory window store
> > > > > > > should be
> > > > > > > KIP-428: Add in-memory window store
> > > > > > >
> > > > > > >
> > > > > > > 
> > > > > > > From: Colin McCabe 
> > > > > > > Sent: Wednesday, June 19, 2019 4:22 AM
> > > > > > > To: dev@kafka.apache.org
> > > > > > > Subject: Re: Preliminary blog post for the Apache Kafka 2.3.0
> > > release
> > > > > > >
> > > > > > > Sorry, I copied the wrong URL at first.  Try this URL instead:
> > > > > > >
> > > > > >
> > > >
> > >
> https://blogs.apache.org/preview/kafka/?previewEntry=what-s-new-in-apache
> > > > > > >
> > > > > > > best,
> > > > > > > Colin
> > > > > > >
> > > > > > > On Tue, Jun 18, 2019, at 13:17, Colin McCabe wrote:
> > > > > > > > Hmm.  I'm looking to see if there's any way to open up the
> > > > > > > permissions... :|
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Jun 18, 2019, at 13:12, M. Manna wrote:
> > > > > > > > > It’s asking for credentials...?
> > > > > > > > >
> > > > > > > > > On Tue, 18 Jun 2019 at 15:10, Colin McCabe <
> cmcc...@apache.org
> > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > I've written up a preliminary blog post about the
> upcoming
> > > > Apache
> > > > > > > Kafka
> > > > > > > > > > 2.3.0 release.  Take a look and let me know what you
> think.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> https://blogs.apache.org/roller-ui/authoring/preview/kafka/?previewEntry=what-s-new-in-apache
> > > > > > > > > >
> > > > > > > > > > cheers,
> > > > > > > > > > Colin
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Minimum Replication Factor

2017-06-22 Thread Stephane Maarek
Hi all,

 

Interested in getting people’s opinion on something.

The problem I have is that some people launch streams apps in our cluster but
forget to set a replication factor > 1. It's then a pain to increase the
topics' RF, and we only notice when topic partitions go offline because we
reboot brokers.

 

I have two solutions for this, which I'm interested in hearing opinions on:
1. Make the replication.factor in Kafka Streams "opinionated / smart" by
changing the default to a dynamic min(3, # brokers).
2. Create a "minimum.replication.factor" setting in the Kafka broker. If any
topic is created with an RF less than the minimum, Kafka refuses and doesn't
create the topic. That would ensure no topics get "miscreated" in production
clusters and would ease the pain on devs, devops and support alike.
 

Thoughts? 

My preference goes towards 2). 

 

Cheers!

Stephane 



Re: Minimum Replication Factor

2017-06-23 Thread Stephane Maarek
That's the first time I've seen this setting, wow, it was buried!
I think it makes sense to implement one to get full control.

I wonder if it's still not worth implementing a simple setting, or
implementing a few "simple" topic creation policies that users can just
reference. I don't see that interface implemented anywhere.
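
For anyone else who missed it, the wiring for the setting Edoardo mentions
below is a single broker property pointing at a custom implementation of the
CreateTopicPolicy interface (the class name here is hypothetical):

# server.properties
create.topic.policy.class.name=com.example.MinReplicationFactorPolicy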
 

On 23/6/17, 6:43 pm, "Edoardo Comar"  wrote:

Hi Stephane,
we enforce the constraint in a custom create topic policy (see '
create.topic.policy.class.name')
--
Edoardo Comar
IBM Message Hub
eco...@uk.ibm.com
IBM UK Ltd, Hursley Park, SO21 2JN


    

    From:   Stephane Maarek 
To: "dev@kafka.apache.org" 
Date:   23/06/2017 01:48
Subject:Minimum Replication Factor



Hi all,

 

Interested in getting people’s opinion on something.

The problem I have is that some people launch streams app in our cluster 
but forget to set a replication factor > 1. Then it’s a pain to increase 
the topic’s RF, when we do notice some topic partitions go offline because 
we reboot brokers. 

 

I have two solutions for this, which I’m interested in hearing:
Make the replication.factor in Kafka Streams “opiniated / smart” by 
changing the default to a dynamic min(3, # brokers).
Create a “minimum.replication.factor” in Kafka broker settings. If any 
topic is trying to be created using a RF less than the min, Kafka says no 
and doesn’t create the topic. That would ensure no topics get “miscreated” 
in production clusters and ease the pain on both devs, devops and support.
 

Thoughts? 

My preference goes towards 2). 

 

Cheers!

Stephane 




Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU






Re: [DISCUSS] KIP-186: Increase offsets retention default to 7 days

2017-08-09 Thread Stephane Maarek
Any interest in having offsets.retention.minutes default dynamically to
log.retention.(ms|minutes|hours) if not set, with the option to override it
to a constant value?
That would also address the different types of deployments that modify the
default log retention period.
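
For reference, the KIP itself amounts to raising the broker default (sketch
below); the dynamic tie to log retention suggested above would be an
additional behavior change:

# Proposed new broker default: 7 days instead of 1 day.
offsets.retention.minutes=10080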

On 10/8/17, 5:11 am, "Apurva Mehta"  wrote:

Thanks for the KIP. +1 from me.

On Tue, Aug 8, 2017 at 5:24 PM, Ewen Cheslack-Postava 
wrote:

> Hi all,
>
> I posted a simple new KIP for a problem we see with a lot of users:
> KIP-186: Increase offsets retention default to 7 days
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 186%3A+Increase+offsets+retention+default+to+7+days
>
> Note that in addition to the KIP text itself, the linked JIRA already
> existed and has a bunch of discussion on the subject.
>
> -Ewen
>





Credentials for Confluence

2017-04-10 Thread Stephane Maarek
Hi,

I'd like to create a KIP on Confluence, but according to Matthias Sax I need
credentials.
Can you please provide me some?

Looking forward to drafting my first KIP :)

Regards,
Stephane


Re: Credentials for Confluence

2017-04-12 Thread Stephane Maarek
It’s simplesteph
Thanks for adding me!

On 13 April 2017 at 9:28:35 am, Gwen Shapira (g...@confluent.io) wrote:

What's your Apache wiki username?

On Mon, Apr 10, 2017 at 11:48 PM, Stephane Maarek
 wrote:
> Hi,
>
> I’d like to create a KIP on Confluence but according to Matthias Sax I
need
> credentials.
> Can you please provide me some?
>
> Looking forward to redacting my first KIP :)
>
> Regards,
> Stephane



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


[DISCUSS] KIP 141 - ProducerRecordBuilder Interface

2017-04-19 Thread Stephane Maarek
Hi all,

My first KIP, let me know your thoughts!
https://cwiki.apache.org/confluence/display/KAFKA/KIP+141+-+ProducerRecordBuilder+Interface


Cheers,
Stephane


Re: [DISCUSS] KIP 141 - ProducerRecordBuilder Interface

2017-04-20 Thread Stephane Maarek
Matthias: I was definitely on board with you at first, but Ismael made the 
comment that for:

public ProducerRecord(String topic, K key, V value, Integer partition)
public ProducerRecord(String topic, K key, V value, Long timestamp)

Integer and Long are way too close in terms of meaning, and could easily lead
to misuse of the timestamp / partition fields.
Therefore I started with a builder pattern for explicit argument declaration.
It seems like a lot of boilerplate, but it makes things quite easy to
understand.
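
To make that ambiguity concrete: if both proposed overloads existed, these
two calls would differ only by an "L" suffix (contrived example; neither
overload exists today):

// Hypothetical overloads from this discussion, easy to confuse at a glance:
new ProducerRecord<>("topic", "key", "value", 1);   // Integer -> partition 1
new ProducerRecord<>("topic", "key", "value", 1L);  // Long    -> timestamp 1 ms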

I like your point about the necessity of the key, and that users should set
it to null explicitly.

Damian: I like your idea of public ProducerRecordBuilder(String topic, V value)
Finally, I also chose withForcedPartition because, in my learning of Kafka,
I was always told that the key is solely the determining factor for how a
message makes it to a partition. I find it incredibly unintuitive and
dangerous to give users the ability to force a partition. If anything, they
should be providing their own key -> partition mapping, but I'm really
against letting users force a partition within the ProducerRecord. What do
you think?


What do you both think of the more opiniated:

public ProducerRecordBuilder(String topic, K key, V value)

coming with withPartition and withTimestamp?  
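
At a call site, that hypothetical builder (none of this exists yet; it is the
API under discussion) would read:

// Hypothetical API from this KIP discussion, not an existing Kafka class.
ProducerRecord<String, String> record =
    new ProducerRecordBuilder<String, String>("my-topic", "my-key", "my-value")
        .withPartition(0)
        .withTimestamp(System.currentTimeMillis())
        .build();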



On 21/4/17, 2:24 am, "Matthias J. Sax"  wrote:

Thanks for the KIP!

While I agree, that the current API is not perfect, I am not sure if a
builder pattern does make sense here, because it's not too many parameters.

IMHO, builder pattern makes sense if there are many optional parameters.
For a ProducerRecord, I think there are only 2 optional parameters:
partition and timestamp.

I don't think key should be optional, because users should be "forced" to
think about the key argument as it affects the partitioning. Thus,
providing an explicit `null` if there is no key seems reasonable to me.

Overall I think that providing 3 overloads would be sufficient:

> public ProducerRecord(String topic, K key, V value, Integer partition)
> public ProducerRecord(String topic, K key, V value, Long timestamp)
> public ProducerRecord(String topic, K key, V value, Integer partition, 
Long timestamp)


Just my 2 cents.

-Matthias


On 4/20/17 4:20 AM, Damian Guy wrote:
> Hi Stephane,
> 
> Thanks for the KIP.  Overall it looks ok, though i think the builder 
should
> enforce the required parameters by supplying them via the constructor, 
i.e,
> 
> public ProducerRecordBuilder(String topic, V value)
> 
> You can then remove the withValue and withTopic methods
> 
> I also think withForcedPartition should just be withPartition
> 
    > Thanks,
> Damian
> 
> On Wed, 19 Apr 2017 at 23:34 Stephane Maarek 

> wrote:
> 
>> Hi all,
>>
>> My first KIP, let me know your thoughts!
>>
>> 
https://cwiki.apache.org/confluence/display/KAFKA/KIP+141+-+ProducerRecordBuilder+Interface
>>
>>
>> Cheers,
>> Stephane
>>
> 






Re: [DISCUSS] KIP 141 - ProducerRecordBuilder Interface

2017-04-21 Thread Stephane Maarek
Agreed! I'll update the KIP shortly.
I'm self-taught, so I guess I still have a lot to learn :(

If anything, I think the withForcedPartition method could just be removed,
and if users need to force a partition, shouldn't they have to use a custom
Partitioner? It would achieve the same purpose, but in a much "safer" way.
It just sounds like there would be two ways to achieve the same purpose here.
Nonetheless, I understand it could be a breaking change.

- Stephane
 
 

On 22/4/17, 4:05 am, "Matthias J. Sax"  wrote:

Ismael's comment is quite reasonable.

And I actually like the idea about `withForcedPartition` -- it helps
newcomers to understand the API better. Even if I don't agree with some
of your reasoning:

> I was always told that the key is solely the determining factor

Bad teacher?

> I find it incredibly unintuitive and dangerous to provide the users the 
ability to force a partition

Disagree completely. This is a very useful feature for advanced use
cases. Btw: users can also specify a custom partitioner -- and this
custom partitioner could also do any computation to determine the
partitions (it also has access to key and value -- thus, it could also
use the value to compute the partition).

But I like `withForcedPartition` because it indicates that the
partitioner (default or custom) is not going to be used for this case.


Btw: if we plan to make `public ProducerRecord(String topic, Integer
partition, Long timestamp, K key, V value)` protected at some point, we
should deprecate it, too.


I also like the compromise you suggest

>> public ProducerRecordBuilder(String topic, K key, V value)
>> 
>> coming with withPartition and withTimestamp?  


The only thing, I still don't like is that we use a builder and are thus
forced to call .build() -- boilerplate.

Maybe we would just change ProducerRecord itself? Like:

new ProducerRecord(topic, key,
value).withTimestamp(ts).withForcedPartition(p);

WDYT?


    -Matthias

On 4/20/17 4:57 PM, Stephane Maarek wrote:
> Matthias: I was definitely on board with you at first, but Ismael made 
the comment that for:
> 
> public ProducerRecord(String topic, K key, V value, Integer partition)
> public ProducerRecord(String topic, K key, V value, Long timestamp)
> 
> Integer and Long are way too close in terms of meaning, and could provide 
a strong misuse of the timestamp / partition field. 
> Therefore I started with a builder pattern for explicit argument 
declaration. Seems like a lot of boilerplate, but it makes things quite easy to 
understand.
> 
> I like your point about the necessity of the key, and that users should 
set it to null explicitely.
> 
> Damian: I like your idea of public ProducerRecordBuilder(String topic, V 
value)
> Finally, I also chose the withForcedPartition because in my learning of 
Kafka, I was always told that the key is solely the determining factor to know 
how a messages makes it to a partition. I find it incredibly unintuitive and 
dangerous to provide the users the ability to force a partition. If anything 
they should be providing their own key -> partition mapping, but I’m really 
against letting users force a partition within the producerRecord. What do you 
think?
> 
> 
> What do you both think of the more opiniated:
> 
> public ProducerRecordBuilder(String topic, K key, V value)
> 
> coming with withPartition and withTimestamp?  
> 
> 
> 
> On 21/4/17, 2:24 am, "Matthias J. Sax"  wrote:
> 
> Thanks for the KIP!
> 
> While I agree, that the current API is not perfect, I am not sure if a
> builder pattern does make sense here, because it's not too many 
parameters.
> 
> IMHO, builder pattern makes sense if there are many optional 
parameters.
> For a ProducerRecord, I think there are only 2 optional parameters:
> partition and timestamp.
> 
> I don't think key should be optional, because uses should be "forced" 
to
> think about the key argument as it effects the partitioning. Thus,
> providing an explicit `null` if there is no key seems reasonable to 
me.
> 
> Overall I think that providing 3 overloads would be sufficient:
> 
> > public ProducerRecord(String topic, K key, V value, Integer 
partition)
> > public ProducerRecord(String topic, K key, V value, Long timestamp)
> > public ProducerRecord(String topic, K key, V value, Integer 
partition, Long timestamp)
> 

Re: [DISCUSS] KIP 141 - ProducerRecordBuilder Interface

2017-04-21 Thread Stephane Maarek
Hi Ismael,

Good points
I think I was headed in that direction: 
https://github.com/apache/kafka/pull/2894 
1. That's a possibility. I'm just unsure about how the message format will
evolve in future versions, because adding constructors is painful if more
parameters come into play. The approach above (KIP / PR) can easily allow for
message format extensions.
2./3. Not a bad idea; maybe I'll explore that as well. Just looking for
feedback on the KIP / PR first, as it was updated 15 minutes ago.


On 22/4/17, 10:38 am, "Ismael Juma"  wrote:

Thanks for the KIP. A possible alternative:

1. Add constructor ProducerRecord(String topic, K key, V value, Long
timestamp). This provides an unambiguous constructor that allows one to
pass a timestamp without a partition, which is the main requirement of the
KIP.

We could also consider:

2. Add a couple of `createWithPartition` static factory methods to replace
the existing constructors that take a partition. The idea being that
passing a partition is different enough that it should be called out
specifically.

3. Deprecate the existing constructors that take a partition so that we can
remove them (or make one of them private/protected) in a future release

Because ProducerRecord is used so widely, we should make sure that there is
real value in doing 2 and 3. Otherwise, we should stick to 1.
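
For concreteness, a sketch of what 2 could look like (hypothetical method
name, shown as a helper class here; in the actual proposal the factories
would live on ProducerRecord itself):

import org.apache.kafka.clients.producer.ProducerRecord;

final class ProducerRecords {
    // passing a partition becomes an explicit, named operation
    static <K, V> ProducerRecord<K, V> createWithPartition(
            String topic, int partition, K key, V value) {
        // delegates to the existing (topic, partition, key, value) constructor
        return new ProducerRecord<>(topic, partition, key, value);
    }
}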

Ismael

On Fri, Apr 21, 2017 at 12:57 AM, Stephane Maarek <
steph...@simplemachines.com.au> wrote:

> Matthias: I was definitely on board with you at first, but Ismael made the
> comment that for:
>
> public ProducerRecord(String topic, K key, V value, Integer partition)
> public ProducerRecord(String topic, K key, V value, Long timestamp)
>
> Integer and Long are way too close in terms of meaning, and could easily
> invite misuse of the timestamp / partition fields.
> Therefore I started with a builder pattern for explicit argument
> declaration. It seems like a lot of boilerplate, but it makes things quite
> easy to understand.
>
> I like your point about the necessity of the key, and that users should
> set it to null explicitly.
>
> Damian: I like your idea of public ProducerRecordBuilder(String topic, V
> value)
> Finally, I also chose withForcedPartition because in my learning of
> Kafka, I was always told that the key is the sole factor determining how
> a message makes it to a partition. I find it incredibly unintuitive and
> dangerous to give users the ability to force a partition. If anything,
> they should provide their own key -> partition mapping, but I’m really
> against letting users force a partition within the ProducerRecord. What do
> you think?
>
>
> What do you both think of the more opinionated:
>
> public ProducerRecordBuilder(String topic, K key, V value)
>
> coming with withPartition and withTimestamp?
>
>
>
> On 21/4/17, 2:24 am, "Matthias J. Sax"  wrote:
>
> Thanks for the KIP!
>
> While I agree that the current API is not perfect, I am not sure a
> builder pattern makes sense here, because there are not that many
> parameters.
>
> IMHO, builder pattern makes sense if there are many optional
> parameters.
> For a ProducerRecord, I think there are only 2 optional parameters:
> partition and timestamp.
>
> I don't think key should be optional, because users should be "forced" to
> think about the key argument as it affects the partitioning. Thus,
> providing an explicit `null` if there is no key seems reasonable to me.
>
> Overall I think that providing 3 overloads would be sufficient:
>
> > public ProducerRecord(String topic, K key, V value, Integer
> partition)
> > public ProducerRecord(String topic, K key, V value, Long timestamp)
> > public ProducerRecord(String topic, K key, V value, Integer
> partition, Long timestamp)
>
>
> Just my 2 cents.
>
> -Matthias
>
>
> On 4/20/17 4:20 AM, Damian Guy wrote:
> > Hi Stephane,
> >
> > Thanks for the KIP.  Overall it looks ok, though i think the builder
> should
> > enforce the required parameters by supplying them via the
> constructor, i.e,
> >
> > public ProducerRecordBuilder(String topic, V value)
> >
> > You can then remove the withValue and withTopic methods
> 

Re: [DISCUSS] KIP 141 - ProducerRecordBuilder Interface

2017-04-22 Thread Stephane Maarek
Good call.
That’s when I heavily miss Scala case classes and Options. You get clarity on 
optional vs mandatory fields, one constructor, and immutability. If losing 
immutability is an issue, then the with pattern is a no-go, and I’ll just add 
the missing constructor the way Ismael described it. That’ll make the PR way 
simpler, with limited refactoring.

Regarding the ConsumerRecord, I’m happy to have a look, but it’s the first time 
I’ve seen it come up concretely. When would you manually construct such a 
record? Isn’t the client handling all that for you behind the scenes?
 

On 23/4/17, 3:21 pm, "Michael Pearce"  wrote:

If moving to a wither pattern instead of a builder, how will this enforce 
immutability? E.g. in the current PR it is now possible to change values once 
set.

Or are you proposing to change it to a mutable record, and move to a 
closable record, similar to the closing of the headers on send?

What about the consumer record? Is it also being looked at, so we 
don't end up with two very different styles?

Cheers
Mike



Sent using OWA for iPhone

From: isma...@gmail.com  on behalf of Ismael Juma 

Sent: Saturday, April 22, 2017 11:53:45 PM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP 141 - ProducerRecordBuilder Interface

On Sat, Apr 22, 2017 at 6:16 PM, Matthias J. Sax 
wrote:

> I think Ismael's suggestion is a valid alternative.
>
> However, `timestamp` is an optional field and thus we should have at
> least two constructors for this:
>
>  - ProducerRecord(String topic, K key, V value)

 - ProducerRecord(String topic, K key, V value, Long timestamp)
>

Yes, the other one already exists.

Ismael





Re: [ANNOUNCE] Apache Kafka 0.10.2.0 Released

2017-02-22 Thread Stephane Maarek
Awesome, thanks a lot! When should we expect the dependencies to be released
in Maven? (including the Scala 2.12 artifact)

On 23 February 2017 at 8:27:10 am, Jun Rao (j...@confluent.io) wrote:

Thanks for driving the release, Ewen.

Jun

On Wed, Feb 22, 2017 at 12:33 AM, Ewen Cheslack-Postava 
wrote:

> The Apache Kafka community is pleased to announce the release for Apache
> Kafka 0.10.2.0. This is a feature release which includes the completion
> of 15 KIPs, over 200 bug fixes and improvements, and more than 500 pull
> requests merged.
>
> All of the changes in this release can be found in the release notes:
> https://archive.apache.org/dist/kafka/0.10.2.0/RELEASE_NOTES.html
>
> Apache Kafka is a distributed streaming platform with four core
> APIs:
>
> ** The Producer API allows an application to publish a stream of records to
> one or more Kafka topics.
>
> ** The Consumer API allows an application to subscribe to one or more
> topics and process the stream of records produced to them.
>
> ** The Streams API allows an application to act as a stream processor,
> consuming an input stream from one or more topics and producing an
> output
> stream to one or more output topics, effectively transforming the input
> streams to output streams.
>
> ** The Connector API allows building and running reusable producers or
> consumers that connect Kafka topics to existing applications or data
> systems. For example, a connector to a relational database might capture
> every change to a table.
>
>
> With these APIs, Kafka can be used for two broad classes of application:
>
> ** Building real-time streaming data pipelines that reliably get data
> between systems or applications.
>
> ** Building real-time streaming applications that transform or react to
> the
> streams of data.
>
>
> You can download the source release from
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.
> 0/kafka-0.10.2.0-src.tgz
>
> and binary releases from
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.
> 0/kafka_2.11-0.10.2.0.tgz
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.
> 0/kafka_2.10-0.10.2.0.tgz
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.
> 0/kafka_2.12-0.10.2.0.tgz
> (experimental 2.12 artifact)
>
> Thanks to the 101 contributors on this release!
>
> Akash Sethi, Alex Loddengaard, Alexey Ozeritsky, amethystic, Andrea
> Cosentino, Andrew Olson, Andrew Stevenson, Anton Karamanov, Antony
> Stubbs, Apurva Mehta, Arun Mahadevan, Ashish Singh, Balint Molnar, Ben
> Stopford, Bernard Leach, Bill Bejeck, Colin P. Mccabe, Damian Guy, Dan
> Norwood, Dana Powers, dasl, Derrick Or, Dong Lin, Dustin Cote, Edoardo
> Comar, Edward Ribeiro, Elias Levy, Emanuele Cesena, Eno Thereska, Ewen
> Cheslack-Postava, Flavio Junqueira, fpj, Geoff Anderson, Guozhang Wang,
> Gwen Shapira, Hikiko Murakami, Himani Arora, himani1, Hojjat Jafarpour,
> huxi, Ishita Mandhan, Ismael Juma, Jakub Dziworski, Jan Lukavsky, Jason
> Gustafson, Jay Kreps, Jeff Widman, Jeyhun Karimov, Jiangjie Qin, Joel
> Koshy, Jon Freedman, Joshi, Jozef Koval, Json Tu, Jun He, Jun Rao,
> Kamal, Kamal C, Kamil Szymanski, Kim Christensen, Kiran Pillarisetty,
> Konstantine Karantasis, Lihua Xin, LoneRifle, Magnus Edenhill, Magnus
> Reftel, Manikumar Reddy O, Mark Rose, Mathieu Fenniak, Matthias J. Sax,
> Mayuresh Gharat, MayureshGharat, Michael Schiff, Mickael Maison,
> MURAKAMI Masahiko, Nikki Thean, Olivier Girardot, pengwei-li, pilo,
> Prabhat Kashyap, Qian Zheng, Radai Rosenblatt, radai-rosenblatt, Raghav
> Kumar Gautam, Rajini Sivaram, Rekha Joshi, rnpridgeon, Ryan Pridgeon,
> Sandesh K, Scott Ferguson, Shikhar Bhushan, steve, Stig Rohde Døssing,
> Sumant Tambe, Sumit Arrawatia, Theo, Tim Carey-Smith, Tu Yang, Vahid
> Hashemian, wangzzu, Will Marshall, Xavier Léauté, Xavier Léauté, Xi Hu,
> Yang Wei, yaojuncn, Yuto Kawamura
>
> We welcome your help and feedback. For more information on how to
> report problems, and to get involved, visit the project website at
> http://kafka.apache.org/
>
> Thanks,
> Ewen
>


Re: [ANNOUNCE] Apache Kafka 0.10.2.0 Released

2017-02-22 Thread Stephane Maarek
https://mvnrepository.com/artifact/org.apache.kafka/kafka_2.11

Am I missing something?

On 23 February 2017 at 9:21:08 am, Gwen Shapira (g...@confluent.io) wrote:

I saw them in Maven yesterday?

On Wed, Feb 22, 2017 at 2:15 PM, Stephane Maarek
 wrote:
> Awesome thanks a lot! When should we expect the dependencies to be
released
> in Maven? (including 2.12 scala)
>
> On 23 February 2017 at 8:27:10 am, Jun Rao (j...@confluent.io) wrote:
>
> Thanks for driving the release, Ewen.
>
> Jun
>



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


Re: [DISCUSS] KIP 141 - ProducerRecordBuilder Interface

2017-04-28 Thread Stephane Maarek
Hi

Thanks for all your input. My initial intent was to just add a “timestamp” 
constructor for the ProducerRecord, and it seems greater changes are too 
contentious.
I’ve just rolled back the PR to add the missing constructors, in the way Ismael 
suggested.

Updated KIP here: 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=69406838 
Added constructors here: https://github.com/apache/kafka/pull/2894  
 
Let me know your thoughts,

Regards,
Stephane

On 25/4/17, 2:58 am, "Michael Pearce"  wrote:

Why not simply make a cleaner fluent client API wrapper? The internals use 
and send via the current API, but it provides a cleaner, more fluent API.

A good example here is HTTP Components, where they did this.

https://hc.apache.org/httpcomponents-client-ga/tutorial/html/fluent.html



On 24/04/2017, 17:40, "Matthias J. Sax"  wrote:

Hey Jay,

I understand your concern, and for sure, we will need to keep the
current constructors deprecated for a long time (ie, many years).

But if we don't make the move, we will not be able to improve. And I
think warnings about using deprecated APIs is an acceptable price to
pay. And the API improvements will help new people who adopt Kafka to
get started more easily.

Otherwise Kafka might end up like a lot of other enterprise software, with
lots of old stuff that is kept forever because nobody has the guts to
improve/change it.

Of course, we can still improve the docs of the deprecated constructors,
too.

Just my two cents.


-Matthias

On 4/23/17 3:37 PM, Jay Kreps wrote:
> Hey guys,
>
> I definitely think that the constructors could have been better 
designed,
> but I think given that they're in heavy use I don't think this 
proposal
> will improve things. Deprecating constructors just leaves everyone 
with
> lots of warnings and crossed out things. We can't actually delete the
> methods because lots of code needs to be usable across multiple Kafka
> versions, right? So we aren't picking between the original approach 
(worse)
> and the new approach (better); what we are proposing is a perpetual
> mingling of the original style and the new style with a bunch of 
deprecated
> stuff, which I think is worst of all.
>
> I'd vote for just documenting the meaning of null in the 
ProducerRecord
> constructor.
    >
> -Jay
>
> On Wed, Apr 19, 2017 at 3:34 PM, Stephane Maarek <
> steph...@simplemachines.com.au> wrote:
>
>> Hi all,
>>
>> My first KIP, let me know your thoughts!
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP+
>> 141+-+ProducerRecordBuilder+Interface
>>
>>
>> Cheers,
>> Stephane
>>
>








Re: [DISCUSS] KIP 141 - ProducerRecordBuilder Interface

2017-04-30 Thread Stephane Maarek
I’m not sure how people would feel about having two distinct ways to build 
the same object.
An API wrapper may be useful, but it doesn’t impose an opinion about how one 
should program; that’s just driven by the docs. 
I’m okay with that, but we need consensus.
 

On 1/5/17, 6:08 am, "Michael Pearce"  wrote:

Why not, instead of deprecating or removing what’s there (as noted, it’s a 
point of preference), think about something that could wrap the existing API, 
but provide an API that, for you, is cleaner?

E.g. here's a sample idea building on a fluent API style (this wraps the 
producer and producer records, so no changes are needed):

https://gist.github.com/michaelandrepearce/de0f5ad4aa7d39d243781741c58c293e

In future, as new items are added to ProducerRecord, they just become new 
methods in the fluent API, since it builds the ProducerRecord using the most 
exhaustive constructor.
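
A rough sketch of that idea (invented names, not the gist's actual code;
the wrapper only ever calls the existing public constructors, so nothing
in the producer needs to change):

import org.apache.kafka.clients.producer.ProducerRecord;

final class Records {
    static <K, V> Builder<K, V> to(String topic) {
        return new Builder<>(topic);
    }

    static final class Builder<K, V> {
        private final String topic;
        private K key;
        private V value;
        private Integer partition;
        private Long timestamp;

        private Builder(String topic) { this.topic = topic; }

        Builder<K, V> key(K key) { this.key = key; return this; }
        Builder<K, V> value(V value) { this.value = value; return this; }
        Builder<K, V> partition(int p) { this.partition = p; return this; }
        Builder<K, V> timestamp(long ts) { this.timestamp = ts; return this; }

        // always delegates to the most exhaustive existing constructor
        ProducerRecord<K, V> build() {
            return new ProducerRecord<>(topic, partition, timestamp, key, value);
        }
    }
}

// usage: Records.<String, String>to("topic").key("k").value("v").build()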




From: Matthias J. Sax 
Sent: Saturday, April 29, 2017 6:52 PM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP 141 - ProducerRecordBuilder Interface

I understand that we cannot just break stuff (btw: also not for
Streams!). But deprecating does not break anything, so I don't think
it's a big deal to change the API as long as we keep the old API as
deprecated.


-Matthias

On 4/29/17 9:28 AM, Jay Kreps wrote:
> Hey Matthias,
>
> Yeah I agree, I'm not against change as a general thing! I also think if
> you look back on the last two years, we completely rewrote the producer 
and
> consumer APIs, reworked the binary protocol many times over, and added the
> connector and stream processing apis, both major new additions. So I don't
> think we're in too much danger of stagnating!
>
> My two cents was just around breaking compatibility for trivial changes
> like constructor => builder. I think this only applies to the producer,
> consumer, and connect apis which are heavily embedded in hundreds of
> ecosystem components that depend on them. This is different from direct
> usage. If we break the streams api it is really no big deal---apps just
> need to rebuild when they upgrade, not the end of the world at all. 
However
> because many intermediate things depend on the Kafka producer you can 
cause
> these weird situations where your app depends on two third party things
> that use Kafka and each requires different, incompatible versions. We did
> this a lot in earlier versions of Kafka and it was the cause of much angst
> (and an ingrained general reluctance to upgrade) from our users.
>
> I still think we may have to break things, i just don't think we should do
> it for things like builders vs direct constructors which i think are kind
> of a debatable matter of taste.
>
> -Jay
>
>
>
> On Mon, Apr 24, 2017 at 9:40 AM, Matthias J. Sax 
> wrote:
>
>> Hey Jay,
>>
>> I understand your concern, and for sure, we will need to keep the
>> current constructors deprecated for a long time (ie, many years).
>>
>> But if we don't make the move, we will not be able to improve. And I
>> think warnings about using deprecated APIs is an acceptable price to
>> pay. And the API improvements will help new people who adopt Kafka to
>> get started more easily.
>>
>> Otherwise Kafka might end up like a lot of other enterprise software, with
>> lots of old stuff that is kept forever because nobody has the guts to
>> improve/change it.
>>
>> Of course, we can still improve the docs of the deprecated constructors,
>> too.
>>
>> Just my two cents.
>>
>>
>> -Matthias
>>
>> On 4/23/17 3:37 PM, Jay Kreps wrote:
>>> Hey guys,
>>>
>>> I definitely think that the constructors could have been better 
designed,
>>> but I think given that they're in heavy use I don't think this proposal
>>> will improve things. Deprecating constructors just leaves everyone with
>>> lots of warnings and crossed out things. We can't actually delete the
>>> methods because lots of code needs to be usable across multiple Kafka
>>> versions, right? So we aren't picking between the original approach
>> (worse)
>>> and the new approach (better); what we are proposing is a perpetual
>>> mingling of the original style and the new style with a bunch of
>> deprecated
>>> stuff, which 

Re: [DISCUSS] KIP-146: Classloading Isolation in Connect

2017-05-02 Thread Stephane Maarek
Thanks for the work, it’s definitely needed!
I’d like to suggest to take it one step further.

To me, I’d like to see Kafka Connect the same way we have Docker and Docker 
repositories.

Here’s how I would envision the flow:
- Kafka Connect workers are just workers. They come with no jars whatsoever
- The REST API allows you to add a config to the connect cluster 
- The workers, seeing the config, pull the jars from the available (maven?) 
repositories (public or private)
- Classpath isolation comes into play so that the pulled jar doesn’t interact 
with other connectors
- Additionally, I believe the config should have a “tag” or “version” (like 
docker really), so that 
o you can run different versions of the same connector on your connect cluster
o configurations are strongly linked to a connector (right now if I update my 
connector jars, I may break my configuration)

I know this is a bit out of scope, but if major changes are coming to connect 
then these are my two cents.

Finally, maybe extend that construct to Transformers. The ability to 
externalise transformers as jars would democratize their usage IMO


On 3/5/17, 4:24 am, "Ewen Cheslack-Postava"  wrote:

Thanks for the KIP.

A few responses inline, followed by additional comments.

On Mon, May 1, 2017 at 9:50 PM, Konstantine Karantasis <
konstant...@confluent.io> wrote:

> Gwen, Randall thank you for your very insightful observations. I'm glad 
you
> find this first draft to be an adequate platform for discussion.
>
> I'll attempt replying to your comments in order.
>
> Gwen, I also debated exactly the same two options: a) interpreting absence
> of module path as a user's intention to turn off isolation and b)
> explicitly using an additional boolean property. A few reasons why I went
> with b) in this first draft are:
> 1) As Randall mentions, to leave the option of using a default value open.
> If not immediately in the first version of isolation, maybe in the future.
> 2) I didn't like the implicit character of the choice of interpreting an
> empty string as a clear intention to turn isolation off by the user. Half
> the time could be just that users forget to set a location, although 
they'd
> like to use class loading isolation.
> 3) There's a slim possibility that in rare occasions a user might want to
> avoid even the slightest increase in memory consumption due to class
> loading duplication. I admit this should be very rare, but given the other
> concerns and that we would really like to keep the isolation 
implementation
> simple, the option to turn off this feature by using only one additional
> config property might not seem too excessive. At least at the start of 
this
> discussion.
> 4) Debugging during development might be simpler in some cases.
> 5) Finally, as you mention, this could allow for smoother upgrades.
>

I'm not sure any of these keep you from removing the extra config. Is there
any reason you couldn't have clean support for relying on the CLASSPATH
while still supporting the classloaders? Then getting people onto the new
classloaders does require documentation for how to install connectors, but
that's pretty minimal. And we don't break existing installations where
people are just adding to the CLASSPATH. It seems like this:

1. Allows you to set a default. Isolation is always enabled, but we won't
include any paths/directories we already use. Setting a default just
requires specifying a new location where we'd hold these directories.
2. It doesn't require the implicit choice -- you actually never turn off
isolation, but still support the regular CLASSPATH with an empty list of
isolated loaders
3. The user can still use CLASSPATH if they want to minimize classloader
overhead
4. Debugging can still use CLASSPATH
5. Upgrades just work.


>
> Randall, regarding your comments:
> 1) To keep its focus narrow, this KIP, as well as the first implementation
> of isolation in Connect, assume filesystem based discovery. With careful
> implementation, transitioning to discovery schemes that support broader
> URIs I believe should be easy in the future.
>

Maybe just mention a couple of quick examples in the KIP. When described
inline it might be more obvious that it will extend cleanly.


> 2) The example you give makes a good point. However I'm inclined to say
> that such cases should be addressed more as exceptions rather than as 
being
> the common case. Therefore, I wouldn't see all dependencies imported by 
the
> framework as required to be filtered out, because in that case we lose the
> advantage of isolation between the framework and the connectors (and we 
are
> left only with isolation between connectors).

3) I tried to abstract implem

Re: [DISCUSS] KIP 141 - ProducerRecordBuilder Interface

2017-05-03 Thread Stephane Maarek
reluctance to upgrade) from our users.
> > >
> > > I still think we may have to break things, i just don't think we 
should
> > do
> > > it for things like builders vs direct constructors which i think are
> kind
> > > of a debatable matter of taste.
> > >
> > > -Jay
> > >
> > >
> > >
> > > On Mon, Apr 24, 2017 at 9:40 AM, Matthias J. Sax <
> matth...@confluent.io>
> > > wrote:
> > >
> > >> Hey Jay,
> > >>
> > >> I understand your concern, and for sure, we will need to keep the
> > >> current constructors deprecated for a long time (ie, many years).
> > >>
> > >> But if we don't make the move, we will not be able to improve. And I
> > >> think warnings about using deprecated APIs is an acceptable price to
> > >> pay. And the API improvements will help new people who adopt Kafka to
> > >> get started more easily.
> > >>
> > >> Otherwise Kafka might end up like a lot of other enterprise software,
> > >> with lots of old stuff that is kept forever because nobody has the guts
> > >> to improve/change it.
> > >>
> > >> Of course, we can still improve the docs of the deprecated
> constructors,
> > >> too.
> > >>
> > >> Just my two cents.
> > >>
> > >>
> > >> -Matthias
> > >>
> > >> On 4/23/17 3:37 PM, Jay Kreps wrote:
> > >>> Hey guys,
> > >>>
> > >>> I definitely think that the constructors could have been better
> > designed,
    > > >>> but I think given that they're in heavy use I don't think this
> proposal
> > >>> will improve things. Deprecating constructors just leaves everyone
> with
> > >>> lots of warnings and crossed out things. We can't actually delete 
the
> > >>> methods because lots of code needs to be usable across multiple 
Kafka
> > >>> versions, right? So we aren't picking between the original approach
> > >> (worse)
> > >>> and the new approach (better); what we are proposing is a perpetual
> > >>> mingling of the original style and the new style with a bunch of
> > >> deprecated
> > >>> stuff, which I think is worst of all.
> > >>>
> > >>> I'd vote for just documenting the meaning of null in the
> ProducerRecord
> > >>> constructor.
> > >>>
> > >>> -Jay
> > >>>
> > >>> On Wed, Apr 19, 2017 at 3:34 PM, Stephane Maarek <
> > >>> steph...@simplemachines.com.au> wrote:
> > >>>
> > >>>> Hi all,
> > >>>>
> > >>>> My first KIP, let me know your thoughts!
> > >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP+
> > >>>> 141+-+ProducerRecordBuilder+Interface
> > >>>>
> > >>>>
> > >>>> Cheers,
> > >>>> Stephane
> > >>>>
> > >>>
> > >>
> > >>
> > >
> >
> >
>





Re: [DISCUSS] KIP-146: Classloading Isolation in Connect

2017-05-03 Thread Stephane Maarek
Glad you find the feedback useful !
Definitely all the ideas should be split into reasonable-length KIPs. I just 
want to make sure the ideas are not lost. I won’t create the subsequent KIPs 
because I’m not good enough to implement the changes, but I’m happy to keep 
providing feedback along the way. 

Regarding the versioning comments: yes, there’s a version on Connector, but how 
is that referenced in the config? I believe no config exposes a “version” 
field that would tie a configuration to a connector version.
Regarding shipping Connect with a few connectors: that’s fine, but once the 
capability to pull from Maven is here, I’d rather have a vanilla lightweight 
Connect. Anyway, discussions for later. 
 
 

On 4/5/17, 4:17 am, "Konstantine Karantasis"  wrote:

Thank you Stephane,

your comments bring interesting and useful subjects to the discussion. I'm
adding my replies below Ewen's comments.


On Tue, May 2, 2017 at 10:15 PM, Ewen Cheslack-Postava 
wrote:

> On Tue, May 2, 2017 at 10:01 PM, Stephane Maarek <
> steph...@simplemachines.com.au> wrote:
>
> Excellent feedback, Stephane!
>
> Thanks for the work, it’s definitely needed!
> > I’d like to suggest to take it one step further.
> >
> > To me, I’d like to see Kafka Connect the same way we have Docker and
> > Docker repositories.
> >
> > Here’s how I would envision the flow:
> > - Kafka Connect workers are just workers. They come with no jars
> whatsoever
> > - The REST API allow you to add a config to the connect cluster
> > - The workers, seeing the config, pull the jars from the available
> > (maven?) repositories (public or private)
> >
>
> I think supporting this mode is really valuable. It seems *really*
> attractive if you have some easily accessible, scalable, centralized
> storage for connectors (i.e. some central distributed FS).
>
> But having to jump through these hoops (which presumably include some 
extra
> URLs to point to the connectors, some definition of what that URL points 
to
> e.g. zip of jars? uberjar? something with a more complex spec?) just to do
> a quickstart seems like quite a bit of overhead. I think we should 
consider
> both a) incremental progress that opens up new options but doesn't prevent
> us from the ideal end state and b) all local testing/dev/prod use cases
> (which is also why I still like having the plain old CLASSPATH option
> available).
>
> I think the proposal leaves open the scope for this -- it doesn't specify
> connector-specific overrides, but that's obviously something that could be
> added easily.
>
>
Regarding 1) the workers carrying no modules, I believe we are pretty close
to this situation today, at least in terms of connectors. Still, I feel
that whether we bundle a few basic modules with the framework is orthogonal
to class loading isolation. I agree with you that Connectors, Converters
and Transformations should be external modules that are loaded by the
framework. But having a few basic ones shipped with Connect simplifies
on-boarding and quickstarts significantly.

Regarding 2) and 3) these are very good to have, and definitely belong to
the near term vision for Kafka Connect. Many interesting things to do here,
from programmatically fetching connectors and their dependencies from maven
repos (using something like Aether maybe), to deciding to support certain
types of module bundling such as zip, uberjars etc. However dealing with
the issue of extended discoverability at this point seems to broaden
significantly the scope of this KIP. I think it's more practical (and
probably faster) to proceed in phases. I estimate that after this KIP,
subsequent KIPs will be intuitive and quite transparent.



> > - Classpath isolation comes into play so that the pulled jar doesn’t
> > interact with other connectors
> > - Additionally, I believe the config should have a “tag” or “version”
> > (like docker really), so that
> > o you can run different versions of the same connector on your connect
> > cluster
> > o configurations are strongly linked to a connector (right now if I
> update
> > my connector jars, I may break my configuration)
> >
>
> We have versions on Connectors. We don't have them on transformations or
> converters, which was definitely an oversight -- we should figure out how
> to get them in there.
>
> I think none of the proposals here rule out taking advantage of versioning
> i

Re: [VOTE] KIP-154 Add Kafka Connect configuration properties for creating internal topics

2017-05-08 Thread Stephane Maarek
+1 (non binding) 
 
 

On 9/5/17, 5:51 am, "Randall Hauch"  wrote:

Hi, everyone.

Given the simple and non-controversial nature of the KIP, I would like to
start the voting process for KIP-154: Add Kafka Connect configuration
properties for creating internal topics:


https://cwiki.apache.org/confluence/display/KAFKA/KIP-154+Add+Kafka+Connect+configuration+properties+for+creating+internal+topics

The vote will run for a minimum of 72 hours.

Thanks,

Randall





Re: [VOTE] KIP-146: Isolation of dependencies and classes in Kafka Connect (restarted voting thread)

2017-05-08 Thread Stephane Maarek
+1
Thanks heaps I can’t wait!
 

On 9/5/17, 4:48 am, "Konstantine Karantasis"  wrote:

** Restarting the voting thread here, with a different title to avoid
collapsing this thread's messages with the discussion thread's messages in
mail clients. Apologies for the inconvenience. **


Hi all,

Given that the comments during the discussion seem to have been addressed,
I'm pleased to bring

KIP-146: Classloading Isolation in Connect
https://cwiki.apache.org/confluence/display/KAFKA/KIP-
146+-+Classloading+Isolation+in+Connect

up for voting. Again, this KIP aims to bring the highly desired feature of
dependency isolation in Kafka Connect.

In the meantime, for any additional feedback, please continue to send your
comments in the discussion thread here:

https://www.mail-archive.com/dev@kafka.apache.org/msg71453.html

This voting thread will stay active for a minimum of 72 hours.

Sincerely,
Konstantine





[jira] [Created] (KAFKA-7066) Make Streams Runtime Error User Friendly in Case of Serialisation exception

2018-06-16 Thread Stephane Maarek (JIRA)
Stephane Maarek created KAFKA-7066:
--

 Summary: Make Streams Runtime Error User Friendly in Case of 
Serialisation exception
 Key: KAFKA-7066
 URL: https://issues.apache.org/jira/browse/KAFKA-7066
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Affects Versions: 1.1.0
Reporter: Stephane Maarek
 Fix For: 2.0.0


This kind of exception can be cryptic for the beginner:
{code:java}
ERROR stream-thread 
[favourite-colors-application-a336770d-6ba6-4bbb-8681-3c8ea91bd12e-StreamThread-1]
 Failed to process stream task 2_0 due to the following error: 
(org.apache.kafka.streams.processor.internals.AssignedStreamsTasks:105)
java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.String
at 
org.apache.kafka.common.serialization.StringSerializer.serialize(StringSerializer.java:28)
at org.apache.kafka.streams.state.StateSerdes.rawValue(StateSerdes.java:178)
at 
org.apache.kafka.streams.state.internals.MeteredKeyValueBytesStore$1.innerValue(MeteredKeyValueBytesStore.java:66)
at 
org.apache.kafka.streams.state.internals.MeteredKeyValueBytesStore$1.innerValue(MeteredKeyValueBytesStore.java:57)
at 
org.apache.kafka.streams.state.internals.InnerMeteredKeyValueStore.put(InnerMeteredKeyValueStore.java:198)
at 
org.apache.kafka.streams.state.internals.MeteredKeyValueBytesStore.put(MeteredKeyValueBytesStore.java:117)
at 
org.apache.kafka.streams.kstream.internals.KTableAggregate$KTableAggregateProcessor.process(KTableAggregate.java:95)
at 
org.apache.kafka.streams.kstream.internals.KTableAggregate$KTableAggregateProcessor.process(KTableAggregate.java:56)
at 
org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:46)
at 
org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:208)
at 
org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:124)
at 
org.apache.kafka.streams.processor.internals.AbstractProcessorContext.forward(AbstractProcessorContext.java:174)
at 
org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:80)
at 
org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:224)
at 
org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.process(AssignedStreamsTasks.java:94)
at 
org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:411)
at 
org.apache.kafka.streams.processor.internals.StreamThread.processAndMaybeCommit(StreamThread.java:918)
at 
org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:798)
at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:750)
at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:720){code}

We should add the more detailed logging already present in SinkNode to assist 
the user in solving this issue.
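
For reference, a minimal snippet (not from the application above) that
reproduces the same error by mirroring the raw cast the state store's serde
performs internally:

{code:java}
import org.apache.kafka.common.serialization.StringSerializer;

public class SerdeMismatch {
    public static void main(String[] args) {
        StringSerializer serializer = new StringSerializer();
        Object value = 42L; // e.g. the Long produced by an aggregation such as count()
        // casting the aggregate value to the (default String) serde's type is
        // effectively what happens inside the store, so this throws
        // java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.String
        byte[] bytes = serializer.serialize("some-topic", (String) value);
    }
}
{code}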



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7077) producerProps.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5") producerProps.put(ProducerConfig.ENABLE_IDEMPOTENCE, true)

2018-06-19 Thread Stephane Maarek (JIRA)
Stephane Maarek created KAFKA-7077:
--

 Summary: 
producerProps.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5") 
producerProps.put(ProducerConfig.ENABLE_IDEMPOTENCE, true)
 Key: KAFKA-7077
 URL: https://issues.apache.org/jira/browse/KAFKA-7077
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Affects Versions: 2.0.0
Reporter: Stephane Maarek
Assignee: Stephane Maarek






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7355) Topic Configuration Changes are not applied until reboot

2018-08-29 Thread Stephane Maarek (JIRA)
Stephane Maarek created KAFKA-7355:
--

 Summary: Topic Configuration Changes are not applied until reboot
 Key: KAFKA-7355
 URL: https://issues.apache.org/jira/browse/KAFKA-7355
 Project: Kafka
  Issue Type: Bug
  Components: config, core
Affects Versions: 2.0.0
Reporter: Stephane Maarek


Steps to reproduce:

{code}
kafka-topics --zookeeper 127.0.0.1:2181 --create --topic employee-salary 
--partitions 1 --replication-factor 1

kafka-configs --zookeeper 127.0.0.1:2181 --alter --entity-type topics 
--entity-name employee-salary --add-config 
cleanup.policy=compact,min.cleanable.dirty.ratio=0.001,segment.ms=5000

kafka-configs --zookeeper 127.0.0.1:2181 --alter --entity-type topics 
--entity-name employee-salary

kafka-console-producer --broker-list 127.0.0.1:9092 --topic employee-salary 
--property parse.key=true --property key.separator=,
{code}

Try publishing a bunch of data, and no segment rollover will happen (even 
though segment.ms=5000). I looked at the Kafka directory and the Kafka logs to 
confirm it.

I noticed the broker processed the notification of config changes, but its 
behaviour was nonetheless not updated to use the new config values. 

After restarting the broker, the expected behaviour is observed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-5992) Better Java Documentation for AdminClient Exceptions

2017-09-28 Thread Stephane Maarek (JIRA)
Stephane Maarek created KAFKA-5992:
--

 Summary: Better Java Documentation for AdminClient Exceptions
 Key: KAFKA-5992
 URL: https://issues.apache.org/jira/browse/KAFKA-5992
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.11.0.1
Reporter: Stephane Maarek


When invoking a describeTopics operation on a topic that does not exist, we get 
an InvalidTopicException as a RuntimeException.

I believe this should be documented, and the API maybe changed:

For example changing:
{code:java}
public DescribeTopicsResult describeTopics(Collection topicNames) {
{code}

To:
{code:java}
public DescribeTopicsResult describeTopics(Collection topicNames) 
throws InvalidTopicException 
{code}

Additionally, in case multiple topics don't exist, only the first one will 
throw an error. This is really not scalable. 

Maybe the DescribeTopicsResult could have a Boolean "topicExists"? 
Up for discussion.
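
For context, a sketch of how the failure surfaces to callers today (standard
AdminClient usage; depending on the failure, the error is either thrown from
the call itself as a RuntimeException or wrapped in an ExecutionException when
the future is resolved):

{code:java}
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.AdminClient;

public class DescribeMissingTopic {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            try {
                admin.describeTopics(Collections.singleton("does-not-exist")).all().get();
            } catch (ExecutionException e) {
                // nothing in the method signature hints at this failure mode
                System.err.println("describeTopics failed: " + e.getCause());
            }
        }
    }
}
{code}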





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-5993) Kafka AdminClient does not support standard security settings

2017-09-28 Thread Stephane Maarek (JIRA)
Stephane Maarek created KAFKA-5993:
--

 Summary: Kafka AdminClient does not support standard security 
settings
 Key: KAFKA-5993
 URL: https://issues.apache.org/jira/browse/KAFKA-5993
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.11.0.1
Reporter: Stephane Maarek


Kafka Admin Client does not support basic security configurations, such as 
"sasl.jaas.config".
This makes it impossible to use against a secure cluster:

```
14:12:12.948 [main] WARN  org.apache.kafka.clients.admin.AdminClientConfig - 
The configuration 'sasl.jaas.config' was supplied but isn't a known config.
```
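
For reference, a sketch of the configuration in question (these are the
standard client security config keys):

```
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;

public class SecureAdminClient {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9093");
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"alice\" password=\"alice-secret\";");
        // on 0.11.0.1 the line above triggers the warning shown in this issue
        AdminClient admin = AdminClient.create(props);
        admin.close();
    }
}
```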



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-6007) Connect can't validate against transforms in plugins.path

2017-10-03 Thread Stephane Maarek (JIRA)
Stephane Maarek created KAFKA-6007:
--

 Summary: Connect can't validate against transforms in plugins.path
 Key: KAFKA-6007
 URL: https://issues.apache.org/jira/browse/KAFKA-6007
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.11.0.1
Reporter: Stephane Maarek


Kafka Connect can't validate a custom transformation if placed in plugins path.
Here's the output I get on the validate call:


{code:java}
Invalid value com.mycorp.kafka.transforms.impl.FlattenSinkRecord for 
configuration transforms.Flat.type: Class 
com.mycorp.kafka.transforms.impl.FlattenSinkRecord could not be found.
Invalid value null for configuration transforms.Flat.type: Not a Transformation
"recommended_values": [   
"com.mycorp.kafka.transforms.Flatten$Key",
"com.mycorp.kafka.transforms.Flatten$Value",
"com.mycorp.kafka.transforms.impl.FlattenSinkRecord",
"org.apache.kafka.connect.transforms.Cast$Key",
"org.apache.kafka.connect.transforms.Cast$Value",
"org.apache.kafka.connect.transforms.ExtractField$Key",
"org.apache.kafka.connect.transforms.ExtractField$Value",
"org.apache.kafka.connect.transforms.Flatten$Key",
"org.apache.kafka.connect.transforms.Flatten$Value",
"org.apache.kafka.connect.transforms.HoistField$Key",
"org.apache.kafka.connect.transforms.HoistField$Value",
"org.apache.kafka.connect.transforms.InsertField$Key",
"org.apache.kafka.connect.transforms.InsertField$Value",
"org.apache.kafka.connect.transforms.MaskField$Key",
"org.apache.kafka.connect.transforms.MaskField$Value",
"org.apache.kafka.connect.transforms.RegexRouter",
"org.apache.kafka.connect.transforms.ReplaceField$Key",
"org.apache.kafka.connect.transforms.ReplaceField$Value",
"org.apache.kafka.connect.transforms.SetSchemaMetadata$Key",
"org.apache.kafka.connect.transforms.SetSchemaMetadata$Value",
"org.apache.kafka.connect.transforms.TimestampConverter$Key",
"org.apache.kafka.connect.transforms.TimestampConverter$Value",
"org.apache.kafka.connect.transforms.TimestampRouter",
"org.apache.kafka.connect.transforms.ValueToKey"],

{code}

As you can see, the class appears in the recommended values (!) but can't be 
picked up by the validate call. 

I believe it's because the recommender implements class discovery using plugins:
https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/ConnectorConfig.java#L194

But the class inference itself doesn't:
https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/ConnectorConfig.java#L199

(I'm not an expert in class loading though, just a guess... Unsure how to fix)

A quick fix is to add the transformations to the classpath itself, but that 
defeats the point a bit. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-5993) Kafka Admin Client throws a warning for sasl.jaas.config

2017-10-03 Thread Stephane Maarek (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephane Maarek resolved KAFKA-5993.

Resolution: Not A Problem

> Kafka Admin Client throws a warning for sasl.jaas.config
> 
>
> Key: KAFKA-5993
> URL: https://issues.apache.org/jira/browse/KAFKA-5993
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.1
>        Reporter: Stephane Maarek
>
> Kafka Admin Client does not support basic security configurations, such as 
> "sasl.jaas.config".
> Therefore it makes it impossible to use against a secure cluster
> ```
> 14:12:12.948 [main] WARN  org.apache.kafka.clients.admin.AdminClientConfig - 
> The configuration 'sasl.jaas.config' was supplied but isn't a known config.
> ```



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-6092) Time passed in punctuate call is currentTime, not punctuate schedule time.

2017-10-19 Thread Stephane Maarek (JIRA)
Stephane Maarek created KAFKA-6092:
--

 Summary: Time passed in punctuate call is currentTime, not 
punctuate schedule time. 
 Key: KAFKA-6092
 URL: https://issues.apache.org/jira/browse/KAFKA-6092
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 0.11.0.0
Reporter: Stephane Maarek


The javadoc specifies that for a Transformer, calling context.schedule(1000) 
causes punctuate to be called every 1000ms. This is not entirely accurate: if 
no data is received for a while, punctuate won't be called.

{code}
 * void init(ProcessorContext context) {
 * this.context = context;
 * this.state = context.getStateStore("myTransformState");
 * context.schedule(1000); // call #punctuate() each 1000ms
 * }
{code}

When new data is received, say after 20 seconds, punctuate plays catch-up and 
is called 20 times upon reception of the new data. 

the signature of punctuate is
{code}
* KeyValue punctuate(long timestamp) {
 * // can access this.state
 * // can emit as many new KeyValue pairs as required via 
this.context#forward()
 * return null; // don't return result -- can also be "new 
KeyValue()"
 * }
{code}

but the timestamp being passed is the current timestamp at the time of the 
call to punctuate, not the time at which the punctuation was scheduled. This 
is very confusing, and I think the timestamp should represent the time at 
which the punctuate call should have been triggered. Passing the current 
timestamp adds little information, as it can easily be obtained using 
System.currentTimeMillis().
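
For completeness, a minimal transformer against the 0.11 API showing the
scheduling and the punctuate signature discussed above (names are
illustrative):

{code:java}
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;

public class MyTransformer implements Transformer<String, String, KeyValue<String, String>> {
    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
        context.schedule(1000); // stream-time based: fires only as new records advance time
    }

    @Override
    public KeyValue<String, String> transform(String key, String value) {
        return KeyValue.pair(key, value);
    }

    @Override
    public KeyValue<String, String> punctuate(long timestamp) {
        // timestamp is the time at the moment of the (possibly late) call,
        // not the boundary the punctuation was scheduled for
        return null;
    }

    @Override
    public void close() {}
}
{code}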



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-5468) Change Source offset commit message to info to match Sink behaviour

2017-06-18 Thread Stephane Maarek (JIRA)
Stephane Maarek created KAFKA-5468:
--

 Summary: Change Source offset commit message to info to match Sink 
behaviour
 Key: KAFKA-5468
 URL: https://issues.apache.org/jira/browse/KAFKA-5468
 Project: Kafka
  Issue Type: Improvement
Reporter: Stephane Maarek


When WorkerSinkTask does a commit, we get an INFO message:
https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L278

When WorkerSourceTask does a commit, we get a DEBUG message:
https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L284

Changing it to INFO makes it easier to measure timeouts, get a better 
understanding of source connector behaviour, and tune the timeout settings. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-5851) CPU overhead to having partitions (even inactive)

2017-09-07 Thread Stephane Maarek (JIRA)
Stephane Maarek created KAFKA-5851:
--

 Summary: CPU overhead to having partitions (even inactive)
 Key: KAFKA-5851
 URL: https://issues.apache.org/jira/browse/KAFKA-5851
 Project: Kafka
  Issue Type: Bug
 Environment: 0.11.0.0, 0.10.x
Reporter: Stephane Maarek


We're running on three r4.xlarge, and this is our dev environment.

Some people managed to create 1200 topics with 3 partitions each, so we end up 
at 4000 partitions per broker (we have a replication factor of 3). Even though 
almost no data goes through the cluster (it sits at a comfortable 20 messages 
per second), we saw CPU at 100% (out of 400%).

I went ahead today and deleted 700 topics that I knew were unused, and CPU 
went down drastically. See image:

!https://i.imgur.com/OIPTwDM.png!

We use the defaults for our brokers, and they use PLAINTEXT internally to 
replicate. 

I'm not sure of the root cause of this (threads, replication, log cleanup, 
etc.), and I guess it wouldn't be too hard to replicate (just create 1000 
topics on a vanilla cluster and watch CPU go up). 

Hope that helps



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-4560) Min / Max Partitions Fetch Records params

2016-12-19 Thread Stephane Maarek (JIRA)
Stephane Maarek created KAFKA-4560:
--

 Summary: Min / Max Partitions Fetch Records params
 Key: KAFKA-4560
 URL: https://issues.apache.org/jira/browse/KAFKA-4560
 Project: Kafka
  Issue Type: Improvement
  Components: consumer
Affects Versions: 0.10.0.1
Reporter: Stephane Maarek


There is currently a `max.partition.fetch.bytes` parameter to limit the total 
size of the fetch call (also a min).

Sometimes I'd like to control how many records altogether I'm getting at a 
time, and I'd like to see a `max.partition.fetch.records` (also a min).

If both are specified the first condition that is met would complete the fetch 
call. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4560) Min / Max Partitions Fetch Records params

2016-12-19 Thread Stephane Maarek (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762979#comment-15762979
 ] 

Stephane Maarek commented on KAFKA-4560:


Hi [~huxi_2b], this setting has the following description: "The maximum number 
of records returned in a single call to poll()." 

This doesn't affect how many records are returned per partition in the end; it 
just affects how many records are returned by each poll call within a loop. 
As you can see, 
https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L991
 isn't affected by the parameter max.partition.fetch.bytes, which is probably 
applied by a higher-level wrapper call (I can't find it).

> Min / Max Partitions Fetch Records params
> -
>
> Key: KAFKA-4560
> URL: https://issues.apache.org/jira/browse/KAFKA-4560
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Affects Versions: 0.10.0.1
>Reporter: Stephane Maarek
>  Labels: features, newbie
>
> There is currently a `max.partition.fetch.bytes` parameter to limit the total 
> size of the fetch call (also a min).
> Sometimes I'd like to control how many records altogether I'm getting at the 
> time and I'd like to see a `max.partition.fetch.records` (also a min).
> If both are specified the first condition that is met would complete the 
> fetch call. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4560) Min / Max Partitions Fetch Records params

2016-12-19 Thread Stephane Maarek (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15763036#comment-15763036
 ] 

Stephane Maarek commented on KAFKA-4560:


Basically, if you set the max bytes, each partition will fetch records until 
the max bytes is met. 

My ticket is a feature request to offer a similar parameter, named 
max.partition.fetch.records, which would limit the number of records fetched 
per partition on every request. 

The requirement isn't addressed by max.poll.records.

Makes sense?
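
To make the distinction concrete, the knobs that exist today (real config 
keys) versus the one requested here:

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class FetchConfigs {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // existing: caps the bytes fetched per partition per fetch request
        props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 1048576);
        // existing: caps the records returned by a single poll(), across all partitions
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500);
        // requested here (does not exist yet): max.partition.fetch.records,
        // capping the records fetched per partition per request
    }
}
{code}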

> Min / Max Partitions Fetch Records params
> -
>
> Key: KAFKA-4560
> URL: https://issues.apache.org/jira/browse/KAFKA-4560
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Affects Versions: 0.10.0.1
>Reporter: Stephane Maarek
>  Labels: features, newbie
>
> There is currently a `max.partition.fetch.bytes` parameter to limit the total 
> size of the fetch call (also a min).
> Sometimes I'd like to control how many records altogether I'm getting at the 
> time and I'd like to see a `max.partition.fetch.records` (also a min).
> If both are specified the first condition that is met would complete the 
> fetch call. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4566) Can't Symlink to Kafka bins

2016-12-21 Thread Stephane Maarek (JIRA)
Stephane Maarek created KAFKA-4566:
--

 Summary: Can't Symlink to Kafka bins
 Key: KAFKA-4566
 URL: https://issues.apache.org/jira/browse/KAFKA-4566
 Project: Kafka
  Issue Type: Bug
  Components: tools
Affects Versions: 0.10.1.1
Reporter: Stephane Maarek


In the Kafka console consumer, for example, the last line is:
https://github.com/apache/kafka/blob/trunk/bin/kafka-console-consumer.sh#L21

{code}
exec $(dirname $0)/kafka-run-class.sh kafka.tools.ConsoleConsumer "$@"
{code}

if I create a symlink using 
{code}
ln -s
{code}

it doesn't resolve the right directory name because of $(dirname $0) 

I believe the right way is to do:
{code}
"$(dirname "$(readlink -e "$0")")"
{code}
 

Any thoughts on that before I do a PR?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4864) Kafka Secure Migrator tool doesn't secure all the nodes

2017-03-07 Thread Stephane Maarek (JIRA)
Stephane Maarek created KAFKA-4864:
--

 Summary: Kafka Secure Migrator tool doesn't secure all the nodes
 Key: KAFKA-4864
 URL: https://issues.apache.org/jira/browse/KAFKA-4864
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.10.2.0, 0.10.1.1, 0.10.1.0
Reporter: Stephane Maarek


It seems that the secure nodes as referred by ZkUtils.scala are the following:

https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/utils/ZkUtils.scala#L201

A couple of things:
- the list is highly outdated; for example, the most important nodes such as 
kafka-acls don't get secured. That's a huge security risk. Would it be better 
to just secure all the nodes from the given root?
- the root of some nodes isn't secured, e.g. /brokers (but many others).

The result is the following after running the tool:
zookeeper-security-migration --zookeeper.acl secure --zookeeper.connect 
zoo1:2181/kafka-test

[zk: localhost:2181(CONNECTED) 9] getAcl /kafka-test/brokers
'world,'anyone
: cdrwa
[zk: localhost:2181(CONNECTED) 11] getAcl /kafka-test/brokers/ids
'world,'anyone
: r
'sasl,'myzkcli...@example.com
: cdrwa
[zk: localhost:2181(CONNECTED) 16] getAcl /kafka-test/kafka-acl
'world,'anyone
: cdrwa

That seems pretty bad to be honest... A fast enough ZkClient could delete some 
root nodes, and create the nodes they like before the Acls get set. 
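
In the meantime, the exposed paths can be locked down by hand from zkCli (a 
sketch; the SASL principal below is a hypothetical stand-in for whatever 
principal the brokers authenticate as):

{code}
setAcl /kafka-test/kafka-acl sasl:zkclient@EXAMPLE.COM:cdrwa
setAcl /kafka-test/brokers sasl:zkclient@EXAMPLE.COM:cdrwa,world:anyone:r
getAcl /kafka-test/kafka-acl
{code}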



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4864) Kafka Secure Migrator tool doesn't secure all the nodes

2017-03-07 Thread Stephane Maarek (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephane Maarek updated KAFKA-4864:
---
Priority: Critical  (was: Major)

> Kafka Secure Migrator tool doesn't secure all the nodes
> ---
>
> Key: KAFKA-4864
> URL: https://issues.apache.org/jira/browse/KAFKA-4864
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.0, 0.10.1.1, 0.10.2.0
>    Reporter: Stephane Maarek
>Priority: Critical
>
> It seems that the nodes ZkUtils.scala treats as secure are the following:
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/utils/ZkUtils.scala#L201
> A couple of things:
> - the list is highly outdated; for example the most important nodes, such 
> as kafka-acl, don't get secured. That's a huge security risk. Would it be 
> better to just secure all the nodes from the given root?
> - the roots of some nodes aren't secured, e.g. /brokers (and many others).
> The result is the following after running the tool:
> {code}
> zookeeper-security-migration --zookeeper.acl secure --zookeeper.connect 
> zoo1:2181/kafka-test
> [zk: localhost:2181(CONNECTED) 9] getAcl /kafka-test/brokers
> 'world,'anyone
> : cdrwa
> [zk: localhost:2181(CONNECTED) 11] getAcl /kafka-test/brokers/ids
> 'world,'anyone
> : r
> 'sasl,'myzkcli...@example.com
> : cdrwa
> [zk: localhost:2181(CONNECTED) 16] getAcl /kafka-test/kafka-acl
> 'world,'anyone
> : cdrwa
> {code}
> That seems pretty bad, to be honest... A fast enough ZkClient could delete 
> some root nodes and re-create whatever nodes it likes before the ACLs get 
> set.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4864) Kafka Secure Migrator tool doesn't secure all the nodes

2017-03-07 Thread Stephane Maarek (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephane Maarek updated KAFKA-4864:
---
Description: 
It seems that the nodes ZkUtils.scala treats as secure are the following:

https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/utils/ZkUtils.scala#L201

A couple of things:
- the list is highly outdated; for example the most important nodes, such as 
kafka-acl, don't get secured. That's a huge security risk. Would it be better 
to just secure all the nodes recursively from the given root?
- the roots of some nodes aren't secured, e.g. /brokers (and many others).

The result is the following after running the tool:
{code}
zookeeper-security-migration --zookeeper.acl secure --zookeeper.connect 
zoo1:2181/kafka-test

[zk: localhost:2181(CONNECTED) 9] getAcl /kafka-test/brokers
'world,'anyone
: cdrwa
[zk: localhost:2181(CONNECTED) 11] getAcl /kafka-test/brokers/ids
'world,'anyone
: r
'sasl,'myzkcli...@example.com
: cdrwa
[zk: localhost:2181(CONNECTED) 16] getAcl /kafka-test/kafka-acl
'world,'anyone
: cdrwa
{code}

That seems pretty bad, to be honest... A fast enough ZkClient could delete 
some root nodes and re-create whatever nodes it likes before the ACLs get set.

  was:
It seems that the nodes ZkUtils.scala treats as secure are the following:

https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/utils/ZkUtils.scala#L201

A couple of things:
- the list is highly outdated; for example the most important nodes, such as 
kafka-acl, don't get secured. That's a huge security risk. Would it be better 
to just secure all the nodes from the given root?
- the roots of some nodes aren't secured, e.g. /brokers (and many others).

The result is the following after running the tool:
{code}
zookeeper-security-migration --zookeeper.acl secure --zookeeper.connect 
zoo1:2181/kafka-test

[zk: localhost:2181(CONNECTED) 9] getAcl /kafka-test/brokers
'world,'anyone
: cdrwa
[zk: localhost:2181(CONNECTED) 11] getAcl /kafka-test/brokers/ids
'world,'anyone
: r
'sasl,'myzkcli...@example.com
: cdrwa
[zk: localhost:2181(CONNECTED) 16] getAcl /kafka-test/kafka-acl
'world,'anyone
: cdrwa
{code}

That seems pretty bad, to be honest... A fast enough ZkClient could delete 
some root nodes and re-create whatever nodes it likes before the ACLs get set.


> Kafka Secure Migrator tool doesn't secure all the nodes
> ---
>
> Key: KAFKA-4864
> URL: https://issues.apache.org/jira/browse/KAFKA-4864
>     Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.0, 0.10.1.1, 0.10.2.0
>Reporter: Stephane Maarek
>Priority: Critical
>
> It seems that the nodes ZkUtils.scala treats as secure are the following:
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/utils/ZkUtils.scala#L201
> A couple of things:
> - the list is highly outdated; for example the most important nodes, such 
> as kafka-acl, don't get secured. That's a huge security risk. Would it be 
> better to just secure all the nodes recursively from the given root?
> - the roots of some nodes aren't secured, e.g. /brokers (and many others).
> The result is the following after running the tool:
> {code}
> zookeeper-security-migration --zookeeper.acl secure --zookeeper.connect 
> zoo1:2181/kafka-test
> [zk: localhost:2181(CONNECTED) 9] getAcl /kafka-test/brokers
> 'world,'anyone
> : cdrwa
> [zk: localhost:2181(CONNECTED) 11] getAcl /kafka-test/brokers/ids
> 'world,'anyone
> : r
> 'sasl,'myzkcli...@example.com
> : cdrwa
> [zk: localhost:2181(CONNECTED) 16] getAcl /kafka-test/kafka-acl
> 'world,'anyone
> : cdrwa
> {code}
> That seems pretty bad, to be honest... A fast enough ZkClient could delete 
> some root nodes and re-create whatever nodes it likes before the ACLs get 
> set.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-4871) Kafka doesn't respect TTL on Zookeeper hostname - crash if zookeeper IP changes

2017-03-08 Thread Stephane Maarek (JIRA)
Stephane Maarek created KAFKA-4871:
--

 Summary: Kafka doesn't respect TTL on Zookeeper hostname - crash 
if zookeeper IP changes
 Key: KAFKA-4871
 URL: https://issues.apache.org/jira/browse/KAFKA-4871
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.10.2.0
Reporter: Stephane Maarek


I had a Zookeeper cluster whose machines automatically obtain hostnames so 
that the hostnames remain constant over time. I deleted my 3 zookeeper 
machines; new machines came back online with the same hostnames and updated 
their CNAMEs.

Kafka then failed and couldn't reconnect to Zookeeper, as it didn't try to 
resolve the IP of Zookeeper again. See log below:

{code}
[2017-03-09 05:49:57,302] INFO Client will use GSSAPI as SASL mechanism. 
(org.apache.zookeeper.client.ZooKeeperSaslClient)
[2017-03-09 05:49:57,302] INFO Opening socket connection to server 
zookeeper-3.example.com/10.12.79.43:2181. Will attempt to SASL-authenticate 
using Login Context section 'Client' (org.apache.zookeeper.ClientCnxn)

[ec2-user]$ dig +short zookeeper-3.example.com
10.12.79.36
{code}

As you can see, even though the machine can resolve the new IP, Kafka somehow 
didn't respect the TTL (set to 60 seconds) and kept connecting to the old one. 
I feel that on a failed Zookeeper connection, Kafka should at least try to 
resolve the Zookeeper IP again. That would allow Kafka to keep up with 
Zookeeper changes over time.

What do you think? Is that expected behaviour or a bug?
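
One thing worth ruling out (an assumption on my side, not a confirmed fix): 
the JVM caches DNS lookups itself, and the ZooKeeper client of this era 
resolves the connect string only once at startup, so lowering the JVM cache 
TTL may only partially help. A sketch of capping the JVM DNS cache before 
starting the broker:

{code}
# Cap the JVM's positive DNS cache at 30 s (sun.net.inetaddr.ttl is the
# legacy system property; networkaddress.cache.ttl in java.security is the
# standard one), then start the broker; kafka-run-class.sh picks up KAFKA_OPTS
export KAFKA_OPTS="-Dsun.net.inetaddr.ttl=30 $KAFKA_OPTS"
bin/kafka-server-start.sh config/server.properties
{code}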



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4871) Kafka doesn't respect TTL on Zookeeper hostname - crash if zookeeper IP changes

2017-03-09 Thread Stephane Maarek (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephane Maarek updated KAFKA-4871:
---

Yeah, definitely! Seeing how this issue has been open for two years, I'm 
quite worried about the way forward until it trickles down to Kafka. Is 
there a temporary fix you can think of?




> Kafka doesn't respect TTL on Zookeeper hostname - crash if zookeeper IP 
> changes
> ---
>
> Key: KAFKA-4871
> URL: https://issues.apache.org/jira/browse/KAFKA-4871
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.2.0
>Reporter: Stephane Maarek
>
> I had a Zookeeper cluster whose machines automatically obtain hostnames so 
> that the hostnames remain constant over time. I deleted my 3 zookeeper 
> machines; new machines came back online with the same hostnames and updated 
> their CNAMEs.
> Kafka then failed and couldn't reconnect to Zookeeper, as it didn't try to 
> resolve the IP of Zookeeper again. See log below:
> {code}
> [2017-03-09 05:49:57,302] INFO Client will use GSSAPI as SASL mechanism. 
> (org.apache.zookeeper.client.ZooKeeperSaslClient)
> [2017-03-09 05:49:57,302] INFO Opening socket connection to server 
> zookeeper-3.example.com/10.12.79.43:2181. Will attempt to SASL-authenticate 
> using Login Context section 'Client' (org.apache.zookeeper.ClientCnxn)
> [ec2-user]$ dig +short zookeeper-3.example.com
> 10.12.79.36
> {code}
> As you can see, even though the machine can resolve the new IP, Kafka 
> somehow didn't respect the TTL (set to 60 seconds) and kept connecting to 
> the old one. I feel that on a failed Zookeeper connection, Kafka should at 
> least try to resolve the Zookeeper IP again. That would allow Kafka to keep 
> up with Zookeeper changes over time.
> What do you think? Is that expected behaviour or a bug?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-2729) Cached zkVersion not equal to that in zookeeper, broker not recovering.

2017-03-22 Thread Stephane Maarek (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937468#comment-15937468
 ] 

Stephane Maarek commented on KAFKA-2729:


If I may add: this is a pretty bad issue, and it gets worse. You not only 
have to recover Kafka, but also your Kafka Connect clusters. They got stuck 
for me in the following state:

{code}
[2017-03-23 00:06:05,478] INFO Marking the coordinator kafka-1:9092 (id: 
2147483626 rack: null) dead for group connect-MyConnector 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2017-03-23 00:06:05,478] INFO Marking the coordinator kafka-1:9092 (id: 
2147483626 rack: null) dead for group connect-MyConnector 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
{code}

> Cached zkVersion not equal to that in zookeeper, broker not recovering.
> ---
>
> Key: KAFKA-2729
> URL: https://issues.apache.org/jira/browse/KAFKA-2729
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
>Reporter: Danil Serdyuchenko
>
> After a small network wobble where zookeeper nodes couldn't reach each other, 
> we started seeing a large number of undereplicated partitions. The zookeeper 
> cluster recovered, however we continued to see a large number of 
> undereplicated partitions. Two brokers in the kafka cluster were showing this 
> in the logs:
> {code}
> [2015-10-27 11:36:00,888] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Shrinking ISR for 
> partition [__samza_checkpoint_event-creation_1,3] from 6,5 to 5 
> (kafka.cluster.Partition)
> [2015-10-27 11:36:00,891] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Cached zkVersion [66] 
> not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> {code}
> For all of the topics on the affected brokers. Both brokers only recovered 
> after a restart. Our own investigation yielded nothing; I was hoping you 
> could shed some light on this issue. Possibly it's related to: 
> https://issues.apache.org/jira/browse/KAFKA-1382 , however we're using 
> 0.8.2.1.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-4942) Kafka Connect: Offset committing times out before expected

2017-03-22 Thread Stephane Maarek (JIRA)
Stephane Maarek created KAFKA-4942:
--

 Summary: Kafka Connect: Offset committing times out before expected
 Key: KAFKA-4942
 URL: https://issues.apache.org/jira/browse/KAFKA-4942
 Project: Kafka
  Issue Type: Bug
Reporter: Stephane Maarek
Priority: Critical


I run a connector that deals with a lot of data in a Kafka Connect cluster.

When the offsets are getting committed, I get the following:
[2017-03-23 03:56:25,134] INFO WorkerSinkTask{id=MyConnector-1} Committing 
offsets (org.apache.kafka.connect.runtime.WorkerSinkTask)
[2017-03-23 03:56:25,135] WARN Commit of WorkerSinkTask{id=MyConnector-1} 
offsets timed out (org.apache.kafka.connect.runtime.WorkerSinkTask)

If you look at the timestamps, they're 1 ms apart. My settings are the 
following: 
offset.flush.interval.ms = 120000
offset.flush.timeout.ms = 60000
offset.storage.topic = _connect_offsets

It seems, from the look of the logs, that the offset flush timeout setting 
is completely ignored. I would expect the timeout message to appear 60 
seconds after the commit offset INFO message, not 1 millisecond later.
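
For reference, this is how I'd set those values in the worker config, with 
the units spelled out (a sketch; the file path assumes a distributed worker):

{code}
# Sketch: flush committed offsets every 2 minutes, allowing up to 60 s per
# flush before Connect logs a timeout (both values are in milliseconds)
cat >> config/connect-distributed.properties <<'EOF'
offset.flush.interval.ms=120000
offset.flush.timeout.ms=60000
EOF
{code}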



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4942) Kafka Connect: Offset committing times out before expected

2017-03-22 Thread Stephane Maarek (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephane Maarek updated KAFKA-4942:
---
Description: 
On Kafka 0.10.2.0
I run a connector that deals with a lot of data in a Kafka Connect cluster.

When the offsets are getting committed, I get the following:
{code}
[2017-03-23 03:56:25,134] INFO WorkerSinkTask{id=MyConnector-1} Committing 
offsets (org.apache.kafka.connect.runtime.WorkerSinkTask)
[2017-03-23 03:56:25,135] WARN Commit of WorkerSinkTask{id=MyConnector-1} 
offsets timed out (org.apache.kafka.connect.runtime.WorkerSinkTask)
{code}

If you look at the timestamps, they're 1 ms apart. My settings are the 
following: 
offset.flush.interval.ms = 120000
offset.flush.timeout.ms = 60000
offset.storage.topic = _connect_offsets

It seems, from the look of the logs, that the offset flush timeout setting 
is completely ignored. I would expect the timeout message to appear 60 
seconds after the commit offset INFO message, not 1 millisecond later.

  was:
I run a connector that deals with a lot of data in a Kafka Connect cluster.

When the offsets are getting committed, I get the following:
[2017-03-23 03:56:25,134] INFO WorkerSinkTask{id=MyConnector-1} Committing 
offsets (org.apache.kafka.connect.runtime.WorkerSinkTask)
[2017-03-23 03:56:25,135] WARN Commit of WorkerSinkTask{id=MyConnector-1} 
offsets timed out (org.apache.kafka.connect.runtime.WorkerSinkTask)

If you look at the timestamps, they're 1 ms apart. My settings are the 
following: 
offset.flush.interval.ms = 120000
offset.flush.timeout.ms = 60000
offset.storage.topic = _connect_offsets

It seems, from the look of the logs, that the offset flush timeout setting 
is completely ignored. I would expect the timeout message to appear 60 
seconds after the commit offset INFO message, not 1 millisecond later.


> Kafka Connect: Offset committing times out before expected
> --
>
> Key: KAFKA-4942
> URL: https://issues.apache.org/jira/browse/KAFKA-4942
> Project: Kafka
>  Issue Type: Bug
>    Reporter: Stephane Maarek
>Priority: Critical
>
> On Kafka 0.10.2.0
> I run a connector that deals with a lot of data in a Kafka Connect cluster.
> When the offsets are getting committed, I get the following:
> {code}
> [2017-03-23 03:56:25,134] INFO WorkerSinkTask{id=MyConnector-1} Committing 
> offsets (org.apache.kafka.connect.runtime.WorkerSinkTask)
> [2017-03-23 03:56:25,135] WARN Commit of WorkerSinkTask{id=MyConnector-1} 
> offsets timed out (org.apache.kafka.connect.runtime.WorkerSinkTask)
> {code}
> If you look at the timestamps, they're 1 ms apart. My settings are the 
> following: 
>   offset.flush.interval.ms = 120000
>   offset.flush.timeout.ms = 60000
>   offset.storage.topic = _connect_offsets
> It seems, from the look of the logs, that the offset flush timeout setting 
> is completely ignored. I would expect the timeout message to appear 60 
> seconds after the commit offset INFO message, not 1 millisecond later.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4942) Kafka Connect: Offset committing times out before expected

2017-03-22 Thread Stephane Maarek (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephane Maarek updated KAFKA-4942:
---
Description: 
On Kafka 0.10.2.0
I run a connector that deals with a lot of data in a Kafka Connect cluster.

When the offsets are getting committed, I get the following:
{code}
[2017-03-23 03:56:25,134] INFO WorkerSinkTask{id=MyConnector-1} Committing 
offsets (org.apache.kafka.connect.runtime.WorkerSinkTask)
[2017-03-23 03:56:25,135] WARN Commit of WorkerSinkTask{id=MyConnector-1} 
offsets timed out (org.apache.kafka.connect.runtime.WorkerSinkTask)
{code}

If you look at the timestamps, they're 1 ms apart. My settings are the 
following: 
{code}
offset.flush.interval.ms = 120000
offset.flush.timeout.ms = 60000
offset.storage.topic = _connect_offsets
{code}

It seems, from the look of the logs, that the offset flush timeout setting 
is completely ignored. I would expect the timeout message to appear 60 
seconds after the commit offset INFO message, not 1 millisecond later.

  was:
On Kafka 0.10.2.0
I run a connector that deals with a lot of data in a Kafka Connect cluster.

When the offsets are getting committed, I get the following:
{code}
[2017-03-23 03:56:25,134] INFO WorkerSinkTask{id=MyConnector-1} Committing 
offsets (org.apache.kafka.connect.runtime.WorkerSinkTask)
[2017-03-23 03:56:25,135] WARN Commit of WorkerSinkTask{id=MyConnector-1} 
offsets timed out (org.apache.kafka.connect.runtime.WorkerSinkTask)
{code}

If you look at the timestamps, they're 1 ms apart. My settings are the 
following: 
offset.flush.interval.ms = 120000
offset.flush.timeout.ms = 60000
offset.storage.topic = _connect_offsets

It seems, from the look of the logs, that the offset flush timeout setting 
is completely ignored. I would expect the timeout message to appear 60 
seconds after the commit offset INFO message, not 1 millisecond later.


> Kafka Connect: Offset committing times out before expected
> --
>
> Key: KAFKA-4942
> URL: https://issues.apache.org/jira/browse/KAFKA-4942
> Project: Kafka
>  Issue Type: Bug
>    Reporter: Stephane Maarek
>Priority: Critical
>
> On Kafka 0.10.2.0
> I run a connector that deals with a lot of data in a Kafka Connect cluster.
> When the offsets are getting committed, I get the following:
> {code}
> [2017-03-23 03:56:25,134] INFO WorkerSinkTask{id=MyConnector-1} Committing 
> offsets (org.apache.kafka.connect.runtime.WorkerSinkTask)
> [2017-03-23 03:56:25,135] WARN Commit of WorkerSinkTask{id=MyConnector-1} 
> offsets timed out (org.apache.kafka.connect.runtime.WorkerSinkTask)
> {code}
> If you look at the timestamps, they're 1 ms apart. My settings are the 
> following: 
> {code}
>   offset.flush.interval.ms = 120000
>   offset.flush.timeout.ms = 60000
>   offset.storage.topic = _connect_offsets
> {code}
> It seems, from the look of the logs, that the offset flush timeout setting 
> is completely ignored. I would expect the timeout message to appear 60 
> seconds after the commit offset INFO message, not 1 millisecond later.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

