Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-07-22 Thread Shikhar Bhushan
flatMap() / supporting 1->n feels nice and general, since filtering is just
the special case of going from 1->0.

I'm not sure why we'd need to do any more granular offset tracking (like
sub-offsets) for source connectors: after transformation of a given record
to n records, all of those n should map to the same offset of the source
partition. The only thing to take care of here would be that we don't
commit a source offset while there are still records with that offset that
haven't been flushed to Kafka, but this is in the control of the connect
runtime.

I see your point for sink connectors, though. Implementors can currently
assume 1:1ness of a record to its Kafka coordinates (topic, partition,
offset).
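
To make the 1->n semantics concrete, here is a minimal sketch of a
flatMap-style transform in which an empty result expresses filtering; the
interface and record type are illustrative assumptions, not the actual
Connect API:

    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;

    // Hypothetical flatMap-style single message transform (names assumed).
    interface FlatMapTransformation<R> {
        // An empty list filters the record out (1->0); every record produced
        // from one input should share that input's source partition and offset.
        List<R> apply(R record);
    }

    class SplitLines implements FlatMapTransformation<String> {
        @Override
        public List<String> apply(String record) {
            if (record.trim().isEmpty())
                return Collections.emptyList();        // 1->0: drop the record
            return Arrays.asList(record.split("\n"));  // 1->n: expand the record
        }
    }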

On Thu, Jul 21, 2016 at 10:57 PM Ewen Cheslack-Postava wrote:

> Jun, The problem with it not being 1-1 is that Connect relies heavily on
> offsets, so we'd need to be able to track offsets at this finer
> granularity. Filtering is ok, but flatMap isn't. If you convert one message
> to many, what are the offsets for the new messages? One possibility would
> be to assume that transformations are deterministic and then "enhance" the
> offsets with an extra integer field that indicates its position in the
> subset. For sources this seems attractive since you can then reset to
> whatever the connector-provided offset is and then filter out any of the
> "sub"-messages that are earlier than the recorded "sub"-offset. But this
> might not actually work for sources since a) the offsets will include extra
> fields that the connector doesn't expect (might be ok since we handle that
> data as schemaless anyway) and b) if we allow multiple transformations
> (which seems likely given that people might want to do things like
> rearrange fields + filter messages) then offsets start getting quite
> complex as we add sub-sub-offsets and sub-sub-sub-offsets. It's doable, but
> seems messy.
>
> Things aren't as easy on the sink side. Since we track offsets using Kafka
> offsets we either need to use the extra metadata space to store the
> sub-offsets or we need to ensure that we only ever need to commit offsets
> on Kafka message boundaries. We might be able to get away with just
> delivering the entire set of generated messages in a single put() call,
> which the connector is expected to either fully accept or fully reject (via
> exception). However, this may end up interacting poorly with assumptions
> connectors might make if we expose things like max.poll.records, where they
> might expect one record at a time.
>
> I'm not really convinced of the benefit of supporting this -- at some point it
> seems better to use Streams to do transformations if you need flatMap. I
> can't think of many generic transformations that would use 1-to-many, and
> single message transforms really should be quite general -- that's the
> reason for providing a separate interface isolated from Connectors or
> Converters.
>
> Gwen, re: using null and sending to dead letter queue, it would be useful
> to think about how this might interact with other uses of a dead letter
> queue. Similar ideas have been raised for messages that either can't be
> parsed or which the connector chokes on repeatedly. If we use a dead letter
> queue for those, do we want these messages (which are explicitly filtered
> by a transform set up by the user) to end up in the same location?
>
> -Ewen
>
> On Sun, Jul 17, 2016 at 9:53 PM, Jun Rao  wrote:
>
> > Does the transformation need to be 1-to-1? For example, some users model
> > each Kafka message as schema + a batch of binary records. When using a
> sink
> > connector to push the Kafka data to a sink, it would be useful if the
> > transformer can convert each Kafka message to multiple records.
> >
> > Thanks,
> >
> > Jun
> >
> > On Sat, Jul 16, 2016 at 1:25 PM, Nisarg Shah  wrote:
> >
> > > Gwen,
> > >
> > > Yup, that sounds great! Instead of leaving it up to the transformers to
> > > handle null, we can instead have the topic be null. Sounds good. To get
> > > rid of a message, set the topic to a special one (which could be as
> > > simple as null).
> > >
> > > Like I said before, the more interesting part would be ‘adding’ a new
> > > message to the existing list, based on say the current message in the
> > > transformer. Does that feature warrant inclusion?
> > >
> > > > On Jul 14, 2016, at 22:25, Gwen Shapira  wrote:
> > > >
> > > > I used to work on Apache Flume, where we used to allow users to
> filter
> > > > messages completely in the transformation and then we got rid of it,
> > > > because we spent too much time trying to help users who had "message
> > > > loss", where the loss was actually a bug in the filter...
> > > >
> > > > What we couldn't do in Flume, but perhaps can do in the simple
> > > > transform for Connect is the ability to route messages to different
> > > > topics, with "null" as one of the possible targets. This will allow
> > > > you to implement a dead-letter-queue functionality and redirect
> > > > me
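
The "enhanced offset" idea from Ewen's message above can be sketched against
Connect's map-based source offsets; this is illustrative only, and the "sub"
field name is an assumption:

    import java.util.HashMap;
    import java.util.Map;

    // Sketch: assuming deterministic transformations, tag each record produced
    // from one input with its position within the expansion, so a reset can
    // skip "sub"-messages that were already flushed.
    class SubOffsets {
        static Map<String, Object> withSubOffset(Map<String, Object> sourceOffset,
                                                 int position) {
            Map<String, Object> enhanced = new HashMap<>(sourceOffset);
            enhanced.put("sub", position); // position within the expanded batch
            return enhanced;
        }
    }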

Re: [DISCUSS] KIP-70: Revise Partition Assignment Semantics on New Consumer's Subscription Change

2016-07-22 Thread Ismael Juma
Thanks for the KIP Vahid. The change makes sense. On the compatibility
front, could we check some of the advanced Kafka users like Storm and Spark
in order to verify if they would be affected?

Ismael

On Wed, Jul 20, 2016 at 1:55 AM, Vahid S Hashemian <
vahidhashem...@us.ibm.com> wrote:

> Hi all,
>
> We have started a new KIP under
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-70%3A+Revise+Partition+Assignment+Semantics+on+New+Consumer%27s+Subscription+Change
>
> Your feedback is much appreciated.
>
> Regards,
> Vahid Hashemian
>
>


[jira] [Resolved] (KAFKA-3167) Use local to the workspace Gradle cache and recreate it on every build

2016-07-22 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-3167.

Resolution: Fixed

> Use local to the workspace Gradle cache and recreate it on every build
> --
>
> Key: KAFKA-3167
> URL: https://issues.apache.org/jira/browse/KAFKA-3167
> Project: Kafka
>  Issue Type: Improvement
>  Components: build
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>
> Kafka builds often fail with "Could not add entry 
> '/home/jenkins/.gradle/caches/modules-2/files-2.1/net.jpountz.lz4/lz4/1.3.0/c708bb2590c0652a642236ef45d9f99ff842a2ce/lz4-1.3.0.jar'
>  to cache fileHashes.bin"
> I filed INFRA-11083 and Andrew Bayer suggested:
> "Can you change your builds to use a local-to-the-workspace cache and then 
> nuke it/recreate it on every build?"
> This issue is about changing the Jenkins config for one of the trunk builds 
> to do the above to see if it helps.
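A sketch of what such a Jenkins build step might look like, using Gradle's
--gradle-user-home option; the cache directory name and task list are
assumptions, not the actual Jenkins configuration:
{code}
# recreate a workspace-local Gradle cache on every build (illustrative)
rm -rf $WORKSPACE/.gradle-cache
./gradlew --gradle-user-home $WORKSPACE/.gradle-cache clean test
{code}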





[jira] [Commented] (KAFKA-3916) Connection from controller to broker disconnects

2016-07-22 Thread Andrey Konyaev (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389188#comment-15389188
 ] 

Andrey Konyaev commented on KAFKA-3916:
---

I have this problem with version 0.10.



> Connection from controller to broker disconnects
> 
>
> Key: KAFKA-3916
> URL: https://issues.apache.org/jira/browse/KAFKA-3916
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.9.0.1
>Reporter: Dave Powell
>
> We recently upgraded from 0.8.2.1 to 0.9.0.1. Since then, several times per 
> day, the controllers in our clusters have their connection to all brokers 
> disconnected, and then successfully reconnected a few hundred ms later. Each 
> time this occurs we see a brief spike in our 99th percentile produce and 
> consume times, reaching several hundred ms.
> Here is an example of what we're seeing in the controller.log:
> {code}
> [2016-06-28 14:15:35,416] WARN [Controller-151-to-broker-160-send-thread], 
> Controller 151 epoch 106 fails to send request {…} to broker Node(160, 
> broker.160.hostname, 9092). Reconnecting to broker. 
> (kafka.controller.RequestSendThread)
> java.io.IOException: Connection to 160 was disconnected before the response 
> was read
> at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
> at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
> at scala.Option.foreach(Option.scala:236)
> at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
> at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
> at 
> kafka.utils.NetworkClientBlockingOps$.recurse$1(NetworkClientBlockingOps.scala:129)
> at 
> kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntilFound$extension(NetworkClientBlockingOps.scala:139)
> at 
> kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
> at 
> kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:180)
> at 
> kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:171)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> ... one each for all brokers (including the controller) ...
>  [2016-06-28 14:15:35,721] INFO [Controller-151-to-broker-160-send-thread], 
> Controller 151 connected to Node(160, broker.160.hostname, 9092) for sending 
> state change requests (kafka.controller.RequestSendThread)
> … one each for all brokers (including the controller) ...
> {code}





Re: [DISCUSS] Optimise memory used by replication process by using adaptive fetch message size

2016-07-22 Thread Andrey L. Neporada
Hi!

Thanks for the feedback - I agree that the proper way to fix this issue is to 
provide a per-request data limit.
Will try to do it.

Thanks,
Andrey.



> On 21 Jul 2016, at 18:57, Jay Kreps  wrote:
> 
> I think the memory usage for consumers can be improved a lot, but I think
> there may be a better way than what you are proposing.
> 
> The problem is exactly what you describe: the bound the user sets is
> per-partition, but the number of partitions may be quite high. The consumer
> could provide a bound on the response size by only requesting a subset of
> the partitions, but this would mean that if there was no data available on
> those partitions the consumer wouldn't be checking other partitions, which
> would add latency.
> 
> I think the solution is to add a new "max response size" parameter to the
> fetch request so the server checks all partitions but doesn't send back
> more than this amount in total. This has to be done carefully to ensure
> fairness (i.e. if one partition has unbounded amounts of data it shouldn't
> indefinitely starve other partitions).
> 
> This will fix memory management both in the replicas and for consumers.
> 
> There is a JIRA for this: https://issues.apache.org/jira/browse/KAFKA-2063
> 
> I think it isn't too hard to do and would be a huge aid to the memory
> profile of both the clients and server.
> 
> I also don't think there is much use in setting a max size that expands
> dynamically since in any case you have to be able to support the maximum,
> so you might as well always use that rather than expanding and contracting
> dynamically. That is, if your max fetch response size is 64MB you need to
> budget 64MB of free memory, so making it smaller some of the time doesn't
> really help you.
> 
> -Jay
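
The fairness requirement above can be sketched as rotating the partition
order on each request, so that filling a response up to a global byte limit
starts from a different partition every time; this is an illustration, not
the actual fetch path:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Illustrative: a partition with unbounded data cannot indefinitely
    // starve the others if the fill order rotates per request.
    class FairPartitionOrder {
        private int next = 0;

        synchronized <P> List<P> order(List<P> partitions) {
            List<P> rotated = new ArrayList<>(partitions);
            if (!rotated.isEmpty())
                Collections.rotate(rotated, -(next++ % rotated.size()));
            return rotated;
        }
    }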
> 
> On Thu, Jul 21, 2016 at 2:49 AM, Andrey L. Neporada <
> anepor...@yandex-team.ru> wrote:
> 
>> Hi all!
>> 
>> We noticed that our Kafka cluster uses a lot of memory for replication.
>> Our Kafka usage pattern is as follows:
>> 
>> 1. Most messages are small (tens or hundreds of kilobytes at most), but
>> some (rare) messages can be several megabytes. So, we have to set
>> replica.fetch.max.bytes = max.message.bytes = 8MB.
>> 2. Each Kafka broker handles several thousands of partitions from multiple
>> topics.
>> 
>> In this scenario total memory required for replication (i.e.
>> replica.fetch.max.bytes * numOfPartitions) is unreasonably big.
>> 
>> So we would like to propose the following approach to fix this problem:
>> 
>> 1. Introduce a new config parameter, replica.fetch.base.bytes, which is the
>> initial size of the replication data chunk. By default this parameter should be
>> equal to replica.fetch.max.bytes so the replication process will work as
>> before.
>> 
>> 2. If the ReplicaFetcherThread fails when trying to replicate a message
>> bigger than the current replication chunk, we increase it twofold (or up to
>> replica.fetch.max.bytes, whichever is smaller) and retry.
>> 
>> 3. If the chunk is replicated successfully we try to decrease the size of
>> the replication chunk back to replica.fetch.base.bytes.
>> 
>> 
>> By choosing replica.fetch.base.bytes in an optimal way (in our case ~200K),
>> we were able to significantly decrease memory usage without any noticeable
>> impact on replication efficiency.
>> 
>> Here is JIRA ticket (with PR):
>> https://issues.apache.org/jira/browse/KAFKA-3979
>> 
>> Your comments and feedback are highly appreciated!
>> 
>> 
>> Thanks,
>> Andrey.
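
The three steps of the proposal can be sketched as follows; this is an
illustration of the described behavior, not the actual ReplicaFetcherThread
patch:

    // Sketch of adaptive fetch-chunk sizing: start from a base size, double on
    // a too-small fetch (capped at the max), and shrink back after success.
    class AdaptiveFetchSize {
        private final int baseBytes;   // replica.fetch.base.bytes
        private final int maxBytes;    // replica.fetch.max.bytes
        private int currentBytes;

        AdaptiveFetchSize(int baseBytes, int maxBytes) {
            this.baseBytes = baseBytes;
            this.maxBytes = maxBytes;
            this.currentBytes = baseBytes;
        }

        int current() { return currentBytes; }

        // Step 2: the message did not fit; grow twofold and retry.
        void onMessageTooLarge() {
            currentBytes = Math.min(currentBytes * 2, maxBytes);
        }

        // Step 3: the chunk replicated successfully; fall back to the base.
        void onSuccess() {
            currentBytes = baseBytes;
        }
    }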



Re: [DISCUSS] KIP-4 ACL Admin Schema

2016-07-22 Thread Jim Jagielski

> On Jul 21, 2016, at 10:57 PM, Ismael Juma  wrote:
> 
> Hi Grant,
> 
> Thanks for the KIP.  A few questions and comments:
> 
> 1. My main concern is that we are skipping the discussion on the desired
> model for controlling ACL access and updates. I understand the desire to
> reduce the scope, but this seems to be a fundamental aspect of the design
> that we need to get right. Without a plan for that, it is difficult to
> evaluate if that part of the current proposal is fine.

++1.

> 2. Are the Java objects in "org.apache.kafka.common.security.auth" going to
> be public API? If so, we should explain why they should be public and
> describe them in the KIP. If not, we should mention that.
> 3. It would be nice to have a name for a (Resource, ACL) pair. The current
> protocol uses `requests`/`responses` for the list of such pairs, but it
> would be nice to have something more descriptive, if possible. Any ideas?

The problem w/ being more descriptive is that it's possible that
it restricts potential use cases if people think that somehow
their use case wouldn't fit.

> 4. There is no CreateAcls or DeleteAcls (unlike CreateTopics and
> DeleteTopics, for example). It would be good to explain the reasoning for
> this choice (Jason also asked this question).
> 5. What is the plan for when we add standard exceptions to the Authorizer
> interface? Will we bump the protocol version?
> 
> Thanks,
> Ismael
> 
> On Thu, Jul 14, 2016 at 5:09 PM, Grant Henke  wrote:
> 
>> The KIP-4 Delete Topic Schema vote has passed and the patch
>>  is available for review. Now I
>> would like to start the discussion for the Acls request/response and server
>> side implementations. This includes the ListAclsRequest/Response and the
>> AlterAclsRequest/Response.
>> 
>> Details for this implementation can be read here:
>> *
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-ACLAdminSchema(KAFKA-3266)
>> <
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-ACLAdminSchema(KAFKA-3266)
>>> *
>> 
>> I have included the exact content below for clarity:
>> 
>>> ACL Admin Schema (KAFKA-3266
>>> )
>>> 
>>> *Note*: Some of this work/code overlaps with "KIP-50 - Move Authorizer to
>>> o.a.k.common package
>>> <
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-50+-+Move+Authorizer+to+o.a.k.common+package
>>> ".
>>> KIP-4 does not change the Authorizer interface at all, but does provide
>>> java objects in "org.apache.kafka.common.security.auth" to be used in the
>>> protocol request/response classes. It also provides translations between
>>> the Java and Scala versions for server side compatibility with
>>> the Authorizer interface.
>>> 
>>> List ACLs Request
>>> 
>>> 
>>> 
>>> ListAcls Request (Version: 0) => principal resource
>>>  principal => NULLABLE_STRING
>>>  resource => resource_type resource_name
>>>resource_type => INT8
>>>resource_name => STRING
>>> 
>>> Request semantics:
>>> 
>>>   1. Can be sent to any broker
>>>   2. If a non-null principal is provided the returned ACLs will be
>>>   filtered by that principal, otherwise ACLs for all principals will be
>>>   listed.
>>>   3. If a resource with a resource_type != -1 is provided ACLs will be
>>>   filtered by that resource, otherwise ACLs for all resources will be
>> listed.
>>>   4. Any principal can list their own ACLs where the permission type is
>>>   "Allow". Otherwise the principal must be authorized to the "All"
>>>   Operation on the "Cluster" resource to list ACLs.
>>>   - Unauthorized requests will receive a ClusterAuthorizationException
>>>  - This avoids adding a new operation that an existing authorizer
>>>  implementation may not be aware of.
>>>  - This can be reviewed and further refined/restricted as a follow
>>>  up ACLs review after this KIP. See Follow Up Changes
>>>  <
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-follow-up-changes
>>> 
>>>  .
>>>   5. Requesting a resource or principal that does not have any ACLs will
>>>   not result in an error; instead an empty response list is returned.
>>> 
>>> List ACLs Response
>>> 
>>> 
>>> 
>>> ListAcls Response (Version: 0) => [responses] error_code
>>>  responses => resource [acls]
>>>resource => resource_type resource_name
>>>  resource_type => INT8
>>>  resource_name => STRING
>>>acls => acl_principle acl_permission_type acl_host acl_operation
>>>  acl_principle => STRING
>>>  acl_permission_type => INT8
>>>  acl_host => STRING
>>>  acl_operation => INT8
>>>  error_code => INT16
>>> 
>>> Alte
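
The filtering semantics in items 2 and 3 of the request semantics above can
be sketched like this; the class and parameter names are assumptions, not
the KIP-4 API:

    // Illustrative: a null principal or a resource_type of -1 acts as a
    // wildcard when listing ACLs.
    class ListAclsFilter {
        static boolean matches(String principalFilter, byte resourceTypeFilter,
                               String aclPrincipal, byte aclResourceType) {
            boolean principalOk =
                    principalFilter == null || principalFilter.equals(aclPrincipal);
            boolean resourceOk =
                    resourceTypeFilter == -1 || resourceTypeFilter == aclResourceType;
            return principalOk && resourceOk;
        }
    }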

[GitHub] kafka pull request #1651: MINOR: Fix typos in security section

2016-07-22 Thread ssaamm
GitHub user ssaamm opened a pull request:

https://github.com/apache/kafka/pull/1651

MINOR: Fix typos in security section

1. I think the instructions in step 2 of the security section which 
describe adding the CA to server/client truststores are swapped. That is, the 
instruction that says to add the CA to the server truststore adds it to the 
client truststore (and vice versa).
2. "clients keys" should be possessive ("clients' keys").

This contribution is my original work, and I license the work to the 
project under the project's open source license.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ssaamm/kafka trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1651.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1651


commit aedea43b34135bb4f2c42be7a03e2a6e758c8422
Author: Samuel Taylor 
Date:   2016-07-22T13:07:29Z

MINOR: Fix typos in security section

1. I believe the commands in step 2 to add the CA to the server
and client truststores were swapped
2. "clients keys" should be possessive ("clients' keys")






[GitHub] kafka pull request #1434: KAFKA-2394: move to RollingFileAppender by default...

2016-07-22 Thread cotedm
Github user cotedm closed the pull request at:

https://github.com/apache/kafka/pull/1434




[jira] [Commented] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties

2016-07-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389514#comment-15389514
 ] 

ASF GitHub Bot commented on KAFKA-2394:
---

Github user cotedm closed the pull request at:

https://github.com/apache/kafka/pull/1434


> Use RollingFileAppender by default in log4j.properties
> --
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Dustin Cote
>Priority: Minor
>  Labels: newbie
> Fix For: 0.11.0.0
>
> Attachments: log4j.properties.patch
>
>
> The default log4j.properties bundled with Kafka uses ConsoleAppender and 
> DailyRollingFileAppender, which offer no protection to users from spammy 
> logging. In extreme cases (such as when issues like KAFKA-1461 are 
> encountered), the logs can exhaust the local disk space. This could be a 
> problem for Kafka adoption since new users are less likely to adjust the 
> logging properties themselves, and are more likely to have configuration 
> problems which result in log spam. 
> To fix this, we can use RollingFileAppender, which offers two settings for 
> controlling the maximum space that log files will use.
> maxBackupIndex: how many backup files to retain
> maxFileSize: the max size of each log file
> One question is whether this change is a compatibility concern? The backup 
> strategy and filenames used by RollingFileAppender are different from those 
> used by DailyRollingFileAppender, so any tools which depend on the old format 
> will break. If we think this is a serious problem, one solution would be to 
> provide two versions of log4j.properties and add a flag to enable the new 
> one. Another solution would be to include the RollingFileAppender 
> configuration in the default log4j.properties, but commented out.
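
A sketch of what the commented-out RollingFileAppender configuration might
look like, using log4j 1.x settings; the file name and limits are illustrative:

{code}
log4j.appender.kafkaAppender=org.apache.log4j.RollingFileAppender
log4j.appender.kafkaAppender.File=${kafka.logs.dir}/server.log
log4j.appender.kafkaAppender.MaxFileSize=100MB
log4j.appender.kafkaAppender.MaxBackupIndex=10
log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
{code}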





[GitHub] kafka pull request #1652: KAFKA-2394: move to RollingFileAppender by default...

2016-07-22 Thread cotedm
GitHub user cotedm opened a pull request:

https://github.com/apache/kafka/pull/1652

KAFKA-2394: move to RollingFileAppender by default for log4j 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cotedm/kafka KAFKA-2394

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1652.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1652


commit 31a565b31a99567327b774045fbf726f5a2d1812
Author: Dustin Cote 
Date:   2016-07-22T14:37:53Z

update log4j.properties to use RollingFileAppender






[jira] [Commented] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties

2016-07-22 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389618#comment-15389618
 ] 

Dustin Cote commented on KAFKA-2394:


had to change the PR to come from a different branch so I could have my trunk 
branch back :)

> Use RollingFileAppender by default in log4j.properties
> --
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Dustin Cote
>Priority: Minor
>  Labels: newbie
> Fix For: 0.11.0.0
>
> Attachments: log4j.properties.patch
>
>
> The default log4j.properties bundled with Kafka uses ConsoleAppender and 
> DailyRollingFileAppender, which offer no protection to users from spammy 
> logging. In extreme cases (such as when issues like KAFKA-1461 are 
> encountered), the logs can exhaust the local disk space. This could be a 
> problem for Kafka adoption since new users are less likely to adjust the 
> logging properties themselves, and are more likely to have configuration 
> problems which result in log spam. 
> To fix this, we can use RollingFileAppender, which offers two settings for 
> controlling the maximum space that log files will use.
> maxBackupIndex: how many backup files to retain
> maxFileSize: the max size of each log file
> One question is whether this change is a compatibility concern? The backup 
> strategy and filenames used by RollingFileAppender are different from those 
> used by DailyRollingFileAppender, so any tools which depend on the old format 
> will break. If we think this is a serious problem, one solution would be to 
> provide two versions of log4j.properties and add a flag to enable the new 
> one. Another solution would be to include the RollingFileAppender 
> configuration in the default log4j.properties, but commented out.





[jira] [Commented] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties

2016-07-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389617#comment-15389617
 ] 

ASF GitHub Bot commented on KAFKA-2394:
---

GitHub user cotedm opened a pull request:

https://github.com/apache/kafka/pull/1652

KAFKA-2394: move to RollingFileAppender by default for log4j 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cotedm/kafka KAFKA-2394

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1652.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1652


commit 31a565b31a99567327b774045fbf726f5a2d1812
Author: Dustin Cote 
Date:   2016-07-22T14:37:53Z

update log4j.properties to use RollingFileAppender




> Use RollingFileAppender by default in log4j.properties
> --
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Dustin Cote
>Priority: Minor
>  Labels: newbie
> Fix For: 0.11.0.0
>
> Attachments: log4j.properties.patch
>
>
> The default log4j.properties bundled with Kafka uses ConsoleAppender and 
> DailyRollingFileAppender, which offer no protection to users from spammy 
> logging. In extreme cases (such as when issues like KAFKA-1461 are 
> encountered), the logs can exhaust the local disk space. This could be a 
> problem for Kafka adoption since new users are less likely to adjust the 
> logging properties themselves, and are more likely to have configuration 
> problems which result in log spam. 
> To fix this, we can use RollingFileAppender, which offers two settings for 
> controlling the maximum space that log files will use.
> maxBackupIndex: how many backup files to retain
> maxFileSize: the max size of each log file
> One question is whether this change is a compatibility concern? The backup 
> strategy and filenames used by RollingFileAppender are different from those 
> used by DailyRollingFileAppender, so any tools which depend on the old format 
> will break. If we think this is a serious problem, one solution would be to 
> provide two versions of log4j.properties and add a flag to enable the new 
> one. Another solution would be to include the RollingFileAppender 
> configuration in the default log4j.properties, but commented out.





[jira] [Commented] (KAFKA-2932) Adjust importance level of Kafka Connect configs

2016-07-22 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389722#comment-15389722
 ] 

Dustin Cote commented on KAFKA-2932:


[~ewencp] mind if I pick this one up?

> Adjust importance level of Kafka Connect configs
> 
>
> Key: KAFKA-2932
> URL: https://issues.apache.org/jira/browse/KAFKA-2932
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>
> Some of the configuration importance levels are out of whack, probably due to 
> the way they evolved over time. For example, the internal converter settings 
> are currently marked with high importance, but they are really an internal 
> implementation detail that the user usually shouldn't need to worry about.





[jira] [Created] (KAFKA-3985) Transient system test failure ZooKeeperSecurityUpgradeTest.test_zk_security_upgrade.security_protocol

2016-07-22 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-3985:
--

 Summary: Transient system test failure 
ZooKeeperSecurityUpgradeTest.test_zk_security_upgrade.security_protocol
 Key: KAFKA-3985
 URL: https://issues.apache.org/jira/browse/KAFKA-3985
 Project: Kafka
  Issue Type: Test
  Components: system tests
Affects Versions: 0.10.0.0
Reporter: Jason Gustafson


Found this in the nightly build on the 0.10.0 branch. Full details here: 
http://testing.confluent.io/confluent-kafka-0-10-0-system-test-results/?prefix=2016-07-22--001.1469199875--apache--0.10.0--71a598a/.
  

{code}
test_id:
2016-07-22--001.kafkatest.tests.core.zookeeper_security_upgrade_test.ZooKeeperSecurityUpgradeTest.test_zk_security_upgrade.security_protocol=SSL
status: FAIL
run time:   5 minutes 14.067 seconds


292 acked message did not make it to the Consumer. They are: 11264, 11265, 
11266, 11267, 11268, 11269, 11270, 11271, 11272, 11273, 11274, 11275, 11276, 
11277, 11278, 11279, 11280, 11281, 11282, 11283, ...plus 252 more. Total Acked: 
11343, Total Consumed: 11054. We validated that the first 272 of these missing 
messages correctly made it into Kafka's data files. This suggests they were 
lost on their way to the consumer.
Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
 line 106, in run_all_tests
data = self.run_single_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/venv/local/lib/python2.7/site-packages/ducktape/tests/runner.py",
 line 162, in run_single_test
return self.current_test_context.function(self.current_test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/venv/local/lib/python2.7/site-packages/ducktape/mark/_mark.py",
 line 331, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/tests/kafkatest/tests/core/zookeeper_security_upgrade_test.py",
 line 115, in test_zk_security_upgrade
self.run_produce_consume_validate(self.run_zk_migration)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/tests/kafkatest/tests/produce_consume_validate.py",
 line 79, in run_produce_consume_validate
raise e
AssertionError: 292 acked message did not make it to the Consumer. They are: 
11264, 11265, 11266, 11267, 11268, 11269, 11270, 11271, 11272, 11273, 11274, 
11275, 11276, 11277, 11278, 11279, 11280, 11281, 11282, 11283, ...plus 252 
more. Total Acked: 11343, Total Consumed: 11054. We validated that the first 
272 of these missing messages correctly made it into Kafka's data files. This 
suggests they were lost on their way to the consumer.
{code}





[GitHub] kafka pull request #1589: MINOR: Increase default `waitTime` in `waitUntilTr...

2016-07-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1589




Re: [DISCUSS] KIP-70: Revise Partition Assignment Semantics on New Consumer's Subscription Change

2016-07-22 Thread Vahid S Hashemian
Thanks Ismael.

What do you think is the best way to check with Storm / Spark users? Their 
mailing list?

Thanks.
 
Regards,
--Vahid 




From:   Ismael Juma 
To: dev@kafka.apache.org
Date:   07/22/2016 01:44 AM
Subject:Re: [DISCUSS] KIP-70: Revise Partition Assignment 
Semantics on New Consumer's Subscription Change
Sent by:isma...@gmail.com



Thanks for the KIP Vahid. The change makes sense. On the compatibility
front, could we check some of the advanced Kafka users like Storm and 
Spark
in order to verify if they would be affected?

Ismael

On Wed, Jul 20, 2016 at 1:55 AM, Vahid S Hashemian <
vahidhashem...@us.ibm.com> wrote:

> Hi all,
>
> We have started a new KIP under
>
> 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-70%3A+Revise+Partition+Assignment+Semantics+on+New+Consumer%27s+Subscription+Change

>
> Your feedback is much appreciated.
>
> Regards,
> Vahid Hashemian
>
>






[GitHub] kafka pull request #1648: KAFKA-3983 - Add additional information to debug

2016-07-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1648




[jira] [Commented] (KAFKA-3983) It would be helpful if SocketServer's Acceptors logged both the SocketChannel port and the processor ID upon registra

2016-07-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389844#comment-15389844
 ] 

ASF GitHub Bot commented on KAFKA-3983:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1648


> It would be helpful if SocketServer's Acceptors logged both the SocketChannel 
> port and the processor ID upon registra
> -
>
> Key: KAFKA-3983
> URL: https://issues.apache.org/jira/browse/KAFKA-3983
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Ryan P
>Assignee: Ryan P
>Priority: Minor
> Fix For: 0.10.0.1
>
>
> Currently Acceptors log the following message prior to registering the accepted 
> channel with a processor: 
> "Accepted connection from %s on %s [%d] sendBufferSize [actual|requested]: 
> [%d|%d] recvBufferSize [actual|requested]: [%d|%d]"
> It would be helpful to include the port number and the processor ID in this 
> message to aid debugging efforts, making it easier to track the amount of 
> time between acceptance and processing (connection configuration).





[jira] [Resolved] (KAFKA-3983) It would be helpful if SocketServer's Acceptors logged both the SocketChannel port and the processor ID upon registra

2016-07-22 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-3983.

Resolution: Fixed

> It would be helpful if SocketServer's Acceptors logged both the SocketChannel 
> port and the processor ID upon registra
> -
>
> Key: KAFKA-3983
> URL: https://issues.apache.org/jira/browse/KAFKA-3983
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Ryan P
>Assignee: Ryan P
>Priority: Minor
> Fix For: 0.10.0.1
>
>
> Currently Acceptors log the following message prior to registering the accepted 
> channel with a processor: 
> "Accepted connection from %s on %s [%d] sendBufferSize [actual|requested]: 
> [%d|%d] recvBufferSize [actual|requested]: [%d|%d]"
> It would be helpful to include the port number and the processor ID in this 
> message to aid debugging efforts, making it easier to track the amount of 
> time between acceptance and processing (connection configuration).





Re: [DISCUSS] KIP-70: Revise Partition Assignment Semantics on New Consumer's Subscription Change

2016-07-22 Thread Vahid S Hashemian
Thanks Jason / Ewen for your feedback.

I agree that this is more like a bug than anything else and should have 
little impact on the users.

Regards, 
--Vahid



From:   Ewen Cheslack-Postava 
To: dev@kafka.apache.org
Date:   07/21/2016 10:59 PM
Subject:Re: [DISCUSS] KIP-70: Revise Partition Assignment 
Semantics on New Consumer's Subscription Change



Agreed w/ Jason re: compatibility. It seems like such an edge case to
actually rely on this and I'd consider the current behavior essentially a
bug given how surprising it is. While normally a stickler for
compatibility, I think this is a case where it's fine to make the change.

-Ewen

On Wed, Jul 20, 2016 at 9:48 AM, Jason Gustafson  
wrote:

> Hey Vahid,
>
> Thanks for writing this up. This seems like a nice improvement over the
> existing somewhat surprising behavior. Currently if you have a consumer
> which changes subscriptions, then you will need to handle separately any
> cleanup for assigned partitions for topics which are no longer 
subscribed.
> With this change, the user can handle this exclusively in the
> onPartitionsRevoked() callback which seems less error prone. This also
> makes it unnecessary for us to do any special handling when autocommit 
is
> enabled since all partitions will still be assigned when we do the final
> offset commit prior to rebalancing. The main question mark in my mind is
> compatibility, but it seems unlikely that anyone depends on the current
> behavior. My hunch is that users probably expect it already works this 
way,
> so from that perspective, it's almost more of a bug fix.
>
> Thanks,
> Jason
>
> On Tue, Jul 19, 2016 at 5:55 PM, Vahid S Hashemian <
> vahidhashem...@us.ibm.com> wrote:
>
> > Hi all,
> >
> > We have started a new KIP under
> >
> >
> 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-70%3A+Revise+Partition+Assignment+Semantics+on+New+Consumer%27s+Subscription+Change

> >
> > Your feedback is much appreciated.
> >
> > Regards,
> > Vahid Hashemian
> >
> >
>



-- 
Thanks,
Ewen






[jira] [Commented] (KAFKA-2932) Adjust importance level of Kafka Connect configs

2016-07-22 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389872#comment-15389872
 ] 

Ewen Cheslack-Postava commented on KAFKA-2932:
--

[~cotedm] Yes please! Just FYI, almost everything marked with the component 
KafkaConnect that's assigned to me you can feel free to grab -- they get 
auto-assigned to me, but these days I don't have as much time to write patches.

> Adjust importance level of Kafka Connect configs
> 
>
> Key: KAFKA-2932
> URL: https://issues.apache.org/jira/browse/KAFKA-2932
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>
> Some of the configuration importance levels are out of whack, probably due to 
> the way they evolved over time. For example, the internal converter settings 
> are currently marked with high importance, but they are really an internal 
> implementation detail that the user usually shouldn't need to worry about.





[jira] [Assigned] (KAFKA-2932) Adjust importance level of Kafka Connect configs

2016-07-22 Thread Dustin Cote (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote reassigned KAFKA-2932:
--

Assignee: Dustin Cote  (was: Ewen Cheslack-Postava)

> Adjust importance level of Kafka Connect configs
> 
>
> Key: KAFKA-2932
> URL: https://issues.apache.org/jira/browse/KAFKA-2932
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Dustin Cote
>
> Some of the configuration importance levels are out of whack, probably due to 
> the way they evolved over time. For example, the internal converter settings 
> are currently marked with high importance, but they are really an internal 
> implementation detail that the user usually shouldn't need to worry about.





[GitHub] kafka pull request #1653: KAFKA-2932: Adjust importance level of Kafka Conne...

2016-07-22 Thread cotedm
GitHub user cotedm opened a pull request:

https://github.com/apache/kafka/pull/1653

KAFKA-2932: Adjust importance level of Kafka Connect configs

@ewencp I went down the list of connect configs and it looks like only the 
internal converter configs are mismarked.  It looks like the `cluster` config 
that is present in the current docs is already gone.  The only other values I 
could argue for changing the importance of are the ssl configs (marked high), 
but they match the producer/consumer config docs, so at least they are 
consistent.  Everything else marked high looks either mandatory or like it 
requires consideration in a production deployment to me.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cotedm/kafka KAFKA-2932

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1653.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1653


commit c3e076a75e1e714c2585e9b976f5b746d804eaa5
Author: Dustin Cote 
Date:   2016-07-22T18:01:39Z

lower importance rating for internal converter settings






[jira] [Commented] (KAFKA-2932) Adjust importance level of Kafka Connect configs

2016-07-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389961#comment-15389961
 ] 

ASF GitHub Bot commented on KAFKA-2932:
---

GitHub user cotedm opened a pull request:

https://github.com/apache/kafka/pull/1653

KAFKA-2932: Adjust importance level of Kafka Connect configs

@ewencp I went down the list of connect configs and it looks like only the 
internal converter configs are mismarked.  It looks like the `cluster` config 
that is present in the current docs is already gone.  The only other values I 
could argue for changing the importance of are the ssl configs (marked high), 
but they match the producer/consumer config docs, so at least they are 
consistent.  Everything else marked high looks either mandatory or like it 
requires consideration in a production deployment to me.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cotedm/kafka KAFKA-2932

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1653.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1653


commit c3e076a75e1e714c2585e9b976f5b746d804eaa5
Author: Dustin Cote 
Date:   2016-07-22T18:01:39Z

lower importance rating for internal converter settings




> Adjust importance level of Kafka Connect configs
> 
>
> Key: KAFKA-2932
> URL: https://issues.apache.org/jira/browse/KAFKA-2932
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Dustin Cote
>
> Some of the configuration importance levels are out of whack, probably due to 
> the way they evolved over time. For example, the internal converter settings 
> are currently marked with high importance, but they are really an internal 
> implementation detail that the user usually shouldn't need to worry about.





Jenkins build is back to normal : kafka-trunk-jdk8 #772

2016-07-22 Thread Apache Jenkins Server
See 



[GitHub] kafka pull request #1654: MINOR: Update MirrorMaker docs to remove multiple ...

2016-07-22 Thread ottomata
GitHub user ottomata opened a pull request:

https://github.com/apache/kafka/pull/1654

MINOR: Update MirrorMaker docs to remove multiple --consumer.config options

See:
- https://issues.apache.org/jira/browse/KAFKA-1650
- 
https://mail-archives.apache.org/mod_mbox/kafka-users/201512.mbox/%3ccahwhrruetq_-ehxiuxdrbghcrt-0e_t0+5koyaf9qy4anvq...@mail.gmail.com%3E

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ottomata/kafka mirror-maker-doc-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1654.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1654


commit 1ecc0a6491fdfdaf7a9bfa9611d5938599fdad3f
Author: Andrew Otto 
Date:   2016-07-22T19:03:44Z

Update MirrorMaker docs to remove multiple --consumer.config options

See:
- https://issues.apache.org/jira/browse/KAFKA-1650
- 
https://mail-archives.apache.org/mod_mbox/kafka-users/201512.mbox/%3ccahwhrruetq_-ehxiuxdrbghcrt-0e_t0+5koyaf9qy4anvq...@mail.gmail.com%3E






[jira] [Commented] (KAFKA-1650) Mirror Maker could lose data on unclean shutdown.

2016-07-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390055#comment-15390055
 ] 

ASF GitHub Bot commented on KAFKA-1650:
---

GitHub user ottomata opened a pull request:

https://github.com/apache/kafka/pull/1654

MINOR: Update MirrorMaker docs to remove multiple --consumer.config options

See:
- https://issues.apache.org/jira/browse/KAFKA-1650
- 
https://mail-archives.apache.org/mod_mbox/kafka-users/201512.mbox/%3ccahwhrruetq_-ehxiuxdrbghcrt-0e_t0+5koyaf9qy4anvq...@mail.gmail.com%3E

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ottomata/kafka mirror-maker-doc-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1654.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1654


commit 1ecc0a6491fdfdaf7a9bfa9611d5938599fdad3f
Author: Andrew Otto 
Date:   2016-07-22T19:03:44Z

Update MirrorMaker docs to remove multiple --consumer.config options

See:
- https://issues.apache.org/jira/browse/KAFKA-1650
- 
https://mail-archives.apache.org/mod_mbox/kafka-users/201512.mbox/%3ccahwhrruetq_-ehxiuxdrbghcrt-0e_t0+5koyaf9qy4anvq...@mail.gmail.com%3E




> Mirror Maker could lose data on unclean shutdown.
> -
>
> Key: KAFKA-1650
> URL: https://issues.apache.org/jira/browse/KAFKA-1650
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Fix For: 0.9.0.0
>
> Attachments: KAFKA-1650.patch, KAFKA-1650_2014-10-06_10:17:46.patch, 
> KAFKA-1650_2014-11-12_09:51:30.patch, KAFKA-1650_2014-11-17_18:44:37.patch, 
> KAFKA-1650_2014-11-20_12:00:16.patch, KAFKA-1650_2014-11-24_08:15:17.patch, 
> KAFKA-1650_2014-12-03_15:02:31.patch, KAFKA-1650_2014-12-03_19:02:13.patch, 
> KAFKA-1650_2014-12-04_11:59:07.patch, KAFKA-1650_2014-12-06_18:58:57.patch, 
> KAFKA-1650_2014-12-08_01:36:01.patch, KAFKA-1650_2014-12-16_08:03:45.patch, 
> KAFKA-1650_2014-12-17_12:29:23.patch, KAFKA-1650_2014-12-18_18:48:18.patch, 
> KAFKA-1650_2014-12-18_22:17:08.patch, KAFKA-1650_2014-12-18_22:53:26.patch, 
> KAFKA-1650_2014-12-18_23:41:16.patch, KAFKA-1650_2014-12-22_19:07:24.patch, 
> KAFKA-1650_2014-12-23_07:04:28.patch, KAFKA-1650_2014-12-23_16:44:06.patch
>
>
> Currently if mirror maker is shut down uncleanly, the data in the data 
> channel and buffer could potentially be lost. With the new producer's 
> callback, this issue could be solved.





Build failed in Jenkins: kafka-trunk-jdk7 #1437

2016-07-22 Thread Apache Jenkins Server
See 

Changes:

[ismael] MINOR: Increase default `waitTime` in `waitUntilTrue` to 15 seconds

[ismael] KAFKA-3983; Add additional information to Acceptor debug message

--
[...truncated 6752 lines...]

kafka.coordinator.GroupMetadataManagerTest > testStoreNonEmptyGroup PASSED

kafka.coordinator.GroupMetadataManagerTest > testExpireGroup STARTED

kafka.coordinator.GroupMetadataManagerTest > testExpireGroup PASSED

kafka.coordinator.GroupMetadataManagerTest > testAddGroup STARTED

kafka.coordinator.GroupMetadataManagerTest > testAddGroup PASSED

kafka.coordinator.GroupMetadataManagerTest > testCommitOffset STARTED

kafka.coordinator.GroupMetadataManagerTest > testCommitOffset PASSED

kafka.coordinator.GroupMetadataManagerTest > testCommitOffsetFailure STARTED

kafka.coordinator.GroupMetadataManagerTest > testCommitOffsetFailure PASSED

kafka.coordinator.GroupMetadataManagerTest > testExpireOffset STARTED

kafka.coordinator.GroupMetadataManagerTest > testExpireOffset PASSED

kafka.coordinator.GroupMetadataManagerTest > testExpireOffsetsWithActiveGroup 
STARTED

kafka.coordinator.GroupMetadataManagerTest > testExpireOffsetsWithActiveGroup 
PASSED

kafka.coordinator.GroupMetadataManagerTest > testStoreEmptyGroup STARTED

kafka.coordinator.GroupMetadataManagerTest > testStoreEmptyGroup PASSED

kafka.coordinator.MemberMetadataTest > testMatchesSupportedProtocols STARTED

kafka.coordinator.MemberMetadataTest > testMatchesSupportedProtocols PASSED

kafka.coordinator.MemberMetadataTest > testMetadata STARTED

kafka.coordinator.MemberMetadataTest > testMetadata PASSED

kafka.coordinator.MemberMetadataTest > testMetadataRaisesOnUnsupportedProtocol 
STARTED

kafka.coordinator.MemberMetadataTest > testMetadataRaisesOnUnsupportedProtocol 
PASSED

kafka.coordinator.MemberMetadataTest > testVoteForPreferredProtocol STARTED

kafka.coordinator.MemberMetadataTest > testVoteForPreferredProtocol PASSED

kafka.coordinator.MemberMetadataTest > testVoteRaisesOnNoSupportedProtocols 
STARTED

kafka.coordinator.MemberMetadataTest > testVoteRaisesOnNoSupportedProtocols 
PASSED

kafka.coordinator.GroupMetadataTest > testDeadToAwaitingSyncIllegalTransition 
STARTED

kafka.coordinator.GroupMetadataTest > testDeadToAwaitingSyncIllegalTransition 
PASSED

kafka.coordinator.GroupMetadataTest > testOffsetCommitFailure STARTED

kafka.coordinator.GroupMetadataTest > testOffsetCommitFailure PASSED

kafka.coordinator.GroupMetadataTest > 
testPreparingRebalanceToStableIllegalTransition STARTED

kafka.coordinator.GroupMetadataTest > 
testPreparingRebalanceToStableIllegalTransition PASSED

kafka.coordinator.GroupMetadataTest > testStableToDeadTransition STARTED

kafka.coordinator.GroupMetadataTest > testStableToDeadTransition PASSED

kafka.coordinator.GroupMetadataTest > testInitNextGenerationEmptyGroup STARTED

kafka.coordinator.GroupMetadataTest > testInitNextGenerationEmptyGroup PASSED

kafka.coordinator.GroupMetadataTest > testCannotRebalanceWhenDead STARTED

kafka.coordinator.GroupMetadataTest > testCannotRebalanceWhenDead PASSED

kafka.coordinator.GroupMetadataTest > testInitNextGeneration STARTED

kafka.coordinator.GroupMetadataTest > testInitNextGeneration PASSED

kafka.coordinator.GroupMetadataTest > testPreparingRebalanceToEmptyTransition 
STARTED

kafka.coordinator.GroupMetadataTest > testPreparingRebalanceToEmptyTransition 
PASSED

kafka.coordinator.GroupMetadataTest > testSelectProtocol STARTED

kafka.coordinator.GroupMetadataTest > testSelectProtocol PASSED

kafka.coordinator.GroupMetadataTest > testCannotRebalanceWhenPreparingRebalance 
STARTED

kafka.coordinator.GroupMetadataTest > testCannotRebalanceWhenPreparingRebalance 
PASSED

kafka.coordinator.GroupMetadataTest > 
testDeadToPreparingRebalanceIllegalTransition STARTED

kafka.coordinator.GroupMetadataTest > 
testDeadToPreparingRebalanceIllegalTransition PASSED

kafka.coordinator.GroupMetadataTest > testCanRebalanceWhenAwaitingSync STARTED

kafka.coordinator.GroupMetadataTest > testCanRebalanceWhenAwaitingSync PASSED

kafka.coordinator.GroupMetadataTest > 
testAwaitingSyncToPreparingRebalanceTransition STARTED

kafka.coordinator.GroupMetadataTest > 
testAwaitingSyncToPreparingRebalanceTransition PASSED

kafka.coordinator.GroupMetadataTest > testStableToAwaitingSyncIllegalTransition 
STARTED

kafka.coordinator.GroupMetadataTest > testStableToAwaitingSyncIllegalTransition 
PASSED

kafka.coordinator.GroupMetadataTest > testEmptyToDeadTransition STARTED

kafka.coordinator.GroupMetadataTest > testEmptyToDeadTransition PASSED

kafka.coordinator.GroupMetadataTest > testSelectProtocolRaisesIfNoMembers 
STARTED

kafka.coordinator.GroupMetadataTest > testSelectProtocolRaisesIfNoMembers PASSED

kafka.coordinator.GroupMetadataTest > testStableToPreparingRebalanceTransition 
STARTED

kafka.coordinator.GroupMetadataTest > testStableToPreparingRebalanceTransition 
PASSED

kafka.coordinator.GroupM

Re: [DISCUSS] KIP-70: Revise Partition Assignment Semantics on New Consumer's Subscription Change

2016-07-22 Thread Dana Powers
This is a nice change. Great KIP write up.

-Dana

On Fri, Jul 22, 2016 at 10:07 AM, Vahid S Hashemian
 wrote:
> Thanks Ismael.
>
> What do you think is the best way to check with Storm / Spark users? Their
> mailing list?
>
> Thanks.
>
> Regards,
> --Vahid
>
>
>
>
> From:   Ismael Juma 
> To: dev@kafka.apache.org
> Date:   07/22/2016 01:44 AM
> Subject:Re: [DISCUSS] KIP-70: Revise Partition Assignment
> Semantics on New Consumer's Subscription Change
> Sent by:isma...@gmail.com
>
>
>
> Thanks for the KIP Vahid. The change makes sense. On the compatibility
> front, could we check some of the advanced Kafka users like Storm and
> Spark
> in order to verify if they would be affected?
>
> Ismael
>
> On Wed, Jul 20, 2016 at 1:55 AM, Vahid S Hashemian <
> vahidhashem...@us.ibm.com> wrote:
>
>> Hi all,
>>
>> We have started a new KIP under
>>
>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-70%3A+Revise+Partition+Assignment+Semantics+on+New+Consumer%27s+Subscription+Change
>
>>
>> Your feedback is much appreciated.
>>
>> Regards,
>> Vahid Hashemian
>>
>>
>
>
>
>


[jira] [Work started] (KAFKA-3777) Extract the existing LRU cache out of RocksDBStore

2016-07-22 Thread Anna Povzner (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-3777 started by Anna Povzner.
---
> Extract the existing LRU cache out of RocksDBStore
> --
>
> Key: KAFKA-3777
> URL: https://issues.apache.org/jira/browse/KAFKA-3777
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Eno Thereska
>Assignee: Anna Povzner
> Fix For: 0.10.1.0
>
>
> The LRU cache currently lives inside the RocksDbStore class. As part of 
> KAFKA-3776 it needs to be moved out of RocksDbStore and become a separate 
> component used in:
> 1. KGroupedStream.aggregate() / reduce(), 
> 2. KStream.aggregateByKey() / reduceByKey(),
> 3. KTable.to() (this will be done in KAFKA-3779).
> All of the above operators can then have a cache on top to deduplicate writes 
> to the materialized state store in RocksDB.
> The scope of this JIRA is to extract the cache out of RocksDBStore and apply 
> it to items 1) and 2) above; this should be done together with, or after, 
> KAFKA-3780.
> Note it is NOT in the scope of this JIRA to re-write the cache, so this will 
> basically stay the same record-based cache we currently have.
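
For illustration, a minimal sketch of what a standalone record-based LRU cache
component could look like, built on java.util.LinkedHashMap's access-order
mode. The class and method names are hypothetical, not the actual Kafka
Streams API:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Hypothetical standalone record-based LRU cache, no longer tied to the
    // store implementation. LinkedHashMap with accessOrder=true evicts the
    // least-recently-used entry once maxEntries is exceeded.
    public class RecordLruCache<K, V> {

        private final Map<K, V> cache;

        public RecordLruCache(final int maxEntries) {
            this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(final Map.Entry<K, V> eldest) {
                    return size() > maxEntries;
                }
            };
        }

        public V get(final K key) {
            return cache.get(key);
        }

        // Deduplication: repeated writes to the same key overwrite each other
        // here, so only the latest value needs to reach the RocksDB store.
        public void put(final K key, final V value) {
            cache.put(key, value);
        }

        public int size() {
            return cache.size();
        }
    }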





[jira] [Commented] (KAFKA-3982) Issue with processing order of consumer properties in console consumer

2016-07-22 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390181#comment-15390181 ]

ASF GitHub Bot commented on KAFKA-3982:
---

GitHub user vahidhashemian opened a pull request:

https://github.com/apache/kafka/pull/1655

KAFKA-3982: Fix processing order of some of the consumer properties

This PR updates the processing of the console consumer's input properties.

For both the old and new consumer, the value provided for `auto.offset.reset` 
indirectly through the `consumer.config` or `consumer.property` arguments will 
now take effect.
For the new consumer's `key.deserializer` and `value.deserializer` properties, 
the precedence order is fixed: first the value directly provided as an 
argument, then the value provided indirectly via `consumer.property`, then 
`consumer.config`, and finally the default value.
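
To make the intended precedence concrete, here is a minimal sketch (not the
actual ConsoleConsumer code; the merge helper is hypothetical) of combining
the property sources from highest to lowest precedence, assuming each has
already been parsed into a java.util.Properties object:

    import java.util.Properties;

    public class PrecedenceDemo {

        // Hypothetical helper: sources are listed from highest to lowest
        // precedence, and a later source wins only where all earlier ones
        // are silent.
        static Properties merge(final Properties... sources) {
            final Properties merged = new Properties();
            for (final Properties source : sources) {
                for (final String name : source.stringPropertyNames()) {
                    merged.putIfAbsent(name, source.getProperty(name));
                }
            }
            return merged;
        }

        public static void main(final String[] args) {
            final Properties directArgs = new Properties();        // direct arguments
            final Properties consumerProperty = new Properties();  // --consumer-property
            final Properties consumerConfig = new Properties();    // --consumer.config file
            final Properties defaults = new Properties();
            defaults.setProperty("auto.offset.reset", "latest");
            consumerProperty.setProperty("auto.offset.reset", "none");

            final Properties effective =
                merge(directArgs, consumerProperty, consumerConfig, defaults);
            // Prints "none": the consumer.property value now takes effect
            // instead of being silently replaced by the default.
            System.out.println(effective.getProperty("auto.offset.reset"));
        }
    }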

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vahidhashemian/kafka KAFKA-3982

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1655.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1655


commit 6e112c8d097d55cbd5cd6dd81077180b89f175e3
Author: Vahid Hashemian 
Date:   2016-07-22T20:48:59Z

KAFKA-3982: Fix processing order of some of the consumer properties

This PR updates the processing of the console consumer's input properties.

For both the old and new consumer, the value provided for `auto.offset.reset` 
indirectly through the `consumer.config` or `consumer.property` arguments will 
now take effect.
For the new consumer's `key.deserializer` and `value.deserializer` properties, 
the precedence order is fixed: first the value directly provided as an 
argument, then the value provided indirectly via `consumer.property`, then 
`consumer.config`, and finally the default value.




> Issue with processing order of consumer properties in console consumer
> --
>
> Key: KAFKA-3982
> URL: https://issues.apache.org/jira/browse/KAFKA-3982
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>Priority: Minor
>
> With the recent introduction of the {{consumer.property}} argument in the 
> console consumer, both the new and old consumer could overwrite certain 
> properties provided using this new argument.
> Specifically, the old consumer would overwrite the values provided for 
> [{{auto.offset.reset}}|https://github.com/apache/kafka/blob/10bbffd75439e10fe9db6cf0aa48a7da7e386ef3/core/src/main/scala/kafka/tools/ConsoleConsumer.scala#L173]
>  and 
> [{{zookeeper.connect}}|https://github.com/apache/kafka/blob/10bbffd75439e10fe9db6cf0aa48a7da7e386ef3/core/src/main/scala/kafka/tools/ConsoleConsumer.scala#L174],
>  and the new consumer would overwrite the values provided for 
> [{{auto.offset.reset}}|https://github.com/apache/kafka/blob/10bbffd75439e10fe9db6cf0aa48a7da7e386ef3/core/src/main/scala/kafka/tools/ConsoleConsumer.scala#L196],
>  
> [{{bootstrap.servers}}|https://github.com/apache/kafka/blob/10bbffd75439e10fe9db6cf0aa48a7da7e386ef3/core/src/main/scala/kafka/tools/ConsoleConsumer.scala#L197],
>  
> [{{key.deserializer}}|https://github.com/apache/kafka/blob/10bbffd75439e10fe9db6cf0aa48a7da7e386ef3/core/src/main/scala/kafka/tools/ConsoleConsumer.scala#L198],
>  and 
> [{{value.deserializer}}|https://github.com/apache/kafka/blob/10bbffd75439e10fe9db6cf0aa48a7da7e386ef3/core/src/main/scala/kafka/tools/ConsoleConsumer.scala#L199].
> For example, when running the old consumer as {{bin/kafka-console-consumer.sh 
> --zookeeper localhost:2181 --topic foo --consumer-property 
> auto.offset.reset=none}}, the value that is eventually selected for 
> {{auto.offset.reset}} will be {{largest}}, overwriting what the user provided 
> on the command line.
> This seems to be because the properties provided via the {{consumer.property}} 
> argument are not considered when finalizing the configuration of the consumer.
> Some properties can now be provided in three different places (directly on 
> the command line, via the {{consumer.property}} argument, or via the 
> {{consumer.config}} argument, in that order of precedence).






Changing hash algorithm to LogCleaner offset map

2016-07-22 Thread Luciano Afranllie
Hi

We are evaluating changing the hash algorithm used by the SkimpyOffsetMap
in the LogCleaner from MD5 to SHA-1.

Besides the performance impact (more memory, more CPU usage), is there
anything else that may be affected?

Regards
Luciano


[jira] [Updated] (KAFKA-3982) Issue with processing order of consumer properties in console consumer

2016-07-22 Thread Vahid Hashemian (JIRA)

 [ https://issues.apache.org/jira/browse/KAFKA-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vahid Hashemian updated KAFKA-3982:
---
Status: Patch Available  (was: Open)

> Issue with processing order of consumer properties in console consumer
> --
>
> Key: KAFKA-3982
> URL: https://issues.apache.org/jira/browse/KAFKA-3982
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>Priority: Minor
>
> With the recent introduction of the {{consumer.property}} argument in the 
> console consumer, both the new and old consumer could overwrite certain 
> properties provided using this new argument.
> Specifically, the old consumer would overwrite the values provided for 
> [{{auto.offset.reset}}|https://github.com/apache/kafka/blob/10bbffd75439e10fe9db6cf0aa48a7da7e386ef3/core/src/main/scala/kafka/tools/ConsoleConsumer.scala#L173]
>  and 
> [{{zookeeper.connect}}|https://github.com/apache/kafka/blob/10bbffd75439e10fe9db6cf0aa48a7da7e386ef3/core/src/main/scala/kafka/tools/ConsoleConsumer.scala#L174],
>  and the new consumer would overwrite the values provided for 
> [{{auto.offset.reset}}|https://github.com/apache/kafka/blob/10bbffd75439e10fe9db6cf0aa48a7da7e386ef3/core/src/main/scala/kafka/tools/ConsoleConsumer.scala#L196],
>  
> [{{bootstrap.servers}}|https://github.com/apache/kafka/blob/10bbffd75439e10fe9db6cf0aa48a7da7e386ef3/core/src/main/scala/kafka/tools/ConsoleConsumer.scala#L197],
>  
> [{{key.deserializer}}|https://github.com/apache/kafka/blob/10bbffd75439e10fe9db6cf0aa48a7da7e386ef3/core/src/main/scala/kafka/tools/ConsoleConsumer.scala#L198],
>  and 
> [{{value.deserializer}}|https://github.com/apache/kafka/blob/10bbffd75439e10fe9db6cf0aa48a7da7e386ef3/core/src/main/scala/kafka/tools/ConsoleConsumer.scala#L199].
> For example, when running the old consumer as {{bin/kafka-console-consumer.sh 
> --zookeeper localhost:2181 --topic foo --consumer-property 
> auto.offset.reset=none}}, the value that is eventually selected for 
> {{auto.offset.reset}} will be {{largest}}, overwriting what the user provided 
> on the command line.
> This seems to be because the properties provided via the {{consumer.property}} 
> argument are not considered when finalizing the configuration of the consumer.
> Some properties can now be provided in three different places (directly on 
> the command line, via the {{consumer.property}} argument, or via the 
> {{consumer.config}} argument, in that order of precedence).





[jira] [Work started] (KAFKA-3973) Investigate feasibility of caching bytes vs. records

2016-07-22 Thread Bill Bejeck (JIRA)

 [ https://issues.apache.org/jira/browse/KAFKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on KAFKA-3973 started by Bill Bejeck.
--
> Investigate feasibility of caching bytes vs. records
> 
>
> Key: KAFKA-3973
> URL: https://issues.apache.org/jira/browse/KAFKA-3973
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Eno Thereska
>Assignee: Bill Bejeck
> Fix For: 0.10.1.0
>
>
> Currently the cache stores and accounts for records, not bytes or objects. 
> This investigation is about measuring any performance overheads that come 
> from storing bytes or objects. As an outcome we should know whether 1) we 
> should store bytes or 2) we should store objects.
> If we store objects, the cache still needs to know their size (so that it can 
> tell whether an object fits in the allocated cache space; e.g., if the cache 
> is 100MB and the object is 10MB, we'd have space for 10 such objects). The 
> investigation needs to figure out how to determine the size of an object 
> efficiently in Java.
> If we store bytes, then we are serialising an object into bytes before 
> caching it, i.e., we take a serialisation cost. The investigation needs to 
> measure how large this cost can be, especially for the case when all objects 
> fit in the cache (and thus any extra serialisation cost would show).
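
As one possible starting point, a rough sketch (not part of this JIRA) of
timing the serialisation cost incurred when caching bytes, using plain JDK
serialisation as a stand-in for whatever serde would actually be configured:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;

    public class SerializationCostDemo {

        // Hypothetical record type standing in for a cached value.
        static class Record implements Serializable {
            final long key;
            final String value;
            Record(final long key, final String value) {
                this.key = key;
                this.value = value;
            }
        }

        static byte[] serialize(final Object o) throws IOException {
            final ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(o);
            }
            return bos.toByteArray();
        }

        public static void main(final String[] args) throws IOException {
            final int iterations = 100_000;
            long totalBytes = 0;
            final long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                totalBytes += serialize(new Record(i, "value-" + i)).length;
            }
            final long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // If all objects fit in the cache, this cost is pure overhead
            // compared with caching the objects themselves.
            System.out.println(iterations + " records, " + totalBytes
                    + " serialized bytes, " + elapsedMs + " ms");
        }
    }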






[jira] [Commented] (KAFKA-3977) KafkaConsumer swallows exceptions raised from message deserializers

2016-07-22 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-3977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390218#comment-15390218 ]

ASF GitHub Bot commented on KAFKA-3977:
---

GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/1656

KAFKA-3977: Defer fetch parsing for space efficiency and to ensure 
exceptions are raised to the user



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka KAFKA-3977

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1656.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1656


commit 0fa9dfa48826e5cf423f6329ec5c9e0f4e4b2673
Author: Jason Gustafson 
Date:   2016-07-22T19:30:12Z

KAFKA-3977: Defer fetch parsing for space efficiency and to ensure 
exceptions are raised to the user




> KafkaConsumer swallows exceptions raised from message deserializers
> ---
>
> Key: KAFKA-3977
> URL: https://issues.apache.org/jira/browse/KAFKA-3977
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Blocker
> Fix For: 0.10.0.1
>
>
> Message deserialization is currently done in the FetchResponse handler, which 
> is executed by NetworkClient. Unfortunately, this means that any exceptions 
> raised by the deserializer will be eaten by NetworkClient and not raised to 
> users. This will be fixed (if unintentionally) in KAFKA-3888, but we should 
> make sure that it is also fixed in 0.9.0 and 0.10.0.
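
For illustration, a minimal sketch of the deferred-parsing idea (the names are
hypothetical and this is not the actual patch): keep the raw bytes when the
fetch response arrives, and only run the deserializer on the caller's thread
in poll(), so any exception propagates to the user:

    import java.util.ArrayDeque;
    import java.util.Queue;
    import java.util.function.Function;

    public class DeferredFetchDemo {

        // Raw record bytes as received in the fetch response; no parsing yet,
        // which also avoids holding a fully parsed copy in memory.
        static class RawRecord {
            final byte[] value;
            RawRecord(final byte[] value) {
                this.value = value;
            }
        }

        private final Queue<RawRecord> completedFetches = new ArrayDeque<>();
        private final Function<byte[], String> deserializer;

        DeferredFetchDemo(final Function<byte[], String> deserializer) {
            this.deserializer = deserializer;
        }

        // Network-thread path: just buffer the raw bytes. Nothing here can
        // throw a deserialization error, so nothing gets swallowed by the
        // response handler.
        void onFetchResponse(final byte[] rawValue) {
            completedFetches.add(new RawRecord(rawValue));
        }

        // User-facing poll(): deserialization runs on the caller's thread,
        // so a failure is thrown directly to the user instead of being eaten.
        String poll() {
            final RawRecord raw = completedFetches.poll();
            return raw == null ? null : deserializer.apply(raw.value);
        }
    }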





Re: Changing hash algorithm to LogCleaner offset map

2016-07-22 Thread Luciano Afranllie
A little bit of background first.

We are trying to make a deployment of Kafka that is FIPS 140-2 (
https://en.wikipedia.org/wiki/FIPS_140-2) compliant, and one of the
requirements is not to use MD5.

As far as we could see, Kafka uses MD5 only to hash message keys in an
offset map (SkimpyOffsetMap) used by the log cleaner. So we are planning
to change the hash algorithm to something allowed by FIPS.

With this in mind, we think it would be great to add a config property
LogCleanerHashAlgorithmProp = "log.cleaner.hash.algorithm" with a default
value of "MD5" and use it in the constructor of CleanerConfig. That way,
future versions of Kafka could simply change the value of this property.
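
For illustration, a minimal sketch of the idea (the config lookup is a
hypothetical stand-in, not the actual CleanerConfig wiring): MessageDigest
already resolves algorithms by name, so only the configured string needs to be
threaded through:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public class LogCleanerHashDemo {
        public static void main(final String[] args) throws NoSuchAlgorithmException {
            // Hypothetical stand-in for reading "log.cleaner.hash.algorithm"
            // from the broker config; "MD5" stays the default, and a FIPS
            // deployment would set e.g. "SHA-1".
            final String algorithm =
                System.getProperty("log.cleaner.hash.algorithm", "MD5");

            final MessageDigest digest = MessageDigest.getInstance(algorithm);
            final byte[] hash =
                digest.digest("message-key".getBytes(StandardCharsets.UTF_8));

            // The digest length grows with the algorithm (16 bytes for MD5, 20
            // for SHA-1), so fewer entries fit in the same offset map memory.
            System.out.println(algorithm + " -> " + hash.length + " bytes per key");
        }
    }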

Please let me know if you are OK with this change.
Is it enough to create a pull request for this, or should I create a JIRA
first?

Regards
Luciano

On Fri, Jul 22, 2016 at 5:58 PM, Luciano Afranllie  wrote:

> Hi
>
> We are evaluating changing the hash algorithm used by the SkimpyOffsetMap
> in the LogCleaner from MD5 to SHA-1.
>
> Besides the performance impact (more memory, more CPU usage), is there
> anything else that may be affected?
>
> Regards
> Luciano
>


Re: Changing hash algorithm to LogCleaner offset map

2016-07-22 Thread Shikhar Bhushan
Not sure I understand the motivation to use a FIPS-compliant hash function
for log compaction -- what are the security ramifications?

On Fri, Jul 22, 2016 at 2:56 PM Luciano Afranllie 
wrote:

> A little bit of background first.
>
> We are trying to make a deployment of Kafka that is FIPS 140-2 (
> https://en.wikipedia.org/wiki/FIPS_140-2) compliant, and one of the
> requirements is not to use MD5.
>
> As far as we could see, Kafka uses MD5 only to hash message keys in an
> offset map (SkimpyOffsetMap) used by the log cleaner. So we are planning
> to change the hash algorithm to something allowed by FIPS.
>
> With this in mind, we think it would be great to add a config property
> LogCleanerHashAlgorithmProp = "log.cleaner.hash.algorithm" with a default
> value of "MD5" and use it in the constructor of CleanerConfig. That way,
> future versions of Kafka could simply change the value of this property.
>
> Please let me know if you are OK with this change.
> Is it enough to create a pull request for this, or should I create a JIRA
> first?
>
> Regards
> Luciano
>
> On Fri, Jul 22, 2016 at 5:58 PM, Luciano Afranllie <
> listas.luaf...@gmail.com
> > wrote:
>
> > Hi
> >
> > We are evaluating changing the hash algorithm used by the SkimpyOffsetMap
> > in the LogCleaner from MD5 to SHA-1.
> >
> > Besides the performance impact (more memory, more CPU usage), is there
> > anything else that may be affected?
> >
> > Regards
> > Luciano
> >
>