[GitHub] kafka pull request #2703: KAFKA-4772: Use peek to implement print

2017-03-18 Thread backender
GitHub user backender opened a pull request:

https://github.com/apache/kafka/pull/2703

KAFKA-4772: Use peek to implement print

**Tackles [KAFKA-4772](https://issues.apache.org/jira/browse/KAFKA-4772) 
and was previously discussed in https://github.com/apache/kafka/pull/2669**

The functionality of `KeyValuePrinter` is replaced with `PrintAction` which 
implements `ForeachAction` and can be passed to `KStreamPeek`. We therefore can 
get rid of `KeyValuePrinter`.
My only concern is that `ForeachAction` does not have access to the 
`ProcessorContext`, which means, that the topic name cannot be used within the 
deserialization process. For now, the a null value is filled in as a 
placeholder for topic.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/delftswa2017/kafka fix-KAFKA-4772-notopic

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2703.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2703


commit 46d5f20eb1ee69abc24b05f890eca620d2399b17
Author: Marc Juchli 
Date:   2017-03-18T14:00:59Z

Use peek to implement print

The functionaliy of KeyValuePrinter is replaced with PrintAction which
implements ForeachAction and can be passed to KStreamPeek.
We therefore can get rid of KeyValuePrinter.
My only concern is that ForeachAction does not have access to the
ProcessorContext, which means, that the topic name cannot be used within
the deserialization process.
For now, the a null value is filled in as a placeholder for topic.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4772) Exploit #peek to implement #print() and other methods

2017-03-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15931227#comment-15931227
 ] 

ASF GitHub Bot commented on KAFKA-4772:
---

GitHub user backender opened a pull request:

https://github.com/apache/kafka/pull/2703

KAFKA-4772: Use peek to implement print

**Tackles [KAFKA-4772](https://issues.apache.org/jira/browse/KAFKA-4772) 
and was previously discussed in https://github.com/apache/kafka/pull/2669**

The functionality of `KeyValuePrinter` is replaced with `PrintAction` which 
implements `ForeachAction` and can be passed to `KStreamPeek`. We therefore can 
get rid of `KeyValuePrinter`.
My only concern is that `ForeachAction` does not have access to the 
`ProcessorContext`, which means, that the topic name cannot be used within the 
deserialization process. For now, the a null value is filled in as a 
placeholder for topic.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/delftswa2017/kafka fix-KAFKA-4772-notopic

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2703.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2703


commit 46d5f20eb1ee69abc24b05f890eca620d2399b17
Author: Marc Juchli 
Date:   2017-03-18T14:00:59Z

Use peek to implement print

The functionaliy of KeyValuePrinter is replaced with PrintAction which
implements ForeachAction and can be passed to KStreamPeek.
We therefore can get rid of KeyValuePrinter.
My only concern is that ForeachAction does not have access to the
ProcessorContext, which means, that the topic name cannot be used within
the deserialization process.
For now, the a null value is filled in as a placeholder for topic.




> Exploit #peek to implement #print() and other methods
> -
>
> Key: KAFKA-4772
> URL: https://issues.apache.org/jira/browse/KAFKA-4772
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Priority: Minor
>  Labels: beginner, newbie
>
> From: https://github.com/apache/kafka/pull/2493#pullrequestreview-22157555
> Things that I can think of:
> - print / writeAsTest can be a special impl of peek; KStreamPrint etc can be 
> removed.
> - consider collapse KStreamPeek with KStreamForeach with a flag parameter 
> indicating if the acted key-value pair should still be forwarded.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4772) Exploit #peek to implement #print() and other methods

2017-03-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15931229#comment-15931229
 ] 

ASF GitHub Bot commented on KAFKA-4772:
---

GitHub user backender opened a pull request:

https://github.com/apache/kafka/pull/2704

KAFKA-4772: [WIP] Use KStreamPeek to replace KeyValuePrinter

**Alternative to: https://github.com/apache/kafka/pull/2703 and serves as a 
reference for discussion.**
Tackles [KAFKA-4772](https://issues.apache.org/jira/browse/KAFKA-4772) and 
was previously discussed in https://github.com/apache/kafka/pull/2669

This PR contains only a slight improvement over the current 
`KeyValuePrinter`. The main functionaliy of the printer was refactored such 
that ForeachAction can be used. This would allow to further refactor 
KeyValuePrinter as soon as ForeachAction accepts a ProcessorContext as an 
argument, which is required to retrieve the topic name.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/delftswa2017/kafka fix-KAFKA-4772

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2704.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2704


commit 4963d01a4670d8907d5d7453ccf244da94cc4a2c
Author: Marc Juchli 
Date:   2017-03-18T12:22:41Z

Use KStreamPeek to implement KeyValuePrinter

This PR contains only a slight improvement over the current
KeyValuePrinter. The main functionaliy of the printer was refactored
such that ForeachAction is being used. This would allow to further
refactor KeyValuePrinter as soon as ForeachAction accepts a
ProcessorContext as an argument.




> Exploit #peek to implement #print() and other methods
> -
>
> Key: KAFKA-4772
> URL: https://issues.apache.org/jira/browse/KAFKA-4772
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Priority: Minor
>  Labels: beginner, newbie
>
> From: https://github.com/apache/kafka/pull/2493#pullrequestreview-22157555
> Things that I can think of:
> - print / writeAsTest can be a special impl of peek; KStreamPrint etc can be 
> removed.
> - consider collapse KStreamPeek with KStreamForeach with a flag parameter 
> indicating if the acted key-value pair should still be forwarded.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2704: KAFKA-4772: [WIP] Use KStreamPeek to replace KeyVa...

2017-03-18 Thread backender
GitHub user backender opened a pull request:

https://github.com/apache/kafka/pull/2704

KAFKA-4772: [WIP] Use KStreamPeek to replace KeyValuePrinter

**Alternative to: https://github.com/apache/kafka/pull/2703 and serves as a 
reference for discussion.**
Tackles [KAFKA-4772](https://issues.apache.org/jira/browse/KAFKA-4772) and 
was previously discussed in https://github.com/apache/kafka/pull/2669

This PR contains only a slight improvement over the current 
`KeyValuePrinter`. The main functionaliy of the printer was refactored such 
that ForeachAction can be used. This would allow to further refactor 
KeyValuePrinter as soon as ForeachAction accepts a ProcessorContext as an 
argument, which is required to retrieve the topic name.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/delftswa2017/kafka fix-KAFKA-4772

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2704.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2704


commit 4963d01a4670d8907d5d7453ccf244da94cc4a2c
Author: Marc Juchli 
Date:   2017-03-18T12:22:41Z

Use KStreamPeek to implement KeyValuePrinter

This PR contains only a slight improvement over the current
KeyValuePrinter. The main functionaliy of the printer was refactored
such that ForeachAction is being used. This would allow to further
refactor KeyValuePrinter as soon as ForeachAction accepts a
ProcessorContext as an argument.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-132: Augment KStream.print to allow extra parameters in the printed string

2017-03-18 Thread Marc Juchli
Thanks!

I wanted to PING this thread. Not sure what the next steps of the KIP
process are?

Kind regards,
Marc

On Wed, Mar 15, 2017 at 9:13 PM Matthias J. Sax 
wrote:

> Thanks for updating the KIP.
>
> It's in very good shape IMHO and I support this idea!
>
>
>
> -Matthias
>
>
> On 3/15/17 3:05 AM, Marc Juchli wrote:
> > Dear Matthias,
> >
> > The KIP is updated. I think it now contains all the information on that
> > page.
> >
> > Marc
> >
> > On Mon, Mar 13, 2017 at 9:37 PM Matthias J. Sax 
> > wrote:
> >
> >> Marc,
> >>
> >> Thanks for the KIP.
> >>
> >> Can you please update the KIP in a way such that it is self contained.
> >> Right now, you link to all kind of other places making it hard to read
> >> the KIP.
> >>
> >> The KIP should be the "center of truth" -- if there is important
> >> information elsewhere, please c&p it into the KIP.
> >>
> >>
> >> Thanks a lot!
> >>
> >>
> >> -Matthias
> >>
> >>
> >>
> >> On 3/13/17 1:30 PM, Matthias J. Sax wrote:
> >>> Can you please add the KIP to this table:
> >>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#KafkaImprovementProposals-KIPsunderdiscussion
> >>>
> >>> Thanks,
> >>>
> >>>  Matthias
> >>>
> >>>
> >>> On 3/13/17 8:08 AM, Marc Juchli wrote:
>  Dear all,
> 
>  The following describes KIP-132, which I just created. See:
> 
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-132+-+Augment+KStream.print+to+allow+extra+parameters+in+the+printed+string
> 
>  Motivation
> 
>  As for now, KStream#print leads to a predefined output where key and
> >> value are
>  printed with comma separation.
>  KAFKA-4830 
> suggests
> >> to
>  extend print in a way that it takes KeyValueMapper as a parameter.
>  This will allow a user to change outputs according to the users
> demand.
>  Public Interfaces
> 
>  The affected interface is KStream, which needs to be extended with
> >> another
>  overloaded version of print:
> 
>  void print(final Serde keySerde,
> final Serde valSerde,
> final String streamName,
> final KeyValueMapper mapper);
> 
>  Proposed Changes
> 
>  See pull request GH-2669 .
>  This PR contains a discussion regarding KAFKA-4830
>   as well as
> >> KAFKA-4772
>  .
> 
>  Compatibility, Deprecation, and Migration Plan
> 
>  The extension of print will not introduce compatibility issues – we
> can
>  maintain the current output by keeping the current output format as a
>  default (if mapper was not set):
> 
>  if(mapper == null) {
>  printStream.println("[" + streamName + "]: " + keyToPrint + " , "
>  + valueToPrint);
>  } else {
>  printStream.println("[" + streamName + "]: " +
>  mapper.apply(keyToPrint, valueToPrint));
>  }
> 
> 
> 
>  Kind regards,
>  Marc
> 
> >>>
> >>
> >>
> >
>
>


[jira] [Assigned] (KAFKA-4917) Our built-in file connector can't work with our built-in SMT

2017-03-18 Thread Manasvi Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manasvi Gupta reassigned KAFKA-4917:


Assignee: Manasvi Gupta

> Our built-in file connector can't work with our built-in SMT
> 
>
> Key: KAFKA-4917
> URL: https://issues.apache.org/jira/browse/KAFKA-4917
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Gwen Shapira
>Assignee: Manasvi Gupta
>  Labels: newbie
>
> Our built-in file connector always returns STRING schema.
> All our transformations expect either STRUCT (if connectors return schema) or 
> a MAP (schemaless). 
> I understand why (how do you add a field to a STRING?), but it also means 
> that you can't have an example for SMT that works with Apache Kafka out of 
> the box. Which makes documentation kind of painful.
> Either we modify the file connector or we modify the SMTs to deal with STRING 
> better, but something gotta change.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (KAFKA-4912) Add check for topic name length

2017-03-18 Thread Sharad (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharad reassigned KAFKA-4912:
-

Assignee: Sharad

> Add check for topic name length
> ---
>
> Key: KAFKA-4912
> URL: https://issues.apache.org/jira/browse/KAFKA-4912
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Sharad
>Priority: Minor
>  Labels: newbie
>
> We should check topic name length (if internal topics, and maybe for source 
> topics? -> in cause, {{topic.auto.create}} is enabled this might prevent 
> problems), and raise an exception if they are too long. Cf. KAFKA-4893



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2705: KAFKA-4907: message.timestamp.difference.max.ms sh...

2017-03-18 Thread becketqin
GitHub user becketqin opened a pull request:

https://github.com/apache/kafka/pull/2705

KAFKA-4907: message.timestamp.difference.max.ms should not be applied to 
log compacted topic if user did not set the property.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/becketqin/kafka KAFKA-4907

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2705.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2705


commit 96e1301338619b7d45d5b90adf0cecbf40491a2c
Author: Jiangjie Qin 
Date:   2017-03-19T06:11:19Z

KAFKA-4907: message.timestamp.difference.max.ms should not be applied to 
log compacted topic if user did not set the property.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4907) compacted topic shouldn't reject messages with old timestamp

2017-03-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15931590#comment-15931590
 ] 

ASF GitHub Bot commented on KAFKA-4907:
---

GitHub user becketqin opened a pull request:

https://github.com/apache/kafka/pull/2705

KAFKA-4907: message.timestamp.difference.max.ms should not be applied to 
log compacted topic if user did not set the property.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/becketqin/kafka KAFKA-4907

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2705.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2705


commit 96e1301338619b7d45d5b90adf0cecbf40491a2c
Author: Jiangjie Qin 
Date:   2017-03-19T06:11:19Z

KAFKA-4907: message.timestamp.difference.max.ms should not be applied to 
log compacted topic if user did not set the property.




> compacted topic shouldn't reject messages with old timestamp
> 
>
> Key: KAFKA-4907
> URL: https://issues.apache.org/jira/browse/KAFKA-4907
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.0
>Reporter: Jun Rao
>Assignee: Jiangjie Qin
>
> In LogValidator.validateTimestamp(), we check the validity of the timestamp 
> in the message without checking whether the topic is compacted or not. This 
> can cause messages to a compacted topic to be rejected when it shouldn't.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-4918) Continuous fetch requests for offset storage topic in kafka connect

2017-03-18 Thread Liju (JIRA)
Liju created KAFKA-4918:
---

 Summary: Continuous fetch requests for offset storage topic in 
kafka connect
 Key: KAFKA-4918
 URL: https://issues.apache.org/jira/browse/KAFKA-4918
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Affects Versions: 0.10.2.0, 0.10.1.1, 0.10.1.0
 Environment: unix, osx
Reporter: Liju


The kafka consumer in the KafkaOffsetBackingStore polls continuously with 
timeout hardcoded as 0 ms , this leads to high fetch request load to kafka 
server , and specifically for the sink connectors ( eg. kafka-connect-hdfs) 
which doesn't uses the offset storage topic for offset tracking , this becomes 
redundant and it continuously sends fetch request as there is no data in this 
topic 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)