[jira] [Commented] (KAFKA-4311) Multi layer cache eviction causes forwarding to incorrect ProcessorNode
[ https://issues.apache.org/jira/browse/KAFKA-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15597385#comment-15597385 ]

Damian Guy commented on KAFKA-4311:
-----------------------------------

Frank, thanks for testing this for me. I just pushed another minor change fixing a potential reason this could happen. I'm not sure it is the reason it is happening in your case, as I'm unable to reproduce this specific issue. If you wouldn't mind trying out https://github.com/dguy/kafka/tree/kafka-4311 again, it would be really appreciated. I'm going to be out for the next week, but hopefully someone else will be able to look into it further. Thanks for your help so far.

Regards,
Damian

> Multi layer cache eviction causes forwarding to incorrect ProcessorNode
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-4311
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4311
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.1.0
>            Reporter: Damian Guy
>            Assignee: Damian Guy
>             Fix For: 0.10.1.1
>
>
> The two exceptions below were reported by Frank on the dev mailing list.
> After investigation, the root cause is multiple cache evictions happening in
> the same topology.
> Given a topology like the one below: if a record arriving in `tableOne`
> causes a cache eviction, it will trigger the `leftJoin` that will do a `get`
> from `reducer-store`. If the key is not currently cached in `reducer-store`,
> but is in the backing store, it will be put into the cache, and it may also
> trigger an eviction. If it does trigger an eviction and the eldest entry is
> dirty, it will flush the dirty keys. It is at this point that the exception in
> the comment happens (ClassCastException). This occurs because the
> ProcessorContext is still set to the context of the `leftJoin` and the next
> child in the topology is `mapValues`.
> We need to set the correct `ProcessorNode` on the context in the
> `ForwardingCacheFlushListener` prior to calling `context.forward`. We also
> need to remember to reset the `ProcessorNode` to the previous node once
> `context.forward` has completed.
> {code}
> final KTable<String, String> one = builder.table(Serdes.String(), Serdes.String(), tableOne, tableOne);
> final KTable<Long, String> two = builder.table(Serdes.Long(), Serdes.String(), tableTwo, tableTwo);
> final KTable<String, Long> reduce = two.groupBy(new KeyValueMapper<Long, String, KeyValue<String, Long>>() {
>     @Override
>     public KeyValue<String, Long> apply(final Long key, final String value) {
>         return new KeyValue<>(value, key);
>     }
> }, Serdes.String(), Serdes.Long())
>     .reduce(new Reducer<Long>() {
>         @Override
>         public Long apply(final Long value1, final Long value2) {
>             return value1 + value2;
>         }
>     }, new Reducer<Long>() {
>         @Override
>         public Long apply(final Long value1, final Long value2) {
>             return value1 - value2;
>         }
>     }, "reducer-store");
>
> one.leftJoin(reduce, new ValueJoiner<String, Long, String>() {
>     @Override
>     public String apply(final String value1, final Long value2) {
>         return value1 + ":" + value2;
>     }
> })
> .mapValues(new ValueMapper<String, String>() {
>     @Override
>     public String apply(final String value) {
>         return value;
>     }
> });
> {code}
> This exception is actually a symptom of the exception reported below in the
> comment. After the first exception is thrown, the StreamThread triggers a
> shutdown that then throws this exception:
>
> [StreamThread-1] ERROR org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] Failed to close state manager for StreamTask 0_0:
> org.apache.kafka.streams.errors.ProcessorStateException: task [0_0] Failed to close state store addr-organization
>     at org.apache.kafka.streams.processor.internals.ProcessorStateManager.close(ProcessorStateManager.java:342)
>     at org.apache.kafka.streams.processor.internals.AbstractTask.closeStateManager(AbstractTask.java:121)
>     at org.apache.kafka.streams.processor.internals.StreamThread$2.apply(StreamThread.java:341)
>     at org.apache.kafka.streams.processor.internals.StreamThread.performOnAllTasks(StreamThread.java:322)
>     at org.apache.kafka.streams.processor.internals.StreamThread.closeAllStateManagers(StreamThread.java:338)
>     at org.apache.kafka.streams.processor.internals.StreamThread.shutdownTasksAndState(StreamThread.java:299)
>     at org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java
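[Editor's note] A minimal sketch of the save/forward/restore pattern the description above calls for. The `InternalProcessorContext`, `currentNode()` and `setCurrentNode()` names are assumptions about Kafka Streams internals, not taken from the actual patch:

{code}
// Sketch only: types and method names for the Streams internals are assumed.
class ForwardingCacheFlushListener<K, V> {
    private final InternalProcessorContext context;
    private final ProcessorNode myNode;

    ForwardingCacheFlushListener(final InternalProcessorContext context) {
        this.context = context;
        this.myNode = context.currentNode(); // the node this listener forwards on behalf of
    }

    void apply(final K key, final V newValue, final V oldValue) {
        // Remember whichever node triggered the eviction (e.g. the leftJoin)...
        final ProcessorNode previous = context.currentNode();
        // ...switch the context to this listener's node so forward() resolves
        // the correct children (e.g. mapValues, not the joiner's children)...
        context.setCurrentNode(myNode);
        try {
            context.forward(key, new Change<>(newValue, oldValue));
        } finally {
            // ...and always restore the previous node afterwards.
            context.setCurrentNode(previous);
        }
    }
}
{code}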
[GitHub] kafka-site pull request #26: add trademark symbol on all pages plus longer f...
GitHub user derrickdoo opened a pull request:

    https://github.com/apache/kafka-site/pull/26

    add trademark symbol on all pages plus longer footer message

    - Added trademark symbol to first instance of "Kafka" or "Apache Kafka" across the site.
    - Updated footer copy

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/derrickdoo/kafka-site brandingUpdates

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka-site/pull/26.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #26

commit 58f04a38bda74a33ade06573edbcbde050690420
Author: Derrick Or
Date:   2016-10-22T11:21:00Z

    add trademark symbol on all pages plus longer footer message
[GitHub] kafka pull request #2054: kafka-4295: ConsoleConsumer does not delete the te...
GitHub user amethystic opened a pull request:

    https://github.com/apache/kafka/pull/2054

    kafka-4295: ConsoleConsumer does not delete the temporary group in zookeeper

    The consumer stop logic and the zk node removal code run in separate threads, so when the two threads execute in an interleaving manner the persistent node '/consumers/' might not be removed for console consumer groups that do not specify "group.id". This pollutes ZooKeeper with lots of inactive console consumer offset information.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/amethystic/kafka kafka-4295_ConsoleConsumer_fail_to_remove_zknode_onexit

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/2054.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2054

commit 872441963c67d3ec57c18690bef14ef5afbfb45a
Author: huxi
Date:   2016-10-22T12:41:42Z

    kafka-4295: kafka-console-consumer.sh does not delete the temporary group in zookeeper
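[Editor's note] A minimal sketch of one way to remove the interleaving the PR describes: run the consumer stop logic and the ZooKeeper node removal sequentially in a single shutdown hook instead of two independent threads. `consumerStop()` and `deleteGroupNode()` are hypothetical stand-ins, not the actual patch:

{code}
// Sketch only: sequencing stop + cleanup in one shutdown hook avoids the race.
public class CleanShutdownSketch {
    static void consumerStop()    { /* stop the console consumer fetch loop */ }
    static void deleteGroupNode() { /* remove the temporary group node from ZooKeeper */ }

    public static void main(String[] args) {
        Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
            @Override
            public void run() {
                consumerStop();     // 1. stop consuming
                deleteGroupNode();  // 2. only then delete the group node
            }
        }));
        // ... run the consumer until interrupted ...
    }
}
{code}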
[jira] [Created] (KAFKA-4334) HashCode in SinkRecord not handling null timestamp, checks on value
Andrew Stevenson created KAFKA-4334:
---------------------------------------

             Summary: HashCode in SinkRecord not handling null timestamp, checks on value
                 Key: KAFKA-4334
                 URL: https://issues.apache.org/jira/browse/KAFKA-4334
             Project: Kafka
          Issue Type: Bug
          Components: KafkaConnect
    Affects Versions: 0.10.0.1
            Reporter: Andrew Stevenson
            Assignee: Ewen Cheslack-Postava

hashCode for the timestamp field has a null check on the field value, not the timestamp.
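[Editor's note] The bug as reported: the guard tests one field while hashing another. A sketch of the broken line and the fix follows; the surrounding fields and hashCode shape are assumptions for illustration, not copied from the actual SinkRecord source:

{code}
// Sketch only: a minimal class reproducing the reported hashCode bug.
class SinkRecordHashSketch {
    private final Object value;
    private final Long timestamp;

    SinkRecordHashSketch(Object value, Long timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }

    @Override
    public int hashCode() {
        int result = value != null ? value.hashCode() : 0;
        // Buggy version: null check on value while hashing timestamp, so a
        // null timestamp with a non-null value throws NullPointerException:
        //   result = 31 * result + (value != null ? timestamp.hashCode() : 0);
        // Fixed version: check the field that is actually being hashed:
        result = 31 * result + (timestamp != null ? timestamp.hashCode() : 0);
        return result;
    }
}
{code}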
[GitHub] kafka pull request #2055: check for null timestamp rather than value in hash...
GitHub user andrewstevenson opened a pull request:

    https://github.com/apache/kafka/pull/2055

    check for null timestamp rather than value in hashcode

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/datamountaineer/kafka kafka-4334

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/2055.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2055

commit 3a7f687addc3a09d1b78dff034a24cc29c64ae25
Author: Andrew Stevenson
Date:   2016-10-22T13:24:33Z

    check for null timestamp rather than value in hashcode
[jira] [Updated] (KAFKA-4334) HashCode in SinkRecord not handling null timestamp, checks on value
[ https://issues.apache.org/jira/browse/KAFKA-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Stevenson updated KAFKA-4334:
------------------------------------
        Flags: Patch

PR raised: https://github.com/apache/kafka/pull/2055

> HashCode in SinkRecord not handling null timestamp, checks on value
> --------------------------------------------------------------------
>
>                 Key: KAFKA-4334
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4334
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 0.10.0.1
>            Reporter: Andrew Stevenson
>            Assignee: Ewen Cheslack-Postava
>
> hashCode for the timestamp field has a null check on the field value, not the timestamp.
[jira] [Commented] (KAFKA-4306) Connect workers won't shut down if brokers are not available
[ https://issues.apache.org/jira/browse/KAFKA-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15598388#comment-15598388 ]

Saravanan Tirugnanum commented on KAFKA-4306:
----------------------------------------------

We are noticing this same problem (RetriableCommitFailedException) in production, caused by Request Timed out. consumer.request.timeout.ms is also set far higher in our case, so what could be the reason in our case?

> Connect workers won't shut down if brokers are not available
> -------------------------------------------------------------
>
>                 Key: KAFKA-4306
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4306
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 0.10.1.0
>            Reporter: Gwen Shapira
>            Assignee: Ewen Cheslack-Postava
>
> If brokers are not available and we try to shut down connect workers, sink
> connectors will be stuck in a loop retrying to commit offsets:
>
> [2016-10-17 09:39:14,907] INFO Marking the coordinator 192.168.1.9:9092 (id: 2147483647 rack: null) dead for group connect-dump-kafka-config1 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:600)
> [2016-10-17 09:39:14,907] ERROR Commit of WorkerSinkTask{id=dump-kafka-config1-0} offsets threw an unexpected exception: (org.apache.kafka.connect.runtime.WorkerSinkTask:194)
> org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset commit failed with a retriable exception. You should retry committing offsets.
> Caused by: org.apache.kafka.common.errors.GroupCoordinatorNotAvailableException
>
> We should probably limit the number of retries before doing "unclean"
> shutdown.
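[Editor's note] A sketch of the bounded-retry idea the issue proposes: stop retrying the offset commit after a fixed number of attempts and fall back to an "unclean" shutdown. `MAX_COMMIT_RETRIES` and the `CommitAction` interface are hypothetical, not Connect's API:

{code}
import org.apache.kafka.clients.consumer.RetriableCommitFailedException;

// Sketch only: cap the commit retries instead of looping forever at shutdown.
public class BoundedCommitSketch {
    interface CommitAction { void commit(); }

    static final int MAX_COMMIT_RETRIES = 5;

    static boolean commitWithBoundedRetry(CommitAction action) {
        for (int attempt = 1; attempt <= MAX_COMMIT_RETRIES; attempt++) {
            try {
                action.commit();
                return true;   // committed cleanly
            } catch (RetriableCommitFailedException e) {
                // broker unavailable; retry up to the bound
            }
        }
        return false;          // give up and proceed with an unclean shutdown
    }
}
{code}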
[GitHub] kafka pull request #1622: KAFKA-3859: Fix describe group command to report v...
Github user vahidhashemian closed the pull request at:

    https://github.com/apache/kafka/pull/1622
[jira] [Commented] (KAFKA-3859) Consumer group is stuck in rebalancing status
[ https://issues.apache.org/jira/browse/KAFKA-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15598439#comment-15598439 ]

ASF GitHub Bot commented on KAFKA-3859:
---------------------------------------

Github user vahidhashemian closed the pull request at:

    https://github.com/apache/kafka/pull/1622

> Consumer group is stuck in rebalancing status
> ----------------------------------------------
>
>                 Key: KAFKA-3859
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3859
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Vahid Hashemian
>            Assignee: Vahid Hashemian
>
> * I have a topic (1 partition) and a producer and new consumer that produce
> to and consume from the topic.
> * The consumer belongs to group {{A}}.
> * I kill the consumer (whether it has consumed any messages or not does not
> seem to be relevant).
> * After a short period, when group status is processed and finalized, I run
> the consumer-group describe command ({{kafka-consumer-groups.sh
> --bootstrap-server localhost:9092 --new-consumer --describe --group A}}).
> * The response I receive is {{Consumer group `A` is rebalancing.}}
> * I keep trying the command but the response does not change.
[jira] [Updated] (KAFKA-3859) Consumer group is stuck in rebalancing status
[ https://issues.apache.org/jira/browse/KAFKA-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vahid Hashemian updated KAFKA-3859:
-----------------------------------
    Status: Open  (was: Patch Available)

Fixed as part of [KAFKA-3144|https://issues.apache.org/jira/browse/KAFKA-3144].

> Consumer group is stuck in rebalancing status
> ----------------------------------------------
>
>                 Key: KAFKA-3859
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3859
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Vahid Hashemian
>            Assignee: Vahid Hashemian
>
> * I have a topic (1 partition) and a producer and new consumer that produce
> to and consume from the topic.
> * The consumer belongs to group {{A}}.
> * I kill the consumer (whether it has consumed any messages or not does not
> seem to be relevant).
> * After a short period, when group status is processed and finalized, I run
> the consumer-group describe command ({{kafka-consumer-groups.sh
> --bootstrap-server localhost:9092 --new-consumer --describe --group A}}).
> * The response I receive is {{Consumer group `A` is rebalancing.}}
> * I keep trying the command but the response does not change.
[jira] [Resolved] (KAFKA-3859) Consumer group is stuck in rebalancing status
[ https://issues.apache.org/jira/browse/KAFKA-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vahid Hashemian resolved KAFKA-3859.
------------------------------------
    Resolution: Duplicate

> Consumer group is stuck in rebalancing status
> ----------------------------------------------
>
>                 Key: KAFKA-3859
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3859
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Vahid Hashemian
>            Assignee: Vahid Hashemian
>
> * I have a topic (1 partition) and a producer and new consumer that produce
> to and consume from the topic.
> * The consumer belongs to group {{A}}.
> * I kill the consumer (whether it has consumed any messages or not does not
> seem to be relevant).
> * After a short period, when group status is processed and finalized, I run
> the consumer-group describe command ({{kafka-consumer-groups.sh
> --bootstrap-server localhost:9092 --new-consumer --describe --group A}}).
> * The response I receive is {{Consumer group `A` is rebalancing.}}
> * I keep trying the command but the response does not change.
[jira] [Work started] (KAFKA-4333) Report consumer group coordinator id when '--list' option is used
[ https://issues.apache.org/jira/browse/KAFKA-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on KAFKA-4333 started by Vahid Hashemian.
----------------------------------------------

> Report consumer group coordinator id when '--list' option is used
> ------------------------------------------------------------------
>
>                 Key: KAFKA-4333
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4333
>             Project: Kafka
>          Issue Type: Improvement
>          Components: consumer
>            Reporter: Vahid Hashemian
>            Assignee: Vahid Hashemian
>            Priority: Minor
>
> One piece of information missing when extracting information about consumer
> groups (Java API based) is the coordinator id (broker id of the coordinator).
> It would be useful to enhance the {{--list}} option of the consumer group
> command to report the corresponding coordinator id of each consumer group.
Re: [DISCUSS] KIP-68 Add a consumed log retention before log retention
Hi David,

> 1. What scenario is this configuration used for?

One scenario is a stream processing pipeline. In a stream processing DAG, there will be a bunch of intermediate results; we only care about the consumer groups that are downstream in the DAG, but not other groups. Ideally we want to delete the log of the intermediate topics right after all the downstream processing jobs have successfully processed the messages. In that case, we only care about the downstream processing jobs, but not other groups. That means if a downstream job did not commit offsets for some reason, we want to wait for that job. Without the predefined interested groups, it is hard to achieve this.

2. Yes, the configuration should be at topic level and set dynamically.

Thanks,

Jiangjie (Becket) Qin

On Fri, Oct 21, 2016 at 7:40 AM, 东方甲乙 <254479...@qq.com> wrote:

> Hi Mayuresh,
>     Thanks for the reply:
> 1. In the log retention check schedule, the broker first finds all the
> consumer groups which are consuming this topic, and queries the committed
> offset of each of these groups for the topic using the OffsetFetch API.
> The min commit offset is the minimal offset among these commit offsets.
>
> 2. If the console consumer is reading and committing, its commit offset will
> be used to calculate the min commit offset for this topic.
> We can avoid the random consumer using the method Becket suggested.
>
> 3. It will not delete the log immediately; the log will stay for some time
> (retention.commitoffset.ms), and after that we only delete
> the log segments whose offsets are less than the min commit offset. So
> the user can rewind its offset within log.retention.ms.
>
> Thanks,
> David
>
>
> -- Original Message --
> From: "Mayuresh Gharat"
> Sent: Wednesday, October 19, 2016, 10:25 AM
> To: "dev"
> Subject: Re: [DISCUSS] KIP-68 Add a consumed log retention before log retention
>
> Hi David,
>
> Thanks for the KIP.
>
> I had some questions/suggestions:
>
> It would be great if you can explain with an example how the min
> offset for all the consumers will be calculated, in the KIP.
> What I meant was, it would be great to understand with pseudo
> code/workflow if possible, how each broker knows all the consumers for the
> given topic-partition and how the min is calculated.
>
> Also it would be good to understand what happens if we start a console
> consumer which would actually start reading from the beginning offset and
> commit and crash immediately. How will the segments get deleted?
>
> Will it delete all the log segments if all the consumers have read till
> latest? If yes, would we be able to handle a scenario where we say that the
> user can rewind its offset to reprocess the data, since log.retention.ms
> might not have been reached?
>
> Thanks,
>
> Mayuresh
>
> On Mon, Oct 17, 2016 at 12:27 AM, Becket Qin wrote:
>
> > Hey David,
> >
> > Thanks for the replies to the questions.
> >
> > I think one major thing still not clear at this point is whether the
> > brokers will only apply the consumed log retention to a specific set of
> > interested consumer groups, or whether there is no such set of consumer
> > groups.
> >
> > For example, for topic T, assume we know that there will be two downstream
> > consumer groups CG1 and CG2 consuming data from topic T. Will we add a
> > topic configuration such as
> > "log.retention.commitoffset.interested.groups=CG1,CG2" to topic T so that
> > the brokers only care about CG1 and CG2? The committed offsets of other
> > groups are then of no interest and won't have any impact on the
> > committed-offset-based log retention.
> >
> > It seems the current proposal does not have an "interested consumer group
> > set" configuration, so that means any random consumer group may affect the
> > committed-offset-based log retention.
> >
> > I think the committed-offset-based log retention seems more useful in cases
> > where we already know which consumer groups will be consuming from this
> > topic, so we will only wait for those consumer groups but ignore the
> > others. If a topic will be consumed by many unknown or unpredictable
> > consumer groups, the existing time-based log retention seems simple and
> > clear enough. So I would argue we don't need to address the case
> > that some groups may come later in the committed-offset-based retention.
> >
> > That said, there may still be value in keeping the data for some time even
> > after all the interested consumer groups have consumed the messages. For
> > example, in a pipelined stream processing DAG, we may want to keep the data
> > of an intermediate topic for some time in case a job fails, so we can
> > resume from a previously succeeded stage instead of restarting the entire
> > pipeline. Or we can use the intermediate topic for some debugging work.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> > On Sun, Oct 16, 2016 at 2:15 AM, 东方甲乙 <254479...@qq.com> wrote:
> >
> > > Hi Don
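[Editor's note] A sketch of the min-commit-offset computation described in the thread: take the minimum committed offset across the interested groups and delete only the log segments entirely below it. The group list, the OffsetFetch lookup and `deleteSegmentsBelow()` are hypothetical broker-side helpers, not the KIP's actual design:

{code}
import java.util.List;
import java.util.Map;

// Sketch only: consumed-log-retention check per topic-partition.
public class ConsumedRetentionSketch {
    static void maybeDeleteConsumedSegments(String topic, int partition,
                                            List<String> interestedGroups,
                                            Map<String, Long> committedOffsets) {
        long minCommitted = Long.MAX_VALUE;
        for (String group : interestedGroups) {
            Long offset = committedOffsets.get(group);  // looked up via the OffsetFetch API
            if (offset == null) {
                return; // a downstream job has not committed yet: keep everything
            }
            minCommitted = Math.min(minCommitted, offset);
        }
        deleteSegmentsBelow(topic, partition, minCommitted);
    }

    static void deleteSegmentsBelow(String topic, int partition, long offset) {
        /* delete segments whose last offset is < offset, once they are also
           older than retention.commitoffset.ms as discussed in the thread */
    }
}
{code}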
[GitHub] kafka pull request #2056: KAFKA-4058: Failure in org.apache.kafka.streams.in...
GitHub user mjsax opened a pull request:

    https://github.com/apache/kafka/pull/2056

    KAFKA-4058: Failure in org.apache.kafka.streams.integration.ResetIntegrationTest.testReprocessingFromScratchAfterReset

    - fixed consumer group dead condition
    - disabled state store cache

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mjsax/kafka KAFKA-4058-instableResetToolTest

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/2056.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2056

commit 820f6b9d8803b5cafc9482be8d300d292ce5e6dd
Author: Matthias J. Sax
Date:   2016-10-23T00:23:51Z

    KAFKA-4058: Failure in org.apache.kafka.streams.integration.ResetIntegrationTest.testReprocessingFromScratchAfterReset
    - fixed consumer group dead condition
    - disabled state store cache
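[Editor's note] The PR description mentions disabling the state store cache. For context, the standard switch for this in Kafka Streams (0.10.1+) is the `cache.max.bytes.buffering` config; whether the PR uses exactly this mechanism is not shown in the message:

{code}
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

// Setting the record cache size to zero disables state store caching, which
// makes intermediate results deterministic in integration tests.
Properties props = new Properties();
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
{code}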
[jira] [Updated] (KAFKA-4331) Kafka Streams resetter is slow because it joins the same group for each topic
[ https://issues.apache.org/jira/browse/KAFKA-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias J. Sax updated KAFKA-4331:
-----------------------------------
    Status: Patch Available  (was: In Progress)

https://github.com/apache/kafka/pull/2049

> Kafka Streams resetter is slow because it joins the same group for each topic
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-4331
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4331
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 0.10.0.1, 0.10.0.0
>            Reporter: Roger Hoover
>            Assignee: Matthias J. Sax
>
> The resetter is joining the same group for each topic, which takes ~10 secs in
> my testing. This makes the reset very slow when you have a lot of topics.
[jira] [Work started] (KAFKA-4331) Kafka Streams resetter is slow because it joins the same group for each topic
[ https://issues.apache.org/jira/browse/KAFKA-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on KAFKA-4331 started by Matthias J. Sax.
----------------------------------------------

> Kafka Streams resetter is slow because it joins the same group for each topic
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-4331
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4331
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 0.10.0.0, 0.10.0.1
>            Reporter: Roger Hoover
>            Assignee: Matthias J. Sax
>
> The resetter is joining the same group for each topic, which takes ~10 secs in
> my testing. This makes the reset very slow when you have a lot of topics.
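[Editor's note] A sketch of the direction the ticket implies: subscribe once to all input topics so the group join happens a single time instead of once per topic. This is illustrative, not the actual resetter code; the config is assumed to carry bootstrap servers, group.id, and byte-array deserializers:

{code}
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

// Sketch only: one subscribe() covering all input topics incurs a single
// group join (~10 secs in the reporter's testing) rather than one per topic.
public class ResetSketch {
    static void resetToBeginning(Properties consumerConfig, List<String> allInputTopics) {
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(consumerConfig)) {
            consumer.subscribe(allInputTopics);
            consumer.poll(100);                               // triggers the single group join
            consumer.seekToBeginning(consumer.assignment());  // rewind every assigned partition
            for (TopicPartition tp : consumer.assignment()) {
                consumer.position(tp);                        // force each lazy seek to resolve
            }
            consumer.commitSync();                            // persist the reset offsets
        }
    }
}
{code}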
[jira] [Updated] (KAFKA-4058) Failure in org.apache.kafka.streams.integration.ResetIntegrationTest.testReprocessingFromScratchAfterReset
[ https://issues.apache.org/jira/browse/KAFKA-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias J. Sax updated KAFKA-4058:
-----------------------------------
    Status: Patch Available  (was: Reopened)

https://github.com/apache/kafka/pull/2056

> Failure in
> org.apache.kafka.streams.integration.ResetIntegrationTest.testReprocessingFromScratchAfterReset
> --------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4058
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4058
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: streams
>            Reporter: Guozhang Wang
>            Assignee: Matthias J. Sax
>              Labels: test
>             Fix For: 0.10.1.0
>
>
> {code}
> java.lang.AssertionError: expected:<0> but was:<1>
>     at org.junit.Assert.fail(Assert.java:88)
>     at org.junit.Assert.failNotEquals(Assert.java:834)
>     at org.junit.Assert.assertEquals(Assert.java:645)
>     at org.junit.Assert.assertEquals(Assert.java:631)
>     at org.apache.kafka.streams.integration.ResetIntegrationTest.cleanGlobal(ResetIntegrationTest.java:225)
>     at org.apache.kafka.streams.integration.ResetIntegrationTest.testReprocessingFromScratchAfterReset(ResetIntegrationTest.java:103)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
>     at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>     at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:114)
>     at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:57)
>     at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:66)
>     at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>     at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>     at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>     at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>     at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>     at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
>     at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:109)
>     at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>     at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>     at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:377)
>     at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndR
Re: [DISCUSS] KIP-80: Kafka REST Server
To be perfectly clear, I'm actually not in favor of adding a REST module to Kafka, but that's because I think other things should be separated as well. Again, I'm new and don't know the history, and I admit it's purely personal preference, but I'm on the Tea Party side of things... the microkernel sub-culture. In other words, I would rather have things divided into components. If it were up to me I would split the broker from the client libraries from the protocol definition. In the same vein I would split Connect, Streams, MirrorMaker, etc.

I'm not sure what you mean by "data integration". Is that something that you can't do from an external module? Do we have a high overhead to use the Kafka facilities from outside our own source code? If so, we probably need to rethink our APIs and data structures, or the way we have modularized our code (callbacks, interceptors, etc.). I'm assuming we're not trying to be like the closed-source companies that give their own applications access to a more efficient API to have an advantage over the competition, right?

So, if a streaming platform is what we want, then maybe what we need is a set of modules that deliver that platform vision. A good example of this is the connector set that Confluent provides. These are not in the core source; they create an offering. The same could be true for KStreams, Connect and REST. Do we have any code that calls Connect and KStreams from inside core? Or is the code able to live on its own? (Same for MirrorMaker.)

Nacho

On Fri, Oct 21, 2016 at 2:31 PM, Sriram Subramanian wrote:

> FWIW, Apache Kafka has evolved a lot from where it started. It did start as
> a messaging system. Over time we realized that the vision for Kafka is
> to build a streaming platform and not just a messaging system. You can take
> a look at the site for more description of what comprises the streaming
> platform: http://kafka.apache.org/ and http://kafka.apache.org/intro.
>
> Can the streaming platform exist without Connect? - No. Data integration is
> fundamental to building an end-to-end platform.
>
> Can the streaming platform exist without stream processing? - No.
> Processing stream data again is a core part of the streaming platform.
>
> Can the streaming platform exist without clients? - We at least need one
> client library to complete the platform. Our Java clients help us to
> complete the platform story. The rest of the clients are built and
> maintained outside the project.
>
> Can the platform exist without the REST proxy? - Yes. The proxy does not
> complete the platform vision in any way. It is just a good-to-have tool that
> might be required by quite a few users, and there is an active project that
> works on this - https://github.com/confluentinc/kafka-rest
>
>
> On Fri, Oct 21, 2016 at 11:49 AM, Nacho Solis wrote:
>
> > Are you saying Kafka REST is subjective but Kafka Streams and Kafka
> > Connect are not subjective?
> >
> > > "there are likely places that can live without a rest proxy"
> >
> > There are also places that can live without Kafka Streams and Kafka
> > Connect.
> >
> > Nacho
> >
> > On Fri, Oct 21, 2016 at 11:17 AM, Jun Rao wrote:
> >
> > > At the high level, I think ideally it makes sense to add a component to
> > > Apache Kafka if (1) it's widely needed and (2) it needs tight integration
> > > with Kafka core. For Kafka Streams, we do expect stream processing will
> > > be used widely in the future. Implementation-wise, Kafka Streams only
> > > supports getting data from Kafka and leverages quite a few of the core
> > > functionalities in Kafka core. For example, it uses a customized rebalance
> > > callback in the consumer and uses compacted topics heavily. So, having
> > > Kafka Streams in the same repo makes it easier for testing when those core
> > > functionalities evolve over time. Kafka Connect is in the same situation.
> > >
> > > For the REST proxy, whether it's widely used or not is going to be a bit
> > > subjective. However, there are likely places that can live without a REST
> > > proxy. The REST proxy is just a proxy for the regular clients and doesn't
> > > need to be tightly integrated with Kafka core. So, the case for including
> > > the REST proxy in Apache Kafka is probably not as strong as Kafka Streams
> > > and Kafka Connect.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Thu, Oct 20, 2016 at 11:28 PM, Michael Pearce <michael.pea...@ig.com>
> > > wrote:
> > >
> > > > So from my reading, essentially the first question that needs to be
> > > > answered and voted on is:
> > > >
> > > > Is the Apache Kafka community only about the core, or does the Apache
> > > > community also support some subprojects (and we just need some better
> > > > way to manage this)?
> > > >
> > > > If the vote for core only wins, then the following should be removed:
> > > > Kafka Connect
> > > > Kafka Streams
> > > >
> > > > If the vote for core only loses (aka we will support subprojects) then: