Deleting by writing null payload not working in compacted logs

2016-08-10 Thread Christiane Lemke
Hi all,

I am trying to set up a minimal example to understand log compaction
behaviour, using kafka-clients-0.10.0.0.jar. I have the compaction behaviour
working fine; however, when I try to delete a message explicitly by
writing a null value, the message does not seem to be deleted.

These are my topic settings, chosen so that compaction and deletion kick in
as soon as possible:
Configs: min.cleanable.dirty.ratio=0.01, delete.retention.ms=100,
retention.ms=100, segment.ms=100, cleanup.policy=compact

The offending consumer record looks like this:
[ConsumerRecord(topic = compation-test, partition = 0, offset = 33,
CreateTime = 1470812816735, checksum = 3859648886, serialized key size =
16, serialized value size = -1, key = 9c9bde71-29ec-4687-ab24-9459f5fc0d34,
value = null)]

I can see the cleaner threads running fine, producing output like this:
[2016-08-10 08:24:51,601] INFO Cleaner 0: Cleaning segment 37 in log
tns_ticket-0 (last modified Wed Aug 10 07:52:26 UTC 2016) into 0, retaining
deletes. (kafka.log.LogCleaner)
(retaining deletes?)

I am running out of settings to try - any ideas about what I might have
missed or misunderstood?

Any hint greatly appreciated :)

Best, Christiane
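
For reference, a minimal sketch of producing such a tombstone with the 0.10.0.0 Java
producer (the bootstrap address and serializers are illustrative assumptions, not the
exact code used here):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
// A null value is the tombstone that marks this key for removal during compaction.
producer.send(new ProducerRecord<String, String>("compation-test",
        "9c9bde71-29ec-4687-ab24-9459f5fc0d34", null));
producer.flush();
producer.close();

One thing worth checking: the log cleaner never compacts the active segment, so both the
original record and its tombstone can remain visible until the segment rolls, regardless
of the retention settings above.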


Re: simple test fails on OSX

2016-08-10 Thread Manu Zhang
Hi Eric,

I can reproduce your problem although I have no idea yet what is going
wrong.
Do you have any updates?

Regards,
Manu Zhang

On Mon, Aug 8, 2016 at 11:03 AM Eric Newton  wrote:

> I wrote a little unit test and it works as I expect on linux (ubuntu,
> redhat, openjdk 1.8).
>
> It fails regularly on a colleague's OSX build environment with Oracle
> jdk1.8.0_45.
>
> The intent of the test is to write out a file of random lines. Each line is
> sent into Kafka as a message. A single consumer reads these messages and
> writes them to a file. If the consumer waits longer than a second receiving
> data, it assumes that all the messages have been delivered, and closes the
> output file; the test compares the input and output files.
>
> On OSX, the output file is always short.
>
> 1) am I making unreasonable assumptions in the test?  I know that it's legal
> for the consuming thread to fall asleep for a second, or for kafka to wait
> and not deliver the message for a second, but is it reasonable?  There's
> nothing else going on, no swapping, plenty of free disk and memory.
>
> 2) are there known differences in OS/JVM behaviors that might affect the
> test?
>
> 3) we have experimented with longer waits for all the messages, but have
> not found anything that works 100% of the time.  I am concerned that my
> basic understanding of how kafka is supposed to work is fundamentally
> flawed.
>
> For example, does kafka not provide all available messages unless a buffer
> size is met?  Did I miss some flush or commit call that is required to see
> all messages?  Is there a setting to reduce the wait time for the last
> messages?
>
> I think apache strips out attachments, so I've made the code available here
>  as a
> simple
> stand-alone maven package (.tar.gz).  Unpack, review, and run with `mvn
> verify`.
>
> Thanks!
>
> -Eric
>
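
For comparison, the idle-timeout pattern the test describes looks roughly like this
against the plain Java consumer API (topic, group id and file names are made up for the
sketch; this is not the code from the linked package):

import java.io.BufferedWriter;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DrainToFile {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "drain-test");
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             BufferedWriter out = Files.newBufferedWriter(Paths.get("output.txt"))) {
            consumer.subscribe(Collections.singletonList("random-lines"));
            long lastRecord = System.currentTimeMillis();
            // Give up once nothing has arrived for a full second, as in the failing test.
            while (System.currentTimeMillis() - lastRecord < 1000L) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    out.write(record.value());
                    out.newLine();
                    lastRecord = System.currentTimeMillis();
                }
            }
        }
    }
}

The weak point of this pattern is exactly what the questions above hint at: a one-second
silence is not a delivery guarantee, so a longer idle window or an explicit expected-count
check tends to be more robust.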


Re: Deleting by writing null payload not working in compacted logs

2016-08-10 Thread Tom Crayford
I'd possibly check the log segments in question by using the
DumpLogSegments tool.

Note that Kafka keeps a deleted tombstone for 24h by default after a
deletion. Are you checking for the key and the value being present during
this testing?

Sample code for the producer messages would be useful as well.

Thanks

Tom Crayford
Heroku Kafka
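
For example, something along these lines (the log directory and segment file name are
placeholders for whatever is actually on the broker):

bin/kafka-run-class.sh kafka.tools.DumpLogSegments --print-data-log \
  --files /var/kafka-logs/compation-test-0/00000000000000000000.log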

On Wednesday, 10 August 2016, Christiane Lemke 
wrote:

> Hi all,
>
> I am trying to set up a minimal example understanding log compaction
> behaviour using kafka-clients-0.10.0.0.jar. I got the compaction behaviour
> working fine, however, when trying to delete a message explicitly by
> writing a null value, the message seems to not be deleted.
>
> These are my settings for my topic for compaction and deletion kicking in
> as soon as possible: Configs:min.cleanable.dirty.ratio=0.01,
> delete.retention.ms=100,retention.ms=100,segment.ms
> =100,cleanup.policy=compact
>
> The offending consumer record looks like this:
> [ConsumerRecord(topic = compation-test, partition = 0, offset = 33,
> CreateTime = 1470812816735, checksum = 3859648886, serialized key size =
> 16, serialized value size = -1, key = 9c9bde71-29ec-4687-ab24-
> 9459f5fc0d34,
> value = null)]
>
> I can see the cleaner threads running fine, producing output like this:
> [2016-08-10 08:24:51,601] INFO Cleaner 0: Cleaning segment 37 in log
> tns_ticket-0 (last modified Wed Aug 10 07:52:26 UTC 2016) into 0, retaining
> deletes. (kafka.log.LogCleaner)
> (retaining deletes?)
>
> I am running out of ideas of settings to try - are there any ideas about
> what I might have missed or misunderstood?
>
> Any hint greatly appreciated :)
>
> Best, Christiane
>


Re: simple test fails on OSX

2016-08-10 Thread Eric Newton
Thanks Manu for taking a look.

I updated the test to run against 0.10.0.0, and it appears to work.  I'm
now working my way back to my original issue in our application to see if
it is still broken.

Testing is slowed by the fact that I don't use a Mac, and I'm very new to
kafka.  Hopefully you don't see anything too strange in my little test.

-Eric

On Wed, Aug 10, 2016 at 4:45 AM, Manu Zhang  wrote:

> Hi Eric,
>
> I can reproduce your problem although I have no idea yet what is going
> wrong.
> Do you have any updates ?
>
> Regards,
> Manu Zhang
>
> On Mon, Aug 8, 2016 at 11:03 AM Eric Newton  wrote:
>
> > I wrote a little unit test and it works as I expect on linux (ubuntu,
> > redhat, openjdk 1.8).
> >
> > It fails regularly on a colleagues' OSX build environment with Oracle
> > jdk1.8.0_45.
> >
> > The intent of the test is to write out a file of random lines. Each line
> is
> > sent into a kafka as a message. A single consumer reads these messages
> and
> > writes them to a file. If the consumer waits longer than a second
> receiving
> > data, it assumes that all the messages have been delivered, and closes
> the
> > output file; the test compares the input and output files.
> >
> > On OSX, the output file is always short.
> >
> > 1) am I making unreasonable assumptions in the test?  I know that its
> legal
> > for the consuming thread to fall asleep for a second, or for kafka to
> wait
> > and not deliver the message for a second, but is it reasonable?  There's
> > nothing else going on, no swapping, plenty of free disk and memory.
> >
> > 2) are there known differences in OS/JVM behaviors that might affect the
> > test?
> >
> > 3) we have experimented with longer waits for all the messages, but have
> > not found anything that works 100% of the time.  I am concerned that my
> > basic understanding of how kafka is supposed to work is fundamentally
> > flawed.
> >
> > For example, does kafka not provide all available messages unless a
> buffer
> > size is met?  Did I miss some flush or commit call that is required to
> see
> > all messages?  Is there a setting to reduce the wait time for the last
> > messages?
> >
> > I think apache strips out attachments, so I've made the code available
> here
> >  as a
> > simple
> > stand-alone maven package (.tar.gz).  Unpack, review, and run with `mvn
> > verify`.
> >
> > Thanks!
> >
> > -Eric
> >
>


Console consumer with SSL

2016-08-10 Thread Srikanth
Hi,

I'm trying a simple console producer & consumer test on a broker with SSL
enabled.
  listeners=PLAINTEXT://192.168.50.4:9092,SSL://192.168.50.4:9093

The producer was started and is publishing via the SSL interface:
  bin/kafka-console-producer.sh --broker-list 192.168.50.4:9093 --topic
test-topic --producer.config config/producer-ssl.properties

The old consumer is able to read from the plaintext interface:
  bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic
test-topic --consumer.config config/consumer.properties --from-beginning
--delete-consumer-offsets

But the new consumer is not reading anything from either the SSL or the
plaintext port:
  bin/kafka-console-consumer.sh --bootstrap-server 192.168.50.4:9093
--topic test-topic --new-consumer --consumer.config
config/consumer-ssl.properties --from-beginning
  bin/kafka-console-consumer.sh --bootstrap-server 192.168.50.4:9092
--topic test-topic --new-consumer --consumer.config
config/consumer.properties --from-beginning

group.id=console-consumer-group and exclude.internal.topics=false are set in
the consumer properties.
For SSL, I've set these too:
  security.protocol=SSL
  ssl.truststore.location=/path/to/client.truststore.jks
  ssl.truststore.password=

Any suggestions on how to debug this?

Srikanth
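
One debugging option (a sketch, not specific to this setup): the console tools pick up the
KAFKA_OPTS environment variable, so JVM-level TLS debugging can be turned on for a single
consumer run to see whether the handshake against the SSL port completes at all:

  export KAFKA_OPTS="-Djavax.net.debug=ssl"
  bin/kafka-console-consumer.sh --bootstrap-server 192.168.50.4:9093 --topic test-topic \
    --new-consumer --consumer.config config/consumer-ssl.properties --from-beginning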


Streams 'External source topic not found' while using groupBy/aggregate

2016-08-10 Thread Mathieu Fenniak
Hey there, Kafka Users,

I'm trying to join two topics with Kafka Streams.  The first topic is a
changelog of one object, and the second is a changelog of a related
object.  In order to join these tables, I'm grouping the second table by a
piece of data in it that indicates what record it is related to in the
first table.  But I'm getting an unexpected error related to the
repartitioning topic for the aggregated table:

org.apache.kafka.streams.errors.TopologyBuilderException: Invalid topology
building: External source topic not found:
*TableNumber2Aggregated-repartition*
at
org.apache.kafka.streams.processor.internals.StreamPartitionAssignor.ensureCopartitioning(StreamPartitionAssignor.java:452)
at
org.apache.kafka.streams.processor.internals.StreamPartitionAssignor.ensureCopartitioning(StreamPartitionAssignor.java:440)


(Full exception:
https://gist.github.com/mfenniak/11ca081191932fbb33a0c3cc32ad1686)

It appears that the "TableNumber2Aggregated-repartition" topic *is* created
in Kafka by the streams application, but the Kafka topic has a prefix that
matches my application id (timesheet-status).  Perhaps something is
prefixing the topic name, but it isn't being applied everywhere?

$ ./kafka-topics.sh --zookeeper localhost --list
TableNumber1
TableNumber2
__consumer_offsets
timesheet-status-TableNumber2Aggregated-repartition


Here's a sample that reproduces the issue (note: I've cut out all the
actual mapping, grouping, and aggregating logic, but this still reproduces
the error):

public static TopologyBuilder createTopology() {
    KStreamBuilder builder = new KStreamBuilder();

    KTable table1Mapped = builder
        .table(Serdes.String(), new JsonSerde(Map.class), "TableNumber1")
        .mapValues((value) -> null);

    KTable table2Aggregated = builder
        .table(Serdes.String(), new JsonSerde(Map.class), "TableNumber2")
        .groupBy((key, value) -> null)
        .aggregate(() -> null, (k, v, t) -> null, (k, v, t) -> null,
            new JsonSerde(Map.class), "TableNumber2Aggregated");

    table1Mapped.join(table2Aggregated, (left, right) -> {
        LOG.debug("join");
        return null;
    });

    return builder;
}

I'm using the latest Kafka Streams release, 0.10.0.1.  Any thoughts on how
I could proceed to debug or workaround this?

Thanks all,

Mathieu


Kafka Connect questions

2016-08-10 Thread Jeyhun Karimov
Hi community,

I am using Kafka Connect to import text files into Kafka. Because I have to
give an exact file name (like ./data.txt), and not the parent directory (./*),
as input to Kafka Connect, how can I import files in a particular directory as
they are created at runtime? The solutions that come to my mind are below, but
I don't know about their efficiency:
- I could create a new file in the directory (after Kafka Connect has already
started) and start a new Connect instance pointing at that file,
- or I could write to only one file (just one ./data.txt) all the time (with no
other data files created), so Kafka Connect transfers data from that one file
and continues from where it left off as new data is written,
- or any other way to efficiently handle this use case.

My second question is: is it safe to override the data transfer methods of
Kafka Connect? For example, I want to put Thread.sleep() calls on the
kafka-streams side while transferring data and observe the behaviour on the
Kafka side or on the application side. You can think of it as a simulation of load.

Cheers
Jeyhun


-- 
-Cheers

Jeyhun
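
For context, the stock file source connector only accepts a single fixed path, as in this
standalone connector properties sketch (names and paths are placeholders):

name=local-file-source
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
file=./data.txt
topic=connect-test

So watching a whole directory needs either one connector instance per file or a source
connector written for directories.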


Kafka 0.9.0.1 allows offset commits for offsets already committed

2016-08-10 Thread Sean Morris (semorris)
I ran into a strange scenario today; after looking closer, I am seeing that Kafka doesn't 
fail on several invalid offset-commit scenarios. I have two partitions and two processes 
in the same consumer group, so each process has one partition. Process 1 retrieved a set 
of records from partition 0 (say offsets 1 to 10) and then, as it processed them one by 
one, committed the offsets one by one. Process 2 was consuming from partition 1 at this 
time. Meanwhile a rebalance occurred; I see these messages in the Kafka logs:


[2016-08-10 11:00:33,702] INFO [kafka.coordinator.GroupCoordinator] 
[GroupCoordinator 1]: Preparing to restabilize group casemonGatewayNotifier 
with old generation 2

[2016-08-10 11:00:35,503] INFO [kafka.coordinator.GroupCoordinator] 
[GroupCoordinator 1]: Group casemonGatewayNotifier generation 2 is dead and 
removed

[2016-08-10 11:00:39,700] INFO [kafka.coordinator.GroupCoordinator] 
[GroupCoordinator 1]: Preparing to restabilize group casemonGatewayNotifier 
with old generation 0

[2016-08-10 11:00:39,700] INFO [kafka.coordinator.GroupCoordinator] 
[GroupCoordinator 1]: Stabilized group casemonGatewayNotifier generation 1

[2016-08-10 11:00:39,701] INFO [kafka.coordinator.GroupCoordinator] 
[GroupCoordinator 1]: Assignment received from leader for group 
casemonGatewayNotifier for generation 1

[2016-08-10 11:00:39,751] INFO [kafka.coordinator.GroupCoordinator] 
[GroupCoordinator 1]: Preparing to restabilize group casemonGatewayNotifier 
with old generation 1

[2016-08-10 11:00:43,699] INFO [kafka.coordinator.GroupCoordinator] 
[GroupCoordinator 1]: Stabilized group casemonGatewayNotifier generation 2

[2016-08-10 11:00:43,701] INFO [kafka.coordinator.GroupCoordinator] 
[GroupCoordinator 1]: Assignment received from leader for group 
casemonGatewayNotifier for generation 2

At this point the two processes swapped partitions, but process 1 was still processing 
and committing records from partition 0; some of these commits failed, but then they 
started to succeed again. While it was doing this, process 2 called "poll" and received 
records from partition 0, with offsets 7 to 10. So at this point both processes have some 
of the same records from partition 0 and are both processing and committing them. The 
strange part is that none of the commits are failing. Both processes are able to commit 
the same records, and process 1 is able to commit offsets to a partition it doesn't even 
own.

I created some test code to further verify this: I continuously commit the same offset 
every time, and what I saw was that "poll" continues to return new records (records with 
a higher offset than what I committed). kafka.admin.ConsumerGroupCommand continues to 
show the old offset and a lag. When I restart the process, it correctly starts back at 
the old offset, but then continues to get new records beyond that offset again. It seems 
that invalid commits confuse the consumer logic.

Since the commits don't fail after the rebalance, I don't seem to have a way to know that 
I should no longer be processing these records, so I end up processing them in both 
processes. I know that in 0.10 poll has been changed to let you specify the number of 
records to retrieve; it seems the only way to avoid this kind of duplicate processing is 
to only ever process the first record returned by "poll", which seems inefficient. I know 
there is a rebalance callback, but if I read the documentation correctly, it will only 
get invoked when you call "poll", so when working off my in-memory buffer I wouldn't 
know. How are people handling this currently?

Thanks,
Sean
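
For what it's worth, a minimal sketch of the rebalance callback mentioned above, against
the plain 0.9 consumer API (the topic name is a placeholder, and the caveat from the
question still stands: the callback only fires from inside poll()):

import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "casemonGatewayNotifier");
props.put("enable.auto.commit", "false");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
final Set<TopicPartition> owned = Collections.synchronizedSet(new HashSet<TopicPartition>());

consumer.subscribe(Arrays.asList("my-topic"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        owned.removeAll(partitions); // stop committing anything buffered from these partitions
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        owned.addAll(partitions);
    }
});

// Before committing a record processed from an in-memory buffer, check
// owned.contains(new TopicPartition(record.topic(), record.partition())) and skip it if false.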





Kafka behind NAT

2016-08-10 Thread Marcin
We have Kafka behind NAT with *only one broker*.
Let's say we have an internal (A) and an external (B) network.

When we try to reach the broker from the external network (we use the
bootstrap.servers parameter set to the B address), the broker, as expected,
responds with the internal network's address (A), which is not resolvable
from the external network. We cannot set advertised.listeners to the external
network's address because the broker is also used from the internal network.

The only solution I am aware of is to set a name instead of an IP in the
advertised.listeners parameter and use DNS or a HOSTS file in the internal and
external networks. Unfortunately, we cannot use this solution.

Is there any other way - besides a Kafka code change - to make the broker
handle requests from both internal and external networks?

In other words, is there any way to connect directly to the broker whose
address we provided to the client?


I hope that somebody has dealt with a similar problem.
Thanks for any help.


Re: Kafka behind NAT

2016-08-10 Thread Radoslaw Gruchalski
Marcin,

The DNS seems to be your friend. /etc/hosts should be sufficient but it
might be an operational hassle.

–
Best regards,
Radek Gruchalski
ra...@gruchalski.com
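
A sketch of the name-based setup being discussed (addresses and the host name are made
up): the broker advertises a hostname rather than an IP, and each side resolves that name
to whichever address works for it.

# broker server.properties
advertised.listeners=PLAINTEXT://kafka-broker-1:9092

# /etc/hosts on an internal client
10.0.0.5        kafka-broker-1

# /etc/hosts on an external client (NAT address)
203.0.113.10    kafka-broker-1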


On August 10, 2016 at 10:03:16 PM, Marcin (kan...@o2.pl) wrote:

We have kafka behind NAT with *only one broker*.
Let say we have internal (A) and external (B) network.

When we try to reach the broker from external network (we use
bootstrap.servers parameter set to B address) then what is obvious the
broker responds with internal network's address (A) which is not resolvable
in external network. We cannot set advertised.listeners to external
network's address because the broker is also used from internal network.

Only solution I am aware is to set name instead of IP in
advertised.listeners parameter and use DNS or HOSTS file in internal and
external network. Unfortunatelly we cannot use this solution.

Is there any other way - beside kafka code change - to make the broker to
handle requests from internal and external networks.

In other words - is there any way to connect directly to the broker which
address we provided to the client?


I hope that somebody dealt with simillar problem.
Thanks for any help.


RE: Kafka consumer getting duplicate message

2016-08-10 Thread Ghosh, Achintya (Contractor)
Can anyone please check this one?

Thanks
Achintya

-Original Message-
From: Ghosh, Achintya (Contractor) 
Sent: Monday, August 08, 2016 9:44 AM
To: users@kafka.apache.org
Cc: d...@kafka.apache.org
Subject: RE: Kafka consumer getting duplicate message

Thank you , Ewen for your response.
Actually, we are using the 1.0.0.M2 Spring Kafka release, which uses the Kafka 0.9 release.
Yes, we see a lot of duplicates, and here are our producer and consumer settings in the 
application. We don't see any duplication at the producer end - I mean, if we send 1000 
messages to a particular topic, we receive exactly (sometimes fewer than) 1000 messages.

But when we consume the messages at the consumer level, we see a lot of messages with the 
same offset value and the same partition, so please let us know what tweaking is needed to 
avoid the duplication.

We have three types of topics, and each topic has a replication factor of 3 and 10 
partitions.

Producer Configuration:

bootstrap.producer.servers=provisioningservices-aq-dev.g.comcast.net:80
acks=1
retries=3
batch.size=16384
linger.ms=5
buffer.memory=33554432
request.timeout.ms=6
timeout.ms=6
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=com.comcast.provisioning.provisioning_.kafka.CustomMessageSer

Consumer Configuration:

bootstrap.consumer.servers=provisioningservices-aqr-dev.g.comcast.net:80
group.id=ps-consumer-group
enable.auto.commit=false
auto.commit.interval.ms=100
session.timeout.ms=15000
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=com.comcast.provisioning.provisioning_.kafka.CustomMessageDeSer

factory.getContainerProperties().setSyncCommits(true);
factory.setConcurrency(5);

Thanks
Achintya


-Original Message-
From: Ewen Cheslack-Postava [mailto:e...@confluent.io]
Sent: Saturday, August 06, 2016 1:45 AM
To: users@kafka.apache.org
Cc: d...@kafka.apache.org
Subject: Re: Kafka consumer getting duplicate message

Achintya,

1.0.0.M2 is not an official release, so this version number is not particularly 
meaningful to people on this list. What platform/distribution are you using and 
how does this map to actual Apache Kafka releases?

In general, it is not possible for any system to guarantee exactly once 
semantics because those semantics rely on the source and destination systems 
coordinating -- the source provides some sort of retry semantics, and the 
destination system needs to do some sort of deduplication or similar to only 
"deliver" the data one time.

That said, duplicates should usually only be generated in the face of failures. 
If you're seeing a lot of duplicates, that probably means shutdown/failover is 
not being handled correctly. If you can provide more info about your setup, we 
might be able to suggest tweaks that will avoid these situations.

-Ewen

On Fri, Aug 5, 2016 at 8:15 AM, Ghosh, Achintya (Contractor) < 
achintya_gh...@comcast.com> wrote:

> Hi there,
>
> We are using Kafka 1.0.0.M2 with Spring and we see a lot of duplicate 
> message is getting received by the Listener onMessage() method .
> We configured :
>
> enable.auto.commit=false
> session.timeout.ms=15000
> factory.getContainerProperties().setSyncCommits(true);
> factory.setConcurrency(5);
>
> So what could be the reason to get the duplicate messages?
>
> Thanks
> Achintya
>



--
Thanks,
Ewen
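
A rough sketch of the consumer-side de-duplication Ewen refers to, using the plain Java
consumer rather than the Spring wrapper (the record handling is left as a comment, and the
seen-offset map only lives as long as the process does):

import java.util.Collections;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

static void pollOnceWithDedup(KafkaConsumer<String, String> consumer,
                              Map<TopicPartition, Long> lastProcessed) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        TopicPartition tp = new TopicPartition(record.topic(), record.partition());
        Long last = lastProcessed.get(tp);
        if (last != null && record.offset() <= last) {
            continue; // redelivered duplicate - skip it
        }
        // ... handle the record here ...
        lastProcessed.put(tp, record.offset());
        // Commit the next offset to read, right after the record has been handled.
        consumer.commitSync(Collections.singletonMap(tp, new OffsetAndMetadata(record.offset() + 1)));
    }
}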


Re: Streams 'External source topic not found' while using groupBy/aggregate

2016-08-10 Thread Guozhang Wang
Hello Mathieu,

I think this issue is fixed in trunk but may have been missed in the 0.10.0
branch. Could you try running your program from trunk to verify whether that is
the case? If so, we can consider backporting the hotfix from trunk to
0.10.0 and doing another bug fix release.


Guozhang

On Wed, Aug 10, 2016 at 7:32 AM, Mathieu Fenniak <
mathieu.fenn...@replicon.com> wrote:

> Hey there, Kafka Users,
>
> I'm trying to join two topics with Kafka Streams.  The first topic is a
> changelog of one object, and the second is a changelog of a related
> object.  In order to join these tables, I'm grouping the second table by a
> piece of data in it that indicates what record it is related to in the
> first table.  But I'm getting an unexpected error related to the
> repartitioning topic for the aggregated table:
>
> org.apache.kafka.streams.errors.TopologyBuilderException: Invalid topology
> building: External source topic not found:
> *TableNumber2Aggregated-repartition*
> at
> org.apache.kafka.streams.processor.internals.StreamPartitionAssignor.
> ensureCopartitioning(StreamPartitionAssignor.java:452)
> at
> org.apache.kafka.streams.processor.internals.StreamPartitionAssignor.
> ensureCopartitioning(StreamPartitionAssignor.java:440)
>
>
> (Full exception:
> https://gist.github.com/mfenniak/11ca081191932fbb33a0c3cc32ad1686)
>
> It appears that the "TableNumber2Aggregated-repartition" topic *is*
> created
> in Kafka by the streams application, but the Kafka topic has a prefix that
> matches my application id (timesheet-status).  Perhaps something is
> prefixing the topic name, but it isn't being applied everywhere?
>
> $ ./kafka-topics.sh --zookeeper localhost --list
> TableNumber1
> TableNumber2
> __consumer_offsets
> timesheet-status-TableNumber2Aggregated-repartition
>
>
> Here's a sample that reproduces the issue (note, I've cut out all the
> actual mapping, grouping, and aggregating logic, but, this still reproduces
> the error):
>
> public static TopologyBuilder createTopology() {
> KStreamBuilder builder = new KStreamBuilder();
>
> KTable table1Mapped = builder.table(Serdes.String(), new
> JsonSerde(Map.class), "TableNumber1")
> .mapValues((value) -> null);
>
> KTable table2Aggregated = builder.table(Serdes.String(), new
> JsonSerde(Map.class), "TableNumber2")
> .groupBy((key, value) -> null)
> .aggregate(() -> null, (k, v, t) -> null, (k, v, t) -> null,
> new JsonSerde(Map.class), "TableNumber2Aggregated");
>
> table1Mapped.join(table2Aggregated, (left, right) -> {
> LOG.debug("join");
> return null;
> });
>
> return builder;
> }
>
> I'm using the latest Kafka Streams release, 0.10.0.1.  Any thoughts on how
> I could proceed to debug or workaround this?
>
> Thanks all,
>
> Mathieu
>



-- 
-- Guozhang


Re: Streams 'External source topic not found' while using groupBy/aggregate

2016-08-10 Thread Mathieu Fenniak
Hi Guozhang,

Yes, it does seem to be fixed in trunk.  Thanks.  I should have tried that,
but I assumed that the recently released 0.10.0.1 would be pretty close to
trunk.  I can see where that was mistaken, since 0.10.0 is quite divergent
from trunk.

Mathieu


On Wed, Aug 10, 2016 at 2:39 PM, Guozhang Wang  wrote:

> Hello Mathieu,
>
> I think this issue is fixed in trunk but may get missed in the 0.10.0
> branch, could you try running your program from trunk to verify if it is
> the case? If yes we can consider backportting the hotfix from trunk to
> 0.10.0 and have another bug fix release.
>
>
> Guozhang
>
> On Wed, Aug 10, 2016 at 7:32 AM, Mathieu Fenniak <
> mathieu.fenn...@replicon.com> wrote:
>
> > Hey there, Kafka Users,
> >
> > I'm trying to join two topics with Kafka Streams.  The first topic is a
> > changelog of one object, and the second is a changelog of a related
> > object.  In order to join these tables, I'm grouping the second table by
> a
> > piece of data in it that indicates what record it is related to in the
> > first table.  But I'm getting an unexpected error related to the
> > repartitioning topic for the aggregated table:
> >
> > org.apache.kafka.streams.errors.TopologyBuilderException: Invalid
> topology
> > building: External source topic not found:
> > *TableNumber2Aggregated-repartition*
> > at
> > org.apache.kafka.streams.processor.internals.StreamPartitionAssignor.
> > ensureCopartitioning(StreamPartitionAssignor.java:452)
> > at
> > org.apache.kafka.streams.processor.internals.StreamPartitionAssignor.
> > ensureCopartitioning(StreamPartitionAssignor.java:440)
> >
> >
> > (Full exception:
> > https://gist.github.com/mfenniak/11ca081191932fbb33a0c3cc32ad1686)
> >
> > It appears that the "TableNumber2Aggregated-repartition" topic *is*
> > created
> > in Kafka by the streams application, but the Kafka topic has a prefix
> that
> > matches my application id (timesheet-status).  Perhaps something is
> > prefixing the topic name, but it isn't being applied everywhere?
> >
> > $ ./kafka-topics.sh --zookeeper localhost --list
> > TableNumber1
> > TableNumber2
> > __consumer_offsets
> > timesheet-status-TableNumber2Aggregated-repartition
> >
> >
> > Here's a sample that reproduces the issue (note, I've cut out all the
> > actual mapping, grouping, and aggregating logic, but, this still
> reproduces
> > the error):
> >
> > public static TopologyBuilder createTopology() {
> > KStreamBuilder builder = new KStreamBuilder();
> >
> > KTable table1Mapped = builder.table(Serdes.String(), new
> > JsonSerde(Map.class), "TableNumber1")
> > .mapValues((value) -> null);
> >
> > KTable table2Aggregated = builder.table(Serdes.String(), new
> > JsonSerde(Map.class), "TableNumber2")
> > .groupBy((key, value) -> null)
> > .aggregate(() -> null, (k, v, t) -> null, (k, v, t) -> null,
> > new JsonSerde(Map.class), "TableNumber2Aggregated");
> >
> > table1Mapped.join(table2Aggregated, (left, right) -> {
> > LOG.debug("join");
> > return null;
> > });
> >
> > return builder;
> > }
> >
> > I'm using the latest Kafka Streams release, 0.10.0.1.  Any thoughts on
> how
> > I could proceed to debug or workaround this?
> >
> > Thanks all,
> >
> > Mathieu
> >
>
>
>
> --
> -- Guozhang
>


[ANNOUNCE] Apache Kafka 0.10.0.1 Released

2016-08-10 Thread Ismael Juma
The Apache Kafka community is pleased to announce the release for Apache
Kafka 0.10.0.1.
This is a bug fix release that fixes 53 issues in 0.10.0.0.

All of the changes in this release can be found in the release notes:
https://archive.apache.org/dist/kafka/0.10.0.1/RELEASE_NOTES.html

Apache Kafka is a high-throughput, publish-subscribe messaging system
rethought of as a distributed commit log.

** Fast => A single Kafka broker can handle hundreds of megabytes of reads
and writes per second from thousands of clients.

** Scalable => Kafka is designed to allow a single cluster to serve as the
central data backbone for a large organization. It can be elastically and
transparently expanded without downtime. Data streams are partitioned
and spread over a cluster of machines to allow data streams larger than
the capability of any single machine and to allow clusters of co-ordinated
consumers.

** Durable => Messages are persisted on disk and replicated within the
cluster to prevent data loss. Each broker can handle terabytes of messages
without performance impact.

** Distributed by Design => Kafka has a modern cluster-centric design that
offers strong durability and fault-tolerance guarantees.

You can download the source release from
https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.1/kafka-0.10.0.1-src.tgz

and binary releases from
https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.1/kafka_2.10-0.10.0.1.tgz
https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.1/kafka_2.11-0.10.0.1.tgz

A big thank you to the following people who have contributed to the
0.10.0.1 release.

Alex Glikson, Alex Loddengaard, Alexey Romanchuk, Ashish Singh, Avi Flax,
Damian Guy, Dustin Cote, Edoardo Comar, Eno Thereska, Ewen
Cheslack-Postava, Flavio Junqueira, Florian Hussonnois, Geoff Anderson,
Grant Henke, Greg Fodor, Guozhang Wang, Gwen Shapira, Henry Cai, Ismael
Juma, Jason Gustafson, Jeff Klukas, Jendrik Poloczek, Jeyhun Karimov,
Liquan Pei, Manikumar Reddy O, Mathieu Fenniak, Matthias J. Sax, Maysam
Yabandeh, Mayuresh Gharat, Mickael Maison, Moritz Siuts, Onur Karaman,
Philippe Derome, Rajini Sivaram, Rollulus, Ryan Pridgeon, Samuel Taylor,
Sebastien Launay, Sriharsha Chintalapani, Tao Xiao, Todd Palino, Tom
Crayford, Tom Rybak, Vahid Hashemian, Wan Wenli, Yuto Kawamura.

We welcome your help and feedback. For more information on how to
report problems, and to get involved, visit the project website at
http://kafka.apache.org/

Thanks,
Ismael


Consumer skip/ignore uncommitted message

2016-08-10 Thread sat
Hi,

We have written a Kafka consumer, and we have only one consumer in the
group. We are using Kafka 0.9. We are able to consume messages as soon as
we start polling; however, for some reason our consumer crashes after
processing a message but before manually committing it.

When we restart our consumer after 15 minutes, we observe the message that we
already processed being polled again, along with the other messages that were
published in the 15-minute interval.

We want to ignore these messages and start consuming only the messages that
arrive after the consumer is restarted. Please let us know how to achieve
this.

For example: the consumer consumed 10 messages, and while processing the 11th
message, before committing, it went down. We restarted it after 15 minutes, and
there were 20 messages in the whole topic; we don't want to fetch/consume
messages 11-20, we want only the new messages, i.e. from message/record 21
onwards.

Please let us know if more details are needed.

Thanks
A.SathishKumar


Re: [ANNOUNCE] Apache Kafka 0.10.0.1 Released

2016-08-10 Thread Gwen Shapira
Woohoo!

Thank you, Ismael! You make a great release manager :)

On Wed, Aug 10, 2016 at 5:01 PM, Ismael Juma  wrote:
> The Apache Kafka community is pleased to announce the release for Apache
> Kafka 0.10.0.1.
> This is a bug fix release that fixes 53 issues in 0.10.0.0.
>
> All of the changes in this release can be found in the release notes:
> *https://archive.apache.org/dist/kafka/0.10.0.1/RELEASE_NOTES.html
> *
>
> Apache Kafka is high-throughput, publish-subscribe messaging system
> rethought of as a distributed commit log.
>
> ** Fast => A single Kafka broker can handle hundreds of megabytes of reads
> and writes per second from thousands of clients.
>
> ** Scalable => Kafka is designed to allow a single cluster to serve as the
> central data backbone for a large organization. It can be elastically and
> transparently expanded without downtime. Data streams are partitioned
> and spread over a cluster of machines to allow data streams larger than
> the capability of any single machine and to allow clusters of co-ordinated
> consumers.
>
> ** Durable => Messages are persisted on disk and replicated within the
> cluster to prevent data loss. Each broker can handle terabytes of messages
> without performance impact.
>
> ** Distributed by Design => Kafka has a modern cluster-centric design that
> offers strong durability and fault-tolerance guarantees.
>
> You can download the source release from
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.1/kafka-0.10.0.1
> -src.tgz
>
> and binary releases from
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.1/kafka_2.10-0.10.
> 0.1.tgz
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.1/kafka_2.11-0.10.
> 0.1.tgz
>
> A big thank you for the following people who have contributed to the
> 0.10.0.1 release.
>
> Alex Glikson, Alex Loddengaard, Alexey Romanchuk, Ashish Singh, Avi Flax,
> Damian Guy, Dustin Cote, Edoardo Comar, Eno Thereska, Ewen
> Cheslack-Postava, Flavio Junqueira, Florian Hussonnois, Geoff Anderson,
> Grant Henke, Greg Fodor, Guozhang Wang, Gwen Shapira, Henry Cai, Ismael
> Juma, Jason Gustafson, Jeff Klukas, Jendrik Poloczek, Jeyhun Karimov,
> Liquan Pei, Manikumar Reddy O, Mathieu Fenniak, Matthias J. Sax, Maysam
> Yabandeh, Mayuresh Gharat, Mickael Maison, Moritz Siuts, Onur Karaman,
> Philippe Derome, Rajini Sivaram, Rollulus, Ryan Pridgeon, Samuel Taylor,
> Sebastien Launay, Sriharsha Chintalapani, Tao Xiao, Todd Palino, Tom
> Crayford, Tom Rybak, Vahid Hashemian, Wan Wenli, Yuto Kawamura.
>
> We welcome your help and feedback. For more information on how to
> report problems, and to get involved, visit the project website at
> http://kafka.apache.org/
>
> Thanks,
> Ismael



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


Re: Kafka Connect questions

2016-08-10 Thread Gwen Shapira
Maybe you need a different KafkaConnect source? I think this one may
fit your needs better:
https://github.com/jcustenborder/kafka-connect-spooldir

It was built to copy data from files in a directory into Kafka...



On Wed, Aug 10, 2016 at 8:56 AM, Jeyhun Karimov  wrote:
> Hi community,
>
> I am using Kafka-Connect to import text files to kafka. Because I have to
> give exact file name (like ./data.txt) and not the parent directory (./*)
> as an input to Kafka-Connect, how can I import files in particular
> directory as they are created in runtime? The solutions that come to my
> mind are below but I don't know about their efficiency.
> - I may create a new file in particular directory (after the kafka-connect
> started already) and start new connect instance to point that file
> -or I may write to only one file (just one ./data.txt) all the time (and no
> other data files created)  and kafka-connect will transfer the data only
> from one file and it will from where it left (as data writes are made)
> -or any other way to efficiently handle this use case.
>
> My second question is, is it safe to  override the data transfer methods of
> kafka-connect? For example I want to put thread.sleeps in kafka-streams
> side while transferring data and see the behaviour in kafka side or in
> application side. You can think of as simulation of load.
>
> Cheers
> Jeyhun
>
>
> --
> -Cheers
>
> Jeyhun



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


Re: Verify log compaction

2016-08-10 Thread David Yu
For some reason, I cannot find log-cleaner.log anywhere.

On Fri, Jul 29, 2016 at 8:20 PM John Holland <
john.holl...@objectpartners.com> wrote:

> Check the log-cleaner.log file on the server.  When the thread runs you'll
> see output for every partition it compacts and the compaction ratio it
> achieved.
>
> The __consumer_offsets topic is compacted, I see log output from it being
> compacted frequently.
>
> Depending on your settings for the topic it may take a while for it to
> compact.  Compaction doesn't occur on the current log segment.  Look at
> these settings for the topic, "segment.bytes" and "segment.ms".  Lower
> them
> to force quicker compaction.
>
> On 0.8.2.2, occasionally, the compaction thread would die and then the
> __consumer_offets topic would grow out of control.  Kafka logs the thread
> death in the log-cleaner.log.
>
>
> On Fri, Jul 29, 2016 at 4:10 PM David Yu  wrote:
>
> > Hi,
> >
> > We are using Kafka 0.9.0.0. One of our topic is set to use log
> compaction.
> > We have also set log.cleaner.enable. However, we suspected that the topic
> > is not being compacted.
> >
> > What is the best way for us to verify the compaction is happening?
> >
> > Thanks,
> > David
> >
>
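
A possible way to apply the lower segment settings John mentions, for a 0.9-era broker
(the topic name and values are placeholders):

bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-compacted-topic \
  --config segment.ms=60000 --config segment.bytes=1048576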


kafka broker is dropping the messages after acknowledging librdkafka

2016-08-10 Thread Mazhar Shaikh
Hi Kafka Team,

I'm using Kafka (kafka_2.11-0.9.0.1) with the librdkafka (0.8.1) API for the
producer.
During a run of 2 hours, I noticed that the total number of messages ack'd by
the librdkafka delivery report is greater than the max offset of the partition
on the Kafka broker.
I'm running the Kafka broker with a replication factor of 2.

So messages have been lost between librdkafka and the Kafka broker.

Since librdkafka is providing a success delivery report for all the messages,
it looks like the Kafka broker is dropping messages after acknowledging
librdkafka.

Requesting your help in solving this issue.

Thank you.


Regards
Mazhar Shaikh