Re: Question about recovering from outage

2014-05-19 Thread Jonas Bergström
I'm running Kafka 0.8.

/ Jonas


2014-05-18 23:45 GMT+02:00 Jonas Bergström :

> Hi all, and thanks for a fantastic product.
>
> The other day our kafka node in our test environment went down due to a full
> disc. I reconfigured kafka to save fewer messages, and restarted the node.
> It is a single node setup. At restart the node freed up some disc space,
> but no new messages were accepted. In the log we saw this:
>
> WARN [KafkaApi-0] Produce request with correlation id 12680557 from
> client  on partition [logs,0] failed due to Partition [logs,0] doesn't
> exist on 0 (kafka.server.KafkaApis)
>
> List-topics showed:
>
> topic: logs partition: 0 leader: 0 replicas: 0 isr: 0
>
> which seemed fine, but I figured I might have to reassign the topic
> partition anyway, so I did. Nothing seemed to happen, neither in the logs
> nor in the status. Then I got another thing to take care of for a while, and
> realized about 30 minutes later that the node started working again!
>
> Is this expected behavior? How long does a node take to "get online" again
> after a crash-restart? Is there a way to tell that the node is on its way
> up?
>
>
> Thanks / Jonas
>


Re: ISR not updating

2014-05-19 Thread Shone Sadler
The value of under replicated partitions is 0 across the cluster.

Thanks,
Shone


On Mon, May 19, 2014 at 12:23 AM, Jun Rao  wrote:

> What's the value of under replicated partitions JMX in each broker?
>
> Thanks,
>
> Jun
>
>
> On Sat, May 17, 2014 at 6:16 PM, Paul Mackles  wrote:
>
> > Today we did a rolling restart of ZK. We also restarted the kafka
> > controller and ISRs are still not being updated in ZK. Again, the cluster
> > seems fine and the replicas in question do appear to be getting updated.
> I
> > am guessing there must be some bad state persisted in ZK.
> >
> > On 5/17/14 7:50 PM, "Shone Sadler"  wrote:
> >
> > >Hi Jun,
> > >
> > >I work with Paul and am monitoring the cluster as well.   The status has
> > >not changed.
> > >
> > >When we execute kafka-list-topic we are seeing the following (showing
> one
> > >of two partitions having the problem)
> > >
> > >topic: t1 partition: 33 leader: 1 replicas: 1,2,3 isr: 1
> > >
> > >when inspecting the logs of leader: I do see a spurt of ISR
> > >shrinkage/expansion  around the time that the brokers were partitioned
> > >from
> > >ZK. But nothing past the last message "Cached zkVersion [17] not equal
> to
> > >that in zookeeper." from  yesterday.  There are not constant changes to
> > >the
> > >ISR list.
> > >
> > >Is there a way to force the leader to update ZK with the latest ISR
> list?
> > >
> > >Thanks,
> > >Shone
> > >
> > >Logs:
> > >
> > >cat server.log | grep "\[t1,33\]"
> > >
> > >[2014-04-18 10:16:32,814] INFO [ReplicaFetcherManager on broker 1]
> > >Removing
> > >fetcher for partition [t1,33] (kafka.server.ReplicaFetcherManager)
> > >[2014-05-13 19:42:10,784] ERROR [KafkaApi-1] Error when processing fetch
> > >request for partition [t1,33] offset 330118156 from consumer with
> > >correlation id 0 (kafka.server.KafkaApis)
> > >[2014-05-14 11:02:25,255] ERROR [KafkaApi-1] Error when processing fetch
> > >request for partition [t1,33] offset 332896470 from consumer with
> > >correlation id 0 (kafka.server.KafkaApis)
> > >[2014-05-16 12:00:11,344] INFO Partition [t1,33] on broker 1: Shrinking
> > >ISR
> > >for partition [t1,33] from 3,1,2 to 1 (kafka.cluster.Partition)
> > >[2014-05-16 12:00:18,009] INFO Partition [t1,33] on broker 1: Cached
> > >zkVersion [17] not equal to that in zookeeper, skip updating ISR
> > >(kafka.cluster.Partition)
> > >[2014-05-16 13:33:11,344] INFO Partition [t1,33] on broker 1: Shrinking
> > >ISR
> > >for partition [t1,33] from 3,1,2 to 1 (kafka.cluster.Partition)
> > >[2014-05-16 13:33:12,403] INFO Partition [t1,33] on broker 1: Cached
> > >zkVersion [17] not equal to that in zookeeper, skip updating ISR
> > >(kafka.cluster.Partition)
> > >
> > >
> > >On Sat, May 17, 2014 at 11:44 AM, Jun Rao  wrote:
> > >
> > >> Do you see constant ISR shrinking/expansion of those two partitions in
> > >>the
> > >> leader broker's log ?
> > >>
> > >> Thanks,
> > >>
> > >> Jun
> > >>
> > >>
> > >> On Fri, May 16, 2014 at 4:25 PM, Paul Mackles 
> > >>wrote:
> > >>
> > >> > Hi - We are running kafka_2.8.0-0.8.0-beta1 (we are a little behind
> in
> > >> > upgrading).
> > >> >
> > >> > From what I can tell, connectivity to ZK was lost for a brief
> period.
> > >>The
> > >> > cluster seemed to recover OK except that we now have 2 (out of 125)
> > >> > partitions where the ISR appears to be out of date. In other words,
> > >> > kafka-list-topic is showing only one replica in the ISR for the 2
> > >> > partitions in question (there should be 3).
> > >> >
> > >> > What's odd is that in looking at the log segments for those
> > >>partitions on
> > >> > the file system, I can see that they are in fact getting updated and
> > >>by
> > >> all
> > >> > measures look to be in sync. I can also see that the brokers where
> the
> > >> > out-of-sync replicas reside are doing fine and leading other
> > >>partitions
> > >> > like nothing ever happened. Based on that, it seems like the ISR in
> > >>ZK is
> > >> > just out-of-date due to a botched recovery from the brief ZK outage.
> > >> >
> > >> > Has anyone seen anything like this before? I saw this ticket which
> > >> sounded
> > >> > similar:
> > >> >
> > >> > https://issues.apache.org/jira/browse/KAFKA-948
> > >> >
> > >> > Anyone have any suggestions for recovering from this state? I was
> > >> thinking
> > >> > of running the preferred-replica-election tool next to see if that
> > >>gets
> > >> the
> > >> > ISRs in ZK back in sync.
> > >> >
> > >> > After that, I guess the next step would be to bounce the kafka
> > >>servers in
> > >> > question.
> > >> >
> > >> > Thanks,
> > >> > Paul
> > >> >
> > >> >
> > >>
> >
> >
>


RE: starting off at a small scale, single ec2 instance with 7.5 GB RAM with kafka

2014-05-19 Thread S Ahmed
Hi,

I like how kafka operates, but I'm wondering if it is possible to run
everything on a single ec2 instance with 7.5 GB RAM.

So that would be zookeeper and a single kafka broker.

I would have a separate server to consume from the broker.

Producers would be from my web servers.


I don't want to complicate things, as I don't really need failover or
redundancy etc.  I just want to keep things simple.

I'll have a single topic, and a few partitions because I want the guarantee
that the messages are in order.


Is this something that would be really out of the norm and not recommended?
i.e. nobody really uses it this way and who knows what is going to happen?
:)
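
For what it's worth, a single-node setup like this needs very little
configuration. A rough sketch (property names are from the 0.8 docs; the
values are only placeholders, not tuning advice):

  # zookeeper.properties - one standalone ZooKeeper
  dataDir=/var/lib/zookeeper
  clientPort=2181

  # server.properties - one broker, no replication
  broker.id=0
  port=9092
  log.dirs=/var/lib/kafka-logs
  zookeeper.connect=localhost:2181
  num.partitions=3
  default.replication.factor=1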


Make kafka storage engine pluggable and provide a HDFS plugin?

2014-05-19 Thread Hangjun Ye
Hi there,

I recently started to use Kafka for our data analysis pipeline and it works
very well.

One problem for us so far is expanding our cluster when we need more storage
space.
Kafka provides some scripts to help with this, but the process wasn't
smooth.

To make it work perfectly, it seems Kafka needs to do some of the work that a
distributed file system has already done.
So I'm just wondering whether there are any thoughts on making Kafka work on
top of HDFS? Maybe make the Kafka storage engine pluggable, with HDFS as one
option?

The pros would be that HDFS already handles storage management
(replication, corrupted disks/machines, migration, load balancing, etc.) very
well and would free Kafka and its users from that burden; the cons might be
performance degradation.
As Kafka does very well on performance, it could still be competitive for
most situations even with some degree of degradation.
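
Purely as a hypothetical illustration of the pluggable idea (nothing like this
exists in Kafka today, and the names below are made up), the plug point could
be a small log-storage interface, with the current local-file log as the
default implementation and an HDFS-backed one as an alternative:

  // Hypothetical sketch only -- not an existing Kafka API.
  public interface LogStore {
      // Append a message set to a partition and return the first assigned offset.
      long append(String topic, int partition, byte[] messageSet);

      // Read up to maxBytes starting at startOffset.
      byte[] read(String topic, int partition, long startOffset, int maxBytes);

      // Highest offset currently stored for this partition.
      long logEndOffset(String topic, int partition);
  }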

Best,
-- 
Hangjun Ye


Re: ISR not updating

2014-05-19 Thread Jun Rao
Ok. That does indicate the ISR should include all replicas. Which version
of the ZK server are you using? Could you check the ZK server log to see
whether the ISR is being updated?
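
Also, you can see the ISR that is actually recorded in ZK by reading the
partition's state znode directly (a sketch, assuming the standard 0.8 znode
layout; adjust the host/port and topic/partition as needed):

  bin/zkCli.sh -server zk-host:2181 get /brokers/topics/t1/partitions/33/state
  # returns JSON along the lines of:
  # {"controller_epoch":5,"leader":1,"leader_epoch":12,"version":1,"isr":[1]}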

Thanks,

Jun


On Mon, May 19, 2014 at 1:30 AM, Shone Sadler wrote:

> The value of under replicated partitions is 0 across the cluster.
>
> Thanks,
> Shone
>
>
> On Mon, May 19, 2014 at 12:23 AM, Jun Rao  wrote:
>
> > What's the value of under replicated partitions JMX in each broker?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Sat, May 17, 2014 at 6:16 PM, Paul Mackles 
> wrote:
> >
> > > Today we did a rolling restart of ZK. We also restarted the kafka
> > > controller and ISRs are still not being updated in ZK. Again, the
> cluster
> > > seems fine and the replicas in question do appear to be getting
> updated.
> > I
> > > am guessing there must be some bad state persisted in ZK.
> > >
> > > On 5/17/14 7:50 PM, "Shone Sadler"  wrote:
> > >
> > > >Hi Jun,
> > > >
> > > >I work with Paul and am monitoring the cluster as well.   The status
> has
> > > >not changed.
> > > >
> > > >When we execute kafka-list-topic we are seeing the following (showing
> > one
> > > >of two partitions having the problem)
> > > >
> > > >topic: t1 partition: 33 leader: 1 replicas: 1,2,3 isr: 1
> > > >
> > > >when inspecting the logs of leader: I do see a spurt of ISR
> > > >shrinkage/expansion  around the time that the brokers were partitioned
> > > >from
> > > >ZK. But nothing past the last message "Cached zkVersion [17] not equal
> > to
> > > >that in zookeeper." from  yesterday.  There are not constant changes
> to
> > > >the
> > > >ISR list.
> > > >
> > > >Is there a way to force the leader to update ZK with the latest ISR
> > list?
> > > >
> > > >Thanks,
> > > >Shone
> > > >
> > > >Logs:
> > > >
> > > >cat server.log | grep "\[t1,33\]"
> > > >
> > > >[2014-04-18 10:16:32,814] INFO [ReplicaFetcherManager on broker 1]
> > > >Removing
> > > >fetcher for partition [t1,33] (kafka.server.ReplicaFetcherManager)
> > > >[2014-05-13 19:42:10,784] ERROR [KafkaApi-1] Error when processing
> fetch
> > > >request for partition [t1,33] offset 330118156 from consumer with
> > > >correlation id 0 (kafka.server.KafkaApis)
> > > >[2014-05-14 11:02:25,255] ERROR [KafkaApi-1] Error when processing
> fetch
> > > >request for partition [t1,33] offset 332896470 from consumer with
> > > >correlation id 0 (kafka.server.KafkaApis)
> > > >[2014-05-16 12:00:11,344] INFO Partition [t1,33] on broker 1:
> Shrinking
> > > >ISR
> > > >for partition [t1,33] from 3,1,2 to 1 (kafka.cluster.Partition)
> > > >[2014-05-16 12:00:18,009] INFO Partition [t1,33] on broker 1: Cached
> > > >zkVersion [17] not equal to that in zookeeper, skip updating ISR
> > > >(kafka.cluster.Partition)
> > > >[2014-05-16 13:33:11,344] INFO Partition [t1,33] on broker 1:
> Shrinking
> > > >ISR
> > > >for partition [t1,33] from 3,1,2 to 1 (kafka.cluster.Partition)
> > > >[2014-05-16 13:33:12,403] INFO Partition [t1,33] on broker 1: Cached
> > > >zkVersion [17] not equal to that in zookeeper, skip updating ISR
> > > >(kafka.cluster.Partition)
> > > >
> > > >
> > > >On Sat, May 17, 2014 at 11:44 AM, Jun Rao  wrote:
> > > >
> > > >> Do you see constant ISR shrinking/expansion of those two partitions
> in
> > > >>the
> > > >> leader broker's log ?
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Jun
> > > >>
> > > >>
> > > >> On Fri, May 16, 2014 at 4:25 PM, Paul Mackles 
> > > >>wrote:
> > > >>
> > > >> > Hi - We are running kafka_2.8.0-0.8.0-beta1 (we are a little
> behind
> > in
> > > >> > upgrading).
> > > >> >
> > > >> > From what I can tell, connectivity to ZK was lost for a brief
> > period.
> > > >>The
> > > >> > cluster seemed to recover OK except that we now have 2 (out of
> 125)
> > > >> > partitions where the ISR appears to be out of date. In other
> words,
> > > >> > kafka-list-topic is showing only one replica in the ISR for the 2
> > > >> > partitions in question (there should be 3).
> > > >> >
> > > >> > What's odd is that in looking at the log segments for those
> > > >>partitions on
> > > >> > the file system, I can see that they are in fact getting updated
> and
> > > >>by
> > > >> all
> > > >> > measures look to be in sync. I can also see that the brokers where
> > the
> > > >> > out-of-sync replicas reside are doing fine and leading other
> > > >>partitions
> > > >> > like nothing ever happened. Based on that, it seems like the ISR
> in
> > > >>ZK is
> > > >> > just out-of-date due to a botched recovery from the brief ZK
> outage.
> > > >> >
> > > >> > Has anyone seen anything like this before? I saw this ticket which
> > > >> sounded
> > > >> > similar:
> > > >> >
> > > >> > https://issues.apache.org/jira/browse/KAFKA-948
> > > >> >
> > > >> > Anyone have any suggestions for recovering from this state? I was
> > > >> thinking
> > > >> > of running the preferred-replica-election tool next to see if that
> > > >>gets
> > > >> the
> > > >> > ISRs in ZK back in sync.
> > > >> >
> > > >> > After that

Re: Question about recovering from outage

2014-05-19 Thread Jun Rao
Do you think you could upgrade to 0.8.1.1? It fixed a bunch of corner cases
in the controller.

Thanks,

Jun


On Mon, May 19, 2014 at 12:00 AM, Jonas Bergström wrote:

> I'm running Kafka 0.8.
>
> / Jonas
>
>
> 2014-05-18 23:45 GMT+02:00 Jonas Bergström :
>
> > Hi all, and thanks for a fantastic product.
> >
> > The other day our kafka node in our test environment went down due to
> full
> > disc. I reconfigured kafka to save fewer messages, and restarted the
> node.
> > It is a single node setup. At restart the node freed up some disc space,
> > but no new messages where accepted. In the log we saw this:
> >
> > WARN [KafkaApi-0] Produce request with correlation id 12680557 from
> > client  on partition [logs,0] failed due to Partition [logs,0] doesn't
> > exist on 0 (kafka.server.KafkaApis)
> >
> > List-topics showed:
> >
> > topic: logs partition: 0 leader: 0 replicas: 0 isr: 0
> >
> > which seemed fine, but I figured I might have to reassign the topic
> > partition anyway, so I did. Nothing seemed to happen, neither in the logs
> > or in the status. Then I got another thing to take care of for awhile,
> and
> > realized about 30 minutes later that the node started working again!
> >
> > Is this expected behavior? How long does a node take to "get online"
> again
> > after a crash-restart? Is there a way to tell that the node is on it's
> way
> > up?
> >
> >
> > Thanks / Jonas
> >
>


Kafka Migration Tool

2014-05-19 Thread Mo Firouz
Hi all,

I'm trying to migrate from Kafka 0.7.2-2.9.2 (with Zookeeper 3.3.4 from
Cloudera) to Kafka 0.8.1.1-2.9.2 (with official Zookeeper 3.4.5). However, I'm
hitting a brick wall with a very mysterious problem:

6) at kafka.tools.KafkaMigrationTool.main(KafkaMigrationTool.java:217)
Caused by: java.lang.NumberFormatException: For input string:
""1400511498394","host""

I've attached logs, scripts and configs from everything that I'm trying to
run.

FYI: We have three servers for Kafka Brokers and three servers for
Zookeeper. They are running on Staging (stag) as number s04, s05 and s06.
I've only given the properties for s04 as the other two are almost
identical.

Thanks,
Cheers,
Mo.


start-migration.sh
Description: Bourne shell script


Re: What happens to Kafka when ZK lost its quorum?

2014-05-19 Thread Guozhang Wang
Hi Weide/Connie,

I have added this entry to the FAQ; please let me know if anything on the
wiki is not clear to you.

https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowareKafkabrokersdependonZookeeper
?

Guozhang


On Wed, May 14, 2014 at 8:15 AM, Neha Narkhede wrote:

> Kafka requires a functional and healthy zookeeper setup. It is recommended
> that you closely monitor your zookeeper cluster and provision it so that it
> is performant.
>
> Thanks,
> Neha
>
>
> On Tue, May 13, 2014 at 6:52 AM, Connie Yang 
> wrote:
>
> > Hi all,
> >
> > Can Kafka producers, brokers and consumers still be processing messages
> and
> > functioning in their normal states if Zookeeper lost its quorum?
> >
> > Thanks,
> > Connie
> >
>



-- 
-- Guozhang


Cause and Recovery

2014-05-19 Thread Mingtao Zhang
Hi,

I got this from the log:

 [exec] 03:50:06.216 [ProducerSendThread-] ERROR
k.producer.async.ProducerSendThread - Error in handling batch
of 100 events
 [exec] kafka.common.FailedToSendMessageException: Failed to send
messages after 3 tries.
 [exec] at kafka.producer.async.DefaultEventHandler.handle(Unknown
Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
 [exec] at kafka.producer.async.ProducerSendThread.tryToHandle(Unknown
Source) [kafka_2.10-0.8.0.jar:0.8.0]
 [exec] at
kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(Unknown
Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
 [exec] at
kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(Unknown
Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
 [exec] at scala.collection.immutable.Stream.foreach(Stream.scala:547)
~[scala-library-2.10.1.jar:na]
 [exec] at
kafka.producer.async.ProducerSendThread.processEvents(Unknown Source)
[kafka_2.10-0.8.0.jar:0.8.0]
 [exec] at kafka.producer.async.ProducerSendThread.run(Unknown Source)
[kafka_2.10-0.8.0.jar:0.8.0]

My question is:

1. What could cause this?
2. Who should deal with the recovery? User or Kafka?

Let me know if more of the log is needed. (It hangs the integration test, so I
have 3.5 GB of logs ... from a Maven build.)

-- 

Best Regards,
Mingtao Zhang


Re: Cause and Recovery

2014-05-19 Thread Guozhang Wang
Hi Mingtao,

1. Do you see any error/warn log entries before this one?
2. The producer retries sending when the previous attempt is not
successful. In your case it has exhausted all the retries, and hence the
messages are dropped on the floor and lost.
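
For reference, the retry behaviour of the 0.8 producer is controlled by a
couple of producer properties; a sketch (the values shown are only examples,
check the defaults for your exact version):

  # maximum number of send attempts (the "3 tries" in the exception above)
  message.send.max.retries=3
  # how long to back off between retries, in ms
  retry.backoff.ms=100
  # acks the broker must send before a request is considered successful
  request.required.acks=1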

Guozhang


On Mon, May 19, 2014 at 8:32 AM, Mingtao Zhang wrote:

> Hi,
>
> I got this from the log:
>
>  [exec] 03:50:06.216 [ProducerSendThread-] ERROR
> k.producer.async.ProducerSendThread - Error in handling batch
> of 100 events
>  [exec] kafka.common.FailedToSendMessageException: Failed to send
> messages after 3 tries.
>  [exec] at kafka.producer.async.DefaultEventHandler.handle(Unknown
> Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
>  [exec] at kafka.producer.async.ProducerSendThread.tryToHandle(Unknown
> Source) [kafka_2.10-0.8.0.jar:0.8.0]
>  [exec] at
>
> kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(Unknown
> Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
>  [exec] at
>
> kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(Unknown
> Source) ~[kafka_2.10-0.8.0.jar:0.8.0]
>  [exec] at scala.collection.immutable.Stream.foreach(Stream.scala:547)
> ~[scala-library-2.10.1.jar:na]
>  [exec] at
> kafka.producer.async.ProducerSendThread.processEvents(Unknown Source)
> [kafka_2.10-0.8.0.jar:0.8.0]
>  [exec] at kafka.producer.async.ProducerSendThread.run(Unknown Source)
> [kafka_2.10-0.8.0.jar:0.8.0]
>
> My question is:
>
> 1. What could cause this?
> 2. Who should deal with the recovery? User or Kafka?
>
> Let me know if more log is needed. (It hangs the integration test, so I
> have 3.5G of them ... from a maven build)
>
> --
>
> Best Regards,
> Mingtao Zhang
>



-- 
-- Guozhang


Re: Question about recovering from outage

2014-05-19 Thread Jonas Bergström
Ok, I'll upgrade.
Is there a way to see the status of a node that is recovering, e.g. in
zookeeper or via jmx?

/ Jonas


2014-05-19 16:49 GMT+02:00 Jun Rao :

> Do you think you could upgrade to 0.8.1.1? It fixed a bunch of corner cases
> in the controller.
>
> Thanks,
>
> Jun
>
>
> On Mon, May 19, 2014 at 12:00 AM, Jonas Bergström  >wrote:
>
> > I'm running Kafka 0.8.
> >
> > / Jonas
> >
> >
> > 2014-05-18 23:45 GMT+02:00 Jonas Bergström :
> >
> > > Hi all, and thanks for a fantastic product.
> > >
> > > The other day our kafka node in our test environment went down due to
> > full
> > > disc. I reconfigured kafka to save fewer messages, and restarted the
> > node.
> > > It is a single node setup. At restart the node freed up some disc
> space,
> > > but no new messages where accepted. In the log we saw this:
> > >
> > > WARN [KafkaApi-0] Produce request with correlation id 12680557 from
> > > client  on partition [logs,0] failed due to Partition [logs,0] doesn't
> > > exist on 0 (kafka.server.KafkaApis)
> > >
> > > List-topics showed:
> > >
> > > topic: logs partition: 0 leader: 0 replicas: 0 isr: 0
> > >
> > > which seemed fine, but I figured I might have to reassign the topic
> > > partition anyway, so I did. Nothing seemed to happen, neither in the
> logs
> > > or in the status. Then I got another thing to take care of for awhile,
> > and
> > > realized about 30 minutes later that the node started working again!
> > >
> > > Is this expected behavior? How long does a node take to "get online"
> > again
> > > after a crash-restart? Is there a way to tell that the node is on it's
> > way
> > > up?
> > >
> > >
> > > Thanks / Jonas
> > >
> >
>


Consistent replication of an event stream into Kafka

2014-05-19 Thread Bob Potter
Hello,

We have a use case where we want to replicate an event stream which exists
outside of kafka into a kafka topic (single partition). The event stream
has sequence ids which always increase by 1. We want to preserve this
ordering.

The difficulty is that we want to be able to have the process that writes
these events automatically fail over if it dies. While ZooKeeper can
guarantee a single writer at a given point in time we are worried about
delayed network packets, bugs and long GC pauses.

One solution we've thought of is to set the sequence_id as the key for the
Kafka messages and have a proxy running on each Kafka broker which refuses
to write new messages if they don't have the next expected key. This seems
to solve any issue we would have with badly behaving networks or processes.
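
A minimal sketch of the check such a proxy would perform (purely illustrative;
no such component exists in Kafka, and the class below is made up):

  // Hypothetical per-partition gatekeeper: only the next sequence id is accepted.
  public final class SequenceGate {
      private long lastSeq;

      public SequenceGate(long lastCommittedSeq) {
          this.lastSeq = lastCommittedSeq;
      }

      // Returns true if the message may be appended; false means a stale,
      // duplicate or out-of-order writer, and the append is refused.
      public synchronized boolean tryAccept(long seq) {
          if (seq != lastSeq + 1) {
              return false;
          }
          lastSeq = seq;
          return true;
      }
  }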

Is there a better solution? Should we just handle these inconsistencies in
our consumers? Are we being too paranoid?

As a side-note, it seems like this functionality (guaranteeing that all
keys in a partition are in sequence on a particular topic) may be a nice
option to have in Kafka proper.

Thanks,
Bob


SocketServerStats not reporting bytes written or read

2014-05-19 Thread Xuyen On
Hi all,

I have an intermittent problem with the JMX SocketServer stats on my 0.7.2 
Kafka cluster.
I'm collecting the SocketServerStats with jmxstats and everything seems to be 
working fine except kafka.SocketServerStats:BytesWrittenPerSecond and 
kafka.SocketServerStats:BytesReadPerSecond are not working all the time. It 
sometimes will cut out and not report any traffic and then it will randomly 
report back normal stats. I've noticed that when I started a new topic and
started sending data with a new producer, the stats for bytes written and read
will suddenly zero out. The funny thing is that the other stats seem to still be
working fine, including cumulative bytes read and written.

Does anyone know what might be causing this and how I can fix it?

Thanks,

Xuyen





Re: Kafka Migration Tool

2014-05-19 Thread Jun Rao
It seems that you may have set the zk connection string to one used by 0.8
Kafka brokers.
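
For illustration (the exact strings will differ), the broker registration
znodes under /brokers/ids look completely different in the two versions, which
is why a 0.7 client chokes when pointed at a 0.8 ZK namespace:

  # 0.7 registers a plain string, roughly:
  #   /brokers/ids/1 -> "creatorId:hostname:9092"
  # 0.8 registers JSON, roughly:
  #   /brokers/ids/1 -> {"jmx_port":-1,"timestamp":"1400511498394","version":1,"host":"hostname","port":9092}
  # Splitting that JSON on ':' as if it were the 0.7 format yields fragments
  # such as "1400511498394","host" -- the exact string in the
  # NumberFormatException you posted.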

Thanks,

Jun


On Mon, May 19, 2014 at 8:34 AM, Mo Firouz  wrote:

> Hi all,
>
> I'm trying to migrate from Kafka 0.7.2-2.9.2 (with Zookeeper 3.3.4 from
> Cloudera) to Kafka 0.8.1.1-2.9.2 (with official Zookeeper 3.4.5 ) - However
> hitting brick walls with a very mysterious problem:
>
> 6) at kafka.tools.KafkaMigrationTool.main(KafkaMigrationTool.java:217)
> Caused by: java.lang.NumberFormatException: For input string:
> ""1400511498394","host""
>
> I've attached logs, scripts and configs from everything that I'm trying to
> run.
>
> FYI: We have three servers for Kafka Brokers and three servers for
> Zookeeper. They are running on Staging (stag) as number s04, s05 and s06.
> I've only given the properties for s04 as the other two are almost
> identical.
>
> Thanks,
> Cheers,
> Mo.
>
>
>


Re: Question about recovering from outage

2014-05-19 Thread Jun Rao
In trunk, we have a JMX metric that reports the state of each broker. One of
the states is log recovery.
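
For example, something along these lines can poll it remotely (a sketch; the
exact MBean name and the JMX port depend on your version and broker config):

  bin/kafka-run-class.sh kafka.tools.JmxTool \
    --jmx-url service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi \
    --object-name kafka.server:type=KafkaServer,name=BrokerState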

Thanks,

Jun


On Mon, May 19, 2014 at 11:15 AM, Jonas Bergström wrote:

> Ok, I'll upgrade.
> Is there a way to see the status of a node that is recovering, e.g. in
> zookeeper or via jmx?
>
> / Jonas
>
>
> 2014-05-19 16:49 GMT+02:00 Jun Rao :
>
> > Do you think you could upgrade to 0.8.1.1? It fixed a bunch of corner
> cases
> > in the controller.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Mon, May 19, 2014 at 12:00 AM, Jonas Bergström  > >wrote:
> >
> > > I'm running Kafka 0.8.
> > >
> > > / Jonas
> > >
> > >
> > > 2014-05-18 23:45 GMT+02:00 Jonas Bergström :
> > >
> > > > Hi all, and thanks for a fantastic product.
> > > >
> > > > The other day our kafka node in our test environment went down due to
> > > full
> > > > disc. I reconfigured kafka to save fewer messages, and restarted the
> > > node.
> > > > It is a single node setup. At restart the node freed up some disc
> > space,
> > > > but no new messages where accepted. In the log we saw this:
> > > >
> > > > WARN [KafkaApi-0] Produce request with correlation id 12680557 from
> > > > client  on partition [logs,0] failed due to Partition [logs,0]
> doesn't
> > > > exist on 0 (kafka.server.KafkaApis)
> > > >
> > > > List-topics showed:
> > > >
> > > > topic: logs partition: 0 leader: 0 replicas: 0 isr: 0
> > > >
> > > > which seemed fine, but I figured I might have to reassign the topic
> > > > partition anyway, so I did. Nothing seemed to happen, neither in the
> > logs
> > > > or in the status. Then I got another thing to take care of for
> awhile,
> > > and
> > > > realized about 30 minutes later that the node started working again!
> > > >
> > > > Is this expected behavior? How long does a node take to "get online"
> > > again
> > > > after a crash-restart? Is there a way to tell that the node is on
> it's
> > > way
> > > > up?
> > > >
> > > >
> > > > Thanks / Jonas
> > > >
> > >
> >
>


Re: SocketServerStats not reporting bytes written or read

2014-05-19 Thread Jun Rao
Is the problem with the JMX beans themselves or with jmxstats?
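
One way to rule out jmxstats is to read the attributes straight off the MBean
server; a rough sketch (the ObjectName below is an assumption for 0.7 --
verify the exact name in jconsole first):

  import javax.management.MBeanServerConnection;
  import javax.management.ObjectName;
  import javax.management.remote.JMXConnector;
  import javax.management.remote.JMXConnectorFactory;
  import javax.management.remote.JMXServiceURL;

  public class ReadSocketStats {
      public static void main(String[] args) throws Exception {
          JMXServiceURL url = new JMXServiceURL(
              "service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi");
          try (JMXConnector c = JMXConnectorFactory.connect(url)) {
              MBeanServerConnection conn = c.getMBeanServerConnection();
              // Assumed name -- check jconsole for the real one on 0.7.2.
              ObjectName name = new ObjectName("kafka:type=kafka.SocketServerStats");
              System.out.println("BytesWrittenPerSecond = "
                  + conn.getAttribute(name, "BytesWrittenPerSecond"));
              System.out.println("BytesReadPerSecond = "
                  + conn.getAttribute(name, "BytesReadPerSecond"));
          }
      }
  }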

Thanks,

Jun


On Mon, May 19, 2014 at 2:48 PM, Xuyen On  wrote:

> Hi all,
>
> I have an intermittent problem with the JMX SocketServer stats on my 0.7.2
> Kafka cluster.
> I'm collecting the SocketServerStats with jmxstats and everything seems to
> be working fine except kafka.SocketServerStats:BytesWrittenPerSecond and
> kafka.SocketServerStats:BytesReadPerSecond are not working all the time. It
> sometimes will cut out and not report any traffic and then it will randomly
> report back normal stats. I've noticed that when I started a new topic and
> starting sending data with a new producer, the stats for bytes written and
> read will suddenly zero out. Funny thing is that the other stats seem  to
> still be working fine including cumulative bytes read and written.
>
> Does anyone know what might be causing this and how I can fix it?
>
> Thanks,
>
> Xuyen
>
>
>
>


Re: Consistent replication of an event stream into Kafka

2014-05-19 Thread Guozhang Wang
Hello Bob,

What you described is similar to the idempotent producer design that we are
now discussing:

https://cwiki.apache.org/confluence/display/KAFKA/Idempotent+Producer

Do you think this new feature will solve your case?

Guozhang


On Mon, May 19, 2014 at 2:40 PM, Bob Potter  wrote:

> Hello,
>
> We have a use case where we want to replicate an event stream which exists
> outside of kafka into a kafka topic (single partition). The event stream
> has sequence ids which always increase by 1. We want to preserve this
> ordering.
>
> The difficulty is that we want to be able to have the process that writes
> these events automatically fail-over if it dies. While ZooKeeper can
> guarantee a single writer at a given point in time we are worried about
> delayed network packets, bugs and long GC pauses.
>
> One solution we've thought of is to set the sequence_id as the key for the
> Kafka messages and have a proxy running on each Kafka broker which refuses
> to write new messages if they don't have the next expected key. This seems
> to solve any issue we would have with badly behaving networks or processes.
>
> Is there a better solution? Should we just handle these inconsistencies in
> our consumers? Are we being too paranoid?
>
> As a side-note, it seems like this functionality (guaranteeing that all
> keys in a partition are in sequence on a particular topic) may be a nice
> option to have in Kafka proper.
>
> Thanks,
> Bob
>



-- 
-- Guozhang