I haven't reproduced it in a sandbox environment, but it has already happened twice on that cluster, so it's a safe bet that it's reproducible in that setup. Are there any specific metrics or events that I should capture to make debugging this easier?
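In the meantime, here is a rough sketch of what I could run the next time it happens to snapshot which partitions have an unexpected ISR. It assumes the `kafka-topics.sh --describe` output uses the usual `Topic: ... Partition: ... Leader: ... Replicas: ... Isr: ...` line layout. The function name `find_isr_mismatches`, the `expected_rf` parameter and the sample topic `events` are made up for illustration; only the broker ids come from my reassignment.

```python
import re

# One partition line of `kafka-topics.sh --describe` output.
# ASSUMPTION: the field layout below matches what the 0.8.x tool prints.
LINE_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def find_isr_mismatches(describe_output, expected_rf=None):
    """Return (topic, partition, replicas, isr) tuples for partitions whose
    ISR differs from the assigned replicas, or whose ISR size differs from
    expected_rf when one is given (hypothetical helper, not a Kafka API)."""
    mismatches = []
    for line in describe_output.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue  # per-topic summary lines and blank lines
        replicas = [int(b) for b in m.group("replicas").split(",")]
        isr = [int(b) for b in m.group("isr").split(",") if b]
        wrong_size = expected_rf is not None and len(isr) != expected_rf
        if set(replicas) != set(isr) or wrong_size:
            mismatches.append(
                (m.group("topic"), int(m.group("partition")), replicas, isr)
            )
    return mismatches


# Made-up sample input; in practice this would be the captured tool output.
SAMPLE = "\n".join([
    "Topic:events\tPartitionCount:3\tReplicationFactor:3\tConfigs:",
    "\tTopic: events\tPartition: 0\tLeader: 143\tReplicas: 143,155,115\tIsr: 143,155",
    "\tTopic: events\tPartition: 1\tLeader: 155\tReplicas: 155,115,68\tIsr: 155,115,68",
    "\tTopic: events\tPartition: 2\tLeader: 68\tReplicas: 143,155,115,68\tIsr: 143,155,115,68",
])

if __name__ == "__main__":
    for topic, partition, replicas, isr in find_isr_mismatches(SAMPLE, expected_rf=3):
        print(topic, partition, "replicas:", replicas, "isr:", isr)
```

Saving this output with a timestamp next to the broker and controller logs should at least show when the ISR went out of line with the replica assignment.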
Thanks,
Karol

On Tue, Dec 2, 2014 at 11:20 PM, Jun Rao <jun...@gmail.com> wrote:

> Is there an easy way to reproduce the issues that you saw?
>
> Thanks,
>
> Jun
>
> On Mon, Dec 1, 2014 at 6:31 AM, Karol Nowak <gryw...@gmail.com> wrote:
>
> > Hi,
> >
> > I observed some error messages / exceptions while running partition
> > reassignment on a Kafka 0.8.1.1 cluster. Being fairly new to this system,
> > I'm not sure if these indicate serious failures or transient problems, or
> > if manual intervention is needed.
> >
> > I used kafka-reassign-partitions.sh to reassign partitions from brokers
> > {143,155,155,93} to {143,155,115,68} on a healthy (?) cluster. Right now
> > one partition has just two replicas in the ISR, and a number of partitions
> > are left with 4 replicas in the ISR even though the replication factor is
> > 3. Logs show a few ZooKeeper timeouts, but there were no GC pauses
> > anywhere near the session timeout. ZooKeeper itself seems healthy and not
> > overloaded, with the exception of regular CPU spikes, probably related to
> > snapshots.
> >
> > I cleaned up the log lines a little for brevity.
> >
> > First example: https://gist.github.com/knowak/a682afc1545fdeb836a1
> > Second one with two similar stack traces:
> > https://gist.github.com/knowak/6398be433d869d8141e5
> > Third one, many many of these:
> > https://gist.github.com/knowak/e78301259b74841702ae
> > Fourth: https://gist.github.com/knowak/1fbde5ca90d8f1924141
> > Fifth: https://gist.github.com/knowak/57fdcb75b3dc7c626893
> >
> > Hints?
> >
> >
> > Thanks,
> > Karol
> >

--
Regards,
Karol Nowak
http://knowak.wordpress.com