That will work, Jun (I guess it's no different from the current situation
with 0.7.x).

(And I still think it should be possible to recover logs in parallel!).
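
To make the parallel idea concrete, here is a rough sketch in Python (not
Kafka's actual code), assuming each log directory can be recovered
independently of the others. The names recover_log and recover_all are
hypothetical, and the per-log work is only a placeholder that totals file
sizes where a real broker would validate messages.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict


def recover_log(log_dir: Path) -> int:
    """Hypothetical stand-in for per-log recovery.

    It only totals the size of the files in the directory; a real broker
    would scan each segment and validate its messages here.
    """
    return sum(f.stat().st_size for f in log_dir.iterdir() if f.is_file())


def recover_all(data_dir: Path, threads: int = 8) -> Dict[str, int]:
    """Run recover_log over every log directory using a thread pool."""
    log_dirs = sorted(p for p in data_dir.iterdir() if p.is_dir())
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map preserves input order, so results line up with log_dirs.
        results = pool.map(recover_log, log_dirs)
        return {d.name: n for d, n in zip(log_dirs, results)}
```

Since the work is mostly disk reads, whether this helps in practice would
depend on how many disks sit behind the data directory.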

Jason


On Tue, May 7, 2013 at 7:55 AM, Jun Rao <jun...@gmail.com> wrote:

> If a broker is down, the cluster will be running in under-replicated mode,
> i.e., data will be written to fewer replicas. When the broker comes back, it
> will catch up on data from the current leader.
>
> Thanks,
>
> Jun
>
>
> On Mon, May 6, 2013 at 10:23 PM, Jason Rosenberg <j...@squareup.com> wrote:
>
> > Will producers also be able to start sending new messages to a replica
> > while one broker is taking a long time to start up?
> >
> >
> > On Mon, May 6, 2013 at 9:31 PM, Jun Rao <jun...@gmail.com> wrote:
> >
> > > In 0.8, if you turn on replication, it may not matter too much if a
> > > broker takes long to start up since data can still be served from the
> > > replicas. It may be possible to improve this by maintaining a flush
> > > checkpoint file on disk. We can then use that info to reduce the amount
> > > of the data to be recovered.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Mon, May 6, 2013 at 3:07 PM, Jason Rosenberg <j...@squareup.com> wrote:
> > >
> > > > Recently, we had an issue where our Kafka brokers were shut down hard
> > > > (and so did not write out the clean shutdown file). Thus, on restart,
> > > > they went through all logs and ran a recovery on them.
> > > >
> > > > Unfortunately, this took a long time (on the order of 30 minutes). We
> > > > have a lot of topics (~1000 or so). Is there any way this can be done
> > > > more quickly, say in parallel?
> > > >
> > > > Also, could it be done as a background process, so that the server can
> > > > start up and start receiving messages, logs for incoming topics are
> > > > prioritized in the recovery process, and perhaps messages can still be
> > > > buffered in memory while the log recovery is happening?
> > > >
> > > > It seems onerous to block all activity for 30 minutes while a slow,
> > > > serial recovery job happens....
> > > >
> > > > Thoughts?
> > > >
> > > > Jason
> > > >
> > >
> >
>
