That will work, Jun (I guess it's no different from the current situation with 0.7.x).
(And I still think it should be possible to recover logs in parallel!)

Jason

On Tue, May 7, 2013 at 7:55 AM, Jun Rao <jun...@gmail.com> wrote:

> If a broker is down, the cluster will be running in under-replicated mode,
> i.e., data will be written to fewer replicas. When the broker comes back, it
> will catch up on data from the current leader.
>
> Thanks,
>
> Jun
>
> On Mon, May 6, 2013 at 10:23 PM, Jason Rosenberg <j...@squareup.com> wrote:
>
> > Will producers also be able to start sending new messages to a replica
> > while one broker is taking a long time to start up?
> >
> > On Mon, May 6, 2013 at 9:31 PM, Jun Rao <jun...@gmail.com> wrote:
> >
> > > In 0.8, if you turn on replication, it may not matter too much if a
> > > broker takes long to start up since data can still be served from the
> > > replicas. It may be possible to improve this by maintaining a flush
> > > checkpoint file on disk. We can then use that info to reduce the amount
> > > of data to be recovered.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Mon, May 6, 2013 at 3:07 PM, Jason Rosenberg <j...@squareup.com> wrote:
> > >
> > > > Recently, we had an issue where our kafka brokers were shut down hard
> > > > (and so did not write out the clean shutdown file). Thus on restart, it
> > > > went through all logs and ran a recovery on them.
> > > >
> > > > Unfortunately, this took a long time (on the order of 30 minutes). We
> > > > have a lot of topics (e.g. ~1000 or so). Is there any way this can be
> > > > done more quickly, say in parallel?
> > > >
> > > > Also, could it be done as a background process, so the server can start
> > > > up and start receiving messages, logs for incoming topics are
> > > > prioritized in the recovery process, and perhaps messages can still be
> > > > buffered in memory while the log recovery is happening?
> > > >
> > > > It seems onerous to block all activity for 30 minutes while a slow,
> > > > serial recovery job happens....
> > > >
> > > > Thoughts?
> > > >
> > > > Jason
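
For illustration only, the parallel recovery asked about in this thread might be sketched as below. This is not Kafka's code: `recover_log`, `recover_all`, and the `topic-N-0` directory names are hypothetical, and the per-log work is a placeholder. The sketch assumes each log can be recovered independently of the others, which is what makes a bounded worker pool a plausible replacement for a serial loop.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def recover_log(log_dir):
    """Hypothetical stand-in for per-log recovery (not Kafka's actual code)."""
    # Placeholder: a real broker would scan the log's segment files here.
    return f"recovered {log_dir}"


def recover_all(log_dirs, recover_fn=recover_log, max_workers=8):
    """Run recover_fn over every log directory using a bounded thread pool."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each submitted future back to the directory it is recovering.
        futures = {pool.submit(recover_fn, d): d for d in log_dirs}
        for future in as_completed(futures):
            # result() re-raises any exception from the worker, so a failed
            # recovery is surfaced instead of being silently dropped.
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    dirs = [f"topic-{i}-0" for i in range(1000)]  # roughly the ~1000 topics mentioned
    recovered = recover_all(dirs, max_workers=8)
    print(len(recovered), "logs recovered")
```

Whether this actually shortens a 30-minute restart would depend on whether recovery is bound by disk I/O or CPU; with a single spindle, more threads may not help much.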