But I think Maxime's point is valid. In the common case where you don't care
about every single message and are willing to tolerate a little loss, 0.8
will seem like a pretty big step back.

I don't think the solution of just randomly partitioning works, because you
will still produce 1/nth of your data to a dead broker, which will lead to
timeouts or errors and will in any case be slow.

I think we are pretty hesitant to add functionality to 0.8, but it might be
worth thinking through the simplest thing that would make this work in 0.8.
I haven't looked at it, but I think we do automatically refresh metadata
when there is a failure, so maybe passing the metadata in to the producer
and making the RandomPartitioner use it properly would not be too big of a
change.
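To make that concrete, here is a rough sketch of what a metadata-aware random partitioner could look like. This is not Kafka's actual partitioner API; the `PartitionInfo` class and the `leaderAvailable` flag are made up for illustration, standing in for whatever the refreshed metadata would expose:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: a random partitioner that prefers partitions whose
// leader is known to be alive, based on cached cluster metadata.
public class AvailableRandomPartitioner {

    // Illustrative stand-in for per-partition metadata.
    public static class PartitionInfo {
        final int id;
        final boolean leaderAvailable;

        public PartitionInfo(int id, boolean leaderAvailable) {
            this.id = id;
            this.leaderAvailable = leaderAvailable;
        }
    }

    private final Random random = new Random();

    // Pick a random partition among those with a live leader; fall back to
    // fully random (the old behavior) if none are known to be available.
    public int partition(List<PartitionInfo> partitions) {
        List<PartitionInfo> live = new ArrayList<PartitionInfo>();
        for (PartitionInfo p : partitions) {
            if (p.leaderAvailable) {
                live.add(p);
            }
        }
        List<PartitionInfo> pool = live.isEmpty() ? partitions : live;
        return pool.get(random.nextInt(pool.size())).id;
    }
}
```

With this, a message would only land on a partition led by a dead broker when every leader is down, at which point falling back to random at least preserves today's retry behavior.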

-Jay


On Wed, Jan 9, 2013 at 9:34 AM, Jun Rao <jun...@gmail.com> wrote:

> Maxime,
>
> First of all, in 0.8, you can choose to have a replication factor of 2. It
> just means that one can tolerate only one broker failure.
>
> Second, our producer logic supports retries on failure. If a message can't
> be sent, on retry, our default partitioner will select another random
> partition to send the message to. So, with a replication factor of 1,
> assuming that you have enough partitions (partitions are typically spread
> over all brokers) and enough retries, the message is likely to be
> delivered.
>
> To improve this, another possibility is to create a new type of partitioner
> that routes messages randomly to only those partitions with a leader.
> However, this likely requires a new partitioner interface: in addition to
> the number of partitions, the partitioner needs to know whether each
> partition is available or not.
>
> Thanks,
>
> Jun
>
> On Wed, Jan 9, 2013 at 8:43 AM, Maxime Brugidou
> <maxime.brugi...@gmail.com> wrote:
>
> > Thanks for your response. I think the work-around is not really acceptable
> > for me since it will consume 3x the resources (because replication of 3 is
> > the minimum acceptable) and it will still make the cluster less available
> > anyway (unless I have only 3 brokers).
> >
> > The thing is that 0.7 kept the cluster 100% available (for my use case,
> > accepting data loss) as long as a single broker was alive.
> >
> > A way to handle this would be to:
> > 1. Have a lot of partitions per topic (more than the # of brokers)
> > 2. Have something that rebalances the partitions and makes sure a broker
> > has at least one partition for each topic (to make every topic "available")
> > 3. Have a setting in the consumer/producer that says "I don't care about
> > partitioning, just produce/consume wherever you can"
> >
> > This is probably not simple to implement. I'll add these ideas in the JIRA
> > and will pursue the discussion there.
> >
> > Maxime
> >
> > On Wed, Jan 9, 2013 at 5:18 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> >
> > > As a workaround in the meantime you can probably run with replication;
> > > although it sounds like you don't really need it, it shouldn't hurt.
> > >
> >
>
