First, I am +1 on the design proposed in this PIP.

> Could you please explain why the design will lead to unnecessary load on
> the broker?
> IMO, I don't think it will bring huge expenses.

I did not mean the throttling design proposed by this PIP was
expensive. I meant that running the offloader within the broker puts
load on the broker, and because that load could be decoupled, I called
it unnecessary.
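A minimal sketch of the decoupling idea raised earlier in this thread: the managed ledger "offers" ledgers that need offloading, and a separate offloader consumes those events. It assumes an in-memory `BlockingQueue` stands in for whatever channel a real deployment would use. `OffloadCandidate`, `offer`, and `drainAndOffload` are hypothetical names, not Pulsar APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of decoupled offloading; none of these names are Pulsar APIs.
public class DecoupledOffloadSketch {

    // Event the managed ledger would emit when a ledger becomes eligible for offload.
    public record OffloadCandidate(String topic, long ledgerId) {}

    // Assumption: an in-memory queue stands in for the broker-to-offloader channel.
    private static final BlockingQueue<OffloadCandidate> CANDIDATES = new LinkedBlockingQueue<>();

    // Broker side: cheap, it only records that offload work exists.
    public static void offer(String topic, long ledgerId) {
        CANDIDATES.add(new OffloadCandidate(topic, ledgerId));
    }

    // Offloader side: takes up to maxBatch pending candidates without blocking.
    public static List<OffloadCandidate> drainAndOffload(int maxBatch) {
        List<OffloadCandidate> batch = new ArrayList<>();
        CANDIDATES.drainTo(batch, maxBatch);
        for (OffloadCandidate c : batch) {
            // A real offloader would read the ledger from BookKeeper here and write
            // it to tiered storage; this sketch only logs which ledger it would handle.
            System.out.println("offloading ledger " + c.ledgerId() + " of " + c.topic());
        }
        return batch;
    }

    public static void main(String[] args) {
        offer("persistent://public/default/orders", 42L);
        offer("persistent://public/default/orders", 43L);
        System.out.println(drainAndOffload(10).size() + " ledger(s) handed off");
    }
}
```

In this sketch the broker only enqueues ledger IDs, so the BookKeeper reads and tiered-storage writes run in the offloader process and would not show up in the broker's load, which is the load-manager concern discussed in this thread.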

Thanks,
Michael

On Fri, Nov 11, 2022 at 8:10 AM Jiuming Tao
<jm...@streamnative.io.invalid> wrote:
>
> Hi Michael,
>
> > The main trade off with throttling is that we might
> > leave performance on the table if the cluster is not heavily utilized.
>
> It is indeed possible. When a broker isn't publishing/consuming many
> messages and the offloaders are being throttled, the broker's spare
> capacity goes unused.
> But offload throttling exists as a bottom-line safeguard, so I think the
> trade-off is reasonable.
>
>
> > It would not be a broker. It would be an offloader, and its sole task
> > would be offloading data.
>
>
> It is surely a good solution: the broker and the offloader wouldn't affect each other.
> But, as you said, a new service would complicate the Pulsar deployment.
>
>
> > I made my tangent because I think the current design leads to
> > unnecessary load on the broker
>
> Could you please explain why the design will lead to unnecessary load on
> the broker?
> IMO, I don't think it will bring huge expenses.
>
> >  which could be misinterpreted by the
> > load manager as a reason to load balance, which could interrupt
> > offloading
>
> It is indeed possible: because offloading takes longer, the chance of it
> being interrupted increases.
> But I think there may be no better solution for this.
>
> Thanks,
> Tao Jiuming
>
>
> > On Nov 11, 2022, at 4:04 AM, Michael Marshall <mmarsh...@apache.org> wrote:
> >
> >> Yes, the PIP's key point is to protect the broker by preventing
> >> offloading from taking too many broker resources.
> >
> > Throttling also protects the bookkeeper. Reads that are used to
> > offload data are the lowest priority reads since they are not serving
> > an actual client. Since we don't have a way to tell BookKeeper the
> > requested quality of service for a read operation, throttling is a
> > natural solution. The main trade off with throttling is that we might
> > leave performance on the table if the cluster is not heavily utilized.
> >
> >> Do you mean adding a new broker type, one used only for offload
> >> processing?
> >
> > It would not be a broker. It would be an offloader, and its sole task
> > would be offloading data. The broker would still be the component that
> > serves reads from tiered storage.
> >
> >> I think introducing a new broker type just to implement offload
> >> throttling is relatively heavyweight.
> >
> > The "offloader" component is independent of the throttling feature. We
> > can implement this PIP without addressing my tangent. That being said,
> > I made my tangent because I think the current design leads to
> > unnecessary load on the broker, which could be misinterpreted by the
> > load manager as a reason to load balance, which could interrupt
> > offloading. I would guess that such an interruption could force the
> > offloader to restart the task of offloading a ledger, which is very
> > inefficient.
> >
> > Thanks,
> > Michael
> >
> > On Thu, Nov 10, 2022 at 12:50 PM Jiuming Tao
> > <jm...@streamnative.io.invalid> wrote:
> >>
> >> Hi Michael,
> >>
> >>
> >>> This PIP is similar to autorecovery throttling. I think the feature
> >>> makes sense for the same reasons that throttling autorecovery makes
> >>> sense.
> >>
> >> Yes, the PIP's key point is to protect the broker by preventing
> >> offloading from taking too many broker resources.
> >>
> >>> Tangentially, can we decouple writes to tiered storage from the broker
> >>> hosting the topic being offloaded?
> >>
> >> Do you mean adding a new broker type, one used only for offload
> >> processing?
> >> I think introducing a new broker type just to implement offload
> >> throttling is relatively heavyweight.
> >> We can do it in a simpler way.
> >>
> >> Thanks,
> >> Tao Jiuming
> >>
> >>
> >>
> >>
> >>> On Nov 8, 2022, at 8:08 AM, Michael Marshall <mmarsh...@apache.org> wrote:
> >>>
> >>> This PIP is similar to autorecovery throttling. I think the feature
> >>> makes sense for the same reasons that throttling autorecovery makes
> >>> sense.
> >>>
> >>> Tangentially, can we decouple writes to tiered storage from the broker
> >>> hosting the topic being offloaded? An independent service could write
> >>> to tiered storage without impacting the broker and could easily scale
> >>> with the workload. The primary complication for the service would be
> >>> figuring out which ledgers to offload. Perhaps the managed ledger
> >>> could "offer" ledgers up that need to be offloaded, and the new
> >>> service would only need to consume those events.
> >>>
> >>> That said, a new service would complicate the Pulsar deployment.
> >>>
> >>> Thanks,
> >>> Michael
> >>>
> >>> On Mon, Nov 7, 2022 at 10:30 AM Jiuming Tao
> >>> <jm...@streamnative.io.invalid> wrote:
> >>>>
> >>>>> One alternative would be to throttle offload in the write path instead 
> >>>>> of adding additional logic to the read path in managed ledgers.
> >>>>
> >>>> This is indeed a feasible method.
> >>>> But we would need to make changes in the FileSystem and BlobStore
> >>>> offloaders, and even in custom offloaders, so I don't think this
> >>>> approach is universal.
> >>>>
> >>>>> One simple way to do this is to limit how many threads can write
> >>>>> offloaded ledgers. This is the same way that reads of offloaded
> >>>>> ledgers are already “throttled”, by a thread count defaulting to 2.
> >>>>
> >>>> Yes, the offloader thread count defaults to 2, but it does not
> >>>> effectively limit traffic. If reads from BookKeeper are very fast,
> >>>> even those threads can drive CPU/memory/network usage high.
> >>>>
> >>>> Thanks,
> >>>> Tao Jiuming
> >>>>
> >>>> On Nov 2, 2022, at 1:43 AM, Dave Fisher <w...@apache.org> wrote:
> >>>>>
> >>>>> One alternative would be to throttle offload in the write path instead 
> >>>>> of adding additional logic to the read path in managed ledgers.
> >>>>>
> >>>>> One simple way to do this is to limit how many threads can write
> >>>>> offloaded ledgers. This is the same way that reads of offloaded
> >>>>> ledgers are already “throttled”, by a thread count defaulting to 2.
> >>>>>
> >>>>> Regards,
> >>>>> Dave
> >>>>>
> >>>>>> On Nov 1, 2022, at 10:27 AM, Jiuming Tao 
> >>>>>> <jm...@streamnative.io.invalid> wrote:
> >>>>>>
> >>>>>> Hi Pulsar community,
> >>>>>>
> >>>>>> I opened a PIP to discuss: PIP-211: Introduce offload throttling
> >>>>>>
> >>>>>> PIP link: https://github.com/apache/pulsar/issues/18004 
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Tao Jiuming
> >>>>>
> >>>>
> >>
>
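The thread above contrasts capping the number of offload threads with throttling offload reads by rate. A fixed pool of threads bounds how many reads run at once, not how many bytes per second they pull from BookKeeper; a byte-rate limiter bounds the latter directly. Below is a minimal token-bucket sketch of that idea, assuming a single bytes-per-second budget shared by all offload reads on a broker. `OffloadReadThrottle` and `reserve` are hypothetical names, and PIP-211 may implement throttling differently.

```java
import java.util.function.LongSupplier;

// Hypothetical token-bucket sketch for throttling offload reads; not Pulsar's API.
public class OffloadReadThrottle {
    private final long bytesPerSecond;
    private final LongSupplier nanoClock;   // injectable clock, e.g. System::nanoTime
    private double availableBytes;          // tokens; may go negative (debt repaid by waiting)
    private long lastRefillNanos;

    public OffloadReadThrottle(long bytesPerSecond, LongSupplier nanoClock) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be positive");
        }
        this.bytesPerSecond = bytesPerSecond;
        this.nanoClock = nanoClock;
        this.availableBytes = bytesPerSecond;   // start with (and cap at) one second of burst
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    // Adds tokens for the time elapsed since the last call, capped at one second's worth.
    private void refill() {
        long now = nanoClock.getAsLong();
        double earned = (now - lastRefillNanos) / 1e9 * bytesPerSecond;
        availableBytes = Math.min(bytesPerSecond, availableBytes + earned);
        lastRefillNanos = now;
    }

    // Reserves `bytes` and returns how many nanoseconds the caller should wait before reading.
    public synchronized long reserve(long bytes) {
        refill();
        availableBytes -= bytes;
        return availableBytes >= 0 ? 0L : (long) Math.ceil(-availableBytes / bytesPerSecond * 1e9);
    }

    public static void main(String[] args) throws InterruptedException {
        OffloadReadThrottle throttle = new OffloadReadThrottle(2L * 1024 * 1024, System::nanoTime);
        long start = System.nanoTime();
        for (int i = 0; i < 4; i++) {
            long waitNanos = throttle.reserve(1024 * 1024);   // one 1 MiB batch of entries
            if (waitNanos > 0) {
                Thread.sleep(waitNanos / 1_000_000, (int) (waitNanos % 1_000_000));
            }
            // A real offloader would now read the batch from BookKeeper and upload it.
        }
        System.out.printf("4 MiB paced over %.2f s%n", (System.nanoTime() - start) / 1e9);
    }
}
```

Returning a wait time instead of sleeping inside `reserve` lets an asynchronous caller schedule the read later rather than blocking a thread; the blocking loop in `main` is only for demonstration.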
