Hi Jun,

Thank you very much for your feedback!
Can we schedule a meeting in your Palo Alto office in December? I think a
face-to-face discussion is much more efficient than email. Both Harsha and I
can visit you. Satish may be able to join us remotely. In the meantime, to
make the discussion more concrete, I have put a couple of very rough
interface sketches for points 40 and 41 below the quoted thread.

On Fri, Nov 15, 2019 at 11:04 AM Jun Rao <[email protected]> wrote:

> Hi, Satish and Harsha,
>
> The following is more detailed high-level feedback on the KIP. Overall,
> the KIP seems useful. The challenge is how to design it such that it is
> general enough to support different ways of implementing this feature and
> to support existing features.
>
> 40. Local segment metadata storage: The KIP assumes that the metadata for
> the archived log segments is cached locally on every broker, and it
> provides a specific implementation of that local storage in the framework.
> We probably should discuss this more. For example, some tier storage
> providers may not want to cache the metadata locally and may instead rely
> on a remote key/value store if such a store is already present. If a local
> store is used, there are different ways of implementing it (e.g., based on
> customized local files, an embedded local store like RocksDB, etc.). An
> alternative design is to provide only an interface for retrieving the tier
> segment metadata and to leave the details of how the metadata is obtained
> outside the framework.
>
> 41. The RemoteStorageManager interface and how the framework uses it: I am
> not sure the interface is general enough. For example, RemoteLogIndexEntry
> seems tied to a specific way of storing the metadata in remote storage.
> The framework uses the listRemoteSegments() API in a pull-based approach;
> however, in some other implementations a push-based approach may be
> preferred. I don't have a concrete proposal yet, but it would be useful to
> give this area some more thought and see if we can make the interface more
> general.
>
> 42. In the diagram, the RemoteLogManager sits side by side with the
> LogManager. The KIP only discusses how fetch requests are handled between
> the two layers. However, we should also consider how other requests that
> touch the log are handled, e.g., list offsets by timestamp, delete
> records, etc. Also, in this model it is not clear which component is
> responsible for managing the log start offset; it seems the log start
> offset could be changed by both the RemoteLogManager and the LogManager.
>
> 43. There are quite a few existing features not covered by the KIP. It
> would be useful to discuss each of them.
> 43.1 I wouldn't say that compacted topics are rarely used and always
> small. For example, KStreams uses compacted topics for storing state, and
> sometimes those topics can be large. While it might be OK not to support
> compacted topics initially, it would be useful to have a high-level idea
> of how they might be supported down the road, so that we don't have to
> make incompatible API changes in the future.
> 43.2 We need to discuss how EOS is supported; in particular, how the
> producer state is integrated with the remote storage.
> 43.3 Now that KIP-392 (allow consumers to fetch from the closest replica)
> is implemented, we need to discuss how reading from a follower replica is
> supported with tier storage.
> 43.4 We need to discuss how JBOD is supported with tier storage.
>
> Thanks,
>
> Jun
>
> On Fri, Nov 8, 2019 at 12:06 AM Tom Bentley <[email protected]> wrote:
>
> > Thanks for those insights Ying.
> >
> > On Thu, Nov 7, 2019 at 9:26 PM Ying Zheng <[email protected]> wrote:
> >
> > > > Thanks, I missed that point. However, there's still a point at which
> > > > the consumer fetches start getting served from remote storage (even
> > > > if that point isn't as soon as the local log retention time/size).
> > > > This represents a kind of performance cliff edge, and what I'm
> > > > really interested in is how easy it is for a consumer which falls
> > > > off that cliff to catch up so that its fetches again come from local
> > > > storage. Obviously this can depend on all sorts of factors (like
> > > > production rate and consumption rate), so it's not guaranteed (just
> > > > as it's not guaranteed for Kafka today), but this would represent a
> > > > new failure mode.
> > >
> > > As I explained in the last mail, it is a very rare case that a
> > > consumer needs to read remote data. In our experience at Uber, this
> > > only happens when the consumer service has had an outage of several
> > > hours.
> > >
> > > There is not a "performance cliff" as you assume. The remote storage
> > > is even faster than local disks in terms of bandwidth. Reading from
> > > remote storage will have higher latency than local disk, but since the
> > > consumer is catching up on several hours of data, it is not sensitive
> > > to sub-second latency, and each remote read request reads a large
> > > amount of data, which makes the overall performance better than
> > > reading from local disks.
> > >
> > > > Another aspect I'd like to understand better is the effect that
> > > > serving fetch requests from remote storage has on the broker's
> > > > network utilization. If we're just trimming the amount of data held
> > > > locally (without increasing the overall local+remote retention),
> > > > then we're effectively trading disk bandwidth for network bandwidth
> > > > when serving fetch requests from remote storage (which I understand
> > > > to be a good thing, since brokers are often/usually disk bound). But
> > > > if we're increasing the overall local+remote retention, then it's
> > > > more likely that the network itself becomes the bottleneck. I
> > > > appreciate this is all rather hand-wavy; I'm just trying to
> > > > understand how this would affect broker performance, so I'd be
> > > > grateful for any insights you can offer.
> > >
> > > Network bandwidth is a function of the produce rate; it has nothing to
> > > do with the remote retention. As long as the data has been shipped to
> > > remote storage, you can keep it there for 1 day, 1 year, or 100 years
> > > without consuming any network resources.
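Here is the rough sketch for point 40, assuming the framework only asks a
plugin for tier segment metadata and leaves the storage of that metadata
entirely to the plugin. Every name and type below is invented for
illustration; this is not a proposal for the final API.

import java.util.Optional;

// Illustrative only: a pluggable interface for tier segment metadata. The
// framework would call these methods and would not care whether the plugin
// caches metadata in local files, keeps it in an embedded store such as
// RocksDB, or fetches it on demand from a remote key/value store.
public interface RemoteSegmentMetadataRetriever {

    // Minimal illustrative view of one archived segment.
    final class SegmentMetadata {
        public final long baseOffset;
        public final long endOffset;
        public final String remoteLocation; // opaque pointer into remote storage

        public SegmentMetadata(long baseOffset, long endOffset, String remoteLocation) {
            this.baseOffset = baseOffset;
            this.endOffset = endOffset;
            this.remoteLocation = remoteLocation;
        }
    }

    // Metadata of the archived segment containing the given offset, if any.
    Optional<SegmentMetadata> metadataForOffset(String topic, int partition, long offset);

    // Smallest offset available in remote storage for the partition; one
    // possible input for computing the effective log start offset (point 42).
    long earliestRemoteOffset(String topic, int partition);
}

With something along these lines, whether the metadata lives in local files,
RocksDB, or a remote store becomes an implementation detail of the plugin
rather than of the framework.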

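And here is a rough sketch for point 41, showing one way the framework could
support both the current pull-based listRemoteSegments() style and a
push-based style behind the same interface. Again, the names and signatures
are invented for illustration and this is not a concrete proposal.

import java.util.List;

// Illustrative only: supporting both pull-based and push-based discovery of
// remote segments behind a single interface.
public interface RemoteSegmentSource {

    // Callback used in the push model; a pull-only implementation simply
    // never invokes it.
    interface Listener {
        void onSegmentsAdded(String topic, int partition, List<String> segmentIds);
        void onSegmentsDeleted(String topic, int partition, List<String> segmentIds);
    }

    // Pull model: the framework periodically asks for the segments whose
    // offsets are at or beyond minOffset.
    List<String> listRemoteSegments(String topic, int partition, long minOffset);

    // Push model: the framework registers a listener and the plugin notifies
    // it whenever remote segments are added or deleted.
    void registerListener(Listener listener);
}

An implementation backed by a metadata store that can emit notifications
could then push updates to the broker instead of being polled. We can go
through the trade-offs of the two models when we meet.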