Performance typically varies across tiered storage backends. Tiered storage is only used when scanning historical data, so the dominant factor is sequential scan throughput. I'm not aware of any public articles on tiered storage performance at the moment. We usually recommend that people run the measurements themselves, so the results are first-hand and unbiased.
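If it helps with running your own measurement, here is a minimal sketch of a throughput timer. It is not part of Pulsar: `scan_throughput_mb_s` and its `clock` parameter are hypothetical names, and it assumes you can expose whatever you are reading (hot bookie only, or hot bookie plus offloaded data) as an iterable of byte chunks.

```python
import time
from typing import Callable, Iterable


def scan_throughput_mb_s(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Drain `chunks` sequentially and return throughput in MB/s (1 MB = 10**6 bytes).

    `chunks` is a hypothetical stand-in for the data source under test;
    this helper makes no Pulsar-specific calls.
    """
    start = clock()
    total_bytes = 0
    for chunk in chunks:
        # Only the payload size matters for a sequential scan measurement.
        total_bytes += len(chunk)
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return total_bytes / elapsed / 1e6
```

Running the same scan once against data that is still on the bookies and once against data that has been offloaded gives two numbers you can compare directly.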
Currently, the Spark and Flink integrations don't read directly from tiered storage; only Presto does. So if you want to measure performance, I recommend testing with Presto, which will give you a better sense of how tiered storage behaves.

- Sijie

On Thu, Apr 23, 2020 at 11:00 AM Qiu, Min-1 <min-1....@novartis.com> wrote:

> Hello
>
> I am very interested in Apache Pulsar but have not tried yet. I searched
> internet but seems there are nobody talked about the read performance on
> the data in the cold tiered storage together with the data in the hot
> bookie.
>
> Do you have any of the articles or data on the performance on reading the
> s3 data?
> Like compare on read only from hot bookie vs read data from hot bookie + s3
> Like compare to other framework like Spark, Flink etc.
>
>
> Looking forwards to hearing from you.
>
> Thanks
>
> Min