Thanks for the feedback Andrew, my responses are inline:

> There is no reason a system can't provide results in milliseconds after it is written (as we do in InfluxDB 3.0) even if data eventually lands in object storage. This is achieved, unsurprisingly, by having metadata and data stored closer to the compute in some combination of node-local and cluster-local caches. I believe this is also at a high level how Snowflake works.
Yes, I described this design in this section <https://engineeringideas.substack.com/i/147163081/data-warehouses-designed-for-olap-manage-their-table-storage-themselves>. My point is that *not* using these object-storage caching, indexing, and fresh-data layers is just wasteful, for no good reason. It benefits neither the DB vendor (who could otherwise do more compute on their side and charge more money for it) nor the user, who instead pays for more, less efficient compute done by the processing engine somewhere else. See also this section <https://engineeringideas.substack.com/i/148607428/vendor-competition-and-the-lowest-common-denominator-effect> of the latter article for more on this point. The only winners in this arrangement are cloud and processing compute vendors.

> Thus I disagree with the premise that Iceberg and other table formats are not the future of OLAP. Instead I see them as the foundational layer (mostly due to avoiding data gravity / vendor lock-in) on which a new ecosystem of OLAP tools will be developed.

You are describing the current arrangement, in the absence of the table transfer protocols that I proposed. I don't doubt that this arrangement currently exists and kind of works. Rather, I suggest that this arrangement creates persistent inefficiency, is inflexible, and stifles storage innovation.

- Persistent inefficiency: basically what has been said above, missing the usage of existing caching and indexes for no good reason, passing more data over the network instead of filtering it on the storage side when that makes sense, etc. (See the sketch below for an illustration of this difference.)
- Inflexible: Parquet + Iceberg are not built for ML/AI/RAG: see DeepLake's paper <https://arxiv.org/abs/2209.10785>. This is also the reason Lance exists, of course.
- Stifles storage innovation: it doesn't invite storage format innovation (Lance, Meta's Nimble, DeepLake's format, etc.) or storage architecture innovation, such as not using the standard S3 object storage interface as the least common denominator but, e.g., allowing for columnar data serving from the erasure-coded cold storage layer directly (this is what BigQuery managed storage and Meta's Tectonic do), or putting some columnar data logic (even if truncated) directly into the computers that were traditionally known as "SSD controllers". My understanding is that Vastdata <https://www.vastdata.com/platform/database> does something like this.

All these points are discussed at greater length in the articles that I linked above.

To your point about the "ecosystem of OLAP tools" sitting on top of Iceberg/S3-compatible object storage: this is a particular way to structure technology and business, as somewhat of an add-on to the currently dominant arrangements. There is value in that as long as the dominant arrangement persists. Even the vendors of these "add-on" OLAP architectures would benefit from the existence of table transfer protocols, as I noted above: they would serve more compute themselves, and spend fewer resources on two-way syncs between their internal catalogs and caches and the Iceberg metadata format. And, again, there are other vendors who don't use Parquet and/or the conventional object storage foundation (the aforementioned Lance, DeepLake, and Vastdata, also Druid, Pinot, and others) for whom converting/syncing into Parquet/Iceberg is clearly nonsensical. The status quo, in which table transfer protocols don't exist, is kind of exclusionary for them.
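To make the "persistent inefficiency" point a bit more concrete, here is a minimal sketch (Python, pyarrow) of the two read paths. The bucket path, store endpoint, table name, and JSON scan-spec format are all invented for illustration, and Arrow Flight here is only a stand-in for what a table transfer protocol could look like, not a claim about how any existing vendor implements this.

```python
import json
from datetime import datetime

import pyarrow.dataset as ds
import pyarrow.flight as flight

# Path A (status quo): the processing engine lists Parquet files via Iceberg
# metadata and scans them straight out of object storage. The store's own
# caches and secondary indexes are bypassed; filtering happens on the engine
# side, after most of the bytes have already crossed the network (modulo
# row-group pruning from Parquet statistics).
lake = ds.dataset("s3://analytics-bucket/events/data", format="parquet")  # hypothetical path
recent_a = lake.to_table(
    columns=["user_id", "ts", "value"],
    filter=ds.field("ts") >= datetime(2024, 6, 1),
)

# Path B (with a table transfer protocol): the engine sends the scan spec to
# the store, which can answer from node-local / cluster-local caches and its
# indexes, and streams back only the matching Arrow record batches. The spec
# format and endpoint below are made up for the sketch.
scan_spec = {
    "table": "events",
    "columns": ["user_id", "ts", "value"],
    "predicate": "ts >= '2024-06-01'",
}
client = flight.connect("grpc://olap-store.example.com:8815")  # hypothetical store
info = client.get_flight_info(
    flight.FlightDescriptor.for_command(json.dumps(scan_spec).encode())
)
recent_b = [client.do_get(ep.ticket).read_all() for ep in info.endpoints]
```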
> A more efficient / performant version of Arrow Flight (for table read / write) does sound interesting, but I think many vendors will likely implement a custom proprietary protocol for communicating between their own services if Flight isn't good enough.

I don't understand how the second part of this sentence contradicts the first one. I gestured in the posts at the idea that table transfer protocols could be used as inter-stage exchange formats for MPP or for processing engines like Ballista or Ray, but this is just a side note. The main purpose of the protocols is to be the external protocols between columnar/OLAP/search/ML stores and diverse processing engines pulling data from them and writing processing results back: "bring your own processing engine" from the perspective of the store vendor, "bring your own database" from the perspective of the processing engine vendor.
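For the "writing processing results back" direction, a similarly hedged sketch, again with Arrow Flight's do_put standing in for a dedicated table transfer protocol; the store endpoint, target table, and command payload are invented for illustration.

```python
import json

import pyarrow as pa
import pyarrow.flight as flight

# A processing engine has computed some result (a toy aggregation here) and
# hands it back to the store over the protocol, letting the store decide how
# to index, cache, and compact it, instead of the engine writing Parquet
# files plus Iceberg metadata into object storage itself.
result = pa.table({
    "user_id": pa.array([1, 2, 3], type=pa.int64()),
    "total": pa.array([10.5, 3.2, 7.7], type=pa.float64()),
})

client = flight.connect("grpc://olap-store.example.com:8815")  # hypothetical store
write_cmd = json.dumps({"table": "user_totals", "mode": "append"}).encode()
writer, _metadata_reader = client.do_put(
    flight.FlightDescriptor.for_command(write_cmd), result.schema
)
writer.write_table(result)
writer.close()
```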