Thanks for the feedback, Andrew; my responses are inline:

> There is no reason a system can't provide results in milliseconds after it
> is written (as we do in InfluxDB 3.0) even if data eventually lands in
> object storage. This is achieved, unsurprisingly, by having metadata and
> data stored closer to the compute in some combination of node-local and
> cluster-local caches. I believe this is also at a high level how Snowflake
> works


Yes, I described this design in this section
<https://engineeringideas.substack.com/i/147163081/data-warehouses-designed-for-olap-manage-their-table-storage-themselves>.
My point is that *not* using these object storage caching, indexing, and
fresh-data layers is just wasteful for no good reason: it serves neither the
DB vendor (who could have done more compute on their side and charged more
money for it) nor the user, who will instead pay for more, less efficient
compute done by the processing engine somewhere else. See also this section
<https://engineeringideas.substack.com/i/148607428/vendor-competition-and-the-lowest-common-denominator-effect>
of the latter article for more on this point. The only winners in this
arrangement are cloud and processing compute vendors.

> Thus I disagree with the premise that Iceberg and other table formats are
> not the future of OLAP. Instead I see them as the foundational layer
> (mostly due to avoiding data gravity / vendor lockin) on which a new
> ecosystem of OLAP tools will be developed

You are describing the current arrangement, in the absence of the table
transfer protocols that I proposed. I don't doubt that this arrangement
currently exists and kind of works. Rather, I suggest that this arrangement
creates persistent inefficiency, is inflexible, and stifles storage
innovation.
 - Persistent inefficiency: basically what has been said above: forgoing the
use of existing caches and indexes for no good reason, passing more data
over the network instead of filtering it on the storage side when it makes
sense, etc.
 - Inflexible: Parquet + Iceberg are not built for ML/AI/RAG: see DeepLake's
paper <https://arxiv.org/abs/2209.10785>. This is also the reason Lance
exists, of course.
 - Stifles storage innovation: the status quo doesn't invite storage format
innovation (Lance, Meta's Nimble, DeepLake's format, etc.) or storage
architecture innovation, such as not treating the standard S3 object storage
interface as the least common denominator, but e.g. serving columnar data
directly from the erasure-coded cold storage layer (this is what BigQuery
managed storage and Meta's Tectonic do), or putting some columnar-data logic
(even if limited) directly into devices traditionally known as "SSD
controllers". My understanding is that Vastdata
<https://www.vastdata.com/platform/database> does something like this.

All these points are discussed at greater length in the articles that I
linked above.

To your point about the "ecosystem of OLAP tools" sitting on top of
Iceberg/S3-compatible object storage: this is a particular way to structure
technology and business as somewhat of an add-on to the currently dominant
arrangements. There is value in that as long as the dominant arrangement
persists. Even the vendors of these "add-on" OLAP architectures would
benefit from the existence of table transfer protocols, as I noted above
(they would serve more compute themselves, and spend fewer resources on
two-way syncs between their internal catalogs and caches and the Iceberg
metadata format).

And, again, there are other vendors who don't use Parquet and/or the
conventional object storage foundation (such as the aforementioned Lance,
DeepLake, and Vastdata, plus Druid, Pinot, and others) for whom
converting/syncing into Parquet/Iceberg is clearly nonsensical. The status
quo (table transfer protocols don't exist) is kind of exclusionary for them.


> A more efficient / performant version of Arrow Flight (for table read /
> write) does sound interesting, but I think many vendors will likely
> implement a custom proprietary protocol for communicating between their own
> services if Flight isn't good enough.
>

I don't understand how the second part of this sentence contradicts the
first one. In the posts, I gestured at the idea that table transfer protocols
could be used as inter-stage exchange formats for MPP or distributed
processing engines like Ballista or Ray. But this is just a side note. The
main purpose of the
protocols is to be the external protocols between columnar/OLAP/search/ML
stores and diverse processing engines pulling data from them and writing
processing results back: "bring your own processing engine" from the
perspective of the store vendor, "bring your own database" from the
perspective of the processing engine vendor.
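
To make this concrete, here is a rough sketch (mine, not from the posts) of
what such a read/write round trip could look like over today's Arrow Flight,
in Python with pyarrow. The endpoint, the JSON command format with a
pushed-down filter, and the "results/my_table" path are all hypothetical,
purely for illustration:

    import json
    import pyarrow.flight as flight

    # Connect to a hypothetical store that speaks Arrow Flight.
    client = flight.connect("grpc://store.example.com:8815")

    # Read: ask the store to filter on its side (using its own indexes and
    # caches) and stream only the matching rows back as Arrow batches.
    command = json.dumps({"table": "events", "filter": "ts >= '2024-01-01'"})
    info = client.get_flight_info(
        flight.FlightDescriptor.for_command(command.encode()))
    reader = client.do_get(info.endpoints[0].ticket)
    table = reader.read_all()

    # The engine processes the data however it wants, e.g. a group-by.
    results = table.group_by("user_id").aggregate([("value", "sum")])

    # Write: push the processing results back into the store.
    descriptor = flight.FlightDescriptor.for_path("results", "my_table")
    writer, _ = client.do_put(descriptor, results.schema)
    writer.write_table(results)
    writer.close()

A dedicated table transfer protocol would, presumably, standardize exactly
the parts that are ad hoc in this sketch: how the filter and the table
identity are expressed, and how writes are committed back to the store's
catalog.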
