Hello,

This is a follow-up from
https://lists.apache.org/thread.html/rd15bf1db711b1a31f39d4b98776f29753b544fa3a496111d3460e11e%40%3Cdev.iceberg.apache.org%3E

*If a file system does not support atomic renames, then you should use a
metastore to track tables. You can use Hive, Nessie, or Glue. We also are
working on a JDBC catalog.*
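
For context, I assume the catalog setup being suggested looks roughly like
the sketch below (the catalog name, metastore URI, and bucket are
placeholders I made up):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("iceberg-on-gcs")
      // Hive-backed Iceberg catalog, so commits go through the metastore
      // instead of relying on atomic renames in GCS.
      .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog")
      .config("spark.sql.catalog.my_catalog.type", "hive")
      .config("spark.sql.catalog.my_catalog.uri", "thrift://hive-metastore:9083")
      .config("spark.sql.catalog.my_catalog.warehouse", "gs://my-bucket/warehouse")
      .getOrCreate()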

1. What would go wrong if I write directly to GCS from Spark via Iceberg?
Would we end up with data files in GCS that are missing from the Iceberg
metadata? Or would we just lose some snapshots during multiple parallel
transactions?

*Iceberg's API can tell you what files were added or removed in any given
snapshot. You can also use time travel to query the table at a given
snapshot and use SQL to find the row-level changes. We don't currently
support reading just the changes in a snapshot because there may be deletes
as well as inserts.*
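
If I understand the time-travel suggestion correctly, the row-level changes
could be derived with something like this sketch (the table name and
snapshot ids are placeholders, and I am assuming the "snapshot-id" read
option):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()

    val olderSnapshotId = 123L   // placeholder: an earlier snapshot id from the table's history
    val newerSnapshotId = 456L   // placeholder: a later snapshot id

    // Time travel: read the table as of each snapshot.
    val older = spark.read.format("iceberg")
      .option("snapshot-id", olderSnapshotId).load("db.events")
    val newer = spark.read.format("iceberg")
      .option("snapshot-id", newerSnapshotId).load("db.events")

    // Row-level changes between the two snapshots, derived with plain DataFrame ops.
    val inserted = newer.except(older)   // rows that appeared between the snapshots
    val deleted  = older.except(newer)   // rows that disappeared between the snapshots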

2. I would like to further clarify whether Iceberg supports an incremental
query like Hudi's https://hudi.apache.org/docs/querying_data.html#spark-incr-query.
https://medium.com/adobetech/iceberg-at-adobe-88cf1950e866 talks about
incremental reads that query data between snapshots, but I am confused by
the response above and by the earlier thread at
http://mail-archives.apache.org/mod_mbox/iceberg-dev/201907.mbox/%3ca237bb81-f4da-45d9-9827-36203624f...@tencent.com%3E
where you said that incremental queries are not supported natively. If the
latter approach is the only way to derive incremental data, does Iceberg use
predicate pushdown to get the incremental data based on the file delta,
since Iceberg's metadata contains the file info for both snapshots?
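
To make the question concrete, this is roughly the file-delta derivation I
have in mind (a hypothetical sketch only; the snapshot and data-file method
names are my reading of the Java API and may not be exact):

    import scala.collection.JavaConverters._
    import org.apache.iceberg.{Snapshot, Table}

    // Data files appended by a single snapshot.
    def filesAddedBy(snapshot: Snapshot): Seq[String] =
      snapshot.addedFiles().asScala.map(_.path().toString).toSeq

    // Walk the parent chain from the newer snapshot back to the older one and
    // collect every file appended in between; reading just those files would
    // give the incremental data.
    def filesAddedBetween(table: Table, fromId: Long, toId: Long): Seq[String] = {
      val added = scala.collection.mutable.ArrayBuffer[String]()
      var current = table.snapshot(toId)
      while (current != null && current.snapshotId() != fromId) {
        added ++= filesAddedBy(current)
        current =
          if (current.parentId() == null) null else table.snapshot(current.parentId())
      }
      added.toSeq
    }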

Thanks,
Kishor.
