Hi Weston,
#1 is a problem and we should remove the auto-generation. The issue is that
auto-generating an ID can result in a collision between Iceberg's field IDs
and the generated IDs. Since Iceberg uses the ID to identify a field, that
would result in unrelated data being mistaken for a column's
OK, this matches my understanding of how field_id is used as well.
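To make the collision concrete, here is a minimal sketch in plain Python. It is not Iceberg or Arrow code: the column names, the helper functions, and the position-based ID assignment are all assumptions made for illustration.

```python
# Hypothetical illustration only; none of these names come from Iceberg or Arrow.

# Table schema as the table format tracks it: field ID -> column name.
table_schema = {1: "id", 2: "email", 3: "signup_date"}

# Columns of a data file written WITHOUT explicit field IDs, in file order.
file_columns = ["signup_date", "id", "email"]


def auto_generate_ids(columns, start=1):
    """Assign IDs by position, as an auto-generating writer might (assumed)."""
    return {start + i: name for i, name in enumerate(columns)}


def resolve_by_id(schema, file_ids):
    """Map each table column to the file column that carries the same ID."""
    return {name: file_ids.get(field_id) for field_id, name in schema.items()}


file_ids = auto_generate_ids(file_columns)
mapping = resolve_by_id(table_schema, file_ids)

# Every entry here is a table column that would be read from the wrong file column.
mismatched = {table_col: file_col
              for table_col, file_col in mapping.items()
              if table_col != file_col}
print(mismatched)
```

In this sketch the generated IDs overlap with the table's IDs, so a reader that trusts the IDs silently picks up unrelated data for each column.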
I believe #1 will not be an issue because I think Iceberg always sets
the field_id property when writing data? If that is the case then
Iceberg would never have noticed the old behavior. In other words,
Iceberg never relied on
You can read
https://iceberg.apache.org/custom-catalog/#custom-file-io-implementation
for more details of loading your custom FileIO, and see
http://iceberg.apache.org/aws/#s3-fileio as an example.
-Jack
On Tue, May 18, 2021 at 10:16 AM Vivekanand Vellanki wrote:
Is it possible to make the FileIO implementation extensible per URI scheme?
For example, for the hdfs:// scheme, can I ensure that Iceberg uses my custom
implementation of FileIO at run time?
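As a rough sketch of what I mean by dispatching on the scheme (plain Python, not Iceberg's FileIO API; the class names, the `describe` method, and the example paths are all hypothetical):

```python
from urllib.parse import urlparse


class DefaultIO:
    """Stand-in for whatever implementation is used by default (hypothetical)."""

    def describe(self, path):
        return f"default:{path}"


class MyHdfsIO:
    """Stand-in for a custom implementation for hdfs:// paths (hypothetical)."""

    def describe(self, path):
        return f"custom-hdfs:{path}"


class SchemeDispatchingIO:
    """Routes each path to an implementation chosen by its URI scheme."""

    def __init__(self, default):
        self._default = default
        self._by_scheme = {}

    def register(self, scheme, io):
        # Schemes are case-insensitive, so normalize before storing.
        self._by_scheme[scheme.lower()] = io

    def io_for(self, path):
        scheme = urlparse(path).scheme.lower()
        # Fall back to the default when no implementation is registered.
        return self._by_scheme.get(scheme, self._default)

    def describe(self, path):
        return self.io_for(path).describe(path)


io = SchemeDispatchingIO(default=DefaultIO())
io.register("hdfs", MyHdfsIO())
print(io.describe("hdfs://namenode/warehouse/t/data/f.parquet"))
print(io.describe("s3://bucket/warehouse/t/data/f.parquet"))
```

Here only hdfs:// paths reach the custom class; everything else falls through to the default.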
On Tue, May 18, 2021 at 9:45 PM Daniel Weeks wrote:
Hey Vivek,
The file_path per spec is technically just a string, but the representation
is expected to be a URI.
How this URI is interpreted is really up to the FileIO implementation. So
for example, the most common FileIO implementation is probably
HadoopFileIO, which is going to use whatever fi
Hey Vivek,
I think as you can see throughout this discussion there are a number of
issues with modifying the data files outside of Iceberg APIs. To maintain
data integrity, it's advised to only operate on the data through Iceberg.
In many ways this is similar to trying to change the history of a
Hey Weston,
From Iceberg's perspective, the field_id is necessary to track the
evolution of the schema over time. It's best to think of the problem from
a dataset perspective as opposed to a file perspective.
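A small sketch of the dataset-level view, in plain Python rather than Iceberg code (the schemas, column names, and the `project` helper are assumptions for illustration): an older file is read through the current table schema by matching field IDs, not names.

```python
# Hypothetical illustration of ID-based column resolution; not Iceberg code.

# Schema stored with an older data file: field ID -> column name at write time.
old_file_schema = {1: "id", 2: "name", 3: "zip"}
old_file_row = {"id": 7, "name": "Ada", "zip": "94105"}

# Current table schema after three changes:
#   "name" renamed to "full_name" (ID 2 is kept),
#   "zip" dropped (ID 3 is retired),
#   "country" added (new ID 4, never reused from an old column).
current_schema = {1: "id", 2: "full_name", 4: "country"}


def project(row, file_schema, table_schema):
    """Read a row from an old file through the current schema, matching by ID."""
    out = {}
    for field_id, column in table_schema.items():
        file_column = file_schema.get(field_id)
        # A field ID missing from the file means the column did not exist yet.
        out[column] = row.get(file_column) if file_column is not None else None
    return out


result = project(old_file_row, old_file_schema, current_schema)
print(result)
```

The renamed column still resolves to the old data, the dropped column is ignored, and the new column reads as null, all without rewriting the old file.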
Iceberg maintains the mapping of the schema with respect to the field ids
because