Hi Ryan,

The intended use of the *FlinkFileIO* is to set it through the *Catalog*, like this:

    Map<String, String> props = new HashMap<>(3);
    props.put(CatalogProperties.WAREHOUSE_LOCATION, warehouse);
    props.put(CatalogProperties.URI, uri);
    props.put(CatalogProperties.FILE_IO_IMPL, FlinkFileIO.class.getName());
    CatalogLoader.hive("hive", hiveConf, props);

If I understand you correctly, this is exactly what you are suggesting.
I absolutely agree that setting this through the TableProperties is a no-go.

This should be used for Flink applications where the storage layer behind the Iceberg table is also used for other purposes, like checkpointing.

Thanks,
Peter

Ryan Blue <b...@tabular.io> wrote (Mon, Apr 22, 2024, 23:58):

> I think the idea of introducing a Flink-specific FileIO isn't a good idea.
> The intent of the Java API is for a table to use the FileIO instance that
> is supplied by the table object. That puts the responsibility for supplying
> a correctly configured FileIO on the catalog, which is the right place to
> inject most customization.
>
> Having a Flink FileIO doesn't fit with that model. You could expose a
> generic FileIO from the catalog if you wanted, which would make a lot more
> sense. But FileIO is not the same thing as a FileSystem implementation. It
> should be used in places where it makes sense to use FileIO and should not
> be given the same lifecycle or responsibilities as a FileSystem.
>
> Ryan
>
> On Mon, Apr 22, 2024 at 11:00 AM Ferenc Csaky <ferenc.cs...@pm.me.invalid>
> wrote:
>
>> Hi Peter,
>>
>> I am coming from the Flink side, but at Cloudera we use
>> Iceberg as well.
>>
>> Utilizing the Flink delegation token framework via the Iceberg Java API
>> would be great. I think that simplifying the configuration for
>> Flink-related cases also has value on its own, and could help to
>> eliminate some confusion regarding when/where to set properties.
>>
>> Regarding the naming, maybe it would be worth being more explicit and
>> calling the class `FlinkFSFileIO`?
>> Just to emphasize that "Flink" in this context refers to the FS
>> abstraction layer, not to the processing engine. WDYT?
>>
>> Looking forward to this change!
>>
>> Best,
>> Ferenc
>>
>> On 2024/04/19 17:08:23 Péter Váry wrote:
>> > Hi Iceberg Team,
>> >
>> > Flink has its own FileSystem implementation. See:
>> > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/overview/
>> >
>> > This FileSystem already has several implementations:
>> >
>> > - Hadoop
>> > - Azure
>> > - S3
>> > - Google Cloud Storage
>> > - ...
>> >
>> > As a general rule in Flink, one should use this FileSystem to consume and
>> > persistently store data.
>> > If these FileSystems are configured, then Flink makes sure that the
>> > configurations are consistent and available for the JobManager and
>> > TaskManagers (JM/TM).
>> > Also, as an added benefit, delegation tokens are handled and distributed
>> > for these FileSystems automatically. See:
>> > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/security/security-delegation-token/
>> >
>> > In-house, some of our new users are struggling with parameterizing
>> > HadoopFileIO and S3FileIO for Iceberg. They find it hard to accept that
>> > they have to provide different configurations for the checkpointing
>> > and for the Iceberg table storage (even if both are stored in the same
>> > bucket, or on the same HDFS cluster).
>> >
>> > I have created a PR which provides a FileIO implementation that uses
>> > the Flink FileSystem. Very imaginatively, I have named it FlinkFileIO. See:
>> > https://github.com/apache/iceberg/pull/10151
>> >
>> > This would allow users to configure the FileSystem only once, and use
>> > this FileSystem to access Iceberg tables. Also, if for whatever reason the
>> > global nature of the Flink file system config is limiting, users could
>> > still revert to using the other FileIO implementations.
>> >
>> > What do you think? Would this be a useful addition to the Iceberg-Flink
>> > integration?
>> >
>> > Thanks,
>> > Peter
>
> --
> Ryan Blue
> Tabular
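Below is a minimal, self-contained sketch of the catalog-supplied FileIO model discussed in this thread. It shows a catalog instantiating the FileIO class named in its properties and handing that same instance to every table it loads. The sketch is not Iceberg's API and not the FlinkFileIO from the PR. All names are hypothetical (the Sketch* types, CatalogFileIODemo, and the demo warehouse path). Only the `io-impl` key matches Iceberg's `CatalogProperties.FILE_IO_IMPL`.

```java
import java.util.HashMap;
import java.util.Map;

// Demo entry point (hypothetical).
class CatalogFileIODemo {
  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put("warehouse", "file:///tmp/warehouse"); // hypothetical location
    props.put(SketchCatalog.FILE_IO_IMPL, SketchLocalFileIO.class.getName());
    SketchTable table = new SketchCatalog(props).loadTable("db.events");
    System.out.println(table.io().getClass().getSimpleName() + " " + table.io().warehouse());
  }
}

// Hypothetical, stripped-down stand-in for a FileIO contract
// (no-arg constructor + initialize(properties)); not Iceberg's real interface.
interface SketchFileIO {
  void initialize(Map<String, String> properties);

  String warehouse();
}

// Hypothetical implementation. A FileIO backed by the Flink FileSystem
// would be selected the same way: by class name, in the catalog properties.
class SketchLocalFileIO implements SketchFileIO {
  private String warehouse = "";

  @Override
  public void initialize(Map<String, String> properties) {
    // Everything this IO needs comes from the catalog's properties.
    this.warehouse = properties.getOrDefault("warehouse", "");
  }

  @Override
  public String warehouse() {
    return warehouse;
  }
}

// Hypothetical catalog: the single place that decides which FileIO tables get.
class SketchCatalog {
  // Same key as Iceberg's CatalogProperties.FILE_IO_IMPL.
  static final String FILE_IO_IMPL = "io-impl";

  private final SketchFileIO io;

  SketchCatalog(Map<String, String> properties) {
    this.io = loadFileIO(properties.get(FILE_IO_IMPL), properties);
  }

  // Instantiate the configured class reflectively, then initialize it with
  // the catalog properties (loosely modeled on how io-impl is handled).
  static SketchFileIO loadFileIO(String impl, Map<String, String> properties) {
    if (impl == null) {
      throw new IllegalArgumentException("Missing catalog property: " + FILE_IO_IMPL);
    }
    try {
      SketchFileIO io =
          (SketchFileIO) Class.forName(impl).getDeclaredConstructor().newInstance();
      io.initialize(properties);
      return io;
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot load FileIO implementation: " + impl, e);
    }
  }

  // Every table loaded from this catalog receives the catalog-supplied FileIO.
  SketchTable loadTable(String name) {
    return new SketchTable(name, io);
  }
}

// Hypothetical table: it never builds its own FileIO; it uses the one it is given.
class SketchTable {
  private final String name;
  private final SketchFileIO io;

  SketchTable(String name, SketchFileIO io) {
    this.name = name;
    this.io = io;
  }

  String name() {
    return name;
  }

  SketchFileIO io() {
    return io;
  }
}
```

In this model, switching the storage backend means changing only the `io-impl` value in the catalog properties. No table property is involved, which matches the point made on both sides of the thread about keeping this out of TableProperties.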