Hey John, I might be able to help answer some of your questions and provide some context around how you might want to go forward.
So, one fundamental aspect of Iceberg is that it relies on only a few operations (those defined by the FileIO interface). This makes much of the functionality and complexity of full file system implementations unnecessary. You should not need features like S3Guard, or the additional S3 operations these implementations rely on to achieve file system contract behavior. Consistency issues should also not be a problem: Iceberg does not overwrite or list files, and S3 guarantees read-after-write consistency for the initial write of a new object.

At Netflix, we use a custom FileSystem implementation (somewhat like S3A), but with much of the contract behavior that drives additional operations against S3 disabled. However, we are transitioning to a more native implementation, S3FileIO, which you'll see as part of the ongoing work in Iceberg.

Per your specific questions:

1) The S3FileIO implementation is very new, though internally we have something very similar. There are still features we are working to add (progressive multipart upload for large files is likely the most important).
2) You can use S3AFileSystem with the HadoopFileIO implementation, though you may still see the same additional calls being made (I don't know whether these can be disabled).
3) The PrestoS3FileSystem is tailored to Presto's use and is likely not as complete as S3A, but since it implements the Hadoop FileSystem API, it would likely work for what HadoopFileIO exercises (as would the EMRFileSystem).
4) I would probably discourage you from writing your own file system, as S3FileIO will likely be a more optimized implementation for what Iceberg needs. If you have time to help contribute to S3FileIO, that is the path I would recommend.
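To make the point about Iceberg's small storage footprint concrete, here is a minimal, self-contained Java sketch of the kind of contract it needs: create a new file at a unique path, read it back by path, and delete it, with no listing, renaming, or overwriting. The `SimpleFileIO` interface and `InMemoryFileIO` class are hypothetical simplifications for illustration only. The real `org.apache.iceberg.io.FileIO` interface works in terms of `InputFile`/`OutputFile` objects (via `newInputFile`, `newOutputFile`, and `deleteFile`) rather than raw streams, and here an in-memory map stands in for S3.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FileIOContractSketch {

  // Hypothetical, simplified mirror of the three operations Iceberg's FileIO exposes.
  // Note what is absent: no list, no rename, no mkdirs, no exists checks.
  public interface SimpleFileIO {
    InputStream open(String path) throws IOException;   // read an existing file by path
    OutputStream create(String path) throws IOException; // write a brand-new file
    void delete(String path);                            // remove a file
  }

  // In-memory stand-in for an object store such as S3 (an assumption for this sketch).
  public static class InMemoryFileIO implements SimpleFileIO {
    private final Map<String, byte[]> objects = new ConcurrentHashMap<>();

    @Override
    public InputStream open(String path) throws IOException {
      byte[] data = objects.get(path);
      if (data == null) {
        throw new IOException("Not found: " + path);
      }
      return new ByteArrayInputStream(data);
    }

    @Override
    public OutputStream create(String path) throws IOException {
      if (objects.containsKey(path)) {
        // Sketch-only rule: files are written once to unique paths, never overwritten.
        throw new IOException("Already exists: " + path);
      }
      return new ByteArrayOutputStream() {
        @Override
        public void close() throws IOException {
          super.close();
          // Like an S3 PUT, the object becomes visible only once the write completes.
          objects.put(path, toByteArray());
        }
      };
    }

    @Override
    public void delete(String path) {
      objects.remove(path);
    }
  }

  public static void main(String[] args) throws IOException {
    SimpleFileIO io = new InMemoryFileIO();
    // Hypothetical data file path, used only for illustration.
    String path = "s3://bucket/db/table/data/00000-0-example.parquet";
    try (OutputStream out = io.create(path)) {
      out.write("hello".getBytes(StandardCharsets.UTF_8));
    }
    try (InputStream in = io.open(path)) {
      System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
    }
    io.delete(path);
  }
}
```

Rejecting a second `create` for the same path is a choice made in this sketch to mirror the point that Iceberg does not overwrite; a real object store would simply replace the object.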
As for configuration, I would say a lot of it comes down to how you configure the AWS S3 client that you provide to the S3FileIO implementation, but many of the defaults are reasonable (you might want to tweak a few, like max connections and maybe the retry policy). The recently committed work to dynamically load your FileIO should make it relatively easy to test out, and we'd love to have extra eyes and feedback on it.

Let me know if that helps,
-Dan

On Wed, Nov 11, 2020 at 1:45 PM John Clara <john.anthony.cl...@gmail.com> wrote:

> Hello all,
>
> Thank you all for creating/continuing this great project! I am just
> starting to get comfortable with the fundamentals and I'm thinking that my
> team has been using Iceberg the wrong way at the FileIO level.
>
> I was wondering if people would be willing to share how they set up their
> FileIO/FileSystem with S3 and any customizations they had to add.
>
> (Preferably from smaller teams. My team is small and cannot
> realistically customize everything. If there's an up to date thread
> discussing this that I missed, please link me that instead.)
>
> ***** My team's specific problems/setup which you can ignore ***
>
> My team has been using Hadoop FileIO with the S3AFileSystem. Jars are
> provided by AWS EMR 5.23 which is on Hadoop 2.8.5. We use DynamoDB for
> atomic renames by implementing Iceberg's provided interfaces. We read/write
> from either Spark in EMR or on-prem JVMs in docker containers (managed by
> k8s). Both use s3a, but the EMR clusters have HDFS (backed by core nodes)
> for the s3a buffered writes while the on-prem containers use the docker
> container's default file system which uses an overlay2 storage driver (that
> I know nothing about).
>
> Hadoop 2.8.5's S3AFileSystem does a bunch of unnecessary get and list
> requests which is well known in the community (but not to my team
> unfortunately).
> There are also GET PUT GET inconsistency issues with S3 that
> have been talked about, but I don't yet understand how they arise in the
> 2.8.5 S3AFileSystem (https://github.com/apache/iceberg/issues/1398).
>
> *** End of specific ***
>
>
> The options I'm seeing are:
>
> 1. Using Iceberg's new S3 FileIO. Is anyone using this in prod?
>
> This still seems very new unless it is actually based on Netflix's
> prod implementation that they're releasing to the community? (I'm wondering
> if it's safe to start moving onto it in prod in the near term. If Netflix
> is using it (or rolling it out) that would be more than enough for my team.)
>
> 2. Using a newer hadoop version and use the S3AFileSystem. Any
> recommendations on a version and are you also using S3Guard?
>
> From a quick look, most gains compared to older versions seem to be
> from S3Guard. Are there substantial gains without it? (My team doesn't have
> experience with S3Guard and Iceberg seems to not need it outside of atomic
> renames?)
>
> 3. Using an alternative hadoop file system. Any recommendations?
>
> In the recent Iceberg S3 FileIO, the License states it was based off
> the Presto FileSystem. Has anyone used this file system as is with Iceberg?
> (https://github.com/apache/iceberg/blob/master/LICENSE#L251)
>
> 4. Roll our own hadoop file system. Anyone have stories/blogs about
> pitfalls or difficulties?
>
> rdblue hints that Netflix has already done this:
> https://github.com/apache/iceberg/issues/1398#issuecomment-682837392 .
> (My team probably doesn't have the capacity for this)
>
>
> Places where I tried looking for this info:
>
> - https://github.com/apache/iceberg/issues/761 (issue for getting
> started guide)
> - https://iceberg.apache.org/spec/#file-system-operations
>
> Thanks everyone,
>
> John Clara
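On the dynamically loaded FileIO mentioned in the reply above: the general idea is that a catalog reads an implementation class name from its properties, instantiates that class through a no-arg constructor, and hands it the remaining properties (which is where client settings such as max connections could be passed in). Below is a minimal, self-contained Java sketch of that pattern using plain reflection. The `PluggableIO` interface, the `ExampleS3IO` class, and the `example.max-connections` property are hypothetical, and the `io-impl` key is an assumption about the catalog property name; check the Iceberg source for the exact property and method names.

```java
import java.util.HashMap;
import java.util.Map;

public class DynamicFileIOLoading {

  // Hypothetical minimal FileIO-like interface; the real org.apache.iceberg.io.FileIO differs.
  public interface PluggableIO {
    // Receives catalog properties after construction (an assumed initialization hook).
    void initialize(Map<String, String> properties);

    String describe();
  }

  // Hypothetical implementation that, in practice, would build and wrap an S3 client.
  public static class ExampleS3IO implements PluggableIO {
    private int maxConnections = 50; // hypothetical default, for illustration only

    public ExampleS3IO() {} // a public no-arg constructor is required for reflective loading

    @Override
    public void initialize(Map<String, String> properties) {
      // Hypothetical property name, used only in this sketch.
      String value = properties.get("example.max-connections");
      if (value != null) {
        maxConnections = Integer.parseInt(value);
      }
    }

    @Override
    public String describe() {
      return "ExampleS3IO(maxConnections=" + maxConnections + ")";
    }
  }

  // Load an implementation by class name, then pass it the catalog properties.
  public static PluggableIO loadIO(String implClassName, Map<String, String> properties) {
    try {
      Class<?> clazz = Class.forName(implClassName);
      if (!PluggableIO.class.isAssignableFrom(clazz)) {
        throw new IllegalArgumentException(implClassName + " does not implement PluggableIO");
      }
      PluggableIO io = (PluggableIO) clazz.getDeclaredConstructor().newInstance();
      io.initialize(properties);
      return io;
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot load FileIO implementation: " + implClassName, e);
    }
  }

  public static void main(String[] args) {
    Map<String, String> catalogProps = new HashMap<>();
    catalogProps.put("io-impl", ExampleS3IO.class.getName()); // assumed property key
    catalogProps.put("example.max-connections", "200");

    PluggableIO io = loadIO(catalogProps.get("io-impl"), catalogProps);
    System.out.println(io.describe());
  }
}
```

Running `main` prints `ExampleS3IO(maxConnections=200)`. Swapping implementations (for example, between a Hadoop-backed FileIO and S3FileIO) then becomes a matter of changing one property rather than code, which is what makes this approach convenient for testing.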