Saisai,

Iceberg's FileIO interface doesn't require guarantees as strict as a
Hadoop-compatible FileSystem. For S3, that allows us to avoid negative
caching that can cause problems when reading a table that has just been
updated. (Specifically, S3A performs a HEAD request to check whether a path
is a directory even for overwrites.)

It's up to users whether they want to use S3FileIO or S3A with
HadoopFileIO, or even HadoopFileIO with a custom FileSystem. We are working
to make it easy to switch between them, but there are notable problems like
ORC not using the FileIO interface yet. I would recommend using S3FileIO to
avoid negative caching, but S3A with S3Guard may be another way to avoid
the problem. I think the default should be to use S3FileIO if it is in the
classpath, and fall back to a Hadoop FileSystem if it isn't.
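With the recently added dynamic FileIO loading, switching is a catalog property. A sketch (the catalog name "demo" and the warehouse bucket are made up; the `io-impl` property is the dynamic-loading mechanism discussed later in this thread):

```properties
# Hypothetical Spark catalog named "demo"; bucket/path are placeholders.
spark.sql.catalog.demo=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.demo.type=hadoop
spark.sql.catalog.demo.warehouse=s3://my-bucket/warehouse
# Load S3FileIO explicitly instead of falling back to a Hadoop FileSystem:
spark.sql.catalog.demo.io-impl=org.apache.iceberg.aws.s3.S3FileIO
```

Dropping the `io-impl` line gives the HadoopFileIO fallback behavior.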

For cloud providers, I think the question is whether there is a need for
the relaxed guarantees. If there is no need, then using HadoopFileIO and a
FileSystem works great and is probably the easiest way to maintain an
implementation for the object store.
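To give a sense of how small that surface is, here is a purely illustrative in-memory sketch of the three operations the contract needs (read, write, delete by full path). This is not Iceberg's actual API, which is Java (`org.apache.iceberg.io.FileIO` with `InputFile`/`OutputFile` that also expose lengths and seekable streams); the point is that there is no listing, no rename, and no directory semantics to implement:

```python
import io


class InMemoryFileIO:
    """Sketch of a FileIO-style contract: open for read, open for write,
    delete. No list, no rename, no directories -- which is why an object
    store can satisfy it without S3Guard-style consistency layers."""

    def __init__(self):
        self._objects = {}  # path -> bytes, standing in for the object store

    def new_input_file(self, path):
        # Read an existing object; missing objects are an error.
        if path not in self._objects:
            raise FileNotFoundError(path)
        return io.BytesIO(self._objects[path])

    def new_output_file(self, path):
        # Write a new object; the bytes become visible atomically on close,
        # like an S3 PUT.
        parent = self

        class _Out(io.BytesIO):
            def close(inner):
                parent._objects[path] = inner.getvalue()
                super(_Out, inner).close()

        return _Out()

    def delete_file(self, path):
        self._objects.pop(path, None)
```

A real object-store implementation replaces the dict with GET/PUT/DELETE calls; nothing here requires the file system contract behavior that drives S3A's extra requests.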

On Thu, Nov 12, 2020 at 7:31 PM Saisai Shao <sai.sai.s...@gmail.com> wrote:

> Hi all,
>
> Sorry to chime in; I also have the same concern about using Iceberg with
> object storage.
>
> One of my concerns with S3FileIO is getting tied too much to a single
>> cloud provider. I'm wondering if an ObjectStoreFileIO would be helpful
>> so that S3FileIO and (a future) GCSFileIO could share logic? I haven't
>> looked deep enough into the S3FileIO to know how much logic is not s3
>> specific. Maybe the FileIO interface is enough.
>>
>
> Now that we have an S3-specific FileIO implementation, is it recommended to
> use that one instead of an HCFS implementation like s3a? Also, each public
> cloud provider has its own HCFS implementation for its own object storage.
> Are we going to suggest creating a specific FileIO implementation or using
> the existing HCFS implementation?
>
> Thanks
> Saisai
>
>
> Daniel Weeks <dwe...@netflix.com.invalid> wrote on Fri, Nov 13, 2020 at 1:09 AM:
>
>> Hey John, about the concerns around cloud provider dependency, I feel
>> like the FileIO interface is actually the right level of abstraction
>> already.
>>
>> That interface basically requires "open for read" and "open for write",
>> where the implementation will diverge across different platforms.
>>
>> I guess you could think of it as S3FileIO is to FileIO what S3AFileSystem
>> is to FileSystem (in Hadoop).  You can have many different implementations
>> that coexist.
>>
>> In fact, recent changes to the Catalog allow for very flexible management
>> of FileIO, and you could even have files within a table split across
>> multiple cloud vendors.
>>
>> As to the consistency questions, the list operation can be inconsistent
>> (e.g. if a new file is created and the implementation relies on
>> list-then-read, it may not see newly created objects). Iceberg does not
>> list, so that should not be an issue.
>>
>> The stated read-after-write consistency is limited and does not include:
>>  - Read after overwrite
>>  - Read after delete
>>  - Read after negative cache (e.g. a GET or HEAD that occurred before the
>> object was created).
>>
>> Some of those inconsistencies have caused problems in certain cases when
>> it comes to committing data (the negative cache being the main culprit).
>>
>> -Dan
>>
>>
>> On Wed, Nov 11, 2020 at 6:49 PM John Clara <john.anthony.cl...@gmail.com>
>> wrote:
>>
>>> Update: I think I'm wrong about the listing part. I think it will only
>>> do the HEAD request. Also it seems like the consistency issue is
>>> probably not something my team would encounter with our current jobs.
>>>
>>> On 2020/11/12 02:17:10, John Clara <j...@gmail.com> wrote:
>>>  > (Not sure if this is actually replying or just starting a new thread)
>>>  >
>>>  > Hi Daniel,
>>>  >
>>>  > Thanks for the response! It's very helpful and answers a lot of my
>>>  > questions.
>>>  >
>>>  > A couple of follow-ups:
>>>  >
>>>  > One of my concerns with S3FileIO is getting tied too much to a single
>>>  > cloud provider. I'm wondering if an ObjectStoreFileIO would be helpful
>>>  > so that S3FileIO and (a future) GCSFileIO could share logic? I haven't
>>>  > looked deep enough into the S3FileIO to know how much logic is not s3
>>>  > specific. Maybe the FileIO interface is enough.
>>>  >
>>>  > About consistency (no need to respond here):
>>>  > I'm seeing that during "getFileStatus" my version of s3a does some
>>>  > list requests (but I'm not sure if that could fail from consistency
>>>  > issues). I'm also confused about the read-after-(initial) write part:
>>>  > "Amazon S3 provides read-after-write consistency for PUTS of new
>>>  > objects in your S3 bucket in all Regions with one caveat. The caveat
>>>  > is that if you make a HEAD or GET request to a key name before the
>>>  > object is created, then create the object shortly after that, a
>>>  > subsequent GET might not return the object due to eventual
>>>  > consistency." -
>>>  > https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html
>>>  >
>>>  > When my version of s3a does a create, it first does a
>>>  > getMetadataRequest (HEAD) to check if the object exists before
>>>  > creating the object. I think this is talked about in this issue:
>>>  > https://github.com/apache/iceberg/issues/1398 and in the S3FileIO
>>>  > PR: https://github.com/apache/iceberg/pull/1573. I'll follow up in
>>>  > that issue for more info.
>>>  >
>>>  > John
>>>  >
>>>  >
>>>  > On 2020/11/12 00:36:10, Daniel Weeks <d....@netflix.com.INVALID>
>>>  > wrote:
>>>  > > Hey John, I might be able to help answer some of your questions
>>>  > > and provide some context around how you might want to go forward.
>>>  > >
>>>  > > So, one fundamental aspect of Iceberg is that it only relies on a
>>>  > > few operations (as defined by the FileIO interface). This makes
>>>  > > much of the functionality and complexity of full file system
>>>  > > implementations unnecessary. You should not need features like
>>>  > > S3Guard or the additional S3 operations these implementations rely
>>>  > > on in order to achieve file system contract behavior. Consistency
>>>  > > issues should also not be a problem, since Iceberg does not
>>>  > > overwrite or list, and read-after-(initial) write is a guarantee
>>>  > > provided by S3.
>>>  > >
>>>  > > At Netflix, we use a custom FileSystem implementation (somewhat
>>>  > > like S3A), but with much of the contract behavior that drives
>>>  > > additional operations against S3 disabled. However, we are
>>>  > > transitioning to a more native implementation of S3FileIO, which
>>>  > > you'll see as part of the ongoing work in Iceberg.
>>>  > >
>>>  > > Per your specific questions:
>>>  > >
>>>  > > 1) The S3FileIO implementation is very new, though internally we
>>>  > > have something very similar. There are features missing that we
>>>  > > are working to add (e.g. progressive multipart upload for large
>>>  > > files is likely the most important).
>>>  > > 2) You can use S3AFileSystem with the HadoopFileIO implementation,
>>>  > > though you may still see similar behavior with additional calls
>>>  > > being made (I don't know if these can be disabled).
>>>  > > 3) The PrestoS3FileSystem is tailored to Presto's use and is
>>>  > > likely not as complete as S3A, but seeing as it is using the
>>>  > > Hadoop FileSystem API, it would likely work for what HadoopFileIO
>>>  > > exercises (as would the EMRFileSystem).
>>>  > > 4) I would probably discourage you from writing your own file
>>>  > > system, as S3FileIO will likely be a more optimized implementation
>>>  > > for what Iceberg needs.
>>>  > >
>>>  > > If you want to contribute or have time to help contribute to
>>>  > > S3FileIO, that is the path I would recommend. As for
>>>  > > configuration, I would say a lot of it comes down to how to
>>>  > > configure the AWS S3 client that you provide to the S3FileIO
>>>  > > implementation, but a lot of the defaults are reasonable (you
>>>  > > might want to tweak a few like max connections and maybe the
>>>  > > retry policy).
>>>  > >
>>>  > > The recently committed work to dynamically load your FileIO
>>>  > > should make it relatively easy to test out, and we'd love to have
>>>  > > extra eyes and feedback on it.
>>>  > >
>>>  > > Let me know if that helps,
>>>  > > -Dan
>>>  > >
>>>  > >
>>>  > > On Wed, Nov 11, 2020 at 1:45 PM John Clara <jo...@gmail.com>
>>>  > > wrote:
>>>  > >
>>>  > > > Hello all,
>>>  > > >
>>>  > > > Thank you all for creating/continuing this great project! I am
>>>  > > > just starting to get comfortable with the fundamentals, and I'm
>>>  > > > thinking that my team has been using Iceberg the wrong way at
>>>  > > > the FileIO level.
>>>  > > >
>>>  > > > I was wondering if people would be willing to share how they
>>>  > > > set up their FileIO/FileSystem with S3 and any customizations
>>>  > > > they had to add.
>>>  > > >
>>>  > > > (Preferably from smaller teams. My team is small and cannot
>>>  > > > realistically customize everything. If there's an up-to-date
>>>  > > > thread discussing this that I missed, please link me that
>>>  > > > instead.)
>>>  > > >
>>>  > > > ***** My team's specific problems/setup which you can ignore ***
>>>  > > >
>>>  > > > My team has been using Hadoop FileIO with the S3AFileSystem.
>>>  > > > Jars are provided by AWS EMR 5.23, which is on Hadoop 2.8.5. We
>>>  > > > use DynamoDB for atomic renames by implementing Iceberg's
>>>  > > > provided interfaces. We read/write from either Spark in EMR or
>>>  > > > on-prem JVMs in docker containers (managed by k8s). Both use
>>>  > > > s3a, but the EMR clusters have HDFS (backed by core nodes) for
>>>  > > > the s3a buffered writes, while the on-prem containers use the
>>>  > > > docker container's default file system, which uses an overlay2
>>>  > > > storage driver (that I know nothing about).
>>>  > > >
>>>  > > > Hadoop 2.8.5's S3AFileSystem does a bunch of unnecessary GET
>>>  > > > and LIST requests, which is well known in the community (but
>>>  > > > not to my team, unfortunately). There are also GET-PUT-GET
>>>  > > > inconsistency issues with S3 that have been talked about, but I
>>>  > > > don't yet understand how they arise in the 2.8.5 S3AFileSystem
>>>  > > > (https://github.com/apache/iceberg/issues/1398).
>>>  > > >
>>>  > > > *** End of specific ***
>>>  > > >
>>>  > > >
>>>  > > > The options I'm seeing are:
>>>  > > >
>>>  > > > 1. Using Iceberg's new S3 FileIO. Is anyone using this in prod?
>>>  > > >
>>>  > > > This still seems very new, unless it is actually based on
>>>  > > > Netflix's prod implementation that they're releasing to the
>>>  > > > community? (I'm wondering if it's safe to start moving onto it
>>>  > > > in prod in the near term. If Netflix is using it (or rolling it
>>>  > > > out), that would be more than enough for my team.)
>>>  > > >
>>>  > > > 2. Using a newer Hadoop version and the S3AFileSystem. Any
>>>  > > > recommendations on a version, and are you also using S3Guard?
>>>  > > >
>>>  > > > From a quick look, most gains compared to older versions seem
>>>  > > > to be from S3Guard. Are there substantial gains without it? (My
>>>  > > > team doesn't have experience with S3Guard, and Iceberg seems to
>>>  > > > not need it outside of atomic renames?)
>>>  > > >
>>>  > > > 3. Using an alternative Hadoop file system. Any recommendations?
>>>  > > >
>>>  > > > In the recent Iceberg S3 FileIO, the LICENSE states it was
>>>  > > > based off the Presto FileSystem. Has anyone used this file
>>>  > > > system as-is with Iceberg?
>>>  > > > (https://github.com/apache/iceberg/blob/master/LICENSE#L251)
>>>  > > >
>>>  > > > 4. Roll our own Hadoop file system. Anyone have stories/blogs
>>>  > > > about pitfalls or difficulties?
>>>  > > >
>>>  > > > rdblue hints that Netflix has already done this:
>>>  > > > https://github.com/apache/iceberg/issues/1398#issuecomment-682837392
>>>  > > > (My team probably doesn't have the capacity for this.)
>>>  > > >
>>>  > > >
>>>  > > > Places where I tried looking for this info:
>>>  > > >
>>>  > > > - https://github.com/apache/iceberg/issues/761 (issue for a
>>>  > > > getting started guide)
>>>  > > > - https://iceberg.apache.org/spec/#file-system-operations
>>>  > > >
>>>  > > > Thanks everyone,
>>>  > > >
>>>  > > > John Clara
>>>  > >
>>>  >
>>>
>>

-- 
Ryan Blue
Software Engineer
Netflix
