I opened a PR for appending manifests:
https://github.com/apache/incubator-iceberg/pull/201
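For anyone following along, here is a minimal sketch of how the proposed
operation might be used, assuming the append-style API discussed in this
thread (the method name appendManifest comes from the PR under review and
may change before merge):

  import org.apache.iceberg.AppendFiles;
  import org.apache.iceberg.ManifestFile;
  import org.apache.iceberg.Table;

  class AppendManifestExample {
    // Stages an already-written manifest so its data files become part of
    // the table's next snapshot in a single commit.
    static void appendExistingManifest(Table table, ManifestFile manifest) {
      AppendFiles append = table.newAppend();
      append.appendManifest(manifest); // stage every data file in the manifest
      append.commit();                 // produce the new snapshot
    }
  }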
On Mon, Jun 3, 2019 at 12:32 PM Ryan Blue wrote:
Yes, we will need to expose ManifestWriter, but only the methods that work
with DataFile because we only need to support append.
Unfortunately, these manifests will need to be rewritten because they don't
have the correct snapshot ID in the file metadata; that ID is set only at the
final commit.
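As a sketch of the append-only surface being discussed, assuming a
ManifestFiles.write factory plus add/toManifestFile methods (the exact entry
points may differ between versions):

  import java.io.IOException;
  import org.apache.iceberg.DataFile;
  import org.apache.iceberg.ManifestFile;
  import org.apache.iceberg.ManifestFiles;
  import org.apache.iceberg.ManifestWriter;
  import org.apache.iceberg.PartitionSpec;
  import org.apache.iceberg.io.OutputFile;

  class ManifestWriteExample {
    // Writes a manifest containing only DataFile entries; the snapshot ID in
    // its metadata stays unset here and is filled in (by rewriting) at commit.
    static ManifestFile writeManifest(PartitionSpec spec, OutputFile out,
                                      Iterable<DataFile> files) throws IOException {
      ManifestWriter<DataFile> writer = ManifestFiles.write(spec, out);
      try {
        for (DataFile file : files) {
          writer.add(file); // append-only: no deletes or existing entries
        }
      } finally {
        writer.close();
      }
      return writer.toManifestFile();
    }
  }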
If we are to support appending manifest files, do we expect to expose
ManifestWriter?
Also, one more question about migrating bucketed Spark tables. Am I correct
that it won’t work because of [1]? The bucketing field won’t be present in the
partition values map, as bucket ids are encoded in file names.
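To illustrate the mismatch, a sketch using the standard PartitionSpec builder
(the schema here is made up):

  import org.apache.iceberg.PartitionSpec;
  import org.apache.iceberg.Schema;
  import org.apache.iceberg.types.Types;

  class BucketSpecExample {
    static PartitionSpec bucketedSpec() {
      Schema schema = new Schema(
          Types.NestedField.required(1, "id", Types.LongType.get()),
          Types.NestedField.required(2, "data", Types.StringType.get()));
      // bucket(id, 16) becomes a real partition column, so every DataFile
      // must carry its bucket id in its partition tuple. A bucketed Spark
      // table encodes the bucket only in the file name, so that value never
      // appears in the metastore's partition values map.
      return PartitionSpec.builderFor(schema)
          .bucket("id", 16)
          .build();
    }
  }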
A few comments from me inline:
> I think it is reasonable to make this a Spark job. The number of files in
> tables we convert typically requires it. This would only be too much for the
> driver if all of the files are collected at one time. We commit 500,000 files
> per batch, which seems to work.
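A sketch of the batching described above, assuming the file listing arrives
as an iterator so the driver never materializes all of the files at once (the
500,000 batch size is the figure from this thread):

  import java.util.Iterator;
  import org.apache.iceberg.AppendFiles;
  import org.apache.iceberg.DataFile;
  import org.apache.iceberg.Table;

  class BatchedAppendExample {
    static final int BATCH_SIZE = 500_000;

    static void appendInBatches(Table table, Iterator<DataFile> files) {
      while (files.hasNext()) {
        AppendFiles append = table.newAppend();
        for (int i = 0; i < BATCH_SIZE && files.hasNext(); i++) {
          append.appendFile(files.next());
        }
        append.commit(); // one snapshot per batch of files
      }
    }
  }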
Replies inline:
On Tue, May 14, 2019 at 3:21 AM Anton Okolnychyi wrote:
> I would like to resume this topic. How do we see the proper API for
> migration?
>
> I have a couple of questions in mind:
> - Now, it is based on a Spark job. Do we want to keep it that way because
> the number of files might be too large to collect on the driver?
…for incentivizing Iceberg adoption and/or PoCs.

Xabriel J Collazo Mojica | Senior Software Engineer | Adobe | xcoll...@adobe.com
From: Anton Okolnychyi
Reply-To: "dev@iceberg.apache.org"
Date: Monday, March 18, 2019 at 2:22 PM
To: "dev@iceberg.apache.org", Ryan Blue
Subject: Re: Extend SparkTableUtil to Handle Tables Not Tracked in Hive Metastore
I definitely support this idea. Having a clean and reliable API to migrate
existing Spark tables to Iceberg will be helpful.
I propose to collect all requirements for the new API in this thread. Then I
can come up with a doc that we will discuss within the community.
From the feature perspective, …
I think that would be fine, but I want to throw out a quick warning:
SparkTableUtil was initially written as a few handy helpers, so it wasn't
well designed as an API. It's really useful, so I can understand wanting to
extend it. But should we come up with a real API for these conversion tasks
instead?
Hi,
SparkTableUtil can be helpful for migrating existing Spark tables into Iceberg.
Right now, SparkTableUtil assumes that the partition information is always
tracked in Hive metastore.
What about extending SparkTableUtil to handle Spark tables that don’t rely on
Hive metastore? I have a local …
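For context, a sketch of what such a migration entry point could look like;
importSparkTable, the db.events identifier, and the staging directory are
illustrative assumptions, not part of SparkTableUtil as it existed when this
was written:

  import org.apache.iceberg.Table;
  import org.apache.iceberg.spark.SparkTableUtil;
  import org.apache.spark.sql.SparkSession;
  import org.apache.spark.sql.catalyst.TableIdentifier;
  import scala.Option;

  class MigrateExample {
    // Reads the source table's partitions, lists their data files, and
    // appends them to the target Iceberg table.
    static void migrate(SparkSession spark, Table icebergTable, String stagingDir) {
      TableIdentifier source = new TableIdentifier("events", Option.apply("db"));
      SparkTableUtil.importSparkTable(spark, source, icebergTable, stagingDir);
    }
  }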