Re: Welcoming Peter Vary as a new committer!

2021-01-25 Thread Jungtaek Lim
Congratulations Peter! Well deserved! On Tue, Jan 26, 2021 at 3:40 AM Wing Yew Poon wrote: > Congratulations Peter! > > > On Mon, Jan 25, 2021 at 10:35 AM Russell Spitzer < > russell.spit...@gmail.com> wrote: > >> Congratulations! >> >> On Jan 25, 2021, at 12:34 PM, Jacques Nadeau >> wrote: >>

Re: S3 strong read-after-write consistency

2020-12-02 Thread Jungtaek Lim
What about the S3FileIO implementation? I see an issue filed reporting that even the Hive catalog brings unexpected issues when working with S3, and S3FileIO is supposed to fix that (according to Ryan). Is it now safe to use the Hive catalog + Hadoop API for S3 without S3FileIO? On Wed, Dec 2, 2020 at 6:54 PM, Vivekanand

Re: [ANNOUNCE] Apache Iceberg release 0.10.0

2020-11-16 Thread Jungtaek Lim
Thanks everyone for the huge efforts on achieving the release! On Tue, Nov 17, 2020 at 9:09 AM Anton Okolnychyi wrote: > I am pleased to announce the release of Apache Iceberg 0.10.0! > > Apache Iceberg is an open table format for huge analytic datasets. Iceberg > delivers high query performance

Re: [VOTE] Release Apache Iceberg 0.10.0 RC2

2020-11-02 Thread Jungtaek Lim
Anton, thanks for the interest. It happens across modules, and it looks to be consistent. Others seem to have no problem with UT runs, so this probably needs a separate mail thread or GitHub issue. On Tue, Nov 3, 2020 at 1:31 AM Anton Okolnychyi wrote: > Jungtaek, do you hit this issue only in a

Re: [VOTE] Release Apache Iceberg 0.10.0 RC2

2020-11-02 Thread Jungtaek Lim
This is probably not the right thread to ask, but I encountered the issue again while verifying the RC, so I'm asking here: I'm consistently encountering multiple test failures due to HMS. It shouldn't matter, as others verified the UTs, but if someone is aware of the issue and the resolution (or at least where to

Re: Welcoming Jingsong Lee as a new committer

2020-10-10 Thread Jungtaek Lim
Congrats! On Sat, Oct 10, 2020 at 3:56 PM, Junjie Chen wrote: > Congratulations! Thanks for your great contribution in Flink sink and > source! > > On Sat, Oct 10, 2020 at 9:10 AM 张军 wrote: > >> >> Congratulations >> >> JunZhang >> zhangjunem...@126.com >> >>

Re: Welcoming Zheng Hu as a new committer

2020-10-10 Thread Jungtaek Lim
Congrats! On Sat, Oct 10, 2020 at 3:56 PM, Junjie Chen wrote: > Congratulations! Thanks for your great contribution in Flink sink and > source! > > On Sat, Oct 10, 2020 at 9:09 AM 张军 wrote: > >> >> Congratulations >> >> JunZhang >> zhangjunem...@126.com >> >>

Re: Impact on Spark-Iceberg usage on missing to enforce clustering/sort requirement (SPARK-23889)

2020-09-20 Thread Jungtaek Lim
eat, too! > > rb > > On Wed, Sep 16, 2020 at 4:27 PM Jungtaek Lim > wrote: > >> Hi all, >> >> Recently I played around with a partitioned Iceberg table in Spark, and >> realized it requires a manual sort. I had to google to find a workaround - I >> gues

Impact on Spark-Iceberg usage on missing to enforce clustering/sort requirement (SPARK-23889)

2020-09-16 Thread Jungtaek Lim
understand this correctly? I feel we may need to put effort into pushing SPARK-23889 forward for Iceberg (or consider falling back to the DSv1 writer), as I think the workaround is unacceptable for many end users. We probably also need to document the impact and the workaround until we fix the issue. Thanks, Jungtaek L

Re: Question about Iceberg release cadence

2020-08-26 Thread Jungtaek Lim
really would like to see the case also covered by Iceberg. > I see there's a lot of work in progress on the milestone (and these are > great features which should be done), but after this we cover both batch > and streaming workloads being done with Spark, which is a huge step forwar

Re: [DISCUSS] Rename iceberg-hive module?

2020-08-19 Thread Jungtaek Lim
+1 for `iceberg-hive-metastore` and also +1 for RD's proposal. Thanks, Jungtaek Lim (HeartSaVioR) On Thu, Aug 20, 2020 at 11:20 AM Jingsong Li wrote: > +1 for `iceberg-hive-metastore` > > I'm confused about `iceberg-hive` and `iceberg-mr`. > > Best, > Jingsong >

Re: [VOTE] Release Apache Iceberg 0.9.1 RC0

2020-08-19 Thread Jungtaek Lim
Just FYI, looks like the 0.9.1 artifacts are available now, but the release page on the website hasn't been updated yet. On Sat, Aug 15, 2020 at 9:46 AM Ryan Blue wrote: > With 8 +1 votes and no others, this RC passes. Thanks for validating the > patch release, everyone! > > I'll get started on

Re: [DISCUSS] 0.9.1 release

2020-08-01 Thread Jungtaek Lim
doing the remainder of the refactor in > master? > > On Fri, Jul 31, 2020 at 5:29 PM Jungtaek Lim > wrote: > >> If we still have some more days I think #1280 >> <https://github.com/apache/iceberg/pull/1280>: "fix serialization issue >> in BaseCombinedScan

Re: [DISCUSS] 0.9.1 release

2020-07-31 Thread Jungtaek Lim
If we still have some more days, I think #1280: "fix serialization issue in BaseCombinedScanTask with Kyro" is a good candidate to be included. The bug affects both Spark and Flink (according to #1279). On S

Re: Effect of enabling 'write.metadata.delete-after-commit.enabled'

2020-07-28 Thread Jungtaek Lim
anifests and data files quickly. But if you want to look up data by >> a dimension other than time -- for example, using the bucket of an ID -- >> then the natural clustering doesn't work well. In that case, you can use >> RewriteManifests or the RewriteManifestsAction to cluster data

Re: Effect of enabling 'write.metadata.delete-after-commit.enabled'

2020-07-28 Thread Jungtaek Lim
mplementing that, you could use a database for your metadata instead of > a JSON file, which could be faster for this use case. That's also what > you'd implement to inject a different FileIO library, or to change data > file placement. > > On Tue, Jul 28, 2020 at 12:56 AM Jun

Re: Effect of enabling 'write.metadata.delete-after-commit.enabled'

2020-07-28 Thread Jungtaek Lim
Jul 28, 2020 at 12:41 PM Jungtaek Lim wrote: > I'd love to contribute documentation about the actions - just need some > time to understand the needs for some actions (like RewriteManifestsAction). > > I just submitted a PR for structured streaming sink [1]. I mentioned > expireSnapshot

Re: Effect of enabling 'write.metadata.delete-after-commit.enabled'

2020-07-27 Thread Jungtaek Lim
> docs for them. > > Same with the streaming sink, we just need someone to write up docs and > contribute them. We don't use the streaming sink, so I've unfortunately > overlooked it. > > On Mon, Jul 27, 2020 at 3:25 PM Jungtaek Lim > wrote: > >> Thanks for

Re: Effect of enabling 'write.metadata.delete-after-commit.enabled'

2020-07-27 Thread Jungtaek Lim
g > it on and making sure you're running `expireSnapshots()` regularly to prune > old table versions -- although expiring snapshots will remove them from > table metadata and limit how far back you can time travel. > > On Mon, Jul 27, 2020 at 4:33 AM Jungtaek Lim > wrote: >

Effect of enabling 'write.metadata.delete-after-commit.enabled'

2020-07-27 Thread Jungtaek Lim
is that it doesn't affect time travel (as that refers to a snapshot), and restoring is also done from a snapshot, so I'm not sure what to consider when turning on the option. Thanks, Jungtaek Lim