RE: Custom OutputCommitter not called by Tez

2022-05-02 Thread Julien Phalip
Hi Peter, Thanks a lot for the breakdown, it all makes sense. Unfortunately I work with companies who are stuck with older versions of Hive, so I'm trying to find some workarounds. I was actually able to make it mostly work. Here's what I do: - In configureJobConf(): - Create a work di

Re: Custom OutputCommitter not called by Tez

2022-05-02 Thread Peter Vary
Hi Julien, With Iceberg we went through the same process, so here are some of our findings: - Writers are running on the executors (LLAP or MR nodes). - OutputCommitter taskCommit runs on the same executors - We did some experimenting but found that even when we were able to call the OutputCommitter

RE: RE: Re: Custom OutputCommitter not called by Tez

2022-04-30 Thread Julien Phalip
After all, I was able to have my MetaHook class' commitInsertTable() method be properly called by Tez. However, it looks like it's in fact a different instance of that class, and therefore it doesn't share the same Configuration object as the one that was initialized at the beginning of the job. So

RE: RE: Re: Custom OutputCommitter not called by Tez

2022-04-30 Thread Julien Phalip
Hi Peter, So I've looked into the approach that you've pointed at in this pull request (https://github.com/apache/hive/pull/2161), which is to rely on HiveMetaHook.commitInsertTable() instead of the "traditional" OutputCommitter.commitJob(). I've tried to implement a similar approach, however som

RE: Re: Custom OutputCommitter not called by Tez

2022-04-29 Thread Julien Phalip
Hi Peter, Looking at https://issues.apache.org/jira/browse/TEZ-4279, it seems that the fix might have been applied to 0.9.3. Is that correct? If so, do you think that just upgrading Tez to that version might be enough to allow the "setupJob()", "commitJob()" and "abortJob()" to be called appropria

Re: Custom OutputCommitter not called by Tez

2022-04-28 Thread Peter Vary
Hi Julien, Hive 3.1.2 depends on Tez 0.9, and I seem to remember having issues running Hive 3.1.2 with Tez 0.10. OTOH you might get away with patching Tez 0.9 with the appropriate changes. I would ask this on the Tez mailing list. Are you trying out Hive-Iceberg integration, or is it anoth

RE: Re: Custom OutputCommitter not called by Tez

2022-04-27 Thread Julien Phalip
Thanks Peter. By chance could I get things to work by keeping my current version of Hive (3.1.2) and only upgrading Tez? Which version(s) should I use? Thank you, Julien On 2022/04/27 08:59:08 Peter Vary wrote: > We had the same issue with the IcebergOutputCommitter. > > The first solution was

Re: Custom OutputCommitter not called by Tez

2022-04-27 Thread Peter Vary
We had the same issue with the IcebergOutputCommitter. The first solution was this: https://issues.apache.org/jira/browse/HIVE-25006 It needed https://issues.apache.org/jira/browse/TEZ-4279 Later

Custom OutputCommitter not called by Tez

2022-04-26 Thread Julien Phalip
Hi, I'm working on a custom storage handler. My custom output committer class gets called normally when using the "mr" engine. However, it seems to be entirely ignored when using the "tez" engine. I'm setting the JobConf's "mapred.output.committer.class" key to my fully-qualified output committer