In the end, I was able to get my MetaHook class's commitInsertTable() method called properly by Tez. However, it appears to be invoked on a different instance of that class, which therefore doesn't share the Configuration object that was initialized at the beginning of the job. So I'm not sure there's a way to pass information between the two instances.
Perhaps one way around this issue would be to create a temporary file based on the query ID conf ("hive.query.id")? The first instance would dump some information in that file, then the other instance would load it up on the other end to perform the commit.

Julien

On 2022/04/29 21:06:49 Julien Phalip wrote:
> Hi Peter,
>
> Looking at https://issues.apache.org/jira/browse/TEZ-4279, it seems that
> the fix might have been applied to 0.9.3. Is that correct? If so, do you
> think that just upgrading Tez to that version might be enough to allow the
> "setUpJob()", "commitJob()" and "abortJob()" to be called appropriately?
>
> I'm curious if the Hive changes that you've referenced are also needed or
> not. Would you mind clarifying what those Hive changes specifically achieve?
>
> Also, to answer your question, I'm currently working on a rewrite of the
> Hive-BigQuery connector (
> https://github.com/GoogleCloudDataproc/hive-bigquery-storage-handler). I'll
> be happy to post a quick update here once I complete all the changes that
> I'm working on, hopefully some time soon.
>
> Thanks,
>
> Julien
>
> On 2022/04/28 07:40:44 Peter Vary wrote:
> > Hi Julien,
> >
> > Hive 3.1.2 is dependent on 0.9 Tez, and I seem to remember having issues running Hive 3.1.2 with Tez 0.10.
> > OTOH you might get away with patching 0.9 Tez with the appropriate changes. I would ask this on the Tez mailing list.
> >
> > Are you trying out Hive-Iceberg integration, or it is another custom SerDe?
> >
> > Thanks,
> > Peter
> >
> > > On 2022. Apr 27., at 19:12, Julien Phalip <jp...@gmail.com> wrote:
> > >
> > > Thanks Peter.
> > >
> > > By chance could I get things to work by keeping my current version of Hive (3.1.2) and only upgrading Tez? Which version(s) should I use?
> > >
> > > Thank you,
> > >
> > > Julien
> > >
> > > On 2022/04/27 08:59:08 Peter Vary wrote:
> > > > We had the same issue with the IcebergOutputCommitter.
> > > >
> > > > The first solution was this: https://issues.apache.org/jira/browse/HIVE-25006
> > > > It needed https://issues.apache.org/jira/browse/TEZ-4279
> > > >
> > > > Later we ended up with this final solution: https://issues.apache.org/jira/browse/HIVE-25208
> > > >
> > > > I hope this helps,
> > > > Peter
> > > >
> > > > > On 2022. Apr 27., at 1:46, Julien Phalip <jp...@gmail.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > I'm working on a custom storage handler. My custom output committer class gets called normally when using the "mr" engine. However, it seems to be entirely ignored when using the "tez" engine.
> > > > >
> > > > > I'm setting the JobConf's "mapred.output.committer.class" key to my fully-qualified output committer class name in the handler's configureJobConf() method. I've also tried the "hive.tez.mapreduce.output.committer.class" key and also tried setting those keys in the job properties in the configureOutputJobProperties() method. But that didn't work either.
> > > > >
> > > > > By the way, I'm using Hive 3.1.2 and Tez 0.9.1.
> > > > >
> > > > > Do you know what I might be missing or doing wrong?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Julien
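
P.S. To make the temporary-file idea from the top of this message a bit more concrete, here is a rough sketch of what I have in mind. It is plain Java with no Hive or Hadoop dependencies, so the query ID is passed in as a string; in the actual hook it would presumably be read from the "hive.query.id" conf. The class name (QueryStateFile), the method names, the base directory and the "destination.table" key are all made up for illustration. It also assumes both instances can see the same directory; if they turn out to run on different hosts, the file would have to live on a shared filesystem instead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical helper: hands commit state from one hook instance to another via a file. */
public class QueryStateFile {

    /** Builds the file path for a query. Assumption: baseDir is visible to both instances. */
    private static Path stateFile(Path baseDir, String queryId) {
        // Replace anything that is not safe in a file name.
        String safeId = queryId.replaceAll("[^A-Za-z0-9_.-]", "_");
        return baseDir.resolve("commit-state-" + safeId + ".properties");
    }

    /** Would be called by the first instance to dump whatever the commit needs later. */
    public static void save(Path baseDir, String queryId, Properties state) throws IOException {
        Files.createDirectories(baseDir);
        try (OutputStream out = Files.newOutputStream(stateFile(baseDir, queryId))) {
            state.store(out, "commit state");
        }
    }

    /** Would be called by the second instance to load the state, then remove the file. */
    public static Properties loadAndDelete(Path baseDir, String queryId) throws IOException {
        Path file = stateFile(baseDir, queryId);
        Properties state = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            state.load(in);
        }
        Files.delete(file);
        return state;
    }

    public static void main(String[] args) throws IOException {
        Path baseDir = Files.createTempDirectory("hive-commit-state");
        String queryId = "example_query_id"; // stand-in for the "hive.query.id" value

        Properties state = new Properties();
        state.setProperty("destination.table", "my_dataset.my_table"); // hypothetical key
        save(baseDir, queryId, state);

        Properties loaded = loadAndDelete(baseDir, queryId);
        System.out.println(loaded.getProperty("destination.table"));
    }
}
```

Deleting the file right after loading keeps stale state from piling up, but it also means a retried commit would find nothing to read, so that part may need rethinking.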