Hi Peter,

So I've looked into the approach that you've pointed at in this pull
request (https://github.com/apache/hive/pull/2161), which is to rely on
HiveMetaHook.commitInsertTable() instead of the "traditional"
OutputCommitter.commitJob().

I've tried to implement a similar approach, but my metahook class's
commitInsertTable() method never gets called. Its preInsertTable() method
does get called, so I'm not sure why commitInsertTable() doesn't.

Do you know what I might be missing?

Also, in the same pull request it looks like rollbackInsertTable() is used
in place of OutputCommitter.abortJob(). However, my understanding is that
the former is called only if commitInsertTable() throws an exception (see
source code here
<https://github.com/apache/hive/blob/release-3.1.2-rc0/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java#L900-L912>),
whereas the latter should theoretically be called if any error occurs
during the job (e.g. a continuously failing task). One of my goals is to
clean up the temporary files created by the tasks if the job fails. What
would be a good hook for that?
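
To make the concern concrete, here is a minimal, self-contained sketch of
how I read that control flow. It does not use Hive's classes: InsertHook,
RecordingHook and insertCommitWork() below are hypothetical stand-ins I
made up for illustration, and the try/finally shape is only my
understanding of the linked DDLTask code, not a copy of it.

```java
import java.util.ArrayList;
import java.util.List;

public class InsertCommitSketch {

    /** Hypothetical stand-in for the insert-related methods of a meta hook. */
    interface InsertHook {
        void commitInsertTable() throws Exception;
        void rollbackInsertTable() throws Exception;
    }

    /** Records which callbacks fire; can be told to fail during commit. */
    static class RecordingHook implements InsertHook {
        final List<String> calls = new ArrayList<>();
        private final boolean failOnCommit;

        RecordingHook(boolean failOnCommit) {
            this.failOnCommit = failOnCommit;
        }

        @Override
        public void commitInsertTable() throws Exception {
            calls.add("commit");
            if (failOnCommit) {
                throw new Exception("simulated commit failure");
            }
        }

        @Override
        public void rollbackInsertTable() {
            calls.add("rollback");
        }
    }

    /** Assumed shape of the commit step: rollback runs only if commit throws. */
    static void insertCommitWork(InsertHook hook) throws Exception {
        boolean failed = true;
        try {
            hook.commitInsertTable();
            failed = false;
        } finally {
            if (failed) {
                hook.rollbackInsertTable();
            }
        }
    }

    /** Runs the commit step once and returns the recorded callback order. */
    static List<String> run(boolean failOnCommit) {
        RecordingHook hook = new RecordingHook(failOnCommit);
        try {
            insertCommitWork(hook);
        } catch (Exception e) {
            // The simulated commit failure propagates after the rollback.
        }
        return hook.calls;
    }

    public static void main(String[] args) {
        System.out.println("commit succeeds: " + run(false));
        System.out.println("commit fails:    " + run(true));
    }
}
```

If that reading is right, a job that fails before the commit step (for
example because a task keeps failing) would never enter this try block, so
neither callback would give me a chance to delete the temporary files.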

Thank you,

Julien

On 2022/04/29 21:06:49 Julien Phalip wrote:
> Hi Peter,
>
> Looking at https://issues.apache.org/jira/browse/TEZ-4279, it seems that
> the fix might have been applied to 0.9.3. Is that correct? If so, do you
> think that just upgrading Tez to that version might be enough to allow the
> "setupJob()", "commitJob()" and "abortJob()" to be called appropriately?
>
> I'm curious if the Hive changes that you've referenced are also needed or
> not. Would you mind clarifying what those Hive changes specifically
> achieve?
>
> Also, to answer your question, I'm currently working on a rewrite of the
> Hive-BigQuery connector
> (https://github.com/GoogleCloudDataproc/hive-bigquery-storage-handler).
> I'll be happy to post a quick update here once I complete all the changes
> that I'm working on, hopefully some time soon.
>
> Thanks,
>
> Julien
>
> On 2022/04/28 07:40:44 Peter Vary wrote:
> > Hi Julien,
> >
> > Hive 3.1.2 is dependent on 0.9 Tez, and I seem to remember having issues
> > running Hive 3.1.2 with Tez 0.10.
> > OTOH you might get away with patching 0.9 Tez with the appropriate
> > changes. I would ask this on the Tez mailing list.
> >
> > Are you trying out the Hive-Iceberg integration, or is it another custom
> > SerDe?
> >
> > Thanks,
> > Peter
> >
> > > On 2022. Apr 27., at 19:12, Julien Phalip <jp...@gmail.com> wrote:
> > >
> > > Thanks Peter.
> > >
> > > By chance could I get things to work by keeping my current version of
> > > Hive (3.1.2) and only upgrading Tez? Which version(s) should I use?
> > >
> > > Thank you,
> > >
> > > Julien
> > >
> > > On 2022/04/27 08:59:08 Peter Vary wrote:
> > > > We had the same issue with the IcebergOutputCommitter.
> > > >
> > > > The first solution was this:
> > > > https://issues.apache.org/jira/browse/HIVE-25006
> > > > It needed https://issues.apache.org/jira/browse/TEZ-4279
> > > >
> > > > Later we ended up with this final solution:
> > > > https://issues.apache.org/jira/browse/HIVE-25208
> > > >
> > > > I hope this helps,
> > > > Peter
> > > >
> > > > > On 2022. Apr 27., at 1:46, Julien Phalip <jp...@gmail.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > I'm working on a custom storage handler. My custom output
> > > > > committer class gets called normally when using the "mr" engine.
> > > > > However, it seems to be entirely ignored when using the "tez"
> > > > > engine.
> > > > >
> > > > > I'm setting the JobConf's "mapred.output.committer.class" key to
> > > > > my fully-qualified output committer class name in the handler's
> > > > > configureJobConf() method. I've also tried the
> > > > > "hive.tez.mapreduce.output.committer.class" key and also tried
> > > > > setting those keys in the job properties in the
> > > > > configureOutputJobProperties() method. But that didn't work either.
> > > > >
> > > > > By the way, I'm using Hive 3.1.2 and Tez 0.9.1.
> > > > >
> > > > > Do you know what I might be missing or doing wrong?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Julien
> > > >
> > > >
> >
> >
>
