Thanks Janos for the feedback.

If I understand well your suggestion is support all of the properties below
for table level compactions and treat them as equivalent:
* compactor.mapred.job.queue.name
* compactor.mapreduce.job.queuename
* compactor.hive.compactor.job.queue

It is something that crossed my mind as well but I am slightly skeptical
because like this we explicitly state that people are free to use whatever
they like. It might also have as a consequence MR properties affecting Tez
(as it happens a bit with HIVE-25595) which from my perspective is not that
great. I am also thinking that it will lead to more requests for accepting
these MR specific properties in the query based compactor which cannot (and
probably will never) use MR as the underlying engine. We should also keep
in mind that the MR engine was deprecated ~6years ago and the MR compactor
may follow soon.

I am fine implementing this specific change (accepting all properties
above) as long as someone from the people contributing to the compactor
confirms it is the desired path going forward.

Best,
Stamatis


On Mon, Feb 7, 2022 at 11:50 AM Janos Kovacs <kovja...@gmail.com> wrote:

> Hi Stamatis,
>
> I agree that the [compactor.]*hive.compactor.queue.name
> <http://hive.compactor.queue.name>* is a better solution as hive now also
> supports query based compaction, not only MR.
> ...although I think this needs to be backward compatible!
>
> What do you think about a logic similar to this:
>
> --- a/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java    
> 2022-02-07 10:31:28.000000000 +0100
> +++ b/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java    
> 2022-02-07 10:33:25.000000000 +0100
> @@ -145,10 +145,19 @@
>      overrideMRProps(job, t.getParameters()); // override MR properties from 
> tblproperties if applicable
>      if (ci.properties != null) {
>        overrideTblProps(job, t.getParameters(), ci.properties);
>      }
>
> +    // make queue configuration backward compatible
> +    // at that point overrideMRProps and OverrideTblProps already 
> consolidated
> +    // the final value, just need to use job.TBALE_PROPS
> +    String queueNameLegacy =
> +      (new 
> StringableMap(job.get(TABLE_PROPS))).toProperties().getProperty("compactor.mapred.job.queue.name");
> +    if (queueNameLegacy != null && queueNameLegacy.length() > 0) {
> +      job.set(ConfVars.COMPACTOR_JOB_QUEUE, queueNameLegacy);
> +    }
> +
>      String queueName = HiveConf.getVar(job, ConfVars.COMPACTOR_JOB_QUEUE);
>      if (queueName != null && queueName.length() > 0) {
>        job.setQueueName(queueName);
>      }
>
>
> Of course this can be wrapped around with a new config if needed, like
> hive.compaction.queue.name.use.legacy or whatever...
> FYI: we might also want to check legacy config not only for 
> *"compactor.mapred.job.queue.name
> <http://compactor.mapred.job.queue.name>"* but also for
> *"compactor.mapreduce.job.queuename" *as the first one was already on the
> deprecated list as pointed out by Peter Vary.
>
> Please also note that the change introduced by HIVE-25595 is currently not
> compatible with the new config as it was developed for the old
> compactor.mapred... property:
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorUtil.java#L31
> This also needs to be handled - for both the new prop name and backward
> compatibility.
>
> R, Janos
>
>
> On 2022/01/31 09:50:49 Stamatis Zampetakis wrote:
> > Hi all,
> >
> > This email is an attempt to converge on which Hive/Tez/MR properties
> > someone should use in order to schedule a compaction on specific queues.
> > For those who are not familiar with how queues are used the YARN capacity
> > scheduler documentation [1] gives the general idea.
> >
> > Using specific queues for compaction jobs is necessary to be able to
> > efficiently allocate resources for maintenance tasks (compaction) and
> > production workloads. Hive provides various ways to control the queues
> used
> > by the compactor and there have been various tickets with improvements
> and
> > fixes in this area (see list below).
> >
> > The granularity we can select queues for compactions (all tables vs. per
> > table) currently depends on which compactor is in use (MR vs Query based)
> > and boils down to the following properties:
> >
> > Global configuration:
> > * hive.compactor.job.queue
> > * mapred.job.queue.name
> > * tez.queue.name
> >
> > Per table/statement configuration (table properties):
> > * compactor.mapred.job.queue.name (before HIVE-20723)
> > * compactor.hive.compactor.job.queue (after HIVE-20723)
> >
> > Things are a bit blurred with respect to what properties someone should
> use
> > to achieve the desired result. Some changes, such as HIVE-20723, raise
> > backward compatibility concerns and other changes seem to have a larger
> > impact than the one specifically designed for. For example, after
> > HIVE-25595, map reduce queue properties can have an impact on the
> compactor
> > queues even when Tez is in use.
> >
> > In order to avoid confusion and ensure long term support of these queue
> > selection features we should clarify which of the above properties should
> > be used.
> >
> > Given the current situation, I would propose to officially support only
> the
> > following:
> > * hive.compactor.job.queue
> > * compactor.hive.compactor.job.queue
> > and align the implementation based on these (if necessary). In other
> words,
> > Hive users should not use mapred.job.queue.name and tez.queue.name
> > explicitly at least when it comes to the compactor. Hive should set them
> > transparently (as it happens now in various places) based on
> > [compactor.]hive.compactor.job.queue.
> >
> > What do people think? Are there other ideas?
> >
> > Best,
> > Stamatis
> >
> > [1]
> >
> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
> >
> > HIVE-11997: Add ability to send Compaction Jobs to specific queue
> > HIVE-13354: Add ability to specify Compaction options per table and per
> > request
> > HIVE-20723: Allow per table specification of compaction yarn queue
> > HIVE-24781: Allow to use custom queue for query based compaction
> > HIVE-25801: Custom queue settings is not honoured by Query based
> compaction
> > StatsUpdater
> > HIVE-25595: Custom queue settings is not honoured by compaction
> StatsUpdater
> >
>

Reply via email to