Hive has many optimizations.  One is that it reads the data directly from
storage (HDFS), without launching a job, when the query is trivial.  For
example:

Select * from table limit 10;

In natural language it says "give me any ten rows (if available) from the
table."  You don't need the overhead of launching a full MapReduce job for
this.  Hive just reads the rows from the file directly.
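
If you want to confirm which path a given query takes, EXPLAIN shows the
plan without running it.  A minimal sketch, assuming a placeholder table
named my_table (not a name from this thread):

```sql
-- my_table is a hypothetical table name used only for illustration.
-- A plan that contains only a Fetch Operator stage is read straight from
-- storage; a plan that contains a Map Reduce stage will launch a job.
EXPLAIN SELECT * FROM my_table LIMIT 10;

-- Print the current value of the setting that controls this behaviour.
SET hive.fetch.task.conversion;
```

Which queries qualify depends on hive.fetch.task.conversion and on the Hive
version, so treat the plan printed on your own cluster as the authority.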

Adding predicates to the query requires a MapReduce job to do the heavy
lifting.  The error message you're getting is probably the result of a
failed MapReduce job.  Nine times out of ten, the problem is that the
mappers/reducers are not granted enough memory in their YARN containers.
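
If memory does turn out to be the cause, the container sizes can be raised
per session before re-running the query.  A sketch, assuming the cluster
runs MapReduce on YARN (Hadoop 2 property names).  The numbers are
placeholders, not recommendations, and they have to fit within the limits
your YARN configuration allows:

```sql
-- Container size requested from YARN for each map / reduce task, in MB.
SET mapreduce.map.memory.mb=4096;
SET mapreduce.reduce.memory.mb=8192;

-- JVM heap inside each container, kept below the container size
-- (roughly 80% here) to leave room for non-heap memory.
SET mapreduce.map.java.opts=-Xmx3276m;
SET mapreduce.reduce.java.opts=-Xmx6553m;
```

"return code 1 from MapRedTask" is generic, so it is worth confirming the
real cause in the failed task's logs (for example with
yarn logs -applicationId <application id>) before tuning anything.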

On Tue, Feb 11, 2020, 10:41 AM Pau Tallada <tall...@pic.es> wrote:

> Hi,
>
> Do you have more complete tracebacks?
>
> Message from Charles Givre <cgi...@apache.org> on Tue, Feb 11, 2020
> at 2:54:
>
>> Hello Everyone!
>> I recently joined a project that has a Hive/Impala installation and we
>> are experiencing a significant number of query failures.  We are using an
>> older version of Hive, and unfortunately there's nothing I can do about
>> that, but I'm wondering how I can make Hive do better with queries to
>> give our users a better experience.
>>
>> For example, I can execute a basic SELECT * query or SELECT <fields>
>> query without issues.
>>
>> However, if I attempt to:
>> 1.  Add filters
>> 2.  Do a SELECT DISTINCT
>> 3.  Perform basic aggregation
>>
>> I get errors like this: Execution Error, return code 1 from
>> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.
>>
>> Could someone point me to some good guides for querying Hive and/or
>> assisting my engineers in preventing these errors?
>> Thanks,
>>
>>
>
> --
> ----------------------------------
> Pau Tallada Crespí
> Dep. d'Astrofísica i Cosmologia
> Port d'Informació Científica (PIC)
> Tel: +34 93 170 2729
> ----------------------------------
>
>
