Hive has many optimizations. One is that it will load the data directly from storage (HDFS) if it's a trivial query. For example:
SELECT * FROM table LIMIT 10;

In natural language it says "give me any ten rows (if available) from the table." You don't need the overhead of launching a full MapReduce job for this; Hive just reads the rows from the file directly. Adding filters, DISTINCT, or aggregations generally requires a MapReduce job to do the heavy lifting.

The error message you're getting is probably the result of a failed MapReduce job. Nine times out of ten, the problem is that the mappers/reducers are not granted enough memory for their YARN containers.

On Tue, Feb 11, 2020, 10:41 AM Pau Tallada <tall...@pic.es> wrote:
> Hi,
>
> Do you have more complete tracebacks?
>
> Message from Charles Givre <cgi...@apache.org> on Tue., 11 Feb. 2020 at 2:54:
>
>> Hello Everyone!
>> I recently joined a project that has a Hive/Impala installation and we
>> are experiencing a significant number of query failures. We are using an
>> older version of Hive, and unfortunately there's nothing I can do about
>> that, but I'm wondering how I can make Hive do better with queries to
>> give our users a better experience.
>>
>> For example, I can execute a basic SELECT * query or SELECT <fields>
>> query without issues.
>>
>> However, if I attempt to:
>> 1. Add filters
>> 2. Do a SELECT DISTINCT
>> 3. Perform basic aggregation
>>
>> I get errors like this: Execution Error, return code 1 from
>> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.
>>
>> Could someone point me to some good guides for querying Hive and/or
>> assisting my engineers in preventing these errors?
>> Thanks,
>
> --
> ----------------------------------
> Pau Tallada Crespí
> Dep. d'Astrofísica i Cosmologia
> Port d'Informació Científica (PIC)
> Tel: +34 93 170 2729
> ----------------------------------
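For reference, here is a rough sketch of the two points in the reply above, as they might be tried from a Hive session running MapReduce on YARN. It assumes Hadoop 2 property names. The table and column names (my_table, some_col) are hypothetical, and the memory sizes are illustrative starting points only: what actually fits depends on the cluster's YARN limits. hive.fetch.task.conversion exists since Hive 0.10, but its accepted values and default vary by release, so check the documentation for the version in use.

```sql
-- 1. See which queries skip MapReduce. my_table and some_col are hypothetical.
--    The first plan should contain only a Fetch Operator stage; the second
--    should include a Map Reduce stage.
EXPLAIN SELECT * FROM my_table LIMIT 10;
EXPLAIN SELECT DISTINCT some_col FROM my_table;

-- 2. Optionally let simple SELECT / filter / LIMIT queries run as a direct fetch.
--    This does not help DISTINCT or aggregations, and on large tables it makes
--    the client read all the data itself.
SET hive.fetch.task.conversion=more;

-- 3. Give mappers and reducers larger YARN containers (MB; illustrative values).
SET mapreduce.map.memory.mb=4096;
SET mapreduce.reduce.memory.mb=4096;

-- Keep the JVM heap below the container size (about 80% here) so that YARN
-- does not kill the container for exceeding its memory limit.
SET mapreduce.map.java.opts=-Xmx3276m;
SET mapreduce.reduce.java.opts=-Xmx3276m;

-- 4. Re-run a query that previously failed.
SELECT some_col, COUNT(*) FROM my_table GROUP BY some_col;
```

If memory is the culprit, the failed task's logs (for example via yarn logs -applicationId <id>) typically contain a message saying the container is "running beyond physical memory limits", which is more informative than the generic MapRedTask return code.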