Hey Pau,
Thanks for the clarification. Yes, that helped to start the query; however,
the query took a very long time to retrieve just a few records.
What steps can I take to improve the performance of this kind of query,
i.e. one whose predicates are on columns that are not partitioned?
Thanks,
Sai.
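To make the distinction in the question concrete, here is a toy Python model (not Hive's implementation) of why a filter on a partition column is cheap while a filter on a non-partition column is not: with partition pruning only the matching partition directories are read, whereas any other predicate forces every partition to be scanned. The table layout, the column names `dt` and `user_id`, and the row counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """One partition directory of a hypothetical table partitioned by `dt`."""
    dt: str
    rows: list[dict]


def rows_read(partitions: list[Partition], column: str, value) -> int:
    """Count the rows a scan must read to evaluate `column == value`.

    If the predicate is on the partition key (`dt`), non-matching partitions
    are pruned without being opened; otherwise every row of every partition
    has to be read and tested.
    """
    if column == "dt":
        return sum(len(p.rows) for p in partitions if p.dt == value)
    return sum(len(p.rows) for p in partitions)


# Three hypothetical daily partitions with 1,000 rows each.
table = [
    Partition(dt=f"2019-11-{day:02d}",
              rows=[{"user_id": i % 50, "amount": i} for i in range(1000)])
    for day in (13, 14, 15)
]

print(rows_read(table, "dt", "2019-11-15"))  # 1000: only one partition is read
print(rows_read(table, "user_id", 7))        # 3000: full scan of all partitions
```

The cost of the non-partition filter grows with the whole table, no matter how few rows match.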
On T
Hi Sai,
Let me summarize some of your data:
You have a 9-billion-record table with 4 columns, which should account for
a minimum raw size of about 200 GiB (not including the string column).
You want to select ALL columns from rows with a specific value in a column
which is not partitioned, so Hive has
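The 200 GiB figure can be reproduced with a back-of-the-envelope calculation. The sketch below assumes (the thread does not say) that the three non-string columns are 8-byte types such as BIGINT or DOUBLE; the string column is left out, as in the estimate above.

```python
ROWS = 9_000_000_000        # 9 billion records
FIXED_WIDTH_COLUMNS = 3     # assumption: 4 columns minus the string column
BYTES_PER_VALUE = 8         # assumption: 8-byte BIGINT/DOUBLE values
GIB = 1024 ** 3


def raw_size_gib(rows: int, columns: int, bytes_per_value: int) -> float:
    """Uncompressed size of the fixed-width columns, in GiB."""
    return rows * columns * bytes_per_value / GIB


print(f"{raw_size_gib(ROWS, FIXED_WIDTH_COLUMNS, BYTES_PER_VALUE):.1f} GiB")
# prints: 201.2 GiB
```

Narrower types such as a 4-byte INT shrink the estimate proportionally.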
Thanks for your detailed explanation, Pau. The query never returned, even
after 4 hours; I had to cancel it. The reason might be that I have too many
small ORC files as input to the Hive table.
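A common way to attack the small-files problem is to rewrite many small files into fewer large ones, so that each task spends its time reading data rather than opening files. The Python sketch below only illustrates the planning step: it greedily groups files, in order, into batches no larger than a target size. The function name `plan_compaction` and the 256 MiB target are hypothetical choices for the example, not Hive settings.

```python
MIB = 1024 ** 2


def plan_compaction(file_sizes: list[int], target_bytes: int) -> list[list[int]]:
    """Greedily group consecutive file sizes into batches of at most target_bytes.

    A single file larger than the target gets a batch of its own.
    """
    batches: list[list[int]] = []
    current: list[int] = []
    current_size = 0
    for size in file_sizes:
        # Close the current batch if adding this file would overshoot the target.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Hypothetical input: 1,000 ORC files of 1 MiB each.
small_files = [1 * MIB] * 1000
batches = plan_compaction(small_files, target_bytes=256 * MIB)
print(len(small_files), "->", len(batches))  # 1000 -> 4
```

Within Hive itself, ORC tables can be merged with `ALTER TABLE ... CONCATENATE`; whether that suits this table depends on the Hive version and table layout.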
Also, you are right: my cluster capacity is very low. But, do you suggest
we should keep on in
Yes, 2 nodes is very few.
On Fri, Nov 15, 2019, 16:37 Sai Teja Desu wrote:
> Thanks for your detailed explanation Pau. The query actually never
> returned even after 4 hours, I had to cancel the query. The reason might
> be, I have too many small orc files as an input to Hive table.
>
> Also, Yo
Hello,
Not sure if this answers your question, but please note the following:
Processing occurs via MapReduce, Spark, or Tez. These processing engines run
on top of YARN, and each derives much of its high availability (HA) from
YARN. There are some quirks there, but these engines running on YARN is