Thanks a lot for your reply.
In fact, we tried running the SQL on Kettle, Hive, and Spark (via
HiveContext) respectively, and the job seems to freeze rather than finish.
Across the 6 tables, we need to read different columns from each table for
specific information, then do some simple calculations before output.
Join operations are used the most in the SQL.
Best wishes!
On Monday, July 18, 2016 6:24 PM, Chanh Le <[email protected]> wrote:
Hi,
What about the network (bandwidth) between Hive and Spark? Did it run in
Hive before you moved to Spark? Because it's complex, you can use something
like the EXPLAIN command to show what is going on.
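As a rough illustration of the EXPLAIN suggestion, here is a minimal sketch in Python (PySpark). The table and column names (table_a, table_b, id, col1, col2) and the helper explain_statement are hypothetical stand-ins, since the real six-table query is not shown in this thread; hive_ctx is assumed to be an existing HiveContext.

```python
def explain_statement(query, extended=True):
    """Prefix a SQL query with EXPLAIN (or EXPLAIN EXTENDED)."""
    keyword = "EXPLAIN EXTENDED" if extended else "EXPLAIN"
    # Drop surrounding whitespace and any trailing semicolon,
    # which sql() may reject.
    return "{} {}".format(keyword, query.strip().rstrip(";"))


# Hypothetical two-table join standing in for the real six-table query.
query = """
SELECT a.id, a.col1, b.col2
FROM table_a a
JOIN table_b b ON a.id = b.id
"""

statement = explain_statement(query)
print(statement)

# With a live HiveContext (assumed here to be named hive_ctx), the plan
# could then be inspected with either of:
#   hive_ctx.sql(statement).show(truncate=False)
#   hive_ctx.sql(query).explain(True)
```

In the printed plan, the join operators and their order are usually the first thing worth checking when a multi-table join appears to hang.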
On Jul 18, 2016, at 5:20 PM, Zhiliang Zhu <[email protected]> wrote:
The SQL logic in the program is very complex, so I will not describe the
detailed code here.
On Monday, July 18, 2016 6:04 PM, Zhiliang Zhu
<[email protected]> wrote:
Hi All,
Here we have one application that needs to extract different columns from 6 Hive
tables and then do some simple calculations. There are around 100,000 rows in
each table. Finally, it needs to output another table or file (with a
consistent column format).
However, after many days of trying, the Spark Hive job is unbelievably slow,
sometimes almost frozen. There are 5 nodes in the Spark cluster. Could anyone
offer some help? Any idea or clue would also be welcome.
Thanks in advance~
Zhiliang