Hello all,

I am new to Spark and am currently evaluating it. I am facing a
performance problem with Hive and am trying to determine whether Spark will
solve it, but Spark also seems to take a long time. Please let me know if I
am missing anything.

shark> select count(time) from table2;
OK
6050
Time taken: 7.571 seconds

shark> select count(time) from table1;
OK
18770
Time taken: 1.802 seconds

shark> select count(*) from table2 t2 JOIN table1 t1;
OK
113558500
Time taken: 40.332 seconds
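
As a quick sanity check (not a diagnosis): the count above equals the
product of the two table sizes, so the JOIN without an ON clause appears to
produce a full cross product. The variable names in this small Python check
are only illustrative:

```python
# Row counts reported by the two count(time) queries above.
table2_rows = 6050
table1_rows = 18770

# A JOIN with no ON condition pairs every row of one table
# with every row of the other (a Cartesian product).
cross_product_rows = table2_rows * table1_rows
print(cross_product_rows)  # 113558500
```

If that reading is right, any WHERE filter applied on top of this join
presumably has to consider each of those roughly 113 million row pairs.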

shark> select count(*) from table2 t2 JOIN table1 t1 WHERE
unix_timestamp(t2.time, 'yyyy-MM-dd HH:mm:ss,SSS') >
unix_timestamp(t1.time, 'yyyy-MM-dd HH:mm:ss,SSS') and
testCompare(t1.column1, t1.column2, t2.column1, t2.column2);
Note: testCompare is a Java UDF that returns true or false.
This query has been running for more than an hour. Is there an issue with
the query, or am I missing something basic in Spark?

Thanks and Regards,
Malligarjunan S.
