Hello all,

I am new to Spark and am currently evaluating it. I am facing a performance problem with Hive and am trying to determine whether Spark will solve it, but Spark also seems to take a long time. Please let me know if I am missing anything.
shark> select count(time) from table2;
OK
6050
Time taken: 7.571 seconds

shark> select count(time) from table1;
OK
18770
Time taken: 1.802 seconds

shark> select count(*) from table2 t2 JOIN table1 t1;
OK
113558500
Time taken: 40.332 seconds

shark> select count(*) from table2 t2 JOIN table1 t1 WHERE unix_timestamp(t2.time, 'yyyy-MM-dd HH:mm:ss,SSS') > unix_timestamp(t1.time, 'yyyy-MM-dd HH:mm:ss,SSS') and testCompare(t1.column1, t1.column2, t2.column1, t2.column2);

Note: testCompare is a Java UDF that returns true or false.

This last query has been running for more than an hour. Is there an issue with the query, or am I missing something basic in Spark?

Thanks and regards,
Malligarjunan S.
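
For scale, here is a quick arithmetic check on the counts reported above. It is only a Python sketch, not output from Shark, and the variable names are illustrative. It assumes count(time) equals the row count of each table (that is, no NULLs in the time column). Under that assumption, the unfiltered join count matches a full cross product, so the filtered query would have to evaluate its WHERE clause on every one of those row pairs.

```python
# Row counts reported by the two count(time) queries above.
# Assumption: no NULLs in `time`, so count(time) equals the row count.
TABLE2_ROWS = 6050
TABLE1_ROWS = 18770

# A JOIN with no ON clause pairs every row of one table
# with every row of the other (a cross product).
cross_product_rows = TABLE2_ROWS * TABLE1_ROWS
print(cross_product_rows)  # 113558500, same as the unfiltered count(*)

# The filtered query parses two timestamp strings per row pair,
# and may call testCompare up to once per pair on top of that.
timestamp_parses = 2 * cross_product_rows
print(timestamp_parses)  # 227117000
```

In other words, the WHERE clause is applied to roughly 113.5 million row pairs, with two string-to-timestamp conversions for each pair before the UDF is considered.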