Re: when running the same job, the time Spark takes is very different from Shark.

2014-03-07 Thread Mayur Rustagi
So there are static costs associated with parsing the queries and structuring the operators, but they should not be that large. Another thing is that in Shark all the data is passed through a parser, serialized, passed through the filter, and sent to the driver. In Spark the data is simply read as text, run through contains, and ret…
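The difference between the two code paths can be sketched locally, without a cluster. This is only an illustration under an assumed tab-separated row format (the table `b`'s real schema is not shown in the thread): the Spark path treats each line as an opaque string and calls `contains`, while a Shark-style path pays a per-row parsing cost before it can filter on a column.

```scala
// Minimal local sketch (plain Scala, no Spark) of the two code paths.
// The sample rows and tab-separated layout are hypothetical.
object PathCost {
  val lines = Seq(
    "2013-01-05\tfoo\t1",
    "2012-11-02\tbar\t2",
    "2013-07-09\tbaz\t3"
  )

  // Spark path: each line stays an opaque string; one substring test per line.
  def sparkStyleCount: Int = lines.count(_.contains("2013-"))

  // Shark-style path: parse every row into columns first, then filter
  // on the (assumed) date column -- extra work per row even for rows
  // that are ultimately discarded.
  def sharkStyleCount: Int =
    lines
      .map(_.split("\t"))                       // per-row parsing cost
      .count(cols => cols(0).startsWith("2013-"))

  def main(args: Array[String]): Unit = {
    println(sparkStyleCount) // 2
    println(sharkStyleCount) // 2
  }
}
```

Both paths return the same count here; the point is that the parsed path does strictly more work per row, which is one source of the constant-factor gap Mayur describes.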

when running the same job, the time Spark takes is very different from Shark.

2014-03-06 Thread qingyang li
Hi community, I have set up a 3-node Spark cluster in standalone mode; each machine has 16 GB of memory and 4 cores. When I run

    val file = sc.textFile("/user/hive/warehouse/b/test.txt")
    file.filter(line => line.contains("2013-")).count()

it costs 2.7 s, but when…