So there are static costs associated with parsing the query and structuring
the operators, but those should not amount to much.
Another thing is that in Shark all the data is passed through a parser,
serialized, passed through the filter, and sent to the driver.
In Spark the data is simply read as text, run through contains(), and returned.
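For comparison, here is a minimal sketch of the two paths side by side
(assuming a spark-shell session where sc is already defined; the Shark
table and column names are hypothetical):

    // Shark path: the SQL string is parsed, an operator tree is built,
    // and each row is deserialized before the predicate runs:
    //   SELECT count(*) FROM b WHERE line LIKE '%2013-%'

    // Plain Spark path: lines stay raw strings and go straight through contains():
    val file = sc.textFile("/user/hive/warehouse/b/test.txt")
    val count = file.filter(line => line.contains("2013-")).count()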
*Hi, community, I have set up a 3-node Spark cluster using standalone mode;
each machine's memory is 16G and each has 4 cores. *
*when I run " val file =
sc.textFile("/user/hive/warehouse/b/test.txt")
file.filter(line => line.contains("2013-")).count() "*
*it took 2.7s, *
*but, when