Hi all,
I'm reading large text files from S3. File sizes are between 30 GB and 40 GB.
Every stage runs in 8-9 s, except the last 32, which jump to 1-2 min for some reason!
Here is my sample code:
val myDF = sc.textFile(input_file).map { x =>
  val p = x.split("\t", -1)
  new zzzzzzzz(....)
}.toDF()
myDF.registerTempTable("tbl")
sqlContext.sql("select count(1) from tbl").collect()
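As a side note on the parsing step: passing -1 as the limit to split keeps trailing empty fields, so a row whose last columns are empty still yields the full column count. A minimal sketch in plain Scala (no Spark), where Record, its three fields, and parse are hypothetical names standing in for the elided class:

```scala
// Hypothetical stand-in for the elided row class; the field names are assumptions.
final case class Record(id: String, name: String, note: String)

object SplitSketch {
  // Parse one tab-separated line, keeping trailing empty fields (limit = -1).
  // Returns None when the line does not have exactly three columns.
  def parse(line: String): Option[Record] = {
    val p = line.split("\t", -1)
    if (p.length == 3) Some(Record(p(0), p(1), p(2))) else None
  }

  def main(args: Array[String]): Unit = {
    val line = "42\tabc\t"               // last column is empty
    println(line.split("\t").length)     // 2: the default limit drops the trailing empty field
    println(line.split("\t", -1).length) // 3: limit -1 keeps it
    println(parse(line))
  }
}
```

Returning an Option here is only one possible way to handle malformed rows; the real class and its validation may well differ.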
Any help or ideas?
Thanks,
Younes Naguib
Triton Digital | 1440 Ste-Catherine W., Suite 1200 | Montreal, QC H3G 1R8
Tel.: +1 514 448 4037 x2688 | Tel.: +1 866 448 4037 x2688 |