Hi all,

I'm reading large text files from S3, between 30GB and 40GB each.
Every stage runs in 8-9s, except the last 32, which jump to 1-2 minutes for some reason!
Here is my sample code:
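    import sqlContext.implicits._  // needed for .toDF() on an RDD

    val myDF = sc.textFile(input_file).map {
      x =>
        // Split each tab-separated line; limit -1 keeps trailing empty fields
        val p = x.split("\t", -1)
        new zzzzzzzz(....)
    }.toDF()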

    myDF.registerTempTable("tbl")
    sqlContext.sql("select count(1) from tbl").collect()

Any help/idea?

Thanks,
Younes Naguib
Triton Digital | 1440 Ste-Catherine W., Suite 1200 | Montreal, QC  H3G 1R8
Tel.: +1 514 448 4037 x2688 | Tel.: +1 866 448 4037 x2688
younes.nag...@tritondigital.com
