When loading multiple files, Spark loads each file as a partition (block). You
can then run a function on each partition by using
rdd.mapPartitions(function).

I think you can write a function that extracts everything after the offset
and pass it to mapPartitions to extract the relevant lines from
each file.
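A minimal sketch of what that per-partition function might look like, in plain Python so it runs without Spark. The names (extract_after_offset, OFFSET) and the line-based interpretation of "offset" are assumptions on my part, not something from your question:

```python
# Hypothetical example: skip everything before a given offset
# within each partition. OFFSET and the line-based offset are assumptions.
OFFSET = 2  # e.g., skip the first 2 lines of each file/partition

def extract_after_offset(lines):
    """Yield only the lines at or beyond OFFSET within this partition."""
    for i, line in enumerate(lines):
        if i >= OFFSET:
            yield line

# With Spark you would apply it roughly like this:
#   sc.textFile("path/to/files").mapPartitions(extract_after_offset)
# Here we simulate a single partition with a plain iterator:
partition = iter(["header1", "header2", "data1", "data2"])
result = list(extract_after_offset(partition))
print(result)  # ['data1', 'data2']
```

Note that mapPartitions hands the function an iterator over the partition's elements, so writing it as a generator keeps memory usage low.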

Hope this helps





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Loading-file-content-based-on-offsets-into-the-memory-tp22802p22804.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
