When loading multiple files, Spark loads each file as a partition (block). You
can then run a function on each partition by using
rdd.mapPartitions(function).

I think you can write a function that extracts everything after the offset
and pass it to mapPartitions to extract the relevant lines from
each file.
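A minimal sketch of what that per-partition function might look like, in plain Python so it runs without Spark. The names (extract_after_offset, OFFSET) and the line-based interpretation of "offset" are assumptions on my part, not something from your question:

```python
# Hypothetical example: skip everything before a given offset
# within each partition. OFFSET and the line-based offset are assumptions.
OFFSET = 2  # e.g., skip the first 2 lines of each file/partition

def extract_after_offset(lines):
    """Yield only the lines at or beyond OFFSET within this partition."""
    for i, line in enumerate(lines):
        if i >= OFFSET:
            yield line

# With Spark you would apply it roughly like this:
#   sc.textFile("path/to/files").mapPartitions(extract_after_offset)
# Here we simulate a single partition with a plain iterator:
partition = iter(["header1", "header2", "data1", "data2"])
result = list(extract_after_offset(partition))
print(result)  # ['data1', 'data2']
```

Note that mapPartitions hands the function an iterator over the partition's elements, so writing it as a generator keeps memory usage low.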

Hope this helps





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Loading-file-content-based-on-offsets-into-the-memory-tp22802p22804.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
