As far as I know you can't use a SparkContext inside an RDD function (such
as mapPartitions): it lives only on the driver and isn't serializable, so it
can't be shipped to the executors. You shouldn't need it there, though. Have
you considered returning the data read from the database from mapPartitions
to create a new RDD, and then just saving that to a file like normal?
For example:
    rddObject.mapPartitions(x => {
      x.map(getDataFromDB(_))
    }, true).saveAsTextFile("hdfs:///some-folder/")
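
Here is a fuller sketch of the same idea, in case it helps. Several things in it are my assumptions rather than anything from your code: the ExportFromDb object name, the stubbed-out getDataFromDB, and the sample input built with parallelize. It also assumes the master URL is supplied by spark-submit.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ExportFromDb {
  // Hypothetical stand-in for getDataFromDB from the thread; a real
  // version would query the database for the given key.
  def getDataFromDB(key: String): String = s"data-for-$key"

  def main(args: Array[String]): Unit = {
    // Master URL is assumed to be provided by spark-submit.
    val conf = new SparkConf().setAppName("ETCH VM Get FDC")
    val sc = new SparkContext(conf)
    try {
      // Placeholder input; in the thread, rddObject already exists.
      val rddObject = sc.parallelize(Seq("a", "b", "c", "d"), numSlices = 2)

      // This closure runs on the executors. It only touches the partition's
      // iterator and never references sc.
      val fetched = rddObject.mapPartitions(
        keys => keys.map(getDataFromDB),
        preservesPartitioning = true)

      // Called on the driver; each partition is written out as its own
      // part file under the target directory.
      fetched.saveAsTextFile("hdfs:///some-folder/")
    } finally {
      sc.stop()
    }
  }
}
```

Keep in mind that saveAsTextFile fails if the target directory already exists. If you really do need to write one file per partition by hand, one option is to construct a new org.apache.hadoop.conf.Configuration() inside the closure instead of capturing sc; it reads whatever Hadoop config files are on the executor's classpath, so settings made only on the driver may be missing.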
Does that make sense?
On Wed, Oct 1, 2014 at 12:52 AM, Henry Hung <[email protected]> wrote:
> Hi All,
>
>
>
> A noob question:
>
> How can I get the SparkContext inside mapPartitions?
>
>
>
> Example:
>
>
>
> Let’s say I have an rddObject that can be split into different partitions
> and assigned to multiple executors, to speed up exporting data from the
> database.
>
>
>
> Variable sc is created in the main program using these steps:
>
> val conf = new SparkConf().setAppName("ETCH VM Get FDC")
>
> val sc = new SparkContext(conf)
>
>
>
> and here is the mapPartitions code:
>
> rddObject.mapPartitions(x => {
>
> val uuid = java.util.UUID.randomUUID.toString
>
> val path = new org.apache.hadoop.fs.Path("hdfs:///some-folder/" + uuid)
>
> val fs = path.getFileSystem(sc.hadoopConfiguration)
>
> val pw = new PrintWriter(fs.create(path))
>
> while (x.hasNext) {
>
> // … do something here, like fetch data from the database and write it
> // to a Hadoop file
>
> pw.println(getDataFromDB(x.next))
>
> }
>
> })
>
>
>
> My question is: how can I use sc to get the hadoopConfiguration, so that
> I can create a Hadoop file?
>
>
>
> Best regards,
>
> Henry
>
>
--
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning
440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
E: [email protected] W: www.velos.io