Thanks, Simon! It helped a lot :D
Ciao,
r-


Xu (Simon) Chen wrote:
> Hi Roberto,
>
> Ultimately, the info you need is set here:
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L69
>
> Being a Spark newbie, I extended the org.apache.spark.rdd.HadoopRDD class
> as HadoopRDDWithEnv, which takes an additional parameter (varName) in the
> constructor, then overrode the compute() function to return something like
> """split.getPipeEnvVars.getOrElse(varName, "") + "|" + value.toString()"""
> as the value. This is obviously less general and makes certain assumptions
> about the input data. You also need to write several wrappers in
> SparkContext so that you can do something like
> sc.textFileWithEnv("hdfs path", "mapreduce_map_input_file").
>
> I was hoping to do something like
> sc.textFile("hdfs_path").pipe("""/usr/bin/awk "{print\"${mapreduce_map_input_file}\",$0}" """),
> but that gives me a weird Kryo buffer overflow exception... I haven't had
> a chance to look into the details yet.
>
> -Simon
>
> On Fri, Aug 1, 2014 at 7:38 AM, Roberto Torella <roberto.torella@> wrote:
>
>> Hi Simon,
>>
>> I'm trying to do the same but I'm quite lost.
>>
>> How did you do that? (Too direct? :)
>>
>> Thanks and ciao,
>> r-
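
P.S. For anyone landing on this thread later, here is a rough, untested sketch of what Simon's HadoopRDDWithEnv idea could look like. The class name, the varName parameter, and the prepend-to-the-value behavior come from his description above; everything else (constructor shape, the textFileWithEnv-style usage below) is an assumption, not real Spark API. Note that HadoopPartition and its getPipeEnvVars() helper are private[spark], so this file would have to be compiled inside the org.apache.spark.rdd package.

    // Untested sketch based on Simon's description. Lives in
    // org.apache.spark.rdd because HadoopPartition is private[spark].
    package org.apache.spark.rdd

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.{JobConf, TextInputFormat}
    import org.apache.spark.{InterruptibleIterator, Partition, SparkContext, TaskContext}

    class HadoopRDDWithEnv(
        sc: SparkContext,
        conf: JobConf,
        varName: String, // e.g. "mapreduce_map_input_file"
        minPartitions: Int)
      extends HadoopRDD[LongWritable, Text](
        sc, conf, classOf[TextInputFormat],
        classOf[LongWritable], classOf[Text], minPartitions) {

      override def compute(split: Partition, context: TaskContext)
          : InterruptibleIterator[(LongWritable, Text)] = {
        // getPipeEnvVars() exposes map_input_file / mapreduce_map_input_file
        // for FileSplits -- the spot in HadoopRDD.scala linked above.
        val envValue = split.asInstanceOf[HadoopPartition]
          .getPipeEnvVars().getOrElse(varName, "")
        val iter = super.compute(split, context).map {
          case (key, value) => (key, new Text(envValue + "|" + value.toString))
        }
        new InterruptibleIterator(context, iter)
      }
    }

Hypothetical usage (a sc.textFileWithEnv wrapper would hide this plumbing):

    val jobConf = new JobConf(sc.hadoopConfiguration)
    org.apache.hadoop.mapred.FileInputFormat.setInputPaths(jobConf, "hdfs://path/to/input")
    val lines = new HadoopRDDWithEnv(sc, jobConf, "mapreduce_map_input_file", 2)
      .map { case (_, text) => text.toString } // "file name|original line"

HadoopRDD also has a mapPartitionsWithInputSplit developer API that hands you the InputSplit (and hence the file name via FileSplit.getPath) directly, which may avoid subclassing entirely.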