Thanks, Simon! It helped a lot :D
Ciao,
r-


Xu (Simon) Chen wrote:
> Hi Roberto,
>
> Ultimately, the info you need is set here:
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L69
>
> Being a Spark newbie, I extended the org.apache.spark.rdd.HadoopRDD class
> as HadoopRDDWithEnv, which takes an additional parameter (varName) in the
> constructor, then overrode the compute() function to return something like
> """split.getPipeEnvVars.getOrElse(varName, "") + "|" + value.toString()"""
> as the value. This is obviously less general and makes certain assumptions
> about the input data. You also need to write several wrappers in
> SparkContext so that you can do something like
> sc.textFileWithEnv("hdfs path", "mapreduce_map_input_file").
>
> I was hoping to do something like
> sc.textFile("hdfs_path").pipe("""/usr/bin/awk "{print\"${mapreduce_map_input_file}\",$0}" """),
> but that gives me a weird Kryo buffer overflow exception... I haven't had
> a chance to look into the details yet.
>
> -Simon
>
> On Fri, Aug 1, 2014 at 7:38 AM, Roberto Torella <roberto.torella@> wrote:
>
>> Hi Simon,
>>
>> I'm trying to do the same but I'm quite lost.
>>
>> How did you do that? (Too direct? :)
>>
>> Thanks and ciao,
>> r-
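
P.S. For anyone landing on this thread later, here is a rough, untested sketch of what Simon's HadoopRDDWithEnv idea could look like. The class name, the varName parameter, and the prepend-to-the-value behavior come from his description above; everything else (constructor shape, the textFileWithEnv-style usage below) is an assumption, not real Spark API. Note that HadoopPartition and its getPipeEnvVars() helper are private[spark], so this file would have to be compiled inside the org.apache.spark.rdd package.

    // Untested sketch based on Simon's description. Lives in
    // org.apache.spark.rdd because HadoopPartition is private[spark].
    package org.apache.spark.rdd

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.{JobConf, TextInputFormat}
    import org.apache.spark.{InterruptibleIterator, Partition, SparkContext, TaskContext}

    class HadoopRDDWithEnv(
        sc: SparkContext,
        conf: JobConf,
        varName: String, // e.g. "mapreduce_map_input_file"
        minPartitions: Int)
      extends HadoopRDD[LongWritable, Text](
        sc, conf, classOf[TextInputFormat],
        classOf[LongWritable], classOf[Text], minPartitions) {

      override def compute(split: Partition, context: TaskContext)
          : InterruptibleIterator[(LongWritable, Text)] = {
        // getPipeEnvVars() exposes map_input_file / mapreduce_map_input_file
        // for FileSplits -- the spot in HadoopRDD.scala linked above.
        val envValue = split.asInstanceOf[HadoopPartition]
          .getPipeEnvVars().getOrElse(varName, "")
        val iter = super.compute(split, context).map {
          case (key, value) => (key, new Text(envValue + "|" + value.toString))
        }
        new InterruptibleIterator(context, iter)
      }
    }

Hypothetical usage (a sc.textFileWithEnv wrapper would hide this plumbing):

    val jobConf = new JobConf(sc.hadoopConfiguration)
    org.apache.hadoop.mapred.FileInputFormat.setInputPaths(jobConf, "hdfs://path/to/input")
    val lines = new HadoopRDDWithEnv(sc, jobConf, "mapreduce_map_input_file", 2)
      .map { case (_, text) => text.toString } // "file name|original line"

HadoopRDD also has a mapPartitionsWithInputSplit developer API that hands you the InputSplit (and hence the file name via FileSplit.getPath) directly, which may avoid subclassing entirely.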