Re: access hdfs file name in map()

2014-08-04 Thread Roberto Torella
> [...]t got a chance to look into the details yet.
>
> -Simon
>
> On Fri, Aug 1, 2014 at 7:38 AM, Roberto Torella <roberto.torella@...> wrote:
>> Hi Simon,
>>
>> I'm trying to do the same but I'm quite lost.

Re: access hdfs file name in map()

2014-08-01 Thread Xu (Simon) Chen
> [I'm trying to d]o the same but I'm quite lost.
>
> How did you do that? (Too direct? :)
>
> Thanks and ciao,
> r-

Re: access hdfs file name in map()

2014-08-01 Thread Roberto Torella
Hi Simon,

I'm trying to do the same but I'm quite lost.

How did you do that? (Too direct? :)

Thanks and ciao,
r-

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/access-hdfs-file-name-in-map-tp6551p11160.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: access hdfs file name in map()

2014-06-04 Thread Xu (Simon) Chen
N/M.. I wrote a HadoopRDD subclass and appended one env field of the HadoopPartition to the value in the compute function. It worked pretty well. Thanks!
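(For later readers of the archive: a rough sketch of what such a subclass could look like. The class name is made up, it is written from memory against Spark 1.x-era sources, and it leans on Spark internals: HadoopPartition is private[spark], so the file has to sit under an org.apache.spark package. Treat it as an illustration of the idea, not Simon's actual code.)

    // FileNameTextRDD.scala -- hypothetical; relies on Spark-internal types,
    // so it must live in a Spark-internal package to see private[spark] members.
    package org.apache.spark.rdd

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.{FileSplit, JobConf, TextInputFormat}
    import org.apache.spark.{InterruptibleIterator, Partition, SparkContext, TaskContext}

    class FileNameTextRDD(sc: SparkContext, conf: JobConf, minPartitions: Int)
      extends HadoopRDD[LongWritable, Text](
        sc, conf, classOf[TextInputFormat], classOf[LongWritable], classOf[Text], minPartitions) {

      override def compute(split: Partition, context: TaskContext)
          : InterruptibleIterator[(LongWritable, Text)] = {
        // For file-based input the wrapped InputSplit is a FileSplit, which knows its path.
        val file = split.asInstanceOf[HadoopPartition]
          .inputSplit.value.asInstanceOf[FileSplit].getPath.toString
        val base = super.compute(split, context)
        // Append the file name to every line as an extra CSV column.
        new InterruptibleIterator(context, base.map { case (k, v) =>
          (k, new Text(v.toString + "," + file))
        })
      }
    }

(It would be driven by a JobConf whose input paths are set with FileInputFormat.setInputPaths(conf, "hdfs://test/path/*"), and then used like any other RDD[(LongWritable, Text)].)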

Re: access hdfs file name in map()

2014-06-03 Thread Xu (Simon) Chen
I don't quite get it.. mapPartitionsWithIndex takes a function that maps an integer index and an iterator to another iterator. How does that help with retrieving the HDFS file name? I am obviously missing some context..

Thanks.

Re: access hdfs file name in map()

2014-05-29 Thread Aaron Davidson
Currently there is not a way to do this using textFile(). However, you could pretty straightforwardly define your own subclass of HadoopRDD [1] in order to get access to this information (likely using mapPartitionsWithIndex to look up the InputSplit for a particular partition). Note that sc.textFi[...]
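(A minimal sketch of the per-partition look-up Aaron describes, for readers of the archive. It goes through HadoopRDD.mapPartitionsWithInputSplit, which as far as I recall was only added around Spark 1.1, i.e. after this thread, so on older versions the subclass route sketched further up is still needed. The variable names are made up.)

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.{FileSplit, TextInputFormat}
    import org.apache.spark.rdd.HadoopRDD

    // sc.textFile() is a thin wrapper around hadoopFile(); calling hadoopFile()
    // directly exposes the underlying HadoopRDD.
    val raw = sc.hadoopFile[LongWritable, Text, TextInputFormat]("hdfs://test/path/*")

    val withFileNames = raw.asInstanceOf[HadoopRDD[LongWritable, Text]]
      .mapPartitionsWithInputSplit { (split, iter) =>
        // For text input each partition's split is a FileSplit, which knows its path.
        val file = split.asInstanceOf[FileSplit].getPath.toString
        iter.map { case (_, line) => (file, line.toString) }
      }

(withFileNames is then an RDD[(String, String)] of (file path, line) pairs.)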

access hdfs file name in map()

2014-05-29 Thread Xu (Simon) Chen
Hello,

A quick question about using Spark to parse text-format CSV files stored on HDFS. I have something very simple:

    sc.textFile("hdfs://test/path/*").map(line => line.split(",")).map(p => (XXX, p(0), p(2)))

Here, I want to replace XXX with a string: the name of the CSV file that the current line came from. [...]
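(Closing the loop for the archive: with an RDD of (file name, line) pairs like the withFileNames sketch further up the thread -- a made-up name, not from the original posts -- the pipeline above becomes, roughly:)

    val result = withFileNames.map { case (file, line) =>
      val p = line.split(",")
      (file, p(0), p(2))   // file name in place of XXX, plus the 1st and 3rd CSV fields
    }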