> Haven't got a chance to look into the
> details yet.
>
> -Simon
>
>
>
> On Fri, Aug 1, 2014 at 7:38 AM, Roberto Torella <roberto.torella@> wrote:
>
>> Hi Simon,
>>
>> I'm trying to do the same but I'm quite lost.
>>
>>
>> How did you do that? (Too direct? :)
>>
>> Thanks and ciao,
>> r-
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/access-hdfs-file-name-in-map-tp6551p11160.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
N/M.. I wrote a HadoopRDD subclass and appended an env field of the
HadoopPartition to the value in the compute function. It worked pretty well.
Thanks!
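For later readers, here is a minimal, Spark-free sketch of the pattern described above. FilePartition and compute are hypothetical stand-ins for Spark's HadoopPartition and HadoopRDD.compute; in a real subclass the path would come from the partition's InputSplit rather than being stored directly.

```scala
// Illustrative sketch only (no Spark dependency): each partition
// remembers which file its records came from, and compute() appends
// that file path to every value it emits.
object TagWithFileName {
  // Hypothetical stand-in for HadoopPartition carrying its file path.
  case class FilePartition(index: Int, path: String, lines: Seq[String])

  // Mirrors an overridden compute(): emit (filePath, value) pairs.
  def compute(part: FilePartition): Iterator[(String, String)] =
    part.lines.iterator.map(line => (part.path, line))

  def main(args: Array[String]): Unit = {
    val part = FilePartition(0, "hdfs://test/path/part-00000", Seq("a,b,c", "d,e,f"))
    compute(part).foreach(println)
  }
}
```

In the real subclass, the compute override would wrap the iterator returned by HadoopRDD's own compute in the same way.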
On Jun 4, 2014 12:22 AM, "Xu (Simon) Chen" wrote:
I don't quite get it..
mapPartitionsWithIndex takes a function that maps an integer index and an
iterator to another iterator. How does that help with retrieving the HDFS
file name?
I am obviously missing some context..
Thanks.
On May 30, 2014 1:28 AM, "Aaron Davidson" wrote:
Currently there is not a way to do this using textFile(). However, you
could pretty straightforwardly define your own subclass of HadoopRDD [1] in
order to get access to this information (likely using
mapPartitionsWithIndex to look up the InputSplit for a particular
partition).
Note that sc.textFi
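A Spark-free sketch of the lookup idea being suggested here (the names below are illustrative, not Spark's API): the driver can build a table from partition index to file name, and the function handed to mapPartitionsWithIndex uses the index to tag every record in that partition with its source file.

```scala
// Illustrative sketch only: a per-partition file table plus an
// index-aware mapper, mimicking mapPartitionsWithIndex over a
// HadoopRDD whose partitions each come from exactly one InputSplit.
object SplitLookup {
  // In real code this table would be derived from the HadoopRDD's
  // partitions, each of which wraps one InputSplit.
  val fileForPartition = Vector("hdfs://test/path/a.csv", "hdfs://test/path/b.csv")

  // Mimics the closure passed to rdd.mapPartitionsWithIndex.
  def tagPartition(index: Int, iter: Iterator[String]): Iterator[(String, String)] = {
    val file = fileForPartition(index)
    iter.map(line => (file, line))
  }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(Seq("1,2"), Seq("3,4"))
    partitions.zipWithIndex.foreach { case (lines, i) =>
      tagPartition(i, lines.iterator).foreach(println)
    }
  }
}
```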
Hello,
A quick question about using spark to parse text-format CSV files stored on
hdfs.
I have something very simple:
sc.textFile("hdfs://test/path/*").map(line => line.split(",")).map(p =>
(XXX, p(0), p(2)))
Here, I want to replace XXX with a string, which is the current csv
filename for the line.