Hello,
A quick question about using Spark to parse text-format CSV files stored on
HDFS.
I have something very simple:
sc.textFile("hdfs://test/path/*").map(line => line.split(",")).map(p =>
(XXX, p(0), p(2)))
Here, I want to replace XXX with a string holding the name of the CSV file
that the current line came from. I need this because some information, such
as the date, may be encoded in the file name.
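To make the goal concrete, here is a minimal plain-Scala sketch (no Spark involved) of the per-line result I am after. The `tagLine` helper and the sample file name are made up for illustration; in the real job the file name would have to come from Spark itself, which is the part I don't know how to do.

```scala
// Hypothetical helper: pairs the source file name with columns 0 and 2 of a CSV line.
// Assumes simple CSV (no quoted commas) with at least three columns.
def tagLine(fileName: String, line: String): (String, String, String) = {
  val p = line.split(",")
  (fileName, p(0), p(2))
}

// Made-up file name and line, only to show the shape of the result.
val row = tagLine("events_2014-01-15.csv", "a,b,c")
```

So each output record would carry its originating file name as the first field, and I could parse the date out of it afterwards.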
In Hive, I am able to define an external table and use INPUT__FILE__NAME as
a column in queries. I wonder if Spark has something similar.
Thanks!
-Simon