Hello,

A quick question about using Spark to parse text-format CSV files stored on
HDFS.

I have something very simple:
sc.textFile("hdfs://test/path/*").map(line => line.split(",")).map(p =>
(XXX, p(0), p(2)))

Here, I want to replace XXX with a string holding the name of the CSV file
that the current line came from. I need this because some information, such
as the date, may be encoded in the file name.
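
To make the use case concrete, here is roughly what I would do with the file
name once I have it. This is only a sketch: the events_2014-05-01.csv naming
scheme and the FileNameDate helper are made up for illustration, and it
assumes Java 8 for java.time.

```scala
import java.time.LocalDate
import scala.util.Try

object FileNameDate {
  // Hypothetical naming scheme: an ISO date (yyyy-MM-dd) somewhere in the file name.
  private val DatePattern = """\d{4}-\d{2}-\d{2}""".r

  /** Returns the first valid ISO date found in the last path segment, if any. */
  def dateOf(path: String): Option[LocalDate] = {
    // Keep only the part after the final '/', i.e. the file name itself.
    val fileName = path.substring(path.lastIndexOf('/') + 1)
    DatePattern
      .findFirstIn(fileName)
      .flatMap(d => Try(LocalDate.parse(d)).toOption)
  }
}
```

So FileNameDate.dateOf("hdfs://test/path/events_2014-05-01.csv") would give
me the date, and a file name without a date would give None.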

In Hive, I can define an external table and use INPUT__FILE__NAME as a
column in queries. I wonder whether Spark has something similar.

Thanks!
-Simon
