You have to parse each line of the file at some point anyway, so adding
a step that filters out the header should work fine. Spark pipelines the
filter with your parsing/conversion to ints, so both run in the same
pass over the data and there's no significant overhead aside from the
check itself.

For standalone programs, there's a section in the PySpark programming
guide, along with a link to a complete example:
http://spark.incubator.apache.org/docs/latest/python-programming-guide.html#standalone-programs
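
Here's a rough sketch of the filter-then-parse approach, in case it helps. The header text, the function names, and the assumption that every field is an int are all made up for illustration; adjust them to match your file.

```python
# Hypothetical header line; replace it with the first line of your own file.
HEADER = "id,value,count"


def is_data_line(line):
    """Return True for every line except the header."""
    return line.strip() != HEADER


def parse_line(line):
    """Split a comma-separated line and convert each field to an int."""
    return [int(field) for field in line.strip().split(",")]


def load_rows(sc, path):
    """Build an RDD of int lists from a CSV file, skipping the header.

    `sc` is an existing SparkContext. filter() and map() are both lazy
    transformations, so nothing is read until an action such as collect()
    or count() is called on the result.
    """
    return sc.textFile(path).filter(is_data_line).map(parse_line)


# Example use inside a standalone program (path is a placeholder):
#   from pyspark import SparkContext
#   sc = SparkContext("local", "CsvHeaderExample")
#   rows = load_rows(sc, "data.csv").collect()
```

Comparing each line against the header text, rather than dropping "the first line", also works when the input is several files that each start with the same header, since every copy gets filtered out.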