I am wondering if anyone can help. Our company stores zipped CSV files in S3, which has been a big headache from the start. Has anyone worked out a way to iterate through several subdirectories in S3 (s3n://events/2016/03/01/00, s3n://events/2016/03/01/01, etc.) to find the newest files and load them? It would be a big bonus if the process also handled unzipping, so that the CSV could be loaded directly into a DataFrame for further processing. I'm pretty sure the S3 part of this request is not uncommon; I'd guess it's the files being zipped that is unusual. I'm new to Scala and Spark, so any help would be truly appreciated and a great aid to learning.
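
To make the question concrete, here is a rough sketch of the kind of thing I'm after. It is only a sketch under assumptions, not something I have verified end to end: it assumes a Spark 2.x SparkSession, a bucket layout like s3n://events/yyyy/MM/dd/HH/, and the names LoadNewestS3Csvs and the take(10) cutoff are just placeholders I made up.

import java.util.zip.ZipInputStream
import scala.collection.mutable.ArrayBuffer
import scala.io.Source

import org.apache.hadoop.fs.{LocatedFileStatus, Path}
import org.apache.spark.sql.SparkSession

object LoadNewestS3Csvs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("load-newest-s3-csvs").getOrCreate()
    import spark.implicits._

    // Walk every file under the bucket root (recursive = true) and keep the
    // ten most recently modified, using the Hadoop FileSystem API, which
    // also speaks s3n:// URLs.
    val root = new Path("s3n://events/")
    val fs = root.getFileSystem(spark.sparkContext.hadoopConfiguration)
    val files = ArrayBuffer.empty[LocatedFileStatus]
    val it = fs.listFiles(root, true)
    while (it.hasNext) files += it.next()
    val newest = files.sortBy(-_.getModificationTime).take(10).map(_.getPath.toString)

    // Gzip-compressed files (*.csv.gz) are decompressed transparently by
    // Spark's CSV reader, so they load like plain CSV.
    val gzDf = spark.read
      .option("header", "true")
      .csv(newest.filter(_.endsWith(".gz")): _*)
    gzDf.show()

    // Real .zip archives have no built-in Spark codec, so open each archive
    // by hand and stream the entries out as lines of text.
    val zipPaths = newest.filter(_.endsWith(".zip"))
    if (zipPaths.nonEmpty) {
      val lines = spark.sparkContext
        .binaryFiles(zipPaths.mkString(","))
        .flatMap { case (_, stream) =>
          val zis = new ZipInputStream(stream.open())
          Iterator.continually(zis.getNextEntry)
            .takeWhile(_ != null)
            // read() on a ZipInputStream stops at the end of the current
            // entry, so this collects one entry's lines at a time. Note that
            // each entry contributes its own header row, which a real job
            // would need to filter out.
            .flatMap(_ => Source.fromInputStream(zis).getLines().toList)
            .toList
        }
      // csv(Dataset[String]) needs Spark 2.2+; on older versions the lines
      // would have to be split manually and mapped to a case class.
      val zipDf = spark.read.option("header", "true").csv(lines.toDS())
      zipDf.show()
    }
  }
}

(If the "zipped" files are actually gzip rather than zip archives, the whole second branch is unnecessary, since Spark decompresses .gz on the fly.)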
Thanks,
Ben