Hi Xinh,
I tried to wrap it, but it still didn’t work. I got a
"java.util.ConcurrentModificationException".
All,
I have been trying and trying with some help from a coworker, but it’s slow
going. I have been able to gather a list of the S3 files I need to download.
### S3 Lists ###
import scala
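In short, it does something like this (a rough sketch using the AWS Java SDK;
the bucket and prefix are placeholders):

import scala.collection.JavaConverters._
import com.amazonaws.services.s3.AmazonS3Client

val s3 = new AmazonS3Client()

// Placeholder bucket and date prefix; swap in the real layout.
val listing = s3.listObjects("my-bucket", "2016/03/09/")
val keys = listing.getObjectSummaries.asScala.map(_.getKey).toList

// Note: a listing can be truncated; follow up with
// s3.listNextBatchOfObjects(listing) while listing.isTruncated is true.
keys.foreach(println)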
Could you wrap the ZipInputStream in a List, since a subtype of
TraversableOnce[?] is required?
case (name, content) => List(new ZipInputStream(content.open))
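In context it would look roughly like this (just a sketch; the path, and
reading the entry afterwards, are my assumptions):

import java.io.{BufferedReader, InputStreamReader}
import java.util.zip.ZipInputStream

// Placeholder path; sc.binaryFiles yields (name, PortableDataStream) pairs.
val zips = sc.binaryFiles("s3n://my-bucket/2016/03/09/*.zip")

val lines = zips.flatMap { case (name, content) =>
  // The List wrapper satisfies flatMap's TraversableOnce requirement.
  List(new ZipInputStream(content.open)).flatMap { zis =>
    zis.getNextEntry // single-file archive: advance to its one entry
    val reader = new BufferedReader(new InputStreamReader(zis))
    Iterator.continually(reader.readLine()).takeWhile(_ != null)
  }
}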
Xinh
On Wed, Mar 9, 2016 at 7:07 AM, Benjamin Kim wrote:
> Hi Sabarish,
>
> I found a similar posting online where I should use the S3 listKeys.
Hi Sabarish,
I found a similar posting online where I should use the S3 listKeys.
http://stackoverflow.com/questions/24029873/how-to-read-multiple-text-files-into-a-single-rdd
Is this what you were thinking?
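If I read it right, the keys we gather can just be joined into one
comma-separated path (made-up keys below):

// Placeholder keys; compressed files only work through textFile when the
// codec is one Hadoop supports (e.g. gzip).
val keys = List(
  "s3n://my-bucket/2016/03/09/a.csv",
  "s3n://my-bucket/2016/03/09/b.csv")

// sc.textFile accepts a comma-separated list of paths.
val rdd = sc.textFile(keys.mkString(","))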
And, your assumption is correct. The zipped CSV file contains only a single
file. I f
Oozie may be able to do this for you and integrate with Spark.
> On 09 Mar 2016, at 06:03, Benjamin Kim wrote:
>
> I am wondering if anyone can help.
>
> Our company stores zipped CSV files in S3, which has been a big headache from
> the start. I was wondering if anyone has created a way to i
You can use S3's listKeys API and do a diff between consecutive listKeys to
identify what's new.
Are there multiple files in each zip? Single-file archives are processed
just like text, as long as the compression format is a supported one.
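The diff itself is trivial once you keep the last run's keys around (names
here are illustrative):

// Keys remembered from the previous run (persisted between runs in practice).
val previous = Set("2016/03/08/a.csv.zip")

// Keys returned by the latest listKeys call.
val current = Set("2016/03/08/a.csv.zip", "2016/03/09/b.csv.zip")

val newKeys = current -- previous // Set("2016/03/09/b.csv.zip")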
Regards
Sab
On Wed, Mar 9, 2016 at 10:33 AM, Benjamin Kim wrote:
https://issues.apache.org/jira/browse/SPARK-3586 talks about creating a
file DStream that can monitor for new files recursively, but this
functionality has not been added yet.
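What does exist today is single-directory monitoring, roughly like this
(placeholder path):

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(60))

// textFileStream watches one directory only -- no recursion into subfolders.
val csv = ssc.textFileStream("s3n://my-bucket/2016/03/09/")
csv.print()

ssc.start()
ssc.awaitTermination()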
I don't see an easy way out. You will have to create your folders based on a
timeline (it looks like you are already doing that) and