
Re: Spark Streaming FileStream Nested File Support

2015-04-04 Thread Akhil Das

We have a custom build of Spark Streaming that does the nested S3 lookups faster (it uses the native S3 APIs). You can find the source code here: https://github.com/sigmoidanalytics/spark-modified. In particular the changes from here

Re: Spark Streaming FileStream Nested File Support

2015-04-03 Thread Tathagata Das

Yes, this can definitely be added. Just haven't gotten around to doing it :) There are proposals for this that you can try - https://github.com/apache/spark/pull/2765/files . Have to review it at some point. On Fri, Apr 3, 2015 at 1:08 PM, Adam Ritter wrote: > That doesn't seem like a good solution

Re: Spark Streaming FileStream Nested File Support

2015-04-03 Thread Adam Ritter

That doesn't seem like a good solution, unfortunately, as I need this to work in a production environment. Do you know why the limitation exists for FileInputDStream in the first place? Unless I'm missing something important about how some of the internals work, I don't see why this feat

Re: Spark Streaming FileStream Nested File Support

2015-04-03 Thread Tathagata Das

A sort-of-hacky workaround is to use a queueStream, where you can manually create RDDs (using sparkContext.hadoopFile) and insert them into the queue. Note that this is for testing only, as queueStream does not work with driver fault recovery. TD On Fri, Apr 3, 2015 at 12:23 PM, adamgerst wrote: > So a
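
For anyone reading this thread later, here is a rough sketch of what the queueStream workaround could look like in Scala. It is not code from the thread. The directory paths, the glob pattern, the batch interval, the local master setting and the object name are all assumptions made for illustration. As noted above, queueStream does not work with driver fault recovery, so treat this as something for testing only.

```scala
import scala.collection.mutable

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical object name; a sketch of the workaround, not a production design.
object NestedQueueStreamSketch {
  def main(args: Array[String]): Unit = {
    // Master and batch interval are assumptions for a local test run.
    val conf = new SparkConf()
      .setAppName("nested-queue-stream-sketch")
      .setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))
    val sc = ssc.sparkContext

    // The queue feeds the stream. With the default oneAtATime = true,
    // each batch interval consumes one RDD from the queue.
    val queue = new mutable.Queue[RDD[String]]()
    val lines = ssc.queueStream(queue)
    lines.count().print()

    ssc.start()

    // Hypothetical nested directories. Each one is read by hand with
    // hadoopFile, since the file stream itself does not descend into them.
    val nestedDirs = Seq("/data/in/2015/04/03/*", "/data/in/2015/04/04/*")
    for (dir <- nestedDirs) {
      val rdd = sc
        .hadoopFile[LongWritable, Text, TextInputFormat](dir)
        .map { case (_, text) => text.toString }
      // Guard the queue, since the streaming scheduler reads it from another thread.
      queue.synchronized { queue += rdd }
    }

    ssc.awaitTermination()
  }
}
```

In a real test you would probably replace the fixed nestedDirs list with your own directory listing logic that runs periodically and enqueues one RDD per newly discovered directory.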