You can run a separate Spark job to fetch the data and write it to disk/HDFS,
then create a DStream that watches an HDFS folder. As you move files into
that folder, the DStream will pick them up.
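
A minimal sketch of that setup; the HDFS path and the 10-second batch
interval below are placeholders, not from this thread:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("HdfsFolderStream")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Spark Streaming monitors the directory and turns each file that
    // appears into input for the next micro-batch; move files in
    // atomically (e.g. an HDFS rename) so they arrive complete.
    val lines = ssc.textFileStream("hdfs:///user/example/incoming")
    lines.print()

    ssc.start()
    ssc.awaitTermination()
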
Regards
Mayur
On 6 Jun 2014 21:13, "Gianluca Privitera" <
gianluca.privite...@studio.unibo.it> wrote:

> Where are the APIs for QueueStream and RddQueue?
> In my solution I cannot open a DStream on an S3 location directly, because
> I need to run a script on the file (a script that unfortunately doesn't
> accept stdin as input), so I have to download it to disk somehow and then
> handle it from there before creating the stream.
>
> Thanks
> Gianluca
>
> On 06/06/2014 02:19, Mayur Rustagi wrote:
>
> You can look at creating a DStream directly from the S3 location using a
> file stream. If you need custom logic instead, you can rely on queueStream:
> read the data yourself from S3, process it, and push the resulting RDDs
> into the queue.
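>
> A minimal sketch of the queueStream approach in Scala; fetchAndProcess()
> below is a hypothetical helper standing in for your own S3 download and
> script step:
>
>     import scala.collection.mutable
>     import org.apache.spark.SparkConf
>     import org.apache.spark.rdd.RDD
>     import org.apache.spark.streaming.{Seconds, StreamingContext}
>
>     val conf = new SparkConf().setAppName("S3QueueStream")
>     val ssc = new StreamingContext(conf, Seconds(10))
>
>     // Each RDD pushed into this queue becomes one micro-batch.
>     val rddQueue = new mutable.Queue[RDD[String]]()
>     val stream = ssc.queueStream(rddQueue)
>     stream.print()
>     ssc.start()
>
>     // Driver-side loop: fetchAndProcess() is hypothetical; it should
>     // download one file from S3, run your script on it, and return
>     // the script's output lines.
>     while (true) {
>       val outputLines: Seq[String] = fetchAndProcess()
>       rddQueue.synchronized {
>         rddQueue += ssc.sparkContext.parallelize(outputLines)
>       }
>     }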
>
>  Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
>  @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
>
>
> On Fri, Jun 6, 2014 at 3:00 AM, Gianluca Privitera <
> gianluca.privite...@studio.unibo.it> wrote:
>
>> Hi,
>> I've got a weird question, but maybe someone has already dealt with it.
>> My Spark Streaming application needs to
>> - download a file from an S3 bucket,
>> - run a script with the file as input,
>> - create a DStream from this script output.
>> I've already got the second part done with the rdd.pipe() API, which
>> really fits my needs, but I have no idea how to manage the first part.
>> How can I download a file and run a script on it inside a Spark
>> Streaming application?
>> Should I use process() from Scala, or won't that work?
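>>
>> A minimal sketch of the download-and-run step with scala.sys.process; the
>> bucket, key, and script path are placeholders, and the AWS CLI call is
>> just one way to fetch the object (an S3 SDK call works too):
>>
>>     import java.io.File
>>     import scala.sys.process._
>>
>>     val local = new File("/tmp/input.dat")
>>
>>     // Download the S3 object to local disk via the AWS CLI (assumes
>>     // the CLI is installed and configured on the driver).
>>     val exit = Seq("aws", "s3", "cp", "s3://my-bucket/my-key", local.getPath).!
>>     require(exit == 0, s"download failed with exit code $exit")
>>
>>     // Run the script on the downloaded file and capture its stdout.
>>     val output: String = Seq("/path/to/script.sh", local.getPath).!!
>>     val outputLines: Seq[String] = output.split("\n").toSeq
>>
>> The resulting lines can then be parallelized into an RDD and fed into a
>> queueStream, as in the sketch earlier in the thread.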
>>
>> Thanks
>> Gianluca
>>
>>
>
>
