Hi,
The reason we want to use this method is that it lets a file be consumed 
by several streaming apps simultaneously (they just consume its path from 
Kafka and open it locally).
 
With fileStream, to parallelize the processing of a specific file I would have 
to make several copies of it, which is wasteful in terms of both space and time.
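To make the idea concrete, here is a minimal sketch of the node-to-partition routing I have in mind. It assumes a static, agreed-upon list of datanode hostnames shared by producer and consumers; the topic name, node list, and helper name are all hypothetical, not anything from Kafka itself:

```python
# Hypothetical sketch: pin each datanode hostname to a fixed Kafka partition,
# so the producer can send a file's HDFS path to the partition owned by the
# consumer running on that node. Assumes a static node list known to all sides.

NODES = ["datanode1", "datanode2", "datanode3"]  # example cluster

def partition_for_node(hostname, nodes=NODES):
    """Return the partition index assigned to the node holding the file's blocks."""
    try:
        return nodes.index(hostname)
    except ValueError:
        raise KeyError("unknown node: %s" % hostname)

# A producer would then publish with an explicit partition, e.g. (pseudocode):
#   producer.send("file-paths", partition=partition_for_node(host), value=hdfs_path)
# and the consumer on each node would subscribe only to its own partition.
```

The catch is that this only works if each consumer can be told to read exactly its own partition rather than letting the group rebalance partitions arbitrarily, which is what my question below is really about.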

Thanks,
Daniel

> On Mar 16, 2015, at 22:12, Gwen Shapira <[email protected]> wrote:
> 
> Any reason not to use SparkStreaming directly with HDFS files, so
> you'll get locality guarantees from the Hadoop framework?
> StreamContext has textFileStream() method you could use for this.
> 
> On Mon, Mar 16, 2015 at 12:46 PM, Daniel Haviv
> <[email protected]> wrote:
>> Hi,
>> Is it possible to assign specific partitions to specific nodes?
>> I want to upload files to HDFS, find out on which nodes the file resides
>> and then push their path into a topic and partition it by nodes.
>> This way I can ensure that the consumer (Spark Streaming) will consume both
>> the message and file locally.
>> 
>> Can this be achieved ?
>> 
>> Thanks,
>> Daniel