Hi,
The reason we want to use this method is that a file can then be consumed by several streaming apps simultaneously (they just consume its path from Kafka and open it locally). With fileStream, parallelizing the processing of a specific file would require making several copies of it, which is wasteful in terms of both space and time.
Thanks,
Daniel

> On March 16, 2015, at 22:12, Gwen Shapira <[email protected]> wrote:
>
> Any reason not to use Spark Streaming directly with HDFS files, so
> you'll get locality guarantees from the Hadoop framework?
> StreamingContext has a textFileStream() method you could use for this.
>
> On Mon, Mar 16, 2015 at 12:46 PM, Daniel Haviv
> <[email protected]> wrote:
>> Hi,
>> Is it possible to assign specific partitions to specific nodes?
>> I want to upload files to HDFS, find out on which nodes each file resides,
>> and then push the paths into a topic partitioned by node.
>> This way I can ensure that the consumer (Spark Streaming) will consume both
>> the message and the file locally.
>>
>> Can this be achieved?
>>
>> Thanks,
>> Daniel
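For what it's worth, the routing step in the scheme above could be sketched roughly as follows. This is a minimal illustration, assuming one Kafka partition per datanode; the hostnames and helper names (`partition_for_node`, `route_file`) are hypothetical, and in a real deployment the block-hosting nodes would come from Hadoop's FileSystem.getFileBlockLocations() rather than being passed in directly.

```python
# Hypothetical sketch: route an HDFS file path to the Kafka partition
# "owned" by one of the nodes holding its blocks. Assumes a fixed
# one-partition-per-node layout; all names are illustrative.

def partition_for_node(node: str, nodes: list[str]) -> int:
    """Map a datanode hostname to its dedicated Kafka partition index."""
    try:
        return sorted(nodes).index(node)
    except ValueError:
        raise KeyError(f"unknown node: {node}")

def route_file(path: str, block_hosts: list[str], nodes: list[str]) -> tuple[int, str]:
    """Pick the partition of the first known node hosting the file's blocks.

    block_hosts would come from FileSystem.getFileBlockLocations() in
    practice; here it is supplied by the caller.
    """
    for host in block_hosts:
        if host in nodes:
            return partition_for_node(host, nodes), path
    raise RuntimeError(f"no known node hosts {path}")

nodes = ["dn1", "dn2", "dn3"]
print(route_file("/data/events/part-0001", ["dn2", "dn3"], nodes))
# -> (1, '/data/events/part-0001')
```

A consumer pinned to partition 1 on dn2 would then receive only paths whose blocks are local to it, which is the locality property Daniel is after.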
