Will you be able to paste code here?

On 23 April 2015 at 22:21, Pat Ferrel <p...@occamsmachete.com> wrote:

> I'm using Spark Streaming to create a large volume of small nano-batch input
> files, ~4k per file, thousands of 'part-xxxxx' files. When reading the
> nano-batch files and doing a distributed calculation, my tasks run only on
> the machine where the driver was launched. I'm launching in "yarn-client"
> mode. The RDD is created using sc.textFile("list of thousand files").
>
> What would cause the read to occur only on the machine that launched the
> driver?
>
> Do I need to do something to the RDD after reading? Has some partition
> factor been applied to all derived RDDs?
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>
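For what it's worth, below is a minimal sketch (Scala) of the pattern described above, with an explicit repartition added after the read. The HDFS path, app name, and partition count are assumptions, not taken from the original post. sc.textFile on thousands of tiny files typically produces roughly one partition per file, and repartitioning is one common way to spread the downstream work across the cluster, though it may not address the underlying locality question.

import org.apache.spark.{SparkConf, SparkContext}

object NanoBatchRead {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("nano-batch-read") // hypothetical app name
    val sc = new SparkContext(conf)

    // sc.textFile accepts a comma-separated list of paths or a glob.
    // With thousands of ~4k files this gives roughly one partition per file.
    val raw = sc.textFile("hdfs:///streams/out/part-*") // hypothetical path

    println("partitions after read: " + raw.partitions.length)

    // Force a shuffle so the data (and the tasks that follow) are spread
    // across the cluster instead of staying on one node.
    val spread = raw.repartition(sc.defaultParallelism)

    // ... distributed calculation on `spread` goes here ...

    sc.stop()
  }
}

Note that repartition triggers a full shuffle, so it only pays off when the calculation that follows is substantially heavier than the data movement.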
