Might I ask why VFS? I'm new to VFS and not sure whether or not it predates
the Hadoop file system interfaces (HCFS).

After all, Spark natively supports any HCFS by leveraging the Hadoop
FileSystem API, class loaders, and so on.

So simply putting those resources on your classpath should be sufficient to
connect directly to S3 by using the sc.hadoopFile(...) commands.
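
To make that concrete, here is a rough PySpark sketch of what I mean. It
assumes the Hadoop jars that provide the s3n connector are already on the
driver and executor classpath, and it passes credentials through Spark's
"spark.hadoop.*" properties, which Spark copies into the Hadoop configuration.
The helper names (s3n_spark_conf, read_lines), the app name, and the bucket
path are made up for illustration; none of them come from your setup.

```python
def s3n_spark_conf(access_key, secret_key):
    """Build Spark properties that forward s3n credentials to Hadoop.

    Spark copies every "spark.hadoop.*" property into the Hadoop
    Configuration used by the FileSystem layer, so these end up as
    fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey.
    """
    return {
        "spark.hadoop.fs.s3n.awsAccessKeyId": access_key,
        "spark.hadoop.fs.s3n.awsSecretAccessKey": secret_key,
    }


def read_lines(sc, path):
    """Read a text file; Spark resolves the path's scheme (e.g. s3n://)
    through whichever Hadoop FileSystem is registered for it."""
    return sc.textFile(path)


# Usage on a machine with PySpark installed (not executed here; the app
# name and bucket path are hypothetical):
#   import os
#   from pyspark import SparkConf, SparkContext
#   props = s3n_spark_conf(os.environ["AWS_ACCESS_KEY_ID"],
#                          os.environ["AWS_SECRET_ACCESS_KEY"])
#   conf = SparkConf().setAppName("s3n-example").setAll(list(props.items()))
#   sc = SparkContext(conf=conf)
#   print(read_lines(sc, "s3n://example-bucket/some/prefix/").take(5))
```

Note that this only helps if the job reads its input through Spark/Hadoop
calls such as sc.textFile or sc.hadoopFile. A custom RDD that opens streams
through its own path abstraction would still need that layer to understand
the s3n scheme.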
On May 13, 2015 12:16 PM, "Akhil Das" <ak...@sigmoidanalytics.com> wrote:

> Did you happen to have a look at this? https://github.com/abashev/vfs-s3
>
> Thanks
> Best Regards
>
> On Tue, May 12, 2015 at 11:33 PM, Stephen Carman <scar...@coldlight.com>
> wrote:
>
> > We have a small Mesos cluster, and these slaves need to have a VFS set up
> > on them so that the slaves can pull down the data they need from S3 when
> > Spark runs.
> >
> > There doesn’t seem to be any obvious guidance online on how to do this, or
> > on how to accomplish it easily. Does anyone have some best practices or
> > some ideas about how to accomplish this?
> >
> > An example stack trace when a job is run on the Mesos cluster…
> >
> > Any idea how to get this going? Like somehow bootstrapping Spark on run or
> > something?
> >
> > Thanks,
> > Steve
> >
> >
> > java.io.IOException: Unsupported scheme s3n for URI s3n://removed
> >         at com.coldlight.ccc.vfs.NeuronPath.toPath(NeuronPath.java:43)
> >         at com.coldlight.neuron.data.ClquetPartitionedData.makeInputStream(ClquetPartitionedData.java:465)
> >         at com.coldlight.neuron.data.ClquetPartitionedData.access$200(ClquetPartitionedData.java:42)
> >         at com.coldlight.neuron.data.ClquetPartitionedData$Iter.<init>(ClquetPartitionedData.java:330)
> >         at com.coldlight.neuron.data.ClquetPartitionedData.compute(ClquetPartitionedData.java:304)
> >         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> >         at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> >         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> >         at org.apache.spark.scheduler.Task.run(Task.scala:64)
> >         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >         at java.lang.Thread.run(Thread.java:745)
> > 15/05/12 13:57:51 ERROR Executor: Exception in task 0.1 in stage 0.0 (TID 1)
> > java.lang.RuntimeException: java.io.IOException: Unsupported scheme s3n for URI s3n://removed
> >         at com.coldlight.neuron.data.ClquetPartitionedData.compute(ClquetPartitionedData.java:307)
> >         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> >         at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> >         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> >         at org.apache.spark.scheduler.Task.run(Task.scala:64)
> >         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >         at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.io.IOException: Unsupported scheme s3n for URI s3n://removed
> >         at com.coldlight.ccc.vfs.NeuronPath.toPath(NeuronPath.java:43)
> >         at com.coldlight.neuron.data.ClquetPartitionedData.makeInputStream(ClquetPartitionedData.java:465)
> >         at com.coldlight.neuron.data.ClquetPartitionedData.access$200(ClquetPartitionedData.java:42)
> >         at com.coldlight.neuron.data.ClquetPartitionedData$Iter.<init>(ClquetPartitionedData.java:330)
> >         at com.coldlight.neuron.data.ClquetPartitionedData.compute(ClquetPartitionedData.java:304)
> >         ... 8 more
> >
>
