Another way is to configure S3 as Tachyon's under storage system, and then
run Spark on Tachyon.
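If you go the Tachyon route, the under-filesystem is chosen through Tachyon's own configuration. A rough sketch of what that looks like (property and variable names are from the Tachyon docs of this era and should be verified against your release; the bucket path and credential placeholders are illustrative):

```
# tachyon-env.sh (sketch -- verify names against your Tachyon version)
export TACHYON_UNDERFS_ADDRESS=s3n://your-bucket/tachyon
# s3n credentials are picked up via the Hadoop configuration:
#   fs.s3n.awsAccessKeyId=<ACCESS_KEY>
#   fs.s3n.awsSecretAccessKey=<SECRET_KEY>
```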

More info: http://tachyon-project.org/Setup-UFS.html

Best,

Haoyuan

On Wed, May 13, 2015 at 10:52 AM, Stephen Carman <scar...@coldlight.com>
wrote:

> Thank you for the suggestions. The problem is that we need to initialize
> the VFS s3 driver, so what you suggested, Akhil, wouldn’t fix the
> problem.
>
> Basically, a job is submitted to the cluster and tries to pull the data
> down from s3, but it fails because the s3 URI hasn’t been initialized in
> the VFS, so the VFS doesn’t know how to handle the URI.
>
> What I’m asking is: before the job runs, how do we run some bootstrapping
> or setup code that performs this initialization/configuration step for
> the VFS, so that when the job executes it has the information it needs to
> handle the s3 URI?
>
> Thanks,
> Steve
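One general pattern for this kind of per-executor setup (a sketch only, not ColdLight's actual code; `VfsBootstrap` and `register()` are hypothetical names) is to make the initialization lazy and idempotent, and trigger it from inside the task code itself, so it runs once per JVM on whichever executor actually touches the data:

```java
// Sketch of once-per-JVM initialization. The same trick works inside a Spark
// task (e.g. at the top of compute() or a mapPartitions closure), so each
// executor registers the VFS s3 provider exactly once before reading.
// VfsBootstrap and register() are illustrative names, not a real API.
public final class VfsBootstrap {
    private static boolean initialized = false;

    /** Call before any s3 access; cheap after the first call. */
    public static synchronized void ensure() {
        if (!initialized) {
            register();   // e.g. add the s3 provider to the commons-vfs manager
            initialized = true;
        }
    }

    public static synchronized boolean isInitialized() {
        return initialized;
    }

    private static void register() {
        // provider registration / credential setup would go here
    }
}
```

Because the static state lives in the executor JVM, calling `ensure()` from the task (rather than only on the driver) guarantees the setup happens on every slave that runs a partition.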
>
> On May 13, 2015, at 12:35 PM, jay vyas <jayunit100.apa...@gmail.com
> <mailto:jayunit100.apa...@gmail.com>> wrote:
>
>
> Might I ask why vfs? I'm new to VFS and not sure whether or not it
> predates the Hadoop file system interfaces (HCFS).
>
> After all, Spark natively supports any HCFS by leveraging the Hadoop
> FileSystem API, class loaders, and so on.
>
> So simply putting those resources on your classpath should be sufficient
> to connect directly to s3, using the sc.hadoopFile(...) commands.
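Concretely, that approach looks something like the launch configuration below (jar paths, versions, and placeholders are illustrative; on Hadoop builds of this era the s3n classes ship with hadoop-core plus the jets3t library, so check your distribution for the exact jars):

```
# Sketch of a spark-submit invocation -- adjust jar paths to your build.
spark-submit \
  --conf spark.executor.extraClassPath=/path/to/jets3t.jar \
  --conf spark.hadoop.fs.s3n.awsAccessKeyId=<ACCESS_KEY> \
  --conf spark.hadoop.fs.s3n.awsSecretAccessKey=<SECRET_KEY> \
  your-job.jar
```

With the connector on the executor classpath and the credentials passed through the `spark.hadoop.*` prefix, a read such as `sc.textFile("s3n://bucket/path")` (or `sc.hadoopFile(...)`) resolves the scheme through the Hadoop FileSystem API, with no VFS layer involved.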
>
> On May 13, 2015 12:16 PM, "Akhil Das" <ak...@sigmoidanalytics.com<mailto:
> ak...@sigmoidanalytics.com>> wrote:
> Did you happen to have a look at this? https://github.com/abashev/vfs-s3
>
> Thanks
> Best Regards
>
> On Tue, May 12, 2015 at 11:33 PM, Stephen Carman <scar...@coldlight.com
> <mailto:scar...@coldlight.com>>
> wrote:
>
> > We have a small Mesos cluster, and the slaves need to have a VFS set up
> > on them so that they can pull down the data they need from S3 when
> > Spark runs.
> >
> > There doesn’t seem to be any obvious guidance online on how to do this,
> > or how to accomplish it easily. Does anyone have best practices or some
> > ideas about how to go about it?
> >
> > An example stack trace when a job is run on the Mesos cluster…
> >
> > Any idea how to get this going? Like somehow bootstrapping Spark on
> > startup, or something?
> >
> > Thanks,
> > Steve
> >
> >
> > java.io.IOException: Unsupported scheme s3n for URI s3n://removed
> >         at com.coldlight.ccc.vfs.NeuronPath.toPath(NeuronPath.java:43)
> >         at com.coldlight.neuron.data.ClquetPartitionedData.makeInputStream(ClquetPartitionedData.java:465)
> >         at com.coldlight.neuron.data.ClquetPartitionedData.access$200(ClquetPartitionedData.java:42)
> >         at com.coldlight.neuron.data.ClquetPartitionedData$Iter.<init>(ClquetPartitionedData.java:330)
> >         at com.coldlight.neuron.data.ClquetPartitionedData.compute(ClquetPartitionedData.java:304)
> >         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> >         at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> >         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> >         at org.apache.spark.scheduler.Task.run(Task.scala:64)
> >         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >         at java.lang.Thread.run(Thread.java:745)
> > 15/05/12 13:57:51 ERROR Executor: Exception in task 0.1 in stage 0.0 (TID 1)
> > java.lang.RuntimeException: java.io.IOException: Unsupported scheme s3n for URI s3n://removed
> >         at com.coldlight.neuron.data.ClquetPartitionedData.compute(ClquetPartitionedData.java:307)
> >         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> >         at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> >         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> >         at org.apache.spark.scheduler.Task.run(Task.scala:64)
> >         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >         at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.io.IOException: Unsupported scheme s3n for URI s3n://removed
> >         at com.coldlight.ccc.vfs.NeuronPath.toPath(NeuronPath.java:43)
> >         at com.coldlight.neuron.data.ClquetPartitionedData.makeInputStream(ClquetPartitionedData.java:465)
> >         at com.coldlight.neuron.data.ClquetPartitionedData.access$200(ClquetPartitionedData.java:42)
> >         at com.coldlight.neuron.data.ClquetPartitionedData$Iter.<init>(ClquetPartitionedData.java:330)
> >         at com.coldlight.neuron.data.ClquetPartitionedData.compute(ClquetPartitionedData.java:304)
> >         ... 8 more
> >
> > This e-mail is intended solely for the above-mentioned recipient and it
> > may contain confidential or privileged information. If you have
> > received it in error, please notify us immediately and delete the
> > e-mail. You must not copy, distribute, disclose or take any action in
> > reliance on it. In addition, the contents of an attachment to this
> > e-mail may contain software viruses which could damage your own
> > computer system. While ColdLight Solutions, LLC has taken every
> > reasonable precaution to minimize this risk, we cannot accept liability
> > for any damage which you sustain as a result of software viruses. You
> > should perform your own virus checks before opening the attachment.
> >
>
>



-- 
Haoyuan Li
CEO, Tachyon Nexus <http://www.tachyonnexus.com/>
AMPLab, EECS, UC Berkeley http://www.cs.berkeley.edu/~haoyuan/
