Re: File I/O in spark

2014-09-15 Thread Frank Austin Nothaft
Kartheek, What exactly are you trying to do? Those APIs are only for local file access. If you want to access data in HDFS, you’ll want to use one of the reader methods in org.apache.spark.SparkContext which will give you an RDD (e.g., newAPIHadoopFile, sequenceFile, or textFile). If you want t
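A minimal sketch of the textFile approach described above, assuming Spark is on the classpath; the hdfs:// namenode host/port and file path are placeholders, not real cluster addresses:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: reads an HDFS file into an RDD[String] via SparkContext.textFile.
// The hdfs:// URI below is a placeholder for your cluster's namenode.
object HdfsTextFileExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hdfs-textfile-example"))
    val lines = sc.textFile("hdfs://namenode:8020/user/example/input.txt")
    println(s"line count: ${lines.count()}")
    sc.stop()
  }
}
```

The same pattern applies to sequenceFile and newAPIHadoopFile; only the return type of the RDD changes.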

Re: File I/O in spark

2014-09-15 Thread Mohit Jaggi
If your underlying filesystem is HDFS, you need to use the HDFS APIs. A google search brought up this link, which appears reasonable: http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample If you want to use java.io APIs, you have to make sure your filesystem is accessible from all nodes in your cluster.
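A hedged sketch of the read/write pattern from the linked wiki page, using the Hadoop FileSystem API from Scala; the namenode URI and path are placeholders, and core-site.xml/hdfs-site.xml are assumed to be on the classpath:

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Sketch only: write a file through the HDFS client API, then read it back.
// Unlike java.io, data written this way is visible to every node in the cluster.
object HdfsDfsReadWrite {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration() // picks up core-site.xml / hdfs-site.xml
    val fs = FileSystem.get(new URI("hdfs://namenode:8020"), conf) // placeholder URI
    val path = new Path("/user/example/notes.txt")

    // Write (overwrite = true replaces any existing file at this path)
    val out = fs.create(path, true)
    out.writeBytes("written through the HDFS API\n")
    out.close()

    // Read it back
    val in = fs.open(path)
    val text = scala.io.Source.fromInputStream(in).mkString
    in.close()
    println(text)
  }
}
```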

Re: File I/O in spark

2014-09-15 Thread rapelly kartheek
Can you please direct me to the right way of doing this? On Mon, Sep 15, 2014 at 10:18 PM, rapelly kartheek wrote: > I came across these APIs in one of the Scala tutorials over the net. > > On Mon, Sep 15, 2014 at 10:14 PM, Mohit Jaggi > wrote: > >> But the above APIs are not for HDFS. >> >> On Mo

Re: File I/O in spark

2014-09-15 Thread rapelly kartheek
I came across these APIs in one of the Scala tutorials over the net. On Mon, Sep 15, 2014 at 10:14 PM, Mohit Jaggi wrote: > But the above APIs are not for HDFS. > > On Mon, Sep 15, 2014 at 9:40 AM, rapelly kartheek > wrote: > >> Yes. I have HDFS. My cluster has 5 nodes. When I run the above commands

Re: File I/O in spark

2014-09-15 Thread Mohit Jaggi
But the above APIs are not for HDFS. On Mon, Sep 15, 2014 at 9:40 AM, rapelly kartheek wrote: > Yes. I have HDFS. My cluster has 5 nodes. When I run the above commands, I > see that the file gets created in the master node. But, there won't be any > data written to it. > > > On Mon, Sep 15, 2014

Re: File I/O in spark

2014-09-15 Thread rapelly kartheek
The file gets created on the fly, so I don't know how to make sure that it's accessible to all nodes. On Mon, Sep 15, 2014 at 10:10 PM, rapelly kartheek wrote: > Yes. I have HDFS. My cluster has 5 nodes. When I run the above commands, I > see that the file gets created in the master node. But, the

Re: File I/O in spark

2014-09-15 Thread rapelly kartheek
Yes. I have HDFS. My cluster has 5 nodes. When I run the above commands, I see that the file gets created in the master node. But, there won't be any data written to it. On Mon, Sep 15, 2014 at 10:06 PM, Mohit Jaggi wrote: > Is this code running in an executor? You need to make sure the file is

Re: File I/O in spark

2014-09-15 Thread Mohit Jaggi
Is this code running in an executor? You need to make sure the file is accessible on ALL executors. One way to do that is to use a distributed filesystem like HDFS or GlusterFS. On Mon, Sep 15, 2014 at 8:51 AM, rapelly kartheek wrote: > Hi > > I am trying to perform some read/write file operatio
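To make the distinction concrete: a java.io write lands on the local disk of whichever JVM happens to run it, which matches the symptom reported above (a file appearing only on the master, with no data). A hedged sketch of the distributed alternative, writing an RDD out to HDFS with saveAsTextFile; the hdfs:// URI is a placeholder:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: each executor writes its own partition, so the output is a
// directory of part-* files on HDFS rather than a single local file.
object SaveToHdfsExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("save-to-hdfs-example"))
    val data = sc.parallelize(1 to 100)
    data.map(n => s"value=$n").saveAsTextFile("hdfs://namenode:8020/user/example/output")
    sc.stop()
  }
}
```

Because every executor can reach HDFS, this works regardless of which node runs which task, which is exactly the property java.io writes lack.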