Hi, I am rerunning the pipeline to generate the exact trace; below is the part of the trace I have from the last run:
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: s3n://<folder-path>, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:642)
        at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:69)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:516)
        at org.apache.hadoop.fs.ChecksumFileSystem.delete(ChecksumFileSystem.java:528)

Also, I think the error is happening in this part of the code (ConnectedComponents.scala:339). I am referring to the code at
https://github.com/graphframes/graphframes/blob/master/src/main/scala/org/graphframes/lib/ConnectedComponents.scala

    if (shouldCheckpoint && (iteration % checkpointInterval == 0)) {
      // TODO: remove this after DataFrame.checkpoint is implemented
      val out = s"${checkpointDir.get}/$iteration"
      ee.write.parquet(out)
      // may hit S3 eventually consistent issue
      ee = sqlContext.read.parquet(out)

      // remove previous checkpoint
      if (iteration > checkpointInterval) {
        FileSystem.get(sc.hadoopConfiguration)
          .delete(new Path(s"${checkpointDir.get}/${iteration - checkpointInterval}"), true)
      }

      System.gc() // hint Spark to clean shuffle directories
    }

The FileSystem.get(sc.hadoopConfiguration).delete(...) call is the part I suspect: from the stack trace, the delete goes through ChecksumFileSystem/RawLocalFileSystem (i.e. the default file:/// filesystem) even though the checkpoint path is on s3n. I have put a small sketch of what I mean below the quoted thread.

Thanks
Ankur

On Wed, Jan 4, 2017 at 5:19 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:

> Do you have more of the exception stack?
>
> ------------------------------
> *From:* Ankur Srivastava <ankur.srivast...@gmail.com>
> *Sent:* Wednesday, January 4, 2017 4:40:02 PM
> *To:* user@spark.apache.org
> *Subject:* Spark GraphFrame ConnectedComponents
>
> Hi,
>
> I am trying to use the ConnectedComponents algorithm of GraphFrames, but
> by default it needs a checkpoint directory. As I am running my Spark
> cluster with S3 as the DFS and do not have access to an HDFS file system,
> I tried using an S3 directory as the checkpoint directory, but I run into
> the exception below:
>
> Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS:
> s3n://<folder-path>, expected: file:///
>
>         at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:642)
>
>         at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:69)
>
> If I set the checkpoint interval to -1 to avoid checkpointing, the driver
> just hangs after 3 or 4 iterations.
>
> Is there some way I can set the default FileSystem to S3 for Spark, or any
> other option?
>
> Thanks
> Ankur
>
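
PS: Here is a minimal sketch of the kind of change I mean, resolving the FileSystem from the checkpoint path itself rather than from the default configuration. This is only my illustration written against the snippet above (it reuses checkpointDir, iteration, checkpointInterval and sc from there); it is not the actual GraphFrames code.

    import org.apache.hadoop.fs.{FileSystem, Path}

    // Checkpoint written checkpointInterval iterations ago, on whatever
    // filesystem the checkpoint directory actually lives on (s3n in my case).
    val oldCheckpoint = new Path(s"${checkpointDir.get}/${iteration - checkpointInterval}")

    // Path.getFileSystem resolves the FileSystem from the path's own scheme
    // (s3n, hdfs, file, ...) instead of from fs.defaultFS, so the local
    // filesystem's checkPath should never see the s3n:// path.
    val fs: FileSystem = oldCheckpoint.getFileSystem(sc.hadoopConfiguration)
    fs.delete(oldCheckpoint, true)

If that reading is wrong, the other thing I can think of is pointing the default filesystem (fs.defaultFS) at the S3 bucket, which is what I was asking about in the earlier mail.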