I just noticed this issue: https://issues.apache.org/jira/browse/SPARK-6351 (pull request: https://github.com/apache/spark/pull/5039)
I verified it, and it resolves my issues with Parquet and the swift:// namespace.

From: Gil Vernik/Haifa/IBM@IBMIL
To: dev <dev@spark.apache.org>
Date: 16/03/2015 02:11 PM
Subject: problems with Parquet in Spark 1.3.0

Hi,

I am storing Parquet files in OpenStack Swift and accessing those files from Spark. This works perfectly in Spark versions prior to 1.3.0, but in 1.3.0 I am getting the error below. Is there some configuration I missed? I am not sure where this error comes from. Does Spark 1.3.0 require Parquet files to be accessed via "file://"?

I will be glad to dig into this in case it's a bug, but would like to know whether this behavior is intentional in Spark 1.3.0. (I can access the swift:// namespace from SparkContext; only sqlContext has this issue.)

Thanks,
Gil Vernik.

scala> val parquetFile = sqlContext.parquetFile("swift://ptest.localSwift12/SF311new3.parquet")
java.lang.IllegalArgumentException: Wrong FS: swift://ptest.localSwift12/SF311new3.parquet, expected: file:///
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
    at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:465)
    at org.apache.hadoop.fs.FilterFileSystem.makeQualified(FilterFileSystem.java:119)
    at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:252)
    at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:251)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:251)
    at org.apache.spark.sql.parquet.ParquetRelation2.<init>(newParquet.scala:370)
    at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:522)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
    at $iwC$$iwC$$iwC.<init>(<console>:32)
    at $iwC$$iwC.<init>(<console>:34)
    at $iwC.<init>(<console>:36)
    at <init>(<console>:38)
    at .<init>(<console>:42)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
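The trace shows Hadoop's `FileSystem.checkPath` rejecting the swift:// path because `makeQualified` was invoked on the local (default, `file:///`) FileSystem rather than on one matching the path's scheme. The Scala sketch below reproduces that failure with plain Hadoop APIs and contrasts it with resolving the FileSystem from each path's own scheme via `Path.getFileSystem`. It assumes hadoop-common is on the classpath (as it is in a Spark shell) and that the SPARK-6351 fix takes roughly this per-path approach; check the pull request for the actual change. `WrongFsDemo`, `qualifyWithDefaultFs` and `qualifyWithPathFs` are hypothetical names for illustration, not Spark code.

```scala
// Minimal sketch, assuming hadoop-common is on the classpath.
// WrongFsDemo and its helpers are hypothetical illustration names, not Spark APIs.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object WrongFsDemo {
  // Failing pattern (simplified): a single FileSystem taken from the default
  // configuration (fs.defaultFS, file:/// unless configured otherwise) is used
  // to qualify every path, regardless of the path's scheme.
  def qualifyWithDefaultFs(path: String, conf: Configuration): Path =
    FileSystem.get(conf).makeQualified(new Path(path))

  // Per-path pattern: look up the FileSystem from the path's own scheme, so a
  // swift:// path goes to the Swift connector and a file:// path to the local FS.
  def qualifyWithPathFs(path: String, conf: Configuration): Path = {
    val p = new Path(path)
    p.getFileSystem(conf).makeQualified(p)
  }

  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    val swiftPath = "swift://ptest.localSwift12/SF311new3.parquet"

    // checkPath compares the path's scheme (swift) with the FileSystem's
    // scheme (file) and throws IllegalArgumentException("Wrong FS: ...").
    try {
      qualifyWithDefaultFs(swiftPath, conf)
      println("unexpected: no error")
    } catch {
      case e: IllegalArgumentException => println(e.getMessage)
    }

    // A scheme-less local path resolves to the default (local) FileSystem,
    // so qualifying it succeeds and yields a file: URI.
    println(qualifyWithPathFs("/tmp/example.parquet", conf))
  }
}
```

Calling `qualifyWithPathFs` on the swift:// path only succeeds when the Hadoop Swift connector (hadoop-openstack) is on the classpath and configured; without it, Hadoop reports that no FileSystem is available for the `swift` scheme, which is a different error from the "Wrong FS" one above.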