Hello,

I am running a Spark application with one master and three slaves.

I am writing my data out in Parquet format, but when I try to read it back I get an error. Please help me resolve this problem.

Code:



val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

case class Table(Address: String, Couple_time: Int, WT_ID: Int, WT_Name: String)

val df2 = sc.textFile("/root/Desktop/database.txt")
  .map(_.split(","))
  .map(p => Table(p(0), p(1).trim.toInt, p(2).trim.toInt, p(3)))
  .toDF()

df2.write.parquet("Desktop/database2.parquet")




After that, on the master there is a folder named database2.parquet that contains only the _SUCCESS file, while on each slave the folder has the following tree:

database2.parquet
└── _temporary
    └── 0
        ├── task_201603071435_0000_m_000001
        │   └── part-r-00002.gz.parquet
        ├── task_201603071435_0000_m_000004
        │   └── part-r-00005.gz.parquet
        └── _temporary



But when I try to read this data back with the following command, I get an error:


val df1 = sqlContext.read.parquet("Desktop/database2.parquet")



Error:

java.lang.AssertionError: assertion failed: No schema defined, and no
Parquet data file or summary file found under file:/root/database2.parquet.

at scala.Predef$.assert(Predef.scala:179)
at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.org$apache$spark$sql$parquet$ParquetRelation2$MetadataCache$$readSchema(newParquet.scala:443)
at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$15.apply(newParquet.scala:385)
at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$15.apply(newParquet.scala:385)
at scala.Option.orElse(Option.scala:257)
at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:385)
at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$metadataCache$lzycompute(newParquet.scala:154)
at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$metadataCache(newParquet.scala:152)
at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$dataSchema$1.apply(newParquet.scala:193)
at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$dataSchema$1.apply(newParquet.scala:193)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.parquet.ParquetRelation2.dataSchema(newParquet.scala:193)
at org.apache.spark.sql.sources.HadoopFsRel
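Could the problem be that "Desktop/database2.parquet" is a relative local path, so each node resolves it to its own local disk and the output never gets committed into one place? If so, should I instead be writing to storage that every node shares? Something like the following is what I have in mind (the HDFS host, port, and directory below are only placeholders for my setup):

```scala
// Sketch, not tested on my cluster: write to a path on shared storage
// (e.g. HDFS) so the driver and all slaves see the same directory.
// "master:9000" and "/user/root" are placeholder values.
df2.write.parquet("hdfs://master:9000/user/root/database2.parquet")

// ...and read it back from that same shared path.
val df1 = sqlContext.read.parquet("hdfs://master:9000/user/root/database2.parquet")
```

Would that be the right way to do it, or is there a way to make a local path work across the cluster?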




Thanks.
