Hi,
I'm stuck writing partitioned data to HDFS. The example below fails with an
'already exists' error on the second write.

I'm wondering how to handle a streaming use case, where new partitions keep arriving over time.

What is the intended way to write streaming data to hdfs? What am I missing?

cheers,
-jan


import com.databricks.spark.avro._
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val df = Seq(
  (2012, 8, "Batman", 9.8),
  (2012, 8, "Hero", 8.7),
  (2012, 7, "Robot", 5.5),
  (2011, 7, "Git", 2.0)).toDF("year", "month", "title", "rating")

// first write succeeds and creates year=/month= partition directories
df.write.partitionBy("year", "month").avro("/tmp/data")

val df2 = Seq(
  (2012, 10, "Batman", 9.8),
  (2012, 10, "Hero", 8.7),
  (2012, 9, "Robot", 5.5),
  (2011, 9, "Git", 2.0)).toDF("year", "month", "title", "rating")

// second write to the same path fails with 'already exists',
// even though these rows belong to new partitions
df2.write.partitionBy("year", "month").avro("/tmp/data")
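The closest thing I've found is setting the save mode to Append instead of the default ErrorIfExists, via the generic DataFrameWriter API. This is just a sketch of what I assume might work for the second write above (I'm not sure this is the intended approach, or whether it duplicates data if a batch is retried):

```scala
import org.apache.spark.sql.SaveMode

// Assumption: Append makes the writer add new partition directories
// (and new part-files under existing ones) rather than failing because
// /tmp/data already exists. df2 is the second DataFrame from above.
df2.write
  .mode(SaveMode.Append)
  .partitionBy("year", "month")
  .avro("/tmp/data")
```

Is that the recommended way for a streaming-style workload, or is there a better pattern?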