Hi, I'm stuck writing partitioned data to HDFS. The example below fails with an 'already exists' error.
I'm wondering how to handle the streaming use case. What is the intended way to write streaming data to HDFS? What am I missing?

cheers,
-jan

import com.databricks.spark.avro._
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// First batch: written to a fresh path.
val df = Seq(
  (2012, 8, "Batman", 9.8),
  (2012, 8, "Hero", 8.7),
  (2012, 7, "Robot", 5.5),
  (2011, 7, "Git", 2.0)).toDF("year", "month", "title", "rating")

df.write.partitionBy("year", "month").avro("/tmp/data")

// Second batch: different months, same base path.
val df2 = Seq(
  (2012, 10, "Batman", 9.8),
  (2012, 10, "Hero", 8.7),
  (2012, 9, "Robot", 5.5),
  (2011, 9, "Git", 2.0)).toDF("year", "month", "title", "rating")

// This second write to the same path is where the 'already exists' error shows up.
df2.write.partitionBy("year", "month").avro("/tmp/data")