Short answer: if you downloaded spark-avro from repo.maven.apache.org, you are
likely using an old build from before November 14, 2014; see the timestamps at
http://repo.maven.apache.org/maven2/com/databricks/spark-avro_2.10/0.1/
There have been lots of changes at https://github.com/databricks/spark-avro since then.
Databricks, thank you for sharing the Avro code!
Could you please publish the latest version to repo.maven.apache.org, ideally
under a new version number? (I have no idea how jars get there.) Or is there a
different repository that users should point to for this artifact?
Workaround: clone https://github.com/databricks/spark-avro, build it yourself
(it still says version 0.1 but has the latest functionality), and install the
jar into your local Maven or Ivy repo.
Long version:
I used a default Maven build and declared my dependency on:
<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>spark-avro_2.10</artifactId>
  <version>0.1</version>
</dependency>
Maven downloaded version 0.1 from
http://repo.maven.apache.org/maven2/com/databricks/spark-avro_2.10/0.1/
and included it in my application jar.
From spark-shell:
import com.databricks.spark.avro._
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
// This schema includes a LONG field for time in millis:
// https://github.com/medale/spark-mail/blob/master/mailrecord/src/main/avro/com/uebercomputing/mailrecord/MailRecord.avdl
val recordsSchema = sqlContext.avroFile("/opt/rpm1/enron/enron-tiny.avro")
// fails with: java.lang.RuntimeException: Unsupported type LONG
However, when I checked out the spark-avro code from its GitHub repo and added
a test case against the MailRecord Avro data, everything ran fine.
So I built Databricks spark-avro locally and installed it into my local Maven
repo; with that jar added as a dependency, everything worked from spark-shell.
Hope this helps with the "save" case as well. In the pre-November 14 version,
avro.scala contains only a stub:
// TODO: Implement me.
implicit class AvroSchemaRDD(schemaRDD: SchemaRDD) {
  def saveAsAvroFile(path: String): Unit = ???
}
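With a jar built from GitHub, writing would presumably use the AvroSaver object that the original question calls. The sketch below is hedged: it assumes that build exposes AvroSaver.save(SchemaRDD, String) in com.databricks.spark.avro (as the AvroSaver.save(resultRDD, args(4)) call in the question suggests), and the object name AvroCopy and the use of args(0)/args(1) as input/output paths are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
// Brings in the avroFile implicit on SQLContext and, assumed, AvroSaver
import com.databricks.spark.avro._

// Hypothetical job: read an Avro file and write it back out as Avro
object AvroCopy {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("AvroCopy"))
    val sqlContext = new SQLContext(sc)
    // Read Avro records into a SchemaRDD, as in the spark-shell session
    val records = sqlContext.avroFile(args(0))
    // Assumption: the GitHub build provides AvroSaver.save(SchemaRDD, String);
    // the 0.1 jar from Maven Central does not, hence "not found: value AvroSaver"
    AvroSaver.save(records, args(1))
    sc.stop()
  }
}
```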
Markus
On 03/12/2015 07:05 PM, kpeng1 wrote:
Hi All,
I am currently trying to write out a SchemaRDD to Avro. I noticed that there
is a Databricks spark-avro library and have included it in my
dependencies, but it seems I cannot access the AvroSaver
object. When compiling the job I get this:
error: not found: value AvroSaver
[ERROR] AvroSaver.save(resultRDD, args(4))
I also tried calling saveAsAvro on resultRDD (the actual RDD with the
results). That compiles, but when I run the code I get an error
saying that saveAsAvro is not implemented. I am using version 0.1 of
spark-avro_2.10.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/spark-sql-writing-in-avro-tp22021.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org