Burak's suggestion works:
read.df(sqlContext, "file:///home/matmsh/myfile.avro", "com.databricks.spark.avro")
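For anyone finding this thread later, here is a minimal sketch of the whole working session. It assumes the same package coordinates and file path as in my original message below, and that the shell has already created sqlContext. The variable name df is only for illustration.

```r
# Start the shell with the spark-avro package available (run from the OS shell):
#   sparkR --packages com.databricks:spark-avro_2.10:1.0.0

# Pass the fully qualified data source name instead of the short name "avro".
df <- read.df(sqlContext,
              "file:///home/matmsh/myfile.avro",
              source = "com.databricks.spark.avro")

printSchema(df)  # print the column names and types of the loaded DataFrame
head(df)         # fetch the first few rows as a local R data.frame
```

Naming the source argument explicitly should make it less likely to be confused with the path.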
Maybe the above should be added to
SparkR (R on Spark) - Spark 1.4.0 Documentation
Thanks!
Shing
On Sunday, 14 June 2015, 3:07, Shivaram Venkataraman
<[email protected]> wrote:
Yep - Burak's answer should work. FWIW, the line in the stack trace that
points to this is "Failed to load class for data source: avro".
Thanks
Shivaram
On Sat, Jun 13, 2015 at 6:13 PM, Burak Yavuz <[email protected]> wrote:
Hi,
Not sure if this is it, but could you please try "com.databricks.spark.avro"
instead of just "avro".
Thanks,
Burak
On Jun 13, 2015 9:55 AM, "Shing Hing Man" <[email protected]> wrote:
Hi, I am trying to read an Avro file in SparkR (in Spark 1.4.0).
I started R using the following.
matmsh@gauss:~$ sparkR --packages com.databricks:spark-avro_2.10:1.0.0
Inside the R shell, when I issue the following,
> read.df(sqlContext, "file:///home/matmsh/myfile.avro","avro")
I get the following exception.
Caused by: java.lang.RuntimeException: Failed to load class for data source: avro
Below is the stack trace.
matmsh@gauss:~$ sparkR --packages com.databricks:spark-avro_2.10:1.0.0
R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-suse-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
Launching java with spark-submit command /home/matmsh/installed/spark/bin/spark-submit "--packages" "com.databricks:spark-avro_2.10:1.0.0" "sparkr-shell" /tmp/RtmpoT7FrF/backend_port464e1e2fb16a
Ivy Default Cache set to: /home/matmsh/.ivy2/cache
The jars for the packages stored in: /home/matmsh/.ivy2/jars
:: loading settings :: url = jar:file:/home/matmsh/installed/sparks/spark-1.4.0-bin-hadoop2.3/lib/spark-assembly-1.4.0-hadoop2.3.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-avro_2.10 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
    found com.databricks#spark-avro_2.10;1.0.0 in list
    found org.apache.avro#avro;1.7.6 in local-m2-cache
    found org.codehaus.jackson#jackson-core-asl;1.9.13 in list
    found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in list
    found com.thoughtworks.paranamer#paranamer;2.3 in list
    found org.xerial.snappy#snappy-java;1.0.5 in list
    found org.apache.commons#commons-compress;1.4.1 in list
    found org.tukaani#xz;1.0 in list
    found org.slf4j#slf4j-api;1.6.4 in list
:: resolution report :: resolve 421ms :: artifacts dl 16ms
    :: modules in use:
    com.databricks#spark-avro_2.10;1.0.0 from list in [default]
    com.thoughtworks.paranamer#paranamer;2.3 from list in [default]
    org.apache.avro#avro;1.7.6 from local-m2-cache in [default]
    org.apache.commons#commons-compress;1.4.1 from list in [default]
    org.codehaus.jackson#jackson-core-asl;1.9.13 from list in [default]
    org.codehaus.jackson#jackson-mapper-asl;1.9.13 from list in [default]
    org.slf4j#slf4j-api;1.6.4 from list in [default]
    org.tukaani#xz;1.0 from list in [default]
    org.xerial.snappy#snappy-java;1.0.5 from list in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   9   |   0   |   0   |   0   ||   9   |   0   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
    confs: [default]
    0 artifacts copied, 9 already retrieved (0kB/9ms)
15/06/13 17:37:42 INFO spark.SparkContext: Running Spark version 1.4.0
15/06/13 17:37:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/06/13 17:37:42 WARN util.Utils: Your hostname, gauss resolves to a loopback address: 127.0.0.1; using 192.168.0.10 instead (on interface enp3s0)
15/06/13 17:37:42 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/06/13 17:37:42 INFO spark.SecurityManager: Changing view acls to: matmsh
15/06/13 17:37:42 INFO spark.SecurityManager: Changing modify acls to: matmsh
15/06/13 17:37:42 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(matmsh); users with modify permissions: Set(matmsh)
15/06/13 17:37:43 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/06/13 17:37:43 INFO Remoting: Starting remoting
15/06/13 17:37:43 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:46219]
15/06/13 17:37:43 INFO util.Utils: Successfully started service 'sparkDriver' on port 46219.
15/06/13 17:37:43 INFO spark.SparkEnv: Registering MapOutputTracker
15/06/13 17:37:43 INFO spark.SparkEnv: Registering BlockManagerMaster
15/06/13 17:37:43 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-c8661016-d922-4ad3-a171-7b0f719c40a2/blockmgr-e79853e5-e046-4b13-a3ba-0b4c46831461
15/06/13 17:37:43 INFO storage.MemoryStore: MemoryStore started with capacity 265.4 MB
15/06/13 17:37:43 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-c8661016-d922-4ad3-a171-7b0f719c40a2/httpd-0f11e45e-08fe-40b1-8bf9-21de1dd472b7
15/06/13 17:37:43 INFO spark.HttpServer: Starting HTTP Server
15/06/13 17:37:43 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/06/13 17:37:43 INFO server.AbstractConnector: Started [email protected]:48507
15/06/13 17:37:43 INFO util.Utils: Successfully started service 'HTTP file server' on port 48507.
15/06/13 17:37:43 INFO spark.SparkEnv: Registering OutputCommitCoordinator
15/06/13 17:37:43 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/06/13 17:37:43 INFO server.AbstractConnector: Started [email protected]:4040
15/06/13 17:37:43 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/06/13 17:37:43 INFO ui.SparkUI: Started SparkUI at http://192.168.0.10:4040
15/06/13 17:37:43 INFO spark.SparkContext: Added JAR file:/home/matmsh/.ivy2/jars/com.databricks_spark-avro_2.10-1.0.0.jar at http://192.168.0.10:48507/jars/com.databricks_spark-avro_2.10-1.0.0.jar with timestamp 1434213463626
15/06/13 17:37:43 INFO spark.SparkContext: Added JAR file:/home/matmsh/.ivy2/jars/org.apache.avro_avro-1.7.6.jar at http://192.168.0.10:48507/jars/org.apache.avro_avro-1.7.6.jar with timestamp 1434213463627
15/06/13 17:37:43 INFO spark.SparkContext: Added JAR file:/home/matmsh/.ivy2/jars/org.codehaus.jackson_jackson-core-asl-1.9.13.jar at http://192.168.0.10:48507/jars/org.codehaus.jackson_jackson-core-asl-1.9.13.jar with timestamp 1434213463627
15/06/13 17:37:43 INFO spark.SparkContext: Added JAR file:/home/matmsh/.ivy2/jars/org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar at http://192.168.0.10:48507/jars/org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar with timestamp 1434213463628
15/06/13 17:37:43 INFO spark.SparkContext: Added JAR file:/home/matmsh/.ivy2/jars/com.thoughtworks.paranamer_paranamer-2.3.jar at http://192.168.0.10:48507/jars/com.thoughtworks.paranamer_paranamer-2.3.jar with timestamp 1434213463628
15/06/13 17:37:43 INFO spark.SparkContext: Added JAR file:/home/matmsh/.ivy2/jars/org.xerial.snappy_snappy-java-1.0.5.jar at http://192.168.0.10:48507/jars/org.xerial.snappy_snappy-java-1.0.5.jar with timestamp 1434213463630
15/06/13 17:37:43 INFO spark.SparkContext: Added JAR file:/home/matmsh/.ivy2/jars/org.apache.commons_commons-compress-1.4.1.jar at http://192.168.0.10:48507/jars/org.apache.commons_commons-compress-1.4.1.jar with timestamp 1434213463630
15/06/13 17:37:43 INFO spark.SparkContext: Added JAR file:/home/matmsh/.ivy2/jars/org.slf4j_slf4j-api-1.6.4.jar at http://192.168.0.10:48507/jars/org.slf4j_slf4j-api-1.6.4.jar with timestamp 1434213463630
15/06/13 17:37:43 INFO spark.SparkContext: Added JAR file:/home/matmsh/.ivy2/jars/org.tukaani_xz-1.0.jar at http://192.168.0.10:48507/jars/org.tukaani_xz-1.0.jar with timestamp 1434213463630
15/06/13 17:37:43 INFO executor.Executor: Starting executor ID driver on host localhost
15/06/13 17:37:43 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 55381.
15/06/13 17:37:43 INFO netty.NettyBlockTransferService: Server created on 55381
15/06/13 17:37:43 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/06/13 17:37:43 INFO storage.BlockManagerMasterEndpoint: Registering block manager localhost:55381 with 265.4 MB RAM, BlockManagerId(driver, localhost, 55381)
15/06/13 17:37:43 INFO storage.BlockManagerMaster: Registered BlockManager

 Welcome to SparkR!
 Spark context is available as sc, SQL context is available as sqlContext
> read.df(sqlContext, "file:///home/matmsh/myfile.avro","avro")
15/06/13 17:38:53 ERROR r.RBackendHandler: load on 1 failed
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:127)
    at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:74)
    at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:36)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed to load class for data source: avro
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl.scala:216)
    at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:229)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
    at org.apache.spark.sql.SQLContext.load(SQLContext.scala:1230)
    ... 25 more
Error: returnStatus == 0 is not TRUE
Thanks in advance for any assistance!
Shing