Hi Joris, I have looked at the tutorial you have been following, but I confess I am confused: in the example I do not see where the Spark and SQL contexts are created. I use PySpark through a Jupyter notebook, and I have to specify a path to the connector when invoking the notebook. Could you share all of your code (and how you are invoking Zeppelin) with me, so I can trace everything through?
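For comparison, this is roughly how I attach the connector when launching PySpark under Jupyter. It is only a sketch — the jar path and version below are assumptions copied from your message, so substitute your own (the Zeppelin equivalent is the %dep mechanism you are already using):

```
# Sketch only: jar path and version are assumptions; adjust to your setup.
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

# --jars ships the connector to the executors; --driver-class-path makes it
# visible to the driver JVM that py4j talks to.
pyspark \
  --jars /home/hadoop/spark-riak-connector_2.10-1.6.0.jar \
  --driver-class-path /home/hadoop/spark-riak-connector_2.10-1.6.0.jar
```

If the class is still not found with the jar on both classpaths, the jar itself is usually the problem rather than the invocation.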
regards,
Stephen

On Mon, Sep 12, 2016 at 3:27 PM, Agtmaal, Joris van <joris.vanagtm...@wartsila.com> wrote:

> Hi
>
> I'm new to Riak and followed the installation instructions to get it
> working on an AWS cluster (3 nodes).
>
> So far I've been able to use Riak in PySpark (Zeppelin) to
> create/read/write tables, but I would like to use the dataframes directly
> from Spark, using the Spark-Riak Connector, following the example found
> here:
> http://docs.basho.com/riak/ts/1.4.0/add-ons/spark-riak-connector/quick-start/#python
>
> I run into trouble on this last part:
>
>     host = my_ip_address_of_riak_node
>     pb_port = '8087'
>     hostAndPort = ":".join([host, pb_port])
>     client = riak.RiakClient(host=host, pb_port=pb_port)
>
>     df.write \
>         .format('org.apache.spark.sql.riak') \
>         .option('spark.riak.connection.host', hostAndPort) \
>         .mode('Append') \
>         .save('test')
>
> Important to note: I'm using a local download of the jar file, which is
> loaded into the PySpark interpreter in Zeppelin through:
>
>     %dep
>     z.reset()
>     z.load("/home/hadoop/spark-riak-connector_2.10-1.6.0.jar")
>
> Here is the error message I get back:
>
>     Py4JJavaError: An error occurred while calling o569.save. :
>     java.lang.NoClassDefFoundError: com/basho/riak/client/core/util/HostAndPort
>         at com.basho.riak.spark.rdd.connector.RiakConnectorConf$.apply(RiakConnectorConf.scala:76)
>         at com.basho.riak.spark.rdd.connector.RiakConnectorConf$.apply(RiakConnectorConf.scala:89)
>         at org.apache.spark.sql.riak.RiakRelation$.apply(RiakRelation.scala:115)
>         at org.apache.spark.sql.riak.DefaultSource.createRelation(DefaultSource.scala:51)
>         at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
>         at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
>         at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:139)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
>         at py4j.Gateway.invoke(Gateway.java:259)
>         at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>         at py4j.commands.CallCommand.execute(CallCommand.java:79)
>         at py4j.GatewayConnection.run(GatewayConnection.java:209)
>         at java.lang.Thread.run(Thread.java:745)
>     (<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError(u'An error occurred
>     while calling o569.save.\n', JavaObject id=o570), <traceback object at
>     0x7f7021bb0200>)
>
> Hope somebody can help out.
>
> thanks, joris
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

-- 
{ "name" : "Stephen Etheridge", "title" : "Solution Architect, EMEA", "Organisation" : "Basho Technologies, Inc", "Telephone" : "07814 406662", "email" : "mailto:setheri...@basho.com", "github" : "http://github.com/datalemming", "twitter" : "@datalemming" }
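PS — one pointer while you pull the code together: a java.lang.NoClassDefFoundError for com/basho/riak/client/core/util/HostAndPort usually means the class was absent from the classpath at runtime, i.e. the jar you loaded does not bundle the Riak Java client it depends on. I believe the connector is also published as an "uber" jar that includes its dependencies; alternatively, letting Zeppelin resolve a Maven coordinate pulls in transitive dependencies for you. A sketch only — the coordinate and version here are my assumption, inferred from your jar's file name:

```
%dep
z.reset()
z.load("com.basho.riak:spark-riak-connector_2.10:1.6.0")
```

Loading by coordinate (rather than a local file) should fetch the Riak Java client alongside the connector; if you prefer the local file, try the dependency-bundling build of the jar instead of the plain one.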