RIAK TS installed nodes not connecting

2016-09-13 Thread Agtmaal, Joris van
Hi

I'm new to Riak and followed the installation instructions to get it working on 
an AWS cluster (3 nodes).

So far I've been able to use Riak in pyspark (Zeppelin) to create/read/write
tables, but I would like to use the dataframes directly from Spark, using the
Spark-Riak Connector.
I followed the example found here:
http://docs.basho.com/riak/ts/1.4.0/add-ons/spark-riak-connector/quick-start/#python
but I run into trouble on this last part:

import riak

host = 'my_ip_address_of_riak_node'  # IP address of one of the Riak nodes
pb_port = 8087
hostAndPort = ":".join([host, str(pb_port)])
client = riak.RiakClient(host=host, pb_port=pb_port)

df.write \
    .format('org.apache.spark.sql.riak') \
    .option('spark.riak.connection.host', hostAndPort) \
    .mode('Append') \
    .save('test')

Important to note that I'm using a local download of the jar file, which is
loaded into the pyspark interpreter in Zeppelin through:
%dep
z.reset()
z.load("/home/hadoop/spark-riak-connector_2.10-1.6.0.jar")

Here is the error message I get back:
Py4JJavaError: An error occurred while calling o569.save.
: java.lang.NoClassDefFoundError: com/basho/riak/client/core/util/HostAndPort
    at com.basho.riak.spark.rdd.connector.RiakConnectorConf$.apply(RiakConnectorConf.scala:76)
    at com.basho.riak.spark.rdd.connector.RiakConnectorConf$.apply(RiakConnectorConf.scala:89)
    at org.apache.spark.sql.riak.RiakRelation$.apply(RiakRelation.scala:115)
    at org.apache.spark.sql.riak.DefaultSource.createRelation(DefaultSource.scala:51)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:139)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:745)
(<type 'py4j.protocol.Py4JJavaError'>, Py4JJavaError(u'An error occurred while
calling o569.save.\n', JavaObject id=o570), <traceback object at 0x7f7021bb0200>)

Hope somebody can help out.
thanks, joris
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak 2.1.3 - Multiple indexes created by Solr for the same Riak object

2016-09-13 Thread Magnus Kessler
On 11 September 2016 at 02:27, Weixi Yen wrote:

> Sort of a unique case, my app was under heavy stress and one of my riak
> nodes got backed up (other 4 nodes were fine).
>
> I think this caused Riak.update to create an extra index in Solr for the
> same object when users began running .update on that object.
>

Hi Weixi,

Can you please confirm what you mean by "extra index"? Do you mean that an
object was indexed more than once and gets counted / returned by Solr
queries? If that's the case, can you please let me know how you query Solr?



>
> I have basically 2 questions:
>
> 1) Is what I'm describing something that is possible?
>

Riak/Yokozuna indexes each replica of a Riak object into Solr. With the
default n_val of 3, there will be 3 copies of any given object indexed in
Solr. Depending on the version of Riak you are using, it's also possible
that siblings of Riak objects get indexed independently. So yes, it is
possible to find several additional objects in Solr for each KV object.
When querying Solr through Riak/Yokozuna, the internal queries are
structured in a way that only one replica is returned. Querying Solr nodes
directly will typically lack these filters and may return more than one
copy of an object.
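
As an illustration, here is a sketch of the difference with the Python client
(the index name, field, host, and the default ports 8087/8093 with the
internal_solr path are assumptions, not taken from your setup):

import riak
import requests

client = riak.RiakClient(host='riak-host', pb_port=8087)

# Through Riak/Yokozuna: internal filtering returns one copy per object.
results = client.fulltext_search('my_index', 'name_s:foo')
print(results['num_found'])

# Directly against a Solr node: no replica filtering, so up to n_val
# copies of the same object may be counted.
r = requests.get('http://riak-host:8093/internal_solr/my_index/select',
                 params={'q': 'name_s:foo', 'wt': 'json'})
print(r.json()['response']['numFound'])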


>
> 2) Is there a way to tell Solr to re-index one single item and get rid of
> all other indexes of that item?
>

You can perform a GET/PUT cycle through Riak KV on an object. This will
result in n_val copies of the object across the Solr instances, replacing
any previous versions. It is not possible to have just one copy unless
the n_val for the object is exactly 1. AFAIK, there have been some fixes to
Yokozuna in 2.0.7 and the upcoming 2.2 release that deal better with
indexed siblings. Discrepancies between KV objects and their Solr
counterparts should be detected and resolved by active anti-entropy (AAE).
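
A minimal sketch of such a GET/PUT cycle with the Python client (host,
bucket, and key names here are hypothetical):

import riak

client = riak.RiakClient(host='riak-host', pb_port=8087)
bucket = client.bucket('my_bucket')

obj = bucket.get('my_key')  # GET: fetch the current value and causal context
obj.store()                 # PUT: rewrite the object so Yokozuna re-indexes it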


>
> Considering RiakTS to resolve these issues long term, but have to stick
> with Solr for at least the next 3 months, would appreciate any insight into
> how to solve this duplicate index problem.
>
> Thanks,
>
> Weixi
>
>
Regards,

Magnus

-- 
Magnus Kessler
Client Services Engineer
Basho Technologies Limited

Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg 07970431
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: RIAK TS installed nodes not connecting

2016-09-13 Thread Stephen Etheridge
Hi Joris,

I have looked at the tutorial you have been following but I confess I am
confused.  In the example you are following I do not see where the spark
and sql contexts are created.  I use PySpark through the Jupyter notebook
and I have to specify a path to the connector on invoking the jupyter
notebook. Is it possible for you to share all your code (and how you are
invoking zeppelin) with me so I can trace everything through?
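
For reference, here is a minimal sketch of how those contexts are typically
created for Spark 1.6 outside Zeppelin (the app name and connection host are
illustrative; Zeppelin normally pre-creates sc and sqlContext for you):

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# Standalone setup for illustration only.
conf = (SparkConf()
        .setAppName('riak-ts-example')
        .set('spark.riak.connection.host', 'riak-host:8087'))
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)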

regards
Stephen

On Mon, Sep 12, 2016 at 3:27 PM, Agtmaal, Joris van
<joris.vanagtm...@wartsila.com> wrote:

> Hi
>
> I'm new to Riak and followed the installation instructions to get it
> working on an AWS cluster (3 nodes).
>
> [snip]


-- 
{ "name" : "Stephen Etheridge",
   "title" : "Solution Architect, EMEA",
   "Organisation" : "Basho Technologies, Inc",
   "Telephone" : "07814 406662",
   "email" : "mailto:setheri...@basho.com";,
   "github" : "http://github.com/datalemming";,
   "twitter" : "@datalemming"}
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: RIAK TS installed nodes not connecting

2016-09-13 Thread Alex Moore
Joris,

One thing to check: since you are using a downloaded jar, are you using the
uber jar that contains all the dependencies?
http://search.maven.org/remotecontent?filepath=com/basho/riak/spark-riak-connector_2.10/1.6.0/spark-riak-connector_2.10-1.6.0-uber.jar
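
If so, loading it in Zeppelin would look the same as before, just pointing at
the uber jar (assuming it is saved to the same directory as the original):

%dep
z.reset()
z.load("/home/hadoop/spark-riak-connector_2.10-1.6.0-uber.jar")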

Thanks,
Alex

On Tue, Sep 13, 2016 at 8:44 AM, Stephen Etheridge wrote:

> Hi Joris,
>
> I have looked at the tutorial you have been following but I confess I am
> confused.
>
> [snip]
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com