I feel like the topic changed a bit, so I changed the email subject.
Mark Schmidt wrote:
Hi Dams, I appreciate the assistance.
I was able to turn up Banana fairly easily using the steps you laid
out below, but I have a few questions regarding communicating with my
Riak environment.
We have 6 Riak nodes sitting behind a proxy that handles both HTTP
and protocol buffer load balancing. I’ve turned up a new standalone
Riak node (which hosts Banana/Nginx) outside of the cluster.
First of all, you're using a proxy. That's common practice, and it should
not impact your Solr requests, as long as the proxy properly forwards the
request (Riak KV, Search or Solr, on any port) to one of the nodes. I
personally am not a big fan of using a proxy in front of Riak nodes,
although it's not a technical issue here. Most of the time people put
proxies in front of clusters (like Riak) so that the proxy answers the
question "which nodes are up and running?" instead of answering that
question on the hosts initiating the requests. Although it looks like a
great idea and seems to simplify things, it brings issues, like network
traffic concentration, or a potential SPOF. To work around that, the
proxy has to be made redundant, not configured as pass-through, etc. A
better approach is (imho) to use something like ZooKeeper or any lighter
equivalent solution to have real-time knowledge of which nodes are up and
running, and share that knowledge with all the hosts that are going to
use Riak. They can then do round-robin or random selection among the
Riak nodes that are up and running. Anyway, that's a different topic
from the present email.
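A minimal sketch of that client-side alternative, in Python: each client
picks round-robin among the nodes currently known to be up. The
`is_up` callback and the node names are hypothetical; in practice the
liveness data would come from ZooKeeper (or a lighter equivalent), which
is not shown here.

```python
from typing import Callable, List


class LiveNodeRoundRobin:
    """Round-robin selection among the Riak nodes currently reported as up.

    `is_up` is a hypothetical callback; in the setup suggested above it would
    read the membership kept in ZooKeeper (or a lighter equivalent), so every
    client shares the same view without going through a central proxy.
    """

    def __init__(self, nodes: List[str], is_up: Callable[[str], bool]) -> None:
        self._nodes = list(nodes)
        self._is_up = is_up
        self._next = 0  # position in the full node list

    def pick(self) -> str:
        # Scan at most one full lap, starting where the last pick stopped,
        # and skip nodes that are currently down.
        for _ in range(len(self._nodes)):
            node = self._nodes[self._next]
            self._next = (self._next + 1) % len(self._nodes)
            if self._is_up(node):
                return node
        raise RuntimeError("no Riak node is currently up")
```

With nodes riak1..riak3 and riak2 down, successive `pick()` calls return
riak1, riak3, riak1, riak3. Swapping the loop for `random.choice` over the
live nodes gives the random-selection variant instead.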
1) Should the Nginx config file be pointed at my HAProxy IP that
handles the Riak node load balancing, or do I need to incorporate
additional settings in the config to handle the 6 Riak Solr nodes?
The nginx configuration should point to your HAProxy IP. Distributed
Solr requests will be forwarded to the other nodes properly, as long as
the nodes are allowed to reach each other (firewall rules) on ports 8098
and 8093.
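A quick way to check those firewall rules is a plain TCP connect to both
ports, run from each Riak node towards the others. A minimal Python
sketch, assuming hypothetical host names; a successful connect only shows
that the port is reachable, not that Riak or Solr is healthy behind it.

```python
import socket
from typing import Dict, Iterable, Tuple

RIAK_PORTS = (8098, 8093)  # Riak HTTP API and Riak's internal Solr


def check_ports(hosts: Iterable[str], ports: Iterable[int] = RIAK_PORTS,
                timeout: float = 2.0) -> Dict[Tuple[str, int], bool]:
    """Return {(host, port): reachable}, based on a plain TCP connect."""
    ports = tuple(ports)
    results: Dict[Tuple[str, int], bool] = {}
    for host in hosts:
        for port in ports:
            try:
                # Open and immediately close the connection.
                with socket.create_connection((host, port), timeout=timeout):
                    results[(host, port)] = True
            except OSError:  # refused, timed out, unresolvable host, ...
                results[(host, port)] = False
    return results
```

For example, `check_ports(["riak1", "riak2"])` (hypothetical names) run
on every node should report `True` for both ports on all peers.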
2) Should I use the Riak HTTP interface port (8098) or the Solr
interface port (8093) in the Nginx config file?
You should use port 8098 for all queries.
3) Is there any way to perform faceted queries or other more advanced
query functionality against the Riak Solr nodes? Searching through the
conversation archives, it sounds like we may be able to query the Solr
nodes themselves outside of the Riak API.
In a nutshell: yes! The *whole* Solr API is available through Riak
Search, because the Riak API just forwards requests to Solr. I recommend
*not* using any special Riak client to query Riak Search, but plain HTTP
instead, with any HTTP client that your language provides. I recommend
reading this page again, but this time clicking on HTTP :)
http://docs.basho.com/riak/latest/dev/using/search/
curl "$RIAK_HOST/search/query/famous?wt=json&q=age_i:%5B30%20TO%20*%5D"
This is the example given. In my company, I've been using facet
queries, stats, etc. The only limitation is the Solr version that is
bundled with Riak Search (hopefully it'll be upgraded in a later release).
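For instance, with nothing but the Python standard library, the same
query as the curl example can be built and sent like this. A sketch only:
`riak_host` is assumed to include scheme and port (e.g. a hypothetical
`http://riak1:8098`), `wt=json` must be passed for a JSON answer, and
error handling is left out.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Mapping, Sequence, Tuple, Union

Params = Union[Mapping[str, str], Sequence[Tuple[str, str]]]


def search_url(riak_host: str, index: str, params: Params) -> str:
    """Build a Riak Search query URL (GET /search/query/<index>).

    `params` may be a dict or a list of (name, value) pairs; use pairs when a
    Solr parameter such as fq must appear more than once.
    """
    query = urllib.parse.urlencode(params)
    return f"{riak_host}/search/query/{urllib.parse.quote(index)}?{query}"


def search(riak_host: str, index: str, params: Params,
           timeout: float = 10.0) -> Any:
    """Send the query with the standard-library HTTP client, decode the JSON."""
    with urllib.request.urlopen(search_url(riak_host, index, params),
                                timeout=timeout) as resp:
        return json.load(resp)
```

Usage: `search("http://riak1:8098", "famous", {"wt": "json", "q":
"age_i:[30 TO *]"})["response"]["docs"]`. Any other Solr parameter
(facets, stats, ...) is just one more entry in `params`.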
So, back to the nginx rules:
The first rule allows Banana to query Riak Search: Banana thinks it's
talking to a regular Solr, so you need an nginx rule to translate its
requests. I think you've got that part right. In my setup I added an
additional rule to allow using the Solr admin web interface, which gives
some interesting figures and options, and is useful for debugging. So
I added a rule saying that if the path starts with "internal_solr", the
request is forwarded to 8093 instead of 8098. But that's the only rule
I added to the configuration I pointed you to.
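To make the two rules concrete, here is a hypothetical Python model of
the routing decision. It is not nginx syntax and not the linked
configuration; it only mirrors the behaviour described above, assuming
Banana's requests arrive under `/solr/` and that this prefix maps directly
to `/search/` on 8098 (the real config may need a more specific rewrite
for Banana's paths).

```python
from urllib.parse import urlsplit, urlunsplit

RIAK_HTTP_PORT = 8098  # Riak HTTP API, where search queries go
RIAK_SOLR_PORT = 8093  # Riak's bundled Solr; admin UI under /internal_solr


def route(url: str) -> str:
    """Hypothetical model of the two proxy rules described above.

    - paths starting with /internal_solr are sent as-is to port 8093;
    - paths starting with /solr/ are rewritten to /search/ on port 8098.
    Anything else is rejected. The fragment is dropped, since browsers
    never send it to the server anyway.
    """
    parts = urlsplit(url)
    path = parts.path
    if path.startswith("/internal_solr"):
        port = RIAK_SOLR_PORT
    elif path.startswith("/solr/"):
        port = RIAK_HTTP_PORT
        path = "/search/" + path[len("/solr/"):]
    else:
        raise ValueError(f"no proxy rule for {path!r}")
    netloc = f"{parts.hostname}:{port}"
    return urlunsplit((parts.scheme, netloc, path, parts.query, ""))
```

For example, `route("http://riak1:80/solr/query/idx?q=*:*&rows=0")`
gives `http://riak1:8098/search/query/idx?q=*:*&rows=0`.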
Here is an example of a request I run against Riak Search.
I query a node on port 80:
$RIAK_HOST:80/solr/query/$SOLR_INDEX?stats.field=nr_requests_count_l&stats.facet=datacenter_s&q=*:*&rows=0&stats=true&fq=epoch_l:1449480300&fq=timeseries_s:requests&wt=json&indent=false
The nginx rule transforms that into:
$RIAK_HOST:8098/search/query/$SOLR_INDEX?stats.field=nr_requests_count_l&stats.facet=datacenter_s&q=*:*&rows=0&stats=true&fq=epoch_l:1449480300&fq=timeseries_s:requests&wt=json&indent=false
Disclaimer: I manually edited the request, so it's probably not 100%
valid, but at least you get the idea of what we can do: I'm using the
Solr stats feature *with* facets at the same time. In this case I'm
only interested in the stats (min/max/sum_of_squares/average) and not
the actual results, so I set rows=0.
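The same kind of stats + facets request can also be assembled in code.
A minimal Python sketch, with the field names taken from the example
above; passing parameters as (name, value) pairs lets `fq` appear more
than once, and `rows=0` skips the documents. The function name and the
`filters` argument are hypothetical.

```python
import urllib.parse
from typing import Mapping


def stats_query(field: str, facet: str, filters: Mapping[str, object]) -> str:
    """Build the query string for a Solr stats request, faceted by `facet`.

    Each (name, value) in `filters` becomes its own fq parameter, and rows=0
    asks Solr for the statistics only, without the matching documents.
    """
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("stats", "true"),
        ("stats.field", field),
        ("stats.facet", facet),
        ("wt", "json"),
    ]
    params += [("fq", f"{name}:{value}") for name, value in filters.items()]
    return urllib.parse.urlencode(params)


qs = stats_query("nr_requests_count_l", "datacenter_s",
                 {"epoch_l": 1449480300, "timeseries_s": "requests"})
# Append to f"{RIAK_HOST}/search/query/{SOLR_INDEX}?" before sending.
```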
So basically all the solr power is there :)
Hope that helps and sorry for the somewhat late answer,
dams
Thanks again Dams,
-Mark Schmidt
*From:*Damien Krotkine [mailto:dam...@krotkine.com]
*Sent:* Saturday, November 28, 2015 5:50 AM
*To:* Mark Schmidt <mschm...@orcawave.net>
*Cc:* 'riak-users' <riak-users@lists.basho.com>
*Subject:* Re: LucidWorks Banana Integration
Hi Mark,
I have successfully integrated Banana with the Riak 2.0 Solr
implementation. I simply configured nginx to act as a proxy between
Riak Search / Solr and what Banana expects. So basically:
- Install Riak 2 and Java, and enable Riak Search (follow the Basho docs)
- Install Banana
- Install nginx and use this as a base:
https://github.com/glickbot/riak-banana/blob/b9bd32242ee0a6ee133fb2804b1976a4fcd73f82/puppet/modules/riakbanana/templates/nginx.conf.erb
- Configure Banana to point to the Solr on your Riak Search.
If you need more help, feel free to ask,
dams
Mark Schmidt wrote:
Has anyone successfully integrated Banana with the Riak 2.0 Solr
implementation?
Regards,
-Mark Schmidt
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com <mailto:riak-users@lists.basho.com>
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com