I feel like the topic changed a bit, so I changed the email subject.


Mark Schmidt wrote:

Hi Dams, I appreciate the assistance.

I was able to turn up Banana fairly easily using the steps you laid out below, but I have a few questions regarding communicating with my Riak environment.

We have 6 Riak nodes sitting behind a proxy that handles both HTTP and protocol buffer load balancing. I’ve turned up a new standalone Riak node (hosts Banana/Nginx) outside of the cluster.

First of all, you're using a proxy. That is common practice and should not affect your Solr requests, as long as the proxy properly forwards each request (Riak KV, Search or Solr, on any port) to one of the nodes. I personally am not a big fan of putting a proxy in front of Riak nodes, although it's not a technical issue here. Most of the time, people put proxies in front of clusters (like Riak) so that the proxy answers the question "which nodes are up and running?" instead of the hosts initiating the requests having to answer it. Although it looks like a great idea and seems to simplify things, it brings its own issues, like network traffic concentration or a potential SPOF. To work around that, the proxy has to be made redundant, not configured as pass-through, etc. A better approach, imho, is to use something like Zookeeper or any lighter equivalent to get real-time knowledge of which nodes are up and running, and to share that knowledge with all the hosts that are going to use Riak. They can then do round-robin or random selection among the Riak nodes that are up. Anyway, that's a different topic from the present email.
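To make the client-side selection idea concrete, here is a minimal sketch in Python. It assumes each host already has an up-to-date view of node liveness (for instance fed by Zookeeper watches); the host names and the `up` dictionary are hypothetical placeholders, not part of any Riak or Zookeeper API.

```python
import random

def pick_node(nodes, is_up):
    """Return a random node among those reported as up, or None if none are."""
    live = [n for n in nodes if is_up(n)]
    return random.choice(live) if live else None

# Hypothetical hosts and liveness data; in practice this view would be
# kept current by whatever service tracks the cluster state.
nodes = ["riak1:8098", "riak2:8098", "riak3:8098"]
up = {"riak1:8098": True, "riak2:8098": False, "riak3:8098": True}

node = pick_node(nodes, up.get)  # one of the nodes currently marked up
```

Each requesting host runs this itself, so no single proxy sits in the request path.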

1) Should the Nginx config file be pointed at my HAProxy IP that handles the Riak node load balancing, or do I need to incorporate additional settings in the config to handle the 6 Riak Solr nodes?


The nginx configuration should point to your HAProxy IP. Distributed Solr requests will be forwarded to the other nodes properly, as long as the firewall rules allow them on 8098 and 8093.


2) Should I use the Riak HTTP interface port (8098) or the Solr interface port (8093) in the Nginx config file?


You should use port 8098 for all queries.


3) Is there any way to perform faceted queries or other more advanced query functionality against the Riak Solr nodes? Searching through the conversation archives, it sounds like we may be able to query the Solr nodes themselves outside of the Riak API.


In a nutshell: yes! The *whole* Solr API is available through Riak Search, because the Riak API just forwards to Solr. I recommend *not* using any special Riak client to query Riak Search, but instead using plain HTTP, with any HTTP client that your language provides. I recommend reading this page again http://docs.basho.com/riak/latest/dev/using/search/ but clicking on HTTP :)

curl "$RIAK_HOST/search/query/famous?wt=json&q=age_i:%5B30%20TO%20*%5D"

This is the example given. In my company, I've been using facet queries, stats, etc. The only limitation is the Solr version that is bundled with Riak Search (hopefully it'll be upgraded in a later release).
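As an illustration of the plain-HTTP approach, the sketch below builds a facet query URL against the same /search/query/ endpoint used in the curl example, using only the Python standard library. The host and port (localhost:8098) and the facet field name are assumptions for the example; the facet parameters are standard Solr ones.

```python
from urllib.parse import urlencode

def search_url(riak_host, index, params):
    """Build a Riak Search HTTP query URL; params are handed to Solr as-is."""
    return f"{riak_host}/search/query/{index}?{urlencode(params)}"

params = [
    ("wt", "json"),
    ("q", "age_i:[30 TO *]"),      # same range query as the curl example
    ("facet", "true"),             # standard Solr facet switch
    ("facet.field", "leader_b"),   # hypothetical field to facet on
    ("rows", "0"),                 # only the facet counts, no documents
]

# Assumed host and port; adjust to your proxy or node address.
url = search_url("http://localhost:8098", "famous", params)
# The URL can then be fetched with any HTTP client,
# e.g. urllib.request.urlopen(url).
```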

So back to nginx rules:
The first rule is to allow Banana to query Riak Search: it thinks that it's talking to a regular Solr, so you need an nginx rule to fix that. I think you've got that part right. In my setup I added one more rule to allow using the Solr admin web interface, which gives some interesting figures and options and is useful for debugging. That rule says that if the path starts with "internal_solr", the request goes to 8093 instead of being forwarded to 8098. That's the only rule I added to the configuration I pointed you to.

Here is an example of a request that I do using Riak Search:

I do a query on a node, on port 80:
$RIAK_HOST:80/solr/query/$SOLR_INDEX?stats.field=nr_requests_count_l&stats.facet=datacenter_s&q=*:*&rows=0&stats=true&fq=epoch_l:1449480300&fq=timeseries_s:requests&wt=json&indent=false
The nginx rule transforms that into:
$RIAK_HOST:8098/search/$SOLR_INDEX?stats.field=nr_requests_count_l&stats.facet=datacenter_s&q=*:*&rows=0&stats=true&fq=epoch_l:1449480300&fq=timeseries_s:requests&wt=json&indent=false

Disclaimer: I manually edited the request so it's probably not 100% valid, but at least you get the idea of what we can do: I'm using the Solr stats feature *with* facets at the same time. In this case I'm only interested in the stats (min/max/sum_of_squares/average) and not the actual results, so I set rows=0.
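For what it's worth, here is a rough sketch of how the per-datacenter figures could be pulled out of the JSON reply to such a request. It assumes the usual Solr StatsComponent layout (stats > stats_fields > field > facets); the sample dictionary below is made up purely for illustration and is not real output.

```python
def facet_stats(response, stats_field, facet_field):
    """Return {facet_value: stats_dict} from a parsed Solr stats response."""
    field = response["stats"]["stats_fields"][stats_field]
    return field.get("facets", {}).get(facet_field, {})

# Illustrative fragment with invented numbers, shaped like a Solr reply
# for stats.field=nr_requests_count_l and stats.facet=datacenter_s.
sample = {
    "response": {"numFound": 2, "docs": []},
    "stats": {"stats_fields": {"nr_requests_count_l": {
        "min": 1, "max": 5, "sum": 6, "mean": 3.0,
        "facets": {"datacenter_s": {
            "dc1": {"min": 1, "max": 1, "sum": 1, "mean": 1.0},
            "dc2": {"min": 5, "max": 5, "sum": 5, "mean": 5.0},
        }},
    }}},
}

per_dc = facet_stats(sample, "nr_requests_count_l", "datacenter_s")
for dc, stats in sorted(per_dc.items()):
    print(dc, stats["min"], stats["max"], stats["mean"])
```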

So basically all the solr power is there :)

Hope that helps and sorry for the somewhat late answer,

dams






Thanks again Dams,

-Mark Schmidt

*From:*Damien Krotkine [mailto:dam...@krotkine.com]
*Sent:* Saturday, November 28, 2015 5:50 AM
*To:* Mark Schmidt <mschm...@orcawave.net>
*Cc:* 'riak-users' <riak-users@lists.basho.com>
*Subject:* Re: LucidWorks Banana Integration

Hi Mark,

I have successfully integrated Banana with the Riak 2.0 Solr implementation. I simply configured an nginx instance to act as a proxy between Riak Search / Solr and what Banana expects. So basically:
- Install Riak 2 and Java, and enable Riak Search (follow the Basho docs)
- Install Banana
- Install nginx and use this as a base: https://github.com/glickbot/riak-banana/blob/b9bd32242ee0a6ee133fb2804b1976a4fcd73f82/puppet/modules/riakbanana/templates/nginx.conf.erb
- Configure Banana to point to the Solr on your Riak Search.

If you need more help, feel free to ask,

dams


Mark Schmidt wrote:

    Has anyone successfully integrated Banana with the Riak 2.0 Solr
    implementation?

    Regards,

    -Mark Schmidt

    _______________________________________________

    riak-users mailing list

    riak-users@lists.basho.com  <mailto:riak-users@lists.basho.com>

    http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


