Solr requests and Riak Search (was: Re: LucidWorks Banana Integration)

Damien Krotkine Mon, 07 Dec 2015 03:59:08 -0800


I feel like the topic changed a bit, so I changed the email subject.



Mark Schmidt wrote:

Hi Dams, I appreciate the assistance.
I was able to turn up Banana fairly easily using the steps you laid out below, but I have a few questions regarding communicating with my Riak environment.
We have 6 Riak nodes sitting behind a proxy that handles both HTTP and protocol buffer load balancing. I've turned up a new standalone Riak node (hosts Banana/Nginx) outside of the cluster.

First of all, you're using a proxy. It's common practice, and should not impact your Solr requests, as long as the proxy properly forwards the request (Riak KV, Search or Solr, on any port) to one of the nodes. I personally am not a big fan of using a proxy in front of Riak nodes, although it's not a technical issue here. Most of the time people use proxies in front of clusters (like Riak) to have the proxy answer the question "which nodes are up and running?" instead of answering this question on the hosts initiating the request. Although it looks like a great idea and seems to simplify things, it brings issues, like network traffic concentration or a potential SPOF. To work around that, the proxy has to be made redundant, and not configured as pass-through, etc. A better approach is, imho, to use something like Zookeeper or any lighter equivalent solution to have real-time knowledge of which nodes are up and running, and share that knowledge with all the hosts that are going to use Riak. They can then do round-robin or random selection of Riak nodes among the ones that are up and running. Anyway, that's a different topic than the present email.
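To make the client-side selection idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the host names are made up, and the `UP` dictionary stands in for whatever liveness information Zookeeper (or an equivalent) would provide.

```python
import random

def pick_node(nodes, is_up):
    """Return one node chosen at random among those reported as up."""
    live = [n for n in nodes if is_up(n)]
    if not live:
        raise RuntimeError("no Riak node is currently reachable")
    return random.choice(live)

# Hypothetical host names; UP is a stand-in for the shared liveness data.
NODES = ["riak1:8098", "riak2:8098", "riak3:8098"]
UP = {"riak1:8098": True, "riak2:8098": False, "riak3:8098": True}

node = pick_node(NODES, UP.get)
print(node)  # one of the nodes currently marked as up
```

Each host that talks to Riak would run something like this before every request (or every few requests), so no single machine sits in the traffic path.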

1) Should the Nginx config file be pointed at my HAProxy IP that handles the Riak node load balancing, or do I need to incorporate additional settings in the config to handle the 6 Riak Solr nodes?

The nginx configuration should point to your HAProxy IP. Distributed Solr requests will be forwarded to the other nodes properly, as long as they are allowed to (firewall rules) on 8098 and 8093.

2) Should I use the Riak HTTP interface port (8098) or the Solr interface port (8093) in the Nginx config file?


You should use port 8098 for all queries.

3) Is there any way to perform faceted queries or other more advanced query functionality against the Riak Solr nodes? Searching through the conversation archives, it sounds like we may be able to query the Solr nodes themselves outside of the Riak API.

In a nutshell: yes! *All* of the Solr API is available through Riak Search, because the Riak API just forwards to Solr. I recommend *not* using any special Riak client to query Riak Search, but instead using plain HTTP, with any HTTP client that your language provides. I recommend reading this page again: http://docs.basho.com/riak/latest/dev/using/search/ but clicking on HTTP :)


curl "$RIAK_HOST/search/query/famous?wt=json&q=age_i:%5B30%20TO%20*%5D"

This is the example given. In my company, I've been using facet queries, stats, etc. The only limitation is the Solr version that is bundled with Riak Search (hopefully it'll be upgraded in a later release).
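As a rough illustration of the "plain HTTP" approach, the Python sketch below builds the same kind of URL as the curl example above and adds standard Solr facet parameters. The host, port and the `leader_b` facet field are assumptions for the example; only the URL is built here, so it can be handed to whichever HTTP client you prefer.

```python
from urllib.parse import urlencode

def search_url(riak_host, index, params):
    """Build a Riak Search query URL; params are passed on to Solr as-is."""
    return f"{riak_host}/search/query/{index}?{urlencode(params)}"

url = search_url(
    "http://localhost:8098",          # assumed host and port
    "famous",                         # index name from the curl example
    [
        ("wt", "json"),
        ("q", "age_i:[30 TO *]"),     # same range query as the curl example
        ("rows", "0"),                # only the facet counts are wanted
        ("facet", "true"),
        ("facet.field", "leader_b"),  # hypothetical field to facet on
    ],
)
print(url)
```

`urlencode` takes care of escaping the brackets, spaces and `*` in the query, which is the part that is easy to get wrong by hand.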


So back to nginx rules:

The first rule is to allow Banana to query Riak Search: it thinks that it's talking to a regular Solr, so you have to have an nginx rule to fix that. I think you've got that part right. In my setup I added an additional rule to allow using the Solr admin web interface, which gives some interesting figures and options, and is useful for debugging. So I added a rule saying that if the request starts with "internal_solr", then instead of being forwarded to 8098 it goes to 8093. But that's the only rule I added to the configuration I pointed you to.
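The real rule lives in the nginx configuration; the Python below only models the routing decision described above, as a sketch. The example paths are hypothetical, and whether the "internal_solr" prefix is stripped before forwarding is not covered here.

```python
RIAK_HTTP_PORT = 8098  # Riak HTTP interface, used for regular queries
SOLR_PORT = 8093       # Solr's own interface, used for the admin pages

def upstream_port(path):
    """Pick the backend port for an incoming request path."""
    if path.lstrip("/").startswith("internal_solr"):
        return SOLR_PORT
    return RIAK_HTTP_PORT

# Hypothetical request paths, for illustration only.
print(upstream_port("/internal_solr/admin"))  # 8093
print(upstream_port("/solr/query/my_index"))  # 8098
```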


Here is an example of a request that I do using Riak Search :

I do a query on a node, on port 80:
$RIAK_HOST:80/solr/query/$SOLR_INDEX?stats.field=nr_requests_count_l&stats.facet=datacenter_s&q=*:*&rows=0&stats=true&fq=epoch_l:1449480300&fq=timeseries_s:requests&wt=json&indent=false
The nginx rule transforms that into:
$RIAK_HOST:8098/search/$SOLR_INDEX?stats.field=nr_requests_count_l&stats.facet=datacenter_s&q=*:*&rows=0&stats=true&fq=epoch_l:1449480300&fq=timeseries_s:requests&wt=json&indent=false

Disclaimer: I manually edited the request so it's probably not 100% valid, but at least you get the idea of what we can do: I'm using the Solr stats feature *with* facets at the same time. In this case I'm only interested in the stats (min/max/sum_of_squares/average) and not the actual results, so I set rows=0.
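For anyone building this kind of request from code, here is a small Python sketch of the query string only, using the parameters from the request above. The one detail worth showing is that `fq` appears twice, so the parameters are kept as a list of pairs rather than a dictionary. How you send it (host, port, path) depends on your own setup and is left out.

```python
from urllib.parse import urlencode, parse_qs

# Parameters taken from the request above; a list of pairs keeps both fq filters.
params = [
    ("q", "*:*"),
    ("rows", "0"),                           # stats only, no documents
    ("stats", "true"),
    ("stats.field", "nr_requests_count_l"),
    ("stats.facet", "datacenter_s"),
    ("fq", "epoch_l:1449480300"),
    ("fq", "timeseries_s:requests"),
    ("wt", "json"),
]
query_string = urlencode(params)

# Round-trip check: both filter queries survive the encoding.
print(parse_qs(query_string)["fq"])
```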


So basically all the solr power is there :)

Hope that helps and sorry for the somewhat late answer,

dams

Thanks again Dams,

-Mark Schmidt

*From:*Damien Krotkine [mailto:dam...@krotkine.com]
*Sent:* Saturday, November 28, 2015 5:50 AM
*To:* Mark Schmidt <mschm...@orcawave.net>
*Cc:* 'riak-users' <riak-users@lists.basho.com>
*Subject:* Re: LucidWorks Banana Integration

Hi Mark,

I have successfully integrated Banana with the Riak 2.0 Solr implementation. I simply configured nginx to act as a proxy between Riak Search / Solr and what Banana expects. So basically:

- Install Riak 2, java, and enable Riak Search (follow basho doc)
- Install banana

- install nginx and use this as a base: https://github.com/glickbot/riak-banana/blob/b9bd32242ee0a6ee133fb2804b1976a4fcd73f82/puppet/modules/riakbanana/templates/nginx.conf.erb

- configure banana to point to the solr on your riak search.

If you need more help, feel free to ask,

dams


Mark Schmidt wrote:

    Has anyone successfully integrated Banana with the Riak 2.0 Solr
    implementation?

    Regards,

    -Mark Schmidt

    _______________________________________________

    riak-users mailing list

    riak-users@lists.basho.com  <mailto:riak-users@lists.basho.com>

    http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

