We worked through this on IRC, but to summarize:

It looks like the 403s were caused by a malformed curl command. The
steps below make it work:

# Install the bucket hook.
bin/search-cmd install my_bucket

# Index the document in the bucket. Assumes the XML document is in
# document.xml.

curl -X POST -H "content-type:application/xml" \
  http://localhost:8098/riak/my_bucket/my_object --data-binary @document.xml


# Run the query.

bin/search-cmd search my_bucket "Event_EntityRef:nur"


As for the odd query results: the document had been indexed with
content-type text/plain, so the XML structure was ignored. As a result,
the entire document was tokenized and stuffed into the default field
(the "value" field).
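To illustrate the difference, here is a rough Python sketch of the two
outcomes. It is not how Search extracts fields internally; the function
names are hypothetical, and joining nested element names with "_" is an
assumption inferred from the Event_EntityRef field used in the query above.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<Event>
  <EntityRef>NUR-X5-199-456</EntityRef>
  <Nodes>
    <NodeRef>NN-CUST-7278</NodeRef>
    <NodeRef>NN-SYS-2383</NodeRef>
  </Nodes>
</Event>"""


def fields_as_plain_text(doc):
    """text/plain: the whole body lands in the default "value" field."""
    return {"value": [doc]}


def fields_as_xml(doc):
    """application/xml: one field per element path (assumed "_"-joined)."""
    fields = {}

    def walk(elem, prefix):
        name = f"{prefix}_{elem.tag}" if prefix else elem.tag
        text = (elem.text or "").strip()
        if text:  # skip elements that only contain child elements
            fields.setdefault(name, []).append(text)
        for child in elem:
            walk(child, name)

    walk(ET.fromstring(doc), "")
    return fields


print(sorted(fields_as_plain_text(DOC)))  # only the default field
print(fields_as_xml(DOC))                 # one entry per element path
```

In the text/plain case there is no EntityRef field to query at all, only
"value". In the XML case the value sits under its own field name.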


Further compounding the issue, the default analyzer tokenizes by
1) splitting on any whitespace or punctuation, 2) lowercasing all tokens,
and 3) throwing out any tokens shorter than three characters. For example,
NUR-X5-199-456 is indexed as the tokens nur, 199, and 456 (x5 is dropped),
which is why the query above searches for "nur". This has been a repeated
cause of confusion, and as a result the next version of Search will use the
whitespace analyzer by default. (The whitespace analyzer simply tokenizes
based on spaces/tabs/newlines but leaves everything else alone.)
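
To make the difference concrete, here is a rough Python sketch of the two
behaviours described above. It is not the Search analyzer code, just an
approximation: the function names are made up, and treating every
non-alphanumeric character as a separator is an assumption about what
counts as punctuation.

```python
import re


def default_analyzer(text):
    """Approximate the current default: split, lowercase, drop short tokens."""
    # Assumption: any run of non-alphanumeric characters is a separator.
    tokens = re.split(r"[^A-Za-z0-9]+", text)
    # Keep only tokens of three or more characters, lowercased.
    return [t.lower() for t in tokens if len(t) >= 3]


def whitespace_analyzer(text):
    """Split on spaces/tabs/newlines only; leave everything else alone."""
    return text.split()


print(default_analyzer("NUR-X5-199-456"))     # ['nur', '199', '456']
print(whitespace_analyzer("NUR-X5-199-456"))  # ['NUR-X5-199-456']
```

With the default behaviour the indexed terms are nur, 199, and 456, and x5
is gone because it is shorter than three characters. With whitespace
splitting the value stays a single token with its original case.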


Best,

Rusty


On Wed, Dec 29, 2010 at 10:01 AM, Sven Johansson
<johansson.s...@gmail.com> wrote:

> I am putting together an application prototype for a client that needs to
> store vast amounts of historical data.
> The data is supposed to be delivered to this application as XML, and these
> documents need to be searchable
> based on certain field values in these documents.
>
> After reading up on Riak, and riak-search more specifically, I found that
> it seemed like a very good match given
> that it is supposed to be able to understand and index XML documents out of
> the box.
> (In the end, I expect to find that using a custom index & analyzer
> factory is the more efficient solution, but
> from what I can make of the docs in the Basho wiki, the out-of-the-box
> behaviour should suffice for the prototype.)
>
> However, things are not going as smoothly as one would expect.
> As soon as I enable an index through the precommit-hook on a bucket, I am
> getting a 403 response back
> for any PUT or POST operation that does not use the content type
> "text/plain".
> It seems a bit off to me that riak-search is able to introspect both
> application/xml and application/json, but
> will not accept content declared to be of these MIME types.
> I am getting errors
>
> Having accepted this for the time being, I have gone ahead to import some
> test documents into riak-search and
> attempt to query the index.
>
> This is the format of my test documents:
>
> <?xml version="1.0"?>
> <Event>
>   <EntityRef>NUR-X5-199-456</EntityRef>
>   <Nodes>
>     <NodeRef>NN-CUST-7278</NodeRef>
>     <NodeRef>NN-CUST-9619</NodeRef>
>     <NodeRef>NN-CUST-699</NodeRef>
>     <NodeRef>NN-CUST-8184</NodeRef>
>     <NodeRef>NN-SYS-2383</NodeRef>
>    </Nodes>
> </Event>
>
> Querying the index with q=EntityRef will give me all of the results back,
> as expected.
> However, attempting to use actual values to search, like so:
> q=EntityRef:NUR-X5-199-456
> gives me nothing.
>
> I have also stored the sample XML from the wiki in the same bucket, and yes
> - a query for
> q=name:Alyssa%20P.%20Hacker does indeed return _that_ document.
>
> I am completely new to Riak, so what I know I've learned through this
> experiment and reading up
> on the wiki (which, alas, does not indulge in detail), which has left me
> with having to do some
> guesswork to get me this far.
>
> Any feedback & help on what I might be doing wrong, would be greatly
> appreciated.
>
> Thanks / Sven
>
> --
> Sven Johansson
> Phone: +46704966945
> Twitter: @svjson
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>