Can't start riaksearch

2011-05-09 Thread Greg Pascale
Hi,

I seem to have wrangled riaksearch into an inconsistent state where I
can't start it up. I've pasted the crash report I get - any ideas what might
be going on?
Thanks,
-Greg

=ERROR REPORT 9-May-2011::12:15:42 ===
js_vm_count has been deprecated. Please use map_js_vm_count to configure
the map pool.
=ERROR REPORT 9-May-2011::12:15:42 ===
js_vm_count has been deprecated. Please use reduce_js_vm_count to configure
the reduce pool.
=ERROR REPORT 9-May-2011::12:15:42 ===
js_vm_count has been deprecated. Please use hook_js_vm_count to configure
the hook callback pool.
=SUPERVISOR REPORT 9-May-2011::12:15:42 ===
  Supervisor: {local,riak_kv_sup}
  Context:    start_error
  Reason:
{{badmatch,not_found},[{riak_kv_map_master,read_entry,2},{riak_kv_map_master,dequeue_mapper,1},{riak_kv_map_master,init,1},{gen_server2,init_it,6},{proc_lib,init_p_do_apply,3}]}
  Offender:
[{pid,undefined},{name,riak_kv_map_master},{mfa,{riak_kv_map_master,start_link,[]}},{restart_type,permanent},{shutdown,3},{child_type,worker}]
=CRASH REPORT 9-May-2011::12:15:42 ===
  crasher:
    initial call: gen:init_it/7
    pid: <0.172.0>
    registered_name: []
    exception exit:
{{badmatch,not_found},[{riak_kv_map_master,read_entry,2},{riak_kv_map_master,dequeue_mapper,1},{riak_kv_map_master,init,1},{gen_server2,init_it,6},{proc_lib,init_p_do_apply,3}]}
      in function  gen_server2:init_it/6
      in call from proc_lib:init_p_do_apply/3
    ancestors: [riak_kv_sup,<0.135.0>]
    messages: []
    links:
[#Port<0.3428>,#Port<0.3432>,<0.136.0>,#Port<0.3433>,#Port<0.3430>,#Port<0.3424>,#Port<0.3426>,#Port<0.3422>]
    dictionary:
[{#Ref<0.0.0.447>,{bc_state,"/Users/greg/gitclients/clipboard/deps/builds/riak_search-0.14.0/data/mr_queue",fresh,undefined,[{filestate,read_only,"/Users/greg/gitclients/clipboard/deps/builds/riak_search-0.14.0/data/mr_queue/1304808859.bitcask.data",1304808859,{file_descriptor,prim_file,{#Port<0.3430>,21}},undefined,0},{filestate,read_only,"/Users/greg/gitclients/clipboard/deps/builds/riak_search-0.14.0/data/mr_queue/1304844075.bitcask.data",1304844075,{file_descriptor,prim_file,{#Port<0.3428>,20}},undefined,0},{filestate,read_only,"/Users/greg/gitclients/clipboard/deps/builds/riak_search-0.14.0/data/mr_queue/1304872675.bitcask.data",1304872675,{file_descriptor,prim_file,{#Port<0.3426>,19}},undefined,0},{filestate,read_only,"/Users/greg/gitclients/clipboard/deps/builds/riak_search-0.14.0/data/mr_queue/1304964599.bitcask.data",1304964599,{file_descriptor,prim_file,{#Port<0.3424>,18}},undefined,0},{filestate,read_only,"/Users/greg/gitclients/clipboard/deps/builds/riak_search-0.14.0/data/mr_queue/1304965780.bitcask.data",1304965780,{file_descriptor,prim_file,{#Port<0.3422>,17}},undefined,0}],2147483648,[{expiry_secs,-1},read_write],<<>>}}]
    trap_exit: true
    status: running
    heap_size: 6765
    stack_size: 24
    reductions: 2267083
  neighbours:
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Can't start riaksearch

2011-05-10 Thread Greg Pascale
I started from scratch with a new database and in less than a day have
gotten it into a similar state. The database is populated with a few hundred
thousand Wikipedia articles, and I've been testing out various sorts of
searches and map/reduce queries. There's only one node (my box) and I'm the
only one hitting it.

This time, I can start riaksearch and ping it successfully once. If I ping
it a second time, it goes down. Crash log is below...

-Greg

=ERROR REPORT 10-May-2011::16:34:31 ===
js_vm_count has been deprecated. Please use map_js_vm_count to configure
the map pool.
=ERROR REPORT 10-May-2011::16:34:31 ===
js_vm_count has been deprecated. Please use reduce_js_vm_count to configure
the reduce pool.
=ERROR REPORT 10-May-2011::16:34:31 ===
js_vm_count has been deprecated. Please use hook_js_vm_count to configure
the hook callback pool.
=CRASH REPORT 10-May-2011::16:34:31 ===
  crasher:
    initial call: gen:init_it/7
    pid: <0.172.0>
    registered_name: []
    exception exit:
{{badmatch,not_found},[{riak_kv_map_master,read_entry,2},{riak_kv_map_master,dequeue_mapper,1},{riak_kv_map_master,init,1},{gen_server2,init_it,6},{proc_lib,init_p_do_apply,3}]}
      in function  gen_server2:init_it/6
      in call from proc_lib:init_p_do_apply/3
    ancestors: [riak_kv_sup,<0.135.0>]
    messages: []
    links: [#Port<0.3424>,<0.136.0>,#Port<0.3425>,#Port<0.3422>]
    dictionary:
[{#Ref<0.0.0.436>,{bc_state,"/Users/greg/code/deps/builds/riak_search-0.14.0/data/mr_queue",fresh,undefined,[{filestate,read_only,"/Users/greg/code/deps/builds/riak_search-0.14.0/data/mr_queue/1304980174.bitcask.data",1304980174,{file_descriptor,prim_file,{#Port<0.3422>,17}},undefined,0}],2147483648,[{expiry_secs,-1},read_write],<<>>}}]
    trap_exit: true
    status: running
    heap_size: 1597
    stack_size: 24
    reductions: 2294998
  neighbours:
=SUPERVISOR REPORT 10-May-2011::16:34:31 ===
  Supervisor: {local,riak_kv_sup}
  Context:    start_error
  Reason:
{{badmatch,not_found},[{riak_kv_map_master,read_entry,2},{riak_kv_map_master,dequeue_mapper,1},{riak_kv_map_master,init,1},{gen_server2,init_it,6},{proc_lib,init_p_do_apply,3}]}
  Offender:
[{pid,undefined},{name,riak_kv_map_master},{mfa,{riak_kv_map_master,start_link,[]}},{restart_type,permanent},{shutdown,3},{child_type,worker}]


Re: Can't start riaksearch

2011-05-10 Thread Greg Pascale
Thanks! This seems to work.

-Greg
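
(For reference, Dan's fix below amounts to removing the stale MapReduce
queue while the node is stopped - the path comes from the crash log above,
and the control script name and location depend on your install:)

riaksearch stop
rm -rf /Users/greg/gitclients/clipboard/deps/builds/riak_search-0.14.0/data/mr_queue
riaksearch start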

On Tue, May 10, 2011 at 4:51 PM, Dan Reverri  wrote:

> Try deleting the following folder and starting Riak Search again:
>
> /Users/greg/gitclients/clipboard/deps/builds/riak_search-0.14.0/data/mr_queue
>
> The mr_queue persists scheduled MapReduce jobs to disk but in this case the
> requests that started the MapReduce jobs have likely ended.
>
> I've filed a bug to investigate this issue further:
> https://issues.basho.com/show_bug.cgi?id=1096
>
> Thanks,
> Dan
>
> Daniel Reverri
> Developer Advocate
> Basho Technologies, Inc.
> d...@basho.com
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Error when trying to use a javascript custom extractor in Riaksearch

2011-05-21 Thread Greg Pascale
I've been banging my head against the wall trying to get a javascript custom
extractor working. Here is the simplest example I could come up with to
reproduce the error.

curl -v -X PUT -H "Content-Type: application/json"
http://localhost:8098/riak/test -d @data

where @data is a file that looks like

{"props":
  {"rs_extractfun":
    {"language": "javascript",
     "source": "function(a,b){return{\"user\":\"gpascale\", \"name\":\"greg\"};}"
    }
  }
}

This completes successfully, and I can verify it by looking at the
properties of the "test" bucket.
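(For example:)

curl http://localhost:8098/riak/test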

*{"props":{"allow_mult":true,"basic_quorum":true,"big_vclock":50,"chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},"dw":"quorum","last_write_wins":false,"linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},"n_val":3,"name":"test","notfound_ok":false,"old_vclock":86400,"postcommit":[],"pr":0,"precommit":[{"mod":"riak_search_kv_hook","fun":"precommit"}],"pw":0,"r":"quorum","rs_extractfun":{"language":"javascript","source":"function(a,b){return{\"user\":\"gpascale\",
\"name\":\"greg\"};}"},"rw":"quorum","small_vclock":10,"w":"quorum","young_vclock":20}}
*

However, when I try to insert something into the bucket, I get an error

curl -X PUT http://localhost:8098/riak/test/test1 -d "Hello, world!"

{error,badarg,
 [{erlang,iolist_to_binary,
  [{hook_crashed,
   {riak_search_kv_hook,precommit,exit,
    {noproc,
     {gen_server,call,
      [riak_search_js_extract,reserve_vm,
       infinity]]},
  {wrq,append_to_response_body,2},
  {riak_kv_wm_raw,accept_doc_body,2},
  {webmachine_resource,resource_call,3},
  {webmachine_resource,do,3},
  {webmachine_decision_core,resource_call,1},
  {webmachine_decision_core,accept_helper,0},
  {webmachine_decision_core,decision,1}]}}

It doesn't matter if the thing I insert is a string, as above, or a real JSON
object that matches my schema - the error is the same. Any ideas what might
be going on here?

Thanks,
-Greg
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Error when trying to use a javascript custom extractor in Riaksearch

2011-05-22 Thread Greg Pascale
Yes, that's the reference I've been using. I went with the {struct,
JsonProplist} option because it can be set via the HTTP API.

-Greg

On Sat, May 21, 2011 at 11:48 PM, Andrew Berman  wrote:

> I'll preface this by saying I've never used this feature
>
> rs_extractfun should be set to one of the values defined in the Other
> Encodings section (
> http://wiki.basho.com/Riak-Search---Indexing-and-Querying-Riak-KV-Data.html).
> In your case, {jsanon, "function(a,b){return{\"user\":\"gpascale\",
> \"name\":\"greg\"};}"}
>
> Hope that helps,
>
> Andrew
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


riaksearch: using index docs in place of real objects

2011-05-24 Thread Greg Pascale
Hi,

In our data model, our Riak objects are flat JSON objects, and thus their
corresponding index documents are nearly identical - the only difference is
that a few fields which are ints in the Riak objects are strings in the
index doc.

Since they are so similar, we are directly using the index docs returned
from our search call, skipping the second step of doing gets on the returned
keys to retrieve the real objects.

Is this advisable? Are there any circumstances under which we might run into
consistency issues?

Thanks,
-Greg
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: riaksearch: using index docs in place of real objects

2011-05-26 Thread Greg Pascale
Eric, I believe the key is the document id, which will be the same as the
key of the corresponding object in the original bucket.

-Greg
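
(You can fetch one of these index documents directly, as Mathias describes
below - assuming the index shares the bucket's name; the bucket and key here
are placeholders:)

curl http://localhost:8098/riak/_rsid_mybucket/somekey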

On Thu, May 26, 2011 at 12:41 PM, Eric Moritz wrote:

> Out of curiosity what is the key in this URL?
> http://riak.host:8098/riak/_rsid_/key
>
> On Thu, May 26, 2011 at 9:42 AM, Mathias Meyer  wrote:
> > Greg,
> >
> > Riak Search stores indexed documents in Riak KV too, as serialized Erlang
> > terms. You can easily verify that by requesting a document from
> > http://riak.host:8098/riak/_rsid_/key.
> >
> > So whenever you query something through the Solr interface the documents
> > you get back are fetched from these buckets, and therefore the same
> > distribution and consistency properties apply to them as to objects stored
> > directly in Riak KV. Bottom line is there's nothing wrong with just using
> > them instead of fetching them again from Riak KV.
> >
> > Mathias Meyer
> > Developer Advocate, Basho Technologies
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


riak-search numFound incorrect

2011-06-13 Thread Greg Pascale
Hi,

We recently upgraded to riak-search 0.14.2 and it seems that the numFound
value returned from SOLR searches is no longer correct.

In one particular search, there are actually 22 results. If I set start = 30
and count = 10, I get 0 results as expected. However, no matter what I set
those to, numFound comes back as 114.

I thought numFound was supposed to return the number of matched documents,
disregarding start and count. Am I misunderstanding something or is this a
bug?

Thanks
-Greg
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: riak-search numFound incorrect

2011-06-30 Thread Greg Pascale
Hi Ryan,

1) I *think* so.

2) Not that I can remember.

-Greg

On Wed, Jun 22, 2011 at 5:32 PM, Ryan Zezeski  wrote:

> Hi Greg,
>
> Two questions:
>
> 1. Was this count normally correct _before_ upgrading to 14.2?
>
> 2. Have you performed a direct delete (e.g. via curl) of any keys under
> your _rsid_ bucket?
>
> -Ryan
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


riaksearch performance when numFound is high

2011-07-06 Thread Greg Pascale
Hi,

I'm looking at ways to improve riaksearch queries that produce a lot of
matches.

In my use case, I only ever want the top 20 results for any query, and
results should be ordered by date (which is encoded in the key). For
searches with few matches (numFound < ~1000), performance is great. For
searches with more matches (numFound > ~1), performance starts to lag
even though I only ever want the top 20. I assume this is because the system
needs to fetch and sort all of the results to know what the top 20 are, but
I'm hoping I can exploit the constraints of my use case in some way to
increase performance. I've looked at the following approaches.

1) AND the "text:<term>" term with a small date range (e.g. text:<term> AND
date:[<from> TO <to>]). This reduces the result set, but performance
does not improve. At best, the performance is as good as simply doing the
"text:<term>" search without the date range, and in some cases worse.

2) Same as above, but make the date an inline field. From what I could find
on the topic, it sounded like this is exactly what inline fields are for, but
I was disappointed to discover it performed far worse than even the compound
query above.

3) In this article <http://blog.inagist.com/searching-with-riaksearch>,
which I was linked to from somewhere on the basho site, the author describes
a technique in which he calls search_fold directly and stops after he's
received enough results. He claims this is possible in his case because
results are returned in key order, and he's chosen his keys to match the
desired ordering of his results. My keys have the same property, as I'm
already using the presort=key option. Is this behavior of search_fold a
lucky side-effect, or is this actually guaranteed to work?

Am I simply expecting too much of riaksearch here, or is there a way to make
this work? If all else fails, I suppose I could divide my data into more
buckets, but I'm hoping to avoid that as it would make querying much more
complex.

Thanks,
-Greg
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: riaksearch performance when numFound is high

2011-07-14 Thread Greg Pascale
Hi Ryan,

Yes, we are using 14.2.

I think I get what you are saying about inline fields. It looks like it will
fix some of our problems, but not all of them. If you'll indulge me in a
little contrived example, I think I can explain what I mean.

Let's say I'm implementing a simple twitter clone. In my system, tweets have
a user, text, date and can be either public or private - so one might look
like

{
  "username": "greg",
  "public": true,
  "text": "I bought a new guitar!",
  "date": "07/14/11"
}

Let's say that my service is popular - I have 100k users, each of whom has
published exactly 100 tweets, and exactly half of the tweets are public, the
rest are private. So I have 10 million tweets total and 5 million of them
are public.

Query 1: "username:greg AND public:false" - this finds all tweets by greg
that are private. In this case "username:greg" matches 100 results and
"public:false" matches 5 million. My tests have shown that the performance
of this compound query will be roughly equivalent to the worse of the two
("public:false") - so this query will be way too slow.
However, since "public" values are nice and small, I can inline it for a big
win.

Query 2: "text:guitar AND date:07/14/11" - this finds all tweets from today
that contain "guitar". Suppose there are 100k tweets that contain "guitar",
but it's early in the day so there have only been 1k tweets in total on
07/14/11. My result set is therefore no bigger than 1k (probably much
smaller unless all my users bought new guitars today), but this query is
still bounded by the "text:guitar" piece which matches 100k results. In this
case, inlining date wouldn't help because it's not the slow part, and
indexing on the text field isn't practical - it's too big.
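
(Via the Solr interface, those two queries would look something like the
following - the index name "tweets" is made up:)

curl 'http://localhost:8098/solr/tweets/select?q=username:greg+AND+public:false'
curl 'http://localhost:8098/solr/tweets/select?q=text:guitar+AND+date:07/14/11'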

If you could follow that, am I understanding this correctly? In our system,
we have some queries like #1 above, but since we support full text search,
many like #2 as well. Do you have any suggestions for what we could do to
make queries like #2 performant?

Thanks,
-Greg
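
(If I follow Ryan's suggestion in #2 below, the query shape for the guitar
example would be roughly this - assuming "text" is declared inline in the
schema; the filter parameter and index name are my guesses:)

curl 'http://localhost:8098/solr/tweets/select?q=date:07/14/11&filter=text:guitar'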



On Wed, Jul 13, 2011 at 7:19 PM, Ryan Zezeski  wrote:

> Greg,
>
> I'm assuming you are using 14.2.
>
> 1) There is a bug in 14.2 that will cause a (potentially very fast growing)
> memory leak when using AND.  This is unfortunate, sorry.  The good news is I
> have since patched it [1].
>
> 2) This is your best course of action, and you were so close but you've
> actually crossed your fields.  That is, the inline field should be the one
> that contains the more common term (i.e. the 'text' field).  So you should
> perform a range query on your date with a filter on the text inline field.
>  Obviously, the more terms in this field the more the index will inflate
> (space-wise), but if you can live with that then it should reduce your
> latency substantially (famous last words).  Please try this and get back to
> me.
>
> 3) That is a very well written article, props to the author.  However, I
> would leave this as a last resort.  Try what I mentioned in #2, and if
> that's not enough to get you by then let's brainstorm.
>
> [1]:
> https://github.com/basho/riak_search/commit/cd910c2519f94e9d7e8a8e21894db9d0eecdd5b4

Re: riaksearch performance when numFound is high

2011-07-15 Thread Greg Pascale
Ryan,

Thanks for the detailed reply.

We're a node.js shop and we're hitting riak via the riak-js library, which
AFAIK just uses the http client.

We haven't actually scaled out to the point where we're seeing huge #s of
results returned, but if we extrapolate from the numbers we are seeing, it's
clear that we'll start to hit these limits. We have a lot of #2 type queries
because we support full text search, and unfortunately our "text" tends to
be a lot longer than 140 characters, so I really don't think inlining it is
practical.

-- Greg
Clipboard <http://www.clipboard.com/> is hiring <http://www.clipboard.com/jobs>!


On Fri, Jul 15, 2011 at 11:49 AM, Ryan Zezeski  wrote:

> Greg,
>
> I'm curious to know how you are querying search (i.e. which client/API) and
> what your setup looks like (i.e. # of nodes, disk, network between them,
> etc).  How many of these #2 query types are you seeing and what's the
> average # of results being returned?  Is your search instance opened up to
> user queries?
>
> Ignoring any other inefficiencies in the Search code I'd say that there are
> two main points working against you in query #2
>
> 1.) Search uses a _global index_ [1] which means that, as Rusty would say,
> it's partitioned by term (as opposed to a _local index_ which would
> partition by document).  It has been found, in cases where the query
> contains less terms than the processor count, that a global index scales out
> better as it has better concurrency characteristics [1].  However, a local
> index can balance load better and in the case of #2 could possibly drop
> latency times.  When you perform that "text:guitar AND date:07/14/11" each
> term is queried by one, and only one, vnode.  These results are then
> combined by a coordinator (called a broker in [1]).  How much using a local
> index would change your observed latency is unknown to me, but it would have
> other effects that may offset any latency drops.
>
> 2.) Search is not limiting the results returned.  Instead of sorting the
> results based on relevance and only taking the top N from each query term
> it's getting all data from each query, passing it to the coordinator and
> saying "hey, you figure this out."  When you run that #2 query, as you said,
> you are bounded by your largest result set because that's the only way for
> search to know it's correctly answered the query.  Said another way, you may
> only have 10 entries for the 14th but you still need to search all 100K
> guitar entries to determine which ones fall on that date.  Performing some
> sort of relevance calculation beforehand and limiting the result
> would certainly help but it also means you won't get the full result set.  I
> don't think there's any way to have your cake and eat it too unless you can
> afford to put everything in memory.
>
> At this point in time I think inline fields will be your best bet.  Inline
> fields help dramatically here because you are essentially bringing part of
> your document into each term entry in the index.  This means you can query
> on high cardinality terms and filter on lower ones, i.e. low latency (high
> and low being relative here).  If you're worried about disk space then you
> can make use of the `only` value for the inline attribute.  This tells
> Search to store this field's value inline but _don't_ index it.  If you're
> only using the `text` field to filter results then this is exactly what you
> should do.  In fact, I would recommend you do that because any search
> against the `text` field for a corpus of tweets is probably going to have a
> large result set.
>
>
> HTH,
> -Ryan
>
> [1]: C.S. Badue.  Distributed query processing using partitioned inverted
> files.  Master's thesis, Federal University of Minas Gerais, Belo Horizonte,
> Minas Gerias, Brazil, March 2001.

riak_search: custom extractor syntax

2011-07-18 Thread Greg Pascale
Hi,

I'm trying to use a custom extractor, but I can't for the life of me seem to
get the syntax right. Even the simplest thing I can think to try won't work.

I've tried setting the rs_extractfun property as described in the
documentation - both of these methods


   - {jsanon, {Bucket, Key}}, where Bucket and Key name a Riak object that
   contains the source of a Javascript function to call for extraction.
   - {jsanon, Source}, where Source is the source of a Javascript function
   to call for extraction.

> Client:set_bucket(<<"extractTest">>, [{rs_extractfun, {jsanon,
<<"function(){return{'name':'greg', 'login':'gpascale'};}">>}}]).
> Client:set_bucket(<<"extractTest">>, [{rs_extractfun, {jsanon,
{<<"extractors">>, <<"myextractor">>}}}]).

but neither of these work.

Can somebody point out where I'm going wrong?

-- 
Greg
Clipboard <http://www.clipboard.com> is hiring <http://www.clipboard.com/jobs>!
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: riak_search: custom extractor syntax

2011-07-19 Thread Greg Pascale
I've been debugging through the riak code, and it looks like the
riak_search_js_extract process is not running, so it's not even getting to
the point of executing my JS. Has this feature ever worked?

=ERROR REPORT 19-Jul-2011::10:25:24 ===
problem invoking hook riak_search_kv_hook:precommit ->
exit:{noproc,{gen_server,call,[riak_search_js_extract,reserve_vm,infinity]}}
[{gen_server,call,3},
 {riak_kv_js_manager,blocking_dispatch,4},
 {riak_search_kv_hook,run_extract,3},
 {riak_search_kv_hook,make_indexed_doc,4},
 {riak_search_kv_hook,index_object,2},
 {riak_search_kv_hook,precommit,1},
 {riak_kv_put_fsm,invoke_hook,4},
 {riak_kv_put_fsm,precommit,2}]
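
(A quick way to confirm that pool process is missing, from an attached
console - the node name will vary:)

$ riaksearch attach
(riaksearch@127.0.0.1)1> whereis(riak_search_js_extract).
undefined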

-Greg




-- 
Greg
Clipboard <http://www.clipboard.com> is hiring <http://www.clipboard.com/jobs>!
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Precommit Hook Difficulties

2011-09-17 Thread Greg Pascale
Hi,

I'm trying to write a simple precommit hook to modify a JSON object by
removing certain fields. The simplest way to do this, I figure, is to decode
the object's value with mochijson2, remove the fields I don't want,
re-encode it, and update the value.

What happens, though, is my object ends up somehow mangled. When I inspect
it via curl, it sort of looks like JSON, but characters like "{" and ":"
seem to be replaced with garbage.

For example, what should read "hostname":"www.google.com" looks like
hostnamea"ja:la"mwww.google.coma"j

To try to diagnose the issue, I reduced my hook to the simplest possible
case. I don't even modify the JSON, I just decode the object and re-encode
exactly the same value, but I still have the problem. The code is pasted
below

precommit(RiakObject) ->
    Value = riak_object:get_value(RiakObject),
    {struct, TermList} = mochijson2:decode(Value),
    riak_object:apply_updates(
      riak_object:update_value(RiakObject,
        mochijson2:encode({struct, TermList}))).

Can anybody point out what I'm doing wrong?

-- 
Greg
Clipboard <http://www.clipboard.com> is hiring <http://www.clipboard.com/jobs>!
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Precommit Hook Difficulties

2011-09-19 Thread Greg Pascale
Actually it looks like I can remove the JSON decode/encode and I still see
the same issue. Just reading and writing the value is enough, so all I'm
doing now is

precommit(RiakObject) ->
    Value = riak_object:get_value(RiakObject),
    riak_object:apply_updates(
      riak_object:update_value(RiakObject, Value)).

I'm really baffled here. The only documentation I know of for this code is at
http://basho.github.com/riak-erlang-client/riak-erlang-client/riakc_obj.html
and it really doesn't say much.

-Greg
Clipboard



-- 
Greg
Clipboard <http://www.clipboard.com> is hiring <http://www.clipboard.com/jobs>!
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Precommit Hook Difficulties

2011-09-19 Thread Greg Pascale
Scratch that - the issue doesn't actually manifest without the
decode/encode.



-- 
Greg
Clipboard <http://www.clipboard.com> is hiring <http://www.clipboard.com/jobs>!
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Precommit Hook Difficulties

2011-09-19 Thread Greg Pascale
Thanks,

I was able to work around the issue by using regexes to do string replacement, 
thereby avoiding decoding and encoding JSON, but it would be nice to know what 
was going wrong here.

-Greg
Clipboard



On Sep 19, 2011, at 11:44 AM, Sean Cribbs wrote:

> Greg,
> 
> Your general approach looks good, and jives with my reading of the 
> riak_object module. Let us know if you still have problems with mochijson2.
> 
> Sean

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Precommit Hook Difficulties

2011-09-19 Thread Greg Pascale
Awesome, thanks.

-Greg
Clipboard



On Sep 19, 2011, at 12:53 PM, Jon Meredith wrote:

> I had a quick play with this. Mochijson2:encode returns an Erlang I/O list.
> You can convert back to a proper binary with iolist_to_binary().
>
> precommit_json_identity(Obj) ->
>     Values = riak_object:get_values(Obj),
>     {struct, TermList} = mochijson2:decode(hd(Values)),
>     riak_object:apply_updates(
>       riak_object:update_value(Obj,
>         iolist_to_binary(mochijson2:encode({struct, TermList})))).
> 
> Cheers, Jon
> 
> On Mon, Sep 19, 2011 at 1:50 PM, David Smith  wrote:
> On Sat, Sep 17, 2011 at 5:11 PM, Greg Pascale  wrote:
> > Hi,
> > I'm trying to write a simple precommit hook to modify a JSON object by
> > removing certain fields. The simplest way to do this, I figure, is to decode
> > the object's value with mochijson2, remove the fields I don't want,
> > re-encode it, and update the value.
> > What happens, though, is my object ends up somehow mangled. When I inspect
> > it via curl, it sort of looks like JSON, but characters like "{" and ":"
> > seem to be replaced with garbage.
> > For example, what should read "hostname":"www.google.com" looks
> > like hostnamea"ja:la"mwww.google.coma"j
> 
> If I had to guess, I would say that you are seeing some sort of weird
> unicode encoding of the double-quote character; maybe the data stored
> was UTF-16 encoded? I.e. is there some unicode character that would
> display a " but actually be stored as [a, "]?
> 
> D.
> 
> --
> Dave Smith
> Director, Engineering
> Basho Technologies, Inc.
> diz...@basho.com
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Riak Search Precommit Hook in 1.0

2011-09-19 Thread Greg Pascale
Hi,

I've noticed that the Riak Search precommit hook behaves in a really odd and
non-standard way in Riak 1.0. It seems that setting search:true on a bucket
automatically causes the precommit hook to be installed, and setting
search:false automatically uninstalls it.

Ok, but if I set search:true and then set precommit to [], it still shows
the search hook. Weird. And now if I set precommit to some other hook that I
wrote, it just slots it in before the search hook.

I find this really lame. I've been working on a precommit hook that needs to
come after the search hook, but that seems to be impossible now. Prior to
1.0, I could just issue the command

curl -X PUT <bucket URL> -d
'{"props":{"precommit":[{"mod":"riak_search_kv_hook","fun":"precommit"},
{<my hook>}]}}'

to install the hooks in the right order.

-- 
Greg
Clipboard
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Search Precommit Hook in 1.0

2011-09-20 Thread Greg Pascale
Hi Jon,

Thanks for getting back to me.

I'm doing full-text indexing, but I don't need to keep the text around once my 
object has been indexed, so I wrote a second precommit hook to remove the text 
from the object. This reduces the size of my objects drastically (since text 
tends to be the largest part) and has given us substantial perf gains in our 
tests.

-Greg
Clipboard



On Sep 20, 2011, at 7:54 AM, Jon Meredith wrote:

> Hi Greg,
> 
> This is a consequence of the changes we made to bucket properties to try and 
> simplify a few configuration issues.  I've filed a bug for it.
> 
> https://issues.basho.com/show_bug.cgi?id=1216
> 
> Why does your hook need to go after the search hook?  
> 
> Cheers, Jon.
> 

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Search Precommit Hook in 1.0

2011-09-20 Thread Greg Pascale
Man, it took a whole day after I complained about this to get it fixed? You 
guys are such slackers! :)

Thanks a bunch!

-- 
Greg
Clipboard

On Tuesday, September 20, 2011 at 1:47 PM, Andrew Thompson wrote:

> On Tue, Sep 20, 2011 at 10:53:28AM -0700, Greg Pascale wrote:
> > Hi Jon,
> > 
> > Thanks for getting back to me.
> > 
> > I'm doing full-text indexing, but I don't need to keep the text around once 
> > my object has been indexed, so I wrote a second precommit hook to remove 
> > the text from the object. This reduces the size of my objects drastically 
> > (since text tends to be the largest part) and has given us substantial perf 
> > gains in our tests.
> 
> I just pushed a fix for this. It should be present in the next 1.0
> prerelease.
> 
> https://github.com/basho/riak_search/pull/87
> 
> Andrew
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Search Precommit Hook in 1.0

2011-09-21 Thread Greg Pascale
A few questions:

1) Is this fix in the RC1 build that just came out today?
2) How exactly do search-cmd install, the "search" bucket property and the 
precommit hook fit together now? I'm a bit unclear on what I actually need to 
do to enable search on a bucket.

Thanks.

--
Greg
Clipboard


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Search Precommit Hook in 1.0

2011-09-21 Thread Greg Pascale
Got it.

Sounds like what I want to do is set search to true and also explicitly add the 
riak_search_kv_hook at the end of the precommit list.
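
Concretely, something like this, I take it (the bucket and hook module names
are placeholders):

curl -X PUT http://localhost:8098/riak/mybucket -H 'Content-Type: application/json' \
  -d '{"props":{"search":true,"precommit":[{"mod":"my_hook","fun":"precommit"},{"mod":"riak_search_kv_hook","fun":"precommit"}]}}'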

Thanks for all the help. 

-- 
Greg
Clipboard

On Wednesday, September 21, 2011 at 2:30 PM, Dan Reverri wrote:

> One more correction, the "install" command is not being removed. The 
> "install" command simply sets the "search" property to true.
> 
> Thanks,
> Dan
> 
> Daniel Reverri
> Developer Advocate
> Basho Technologies, Inc.
> d...@basho.com
> 
> 
> On Wed, Sep 21, 2011 at 12:01 PM, Dan Reverri <d...@basho.com> wrote:
> > On Wed, Sep 21, 2011 at 11:47 AM, Dan Reverri <d...@basho.com> wrote:
> > >  Hi Greg,
> > > 
> > > The "install" command has been removed from "search-cmd". The new 
> > > strategy is as follows:
> > > * The "search" bucket property must be "true" in order for the search 
> > > precommit to be executed
> > > * If the "precommit" property does NOT include the search precommit hook 
> > > it will be inserted at the beginning of the precommit list
> > > 
> > 
> > 
> > Correction: the search precommit is added to the end by default. 
> > 
> > > * If the "precommit" property does include the search precommit hook it 
> > > will be left in it's defined location
> > > 
> > > Thanks,
> > > Dan
> > > 
> > > Daniel Reverri
> > > Developer Advocate
> > > Basho Technologies, Inc.
> > > d...@basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Keys out of sync

2011-09-27 Thread Greg Pascale
Hi,

I've got my Riak ring in a state where streaming keys or curling a bucket with
?keys=true returns a whole bunch of keys of deleted objects.

Since this is test data, I'm going to go ahead and blow away the bitcask 
folders and start over, but I'm wondering if anybody could shed some light on 
how I might have gotten into this state and how I can get out. 
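
(With the node stopped, that's roughly the following - the data path depends
on the install:)

riaksearch stop
rm -rf data/bitcask
riaksearch start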

-- 
Greg
Clipboard
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Keys out of sync

2011-09-28 Thread Greg Pascale
 Hi Jon, 

I'm running the 1.0 RC1 build. I'm doing a bunch of basho_bench testing, so 
there's a pretty high turnover where I'm creating a bunch of keys and then 
running a script to delete them all. At some point I noticed the delete script 
failing on a whole bunch of keys and observed the state I described.

Thanks

-- 
Greg
Clipboard

On Tuesday, September 27, 2011 at 4:12 PM, Jon Meredith wrote:

> Hi Greg,
> 
> What version of riak did you experience this with? Were the keys being 
> created and deleted in cycles or were they one off create/deletes?
> 
> Cheers,
> 
>  Jon Meredith
> Basho Technologies.
> 
> On Tue, Sep 27, 2011 at 4:56 PM, Greg Pascale <g...@clipboard.com> wrote:
> >  Hi,
> > 
> > I've got my riak ring in a state where streaming keys or curling a bucket 
> > with ?keys=true returns a whole bunch of keys of deleted objects.
> > 
> >  Since this is test data, I'm going to go ahead and blow away the bitcask 
> > folders and start over, but I'm wondering if anybody could shed some light 
> > on how I might have gotten into this state and how I can get out. 
> > 
> > -- 
> > Greg
> > Clipboard
> > 
> > ___
> >  riak-users mailing list
> > riak-users@lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> > 
> 

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Custom Extractor Syntax

2011-09-30 Thread Greg Pascale
 Hi, 

Somewhere along the line, it looks like the mechanism for setting a custom 
extractor changed slightly. The documentation now says to set a property called 
search_extractor with the fields mod, fun and arg.
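
Over HTTP I'd guess that looks something like this (a sketch -- the module and 
function names here are hypothetical):

curl -X PUT http://127.0.0.1:8098/riak/clips \
  -H "Content-Type: application/json" \
  -d '{"props":{"search_extractor":{"mod":"my_extractor","fun":"extract","arg":null}}}'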

It used to be a property called rs_extractfun with the fields language, 
function and module.

The old rs_extractfun syntax still seems to work with 1.0, but I'm wondering if 
I should switch over to the new style to stay compatible with future releases. 
This also brings up an interesting question - how do I unset a bucket property?

-- 
Greg
Clipboard
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: script erlang contrib function

2011-10-03 Thread Greg Pascale
 Hey Francisco, 

If what you're looking to do is connect to Riak in Erlang without having to run 
'riak attach', try this little bit of magic.

http://www.clipboard.com/clip/LR04fvr5rXWvT__G

The value for "cookie" will be "riak" unless you've changed it. 

-- 
Greg
Clipboard

On Monday, October 3, 2011 at 6:20 AM, francisco treacy wrote:

> Please? at least a pointer!
> 
> 2011/9/29 francisco treacy <francisco.tre...@gmail.com>
> >  I'm wanting to use the `bucket_inspector` contrib function (but have zero 
> > Erlang experience).
> > 
> > Following the "usage" page, I do the following:
> > $ /opt/riak/erts-5.7.5/bin/erlc -o /tmp /tmp/bucket_inspector.erl
> > $ riak attach
> > (riak@127.0.0.1)1> code:add_path("/tmp").
> > (riak@127.0.0.1)2> m(bucket_inspector).
> > (riak@127.0.0.1)3> bucket_inspector:inspect(<<"bucket">>, 'riaksearch@127.0.0.1').
> > 
> > And it works.
> > 
> > Now, if I want to run that non-interactively, as a script, how should I 
> > proceed?
> > 
> > I played around with escript but I can't seem to load the Riak path (so the 
> > script will fail at `code` et al) 
> > 
> > Thanks,
> > 
> > Francisco 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Custom Extractor Syntax

2011-10-03 Thread Greg Pascale
 Good to know, thanks. 

Upvote for being able to unset properties over HTTP. We don't deploy erlang in 
production and rely on curl to set all our properties, so that'd be great for 
us. 

-- 
Greg
Clipboard

On Sunday, October 2, 2011 at 8:11 PM, Ryan Zezeski wrote:

> Greg,
> 
> Yes, use the new property going forward. The old property will still work but 
> at some point will be phased out. When that happens it will be documented in 
> the release notes.
> 
>  You currently can't remove a bucket property. In the case where 
> rs_extractfun is still set in a future version it shouldn't matter because it 
> will just be ignored by the code. However, I could maybe see this being an 
> issue if you wanted to change a property to no longer have any effect (e.g. 
> unset a custom extractor). Currently, as it stands, you could do this by 
> setting the property to `undefined` but that can only be achieved via the 
> Erlang client. We should probably have a way to set a sentinel value via HTTP 
> as well. 
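> 
> (From an attached Erlang console, that's roughly the following -- a sketch, 
> with the bucket name made up:)
> 
> riak_core_bucket:set_bucket(<<"clips">>, [{rs_extractfun, undefined}]).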
> 
> -Ryan
> 
> On Fri, Sep 30, 2011 at 7:51 PM, Greg Pascale <g...@clipboard.com> wrote:
> >  Hi, 
> > 
> > Somewhere along the line, it looks like the mechanism for setting a custom 
> > extractor changed slightly. The documentation now says to set a property 
> > called search_extractor with the fields mod, fun and arg. 
> > 
> > It used to be a property called rs_extractfun with the fields language, 
> > function and module.
> > 
> > The old rs_extractfun syntax still seems to work with 1.0, but I'm 
> > wondering if I should switch over the new style to stay compatible with 
> > future releases. This also brings up an interesting question - how do I 
> > unset a bucket property? 
> > 
> > -- 
> > Greg
> > Clipboard
> > 
> > ___
> >  riak-users mailing list
> > riak-users@lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> > 
> 

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Riak Search 1.0 Bug - Inline Fields

2011-10-05 Thread Greg Pascale
 Hi, 

I have uncovered what I think has to be a bug with inline field searches in 
Riak 1.0. In short, it seems that there are issues when including a field in 
both the query and filter query. I have a query in production that used to work 
with 14.2 but no longer does with 1.0.

I dove into the issue a bit and was able to come up with some simpler examples 
that illustrate what I'm seeing. I ran these on my dev box which only has about 
15 items in the clips bucket, so I'm nowhere close to the results limit. ctime 
is a field with inline set to true.

Example 1: Works

search-cmd search clips 'ctime:([98682333448080 TO 98682333448089])'

:: Searching for 'ctime:([98682333448080 TO 98682333448089])' / '' in clips...

--

index/id: clips/LR04QZTJS_wVDcb6
<<"ctime">> -> <<"98682333448084">>
p -> [0]
<<"ctime">> -> [<<"98682333448084">>]
<<"private">> -> [<<"0">>]
score -> 0.0

--

Example 2: Times out (should return 0 results)

search-cmd search clips 'ctime:([98682333448080 TO 98682333448089])' 'ctime:(99)'

:: Searching for 'ctime:([98682333448080 TO 98682333448089])' / 'ctime:(99)' in clips...

--

:: ERROR: timeout

Example 3: Times out (should return 1 result)

search-cmd search clips 'ctime:(98682333448084)' 'ctime:(98682333448084)'

:: Searching for 'ctime:(98682333448084)' / 'ctime:(98682333448084)' in clips...

--

:: ERROR: {badarg,[{lists,member,[<<"98682333448084">>,<<"98682333448084">>]},
                   {riak_search_inlines,passes_inlines_1,3},
                   {lists,all,2},
                   {mi_server,iterate,6},
                   {mi_server,lookup,8}]}


Thanks

-- 
Greg
Clipboard
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Search 1.0 Bug - Inline Fields

2011-10-06 Thread Greg Pascale
Anybody have any ideas on this one? I was able to deploy a workaround, but it 
involved some ugly special case logic in my search code. 

Also, the last example should say "crashes", not "times out". It returned with 
that error immediately. 

-- 
Greg
Clipboard

On Wednesday, October 5, 2011 at 11:31 AM, Greg Pascale wrote:

>  Hi, 
> 
> I have uncovered what I think has to be a bug with inline field searches in 
> Riak 1.0. In short, it seems that there are issues when including a field in 
> both the query and filter query. I have a query in production that used to 
> work with 0.14.2 but no longer does with 1.0.
> 
> I dove into the issue a bit and was able to come up with some simpler 
> examples that illustrate what I'm seeing. I ran these on my dev box which 
> only has about 15 items in the clips bucket, so I'm nowhere close to the 
> results limit. ctime is a field with inline set to true.
> 
> Example 1: Works
> 
> search-cmd search clips 'ctime:([98682333448080 TO 98682333448089])'
> 
>   :: Searching for 'ctime:([98682333448080 TO 98682333448089])' / '' in 
> clips...
> 
> --
> 
> index/id: clips/LR04QZTJS_wVDcb6
> <<"ctime">> -> <<"98682333448084">>
> p -> [0]
> <<"ctime">> -> [<<"98682333448084">>]
> <<"private">> -> [<<"0">>]
> score -> 0.0
> 
> --
> 
> Example 2: Times out (should return 0 results)
> 
> search-cmd search clips 'ctime:([98682333448080 TO 98682333448089])' 'ctime:(99)'
> 
> :: Searching for 'ctime:([98682333448080 TO 98682333448089])' / 'ctime:(99)' in clips...
> 
> --
> 
> :: ERROR: timeout
> 
> Example 3: Times out (should return 1 result)
> 
> search-cmd search clips 'ctime:(98682333448084)' 'ctime:(98682333448084)'
> 
>  :: Searching for 'ctime:(98682333448084)' / 'ctime:(98682333448084)' in 
> clips...
> 
> --
> 
>  :: ERROR: 
> {badarg,[{lists,member,[<<"98682333448084">>,<<"98682333448084">>]},
> {riak_search_inlines,passes_inlines_1,3},
> {lists,all,2},
> {mi_server,iterate,6},
> {mi_server,lookup,8}]}
> 
> 
> Thanks
> 
> -- 
> Greg
> Clipboard

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Search 1.0 Bug - Inline Fields

2011-10-06 Thread Greg Pascale
 Sounds good. Thanks for looking into this, Rusty.

-- 
Greg
Clipboard

On Thursday, October 6, 2011 at 2:34 PM, Rusty Klophaus wrote:

> Hi Greg,
> 
> Did some investigation into this issue. This is indeed a bug introduced in 
> 1.0. 
> 
> Specifically, it affects filtering on inline fields when that field is also 
> used in the search query. In other words, if your main search is on field 
> "ctime" and the filter is on field "foo", then this bug is not triggered. 
> 
> This issue is tracked here: https://issues.basho.com/1240
> 
> There is no easy workaround, so your special cased search logic is probably 
> the best route until this is fixed. 
> 
> Best,
> Rusty
> 
> 
> On Thu, Oct 6, 2011 at 3:21 PM, Greg Pascale <g...@clipboard.com> wrote:
> > Anybody have any ideas on this one? I was able to deploy a workaround, but 
> > it involved some ugly special case logic in my search code. 
> > 
> > Also, the last example should say "crashes", not "times out". It returned 
> > with that error immediately. 
> > 
> > -- 
> > Greg
> > Clipboard
> > 
> > On Wednesday, October 5, 2011 at 11:31 AM, Greg Pascale wrote:
> > 
> > >  Hi, 
> > > 
> > > I have uncovered what I think has to be a bug with inline field searches 
> > > in Riak 1.0. In short, it seems that there are issues when including a 
> > > field in both the query and filter query. I have a query in production 
> > > that used to work with 0.14.2 but no longer does with 1.0. 
> > > 
> > > I dove into the issue a bit and was able to come up with some simpler 
> > > examples that illustrate what I'm seeing. I ran these on my dev box which 
> > > only has about 15 items in the clips bucket, so I'm nowhere close to the 
> > > results limit. ctime is a field with inline set to true. 
> > > 
> > > Example 1: Works
> > > 
> > > search-cmd search clips 'ctime:([98682333448080 TO 98682333448089])'
> > > 
> > >   :: Searching for 'ctime:([98682333448080 TO 98682333448089])' / '' in 
> > > clips...
> > > 
> > > -- 
> > > 
> > > index/id: clips/LR04QZTJS_wVDcb6
> > > <<"ctime">> -> <<"98682333448084">>
> > > p -> [0]
> > > <<"ctime">> -> [<<"98682333448084">>]
> > > <<"private">> -> [<<"0">>]
> > > score -> 0.0
> > > 
> > > --
> > > 
> > > Example 2: Times out (should return 0 results) 
> > > 
> > > search-cmd search clips 'ctime:([98682333448080 TO 98682333448089])' 'ctime:(99)'
> > > 
> > > :: Searching for 'ctime:([98682333448080 TO 98682333448089])' / 'ctime:(99)' in clips...
> > > 
> > > --
> > > 
> > > :: ERROR: timeout
> > > 
> > > Example 3: Times out (should return 1 result)
> > > 
> > > search-cmd search clips 'ctime:(98682333448084)' 'ctime:(98682333448084)' 
> > > 
> > >  :: Searching for 'ctime:(98682333448084)' / 'ctime:(98682333448084)' in 
> > > clips...
> > > 
> > > -- 
> > > 
> > >  :: ERROR: 
> > > {badarg,[{lists,member,[<<"98682333448084">>,<<"98682333448084">>]},
> > > {riak_search_inlines,passes_inlines_1,3},
> > > {lists,all,2},
> > > {mi_server,iterate,6},
> > > {mi_server,lookup,8}]}
> > > 
> > > 
> > > Thanks
> > > 
> > > -- 
> > > Greg
> > > Clipboard
> > 
> > 
> > ___
> >  riak-users mailing list
> > riak-users@lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> > 
> 
> 
> 
> -- 
> Rusty Klophaus
> 
> Basho Technologies, Inc.
>  11921 Freedom Drive, Suite 550
> Reston, VA 20190
> www.basho.com
> 
> 

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


2i for single-result lookup

2011-11-07 Thread Greg Pascale
Hi, 

I'm thinking about using 2i for a certain piece of my system, but I'm worried 
that the document-based partitioning may make it suboptimal.

The issue is that the secondary fields I want to query over (email and 
username) are unique, so each will only ever map to one value. Since 2i queries 
a coverage set and I'm only ever looking for one result, it's going to be 
hitting n-1 machines needlessly.

So, what I'm looking to understand is how much overhead a single-result 2i 
lookup like this will incur vs. a primary-key lookup, or even vs. search. 
Search doesn't intuitively feel like the right tool here, but I wonder if it 
may actually be preferable since it uses term-based partitioning.
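
For concreteness, the kind of lookup I mean is something like the following 
(a sketch -- bucket, key and index names made up; 2i requires the eleveldb backend):

# store a user with a secondary index on email
curl -X PUT http://127.0.0.1:8098/buckets/users/keys/alice \
  -H 'x-riak-index-email_bin: alice@example.com' \
  -H 'Content-Type: application/json' \
  -d '{"name":"alice"}'

# exact-match 2i query that can only ever return one key
curl http://127.0.0.1:8098/buckets/users/index/email_bin/alice@example.com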

Thanks,

-- 
Greg
Clipboard
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: 2i for single-result lookup

2011-11-07 Thread Greg Pascale
Hi Nate,

 There are only 2 secondary keys for now (in addition to the primary key), but 
this number will grow to 5 or more pretty soon.  

I think when you say "insert each separately", you mean create 2 duplicate 
objects, one keyed by username and one keyed by email. Or do you mean create 
one object keyed by username, and then another object containing the username 
and keyed by email (a manual index if you will)? Code complexity is the main 
reason I'd like to avoid a solution like this. Suddenly a user create operation 
requires n writes to be considered a success. If one fails, I need to delete 
the others, etc… It quickly becomes a pain.

I don't know what you mean by "some relationship between the keys".  

--  
Greg
Clipboard

On Monday, November 7, 2011 at 5:59 PM, Nate Lawson wrote:

> On Nov 7, 2011, at 5:45 PM, Greg Pascale wrote:
>  
> > Hi,
> >  
> > I'm thinking about using 2i for a certain piece of my system, but I'm 
> > worried that the document-based partitioning may make it suboptimal.
> >  
> > The issue is that the secondary fields I want to query over (email and 
> > username) are unique, so each will only ever map to one value. Since 2i 
> > queries a coverage set, but I'm only ever looking for one result, it's 
> > going to be hitting n-1 machines needlessly.
> >  
> > So, what I'm looking to understand is how much overhead a single-result 2i 
> > lookup like this will incur vs. a primary-key lookup, or even vs. search. 
> > Search doesn't intuitively feel like the right tool here, but I wonder if 
> > it may actually be preferable since it uses term-based partitioning.
> >  
> > Thanks,
>  
>  
> If it's only 2 keys, why not insert each separately? You will double your 
> total number of keys in the db. But both search and 2I are creating extra 
> keys anyway in their private indices, so it has the same or worse effect on 
> total storage as doubling your primary keys. And query efficiency is worse, 
> as you point out.
>  
> 2I and search are more useful where there's some relationship between the 
> keys, not when they're fully independent as you point out.
>  
> -Nate
>  
>  
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: 2i for single-result lookup

2011-11-08 Thread Greg Pascale
Thanks for all the suggestions. 

Let's not worry about the problem of ensuring uniqueness right now - I have 
that part solved separately.

Rohman, this approach is what I described as a "manual index". It adds a good 
deal of code that I'm hoping to avoid by using 2i or search. This whole thing 
is a balancing act between performance, code complexity and to some extent disk 
space. I'm trying to evaluate which approach gives me the best of all 3.

Manual Index:
  - Adds lots of code complexity
  - Requires only 2 gets to look up a user - one to the index to retrieve the 
primary key, then the lookup of the primary key
  - Not much extra space used.

Search:
  - Barely any extra code
  - Have to index user objects, so about 2x as much space is required
  - Since search uses term-based partitioning, querying should be pretty fast 
in theory. Write performance will take a hit due to indexing.

2i:
  - Barely any extra code
  - Unsure of amount of extra space required - think it won't be too much
  - Because of document-based partitioning, there is overhead in talking to a 
coverage set when only one machine will have the result I'm looking for. The 
significance of this overhead is what I'm really trying to evaluate.

-- 
Greg
Clipboard

On Monday, November 7, 2011 at 8:38 PM, Antonio Rohman Fernandez wrote:

> Instead of using 2i, you could do the following when saving:
> 
> POST http://{IP}:8098/riak/users/rohman
> {"email":"roh...@mahalostudio.com 
> (mailto:roh...@mahalostudio.com)","otherdata":""}
> 
> POST http://{IP}:8098/riak/emails/roh...@mahalostudio.com
> {"owner":"rohman"}
> 
> So checking if an email address exists is only a GET 
> http://{IP}:8098/riak/emails/roh...@mahalostudio.com (and you can even know who 
> the owning user is -> rohman)
> 
> You just need to create a simple new bucket/key for it... but it's worth the 
> effort... not much work... no big key arrays, no loops, etc... instant 
> response ; ) 
> Rohman
> 
>  
> Antonio Rohman Fernandez
> CEO, Founder & Lead Engineer
> roh...@mahalostudio.com
>  
> Projects
> MaruBatsu.es (http://marubatsu.es)
> PupCloud.com (http://pupcloud.com)
> Wedding Album (http://wedding.mahalostudio.com) 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Secondary Indexes - Feedback?

2011-11-30 Thread Greg Pascale
Here at Clipboard, we make very heavy use of Riak Search and a couple of manual 
indices here and there. I've wanted to use 2i a few times but have decided 
against it for a few reasons:

1) Apprehension about the coverage set query, as Matt articulated.

2) Lack of ordering of returned results. Generally I just want the top 10 or 
so, and the ordering information is in the primary key. I can accomplish this 
with search via the presort parameter. 
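
(Over HTTP that's something like the following -- a sketch with made-up index 
and field names:)

curl 'http://127.0.0.1:8098/solr/clips/select?q=user:greg&presort=key&rows=10'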

To me, the implementations of search and 2I are backwards. Search has 
scalability issues because term-based partitioning optimizes for single-term 
queries, but creates huge hotspots making many AND queries prohibitively 
expensive. 2I's document-based partitioning makes single-term queries more 
expensive (coverage set) but should allow AND queries to scale. But 2i only 
supports single-term queries!
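
(The closest I can get today is to feed a single 2i query into map/reduce and 
filter there -- a sketch, with made-up bucket and field names:)

curl -X POST http://127.0.0.1:8098/mapred \
  -H 'Content-Type: application/json' \
  -d '{"inputs":{"bucket":"users","index":"field_a_int","key":2},
       "query":[{"map":{"language":"javascript","source":
         "function(v){var o=JSON.parse(v.values[0].data); return o.field_b==4?[v.key]:[];}"}}]}'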

-- 
Greg
Clipboard

On Monday, November 21, 2011 at 10:18 PM, Fyodor Yarochkin wrote:

> > Have you tried Secondary Indexes?
> > Does the feature help solve your problems? If not, why not? Any concerns?
> > What is your wish list for the future of Secondary Indexes?
> 
> yup. I think secondary indexes is probably one of the most-wanted
> features of this release. It has a big impact on how you are able to model
> your data. We discussed the data modeling patterns internally
> here, and the cool thing with secondary indexes is that not only are
> queries possible, but the secondary index name can also be
> thought of as a dynamic variable. Thus, as long as you can predict the
> secondary index name, you can pretty much use it like an indexed field in
> an SQL data model. One thing we have not tested yet, though: whether there
> is a limit on the number of secondary indexes for a single object, and how
> the system would behave if the number of secondary indexes for a
> particular object is huge.
> 
> Another limitation (or wouldbegoodtohave :-)) that we have
> noticed is that there is no straightforward way to query data by
> multiple secondary indexes at once. You can either do key filtering,
> or do one query, feed it to a map job, and then reduce by removing
> entries that do not match the 2nd criteria, but you cannot query by
> secondaryAval_int/2 and secondaryBval_int/4. That said, I haven't
> really looked into the inner workings of the secondary indexes
> implementation, so I am simply commenting on this from a user perspective.
> 
> Other than this, it would be interesting to hear some comparisons on the
> performance of secondary index queries vs. SOLR indexes (riak_search);
> in our experience secondary indexes perform way faster on large volumes
> of data, but this could be just my impression.
> 
> cheers,
> -Fyodor
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Secondary Indexes - Feedback?

2011-12-01 Thread Greg Pascale
> That's a concern, but you gain parallelism, compared to Search's single term 
> index.

> While they are more expensive in the sense that they require more nodes to 
> participate, they split the load between the nodes, thus overall, the work 
> should be about the same, and unless the nodes are busy with some other work, 
> it should complete sooner, as each node has less work to do.

I think your logic is flawed. Each node has fewer keys to return, but that 
doesn't mean it has that much less work. Whether you're returning 1 key or 100, 
you still have to go to disk to read from the index, and I have to imagine 
that's much more expensive than reading the keys (if there isn't a huge number 
of them). In other words, I believe the latency dominates the cost here. It's 
the same idea as how downloading 100 1MB files is slower than 1 100MB file. For 
a simple query - the only kind 2I supports - I'd rather read the whole index 
with only one disk read.


And what if I don't have a lot of keys? In many cases, my 2I lookup may only 
ever return one result. For example, imagine a Person record with secondary 
indices over email address and username. Presumably, each email address and 
username is unique, so any 2I queries I do on those fields should return one 
result. I really hate the idea that I have to talk to 1/3 of the machines in my 
ring (and they probably all have to go to disk too) just so that one of them 
can ultimately return my one result.
> Not sure why this would be a concern.  Search's presort option must have the 
> full result set before it can fully sort it, no?  There is no reason why 
> sorting the results of a 2i query should be any slower.  In addition, 2i is 
> stored in leveldb, and leveldb, like merge_index if I recall correctly, 
> stores keys and values sorted. Thus, the result set is already partially 
> ordered.

No - presort is applied to keys before any index documents are retrieved, so 
it's quite fast. Yes, the results are ordered in the index, but that doesn't 
matter to the user. The API states result ordering is undefined.


-- 
Greg
Clipboard

On Wednesday, November 30, 2011 at 3:05 PM, Elias Levy wrote:

> On Wed, Nov 30, 2011 at 1:32 PM, <riak-users-requ...@lists.basho.com> wrote:
> > Here at Clipboard, we make very heavy use of Riak Search and a couple of 
> > manual indices here and there. I've wanted to use 2i a few times but have 
> > decided against it for a few reasons:
> > 
> >  1) Apprehension about the coverage set query, as Matt articulated.
> 
> That's a concern, but you gain parallelism, compared to Search's single term 
> index.
>  
> >  2) Lack of ordering of returned results. Generally I just want the top 10 
> > or so, and the ordering information is in the primary key. I can accomplish 
> > this with search via the presort parameter.
> 
> Not sure why this would be a concern.  Search's presort option must have the 
> full result set before it can fully sort it, no?  There is no reason why 
> sorting the results of a 2i query should be any slower.  In addition, 2i is 
> stored in leveldb, and leveldb, like merge_index if I recall correctly, 
> stores keys and values sorted. Thus, the result set is already partially 
> ordered. 
> 
> > To me, the implementations of search and 2I are backwards. Search has 
> > scalability issues because term-based partitioning optimizes for 
> > single-term queries, but creates huge hotspots making many AND queries 
> > prohibitively expensive. 2I's document-based partitioning makes single-term 
> > queries more expensive (coverage set) but should allow AND queries to 
> > scale. But 2i only supports single-term queries!
> 
> While they are more expensive in the sense that they require more nodes to 
> participate, they split the load between the nodes, thus overall, the work 
> should be about the same, and unless the nodes are busy with some other work, 
> it should complete sooner, as each node has less work to do. 
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: ad-hoc queries

2011-12-02 Thread Greg Pascale
 Hey Francisco, 

I find myself doing this fairly often - mainly through curl. Definitely room 
for improvement there... 

-- 
Greg
Clipboard

On Friday, December 2, 2011 at 5:39 AM, francisco treacy wrote:

> Hi riak-users,
> 
> I just tweeted this... but here is probably the best place to ask:
> 
> Do you regularly ad-hoc query your production data?
> 
> I'm actually interested in finding out how people perform these
> queries. Use client libraries in a REPL, or scripts? Erlang by
> attaching to a node? Post the full-blown JSON with curl?
> 
> I am using riak-js, often times with riak-ql, off the node.js REPL.
> (It's quite ok, but comparing it to SQL, hell even MongoDB, it feels
> just too much work to get data out.)
> 
> Francisco
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com