Re: Logical operators in Ripple

2011-06-07 Thread Sean Cribbs
If this is done incorrectly in Ripple, please file an issue[1] on GitHub -- or 
even better, send a pull request.  We should get this fixed on the wiki[2] ASAP 
as well.

Sean Cribbs 
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

[1] https://github.com/seancribbs/ripple/issues
[2] https://github.com/basho/riak_wiki

On Jun 6, 2011, at 8:17 PM, Ryan Caught wrote:

> Thanks Jeremiah.  Nesting the "and" in another array worked great.
> 
> On Mon, Jun 6, 2011 at 1:58 PM, David Mitchell wrote:
> The "and" based key filter is not working for me either.  See: 
> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-June/004432.html
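For context, Riak key filters are expressed as nested JSON arrays, and logical operators such as "and" expect each operand to be wrapped in its own array of filters -- which is the "nesting in another array" fix described above. A hedged sketch (the predicate names and dates here are invented for illustration, not taken from the thread):

```python
import json

# Working shape: each operand of "and" is itself a *list* of filters,
# not a bare filter. This is the extra level of array nesting.
key_filter = [
    ["tokenize", "-", 1],
    ["and",
        [["greater_than_eq", "2011-06-01"]],   # operand 1: a list of filters
        [["less_than", "2011-07-01"]],         # operand 2: a list of filters
    ],
]
print(json.dumps(key_filter))
```

Without the inner arrays, the "and" operands parse as malformed filters, which matches the failure mode reported in the linked thread.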
> 
>  
> David
> 
>  
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Finding Value greater than any value

2011-06-07 Thread Muhammad Yousaf

Hi,
I am using Riak Search with the Erlang PB client, following 
https://github.com/basho/riak-erlang-client/. My schema is:

{schema,
 [
  {version, "1.1"},
  {default_field, "playername"},
  {default_op, "or"},
  {n_val, 3},
  {analyzer_factory, {erlang, text_analyzers, standard_analyzer_factory}}
 ],
 [
  {field, [
   {name, "score"},
   {type, integer},
   {padding_size, 10},
   {analyzer_factory, {erlang, text_analyzers, integer_analyzer_factory}}
  ]}
 ]
}.

I am populating my bucket with entries of the form:
Bucket, Key, [{key,value},{key,value},{score,200}]


Can I find keys with a score greater than some value, for example 500, using

riakc_pb_socket:search(Client, "player", score > 500)

I tried that with no luck. If it is possible, how? If not, how else can I do it?

Thanks in advance, and looking forward to your ideas.


Regards,
Muhammad Yousaf



Re: Finding Value greater than any value

2011-06-07 Thread Rusty Klophaus
Hi Muhammad,

Riak Search uses the Lucene query syntax. Because Lucene syntax is meant for
text searching, doing a 'greater than' query is somewhat convoluted, but
still possible:

riakc_pb_socket:search(Client, "player", "score:{500 TO 99]")

That tells Riak Search to query the "player" index, looking for documents
whose "score" field falls in a range from 500 (exclusive) up to the upper
bound (inclusive). Notice the curly brace on the left vs. the bracket on the
right: a curly brace marks an exclusive bound, a bracket an inclusive one.

Hope that helps,

Best,
Rusty
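For what it's worth, the exclusive/inclusive bracket semantics above -- and the reason the schema pads integer fields to a fixed width (padding_size) -- can be sketched in a few lines. This is an illustrative model of the behavior, not Riak Search code:

```python
# Hedged sketch: Lucene-style range semantics and zero-padding of integers.

def pad(n, width=10):
    """Zero-pad an integer so lexicographic order matches numeric order."""
    return str(n).zfill(width)

def in_range(term, low, high, incl_low, incl_high):
    """Emulate Lucene ranges: '[' / ']' are inclusive, '{' / '}' exclusive."""
    above = term >= low if incl_low else term > low
    below = term <= high if incl_high else term < high
    return above and below

# "score:{500 TO 1000]" -> exclusive lower bound, inclusive upper bound
scores = [200, 500, 501, 1000, 1001]
hits = [s for s in scores if in_range(s, 500, 1000, incl_low=False, incl_high=True)]
print(hits)                      # [501, 1000]
print(pad(200) < pad(1000))      # True: padded strings sort numerically
print(str(200) < str(1000))      # False: unpadded strings sort lexically
```

The last two lines show why padding matters: index terms are compared as strings, so without zero-padding "1000" would sort before "200".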




Re: Correct way to use pbc/mapreduce to do multiget where keys and bucket names are binary values?

2011-06-07 Thread Jacques
Have you had any success with reading the server response?

Thanks,
Jacques

On Sat, Jun 4, 2011 at 1:19 PM, Russell Brown  wrote:

>
> On 4 Jun 2011, at 18:22, Jacques wrote:
>
> I like the sound of option 3 also. I'll have a look at it this weekend and
>> get back to you.
>>
>
> Awesome!  Thanks.  If you can give me a point in the right direction
> regarding the correct typing approach and what not, I'm up for giving it a
> shot as well.
>
>
> Ok, I have a half working hack. It isn't pretty 'cos the Jinterface API is
> verbose. I've hacked the pbc.MapReduceBuilder to encode the job as
> "application/x-erlang-binary" and submit that, but really this code should
> be in a separate class, maybe using the output from
> MapReduceBuilder.getJSON() as input. That way you can get the feature
> without patching the client.
>
> What I haven't done is decode the response from Riak yet. If you want  a
> pointer here is a gist of the (unclean) hack. It could use a lot of work,
> but it proves the concept:
>
> https://gist.github.com/1008293
>
> The gist is just the diff so you can apply it as a patch
> to src/main/java/com/basho/riak/pbc/mapreduce/MapReduceBuilder.java if you
> want to play with it.
>
> You'll have to add Jinterface to your pom too.
>
> <dependency>
>   <groupId>org.erlang.otp</groupId>
>   <artifactId>jinterface</artifactId>
>   <version>1.5.4</version>
> </dependency>
>
> I think it is best to put the result decoding outside the library too. I'm
> going to hack up a poc for that now, but I thought I'd post what I have thus
> far.
>
> Cheers
>
> Russell
>
>
> Thanks again,
> Jacques
>
>


Re: Correct way to use pbc/mapreduce to do multiget where keys and bucket names are binary values?

2011-06-07 Thread Russell Brown

On 7 Jun 2011, at 15:29, Jacques wrote:

> Have you had any success with reading the server response?

Yes. Sorry I didn't post a reply. 

It was trivial (but fraught; see below): I just used the OtpInputStream to 
deserialize the byte array returned from pbc.MapReduceResponse.

Like:

ByteString bs = resp.getContent();
OtpInputStream is = new OtpInputStream(bs.toByteArray());
OtpErlangObject result = is.read_any();
// and then all sorts of looping, sniffing types, unpacking etc.


Caveats: 

Unpacking any reasonably complex result is a pain.
The ETF specification drops this doozy about "strings": 
http://www.erlang.org/doc/apps/erts/erl_ext_dist.html#id85596. My first test 
actually returned a list of [0,1,2,3...200], and Jinterface helpfully turned 
that into a string for me.

That aside it is certainly feasible to use Jinterface to serialize/deserialize 
Map/Reduce jobs/results.

Cheers

Russell
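The "strings" quirk Russell hit can be made concrete: Erlang's external term format encodes a list whose elements all fit in a byte as STRING_EXT, so a decoder legitimately sees "a string" rather than "a list of integers". Below is a toy encoder illustrating this from the ETF tag values in the linked spec -- it is not Jinterface or OTP code:

```python
import struct

# Tag bytes from the Erlang External Term Format specification.
VERSION, SMALL_INTEGER_EXT, NIL_EXT, STRING_EXT, LIST_EXT = 131, 97, 106, 107, 108

def encode_int_list(ints):
    """Mimic term_to_binary/1 for a flat list of integers 0..255."""
    if all(0 <= i <= 255 for i in ints):
        if len(ints) <= 0xFFFF:
            # STRING_EXT: 2-byte big-endian length, then raw bytes.
            # This is why [0,1,2,...,200] arrives looking like a string.
            return bytes([VERSION, STRING_EXT]) + struct.pack(">H", len(ints)) + bytes(ints)
        # Too long for STRING_EXT: fall back to a proper LIST_EXT.
        body = b"".join(bytes([SMALL_INTEGER_EXT, i]) for i in ints)
        return bytes([VERSION, LIST_EXT]) + struct.pack(">I", len(ints)) + body + bytes([NIL_EXT])
    raise ValueError("sketch only handles integers 0..255")

encoded = encode_int_list(list(range(201)))   # like the [0,1,2,...,200] result
print(encoded[1] == STRING_EXT)               # True: it is a "string" on the wire
```

So a robust decoder has to treat an ETF "string" and a list of small integers as the same logical value.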



RE: riak_search moving forward

2011-06-07 Thread Jasinek, Jason
David,

Are there plans to add additional Erlang analyzers that would produce 
additional terms?  Analyzers I would like to see added include stemming 
and n-grams.

Jason

-Original Message-
From: riak-users-boun...@lists.basho.com 
[mailto:riak-users-boun...@lists.basho.com] On Behalf Of David Smith
Sent: Monday, June 06, 2011 5:33 PM
To: riak-users
Subject: riak_search moving forward

Hi all,

One of the things we've got cooking at Basho HQ is a plan to roll the
functionality currently in the riak_search project into the default
Riak distribution. This means that using riak_search will become a
matter of flipping a configuration switch versus running a whole
separate instance. This will make it easier to use and deploy.

As part of this merging process, we've realized that while having
Java-based analyzers is useful in some circumstances, it also imposes
a lot of overhead per search-query (if you're using it) and adds an
unwieldy dependency on the JDK to the Riak server. All the deployments
we're aware of do NOT use the Java-based analyzers; i.e. the default,
Erlang-based analyzers provide the necessary functionality with much
lower overhead.

So, I'm soliciting YOUR feedback on the removal of Java-based
analyzers in riak_search. Are you using it in production and would you
miss it if it was gone?

Thanks,

D.

--
Dave Smith
Director, Engineering
Basho Technologies, Inc.
diz...@basho.com





Re: riak_search moving forward

2011-06-07 Thread David Smith
On Tue, Jun 7, 2011 at 9:02 AM, Jasinek, Jason
 wrote:
>
> Are there plans to add additional Erlang analyzers that would produce
> additional terms?  Analyzers I would like to see added include stemming
> and n-grams.

Hi Jason,

We're not currently planning on implementing those specific analyzers,
but we will be improving the documentation of the API for writing such
analyzers so that someone else could do that.

Patches, of course, are always welcome. :)

D.

-- 
Dave Smith
Director, Engineering
Basho Technologies, Inc.
diz...@basho.com



Re: riaksearch memory growth issues

2011-06-07 Thread Gordon Tillman
Guys, I have put together a simple test to reproduce the error that we are 
seeing.

It is on github here:

https://github.com/gordyt/riaksearch-test

This is a multi-threaded test that connects to Riak using the protocol buffers 
interface.  Each iteration of the run loop issues one simple search and uploads 
one small json object.

Thanks very much for any input you might have.

Regards,

--gordon

On Jun 6, 2011, at 10:01 , Gordon Tillman wrote:

Good Morning Gilbert,

I have posted this gist:

https://gist.github.com/1010384

It is a minor update we made to it_op_collector_loop/3 in 
riak_search_op_utils.  This update was done to alleviate the situation that 
we observe here:

https://gist.github.com/1000735

But it was made with the understanding that this is treating a symptom and 
not fixing the cause of the problem.

A little bit of followup information: The problem seems to be exacerbated when 
Riak is hit with a series of operations that are all generating the same 
search/map/reduce operation (albeit with differing search input parameters).

We installed 0.14.2 and tested this weekend (without our update applied) and 
observed the same issues.

If I found out anything else I will let you know.

--gordon



On May 31, 2011, at 18:09 , Gilbert Glåns wrote:

Gordon,

Great news!  Much appreciated.

Gilbert

On Tue, May 31, 2011 at 2:25 PM, Gordon Tillman <gtill...@mezeo.com> wrote:
Howdy Gilbert,

Hey we are testing a fix now.  If this works I will send you a copy of the 
update file.

--gordon


On May 31, 2011, at 12:55 , Gilbert Glåns wrote:

Hi Gordon,
Thank you for sharing the information.  We are seeing the exact same
type of behavior from our search cluster.  I have tracked the
problem(s) through the query system.  It looks like the mailboxes we
are both seeing are "abandoned" and / or the messages are never
matched within the Erlang code (it_op_collector_loop,
riak_search_op_utils.erl); the messages are then never processed, so
the resources they utilize are never released.  This is a major
problem.

I have been debugging this for some time and I wish I could say it was
going well.  The implementation is convoluted -- have you gotten
through it?  Can you verify the same cause?

We have been internally discussing the possibility of removing this
query processing implementation completely and replacing it with
something built in-house because the problems we have uncovered trying
to debug the "abandoned mailbox" problem are related and systemic:  1)
indeterminate and possibly very large data structures created and
manipulated for intermediate and final sets of results, 2) very poor
or non-existent ability to gain any insight into what is executing
within the "plumbing" of the current query execution system without
"herculean" effort (in my opinion), and 3) unacceptable performance
(predictably or subjectively) from the merge_index riak_search
backend.

Are there any other backends available for riak_search with the
Enterprise Riak offering?  I really like the design of riak_search but
the performance seems to be only a very small fraction of our
equivalent SOLR installation, even with several times the amount of
resources "thrown at it" -- it does not seem to use resources we
"throw at it" well, either, or in the mailboxes case, responsibly.

I will quickly admit I may be doing something wrong.  Is there a
user-error situation in which mailboxes should be abandoned taking up
memory?

Does anyone else have experiences with equivalent riak_search vs. SOLR
installations?

Thanks again for sharing Gordon.  Your results make me feel like this
may not be entirely stupidity on my part.

Gilbert


On Tue, May 31, 2011 at 8:51 AM, Gordon Tillman <gtill...@mezeo.com> wrote:
Howdy Gilbert,
I reproduced the issue this morning and then ran the command that you
specified on two of the non-empty mailboxes.
The output from that is posted here:
https://gist.github.com/1000735
Please let me know if this corresponds to the issue that you are seeing.
Thank you,
--gordon

On May 27, 2011, at 20:10 , Gilbert Glåns wrote:

Gordon,
Could you try:

erlang:process_info(list_to_pid("<0.16614.32>"), [messages,
current_function, initial_call, links, memory, status]).

in a riak search console for one/some of those mailboxes and share the
results? I am curious to see if you are having the same systemic
memory consumption I am experiencing.

Gilbert

On Fri, May 27, 2011 at 5:15 PM, Gordon Tillman <gtill...@mezeo.com> wrote:

Howdy Gang,

We are having a bit of an issue with our 3-node riaksearch cluster.  What is
happening is this:

Cluster is up and running.  We start testing our application against it.  As
the application runs the erlang process consumes more and more memory
without ever releasing it.

In trying to investigate the issue we ran the riaksearch-admin cluster_info
command.  It appears that the bulk of this memory is bei

Pruning (merging) after storage reaches a certain size?

2011-06-07 Thread Steve Webb

Hello there.

I'm loading a 2-node (1 GB mem, 20 GB storage, VMware VMs) riaksearch 
cluster with the Twitter Spritzer feed.  I used the bitcask 'expiry_secs' 
setting to expire data after 3 days.


I'm curious - I'm up to about 10GB of storage and I'm guessing that I'll 
be full in 3-4 more days of ingesting data.  I have no idea if/when a 
merge will run to expire the older data.


Q: Is there a method or command to force a merge at any time?
Q: Is there a way to run a merge when the storage size reaches a specific 
threshold?


- Steve

--
Steve Webb - Senior System Administrator for gnip.com
http://twitter.com/GnipWebb



Re: riaksearch memory growth issues

2011-06-07 Thread David Smith
Gordon,

Thanks for the test case. I've queued it up for review by a dev, as
time permits.

D.


speeding up riaksearch precommit indexing

2011-06-07 Thread Steve Webb

Hey there.

I'm inserting twitter spritzer tweets into a bucket that doesn't have a 
precommit index hook, and a few fields from the tweet into a second bucket 
that does have the precommit hook.


Speeds on inserts into the indexed bucket are an order of magnitude 
slower than into the non-indexed bucket.


I'm using 1 GB RAM, 20 GB disk VMware VMs in a 2-node cluster, Ubuntu 10.04, 
riaksearch 0.14.0 with n_val = 2.


Is there a way to do lazier indexing so that it doesn't slow down 
inserts so much?


- Steve

--
Steve Webb - Senior System Administrator for gnip.com
http://twitter.com/GnipWebb



Re: riaksearch memory growth issues

2011-06-07 Thread Gordon Tillman
Thanks David,

If there is anything I can do from this end to help, please don't hesitate to 
ask.

--gordon

On Jun 7, 2011, at 15:34 , David Smith wrote:

> Gordon,
> 
> Thanks for the test case. I've queued it up for review by a dev, as
> time permits.
> 
> D.

Has there been any talk of dropping the PB interface?

2011-06-07 Thread Andrew Berman
I'm curious whether there has been any talk of dropping the protocol buffers
interface in favor of one of the more user-friendly serialization libraries
that support more languages, like BERT (http://bert-rpc.org/) or
MessagePack (http://msgpack.org/).  I would think BERT is a perfect fit for
Riak, since it uses the native Erlang binary format, which would make exposing
the Erlang client pretty seamless.  I'm not sure of the speed difference, but
the fact that Google only provides PB support in three languages seems to me
a bit of a hindrance.

Thoughts?

--Andrew


Re: Has there been any talk of dropping the PB interface?

2011-06-07 Thread Mike Oxford
Protobufs is "tighter on the wire" than BERT, thanks to its predefined schema
and better packing of things like numbers.  The same goes for Thrift.
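The "better packing of numbers" can be made concrete with protobuf's base-128 varint encoding, where small unsigned integers take a single byte on the wire. This is a from-scratch sketch of the encoding, not Google's code:

```python
# Hedged sketch of protobuf's base-128 varint encoding.

def encode_varint(n):
    """Encode an unsigned int: low 7 bits per byte, MSB set on all but the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # continuation bit: more bytes follow
        else:
            out.append(byte)          # final byte: continuation bit clear
            return bytes(out)

print(len(encode_varint(5)))       # 1 byte on the wire
print(len(encode_varint(300)))     # 2 bytes
print(list(encode_varint(300)))    # [172, 2] -- the canonical protobuf example
```

A schemaless format, by contrast, has to carry type and key information alongside each value, which is where the extra bytes go.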

If you need more languages for protobuf, have a look at
http://code.google.com/p/protobuf/wiki/ThirdPartyAddOns

Rock on.

-mox




Re: Correct way to use pbc/mapreduce to do multiget where keys and bucket names are binary values?

2011-06-07 Thread Jacques
I've been working on this.  I have it working with an anonymous javascript
function.  I was hoping to move it to using the "map_object_value" built-in
erlang function.  However, when I attempt to use this function, I get
failures if any of my keys don't exist.  Is there a way to construct my map
phase so that it gracefully handles not founds and just returns what it
finds?

Also, using this function, is there a way to return the bucket and key
names?  Or can one assume the response order is identical to the order of
inputs?

I have my current constructions below.

Thanks for any help.

Jacques



*Working Function Definition (including not founds)*
  private static final OtpErlangAtom FUNCTION_TYPE = new
OtpErlangAtom("jsanon");
  private static final OtpErlangBinary FUNCTION_VALUE = new
OtpErlangBinary("function(v){ return [v]; }".getBytes());
  private static final OtpErlangTuple MAP_REDUCE_FUNCTION = new
OtpErlangTuple(new OtpErlangObject[] { FUNCTION_TYPE, FUNCTION_VALUE });

*Non-working Function Definition (works fine if all input list exists.
 Fails on not founds.)*
  private static final OtpErlangAtom FUNCTION_TYPE = new
OtpErlangAtom("modfun");
  private static final OtpErlangAtom FUNCTION_MODULE = new
OtpErlangAtom("riak_kv_mapreduce");
  private static final OtpErlangAtom FUNCTION_NAME = new
OtpErlangAtom("map_object_value");
  private static final OtpErlangTuple MAP_REDUCE_FUNCTION = new
OtpErlangTuple(new OtpErlangObject[] { FUNCTION_TYPE, FUNCTION_MODULE,
FUNCTION_NAME });


*Final construction of Query object for submission (using either of above).*
  private static final OtpErlangTuple MAP_REDUCE_PHASE = new
OtpErlangTuple(new OtpErlangObject[]{ATOM_MAP, MAP_REDUCE_FUNCTION,
ATOM_NONE, KEEP_TRUE});
  private static final OtpErlangList MAP_REDUCE_PHASES = new
OtpErlangList(new OtpErlangTuple[]{MAP_REDUCE_PHASE});
  private static final OtpErlangTuple FULL_QUERY = new OtpErlangTuple(new
OtpErlangObject[] { ATOM_QUERY, MAP_REDUCE_PHASES });
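For readers skimming the Java, the Erlang term those constants assemble is roughly the following (a sketch inferred from the constants above, not checked against the Riak source):

```erlang
%% {query, Phases} where each map phase is {map, FunSpec, StaticArg, Keep}
{query, [
    {map, {modfun, riak_kv_mapreduce, map_object_value}, none, true}
]}.
```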





bitcask hash algo

2011-06-07 Thread Aaron Blohowiak
As far as I can tell, the bitcask c_src is using the MurmurHash2 algorithm,
which has a known flaw
(https://sites.google.com/site/murmurhash/murmurhash2flaw). While this is not
*likely* to cause an issue, I was wondering if there is a reason it does not
use MurmurHash3?

If this is not the appropriate list for this question, please let me know!

- Aaron