Re: speeding up riaksearch precommit indexing

2011-06-09 Thread Rusty Klophaus
Hi Steve,

Riak does best with a lot of memory and a fast disk. Depending on how much
data you have in the system, putting two nodes into 1GB of memory on a
single VM may be causing the system to overrun available resources and page
out to disk, and depending on how you've set up your virtualized
environment, you could be paying extra costs with each disk access,
compounding the problem. My first recommendation would be to either run the
test again while monitoring disk operations using iostat to see if disk is
the problem, or to just go ahead and test on bigger hardware. I suspect you
will see much less of a performance difference between the tests once there
are ample resources.

That said, some slowdown is expected when you turn on indexing, as Riak
Search adds quite a bit of overhead in parsing and tokenizing the document,
and then storing the results.

There are two ways to speed up indexing:

   1. Reduce the size of your documents. If your documents are large, but
   you only need one or two fields indexed, you can create smaller "surrogate"
   documents with just the fields you need indexed, plus a link back to your
   original document.
   2. Batch your writes using the Solr interface. Riak Search uses
   "term-based partitioning". Term-based partitioning reduces complexity during
   queries, at the cost of increased complexity during writes.  You can gain
   back some of the lost performance by batching your writes. This allows the
   system to plan which messages it sends more intelligently, thus sending
   fewer messages and reducing overhead. The downside here is that you can't
   use the Riak KV interface; you need to switch to the Solr interface.
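
The two suggestions above can be sketched roughly as follows. This is only an illustration, not Riak Search code: the module name, the function names, and the original_bucket/original_key fields are made up for the example, and it deliberately stops short of calling any Riak or Solr API. It shows how a smaller "surrogate" document might be built from a full document, and how a list of documents might be grouped into batches before being handed to whatever performs the Solr-interface write.

```erlang
-module(surrogate_sketch).
-export([make/4, batches/2]).

%% Build a small "surrogate" document from a full document.
%% FullDoc     - proplist holding every field of the original document
%% IndexFields - hypothetical list of the field names that need indexing
%% Bucket, Key - where the original document lives, kept as a link back
make(FullDoc, IndexFields, Bucket, Key) ->
    Indexed = [{F, V} || {F, V} <- FullDoc, lists:member(F, IndexFields)],
    [{original_bucket, Bucket}, {original_key, Key} | Indexed].

%% Split a list of documents into batches of at most N documents each,
%% so that each batch can be sent as a single write.
batches(Docs, N) when is_integer(N), N > 0 ->
    batches(Docs, N, []).

batches([], _N, Acc) ->
    lists:reverse(Acc);
batches(Docs, N, Acc) when length(Docs) =< N ->
    lists:reverse([Docs | Acc]);
batches(Docs, N, Acc) ->
    {Batch, Rest} = lists:split(N, Docs),
    batches(Rest, N, [Batch | Acc]).
```

The batch size is something you would have to tune against your own data; nothing here implies a recommended value.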

Would you mind describing a bit more about the size and shape of your
data (how many objects, average object size, object format, throughput,
etc.) and ideally attach your Riak Search schema?

Thanks,
Rusty


On Tue, Jun 7, 2011 at 4:35 PM, Steve Webb  wrote:

> Hey there.
>
> I'm inserting twitter spritzer tweets into a bucket that doesn't have a
> precommit index hook, and a few fields from the tweet into a second bucket
> that does have the precommit hook.
>
> Speeds on the inserts into the indexed bucket are an order of magnitude
> slower than the non-indexed bucket.
>
> I'm using a 1GB ram, 20GB disk vmware VM, 2-node cluster, ubuntu 10.4,
> riaksearch 0.14.0 with n_val = 2.
>
> Is there a way to do lazier indexing, so that it doesn't slow down
> inserts so much?
>
> - Steve
>
> --
> Steve Webb - Senior System Administrator for gnip.com
> http://twitter.com/GnipWebb
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Question for Riak Developer Advocates

2011-06-09 Thread Srdjan Pejic
What do you guys hate about Riak right now?


Re: Erlang API

2011-06-09 Thread Anthony Molinaro
You can; I've done this. You have to modify /etc/riak/vm.args to
include and start your application (assuming your application has a start/0
function).

So if you have a foo app, you can add something like

-pa /usr/lib/erlang/lib/foo-0.0.0/ebin
-s foo_app

Then if foo_app.erl has something like

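-module(foo_app).

-behaviour(application).

%% start/0 is the entry point called by "-s foo_app" in vm.args.
-export([start/0]).
%% application behaviour callbacks
-export([start/2, stop/1]).

start() ->
    application:start(foo).

start(_Type, _Args) ->
    foo_sup:start_link().

stop(_State) ->
    ok.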

That should do it.

-Anthony

On Wed, Jun 08, 2011 at 11:49:28AM -0400, Evans, Matthew wrote:
> Hi List,
> 
> I'm new to riak, and I am thinking of using riak to store log file /
> statistics information from a client application. The main benefit riak could
> offer here is its map-reduce/search capabilities.
> 
> The client side applications are developed in a variety of languages, but in 
> the end all requests will be tunneled via a light weight API to our own 
> Erlang "admin" application.
> 
> I've noticed that the riak Erlang API really dispatches requests to the riak 
> cluster via the gen_tcp module. The logging (insert) rate could be very high, 
> so what would be neat is if I could embed our Erlang "admin" application 
> directly into the riak cluster to avoid the IPC hop between the 
> riak-erlang-client node and the riak node(s).
> 
> Now obviously I can fork the code and implement my own way of doing this, but
> the question is: is there an official way to embed your own code directly into a
> riak cluster to avoid that extra IPC hop?
> 
> Thanks
> 
> Matt



-- 

Anthony Molinaro   



Re: Question for Riak Developer Advocates

2011-06-09 Thread Scott Gonyea
While I am not a Developer Advocate, I hate that I cannot play Angry Birds 
using the Basho logo.

On Jun 9, 2011, at 10:25 AM, Srdjan Pejic wrote:

> What do you guys hate about Riak right now? 


Re: Question for Riak Developer Advocates

2011-06-09 Thread John Hornbeck
+1

hornbeck
VP Client Services
Basho Technologies 



On Jun 9, 2011, at 13:52, Scott Gonyea  wrote:

> While I am not a Developer Advocate, I hate that I cannot play Angry Birds 
> using the Basho logo.
> 
> On Jun 9, 2011, at 10:25 AM, Srdjan Pejic wrote:
> 
>> What do you guys hate about Riak right now? 


Pros/Cons to not storing JSON

2011-06-09 Thread Andrew Berman
I am using Riak through the Erlang Client API (PB). I was storing my
documents as JSON and converting them to records when I pulled them out
of Riak, but I got to thinking that maybe this isn't the greatest approach.
I'm thinking that it may be better to store each document as the record
itself (Erlang binary) and then just convert the binary back to the
record when I pull it from Riak.  I was wondering what the pros/cons are
to this approach.  Here's my list so far:

Pros:

Native Erlang is stored, so less time to convert to the record
Better support for nested records
Smaller storage requirements and hence faster on the wire (?)

Cons:

Not readable through Rekon (or other utils) without modification
Can't use standard M/R functions which analyze the document (have to write
all custom functions using Erlang)
Not portable across languages
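
For reference, the approach being weighed here might look something like the sketch below. The user record and the module name are invented for illustration, and the Riak put/get calls are left out; the only real functions used are the term_to_binary/1 and binary_to_term/1 BIFs.

```erlang
-module(record_binary_sketch).
-export([encode/1, decode/1]).

%% Hypothetical record, used only for illustration.
-record(user, {name, email}).

%% Turn the record into an Erlang external-format binary,
%% which would then be stored as the Riak object's value.
encode(#user{} = User) ->
    term_to_binary(User).

%% Turn a stored binary back into the record. The match on #user{}
%% fails fast if the binary holds something else.
decode(Bin) when is_binary(Bin) ->
    #user{} = User = binary_to_term(Bin),
    User.
```

If the binaries could ever come from an untrusted source, binary_to_term/2 with the safe option is probably worth a look.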

Thanks,

Andrew


Re: Pros/Cons to not storing JSON

2011-06-09 Thread Sean Cribbs
Andrew,

I think you're on the right track here, but I might add that you'll want to 
have upgrade paths available if you're using records -- that is, version them 
-- so that you can evolve their structure over time.  That could be a little 
hairy unless done carefully.
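
One possible shape for such an upgrade path is sketched below. It is an assumption-laden illustration rather than an established pattern from Riak itself: the user record, the version numbers, and the module name are all hypothetical. The idea is to store a version tag next to the record and upgrade old shapes on read.

```erlang
-module(versioned_sketch).
-export([encode/1, decode/1]).

%% Hypothetical current record shape: version 2 added the email field.
-record(user, {name, email}).

-define(VERSION, 2).

%% Store the version number next to the record so old data stays readable.
encode(#user{} = User) ->
    term_to_binary({?VERSION, User}).

%% Decode whatever version was stored and bring it up to the current shape.
decode(Bin) when is_binary(Bin) ->
    upgrade(binary_to_term(Bin)).

%% Version 1 stored {user, Name}; fill the new field with a default.
upgrade({1, {user, Name}}) ->
    #user{name = Name, email = undefined};
upgrade({2, #user{} = User}) ->
    User.
```

Each time the record changes, this scheme needs one more upgrade/1 clause, which is where the hairiness tends to creep in.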

That said, you could use BERT as the serialization format, which would make
implementing JavaScript M/R functions, and interop with other languages, a
little easier.

Sean Cribbs 
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On Jun 9, 2011, at 5:14 PM, Andrew Berman wrote:

> I am using Riak using the Erlang Client API (PB) and I was storing my 
> documents as JSON and then converting them to records when I pull them out of 
> Riak, but I got to thinking that maybe this isn't the greatest approach.  I'm 
> thinking that maybe it's better to store documents just as the record itself 
> (Erlang binary) and then just converting the binary back to the record when I 
> pull them from Riak.  I was wondering what the pros/cons are to this 
> approach.  Here's my list so far:
> 
> Pros:
> 
> Native Erlang is stored, so less time to convert to the record
> Better support for nested records
> Smaller storage requirements and hence faster on the wire (?)
> 
> Cons:
> 
> Not readable through Rekon (or other utils) without modification
> Can't use standard M/R functions which analyze the document (have to write 
> all custom functions using Erlang)
> Not portable across languages
> 
> Thanks,
> 
> Andrew


Re: Pros/Cons to not storing JSON

2011-06-09 Thread Will Moss
Hey Andrew,

We're using BSON (bsonspec.org), because it stores binary (and other) data
types better than JSON and is also faster and more wire efficient (sounds
like about the same reasons you're considering leaving JSON). There are also
libraries to parse it in just about every language.

I haven't tried using it in an Erlang map-reduce yet (we don't do map-reduces
for any of our production work), but there is a library out there so it
shouldn't be too hard.

Will


On Thu, Jun 9, 2011 at 2:24 PM, Sean Cribbs  wrote:

> Andrew,
>
> I think you're on the right track here, but I might add that you'll want to
> have upgrade paths available if you're using records -- that is, version
> them -- so that you can evolve their structure over time.  That could be a
> little hairy unless done carefully.
>
> That said, you could use BERT as the serialization format, making
> implementing JavaScript M/R functions a little easier, and interop with
> other languages.
>
> Sean Cribbs 
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
> On Jun 9, 2011, at 5:14 PM, Andrew Berman wrote:
>
> > I am using Riak using the Erlang Client API (PB) and I was storing my
> documents as JSON and then converting them to records when I pull them out
> of Riak, but I got to thinking that maybe this isn't the greatest approach.
>  I'm thinking that maybe it's better to store documents just as the record
> itself (Erlang binary) and then just converting the binary back to the
> record when I pull them from Riak.  I was wondering what the pros/cons are
> to this approach.  Here's my list so far:
> >
> > Pros:
> >
> > Native Erlang is stored, so less time to convert to the record
> > Better support for nested records
> > Smaller storage requirements and hence faster on the wire (?)
> >
> > Cons:
> >
> > Not readable through Rekon (or other utils) without modification
> > Can't use standard M/R functions which analyze the document (have to
> write all custom functions using Erlang)
> > Not portable across languages
> >
> > Thanks,
> >
> > Andrew


Re: Process for acceptance of pull requests

2011-06-09 Thread Mark Phillips
Hey Anthony,

Great question. We do have "Code Submission and Integration"
guidelines written up on the wiki:

http://wiki.basho.com/Code-Submission-and-Integration.html

(They were, admittedly, a touch out of date, so I just updated them to
reflect the current process.)

More specifically, a pull request via GitHub is our preferred method
of code contribution. As that link also points out, patches sent to
the mailing list and submission via issues.basho.com are also both
acceptable.

On a general note, we've seen a marked increase in the number of pull
requests over the last few months (keep them coming!) and this,
combined with internal efforts to build out new features and fix some
long standing issues, has resulted in a bit of a back log. That said,
we're working on them. I can personally assure you that each and every
pull request submitted to date is in our review queue. I'll endeavor
to make sure each new submission is acknowledged with a comment or
email response by myself or another committer. (It's I who is slacking
on this. Apologies.)

Lastly, I have it on my list of medium and long term TODOs to work
with the dev team to make our code submission and integration process
more transparent and expeditious where possible; it works, but there
are a few areas in which we can improve. The last thing we want to do
is stifle contributions by being unresponsive.

Thanks for your contributions. And definitely email me or the list
with further suggestions/comments/questions.

Mark

Community Manager
Basho Technologies
wiki.basho.com
twitter.com/pharkmillups


On Wed, Jun 8, 2011 at 11:43 AM, Anthony Molinaro
 wrote:
> Hi,
>
>  I had a pretty minor pull request against riak_kv
>
> https://github.com/basho/riak_kv/pull/105
>
> which I sent a few weeks ago, however, I've not seen any comments
> or anything.  So I wanted to understand a little better what the
> process for this sort of thing should be.  Is a github pull request
> the preferred mode of communication, or should we also send emails
> or open bugs in the bugtracker?
>
> Maybe the process is spelled out somewhere and I'm just missing it?
> Or maybe the process is just to submit a pull request and wait and
> I'm impatient ;)
>
> Thanks for the info,
>
> -Anthony
>
> --
> 
> Anthony Molinaro                           
>


Errors when using Basho Bench make results

2011-06-09 Thread Ken Perkins
I've followed the directions verbatim for setting up basho bench. I 
successfully ran a test, but now I get the following error when running "make 
results"

root@basho-bench:~/basho_bench# make results
priv/summary.r -i tests/current
/usr/bin/env: Rscript --vanilla: No such file or directory
make: *** [results] Error 127

I also tried:

root@basho-bench:~/basho_bench# priv/summary.r -i tests/current
/usr/bin/env: Rscript --vanilla: No such file or directory

These were tried on a clean linode machine using erlang latest and apt-get 
install r-base and r-base-dev. 

Can anyone offer any assistance here? Obviously the #!/usr/bin/env Rscript 
--vanilla statement is causing some problems. Rscript is correctly in path.

Thanks!

--Ken
clipboard, inc.



RE: Pros/Cons to not storing JSON

2011-06-09 Thread Evans, Matthew
Hi,

Why not convert your term to a string? Then you can do map reduce, can't you?

Term to a string...

1> Term = [{one,1},{two,2},{three,3}].
[{one,1},{two,2},{three,3}]
2> String = lists:flatten(io_lib:format("~p.", [Term])).
"[{one,1},{two,2},{three,3}]."

Save "String" in riak...

Then back to a term...

3> String = "[{one,1},{two,2},{three,3}].".
"[{one,1},{two,2},{three,3}]."
4> {ok,Tok,_} = erl_scan:string(String).
5> {ok,Term} = erl_parse:parse_term(Tok).
{ok,[{one,1},{two,2},{three,3}]}

/Matt


From: riak-users-boun...@lists.basho.com 
[mailto:riak-users-boun...@lists.basho.com] On Behalf Of Will Moss
Sent: Thursday, June 09, 2011 5:27 PM
To: Sean Cribbs
Cc: riak-users
Subject: Re: Pros/Cons to not storing JSON

Hey Andrew,

We're using BSON (bsonspec.org), because it stores binary 
(and other) data types better than JSON and is also faster and more wire 
efficient (sounds like about the same reasons you're considering leaving JSON). 
There are also libraries to parse BSON it in just about every language.

I haven't tried using it in a Erlang map-reduce yet (we don't do map-reduces 
for any of our production work), but there is a library out there so it 
shouldn't be too hard.

Will

On Thu, Jun 9, 2011 at 2:24 PM, Sean Cribbs wrote:
Andrew,

I think you're on the right track here, but I might add that you'll want to 
have upgrade paths available if you're using records -- that is, version them 
-- so that you can evolve their structure over time.  That could be a little 
hairy unless done carefully.

That said, you could use BERT as the serialization format, making implementing 
JavaScript M/R functions a little easier, and interop with other languages.

Sean Cribbs
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On Jun 9, 2011, at 5:14 PM, Andrew Berman wrote:

> I am using Riak using the Erlang Client API (PB) and I was storing my 
> documents as JSON and then converting them to records when I pull them out of 
> Riak, but I got to thinking that maybe this isn't the greatest approach.  I'm 
> thinking that maybe it's better to store documents just as the record itself 
> (Erlang binary) and then just converting the binary back to the record when I 
> pull them from Riak.  I was wondering what the pros/cons are to this 
> approach.  Here's my list so far:
>
> Pros:
>
> Native Erlang is stored, so less time to convert to the record
> Better support for nested records
> Smaller storage requirements and hence faster on the wire (?)
>
> Cons:
>
> Not readable through Rekon (or other utils) without modification
> Can't use standard M/R functions which analyze the document (have to write 
> all custom functions using Erlang)
> Not portable across languages
>
> Thanks,
>
> Andrew


Re: Pros/Cons to not storing JSON

2011-06-09 Thread Andrew Berman
Ah, yes, you're right.  Basically I'd have to either update all previous
record docs with the new field or I'd have to have multiple record
implementations to support the history of that particular record.  That
could be really, really ugly.

Thanks Sean!

On Thu, Jun 9, 2011 at 2:24 PM, Sean Cribbs  wrote:

> Andrew,
>
> I think you're on the right track here, but I might add that you'll want to
> have upgrade paths available if you're using records -- that is, version
> them -- so that you can evolve their structure over time.  That could be a
> little hairy unless done carefully.
>
> That said, you could use BERT as the serialization format, making
> implementing JavaScript M/R functions a little easier, and interop with
> other languages.
>
> Sean Cribbs 
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
> On Jun 9, 2011, at 5:14 PM, Andrew Berman wrote:
>
> > I am using Riak using the Erlang Client API (PB) and I was storing my
> documents as JSON and then converting them to records when I pull them out
> of Riak, but I got to thinking that maybe this isn't the greatest approach.
>  I'm thinking that maybe it's better to store documents just as the record
> itself (Erlang binary) and then just converting the binary back to the
> record when I pull them from Riak.  I was wondering what the pros/cons are
> to this approach.  Here's my list so far:
> >
> > Pros:
> >
> > Native Erlang is stored, so less time to convert to the record
> > Better support for nested records
> > Smaller storage requirements and hence faster on the wire (?)
> >
> > Cons:
> >
> > Not readable through Rekon (or other utils) without modification
> > Can't use standard M/R functions which analyze the document (have to
> write all custom functions using Erlang)
> > Not portable across languages
> >
> > Thanks,
> >
> > Andrew


Re: Pros/Cons to not storing JSON

2011-06-09 Thread Andrew Berman
Cool, I've looked at BSON before for another project, and it might make
sense in this case as well.

Thanks!

On Thu, Jun 9, 2011 at 2:26 PM, Will Moss  wrote:

> Hey Andrew,
>
> We're using BSON (bsonspec.org), because it stores binary (and other) data
> types better than JSON and is also faster and more wire efficient (sounds
> like about the same reasons you're considering leaving JSON). There are also
> libraries to parse BSON it in just about every language.
>
> I haven't tried using it in a Erlang map-reduce yet (we don't do
> map-reduces for any of our production work), but there is a library out
> there so it shouldn't be too hard.
>
> Will
>
>
> On Thu, Jun 9, 2011 at 2:24 PM, Sean Cribbs  wrote:
>
>> Andrew,
>>
>> I think you're on the right track here, but I might add that you'll want
>> to have upgrade paths available if you're using records -- that is, version
>> them -- so that you can evolve their structure over time.  That could be a
>> little hairy unless done carefully.
>>
>> That said, you could use BERT as the serialization format, making
>> implementing JavaScript M/R functions a little easier, and interop with
>> other languages.
>>
>> Sean Cribbs 
>> Developer Advocate
>> Basho Technologies, Inc.
>> http://basho.com/
>>
>> On Jun 9, 2011, at 5:14 PM, Andrew Berman wrote:
>>
>> > I am using Riak using the Erlang Client API (PB) and I was storing my
>> documents as JSON and then converting them to records when I pull them out
>> of Riak, but I got to thinking that maybe this isn't the greatest approach.
>>  I'm thinking that maybe it's better to store documents just as the record
>> itself (Erlang binary) and then just converting the binary back to the
>> record when I pull them from Riak.  I was wondering what the pros/cons are
>> to this approach.  Here's my list so far:
>> >
>> > Pros:
>> >
>> > Native Erlang is stored, so less time to convert to the record
>> > Better support for nested records
>> > Smaller storage requirements and hence faster on the wire (?)
>> >
>> > Cons:
>> >
>> > Not readable through Rekon (or other utils) without modification
>> > Can't use standard M/R functions which analyze the document (have to
>> write all custom functions using Erlang)
>> > Not portable across languages
>> >
>> > Thanks,
>> >
>> > Andrew


Re: Pros/Cons to not storing JSON

2011-06-09 Thread Andrew Berman
Well, I'd rather not do it that way, converting it to a string.  But
another thing I can do is convert the record to a proplist and then store
that in the database.  When I pull it out of the database, I would have to
loop through the fields of the record definition, using each field as a key
to get the value out of the proplist.  This would avoid the
issue Sean raised with storing a record directly.
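
A minimal sketch of that record-to-proplist round trip, assuming a made-up user record and module name. It relies on record_info(fields, user), which is expanded at compile time, so the field list always follows the current record definition; fields missing from a stored proplist simply come back as undefined.

```erlang
-module(record_proplist_sketch).
-export([to_proplist/1, from_proplist/1]).

%% Hypothetical record, used only for illustration.
-record(user, {name, email}).

%% Pair each field name with its value, dropping the record tag.
to_proplist(#user{} = User) ->
    lists:zip(record_info(fields, user), tl(tuple_to_list(User))).

%% Walk the fields of the current record definition and look each one
%% up in the proplist; absent keys become undefined.
from_proplist(Props) ->
    Values = [proplists:get_value(F, Props) || F <- record_info(fields, user)],
    list_to_tuple([user | Values]).
```

The proplist itself would still need to be serialized somehow (term_to_binary/1, JSON, BERT, and so on) before being written to Riak.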

On Thu, Jun 9, 2011 at 2:41 PM, Evans, Matthew  wrote:

>  Hi,
>
>
>
> Why not convert your term to a string, and then you can do map reduce can’t
> you?
>
>
>
> Term to a string…
>
>
>
> 1> Term = [{one,1},{two,2},{three,3}].
>
> [{one,1},{two,2},{three,3}]
>
> 2> String = lists:flatten(io_lib:format("~p.", [Term])).
>
> "[{one,1},{two,2},{three,3}]."
>
>
>
> Save “String” in riak…
>
>
>
> Then back to a term…
>
>
>
> 3> String = "[{one,1},{two,2},{three,3}].".
>
> "[{one,1},{two,2},{three,3}]."
>
> 4> {ok,Tok,_} = erl_scan:string(String).
>
> 5> {ok,Term} = erl_parse:parse_term(Tok).
>
> {ok,[{one,1},{two,2},{three,3}]}
>
>
>
> /Matt
>
>
>  --
>
> *From:* riak-users-boun...@lists.basho.com [mailto:
> riak-users-boun...@lists.basho.com] *On Behalf Of *Will Moss
> *Sent:* Thursday, June 09, 2011 5:27 PM
> *To:* Sean Cribbs
> *Cc:* riak-users
> *Subject:* Re: Pros/Cons to not storing JSON
>
>
>
> Hey Andrew,
>
>
>
> We're using BSON (bsonspec.org), because it stores binary (and other) data
> types better than JSON and is also faster and more wire efficient (sounds
> like about the same reasons you're considering leaving JSON). There are also
> libraries to parse BSON it in just about every language.
>
>
>
> I haven't tried using it in a Erlang map-reduce yet (we don't do
> map-reduces for any of our production work), but there is a library out
> there so it shouldn't be too hard.
>
>
>
> Will
>
>
>
> On Thu, Jun 9, 2011 at 2:24 PM, Sean Cribbs  wrote:
>
> Andrew,
>
> I think you're on the right track here, but I might add that you'll want to
> have upgrade paths available if you're using records -- that is, version
> them -- so that you can evolve their structure over time.  That could be a
> little hairy unless done carefully.
>
> That said, you could use BERT as the serialization format, making
> implementing JavaScript M/R functions a little easier, and interop with
> other languages.
>
> Sean Cribbs 
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
>
> On Jun 9, 2011, at 5:14 PM, Andrew Berman wrote:
>
> > I am using Riak using the Erlang Client API (PB) and I was storing my
> documents as JSON and then converting them to records when I pull them out
> of Riak, but I got to thinking that maybe this isn't the greatest approach.
>  I'm thinking that maybe it's better to store documents just as the record
> itself (Erlang binary) and then just converting the binary back to the
> record when I pull them from Riak.  I was wondering what the pros/cons are
> to this approach.  Here's my list so far:
> >
> > Pros:
> >
> > Native Erlang is stored, so less time to convert to the record
> > Better support for nested records
> > Smaller storage requirements and hence faster on the wire (?)
> >
> > Cons:
> >
> > Not readable through Rekon (or other utils) without modification
> > Can't use standard M/R functions which analyze the document (have to
> write all custom functions using Erlang)
> > Not portable across languages
> >
> > Thanks,
> >
> > Andrew
>


Re: Riak Recap as a Blog (?)

2011-06-09 Thread Mark Phillips
Firstly, thanks for the feedback. This is exactly what we were looking for!

It's pretty clear that people would find value in consuming the Recaps via
both email and blog/rss. So, that's what we are going to do :) Once the
initial setup is taken care of, posting a blog entry with the same content
as the plain text Recap that's sent to this list will take all of 15
minutes. That's a small price to pay IMO, and I'll happily do it to reach a
wider audience. :)

For those of you who were asking for a prettier search interface for the
list, http://riak.markmail.org/ is an invaluable resource. I'll work on
making that link more visible. We might be able to make this list more
searchable, too. I'm looking into it.

Alright. Off to build a Recap blog. Not sure when that'll be done but it
shouldn't take more than a week or two to finalize.

One other thought I just had: it might be cool for the Recap blog to have a
nifty logo. Any amateur graphic designers out there want to take a stab at
creating a "Riak Recap" logo? It pays a Riak t-shirt, stickers, and the praise
and adoration of Riak users everywhere ...

Thanks again!

Mark

On Mon, Jun 6, 2011 at 7:54 PM, Jonathan Langevin <
jlange...@loomlearning.com> wrote:

> +1 on blog.
>
> Additionally, I agree with Keith about the ability to search. While a mailing
> list is great for being able to converse via email, it seems that quite a lot
> of valuable conversations and tips get "lost" due to how ML discussions are
> archived (i.e. - the interface).
>
> For instance, it would be nice to browse in a forum-style interface
> (instead of having to dig through date-archived postings), and esp if the
> ability to respond via web existed, that would be awesome.
> Google Groups would be a good solution, as it seems to be pretty much a
> forum interface to a mailing list. Only issue is I've seen spam quite often
> in Google Groups, but I don't know if that's just due to lists not having
> proper administrators (or some type of spam filter is not enabled).
>
> Jonathan Langevin
> Systems Administrator
> Loom Inc.
> Wilmington, NC: (910) 241-0433 - jlange...@loomlearning.com -
> www.loomlearning.com - Skype: intel352
>
>
>
> On Mon, Jun 6, 2011 at 10:32 AM, Keith Bennett <
> keith.benn...@lmnsolutions.com> wrote:
>
>> Mark -
>>
>> I think that's a great idea.  I'd go even further, and suggest that the
>> valuable information discussed on the mailing list be managed in a
>> searchable form, as is the case with Google or Yahoo Groups.  (Is it
>> searchable now?  I couldn't see how to search the entire archive for a
>> keyword.)
>>
>> This would be helpful for us users of riak, and even possibly for riak
>> developers, since finding more information might prevent people from asking
>> a question that has already been asked in the past, saving them time.  In
>> this case, you might want to copy the Recap to a message on the mailing
>> list, so it's searchable there as well.
>>
>> Thanks,
>> Keith
>>
>>
>> On Jun 6, 2011, at 2:51 AM, Mark Phillips wrote:
>>
>> > Hey All -
>> >
>> > Quick question: how would you feel if we turned the Riak Recap into a
>> blog?
>> >
>> > I've spoken with various people in various channels about how to best
>> > deliver the Recap, and while it's clear that it's a valuable tool for
>> > the community, I'm not sure the Mailing List is still the best vehicle
>> > through which to publish it.
>> >
>> > Publishing it as a blog (perhaps at "recap.basho.com") makes a lot of
>> > sense as it would enable people to consume it without having to sift
>> > through the rest of the mailing list traffic (and I know there are
>> > more than a few of you who are on this ML only for the Recaps). More
>> > importantly, I think it would bring more new readers to the Recap (and
>> > more users to Riak).
>> >
>> > So, in the interest of convenience and expanding the size of the Riak
>> > community, I think making it a blog might make sense. It would still
>> > be written, published, and tweeted thrice weekly, just delivered to
>> > you in your Reader, for example, instead of on the ML.
>> >
>> > As you all are the primary consumers of the Recap, I thought I would
>> > gather some opinions before I did anything drastic. Anyone have
>> > thoughts on this?
>> >
>> > +/-1s, rants, and all other expressions of opinion are encouraged.
>> >
>> > Thanks,
>> >
>> > Mark
>> >
>> > Community Manager
>> > Basho Technologies
>> > wiki.basho.com
>> > twitter.com/pharkmillups
>> >
>> > ___
>> > riak-users mailing list
>> > riak-users@lists.basho.com
>> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Question for Riak Developer Advocates

2011-06-09 Thread Ben Tilly
I am not a developer advocate.  But my top hate is that when machines
leave/rejoin, your data can be inaccessible for some time.

We had a great case where we wanted to use Riak, but that was a
complete showstopper and we won't be using it because of that.  (We
wanted to store information which needed to be read in the event of a
machine failing.  But the machine that could fail would be on the same
cluster that was running Riak, so we'd be potentially trying to do
reads exactly when data was unavailable.)

On Thu, Jun 9, 2011 at 10:25 AM, Srdjan Pejic  wrote:
> What do you guys hate about Riak right now?


Re: Riak Recap as a Blog (?)

2011-06-09 Thread Sylvain Niles
This calls for silly hats.
It's a re-cap, heh.

http://rookery9.aviary.com.s3.amazonaws.com/8461000/8461482_11a7_625x625.jpg

If anyone likes silly hats I could certainly clean this up a lot.




On Thu, Jun 9, 2011 at 4:27 PM, Mark Phillips  wrote:

> Firstly, thanks for the feedback. This is exactly what we were looking for!
>
> It's pretty clear that people would find value in consuming the Recaps via
> both email and blog/rss. So, that's what we are going to do :) Once the
> initial setup is taken care of, posting a blog entry with the same content
> as the plain text Recap that's sent to this list will take all of 15
> minutes. That's a small price to pay IMO, and I'll happily do it to reach a
> wider audience. :)
>
> For those of you who were asking for a prettier search interface for the
> list, http://riak.markmail.org/ is an invaluable resource. I'll work on
> making that link more visible. We might be able to make this list more
> searchable, too. I'm looking into it.
>
> Alright. Off to build a Recap blog. Not sure when that'll be done but it
> shouldn't take more than a week or two to finalize.
>
> One other thought I just had: it might be cool for the Recap blog to have a
> nifty logo. Any amateur graphic designers out there want to take a stab at
> creating a "Riak Recap" logo? It pays a Riak t-shirt, stickers, and the praise
> and adoration of Riak users everywhere ...
>
> Thanks again!
>
> Mark
>
> On Mon, Jun 6, 2011 at 7:54 PM, Jonathan Langevin <
> jlange...@loomlearning.com> wrote:
>
>> +1 on blog.
>>
>> Additionally, I agree with Keith about the ability to search. While a mailing
>> list is great for being able to converse via email, it seems that quite a lot
>> of valuable conversations and tips get "lost" due to how ML discussions are
>> archived (i.e., the interface).
>>
>> For instance, it would be nice to browse in a forum-style interface
>> (instead of having to dig through date-archived postings), and esp if the
>> ability to respond via web existed, that would be awesome.
>> Google Groups would be a good solution, as it seems to be pretty much a
>> forum interface to a mailing list. Only issue is I've seen spam quite often
>> in Google Groups, but I don't know if that's just due to lists not having
>> proper administrators (or some type of spam filter is not enabled).
>>
>> Jonathan Langevin
>> Systems Administrator
>> Loom Inc.
>> Wilmington, NC: (910) 241-0433 - jlange...@loomlearning.com -
>> www.loomlearning.com - Skype: intel352
>>
>>
>>
>> On Mon, Jun 6, 2011 at 10:32 AM, Keith Bennett <
>> keith.benn...@lmnsolutions.com> wrote:
>>
>>> Mark -
>>>
>>> I think that's a great idea.  I'd go even further, and suggest that the
>>> valuable information discussed on the mailing list be managed in a
>>> searchable form, as is the case with Google or Yahoo Groups.  (Is it
>>> searchable now?  I couldn't see how to search the entire archive for a
>>> keyword.)
>>>
>>> This would be helpful for us users of Riak, and even possibly for Riak
>>> developers, since finding more information might prevent people from asking
>>> a question that has already been asked in the past, saving them time.  In
>>> this case, you might want to copy the Recap to a message on the mailing
>>> list, so it's searchable there as well.
>>>
>>> Thanks,
>>> Keith
>>>
>>>
>>> On Jun 6, 2011, at 2:51 AM, Mark Phillips wrote:
>>>
>>> > Hey All -
>>> >
>>> > Quick question: how would you feel if we turned the Riak Recap into a
>>> blog?
>>> >
>>> > I've spoken with various people in various channels about how to best
>>> > deliver the Recap, and while it's clear that it's a valuable tool for
>>> > the community, I'm not sure the Mailing List is still the best vehicle
>>> > through which to publish it.
>>> >
>>> > Publishing it as a blog (perhaps at "recap.basho.com") makes a lot of
>>> > sense as it would enable people to consume it without having to sift
>>> > through the rest of the mailing list traffic (and I know there are
>>> > more than a few of you who are on this ML only for the Recaps). More
>>> > importantly, I think it would bring more new readers to the Recap (and
>>> > more users to Riak).
>>> >
>>> > So, in the interest of convenience and expanding the size of the Riak
>>> > community, I think making it a blog might make sense. It would still
>>> > be written, published, and tweeted thrice weekly, just delivered to
>>> > you in your Reader, for example, instead of on the ML.
>>> >
>>> > As you all are the primary consumers of the Recap, I thought I would
>>> > gather some opinions before I did anything drastic. Anyone have
>>> > thoughts on this?
>>> >
>>> > +/-1s, rants, and all other expressions of opinion are encouraged.
>>> >
>>> > Thanks,
>>> >
>>> > Mark
>>> >
>>> > Community Manager
>>> > Basho Technologies
>>> > wiki.basho.com
>>> > twitter.com/pharkmillups
>>> >

Re: Question for Riak Developer Advocates

2011-06-09 Thread Ryan Zezeski
Ben,

I hate non-obvious behavior too, and it's something we constantly try to
fight at Basho.  That said, I don't think Riak is in as bad a position as
you think.  Let's see if I can convince you :)

If I'm understanding you correctly you are making two points here:

1) When performing a join/leave under load most GETs return 404 until data
transfer has completed.

2) A node in the cluster has failed and that is causing data to become
unavailable.

Assuming these are indeed your claims I counter...

1) Yes, performing a join/leave **can** cause reads to return 404s.  Just
ask Greg Nelson and he can tell you all about it.  However, I want to
emphasize the **can** qualifier here.  It depends on the # of nodes you are
going from->to.  The reason this matters is b/c this number will affect how
the claim algorithm behaves and how much data actually shifts around.

Now I can hear you saying "Yeah, but that's still brittle/broken!"  Yes, I
agree 100% with the words I just put in your mouth.  My point is simply that
there are shades of grey here and, depending on how many nodes you have, you
might never hit this case (note that 3-5 nodes **will** hit this case).  We
are actively working on a solution to this problem as we recognize its
seriousness and very much want to see it fixed.

2) This should absolutely not be happening.  This is Riak's bread and butter
use case, i.e. high availability.  My guess is I'm misunderstanding what you
are saying.

-Ryan




On Thu, Jun 9, 2011 at 8:00 PM, Ben Tilly  wrote:

> I am not a developer advocate.  But my top hate is that when machines
> leave/rejoin, your data can be inaccessible for some time.
>
> We had a great case where we wanted to use Riak, but that was a
> complete showstopper and we won't be using it because of that.  (We
> wanted to store information which needed to be read in the event of a
> machine failing.  But the machine that could fail would be on the same
> cluster that was running Riak, so we'd be potentially trying to do
> reads exactly when data was unavailable.)
>
> On Thu, Jun 9, 2011 at 10:25 AM, Srdjan Pejic  wrote:
> > What do you guys hate about Riak right now?


Re: Question for Riak Developer Advocates

2011-06-09 Thread Ben Tilly
It sounds like you understood perfectly.

Basically we are running a cluster of machines that are busy doing
lots of stuff.  We wanted to use Riak to keep configuration
information about those machines and the stuff they were doing.  So
Riak would be running on machines whose primary job is something else.
 A critical use case for us is to figure out what needs to be done on
which other machines after one of the machines goes down.  Therefore
having the potential to have our data unavailable during a failover
because of the failover kills the benefit that we wanted from a high
availability system.

We've chosen to go with the simple approach of a relational database
on external hardware in a high availability setup.  We didn't want
that dependency, but we've done enough now that we're committed to it.

On Thu, Jun 9, 2011 at 7:33 PM, Ryan Zezeski  wrote:
> Ben,
>
> I hate non-obvious behavior too, and it's something we constantly try to
> fight at Basho.  That said, I don't think Riak is in as bad a position as
> you think.  Let's see if I can convince you :)
>
> If I'm understanding you correctly you are making two points here:
>
> 1) When performing a join/leave under load most GETs return 404 until data
> transfer has completed.
>
> 2) A node in the cluster has failed and that is causing data to become
> unavailable.
>
> Assuming these are indeed your claims I counter...
>
> 1) Yes, performing a join/leave **can** cause reads to return 404s.  Just
> ask Greg Nelson and he can tell you all about it.  However, I want to
> emphasize the **can** qualifier here.  It depends on the # of nodes you are
> going from->to.  The reason this matters is b/c this number will affect how
> the claim algorithm behaves and how much data actually shifts around.
>
> Now I can hear you saying "Yeah, but that's still brittle/broken!"  Yes, I
> agree 100% with the words I just put in your mouth.  My point is simply that
> there are shades of grey here and, depending on how many nodes you have, you
> might never hit this case (note that 3-5 nodes **will** hit this case).  We
> are actively working on a solution to this problem as we recognize its
> seriousness and very much want to see it fixed.
>
> 2) This should absolutely not be happening.  This is Riak's bread and butter
> use case, i.e. high availability.  My guess is I'm misunderstanding what you
> are saying.
>
> -Ryan
>
>
>
> On Thu, Jun 9, 2011 at 8:00 PM, Ben Tilly  wrote:
>>
>> I am not a developer advocate.  But my top hate is that when machines
>> leave/rejoin, your data can be inaccessible for some time.
>>
>> We had a great case where we wanted to use Riak, but that was a
>> complete showstopper and we won't be using it because of that.  (We
>> wanted to store information which needed to be read in the event of a
>> machine failing.  But the machine that could fail would be on the same
>> cluster that was running Riak, so we'd be potentially trying to do
>> reads exactly when data was unavailable.)
>>
>> On Thu, Jun 9, 2011 at 10:25 AM, Srdjan Pejic  wrote:
>> > What do you guys hate about Riak right now?


Re: Question for Riak Developer Advocates

2011-06-09 Thread Andrew Thompson
On Thu, Jun 09, 2011 at 08:01:24PM -0700, Ben Tilly wrote:
> It sounds like you understood perfectly.
> 
> Basically we are running a cluster of machines that are busy doing
> lots of stuff.  We wanted to use Riak to keep configuration
> information about those machines and the stuff they were doing.  So
> Riak would be running on machines whose primary job is something else.
>  A critical use case for us is to figure out what needs to be done on
> which other machines after one of the machines goes down.  Therefore
> having the potential to have our data unavailable during a failover
> because of the failover kills the benefit that we wanted from a high
> availability system.

I think you're conflating two issues: the time when a *new* node is joining
the cluster or an established node is leaving it, *versus* the time when a
node in the cluster is unavailable.

In the first case, when you are changing the cluster membership (not the
list of available nodes), you will currently get notfounds as the
partition claims are changed.

In the second case (which should be the more common one if you aren't
constantly adding/removing nodes from the cluster) you will NOT receive
notfounds (assuming your n_val is > 1 and your r value is less than the
n_val). When you do a GET on a key that is stored on the downed node,
Riak will create a fallback vnode for the missing vnode, do a GET on
that key (and it should get N-1 replies), and do a read-repair on the
newly spun up fallback vnode. You should not get notfounds in this case
unless R==N or N==1 or you've had multiple nodes fail since you last
fetched this key (if the cluster is large enough this should be unlikely).
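
To make the rule above concrete, here is a minimal sketch in Python. It is
only a toy model of the description in this message, not Riak's code: the
function name simulate_get and its parameters are made up for illustration,
and it assumes every replica on a downed node is replaced by an empty
fallback vnode that answers "notfound" (read-repair between failures is
ignored).

```python
def simulate_get(n_val, r, failed_replicas):
    """Toy model of a GET outcome for one key (not Riak's implementation).

    n_val:           number of replicas stored for the key (N).
    r:               number of replies holding the value that the GET needs (R).
    failed_replicas: how many of the key's N replicas sit on downed nodes.
                     Each is assumed to be replaced by an empty fallback
                     vnode that answers "notfound".
    """
    if not 1 <= r <= n_val:
        raise ValueError("need 1 <= r <= n_val")
    if not 0 <= failed_replicas <= n_val:
        raise ValueError("need 0 <= failed_replicas <= n_val")

    # Only the surviving replicas can return the stored value.
    replies_with_value = n_val - failed_replicas
    return "ok" if replies_with_value >= r else "notfound"


if __name__ == "__main__":
    for n_val, r, failed in [(3, 2, 1), (3, 3, 1), (1, 1, 1), (3, 2, 2)]:
        print(n_val, r, failed, simulate_get(n_val, r, failed))
```

In this model one downed node is harmless with N=3, R=2, but the R==N, N==1,
and multiple-failure cases all come back notfound, matching the exceptions
listed above.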

If you want to increase your robustness further, you can increase N and
reduce R. Git HEAD also has a new GET parameter called notfound_ok,
which instructs Riak to treat notfound responses as valid responses
(counting towards R) instead of as errors.

Let me know if that clarifies things at all.

Andrew



Waiting for 'backend' bucket property to take effect

2011-06-09 Thread Greg Nelson
Hello,

I have been debugging something I've seen popping up intermittently when
running my application's functional tests against Riak (local 5 node devrel
cluster). The behavior is basically that sometimes an object which was PUT will
seemingly disappear. Any future GETs will 404, even after waiting seconds or
minutes between the PUT and the GET.

After staring at this and pulling out some hair I finally figured out what was 
happening (I think). I noticed it was always the first few objects written that 
were lost, and only on a certain bucket. My application uses multi-backend with 
two bitcask backends. That bucket is the only one which uses the non-default 
backend.

What's happening is that the application first gets the bucket properties and
then sets the "backend" prop if it's not set. You can probably guess the rest:
PUTs come into nodes which don't have the property in their ring state yet, and
those nodes store the objects in the default backend.

I don't think this is necessarily a "bug". It's expected behavior when you 
think about it, as long as you know how bucket properties are propagated. But 
even knowing that, this is pretty subtle.

Is there a good way for a client to know when the property has been gossiped to 
all the nodes? Seems like the only approach is to wait a bit after setting a 
property before doing a PUT...
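
One way to replace a fixed sleep, sketched here in Python under stated
assumptions: poll every node until each one reports the new value. The
helper wait_for_property and the fetch_props callable are hypothetical (not
part of any Riak client), and the approach assumes each node answers a
bucket-properties request from its own view of the ring, which I have not
verified.

```python
import time


def wait_for_property(nodes, fetch_props, name, expected,
                      timeout=10.0, interval=0.25):
    """Poll every node until each one reports props[name] == expected.

    nodes:       iterable of node identifiers (e.g. base URLs).
    fetch_props: hypothetical callable, node -> dict of bucket properties
                 as that node currently sees them.
    Returns True once all nodes agree, False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    pending = set(nodes)
    while True:
        # Keep only the nodes that have not yet seen the new value.
        pending = {n for n in pending if fetch_props(n).get(name) != expected}
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The application would call this right after setting the "backend" prop and
only start writing once it returns True; a False result means the property
had not reached every node within the timeout.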

Also, does this sound right? It's very possible I'm wrong about what's causing 
this behavior.

-Greg