Re: Scale up or out?

2012-06-26 Thread Aphyr

On 06/26/2012 07:24 AM, Eric Anderson wrote:

Hey all,

Question about EC2 (or scale in general): I'm building a decent cluster
to handle 15-20k inserts/s and 5-10k gets/s (for a rough idea of what
I'm doing). I've been playing with a 15-node cluster of m2.xlarge
systems, but I am wondering which is better: more small systems or
fewer large systems?

Any recommendations/hints/tricks would be a huge help!


Your cheapest/fastest option is probably physical HW with SSDs.

--Kyle



Bitcask - large keydirs

2011-03-10 Thread Aphyr
TLDR: hey, what about using extendible hashing for bitcask keydirs? 
Constant-time lookups with two disk seeks end-to-end, much larger 
keyspaces than currently supportable, but without the total rehashing 
cost. Also avoids the O(log N) insertion/search/deletion costs of b-trees.


At length:

I've been thinking a lot recently about how to do quick lookups of keys 
where the space is much larger than memory--say, a few billion 32-byte 
keys. Similarly, bitcask is going to need to store more keys than can 
fit in an in-memory hashtable at some point.


One possibility is constructing bytewise (or multi-byte-wise) tries from 
the keys. These have the advantage of being orderable (hmm, range 
queries? faster bucket listing?), reasonably short, and supporting 
log(n) operations. You could cache the initial levels of the trie in 
memory and drop to disk for the leaves. An adaptive caching algorithm 
could also be used to maintain frequently accessed leaf nodes in memory. 
(the FS cache may actually provide acceptable results as well). It also 
takes advantage of the relatively low entropy of most Riak keys, and 
similar keys could be fast to access if they reside in nearby pages.


The major disadvantage is that trees can involve a lot of O(log N) churn 
for insertions, which... theoretically... sucks on disk. Obviously there 
are ways to make it perform well because ReiserFS and most DB indexes 
make use of them, but... maybe there are alternatives.


Ideally we want constant time operations, but hash tables usually come 
with awkward rehashing periods or insane space requirements. O(N) 
rehashing can block other operations, which blows latency through the 
roof when disks are involved. Not a good property for a k/v store.


So I started doodling some hybrid tree-hash structures, browsing through 
NIST's datastructures list, and lo and behold, there is actually a 
structure which combines some of the advantages of tries but behaves 
well for disk media!


http://www.smckearney.com/adb/notes/lecture.extendible.hashing.pdf

You store values on disk in buckets which are small multiples of the 
page size. Finding a value involves choosing the bucket, reading the 
bucket from disk, and a linear search for the value.


To choose the bucket you use a hash table which specifies the on-disk 
address of the right bucket for your value. Here's the catch: you keep 
the index table small. In fact, it's a bitwise trie (or, even faster, a 
flattened hashtable) of the least significant bits of the hash of the 
key. As buckets fill up, you split them in half and (possibly) increase 
the depth of the index. Hence growth/shrinking is incremental and only 
operates on one bucket at a time.


In the case of bitcask, where values can be variable length, it probably 
makes sense to store the file ID/offset in the bucket, and take the hit 
of a second seek to support faster/more predictable searching over each 
bucket.


The downside is that the index is still an in-memory hash table, and 
only buys you a factor of (values per bucket) more keys. Perhaps 
dropping the index to disk as well and taking advantage of the FS cache 
over it could work?
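
To make the scheme concrete, here is a toy, in-memory Ruby sketch of 
the directory-doubling idea (class names and the bucket capacity are 
mine, and a real keydir would keep page-sized buckets on disk):

require 'digest'

# Directory indexed by the low-order bits of the key's hash; each slot
# points at a fixed-capacity bucket. A full bucket splits in two, and the
# directory doubles only when the splitting bucket is already at the
# global depth--so growth touches one bucket at a time.
class ExtendibleHash
  BUCKET_CAPACITY = 4
  Bucket = Struct.new(:depth, :entries)

  def initialize
    @global_depth = 1
    @directory = [Bucket.new(1, {}), Bucket.new(1, {})]
  end

  def get(key)
    bucket_for(key).entries[key]
  end

  def put(key, value)
    bucket = bucket_for(key)
    bucket.entries[key] = value
    split(bucket) if bucket.entries.size > BUCKET_CAPACITY
  end

  private

  def hash_of(key)
    Digest::SHA1.hexdigest(key.to_s).to_i(16)
  end

  def bucket_for(key)
    @directory[hash_of(key) & ((1 << @global_depth) - 1)]
  end

  def split(bucket)
    if bucket.depth == @global_depth
      @directory += @directory  # double the directory; slots still share buckets
      @global_depth += 1
    end
    bucket.depth += 1
    sibling = Bucket.new(bucket.depth, {})
    bit = 1 << (bucket.depth - 1)
    # Slots whose index has the new distinguishing bit set move to the sibling.
    @directory.each_index do |i|
      @directory[i] = sibling if @directory[i].equal?(bucket) && (i & bit) != 0
    end
    # Rehash the overfull bucket's entries between the two halves. (A full
    # implementation would split again if one half is still over capacity.)
    old = bucket.entries
    bucket.entries = {}
    old.each { |k, v| bucket_for(k).entries[k] = v }
  end
end

h = ExtendibleHash.new
1000.times { |i| h.put("key:#{i}", i) }
h.get("key:42")  # => 42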


--Kyle Kingsbury



A script to check bitcask keydir sizes

2011-03-16 Thread Aphyr
I'm trying to track some basic metrics so we can plan for cluster 
capacity, monitor transfers, etc. Figured this might be of interest to 
other Riak admins. Apologies if my Erlang is unidiomatic, I'm still 
learning. :)


#!/usr/bin/env escript
%%! -name riakstatuscheck -setcookie riak

main([]) -> main(["riak@127.0.0.1"]);
main([Node]) ->
  io:format("~w\n", [
lists:foldl(
  fun({_VNode, Count}, Sum) -> Sum + Count end,
  0,
  rpc:call(list_to_atom(Node), riak_kv_bitcask_backend, key_counts, [])
)
  ]).


$ ./riakstatus riak@127.0.0.1
18729



Re: Riak.mapValuesJson crashes on \r\n

2011-03-23 Thread Aphyr
Newline parsing is broken in JSON2.js shipped with Riak. Drop a more 
recent version of JSON2.js in the directory referred to by js_source_dir 
in app.config's riak_kv section, e.g.


{js_source_dir, "/etc/riak/js/"}

and reload the node.

--Kyle

On 03/23/2011 01:31 PM, Michael Ossareh wrote:

Greets,

https://gist.github.com/883872

That stack trace is the product of calling Riak.mapValuesJson in a map
phase of a map reduce job. I checked the same parsing against Chrome and
Firefox and they both parse the string as I'd expect. The object, as a
string, that is being parsed is:

"{\"bar\":\"baz\\r\\nfrob\"}"

The setup is: Riak 0.14.0 on Mac OS X 10.6.6 and CentOS 5.5


Any advice / pointers, greatly appreciated. Also, let me know if there
is anything else I can do to help.

Cheers,

mike





Re: JSON with newlines

2011-04-15 Thread Aphyr
Yes, it's because the JSON2.js included with Riak has a bug around 
newlines. Dropping an updated JSON2.js in your js_source_dir will fix it.


--Kyle

On 04/15/2011 12:20 AM, Matt Ranney wrote:

I'm using Riak Search 0.14.0-1, and it seems like JSON docs with
otherwise legal \r characters in them confuse both the search indexer
and JSON.parse() from within JavaScript map functions.  Presumably these
two are related.

After looking around, it sounds like maybe this has been fixed, but
perhaps not yet in release builds.  Is it really the case that Riak
isn't yet using the native JSON parse from spidermonkey, or is something
else going on here?





Re: This sure looks like a bug...?

2011-04-18 Thread Aphyr

I actually had a question about that page.  Why is it that when there
is a conflict we can only get the conflicting versions of the data?
If I'm going to try to resolve the conflict intelligently, I really
want the common ancestor as well so that I can try to do a 3-way
merge.


Good call. If an ancestor were available it would make counting and 
merging orthogonal changes *much* simpler.




Re: 'not found' after join

2011-05-02 Thread Aphyr
I'd like to chime in here by noting that it would be incredibly nice if 
the client could distinguish between a record that is missing because 
the vnode is unavailable, and a record that truly does not exist. My 
consistency-repair system was running during partition handoff, 
determined that several thousand users were "deleted", and removed their 
following relationships.


--Kyle

On 05/02/2011 06:48 PM, Greg Nelson wrote:

Hello riak users!

I have a 4 node cluster that started out as 3 nodes. ring_creation_size
= 2048, target_n_val is default (4), and all buckets have n_val = 3.

When I joined the 4th node, for a few minutes some GETs were returning
'not found' for data that was already in riak. Eventually the data was
returned, due to read repair I would assume. Is this expected? It seems
that 'not found' and read repairs should only happen when something goes
wrong, like a node goes down. Not when adding a node to the cluster,
which is supposed to be part of normal operation!

Any help or insight is appreciated!

Greg





Re: Authentication

2011-05-03 Thread Aphyr
Any system which presents plaintext is vulnerable; it is simply a matter 
of complexity. Once you've compromised a layer which processes 
plaintext, all layers below it are essentially moot, as the PlayStation 
Network recently discovered.


The only scheme which will defend against data compromise is one in 
which the application does not contain sufficient information to 
reconstruct the plaintext. For example, you can have the client of the 
system (each user, for example) store a secret (say, a password) which 
is never yielded directly to the application, but is used as a part of 
the cryptosystem key. Hence the application can never reconstruct the 
plaintext. This may, of course, limit how useful your application can be.


Long story short: it's application dependent. I don't think it would be 
useful to bake that feature into Riak. My advice is to design in depth, 
modularize systems that handle critical data to reduce their 
vulnerability surface, and plan for each layer to be compromised 
progressively. It can buy you some time.


--Kyle

On 05/03/2011 05:26 AM, David Greenstein wrote:


This is a question/survey on people's approach to security and
appetite for baked in security features to Riak/NoSQL. A typical
exploit path hackers take is to exploit a public-facing application
(like the application server, of which there are typically numerous
vulnerabilities), determine the data source and credentials by
exploring the application code and its network activity, access the
db and steal info. Firewalls do not help in this case since the data
store is being accessed from a legitimate source. So, database
authentication and password encryption on the client is pretty key
here.

What are people's typical approaches to protecting against this
scenario? Is it a reverse proxy (not sure if this really solves the
problem given the request is from a legit host)? Also, what is
people's appetite for baked-in features in Riak to do db
authentication and help with password encryption and key mgt on the
client?

Seems like an important feature for anyone dealing with compliance.

Thank you! Dave





Re: How many links does it take til you get to the center of the ... ?

2011-05-04 Thread Aphyr

HA! I just ran into this limit!

5,000 links like /tablet_users/whoever riaktag=following will cause your 
user objects to take upwards of 4 seconds to return, on our beefy 
cluster, with a variance of ~3 seconds. (Over HTTP, ruby riak_client.) I 
had to move them into the JSON body, which makes things much faster. 
Hundreds of links seems just fine.


--Kyle

On 05/04/2011 02:50 PM, Luc Castera wrote:

Hi,

Sorry for the Lil Kim reference on the subject line, I couldn't help it :-)

In the links documentation on wiki.basho.com, it says:
"this can substitute as a lightweight graph database, as long as the
number of links attached to a given key are kept reasonably low"

What's considered reasonably low here? tens? hundreds? thousands? millions?

Has anyone published benchmarks or stress-testing results of this
anywhere for me to take a look?

Thank you,

--
__
Luc Castera
http://www.intellum.com
Phone: 571-765-1982
Skype: luc.castera





Re: Links vs Key Filters for Performance

2011-05-05 Thread Aphyr
The key filter still has to walk the entire keyspace, which will make 
fetches an O(n) operation as opposed to O(1).


--Kyle

On 05/05/2011 03:35 PM, Andrew Berman wrote:

I was curious if anyone has any thoughts on what is more performant,
links or key filters in terms of secondary links.  For example:

I want to be able to look up a user by id and email:

*Link implementation:*

Two buckets: user and user_email, where id is the key of user and email
is the key of user_email.  User_email contains no data but simply has a
link pointing back to the proper user.

*Key Filter:*

One bucket: user, where id_email is the key of the bucket.  Lookups
would use a key filter tokenizing the id and then looking up the id or
email based on the proper token.

Obviously both work, but I'm curious what the implications are from a
performance standpoint.

Thanks,

Andrew





Re: Links vs Key Filters for Performance

2011-05-05 Thread Aphyr
I suppose if you had a really small number of keys in Riak it might be 
faster, but you're almost certainly better off maintaining a second 
object and making the lookup constant time. Here's an example:


https://github.com/aphyr/risky/blob/master/lib/risky/indexes.rb
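
The gist of that file, sketched (bucket and helper names here are 
hypothetical, not risky's actual API):

require 'riak'

client = Riak::Client.new

# Maintain a 'users_by_email' bucket whose key is the email and whose
# value is the user's primary key. Two O(1) gets replace an O(n) walk.
def put_user(client, id, attrs)
  user = client['users'].get_or_new(id)
  user.data = attrs
  user.store

  index = client['users_by_email'].get_or_new(attrs['email'])
  index.data = id
  index.store
end

def user_by_email(client, email)
  id = client['users_by_email'][email].data
  client['users'][id]
end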

--Kyle

On 05/05/2011 03:49 PM, Andrew Berman wrote:

Ah, that makes sense.  So is it the case that using the link
implementation will always be faster?  Or are there cases where it makes
more sense to use a key filter?

Thanks!

--Andrew

On Thu, May 5, 2011 at 3:44 PM, Aphyr <ap...@aphyr.com> wrote:

The key filter still has to walk the entire keyspace, which will
make fetches an O(n) operation as opposed to O(1).

--Kyle


On 05/05/2011 03:35 PM, Andrew Berman wrote:

I was curious if anyone has any thoughts on what is more performant,
links or key filters in terms of secondary links.  For example:

I want to be able to look up a user by id and email:

*Link implementation:*

Two buckets: user and user_email, where id is the key of user and email
is the key of user_email.  User_email contains no data but simply has a
link pointing back to the proper user.

*Key Filter:*

One bucket: user, where id_email is the key of the bucket.  Lookups
would use a key filter tokenizing the id and then looking up the id or
email based on the proper token.

Obviously both work, but I'm curious what the implications are from a
performance standpoint.

Thanks,

Andrew





Re: Millions of buckets?

2011-05-11 Thread Aphyr
Since buckets are essentially key prefixes, I think buckets will 
probably not make this faster. Maybe one of the riak-search experts 
knows why your search is taking so long.


--Kyle

On 05/11/2011 12:00 PM, alexeypro wrote:

Generally the problem there is that I may end up with N buckets, where N
is the number of users. If I have 5 to 10 million users -- then it's a lot
of buckets. How will Riak handle it?



Re: Production Backup Strategies

2011-05-13 Thread Aphyr
In the exciting event that your application or riak goes rogue and 
deletes everything, bitcask will allow you to recover amazing, 
life-saving amounts of data from its log-structured format.


ASK ME HOW I KNOW. :-P

Uh, more typically, I've heard that FS-level snapshots of /var/lib/riak 
or simple tarballs work well. riak-admin backup works fairly well below, 
say, 50 million keys, but can take several hours at that scale.


--Kyle

On 05/13/2011 05:27 PM, Mike Katz wrote:

Hey All,

I'm sizing up some database options for a fairly ambitious app I'm
building out for a client of mine. I've read a good amount of the
available docs and have toyed around with Riak enough to know that it's
one of my finalists (one of two, to be precise).

Before I set off building this app, there was one thing I wanted to ask
about: backups. Specifically, what strategies/methods are people using
to backup the data in their Riak clusters? I've worked with this client
enough to know that they won't sign off on a relatively new database
technology without knowing that the backup story was rock-solid

I'd really love to use Riak, and this is basically my last hurdle. Any
anecdotes/examples/pointers to blog posts that my Googling has yet to
uncover would be much appreciated.

Thanks!

MK





Mapreduce crosstalk

2011-05-17 Thread Aphyr
I was writing a new mapreduce query to look at users over time, and ran 
it over a single user in production. After that, other mapreduce jobs 
over users started returning results from my new map phase, some of the 
time. After five minutes of this, I had to restart every node in the 
cluster to get it to stop.


Every node has {map_cache_size, 0} in riak_kv.

The map phase that screwed things up was:

function(v) {
  o = JSON.parse(v.values[0].data);

  // Age of account in days
  age = Math.round(
(Date.now() - Date.iso8601(o.created_at)) /
(1000 * 60 * 60 * 24)
  );

  return [['t_user_scores', v.key, age]];
}

It looks like one node started running that phase instead of the 
requested phase for subsequent jobs. It *should* have run this one, but 
didn't.


function(v) {
o = JSON.parse(v.values[0].data);
return [{
key: v.key,
name: o.name,
thumbnail: o.thumbnail
}];
}

Now I'm scared to run MR jobs. Could it be an issue with returning 
keydata? Anybody else seen this before?


--Kyle



Bitcask bindings for Ruby

2011-05-18 Thread Aphyr
I've written an extremely basic library to read Bitcask data files. If 
you lose everything, but have backups, all is not lost. It's easy to 
dump those bitcask files to a directory, or pump them back into Riak. If 
you are feeling *really* enterprising, you can even do it preserving 
vclocks.


https://github.com/aphyr/bitcask-ruby

--Kyle



Re: Riak Client Resources, Deleting a Key Doesn't Remove it from bucket.keys

2011-05-26 Thread Aphyr
Agreed. In fact, jrecursive pointed out to me last week that vnode 
operations are synchronous. That means that when you call list-keys, not 
only is it going to take a long time (right now upwards of 5 minutes) to 
complete, but while each vnode is returning its list of keys *it blocks 
any other requests*.


While list-keys is an unfortunate necessity for some things, its use 
should be minimized if you're going to get to any appreciable (100M 
keys) scale. I don't even know how we're going to use it at all above a 
billion. Possibly by listing the keys periodically from bitcask 
directly, and maintaining an index ourselves.


--Kyle

On 05/26/2011 09:40 AM, Sean Cribbs wrote:

With recent commits
(https://github.com/seancribbs/ripple/compare/35d7323fb0e179c8c971...da3ab71a19d194c65a7b),
it is cached until you either refresh it manually by passing :reload
=> true or a block (for streaming key lists). This was the compromise
reached in that pull-request.

All of this caching discussion glosses over the fact that you *should
not list keys* in any real application. It really begs the question --
how often do you list keys in Redis, or memcached? I suspect that
generally you don't. This isn't a relational database. (Also, how often
do you actually do a full-table scan in MySQL? You don't if you're sane
-- you use an index, or even LIMIT + OFFSET.)

I'm tempted to remove Document::all and make Bucket#keys harder to
access, but the balance between discouraging bad behavior and exposing
available functionality is a hard one to strike. I don't want new
developers to immediately use list-keys and then be discouraged from
using Riak because it's slow; on the other hand, it /can be useful/ in
some circumstances. In those cases where it's useful, the developer
should probably be responsible enough to request the key list only once;
the caching behavior simply does this for them. I guess whether it
/should/ do this for them is the issue at hand.

All that said, I'm really torn on this issue, and the same problem
applies to full-bucket MapReduce. Caveat emptor.

Sean Cribbs <s...@basho.com>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On May 26, 2011, at 10:35 AM, Jonathan Langevin wrote:


How long is the key list cached like that, naturally?

Jonathan Langevin
Systems Administrator
Loom Inc.
Wilmington, NC: (910) 241-0433 - jlange...@loomlearning.com
 - www.loomlearning.com
 - Skype: intel352

On Thu, May 26, 2011 at 10:35 AM, Sean Cribbs <s...@basho.com> wrote:

Keith,

There was a pull-request issue out for this on the Github project
(https://github.com/seancribbs/ripple/pull/168). For various
reasons, the list of keys is memoized in the Riak::Bucket
instance. Passing :reload => true to the #keys method will cause
it to refresh. I like to discourage list-keys, but with the
memoized list you don't shoot yourself in the foot as often.

Sean Cribbs <s...@basho.com>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On May 26, 2011, at 10:29 AM, Keith Bennett wrote:

> All -
>
> I just started working with Riak, and am using the riak-client Ruby gem.
>
> When I delete a key from a bucket, and try to fetch the value
> associated with that key, I get a 404 error (which is reasonable).
> However, it remains in the bucket's list of keys (i.e. the value
> returned by bucket.keys()). Why is the key still reported to exist
> in the bucket? Is bucket.keys cached, and therefore unaware of the
> deletion? Here's a riak-client Ruby script and its output in irb
> that illustrates this:
>
> ree-1.8.7-2010.02 :001 > require 'riak'
> => true
> ree-1.8.7-2010.02 :002 >
> ree-1.8.7-2010.02 :003 > client = Riak::Client.new
> => #http://127.0.0.1:8098 >
> ree-1.8.7-2010.02 :004 > bucket = client['links']
> => #
> ree-1.8.7-2010.02 :005 > key = bucket.keys.first
> => "4000-17.xml"
> ree-1.8.7-2010.02 :006 > object = bucket[key]
> => #
> ree-1.8.7-2010.02 :007 > object.delete
> => #
> ree-1.8.7-2010.02 :008 > bucket.keys.first
> => "4000-17.xml"
> ree-1.8.7-2010.02 :009 > object = bucket[key]
> Riak::HTTPFailedRequest: Expected [200, 300] from Riak but received 404. not found
>
> from /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/net_http_backend.rb:55:in `perform'
> from /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:1054:in `request'
> from /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:2142:in `reading_body'
> from /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/l

Re: Riak Client Resources, Deleting a Key Doesn't Remove it from bucket.keys

2011-05-26 Thread Aphyr

In software products that have containment metaphors, how often do we
see a function return a cached value rather than the up-to-date
value, especially for products that manage shared data?


Pretty frequently, actually. Every Ruby ORM I've used caches
associations by default. Even when listing is cheap, deserialization isn't.

For some reason this never tripped me up; I recall looking at the rdoc
and finding it quite obvious that this method had a :reload option. But 
if you just guessed at the existence of #keys, it is probably (like many 
things about Riak) surprising! :)


--Kyle



Re: Best practice for using erlang modules in riak?

2011-06-02 Thread Aphyr

On 06/02/2011 01:49 PM, Sylvain Niles wrote:

Is there any open source code out there using erlang functions via
ripple or rest that I can look at to see a fully functional flow?


I made a lot of stupid mistakes getting this to work. Leaving add_paths 
commented out, not using arrays in arguments to #map and #reduce, not 
exporting the functions, incorrect arities, not compiling the code... 
etc. Hopefully this helps some.


Oh, and forgive the erlang; this was my first function. ;-)

--Kyle

-module(structs).

-export([
  from_dict/1,
  sort_by_value/2,
  unique/2
]).

...

% Sorts a list of structs by a given value.
sort_by_value(Structs, [Key, <<"asc">>]) ->
  lists:sort(
fun({struct, A}, {struct, B}) ->
  proplists:get_value(Key, A) < proplists:get_value(Key, B)
end,
Structs
  );
sort_by_value(Structs, [Key, <<"desc">>]) ->
  lists:sort(
fun({struct, A}, {struct, B}) ->
  proplists:get_value(Key, A) > proplists:get_value(Key, B)
end,
Structs
  );
sort_by_value(Structs, Key) ->
  sort_by_value(Structs, [Key, <<"asc">>]).



{riak_kv, [
...
{add_paths, ["/etc/riak/erl/"]},
...
]}



$ cd /etc/riak/erl/
$ erlc structs.erl
$ riak attach
> l(structs).



results = Tablet::Comment.mr(items).
  map(['tablet', 'map_to_links_with_data'], :arg => 'user').
  map(['tablet', 'map_to_keydata_with_data'], :arg => ['user', ['name', 'thumbnail']]).
  reduce(['structs', 'sort_by_value'], :arg => ['created_at', 'desc'], :keep => true).run


--Kyle



Re: Question: Object Not Saved After Save/Delete/Save

2011-06-03 Thread Aphyr
Riak can't use the vclock for conflict resolution on a fresh object, 
i.e. one without a vclock. Deletes are writes. You should use get or 
reload before writing to help Riak sequence your writes correctly.


On top of this, Riak has some weirdness around very quick sequences of 
deletes/writes due, IIRC, to deletes not being tagged with a vector 
clock. I... think... this will be addressed in an upcoming release.
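
A minimal sketch of the read-before-write pattern with the Ruby client 
(bucket and key names hypothetical):

require 'riak'

client = Riak::Client.new
bucket = client['users']

# Fetch first so the stored vclock rides along with the write; storing a
# fresh object with no vclock gives Riak nothing to sequence against.
obj = bucket.get_or_new('kyle')
obj.data = { 'name' => 'Kyle' }
obj.store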


--Kyle

On 06/03/2011 01:55 PM, Keith Bennett wrote:

Hi, all.  I have a weird issue.  I'm using the Riak ruby client.  When I store 
an object, create an object with the same bucket and key and delete it, then 
create an object with the same bucket and key and store it again, it is not in 
riak.  Can someone tell me what I'm doing wrong?  I posted my code here:

https://gist.github.com/1007142

Thanks very much,
Keith




Bitcask-ruby update

2011-06-11 Thread Aphyr
Bitcask-ruby now implements the keydir and knows how to use hintfiles. 
It's now capable of loading 62,000 keys (from a 535MB bitcask) in 1.5 
seconds. We're using this at Showyou to list keys and run various 
analytics without blocking Riak.


https://github.com/aphyr/bitcask-ruby

--Kyle



Re: Newbie Ripple

2011-06-21 Thread Aphyr

2. You could write
   x = Klass.find(key)
   if x.nil?
     x = Klass.new
     x.save
   end


get_or_new doesn't save, so perhaps Klass.find(key) || Klass.new(key)

Risky (another Ruby Riak model layer) offers Klass.get_or_new(key)


3. control the bucket on which the document is stored/retrieved
(using an attribute). From what I've seen, riak uses the class name
as bucket.


I don't know about Ripple, but in Risky you could:

a.) Override MyModel.bucket to choose different buckets.
b.) Subclass and set MyModel.bucket
c.) Ignore/override MyModel.[] altogether, fetch the robject yourself, 
and use Klass.from_riak_object.


You should be aware that buckets are essentially key prefixes. When you 
go to *get* a record, you would need to know the bucket name. Two cases:


Lots of buckets, e.g. one per user: Why not make the username part of 
the key? It may not feel as clean, but you won't have to fight the ORM 
as much.
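
Concretely, something like (names hypothetical):

require 'riak'

client = Riak::Client.new

# One 'posts' bucket; the owner lives in the key, not in a per-user bucket.
def post_key(username, post_id)
  "#{username}/#{post_id}"
end

post = client['posts'][post_key('kyle', 42)]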


Few buckets, e.g. Record, RecordTypeA < Record, RecordTypeB < Record: 
Subclass and set/override .bucket.


--Kyle



Re: mr_queue gone wild

2011-06-30 Thread Aphyr
The mr_queue is a bitcask, so you should expect it to grow monotonically 
until compaction. The file size is not an indication of the number of 
pending jobs. You can read the contents using any bitcask utility. For 
example, using https://github.com/aphyr/bitcask-ruby:


$ bitcask --no-riak /var/lib/riak/mr_queue/ all
...
3264
bitcask_tombstone
3265
bitcask_tombstone
3266
bitcask_tombstone
3267
bitcask_tombstone

--Kyle

On 06/30/2011 01:52 PM, Sylvain Niles wrote:

So I backrev'd everything to: Erlang R13B04, Riak 0.14.2 (no riak
search) and got rid of any functionality using search. After importing
the 7k objects the bitcask dir is ~41MB. Starting up our app
everything works fine until a worker starts updating objects with new
values at the rate of about 1-2/second. It finds those objects via a
map function that looks for one json integer and compares it with an
input (currently with a javascript function but I'm slowly porting
them all to erlang). While this worker is running the mr_queue
directory grows at about 1MB every 2 minutes, forever. It's my
understanding that pending m/r jobs are persisted to disk in this
directory, but the amount of work is trivial and the mr_queue never
gets smaller even after we shut down all our workers and leave riak
alone.

Is there a way to list the m/r jobs in the queue in case there's
something else going on? Is there a reason they never get removed?

Thanks in advance,
Sylvain


On Wed, Jun 29, 2011 at 12:59 AM, Mathias Meyer  wrote:

Sylvain,

you should not be using riak HEAD for anything that's close to your production 
environment. Development in Riak is in big flux right now, and it'll be hard 
for us to help you find the specific problem.

Could you please install Riak Search 0.14.2, best with a fresh installation, 
and try running this setup again to see if you get the same erroneous results? 
If you do, some more details on your data and the MapReduce jobs you're running 
would be great to reproduce and figure out the problem.

Mathias Meyer
Developer Advocate, Basho Technologies


On Mittwoch, 29. Juni 2011 at 00:41, Sylvain Niles wrote:


We recently started trying to move our production environment over to
riak and we're seeing some weird behavior that's preventing us from
doing so:

Running: riak HEAD from github as of last Friday, riak_search turned
on with indexing of the problem bucket "events".
When we turn on our processes that start creating objects in the
bucket "events", the mr_queue directory starts growing massively and
the riak process starts spending most of its time in io_wait. With a
total of about 7000 objects (each is a ripple document that's got
maybe 10-20 lines of text in it) in the events bucket out bitcask dir
was ~240MB and the mr_queue dir was 1.2GB. Something is going horribly
wrong.. Our logs only show flow_timeouts for normal requests trying to
do simple map/reduce lookups. Any ideas where to look?

Thanks in advance,
Sylvain






Lots of bitcask files for a vnode, unable to merge

2011-06-30 Thread Aphyr
One of the vnodes on one of my hosts has a *lot* of bitcask data/hint 
files, and makes a new one every 3 minutes. In the logs, I get


=ERROR REPORT==== 30-Jun-2011::20:24:14 ===
Failed to merge 
["/var/lib/riak/bitcask/794976964837219653749465284983368790965189869568", 
[],

...HUGE LIST OF DATA FILES...

in bitcask_fileops:fold_loop, bitcask:merge_single_entry, merge_files, 
merge1, bitcask_merge_worker:do_merge.


Here's the directory:

...
-rw---   1 riak riak 0 2011-06-30 19:55 1309481706.bitcask.data
-rw-r--r--   1 riak riak 0 2011-06-30 19:55 1309481706.bitcask.hint
-rw---   1 riak riak 0 2011-06-30 19:58 1309481886.bitcask.data
-rw-r--r--   1 riak riak 0 2011-06-30 19:58 1309481886.bitcask.hint
-rw---   1 riak riak 0 2011-06-30 20:01 1309482066.bitcask.data
-rw-r--r--   1 riak riak 0 2011-06-30 20:01 1309482066.bitcask.hint
-rw---   1 riak riak 0 2011-06-30 20:04 1309482246.bitcask.data
-rw-r--r--   1 riak riak 0 2011-06-30 20:04 1309482246.bitcask.hint
-rw---   1 riak riak 0 2011-06-30 20:07 1309482426.bitcask.data
-rw-r--r--   1 riak riak 0 2011-06-30 20:07 1309482426.bitcask.hint
-rw---   1 riak riak 0 2011-06-30 20:10 1309482606.bitcask.data
-rw-r--r--   1 riak riak 0 2011-06-30 20:10 1309482606.bitcask.hint
-rw---   1 riak riak 0 2011-06-30 20:13 1309482786.bitcask.data
-rw-r--r--   1 riak riak 0 2011-06-30 20:13 1309482786.bitcask.hint
-rw---   1 riak riak 32948 2011-06-30 20:21 1309482913.bitcask.data
-rw-r--r--   1 riak riak  1043 2011-06-30 20:21 1309482913.bitcask.hint
-rw---   1 riak riak 0 2011-06-30 20:18 1309483092.bitcask.data
-rw-r--r--   1 riak riak 0 2011-06-30 20:18 1309483092.bitcask.hint
-rw---   1 riak riak 0 2011-06-30 20:21 1309483272.bitcask.data
-rw-r--r--   1 riak riak 0 2011-06-30 20:21 1309483272.bitcask.hint

Any ideas as to how it could have gotten into this state, and how to fix it?

--Kyle



Re: Namespace in Ripple?

2011-07-01 Thread Aphyr

class TcWeb::Root
  include Ripple::Document
  self.bucket_name = 'roots' # or tcweb_roots, whatever
  ...
end

On 07/01/2011 08:25 AM, Thomas Fee wrote:

I'm currently using Ripple with the application name prepended to the
typename in an effort to artificially create a namespace for app, to not
collide with other apps, e.g.

class TcwebRoot
   include Ripple::Document
   property :typed_root_symbol, String, :presence => true
   key_on   :typed_root_symbol
   # et cetera
end

Where "Tcweb" is the appname functioning as a namespace prefix. The
object class should ideally be called just "Root". I would prefer to not
have "Tcweb" mangled into the classname.

Does Ripple allow this sort of thing?...

Ripple::namespace("Tcweb")
class Root
   include Ripple::Document
   property :typed_root_symbol, String, :presence => true
   # et cetera
end

Note:
With my name mangling, plus Ripple's name mapping conventions, a
TcwebRoot object is currently queried like this...
http://172.22.59.51:8098/riak/tcweb_roots/%24200-KOR%7C0





Re: Riak crashing due to "eheap_alloc: Cannot allocate xxxx bytes of memory"

2011-07-05 Thread Aphyr
Since you were able to create and write these objects in the first 
place, you probably had enough ram at one point to load and save them. I 
would try bringing each node up in isolation, then issuing a delete 
request against the local node, then restarting the node in normal, 
talking-to-the-ring mode. If there are any local processes you can stop 
to free up memory, try that too.


When I encountered this problem, I was able to use the riak:local_client 
at the erlang shell to delete my huge objects--so long as other 
processes weren't hammering it with requests.


--Kyle

On 07/05/2011 09:28 PM, Jeff Pollard wrote:

Thanks to some help from Aphyr + Sean Cribbs on IRC, we narrowed the
issue down to us having several multiple-hundred-megabyte sized
documents and one 1.1 gig document.  Deletion of those documents has now
kept the cluster running quite happily for 3+ hours now, where before
nodes were crashing after 15 minutes.

I've managed to delete most of the large documents, but there are still
a handful (3) that I am unable to delete.  Attempts to curl -X DELETE
them result in 503 error from Riak:

< HTTP/1.1 503 Service Unavailable
< Server: MochiWeb/1.1 WebMachine/1.7.3 (participate in the frantic)
< Date: Wed, 06 Jul 2011 04:20:15 GMT
< Content-Type: text/plain
< Content-Length: 18

<
request timed out


In the erlang.log, I see this right before the timeout comes back:

=INFO REPORT==== 5-Jul-2011::21:26:35 ===
[{alarm_handler,{set,{process_memory_high_watermark,<0.10425.0>}}}]


Anyone have any help/ideas on what's going on here and how to fix it?

On Tue, Jul 5, 2011 at 8:58 AM, Jeff Pollard <jeff.poll...@gmail.com> wrote:

Over the last few days we've had random nodes in our 5-node cluster
crash with "eheap_alloc: Cannot allocate  bytes of memory"
errors in the erl_crash.dump file.  In general, the error messages
seem to crash trying to allocate 13-20 gigs of memory (our boxes
have 32 gigs total).  As far as I can tell crashing doesn't seem to
coincide with any particular requests to Riak.  I've tried to make
some sense fo the erl_crash.dump file but haven't had any luck.  I'm
also in the process of restoring our riak bakups to our staging
cluster in hopes of more accurately reproducing the issue in a less
noisy environment.

My questions for the list are:

1. Any clue how to further diagnose the issue? I can attach my erl_crash.dump if needed.
2. Is it possible/likely this is due to large m/r requests? We have a couple m/r requests. One goes over no more than 4 documents at a time while the other goes over anywhere between 60 and 10,000 documents, though more towards the smaller number of documents. We use 16 js VMs with max memory for the VM and stack of 32 MB, each.
3. We're running riak 0.14.1. Would upgrading to 0.14.2 help?

Thanks!






Re: Bitcask merge

2011-08-25 Thread Aphyr
Have you checked that your bitcask maximum file size is small enough? 
Bitcask will only merge *inactive* files, so if your active file limit 
is 500MB and your active file is 320MB, you won't merge.


--Kyle

On 08/25/2011 06:43 AM, raghwani sohil wrote:

I have deleted all the keys from all buckets manually, and before
deleting the keys the size of the bitcask directory was 6.4 GB; after
deleting, it is showing me the same, i.e. 6.4 GB.

So to run bitcask merging process i have added this line

  {riak_kv_bitcask_backend, [
 {small_file_threshold, 335544320}
  ]}


in my app.config. I have added this line because the sizes of all data 
files in the bitcask directory are < 320 MB.

But the bitcask merging process is not running. Is there any way to run 
the bitcask merge process so that I can reduce the size of my bitcask 
directory?






SF talk: Scaling at Showyou

2011-09-06 Thread Aphyr

Hello all,

Wanted to say that John Muellerleile and I will be giving a talk on 
high-volume, high-availability technologies at http://showyou.com. We'll 
discuss building an application with Riak, Solr, distributed queuing, 
and metrics, and present some new open-source tools we've built to 
tackle these problems.


We're offering free food and drinks, plus you get to hang out with other 
riakers after.


When:  Monday, September 26th, 2011, 7-10 PM

Where: 48 Gold St (Right next to Bix)
       San Francisco, CA 94133

http://scalingatshowyou.eventbrite.com

--Kyle



Re: Bitcask folder 0

2011-09-14 Thread Aphyr
Yep, that's partition 0. Partitions are spaced evenly around the hash 
range, which is [0, 2^160), and each covers the hashes starting at its 
name. If you have two partitions, they'll be {0, N/2}; three partitions: 
{0, N/3, 2N/3}, where N = 2^160.
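
A quick sketch of computing those names for a given ring size:

# Partition names for a ring of size n: evenly spaced points on [0, 2^160).
def partitions(n)
  step = 2 ** 160 / n
  (0...n).map { |i| i * step }
end

partitions(4)[0, 2]
# => [0, 365375409332725729550921208179070754913983135744]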


--Kyle

On 09/14/2011 10:56 AM, Jeremy Raymond wrote:

In /var/lib/riak/bitcask there are a bunch of data directories whose
name are a string of numbers. On once node I noticed I have a directory
whose name is 0. This seemed out of place as the directory names are
typically a large string of numbers. Is this 0 directory normal?

- Jeremy





Re: Riak security

2011-09-30 Thread Aphyr
We've been over this several times on riak-users, which suggests to me a 
blog post might help. I'll try to draft something.


On 09/30/2011 11:00 AM, Kyle Quest wrote:

This is a pretty common situation with the NoSQL databases. They have
no security and the standard answer is that it's your job to do with
firewalls and proxies. This is a good indication that the NoSQL world
is still in its infancy. Security features will get there eventually
and Accumulo is an example of progress in terms of security
capabilities, but it's going to take a while... a long while :-)


If you have a sensible, flexible, comprehensive proposal for access 
control in an eventually consistent distributed key-value store, I would 
love to hear about it. Thus far, every attempt I've encountered rapidly 
approaches the proverbial "blatant layering violation", or is a subset 
of {inconsistent, inadequate, overly specific, complex, slow}.


Accumulo (IIUC, docs are scarce) tags data with authorizations; 
restricting access is up to the client. You can do that in Riak. You can 
front Riak with a layer which allows puts/gets (globally, to buckets, to 
keys, at various times, etc.) on the basis of HTTP auth, sessions, IP, 
certificate, etc. You can also implement access control by writing valid 
state transitions on an object to a statebox and invalidating concurrent 
modifications that introduce policy conflicts. Authentication provided 
by cryptographic hashes on transitions, or by a layer above. You could 
introduce a lock service and use it to enforce certain classes of 
sequential access, simplifying consistency.


Hopefully this suggests why it's not as simple as "adding security to 
the database". There are a wide variety of security semantics, and many 
can be layered on top of Riak (or any other datastore) without changing 
the database itself.



Now in this case you can do something :-) One option is to use a web
proxy that would expose two different ports for GET and PUT requests
and then have the appropriate HTTP method filters for each of those
ports.


Writes require reads beforehand, but this will do the trick.
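
As a concrete illustration of the two-port method filter, a hypothetical 
(untested) nginx sketch -- and per the caveat above, clients that fetch 
vclocks before writing would still need GET on the write path:

# Read-only port.
server {
  listen 8081;
  location / {
    limit_except GET { deny all; }
    proxy_pass http://127.0.0.1:8098;
  }
}

# Write port.
server {
  listen 8082;
  location / {
    limit_except PUT POST DELETE { deny all; }
    proxy_pass http://127.0.0.1:8098;
  }
}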


However, this doesn't really do much for security because these
GET and PUT requests will still be sent to the same Riak node.


...


A better solution is to have separate Riak nodes for reads and writes.


Riak forwards requests to the appropriate vnode for a key. Doing this 
would have no effect on operational semantics.


--Kyle



Re: Riak security

2011-09-30 Thread Aphyr

On 09/30/2011 01:28 PM, Kyle Quest wrote:

Having separate nodes for reads and writes provides an opportunity for
better isolation and control even when the requests are forwarded to
different vnodes...


I humbly suggest this is a bad idea. Varying behavior between nodes

  a.) is a headache to configure and maintain
  b.) creates fault-tolerance problems
  c.) creates unbalanced loads

It's at odds with the symmetric layout of a dynamo system. Best case, 
you'll fail more often. In addition, splitting reads and writes will 
require you to work harder to handle client IDs. If you aren't careful, 
you'll lose data.



Just like with anything else you can always build something on top...
The difference in maturity is determined by what you have to build
yourself vs what's already available and integrated into a unified
solution. Yes, there are different and unique aspects to how NoSQL
databases operate, but it's no excuse for not having any integrated
security. It's going to take time for integrated solutions to emerge,
but this is exactly what I was saying about the maturity stage the
NoSQL databases are in.


What was the last datastore you *didn't* have to wrap in your own 
security layer? I've built... I dunno, twenty or thirty of them for 
various applications. Because trust is complicated, sufficiently general 
security systems require almost as much configuration and integration 
glue as the code you'd write to do it from existing primitives.


That said, if you have a proposal for a security model I'd like to see 
it. There is a dearth of pluggable access control layers for datastores. 
I suspect there's a good reason for that.



Either way saying, "you customer go take care of our database
product security" is not the answer :-)


It is an entirely reasonable answer; access control is almost totally 
orthogonal to robust data storage. You're asking Ikea to control who 
puts things in your cabinets.


I'm guessing the reason your posts appear so confused is because you 
don't have a clear idea about what properties a "database security" 
system would have. It might be worth writing down your specific 
requirements, and asking "What percentage of use cases will this satisfy 
efficiently?"


--Kyle Kingsbury



Re: Riak security

2011-09-30 Thread Aphyr

On 09/30/2011 02:50 PM, Kyle Quest wrote:

I'm not here to define a perfect infrastructure for securing NoSQL
databases and Riak and go into implementation details... It's not my
intention because I simply don't have time to dedicate to this big
project and it's impossible to come up with a perfect solution right
away. Either way asking customers to be security experts is asking
for trouble... And I base this statement on the actual real world
experience in security, which I have quite a bit. I'll leave it on
this note :-) And let's talk in 10 or 15 years :-)


Let's skip the ad hominem. I'm gay. You are *not* going to win a
bitchiness contest.

I want to help people build robust, secure systems. What little you've
proposed is not only useless but dangerous. I can't risk someone
implementing it.

I'm volunteering my time to answer questions on building secure
applications in general: with Riak, with MySQL, with HTTP, with Active
Directory, whatever. Not an expert on everything, but I can provide
pointers to more comprehensive sources. Feel free to contact me off-list
if it doesn't pertain to Riak.

I'll also try to write an introductory blog post on application security
this weekend. If you'd like to contribute, or just want to see some
topic covered, let me know.

--Kyle Kingsbury



Systems Security: a Primer

2011-10-02 Thread Aphyr
As promised, a brief overview of designing secure applications, with a 
quick rundown of how you might expose Riak to the world.


http://aphyr.com/journals/show/systems-security-a-primer

--Kyle Kingsbury



Re: Have Riak servers in separate cluster behind a load balancer, or on same machines as web server?

2011-10-04 Thread Aphyr
Option C: Deploy your web servers with a list of hosts to connect to. 
Have the clients fail over when a riak node goes down. Lower latency 
without sacrificing availability. If you're using protobufs, this may 
not be as big of an issue.
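
A rough sketch of the client-side failover idea (the host list and error 
classes are illustrative; the Ruby client of this era takes a single 
:host option):

require 'riak'
require 'timeout'

HOSTS = %w[10.0.0.1 10.0.0.2 10.0.0.3]  # hypothetical riak nodes

# Try each host in turn; on connection failure, move to the next.
def with_riak(hosts = HOSTS)
  hosts.each_with_index do |host, i|
    begin
      return yield Riak::Client.new(:host => host)
    rescue Errno::ECONNREFUSED, Errno::EHOSTUNREACH, Timeout::Error
      raise if i == hosts.length - 1  # out of nodes; give up
    end
  end
end

user = with_riak { |client| client['users']['kyle'] }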


--Kyle

On 10/04/2011 02:04 PM, O'Brien-Strain, Eamonn wrote:

I am contemplating two different architectures for deploying Riak nodes and web 
servers.

Option A:  Riak nodes are in their own cluster of dedicated machines behind a 
load balancer.  Web servers talk to the Riak nodes via the load balancer. (See 
diagram http://eamonn.org/i/riak-arch-A.png )

Option B: Each web server machine also has a Riak node, and there are also some 
Riak-only machines.  Each web server only talks to its own localhost Riak node. 
(See diagram http://eamonn.org/i/riak-arch-B.png )


All machines will deployed as elastic cloud instances.  I will want to spin up 
and spin down instances, particularly the web servers, as demand varies.  Both 
load balancers are non-sticky.  Web servers are currently talking to Riak via 
HTTP (though might change that to protocol buffers in the future).  Currently 
Riak is configured with the default options.

Here is my thinking of the comparative advantages:

Option A:

  - Better for security, because you can lock down the Riak load balancer to only 
open a single port and only for connections from the web servers.
  - Less churn for Riak of nodes entering and leaving the Riak cluster (as web 
servers spin up and down)
  - More flexibility in scaling storage and web tiers independently of each 
other

Option B:

  - Faster localhost connection from web server to Riak

I think availability is similar for the two options.

The web server response time is the primary metric I want to optimize.  Most 
web server requests will cause several requests to Riak.

What other factors should I take into account?  What measurements could I make 
to help me decide between the architectures?  Are there other architectures I 
should consider? Should I add memcached? Does anyone have any experiences they 
could share in deploying such systems?

Thanks.
__
Eamonn



Re: Have Riak servers in separate cluster behind a load balancer, or on same machines as web server?

2011-10-04 Thread Aphyr
Internode times in our datacenter at SL are indistinguishible from 
loopback; TCP/IP processing dominates. HTTP, on the other hand, involves 
either in-depth connection management/multiplexing, or TCP/IP 
setup/teardown latency at either end of a request. In read-write heavy 
apps, protobufs outperforms HTTP in throughput by 2x or more, against 
objects of 500-4000 bytes. That's with the ruby client; ymmv.


--Kyle

On 10/04/2011 07:18 PM, Greg Stein wrote:


On Oct 4, 2011 7:01 PM, "Mike Oxford" <moxf...@gmail.com> wrote:
 >
 > You'll want to run protobufs if you're looking to optimize your
 > response time; HTTP sockets (even to localhost) will require much more
 > overhead and time.

Hmm? The protocol seems moot, compared to inter-node comms when r > 1.
Protocol parsing just doesn't seem like much of a factor. On my laptop,
I was seeing a 3ms response time against one node. I can't imagine that
parsing was more than a few percent, no matter the protocol.

(and no, I have no specific numbers to confirm/deny my thought
experiment here)

 > Even better would be unix sockets if they're available, and you can
 > bypass the whole TCP stack.

What? Is that even an option for Riak? I haven't seen anything about that.

 >...

Cheers,
-g





Re: Have Riak servers in separate cluster behind a load balancer, or on same machines as web server?

2011-10-04 Thread Aphyr
It's not an intrinsic property of HTTP... more that some of the HTTP 
libraries the clients are built on can have awkward semantics for using 
connections efficiently. Sounds like you've already addressed this, 
which is great. Mochiweb + HTTP parsing + mime-multipart will introduce 
some time/space overhead compared to tagged values in protobufs, but it 
may be negligible. Try it and see!


--Kyle

On 10/04/2011 09:09 PM, Greg Stein wrote:

I don't see that multiplexing or TCP setup is specific to HTTP.

The only difference between protobuf and HTTP is what goes on the wire.
Not how the wire is managed.

(and with that said, the Python client managed the wire in the most
horrible ways imaginable for the HTTP Client; I've since fixed that on
my branch)

On Oct 4, 2011 11:37 PM, "Aphyr" <ap...@aphyr.com> wrote:
 > Internode times in our datacenter at SL are indistinguishible from
 > loopback; TCP/IP processing dominates. HTTP, on the other hand, involves
 > either in-depth connection management/multiplexing, or TCP/IP
 > setup/teardown latency at either end of a request. In read-write heavy
 > apps, protobufs outperforms HTTP in throughput by 2x or more, against
 > objects of 500-4000 bytes. That's with the ruby client; ymmv.
 >
 > --Kyle
 >
 > On 10/04/2011 07:18 PM, Greg Stein wrote:
 >>
 >> On Oct 4, 2011 7:01 PM, "Mike Oxford" <moxf...@gmail.com> wrote:
 >> >
 >> > You'll want to run protobufs if you're looking to optimize your
 >> > response time; HTTP sockets (even to localhost) will require much more
 >> > overhead and time.
 >>
 >> Hmm? The protocol seems moot, compared to inter-node comms when r > 1
 >> Protocol parsing just doesn't seem like much of a factor. On my laptop,
 >> I was seeing a 3ms response time against one node. I can't imagine that
 >> parsing was more than a few percent, no matter the protocol.
 >>
 >> (and no, I have no specific numbers to confirm/deny my thought
 >> experiment here)
 >>
 >> > Even better would be unix sockets if they're available, and you can
 >> > bypass the whole TCP stack.
 >>
 >> What? Is that even an option for Riak? I haven't seen anything about
that.
 >>
 >> >...
 >>
 >> Cheers,
 >> -g
 >>
 >>
 >>


Re: Need help with Mapreduce

2011-10-05 Thread Aphyr
Like it says, the request that you submitted isn't JSON. MR functions 
belong in the source attribute of the JSON document, not floating 
outside it.
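
For reference, a request with the map source inside the JSON document 
(and a comma between the two phases) would look something like this; 
untested against the poster's data:

curl -X POST http://x:8098/mapred -H "Content-Type: application/json" -d @- <<EOF
{"inputs": [["symc","2011-10-04"],
            ["symc","2011-09-02"],
            ["symc","2011-10-03"]],
 "query": [{"map": {"language": "javascript",
                    "source": "function(value, keyData, arg) { var data = Riak.mapValuesJson(value)[0]; return [data.High]; }"}},
           {"reduce": {"language": "javascript",
                       "name": "Riak.reduceMax",
                       "keep": true}}]}
EOF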


--Kyle

On 10/05/2011 05:19 PM, urvi wrote:

I am trying to use this function to get the highest number for a given
date. My map is working fine but the reduce function is giving me an
error. Please help me.

curl -X POST http://x:8098/mapred -H "Content-Type: application/json" -d @-
function(value, keyData, arg){
var data = Riak.map.ValuesJson(value) [0];
return [data.High];
}

{"inputs":[["symc","2011-10-04"],
["symc","2011-09-02"],
["symc","2011-10-03"]],
"query":[{"map":{"language":"javascript",
"source":"function(value, keyData, arg) { var data =
Riak.mapValuesJson(value)[0]; return [data.High];}"
}}
{"reduce":{"language":"javascript","name":"Riak.reduceMax","keep":true}}]
}
--Ctrl-D to submit it

Output :

The POST body was not valid JSON.
The error from the parser was:
{{case_clause,<<"function(value, keyData, arg){ var data =
Riak.map.ValuesJson(value) [0]; return
[data.High];}{\"inputs\":[[\"symc\",\"2011-10-04\"],
[\"symc\",\"2011-09-02\"], [\"symc\",\"2011-10-03\"]],
\"query\":[{\"map\":{\"language\":\"javascript\",
\"source\":\"function(value, keyData, arg) { var data =
Riak.mapValuesJson(value)[0]; return [data.High];}\" }}
{\"reduce\":{\"language\":\"javascript\",\"name\":\"Riak.reduceMax\",\"keep\":true}}]}">>},
[{mochijson2,tokenize,2},
{mochijson2,decode1,2},
{mochijson2,json_decode,2},
{riak_kv_mapred_json,parse_request,1},
{riak_kv_wm_mapred,verify_body,2},
{riak_kv_wm_mapred,malformed_request,2},
{webmachine_resource,resource_call,3},
{webmachine_resource,do,3}]}






___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak 1.0, Clojure and the Java Client

2011-10-10 Thread Aphyr

On 10/07/2011 04:23 PM, Tim Robinson wrote:


I just read the Statebox page you linked as an example and have a hard
time thinking I would want to use this. While automation is always nice,
the overhead is an unnecessary burden. Since Clojure provides
coordinated/transactional data structures, it's already easy *enough* to
resolve conflicts within your natural code flow without having to resort
to the rationalizing of queued values. Also, I can only speak for
myself, but I believe most people would only want this to apply in
selective cases such that a performance hit is not taken for the other
90% of data where last write winning is just fine.

Does that make sense to you? I could be completely off considering I
only read the 5 minute 'read-me' blurb.


Do you ever plan to have more than one clojure runtime modify the same 
object? For that matter, do you ever plan to restore a node from a 
backup or failure? That's where statebox comes into play.


--Kyle

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: network-based access control

2011-10-17 Thread Aphyr
Yes; front Riak with a proxy which performs the appropriate access 
control. Note that you'll also have to block mapreduce at this proxy 
(or inspect submitted javascript/erlang to identify and contain improper 
access), since mapreduce can reach any bucket.
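
A rough haproxy sketch of the idea (names, addresses, and the bucket are 
all hypothetical; directives vary a bit by haproxy version):

   frontend riak_http
       bind *:8098
       acl tenant      src 10.0.1.0/24
       acl tenant_urls path_beg /riak/tenant_bucket
       acl mapred      path_beg /mapred
       # Mapreduce can read any bucket, so drop it outright.
       block if mapred
       # The tenant prefix may only touch its own bucket.
       block if tenant !tenant_urls
       default_backend riak_nodes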


--Kyle

On 10/17/2011 10:39 AM, Simon Chen wrote:

Hi folks,

Is it possible to perform some network-level access control in riak?
For example, clients within a network prefix can only access a
specific set of buckets?

How is riak usually set up in a multi-tenant environment?

Thanks.
-Simon

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Do not expose Riak to the Internet

2011-10-19 Thread Aphyr
With Eric Redmond's permission*, I am releasing a proof-of-concept 
exploit which uses Riak's mapreduce API to execute arbitrary Erlang code 
and obtain shell access.


http://aphyr.com/journals/show/do-not-expose-riak-directly-to-the-internet

Please stop doing this.

--Kyle

* Eric (http://crudcomic.com/post/11656603627/downside-of-crowdsourcing) 
announced a publicly accessible Riak cluster today; with his 
permission/supervision I used this exploit to gain shell access to his 
server. He's taken down that cluster and encouraged me to release this 
information.


I love the idea of a public sandbox for Riak--but it should be wrapped 
in a rate-limiting HTTP proxy that only allows certain URLs, object 
sizes, and methods. And limits the number of keys. :)


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Do not expose Riak to the Internet

2011-10-19 Thread Aphyr

On 10/19/2011 04:36 PM, Nate Lawson wrote:

You can call 'os:cmd' to shell out from a M-R job. You can't do that directly 
in MySQL.


No, but you can do other interesting things. Writing binaries to the 
filesystem will get you quite a ways. :)


--Kyle

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Do not expose Riak to the Internet

2011-10-20 Thread Aphyr

On 10/20/2011 08:42 AM, Jonathan Langevin wrote:

Don't the other KVs implement M/R as well? If so, how is it they
don't have a similar exploit? Is this issue due specifically to
Basho's usage of Erlang for processing M/R?

Is it possible to circumvent this type of exploit by running Erlang
& Riak as unprivileged users?

Seems sandboxing may be needed for all nodes to ensure server
integrity. If someone made it into a node somehow (regardless of
public availability), it's still quite a concern that they could then
exploit Riak to gain root access.


I've actually been looking into this. MongoDB, for instance, exposes a
delightful javascript shell with a version of Spidermonkey (and, it
appears, V8 and Java! Even *more* surface area!) which could be
susceptible to arbitrary-code-execution attacks. There are some
interestingly named functions in their JS context, like "nativeHelper",
and a few undocumented server control functions, but without any real
comments in their source tree I'm finding it hard to follow.

I *do* know that multitenant mongo is vulnerable to trivial
denial-of-service attacks, thanks to a global write lock and
gleefully executing javascript everywhere. While we're talking DoS, it's
worth mentioning that if you can convince a sufficiently large riak
cluster to list-keys, it *will* go down.*

As for privileges; Basho's debian packages set up Riak as an
unprivileged user, and I presume most sysadmins are doing something
similar. In most cases, having full access to db data files is scary
enough to warrant caution. It doesn't stop there, however.

After exploiting the javascript VM one could inject an executable onto
the system; some small payload designed to get an interactive shell or
execute code from a remote server. There are many ways to escalate
privileges under linux--some direct (e.g.
http://www.exploit-db.com/exploits/15704/), some relying on other
services. From there, install a rootkit and proceed to neighboring boxes.

There are some easy ways to mitigate the damage caused by taking control
of the riak process; running riak in a chroot jail, for instance, and
aggressively tracking security patches for your operating system. You
can also disable erlang mapreduce and restrict the functions exported to
the javascript context... but these are deterrents, not failsafes.**

The answer is not to ban mapreduce (or distributed code execution of any
kind). The answer is to avoid running code from people in dark alleys on
a system you care about.*** :)

--Kyle

* http://twitter.com/#!/aphyr/status/124275497042591746/photo/1/large

** e.g. http://www.cvedetails.com/cve/CVE-2011-2998/. They won't show
you the bugzilla page unless you're on the sec team, so you *know* it's
juicy. :)

*** The best mapreduce system I can think of would be one with provable
execution consequences. Perhaps something like GHC (the haskell
compiler) could verify that one's mapreduce phases are purely
functional--or control which side-effect functions were exposed in
various contexts. Combine that with a memory/cpu/time budget for each
function.

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Key Filter Timeout

2011-10-23 Thread Aphyr

On 10/23/2011 12:11 PM, Jim Adler wrote:

I will be loosening the key filter criterion after I get the basics
working, which I thought would be a simple equality check. 8M keys
isn't really a large data set, is it? I thought that keys were stored
in memory and key filters just operated on those memory keys and not
data.

Jim


That's about where we started seeing timeouts in list-keys. Around 25
million keys, list-keys started to take down the cluster. (6 nodes, 1024
partitions). You may not encounter these problems, but were I in your
position and planning to grow... I would prepare to stop using key
filters, bucket listing, and key listing early.

Our current strategy is to store the keys in Redis, and synchronize them
with post-commit hooks and a process that reads over bitcask. With
ionice 3, it's fairly low-impact. https://github.com/aphyr/bitcask-ruby
may be useful.

--Kyle

  # Simplified code, extracted from our bitcask scanner:
  def run
    # Be polite: drop CPU and IO priority so we don't starve riak.
    `renice 10 #{Process.pid}`
    `ionice -c 3 -p #{Process.pid}`

    begin
      bitcasks_dir = '/var/lib/riak/bitcask'
      # Each partition's bitcask lives in a directory named by its index.
      dirs = Dir.entries(bitcasks_dir).select do |dir|
        dir =~ /^\d+$/
      end.map do |dir|
        File.join(bitcasks_dir, dir)
      end

      dirs.each do |dir|
        scan dir
        GC.start
      end
      log.info "Completed run"
    rescue => e
      log.error "#{e}\n#{e.backtrace.join "\n"}"
      sleep 10
    end
  end

  def scan(dir)
    log.info "Loading #{dir}"
    b = Bitcask.new dir
    b.load

    log.info "Updating #{dir}"
    b.keydir.each do |key, e|
      bucket, key = BERT.decode(key).map { |x|
        Rack::Utils.unescape x
      }
      # Handle determines what to do with this particular bucket/key
      # combo; e.g. insert into redis.
      handle bucket, key, e
    end
  end

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Severe problems when adding a new node

2011-10-28 Thread Aphyr
I was waiting for Basho to write an official notice about this, but it's 
been three days and I really don't want anyone else to go through this 
shitshow.


1.0.1 contains a race condition which can cause vnodes to crash during 
partition drop. This crash will kill the entire riak process. On our 
six-node, 1024 partition cluster, during riak-admin leave, we 
experienced roughly one crash per minute for over an hour. Basho's 
herculean support efforts got us a patch which forces vnode drop to be 
synchronous; leave-join is quite stable with this change.


https://issues.basho.com/show_bug.cgi?id=1263

I strongly encourage 1.0.1 users to avoid using riak-admin join and 
riak-admin leave until this patch is available.


--Kyle

On 10/28/2011 08:14 AM, John Axel Eriksson wrote:

Last night we did two things. First we upgraded our entire cluster from 
riak-search 0.14.2 to 1.0.1. This process went
pretty well and the cluster was responding correctly after this was completed.

In our cluster we have around 40 000 files stored in Luwak (we also have about 
the same amount of keys, or more, in riak which is mostly
the metadata for the files in Luwak). The files are in sizes ranging from 
around 50K to  around 400MB, most of the files are pretty small though. I
think we're up to a total of around 30GB now.

Anyway, upon adding a new node to the now 1.0.1 cluster I saw the beam.smp 
processes on all the servers, including the new one, taking
up almost all available cpu. It stayed in this state for around an hour and the 
cluster was slow to respond and occasionally timed out. During the
process Riak crashed on random nodes from time to time and I had to restart it. 
After about an hour things settled down. I added this
new node to our load-balancer so it too could serve requests. When testing our 
apps against the cluster we still got lots of timeouts and something
seemed very very wrong.

After a while I did a "riak-admin leave" on the node that was added (kind of a 
panic move I guess). Around 20 minutes after I did this, the cluster started
responding correctly again. All was not well though - files seemed to be 
corrupted (not sure what percentage but could be 1% or more). I have no idea how 
that could happen but files that we had accessed before now contained garbage. 
I haven't thoroughly researched exactly WHAT garbage they contain but
they're not in a usable state anymore. Is this something that could happen 
under any circumstances in Riak?

I'm afraid of adding a node at all now since it resulted in downtime and 
corruption when I tried it. I checked and rechecked the configuration files and 
really - they're
the same on all the nodes (except for vm.args where they have different names 
of course). Has anyone ever seen anything like this? Could it somehow be 
related to
the fact that I did an upgrade from 0.14.2 to 1.0.1 and maybe an hour later 
added a new 1.0.1 node?

Thanks for any input!

John
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: key name conventions

2011-10-28 Thread Aphyr
Yep, two buckets: one for users, one for users_by_id. Or, you could use 
secondary indexes, and not worry about keeping the ids in sync. 
http://basho.com/blog/technical/2011/09/14/Secondary-Indexes-in-Riak/


For ID generation, UUIDs will work, SHA1s will work, or you could use an 
ID generation service like Snowflake. https://github.com/twitter/snowflake
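
With the Ruby client, the two-bucket version looks something like this 
(a sketch; the bucket names, key, and id are made up):

   require 'riak'
   client = Riak::Client.new

   # Canonical record under a fixed, never-changing id.
   user = client.bucket('users').new('a1b2c3')
   user.data = { 'name' => 'justin' }
   user.store

   # Pointer keyed by username; on rename, delete this and write a new
   # one. The fixed id above never moves.
   ptr = client.bucket('users_by_name').new('justin')
   ptr.data = { 'user_id' => 'a1b2c3' }
   ptr.store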


On 10/28/2011 11:09 AM, Justin Karneges wrote:

Hi,

I read that when setting a value I can choose a key name or let Riak come up
with a name for me.  The majority of examples in the docs seem to choose
names.  However, it seems like anytime you'd store "table"-ish data, you'd
want to avoid choosing names for your keys ("rows") and let Riak do it, right?

For example, suppose I want to store some user data, and I let users change
their usernames whenever they want.  Here's a natural first pass at a schema:

/users/{user_fixed_id}:
   data: user data

/users_by_name/{username}:
   data: empty
   link: /users/{user_fixed_id}

Here I have the real user data keyed by some fixed id, and then pointers to
those objects keyed by name.  If the user ever changes his name I delete the
old pointer and create a new one.

"users" is basically the table then, and I let Riak choose the ids.  I have to
put the index ("users_by_name") as a different bucket so that Riak ids don't
conflict with potential usernames.  Unless there is a way to control how Riak
chooses its ids (like choosing key name prefixes), I think you pretty much have
to split into a bucket right?

Alternatively, the client could generate the fixed value itself, allowing it to
use key name prefixes in the same bucket (e.g. "user_{user_fixed_id}" and
"username_{username}"}.  Generated ids would most certainly have to be UUIDs,
though it might be interesting to know if an auto-increment integer is
possible, if the integer is stored in Riak.

I'm just trying to get an idea of how people go about this.

Thanks,
Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: atomically updating multiple keys

2011-10-30 Thread Aphyr
One easy way to solve an atomic a->b relationship in an eventually 
consistent way is to require the existence of both A and B in order to 
consider the write valid, to write A first, and use a message queue to 
retry writes until both A and B exist. There are other approaches for 
agreement between objects, but they add complexity. Paxos, 2PC, etc.
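
A minimal sketch of the retry approach (the queue and bucket here are 
stand-ins for whatever you actually use):

   require 'riak'
   require 'thread'

   QUEUE  = Queue.new            # stand-in for your real message queue
   CLIENT = Riak::Client.new

   def store(key, data)
     o = CLIENT.bucket('pairs').new(key)
     o.data = data
     o.store
   end

   # Readers treat a lone A as invalid, so writing A first is safe;
   # the queue guarantees B eventually lands.
   def write_pair(a_key, a_data, b_key, b_data)
     store a_key, a_data
     QUEUE << [b_key, b_data]
   end

   # Queue consumer: retry until the write sticks.
   def consume
     key, data = QUEUE.pop
     begin
       store key, data
     rescue StandardError
       QUEUE << [key, data]      # failed write: re-enqueue and retry
     end
   end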


--Kyle

On 10/30/2011 09:42 AM, Les Mikesell wrote:

On Sun, Oct 30, 2011 at 10:43 AM, Alexander Sicular  wrote:

Greetings Justin,

IMHO, AFAIK, IANAL, etc. "is it ok?" really boils down to whatever you're ok
with, can program, can understand, is within you're budget, can implement
and/or do all of the above within your own timeline. I think it is always
true that the number of opinions you will get is more than the number of
participants in the conversation. Ergo, filter all contributions through the
lens of: Go with what you know. Coming here for advice is fine but like
preventing wildfires, only you know your use case.

That said, you may want to check out using a message queue to control the
asynchronous, eventually consistent data model I feel you are
leaning towards.


I don't see how pushing the problem elsewhere helps.  Even if your
message queue orders a set of updates in the right sequence, riak does
not have a mechanism to ensure that they go into the db in that order
and a delete might happen before the insert intended to be done first
but actually happening concurrently - unless you make the writes wait
for all nodes on every operation.  I think riak has a concept of a
partition 'owner' node, but it is used only in the data migration
process for failover and node adds/removes, not to give ordinary
writes an atomic property.



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: safely resolving conflicts on read

2011-11-02 Thread Aphyr

On 11/02/2011 10:40 AM, Justin Karneges wrote:

Thanks everyone for these replies (and also Aphyr, off-list).  It has helped me
confirm my suspicions and sounds like I'm on the right track.

For one of my keys, I am doing sort of a manual "last write wins" by having
the reader sort siblings by timestamp, then by vtag, to deterministically
select the same sibling every time.  The reason for keeping the other siblings
around is they may contain the only references to other keys created along
with them.  A separate cleanup process can then be sure to delete the referred
keys before removing the siblings.  And of course the algorithm used to
determine the winning sibling is shared by both the read function and the
cleanup function.


We do something similar. A feed, for example, stores a list of keys 
pointing to feed items. In memory, I store "pending insertion" and 
"pending deletion" lists on a feed. At save time, a model callback saves 
the items pending insertion, and deletes any pending deletion.


The conflict resolution operator for a feed takes the union of all feed 
item keys, sorts them (to keep the most recent X) and truncates the 
list--storing all the truncated items in the pending-deletion list. This 
isn't perfect, but has acceptable probabilistic bounds for our use case.
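
A sketch of that resolution operator (Feed and the limit are hypothetical 
names, and item keys are assumed to sort chronologically):

   Feed = Struct.new(:item_keys, :pending_deletion)  # hypothetical model
   MAX_ITEMS = 100                                   # keep the most recent X

   def merge(siblings)
     # Union of every sibling's item keys, newest first.
     keys = siblings.map(&:item_keys).inject(:|).sort.reverse
     # Truncated keys land in pending-deletion; the next save reaps them.
     Feed.new(keys.take(MAX_ITEMS), keys.drop(MAX_ITEMS))
   end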


--Kyle

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: best practices for testing eventual consistency?

2011-11-15 Thread Aphyr
The fastest thing is probably to generate conflicts right below the 
conflict resolution system. If you are worried you can't predict the 
conflicts at all, go ahead and perform multiple reads and writes at 
overlapping times. No need for excessive load; controlling the timing 
alone should be sufficient.
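
With the Ruby client, for example, you can manufacture siblings 
deterministically by writing the same key twice without a vector clock 
(a sketch; assumes allow_mult is on for the bucket):

   require 'riak'
   client = Riak::Client.new

   bucket = client.bucket('conflict-test')
   bucket.allow_mult = true

   a = bucket.new('k')          # no vclock: riak sees a concurrent write
   a.data = { 'writer' => 'a' }
   a.store

   b = bucket.new('k')          # second blind write -> siblings
   b.data = { 'writer' => 'b' }
   b.store

   bucket.get('k').conflict?    # => true; hand this to your resolver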


--Kyle

On 11/15/2011 10:23 AM, Jesse Myers wrote:

I'm contemplating migrating a write-intensive system from MySQL to Riak.

I understand the eventual consistency model and the need to resolve
conflicts in application code, especially if allow_mult is true. My
concern is that I won't discover all of the conflict scenarios my
application code needs to handle until after we're live in production.
Are there best practices for producing conflicts in development
environment? Is my best option to simulate a large amount of load and
see what happens? Should I lower my R or W values? Kill off nodes
randomly?

Related question: I'd like to write unit tests for conflict scenarios
I anticipate/encounter. Do any of the client libraries come with good
mock support or is that something I need to roll myself?

Thanks,

Jesse

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Social network data / Graph properties of Riak

2011-11-18 Thread Aphyr
Depending on whether you think it will be more efficient to store the 
graph or its dual, consider each node a vertex and write the adjacency 
list as a part of its data. You can store whatever weights, etc. you 
need on the edges there.


Don't use links; they're just a thin layer on top of mapreduce, so 
there's really not much advantage to using them. Links are subject to 
HTTP header length limits too, so storing more than a thousand on any node 
is likely to break down.


--Kyle

On 11/18/2011 07:38 AM, Jeroen van Dijk wrote:

Hi all,

I'm currently evaluating whether Riak would fit as the main storage of
my current project, a social network. The reason I am attracted to Riak
and less to a Graph database as main storage is that I want the easy
horizontal scalability and multi-site replication that Riak provides.
The only thing I doubt is whether the key-value/link model of Riak is
flexible enough to be able to store a property graph
(http://arxiv.org/abs/1006.2361). I am not asking whether the
querying/graph traversing will be easy; I'm probably going to use a
graph database or a Pregel like platform (e.g.
http://www.goldenorbos.org/) for that problem. I guess my main question
is whether it would be easy/feasible to import and export a property
graph in and from Riak? Has someone done this before?

I realize the above might be too specific, so here are two more
questions that I think are relevant:

- Is there a known upper limit of links that can be stored (I don't want
to add them all at once so 1000 per request is fine,
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2010-March/000786.html)
- Is there a way to add meta data to links (edges)? E.g. weigths and
other attributes.

Any other ideas or advise are also highly appreciated.

Cheers,

Jeroen



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Social network data / Graph properties of Riak

2011-11-18 Thread Aphyr

On 11/18/2011 11:50 AM, Jeroen van Dijk wrote:

And I also didn't include the riak user list for this reply:


On Fri, Nov 18, 2011 at 7:04 PM, Aphyr mailto:ap...@aphyr.com>> wrote:

Depending on whether you think it will be more efficient to store
the graph or its dual, consider each node a vertex and write the
adjacency list as a part of its data. You can store whatever
weights, etc. you need on the edges there.

Don't use links; they're just a thin layer on top of mapreduce, so
there's really not much advantage to using them. Links are subject
to HTTP header length limits too, so storing more than a thousand on
any node is likely to break down.


Thank you for this suggestion. Also thanks for the warning on not using
links for what I want. So you are saying each vertex will have a list of
the other vertices that it is connected to? And save each edge as
key/value pair? Or are you saying each vertex should embed the adjacent
edges, meaning duplicated edges?


Depends on whether you want to store the graph or its dual. If you're 
dealing with a sparse DAG where the input keys are likely to be vertex 
names, the natural choice is to store each vertex as an object in riak 
and each outbound edge from it as a property of that vertex.


/users/user1:
  owns: [item1, ...]
  follows: [user2, ...]

Of course, this method doesn't work well when you need to follow edges 
in reverse, so you may need to store the reciprocal relationship on 
target nodes:


/items/item1:
  owner: user1,

/users/user2:
  followed-by: [user1, ...]

and so forth. That requires two writes and potentially lossy conflict 
resolution strategies, e.g. a follows b but b is not followed by a. We 
use processes which walk the graph continuously and enforce relationship 
integrity as they go. We have a social graph of several hundred million 
objects stored this way in Riak, and it works reasonably well.


Naturally, you'll want to choose an encoding which is fast and 
space-efficient for a large dataset. Depending on your needs, JSON, 
protocol buffers, or fixed-length record entries might be work well.


Also consider the depth of traversals you'll need to perform. It may be 
best to use a graph database like neo4j for deep traversals. You could 
store your primary dataset in Riak and replicate only the important 
graph information to a graph DB via post-commit hooks. That would solve 
the reciprocal consistency problem (at the cost of replication lag), and 
could reduce the amount of data you need to put into the graph DB.


Given that Linkedin has this problem, you might look into their tools as 
well.


--Kyle


I'm guessing you mean the former, because that makes sense to me. So you
would save a graph assuming users and items like the following key/value
pairs:

//Vertices
user1: properties
user2: properties
item1: properties

//Edges
user1-owns-item1: properties
user1-follows-user2: properties
user2-follows-user1: properties

To be able to find the available edges, each vertex would need to
reference the keys of the edges. Is this what you mean?

If so, one more question about a possible problem. Say I have an item
with many many outgoing edges, so it needs to embed these references.
This would make it really costly to fetch this item from Riak I assume,
even if you are only interested in normal properties. Wouldn't that mean
you will have to save the properties separately from the edge
references to keep it feasible?

Did I grasp what you were proposing Kyle?

Thanks,
Jeroen

--Kyle


On 11/18/2011 07:38 AM, Jeroen van Dijk wrote:

Hi all,

I'm currently evaluating whether Riak would fit as the main
storage of
my current project, a social network. The reason I am attracted
to Riak
and less to a Graph database as main storage is that I want the easy
horizontal scalability and multi-site replication that Riak
provides.
The only thing I doubt is whether the key-value/link model of
Riak is
flexible enough to be able to store a property graph
(http://arxiv.org/abs/1006.2361). I am not asking whether the
querying/graph traversing will be easy; I'm probably going to use a
graph database or a Pregel like platform (e.g.
http://www.goldenorbos.org/) for that problem. I guess my main
question
is whether it would be easy/feasible to import and export a property
graph in and from Riak? Has someone done this before?

I realize the above might be too specific, so here are two more
questions that I think are relevant:

- Is there a known upper limit of links that can be stored (I
don't want
to add them all at once so 1000 per request is fine,

http://lists.basho.com/__pipermail/riak-users_lists.__basho

Re: slow 2 node cluster

2011-11-20 Thread Aphyr

On 11/20/2011 05:19 AM, Catalin Constantin wrote:

The connection between servers is 10MBytes / sec not 10Mbit / sec.


Are you sure? To my knowledge almost no ethernet gear runs at 10 MB/s. 
It's almost always 10, 100, 1000, or 10000 Mb/s.


It may be your n_val. If it's the default (3), one of your two machines 
has to handle that third replica. That'll cut your throughput 
significantly. Finally, your disks may be the bottleneck. I'd take a 
look at iostat and look for significant (on our servers, 2.6% meant we 
were thrashing) IO_WAIT time on the riak beam.smp process.


--Kyle


One row of data looks like this:
309819178daz...@gmail.com
55942dzt1home2011-05-3116:22:102011-09-07
17:03:48127.0.0.111

I use Protobuf transport.
If dw = 0 and w = 0 there is no wait for other replica nodes, right?

This should improve the write speed, correct ?

On Sun, Nov 20, 2011 at 2:21 PM, Erik Søe Sørensen <e...@trifork.com> wrote:

This depends quite a bit on the sizes of your objects.
Supposing an average size of 2KB, and n=3: on each write, on average
1.5 of the replicas would be on the other node, implying inter-node
network traffic of 1.5*2KB=3KB (and this is just in one direction).
If your inter-node network connection is indeed 10Mbit ~  1MB/s,
then 300 writes/s * 3KB = 0.9MB/s would just about saturate the
connection.

You may want to check the network utilization.

From: riak-users-boun...@lists.basho.com
[riak-users-boun...@lists.basho.com] On Behalf Of Catalin Constantin
[daz...@gmail.com]
Sent: 20 November 2011 11:10
To: riak-users@lists.basho.com
Subject: slow 2 node cluster

Hello,

I am trying to evaluate / run some tests on a 4 million record dataset.
I have 2 nodes set up (different machines - 8GB RAM each, i7 CPUs).
10 MB connection between them.

I am trying to insert data into riak using w=1 and dw=1 (also tried
with dw = 0, w = 0).
For each riak object I have 4 indexes (2 binary, 2 int).
Riak backend is leveldb.

I can't get more than 300 inserts per second.
I have also tried running 2 threads each hitting the different node.
Nothing changed too much.

Is this normal behavior?

--
Catalin Constantin




--
Catalin Constantin
Dazoot Software
http://www.dazoot.eu/



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: slow 2 node cluster

2011-11-20 Thread Aphyr

On 11/20/2011 12:14 PM, Catalin Constantin wrote:

I am 100% sure the transfer rate is 10MBytes / second. This is not the
problem.


In ten years of network administration I have never encountered an 
ethernet device with a wire rate of 10 MBps. I have, however, 
encountered frequent confusion over units. Perhaps you understand my 
suspicion here. :)



IOWAIT is also pretty low. iostat shows: iowait 4.69%


If you read my email, I suggested that even values as low as 2.6% may 
suggest contention. It depends on your CPU arch and utilization. I would 
investigate your disks more closely. Are they spinning or solid-state? 
Median seek time? 95th/99th percentile seek times? Disk cache? Does the riak process 
spend disproportionate time in IO_WAIT relative to USER? Filesystem 
atime/relatime/noatime? FS block size properly aligned for your disk? 
Insufficient filesystem or leveldb cache? hdparm options? Is riak on an 
independent disk or competing with other processes, i.e. syslog, file 
servers, etc? Does strace show an unusual amount of time spent in 
certain system calls?


It might just be leveldb, too. I only have experience with bitcask in 
production.



I have retried the test with one node and a newly created bucket where
I have set n to 1:
bucket.set_n_val(1)

Results are the same. Less than 300 inserts / second.


This is good; it rules out replication.


Any idea why riak is so slow on inserting data ?


Disk, disk disk disk, disk disk? Disk!

You could also look at the client. Are you writing to a local node or a 
remote one? Is your client's threading model getting in the way? Can you 
actually produce data fast enough to insert it? Is your client fighting 
for the same resources as the riak process? Presuming you've ruled these 
out, it's almost certainly network or disk.


--Kyle

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: slow 2 node cluster

2011-11-20 Thread Aphyr

On 11/20/2011 01:34 PM, Catalin Constantin wrote:

To make it simple. No more networking. Just one node (with n = 1) and
local tests.

The producing of data is a simple CSV file read (ruled out too cause
this is fast).


Read from the same disk? If you're interleaving every write with a read 
from this file, how many back-and-forth seeks do you think your disk is doing?



HDD: 2 x 750 GB SATA 2 (RAID1)


Hint hint hint.


What insert rate should I expect on a normal Debian 6.0 64 bit
installation (no tweaks) ?


450 inserts/second. Or, if you address some of the points I mentioned 
earlier, perhaps 2000-4000/sec, depending on write characteristics. Most 
people find performance improves linearly with nodes, so long as the 
network is not the bottleneck.


Our six-node cluster (bitcask-dedicated SSDs, 2x bonded gige, 2:1 
read:write ratio, median value ~10 kB, n_val 3, typical r/w: quorum) 
tops out at about 3,000 aggregate ops/sec while maintaining reasonable 
(~10ms 99%) latencies. I can push it higher if I relax latency constraints.



I can only compare it with other DBs i have tested on the same machine:
ex: mongodb, kyototycoon


These databases solve different problems in different ways. You should 
expect them to perform differently. The question is: for your workload, 
what balance of raw IOPS, redundancy, availability, latency, and 
conflict handling model fits best? Riak trades IOPS for availability and 
redundancy, and trades MVCC/locking for vclock resolution.


--Kyle

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Is Riak appropriate for website metrics?

2011-11-28 Thread Aphyr
For limited mapreduce (where you know the keys in advance) riak would be 
a fine choice. 500 million keys, n_val 3 is readily achievable on 
commodity hardware; say four nodes with 128GB SSDs.


If large-scale mapreduce (more than a few hundred thousand keys) is 
important, or listing keys is critical, you might consider HBase.


If you start hitting latency/write bottlenecks, it may be worth 
accumulating metrics in Redis before flushing them to disk.
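
E.g., a rough sketch with the redis and riak gems (the key scheme is 
hypothetical):

   require 'redis'
   require 'riak'

   redis = Redis.new
   riak  = Riak::Client.new

   # Hot path: one cheap atomic increment per event.
   redis.incr 'clicks:photo:255'

   # Periodic flush: drain each counter into a riak object.
   redis.keys('clicks:*').each do |k|
     count = redis.getset(k, 0).to_i      # read and zero atomically
     o = riak.bucket('metrics').new("#{k}:#{Time.now.to_i}")
     o.data = { 'count' => count }
     o.store
   end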


At Showyou, we're also building a custom backend called Mecha which 
integrates Riak and SOLR, specifically for this kind of analytics over 
billions of keys. We haven't packaged it for open-source release yet, 
but it might be worth talking about off-list.


--Kyle

On 11/28/2011 02:07 PM, Michael Dungan wrote:

Hi,

Sorry if this has been asked before - I couldn't find a searchable
archive of this list.

I was told to ask this list whether or not Riak would be appropriate for
tracking our site's metrics. We are currently using Redis for this but
are at the point where we need both clustering and m/r capability, and
on the surface, Riak looks to fit this bill (we already use Erlang
elsewhere in our app, so that's an additional plus).

The records are pretty small and can be represented easily in json. An
example:

{
"id": "c4473dc5cfc5da53831d47c4c016d1c7de0a31e4fd94229e47ade569ef011a7b"
"type": "Photo::Click",
"user_id": 2640,
"photo_id": 255,
"ip": "100.101.102.103",
"created_at": "2011/04/08 17:09:40 -0700"
}

We currently have around 25 million records similar to this one, and are
adding 4-5 million more each month.

Is Riak appropriate for this use case? Are there any gotchas I need to
be aware of?

thank you,

-mike


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Is Riak appropriate for website metrics?

2011-11-28 Thread Aphyr

Sure.

To clarify, Riak mapreduce is decent. We store hundreds of millions of 
objects without trouble, and mapreduce over hundreds of keys per request 
with decent (50-500ms) latencies.


It's just not the best for jobs over millions of keys; those will take much 
longer than a comparable job implemented in, say, Hadoop. It's also 
difficult to debug MR in riak--but it's difficult to debug Hadoop as 
well. If either *could* work, the answer probably comes down to "do you 
have the man-hours and expertise necessary to keep hadoop happy".


Riak can also collapse in horrible ways when asked to list huge numbers 
of keys. Some people say it just gets slow on their large installations. 
We've actually seen it hang the cluster altogether. Try it and find out! 
Basho understands this and is aiming to address it, but I've heard no 
specific timetable or plans. Meanwhile we pull keys out of the 
underlying storage directly, and cache them in Redis. That may be a 
viable solution for you.


Mecha is something experimental that John Mullerleile is working on.

http://www.slideshare.net/jmuellerleile/scaling-with-riak-at-showyou

Basically, it's a new backend for Riak (if you weren't aware, Riak has 
pluggable storage backends). You still read and write to Riak as normal, 
but underneath the hood, it stores the data in leveldb (one per 
partition per vnode), and *also* indexes specially named fields in a 
local solr core on each node. Using the coverage code in Riak 1.0, we 
can then issue a solr query to some subset of nodes and receive a 
response for all the values stored in Riak. You can filter, count, 
facet, etc by text, numbers, multivalued texts, geolocation, etc. I 
would describe it as "scary fast".


Downside is it's also experimental, and glues together a lot of 
different technologies. All those moving parts means we haven't had time 
to package it up and open-source it yet, but sometime in December or 
January we're hoping to focus on polish and release.


--Kyle

On 11/28/2011 02:59 PM, Michael Dungan wrote:

Thank you for getting back to me. It does look like we'll be needing to
go big, as we're already at 5m new records/month, so just dealing with
monthly numbers is already beyond the few hundred thousand keys you
mentioned, unless I'm thinking about this wrong.

I would love to hear more about Mecha if you're willing to share. Feel
free to contact me off-list.

thanks again,

-mike


On 11/28/11 2:24 PM, Aphyr wrote:

For limited mapreduce (where you know the keys in advance) riak would be
a fine choice. 500 million keys, n_val 3 is readily achievable on
commodity hardware; say four nodes with 128GB SSDs.

If large-scale mapreduce (more than a few hundred thousand keys) is
important, or listing keys is critical, you might consider HBase.

If you start hitting latency/write bottlenecks, it may be worth
accumulating metrics in Redis before flushing them to disk.

At Showyou, we're also building a custom backend called Mecha which
integrates Riak and SOLR, specifically for this kind of analytics over
billions of keys. We haven't packaged it for open-source release yet,
but it might be worth talking about off-list.

--Kyle

On 11/28/2011 02:07 PM, Michael Dungan wrote:

Hi,

Sorry if this has been asked before - I couldn't find a searchable
archive of this list.

I was told to ask this list whether or not Riak would be appropriate for
tracking our site's metrics. We are currently using Redis for this but
are at the point where we need both clustering and m/r capability, and
on the surface, Riak looks to fit this bill (we already use Erlang
elsewhere in our app, so that's an additional plus).

The records are pretty small and can be represented easily in json. An
example:

{
"id": "c4473dc5cfc5da53831d47c4c016d1c7de0a31e4fd94229e47ade569ef011a7b"
"type": "Photo::Click",
"user_id": 2640,
"photo_id": 255,
"ip": "100.101.102.103",
"created_at": "2011/04/08 17:09:40 -0700"
}

We currently have around 25 million records similar to this one, and are
adding 4-5 million more each month.

Is Riak appropriate for this use case? Are there any gotchas I need to
be aware of?

thank you,

-mike




___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Recap for November 23 - 27

2011-11-28 Thread Aphyr

6) Q --- Are people running riak natively on osx (for development) or
running on a vm that matches production?  (from kenperkins via #riak)

 A --- Anyone? (We had a similar thread on the list several months
back about this but I figured it couldn't hurt to open it up to more
discussion.)


We've got a few devs who use riak on OSX for development here, but not 
heavily. ulimit appears to be a capricious liar on that platform, at 
least from the issues we've encountered. :/


I do most of our riak-facing development, and use Ubuntu and Debian as 
my everyday dev OS. No VMs. We deploy to Ubuntu server.


--Kyle

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Client Pooling in python

2011-12-08 Thread Aphyr
I don't know about Python, but we've been attacking this problem in the 
Ruby client. You might find these useful:


Re-entrant threadsafe resource pooling:
https://github.com/seancribbs/ripple/blob/master/riak-client/lib/riak/client/pool.rb

Node configuration/error tracking:
https://github.com/seancribbs/ripple/blob/master/riak-client/lib/riak/client/node.rb

Client checking out conns ("backends") from pools:
https://github.com/seancribbs/ripple/blob/master/riak-client/lib/riak/client.rb

Methods of interest:
#http
#new_http_backend
#protobuffs
#new_protobuffs_backend
#recover_from
#choose_node

--Kyle

On 12/08/2011 11:13 AM, Eric Siegel wrote:

Hey everyone, I'm sure this has been asked before but I was wondering
what other people are doing for pooling clients using python?
I've noticed that HttpPoolTranport is deprecated, and there doesn't seem
to exist an analogous PBuffers version.
I could just go ahead and write my own, but I thought that something
must exist.

I'm a relative Riak novice, and as such, am probably not grasping some
of the difficulties involved in its implementation.

Eric



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: standby cluster experiment

2011-12-19 Thread Aphyr

On 12/09/2011 11:53 AM, John Loehrer wrote:

I am currently evaluating riak. I'd like to be able to do periodic
snapshots of /var/lib/riak using LVM without stopping the node.
According to a response on this ML you should be able to copy the
data directory for eleveldb backend.

http://comments.gmane.org/gmane.comp.db.riak.user/5202


If I cycle through each node and do `riak stop` before taking a
snapshot everything works fine. But if I don't shut down the node
before copying, I run into problems. Since I access the http
interface of the cluster through an haproxy load-balancer, once the
node turns off it is taken out of the pool almost immediately. But
for a millisecond or two before haproxy detects the node is down
there might be some bad responses. I can live with it and build
better retries into my client, but would rather avoid it if I can.


Haproxy has a standby system you can use to remove a node from rotation 
politely, allowing existing requests to finish. You can remove them from 
haproxy directly at the command line (or using http-check 
disable-on-404, but that doesn't really make sense for riak):


echo "disable server /" | socat stdio 
/etc/haproxy/haproxysock


... perform maintenance ...

echo "enable server /" | socat stdio /etc/haproxy/haproxysock

--Kyle

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Absolute consistency

2012-01-05 Thread Aphyr

On 01/05/2012 11:44 AM, Tim Robinson wrote:

Ouch.

I'm shocked that is not considered a major bug. At minimum that kind of stuff 
should be front and center in their wiki/docs. Here I am thinking n 2 on a 3 
node cluster means I'm covered when in fact I am not. It's the whole reason I 
gave Riak consideration.

Tim


I think you may have this backwards. N=3 and 2 nodes would mean one node 
has 1 copy, and 1 node has 2 copies, of any given piece. For n=2 and 3 
nodes, there should be no overlap.


The other thing to consider is that for certain combinations of 
partition number P and node number N, distributing partitions mod N can 
result in overlaps at the edge of the ring. This means zero to n 
preflists can overlap on some nodes. That means n=3 can, *with the wrong 
choice of N and P*, result in minimum 2 machines having copies of any 
given key, assuming P > N.


There are also failure modes to consider. I haven't read the new key 
balancing algo, so my explanation may be out of date.


--Kyle

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Absolute consistency

2012-01-05 Thread Aphyr

On 01/05/2012 12:12 PM, Tim Robinson wrote:

Thank you for this info. I'm still somewhat confused.

Why would anyone ever want 2 copies on one physical PC? Correct me if
I am wrong, but part of the sales pitch for Riak is that the cost of
hardware is lessened by distributing your data across a cluster of
less expensive machines as opposed to having it all one reside on an
enormous server with very little redundancy.

The 2 copies of data on one physical PC provides no redundancy, but
increases hardware costs quite a bit.

Right?


Because in the case you expressed shock over, the pigeonhole
principle makes it *impossible* to store three copies of information in
two places without overlap. The alternative is lying to you about the
replica semantics. That would be bad.

In the second case I described, it's an artifact of a simplistic but 
correct vnode sharding algorithm which uses the partition ID modulo node 
count to assign the node for each partition. When the partition count is 
not a multiple of the node count, the last and the first (or second, etc., 
you do the math) partitions can wind up on the same node. In that case, 
the proportion of data that does overlap on one node is on the order of 
1/64 to 1/1024 of the keyspace. This is not a significant operational cost.


This *does* reduce fault tolerance: losing those two "special" nodes 
(but not two arbitrary nodes) can destroy those special keys even though 
they were stored with N=3. As the probability of losing two *particular* 
nodes simultaneously compares favorably with the probability of losing 
*any three* nodes simultaneously, I haven't been that concerned over it. 
It takes roughly six hours for me to allocate a new machine and restore 
the destroyed node's backup to it. Anecdotally, I think you're more 
likely to see *cluster* failure than *dual node* failure in a small 
distributed system, but that's a long story.


The riak team has been aware of this since at least Jun 2010 
(https://issues.basho.com/show_bug.cgi?id=228), and there are 
operational workarounds involving target_n_val. As I understand it, 
solving the key distribution problem is... nontrivial.


--Kyle

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Absolute consistency

2012-01-05 Thread Aphyr

On 01/05/2012 12:53 PM, Tim Robinson wrote:

So with the original thread where with N=3 on 3 nodes. The developer
believed each node was getting a copy. When in fact 2 copies went to
a single node. So yes, there's redundancy and the "shock" value can
go away :) My apologies.

That said, I have no ability to assess how much data space that is
wasting, but it seems like potentially 1/3 - correct?

Another way to look at it, using the above noted case, is that I need
to double[1] the amount of hardware needed to achieve a single amount
of redundancy.

[1] not specifically, but effectively.


For the third time, no. Please read
http://wiki.basho.com/What-is-Riak%3F.html.

For 256 partitions, N=3:

Part  Node
0   0
1   1
2   2
3   0
4   1
...
253 1
254 2
255 0

With n = 3, a key assigned to partition 0 will be stored on partition 0,
1, and 2.

Key  Preflist     Nodes
0   0,1,2   0,1,2
1   1,2,3   1,2,0
...
253 253,254,255 1,2,0
254 254,255,0   2,0,0   <-- overlap!
255 255,0,1 0,0,1   <-- overlap!

Only 1/128 of the data will reside on only two nodes. 127/128 of the
data will be distributed to three nodes. No data resides on only one
node. Data loss requires the simultaneous destruction of:

a. Any three nodes
b. Node 0 and 1
c. Node 0 and 2

This is true regardless of the number of nodes, so long as the node 
count does not evenly divide the partition count AND you are not using 
the workaround I linked to in my previous post. If you do either of 
those things (use 4, 8, 16, or 32 nodes instead of 3, or use the n_val 
workaround), the distribution will be even.
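
You can verify the arithmetic with a few lines of ruby:

   parts, nodes, nval = 256, 3, 3
   overlaps = (0...parts).count do |p|
     preflist = (p...p + nval).map { |i| (i % parts) % nodes }
     preflist.uniq.size < nval
   end
   puts overlaps   # => 2: partitions 254 and 255, i.e. 1/128 of the ring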


--Kyle

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Adding a new machine to a three node cluster cause partition handoff problems

2012-01-10 Thread Aphyr
There's a code snippet in riak 1.0.1 or 1.0.2 release notes which addresses 
this. Sorry can't find it for you, network here is useless. :(

Ivaylo Panitchkov  wrote:

>
>Hello All,
>
>We have a cluster of three machines (Debian 6.0, 4GB RAM, 
>riak_1.0.2-1_amd64.deb, n_val: 3) that serves an application for a 
>while. As we go to production soon added a fourth machine to the cluster 
>(exactly the same as the first three) yesterday. The partition handoff 
>began in the late afternoon and I had an impression that the transition 
>will not take too long as there are only few hundred IMPORTANT records 
>in the storage for the moment. Today in the morning checked the 
>situation again and realized the partition handoff still runs (or get 
>stuck). The Ownership Handoff is still the same since yesterday (at 
>least 19 hours till now). Any suggestions to fix the problem are welcome :-)
>
>REMARK: Replaced the IP addresses for security's sake
>
>
># riak-admin ringready
>Attempting to restart script through sudo -u riak
>TRUE All nodes agree on the ring 
>['r...@yyy.yyy.yyy.yyy','r...@xxx.xxx.xxx.xxx','r...@aaa.aaa.aaa.aaa','r...@bbb.bbb.bbb.bbb']
>
>
># riak-admin transfers
>Attempting to restart script through sudo -u riak
>'r...@bbb.bbb.bbb.bbb' waiting to handoff 2 partitions
>'r...@aaa.aaa.aaa.aaa' waiting to handoff 2 partitions
>'r...@yyy.yyy.yyy.yyy' waiting to handoff 2 partitions
>
>
># riak-admin ring_status
>Attempting to restart script through sudo -u riak
>== Claimant 
>===
>Claimant: 'r...@xxx.xxx.xxx.xxx'
>Status: up
>Ring Ready: true
>
>== Ownership Handoff 
>==
>Owner: r...@xxx.xxx.xxx.xxx
>Next Owner: r...@yyy.yyy.yyy.yyy
>
>Index: 548063113999088594326381812268606132370974703616
>Waiting on: [riak_kv_vnode]
>Complete: [riak_pipe_vnode]
>
>Index: 1370157784997721485815954530671515330927436759040
>Waiting on: [riak_kv_vnode]
>Complete: [riak_pipe_vnode]
>
>---
>
>== Unreachable Nodes 
>==
>All nodes are up and reachable
>
>
># riak-admin member_status
>Attempting to restart script through sudo -u riak
>= Membership 
>==
>Status Ring Pending Node
>---
>valid 21.9% 25.0% 'r...@yyy.yyy.yyy.yyy'
>valid 28.1% 25.0% 'r...@xxx.xxx.xxx.xxx'
>valid 25.0% 25.0% 'r...@aaa.aaa.aaa.aaa'
>valid 25.0% 25.0% 'r...@bbb.bbb.bbb.bbb'
>---
>Valid:4 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
>
>
>-- 
>Ivaylo Panitchkov
>Software developer
>Hibernum Creations Inc.
>
>This email is confidential and may also be legally privileged. If you have 
>received this email in error, please notify us immediately by reply email and 
>then delete this message from your system. Please do not copy it or use it for 
>any purpose or disclose its content.
>
>
>___
>riak-users mailing list
>riak-users@lists.basho.com
>http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Pending transfers when joining 1.0.3 node to 1.0.0 cluster

2012-01-18 Thread Aphyr
https://github.com/basho/riak/blob/riak-1.0.2/RELEASE-NOTES.org

If partition transfer is blocked awaiting [] (as opposed to [kv_vnode] or 
whatever), There's a snippet in there that might be helpful.

--Kyle

On Jan 18, 2012, at 1:43 PM, Fredrik Lindström wrote:

> After some digging I found a suggestion from Joseph Blomstedt in an earlier 
> mail thread 
> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-January/007116.html
> 
> in the riak console:
> riak_core_ring_manager:force_update().
> riak_core_vnode_manager:force_handoffs().
> 
> The symptoms would appear to be the same although the cluster referenced in 
> the mail thread does not appear to have search enabled,
> as far as I can tell from the log snippets. The mail thread doesn't really 
> specify which node to run the commands on so I tried both the new node and 
> the current claimant of the cluster.
> 
> Sadly the suggested steps did not produce any kind of ownership handoff.
> 
> Any helpful ideas would be much appreciated :)
> 
> /F
> 
> 
> From: riak-users-boun...@lists.basho.com [riak-users-boun...@lists.basho.com] 
> on behalf of Fredrik Lindström [fredrik.lindst...@qbranch.se]
> Sent: Wednesday, January 18, 2012 4:00 PM
> To: riak-users@lists.basho.com
> Subject: Pending transfers when joining 1.0.3 node to 1.0.0 cluster
> 
> Hi everyone,
> when we try to join a 1.0.3 node to an existing 1.0.0 (3 node) cluster the 
> ownership transfer doesn't appear to take place. I'm guessing that we're 
> making some stupid little mistake but we can't figure it out at the moment. 
> Anyone run into something similar?
> 
> Riak Search is enabled on the original nodes in the cluster as well as the 
> new node.
> Ring size is set to 128
> 
> The various logfiles do not appear to contain any errors or warnings
> 
> Output from riak-admin member_status
> = Membership 
> ==
> Status RingPendingNode
> ---
> valid  33.6% 25.0%'qbkp...@qbkpx01.ad.qnet.local'
> valid  33.6% 25.0%'qbkp...@qbkpx02.ad.qnet.local'
> valid  32.8% 25.0%'qbkp...@qbkpx03.ad.qnet.local'
> valid   0.0% 25.0%'t...@qbkpxadmin01.ad.qnet.local'
> ---
> 
> Output from riak-admin ring_status
> See attached file
> 
> Output from riak-admin transfers
> 't...@qbkpxadmin01.ad.qnet.local' waiting to handoff 10 partitions
> 'qbkp...@qbkpx03.ad.qnet.local' waiting to handoff 62 partitions
> 'qbkp...@qbkpx01.ad.qnet.local' waiting to handoff 63 partitions
> 
> 
> /F
> 
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Pending transfers when joining 1.0.3 node to 1.0.0 cluster

2012-01-18 Thread Aphyr
Did you try riak_core_ring_manager:force_update() and force_handoffs() on the 
old partition owner as well as the new one? Can't recall off the top of my head 
which one needs to execute that handoff.

--Kyle

On Jan 18, 2012, at 2:08 PM, Fredrik Lindström wrote:

> Thanks for the response Aphyr.
> 
> I'm seeing Waiting on: [riak_search_vnode,riak_kv_vnode,riak_pipe_vnode] 
> instead of [] so I'm thinking it's a different scenario.
> It might be worth mentioning that the data directory on the new node does 
> contain relevant subdirectories but the disk footprint is so small I doubt 
> any data has been transferred.
> 
> /F
> From: Aphyr [ap...@aphyr.com]
> Sent: Wednesday, January 18, 2012 10:46 PM
> To: Fredrik Lindström
> Cc: riak-users@lists.basho.com
> Subject: Re: Pending transfers when joining 1.0.3 node to 1.0.0 cluster
> 
> https://github.com/basho/riak/blob/riak-1.0.2/RELEASE-NOTES.org
> 
> If partition transfer is blocked awaiting [] (as opposed to [kv_vnode] or 
> whatever), There's a snippet in there that might be helpful.
> 
> --Kyle
> 
> On Jan 18, 2012, at 1:43 PM, Fredrik Lindström wrote:
> 
>> After some digging I found a suggestion from Joseph Blomstedt in an earlier 
>> mail thread 
>> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-January/007116.html
>> 
>> in the riak console:
>> riak_core_ring_manager:force_update().
>> riak_core_vnode_manager:force_handoffs().
>> 
>> The symptoms would appear to be the same although the cluster referenced in 
>> the mail thread does not appear to have search enabled,
>> as far as I can tell from the log snippets. The mail thread doesn't really 
>> specify which node to run the commands on so I tried both the new node and 
>> the current claimant of the cluster.
>> 
>> Sadly the suggested steps did not produce any kind of ownership handoff.
>> 
>> Any helpful ideas would be much appreciated :)
>> 
>> /F
>> 
>> 
>> From: riak-users-boun...@lists.basho.com 
>> [riak-users-boun...@lists.basho.com] on behalf of Fredrik Lindström 
>> [fredrik.lindst...@qbranch.se]
>> Sent: Wednesday, January 18, 2012 4:00 PM
>> To: riak-users@lists.basho.com
>> Subject: Pending transfers when joining 1.0.3 node to 1.0.0 cluster
>> 
>> Hi everyone,
>> when we try to join a 1.0.3 node to an existing 1.0.0 (3 node) cluster the 
>> ownership transfer doesn't appear to take place. I'm guessing that we're 
>> making some stupid little mistake but we can't figure it out at the moment. 
>> Anyone run into something similar?
>> 
>> Riak Search is enabled on the original nodes in the cluster as well as the 
>> new node.
>> Ring size is set to 128
>> 
>> The various logfiles do not appear to contain any errors or warnings
>> 
>> Output from riak-admin member_status
>> ========================== Membership ==========================
>> Status     Ring     Pending    Node
>> ----------------------------------------------------------------
>> valid      33.6%    25.0%      'qbkp...@qbkpx01.ad.qnet.local'
>> valid      33.6%    25.0%      'qbkp...@qbkpx02.ad.qnet.local'
>> valid      32.8%    25.0%      'qbkp...@qbkpx03.ad.qnet.local'
>> valid       0.0%    25.0%      't...@qbkpxadmin01.ad.qnet.local'
>> ----------------------------------------------------------------
>> 
>> Output from riak-admin ring_status
>> See attached file
>> 
>> Output from riak-admin transfers
>> 't...@qbkpxadmin01.ad.qnet.local' waiting to handoff 10 partitions
>> 'qbkp...@qbkpx03.ad.qnet.local' waiting to handoff 62 partitions
>> 'qbkp...@qbkpx01.ad.qnet.local' waiting to handoff 63 partitions
>> 
>> 
>> /F
>> 
>> 
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 
> 

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Pending transfers when joining 1.0.3 node to 1.0.0 cluster

2012-01-18 Thread Aphyr
Hmm. I can tell you that *typically* we see riak-admin transfers show 
many partitions awaiting transfer. If you run the transfers command it 
resets the timer for transfers to complete, so don't do it too often. 
The total number of partitions awaiting transfer should slowly decrease.


When zero partitions are waiting to hand off, then you may see 
riak-admin ring_status waiting to finish ownership changes. Sometimes it 
gets stuck on [riak_kv_vnode], in which case force-handoffs seems to do 
the trick. Then it can *also* get stuck on [], and then the long snippet 
I linked to does the trick.


So: give it 15 minutes, and check to see if fewer partitions are 
awaiting transfer. If you're eager, you can watch the logs for handoff 
messages or iptraf that sucker to see the handoff network traffic 
directly; it runs on a distinct port IIRC so it's easy to track.
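Something like this, concretely (a sketch; 8099 is the stock riak_core
handoff_port in app.config, and the interface name is hypothetical):

# count partitions still awaiting transfer -- sparingly, since it resets the timer
riak-admin transfers

# watch the handoff traffic on the wire
iptraf -i eth0
# or, without iptraf:
tcpdump -i eth0 port 8099

# or grep the logs for handoff activity
grep -i handoff /var/log/riak/console.log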


--Kyle

On 01/18/2012 02:40 PM, Fredrik Lindström wrote:

I just ran the two commands on all 4 nodes.

When run on one of the original nodes, the first
command (riak_core_ring_manager:force_update()) results in output like the
following in the console of the new node:

23:20:06.928 [info] loading merge_index
'./data/merge_index/331121464707782692405522344912282871640797216768'
23:20:06.929 [info] opened buffer
'./data/merge_index/331121464707782692405522344912282871640797216768/buffer.1'
23:20:06.929 [info] finished loading merge_index
'./data/merge_index/331121464707782692405522344912282871640797216768'
with rollover size 912261.12
23:20:07.006 [info] loading merge_index
'./data/merge_index/730750818665451459101842416358141509827966271488'
23:20:07.036 [info] opened buffer
'./data/merge_index/730750818665451459101842416358141509827966271488/buffer.1'
23:20:07.036 [info] finished loading merge_index
'./data/merge_index/730750818665451459101842416358141509827966271488'
with rollover size 1132462.08
23:20:47.050 [info] loading merge_index
'./data/merge_index/513809169374145557180982949001818249097788784640'
23:20:47.054 [info] opened buffer
'./data/merge_index/513809169374145557180982949001818249097788784640/buffer.1'
23:20:47.055 [info] finished loading merge_index
'./data/merge_index/513809169374145557180982949001818249097788784640'
with rollover size 975175.67


riak_core_vnode_manager:force_handoffs() does not produce any output on
any console on any node besides "OK". No tasty handover log messages to
be found.

Furthermore I'm not sure what to make of the output from riak-admin
transfers:
't...@qbkpxadmin01.ad.qnet.local' waiting to handoff 62 partitions
'qbkp...@qbkpx03.ad.qnet.local' waiting to handoff 42 partitions
'qbkp...@qbkpx01.ad.qnet.local' waiting to handoff 42 partitions

Our second node (qbkpx02) is missing from that list. The output also
states that the new node (test) wants to handoff 62 partitions although
it is the owner of 0 partitions.

riak-admin ring_status lists various pending ownership handoffs, all of
them are between our 3 original nodes. The new node is not mentioned
anywhere.

I'm really curious about the current state of our cluster. It does look
rather exciting :)

/F

*From:* Aphyr [ap...@aphyr.com]
*Sent:* Wednesday, January 18, 2012 11:15 PM
*To:* Fredrik Lindström
*Cc:* riak-users@lists.basho.com
*Subject:* Re: Pending transfers when joining 1.0.3 node to 1.0.0 cluster

Did you try riak_core_ring_manager:force_update() and force_handoffs()
on the old partition owner as well as the new one? Can't recall off the
top of my head which one needs to execute that handoff.

--Kyle

On Jan 18, 2012, at 2:08 PM, Fredrik Lindström wrote:


Thanks for the response Aphyr.

I'm seeing Waiting on:
[riak_search_vnode,riak_kv_vnode,riak_pipe_vnode] instead of [] so I'm
thinking it's a different scenario.
It might be worth mentioning that the data directory on the new node
does contain relevant subdirectories but the disk footprint is so
small I doubt any data has been transferred.

/F

*From:* Aphyr [ap...@aphyr.com]
*Sent:* Wednesday, January 18, 2012 10:46 PM
*To:* Fredrik Lindström
*Cc:* riak-users@lists.basho.com
*Subject:* Re: Pending transfers when joining 1.0.3 node to 1.0.0 cluster

https://github.com/basho/riak/blob/riak-1.0.2/RELEASE-NOTES.org

If partition transfer is blocked awaiting [] (as opposed to [kv_vnode]
or whatever), there's a snippet in there that might be helpful.

--Kyle

On Jan 18, 2012, at 1:43 PM, Fredrik Lindström wrote:


After some digging I found a suggestion from Joseph Blomstedt in an
earlier mail thread
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-January/007116.html

Re: Riak for eCommerce

2012-01-21 Thread Aphyr
Side question: Dynamo exposes both partially and fully consistent reads. Does 
anyone know what the conflict semantics are? Last write wins? Actual MVCC?

Ahmed Al-Saadi  wrote:

>I suppose this speaks to DynamoDB's consistent read feature that Vishal 
>pointed out (though I believe statebox is more general). Thanks to both of you.
>
>Your link helped me find the following insight from Bob Ippolito's blog:
>"[for an eventually consistent data store,] you have to move your conflict 
>resolution from writes to reads."
>http://bob.pythonmac.org/archives/2011/03/17/statebox/
>
>--  
>Ahmed Al-Saadi
>Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
>
>
>On Friday, January 20, 2012 at 9:03 PM, Zheng Zhibin wrote:
>
>>  
>>  
>> Best regards,
>> Zheng Zhibin
>>  
>> On 2012-1-21, at 1:01 AM, Dmitry Demeshchuk <demeshc...@gmail.com> wrote:
>>  
>> > Generally, using eventually consistent databases for e-commerce sounds
>> > too risky.
>> >  
>> > But I know that there was some e-commerce stealth startup using Riak
>> > for their needs (probably not for all the data though, I don't know
>> > any details). Amazon uses Dynamo, which is quite similar to Riak. So
>> > NoSQL can be used in this niche somehow.
>> >  
>> > Also, intuitively, most of the consistency problems might be avoided
>> > by setting all w/dw values to maximum (better set r to maximum too, of
>> > course).
>> >  
>> > The hardest task I see here is to organize transactions for the cases
>> > like "we remove a product from database and we should alter all the
>> > orders that contain this product".
>> >  
>>  
>> There is a lib which can help with this to some extent:
>> http://github.com/mochi/statebox
>>  
>> There is a Riak version as well: http://github.com/mochi/statebox_riak
>>  
>> Writes stay cheap; the conflict-resolving ops happen on read.
>> >  
>> > On Fri, Jan 20, 2012 at 5:43 PM, Ahmed Al-Saadi <thaterlang...@gmail.com> wrote:
>> > > Hello:
>> > >  
>> > > After reviewing a few options in the NoSQL space, I am considering using
>> > > Riak for an e-commerce platform. I gather that atomicity (transactions) is
>> > > not supported while durability can be enforced per request (using dw=1 or,
>> > > at least, w=?). In other words, for most non-critical reads/writes, r/w can
>> > > be optimized for availability while critical writes must be committed to
>> > > disk (or to "enough" nodes?), sacrificing availability in the process.
>> > >  
>> > > Does this describe the state-of-affairs or am I missing something?
>> > >  
>> > > --
>> > > Ahmed Al-Saadi
>> > > Sent with Sparrow
>> > >  
>> > >  
>> > > ___
>> > > riak-users mailing list
>> > > riak-users@lists.basho.com
>> > > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> > >  
>> >  
>> >  
>> >  
>> >  
>> > --  
>> > Best regards,
>> > Dmitry Demeshchuk
>> >  
>> > ___
>> > riak-users mailing list
>> > riak-users@lists.basho.com
>> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> >  
>>  
>>  
>>  
>
>
>
>___
>riak-users mailing list
>riak-users@lists.basho.com
>http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Is Riak a good solution for this problem?

2012-02-12 Thread Aphyr

On 02/12/2012 03:27 AM, Marco Monteiro wrote:

I'm considering Riak for the statistics of a site that is approaching
a billion page views per month. The plan is to log a little
information about each page view and then to query that data.


Honestly, I wouldn't use stock Riak for this; the MR times will become 
prohibitive over billions of records. However, jrecursive has been 
solving a very similar set of problems at Showyou, and wrote Mecha, a 
solr/leveldb backend for Riak. Mecha provides fast, distributed querying 
on top of Riak's redundancy/distribution. There's a talk at the next 
Riak Meetup on the 23rd, and we should be releasing it then as well.


http://www.meetup.com/San-Francisco-Riak-Meetup/events/51287272/

--Kyle

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: A couple of questions about Riak

2012-02-16 Thread Aphyr



On 02/16/2012 01:07 AM, Jerome Renard wrote:

Hello,

I am really interested in Riak, but I would like to know if my goals
can be achieved for my project.

The use case is the following :

- I need to support 10,000 writes/second minimum. Object sizes will be
from 1 KB to 5 KB


Definitely. 10 SSDs should do it.


- I need to organize data in buckets. Bucket 'dc1' would store all my
data from data center one, bucket 'dc2' would store all my data from
data center two, etc. The size of a bucket is going to grow large
really quickly; (Would links be a more relevant alternative?)


Yes; buckets are just prefixed namespaces for keys. You can have vast 
numbers of keys in a bucket. Links are something totally different.
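For example (a sketch against the stock HTTP interface; the host, buckets, 
and key are made up):

curl -X PUT -H 'Content-Type: text/plain' -d 'from dc1' \
  http://127.0.0.1:8098/riak/dc1/event-001
curl -X PUT -H 'Content-Type: text/plain' -d 'from dc2' \
  http://127.0.0.1:8098/riak/dc2/event-001

Same key name, different buckets: two entirely independent objects.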



- I need to search on these data : full-text search + facets. Searches
will most likely be date based range queries;


Riak Search and secondary indices may meet your needs, but read the docs 
carefully. You may also be interested in

http://www.meetup.com/San-Francisco-Riak-Meetup/events/51287272/


- Those data are meant to expire after a certain period of time, so I
will have to run large delete operations every week/month/year.


Yup, readily doable. Just be aware that listing keys in stock Riak can 
be expensive. Search or 2I may reduce that load.
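As a sketch of the 2I route, assuming you tag each write with a hypothetical 
created_at_int secondary index (endpoint per the 1.0-era HTTP API):

# find the keys for January 2012...
curl http://127.0.0.1:8098/buckets/dc1/index/created_at_int/20120101/20120131
# ...then issue a DELETE per returned key, rather than walking the whole bucket.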



- I can accept a replication factor of 2 if needed, instead of the default 3;


Yup. Just set n_val in the bucket properties once and you're good.
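For example (a sketch; the bucket name is made up):

curl -X PUT -H 'Content-Type: application/json' \
  -d '{"props":{"n_val":2}}' \
  http://127.0.0.1:8098/riak/dc1

Best done before you write any data; changing n_val on a bucket that already 
holds objects is generally something to avoid.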


- I need to get my disk space back when I remove data. This may sound
odd to some users, but people who come from a MySQL/InnoDB
background know that removing large amounts of data does not mean you
will get your disk space back ;)


Bitcask and leveldb both reclaim space. Bitcask is log-structured so 
you'll see some (configurable) amount of dead space--in my 
continuous-compaction environment, roughly 15-30% dead bytes on top of 
the "real" dataset.



Based on the use case described below my questions are:

- I am sure Riak can achieve the required write speed. But are there
any hardware recommendations for storing terabytes of quickly growing data?


It sounds like your data might be largely immutable and write heavy. In 
that case I would try a small number of partitions per host, bitcask, 
huge amounts of spinning disk, and lots of RAM cache on top of that. 
Those writes will translate into big stripey writes on top of the disk. 
If you can afford SSDs that's the obvious option.


Be aware that bitcask does not support compression--but if your 
filesystem performs block-level compression then you should see 
excellent savings. The riak robject structure on disk is quite fluffy 
and compresses well. We see ~30% on snappy block-level compression in 
leveldb.



- Which storage backend would be the most relevant for me? Bitcask or LevelDB?


Leveldb supports secondary indexes, but I don't understand its 
performance characteristics across a variety of read/write loads. Maybe 
someone else can chime in?



- Does riak-search support faceted search? (I would say yes but I
found no documentation in the wiki about that)



- Will it be a problem if I decide to run Riak on ZFS with compression enabled?


I suspect it would work quite well. If you try it, please report back!


If you need any more details feel free to ask.

Thanks in advance for your feedback.

Best Regards,

--
Jérôme Renard
http://39web.fr | http://jrenard.info | http://twitter.com/jeromerenard

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: 1.1 upgrade woes

2012-02-22 Thread Aphyr
I also discovered MR issues during a rolling upgrade to 1.1.0 last 
night. We had so many MR errors that the 1.1 node crashed altogether, 
and I had to roll it back to 1.0.3. Basho support is working on that 
problem.


2012-02-22 00:56:16.429 [error] <0.1615.0> gen_server 
riak_pipe_vnode_master terminated with reason: no function clause 
matching 
riak_core_vnode_master:handle_call({return_vnode,{riak_vnode_req_v1,479555224749202520035584085735030365824602865664,{raw,#Ref<6584.0.7263.108930>,...},...}}, 
{<6604.7292.421>,#Ref<6604.0.6697.250987>}, 
{state,undefined,undefined,riak_pipe_vnode,undefined})


2012-02-22 00:56:16.431 [error] <0.1615.0> CRASH REPORT Process 
riak_pipe_vnode_master with 0 neighbours crashed with reason: no 
function clause matching 
riak_core_vnode_master:handle_call({return_vnode,{riak_vnode_req_v1,479555224749202520035584085735030365824602865664,{raw,#Ref<6584.0.7263.108930>,...},...}}, 
{<6604.7292.421>,#Ref<6604.0.6697.250987>}, 
{state,undefined,undefined,riak_pipe_vnode,undefined})


Anything like these messages in your logs?

--Kyle

On 02/22/2012 02:35 PM, heffergm wrote:

After upgrading to 1.1, we've had all kinds of MR issues (essentially all
failing). I'm seeing tons of these in the logs. Any ideas? We've rolled back
in the meantime:

2012-02-22 03:12:59.664 [error]<0.354.0>@riak_pipe_vnode:new_worker:763
Pipe worker startup failed:fitting was gone before startup

--
View this message in context: 
http://riak-users.197444.n3.nabble.com/1-1-upgrade-woes-tp3768092p3768092.html
Sent from the Riak Users mailing list archive at Nabble.com.

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak for Messaging Project Question

2012-02-22 Thread Aphyr

On 02/22/2012 02:10 PM, char...@contentomni.com wrote:

1. Is Riak a good fit for this solution going up to and beyond 20
million users (i.e. terabytes upon terabytes added per year)?


The better question might be: what do you actually plan to do with that 
much data?



2. I plan to use 2i, which means I would be using the LevelDB backend.
Will this be reasonably performant for billions of keys added each year?

3. I'm using what I have here
(http://wiki.basho.com/Cluster-Capacity-Planning.html) as my guide for
capacity planning. I plan on using Rackspace Cloud Servers for this
specific project. Can I just keep adding servers as the size of my data
grows?!


Riak clusters have a functional upper limit of around a hundred nodes; 
inter-node traffic dominates at that level. That said, at that scale 
it's gonna be WAY cheaper to run your own HW. If I were speccing a 
cluster for 16tb of data/year:


Data warehouse (huge dark pool, variable latency tolerable): six, say, 
Sun Thumpers running ZFS on BSD, 48 TB maximum capacity per box, ~60 TB 
of usable storage in Riak at N=4, assuming ZFS parity as well. Start 
small, add drives progressively. 24 rack units total.


Hot cluster (small dark pool, IO latencies critical): Six nodes with 1x 
10.24 TB FusionIO Octals apiece, 15TB usable immediately, add additional 
FIO cards to each node as you grow.


The answer isn't scale out or scale up. You can scale *diagonally* and 
get the benefits of both.


As you grow, rotate in new nodes with bigger hard drives, more memory, 
more processors, more bandwidth. Drive upgrades are cheap: just shut 
down the box, install the new HW, and bring it back again. Riak is 
*good* at this; the other nodes will bring the original box up to speed 
when it's back. We've rotated in new drives on our six node cluster 3 
times now, and are about to do it again.


When virtualized HW becomes the bottleneck (and I guarantee that is much 
sooner than you think), spread out onto physical nodes. When commodity 
spinning disks are too slow (will also happen sooner than you think), 
rotate in SSDs. Then exotic solid-state HW. At every stage you can add 
more nodes with the same class of HW, but there will come an equilibrium 
point when bigger is cheaper than more.



4. From the guide mentioned in 3 above, it appears I will need about 400
[4 GB RAM, 160 GB HDD] servers for 20 million users (assuming an n_val of 4).
This means I would need to add 20 servers annually for each million
active users I add. Is it plausible to have an n_val of 4 for this many
servers?! Wouldn't going higher just mean I'd have to add many more
servers needlessly?!


You can choose whatever n_val you like, up to the number of servers you 
have. Data volume scales linearly with n_val, so it takes 33% more space 
for n_val 4 over n_val 3.



5. Should I put all my keys in one bucket (considering I'm using 2i,
does it matter)?!


It doesn't really matter. Buckets are just a part of the key: riak keys 
are actually [bucket, key]. Use them for namespacing.


--Kyle

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Questions on configuring public and private ips for riak on ubuntu

2012-03-04 Thread Aphyr

ssh -NL 8098:localhost:8098 your.vps.com

--Kyle

On 03/04/2012 09:55 PM, Tim Robinson wrote:

Yeah, I read your blog post when it first came out. I liked it.

I appreciate the warning, but practically speaking I'm really just not worried 
about it. It's a test environment on an external VPS that no one knows the info 
for. Demo to the company means show image/content-type load, JSON via browser 
with proper indentation, and Riak Control. SSH isn't going to do that for me.

I'm using public data for the testing. I can blow the whole thing away any time.

Aside from warnings, does anyone want to help with the question?

Thanks,
Tim


-Original Message-
From: "Aphyr"
Sent: Sunday, March 4, 2012 10:41pm
To: "Tim Robinson"
Subject: Re: Questions on configuring public and private ips for riak on ubuntu

I can get SSH access over Riak's HTTP and protobufs interfaces in about
five seconds, and can root a box shortly after that, depending on
kernel. Please don't do it. Just don't.

http://aphyr.com/posts/224-do-not-expose-riak-to-the-internet
http://aphyr.com/posts/218-systems-security-a-primer

--Kyle

On 03/04/2012 09:38 PM, Tim Robinson wrote:

Right now I am just loading data for test purposes. It's nice to be able to do 
some benchmarks against the private network (which is @1Gbit/s)... while being 
able to poke a hole in the firewall when I want to do a test/demo.

Tim

-Original Message-
From: "Alexander Sicular"
Sent: Sunday, March 4, 2012 9:15pm
To: "Tim Robinson"
Cc: "riak-users@lists.basho.com"
Subject: Re: Questions on configuring public and private ips for riak on ubuntu

this is a "Very Bad" idea. do not expose your riak instance over a public ip 
address. riak has no internal security mechanism to keep people from doing very bad 
things to your data, configuration, etc.

-Alexander Sicular

@siculars

On Mar 5, 2012, at 12:43 AM, Tim Robinson wrote:


Hello all,

I have a few questions on networking configs for riak.

I have both a public IP and a private IP for each Riak node. I want Riak to 
communicate over the private IP addresses to take advantage of free bandwidth, 
but I would also like the option to interface with Riak using the public IPs 
if need be (i.e. for testing/demos, etc.).

I'm gathering that the way people do this is by setting up app.config to use IP 
"0.0.0.0" to listen on all IPs. I'm also gathering vm.args needs to have a 
unique name in the cluster so I would need to use the hostname for the -name option (i.e. 
r...@www.fake-node-domain-name-1.com).

My hosts file would contain:

127.0.0.1    localhost.localdomain              localhost
x.x.x.x      www.fake-node-domain-name-1.com    mynode-1


where x.x.x.x is the public IP, not the private one.

This is where I start to get lost.

As it sits, if I attempt to join using the private IPs I will get the 
unreachable error - yet I can connect via telnet to/from the equivalent nodes.

So I could add a second IP to the hosts file, but since I need to keep the 
public one as well, how is it that Riak is going to use the private IPs for 
gossip ring, hinted handoff, etc.?

There's obviously some networking basics I am missing.

Any guidance from those of you who have done this?

Thanks.
Tim





___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




Tim Robinson



Tim Robinson



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




Tim Robinson



Tim Robinson



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Need quick fix for > lineno":466, "message":"SyntaxError: syntax error", "source":"()

2012-03-06 Thread Aphyr

On 03/06/2012 08:11 AM, Ivaylo Panitchkov wrote:


Hello guys,

We are in production and noticed ALL of the M/R requests failing right
after a bulk delete, with the following response returned:

lineno":466,"message":"SyntaxError: syntax error","source":"()

The problem is now persistent even though the delete operation was done
a while ago.
I googled to find a solution and realized this is a bug not fixed yet.
We have a cluster of 4 machines with riak (1.0.2 2011-11-17) Debian
x86_64 installed.
I need a quick fix ASAP if someone could help me out.


For us, that means you need to check 
robject.values[i].metadata['X-Riak-Deleted'] before trying to parse the 
object as JSON.
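For instance, a map function along these lines (a sketch of the guard only, 
not your whole job; the field layout follows the JS MapReduce object 
structure referenced above):

function(v, keyData, arg) {
  var out = [];
  for (var i = 0; i < v.values.length; i++) {
    // skip tombstones left behind by deletes
    if (v.values[i].metadata['X-Riak-Deleted']) continue;
    out.push(JSON.parse(v.values[i].data));
  }
  return out;
}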


--Kyle

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak problems on Ubuntu (novice user)

2012-03-06 Thread Aphyr
First, for security reasons, don't run Riak on a public IP. Access it 
through an application proxy, a VPN, or an SSH tunnel if you need.


Second, when you change the name of a node, you need to run

riak-admin reip r...@my.old.ip r...@my.new.ip

... to update the ring file with the new node name. Only change the name 
and reip when the node is not running. You can also just delete 
/var/lib/riak/ring if you don't care about the data on the node.
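A minimal sketch of that sequence (both addresses are hypothetical):

riak stop
# edit -name in vm.args to riak@10.0.0.4 first
riak-admin reip riak@1.2.3.4 riak@10.0.0.4
riak start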


--Kyle

On 03/06/2012 02:02 PM, Curtis Taylor wrote:

All,

Ubuntu versions 10.10 and 11.10.

I'm new to Riak, but I have two machines with public IP addresses. I
started to install Riak from source, but work blocks the ports Git needs,
so I installed via the Debian package:
wget http://downloads.basho.com/riak/riak-1.1.0/riak_1.1.0-1_i386.deb

After installing on both machines I was able to do:
riak start
riak-admin status
riak stop

with no issues. I wanted to cluster these two machines together
so I found I need to edit vm.args and app.config. In vm.args I changed
-name riak@127.0.0.1 
to
-name riak@1.2.3.4  (my public IP)

and in app.config I changed under riak_core
{http, [ {"127.0.0.1", 8098 } ]},
to
{http, [ {"0.0.0.0", 8098 } ]},

and under riak_kv
{pb_ip, "127.0.0.1" },
to
{pb_ip, "0.0.0.0" },

I then try:
riak start
Riak failed to start within 15 seconds,
see the output of 'riak console' for more information.
If you want to wait longer, set the environment variable
WAIT_FOR_ERLANG to the number of seconds to wait.

I've adjusted WAIT_FOR_ERLANG, but to no avail. Now, an interesting
point is that if I stop riak and change everything back to default, riak
will start successfully but riak-admin tools fail (saying node not
running). I have no idea where I'm going wrong. I'm not completely sure
on the riak cookie but it's the same on both machines. I've experienced
the same behavior on 4 machines. Two on a private network, two on public
network.

Any help is appreciated.
Thanks






___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Listing buckets causes nodes to misbehave

2012-03-15 Thread Aphyr
Yup, list-keys and list-buckets do this for us too, since Riak 0.14. 
Bitcask, 6 nodes, physical hardware, 1024 partitions, 100-300 million 
keys with n_val 3.


--Kyle

On 03/15/2012 11:17 AM, Armon Dadgar wrote:

We are currently running Riak 1.1 in production, using LevelDB
with snappy compression and yesterday we ran into a strange
issue where listing the buckets via:

$ curl "http://east-riak-001:8098/riak?buckets=true"

Doing this caused curl to hang indefinitely, and the target node to freak
out. Symptoms included a massive increase in system load (2-3x), FSM timings
exploding from ~30-50 msec to 500-5000 msec, and most of the APIs becoming
totally unresponsive.

We ran this on a few nodes on the cluster, which basically brought
everything to a halt. Our solution was to just reboot all the nodes, at
which point things returned to normal.

Curious if anybody else has experienced this.

Best Regards,

Armon Dadgar



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Adoption - What can we do better?

2012-04-21 Thread Aphyr

On 04/21/2012 09:07 AM, Les Mikesell wrote:

On Fri, Apr 20, 2012 at 5:00 PM, Kyle Kingsbury  wrote:



OK, so how about Statebox? We use timestamps to ameliorate the GC problem
within a given time window. Our hosts are running NTP so it's all cool, ya?
Wrong. One of your hosts is not running NTP. Clock desync issues are fucking
*ubiquitous*, sadly, and you have to be willing to accept, say, losing all
conflicting writes from a client under some clock skew circumstances.


How hard is it for a cluster-aware application to tell that the clocks
are out of sync?   You probably can't do better than NTP at fixing it,
but why even continue to run in that state?   If all it takes is a
good clock for reliability, let's build good clocks.


You are a network packet. It is very dark. You are likely to be eaten by 
a partition.


Joking aside, many applications do try to do broken clock detection, but 
correcting the error automatically depends on application semantics. On 
top of that, it can be impossible to detect in partitioned situations. 
There are also cases where you *want* to be available in cases where 
time sync is impossible; consider, for example, mobile clients which may 
make requests long after the user interaction. A *logical* consistency 
is paramount... but I digress, haha.


It is *OK* to accept the clock issue sometimes, especially with the 
understanding that it provides probabilistic constraints on conflict 
resolution. Being right 80% of the time can be better than 50% of the 
time. But you *have* to be willing to understand and accept the risks; 
it's something most people have no idea exists.


That's what I think Riak adoption is missing; a huge portion of devs 
just don't understand the implications of Riak's HA approach: logical 
clocks, causality bounds, and synchronization boundaries. I'm hoping to 
change some of that with Meangirls, but I'm not convinced it's enough. 
Reid Draper told me a little while ago that he wanted devs to extend 
CRDTs and logical clocks all the way into their APIs and mobile clients, 
and I think he might be right.


--Kyle

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com