On 06/26/2012 07:24 AM, Eric Anderson wrote:
Hey all,
Question about EC2 (or scale in general): I'm building a decent cluster,
to handle 15-20k inserts/s and 5-10k gets per second. (for a rough idea
of what I'm doing). I've been playing with a 15-node cluster of
m2.xlarge systems, but I am wonde
TLDR: hey, what about using extendible hashing for bitcask keydirs?
Constant-time lookups with two disk seeks end-to-end, much larger
keyspaces than currently supportable, but without the total rehashing
cost. Also avoids the O(log N) insertion/search/deletion costs of b-trees.
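
For readers unfamiliar with the technique, here is a minimal in-memory sketch of extendible hashing in Python. It is not bitcask's keydir code and does no disk I/O; the class names, the bucket capacity, and the use of Python's built-in hash() are all assumptions made for illustration. What it tries to show is that a full bucket is split on its own, so the table grows without rehashing every key.

```python
class Bucket:
    """A fixed-capacity bucket; `depth` is the number of hash bits it owns."""

    def __init__(self, depth, capacity):
        self.depth = depth
        self.capacity = capacity
        self.items = {}


class ExtendibleHash:
    """Illustrative extendible hash table (hypothetical, in-memory only)."""

    def __init__(self, bucket_capacity=4):
        self.capacity = bucket_capacity
        self.global_depth = 1
        # The directory has 2**global_depth slots, indexed by low hash bits.
        self.directory = [Bucket(1, bucket_capacity), Bucket(1, bucket_capacity)]

    def _index(self, key):
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key, default=None):
        return self.directory[self._index(key)].items.get(key, default)

    def put(self, key, value):
        # Note: this loops forever if more than `capacity` distinct keys
        # share one full hash value; a real table needs overflow handling.
        while True:
            bucket = self.directory[self._index(key)]
            if key in bucket.items or len(bucket.items) < bucket.capacity:
                bucket.items[key] = value
                return
            self._split(bucket)

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            # Double the directory; both halves point at the same buckets.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.depth += 1
        sibling = Bucket(bucket.depth, self.capacity)
        high_bit = 1 << (bucket.depth - 1)
        # Slots with the newly significant bit set now point at the sibling.
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = sibling
        # Only this one bucket's keys are redistributed.
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            target = sibling if hash(k) & high_bit else bucket
            target.items[k] = v
```

A lookup touches one directory slot and one bucket, which is where the "two seeks" intuition comes from if the directory and the buckets each live on disk.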
At length:
I'v
I'm trying to track some basic metrics so we can plan for cluster
capacity, monitor transfers, etc. Figured this might be of interest to
other riak admins. Apologies if my erlang is nonidiomatic, I'm still
learning. :)
#!/usr/bin/env escript
%%! -name riakstatuscheck -setcookie riak
main([])
Newline parsing is broken in JSON2.js shipped with Riak. Drop a more
recent version of JSON2.js in the directory referred to by js_source_dir
in app.config's riak_kv section, e.g.
{js_source_dir, "/etc/riak/js/"}
and reload the node.
--Kyle
On 03/23/2011 01:31 PM, Michael Ossareh wrote:
Gre
Yes, it's because the JSON2.js included with Riak has a bug around
newlines. Dropping an updated JSON2.js in your js_source_dir will fix it.
--Kyle
On 04/15/2011 12:20 AM, Matt Ranney wrote:
I'm using Riak Search 0.14.0-1, and it seems like JSON docs with
otherwise legal \r characters in them
I actually had a question about that page. Why is it that when there
is a conflict we can only get the conflicting versions of the data?
If I'm going to try to resolve the conflict intelligently, I really
want the common ancestor as well so that I can try to do a 3-way
merge.
Good call. If an a
I'd like to chime in here by noting that it would be incredibly nice if
the client could distinguish between a record that is missing because
the vnode is unavailable, and a record that truly does not exist. My
consistency-repair system was running during partition handoff,
determined that seve
Any system which presents plaintext is vulnerable; it is simply a matter
of complexity. Once you've compromised a layer which processes
plaintext, all layers below it are essentially moot, as the Playstation
network recently discovered.
The only scheme which will defend against data compromise
HA! I just ran into this limit!
5,000 links like /tablet_users/whoever riaktag=following will cause your
user objects to take upwards of 4 seconds to return, on our beefy
cluster, with a variance of ~3 seconds. (Over HTTP, ruby riak_client.) I
had to move them into the JSON body, which makes t
The key filter still has to walk the entire keyspace, which will make
fetches an O(n) operation as opposed to O(1).
--Kyle
On 05/05/2011 03:35 PM, Andrew Berman wrote:
I was curious if anyone has any thoughts on what is more performant,
links or key filters in terms of secondary links. For ex
I suppose if you had a really small number of keys in Riak it might be
faster, but you're almost certainly better off maintaining a second
object and making the lookup constant time. Here's an example:
https://github.com/aphyr/risky/blob/master/lib/risky/indexes.rb
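
As a rough sketch of the second-object approach (not Risky's actual code), the Python below keeps an index object next to each primary object, so a lookup by a secondary attribute is two key fetches rather than a walk over the keyspace. A plain dict stands in for Riak, and the bucket names and the email attribute are hypothetical.

```python
# Stand-in for Riak: maps (bucket, key) -> value. Purely illustrative.
store = {}


def put_user(user_id, email, name):
    """Write the primary object, then the index object that points at it."""
    store[("users", user_id)] = {"email": email, "name": name}
    store[("users_by_email", email)] = user_id


def find_user_by_email(email):
    """Two constant-time fetches: index object first, then the user."""
    user_id = store.get(("users_by_email", email))
    if user_id is None:
        return None
    return store.get(("users", user_id))
```

The two writes are not atomic, so an application using this pattern has to tolerate or repair an index entry that points at a missing object.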
--Kyle
On 05/05/
Since buckets are essentially key prefixes, I think buckets will
probably not make this faster. Maybe one of the riak-search experts
knows why your search is taking so long.
--Kyle
On 05/11/2011 12:00 PM, alexeypro wrote:
Generally the problem there is that I may end up with N buckets, where N i
In the exciting event that your application or riak goes rogue and
deletes everything, bitcask will allow you to recover amazing,
life-saving amounts of data from its log-structured format.
ASK ME HOW I KNOW. :-P
Uh, more typically, I've heard that FS-level snapshots of /var/lib/riak
or simpl
I was writing a new mapreduce query to look at users over time, and ran
it over a single user in production. After that, other mapreduce jobs
over users started returning results from my new map phase, some of the
time. After five minutes of this, I had to restart every node in the
cluster to g
g
vclocks.
https://github.com/aphyr/bitcask-ruby
--Kyle
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Agreed. In fact, jrecursive pointed out to me last week that vnode
operations are synchronous. That means that when you call list-keys, not
only is it going to take a long time (right now upwards of 5 minutes) to
complete, but while each vnode is returning its list of keys *it blocks
any other
In software products that have containment metaphors, how often do we
see a function return a cached value rather than the up-to-date
value, especially for products that manage shared data?
Pretty frequently, actually. Every Ruby ORM I've used caches
associations by default. Even when listing is
On 06/02/2011 01:49 PM, Sylvain Niles wrote:
Is there any open source code out there using erlang functions via
ripple or rest that I can look at to see a fully functional flow?
I made a lot of stupid mistakes getting this to work. Leaving add_paths
commented out, not using arrays in arguments
Riak can't use the vclock for conflict resolution on a fresh object,
i.e. one without a vclock. Deletes are writes. You should use get or
reload before writing to help Riak sequence your writes correctly.
On top of this, Riak has some weirdness around very quick sequences of
deletes/writes due
Bitcask-ruby now implements the keydir and knows how to use hintfiles.
It's now capable of loading 62,000 keys (from a 535mb bitcask) in 1.5
seconds. We're using this at Showyou to list keys and run various
analytics without blocking Riak.
https://github.com/aphyr/bitcask-ru
2. You could write
x = Klass.find(key)
if x.nil?
x = Klass.new
x.save
end
get_or_new doesn't save, so perhaps Klass.find(key) || Klass.new(key)
Risky (another Ruby Riak model layer) offers Klass.get_or_new(key)
3. control the bucket on which the document is stored/retri
The mr_queue is a bitcask, so you should expect it to grow monotonically
until compaction. The file size is not an indication of the number of
pending jobs. You can read the contents using any bitcask utility. For
example, using https://github.com/aphyr/bitcask-ruby:
$ bitcask --no-riak /var
One of the vnodes on one of my hosts has a *lot* of bitcask data/hint
files, and makes a new one every 3 minutes. In the logs, I get
=ERROR REPORT 30-Jun-2011::20:24:14 ===
Failed to merge
["/var/lib/riak/bitcask/794976964837219653749465284983368790965189869568",
[],
...HUGE LIST OF DATA
class TcWeb::Root
include Ripple::Document
self.bucket_name = 'roots' # or tcweb_roots, whatever
...
end
On 07/01/2011 08:25 AM, Thomas Fee wrote:
I'm currently using Ripple with the application name prepended to the
typename in an effort to artificially create a namespace for app, to not
colli
/2011 09:28 PM, Jeff Pollard wrote:
Thanks to some help from Aphyr + Sean Cribbs on IRC, we narrowed the
issue down to us having several multiple-hundred-megabyte sized
documents and one 1.1 gig document. Deletion of those documents has now
kept the cluster running quite happily for 3+ hours now,
Have you checked that your bitcask maximum file size is small enough?
Bitcask will only merge *inactive* files, so if your active file limit
is 500MB and your active file is 320, you won't merge.
--Kyle
On 08/25/2011 06:43 AM, raghwani sohil wrote:
I have deleted all the keys from all buckets
Hello all,
Wanted to say that John Mullerleile and I will be giving a talk on
high-volume, high-availability technologies at http://showyou.com. We'll
discuss building an application with Riak, Solr, distributed queuing,
and metrics, and present some new open-source tools we've built to
tackl
Yep, that's partition 0. Partitions are spaced evenly around the hash
range, which is [0, 2^160) and cover hashes starting at their name. If
you have two partitions, they'll be {0, N/2}. three partitions: {0, N/3,
2N/3}.
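
To make the spacing concrete, here is a small Python sketch that computes the partition names for a given partition count. N above is the size of the hash range, 2^160; the function name is made up for illustration, and real ring sizes are normally powers of two.

```python
RING_SIZE = 2 ** 160  # size of the hash range [0, 2^160)


def partition_starts(num_partitions):
    """Return the starting hash (and name) of each evenly spaced partition."""
    step = RING_SIZE // num_partitions
    return [i * step for i in range(num_partitions)]


# With 4 partitions the names are 0, N/4, N/2 and 3N/4.
for start in partition_starts(4):
    print(start)
```

The first partition is always 0, which is why a directory named 0 shows up in the bitcask data directory.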
--Kyle
On 09/14/2011 10:56 AM, Jeremy Raymond wrote:
In /var/lib/riak/b
We've been over this several times on riak-users, which suggests to me a
blog post might help. I'll try to draft something.
On 09/30/2011 11:00 AM, Kyle Quest wrote:
This is a pretty common situation with the NoSQL databases. They have
no security and the standard answer is that it's your job t
On 09/30/2011 01:28 PM, Kyle Quest wrote:
Having separate nodes for reads and writes provides an opportunity for
better isolation and control even when the requests are forwarded to
different vnodes...
I humbly suggest this is a bad idea. Varying behavior between nodes
a.) is a headache to c
On 09/30/2011 02:50 PM, Kyle Quest wrote:
I'm not here to define a perfect infrastructure for securing NoSQL
databases and Riak and go into implementation details... It's not my
intention because I simply don't have time to dedicate to this big
project and it's impossible to come up with a perfec
As promised, a brief overview of designing secure applications, with a
quick rundown of how you might expose Riak to the world.
http://aphyr.com/journals/show/systems-security-a-primer
--Kyle Kingsbury
Option C: Deploy your web servers with a list of hosts to connect to.
Have the clients fail over when a riak node goes down. Lower latency
without sacrificing availability. If you're using protobufs, this may
not be as big of an issue.
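
A minimal sketch of what client-side failover could look like, in Python. The function name and the idea of passing the request as a callable are assumptions for illustration, not any Riak client's API; a real client would also want timeouts and some memory of which nodes are down.

```python
def with_failover(hosts, request):
    """Try `request(host)` against each host in order; return the first success.

    `request` is any callable that raises ConnectionError when a node is down.
    """
    last_error = None
    for host in hosts:
        try:
            return request(host)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and try the next node
    raise ConnectionError(f"all hosts failed: {hosts}") from last_error
```

Putting the local or nearest node first in the list keeps latency low in the common case, and the rest of the list only matters when that node is unavailable.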
--Kyle
On 10/04/2011 02:04 PM, O'Brien-Strain, Eamonn wro
Internode times in our datacenter at SL are indistinguishable from
loopback; TCP/IP processing dominates. HTTP, on the other hand, involves
either in-depth connection management/multiplexing, or TCP/IP
setup/teardown latency at either end of a request. In read-write heavy
apps, protobufs outper
re.
Not how the wire is managed.
(and with that said, the Python client managed the wire in the most
horrible ways imaginable for the HTTP Client; I've since fixed that on
my branch)
On Oct 4, 2011 11:37 PM, "Aphyr" <ap...@aphyr.com> wrote:
> Internode times in our data
Like it says, the request that you submitted isn't JSON. MR functions
belong in the source attribute of the JSON document, not floating
outside it.
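
To illustrate, here is a sketch in Python of assembling a request body in which the map and reduce functions are strings inside the JSON document. The bucket name and the JavaScript function bodies are placeholders, not the original poster's code.

```python
import json


def mapreduce_body(bucket, map_source, reduce_source):
    """Build a MapReduce request body; the JS functions travel as strings."""
    return json.dumps({
        "inputs": bucket,
        "query": [
            {"map": {"language": "javascript", "source": map_source}},
            {"reduce": {"language": "javascript", "source": reduce_source}},
        ],
    })


body = mapreduce_body(
    "readings",  # placeholder bucket name
    "function(v) { return [JSON.parse(v.values[0].data).number]; }",
    "function(vs) { return [Math.max.apply(null, vs)]; }",
)
print(body)
```

Because the whole body goes through a JSON encoder, quotes and newlines inside the functions are escaped for you, which avoids the "isn't JSON" error.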
--Kyle
On 10/05/2011 05:19 PM, urvi wrote:
I am trying to use this function to get the highest number from given
date. My map is working fine bu
On 10/07/2011 04:23 PM, Tim Robinson wrote:
I just read the Statebox page you linked as an example and have a hard
time thinking I would want to use this. While automation is always nice,
the overhead is an unnecessary burden. Since Clojure provides
coordinated/transactional data structures, it's
Yes; front Riak with a proxy which performs the appropriate access
control. Note that you'll have to ban (or have a javascript/erlang
interpreter to identify/contain incorrect access) mapreduce through this
proxy as well.
--Kyle
On 10/17/2011 10:39 AM, Simon Chen wrote:
Hi folks,
Is it poss
With Eric Redmond's permission*, I am releasing a proof-of-concept
exploit which uses Riak's mapreduce API to execute arbitrary Erlang code
and obtain shell access.
http://aphyr.com/journals/show/do-not-expose-riak-directly-to-the-internet
Please stop doing this.
--Kyle
* Eric (http://crudco
On 10/19/2011 04:36 PM, Nate Lawson wrote:
You can call 'os:cmd' to shell out from a M-R job. You can't do that directly
in MySQL.
No, but you can do other interesting things. Writing binaries to the
filesystem will get you quite a ways. :)
--Kyle
failsafes.**
The answer is not to ban mapreduce (or distributed code execution of any
kind). The answer is to avoid running code from people in dark alleys on
a system you care about.*** :)
--Kyle
* http://twitter.com/#!/aphyr/status/124275497042591746/photo/1/large
** e.g. http://www.cvedetai
prepare to stop using key
filters, bucket listing, and key listing early.
Our current strategy is to store the keys in Redis, and synchronize them
with post-commit hooks and a process that reads over bitcask. With
ionice 3, it's fairly low-impact. https://github.com/aphyr/bitcask-ruby
may be
I was waiting for Basho to write an official notice about this, but it's
been three days and I really don't want anyone else to go through this
shitshow.
1.0.1 contains a race condition which can cause vnodes to crash during
partition drop. This crash will kill the entire riak process. On our
Yep, two buckets: one for users, one for users_by_id. Or, you could use
secondary indexes, and not worry about keeping the ids in sync.
http://basho.com/blog/technical/2011/09/14/Secondary-Indexes-in-Riak/
For ID generation, UUIDs will work, SHA1s will work, or you could use an
ID generation s
One easy way to solve an atomic a->b relationship in an eventually
consistent way is to require the existence of both A and B in order to
consider the write valid, to write A first, and use a message queue to
retry writes until both A and B exist. There are other approaches for
agreement betwee
On 11/02/2011 10:40 AM, Justin Karneges wrote:
Thanks everyone for these replies (and also Aphyr, off-list). It has helped me
confirm my suspicions and sounds like I'm on the right track.
For one of my keys, I am doing sort of a manual "last write wins" by having
the reader s
The fastest thing is probably to generate conflicts right below the
conflict resolution system. If you are worried you can't predict the
conflicts at all, go ahead and perform multiple reads and writes at
overlapping times. No need for excessive load; controlling the timing
alone should be suff
Depending on whether you think it will be more efficient to store the
graph or its dual, consider each node a vertex and write the adjacency
list as a part of its data. You can store whatever weights, etc. you
need on the edges there.
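
A small Python sketch of that layout, with each vertex stored as one JSON value that carries its own adjacency list. The field names ("name", "edges", "to", "weight") are hypothetical, and a dict stands in for the Riak bucket.

```python
import json


def make_vertex(name, edges):
    """Serialize a vertex; `edges` is a list of (neighbour_key, weight) pairs."""
    return json.dumps({
        "name": name,
        "edges": [{"to": key, "weight": weight} for key, weight in edges],
    })


def neighbours(store, key):
    """Fetch one vertex by key and return its outgoing edges."""
    vertex = json.loads(store[key])
    return [(edge["to"], edge["weight"]) for edge in vertex["edges"]]


# A dict stands in for the bucket of vertices.
graph = {
    "a": make_vertex("A", [("b", 2.5), ("c", 1.0)]),
    "b": make_vertex("B", [("c", 0.5)]),
    "c": make_vertex("C", []),
}
print(neighbours(graph, "a"))
```

Following an edge is then an ordinary get by key, one fetch per hop.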
Don't use links; they're just a thin layer on top of mapred
On 11/18/2011 11:50 AM, Jeroen van Dijk wrote:
And I also didn't include the riak user list for this reply:
On Fri, Nov 18, 2011 at 7:04 PM, Aphyr <ap...@aphyr.com> wrote:
Depending on whether you think it will be more efficient to store
the graph or its dual, co
On 11/20/2011 05:19 AM, Catalin Constantin wrote:
The connection between servers is 10MBytes / sec not 10Mbit / sec.
Are you sure? To my knowledge almost no ethernet gear runs at 10 MB/s.
It's almost always 10, 100, 1000, or 1 Mb/s.
It may be your n_val. If it's the default (3), one of y
On 11/20/2011 12:14 PM, Catalin Constantin wrote:
I am 100% sure the transfer rate is 10MBytes / second. This is not the
problem.
In ten years of network administration I have never encountered an
ethernet device with a wire rate of 10 MBps. I have, however,
encountered frequent confusion ove
On 11/20/2011 01:34 PM, Catalin Constantin wrote:
To make it simple. No more networking. Just one node (with n = 1) and
local tests.
The producing of data is a simple CSV file read (ruled out too cause
this is fast).
Read from the same disk? If you're interleaving every write with a read
from
For limited mapreduce (where you know the keys in advance) riak would be
a fine choice. 500 million keys, n val 3 is readily achievable on
commodity hardware; say four nodes with 128GB SSDs.
If large-scale mapreduce (more than a few hundred thousand keys) is
important, or listing keys is criti
uld love to hear more about Mecha if you're willing to share. Feel
free to contact me off-list.
thanks again,
-mike
On 11/28/11 2:24 PM, Aphyr wrote:
For limited mapreduce (where you know the keys in advance) riak would be
a fine choice. 500 million keys, n val 3 is readily achievable on
comm
6) Q --- Are people running riak natively on osx (for development) or
running on a vm that matches production? (from kenperkins via #riak)
A --- Anyone? (We had a similar thread on the list several months
back about this but I figured it couldn't hurt to open it up to more
discussion.)
We
I don't know about Python, but we've been attacking this problem in the
Ruby client. You might find these useful:
Re-entrant threadsafe resource pooling:
https://github.com/seancribbs/ripple/blob/master/riak-client/lib/riak/client/pool.rb
Node configuration/error tracking:
https://github.com/se
On 12/09/2011 11:53 AM, John Loehrer wrote:
I am currently evaluating riak. I'd like to be able to do periodic
snapshots of /var/lib/riak using LVM without stopping the node.
According to a response on this ML you should be able to copy the
data directory for eleveldb backend.
http://comments.gm
On 01/05/2012 11:44 AM, Tim Robinson wrote:
Ouch.
I'm shocked that is not considered a major bug. At minimum that kind of stuff
should be front and center in their wiki/docs. Here I am thinking n 2 on a 3
node cluster means I'm covered when in fact I am not. It's the whole reason I
gave Riak
On 01/05/2012 12:12 PM, Tim Robinson wrote:
Thank you for this info. I'm still somewhat confused.
Why would anyone ever want 2 copies on one physical PC? Correct me if
I am wrong, but part of the sales pitch for Riak is that the cost of
hardware is lessened by distributing your data across a clu
On 01/05/2012 12:53 PM, Tim Robinson wrote:
So with the original thread where with N=3 on 3 nodes. The developer
believed each node was getting a copy. When in fact 2 copies went to
a single node. So yes, there's redundancy and the "shock" value can
go away :) My apologies.
That said, I have no
There's a code snippet in riak 1.0.1 or 1.0.2 release notes which addresses
this. Sorry can't find it for you, network here is useless. :(
Ivaylo Panitchkov wrote:
>
>Hello All,
>
>We have a cluster of three machines (Debian 6.0, 4GB RAM,
>riak_1.0.2-1_amd64.deb, n_val: 3) that serves an appli
https://github.com/basho/riak/blob/riak-1.0.2/RELEASE-NOTES.org
If partition transfer is blocked awaiting [] (as opposed to [kv_vnode] or
whatever), there's a snippet in there that might be helpful.
--Kyle
On Jan 18, 2012, at 1:43 PM, Fredrik Lindström wrote:
> After some digging I found a sug
Did you try riak_core_ring_manager:force_update() and force_handoffs() on the
old partition owner as well as the new one? Can't recall off the top of my head
which one needs to execute that handoff.
--Kyle
On Jan 18, 2012, at 2:08 PM, Fredrik Lindström wrote:
> Thanks for the respon
he owner of 0 partitions.
riak-admin ring_status lists various pending ownership handoffs, all of
them are between our 3 original nodes. The new node is not mentioned
anywhere.
I'm really curious about the current state of our cluster. It does look
rather exciting :)
/F
--
Side question: dynamo exposes both partial and fully consistent reads. Does
anyone know what the conflict semantics are? Last write wins? Actual mvcc?
Ahmed Al-Saadi wrote:
>I suppose this speaks to DynamoDB's consistent read feature that Vishal
>pointed out (though I believe statebox is more
On 02/12/2012 03:27 AM, Marco Monteiro wrote:
I'm considering Riak for the statistics of a site that is approaching
a billion page views per month. The plan is to log a little
information about each page view and then to query that data.
Honestly, I wouldn't use stock Riak for this; the MR
On 02/16/2012 01:07 AM, Jerome Renard wrote:
Hello,
I am really interested in Riak but I would like to know if my goals
can be achieved for my project.
The use case is the following :
- I need to support 10 000 writes/second minimum. Object size will be
from 1kb to 5kb
Definitely. 1
I also discovered MR issues during a rolling upgrade to 1.1.0 last
night. We had so many MR errors that the 1.1 node crashed altogether,
and I had to roll it back to 1.0.3. Basho support is working on that
problem.
2012-02-22 00:56:16.429 [error] <0.1615.0> gen_server
riak_pipe_vnode_master t
On 02/22/2012 02:10 PM, char...@contentomni.com wrote:
1. Is Riak a good fit for this solution going up to and beyond 20
million users (i.e. terabytes upon terabytes added per year)?
The better question might be: what do you actually plan to do with that
much data?
2. I plan to use 2i, whic
ngs does anyone want to help with the question.
Thanks,
Tim
-Original Message-
From: "Aphyr"
Sent: Sunday, March 4, 2012 10:41pm
To: "Tim Robinson"
Subject: Re: Questions on configuring public and private ips for riak on ubuntu
I can get SSH access over Riak'
On 03/06/2012 08:11 AM, Ivaylo Panitchkov wrote:
Hello guys,
We are in production and noticed ALL of the M/R requests failing right
after a bulk delete with the following response returned back:
lineno":466,"message":"SyntaxError: syntax error","source":"()
The problem is now persistent even
First, for security reasons, don't run Riak on a public IP. Access it
through an application proxy, a VPN, or an SSH tunnel if you need.
Second, when you change the name of a node, you need to run
riak-admin reip r...@my.old.ip r...@my.new.ip
... to update the ring file with the new node name.
Yup, list-keys and list-buckets do this for us too, since Riak 0.14.
Bitcask, 6 nodes, physical hardware, 1024 partitions, 100-300 million
keys with n_val 3.
--Kyle
On 03/15/2012 11:17 AM, Armon Dadgar wrote:
We are currently running Riak 1.1 in production, using LevelDB
with snappy compres
On 04/21/2012 09:07 AM, Les Mikesell wrote:
On Fri, Apr 20, 2012 at 5:00 PM, Kyle Kingsbury wrote:
OK, so how about Statebox? We use timestamps to ameliorate the GC problem so
long as a given time window. Our hosts are running NTP so it's all cool, ya?
Wrong. One of your hosts is not running
74 matches