Re: What kind of protocol is used between Riak nodes?

2012-05-28 Thread Justin Sheehy
Hi, Alek.

On May 28, 2012, at 1:40 PM, Alek Morfi wrote:

> What kind of protocol is used between Riak nodes to communicate? If all
> Riak nodes are located in the same cluster (LAN scale) there is no problem.
> But when Riak nodes are located on different clusters connected through the
> Internet, there are some limitations, because some ISPs only allow
> communication over HTTP and SMTP, and I am wondering how Riak nodes can
> communicate over the Internet.

Within a single Riak cluster, nodes communicate with each other using the 
Erlang distribution protocol. There are a number of reasons within Riak's 
design -- this just being one of them -- why spreading a Riak cluster across a 
wide area is not recommended.

The Riak Enterprise system (http://basho.com/products/riak-overview/) uses an 
entirely different protocol for managing long-haul communication, and also uses 
a different methodology. In that system we do not spread a single cluster 
widely, but rather create a topology of one cluster per datacenter, with each 
of those connected to each other.

-Justin





Re: Riak as Binary File Store

2012-05-29 Thread Justin Sheehy
Hi, Praveen.

Nothing about what you have said would cause a problem for Riak. Go for it!

Justin



On May 29, 2012, at 8:36 AM, Praveen Baratam  wrote:

> Hello Everybody!
> 
> I have read abundantly over the web that Riak is very well suited to store 
> and retrieve small binary objects such as images, docs, etc.
> 
> In our scenario we are planning to use Riak to store uploads to our portal 
> which is a social network. Uploads are mostly images with a maximum size of 2 
> MB, and typical sizes range from a few KBs to a few hundred KBs.
> 
> Does this usage pattern fit Riak? What are the caveats if any?
> 
> Thank you!


Re: Atomicity of if_not_modified?

2013-01-04 Thread Justin Sheehy

On Jan 4, 2013, at 1:25 PM, Les Mikesell wrote:

> And, doesn't every description of riak behavior have to include the
> scenario where the network is partitioned and updates are
> simultaneously performed by entities that can't contact each other?
> If it weren't for that possibility, it could just elect a master and
> do real atomic operations.

Yes, absolutely.

There are no atomic compare-and-set operations available from Riak, regardless 
of headers and R/W values.

Conditional HTTP requests are present because they are "free" due to 
Webmachine, and they are sometimes useful, but should not be seen as 
semantically very different from the client doing a read itself to decide 
whether to write.
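
A rough sketch of that near-equivalence with the Erlang PB client (connection 
setup and all names are illustrative): the conditional PUT saves a round trip, 
but the read and the write remain two separate, non-atomic steps.

    %% read-then-conditional-write; a concurrent writer can still
    %% slip in between the get and the put
    {ok, Old} = riakc_pb_socket:get(Conn, <<"b">>, <<"k">>),
    New = riakc_obj:update_value(Old, <<"new value">>),
    %% if_not_modified asks the server to compare vclocks before
    %% writing -- the same check the client could make itself
    case riakc_pb_socket:put(Conn, New, [if_not_modified]) of
        ok -> done;
        {error, _Modified} -> someone_wrote_in_between
    end.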

-Justin





Re: Atomicity of if_not_modified?

2013-01-06 Thread Justin Sheehy

On Jan 3, 2013, at 11:44 AM, Kaspar Thommen wrote:

> Can someone confirm this? If it's true, what exactly is the purpose of 
> offering the if_not_modified flag?

Yes, I confirmed this earlier in this thread:

http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-January/010672.html

-Justin





Re: Stopping/Starting Riak.

2013-01-12 Thread Justin Sheehy

On Jan 12, 2013, at 10:26 AM, Kevin Burton wrote:

> I noticed that I have no problem with ‘sudo /etc/init.d/riak stop’. But, when 
> I try to start the process with ‘sudo /etc/init.d/riak start’ I am met with a 
> prompt for a password. What is the password? I don’t recall setting a 
> password.

That is sudo, not Riak, asking for your password. You should use the same 
password that you use to log in to that machine.

-Justin






mailing list headers (was Re: riak cluster suddenly became unresponsive)

2013-03-19 Thread Justin Sheehy
Hi, Ingo.

On Mar 19, 2013, at 10:41 AM, Ingo Rockel wrote:

> and the riak-users mailer-daemon should really set a "reply-to"…

Most email client programs have two well-understood controls for replies, one 
for "reply (to sender)" and one for "reply to all."

We are not going to make one of them broken.

-Justin






Re: two-node cluster for riak?

2013-04-18 Thread Justin Sheehy
Hi, Michael.

Your spidey-sense is absolutely correct.

Recall for a moment that Riak by default will store 3 copies of everything. 
This means that in a two-node configuration any given value will be stored once 
on one node and twice on the other. Not only does this mean a whole lot of 
wasted work, it removes much of the safety and availability that people look to 
get from Riak. If one node goes down, then for half of your keys you have lost 
a majority of their replicas.

On only two nodes, you can't really get the kind of fault tolerance that
people look to Riak for... with Riak or any other software.

-Justin



On Apr 18, 2013, at 11:26 AM, Michael Forrester wrote:

> Greetings Everyone,
> I am not sure if this is the right forum to ask this question, but here goes.
> 
> We are currently running a six-node cluster in Amazon AWS. There has been 
> some talk by our architects of going to a two-node configuration using 
> SSD-backed instances with super-fast hardware, but for some reason this is 
> triggering a "this is not correct, but I don't remember why" spidey sense. 
> From my understanding, it is best to run Riak with 5 nodes or at least N+2, 
> and a two-node cluster (even though the hardware will be way faster) will 
> not satisfy that.
> 
> Any loose suggestions about how to approach this? I am open to the two 
> ultrafast nodes... I am not sure how to put Riak on them to work in a 
> fault-tolerant way.
> 
> Articles, dirty limericks, and soliloquies are all appreciated.
> 
> -- 
>  Michael Forrester
>  Director of Infrastructure
>  WorthPoint Corporation
> 
> 404.996.1470 O)
> 404.939.6499 C)
> 
> 
> 
> 


Re: write value reality check

2013-06-28 Thread Justin Sheehy
Hi, Louis-Philippe.

With a 2-node cluster and N=3, each value will be written to disk a total of 
three times: twice on one node, once on the other. (The W setting has no effect 
on the number of copies made or hosts used.) That behavior might seem a bit 
strange, but it's a strange configuration to run Riak on only two machines 
while asking it to store data on three of them.

The standard settings and behavior of Riak are generally optimized for non-tiny 
clusters, and make much more sense when there are at least five machines.
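
A minimal sketch of the knobs involved, using the Erlang PB client (names and 
values are illustrative): n_val fixes how many copies are stored, while w only 
controls how many vnode acknowledgements the coordinator waits for.

    {ok, Conn} = riakc_pb_socket:start_link("127.0.0.1", 8087),
    %% three replicas per key, no matter how many physical nodes exist
    ok = riakc_pb_socket:set_bucket(Conn, <<"b">>, [{n_val, 3}]),
    Obj = riakc_obj:new(<<"b">>, <<"k">>, <<"value">>),
    %% w=3 waits for all three acks; it does not create extra copies
    ok = riakc_pb_socket:put(Conn, Obj, [{w, 3}]).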

I hope this helps with your understanding.

-Justin



On Jun 28, 2013, at 10:54 AM, Louis-Philippe Perron wrote:

> So if I get you right and extrapolate from the replication documentation 
> page, can I say that on a 2-node cluster, with a bucket set to N=3 and 
> W=ALL, my writes would be written 3 times to disk (and with no guarantee 
> of being on different nodes)?
> 
> thanks! 
> 
> On Wed, Jun 26, 2013 at 8:17 PM, Mark Phillips  wrote:
> Hi Louis-Philippe 
> 
> There are no dumb questions. :)
> 
> On Wednesday, June 26, 2013, Louis-Philippe Perron wrote:
> Hi Riak people!
> Here is a dumb question, but anyway I want to clear this doubt out:
> 
> What happens when a bucket has a W quorum value higher than the N number of 
> nodes?
> Are writes to disk multiplied?
> 
> 
> Precisely. For example, if you run a one-node Riak cluster on your dev 
> machine you'll be writing with an N-val of 3 and W of 2 by default. In other 
> words, Riak will always attempt to satisfy the W value regardless of physical 
> node count. 
> 
> Hope that helps. 
> 
> Mark 
> twitter.com/pharkmillups  
>  
> thanks!
> 


Re: Help with local restore for dev enviroment

2013-07-10 Thread Justin Sheehy
Hi, Mark.

You've already received a little advice generally so I won't pile on that part, 
but one thing stood out to me:

> My client has sent me a backup from one of their cluster nodes: bitcask 
> data, ring, and config.

Unless I'm misunderstanding, what you're working on will not get you the data 
from the whole cluster, only the fraction of the data that was stored on the 
one node that you have a backup from. Just a warning, in case you hadn't 
realized this.

-Justin





Re: What is the purpose of "rel" links?

2013-07-22 Thread Justin Sheehy
Hi, Age.

The Link header in HTTP as used by Riak is defined by RFC 5988. In the Link 
Relation Type registry (http://tools.ietf.org/html/rfc5988#section-6.2.2) you 
can see that the relation type "up" refers to a parent document in a hierarchy 
of documents. In Riak, this means the bucket a key is in.

These are not Riak's own links, but rather an additional use of the Link header 
form which may be useful to some clients.
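
For example (bucket name illustrative), an object fetched over the HTTP 
interface carries something like the following alongside any user-created 
links:

    Link: </riak/images>; rel="up"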

I hope that this helps.

-Justin



On Jul 12, 2013, at 1:52 PM, Age Mooij  wrote:

> Hi
> 
> I've been looking at links and link walking and I noticed that Riak very 
> often returns a special type of link with rel="up" instead of a riaktag, 
> which is illegal for users to create.
> 
> What is the purpose of this link (beyond the reasonably obvious "this key 
> belongs to bucket X")? Why was it added?
> Are there other "rels" than "up"?
> Can they be followed through link walking?
> 
> This behavior is not documented anywhere that I (or Google) could find.
> 
> Age






Re: Funky List-Id headers in sent messages from this list

2013-11-08 Thread Justin Sheehy

The mailman host that is used to manage the list was moved last month, so 
that's probably the source of the change.

-Justin





Re: no access logs by default?

2011-03-01 Thread Justin Sheehy
Hi, Ryan.

On Tue, Mar 1, 2011 at 8:07 PM, Ryan Zezeski  wrote:

> Is this intentional?  It seems like odd default behavior.

Most databases, including Riak, do not write to a file every time you
serve a GET, SELECT, or equivalent query.

This is because the additional disk I/O of an access log imposes a
performance cost that many do not wish to pay.  As you note, it can be
turned on -- but we believe that by default production users generally
are happier with it off and do not expect such a human-readable log
for database accesses.

I agree that the way to turn it on should be clearly documented.  I
would add that we should make sure that the documentation warns people
not to turn it on except in testing/debugging scenarios.

Best,

-Justin



Re: Riak search - Lucene

2011-03-02 Thread Justin Sheehy
Hi, Joshua.

On Wed, Mar 2, 2011 at 6:26 PM, Joshua Partogi  wrote:

> I am trying to picture the relationship between Riak and Lucene [or
> how Riak interacts with Lucene], which makes Riak search.

This is a very easy relationship to picture, as there is no such
interaction.  :-)

Riak Search does not use Lucene or Solr.  It provides a very similar
interface to those search systems in order to ease the transition for
developers, but is an independent piece of software from top to
bottom.  Indexes are stored in Riak Search's own storage engine,
queries are parsed by Riak Search's parser, and so on.

The closest thing there is to such a relationship is that you can (but
do not need to) use the same text analyzer libraries in Riak Search
that you use in Lucene.

I hope that this helps with your understanding.

Best regards,

-Justin



Re: A script to check bitcask keydir sizes

2011-03-24 Thread Justin Sheehy
Hi, Greg.

On Thu, Mar 24, 2011 at 10:17 AM, Greg Nelson  wrote:
> Wouldn't it be the common case that
> there are relatively few buckets?  And so wouldn't it save a lot of memory
> to keep a reference to an interned bucket name string in each entry, instead
> of the whole bucket name?

One reason this isn't done is that bitcask is an independent
application, used-by rather than part-of Riak.  It's just a local kv
store, and knows nothing of higher-level concepts like buckets.
Another reason is that there are also users with very many buckets in
use, a situation that makes the proposed solution uncomfortable.

In cases where there are truly few buckets and one knows it would stay
that way, one could plausibly modify riak_kv_bitcask_backend (the part
of Riak that talks to Bitcask) to use a bitcask per bucket on each
vnode instead of a single bitcask per vnode.  One downside of that
approach would be that if the number of buckets did grow then the file
descriptor consumption would be large and the node-wide I/O profile
might be much worse as well.

Everything has tradeoffs.

-Justin



Re: Riak vs riak_core

2011-03-30 Thread Justin Sheehy
Hi, Mike.

On Wed, Mar 30, 2011 at 5:46 PM, Mike Oxford  wrote:

> I thought I understood Riak, then I ran across the fact that riak_core was
> split out separately.
> When would you use riak_core that you wouldn't use Riak?

Good question.

Riak Core is the distributed systems center that Riak is built around.
Riak Core is not a standalone database, and in fact by itself it
doesn't do data storage or even much of anything at all from the point
of view of a client application.

You use Riak to store, query, and retrieve your data.

You use Riak Core to build something shaped a bit like Riak.

Another way of looking at this is that Riak Core is a bit more
abstract, providing mechanisms for techniques such as vector clocks,
gossip, and other useful parts of the servers in a robust and scalable
system.  Riak, the database, builds on that core by adding a
client-facing storage and retrieval protocol, storage engines for
placing data on disk, and so on.
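
For a concrete sense of the shape of a riak_core application, here is a 
trimmed, hypothetical vnode module. Only the command path is shown; the real 
behaviour also requires handoff and lifecycle callbacks (handoff_starting, 
handle_handoff_command, is_empty, delete, terminate, and so on), omitted here 
for brevity.

    -module(kvlite_vnode).
    -behaviour(riak_core_vnode).
    -export([start_vnode/1, init/1, handle_command/3]).

    -record(state, {partition, store = dict:new()}).

    start_vnode(I) ->
        riak_core_vnode_master:get_vnode_pid(I, ?MODULE).

    %% one vnode process runs per owned partition of the ring
    init([Partition]) ->
        {ok, #state{partition = Partition}}.

    %% commands arrive already routed to the partitions that own
    %% the key's slice of the consistent-hashing ring
    handle_command({put, Key, Val}, _Sender, S = #state{store = D}) ->
        {reply, ok, S#state{store = dict:store(Key, Val, D)}};
    handle_command({get, Key}, _Sender, S = #state{store = D}) ->
        {reply, dict:find(Key, D), S}.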

I hope that this helps to clarify matters.  If not, or even if you
just have additional questions, please ask.

Best regards,

-Justin



Re: Load question

2011-04-12 Thread Justin Sheehy
Hi, Runar.

On Tue, Apr 12, 2011 at 3:22 AM, Runar Jordahl  wrote:

> It would be helpful if a wiki page (under Best Practices) was created
> to discuss various load balance configurations. I am also wondering if
> a Riak client could use strategy (2), like Dynamo clients can.

There is not currently any client that uses strategy #2 of partition-awareness.

To make it practical, we would need to extend the client-facing
protocol so that an incoming client could ask to be redirected to an
"ideal" incoming node.  This is quite doable, though would have the
downside of making such clients more complex and thus possibly more
fragile.

-Justin



Re: Bitcask vs innostore, again

2011-04-28 Thread Justin Sheehy
Hi, Dmitry.

I will try to reply to some of the questions you raised about bitcask.

On Thu, Apr 7, 2011 at 12:30 AM, Dmitry Demeshchuk  wrote:

> Now being considered as the main Riak storage.

It's not just being considered, it is the main Riak storage.  We are
very confident in bitcask's quality and it has been the default
storage engine now for some time.  Some people may of course still
choose innostore for various reasons but at Basho we believe that
Bitcask will better suit the needs of the majority of users.

> I've been having myself some problems with
> bitcask previously (running out of file descriptors, bad merges) and
> heard that some people periodically try to migrate from innostore to
> bitcask, then stick with innostore, remaining disappointed in bitcask.

We honestly don't hear of many real problems with bitcask.  It is
true that depending on your setup riak can quickly run out of file
descriptors if you haven't set your ulimit properly, but that is
easily fixed.  (and is also true under innostore, just in slightly
different scenarios)

I am not sure what you mean by bad merges or any failed migrations --
I'd need to hear more details to reply to that part.

> What I haven't heard about bitcask yet is any production success
> stories. Which storage does Wikia use, for example? Or Vibrant Media?

I will leave it to each individual user to describe any details of
their own production configuration as that is not our privilege to
disclose.  However, I can certainly say that the majority of
production deployments are running bitcask.  There are a few notable
exceptions, certainly -- but bitcask is the typical storage engine for
Riak in production these days.  This certainly includes a number of
businesses with the volume and duration you described.

Others might share their anecdotes; what I can provide is an aggregate
view.  And from that perspective we are very happy with the
performance and stability that bitcask's known users are experiencing.

Best regards,

-Justin



Re: A function as an input for map/reduce

2011-05-05 Thread Justin Sheehy
Hi, Mikhail.

On Tue, May 3, 2011 at 5:55 PM, Mikhail Sobolev  wrote:

>   Is there more information about "it can throw a few keys at a time,
>   and the map/reduce chain would go ahead and start doing the
>   processing on whatever keys it gets as soon as it gets them, it does
>   not have to wait for the whole list of that function" (@ ~9:54 in the
>   video)?  What I'm concerned about here is a chain of
>   map/map/map/reduce/reduce phases.  How is the processing actually
>   performed?  What are the synchronization points?

The "map" part of the MapReduce programming paradigm is not only
inherently parallel, it also does not impose a point of order on the
overall dataflow and thus does not introduce a concurrency barrier.
In practical terms this means that individual data items can be
processed as soon as they arrive, and the results can be immediately
pushed on to the next phase of the overall job without waiting for all
other data to make it through the map.

The "reduce" part does not have this pleasant property, as that phase
is present in order to perform exactly the kinds of operations (such
as counting) that do require waiting.

-Justin



Re: A function as an input for map/reduce

2011-05-07 Thread Justin Sheehy
Hi, Mikhail.

On Thu, May 5, 2011 at 5:15 PM, Mikhail Sobolev  wrote:

> Thank you for the description.  I now wonder if it's possible for a
> map function, instead of returning the whole list of results, to do
> something that Riak would take as "ah! another map result, let's pass
> it on to the next phase"?

It is quite possible in Riak to have a map phase followed by another
map phase.  You simply have to declare the job as having those phases,
each with their map function.

The way you showed it wouldn't quite work, as it is the return value
-- not a side effect -- that a map function passes on to the following
phase.
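
A minimal sketch of such a job with the Erlang PB client (module and function 
names are hypothetical); the first map's return value becomes the input of the 
second:

    {ok, Results} = riakc_pb_socket:mapred(Conn,
        [{<<"bucket">>, <<"key1">>}, {<<"bucket">>, <<"key2">>}],
        [{map, {modfun, mymod, extract},   none, false},
         {map, {modfun, mymod, transform}, none, true}]).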

-Justin



Re: Make Riak use different Erlang version than that included in the .deb package

2011-05-12 Thread Justin Sheehy
Hi, Jeremy.

If you build Riak from source, you'll end up with Riak using the
version of Erlang that you used to build it.  With a pre-packaged
version, it will use the Erlang that was used to make the packages.

Riak will be moving to a newer Erlang in upcoming releases, by the way.

-Justin



On Thu, May 12, 2011 at 4:00 PM, Jeremy Raymond  wrote:
>
> I'm using Riak installed from riak_0.14.0-1_amd64.deb but am having a
> problem with an Erlang reduce function I wrote because it used
> calendar:iso_week_number/1 which isn't available in R13B04 which is bundled
> with the .deb package. Is there an easy way to configure Riak to use a
> different Erlang install (say installed at /usr/local/lib/erlang)?
> - Jeremy
>


Re: Production Backup Strategies

2011-05-13 Thread Justin Sheehy
Hi, Mike.

Assuming that the cluster is using the default storage engine
(bitcask), the backup story is straightforward. Bitcask only ever
appends to files, and never re-opens a file for writing after it is
closed.  This means that your favorite existing server filesystem
backup mechanism will Just Work.

Other means exist, but that is the simplest and often the best.

-Justin



Re: Production Backup Strategies

2011-05-16 Thread Justin Sheehy
Hi, Jeremy.

On Sat, May 14, 2011 at 2:45 PM, Jeremy Raymond  wrote:

> So just backing up the files from separate nodes works? There won't be
> inconsistencies in the data say if all the nodes had to be restored?

That's right, it works.  :-)

Inconsistencies due to modifications that occur between the moments
two different nodes are backed up will be fixed by anti-entropy
mechanisms such as read-repair.

-Justin



Re: Issues with capacity planning pages on wiki

2011-05-25 Thread Justin Sheehy
Hi, Anthony.

There are really three different things below:

1- reducing the minimum overhead of the {Bucket, Key} encoding when
riak is storing into bitcask

2- reducing the size of the vector clock encoding

3- reducing the size of the overall riak_object structure and metadata

All three of these are worth doing.  The reason they are the way they
are now is that the initial assumption for most Riak deployments was
of a high enough mean object size that these few bytes per object
would proportionally be small noise -- but that's just history and not
a reason to avoid improvements.

In fact, preliminary work has been done on all three of these.  It
just hasn't yet been such a high priority that it got pushed through
to the finish.  One tricky part with all three is backward
compatibility, as most production Riak clusters do not expect to need
a full stop every time we want to make an improvement like these.

Solving #1, by the way, isn't really in bitcask itself but rather in
riak_kv_bitcask_backend.  I can take a swing at that (with backward
compatibility) shortly.  I might also be able to help dig up some of
the old work on #2 that is nearly a year old, and I think Andy Gross
may have done some of what's needed for #3.

In fewer words: I agree, all this should be made smaller.

And don't let this stop you if you want to jump ahead and give some of it a try!

-Justin



On Wed, May 25, 2011 at 1:50 PM, Anthony Molinaro wrote:

> Anyway, things make a lot more sense now, and I'm thinking I may need
> to fork bitcask and get rid of some of that extra overhead.  For instance,
> 13 bytes of overhead to store a tuple of binaries seems unnecessary; it's
> probably better to just have a single binary with the bucket size as a
> prefix, so something like
>
> <<BucketSize:16, Bucket/binary, Key/binary>>
>
> That way you turn 13 bytes of overhead into 2.
>
> Of course I'd need some way to work with old data, but a one time migration
> shouldn't be too bad.
>
> It also seems like there should be some way to trim down some of that on
> disk usage.  I mean 300+ bytes to store 36 bytes is a lot.



Re: Riak doesn't use consistent hashing.

2011-05-26 Thread Justin Sheehy
Hi, Greg.

Thanks for your thoughtful analysis and the pull request.

On Thu, May 26, 2011 at 1:54 AM, Greg Nelson  wrote:

> However, the skipping bit isn't part of
> Riak's preflist calculation.  Instead, nodes claim partitions in such a way
> as to be spaced out by target_n_val, to obviate the need for skipping.

A fun bit of history here: once upon a time, Riak's claiming worked in
the same way as described by Amazon, with "skipping" and all.  We
noticed that this approach caused a different set of operational
difficulties when hinted handoff due to node outages was occurring at
the same time as a membership change.  That prompted changes to the
claim algorithm, which we still consider an area deserving of active
improvement.

Multiple people will be reading, analyzing, and testing your work to
contribute to this improvement.  We very much appreciate your efforts,
and want to make sure that we incorporate them in the best possible
way.

Thanks,

-Justin



Re: riak locking and out of memory

2011-05-26 Thread Justin Sheehy
Hi, Ron.

On Thu, May 26, 2011 at 4:33 PM, Ron Yang  wrote:

> On the macbook I looped across 400meg files using bash and curl to
> upload them as documents into a bucket:

There are other details in your post that I might comment on, but I
will focus on the main point.

What you describe here simply will not work.  Single documents in Riak
at that size are going to cause problems.  There is an interface atop
Riak ("Luwak") which can handle such things just fine, if large file
storage is your main use case.

-Justin



Re: A script to check bitcask keydir sizes

2011-06-08 Thread Justin Sheehy
On Thu, Mar 24, 2011 at 1:51 PM, Nico Meyer  wrote:

> The bigger concern for me would be the way the bucket/key tuple is
> serialized:
>
> Eshell V5.8  (abort with ^G)
> 1> iolist_size(term_to_binary({<<>>,<<>>})).
> 13
>
> That's 13 bytes of overhead per key where only 2 bytes are needed, with
> reasonable bucket/key length limits of 256 bytes each. Or if that is not
> enough, one could also use a variable-length encoding, so buckets/keys
> can be arbitrarily large and the most common cases (less than 128 bytes)
> still only use 2 bytes of overhead.

I've made a branch of bitcask that effectively does this.  It uses 3
bytes per record instead of 13, saving 10 bytes (both in RAM and on
disk) per element stored.

The tricky thing, however, is backward compatibility.  There are many
Riak installations out there with data stored in bitcask using the old
key encoding, and we shouldn't force them all to do a very costly
full-sweep of their existing data in order to get these savings.  When
we sort out the best way to manage a smooth upgrade, I would happily
push out the smaller encoding.

-Justin



Re: Pruning (merging) after storage reaches a certain size?

2011-06-08 Thread Justin Sheehy
Hi, Steve.

Check out this page: 
http://wiki.basho.com/Bitcask-Configuration.html#Disk-Usage-and-Merging-Settings

Basically, a "merge trigger" must be met in order to have the merge process 
occur.  When it does occur, it will affect all existing files that meet a 
"merge threshold."

One note that is relevant for your specific use: the expiry_secs parameter will 
cause a given item to disappear from the client API immediately after expiry, 
and to be cleaned if it is in a file already being merged, but will not 
currently contribute toward merge triggers or thresholds on its own if not 
otherwise "dead".

-Justin


On Jun 7, 2011, at 4:29 PM, Steve Webb wrote:

> Hello there.
> 
> I'm loading a 2-node (1GB mem, 20GB storage, vmware VMs) riaksearch cluster 
> with the spritzer twitter feed.  I used the bitcask 'expiry_secs' to expire 
> data after 3 days.
> 
> I'm curious - I'm up to about 10GB of storage and I'm guessing that I'll be 
> full in 3-4 more days of ingesting data.  I have no idea if/when a merge will 
> run to expire the older data.
> 
> Q: Is there a method or command to force a merge at any time?
> Q: Is there a way to run a merge when the storage size reaches a specific 
> threshold?
> 
> - Steve
> 
> --
> Steve Webb - Senior System Administrator for gnip.com
> http://twitter.com/GnipWebb
> 


Re: Pruning (merging) after storage reaches a certain size?

2011-06-13 Thread Justin Sheehy
Hi, Steve.

The key to your situation was in my earlier email:

One note that is relevant for your specific use: the expiry_secs
parameter will cause a given item to disappear from the client
API immediately after expiry, and to be cleaned if it is in a file
already being merged, but will not currently contribute toward
merge triggers or thresholds on its own if not otherwise "dead".

That is, bitcask wasn't originally designed around the expiry-centric
way of removing old data, and data that has simply expired (but not
actively been deleted) will not be counted as garbage toward
thresholds or triggers at this time.  It will be cleaned up in a
merge, but will not contribute toward causing the merge in the first
place.  In a use case where you only add items and never actually
delete anything, a merge will never be dynamically triggered.

It is plausible that we could add some expiry-statistics measurement
and triggering to bitcask, but today that's the state of things.  You
could manually trigger merges, but that currently requires a bit of
Erlang.

I hope that this helps.

-Justin



Re: Riak Ruby Client Thread Safe?

2011-06-15 Thread Justin Sheehy
Hi, Keith.

It is not safe to share a single Riak client instance across multiple
client-facing threads.

Riak's conflict detection mechanisms will be misled by that sort of
sharing.  Luckily, the client is quite lightweight so you shouldn't
have to worry about the cost of doing it right.

-Justin



On Wed, Jun 15, 2011 at 2:05 PM, Keith Bennett wrote:
> Hi, all.  Is the Ruby Riak::Client thread safe?  I'm wondering if it's safe 
> to share a single Riak::Client instance across all threads in an application. 
>  I might run the app in JRuby, by the way.
>
> Are there any pros and cons to sharing a single client you can offer?
>
> An obvious pro is that it saves some memory, but probably an insignificant 
> amount.
>
> Thanks,
> Keith
>
>


Re: Benchmarks of backends

2011-06-21 Thread Justin Sheehy
Hi, Anthony.

Most people using Riak today use either Bitcask or Innostore, as I suspect you 
know. Bitcask has excellent performance, but has the limitation that you are 
aware of: a hard cap on the number of keys per unit of available RAM. Innostore 
does not have that limitation, but it is much harder to achieve equivalent 
performance with.

You've noticed that multiple people (including Basho's own Dizzy and also the 
estimable Paul Davis) have produced wrappers for LevelDB, and indeed we are 
currently evaluating this as another alternative storage engine behind Riak.  
We will be posting some performance thoughts on LevelDB shortly, and generally 
it looks promising.  The main blocker at this point is portability; we would 
like for the backend to run well on all of Riak's existing main platforms.

Expect more from us on this soon. The short answer is that if you have too many 
keys for bitcask, the answer today is usually Innostore but soon might be 
LevelDB instead.

Best,

-Justin




On Jun 17, 2011, at 7:12 PM, Anthony Molinaro wrote:

> Hi,
> 
>  I'm wondering if anyone has done any testing with regards to memory
> usage of various backends.  After recent emails about the large overhead
> of bitcask keydir indexes, and after comparing with my current production
> nodes, I find that the overhead per key ends up being too large for
> small keys.
> 
>  So I'm in the market for a new backend, and was wondering if anyone
> out there has done any measurements on memory overhead per key, and
> access times.
> 
> I'm also wondering if there are any backends floating out there I haven't
> found. I've done some google searches to come across
> 
>  https://github.com/krestenkrab/riak_btree_backend
>  https://github.com/cstar/riak_redis_backend
> 
> but I'm assuming there might be others.
> 
> Also, I figure it would be interesting to understand the overhead for
> the built-in backends and innostore, and possibly look at other stores
> I've found which seem to have erlang wrappers like
> 
> LevelDB:
>  https://github.com/basho/e_leveldb
>  https://github.com/davisp/erleveldb
> Tokyo Cabinet:
>  https://github.com/rabbitmq/toke
> Berkeley DB:
>  https://github.com/krestenkrab/bets
> 
> So anyone know anything about these backends or other k/v stores in terms
> of memory versus disk for large datasets?
> 
> The thing prompting this is a cassandra cluster with about 14 billion
> entries (7 billion with replication factor of 2), which uses 60 machines.
> I was trying to determine how many bitcask backed machines it would take
> to store this data and it ends up being about 150.  This is mostly because
> of the 84 bytes of overhead per key (43 bytes by calculations determined
> on this list a few weeks ago, another 41 by measuring my current production
> setup).  Even with keys of 17 bytes, that's 101 bytes of overhead,
> so just wondering if there's anything better.
> 
> Anyway, I'm trying to get some hardware to run basho_bench with and will
> try out some different things, but if anyone has done any of this work
> already it might be interesting to know.
> 
> Thanks,
> 
> -Anthony
> 
> -- 
> 
> Anthony Molinaro   
> 


Re: LevelDB driver

2011-07-04 Thread Justin Sheehy
Hi, Jonathan.

On Mon, Jul 4, 2011 at 9:42 AM, Jonathan Langevin wrote:

> I've seen users show concern of Bitcask's space usage overhead. How does that
> compare against LevelDB?

Bitcask doesn't have much in the way of disk space "overhead" unless
you mean that the space used by deleted or overwritten values is not
reclaimed until after a merge. In that way LevelDB is similar since
space used by deleted and overwritten items is reclaimed as they are
moved into older "levels" of the DB. The behavior here is not
identical, but similar in concept.

By way of comparison, InnoDB imposes about a 2x space overhead cost on
many common datasets but the overhead is usually fairly static.

> If using a LevelDB backend, what advantages of Bitcask do we lose? Is
> replication & availability an issue at all?

The functionality provided by Riak above the storage engines (such as
replication and system-wide availability) is generally not impacted
by your choice of storage engine.

There are two main things you would lose today:

1 - latency
2 - stability

The first of these is fundamental: for many usage patterns Bitcask
will have a latency advantage over LevelDB due to being able to
guarantee that it will never perform more than a single disk seek per
operation.

The second is just about the relative immaturity of LevelDB: we have
not yet seen LevelDB in production environments for an extended amount
of time as we have with Bitcask. Anyone using it now as a Bitcask
replacement should realize that they are on the leading edge and
taking the usual risks that come with adopting new software. That
said, we expect LevelDB to do well over time as one of the alternative
Riak storage engines.

The main reason to use LevelDB under Riak would be if your number of
keys is huge and thus the RAM consumption of Bitcask would make it
unsuitable. That is, we expect people to use LevelDB in the same
situations that they might previously have chosen Innostore as their
storage engine.

-Justin



Re: LevelDB driver

2011-07-04 Thread Justin Sheehy
On Mon, Jul 4, 2011 at 10:33 AM, Jonathan Langevin wrote:

> Thanks Justin for the helpful response :-)

Happy to help.

> Can you define what you would consider "huge" regarding # keys?

It depends a bit on the details (such as key size), but generally the
tipping point is somewhere near ten million keys per GB of RAM.
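
As a rough worked example of that rule of thumb: a node whose keydir must hold 
300 million key replicas would want on the order of 30 GB of RAM for Bitcask 
alone, before the operating system and page cache get their share.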

-Justin



Re: LevelDB driver

2011-07-12 Thread Justin Sheehy
Hi, Phil.

I might have caused a little confusion. I mentioned, but perhaps
didn't sufficiently emphasize, that the benchmark comparing LevelDB to
InnoDB was not a benchmark of Riak at all, but just directly talking
to the storage engines in order to look at the feasibility of doing
more with LevelDB.

That is why there is no mention of how many nodes or any such thing:
it was a one-machine test of embedded storage engines. The data was
generated by basho_bench during the tests, initially using the
sequential_int_gen generator and then using the pareto generators for
subsequent access.

As LevelDB becomes more fully supported as a backend for Riak, we will
certainly publish directions and examples for configuration.

-Justin



Re: How much memory for 20GB of data?

2011-07-14 Thread Justin Sheehy
Hi, Maria.

In addition to what others have said, I would note that (at least) the
following issues matter quite a bit for such planning:

- how many items the data is broken up into
- how large the keys will be (especially if they are very large due to
embedded structure)
- what storage engine ("backend") is in use
- how many machines are in the cluster
- the N-val, or how many replicas are being stored (default is 3)

If you know those things, then you can make a more meaningful estimation.
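
As a rough sketch of such an estimate for Bitcask, using the per-key keydir 
overhead figures discussed elsewhere in this archive (on the order of 40-plus 
bytes plus the key itself, per replica):

    total keydir RAM  ~=  n_val * number of keys * (per-key overhead + key size)

For example, 100 million keys of 20 bytes each at the default n_val of 3 comes 
to roughly 3 * 100M * (40 + 20) bytes, or about 18 GB across the whole cluster.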

I hope that this helps.

-Justin




On Thu, Jul 14, 2011 at 6:02 PM, Maria Neise wrote:
> Hey,
> I would like to store 20GB of data with Riak. Does anyone know how
> much memory Riak would need for that?
>
> Cheers,
> Maria
>


Re: How much memory for 20GB of data?

2011-07-14 Thread Justin Sheehy
Do you perhaps mean disk space instead of memory?

If so, and if you have left the N-val at the default of 3, then you
will need at least 60G of space before any other overhead is accounted
for.

-Justin



On Thu, Jul 14, 2011 at 7:04 PM, Maria Neise wrote:
> Hey,
> thank you a lot for your hints.
> I have 2000 records à 1KB. The key is a string like
> "user123456789". I am using the default backend bitcask. There is just
> one machine in the cluster and I didn't change the N-val. I already
> tried to insert the 20GB of data, but 40GB of memory were obviously
> not enough, because only 700 records were inserted. So I thought
> maybe 150GB should be enough?
>
> Cheers,
> Maria
>
> 2011/7/15 Justin Sheehy :
>> Hi, Maria.
>>
>> In addition to what others have said, I would note that (at least) the
>> following issues matter quite a bit for such planning:
>>
>> - how many items the data is broken up into
>> - how large the keys will be (especially if they are very large due to
>> embedded structure)
>> - what storage engine ("backend") is in use
>> - how many machines are in the cluster
>> - the N-val, or how many replicas are being stored (default is 3)
>>
>> If you know those things, then you can make a more meaningful estimation.
>>
>> I hope that this helps.
>>
>> -Justin
>>
>>
>>
>>
>> On Thu, Jul 14, 2011 at 6:02 PM, Maria Neise wrote:
>>> Hey,
>>> I would like to store 20GB of data with Riak. Does anyone know how
>>> much memory Riak would need for that?
>>>
>>> Cheers,
>>> Maria
>>>


Re: Connection Pool with Erlang PB Client Necessary?

2011-07-26 Thread Justin Sheehy
The simplest guidance on client IDs that I can give:

If two mutation (PUT) operations could occur concurrently or without
awareness of each other, then they should have different client IDs.

As a result of the above: if you are sharing a connection, then you
should use a different client ID for each separate user of that
connection.
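
A minimal sketch with the Erlang PB client (host, port, and the ID scheme are 
illustrative); a random suffix is one way to make the ID distinct for each 
user of the connection:

    {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
    %% static prefix plus a random suffix, regenerated per checkout
    ClientId = <<"worker-", (crypto:rand_bytes(4))/binary>>,
    %% every subsequent PUT on this connection carries this ID
    riakc_pb_socket:set_client_id(Pid, ClientId).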

-Justin



Re: Connection Pool with Erlang PB Client Necessary?

2011-07-26 Thread Justin Sheehy
Yes, Andrew -- that is a fine approach to using a connection pool.

Go for it.

-Justin



On Tue, Jul 26, 2011 at 3:18 PM, Andrew Berman  wrote:
> Thanks for all the replies guys!
>
> I just want to make sure I'm totally clear on this.  Bob's solution
> would work well with my design.  So basically, this would be the
> workflow?
>
> 1.  check out connection from the pool
> 2.  set client id on connection (which would have some static and some
> random component)
> 3.  perform multiple operations (gets, puts, etc.) which would be seen
> as a single "transaction"
> 4.  check in the connection to the pool
>
> This way once the connection is checked out from the pool, if another
> user comes along he cannot get that same connection until it has been
> checked back in, which would meet Justin's requirements.  However,
> each time it's checked out, a new client id is created.
>
> Does this sound reasonable and in line with proper client id usage?
>
> Thanks again!
>
> Andrew
>
>
> On Tue, Jul 26, 2011 at 11:55 AM, Justin Sheehy  wrote:
>> The simplest guidance on client IDs that I can give:
>>
>> If two mutation (PUT) operations could occur concurrently or without
>> awareness of each other, then they should have different client IDs.
>>
>> As a result of the above: if you are sharing a connection, then you
>> should use a different client ID for each separate user of that
>> connection.
>>
>> -Justin
>>
>



Re: riak_core questions

2011-07-28 Thread Justin Sheehy
Hi, Dmitry.

A couple of suggestions...

The reason that you're not seeing an easy way to automatically have nodes be 
added or removed from the cluster upon going down or coming up is that we 
recommend strongly against such behavior.

The idea is that intentional (administrative) outages are very different in 
nature from unintentional and potentially transitory outages. We have explicit 
administrative commands such as "join" and "leave" for the administrative 
cases, making it very easy to add or remove hosts to a cluster. When a node is 
unreachable, you often can't automatically tell whether it is a host problem or 
a network problem and can't automatically tell if it is a long-term or 
short-term outage. This is why mechanisms such as quorums and hinted handoff 
exist: to ensure proper operation of the cluster as a whole throughout such 
outages. Consider the case where you have a network problem such that several 
of your nodes lose visibility to each other for brief and distinct periods of 
time. If nodes are auto-added and auto-removed then you will have quite a bit 
of churn and potentially a very harmful feedback scenario. Instead of 
auto-adding and auto-removing, consider using things like 
riak_core_node_watcher to decide which nodes to interact with on a 
per-operation basis.
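
A minimal sketch of that per-operation check (riak_kv being the service name 
Riak itself registers):

    %% nodes currently advertising the service, per the node watcher;
    %% choose a target from this list for each operation instead of
    %% permanently reshaping cluster membership
    UpNodes = riak_core_node_watcher:nodes(riak_kv).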

I'm also not sure what you mean by "if the master node goes down" since in most 
riak_core applications there is no master node. Of course you can create such a 
mechanism if you need it, but (e.g.) Riak KV and the accompanying applications 
do not have any notion of a master node and thus do not have any such concern.

I hope that this is useful.

Best regards,

-Justin





Re: riak_core questions

2011-07-28 Thread Justin Sheehy
Hi, Dmitry.

On Thu, Jul 28, 2011 at 12:22 PM, Dmitry Demeshchuk wrote:

> By master node, I mean the one that is used when we are joining new
> nodes using riak-admin (as far as I remember, only one node can be
> used for this).

You can use any node at all in the existing cluster for this purpose.
They are all effectively identical.

-Justin



Re: Getting a value: get vs map

2011-07-29 Thread Justin Sheehy
Jeremiah,

You were essentially correct. A "targeted" MR does not have to search
for the data, and does not slow down with database size. It is a
bucket-sweeping MR that currently has that behavior.

-Justin



On Fri, Jul 29, 2011 at 10:27 AM, Jeremiah Peschka wrote:
> I would have suspected that an MR job where you supply a Bucket, Key pair 
> would be just as fast as a Get request. Shows what I know.
> ---
> Jeremiah Peschka
> Founder, Brent Ozar PLF, LLC
>
> On Jul 29, 2011, at 1:37 AM, Antonio Rohman Fernandez wrote:
>
>> MapReduce (or simply a Map) gets really slow when the database has a 
>> significant amount of data (or is distributed over several servers). A Get 
>> instead is always faster, as Riak doesn't have to search for the key (you 
>> tell Riak exactly where to GET the data in your URL).
>>
>> Rohman
>>
>> On Thu, 28 Jul 2011 23:43:06 +0400, m...@mawhrin.net wrote:
>>
>>> Hi,
>>>
>>> (I looked at various places for the information, however I could not
>>> find anything that would answer the question.  It's not completely ruled
>>> out that not all places were checked though :))
>>>
>>> I use PB erlang interface to access the database.  Given a bucket name
>>> and a key, the value can easily be extracted using:
>>>
>>>     {ok, Object} = riakc_pb_socket:get(Conn, Bucket, Key),
>>>     Value = riakc_obj:get_value(Object)
>>>
>>> Alternatively, a mapred (actually, just map) request could be issued:
>>>
>>>     {ok, [{_, Value}]} = riakc_pb_socket:mapred(Conn, [
>>>         {Bucket, Key}
>>>     ], [
>>>         {map, {modfun, riak_kv, map_object_value}, none, true}
>>>     ])
>>>
>>> I would expect that the result is the same while in the second case, the
>>> amount of data transferred to the client is smaller (which might be good
>>> for certain situations).
>>>
>>> So the [open] question is: are there any reasons for using the first
>>> approach over the second?
>>>
>>> --
>>> Misha
>>>
>> --
>> Antonio Rohman Fernandez
>> CEO, Founder & Lead Engineer
>> roh...@mahalostudio.com
>> Projects: MaruBatsu.es | PupCloud.com | Wedding Album


Re: how can I trigger a manual merge?

2011-07-30 Thread Justin Sheehy
A direct call to bitcask:merge could force all of the files to be
processed, including the removal of expired entries. That won't happen
under normal Riak operation, as none of the merge triggers will be tripped
by your usage pattern, but you could certainly write a script to do it
directly.
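
A minimal sketch of such a script, run from "riak attach" (this assumes the 
default bitcask data_root; one merge per vnode directory):

    Root = "/var/lib/riak/bitcask",
    {ok, Dirs} = file:list_dir(Root),
    %% bitcask:merge/1 processes all data files in the directory,
    %% dropping expired and dead entries as it goes
    [bitcask:merge(filename:join(Root, D)) || D <- Dirs].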

-Justin



On Fri, Jul 29, 2011 at 7:36 PM, Steve Webb  wrote:
> So, I'm still working on an "insert and never delete" use of riak.  I'm
> expiring data after a certain amount of time, but from what I've heard/read,
> it's not possible to trigger a merge at all with my usage pattern.
>
> So, is there a way for me to write something in erlang or something that I
> can throw into cron to do periodic merges and clean things up?
>
> - Steve
>
> --
> Steve Webb - Senior System Administrator for gnip.com
> http://twitter.com/GnipWebb
>


Re: Understanding put if_not_modified.

2011-09-18 Thread Justin Sheehy
Hi, Igor.

Riak (quite intentionally, for availability reasons) does not provide any sort 
of global transactions or user-exposed locking. One result of this is that you 
can't do exactly what you tried -- or at least not that simply.

You might be interested in https://github.com/mochi/statebox 

-Justin





Re: automatically expiring keys with LevelDB?

2011-10-21 Thread Justin Sheehy

On Oct 21, 2011, at 4:22 PM, Nate Lawson wrote:

> I know Bitcask has the expiry_secs option for expiring keys, but what about 
> LevelDB? We're thinking of using Luwak as a file cache frontend to S3, and it 
> would be nice for older entries to be deleted in LRU order as we store newer 
> files. This could be implemented as a storage quota also (high/low water 
> mark).

There is no functionality like this in LevelDB at this time.

Also, I do not recommend using bitcask's expiry beneath Luwak unless you are 
prepared to deal with the fact that parts of a Luwak object might disappear 
before others.

-Justin





Re: Moving Riak bitcask directory

2011-11-23 Thread Justin Sheehy
Stephen,

That should work fine.

-Justin



On Nov 23, 2011, at 11:05 AM, Stephen Bennett  wrote:

> I want to move my Riak bitcask directory onto a different filesystem 
> partition in order to make use of more space that is available.
> 
> Is it as simple as:
> 
> 1. Stopping Riak
> 2. Moving the directory to the new partition
> 3. Sym-linking the directory to the old location
> 4. Starting Riak
> 
> Is there a better way to do this, and is there anything that I should be 
> looking out for when doing this?


Re: Bitcask won't merge without explicit merge() call

2011-12-12 Thread Justin Sheehy
Dmitry,

What you are seeing is Bitcask's normal behavior, though I can see why it 
might not be what you expected.

Bitcask does not quite auto-merge; instead it provides you with the tools to 
easily decide when a merge is needed, and to easily have a merge scheduled when 
you wish.

Does this example of usage clarify it for you? 
https://github.com/basho/riak_kv/blob/master/src/riak_kv_bitcask_backend.erl#L371-374

We should probably create better documentation for this aspect of Bitcask usage 
in any case.

-Justin







Re: Open ticket for configurable R-value in MapReduce?

2011-12-14 Thread Justin Sheehy
Elias,

On Dec 14, 2011, at 5:32 PM, Elias Levy wrote:

> If you add a node, that node will be empty.  If MR chooses the new node, the 
> choice of R=1 will cause it to think there is no data to process.  As time 
> goes on that node will gain new data or be populated by read-repair, but it 
> will still not have a complete data set until either all previous data has 
> been read, updated, or deleted.

That is not the case. The new node will be populated by its peers in order to 
fill up its newly-owned vnodes with the appropriate data.

> Just to confirm, you are saying that existing KV and Search data will be 
> redistributed within a cluster when you add a new node?  

That is indeed what he was saying, yes.

I hope that this clarification is helpful for you.

Best,

-Justin





Re: Python-riak links?

2011-12-24 Thread Justin Sheehy
They are just stored in the metadata field of the object; what you describe is 
roughly equivalent except that link traversal can occur without roundtrips 
between Riak and your client.

Justin



On Dec 24, 2011, at 11:38 AM, Shuhao Wu  wrote:

> How are the links implemented?
> 
> Would it be faster if I just store the unicode key in the db and look
> it up or should I use links instead?
> 
> Thanks,
> 
> Shuhao
> 


Re: Absolute consistency

2012-01-10 Thread Justin Sheehy

On Jan 10, 2012, at 9:42 PM, Les Mikesell wrote:

> How do things like mongo and elasticsearch manage atomic operations
> while still being redundant?

Most such systems use some variant of primary copy replication, also known as 
master/slave replication.

That approach can provide consistency, but has much weaker availability 
properties.

-Justin





Re: Delete old record

2012-01-19 Thread Justin Sheehy

On Jan 18, 2012, at 7:12 PM, kser wrote:

> Is there any way to delete old records??

This question could mean either of two things.

You can of course issue a delete request against any records you like, using 
any of Riak's APIs.

If you would instead like records to automatically be deleted when they are 
old, and you are using the Bitcask storage engine, you can configure it for 
expiry:
https://help.basho.com/entries/466512-how-can-i-automatically-expire-a-key-from-riak
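
A minimal sketch of the first option, an explicit delete, with the Erlang PB 
client (connection and names are illustrative):

    ok = riakc_pb_socket:delete(Conn, <<"old_bucket">>, <<"old_key">>).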

So, no matter which of the two questions you were asking -- the answer is "yes."

-Justin





licenses (was Re: riakkit, a python riak object mapper, has hit beta!(

2012-03-01 Thread Justin Sheehy
Hi, Andrey.

On Mar 1, 2012, at 10:18 PM, "Andrey V. Martyanov"  wrote:

> Sorry for GPL, it's a typo. I just don't like GPL-based licenses, including 
> LGPL. I think it's overcomplicated.

You are of course free to dislike anything you wish, but it is worth mentioning 
that the GPL and LGPL are very different licenses; the LGPL lacks the 
infectious aspects of the GPL.

There are many projects which could not use GPL code compatibly with their 
preferred license but which can safely use LGPL code.

Justin



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Question about the source code: riak_get_fsm

2010-04-13 Thread Justin Sheehy
Hi, Marc.

I understand your confusion as that code is a bit subtle.

The reason this isn't a bug is that upon receiving the very first
notfound in your situation, the  "FailThreshold" case in the clause
for notfound messages would return true -- since it would already know
that it could never get 3 ok responses after that.  The FSM would
immediately send a notfound to the client and would not wait for the
subsequent vnode responses.
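
In sketch form (this captures the idea, not the literal riak_get_fsm
code), with NumNotFound notfound replies received so far:

fail_threshold_reached(NumNotFound, N, R) ->
    %% Even if every vnode yet to reply answered ok, could we still
    %% reach R successes?  If not, send notfound to the client now.
    N - NumNotFound < R.

With R = N = 3, the very first notfound gives 3 - 1 = 2 < 3, so the
check returns true.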

I hope that this explanation was helpful.

Best,

-Justin



On Tue, Apr 13, 2010 at 9:00 AM, Marc Worrell  wrote:
> Hi,
>
> I was reading the source code of riak_get_fsm to see how failure is handled.
> I stumbled on a construction that I don't understand.
>
> In waiting_vnode_r/2 I see that:
> 1. on receiving an ok: there is a check if there are R ok replies
> 2. on receiving notfound: there is a check of there are R (ok + notfound) 
> replies
>
> Now suppose I have R = N = 3.
> And I get back from the nodes the sequence: [notfound, ok, ok]
> Then #state.replied_r = 2, and #state.replied_notfound = 1.
> This will let "waiting_vnode_r({r, {ok, RObj}, ...)" stay in the state 
> "waiting_vnode_r".
> Though we know we got an answer from all R (N) nodes, only a timeout will 
> move the fsm further.
>
> Could this be handled differently or am I missing something?
>
> - Marc
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Big Changes in Riak Tip

2010-04-14 Thread Justin Sheehy
On Wed, Apr 14, 2010 at 1:47 PM, Jonathan Lee  wrote:

> I'm having trouble building with the latest tip on OS X 10.6.  Does 0.10
> require Erlang R13B04?

Yes, it does.  That (and the reason for it) will be in the 0.10 release notes.

Our apologies for not making that clearer earlier.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: sidebar :: quick webmachine question

2010-04-19 Thread Justin Sheehy
Hi, Richard.

On Mon, Apr 19, 2010 at 12:08 PM, Richard Bucker  wrote:

> I read an article(from someone at basho) that said that WebMachine was going
> to be more public or something like that. In the meantime it has been forked
> several times and yet projects like riak integrate it. Other branches are
> many months old.
> So would the real webmachine please stand up.

Webmachine has been public for some time.

It has its own mailing list and repo:

http://www.basho.com/developers.html#Webmachine

http://lists.therestfulway.com/mailman/listinfo/webmachine_lists.therestfulway.com

http://hg.basho.com/webmachine

http://webmachine.basho.com/docs.html

I hope that those references help you to find what you need.  If you
have more questions, please feel free to ask them.  You might get even
more useful answers from the people on the Webmachine mailing list.

Cheers,

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: running naked : suggested firewall rules

2010-04-21 Thread Justin Sheehy
On Wed, Apr 21, 2010 at 8:27 AM, richard bucker  wrote:

> If a riak server is insecure in the DMZ then it's also insecure in the
> enterprise.

I might be misunderstanding what you mean by this.  I don't know of
any enterprises that think it is a good idea to run their Oracle
databases directly exposed to the general internet.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: setting default bucket props

2010-04-28 Thread Justin Sheehy
If the N value for the bucket is lower than the R or W value in a
request, then the request cannot succeed.  That sounds likely in this
case.  An upcoming release will provide more useful messages when
someone makes that particular client error.

-Justin



On Wed, Apr 28, 2010 at 12:35 PM, Matthew Pflueger
 wrote:
> Doing what Sean suggested worked (or just specifying the chash_fun in
> the default_bucket_props).  Now I'm running into weird behavior that
> I'm guessing is related to the n_val setting.  I'm running three nodes
> all on separate machines joined with a ring partition size of 64
> (22,21,21).  On a fourth machine I'm running a load test in which a
> process spawns 10 threads per node, each thread connecting to a one of
> the nodes via protobuffs getting and putting random key/values in one
> bucket.  In my previous tests I used the default settings for the
> bucket (n_val of 3) and everything ran smoothly for many hours.  Now
> I'm trying to set the default_bucket_props just changing the n_val to
> 1.  No errors in the logs and all clients connect successfully.
> However, pretty much all communication times-out which does not happen
> with the default bucket props (changing the n_val back to 3 fixes the
> problem).
>
> --Matthew
>
>
>
> On Wed, Apr 28, 2010 at 11:39, Sean Cribbs  wrote:
>> We used to have a function that would merge the values from app.config with
>> the hardcoded defaults for bucket properties.  I've opened an issue on
>> bugzilla for this problem (Bug 123). In the meantime, remove the stuff
>> you've set, start up the console, and run this in the Erlang shell:
>> application:get_all_env(riak_core).
>> From that output, copy the default_bucket_props and modify what you want.
>> Sean Cribbs 
>> Developer Advocate
>> Basho Technologies, Inc.
>> http://basho.com/
>> On Apr 28, 2010, at 10:57 AM, Matthew Pflueger wrote:
>>
>> Forgot to say I'm using riak-0.10.1...
>>
>> --Matthew
>>
>>
>>
>> On Wed, Apr 28, 2010 at 10:56, Matthew Pflueger
>>  wrote:
>>
>> I am trying to set the default n_val in my app.config.  I'm not
>>
>> getting any errors on startup but when a client tries to put some data
>>
>> a process crashes eventually causing a time-out on the client side...
>>
>> app.config part:
>>
>> [
>>
>>  %% Riak Core config
>>
>>  {riak_core, [
>>
>>              %% Default location of ringstate
>>
>>              {ring_state_dir, "data/ring"},
>>
>>              %% Default bucket props
>>
>>              {default_bucket_props, [{n_val, 1}]},
>>
>>
>> I'm seeing the following in the logs:
>>
>> sasl-error.log:
>>
>> =CRASH REPORT 28-Apr-2010::15:36:22 ===
>>
>>  crasher:
>>
>>    initial call: riak_kv_put_fsm:init/1
>>
>>    pid: <0.505.0>
>>
>>    registered_name: []
>>
>>    exception exit: {undef,[{riak_core_bucket,defaults,[]},
>>
>>                            {riak_core_util,chash_key,1},
>>
>>                            {riak_kv_put_fsm,initialize,2},
>>
>>                            {gen_fsm,handle_msg,7},
>>
>>                            {proc_lib,init_p_do_apply,3}]}
>>
>>      in function  gen_fsm:terminate/7
>>
>>    ancestors: [<0.504.0>]
>>
>>    messages: []
>>
>>    links: []
>>
>>    dictionary: []
>>
>>    trap_exit: false
>>
>>    status: running
>>
>>    heap_size: 1597
>>
>>    stack_size: 24
>>
>>    reductions: 475
>>
>>  neighbours:
>>
>> erlang.log.1
>>
>> =ERROR REPORT 28-Apr-2010::15:36:22 ===
>>
>> ** State machine <0.503.0> terminating
>>
>> ** Last event in was timeout
>>
>> ** When State == initialize
>>
>> **      Data  == {state,
>>
>>                     {r_object,<<"profiles">>,<<"DymvhHkDplIEmpowMdQ35Q">>,
>>
>>                         [{r_content,
>>
>>                              {dict,0,16,16,8,80,48,
>>
>>                                  {[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>>
>>                                   [],[]},
>>
>>
>>  {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>>
>>                                    [],[]}}},
>>
>>                              <<>>}],
>>
>>                         [{<<31,41,45,38>>,{1,63439684582}}],
>>
>>                         {dict,1,16,16,8,80,48,
>>
>>
>> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>>
>>                             {{[],[],[],[],[],[],[],[],[],[],
>>
>>                               [[<<"content-type">>,97,112,112,108,105,99,97,
>>
>>
>> 116,105,111,110,47,111,99,116,101,116,45,115,
>>
>>                                 116,114,101,97,109]],
>>
>>                               [],[],[],[],[]}}},
>>
>>                         <<4,155,69,121,249,86,125,168,81,201,133,2,65,248,
>>
>>                           238,53,23,1,40,242,226,220,30,37,113,164,204,34,
>>
>>
>> 199,41,155,198,77,100,101,234,83,233,181,96,207,10,
>>
>>                           ...lots more data...
>>
>> ** Reason for termination =
>>
>> ** {'function not exported',[{riak_core_bucket,defaults,[]},
>>
>>                             {riak_core_util,chash_key,1},
>>
>>                            

Re: setting default bucket props

2010-04-28 Thread Justin Sheehy
On Wed, Apr 28, 2010 at 1:38 PM, Matthew Pflueger
 wrote:

> Stupid question: Is there a way to set the default read values for a
> request on the server side when a client doesn't explicitly set them?

Not currently.  The defaults at this time are in the client libraries.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Hello, Bitcask!

2010-05-05 Thread Justin Sheehy
Riak Users,

You might have noticed that we released a new local key/value store
recently: http://blog.basho.com/2010/04/27/hello,-bitcask/

As of just now, it is available as a storage engine ("backend") in the
tip of the Riak repository.

You can use it like any other backend just by setting the
storage_backend application variable in the riak_kv application to
riak_kv_bitcask_backend (in your "app.config") on a fresh node so that
it will use Bitcask for storage.

There is a new application in app.config, "bitcask", for more detailed
configuration of bitcask behavior.  Some of the variables you can set
in there are:

data_root: string (required) - the directory for bitcask to use for
storage and metadata

merge_strategy: {hours, N} - perform a data file merge every N hours

sync_strategy: how to manage syncing of data files being written.  choices:
   none          (default) - let the O/S decide
   o_sync        - use the O_SYNC flag to sync each write
   {seconds, N}  - call bitcask:sync/1 every N seconds
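
Pulling those together, a bitcask section of app.config might look
like this (the values are illustrative, not recommendations):

{bitcask, [
           {data_root, "data/bitcask"},
           {merge_strategy, {hours, 4}},
           {sync_strategy, {seconds, 60}}
          ]}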

A couple of things aren't done yet, including more proactive
generation of hintfiles, faster startup time, smarter merge
strategies, more extensive testing on more platforms, documentation on
usage, and more.  We are not yet recommending this as a primary
production backend, but we expect to very soon.  Your feedback is
welcomed.

-Justin

p.s. -- it's not slow.

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Bitcask backend is very unstable on OS X 10.6.3

2010-05-08 Thread Justin Sheehy
Hello,

That error message is due to running out of filehandles.  I am
guessing that you have a large number of empty files in your bitcask
data directories.  If so, there are two pieces of information you may
find useful:

1 - it is safe to delete the empty files

2 - This will be addressed very soon, before bitcask is considered an
officially-supported backend.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Replication behavior

2010-05-13 Thread Justin Sheehy
Hi, Jimmy.

With an n_val of 3, there will be 3 copies of each data item in the
cluster even when there are fewer than 3 hosts.  With 2 nodes in that
situation, each node will have either 1 or 2 copies of each item.

Does that help with your understanding?

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: CAP controls

2010-05-13 Thread Justin Sheehy
Hi, Jeremy.

It sounds like an interesting project.  At this time, there is no way
to indicate in Riak that two nodes are actually on the same host (and
therefore should not overlap in replica sets).  It could certainly be
done, but to do so today would require modification to the ring
partition claim logic.

Best,

-Justin



On Thu, May 13, 2010 at 4:57 PM, Jeremy Hinegardner
 wrote:
> I am thinking about how to possibly replace an existing system that has heavy
> I/O load, low CPU usage, with riak.  Its a file storage system, with smallish
> files, a few K normally, but billions of them.
>
> The architecture, I think, would be one riak node per disk on the hardware,
> and probably run about 16 riak nodes per physical machine.  Say I had
> 4 of these machines, which would be 64 riak nodes.
>
> With something like this, if I set W=3 as a CAP tuning, I would want to make
> sure that at least 2 of those writes where on 2 physically different machines,
> so in case I had a hardware failure, and it took out a physical machine, I 
> could
> still operate with the other 3 machines.
>
> Is something like this possible with riak?
>
> enjoy,
>
> -jeremy
>
> --
> 
>  Jeremy Hinegardner                              jer...@hinegardner.org
>
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: returning multiple documents

2010-05-14 Thread Justin Sheehy
Hi, Gareth,

You've pretty much hit on it.  Either of your two options will work fine.

Regards,

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Recovering datas when a node was joining again the cluster (with all node datas lost)

2010-05-18 Thread Justin Sheehy
Hello, Germain.

You've already come across read-repair.  Between that and
hinted-handoff a great deal of passive anti-entropy is performed in a
Riak cluster.  As long as one doesn't use requests with R=N these
mechanisms are generally sufficient.

We do have plans for a more "active" anti-entropy as well, so that if
you know a given node has lost all of its data you can trigger a much
more efficient and immediate complete repair from the replicas in
other nodes.  (without needing external backups)  At this point, that
is only a plan and not a developed feature, so if you don't perform
backups then a trawling read-repair is your best bet in the case of a
complete loss of a node.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: cannot query bucket when a node is down

2010-06-01 Thread Justin Sheehy
On Tue, Jun 1, 2010 at 1:56 PM, Sam Tingleff  wrote:

> With no single point of failure there is no single index of keys. So
> the only way to get an exhaustive list of keys in a given bucket is to
> ask all nodes (I do not know if this is what riak is actually doing).

Sam is exactly right that Riak doesn't centralize anything and so
there is no collected index of keys.

However, you don't quite have to ask every node; you have to ask
enough nodes to know that you hit at least one replica of every
object.  This is what listing keys (GET /bucket) does.  There was a
bug that just recently got fixed that could in some cases cause the
whole listing to hang up due to a single misbehaving or down node,
depending on timing.  This fix was placed in tip this morning and the
fix will go out in the next release.

As long as you have enough nodes around that you could get every
object in the cluster using R=1, you should be able to list the keys
in a bucket.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: I need distributed file system ?

2010-06-03 Thread Justin Sheehy
Hello, Antoni.

Riak handles all the distribution for you, and generally expects to
store its data to a local filesystem.  You do not need or want any
sort of underlying distributed filesystem in addition to Riak.

Best,

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Switching of backends

2010-06-08 Thread Justin Sheehy
Germain,

If you have enough excess capacity that your cluster will be safe with
one less machine for a little while, you can do this another way.

Just "riak-admin leave" one machine, wait for it to hand off all of
its data, "riak stop", set up that machine with a new
install/config-file/backend/etc, and then start and join it as though
it was a brand new node.  Wait for it to get its share of data sent to
it in its new role, then repeat this process on the next node.
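
As a concrete sketch of that sequence (the node name is illustrative):

riak-admin leave                        # give up ownership; handoff begins
# ...wait for the node's data directory to empty out, then:
riak stop
# ...install and configure the new backend, then:
riak start
riak-admin join riak@node1.example.com  # rejoin as a brand-new node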

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Switching of backends

2010-06-08 Thread Justin Sheehy
On Tue, Jun 8, 2010 at 8:29 AM, Mårten Gustafson
 wrote:

> How would I know when a node has handed off all its data - would the
> status command report that it doesn't own any partitions?

Good question.  That won't quite do it, because the node will give up
ownership of the partitions first, and that will cause it to begin
pushing off that data to the new owners.

We hope to add a more obvious sign in the stats resource for this, but
for now the easiest way to tell is to just look at the disk usage in
the exiting node's data directory.  It should become empty when the
node completes handing off data.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: [ANN] Riak Release 0.11.0

2010-06-11 Thread Justin Sheehy
Hi, Germain.

On Fri, Jun 11, 2010 at 11:07 AM, Germain Maurice
 wrote:

> Because of its append-only nature, stale data are created, so, how does
> Bitcask to remove stale data ?

An excellent question, and one that we haven't yet written enough about.

> With CouchDB the compaction process on our data never succeed, too much
> data.
> I really don't like to have to launch manually this kind of process.

Bitcask's merging (compaction) process is automated and very tunable.
These parameters are the most relevant in your bitcask section of
app.config:

(see the whole thing at http://hg.basho.com/bitcask/src/tip/ebin/bitcask.app)

%% Merge trigger variables. Files exceeding ANY of these
%% values will cause bitcask:needs_merge/1 to return true.
%%
{frag_merge_trigger, 60},  % >= 60% fragmentation
{dead_bytes_merge_trigger, 536870912}, % Dead bytes > 512 MB

%% Merge thresholds. Files exceeding ANY of these values
%% will be included in the list of files marked for merging
%% by bitcask:needs_merge/1.
%%
{frag_threshold, 40},  % >= 40% fragmentation
{dead_bytes_threshold, 134217728}, % Dead bytes > 128 MB
{small_file_threshold, 10485760},  % File is < 10 MB

Every few minutes, the Riak storage backend for a given partition will
send a message to bitcask, requesting that it queue up a possible
merge job.  (only one partition will be in the merge process at once
as a result of that queue)  The bitcask application will examine that
partition when that request reaches the front of the queue.  If any of
the trigger values have been exceeded, then all of the files in that
partition which exceed any threshold values will be run through
compaction.

This allows you a great deal of flexibility in your demands, and also
provides reasonable amortization of the cost since each partition is
processed independently.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: [ANN] Riak Release 0.11.0

2010-06-14 Thread Justin Sheehy
Hi, Alan.

Your replicas do in fact exist on both nodes.  However, I understand
that the situation you are observing is confusing.  I will attempt to
explain.

Quite some time ago, something surprising was noticed by some of our
users during their pre-production testing.  Some intentional failure
scenarios (with busted nodes, etc) would fail much more slowly when
R=1 than when R=2.  This was due to the fact that to satisfy a R=1
request with a non-object response (timeout or notfound), we would
wait for all N nodes to reply.  With R=2, we could send this response
as soon as N-1 nodes reply.  In some situations this is a dramatic
difference in time.

To remove this perceived problem we implemented what we refer to as
"basic quorum".  If a simple majority of vnodes have produced
non-successful internal replies, we return a non-success value such as
a notfound.  This means that if there is only one copy of the object
out there, and the node holding it is slowest to respond, the client
will not see that object in their response but will instead get the
notfound instead of waiting for the last node to respond or time out.
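
As a sketch of that rule (not the literal get-FSM code), with N
replicas and NumNotFound non-successful replies so far:

basic_quorum_notfound(NumNotFound, N) ->
    %% A simple majority of the N vnodes have replied notfound, so
    %% reply notfound without waiting for the remaining stragglers.
    NumNotFound >= (N div 2) + 1.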

(note that read-repair will still occur in any case)

This could be avoided if we considered "not found" to be a success
condition, but then in the above situation you would see not founds
even with R=2.  That would simply be defined as another kind of
"successful" response.  Either way, it is a tradeoff of different
kinds of surprise.

I hope that this explanation helps with your understanding.

On another note, it's not useful to run Riak with a number of physical
hosts less than your N value unless you're planning on expanding it
soon.  So: testing with 2 hosts and N=3 means that you are testing
against a very much not-recommended configuration.  I suggest either
using more hosts or else changing your default bucket N value to 2.

-Justin


On Mon, Jun 14, 2010 at 1:59 PM, Alan McConnell  wrote:
> Hey Dan,
> I have a 2-node cluster with default bucket settings (N=3, etc.), and if I
> take one of the boxes down (and perform reads with R=1) I get tons of "key
> not found" errors for keys I know exist in the cluster.  Seems like for many
> keys, all 3 replicas live on one host.  From what you've written here
> though, it seems like that should not happen.  Do you know of any way my
> cluster could have gotten into this state?
> I did run a restore on this cluster using a riak-admin backup from a
> different, single-node cluster.  I wonder if that caused an uneven
> distribution.
> Any help would be appreciated.  As it stands now our 2-node cluster has
> serious read problems if either node goes down.
> -Alan

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Recap for 6/10 - 6/13

2010-06-22 Thread Justin Sheehy
Hi, Joel.  Thanks for your input!

On Mon, Jun 14, 2010 at 4:17 PM, Joel Pitt  wrote:

> [re bitcask and in-memory data]

> I'm sure it's probably already been considered, but just in case...
> bloom filters could be an alternative to the requirement of keeping
> *all* the keys in memory. I don't know if this would necessarily fit
> with the usage of this in-memory key/metadata data structure though.

We are exploring ways to keep bitcask's overall performance profile
while relaxing the memory requirement a bit, though we have not yet
determined how to do so.

Bloom filters can be incredibly handy, but wouldn't (alone) solve this
problem as we use the in-memory hash table to tell bitcask the
location of the stored value in terms of file and offset.
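
To make that concrete, each in-memory entry looks roughly like this
(a sketch based on the bitcask design, not its literal source):

-record(keydir_entry, {file_id,    % which data file holds the value
                       value_sz,   % size of the stored value in bytes
                       value_pos,  % offset of the value in that file
                       tstamp}).   % timestamp of the write

A bloom filter could answer "might this key exist?" but not "where is
the value?", and the latter is the question the keydir answers on
every read.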

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Best way to back-up riak

2010-07-11 Thread Justin Sheehy
Hi, Jan.

On Sun, Jul 11, 2010 at 8:53 PM, Jan Vincent  wrote:

> Given that riak is new in the database field, if ever I use riak in 
> production,
> what would be the best way to back it up? I know that there's redundancy
> on the different nodes and NRW may be modifiable per request, but I'm
> wondering if there's a way to snapshot the dataset periodically -- at least
> until riak becomes provably battle tested.

Riak is fairly battle-tested already: we were using its prior version
under Basho's own customer-facing applications in 2008, and a number
of external customers and users are in production today.  That said,
even a solid distributed database needs to be backed up as there are
many reasons to have backups.

The easiest and best way to back up Riak is, if you are using bitcask
(the default) as the backend, to simply back up the filesystem of your
nodes with whatever backup system you use for the rest of your
systems.  Bitcask uses append-only files, and once it closes a file it
will never change the content of that file again.  This makes it very
backup-friendly.

If you are using a backend with less backup-friendly disk format (such
as innostore) then you can use the "riak-admin backup" command at
either the per-node or whole-cluster level to produce a
backend-independent snapshot that can be loaded back in via
"riak-admin restore".  This method is much slower, will impose
additional load on your cluster when running, and requires that you
have a place to put the generated snapshot.  However, it will work
regardless of backend and is also a simple if heavyweight way to
migrate to a cluster with a different configuration.
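
The invocations look roughly like this (node name, Erlang cookie, and
path are all illustrative):

riak-admin backup riak@node1.example.com riak /backups/riak-snapshot all
riak-admin restore riak@node1.example.com riak /backups/riak-snapshot

where the trailing "all" asks for a whole-cluster snapshot rather than
a single node's.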

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Conflict Resolution

2010-07-13 Thread Justin Sheehy
Hello, Misha.

On Tue, Jul 13, 2010 at 1:06 PM, Misha Gorodnitzky  wrote:

> From doing a little testing, the last value in a multipart document is
> the first, so "Thursday" in this case, can we assume that this will
> always be the case? And is it a good idea to base conflict resolution
> on this?

It is not really a good idea to base conflict resolution on the order
that Riak presents the siblings.  While in simple cases you may see
predictable behavior, there is no guarantee of determinism in the
order they'll be stored in.

I suggest instead that if you need an interesting conflict resolution
strategy, you might do well to store the information needed for that
strategy explicitly in the object along with the content.

I hope that this helps.  Please do ask more if this doesn't clear it up for you.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: riak slides?

2010-07-13 Thread Justin Sheehy
Hi, Wilson.

There are many sets out there.  Which ones suit you best depends a lot
on what you plan on saying in your talk.  If you tell us a bit about
the audience, the event, and what you hope to get across in your talk,
then I bet that one of the people here who has given a Riak talk will
have material useful for you to crib from.

Cheers,

-Justin



2010/7/13 Wilson MacGyver :
> Hi,
>
> I'm going to be giving a talk on riak sometime soon. Anyone has slides
> I can steal/borrow? :)
>
> Thanks
>
> --
> Omnem crede diem tibi diluxisse supremum.
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Conflict Resolution

2010-07-14 Thread Justin Sheehy
On Wed, Jul 14, 2010 at 5:25 AM, Misha Gorodnitzky  wrote:

> I don't suppose there are any examples anywhere of how people have
> approached conflict resolution with RIak? That would be useful to help
> people understand how to approach it ... maybe a section on the wiki
> could be dedicated to it.

This is a great idea.  We'll find the right place to present this so
that it's easy to find.

> In our particular case, we're trying to store transactions in Riak and
> need to guard against a transaction being placed on a balance that has
> reached 0. The problem we keep running into is race conditions between
> when we record the transaction and when we update the cached balance
> value. Any suggestions on how this has been, or can, be solved would
> be appreciated.

I suggest that you solve this similarly to the way that banks have
been doing so for far longer than there have even been computers
involved.  Each transaction should be (at least) a unique identifier,
a time, and the amount being added or subtracted to the balance.  This
way (in addition to storing what you believe the balance to be at any
time) you can reconcile balances even if you get some transactions
late or multiple times.
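
For instance, a stored transaction might look like this (the field
names are purely illustrative):

{"txn_id": "a81f02c4", "time": "2010-07-14T09:25:00Z", "amount": -2500}

Replaying a set of such records yields the same balance regardless of
the order or repetition of delivery, as long as each txn_id is counted
only once.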

More specifics than that will depend a lot on your application, but
the key here is that you can make things much neater in situations
where your actions can be commutative and idempotent.  That's why you
store the transaction itself instead of just the balance, and a unique
id so that you don't repeat yourself.

Best of luck,

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Expected vs Actual Bucket Behavior

2010-07-20 Thread Justin Sheehy
Hi, Eric!  Thanks for your thoughts.

On Tue, Jul 20, 2010 at 12:39 PM, Eric Filson  wrote:

> I would think that this requirement,
> retrieving all objects in a bucket, to be a _very_ common
> place occurrence for modern web development and perhaps (depending on
> requirements) _the_ most common function aside from retrieving a single k/v
> pair.

In my experience, most people try to write applications that avoid
selecting everything from a whole bucket/table/whatever as a frequent
operation, but different people have different requirements.
Certainly, it is sometimes unavoidable.

> In my mind, this seems to leave the only advantage to buckets in this
> application to be namespacing... While certainly important, I'm fuzzy on
> what the downside would be to allowing buckets to exist as a separate
> partition/pseudo-table/etc... so that retrieving all objects in a bucket
> would not need to read all objects in the entire system

The namespacing aspect is a huge advantage for many people.  Besides
the obvious way in which that allows people to avoid collisions, it is
a powerful tool for data modeling.  For example, sets of 1-to-1
relationships can be very nicely represented as something like
"bucket1/keyA, bucket2/keyA, bucket3/keyA", which allows related items
to be fetched without any intermediate queries at all.

One of the things that many users have become happily used to is that
buckets in Riak are generally "free"; they come into existence on
demand, and you can use as many of them as you want in the above or
any other fashion.  This is in essence what conflicts with your
desire.  Making buckets more fundamentally isolated from each other
would be difficult without incurring some incremental cost per bucket.

> I might recommend a hybrid
> solution (based in my limited knowledge of Riak)... What about allowing a
> bucket property named something like "key_index" that points to a key
> containing a value of "keys in bucket".  Then, when calling GET
> /riak/bucket, Riak would use the key_index to immediately reduce its result
> set before applying m/r funcs.  While I understand this is essentially what
> a developer would do, it would certainly alleviate some code requirements
> (application side) as well as make the behavior of retrieving a bucket's
> contents more "expected" and efficient.

A much earlier incarnation of Riak actually stored bucket keylists
explicitly in a fashion somewhat like what you describe.  We removed
this as one of our biggest goals is predictable and understandable
behavior in a distributed systems sense, and a model like this one
turns each write operation into at least two operations.  This isn't
just a performance issue, but also adds complexity.  For instance, it
is not immediately obvious what should be returned to the client if a
data item write succeeds but the read/write of the index fails.

Most people using distributed data systems (including but not limited
to Riak) do explicit data modeling, using things like key identity as
above, or objects that contain links to each other (Riak has great
support for this) or other data modeling means to plan out their
expected queries in advance.

> Anyway, information is pretty limited on riak right now, seeing as how it's
> so new, but talk in my development circles is very positive and lively.

Please do let us know any aspects of information on Riak that you
think are missing.  We think that between the wiki, the web site, and
various other materials, the information is pretty good.  Riak's been
open source for about a year, and in use longer than that; while there
are many things much older than Riak, we don't see relative youth as a
reason not to do things right.

Thanks again for your thoughts, and I hope that this helps with your
understanding.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Expected vs Actual Bucket Behavior

2010-07-21 Thread Justin Sheehy
I think that we are all (myself included) getting two different issues
a bit mixed up in this discussion:

1: storing an implicit index of keys in the Riak key/value store

2: making buckets separate in that a per-bucket operation's
performance would not be affected by the content of other buckets

The thread started out with a request for #2, but included a
suggestion to do #1.  These are actually two different topics.

The first issue, implicitly storing a big index of keys, is
impractical in a distributed key/value storage system that has Riak's
availability goals.  We are very unlikely to implement this as
described in the near future.  However, we very much recognize that
there are many different ways that people would like to find their
data.  In that light, we are working on multiple different efforts
that will use the Riak core to provide data storage with more than
just "simple" key/value access.

The second issue, of isolating buckets, is a much simpler design
choice and is also a per-backend implementation detail.  We can create
and provide an alternative bitcask adapter that does this.  It will be
a real tradeoff: in exchange for buckets not impacting each other as
much, the system will consume more filehandles, be a bit less
efficient at rebalancing, and will generally make buckets no longer
"free".  This is a reasonable tradeoff in either direction for various
applications, and I support making it available as a choice.  I have
created a bugzilla entry to track it:
https://issues.basho.com/show_bug.cgi?id=480

I hope that this helps to clarify the issue.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Expected vs Actual Bucket Behavior

2010-07-21 Thread Justin Sheehy
Hi, Alexander.

On Wed, Jul 21, 2010 at 1:36 PM, Alexander Sicular  wrote:

> uses a separate bitcask per-bucket per-partition. What is a partition here? A
> vnode or a physical host or something else?

My apologies.  Given that it was in our bugzilla I let myself use some
Riak-internals jargon without explanation.

In this context, a partition is a logical segment of the ring space,
managed by a vnode process on a given physical host.  There is a
1-to-1 mapping between a vnode process and a partition.

The idea is that right now the bitcask backend stores all data in a
given partition together in a single bitcask instance.  The
alternative backend under discussion would break that up, such that
within a partition (and thus in each vnode), there would be a bitcask
instance for every bucket that had any data.

Does that help to clarify?

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Is it inefficient to map over a small bucket when you have millions of other buckets?

2010-07-27 Thread Justin Sheehy
On Tue, Jul 13, 2010 at 6:02 AM, Nicolas Fouché  wrote:

> Giving just a bucket WILL traverse the entire keyspace.

You may be interested in:

https://issues.basho.com/show_bug.cgi?id=480

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Best way to back-up riak

2010-07-27 Thread Justin Sheehy
On Wed, Jul 21, 2010 at 2:01 PM, Alan McConnell  wrote:

> I'm curious about this as well.  Say I have a ten node cluster.  Could I
> just schedule a midnight copy of each bitcask data directory every night,
> then restore to another ten node cluster by dropping one of each data
> directories on each new node?  How close does the timing needs to be?  What
> if the data directory snapshots were taken seconds or minutes apart?

While Basho does provide a product including features that make
whole-datacenter failure much less of a problem (by fully replicating
to a cluster in another location) I will answer assuming you have only
a single cluster.

The timing doesn't have to be perfectly synchronized, but you should
try to make it as close as is practical just so that you have a good
way to judge what is contained in a given backup.  If a storage (put)
operation occurs in an interval between single-node backups, it will
be present in the restored cluster when requested (and repopulated via
read-repair) as long as it was in at least one of the nodes.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Use of fallback nodes for get requests?

2010-08-02 Thread Justin Sheehy
Hi, Nico.

On Mon, Aug 2, 2010 at 1:19 PM, Nico Meyer  wrote:

> What I mean is, if I do a get request for a key with R=N, and one of the
> first N nodes in the preflist is down the request will still succeed.
> Why is that? Doesn't that undermine the purpose of seting R to a high
> number (specifically setting it to N)? That way a request might succeed
> even if all primary nodes responsible for the key are unavailable.

You are correct, and this is intentional.  There is nothing in the R
or W settings that is intended to indicate anything at all about
"primary" nodes.  It is rather simply the number of successful
responses that the client wishes to wait for, and thus the degree of
quorum sought before a client reply is sent.  Using fallback nodes to
satisfy reads is a natural result of using fallback nodes to satisfy
writes.

If all primary nodes responsible for a key are unavailable, but enough
of the fallback nodes for that key have received a value for that key
since they went unavailable (through a fallback write) then a request
to get that key might succeed.  I am not sure why you see this as a
bad thing.

(It will only succeed if R nodes actually provide a successful result,
not just if they are available.)

> On a similar note, why is the riak_kv_get_fsm waiting for at least
> (N/2)+1 responses, if there are only not_found responses, effectively
> ignoring a smaller R value of the request if the key does not exists?

This is a compromise to deal with real situations that can occur where
a single node might be taking a very long time to reply, and a value
has never been stored for a given key.  Without either this basic
quorum default for notfounds or alternately considering a notfound as
success and thus only waiting for R of them, that situation would mean
that an R=1 request would take much longer to complete than an R=2
request (due to waiting for the slow node) which is confusing to most
users.  Note that since it applies to notfounds, this tends to only
come into play for items that have never been successfully stored with
at least a basic quorum -- things that really are not present, that
is.

> My guess was, that this also has to do with the use of fallback nodes:
> Since the partition will usually be very small on the fallback/handoff
> node, it is likely to be the first to answer. So to avoid returning
> false not_found responses, a basic quorum is required.
> Am I on the right track here?

It doesn't have anything to do with fallback nodes explicitly.  It is
for situations where a node is under any condition that will slow it
down significantly.  In such situations, there is little to be gained
in waiting for all N replies if (N/2)+1 have already declared
notfound.

> The problem is, this is imposed even for the case that all nodes are up.
> If one requires very low latency or very high availability (that's why
> one uses a small R value in the first place) and does a lot of gets for
> non existent keys, riak silently screws you over by raising R for those
> keys.

It seems that there is something here worth clarifying.  If you are
issuing requests with W+R<=N, and some reads following writes return
notfound during an interval immediately following initial storage
time... well, that's what you asked for by not requesting a quorum.
If you store the object with a sufficiently high W value first, then
you will not get this sort of notfound response even if your R value
is only 1.

I suppose that providing the freedom to do this might be considered
"screwing you over," but we see it more as allowing you to make
different choices while still providing safe and unsurprising default
behavior.  If you try hard enough to screw yourself over, though, Riak
won't stop you.  If you issue write requests (to any dynamo-model
system) with some W, followed immediately by a read request with some
R, and W+R is not greater than N, you should not be expecting the
write to necessarily be reflected yet.
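
To make that concrete with N=3: W=2 and R=2 gives W+R=4 > 3, so any
read quorum must overlap at least one replica that acknowledged the
write.  W=1 and R=1 gives W+R=2 <= 3, so a read can land entirely on
replicas that the write has not reached yet.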

> I most likely missed something here, but some ad hoc test I did seem to
> be consistent with my understanding of the code.

You have certainly put some real effort into understanding some
choices made in the Riak code, which I appreciate.  I hope that I have
helped to extend your understanding of the real operational scenarios
that have motivated those choices, and how the code will behave in
those scenarios.

Best,

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Heterogeneity

2010-08-21 Thread Justin Sheehy
Hi, Michael.

On Tue, Aug 17, 2010 at 12:52 PM, Michael Russo  wrote:

> In the Dynamo design, the number of vnodes per physical node can be tweaked
> to satisfy the heterogeneity principle.

> Is there any way to do something similar with Riak?

This is something that we think is an important idea, and that the
underlying structure of Riak can work fine with.  However, simply due
to prioritization we have not yet made this easy to do, and doing it
effectively is not simple from a user's point of view.  I do
not know of any production clusters at this time that use anything
other than the standard near-equal distribution of vnodes.

I do expect that explicitly configuring different nodes to have
different "weight" will be enabled in a future release, but it is not
currently on anyone's scheduled plans that I know of.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: failed to merge?

2010-08-21 Thread Justin Sheehy
Hi, Wilson.

On Sat, Aug 21, 2010 at 10:06 PM, Wilson MacGyver  wrote:

> =ERROR REPORT 
> Failed to merge
> follow by a bunch of list of bitcask files
>
> with final status
>
> : no_files_to_merge
>
> how does this happen, does this mean some files in the bitcask are missing?

That's just an overenthusiastic message, and nothing to worry about.
It was a very useful thing to see when doing the initial bitcask
integration/backend into Riak.  The message will cease to appear from
your error log in a subsequent release.

All it means is that one merge was scheduled while another was
running, so the first one did all the work and the second had nothing
to do.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


list_keys is less bad

2010-08-23 Thread Justin Sheehy
Riak Users,

One aspect of Riak's interface that has often been discouraged in the
past is the listing of all keys in a bucket.  This has been for two
reasons: the first is that it is necessarily an operation that is more
heavyweight than any of the more targeted get/put/delete sorts of
things, but the second is that due to the priorities of the first many
users of Riak we hadn't really put much optimization into that area.
As a result, anything that required getting all keys from a bucket was
fairly slow and also fairly heavy in terms of memory consumption.

We have put some effort into this recently and seen marked
improvement.  The changes can be summed up as:

1- bitcask has a new fold_keys operation, which performs far less I/O
in most cases than the previous mechanism underlying list_keys.

2- the Riak backend interface to bitcask uses the new fold_keys operation.

3- the mechanism underlying the cluster-wide list_keys operation has
changed to require far less total memory in proportion to the list.

Due to these three changes, there are two effective results:

1- In nearly all cases, the list_keys operator is much faster than
before.  In some common cases it is 10 times faster.

2- In cases of very large buckets, memory allocation will not spike
during key listing. (though of course if you ask Riak to build the
whole list for you instead of streaming it out, then at least that
much memory must be used to hold it)

Note that since map/reduce uses the streaming list_keys under the hood
when performing map/reduce over a whole bucket, these changes affect
that interface's performance as well.

The described changes are now in the trunks of the relevant
repositories, and will be included in the next release.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: list_keys is less bad

2010-08-23 Thread Justin Sheehy
On Mon, Aug 23, 2010 at 10:05 PM, Alexander Sicular  wrote:

> Three cheers!

:-)

> Git clone && make all && make rel

It looks like they haven't yet migrated out to the github repos, but
should do so sometime soon.

In the meantime, the bitbucket repos are up to date with tip so you
can get the bleeding edge from there.

Sorry if there was any confusion there.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Filesize in riak

2010-09-04 Thread Justin Sheehy
Hi, John.

On Thu, Sep 2, 2010 at 11:24 AM, John Axel Eriksson  wrote:
> I know the recommendation of max 50 megs per file in riak currently... but I 
> tried
> uploading a file that was around 120 megs and everything went fine.

Riak doesn't itself mandate a maximum object size... but since a
riak_object in transit must be materialized into an in-memory data
structure and copied across processes, large objects can cause very
poor performance or failure.  The exact practical maximum can vary a
bit.

There is some (prototyped but not yet fully integrated) work that,
when released, will allow you to store large objects easily by
transparently chunking them into a hash tree of smaller objects.  More
details on this when it lands.

I hope that this information is helpful.

Best,

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak and no of clients limit?

2010-09-04 Thread Justin Sheehy
Hello, Senthilkumar.

On Fri, Sep 3, 2010 at 4:28 PM, Senthilkumar Peelikkampatti wrote:

>    I am using Riak with distributed Erlang and I wanted to know what's
> the limit on # of riak clients (I used it before erlang pb client, so yet to
> migrate). I am using single client to talk to Riak, is it better? or in web,
> is it ok to create a client per request? I looked riak_kv_wm_raw.erl which
> seems using a connection per request but it is a erlang local_client.

There is not a fixed limit imposed by Riak, but it is a general good
practice to re-use clients for subsequent (non-concurrent) requests.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak and no of clients limit?

2010-09-04 Thread Justin Sheehy
Hi, Seth.

On Sat, Sep 4, 2010 at 5:59 PM, Seth Falcon  wrote:

> I'm working on a project where we have a webmachine-backed service
> that talks to Riak.  I currently initialize one pb client for each
> node in the cluster as part of the webmachine startup.  Then the
> resources in the webmachine app ask for one of these clients for each
> request.
>
> Your comment above about reusing clients for non-concurrent requests
> makes me wonder if this is the wrong approach.  Comments or
> suggestions?

Each instantiated riak_client has a unique client-id that will
represent that client in all updates (put-requests) that it makes.
That is, the entries in the vector clock will match that client-id.
Much of the value of vector clocks can vanish if concurrent writes to
the same values can be issued with the same client-id.

Sharing connections as you describe might be fine, depending on the
details.  However, if your resources might overlap in a way like the
following example then you probably have a problem.

A and B are resource instances handling separate concurrent HTTP
requests but sharing a client-id C.
A issues get(K), receiving object X with vector clock V
B issues get(K), receiving object X with vector clock V
A issues put(K,Xa) where Xa is an update to X
B issues put(K,Xb) where Xb is an update to X

You can lose one of the two updates, as they are both a single update
to V from client C.  It is assumed that a given client will not
compete with itself for updates.

I hope that this explanation is helpful.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak and no of clients limit?

2010-09-04 Thread Justin Sheehy
On Sat, Sep 4, 2010 at 7:31 PM, Seth Falcon  wrote:

> Given that, it sounds like one would want a pool of pb clients such
> that each resource takes a client out of the pool when handling a
> request and returns it when done.  So there would be no concurrent
> requests going through the same client.
>
> Does that seem like a reasonable approach?

Yes.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Listing large key spaces, and bucket Links header

2010-09-06 Thread Justin Sheehy
Hi, Gavin.

A couple of things you may be interested in:

 - There have been improvements in both Bitcask and Riak since 0.12.1
(in tip of trunk and will be in the next release) to speed up (and
reduce the resource consumption of) key listing.

 - You should probably use keys=stream in your requests instead of
keys=true, to avoid the full keylist (and Link header) being built up
all at once.
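
For example (host, port, and bucket name are illustrative):

curl http://localhost:8098/riak/mybucket?keys=stream

streams the keylist back in chunks rather than assembling the whole
thing in memory first.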

Between those two items, you may have all that you need.  I hope this helps.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: File descriptor leaks?

2010-10-18 Thread Justin Sheehy
Hi, Dmitry.

What version of Riak are you using?  And is there anything interesting
in the error logs?

-Justin




On Thu, Oct 14, 2010 at 7:53 AM, Dmitry Demeshchuk  wrote:
> A small update. I've just encountered the same problem. Just about 3-4
> hours have passed.
>
> lsof | wc -l showed only about 2k descriptors for all users. That's
> even more weird as the 32k descriptors limit is per user. So, we
> haven't reached the limit so far.
>
> On Thu, Oct 14, 2010 at 3:48 PM, Dmitry Demeshchuk  
> wrote:
>> Greetings.
>>
>> We have recently started to get the emfile errors. ulimit -n is 32767.
>> Restarting Riak helps for several hours and then we run out of
>> descriptors again.
>>
>> Some time later after restart I performed lsof and found the following
>> descriptors:
>>
>> kondemand   154       root  cwd   unknown
>> /proc/154/cwd (readlink: Permission denied)
>> kondemand   154       root  rtd   unknown
>> /proc/154/root (readlink: Permission denied)
>> kondemand   154       root  txt   unknown
>> /proc/154/exe (readlink: Permission denied)
>> kondemand   154       root NOFD
>> /proc/154/fd (opendir: Permission denied)
>> kondemand   155       root  cwd   unknown
>> /proc/155/cwd (readlink: Permission denied)
>> kondemand   154       root  cwd   unknown
>> /proc/154/cwd (readlink: Permission denied)
>> kondemand   154       root  rtd   unknown
>> /proc/154/root (readlink: Permission denied)
>> kondemand   154       root  txt   unknown
>> /proc/154/exe (readlink: Permission denied)
>> kondemand   154       root NOFD
>> /proc/154/fd (opendir: Permission denied)
>> kondemand   155       root  cwd   unknown
>> /proc/155/cwd (readlink: Permission denied)
>> kondemand   155       root  rtd   unknown
>> /proc/155/root (readlink: Permission denied)
>> kondemand   155       root  txt   unknown
>> /proc/155/exe (readlink: Permission denied)
>> kondemand   155       root NOFD
>> /proc/155/fd (opendir: Permission denied)
>> kondemand   156       root  cwd   unknown
>> /proc/156/cwd (readlink: Permission denied)
>> kondemand   156       root  rtd   unknown
>> /proc/156/root (readlink: Permission denied)
>> kondemand   156       root  txt   unknown
>> /proc/156/exe (readlink: Permission denied)
>> kondemand   156       root NOFD
>> /proc/156/fd (opendir: Permission denied)
>> kondemand   157       root  cwd   unknown
>> /proc/157/cwd (readlink: Permission denied)
>> kondemand   157       root  rtd   unknown
>> /proc/157/root (readlink: Permission denied)
>> kondemand   157       root  txt   unknown
>> /proc/157/exe (readlink: Permission denied)
>> kondemand   157       root NOFD
>> /proc/157/fd (opendir: Permission denied)
>> kondemand   158       root  cwd   unknown
>> /proc/158/cwd (readlink: Permission denied)
>> kondemand   158       root  rtd   unknown
>> /proc/158/root (readlink: Permission denied)
>> kondemand   158       root  txt   unknown
>> /proc/158/exe (readlink: Permission denied)
>>
>> Also, the following couple of descriptors is opened several times at
>> the same time:
>>
>> bash      20176        dem  mem       REG     252,0   256316   1179925
>> /usr/lib/locale/en_US.utf8/LC_CTYPE
>> bash      20176        dem  mem       REG     252,0       54   1179926
>> /usr/lib/locale/en_US.utf8/LC_NUMERIC
>> bash      20176        dem  mem       REG     252,0     2454   1179927
>> /usr/lib/locale/en_US.utf8/LC_TIME
>> bash      20176        dem  mem       REG     252,0   966938   1179928
>> /usr/lib/locale/en_US.utf8/LC_COLLATE
>> bash      20176        dem  mem       REG     252,0      286   1179929
>> /usr/lib/locale/en_US.utf8/LC_MONETARY
>> bash      20176        dem  mem       REG     252,0       52   1179930
>> /usr/lib/locale/en_US.utf8/LC_MESSAGES/SYS_LC_MESSAGES
>> bash      20176        dem  mem       REG     252,0       34   1179931
>> /usr/lib/locale/en_US.utf8/LC_PAPER
>> bash      20176        dem  mem       REG     252,0       77   1179932
>> /usr/lib/locale/en_US.utf8/LC_NAME
>> bash      20176        dem  mem       REG     252,0      155   1179933
>> /usr/lib/locale/en_US.utf8/LC_ADDRESS
>> bash      20176        dem  mem       REG     252,0       59   1179934
>> /usr/lib/locale/en_US.utf8/LC_TELEPHONE
>> bash      20176        dem  mem       REG     252,0       23   1179935
>> /usr/lib/locale/en_US.utf8/LC_MEASUREMENT
>> bash      20176        dem  mem       REG     252,0    26048    917676
>> /usr/lib/gconv/gconv-modules.cache
>> bash      20176        dem  mem       REG     252,0      373   1179936
>> /usr/lib/locale/en_US.utf8/LC_IDENTIFICATION
>>
>> Version of Riak is 0.12.1. There was a similar problem once and the
>> user was advised to make sure to use 0.12.1
>>
>> Any ideas?
>>
>> --
>> Best regards,
>> Dmitry Demeshchuk
>>
>
>
>
> --
> Best regards,
> Dmitry Demeshchuk
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>

_

Re: File descriptor leaks?

2010-10-24 Thread Justin Sheehy
Hi, Dmitry.

On Mon, Oct 18, 2010 at 11:07 PM, Dmitry Demeshchuk
 wrote:

> We are using 0.12.1.

There was indeed a file descriptor leak in that version of Riak, fixed
between then and the 0.13 release.

I hadn't seen any situation where the leak took effect nearly as
quickly as you're describing, but an upgrade should nonetheless get
rid of the problem.

Best,

-Justin
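
A quick way to confirm the leak before and after upgrading is to watch
the descriptor count of the node's VM over time. This is a generic
Unix check, not a Riak tool; the pgrep pattern assumes the VM shows up
as beam.smp on your system:

    # count open descriptors held by the Riak VM; run it a few times
    # and watch whether the number keeps climbing
    lsof -p "$(pgrep -f beam.smp | head -n 1)" | wc -l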

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: RiakSearch Backend Innostore ?

2010-10-30 Thread Justin Sheehy
On Sat, Oct 30, 2010 at 11:51 AM, Prometheus wrote:

> Can we use Innostore for RiakSearch?  What is the performance comparison for
> search backends?  Any information will be valuable.

That depends on whether you mean the actual Search index backend, or
the KV backend used for storing complete analyzed documents.  There is
currently only one backend (merge_index) that works under Riak Search,
but one could certainly swap out the KV part's backend if bitcask
wasn't a good fit for that part.

The relative performance characteristics of bitcask and innostore
under Riak KV are well known in general, but no top-to-bottom testing
of Riak Search that I know of has been performed using innostore as
the KV backend.

-Justin
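
For reference, a minimal sketch of that swap, assuming the 0.13-era
riak_kv_innostore_backend module is installed; the KV backend is
chosen in the riak_kv section of etc/app.config:

    # etc/app.config fragment (Erlang terms, shown here as a comment):
    #   {riak_kv, [{storage_backend, riak_kv_innostore_backend}, ...]}
    # restart the node so the new backend takes effect:
    riak stop && riak start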

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak and Locks

2010-11-08 Thread Justin Sheehy
Hello, Neville.

On Mon, Nov 8, 2010 at 10:35 PM, Neville Burnell
 wrote:

> Are there any plans for a Distributed Lock Service for Riak, to allow for
> apps that *need* locking for some KV?

It has been discussed and agreed that it would be interesting, but
nothing is currently being developed in the short term to provide
this service as an integral part of Riak.  If your application needs
locking, some part of your system other than Riak will need to
provide that functionality.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Understanding Riaks rebalancing and handoff behaviour

2010-11-09 Thread Justin Sheehy
On Tue, Nov 9, 2010 at 10:30 AM, Alexander Sicular  wrote:
> Mainly, I'm of the impression that you should join/leave a cluster one
> node at a time.

This impression is correct.

I believe that in the not-too-distant future a feature may be added to
enable stable addition of many nodes at once, but at this time the
right approach is to add a node, allow the ringstate to stabilize
through gossip, then repeat as needed.

-Justin
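
In practice that looks like the following, assuming 0.13-era tooling
and hypothetical node names; run the join on each new node in turn:

    # on the joining node; one node at a time
    riak-admin join riak@node1.example.com
    # let gossip settle before joining the next node; riak-admin status
    # on any member can be used to keep an eye on the cluster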

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: How could we test/simulate siblings?

2010-11-12 Thread Justin Sheehy
Hi, Cagdas.

On Fri, Nov 12, 2010 at 8:17 PM, Cagdas Tulek  wrote:

> What is the best way of creating sibling records to see if my logic is
> handling them correctly?

Ensure that allow_mult is set to true.

Create some object B/K.

Get that object.  It will come with some vector clock V.

Put some new value X to B/K, using vector clock V in the put request.

Put some new value Y to B/K, using vector clock V in the put request.
(different value but same vclock as the previous put)

Get B/K.  You should get multiple values and some vclock V1.

If you wish to resolve back to a single value, store some new value Z
using vclock V1.

-Justin
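
The same recipe over HTTP, as a sketch against a local node: bucket
"b" and key "k" are made-up names, and V / V1 stand in for the opaque
vclock strings the server actually returns.

    # make sure siblings are kept
    curl -X PUT -H "Content-Type: application/json" \
         -d '{"props":{"allow_mult":true}}' http://127.0.0.1:8098/riak/b

    # create B/K, then fetch it and note the X-Riak-Vclock header (V)
    curl -X PUT -H "Content-Type: text/plain" -d 'first' \
         http://127.0.0.1:8098/riak/b/k
    curl -i http://127.0.0.1:8098/riak/b/k

    # two puts that both carry vclock V produce siblings X and Y
    curl -X PUT -H "Content-Type: text/plain" -H 'X-Riak-Vclock: V' \
         -d 'X' http://127.0.0.1:8098/riak/b/k
    curl -X PUT -H "Content-Type: text/plain" -H 'X-Riak-Vclock: V' \
         -d 'Y' http://127.0.0.1:8098/riak/b/k

    # fetching with multipart shows both values plus a new vclock V1
    curl -H 'Accept: multipart/mixed' http://127.0.0.1:8098/riak/b/k

    # writing Z with vclock V1 resolves back to a single value
    curl -X PUT -H "Content-Type: text/plain" -H 'X-Riak-Vclock: V1' \
         -d 'Z' http://127.0.0.1:8098/riak/b/k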

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak won't die all the way on OS X

2010-11-30 Thread Justin Sheehy
Jon,

You can just leave epmd running.  That is standard Erlang runtime
behavior and generally won't cause any problems.

-Justin
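
If you do want it gone between version switches, epmd can shut itself
down; just make sure riak is fully stopped first, since killing epmd
while nodes are registered will break them:

    epmd -names   # shows which Erlang nodes are currently registered
    epmd -kill    # asks the daemon to exit; only safe once riak is stopped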



On Tue, Nov 30, 2010 at 9:59 AM, Jon Brisbin  wrote:
> I'm running the pre-built binaries for Riak 0.13 (and 0.12 x64, for that
> matter) for OS X 10.6.
> When I do a "riak stop", there is one process still running. The epmd
> -daemon process. I have to kill it manually.
> In my testing, I'm starting 0.13, running a test, then shutting it down,
> starting 0.12 and running another test. If I'm not switching versions, then
> I just leave it running.
> Will this cause a problem if I restart the server and leave this last
> process running? What about if I switch from 0.13 to 0.12 (or vice versa)?
> Will it interfere with anything? Do I even need to kill it?
>
> Thanks!
> J. Brisbin
> http://jbrisbin.com/
>
>
>
>
>
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Storing relationship data

2010-12-30 Thread Justin Sheehy
Hi, Bryan.

The link data is embedded in the riak_object metadata, so you can
easily observe it from outside Riak even when not performing
link-walking queries.

To see this in action, check out the "Link" headers when using the
HTTP interface.

-Justin
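
For example, as a sketch against a local node (the bucket and key
names here are invented):

    # store an object that links to /riak/people/bob with tag "friend"
    curl -X PUT -H "Content-Type: application/json" \
         -H 'Link: </riak/people/bob>; riaktag="friend"' \
         -d '{"name":"alice"}' http://127.0.0.1:8098/riak/people/alice

    # a plain GET hands the Link header back, so any HTTP client can
    # see the relationship without doing a link-walking query
    curl -i http://127.0.0.1:8098/riak/people/alice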



On Thu, Dec 30, 2010 at 6:36 PM, Bryan Nagle
 wrote:
> Hi,
> For the project that I am currently working on, we are trying to decide on
> the best way to store relational data in riak.  At first glance, the obvious
> choice looks to be links; however, one of our requirements is that this
> relationship information has to be sent to a client along with actual data.
>  The client has to be able to map the relationships between the
> data solely from the information it receives while being completely outside
> of Riak.
> So, I was wondering if anyone had any suggestions?  We are considering
> either encapsulating our relationship data within the riak store itself (in
> the value tied to the key), or using riak links.  However, if we use riak
> links, then we have to convert those links into data that the client can
> receive & understand when sending data, and then convert the data we get
> back from the client into riak links;  we are wondering whether this extra
> translation step is worth implementing.
> Bryan Nagle
> Liquid Analytics
> bryan.na...@liquidanalytics.com
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: allow_multi VS HTTP Conditional PUT

2011-01-03 Thread Justin Sheehy
Hi, Eric.

On Mon, Jan 3, 2011 at 1:09 AM, Eric Moritz  wrote:

> Hi, I just read "Why Vector Clocks are Easy". I am having trouble
> seeing the advantage of letting a stale PUT into production and merging
> afterwards, versus HTTP's conditional PUT, which never lets a stale PUT
> into production.

This is an excellent question, and one that we could discuss for some time.

I am a big fan of HTTP conditional requests, but they are not always
compatible with the other operational needs imposed in the interest of
availability.

The main issue is that Riak's approach is designed for a
highly-available distributed system on the server side, while a
standard HTTP conditional PUT mostly makes sense for single-writer (or
at least single-leader) servers.

Riak is designed to accept requests even when arbitrary nodes are down
or unable to talk to each other.  Achieving that availability goal is
in conflict with the typical expectations around conditional PUT,
which are basically those of an atomic CAS operation.  Since not
every node holding a copy of a given piece of data can necessarily be
reached during a write request, Riak cannot maintain its intended
level of availability and simultaneously ensure that you are really
only overwriting exactly the version you specify.

I hope that this sheds some light on why we have made the choices that
you see in Riak.

-Justin
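
To make the contrast concrete, here is what a conditional PUT looks
like against Riak's HTTP interface (a sketch with placeholder names;
ETAG stands for whatever the earlier GET returned). It behaves like a
read-then-check, not an atomic compare-and-set across replicas:

    # fetch the current ETag for the object
    curl -i http://127.0.0.1:8098/riak/b/k

    # rejected if the ETag no longer matches -- but two clients racing
    # through this on different nodes can still both succeed
    curl -X PUT -H "Content-Type: text/plain" -H 'If-Match: "ETAG"' \
         -d 'new value' http://127.0.0.1:8098/riak/b/k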

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: PDF or OpenOffice Impresse presentations instead of .KEY + Windows question

2011-01-05 Thread Justin Sheehy
Hello, Jérôme.

It looks like Jeremiah has already answered one of your questions, so
I'll get the other.

On Wed, Jan 5, 2011 at 6:41 PM, Jérôme Verstrynge  wrote:

> My other question/remark is: there does not seem to be a downloadable
> version of Riak for Windows. Is there a technical reason for this or is it a
> 'religious' issue?

There's certainly no operating system religion at work here, simply
the limited resources of a small team.  A few different people in the
community have been working on Windows support, which we think is a
great idea -- we just currently don't have anyone spare to make
official windows releases, QA and benchmark those releases on various
windows systems, and so on.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Storing relationship data

2011-01-05 Thread Justin Sheehy
Hi, Bryan.

On Thu, Dec 30, 2010 at 8:02 PM, Bryan Nagle
 wrote:

> Our current setup, is we are using webmachine;  Client connects to
> webmachine, and webmachine connects to riak via the Erlang PB client.  So,
> if we use links, and we want the client to be aware of the relationships, we
> would still have to translate the links into the http response from
> webmachine back to the client;  or am I missing something?

You are correct with regard to standard behavior -- if there is a
layer between the client application and Riak, and you wish for the
actual links (as opposed to just the ability to traverse them) to be
visible to that client, then the intermediary must pass along the link
data as well.

There is a rarely-used alternative that might suit the scenario you
described if you find it too annoying to carry the metadata to your
client: you could set the "linkfun" bucket property to examine the
content of objects instead of metadata, and define your own custom
link serialization format to match your custom link storage format.
This would allow you to embed the links directly inside the objects
and still have mapreduce link queries work, but might break some other
things such as the HTTP interface to link walking.  I don't generally
recommend this path, as you'd be well outside the realm of "normal"
usage, but it is possible.

-Justin
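
Setting that property looks roughly like this; mymod:extract_links is
a hypothetical module that would have to be compiled and on every
node's code path:

    # point the bucket's linkfun at a custom {Mod, Fun}
    curl -X PUT -H "Content-Type: application/json" \
         -d '{"props":{"linkfun":{"mod":"mymod","fun":"extract_links"}}}' \
         http://127.0.0.1:8098/riak/mybucket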

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Getting all the Keys

2011-01-22 Thread Justin Sheehy
On Sat, Jan 22, 2011 at 3:18 PM, Alexander Sicular  wrote:

> I'll drop a phat tangent and just mention that I watched @rk's talk at Qcon
> SF 2010 the other day and am kinda crushing on how they implemented
> distributed counters in cassandra (mainlined in 0.7.1 me thinks) which,
> imho, is so choice for a riak implementation it isn't even funny. It was
> like pow pow in da face and my face got melted.

I know that a couple of people have done their own spikes on
distributed counters for Riak and have demonstrated that it's
certainly doable.

The question isn't "can it be done" -- we know it can.  The tricky
questions are about which tradeoffs to make: write-performance,
read-performance, and so on.

In other words, I am in support of this sort of feature.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

