Any performance comparison / best practice advice for choosing a riak backend ?

2010-08-28 Thread Neville Burnell
Hi,

I'm new to riak, and have been busily reading through the wiki, watching the
videos, and catching up on the mail list, so I will have lots of questions
over the next few weeks - so sorry in advance!

To begin, I'm curious about the characteristics of the seven backends for
riak [1]

   1. riak_kv_bitcask_backend - stores data to bitcask
   2. riak_kv_fs_backend - stores data directly to files in a nested
   directory structure on disk
   3. riak_kv_ets_backend - stores data in ETS tables (which makes it
   volatile storage, but great for debugging)
   4. riak_kv_dets_backend - stores data on-disk in DETS tables
   5. riak_kv_gb_trees_backend - stores data using Erlang gb_trees
   6. riak_kv_cache_backend - turns a bucket into a memcached-type memory
   cache, and ejects the least recently used objects either when the cache
   becomes full or the object's lease expires
   7. riak_kv_multi_backend - configure per-bucket backends

Unfortunately this amount of choice means I need to do my homework to make
an informed decision ;-) so I'd love any pointers or to hear any advice on
performance comparisons, best practices, backends for development vs
deployment etc

Kind Regards

Neville

[1]
http://wiki.basho.com/display/RIAK/How+Things+Work#HowThingsWork-Backends
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Any performance comparison / best practice advice for choosing a riak backend ?

2010-08-29 Thread Neville Burnell
Thanks Sean,

Do riak_kv_gb_trees_backend and riak_kv_fs_backend have particular
strengths/weaknesses?

Kind Regards

Neville

On 29 August 2010 01:02, Sean Cribbs  wrote:

> Your choice should be dictated by your use-case.  In most situations,
> "riak_kv_bitcask_backend" (the default) will work for you. It stores data on
> disk in a fast (append-only) log-structured file format.  If your data is
> transient or doesn't need to persist across restarts (and needs to be fast),
> try "riak_kv_ets_backend" or "riak_kv_cache_backend"; the latter uses a
> global LRU timeout.  If you want to use several of the backends in the same
> cluster (for different buckets), use the "riak_kv_multi_backend" and
> configure each backend separately.
>
> Sean Cribbs 
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
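Sean's advice above translates into the node's configuration file. A sketch of what a multi-backend setup might look like in etc/app.config (option names follow the 0.13-era wiki and may differ between releases, so treat this as illustrative rather than drop-in):

```erlang
%% etc/app.config excerpt -- a sketch, not a verified config.
%% Assumes the 0.13-era option names for riak_kv_multi_backend.
{riak_kv, [
    %% route all storage through the multi backend
    {storage_backend, riak_kv_multi_backend},
    %% buckets without an explicit backend property use bitcask
    {multi_backend_default, <<"bitcask">>},
    {multi_backend, [
        %% durable, append-only storage (the usual default)
        {<<"bitcask">>, riak_kv_bitcask_backend, []},
        %% volatile in-memory storage for transient buckets
        {<<"memory">>,  riak_kv_ets_backend, []}
    ]}
]}
```

Per-bucket selection is then done by setting a bucket's backend property to one of the configured names, e.g. <<"memory">> for data that need not survive restarts.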


Re: Practical limit on total size?

2010-09-04 Thread Neville Burnell
I guess the corollary to his question is:

Are there any size/performance details of large riak db instances and the
hardware they are running on ??

Thanks,

Neville Burnell

On 5 September 2010 14:16, Sean Cribbs  wrote:

> There are no architectural limits on the size of your entire data set, you
> are only limited by the capacity of the individual machines in your cluster.
>
> Sean Cribbs 
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
> On Sep 4, 2010, at 10:43 PM, Mojito Sorbet wrote:
>
> > I see mentions of upper size suggestions on Riak database instances in
> > the small GB ranges.  (2?  4?)   Can one reasonably expect a Riak DB to
> > scale to a couple hundred terabytes?


Re: Riak Search

2010-09-19 Thread Neville Burnell
Hey Mark,

Any chance of getting hold of a beta or some docs?

Kind Regards

Neville
Waiting excitedly for Riak Search

On 24 August 2010 05:51, Mark Phillips  wrote:

> Hey Nickolay,
>
> On Tue, Aug 17, 2010 at 10:46 AM, Nickolay Platonov 
> wrote:
> > Hello,
> >
> > Any information when the Riak Search will be publicly available?
> >
>
> Sometimes development of a new product takes far less time than
> expected. More often, however, it takes longer than expected, and Riak
> Search is no exception. Trust me when I say that Basho and the Search
> team are as excited to release the code as you are to get your hands
> on it.
>
> Thanks for your patience.
>
> Mark
>
> Community Manager
> Basho Technologies
> wiki.basho.com
> twitter.com/pharkmillups


Riak Search

2010-10-11 Thread Neville Burnell
Great work to release Riak 0.13 and Riak Search.

I'm curious to understand how much of Riak Search is pure Erlang, and when
does Riak Search use the JVM?

Also are the Riak Search Indexes and config/schema files all stored in
Bitcask, or does Riak Search use the local node file system for storage at
all?

Thanks and congratulations

Neville


Re: Riak Search

2010-10-11 Thread Neville Burnell
Hi Dan,

Thanks for your reply,

Does this mean that if I use the pure Erlang analyser, I will get pure
Erlang search, i.e., not need a JVM?

Kind Regards

Neville

On 12 October 2010 10:49, Dan Reverri  wrote:

> Hi Neville,
>
> Riak Search uses the JVM to run analyzers. Analyzers are configured using
> the analyzer_factory option in the schema (set schema-wide or per field).
> Analyzers determine how incoming data is broken down and indexed; for
> example, a white space analyzer would break a document into words by
> splitting on white space characters and indexing the words.
>
> Regarding the schema, this section of the wiki should answer your question:
>
> http://wiki.basho.com/display/RIAK/Riak+Search+-+Schema#RiakSearch-Schema-DefiningaSchema
>
>
> Thanks,
> Dan
>
> Daniel Reverri
> Developer Advocate
> Basho Technologies, Inc.
> d...@basho.com


Riak Search performance experiences ?

2010-10-13 Thread Neville Burnell
Hi,

Congratulations on releasing Riak Search - effectively implementing Lucene
and Solr in Erlang is a great effort!

Since Riak Search has been in use for some time with beta testers, I'm
wondering if Basho might share some performance insights, especially if it's
possible to compare customers who were using Solr and have switched to Riak
Search, which would be our use case.

For example, would Riak Search performance be comparable to Solr over, say,
1M 1KB docs, assuming sufficient RAM and fast disk?

Thanks for any performance insights from your beta testing experiences

Kind Regards

Neville


Re: Riak Search performance experiences ?

2010-10-18 Thread Neville Burnell
Hi Rusty,

Thanks for your reply,

I'm keen to hear any Riak Search deployment experiences you or your beta
testers are willing to share.

Please, do tell !!!

Kind Regards

Neville

On 19 October 2010 05:36, Rusty Klophaus  wrote:

> Hi Neville,
>
> Thanks! Performance comparisons are tricky business. Any time you compare a
> distributed system to a non-distributed system on a single machine, the
> non-distributed system is going to be much, much faster simply because it
> can skip all of the overhead needed to make the system distributed.
>
> So for 1M 1K docs, which can easily fit on one machine, Lucene will be much
> faster. A more apples-to-apples question, and one we haven't benchmarked
> yet, is how Riak Search compares to something like ElasticSearch. I expect
> each system to have its own unique strengths and weaknesses depending on
> the type of documents, the ratio of reads-to-writes, the number of replicas,
> and the type of failure scenarios that are tested.
>
> We haven't done much comparison with distributed Lucene projects yet
> because Riak Search is first and foremost intended to be a tightly
> integrated index into Riak KV data. The Solr interface is a nice bonus. It
> helps users familiar with Solr get started with Riak Search more easily and
> it gives users an additional way to access their data, so it is something we
> will continue to enhance over future releases.
>
> Best,
> Rusty


Re: couchdb performance 10x: using NIF for file io

2010-10-24 Thread Neville Burnell
erlang groups link:

http://groups.google.com/group/erlang-programming/browse_thread/thread/f667a9a87ada3e7d

On 25 October 2010 08:05, NevB  wrote:

> Hi,
>
> I came across this thread in the Erlang group and thought the Riak
> team might find it interesting.
>
> Please forgive me if it's not relevant to Riak
>
> Kind Regards
>
> Neville
>
>
> -- Forwarded message --
> From: Joel Reymont 
> Date: Oct 25, 12:56 am
> Subject: couchdb performance 10x: using NIF for file io
> To: Erlang Programming
>
>
> Simply switching to NIFs for file IO seems to have improved CouchDB
> write performance more than ten-fold.
>
> Compare the old graph
>
> http://graphs.mikeal.couchone.com/#/graph/62b286fbb7aa55a4b0c4cc913c0...
>
> to the new graph
>
> http://graphs.mikeal.couchone.com/#/graph/62b286fbb7aa55a4b0c4cc913c0...
>
> I was under the impression that the Erlang IO subsystem was highly
> optimized but there seems to be no limit to perfection.
>
> NIFs are a giant black hole that will subsume Erlang code as
> performance has to be improved. Start at the lowest level and keep
> moving up. All that will be left of Erlang in the end is 99.9%
> uptime, fault tolerance and supervision... of optimized C code. It's
> swell and I'm all for it!
>
> Patch is here:
>
> http://github.com/wagerlabs/couchdb/commit/23527eb8165f81e63d47b230f3...
>
> --
> http://twitter.com/wagerlabs
>


Re: RiakSearch Benchmark Test Invitation.

2010-10-28 Thread Neville Burnell
Put it on S3

On 28 October 2010 20:20, francisco treacy wrote:

> Very good idea!
>
> 2010/10/28 Prometheus WillSurvive :
> > Hi All,
> > We have prepared a wikipedia database output ready to submit to
> > RiakSearch. It is XML, in the format described for Solr submission. Each
> > file has 20,000 documents, 15 XML files in total, each around 44 MB.
> > You can submit each XML file with: bin/search-cmd solr wikipedia
> > /wikipedia/content-xml-out/wikipedia_1.xml
> > So you only need to submit these files to riaksearch and then run a
> > benchmark test/tune and share your experience.
> > I would like to ask the Riak admin guys: is there any place where I can
> > share these files for public access, to start collaborative tests?
> > In a second phase I can put up 3 million wikipedia XML sets ready to
> > submit to riaksearch, so we all have some common benchmark and tuning
> > parameters.
> > I hope this will help the riaksearch community to better understand its
> > capability.
> > Best Regards


Re: RiakSearch Reached Its Limit and gave below Error ..

2010-10-31 Thread Neville Burnell
Have you increased your ulimit?
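For context, the "Too many db tables" error in the trace below is Erlang's ETS table cap rather than the OS file-descriptor limit, so raising ulimit alone may not help. A sketch of checking both (the environment variable is from the Erlang/OTP docs; where to set it for a Riak node depends on your packaging, which is an assumption here):

```shell
# Check the OS open-file limit for the user running Riak
ulimit -n

# The ETS table cap is separate from ulimit; it is controlled by the
# ERL_MAX_ETS_TABLES environment variable (default 1400). Export it in
# the environment the Riak node starts from -- exact location is
# packaging-dependent, shown here only as a plain shell export.
export ERL_MAX_ETS_TABLES=32768
echo "$ERL_MAX_ETS_TABLES"
```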

On 31 October 2010 19:23, Prometheus WillSurvive <
prometheus.willsurv...@gmail.com> wrote:

> Hi,
>
> We started a batch index test (wikipedia). When we reached around 600K
> docs the system gave the error below. Any idea?
>
> We can not index any more doc in this index.
>
>
>
> =ERROR REPORT 31-Oct-2010::10:22:42 ===
> ** Too many db tables **
>
> DEBUG: riak_search_dir_indexer:197 - "{ error , Type , Error , erlang :
> get_stacktrace ( ) }"
>
>  {error,error,system_limit,
>[{ets,new,[batch,[protected,duplicate_bag]]},
> {riak_search_client,process_terms_1,5},
> {riak_search_client,process_terms_1,5},
> {riak_search_client,index_docs,3},
> {riak_search_client,index_docs,2},
> {riak_solr_search_client,run_solr_command,4},
> {solr_search,'-index_dir/2-lc$^0/1-0-',2},
> {riak_search_dir_indexer,worker_loop,5}]}
>
>
> =ERROR REPORT 31-Oct-2010::10:22:44 ===
> Error in process <0.22187.14> on node 'riaksea...@192.168.250.154' with
> exit value: {system_limit,[{riak_search_dir_indexer,worker_loop,5}]}
>
> RPC to 'riaksea...@192.168.250.154' failed: {'EXIT',
> {system_limit,
>  [{riak_search_dir_indexer,
>worker_loop,5}]}}


Riak and Locks

2010-11-08 Thread Neville Burnell
Are there any plans for a Distributed Lock Service for Riak, to allow for
apps that *need* locking for some KV ?

A lease based DLM based on something like PaxosLease [1] would be a great
service.

Kind Regards

Neville

[1] http://scalien.com/pdf/PaxosLease.pdf


Re: Riak and Locks

2010-11-08 Thread Neville Burnell
Hi Justin,

Thanks for your reply,

I might try wrapping an HTTP API around this Erlang implementation of Paxos
[1] just for fun.

Kind Regards

Neville

[1] https://github.com/kuenishi/gen_paxos

On 9 November 2010 14:37, Justin Sheehy  wrote:

> Hello, Neville.
>
> On Mon, Nov 8, 2010 at 10:35 PM, Neville Burnell
>  wrote:
>
> > Are there any plans for a Distributed Lock Service for Riak, to allow for
> > apps that *need* locking for some KV ?
>
> It has been discussed and agreed that it would be interesting, but
> there is nothing currently being developed in the short term to
> provide this service integrally to Riak.  If your application needs
> locking, some part of it other than Riak will need to provide that
> functionality.
>
> -Justin
>


Riak Roadmap ?

2010-11-14 Thread Neville Burnell
Hi,

Does riak have a public roadmap somewhere ? I've googled and seen mentions
of a roadmap, but failed to get any further.

Thanks,

Nev


Re: Riak Roadmap ?

2010-11-14 Thread Neville Burnell
Thanks Sean.

On 15 November 2010 13:05, Sean Cribbs  wrote:

> Best way to tell is via the Bugzilla at issues.basho.com.  Milestones are
> coded by alphabetical names, the upcoming one is Dakota.  Other than the
> things listed there, there's not much.
>
> Sean Cribbs 
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
> On Nov 14, 2010, at 8:15 PM, Neville Burnell wrote:
>
> > Hi,
> >
> > Does riak have a public roadmap somewhere ? I've googled and seen
> mentions of a roadmap, but failed to get any further.
> >
> > Thanks,
> >
> > Nev


Re: riak-search default_field

2010-11-17 Thread Neville Burnell
heh, in a previous job working with Solr, my team created the "any" field

On 18 November 2010 08:49, Wilson MacGyver  wrote:

> thanks for the info. that means for multiple fields search
> as default. We'd have to create some sort of combined field
>
>
> On Wed, Nov 17, 2010 at 4:07 PM, Dan Reverri  wrote:
> > Only 1 field can be specified for the default_field property.
>
>
> --
> Omnem crede diem tibi diluxisse supremum.
>


Re: riak-search default_field

2010-11-17 Thread Neville Burnell
yep. Worked really well; our SQL database was quite mature, though, so there
were few schema changes (which would have required rebuilding the "any"
field).
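For readers who have not seen the trick: in Solr this kind of catch-all field is typically built with copyField rules in schema.xml. A sketch (field and type names here are illustrative, not taken from the original setup):

```xml
<!-- catch-all field: indexed for search, not stored -->
<field name="any" type="text" indexed="true" stored="false" multiValued="true"/>

<!-- funnel every field's content into "any" -->
<copyField source="*" dest="any"/>
```

With the schema's default search field pointed at "any", unqualified queries then match across everything, at the cost of a larger index and a rebuild whenever the set of source fields changes.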

On 18 November 2010 09:48, Wilson MacGyver  wrote:

> were you guys using "copyField" in solr to fill "any"? :)
>
> On Wed, Nov 17, 2010 at 5:18 PM, Neville Burnell
>  wrote:
> > heh, in a previous job working with Solr, my team created the "any" field
> >
> > On 18 November 2010 08:49, Wilson MacGyver  wrote:
> >>
> >> thanks for the info. that means for multiple fields search
> >> as default. We'd have to create some sort of combined field
> >>
> >>
> >> On Wed, Nov 17, 2010 at 4:07 PM, Dan Reverri  wrote:
> >> > Only 1 field can be specified for the default_field property.
> >>
> >>
> >> --
> >> Omnem crede diem tibi diluxisse supremum.
> >>
>
>
> --
> Omnem crede diem tibi diluxisse supremum.
>


Re: Whole cluster times out if one node is gone

2010-11-23 Thread Neville Burnell
Just a thought ... have you verified your switch, cables, NICs, etc.?

On 24 November 2010 09:33, Jay Adkisson  wrote:

> (many profuse apologies to Dan - hit "reply" instead of "reply all")
>
> Alrighty, I've done a little more digging.  When I throttle the writes
> heavily (2/sec) and set R and W to 1 all around, the cluster works just fine
> after I restart the node for about 15-20 seconds.  Then the read request
> hangs for about a minute, until node D disappears from connected_nodes in
> riak-admin status, at which point it returns the desired value (although
> sometimes I get a 503):
>
> --2010-11-23 13:01:28--  http://:8098/riak//?r=1
> Resolving ... 
> Connecting to ||:8098... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 3684 (3.6K) [image/jpeg]
> Saving to: `?r=1'
>
> 100%[==>] 3,684   --.-K/s   in 0s
>
> 2010-11-23 13:02:21 (49.5 MB/s) - `?r=1' saved [3684/3684]
>
> --2010-11-23 13:02:23--  http://:8098/riak//?r=1
> Resolving ... 
> Connecting to ||:8098... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 3684 (3.6K) [image/jpeg]
> Saving to: `?r=1'
>
> 100%[==>] 3,684   --.-K/s   in 0s
>
> 2010-11-23 13:02:23 (220 MB/s) - `?r=1' saved [3684/3684]
>
> Afterwards, node D comes back up and re-joins the cluster seamlessly.
>
> Any insights?
>
> --Jay
>
> On Mon, Nov 22, 2010 at 5:59 PM, Jay Adkisson  wrote:
>
>> Hey Dan,
>>
>> Thanks for the response!  I tried it again while watching `riak-admin
>> status` - basically, it takes about 30 seconds of node C being down before
>> riak realizes it's gone.  During that time, if I'm writing to the cluster at
>> all (I throttled it to 2 writes per second for testing), both writes and
>> reads hang indefinitely, and sometimes time out.
>>
>> I'm using Ripple to do the writes, and wget to test reads, all on node A
>> for now, since I know it'll be up.  I'm using the default R and W options
>> for now.
>>
>> Thanks for the help and clarification around ringready.
>>
>> --Jay
>>
>>
>> On Mon, Nov 22, 2010 at 5:15 PM, Dan Reverri  wrote:
>>
>>> Your HTTP calls should not be timing out. Are you sending requests
>>> directly to the Riak node or are you using a load balancer? How much load
>>> are you placing on node A? Is it a write only load or are there reads as
>>> well? Can you confirm "all" requests time out or is it a large subset of the
>>> requests? How large are the objects being written? Are you setting R and W
>>> in the request? Are you using a particular client (Ruby, Python, etc.)? Can
>>> you provide the output of "riak-admin status" from node A?
>>>
>>> Regarding the ringready command; that is behaving as I would expect
>>> considering a node is down.
>>>
>>> Thanks,
>>> Dan
>>>
>>> Daniel Reverri
>>> Developer Advocate
>>> Basho Technologies, Inc.
>>> d...@basho.com
>>>
>>>
>>> On Mon, Nov 22, 2010 at 4:55 PM, Jay Adkisson  wrote:
>>>
 Hey all,

 Here's what I'm seeing: I have four nodes A, B, C, and D.  I'm loading
 lots of data into node A, which is being distributed evenly across the
 nodes.  If I physically reboot node D, all my HTTP calls time out, and
 `riak-admin ringready` complains that not all nodes are up.  Is this
 intended behavior?  Is there a configuration option I can set so it fails
 more gracefully?

 --Jay



Riak and PGM

2010-12-04 Thread Neville Burnell
Does/can Riak use PGM [1] for replicating writes ?

It seems PGM would be more efficient when the number of physical nodes
starts to get high.

Just curious!

[1] http://en.wikipedia.org/wiki/Pragmatic_General_Multicast


Re: Riak and PGM

2010-12-04 Thread Neville Burnell
A quick follow up - it seems RabbitMQ now supports 0MQ [1][2] which might be
a good fit with Riak

>From the blog post [1]:
"0MQ is bundled with OpenPGM library which implements a reliable multicast
protocol called PGM. The r0mq bridge thus allows to multicast messages from
RabbitMQ broker to the clients (0MQ clients to be precise — AMQP has no
multicast support). This kind of functionality is extremely useful in
scenarios where a lot of identical data is passed to many boxes on the LAN.
If a separate copy of each datum is sent to each subscriber, you can easily
exceed capacity of your network. With multicast, data is sent once only to
all the subscribers thus keeping the bandwidth usage constant even when the
number of subscribers grows."

[1] http://www.rabbitmq.com/blog/2010/10/18/rabbitmq0mq-bridge/
[2] https://github.com/rabbitmq/rmq-0mq


Re: Riak and PGM

2010-12-05 Thread Neville Burnell
And 0MQ has Erlang bindings anyhow, so rabbitMQ now required!

http://www.zeromq.org/bindings:erlang



Re: Riak and PGM

2010-12-05 Thread Neville Burnell
that should be "RabbitMQ *not* required"

On 5 December 2010 19:27, Neville Burnell  wrote:

> And 0MQ has Erlang bindings anyhow, so rabbitMQ now required!
>
> http://www.zeromq.org/bindings:erlang


Re: Riak and PGM

2010-12-05 Thread Neville Burnell
Hi Bob,

Thanks for your reply.

> We've looked at these recently and they're not exactly prime time.

Yes, it seems that way, although opinions are mixed: Ryan rejected
0MQ for Node.js, while Zed Shaw used 0MQ to build Mongrel2.

Kind Regards

Neville

On 5 December 2010 19:49, Bob Ippolito  wrote:

> We've looked at these recently and they're not exactly prime time.
> They don't even compile on Mac OS X without a lot of prodding, at
> least with our configuration. I think they also require some
> development version of 0mq. That said, would be more than happy if
> someone fixes these problems and builds something cool with it.
>
> I don't think there's really any communication in the Riak model that
> would benefit from better multicast. Maybe some ring stuff, but not
> the reads or writes.


Re: {error,<<"{precommit_fail,notfound}">>}

2010-12-15 Thread Neville Burnell
Can't you simply PUT all the objects again, i.e., don't bother deleting?
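
A toy sketch (hypothetical, not Basho's implementation) of the precommit-hook chain discussed in this thread: deletes pass through the same precommit hooks as writes, so a Search hook that cannot find the object's index entry fails the delete itself.

```python
# Hypothetical model of a precommit-hook chain. A hook either returns
# the (possibly modified) object, or a ("precommit_fail", Reason) tuple
# that aborts the operation -- including deletes.

def search_precommit(obj, index):
    # Stand-in for the Riak Search hook: fail if the object was never indexed.
    if obj["key"] not in index:
        return ("precommit_fail", "notfound")
    return obj

def run_precommit(obj, hooks):
    for hook in hooks:
        obj = hook(obj)
        if isinstance(obj, tuple) and obj[0] == "precommit_fail":
            return obj
    return obj

index = set()  # empty: the object was stored *before* the hook was installed
obj = {"key": "jane", "value": "..."}
result = run_precommit(obj, [lambda o: search_precommit(o, index)])
print(result)  # ('precommit_fail', 'notfound')
```

This is why installing the hook after inserting data leaves un-indexed objects that can no longer be deleted through the hook.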

On 16 December 2010 12:49, Xiaopong Tran  wrote:

> Well, that's what I had been trying to do: delete all the objects
> in the bucket and re-insert again.
>
> So now, I can't delete the objects, and there's no delete-bucket
> operation. Is that supposed to mean that I have to delete the
> entire Riak db and start everything from scratch??? I hope I
> got that part wrong.
>
> Best
>
> Xiaopong
>
> On Wed, 2010-12-15 at 22:50 +0800, Joseph Lambert wrote:
> > You need to add the hook before you add the objects you want indexed
> > when inserting KV data. The precommit hook is what triggers the
> > indexing of the objects. If it's not installed when you insert an
> > object, it's not going to index it.
> >
> >
> > From the wiki:
> >
> >
> > "Riak Search indexing of KV data must be enabled on a per-KV-bucket
> > basis. To enable indexing for a bucket, simply add the Search
> > precommit hook to that bucket's properties"
> >
> >
> > "With the precommit hook installed, Riak Search will index your data
> > each time that data is written"
> >
> >
> > I believe the delete is failing because it can't find the value in the
> > index, so the precommit hook returns {precommit_fail,notfound} error
> > (since a delete updates the meta data of the object).
> >
> >
> > If you want to see index data, the index buckets start are named like
> > _rsid_bucketname, where bucketname is the bucket you install the hook
> > on.
> >
> > - Joe Lambert
> >
> > joseph.g.lamb...@gmail.com
> >
> >
> > On Wed, Dec 15, 2010 at 7:12 PM, Xiaopong Tran
> >  wrote:
> > I installed Riak Search, added a few JSON docs, then install
> > the
> > search hook for the bucket. Now, here are the questions:
> >
> > 1) How can I list the indexes that have been created
> > for my bucket?
> >
> > 2) After installing the hook, I can't delete my object
> > anymore. Here is the error:
> >
> > 7> {ok, P} = test3:init().
> > {ok,<0.44.0>}
> > 8> riakc_pb_socket:get(P, "user_bucket", "jane").
> > {ok,{riakc_obj,"user_bucket","jane",
> >
> > <<107,206,97,96,96,96,204,96,202,5,82,44,108,85,66,103,50,
> > 152,18,25,243,88,25,...>>,
> >   [{{dict,3,16,16,8,80,48,
> >
> > {[],[],[],[],[],[],[],[],[],[],[],[],...},
> >   {{[],[],[],[],[],[],[],[],[],[],...}}},
> > <<"{\"userid\":\"jane\",\"first_name\":\"Jane
> > \",
> > \"last_name\":\"Doe\",\"gender\":\"F\",\"joined_"...>>}],
> >   undefined,undefined}}
> > 9> riakc_pb_socket:delete(P, "user_bucket", "jane").
> > {error,<<"{precommit_fail,notfound}">>}
> > 10> riakc_pb_socket:delete(P, "user_bucket", <<"jane">>).
> > {error,<<"{precommit_fail,notfound}">>}
> >
> > I can retrieve the object, but can't delete it. What's is
> > going on?
> >
> > Thanks
> >
> > Xiaopong
> >
> >
> >
> > ___
> > riak-users mailing list
> > riak-users@lists.basho.com
> >
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> >
>
>
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>


Re: Performance issues with small dataset

2011-01-12 Thread Neville Burnell
On 13 January 2011 12:33, Alexander Staubo  wrote:
> Here's something else that is weird. I repeated the steps above on a
> new, empty bucket, again using just 1,000 items, but after loading 1.5
> million items into a separate, empty bucket. The numbers now are very
> odd:
>
> * 4.5 seconds to list all keys.
> * 6.5 seconds to list + fetch.
> * 5.1 seconds to run map/reduce query.
>
> Why are operations on the small bucket suddenly worse in the presence
> of a separate, large bucket? Surely the key spaces are completely
> separate? Even listing keys or querying on an *empty* bucket is taking
> several seconds in this scenario.

Buckets are "virtual" containers for the purpose of setting NRW defaults.

Riak uses the Bucket+Key for hashing.
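
A minimal sketch of that point (Python; real Riak hashes the {Bucket, Key} pair with SHA-1 over its binary term encoding, and the default ring has 64 partitions -- the concatenation below is a simplification):

```python
import hashlib

NUM_PARTITIONS = 64  # ring size; Riak's default ring_creation_size

def partition_for(bucket, key):
    """Riak hashes the combined bucket+key, so a bucket is not a
    physically separate keyspace: its keys spread over the whole ring."""
    h = hashlib.sha1((bucket + key).encode()).hexdigest()
    return int(h, 16) % NUM_PARTITIONS

# Keys from two buckets land interleaved across the same partitions,
# which is why listing one bucket's keys must scan the whole keyspace.
small = {partition_for("small_bucket", str(i)) for i in range(1000)}
large = {partition_for("large_bucket", str(i)) for i in range(1000)}
print(len(small & large) > 0)  # True: the buckets share partitions
```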



Re: Riak Recap for Jan. 12 - 13.

2011-01-14 Thread Neville Burnell
Dumb Question: could Riak be changed to perform read repair before
responding, to improve consistency of response?

On 15 January 2011 11:12, Sean Cribbs  wrote:

>
>
> Crap, the second after I hit "send" the lightbulb goes on!  Why is that?
>
> The quorum _was_ met (all vnodes just migrated to the one machine) but
> since some of them were fail-overs they didn't have the value yet (or the
> wrong value)?  In this case a read repair happened and subsequent gets
> worked.
>
>
> Your understanding is correct. However, when I say "quorum was met" I
> usually mean that "it had R successful replies". Minor semantic quibble.
>
> You are correct in saying that the wiki is misleading -- read repair
> happens when any successful reply reaches the FSM, even if "not found" was
> returned to the client, that is, if quorum was not met. We'll get that
> fixed.
>
> I'm still in the dark on the second question.
>
>
>> 2) Why doesn't r=1 work?
>>
>> In the IRC session, you claimed that r=1 would not have helped this
>> problem.  Just like the OP, this confused me.  You then went on to say it
>> was because of some optimization and then mentioned a "basic quorum."
>>
>> I took a few minutes to think about this and the only conclusion I came to
>> is that when r=1 you will treat the first response as the final response,
>> and in this case the notfound response will always come back first?  I'm not
>> sure if what I just said makes sense but I would have expected r=1 to work,
>> just like the OP.  I'll admit that I still haven't read all the wiki docs
>> yet (but I've read Read Repair 3 times now), so I'd be happy to hear RTFM.
>>
>
> A number of months ago, we ran into some issues with a cluster where "not
> found" responses were not returning in a reasonable amount of time,
> especially when R=1. That is, the requests took MUCH longer than a
> SUCCESSFUL read. We determined that this occurred because one of the
> partitions was too busy to reply, causing the request timeout to expire.  So
> we added a special case called "basic quorum" (n_val/2 + 1) that is invoked
> only when receiving a "not found" response from a replica.  The idea is that
> if a simple majority of the replica partitions report "not found", it's
> probably not there.  This way, you don't sit around waiting for the last
> lonely partition to reply when R=1 (and your successful reads are still fast
> because you only wait for one replica).  It's a tradeoff of availability:
> returning a potentially incorrect response vs. appearing unavailable (timing
> out). We chose the former.
>
> Hope that helps,
>
> Sean Cribbs 
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
>
>
>
>
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
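
The "basic quorum" behaviour Sean describes above can be sketched as follows (a simulation under stated assumptions, not Basho's code): with R=1 and N=3, two fast "notfound" replies do not satisfy R, so the request would otherwise wait on the slowest vnode; once a simple majority (n // 2 + 1) report notfound, Riak answers notfound early.

```python
def read(replies, r, n, basic_quorum=True):
    """Simulate an R-value read over replies arriving in order;
    None models a vnode replying 'notfound'."""
    found, notfound = 0, 0
    for value in replies:
        if value is None:
            notfound += 1
            if basic_quorum and notfound >= n // 2 + 1:
                return "notfound"  # majority says it's not there
        else:
            found += 1
            if found >= r:
                return value
    return "notfound"

# Two fail-over vnodes reply notfound first; the real value is on the slow one.
print(read([None, None, "v1"], r=1, n=3))                     # 'notfound'
print(read([None, None, "v1"], r=1, n=3, basic_quorum=False)) # 'v1'
```

The second call shows the availability trade-off: without basic quorum the read eventually returns the value, but only after waiting for the slow replica.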


Re: Riak Recap for Jan. 12 - 13.

2011-01-14 Thread Neville Burnell
Or to include a "read repair in progress" header to indicate the request
should be retried?

On 15 January 2011 11:33, Neville Burnell  wrote:

> Dumb Question: could Riak be changed to perform read repair before
> responding, to improve consistency of response?
>
> On 15 January 2011 11:12, Sean Cribbs  wrote:
>
>>
>>
>> Crap, the second after I hit "send" the lightbulb goes on!  Why is that?
>>
>> The quorum _was_ met (all vnodes just migrated to the one machine) but
>> since some of them were fail-overs they didn't have the value yet (or the
>> wrong value)?  In this case a read repair happened and subsequent gets
>> worked.
>>
>>
>> Your understanding is correct. However, when I say "quorum was met" I
>> usually mean that "it had R successful replies". Minor semantic quibble.
>>
>> You are correct in saying that the wiki is misleading -- read repair
>> happens when any successful reply reaches the FSM, even if "not found" was
>> returned to the client, that is, if quorum was not met. We'll get that
>> fixed.
>>
>> I'm still in the dark on the second question.
>>
>>
>>> 2) Why doesn't r=1 work?
>>>
>>> In the IRC session, you claimed that r=1 would not have helped this
>>> problem.  Just like the OP, this confused me.  You then went on to say it
>>> was because of some optimization and then mentioned a "basic quorum."
>>>
>>> I took a few minutes to think about this and the only conclusion I came
>>> to is that when r=1 you will treat the first response as the final response,
>>> and in this case the notfound response will always come back first?  I'm not
>>> sure if what I just said makes sense but I would have expected r=1 to work,
>>> just like the OP.  I'll admit that I still haven't read all the wiki docs
>>> yet (but I've read Read Repair 3 times now), so I'd be happy to hear RTFM.
>>>
>>
>> A number of months ago, we ran into some issues with a cluster where "not
>> found" responses were not returning in a reasonable amount of time,
>> especially when R=1. That is, the requests took MUCH longer than a
>> SUCCESSFUL read. We determined that this occurred because one of the
>> partitions was too busy to reply, causing the request timeout to expire.  So
>> we added a special case called "basic quorum" (n_val/2 + 1) that is invoked
>> only when receiving a "not found" response from a replica.  The idea is that
>> if a simple majority of the replica partitions report "not found", it's
>> probably not there.  This way, you don't sit around waiting for the last
>> lonely partition to reply when R=1 (and your successful reads are still fast
>> because you only wait for one replica).  It's a tradeoff of availability:
>> returning a potentially incorrect response vs. appearing unavailable (timing
>> out). We chose the former.
>>
>> Hope that helps,
>>
>> Sean Cribbs 
>> Developer Advocate
>> Basho Technologies, Inc.
>> http://basho.com/
>>
>>
>>
>>
>>
>>
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>>
>


Fwd: Riak Recap for Jan. 12 - 13.

2011-01-14 Thread Neville Burnell
oops, forgot to include the list

>> This sucks, so as an optimization you decided that if a majority of the
cluster thinks it's X, well then it must be X!  I'm not sure I explained
that well, but I'm sure I understand it now :)

All true, except when Riak knows X is out of date, and Y is the correct
value, and forces read repair with Y, but still returns X, which sucks,
because a KV can be Found, Not Found, then Found again, when you need it
to be continuously found.

Note, I don't mind that an out-of-date version is returned by Riak - that I
can handle. But a 404 for a KV that actually exists is a problem for me.



 On 15 January 2011 11:49, Ryan Zezeski  wrote:

>
>
> On Fri, Jan 14, 2011 at 7:12 PM, Sean Cribbs  wrote:
>
>>
>>
>> Crap, the second after I hit "send" the lightbulb goes on!  Why is that?
>>
>> The quorum _was_ met (all vnodes just migrated to the one machine) but
>> since some of them were fail-overs they didn't have the value yet (or the
>> wrong value)?  In this case a read repair happened and subsequent gets
>> worked.
>>
>>
>> Your understanding is correct. However, when I say "quorum was met" I
>> usually mean that "it had R successful replies". Minor semantic quibble.
>>
>> You are correct in saying that the wiki is misleading -- read repair
>> happens when any successful reply reaches the FSM, even if "not found" was
>> returned to the client, that is, if quorum was not met. We'll get that
>> fixed.
>>
>> I'm still in the dark on the second question.
>>
>>
>>> 2) Why doesn't r=1 work?
>>>
>>> In the IRC session, you claimed that r=1 would not have helped this
>>> problem.  Just like the OP, this confused me.  You then went on to say it
>>> was because of some optimization and then mentioned a "basic quorum."
>>>
>>> I took a few minutes to think about this and the only conclusion I came
>>> to is that when r=1 you will treat the first response as the final response,
>>> and in this case the notfound response will always come back first?  I'm not
>>> sure if what I just said makes sense but I would have expected r=1 to work,
>>> just like the OP.  I'll admit that I still haven't read all the wiki docs
>>> yet (but I've read Read Repair 3 times now), so I'd be happy to hear RTFM.
>>>
>>
>> A number of months ago, we ran into some issues with a cluster where "not
>> found" responses were not returning in a reasonable amount of time,
>> especially when R=1. That is, the requests took MUCH longer than a
>> SUCCESSFUL read. We determined that this occurred because one of the
>> partitions was too busy to reply, causing the request timeout to expire.  So
>> we added a special case called "basic quorum" (n_val/2 + 1) that is invoked
>> only when receiving a "not found" response from a replica.  The idea is that
>> if a simple majority of the replica partitions report "not found", it's
>> probably not there.  This way, you don't sit around waiting for the last
>> lonely partition to reply when R=1 (and your successful reads are still fast
>> because you only wait for one replica).  It's a tradeoff of availability:
>> returning a potentially incorrect response vs. appearing unavailable (timing
>> out). We chose the former.
>>
>> Hope that helps,
>>
>
>
> Reading your explanation made me realize it's because I'm mucking up
> the semantics of "quorum."  It was previously my understanding that if R=1
> then you only need a quorum of 1 vnode, where a quorum is simply defined as
> a response.  Which would mean that the first reply (whether notfound or a
> value) would be considered the cluster value.  However, as you subtly hinted
> to above, quorum does not mean that, i.e. it's more than just a response.
>  It's that R vnodes found _a_ value and agreed on its contents.  Going back
> to the case of R=1, N=3, and the value is missing on 2 of its preferred
> vnodes it means that the request will take as long as the longest vnode to
> respond, even if 2 vnodes reply immediately with no value.  This sucks, so
> as an optimization you decided that if a majority of the cluster thinks it's
> X, well then it must be X!  I'm not sure I explained that well, but I'm sure
> I understand it now :)
>
>
> -Ryan
>
>
>
>


Re: Getting all the Keys

2011-01-22 Thread Neville Burnell
>As of Riak 0.14 your m/r can filter on key name. I would highly recommend
that your data architecture take this into account by using keys that have
meaningful names.

>>This will allow you to not scan every key in your cluster.
Is this part true?

I understood that key filtering just means you don't have to fetch the
'value' from the backend (bitcask or innostore). How would it help with
scanning every key? Without a 'secondary index/set' somewhere, you would
still need to scan every key in the cluster to find all the keys that match
your filter.
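
That reading can be sketched as follows (a simplification under my own assumptions, not Basho internals): the filter is applied while folding over the keyspace, so it avoids fetching *values* from the backend but not the key scan itself.

```python
def list_keys_with_filter(backend, key_filter):
    """Model key filtering as a predicate applied during a full key fold."""
    scanned = 0
    matched = []
    for key in backend:            # full key scan, filter or not
        scanned += 1
        if key_filter(key):
            matched.append(key)    # the value is never fetched
    return matched, scanned

backend = {"user_1": "...", "user_2": "...", "order_9": "..."}
matched, scanned = list_keys_with_filter(backend,
                                         lambda k: k.startswith("user_"))
print(matched)   # ['user_1', 'user_2']
print(scanned)   # 3 -- every key was still examined
```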

Kind Regards

Nev

On 23 January 2011 03:31, Alexander Sicular  wrote:

> Hi Thomas,
>
> This is a topic that has come up many times. Lemme just hit a couple of
> high notes in no particular order:
>
> - If you must do a list keys op on a bucket, you must must must use
> "?keys=stream". True will block on the coordinating node until all nodes
> return their keys. Stream will start sending keys as soon as the first node
> returns.
>
> - "list keys" is one of the most expensive native operations you can
> perform in Riak. Not only does it do a full key scan of all the keys in your
> bucket, but all the keys in your cluster. It is obnoxiously expensive and
> only more so as the number of keys in your cluster grows. There have been
> discussions about changing this but everything comes with a cost (more open
> file descriptors) and I do not believe a decision has been made yet.
>
> -Riak is in no way a relational system. It is, in fact, about as opposite
> as you can get. Incidentally, "select *" is generally not recommended in the
> Kingdom of Relations and regarded as wasteful. You need a bit of a mind
> shift from relational world to have success with nosql in general and Riak
> in particular.
>
> -There are no native indices in Riak. By default Riak uses the bitcask
> backend. Bitcask has many advantages but one disadvantage is that all keys
> (key length + a bit of overhead) must fit in ram.
>
> -Do not use "?keys=true". Your computer will melt. And then your face.
>
> -As of Riak 0.14 your m/r can filter on key name. I would highly recommend
> that your data architecture take this into account by using keys that have
> meaningful names. This will allow you to not scan every key in your cluster.
>
> -Buckets are analogous to relational tables but only just. In Riak, you can
> think of a bucket as a namespace holder (it is used as part of the default
> circular hash function) but primarily as a mechanism to differentiate system
> settings from one group of keys to the next.
>
> -There is no penalty for unlimited buckets except for when their settings
> deviate from the system defaults. By settings I mean things like hooks,
> replication values and backends among others.
>
> -One should list keys by truth if one enjoys sitting in parking lots on the
> freeway on a scorching summers day or perhaps waiting in a TSA line at your
> nearest international point of embarkation surrounded by octomom families
> all the while juggling between the grope or the pr0n slideshow. If that is
> for you, use "?keys=true".
>
> -Virtually everything in Riak is transient. Meaning, for the most part (not
> including the 60 seconds or so of m/r cache), there is no caching going on
> in Riak outside of the operating system. Ie. your subsequent queries will do
> more or less the same work as their predecessors. You need to cache your own
> results if you want to reuse them... quickly.
>
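
The blocking-vs-streaming distinction in the first bullet above can be sketched like this (a hypothetical model, not the actual wire protocol): `?keys=true` gathers every node's keys before replying, while `?keys=stream` yields each node's batch as it arrives.

```python
def keys_true(nodes):
    """Blocking listing: nothing is returned until the last node answers."""
    all_keys = []
    for node_keys in nodes:
        all_keys.extend(node_keys)
    return all_keys

def keys_stream(nodes):
    """Streaming listing: a generator, so the first node's keys are
    available before slower nodes have answered."""
    for node_keys in nodes:
        yield from node_keys

nodes = [["a", "b"], ["c"], ["d", "e"]]   # per-node key batches
stream = keys_stream(nodes)
print(next(stream))                       # 'a' -- first key, no waiting
print(keys_true(nodes))                   # ['a', 'b', 'c', 'd', 'e']
```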
>
>
> Oh, there's more but I'm pretty jelloed from last night. Welcome to the
> fold, Thomas. Can I call you Tom?
>
> Cheers,
> -Alexander Sicular
>
> @siculars
>
> On Jan 22, 2011, at 10:19 AM, Thomas Burdick wrote:
>
> > I've been playing around with riak lately as really my first usage of a
> distributed key/value store. I quite like many of the concepts and
> possibilities of Riak and what it may deliver, however I'm really stuck on
> an issue.
> >
> > Doing the equivalent of a select * from sometable in riak is seemingly
> slow. As a quick test I tried...
> >
> > http://localhost:8098/riak/mytable?keys=true
> >
> > Before even iterating over the keys this was unbearably slow already.
> This took almost half a second on my machine where mytable is completely
> empty!
> >
> > I'm a little baffled, I would assume that getting all the keys of a table
> is an incredibly common task?  How do I get all the keys of a table quickly?
> By quickly I mean a few milliseconds or less as I would expect of even a
> "slow" rdbms with an empty table, even some tables with 1000's of items can
> get all the primary keys of a sql table in a few milliseconds.
> >
> > Tom Burdick
> >
> > ___
> > riak-users mailing list
> > riak-users@lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>