Hi Dmitri,

Thanks for the clarification. I was pretty sure that was how it would work, so I had planned a different way of migrating to a new backend: I intend to introduce new nodes with the eleveldb backend configured, on the assumption that Riak will move data into that backend as each node joins the cluster. Then I would migrate out the bitcask nodes one by one.

Would this approach work? Or will I need to look at a migration tool?

Matt
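For reference, a minimal sketch of the backend setting such new nodes would carry, assuming Riak 1.x app.config syntax (the data_root path is illustrative, not taken from the thread):

    %% app.config fragment on the new nodes -- a sketch, assuming Riak 1.x
    {riak_kv, [
        {storage_backend, riak_kv_eleveldb_backend}
    ]},
    {eleveldb, [
        {data_root, "/var/lib/riak/leveldb"}   %% illustrative path
    ]},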
On 10 April 2013 00:06, Dmitri Zagidulin <dzagidu...@basho.com> wrote:

> Matt,
>
> Just for clarity - you mention that you plan to move the backend to
> LevelDB before backing up old data. I just want to caution that if you
> simply switch the config setting from Bitcask to LevelDB and restart the
> cluster, Riak does not automatically migrate the data to the new backend
> for you.
>
> Meaning, if you just switch to LevelDB (without backing up data), you'll
> have an empty cluster running on LevelDB, and you'd have no way to access
> the old data in Bitcask. Backing up and restoring data is helpful
> precisely when migrating to a different backend (or to a different ring
> size).
>
> (You probably knew this and have a migration plan in mind already, but I
> just wanted to make sure.)
>
> If you need a good "logical backup" tool, take a look at
> https://github.com/dankerrigan/riak-data-migrator - it's Java-based, but
> it's pretty good at backing up the contents of one or more buckets to
> disk and restoring them afterwards. That is in contrast to the
> "file-based backup" described in
> http://docs.basho.com/riak/latest/cookbooks/Backups/ , which is the
> recommended approach for backing up a production cluster but won't help
> you migrate to a different backend.
>
> Dmitri
>
> On Mon, Apr 8, 2013 at 7:20 PM, Matt Black <matt.bl...@jbadigital.com> wrote:
>
>> All,
>>
>> Huge thanks for your replies. It seems our approach with MapReduce
>> queries has been fundamentally wrong, and I should rewrite my backup
>> script to use sequential GETs. We're currently on the bitcask backend,
>> and a move to eleveldb - with appropriate 2i applied across the whole
>> dataset - is on our roadmap. That looks like the next step, before
>> doing any backup of old data.
>>
>> Matt
>>
>> On 9 April 2013 01:01, Dmitri Zagidulin <dzagidu...@basho.com> wrote:
>>
>>> Matt,
>>>
>>> My recommendation is: don't use MapReduce for this use case. Fetch the
>>> objects via regular Riak GETs (preferably with connection pooling and
>>> multithreading).
>>>
>>> I'm assuming you have a list of keys (either by tracking them
>>> externally to Riak, or via a Secondary Index query or a Search query)
>>> and you want to back up those objects.
>>>
>>> The natural inclination, once you know the keys, is to fetch all of
>>> those objects via a single query, and MapReduce immediately comes to
>>> mind. (To most developers, writing the MR function in JavaScript is
>>> also easier and more familiar than writing it in Erlang.)
>>> Unfortunately, as Christian mentioned, it's very easy for the JS VMs
>>> to run out of resources and crash or time out. In addition, I've found
>>> that rewriting the MapReduce in Erlang buys only a little more
>>> headroom -- once you hit a certain number of keys to fetch, or a
>>> certain object size threshold, even Erlang MR jobs can time out. (Keep
>>> in mind that while the map phase runs in parallel on all of the nodes
>>> in the cluster, all the object values have to be serialized on the
>>> single coordinating node, which becomes the bottleneck.)
>>>
>>> The workaround, even though it might seem counter-intuitive, is: if
>>> you know the list of keys, fetch them using GETs. Even a naive
>>> single-threaded "while loop" over the keys is often faster than a
>>> MapReduce job for this use case, and it doesn't time out. Add
>>> connection pooling and multiple worker threads, and this method is
>>> invariably faster.
>>>
>>> Dmitri
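To illustrate the approach Dmitri describes, here is a minimal single-threaded sketch, assuming the pre-2.0 riak Python client; the bucket name, key source, and output file are placeholders, not taken from the thread:

    import riak

    # Protocol Buffers transport is usually faster than HTTP for bulk reads
    client = riak.RiakClient(host='127.0.0.1', port=8087,
                             transport_class=riak.RiakPbcTransport)
    bucket = client.bucket('cart-products')    # placeholder bucket name

    # Hypothetical helper: keys tracked externally, or gathered via 2i/Search
    keys = load_key_list()

    with open('backup.jsonl', 'w') as out:
        for key in keys:                       # plain GETs -- no MapReduce involved
            obj = bucket.get(key)
            if obj.exists():
                out.write(obj.get_encoded_data() + '\n')

From there, splitting the key list across a handful of worker threads, each with its own client connection, gives the pooled, multithreaded version Dmitri mentions.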
>>> On Mon, Apr 8, 2013 at 4:27 AM, Christian Dahlqvist <christ...@basho.com> wrote:
>>>
>>>> Hi Matt,
>>>>
>>>> If you have a complicated MapReduce job with multiple phases
>>>> implemented in JavaScript, you will most likely see a lot of
>>>> contention for the JavaScript VMs, which will cause problems. While
>>>> you can tune the configuration [1], you may find that you need a very
>>>> large pool size to properly support your job, especially for map
>>>> phases, as these run in parallel.
>>>>
>>>> The best way to speed up the MapReduce job and get around the VM pool
>>>> contention is to implement the MapReduce functions in Erlang.
>>>>
>>>> Best regards,
>>>>
>>>> Christian
>>>>
>>>> [1] http://docs.basho.com/riak/1.2.0/references/appendices/MapReduce-Implementation/#Configuration-Tuning-for-Javascript
>>>>
>>>> --------------------
>>>> Christian Dahlqvist
>>>> Client Services Engineer
>>>> Basho Technologies
>>>> EMEA Office
>>>> E-mail: christ...@basho.com
>>>> Skype: c.dahlqvist
>>>> Mobile: +44 7890 590 910
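The tuning Christian points to in [1] lives in app.config. A sketch with illustrative pool sizes (the Riak 1.2 defaults are 8 map and 6 reduce VMs per node):

    %% app.config fragment -- JavaScript VM pool tuning, a sketch for Riak 1.2.x
    {riak_kv, [
        {map_js_vm_count, 24},      %% illustrative; default is 8
        {reduce_js_vm_count, 18},   %% illustrative; default is 6
        {js_max_vm_mem, 8},         %% heap per VM, in MB (default 8)
        {js_thread_stack, 16}       %% stack per VM, in MB (default 16)
    ]}

As Christian notes, even a generous pool may not be enough for a job with many parallel map phases; moving the functions to Erlang sidesteps the JS VM pool entirely.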
>>>> On 8 Apr 2013, at 08:20, Matt Black <matt.bl...@jbadigital.com> wrote:
>>>>
>>>> Thanks for the reply, Christian.
>>>>
>>>> I didn't explain well enough in my first post: the MapReduce operation
>>>> merely loads a bunch of objects, and a Python script (which makes the
>>>> connection to Riak) then writes those objects to disk. It's probably
>>>> obvious, but I'm using JavaScript phases and the Riak Python client.
>>>>
>>>> The query itself has many map phases, in which a composite object is
>>>> built up from related objects spread across many buckets.
>>>>
>>>> I was hoping there might be some kind of timeout I could adjust on a
>>>> per-map-phase basis - clutching at straws, really.
>>>>
>>>> Cheers
>>>> Matt
>>>>
>>>> On 8 April 2013 17:14, Christian Dahlqvist <christ...@basho.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Without having access to the MapReduce functions you are running, I
>>>>> would assume that a MapReduce job which both writes data to disk and
>>>>> deletes the written records from Riak might be quite slow. This is
>>>>> not really a use case MapReduce was designed for, and when a
>>>>> MapReduce job crashes or times out, it is difficult to know how far
>>>>> along the processing of the different records it got.
>>>>>
>>>>> I would therefore recommend running this type of archive-and-delete
>>>>> job as an external batch process instead, as that will give you
>>>>> better control over the execution and avoid timeout problems.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Christian
>>>>>
>>>>> On 8 Apr 2013, at 00:49, Matt Black <matt.bl...@jbadigital.com> wrote:
>>>>>
>>>>> > Dear list,
>>>>> >
>>>>> > I'm currently getting a timeout during a single phase of a
>>>>> > multi-phase MapReduce query. Is there anything I can do to help it
>>>>> > run to completion?
>>>>> >
>>>>> > Its purpose is to back up and remove objects from Riak; it will run
>>>>> > periodically during quiet times, moving old data out of Riak into
>>>>> > file storage.
>>>>> >
>>>>> > Traceback (most recent call last):
>>>>> >   File "./tools/rolling_backup.py", line 185, in <module>
>>>>> >     main()
>>>>> >   File "./tools/rolling_backup.py", line 181, in main
>>>>> >     args.func(**kwargs)
>>>>> >   File "/srv/backup/tools/mapreduce.py", line 295, in do_map_reduce
>>>>> >     raise e
>>>>> > Exception: {"phase":2,"error":"timeout",
>>>>> > "input":"[<<\"cart-products\">>,<<\"cd67d7f6e2688bc2089e6fa79506ac05-2\">>,{struct,[{<<\"uid\">>,<<\"cd67d7f6e2688bc2089e6fa79506ac05\">>},{<<\"cart\">>,{struct,[{<<\"expired_ts\">>,<<\"2013-03-05T19:12:23.906228\">>},{<<\"last_updated\">>,<<\"2013-03-05T19:12:23.906242\">>},{<<\"tags\">>,{struct,[{<<\"type\">>,<<\"AB\">>}]}},{<<\"completed\">>,false},{<<\"created\">>,<<\"2013-03-04T02:10:18.638413\">>},{<<\"products\">>,[{struct,[{<<\"cost\">>,0},{<<\"bundleName\">>,<<\"Product\">>},...]},...]},...]}},...]}]",
>>>>> > "type":"exit",
>>>>> > "stack":"[{riak_kv_w_reduce,'-js_runner/1-fun-0-',3,[{file,\"src/riak_kv_w_reduce.erl\"},{line,283}]},{riak_kv_w_reduce,reduce,3,[{file,\"src/riak_kv_w_reduce.erl\"},{line,206}]},{riak_kv_w_reduce,maybe_reduce,2,[{file,\"src/riak_kv_w_reduce.erl\"},{line,157}]},{riak_pipe_vnode_worker,process_input,3,[{file,\"src/riak_pipe_vnode_worker.erl\"},{line,444}]},{riak_pipe_vnode_worker,wait_for_input,2,[{file,\"src/riak_pipe_vnode_worker.erl\"},{line,376}]},{gen_fsm,handle_msg,7,[{file,\"gen_fsm.erl\"},{line,494}]},{proc_lib,...}]"}
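On the per-map-phase timeout question above: as far as the documentation goes, Riak's MapReduce timeout applies to the job as a whole, not to individual phases. A hedged sketch of raising it from the pre-2.0 Python client follows; the inputs and map phase are illustrative, and if your client version's run() does not accept a timeout, the same value can be passed as "timeout" in the /mapred request body:

    # Raise the job-level timeout (milliseconds); the default is 60000 (60s).
    # There is no per-phase timeout setting.
    mr = riak.RiakMapReduce(client)
    mr.add_bucket('cart-products')          # illustrative inputs
    mr.map('function(v) { return [v.values[0].data]; }')  # illustrative JS map phase
    results = mr.run(timeout=600000)        # allow up to 10 minutes for the whole job

Note this only buys time; for the archive-and-delete workload itself, the batch-of-GETs approach recommended earlier in the thread avoids the timeout entirely.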
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com