Register Jackson Module with Riak-Java

2013-03-06 Thread Justin Long
Hello Riak-Java users.

Just trying to figure out how I can register the Scala Jackson module with
Jackson so that when Riak goes to convert objects to be saved in the
database, it will pick up the registered module. The README at
https://github.com/FasterXML/jackson-module-scala makes it look rather
simple, but I have a feeling there's more to it when using Riak-Java.

Anyone have any experience with this?

Thanks!

p.s.: take a look at
http://stackoverflow.com/questions/15236140/jackson-cannot-map-scala-list-in-riak-client
if you need some background.
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Wiping data directory break vnodes with read_block_error

2013-03-06 Thread Jonas Lindmark
Hi, I tested this, and ultimately only wiping the leveldb and ring
directories helped. I modified my wipe script to:

riak stop
rm -rf /var/lib/riak/ring/*
rm -rf /var/lib/riak/leveldb/*
riak start
riak-admin wait-for-service riak_kv riak@127.0.0.1

Which worked like a charm.
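The sequence above can be consolidated into a small guarded script. This is a sketch under assumptions: the data directory (`/var/lib/riak`), the leveldb backend directory, and the node name are taken from the thread and should be adjusted to match your own app.config.

```shell
#!/bin/sh
# Sketch of the wipe sequence above. RIAK_LIB, the backend directory
# (leveldb here), and the node name are assumptions from the thread.
RIAK_LIB="${RIAK_LIB:-/var/lib/riak}"

wipe_dir() {
    # Guard against an empty or root path so a bad variable expansion
    # can never turn into `rm -rf /*`.
    case "$1" in
        ""|"/") echo "refusing to wipe '$1'" >&2; return 1 ;;
    esac
    rm -rf "${1:?}"/*
}

# Only run the full sequence where a riak install actually exists.
if command -v riak >/dev/null 2>&1; then
    riak stop
    wipe_dir "$RIAK_LIB/ring"
    wipe_dir "$RIAK_LIB/leveldb"
    riak start
    riak-admin wait-for-service riak_kv riak@127.0.0.1
fi
```

The guard in `wipe_dir` matters because the contents of these directories are destroyed unrecoverably; the `wait-for-service` call at the end blocks until riak_kv is up, as in Jonas's script.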

Regards
Jonas


On 1 March 2013 20:46, Jonas Lindmark  wrote:

> Thanks for the advice, I'll be sure to try it out soon and report back.
>
> /jonas
> On Feb 28, 2013 7:01 PM, "Brian Shumate"  wrote:
>
>> Hello Jonas,
>>
>> I think the issue is that you actually wiped out too much by removing
>> the entire /var/lib/riak directory.
>>
>> Typically when resetting the data directories on a Riak node, you'd remove
>> the contents of only those backend directories which correspond to the
>> ones
>> defined in your app.config.
>>
>> For example, if you are using the Bitcask backend, you'd want to only do
>> something like:
>>
>> riak stop
>> sudo rm -rf /var/lib/riak/bitcask/*
>> riak start
>>
>> This removes all Bitcask data but preserves the node's ring state.
>>
>> If you wanted to reset _both_ the Bitcask data and the ring state you'd
>> do:
>>
>> riak stop
>> sudo rm -rf /var/lib/riak/bitcask/*
>> sudo rm -rf /var/lib/riak/ring/*
>> riak start
>>
>> This would essentially reset the node back to a new state while retaining
>> any configuration changes you already specified in app.config or vm.args.
>>
>> I hope this helps.
>>
>> Regards,
>>
>> Brian Shumate
>>
>> On Feb 28, 2013, at 2:55 AM, Jonas Lindmark wrote:
>>
>> > Hi,
>> >
>> > I'm looking for a way to manually wipe the data off of a single-node
>> riak installation. I'm using riak (1.3.0 2013-02-19) Debian x86_64 with the
>> riak_kv_eleveldb_backend.
>> >
>> > What I'm currently doing is:
>> >
>> >   • riak stop
>> >   • rm -rf /var/lib/riak
>> >   • mkdir riak
>> >   • chown riak:riak riak
>> >   • start riak
>> > Riak starts ok but a riak-admin vnode-status gives me:
>> >
>> > Backend: riak_kv_eleveldb_backend
>> > Status:
>> > [{stats,<<"   Compactions\nLevel  Files Size(MB) Time(sec) Read(MB) Write(MB)\n--\n">>},
>> >  {read_block_error,<<"0">>}]
>> >
>> > For each vnode.
>> >
>> > It seems to me that there is something more I need to wipe for this to
>> work but I can't find which files that would be.
>> >
>> > /Jonas


-- 
/Jonas


after raising n_val, all keys exist multiple times in ?keys=true

2013-03-06 Thread Simon Effenberg
Hi,

we changed the n_val of a bucket from 3 to 12. If we are now doing this:

riak:8098/riak/config?keys=true
or
riak:8098/buckets/config/keys?keys=true

we get some keys multiple times. Fetching the content itself works
fine, but we can't rely on the key listing (or have to sort/uniq its output).

Is this normal behavior or is it a bug? (We're running Riak 1.3.0 with
eleveldb as the backend.)
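Until the duplicates are explained, a client-side workaround is to de-duplicate the listing before using it. A hedged sketch: the naive JSON handling below assumes plain string keys (no embedded quotes or commas), and the host/port in the commented invocation are from the thread, not verified.

```shell
# Naive de-duplication of a {"keys":[...]} listing. Assumes plain
# string keys; not a general JSON parser.
extract_keys() {
    # One quoted string per line, drop the "keys" field name itself,
    # then collapse duplicates.
    tr ',' '\n' | sed -n 's/.*"\([^"]*\)".*/\1/p' | grep -v '^keys$' | sort -u
}

# Hypothetical invocation against a running node:
#   curl -s 'http://riak:8098/buckets/config/keys?keys=true' | extract_keys
```

For anything beyond a quick check, a real JSON tool (e.g. jq, if installed) would be a safer choice than the `tr`/`sed` pipeline.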

Cheers,
Simon



Re: Protection from listing buckets and other rogue queries

2013-03-06 Thread John Daily
I've been asking our engineers for help with this, and here's what I've found...

In theory, you could use the standard OTP mechanism[1] via the console to stop 
and restart the riak_kv application to stop a (non-MapReduce) query, but that 
has not worked across all versions of Riak. The rolling cluster restart seems 
to be your best bet.


Regarding the HTTP API...

I do not recommend doing this without careful testing, because it doesn't 
appear to be something we or our customers have done, but you should be able to 
comment out the http listener configuration item under riak_core in app.config 
on each node and then restart. In my very, very limited testing, I did have one 
node that seemed to die quietly after making the change, but I've not been able 
to reproduce it and the logs didn't have anything useful.
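For reference, the item John describes lives in the riak_core section of app.config. A rough, untested sketch of the change (the listener address and port shown are the defaults; heed the caution above about testing carefully):

```erlang
%% In app.config, under riak_core: comment out the http listener on
%% each node, then restart. Address/port shown are the defaults.
{riak_core, [
    %% {http, [ {"127.0.0.1", 8098} ]}

    %% ...other riak_core settings unchanged...
]},
```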

The administrative console appears to work properly despite the change; the 
only functionality you should lose would be link walking.


It is possible to disable the Webmachine routes that expose bucket and key 
lists, but there's no built-in way to make that configuration change persist 
across restarts, so I haven't looked into the mechanism for that.


[1] application:stop(riak_kv), followed by application:start(riak_kv)

-John Daily
Technical Evangelist
jda...@basho.com



On Mar 5, 2013, at 10:01 AM, Chris Read  wrote:

> Greetings all...
> 
> We have had a situation where someone ran a List Buckets query on the HTTP 
> interface on a large cluster. This caused the whole system to become 
> unresponsive and the only way we could think of to stop the load on the 
> system was to restart the whole cluster. 
> 
> I've spent the morning trawling through the documentation, and can't find 
> answers to the following:
> 
> - Is it possible, and if so how does one go about terminating a query like 
> this? I can see things like Pid in riak-admin top, what can I do with it?
> - Can we run useful things over HTTP like the /admin console but have the 
> rest of the HTTP API disabled?
> 
> Thanks,
> 
> Chris




Re: Protection from listing buckets and other rogue queries

2013-03-06 Thread John E. Vincent
Just a thought, but we've been working on disabling certain API
operations at the proxy level. We have a subsystem that uses Riak that
should NEVER see a DELETE call, ever, and we're planning on
guaranteeing that by blocking it at the proxy.

Combined with the actual nodes being reachable only through the
proxy, it seems like this would work. Fair warning: I haven't actually
DONE this yet; it's just on my todo list.
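The proxy-level blocking described above could be sketched with nginx. This is a hypothetical, untested config: the upstream port is Riak's default HTTP port, and the allowed-method list is an assumption to adapt per subsystem.

```nginx
# Hypothetical nginx front end for a Riak node (untested sketch).
upstream riak_node {
    server 127.0.0.1:8098;   # default Riak HTTP port
}

server {
    listen 80;

    location / {
        # Reject every method not listed (403), so DELETE never
        # reaches Riak through this proxy.
        limit_except GET HEAD PUT POST {
            deny all;
        }
        proxy_pass http://riak_node;
    }
}
```

This only helps if, as noted above, the Riak nodes themselves are unreachable except through the proxy.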




Re: after raising n_val, all keys exist multiple times in ?keys=true

2013-03-06 Thread Mark Phillips
Hey Simon,

A few questions for starters:

How many Riak nodes are in your cluster?
Is your R still the default of 2?
Out of curiosity, what's the use case for an n_val of 12? I think the
highest I've ever seen is 5 or 6. :)

Mark




[ANN] Yokozuna 0.4.0

2013-03-06 Thread Ryan Zezeski
Hello Riak Users,

Today I'm pleased to announce the 0.4.0 release of Yokozuna.  This is a
small release in terms of features (there are no new ones), but an
important release for the reasons enumerated below.

* Performance improvements to Solr's distributed search, which in turn
improve the performance of Yokozuna queries [1] [2] [3].

* This release is based on Riak 1.3.0; it is essentially Riak 1.3.0
with the Yokozuna bits added to it.

* Yokozuna has moved from my personal GitHub account into the Basho
organization.  The prototype status is still in effect but this is a very
important step towards the goal of merging Yokozuna into Riak proper.

release notes:

https://github.com/basho/yokozuna/blob/v0.4.0/docs/RELEASE_NOTES.md

instructions to deploy on ec2:

https://github.com/basho/yokozuna/blob/c3a1cad34f65f1f5f1d416f3f25b2ab5254a583a/docs/EC2.md

source package:

http://s3.amazonaws.com/yzami/pkgs/src/riak-yokozuna-0.4.0-src.tar.gz

-Z

[1] Yokozuna pull-request: https://github.com/basho/yokozuna/pull/26

[2] Upstream patch to Solr: https://issues.apache.org/jira/browse/SOLR-4509

[3] I discuss this change in depth:
http://www.zinascii.com/2013/solr-distributed-search-and-the-stale-check.html


Re: Riak 1.3 wont start on Macos

2013-03-06 Thread István
I did not.

uname -r
11.4.2

I.


On Fri, Mar 1, 2013 at 9:09 AM, Mark Phillips  wrote:

> Hey Istvan,
>
> Did you happen to figure this out? I may have missed this, but what
> version of osx?
>
> Mark
>
>
> On Thursday, February 28, 2013, István wrote:
>
>> Hi Riak Users,
>>
>> Here is what I have:
>>
>> ::riak]ulimit -a
>> core file size  (blocks, -c) 0
>> data seg size   (kbytes, -d) unlimited
>> file size   (blocks, -f) unlimited
>> max locked memory   (kbytes, -l) unlimited
>> max memory size (kbytes, -m) unlimited
>> open files  (-n) 131072
>> pipe size(512 bytes, -p) 1
>> stack size  (kbytes, -s) 8192
>> cpu time   (seconds, -t) unlimited
>> max user processes  (-u) 709
>> virtual memory  (kbytes, -v) unlimited
>>
>> erl -v
>> Erlang R15B03 (erts-5.9.3.1) [source] [64-bit] [smp:4:4]
>> [async-threads:0] [hipe] [kernel-poll:false] [dtrace]
>>
>> The error:
>>
>> 2013-02-28 14:24:38.594 [error] <0.84.0> gen_server memsup terminated
>> with reason: maximum number of file descriptors exhausted, check ulimit -n
>> 2013-02-28 14:24:38.603 [error] <0.84.0> CRASH REPORT Process memsup with
>> 0 neighbours exited with reason: maximum number of file descriptors
>> exhausted, check ulimit -n in gen_server:terminate/6 line 747
>> 2013-02-28 14:24:38.612 [info] <0.994.0>@riak_kv_app:prep_stop:165
>> Stopping application riak_kv - marked service down.
>> 2013-02-28 14:24:38.619 [error] <0.82.0> Supervisor os_mon_sup had child
>> memsup started with memsup:start_link() at <0.84.0> exit with reason
>> maximum number of file descriptors exhausted, check ulimit -n in context
>> child_terminated
>> 2013-02-28 14:24:38.626 [info] <0.994.0>@riak_kv_app:prep_stop:169
>> Unregistered pb services
>> 2013-02-28 14:24:38.637 [error] <0.4225.0> gen_server memsup terminated
>> with reason: maximum number of file descriptors exhausted, check ulimit -n
>>
>> I guess 131072 files should be enough, this must be something else.
>>
>> Does anybody have an idea what?
>>
>> thank you in advance,
>>
>> Istvan
>> --
>> the sun shines for all
>>
>>
>>


-- 
the sun shines for all


Re: Protection from listing buckets and other rogue queries

2013-03-06 Thread Chris Read
Thanks Johns...

Our main interface into Riak is via Erlang, we just like the HTTP for the
Big Green Check mark that tells us the cluster is ok.

What we'll probably do for now then is turn off the HTTP(S) interfaces and
write a little clone of sorts for the status.

It would be good if at least one of these could make it into the
backlog - being able to accidentally DoS yourself is a bit of a weakness...

Chris



Re: {error, notfound} items in mapreduce during ownership handoff

2013-03-06 Thread Jeremy Raymond
So I should expect {error, notfound} inputs to map jobs while ownership
handoff is in progress? Are the not found items actually unavailable during
handoff or is this just not found on the old node, but will be picked up by
the new node during the same mapreduce job?

--
Jeremy


On Thu, Feb 28, 2013 at 11:28 AM, Jeremy Raymond wrote:

> Yesterday I added a new node to my cluster. During the time when ownership
> handoff was happening (several hours of work) mapreduce map functions were
> receiving {error, notfound} as inputs. My Erlang mapred functions weren't
> designed to handle this. They hadn't encountered this before during normal
> operation. After the ownership handoff process completed the {error,
> notfound} inputs have stopped.
>
> Any explanations for the {error, notfound} inputs during ownership
> handoff? Is this because a node is attempting to process an object now
> moved to another node? If this is the case would the notfound object be
> found on the other node in the same mapreduce job (i.e. still visible to
> the overall mapred process)? Should I assume {error, notfound} inputs for
> all mapred jobs as a valid possible input and always handle it?
>
> I've also accumulated about 50MB of "Pipe worker startup failed:fitting
> was gone before startup" on each node during the ownership_transfer
> process. These messages are benign?
>
> Thanks a lot.
>
> --
> Jeremy
>


Re: {error, notfound} items in mapreduce during ownership handoff

2013-03-06 Thread John Daily
I'm sorry this went unanswered, Jeremy.  Thanks for the follow up.

Your code needs to be able to handle notfound errors and tombstones[1] 
regardless of ownership handoff. The coverage for the 2i or listkeys input will 
be calculated up front, with the work distributed to a node where the key is 
expected to be found, but it's always possible that the node selected may not 
have the data you want due to network or system errors that haven't yet healed.

Both the notfound and the fitting error in the logs (which is benign, and 
related to the notfound problem) should be less common under Riak 1.3, although 
ownership handoff will still exacerbate the problem.

[1] https://github.com/basho/riak_kv/issues/358

-John Daily
Technical Evangelist
jda...@basho.com





Re: {error, notfound} items in mapreduce during ownership handoff

2013-03-06 Thread Jeremy Raymond
Thanks for the response. To handle not founds and tombstones I need this in
every map function?

map_something({error, notfound}, _, _) ->
    [];
map_something(RiakObj, _, _) ->
    Metadata = riak_object:get_metadata(RiakObj),
    case dict:is_key(<<"X-Riak-Deleted">>, Metadata) of
        true ->
            [];  % I have a tombstone
        false ->
            [ok] % I have a valid item
    end.

Is there a better or built-in way to do this filtering?

--
Jeremy




Question regarding support for images

2013-03-06 Thread Bryan Hughes

Hi Everyone,

I am building a new 5-node cluster with 1.3.0 and am transitioning from
Bitcask to LevelDB (or perhaps a Multi backend with LevelDB as the main),
which is all well understood.  My question concerns image data and
other large binary data: is one backend better than the other in terms
of the size of binary values that can be stored, as well as read
performance?  I recall Mark suggesting to limit binary data to 10MB.

I am curious where this limitation comes from.

Thanks,
Bryan

--

Bryan Hughes
*Go Factory*
http://www.go-factory.net



Re: {error, notfound} items in mapreduce during ownership handoff

2013-03-06 Thread John Daily
Disclaimer: I haven't done this myself, or even attempted to compile the code 
below, but in hopes it makes your life a little easier...

Here's some Erlang that should allow you to abstract away the checks. Assuming 
that you have a function named "real_map_fun" that takes a single Riak object 
as an argument, you should be able to reuse "check_found" and "check_tombstone" 
below and wrap the call to "real_map_fun" in a function like "map_something".

map_something(Obj, _, _) ->
    check_found(Obj, fun real_map_fun/1).


check_found({error, notfound}, _Fun) ->
    [];
check_found(Obj, Fun) ->
    check_tombstone(dict:is_key(<<"X-Riak-Deleted">>,
                                riak_object:get_metadata(Obj)),
                    Obj, Fun).


check_tombstone(true, _Obj, _Fun) ->
    [];
check_tombstone(false, Obj, Fun) ->
    Fun(Obj).

-John





Re: Question regarding support for images

2013-03-06 Thread Matthew Von-Maszewski
Just curious, what is the typical size and the overall range of sizes for your 
image data?

Matthew




Re: Question regarding support for images

2013-03-06 Thread Bryan Hughes

Hi Matthew,

Right now we are storing images captured from mobile phones in Riak.  So 
far it works really well as the image sizes range from a few hundred K 
to a few MB.  But, as camera resolutions increase, so does the data 
size, plus, users are actually interested in the higher resolution 
images if they decide to print them.


What I am hoping to understand is the largest "chunk" size that a binary 
value should have, and whether leveldb would be better or worse 
than bitcask.  For example, if it is as Mark suggested where the optimal 
size would be 10MB, that would mean that we would be chunking a 50MB 
binary object into 5 concurrent writes.  For us, this is actually a very 
good solution since our platform does not actually operate on any of the 
data it is persisting and can achieve the scale necessary.
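As a rough illustration of that chunking scheme (the key layout, the manifest, and the in-memory `bucket` dict are all hypothetical stand-ins; a real implementation would go through a Riak client), a minimal sketch:

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 10 * 1024 * 1024  # 10 MB, per Mark's suggested ceiling

def split_chunks(data, chunk_size=CHUNK_SIZE):
    """Split a binary blob into fixed-size chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def store_chunked(bucket, key, data, chunk_size=CHUNK_SIZE):
    """Write each chunk under a derived key, concurrently, plus a manifest."""
    chunks = split_chunks(data, chunk_size)
    with ThreadPoolExecutor(max_workers=max(1, len(chunks))) as pool:
        for n, chunk in enumerate(chunks):
            # In real code this submit would be a client PUT to Riak.
            pool.submit(bucket.__setitem__, f"{key}:chunk:{n}", chunk)
    # Manifest written last, after all chunk writes have completed.
    bucket[f"{key}:manifest"] = {"chunks": len(chunks), "size": len(data)}

def fetch_chunked(bucket, key):
    """Reassemble the blob by reading the manifest, then each chunk key."""
    manifest = bucket[f"{key}:manifest"]
    return b"".join(bucket[f"{key}:chunk:{n}"]
                    for n in range(manifest["chunks"]))
```

The manifest-last ordering matters: a reader that finds the manifest can assume every chunk it names is already present.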


A little bit of background - we have developed a novel message queue 
architecture/platform in Erlang that is optimized for mobile devices 
that is currently in private beta (will be opening it up to a public 
beta around the end of March).  We provide an SDK that allows mobile 
developers access to light weight services running on our platform that 
gives instant group collaboration functionality ranging from 
group/device discovery, group formation and management, to real-time 
chat (feature set is actually pretty comprehensive).  The SDK allows 
developers to drop our service into any existing or new project with 
just a few lines of code, which means that the binary data will be 
whatever the developer decides to push along the wire, from images to 
audio to even video clips.


If you are interested, here is a link to our docs: 
http://developer.go-factory.net/


Also note, I understand that the larger the data size, the greater the 
on-the-wire cost within the cluster: for example, if the chunk size is 
10MB, with an n_val of 3, that is 30MB on the wire for each chunk, and with 
5 chunks, that comes out to 150MB.  For us that is less of an issue, as 
we host our own service on dedicated dual-ported gigabit hardware 
co-located at two major data centers.
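That wire-cost arithmetic can be written down as a tiny helper (an illustrative upper bound, assuming every chunk counts at the full chunk size; the function name is made up for this sketch):

```python
def wire_cost_bytes(object_size, chunk_size, n_val):
    """Upper bound on total bytes replicated across the cluster for one
    chunked object: every chunk is written n_val times."""
    num_chunks = -(-object_size // chunk_size)  # ceiling division
    return num_chunks * chunk_size * n_val

MB = 1024 ** 2
# A 50MB object in 10MB chunks with n_val=3: 5 chunks * 10MB * 3 = 150MB.
cost = wire_cost_bytes(50 * MB, 10 * MB, 3)
```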


Does this help?

Cheers,
Bryan


On 3/6/13 4:14 PM, Matthew Von-Maszewski wrote:
Just curious, what is the typical size and the overall range of sizes 
for your image data?


Matthew

On Mar 6, 2013, at 6:08 PM, Bryan Hughes wrote:



Hi Everyone,

I am building a new 5 node cluster with 1.3.0 and am transitioning 
from Bitcask to LevelDB (or perhaps a Multi with LevelDB being the 
main) which is all well understood.  My question is regarding image 
data, and other large binary data: Is one better than the other in 
terms of the size of binary data that can be stored, as well as 
performance of reads?  I recall Mark suggesting to limit binary data 
to 10MB.


I am curious where this limitation comes from?

Thanks,
Bryan

--

Bryan Hughes
*Go Factory*
http://www.go-factory.net





--

Bryan Hughes
CTO and Founder / *Go Factory*
(415) 515-7916
http://www.go-factory.net

/"Art is never finished, only abandoned. - Leonardo da Vinci"/




Re: Question regarding support for images

2013-03-06 Thread Matthew Von-Maszewski
Bryan,

My job is to test and optimize leveldb.  I do not have a test case with objects 
of your size, hence my asking for the specifics.

I do not have any data at this time to help with your question.  But I will add 
your parameters to my standard testing and see what I can find / optimize.  I 
am not currently performing comparison work against bitcask and must leave 
those opinions to others.

Matthew


On Mar 6, 2013, at 8:15 PM, Bryan Hughes  wrote:

> Hi Matthew,
> 
> Right now we are storing images captured from mobile phones in Riak.  So far 
> it works really well as the image sizes range from a few hundred K to a few 
> MB.  But, as camera resolutions increase, so does the data size, plus, users 
> are actually interested in the higher resolution images if they decide to 
> print them.
> 
> What I am hoping to understand is the largest "chunk" size that a binary value 
> should have, and whether leveldb would be better or worse than bitcask. 
>  For example, if it is as Mark suggested where the optimal size would be 
> 10MB, that would mean that we would be chunking a 50MB binary object into 5 
> concurrent writes.  For us, this is actually a very good solution since our 
> platform does not actually operate on any of the data it is persisting and 
> can achieve the scale necessary.
> 
> A little bit of background - we have developed a novel message queue 
> architecture/platform in Erlang that is optimized for mobile devices that is 
> currently in private beta (will be opening it up to a public beta around the 
> end of March).  We provide an SDK that allows mobile developers access to 
> light weight services running on our platform that gives instant group 
> collaboration functionality ranging from group/device discovery, group 
> formation and management, to real-time chat (feature set is actually pretty 
> comprehensive).  The SDK allows developers to drop our service into any 
> existing or new project with just a few lines of code, which means that the 
> binary data will be whatever the developer decides to push along the wire, 
> from images to audio to even video clips.   
> 
> If you are interested, here is a link to our docs: 
> http://developer.go-factory.net/
> 
> Also note, I understand that the larger the data size, the more on-the-wire 
> cost within the cluster, for example if the data size is 10MB, with n_val 
> of 3, that is 30MB on the wire for each chunk, and with 5 chunks, that comes 
> out to 150MB.  For us that is less of an issue as we host our own service on 
> dedicated dual ported gigabit hardware co-located at two major data centers.  
> 
> Does this help?
> 
> Cheers,
> Bryan
> 
> 
> On 3/6/13 4:14 PM, Matthew Von-Maszewski wrote:
>> Just curious, what is the typical size and the overall range of sizes for 
>> your image data?
>> 
>> Matthew
>> 
>> On Mar 6, 2013, at 6:08 PM, Bryan Hughes  wrote:
>> 
>>> Hi Everyone,
>>> 
>>> I am building a new 5 node cluster with 1.3.0 and am transitioning from 
>>> Bitcask to LevelDB (or perhaps a Multi with LevelDB being the main) which 
>>> is all well understood.  My question is regarding image data, and other 
>>> large binary data: Is one better than the other in terms of the size of 
>>> binary data that can be stored, as well as performance of reads?  I recall 
>>> Mark suggesting to limit binary data to 10MB.
>>> 
>>> I am curious where this limitation comes from?
>>> 
>>> Thanks,
>>> Bryan
>>> 
>>> -- 
>>> Bryan Hughes
>>> Go Factory
>>> http://www.go-factory.net
>>> 
>> 
> 
> -- 
> Bryan Hughes
> CTO and Founder / Go Factory
> (415) 515-7916
> http://www.go-factory.net
> 
> "Art is never finished, only abandoned. - Leonardo da Vinci"
> 



Re: Storing objects during MapReduce phase.

2013-03-06 Thread Mark Phillips
Hi Tiago,

Mikhail Sobelov wrote an Erlang function a while back that'll take the
output of a map phase and store it to a bucket and key.

http://contrib.basho.com/save_reduce.html

Fair warning: it was written a few years back, so you might have to
revise the code a bit, but it's a good starting point.

Mark
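In Riak itself this is written as an Erlang phase function, as in the save_reduce example above. As a language-neutral sketch of the underlying pattern for the K-means case — recompute and persist on the node inside the map phase, ship only a small summary to reduce — here is an illustration with a plain dict standing in for the node-local store (all names are hypothetical, not a real Riak API):

```python
def map_phase(store, key, centroids):
    """Recompute the object's cluster assignment, persist it in place,
    and return only the small (cluster, point) pair the reduce needs."""
    obj = store[key]
    point = obj["point"]
    nearest = min(range(len(centroids)),
                  key=lambda i: sum((p - c) ** 2
                                    for p, c in zip(point, centroids[i])))
    obj["cluster"] = nearest  # updated where the data lives, not shipped back
    store[key] = obj
    return (nearest, point)

def reduce_phase(pairs, k, dim):
    """Aggregate per-cluster sums of points into new centroids."""
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for cluster, point in pairs:
        counts[cluster] += 1
        for d, v in enumerate(point):
            sums[cluster][d] += v
    return [[s / c if c else 0.0 for s in row]
            for row, c in zip(sums, counts)]
```

The point of the pattern is that only the small pairs cross the wire each iteration; the full objects stay put and are updated in the map phase.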

On Tue, Feb 12, 2013 at 8:32 AM, ttt  wrote:
> Hi everyone!
>
>
> Is it possible to store data into Riak during a Map or Reduce phase?
>
> This is my scenario:
> - I have a simple MapReduce job, with 1 map followed with 1 reduce.
> - I am using the Ruby library and JavaScript to code the phase functions.
> - In the map phase, some properties of the input objects are recalculated
> and then passed on to the reduce phase.
> - I am keeping the results from the map phase, to update the input objects
> as soon as they come back via streaming.
> - The MapReduce job is used in each iteration of an implementation of the
> K-means clustering algorithm.
>
> Since the map phase runs on the nodes where the objects are stored, it would
> be preferable to have them updated by the JavaScript function that
> runs in the map phase, to avoid the data objects being transferred back and
> forth.
>
> Would it be a good idea to install the JavaScript client library on the
> server and make it available to the MapReduce phase functions? Would it
> work?
>
>
> Thanks for your time.
> Best regards.
> Tiago
>
>
>
> --
> View this message in context: 
> http://riak-users.197444.n3.nabble.com/Storing-objects-during-MapReduce-phase-tp4026852.html
> Sent from the Riak Users mailing list archive at Nabble.com.
>
