Java client: ConflictResolver for RiakObject, how to get the key?

2015-04-13 Thread Henning Verbeek
I'm in the process of migrating my code from Riak 1.4 to Riak 2.0.

In Riak 2.0, I'm storing binary data as a RiakObject:

RiakObject obj = new RiakObject();
obj.setContentType(CONTENT_TYPE);
obj.setValue(BinaryValue.create(someByteArray));

StoreValue op = new StoreValue.Builder(obj)
    .withLocation(new Location(ns, keyOfObject))
    .withOption(StoreValue.Option.RETURN_BODY, false)
    .build();

A siphash-digest is computed over the byte-array beforehand, and is
stored in a separate object in Riak (I call it 'manifest').

When fetching the binary data, I want to provide a custom
ConflictResolver. This resolver shall fetch the manifest for the binary
data, where it can look up the expected digest; the digest can then be
used to identify and eliminate bad siblings. The resolver can use the
object's key to identify the corresponding manifest.

My problem is: how does the conflict resolver know the key?

In Riak 1.4, I used IRiakObject to transport the data. The key was
available right on the IRiakObject:
public IRiakObject resolve(Collection<IRiakObject> siblings) {
    ...
    String key = siblings.iterator().next().getKey();
    ...
}

In Riak 2.0, the RiakObject does not expose this method. Is it
maybe available in the RiakUserMetadata?

As an alternative, should I maybe create a POJO to encapsulate both the
key (annotated with @RiakKey?) and the byte[] data? I guess I'd need a
custom converter for that, right?

Thanks,
Henning

-- 
My other signature is a regular expression.
http://www.pray4snow.de

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: object sizes

2015-04-13 Thread bryan hunt
Alex,


Maps and Sets are stored just like a regular Riak object, but using a 
particular data structure and object serialization format. As you have 
observed, there is an overhead, and you want to monitor the growth of these 
data structures.

It is possible to write a MapReduce map function (in Erlang) which retrieves a
provided object by type/bucket/id and returns the size of its data. Would such
a thing be of use?

It would not be hard to write such a module, and I might even have some code 
for doing so if you are interested. There are also reasonably good examples in 
our documentation - http://docs.basho.com/riak/latest/dev/advanced/mapreduce

I haven't looked at the Python PB API in a while, but I'm reasonably certain it 
supports the invocation of MapReduce jobs.
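
For illustration, invoking such a job from the Python client might look roughly
like the sketch below. The Erlang module and function names (obj_size,
map_object_size) are hypothetical placeholders for the module described above,
which would first need to be written and deployed to every node, and the
host/port details are placeholders as well.

import riak

client = riak.RiakClient(protocol='pbc', host='127.0.0.1', pb_port=8087)

mr = riak.RiakMapReduce(client)
mr.add('my_bucket', 'my_key')            # input: a single bucket/key pair
mr.map(['obj_size', 'map_object_size'])  # Erlang map phase given as [module, function]
print(mr.run())                          # run the job; the custom phase would return the size in bytes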

Bryan


> On 10 Apr 2015, at 13:51, Alex De la rosa  wrote:
> 
> Also, I forgot, I'm most interested in bucket_types instead of simple Riak 
> buckets, and in being able to see how my mutable data inside a MAP/SET has grown.
> 
> For a traditional standard bucket I can calculate the size of what I'm 
> sending before, so Riak won't get data bigger than 1MB. Problems arise with 
> MAPS/SETS that can grow.
> 
> Thanks,
> Alex
> 
> On Fri, Apr 10, 2015 at 2:47 PM, Alex De la rosa  > wrote:
> Well... using the HTTP Rest API would make no sense when using the PB API... 
> would be extremely costly to maintain, also it may include some extra bytes 
> on the transport.
> 
> I would be interested on being able to know the size via Python itself using 
> the PB API as I'm doing.
> 
> Thanks anyway,
> Alex
> 
> On Fri, Apr 10, 2015 at 1:58 PM, Ciprian Manea  > wrote:
> Hi Alex,
> 
> You can always query the size of a riak object using `curl` and the REST API:
> 
> i.e. curl -I :8098/buckets/test/keys/demo
> 
> 
> Regards,
> Ciprian
> 
> On Thu, Apr 9, 2015 at 12:11 PM, Alex De la rosa  > wrote:
> Hi there,
> 
> I'm using the python client (by the way).
> 
> obj = RIAK.bucket('my_bucket').get('my_key')
> 
> Is there any way to know the actual size of an object stored in Riak? To make 
> sure something mutable (like a set) didn't add up to more than 1MB in 
> storage size.
> 
> Thanks!
> Alex
> 

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: nodes with 100% HD usage

2015-04-13 Thread bryan hunt
Result: failed writes, reduced AAE availability, system errors, and probably other 
(OS-level) processes terminating.

100% disk usage is never good. However, our storage systems are write-append, 
which mitigates the risk of data corruption.

If the node becomes completely unavailable, the other nodes will attempt to 
rebalance the data; with fewer nodes, each remaining node becomes responsible 
for more storage, which could potentially cause a cascading failure.

Moral of the story: monitor, and start sending SMS alerts when disk use goes 
above 80%. That is a standard devops chore, applicable to any business-critical 
computer system.
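
For illustration only (nothing Riak-specific, and the path and threshold are
just placeholders), a check along these lines could feed whatever alerting you
already have:

import os

def disk_used_fraction(path='/var/lib/riak'):
    # os.statvfs reports filesystem block counts; convert them to a used fraction
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    free = st.f_bavail * st.f_frsize
    return 1.0 - float(free) / total

if disk_used_fraction() > 0.80:
    print('WARNING: Riak data partition is above 80% full')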

Bryan

> On 9 Apr 2015, at 14:10, Alex De la rosa  wrote:
> 
> Hi there,
> 
> One theoretical question; what happens when a node (or more) hits a 100% HD 
> usage?
> 
> Riak can easily scale horizontally by adding new nodes to the cluster, but what 
> if one of them is full? Will the system have trouble? Will this node only be 
> used for reading, with new items saved on the other nodes? Will the 
> data rebalance onto newly added servers, freeing some space on the fully used 
> node?
> 
> Thanks!
> Alex


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: object sizes

2015-04-13 Thread Alex De la rosa
Hi Bryan,

Thanks for your answer; I don't know how to code in Erlang, so all my
system relies on Python.

Following Ciprian's curl suggestion, I tried to compare it with this Python
code over the weekend:

Map object:
curl -I
> 1058 bytes
print sys.getsizeof(obj.value)
> 3352 bytes

Standard object:
curl -I
> 9718 bytes
print sys.getsizeof(obj.encoded_data)
> 9755 bytes

The standard object seems pretty consistent in both approaches, even though the
image binary data was only about 5 KB (I assume some overhead here).

For the map object, there is roughly a 3x difference between the curl figure
and the size obtained via Python.

I'm not so sure this is a realistic way to measure their growth (especially
because the objects I need to monitor are Maps, not unaltered binary data
whose size I can know before storing it).

Would it be possible for the Python get() function to return something like
obj.content_length, giving the size the object currently takes in storage?
That would be a pretty nice feature.
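
In the meantime, the closest I can get from Python seems to be replaying the
curl -I request against the HTTP interface and reading Content-Length. This is
just a sketch: the host, port, and type/bucket/key names are placeholders, and
it assumes the HTTP listener is enabled alongside PB.

import requests

def stored_size(bucket_type, bucket, key, host='127.0.0.1', port=8098):
    # HEAD the same URL that curl -I hits; Content-Length is the stored size in bytes
    url = 'http://%s:%d/types/%s/buckets/%s/keys/%s' % (
        host, port, bucket_type, bucket, key)
    resp = requests.head(url)
    resp.raise_for_status()
    return int(resp.headers['Content-Length'])

print(stored_size('maps', 'my_bucket', 'my_key'))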

Thanks!
Alex

On Mon, Apr 13, 2015 at 12:47 PM, bryan hunt  wrote:

> Alex,
>
>
> Maps and Sets are stored just like a regular Riak object, but using a
> particular data structure and object serialization format. As you have
> observed, there is an overhead, and you want to monitor the growth of these
> data structures.
>
> It is possible to write a MapReduce map function (in Erlang) which
> retrieves a provided object by type/bucket/id and returns the size of its
> data. Would such a thing be of use?
>
> It would not be hard to write such a module, and I might even have some
> code for doing so if you are interested. There are also reasonably good
> examples in our documentation -
> http://docs.basho.com/riak/latest/dev/advanced/mapreduce
>
> I haven't looked at the Python PB API in a while, but I'm reasonably
> certain it supports the invocation of MapReduce jobs.
>
> Bryan
>
>
> On 10 Apr 2015, at 13:51, Alex De la rosa  wrote:
>
> Also, I forgot, I'm most interested in bucket_types instead of simple Riak
> buckets, and in being able to see how my mutable data inside a MAP/SET has grown.
>
> For a traditional standard bucket I can calculate the size of what I'm
> sending before, so Riak won't get data bigger than 1MB. Problems arise with
> MAPS/SETS that can grow.
>
> Thanks,
> Alex
>
> On Fri, Apr 10, 2015 at 2:47 PM, Alex De la rosa 
> wrote:
>
>> Well... using the HTTP Rest API would make no sense when using the PB
>> API... would be extremely costly to maintain, also it may include some
>> extra bytes on the transport.
>>
>> I would be interested on being able to know the size via Python itself
>> using the PB API as I'm doing.
>>
>> Thanks anyway,
>> Alex
>>
>> On Fri, Apr 10, 2015 at 1:58 PM, Ciprian Manea  wrote:
>>
>>> Hi Alex,
>>>
>>> You can always query the size of a riak object using `curl` and the REST
>>> API:
>>>
>>> i.e. curl -I :8098/buckets/test/keys/demo
>>>
>>>
>>> Regards,
>>> Ciprian
>>>
>>> On Thu, Apr 9, 2015 at 12:11 PM, Alex De la rosa <
>>> alex.rosa@gmail.com> wrote:
>>>
 Hi there,

 I'm using the python client (by the way).

 obj = RIAK.bucket('my_bucket').get('my_key')

 Is there any way to know the actual size of an object stored in Riak?
 To make sure something mutable (like a set) didn't add up to more than
 1MB in storage size.

 Thanks!
 Alex

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: nodes with 100% HD usage

2015-04-13 Thread Alex De la rosa
Awesome! thanks Bryan, that's exactly what I wanted to know.

Monitoring that all nodes stay below 80% of capacity, and adding nodes when
they approach that limit so the data rebalances and space is freed on the full
nodes, seems the right way to go then :)

Thanks,
Alex

On Mon, Apr 13, 2015 at 1:16 PM, bryan hunt  wrote:

> Result - Failed writes, reduced AAE availability, system errors, probably
> other (OS level) processes terminating.
>
> 100% disk usage is never good. However, our storage systems are
> write-append, which will mitigate against data corruption.
>
> If the node becomes completely unavailable, the other nodes will attempt to
> rebalance the data; with fewer nodes, each remaining node becomes responsible
> for more storage, which could potentially cause a cascading failure.
>
> Moral of the story - monitor, and start sending SMS messages when disk use
> goes above 80%, a standard devops chore, and applicable to any business
> critical computer system.
>
> Bryan
>
> > On 9 Apr 2015, at 14:10, Alex De la rosa 
> wrote:
> >
> > Hi there,
> >
> > One theoretical question; what happens when a node (or more) hits a 100%
> HD usage?
> >
> > Riak can easily scale horizontally by adding new nodes to the cluster, but
> what if one of them is full? Will the system have trouble? Will this node
> only be used for reading, with new items saved on the other nodes?
> Will the data rebalance onto newly added servers, freeing some space on the
> fully used node?
> >
> > Thanks!
> > Alex
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com