Hi Joe,
First of all, what Dmitri says makes a lot of sense. From what I
understand, you are trying to avoid wasting network bandwidth by
transferring data when you only need the metadata of your keys. As
Dmitri pointed out, if your replication factor is 3 (the default), then
Riak will internally query all the replicas for the full *data* and
metadata, and only then return the metadata to the client. From a
network perspective, you'll save bandwidth *outside* of your cluster,
but you'll still use quite a lot of bandwidth *inside* of your cluster.
Maybe that's not an issue. But if it is, read on:
1/ First solution: as Dmitri suggested, the best approach is to
decouple your metadata and data, if possible. If your key is "my_stuff",
then store your data in the bucket "data" and the metadata in the
bucket "metadata", so that you'd fetch "metadata/my_stuff" first and
only fetch "data/my_stuff" when you actually need the value. This
should make your life way easier, but the issue is that you lose the
strong relationship between data and metadata. To try to mitigate this
(there's a small sketch of the ordering right after this list):
- when writing your data, start by writing "data/my_stuff" with
conservative parameters (w=3 for instance), then wait for the
successful write before storing the metadata. That way, when the
metadata is there, there is a very high chance that the data is there
as well.
- when updating your data, try to be smart: for instance, mark the
metadata as invalid or unavailable, change the data underneath, then
update the metadata.
- be fault tolerant in your application: if you fetch some metadata but
the data is missing, retry, wait, or fall back gracefully.
- be fault tolerant again: when fetching some data, have it contain a
header or an id that must match the metadata. If it doesn't match, you
need to wait/retry/fall back.
- if you don't want to, or can't, handle that on the client side, it's
possible to enrich the Riak API and have Riak do the bookkeeping
itself. If you're interested, let me know.
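Here is a minimal sketch of that write/read ordering using the official
Python client. The bucket names and the "data_id" field that ties the
two objects together are my own illustrative assumptions, not anything
Riak requires:

  # Sketch of "write the data first (w=3), then the metadata pointing at it".
  # Bucket names and the "data_id" linkage field are illustrative assumptions.
  import uuid
  import riak

  client = riak.RiakClient(protocol='pbc', host='127.0.0.1', pb_port=8087)
  data_bucket = client.bucket('data')
  meta_bucket = client.bucket('metadata')

  def write_with_metadata(key, payload, meta_dict):
      # Tag both objects with the same id so a reader can detect a mismatch.
      data_id = str(uuid.uuid4())

      data_obj = data_bucket.new(key)
      data_obj.content_type = 'application/octet-stream'
      data_obj.encoded_data = payload
      data_obj.usermeta = {'data_id': data_id}
      data_obj.store(w=3)          # conservative write, as suggested above

      meta = dict(meta_dict, data_id=data_id)
      meta_bucket.new(key, data=meta).store()   # metadata is written last

  def read_metadata_then_data(key):
      meta_obj = meta_bucket.get(key)
      if not meta_obj.exists:
          return None              # no metadata yet, nothing to fetch
      data_obj = data_bucket.get(key)
      if not data_obj.exists or \
         data_obj.usermeta.get('data_id') != meta_obj.data.get('data_id'):
          # data missing or stale relative to the metadata:
          # wait / retry / fall back, as described in the list above
          raise RuntimeError('metadata and data out of sync for key %r' % key)
      return meta_obj.data, data_obj.encoded_data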
2/ Second solution: you don't want to, or can't, separate metadata and
data. In this case, you can still try to reduce the network usage:
- first, reduce the internal network usage inside the cluster. When
querying the metadata, if you're using the PB API, you can pass "n_val"
as a parameter of your request. If you pass n_val=1, then Riak will not
query the 3 replicas to fetch the value; instead it'll fetch only one
replica, saving a lot of internal bandwidth.
- second, you can have your client query one of the primary nodes
(where one of the replicas is) for a given key. Coupled with passing
n_val=1, Riak shouldn't need to move the object over the internal
network at all. You can check out
https://github.com/dams/riak_preflists to find the primary nodes.
There's a small sketch of the metadata-only fetch below.
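To illustrate the metadata-only fetch, here's a small sketch with the
official Python client. head_only is the client-side counterpart of the
PB API's head=true option, but whether your client version exposes it
(and whether it lets you set n_val on the request) varies, so treat the
keyword argument as an assumption and check your client's docs:

  # Sketch: fetch only the headers/usermeta of a large object.
  # head_only maps to the PB API's head=true (assumption: your client
  # version supports this kwarg). n_val=1 is a field of the PB get
  # request, but I'm not assuming your client exposes it as a kwarg,
  # so it is only mentioned in the closing comment.
  import riak

  client = riak.RiakClient(protocol='pbc', host='127.0.0.1', pb_port=8087)
  bucket = client.bucket('data')

  # The object's value is dropped by the coordinating node and never
  # shipped to the client; only headers/usermeta come back.
  meta_only = bucket.get('my_stuff', head_only=True)
  if meta_only.exists:
      print(meta_only.usermeta)       # your X-Riak-Meta-* values
      print(meta_only.content_type)

  # To also cut the *internal* traffic, point the client at one of the
  # key's primary nodes (see the riak_preflists link above) and, if your
  # client exposes it, set n_val=1 on the request so only that replica
  # is consulted.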
Dmitri Zagidulin wrote:
Hi Joe,
1. Yes, it's possible (with the HTTP HEAD request, or the client
library equivalent -- I'm pretty sure all the client libraries expose
the 'return only the headers' part of the object fetch; see the
Optional Parameters head=true section of the PB API:
http://docs.basho.com/riak/latest/dev/references/protocol-buffers/fetch-object/
).
However, this is not going to be very helpful in your case. Behind the
scenes, a HEAD request still requests all replicas of an object --
*full* replicas, including the value. It's just that the node
coordinating the request drops the actual object value before
returning the metadata/headers to the client.
So, if you use this 'just give me the metadata' request, you're only
saving on the cost of shipping the object value down the wire from the
cluster to the client. But you're still incurring the cost of all 3
copies of the object (so, 3-4MB, in your case) being transferred over
the network between the nodes, as a result of that HEAD request.
2. I don't know that there is a specific size limit on the object
header values. However, this question is definitely a red flag -- it's
very likely that you're trying to use the custom headers in a way that
they weren't intended for.
Can you describe your use case in more detail? (As in, what are
you trying to store as metadata, and why would retrieving just the
headers be useful to you?)
Don't forget that if you need to store a LOT of metadata on an
object, and you can't do it within the object value itself (for
example, when storing binary images), you can simply store a separate
metadata object, in a different bucket, using the same key as the object.
For example, if I'm storing my primary objects in the 'blobs' bucket,
I can also store a JSON object with corresponding metadata in a
'blobs-meta' bucket, like so:
/buckets/blobs/keys/blob123 --> binary object value
/buckets/blobs-meta/keys/blob123 --> json metadata object
The downside of this setup is that you're now doing 2 writes for each
object (one to the blobs bucket and one to the meta bucket).
But the benefits are considerable:
- You can store arbitrarily large metadata objects (you're not abusing
object headers by stuffing large values into them)
- Since the metadata object is likely to be much smaller than the
object it's referring to, you can use the metadata object to check for
an object's existence (or to get the actual headers that you care
about) without the cost of requesting the full giant blob.
Dmitri
On Mon, Nov 30, 2015 at 12:18 PM, Joe Olson <technol...@nododos.com> wrote:
Two quick questions about X-Riak-Meta-* headers:
1. Is it possible to pull the headers for a key without pulling
the key itself? The reason I am interested in this is that the
values for our keys are in the 1.2-1.6 MB range, so the headers
are a lot smaller in comparison. I know I can index the headers
using Solr or 2i, but I am trying to avoid the overhead of doing
that.
2. What are the size limits on the header values that are strings?
As always, thanks in advance!
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com