Hi Joe,

First of all, what Dmitri says makes a lot of sense. From what I understand, you are trying to avoid wasting network bandwidth by transferring data where you only need the metadata of your keys. As Dmitri pointed out, if your replication factor is 3 (the default), then Riak will internally query all the replicas' full *data* and metadata, then only return the metadata to the client. From a network perspective, you'll save bandwidth *outside* of your cluster, but you'll use quite a lot of bandwidth *inside* of your cluster. Maybe that's not an issue. But if it is, read on:

1/ First solution: as Dmitri suggested, the best approach is to decouple your metadata and data, if possible. If your key is "my_stuff", store the data in the bucket "data" and the metadata in the bucket "metadata", so that you'd fetch "metadata/my_stuff", then fetch "data/my_stuff". This should make your life way easier, but the issue is that you lose the strong relationship between data and metadata. To mitigate this (see the sketch below):

- when writing your data, start by writing "data/my_stuff" with conservative parameters (w=3 for instance), then wait for the successful write before storing the metadata. That way, when the metadata is there, there is a very high chance that the data is there as well.
- when updating your data, try to be smart: for instance, mark the metadata as invalid or unavailable while you change the data underneath, then update the metadata.
- be fault tolerant in your application: if you fetch some metadata but the data is missing, retry, wait, or gracefully fall back.
- be fault tolerant again: when fetching some data, have it contain a header or an id that must match the metadata. If it doesn't match, you need to wait/retry/fall back.
- if you don't want to / can't handle that on the client side, it's possible to enrich the Riak API and have Riak do the bookkeeping itself. If you're interested, let me know.
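To make this concrete, here is a rough sketch of that write/read pattern with the official Python client (the bucket names, retry policy and payload layout are just illustrative assumptions, adapt to your setup):

import time
import riak

client = riak.RiakClient()              # adjust host/ports as needed
data_bucket = client.bucket('data')
meta_bucket = client.bucket('metadata')

def put_with_metadata(key, payload, meta):
    # 1) write the (large) value conservatively: wait for all 3 replicas
    obj = data_bucket.new(key, encoded_data=payload,
                          content_type='application/octet-stream')
    obj.store(w=3)
    # 2) only then publish the metadata the readers will look up first
    meta_bucket.new(key, data=meta).store(w=3)

def get_with_fallback(key, retries=3, delay=0.5):
    meta_obj = meta_bucket.get(key)
    if not meta_obj.exists:
        return None, None
    for _ in range(retries):
        data_obj = data_bucket.get(key)
        if data_obj.exists:
            return meta_obj.data, data_obj.encoded_data
        time.sleep(delay)               # data not visible yet: wait/retry
    return meta_obj.data, None          # graceful fallback: metadata only

The important part is the ordering: the metadata only becomes visible once the data write has been acknowledged with w=3, so a reader that finds metadata can be reasonably confident the data is there, and it falls back gracefully if it isn't.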

2/ Second solution: you don't want to / can't separate metadata and data. In this case, you can try to reduce the network usage:

- first, reduce the internal network usage inside the cluster. When querying the metadata, if you're using the PB API, you can pass "n_val" as a parameter of your request. If you pass n_val=1, Riak will not query the 3 replicas to fetch the value; it will fetch only one replica, saving a lot of internal bandwidth.
- second, you can have your client query one of the primary nodes (where one of the replicas is) for a given key. Coupled with n_val=1, Riak will not transfer anything on the internal network. You can check out https://github.com/dams/riak_preflists to find the primary nodes.
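As a quick illustration of the metadata-only fetch over PB with the Python client (head_only support and the n_val request option depend on your client and Riak versions, so treat this as a sketch to verify rather than a guaranteed API):

import riak

client = riak.RiakClient()              # PB connection to one cluster node
bucket = client.bucket('data')

# head_only=True maps to the PB fetch option head=true: the coordinating
# node strips the value and only returns metadata/headers to the client.
obj = bucket.get('my_stuff', head_only=True)
print(obj.content_type)
print(obj.usermeta)                     # your X-Riak-Meta-* entries

# Note: unless the request also carries n_val=1 (only exposed by some
# clients), the coordinator still pulls full replicas over the internal
# network, as Dmitri describes below.

Combine that with sending the request directly to a primary node for the key (see the riak_preflists link above) and the internal transfer goes away as well.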

Dmitri Zagidulin wrote:
Hi Joe,

1. Yes, it's possible, with the HTTP HEAD request or the client library equivalent. (I'm pretty sure all the client libraries expose the 'return only the headers' part of the object fetch -- see the Optional Parameters head=true section of the PB API: http://docs.basho.com/riak/latest/dev/references/protocol-buffers/fetch-object/)

However, this is not going to be very helpful in your case. Behind the scenes, a HEAD request still requests all replicas of an object -- *full* replicas, including the value. It's just that the node coordinating the request drops the actual object value before returning the metadata/headers to the client. So, if you use this 'just give me the metadata' request, you're only saving on the cost of shipping the object value down the wire from the cluster to the client. But you're still incurring the cost of all 3 copies of the object (so, 3-4MB, in your case) being transferred over the network between the nodes, as a result of that HEAD request.
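For reference, the HTTP flavour of that metadata-only fetch is just a HEAD request against the object's URL (a minimal sketch; host, bucket and key are placeholders):

import requests

resp = requests.head('http://127.0.0.1:8098/buckets/blobs/keys/blob123')
print(resp.status_code)                 # 200 if the key exists, 404 otherwise
for name, value in resp.headers.items():
    if name.lower().startswith('x-riak-meta-'):
        print(name, '=', value)        # user metadata headers only

As noted above, this only saves the value transfer from the cluster to the client; the internal replica reads still happen.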

2. I don't know that there is a specific size limit on the object header values. However, this question is definitely a red flag -- it's very likely that you're trying to use the custom headers in a way that they weren't intended for.

Can you describe your use case in more detail? (As in, what are you trying to store as metadata, and why would retrieving just the headers be useful to you?)

Don't forget that if you need to store a LOT of metadata on an object, and you can't do it within the object value itself (for example, when storing binary images), you can simply store a separate metadata object, in a different bucket, using the same key as the object. For example, if I'm storing my primary objects in the 'blobs' bucket, I can also store a JSON object with the corresponding metadata in a 'blobs-meta' bucket, like so:

/buckets/blobs/keys/blob123   -->  binary object value
/buckets/blobs-meta/keys/blob123   --> json metadata object

The downside of this setup is that you're now doing 2 writes for each object (one to the blobs bucket, and one to the meta bucket).
But the benefits are considerable:

- You can store arbitrarily large metadata objects (you're not abusing object headers by stuffing large values into them).
- Since the metadata object is likely to be much smaller than the object it's referring to, you can use the metadata object to check for an object's existence (or to get the actual headers that you care about) without the cost of requesting the full giant blob.

Dmitri



On Mon, Nov 30, 2015 at 12:18 PM, Joe Olson <technol...@nododos.com> wrote:

    Two quick questions about X-Riak-Meta-* headers:

    1. Is it possible to pull the headers for a key without pulling
    the key itself? The reason I am interested in this is because the
    values for our keys are in the 1.2-1.6 MB range, so the headers
    are a lot smaller in comparison. I know I can index the headers
    using Solr or 2i, but I am trying to avoid the overhead of doing
    that.

    2. What are the size limits on the headers values that are strings?

    As always, thanks in advance!

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
