Hi Joe,
First of all, what Dmitri says makes a lot of sense. From what I
understand, you are trying to avoid wasting network bandwidth by
transferring data when you only need the metadata of your keys. As
Dmitri pointed out, if your replication factor is 3 (the default), then
Riak will internally query all the replicas for the full *data* and
metadata, and only then return the metadata to the client. From a
network perspective, you'll save bandwidth *outside* of your cluster,
but you'll still use quite a lot of bandwidth *inside* of your cluster.
Maybe that's not an issue. But if it is, read on:
1/ First solution: as Dmitri suggested, the best approach is to
decouple your metadata and data, if possible. If your key is "my_stuff",
then store your data in the bucket "data" and the metadata in the
bucket "metadata", so that you'd fetch "metadata/my_stuff" first and
only fetch "data/my_stuff" when you actually need the value. This
should make your life way easier, but the issue is that you lose the
strong relationship between data and metadata. To try to mitigate this
(there's a small sketch of the ordering right after this list):
- when writing your data, start by writing "data/my_stuff" with
conservative parameters (w=3 for instance), then wait for the
successful write before storing the metadata. That way, when the
metadata is there, there is a very high chance that the data is there
as well.
- when updating your data, try to be smart: for instance, mark the
metadata as invalid or unavailable, change the data underneath, then
update the metadata.
- be fault tolerant in your application: if you fetch some metadata but
the data is missing, retry, wait, or fall back gracefully.
- be fault tolerant again: when fetching some data, have it contain a
header or an id that must match the metadata. If it doesn't match, you
need to wait/retry/fall back.
- if you don't want to, or can't, handle that on the client side, it's
possible to enrich the Riak API and have Riak do the bookkeeping
itself. If you're interested, let me know.
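Here is a minimal sketch of that write/read ordering using the official
Python client. The bucket names and the "data_id" field that ties the
two objects together are my own illustrative assumptions, not anything
Riak requires:

  # Sketch of "write the data first (w=3), then the metadata pointing at it".
  # Bucket names and the "data_id" linkage field are illustrative assumptions.
  import uuid
  import riak

  client = riak.RiakClient(protocol='pbc', host='127.0.0.1', pb_port=8087)
  data_bucket = client.bucket('data')
  meta_bucket = client.bucket('metadata')

  def write_with_metadata(key, payload, meta_dict):
      # Tag both objects with the same id so a reader can detect a mismatch.
      data_id = str(uuid.uuid4())

      data_obj = data_bucket.new(key)
      data_obj.content_type = 'application/octet-stream'
      data_obj.encoded_data = payload
      data_obj.usermeta = {'data_id': data_id}
      data_obj.store(w=3)          # conservative write, as suggested above

      meta = dict(meta_dict, data_id=data_id)
      meta_bucket.new(key, data=meta).store()   # metadata is written last

  def read_metadata_then_data(key):
      meta_obj = meta_bucket.get(key)
      if not meta_obj.exists:
          return None              # no metadata yet, nothing to fetch
      data_obj = data_bucket.get(key)
      if not data_obj.exists or \
         data_obj.usermeta.get('data_id') != meta_obj.data.get('data_id'):
          # data missing or stale relative to the metadata:
          # wait / retry / fall back, as described in the list above
          raise RuntimeError('metadata and data out of sync for key %r' % key)
      return meta_obj.data, data_obj.encoded_data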
2/ Second solution: you don't want to, or can't, separate metadata and
data. In this case, you can still try to reduce the network usage:
- first, reduce the internal network usage inside the cluster. When
querying the metadata, if you're using the PB API, you can pass "n_val"
as a parameter of your request. If you pass n_val=1, then Riak will not
query the 3 replicas to fetch the value; instead it'll fetch only one
replica, saving a lot of internal bandwidth.
- second, you can have your client query one of the primary nodes
(where one of the replicas is) for a given key. Coupled with passing
n_val=1, Riak shouldn't need to move the object over the internal
network at all. You can check out
https://github.com/dams/riak_preflists to find the primary nodes.
There's a small sketch of the metadata-only fetch below.
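To illustrate the metadata-only fetch, here's a small sketch with the
official Python client. head_only is the client-side counterpart of the
PB API's head=true option, but whether your client version exposes it
(and whether it lets you set n_val on the request) varies, so treat the
keyword argument as an assumption and check your client's docs:

  # Sketch: fetch only the headers/usermeta of a large object.
  # head_only maps to the PB API's head=true (assumption: your client
  # version supports this kwarg). n_val=1 is a field of the PB get
  # request, but I'm not assuming your client exposes it as a kwarg,
  # so it is only mentioned in the closing comment.
  import riak

  client = riak.RiakClient(protocol='pbc', host='127.0.0.1', pb_port=8087)
  bucket = client.bucket('data')

  # The object's value is dropped by the coordinating node and never
  # shipped to the client; only headers/usermeta come back.
  meta_only = bucket.get('my_stuff', head_only=True)
  if meta_only.exists:
      print(meta_only.usermeta)       # your X-Riak-Meta-* values
      print(meta_only.content_type)

  # To also cut the *internal* traffic, point the client at one of the
  # key's primary nodes (see the riak_preflists link above) and, if your
  # client exposes it, set n_val=1 on the request so only that replica
  # is consulted.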
Dmitri Zagidulin wrote:
Hi Joe,
1. Yes, it's possible (with the HTTP HEAD request, or the client
library equivalent -- I'm pretty sure all the client libraries expose
the 'return only the headers' part of the object fetch; see the
Optional Parameters head=true section of the PB API:
http://docs.basho.com/riak/latest/dev/references/protocol-buffers/fetch-object/
).
However, this is not going to be very helpful in your case. Behind the
scenes, a HEAD request still requests all replicas of an object --
*full* replicas, including the value. It's just that the node
coordinating the request drops the actual object value before
returning the metadata/headers to the client.
So, if you use this 'just give me the metadata' request, you're only
saving on the cost of shipping the object value down the wire from the
cluster to the client. But you're still incurring the cost of all 3
copies of the object (so, 3-4MB, in your case) being transferred over
the network between the nodes, as a result of that HEAD request.
2. I don't know that there is a specific size limit on the object
header values. However, this question is definitely a red flag -- it's
very likely that you're trying to use the custom headers in a way that
they weren't intended for.
Can you describe your use case in more detail? (As in, what are
you trying to store as metadata, and why would retrieving just the
headers be useful to you?)
Don't forget that if you need to store a LOT of metadata on an
object, and you can't do it within the object value itself (for
example, when storing binary images), you can simply store a separate
metadata object, in a different bucket, using the same key as the object.
For example, if I'm storing my primary objects in the 'blobs' bucket,
I can also store a JSON object with corresponding metadata in a
'blobs-meta' bucket, like so:
/buckets/blobs/keys/blob123 --> binary object value
/buckets/blobs-meta/keys/blob123 --> json metadata object
The downside of this setup is that you're now doing 2 writes for each
object (one to the blobs bucket and one to the meta bucket).
But the benefits are considerable:
- You can store arbitrarily large metadata objects (you're not abusing
object headers by stuffing large values into them)
- Since the metadata object is likely to be much smaller than the
object it's referring to, you can use the metadata object to check for
an object's existence (or to get the actual headers that you care
about) without the cost of requesting the full giant blob.
Dmitri
On Mon, Nov 30, 2015 at 12:18 PM, Joe Olson <technol...@nododos.com> wrote:
Two quick questions about X-Riak-Meta-* headers:
1. Is it possible to pull the headers for a key without pulling
the key itself? The reason I am interested in this is that the
values for our keys are in the 1.2-1.6 MB range, so the headers
are a lot smaller in comparison. I know I can index the headers
using Solr or 2i, but I am trying to avoid the overhead of doing
that.
2. What are the size limits on the header values that are strings?
As always, thanks in advance!
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com