You're right, I think the Python client doesn't support the HEAD / metadata-only request.
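Since the plain HTTP HEAD request does work (as you found with curl), one
possible workaround is to issue it directly from Python with the requests
library against Riak's HTTP interface. A rough, untested sketch; the host,
port, bucket and key are only placeholders:

    import requests

    # Placeholder URL: adjust host, port, bucket and key for your cluster.
    url = "http://127.0.0.1:8098/buckets/blobs/keys/blob123"

    # HEAD returns only the response headers, never the object value.
    resp = requests.head(url)
    resp.raise_for_status()

    # Custom user metadata comes back as X-Riak-Meta-* response headers.
    usermeta = {name: value for name, value in resp.headers.items()
                if name.lower().startswith("x-riak-meta-")}
    print(usermeta)

Note, though, that this only saves bandwidth between the cluster and the
client; as explained further down the thread, the full replicas are still
shipped between the nodes internally.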
I'm still curious, however: what will you do with the object metadata,
assuming you can code around the lack of support in the client?

On Tue, Dec 8, 2015 at 10:02 AM, Joe O <jo7...@nododos.com> wrote:

> Damien and Dmitri - Thanks for the guidance. This really helps me out.
>
> I see from the link Dmitri provided that a PB fetch object call does
> indeed support a request for returning only metadata. Unfortunately, I do
> not think this functionality is exposed in the Riak Python API
> RiakBucket.get method, according to the docs at
> http://basho.github.io/riak-python-client/bucket.html#bucket-objects . I
> looked in the Python Riak client source code on GitHub (bucket.py), and
> also did not see a HEAD implementation in ./transports/http/transport.py.
> Am I looking in the right place? If it were implemented, it would be a
> function of the 'get' method on the bucket object, right?
>
> I did test the HTTP HEAD method (albeit with curl's flawed -X HEAD
> implementation) and that did work.
>
> I can live with internal cluster bandwidth being used for a metadata
> request. I don't want to send large objects back down to the client that
> aren't being used.
>
> I understand the strategy of storing metadata for a given key in a
> different bucket using the same key. I'm trying to avoid turning my
> key-value store into a key-key-value-value store. There is an elegance to
> storing both the data and the metadata at the same time and in the same
> place via the same operation, so that is the preferred direction.
>
> From: Damien Krotkine
> Date: Tuesday, December 8, 2015 at 12:35 AM
> To: Dmitri Zagidulin
> Cc: "technol...@nododos.com", riak-users
> Subject: Re: Two quick questions about X-Riak-Meta-* headers....
>
> Hi Joe,
>
> First of all, what Dmitri says makes a lot of sense. From what I
> understand, you are trying to avoid wasting network bandwidth by
> transferring data where you only need the metadata of your keys. As
> Dmitri pointed out, if your replication factor is 3 (the default), Riak
> will internally query all the replicas for the full *data* and metadata,
> then return only the metadata to the client. From a network perspective,
> you'll save bandwidth *outside* of your cluster, but you'll use quite a
> lot of bandwidth *inside* of your cluster. Maybe that's not an issue. But
> if it is, read on:
>
> 1/ First solution: as Dmitri suggested, the best approach is to decouple
> your metadata and data, if possible. If your key is "my_stuff", store
> your data in the bucket "data" and the metadata in the bucket "metadata",
> so that you fetch "metadata/my_stuff" and then fetch "data/my_stuff".
> This should make your life way easier, but the issue is that you lose the
> strong relationship between data and metadata. To mitigate this:
> - when writing your data, start by writing "data/my_stuff" with
> conservative parameters (w=3, for instance) and wait for the successful
> write before storing the metadata, so that when the metadata is there,
> there is a very high chance that the data is there as well (see the
> sketch after this list)
> - when updating your data, try to be smart: for example, mark the
> metadata as invalid or unavailable while you change the data underneath,
> then update the metadata
> - be fault tolerant in your application: if you fetch some metadata but
> the data is missing, retry, wait, or gracefully fall back
> - be fault tolerant again: when fetching some data, have it contain a
> header or an id that must match the metadata. If it doesn't match, you
> need to wait/retry/fall back
> - if you don't want to, or can't, handle that on the client side, it's
> possible to enrich the Riak API and have Riak do the bookkeeping itself.
> If you're interested, let me know.
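A rough, untested sketch of the write-then-metadata ordering from the
first bullet above, using the Riak Python client; the connection settings,
bucket names and key are only examples:

    import riak

    # Example connection settings; adjust for your cluster.
    client = riak.RiakClient(protocol='pbc', host='127.0.0.1', pb_port=8087)
    data_bucket = client.bucket('data')
    meta_bucket = client.bucket('metadata')

    payload = open('big_blob.bin', 'rb').read()

    # 1. Write the data first with conservative parameters (w=3); any
    #    failure raises here, before the metadata is ever written.
    blob = data_bucket.new('my_stuff', encoded_data=payload,
                           content_type='application/octet-stream')
    blob.store(w=3)

    # 2. Only after the data write succeeds, write the small metadata
    #    object under the same key in the "metadata" bucket.
    meta = meta_bucket.new('my_stuff', data={'size': len(payload),
                                             'status': 'available'})
    meta.store(w=3)

On the read side, the same sketch would fetch "metadata/my_stuff" first
and retry or fall back if "data/my_stuff" turns out to be missing, as
described in the bullets above.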
> 2/ Second solution: you don't want to, or can't, separate metadata and
> data. In this case, you can try to reduce the network usage:
> - first, reduce the internal network usage inside the cluster. When
> querying the metadata, if you're using the PB API, you can pass "n_val"
> as a parameter of your request. If you pass n_val=1, Riak will not query
> the 3 replicas to fetch the value; it'll fetch only one replica, saving a
> lot of internal bandwidth.
> - second, you can have your client query one of the primary nodes (where
> one of the replicas is) for a given key. Coupled with passing n_val=1,
> Riak will not transfer anything on the internal network. You can check
> out https://github.com/dams/riak_preflists to find the primary nodes.
>
> Dmitri Zagidulin wrote:
>
> Hi Joe,
>
> 1. Yes, it's possible, with the HTTP HEAD request or the client library
> equivalent (I'm pretty sure all the client libraries expose the 'return
> only the headers' part of the object fetch -- see the Optional Parameters
> head=true section of the PB API:
> http://docs.basho.com/riak/latest/dev/references/protocol-buffers/fetch-object/ ).
>
> However, this is not going to be very helpful in your case. Behind the
> scenes, a HEAD request still requests all replicas of an object -- *full*
> replicas, including the value. It's just that the node coordinating the
> request drops the actual object value before returning the
> metadata/headers to the client.
> So, if you use this 'just give me the metadata' request, you're only
> saving the cost of shipping the object value down the wire from the
> cluster to the client. You're still incurring the cost of all 3 copies of
> the object (so, 3-4 MB in your case) being transferred over the network
> between the nodes as a result of that HEAD request.
>
> 2. I don't know that there is a specific size limit on the object header
> values. However, this question is definitely a red flag -- it's very
> likely that you're trying to use the custom headers in a way they weren't
> intended for.
>
> Can you describe your use case in more detail? (As in, what are you
> trying to store as metadata, and why would retrieving just the headers be
> useful to you?)
>
> Don't forget that if you need to store a LOT of metadata on an object,
> and you can't do it within the object value itself (for example, when
> storing binary images), you can simply store a separate metadata object
> in a different bucket, using the same key as the object.
> For example, if I'm storing my primary objects in the 'blobs' bucket, I
> can also store a JSON object with the corresponding metadata in a
> 'blobs-meta' bucket (sketched below), like so:
>
> /buckets/blobs/keys/blob123 --> binary object value
> /buckets/blobs-meta/keys/blob123 --> json metadata object
>
> The downside of this setup is that you're now doing 2 writes for each
> object (one to the blobs bucket, and one to the meta bucket).
> But the benefits are considerable:
>
> - You can store arbitrarily large metadata objects (you're not abusing
> object headers by stuffing large values into them)
> - Since the metadata object is likely to be much smaller than the object
> it refers to, you can use the metadata object to check for an object's
> existence (or to get the actual headers that you care about) without the
> cost of requesting the full giant blob.
>
> Dmitri
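A rough, untested sketch of the two-bucket pattern Dmitri describes,
again with the Riak Python client; bucket names, key and the metadata
fields are only examples:

    import riak

    # Example connection settings; adjust for your cluster.
    client = riak.RiakClient(protocol='pbc', host='127.0.0.1', pb_port=8087)
    blobs = client.bucket('blobs')
    blobs_meta = client.bucket('blobs-meta')

    image = open('photo.jpg', 'rb').read()

    # One write for the large binary value...
    blobs.new('blob123', encoded_data=image,
              content_type='image/jpeg').store()

    # ...and one write for the small JSON metadata object, same key.
    blobs_meta.new('blob123', data={'content_type': 'image/jpeg',
                                    'size': len(image)}).store()

    # Checking for existence, or reading the metadata, now only touches
    # the small JSON object and never the multi-megabyte blob.
    meta = blobs_meta.get('blob123')
    if meta.exists:
        print(meta.data['size'])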
> On Mon, Nov 30, 2015 at 12:18 PM, Joe Olson <technol...@nododos.com>
> wrote:
>
>> Two quick questions about X-Riak-Meta-* headers:
>>
>> 1. Is it possible to pull the headers for a key without pulling the key
>> itself? The reason I am interested in this is that the values for our
>> keys are in the 1.2-1.6 MB range, so the headers are a lot smaller in
>> comparison. I know I can index the headers using Solr or 2i, but I am
>> trying to avoid the overhead of doing that.
>>
>> 2. What are the size limits on the header values that are strings?
>>
>> As always, thanks in advance!
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com