On Thu, Jun 1, 2017 at 2:03 AM Jens Rosenboom <j.rosenb...@x-ion.de> wrote:

> On a large Hammer-based cluster (> 1 Gobjects) we are seeing a small
> number of objects being truncated. All of these objects are between
> 512kB and 4MB in size and they are not uploaded as multipart, so the
> first 512kB get stored in the head object and the next chunks should
> be in tail objects named <bucket_id>__shadow_<tag>_N, but the latter
> seem to go missing sometimes. The PUT operation for these objects is
> logged as successful (HTTP code 200), so I currently have two
> hypotheses as to what might be happening:
>
> 1. The object is received by the radosgw process, the head object is
> written successfully, then the write for the tail object somehow
> fails. So the question is whether this is possible or whether radosgw
> will always wait until all operations have completed successfully
> before returning the 200. This blog [1] at least mentions some
> asynchronous operations.
>
> 2. The full object is written correctly, but the tail objects are
> getting deleted somehow afterwards. This might happen during garbage
> collection if there was a collision between the tail object names for
> two objects, but again I'm not sure whether this is possible.
>
> So my question is whether anyone else has seen this issue, and
> whether it may have been fixed in Jewel or later.
>
> The second issue is what happens when a client tries to access such a
> truncated object. The radosgw first answers with the full headers and
> a content-length of e.g. 600000, then sends the first chunk of data
> (524288 bytes) from the head object. After that it tries to read the
> first tail object, but receives an error -2 (file not found). radosgw
> now tries to send a 404 status with a NoSuchKey error in the XML body,
> but of course this is too late; the client sees it as part of the
> object data. After that, the connection stays open, the client waits
> for the rest of the object to be sent and eventually times out with an
> error. Or, if the original object was just slightly larger than 512kB,
> the client will append the 404 response at that point and continue
> with corrupted data, hopefully checking the MD5 sum and noticing the
> issue. This behaviour is still unchanged at least in Jewel, and you
> can easily reproduce it by manually deleting the shadow object from
> the bucket pool after uploading an object of a suitable size.
>
> I have created a bug report for the first issue [2]; please let me
> know whether you would like a separate ticket for the second one.
>


No idea what's going on here but they definitely warrant separate issues.
The second one is about handling error states; the first is about inducing
them. :)
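
For anyone who wants to reproduce the second issue (the GET behaviour on a
truncated object), here is a rough sketch using the python-rados bindings to
find the head and shadow objects for a bucket and delete one tail object. The
pool name and bucket marker are placeholders and have to be adapted to your
setup; on Hammer the bucket data pool is typically .rgw.buckets, and the
bucket marker can be looked up with "radosgw-admin bucket stats". Only do
this against a throwaway test bucket, and note that listing a whole pool is
only practical on a small test cluster:

    import rados

    # Placeholders -- adapt to your cluster and test bucket.
    POOL = ".rgw.buckets"
    BUCKET_MARKER = "default.12345.6"   # hypothetical bucket id/marker

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(POOL)
        try:
            # Collect the rados objects belonging to this bucket; the tail
            # objects carry "__shadow_" in their name, as described above.
            shadows = []
            for obj in ioctx.list_objects():
                if obj.key.startswith(BUCKET_MARKER):
                    print(obj.key)
                    if "__shadow_" in obj.key:
                        shadows.append(obj.key)

            # Removing one tail object reproduces the truncated-GET behaviour.
            if shadows:
                ioctx.remove_object(shadows[0])
                print("removed", shadows[0])
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()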
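
And on the client side, since the MD5 check mentioned above is currently the
only way to notice the corruption, a minimal download-and-verify sketch with
boto3; endpoint, credentials, bucket and key are placeholders, and the
ETag-equals-MD5 assumption only holds for non-multipart uploads, which is the
case here:

    import hashlib
    import boto3

    # Placeholders -- point these at your radosgw endpoint and test object.
    s3 = boto3.client(
        "s3",
        endpoint_url="http://rgw.example.com:7480",
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )

    resp = s3.get_object(Bucket="testbucket", Key="testobject")
    declared_len = resp["ContentLength"]
    etag = resp["ETag"].strip('"')

    try:
        # For a truncated object radosgw may keep the connection open after
        # the head chunk, so this read can also end in a timeout instead of
        # returning short data.
        body = resp["Body"].read()
    except Exception as e:
        raise SystemExit("read failed, object is probably truncated: %s" % e)

    md5 = hashlib.md5(body).hexdigest()
    if len(body) != declared_len or md5 != etag:
        print("corrupt: got %d of %d bytes, md5 %s, etag %s"
              % (len(body), declared_len, md5, etag))
    else:
        print("object intact")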