On Thu, Jun 1, 2017 at 2:03 AM Jens Rosenboom <j.rosenb...@x-ion.de> wrote:
> On a large Hammer-based cluster (> 1 Gobjects) we are seeing a small
> amount of objects being truncated. All of these objects are between
> 512kB and 4MB in size and they are not uploaded as multipart, so the
> first 512kB get stored into the head object and the next chunks should
> be in tail objects named <bucket_id>__shadow_<tag>_N, but the latter
> seem to go missing sometimes. The PUT operation for these objects is
> logged as successful (HTTP code 200), so I currently have two
> hypotheses as to what might be happening:
>
> 1. The object is received by the radosgw process, the head object is
> written successfully, then the write for the tail object somehow
> fails. So the question is whether this is possible or whether radosgw
> will always wait until all operations have completed successfully
> before returning the 200. This blog [1] at least mentions some
> asynchronous operations.
>
> 2. The full object is written correctly, but the tail objects are
> getting deleted somehow afterwards. This might happen during garbage
> collection if there was a collision between the tail object names for
> two objects, but again I'm not sure whether this is possible.
>
> So the question is whether anyone else has seen this issue, and also
> whether it may possibly be fixed in Jewel or later.
>
> The second issue is what happens when a client tries to access such a
> truncated object. The radosgw first answers with the full headers and
> a content-length of e.g. 600000, then sends the first chunk of data
> (524288 bytes) from the head object. After that it tries to read the
> first tail object, but receives an error -2 (file not found). radosgw
> now tries to send a 404 status with a NoSuchKey error in the XML body,
> but of course this is too late; the client sees this as part of the
> object data. After that, the connection stays open, the client waits
> for the rest of the object to be sent and times out with an error in
> the end. Or, if the original object was just slightly larger than
> 512k, the client will append the 404 response at that point and
> continue with corrupted data, hopefully checking the MD5 sum and
> noticing the issue. This behaviour is still unchanged at least in
> Jewel and you can easily reproduce it by manually deleting the shadow
> object from the bucket pool after you have uploaded an object of the
> proper size.
>
> I have created a bug report for the first issue [2]; please let me
> know whether you would like a different ticket for the second one.

No idea what's going on here, but they definitely warrant separate issues. The second one is about handling error states; the first is about inducing them. :)
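For anyone who wants to check suspect objects from the client side, below is a minimal sketch (not from the thread) that flags a truncated download by comparing the bytes actually received against the Content-Length header and the payload MD5 against the ETag. The ETag-as-MD5 comparison only holds for non-multipart uploads, which is exactly the case described above; OBJECT_URL is a hypothetical placeholder for a presigned or public object URL, and the Python requests library is assumed.

#!/usr/bin/env python
# Illustrative client-side check: compare bytes received with Content-Length
# and the payload MD5 with the ETag (valid only for non-multipart uploads).
# OBJECT_URL is a hypothetical placeholder.
import hashlib
import requests

OBJECT_URL = "http://rgw.example.com/mybucket/myobject"

def check_object(url, timeout=30):
    resp = requests.get(url, stream=True, timeout=timeout)
    resp.raise_for_status()

    expected_len = int(resp.headers["Content-Length"])
    expected_md5 = resp.headers.get("ETag", "").strip('"')

    md5 = hashlib.md5()
    received = 0
    try:
        for chunk in resp.iter_content(chunk_size=65536):
            md5.update(chunk)
            received += len(chunk)
    except requests.exceptions.RequestException as exc:
        # A missing tail object can leave the connection hanging until the
        # read times out, so report what arrived before the error.
        print("transfer aborted after %d bytes: %s" % (received, exc))

    len_ok = (received == expected_len)
    md5_ok = (md5.hexdigest() == expected_md5)
    print("length: %d received, %d expected -> %s"
          % (received, expected_len, "OK" if len_ok else "MISMATCH"))
    print("md5: %s computed, %s expected -> %s"
          % (md5.hexdigest(), expected_md5, "OK" if md5_ok else "MISMATCH"))
    return len_ok and md5_ok

if __name__ == "__main__":
    check_object(OBJECT_URL)

On an object whose shadow tail has been deleted, the length check should report fewer (or, with the appended 404 body, differently many) bytes than advertised and the MD5 will not match the ETag.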