Re: [ceph-users] radosgw daemon stalls on download of some files

German Anders Fri, 29 Nov 2013 12:24:26 -0800

Thanks a lot Sebastian, I'm going to try that. I'm also having an issue while trying to test an RBD creation. I've installed ceph-client on the deploy server:

ceph@ceph-deploy01:/etc/ceph$ sudo rbd -n client.ceph-test -k /home/ceph/ceph-cluster/ceph.client.admin.keyring create --size 10240 cephdata
2013-11-29 15:20:25.683930 7fcd9979c780 0 librados: client.ceph-openstack authentication error (1) Operation not permitted

rbd: couldn't connect to the cluster!

Does anyone know what could be the issue here? Maybe it has something to do with keys, or maybe not...


Thanks in advance,

Best regards,


German Anders

--- Original message ---
Subject: Re: [ceph-users] radosgw daemon stalls on download of some files
From: Sebastian <webmas...@mailz.de>
To: ceph-users <ceph-users@lists.ceph.com>
Date: Friday, 29/11/2013 16:18

Hi Yehuda,
It's interesting, the responses are received but seems that they
aren't being handled (hence the following pings). There are a few
things that you could look at. First, try to connect to the admin
socket and see if you get any useful information from there. This
could include in-flight requests, look for other requests that have
not completed. Also see if there's indication for requests throttling.
Do you refer to the methods mentioned here?
http://ceph.com/docs/dumpling/radosgw/troubleshooting/
Unfortunately the socket file is not present. Do I have to activate it in the config somehow? I could not find any reference to that in the docs. Is it already included in my radosgw version?
radosgw -v
ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)
Another thing to look at would be at the seemingly unrelated timeout
messages. These should not happen and might indicate that there's
something that is holding you up that shouldn't. Try searching for the
same thread id that is specified in these messages (omit the 0x
prefix), and see what's the last thing that it's doing.
I checked that:
http://pastebin.com/Z23PWwjt
I do not see anything unusual before the messages happen, but maybe you see something odd.
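
As a rough illustration of the thread-id search suggested above, here is a minimal Python sketch. It assumes that each log line carries the id of the emitting thread as its own whitespace-separated field near the start of the line, as in the radosgw excerpts quoted in this thread. The function names are hypothetical and not part of Ceph, and the first sample line is invented for the example.

```python
import re

# Matches the thread id inside a heartbeat_map timeout message, e.g.
# "... 'RGWProcess::m_tp thread 0x7f00934bb700' had timed out after 600"
TIMEOUT_RE = re.compile(r"thread\s*0x([0-9a-f]+)'\s+had timed out")


def timed_out_threads(lines):
    """Return thread ids (without the 0x prefix) named in timeout messages."""
    ids = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
    return ids


def last_line_by_thread(lines, thread_id):
    """Return the last line emitted by thread_id, or None if there is none.

    Assumption: the emitting thread id is one of the first four
    whitespace-separated fields (date, time, thread id, debug level).
    """
    last = None
    for line in lines:
        if thread_id in line.split()[:4]:
            last = line
    return last


# Sample input: the first line is made up, the second is from this thread.
sample = [
    "7f00934bb700 20 hypothetical last message logged by the stalled thread",
    "7f00bc66a700 1 heartbeat_map is_healthy 'RGWProcess::m_tp thread "
    "0x7f00934bb700' had timed out after 600",
]

for tid in timed_out_threads(sample):
    print(tid, "->", last_line_by_thread(sample, tid))
```

On a real log you would pass the lines of the radosgw log file instead of `sample`.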
You could also try turning on 'debug objecter = 20' to see if it
provides more info (it's very verbose though).
Did that, but that is way too verbose for me ;) I uploaded it here:
http://pastebin.com/VBPAVP6z
There might be some requests mixed into it, but the one for cdn/52974400c6dd6ca719000004/source.avi is the one that stalled.
How much are you loading the gateway before that happens? We've seen a
similar issue in the past that was related to the fcgi library that is
dynamically linked with the radosgw process (that is, not the apache
mod_fastcgi module). This, however, would only happen when there's
heavy load and the fd numbers handled by the radosgw surpassed 1024
(buggy library that was using select() instead of poll()).
There are not that many requests on the storage, maybe 10-20 req/min. The cluster serves as a source for a CDN, so once a resource is fetched it should not be fetched again soon. I checked the open files, and there are only about 10-20 open file handles for the radosgw process. So this is probably not the issue.
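
For context on the 1024 limit discussed above: select() works on fixed-size fd_set bitmaps, so a descriptor whose number is at or above FD_SETSIZE cannot be watched, while poll() has no such cap. What matters is the descriptor number, not the count, although descriptors are normally handed out lowest-free-number first, so a process with only 10-20 open files should stay far below the limit. The sketch below is a minimal, Linux-only check under the assumption that FD_SETSIZE is 1024 (the usual glibc value). The helper names are hypothetical.

```python
import os

# Assumption: FD_SETSIZE is 1024, the usual value on Linux with glibc.
FD_SETSIZE = 1024


def highest_fd(fd_names):
    """Return the largest numeric descriptor in fd_names, or -1 if none."""
    numbers = [int(name) for name in fd_names if name.isdigit()]
    return max(numbers, default=-1)


def exceeds_select_limit(fd_names, limit=FD_SETSIZE):
    """True if any descriptor number is too large for a select() fd_set."""
    return highest_fd(fd_names) >= limit


def fds_of_process(pid):
    """List the open descriptor names of pid (Linux only, reads /proc)."""
    return os.listdir(f"/proc/{pid}/fd")
```

A possible use is `exceeds_select_limit(fds_of_process(pid))` with the pid of the radosgw process, run as a user allowed to read that process's /proc entry.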
Sebastian
Yehuda

On Fri, Nov 29, 2013 at 7:28 AM, Sebastian <webmas...@mailz.de> wrote:
Hi,
thanks for the hint. I tried this again and noticed that the timeout message does seem to be unrelated. Here is the log file for a stalling request with debug turned on:
http://pastebin.com/DcQuc9wP
I cannot find a real "error" in the log. The download stalls at about 500kb at that point though. Restarting radosgw fixes it for 1 download only, the next one is broken again. But as I said, this does not happen for all files.
Sebastian

On 27.11.2013, at 21:53, Yehuda Sadeh wrote:
On Wed, Nov 27, 2013 at 4:46 AM, Sebastian <webmas...@mailz.de> wrote:
Hi,
we have a setup of 4 servers running Ceph and radosgw. We use it as an internal S3 service for our files. The servers run Debian Squeeze with Ceph 0.67.4.
The cluster has been running smoothly for quite a while, but we are currently experiencing issues with the radosgw. For some files the HTTP download just stalls at around 500kb.
The Apache error log just says:
[error] [client ] FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
[error] [client ] Handler for fastcgi-script returned invalid result code 1
radosgw logging:
7f00bc66a700 1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f00934bb700' had timed out after 600
7f00bc66a700 1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f00ab4eb700' had timed out after 600
The interesting thing is that the cluster health is fine and only some files are not working properly. Most of them just work fine. A restart of radosgw fixes the issue. The other Ceph logs are also clean.
Any idea why this happens?
No, but you can turn on 'debug ms = 1' on your gateway ceph.conf, and
that might give some better indication.

Yehuda
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
