Re: [ceph-users] radosgw daemon stalls on download of some files

Andrew Woodward Sat, 30 Nov 2013 11:47:37 -0800

Are you using the Inktank-patched FastCGI server? http://gitbuilder.ceph.com


Alternatively, try another web server such as nginx, as already suggested.
On Nov 29, 2013 12:23 PM, "German Anders" <gand...@despegar.com> wrote:

>  Thanks a lot Sebastian, I'm going to try that. I'm also having an issue
> while trying to test RBD image creation. I've installed the ceph-client on
> the deploy server:
>
> ceph@ceph-deploy01:/etc/ceph$ sudo rbd -n client.ceph-test -k
> /home/ceph/ceph-cluster/ceph.client.admin.keyring create --size 10240
> cephdata
> 2013-11-29 15:20:25.683930 7fcd9979c780  0 librados: client.ceph-openstack
> authentication error (1) Operation not permitted
> rbd: couldn't connect to the cluster!
>
>  Does anyone know what the issue could be here? Maybe it has something to
> do with the keys, or maybe not...
>
> Thanks in advance,
>
> Best regards,
>
>
> *German Anders*
>
> --- Original message ---
> *Subject:* Re: [ceph-users] radosgw daemon stalls on download of some
> files
> *From:* Sebastian <webmas...@mailz.de>
> *To:* ceph-users <ceph-users@lists.ceph.com>
> *Date:* Friday, 29/11/2013 16:18
>
> Hi Yehuda,
>
>
> Interesting: the responses are received but seem not to be handled
> (hence the following pings). There are a few things you could look
> at. First, try connecting to the admin socket and see whether you get any
> useful information from there. This could include in-flight requests;
> look for other requests that have not completed. Also check for any
> indication of request throttling.
>
>
> Do you mean the methods described at
> http://ceph.com/docs/dumpling/radosgw/troubleshooting/ ?
> Unfortunately, the socket file is not present. Do I have to enable it in
> the config somehow? I could not find any reference to that in the docs. Is
> it already included in my radosgw version?
> radosgw -v
> ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)
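
A minimal ceph.conf sketch for enabling the gateway's admin socket explicitly. It assumes the gateway runs as client.radosgw.gateway (a hypothetical name here, so substitute the section your radosgw actually uses) and that radosgw can write to /var/run/ceph:

```ini
[client.radosgw.gateway]
    ; Create the admin socket at a fixed path. Ceph expands $cluster and $name,
    ; e.g. to /var/run/ceph/ceph-client.radosgw.gateway.asok for this section.
    admin socket = /var/run/ceph/$cluster-$name.asok
```

After a radosgw restart, `ceph --admin-daemon /var/run/ceph/ceph-client.radosgw.gateway.asok help` should list the commands the socket offers on this version. The exact path is an assumption that follows from the setting above.
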
>
> Another thing to look at would be the seemingly unrelated timeout
> messages. These should not happen and might indicate that something is
> holding you up when it shouldn't. Try searching for the same thread id
> that appears in these messages (omit the 0x prefix), and see what that
> thread was last doing.
>
>
> I checked that:
> http://pastebin.com/Z23PWwjt
> I do not see anything unusual before the messages appear, but maybe you
> will spot something odd.
>
>
> You could also try turning on 'debug objecter = 20' and see whether it
> provides more info (it's very verbose, though).
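
In ceph.conf this is one more line in the gateway's client section. The sketch below assumes the hypothetical section name client.radosgw.gateway and a radosgw restart afterwards so the file is reread:

```ini
[client.radosgw.gateway]
    ; Very verbose logging from the objecter, the RADOS client layer
    ; that radosgw uses to send operations to the OSDs.
    debug objecter = 20
```
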
>
>
> Did that, but that is way too verbose for me ;) I uploaded it here:
> http://pastebin.com/VBPAVP6z
> There might be some requests mixed into it, but the one for
> cdn/52974400c6dd6ca719000004/source.avi is the one that stalled.
>
> How much load is on the gateway when that happens? We've seen a
> similar issue in the past that was related to the fcgi library that is
> dynamically linked with the radosgw process (that is, not the Apache
> mod_fastcgi module). This, however, would only happen under heavy load,
> when the fd numbers handled by radosgw surpassed 1024 (a buggy library
> that was using select() instead of poll()).
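
On Linux, you can check from /proc whether a process is anywhere near that limit. The minimal Python sketch below (function names hypothetical) reports how many descriptors a process holds and whether the highest one is still below glibc's FD_SETSIZE of 1024. For select(), the highest descriptor number matters more than the count. The two usually track each other because the kernel hands out the lowest free number:

```python
import os
import sys

# glibc's fd_set covers descriptors 0..FD_SETSIZE-1; a select()-based library
# breaks once it has to watch a descriptor numbered FD_SETSIZE or higher.
FD_SETSIZE = 1024


def open_fds(pid):
    """Sorted descriptor numbers held by `pid`, read from /proc (Linux only)."""
    return sorted(int(name) for name in os.listdir(f"/proc/{pid}/fd"))


def select_safe(pid):
    """True if every descriptor of `pid` is below FD_SETSIZE."""
    fds = open_fds(pid)
    return not fds or fds[-1] < FD_SETSIZE


if __name__ == "__main__":
    pid = int(sys.argv[1])  # e.g. from `pidof radosgw`
    fds = open_fds(pid)
    print(f"{len(fds)} open descriptors, highest = {fds[-1] if fds else None}")
    print("below FD_SETSIZE" if select_safe(pid) else "at or above FD_SETSIZE")
```

For example, run `sudo python3 check_fds.py $(pidof radosgw)`. The script name is a placeholder, and root is typically needed to read another user's /proc/<pid>/fd.
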
>
>
> There are not that many requests to the storage, maybe 10-20 req/min. The
> cluster serves as a source for a CDN, so once a resource is fetched it
> should not be fetched again soon. I checked the open files, and there
> are only about 10-20 open file handles for the radosgw process, so this
> is probably not the issue.
>
> Sebastian
>
>
>
> Yehuda
>
> On Fri, Nov 29, 2013 at 7:28 AM, Sebastian <webmas...@mailz.de> wrote:
>
> Hi,
>
> Thanks for the hint. I tried this again and noticed that the timeout
> message does seem to be unrelated. Here is the log file for a stalling
> request with debug turned on:
> http://pastebin.com/DcQuc9wP
>
> I cannot find a real "error" in the log. At that point, though, the
> download stalls at about 500 KB. Restarting radosgw fixes it for one
> download only; the next one is broken again. But as I said, this does not
> happen for all files.
>
> Sebastian
>
> On 27.11.2013, at 21:53, Yehuda Sadeh wrote:
>
> On Wed, Nov 27, 2013 at 4:46 AM, Sebastian <webmas...@mailz.de> wrote:
>
> Hi,
>
> We have a setup of four servers running Ceph and radosgw. We use it as an
> internal S3 service for our files. The servers run Debian Squeeze with Ceph
> 0.67.4.
>
> The cluster has been running smoothly for quite a while, but we are
> currently experiencing issues with radosgw. For some files, the HTTP
> download just stalls at around 500 KB.
>
> The Apache error log just says:
> [error] [client ] FastCGI: comm with server "/var/www/s3gw.fcgi" aborted:
> idle timeout (30 sec)
> [error] [client ] Handler for fastcgi-script returned invalid result code 1
>
> radosgw logging:
> 7f00bc66a700 1 heartbeat_map is_healthy 'RGWProcess::m_tp thread
> 0x7f00934bb700' had timed out after 600
> 7f00bc66a700 1 heartbeat_map is_healthy 'RGWProcess::m_tp thread
> 0x7f00ab4eb700' had timed out after 600
>
> The interesting thing is that the cluster health is fine and only some
> files are not working properly; most of them work fine. A restart of
> radosgw fixes the issue. The other Ceph logs are also clean.
>
> Any idea why this happens?
>
>
> No, but you can turn on 'debug ms = 1' in your gateway's ceph.conf, and
> that might give a better indication.
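
A ceph.conf sketch of that change. It assumes the gateway's section is named client.radosgw.gateway (hypothetical, so use your own) and that radosgw is restarted so it rereads the file:

```ini
[client.radosgw.gateway]
    ; Messenger debugging at level 1: one log line per message
    ; the gateway sends to or receives from the cluster.
    debug ms = 1
```
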
>
> Yehuda
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>