Hi,

Thanks for the hint. I tried this again and noticed that the timeout message 
does seem to be unrelated. Here is the log file for a stalling request with 
debug turned on:
http://pastebin.com/DcQuc9wP

I cannot find an actual error in the log, although the download still stalls 
at about 500 kB. Restarting radosgw fixes it for one download only; the next 
one is broken again. But as I said, this does not happen for all files.

Sebastian

On 27.11.2013, at 21:53, Yehuda Sadeh wrote:

> On Wed, Nov 27, 2013 at 4:46 AM, Sebastian <webmas...@mailz.de> wrote:
>> Hi,
>> 
>> we have a setup of 4 Servers running ceph and radosgw. We use it as an 
>> internal S3 service for our files. The Servers run Debian Squeeze with Ceph 
>> 0.67.4.
>> 
>> The cluster has been running smoothly for quite a while, but we are 
>> currently experiencing issues with the radosgw. For some files the HTTP 
>> Download just stalls at around 500kb.
>> 
>> The Apache error log just says:
>> [error] [client ] FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: 
>> idle timeout (30 sec)
>> [error] [client ] Handler for fastcgi-script returned invalid result code 1
>> 
>> radosgw logging:
>> 7f00bc66a700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 
>> 0x7f00934bb700' had timed out after 600
>> 7f00bc66a700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 
>> 0x7f00ab4eb700' had timed out after 600
>> 
>> The interesting thing is that the cluster health is fine and only some files 
>> are not working properly. Most of them just work fine. A restart of radosgw 
>> fixes the issue. The other ceph logs are also clean.
>> 
>> Any idea why this happens?
>> 
> 
> No, but you can turn on 'debug ms = 1' on your gateway ceph.conf, and
> that might give some better indication.
> 
> Yehuda
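
For reference, the suggestion above could look roughly like this in the gateway's ceph.conf. This is only a sketch: the section name [client.radosgw.gateway] is an assumption and has to match whatever name your radosgw instance actually runs under, and the 'debug rgw' line is an optional addition that was not part of the suggestion.

```ini
; Assumed section name; use the section of your own radosgw instance.
[client.radosgw.gateway]
    ; Messenger-level logging, as suggested in the thread.
    debug ms = 1
    ; Optional assumption: more verbose gateway logging as well.
    debug rgw = 20
```

radosgw most likely needs a restart to pick up the change; the extra output should then show up in the gateway log for the next stalling request.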

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
