Hi All.

We are trying to cope with radosgw failing every 5-15 minutes. This seems
to be getting worse, but we are unable to determine the cause: there is
nothing in the logs, as it appears to be a radosgw hang rather than a crash.

The port is open and accepts a connection, but there is no response to a
HEAD, GET, or any other request.
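
To make the symptom concrete, here is a minimal sketch of a probe that tells
"connection refused" apart from "connects but never answers". It is only an
illustration: the function name probe_http, the status strings, and the
5-second timeout are my own choices, not anything radosgw or civetweb
provides.

```python
import socket


def probe_http(host, port, method="HEAD", path="/", timeout=5.0):
    """Classify an HTTP endpoint by how it reacts to one plain request.

    Returns one of (hypothetical labels chosen for this sketch):
      "refused"     - nothing is listening on the port
      "unreachable" - connect failed for another reason (DNS, timeout, ...)
      "hung"        - TCP connect succeeded but no bytes came back in time
      "closed"      - peer closed the connection without sending anything
      "ok"          - an HTTP status line came back
      "not-http"    - bytes came back but they do not look like HTTP
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        # Covers name resolution failures and connect timeouts.
        return "unreachable"

    with sock:
        sock.settimeout(timeout)
        request = (
            f"{method} {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n\r\n"
        )
        try:
            sock.sendall(request.encode("ascii"))
            data = sock.recv(1024)
        except socket.timeout:
            # The state described above: connected, but no reply.
            return "hung"
        except OSError:
            return "unreachable"

    if not data:
        return "closed"
    return "ok" if data.startswith(b"HTTP/") else "not-http"
```

Pointed directly at a gateway (bypassing HAProxy), for example
probe_http("ceph-obj02", 7480) where 7480 is an assumed port and not taken
from our config, a result of "hung" would correspond to what we are seeing.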

We are unsure where to go from here.

We have HAProxy running on a dual 10G connected server. It is also doing
SSL offload for the gateways.

The gateways are civetweb. We run obj01/02 on physical hardware. We have
attempted to run 4 instances on the same machine; the machine can cope, but
the instances still crash.

We are running 0.94-1337-gce175f3-1, which is built from the
wip-rgw-orphans branch:
https://github.com/ceph/ceph/tree/wip-rgw-orphans/src/rgw

Attached is the data through the load balancer for the last week. As you
can see, it is around 500-900 MB/s most of the time.

[client.radosgw.ceph-obj02]
  host = ceph-obj02
  keyring = /etc/ceph/keyring.radosgw.ceph-obj02
  rgw socket path = /tmp/radosgw.sock
  log file = /var/log/ceph/radosgw.log
  rgw data = /var/lib/ceph/radosgw/ceph-obj02
  rgw thread pool size = 1024
  rgw print continue = False
  rgw enable ops log = False
  log to stderr = False
  rgw enable usage log = False

Does anyone have any thoughts? Is this purely a capacity/performance issue
with civetweb, meaning I need to run more threads/gateways?
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
