On Mon, Sep 29, 2014 at 10:44 AM, Lyn Mitchell <mitc...@bellsouth.net> wrote:
>
>
> Hello ceph users,
>
>
>
> We have a federated gateway configured to replicate between two zones.
> Replication seems to be working smoothly between the master and slave zones;
> however, I have a recurring error in the replication log with the following
> info:
>
>
> INFO:radosgw_agent.worker:17573 is processing shard number 60
>
> INFO:radosgw_agent.sync:60/128 items processed
>
> INFO:radosgw_agent.worker:finished processing shard 60
>
> INFO:radosgw_agent.sync:61/128 items processed
>
> INFO:radosgw_agent.worker:17573 is processing shard number 61
>
> INFO:radosgw_agent.worker:bucket instance "xxx-secondary-01:alph-1.80907.1"
> has 1 entries after "00000000112.112.3"
>
> INFO:radosgw_agent.worker:syncing bucket "xxx-secondary-01"
>
> ERROR:radosgw_agent.worker:failed to sync object
> xxx-secondary-01/snapshots/8/56/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd:
> state is error
>
> INFO:radosgw_agent.worker:finished processing shard 61
>
> INFO:radosgw_agent.sync:62/128 items processed
>
> INFO:radosgw_agent.worker:17573 is processing shard number 62
>
> INFO:radosgw_agent.worker:finished processing shard 62
>
>
>
> This file was originally created and deleted via a third-party application
> (Citrix CloudPlatform).  On the master zone I can see where the file was
> deleted and the delete op reached a "complete" state; see below:
>
>
>
> (MASTER)
>
> radosgw-admin bilog list --bucket=xxx-secondary-01 -n $GATEWAY_INST
>
> …
>
>     { "op_id": "00000000107.107.2",
>
>       "op_tag": "alph-1.81679.241",
>
>       "op": "del",
>
>       "object":
> "snapshots\/8\/56\/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd",
>
>       "state": "pending",
>
>       "index_ver": 107,
>
>       "timestamp": "2014-09-18 02:57:58.000000Z",
>
>       "ver": { "pool": 76,
>
>           "epoch": 267}},
>
>     { "op_id": "00000000108.108.3",
>
>       "op_tag": "alph-1.81679.241",
>
>       "op": "del",
>
>       "object":
> "snapshots\/8\/56\/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd",
>
>       "state": "complete",
>
>       "index_ver": 108,
>
>       "timestamp": "2014-09-18 02:57:58.000000Z",
>
>       "ver": { "pool": 76,
>
>           "epoch": 348}},
>
> …
>
>
>
> While looking through the slave zone I found the following:
>
> (SLAVE):
>
> radosgw-admin opstate list -n $GATEWAY_INST
>
> …
>
>     { "client_id": "radosgw-agent",
>
>       "op_id": "xxxx-xxxx-r1:25526:2",
>
>       "object":
> "xxx-secondary-01\/snapshots\/8\/56\/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd",
>
>       "timestamp": "2014-09-29 17:12:43.402487Z",
>
>       "state": "error"},
>
> …
>
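> For reference, the same entry can be narrowed down to just this object; as
> far as I know opstate list also takes an --object filter, e.g.:
>
> radosgw-admin opstate list --object="xxx-secondary-01/snapshots/8/56/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd" -n $GATEWAY_INST
>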
> Also, there is no reference to the object on the slave; the following
> returned nothing:
> (SLAVE):
>
> radosgw-admin bilog list --bucket=xxx-secondary-01 -n $GATEWAY_INST
>
>
>
>
> (SLAVE):
>
> The gateway log on the slave has some information:
> 2014-09-29 13:26:49.554771 7f58881cc700  1 ====== req done
> req=0x7f58a8080690 http_status=204 ======
>
> 2014-09-29 13:26:49.581884 7f58a61fc700  1 ====== starting new request
> req=0x7f58a8063be0 =====
>
> 2014-09-29 13:26:49.582592 7f58a61fc700  0 WARNING: couldn't find acl header
> for bucket, generating default
>
> 2014-09-29 13:26:49.587044 7f58a61fc700  0 > HTTP_DATE -> Mon Sep 29
> 17:26:49 2014
>
> 2014-09-29 13:26:49.587063 7f58a61fc700  0 > HTTP_X_AMZ_COPY_SOURCE ->
> xxx-secondary-01/snapshots/8/56/07fb198b-e26a-46c2-9fc0-0ecee9c076ec.vhd
>
> 2014-09-29 13:26:49.608648 7f58a61fc700  0 curl_easy_performed returned
> error: couldn't connect to host
>
> 2014-09-29 13:26:49.612826 7f58a61fc700  1 ====== req done
> req=0x7f58a8063be0 http_status=400 ======
>
> 2014-09-29 13:26:49.640460 7f5898fe7700  1 ====== starting new request
> req=0x7f58a8077550 =====
>
> 2014-09-29 13:26:49.643624 7f5898fe7700  1 ====== req done
> req=0x7f58a8077550 http_status=200 ======
>
>
>
> From the error above it appears the slave is attempting to connect to the
> master, yet the file it’s requesting doesn’t exist.  I don’t think “couldn’t
> connect to host” is accurate because we’re not seeing the issue with any
> other objects which have been replicated.
>

I think that's the error that libcurl sends, so it should reflect what's
actually happening.
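
One quick sanity check would be to hit the master zone's endpoint directly
from the slave gateway host (the URL below is just a placeholder for your
master gateway endpoint):

curl -v http://master-gateway.example.com/

If that also fails, the libcurl error is genuine (DNS, routing or firewall);
if it succeeds, it's worth double-checking the endpoint configured for the
master zone in the region/zone map on the slave side.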

>
>
> Has anyone by chance run across an instance of this and if so what can be
> done to remove the references or clean it up?
>
>

Can you turn up rgw debugging?

debug ms = 1
debug rgw = 20
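
For example, in ceph.conf on the slave gateway (the section name below is a
placeholder; use the instance name you already pass with -n):

[client.radosgw.<instance>]
    debug ms = 1
    debug rgw = 20

then restart the gateway, or, if the instance has an admin socket enabled,
inject the settings at runtime:

ceph daemon client.radosgw.<instance> config set debug_rgw 20
ceph daemon client.radosgw.<instance> config set debug_ms 1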

Thanks,
Yehuda
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
