Is that radosgw log from the primary or the secondary zone?  Nothing in
that log jumps out at me.

I see you're running 0.80.5.  Are you using Apache 2.4?  There is a known
issue with Apache 2.4 on the primary that affects replication.  It's fixed,
just waiting on the next Firefly point release.  That issue causes 40x
errors with Apache 2.4, though, not 500 errors.
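
A quick way to check which Apache is in use:

    apache2 -v   # Debian/Ubuntu
    httpd -v     # RHEL/CentOS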

Have you verified that both system users can read and write to both
clusters?  (Just make sure you clean up the writes to the slave cluster.)
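
One quick way to check is a short boto script run against each zone's
endpoint (the hostnames, bucket, and credentials below are placeholders;
point it at a bucket the system user owns):

    import boto
    import boto.s3.connection

    # Placeholder endpoints and credentials -- substitute your own.
    for host in ('rgw-us-east.example.com', 'rgw-us-west.example.com'):
        conn = boto.connect_s3(
            aws_access_key_id='SYSTEM_USER_ACCESS_KEY',
            aws_secret_access_key='SYSTEM_USER_SECRET_KEY',
            host=host,
            calling_format=boto.s3.connection.OrdinaryCallingFormat(),
        )
        bucket = conn.get_bucket('test')         # read check
        key = bucket.new_key('replication-probe')
        key.set_contents_from_string('hello')    # write check
        key.delete()                             # clean up, esp. on the slave
        print host, 'read/write OK'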




On Tue, Nov 11, 2014 at 6:51 AM, Aaron Bassett <aa...@five3genomics.com>
wrote:

> OK, I believe I’ve made some progress here. I have everything syncing
> *except* data. The data is getting 500s when it tries to sync to the backup
> zone. I have a log from the radosgw with debug cranked up to 20:
>
> 2014-11-11 14:37:06.688331 7f54447f0700  1 ====== starting new request
> req=0x7f546800f3b0 =====
> 2014-11-11 14:37:06.688978 7f54447f0700  0 WARNING: couldn't find acl
> header for bucket, generating default
> 2014-11-11 14:37:06.689358 7f54447f0700  1 -- 172.16.10.103:0/1007381 -->
> 172.16.10.103:6934/14875 -- osd_op(client.5673295.0:1783
> statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write
> e47531) v4 -- ?+0 0x7f534800d770 con 0x7f53f00053f0
> 2014-11-11 14:37:06.689396 7f54447f0700 20 -- 172.16.10.103:0/1007381
> submit_message osd_op(client.5673295.0:1783 statelog.obj_opstate.97 [call
> statelog.add] 193.1cf20a5a ondisk+write e47531) v4 remote,
> 172.16.10.103:6934/14875, have pipe.
> 2014-11-11 14:37:06.689481 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
> 2014-11-11 14:37:06.689592 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).writer encoding 48 features 17592186044415
> 0x7f534800d770 osd_op(client.5673295.0:1783 statelog.obj_opstate.97 [call
> statelog.add] 193.1cf20a5a ondisk+write e47531) v4
> 2014-11-11 14:37:06.689756 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).writer signed seq # 48): sig = 206599450695048354
> 2014-11-11 14:37:06.689804 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).writer sending 48 0x7f534800d770
> 2014-11-11 14:37:06.689884 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
> 2014-11-11 14:37:06.689915 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).writer sleeping
> 2014-11-11 14:37:06.694968 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader got ACK
> 2014-11-11 14:37:06.695053 7f51ff0f0700 15 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader got ack seq 48
> 2014-11-11 14:37:06.695067 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader reading tag...
> 2014-11-11 14:37:06.695079 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader got MSG
> 2014-11-11 14:37:06.695093 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader got envelope type=43 src osd.25 front=190
> data=0 off 0
> 2014-11-11 14:37:06.695108 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader wants 190 from dispatch throttler
> 0/104857600
> 2014-11-11 14:37:06.695135 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader got front 190
> 2014-11-11 14:37:06.695150 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).aborted = 0
> 2014-11-11 14:37:06.695158 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader got 190 + 0 + 0 byte message
> 2014-11-11 14:37:06.695284 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader got message 48 0x7f51b4001950
> osd_op_reply(1783 statelog.obj_opstate.97 [call] v47531'13 uv13 ondisk = 0)
> v6
> 2014-11-11 14:37:06.695313 7f51ff0f0700 20 -- 172.16.10.103:0/1007381
> queue 0x7f51b4001950 prio 127
> 2014-11-11 14:37:06.695374 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader reading tag...
> 2014-11-11 14:37:06.695384 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
> 2014-11-11 14:37:06.695426 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).write_ack 48
> 2014-11-11 14:37:06.695421 7f54ebfff700  1 -- 172.16.10.103:0/1007381 <==
> osd.25 172.16.10.103:6934/14875 48 ==== osd_op_reply(1783
> statelog.obj_opstate.97 [call] v47531'13 uv13 ondisk = 0) v6 ==== 190+0+0
> (4092879147 0 0) 0x7f51b4001950 con 0x7f53f00053f0
> 2014-11-11 14:37:06.695458 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
> 2014-11-11 14:37:06.695476 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).writer sleeping
> 2014-11-11 14:37:06.695495 7f54ebfff700 10 -- 172.16.10.103:0/1007381
> dispatch_throttle_release 190 to dispatch throttler 190/104857600
> 2014-11-11 14:37:06.695506 7f54ebfff700 20 -- 172.16.10.103:0/1007381
> done calling dispatch on 0x7f51b4001950
> 2014-11-11 14:37:06.695616 7f54447f0700  0 > HTTP_DATE -> Tue Nov 11
> 14:37:06 2014
> 2014-11-11 14:37:06.695636 7f54447f0700  0 > HTTP_X_AMZ_COPY_SOURCE ->
> test/upload
> 2014-11-11 14:37:06.696823 7f54447f0700  1 -- 172.16.10.103:0/1007381 -->
> 172.16.10.103:6934/14875 -- osd_op(client.5673295.0:1784
> statelog.obj_opstate.97 [call statelog.add] 193.1cf20a5a ondisk+write
> e47531) v4 -- ?+0 0x7f534800fbb0 con 0x7f53f00053f0
> 2014-11-11 14:37:06.696866 7f54447f0700 20 -- 172.16.10.103:0/1007381
> submit_message osd_op(client.5673295.0:1784 statelog.obj_opstate.97 [call
> statelog.add] 193.1cf20a5a ondisk+write e47531) v4 remote,
> 172.16.10.103:6934/14875, have pipe.
> 2014-11-11 14:37:06.696935 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
> 2014-11-11 14:37:06.696972 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).writer encoding 49 features 17592186044415
> 0x7f534800fbb0 osd_op(client.5673295.0:1784 statelog.obj_opstate.97 [call
> statelog.add] 193.1cf20a5a ondisk+write e47531) v4
> 2014-11-11 14:37:06.697120 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).writer signed seq # 49): sig =
> 6092508395557517420
> 2014-11-11 14:37:06.697161 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).writer sending 49 0x7f534800fbb0
> 2014-11-11 14:37:06.697223 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
> 2014-11-11 14:37:06.697257 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).writer sleeping
> 2014-11-11 14:37:06.701315 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader got ACK
> 2014-11-11 14:37:06.701364 7f51ff0f0700 15 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader got ack seq 49
> 2014-11-11 14:37:06.701376 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader reading tag...
> 2014-11-11 14:37:06.701389 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader got MSG
> 2014-11-11 14:37:06.701402 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader got envelope type=43 src osd.25 front=190
> data=0 off 0
> 2014-11-11 14:37:06.701415 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader wants 190 from dispatch throttler
> 0/104857600
> 2014-11-11 14:37:06.701435 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader got front 190
> 2014-11-11 14:37:06.701449 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).aborted = 0
> 2014-11-11 14:37:06.701458 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader got 190 + 0 + 0 byte message
> 2014-11-11 14:37:06.701569 7f51ff0f0700 10 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader got message 49 0x7f51b4001460
> osd_op_reply(1784 statelog.obj_opstate.97 [call] v47531'14 uv14 ondisk = 0)
> v6
> 2014-11-11 14:37:06.701597 7f51ff0f0700 20 -- 172.16.10.103:0/1007381
> queue 0x7f51b4001460 prio 127
> 2014-11-11 14:37:06.701627 7f51ff0f0700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).reader reading tag...
> 2014-11-11 14:37:06.701636 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
> 2014-11-11 14:37:06.701678 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).write_ack 49
> 2014-11-11 14:37:06.701684 7f54ebfff700  1 -- 172.16.10.103:0/1007381 <==
> osd.25 172.16.10.103:6934/14875 49 ==== osd_op_reply(1784
> statelog.obj_opstate.97 [call] v47531'14 uv14 ondisk = 0) v6 ==== 190+0+0
> (1714651716 0 0) 0x7f51b4001460 con 0x7f53f00053f0
> 2014-11-11 14:37:06.701710 7f51ff1f1700 10 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).writer: state = open policy.server=0
> 2014-11-11 14:37:06.701728 7f51ff1f1700 20 -- 172.16.10.103:0/1007381 >>
> 172.16.10.103:6934/14875 pipe(0x7f53f0005160 sd=61 :33168 s=2 pgs=2524
> cs=1 l=1 c=0x7f53f00053f0).writer sleeping
> 2014-11-11 14:37:06.701751 7f54ebfff700 10 -- 172.16.10.103:0/1007381
> dispatch_throttle_release 190 to dispatch throttler 190/104857600
> 2014-11-11 14:37:06.701762 7f54ebfff700 20 -- 172.16.10.103:0/1007381
> done calling dispatch on 0x7f51b4001460
> 2014-11-11 14:37:06.701815 7f54447f0700  0 WARNING: set_req_state_err
> err_no=5 resorting to 500
> 2014-11-11 14:37:06.701894 7f54447f0700  1 ====== req done
> req=0x7f546800f3b0 http_status=500 ======
>
>
> Any information you could give me would be wonderful as I’ve been banging
> my head against this for a few days.
>
> Thanks, Aaron
>
> On Nov 5, 2014, at 3:02 PM, Aaron Bassett <aa...@five3genomics.com> wrote:
>
> Ah, so I need both users in both clusters? I think I missed that bit; let
> me see if that does the trick.
>
> Aaron
>
> On Nov 5, 2014, at 2:59 PM, Craig Lewis <cle...@centraldesktop.com> wrote:
>
> One region with two zones is the standard setup, so that should be fine.
>
> Is metadata (users and buckets) being replicated, but not data (objects)?
>
>
> Let's go through a quick checklist (a command sketch follows the list):
>
>    - Verify that you enabled log_meta and log_data in the region.json for
>    the master zone
>    - Verify that RadosGW is using your region map with radosgw-admin
>    regionmap get --name client.radosgw.<name>
>    - Verify that RadosGW is using your zone map with radosgw-admin zone
>    get --name client.radosgw.<name>
>    - Verify that all the pools in your zone exist (RadosGW only
>    auto-creates the basic ones).
>    - Verify that your system users exist in both zones with the same
>    access and secret.
>
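> Something like the following covers the radosgw-admin checks (the client
> name and user ID are placeholders; run it against each cluster):
>
>    radosgw-admin regionmap get --name client.radosgw.<name>
>    radosgw-admin zone get --name client.radosgw.<name>
>    rados lspools   # compare against the pools named in the zone config
>    radosgw-admin user info --uid=<system-user> --name client.radosgw.<name>
>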
> Hopefully that gives you an idea of what's not working correctly.
>
> If it doesn't, crank up the logging on the radosgw daemon on both sides,
> and check the logs.  Add debug rgw = 20 to ceph.conf on both sides (in the
> client.radosgw.<name> section), and restart.  Hopefully those logs will
> tell you what's wrong.
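>
> In each cluster's ceph.conf that would look like (the section name is
> whatever your gateway's client name is):
>
>    [client.radosgw.<name>]
>    debug rgw = 20
>    debug ms = 1   # optional, if you want messenger-level detail too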
>
>
> On Wed, Nov 5, 2014 at 11:39 AM, Aaron Bassett <aa...@five3genomics.com>
> wrote:
>
>> Hello everyone,
>> I am attempting to set up a two-cluster configuration for object storage
>> disaster recovery. I have two physically separate sites, so using one big
>> cluster isn’t an option. I’m attempting to follow the guide at:
>> http://ceph.com/docs/v0.80.5/radosgw/federated-config/ . After a couple
>> days of flailing, I’ve settled on using one region with two zones, where
>> each cluster is a zone. I’m now attempting to set up an agent as per the
>> “Multi-Site Data Replication” section. The agent kicks off OK and starts
>> making all sorts of connections, but no objects are being copied to the
>> non-master zone. I re-ran the agent with the -v flag and saw a lot of:
>>
>> DEBUG:urllib3.connectionpool:"GET
>> /admin/opstate?client-id=radosgw-agent&object=test%2F_shadow_.JjVixjWmebQTrRed36FL6D0vy2gDVZ__39&op-id=phx-r1-head1%3A2451615%3A1
>> HTTP/1.1" 200 None
>>
>> DEBUG:radosgw_agent.worker:op state is []
>>
>> DEBUG:radosgw_agent.worker:error geting op state: list index out of range
>>
>>
>> So it appears something is still wrong with my agent, though I have no
>> idea what. I can’t seem to find any errors in any other logs. Does anyone
>> have any insight here?
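>>
>> In case it helps, my agent config looks roughly like this (keys, host,
>> and paths are placeholders, following the format in the federated-config
>> doc):
>>
>>     src_access_key: SYSTEM_USER_ACCESS_KEY
>>     src_secret_key: SYSTEM_USER_SECRET_KEY
>>     destination: http://rgw-backup.example.com:80
>>     dest_access_key: SYSTEM_USER_ACCESS_KEY
>>     dest_secret_key: SYSTEM_USER_SECRET_KEY
>>     log_file: /var/log/radosgw/radosgw-sync.log
>>
>> run as: radosgw-agent -v -c /etc/ceph/cluster-data-sync.conf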
>>
>> I’m also wondering whether what I’m attempting, two clusters in the same
>> region as separate zones, makes sense.
>>
>> Thanks, Aaron
>>
>>
>>
>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
