That did the trick. We had rgw_gc_max_objs set to 0 just on the swift rgw
definitions, although it was set correctly on the other rgw services; I'm
guessing someone must have thought a different precedence was in play in
the past.
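
For the archives: the fix was setting it back to a non-zero value in the
rgw sections of ceph.conf and restarting the gateways. A minimal sketch,
with a hypothetical section name (adjust to your rgw instance names; 32
is the upstream default, if I remember right):

    [client.rgw.swift]
    # must be > 0; the GC shard lookup takes a hash modulo this value
    rgw_gc_max_objs = 32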

On Tue, 2018-12-11 at 11:41 -0500, Casey Bodley wrote:

Hi Leon,


Are you running with a non-default value of rgw_gc_max_objs? I was able
to reproduce this exact stack trace by setting rgw_gc_max_objs = 0; I
can't think of any other way to get a 'Floating point exception' here.
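
For anyone curious why 0 turns into SIGFPE rather than a clean config
error: the GC spreads its work across rgw_gc_max_objs shard objects by
hashing the chain tag modulo that value. A simplified sketch of the
failure mode, not the actual RGWGC code:

    #include <cstdint>
    #include <functional>
    #include <string>

    // Hypothetical stand-in for RGWGC::tag_index(): choose which of the
    // max_objs GC shard objects a tag maps to. With max_objs == 0 the
    // modulo is an integer division by zero, which x86 raises as SIGFPE
    // and the log reports as 'Floating point exception', as in the
    // backtrace below.
    static int tag_index(const std::string& tag, uint32_t max_objs) {
        return std::hash<std::string>{}(tag) % max_objs;
    }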


On 12/11/18 10:31 AM, Leon Robinson wrote:

Hello, I have found a surefire way to bring down our swift gateways.


First, upload a bunch of large files, split into segments, e.g.

for i in {1..100}; do swift upload test_container -S 10485760 CentOS-7-x86_64-GenericCloud.qcow2 --object-name CentOS-7-x86_64-GenericCloud.qcow2-$i; done


This creates 100 objects in test_container and 1000 or so objects in
test_container_segments.


Then delete them. Preferably in a ludicrous manner:


for i in $(swift list test_container); do swift delete test_container $i; done


What results is:


   -13> 2018-12-11 15:17:57.627655 7fc128b49700  1 -- 172.28.196.121:0/464072497 <== osd.480 172.26.212.6:6802/2058882 1 ==== osd_op_reply(11 .dir.default.1083413551.2.7 [call,call] v1423252'7548804 uv7548804 ondisk = 0) v8 ==== 213+0+0 (3895049453 0 0) 0x55c98f45e9c0 con 0x55c98f4d7800
   -12> 2018-12-11 15:17:57.627827 7fc0e3ffe700  1 -- 172.28.196.121:0/464072497 --> 172.26.221.7:6816/2366816 -- osd_op(unknown.0.0:12 14.110b 14:d08c26b8:::default.1083413551.2_CentOS-7-x86_64-GenericCloud.qcow2-10%2f1532606905.440697%2f938016768%2f10485760%2f00000037:head [cmpxattr user.rgw.idtag (25) op 1 mode 1,call rgw.obj_remove] snapc 0=[] ondisk+write+known_if_redirected e1423252) v8 -- 0x55c98f4603c0 con 0
   -11> 2018-12-11 15:17:57.628582 7fc128348700  5 -- 172.28.196.121:0/157062182 >> 172.26.225.9:6828/2257653 conn(0x55c98f0eb000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=540 cs=1 l=1). rx osd.87 seq 2 0x55c98f4603c0 osd_op_reply(340 obj_delete_at_hint.0000000055 [call] v1423252'9217746 uv9217746 ondisk = 0) v8
   -10> 2018-12-11 15:17:57.628604 7fc128348700  1 -- 172.28.196.121:0/157062182 <== osd.87 172.26.225.9:6828/2257653 2 ==== osd_op_reply(340 obj_delete_at_hint.0000000055 [call] v1423252'9217746 uv9217746 ondisk = 0) v8 ==== 173+0+0 (3971813511 0 0) 0x55c98f4603c0 con 0x55c98f0eb000
    -9> 2018-12-11 15:17:57.628760 7fc1017f9700  1 -- 172.28.196.121:0/157062182 --> 172.26.225.9:6828/2257653 -- osd_op(unknown.0.0:341 13.4f 13:f3db1134:::obj_delete_at_hint.0000000055:head [call timeindex.list] snapc 0=[] ondisk+read+known_if_redirected e1423252) v8 -- 0x55c98f45fa00 con 0
    -8> 2018-12-11 15:17:57.629306 7fc128348700  5 -- 172.28.196.121:0/157062182 >> 172.26.225.9:6828/2257653 conn(0x55c98f0eb000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=540 cs=1 l=1). rx osd.87 seq 3 0x55c98f45fa00 osd_op_reply(341 obj_delete_at_hint.0000000055 [call] v0'0 uv9217746 ondisk = 0) v8
    -7> 2018-12-11 15:17:57.629326 7fc128348700  1 -- 172.28.196.121:0/157062182 <== osd.87 172.26.225.9:6828/2257653 3 ==== osd_op_reply(341 obj_delete_at_hint.0000000055 [call] v0'0 uv9217746 ondisk = 0) v8 ==== 173+0+15 (3272189389 0 2149983739) 0x55c98f45fa00 con 0x55c98f0eb000
    -6> 2018-12-11 15:17:57.629398 7fc128348700  5 -- 172.28.196.121:0/464072497 >> 172.26.221.7:6816/2366816 conn(0x55c98f4d6000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=181 cs=1 l=1). rx osd.58 seq 2 0x55c98f45fa00 osd_op_reply(12 default.1083413551.2_CentOS-7-x86_64-GenericCloud.qcow2-10/1532606905.440697/938016768/10485760/00000037 [cmpxattr (25) op 1 mode 1,call] v1423252'743755 uv743755 ondisk = 0) v8
    -5> 2018-12-11 15:17:57.629418 7fc128348700  1 -- 172.28.196.121:0/464072497 <== osd.58 172.26.221.7:6816/2366816 2 ==== osd_op_reply(12 default.1083413551.2_CentOS-7-x86_64-GenericCloud.qcow2-10/1532606905.440697/938016768/10485760/00000037 [cmpxattr (25) op 1 mode 1,call] v1423252'743755 uv743755 ondisk = 0) v8 ==== 290+0+0 (3763879162 0 0) 0x55c98f45fa00 con 0x55c98f4d6000
    -4> 2018-12-11 15:17:57.629458 7fc1017f9700  1 -- 172.28.196.121:0/157062182 --> 172.26.225.9:6828/2257653 -- osd_op(unknown.0.0:342 13.4f 13:f3db1134:::obj_delete_at_hint.0000000055:head [call lock.unlock] snapc 0=[] ondisk+write+known_if_redirected e1423252) v8 -- 0x55c98f45fd40 con 0
    -3> 2018-12-11 15:17:57.629603 7fc0e3ffe700  1 -- 172.28.196.121:0/464072497 --> 172.26.212.6:6802/2058882 -- osd_op(unknown.0.0:13 15.1e0 15:079bdcbb:::.dir.default.1083413551.2.7:head [call rgw.guard_bucket_resharding,call rgw.bucket_complete_op] snapc 0=[] ondisk+write+known_if_redirected e1423252) v8 -- 0x55c98f460700 con 0
    -2> 2018-12-11 15:17:57.631312 7fc128b49700  5 -- 172.28.196.121:0/464072497 >> 172.26.212.6:6802/2058882 conn(0x55c98f4d7800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=202 cs=1 l=1). rx osd.480 seq 2 0x55c98f460700 osd_op_reply(13 .dir.default.1083413551.2.7 [call,call] v1423252'7548805 uv7548805 ondisk = 0) v8
    -1> 2018-12-11 15:17:57.631329 7fc128b49700  1 -- 172.28.196.121:0/464072497 <== osd.480 172.26.212.6:6802/2058882 2 ==== osd_op_reply(13 .dir.default.1083413551.2.7 [call,call] v1423252'7548805 uv7548805 ondisk = 0) v8 ==== 213+0+0 (4216487267 0 0) 0x55c98f460700 con 0x55c98f4d7800
     0> 2018-12-11 15:17:57.631834 7fc0e3ffe700 -1 *** Caught signal (Floating point exception) **
 in thread 7fc0e3ffe700 thread_name:civetweb-worker


 ceph version 12.2.10 (177915764b752804194937482a39e95e0ca3de94) luminous (stable)
 1: (()+0x200024) [0x55c98cc95024]
 2: (()+0x11390) [0x7fc13e474390]
 3: (RGWGC::tag_index(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x56) [0x55c98cf78cc6]
 4: (RGWGC::send_chain(cls_rgw_obj_chain&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)+0x6a) [0x55c98cf7b06a]
 5: (RGWRados::Object::complete_atomic_modification()+0xd3) [0x55c98cdbfb63]
 6: (RGWRados::Object::Delete::delete_obj()+0xa22) [0x55c98cdf4142]
 7: (RGWDeleteObj::execute()+0x46c) [0x55c98cd8802c]
 8: (rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, bool)+0x165) [0x55c98cdb01c5]
 9: (process_request(RGWRados*, RGWREST*, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSocket*, int*)+0x1dbc) [0x55c98cdb234c]
 10: (RGWCivetWebFrontend::process(mg_connection*)+0x38f) [0x55c98cc4aacf]
 11: (()+0x1f05d9) [0x55c98cc855d9]
 12: (()+0x1f1fa9) [0x55c98cc86fa9]
 13: (()+0x76ba) [0x7fc13e46a6ba]
 14: (clone()+0x6d) [0x7fc133b5941d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   0/ 0 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/radosgw_swift.log



Which isn't great. We can restart the radosgw, but then anyone else who
fancies deleting a large segmented object can kill our service.

Any ideas?


--
Leon L. Robinson <leon.robin...@ukfast.co.uk>


________________________________

NOTICE AND DISCLAIMER
This e-mail (including any attachments) is intended for the above-named 
person(s). If you are not the intended recipient, notify the sender 
immediately, delete this email from your system and do not disclose or use for 
any purpose. We may monitor all incoming and outgoing emails in line with 
current legislation. We have taken steps to ensure that this email and 
attachments are free from any virus, but it remains your responsibility to 
ensure that viruses do not adversely affect you
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
