Hi,

The auth caps were as follows:
caps: [mon] allow r
caps: [osd] allow rwx pool=hosting_windows_sharedweb, allow rwx pool=infra_systems, allow rwx pool=hosting_linux_sharedweb

I changed them (just adding a pool to the list) to:

caps: [mon] allow r
caps: [osd] allow rwx pool=hosting_windows_sharedweb, allow rwx pool=infra_systems, allow rwx pool=hosting_linux_sharedweb, allow rwx pool=test

(Commands for applying and verifying a change like this are sketched below the quoted thread.)

Thanks

J

On 1 August 2014 01:17, Brad Hubbard <bhubb...@redhat.com> wrote:
> On 07/31/2014 06:37 PM, James Eckersall wrote:
>
>> Hi,
>>
>> The stacktraces are very similar. Here is another one with complete dmesg: http://pastebin.com/g3X0pZ9E
>>
>
> $ decodecode < tmp.oops
> [ 28.636837] Code: dc 00 00 49 8b 50 08 4d 8b 20 49 8b 40 10 4d 85 e4 0f 84 17 01 00 00 48 85 c0 0f 84 0e 01 00 00 49 63 46 20 48 8d 4a 01 4d 8b 06 <49> 8b 1c 04 4c 89 e0 65 49 0f c7 08 0f 94 c0 84 c0 74 b9 49 63
> All code
> ========
>    0:  dc 00                   faddl  (%rax)
>    2:  00 49 8b                add    %cl,-0x75(%rcx)
>    5:  50                      push   %rax
>    6:  08 4d 8b                or     %cl,-0x75(%rbp)
>    9:  20 49 8b                and    %cl,-0x75(%rcx)
>    c:  40 10 4d 85             adc    %cl,-0x7b(%rbp)
>   10:  e4 0f                   in     $0xf,%al
>   12:  84 17                   test   %dl,(%rdi)
>   14:  01 00                   add    %eax,(%rax)
>   16:  00 48 85                add    %cl,-0x7b(%rax)
>   19:  c0 0f 84                rorb   $0x84,(%rdi)
>   1c:  0e                      (bad)
>   1d:  01 00                   add    %eax,(%rax)
>   1f:  00 49 63                add    %cl,0x63(%rcx)
>   22:  46 20 48 8d             rex.RX and %r9b,-0x73(%rax)
>   26:  4a 01 4d 8b             rex.WX add %rcx,-0x75(%rbp)
>   2a:  06                      (bad)
>   2b:* 49 8b 1c 04             mov    (%r12,%rax,1),%rbx    <-- trapping instruction
>   2f:  4c 89 e0                mov    %r12,%rax
>   32:  65 49 0f c7 08          cmpxchg16b %gs:(%r8)
>   37:  0f 94 c0                sete   %al
>   3a:  84 c0                   test   %al,%al
>   3c:  74 b9                   je     0xfffffffffffffff7
>   3e:  49                      rex.WB
>   3f:  63                      .byte 0x63
>
> Code starting with the faulting instruction
> ===========================================
>    0:  49 8b 1c 04             mov    (%r12,%rax,1),%rbx
>    4:  4c 89 e0                mov    %r12,%rax
>    7:  65 49 0f c7 08          cmpxchg16b %gs:(%r8)
>    c:  0f 94 c0                sete   %al
>    f:  84 c0                   test   %al,%al
>   11:  74 b9                   je     0xffffffffffffffcc
>   13:  49                      rex.WB
>   14:  63                      .byte 0x63
>
> Looks like the value in r12 was bad. You'd have to look at the code and the
> rest of the disassembly to work out where that value came from. Good chance
> it's the kmem_cache* passed to kmem_cache_alloc of course.
>
> In this case r12 is c74b1e0b59385d30 and in the stack below it's
> 7d10f8ec0c3cb928. Neither is a valid address of course.
>
>
>> The rbd's are mapped by the rbdmap service on boot.
>> All our ceph servers are running Ubuntu 14.04 (kernel 3.13.0-30-generic).
>> Ceph packages are from the Ubuntu repos, version 0.80.1-0ubuntu1.1.
>> I should have probably mentioned this info in the initial mail :)
>>
>> This problem also seemed to get gradually worse over time.
>> We had a couple of sporadic crashes at the start of the week, escalating
>> to the node being unable to stay up for more than a couple of minutes
>> before panicking.
>>
>> Thanks
>>
>> J
>>
>>
>> On 31 July 2014 09:12, Ilya Dryomov <ilya.dryo...@inktank.com> wrote:
>>
>> On Thu, Jul 31, 2014 at 11:44 AM, James Eckersall
>> <james.eckers...@gmail.com> wrote:
>> > Hi,
>> >
>> > I've had a fun time with ceph this week.
>> > We have a cluster with 4 OSD (20 OSD's per) servers, 3 mons and a server
>> > mapping ~200 rbd's and presenting cifs shares.
>> >
>> > We're using cephx and the export node has its own cephx auth key.
>> >
>> > I made a change to the key last week, adding rwx access to another pool.
>> >
>> > Since that point, we had sporadic kernel panics of the export node.
>> >
>> > It got to the point where it would barely finish booting up and would panic.
>> >
>> > Once I removed the extra pool I had added to the auth key, it hasn't crashed
>> > again.
>> >
>> > I'm a bit concerned that a change to an auth key can cause this type of
>> > crash.
>> > There were no log entries on mon/osd/export node regarding the key at all,
>> > so it was only by searching my memory for what had changed that allowed me
>> > to resolve the problem.
>> >
>> > From what I could tell from the key, the format was correct and the pool
>> > that I added did exist, so I am confused as to how this would have caused
>> > kernel panics.
>> >
>> > Below is an example of one of the crash stacktraces.
>> >
>> > [ 32.713504] general protection fault: 0000 [#1] SMP
>> > [ 32.724718] Modules linked in: ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables rbd libceph libcrc32c gpio_ich dcdbas intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul joydev crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac edac_core shpchp lpc_ich mei_me mei wmi ipmi_si mac_hid acpi_power_meter 8021q garp stp mrp llc bonding lp parport nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache hid_generic igb ixgbe i2c_algo_bit usbhid dca hid ptp ahci libahci pps_core megaraid_sas mdio
>> > [ 32.843936] CPU: 18 PID: 5030 Comm: tr Not tainted 3.13.0-30-generic #54-Ubuntu
>> > [ 32.860163] Hardware name: Dell Inc. PowerEdge R620/0PXXHP, BIOS 1.6.0 03/07/2013
>> > [ 32.876774] task: ffff880417b15fc0 ti: ffff8804273f4000 task.ti: ffff8804273f4000
>> > [ 32.893384] RIP: 0010:[<ffffffff811a19c5>]  [<ffffffff811a19c5>] kmem_cache_alloc+0x75/0x1e0
>> > [ 32.912198] RSP: 0018:ffff8804273f5d40  EFLAGS: 00010286
>> > [ 32.924015] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000011ed
>> > [ 32.939856] RDX: 00000000000011ec RSI: 00000000000080d0 RDI: ffff88042f803700
>> > [ 32.955696] RBP: ffff8804273f5d70 R08: 0000000000017260 R09: ffffffff811be63c
>> > [ 32.971559] R10: 8080808080808080 R11: 0000000000000000 R12: 7d10f8ec0c3cb928
>> > [ 32.987421] R13: 00000000000080d0 R14: ffff88042f803700 R15: ffff88042f803700
>> > [ 33.003284] FS:  0000000000000000(0000) GS:ffff88042fd20000(0000) knlGS:0000000000000000
>> > [ 33.021281] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > [ 33.034068] CR2: 00007f01a8fced40 CR3: 000000040e52f000 CR4: 00000000000407e0
>> > [ 33.049929] Stack:
>> > [ 33.054456]  ffffffff811be63c 0000000000000000 ffff88041be52780 ffff880428052000
>> > [ 33.071259]  ffff8804273f5f2c 00000000ffffff9c ffff8804273f5d98 ffffffff811be63c
>> > [ 33.088084]  0000000000000080 ffff8804273f5f2c ffff8804273f5e40 ffff8804273f5e30
>> > [ 33.104908] Call Trace:
>> > [ 33.110399]  [<ffffffff811be63c>] ? get_empty_filp+0x5c/0x180
>> > [ 33.123188]  [<ffffffff811be63c>] get_empty_filp+0x5c/0x180
>> > [ 33.135593]  [<ffffffff811cc03d>] path_openat+0x3d/0x620
>> > [ 33.147422]  [<ffffffff811cd47a>] do_filp_open+0x3a/0x90
>> > [ 33.159250]  [<ffffffff811a1985>] ? kmem_cache_alloc+0x35/0x1e0
>> > [ 33.172405]  [<ffffffff811cc6bf>] ? getname_flags+0x4f/0x190
>> > [ 33.185004]  [<ffffffff811da237>] ? __alloc_fd+0xa7/0x130
>> > [ 33.197025]  [<ffffffff811bbb99>] do_sys_open+0x129/0x280
>> > [ 33.209049]  [<ffffffff81020d25>] ? syscall_trace_enter+0x145/0x250
>> > [ 33.222992]  [<ffffffff811bbd0e>] SyS_open+0x1e/0x20
>> > [ 33.234053]  [<ffffffff8172aeff>] tracesys+0xe1/0xe6
>> > [ 33.245112] Code: dc 00 00 49 8b 50 08 4d 8b 20 49 8b 40 10 4d 85 e4 0f 84 17 01 00 00 48 85 c0 0f 84 0e 01 00 00 49 63 46 20 48 8d 4a 01 4d 8b 06 <49> 8b 1c 04 4c 89 e0 65 49 0f c7 08 0f 94 c0 84 c0 74 b9 49 63
>> > [ 33.292549] RIP  [<ffffffff811a19c5>] kmem_cache_alloc+0x75/0x1e0
>> > [ 33.306192] RSP <ffff8804273f5d40>
>>
>> Hi James,
>>
>> Are all the stacktraces the same?  When are those rbd images mapped - during
>> boot with some sort of init script?  Can you attach the entire dmesg?
>>
>> Thanks,
>>
>>                 Ilya
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
> --
>
> Kindest Regards,
>
> Brad Hubbard
> Senior Software Maintenance Engineer
> Red Hat Global Support Services
> Asia Pacific Region
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
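For reference, a caps change like the one at the top of this mail is normally applied with the ceph auth caps command, which replaces the entity's entire capability list in one go, so every pool that should keep access has to be re-listed. The sketch below uses a hypothetical client name (client.exportnode); substitute the real entity:

  $ ceph auth caps client.exportnode \
        mon 'allow r' \
        osd 'allow rwx pool=hosting_windows_sharedweb, allow rwx pool=infra_systems, allow rwx pool=hosting_linux_sharedweb, allow rwx pool=test'

  # read the caps back to confirm what the monitors actually stored
  $ ceph auth get client.exportnode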
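The decodecode output Brad quoted comes from the helper script shipped in the Linux source tree (scripts/decodecode). Assuming a kernel source checkout and a text file containing the oops, including its Code: line (the file names here are illustrative), it can be reproduced with something like:

  # save the panic text, then feed it to the script from the top of a kernel tree
  $ dmesg > oops.txt
  $ ./scripts/decodecode < oops.txt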
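To follow Brad's suggestion of working out where the bad %r12 value came from, one approach (a sketch, assuming the matching Ubuntu kernel debug symbols are installed and that vmlinux lives in the usual dbgsym location) is to disassemble the faulting function and map the RIP offset from the trace back to source:

  $ gdb -q /usr/lib/debug/boot/vmlinux-3.13.0-30-generic
  (gdb) disassemble kmem_cache_alloc        # locate the instruction at +0x75
  (gdb) list *(kmem_cache_alloc+0x75)       # map that offset back to the source line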
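Finally, since the images are mapped at boot by the rbdmap service, each mapping should correspond to a line in /etc/ceph/rbdmap. A minimal sketch of such an entry, where the image, client and keyring names are placeholders and only the pool name is taken from this thread:

  # /etc/ceph/rbdmap: <pool>/<image>  <map options>
  hosting_linux_sharedweb/web01  id=exportnode,keyring=/etc/ceph/ceph.client.exportnode.keyring

The id= user is the cephx entity whose caps were changed above, so a problem with that key potentially affects every image mapped this way.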
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com