Re: [ceph-users] help troubleshooting some osd communication problems

2016-04-29 Thread Mike Lovell
i attempted to grab some logs from the two osds in question with debug_ms and debug_osd at 20. i have looked through them a little bit but digging through the logs at this verbosity is something i don't have much experience with. hopefully someone on the list can help make sense of it. the logs ar
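For anyone trying to reproduce this, the debug levels can usually be raised at runtime without restarting the daemons; a minimal sketch (osd id taken from the thread below, log path is the default):
    # raise messenger and osd debugging on a running osd
    ceph tell osd.41 injectargs '--debug_ms 20 --debug_osd 20'
    # or locally on the osd host via the admin socket
    ceph daemon osd.41 config set debug_ms 20
    # the output lands in the usual log file
    less /var/log/ceph/ceph-osd.41.log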

Re: [ceph-users] Hammer broke after adding 3rd osd server

2016-04-29 Thread Andrei Mikhailovsky
A quick update on the case. I think I've isolated the problem. I've spent a while checking the osd servers for differences in configuration. I've noticed two distinctions. The first one being the sysctl.conf tuning options for ipoib, which were not present on the new server. The second one is t
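A quick way to catch this kind of drift is to diff the live sysctl state between an old server and the new one; a sketch (hostnames hypothetical):
    # dump the running settings on each host, then compare
    ssh osd-old 'sysctl -a 2>/dev/null | sort' > /tmp/osd-old.sysctl
    ssh osd-new 'sysctl -a 2>/dev/null | sort' > /tmp/osd-new.sysctl
    diff -u /tmp/osd-old.sysctl /tmp/osd-new.sysctl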

Re: [ceph-users] NO mon start after Jewel Upgrade using systemctl

2016-04-29 Thread Iban Cabrillo
Hi Karsten, It works! [root@cephmon03 ~]# systemctl enable ceph-mon@cephmon03 Created symlink from /etc/systemd/system/ceph-mon.target.wants/ceph-mon@cephmon03.service to /usr/lib/systemd/system/ceph-mon@.service. ceph 731 1 0 12:12 ? 00:00:00 /usr/bin/ceph-mon -f --cluster c
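For reference, the full sequence on a systemd host after the Jewel upgrade is roughly the following (hostname from this thread; a sketch, not an official procedure):
    systemctl daemon-reload
    systemctl enable ceph-mon@cephmon03   # creates the symlink shown above
    systemctl start ceph-mon@cephmon03
    systemctl status ceph-mon@cephmon03   # confirm the ceph-mon process is running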

Re: [ceph-users] help troubleshooting some osd communication problems

2016-04-29 Thread Alexey Sheplyakov
Hi, > i also wonder if just taking 148 out of the cluster (probably just marking it out) would help As far as I understand this can only harm your data. The acting set of PG 17.73 is [41, 148], so after stopping/taking out OSD 148 OSD 41 will store the only copy of objects in PG 17.73 (so it wo
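The acting set mentioned here can be checked directly; a quick sketch using the PG id from this thread:
    # show the up and acting sets for the pg in question
    ceph pg map 17.73
    # full detail, including recovery state
    ceph pg 17.73 query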

Re: [ceph-users] Data still in OSD directories after removing

2016-04-29 Thread Andrey Korolyov
On Thu, May 22, 2014 at 12:56 PM, Olivier Bonvalet wrote: > > Le mercredi 21 mai 2014 à 18:20 -0700, Josh Durgin a écrit : >> On 05/21/2014 03:03 PM, Olivier Bonvalet wrote: >> > Le mercredi 21 mai 2014 à 08:20 -0700, Sage Weil a écrit : >> >> You're certain that that is the correct prefix for the

Re: [ceph-users] help troubleshooting some osd communication problems

2016-04-29 Thread Mike Lovell
On Fri, Apr 29, 2016 at 5:54 AM, Alexey Sheplyakov wrote: > Hi, > > > i also wonder if just taking 148 out of the cluster (probably just > marking it out) would help > > As far as I understand this can only harm your data. The acting set of PG > 17.73 is [41, 148], > so after stopping/taking out

[ceph-users] Backfilling caused RBD corruption on Hammer?

2016-04-29 Thread Robert Sander
Hi, yesterday we ran into a strange bug / mysterious issue with a Hammer 0.94.5 storage cluster. We added OSDs and the cluster started the backfilling. Suddenly one of the running VMs complained that it lost a partition in a 2TB RBD. After resetting the VM it could not boot any more as the RBD h

[ceph-users] workqueue

2016-04-29 Thread Dyweni - Ceph-Users
Hi, I'd like to watch and monitor Ceph's progress as it asynchronously removes the objects belonging to a snapshot that I just deleted. Is there a way to monitor the workqueue? Is there a better way to do this? And determine when the snapshot has been completely deleted? Thanks, Dyweni
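As far as I know there is no dedicated command for this, but the OSD perf counters expose snapshot-trimming activity; a sketch (counter names vary between releases, hence the loose grep):
    # inspect snap-trim related counters on one osd via the admin socket
    ceph daemon osd.0 perf dump | grep -i snap
    # on some releases the per-pg snap trim queue also shows up in pg dumps
    ceph pg dump | head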

Re: [ceph-users] Backfilling caused RBD corruption on Hammer?

2016-04-29 Thread Mike Lovell
are the new osds running 0.94.5 or did they get the latest .6 packages? are you also using cache tiering? we ran into a problem with individual rbd objects getting corrupted when using 0.94.6 with a cache tier and min_read_recency_for_promote was > 1. our only solution to corruption that happened
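For anyone hitting the same thing, the setting in question is a property of the cache pool; a sketch (pool name hypothetical):
    # check the current value on the cache pool
    ceph osd pool get cache-pool min_read_recency_for_promote
    # keep it at 1 to avoid the 0.94.6 promotion bug described above
    ceph osd pool set cache-pool min_read_recency_for_promote 1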

Re: [ceph-users] Backfilling caused RBD corruption on Hammer?

2016-04-29 Thread Robert Sander
Hi, all OSDs are running 0.94.5 as the new ones were added to the existing servers. No cache tiering is involved. We observed many "slow request" warnings during the backfill. As the backfilling with the full weight of the new OSDs would have run for more than 28h and no VM was usable we re-we
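The usual way to avoid a multi-day full-weight backfill is to bring new OSDs in at a low crush weight and raise it in steps; a sketch (osd id and weights hypothetical):
    # start small, then ramp up once each round of backfill completes
    ceph osd crush reweight osd.36 0.2
    ceph osd crush reweight osd.36 0.5
    ceph osd crush reweight osd.36 1.0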

[ceph-users] Mapping RBD On Ceph Cluster Node

2016-04-29 Thread Edward Huyer
This is more of a "why" than a "can I/should I" question. The Ceph block device quickstart says (if I interpret it correctly) not to use a physical machine as both a Ceph RBD client and a node for hosting OSDs or other Ceph services. Is this interpretation correct? If so, what is the reasoning?

[ceph-users] Optimal OS configuration for running ceph

2016-04-29 Thread Andrei Mikhailovsky
Hello everyone, Please excuse me if this topic has been covered already. I've not managed to find a guide, checklist or even a set of notes on optimising OS level settings/configuration/services for running ceph. One of the main reasons for asking is that I've recently had to troubleshoot a bunch o
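As a starting point, the OS-level knobs most often mentioned for OSD hosts concern file handles, PID limits and swap behaviour; a sketch of commonly cited values (assumptions, not recommendations from this thread):
    # /etc/sysctl.d/90-ceph.conf
    kernel.pid_max = 4194303   # osds spawn a large number of threads
    fs.file-max = 6553600      # plenty of fds for osd sockets and files
    vm.swappiness = 10         # prefer dropping cache over swapping osds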

[ceph-users] OSD Crashes

2016-04-29 Thread Garg, Pankaj
Hi, I had a fully functional Ceph cluster with 3 x86 Nodes and 3 ARM64 nodes, each with 12 HDD Drives and 2 SSD Drives. All these were initially running Hammer, and then were successfully updated to Infernalis (9.2.0). I recently deleted all my OSDs and swapped my drives with new ones on the x86

Re: [ceph-users] OSD Crashes

2016-04-29 Thread Samuel Just
Your fs is throwing an EIO on open. -Sam On Fri, Apr 29, 2016 at 8:54 AM, Garg, Pankaj wrote: > Hi, > > I had a fully functional Ceph cluster with 3 x86 Nodes and 3 ARM64 nodes, > each with 12 HDD Drives and 2SSD Drives. All these were initially running > Hammer, and then were successfully update

Re: [ceph-users] OSD Crashes

2016-04-29 Thread Garg, Pankaj
I can see that, but what would that be symptomatic of? How is it doing that on 6 different systems and on multiple OSDs? -Original Message- From: Samuel Just [mailto:sj...@redhat.com] Sent: Friday, April 29, 2016 8:57 AM To: Garg, Pankaj Cc: ceph-users@lists.ceph.com Subject: Re: [ce

Re: [ceph-users] OSD Crashes

2016-04-29 Thread Somnath Roy
Check the system log and search for the corresponding drive. It should have information on what is failing. Thanks & Regards Somnath -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Garg, Pankaj Sent: Friday, April 29, 2016 8:59 AM To: Samuel Jus
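Concretely, something along these lines (device name hypothetical):
    # look for I/O errors reported by the kernel
    dmesg -T | grep -iE 'error|fail' | grep -i sd
    # check the drive's own health data
    smartctl -a /dev/sdd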

Re: [ceph-users] OSD Crashes

2016-04-29 Thread Samuel Just
You could strace the process to see precisely what ceph-osd is doing to provoke the EIO. -Sam On Fri, Apr 29, 2016 at 9:03 AM, Somnath Roy wrote: > Check system log and search for the corresponding drive. It should have the > information what is failing.. > > Thanks & Regards > Somnath > > -
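A sketch of that, attaching to one running ceph-osd (pid hypothetical); -f follows the daemon's many threads:
    strace -f -e trace=open,openat -p 12345 2>&1 | grep EIO
    # failing calls show up as: open("...") = -1 EIO (Input/output error)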

Re: [ceph-users] OSD Crashes

2016-04-29 Thread Garg, Pankaj
I think the issue is possibly coming from my Journal drives after the upgrade to Infernalis. I have 2 SSDs, which have 6 partitions each for a total of 12 Journals / server. When I create OSDs, I pass the partition names as journals, e.g. ceph-deploy osd prepare x86Ceph7:/dev/sdd:/dev/sdb1. Thi
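One thing worth checking after the Infernalis upgrade is that the journal partitions are still reachable and owned by the ceph user, since the daemons dropped root privileges in 9.2; a sketch:
    # each osd dir has a journal symlink that should point at the right partition
    ls -l /var/lib/ceph/osd/ceph-*/journal
    # infernalis runs as user ceph, so the journal devices need matching ownership
    ls -l /dev/sdb1
    chown ceph:ceph /dev/sdb1   # only if ownership is wrong; udev normally handles it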

Re: [ceph-users] help troubleshooting some osd communication problems

2016-04-29 Thread Mike Lovell
On Fri, Apr 29, 2016 at 9:34 AM, Mike Lovell wrote: > On Fri, Apr 29, 2016 at 5:54 AM, Alexey Sheplyakov < > asheplya...@mirantis.com> wrote: > >> Hi, >> >> > i also wonder if just taking 148 out of the cluster (probably just >> marking it out) would help >> >> As far as I understand this can onl

Re: [ceph-users] CentOS 7 iscsi gateway using lrbd

2016-04-29 Thread Mike Christie
On 04/29/2016 11:44 AM, Ming Lin wrote: > On Tue, Jan 19, 2016 at 1:34 PM, Mike Christie wrote: >> Everyone is right - sort of :) >> >> It is that target_core_rbd module that I made that was rejected >> upstream, along with modifications from SUSE which added persistent >> reservations support. I

Re: [ceph-users] osd problem upgrading from hammer to jewel

2016-04-29 Thread Randy Orr
Hi, I have a little bit of additional information here that might help debug this situation. From the OSD logs: 2016-04-29 14:32:46.886538 7fa4cd004800 0 osd.2 14422 done with init, starting boot process 2016-04-29 14:32:46.886555 7fa4cd004800 1 -- 10.2.0.116:6808/32079 --> 10.2.0.117:6789/0 --
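While an OSD is stuck at this point, its own view of the boot process can be read off the admin socket; a sketch:
    # what the osd itself reports (booting vs active) and its map epochs
    ceph daemon osd.2 status
    # and whether the monitors consider it up/in
    ceph osd dump | grep '^osd.2 '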

[ceph-users] Ceph Jewel 10.2.0 Build Error - ldap dependency related to -j1 and radosgw enabled

2016-04-29 Thread Dyweni - Ceph-Users
Hi, When I compile Ceph Jewel 10.2.0 using 'make -j1' I get the following ldap undefined references: ./.libs/librgw.so: undefined reference to `ldap_get_dn' ./.libs/librgw.so: undefined reference to `ldap_search_s' ./.libs/librgw.so: undefined reference to `ldap_memfree' ./.libs/librgw.so: und
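A quick check that this really is a missing library on the link line, plus a possible (unvetted) workaround:
    # confirm the unresolved symbols are the openldap client functions
    nm -D ./.libs/librgw.so | grep -i ldap
    # retry the link with the ldap client libraries passed explicitly
    make LDFLAGS="-lldap -llber" -j1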

Re: [ceph-users] Monitor not starting: Corruption: 12 missing files

2016-04-29 Thread Daniel.Balsiger
Dear Joao, dear ceph users, Thanks for your fast reply. I couldn't get to my ceph cluster until now, as I was visiting the OpenStack summit in Austin, TX. I just fixed the monitor by removing and re-adding it; it is up and running again. I still wonder though
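For the archives, the remove/re-add procedure is roughly the following (mon name and keyring path hypothetical; a sketch of the documented steps, not verified against this cluster):
    ceph mon remove mon01                # drop it from the monmap
    ceph mon getmap -o /tmp/monmap       # fetch the current monmap
    ceph-mon -i mon01 --mkfs --monmap /tmp/monmap --keyring /path/to/mon.keyring
    systemctl start ceph-mon@mon01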

Re: [ceph-users] Mapping RBD On Ceph Cluster Node

2016-04-29 Thread Tu Holmes
It can be done. However, a node hosting OSDs already has enough work to do, and you will run into performance issues. It has been done and can be done, but you are better off not doing so. //Tu _ From: Edward Huyer Sent: Friday, April 29, 2016 11:30 AM Subject

Re: [ceph-users] hadoop on cephfs

2016-04-29 Thread Bill Sharer
Actually this guy is already a fan of Hadoop. I was just wondering whether anyone has been playing around with it on top of cephfs lately. It seems like the last round of papers was from around Cuttlefish. On 04/28/2016 06:21 AM, Oliver Dzombic wrote: Hi, bad idea :-) Its of course nice a

Re: [ceph-users] Mapping RBD On Ceph Cluster Node

2016-04-29 Thread Gregory Farnum
On Friday, April 29, 2016, Edward Huyer wrote: > This is more of a "why" than a "can I/should I" question. > > The Ceph block device quickstart says (if I interpret it correctly) not to > use a physical machine as both a Ceph RBD client and a node for hosting > OSDs or other Ceph services. > > Is