Re: [ceph-users] Random health OSD_SCRUB_ERRORS on various OSDs, after pg repair back to HEALTH_OK

2018-03-05 Thread Marco Baldini - H.S. Amiata
Hi, I monitor dmesg on each of the 3 nodes, no hardware issue reported. And the problem happens with various OSDs in different nodes, so for me it is clear it's not a hardware problem. Thanks for the reply. On 05/03/2018 21:45, Vladimir Prokofev wrote: > always solved by ceph pg re

Re: [ceph-users] When all Mons are down, does existing RBD volume continue to work

2018-03-05 Thread Gregory Farnum
On Sun, Mar 4, 2018 at 12:02 AM Mayank Kumar wrote: > Ceph Users, > > My question is: if all mons are down (I know it's a terrible situation to > be in), does an existing RBD volume which is mapped to a host and being > used (read/written to) continue to work? > > I understand that it won't get notifica

Re: [ceph-users] Slow requests troubleshooting in Luminous - details missing

2018-03-05 Thread Brad Hubbard
On Fri, Mar 2, 2018 at 3:54 PM, Alex Gorbachev wrote: > On Thu, Mar 1, 2018 at 10:57 PM, David Turner wrote: >> Blocked requests and slow requests are synonyms in ceph. They are 2 names >> for the exact same thing. >> >> >> On Thu, Mar 1, 2018, 10:21 PM Alex Gorbachev >> wrote: >>> >>> On Thu,
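
For reference, a minimal sketch of how blocked/slow requests are usually inspected on a Luminous-era cluster (osd.12 is a placeholder; the ceph daemon commands have to be run on the node hosting that OSD):

  # identify which OSDs are reporting blocked/slow requests
  ceph health detail

  # on the node hosting the OSD, inspect its admin socket
  ceph daemon osd.12 dump_ops_in_flight      # operations currently in progress
  ceph daemon osd.12 dump_blocked_ops        # operations blocked past the threshold
  ceph daemon osd.12 dump_historic_ops       # recently completed slow operations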

Re: [ceph-users] Delete a Pool - how hard should it be?

2018-03-05 Thread Alex Gorbachev
On Mon, Mar 5, 2018 at 2:17 PM, Gregory Farnum wrote: > On Thu, Mar 1, 2018 at 9:21 AM Max Cuttins wrote: >> >> I think this is a good question for everybody: how hard should it be to delete a >> pool? >> >> We ask to type the pool name twice. >> We ask to add "--yes-i-really-really-mean-it" >> We ask to ad

[ceph-users] XFS Metadata corruption while activating OSD

2018-03-05 Thread 赵赵贺东
Hello ceph-users, it is a really, really, REALLY tough problem for our team. We have investigated the problem for a long time and tried a lot of things, but we can't solve it; even the concrete cause of the problem is still unclear to us! So, anyone giving any solution/suggestion/opinion whatever wil

Re: [ceph-users] Ceph SNMP hooks?

2018-03-05 Thread Andre Goree
On 2018/02/28 3:32 pm, David Turner wrote: You could probably write an SNMP module for the new ceph-mgr daemon. What do you want to use to monitor Ceph that requires SNMP? On Wed, Feb 28, 2018 at 1:13 PM Andre Goree wrote: I've looked and haven't found much information besides custom 3rd-pa

Re: [ceph-users] Random health OSD_SCRUB_ERRORS on various OSDs, after pg repair back to HEALTH_OK

2018-03-05 Thread Vladimir Prokofev
> always solved by ceph pg repair That doesn't necessarily mean that there's no hardware issue. In my case repair also worked fine and returned the cluster to the OK state every time, but in time the faulty disk would fail another scrub operation, and this repeated multiple times before we replaced that disk. One

Re: [ceph-users] rbd mirror mechanics

2018-03-05 Thread Jason Dillaman
On Mon, Mar 5, 2018 at 2:07 PM, Brady Deetz wrote: > While preparing a risk assessment for a DR solution involving RBD, I'm > increasingly unsure of a few things. > > 1) Does the failover from primary to secondary cluster occur automatically > in the case that the primary backing rados pool become
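
For orientation, promotion and demotion in rbd-mirror are driven from the rbd CLI; a rough sketch, where site-a/site-b and rbd/vm-disk-1 are placeholder cluster and image names:

  # check replication health from either cluster
  rbd mirror pool status rbd --verbose

  # planned failover: demote on the primary, then promote on the secondary
  rbd --cluster site-a mirror image demote rbd/vm-disk-1
  rbd --cluster site-b mirror image promote rbd/vm-disk-1

  # unplanned failover (primary unreachable): force-promote on the secondary,
  # then resync the old primary once it is reachable again
  rbd --cluster site-b mirror image promote --force rbd/vm-disk-1
  rbd --cluster site-a mirror image resync rbd/vm-disk-1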

[ceph-users] Cache tier

2018-03-05 Thread Budai Laszlo
Dear all, I have some questions about cache tiering in Ceph: 1. Can someone share experiences with cache tiering? What are the sensitive things to pay attention to regarding the cache tier? Can one use the same SSD for both cache and 2. Is cache tiering supported with BlueStore? Any advice for usin
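
For orientation, the basic wiring of a cache tier looks roughly like this (cold-storage, cache and the sizes are placeholders; whether a cache tier is advisable at all is exactly what this thread is asking):

  # create the cache pool on the fast devices, then attach it to the base pool
  ceph osd pool create cache 128
  ceph osd tier add cold-storage cache
  ceph osd tier cache-mode cache writeback
  ceph osd tier set-overlay cold-storage cache

  # minimum tuning so the tier can track and flush objects
  ceph osd pool set cache hit_set_type bloom
  ceph osd pool set cache target_max_bytes 274877906944    # 256 GiB, example only
  ceph osd pool set cache cache_target_dirty_ratio 0.4
  ceph osd pool set cache cache_target_full_ratio 0.8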

Re: [ceph-users] Delete a Pool - how hard should it be?

2018-03-05 Thread Gregory Farnum
On Thu, Mar 1, 2018 at 9:21 AM Max Cuttins wrote: > I think this is a good question for everybody: how hard should it be to delete a > pool? > > We ask to type the pool name twice. > We ask to add "--yes-i-really-really-mean-it" > We ask to enable the ability on the mons to delete the pool (and remove this ability > A
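
For reference, the steps being debated correspond to roughly the following on a Luminous cluster (testpool is a placeholder):

  # pool deletion is refused unless the mons explicitly allow it
  ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'

  # the pool name has to be given twice, plus the confirmation flag
  ceph osd pool rm testpool testpool --yes-i-really-really-mean-it

  # turn the guard back on afterwards
  ceph tell mon.\* injectargs '--mon-allow-pool-delete=false'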

[ceph-users] rbd mirror mechanics

2018-03-05 Thread Brady Deetz
While preparing a risk assessment for a DR solution involving RBD, I'm increasingly unsure of a few things. 1) Does the failover from primary to secondary cluster occur automatically in the case that the primary backing rados pool becomes inaccessible? 1.a) If the primary backing rados pool is un

Re: [ceph-users] Deep Scrub distribution

2018-03-05 Thread Gregory Farnum
On Mon, Mar 5, 2018 at 9:56 AM Jonathan D. Proulx wrote: > Hi All, > > I've recently noticed my deep scrubs are EXTREMELY poorly > distributed. They are starting within the 18->06 local time start/stop > window but are not distributed over enough days or well distributed > over the range of days t

[ceph-users] Deep Scrub distribution

2018-03-05 Thread Jonathan D. Proulx
Hi All, I've recently noticed my deep scrubs are EXTREMELY poorly distributed. They are starting within the 18->06 local time start/stop window but are not distributed over enough days or well distributed over the range of days they have. root@ceph-mon0:~# for date in `ceph pg dump | awk '/active/
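
A rough way to see the distribution at a glance (the JSON field layout differs between releases, jq is assumed to be installed, and osd.0 is a placeholder):

  # tally PGs by the date of their last deep scrub
  # (on some releases the array lives under .pg_map.pg_stats instead)
  ceph pg dump --format json 2>/dev/null \
    | jq -r '.pg_stats[].last_deep_scrub_stamp' \
    | cut -d' ' -f1 | sort | uniq -c

  # settings that control when and how often deep scrubs run
  ceph daemon osd.0 config show \
    | grep -E 'osd_scrub_(begin|end)_hour|osd_deep_scrub_interval|osd_scrub_interval_randomize'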

Re: [ceph-users] Ceph newbie(?) issues

2018-03-05 Thread Daniel K
I had a similar problem with some relatively underpowered servers (2x E5-2603, 6 cores at 1.7GHz, no HT, 12-14 2TB OSDs per server, 32GB RAM). There was a process on a couple of the servers that would hang and chew up all available CPU. When that happened, I started getting scrub errors on those servers.

Re: [ceph-users] Random health OSD_SCRUB_ERRORS on various OSDs, after pg repair back to HEALTH_OK

2018-03-05 Thread Marco Baldini - H.S. Amiata
Hi, and thanks for the reply. The OSDs are all healthy; in fact after a ceph pg repair the ceph health is back to OK and in the OSD log I see repair ok, 0 fixed. The SMART data of the 3 OSDs seems fine. *OSD.5* # ceph-disk list | grep osd.5  /dev/sdd1 ceph data, active, cluster ceph, osd.5, block

Re: [ceph-users] Ceph newbie(?) issues

2018-03-05 Thread Ronny Aasen
On 5 March 2018 14:45, Jan Marquardt wrote: On 05.03.18 at 13:13, Ronny Aasen wrote: I had some similar issues when I started my proof of concept, especially the snapshot deletion I remember well. The rule of thumb for filestore, which I assume you are running, is 1GB of RAM per TB of OSD, so with 8

Re: [ceph-users] Random health OSD_SCRUB_ERRORS on various OSDs, after pg repair back to HEALTH_OK

2018-03-05 Thread Marco Baldini - H.S. Amiata
Hi, I just posted to the Ceph tracker with my logs and my issue. Let's hope this will be fixed. Thanks. On 05/03/2018 13:36, Paul Emmerich wrote: Hi, yeah, the cluster that I'm seeing this on also has only one host that reports that specific checksum. Two other hosts only report the same

Re: [ceph-users] Ceph newbie(?) issues

2018-03-05 Thread Jan Marquardt
On 05.03.18 at 13:13, Ronny Aasen wrote: > I had some similar issues when I started my proof of concept, especially > the snapshot deletion I remember well. > > The rule of thumb for filestore, which I assume you are running, is 1GB of RAM > per TB of OSD, so with 8 x 4TB OSDs you are looking at 32GB o

Re: [ceph-users] Random health OSD_SCRUB_ERRORS on various OSDs, after pg repair back to HEALTH_OK

2018-03-05 Thread Vladimir Prokofev
> candidate had a read error speaks for itself - while scrubbing it couldn't read data. I had a similar issue, and it was just an OSD dying - errors and reallocated sectors in SMART; I just replaced the disk. But in your case it seems that the errors are on different OSDs? Are your OSDs all healthy? You can use
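
The usual checks look something like the following sketch (osd.5 and /dev/sdd are placeholders; the exact metadata field names vary by release and backend):

  # map the OSD to its backing device
  ceph osd metadata 5 | grep -i dev

  # SMART health and reallocated/pending sector counters
  smartctl -a /dev/sdd | grep -iE 'overall-health|reallocated|pending|uncorrect'

  # kernel-level I/O errors
  dmesg -T | grep -iE 'sdd|i/o error'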

Re: [ceph-users] Random health OSD_SCRUB_ERRORS on various OSDs, after pg repair back to HEALTH_OK

2018-03-05 Thread Paul Emmerich
Hi, yeah, the cluster that I'm seeing this on also has only one host that reports that specific checksum. Two other hosts only report the same error that you are seeing. Could you post to the tracker issue that you are also seeing this? Paul 2018-03-05 12:21 GMT+01:00 Marco Baldini - H.S. Amiat

Re: [ceph-users] Ceph newbie(?) issues

2018-03-05 Thread Ronny Aasen
On 5 March 2018 11:21, Jan Marquardt wrote: Hi, we are relatively new to Ceph and are observing some issues, where I'd like to know how likely they are to happen when operating a Ceph cluster. Currently our setup consists of three servers which are acting as OSDs and MONs. Each server has two

Re: [ceph-users] All pools full after one OSD got OSD_FULL state

2018-03-05 Thread Vladimir Prokofev
I'll pitch in my personal experience. When a single OSD in a pool becomes full (95% used), all client IO writes to this pool must stop, even if other OSDs are almost empty. This is done for the purpose of data integrity. [1] To avoid this you need to balance your failure domains. For example, ass
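
A sketch of the commands typically used to spot and correct this kind of imbalance (the ratio values are examples only; raising them is a stopgap, not a fix):

  # per-OSD utilisation, grouped by the CRUSH tree / failure domains
  ceph osd df tree

  # preview, then apply, a utilisation-based reweight
  ceph osd test-reweight-by-utilization
  ceph osd reweight-by-utilization

  # Luminous and later: the nearfull/full thresholds live in the OSDMap
  ceph osd set-nearfull-ratio 0.85
  ceph osd set-full-ratio 0.95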

Re: [ceph-users] RocksDB configuration

2018-03-05 Thread Oliver Freyermuth
After going through: https://de.slideshare.net/sageweil1/bluestore-a-new-storage-backend-for-ceph-one-year-in I can already answer some of my own questions - notably, compaction should happen slowly, and there is high write amplification for SSDs, which could explain why our SSDs in our MDS reach

Re: [ceph-users] All pools full after one OSD got OSD_FULL state

2018-03-05 Thread Jakub Jaszewski
One full OSD has caused all pools to become full. Can anyone help me understand this? During ongoing PG backfilling I see that the MAX AVAIL values are changing while the USED values stay constant. GLOBAL: SIZE AVAIL RAW USED %RAW USED 425T 145T 279T 65.70 POOLS:

Re: [ceph-users] Ceph iSCSI is a prank?

2018-03-05 Thread Robert Sander
On 05.03.2018 00:26, Adrian Saul wrote: > We are using Ceph+RBD+NFS under pacemaker for VMware. We are doing > iSCSI using SCST but have not used it against VMware, just Solaris and > Hyper-V. > > It generally works and performs well enough – the biggest issues are the > clustering for

Re: [ceph-users] Random health OSD_SCRUB_ERRORS on various OSDs, after pg repair back to HEALTH_OK

2018-03-05 Thread Marco Baldini - H.S. Amiata
Hi, after some days with debug_osd 5/5 I found [ERR] entries on different days, in different PGs, on different OSDs and different hosts. This is what I get in the OSD logs: *OSD.5 (host 3)* 2018-03-01 20:30:02.702269 7fdf4d515700  2 osd.5 pg_epoch: 16486 pg[9.1c( v 16486'51798 (16431'50251,16486'51798] local-
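
A sketch of how such inconsistencies are usually narrowed down before repairing (9.1c is taken from the log excerpt above; the pool name is a placeholder):

  # which PGs are inconsistent, and which object/shard is affected
  ceph health detail
  rados list-inconsistent-pg <pool-name>
  rados list-inconsistent-obj 9.1c --format=json-pretty

  # repair once the faulty shard/OSD is understood
  ceph pg repair 9.1c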

[ceph-users] RocksDB configuration

2018-03-05 Thread Oliver Freyermuth
Dear Cephalopodians, in the benchmarks done with many files, I noted that our bottleneck was mainly the MDS SSD performance, and notably, after deletion of the many files in CephFS, the RocksDB stayed large and did not shrink. Recreating an OSD from scratch and backfilling it, however,
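
A sketch of how the DB usage on a BlueStore OSD can be watched while experimenting (osd.0 is a placeholder; jq is assumed to be available, and counter names may differ slightly by release):

  # BlueFS/RocksDB space usage on a BlueStore OSD
  ceph daemon osd.0 perf dump \
    | jq '.bluefs | {db_total_bytes, db_used_bytes, slow_used_bytes}'

  # the RocksDB option string currently in effect
  ceph daemon osd.0 config get bluestore_rocksdb_options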

Re: [ceph-users] BlueStore questions

2018-03-05 Thread Gustavo Varela
There is a presentation by Sage, slide 16: https://es.slideshare.net/sageweil1/bluestore-a-new-storage-backend-for-ceph-one-year-in You can probably try that as an initial guide; hope it helps. gus From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Frank Ritchie Sent: do

[ceph-users] Ceph newbie(?) issues

2018-03-05 Thread Jan Marquardt
Hi, we are relatively new to Ceph and are observing some issues, where I'd like to know how likely they are to happen when operating a Ceph cluster. Currently our setup consists of three servers which are acting as OSDs and MONs. Each server has two Intel Xeon L5420 (yes, I know, it's not state o