[ceph-users] pg_autoscaler is not working

2019-11-26 Thread Thomas Schneider
Hi, I enabled pg_autoscaler on a specific pool ssd. I failed to increase pg_num / pgp_num on pools ssd to 1024: root@ld3955:~# ceph osd pool autoscale-status  POOL   SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  cephfs_metadata  395.8

[ceph-users] Cannot increase pg_num / pgp_num on a pool

2019-11-24 Thread Thomas
Hi, I failed to increase pg_num / pgp_num on pools ssd to 1024: root@ld3976:~# ceph osd pool get ssd pg_num pg_num: 512 root@ld3976:~# ceph osd pool get ssd pgp_num pgp_num: 512 root@ld3976:~# ceph osd pool set ssd pg_num 1024 root@ld3976:~# ceph osd pool get ssd pg_num pg_num: 512 When I check

Re: [ceph-users] Command ceph osd df hangs

2019-11-21 Thread Thomas Schneider
I had this when testing pg_autoscaler, after some time every command > would hang. Restarting the MGR helped for a short period of time, then > I disabled pg_autoscaler. This is an upgraded cluster, currently on > Nautilus. > > Regards, > Eugen > > > Zitat von Thomas

[ceph-users] Command ceph osd df hangs

2019-11-21 Thread Thomas Schneider
Hi, command ceph osd df does not return any output. Based on the strace output there's a timeout. [...] mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f53006b9000 brk(0x55c2579b6000) = 0x55c2579b6000 brk(0x55c2579d7000) = 0x55

Re: [ceph-users] Cannot enable pg_autoscale_mode

2019-11-21 Thread Thomas Schneider
Update: Issue is solved. The output of "ceph osd dump" showed that the required setting was incorrect, i.e. require_osd_release was still luminous. After executing ceph osd require-osd-release nautilus I can enable pg_autoscale_mode on any pool. THX Am 21.11.2019 um 13:51 schrieb Paul Emmerich: > "ceph o
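For reference, the fix described above boils down to a short command sequence along these lines (the pool name ssd is taken from this thread; adjust to your own cluster, and only raise the release once every daemon really runs Nautilus):

ceph osd dump | grep require_osd_release    # should report nautilus
ceph osd require-osd-release nautilus       # raise it if it still says luminous
ceph osd pool set ssd pg_autoscale_mode on  # now accepted instead of EINVAL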

Re: [ceph-users] Cannot enable pg_autoscale_mode

2019-11-21 Thread Thomas Schneider
Looks like the flag is not correct. root@ld3955:~# ceph osd dump | grep nautilus root@ld3955:~# ceph osd dump | grep require require_min_compat_client luminous require_osd_release luminous Am 21.11.2019 um 13:51 schrieb Paul Emmerich: > "ceph osd dump" shows you if the flag is set > > > Paul >

Re: [ceph-users] Cannot enable pg_autoscale_mode

2019-11-21 Thread Thomas Schneider
Hello Paul, I didn't skip this step. Actually I'm sure that everything on the cluster is on Nautilus, because I had issues with SLES 12SP2 clients that failed to connect due to outdated client tools that could not connect to Nautilus. Would it make sense to execute ceph osd require-osd-release nautil

[ceph-users] Cannot enable pg_autoscale_mode

2019-11-21 Thread Thomas Schneider
Hi, I try to enable pg_autoscale_mode on a specific pool of my cluster, however this returns an error. root@ld3955:~# ceph osd pool set ssd pg_autoscale_mode on Error EINVAL: must set require_osd_release to nautilus or later before setting pg_autoscale_mode The error message is clear, but my clus

[ceph-users] Error in MGR log: auth: could not find secret_id

2019-11-20 Thread Thomas Schneider
Hi, my Ceph cluster is in an unhealthy state and busy with recovery. I'm observing the MGR log, and it is showing this error message regularly: 2019-11-20 09:51:45.211 7f7205581700  0 auth: could not find secret_id=4193 2019-11-20 09:51:45.211 7f7205581700  0 cephx: verify_authorizer could not get

Re: [ceph-users] cephfs performance issue MDSs report slow requests and osd memory usage

2019-09-24 Thread Thomas
Hi, I'm experiencing the same issue with this setting in ceph.conf:     osd op queue = wpq     osd op queue cut off = high Furthermore I cannot read any old data in the relevant pool that is serving CephFS. However, I can write new data and read this new data. Regards Thoma
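For anyone searching later, the two settings quoted above live in the [osd] section of ceph.conf and, as far as I know, are only picked up when the OSDs (re)start; this is just a sketch of the snippet, not a recommendation for every workload:

[osd]
osd op queue = wpq
osd op queue cut off = high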

Re: [ceph-users] Help understanding EC object reads

2019-09-16 Thread Thomas Byrne - UKRI STFC
> Sent: 09 September 2019 23:25 > To: Byrne, Thomas (STFC,RAL,SC) > Cc: ceph-users > Subject: Re: [ceph-users] Help understanding EC object reads > > On Thu, Aug 29, 2019 at 4:57 AM Thomas Byrne - UKRI STFC > wrote: > > > > Hi all, > > > > I’m investiga

[ceph-users] OSD Down After Reboot

2019-08-29 Thread Thomas Sumpter
Hi Folks, I have found similar reports of this problem in the past but can't seem to find any solution to it. We have a Ceph filesystem running Mimic version 13.2.5. OSDs are running on AWS EC2 instances with CentOS 7. The OSD disk is an AWS NVMe device. The problem is, sometimes when rebooting an OSD in

[ceph-users] Help understanding EC object reads

2019-08-29 Thread Thomas Byrne - UKRI STFC
Hi all, I'm investigating an issue with our (non-Ceph) caching layers of our large EC cluster. It seems to be turning users requests for whole objects into lots of small byte range requests reaching the OSDs, but I'm not sure how inefficient this behaviour is in reality. My limited understandi

[ceph-users] No files in snapshot

2019-08-26 Thread Thomas Schneider
Hi, I'm running Debian 10 with btrfs-progs=5.2.1. Creating snapshots with snapper=0.8.2 works w/o errors. However, I run into an issue and need to restore various files. I thought that I could simply take the files from a snapshot created before. However, the files required don't exist in any

Re: [ceph-users] Scrub start-time and end-time

2019-08-14 Thread Thomas Byrne - UKRI STFC
Hi Torben, > Is it allowed to have the scrub period cross midnight ? eg have start time at > 22:00 and end time 07:00 next morning. Yes, I think that's the way it is mostly used, primarily to reduce the scrub impact during waking/working hours. > I assume that if you only configure the on
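A minimal sketch of such a window crossing midnight, assuming the usual begin/end hour options (hours taken from Torben's example, local time on the OSD hosts):

[osd]
osd scrub begin hour = 22
osd scrub end hour = 7

The same values can usually be injected at runtime with ceph tell osd.* injectargs, but putting them in ceph.conf keeps them across restarts.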

Re: [ceph-users] RGW 4 MiB objects

2019-08-01 Thread Thomas Bennett
Jul 31, 2019 at 2:36 PM Aleksey Gutikov wrote: > Hi Thomas, > > We did some investigations some time ago and came up with several rules on how to > configure rgw and osd for big files stored on an erasure-coded pool. > Hope it will be useful. > And if I have any mistakes, please let me kno

Re: [ceph-users] How to deal with slow requests related to OSD bugs

2019-08-01 Thread Thomas Bennett
situations? An OSD > blocking queries in an RBD scenario is a big deal, as plenty of VMs will > have disk timeouts which can lead to the VM just panicking. > > Thanks! > > Xavier

Re: [ceph-users] RGW configuration parameters

2019-07-30 Thread Thomas Bennett
Hi Casey, Thanks for your reply. Just to make sure I understand correctly- would that only be if the S3 object size for the put/get is multiples of your rgw_max_chunk_size? Kind regards, Tom On Tue, 30 Jul 2019 at 16:57, Casey Bodley wrote: > Hi Thomas, > > I see that you're

[ceph-users] RGW configuration parameters

2019-07-30 Thread Thomas Bennett
Does anyone know what these parameters are for. I'm not 100% sure I understand what a window is in context of rgw objects: - rgw_get_obj_window_size - rgw_put_obj_min_window_size The code points to throttling I/O. But some more info would be useful. Kind regards, Tom __

[ceph-users] RGW 4 MiB objects

2019-07-30 Thread Thomas Bennett
Hi, Does anyone out there use bigger than default values for rgw_max_chunk_size and rgw_obj_stripe_size? I'm planning to set rgw_max_chunk_size and rgw_obj_stripe_size to 20MiB, as it suits our use case and from our testing we can't see any obvious reason not to. Is there some convincing experi
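If it helps, the change being discussed is roughly the following ceph.conf snippet (20 MiB expressed in bytes; the section name depends on how your RGW instances are named, and the gateways need a restart to pick it up):

[client.rgw.<instance>]
rgw max chunk size = 20971520
rgw obj stripe size = 20971520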

Re: [ceph-users] How to add 100 new OSDs...

2019-07-25 Thread Thomas Byrne - UKRI STFC
As a counterpoint, adding large amounts of new hardware in gradually (or more specifically in a few steps) has a few benefits IMO. - Being able to pause the operation and confirm the new hardware (and cluster) is operating as expected. You can identify problems with hardware with OSDs at 10% we
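A hedged sketch of the "weight up gradually" idea mentioned above, using a hypothetical osd.123 that should end up at CRUSH weight 5.458:

ceph osd crush reweight osd.123 0.5     # bring it in at roughly 10% of its final weight
# ... wait for backfill to settle, check ceph -s ...
ceph osd crush reweight osd.123 5.458   # step up (in one or several increments) to the final weight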

Re: [ceph-users] OSDs taking a long time to boot due to 'clear_temp_objects', even with fresh PGs

2019-06-25 Thread Thomas Byrne - UKRI STFC
Gregory Farnum Sent: 24 June 2019 17:30 To: Byrne, Thomas (STFC,RAL,SC) Cc: ceph-users Subject: Re: [ceph-users] OSDs taking a long time to boot due to 'clear_temp_objects', even with fresh PGs On Mon, Jun 24, 2019 at 9:06 AM Thomas Byrne - UKRI STFC wrote: > > Hi all, > &g

[ceph-users] OSDs taking a long time to boot due to 'clear_temp_objects', even with fresh PGs

2019-06-24 Thread Thomas Byrne - UKRI STFC
Hi all, Some bluestore OSDs in our Luminous test cluster have started becoming unresponsive and booting very slowly. These OSDs have been used for stress testing for hardware destined for our production cluster, so have had a number of pools on them with many, many objects in the past. All

[ceph-users] NFS-Ganesha Mounts as a Read-Only Filesystem

2019-04-06 Thread thomas
ime,sync 172.16.32.15:/ /mnt/cephfs I have tried stripping much of the config and altering mount options, but so far completely unable to decipher the cause. Also seems I’m not the only one who has been caught on this: https://www.spinics.net/lists/ceph-devel/msg41201.html Thanks in adv

[ceph-users] rbd unmap fails with error: rbd: sysfs write failed rbd: unmap failed: (16) Device or resource busy

2019-02-27 Thread Thomas
Hi, I have noticed an error when writing to a mapped RBD. Therefore I unmounted the block device. Then I tried to unmap it w/o success: ld2110:~ # rbd unmap /dev/rbd0 rbd: sysfs write failed rbd: unmap failed: (16) Device or resource busy The same block device is mapped on another client and there
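In case it is useful to others hitting the same error, a rough checklist before forcing anything (pool/image names are placeholders):

rbd showmapped                  # confirm which image /dev/rbd0 belongs to
grep rbd0 /proc/mounts          # make sure it is really unmounted locally
rbd status <pool>/<image>       # list watchers, i.e. other clients that still have it open
rbd unmap -o force /dev/rbd0    # last resort once no I/O is in flight

A mapping on another client shows up as a watcher; the (16) EBUSY on unmap usually means something on the local host still holds /dev/rbd0 open.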

Re: [ceph-users] Modify ceph.mon network required

2019-01-25 Thread Thomas
Thanks. This procedure works very well. Am 25.01.2019 um 14:24 schrieb Janne Johansson: > Den fre 25 jan. 2019 kl 09:52 skrev cmonty14 <74cmo...@gmail.com>: >> Hi, >> I have identified a major issue with my cluster setup consisting of 3 nodes: >> all monitors are connected to cluster network. >

Re: [ceph-users] [Solved]reating a block device user with restricted access to image

2019-01-25 Thread Thomas
The error was caused by a copy & paste failure from Eugen's instructions, which are 100% correct! Thanks for your great support!!! Maybe another question related to this topic: If I write a backup into an RBD, will Ceph use a single IO stream or

Re: [ceph-users] Creating a block device user with restricted access to image

2019-01-25 Thread Thomas
mitted rbd: error opening image gbs: (1) Operation not permitted In some cases useful info is found in syslog - try "dmesg | tail". rbd: map failed: (1) Operation not permitted Regards Thomas Am 25.01.2019 um 12:31 schrieb Eugen Block: > You can check all objects of that pool to see i

Re: [ceph-users] Creating a block device user with restricted access to image

2019-01-25 Thread Thomas
s found in syslog - try "dmesg | tail". rbd: map failed: (1) Operation not permitted Regards Thomas Am 25.01.2019 um 11:52 schrieb Eugen Block: > osd 'allow rwx > pool object_prefix rbd_data.2b36cf238e1f29; allow rwx pool > object_prefix rbd_header.2b36cf238e1f29 ___
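For readers landing here later, caps along the lines Eugen quotes would be applied with something like this (client name and pool are placeholders, and the rbd_data./rbd_header. prefix must match the image's internal id, 2b36cf238e1f29 in this thread):

ceph auth caps client.restricted \
  mon 'allow r' \
  osd 'allow rwx pool=<pool> object_prefix rbd_data.<image_id>; allow rwx pool=<pool> object_prefix rbd_header.<image_id>'

This is only a sketch of the idea discussed above; mapping an image may additionally need read access to metadata objects such as rbd_directory and rbd_id.<name>.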

[ceph-users] Using Ceph central backup storage - Best practice creating pools

2019-01-21 Thread Thomas
Hi, my use case for Ceph is serving as central backup storage. This means I will back up multiple databases in the Ceph storage cluster. This is my question: What is the best practice for creating pools & images? Should I create multiple pools, meaning one pool per database? Or should I create a single

[ceph-users] Best practice creating pools / rbd images

2019-01-15 Thread Thomas
Hi, my use case for Ceph is serving as central backup storage. This means I will back up multiple databases in the Ceph storage cluster. This is my question: What is the best practice for creating pools & images? Should I create multiple pools, meaning one pool per database? Or should I create a single

Re: [ceph-users] Is it possible to increase Ceph Mon store?

2019-01-08 Thread Thomas Byrne - UKRI STFC
For what it's worth, I think the behaviour Pardhiv and Bryan are describing is not quite normal, and sounds similar to something we see on our large luminous cluster with elderly (created as jewel?) monitors. After large operations which result in the mon stores growing to 20GB+, leaving the clu

Re: [ceph-users] ceph health JSON format has changed

2019-01-02 Thread Thomas Byrne - UKRI STFC
> In previous versions of Ceph, I was able to determine which PGs had > scrub errors, and then a cron.hourly script ran "ceph pg repair" for them, > provided that they were not already being scrubbed. In Luminous, the bad > PG is not visible in "ceph --status" anywhere. Should I use something
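In Luminous, one way to get the list of PGs behind scrub errors, which such a cron job could iterate over (pool name and pg id are placeholders):

ceph health detail | grep -i inconsistent
rados list-inconsistent-pg <pool>
rados list-inconsistent-obj <pgid> --format=json-pretty
ceph pg repair <pgid>

This is just a sketch of the approach discussed above; repairing automatically without looking at the inconsistency first is a judgement call.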

Re: [ceph-users] ceph health JSON format has changed sync?

2019-01-02 Thread Thomas Byrne - UKRI STFC
I recently spent some time looking at this, I believe the 'summary' and 'overall_status' sections are now deprecated. The 'status' and 'checks' fields are the ones to use now. The 'status' field gives you the OK/WARN/ERR, but returning the most severe error condition from the 'checks' section i
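A small example of pulling those two fields out of the JSON (jq assumed to be installed; the exact layout can differ slightly between releases):

ceph health --format json | jq -r '.status'
ceph health --format json | jq '.checks'

ceph status --format json carries the same information under .health.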

Re: [ceph-users] Balancing cluster with large disks - 10TB HHD

2019-01-02 Thread Thomas Byrne - UKRI STFC
Assuming I understand it correctly: "pg_upmap_items 6.0 [40,20]" refers to replacing (upmapping?) osd.40 with osd.20 in the acting set of the placement group '6.0'. Assuming it's a 3 replica PG, the other two OSDs in the set remain unchanged from the CRUSH calculation. "pg_upmap_items 6.6 [45,
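For reference, an upmap entry like the one above corresponds to a command of roughly this form (pg and OSD ids taken from the example; rm-pg-upmap-items removes the exception again):

ceph osd pg-upmap-items 6.0 40 20
ceph osd rm-pg-upmap-items 6.0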

[ceph-users] Some pgs stuck unclean in active+remapped state

2018-11-19 Thread Thomas Klute
0.el7.x86_64 ceph-osd-10.2.11-0.el7.x86_64 ceph-mon-10.2.11-0.el7.x86_64 ceph-deploy-1.5.39-0.noarch ceph-10.2.11-0.el7.x86_64 Could someone please help with how to proceed? Thanks and kind regards, Thomas

Re: [ceph-users] ceph 12.2.9 release

2018-11-07 Thread Thomas White
-> 10.2.10 -> 12.2.9 in the past 2 weeks with no issues. That said, it is disappointing these packages are making their way into repositories without the proper announcements for an LTS release, especially given this is enterprise orientated software. Thomas -Original Message- From: ceph

[ceph-users] Fwd: Ceph Meetup Cape Town

2018-10-30 Thread Thomas Bennett
ike to attend, please complete the following form to register: https://goo.gl/forms/imuP47iCYssNMqHA2 Kind regards, SARAO storage team -- Thomas Bennett SARAO Science Data Processing

[ceph-users] getattr - failed to rdlock waiting

2018-10-02 Thread Thomas Sumpter
Hi Folks, I am looking for advice on how to troubleshoot some long operations found in MDS. Most of the time performance is fantastic, but occasionally and with no real pattern or trend, a getattr op will take up to ~30 seconds to complete in MDS which is stuck on "event": "failed to rdlock, wai

[ceph-users] How many objects to expect?

2018-09-26 Thread Thomas Sumpter
Hello, I have two independent but almost identical systems; on one of them (A) the total number of objects stays around 200, while on the other (B) it has been steadily increasing and now seems to have levelled off at around 4000 objects. The total used data remains roughly the same, but this data is continuou

[ceph-users] All shards of PG missing object and inconsistent

2018-09-21 Thread Thomas White
resolve this inconsistency when the object is supposed to be absent? Kind Regards, Thomas

Re: [ceph-users] Delay Between Writing Data and that Data being available for reading?

2018-09-20 Thread Thomas Sumpter
: Thomas Sumpter Sent: Wednesday, September 19, 2018 4:31 PM To: 'Gregory Farnum' Cc: ceph-users@lists.ceph.com Subject: RE: [ceph-users] Delay Between Writing Data and that Data being available for reading? Linux version 4.18.4-1.el7.elrepo.x86_64 (mockbuild@Build64R7) (gcc version 4.8.

Re: [ceph-users] Delay Between Writing Data and that Data being available for reading?

2018-09-19 Thread Thomas Sumpter
Linux version 4.18.4-1.el7.elrepo.x86_64 (mockbuild@Build64R7) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC)) CentOS 7 From: Gregory Farnum Sent: Wednesday, September 19, 2018 4:27 PM To: Thomas Sumpter Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Delay Between Writing Data

Re: [ceph-users] Delay Between Writing Data and that Data being available for reading?

2018-09-19 Thread Thomas Sumpter
(5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable) Regards, Tom From: Gregory Farnum Sent: Wednesday, September 19, 2018 4:04 PM To: Thomas Sumpter Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Delay Between Writing Data and that Data being available for reading? You're going to need to te

[ceph-users] Delay Between Writing Data and that Data being available for reading?

2018-09-19 Thread Thomas Sumpter
Hello, We have Mimic version 13.2.1 using Bluestore. OSDs are using NVMe disks for data storage (in AWS). Four OSDs are active in replicated mode. Further information on request, since there are so many config options I am not sure where to focus my attention yet. Assume we have default options.

Re: [ceph-users] Installing ceph 12.2.4 via Ubuntu apt

2018-08-28 Thread Thomas Bennett
the version to a > folder and you can create a repo file that reads from a local directory. > That's how I would re-install my test lab after testing an upgrade > procedure to try it over again. > > On Tue, Aug 28, 2018, 1:01 AM Thomas Bennett wrote: > >> Hi, >>

[ceph-users] How to mount NFS-Ganesha-ressource via Proxmox-NFS-Plugin?

2018-08-28 Thread Naumann, Thomas
path Aug 27 13:14:48 tr-25-3 pvestatd[15777]: file /etc/pve/storage.cfg line 82 (skip section 'test'): missing value for required option 'export' ... mounts via cli (mount -t nfs -o nfsvers=4.1,noauto,soft,sync,proto=tcp x.x.x.x:/ /mnt/ganesha/) are working without issues -

Re: [ceph-users] SAN or DAS for Production ceph

2018-08-28 Thread Thomas White
Hi James, I can see where some of the confusion has arisen, hopefully I can put at least some of it to rest. In the Tumblr post from Yahoo, the keyword to look out for is “nodes”, which is distinct from individual hard drives which in Ceph is an OSD in most cases. So you would have multiple

[ceph-users] Installing ceph 12.2.4 via Ubuntu apt

2018-08-28 Thread Thomas Bennett
They're just not included in the package distribution. Is this the desired behaviour or a misconfiguration? Cheers, Tom -- Thomas Bennett SARAO Science Data Processing

Re: [ceph-users] Inconsistent PG could not be repaired

2018-08-14 Thread Thomas White
Hi Arvydas, The error seems to suggest this is not an issue with your object data, but the expected object digest data. I am unable to access where I stored my very hacky diagnosis process for this, but our eventual fix was to locate the bucket or files affected and then rename an object wit

Re: [ceph-users] Ceph upgrade Jewel to Luminous

2018-08-14 Thread Thomas White
Hi Jaime, Upgrading directly should not be a problem. It is usually recommended to go to the latest minor release before upgrading major versions, but my own migration from 10.2.10 to 12.2.5 went seamlessly and I can't see any technical limitation which would hinder or prevent this proces

Re: [ceph-users] limited disk slots - should I ran OS on SD card ?

2018-08-13 Thread Thomas White
Hi Steven, Just to somewhat clarify my previous post, I mention OSDs in the sense that the OS is installed on the OSD server using the SD card, I would absolutely recommend against using SD cards as the actual OSD media. This of course misses another point, which is for the Mons or other suc

Re: [ceph-users] limited disk slots - should I ran OS on SD card ?

2018-08-13 Thread thomas
Hi Steven, If you are running OSDs on the SD card, there would be nothing technically stopping this setup, but the main factors against would be the simple endurance and performance of SD cards and the potential fallout when they inevitably fail. If you factor time and maintenance as a cost

[ceph-users] Bluestore OSD Segfaults (12.2.5/12.2.7)

2018-08-07 Thread Thomas White
Hi all, We have recently begun switching over to Bluestore on our Ceph cluster, currently on 12.2.7. We first began encountering segfaults on Bluestore during 12.2.5, but strangely these segfaults apply exclusively to our SSD pools and not the PCIE/HDD disks. We upgraded to 12.2.7 last week to

Re: [ceph-users] Reset Object ACLs in RGW

2018-08-02 Thread thomas
ll need to install the aws toolkit and jq of course and configure them. Thanks again, Tom -Original Message- From: ceph-users On Behalf Of Casey Bodley Sent: 02 August 2018 17:08 To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Reset Object ACLs in RGW On 08/02/2018 07:35 AM, Thomas
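A rough sketch of that approach, assuming S3 credentials that are allowed to rewrite the ACLs and an RGW endpoint (endpoint and bucket names below are placeholders, not from the thread):

aws --endpoint-url http://rgw.example.com s3api list-objects --bucket mybucket \
  | jq -r '.Contents[].Key' \
  | while read -r key; do
      aws --endpoint-url http://rgw.example.com s3api put-object-acl \
        --bucket mybucket --key "$key" --acl private
    done

i.e. walk the bucket listing and rewrite each object's ACL; whether put-object-acl alone is enough depends on why the ACLs were lost in the first place.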

[ceph-users] Reset Object ACLs in RGW

2018-08-02 Thread Thomas White
Hi all, At present I have a cluster with a user on the RGW who has lost access to many of his files. The bucket has the correct ACL to be accessed by the account and so with their access and secret key many items can be listed, but are unable to be downloaded. Is there a way of using the rados

Re: [ceph-users] client.bootstrap-osd authentication error - which keyrin

2018-07-09 Thread Thomas Roth
bootstrap-osd" > > > Paul > > > 2018-07-06 16:47 GMT+02:00 Thomas Roth : > >> Hi all, >> >> I wonder which is the correct key to create/recreate an additional OSD >> with 12.2.5. >> >> Following >> http://docs.ceph.com/docs

[ceph-users] client.bootstrap-osd authentication error - which keyrin

2018-07-06 Thread Thomas Roth
them on my mon hosts. "ceph-volume" and "ceph-disk" go looking for that file, so I put it there, to no avail. Btw, the target server has still several "up" and "in" OSDs running, so this is not a question of network or general authentication iss
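For the record, the key ceph-volume/ceph-disk look for is normally the bootstrap-osd key exported to the OSD host, roughly like this (path is the default location):

ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring   # run on/for the OSD host

If the key does not exist at all, ceph auth get-or-create client.bootstrap-osd mon 'allow profile bootstrap-osd' creates it with the standard profile.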

Re: [ceph-users] pre-sharding s3 buckets

2018-06-27 Thread Thomas Bennett
ket 30 times in 8 hours as we will write ~3 million objects in ~8 hours. Hence the idea that we should preshard to avoid any undesirable workloads. Cheers, Tom On Wed, Jun 27, 2018 at 3:16 PM, Matthew Vernon wrote: > Hi, > > On 27/06/18 11:18, Thomas Bennett wrote: > > > We h

[ceph-users] pre-sharding s3 buckets

2018-06-27 Thread Thomas Bennett
rifice that I'm willing to take for the convenience of having it preconfigured. Cheers, Tom -- Thomas Bennett SRAO Storage Engineer - Science Data Processing
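As a sketch of the two pre-sharding knobs that usually come up in this kind of thread (the shard count below is a made-up example, not a recommendation):

rgw override bucket index max shards = 64                        # ceph.conf, applies to newly created buckets
radosgw-admin bucket reshard --bucket=<bucket> --num-shards=64   # reshard an existing bucket

The right shard count depends on the expected object count per bucket; the rule of thumb floating around is on the order of 100k objects per shard.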

[ceph-users] ceph_vms performance

2018-05-23 Thread Thomas Bennett
Hi, I'm testing out ceph_vms vs a cephfs mount with a cifs export. I currently have 3 active ceph mds servers to maximise throughput and when I have configured a cephfs mount with a cifs export, I'm getting a reasonable benchmark results. However, when I tried some benchmarking with the ceph_v

Re: [ceph-users] A question about HEALTH_WARN and monitors holding onto cluster maps

2018-05-21 Thread Thomas Byrne - UKRI STFC
time to compact their stores. Although it’s far from ideal (from a total time to get new storage weighted up), I’ll be letting the mons compact between every backfill until I have a better idea of what went on last week. From: David Turner Sent: 17 May 2018 18:57 To: Byrne, Thomas (STFC,RAL,SC

Re: [ceph-users] A question about HEALTH_WARN and monitors holding onto cluster maps

2018-05-17 Thread Thomas Byrne - UKRI STFC
ors > holding onto cluster maps > > > > On 05/17/2018 04:37 PM, Thomas Byrne - UKRI STFC wrote: > > Hi all, > > > > > > > > As far as I understand, the monitor stores will grow while not > > HEALTH_OK as they hold onto all cluster maps.

[ceph-users] A question about HEALTH_WARN and monitors holding onto cluster maps

2018-05-17 Thread Thomas Byrne - UKRI STFC
Hi all, As far as I understand, the monitor stores will grow while not HEALTH_OK as they hold onto all cluster maps. Is this true for all HEALTH_WARN reasons? Our cluster recently went into HEALTH_WARN due to a few weeks of backfilling onto new hardware pushing the monitors data stores over the

Re: [ceph-users] Too many active mds servers

2018-05-15 Thread Thomas Bennett
Hi Patrick, Thanks! Much appreciated. On Tue, 15 May 2018 at 14:52, Patrick Donnelly wrote: > Hello Thomas, > > On Tue, May 15, 2018 at 2:35 PM, Thomas Bennett wrote: > > Hi, > > > > I'm running Luminous 12.2.5 and I'm testing cephfs. > > > > Ho

[ceph-users] Too many active mds servers

2018-05-15 Thread Thomas Bennett
Hi, I'm running Luminous 12.2.5 and I'm testing cephfs. However, I seem to have too many active mds servers on my test cluster. How do I set one of my mds servers to become standby? I've run ceph fs set cephfs max_mds 2 which set the max_mds from 3 to 2 but has no effect on my running configura
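On Luminous, the answer discussed in this thread amounts to something like the following (fs name from the post; the deactivate step is my recollection of the 12.2.x behaviour, where lowering max_mds does not stop the extra rank by itself):

ceph fs set cephfs max_mds 2
ceph mds deactivate cephfs:2    # deactivate the highest rank; ranks are numbered from 0

Newer releases stop surplus ranks automatically once max_mds is lowered.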

Re: [ceph-users] What do you use to benchmark your rgw?

2018-04-04 Thread Thomas Bennett

Re: [ceph-users] RGW default.rgw.meta pool

2018-02-05 Thread Thomas Bennett
Hi Orit, Thanks for the reply, much appreciated. You cannot see the omap size using rados ls but need to use rados omap > commands. You can use this script to calculate the bucket index size: > https://github.com/mkogan1/ceph-utils/blob/master/scripts/get_omap_kv_size.sh Great. I had not e

[ceph-users] RGW default.rgw.meta pool

2018-02-05 Thread Thomas Bennett
Hi, In trying to understand RGW pool usage I've noticed the pool called *default.rgw.meta* pool has a large number of objects in it. Suspiciously about twice as many objects in my *default.rgw.buckets.index* pool. As I delete and add buckets, the number of objects in both pools decrease and incre

Re: [ceph-users] Weird issues related to (large/small) weights in mixed nvme/hdd pool

2018-01-31 Thread Thomas Bennett
Hi Peter, Relooking at your problem, you might want to keep track of this issue: http://tracker.ceph.com/issues/22440 Regards, Tom On Wed, Jan 31, 2018 at 11:37 AM, Thomas Bennett wrote: > Hi Peter, > > From your reply, I see that: > >1. pg 3.12c is part of pool 3. >

Re: [ceph-users] Weird issues related to (large/small) weights in mixed nvme/hdd pool

2018-01-31 Thread Thomas Bennett
Hi Peter, From your reply, I see that: 1. pg 3.12c is part of pool 3. 2. The OSDs in the "up" set for pg 3.12c are: 6, 0, 12. To check on this 'activating' issue, I suggest doing the following: 1. What is the rule that pool 3 should follow, 'hybrid', 'nvme' or 'hdd'? (Use the *ceph osd
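The checks being suggested map onto commands roughly like these (pool/rule names are whatever your cluster uses; the pg id is the one from this thread):

ceph osd pool get <pool> crush_rule
ceph osd crush rule dump <rule_name>
ceph pg 3.12c query    # look at up/acting and the peering/activating state at the end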

Re: [ceph-users] Weird issues related to (large/small) weights in mixed nvme/hdd pool

2018-01-26 Thread Thomas Bennett
11:48 AM, Peter Linder wrote: > Hi Thomas, > > No, we haven't gotten any closer to resolving this, in fact we had another > issue again when we added a new nvme drive to our nvme servers (storage11, > storage12 and storage13) that had weight 1.7 instead of the usual 0.728 >

Re: [ceph-users] Weird issues related to (large/small) weights in mixed nvme/hdd pool

2018-01-25 Thread Thomas Bennett
same, the > problem goes away! I would have thought that the weights do not matter, > since we have to choose 3 of these anyway. So I'm really confused over > this. > > Today I also had to change > > item ldc1 weight 197.489 > item ldc2 weight 197.196 >

Re: [ceph-users] How to remove deactivated cephFS

2018-01-25 Thread Thomas Bennett
e suddenly listed only one cephFS. Also the > command "ceph fs status" doesn't return an error anymore but shows the > correct output. > I guess Ceph is indeed a self-healing storage solution! :-) > > Regards, > Eugen > > > Zitat von Thomas Bennett : &

Re: [ceph-users] How to remove deactivated cephFS

2018-01-24 Thread Thomas Bennett

[ceph-users] Unable to join additional mon servers (luminous)

2018-01-11 Thread Thomas Gebhardt
-5@2(probing).data_health(6138) service_dispatch_op not in quorum -- drop message 2018-01-11 15:17:22.060499 7f69b80d6700 0 log_channel(cluster) log [INF] : mon.my-ceph-mon-5 calling new monitor election 2018-01-11 15:17:22.060612 7f69b80d6700 1 mon.my-ceph-mon-5@2(electing).elector(613

Re: [ceph-users] Multiple OSD crashing on 12.2.0. Bluestore / EC pool / rbd

2017-09-06 Thread Thomas Coelho
Hi, I have the same problem. A bug [1] has been reported for months, but unfortunately it is not fixed yet. I hope that if more people are having this problem, the developers can reproduce and fix it. I was using kernel RBD with a cache tier. So long, Thomas Coelho [1] http://tracker.ceph.com/issues

Re: [ceph-users] osd heartbeat protocol issue on upgrade v12.1.0 ->v12.2.0

2017-09-01 Thread Thomas Gebhardt
Hello, thank you very much for the hint, you are right! Kind regards, Thomas Marc Roos schrieb am 30.08.2017 um 14:26: > > I had this also once. If you update all nodes and then systemctl restart > 'ceph-osd@*' on all nodes, you should be fine. But first the mo

[ceph-users] osd heartbeat protocol issue on upgrade v12.1.0 ->v12.2.0

2017-08-30 Thread Thomas Gebhardt
rsion 1 < struct_compat ( it is puzzling that the *older* v12.1.0 node complains about the *old* encoding version of the *newer* v12.2.0 node.) Any idea how I can go ahead? Kind regards, Thomas ___ ceph-users mailing list ceph-users@lists.ceph.com http://list

Re: [ceph-users] Reaching aio-max-nr on Ubuntu 16.04 with Luminous

2017-08-30 Thread Thomas Bennett
at 10:49 AM, Dan van der Ster wrote: > Hi Thomas, > > Yes we set it to a million. > From our puppet manifest: > > # need to increase aio-max-nr to allow many bluestore devs > sysctl { 'fs.aio-max-nr': val => '1048576' } &

[ceph-users] Reaching aio-max-nr on Ubuntu 16.04 with Luminous

2017-08-30 Thread Thomas Bennett
Hi, I've been testing out Luminous and I've noticed that at some point the number of osds per nodes was limited by aio-max-nr. By default its set to 65536 in Ubuntu 16.04 Has anyone else experienced this issue? fs.aio-nr currently sitting at 196608 with 48 osds. I have 48 osd's per node so I've
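For anyone hitting the same limit, the value mentioned elsewhere in this thread (1048576) can be applied like this (the file name under /etc/sysctl.d is arbitrary):

sysctl -w fs.aio-max-nr=1048576                                   # immediate, lost on reboot
echo 'fs.aio-max-nr = 1048576' > /etc/sysctl.d/90-ceph-aio.conf
sysctl --system                                                   # reload persistent settings

Each BlueStore OSD consumes a chunk of the aio contexts, which is why dense nodes run into the default of 65536.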

Re: [ceph-users] MON daemons fail after creating bluestore osd with block.db partition (luminous 12.1.0-1~bpo90+1 )

2017-07-09 Thread Thomas Gebhardt
Hello, Thomas Gebhardt schrieb am 07.07.2017 um 17:21: > ( e.g., > ceph-deploy osd create --bluestore --block-db=/dev/nvme0bnp1 node1:/dev/sdi > ) I just noticed that there was a typo in the block-db device name (/dev/nvme0bnp1 -> /dev/nvme0n1p1). After fixing that misspelling my coo

[ceph-users] MON daemons fail after creating bluestore osd with block.db partition (luminous 12.1.0-1~bpo90+1 )

2017-07-07 Thread Thomas Gebhardt
/ does not yet support stretch - but I suppose that's not related to my problem). Kind regards, Thomas Jul 07 09:58:54 node1 systemd[1]: Started Ceph cluster monitor daemon. Jul 07 09:58:54 node1 ceph-mon[550]: starting mon.node1 rank 0 at x.x.x.x:6789/0 mon_data /var/lib/ceph/mon/ceph-node1

Re: [ceph-users] ceph cluster having blocke requests very frequently

2016-12-04 Thread Thomas Danan
limit its impact. Thomas From: Nick Fisk [mailto:n...@fisk.me.uk] Sent: mercredi 23 novembre 2016 14:09 To: Thomas Danan; 'Peter Maloney' Cc: ceph-users@lists.ceph.com Subject: RE: [ceph-users] ceph cluster having blocke requests very frequently Hi Thomas, I’m afraid I can’t off

Re: [ceph-users] Is there a setting on Ceph that we can use to fix the minimum read size?

2016-12-02 Thread Thomas Bennett
o, so you don’t even have to rely on Ceph to avoid > downtime. I probably wouldn’t run it everywhere at once though for > performance reasons. A single OSD at a time would be ideal, but that’s a > matter of preference. > > > > *From:* ceph-users [mailto:ceph-users-boun...@li

Re: [ceph-users] Is there a setting on Ceph that we can use to fix the minimum read size?

2016-11-30 Thread Thomas Bennett
> Of *Kate Ward > *Sent:* Tuesday, November 29, 2016 2:02 PM > *To:* Thomas Bennett > *Cc:* ceph-users@lists.ceph.com > *Subject:* Re: [ceph-users] Is there a setting on Ceph that we can use to > fix the minimum read size? > > > > I have no experience with XFS, but

Re: [ceph-users] Is there a setting on Ceph that we can use to fix the minimum read size?

2016-11-29 Thread Thomas Bennett
ter at combining requests before they get to the > drive? > > k8 > > On Tue, Nov 29, 2016 at 9:52 AM Thomas Bennett wrote: > >> Hi, >> >> We have a use case where we are reading 128MB objects off spinning disks. >> >> We've benchmarked a number of dif

[ceph-users] Is there a setting on Ceph that we can use to fix the minimum read size?

2016-11-29 Thread Thomas Bennett
Hi, We have a use case where we are reading 128MB objects off spinning disks. We've benchmarked a number of different hard drives and have noticed that for a particular hard drive, we're experiencing slow reads by comparison. This occurs when we have multiple readers (even just 2) reading objects

Re: [ceph-users] Ceph performance laggy (requests blocked > 32) on OpenStack

2016-11-25 Thread Thomas Danan
Hi Kévin, I am currently having a similar issue. In my env I have around 16 Linux VMs (VMware), more or less equally loaded, accessing a 1PB ceph hammer cluster (40 dn, 800 osds) through rbd. Very often we have IO freezes on the VM xfs FS and we also continuously have slow requests on osd ( up to

[ceph-users] Fwd: RadosGW not responding if ceph cluster in state health_error

2016-11-24 Thread Thomas
Sorry to bring this up again - any ideas? Or should I try the IRC channel? Cheers, Thomas Original Message Subject:RadosGW not responding if ceph cluster in state health_error Date: Mon, 21 Nov 2016 17:22:20 +1300 From: Thomas To: ceph-users@lists.ceph.com

Re: [ceph-users] ceph cluster having blocke requests very frequently

2016-11-23 Thread Thomas Danan
mon_osd_min_down_reports = 10 Thomas From: David Turner [mailto:david.tur...@storagecraft.com] Sent: mercredi 23 novembre 2016 21:27 To: n...@fisk.me.uk; Thomas Danan; 'Peter Maloney' Cc: ceph-users@lists.ceph.com Subject: RE: [ceph-users] ceph cluster having blocke requests very frequently T

Re: [ceph-users] ceph cluster having blocke requests very frequently

2016-11-23 Thread Thomas Danan
.2406870.1:140440919 rbd_data.616bf2ae8944a.002b85a7 [set-alloc-hint object_size 4194304 write_size 4194304,write 1449984~524288] 0.4e69d0de snapc 218=[218,1fb,1df] ondisk+write e212564) currently waiting for subops from 528,771 Thomas From: Tomasz Kuzemko [mailto:tom...@kuzemko.net] Sent: jeudi

Re: [ceph-users] ceph cluster having blocke requests very frequently

2016-11-23 Thread Thomas Danan
overloading the network or if my network switches were having an issue. Switches have been checked and they are showing no congestion issues or other errors. I really don’t know what to check or test, any idea is more than welcomed … Thomas From: Thomas Danan Sent: vendredi 18 novembre 2016

[ceph-users] RadosGW not responding if ceph cluster in state health_error

2016-11-20 Thread Thomas
up creation Full log here: http://pastebin.com/iYpiF9wP Once we removed the pool with size = 1 via 'rados rmpool', the cluster started recovering and RGW served requests! Any ideas? Cheers, Thomas -- Thomas Gross TGMEDIA Ltd. p. +64 211 569080 | i...@tgmedia.co.nz

Re: [ceph-users] ceph cluster having blocke requests very frequently

2016-11-18 Thread Thomas Danan
online ? Thanks Thomas From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Thomas Danan Sent: vendredi 18 novembre 2016 12:42 To: n...@fisk.me.uk; 'Peter Maloney' Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] ceph cluster having blocke requests very freq

Re: [ceph-users] ceph cluster having blocke requests very frequently

2016-11-18 Thread Thomas Danan
entify anything obvious in the logs. Thanks for your help … Thomas From: Nick Fisk [mailto:n...@fisk.me.uk] Sent: jeudi 17 novembre 2016 11:02 To: Thomas Danan; n...@fisk.me.uk; 'Peter Maloney' Cc: ceph-users@lists.ceph.com Subject: RE: [ceph-users] ceph cluster having blocke requests very f

Re: [ceph-users] ceph cluster having blocke requests very frequently

2016-11-17 Thread Thomas Danan
I actually forgot to say that the following issue describes very similar symptoms: http://tracker.ceph.com/issues/9844 Thomas From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Thomas Danan Sent: jeudi 17 novembre 2016 09:59 To: n...@fisk.me.uk; 'Peter Maloney'

Re: [ceph-users] ceph cluster having blocke requests very frequently

2016-11-17 Thread Thomas Danan
example and with some DEBUG messages activated I was also able to see the many of the following messages on secondary OSDs. 2016-11-15 03:53:04.298502 7ff9c434f700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7ff9bdb42700' had timed out after 15 Thomas From: Nick Fisk [mailto:
