Re: [ceph-users] Disk write cache - safe?

2018-03-15 Thread Christian Balzer
Hello, what has been said by others before is essentially true: if you want as much data conservation as possible and have RAID controllers with decent amounts of cache and a BBU, then disabling the on-disk cache is the way to go. But as you found out, w/o those caches and a controll
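For reference, a minimal sketch of how the on-disk volatile write cache can be inspected and disabled from Linux, assuming a SATA drive at /dev/sdX and a SAS drive at /dev/sdY (device names are placeholders):

    # SATA: show and disable the drive's write cache
    hdparm -W /dev/sdX
    hdparm -W 0 /dev/sdX

    # SAS: show and clear the WCE (write cache enable) bit
    sdparm --get=WCE /dev/sdY
    sdparm --set=WCE=0 --save /dev/sdY

Whether disabling it is the right trade-off depends on having controller cache with a BBU in front of the disks, as described above.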

Re: [ceph-users] Problem with UID starting with underscores

2018-03-15 Thread Rudenko Aleksandr
Hi, I have the same issue. Try using two underscores: radosgw-admin user info --uid="__pro_". I have a user with two underscores on hammer and I can work with it using one underscore :) I recommend you remove this user and avoid underscores in user names and access_keys, because after upgrade o
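For reference, a sketch of the two lookup forms discussed above (the uid is taken from this thread; which form works appears to depend on the release):

    radosgw-admin user info --uid="__pro_"
    radosgw-admin user info --uid="_pro_"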

Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

2018-03-15 Thread Mike Christie
On 03/14/2018 04:28 PM, Maxim Patlasov wrote: > On Wed, Mar 14, 2018 at 12:05 PM, Michael Christie > wrote: > > On 03/14/2018 01:27 PM, Michael Christie wrote: > > On 03/14/2018 01:24 PM, Maxim Patlasov wrote: > >> On Wed, Mar 14, 2018 at 11:13

[ceph-users] Instrument librbd+qemu IO from hypervisor

2018-03-15 Thread Martin Millnert
Dear fellow cephalopods, does anyone have any pointers on how to instrument librbd as-driven-by qemu IO performance from a hypervisor? Are there less intrusive ways than perf or equivalent? Can librbd be told to dump statistics somewhere (per volume) - clientside? This would come in real handy w

Re: [ceph-users] Instrument librbd+qemu IO from hypervisor

2018-03-15 Thread Martin Millnert
Self-follow-up: The ceph version is 0.80.11 in the cluster I'm working on, so quite old. Adding: admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok log file = /var/log/ceph/ to /etc/ceph.conf, and then in my case tweaking apparmor (for testing, disabling it): service apparmor teardo
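For reference, a sketch of the client-side setup being described, assuming the hypervisor's ceph.conf and a socket directory writable by the qemu process (paths and the [client] section are assumptions):

    [client]
        admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
        log file = /var/log/ceph/$cluster-$name.$pid.log

    # with the VM running, dump librbd's per-client counters
    ceph --admin-daemon /var/run/ceph/ceph-client.admin.<pid>.<cctid>.asok perf dump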

Re: [ceph-users] Object Gateway - Server Side Encryption

2018-03-15 Thread Vik Tara
On 14/03/18 12:31, Amardeep Singh wrote: > Though I have now another issue because I am using Multisite setup > with one zone for data and second zone for metadata with elastic > search tier. > > http://docs.ceph.com/docs/master/radosgw/elastic-sync-module/ > > When document is encrypted the metad

Re: [ceph-users] Issue with fstrim and Nova hw_disk_discard=unmap

2018-03-15 Thread Fulvio Galeazzi
Hello Jason, I am really thankful for your time! Changed the volume features: rbd image 'volume-80838a69-e544-47eb-b981-a4786be89736': . features: layering, exclusive-lock, deep-flatten I had to create several dummy files before seeing an increase with "rbd du": to me, this is s
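For reference, a sketch of verifying the discard path end-to-end (the pool name "volumes" is an assumption; the image name comes from this thread):

    # on a Ceph client: watch the image's allocated space
    rbd du volumes/volume-80838a69-e544-47eb-b981-a4786be89736

    # inside the guest: create dummy data, remove it, then trim
    dd if=/dev/zero of=/dummy bs=1M count=1024 && rm /dummy && sync
    fstrim -v /

If hw_disk_discard=unmap is being honoured, "rbd du" should drop back down after the trim.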

Re: [ceph-users] Issue with fstrim and Nova hw_disk_discard=unmap

2018-03-15 Thread Jason Dillaman
OK, last suggestion just to narrow the issue down: ensure you have a functional admin socket and librbd log file as documented here [1]. With the VM running, before you execute "fstrim", run "ceph --admin-daemon /path/to/the/asok/file conf set debug_rbd 20" on the hypervisor host, execute "fstrim"
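For reference, a sketch of the sequence being suggested (the socket path is whatever asok the VM's librbd client created on the hypervisor):

    # raise librbd logging on the running client
    ceph --admin-daemon /path/to/the/asok/file conf set debug_rbd 20

    # run fstrim inside the guest, then lower it again
    ceph --admin-daemon /path/to/the/asok/file conf set debug_rbd 0

    # discard requests should now show up in the librbd log
    grep -i discard /var/log/ceph/ceph-client.*.log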

Re: [ceph-users] rctime not tracking inode ctime

2018-03-15 Thread Dan van der Ster
On Wed, Mar 14, 2018 at 11:43 PM, Patrick Donnelly wrote: > On Wed, Mar 14, 2018 at 9:22 AM, Dan van der Ster wrote: >> Hi all, >> >> On our luminous v12.2.4 ceph-fuse clients / mds the rctime is not >> tracking the latest inode ctime, but only the latest directory ctimes. >> >> Initial empty dir
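For reference, rctime is exposed as a virtual extended attribute on directories, so the behaviour described can be checked directly from a client (the mount path is an example):

    getfattr -n ceph.dir.rctime /mnt/cephfs/some/dir
    touch /mnt/cephfs/some/dir/somefile        # bumps the file's ctime
    getfattr -n ceph.dir.rctime /mnt/cephfs/some/dir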

[ceph-users] Crush Bucket move crashes mons

2018-03-15 Thread warren.jeffs
Hi All, Having some interesting challenges. I am trying to move 2 new nodes + 2 new racks into my default root; I have added them to the cluster outside of root=default. They are all in and up, and seem happy. The new nodes each have 12 OSDs in them and they are all 'UP'. So when going to
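For reference, a sketch of the kind of move being attempted (bucket names are examples; the racks are assumed to already exist as buckets containing the new hosts):

    # move a rack bucket, with its hosts, under the default root
    ceph osd crush move rack1 root=default

    # or move a single host into a rack already under default
    ceph osd crush move node13 rack=rack1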

[ceph-users] Backfilling on Luminous

2018-03-15 Thread David Turner
I upgraded a [1] cluster from Jewel 10.2.7 to Luminous 12.2.2 and last week I added 2 nodes to the cluster. The backfilling has been ATROCIOUS. I have OSDs consistently [2] segfaulting during recovery. There's no pattern of which OSDs are segfaulting, which hosts have segfaulting OSDs, etc... It

Re: [ceph-users] Luminous | PG split causing slow requests

2018-03-15 Thread David Turner
The settings don't completely disable it, they push it back so that it won't happen for a very long time. I settled on doing my offline splitting every month because after 5-6 weeks I found that our PGs started splitting on their own even with the increased settings... so I schedule a task to split them off
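For reference, a sketch of the offline split being described (pool name, OSD path and values are examples; ceph-objectstore-tool only works on a stopped filestore OSD):

    # ceph.conf: push automatic splitting/merging further out
    filestore split multiple = 8
    filestore merge threshold = -16    # a negative value disables merging

    # split one stopped OSD's collections for a given pool
    systemctl stop ceph-osd@0
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
        --op apply-layout-settings --pool rbd
    systemctl start ceph-osd@0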

Re: [ceph-users] Backfilling on Luminous

2018-03-15 Thread Cassiano Pilipavicius
Hi David, something similar happened to me when I upgraded from jewel to luminous, and I discovered that the problem was the memory allocator. I had switched to jemalloc in jewel to improve performance, and when I upgraded to bluestore in luminous my OSDs started to crash. I've commente

[ceph-users] seeking maintainer for ceph-deploy (was Re: ceph-deploy's currentstatus)

2018-03-15 Thread Sage Weil
Adding ceph-users to get a bit broader distribution. tl;dr: is anyone interested in stepping up to help maintain ceph-deploy? Thanks! sage On Fri, 9 Mar 2018, Alfredo Deza wrote: > Since about October of 2015, ceph-deploy has gone without a dedicated lead > doing full time development on it. I'

Re: [ceph-users] Backfilling on Luminous

2018-03-15 Thread Jan Marquardt
Hi David, Am 15.03.18 um 18:03 schrieb David Turner: > I upgraded a [1] cluster from Jewel 10.2.7 to Luminous 12.2.2 and last > week I added 2 nodes to the cluster.  The backfilling has been > ATROCIOUS.  I have OSDs consistently [2] segfaulting during recovery.  > There's no pattern of which OSDs

Re: [ceph-users] Disk write cache - safe?

2018-03-15 Thread Tim Bishop
Thank you Christian, David, and Reed for your responses. My servers have the Dell H730 RAID controller in them, but I have the OSD disks in Non-RAID mode. When initially testing I compared single RAID-0 containers with Non-RAID and the Non-RAID performance was acceptable, so I opted for the config

Re: [ceph-users] Disk write cache - safe?

2018-03-15 Thread John Petrini
I had a recent battle with performance on two of our nodes and it turned out to be a result of using non-raid mode. We ended up rebuilding them one by one in raid-0 with controller cache enabled on the OSD disks. I discussed it on the mailing list: https://www.spinics.net/lists/ceph-users/msg42756.

Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

2018-03-15 Thread Maxim Patlasov
On Thu, Mar 15, 2018 at 12:48 AM, Mike Christie wrote: > ... > > It looks like there is a bug. > > 1. A regression was added when I stopped killing the iscsi connection > when the lock is taken away from us to handle a failback bug where it > was causing ping ponging. That combined with #2 will c

Re: [ceph-users] Backfilling on Luminous

2018-03-15 Thread David Turner
We haven't used jemalloc for anything. The only thing in our /etc/sysconfig/ceph configuration is increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES. I didn't see anything in dmesg on one of the recent hosts that had an osd segfault. I looked at your ticket and that looks like something with PGs b
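For reference, a sketch of what /etc/sysconfig/ceph looks like with that tweak (the value is an example; the commented LD_PRELOAD line is the kind of jemalloc setting the previous poster disabled):

    # /etc/sysconfig/ceph
    TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
    #LD_PRELOAD=/usr/lib64/libjemalloc.so.1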

Re: [ceph-users] Backfilling on Luminous

2018-03-15 Thread Dan van der Ster
Hi, Do you see any split or merge messages in the osd logs? I recall some surprise filestore splitting on a few osds after the luminous upgrade. .. Dan On Mar 15, 2018 6:04 PM, "David Turner" wrote: I upgraded a [1] cluster from Jewel 10.2.7 to Luminous 12.2.2 and last week I added 2 nodes t

Re: [ceph-users] Backfilling on Luminous

2018-03-15 Thread David Turner
I am aware of the filestore splitting happening. I manually split all of the subfolders a couple of weeks ago on this cluster, but every time we have backfilling the newly moved PGs have a chance to split before the backfilling is done. When that has happened in the past it has caused some blocked reque

Re: [ceph-users] Cephfs MDS slow requests

2018-03-15 Thread Deepak Naidu
David, a few inputs based on my working experience with cephFS. They might or might not be relevant to the current issue seen in your cluster. 1. Create the metadata pool on NVMe. Folks may claim it's not needed, but I have seen worse performance when it's on HDD even though the metadata size is very small. 2. In cephFS, e
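For reference, a sketch of pinning the metadata pool to NVMe using Luminous device classes (rule and pool names are examples):

    ceph osd crush rule create-replicated nvme-rule default host nvme
    ceph osd pool set cephfs_metadata crush_rule nvme-rule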

Re: [ceph-users] Disk write cache - safe?

2018-03-15 Thread Joe Comeau
Hi, we're using SUSE Enterprise Storage (Ceph) and have Dell 730xd servers and expansion trays with 8 TB disks. We initially had the controller cache turned off as per the ceph documentation (so configured as JBOD in the Dell BIOS). We reconfigured as RAID-0 and use the cache now for both internal and expansion drive

Re: [ceph-users] Backfilling on Luminous

2018-03-15 Thread Dan van der Ster
Did you use perf top or iotop to try to identify where the osd is stuck? Did you try increasing the op thread suicide timeout from 180s? Splitting should log at the beginning and end of an op, so it should be clear if it's taking longer than the timeout. .. Dan On Mar 15, 2018 9:23 PM, "David
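For reference, a sketch of those checks (the OSD id and PID are examples; injectargs changes do not persist across restarts):

    perf top -p <pid-of-stuck-osd>      # where is the OSD spending CPU time?
    iotop -o -p <pid-of-stuck-osd>      # is it actually doing I/O?

    ceph tell osd.12 injectargs '--osd-op-thread-suicide-timeout 600'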

Re: [ceph-users] iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

2018-03-15 Thread Mike Christie
On 03/15/2018 02:32 PM, Maxim Patlasov wrote: > On Thu, Mar 15, 2018 at 12:48 AM, Mike Christie > wrote: > > ... > > It looks like there is a bug. > > 1. A regression was added when I stopped killing the iscsi connection > when the lock is taken away

Re: [ceph-users] Disk write cache - safe?

2018-03-15 Thread Joe Comeau
After reading Reed's comments about losing power to his data center, I think he brings up a lot of good points. So take the Dell advice I linked into consideration along with your own environment. We also have 8TB disks with Intel P3700 for journals. Our large UPS and new generators, which are tested week

Re: [ceph-users] Cephfs MDS slow requests

2018-03-15 Thread Sergey Malinin
On Friday, March 16, 2018 at 00:07, Deepak Naidu wrote: > cephFS is not great for small files (in KBs) but works great with large file > sizes (MBs or GBs). So using it as a filer (NFS/SMB) use-case needs administration > attention. > Got to disagree with you there. CephFS (Luminous) performs perfe

Re: [ceph-users] Fwd: Slow requests troubleshooting in Luminous - details missing

2018-03-15 Thread Alex Gorbachev
On Mon, Mar 12, 2018 at 12:21 PM, Alex Gorbachev wrote: > On Mon, Mar 12, 2018 at 7:53 AM, Дробышевский, Владимир > wrote: >> >> I was following this conversation on the tracker and got the same question. I've >> got a situation with slow requests and had no idea how to find the >> reason. F
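For reference, a sketch of the per-OSD introspection that usually answers this (the OSD id is an example; run on the host that owns the OSD):

    ceph daemon osd.12 dump_ops_in_flight     # what is blocked right now
    ceph daemon osd.12 dump_historic_ops      # recent slow ops with per-step timings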

Re: [ceph-users] Cephfs MDS slow requests

2018-03-15 Thread Yan, Zheng
On Wed, Mar 14, 2018 at 3:17 AM, David C wrote: > Hi All > > I have a Samba server that is exporting directories from a Cephfs Kernel > mount. Performance has been pretty good for the last year but users have > recently been complaining of short "freezes", these seem to coincide with > MDS related
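For reference, a sketch of how the MDS side can be inspected while a "freeze" is happening (the MDS name is a placeholder):

    ceph daemon mds.<name> dump_ops_in_flight    # stuck metadata requests
    ceph daemon mds.<name> session ls            # list client sessions, useful for spotting the client a request is waiting on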

Re: [ceph-users] PG numbers don't add up?

2018-03-15 Thread Ovidiu Poncea
Hi Nathan, do you have replication enabled in any of your pools? E.g. a replication factor of 2 will allocate each PG to two OSDs, so that the PGs/OSD count is twice what you expect. On 03/14/2018 07:28 AM, Nathan Dehnel wrote: I try to add a data pool: OSD_STAT USED AVAIL TOTAL HB_PEERS PG_S
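For reference, the arithmetic behind that: the per-OSD figure counts PG replicas, so roughly

    PGs per OSD ≈ (pg_num × replica size) / number of OSDs

e.g. a pool with pg_num=128 and size=2 spread over 8 OSDs lands about 128 × 2 / 8 = 32 PGs on each OSD, twice what pg_num alone would suggest.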

[ceph-users] Reducing pg_num for a pool

2018-03-15 Thread Ovidiu Poncea
Hi All, Is there any news on when/if support for decreasing pg_num will be available? Thank you, Ovidiu