[ceph-users] Trouble removing MDS daemons | Luminous

2017-11-08 Thread Geoffrey Rhodes
Good day, Firstly I'd like to acknowledge that I consider myself a Ceph noob. OS: Ubuntu 16.04.3 LTS Ceph version: 12.2.1 I'm running a small six node POC cluster with three MDS daemons. (One on each node, node1, node2 and node3) I've also configured three ceph file systems fsys1, fsys2 and fsy
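
The usual Luminous-era sequence for retiring an MDS daemon (and, if needed, one of the filesystems it served) looks roughly like the following; the daemon and filesystem names are placeholders, not the poster's actual commands:

    ceph mds fail node3                          # fail the daemon out of the MDS map
    systemctl stop ceph-mds@node3                # on node3: stop and disable the daemon
    systemctl disable ceph-mds@node3
    # only if the whole filesystem is being retired as well:
    ceph fs set <fsname> cluster_down true
    ceph fs rm <fsname> --yes-i-really-mean-it   # its pools must be removed separately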

Re: [ceph-users] Trouble removing MDS daemons | Luminous

2017-11-08 Thread John Spray
On Wed, Nov 8, 2017 at 10:39 AM, Geoffrey Rhodes wrote: > Good day, > > Firstly I'd like to acknowledge that I consider myself a Ceph noob. > > OS: Ubuntu 16.04.3 LTS > Ceph version: 12.2.1 > > I'm running a small six node POC cluster with three MDS daemons. (One on > each node, node1, node2 and

[ceph-users] bluestore - wal,db on faster devices?

2017-11-08 Thread Wolfgang Lendl
Hello, it's clear to me that there's a performance gain from putting the journal on a fast device (ssd, nvme) when using the filestore backend. It's not clear when it comes to bluestore - are there any resources, performance tests, etc. out there on how a fast wal/db device impacts performance? br wolfgang -- Wol

[ceph-users] OSD heartbeat problem

2017-11-08 Thread Monis Monther
Good Day, Today we had a problem with lots of OSDs being marked as down due to heartbeat failures between the OSDs. Specifically, the following is seen in the OSD logs prior to the heartbeat no_reply errors: monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early
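
Since that log line points at clock skew and prematurely expired rotating keys, the first thing worth ruling out is time synchronisation on the affected hosts; a quick sanity check might look like this (assuming ntpd or systemd-timesyncd is in use):

    timedatectl status        # is NTP enabled and synchronized?
    ntpq -p                   # peer offsets, if ntpd is running
    ceph time-sync-status     # the monitors' view of clock skew across the quorum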

Re: [ceph-users] Issue with "renamed" mon, crashing

2017-11-08 Thread Kamila Součková
Hi, I am not sure if this is the same issue as we had recently, but it looks a bit like it -- we also had a Luminous mon crashing right after syncing was done. Turns out that the current release has a bug which causes the mon to crash if it cannot find a mgr daemon. This should be fixed in the up

[ceph-users] High osd cpu usage

2017-11-08 Thread Alon Avrahami
Hello guys, We have a fresh 'luminous' (12.2.0) (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc) cluster, installed using ceph-ansible. The cluster contains 6 nodes (Intel server board S2600WTTR) with 96 osds and 3 mons, Mem - 64G, CPU

Re: [ceph-users] bluestore - wal,db on faster devices?

2017-11-08 Thread Mark Nelson
Hi Wolfgang, In bluestore the WAL serves sort of a similar purpose to filestore's journal, but bluestore isn't dependent on it for guaranteeing durability of large writes. With bluestore you can often get higher large-write throughput than with filestore when using HDD-only or flash-only OSDs
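
For reference, a bluestore OSD with its DB (and optionally WAL) on a faster device is typically created along these lines on Luminous; the device names are placeholders:

    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1
    # a separate --block.wal only pays off if it sits on an even faster device
    # than the DB; otherwise the WAL simply lives inside the DB partition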

Re: [ceph-users] VM Data corruption shortly after Luminous Upgrade

2017-11-08 Thread James Forde
Title probably should have read "Ceph Data corruption shortly after Luminous Upgrade". The problem seems to have been sorted out. Still not sure what caused the original problem other than upgrade latency? or mgr errors? After I resolved the boot problem I attempted to reproduce the error, but was unsuccessful whi

Re: [ceph-users] VM Data corruption shortly after Luminous Upgrade

2017-11-08 Thread Jason Dillaman
Are your QEMU VMs using a different CephX user than client.admin? If so, can you double-check your caps to ensure that the QEMU user can blacklist? See step 6 in the upgrade instructions [1]. The fact that "rbd resize" fixed something hints that your VMs had hard-crashed with the exclusive lock lef
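
Step 6 of the Luminous upgrade notes comes down to a cap change along these lines; the user name and pool shown here are only examples, keep your existing osd caps:

    ceph auth get client.libvirt        # inspect the user QEMU actually connects as
    ceph auth caps client.libvirt \
        mon 'allow r, allow command "osd blacklist"' \
        osd 'allow rwx pool=vms'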

Re: [ceph-users] FAILED assert(p.same_interval_since) and unusable cluster

2017-11-08 Thread Jon Light
Thanks for the instructions Michael, I was able to successfully get the patch, build, and install. Unfortunately I'm now seeing "osd/PG.cc: 5381: FAILED assert(info.history.same_interval_since != 0)". Then the OSD crashes. On Sat, Nov 4, 2017 at 5:51 AM, Michael wrote: > Jon Light wrote: > > I

[ceph-users] Recovery operations and ioprio options

2017-11-08 Thread Захаров Алексей
Hello, Today we use ceph jewel with: osd disk thread ioprio class = idle, osd disk thread ioprio priority = 7, and the "nodeep-scrub" flag is set. We want to change the scheduler from CFQ to deadline, so these options will lose their effect. I've tried to find out what operations are performed in the "disk thread".
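
To see which of these values an OSD is actually running with (and to confirm whether they still have any effect after the scheduler change), the admin socket can be queried; a small sketch, with osd.0 as an arbitrary example:

    ceph daemon osd.0 config show | grep osd_disk_thread_ioprio
    ceph daemon osd.0 config show | grep -E 'osd_recovery|osd_max_backfills'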

Re: [ceph-users] bluestore - wal,db on faster devices?

2017-11-08 Thread Wolfgang Lendl
Hi Mark, thanks for your reply! I'm a big fan of keeping things simple - this means that there has to be a very good reason to put the WAL and DB on a separate device otherwise I'll keep it collocated (and simpler). as far as I understood - putting the WAL,DB on a faster (than hdd) device makes m

Re: [ceph-users] Issue with "renamed" mon, crashing

2017-11-08 Thread Anders Olausson
Hi Kamila, Thank you for your response. I think we solved it yesterday. I simply removed the mon again and this time I also removed all references to it in ceph.conf (had some remnants there). After that I ran ceph-deploy and it hasn’t crashed again so far. So in this case it was mo

Re: [ceph-users] bluestore - wal,db on faster devices?

2017-11-08 Thread Mark Nelson
Hi Wolfgang, You've got the right idea. RBD is probably going to benefit less since you have a small number of large objects and little extra OMAP data. Having the allocation and object metadata on flash certainly shouldn't hurt, and you should still have less overhead for small (<64k) writes

Re: [ceph-users] Libvirt hosts freeze after ceph osd+mon problem

2017-11-08 Thread Jan Pekař - Imatic
You were right, it was frozen at the virtual machine level. The panic kernel parameter worked, so the server resumed with a reboot. But there was no panic displayed on the VNC console even though I was logged in. The main problem is that a combination of MON and OSD silent failure at once will cause a much longer res
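
For anyone else hitting hung guests, the knobs being referred to inside the VM are roughly these; the values are only examples:

    # /etc/sysctl.d/90-panic.conf
    kernel.panic = 10                   # reboot 10 seconds after a kernel panic
    kernel.hung_task_panic = 1          # turn hung-task detection into a panic
    kernel.hung_task_timeout_secs = 120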

Re: [ceph-users] Issues with dynamic bucket indexing resharding and tenants

2017-11-08 Thread Orit Wasserman
On Wed, Nov 8, 2017 at 9:45 PM, Mark Schouten wrote: > I see you fixed this (with a rather trivial patch :)), great! > :) > I am wondering though, should I be able to remove the invalid entry using > this patch too? > It should work. > Regards, > > Mark > > > On 5 Nov 2017, at 07:33, Orit Wasser

Re: [ceph-users] Bluestore osd_max_backfills

2017-11-08 Thread Scottix
When I add in the next HDD I'll try the method again and see if I just needed to wait longer. On Tue, Nov 7, 2017 at 11:19 PM Wido den Hollander wrote: > > > On 7 November 2017 at 22:54, Scottix wrote : > > > > > > Hey, > > I recently updated to luminous and started deploying bluestore osd > no
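
For reference, backfill throttling is usually adjusted at runtime like this (no restart required); whether it sticks on the bluestore OSDs is exactly what the thread is checking:

    ceph tell osd.* injectargs '--osd_max_backfills 1'
    # or persistently in ceph.conf under [osd]:
    #   osd max backfills = 1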

[ceph-users] Erasure pool

2017-11-08 Thread Marc Roos
Can anyone advise on an erasure pool config to store - files between 500MB and 8GB, total 8TB - just for archiving, not much reading (a few files a week) - hdd pool - now a 3 node cluster (4th coming) - would like to save on storage space I was thinking of a profile with jerasure k=3 m=2, but mayb
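
A minimal sketch of creating such a profile and pool follows; the names and PG count are placeholders. Note that with crush-failure-domain=host, k=3 m=2 needs at least five hosts, so a 3-4 node cluster would have to use a smaller k+m or an osd failure domain:

    ceph osd erasure-code-profile set archive_k3m2 \
        k=3 m=2 plugin=jerasure crush-failure-domain=host
    ceph osd pool create archive 128 128 erasure archive_k3m2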

Re: [ceph-users] bluestore - wal,db on faster devices?

2017-11-08 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Mark Nelson > Sent: 08 November 2017 19:46 > To: Wolfgang Lendl > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] bluestore - wal,db on faster devices? > > Hi Wolfgang, > > You've

Re: [ceph-users] Recovery operations and ioprio options

2017-11-08 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Захаров Алексей > Sent: 08 November 2017 16:21 > To: ceph-users@lists.ceph.com > Subject: [ceph-users] Recovery operations and ioprio options > > Hello, > Today we use ceph jewel with: > osd

Re: [ceph-users] Inconsistent PG won't repair

2017-11-08 Thread Richard Bade
For anyone that encounters this in the future, I was able to resolve the issue by finding the three OSDs that the object is on. One by one I stopped the OSD, flushed the journal and used the objectstore tool to remove the data (sudo ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-19 --journa
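
The general shape of that per-OSD removal, followed by a repair, is roughly the following; the paths, PG id and object name are placeholders and the OSD must be stopped first:

    systemctl stop ceph-osd@19
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-19 \
        --journal-path /var/lib/ceph/osd/ceph-19/journal '<object>' remove
    systemctl start ceph-osd@19
    # once the bad copies are gone on all three OSDs:
    ceph pg repair <pgid>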

Re: [ceph-users] Blog post: storage server power consumption

2017-11-08 Thread Nick Fisk
Also look at the new WD 10TB Reds if you want very-low-use archive storage. Because they spin at 5400 RPM, they only use 2.8W at idle. > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Jack > Sent: 06 November 2017 22:31 > To: ceph-users@lists.c

[ceph-users] Disconnect a client Hypervisor

2017-11-08 Thread Karun Josy
Hi, Do you think there is a way for ceph to disconnect an HV client from a cluster? We want to prevent the possibility that two HVs are running the same VM. When an HV crashes, we have to make sure that when the VMs are started on a new HV, the disk is not open on the crashed HV. I can see
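
With exclusive-lock enabled RBD images (and client users that have blacklist caps), the new hypervisor breaks the dead client's lock when it starts the VM; explicitly blacklisting the crashed hypervisor's address additionally guarantees it cannot write again if it comes back. A sketch, with the address as a placeholder:

    ceph osd blacklist add 192.168.0.10:0/0
    ceph osd blacklist ls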

Re: [ceph-users] bluestore - wal,db on faster devices?

2017-11-08 Thread Mark Nelson
On 11/08/2017 03:16 PM, Nick Fisk wrote: -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mark Nelson Sent: 08 November 2017 19:46 To: Wolfgang Lendl Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] bluestore - wal,db on faster devices?

[ceph-users] who is using nfs-ganesha and cephfs?

2017-11-08 Thread Sage Weil
Who is running nfs-ganesha's FSAL to export CephFS? What has your experience been? (We are working on building proper testing and support for this into Mimic, but the ganesha FSAL has been around for years.) Thanks! sage ___ ceph-users mailing list
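
For context, a CephFS export through Ganesha's FSAL_CEPH is configured with a block along these lines in ganesha.conf; this is only a minimal sketch and the IDs and paths are examples:

    EXPORT {
        Export_ID = 1;
        Path = "/";
        Pseudo = "/cephfs";
        Access_Type = RW;
        Squash = No_Root_Squash;
        FSAL {
            Name = CEPH;
        }
    }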

Re: [ceph-users] who is using nfs-ganesha and cephfs?

2017-11-08 Thread Marc Roos
I am, in a test environment (centos7, on a luminous osd node), running the binaries from download.ceph.com::ceph/nfs-ganesha/rpm-V2.5-stable/luminous/x86_64/ and am getting these: Nov 6 17:41:34 c01 kernel: ganesha.nfsd[31113]: segfault at 0 ip 7fa80a151a43 sp 7fa755ffa2f0 error 4 in libdbus-1.so.3.7.4

Re: [ceph-users] bluestore - wal,db on faster devices?

2017-11-08 Thread Nick Fisk
> -Original Message- > From: Mark Nelson [mailto:mnel...@redhat.com] > Sent: 08 November 2017 21:42 > To: n...@fisk.me.uk; 'Wolfgang Lendl' > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] bluestore - wal,db on faster devices? > > > > On 11/08/2017 03:16 PM, Nick Fisk wrote:

[ceph-users] recreate ceph-deploy node

2017-11-08 Thread James Forde
On my cluster I have a ceph-deploy node that is not a mon or osd. This is my bench system, and I want to recreate the ceph-deploy node to simulate a failure. I cannot find this outlined anywhere, so I thought I would ask. Basically follow Preflight http://docs.ceph.com/docs/master/start/quick-s
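
Rebuilding an admin/ceph-deploy node mostly comes down to reinstalling ceph-deploy and pulling the cluster's conf and keys back down from a monitor; a rough sketch, with mon1 as a placeholder hostname:

    mkdir my-cluster && cd my-cluster
    ceph-deploy config pull mon1      # fetch the existing ceph.conf
    ceph-deploy gatherkeys mon1       # regather the admin and bootstrap keys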

Re: [ceph-users] VM Data corruption shortly after Luminous Upgrade

2017-11-08 Thread James Forde
Wow, thanks for the heads-up Jason. That explains a lot. I followed the instructions here http://ceph.com/releases/v12-2-0-luminous-released/ which apparently left out that step. I have now executed that command. Is there a new master list of the CLI commands? From: Jason Dillaman [mailto:jdill...@red

Re: [ceph-users] who is using nfs-ganesha and cephfs?

2017-11-08 Thread Lincoln Bryant
Hi Sage, We have been running the Ganesha FSAL for a while (as far back as Hammer / Ganesha 2.2.0), primarily for uid/gid squashing. Things are basically OK for our application, but we've seen the following weirdness*: - Sometimes there are duplicated entries when directories are listed

[ceph-users] Fwd: Luminous RadosGW issue

2017-11-08 Thread Sam Huracan
Hi Cephers, I'm testing RadosGW in the Luminous version. I've already installed it on a separate host, the service is running but RadosGW did not accept any of my configuration in ceph.conf. My Config: [client.radosgw.gateway] host = radosgw keyring = /etc/ceph/ceph.client.radosgw.keyring rgw socket path =

Re: [ceph-users] Fwd: Luminous RadosGW issue

2017-11-08 Thread Hans van den Bogert
Are you sure you deployed it with the client.radosgw.gateway name as well? Try to redeploy the RGW and make sure the name you give it corresponds to the name you give in the ceph.conf. Also, do not forget to push the ceph.conf to the RGW machine. On Wed, Nov 8, 2017 at 11:44 PM, Sam Huracan wrote
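
If the gateway was created with ceph-deploy rgw create, it runs as client.rgw.<hostname> (systemd unit ceph-radosgw@rgw.<hostname>), so a section named [client.radosgw.gateway] is simply ignored. A sketch of a matching section, assuming the host is called radosgw and with example option values:

    [client.rgw.radosgw]
    host = radosgw
    rgw_frontends = "civetweb port=7480"
    rgw_dns_name = rgw.example.com
    log_file = /var/log/ceph/client.rgw.radosgw.log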

Re: [ceph-users] Fwd: Luminous RadosGW issue

2017-11-08 Thread Sam Huracan
@Hans: Yes, I tried to redeploy RGW, and ensured client.radosgw.gateway is the same in ceph.conf. Everything goes well, the radosgw service is running, port 7480 is open, but none of my radosgw config in ceph.conf takes effect: rgw_dns_name is still empty, and the log file keeps the default value. [root@radosgw sy

Re: [ceph-users] Fwd: Luminous RadosGW issue

2017-11-08 Thread Sam Huracan
I checked ceph pools, cluster has some pools: [ceph-deploy@ceph1 cluster-ceph]$ ceph osd lspools 2 rbd,3 .rgw.root,4 default.rgw.control,5 default.rgw.meta,6 default.rgw.log, 2017-11-09 11:25 GMT+07:00 Sam Huracan : > @Hans: Yes, I tried to redeploy RGW, and ensure client.radosgw.gateway is >

Re: [ceph-users] who is using nfs-ganesha and cephfs?

2017-11-08 Thread Wido den Hollander
> On 8 November 2017 at 22:41, Sage Weil wrote : > > > Who is running nfs-ganesha's FSAL to export CephFS? What has your > experience been? > A customer of mine is doing this. They are running Ubuntu and my experience is that getting Ganesha compiled is already a pain sometimes. When it r

Re: [ceph-users] High osd cpu usage

2017-11-08 Thread Vy Nguyen Tan
Hello, I think that is not normal behavior in Luminous. I'm testing with 3 nodes, each node has 3 x 1TB HDD, 1 SSD for wal + db, E5-2620 v3, 32GB of RAM, 10Gbps NIC. I use fio for I/O performance measurements. When I ran "fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filen
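
A typical fio random-write invocation of that shape looks like the following; the filename, size, block size and queue depth here are placeholders rather than the poster's actual values:

    fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
        --name=test --filename=/mnt/rbd/testfile --bs=4k --iodepth=64 \
        --size=4G --readwrite=randwrite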