[ceph-users] Point-in-Time Recovery

2020-03-13 Thread Ml Ml
Hello List,

when reading:
  https://docs.ceph.com/docs/master/rbd/rbd-mirroring/
it says: (...)Journal-based: This mode uses the RBD journaling image
feature to ensure point-in-time, crash-consistent replication between
clusters(...)
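
For reference, as I understand it, enabling that journal-based mode on an image
looks roughly like this (a sketch only; pool/image names are placeholders, and
the journaling feature needs exclusive-lock):

# enable the required image features
rbd feature enable mypool/myimage exclusive-lock journaling
# enable per-image mirroring for the pool, then for the image
rbd mirror pool enable mypool image
rbd mirror image enable mypool/myimage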

Does this mean that we have some kind of transaction log
where we will be able to "walk to" a specific time to restore a
specific state?

Like: 
https://blog.sleeplessbeastie.eu/2016/02/29/how-to-perform-postgresql-point-in-time-recovery/

Thanks,
Michael
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: EC pool 4+2 - failed to guarantee a failure domain

2020-03-13 Thread Eugen Block

Hi,

this is unexpected, of course, but it can happen if one OSD is full  
(or also nearfull?). Have you checked 'ceph osd df'? PG  
availability has a higher priority than placement, so it's possible  
that during a failure some chunks are recreated on the same OSD or  
host even if the crush rule shouldn't allow that.


Regards,
Eugen


Quoting Maks Kowalik:


Hello,

I have created a small 16-PG EC pool with k=4, m=2.
Then I applied the following crush rule to it:

rule test_ec {
        id 99
        type erasure
        min_size 5
        max_size 6
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        step choose indep 3 type host
        step chooseleaf indep 2 type osd
        step emit
}
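
To check what a rule like this actually maps to, one can test it offline with
crushtool (a sketch; rule id 99 as above, 6 shards for k=4+m=2):

ceph osd getcrushmap -o crushmap.bin
# show the OSD sets the rule would pick for a number of sample inputs
crushtool -i crushmap.bin --test --rule 99 --num-rep 6 --show-mappings | head
# report inputs for which the rule cannot produce a full mapping
crushtool -i crushmap.bin --test --rule 99 --num-rep 6 --show-bad-mappings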

The OSD tree looks as follows:
 -1       43.38448 root default
 -9       43.38448     region lab1
 -7       43.38448         room dc1.lab1
 -5       43.38448             rack r1.dc1.lab1
 -3       14.44896                 host host1.r1.dc1.lab1
  6   hdd  3.63689                     osd.6     up  1.0  1.0
  8   hdd  3.63689                     osd.8     up  1.0  1.0
  7   hdd  3.63689                     osd.7     up  1.0  1.0
 11   hdd  3.53830                     osd.11    up  1.0  1.0
-11       14.44896                 host host2.r1.dc1.lab1
  4   hdd  3.63689                     osd.4     up  1.0  1.0
  9   hdd  3.63689                     osd.9     up  1.0  1.0
  5   hdd  3.63689                     osd.5     up  1.0  1.0
 10   hdd  3.53830                     osd.10    up  1.0  1.0
-13       14.48656                 host host3.r1.dc1.lab1
  0   hdd  3.57590                     osd.0     up  1.0  1.0
  1   hdd  3.63689                     osd.1     up  1.0  1.0
  2   hdd  3.63689                     osd.2     up  1.0  1.0
  3   hdd  3.63689                     osd.3     up  1.0  1.0

My expectation was that each host would contain 2 shards of any PG of  
the pool.


When I dumped the PGs this was mostly true, but one group has three shards
placed on OSDs 0, 2 and 3, which will cause downtime in case of a host3 failure.
root@host1:~/mkw # ceph pg dump|grep "^66\."|awk '{print $17}'
dumped all
[4,5,7,6,1,2]

[8,11,9,3,0,2]  <<< - this one is problematic

[6,7,10,9,2,0]
[2,3,7,6,5,9]
[7,8,10,5,3,1]
[4,5,8,6,0,2]
[7,11,9,4,1,2]
[5,9,0,2,7,11]
[9,5,3,1,7,8]
[8,11,2,0,5,9]
[2,0,8,6,10,9]
[3,2,5,9,7,11]
[6,7,9,5,1,2]
[10,5,1,3,11,8]
[4,5,7,8,2,0]
[7,8,3,2,9,10]

Is there a way to ensure that host failure is not disruptive to the cluster?

During the experiment I used info from this thread:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030227.html

Kind regards,

Maks Kowalik
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Performance of Micron 5210 SATA?

2020-03-13 Thread Marc Roos
 

Hi Mourik Jan,

 > So, ran the fio commands, and pasted output (as it's quite a lot) here:

 > I hope someone here can draw some conclusions from this output...

Now you know it performs roughly like other enterprise drives. 
And you know your ceph solution will never perform beyond this ;) And if 
your ceph overhead is more than standard, you may be able to identify 
at a later time where the problem is. 
With the 4k, 128k, 1024k and 4096k test results I am trying to gather 
data that will explain ceph performance under different use cases. 
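
For reference, each job group below was run along these lines (a reconstruction
from the output, not the exact job file; /dev/sdX is a placeholder):

fio --name=write-4k-seq --filename=/dev/sdX --ioengine=libaio --iodepth=1 \
    --direct=1 --rw=write --bs=4k --runtime=180 --time_based
# the other groups only vary --rw (randwrite/read/randread/rw/randrw)
# and --bs (4k/128k/1024k/4096k)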



This is a result from a samsung sm863 1.92TB

write-4k-seq: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 
4096B-4096B, ioengine=libaio, iodepth=1
randwrite-4k-seq: (g=1): rw=randwrite, bs=(R) 4096B-4096B, (W) 
4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
read-4k-seq: (g=2): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 
4096B-4096B, ioengine=libaio, iodepth=1
randread-4k-seq: (g=3): rw=randread, bs=(R) 4096B-4096B, (W) 
4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
rw-4k-seq: (g=4): rw=rw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 
4096B-4096B, ioengine=libaio, iodepth=1
randrw-4k-seq: (g=5): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, 
(T) 4096B-4096B, ioengine=libaio, iodepth=1
write-128k-seq: (g=6): rw=write, bs=(R) 128KiB-128KiB, (W) 
128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=1
randwrite-128k-seq: (g=7): rw=randwrite, bs=(R) 128KiB-128KiB, (W) 
128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=1
read-128k-seq: (g=8): rw=read, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, 
(T) 128KiB-128KiB, ioengine=libaio, iodepth=1
randread-128k-seq: (g=9): rw=randread, bs=(R) 128KiB-128KiB, (W) 
128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=1
rw-128k-seq: (g=10): rw=rw, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 
128KiB-128KiB, ioengine=libaio, iodepth=1
randrw-128k-seq: (g=11): rw=randrw, bs=(R) 128KiB-128KiB, (W) 
128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=1
write-1024k-seq: (g=12): rw=write, bs=(R) 1024KiB-1024KiB, (W) 
1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
randwrite-1024k-seq: (g=13): rw=randwrite, bs=(R) 1024KiB-1024KiB, (W) 
1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
read-1024k-seq: (g=14): rw=read, bs=(R) 1024KiB-1024KiB, (W) 
1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
randread-1024k-seq: (g=15): rw=randread, bs=(R) 1024KiB-1024KiB, (W) 
1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
rw-1024k-seq: (g=16): rw=rw, bs=(R) 1024KiB-1024KiB, (W) 
1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
randrw-1024k-seq: (g=17): rw=randrw, bs=(R) 1024KiB-1024KiB, (W) 
1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
write-4096k-seq: (g=18): rw=write, bs=(R) 4096KiB-4096KiB, (W) 
4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=1
randwrite-4096k-seq: (g=19): rw=randwrite, bs=(R) 4096KiB-4096KiB, (W) 
4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=1
read-4096k-seq: (g=20): rw=read, bs=(R) 4096KiB-4096KiB, (W) 
4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=1
randread-4096k-seq: (g=21): rw=randread, bs=(R) 4096KiB-4096KiB, (W) 
4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=1
rw-4096k-seq: (g=22): rw=rw, bs=(R) 4096KiB-4096KiB, (W) 
4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=1
randrw-4096k-seq: (g=23): rw=randrw, bs=(R) 4096KiB-4096KiB, (W) 
4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=1
fio-3.1
Starting 24 processes

write-4k-seq: (groupid=0, jobs=1): err= 0: pid=11515: Mon Aug 26 
15:45:05 2019
  write: IOPS=30.8k, BW=120MiB/s (126MB/s)(21.1GiB/180001msec)
slat (nsec): min=3762, max=75591, avg=7173.31, stdev=3526.20
clat (nsec): min=440, max=1057.9k, avg=23654.15, stdev=4570.40
 lat (usec): min=23, max=1062, avg=31.02, stdev= 4.94
clat percentiles (nsec):
 |  1.00th=[15680],  5.00th=[19840], 10.00th=[21120], 
20.00th=[21888],
 | 30.00th=[22144], 40.00th=[22400], 50.00th=[22656], 
60.00th=[22912],
 | 70.00th=[23168], 80.00th=[24448], 90.00th=[28544], 
95.00th=[29568],
 | 99.00th=[47360], 99.50th=[49920], 99.90th=[51968], 
99.95th=[52480],
 | 99.99th=[84480]
   bw (  KiB/s): min=117983, max=146636, per=100.00%, avg=123958.14, 
stdev=7366.26, samples=360
   iops: min=29495, max=36659, avg=30989.17, stdev=1841.60, 
samples=360
  lat (nsec)   : 500=0.01%, 750=0.01%
  lat (usec)   : 10=0.01%, 20=5.10%, 50=94.45%, 100=0.45%, 250=0.01%
  lat (usec)   : 500=0.01%, 750=0.01%
  lat (msec)   : 2=0.01%
  cpu  : usr=5.57%, sys=25.51%, ctx=8452083, majf=0, minf=7
  IO depths: 1=117.5%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 
>=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>=64=0.0%
 issued rwt: total=0,

[ceph-users] ceph qos

2020-03-13 Thread 展荣臻(信泰)
Hi everyone:
  There are two QoS implementations in Ceph (one based on the token bucket
algorithm, the other based on mclock).
Which one can I use in a production environment? Thank you
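
For reference, the token bucket one is the per-image RBD limiter and mclock is
an OSD op scheduler; a minimal sketch of where each is configured, assuming
Nautilus-era option names and placeholder pool/image names:

# token bucket: throttle a single RBD image
rbd config image set mypool/myimage rbd_qos_iops_limit 500
rbd config image set mypool/myimage rbd_qos_bps_limit 52428800
# mclock: switch the OSD op queue away from the default wpq
ceph config set osd osd_op_queue mclock_client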
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Is there a better way to make a samba/nfs gateway?

2020-03-13 Thread Nathan Fish
Note that we have had issues with deadlocks when re-exporting CephFS
via Samba. It appears to only occur with Mac clients, though. In some
cases it has hung on a request for a high-level directory and hung
that branch for all clients.

On Fri, Mar 13, 2020 at 1:56 AM Konstantin Shalygin  wrote:
>
>
> On 3/11/20 11:16 PM, Seth Galitzer wrote:
> > I have a hybrid environment and need to share with both Linux and
> > Windows clients. For my previous iterations of file storage, I
> > exported nfs and samba shares directly from my monolithic file server.
> > All Linux clients used nfs and all Windows clients used samba. Now
> > that I've switched to ceph, things are a bit more complicated. I built
> > a gateway to export nfs and samba as needed, and connect that as a
> > client to my ceph cluster.
> >
> > After having file locking problems with kernel nfs, I made the switch
> > to nfs-ganesha, which has helped immensely. For Linux clients that
> > have high I/O needs, like desktops and some web servers, I connect to
> > ceph directly for those shares. For all other Linux needs, I use nfs
> > from the gateway. For all Windows clients (desktops and a small number
> > of servers), I use samba exported from the gateway.
> >
> > Since my ceph cluster went live in August, I have had some kind of
> > strange (to me) error at least once a week, almost always related to
> > the gateway client. Last night, it was MDS_CLIENT_OLDEST_TID. Since
> > we're on Spring Break at my university and not very busy, I decided to
> > unmount/remount the ceph share, requiring stopping nfs and samba
> > services. Stopping nfs-ganesha took a while, but it finally completed
> > with no complaints from the ceph cluster. Stopping samba took longer
> > and gave me MDS_SLOW_REQUEST and MDS_CLIENT_LATE_RELEASE on the mds.
> > It finally finished, and I was able to unmount/remount the ceph share
> > and that finally cleared all the errors.
> >
> > This is leading me to believe that samba on the gateway and all the
> > clients attaching to that is putting a strain on the connection back
> > to ceph. Which finally brings me to my question: is there a better way
> > to export samba to my clients using the ceph back end? Or is this as
> > good as it gets and I just have to put up with the seemingly frequent
> > errors? I can live with the errors and have been able to handle them
> > so far, but I know people who have much bigger clusters and many more
> > clients than me (by an order of magnitude) and don't see nearly as
> > many errors as I do. Which is why I'm trying to figure out what is
> > special about my setup.
> >
> > All my ceph nodes are running latest nautilus on Centos 7 (I just
> > updated last week to 14.2.8), as is the gateway host. I'm mounting
> > ceph directly on the gateway (by way of the kernel using cephfs, not
> > rados/rbd) to a single mount point and exporting from there.
> >
> > My searches so far have not turned up anything extraordinarily useful,
> > so I'm asking for some guidance here. Any advice is welcome.
>
> You can connect to your cluster directly from userland, without kernel.
> Use Samba vfs_ceph for this.
>
>
>
> k
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Performance of Micron 5210 SATA?

2020-03-13 Thread vitalif

Hi,

Can you test it slightly differently (and simpler)? Like in this 
googledoc: 
https://docs.google.com/spreadsheets/d/1E9-eXjzsKboiCCX-0u0r5fAjjufLKayaut_FOPxYZjc/edit#gid=0


As we know that it's a QLC drive, first let it fill the SLC cache:

fio -ioengine=libaio -direct=1 -name=test -bs=4M -iodepth=32 -rw=write 
-size=100G -filename=/dev/sdXX


Then, immediately:

1) Parallel random write

fio -ioengine=libaio -direct=1 -name=test -bs=4k -iodepth=32 
-rw=randwrite -runtime=60 -filename=/dev/sdXX


2) Transactional random write (journaling)

fio -ioengine=libaio -fsync=1 -direct=1 -name=test -bs=4k -iodepth=1 
-rw=randwrite -runtime=60 -filename=/dev/sdXX


3) Linear write

fio -ioengine=libaio -direct=1 -name=test -bs=4M -iodepth=32 -rw=write 
-runtime=60 -filename=/dev/sdXX


4) Parallel random read:

fio -ioengine=libaio -direct=1 -name=test -bs=4k -iodepth=32 
-rw=randread -runtime=60 -filename=/dev/sdXX


5) Synchronous random read:

fio -ioengine=libaio -sync=1 -direct=1 -name=test -bs=4k -iodepth=1 
-rw=randread -runtime=60 -filename=/dev/sdXX


6) Linear read:

fio -ioengine=libaio -direct=1 -name=test -bs=4M -iodepth=32 -rw=read 
-runtime=60 -filename=/dev/sdXX



Hi,

So, ran the fio commands, and pasted output (as it's quite a lot) here:

https://pastebin.com/KWYEu9uU

I hope someone here can draw some conclusions from this output...

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cancelled: Ceph Day Oslo May 13th

2020-03-13 Thread Wido den Hollander
Hi,

Due to the recent developments around the COVID-19 virus we (the
organizers) have decided to cancel the Ceph Day in Oslo on May 13th.

Although it's still 8 weeks away, we don't know how the situation will
develop and whether travel will be possible or people will be willing to travel.

Therefore we thought it was best to cancel the event for now and to
re-schedule to a later date in 2020.

We haven't picked a date yet. Once chosen we'll communicate it through
the regular channels.

Wido
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MGRs failing once per day and generally slow response times

2020-03-13 Thread Janek Bevendorff
I replaced ntpd with chronyd and will let you know if it changes 
anything. Thanks.



On 13/03/2020 06:25, Konstantin Shalygin wrote:

On 3/13/20 12:57 AM, Janek Bevendorff wrote:
NTPd is running, all the nodes have the same time to the second. I 
don't think that is the problem. 


As always in such cases - try to switch from ntpd to the default EL7 
daemon, chronyd.
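
To verify the switch worked, a quick check (a sketch):

chronyc tracking          # local clock state and current offset
chronyc sources -v        # peers chrony is syncing against
ceph time-sync-status     # clock skew as seen by the monitors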




k 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Is there a better way to make a samba/nfs gateway?

2020-03-13 Thread Martin Verges
Hello,

we have a CTDB based HA Samba in our Ceph Management Solution.
It works like a charm and we connect it to existing active directories as
well.

It's based on vfs_ceph and you can read more about how to configure it
yourself on
https://www.samba.org/samba/docs/current/man-html/vfs_ceph.8.html.
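
A minimal share definition along the lines of that man page (a sketch only;
the path, cephx user and config file location are assumptions):

[share]
    vfs objects = ceph
    path = /cephfs/share
    kernel share modes = no
    ceph:config_file = /etc/ceph/ceph.conf
    ceph:user_id = samba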

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


Am Fr., 13. März 2020 um 13:06 Uhr schrieb Nathan Fish :

> Note that we have had issues with deadlocks when re-exporting CephFS
> via Samba. It appears to only occur with Mac clients, though. In some
> cases it has hung on a request for a high-level directory and hung
> that branch for all clients.
>
> On Fri, Mar 13, 2020 at 1:56 AM Konstantin Shalygin 
> wrote:
> >
> >
> > On 3/11/20 11:16 PM, Seth Galitzer wrote:
> > > I have a hybrid environment and need to share with both Linux and
> > > Windows clients. For my previous iterations of file storage, I
> > > exported nfs and samba shares directly from my monolithic file server.
> > > All Linux clients used nfs and all Windows clients used samba. Now
> > > that I've switched to ceph, things are a bit more complicated. I built
> > > a gateway to export nfs and samba as needed, and connect that as a
> > > client to my ceph cluster.
> > >
> > > After having file locking problems with kernel nfs, I made the switch
> > > to nfs-ganesha, which has helped immensely. For Linux clients that
> > > have high I/O needs, like desktops and some web servers, I connect to
> > > ceph directly for those shares. For all other Linux needs, I use nfs
> > > from the gateway. For all Windows clients (desktops and a small number
> > > of servers), I use samba exported from the gateway.
> > >
> > > Since my ceph cluster went live in August, I have had some kind of
> > > strange (to me) error at least once a week, almost always related to
> > > the gateway client. Last night, it was MDS_CLIENT_OLDEST_TID. Since
> > > we're on Spring Break at my university and not very busy, I decided to
> > > unmount/remount the ceph share, requiring stopping nfs and samba
> > > services. Stopping nfs-ganesha took a while, but it finally completed
> > > with no complaints from the ceph cluster. Stopping samba took longer
> > > and gave me MDS_SLOW_REQUEST and MDS_CLIENT_LATE_RELEASE on the mds.
> > > It finally finished, and I was able to unmount/remount the ceph share
> > > and that finally cleared all the errors.
> > >
> > > This is leading me to believe that samba on the gateway and all the
> > > clients attaching to that is putting a strain on the connection back
> > > to ceph. Which finally brings me to my question: is there a better way
> > > to export samba to my clients using the ceph back end? Or is this as
> > > good as it gets and I just have to put up with the seemingly frequent
> > > errors? I can live with the errors and have been able to handle them
> > > so far, but I know people who have much bigger clusters and many more
> > > clients than me (by an order of magnitude) and don't see nearly as
> > > many errors as I do. Which is why I'm trying to figure out what is
> > > special about my setup.
> > >
> > > All my ceph nodes are running latest nautilus on Centos 7 (I just
> > > updated last week to 14.2.8), as is the gateway host. I'm mounting
> > > ceph directly on the gateway (by way of the kernel using cephfs, not
> > > rados/rbd) to a single mount point and exporting from there.
> > >
> > > My searches so far have not turned up anything extraordinarily useful,
> > > so I'm asking for some guidance here. Any advice is welcome.
> >
> > You can connect to your cluster directly from userland, without kernel.
> > Use Samba vfs_ceph for this.
> >
> >
> >
> > k
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Is there a better way to make a samba/nfs gateway?

2020-03-13 Thread Marc Roos


Can you also create snapshots via the vfs_ceph solution?

 

-Original Message-
Sent: 13 March 2020 14:46
Subject: [ceph-users] Re: Is there a better way to make a samba/nfs 
gateway?

Hello,

we have a CTDB based HA Samba in our Ceph Management Solution.
It works like a charm and we connect it to existing active directories 
as well.

It's based on vfs_ceph and you can read more about how to configure it 
yourself on 
https://www.samba.org/samba/docs/current/man-html/vfs_ceph.8.html.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492 Com. register: Amtsgericht 
Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Inactive PGs

2020-03-13 Thread Peter Eisch
Full cluster is 14.2.8.

I had some OSDs drop overnight, which now results in 4 inactive PGs.  The pools 
had three participating OSDs (2 ssd, 1 sas).  In each pool at least 1 ssd and 1 
sas OSD is working without issue.  I’ve run ‘ceph pg repair ’ but it doesn’t 
seem to make any changes.

PG_AVAILABILITY Reduced data availability: 4 pgs inactive, 4 pgs incomplete
pg 10.2e is incomplete, acting [59,67]
pg 10.c3 is incomplete, acting [62,105]
pg 10.f3 is incomplete, acting [62,59]
pg 10.1d5 is incomplete, acting [87,106]

Using `ceph pg  query` I can see the OSDs involved in each case, including 
the ones which failed.  Respectively they are:
pg 10.2e participants: 59, 68, 77, 143
pg 10.c3 participants: 60, 62, 85, 102, 105, 106
pg 10.f3 participants: 59, 64, 75, 107
pg 10.1d5 participants: 64, 77, 87, 106

The OSDs which are now down/out, have been removed from the crush map, and 
have had their auth removed are:
62, 64, 68

Of course I now have lots of slow-request reports from OSDs worried about the 
inactive PGs.

How do I properly kick these PGs to have them drop their usage of the OSDs 
which no longer exist?
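
For reference, this is roughly how I'm inspecting them so far (a sketch; 10.2e
as an example pgid):

ceph pg dump_stuck inactive
# the recovery_state section shows why a PG is incomplete and which
# down OSDs it still wants to probe
ceph pg 10.2e query | less
ceph osd df tree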

Thanks for your thoughts on this,

peter


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Possible bug with rbd export/import?

2020-03-13 Thread Matt Dunavant
I'm not sure of the last known good release of the rbd CLI where this worked. I 
just ran sha1sum against the images and they always come up as different. 
Might be worth knowing: this is a volume that's provisioned at 512GB (with much 
less actually used), but after export it only shows up as about 56GB.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Is there a better way to make a samba/nfs gateway? (Marc Roos)

2020-03-13 Thread Chad William Seys
A while back I thought there were some limitations which prevented us 
from trying this, but I cannot remember them...


What does the ceph VFS gain you over exporting via the cephfs kernel module 
(kernel 4.19)?  What does it lose you?


(I.e. pros and cons versus kernel module?)

Thanks!
C.

It's based on vfs_ceph and you can read more about how to configure it 
yourself on 
https://www.samba.org/samba/docs/current/man-html/vfs_ceph.8.html.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Possible bug with rbd export/import?

2020-03-13 Thread Jason Dillaman
On Fri, Mar 13, 2020 at 11:17 AM Matt Dunavant
 wrote:
>
> I'm not sure of the last known good release of the rbd CLI where this worked. 
> I just ran the sha1sum against the images and they always come up as 
> different. Might be worth knowing, this is a volume that's provisioned at 
> 512GB (with much less actually used) but after export, it only shows up as 
> about 56GB.

The resulting image from the "rbd import" only shows up as 56GiB?

> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Jason
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Possible bug with rbd export/import?

2020-03-13 Thread Matt Dunavant
Jason Dillaman wrote:
> On Fri, Mar 13, 2020 at 11:17 AM Matt Dunavant
>  > 
> >  I'm not sure of the last known good release of the rbd CLI where this 
> > worked. I just
> > ran the sha1sum against the images and they always come up as different. 
> > Might be worth
> > knowing, this is a volume that's provisioned at 512GB (with much less 
> > actually used)
> > but after export, it only shows up as about 56GB. 
> The resulting image from the "rbd import" only shows up as 56GiB?
> 
> >  ___
> >  ceph-users mailing list -- ceph-users(a)ceph.io
> >  To unsubscribe send an email to ceph-users-leave(a)ceph.io
> >
Yeah, it's super odd. The actual content in the 512GB rbd image is probably 
about 50ish GB but the command isn't killing itself early or throwing any 
errors. I believe when I run the export to a middleman and then import, the 
image shows up as the correct size. I'll test that in a bit.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Possible bug with rbd export/import?

2020-03-13 Thread Jason Dillaman
On Fri, Mar 13, 2020 at 11:36 AM Matt Dunavant
 wrote:
>
> Jason Dillaman wrote:
> > On Fri, Mar 13, 2020 at 11:17 AM Matt Dunavant
> >  > >
> > >  I'm not sure of the last known good release of the rbd CLI where this 
> > > worked. I just
> > > ran the sha1sum against the images and they always come up as different. 
> > > Might be worth
> > > knowing, this is a volume that's provisioned at 512GB (with much less 
> > > actually used)
> > > but after export, it only shows up as about 56GB.
> > The resulting image from the "rbd import" only shows up as 56GiB?
> >
> > >  ___
> > >  ceph-users mailing list -- ceph-users(a)ceph.io
> > >  To unsubscribe send an email to ceph-users-leave(a)ceph.io
> > >
> Yeah, it's super odd. The actual content in the 512GB rbd image is probably 
> about 50ish GB but the command isn't killing itself early or throwing any 
> errors. I believe when I run the export to a middleman and then import, the 
> image shows up as the correct size. I'll test that in a bit.

Couple test cases to try:

Does the "rbd export" progress bar get to 100%?
If you run "rbd export - > some_file" does it create a 512GiB file?
If you run "rbd import" with a "--sparse-size 0" argument, does it
change the result?
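
Concretely, something along these lines (a sketch; pool/image names are
placeholders):

# does streaming to stdout produce the full provisioned size?
rbd export mypool/myimage - > /tmp/image.raw
ls -l /tmp/image.raw
sha1sum /tmp/image.raw
# re-import without sparsifying and compare a fresh export of the copy
rbd import --sparse-size 0 /tmp/image.raw mypool/myimage-copy
rbd export mypool/myimage-copy - | sha1sum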

> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Jason
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MGRs failing once per day and generally slow response times

2020-03-13 Thread Janek Bevendorff
Indeed. I just had another MGR go bye-bye. I don't think host clock skew 
is the problem.



On 13/03/2020 15:29, Anthony D'Atri wrote:

Chrony does converge faster, but I doubt this will solve your problem if you 
don’t have quality peers.  Or if it’s not really a time problem.


On Mar 13, 2020, at 6:44 AM, Janek Bevendorff  
wrote:

I replaced ntpd with chronyd and will let you know if it changes anything. 
Thanks.



On 13/03/2020 06:25, Konstantin Shalygin wrote:

On 3/13/20 12:57 AM, Janek Bevendorff wrote:
NTPd is running, all the nodes have the same time to the second. I don't think 
that is the problem.

As always in such cases - try to switch your ntpd to default EL7 daemon - 
chronyd.



k

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


--
Bauhaus-Universität Weimar
Bauhausstr. 9a, Room 308
99423 Weimar, Germany

Phone: +49 (0)3643 - 58 3577
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Inactive PGs

2020-03-13 Thread Wido den Hollander


On 3/13/20 4:09 PM, Peter Eisch wrote:
> Full cluster is 14.2.8.
> 
> I had some OSD drop overnight which results now in 4 inactive PGs. The
> pools had three participant (2 ssd, 1 sas) OSDs. In each pool at least 1
> ssd and 1 sas OSD is working without issue. I’ve ‘ceph pg repair ’
> but it doesn’t seem to make any changes.
> 
> PG_AVAILABILITY Reduced data availability: 4 pgs inactive, 4 pgs incomplete
> pg 10.2e is incomplete, acting [59,67]
> pg 10.c3 is incomplete, acting [62,105]
> pg 10.f3 is incomplete, acting [62,59]
> pg 10.1d5 is incomplete, acting [87,106]
> 
> Using `ceph pg  query` I can see the OSD in each case of the ones
> which failed. Respectively they are:
> pg 10.2e participants: 59, 68, 77, 143
> pg 10.c3 participants: 60, 62, 85, 102, 105, 106
> pg 10.f3 participants: 59, 64, 75, 107
> pg 10.1d5 participants: 64, 77, 87, 106
> 
> The OSDs which are now down/out and have been removed from the crush map
> and removed the auth are:
> 62, 64, 68
> 
> Of course I have lots of reports of slow OSDs now from OSDs worried
> about the inactive PGs.
> 
> How do I properly kick these PGs to have them drop their usage of the
> OSDs which no longer exist?

You don't. Because those OSDs hold the data you need.

Why did  you remove them from the CRUSHMap, OSDMap and auth? As you need
these to rebuild the PGs.

Wido

> 
> Thanks for you thoughts on this,
> 
> peter
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Inactive PGs

2020-03-13 Thread Peter Eisch



On 3/13/20, 11:38 AM, "Wido den Hollander"  wrote:

This email originates outside Virgin Pulse.


On 3/13/20 4:09 PM, Peter Eisch wrote:
> Full cluster is 14.2.8.
>
> I had some OSD drop overnight which results now in 4 inactive PGs. The
> pools had three participant (2 ssd, 1 sas) OSDs. In each pool at least 1
> ssd and 1 sas OSD is working without issue. I’ve ‘ceph pg repair ’
> but it doesn’t seem to make any changes.
>
> PG_AVAILABILITY Reduced data availability: 4 pgs inactive, 4 pgs 
incomplete
> pg 10.2e is incomplete, acting [59,67]
> pg 10.c3 is incomplete, acting [62,105]
> pg 10.f3 is incomplete, acting [62,59]
> pg 10.1d5 is incomplete, acting [87,106]
>
> Using `ceph pg  query` I can see the OSD in each case of the ones
> which failed. Respectively they are:
> pg 10.2e participants: 59, 68, 77, 143
> pg 10.c3 participants: 60, 62, 85, 102, 105, 106
> pg 10.f3 participants: 59, 64, 75, 107
> pg 10.1d5 participants: 64, 77, 87, 106
>
> The OSDs which are now down/out and have been removed from the crush map
> and removed the auth are:
> 62, 64, 68
>
> Of course I have lots of reports of slow OSDs now from OSDs worried
> about the inactive PGs.
>
> How do I properly kick these PGs to have them drop their usage of the
> OSDs which no longer exist?

You don't. Because those OSDs hold the data you need.

Why did  you remove them from the CRUSHMap, OSDMap and auth? As you need
these to rebuild the PGs.

Wido

The drives failed at a hardware level.  I've replaced OSDs this way in previous 
instances, after either planned migration or failure, without issue.  I didn't 
realize all the replicated copies were on just one drive in each pool.

What should my actions have been in this case?

  pool 10 volumes' replicated size 2 min_size 1 crush_rule 1 object_hash 
rjenkins pg_num 512 pgp_num 512 autoscale_mode warn last_change 47570 lfor 
0/0/40781 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd

Crush rule 1:
rule ssd_by_host {
id 1
type replicated
min_size 1
max_size 10
step take default class ssd
step chooseleaf firstn 0 type host
step emit
}

peter

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Inactive PGs

2020-03-13 Thread Wido den Hollander


On 3/13/20 5:44 PM, Peter Eisch wrote:
> 
> 
> 
> 
> On 3/13/20, 11:38 AM, "Wido den Hollander"  wrote:
> 
> This email originates outside Virgin Pulse.
> 
> 
> On 3/13/20 4:09 PM, Peter Eisch wrote:
>> Full cluster is 14.2.8.
>>
>> I had some OSD drop overnight which results now in 4 inactive PGs. The
>> pools had three participant (2 ssd, 1 sas) OSDs. In each pool at least 1
>> ssd and 1 sas OSD is working without issue. I’ve ‘ceph pg repair ’
>> but it doesn’t seem to make any changes.
>>
>> PG_AVAILABILITY Reduced data availability: 4 pgs inactive, 4 pgs
> incomplete
>> pg 10.2e is incomplete, acting [59,67]
>> pg 10.c3 is incomplete, acting [62,105]
>> pg 10.f3 is incomplete, acting [62,59]
>> pg 10.1d5 is incomplete, acting [87,106]
>>
>> Using `ceph pg  query` I can see the OSD in each case of the ones
>> which failed. Respectively they are:
>> pg 10.2e participants: 59, 68, 77, 143
>> pg 10.c3 participants: 60, 62, 85, 102, 105, 106
>> pg 10.f3 participants: 59, 64, 75, 107
>> pg 10.1d5 participants: 64, 77, 87, 106
>>
>> The OSDs which are now down/out and have been removed from the crush map
>> and removed the auth are:
>> 62, 64, 68
>>
>> Of course I have lots of reports of slow OSDs now from OSDs worried
>> about the inactive PGs.
>>
>> How do I properly kick these PGs to have them drop their usage of the
>> OSDs which no longer exist?
> 
> You don't. Because those OSDs hold the data you need.
> 
> Why did you remove them from the CRUSHMap, OSDMap and auth? As you need
> these to rebuild the PGs.
> 
> Wido
> 
> The drives failed at a hardware level. I've replaced OSDs with this by
> either planned migration or failure in previous instances without issue.
> I didn't realize all the replicated copies were on just one drive in
> each pool.
> > What should my actions have been in this case?

Try to get those OSDs online again. Maybe try a rescue of the disks or
see whether the OSDs can be brought back up.

A tool like dd_rescue can help in getting such a thing done.

> 
> pool 10 volumes' replicated size 2 min_size 1 crush_rule 1 object_hash
> rjenkins pg_num 512 pgp_num 512 autoscale_mode warn last_change 47570
> lfor 0/0/40781 flags hashpspool,selfmanaged_snaps stripe_width 0
> application rbd

I see you use 2x replication with min_size=1, that's dangerous and can
easily lead to data loss.

I wouldn't say it's impossible to get the data back, but something like
this can take a while (a lot of hours) to be brought back online.
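
Once the PGs are active+clean again, something like this avoids a repeat (a
sketch, assuming the pool name 'volumes' from the dump above):

ceph osd pool set volumes size 3
ceph osd pool set volumes min_size 2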

Wido

> 
> Crush rule 1:
> rule ssd_by_host {
> id 1
> type replicated
> min_size 1
> max_size 10
> step take default class ssd
> step chooseleaf firstn 0 type host
> step emit
> }
> 
> peter
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Possible bug with rbd export/import?

2020-03-13 Thread Matt Dunavant
Jason Dillaman wrote:
> On Fri, Mar 13, 2020 at 11:36 AM Matt Dunavant
>  > 
> >  Jason Dillaman wrote:
> >  > On Fri, Mar 13, 2020 at 11:17 AM Matt Dunavant
> >  >  >  > >
> >  > >  I'm not sure of the last known good release of the rbd CLI where this
> > worked. I just
> >  > > ran the sha1sum against the images and they always come up as 
> > different. Might
> > be worth
> >  > > knowing, this is a volume that's provisioned at 512GB (with much less
> > actually used)
> >  > > but after export, it only shows up as about 56GB.
> >  > The resulting image from the "rbd import" only shows up as 56GiB?
> >  >
> >  > >  ___
> >  > >  ceph-users mailing list -- ceph-users(a)ceph.io
> >  > >  To unsubscribe send an email to ceph-users-leave(a)ceph.io
> >  > >
> >  Yeah, it's super odd. The actual content in the 512GB rbd image is 
> > probably about
> > 50ish GB but the command isn't killing itself early or throwing any errors. 
> > I believe
> > when I run the export to a middleman and then import, the image shows up as 
> > the correct
> > size. I'll test that in a bit. 
> Couple test cases to try:
> 
> Does the "rbd export" progress bar get to 100%?
> If you run "rbd export - > some_file" does it create a 512GiB file?
> If you run "rbd import" with a "--sparse-size 0" argument, does it
> change the result?
> 
> >  ___
> >  ceph-users mailing list -- ceph-users(a)ceph.io
> >  To unsubscribe send an email to ceph-users-leave(a)ceph.io
> >
rbd export progress bar gets to 100% but there's some weird behavior. It'll 
jump immediately to 14%, wait for a bit, and then slowly climb to 100%

A few results:

1) rbd export of a snapshot results in an incorrectly sized drive but correct 
sha1sum.
2) rbd export > some_file creates a 512GB file and a correctly sized import and 
sha1sum, however the VM disk has some sort of corruption and the OS won't 
properly load.
3) rbd import with --sparse-size 0 results in an incorrect sha1sum
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Possible bug with rbd export/import?

2020-03-13 Thread Jason Dillaman
On Fri, Mar 13, 2020 at 2:48 PM Matt Dunavant
 wrote:
>
> Jason Dillaman wrote:
> > On Fri, Mar 13, 2020 at 11:36 AM Matt Dunavant
> >  > >
> > >  Jason Dillaman wrote:
> > >  > On Fri, Mar 13, 2020 at 11:17 AM Matt Dunavant
> > >  >  > >  > >
> > >  > >  I'm not sure of the last known good release of the rbd CLI where 
> > > this
> > > worked. I just
> > >  > > ran the sha1sum against the images and they always come up as 
> > > different. Might
> > > be worth
> > >  > > knowing, this is a volume that's provisioned at 512GB (with much less
> > > actually used)
> > >  > > but after export, it only shows up as about 56GB.
> > >  > The resulting image from the "rbd import" only shows up as 56GiB?
> > >  >
> > >  > >  ___
> > >  > >  ceph-users mailing list -- ceph-users(a)ceph.io
> > >  > >  To unsubscribe send an email to ceph-users-leave(a)ceph.io
> > >  > >
> > >  Yeah, it's super odd. The actual content in the 512GB rbd image is 
> > > probably about
> > > 50ish GB but the command isn't killing itself early or throwing any 
> > > errors. I believe
> > > when I run the export to a middleman and then import, the image shows up 
> > > as the correct
> > > size. I'll test that in a bit.
> > Couple test cases to try:
> >
> > Does the "rbd export" progress bar get to 100%?
> > If you run "rbd export - > some_file" does it create a 512GiB file?
> > If you run "rbd import" with a "--sparse-size 0" argument, does it
> > change the result?
> >
> > >  ___
> > >  ceph-users mailing list -- ceph-users(a)ceph.io
> > >  To unsubscribe send an email to ceph-users-leave(a)ceph.io
> > >
> rbd export progress bar gets to 100% but there's some weird behavior. It'll 
> jump immediately to 14%, wait for a bit, and then slowly climb to 100%
>
> A few results:
>
> 1) rbd export of a snapshot results in an incorrectly sized drive but correct 
> sha1sum.

The "export" image was the wrong size as compared to "rbd info
[image]@[snap]"? What are you comparing the sha1sum against?

> 2) rbd export > some_file creates a 512GB file and a correctly sized import 
> and sha1sum, however the VM disk has some sort of corruption and the OS won't 
> properly load.

The sha1sum of the "some_file" matches or the sha1sum from
re-exporting the newly imported image?

> 3) rbd import with --sparse-size0 results in an incorrect sha1sum
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Jason
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Possible bug with rbd export/import?

2020-03-13 Thread Jason Dillaman
On Fri, Mar 13, 2020 at 3:31 PM Jason Dillaman  wrote:
>
> On Fri, Mar 13, 2020 at 2:48 PM Matt Dunavant
>  wrote:
> >
> > Jason Dillaman wrote:
> > > On Fri, Mar 13, 2020 at 11:36 AM Matt Dunavant
> > >  > > >
> > > >  Jason Dillaman wrote:
> > > >  > On Fri, Mar 13, 2020 at 11:17 AM Matt Dunavant
> > > >  >  > > >  > >
> > > >  > >  I'm not sure of the last known good release of the rbd CLI where 
> > > > this
> > > > worked. I just
> > > >  > > ran the sha1sum against the images and they always come up as 
> > > > different. Might
> > > > be worth
> > > >  > > knowing, this is a volume that's provisioned at 512GB (with much 
> > > > less
> > > > actually used)
> > > >  > > but after export, it only shows up as about 56GB.
> > > >  > The resulting image from the "rbd import" only shows up as 56GiB?
> > > >  >
> > > >  > >  ___
> > > >  > >  ceph-users mailing list -- ceph-users(a)ceph.io
> > > >  > >  To unsubscribe send an email to ceph-users-leave(a)ceph.io
> > > >  > >
> > > >  Yeah, it's super odd. The actual content in the 512GB rbd image is 
> > > > probably about
> > > > 50ish GB but the command isn't killing itself early or throwing any 
> > > > errors. I believe
> > > > when I run the export to a middleman and then import, the image shows 
> > > > up as the correct
> > > > size. I'll test that in a bit.
> > > Couple test cases to try:
> > >
> > > Does the "rbd export" progress bar get to 100%?
> > > If you run "rbd export - > some_file" does it create a 512GiB file?
> > > If you run "rbd import" with a "--sparse-size 0" argument, does it
> > > change the result?
> > >
> > > >  ___
> > > >  ceph-users mailing list -- ceph-users(a)ceph.io
> > > >  To unsubscribe send an email to ceph-users-leave(a)ceph.io
> > > >
> > rbd export progress bar gets to 100% but there's some weird behavior. It'll 
> > jump immediately to 14%, wait for a bit, and then slowly climb to 100%
> >
> > A few results:
> >
> > 1) rbd export of a snapshot results in an incorrectly sized drive but 
> > correct sha1sum.
>
> The "export" image was the wrong size as compared to "rbd info
> [image]@[snap]"? What are you comparing the sha1sum against?

Another thing to try would be to compare "rbd export
--rbd_concurrent_management_ops=1 [image] - > [some_file]" against an
"rbd export [image] [some_file]". The only change (since v14.2.3 for
rbd import/export) was to speed-up exports to STDOUT by issuing
concurrent I/O.
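
In other words (a sketch; the image spec is a placeholder):

rbd export --rbd_concurrent_management_ops=1 mypool/myimage - | sha1sum
rbd export mypool/myimage - | sha1sum
rbd export mypool/myimage /tmp/image.raw && sha1sum /tmp/image.raw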

> > 2) rbd export > some_file creates a 512GB file and a correctly sized import 
> > and sha1sum, however the VM disk has some sort of corruption and the OS 
> > won't properly load.
>
> The sha1sum of the "some_file" matches or the sha1sum from
> re-exporting the newly imported image?
>
> > 3) rbd import with --sparse-size0 results in an incorrect sha1sum
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
>
>
> --
> Jason



-- 
Jason
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Is there a better way to make a samba/nfs gateway?

2020-03-13 Thread Seth Galitzer
Thanks to all who have offered advice on this. I have been looking at 
using vfs_ceph in samba, but I'm unsure how to get it on CentOS 7. As I 
understand it, it's optional at compile time. When searching for a 
package for it, I see one for glusterfs (samba-vfs-glusterfs), but nothing 
for ceph. Is it just enabled in the CentOS samba package, or will I have 
to compile my own samba binaries?
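
One way I could check whether the stock package already ships it (a sketch; the
module path below is my assumption for CentOS 7 x86_64):

ls -l /usr/lib64/samba/vfs/ceph.so
rpm -qf /usr/lib64/samba/vfs/ceph.so   # which package owns it, if present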


Thanks.
Seth

On 3/13/20 8:46 AM, Martin Verges wrote:

Hello,

we have a CTDB based HA Samba in our Ceph Management Solution.
It works like a charm and we connect it to existing active 
directories as well.


It's based on vfs_ceph and you can read more about how to configure it 
yourself on 
https://www.samba.org/samba/docs/current/man-html/vfs_ceph.8.html.


--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io 
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


Am Fr., 13. März 2020 um 13:06 Uhr schrieb Nathan Fish 
mailto:lordci...@gmail.com>>:


Note that we have had issues with deadlocks when re-exporting CephFS
via Samba. It appears to only occur with Mac clients, though. In some
cases it has hung on a request for a high-level directory and hung
that branch for all clients.

On Fri, Mar 13, 2020 at 1:56 AM Konstantin Shalygin mailto:k0...@k0ste.ru>> wrote:
 >
 >
 > On 3/11/20 11:16 PM, Seth Galitzer wrote:
 > > I have a hybrid environment and need to share with both Linux and
 > > Windows clients. For my previous iterations of file storage, I
 > > exported nfs and samba shares directly from my monolithic file
server.
 > > All Linux clients used nfs and all Windows clients used samba. Now
 > > that I've switched to ceph, things are a bit more complicated.
I built
 > > a gateway to export nfs and samba as needed, and connect that as a
 > > client to my ceph cluster.
 > >
 > > After having file locking problems with kernel nfs, I made the
switch
 > > to nfs-ganesha, which has helped immensely. For Linux clients that
 > > have high I/O needs, like desktops and some web servers, I
connect to
 > > ceph directly for those shares. For all other Linux needs, I
use nfs
 > > from the gateway. For all Windows clients (desktops and a small
number
 > > of servers), I use samba exported from the gateway.
 > >
 > > Since my ceph cluster went live in August, I have had some kind of
 > > strange (to me) error at least once a week, almost always
related to
 > > the gateway client. Last night, it was MDS_CLIENT_OLDEST_TID. Since
 > > we're on Spring Break at my university and not very busy, I
decided to
 > > unmount/remount the ceph share, requiring stopping nfs and samba
 > > services. Stopping nfs-ganesha took a while, but it finally
completed
 > > with no complaints from the ceph cluster. Stopping samba took
longer
 > > and gave me MDS_SLOW_REQUEST and MDS_CLIENT_LATE_RELEASE on the
mds.
 > > It finally finished, and I was able to unmount/remount the ceph
share
 > > and that finally cleared all the errors.
 > >
 > > This is leading me to believe that samba on the gateway and all the
 > > clients attaching to that is putting a strain on the connection
back
 > > to ceph. Which finally brings me to my question: is there a
better way
 > > to export samba to my clients using the ceph back end? Or is
this as
 > > good as it gets and I just have to put up with the seemingly
frequent
 > > errors? I can live with the errors and have been able to handle
them
 > > so far, but I know people who have much bigger clusters and
many more
 > > clients than me (by an order of magnitude) and don't see nearly as
 > > many errors as I do. Which is why I'm trying to figure out what is
 > > special about my setup.
 > >
 > > All my ceph nodes are running latest nautilus on Centos 7 (I just
 > > updated last week to 14.2.8), as is the gateway host. I'm mounting
 > > ceph directly on the gateway (by way of the kernel using
cephfs, not
 > > rados/rbd) to a single mount point and exporting from there.
 > >
 > > My searches so far have not turned up anything extraordinarily
useful,
 > > so I'm asking for some guidance here. Any advice is welcome.
 >
 > You can connect to your cluster directly from userland, without
kernel.
 > Use Samba vfs_ceph for this.
 >
 >
 >
 > k
 > ___
 > ceph-users mailing list -- ceph-users@ceph.io

 > To unsubscribe send an email to ceph-users-le...@ceph.io

___

[ceph-users] How to get num ops blocked per OSD

2020-03-13 Thread Robert LeBlanc
For Jewel I wrote a script to take the output of `ceph health detail
--format=json` and send alerts to our system that ordered the osds based on
how long the ops were blocked and which OSDs had the most ops blocked. This
was really helpful to quickly identify which OSD out of a list of 100 would
be the most probable one having issues. Since upgrading to Luminous, I
don't get that output anymore and I'm not sure where that info went. Do I need to query
the manager now?

This is the regex I was using to extract the pertinent information:

'^(\d+) ops are blocked > (\d+\.+\d+) sec on osd\.(\d+)$'

Thanks,
Robert LeBlanc

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to get num ops blocked per OSD

2020-03-13 Thread Anthony D'Atri
Yeah the removal of that was annoying for sure.  ISTR that one can gather the 
information from the OSDs’ admin sockets.
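
Something along these lines per OSD host (a sketch; socket paths may differ):

# one daemon
ceph daemon osd.12 dump_blocked_ops
# or every OSD socket on the host
for s in /var/run/ceph/ceph-osd.*.asok; do
    echo "$s"; ceph daemon "$s" dump_blocked_ops
done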

Envision a Prometheus exporter that polls the admin sockets (in parallel) and 
Grafana panes that graph slow requests by OSD and by node.


> On Mar 13, 2020, at 4:14 PM, Robert LeBlanc  wrote:
> 
> For Jewel I wrote a script to take the output of `ceph health detail
> --format=json` and send alerts to our system that ordered the osds based on
> how long the ops were blocked and which OSDs had the most ops blocked. This
> was really helpful to quickly identify which OSD out of a list of 100 would
> be the most probable one having issues. Since upgrading to Luminous, I
> don't get that and I'm not sure where that info went to. Do I need to query
> the manager now?
> 
> This is the regex I was using to extract the pertinent information:
> 
> '^(\d+) ops are blocked > (\d+\.+\d+) sec on osd\.(\d+)$'
> 
> Thanks,
> Robert LeBlanc
> 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io