[ceph-users] Re: vfs_ceph and permissions

2019-09-10 Thread Marco Gaiarin
Mandi! Konstantin Shalygin
  In chel di` si favelave...

> >  vfs objects = acl_xattr full_audit
[...]
> >  vfs objects = ceph

> You have doubled `vfs objects` option, but this option is stackable and
> should be `vfs objects = acl_xattr full_audit ceph`, I think...

Yes, the latter overrides the former, so to Samba the two 'vfs objects'
lines mean only 'vfs objects = ceph'.


Run 'samba-tool testparm' and verify how Samba actually reads the file...
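For reference, a stacked configuration in smb.conf would look something like this (the share name, paths, and vfs_ceph options here are illustrative, not taken from the thread; module order matters because modules are invoked left to right, so the audit/ACL modules go before the filesystem backend):

```ini
[cephshare]
    path = /
    # All stackable modules on ONE line; acl_xattr and full_audit
    # are stacked on top of the ceph backend module
    vfs objects = acl_xattr full_audit ceph
    ceph:config_file = /etc/ceph/ceph.conf
    ceph:user_id = samba
```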

-- 
dott. Marco Gaiarin GNUPG Key ID: 240A3D66
  Associazione ``La Nostra Famiglia''  http://www.lanostrafamiglia.it/
  Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento (PN)
  marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797

Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA!
  http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
(cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA)
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] 2 OpenStack environment, 1 Ceph cluster

2019-09-10 Thread vladimir franciz blando
I have 2 OpenStack environments that I want to integrate with an existing
Ceph cluster. I know it can technically be done, but has anyone tried this?

- Vlad


[ceph-users] Re: 2 OpenStack environment, 1 Ceph cluster

2019-09-10 Thread Wesley Peng



on 2019/9/10 17:14, vladimir franciz blando wrote:
> I have 2 OpenStack environment that I want to integrate to an existing
> ceph cluster.  I know technically it can be done but has anyone tried this?




Sure you can. Ceph can be deployed as a separate storage service;
OpenStack is just one of its clients. A single Ceph cluster can serve
one client or several.


regards.


[ceph-users] Host failure trigger " Cannot allocate memory"

2019-09-10 Thread Amudhan P
Hi,

I am using ceph version 13.2.6 (mimic) on test setup trying with cephfs.

My current setup:
3 nodes; 1 node contains two OSDs ("bricks") and the other 2 nodes contain
a single OSD each.

The volume is 3-replica; I am trying to simulate a node failure.

I powered down one host and started getting the message
"-bash: fork: Cannot allocate memory" on the other systems when running
any command, and the systems stop responding to commands.

What could be the reason for this?
At this stage I am able to read some of the data stored in the volume,
while other reads just wait for IO.

output from "sudo ceph -s"
  cluster:
id: 7c138e13-7b98-4309-b591-d4091a1742b4
health: HEALTH_WARN
1 osds down
2 hosts (3 osds) down
Degraded data redundancy: 5313488/7970232 objects degraded
(66.667%), 64 pgs degraded

  services:
mon: 1 daemons, quorum mon01
mgr: mon01(active)
mds: cephfs-tst-1/1/1 up  {0=mon01=up:active}
osd: 4 osds: 1 up, 2 in

  data:
pools:   2 pools, 64 pgs
objects: 2.66 M objects, 206 GiB
usage:   421 GiB used, 3.2 TiB / 3.6 TiB avail
pgs: 5313488/7970232 objects degraded (66.667%)
 64 active+undersized+degraded

  io:
client:   79 MiB/s rd, 24 op/s rd, 0 op/s wr

output from : sudo ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS
 0   hdd 1.81940        0     0 B     0 B     0 B     0    0   0
 3   hdd 1.81940        0     0 B     0 B     0 B     0    0   0
 1   hdd 1.81940  1.0 1.8 TiB 211 GiB 1.6 TiB 11.34 1.00   0
 2   hdd 1.81940  1.0 1.8 TiB 210 GiB 1.6 TiB 11.28 1.00  64
                    TOTAL 3.6 TiB 421 GiB 3.2 TiB 11.31
MIN/MAX VAR: 1.00/1.00  STDDEV: 0.03

regards
Amudhan


[ceph-users] Re: 2 OpenStack environment, 1 Ceph cluster [EXT]

2019-09-10 Thread Dave Holland
On Tue, Sep 10, 2019 at 05:14:34PM +0800, vladimir franciz blando wrote:
> I have 2 OpenStack environment that I want to integrate to an
> existing ceph cluster.  I know technically it can be done but has
> anyone tried this?

Yes, it works fine. You need each OpenStack to have a different client
key, so that they can't trample on each other's pools; and you need some
sort of naming convention, so you can tell which pool belongs to which
OpenStack.

For example, our OpenStack deployments are named after Greek letters, so
we have ceph-ansible create the pools eta-images, eta-vms,
eta-volumes... zeta-images, zeta-vms, zeta-volumes for the two
deployments "eta" and "zeta". ceph-ansible also manages the different
client keys, "client.openstack-eta" and "client.openstack-zeta", with
permissions only for the appropriate pools.
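ceph-ansible generates those keys, but done by hand the per-deployment caps might look roughly like this (a sketch only, not the exact caps ceph-ansible produces):

```shell
# Key for the "eta" deployment, restricted to its own RBD pools
ceph auth get-or-create client.openstack-eta \
    mon 'profile rbd' \
    osd 'profile rbd pool=eta-vms, profile rbd pool=eta-volumes, profile rbd pool=eta-images'
```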

We use Red Hat OpenStack (based on TripleO) so for each deployment there
is a yaml file describing where its Ceph is, e.g. for Eta it looks like
this:

parameter_defaults:
  CephClusterFSID: '12341234-1234-1234-1234-123412341234'
  CephClientKey: 'asdfasdfasdfasdfasdfasdfasdfasdfasdfas=='
  CephExternalMonHost: '1.2.3.4, 5.6.7.8, 9.10.11.12'
  NovaRbdPoolName: eta-vms
  CinderRbdPoolName: eta-volumes
  GlanceRbdPoolName: eta-images
  CephClientUserName: openstack-eta

The FSID and mons will be the same for each deployment but the key and
pool names will be different.

Cheers,
Dave
-- 
** Dave Holland ** Systems Support -- Informatics Systems Group **
** 01223 496923 **Wellcome Sanger Institute, Hinxton, UK**


-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


[ceph-users] Re: 2 OpenStack environment, 1 Ceph cluster

2019-09-10 Thread Massimo Sgaravatto
We have a single ceph cluster used by 2 openstack installations.

We use different ceph pools for the 2 openstack clusters.
For nova, cinder and glance this is straightforward.

It was a bit more complicated for radosgw. In this case the setup I used was:

- creating 2 realms (one for each cloud)
- creating one zonegroup for each realm
- creating one zone for each zonegroup
- having one or more rgw instances for each zone

I don't know if there are simpler approaches.
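Sketched as radosgw-admin commands, the per-cloud setup above might look like this (realm/zonegroup/zone names are hypothetical; repeat with different names for the second cloud):

```shell
radosgw-admin realm create --rgw-realm=cloud1 --default
radosgw-admin zonegroup create --rgw-realm=cloud1 --rgw-zonegroup=cloud1-zg \
    --master --default
radosgw-admin zone create --rgw-zonegroup=cloud1-zg --rgw-zone=cloud1-zone \
    --master --default
radosgw-admin period update --commit
# Each rgw instance is then pointed at its zone via rgw_realm,
# rgw_zonegroup and rgw_zone in its ceph.conf section
```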

Cheers, Massimo



[ceph-users] Re: 2 OpenStack environment, 1 Ceph cluster [EXT]

2019-09-10 Thread Matthew H
This is almost in line with how I did it before, and I was also using
Red Hat OpenStack.




[ceph-users] Re: Host failure trigger " Cannot allocate memory"

2019-09-10 Thread Amudhan P
I have also found below error in dmesg.

[332884.028810] systemd-journald[6240]: Failed to parse kernel command
line, ignoring: Cannot allocate memory
[332885.054147] systemd-journald[6240]: Out of memory.
[332894.844765] systemd[1]: systemd-journald.service: Main process exited,
code=exited, status=1/FAILURE
[332897.199736] systemd[1]: systemd-journald.service: Failed with result
'exit-code'.
[332906.503076] systemd[1]: Failed to start Journal Service.
[332937.909198] systemd[1]: ceph-crash.service: Main process exited,
code=exited, status=1/FAILURE
[332939.308341] systemd[1]: ceph-crash.service: Failed with result
'exit-code'.
[332949.545907] systemd[1]: systemd-journald.service: Service has no
hold-off time, scheduling restart.
[332949.546631] systemd[1]: systemd-journald.service: Scheduled restart
job, restart counter is at 7.
[332949.546781] systemd[1]: Stopped Journal Service.
[332949.566402] systemd[1]: Starting Journal Service...
[332950.190332] systemd[1]: ceph-osd@1.service: Main process exited,
code=killed, status=6/ABRT
[332950.190477] systemd[1]: ceph-osd@1.service: Failed with result 'signal'.
[332950.842297] systemd-journald[6249]: File
/var/log/journal/8f2559099bf54865adc95e5340d04447/system.journal corrupted
or uncleanly shut down, renaming and replacing.
[332951.019531] systemd[1]: Started Journal Service.



[ceph-users] Re: Host failure trigger " Cannot allocate memory"

2019-09-10 Thread Ashley Merrick
What are the specs of the machines?

Recovery work uses more memory than normal steady-state operation, and it
looks like you're maxing out the available memory on the machines while
Ceph is trying to recover.
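If memory pressure is indeed the cause, one common mitigation (my assumption, not something verified in this thread) is to lower the OSD memory target, since the BlueStore default of 4 GiB per OSD already exceeds a small node's RAM once recovery overhead is added:

```ini
# ceph.conf on the OSD nodes -- the value is illustrative, tune to the host
[osd]
    # Cap the BlueStore cache autotuner well below physical RAM
    # (default is 4 GiB; here ~1.5 GiB per OSD daemon)
    osd memory target = 1610612736
```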






[ceph-users] Re: regurlary 'no space left on device' when deleting on cephfs

2019-09-10 Thread Burkhard Linke

Hi,


do you use hard links in your workload? The 'no space left on device'
message may also refer to too many stray files. Strays are either files
that are queued for deletion (the purge queue), or files that have been
deleted while hard links still point to the same content. Since cephfs
does not use an indirection layer between inodes and data, and the data
chunks are named after the inode id, removing the original file leaves
stray entries behind: cephfs is not able to rename the underlying rados
objects.



There are 10 hidden directories for stray files, and given a maximum
size of 100,000 entries each, you can store only up to 1 million stray
entries in total. I don't know exactly how entries are distributed among
the 10 directories, so the limit may be reached earlier for a single
stray directory. The performance counters contain some values for
strays, so they are easy to check. The daemonperf output also shows the
current value.



The problem of the upper limit of directory entries was solved by 
directory fragmentation, so you should check whether fragmentation is 
allowed in your filesystem. You can also try to increase the upper 
directory entry limit, but this might lead to other problems (too large 
rados omap objects).
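The checks described above might be sketched as follows (the MDS daemon and filesystem names are illustrative; raising the fragment limit carries the omap-size risk just mentioned):

```shell
# Inspect the stray counters on the MDS
ceph daemon mds.mds01 perf dump | grep -i stray
# Ensure directory fragmentation is allowed (older filesystems may have it off)
ceph fs set cephfs allow_dirfrags true
# Optionally raise the per-fragment entry limit (default 100000)
ceph config set mds mds_bal_fragment_size_max 200000
```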



Regards,

Burkhard


--
Dr. rer. nat. Burkhard Linke
Bioinformatics and Systems Biology
Justus-Liebig-University Giessen
35392 Giessen, Germany
Phone: (+49) (0)641 9935810


[ceph-users] Re: Host failure trigger " Cannot allocate memory"

2019-09-10 Thread Amudhan P
It's a test cluster; each node has a single OSD and 4 GB of RAM.



[ceph-users] Re: Host failure trigger " Cannot allocate memory"

2019-09-10 Thread Amudhan P
I am also getting this error message on one node while the other host is down.

ceph -s
Traceback (most recent call last):
  File "/usr/bin/ceph", line 130, in <module>
    import rados
ImportError: libceph-common.so.0: cannot map zero-fill pages





[ceph-users] Manager plugins issues on new ceph-mgr nodes

2019-09-10 Thread Alexandru Cucu
Hello,

Running 14.2.3, updated from 14.2.1.
Until recently I had ceph-mgr co-located with the OSDs. I've installed
ceph-mgr on separate servers and everything looks OK in Ceph status,
but there are multiple issues:

1. Dashboard only runs on the old mgr servers. I tried restarting the
daemons and disabling/re-enabling the dashboard plugin; the new mgrs
won't listen on the dashboard port.
2. To (re)enable the dashboard plugin I had to use "--force"
# ceph mgr module enable dashboard
Error ENOENT: all mgr daemons do not support module 'dashboard',
pass --force to force enablement
3. When accessing the Cluster -> Manager modules menu in the dashboard
I get a 500 error message. The exact error below:


2019-09-10 15:01:39.270 7fb6d4916700  0 mgr[dashboard]
[10/Sep/2019:15:01:39] HTTP Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/cherrypy/_cprequest.py", line
656, in respond
response.body = self.handler()
  File "/usr/lib/python2.7/site-packages/cherrypy/lib/encoding.py",
line 188, in __call__
self.body = self.oldhandler(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/cherrypy/_cptools.py", line
221, in wrap
return self.newhandler(innerfunc, *args, **kwargs)
  File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 88,
in dashboard_exception_handler
return handler(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/cherrypy/_cpdispatch.py",
line 34, in __call__
return self.callable(*self.args, **self.kwargs)
  File "/usr/share/ceph/mgr/dashboard/controllers/__init__.py", line
649, in inner
ret = func(*args, **kwargs)
  File "/usr/share/ceph/mgr/dashboard/controllers/__init__.py", line
842, in wrapper
return func(*vpath, **params)
  File "/usr/share/ceph/mgr/dashboard/controllers/mgr_modules.py",
line 35, in list
obj['enabled'] = True
TypeError: 'NoneType' object does not support item assignment

2019-09-10 15:01:39.271 7fb6d4916700  0 mgr[dashboard]
[:::192.168.15.55:54860] [GET] [500] [0.014s] [admin] [1.3K]
/api/mgr/module
2019-09-10 15:01:39.272 7fb6d4916700  0 mgr[dashboard] ['{"status":
"500 Internal Server Error", "version": "3.2.2", "detail": "The server
encountered an unexpected condition which prevented it from fulfilling
the request.", "traceback": "Traceback (most recent call last):\\n
File \\"/usr/lib/python2.7/site-packages/cherrypy/_cprequest.py\\",
line 656, in respond\\nresponse.body = self.handler()\\n  File
\\"/usr/lib/python2.7/site-packages/cherrypy/lib/encoding.py\\", line
188, in __call__\\nself.body = self.oldhandler(*args, **kwargs)\\n
 File \\"/usr/lib/python2.7/site-packages/cherrypy/_cptools.py\\",
line 221, in wrap\\nreturn self.newhandler(innerfunc, *args,
**kwargs)\\n  File
\\"/usr/share/ceph/mgr/dashboard/services/exception.py\\", line 88, in
dashboard_exception_handler\\nreturn handler(*args, **kwargs)\\n
File \\"/usr/lib/python2.7/site-packages/cherrypy/_cpdispatch.py\\",
line 34, in __call__\\nreturn self.callable(*self.args,
**self.kwargs)\\n  File
\\"/usr/share/ceph/mgr/dashboard/controllers/__init__.py\\", line 649,
in inner\\nret = func(*args, **kwargs)\\n  File
\\"/usr/share/ceph/mgr/dashboard/controllers/__init__.py\\", line 842,
in wrapper\\nreturn func(*vpath, **params)\\n  File
\\"/usr/share/ceph/mgr/dashboard/controllers/mgr_modules.py\\", line
35, in list\\nobj[\'enabled\'] = True\\nTypeError: \'NoneType\'
object does not support item assignment\\n"}']
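The TypeError at the end means a lookup returned None and the code then tried `obj['enabled'] = True` on it. A minimal illustration of the failure pattern and the usual guard (hypothetical code, not the dashboard's actual implementation):

```python
def find_module(modules, name):
    """Return the dict describing a module, or None if it is unknown."""
    for mod in modules:
        if mod["name"] == name:
            return mod
    return None

modules = [{"name": "balancer"}, {"name": "status"}]

# Buggy pattern: assumes the lookup always succeeds
obj = find_module(modules, "dashboard")
try:
    obj["enabled"] = True  # raises TypeError when obj is None
except TypeError as exc:
    print("crash:", exc)

# Guarded pattern: only touch entries that were actually found
obj = find_module(modules, "dashboard")
if obj is not None:
    obj["enabled"] = True
```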


Anyone got the same problems after adding new manager nodes? Is there
something I'm missing here?

Thanks!
---
Alex Cucu


[ceph-users] Re: regurlary 'no space left on device' when deleting on cephfs

2019-09-10 Thread Kenneth Waegeman
We sync the file system without preserving hard links. But we take
snapshots after each sync, so I guess deleted files that are still
referenced by snapshots can also end up in the stray directories?


[root@mds02 ~]# ceph daemon mds.mds02 perf dump | grep -i 'stray\|purge'
    "finisher-PurgeQueue": {
    "num_strays": 990153,
    "num_strays_delayed": 32,
    "num_strays_enqueuing": 0,
    "strays_created": 753278,
    "strays_enqueued": 650603,
    "strays_reintegrated": 0,
    "strays_migrated": 0,


num_strays is indeed close to a million




[ceph-users] Re: Ceph FS not releasing space after file deletion

2019-09-10 Thread Patrick Donnelly
On Tue, Sep 3, 2019 at 3:39 PM Guilherme  wrote:
>
> Dear CEPHers,
> Adding some comments to my colleague's post: we are running Mimic 13.2.6  and 
> struggling with 2 issues (that might be related):
> 1) After a "lack of space" event we've tried to remove a 40TB file. The file 
> is not there anymore, but no space was released. No process is using the file 
> either.
> 2) There are many files in /lost+found (~25TB|~5%). Every time we try to 
> remove a file, MDS crashes ([1,2]).

Might be related to this ticket: https://tracker.ceph.com/issues/38452

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] [nautilus] Dashboard & RADOSGW

2019-09-10 Thread DHilsbos
All;

We're trying to add a RADOSGW instance to our new production cluster, and it's 
not showing in the dashboard, or in ceph -s.

The cluster is running 14.2.2, and the RADOSGW got 14.2.3.

systemctl status ceph-radosgw@rgw.s700037 returns: active (running).

ss -ntlp does NOT show port 80.

Here's the ceph.conf on the system:
[global]
fsid = effc5134-e0cc-4628-a079-d67b60071f90
mon initial members = s700034,s700035,s700036
mon host = 
[v1:10.0.80.10:6789/0,v2:10.0.80.10:3300/0],[v1:10.0.80.11:6789/0,v2:10.0.80.11:3300/0],[v1:10.0.80.12:6789/0,v2:10.0.80.12:3300/0]
public network = 10.0.80.0/24
cluster network = 10.0.88.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 1024
osd pool default size = 3
osd pool default min size = 2
osd pool default pg num = 8
osd pool default pgp num = 8

[client.rgw.s700037]
host = s700037.performair.local
rgw frontends = "civetweb port=80"
rgw dns name = radosgw.performair.local

Any thoughts on what I'm missing?
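
Since `ss -ntlp` shows nothing on port 80, the frontend almost certainly never bound the socket. A self-contained probe sketch (host and port here are placeholders) for scripting that check:

```python
import socket

def port_is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run on the rgw node itself; adjust host/port to match the civetweb frontend.
print(port_is_listening("127.0.0.1", 80))
```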

I'm also seeing these in the manager's logs:
2019-09-10 15:49:43.946 7efe6eee1700  0 mgr[dashboard] [10/Sep/2019:15:49:43] 
ENGINE Error in HTTPServer.tick
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/wsgiserver2.py", 
line 1837, in start
self.tick()
  File "/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/wsgiserver2.py", 
line 1902, in tick
s, ssl_env = self.ssl_adapter.wrap(s)
  File "/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/ssl_builtin.py", 
line 52, in wrap
keyfile=self.private_key, ssl_version=ssl.PROTOCOL_SSLv23)
  File "/usr/lib64/python2.7/ssl.py", line 934, in wrap_socket
ciphers=ciphers)
  File "/usr/lib64/python2.7/ssl.py", line 609, in __init__
self.do_handshake()
  File "/usr/lib64/python2.7/ssl.py", line 831, in do_handshake
self._sslobj.do_handshake()
SSLError: [SSL: SSLV3_ALERT_CERTIFICATE_UNKNOWN] sslv3 alert certificate 
unknown (_ssl.c:618)

Thoughts on this?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Manager plugins issues on new ceph-mgr nodes

2019-09-10 Thread DHilsbos
Alexander;

What is your operating system?

Is it possible that the dashboard module isn't installed?

I've run into "Error ENOENT: all mgr daemons do not support module 'dashboard'" 
on my CentOS 7 machines, where the module is a separate package (I had to use 
"yum install ceph-mgr-dashboard" to get the dashboard module). 

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Alexandru Cucu [mailto:m...@alexcucu.ro] 
Sent: Tuesday, September 10, 2019 5:23 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Manager plugins issues on new ceph-mgr nodes

Hello,

Running 14.2.3, updated from 14.2.1.
Until recently I've had ceph-mgr collocated with OSDs. I've installed
ceph-mgr on separate servers and everything looks OK in Ceph status
but there are multiple issues:

1. Dashboard only runs on old mgr servers. Tried restarting the
daemons and disable/enable the dashboard plugin. New mgr won't listen
on the dashboard port.
2. To (re)enable the dashboard plugin I had to use "--force"
# ceph mgr module enable dashboard
Error ENOENT: all mgr daemons do not support module 'dashboard',
pass --force to force enablement
3. When accessing the Cluster -> Manager modules menu in the dashboard
I get a 500 error message. The exact error below:


2019-09-10 15:01:39.270 7fb6d4916700  0 mgr[dashboard]
[10/Sep/2019:15:01:39] HTTP Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/cherrypy/_cprequest.py", line
656, in respond
response.body = self.handler()
  File "/usr/lib/python2.7/site-packages/cherrypy/lib/encoding.py",
line 188, in __call__
self.body = self.oldhandler(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/cherrypy/_cptools.py", line
221, in wrap
return self.newhandler(innerfunc, *args, **kwargs)
  File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 88,
in dashboard_exception_handler
return handler(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/cherrypy/_cpdispatch.py",
line 34, in __call__
return self.callable(*self.args, **self.kwargs)
  File "/usr/share/ceph/mgr/dashboard/controllers/__init__.py", line
649, in inner
ret = func(*args, **kwargs)
  File "/usr/share/ceph/mgr/dashboard/controllers/__init__.py", line
842, in wrapper
return func(*vpath, **params)
  File "/usr/share/ceph/mgr/dashboard/controllers/mgr_modules.py",
line 35, in list
obj['enabled'] = True
TypeError: 'NoneType' object does not support item assignment

2019-09-10 15:01:39.271 7fb6d4916700  0 mgr[dashboard]
[:::192.168.15.55:54860] [GET] [500] [0.014s] [admin] [1.3K]
/api/mgr/module
2019-09-10 15:01:39.272 7fb6d4916700  0 mgr[dashboard] ['{"status":
"500 Internal Server Error", "version": "3.2.2", "detail": "The server
encountered an unexpected condition which prevented it from fulfilling
the request.", "traceback": "Traceback (most recent call last):\\n
File \\"/usr/lib/python2.7/site-packages/cherrypy/_cprequest.py\\",
line 656, in respond\\nresponse.body = self.handler()\\n  File
\\"/usr/lib/python2.7/site-packages/cherrypy/lib/encoding.py\\", line
188, in __call__\\nself.body = self.oldhandler(*args, **kwargs)\\n
 File \\"/usr/lib/python2.7/site-packages/cherrypy/_cptools.py\\",
line 221, in wrap\\nreturn self.newhandler(innerfunc, *args,
**kwargs)\\n  File
\\"/usr/share/ceph/mgr/dashboard/services/exception.py\\", line 88, in
dashboard_exception_handler\\nreturn handler(*args, **kwargs)\\n
File \\"/usr/lib/python2.7/site-packages/cherrypy/_cpdispatch.py\\",
line 34, in __call__\\nreturn self.callable(*self.args,
**self.kwargs)\\n  File
\\"/usr/share/ceph/mgr/dashboard/controllers/__init__.py\\", line 649,
in inner\\nret = func(*args, **kwargs)\\n  File
\\"/usr/share/ceph/mgr/dashboard/controllers/__init__.py\\", line 842,
in wrapper\\nreturn func(*vpath, **params)\\n  File
\\"/usr/share/ceph/mgr/dashboard/controllers/mgr_modules.py\\", line
35, in list\\nobj[\'enabled\'] = True\\nTypeError: \'NoneType\'
object does not support item assignment\\n"}']
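
The traceback ends in `obj['enabled'] = True` with `obj` being None, i.e. an enabled module name had no matching entry in the list being annotated. A defensive sketch of that pattern (names and dict shapes are illustrative, not the dashboard's actual schema):

```python
def mark_enabled(available, enabled_names):
    """Flag entries of `available` whose name is in `enabled_names`,
    skipping names with no matching entry instead of crashing."""
    for name in enabled_names:
        obj = next((m for m in available if m["name"] == name), None)
        if obj is None:
            # Without this guard, obj["enabled"] = True raises the
            # TypeError seen in the log above.
            continue
        obj["enabled"] = True
    return available

mods = [{"name": "dashboard", "enabled": False}]
mark_enabled(mods, ["dashboard", "telemetry"])  # "telemetry" has no entry
print(mods)
```

Here the missing "telemetry" entry is simply skipped and "dashboard" ends up marked enabled, instead of the request failing with a 500.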


Anyone got the same problems after adding new manager nodes? Is there
something I'm missing here?

Thanks!
---
Alex Cucu
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [nautilus] Dashboard & RADOSGW

2019-09-10 Thread DHilsbos
All;

I found the problem, it was an identity issue.

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: dhils...@performair.com [mailto:dhils...@performair.com] 
Sent: Tuesday, September 10, 2019 3:52 PM
To: ceph-users@ceph.io
Cc: Stephen Self
Subject: [ceph-users] [nautilus] Dashboard & RADOSGW

All;

We're trying to add a RADOSGW instance to our new production cluster, and it's 
not showing in the dashboard, or in ceph -s.

The cluster is running 14.2.2, and the RADOSGW got 14.2.3.

systemctl status ceph-radosgw@rgw.s700037 returns: active (running).

ss -ntlp does NOT show port 80.

Here's the ceph.conf on the system:
[global]
fsid = effc5134-e0cc-4628-a079-d67b60071f90
mon initial members = s700034,s700035,s700036
mon host = 
[v1:10.0.80.10:6789/0,v2:10.0.80.10:3300/0],[v1:10.0.80.11:6789/0,v2:10.0.80.11:3300/0],[v1:10.0.80.12:6789/0,v2:10.0.80.12:3300/0]
public network = 10.0.80.0/24
cluster network = 10.0.88.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 1024
osd pool default size = 3
osd pool default min size = 2
osd pool default pg num = 8
osd pool default pgp num = 8

[client.rgw.s700037]
host = s700037.performair.local
rgw frontends = "civetweb port=80"
rgw dns name = radosgw.performair.local

Any thoughts on what I'm missing?

I'm also seeing these in the manager's logs:
2019-09-10 15:49:43.946 7efe6eee1700  0 mgr[dashboard] [10/Sep/2019:15:49:43] 
ENGINE Error in HTTPServer.tick
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/wsgiserver2.py", 
line 1837, in start
self.tick()
  File "/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/wsgiserver2.py", 
line 1902, in tick
s, ssl_env = self.ssl_adapter.wrap(s)
  File "/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/ssl_builtin.py", 
line 52, in wrap
keyfile=self.private_key, ssl_version=ssl.PROTOCOL_SSLv23)
  File "/usr/lib64/python2.7/ssl.py", line 934, in wrap_socket
ciphers=ciphers)
  File "/usr/lib64/python2.7/ssl.py", line 609, in __init__
self.do_handshake()
  File "/usr/lib64/python2.7/ssl.py", line 831, in do_handshake
self._sslobj.do_handshake()
SSLError: [SSL: SSLV3_ALERT_CERTIFICATE_UNKNOWN] sslv3 alert certificate 
unknown (_ssl.c:618)

Thoughts on this?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: regurlary 'no space left on device' when deleting on cephfs

2019-09-10 Thread Yan, Zheng
On Wed, Sep 11, 2019 at 6:51 AM Kenneth Waegeman
 wrote:
>
> We sync the file system without preserving hard links. But we take
> snapshots after each sync, so I guess deleting files which are still in
> snapshots can also be in the stray directories?
>
> [root@mds02 ~]# ceph daemon mds.mds02 perf dump | grep -i 'stray\|purge'
>  "finisher-PurgeQueue": {
>  "num_strays": 990153,
>  "num_strays_delayed": 32,
>  "num_strays_enqueuing": 0,
>  "strays_created": 753278,
>  "strays_enqueued": 650603,
>  "strays_reintegrated": 0,
>  "strays_migrated": 0,
>
>
> num_strays is indeed close to a million
>
>

The issue is related to snapshots: snap inodes stay in the stray
directory until the snapshots referencing them are deleted. I suggest
deleting some old snapshots.
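
If old snapshots are the culprit, they can be listed and removed through the hidden .snap directory of a mounted CephFS tree. A sketch with placeholder mount point and snapshot name:

```shell
# List snapshots taken at this point in the tree.
ls /mnt/cephfs/.snap

# Removing a snapshot is an rmdir of its entry; the snap inodes it was
# pinning can then leave the stray directories and be purged.
rmdir /mnt/cephfs/.snap/snap-2019-08-01
```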

> On 10/09/2019 12:42, Burkhard Linke wrote:
> > Hi,
> >
> >
> > do you use hard links in your workload? The 'no space left on device'
> > message may also refer to too many stray files. Strays are either
> > files that are queued for deletion (e.g. via the purge queue), or
> > files which have been deleted but still have hard links pointing to
> > their content. Since cephfs does not use an indirection layer between
> > inodes and data, and the data chunks are named after the inode id,
> > removing the original file leaves stray entries behind, because cephfs
> > is not able to rename the underlying rados objects.
> >
> >
> > There are 10 hidden directories for stray files, and given a maximum
> > size of 100,000 entries each, you can store only up to 1 million
> > entries in total. I don't know exactly how entries are distributed
> > among the 10 directories, so the limit may be reached earlier for a
> > single stray directory. The performance counters contain some values
> > for strays, so they are easy to check. The daemonperf output also
> > shows the current value.
> >
> >
> > The problem of the upper limit on directory entries was solved by
> > directory fragmentation, so you should check whether fragmentation is
> > allowed in your filesystem. You can also try to increase the upper
> > directory entry limit, but this might lead to other problems (too
> > large rados omap objects).
> >
> >
> > Regards,
> >
> > Burkhard
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph FS not releasing space after file deletion

2019-09-10 Thread Yan, Zheng
On Wed, Sep 4, 2019 at 6:39 AM Guilherme  wrote:
>
> Dear CEPHers,
> Adding some comments to my colleague's post: we are running Mimic 13.2.6  and 
> struggling with 2 issues (that might be related):
> 1) After a "lack of space" event we've tried to remove a 40TB file. The file 
> is not there anymore, but no space was released. No process is using the file 
> either.
> 2) There are many files in /lost+found (~25TB|~5%). Every time we try to 
> remove a file, MDS crashes ([1,2]).
>
> The Dennis Kramer's case [3] led me to believe that I need to recreate the FS.
> But I refuse to (dis)believe that CEPH hasn't a  repair tool for it.
> I thought "cephfs-table-tool take_inos" could be the answer to my problem,
> but the message [4] was not clear enough.
> Can I run the command without resetting the inodes?
>
> [1] Error at ceph -w - https://pastebin.com/imNqBdmH
> [2] Error at mds.log - https://pastebin.com/rznkzLHG

For the mds crash issue, 'cephfs-data-scan scan_link' from the nautilus
release (14.2.2) should fix it; it repairs the snaptable. You don't need
to upgrade the whole cluster. Just install nautilus on a temp machine or
compile ceph from source.

> [3] Discussion - 
> http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2018-July/027845.html
> [4] Discussion - 
> http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2018-July/027935.html
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 2 OpenStack environment, 1 Ceph cluster

2019-09-10 Thread vladimir franciz blando
Thanks for the added info, appreciate it.

- Vlad


On Tue, Sep 10, 2019 at 5:37 PM Massimo Sgaravatto <
massimo.sgarava...@gmail.com> wrote:

> We have a single ceph cluster used by 2 openstack installations.
>
> We use different ceph pools for the 2 openstack clusters.
> For nova, cinder and glance this is straightforward.
>
> It was a bit more complicated for radosgw. In this case the setup I used
> was:
>
> - creating 2 realms (one for each cloud)
> - creating one zonegroup for each realm
> - creating one zone for each zonegroup
> - having one or more rgw instances for each zone
>
> I don't know if there are simpler approaches
>
> Cheers, Massimo
>
> On Tue, Sep 10, 2019 at 11:20 AM Wesley Peng  wrote:
>
>>
>>
>> on 2019/9/10 17:14, vladimir franciz blando wrote:
>> > I have 2 OpenStack environments that I want to integrate with an existing
>> > Ceph cluster. I know technically it can be done, but has anyone tried
>> this?
>> >
>>
>> Sure you can. Ceph can be deployed as a separate storage service;
>> OpenStack is just one of its clients. A single Ceph cluster can serve
>> one client environment or several.
>>
>> regards.
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
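
The realm/zonegroup/zone layout Massimo describes above can be sketched with radosgw-admin roughly as below. Names (cloud1, etc.) are placeholders; repeat the block for the second cloud, and a real setup also needs endpoints and per-zone system keys:

```shell
# One realm per OpenStack cloud.
radosgw-admin realm create --rgw-realm=cloud1 --default

# One zonegroup in that realm...
radosgw-admin zonegroup create --rgw-zonegroup=cloud1-zg \
    --rgw-realm=cloud1 --master --default

# ...with one zone, served by one or more rgw instances.
radosgw-admin zone create --rgw-zonegroup=cloud1-zg \
    --rgw-zone=cloud1-zone --master --default

# Commit the configuration period for this realm.
radosgw-admin period update --rgw-realm=cloud1 --commit
```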
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io