[ceph-users] Re: Setting up a small experimental CEPH network

2020-09-21 Thread Anthony D'Atri
Depending on what you're looking to accomplish, setting up a cluster in VMs 
(VirtualBox, Fusion, a cloud provider, etc.) may meet your needs without having to 
buy anything.

> 
> - Don't think having a few 1Gbit can replace a >10Gbit. Ceph doesn't use 
> such bonds optimal. I already asked about this years ago. Having a 10Gbe 
> might make a SBC solution more costly than estimated.

10GE is out of scope for sure, but a note about bonding:  more than once I’ve 
seen NICs bonded improperly, or tested improperly.

* Bonding doesn’t make a single faster interface - you need multiple clients to 
be able to utilize it well
* If the hash policy isn’t set right, it’s easy to have *all* of your traffic 
routed over a single link, and your throughput will be capped at half what you 
expect
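
For example, a single-stream benchmark will never show more than one link's worth 
of throughput over a bond. A hedged sketch of a more representative test (host 
name and stream count are placeholders; iperf3 is assumed to be available):

  # on one of the OSD hosts
  iperf3 -s
  # from several clients in parallel, each with multiple streams
  iperf3 -c osd-host-1 -P 8 -t 30

Only with many concurrent flows, ideally from multiple clients, do you see what 
the bond can actually deliver.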
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Setting up a small experimental CEPH network

2020-09-21 Thread Marc Roos
 
I tested something in the past[1] where I noticed that an OSD 
saturated a bond link and did not use the available 2nd one. I think I 
may have made a mistake in writing down that it was a 1x replicated pool. 
However, it has been written here multiple times that these OSD processes 
are single-threaded, so AFAIK they cannot use more than one link, and the 
moment your OSD has a saturated link, your clients will notice it.


[1]
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg35474.html



-Original Message-
From: Lindsay Mathieson [mailto:lindsay.mathie...@gmail.com] 
Sent: Monday, 21 September 2020 2:42
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Setting up a small experimental CEPH network

On 21/09/2020 5:40 am, Stefan Kooman wrote:
> My experience with bonding and Ceph is pretty good (OpenvSwitch). Ceph 

> uses lots of tcp connections, and those can get shifted (balanced) 
> between interfaces depending on load.

Same here - I'm running 4*1GB (LACP, Balance-TCP) on a 5 node cluster 
with 19 OSD's. 20 Active VM's and it idles at under 1 MiB/s, spikes up 
to 100MiB/s no problem. When doing a heavy rebalance/repair data rates 
on any one node can hit 400MiBs+


It scales out really well.

--
Lindsay
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: What is the advice, one disk per OSD, or multiple disks

2020-09-21 Thread Robert Sander
On 21.09.20 14:29, Kees Bakker wrote:

> Being new to CEPH, I need some advice how to setup a cluster.
> Given a node that has multiple disks, should I create one OSD for
> all disks, or is it better to have one OSD per disk.

The general rule is one OSD per disk.

There may be an exception with very fast devices like NVMe where one OSD
is not able to fully use the available IO bandwidth. NVMes can have two
OSDs per device.

But you would not create one OSD over multiple devices.
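
For illustration, ceph-volume can also carve several OSDs out of one fast device; 
a minimal sketch, assuming a stock ceph-volume and a placeholder device name:

  ceph-volume lvm batch --osds-per-device 2 /dev/nvme0n1

For ordinary HDDs/SSDs the usual one-device-one-OSD form is
ceph-volume lvm create --data /dev/sdX.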

Regards
-- 
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 93818 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph 14.2.8 tracing ceph with blkin compile error

2020-09-21 Thread 陈晓波
I want to trace a request through Ceph processing in version 14.2.8.
I enabled blkin with do_cmake.sh -DWITH_BLKIN=ON, but the compile
fails with the error below:

../lib/libblkin.a(tp.c.o): undefined reference to symbol 'lttng_probe_register'
//lib64/liblttng-ust.so.0: error adding symbols: DSO missing from command line

collect2: error: ld returned 1 exit status
make[2]: *** [src/CMakeFiles/ceph-osd.dir/build.make:130: bin/ceph-osd] Error 1
make[1]: *** Waiting for unfinished jobs...





___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Understanding what ceph-volume does, with bootstrap-osd/ceph.keyring, tmpfs

2020-09-21 Thread Marc Roos



When I create a new encrypted OSD with ceph-volume[1], 

I assume something like this is being done; please correct whatever is 
wrong.

- it creates the pv on the block device
- it creates the ceph vg on the block device
- it creates the osd lv in the vg
- it uses cryptsetup to encrypt this lv
  (or is there some internal support for luks in lvm?)
- it sets all the tags on the vg (shown by: lvs -o lv_tags vg)
- it creates and enables ceph-volume@lvm-osdid-osdfsid
- it creates and enables ceph-osd@osdid

When a node is restarted, these lvm osds are started with
- running ceph-volume@lvm-osdid-osdfsid (creating this tmpfs mount?)
- running ceph-osd@osdid
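
For reference, a hedged set of commands to inspect what ceph-volume actually set 
up (the osd id 40 and the fsid are placeholders; all commands are stock 
ceph-volume/LVM/systemd):

  ceph-volume lvm list                   # lv, tags and osd fsid per OSD
  lvs -o lv_name,lv_tags                 # the raw LVM tags ceph-volume wrote
  systemctl status ceph-volume@lvm-40-<osd-fsid> ceph-osd@40
  findmnt /var/lib/ceph/osd/ceph-40      # the tmpfs holding the small metadata files

At boot the ceph-volume@lvm-<id>-<fsid> unit re-runs the activation step (the 
same thing ceph-volume lvm activate does), which recreates that tmpfs and then 
starts ceph-osd@<id>.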


Q1: I had to create bootstrap-osd/ceph.keyring (ownership root.root). 
For what is that being used? Does it need to exist upon node restart?

Q2: I had some issues with a node starting, which I solved by adding 
nofail to the fstab. How is this done with ceph-volume?

Q3: Why these strange permissions on the mounted folder? 
drwxrwxrwt  2 ceph ceph 340 Sep 19 15:24 ceph-40

Q4: Where is this luks passphrase stored?

Q5: Where does this tmpfs+content come from? How can I mount this myself 
from the command line?

Q6: My lvm tags show ceph.crush_device_class=None, while ceph osd tree 
shows the correct class. Is this correct?

Q7: I saw in my ceph-volume output sometimes 'disabling cephx', what 
does this mean? How can I verify this and fix it?

Links to manuals are also welcome; the ceph-volume docs[2] are not too clear 
about this.


[1]
ceph-volume lvm create --data /dev/sdk --dmcrypt

[2]
https://docs.ceph.com/en/latest/ceph-volume/lvm/activate/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: What is the advice, one disk per OSD, or multiple disks

2020-09-21 Thread Wout van Heeswijk
Just to expand on Robert's answer.

If all devices are of the same class (HDD/SSD/NVMe) then a one-to-one 
relationship is most likely the best choice.

If you have very fast devices it might be good to have multiple OSDs on one 
device, at the cost of some complexity.

If you have devices of multiple classes (HDDs and SSDs, for example) it might be 
a good idea to offload some of the OSD's internal data onto the faster devices. 
This is done by placing the write-ahead log and/or the RocksDB database on the 
faster device.
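
As an illustration of that layout (device and partition names are placeholders; 
--block.db and --block.wal are standard ceph-volume options):

  # data on the HDD, RocksDB on a fast partition
  ceph-volume lvm create --data /dev/sdb --block.db /dev/nvme0n1p1
  # or only the WAL on the fast device
  ceph-volume lvm create --data /dev/sdc --block.wal /dev/nvme0n1p2

In practice the fast device is usually split into one partition or LV per OSD 
that shares it.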

Kind regards,

Wout
42on


From: Robert Sander 
Sent: Monday, September 21, 2020 3:09 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: What is the advice, one disk per OSD, or multiple 
disks

On 21.09.20 14:29, Kees Bakker wrote:

> Being new to CEPH, I need some advice how to setup a cluster.
> Given a node that has multiple disks, should I create one OSD for
> all disks, or is it better to have one OSD per disk.

The general rule is one OSD per disk.

There may be an exception with very fast devices like NVMe where one OSD
is not able to fully use the available IO bandwidth. NVMes can have two
OSDs per device.

But you would not create one OSD over multiple devices.

Regards
--
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 93818 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Setting up a small experimental CEPH network

2020-09-21 Thread Anthony D'Atri

> we use heavily bonded interfaces (6x10G) and also needed to look at this 
> balancing question. We use LACP bonding and, while the host OS probably tries 
> to balance outgoing traffic over all NICs

> I tested something in the past[1] where I could notice that an osd
> staturated a bond link and did not use the available 2nd one.

This is exactly what I wrote about, and it doesn’t have to be this way.

When using Linux bonding, be sure to set the xmit hash policy to layer3+4 and 
the mode on both sides to active/active.  

active/backup cuts your potential bandwidth, and a layer 1 / config problem on 
the backup link will stay latent until you need it most, e.g. when you do switch 
maintenance and assume that your bonds will all fail over for continuity.
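
A minimal sketch of that with plain iproute2 (interface names are placeholders; 
most setups express the same thing in the distro's network configuration):

  ip link add bond0 type bond mode 802.3ad xmit_hash_policy layer3+4 miimon 100 lacp_rate fast
  ip link set eno1 down && ip link set eno1 master bond0
  ip link set eno2 down && ip link set eno2 master bond0
  ip link set bond0 up

The switch side then needs the matching port-channel configured with LACP active 
as well.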


> However it has been written here multiple times that these osd processes
> are single thread, so afaik they cannot use more than on link, and at
> the moment your osd has a saturated link, your clients will notice this.

Threads have nothing to do with links. Even if they did, real clusters have 
multiple OSDs per node, right?

> [1] https://www.mail-archive.com/ceph-users@lists.ceph.com/msg35474.html


Context, brother, context.

"1 osd per node cluster” 

“This is typical for a 'single line of communication' using lacp. Afaik 
the streams to the nodes are independent from each other anyway, so 
maybe it is possible to 'fork' the transmitting process, so linux can 
detect it as a separate stream and thus use the other link.”


This is NOT typical of production, WDL’s microserver experiment notwithstanding.


In real life, you’re going to have, what, at least 8 OSDs per node?  Each with 
streams to multiple clients (and other OSDs).  With dozens/hundreds/thousands 
of streams and a proper bonding (or equal-cost routing) setup, the *streams* 
are going to be hashed across available links by the bonding driver.


> 
> 
> 
> -Original Message-
> From: Lindsay Mathieson [mailto:lindsay.mathie...@gmail.com]
> Sent: Monday, 21 September 2020 2:42
> To: ceph-users@ceph.io
> Subject: [ceph-users] Re: Setting up a small experimental CEPH network
> 
> On 21/09/2020 5:40 am, Stefan Kooman wrote:
>> My experience with bonding and Ceph is pretty good (OpenvSwitch). Ceph
> 
>> uses lots of tcp connections, and those can get shifted (balanced)
>> between interfaces depending on load.
> 
> Same here - I'm running 4*1GB (LACP, Balance-TCP) on a 5 node cluster
> with 19 OSD's. 20 Active VM's and it idles at under 1 MiB/s, spikes up
> to 100MiB/s no problem. When doing a heavy rebalance/repair data rates
> on any one node can hit 400MiBs+
> 
> 
> It scales out really well.
> 
> --
> Lindsay
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> email to ceph-users-le...@ceph.io
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Troubleshooting stuck unclean PGs?

2020-09-21 Thread Matt Larson
Hi,

 Our Ceph cluster is reporting several PGs that have not been scrubbed
or deep-scrubbed in time. It has been over a week since these PGs were last
scrubbed. When I checked `ceph health detail`, there are 29 PGs
not deep-scrubbed in time and 22 PGs not scrubbed in time. I tried to
manually start a scrub on the PGs, but it appears that they are
actually in an unclean state that needs to be resolved first.

This is a cluster running:
 ceph version 15.2.1 (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)

 Following the information at [Troubleshooting
PGs](https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-pg/),
I checked for PGs that are stuck stale | inactive | unclean. There
were no PGs that are stale or inactive, but there are several that are
stuck unclean:

 ```
PG_STAT  STATE                          UP                                UP_PRIMARY  ACTING                            ACTING_PRIMARY
8.3c     active+remapped+backfill_wait  [124,41,108,8,87,16,79,157,49]    124         [139,57,16,125,154,65,109,86,45]  139
8.3e     active+remapped+backfill_wait  [108,2,58,146,130,29,37,66,118]   108         [127,92,24,50,33,6,130,66,149]    127
8.3f     active+remapped+backfill_wait  [19,34,86,132,59,78,153,99,6]     19          [90,45,147,4,105,61,30,66,125]    90
8.40     active+remapped+backfill_wait  [19,131,80,76,42,101,61,3,144]    19          [28,106,132,3,151,36,65,60,83]    28
8.3a     active+remapped+backfilling    [32,72,151,30,103,131,62,84,120]  32          [91,60,7,133,101,117,78,20,158]   91
8.7e     active+remapped+backfill_wait  [108,2,58,146,130,29,37,66,118]   108         [127,92,24,50,33,6,130,66,149]    127
8.3b     active+remapped+backfill_wait  [34,113,148,63,18,95,70,129,13]   34          [66,17,132,90,14,52,101,47,115]   66
8.7f     active+remapped+backfill_wait  [19,34,86,132,59,78,153,99,6]     19          [90,45,147,4,105,61,30,66,125]    90
8.78     active+remapped+backfill_wait  [96,113,159,63,29,133,73,8,89]    96          [138,121,15,103,55,41,146,69,18]  138
8.7d     active+remapped+backfilling    [0,90,60,124,159,19,71,101,135]   0           [150,72,124,129,63,10,94,29,41]   150
8.7c     active+remapped+backfill_wait  [124,41,108,8,87,16,79,157,49]    124         [139,57,16,125,154,65,109,86,45]  139
8.79     active+remapped+backfill_wait  [59,15,41,82,131,20,73,156,113]   59          [13,51,120,102,29,149,42,79,132]  13
```

If I query one of the PGs that is backfilling, 8.3a, it shows its state as:
"recovery_state": [
{
"name": "Started/Primary/Active",
"enter_time": "2020-09-19T20:45:44.027759+",
"might_have_unfound": [],
"recovery_progress": {
"backfill_targets": [
"30(3)",
"32(0)",
"62(6)",
"72(1)",
"84(7)",
"103(4)",
"120(8)",
"131(5)",
"151(2)"
],

Q1: Is there anything that I should check/fix to enable the PGs to
resolve from the `unclean` state?
Q2: I have also seen that the podman containers on one of our OSD
servers are taking large amounts of disk space. Is there a way to
limit the growth of disk space for podman containers, when
administering a Ceph cluster using `cephadm` tools? At last check, a
server running 16 OSDs and 1 MON is using 39G of disk space for its
running containers. Can restarting containers help to start with a
fresh slate or reduce the disk use?

Thanks,
  Matt



Matt Larson
Associate Scientist
Computer Scientist/System Administrator
UW-Madison Cryo-EM Research Center
433 Babcock Drive, Madison, WI 53706
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] What is the advice, one disk per OSD, or multiple disks

2020-09-21 Thread Kees Bakker
Hello,

Being new to Ceph, I need some advice on how to set up a cluster.
Given a node that has multiple disks, should I create one OSD for
all disks, or is it better to have one OSD per disk?
-- 
Kees Bakker
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Setting up a small experimental CEPH network

2020-09-21 Thread Frank Schilder
Hi all,

we use heavily bonded interfaces (6x10G) and also needed to look at this 
balancing question. We use LACP bonding and, while the host OS probably tries 
to balance outgoing traffic over all NICs, the real decision is made by the 
switches (incoming traffic). Our switches hash packets to a port by (source?) 
MAC address, meaning that it is not the number of TCP/IP connections that helps 
balancing, but only the number of MAC addresses. In an LACP bond, all NICs have 
the same MAC address and balancing happens by (physical) host. The more hosts, 
the better it will work.

In a way, for us this both is and isn't a problem. We have about 550 
physical clients (an HPC cluster) and 12 OSD hosts, which means that we 
probably have a good load on every single NIC for client traffic.

On the other hand, rebalancing between 12 servers is unlikely to use all NICs 
effectively. So far, we don't have enough disks per host to notice that, but it 
could become visible at some point. Basically, the host with the worst 
switch-side hashing for incoming traffic will become the bottleneck.

On some switches the hashing method for LACP bonds can be configured, though 
not in much detail. I have not seen an option to hash on IP:PORT when choosing 
a switch port.

I have no experience with bonding mode 6 (ALB) that might provide a 
per-connection hashing. Would be interested to hear how it performs.
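
For anyone comparing setups, a quick way to see what a Linux bond negotiated 
(bond0 is a placeholder name):

  grep -iE 'bonding mode|hash policy' /proc/net/bonding/bond0

This prints the active mode (e.g. IEEE 802.3ad) and the transmit hash policy, 
though as described above the switch-side hashing still decides how incoming 
traffic is spread.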

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Marc Roos 
Sent: 21 September 2020 11:08:55
To: ceph-users; lindsay.mathieson
Subject: [ceph-users] Re: Setting up a small experimental CEPH network

I tested something in the past[1] where I noticed that an OSD
saturated a bond link and did not use the available 2nd one. I think I
may have made a mistake in writing down that it was a 1x replicated pool.
However, it has been written here multiple times that these OSD processes
are single-threaded, so AFAIK they cannot use more than one link, and the
moment your OSD has a saturated link, your clients will notice it.


[1]
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg35474.html



-Original Message-
From: Lindsay Mathieson [mailto:lindsay.mathie...@gmail.com]
Sent: Monday, 21 September 2020 2:42
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Setting up a small experimental CEPH network

On 21/09/2020 5:40 am, Stefan Kooman wrote:
> My experience with bonding and Ceph is pretty good (OpenvSwitch). Ceph

> uses lots of tcp connections, and those can get shifted (balanced)
> between interfaces depending on load.

Same here - I'm running 4*1GB (LACP, Balance-TCP) on a 5 node cluster
with 19 OSD's. 20 Active VM's and it idles at under 1 MiB/s, spikes up
to 100MiB/s no problem. When doing a heavy rebalance/repair data rates
on any one node can hit 400MiBs+


It scales out really well.

--
Lindsay
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Mount CEPH-FS on multiple hosts with concurrent access to the same data objects?

2020-09-21 Thread René Bartsch
I'm new on the list,

so a "Hello" to all! :-)

We're planning a Proxmox-Cluster. The data-center operator advised to
use a virtual machine with NFS on top of a single CEPH-FS instance to
mount the shared CEPH-FS storage on multiple hosts/VMs.

As this NFS/CEPH-FS VM could be a bottleneck, I was wondering if CEPH-FS
is capable of managing concurrent access and locking itself.

Is it possible to mount CEPH-FS on multiple hosts (e.g. /srv),
all accessing the same data objects, without data loss or deadlocks from
concurrent access?

Will this perform better than a single NFS/CEPH-FS instance (VM)?

Thanx for any hint

Renne
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mount CEPH-FS on multiple hosts with concurrent access to the same data objects?

2020-09-21 Thread Wout van Heeswijk
Hi Rene,

Yes, CephFS is a good filesystem for concurrent writing. When using CephFS with 
NFS Ganesha you can even scale out.

It will perform better, but why not simply mount CephFS inside the VMs?
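
For reference, a hedged sketch of mounting CephFS directly with the kernel 
client on each host/VM (monitor addresses, user name and secret file path are 
placeholders):

  mount -t ceph 10.0.0.1,10.0.0.2,10.0.0.3:/ /srv \
    -o name=cephfs-user,secretfile=/etc/ceph/cephfs-user.secret

The MDS coordinates access and locking, so every client sees a consistent view 
of the same files.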

Kind regards,

Wout
42on


From: René Bartsch 
Sent: Monday, September 21, 2020 6:44 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Mount CEPH-FS on multiple hosts with concurrent access to 
the same data objects?

I'm new on the list,

so a "Hello" to all! :-)

We're planning a Proxmox-Cluster. The data-center operator advised to
use a virtual machine with NFS on top of a single CEPH-FS instance to
mount the shared CEPH-FS storage on multiple hosts/VMs.

As this NFS/CEPH-FS-VM could be a bottle-neck I was wondering if CEPH-
FS is capable to manage concurrent access and locking itself.

Is it possible to mount CEPH-FS instances on multiple hosts (e.g. /srv)
 all accessing the same data objects without data-loss or dead-locks by
concurrent access?

Will this perform better than a single NFS/CEPH-FS instance (VM)?

Thanx for any hint

Renne
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Troubleshooting stuck unclean PGs?

2020-09-21 Thread Wout van Heeswijk
Hi Matt,

The mon data can grow while PGs are stuck unclean. Don't restart the mons.

You need to find out why your placement groups are "backfill_wait". Likely some 
of your OSDs are (near)full.

If you have space elsewhere you can use the ceph balancer module or reweighting 
of OSDs to rebalance data.
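
For reference, a sketch of the two options just mentioned (both are standard 
commands; the reweight threshold of 120 is only an example):

  ceph balancer mode upmap
  ceph balancer on
  # or the older approach:
  ceph osd reweight-by-utilization 120

Progress can be followed with ceph balancer status and ceph osd df.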

Scrubbing will continue once the PGs are "active+clean"

Kind regards,

Wout
42on


From: Matt Larson 
Sent: Monday, September 21, 2020 6:22 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Troubleshooting stuck unclean PGs?

Hi,

 Our Ceph cluster is reporting several PGs that have not been scrubbed
or deep scrubbed in time. It is over a week for these PGs to have been
scrubbed. When I checked the `ceph health detail`, there are 29 pgs
not deep-scrubbed in time and 22 pgs not scrubbed in time. I tried to
manually start a scrub on the PGs, but it appears that they are
actually in an unclean state that needs to be resolved first.

This is a cluster running:
 ceph version 15.2.1 (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)

 Following the information at [Troubleshooting
PGs](https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-pg/),
I checked for PGs that are stuck stale | inactive | unclean. There
were no PGs that are stale or inactive, but there are several that are
stuck unclean:

 ```
PG_STAT  STATE  UP
   UP_PRIMARY  ACTINGACTING_PRIMARY
8.3c active+remapped+backfill_wait
[124,41,108,8,87,16,79,157,49] 124
[139,57,16,125,154,65,109,86,45] 139
8.3e active+remapped+backfill_wait
[108,2,58,146,130,29,37,66,118] 108
[127,92,24,50,33,6,130,66,149] 127
8.3f active+remapped+backfill_wait
[19,34,86,132,59,78,153,99,6]  19
[90,45,147,4,105,61,30,66,125]  90
8.40 active+remapped+backfill_wait
[19,131,80,76,42,101,61,3,144]  19
[28,106,132,3,151,36,65,60,83]  28
8.3a   active+remapped+backfilling
[32,72,151,30,103,131,62,84,120]  32
[91,60,7,133,101,117,78,20,158]  91
8.7e active+remapped+backfill_wait
[108,2,58,146,130,29,37,66,118] 108
[127,92,24,50,33,6,130,66,149] 127
8.3b active+remapped+backfill_wait
[34,113,148,63,18,95,70,129,13]  34
[66,17,132,90,14,52,101,47,115]  66
8.7f active+remapped+backfill_wait
[19,34,86,132,59,78,153,99,6]  19
[90,45,147,4,105,61,30,66,125]  90
8.78 active+remapped+backfill_wait
[96,113,159,63,29,133,73,8,89]  96
[138,121,15,103,55,41,146,69,18] 138
8.7d   active+remapped+backfilling
[0,90,60,124,159,19,71,101,135]   0
[150,72,124,129,63,10,94,29,41] 150
8.7c active+remapped+backfill_wait
[124,41,108,8,87,16,79,157,49] 124
[139,57,16,125,154,65,109,86,45] 139
8.79 active+remapped+backfill_wait
[59,15,41,82,131,20,73,156,113]  59
[13,51,120,102,29,149,42,79,132]  13
```

If I query one of the PGs that is backfilling, 8.3a, it shows it's state as :
"recovery_state": [
{
"name": "Started/Primary/Active",
"enter_time": "2020-09-19T20:45:44.027759+",
"might_have_unfound": [],
"recovery_progress": {
"backfill_targets": [
"30(3)",
"32(0)",
"62(6)",
"72(1)",
"84(7)",
"103(4)",
"120(8)",
"131(5)",
"151(2)"
],

Q1: Is there anything that I should check/fix to enable the PGs to
resolve from the `unclean` state?
Q2: I have also seen that the podman containers on one of our OSD
servers are taking large amounts of disk space. Is there a way to
limit the growth of disk space for podman containers, when
administering a Ceph cluster using `cephadm` tools? At last check, a
server running 16 OSDs and 1 MON is using 39G of disk space for its
running containers. Can restarting containers help to start with a
fresh slate or reduce the disk use?

Thanks,
  Matt



Matt Larson
Associate Scientist
Computer Scientist/System Administrator
UW-Madison Cryo-EM Research Center
433 Babcock Drive, Madison, WI 53706
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Setting up a small experimental CEPH network

2020-09-21 Thread Hans van den Bogert
Perhaps not SBCs, but I have 4x HP 6300s and have been running 
Kubernetes together with Ceph/Rook for more than 3 years. The HPs can be 
picked up for around 80-120 EUR. I learned so much in those 3 years; the last 
time I learned that much was when I started using Linux. This was money well 
spent and still is: it runs Nextcloud, home automation, a WiFi controller 
(UniFi controller), websites -- and all of that safely on Ceph.


Also a big plus of those typical HP business desktops is Intel AMT. It's 
kind of a poor man's remote management console (similar-ish to iDRAC and 
iLO). This integrates nicely with, in my case, Ubuntu MAAS, so it 
allows me to programmatically spin up one of those nodes, or take a node 
out for maintenance.


The 4 nodes and network switch draw around 100 W, which I think is pretty 
OK, but again, this is not SBC territory; and if you think SBCs 
will run on a lot less power, think again. If you use spinners, expect 
them to be the main power consumer in your setup. I never expected them to 
really draw 6 W when idle, and that adds up when you have multiple 
disks per node. If you don't need the sheer data capacity, choose SSDs. 
I migrated to SSDs for at least the OS disks and that got me to the 
100 W baseline. Also plan for at least 8 GB of RAM per node with 1 OSD, plus 
an extra 4 GB for every additional OSD.


Hans
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Troubleshooting stuck unclean PGs?

2020-09-21 Thread Matt Larson
Hi Wout,

 None of the OSDs are more than 20% full. However, only 1 PG is
backfilling at a time, while the others are in backfill_wait. I had
recently added a large amount of data to the Ceph cluster, and this
may have caused the number of PGs to increase, triggering the need to
rebalance and move objects.

 It appears that I could increase the number of backfill operations that
happen simultaneously by raising `osd_max_backfills` and/or
`osd_recovery_max_active`. It looks like I should consider increasing
the number of backfills happening at a time, because the overall I/O
during the backfill is pretty small.

 Does this seem reasonable? If so, with Ceph Octopus/cephadm, how can
I adjust these parameters?
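
One option I'm considering, assuming the centralized config database is the 
right place for this under cephadm:

  ceph config set osd osd_max_backfills 4
  ceph config set osd osd_recovery_max_active 4

and then verifying the effective value with `ceph config get osd 
osd_max_backfills`.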

 Thanks,
   Matt

On Mon, Sep 21, 2020 at 2:21 PM Wout van Heeswijk  wrote:
>
> Hi Matt,
>
> The mon data can grow during when PGs are stuck unclean. Don't restart the 
> mons.
>
> You need to find out why your placement groups are "backfill_wait". Likely 
> some of your OSDs are (near)full.
>
> If you have space elsewhere you can use the ceph balancer module or 
> reweighting of OSDs to rebalance data.
>
> Scrubbing will continue once the PGs are "active+clean"
>
> Kind regards,
>
> Wout
> 42on
>
> 
> From: Matt Larson 
> Sent: Monday, September 21, 2020 6:22 PM
> To: ceph-users@ceph.io
> Subject: [ceph-users] Troubleshooting stuck unclean PGs?
>
> Hi,
>
>  Our Ceph cluster is reporting several PGs that have not been scrubbed
> or deep scrubbed in time. It is over a week for these PGs to have been
> scrubbed. When I checked the `ceph health detail`, there are 29 pgs
> not deep-scrubbed in time and 22 pgs not scrubbed in time. I tried to
> manually start a scrub on the PGs, but it appears that they are
> actually in an unclean state that needs to be resolved first.
>
> This is a cluster running:
>  ceph version 15.2.1 (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus 
> (stable)
>
>  Following the information at [Troubleshooting
> PGs](https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-pg/),
> I checked for PGs that are stuck stale | inactive | unclean. There
> were no PGs that are stale or inactive, but there are several that are
> stuck unclean:
>
>  ```
> PG_STAT  STATE  UP
>UP_PRIMARY  ACTINGACTING_PRIMARY
> 8.3c active+remapped+backfill_wait
> [124,41,108,8,87,16,79,157,49] 124
> [139,57,16,125,154,65,109,86,45] 139
> 8.3e active+remapped+backfill_wait
> [108,2,58,146,130,29,37,66,118] 108
> [127,92,24,50,33,6,130,66,149] 127
> 8.3f active+remapped+backfill_wait
> [19,34,86,132,59,78,153,99,6]  19
> [90,45,147,4,105,61,30,66,125]  90
> 8.40 active+remapped+backfill_wait
> [19,131,80,76,42,101,61,3,144]  19
> [28,106,132,3,151,36,65,60,83]  28
> 8.3a   active+remapped+backfilling
> [32,72,151,30,103,131,62,84,120]  32
> [91,60,7,133,101,117,78,20,158]  91
> 8.7e active+remapped+backfill_wait
> [108,2,58,146,130,29,37,66,118] 108
> [127,92,24,50,33,6,130,66,149] 127
> 8.3b active+remapped+backfill_wait
> [34,113,148,63,18,95,70,129,13]  34
> [66,17,132,90,14,52,101,47,115]  66
> 8.7f active+remapped+backfill_wait
> [19,34,86,132,59,78,153,99,6]  19
> [90,45,147,4,105,61,30,66,125]  90
> 8.78 active+remapped+backfill_wait
> [96,113,159,63,29,133,73,8,89]  96
> [138,121,15,103,55,41,146,69,18] 138
> 8.7d   active+remapped+backfilling
> [0,90,60,124,159,19,71,101,135]   0
> [150,72,124,129,63,10,94,29,41] 150
> 8.7c active+remapped+backfill_wait
> [124,41,108,8,87,16,79,157,49] 124
> [139,57,16,125,154,65,109,86,45] 139
> 8.79 active+remapped+backfill_wait
> [59,15,41,82,131,20,73,156,113]  59
> [13,51,120,102,29,149,42,79,132]  13
> ```
>
> If I query one of the PGs that is backfilling, 8.3a, it shows it's state as :
> "recovery_state": [
> {
> "name": "Started/Primary/Active",
> "enter_time": "2020-09-19T20:45:44.027759+",
> "might_have_unfound": [],
> "recovery_progress": {
> "backfill_targets": [
> "30(3)",
> "32(0)",
> "62(6)",
> "72(1)",
> "84(7)",
> "103(4)",
> "120(8)",
> "131(5)",
> "151(2)"
> ],
>
> Q1: Is there anything that I should check/fix to enable the PGs to
> resolve from the `unclean` state?
> Q2: I have also seen that the podman containers on one of our OSD
> servers are taking large amounts of disk space. Is there a way to
> limit the growth of disk space for podman containers, when
> administering a Ceph cluster using `cep

[ceph-users] Re: Troubleshooting stuck unclean PGs?

2020-09-21 Thread Matt Larson
I tried this:

`sudo ceph tell 'osd.*' injectargs '--osd-max-backfills 4'`

This increased things to 10 simultaneous backfills and roughly a 10x higher
rate of data movement. It looks like I could increase this further by
raising the number of simultaneous recovery operations, but changing that
parameter to 20 didn't cause a change. The command warned that OSDs may
need to be restarted before it takes effect:

sudo ceph tell 'osd.*' injectargs '--osd-recovery-max-active 20'

I'll let it run overnight with a higher backfill rate and see if that
is sufficient to let the cluster catch up.

The commands are from
(http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-January/023844.html)

-Matt

On Mon, Sep 21, 2020 at 7:20 PM Matt Larson  wrote:
>
> Hi Wout,
>
>  None of the OSDs are greater than 20% full. However, only 1 PG is
> backfilling at a time, while the others are backfill_wait. I had
> recently added a large amount of data to the Ceph cluster, and this
> may have caused the # of PGs to increase causing the need to rebalance
> or move objects.
>
>  It appears that I could increase the # of backfill operations that
> happen simultaneously by increasing `osd_max_backfills` and/or
> `osd_recovery_max_active`. It looks like I should maybe consider
> increasing the number of max backfills happening at a time because the
> overall io during the backfill is pretty small.
>
>  Does this seem reasonable? If so, with Ceph Octopus/cephadm, how can
> adjust the parameters?
>
>  Thanks,
>Matt
>
> On Mon, Sep 21, 2020 at 2:21 PM Wout van Heeswijk  wrote:
> >
> > Hi Matt,
> >
> > The mon data can grow during when PGs are stuck unclean. Don't restart the 
> > mons.
> >
> > You need to find out why your placement groups are "backfill_wait". Likely 
> > some of your OSDs are (near)full.
> >
> > If you have space elsewhere you can use the ceph balancer module or 
> > reweighting of OSDs to rebalance data.
> >
> > Scrubbing will continue once the PGs are "active+clean"
> >
> > Kind regards,
> >
> > Wout
> > 42on
> >
> > 
> > From: Matt Larson 
> > Sent: Monday, September 21, 2020 6:22 PM
> > To: ceph-users@ceph.io
> > Subject: [ceph-users] Troubleshooting stuck unclean PGs?
> >
> > Hi,
> >
> >  Our Ceph cluster is reporting several PGs that have not been scrubbed
> > or deep scrubbed in time. It is over a week for these PGs to have been
> > scrubbed. When I checked the `ceph health detail`, there are 29 pgs
> > not deep-scrubbed in time and 22 pgs not scrubbed in time. I tried to
> > manually start a scrub on the PGs, but it appears that they are
> > actually in an unclean state that needs to be resolved first.
> >
> > This is a cluster running:
> >  ceph version 15.2.1 (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus 
> > (stable)
> >
> >  Following the information at [Troubleshooting
> > PGs](https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-pg/),
> > I checked for PGs that are stuck stale | inactive | unclean. There
> > were no PGs that are stale or inactive, but there are several that are
> > stuck unclean:
> >
> >  ```
> > PG_STAT  STATE  UP
> >UP_PRIMARY  ACTINGACTING_PRIMARY
> > 8.3c active+remapped+backfill_wait
> > [124,41,108,8,87,16,79,157,49] 124
> > [139,57,16,125,154,65,109,86,45] 139
> > 8.3e active+remapped+backfill_wait
> > [108,2,58,146,130,29,37,66,118] 108
> > [127,92,24,50,33,6,130,66,149] 127
> > 8.3f active+remapped+backfill_wait
> > [19,34,86,132,59,78,153,99,6]  19
> > [90,45,147,4,105,61,30,66,125]  90
> > 8.40 active+remapped+backfill_wait
> > [19,131,80,76,42,101,61,3,144]  19
> > [28,106,132,3,151,36,65,60,83]  28
> > 8.3a   active+remapped+backfilling
> > [32,72,151,30,103,131,62,84,120]  32
> > [91,60,7,133,101,117,78,20,158]  91
> > 8.7e active+remapped+backfill_wait
> > [108,2,58,146,130,29,37,66,118] 108
> > [127,92,24,50,33,6,130,66,149] 127
> > 8.3b active+remapped+backfill_wait
> > [34,113,148,63,18,95,70,129,13]  34
> > [66,17,132,90,14,52,101,47,115]  66
> > 8.7f active+remapped+backfill_wait
> > [19,34,86,132,59,78,153,99,6]  19
> > [90,45,147,4,105,61,30,66,125]  90
> > 8.78 active+remapped+backfill_wait
> > [96,113,159,63,29,133,73,8,89]  96
> > [138,121,15,103,55,41,146,69,18] 138
> > 8.7d   active+remapped+backfilling
> > [0,90,60,124,159,19,71,101,135]   0
> > [150,72,124,129,63,10,94,29,41] 150
> > 8.7c active+remapped+backfill_wait
> > [124,41,108,8,87,16,79,157,49] 124
> > [139,57,16,125,154,65,109,86,45] 139
> > 8.79 active+remapped+backfill_wait
> > [59,15,41,82,131,20,73,156,113]  59
> > [13,51,120,102,29,149,42,79,132]  13
> > ```
> >
> > If I query one o

[ceph-users] Re: Understanding what ceph-volume does, with bootstrap-osd/ceph.keyring, tmpfs

2020-09-21 Thread Janne Johansson
On Mon, 21 Sep 2020 at 16:15, Marc Roos wrote:

> When I create a new encrypted osd with ceph volume[1]
>
> Q4: Where is this luks passphrase stored?
>

I think the OSD asks the mon for it after authenticating, so it lives "in the 
mon DBs" somewhere.
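
If I remember correctly they end up in the mon config-key store, so something 
like this (untested, from memory) should show them:

  ceph config-key ls | grep dm-crypt
  ceph config-key get dm-crypt/osd/<osd-fsid>/luks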

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mount CEPH-FS on multiple hosts with concurrent access to the same data objects?

2020-09-21 Thread Robert Sander
On 21.09.20 18:44, René Bartsch wrote:

> We're planning a Proxmox-Cluster. The data-center operator advised to
> use a virtual machine with NFS on top of a single CEPH-FS instance to
> mount the shared CEPH-FS storage on multiple hosts/VMs.

For what purpose do you plan to use CephFS?

Do you know that Proxmox is able to store VM images as RBD directly in a
Ceph cluster?

I would not recommend storing VM images as files on CephFS, or even
exporting NFS out of a VM to store other VM images on it.

Regards
-- 
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 93818 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io