[ceph-users] How to change the pg numbers
Hi guys,

I have an RBD pool with pg_num 2048 and I want to change it to 4096. How should I do this? If I change it directly to 4096, it may cause slow client requests. What would be a better step size?

Thanks,
Kern
[ceph-users] Re: How to change the pg numbers
I don't think it would lead to more slow client requests if you set it to 4096 in one step, since there is a cap on how many recovery/backfill requests there can be per OSD at any given time. I am not sure though, and I am happy to be proven wrong by the senior members on this list :)

Hans

On 8/18/20 10:23 AM, norman wrote:
> Hi guys,
> I have an RBD pool with pg_num 2048 and I want to change it to 4096. How should I do this?
> If I change it directly to 4096, it may cause slow client requests. What would be a better step size?
> Thanks, Kern
[ceph-users] Re: OSDs get full with bluestore logs
It says:

    FAILED assert(0 == "bluefs enospc")

Could it be that the OSD disks you use are very, very small?

On Mon, 17 Aug 2020 at 20:26, Khodayar Doustar wrote:
> Hi,
>
> I have a 3-node Mimic cluster with 9 OSDs (3 OSDs on each node).
> I use this cluster to test the integration of an application with the S3 API.
>
> The problem is that after a few days all OSDs start filling up with
> BlueStore logs and go down and out one by one!
> I cannot stop the logs and I cannot find the setting to fix this leakage;
> it must be a leak in the logs, because it's not logical to fill up every
> OSD with BlueFS logs.
>
> This is an example of what is being repeated in the BlueStore logs:
>
> [root@server2 ~]# ceph-bluestore-tool --command bluefs-log-dump --path /var/lib/ceph/osd/ceph-5
> .
> .
>
> [root@server1 ~]# ceph osd df tree
> ID CLASS WEIGHT  REWEIGHT SIZE    USE     DATA    OMAP  META     AVAIL  %USE VAR  PGS TYPE NAME
> -1       0.16727        - 0 B     0 B     0 B     0 B   0 B      0 B       0        -  root default
> -3       0.05576        - 0 B     0 B     0 B     0 B   0 B      0 B       0        -  host server1
>  0   hdd 0.01859      1.0 0 B     0 B     0 B     0 B   0 B      0 B       0        0  osd.0
>  1   hdd 0.01859        0 0 B     0 B     0 B     0 B   0 B      0 B       0        0  osd.1
>  2   hdd 0.01859        0 0 B     0 B     0 B     0 B   0 B      0 B       0        0  osd.2
> -5       0.05576        - 19 GiB  1.4 GiB 360 MiB 3 KiB 1024 MiB 18 GiB    0        -  host server2
>  3   hdd 0.01859      1.0 0 B     0 B     0 B     0 B   0 B      0 B       0        0  osd.3
>  4   hdd 0.01859        0 0 B     0 B     0 B     0 B   0 B      0 B       0        0  osd.4
>  5   hdd 0.01859      1.0 19 GiB  1.4 GiB 360 MiB 3 KiB 1024 MiB 18 GiB 7.11 1.04  99  osd.5
> -7       0.05576        - 0 B     0 B     0 B     0 B   0 B      0 B       0        -  host server3
>  6   hdd 0.01859      1.0 19 GiB  1.2 GiB 249 MiB 3 KiB 1024 MiB 18 GiB 6.55 0.96  78  osd.6
>  7   hdd 0.01859      1.0 0 B     0 B     0 B     0 B   0 B      0 B       0        0  osd.7
>  8   hdd 0.01859      1.0 0 B     0 B     0 B     0 B   0 B      0 B       0        0  osd.8
>                    TOTAL 38 GiB  2.6 GiB 610 MiB 6 KiB 2.0 GiB  35 GiB  6.83
>
> MIN/MAX VAR: 0/1.04  STDDEV: 5.58
> [root@server1 ~]#
>
> I'm kind of a newbie to Ceph, so any help or hint would be appreciated.
> Did I hit a bug, or is something wrong with my configuration?

Make the disks larger; those sizes are far too small for any usable cluster, so I don't think that use case gets tested at all. The database preallocations, WAL and other structures OSDs create in order to work well on 100 GB to 12/14/18 TB drives make them less useful for 0.018 TB drives. I don't think the logs are the real problem: the OSD processes are crashing because you give them no room, and then they log repeatedly that they can't restart because they are still out of space.

--
May the most significant bit of your life be positive.
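If you want to see how close an OSD is to "bluefs enospc" before it dies, something along these lines can help (osd.5 and the path are taken from your output above; exact command availability depends on your release, so treat this as a sketch):

# BlueFS internal usage counters on a running OSD, via the admin socket:
ceph daemon osd.5 perf dump bluefs

# How BlueFS has carved up the block device (run this with the OSD stopped):
ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-5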
[ceph-users] Re: How to change the pg numbers
On 2020-08-18 11:13, Hans van den Bogert wrote:
> I don't think it would lead to more slow client requests if you set it
> to 4096 in one step, since there is a cap on how many recovery/backfill
> requests there can be per OSD at any given time.
>
> I am not sure though, and I am happy to be proven wrong by the senior
> members on this list :)

Not sure if I qualify for senior, but here are my 2 cents ...

I would argue that you do want to do this in one step. Doing it in multiple steps will trigger data movement every time you change pg_num (and pgp_num, for that matter). Ceph recalculates a new mapping every time you change pg(p)_num for a pool (or alter CRUSH rules).

Use conservative recovery settings, e.g.:

osd_recovery_max_active = 1
osd_max_backfills = 1

If your cluster can't handle this, then I wonder what a disk / host failure would trigger.

Some on this list would argue that you also want the following setting to avoid client IO starvation:

ceph config set osd osd_op_queue_cut_off high

This is already the default in Octopus.

Gr. Stefan
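Concretely, a one-step change would look something like this (pool name is a placeholder; on releases where the mgr does not manage pgp_num for you, set it explicitly as well):

ceph config set osd osd_max_backfills 1
ceph config set osd osd_recovery_max_active 1
ceph config set osd osd_op_queue_cut_off high
ceph osd pool set <pool> pg_num 4096
ceph osd pool set <pool> pgp_num 4096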
[ceph-users] fio rados ioengine
Hi all,

When testing with the fio rados ioengine, is it necessary to run a write test with the no-cleanup option before running read tests, as is required with rados bench?

Thanks,
Frank
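For context, the kind of two-phase run I have in mind looks roughly like this (option names are from my reading of the fio rados ioengine documentation, so please correct me if they're off; the same job name and size are reused so the read phase targets the same objects):

# write phase, laying down the objects
fio --name=radostest --ioengine=rados --clientname=admin --pool=testpool \
    --conf=/etc/ceph/ceph.conf --rw=write --bs=4M --size=1G

# read phase, reading back the objects written above
fio --name=radostest --ioengine=rados --clientname=admin --pool=testpool \
    --conf=/etc/ceph/ceph.conf --rw=read --bs=4M --size=1G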
[ceph-users] radosgw beast access logs
Are there any plans to add access logs to the beast frontend, in the same way we get them with civetweb? Increasing the "debug rgw" setting really doesn't provide the same thing.

Graham

--
Graham Allan - g...@umn.edu
Associate Director of Operations - Minnesota Supercomputing Institute
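For now, the closest substitute I've found is the rgw ops log, roughly like this (option names are from the radosgw configuration reference, the instance name and socket path are just examples, and it's not a drop-in replacement for civetweb's access log):

[client.rgw.<instance>]
    rgw enable ops log = true
    rgw ops log socket path = /var/run/ceph/rgw-opslog.asok

# read the per-request records from the unix socket:
nc -U /var/run/ceph/rgw-opslog.asok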
[ceph-users] why ceph-fuse init Objecter with osd_timeout = 0
Hi all,

I'm using mimic-13.2.4 with ceph-fuse as the client. Recently I've been hitting a strange problem: on the client machine we can see a TCP connection in the ESTAB state to an OSD machine, but on the OSD machine no matching connection can be found. The client then hangs on read/write requests to this OSD.

So I'm trying to figure out why this happens, and searching for a configuration option to set a timeout for OSD requests. I noticed that osdc/Objecter.h has an osd_timeout field, which is initialized when we create an Objecter. But ceph-fuse creates its Objecter with a fixed value of 0, meaning no timeout?

    StandaloneClient::StandaloneClient(Messenger *m, MonClient *mc,
                                       boost::asio::io_context& ictx)
      : Client(m, mc, new Objecter(m->cct, m, mc, ictx, 0, 0))

Here are my questions:
1. Why does ceph-fuse set osd_timeout to 0?
2. Are there other configuration options to let OSD requests fail instead of hanging forever?
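In case it helps with diagnosis, the stuck requests are at least visible on the client's admin socket (the socket name below is an example — use whatever ceph-fuse created under /var/run/ceph):

# list in-flight Objecter (OSD) requests of the ceph-fuse client
ceph daemon /var/run/ceph/ceph-client.admin.12345.asok objecter_requests

# and the MDS session state for good measure
ceph daemon /var/run/ceph/ceph-client.admin.12345.asok mds_sessions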
[ceph-users] Re: How to change the pg numbers
A few years ago Dan van der Ster and I were working on two similar scripts for increasing PGs. Just have a look at the following link:

https://github.com/cernceph/ceph-scripts/blob/master/tools/split/ceph-gentle-split

Clyso GmbH
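Very roughly, the idea is the sketch below: step pgp_num up a little at a time and wait for the resulting data movement to finish before taking the next step. This is only an illustration of the approach, not the actual ceph-gentle-split code; pool name, target and step size are placeholders, and it assumes pg_num has already been raised to the target.

#!/bin/bash
POOL=rbd        # placeholder
TARGET=4096     # placeholder
STEP=64         # placeholder
while true; do
  cur=$(ceph osd pool get "$POOL" pgp_num -f json | jq -r .pgp_num)
  [ "$cur" -ge "$TARGET" ] && break
  next=$(( cur + STEP > TARGET ? TARGET : cur + STEP ))
  ceph osd pool set "$POOL" pgp_num "$next"
  # crude wait: loop until no misplaced/backfilling PGs are reported
  while ceph status | grep -qE 'misplaced|backfill'; do
    sleep 60
  done
done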
[ceph-users] Alpine linux librados-dev missing
I am not sure if I should try this, but I was trying to build the dovecot-ceph-plugin on Alpine Linux to create a nice, small container image. However, Alpine Linux does not seem to have librados-dev. Has anyone done something similar and found a workaround for this?
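I haven't tried this myself, but since Alpine/musl doesn't package librados-dev, the pragmatic route is probably to build (and run) on a slim glibc base such as Debian, where the dev packages exist. Roughly as below — the package list and build steps are guesses based on a typical autotools project, so check the plugin's own README:

# inside a debian:buster-slim (or similar) image instead of Alpine
apt-get update && apt-get install -y --no-install-recommends \
    build-essential automake autoconf libtool pkg-config git \
    librados-dev dovecot-dev
git clone <dovecot-ceph-plugin repo URL>
cd dovecot-ceph-plugin
./autogen.sh && ./configure && make && make install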
[ceph-users] Re: How to change the pg numbers
Hans,

I made a big change in my staging cluster before: I set a pool's pg_num from 8 to 2048, and it caused the cluster to be unavailable for a long time :(

On 18/8/2020 5:13 PM, Hans van den Bogert wrote:
> I don't think it would lead to more slow client requests if you set it
> to 4096 in one step, since there is a cap on how many recovery/backfill
> requests there can be per OSD at any given time.
>
> I am not sure though, and I am happy to be proven wrong by the senior
> members on this list :)
[ceph-users] Re: How to change the pg numbers
Stefan,

I agree with you about the CRUSH rule, but I truly ran into this problem on my cluster. I had set the values high for a quicker recovery:

osd_recovery_max_active 16
osd_max_backfills 32

Is that a very bad setting?

Kern
[ceph-users] cephadm not working with non-root user
Hi,

I am trying to install Ceph Octopus using cephadm. In the bootstrap command I specified a non-root user account as the ssh-user:

cephadm bootstrap --mon-ip xx.xxx.xx.xx --ssh-user non-rootuser

When the bootstrap was about to complete, it threw an error:

INFO:cephadm:Non-zero exit code 2 from /usr/bin/podman run --rm --net=host --ipc=host -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=node1 -v /var/log/ceph/ae4ed114-e145-11ea-9c1f-0025900a8ebe:/var/log/ceph:z -v /tmp/ceph-tmpm22k9j9w:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmpe1ltigk8:/etc/ceph/ceph.conf:z --entrypoint /usr/bin/ceph docker.io/ceph/ceph:v15 orch host add node1
INFO:cephadm:/usr/bin/ceph:stderr Error ENOENT: Failed to connect to node1 (node1).
INFO:cephadm:/usr/bin/ceph:stderr Check that the host is reachable and accepts connections using the cephadm SSH key
INFO:cephadm:/usr/bin/ceph:stderr
INFO:cephadm:/usr/bin/ceph:stderr you may want to run:
INFO:cephadm:/usr/bin/ceph:stderr > ceph cephadm get-ssh-config > ssh_config
INFO:cephadm:/usr/bin/ceph:stderr > ceph config-key get mgr/cephadm/ssh_identity_key > key
INFO:cephadm:/usr/bin/ceph:stderr > ssh -F ssh_config -i key root@node1

In the above output it is trying to connect to the node as root, and when I downloaded the ssh_config file it also had 'root' specified inside. So I modified the config file and uploaded it back to Ceph, but SSH to node1 is still not working.

To confirm whether I used the right option during bootstrap, I ran:

ceph config-key dump mgr/cephadm/ssh_user
{
    "mgr/cephadm/ssh_user": "non-rootuser"
}

The output shows the user I specified during bootstrap ("non-rootuser"), but at the same time, when I run "ceph cephadm get-user", the output still shows 'root' as the user. Why is the change not taking effect? Has anyone faced a similar issue with bootstrap? Is there any way to avoid using containers with cephadm?

regards
Amudhan
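In case it's useful, the direction I'm experimenting with is telling cephadm explicitly which user to use and re-distributing its key, roughly as below. The set-user/get-pub-key/check-host subcommands are what I found in the Octopus docs, so please verify them on your release:

ceph cephadm set-user non-rootuser
ceph cephadm get-pub-key > ~/ceph.pub
ssh-copy-id -f -i ~/ceph.pub non-rootuser@node1
ceph cephadm check-host node1
ceph orch host add node1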
[ceph-users] Re: How to change the pg numbers
> I set the values high for a quicker recovery:
>
> osd_recovery_max_active 16
> osd_max_backfills 32
>
> Is that a very bad setting?

Only bad for the clients. ;-)

As Stefan already advised, turn these values down to 1 and let the cluster rebalance slowly. If client performance still seems fine, you can increase them by 1 or so and see how it behaves. You'll have to find reasonable values for your specific setup to get a good mix of quick recovery without impacting client performance too much.

Regards,
Eugen
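For example, to apply this at runtime and then step it up carefully (injectargs takes effect immediately on the running OSDs):

ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'
# once clients look happy again, raise one knob at a time, e.g.:
ceph tell osd.* injectargs '--osd_max_backfills 2'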