Any chance for this one, or for one that fixes the 'all OSDs unreachable'
error when IPv6 is in use?
https://github.com/ceph/ceph/pull/60881
On 12/18/24 11:35, Ilya Dryomov wrote:
On Mon, Dec 16, 2024 at 6:27 PM Yuri Weinstein wrote:
Details of this release are summarized here:
https://tracker.ceph.com/issu
On 10/14/24 21:04, Anthony D'Atri wrote:
Try failing over to a standby mgr
On Oct 14, 2024, at 9:33 PM, Harry G Coin wrote:
I need help to remove a useless "HEALTH_ERR" in 19.2.0 on a fully dual stack
docker setup with ceph using IPv6, public and private nets separated, with a
few servers.
wrong, the dashboard reporting critical problems -- except there are
none. It makes me really wonder whether any actual testing on IPv6 is
ever done before releases are marked 'stable'.
HC
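For reference, the failover itself is a one-liner (a minimal sketch, assuming
at least one standby mgr exists; the daemon name is only an example of the
form shown by 'ceph -s'):

ceph mgr stat              # shows which mgr is currently active
ceph mgr fail              # on recent releases, fails the active mgr so a standby takes over
ceph mgr fail noc4.tvhgac  # or name a specific mgr, as listed by 'ceph -s'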
On 10/14/24 21:04, Anthony D'Atri wrote:
Try failing over to a standby mgr
On Oct 14, 2024,
I need help to remove a useless "HEALTH_ERR" in 19.2.0 on a fully dual
stack docker setup with ceph using IPv6, public and private nets
separated, with a few servers. After upgrading from an error-free v18
rev, I can't get rid of the 'HEALTH_ERR' owing to the report that all
OSDs are unreach
Same errors as below on latest ceph / latest Ubuntu LTS (noble) when
updating from Reef to Squid. The same 'ceph -s' that reports all OSDs
are 'up' and 'in' also reports all of them are 'unreachable'. I hate it
when that happens. All OSD/mon/mgr hosts are dual stack, but ceph uses
just ip
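Not a fix for the health check itself, but a sketch of how to compare what
the OSDs actually registered with what the monitors think the public network
is (it assumes the cluster is meant to run IPv6-only on the dual-stack hosts;
the subnet below is a placeholder):

ceph config get mon public_network       # should contain the IPv6 subnet the OSDs use
ceph osd dump | grep '^osd\.' | head     # the v2/v1 addresses each OSD actually registered
# make IPv6-only binding explicit if that is the intent:
ceph config set global ms_bind_ipv6 true
ceph config set global ms_bind_ipv4 false
ceph config set global public_network fd00:10::/64

If only the new health check is confused, 'ceph health detail' names the
exact code, which can be muted until the fix lands.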
On 7/26/24 11:45, Rouven Seifert wrote:
Hello,
On 2024-07-25 16:39, Harry G Coin wrote:
Upgraded to 18.2.4 yesterday. Healthy cluster reported a few minutes
after the upgrade completed. Next morning, this:
# ceph health detail
HEALTH_ERR Module 'diskprediction_local' has failed:
Upgraded to 18.2.4 yesterday. Healthy cluster reported a few minutes
after the upgrade completed. Next morning, this:
# ceph health detail
HEALTH_ERR Module 'diskprediction_local' has failed: No module named
'sklearn'
[ERR] MGR_MODULE_ERROR: Module 'diskprediction_local' has failed: No
modul
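If the module isn't actually needed, the quickest way out (short of
installing scikit-learn inside the mgr container) is to switch it off; a
sketch:

ceph mgr module disable diskprediction_local
ceph health detail                 # the MGR_MODULE_ERROR entry should clear
# or keep it enabled and just silence the error for now:
ceph health mute MGR_MODULE_ERROR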
To read more about the new page you can check here
<https://docs.ceph.com/en/latest/mgr/dashboard/#overview-of-the-dashboard-landing-page>.
Regards,
Nizam
On Mon, Mar 11, 2024 at 11:47 PM Harry G Coin wrote:
Looking at ceph -s, all is well. Looking at the dashboard, 85% of my
ca
Looking at ceph -s, all is well. Looking at the dashboard, 85% of my
capacity is 'warned', and 95% is 'in danger'. There is no hint given
as to the nature of the danger or reason for the warning. Though
apparently with merely 5% of my ceph world 'normal', the cluster reports
'ok'. Which, y
Is there a 'Howto' or 'workflow' for implementing a one-line patch in a
running cluster, with the full understanding that it will be gone on the
next upgrade? Hopefully without having to set up an entire
packaging/development environment?
Thanks!
To implement:
* /Subject/: Re: Permanent KeyError: 'T
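One low-tech approach for a containerized cluster, assuming the one-liner
targets a Python mgr module: bind-mount the patched file over the copy
shipped in the image via the service spec. This survives daemon restarts;
note the spec change itself persists, so it needs to be removed again when
the next upgrade ships the real fix. The module path and daemon name below
are placeholders, and 'extra_container_args' needs a reasonably recent
cephadm:

# place the patched module file (matching your exact ceph version) on each
# mgr host, e.g. /root/module.py, then export and extend the mgr spec:
ceph orch ls mgr --export > mgr.yaml
cat >> mgr.yaml <<'EOF'
extra_container_args:
  - "-v"
  - "/root/module.py:/usr/share/ceph/mgr/somemodule/module.py:ro"
EOF
ceph orch apply -i mgr.yaml
ceph orch daemon redeploy mgr.noc1.abcdef   # repeat per mgr daemon if needed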
These repeat for every host, and only after upgrading from the previous
Quincy point release to 17.2.7. As a result, the cluster always shows a
warning and never reports healthy.
root@noc1:~# ceph health detail
HEALTH_WARN failed to probe daemons or devices
[WRN] CEPHADM_REFRESH_FAILED: failed to probe daemons or de
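A diagnostic sketch for this one (the host name is a placeholder); the
cephadm module's own log usually contains the per-host traceback behind the
generic 'failed to probe' text:

ceph health detail             # names the hosts/daemons that failed to refresh
ceph log last cephadm          # the cephadm module log, usually has the real error
ceph orch host ls              # confirm every host is still listed and reachable
ceph cephadm check-host noc1   # re-run the host prerequisites check on one host
ceph mgr fail                  # restart the active mgr to force a fresh probe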
Libcephfs's 'init' call hangs when passed arguments that once worked
normally but now refer to a cluster that's broken, on its way out of
service, has too few mons, etc. At least the python libcephfs wrapper
hangs on init.
Of course mount and session timeouts work, but is there a
Can anyone help me understand seemingly contradictory cephfs error messages?
I have a RHEL ceph client that mounts a cephfs file system via autofs.
Very typical. After boot, when a user first uses the mount, for example
'ls /mountpoint' , all appears normal to the user. But on the system
co
Hi! No matter what I try, using the latest cephfs on an all
ceph-pacific setup, I've not been able to avoid this error message,
always similar to this on RHEL family clients:
SELinux: inode=1099954719159 on dev=ceph was found to have an invalid
context=system_u:object_r:unlabeled_t:s0. This
I have two autofs entries that mount the same cephfs file system to two
different mountpoints. Accessing the first of the two fails with 'stale
file handle'. The second works normally. Other than the name of the
mount point, the lines in autofs are identical. No amount of 'umount
-f' or res
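For context, the two entries look roughly like this (a sketch with
placeholder names, kernel cephfs client, old-style mon-list device syntax);
as described, they differ only in the mount point:

# /etc/auto.master
/-    /etc/auto.ceph

# /etc/auto.ceph -- two mount points onto the same cephfs
/srv/a  -fstype=ceph,name=hgc,secretfile=/etc/ceph/hgc.secret  mon1:6789,mon2:6789,mon3:6789:/
/srv/b  -fstype=ceph,name=hgc,secretfile=/etc/ceph/hgc.secret  mon1:6789,mon2:6789,mon3:6789:/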
s in the NFS, not cephfs docs:
https://docs.ceph.com/en/latest/cephfs/nfs/
and the much appreciated, genius-level pointer from Curt here:
On 5/2/23 14:21, Curt wrote:
This thread might be of use, it's an older version of ceph 14, but
might still apply,
https://lists.ceph.io/hyperkitty/l
In 17.2.6, is there a security requirement that the pool names supporting a
ceph fs filesystem match the filesystem name: name.data for the data pool and
name.meta for the associated metadata pool? (Multiple file systems are
enabled.)
I have filesystems from older versions with the data pool name matching
the
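As far as I understand it, what matters to the filesystem is the pools'
application metadata rather than the literal pool names; a quick way to see
which pools back each filesystem and how they are tagged (the pool name
below is a placeholder):

ceph fs ls                                  # each fs with its metadata and data pools
ceph fs status                              # per-fs pool usage summary
ceph osd pool application get cephfs_data   # shows the 'cephfs' application tags on a pool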
I've a 'healthy' cluster with a dashboard where Grafana correctly
reports the number of OSDs on a host and the correct raw capacity -- and
'no data' for any time period, for any of the OSDs (dockerized
Quincy). Meanwhile the top-level dashboard cluster reports reasonable
client throughput rea
On 5/12/22 02:05, Janne Johansson wrote:
Den tors 12 maj 2022 kl 00:03 skrev Harry G. Coin :
Might someone explain why the count of degraded items can drop
thousands, sometimes tens of thousands in the same number of hours it
takes to go from 10 to 0? For example, when an OSD or a host with a
ct is not huge and is not RGW index
omap, that slow of a single-object recovery would have me checking
whether I have a bad disk that's presenting itself as significantly
underperforming.
Josh
On Wed, May 11, 2022 at 4:03 PM Harry G. Coin wrote:
Might someone explain why the count of degraded
Might someone explain why the count of degraded items can drop
thousands, sometimes tens of thousands in the same number of hours it
takes to go from 10 to 0? For example, when an OSD or a host with a few
OSD's goes offline for a while, reboots.
Sitting at one complete and entire degraded obj
bbk, It did help! Thank you.
Here's a slightly more 'with the osd-fsid details filled in' procedure
for moving a 'dockerized' / container-run OSD set of drives to a
replacement server/motherboard (or the same server with blank/new/fresh
reinstalled OS). For occasions when the 'new setup' wil
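On a Pacific-or-later cephadm cluster the core of that procedure reduces to
a couple of commands once the drives are in the new chassis (a sketch; the
host name and address are placeholders, and it assumes the OSD LVM volumes
on the moved drives are intact):

ceph orch host add newbox 10.1.2.12    # or its IPv6 address
ceph cephadm osd activate newbox       # scans the host and re-adopts the existing OSDs
ceph orch ps newbox                    # the osd daemons should reappear here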
I tried searching for the meaning of a ceph Quincy all-caps WARNING
message, and failed. So I need help. Ceph tells me my cluster is
'healthy', yet emits a bunch of 'progress WARNING root] complete ev' ...
messages. Which I score right up there with the helpful dmesg "yama,
becoming mindful"
Using Quincy I'm getting much worse lag owing to ceph syslog message
volume, though without obvious system errors.
In the usual case of no current/active hardware errors and no software
crashes: what config settings can I pick so that what appears in syslog
is as close to what would appear
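The knobs involved, for what it's worth (a sketch; whether they help depends
on whether the container runtime already forwards daemon stderr into
journald/syslog, which is where most of the duplication tends to come from):

ceph config get osd log_to_syslog              # daemon logs copied to syslog (default false)
ceph config get mon mon_cluster_log_to_syslog  # the cluster log ('ceph -w') copied to syslog
# example: stop duplicating into syslog and rely on the journald path alone
ceph config set global log_to_syslog false
ceph config set mon mon_cluster_log_to_syslog false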
Great news! Any notion when the many pending bug fixes will show up in
Pacific? It's been a while.
On 4/19/22 20:36, David Galloway wrote:
We're very happy to announce the first stable release of the Quincy
series.
We encourage you to read the full release notes at
https://ceph.io/en/news/
I would really appreciate advice because I bet many of you have 'seen
this before' but I can't find a recipe.
There must be a 'better way' to respond to this situation: it starts
with a well-working small ceph cluster with 5 servers which, with no
apparent change to the workflow, suddenly starts repo
There's got to be some obvious way I haven't found for this common ceph
use case, which happens at least once every couple of weeks. I hope
someone on this list knows and can give a link. The scenario goes like
this: on a server with one drive providing boot capability and the rest OSDs:
1. First, s
I sense the concern about ceph distributions via containers generally
has to do with what you might call a feeling of 'opaqueness'. The
feeling is amplified as most folks who choose open source solutions
prize being able promptly to address the particular concerns affecting
them without havin
Worked very well! Thank you.
Harry Coin
On 10/2/21 11:23 PM, 胡 玮文 wrote:
Hi Harry,
Please try these commands in CLI:
ceph health mute MGR_MODULE_ERROR
ceph health mute CEPHADM_CHECK_NETWORK_MISSING
Weiwen Hu
On Oct 3, 2021, at 05:37, Harry G. Coin wrote:
I need help getting two 'non e
I need help getting two 'non-errors' off the ceph dashboard so it stops
falsely scaring people with the dramatic "HEALTH_ERR" readout -- and
masking what could be actual errors of immediate importance.
The first is a bug where the devs try to do date arithmetic between
incompatible variables. T
I asked as well, it seems nobody on the list knows so far.
On 9/30/21 10:34 AM, Andrew Gunnerson wrote:
Hello,
I'm trying to figure out what overlapping roots entails with the default
scale-down autoscaling profile in Ceph Pacific. My test setup involves a CRUSH
map that looks like:
ID=-
Hi all,
I know Ceph offers a way to 'automatically' spin up any blank drives it
detects into OSDs, but I think that's an 'all or nothing' situation if I
read the docs properly.
Is there a way to specify which slots, or even better, a way to exclude
specific slots? It sure would
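There is a middle ground: turn the catch-all service off and describe
exactly which devices may be consumed in an OSD service spec. A sketch, with
placeholder host and device names:

# stop cephadm from grabbing every blank drive it sees (if that service was applied)
ceph orch apply osd --all-available-devices --unmanaged=true

# osd-picked.yaml -- only these paths on these hosts become OSDs
service_type: osd
service_id: picked_slots
placement:
  hosts:
    - noc1
    - noc2
spec:
  data_devices:
    paths:
      - /dev/sdb
      - /dev/sdc

ceph orch apply -i osd-picked.yaml

Filters like size:, rotational: or model: can stand in for explicit paths if
excluding specific slots is easier to express that way.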
Is there anything to be done about groups of log messages like
"pg_autoscaler ERROR root] pool has overlapping roots"
The cluster reports it is healthy, and yet this is reported as an error,
so-- is it an error that ought to have been reported, or is it not an error?
Thanks
Harry Coin
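Roughly, the autoscaler complains when different pools' CRUSH rules resolve
to roots that contain one another (typically the bare default root plus one
of its per-device-class shadow roots), so it cannot attribute capacity
cleanly. A sketch of how to look, and of the commonly suggested fix, with
placeholder rule and pool names:

ceph osd crush tree --show-shadow   # the per-device-class shadow roots
ceph osd pool ls detail             # which crush_rule each pool uses
ceph osd crush rule dump            # what root/class each rule starts from

# pin every pool to one device class so the roots no longer overlap
ceph osd crush rule create-replicated replicated_hdd default host hdd
ceph osd pool set mypool crush_rule replicated_hdd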
Is there a way to re-calibrate the various 'global recovery event' and
related 'remaining time' estimators?
For the last three days I've been assured that a 19h event will be over
in under 3 hours...
Previously I think Microsoft held the record for the most incorrect
'please wait' progress i
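For what it's worth, those estimates come from the mgr 'progress' module,
which can be reset or switched off without affecting anything else (a
sketch):

ceph progress         # list the current events and their guesses
ceph progress clear   # drop stuck or ancient events along with their ETAs
ceph progress off     # or silence the estimator entirely
ceph progress on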
A cluster reporting no errors running 16.2.5, immediately after upgrade
to 16.2.6, features what seems to be an entirely bug-related dramatic
'Health Err' on the dashboard:
Module 'devicehealth' has failed: can't subtract offset-naive and
offset-aware datetimes
Looking at the bug tracking logs,
This topic comes up often enough that maybe it's time for one of those 'web
calculators'. One that lets a user who knows their goals but not
ceph-fu enter the importance of various factors (my suggested
factors: read freq/stored TB, write freq/stored TB, unreplicated TB
needed, min target d
Does ceph remove container subvolumes holding previous revisions of
daemon images after upgrades?
I have a couple servers using btrfs to hold the containers. The number
of docker related sub-volumes just keeps growing, way beyond the number
of daemons running. If I ignore this, I'll get disk-fu
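As far as I know cephadm does not prune superseded images on its own, so a
periodic cleanup per host does it; images backing still-running containers
are never touched by prune (Docker shown; podman has the same subcommands):

docker system df        # how much space images and volumes occupy
docker image prune -a   # remove images not used by any container, i.e. old ceph releases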
/en/latest/mgr/dashboard/#disable-the-redirection>
> (e.g.: HAProxy)?
No redirection, nothing. Just a timeout on every manager other than the
active one. Adding an HAProxy would be easily done, but seems redundant
to ceph's internal capability -- which at one time worked, anyhow.
>
> Kind R
Somewhere between Nautilus and Pacific the hosts running standby
managers, which previously would redirect browsers to the currently
active mgr/dashboard, seem to have stopped doing that. Is there a
switch somewhere? Or was I just happily using an undocumented feature?
Thanks
Harry Coin
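That behaviour appears to be governed by a dashboard setting (the
'disable-the-redirection' doc linked above describes it); worth checking
what it is currently set to. A sketch:

ceph config get mgr mgr/dashboard/standby_behaviour    # 'redirect' (the old behaviour) or 'error'
ceph config set mgr mgr/dashboard/standby_behaviour redirect
ceph mgr fail                                          # bouncing the mgrs may be needed for it to take effect
ceph mgr services                                      # shows the URL of the active dashboard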
On 7/8/21 5:06 PM, Bryan Stillwell wrote:
> I upgraded one of my clusters to v16.2.5 today and now I'm seeing these
> messages from 'ceph -W cephadm':
>
> 2021-07-08T22:01:55.356953+ mgr.excalibur.kuumco [ERR] Failed to apply
> alertmanager spec AlertManagerSpec({'placement': PlacementSpec(co
Same problem here. Hundreds of lines like
' Updating node-exporter deployment (+4 -4 -> 5) (0s)
[]
'
And, similar to yours:
...
2021-07-10T16:26:30.432487-0500 mgr.noc4.tvhgac [ERR] Failed to apply
node-exporter spec MonitoringSpec({'placement':
PlacementSp
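A hedged sketch for inspecting the stored monitoring specs and re-applying
clean ones while waiting for the fix:

ceph orch ls alertmanager --export    # dump the spec cephadm has stored
ceph orch ls node-exporter --export
ceph orch apply node-exporter '*'     # overwrite it with a plain spec on all hosts
ceph orch apply alertmanager          # default placement
ceph mgr fail                         # restart the cephadm module and let it retry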
Hi
In a Pacific/container/cephadm setup, when a server's boot/OS drive fails
(unrelated to any actual OSD storage): can the boot/OS drive be
replaced with a fresh OS install and then simply set up with the
same network addressing/ssh keys (assuming the necessary
docker/non-ceph pkgs are ins
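Roughly, the sequence for the freshly installed host would look like this
(a sketch; it assumes cephadm's own SSH key is in use, the OSD drives were
untouched, and the hostname/address are placeholders):

ceph cephadm get-pub-key > ceph.pub
ssh-copy-id -f -i ceph.pub root@noc3    # re-establish cephadm's SSH access
ceph orch host add noc3 10.1.2.13       # only if the host entry was removed
ceph cephadm osd activate noc3          # re-adopt the intact OSDs on its drives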
Is this happening to anyone else? After this command:
ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 2w
The 'dashboard' shows 'Health OK', then after a few hours (perhaps a
mon leadership change), it's back to 'degraded' and
'AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED: mons are allowing
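Two angles on that one: make the mute survive the flapping, or clear the
condition itself once every client and daemon is new enough for secure
global_id reclaim (the latter is the documented mitigation):

# sticky mute: stays in place even if the alert clears and then returns
ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 2w --sticky

# the real fix, once all clients are patched:
ceph config set mon auth_allow_insecure_global_id_reclaim false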
On 6/11/21 3:52 AM, Szabo, Istvan (Agoda) wrote:
> Hi,
>
> Can you suggest me what is a good cephfs design? I've never used it, only rgw
> and rbd we have, but want to give a try. However in the mail list I saw a
> huge amount of issues with cephfs so would like to go with some let's say
> bulle
got to take more processing and network bandwidth
than the 'known interesting only' parts of files.
>
> On Fri, Jun 11, 2021 at 12:31 PM Harry G. Coin wrote:
>> On any given a properly sized ceph setup, for other than database end
>> use) theoretically shouldn
On any given properly sized ceph setup (for other than database end
use), theoretically shouldn't a cephfs root out-perform any fs atop a
rados block device root?
Seems to me like it ought to: moving only the 'interesting' bits of
files over the so-called 'public' network should take fewer, smal
Has anyone added the 'conf.d' modules (and, in the centos/rhel/fedora
world, done the selinux work) so that initramfs/dracut can 'direct kernel
boot' cephfs as a guest image root file system? It took some work for
the nfs folks to manage being the root filesystem.
Harry
On 6/2/21 2:28 PM, Phil Regnauld wrote:
> Dave Hall (kdhall) writes:
>> But the developers aren't out in the field with their deployments
>> when something weird impacts a cluster and the standard approaches don't
>> resolve it. And let's face it: Ceph is a marvelously robust solution for
>> lar
FYI, I'm getting monitors assigned via '... apply label:mon' (with
current and valid 'mon' tags) 'committing suicide' after surprise
reboots in the 'Pacific' 16.2.4 release. The tag indicating a monitor
should be assigned to that host is present and never changed.
Deleting the mon tag, waiting a
On 5/21/21 9:49 AM, Eugen Block wrote:
> You can define the public_network [1]:
>
> ceph config set mon public_network **
>
> For example:
>
> ceph config set mon public_network 10.1.2.0/24
>
> Or is that already defined and it happens anyway?
The public network is defined, and it happens anyway
Is there a way to force '.. orch apply *' to limit IP address selection
to addresses matching the hostname in DNS or /etc/hosts, or to a
specific address given at 'host add' time? I've hit a bothersome problem:
on v15, 'ceph orch apply mon ...' appears not to use the DNS IP or
/etc/hosts when i
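cephadm keeps its own notion of each host's address, separate from DNS; it
can be given explicitly when the host is added, or corrected afterwards (a
sketch with placeholder names and addresses):

ceph orch host add noc5 192.168.10.15        # pin the address at add time
ceph orch host set-addr noc5 192.168.10.15   # or fix it on an already-added host
ceph orch host ls                            # the address cephadm will actually use
ceph config set mon public_network 192.168.10.0/24   # keeps newly placed mons on the intended subnet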
Any idea whether 'diskprediction_local' will ever work in containers?
I'm running 15.2.7, which has a dependency on scikit-learn v0.19.2
that isn't in the container. It's been throwing that error for a year
now on all the octopus container versions I tried. It used to be on the
baremetal v
This has got to be ceph/docker "101" but I can't find the answer in the
docs and need help.
The latest docker octopus images support using the ntpsec time daemon.
The default stable octopus image doesn't as yet.
I want to add a mon to a cluster that needs to use ntpsec (just go with
it..), so I
[Resent to correct title]
Marc:
Here's a template that works here. You'll need to do some steps to
create the 'secret' and make the block devs and so on:
Glad I could contribute something. Sure would appreciate leads for the
suggested sysctls
Marc:
Here's a template that works here. You'll need to do some steps to
create the 'secret' and make the block devs and so on:
Glad I could contribute something. Sure would appreciate leads for the
suggested sysctls/etc either apart or as tu
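For anyone landing on this in the archive, the 'secret' and block-device
pieces referred to above are roughly a libvirt secret of type 'ceph'
carrying the cephx key, plus a network-type disk in the domain XML. A sketch
with placeholder pool, image, user and UUID values:

# secret.xml
<secret ephemeral='no' private='no'>
  <uuid>2a5b08e4-3dca-4ff9-9a1d-40389758d081</uuid>
  <usage type='ceph'>
    <name>client.libvirt secret</name>
  </usage>
</secret>

virsh secret-define --file secret.xml
virsh secret-set-value --secret 2a5b08e4-3dca-4ff9-9a1d-40389758d081 \
  --base64 "$(ceph auth get-key client.libvirt)"

# disk element inside the guest's domain XML
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='writeback'/>
  <auth username='libvirt'>
    <secret type='ceph' uuid='2a5b08e4-3dca-4ff9-9a1d-40389758d081'/>
  </auth>
  <source protocol='rbd' name='libvirt-pool/guest01-disk0'>
    <host name='mon1.example.net' port='6789'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>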
Hi
Are there any 'official' or even 'works for us' pointers to 'tuned
profiles' for such common uses as
'ceph baremetal osd host'
'ceph osd + libvirt host'
'ceph mon/mgr'
'guest vm based on a kernel-mounted rbd'
'guest vm based on a direct virtio->rados link'
I suppose there are a few other
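For the record, a home-grown profile is just a directory under /etc/tuned
that chains off a stock profile; a sketch of the shape for an OSD host (the
sysctl values are placeholders, not recommendations):

# /etc/tuned/ceph-osd-host/tuned.conf
[main]
summary=Ceph OSD host (sketch)
include=throughput-performance

[sysctl]
vm.swappiness=10
net.core.somaxconn=1024

[vm]
transparent_hugepages=never

# activate and verify
tuned-adm profile ceph-osd-host
tuned-adm active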
Anthony asked about the 'use case'. Well, I haven't gone into details
because I worried it wouldn't help much. From a 'ceph' perspective, the
sandbox layout goes like this: 4 pretty much identical old servers,
each with 6 drives, and a smaller server just running a mon to break
ties. Usual fron
ote:
> What does “traffic” mean? Reads? Writes will have to hit the net
> regardless of any machinations.
>
>> On Jun 29, 2020, at 7:31 PM, Harry G. Coin wrote:
>>
>> I need exactly what ceph is for a whole lot of work, that work just
>> doesn't represent
I need exactly what ceph is for a whole lot of work, that work just
doesn't represent a large fraction of the total local traffic. Ceph is
the right choice. Plainly ceph has tremendous support for replication
within a chassis, among chassis and among racks. I just need
intra-chassis traffic to n
Hi
I have a few servers each with 6 or more disks, with a storage workload
that's around 80% done entirely within each server. From a
work-to-be-done perspective there's no need for 80% of the load to
traverse network interfaces; the rest needs what ceph is all about. So
I cooked up a set of c
Hello,
A couple of days ago I increased the rbd cache size from the default to
256MB/osd on a small 4-node, 6-osd/node setup in a test/lab setting.
The rbd volumes are all vm images with writeback cache parameters and
steady, if only a few MB/sec, writes going on. Logging mostly. I
noticed the
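For reference, the librbd cache knobs in play look like this; note that the
cache is per librbd client/opened image (e.g. per disk in a QEMU process)
rather than per OSD, so large values multiply quickly (the numbers are
illustrative):

ceph config set client rbd_cache true
ceph config set client rbd_cache_size 268435456                  # 256 MiB
ceph config set client rbd_cache_max_dirty 201326592             # must stay below rbd_cache_size
ceph config set client rbd_cache_writethrough_until_flush true   # writethrough until the guest issues a flush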
Does anyone know if the change to disable spdk by default (so as to
remove the corei7 dependency when running on intel platforms) made it
into 14.2.3? The spdk version only required core2 in 14.2.1; the change
to require corei7 in 14.2.2 killed all the OSDs on older systems flat.
On 9/4/1