Just upgraded from Ceph Nautilus to Ceph Octopus on Ubuntu 18.04 using the
standard Ubuntu packages from the Ceph repo.
The upgrade has gone OK, but we are having issues with our radosgw service,
which eventually fails after some load. Here's what we see in the logs:
2021-10-05T15:55:16.328-0400 7fa47700
oad' and bouncing the radosgw
process and we seem to be humming along nicely now.
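In case it helps, bouncing the gateway on an Ubuntu/systemd host is roughly the sketch below; the unit name pattern is an assumption and depends on how your rgw instance is named:

# List the radosgw units on this host (unit name pattern is an assumption)
systemctl list-units 'ceph-radosgw@*'
# Restart and then verify the gateway process
sudo systemctl restart ceph-radosgw@rgw.$(hostname -s).service
sudo systemctl status ceph-radosgw@rgw.$(hostname -s).service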
On Tue, Oct 5, 2021 at 4:55 PM shubjero wrote:
>
> Just upgraded from Ceph Nautilus to Ceph Octopus on Ubuntu 18.04 using
> standard ubuntu packages from the Ceph repo.
>
> Upgrade has gone OK
We've done 14.04 -> 16.04 -> 18.04 -> 20.04, all at various stages of our
Ceph cluster's life.
The latest 18.04 to 20.04 upgrade was painless and we ran:
apt update && apt dist-upgrade -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold"
do-release-upgrade --allow-third-party -f D
Hey all,
Recently upgraded to Ceph Octopus (15.2.14). We also run Zabbix 5.0.15 and
have had Ceph/Zabbix monitoring for a long time. After the Octopus update I
installed the latest version of the Ceph template in Zabbix
(https://github.com/ceph/ceph/blob/master/src/pybind/mgr/zabbix/zabbix_templ
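For anyone setting this up from scratch, the mgr-side wiring is roughly the sketch below; the Zabbix server name and host identifier are placeholders, not our actual values:

# Enable the mgr zabbix module (zabbix_sender must be installed on the mgr hosts)
ceph mgr module enable zabbix
# Point it at the Zabbix server and the host entry that uses the Ceph template
ceph zabbix config-set zabbix_host zabbix.example.com
ceph zabbix config-set identifier ceph-cluster
# Review the settings and push a first batch of data immediately
ceph zabbix config-show
ceph zabbix send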
Hi all,
I have a 39-node, 1404-spinning-disk Ceph Mimic cluster across 6 racks, for
a total of 9.1 PiB raw and about 40% utilized. These storage nodes started
their life on Ubuntu 14.04 and were in-place upgraded to 16.04 two years
ago; however, I have started a project to do fresh installs of
each OSD node
Good day,
I am having an issue with some multipart uploads to radosgw. I
recently upgraded my cluster from Mimic to Nautilus and began having
problems with multipart uploads from clients using the Java AWS SDK
(specifically 1.11.219). I do NOT have issues with multipart uploads
with other clients
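For anyone trying to reproduce this outside of the Java SDK, forcing a specific multipart part size with the aws CLI looks roughly like the sketch below; the endpoint, bucket, and sizes are placeholders:

# Force a large multipart part size for the default profile (placeholder sizes)
aws configure set default.s3.multipart_threshold 64MB
aws configure set default.s3.multipart_chunksize 512MB
# Upload a large test object against the radosgw endpoint (placeholder URL/bucket)
dd if=/dev/urandom of=/tmp/mp-test bs=1M count=2048
aws --endpoint-url https://objects.example.com s3 cp /tmp/mp-test s3://test-bucket/mp-test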
, Sep 2, 2020 at 3:15 PM shubjero wrote:
>
> Good day,
>
> I am having an issue with some multipart uploads to radosgw. I
> recently upgraded my cluster from Mimic to Nautilus and began having
> problems with multipart uploads from clients using the Java AWS SDK
> (specificall
Our object-storage endpoint FQDN round-robins in DNS to 2 IPs. Those 2 IPs
are managed by keepalived across 3 servers running haproxy; each haproxy
instance listens on one of the round-robined IPs and load balances to 5
servers running radosgw.
On Fri, Sep 4, 2020 at 12:35 PM Oliv
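The haproxy side of that is roughly the sketch below; IPs, ports, and server names are illustrative, not our actual config:

frontend rgw_frontend
    mode http
    bind 192.0.2.10:80          # one of the two keepalived VIPs
    default_backend rgw_backend

backend rgw_backend
    mode http
    balance roundrobin
    option httpchk GET /        # radosgw answers an anonymous GET / with an XML listing
    server rgw1 10.0.0.11:8080 check
    server rgw2 10.0.0.12:8080 check
    server rgw3 10.0.0.13:8080 check
    server rgw4 10.0.0.14:8080 check
    server rgw5 10.0.0.15:8080 check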
Hey all,
I'm creating a new post for this issue as we've narrowed the problem
down to a part-size limitation on multipart upload. We have discovered
in both our production Nautilus (14.2.11) cluster and our lab Nautilus
(14.2.10) cluster that multipart uploads with a configured part size
of greater
) breaks multipart uploads.
On Tue, Sep 8, 2020 at 12:12 PM shubjero wrote:
>
> Hey all,
>
> I'm creating a new post for this issue as we've narrowed the problem
> down to a partsize limitation on multipart upload. We have discovered
> that in our production Nautilus (
Will do Matt
On Tue, Sep 8, 2020 at 5:36 PM Matt Benjamin wrote:
>
> thanks, Shubjero
>
> Would you consider creating a ceph tracker issue for this?
>
> regards,
>
> Matt
>
> On Tue, Sep 8, 2020 at 4:13 PM shubjero wrote:
> >
> > I had been looking
> rgw_max_chunk_size > rgw_put_obj_min_window_size,
> because we try to write in units of chunk size but the window is too
> small to write a single chunk.
>
> On Wed, Sep 9, 2020 at 8:51 AM shubjero wrote:
> >
> > Will do Matt
> >
> > On Tue, Sep 8, 2020 at 5:36
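A quick way to check those two values on a running gateway is via the admin socket, roughly as below; the socket name is an assumption and depends on how the rgw instance is named:

# Inspect the effective values on a running radosgw (socket name is an assumption)
ceph daemon /var/run/ceph/ceph-client.rgw.$(hostname -s).asok config show \
  | grep -E 'rgw_max_chunk_size|rgw_put_obj_min_window_size'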
I'm having similar ceph-mgr stability problems since upgrading from 13.2.5
to 13.2.6. I have isolated the crashing to the prometheus module being
enabled, and I notice much better stability when the prometheus module is
NOT enabled. No more failovers; however, I do notice that even with pr
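For reference, toggling the module is roughly:

# See which mgr modules are enabled
ceph mgr module ls
# Disable the prometheus exporter that correlates with the crashes
ceph mgr module disable prometheus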
Good day,
We have a Ceph cluster and make use of object storage integrated with
OpenStack. Each OpenStack project/tenant is given a radosgw user, which
allows all Keystone users of that project to access object storage as
that single radosgw user. The radosgw user is the
project id of the Op
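To illustrate, the per-project radosgw user can be inspected like the sketch below; the uid shown is a made-up example, not a real project id:

# The rgw uid is the OpenStack project id (example value only)
radosgw-admin user info --uid=9c3f5e2ab1d94d0cbf1e5a7d8e2f4a6b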
Hey all,
Yesterday our cluster went into HEALTH_WARN due to 1 large omap
object in the .usage pool (I've posted about this in the past). Last
time we resolved the issue by trimming the usage log below the alert
threshold, but this time it seems like the alert won't clear even after
trimming and (th
rados -p .usage listomapkeys usage.22
root@infra:~#
On Thu, Sep 19, 2019 at 12:54 PM Charles Alva wrote:
>
> Could you please share how you trimmed the usage log?
>
> Kind regards,
>
> Charles Alva
> Sent from Gmail Mobile
>
>
> On Thu, Sep 19, 2019 at 11:4
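For reference, trimming the usage log is typically along these lines; the dates below are placeholders rather than the exact command we ran:

# Optionally summarize usage before trimming
radosgw-admin usage show --show-log-entries=false
# Trim usage log entries in the given window (placeholder dates)
radosgw-admin usage trim --start-date=2018-01-01 --end-date=2019-08-31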
> issued or cleared during scrub, so I'd expect them to go away the next
> time the usage objects get scrubbed.
>
> On 9/20/19 2:31 PM, shubjero wrote:
> > Still trying to solve this one.
> >
> > Here is the corresponding log entry when the large omap object was
The deep scrub of the PG updated the cluster state to reflect that the large omap object was gone.
HEALTH_OK!
On Fri., Sep. 20, 2019, 2:31 p.m. shubjero, wrote:
> Still trying to solve this one.
>
> Here is the corresponding log entry when the large omap object was found:
>
> ceph-osd.1284.log.2.gz:2
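For anyone hitting the same thing, kicking off the deep scrub manually instead of waiting looks roughly like this; the pg id is a placeholder taken from wherever the large omap warning points:

# Deep-scrub the PG that holds the offending usage object (placeholder pg id)
ceph pg deep-scrub 37.1a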
Hi all,
I'm running a Ceph Mimic 13.2.6 cluster and we use the ceph-balancer
in upmap mode. This cluster is fairly old, and pre-Mimic we used to set
OSD reweights to balance the standard deviation of the cluster. Since
moving to Mimic about 9 months ago I enabled the ceph-balancer with
upmap mode a
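For context, turning the balancer on in upmap mode is roughly:

# upmap requires all clients to be luminous or newer
ceph osd set-require-min-compat-client luminous
ceph balancer mode upmap
ceph balancer on
ceph balancer status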
Right, but should I be proactively returning any reweighted OSDs that
are not 1.0 back to 1.0?
On Wed, Feb 26, 2020 at 3:36 AM Konstantin Shalygin wrote:
>
> On 2/26/20 3:40 AM, shubjero wrote:
> > I'm running a Ceph Mimic cluster 13.2.6 and we use the ceph-balancer
>
I talked to some guys on IRC about going back over the OSDs with a
non-1.0 reweight and setting them to 1.0.
I went from a standard deviation of 2+ to 0.5.
Awesome.
On Wed, Feb 26, 2020 at 10:08 AM shubjero wrote:
>
> Right, but should I be proactively returning any reweighted OSD's that
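What that looked like in practice is roughly the sketch below; the osd id is a placeholder, and the real list comes from the REWEIGHT column:

# Find OSDs whose legacy reweight is not 1.0 (REWEIGHT column)
ceph osd df tree
# Return one of them to 1.0 (placeholder osd id); repeat for each non-1.0 OSD
ceph osd reweight 123 1.0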
I've reported stability problems with ceph-mgr with the prometheus plugin
enabled on all versions we ran in production, which were several
versions of Luminous and Mimic. Our solution was to disable the
prometheus exporter; I am using Zabbix instead. Our cluster is 1404
OSDs in size with about 9PB raw wi