Hi,
I enabled pg_autoscaler on a specific pool ssd.
I failed to increase pg_num / pgp_num on pools ssd to 1024:
root@ld3955:~# ceph osd pool autoscale-status
POOL SIZE TARGET SIZE RATE RAW CAPACITY RATIO TARGET RATIO BIAS PG_NUM NEW PG_NUM AUTOSCALE
cephfs_metadata 395.8
Hi,
I failed to increase pg_num / pgp_num on pools ssd to 1024:
root@ld3976:~# ceph osd pool get ssd pg_num
pg_num: 512
root@ld3976:~# ceph osd pool get ssd pgp_num
pgp_num: 512
root@ld3976:~# ceph osd pool set ssd pg_num 1024
root@ld3976:~# ceph osd pool get ssd pg_num
pg_num: 512
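(Side note, a possible way to dig into this: on Nautilus a pg_num increase is applied gradually by the mon/mgr, and the autoscaler or a stale require_osd_release flag can also keep it from taking effect. The commands below are only a sketch for the 'ssd' pool used above:)
ceph osd pool ls detail | grep ssd        # shows pg_num and, if a change is pending, pg_num_target
ceph osd pool get ssd pg_autoscale_mode   # the autoscaler may be reverting/ignoring manual changes
ceph osd dump | grep require              # later posts in this thread point at require_osd_release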
When I check
; I had this when testing pg_autoscaler, after some time every command
> would hang. Restarting the MGR helped for a short period of time, then
> I disabled pg_autoscaler. This is an upgraded cluster, currently on
> Nautilus.
>
> Regards,
> Eugen
>
>
> Quoting Thomas
Hi,
command ceph osd df does not return any output.
Based on the strace output there's a timeout.
[...]
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f53006b9000
brk(0x55c2579b6000) = 0x55c2579b6000
brk(0x55c2579d7000) = 0x55
Update:
Issue is solved.
The output of "ceph osd dump" showed that the required setting was
incorrect, means
require_osd_release luminous
After executing
ceph osd require-osd-release nautilus
I can enable pg_autoscale_mode on any pool.
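For reference, the sequence that resolved it, pieced together from this thread (pool 'ssd'; verify the flag on your own cluster before changing it):
ceph osd dump | grep require_osd_release     # was still: require_osd_release luminous
ceph osd require-osd-release nautilus
ceph osd pool set ssd pg_autoscale_mode on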
THX
On 21.11.2019 at 13:51, Paul Emmerich wrote:
> "ceph o
Looks like the flag is not correct.
root@ld3955:~# ceph osd dump | grep nautilus
root@ld3955:~# ceph osd dump | grep require
require_min_compat_client luminous
require_osd_release luminous
On 21.11.2019 at 13:51, Paul Emmerich wrote:
> "ceph osd dump" shows you if the flag is set
>
>
> Paul
>
Hello Paul,
I didn't skip this step.
Actually I'm sure that everything in the cluster is on Nautilus, because I
had issues with SLES 12SP2 clients that failed to connect due to
outdated client tools that could not talk to Nautilus.
Would it make sense to execute
ceph osd require-osd-release nautil
Hi,
I try to enable pg_autoscale_mode on a specific pool of my cluster,
however this returns an error.
root@ld3955:~# ceph osd pool set ssd pg_autoscale_mode on
Error EINVAL: must set require_osd_release to nautilus or later before
setting pg_autoscale_mode
The error message is clear, but my clus
Hi,
my Ceph cluster is in unhealthy state and busy with recovery.
I'm observing the MGR log and it is showing this error message regularly:
2019-11-20 09:51:45.211 7f7205581700 0 auth: could not find secret_id=4193
2019-11-20 09:51:45.211 7f7205581700 0 cephx: verify_authorizer could
not get
Hi,
I'm experiencing the same issue with this setting in ceph.conf:
osd op queue = wpq
osd op queue cut off = high
Furthermore I cannot read any old data in the relevant pool that is
serving CephFS.
However, I can write new data and read this new data.
Regards
Thoma
rnum
> Sent: 09 September 2019 23:25
> To: Byrne, Thomas (STFC,RAL,SC)
> Cc: ceph-users
> Subject: Re: [ceph-users] Help understanding EC object reads
>
> On Thu, Aug 29, 2019 at 4:57 AM Thomas Byrne - UKRI STFC
> wrote:
> >
> > Hi all,
> >
> > I’m investiga
Hi Folks,
I have found similar reports of this problem in the past but can't seem to find
any solution to it.
We have a Ceph filesystem running Mimic version 13.2.5.
OSDs are running on AWS EC2 instances with CentOS 7. The OSD disk is an AWS NVMe
device.
Problem I, sometimes when rebooting an OSD in
Hi all,
I'm investigating an issue with our (non-Ceph) caching layers of our large EC
cluster. It seems to be turning users requests for whole objects into lots of
small byte range requests reaching the OSDs, but I'm not sure how inefficient
this behaviour is in reality.
My limited understandi
Hi,
I'm running Debian 10 with btrfs-progs=5.2.1.
Creating snapshots with snapper=0.8.2 works w/o errors.
However, I run into an issue and need to restore various files.
I thought that I could simply take the files from a snapshot created before.
However, the files required don't exist in any
Hi Torben,
> Is it allowed to have the scrub period cross midnight ? eg have start time at
> 22:00 and end time 07:00 next morning.
Yes, I think that's the way it is mostly used, primarily to reduce the
scrub impact during waking/working hours.
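As an illustration, a window crossing midnight would look roughly like this in ceph.conf on the OSDs (a sketch; hours are 0-23, and depending on version they are interpreted in the OSD host's local time):
[osd]
osd scrub begin hour = 22
osd scrub end hour = 7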
> I assume that if you only configure the on
ul 31, 2019 at 2:36 PM Aleksey Gutikov
wrote:
> Hi Thomas,
>
> We did some investigations some time before and got several rules how to
> configure rgw and osd for big files stored on erasure-coded pool.
> Hope it will be useful.
> And if I have any mistakes, please let me kno
situations? A OSD
> blocking queries in a RBD scenario is a big deal, as plenty of VMs will
> have disk timeouts which can lead to the VM just panicking.
>
>
>
> Thanks!
>
> Xavier
>
>
Hi Casey,
Thanks for your reply.
Just to make sure I understand correctly- would that only be if the S3
object size for the put/get is a multiple of your rgw_max_chunk_size?
Kind regards,
Tom
On Tue, 30 Jul 2019 at 16:57, Casey Bodley wrote:
> Hi Thomas,
>
> I see that you're
Does anyone know what these parameters are for? I'm not 100% sure I
understand what a window is in the context of rgw objects:
- rgw_get_obj_window_size
- rgw_put_obj_min_window_size
The code points to throttling I/O. But some more info would be useful.
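If it helps while digging, the current values can usually be read from a running radosgw via its admin socket (the socket path below is an assumption, adjust to your deployment):
ceph daemon /var/run/ceph/ceph-client.rgw.$(hostname -s).asok config show | grep -E 'rgw_(get|put)_obj.*window'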
Kind regards,
Tom
Hi,
Does anyone out there use bigger than default values for rgw_max_chunk_size
and rgw_obj_stripe_size?
I'm planning to set rgw_max_chunk_size and rgw_obj_stripe_size to 20MiB,
as it suits our use case and from our testing we can't see any obvious
reason not to.
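For anyone wanting to try the same, these are plain ceph.conf options on the rgw instance; a sketch with 20 MiB expressed in bytes (the section name is an example, use your own instance name):
[client.rgw.gateway1]
rgw max chunk size = 20971520
rgw obj stripe size = 20971520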
Is there some convincing experi
As a counterpoint, adding large amounts of new hardware in gradually (or more
specifically in a few steps) has a few benefits IMO.
- Being able to pause the operation and confirm the new hardware (and cluster)
is operating as expected. You can identify problems with hardware with OSDs at
10% we
Gregory Farnum
Sent: 24 June 2019 17:30
To: Byrne, Thomas (STFC,RAL,SC)
Cc: ceph-users
Subject: Re: [ceph-users] OSDs taking a long time to boot due to
'clear_temp_objects', even with fresh PGs
On Mon, Jun 24, 2019 at 9:06 AM Thomas Byrne - UKRI STFC
wrote:
>
> Hi all,
>
&g
Hi all,
Some bluestore OSDs in our Luminous test cluster have started becoming
unresponsive and booting very slowly.
These OSDs have been used for stress testing for hardware destined for our
production cluster, so have had a number of pools on them with many, many
objects in the past. All
ime,sync 172.16.32.15:/
/mnt/cephfs
I have tried stripping much of the config and altering mount options, but so
far completely unable to decipher the cause. It also seems I'm not the only one
who has been caught on this:
https://www.spinics.net/lists/ceph-devel/msg41201.html
Thanks in adv
Hi,
I have noticed an error when writing to a mapped RBD.
Therefore I unmounted the block device.
Then I tried to unmap it w/o success:
ld2110:~ # rbd unmap /dev/rbd0
rbd: sysfs write failed
rbd: unmap failed: (16) Device or resource busy
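A rough checklist for the 'resource busy' case (standard tools, not specific to this thread; the force unmap needs a reasonably recent kernel/rbd CLI):
rbd showmapped                 # confirm the device is still mapped and to which image
fuser -vm /dev/rbd0            # or: lsof /dev/rbd0 -- is anything still holding it open?
rbd unmap -o force /dev/rbd0   # last resort, forcibly drops remaining users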
The same block device is mapped on another client and there
Thanks.
This procedure works very well.
On 25.01.2019 at 14:24, Janne Johansson wrote:
> Den fre 25 jan. 2019 kl 09:52 skrev cmonty14 <74cmo...@gmail.com>:
>> Hi,
>> I have identified a major issue with my cluster setup consisting of 3 nodes:
>> all monitors are connected to cluster network.
>
The error was caused by a failure when copying & pasting from Eugen's
instructions, which are 100% correct!
Thanks for your great support!!!
Maybe another question related to this topic:
If I write a backup into an RBD, will Ceph use a single IO stream or
mitted
rbd: error opening image gbs: (1) Operation not permitted
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (1) Operation not permitted
Regards
Thomas
On 25.01.2019 at 12:31, Eugen Block wrote:
> You can check all objects of that pool to see i
s found in syslog - try "dmesg | tail".
rbd: map failed: (1) Operation not permitted
Regards
Thomas
On 25.01.2019 at 11:52, Eugen Block wrote:
> osd 'allow rwx
> pool object_prefix rbd_data.2b36cf238e1f29; allow rwx pool
> object_prefix rbd_header.2b36cf238e1f29
Hi,
my use case for Ceph is serving as a central backup storage.
This means I will back up multiple databases in the Ceph storage cluster.
This is my question:
What is the best practice for creating pools & images?
Should I create multiple pools, meaning one pool per database?
Or should I create a single
For what it's worth, I think the behaviour Pardhiv and Bryan are describing is
not quite normal, and sounds similar to something we see on our large luminous
cluster with elderly (created as jewel?) monitors. After large operations which
result in the mon stores growing to 20GB+, leaving the clu
> In previous versions of Ceph, I was able to determine which PGs had
> scrub errors, and then a cron.hourly script ran "ceph pg repair" for them,
> provided that they were not already being scrubbed. In Luminous, the bad
> PG is not visible in "ceph --status" anywhere. Should I use something
I recently spent some time looking at this, I believe the 'summary' and
'overall_status' sections are now deprecated. The 'status' and 'checks' fields
are the ones to use now.
The 'status' field gives you the OK/WARN/ERR, but returning the most severe
error condition from the 'checks' section i
Assuming I understand it correctly:
"pg_upmap_items 6.0 [40,20]" refers to replacing (upmapping?) osd.40 with
osd.20 in the acting set of the placement group '6.0'. Assuming it's a 3
replica PG, the other two OSDs in the set remain unchanged from the CRUSH
calculation.
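For illustration, such an entry can be created and removed by hand like this (requires luminous-or-newer clients; PG and OSD ids taken from the example above):
ceph osd set-require-min-compat-client luminous
ceph osd pg-upmap-items 6.0 40 20    # in PG 6.0, use osd.20 where CRUSH chose osd.40
ceph osd rm-pg-upmap-items 6.0       # drop the exception again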
"pg_upmap_items 6.6 [45,
0.el7.x86_64
ceph-osd-10.2.11-0.el7.x86_64
ceph-mon-10.2.11-0.el7.x86_64
ceph-deploy-1.5.39-0.noarch
ceph-10.2.11-0.el7.x86_64
Could someone please advise how to proceed?
Thanks and kind regards,
Thomas
-> 10.2.10 -> 12.2.9 in the past 2 weeks with no issues.
That said, it is disappointing these packages are making their way into
repositories without the proper announcements for an LTS release, especially
given this is enterprise-oriented software.
Thomas
-Original Message-
From: ceph
ike to attend, please complete the
following form to register: https://goo.gl/forms/imuP47iCYssNMqHA2
Kind regards,
SARAO storage team
--
Thomas Bennett
SARAO
Science Data Processing
Hi Folks,
I am looking for advice on how to troubleshoot some long operations found in
MDS. Most of the time performance is fantastic, but occasionally and to no real
pattern or trend, a getattr op will take up to ~30 seconds to complete in the MDS
which is stuck on "event": "failed to rdlock, wai
Hello,
I have two independent but almost identical systems; on one of them (A) the total
number of objects stays around 200, while on the other (B) it has been steadily
increasing and now seems to have levelled off at around 4000 objects.
The total used data remains roughly the same, but this data is continuou
resolve
this inconsistency when the object is supposed to be absent?
Kind Regards,
Thomas
: Thomas Sumpter
Sent: Wednesday, September 19, 2018 4:31 PM
To: 'Gregory Farnum'
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Delay Between Writing Data and that Data being
available for reading?
Linux version 4.18.4-1.el7.elrepo.x86_64 (mockbuild@Build64R7) (gcc version
4.8.
Linux version 4.18.4-1.el7.elrepo.x86_64 (mockbuild@Build64R7) (gcc version
4.8.5 20150623 (Red Hat 4.8.5-28) (GCC))
CentOS 7
From: Gregory Farnum
Sent: Wednesday, September 19, 2018 4:27 PM
To: Thomas Sumpter
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Delay Between Writing Data
(5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
Regards,
Tom
From: Gregory Farnum
Sent: Wednesday, September 19, 2018 4:04 PM
To: Thomas Sumpter
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Delay Between Writing Data and that Data being
available for reading?
You're going to need to te
Hello,
We have Mimic version 13.2.1 using Bluestore. OSDs are using NVMe disks for
data storage (in AWS).
Four OSDs are active in replicated mode.
Further information on request, since there are so many config options I am not
sure where to focus my attention yet. Assume we have default options.
the version to a
> folder and you can create a repo file that reads from a local directory.
> That's how I would re-install my test lab after testing an upgrade
> procedure to try it over again.
>
> On Tue, Aug 28, 2018, 1:01 AM Thomas Bennett wrote:
>
>> Hi,
>>
path
Aug 27 13:14:48 tr-25-3 pvestatd[15777]: file /etc/pve/storage.cfg
line
82 (skip section 'test'): missing value for required option 'export'
...
mounts via cli (mount -t nfs -o nfsvers=4.1,noauto,soft,sync,proto=tcp
x.x.x.x:/ /mnt/ganesha/) are working without issues -
Hi James,
I can see where some of the confusion has arisen, hopefully I can put at least
some of it to rest. In the Tumblr post from Yahoo, the keyword to look out for
is “nodes”, which is distinct from individual hard drives, which in Ceph are
OSDs in most cases. So you would have multiple
ey're just not included in the package
distribution. Is this the desired behaviour or a misconfiguration?
Cheers,
Tom
--
Thomas Bennett
SARAO
Science Data Processing
Hi Arvydas,
The error seems to suggest this is not an issue with your object data, but the
expected object digest data. I am unable to access where I stored my very hacky
diagnosis process for this, but our eventual fix was to locate the bucket or
files affected and then rename an object wit
Hi Jaime,
Upgrading directly should not be a problem. It is usually recommended to go to
the latest minor release before upgrading major versions, but my own migration
from 10.2.10 to 12.2.5 went seamlessly and I can’t see any technical
limitation which would hinder or prevent this proces
Hi Steven,
Just to somewhat clarify my previous post, I mention OSDs in the sense that the
OS is installed on the OSD server using the SD card, I would absolutely
recommend against using SD cards as the actual OSD media. This of course misses
another point, which is for the Mons or other suc
Hi Steven,
If you are running OSDs on the SD card, there would be nothing technically
stopping this setup, but the main factors against would be the simple endurance
and performance of SD cards and the potential fallout when they inevitably
fail. If you factor time and maintenance as a cost
Hi all,
We have recently begun switching over to Bluestore on our Ceph cluster,
currently on 12.2.7. We first began encountering segfaults on Bluestore during
12.2.5, but strangely these segfaults apply exclusively to our SSD pools and
not the PCIE/HDD disks. We upgraded to 12.2.7 last week to
ll need to install the aws toolkit and jq of course and configure them.
Thanks again,
Tom
-Original Message-
From: ceph-users On Behalf Of Casey
Bodley
Sent: 02 August 2018 17:08
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Reset Object ACLs in RGW
On 08/02/2018 07:35 AM, Thomas
Hi all,
At present I have a cluster with a user on the RGW who has lost access to many
of his files. The bucket has the correct ACL to be accessed by the account and
so with their access and secret key many items can be listed, but cannot
be downloaded.
Is there a way of using the rados
bootstrap-osd"
>
>
> Paul
>
>
> 2018-07-06 16:47 GMT+02:00 Thomas Roth :
>
>> Hi all,
>>
>> I wonder which is the correct key to create/recreate an additional OSD
>> with 12.2.5.
>>
>> Following
>> http://docs.ceph.com/docs
them on my mon hosts.
"ceph-volume" and "ceph-disk" go looking for that file, so I put it there, to
no avail.
Btw, the target server has still several "up" and "in" OSDs running, so this is
not a question of
network or general authentication iss
ket 30
times in 8 hours as we will write ~3 million objects in ~8 hours.
Hence the idea that we should preshard to avoid any undesirable workloads.
Cheers,
Tom
On Wed, Jun 27, 2018 at 3:16 PM, Matthew Vernon wrote:
> Hi,
>
> On 27/06/18 11:18, Thomas Bennett wrote:
>
> > We h
rifice that I'm willing to take for the
convenience of it preconfigured.
Cheers,
Tom
--
Thomas Bennett
SRAO
Storage Engineer - Science Data Processing
Hi,
I'm testing out ceph_vms vs a cephfs mount with a cifs export.
I currently have 3 active ceph mds servers to maximise throughput and
when I have configured a cephfs mount with a cifs export, I'm getting
reasonable benchmark results.
However, when I tried some benchmarking with the ceph_v
time to compact their stores. Although it’s far from ideal
(from a total time to get new storage weighted up), I’ll be letting the mons
compact between every backfill until I have a better idea of what went on last
week.
From: David Turner
Sent: 17 May 2018 18:57
To: Byrne, Thomas (STFC,RAL,SC
ors
> holding onto cluster maps
>
>
>
> On 05/17/2018 04:37 PM, Thomas Byrne - UKRI STFC wrote:
> > Hi all,
> >
> >
> >
> > As far as I understand, the monitor stores will grow while not
> > HEALTH_OK as they hold onto all cluster maps.
Hi all,
As far as I understand, the monitor stores will grow while not HEALTH_OK as
they hold onto all cluster maps. Is this true for all HEALTH_WARN reasons? Our
cluster recently went into HEALTH_WARN due to a few weeks of backfilling onto
new hardware pushing the monitors data stores over the
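Two commands that are often handy while watching this (paths and mon IDs are examples, adjust to your deployment):
du -sh /var/lib/ceph/mon/ceph-$(hostname -s)/store.db   # size of the local mon store
ceph tell mon.$(hostname -s) compact                    # trigger a manual compaction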
Hi Patric,
Thanks! Much appreciate.
On Tue, 15 May 2018 at 14:52, Patrick Donnelly wrote:
> Hello Thomas,
>
> On Tue, May 15, 2018 at 2:35 PM, Thomas Bennett wrote:
> > Hi,
> >
> > I'm running Luminous 12.2.5 and I'm testing cephfs.
> >
> > Ho
Hi,
I'm running Luminous 12.2.5 and I'm testing cephfs.
However, I seem to have too many active mds servers on my test cluster.
How do I set one of my mds servers to become standby?
I've run ceph fs set cephfs max_mds 2 which set the max_mds from 3 to 2 but
has no effect on my running configura
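For context, on Luminous lowering max_mds does not stop an already-active rank by itself; a sketch of the usual follow-up step (fs name 'cephfs' and rank 2 from the setup above; newer releases stop surplus ranks automatically):
ceph fs set cephfs max_mds 2
ceph mds deactivate cephfs:2   # Luminous-era command; the daemon then returns to standby
ceph fs status                 # confirm only two active ranks remain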
Hi Orit,
Thanks for the reply, much appreciated.
> You cannot see the omap size using rados ls but need to use rados omap
> commands.
> You can use this script to calculate the bucket index size:
> https://github.com/mkogan1/ceph-utils/blob/master/scripts/get_omap_kv_size.sh
Great. I had not e
Hi,
In trying to understand RGW pool usage I've noticed that the pool called
*default.rgw.meta* has a large number of objects in it. Suspiciously,
about twice as many objects as in my *default.rgw.buckets.index* pool.
As I delete and add buckets, the number of objects in both pools decrease
and incre
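One way to see what those objects are is to count them per namespace, since default.rgw.meta keeps different metadata types (users, bucket instances, ...) in separate RADOS namespaces; a sketch assuming the tab-separated output of 'rados ls --all':
rados -p default.rgw.meta ls --all | cut -f1 | sort | uniq -c   # object count per namespace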
Hi Peter,
Relooking at your problem, you might want to keep track of this issue:
http://tracker.ceph.com/issues/22440
Regards,
Tom
On Wed, Jan 31, 2018 at 11:37 AM, Thomas Bennett wrote:
> Hi Peter,
>
> From your reply, I see that:
>
>1. pg 3.12c is part of pool 3.
>
Hi Peter,
From your reply, I see that:
1. pg 3.12c is part of pool 3.
2. The OSDs in the "up" set for pg 3.12c are: 6, 0, 12.
To check on this 'activating' issue, I suggest doing the following:
1. What is the rule that pool 3 should follow, 'hybrid', 'nvme' or
'hdd'? (Use the *ceph osd
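The truncated step above presumably refers to checking which CRUSH rule the pool uses; a sketch (pool/rule names are placeholders):
ceph osd pool get <pool-name> crush_rule   # e.g. for the pool that pg 3.12c belongs to
ceph osd crush rule dump <rule-name>       # inspect the rule's steps and device class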
11:48 AM, Peter Linder
wrote:
> Hi Thomas,
>
> No, we haven't gotten any closer to resolving this, in fact we had another
> issue again when we added a new nvme drive to our nvme servers (storage11,
> storage12 and storage13) that had weight 1.7 instead of the usual 0.728
>
ame, the
> problem goes away! I would have thought that the weights do not matter,
> since we have to choose 3 of these anyways. So I'm really confused over
> this.
>
> Today I also had to change
>
> item ldc1 weight 197.489
> item ldc2 weight 197.196
>
e suddenly listed only one cephFS. Also the
> command "ceph fs status" doesn't return an error anymore but shows the
> correct output.
> I guess Ceph is indeed a self-healing storage solution! :-)
>
> Regards,
> Eugen
>
>
> Quoting Thomas Bennett:
&
-5@2(probing).data_health(6138) service_dispatch_op not
in quorum -- drop message
2018-01-11 15:17:22.060499 7f69b80d6700 0 log_channel(cluster) log
[INF] : mon.my-ceph-mon-5 calling new monitor election
2018-01-11 15:17:22.060612 7f69b80d6700 1
mon.my-ceph-mon-5@2(electing).elector(613
Hi,
I have the same problem. A bug [1] has been reported for months, but
unfortunately it is not fixed yet. I hope that if more people are having
this problem, the developers can reproduce and fix it.
I was using Kernel-RBD with a Cache Tier.
so long
Thomas Coelho
[1] http://tracker.ceph.com/issues
Hello,
thank you very much for the hint, you are right!
Kind regards, Thomas
Marc Roos wrote on 30.08.2017 at 14:26:
>
> I had this also once. If you update all nodes and then systemctl restart
> 'ceph-osd@*' on all nodes, you should be fine. But first the mo
rsion 1 < struct_compat
( it is puzzling that the *older* v12.1.0 node complains about the *old*
encoding version of the *newer* v12.2.0 node.)
Any idea how I can go ahead?
Kind regards, Thomas
at 10:49 AM, Dan van der Ster
wrote:
> Hi Thomas,
>
> Yes we set it to a million.
> From our puppet manifest:
>
> # need to increase aio-max-nr to allow many bluestore devs
> sysctl { 'fs.aio-max-nr': val => '1048576' }
&
Hi,
I've been testing out Luminous and I've noticed that at some point the
number of OSDs per node was limited by aio-max-nr. By default it's set to
65536 in Ubuntu 16.04
Has anyone else experienced this issue?
fs.aio-nr currently sitting at 196608 with 48 osds.
I have 48 osd's per node so I've
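For reference, the sysctl suggested elsewhere in this thread (1048576) can be applied and persisted roughly like this (the file name is just a convention):
sysctl -w fs.aio-max-nr=1048576
echo 'fs.aio-max-nr = 1048576' > /etc/sysctl.d/90-ceph-aio-max-nr.conf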
Hello,
Thomas Gebhardt wrote on 07.07.2017 at 17:21:
> ( e.g.,
> ceph-deploy osd create --bluestore --block-db=/dev/nvme0bnp1 node1:/dev/sdi
> )
just noticed that there was a typo in the block-db device name
(/dev/nvme0bnp1 -> /dev/nvme0n1p1). After fixing that misspelling my
coo
/
does not yet support stretch - but I suppose that's not related to my
problem).
Kind regards, Thomas
Jul 07 09:58:54 node1 systemd[1]: Started Ceph cluster monitor daemon.
Jul 07 09:58:54 node1 ceph-mon[550]: starting mon.node1 rank 0 at
x.x.x.x:6789/0 mon_data /var/lib/ceph/mon/ceph-node1
limit its
impact.
Thomas
From: Nick Fisk [mailto:n...@fisk.me.uk]
Sent: Wednesday, 23 November 2016 14:09
To: Thomas Danan; 'Peter Maloney'
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] ceph cluster having blocke requests very frequently
Hi Thomas,
I’m afraid I can’t off
o, so you don’t even have to rely on Ceph to avoid
> downtime. I probably wouldn’t run it everywhere at once though for
> performance reasons. A single OSD at a time would be ideal, but that’s a
> matter of preference.
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@li
gt; Of *Kate Ward
> *Sent:* Tuesday, November 29, 2016 2:02 PM
> *To:* Thomas Bennett
> *Cc:* ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] Is there a setting on Ceph that we can use to
> fix the minimum read size?
>
>
>
> I have no experience with XFS, but
ter at combining requests before they get to the
> drive?
>
> k8
>
> On Tue, Nov 29, 2016 at 9:52 AM Thomas Bennett wrote:
>
>> Hi,
>>
>> We have a use case where we are reading 128MB objects off spinning disks.
>>
>> We've benchmarked a number of dif
Hi,
We have a use case where we are reading 128MB objects off spinning disks.
We've benchmarked a number of different hard drive and have noticed that
for a particular hard drive, we're experiencing slow reads by comparison.
This occurs when we have multiple readers (even just 2) reading objects
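One knob that is often involved when single readers are fine but concurrent readers of large objects collapse is the per-device readahead on the OSD data disks; a hedged example of checking and raising it (the device name is a placeholder):
cat /sys/block/sdb/queue/read_ahead_kb
echo 4096 > /sys/block/sdb/queue/read_ahead_kb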
Hi Kévin, I am currently having a similar issue. In my env I have around 16
Linux VMs (VMware), more or less equally loaded, accessing a 1PB Ceph Hammer
cluster (40 dn, 800 OSDs) through RBD.
Very often we have IO freezes on the VM XFS filesystems, and we also continuously have
slow requests on OSDs (up to
Sorry to bring this up again - any ideas? Or should I try the IRC channel?
Cheers,
Thomas
Original Message
Subject:RadosGW not responding if ceph cluster in state health_error
Date: Mon, 21 Nov 2016 17:22:20 +1300
From: Thomas
To: ceph-users@lists.ceph.com
mon_osd_min_down_reports = 10
Thomas
From: David Turner [mailto:david.tur...@storagecraft.com]
Sent: Wednesday, 23 November 2016 21:27
To: n...@fisk.me.uk; Thomas Danan; 'Peter Maloney'
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] ceph cluster having blocke requests very frequently
T
.2406870.1:140440919 rbd_data.616bf2ae8944a.002b85a7
[set-alloc-hint object_size 4194304 write_size 4194304,write 1449984~524288]
0.4e69d0de snapc 218=[218,1fb,1df] ondisk+write e212564) currently waiting for
subops from 528,771
Thomas
From: Tomasz Kuzemko [mailto:tom...@kuzemko.net]
Sent: Thursday
overloading the
network or if my network switches were having an issue.
Switches have been checked and they are showing no congestion issues or other
errors.
I really don’t know what to check or test; any ideas are more than welcome …
Thomas
From: Thomas Danan
Sent: Friday, 18 November 2016
up creation
Full log here: http://pastebin.com/iYpiF9wP
Once we removed the pool with size = 1 via 'rados rmpool', the cluster
started recovering and RGW served requests!
Any ideas?
Cheers,
Thomas
--
Thomas Gross
TGMEDIA Ltd.
p. +64 211 569080 | i...@tgmedia.co.nz
online ?
Thanks
Thomas
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Thomas
Danan
Sent: Friday, 18 November 2016 12:42
To: n...@fisk.me.uk; 'Peter Maloney'
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph cluster having blocke requests very freq
entify anything obvious in the logs.
Thanks for your help …
Thomas
From: Nick Fisk [mailto:n...@fisk.me.uk]
Sent: Thursday, 17 November 2016 11:02
To: Thomas Danan; n...@fisk.me.uk; 'Peter Maloney'
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] ceph cluster having blocke requests very f
I actually forgot to say that the following issue describes very similar
symptoms:
http://tracker.ceph.com/issues/9844
Thomas
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Thomas
Danan
Sent: Thursday, 17 November 2016 09:59
To: n...@fisk.me.uk; 'Peter Maloney'
example and with some DEBUG messages activated I was also able
to see many of the following messages on secondary OSDs.
2016-11-15 03:53:04.298502 7ff9c434f700 1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7ff9bdb42700' had timed out after 15
Thomas
From: Nick Fisk [mailto: