Hello,
re-adding the ML, so everybody benefits from this.
On Thu, 20 Oct 2016 14:03:56 +0530 Subba Rao K wrote:
> Hi Christian,
>
> I have seen one of your responses in CEPH user group and wanted some help
> from you.
>
> Can you please share HW configuration of the CEPH cluster which can ser
Hello,
On Thu, 20 Oct 2016 15:03:02 +0200 Oliver Dzombic wrote:
> Hi Christian,
>
> thank you for your time.
>
> The problem is deep scrub only.
>
> Jewel 10.2.2 is used.
>
Hmm, I was under the impression that the unified queue in Jewel was
supposed to stop scrubs from eating all the I/O babi
Hello,
On Thu, 20 Oct 2016 15:45:34 + Jim Kilborn wrote:
Good to know.
You may be able to squeeze some more 4K write IOPS out of this by cranking
the CPUs to full speed, see the relevant recent threads about this.
As for the 120GB (there is no 128GB SM863 model according to Samsung) SSDs
a
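(A hedged aside on the CPU-speed suggestion above: on Linux the usual way to pin the cores at full speed is the "performance" cpufreq governor. A minimal sketch; cpupower may need the linux-tools package and paths can differ per distribution.)
# show the current governor on every core
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# switch all cores to the performance governor
sudo cpupower frequency-set -g performance
# or, without cpupower:
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance | sudo tee "$g"
done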
Hi
Any suggestions/recommendations on all-SSD setups for Ceph?
I occasionally see the SATA SSDs freeze, creating spikes in latency
at times; they recover after a brief pause of 20-30 secs. Any best practices, like
colocated journals or not, schedulers, hdparm settings, etc., are appreciated. Working on 1.3.
R
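(A hedged sketch of the kind of checks being asked about; /dev/sdb is only a placeholder device.)
# show the active I/O scheduler for the SSD
cat /sys/block/sdb/queue/scheduler
# noop (or none) is a common choice for SSDs
echo noop | sudo tee /sys/block/sdb/queue/scheduler
# check whether the drive's volatile write cache is enabled
sudo hdparm -W /dev/sdb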
All are relatively recent Ubuntu 16.04.1 kernels. I upgraded ka05 last
night, but still see an issue. I'm happy to upgrade the rest.
$ for h in ka00 ka01 ka02 ka03 ka04 ka05; do ssh $h uname -a; done
Linux ka00 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11 18:00:59 UTC 2016
i686 i686 i686 GNU/Linux
Li
On Thu, Oct 20, 2016 at 10:15 PM, Kate Ward wrote:
> I have a strange problem that began manifesting after I rebuilt my cluster a
> month or so back. A tiny subset of my files on CephFS are being zero-padded
> out to the length of ceph.dir.layout.stripe_unit when the files are later
> *read* (not
I have a strange problem that began manifesting after I rebuilt my cluster
a month or so back. A tiny subset of my files on CephFS are being
zero-padded out to the length of ceph.dir.layout.stripe_unit when the files
are later *read* (not when they are written). Tonight I realized the
padding match
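(For reference, the stripe_unit mentioned above can be inspected through the CephFS virtual xattrs; a minimal sketch, with the mount point and paths as placeholders.)
# directory layout: stripe_unit, stripe_count, object_size, pool
getfattr -n ceph.dir.layout /mnt/cephfs/somedir
# an individual file's layout
getfattr -n ceph.file.layout /mnt/cephfs/somedir/somefile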
Do you run a large Ceph cluster? Do you find that you run into issues
that you didn't have when your cluster was smaller? If so we have a new
mailing list for you!
Announcing the new ceph-large mailing list. This list is targeted at
experienced Ceph operators with cluster(s) over 500 OSDs to di
I've been trying to test radosgw multisite and have a pretty bad memory
leak. It appears to be associated only with multisite sync.
Multisite works well for a small number of objects. However, it all
fell over when I wrote 8M 64K objects to two buckets overnight for
testing (via cosbench
Thanks, that's too far actually, lol. And how are things going with RBD
mirroring?
*German*
2016-10-20 14:49 GMT-03:00 yan cui :
> The two data centers are actually across the US. One is in the west, and the
> other in the east.
> We try to sync RBD images using RBD mirroring.
>
> 2016-10-20 9:54 GMT-07:
Thanks Jason, I will try to use your method.
2016-10-19 17:23 GMT-07:00 Jason Dillaman :
> On Wed, Oct 19, 2016 at 6:52 PM, yan cui wrote:
> > 2016-10-19 15:46:44.843053 7f35c9925d80 -1 librbd: cannot obtain
> exclusive
> > lock - not removing
>
> Are you attempting to delete the primary or non-
The two data centers are actually across the US. One is in the west, and the
other in the east.
We try to sync RBD images using RBD mirroring.
2016-10-20 9:54 GMT-07:00 German Anders :
> Out of curiosity I wanted to ask what kind of network topology you are
> trying to use across the cluster? In th
Out of curiosity I wanted to ask what kind of network topology you are
trying to use across the cluster? In this type of scenario you really need
an ultra-low-latency network; how far are they from each other?
Best,
*German*
2016-10-18 16:22 GMT-03:00 Sean Redmond :
> Maybe this would be an option for
The chart obviously didn’t go well. Here it is again
fio --direct=1 --sync=1 --rw={write,randwrite,read,randread} --bs={4M,4K}
--numjobs=1 --iodepth=1 --runtime=60 --size=5G --time_based --group_reporting
--name=journal-test
FIO Test Local disk S
Thanks Christian for the additional information and comments.
- Upgraded the kernels, but still had poor performance
- Removed all the pools and recreated them with just a replication of 3,
with the two pools for the data and metadata. No cache tier pool
- Turned back on the
- On 20 Oct 16, at 15:03, Oliver Dzombic wrote:
> Hi Christian,
> thank you for your time.
> The problem is deep scrub only.
> Jewel 10.2.2 is used.
> Thank you for your hint with manual deep scrubs on specific OSDs. I
> didn't come up with that idea.
> -
> Where do you know
> o
You can inspect the source code or do:
ceph --admin-daemon /var/run/ceph/ceph-osd.OSD_ID.asok config show |
grep scrub # or similar
And then check in the source code :)
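(Expanding the one-liner above into a slightly fuller, hedged example; the OSD id and socket path are placeholders.)
# dump every scrub-related option the running OSD is using
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep scrub
# or query a single option by name
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config get osd_scrub_sleep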
On 10/20/2016 03:03 PM, Oliver Dzombic wrote:
> Hi Christian,
>
> thank you for your time.
>
> The problem is deep scrub only.
>
> Jewe
On Thu, Oct 20, 2016 at 2:45 PM, David Riedl wrote:
> Hi cephers,
>
> I want to use the newest features of jewel on my cluster. I already updated
> all kernels on the OSD nodes to the following version:
> 4.8.2-1.el7.elrepo.x86_64.
>
> The KVM hypervisors are running the CentOS 7 stock kernel (
>
On Thu, Oct 20, 2016 at 1:51 AM, Ahmed Mostafa
wrote:
> different OSDs
PGs -- but more or less correct since the OSDs will process requests
for a particular PG sequentially and not in parallel.
--
Jason
Hi Christian,
thank you for your time.
The problem is deep scrub only.
Jewel 10.2.2 is used.
Thank you for your hint with manual deep scrubs on specific OSDs. I
didn't come up with that idea.
-
Where do you know
osd_scrub_sleep
from?
I saw it mentioned here lately on the mailing list multiple ti
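(For context, osd_scrub_sleep inserts a pause between scrub chunks to reduce client impact. A hedged example of changing it at runtime; the value 0.1 is purely illustrative.)
# inject into all running OSDs; not persistent across restarts
ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'
# to persist it, also add "osd scrub sleep = 0.1" under [osd] in ceph.conf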
Hi cephers,
I want to use the newest features of jewel on my cluster. I already
updated all kernels on the OSD nodes to the following version:
4.8.2-1.el7.elrepo.x86_64.
The KVM hypervisors are running the CentOS 7 stock kernel (
3.10.0-327.22.2.el7.x86_64 )
If I understand it correctly, l
Hello,
On Thu, 20 Oct 2016 11:23:54 +0200 Oliver Dzombic wrote:
> Hi,
>
> we have here globally:
>
> osd_client_op_priority = 63
> osd_disk_thread_ioprio_class = idle
> osd_disk_thread_ioprio_priority = 7
> osd_max_scrubs = 1
>
If you google for osd_max_scrubs you will find plenty of threads,
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> William Josefsson
> Sent: 20 October 2016 10:25
> To: Nick Fisk
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] RBD with SSD journals and SAS OSDs
>
> On Mon, Oct 17, 2016 at 6
We have 2 Ceph (9.2.1) clusters, where one is sending snapshots of
pools to the other one for backup purposes.
Snapshots are fine; however, the Ceph pool gets blown up by sizes not
matching the snapshots.
Here's the size of a snapshot and the resulting cluster usage
afterwards. The snapshot is
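(A hedged way to compare the numbers being described: per-pool usage on the receiving cluster versus the snapshot sizes being sent.)
# per-pool and raw usage, including replication overhead
ceph df detail
# per-pool object and byte counts as seen by RADOS
rados df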
Hi,
Interesting reading!
Any chance you could share some of the lessons (if any) you learned?
I can, for example, imagine your situation would have been much better
with a replication factor of three instead of two?
MJ
On 10/20/2016 12:09 AM, Kostis Fardelas wrote:
Hello cephers,
this
On Mon, Oct 17, 2016 at 6:16 PM, Nick Fisk wrote:
> Did you also set /check the c-states, this can have a large impact as well?
Hi Nick. I did try intel_idle.max_cstate=0, and I got quite a
significant improvement, as attached below. Thanks for this advice!
This is still with DIRECT=1, SYNC=1,
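(For completeness, a hedged sketch of making the c-state change permanent on an Ubuntu/Debian-style system; the exact GRUB variable and regeneration command differ per distribution.)
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="intel_idle.max_cstate=0"
# then regenerate the bootloader config and reboot
sudo update-grub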
Hi,
we have here globally:
osd_client_op_priority = 63
osd_disk_thread_ioprio_class = idle
osd_disk_thread_ioprio_priority = 7
osd_max_scrubs = 1
to influence the scrubbing performance and
osd_scrub_begin_hour = 1
osd_scrub_end_hour = 7
to influence the scrubbing time frame
Now, as it seems,
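(A hedged sketch of how the options listed above typically look in ceph.conf; whether they live under [global] or [osd] is a deployment choice. Note that the ioprio settings only take effect when the OSD disks use the CFQ scheduler.)
[osd]
osd_client_op_priority = 63
osd_disk_thread_ioprio_class = idle
osd_disk_thread_ioprio_priority = 7
osd_max_scrubs = 1
osd_scrub_begin_hour = 1
osd_scrub_end_hour = 7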
Hello,
On Thu, 20 Oct 2016 07:56:55 + Patrik Martinsson wrote:
> Hi Christian,
>
> Thanks for your very detailed and thorough explanation, very much
> appreciated.
>
You're welcome.
> We have definitely thought of a design where we have dedicated nvme-
> pools for 'high-performance' as
Hi Christian,
Thanks for your very detailed and thorough explanation, very much
appreciated.
We have definitely thought of a design where we have dedicated nvme-
pools for 'high-performance' as you say.
At the same time I *thought* that having the journal offloaded to
another device *always*
Kostis,
Excellent article, mate. This is the kind of war story that can really help
people out. Learning through (others') adversity.
Kris
> On 20 Oct 2016, at 00:09, Kostis Fardelas wrote:
>
> Hello cephers,
> this is the blog post on our Ceph cluster's outage we experienced some
> weeks ago