thus limiting
things to whatever the overall latency (network plus the 3x replicated write) is.
With random writes you will engage more or less all OSDs that hold your
fio file, thus spreading the load out.
This becomes more and more visible as the number of OSDs and nodes
increases.
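For anyone wanting to reproduce this, a minimal fio sketch (path, size and
runtime are placeholders, adjust to your setup):
---
# sequential large writes: mostly hitting one object/PG at a time
fio --name=seqwrite --filename=/mnt/rbd/fio.img --size=8G --rw=write \
    --bs=4M --ioengine=libaio --direct=1 --iodepth=16 --runtime=60 --time_based
# random 4k writes: spread across all PGs/OSDs backing the file
fio --name=randwrite --filename=/mnt/rbd/fio.img --size=8G --rw=randwrite \
    --bs=4k --ioengine=libaio --direct=1 --iodepth=32 --runtime=60 --time_based
---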
Regards,
Christian
On Fri, 1
Hello,
On Thu, 26 Dec 2019 18:11:29 +0100 Ml Ml wrote:
> Hello Christian,
>
> thanks for your reply. How should i benchmark my OSDs?
>
Benchmarking individual components can be helpful if you suspect
something, but you need to get a grip on what your systems are doing,
re-read
al client, your backup server.
Make sure your network does what you want and monitor the Ceph nodes with
e.g. atop during the test runs to see where the obvious bottlenecks are.
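A rough sketch of what I mean (pool name is just an example); run the
benchmark from the client while watching the OSD nodes:
---
# on the client / backup server
rados bench -p testpool 60 write --no-cleanup
rados bench -p testpool 60 seq
rados bench -p testpool 60 rand
# on every OSD node, in parallel
atop 2
iostat -x 2
---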
Christian
> I mount the stuff with rbd from the backup server. It seems that i get
> good write, but slow read speed. More
>
> >> There's a bug in the current stable Nautilus release that causes a loop
> and/or crash in get_obj_data::flush (you should be able to see it gobbling
> up CPU in perf top). This is the related issue:
> https://tracker.ceph.com/issues/39660 -- it should be fixed as soon as
> 14.2.5 is released
Hi,
I used https://github.com/dvassallo/s3-benchmark to measure some
performance values for the rgws and got some unexpected results.
Everything above 64K has excellent performance, but below that it drops to
a fraction of the speed and responsiveness, resulting in even 256K objects
being faster than anything below 64K.
Does anyone observe similar effects while running this benchmark? Is it the
benchmark's fault, or are there some options to tweak performance for small
object sizes?
s3-benchmark results below.
Best,
Christian
$ ./s3-benchmark -region us-east-1 -threads-min=8 -threads-max=8
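One way to cross-check small-object latency independently of the tool would
be timing single GETs with curl (endpoint, bucket and object names below are
placeholders, and the object needs to be anonymously readable or pre-signed):
---
for i in $(seq 1 20); do
  curl -s -o /dev/null -w '%{time_total}\n' \
    http://rgw.example.com:8080/testbucket/obj-16k
done
---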
labor intensive and a nuisance for real users) as
well as harsher ingress and egress (aka spamfiltering) controls you will
find that all the domains spamvertized are now in the Spamhaus DBL.
"host abbssm.edu.in.dbl.spamhaus.org"
Pro tip for spammers:
Don't get my attention, ever.
Ch
Thank you Robin.
Looking at the video it doesn't seem like a fix is anywhere near ready.
Am I correct in concluding that Ceph is not the right tool for my use-case?
Cheers,
Christian
On Oct 3 2019, at 6:07 am, Robin H. Johnson wrote:
> On Wed, Oct 02, 2019 at 01:48:40PM +0200, C
Hi Martin,
Even before adding cold storage on HDD, I had the cluster with SSD only. That
also could not keep up with deleting the files.
I am nowhere near I/O exhaustion on the SSDs or even the HDDs.
Cheers,
Christian
On Oct 2 2019, at 1:23 pm, Martin Verges wrote:
> Hello Christian,
>
incoming files to
the cluster.
I'm running 5 rgw servers, but that doesn't really change anything from
when I was running less. I've tried adjusting rgw lc max objs, but again no
change in performance.
Any suggestions on how I can tune the lifecycle process?
Ch
Hello,
On Sun, 4 Aug 2019 06:34:46 -0500 Mark Nelson wrote:
> On 8/4/19 6:09 AM, Paul Emmerich wrote:
>
> > On Sun, Aug 4, 2019 at 3:47 AM Christian Balzer wrote:
> >
> >> 2. Bluestore caching still broken
> >> When writing data with the fios below, it
Reads from a hot cache with direct=0
read: IOPS=199, BW=797MiB/s (835MB/s)(32.0GiB/41130msec)
with direct=1
read: IOPS=702, BW=2810MiB/s (2946MB/s)(32.0GiB/11662msec)
Which is about as fast as it gets with this setup.
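For reference, a job sketch consistent with the numbers above (4M blocks,
32G file); the two runs differ only in the direct flag, paths are
placeholders and not necessarily the exact job files used:
---
# buffered reads, going through the page cache / BlueStore cache path
fio --name=bufread --filename=/mnt/rbd/fio.img --size=32G --rw=read \
    --bs=4M --ioengine=libaio --direct=0 --iodepth=16
# O_DIRECT reads, bypassing the page cache
fio --name=directread --filename=/mnt/rbd/fio.img --size=32G --rw=read \
    --bs=4M --ioengine=libaio --direct=1 --iodepth=16
---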
Comments?
Christian
--
Christian Balzer        Network/Systems Engineer
Regards,
Christian
PS: Sorry for the resend, I used the wrong sending address.
On Thu, 25 Jul 2019 13:49:22 +0900 Sangwhan Moon wrote:
> osd: 39 osds: 39 up, 38 in
You might want to find that 'out' OSD.
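Something like this will show which one it is (sketch):
---
ceph osd tree            # the 'out' OSD is listed with REWEIGHT 0
ceph osd dump | grep 'osd\.' | grep -w out
---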
--
Christian Balzer        Network/Systems Engineer
ch...@gol.com Rakuten Mobile Inc.
ating-bluestores-block-db/
>
> 3.- Follow the documentation
>
> https://swamireddy.wordpress.com/2016/02/19/ceph-how-to-add-the-ssd-journal/
>
> Thanks for the help
>
> On Sun, 7 Jul 2019 at 14:39, Christian Wuerdig (<
> christian.wuer...@gmail.com>) wrote:
>
One thing to keep in mind is that the blockdb/wal becomes a Single Point Of
Failure for all OSDs using it. So if that SSD dies essentially you have to
consider all OSDs using it as lost. I think most go with something like 4-8
OSDs per blockdb/wal drive but it really depends how risk-averse you are
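For reference, the layout being discussed is created roughly like this
(device names are placeholders; one DB partition or LV per OSD on the
shared SSD):
---
ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1
ceph-volume lvm create --bluestore --data /dev/sdc --block.db /dev/nvme0n1p2
---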
db/002923.sst
Jun 30 01:32:29 tecoceph ceph-osd[8661]: -324> 2019-06-30 01:32:29.831
7fa9bd453d80 -1 *** Caught signal (Aborted) **
Is there any way to recover any of these OSDs?
Karlsruhe Institute of Technology (KIT)
Pervasive Computing Systems – TECO
Prof. Dr. Michael Beigl
IT
Christian Wahl
Vincenz-Prießnitz-Str. 1
Building 07.07., 2nd floor
76131 Karlsruhe, Germany
on is triggered. The additional improvement is Snappy compression.
> We rebuild ceph with support for it. I can create PR with it, if you want :)
>
>
> Best Regards,
>
> Rafał Wądołowski
> Cloud & Security Engineer
>
> On 25.06.2019 22:16, Christian Wuerdig wrote:
>
>
The sizes are determined by rocksdb settings - some details can be found
here: https://tracker.ceph.com/issues/24361
One thing to note, in this thread
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030775.html
it's noted that rocksdb could use up to 100% extra space during compaction.
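As a rough back-of-the-envelope with the default rocksdb options (base level
~256MB, 10x multiplier per level - check your own bluestore_rocksdb_options
before relying on this): the levels end up around 256MB / 2.5GB / 25GB /
250GB, which is where the often-quoted ~3GB / ~30GB / ~300GB useful DB
partition sizes come from, plus headroom for compaction on top.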
The simple answer is because k+1 is the default min_size for EC pools.
min_size means that the pool will still accept writes if that many failure
domains are still available. If you set min_size to k then you have entered
the dangerous territory that if you lose another failure domain (OSD or
host
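For illustration, with a k=4, m=2 profile (names are examples):
---
ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
ceph osd pool create ecpool 128 128 erasure ec42
ceph osd pool get ecpool min_size    # -> 5, i.e. k+1
---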
Hello,
It's now May and nothing has changed here or in the tracker for the
related Bionic issue.
At this point in time it feels like Redhat/DIY or bust, neither are very
enticing prospects.
Definitely not going to deploy a Stretch and Luminous cluster next in July.
Christian
On Thu, 2
On Sun, 28 Apr 2019 at 21:45, Igor Podlesny wrote:
> On Sun, 28 Apr 2019 at 16:14, Paul Emmerich
> wrote:
> > Use k+m for PG calculation, that value also shows up as "erasure size"
> > in ceph osd pool ls detail
>
> So does it mean that for PG calculation those 2 pools are equivalent:
>
> 1) EC(
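(A worked example of the k+m rule quoted above, numbers purely illustrative:
with k=4, m=2 every PG occupies 6 OSDs, just like a size=6 replicated pool,
so with 60 OSDs and a ~100 PG/OSD target you'd aim for roughly
60 * 100 / 6 = 1000, rounded to the nearest power of two, i.e. 1024 PGs.)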
iting to the same OSDs 1000
times, again no gain from distribution here.
This should get you hopefully on the right track.
Christian
On Sun, 21 Apr 2019 13:55:37 +0700 Muhammad Fakhri Abdillah wrote:
> Hey everyone,
> Currently running a 4 node Proxmox cluster with external Ceph cluster (Ce
On Wed, 17 Apr 2019 16:08:34 +0200 Lars Täuber wrote:
> Wed, 17 Apr 2019 20:01:28 +0900
> Christian Balzer ==> Ceph Users :
> > On Wed, 17 Apr 2019 11:22:08 +0200 Lars Täuber wrote:
> >
> > > Wed, 17 Apr 2019 10:47:32 +0200
> > > Paul Emmerich ==
are you probably want to reduce
> > recovery speed anyways if you would run into that limit
> >
> > Paul
> >
>
> Lars
25GBit network was because a remark of
> someone, that the latency in this ethernet is way below that of 10GBit. I
> never double checked this.
>
Correct, 25Gb/s is a split of 100Gb/s (one of its four lanes), inheriting the
latency advantages from it.
So if you do a lot of small IOPS, this will help.
ou're using
object store), how busy those disks and CPUs are, etc.
That kind of information will be invaluable for others here and likely the
developers as well.
Regards,
Christian
> Kind regards,
>
> Charles Alva
> Sent from Gmail Mobile
--
Christian Balzer        Network/Systems Engineer
Hello,
On Wed, 10 Apr 2019 20:09:58 +0200 Paul Emmerich wrote:
> On Wed, Apr 10, 2019 at 11:12 AM Christian Balzer wrote:
> >
> >
> > Hello,
> >
> > Another thing that crossed my mind aside from failure probabilities caused
> > by actual HDDs dying i
week" situation like experienced
with several people here, you're even more like to wind up in trouble very
fast.
This is of course all something people do (or should know), I'm more
wondering how to model it to correctly asses risks.
Christian
On Wed, 3 Apr 2019 10:28:09 +0900 Ch
rruption,
that's not really an option for most people either.
Looks like Ceph is out of the race for multi-PB use case here, unless
multi-site and dynamic resharding are less than 6 months away.
Regards,
Christian
> The 'stale-instances rm' command is not safe to run in multisi
On Tue, 2 Apr 2019 19:04:28 +0900 Hector Martin wrote:
> On 02/04/2019 18.27, Christian Balzer wrote:
> > I did a quick peek at my test cluster (20 OSDs, 5 hosts) and a replica 2
> > pool with 1024 PGs.
>
> (20 choose 2) is 190, so you're never going to have more tha
Hello Hector,
Firstly I'm so happy somebody actually replied.
On Tue, 2 Apr 2019 16:43:10 +0900 Hector Martin wrote:
> On 31/03/2019 17.56, Christian Balzer wrote:
> > Am I correct that unlike with replication there isn't a maximum size
> > of the critical path
Hello,
The answer is yes, as a quick search would have confirmed, for example:
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/administration_guide/osd-bluestore
It's also been discussed here for years. ^.^
Christian
On Mon, 1 Apr 2019 13:10:04 +0700 Igor Pod
ully:
> * Some may route PCI through a multi-mode SAS/SATA HBA
> * Watch for PCI bridges or multiplexing
> * Pinning, minimize data over QPI links
> * Faster vs more cores can squeeze out more performance
>
> AMD Epyc single-socket systems may be very interesting for NVMe OSD nod
uce things down to
the same risk as a 3x replica pool.
Feedback welcome.
Christian
--
Christian Balzer        Network/Systems Engineer
ch...@gol.com Rakuten Communications
't find this thread, most significant post here but read
it all:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-March/033799.html
In short, a 30GB DB (and thus WAL) partition should do the trick for many
use cases and will still be better than nothing.
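For reference, carving such a partition out of an SSD/NVMe with LVM would
look roughly like this (VG/LV names are placeholders):
---
lvcreate -L 30G -n db-sdb nvme_vg
ceph-volume lvm create --bluestore --data /dev/sdb --block.db nvme_vg/db-sdb
---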
Christian
> I had a bunch of no
had timed out after 15
---
Regards,
Christian
[1]
These kinds of reset, the logging happens after the fact, it takes about
40 seconds actually:
---
[54954736.886707] ata5.00: exception Emask 0x0 SAct 0xc0 SErr 0x0 action 0x6
frozen
[54954736.887424] ata5.00: failed command: WRITE FPDMA QUEUED
[549547
't
> clocked super high it's possible that you might see a benefit to 2x
> OSDs/device.
>
With EPYC CPUs and their rather studly interconnect, NUMA feels less of an
issue than in previous generations.
Of course pinning would still be beneficial.
That said, avoiding it altogeth
99?key=sysad_task%2Fsysad-task%3A1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299&rgwx-zonegroup=de6af748-1a2f-44a1-9d44-30799cf1313e
HTTP/1.1" 404 0 - -
From: Matthew H
Date: Tuesday, March 5, 2019 at 10:03 AM
To: Christian Rice , ceph-users
Subject: Re: radosgw sync falling behind regu
"1e27bf9c-3a2f-4845-85b6-33a24bbe1c04",
"zones": [
{
"id": "107d29a0-b732-4bf1-a26e-1f64f820e839",
"name": "dc11-prod",
"endpoints": [
"http://dc11-ceph-rgw1:8080&quo
full sync: 0/64 shards
failed to fetch local sync status: (5) Input/output error
^C
Any advice? All three clusters on 12.2.11, Debian stretch.
From: Christian Rice
Date: Thursday, February 28, 2019 at 9:06 AM
To: Matthew H , ceph-users
Subject: Re: radosgw sync fa
Yeah my bad on the typo, not running 12.8.8 ☺ It’s 12.2.8. We can upgrade and
will attempt to do so asap. Thanks for that, I need to read my release notes
more carefully, I guess!
From: Matthew H
Date: Wednesday, February 27, 2019 at 8:33 PM
To: Christian Rice , ceph-users
Subject: Re
Debian 9; ceph 12.8.8-bpo90+1; no rbd or cephfs, just radosgw; three clusters
in one zonegroup.
Often we find either metadata or data sync behind, and it doesn’t look to ever
recover until…we restart the endpoint radosgw target service.
eg at 15:45:40:
dc11-ceph-rgw1:/var/log/ceph# radosgw-adm
From: David Turner
Sent: Tuesday, 19 February 2019 19:32
To: Hennen, Christian
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] CephFS: client hangs
You're attempting to use mismatching client name and keyring. You want to use
matching name and keyring. For your example
127.0.0.1:6789
ceph-fuse --keyring /etc/ceph/ceph.client.admin.keyring --name client.cephfs -m
192.168.1.17:6789 /mnt/cephfs
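That is, either of these combinations would be consistent (sketch, assuming a
keyring for client.cephfs actually exists at the usual path):
---
ceph-fuse --keyring /etc/ceph/ceph.client.admin.keyring --name client.admin \
    -m 192.168.1.17:6789 /mnt/cephfs
ceph-fuse --keyring /etc/ceph/ceph.client.cephfs.keyring --name client.cephfs \
    -m 192.168.1.17:6789 /mnt/cephfs
---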
-Original Message-
From: Yan, Zheng
Sent: Tuesday, 19 February 2019 11:31
To: Hennen, Christian
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users]
nippets/77
> MDS log: https://gitlab.uni-trier.de/snippets/79?expanded=true&viewer=simple)
Kind regards
Christian Hennen
Project Manager Infrastructural Services ZIMK University of Trier
Germany
From: Ashley Merrick
Sent: Monday, 18 February 2019 16:53
To: Hennen, Christian
C
ance ever you could also permanently set noout
and nodown and live with the consequences and warning state.
But of course everybody will (rightly) tell you that you need enough
capacity to at the very least deal with a single OSD loss.
Christian
--
Christian Balzer        Network/Systems Engineer
k.
'ceph osd blacklist ls' shows 0 entries.
Kind regards
Christian Hennen
Project Manager Infrastructural Services ZIMK University of Trier
Germany
+0,ss=0+0 | dnwaiter=0 child=1 frozen=0 subtree=1
replicated=0 dirty=0 waiter=0 authpin=0 tempexporting=0 0x564343eed100], fw
to mds.1
Updates from 12.2.8 to 12.2.11 I ran last week didn't help.
Anybody got an idea or a hint where I could look into next? Any help would
be greatly appre
isks / replication => 95K * 6 / 3 => 190K or 6x off?
>
> No measurable service time in iostat when running tests, thus I have
> come to the conclusion that it has to be either client side, the
> network path, or the OSD-daemon that delivers the increasing latency /
> d
also avoid the data movement back and forth.
And if you see that recovering the node will take a long time, just
manually set things out for the time being.
Christian
On Sun, 27 Jan 2019 00:02:54 +0100 Götz Reinicke wrote:
> Dear Chris,
>
> Thanks for your feedback. The node/OSDs in que
If you use librados directly it's up to you to ensure you can identify your
objects. Generally RADOS stores objects and not files so when you provide
your object ids you need to come up with a convention so you can correctly
identify them. If you need to provide meta data (i.e. a list of all
existi
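As a small illustration with the rados CLI (pool, prefix and attribute names
are made up; with librados the equivalent calls would be write_full/setxattr):
---
rados -p mypool put backups/2019-01-27/host1/etc.tar.gz ./etc.tar.gz
rados -p mypool setxattr backups/2019-01-27/host1/etc.tar.gz orig_path /etc
rados -p mypool ls | grep '^backups/2019-01-27/'
---
Note that listing enumerates the whole pool, so at any real scale you'd keep
your own index (e.g. in an omap object or an external database) instead of
relying on 'rados ls'.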
Re: [ceph-users] any way to see enabled/disabled status of bucket sync?
Hi Christian,
The easiest way to do that is probably the 'radosgw-admin bucket sync
status' command, which will print "Sync is disabled for bucket ..." if
disabled. Otherwise, you could use 'radosgw-
Is there a command that will show me the current status of bucket sync (enabled
vs disabled)?
Referring to
https://github.com/ceph/ceph/blob/b5f33ae3722118ec07112a4fe1bb0bdedb803a60/src/rgw/rgw_admin.cc#L1626
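For example (bucket name is a placeholder):
---
radosgw-admin bucket sync status --bucket=mybucket
radosgw-admin bucket sync disable --bucket=mybucket    # 'enable' turns it back on
---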
complexity” right. I need to know in general if what I’m
trying to do is possible, and find out about the caveats/best-practices.
--Christian
What I’ve tried: latest luminous releases; realm=earth, (replicating setup:
zonegroup=us, zones=us-west-1-rep, us-west-2-rep, us-east-1-rep) +
(non
is case the question is, how soon is the new controller going
to be there?
If it's soon and/or if rebalancing would severely impact the cluster
performance, I'd set noout and then shut the node down, stopping both the
flapping and preventing data movement.
Of course if it's a lon
/or other bluestore particulars (reduced caching
space, not caching in some situations) are rearing their ugly heads.
Christian
> See the following links for details:
>
> https://www.percona.com/blog/2018/02/08/fsync-performance-storage-devices/
>
> https://www.sebastien-han.fr/b
sibly in the logs.
> The systems were never powered off or anything during the conversion
> from filestore to bluestore.
>
So anything mentioned as well as kernel changes don't apply.
I shall point to the bluestore devs then. >.>
Christian
--
Christian Balzer        Network/Systems Engineer
compare?
Anything else like controller cache/battery/BIOS settings/etc that might
have changed during the migration?
Christian
> Tyler Bishop
> EST 2007
>
>
> O: 513-299-7108 x1000
> M: 513-646-5809
> http://BeyondHosting.net
>
>
s
2323.4 | avio 0.41 ms |
---
The numbers tend to be a lot higher than what the actual interface is
capable of, clearly the SSD is reporting its internal activity.
In any case, it should give a good insight of what is going on activity
wise.
Also for posterity and curiosity, what kind of SSDs?
Hello,
On Tue, 16 Oct 2018 14:09:23 +0100 (BST) Andrei Mikhailovsky wrote:
> Hi Christian,
>
>
> - Original Message -
> > From: "Christian Balzer"
> > To: "ceph-users"
> > Cc: "Andrei Mikhailovsky"
> > Sen
recently.
If your cluster was close to the brink with filestore, just moving it to
bluestore would nicely fit what you're seeing, especially given the
high-stress, cache-bypassing bluestore deep scrubbing.
Regards,
Christian
> I have recently migrated all osds to the bluestore, which
applicable to my multi-zone
realm?
TIA,
Christian
the above
or go all SSD (if that's affordable, maybe the RAM savings will help) and
thus bypass all the bluestore performance issues at least.
Regards,
Christian
On Tue, 2 Oct 2018 19:28:13 +0200 jes...@krogh.cc wrote:
> Hi.
>
> Based on some recommendations we have setup our Ce
14 September 2018 11:31, "John Spray" wrote:
> On Thu, Sep 13, 2018 at 7:55 PM Christian Albrecht wrote:
>
>> Hi all,
>> ...
>> Let me know I have to provide more information on this.
>
> There was very little change in ceph-mgr between 12.2.7 an
python module
'dashboard'
2018-09-13 10:13:12.701486 7f7c4e11e700 1 mgr init Loading python module
'prometheus'
2018-09-13 10:13:12.772187 7f7c4e11e700 1 mgr init Loading python module
'restful'
2018-09-13 10:13:13.282123 7f7c4e11e700 1 mgr init Loading python module
&
uess 3GB could be on the safe side.
Christian
On Thu, 30 Aug 2018 16:31:37 -0700 David Turner wrote:
> Be very careful trying to utilize more RAM while your cluster is healthy.
> When you're going to need the extra RAM is when your cluster is unhealthy
> and your osds ar
torage per node, SAN and FC are anathema and NVMe
is likely not needed in your scenario, at least not for actual storage
space.
Christian
--
Christian Balzer        Network/Systems Engineer
ch...@gol.com Rakuten Communications
Hello,
On Fri, 24 Aug 2018 11:30:34 +0300 (EEST) Fyodor Ustinov wrote:
> Hi!
>
> I wait about hour.
>
Aside from verifying those timeout values in your cluster, what's your
mon_osd_down_out_subtree_limit set to?
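You can check it at runtime, e.g. via the admin socket (assuming the mon id
matches the short hostname), or set it in ceph.conf:
---
ceph daemon mon.$(hostname -s) config get mon_osd_down_out_subtree_limit
# [mon]
# mon osd down out subtree limit = host
---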
Christian
> - Original Message -
> From: &qu
d be very telling, collecting and
graphing this data might work, too.
My suspects would be deep scrubs and/or high IOPS spikes when this is
happening, starving out OSD processes (CPU wise, RAM should be fine one
supposes).
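If deep scrubs do turn out to be the culprit, the usual knobs to confine and
throttle them look like this (values are examples, not recommendations):
---
[osd]
osd scrub begin hour = 1
osd scrub end hour = 7
osd scrub sleep = 0.1
osd scrub load threshold = 0.5
---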
Christian
> Please help!!!
tions when you really don't want to have to deal
with under-resourced hardware.
On Wed, 8 Aug 2018 at 12:26, Cheyenne Forbes
wrote:
>
> Next time I will ask there, any number of core recommendation?
>
> Regards,
>
> Cheyenne O. Forbes
>
>
> On Tue, Aug 7, 2018 at
It should be added though that you're running at only 1/3 of the
recommended RAM usage for the OSD setup alone - not to mention that
you also co-host MON, MGR and MDS deamons on there. The next time you
run into an issue - in particular with OSD recovery - you may be in a
pickle again and then it m
ceph-users is a better place to ask this kind of question.
Anyway, the 1GB RAM per TB of storage recommendation still stands as far as I
know; plus you want some for the OS and some safety margin, so in your case
64GB seems sensible.
On Wed, 8 Aug 2018, 01:51 Cheyenne Forbes,
wrote:
> The case is 28TB
a new pool and copy the data over if you want to
change your current profile
Cheers
Christian
On Sat, 21 Jul 2018, 01:52 Ziggy Maes, wrote:
> Hello Caspar
>
>
>
> That makes a great deal of sense, thank you for elaborating. Am I correct
> to assume that if we were to use a k=2, m
ring the original object is about
5kB.
Thanks,
Christian
ncrease that
number to a couple of hundred million objects. Do you see any problems with
that, provided that the bucket index is sharded appropriately?
Any help is appreciated. Let me know if you need anything like logs,
configs, etc.
Thanks!
Christian
n user, which had the
> caps necessary to blacklist the dead clients and clean up the dirty
> exclusive lock on the image.
>
> On Fri, Jun 22, 2018 at 4:47 PM Gregory Farnum wrote:
>
>> On Fri, Jun 22, 2018 at 2:26 AM Christian Zunker
>> wrote:
>>
>>> Hi
on that topic...)
regards
Christian
any other suggestions would be
appreciated.
Cluster health details are available here:
https://gitlab.uni-trier.de/snippets/65
Regards
Christian Hennen
Project Manager Infrastructural Services
Zentrum für Informations-, Medien-
und Kommunikationstechnologie (ZIMK)
Universität Trier
saturating your disks with IOPS long before bandwidth becomes
an issue.
> thus, a 10GB network would be needed, right ? Maybe a dual gigabit port
> bonded together could do the job.
> A single gigabit link would be saturated by a single disk.
>
> Is my assumption correct ?
>
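(Roughly: a single 1Gb/s link tops out around 110-120MB/s of payload, while
one 7200rpm HDD can stream 150-250MB/s sequentially - so yes, sequentially a
single disk can saturate the link. But the same disk only manages maybe
100-200 random IOPS, well under 1MB/s at 4k, which is why IOPS usually become
the bottleneck long before bandwidth does.)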
The 1Gb/s links.
Lastly, more often than not segregated networks are not needed; they add
unnecessary complexity, and the resources spent on them would be better
used on just one fast and redundant network instead.
Christian
--
Christian Balzer        Network/Systems Engineer
Hello,
On Wed, 25 Apr 2018 17:20:55 -0400 Jonathan Proulx wrote:
> On Wed Apr 25 02:24:19 PDT 2018 Christian Balzer wrote:
>
> > Hello,
>
> > On Tue, 24 Apr 2018 12:52:55 -0400 Jonathan Proulx wrote:
>
> > > The performence I really care about is over rbd
rly
> high performance setup but I do expect a bit more performance out of
> it.
>
> Are my expectations wrong? If not any clues what I've don (or failed
> to do) that is wrong?
>
> Pretty sure rx/wx was much more symmetric in earlier versions (subset
> of same hardware
Hello,
On Tue, 24 Apr 2018 11:39:33 +0200 Florian Florensa wrote:
> 2018-04-24 3:24 GMT+02:00 Christian Balzer :
> > Hello,
> >
>
> Hi Christian, and thanks for your detailed answer.
>
> > On Mon, 23 Apr 2018 17:43:03 +0200 Florian Florensa wrote:
> >
as usual and 1-2GB extra per OSD.
> Regards,
>
> Florian
--
Christian Balzer        Network/Systems Engineer
e to the number of objects (files)?
If so reads will require many more disk accesses than otherwise.
This is a typical wall to run into and can be mitigated by more RAM and
sysctl tuning.
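For example, keeping dentries/inodes cached more aggressively with something
like (sketch; sensible values depend on RAM and workload):
---
vm.vfs_cache_pressure = 10
vm.min_free_kbytes = 262144
---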
Christian
> Is there anything we can do about this, short of replacing hardware? Is
> it really
The general recommendation is to target around 100 PG/OSD. Have you tried
the https://ceph.com/pgcalc/ tool?
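For a quick sanity check without the tool: total PGs is roughly
(number of OSDs x 100) / replica count, rounded to a power of two and then
split across pools by their expected share of the data - e.g. 60 OSDs with
size=3 gives 60 * 100 / 3 = 2000, so 2048 PGs across all pools.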
On Wed, 4 Apr 2018 at 21:38, Osama Hasebou wrote:
> Hi Everyone,
>
> I would like to know what kind of setup had the Ceph community been using
> for their Openstack's Ceph configuration w
Hello,
On Mon, 2 Apr 2018 08:33:35 +0200 John Hearns wrote:
> Christian, you mention single socket systems for storage servers.
> I often thought that the Xeon-D would be ideal as a building block for
> storage servers
> https://www.intel.com/content/www/us/en/products/proce
Hello,
firstly, Jack pretty much correctly correlated my issues to Mark's points,
more below.
On Sat, 31 Mar 2018 08:24:45 -0500 Mark Nelson wrote:
> On 03/29/2018 08:59 PM, Christian Balzer wrote:
>
> > Hello,
> >
> > my crappy test cluster was rendered inope
eir machines.
2. Having a per OSD cache is inefficient compared to a common cache like
pagecache, since an OSD that is busier than others would benefit from a
shared cache more.
3. A uniform OSD cache size of course will be a nightmare when having
non-uniform HW, either with RAM or number of O
s to
configure things on either host or switch, or with a good modern switch
not even buy that much in the latency department.
Christian
>
> 2018-03-26 7:41 GMT+07:00 Christian Balzer :
>
> >
> > Hello,
> >
> > in general and as reminder for others, the more informati
.
Some of these things can be mitigated by throttling Ceph down where such
knobs exist, others have gotten better over time with newer versions, but
ultimately your storage system will be the bottleneck.
Christian
> I'm removing 21g layering image and it takes ages, while the image is
of your current setup and will want to
avoid hitting them again.
Having a dedicated SSD pool for high-end VMs or a cache-tier (if it is a
fit, not likely in your case) would be a way forward if your client
demands are still growing.
Christian
>
>
> 3. We use 1 SSD for journaling 7 HDD
I think the primary area where people are concerned about latency is rbd
and 4k block size access. OTOH 2.3us latency seems to be 2 orders of
magnitude below what is realistically achievable on a real
world cluster anyway (
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-July/
will get "journaled" in the WAL/DB
akin to the journal with filestore, so depending on your use case you may
see up to a 2x amplification.
Of course any write will also cause (at least one) other write to the
RocksDB, but that's more or less on par with plain filesystem journals and
the
e twice the action, are they
a) really twice as fast or
b) is your load never going to be an issue anyway?
Christian
> I'm also using Luminous/Bluestore if it matters.
>
> Thanks in advance!
>
> *Mark Steffen*
> *"Don't believe everything you read on the Internet
nage to IT mode style
exposure of the disks and still use their HW cache.
Christian
--
Christian Balzer        Network/Systems Engineer
ch...@gol.com Rakuten Communications
Device:  rrqm/s  wrqm/s    r/s     w/s   rkB/s    wkB/s  avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sdb        0.03   83.09   7.07  303.24  746.64  5084.99     37.59     0.05   0.15    0.71    0.13   0.06   2.00
---
300 write IOPS and 5MB/s for all that time.
Christian
--
Christian Balzer        Network/Systems Engineer
Hm, so you're running OSD nodes with 2GB of RAM and 2x10TB = 20TB of
storage? Literally everything posted on this list in relation to HW
requirements and related problems will tell you that this simply isn't
going to work. The slightest hint of a problem will simply kill the OSD
nodes with OOM. Hav
Thanks! I'm still puzzled as to _what_ data is moving if the OSD was
previously "out" and didn't host any PG (according to pg dump). The
host only had one other OSD which was already "out" and had zero weight.
It looks like Ceph is moving some other data, which wasn't hosted on
the re-weighted O