I am using Intel P3700DC 400G cards in a similar configuration (two per host) -
perhaps you could look at cards of that capacity to meet your needs.
I would suggest that having such small journals means you will be constantly
blocking on journal flushes, which will impact write performance and l
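For comparison, the filestore journal size is set in ceph.conf before the OSD
is created - a rough sketch (the 10GB value is only illustrative; size it for
your expected throughput and sync interval):

[osd]
# journal size in MB; rule of thumb is 2 * expected throughput * filestore max sync interval
osd journal size = 10240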
A while back I attempted to create an RBD volume manually, intending it to be
the exact size of another LUN of around 100G. The command line instead
interpreted the number using the default size unit of MB, so I ended up with a
102400 TB volume. Deletion was painfully slow (I never used the volume, i
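For anyone wanting to avoid the same trap, a quick sketch (pool/image names are
illustrative; older releases take the size purely in MB, newer ones also accept
unit suffixes):

# size is interpreted as megabytes by default
rbd create --size 102400 glebe-sata/lun01   # 102400 MB = 100 GB
# recent releases also accept an explicit unit suffix
rbd create --size 100G glebe-sata/lun01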
Thanks - all sorted.
> -Original Message-
> From: Nick Fisk [mailto:n...@fisk.me.uk]
> Sent: Monday, 23 May 2016 6:58 PM
> To: Adrian Saul; ceph-users@lists.ceph.com
> Subject: RE: RBD removal issue
>
> See here:
>
> http://cephnotes.ksperis.com/blog/2014
Are you using image-format 2 RBD images?
We found a major performance hit using format 2 images under 10.2.0 in some
testing today. When we switched to format 1 images we literally got 10x the
random write IOPS performance (1600 IOPS up to 3 IOPS for the same test).
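A sketch of how the comparison can be run (pool/image names illustrative; the
same random write test is then pointed at each image). Note that format 1 is
deprecated in later releases, so this is a diagnostic comparison rather than a
recommendation:

rbd create --image-format 1 --size 102400 testpool/fmt1test
rbd create --image-format 2 --size 102400 testpool/fmt2test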
From: ceph-users [
Sync will always be lower – it will cause it to wait for previous writes to
complete before issuing more so it will effectively throttle writes to a queue
depth of 1.
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ken
Peng
Sent: Wednesday, 25 May 2016 6:36 PM
To: ce
Also if for political reasons you need a “vendor” solution – ask Dell about
their DSS 7000 servers – 90 8TB disks and two compute nodes in 4RU would go a
long way to making up a multi-PB Ceph solution.
Supermicro also do similar solutions, with 36, 60 and 90 disk models in 4RU.
Cisco ha
I am currently running our Ceph POC environment using dual Nexus 9372TX 10G-T
switches, each OSD host has two connections to each switch and they are formed
into a single 4 link VPC (MC-LAG), which is bonded under LACP on the host side.
What I have noticed is that the various hashing policies f
> > For two links it should be quite good - it seemed to balance across
> > that quite well, but with 4 links it seemed to really prefer 2 in my case.
> >
> Just for the record, did you also change the LACP policies on the switches?
>
> From what I gather, having fancy pants L3+4 hashing on the Li
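For reference, the host-side bond config looks something like this (a
RHEL-style ifcfg sketch, values illustrative) - the switch side also needs a
matching L3/L4 port-channel hash or the return traffic will still polarise
onto a subset of links:

# /etc/sysconfig/network-scripts/ifcfg-bond0
BONDING_OPTS="mode=802.3ad miimon=100 lacp_rate=fast xmit_hash_policy=layer3+4"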
I upgraded my Infernalis semi-production cluster to Jewel on Friday. The
upgrade went through smoothly (aside from a time-wasting restorecon of
/var/lib/ceph in the selinux package upgrade) and the services continued
running without interruption. However, this morning when I went to create
-
> From: Jason Dillaman [mailto:jdill...@redhat.com]
> Sent: Monday, 6 June 2016 11:00 AM
> To: Adrian Saul
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Jewel upgrade - rbd errors after upgrade
>
> Are you able to successfully run the following command successf
]# rados stat -p glebe-sata rbd_directory
glebe-sata/rbd_directory mtime 2016-06-06 10:18:28.00, size 0
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Adrian Saul
> Sent: Monday, 6 June 2016 11:11 AM
> To: dilla...@
/rados-classes/libcls_rbd.so: undefined symbol:
_ZN4ceph6buffer4list8iteratorC1EPS1_j
Trying to figure out why that is the case.
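Some checks that can help narrow this down (paths as on CentOS; a sketch rather
than a full procedure):

# confirm every node is running the same ceph package versions
rpm -qa | grep -i ceph | sort
# check when the class library was last updated
ls -l /usr/lib64/rados-classes/libcls_rbd.so
# OSDs only pick up new class libraries after a restart
systemctl restart ceph-osd.target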
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Adrian Saul
> Sent: Monday, 6 June 2016 11:11 AM
they are failing.
> -Original Message-
> From: Adrian Saul
> Sent: Monday, 6 June 2016 12:29 PM
> To: Adrian Saul; dilla...@redhat.com
> Cc: ceph-users@lists.ceph.com
> Subject: RE: [ceph-users] Jewel upgrade - rbd errors after upgrade
>
>
> I have traced it
> From: Jason Dillaman [mailto:jdill...@redhat.com]
> Sent: Monday, 6 June 2016 12:37 PM
> To: Adrian Saul
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Jewel upgrade - rbd errors after upgrade
>
> Odd -- sounds like you might have Jewel and Infernalis class objects and
> OSDs intermixed. I would double-check y
Centos 7 - the upgrade was done simply with "yum update -y ceph" on each node
one by one, so the package order would have been determined by yum.
From: Jason Dillaman
Sent: Monday, June 6, 2016 10:42 PM
To: Adrian Saul
Cc: ceph-users@list
Hi All,
We have a Jewel (10.2.1) cluster on Centos 7 - I am using an elrepo 4.4.1
kernel on all machines and we have an issue where some of the machines hang -
not sure if it's hardware or OS, but essentially the host including the console
is unresponsive and can only be recovered with a hardwar
I recently started a process of using rbd snapshots to set up a backup regime
for a few file systems contained in RBD images. While this generally works
well, at the time of the snapshots there is a massive increase in latency (10ms
to multiple seconds of RBD device latency) across the entire cl
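The backup flow is roughly the following (names illustrative; a sketch of the
approach rather than the exact script):

# snapshot the image, map the snapshot and mount it read-only for backup
rbd snap create glebe-sata/fs01@backup1
rbd map glebe-sata/fs01@backup1
mount -o ro,nouuid,norecovery /dev/rbd0 /backup/fs01
# ... back up the mounted snapshot, then tear it down ...
umount /backup/fs01
rbd unmap /dev/rbd0
rbd snap rm glebe-sata/fs01@backup1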
I would suggest caution with "filestore_odsync_write" - it's fine on good SSDs,
but on poor SSDs or spinning disks it will kill performance.
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Somnath Roy
Sent: Friday, 15 July 2016 3:12 AM
To: Garg, Pankaj; ceph-users@list
I have SELinux disabled and it does the restorecon on /var/lib/ceph regardless,
from the RPM post-upgrade scripts.
In my case I chose to kill the restorecon processes to save outage time – it
didn’t affect the upgrade package completion.
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.c
ll trigger batches of PGs to deep scrub over
time to push out the distribution again?
Adrian Saul | Infrastructure Projects Team Lead
IT
T 02 9009 9041 | M +61 402 075 760
30 Ross St, Glebe NSW 2037
adrian.s...@tpgtelecom.com.au |
www
Hi All,
I have been reviewing the sizing of our PGs with a view to some intermittent
performance issues. When we have scrubs running, even when only a few are, we
can sometimes get severe impacts on the performance of RBD images, enough to
start causing VMs to appear stalled or unresponsive.
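For what it's worth, these are the sort of scrub throttling knobs involved
(ceph.conf sketch; the values are illustrative rather than a recommendation):

[osd]
osd max scrubs = 1              # concurrent scrubs per OSD
osd scrub sleep = 0.1           # pause between scrub chunks to yield to client IO
osd scrub load threshold = 0.5  # skip scrubs when the host is loaded
osd scrub begin hour = 22       # restrict scrubbing to quiet hours
osd scrub end hour = 6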
Anyone able to offer any advice on this?
Cheers,
Adrian
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Adrian Saul
> Sent: Friday, 14 July 2017 6:05 PM
> To: 'ceph-users@lists.ceph.com'
> Subject: [ceph
Depends on the error case – usually you will see blocked IO messages as well if
there is a condition causing OSDs to be unresponsive.
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of ???
Sent: Friday, 4 August 2017 1:34 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-use
Hi Sam,
We use SCST for iSCSI with Ceph, and a pacemaker cluster to orchestrate the
management of active/passive presentation using ALUA through SCST device groups.
In our case we ended up writing our own pacemaker resources to support our
particular model and preferences, but I believe there
We are using Ceph via NFS for VMware – we are using SSD tiers in front of SATA
and some direct SSD pools. The datastores are just XFS file systems on RBD
managed by a pacemaker cluster for failover.
Lessons so far are that large datastores quickly run out of IOPS and compete
for performance –
> I'd be interested in details of this small versus large bit.
The smaller shares are simply to distribute the workload over more RBDs so that
the RBD device doesn't become the bottleneck. The size itself doesn't
particularly matter, it's just the idea of distributing VMs across many shares
rather t
> SSD make details : SSD 850 EVO 2.5" SATA III 4TB Memory & Storage - MZ-
> 75E4T0B/AM | Samsung
The performance difference between these and the SM or PM863 range is night and
day. I would not use these for anything where you care about performance,
particularly IOPS or latency.
Their write lat
Yes - ams5-ssd would have 2 replicas, ams6-ssd would have 1 (@size 3, -2 = 1)
Although for this ruleset the min_size should be set to at least 2, or more
practically 3 or 4.
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Sinan
Polat
Sent: Friday, 25 August 2017 3:02
If you are monitoring to ensure that it is mounted and active, a simple
check_disk on the mountpoint should work. If the mount is not present, or the
filesystem is non-responsive then this should pick it up. A second check to
perhaps test you can actually write files to the file system would no
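As a sketch (plugin path, mountpoint and thresholds are illustrative, assuming
the standard nagios-plugins check_disk):

# alert if the mountpoint is missing, nearly full, or the stat hangs
/usr/lib64/nagios/plugins/check_disk -w 10% -c 5% -p /mnt/cephfs
# simple write probe; non-zero exit if the filesystem is not writable
touch /mnt/cephfs/.probe && rm -f /mnt/cephfs/.probe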
> * Drop the odd releases, and aim for a ~9 month cadence. This splits the
> difference between the current even/odd pattern we've been doing.
>
> + eliminate the confusing odd releases with dubious value
> + waiting for the next release isn't quite as bad
> - required upgrades every 9 months
> I understand what you mean and it's indeed dangerous, but see:
> https://github.com/ceph/ceph/blob/master/systemd/ceph-osd%40.service
>
> Looking at the systemd docs it's difficult though:
> https://www.freedesktop.org/software/systemd/man/systemd.service.ht
> ml
>
> If the OSD crashes due to ano
Thanks for bringing this to attention Wido - it's of interest to us, as we are
currently looking to migrate mail platforms onto Ceph using NFS, but this seems
far more practical.
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Wido den Ho
Do you mean that after you delete and remove the crush and auth entries for the
OSD, when you go to create another OSD later it will re-use the previous OSD ID
that you have destroyed in the past?
Because I have seen that behaviour as well - but only for previously allocated
OSD IDs that have
We see the same messages and are similarly on a 4.4 KRBD version that is
affected by this.
I have seen no impact from it so far that I know about
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Jason Dillaman
> Sent: Thursday, 5 October
As an aside, SCST iSCSI will support ALUA and does PGRs through the use of
DLM. We have been using that with Solaris and Hyper-V initiators for RBD
backed storage, but still have some ongoing issues with ALUA (probably our
current config; we need to lab the later recommendations).
> -Origin
From: Samuel Soulard [mailto:samuel.soul...@gmail.com]
Sent: Thursday, 12 October 2017 11:20 AM
To: Adrian Saul
Cc: Zhu Lingshan ; dilla...@redhat.com; ceph-users
Subject: RE: [ceph-users] Ceph-ISCSI
Yes I looked at this solution, and it seems interesting. However, one point
often stick
I concur - at the moment we need to manually sum the RBD images to see how much
we have "provisioned" vs what ceph df shows. In our case we had a rapid run of
provisioning new LUNs, but it took a while before usage started to catch up
with what was provisioned as data was migrated in. Cep
What I have been doing with CephFS is to have a number of hosts export the same
CephFS mountpoints, i.e.
cephfs01:/cephfs/home
cephfs02:/cephfs/home
...
I then put the hosts all under a common DNS A record i.e "cephfs-nfs" so it
resolves to all of the hosts exporting the share.
I then use autofs o
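The autofs side is roughly the following (map and option values illustrative):

# /etc/auto.master
/cephfs  /etc/auto.cephfs
# /etc/auto.cephfs - the round-robin A record spreads clients across the NFS heads
home  -fstype=nfs,vers=3,hard  cephfs-nfs:/cephfs/home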
I found I could ignore the XFS issues and just mount it with the appropriate
options (below from my backup scripts):
#
# Mount with nouuid (conflicting XFS) and norecovery (ro snapshot)
#
if ! mount -o ro,nouuid,norecovery $SNAPDEV /backup${FS}; then
    # error handling reconstructed - the original script is truncated here
    echo "Failed to mount $SNAPDEV on /backup${FS}" >&2
    exit 1
fi
> But shouldn't freezing the fs and doing a snapshot constitute a "clean
> unmount" hence no need to recover on the next mount (of the snapshot) -
> Ilya?
It's what I thought as well, but XFS seems to want to attempt to replay the log
regardless on mount and write to the device to do so. This wa
? It really should not make the entire platform unusable
for 10 minutes.
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Adrian Saul
> Sent: Wednesday, 6 July 2016 3:41 PM
> To: 'ceph-users@lists.ceph.com'
> Subje
very small FS
metadata updates going on and that is what is killing it.
Cheers,
Adrian
> -Original Message-
> From: Nick Fisk [mailto:n...@fisk.me.uk]
> Sent: Thursday, 22 September 2016 7:06 PM
> To: Adrian Saul; ceph-users@lists.ceph.com
> Subject: RE: Snap delete per
much.
Cheers,
Adrian
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Adrian Saul
> Sent: Thursday, 22 September 2016 7:15 PM
> To: n...@fisk.me.uk; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Snap delete
limit.
Sent from my SAMSUNG Galaxy S7 on the Telstra Mobile Network
Original message
From: Nick Fisk
Date: 23/09/2016 7:26 PM (GMT+10:00)
To: Adrian Saul , ceph-users@lists.ceph.com
Subject: RE: Snap delete performance impact
Looking back through my graphs when this
Hi All,
We have a jewel cluster (10.2.1) that we built up in a POC state (2 clients
also being mons, 12 SSD OSDs on 3 hosts, 20 SATA OSDs on 3 hosts). We have
connected up our "prod" environment to it and performed a migration for all the
OSDs so it is now 114 OSDs (36 SSD, 78 NL-SAS with a
Hi,
As part of migration between hardware I have been building new OSDs and
cleaning up old ones (osd rm osd.x, osd crush rm osd.x, auth del osd.x). To
try and prevent rebalancing kicking in until all the new OSDs are created on a
host I use "ceph osd set noin", however what I have seen is
To: Gregory Farnum
> Cc: Adrian Saul; ceph-users@lists.ceph.com
> Subject: Re: [EXTERNAL] Re: [ceph-users] osd set noin ignored for old OSD
> ids
>
> From my experience noin doesn't stop new OSDs from being marked in. noin
> only works on OSDs already in the crushmap. To acco
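One way around that is to have new OSDs join with zero CRUSH weight and bring
them up afterwards in a controlled step - a sketch (the option name is real,
the values are illustrative):

# ceph.conf on the host being rebuilt
[osd]
osd crush initial weight = 0
# once all the new OSDs exist, weight them in
ceph osd crush reweight osd.120 1.81898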
Hi Ceph-users,
I just want to double check a new crush ruleset I am creating - the intent
here is that over 2 DCs, it will select one DC, and place two copies on
separate hosts in that DC. The pools created on this will use size 4 and
min-size 2.
I just want to check I have crafted this c
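For reference, a sketch of how such a rule can be expressed (bucket type and
rule names illustrative) - with size 4 it picks both DCs and two separate hosts
within each:

rule dc_mirror {
    ruleset 2
    type replicated
    min_size 2
    max_size 4
    step take default
    step choose firstn 2 type datacenter
    step chooseleaf firstn 2 type host
    step emit
}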
Thanks Wido.
I had found the show-utilization test, but had not seen show-mappings - that
confirmed it for me.
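For anyone checking a rule the same way, the sort of invocation involved (file
name, rule id and replica count illustrative):

ceph osd getcrushmap -o crushmap.bin
crushtool -i crushmap.bin --test --rule 2 --num-rep 4 --show-mappings
crushtool -i crushmap.bin --test --rule 2 --num-rep 4 --show-utilization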
thanks,
Adrian
> -Original Message-
> From: Wido den Hollander [mailto:w...@42on.com]
> Sent: Monday, 12 December 2016 7:07 PM
> To: ceph-users@lists.ceph.com;
; Adrian
> >
> >
> > > -Original Message-
> > > From: Wido den Hollander [mailto:w...@42on.com]
> > > Sent: Monday, 12 December 2016 7:07 PM
> > > To: ceph-users@lists.ceph.com; Adrian Saul
> > > Subject: Re: [ceph-users] Crush rule c
I found the other day even though I had 0 weighted OSDs, there was still weight
in the containing buckets which triggered some rebalancing.
Maybe it is something similar, there was weight added to the bucket even though
the OSD underneath was 0.
> -Original Message-
> From: ceph-users
We started our cluster with consumer (Samsung EVO) disks and the write
performance was pitiful, they had periodic spikes in latency (average of 8ms,
but much higher spikes) and just did not perform anywhere near where we were
expecting.
When replaced with SM863 based devices the difference was
Can I confirm if this bluestore compression assert issue is resolved in 12.2.8?
https://tracker.ceph.com/issues/23540
I notice that it has a backport that is listed against 12.2.8 but there is no
mention of that issue or backport listed in the release notes.
> -Original Message-
> From
We are using Ceph+RBD+NFS under pacemaker for VMware. We are doing iSCSI using
SCST but have not used it against VMware, just Solaris and Hyper-V.
It generally works and performs well enough – the biggest issues are the
clustering for iSCSI ALUA support and NFS failover, most of which we have
We run CephFS in a limited fashion in a stretched cluster of about 40km with
redundant 10G fibre between sites – link latency is in the order of 1-2ms.
Performance is reasonable for our usage but is noticeably slower than
comparable local ceph based RBD shares.
Essentially we just setup the c
Automount would probably work for you.
From: Up Safe [mailto:upands...@gmail.com]
Sent: Tuesday, 22 May 2018 12:33 AM
To: David Turner
Cc: Adrian Saul ; ceph-users
Subject: Re: [ceph-users] multi site with cephfs
I'll explain.
Right now we have 2 sites (racks) with several dozens of servers at
I would concur having spent a lot of time on ZFS on Solaris.
ZIL will reduce the fragmentation problem a lot (because it is not doing intent
logging into the filesystem itself which fragments the block allocations) and
write response will be a lot better. I would use different devices for L2AR
Possibly MySQL is doing sync writes, whereas your FIO could be doing buffered
writes.
Try enabling the sync option on fio and compare results.
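Something along these lines should be closer to what MySQL is doing (file path
and sizes illustrative - don't point it at data you care about):

# O_SYNC, queue depth 1, 16k random writes - roughly InnoDB's sync write pattern
fio --name=synctest --filename=/mnt/ceph-vol/fio.dat --size=4G \
    --rw=randwrite --bs=16k --ioengine=libaio --direct=1 --sync=1 --iodepth=1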
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Matteo Dacrema
> Sent: Wednesday, 8 March 20
From: Matteo Dacrema
Sent: Wednesday, 8 March 2017 10:36 AM
To: Adrian Saul
Cc: ceph-users
Subject: Re: [ceph-users] MySQL and ceph volumes
Thank you Adrian!
I’ve forgot this option and I can reproduce the problem.
Now, what could be the problem on ceph side with O_DSYNC writes?
Regards
Matteo
I am not sure if there is a hard and fast rule you are after, but pretty much
anything that would cause ceph transactions to be blocked (flapping OSD,
network loss, hung host) has the potential to block RBD IO which would cause
your iSCSI LUNs to become unresponsive for that period.
For the mo
for
krbd.
From: Nick Fisk [mailto:n...@fisk.me.uk]
Sent: Thursday, 6 April 2017 5:43 PM
To: Adrian Saul; 'Brady Deetz'; 'ceph-users'
Subject: RE: [ceph-users] rbd iscsi gateway question
I assume Brady is referring to the death spiral LIO gets into with some
initiators, inc
> > Early usage will be CephFS, exported via NFS and mounted on ESXi 5.5
> > and
> > 6.0 hosts(migrating from a VMWare environment), later to transition to
> > qemu/kvm/libvirt using native RBD mapping. I tested iscsi using lio
> > and saw much worse performance with the first cluster, so it seems
> Hi Alex,
>
> Have you experienced any problems with timeouts in the monitor action in
> pacemaker? Although largely stable, every now and again in our cluster the
> FS and Exportfs resources timeout in pacemaker. There's no mention of any
> slow requests or any peering..etc from the ceph logs so
t a storage system should
be about, hence why we are using it this way. It's been awesome to get stuck
into it and learn how it works and what it can do.
Adrian Saul | Infrastructure Projects Team Lead
TPG Telecom (ASX: TPM)
> Samsung EVO...
> Which exact model, I presume this is not a DC one?
>
> If you had put your journals on those, you would already be pulling your hairs
> out due to abysmal performance.
>
> Also with Evo ones, I'd be worried about endurance.
No, I am using the P3700DCs for journals. The Samsun
> >The Samsungs are the 850 2TB
> > (MZ-75E2T0BW). Chosen primarily on price.
>
> These are spec'ed at 150TBW, or an amazingly low 0.04 DWPD (over 5 years).
> Unless you have a read-only cluster, you will wind up spending MORE on
> replacing them (and/or losing data when 2 fail at the same time)
I upgraded my lab cluster to 10.1.0 specifically to test out bluestore and see
what latency difference it makes.
I was able to one by one zap and recreate my OSDs to bluestore and rebalance
the cluster (the change to having new OSDs start with low weight threw me at
first, but once I worked t
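For anyone wanting to repeat this, the per-OSD steps are roughly the following
sketch (device name illustrative; on 10.1.x bluestore also needs the
experimental features flag in ceph.conf before the OSDs will start):

# ceph.conf
enable experimental unrecoverable data corrupting features = bluestore rocksdb
# zap and re-prepare the disk as a bluestore OSD
ceph-disk zap /dev/sdb
ceph-disk prepare --bluestore /dev/sdb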
These are the monitors that Ceph clients/daemons can connect to initially in
order to reach the cluster.
Once they connect to one of the initial mons they will get a full list of all
monitors and be able to connect to any of them to pull updated maps.
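A minimal ceph.conf sketch (names and addresses illustrative):

[global]
mon initial members = mon-a, mon-b, mon-c
mon host = 10.0.0.1,10.0.0.2,10.0.0.3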
From: ceph-users [mailto:ceph-users-boun...@lis
Not sure about commands however if you look at the OSD mount point there is a
“bluefs” file.
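A couple of ways to check (OSD id and path illustrative):

# the object store type is recorded in the OSD data directory
cat /var/lib/ceph/osd/ceph-0/type
# recent releases also report it in the OSD metadata
ceph osd metadata 0 | grep osd_objectstore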
From: German Anders [mailto:gand...@despegar.com]
Sent: Thursday, 31 March 2016 11:48 PM
To: Adrian Saul
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] OSD crash after conversion to bluestore
is possible to run a command in order to see
> > that the OSD is actually using bluestore?
> >
> > Thanks in advance,
> >
> > Best,
> >
> > German
> >
> > 2016-03-31 1:24 GMT-03:00 Adrian Saul <mailto:adrian.
We are close to being given approval to deploy a 3.5PB Ceph cluster that will
be distributed over every major capital in Australia. The config will be
dual sites in each city that will be coupled as HA pairs - 12 sites in total.
The vast majority of CRUSH rules will place data either local
Hello again Christian :)
> > We are close to being given approval to deploy a 3.5PB Ceph cluster that
> > will be distributed over every major capital in Australia.The config
> > will be dual sites in each city that will be coupled as HA pairs - 12
> > sites in total. The vast majority of C
> From: Maxime Guyot [mailto:maxime.gu...@elits.com]
> Sent: Tuesday, 12 April 2016 5:49 PM
> To: Adrian Saul; Christian Balzer; 'ceph-users@lists.ceph.com'
> Subject: Re: [ceph-users] Mon placement over wide area
>
> Hi Adrian,
>
> Looking at the documentation RadosGW has mult
I could only see it being done using FCIP as the OSD processes use IP to
communicate.
I guess it would depend on why you are looking to use something like FC instead
of Ethernet or IB.
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Sc
> from the responses I've gotten, it looks like there's no viable option to use
> fibre channel as an interconnect between the nodes of the cluster.
> Would it be worth while development effort to establish a block protocol
> between the nodes so that something like fibre channel could be used to
>