Hi Brian,
Will check on that also.
On Mon, Apr 3, 2017 at 4:53 PM, Brian wrote:
> Hi Vlad
>
> Is there anything in syslog on any of the hosts when this happens?
>
> Had a similar issue with a single node recently and it was caused by a
> firmware issue on a single ssd. That would cause the
It's exclusive in that only a single client can write to an image at a
time, but it's not exclusive in the sense of preventing other clients
from cooperatively requesting the exclusive lock when they have an
outstanding write request. This cooperative lock transition was always
a stop-gap design to handle
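A hedged way to watch those transitions from the CLI (the pool/image name below is just a placeholder):

$ rbd info rbd/test-img | grep features   # confirm exclusive-lock is enabled on the image
$ rbd lock list rbd/test-img              # shows the current lock and its owner
$ rbd status rbd/test-img                 # lists watchers; on Jewel and later the lock owner shows up here, I believe

If you write to the image from two clients in turn, the owner reported above should move back and forth as the lock is cooperatively handed over.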
Here is what I tried (several times).
Nothing works.
The best I got was following the Ceph guide and adding
sudo yum install centos-release-ceph-jewel
When I do that, it finishes but never works; it acts like no one is talking.
(selinux off, firewalld off)
admin node $ ceph health
2017-04-06 18:45:47
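From what I've read, output stopping like that usually means the client never reaches (or can't authenticate to) a monitor, so these are the checks I plan to run next (default paths and mon port assumed):

$ ls -l /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring   # conf and admin keyring present on the admin node?
$ sudo netstat -tlnp | grep 6789                                  # run on each mon host: is the mon actually listening?
$ ceph --connect-timeout 10 -s                                    # fail fast instead of hanging forever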
What are size and min_size for pool '7'... and why?
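(You can read those off the cluster like this, substituting the pool whose id is 7:)

$ ceph osd dump | grep ^pool               # lists every pool with its size/min_size
$ ceph osd pool get <poolname> size
$ ceph osd pool get <poolname> min_size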
On Fri, Apr 7, 2017 at 4:20 AM, David Welch wrote:
> Hi,
> We had a disk on the cluster that was not responding properly and causing
> 'slow requests'. The osd on the disk was stopped and the osd was marked down
> and then out. Rebalancing succe
> On 7 April 2017 at 01:04, Ben Hines wrote:
>
>
> Personally, before extreme measures like marking it lost, I would try bringing
> up the osd, so it's up and out -- I believe the data will still be found
> and rebalanced away from it by Ceph.
Indeed, do not mark it as lost yet. Start the OSD (1
Personally, before extreme measures like marking it lost, I would try bringing
up the osd, so it's up and out -- I believe the data will still be found
and rebalanced away from it by Ceph.
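Something along these lines, as a rough sketch (N = the failed OSD's id; assumes the systemd units shipped with Jewel):

$ ceph osd out N                      # keep it out so no data is mapped back onto it
$ sudo systemctl start ceph-osd@N     # but start the daemon so its existing copies are readable
$ ceph -s                             # watch the stale PGs peer and drain away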
-Ben
On Thu, Apr 6, 2017 at 11:20 AM, David Welch wrote:
> Hi,
> We had a disk on the cluster that was not r
Hello,
I am building my first cluster on Ubuntu 16.04 with jewel 10.2.6, to
host rbd images for qemu (also on Ubuntu 16.04). I've been lurking on
this list for some time, thanks to all you regular posters for so many
valuable insights!
Tried to test the exclusive-lock rbd feature. It does not see
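For anyone reproducing this, checking and enabling the feature looks roughly like the following (the image name is only an example):

$ rbd info rbd/test-img | grep features
$ rbd feature enable rbd/test-img exclusive-lock
$ rbd lock list rbd/test-img          # expect to see the lock held by whichever client is writing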
On 04/06/2017 01:54 PM, Adam Carheden wrote:
60-80MB/s for what sort of setup? Is that 1GbE rather than 10GbE?
60-80MB/s per disk, assuming fairly standard 7200RPM disks before any
replication takes place and assuming journals are on SSDs with fast
O_DSYNC write performance. Any network lim
60-80MB/s for what sort of setup? Is that 1GbE rather than 10GbE?
I consistently get 80-90MB/s bandwidth as measured by `rados bench -p
rbd 10 write` run from a ceph node on a cluster with:
* 3 nodes
* 4 OSD/node, 600GB 15kRPM SAS disks
* 1GB disk controller write cache shared by all disks i
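For anyone repeating the measurement, the full bench sequence looks roughly like this (pool 'rbd' assumed; --no-cleanup keeps the objects around for a follow-up read test):

$ rados bench -p rbd 10 write --no-cleanup
$ rados bench -p rbd 10 seq          # sequential read of the objects just written
$ rados -p rbd cleanup               # remove the benchmark objects afterwards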
With filestore on XFS using SSD journals that have good O_DSYNC write
performance, we typically see between 60-80MB/s per disk before
replication for large object writes. This is assuming there are no
other bottlenecks or things going on though (pg splitting, recovery,
network issues, etc). P
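A quick way to sanity-check whether a journal device has good O_DSYNC write performance is a small synchronous-write test; note the dd form is destructive to whatever is on the target, so /dev/sdX below must be an unused device (it's only a placeholder):

$ sudo dd if=/dev/zero of=/dev/sdX bs=4k count=10000 oflag=direct,dsync
# or non-destructively against a file on the SSD with fio:
$ fio --name=dsync-test --filename=/mnt/ssd/testfile --size=1G --rw=write \
      --bs=4k --direct=1 --sync=1 --runtime=30 --time_based

A journal-worthy SSD sustains on the order of thousands of these 4k synchronous writes per second; consumer drives often manage only a few hundred.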
Hi,
We had a disk on the cluster that was not responding properly and
causing 'slow requests'. The osd on the disk was stopped and then marked
down and out. Rebalancing succeeded but (some?) pgs from
that osd are now stuck in stale+active+clean state, which is not being
resolved (s
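A hedged way to see exactly which PGs are stuck and where they last lived (the PG id below is only an example):

$ ceph health detail
$ ceph pg dump_stuck stale
$ ceph pg 7.1a query        # shows the up/acting sets and which OSD last held the data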
Also make sure your PGs per pool and across the entire cluster are correct...
you want 50-100 PGs per OSD in total, otherwise performance can be
impacted. Also, if the cluster is new, it might take a little while to
rebalance and become 100% available; at that point speed can be affected too.
Those are a
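For the record, the usual back-of-the-envelope math and the knobs involved (the numbers are just an example for 12 OSDs with 3x replication):

# target total PGs ~= (OSDs * 100) / replica count, rounded up to a power of two
# 12 * 100 / 3 = 400  ->  512 PGs spread across all pools on the cluster
$ ceph osd pool get rbd pg_num
$ ceph osd pool set rbd pg_num 512
$ ceph osd pool set rbd pgp_num 512    # pgp_num should be raised to match pg_num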
Hello,
We have an unusual scrub failure on one of our PGs. Ordinarily we can trigger a
repair using ceph pg repair; however, this mechanism is failing to initiate a
repair operation.
On looking through the logs, we have discovered the original cause of the scrub
error, a single file whi
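For reference, the sequence we would normally expect to work (the PG id is a placeholder; rados list-inconsistent-obj is available from Jewel onwards, I believe, and shows which copy the scrub flagged):

$ ceph health detail                              # names the inconsistent PG(s)
$ rados list-inconsistent-obj 7.2b --format=json-pretty
$ ceph pg deep-scrub 7.2b                         # re-run the scrub
$ ceph pg repair 7.2b                             # normally schedules the repair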
I've reduced the OSDs to 12 and moved the journals to SSD drives, and writes
now have a "boost" to ~33-35MB/s. Is that the maximum without full SSD
pools?
Best,
Stan
2017-04-06 9:34 GMT+02:00 Stanislav Kopp:
> Hello,
>
> I'm evaluating a ceph cluster to see if we can use it for our
> virtualization solution
On 03/25/17 23:01, Nick Fisk wrote:
>
>> I think I owe you another graph later when I put all my VMs on there
>> (probably finally fixed my rbd snapshot hanging VM issue ...worked around it
>> by disabling exclusive-lock,object-map,fast-diff). The bandwidth hungry ones
>> (which hung the most often
On Thu, 6 Apr 2017 14:27:01 +0100, Nick Fisk wrote:
...
> > I'm not too sure what you're referring to WRT the spiral of death, but we did
> > patch some LIO issues encountered when a command was aborted while
> > outstanding at the LIO backstore layer.
> > These specific fixes are carried in the mai
Hi Dan,
did you mean "we have not yet..."?
Yes! That's what I meant.
Chrony does a much better job than NTP, at least here :-)
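For anyone comparing the two, what matters to the mons is the clock skew warning (HEALTH_WARN "clock skew detected", triggered once drift exceeds the 0.05s default of mon_clock_drift_allowed). Quick checks, assuming chronyd is the active service:

$ chronyc tracking       # current offset and sync source
$ chronyc sources -v     # per-server reachability and state
$ ceph health            # should no longer report a clock skew warning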
MJ
We were in beta until early Feb., so we are relatively young. If there are
issues/bugs, we'd certainly be interested to hear about them through our forum.
Note that with us you can always use the CLI and bypass the UI; it will be
straight Ceph/LIO commands if you wish.
From: Brady Deetz
Sent: Thursday, A
> -----Original Message-----
> From: David Disseldorp [mailto:dd...@suse.de]
> Sent: 06 April 2017 14:06
> To: Nick Fisk
> Cc: 'Maged Mokhtar' ; 'Brady Deetz'
> ; 'ceph-users'
> Subject: Re: [ceph-users] rbd iscsi gateway question
>
I appreciate everybody's responses here. I remember the announcement of
PetaSAN a while back on here and some concerns about it.
Is anybody using it in production yet?
On Apr 5, 2017 9:58 PM, "Brady Deetz" wrote:
> I apologize if this is a duplicate of something recent, but I'm not
> finding mu
Hi,
On Thu, 6 Apr 2017 13:31:00 +0100, Nick Fisk wrote:
> > I believe there
> > was a request to include it in the mainstream kernel but it did not happen,
> > probably waiting for the TCMU solution, which will be a better/cleaner design.
Indeed, we're proceeding with TCMU as a future upstream acceptable
im
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Maged Mokhtar
> Sent: 06 April 2017 12:21
> To: Brady Deetz ; ceph-users
> Subject: Re: [ceph-users] rbd iscsi gateway question
>
> The io hang (it is actually a pause not hang) is done by Ce
The io hang (it is actually a pause, not a hang) is done by Ceph only in case
of a simultaneous failure of 2 hosts or 2 osds on separate hosts. A single
host/osd being out will not cause this. In the PetaSAN project (www.petasan.org)
we use LIO/krbd. We have done a lot of tests on VMware; in case of io
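(The pause boils down to the pool dropping below min_size: with the default size=3 / min_size=2, losing two copies of a PG blocks IO to that PG until a copy is recovered. The relevant settings, as a hedged sketch with an example pool name:)

$ ceph osd pool get rbd size       # number of replicas kept
$ ceph osd pool get rbd min_size   # replicas required before IO is allowed
$ ceph osd pool set rbd min_size 1 # keeps IO flowing on one surviving copy, at the cost of a single-copy window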
Hi John,
Have you managed to reproduce the test case on your side? Any hints on
how to proceed, or is there anything I could help with? I've been trying to
understand the protocol between the MDS and the fuse client, but if you
can point me to any docs on the rationale of what the implementation i
> On 6 Apr 2017, at 08:42, Nick Fisk wrote:
>
> I assume Brady is referring to the death spiral LIO gets into with some
> initiators, including vmware, if an IO takes longer than about 10s.
We have occasionally seen this issue with vmware+LIO, almost always when
upgrading OSD nodes. Didn’t re
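A common mitigation for planned OSD node maintenance is setting noout first, so the down OSDs are not marked out and recovery traffic does not pile onto the stall:

$ ceph osd set noout       # before taking the node down
# ...upgrade / reboot the node...
$ ceph osd unset noout     # once its OSDs are back up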
On 06/04/2017 09:42, Nick Fisk wrote:
>
> I assume Brady is referring to the death spiral LIO gets into with
> some initiators, including vmware, if an IO takes longer than about
> 10s. I haven’t heard of anything, and can’t see any changes, so I
> would assume this issue still remains.
>
>
>
> I
In my case I am using SCST, so that is what my experience is based on. For our
VMware we are using NFS, but for Hyper-V and Solaris we are using iSCSI.
There is actually some work being done on a userland SCST, which could be
interesting for making an scst_librbd integration that bypasses the need f
I assume Brady is referring to the death spiral LIO gets into with some
initiators, including vmware, if an IO takes longer than about 10s. I haven’t
heard of anything, and can’t see any changes, so I would assume this issue
still remains.
I would look at either SCST or NFS for now.
From
On 04/06/2017 09:34 AM, Stanislav Kopp wrote:
Hello,
I'm evaluating a ceph cluster to see if we can use it for our
virtualization solution (proxmox). I'm using 3 nodes running Ubuntu
16.04 with stock ceph (10.2.6); every OSD uses a separate 8 TB spinning
drive (XFS), MONITORs are installed on the
Hello,
I'm evaluating a ceph cluster to see if we can use it for our
virtualization solution (proxmox). I'm using 3 nodes running Ubuntu
16.04 with stock ceph (10.2.6); every OSD uses a separate 8 TB spinning
drive (XFS), MONITORs are installed on the same nodes, all nodes are
connected via 10G swit