Good day,
Firstly I'd like to acknowledge that I consider myself a Ceph noob.
OS: Ubuntu 16.04.3 LTS
Ceph version: 12.2.1
I'm running a small six node POC cluster with three MDS daemons. (One on
each node, node1, node2 and node3)
I've also configured three Ceph file systems: fsys1, fsys2 and fsys3.
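For reference, the filesystems were created roughly like this (pool
names and PG counts here are illustrative; multiple filesystems are
still flagged experimental in 12.2):

  # allow more than one filesystem in the cluster
  $ ceph fs flag set enable_multiple true --yes-i-really-mean-it
  # one metadata pool and one data pool per filesystem
  $ ceph osd pool create fsys1_metadata 64
  $ ceph osd pool create fsys1_data 64
  $ ceph fs new fsys1 fsys1_metadata fsys1_data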
Hello,
it's clear to me that there's a performance gain from putting the journal on
a fast device (SSD, NVMe) when using the filestore backend.
It's not so clear when it comes to bluestore - are there any resources,
performance tests, etc. out there showing how a fast WAL/DB device impacts
performance?
br
wolfgang
--
Wol
Good Day,
Today we had a problem with lots of OSDs being marked as down due to
heartbeat failures between the OSDs.
Specifically, the following is seen in the OSD logs prior to the heartbeat
no_reply errors:
monclient: _check_auth_rotating possible clock skew, rotating keys expired
way too early
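For what it's worth, a quick way to check whether the mons and OSD
nodes actually agree on time (assuming ntpd is in use; the mon command
is available on recent releases):

  # monitor-side view of clock drift
  $ ceph time-sync-status
  # on each node, verify NTP peers are reachable and synced
  $ ntpq -p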
Hi,
I am not sure if this is the same issue as we had recently, but it looks a
bit like it -- we also had a Luminous mon crashing right after syncing was
done.
Turns out that the current release has a bug which causes the mon to crash
if it cannot find a mgr daemon. This should be fixed in the up
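In the meantime the workaround, as far as I understand it, is simply
to make sure at least one mgr daemon is up, e.g. with ceph-deploy
(node name is a placeholder):

  $ ceph-deploy mgr create node1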
Hello Guys
We have a fresh Luminous 12.2.0 (rc) cluster
(32ce2a3ae5239ee33d6150705cdb24d43bab910c), installed using ceph-ansible.
The cluster contains 6 nodes (Intel server board S2600WTTR), with 96 OSDs
and 3 mons in total. Each node has 64G of memory and CPU
Hi Wolfgang,
In bluestore the WAL serves sort of a similar purpose to filestore's
journal, but bluestore isn't dependent on it for guaranteeing durability
of large writes. With bluestore you can often get higher large-write
throughput than with filestore when using HDD-only or flash-only OSDs
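If you want to experiment, the ceph-disk syntax in Luminous for
splitting the DB and WAL onto a faster device looks roughly like this
(device names are placeholders):

  # data on the HDD, RocksDB and WAL on the NVMe device
  $ ceph-disk prepare --bluestore /dev/sdb \
        --block.db /dev/nvme0n1 --block.wal /dev/nvme0n1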
Title probably should have read "Ceph Data corruption shortly after Luminous
Upgrade"
The problem seems to have been sorted out. I'm still not sure what caused the
original problem - upgrade latency? mgr errors?
After I resolved the boot problem I attempted to reproduce the error, but was
unsuccessful whi
Are your QEMU VMs using a different CephX user than client.admin? If so,
can you double-check your caps to ensure that the QEMU user can blacklist?
See step 6 in the upgrade instructions [1]. The fact that "rbd resize"
fixed something hints that your VMs had hard-crashed with the exclusive
lock lef
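The caps update from those upgrade notes has roughly this shape
(substitute your QEMU user's name and its existing OSD caps):

  $ ceph auth caps client.qemu \
        mon 'allow r, allow command "osd blacklist"' \
        osd '<existing OSD caps>'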
Thanks for the instructions Michael, I was able to successfully get the
patch, build, and install.
Unfortunately I'm now seeing "osd/PG.cc: 5381: FAILED
assert(info.history.same_interval_since != 0)". Then the OSD crashes.
On Sat, Nov 4, 2017 at 5:51 AM, Michael wrote:
> Jon Light wrote:
>
> I
Hello,
Today we use ceph jewel with:
osd disk thread ioprio class=idle
osd disk thread ioprio priority=7
and "nodeep-scrub" flag is set.
We want to change the scheduler from CFQ to deadline, so these options will
lose their effect.
I've tried to find out what operations are performed in "disk thread".
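For reference, the scheduler can be checked and switched per device at
runtime like this (device name is a placeholder):

  $ cat /sys/block/sda/queue/scheduler
  noop deadline [cfq]
  $ echo deadline | sudo tee /sys/block/sda/queue/scheduler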
Hi Mark,
thanks for your reply!
I'm a big fan of keeping things simple - this means that there has to be
a very good reason to put the WAL and DB on a separate device, otherwise
I'll keep them collocated (and simpler).
As far as I understood, putting the WAL/DB on a faster (than hdd)
device makes m
Hi Kamila,
Thank you for your response.
I think we solved it yesterday.
I simply removed the mon again and this time I also removed all references to
it in ceph.conf (had some remnants there).
After that I ran ceph-deploy, and since then it hasn't crashed again.
So in this case it was mo
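For reference, the clean-up amounted to roughly the following (mon and
node names are placeholders):

  $ ceph mon remove node3
  # then drop node3 from mon_initial_members / mon_host in ceph.conf
  $ ceph-deploy --overwrite-conf config push node1 node2 node3
  # and re-create the mon once the stale references are gone
  $ ceph-deploy mon create node3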
Hi Wolfgang,
You've got the right idea. RBD is probably going to benefit less since
you have a small number of large objects and little extra OMAP data.
Having the allocation and object metadata on flash certainly shouldn't
hurt, and you should still have less overhead for small (<64k) writes
You were right, it was frozen at the virtual machine level.
The panic kernel parameter worked, so the server resumed with a reboot.
But there was no panic displayed on the VNC console even though I was logged in.
The main problem is that a combination of simultaneous silent MON and OSD
failure will cause a much longer res
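For anyone else hitting this, the panic parameter in question can be
set so a hung guest reboots itself instead of staying frozen (the
10-second timeout is just an example):

  # reboot 10 seconds after a kernel panic
  $ sudo sysctl -w kernel.panic=10
  # or persistently, via the kernel command line: panic=10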
On Wed, Nov 8, 2017 at 9:45 PM, Mark Schouten wrote:
> I see you fixed this (with a rather trivial patch :)), great!
>
:)
> I am wondering though, should I be able to remove the invalid entry using
> this patch too?
>
It should work.
> Regards,
>
> Mark
>
>
> On 5 Nov 2017, at 07:33, Orit Wasser
When I add in the next HDD I'll try the method again and see if I just
needed to wait longer.
On Tue, Nov 7, 2017 at 11:19 PM Wido den Hollander wrote:
>
> > Op 7 november 2017 om 22:54 schreef Scottix :
> >
> >
> > Hey,
> > I recently updated to luminous and started deploying bluestore osd
> no
Can anyone advise on an erasure pool config to store
- files between 500MB and 8GB, total 8TB
- just for archiving, not much reading (few files a week)
- hdd pool
- now 3 node cluster (4th coming)
- would like to save on storage space
I was thinking of a profile with jerasure k=3 m=2, but mayb
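A profile along those lines would be created roughly as follows
(Luminous syntax; names and PG counts are placeholders). One caveat:
k=3 m=2 places 5 chunks, so with crush-failure-domain=host you need at
least 5 hosts; on a 3-4 node cluster you would have to fall back to
crush-failure-domain=osd or a smaller k+m:

  $ ceph osd erasure-code-profile set archive_32 \
        k=3 m=2 crush-failure-domain=host
  $ ceph osd pool create ecpool 128 128 erasure archive_32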
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Mark Nelson
> Sent: 08 November 2017 19:46
> To: Wolfgang Lendl
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] bluestore - wal,db on faster devices?
>
> Hi Wolfgang,
>
> You've
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> ??? ???
> Sent: 08 November 2017 16:21
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Recovery operations and ioprio options
>
> Hello,
> Today we use ceph jewel with:
> osd
For anyone that encounters this in the future, I was able to resolve
the issue by finding the three OSDs that the object is on. One by one
I stopped the OSD, flushed the journal and used the objectstore tool to
remove the data (sudo ceph-objectstore-tool --data-path
/var/lib/ceph/osd/ceph-19 --journa
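The full invocation, for a filestore OSD, has roughly this shape (the
object name is elided here):

  $ sudo systemctl stop ceph-osd@19
  $ sudo ceph-osd -i 19 --flush-journal
  $ sudo ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-19 \
        --journal-path /var/lib/ceph/osd/ceph-19/journal \
        '<object>' remove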
Also look at the new WD 10TB Reds if you want very-low-use archive storage.
Because they spin at 5400 RPM, they only use 2.8W at idle.
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Jack
> Sent: 06 November 2017 22:31
> To: ceph-users@lists.c
Hi,
Do you think there is a way for Ceph to disconnect an HV (hypervisor) client
from a cluster?
We want to prevent the possibility that two HVs are running the same VM.
When an HV crashes, we have to make sure that when the
VMs are started on a new HV, the disk is not open on the crashed HV.
I can see
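One mechanism that seems to fit is the OSD blacklist, which cuts a
client instance off cluster-wide (pool, image and address here are
placeholders):

  # find the crashed client's address, e.g. from the image's watchers
  $ rbd status vmpool/vm-disk-1
  # blacklist that client instance for, say, an hour
  $ ceph osd blacklist add 192.168.1.50:0/3271645293 3600
  $ ceph osd blacklist ls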
On 11/08/2017 03:16 PM, Nick Fisk wrote:
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Mark Nelson
> Sent: 08 November 2017 19:46
> To: Wolfgang Lendl
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] bluestore - wal,db on faster devices?
Who is running nfs-ganesha's FSAL to export CephFS? What has your
experience been?
(We are working on building proper testing and support for this into
Mimic, but the ganesha FSAL has been around for years.)
Thanks!
sage
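For context, a minimal CephFS export in ganesha.conf looks something
like this (IDs and paths are placeholders):

  EXPORT
  {
      Export_ID = 1;
      Path = "/";
      Pseudo = "/cephfs";
      Access_Type = RW;
      Squash = No_Root_Squash;
      FSAL {
          Name = CEPH;
      }
  }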
In a test environment (CentOS 7), on a Luminous OSD node, with binaries
from
download.ceph.com::ceph/nfs-ganesha/rpm-V2.5-stable/luminous/x86_64/,
I am seeing these:
Nov 6 17:41:34 c01 kernel: ganesha.nfsd[31113]: segfault at 0 ip
7fa80a151a43 sp 7fa755ffa2f0 error 4 in
libdbus-1.so.3.7.4
> -Original Message-
> From: Mark Nelson [mailto:mnel...@redhat.com]
> Sent: 08 November 2017 21:42
> To: n...@fisk.me.uk; 'Wolfgang Lendl'
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] bluestore - wal,db on faster devices?
>
>
>
> On 11/08/2017 03:16 PM, Nick Fisk wrote:
On my cluster I have a ceph-deploy node that is not a mon or osd. This is my
bench system, and I want to recreate the ceph-deploy node to simulate a
failure. I cannot find this outlined anywhere, so I thought I would ask.
Basically, follow the Preflight guide:
http://docs.ceph.com/docs/master/start/quick-s
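If it is only the admin role being rebuilt, my understanding is that
after the Preflight steps the new node can be repopulated from a
surviving mon host (host name is a placeholder):

  $ ceph-deploy config pull node1
  $ ceph-deploy gatherkeys node1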
Wow, thanks for the heads-up, Jason. That explains a lot. I followed the
instructions here http://ceph.com/releases/v12-2-0-luminous-released/ which
apparently left out that step. I have now executed that command.
Is there a new master list of the CLIs?
From: Jason Dillaman [mailto:jdill...@red
Hi Sage,
We have been running the Ganesha FSAL for a while (as far back as Hammer /
Ganesha 2.2.0), primarily for uid/gid squashing.
Things are basically OK for our application, but we've seen the following
weirdness*:
- Sometimes there are duplicated entries when directories are listed
Hi Cephers,
I'm testing RadosGW on Luminous. I've already installed it on a
separate host; the service is running, but RadosGW does not accept any of my
configuration in ceph.conf.
My Config:
[client.radosgw.gateway]
host = radosgw
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path =
Are you sure you deployed it with the client.radosgw.gateway name as
well? Try to redeploy the RGW and make sure the name you give it
corresponds to the name you give in the ceph.conf. Also, do not forget
to push the ceph.conf to the RGW machine.
On Wed, Nov 8, 2017 at 11:44 PM, Sam Huracan wrote
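One thing worth checking: ceph-deploy names an RGW instance
client.rgw.<hostname>, so the running daemon may not match a
[client.radosgw.gateway] section at all (output shape is illustrative):

  # see which --name the daemon was actually started with
  $ ps aux | grep radosgw
  /usr/bin/radosgw -f --cluster ceph --name client.rgw.radosgw ...
  # the matching ceph.conf section would then be
  [client.rgw.radosgw]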
@Hans: Yes, I tried to redeploy RGW and ensured client.radosgw.gateway is
the same in ceph.conf.
Everything goes well - the radosgw service is running and port 7480 is open -
but none of my radosgw config in ceph.conf takes effect: rgw_dns_name is still
empty, and the log file keeps its default value.
[root@radosgw sy
I checked the Ceph pools; the cluster has these pools:
[ceph-deploy@ceph1 cluster-ceph]$ ceph osd lspools
2 rbd,3 .rgw.root,4 default.rgw.control,5 default.rgw.meta,6
default.rgw.log,
2017-11-09 11:25 GMT+07:00 Sam Huracan :
> @Hans: Yes, I tried to redeploy RGW, and ensure client.radosgw.gateway is
>
> Op 8 november 2017 om 22:41 schreef Sage Weil :
>
>
> Who is running nfs-ganesha's FSAL to export CephFS? What has your
> experience been?
>
A customer of mine is doing this. They are running Ubuntu, and my experience
is that getting Ganesha compiled is already a pain sometimes.
When it r
Hello,
I think this is not normal behavior in Luminous. I'm testing 3 nodes; each
node has 3 x 1TB HDD, 1 SSD for WAL + DB, E5-2620 v3, 32GB of RAM, 10Gbps NIC.
I use fio for I/O performance measurements. When I ran "fio --randrepeat=1
--ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filen
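That fio command is the widely-copied random-I/O benchmark; the
truncated line presumably continues along these lines (block size,
queue depth and read/write mix are assumptions):

  $ fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
        --name=test --filename=test --bs=4k --iodepth=64 --size=4G \
        --readwrite=randrw --rwmixread=75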