Hi,
No, OK, it was not - the bug is still present. It only worked because the
osdmap was so far behind that it started backfill instead of recovery.
So it happens only in the recovery case.
Greets,
Stefan
On 15.01.19 at 16:02, Stefan Priebe - Profihost AG wrote:
>
> On 15.01.19 at 12:45,
I opened a thread here recently asking what can generally be accepted as
'Ceph overhead' when using the file system. I wonder if the performance
loss I see on a CephFS 1x replication pool compared to native performance
is really expected to be this large: 5.6x to 2x slower than native disk
performance.
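(To make that comparison concrete, a minimal sketch of the kind of fio run that would measure both sides; the mount points /ceph and /mnt/local are assumptions, not from the thread.)
---
# hypothetical mount points: /ceph = CephFS, /mnt/local = native disk
fio --name=cephfs --directory=/ceph --rw=write --bs=4M --size=4G --direct=1
fio --name=native --directory=/mnt/local --rw=write --bs=4M --size=4G --direct=1
---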
I’m looking at writes to a fragile SSD on a mon node;
/var/lib/ceph/mon/ceph-{node}/store.db is the big offender at the moment.
Is it required to be on a physical disk, or can it be in tmpfs? One of the
log files has Paxos strings, so I’m guessing it has to be on disk for a
panic recovery? Are
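(One way to quantify those writes before deciding, a sketch; it assumes the mon id equals the short hostname, which may not hold everywhere.)
---
# size of the mon store right now
du -sh /var/lib/ceph/mon/ceph-$(hostname -s)/store.db
# ask the mon to compact its store; this can shrink it considerably
ceph tell mon.$(hostname -s) compact
---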
Dear Ceph users,
I’d like to get some feedback for the following thought:
Currently I run some 24*4TB bluestore OSD nodes. The main focus is on storage
space over IOPS.
We use erasure code and cephfs, and things look good right now.
The "but" is: I do need more disk space and don’t have so muc
Hi,
On 16/01/2019 09:02, Brian Topping wrote:
> I’m looking at writes to a fragile SSD on a mon node,
> /var/lib/ceph/mon/ceph-{node}/store.db is the big offender at the
> moment.
> Is it required to be on a physical disk, or can it be in tmpfs? One
> of the log files has Paxos strings, so I’m guessing it has to be on
> disk for a panic recovery?
Have had some good experiences with ST1NM0156-2AA111.
Also running with EC, but using RBD as slow storage for VMs; performance has
been good for what I'd expect from 10TB drives and EC.
I would definitely say getting helium drives helps vs. standard air-filled
ones once you get to 8TB+ drives.
On 1/16/19 10:36 AM, Matthew Vernon wrote:
> Hi,
>
> On 16/01/2019 09:02, Brian Topping wrote:
>
>> I’m looking at writes to a fragile SSD on a mon node,
>> /var/lib/ceph/mon/ceph-{node}/store.db is the big offender at the
>> moment.
>> Is it required to be on a physical disk, or can it be in tmpfs?
Thanks guys! This does leave me a little worried that I have only one mon
at the moment, for reasons given in my previous emails to the list (a
physical limit of two nodes right now). Going to have to get more creative!
Sent from my iPhone
> On Jan 16, 2019, at 02:56, Wido den Hollander wrote:
On Wed, Jan 16, 2019 at 1:27 AM Kjetil Joergensen wrote:
>
> Hi,
>
> you could try reducing "osd map message max"; some code paths end up
> returning -EIO (kernel: libceph: mon1 *** io error) when a message exceeds
> include/linux/ceph/libceph.h:CEPH_MSG_MAX_{FRONT,MIDDLE,DATA}_LEN.
>
> This "worked for us"
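(For anyone wanting to try the same thing, a sketch of how that option could be lowered; the value 20 is illustrative only.)
---
# runtime change on all mons; not persistent across restarts
ceph tell mon.* injectargs '--osd_map_message_max 20'
# persistent form, in ceph.conf under [global]:
#   osd map message max = 20
---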
On 1/16/19 11:19 AM, Brian Topping wrote:
> Thanks guys! This does leave me a little worried that I only have one mon at
> the moment based on reasons in my previous emails in the list (physical limit
> of two nodes at the moment). Going to have to get more creative!
>
My advice: Do something
Hi,
I’m trying to run a Gerrit deployment on Kubernetes. The deployment fails
because it can’t mount a Ceph image with rbd-nbd.
I tried to manually mount the image with "sudo rbd-nbd map rbd/xgrid-rep-test
-m 172.31.141.8:6789,172.31.141.9:6789,172.31.141.10:6789 --keyfile keyfile"
and it work
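(When rbd-nbd refuses to map, a few generic checks are worth running first; a sketch, nothing here is specific to the Gerrit setup above.)
---
# the nbd kernel module must be present on the node
sudo modprobe nbd
# list images rbd-nbd already has mapped
rbd-nbd list-mapped
# kernel-side errors from nbd usually show up here
dmesg | tail -n 20
---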
Disclaimer: Even I will admit that I know this is going to sound like a
silly/crazy/insane question, but I have a reason for wanting to do this and
asking the question. It’s also worth noting that no active, production
workload will be used on this “cluster”, so I’m worried more about data
in
How can there be a "catastrophic reason" if you have "no active,
production workload"...? Do as you please. I am also using 1x
replication for temp and test pools. But if you have only one OSD, why
use Ceph? Choose the correct 'tool' for the job.
-Original Message-
From: Kenneth Van Alst
Hello,
While digging into this further I saw that it takes ages until all PGs
are active. After starting the OSD, 3% of all PGs are inactive, and it
takes minutes until they're active again.
The log of the OSD is full of:
2019-01-16 15:19:13.568527 7fecbf7da700 0 osd.33 pg_epoch: 1318479
pg[5.563( v 1
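(To put numbers on "takes minutes", a sketch; dump_stuck accepts a threshold in seconds.)
---
# PGs that have been stuck inactive for more than 10 seconds
ceph pg dump_stuck inactive 10
# overall PG state summary
ceph -s
---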
Hi,
just some comments:
CephFS has an overhead for accessing files (a capabilities round trip to
the MDS on first access, cap cache management, a limited number of
concurrent caps depending on MDS cache size...), so using the CephFS file
system as storage for a FileStore OSD will add some extra overhead.
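(That cap traffic can be observed directly; a sketch, run on the MDS host, with <id> standing in for the MDS name.)
---
# client sessions with the number of caps each one holds
ceph daemon mds.<id> session ls
# live per-second MDS counters, including cap-related activity
ceph daemonperf mds.<id>
---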
Marc:
To clarify, there will be no direct client workload (which is what I mean by
“active production workload”), but rather RBD images from a remote cluster,
imported either via RBD export/import or as an RBD mirror destination.
Obviously the best solution is dedicated hardware, but I don’t ha
This is not the case with 12.2.8 - it happens with 12.2.9 as well. On
12.2.8, all PGs are instantly active after boot - no inactive PGs, at
least none noticeable in ceph -s.
With 12.2.9, 12.2.10, or even current upstream/luminous, it takes
minutes until all PGs are active again.
Greets,
Stefan
On 16.01.19 at
Burkhard:
Thank you, this is literally what I was looking for. A VM with RBD images
attached was my first choice (and what we do for a test and integration lab
today), but I am trying to give as much space as possible to the underlying
cluster without having to frequently add/remove OSDs and rebalan
Hi Stefan,
12.2.9 included the pg hard limit patches and the osd_memory_autotuning
patches. While at first I was wondering if this was autotuning, it
sounds like it may be more related to the pg hard limit. I'm not
terribly familiar with those patches though so some of the other members
fr
On 16.01.19 16:03, Kenneth Van Alstyne wrote:
> To be clear, I know the question comes across as ludicrous. It *seems*
> like this is going to work okay for the light workload use case that I
> have in mind — I just didn’t want to risk impacting the underlying
> cluster too much or hit any other
Hi everyone,
This has come up several times before, but we need to make a final
decision. Alfredo has a PR prepared that drops Python 2 support entirely
in master, which will mean nautilus is Python 3 only.
All of our distro targets (el7, bionic, xenial) include Python 3, so that
isn't an issue.
I’d actually rather it not be an extra cluster, but can the destination pool
name be different? If not, I have conflicting image names in the “rbd” pool on
either side.
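(With plain export/import the destination spec is free-form, so a different pool and image name both work; a sketch with made-up names.)
---
# stream an image into a different pool under a new name
rbd export rbd/image1 - | rbd import - rbd-backup/image1-siteA
# same idea across hosts
rbd export rbd/image1 - | ssh backup-host rbd import - rbd-backup/image1-siteA
---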
Thanks,
--
Kenneth Van Alstyne
Systems Architect
Knight Point Systems, LLC
Service-Disabled Veteran-Owned Business
1775 Wiehl
Hi,
My 2 cents:
- do drop Python 2 support
- do not drop Python 2 support unexpectedly, i.e. do a deprecation phase
People should already know that Python 2 is dead.
That is not enough, though, to justify removing it by surprise.
Regards,
On 01/16/2019 04:45 PM, Sage Weil wrote:
> Hi everyone,
>
> This ha
I have python 2 in rhel7/centos7
[@c04 ~]# python -V
Python 2.7.5
[@c04 ~]# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
-Original Message-
From: c...@jack.fr.eu.org [mailto:c...@jack.fr.eu.org]
Sent: 16 January 2019 16:55
To: ceph-users@lists.ceph.com
Subject: Re: [
I spoke with Doug Hellmann, who has been championing the goal inside
OpenStack [1].
According to Doug, all major services in OpenStack should be supporting
Python 3.5 and 3.6. They have a goal in their current cycle, set for
2019-04-10 [2], to make Python 3 the default in tests [3].
[1] - https:
Hey everyone,
We're getting close to the release of Ceph Nautilus, and I wanted to
start the discussion of our next shirt!
It looks like in the past we've used common works from Wikipedia pages.
https://en.wikipedia.org/wiki/Nautilus
I thought it would be fun to see who in our community would l
Hi Ilya/Kjetil,
I've done some debugging and tcpdump-ing to see what the interaction
between the kernel client and the mon looks like. Indeed,
CEPH_MSG_MAX_FRONT, defined as 16MB, seems low for the default mon
messages for our cluster (with osd_mon_messages_max at 100). We have
about 3500 OSDs
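(The arithmetic is easy to verify locally, a sketch: fetch one osdmap, check its size, and multiply by osd_mon_messages_max; the 200KB figure below is only an example.)
---
# grab the current osdmap and look at its size
ceph osd getmap -o /tmp/osdmap
ls -lh /tmp/osdmap
# e.g. ~200KB per map x 100 maps per message = ~20MB, over the 16MB front limit
---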
On Wed, 16 Jan 2019, 02:20 David Young wrote:
> Hi folks,
>
> My ceph cluster is used exclusively for cephfs, as follows:
>
> ---
> root@node1:~# grep ceph /etc/fstab
> node2:6789:/ /ceph ceph
> auto,_netdev,name=admin,secretfile=/root/ceph.admin.secret
> root@node1:~#
> ---
>
> "rados df" shows me the fol
I'm looking for some help in fixing a bucket index on a Luminous (12.2.8)
cluster running on FileStore.
First some background on how I believe the bucket index became broken. Last
month we had a PG in our .rgw.buckets.index pool become inconsistent:
2018-12-11 09:12:17.743983 osd.1879 osd.1879 1
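(The usual first steps for this kind of damage, hedged since the right fix depends on what scrub actually found; <pgid> and <bucket> are placeholders.)
---
# show what made the PG inconsistent
rados list-inconsistent-obj <pgid> --format=json-pretty
# repair the replica-level damage
ceph pg repair <pgid>
# then verify, and if needed rebuild, the bucket index
radosgw-admin bucket check --bucket=<bucket>
radosgw-admin bucket check --bucket=<bucket> --fix
---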
On Wed, Jan 16, 2019 at 7:12 PM Andras Pataki wrote:
>
> Hi Ilya/Kjetil,
>
> I've done some debugging and tcpdump-ing to see what the interaction
> between the kernel client and the mon looks like. Indeed,
> CEPH_MSG_MAX_FRONT, defined as 16MB, seems low for the default mon
> messages for our clus
I would definitely see huge value in going to 3 MONs here (and, by the way,
2 on-site MGRs and 2 on-site MDSs).
However, 350Kbps is quite low and MONs may be latency-sensitive, so I suggest
you apply heavy QoS if you want to use that link for ANYTHING else.
If you do so, make sure your clients are only listing t
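(A very rough sketch of what such QoS could look like with tc, assuming eth0 and the default mon port 6789; a real setup would want proper rate limiting on the rest of the traffic too.)
---
# three-band priority qdisc; mon traffic goes to the highest band
tc qdisc add dev eth0 root handle 1: prio
tc filter add dev eth0 parent 1: protocol ip u32 \
    match ip dport 6789 0xffff flowid 1:1
---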
Forgot to reply to the list!
‐‐‐ Original Message ‐‐‐
On Thursday, January 17, 2019 8:32 AM, David Young
wrote:
> Thanks David,
>
> "ceph osd df" looks like this:
>
> -
> root@node1:~# ceph osd df
> ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS
> 9 hdd 7.27
I reverted the whole cluster back to 12.2.8; recovery speed had also
dropped from 300-400MB/s to 20MB/s on 12.2.10. So something is really
broken.
Greets,
Stefan
On 16.01.19 at 16:00, Stefan Priebe - Profihost AG wrote:
> This is not the case with 12.2.8 - it happens with 12.2.9 as well. After
> boo
> On Jan 16, 2019, at 12:08 PM, Anthony Verevkin wrote:
>
> I would definitely see huge value in going to 3 MONs here (and, by the way,
> 2 on-site MGRs and 2 on-site MDSs).
> However, 350Kbps is quite low and MONs may be latency-sensitive, so I suggest
> you apply heavy QoS if you want to use that link for AN
On 1/16/19 8:08 PM, Anthony Verevkin wrote:
> I would definitely see huge value in going to 3 MONs here (and, by the way,
> 2 on-site MGRs and 2 on-site MDSs).
> However, 350Kbps is quite low and MONs may be latency-sensitive, so I suggest
> you apply heavy QoS if you want to use that link for ANYTHING else.
>
Hello,
What is the difference between marking an OSD "lost" vs removing it with
"rm" in terms of cluster recovery?
What is the next step after marking an OSD "lost" and the cluster finishes
recovering? Do you then "rm" it?
Thanks
Chandra
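(For reference, hedged: "lost" tells the cluster to stop waiting for that OSD's data so recovery can proceed; "rm" only removes it from the map afterwards. A typical sequence, with osd.33 as a made-up id:)
---
# give up on the data of a dead OSD so recovery can proceed
ceph osd lost 33 --yes-i-really-mean-it
# after recovery completes, remove it from CRUSH, auth, and the osdmap
ceph osd crush remove osd.33
ceph auth del osd.33
ceph osd rm 33
---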
Hi everyone,
First, this is a reminder that there is a Tech Talk tomorrow from Guy
Margalit about NooBaa, a multi-cloud object data services platform:
Jan 17 at 19:00 UTC
https://bluejeans.com/908675367
Why, you might ask?
There is a lot of interest among many Ceph developers and vendors to
Hi Patrick,
Quoting Stefan Kooman (ste...@bit.nl):
> Quoting Stefan Kooman (ste...@bit.nl):
> > Quoting Patrick Donnelly (pdonn...@redhat.com):
> > > Thanks for the detailed notes. It looks like the MDS is stuck
> > > somewhere it's not even outputting any log messages. If possible, it'd
> > > be