Hi all,
I'm playing with my testing cluster with ceph 12.2.8 installed.
It happened to me for the second time that I have 1 unfound object on an
erasure-coded pool.
The pool uses a 3+1 erasure-coding configuration.
The first time, I was adding an additional disk. During the cluster rebalance I noticed one unfound ob
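(A hedged aside for readers hitting the same thing: the commands usually used to locate unfound objects, and the last-resort option, look roughly like this; the PG ID is a placeholder.)
$ ceph health detail                      # lists the PGs reporting unfound objects
$ ceph pg <pgid> query                    # shows which OSDs were probed for the missing copies
$ ceph pg <pgid> mark_unfound_lost delete # last resort; revert is not available on EC pools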
I'd create a new ticket and reference the older one; they may not have the
same cause.
On Tue, Oct 2, 2018 at 12:33 PM Ricardo J. Barberis
wrote:
> Hello,
>
> I'm having this same issue on 12.2.8. Should I reopen the bug report?
>
> This cluster started on 12.2.4 and was upgraded to 12.2.5 and t
As I mentioned in that email, the apply and commit values in BlueStore are
equivalent. They're exported because it's part of the interface (thanks to
FileStore), but they won't differ. If you're doing monitoring or graphs,
just pick one.
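(As an illustrative aside, with made-up values: on a BlueStore-only cluster the two columns simply track each other.)
$ ceph osd perf
osd commit_latency(ms) apply_latency(ms)
  2                  1                 1
  1                  0                 0
  0                  3                 3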
-Greg
On Tue, Oct 2, 2018 at 3:43 PM Jakub Jaszewski
wrote:
osd_find_best_info_ignore_history_les is a dangerous option and you should
only use it in very specific circumstances when directed by a developer.
In such cases it will allow a stuck PG to peer. But you're not getting to
that point...you're seeing some sort of resource exhaustion.
The noup t
Hello,
this has crept up before, find my thread
"Bluestore caching, flawed by design?" for starters, if you haven't
already.
I'll have to build a new Ceph cluster next year and am also less than
impressed with the choices at this time:
1. Bluestore is the new shiny, filestore is going to die
Nope, you are right. I think it was just boto catching this for me and I took
that for granted.
I think that is the behavior I would expect too: S3-compliant restrictions on
create, while allowing legacy buckets to remain. Anyway, I noticed you created a ticket
[0] in the tracker for this, thanks!
Be
Can you file a tracker ticket for your
issues (http://tracker.ceph.com/projects/ceph/issues/new)? Email, once
it gets lengthy, is not great for tracking an issue. Ideally, full details of the
environment (OS/Ceph versions, before/after state, workload info, tool used
for the upgrade) are important if someone has to recreate it. There a
Hi Cephers, Hi Gregory,
I'm looking at the same case as here: commit_latency == apply_latency in ceph osd
perf.
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-January/024317.html
What's the meaning of commit_latency and apply_latency in BlueStore OSD
setups? How useful is it when troubleshooti
Hi,
Sorry to hear that. I've been battling with mine for 2 weeks :/
I corrected my OSDs with the following commands. My OSD logs
(/var/log/ceph/ceph-OSDx.log) have a line containing log(ERR) with the PG number
right before the crash dump.
ceph-objectstore-tool --data-path /var/lib/ceph/os
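(A hedged illustration of that kind of invocation, not necessarily the exact commands the author used; the OSD ID and PG ID are placeholders, and the OSD must be stopped first.)
$ systemctl stop ceph-osd@<id>
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> --op list-pgs
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
      --pgid <pgid> --op export --file /root/<pgid>.export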
Hi Cephers,
I'm testing cluster throughput before moving to production. Ceph
version 13.2.1 (I'll update to 13.2.2).
I run rados bench from 10 cluster nodes and 10 clients in parallel.
Just after I start the rados command, the HDDs behind three OSDs are 100% utilized
while the others are < 40%. After the
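(For context, a sketch of a typical invocation of the benchmark being discussed; pool name, object size and thread count are placeholders, not the author's actual parameters. A distinct --run-name per client avoids clashes when running in parallel.)
$ rados bench -p <testpool> 60 write -b 4194304 -t 16 --run-name client-$(hostname) --no-cleanup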
On Tue, Oct 02, 2018 at 12:37:02PM -0400, Ryan Leimenstoll wrote:
> I was hoping to get some clarification on what "rgw relaxed s3 bucket
> names = false” is intended to filter.
Yes, it SHOULD have caught this case, but does not.
Are you sure it rejects the uppercase? My test also showed that it
Has anyone heard about osd_find_best_info_ignore_history_les = true?
Would it be useful here? There is so little information about it.
Goktug Yildirim wrote (2 Oct 2018 22:11):
> Hi,
>
> Indeed I left ceph-disk to decide the wal and db partitions when I read
> somewhere that that will d
Help. I have a 60 node cluster and most of the OSDs decided to crash
themselves at the same time. They won't restart, the messages look like...
--- begin dump of recent events ---
0> 2018-10-02 21:19:16.990369 7f57ab5b7d80 -1 *** Caught signal
(Aborted) **
in thread 7f57ab5b7d80 thread_name:c
On Tue, Oct 2, 2018 at 4:47 PM Vikas Rana wrote:
>
> Hi,
>
> We have a CEPH 3 node cluster at primary site. We created a RBD image and the
> image has about 100TB of data.
>
> Now we installed another 3 node cluster on secondary site. We want to
> replicate the image at primary site to this new
On 02.10.2018 21:21, jes...@krogh.cc wrote:
On 02.10.2018 19:28, jes...@krogh.cc wrote:
In the CephFS world there is no central server that holds the cache; each
CephFS client reads data directly from the OSDs.
I can accept this argument, but nevertheless, if I used FileStore it
would work.
Hi,
We have a CEPH 3 node cluster at primary site. We created a RBD image and
the image has about 100TB of data.
Now we installed another 3 node cluster on secondary site. We want to
replicate the image at primary site to this new cluster on secondary site.
As per documentation, we enabled journ
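(For readers following along, a hedged sketch of the usual steps for journal-based rbd-mirror replication of an existing image; pool and image names are placeholders, and an rbd-mirror daemon must also be running against the secondary cluster.)
$ rbd feature enable <pool>/<image> journaling        # on the primary cluster
$ rbd mirror pool enable <pool> image                 # per-image mirroring mode
$ rbd mirror image enable <pool>/<image>
$ rbd mirror image status <pool>/<image>              # check replication state/progress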
Hi,
Indeed I left ceph-disk to decide the WAL and DB partitions when I read
somewhere that it would do the proper sizing.
For the bluestore cache size I have plenty of RAM. I will increase it to 8GB for each
and decide on a more calculated number after the cluster settles.
For the osd map loading I've a
Thanks for the reply! My answers are inline.
> On 2 Oct 2018, at 21:51, Paul Emmerich wrote:
>
> (Didn't follow the whole story, so you might have already answered that)
> Did you check what the OSDs are doing during the period of high disk
> utilization?
> As in:
>
> * running perf top
Did not
step take default
step choose indep 3 chassis
step chooseleaf indep 2 host
which will only work for k+m=6 setups
Paul
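(For illustration only, in decompiled-crushmap syntax those steps would sit in a rule roughly like the sketch below; the rule name and id are made up, and this assumes an EC profile with k+m=6.)
rule ec_3chassis_2hosts {
        id 2
        type erasure
        min_size 6
        max_size 6
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        step choose indep 3 type chassis
        step chooseleaf indep 2 type host
        step emit
}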
On Tue, 2 Oct 2018 at 20:36, Mark Johnston wrote:
>
> I have the following setup in a test cluster:
>
> -1 8.49591 root default
> -15 2.83197 chassis vm1
Hi,
I can see some issues from the OSD log file. You have extremely small
DB and WAL partitions: only 1GB for the DB and 576MB for the WAL. I would
recommend cranking up the RocksDB cache size as much as possible. If you
have RAM you can also increase BlueStore's cache size for HDDs. The default
is 1GB be as
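(A hedged illustration of the knobs being referred to, in ceph.conf; the values are examples, not recommendations, so adjust to the RAM you can actually spare per OSD.)
[osd]
bluestore cache size hdd = 4294967296     # 4 GiB per HDD-backed OSD, up from the 1 GiB default
bluestore cache size ssd = 6442450944     # 6 GiB per SSD-backed OSD
bluestore cache kv max = 1073741824       # cap on the RocksDB portion of that cache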
(Didn't follow the whole story, so you might have already answered that)
Did you check what the OSDs are doing during the period of high disk
utilization?
As in:
* running perf top
* sampling a few stack traces from procfs or gdb
* or just high log settings
* running "status" on the admin socket l
On Tue, Oct 2, 2018 at 11:35 AM Mark Johnston wrote:
>
> I have the following setup in a test cluster:
>
> -1 8.49591 root default
> -15 2.83197 chassis vm1
> -3 1.41599 host ceph01
> 0 ssd 1.41599 osd.0
> -5 1.41599 host ceph02
> 1
Hello,
I'm having this same issue on 12.2.8. Should I reopen the bug report?
This cluster started on 12.2.4 and was upgraded to 12.2.5 and then directly to
12.2.8 (we skipped 12.2.6 and 12.2.7), but the malfunctioning OSD is on a new node
installed with 12.2.8.
We're using CentOS 7.5, and bluestore f
I would also prefer a solution where, in the case of a mv across
pools, the user has to wait a bit longer for the copy to finish. And as
said before, if you export CephFS via SMB or NFS, I wonder how the
NFS/SMB server will execute the move.
If I use a 1x replicated pool on /tmp and move the
> On 02.10.2018 19:28, jes...@krogh.cc wrote:
> In the CephFS world there is no central server that holds the cache; each
> CephFS client reads data directly from the OSDs.
I can accept this argument, but nevertheless, if I used FileStore it
would work.
> This also means no
> single point of
Hi Folks,
I am looking for advice on how to troubleshoot some long operations found in
the MDS. Most of the time performance is fantastic, but occasionally, and with no real
pattern or trend, a getattr op will take up to ~30 seconds to complete in the MDS,
which is stuck on "event": "failed to rdlock, wai
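(A hedged pointer for this kind of investigation; the daemon name is a placeholder. The MDS admin socket is usually the first stop.)
$ ceph daemon mds.<name> dump_ops_in_flight     # in-flight ops, including their "failed to rdlock" events
$ ceph daemon mds.<name> dump_historic_ops      # recent slow ops, after the fact
$ ceph daemon mds.<name> session ls             # lists client sessions, useful for tracking down cap holders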
On 02.10.2018 19:28, jes...@krogh.cc wrote:
Hi.
Based on some recommendations we have setup our CephFS installation using
bluestore*. We're trying to get a strong replacement for "huge" xfs+NFS
server - 100TB-ish size.
Current setup is - a sizeable Linux host with 512GB of memory - one large
De
I have the following setup in a test cluster:
-1 8.49591 root default
-15 2.83197 chassis vm1
-3 1.41599 host ceph01
0 ssd 1.41599 osd.0
-5 1.41599 host ceph02
1 ssd 1.41599 osd.1
Our ceph cluster stopped responding to requests two weeks ago, and I have
been trying to fix it since then. After a semi-hard reboot, we had 11-ish
OSDs "fail" spread across two hosts, with the pool size set to two. I was
able to extract a copy of every PG that resided solely on the nonfunctional
On 2018/10/02 2:03 pm, Andre Goree wrote:
On 2018/10/02 1:54 pm, Jason Dillaman wrote:
On Tue, Oct 2, 2018 at 1:48 PM Andre Goree wrote:
I'm actually not so sure the libvirt user has write access to the
location -- will libvirt automatically try to write to the file
(given
that it's a sett
On 2018/10/02 1:54 pm, Jason Dillaman wrote:
On Tue, Oct 2, 2018 at 1:48 PM Andre Goree wrote:
I'm actually not so sure the libvirt user has write access to the
location -- will libvirt automatically try to write to the file (given
that it's a setting in ceph.conf)?
I just confirmed that the
Hello Darius,
Thanks for reply!
The main problem is we cannot query the PGs. "ceph pg 67.54f query" gets stuck
and waits forever since the OSD is unresponsive.
We are certain that the OSD becomes unresponsive as soon as it is up. And we are certain
that the OSD responds again after its disk utilization stops.
So
On Tue, Oct 2, 2018 at 1:48 PM Andre Goree wrote:
>
> On 2018/10/02 1:29 pm, Jason Dillaman wrote:
> > On Tue, Oct 2, 2018 at 1:25 PM Andre Goree wrote:
> >>
> >>
> >> Unfortunately, it would appear that I'm not getting anything in the
> >> logs
> >> _but_ the creation of the rbd image -- i.e., n
On 2018/10/02 1:29 pm, Jason Dillaman wrote:
On Tue, Oct 2, 2018 at 1:25 PM Andre Goree wrote:
Unfortunately, it would appear that I'm not getting anything in the
logs
_but_ the creation of the rbd image -- i.e., nothing regarding the
attempt to attach it via libvirt. Here are the logs, fo
On Tue, Oct 2, 2018 at 1:25 PM Andre Goree wrote:
>
> On 2018/10/02 10:26 am, Andre Goree wrote:
> > On 2018/10/02 9:54 am, Jason Dillaman wrote:
> >> Perhaps that pastebin link has the wrong log pasted? The provided log
> >> looks like it's associated with the creation of image
> >> "32635-b65927
Hi.
Based on some recommendations we have setup our CephFS installation using
bluestore*. We're trying to get a strong replacement for "huge" xfs+NFS
server - 100TB-ish size.
Current setup is - a sizeable Linux host with 512GB of memory - one large
Dell MD1200 or MD1220 - 100TB + a Linux kernel N
On 2018/10/02 10:26 am, Andre Goree wrote:
On 2018/10/02 9:54 am, Jason Dillaman wrote:
Perhaps that pastebin link has the wrong log pasted? The provided log
looks like it's associated with the creation of image
"32635-b6592790-5519-5184-b5ef-5f16b3523250" and not the attachment of
an image to a
It may be that having multiple mds is masking the issue, or that we
truly didn't have a large enough inode cache at 55GB. Things are
behaving for me now, even when presenting the same 0 entries in req
and rlat.
If this happens again, I'll attempt to get perf trace logs, along with
ops, ops_in_flig
Hi all,
I was hoping to get some clarification on what "rgw relaxed s3 bucket names =
false” is intended to filter. In our cluster (Luminous 12.2.8, serving S3) it
seems that RGW, with that setting set to false, is still allowing buckets with
underscores in the name to be created, although thi
Hello,
Currently you have 15 objects missing. I would recommend finding them
and making backups of them. Ditch all other osds that are failing to
start and concentrate on bringing online those that have missing
objects. Then slowly turn off nodown and noout on the cluster and see
if it stabilises
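(For reference, the flag handling mentioned above maps to commands along these lines; a sketch, not the exact sequence used on that cluster.)
$ ceph osd set nodown && ceph osd set noout      # while OSDs are still flapping
$ ceph osd unset nodown                          # later, one flag at a time, watching ceph -s
$ ceph osd unset noout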
Unless this is related to load and the OSDs really are unresponsive, it is
almost certainly some sort of network issue. Duplicate IP address
maybe?
Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |
Ceph cluster in Jewel 10.2.11
Mons & Hosts are on CentOS 7.5.1804 kernel 3.10.0-862.6.3.el7.x86_64
Every day, we can see in ceph.log on a monitor a lot of log entries like these:
2018-10-02 16:07:08.882374 osd.478 192.168.1.232:6838/7689 386 :
cluster [WRN] map e612590 wrongly marked me down
2018-10-02 16
Sent download link by email. verbosity=10, over 900M uncompressed.
> On 2.10.2018, at 16:52, Igor Fedotov wrote:
>
> May I have a repair log for that "already expanded" OSD?
>
>
> On 10/2/2018 4:32 PM, Sergey Malinin wrote:
>> Repair goes through only when LVM volume has been expanded, otherw
On Tue, Oct 2, 2018 at 10:23 AM Alex Litvak
wrote:
>
> Igor,
>
> Thank you for your reply. So what you are saying there are really no
> sensible space requirements for a collocated device? Even if I setup 30
> GB for DB (which I really wouldn't like to do due to a space waste
> considerations ) t
On 2018/10/02 9:54 am, Jason Dillaman wrote:
Perhaps that pastebin link has the wrong log pasted? The provided log
looks like it's associated with the creation of image
"32635-b6592790-5519-5184-b5ef-5f16b3523250" and not the attachment of
an image to a VM.
On Fri, Sep 28, 2018 at 3:15 PM Andr
Igor,
Thank you for your reply. So what you are saying is that there are really no
sensible space requirements for a collocated device? Even if I set up 30
GB for the DB (which I really wouldn't like to do due to space waste
considerations), there is a chance that if this space fills up I will be
in th
Even with a single device, BlueStore has a sort of implicit "BlueFS
partition" where the DB is stored. And it dynamically adjusts (rebalances)
the space for that partition in the background. Unfortunately it might
perform that "too lazily" and hence under some heavy load it might end up
with a lack of
One of the Ceph experts indicated that BlueStore is somewhat of a preview tech
(at least as far as Red Hat is concerned).
So it could be best to check out BlueStore and RocksDB. There are some
tools to check health and also repair, but there is limited
documentation.
Does anyone have experience with it?
Can anyone lead/help to a proper che
I am sorry for interrupting the thread, but my understanding has always been
that BlueStore on a single device should not care about the DB size,
i.e. it would use the data part for all operations if the DB is full. And
if that is not true, what would be sensible defaults on an 800 GB SSD? I
used ceph-ansi
Perhaps that pastebin link has the wrong log pasted? The provided log
looks like it's associated with the creation of image
"32635-b6592790-5519-5184-b5ef-5f16b3523250" and not the attachment of
an image to a VM.
On Fri, Sep 28, 2018 at 3:15 PM Andre Goree wrote:
>
> On 2018/09/28 2:26 pm, Andre G
May I have a repair log for that "already expanded" OSD?
On 10/2/2018 4:32 PM, Sergey Malinin wrote:
Repair goes through only when LVM volume has been expanded, otherwise it fails
with enospc as well as any other operation. However, expanding the volume
immediately renders bluefs unmountable
On Mon, 1 Oct 2018 at 22:08, John Spray wrote:
>
> > totally new for me, also not what I would expect of a mv on a fs. I know
> > this is normal to expect coping between pools, also from the s3cmd
> > client. But I think more people will not expect this behaviour. Can't
> > the move be implemente
Repair goes through only when LVM volume has been expanded, otherwise it fails
with enospc as well as any other operation. However, expanding the volume
immediately renders bluefs unmountable with IO error.
2 of 3 OSDs got their bluefs log corrupted (the bluestore tool segfaults at the very end
of bluefs
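(For readers of the archive, the operations being discussed correspond roughly to the following hedged sketch; the OSD path and LV name are placeholders, and bluefs-bdev-expand requires a ceph-bluestore-tool build that supports it.)
$ systemctl stop ceph-osd@<id>
$ lvextend -L +20G <vg>/<osd-lv>                                      # grow the backing LV first
$ ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-<id>
$ ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-<id>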
You mentioned repair had worked before, is that correct? What's the
difference now except the applied patch? Different OSD? Anything else?
On 10/2/2018 3:52 PM, Sergey Malinin wrote:
It didn't work, emailed logs to you.
On 2.10.2018, at 14:43, Igor Fedotov wrote:
The major change is in g
The kernel cephfs client unfortunately has the tendency to get stuck
in some unrecoverable states requiring a reboot, especially in older
kernels.
Usually it's not recoverable without a reboot.
Paul
On Tue, 2 Oct 2018 at 14:55, Jaime Ibar wrote:
>
> Hi Paul,
>
> I tried ceph-fuse mounting
Hi Paul,
I tried ceph-fuse, mounting it on a different mount point, and it worked.
The problem here is we can't unmount the ceph kernel client as it is in use
by some virsh processes. We forced the unmount and mounted ceph-fuse,
but we got an I/O error, and mount -l cleared all the processes, but after
It didn't work, emailed logs to you.
> On 2.10.2018, at 14:43, Igor Fedotov wrote:
>
> The major change is in get_bluefs_rebalance_txn function, it lacked
> bluefs_rebalance_txn assignment..
>
>
>
> On 10/2/2018 2:40 PM, Sergey Malinin wrote:
>> PR doesn't seem to have changed since yesterd
The major change is in get_bluefs_rebalance_txn function, it lacked
bluefs_rebalance_txn assignment..
On 10/2/2018 2:40 PM, Sergey Malinin wrote:
PR doesn't seem to have changed since yesterday. Am I missing something?
On 2.10.2018, at 14:15, Igor Fedotov wrote:
Please update the patch f
PR doesn't seem to have changed since yesterday. Am I missing something?
> On 2.10.2018, at 14:15, Igor Fedotov wrote:
>
> Please update the patch from the PR - it didn't update bluefs extents list
> before.
>
> Also please set debug bluestore 20 when re-running repair and collect the log.
>
Please update the patch from the PR - it didn't update bluefs extents
list before.
Also please set debug bluestore 20 when re-running repair and collect
the log.
If repair doesn't help - would you send repair and startup logs directly
to me as I have some issues accessing ceph-post-file uplo
On Mon, Oct 1, 2018 at 8:09 PM Gregory Farnum wrote:
>
> On Fri, Sep 28, 2018 at 12:03 AM Dan van der Ster wrote:
> >
> > On Thu, Sep 27, 2018 at 9:57 PM Maged Mokhtar wrote:
> > >
> > >
> > >
> > > On 27/09/18 17:18, Dan van der Ster wrote:
> > > > Dear Ceph friends,
> > > >
> > > > I have a CR
Kernel 4.4 is not suitable for a multi MDS setup. In general, I
wouldn't feel comfortable running 4.4 with kernel cephfs in
production.
I think at least 4.15 (not sure, but definitely > 4.9) is recommended
for multi MDS setups.
If you can't reboot: maybe try cephfs-fuse instead which is usually
ve
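(For completeness, a hedged example of falling back to the FUSE client on an affected host; the monitor address, credentials and mount point are placeholders.)
$ ceph-fuse --id admin -m <mon-host>:6789 /mnt/cephfs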
Hi,
You can easily configure it manually, e.g. :
$ sudo ceph osd crush rm-device-class osd.xx
$ sudo ceph osd crush set-device-class nvme osd.xx
Indeed, it may be useful when you want to create custom rules on this
type of device.
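For example (a sketch; the rule and pool names are made up), such a custom rule and its use would look like:
$ sudo ceph osd crush rule create-replicated nvme-only default host nvme
$ sudo ceph osd pool set <pool> crush_rule nvme-only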
Hervé
Le 01/10/2018 à 23:25, Vladimir Brik a écrit :
Hello,
Hi Paul,
we're using 4.4 kernel. Not sure if more recent kernels are stable
for production services. In any case, as there are some production
services running on those servers, rebooting wouldn't be an option
if we can bring ceph clients back without rebooting.
Thanks
Jaime
On 01/10/18 21
Hi,
there's only one entry in the blacklist; however, it is a mon, not a cephfs
client, and no cephfs
is mounted on that host.
We're using kernel client and the kernel version is 4.4 for ceph
services and cephfs clients.
This is what we have in /sys/kernel/debug/ceph
cat mdsmap
epoch 59259
root 0
Yes, I did repair all OSDs and it finished with 'repair success'. I backed up
OSDs so now I have more room to play.
I posted log files using ceph-post-file with the following IDs:
4af9cc4d-9c73-41c9-9c38-eb6c551047a0
20df7df5-f0c9-4186-aa21-4e5c0172cd93
> On 2.10.2018, at 11:26, Igor Fedotov wr
You did run repair on some of these OSDs, didn't you? On all of them?
Would you please provide the logs for both types of failing OSDs (failed on
mount and failed with enospc)? Prior to collecting, please remove the
existing logs and set debug bluestore to 20.
On 10/2/2018 2:16 AM, Sergey Mali