Dear all,
I'm reading the docs at
http://docs.ceph.com/docs/master/rados/configuration/network-config-ref/
regarding the cluster network and I wonder which nodes are connected to the
dedicated cluster network?
The diagram on the mentioned page only shows the OSDs connected to the cluster network.
mds services on the network will do nothing.
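For reference, a minimal sketch of how the two networks are declared in ceph.conf (the subnets below are placeholders, not taken from anyone's setup):

[global]
    public network  = 192.168.100.0/24
    cluster network = 192.168.200.0/24

Only the OSDs actually use the cluster network (replication, recovery and heartbeat traffic between OSDs), so only the OSD hosts need an interface on it; monitors, MDS daemons and clients all talk over the public network.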
On Fri, Jul 14, 2017, 11:39 AM Laszlo Budai <las...@componentsoft.eu> wrote:
Dear all,
I'm reading the docs at
http://docs.ceph.com/docs/master/rados/configuration/network-config-ref/
regarding the cluster network and I wond
Dear all,
we are planning to add new hosts to our existing hammer clusters, and I'm
looking for best practices recommendations.
currently we have 2 clusters with 72 OSDs and 6 nodes each. We want to add 3
more nodes (36 OSDs) to each cluster, and we have some questions about what
would be the
https://www.spinics.net/lists/ceph-users/msg37252.html
On Tue, Jul 18, 2017, 9:07 AM Laszlo Budai <las...@componentsoft.eu> wrote:
Dear all,
we are planning to add new hosts to our existing hammer clusters, and I'm
looking for best practices recommendations.
cur
e the hosts
into them.
Sage explains a lot of the crush map here.
https://www.slideshare.net/mobile/sageweil1/a-crash-course-in-crush
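For what it's worth, the bucket work itself is only a couple of CRUSH commands; a rough sketch with invented names (storage7 being a new host, rack2 an existing rack bucket):

ceph osd crush add-bucket storage7 host
ceph osd crush move storage7 rack=rack2

OSDs created on that host should then appear under it in the CRUSH tree and data starts rebalancing according to their weights.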
On Wed, Jul 19, 2017, 2:43 AM Laszlo Budai <las...@componentsoft.eu> wrote:
Hi David,
thank you for pointing this out. Google wasn't
olled the impact
of the recovery/backfilling operation on your clients' data traffic? What settings
have you used to avoid slow requests?
Kind regards,
Laszlo
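In case it is useful to others following the thread, the knobs most often mentioned for keeping recovery/backfill gentle are these (not necessarily what was used here, just the usual suspects):

ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'

The same values can also go into the [osd] section of ceph.conf so they survive restarts.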
On 19.07.2017 17:40, Richard Hesketh wrote:
On 19/07/17 15:14, Laszlo Budai wrote:
Hi David,
Thank you for that reference about CRUSH
Dear all,
Where can I read more about how the space used by a snapshot of an RBD image is
calculated? Or can someone explain it here?
I can see that before the snapshot is created, the size of the image is let's
say 100M as reported by the rbd du command, while after taking the snapshot, I
ca
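For anyone searching later, the numbers in question come from running rbd du against the image (pool/image names below are placeholders):

rbd du mypool/myimage

Once the image has snapshots this prints one PROVISIONED/USED line per snapshot plus one for the image head and a <TOTAL> line. As far as I understand, USED is computed at whole-object granularity (the backing RADOS objects that have been written), so it is an upper bound rather than an exact byte count.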
Dear all,
I need to expand a ceph cluster with minimal impact. Reading previous threads
on this topic from the list I've found the ceph-gentle-reweight script
(https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-gentle-reweight)
created by Dan van der Ster (Thank you Dan for sharing
reduce the extra data movement we were seeing with smaller weight
increases. Maybe something to try out next time?
Bryan
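For context, the core of what the gentle-reweight approach automates is roughly this loop (a sketch of the idea only, not the actual CERN script):

# bring osd.72 towards its target weight in small steps
ceph osd crush reweight osd.72 0.2
# wait for backfill to finish and health to return to OK, watch for slow requests
ceph health detail
# then repeat with 0.4, 0.6, ... up to the full weight

If I read Bryan's (truncated) note correctly, very small increments caused extra data movement in their case, so the step size is worth experimenting with.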
From: ceph-users on behalf of Dan van der Ster
Date: Friday, August 4, 2017 at 1:59 AM
To: Laszlo Budai
Cc: ceph-users
Subject: Re: [ceph-users] expanding cluster with minimal impact
at 8:12 PM, Laszlo Budai wrote:
Dear all,
I need to expand a ceph cluster with minimal impact. Reading previous
threads on this topic from the list I've found the ceph-gentle-reweight
script
(https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-gentle-reweight)
created by Dan van der Ster
Dear all!
In our Hammer cluster we are planning to switch our failure domain from host to
chassis. We have performed some simulations, and regardless of the settings we
have used some slow requests have appeared all the time.
we had the following settings:
"osd_max_backfills": "1",
"
iling drives) which could easily cause things to block. Also
checking if your disks or journals are maxed out with iostat could shine some
light on any mitigating factor.
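For the iostat check mentioned there, something along these lines is usually enough to spot a saturated data disk or journal (run it during a backfill):

iostat -x 5
# watch %util and await: a device pinned near 100% utilisation, or with await far
# above its normal value, is the likely source of the slow requests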
On Thu, Aug 31, 2017 at 9:01 AM Laszlo Budai <las...@componentsoft.eu> wrote:
Dear all!
In our Ha
Most people find that they can use 3-5 before the disks are active
enough to come close to impacting customer traffic. That would lead me to
think you have a dying drive that you're reading from/writing to in sectors
that are bad or at least slower.
On Fri, Sep 1, 2017, 6:13 AM Laszlo B
Hi,
I've just started up the dashboard component of the ceph mgr. It looks OK, but
from what I can see, and what I was able to find in the docs, the dashboard
is just for monitoring. Is there any plugin that allows management of the ceph
resources (pool create/delete)?
Thanks,
Laszlo
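For reference, listing and enabling mgr modules looks like this (whether your release ships any module that does more than monitoring is a separate question):

ceph mgr module ls
ceph mgr module enable dashboard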
Hello,
I have these settings in my /etc/ceph/ceph.conf:
[client]
rbd cache = true
rbd cache writethrough until flush = true
admin socket = /var/run/ceph/guests/$cluster-$type.$id.$pid.$cctid.asok
log file = /var/log/qemu/qemu-guest-$pid.log
rbd concurrent management ops = 20
Currently
Hello,
Thank you for the answer.
I don't have the admin socket either :(
the ceph subdirectory is missing in /var/run.
What would be the steps to get the socket?
Kind regards,
Laszlo
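For what it's worth, the usual fix in setups like this is to create the directory with permissions for the qemu user and then restart (or live-migrate) the VMs so librbd re-reads ceph.conf; user and group names vary by distro, so treat this as a sketch:

mkdir -p /var/run/ceph/guests /var/log/qemu
chown qemu:libvirt /var/run/ceph/guests /var/log/qemu

# once a socket shows up it can be queried, e.g.
ceph --admin-daemon /var/run/ceph/guests/<asok file> perf dump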
On 28.02.2017 05:32, Jason Dillaman wrote:
On Mon, Feb 27, 2017 at 12:36 PM, Laszlo Budai wrote:
Curr
Hello,
I have a strange situation:
On a host server we are running 5 VMs. The VMs have their disks provisioned by
cinder from a ceph cluster and are attached by qemu-kvm using librbd.
We have a very strange situation where the VMs apparently stop working
for a few seconds (10-20), and a
Hello,
is there any risk of cluster overload when scrubbing is re-enabled
after having been disabled for a certain amount of time?
I am thinking of the following scenario:
1. scrub/deep scrub are disabled.
2. after a while (a few days) we re-enable them. How will the cluster perform?
Will it run a
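(For reference, the standard way to disable and re-enable scrubbing cluster-wide is via the cluster flags:

ceph osd set noscrub
ceph osd set nodeep-scrub
...
ceph osd unset noscrub
ceph osd unset nodeep-scrub
)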
t the above is slowed down enough that everything is
scrubbed within this long scrub interval, but might need adjustment for
a more normal setting here:
# 60 days ... default is 7 days
osd deep scrub interval = 5259488
And more inline answers below
On 03/08/17 10:46, Laszlo Budai wrote:
Hello
Hello,
After a major network outage our ceph cluster ended up with an inactive PG:
# ceph health detail
HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs stuck unclean; 1
requests are blocked > 32 sec; 1 osds have slow requests
pg 3.367 is stuck inactive for 912263.766607, current state
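(A PG stuck like this can be inspected directly with

ceph pg 3.367 query

which dumps its peering and recovery state as JSON; the fragment below looks like that kind of output.)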
pdate": "0'0",
"current_last_stamp": "0.00",
"current_info": {
"begin": "0.00",
"end": "0.00",
"versio
are marked DNE and seem to be uncontactable.
This seems to be more than a network issue (unless the outage is still
happening).
http://docs.ceph.com/docs/master/rados/operations/pg-states/?highlight=incomplete
On Fri, Mar 10, 2017 at 6:09 PM, Laszlo Budai wrote:
Hello,
I was informed that
/msg17820.html
If you want to abandon the pg see
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/012778.html
for a possible solution.
http://ceph.com/community/incomplete-pgs-oh-my/ may also give some ideas.
On Fri, Mar 10, 2017 at 9:44 PM, Laszlo Budai wrote:
The OSDs are al
Hello,
Can someone explain the meaning of osd_disk_thread_ioprio_priority? I'm reading
the definition from this page:
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/1.3/html/configuration_guide/osd_configuration_reference
it says: "It sets the ioprio_set(2) I/O scheduling p
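For context, this option is normally set together with its companion class option, along these lines (the values are the commonly suggested ones for de-prioritising the disk thread, not a recommendation):

osd disk thread ioprio class = idle
osd disk thread ioprio priority = 7

As far as I know it only has an effect when the disk's I/O scheduler is CFQ.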
On 11.03.2017 16:25, Nick Fisk wrote:
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Laszlo Budai
Sent: 11 March 2017 13:51
To: ceph-users
Subject: [ceph-users] osd_disk_thread_ioprio_priority help
Hello,
Can someone explain the meaning
I'll read it. So far, searching for the architecture of an OSD,
I could not find the gory details about these directories.
Kind regards,
Laszlo
On 12.03.2017 02:12, Brad Hubbard wrote:
On Sat, Mar 11, 2017 at 7:43 PM, Laszlo Budai wrote:
Hello,
Thank you for your answer.
indeed the min_size
ions which would help to improve the cluster's
responsiveness during deep scrub operations.
Kind regards,
Laszlo
On 12.03.2017 21:21, Florian Haas wrote:
On Sat, Mar 11, 2017 at 4:24 PM, Laszlo Budai wrote:
Can someone explain the meaning of osd_disk_thread_ioprio_priority. I'm
[...
e.
What else can I try?
Thank you,
Laszlo
On 12.03.2017 13:06, Brad Hubbard wrote:
On Sun, Mar 12, 2017 at 7:51 PM, Laszlo Budai wrote:
Hello,
I have already done the export with ceph_objectstore_tool. I just have to
decide which OSDs to keep.
Can you tell me why the directory structur
Hello,
So, I've done the following steps:
1. set noout
2. stop osd2
3. ceph-objectstore-tool remove (the exact command form is sketched below)
4. start osd2
5. repeat steps 2-4 on osd 28 and 35
then I've run ceph pg force_create_pg 3.367.
This has left the PG in creating state:
# ceph -s
cluster 6713d1b8-83da-11e6-aa79-525400d98
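(For completeness, the remove in step 3 above normally takes this form, with the data/journal paths and pgid as placeholders, and is run while the OSD is stopped:

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 --journal-path /var/lib/ceph/osd/ceph-2/journal --pgid 3.367 --op remove
)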
on the disk).
Use force_create_pg to recreate the pg empty.
Use ceph-objectstore-tool to do a rados import on the exported pg copy.
On Wed, Mar 15, 2017 at 12:00 PM, Laszlo Budai wrote:
Hello,
I have tried to recover the pg using the following steps:
Preparation:
1. set noout
2. stop osd.2
Hello,
I'm trying to do an import-rados operation, but the ceph-objectstore-tool
crashes with segfault:
[root@storage1 ~]# ceph-objectstore-tool import-rados images pg6.6exp-osd1
*** Caught signal (Segmentation fault) **
in thread 7f84e0b24880
ceph version 0.94.10 (b1e0532418e4631af01acbc0ced
the debuginfo for ceph (how this works depends on your
distro) and run the following?
# gdb -ex 'r' -ex 't a a bt full' -ex 'q' --args ceph-objectstore-tool
import-rados volumes pg.3.367.export.OSD.35
On Thu, Mar 16, 2017 at 12:02 AM, Laszlo Budai wrote:
Hello,
the
My mistake, I've run it on a wrong system ...
I've attached the terminal output.
I've run this on a test system where I was getting the same segfault when
trying import-rados.
Kind regards,
Laszlo
On 16.03.2017 07:41, Laszlo Budai wrote:
[root@storage2 ~]# gdb -ex 'r
h/$cluster-$name.$pid.log
Then run the ceph-objectstore-tool again taking careful note of what
file is created in /var/log/ceph/ and upload that.
On Thu, Mar 16, 2017 at 5:21 PM, Laszlo Budai wrote:
My mistake, I've run it on a wrong system ...
I've attached the terminal output
Hi all,
I've found that the problem was due to the missing
/etc/ceph/ceph.client.admin.keyring file on the storage node where I was trying
to do the import-rados operation.
Kind regards,
Laszlo
On 15.03.2017 20:22, Laszlo Budai wrote:
Hello,
I'm trying to do an import-rados operatio
Hello,
we have been patching our ceph cluster 0.94.7 to 0.94.10. We were updating one
node at a time, and after each OSD node has been rebooted we were waiting for
the cluster health status to be OK.
In the docs we have "stale - The placement group status has not been updated by a
ceph-osd, in
Hello,
can someone tell me the meaning of the last_scrub and last_deep_scrub values
from the ceph pg dump output?
I could not find it with google nor in the documentation.
for example I can see here the last_scrub being 61092'4385, and the
last_deep_scrub=61086'4379
pg_stat objects mip
Hello cephers,
I have a situation where from time to time the write operation to the ceph
storage hangs for 3-5 seconds. For testing we have a simple line like:
while sleep 1; do date >> logfile; done &
with this we can see that rarely there are 3 seconds or more differences
between the consecuti
Hello,
We have an issue when writing to ceph. From time to time the write operation
seems to hang for a few seconds.
We've seen the https://bugzilla.redhat.com/show_bug.cgi?id=1389503, and there it is said
that when the qemu process would reach the max open files limit, then "the guest OS
shou
|finalize)" log entries
3) use the asok file during one of these events to dump the objecter requests
[1] http://docs.ceph.com/docs/jewel/rbd/rbd-replay/
[2] http://tracker.ceph.com/issues/14629
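(Concretely, item 3 would be something along the lines of

ceph --admin-daemon /var/run/ceph/guests/<your .asok file> objecter_requests

which lists the client's in-flight or stuck requests to the OSDs at that moment.)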
On Tue, Apr 4, 2017 at 7:36 AM, Laszlo Budai wrote:
Hello cephers,
I have a situation whe
Hello,
we have observed that there are null characters written into the open files
when hard rebooting a VM. Is this a known issue?
Our VM is using ceph (0.94.10) storage.
we have a script like this:
while sleep 1; do date >> somefile ; done
if we hard reset the VM while the above line is running
keystone_token_cache_size": "1",
"rgw_bucket_quota_cache_size": "1",
I did some tests and the problem has appeared when I was using ext4 in the VM,
but not in the case of xfs.
I did another test where I was calling sync at the end of the while loop,
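(i.e. a variant of the test loop along the lines of:

while sleep 1; do date >> somefile ; sync ; done
)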
mstances
aren't the same, but the patterns of behaviour are similar enough that I wanted
to raise awareness.
k8
On Sat, Apr 8, 2017 at 6:39 AM Laszlo Budai <las...@componentsoft.eu> wrote:
Hello Peter,
Thank you for your answer.
In our setup we have the virtu
Hello,
yesterday one of our compute nodes recorded the following message for one
of the ceph connections:
submit_message osd_op(client.28817736.0:690186
rbd_data.15c046b11ab57b7.00c4 [read 2097152~380928] 3.6f81364a
ack+read+known_if_redirected e3617) v5 remote, 10.12.68.71:68
On 12.04.2017 22:19, Alex Gorbachev wrote:
Hi Laszlo,
On Wed, Apr 12, 2017 at 6:26 AM Laszlo Budai <las...@componentsoft.eu> wrote:
Hello,
yesterday one of our compute nodes has recorded the following message for
one of the ceph connections:
submit_message osd_op(
e connection.
Maybe both are wrong and the truth is a third variant ... :) This is what I
would like to understand.
Kind regards,
Laszlo
On 13.04.2017 00:36, Gregory Farnum wrote:
On Wed, Apr 12, 2017 at 3:00 AM, Laszlo Budai wrote:
Hello,
yesterday one of our compute nodes has record
17 AM Laszlo Budai <las...@componentsoft.eu> wrote:
Hello Greg,
Thank you for the answer.
I'm still in doubt about "lossy". What does it mean in this context? I
can think of different variants:
1. The designer of the protocol from start is consid
Hello all,
We have a ceph cluster with 72 OSDs distributed on 6 hosts, in 3 chassis. In
our crush map we are distributing the PGs on chassis (complete crush map
below):
# rules
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
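(For comparison only, not our actual map: a rule whose failure domain is the chassis would normally continue with steps like

        step take default
        step chooseleaf firstn 0 type chassis
        step emit

before the closing brace.)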
29.05.2017 14:58, Laszlo Budai wrote:
Hello all,
We have a ceph cluster with 72 OSDs distributed on 6 hosts, in 3 chassis. In
our crush map we are distributing the PGs on chassis (complete crush map
below):
# rules
rule replicated_ruleset {
        ruleset 0
        type replicated
Hello,
can someone give me some directions on how ceph recovery works?
Let's suppose we have a ceph cluster with several nodes grouped in 3 racks (2
nodes/rack). The crush map is configured to distribute PGs on OSDs from
different racks.
What happens if a node fails? Where can I read a des
s can work
if you replace failed storage quickly.
On Mon, May 29, 2017, 12:07 PM Laszlo Budai <las...@componentsoft.eu> wrote:
Dear all,
How should ceph react in case of a host failure when 12 out of a total of 72
OSDs are out?
Is it normal that for the remapping of the PG
see if any of the PGs are reporting that they are running on multiple OSDs
inside of the same failure domain.
On Tue, May 30, 2017 at 12:34 PM Laszlo Budai <las...@componentsoft.eu> wrote:
Hello David,
Thank you for your message.
Indeed we were exp
the crush map, an osd being marked
out changes the crush map, an osd being removed from the cluster changes the
crush map... The crush map changes all the time even if you aren't modifying it
directly.
On Tue, May 30, 2017 at 2:08 PM Laszlo Budai <las...@componentsoft.eu> wrote:
2017 at 6:17 AM, Gregory Farnum wrote:
On Mon, May 29, 2017 at 4:58 AM, Laszlo Budai wrote:
Hello all,
We have a ceph cluster with 72 OSDs distributed on 6 hosts, in 3 chassis. In
our crush map we are distributing the PGs on chassis (complete crush map
below):
# rules
rule repli
on by default?
Yesterday we were able to reproduce the issue on a test cluster. Hammer has
performed the same way, but Jewel has worked properly.
Upgrading to jewel is planned, but it was not decided yet when to happen.
Thank you,
Laszlo
On 30.05.2017 23:17, Gregory Farnum wrote:
On Mon, May
Hi David,
If I understand correctly your suggestion is the following:
If we have for instance 12 servers grouped into 3 racks (4/rack) then you would
build a crush map saying that you have 6 racks (virtual ones), and 2 servers in
each of them, right?
In this case if we are setting the failure
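(To make the "virtual racks" idea concrete, they would just be additional rack buckets in the CRUSH map, e.g. with invented names:

ceph osd crush add-bucket vrack1 rack
ceph osd crush move vrack1 root=default
ceph osd crush move server1 rack=vrack1
ceph osd crush move server2 rack=vrack1

and the rule's chooseleaf type would then be rack.)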
position where you need to rush to the datacenter to fix the
hardware problems ASAP.
On Fri, Jun 2, 2017, 5:14 AM Laszlo Budai <las...@componentsoft.eu> wrote:
Hi David,
If I understand correctly your suggestion is the following:
If we have for instance 12 servers gr