We are running into OSD node freezes with out-of-memory conditions, and all swap gets eaten as well.
Until we get more physical RAM I'd like to reduce osd_memory_target, but I
can't find where and how to set it.
We have 24 BlueStore disks per 64 GB CentOS node, running Luminous v12.2.11.
Just set a value for `osd_memory_target`.
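A minimal sketch of how that usually looks, assuming your Luminous build carries the osd_memory_target backport (12.2.9+); the 2 GiB figure below is only an illustration, sized so that 24 OSDs fit comfortably into 64 GB:

    # ceph.conf on the OSD nodes
    [osd]
    osd memory target = 2147483648    # ~2 GiB per OSD daemon

    # apply to the running OSDs without waiting for a restart
    ceph tell osd.* injectargs '--osd_memory_target=2147483648'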
Hello,
Jason Dillaman wrote:
: For the future Ceph Octopus release, I would like to remove all
: remaining support for RBD image format v1 images, barring any
: substantial pushback.
:
: The image format for new images has been defaulted to the v2 image
: format since Infernalis, the v1 for
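If you want to check whether any v1 images are still around, a quick sketch (the pool name rbd is just an example):

    # print the image format of every image in the pool
    for img in $(rbd ls rbd); do
        echo -n "$img: "
        rbd info rbd/"$img" | grep format
    done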
Hi!
While trying to understand erasure coded pools, I would have expected
that "min_size" of a pool is equal to the "K" parameter. But it turns
out that it is always K+1.
Isn't the description of erasure coding misleading then? In a K+M setup,
I would expect to be good (in the sense of "no
Hi,
I see that as a security feature ;-)
You can prevent data loss as long as k chunks are intact, but you don't want
to operate with the minimum required number of chunks. In a disaster
scenario you can reduce min_size to k temporarily, but the main goal
should always be to get the OSDs back up.
For ex
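As a concrete, hedged example for a k=4, m=2 pool (the pool name ecpool is a placeholder), temporarily dropping to k and restoring the default afterwards would look roughly like:

    # disaster recovery only: allow I/O with exactly k shards available
    ceph osd pool set ecpool min_size 4
    # ...bring the failed OSDs back...
    # then restore the safer default of k+1
    ceph osd pool set ecpool min_size 5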
Hi,
I have hit the bug again, but this time only on 1 osd
here some graphs:
http://odisoweb1.odiso.net/osd8.png
latency was good until 01:00
Then I'm seeing onode misses; the bluestore onode count is increasing (which seems
to be normal),
and after that latency slowly increases from 1ms to 3-5ms
after
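One way to watch this from the OSD itself is to poll the admin socket (a sketch, run on the node hosting osd.8):

    # bluestore onode counters for osd.8
    ceph daemon osd.8 perf dump | grep -i onode
    # memory pool breakdown (bluestore_cache_onode, bluestore_cache_other, ...)
    ceph daemon osd.8 dump_mempools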
You're right - WAL/DB expansion capability is present in Luminous+ releases.
But David meant the volume migration feature, which appeared in Nautilus; see:
https://github.com/ceph/ceph/pull/23103
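For the expansion part, a rough sketch on a recent Luminous (the OSD id is a placeholder; the OSD must be stopped and the underlying DB/WAL device already enlarged):

    systemctl stop ceph-osd@0
    # let BlueFS grow into the newly added space on the DB/WAL device
    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-0
    systemctl start ceph-osd@0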
Thanks,
Igor
On 2/20/2019 9:22 AM, Konstantin Shalygin wrote:
On 2/19/19 11:46 PM, David Turner wrote:
Something interesting:
when I restarted osd.8 at 11:20,
I saw latency on another OSD, osd.1, decrease at exactly the same time
(without restarting that OSD).
http://odisoweb1.odiso.net/osd1.png
onodes and cache_other are also going down for osd.1 at this time.
Hi all, sorry, we are newbies with Ceph and we have a newbie question
about it. We have a Ceph cluster with three mons and two public networks:
public network = 10.100.100.0/23,10.100.101.0/21
We have seen that ceph-mon is listening on only one of these networks:
tcp 0 0 10.100.100.9:6789 0.0.0
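For reference, the relevant pieces look roughly like this (a sketch; the mon section name is a placeholder). Each monitor binds only to the single address it has in the monmap, no matter how many subnets are listed under public network:

    [global]
    public network = 10.100.100.0/23,10.100.101.0/21

    [mon.mon1]
    # the monitor binds to exactly this address:port and nothing else
    mon addr = 10.100.100.9:6789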
On 2/20/19 1:03 PM, Andrés Rojas Guerrero wrote:
> Hi all, sorry, we are newbies in Ceph and we have a newbie question
> about it. We have a Ceph cluster with three mon's and two public networks:
>
> public network = 10.100.100.0/23,10.100.101.0/21
>
> We have seen that ceph-mon are listen in o
Hello. I need to bring the OSDs on a node back up after reinstalling the OS. Some
OSDs were created a long time ago, not even with ceph-disk, but with a set of scripts.
The idea was to capture their configuration as JSON via ceph-volume simple
scan, and then on the fresh system run ceph-volume simple activate
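That workflow would look roughly like this (a sketch; the device path is a placeholder, and the JSON files under /etc/ceph/osd/ need to be preserved across the reinstall):

    # capture the OSD metadata as JSON (writes /etc/ceph/osd/<id>-<fsid>.json)
    ceph-volume simple scan /dev/sdf1

    # on the freshly installed system, with the JSON files copied back:
    ceph-volume simple activate --all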
On Wed, Feb 20, 2019 at 8:16 AM Анатолий Фуников wrote:
>
> Hello. I need to raise the OSD on the node after reinstalling the OS, some
> OSD were made a long time ago, not even a ceph-disk, but a set of scripts.
> There was an idea to get their configuration in json via ceph-volume simple
> scan
Thanks for the reply.
blkid -s PARTUUID -o value /dev/sdf1 shows me nothing, but blkid /dev/sdf1
shows me this: /dev/sdf1: UUID="b03810e4-dcc1-46c2-bc31-a1e558904750"
TYPE="xfs"
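To cross-check whether the partition really carries a GPT unique GUID, something like this helps (sgdisk is part of gdisk; partition 1 of /dev/sdf is taken from your output):

    # GPT partition entry, including "Partition unique GUID" (the PARTUUID)
    sgdisk --info=1 /dev/sdf
    # what the kernel/udev reports for the same partition
    lsblk -o NAME,PARTUUID,UUID,FSTYPE /dev/sdf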
On Wed, 20 Feb 2019 at 16:27, Alfredo Deza wrote:
> On Wed, Feb 20, 2019 at 8:16 AM Анатолий Фуников
> wrote:
> >
> > Hello
On osd.8, at 01:20 when the latency began to increase, there was a scrub running:
2019-02-20 01:16:08.851 7f84d24d9700 0 log_channel(cluster) log [DBG] : 5.52 scrub starts
2019-02-20 01:17:18.019 7f84ce4d1700 0 log_channel(cluster) log [DBG] : 5.52 scrub ok
2019-02-20 01:20:31.944 7f84f036e700 0 -- 1
On Wed, Feb 20, 2019 at 8:40 AM Анатолий Фуников wrote:
>
> Thanks for the reply.
> blkid -s PARTUUID -o value /dev/sdf1 shows me nothing, but blkid /dev/sdf1
> shows me this: /dev/sdf1: UUID="b03810e4-dcc1-46c2-bc31-a1e558904750"
> TYPE="xfs"
I think this is what happens with a non-gpt partiti
That's expected from Ceph by design. But in our case we are following all the
recommendations (rack failure domain, separate replication network, etc.) and still
face client I/O performance issues while one OSD is down.
On Tue, Feb 19, 2019 at 10:56 PM David Turner wrote:
>
> With a RACK failure domain, you should be able
Mandi! Alfredo Deza
In chel di` si favelave...
> I think this is what happens with a non-gpt partition. GPT labels will
> use a PARTUUID to identify the partition, and I just confirmed that
> ceph-volume will enforce looking for PARTUUID if the JSON
> identified a partition (vs. an LV).
> From w
On Wed, Feb 20, 2019 at 10:21 AM Marco Gaiarin wrote:
>
> Mandi! Alfredo Deza
> In chel di` si favelave...
>
> > I think this is what happens with a non-gpt partition. GPT labels will
> > use a PARTUUID to identify the partition, and I just confirmed that
> > ceph-volume will enforce looking for
On Wed, Feb 20, 2019 at 10:22:47AM +0100, Jan Kasprzak wrote:
> If I read the parallel thread about pool migration in ceph-users@
> correctly, the ability to migrate to v2 would still require to stop the client
> before the "rbd migration prepare" can be executed.
Note, if even rbd supporte
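For context, the live migration sequence being discussed looks roughly like this (Nautilus+; pool and image names are placeholders):

    # clients must be stopped across the prepare step
    rbd migration prepare rbd/image1 rbd/image1-new
    # clients can be restarted against the new image at this point
    rbd migration execute rbd/image1-new
    rbd migration commit rbd/image1-new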
Hello,
Check your CPU usage when you are doing those kinds of operations. We
had a similar issue where our CPU monitoring was reporting fine, < 40%
usage, but the load on the nodes was high, mid 60-80. If possible,
try disabling HT and see the actual CPU usage.
If you are hitting CPU limits you
Hi,
I would decrease the max active recovery processes per OSD and increase the
recovery sleep.
osd recovery max active = 1 (default is 3)
osd recovery sleep = 1 (default is 0 or 0.1)
osd max backfills defaults to 1 so that should be OK if he's using the
default :D
Disabling scrubbing during reco
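Put together, that would be something like the following (runtime injection shown; the same settings can go into ceph.conf):

    # throttle recovery/backfill
    ceph tell osd.* injectargs '--osd-recovery-max-active 1 --osd-recovery-sleep 1'
    # optionally pause scrubbing until recovery has finished
    ceph osd set noscrub
    ceph osd set nodeep-scrub
    # ...and re-enable it afterwards
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub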
Ohh, sorry for the question; now I understand why we need to define
different public networks in Ceph. I understand that clients contact
the mon only in order to obtain the cluster map, from the
documentation:
"When a Ceph Client binds to a Ceph Monitor, it retrieves the latest
copy of
Hi everyone,
Thank you all for the quick replies!
So making snapshots and rsyncing them between clusters should work, I'll be
sure to check that out. Snapshot mirroring is what we'd need, but I couldn't
find any release date for Nautilus, and we don't really have time to wait for
its release.
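Until snapshot mirroring is available, snapshot shipping between clusters is usually scripted along these lines (a sketch; pool, image, snapshot and host names are placeholders):

    # one-time: create an empty destination image of the same size
    ssh backup-host rbd create rbd/vm1 --size 102400
    # initial sync up to the first snapshot (creates vm1@snap1 on the remote side too)
    rbd export-diff rbd/vm1@snap1 - | ssh backup-host rbd import-diff - rbd/vm1
    # later runs only ship the delta between consecutive snapshots
    rbd export-diff --from-snap snap1 rbd/vm1@snap2 - | ssh backup-host rbd import-diff - rbd/vm1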
Mandi! Alfredo Deza
In chel di` si favelave...
> > Ahem, how can i add a GPT label to a non-GPT partition (even loosing
> > data)?
> If you are coming from ceph-disk (or something else custom-made) and
> don't care about losing data, why not fully migrate to the
> new OSDs?
> http://docs.ceph.c
Hi all,
hope someone can help me. After restarting a node of my 2-node cluster I
suddenly get this:
root@yak2 /var/www/projects # ceph -s
cluster:
id: 749b2473-9300-4535-97a6-ee6d55008a1b
health: HEALTH_WARN
Reduced data availability: 200 pgs inactive
services:
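A couple of commands that usually narrow this kind of report down (a sketch):

    # which PGs are inactive and what the cluster says about them
    ceph health detail
    ceph pg dump_stuck inactive
    # PG stats are reported via the mgr, so check that one is active
    ceph -s | grep mgr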
On Tue, Feb 19, 2019 at 11:39 AM Fyodor Ustinov wrote:
>
> Hi!
>
> From documentation:
>
> mds beacon grace
> Description:The interval without beacons before Ceph declares an MDS
> laggy (and possibly replace it).
> Type: Float
> Default:15
>
> I do not understand, 15 - are is secon
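For what it's worth, the value is in seconds. One way to confirm what a running daemon actually uses (the daemon names below are placeholders):

    # query the admin socket of a running daemon
    ceph daemon mds.a config get mds_beacon_grace
    ceph daemon mon.a config get mds_beacon_grace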
If I'm not mistaken, if you stop them at the same time during a reboot on a
node with both mds and mon, the mons might receive it, but wait to finish
their own election vote before doing anything about it. If you're trying
to keep optimal uptime for your mds, then stopping it first and on its own
Hi
When enabling ccache for Ceph, an error occurs:
-
ccache: invalid option -- 'E'
...
Unable to determine C++ standard library, got .
-
This is because the variable "CXX_STDLIB" was null
Hi,
You have problems with the MGR.
http://docs.ceph.com/docs/master/rados/operations/pg-states/
*The ceph-mgr hasn’t yet received any information about the PG’s state from
an OSD since mgr started up.*
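If that is the case, restarting or failing over the active mgr usually clears it (a sketch; replace mgr1 with the active mgr name shown by 'ceph mgr dump'):

    # on the node running the active mgr
    systemctl restart ceph-mgr@$(hostname -s)
    # or force a failover to a standby
    ceph mgr fail mgr1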
On Thu, 21 Feb 2019 at 09:04, Irek Fasikhov wrote:
> Hi,
>
> You have problems with the MGR.
> http://docs
Hi,
Please keep in mind that setting the 'nodown' flag will prevent PGs from
becoming degraded, but it will also prevent clients' requests from being served by
the other OSDs that, without the 'nodown' flag, would have taken over from the
non-responsive one in a healthy manner. And this for the whole time the OSD
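For completeness, setting and clearing the flag (only for short, controlled maintenance windows):

    ceph osd set nodown      # OSDs will no longer be marked down
    # ...maintenance window...
    ceph osd unset nodown    # back to normal failure handling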
Hi Sage,
Would be nice to have this one backported to Luminous if easy.
Cheers,
Frédéric.
> On 7 June 2018 at 13:33, Sage Weil wrote:
>
> On Wed, 6 Jun 2018, Caspar Smit wrote:
>> Hi all,
>>
>> We have a Luminous 12.2.2 cluster with 3 nodes and i recently added a node
>> to it.
>>
>> osd-
It's strange, but the parted output for this disk (/dev/sdf) shows me that it's
GPT:
(parted) print
Model: ATA HGST HUS726020AL (scsi)
Disk /dev/sdf: 2000GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Number  Start   End     Size    Type    File system  Flags
2 1049kB 107