[ceph-users] Ceph cluster upgrade

2016-07-06 Thread Kees Meijs
Hi list,

Given a single node Ceph cluster (lab), I started out with the following
CRUSH rule:
> # rules
> rule replicated_ruleset {
> ruleset 0
> type replicated
> min_size 1
> max_size 10
> step take default
> step choose firstn 0 type osd
> step emit
> }

Meanwhile, the cluster has grown (production) and additional hosts (and
OSDs, obviously) were added.

Ensuring redundancy between hosts, I would like to alter the rule as
follows:
> # rules
> rule replicated_ruleset {
> ruleset 0
> type replicated
> min_size 1
> max_size 10
> step take default
> step chooseleaf firstn 0 type host
> step emit
> }

Is this the way to go? I would like as little performance degradation
while rebalancing as possible.

Please advise if I need to take into account certain preparations.

Thanks in advance!

Best regards,
Kees
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph cluster upgrade

2016-07-06 Thread Kees Meijs
Hi Micha,

Thank you very much for your prompt response. In an earlier process, I
already ran:
> $ ceph tell osd.* injectargs '--osd-max-backfills 1'
> $ ceph tell osd.* injectargs '--osd-recovery-op-priority 1'
> $ ceph tell osd.* injectargs '--osd-client-op-priority 63'
> $ ceph tell osd.* injectargs '--osd-recovery-max-active 1'

And yes, creating a separate ruleset makes sense. But, does the proposed
ruleset itself make sense as well?

Regards,
Kees

On 06-07-16 15:36, Micha Krause wrote:
> Set these in your ceph.conf beforehand:
>
> osd recovery op priority = 1
> osd max backfills = 1
>
> I would allso suggest creating a new crush rule, instead of modifying
> your existing one.
>
> This enables you to change the rule on a per pool basis:
>
> ceph osd pool set <pool> crush_ruleset <rulenum>
>
> Then start with your smallest pool, and see how it goes.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph cluster upgrade

2016-07-06 Thread Kees Meijs
Thank you very much, I'll start testing the logic prior to implementation.

K.

On 06-07-16 19:20, Bob R wrote:
> See http://dachary.org/?p=3189 for some simple instructions on testing
> your crush rule logic.
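
For the archive, a minimal sketch of the kind of offline test the linked
article describes (file names are placeholders; pick the rule number and
replica count matching your own map):

  # dump and decompile the CRUSH map currently in use
  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt

  # after editing crushmap.txt, recompile and simulate placements for rule 0
  crushtool -c crushmap.txt -o crushmap.new
  crushtool --test -i crushmap.new --rule 0 --num-rep 3 --show-mappings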

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] (no subject)

2016-07-06 Thread Kees Meijs
Hi Gaurav,

Unfortunately I'm not completely sure about your setup, but I guess it
makes sense to configure Cinder and Glance to use RBD as a backend. It
seems to me you're trying to store VM images directly on an OSD filesystem.

Please refer to http://docs.ceph.com/docs/master/rbd/rbd-openstack/ for
details.
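
For what it's worth, a rough sketch of the preparation the linked guide
walks through; the pool names, PG counts and caps below are examples, not
a description of your actual setup:

  ceph osd pool create cinder-volumes 128
  ceph osd pool create glance-images 128
  ceph auth get-or-create client.cinder mon 'allow r' \
    osd 'allow class-read object_prefix rbd_children, allow rwx pool=cinder-volumes, allow rx pool=glance-images'
  ceph auth get-or-create client.glance mon 'allow r' \
    osd 'allow class-read object_prefix rbd_children, allow rwx pool=glance-images'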

Regards,
Kees

On 06-07-16 23:03, Gaurav Goyal wrote:
>
> I am installing ceph hammer and integrating it with openstack Liberty
> for the first time.
>
> My local disk has only 500 GB but i need to create 600 GB VM. SO i
> have created a soft link to ceph filesystem as
>
> lrwxrwxrwx 1 root root 34 Jul 6 13:02 instances ->
> /var/lib/ceph/osd/ceph-0/instances [root@OSKVM1 nova]# pwd
> /var/lib/nova [root@OSKVM1 nova]#
>
> now when i am trying to create an instance it is giving the following
> error as checked from nova-compute.log
> I need your help to fix this issue.
>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] (no subject)

2016-07-07 Thread Kees Meijs
Hi Gaurav,

The following snippets should suffice (for Cinder, at least):
> [DEFAULT]
> enabled_backends=rbd
>
> [rbd]
> volume_driver = cinder.volume.drivers.rbd.RBDDriver
> rbd_pool = cinder-volumes
> rbd_ceph_conf = /etc/ceph/ceph.conf
> rbd_flatten_volume_from_snapshot = false
> rbd_max_clone_depth = 5
> rbd_store_chunk_size = 4
> rados_connect_timeout = -1
> rbd_user = cinder
> rbd_secret = REDACTED
>
> backup_driver = cinder.backup.drivers.ceph
> backup_ceph_conf = /etc/ceph/ceph.conf
> backup_ceph_user = cinder-backup
> backup_ceph_chunk_size = 134217728
> backup_ceph_pool = backups
> backup_ceph_stripe_unit = 0
> backup_ceph_stripe_count = 0
> restore_discard_excess_bytes = true

Obviously you'd alter the directives according to your configuration
and/or wishes.

And no, creating RBD volumes by hand is not needed. Cinder will do this
for you.
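
Since Glance needs similar treatment, a sketch of what its configuration
could look like on Liberty follows; the pool and user names are
assumptions, so adjust them to your environment:

  [glance_store]
  stores = rbd
  default_store = rbd
  rbd_store_pool = glance-images
  rbd_store_user = glance
  rbd_store_ceph_conf = /etc/ceph/ceph.conf
  rbd_store_chunk_size = 8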

K.

On 08-07-16 04:14, Gaurav Goyal wrote:
> Yeah i didnt find additional section for [ceph] in my cinder.conf
> file. Should i create that manually? 
> As i didnt find [ceph] section so i modified same parameters in
> [DEFAULT] section.
> I will change that as per your suggestion.
>
> Moreoevr checking some other links i got to know that, i must
> configure following additional parameters
> should i do that and install tgtadm package?
> rootwrap_config = /etc/cinder/rootwrap.conf
> api_paste_confg = /etc/cinder/api-paste.ini
> iscsi_helper = tgtadm
> volume_name_template = volume-%s
> volume_group = cinder-volumes
> Do i need to execute following commands? 
> "pvcreate /dev/rbd1" &
> "vgcreate cinder-volumes /dev/rbd1" 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph cluster upgrade

2016-07-08 Thread Kees Meijs
Thank you everyone, I just tested and verified the ruleset and applied
it to some pools. Worked like a charm!

K.

On 06-07-16 19:20, Bob R wrote:
> See http://dachary.org/?p=3189 for some simple instructions on testing
> your crush rule logic.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] (no subject)

2016-07-08 Thread Kees Meijs
Hi Gaurav,

Have you distributed your Ceph authentication keys to your compute
nodes? And, do they have the correct permissions in terms of Ceph?

K.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] (no subject)

2016-07-08 Thread Kees Meijs
Hi,

I'd recommend generating a UUID and using it for all your compute nodes.
This way, you can keep your configuration in libvirt constant.
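
A sketch of the usual libvirt secret workflow; the UUID below is just a
placeholder you would generate once and reuse, and client.cinder is an
assumption:

  uuidgen   # run once, reuse the value on every compute node

  # secret.xml, referencing the generated UUID:
  <secret ephemeral='no' private='no'>
    <uuid>457eb676-33da-42ec-9a8c-9293d545c337</uuid>
    <usage type='ceph'>
      <name>client.cinder secret</name>
    </usage>
  </secret>

  virsh secret-define --file secret.xml
  virsh secret-set-value --secret 457eb676-33da-42ec-9a8c-9293d545c337 \
    --base64 $(ceph auth get-key client.cinder)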

Regards,
Kees

On 08-07-16 16:15, Gaurav Goyal wrote:
>
> For below section, should i generate separate UUID for both compte hosts? 
>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] (no subject)

2016-07-11 Thread Kees Meijs
Hi,

I think there's still something misconfigured:
> Invalid: 400 Bad Request: Unknown scheme 'file' found in URI (HTTP 400)

It seems the RBD backend is not used as expected.

Have you configured both Cinder _and_ Glance to use Ceph?

Regards,
Kees

On 08-07-16 17:33, Gaurav Goyal wrote:
>
> I regenerated the UUID as per your suggestion. 
> Now i have same UUID in host1 and host2.
> I could create volumes and attach them to existing VMs.
>
> I could create new glance images. 
>
> But still finding the same error while instance launch via GUI.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] (no subject)

2016-07-11 Thread Kees Meijs
Glad to hear it works now! Good luck with your setup.

Regards,
Kees

On 11-07-16 17:29, Gaurav Goyal wrote:
> Hello it worked for me after removing the following parameter from
> /etc/nova/nova.conf file

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fwd: Re: (no subject)

2016-07-12 Thread Kees Meijs
Sorry, should have posted this to the list.

 Forwarded Message 
Subject:Re: [ceph-users] (no subject)
Date:   Tue, 12 Jul 2016 08:30:49 +0200
From:   Kees Meijs 
To: Gaurav Goyal 



Hi Gaurav,

It might seem a little far-fetched, but I'd use the qemu-img(1) tool to
convert the qcow2 image file to a Ceph-backed volume.

First of all, create a volume of appropriate size in Cinder. The volume
will be sparse. Then, figure out the identifier and use rados(8) to find
the exact name of the volume in Ceph.

Finally, use qemu-img(1) and point to the volume you just found out about.
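
A minimal sketch of those steps, assuming qemu-img was built with RBD
support; the pool name matches this thread, the volume ID is a
placeholder, and rbd ls is used here instead of rados for brevity:

  # find the exact RBD image name behind the Cinder volume
  rbd -p cinder-volumes ls | grep <volume-id>

  # write the qcow2 image straight into it, converting to raw on the fly
  qemu-img convert -f qcow2 -O raw image.qcow2 \
    rbd:cinder-volumes/volume-<volume-id>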

Cheers,
Kees

On 11-07-16 18:07, Gaurav Goyal wrote:
> Thanks!
>
> I need to create a VM having qcow2 image file as 6.7 GB but raw image
> as 600GB which is too big.
> Is there a way that i need not to convert qcow2 file to raw and it
> works well with rbd?



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: Re: (no subject)

2016-07-13 Thread Kees Meijs
Hi Fran,

Fortunately, qemu-img(1) is able to directly utilise RBD (supporting
sparse block devices)!

Please refer to http://docs.ceph.com/docs/hammer/rbd/qemu-rbd/ for examples.

Cheers,
Kees

On 13-07-16 09:18, Fran Barrera wrote:
> Can you explain how you do this procedure? I have the same problem
> with the large images and snapshots.
>
> This is what I do:
>
> # qemu-img convert -f qcow2 -O raw image.qcow2 image.img
> # openstack image create image.img
>
> But the image.img is too large.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: Re: (no subject)

2016-07-13 Thread Kees Meijs
Hi,

If qemu-img is able to handle RBD in a clever way (and I assume it is),
it can sparsely write the image to the Ceph pool.

But, it is an assumption! Maybe someone else could shed some light on this?

Or even better: read the source, the RBD handler specifically.

And last but not least, create an empty test image in qcow2 sparse
format of e.g. 10G and store it on Ceph. In other words: just test it
and you'll know for sure.

Cheers,
Kees

On 13-07-16 09:31, Fran Barrera wrote:
> Yes, but is the same problem isn't? The image will be too large
> because the format is raw.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Journal

2016-07-13 Thread Kees Meijs
Hi,

This is an OSD box running Hammer on Ubuntu 14.04 LTS with additional
systems administration tools:
> $ df -h | grep -v /var/lib/ceph/osd
> Filesystem  Size  Used Avail Use% Mounted on
> udev5,9G  4,0K  5,9G   1% /dev
> tmpfs   1,2G  892K  1,2G   1% /run
> /dev/dm-1   203G  2,1G  200G   2% /
> none4,0K 0  4,0K   0% /sys/fs/cgroup
> none5,0M 0  5,0M   0% /run/lock
> none5,9G 0  5,9G   0% /run/shm
> none100M 0  100M   0% /run/user
> /dev/dm-1   203G  2,1G  200G   2% /home

As you can see, less than 10G is actually used.

Regards,
Kees

On 13-07-16 11:51, Ashley Merrick wrote:
> May sound a random question, but what size would you recommend for the 
> SATA-DOM, obviously I know standard OS space requirements, but will CEPH 
> required much on the root OS of a OSD only node apart from standard logs.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Physical maintenance

2016-07-13 Thread Kees Meijs
Hi Cephers,

There's some physical maintenance I need to perform on an OSD node.
Very likely the maintenance is going to take a while since it involves
replacing components, so I would like to be well prepared.

Unfortunately it is not an option to add another OSD node or to rebalance
at this time, so I'm planning to operate in a degraded state during the
maintenance.

If at all possible, I would like to shut down the OSD node cleanly and
prevent slow (or even blocking) requests on Ceph clients.

Just setting the noout flag and shutting down the OSDs on the given node
does not seem to be enough. In fact, clients do not behave that well in
this case: connections time out and I/O seems to stall for a while.

Any thoughts on this, anyone? For example, is it a sensible idea and are
writes still possible? Let's assume some OSDs on the to-be-maintained
host are certainly primary for some placement groups.

Thanks in advance!

Cheers,
Kees


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Physical maintenance

2016-07-13 Thread Kees Meijs
Thanks!

So to sum up, I'd best:

  * set the noout flag
  * stop the OSDs one by one
  * shut down the physical node
  * yank the OSD drives to prevent ceph-disk(8) from automatically
activating them at boot time
  * do my maintenance
  * start the physical node
  * reseat and activate the OSD drives one by one
  * unset the noout flag
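
In command form, the plan above might look something like this (a sketch;
the OSD ID is an example and the upstart syntax assumes Ubuntu 14.04):

  ceph osd set noout
  stop ceph-osd id=11        # repeat for each OSD on the node
  # ... power down, perform the maintenance, power up ...
  start ceph-osd id=11       # one by one, as suggested below
  ceph osd unset noout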

On 13-07-16 14:39, Jan Schermer wrote:
> If you stop the OSDs cleanly then that should cause no disruption to clients.
> Starting the OSD back up is another story, expect slow request for a while 
> there and unless you have lots of very fast CPUs on the OSD node, start them 
> one-by-one and not all at once.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Physical maintenance

2016-07-17 Thread Kees Meijs
Hi,

Thanks guys, this worked like a charm. Activating the OSDs wasn't
necessary: it seemed udev(7) helped me with that.

Cheers,
Kees

On 13-07-16 14:47, Kees Meijs wrote:
> So to sum up, I'd best:
>
>   * set the noout flag
>   * stop the OSDs one by one
>   * shut down the physical node
>   * yank the OSD drives to prevent ceph-disk(8) from automatically
> activating them at boot time
>   * do my maintenance
>   * start the physical node
>   * reseat and activate the OSD drives one by one
>   * unset the noout flag
>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] What HBA to choose? To expand or not to expand?

2017-09-19 Thread Kees Meijs
Hi list,

It's probably something to discuss over coffee in Ede tomorrow but I'll
ask anyway: what HBA is best suited for Ceph nowadays?

In an earlier thread I read some comments about some "dumb" HBAs running
in IT mode but still being able to use cache on the HBA. Does it make
sense? Or, is this dangerously similar to RAID solutions* without a BBU?

(On a side note, we're planning on not using SAS expanders anymore but
to "wire" each disk individually, e.g. using one SFF-8087 connector per
four disks, minimising the risk of bus congestion and/or lock-ups.)

Anyway, in short I'm curious about opinions on brand, type and
configuration of HBA to choose.

Cheers,
Kees

*: apologies for cursing.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What HBA to choose? To expand or not to expand?

2017-09-19 Thread Kees Meijs
Hi Jake,

On 19-09-17 15:14, Jake Young wrote:
> Ideally you actually want fewer disks per server and more servers.
> This has been covered extensively in this mailing list. Rule of thumb
> is that each server should have 10% or less of the capacity of your
> cluster.

That's very true, but let's focus on the HBA.

> I didn't do extensive research to decide on this HBA, it's simply what
> my server vendor offered. There are probably better, faster, cheaper
> HBAs out there. A lot of people complain about LSI HBAs, but I am
> comfortable with them.

Given the configuration our vendor offered, it's about an LSI/Avago
9300-8i with 8 drives connected individually using SFF-8087 on a
backplane (i.e. not an expander). Or, 24 drives using three HBAs (6x
SFF-8087 in total) when using a 4U SuperMicro chassis with 24 drive bays.

But, what are the LSI complaints about? Or, are the complaints generic
to HBAs and/or cryptic CLI tools and not LSI specific?

> There is a management tool called storcli that can fully configure the
> HBA in one or two command lines.  There's a command that configures
> all attached disks as individual RAID0 disk groups. That command gets
> run by salt when I provision a new osd server.

The thread I read was about Areca in JBOD but still able to utilise the
cache, if I'm not mistaken. I'm not sure anymore if there was something
mentioned about BBU.

>
> What many other people are doing is using the least expensive JBOD HBA
> or the on board SAS controller in JBOD mode and then using SSD
> journals. Save the money you would have spent on the fancy HBA for
> fast, high endurance SSDs.

Thanks! And obviously I'm very interested in other comments as well.

Regards,
Kees

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Deep scrubbing causes severe I/O stalling

2016-10-28 Thread Kees Meijs
Hi Cephers,

Using Ceph 0.94.9-1trusty we noticed severe I/O stalling during deep
scrubbing (with vanilla scrubbing parameters). I'm aware
this has been discussed before, but I'd like to share the parameters
we're going to evaluate:

  * osd_scrub_begin_hour 1
  * osd_scrub_end_hour 7
  * osd_scrub_min_interval 259200
  * osd_scrub_max_interval 1814400
  * osd_scrub_chunk_max 5
  * osd_scrub_sleep .1
  * osd_deep_scrub_interval 1814400
  * osd_deep_scrub_stride 1048576
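
In ceph.conf form (or injected at runtime for a quick test) that would
look roughly as follows; consider it a sketch of the same values rather
than a recommendation:

  [osd]
  osd scrub begin hour = 1
  osd scrub end hour = 7
  osd scrub sleep = 0.1
  osd deep scrub interval = 1814400

  # runtime injection, no restart needed
  ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'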

Anyway, thoughts on the matter or specific parameter advice is more than
welcome.

Cheers,
Kees

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Deep scrubbing causes severe I/O stalling

2016-10-28 Thread Kees Meijs
Hi,

On 28-10-16 12:06, w...@42on.com wrote:
> I don't like this personally. Your cluster should be capable of doing
> a deep scrub at any moment. If not it will also not be able to handle
> a node failure during peak times.

Valid point and I totally agree. Unfortunately, the current load doesn't
give me much of a choice I'm afraid. Tweaking and extending the cluster
hardware (e.g. more and faster spinners) makes more sense but we're not
there yet.

Maybe the new parameters help us towards the "always capable" goal.
Let's hope for the best and see what'll happen. ;-) If it works out, I
could (and will) remove the time constraints.

>   * osd_scrub_sleep .1
>
> You can try to bump that even more.

Thank you for pointing that out. I'm unsure about the osd_scrub_sleep
parameter behaviour (documentation is scarce). Could you please shed a
little light on this?

Cheers,
Kees

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Deep scrubbing causes severe I/O stalling

2016-10-28 Thread Kees Meijs
Hi,

Interesting... We're now running deadline. In other posts I read about
noop for SSDs instead of CFQ.

Since we're using spinners with SSD journals, does it make sense to mix
the schedulers? E.g. CFQ for spinners _and_ noop for SSDs?

K.

On 28-10-16 14:43, Wido den Hollander wrote:
> Make sure you use the CFQ disk scheduler for your disks though.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Deep scrubbing causes severe I/O stalling

2016-11-08 Thread Kees Meijs
Hi,

As promised, our findings so far:

  * For the time being, the new scrubbing parameters work well.
  * Using CFQ for spinners and NOOP for SSDs seems to spread load over
the storage cluster a little better than deadline does. However,
overall latency seems (just a feeling, no numbers there) a little
higher.
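
For reference, a sketch of how such a per-device mix can be persisted
with a udev rule; the rule file name is arbitrary and the matching relies
on the kernel's rotational flag:

  # /etc/udev/rules.d/60-io-scheduler.rules
  ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="cfq"
  ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="noop"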

Cheers,
Kees

On 28-10-16 15:37, Kees Meijs wrote:
>
> Interesting... We're now running deadline. In other posts I read about
> noop for SSDs instead of CFQ.
>
> Since we're using spinners with SSD journals, does it make sense to mix
> the schedulers? E.g. CFQ for spinners _and_ noop for SSDs?
>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Stalling IO with cache tier

2016-11-24 Thread Kees Meijs
Hi list,

Our current Ceph production cluster seems to struggle with performance
issues, so we decided to add a fully flash-based cache tier (the cluster
currently runs with spinners and journals on separate SSDs).

We ordered SSDs (Intel), disk trays and read
http://docs.ceph.com/docs/hammer/rados/operations/cache-tiering/
carefully. Afterwards a new pool was created in a separate root,
assigned a ruleset matching flash-only OSDs.

Since adding and removing the cache tier could be done transparently, we
decided to get going in order to save time and improve performance as
soon as possible:

> # ceph osd tier add cinder-volumes cache
> pool 'cache' is now (or already was) a tier of 'cinder-volumes'
> # ceph osd tier cache-mode cache writeback
> set cache-mode for pool 'cache' to writeback
> # ceph osd tier set-overlay cinder-volumes cache
> overlay for 'cinder-volumes' is now (or already was) 'cache'
> # ceph osd pool set cache hit_set_type bloom
> set pool 6 hit_set_type to bloom
> # ceph osd pool set cache hit_set_count 1
> set pool 6 hit_set_count to 1
> # ceph osd pool set cache hit_set_period 3600
> set pool 6 hit_set_period to 3600
> # ceph osd pool set cache target_max_bytes 257698037760
> set pool 6 target_max_bytes to 257698037760
> # ceph osd pool set cache cache_target_full_ratio 0.8
> set pool 6 cache_target_full_ratio to 0.8

Yes, full flash cache here we go! Or, is it?

After a few minutes, all hell broke loose and it seemed all IO on our
cluster was stalling and no objects were to be found in the new cache
pool called cache.

Luckily we were able to remove the cache tier in a few moments again,
restoring storage services.

The storage cluster backs both Cinder and Glance services with OpenStack.

Could someone please give some pointers in how to debug this? Log files
seem a little "voidy" on the matter, I'm afraid.

Thanks in advance! It would be great if we could implement the cache
tier again in the near future, improving performance.

Cheers,
Kees

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stalling IO with cache tier

2016-11-24 Thread Kees Meijs
Hi,

In addition, some log was generated by KVM processes:

> qemu: terminating on signal 15 from pid 2827
> osdc/ObjectCacher.cc: In function 'ObjectCacher::~ObjectCacher()'
> thread 7f265a77da80 time 2016-11-23 17:26:24.237542
> osdc/ObjectCacher.cc: 551: FAILED assert(i->empty())
>  ceph version 0.94.8 (838cd35201e4fe1339e16d987cc33e873524af90)
>  1: (()+0x15b8ab) [0x7f2649afc8ab]
>  2: (()+0x38cfdd) [0x7f2649d2dfdd]
>  3: (()+0x57406) [0x7f26499f8406]
>  4: (()+0x7e3cd) [0x7f2649a1f3cd]
>  5: (rbd_close()+0x9) [0x7f26499dd529]
>  6: (()+0x2c12) [0x7f264bf70c12]
>  7: (bdrv_close()+0x80) [0x565063a61b90]
>  8: (bdrv_unref()+0x97) [0x565063a61e27]
>  9: (bdrv_close()+0x155) [0x565063a61c65]
>  10: (bdrv_close_all()+0x3c) [0x565063a61d5c]
>  11: (main()+0x418f) [0x5650637bde2f]
>  12: (__libc_start_main()+0xf5) [0x7f2655297f45]
>  13: (()+0xfbaa1) [0x5650637c1aa1]
>  NOTE: a copy of the executable, or `objdump -rdS ` is
> needed to interpret this.
> terminate called after throwing an instance of 'ceph::FailedAssertion'

Hope it helps.

Cheers,
Kees

On 24-11-16 13:06, Kees Meijs wrote:
> Could someone please give some pointers in how to debug this? Log files
> seem a little "voidy" on the matter, I'm afraid.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stalling IO with cache tier

2016-11-24 Thread Kees Meijs
Hi Burkhard,

A testing pool makes absolute sense, thank you.

About the complete setup, the documentation states:

> The cache tiering agent can flush or evict objects based upon the
> total number of bytes *or* the total number of objects. To specify a
> maximum number of bytes, execute the following:
>
And:

> If you specify both limits, the cache tiering agent will begin
> flushing or evicting when either threshold is triggered.
>
I *did* configure target_max_bytes so I presume (yes, that is an
assumption) we should be good.

Tests will confirm or deny.

Regards,
Kees

On 24-11-16 15:05, Burkhard Linke wrote:
> Just my 2ct:
>
> A cache tier needs a complete setup, e.g. the target_max_objects
> setting is missing. Try to set all cache related settings to a sane
> value.
>
> You might also want to create a simple backend pool first and test the
> cache tier with that pool.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stalling IO with cache tier

2016-11-24 Thread Kees Meijs
Hi Nick,

All Ceph pools have very restrictive permissions for each OpenStack
service, indeed. Besides creating the cache pool and enabling it, no
additional parameters or configuration was done.

Do I understand correctly that access parameters (e.g. cephx keys) are
needed for a cache tier? If yes, it would make sense to add this to the
documentation.

Cheers,
Kees

On 24-11-16 15:12, Nick Fisk wrote:
> I think I remember seeing other people with this problem before, isn't there 
> something you have to do in Openstack to make sure it
> has the correct keys to access the new cache pool? Or something like that.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stalling IO with cache tier

2016-11-24 Thread Kees Meijs
Hi Nick,

Oh... In retrospect it makes sense in a way, but it does not as well. ;-)

To clarify: it makes sense since the cache is "just a pool" but it does
not since "it is an overlay and just a cache in between".

Anyway, something that should be well documented and warned for, if you
ask me.

Cheers,
Kees

On 24-11-16 15:29, Nick Fisk wrote:
> Yes, if your keys in use in Openstack only grant permission to the base pool, 
> then it will not be able to access the cache pool when
> enabled.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stalling IO with cache tier

2016-11-24 Thread Kees Meijs
Hi,

Just checked permissions:

> # ceph auth get client.cinder
> exported keyring for client.cinder
> [client.cinder]
> key = REDACTED
> caps mon = "allow r"
> caps osd = "allow class-read object_prefix rbd_children, allow rwx
> pool=cinder-volumes, allow rwx pool=cinder-vms, allow rx
> pool=glance-images"

I presume I should add *allow rwx pool=cache* in our case?
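
If so, a sketch of what the updated caps could look like; note that ceph
auth caps replaces the full cap list, so the existing pools are repeated:

  ceph auth caps client.cinder mon 'allow r' \
    osd 'allow class-read object_prefix rbd_children, allow rwx pool=cinder-volumes, allow rwx pool=cinder-vms, allow rx pool=glance-images, allow rwx pool=cache'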

Thanks again,
Kees

On 24-11-16 15:55, Kees Meijs wrote:
> Oh... In retrospect it makes sense in a way, but it does not as well. ;-)
>
> To clarify: it makes sense since the cache is "just a pool" but it does
> not since "it is an overlay and just a cache in between".
>
> Anyway, something that should be well documented and warned for, if you
> ask me.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CoW clone performance

2016-11-25 Thread Kees Meijs
Hi list,

We're using CoW clones (using OpenStack via Glance and Cinder) to store
virtual machine images.

For example:

> # rbd info cinder-volumes/volume-a09bd74b-f100-4043-a422-5e6be20d26b2
> rbd image 'volume-a09bd74b-f100-4043-a422-5e6be20d26b2':
> size 25600 MB in 3200 objects
> order 23 (8192 kB objects)
> block_name_prefix: rbd_data.c569832b851bc
> format: 2
> features: layering, striping
> flags:
> parent: glance-images/37a54104-fe3c-4e2a-a94b-da0f3776e1ac@snap
> overlap: 4096 MB
> stripe unit: 8192 kB
> stripe count: 1

It seems our storage cluster writes a lot, even when the virtualization
cluster isn't loaded at all, and in general there seem to be more writes
than reads, which is quite odd and unexpected.

In addition, performance is not as good as we would like.

Can someone please share their thoughts on this matter, for example on
flattening (or not flattening) the volumes?

Thanks in advance!

Cheers,
Kees

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 2x replication: A BIG warning

2016-12-07 Thread Kees Meijs
Hi Wido,

Valid point. At this moment, we're using a cache pool with size = 2 and
would like to "upgrade" to size = 3.

Again, you're absolutely right... ;-)

Anyway, any things to consider or could we just:

 1. Run "ceph osd pool set cache size 3".
 2. Wait for rebalancing to complete.
 3. Run "ceph osd pool set cache min_size 2".

Thanks!

Regards,
Kees

On 07-12-16 09:08, Wido den Hollander wrote:
> As a Ceph consultant I get numerous calls throughout the year to help people 
> with getting their broken Ceph clusters back online.
>
> The causes of downtime vary vastly, but one of the biggest causes is that 
> people use replication 2x. size = 2, min_size = 1.
>
> In 2016 the amount of cases I have where data was lost due to these settings 
> grew exponentially.
>
> Usually a disk failed, recovery kicks in and while recovery is happening a 
> second disk fails. Causing PGs to become incomplete.
>
> There have been to many times where I had to use xfs_repair on broken disks 
> and use ceph-objectstore-tool to export/import PGs.
>
> I really don't like these cases, mainly because they can be prevented easily 
> by using size = 3 and min_size = 2 for all pools.
>
> With size = 2 you go into the danger zone as soon as a single disk/daemon 
> fails. With size = 3 you always have two additional copies left thus keeping 
> your data safe(r).
>
> If you are running CephFS, at least consider running the 'metadata' pool with 
> size = 3 to keep the MDS happy.
>
> Please, let this be a big warning to everybody who is running with size = 2. 
> The downtime and problems caused by missing objects/replicas are usually big 
> and it takes days to recover from those. But very often data is lost and/or 
> corrupted which causes even more problems.
>
> I can't stress this enough. Running with size = 2 in production is a SERIOUS 
> hazard and should not be done imho.
>
> To anyone out there running with size = 2, please reconsider this!

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 2x replication: A BIG warning

2016-12-09 Thread Kees Meijs
Hi Wido,

Since it's a Friday night, I decided to just go for it. ;-)

It took a while to rebalance the cache tier but all went well. Thanks
again for your valuable advice!

Best regards, enjoy your weekend,
Kees

On 07-12-16 14:58, Wido den Hollander wrote:
>> Anyway, any things to consider or could we just:
>>
>>  1. Run "ceph osd pool set cache size 3".
>>  2. Wait for rebalancing to complete.
>>  3. Run "ceph osd pool set cache min_size 2".
>>
> Indeed! It is a simple as that.
>
> Your cache pool can also contain very valuable data you do not want to loose.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Upgrading from Hammer

2016-12-13 Thread Kees Meijs
Hi guys,

In the past few months, I've read some posts about upgrading from
Hammer. Maybe I've missed something, but I didn't really read anything
about QEMU/KVM behaviour in this context.

At the moment, we're using:

> $ qemu-system-x86_64 --version
> QEMU emulator version 2.3.0 (Debian 1:2.3+dfsg-5ubuntu9.4~cloud2),
> Copyright (c) 2003-2008 Fabrice Bellard
The Ubuntu package (originating from Canonical's cloud archive) is
utilising:

  * librados2 - 0.94.8-0ubuntu0.15.10.1~cloud0
  * librbd1 - 0.94.8-0ubuntu0.15.10.1~cloud0

I'm very curious if there's someone out there using a similar version
with a Ceph cluster on Jewel. Anything to take into account?

Thanks in advance!

Best regards,
Kees

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrading from Hammer

2016-12-20 Thread Kees Meijs
Hi Wido,

At the moment, we're running Ubuntu 14.04 LTS using the Ubuntu Cloud
Archive. To be precise again, it's QEMU/KVM 2.3+dfsg-5ubuntu9.4~cloud2
linked to Ceph 0.94.8-0ubuntu0.15.10.1~cloud0.

So yes, it's all about running a newer QEMU/KVM on a not so new version
of Ubuntu.

The question is: are we able to run against a Ceph cluster running Jewel
instead of Hammer, or do we need to upgrade our OpenStack installation
first?

Regards,
Kees

On 13-12-16 09:26, Wido den Hollander wrote:
> Why? The Ubuntu Cloud Archive is there to provide you a newer Qemu on a older 
> Ubuntu system.
>
> If you run Qemu under Ubuntu 16.04 and use the DEB packages directly from 
> Ceph you should be fine.
>
> Recent Qemu and recent Ceph :)

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrading from Hammer

2016-12-20 Thread Kees Meijs
Hi Wido,

Thanks again! Good to hear, it saves us a lot of upgrade trouble in advance.

If I'm not mistaken, we haven't done anything with CRUSH tunables. Any
pointers on how to make sure we really didn't?
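
For what it's worth, the tunables currently in effect can be inspected
with something like the following; the profile field should show whether
anything deviates from the defaults:

  ceph osd crush show-tunables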

Regards,
Kees

On 20-12-16 10:14, Wido den Hollander wrote:
> No, you don't. A Hammer/Jewel client can talk to a Hammer/Jewel cluster. One 
> thing, don't change any CRUSH tunables if the cluster runs Jewel and the 
> client is still on Hammer.
>
> The librados/librbd version is what matters. If you upgrade the cluster to 
> Jewel and leave the client on Hammer it works.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unbalanced OSD's

2016-12-30 Thread Kees Meijs
Hi Ashley,

We experience (using Hammer) a similar issue. Not that I have a perfect
solution to share, but I felt like mentioning a "me too". ;-)

On a side note: we configured correct weight per drive as well.

Regards,
Kees

On 29-12-16 11:54, Ashley Merrick wrote:
>
> Hello,
>
>  
>
> I currently have 5 servers within my CEPH Cluster
>
>  
>
> 2 x (10 * 8TB Disks)
>
> 3 x (10 * 4TB Disks)
>
>  
>
> Currently seeing a larger difference in OSD use across the two
> separate server types, as well as within the server itself.
>
>  
>
> For example on one 4TB server I have an OSD at 64% and one at 84%,
> where on the 8TB servers the OSD range from 49% to 64%, where the
> highest used OSD’s are on the 4TB.
>
>  
>
> Each drive has a weight set correctly for the drive size and each
> server has the correct weight set, below is my crush map. Apart from
> running the command to adjust the re-weight is there anything I am
> doing wrong or should change for better spread of data, not looking
> for near perfect but where the 8TB drives are sitting at 64% max and
> 4TB are sitting at 80%’s causes a big imbalance.
>
>  
>
> # begin crush map
>
> tunable choose_local_tries 0
>
> tunable choose_local_fallback_tries 0
>
> tunable choose_total_tries 50
>
> tunable chooseleaf_descend_once 1
>
> tunable chooseleaf_vary_r 1
>
> tunable straw_calc_version 1
>
> tunable allowed_bucket_algs 54
>
>  
>
> # buckets
>
> host sn1 {
>
> id -2   # do not change unnecessarily
>
> # weight 72.800
>
> alg straw2
>
> hash 0  # rjenkins1
>
> item osd.0 weight 7.280
>
> item osd.1 weight 7.280
>
> item osd.3 weight 7.280
>
> item osd.4 weight 7.280
>
> item osd.2 weight 7.280
>
> item osd.5 weight 7.280
>
> item osd.6 weight 7.280
>
> item osd.7 weight 7.280
>
> item osd.8 weight 7.280
>
> item osd.9 weight 7.280
>
> }
>
> host sn3 {
>
> id -6   # do not change unnecessarily
>
> # weight 72.800
>
> alg straw2
>
> hash 0  # rjenkins1
>
> item osd.10 weight 7.280
>
> item osd.11 weight 7.280
>
> item osd.12 weight 7.280
>
> item osd.13 weight 7.280
>
> item osd.14 weight 7.280
>
> item osd.15 weight 7.280
>
> item osd.16 weight 7.280
>
> item osd.17 weight 7.280
>
> item osd.18 weight 7.280
>
> item osd.19 weight 7.280
>
> }
>
> host sn4 {
>
> id -7   # do not change unnecessarily
>
> # weight 36.060
>
> alg straw2
>
> hash 0  # rjenkins1
>
> item osd.20 weight 3.640
>
> item osd.21 weight 3.640
>
> item osd.22 weight 3.640
>
> item osd.23 weight 3.640
>
> item osd.24 weight 3.640
>
> item osd.25 weight 3.640
>
> item osd.26 weight 3.640
>
> item osd.27 weight 3.640
>
> item osd.28 weight 3.640
>
> item osd.29 weight 3.300
>
> }
>
> host sn5 {
>
> id -8   # do not change unnecessarily
>
> # weight 36.060
>
> alg straw2
>
> hash 0  # rjenkins1
>
> item osd.30 weight 3.640
>
> item osd.31 weight 3.640
>
> item osd.32 weight 3.640
>
> item osd.33 weight 3.640
>
> item osd.34 weight 3.640
>
> item osd.35 weight 3.640
>
> item osd.36 weight 3.640
>
> item osd.37 weight 3.640
>
> item osd.38 weight 3.640
>
> item osd.39 weight 3.640
>
> }
>
> host sn6 {
>
> id -9   # do not change unnecessarily
>
> # weight 36.060
>
> alg straw2
>
> hash 0  # rjenkins1
>
> item osd.40 weight 3.640
>
> item osd.41 weight 3.640
>
> item osd.42 weight 3.640
>
> item osd.43 weight 3.640
>
> item osd.44 weight 3.640
>
> item osd.45 weight 3.640
>
> item osd.46 weight 3.640
>
> item osd.47 weight 3.640
>
> item osd.48 weight 3.640
>
> item osd.49 weight 3.640
>
> }
>
> root default {
>
> id -1   # do not change unnecessarily
>
> # weight 253.780
>
> alg straw2
>
> hash 0  # rjenkins1
>
> item sn1 weight 72.800
>
> item sn3 weight 72.800
>
> item sn4 weight 36.060
>
> item sn5 weight 36.060
>
> item sn6 weight 36.060
>
> }
>
>  
>
> # rules
>
> rule replicated_ruleset {
>
> ruleset 0
>
> type replicated
>
> min_size 1
>
> max_size 10
>
> step take default
>
> step chooseleaf firstn 0 type host
>
> step emit
>
> }
>
>  
>
> Thanks,
>
> Ashley
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Unbalanced OSD's

2016-12-30 Thread Kees Meijs
Thanks, I'll try a manual reweight at first.
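
A sketch of what such a manual reweight could look like; the OSD ID and
factor are just examples:

  ceph osd reweight 29 0.9   # lower the override weight of an over-full OSD
  ceph osd df                # keep an eye on how utilisation evens out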

Have a happy new year's eve (yes, I know it's a day early)!

Regards,
Kees

On 30-12-16 11:17, Wido den Hollander wrote:
> For this reason you can do a OSD reweight by running the 'ceph osd 
> reweight-by-utilization' command or do it manually with 'ceph osd reweight X 
> 0-1'

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Upgrade to Infernalis: failed to pick suitable auth object

2018-08-17 Thread Kees Meijs
Hi Cephers,

For the last months (well... years actually) we have been quite happy
using Hammer. So far, there was no immediate reason to upgrade.

However, having seen Luminous providing support for BlueStore, it seemed
like a good idea to perform some upgrade steps.

Taking baby steps, I wanted to upgrade from Hammer to Infernalis first,
since all file ownerships need to be changed because the daemons now run
as an unprivileged user (good stuff!) instead of root.

So far, I've upgraded all monitors from Hammer (0.94.10) to Infernalis
(9.2.1). All seemed well resulting in HEALTH_OK.

Then, I tried upgrading one OSD server using the following procedure:

 1. Alter APT sources to utilise Infernalis instead of Hammer.
 2. Update and upgrade the packages.
 3. Since I didn't want any rebalancing going on, I ran "ceph osd set
noout" as well.
 4. Stop an OSD, then chown ceph:ceph -R /var/lib/ceph/osd/ceph-X, start
the OSD and so on.

Maybe I acted too quickly (ehrm... didn't wait long enough) but at some
point it seemed not all ownership was changed during the process.
Meanwhile we were still HEALTH_OK so I didn't really worry and fixed
left-overs using find /var/lib/ceph -not -user ceph -exec chown
ceph:ceph '{}' ';'

It seemed to work well and two days passed without any issues.

But then... Deep scrubbing happened:

>  health HEALTH_ERR
>     1 pgs inconsistent
>     2 scrub errors

So far, I figured out the two scrubbing errors apply to the same OSD,
being osd.0.

The log at the OSD shows:

> 2018-08-17 15:25:36.810866 7fa3c9e09700  0 log_channel(cluster) log
> [INF] : 3.72 deep-scrub starts
> 2018-08-17 15:25:37.221562 7fa3c7604700 -1 log_channel(cluster) log
> [ERR] : 3.72 soid -5/0072/temp_3.72_0_16187756_3476/head: failed
> to pick suitable auth object
> 2018-08-17 15:25:37.221566 7fa3c7604700 -1 log_channel(cluster) log
> [ERR] : 3.72 soid -5/0072/temp_3.72_0_16195026_251/head: failed to
> pick suitable auth object
> 2018-08-17 15:46:36.257994 7fa3c7604700 -1 log_channel(cluster) log
> [ERR] : 3.72 deep-scrub 2 errors

The situation seems similar to http://tracker.ceph.com/issues/13862 but
so far I'm unable to repair the placement group.

Meanwhile I'm forcing deep scrubbing for all placement groups applicable
to osd.0, hopefully resulting in just PG 3.72 having errors.
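
For the record, a sketch of how that can be scripted; ceph pg ls-by-osd
lists the PGs and the awk filter skips the header line:

  ceph pg ls-by-osd 0 | awk '/^[0-9]/ {print $1}' | \
    while read pg; do ceph pg deep-scrub "$pg"; done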

While waiting for deep scrubbing to finish, it seemed like a good idea to ask you
guys for help.

What's the best approach at this point?

> ceph health detail
> HEALTH_ERR 1 pgs inconsistent; 2 scrub errors
> pg 3.72 is active+clean+inconsistent, acting [0,33,39]
> 2 scrub errors

OSDs 33 and 39 are untouched (still running 0.94.10) and seem fine
without errors.

Thanks in advance for any comments or thoughts.

Regards and enjoy your weekend!
Kees

-- 
https://nefos.nl/contact

Nefos IT bv
Ambachtsweg 25 (industrienummer 4217)
5627 BZ Eindhoven
Nederland

KvK 66494931

/Aanwezig op maandag, dinsdag, woensdag en vrijdag/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrade to Infernalis: failed to pick suitable auth object

2018-08-18 Thread Kees Meijs

Hi David,

Thank you for pointing out the option.

On http://docs.ceph.com/docs/infernalis/release-notes/ one can read:

  * Ceph daemons now run as user and group ceph by default. The ceph
    user has a static UID assigned by Fedora and Debian (also used by
    derivative distributions like RHEL/CentOS and Ubuntu). On SUSE the
    ceph user will currently get a dynamically assigned UID when the
    user is created.

    If your systems already have a ceph user, upgrading the package will
    cause problems. We suggest you first remove or rename the existing
    ‘ceph’ user before upgrading.

    When upgrading, administrators have two options:

    1. Add the following line to ceph.conf on all hosts:

         setuser match path = /var/lib/ceph/$type/$cluster-$id

       This will make the Ceph daemons run as root (i.e., not drop
       privileges and switch to user ceph) if the daemon's data
       directory is still owned by root. Newly deployed daemons will be
       created with data owned by user ceph and will run with reduced
       privileges, but upgraded daemons will continue to run as root.

    2. Fix the data ownership during the upgrade. This is the preferred
       option, but is more work. The process for each host would be to:

       1. Upgrade the ceph package. This creates the ceph user and
          group. For example:

            ceph-deploy install --stable infernalis HOST

       2. Stop the daemon(s):

            service ceph stop   # fedora, centos, rhel, debian
            stop ceph-all       # ubuntu

       3. Fix the ownership:

            chown -R ceph:ceph /var/lib/ceph

       4. Restart the daemon(s):

            start ceph-all                # ubuntu
            systemctl start ceph.target   # debian, centos, fedora, rhel


Since it seemed more elegant to me, I chose the second option and 
followed the steps.


To be continued... Over night, some more placement groups seem to be 
inconsistent. I'll post my findings later on.


Regards,
Kees

On 17-08-18 17:21, David Turner wrote:
In your baby step upgrade you should avoid the 2 non-LTS releases of 
Infernalis and Kraken.  You should go from Hammer to Jewel to Luminous.


The general rule of doing the upgrade to put all of your OSDs to be 
owned by ceph was to not change the ownership as part of the upgrade.  
There is a [1] config option that tells Ceph to override the user the 
daemons run as so that you can separate these 2 operations from each 
other simplifying each maintenance task.  It will set the user to 
whatever the user is for each daemon's folder.


[1]
setuser match path = /var/lib/ceph/$type/$cluster-$id


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrade to Infernalis: failed to pick suitable auth object

2018-08-18 Thread Kees Meijs

Hi again,

After listing all placement groups the problematic OSD (osd.0) is part
of, I forced a deep scrub for all of them.


A few hours later (and some other deep scrubbing as well) the result 
seems to be:



HEALTH_ERR 8 pgs inconsistent; 14 scrub errors
pg 3.6c is active+clean+inconsistent, acting [14,2,38]
pg 3.32 is active+clean+inconsistent, acting [0,11,33]
pg 3.13 is active+clean+inconsistent, acting [8,34,9]
pg 3.30 is active+clean+inconsistent, acting [14,35,26]
pg 3.31 is active+clean+inconsistent, acting [44,35,26]
pg 3.7d is active+clean+inconsistent, acting [46,37,35]
pg 3.70 is active+clean+inconsistent, acting [0,36,11]
pg 3.72 is active+clean+inconsistent, acting [0,33,39]
14 scrub errors


OSDs (in order) 0, 8, 14 and 46 all reside on the same server, obviously
the one upgraded to Infernalis.


It would make sense that I acted too quickly on one OSD (fixing the
ownerships while it was perhaps still running), maybe two, but not on all
of them.


Although it's very likely it wouldn't make a difference, I'll try a ceph 
pg repair for each PG.


To be continued again!

Regards,
Kees

On 18-08-18 10:52, Kees Meijs wrote:
To be continued... Over night, some more placement groups seem to be 
inconsistent. I'll post my findings later on.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrade to Infernalis: failed to pick suitable auth object

2018-08-18 Thread Kees Meijs

Good morning,

And... the results:

2018-08-18 17:45:08.927387 7fa3cbe0d700  0 log_channel(cluster) log 
[INF] : 3.32 repair starts
2018-08-18 17:45:12.350343 7fa3c9608700 -1 log_channel(cluster) log 
[ERR] : 3.32 soid -5/0032/temp_3.32_0_16187756_293/head: failed to 
pick suitable auth object
2018-08-18 18:07:43.908310 7fa3c9608700 -1 log_channel(cluster) log 
[ERR] : 3.32 repair 1 errors, 0 fixed


2018-08-18 18:27:48.141634 7fa3c8606700  0 log_channel(cluster) log 
[INF] : 3.70 repair starts
2018-08-18 18:27:49.073504 7fa3c8606700 -1 log_channel(cluster) log 
[ERR] : 3.70 soid -5/0070/temp_3.70_0_16187756_4006/head: failed 
to pick suitable auth object
2018-08-18 18:51:57.393099 7fa3cae0b700 -1 log_channel(cluster) log 
[ERR] : 3.70 repair 1 errors, 0 fixed


2018-08-18 19:21:20.456610 7fa3c7604700  0 log_channel(cluster) log 
[INF] : 3.72 repair starts
2018-08-18 19:21:21.303999 7fa3c9e09700 -1 log_channel(cluster) log 
[ERR] : 3.72 soid -5/0072/temp_3.72_0_16187756_3476/head: failed 
to pick suitable auth object
2018-08-18 19:21:21.304051 7fa3c9e09700 -1 log_channel(cluster) log 
[ERR] : 3.72 soid -5/0072/temp_3.72_0_16187756_5344/head: failed 
to pick suitable auth object
2018-08-18 19:21:21.304077 7fa3c9e09700 -1 log_channel(cluster) log 
[ERR] : 3.72 soid -5/0072/temp_3.72_0_16195026_251/head: failed to 
pick suitable auth object
2018-08-18 19:48:00.016879 7fa3c9e09700 -1 log_channel(cluster) log 
[ERR] : 3.72 repair 3 errors, 0 fixed


2018-08-18 17:45:08.807173 7f047f9a2700  0 log_channel(cluster) log 
[INF] : 3.13 repair starts
2018-08-18 17:45:10.669835 7f04821a7700 -1 log_channel(cluster) log 
[ERR] : 3.13 soid -5/0013/temp_3.13_0_16175425_287/head: failed to 
pick suitable auth object
2018-08-18 18:05:28.966015 7f04795c7700  0 -- 10.128.4.3:6816/5641 >> 
10.128.4.4:6800/3454 pipe(0x564161026000 sd=59 :46182 s=2 pgs=11994 
cs=31 l=0 c=0x56415b4fc2c0).fault with nothing to send, going to standby
2018-08-18 18:09:46.667875 7f047f9a2700 -1 log_channel(cluster) log 
[ERR] : 3.13 repair 1 errors, 0 fixed


2018-08-18 17:45:00.099722 7f1e4f857700  0 log_channel(cluster) log 
[INF] : 3.6c repair starts
2018-08-18 17:45:01.982007 7f1e4f857700 -1 log_channel(cluster) log 
[ERR] : 3.6c soid -5/006c/temp_3.6c_0_16187760_5765/head: failed 
to pick suitable auth object
2018-08-18 17:45:01.982042 7f1e4f857700 -1 log_channel(cluster) log 
[ERR] : 3.6c soid -5/006c/temp_3.6c_0_16187760_796/head: failed to 
pick suitable auth object
2018-08-18 18:07:33.490940 7f1e4f857700 -1 log_channel(cluster) log 
[ERR] : 3.6c repair 2 errors, 0 fixed


2018-08-18 18:29:24.339018 7f1e4d052700  0 log_channel(cluster) log 
[INF] : 3.30 repair starts
2018-08-18 18:29:25.689341 7f1e4f857700 -1 log_channel(cluster) log 
[ERR] : 3.30 soid -5/0030/temp_3.30_0_16187760_3742/head: failed 
to pick suitable auth object
2018-08-18 18:29:25.689346 7f1e4f857700 -1 log_channel(cluster) log 
[ERR] : 3.30 soid -5/0030/temp_3.30_0_16187760_3948/head: failed 
to pick suitable auth object
2018-08-18 18:54:59.123152 7f1e4f857700 -1 log_channel(cluster) log 
[ERR] : 3.30 repair 2 errors, 0 fixed


2018-08-18 18:05:27.421858 7efc52942700  0 log_channel(cluster) log 
[INF] : 3.7d repair starts
2018-08-18 18:05:29.511779 7efc5013d700 -1 log_channel(cluster) log 
[ERR] : 3.7d soid -5/007d/temp_3.7d_0_16204674_4402/head: failed 
to pick suitable auth object
2018-08-18 18:29:23.159691 7efc52942700 -1 log_channel(cluster) log 
[ERR] : 3.7d repair 1 errors, 0 fixed


I'll investigate further.

Regards,
Kees

On 18-08-18 17:43, Kees Meijs wrote:
Although it's very likely it wouldn't make a difference, I'll try a 
ceph pg repair for each PG. 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrade to Infernalis: failed to pick suitable auth object

2018-08-20 Thread Kees Meijs
Hi again,

Overnight, some more PGs have become inconsistent as well after being
deep scrubbed.

All affected OSDs log similar errors like:

> log [ERR] : 3.13 soid -5/0013/temp_3.13_0_16175425_287/head:
> failed to pick suitable auth object

Since there's temp in the name and we're running a 3-replica cluster,
I'm thinking of just reboiling the comprised OSDs.

Any thoughts on this, can I do this safely?

Current status:

> 12 active+clean+inconsistent

Nota bene: it cannot be that file ownership is the real culprit here.
Like I mentioned earlier in this thread, it might be the case for one or
maybe two OSDs but definitely not all of them.

Regards,
Kees

On 19-08-18 08:55, Kees Meijs wrote:
> I'll investigate further.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrade to Infernalis: failed to pick suitable auth object

2018-08-20 Thread Kees Meijs
Ehrm, that should of course be rebuilding. (I.e. removing the OSD,
reformat, re-add.)

On 20-08-18 11:51, Kees Meijs wrote:
> Since there's temp in the name and we're running a 3-replica cluster,
> I'm thinking of just reboiling the comprised OSDs.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrade to Infernalis: failed to pick suitable auth object

2018-08-20 Thread Kees Meijs
Hi David,

Thanks for your advice. My end goal is BlueStore, so upgrading to Jewel
and then Luminous would be ideal.

Currently all monitors are (successfully) running Infernalis, one OSD
node is running Infernalis and all other OSD nodes have Hammer.

I'll try freeing up one Infernalis OSD at first and see what'll happen.
If it goes well I'll just (for now) give up all OSDs on the given node.
If it works, I'll end up with Hammer OSDs only and Infernalis monitors.

To be continued again!

Regards,
Kees

On 20-08-18 12:04, David Turner wrote:
> My suggestion would be to remove the osds and let the cluster recover
> from all of the other copies. I would deploy the node back to Hammer
> instead of Infernalis. Either that or remove these osds, let the
> cluster backfill, and then upgrade to Jewel, and then luminous, and
> maybe mimic if you're planning on making it to the newest LTS before
> adding the node back in. That way you could add them back in as
> bluestore (on either luminous or mimic) if that's a part of your plan.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ensure Hammer client compatibility

2018-08-20 Thread Kees Meijs
Good afternoon Cephers,

While I'm fixing our upgrade-semi-broken cluster (see thread Upgrade to
Infernalis: failed to pick suitable auth object) I'm wondering about
ensuring client compatibility.

My end goal is BlueStore (i.e. running Luminous) and unfortunately I'm
obliged to offer Hammer client compatibility.

Any pointers on how to ensure this configuration-wise?
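
For context, on Luminous the following seem to be the relevant knobs;
please treat this as an assumption to verify rather than a recipe:

  ceph osd crush show-tunables                   # the profile should stay hammer-compatible
  ceph osd set-require-min-compat-client hammer  # avoid enabling features Hammer clients cannot handle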

Thanks!

Regards,
Kees

-- 
https://nefos.nl/contact

Nefos IT bv
Ambachtsweg 25 (industrienummer 4217)
5627 BZ Eindhoven
Nederland

KvK 66494931

/Aanwezig op maandag, dinsdag, woensdag en vrijdag/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrade to Infernalis: failed to pick suitable auth object

2018-08-20 Thread Kees Meijs
Bad news: I've got a PG stuck in down+peering now.

Please advice.

K.

On 20-08-18 12:12, Kees Meijs wrote:
> Thanks for your advice. My end goal is BlueStore so to upgrade to Jewel
> and then Luminous would be ideal.
>
> Currently all monitors are (successfully) running Infernalis, one OSD
> node is running Infernalis and all other OSD nodes have Hammer.
>
> I'll try freeing up one Infernalis OSD at first and see what'll happen.
> If it goes well I'll just (for now) give up all OSDs on the given node.
> If it works, I'll end up with Hammer OSDs only and Infernalis monitors.
>
> To be continued again!

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrade to Infernalis: failed to pick suitable auth object

2018-08-20 Thread Kees Meijs
The given PG is back online, phew...

Meanwhile, some OSDs still on Hammer seem to crash with errors like:

> 2018-08-20 13:06:33.819569 7f8962b2f700 -1 osd/ReplicatedPG.cc: In
> function 'void ReplicatedPG::scan_range(int, int,
> PG::BackfillInterval*, ThreadPool::TPHandle&)' thread 7f8962b2f700
> time 2018-08-20 13:06:33.709922
> osd/ReplicatedPG.cc: 10115: FAILED assert(r >= 0)

Restarting the OSDs seems to work.

K.

On 20-08-18 13:14, Kees Meijs wrote:
> Bad news: I've got a PG stuck in down+peering now.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrade to Infernalis: OSDs crash all the time

2018-08-20 Thread Kees Meijs

Hi again,

I'm starting to feel really unlucky here...

At the moment, the situation is "sort of okay":


    1387 active+clean
  11 active+clean+inconsistent
   7 active+recovery_wait+degraded
   1 active+recovery_wait+undersized+degraded+remapped
   1 active+undersized+degraded+remapped+wait_backfill
   1 active+undersized+degraded+remapped+inconsistent+backfilling


To ensure nothing is in the way, I disabled both scrubbing and deep 
scrubbing for the time being.


However, random OSDs (still on Hammer) constantly crash giving the error 
as mentioned earlier (osd/ReplicatedPG.cc: 10115: FAILED assert(r >= 0)).


It felt like they started crashing when hitting the PG currently 
backfilling, so I set the nobackfill flag.


For now, the crashing seems to have stopped. However, the cluster seems 
slow at the moment when trying to access the given PG via KVM/QEMU (RBD).


Recap:

 * All monitors run Infernalis.
 * One OSD node runs Infernalis.
 * All other OSD nodes run Hammer.
 * One OSD on Infernalis is set to "out" and is stopped. This OSD
   seemed to contain one inconsistent PG.
 * Backfilling started.
 * After hours and hours of backfilling, OSDs started to crash.

Other than restarting the "out" and stopped OSD for the time being 
(haven't tried that yet) I'm quite lost.


Hopefully someone has some pointers for me.

Regards,
Kees

On 20-08-18 13:23, Kees Meijs wrote:

The given PG is back online, phew...

Meanwhile, some OSDs still on Hammer seem to crash with errors alike:


2018-08-20 13:06:33.819569 7f8962b2f700 -1 osd/ReplicatedPG.cc: In
function 'void ReplicatedPG::scan_range(int, int,
PG::BackfillInterval*, ThreadPool::TPHandle&)' thread 7f8962b2f700
time 2018-08-20 13:06:33.709922
osd/ReplicatedPG.cc: 10115: FAILED assert(r >= 0)

Restarting the OSDs seems to work.

K.

On 20-08-18 13:14, Kees Meijs wrote:

Bad news: I've got a PG stuck in down+peering now.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrade to Infernalis: OSDs crash all the time

2018-08-20 Thread Kees Meijs

Hi there,

A few hours ago I started the given OSD again and gave it weight 
1.0. Backfilling started and more PGs became active+clean.


After a while the same crashing behaviour started to act up so I stopped 
the backfilling.


Running with the noout, nobackfill, norebalance, noscrub and nodeep-scrub
flags now, but at least the cluster seems stable (fingers crossed...).


Possible plan of attack:

1. Stop all Infernalis OSDs.
2. Remove the Ceph Infernalis packages from the OSD node.
3. Install the Hammer packages.
4. Start the OSDs (or maybe the package installation does this already).

Effectively this is an OSD downgrade. Is this supported or does Ceph 
"upgrade" data structures on disk as well?


Recap: this would imply going from Infernalis back to Hammer.

Any thoughts are more than welcome (maybe a completely different 
approach makes sense...) Meanwhile, I'll try to catch some sleep.


Thanks, thanks!

Best regards,
Kees

On 20-08-18 21:46, Kees Meijs wrote:


Other than restarting the "out" and stopped OSD for the time being 
(haven't tried that yet) I'm quite lost.




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ensure Hammer client compatibility

2018-08-20 Thread Kees Meijs

Hi Lincoln,

We're looking at (now existing) RBD support using KVM/QEMU, so this is 
an upgrade path.


Regards,
Kees

On 20-08-18 16:37, Lincoln Bryant wrote:

What interfaces do your Hammer clients need? If you're looking at
CephFS, we have had reasonable success moving our older clients (EL6)
to NFS Ganesha with the Ceph FSAL.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrade to Infernalis: OSDs crash all the time

2018-08-21 Thread Kees Meijs

Hello David,

Thank you and I'm terribly sorry; I was unaware I was starting new threads.

Off the top of my head I'd say "yes, it'll fit", but obviously I'll make
sure first.
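
For what it's worth, a quick sanity check of the remaining capacity before
marking the node out (ceph osd df is available since Hammer):

$ ceph df            # overall raw and per-pool usage
$ ceph osd df tree   # per-OSD utilisation, grouped by host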


Regards,
Kees

On 21-08-18 16:34, David Turner wrote:
Ceph does not support downgrading OSDs.  When you removed the single 
OSD, it was probably trying to move data onto the other OSDs in the 
node with Infernalis OSDs.  I would recommend stopping every OSD in 
that node and marking them out so the cluster will rebalance without 
them.  Assuming your cluster is able to get healthy after that, we'll 
see where things are.


Also, please stop opening so many email threads about this same 
issue.  It makes tracking this in the archives impossible.




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrade to Infernalis: OSDs crash all the time

2018-09-10 Thread Kees Meijs
Hi list,

A little update: meanwhile we added a new node consisting of Hammer OSDs
to ensure sufficient cluster capacity.

The upgraded node with Infernalis OSDs is completely removed from the
CRUSH map and the OSDs removed (obviously we didn't wipe the disks yet).

At the moment we're still running using flags
noout,nobackfill,noscrub,nodeep-scrub. Although now only Hammer OSDs
reside, we still experience OSD crashes on backfilling so we're unable
to achieve HEALTH_OK state.

Using debug 20 level we're (mostly my coworker Willem Jan is) figuring
out why the crashes happen exactly. Hopefully we'll figure it out.

To be continued...

Regards,
Kees

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] All SSD cluster performance

2017-01-16 Thread Kees Meijs
Hi Maxime,

Given your remark below, what kind of SATA SSD do you recommend for OSD
usage?

Thanks!

Regards,
Kees

On 15-01-17 21:33, Maxime Guyot wrote:
> I don’t have firsthand experience with the S3520, as Christian pointed out 
> their endurance doesn’t make them suitable for OSDs in most cases. I can only 
> advise you to keep a close eye on the SMART status of the SSDs.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Shrink cache target_max_bytes

2017-02-09 Thread Kees Meijs
Hi Cephers,

Long story short: I'd like to shrink our cache pool a little.

Is it safe to just alter cache target_max_byte and wait for objects to
get evicted? Anything to take into account?
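
For context, what I have in mind is nothing more than the following (pool
name made up, size just an example):

$ ceph osd pool get cache target_max_bytes
$ ceph osd pool set cache target_max_bytes 549755813888   # 512 GiB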

Thanks!

Regards,
Kees

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Shrink cache target_max_bytes

2017-02-14 Thread Kees Meijs
Hi Cephers,

Although I might be stating an obvious fact: altering the parameter
works as advertised.

The only issue I encountered was that lowering the parameter too much at
once results in some slow requests because the cache pool is "full".

So in short: it works when lowering the parameter bit by bit.
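
In other words, something along these lines (pool name, step sizes and
interval made up):

# shrink in steps instead of one big jump, giving eviction time to catch up
for size_gb in 900 800 700 600 500; do
    ceph osd pool set cache target_max_bytes $((size_gb * 1024 * 1024 * 1024))
    sleep 600
done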

Regards,
Kees

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrade to Infernalis: OSDs crash all the time

2018-11-11 Thread Kees Meijs

Hi list,

Between crashes we were able to allow the cluster to backfill as much as 
possible (all monitors Infernalis, OSDs being Hammer again).


Leftover PGs wouldn't backfill until we removed files such as:

8.0M -rw-r--r-- 1 root root 8.0M Aug 24 23:56 
temp\u3.bd\u0\u16175417\u2718__head_00BD__fffb
8.0M -rw-r--r-- 1 root root 8.0M Aug 28 05:51 
temp\u3.bd\u0\u16175417\u3992__head_00BD__fffb
8.0M -rw-r--r-- 1 root root 8.0M Aug 30 03:40 
temp\u3.bd\u0\u16175417\u4521__head_00BD__fffb
8.0M -rw-r--r-- 1 root root 8.0M Aug 31 03:46 
temp\u3.bd\u0\u16175417\u4817__head_00BD__fffb
8.0M -rw-r--r-- 1 root root 8.0M Sep  5 19:44 
temp\u3.bd\u0\u16175417\u6252__head_00BD__fffb
8.0M -rw-r--r-- 1 root root 8.0M Sep  6 14:44 
temp\u3.bd\u0\u16175417\u6593__head_00BD__fffb
8.0M -rw-r--r-- 1 root root 8.0M Sep  7 10:21 
temp\u3.bd\u0\u16175417\u6870__head_00BD__fffb


Restarting the given OSD didn't seem necessary; backfilling started to 
work and at some point enough replicas were available for each PG.


Finally deep scrubbing repaired the inconsistent PGs automagically and 
we arrived at HEALTH_OK again!
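
For completeness, the per-PG commands involved look like the following
(the PG id is taken from the temp object names above, so adjust
accordingly):

$ ceph pg deep-scrub 3.bd
$ ceph pg repair 3.bd   # only if the deep scrub still reports inconsistencies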


Case closed: up to Jewel.

For everyone involved: a big, big and even bigger thank you for all 
pointers and support!


Regards,
Kees

On 10-09-18 16:43, Kees Meijs wrote:

A little update: meanwhile we added a new node consisting of Hammer OSDs
to ensure sufficient cluster capacity.

The upgraded node with Infernalis OSDs is completely removed from the
CRUSH map and the OSDs removed (obviously we didn't wipe the disks yet).

At the moment we're still running using flags
noout,nobackfill,noscrub,nodeep-scrub. Although now only Hammer OSDs
reside, we still experience OSD crashes on backfilling so we're unable
to achieve HEALTH_OK state.

Using debug 20 level we're (mostly my coworker Willem Jan is) figuring
out why the crashes happen exactly. Hopefully we'll figure it out.

To be continued...


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ensure Hammer client compatibility

2018-11-12 Thread Kees Meijs

Hi list,

Having finished our adventures with Infernalis we're now finally running 
Jewel (10.2.11) on all Ceph nodes. Woohoo!


However, there's still KVM production boxes with block-rbd.so being 
linked to librados 0.94.10 which is Hammer.


Current relevant status parts:


 health HEALTH_WARN
        crush map has legacy tunables (require bobtail, min is firefly)
        no legacy OSD present but 'sortbitwise' flag is not set


Obviously we would like to go back to HEALTH_OK without the warnings
mentioned, while maintaining Hammer client support.


Running ceph osd set require_jewel_osds seemed harmless in terms of 
client compatibility so that's done already.


However, what about sortbitwise and tunables?

Thanks,
Kees

On 21-08-18 03:47, Kees Meijs wrote:
We're looking at (now existing) RBD support using KVM/QEMU, so this is 
an upgrade path. 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ensure Hammer client compatibility

2018-11-12 Thread Kees Meijs
Hi again,

I just read (and reread, and again) the chapter of Ceph Cookbook on
upgrades and
http://docs.ceph.com/docs/jewel/rados/operations/crush-map/#tunables and
figured there's a way back if needed.

The sortbitwise flag is set (re-peering was almost instant) and the
tunables are set to "hammer".

There's a lot of data shuffling going on now, so fingers crossed.

Cheers,
Kees

On 12-11-18 09:14, Kees Meijs wrote:
> However, what about sortbitwise and tunables? 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Huge latency spikes

2018-11-17 Thread Kees Meijs
Hi Alex,

What kind of clients do you use? Is it KVM (QEMU) using NBD driver,
kernel, or...?
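
In the meantime, it might help to watch the per-OSD journal commit and
apply latencies while a spike happens, for example:

$ ceph osd perf              # fs_commit_latency / fs_apply_latency per OSD
$ watch -n 1 ceph osd perf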

Regards,
Kees

On 17-11-18 20:17, Alex Litvak wrote:
> Hello everyone,
>
> I am trying to troubleshoot cluster exhibiting huge spikes of latency.
> I cannot quite catch it because it happens during the light activity
> and randomly affects one osd node out of 3 in the pool.
>
> This is a file store.
> I see some OSDs exhibit applied latency of 400 ms, 1 minute load
> average shoots to 60.  Client commit latency with queue shoots to 300 ms
> and journal latency (return write ack for client) (journal on Intel
> DC-S3710 SSD) shoots on 40 ms
>
> op_w_process_latency showed 250 ms and client read-modify-write
> operation readable/applied latency jumped to 1.25 s on one of the OSDs
>
> I rescheduled the scrubbing and deep scrubbing and was watching ceph
> -w activity so it is definitely not related.
>
> At the same time node shows 98 % cpu idle no significant changes in
> memory utilization, no errors on network with bandwidth utilization
> between 20 - 50 Mbit on client and back end networks
>
> OSD node has 12 OSDs (2TB rust) 2 partitioned SSD journal disks, 32 GB
> RAM, dual 6-core / 12-thread CPUs
>
> This is perhaps the most relevant part of ceph config
>
> debug lockdep = 0/0
> debug context = 0/0
> debug crush = 0/0
> debug buffer = 0/0
> debug timer = 0/0
> debug journaler = 0/0
> debug osd = 0/0
> debug optracker = 0/0
> debug objclass = 0/0
> debug filestore = 0/0
> debug journal = 0/0
> debug ms = 0/0
> debug monc = 0/0
> debug tp = 0/0
> debug auth = 0/0
> debug finisher = 0/0
> debug heartbeatmap = 0/0
> debug perfcounter = 0/0
> debug asok = 0/0
> debug throttle = 0/0
>
> [osd]
>     journal_dio = true
>     journal_aio = true
>     osd_journal = /var/lib/ceph/osd/$cluster-$id-journal/journal
>     osd_journal_size = 2048 ; journal size, in megabytes
> osd crush update on start = false
>     osd mount options xfs =
> "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"
>     osd_op_threads = 5
>     osd_disk_threads = 4
>     osd_pool_default_size = 2
>     osd_pool_default_min_size = 1
>     osd_pool_default_pg_num = 512
>     osd_pool_default_pgp_num = 512
>     osd_crush_chooseleaf_type = 1
>     ; osd pool_default_crush_rule = 1
> ; new options 04.12.2015
> filestore_op_threads = 4
>     osd_op_num_threads_per_shard = 1
>     osd_op_num_shards = 25
>     filestore_fd_cache_size = 64
>     filestore_fd_cache_shards = 32
> filestore_fiemap = false
> ; Reduce impact of scrub (needs cfq on osds)
> osd_disk_thread_ioprio_class = "idle"
> osd_disk_thread_ioprio_priority = 7
> osd_deep_scrub_interval = 1211600
>     osd_scrub_begin_hour = 19
>     osd_scrub_end_hour = 4
>     osd_scrub_sleep = 0.1
> [client]
> rbd_cache = true
> rbd_cache_size = 67108864
> rbd_cache_max_dirty = 50331648
> rbd_cache_target_dirty = 33554432
> rbd_cache_max_dirty_age = 2
> rbd_cache_writethrough_until_flush = true
>
> OSD logs and system log at that time show nothing interesting.
>
> Any clue of what to look for in order to diagnose the load / latency
> spikes would be really appreciated.
>
> Thank you
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Altering crush-failure-domain

2019-03-04 Thread Kees Meijs
Hi Cephers,

Documentation on
http://docs.ceph.com/docs/master/rados/operations/erasure-code/ states:

> Choosing the right profile is important because it cannot be modified
> after the pool is created: a new pool with a different profile needs
> to be created and all objects from the previous pool moved to the new.

Right, that makes sense. However, is it possible to "migrate"
crush-failure-domain from osd to host, to rack and so on without copying
pools?
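
For illustration, the kind of switch I have in mind would look roughly
like this (profile, rule and pool names made up; older releases spell the
profile key ruleset-failure-domain and the pool property crush_ruleset):

# new profile, only used to generate a rule with failure domain "host"
$ ceph osd erasure-code-profile set ec42-host k=4 m=2 crush-failure-domain=host
$ ceph osd crush rule create-erasure ec42-host-rule ec42-host
# point the existing pool at the new rule (this will move data around)
$ ceph osd pool set mypool crush_rule ec42-host-rule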

Regards,
Kees

-- 
https://nefos.nl/contact

Nefos IT bv
Ambachtsweg 25 (industrienummer 4217)
5627 BZ Eindhoven
Nederland

KvK 66494931

/Aanwezig op maandag, dinsdag, woensdag en vrijdag/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Altering crush-failure-domain

2019-03-04 Thread Kees Meijs
Thanks guys.

Regards,
Kees

On 04-03-19 22:18, Smith, Eric wrote:
> This will cause data migration.
>
> -Original Message-
> From: ceph-users  On Behalf Of Paul 
> Emmerich
> Sent: Monday, March 4, 2019 2:32 PM
> To: Kees Meijs 
> Cc: Ceph Users 
> Subject: Re: [ceph-users] Altering crush-failure-domain
>
> Yes, these parts of the profile are just used to create a crush rule.
> You can change the crush rule like any other crush rule.
>
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Mon, Mar 4, 2019 at 8:13 PM Kees Meijs  wrote:
>> Hi Cephers,
>>
>> Documentation on 
>> http://docs.ceph.com/docs/master/rados/operations/erasure-code/ states:
>>
>> Choosing the right profile is important because it cannot be modified after 
>> the pool is created: a new pool with a different profile needs to be created 
>> and all objects from the previous pool moved to the new.
>>
>>
>> Right, that makes sense. However, is it possible to "migrate" 
>> crush-failure-domain from osd to host, to rack and so on without copying 
>> pools?
>>
>> Regards,
>> Kees
>>
>> --
>> https://nefos.nl/contact
>>
>> Nefos IT bv
>> Ambachtsweg 25 (industrienummer 4217)
>> 5627 BZ Eindhoven
>> Nederland
>>
>> KvK 66494931
>>
>> Aanwezig op maandag, dinsdag, woensdag en vrijdag 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Random slow requests without any load

2019-07-17 Thread Kees Meijs
Hi,

Experienced similar issues. Our cluster internal network (completely
separated) now has NOTRACK (no connection state tracking) iptables rules.

In full:

> # iptables-save
> # Generated by xtables-save v1.8.2 on Wed Jul 17 14:57:38 2019
> *filter
> :FORWARD DROP [0:0]
> :OUTPUT ACCEPT [0:0]
> :INPUT ACCEPT [0:0]
> COMMIT
> # Completed on Wed Jul 17 14:57:38 2019
> # Generated by xtables-save v1.8.2 on Wed Jul 17 14:57:38 2019
> *raw
> :OUTPUT ACCEPT [0:0]
> :PREROUTING ACCEPT [0:0]
> -A OUTPUT -j NOTRACK
> -A PREROUTING -j NOTRACK
> COMMIT
> # Completed on Wed Jul 17 14:57:38 2019

Ceph uses IPv4 in our case, but to be complete:

> # ip6tables-save
> # Generated by xtables-save v1.8.2 on Wed Jul 17 14:58:20 2019
> *filter
> :OUTPUT ACCEPT [0:0]
> :INPUT ACCEPT [0:0]
> :FORWARD DROP [0:0]
> COMMIT
> # Completed on Wed Jul 17 14:58:20 2019
> # Generated by xtables-save v1.8.2 on Wed Jul 17 14:58:20 2019
> *raw
> :OUTPUT ACCEPT [0:0]
> :PREROUTING ACCEPT [0:0]
> -A OUTPUT -j NOTRACK
> -A PREROUTING -j NOTRACK
> COMMIT
> # Completed on Wed Jul 17 14:58:20 2019

Using this configuration, the state tables can never fill up, so
connections never get dropped as a result.
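
A quick way to verify whether the conntrack table is the culprit in the
first place (assuming the nf_conntrack module is loaded at all):

# sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max
# dmesg | grep -i conntrack   # look for "table full, dropping packet"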

Cheers,
Kees

On 17-07-2019 11:27, Maximilien Cuony wrote:
> Just a quick update about this if somebody else get the same issue:
>
> The problem was with the firewall. Port ranges and established
> connections are allowed, but for some reason the tracking of
> connections seems to get lost, leading to a strange state where one
> machine refuses data (RSTs are replied) and the sender never gets the
> RST packet (even with 'related' packets allowed).
>
> There was a similar post on this list in February ("Ceph and TCP
> States") where lossing of connections in conntrack created issues, but
> the fix, net.netfilter.nf_conntrack_tcp_be_liberal=1 did not improve
> that particular case.
>
> As a workaround, we installed lighter rules for the firewall (allowing
> all packets from machines inside the cluster by default) and that
> "fixed" the issue :)
>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Blacklisting during boot storm

2019-08-03 Thread Kees Meijs
Hi list,

Yesterday afternoon we experienced a compute node outage in our
OpenStack (obviously Ceph backed) cluster.

We tried to (re)start compute instances again as fast as possible,
resulting in some KVM/RBD clients getting blacklisted. The problem was
spotted very quickly so we could remove the listing at once while the
cluster was coping fine with the boot storm.
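
For reference, spotting and clearing the entries boils down to the
following (the address below is just a placeholder):

$ ceph osd blacklist ls
$ ceph osd blacklist rm 192.168.1.23:0/3418967234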

Question: what can we do to prevent the blacklisting? Or, does it make
sense to completely disable the mechanism (doesn't feel right) or maybe
configure it differently?

Thanks!

Regards,
Kees

-- 
https://nefos.nl/contact

Nefos IT bv
Ambachtsweg 25 (industrienummer 4217)
5627 BZ Eindhoven
Nederland

KvK 66494931

/Aanwezig op maandag, dinsdag, woensdag en vrijdag/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Blacklisting during boot storm

2019-08-04 Thread Kees Meijs
Hi Paul,

Okay, thanks for clarifying. If we see the phenomenon again, we'll just
leave it be.

K.

On 03-08-2019 14:33, Paul Emmerich wrote:
> The usual reason for blacklisting RBD clients is breaking an exclusive
> lock because the previous owner seemed to have crashed.
> Blacklisting the old owner is necessary in case you had a network
> partition and not a crash. Note that this is entirely normal and no
> reason to worry.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com