Re: [ceph-users] Bareos and libradosstriper works only for 4M stripe_unit size

2017-10-17 Thread Maged Mokhtar
>> Would it be 4 objects of 24M and 4 objects of 250KB? Or will the last 4 objects be artificially padded (with 0's) to meet the stripe_unit? It will be 4 objects of 24M + 1M stored on the 5th object. If you write 104M: 4 objects of 24M + 8M stored on the 5th object. If you write 105M: 4 obje

Re: [ceph-users] Bluestore OSD_DATA, WAL & DB

2017-10-17 Thread Marco Baldini - H.S. Amiata
Hello, here are my results. In this node I have 3 OSDs (1TB HDD); osd.1 and osd.2 have block.db on SSD partitions of 90GB each, osd.8 has no separate block.db. pve-hs-main[0]:~$ for i in {1,2,8} ; do echo -n "osd.$i db per object: " ; expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_byte
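
A sketch of this kind of per-object block.db calculation, assuming the counters being divided are bluefs.db_used_bytes and bluestore.bluestore_onodes (the counter names and the resulting ratio are an approximation, not necessarily the exact one-liner being run here):

    # rough RocksDB bytes per object for a few OSDs, run on the OSD host
    for i in 1 2 8; do
        db=$(ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes')          # bytes used on block.db
        objs=$(ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes')  # onode (object) count
        echo "osd.$i db per object: $((db / objs)) bytes"
    done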

Re: [ceph-users] Rbd resize, refresh rescan

2017-10-17 Thread Marc Roos
Rbd resize is picked up automatically on the mapped host. However, for the change to appear in libvirt/qemu, I have to run: virsh qemu-monitor-command vps-test2 --hmp "info block" virsh qemu-monitor-command vps-test2 --hmp "block_resize drive-scsi0-0-0-0 12G" -Original Message- From: Marc Roos Sen
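
For context, the full sequence being described would look roughly like this; the pool and image names are placeholders, and the virsh commands are the ones quoted above:

    # grow the image; the mapped host picks the new size up automatically
    rbd resize --size 12G <pool>/<image>
    # then tell the running guest, since libvirt/qemu does not notice on its own
    virsh qemu-monitor-command vps-test2 --hmp "info block"
    virsh qemu-monitor-command vps-test2 --hmp "block_resize drive-scsi0-0-0-0 12G"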

Re: [ceph-users] Ceph-ISCSI

2017-10-17 Thread Frédéric Nass
Hi folks, For those who missed it, the fun was here :-) : https://youtu.be/IgpVOOVNJc0?t=3715 Frederic. - On 11 Oct 17, at 17:05, Jake Young wrote: > On Wed, Oct 11, 2017 at 8:57 AM Jason Dillaman < jdill...@redhat.com > wrote: >> On Wed, Oct 1

[ceph-users] Retrieve progress of volume flattening using RBD python library

2017-10-17 Thread Xavier Trilla
Hi, does anybody know if there is a way to inspect the progress of a volume flattening while using the python rbd library? I mean, using the CLI it is possible to see the progress of the flattening, but when calling volume.flatten() it just blocks until it's done. Is there any way to infer the

Re: [ceph-users] Ceph-ISCSI

2017-10-17 Thread Jorge Pinilla López
So, from what I have understood, the final sum-up was to support MC to be able to do Multipath Active/Active. How is that project going? Windows will be able to support it because they have already implemented it client-side, but unless ESXi implements it, VMware will only be able to do Active/Passive, am I

Re: [ceph-users] Ceph-ISCSI

2017-10-17 Thread Maged Mokhtar
The issue with active/active is the following condition: the client initiator sends a write operation to gateway server A; server A does not respond within the client timeout; the client initiator re-sends the failed write operation to gateway server B; the client initiator sends another write operation to gateway server C

Re: [ceph-users] cephfs: some metadata operations take seconds to complete

2017-10-17 Thread Tyanko Aleksiev
Thanks for the replies. I'll move all our testbed installation to Luminous and redo the tests. Cheers, Tyanko On 17 October 2017 at 10:14, Yan, Zheng wrote: > On Tue, Oct 17, 2017 at 1:07 AM, Tyanko Aleksiev > wrote: > > Hi, > > > > At UZH we are currently evaluating cephfs as a distributed fi

Re: [ceph-users] Bluestore OSD_DATA, WAL & DB

2017-10-17 Thread Mark Nelson
On 10/17/2017 01:54 AM, Wido den Hollander wrote: On 16 October 2017 at 18:14, Richard Hesketh wrote: On 16/10/17 13:45, Wido den Hollander wrote: On 26 September 2017 at 16:39, Mark Nelson wrote: On 09/26/2017 01:10 AM, Dietmar Rieder wrote: thanks David, that's confirming what I w

[ceph-users] Unstable clock

2017-10-17 Thread Mohamad Gebai
Hi, I am looking at the following issue: http://tracker.ceph.com/issues/21375 In summary, during a 'rados bench', impossible latency values (e.g. 9.00648e+07) are suddenly reported. I looked briefly at the code; it seems CLOCK_REALTIME is used, which means that wall clock changes would affect thi

Re: [ceph-users] Unstable clock

2017-10-17 Thread Joao Eduardo Luis
On 10/17/2017 01:30 PM, Mohamad Gebai wrote: A concern was raised: are there more critical parts of Ceph where a clock jumping around might interfere with the behavior of the cluster? It would be good to know if there are any, and maybe prepare for them? cephx and monitor paxos leases come to m

Re: [ceph-users] Unstable clock

2017-10-17 Thread Sage Weil
On Tue, 17 Oct 2017, Mohamad Gebai wrote: > Hi, > > I am looking at the following issue: http://tracker.ceph.com/issues/21375 > > In summary, during a 'rados bench', impossible latency values (e.g. > 9.00648e+07) are suddenly reported. I looked briefly at the code, it > seems CLOCK_REALTIME is us

[ceph-users] Luminous : 3 clients failing to respond to cache pressure

2017-10-17 Thread Yoann Moulin
Hello, I have a luminous (12.2.1) cluster with 3 nodes for cephfs (no rbd or rgw) and we hit the "X clients failing to respond to cache pressure" message. I have 3 mds servers active. Is this something I have to worry about? Here is some information about the cluster: > root@iccluster054:~# cep
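
One generic way to see which clients an MDS is flagging for cache pressure is to list its sessions and look at the per-client cap counts; a minimal sketch, assuming shell access to an active MDS host (the mds name is a placeholder and field names can vary slightly between releases):

    # list client sessions on one MDS; num_caps and client_metadata identify the noisy clients
    ceph daemon mds.<name> session ls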

Re: [ceph-users] Luminous : 3 clients failing to respond to cache pressure

2017-10-17 Thread Wido den Hollander
> On 17 October 2017 at 15:35, Yoann Moulin wrote: > > > Hello, > > I have a luminous (12.2.1) cluster with 3 nodes for cephfs (no rbd or rgw) > and we hit the "X clients failing to respond to cache pressure" message. > I have 3 mds servers active. > What type of client? Kernel? FUSE? If

Re: [ceph-users] Unstable clock

2017-10-17 Thread Mohamad Gebai
On 10/17/2017 09:27 AM, Sage Weil wrote: > On Tue, 17 Oct 2017, Mohamad Gebai wrote: > >> It would be good to know if there are any, and maybe prepare for them? > Adam added a new set of clock primitives that include a monotonic clock > option that should be used in all cases where we're measurin

Re: [ceph-users] Unstable clock

2017-10-17 Thread Sage Weil
On Tue, 17 Oct 2017, Mohamad Gebai wrote: > On 10/17/2017 09:27 AM, Sage Weil wrote: > > On Tue, 17 Oct 2017, Mohamad Gebai wrote: > > > >> It would be good to know if there are any, and maybe prepare for them? > > Adam added a new set of clock primitives that include a monotonic clock > > option

Re: [ceph-users] Luminous : 3 clients failing to respond to cache pressure

2017-10-17 Thread Yoann Moulin
>> I have a luminous (12.2.1) cluster with 3 nodes for cephfs (no rbd or rgw) >> and we hit the "X clients failing to respond to cache pressure" message. >> I have 3 mds servers active. > > What type of client? Kernel? FUSE? > > If it's a kernel client, what kernel are you running? kernel clie

Re: [ceph-users] Unstable clock

2017-10-17 Thread Mohamad Gebai
On 10/17/2017 09:57 AM, Sage Weil wrote: > On Tue, 17 Oct 2017, Mohamad Gebai wrote: >> >> Thanks Sage. I assume that's the card you're referring to: >> https://trello.com/c/SAtGPq0N/65-use-time-span-monotonic-for-durations >> >> I can take care of that one if no one else has started working on it. > T

[ceph-users] OSD are marked as down after jewel -> luminous upgrade

2017-10-17 Thread Daniel Carrasco
Hello, today I've decided to upgrade my Ceph cluster to the latest LTS version. To do it I've used the steps posted in the release notes: http://ceph.com/releases/v12-2-0-luminous-released/ After upgrading all the daemons I've noticed that all OSD daemons are marked as down even though all are working, so th

Re: [ceph-users] OSD are marked as down after jewel -> luminous upgrade

2017-10-17 Thread Marc Roos
Did you check this? https://www.mail-archive.com/ceph-users@lists.ceph.com/msg39886.html -Original Message- From: Daniel Carrasco [mailto:d.carra...@i2tic.com] Sent: Tuesday 17 October 2017 17:49 To: ceph-us...@ceph.com Subject: [ceph-users] OSD are marked as down after jewel ->

[ceph-users] OSD crashed while repairing inconsistent PG luminous

2017-10-17 Thread Ana Aviles
Hello all, we had an inconsistent PG on our cluster. While performing a PG repair operation the OSD crashed. The OSD was not able to start again, and there was no hardware failure on the disk itself. This is the log output: 2017-10-17 17:48:55.771384 7f234930d700 -1 log_channel(cluster) log
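
For reference, the repair being described is normally driven with commands like the following; the pg id is a placeholder and this is the generic procedure, not what was run on this particular cluster:

    # show which objects scrub flagged as inconsistent
    rados list-inconsistent-obj <pgid> --format=json-pretty
    # ask the primary OSD to repair the PG
    ceph pg repair <pgid>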

Re: [ceph-users] OSD crashed while repairing inconsistent PG luminous

2017-10-17 Thread Cassiano Pilipavicius
Hello, I have a problem with OSDs crashing after upgrading to bluestore/luminous, due to the fact that I was using jemalloc, and it seems there is a bug with bluestore OSDs combined with jemalloc. Changing to tcmalloc solved my issues. I don't know if you have the same issue, but in my environment, the osds

Re: [ceph-users] OSD are marked as down after jewel -> luminous upgrade

2017-10-17 Thread Daniel Carrasco
Thanks!! I'll take a look later. Anyway, all my Ceph daemons are on the same version on all nodes (I've upgraded the whole cluster). Cheers!! On 17 Oct 2017 at 6:39 p.m., "Marc Roos" wrote: Did you check this? https://www.mail-archive.com/ceph-users@lists.ceph.com/msg39886.html -Or

[ceph-users] Help with full osd and RGW not responsive

2017-10-17 Thread Bryan Banister
Hi all, Still a real novice here and we didn't set up our initial RGW cluster very well. We have 134 osds and set up our RGW pool with only 64 PGs, thus not all of our OSDs got data and now we have one that is 95% full. This apparently has put the cluster into a HEALTH_ERR condition: [root@car

Re: [ceph-users] Thick provisioning

2017-10-17 Thread Jason Dillaman
There is no existing option to thick provision images within RBD. When an image is created or cloned, the only actions that occur are some small metadata updates to describe the image. This allows image creation to be a quick, constant time operation regardless of the image size. To thick provision
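
One way to approximate thick provisioning, sketched here as a generic workaround rather than anything from this reply, is to create the image and then write through its whole size once so every backing object gets allocated (assuming a Luminous-era rbd CLI; older releases would use "rbd bench-write"):

    # hypothetical pool, image name and size
    rbd create --size 100G rbd/thick-image
    # sequentially write the entire image once so all objects are allocated
    rbd bench --io-type write --io-pattern seq --io-size 4M --io-total 100G rbd/thick-image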

[ceph-users] Efficient storage of small objects / bulk erasure coding

2017-10-17 Thread Jiri Horky
Hi list, we are thinking of building relatively big CEPH-based object storage for storage of our sample files - we have about 700M files ranging from very small (1-4KiB) files to pretty big ones (several GiB). Median of file size is 64KiB. Since the required space is relatively large (1PiB of usab
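
For readers unfamiliar with the setup being discussed, an erasure-coded pool would be created roughly like this; the profile name, k/m values and PG count are placeholders, not a sizing recommendation for this workload:

    # define an EC profile and create a pool that uses it (Luminous syntax)
    ceph osd erasure-code-profile set sample-ec k=4 m=2 crush-failure-domain=host
    ceph osd pool create sample.data 1024 1024 erasure sample-ec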

[ceph-users] To check RBD cache enabled

2017-10-17 Thread Josy
Hi, I am following this article: http://ceph.com/geen-categorie/ceph-validate-that-the-rbd-cache-is-active/ I have enabled this flag in ceph.conf: [client] admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok log file = /var/log/ceph/ But the command to show the conf is not

Re: [ceph-users] To check RBD cache enabled

2017-10-17 Thread Jean-Charles Lopez
Hi, the syntax uses the admin socket file: ceph --admin-daemon /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok config get rbd_cache Should be /var/run/ceph/ceph.client.admin.$pid.$cctid.asok if your connection is using client.admin to connect to the cluster and your cluster name is set to the d
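
In practice the check looks like this, run on the host where the librbd client (for example the qemu process) lives; the socket file name is illustrative, the real one is whatever appears under /var/run/ceph:

    # find the client admin socket and ask it whether the cache is enabled
    ls /var/run/ceph/
    ceph --admin-daemon /var/run/ceph/<client-asok-file> config get rbd_cache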

Re: [ceph-users] How to stop using (unmount) a failed OSD with BlueStore ?

2017-10-17 Thread Alejandro Comisario
Hi guys, any tip or help? On Mon, Oct 16, 2017 at 1:50 PM, Alejandro Comisario wrote: > Hi all, I have to hot-swap a failed OSD on a Luminous cluster with BlueStore (the disk is SATA, WAL and DB are on NVMe). > > I've issued a: > * ceph osd crush reweight osd_id 0 > * systemctl stop (osd I'd
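
For what it's worth, the usual sequence for retiring a dead OSD looks roughly like the following; this is the generic procedure with <id> as a placeholder, not advice specific to the NVMe WAL/DB layout described above:

    systemctl stop ceph-osd@<id>          # stop the daemon if it is still running
    ceph osd out <id>                     # stop mapping new data to it
    ceph osd crush remove osd.<id>        # drop it from the CRUSH map
    ceph auth del osd.<id>                # remove its cephx key
    ceph osd rm <id>                      # remove it from the OSD map
    umount /var/lib/ceph/osd/ceph-<id>    # finally release the mount point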

Re: [ceph-users] Help with full osd and RGW not responsive

2017-10-17 Thread Andreas Calminder
Hi, you should most definitely look over the number of PGs; there's a PG calculator available here: http://ceph.com/pgcalc/ You can increase PGs but not the other way around ( http://docs.ceph.com/docs/jewel/rados/operations/placement-groups/). To solve the immediate problem with your cluster being fu
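
Concretely, the PG change looks like the commands below; the pool name and target count are placeholders, and the reweight at the end is only one common way to relieve a nearly full OSD, not necessarily what the reply goes on to suggest:

    # raise the placement group count (this cannot be lowered again)
    ceph osd pool set <rgw-data-pool> pg_num 512
    ceph osd pool set <rgw-data-pool> pgp_num 512
    # temporarily nudge data off the nearly full OSD while the rebalance runs
    ceph osd reweight <osd-id> 0.85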

Re: [ceph-users] Help with full osd and RGW not responsive

2017-10-17 Thread Bryan Banister
Thanks for the response, we increased our pg count to something more reasonable (512 for now) and things are rebalancing. Cheers, -Bryan From: Andreas Calminder [mailto:andreas.calmin...@klarna.com] Sent: Tuesday, October 17, 2017 3:48 PM To: Bryan Banister Cc: Ceph Users Subject: Re: [ceph-us

Re: [ceph-users] How to stop using (unmount) a failed OSD with BlueStore ?

2017-10-17 Thread Jamie Fargen
Alejandro- Please provide the following information: 1) Include an example of an actual message you are seeing in dmesg. 2) Provide the output of # ceph status 3) Provide the output of # ceph osd tree Regards, Jamie Fargen On Tue, Oct 17, 2017 at 4:34 PM, Alejandro Comisario wrote: > hi guys,

[ceph-users] How to increase the size of requests written to a ceph image

2017-10-17 Thread Russell Glaue
I am running ceph jewel on 5 nodes with SSD OSDs. I have an LVM image on a local RAID of spinning disks. I have an RBD image in a pool of SSD disks. Both disks are used to run an almost identical CentOS 7 system. Both systems were installed with the same kickstart, though the disk partitioning i

Re: [ceph-users] To check RBD cache enabled

2017-10-17 Thread Josy
Thanks for the reply. I added rbd_non_blocking_aio = false in ceph.conf and pushed the admin file to all nodes. - [client] admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok log file = /var/log/ceph/client.log debug rbd = 20 debug librbd = 20 rbd_non_blocking_aio = false

Re: [ceph-users] How to increase the size of requests written to a ceph image

2017-10-17 Thread Jason Dillaman
Take this with a grain of salt, but you could try passing "min_io_size=<X>,opt_io_size=<Y>" as part of QEMU's HD device parameters to see if the OS picks up the larger IO defaults and actually uses them: $ qemu <...snip...> -device driver=scsi-hd,<...snip...>,min_io_size=32768,opt_io_size=4194304 On T

Re: [ceph-users] To check RBD cache enabled

2017-10-17 Thread Jason Dillaman
Did you restart the librbd client application after updating the config value? On Tue, Oct 17, 2017 at 5:29 PM, Josy wrote: > Thanks for the reply. > > I added rbd_non_blocking_aio = false in ceph.conf and pushed the admin file > to all nodes. > > - > [client] > admin socket = /var/run/ceph/$

Re: [ceph-users] Efficient storage of small objects / bulk erasure coding

2017-10-17 Thread Gregory Farnum
On Tue, Oct 17, 2017 at 12:42 PM Jiri Horky wrote: > Hi list, > > we are thinking of building relatively big CEPH-based object storage for > storage of our sample files - we have about 700M files ranging from very > small (1-4KiB) files to pretty big ones (several GiB). Median of file > size is 6

Re: [ceph-users] OSD crashed while repairing inconsistent PG luminous

2017-10-17 Thread Gregory Farnum
On Tue, Oct 17, 2017 at 9:51 AM Ana Aviles wrote: > Hello all, > > We had an inconsistent PG on our cluster. While performing PG repair > operation the OSD crashed. The OSD was not able to start again anymore, > and there was no hardware failure on the disk itself. This is the log > output > > 20

Re: [ceph-users] Luminous : 3 clients failing to respond to cache pressure

2017-10-17 Thread Gregory Farnum
On Tue, Oct 17, 2017 at 6:36 AM Yoann Moulin wrote: > Hello, > > I have a luminous (12.2.1) cluster with 3 nodes for cephfs (no rbd or rgw) > and we hit the "X clients failing to respond to cache pressure" message. > I have 3 mds servers active. > > Is this something I have to worry about ? > Th

Re: [ceph-users] To check RBD cache enabled

2017-10-17 Thread Jean-Charles Lopez
Hi Josy, just a doubt, but it looks like your ASOK file is the one from a Ceph Manager. So my suspicion is that you may be running the command from the wrong machine. To run this command, you need to ssh into the machine where the client connection is being initiated. But maybe I am wrong rega

Re: [ceph-users] To check RBD cache enabled

2017-10-17 Thread Josy
Hi, I am running the command from the admin server, because there are no asok files on the client server; ls /var/run/ceph/ lists no files on the client server. >> As Jason points out, you also need to make sure that you restart the client connection for the changes in the ceph.conf file to

Re: [ceph-users] To check RBD cache enabled

2017-10-17 Thread Jason Dillaman
On Tue, Oct 17, 2017 at 6:30 PM, Josy wrote: > Hi, > > I am running the command from the admin server. > > Because there are no asok file in the client server > ls /var/run/ceph/ lists no files in the client server. Most likely a permissions or SELinux/AppArmor issue where the librbd client appl

Re: [ceph-users] OSD crashed while repairing inconsistent PG luminous

2017-10-17 Thread Mart van Santen
Hi Greg, (I'm a colleague of Ana), Thank you for your reply On 10/17/2017 11:57 PM, Gregory Farnum wrote: > > > On Tue, Oct 17, 2017 at 9:51 AM Ana Aviles > wrote: > > Hello all, > > We had an inconsistent PG on our cluster. While performing PG repair > op

Re: [ceph-users] How to stop using (unmount) a failed OSD with BlueStore ?

2017-10-17 Thread Alejandro Comisario
Jamie, thanks for replying; info is as follows: 1) [Fri Oct 13 10:21:24 2017] sd 0:2:23:0: [sdx] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE [Fri Oct 13 10:21:24 2017] sd 0:2:23:0: [sdx] tag#0 Sense Key : Medium Error [current] [Fri Oct 13 10:21:24 2017] sd 0:2:23:0: [sdx] tag#0 A

Re: [ceph-users] To check RBD cache enabled

2017-10-17 Thread Josy
I think it is a permission error, because when running ceph -s it shows this error at the top: $ ceph -s 2017-10-17 15:53:26.132180 7f7698834700 -1 asok(0x7f76940017a0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var

Re: [ceph-users] How to stop using (unmount) a failed OSD with BlueStore ?

2017-10-17 Thread Jamie Fargen
Alejandro- Those are kernel messages indicating that an error was encountered when data was sent to the storage device; they are not directly related to the operation of Ceph. The messages you sent also appear to have happened 4 days ago, on Friday, and if they have subsided then it probably means

Re: [ceph-users] How to stop using (unmount) a failed OSD with BlueStore ?

2017-10-17 Thread Alejandro Comisario
I believe you are absolutely right. It was my fault for not checking the dates before posting, my bad. Thanks for your help. Best. On Tue, Oct 17, 2017 at 8:14 PM, Jamie Fargen wrote: > Alejandro- > > Those are kernel messages indicating that an error was encountered > when data was sent to the

Re: [ceph-users] To check RBD cache enabled

2017-10-17 Thread Jean-Charles Lopez
Hi Josy, this is correct. Just make sure that your current user as well as the user for your VMs (if you are using a VM environment) are allowed to write to this directory. Also make sure that /var/run/ceph exists. Once you have fixed the permissions problem and made sure that the path where
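
A minimal sketch of what fixing the permissions can look like, assuming the client runs as a dedicated user such as qemu or ceph; adjust the owner and group to whatever actually opens the librbd connection:

    # make sure the socket directory exists and is writable by the client process
    sudo mkdir -p /var/run/ceph
    sudo chown <client-user>:<client-group> /var/run/ceph
    sudo chmod 770 /var/run/ceph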

Re: [ceph-users] [MONITOR SEGFAULT] Luminous cluster stuck when adding monitor

2017-10-17 Thread Nico Schottelius
Hello everyone, is there any solution in sight for this problem? Currently our cluster is stuck with a 2-monitor configuration, as every time we restart the one on server2, it crashes after some minutes (and in the meantime the cluster is stuck). Should we consider downgrading to Kraken to fix that probl

Re: [ceph-users] Thick provisioning

2017-10-17 Thread Wido den Hollander
> On 17 October 2017 at 19:38, Jason Dillaman wrote: > > > There is no existing option to thick provision images within RBD. When > an image is created or cloned, the only actions that occur are some > small metadata updates to describe the image. This allows image > creation to be a quick, co

Re: [ceph-users] Bluestore OSD_DATA, WAL & DB

2017-10-17 Thread Wido den Hollander
> On 17 October 2017 at 14:21, Mark Nelson wrote: > > > > > On 10/17/2017 01:54 AM, Wido den Hollander wrote: > > > >> On 16 October 2017 at 18:14, Richard Hesketh wrote: > >> > >> > >> On 16/10/17 13:45, Wido den Hollander wrote: > On 26 September 2017 at 16:39, Mark Nel