[ceph-users] New OSD with weight 0, rebalance still happen...

2018-11-22 Thread Marco Gaiarin


Ceph still surprises me: every time I'm sure I've fully understood it,
something 'strange' (to my knowledge) happens.


I need to decommission a server in my Ceph hammer cluster (3 nodes, 4 OSDs
per node), and for some reasons I cannot simply move the disks.
So I've added a new node, and yesterday I set up the 4 new OSDs.
My plan was to add the 4 OSDs with weight 0, and then slowly lower the
weight of the old OSDs while increasing the weight of the new ones.

Beforehand I ran:

ceph osd set noin

and then added the OSDs, and (as expected) the new OSDs started with weight 0.

But despite the weight being zero, a rebalance happened, and the
percentage of rebalanced data was 'weighted' to the size of the new disks
(e.g., I had about 18TB of space, I added 2TB of disks, and roughly 10% of
the data started to rebalance).


Why? Thanks.

-- 
dott. Marco Gaiarin GNUPG Key ID: 240A3D66
  Associazione ``La Nostra Famiglia''  http://www.lanostrafamiglia.it/
  Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento (PN)
  marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797

Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA!
  http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
(cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] New OSD with weight 0, rebalance still happen...

2018-11-22 Thread Paweł Sadowsk
Hi Marco,

On 11/22/18 9:22 AM, Marco Gaiarin wrote:
> 
> ...
> But despite the weight being zero, a rebalance happened, and the
> percentage of rebalanced data was 'weighted' to the size of the new disks
> (e.g., I had about 18TB of space, I added 2TB of disks, and roughly 10% of
> the data started to rebalance).
> 
> 
> Why? Thanks.
> 

We have made similar changes many times and it always behaved as expected.
Can you show your crushmap/ceph osd tree?

-- 
PS


Re: [ceph-users] RocksDB and WAL migration to new block device

2018-11-22 Thread Igor Fedotov

Hi Florian,


On 11/21/2018 7:01 PM, Florian Engelmann wrote:

Hi Igor,

sad to say, but I failed to build the tool. I tried to build the whole
project as documented here:


http://docs.ceph.com/docs/mimic/install/build-ceph/

But as my workstation is running Ubuntu, the binary fails on SLES:

./ceph-bluestore-tool --help
./ceph-bluestore-tool: symbol lookup error: ./ceph-bluestore-tool: 
undefined symbol: _ZNK7leveldb6Status8ToStringB5cxx11Ev


I copied all libraries to ~/lib and exported LD_LIBRARY_PATH, but that
did not solve the problem.


Is there any simple method to build the bluestore-tool standalone
and statically?



Unfortunately I don't know of such a method.

Maybe try hex editing instead?


All the best,
Florian


Am 11/21/18 um 9:34 AM schrieb Igor Fedotov:
Actually (given that your devices are already expanded) you don't
need to expand them again - you can just update the size labels with
my new PR.


For new migrations you can use the updated bluefs expand command, which
sets the size label automatically.



Thanks,
Igor
On 11/21/2018 11:11 AM, Florian Engelmann wrote:
Great support, Igor! Both thumbs up! We will try to build the tool
today and expand those bluefs devices once again.



Am 11/20/18 um 6:54 PM schrieb Igor Fedotov:

FYI: https://github.com/ceph/ceph/pull/25187


On 11/20/2018 8:13 PM, Igor Fedotov wrote:


On 11/20/2018 7:05 PM, Florian Engelmann wrote:

Am 11/20/18 um 4:59 PM schrieb Igor Fedotov:



On 11/20/2018 6:42 PM, Florian Engelmann wrote:

Hi Igor,



what's your Ceph version?


12.2.8 (SES 5.5 - patched to the latest version)



Can you also check the output for

ceph-bluestore-tool show-label -p 


ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/
infering bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-0//block": {
    "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
    "size": 8001457295360,
    "btime": "2018-06-29 23:43:12.088842",
    "description": "main",
    "bluefs": "1",
    "ceph_fsid": "a146-6561-307e-b032-c5cee2ee520c",
    "kv_backend": "rocksdb",
    "magic": "ceph osd volume v026",
    "mkfs_done": "yes",
    "ready": "ready",
    "whoami": "0"
    },
    "/var/lib/ceph/osd/ceph-0//block.wal": {
    "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
    "size": 524288000,
    "btime": "2018-06-29 23:43:12.098690",
    "description": "bluefs wal"
    },
    "/var/lib/ceph/osd/ceph-0//block.db": {
    "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
    "size": 524288000,
    "btime": "2018-06-29 23:43:12.098023",
    "description": "bluefs db"
    }
}





It should report a 'size' label for every volume; please check that
they contain the new values.




That's exactly the problem: neither "ceph-bluestore-tool show-label"
nor "ceph daemon osd.0 perf dump | jq '.bluefs'" recognized the new
sizes. But we are 100% sure the new devices are in use, as we already
deleted the old ones...


We tried to delete the "size" key and add a new one with the new
value, but:


ceph-bluestore-tool rm-label-key --dev 
/var/lib/ceph/osd/ceph-0/block.db -k size

key 'size' not present

even if:

ceph-bluestore-tool show-label --dev 
/var/lib/ceph/osd/ceph-0/block.db

{
    "/var/lib/ceph/osd/ceph-0/block.db": {
    "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
    "size": 524288000,
    "btime": "2018-06-29 23:43:12.098023",
    "description": "bluefs db"
    }
}

So it looks like the key "size" is "read-only"?


There was a bug in updating specific keys, see
https://github.com/ceph/ceph/pull/24352

This PR also eliminates the need to set sizes manually on
bdev-expand.


I thought it had been backported to Luminous, but it looks like
it wasn't.

Will submit a PR shortly.




Thank you so much, Igor! So we have to decide how to proceed.
Maybe you could help us here as well.


Option A: Wait for this fix to be available. -> could take weeks
or even months

If you can build a custom version of ceph-bluestore-tool then this
is a short path. I'll submit a patch today or tomorrow, which you
would need to integrate into your private build.

Then you only need to upgrade the tool and apply the new sizes.



Option B: Recreate the OSDs one by one. -> will take a very long
time as well

No need for that, IMO.


Option C: Is there some low-level command allowing us to fix
those sizes?

Well, a hex editor might help here as well. What you need is just to
update the 64-bit size value in the block.db and block.wal files. In my
lab I find it at offset 0x52. Most probably this is a fixed location,
but it's better to check beforehand - the existing value should match
the one reported by show-label. Or I can do that for you - please send
me the first 4K chunks along with the corresponding label reports.
Then update with the new values - the field has to contain exactly the
same size as your new partition.
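The manual patch Igor describes can be scripted rather than done in an interactive hex editor. A minimal sketch (the 0x52 offset and little-endian 64-bit layout are assumptions to verify against your own labels, exactly as Igor advises; this is not an official tool):

```python
import struct

def patch_label_size(dev_path, new_size, offset=0x52, old_size=None):
    """Overwrite the 64-bit little-endian size field in a BlueStore
    label file/device. If old_size is given, verify the existing
    value first (it should match what show-label reports)."""
    with open(dev_path, "r+b") as f:
        f.seek(offset)
        (current,) = struct.unpack("<Q", f.read(8))
        if old_size is not None and current != old_size:
            raise ValueError(
                f"expected {old_size} at 0x{offset:x}, found {current}")
        f.seek(offset)
        f.write(struct.pack("<Q", new_size))
```

Run it against a copy of the first 4K of the device first, and only against the real block.db/block.wal once the old value checks out.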










Thanks,

Igor



Re: [ceph-users] New OSD with weight 0, rebalance still happen...

2018-11-22 Thread Marco Gaiarin
Mandi! Paweł Sadowsk
  In chel di` si favelave...

> We have made similar changes many times and it always behaved as expected.

Ok. Good.

> Can you show your crushmap/ceph osd tree?

Sure!

 root@blackpanther:~# ceph osd tree
 ID WEIGHT   TYPE NAME               UP/DOWN REWEIGHT PRIMARY-AFFINITY
 -1 21.83984 root default
 -2  5.45996     host capitanamerica
  0  1.81999         osd.0                up  1.00000          1.00000
  1  1.81999         osd.1                up  1.00000          1.00000
 10  0.90999         osd.10               up  1.00000          1.00000
 11  0.90999         osd.11               up  1.00000          1.00000
 -3  5.45996     host vedovanera
  2  1.81999         osd.2                up  1.00000          1.00000
  3  1.81999         osd.3                up  1.00000          1.00000
  4  0.90999         osd.4                up  1.00000          1.00000
  5  0.90999         osd.5                up  1.00000          1.00000
 -4  5.45996     host deadpool
  6  1.81999         osd.6                up  1.00000          1.00000
  7  1.81999         osd.7                up  1.00000          1.00000
  8  0.90999         osd.8                up  1.00000          1.00000
  9  0.90999         osd.9                up  1.00000          1.00000
 -5  5.45996     host blackpanther
 12  1.81999         osd.12               up  0.04999          1.00000
 13  1.81999         osd.13               up  0.04999          1.00000
 14  0.90999         osd.14               up  0.04999          1.00000
 15  0.90999         osd.15               up  0.04999          1.00000

OSDs 12-15 are the new OSDs; after creating them with 'noin' I
reweighted them to '0.05' (as a test).


Crush map attached. Thanks.

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14
device 15 osd.15

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host capitanamerica {
id -2   # do not change unnecessarily
# weight 5.460
alg straw
hash 0  # rjenkins1
item osd.0 weight 1.820
item osd.1 weight 1.820
item osd.10 weight 0.910
item osd.11 weight 0.910
}
host vedovanera {
id -3   # do not change unnecessarily
# weight 5.460
alg straw
hash 0  # rjenkins1
item osd.2 weight 1.820
item osd.3 weight 1.820
item osd.4 weight 0.910
item osd.5 weight 0.910
}
host deadpool {
id -4   # do not change unnecessarily
# weight 5.460
alg straw
hash 0  # rjenkins1
item osd.6 weight 1.820
item osd.7 weight 1.820
item osd.8 weight 0.910
item osd.9 weight 0.910
}
host blackpanther {
id -5   # do not change unnecessarily
# weight 5.460
alg straw
hash 0  # rjenkins1
item osd.12 weight 1.820
item osd.13 weight 1.820
item osd.14 weight 0.910
item osd.15 weight 0.910
}
root default {
id -1   # do not change unnecessarily
# weight 21.840
alg straw
hash 0  # rjenkins1
item capitanamerica weight 5.460
item vedovanera weight 5.460
item deadpool weight 5.460
item blackpanther weight 5.460
}

# rules
rule replicated_ruleset {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}

# end crush map


Re: [ceph-users] New OSD with weight 0, rebalance still happen...

2018-11-22 Thread Jarek
On Thu, 22 Nov 2018 12:05:12 +0100
Marco Gaiarin  wrote:

> Mandi! Paweł Sadowsk
>   In chel di` si favelave...
> 
> > We have made similar changes many times and it always behaved as
> > expected.
> 
> Ok. Good.
> 
> > Can you show your crushmap/ceph osd tree?
> 
> Sure!
> 
>  root@blackpanther:~# ceph osd tree
>  ID WEIGHT   TYPE NAME               UP/DOWN REWEIGHT PRIMARY-AFFINITY
>  -1 21.83984 root default
>  -2  5.45996     host capitanamerica
>   0  1.81999         osd.0                up  1.00000          1.00000
>   1  1.81999         osd.1                up  1.00000          1.00000
>  10  0.90999         osd.10               up  1.00000          1.00000
>  11  0.90999         osd.11               up  1.00000          1.00000
>  -3  5.45996     host vedovanera
>   2  1.81999         osd.2                up  1.00000          1.00000
>   3  1.81999         osd.3                up  1.00000          1.00000
>   4  0.90999         osd.4                up  1.00000          1.00000
>   5  0.90999         osd.5                up  1.00000          1.00000
>  -4  5.45996     host deadpool
>   6  1.81999         osd.6                up  1.00000          1.00000
>   7  1.81999         osd.7                up  1.00000          1.00000
>   8  0.90999         osd.8                up  1.00000          1.00000
>   9  0.90999         osd.9                up  1.00000          1.00000
>  -5  5.45996     host blackpanther
>  12  1.81999         osd.12               up  0.04999          1.00000
>  13  1.81999         osd.13               up  0.04999          1.00000
>  14  0.90999         osd.14               up  0.04999          1.00000
>  15  0.90999         osd.15               up  0.04999          1.00000
> 
> OSDs 12-15 are the new OSDs; after creating them with 'noin' I
> reweighted them to '0.05' (as a test).
> 
> 
> Crush map attached. Thanks.

When an OSD is added, even with the noin flag, the weight of the host is
changed, which triggers a rebalance.
Instead of the noin flag, set 'osd crush initial weight = 0' in
ceph.conf.
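As a sketch of the workflow Jarek describes (the OSD id and weight steps are illustrative; verify against your own deployment before running anything):

```shell
# ceph.conf on the nodes where new OSDs are created:
#   [osd]
#   osd crush initial weight = 0
#
# New OSDs then join the CRUSH map with weight 0, so the host bucket's
# weight is unchanged and no rebalance is triggered at creation time.

# Afterwards, raise each new OSD's CRUSH weight gradually, e.g.:
ceph osd crush reweight osd.12 0.1
ceph osd crush reweight osd.12 0.5   # ...and so on, up toward the disk's size in TiB
```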

-- 
Pozdrawiam
Jarosław Mociak - Nettelekom GK Sp. z o.o.




Re: [ceph-users] New OSD with weight 0, rebalance still happen...

2018-11-22 Thread Zongyou Yao
The reason for the rebalance is that you are using the straw algorithm. If
you switch to straw2, no data will be moved.
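For intuition: straw2 gives every bucket item an independent "straw" drawn from a hash and scaled by the item's weight, then picks the longest straw, so adding an item can only move data *to* that item, never shuffle data between existing items. A toy sketch of that selection structure (this is not the real CRUSH hash, just the same math):

```python
import hashlib
import math

def straw2_draw(pg, item, weight):
    # Deterministic pseudo-uniform u in (0, 1) per (pg, item), standing in
    # for CRUSH's hash; straw2 scales ln(u) by the item weight.
    h = hashlib.sha256(f"{pg}:{item}".encode()).digest()
    u = (int.from_bytes(h[:8], "big") + 1) / (2**64 + 2)
    return math.log(u) / weight   # negative; higher weight -> "longer" straw

def straw2_select(pg, items):
    # Pick the item with the longest straw.
    return max(items, key=lambda i: straw2_draw(pg, i, items[i]))

old = {"osd.0": 1.82, "osd.1": 1.82, "osd.2": 0.91}
new = {**old, "osd.3": 0.91}   # add one item to the bucket

# Adding osd.3 does not change the straws of osd.0-2, so every PG either
# keeps its old mapping or moves to the new item:
moved_between_old = [
    pg for pg in range(2000)
    if straw2_select(pg, new) not in (straw2_select(pg, old), "osd.3")
]
```

With the original straw algorithm the per-item draws are not independent of the weight set, so a weight change can remap data between unrelated items as well.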


From: ceph-users  on behalf of Jarek 

Sent: Thursday, November 22, 2018 19:22
To: Marco Gaiarin
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] New OSD with weight 0, rebalance still happen...

On Thu, 22 Nov 2018 12:05:12 +0100
Marco Gaiarin  wrote:

> Mandi! Paweł Sadowsk
>   In chel di` si favelave...
>
> > We have made similar changes many times and it always behaved as
> > expected.
>
> Ok. Good.
>
> > Can you show your crushmap/ceph osd tree?
>
> Sure!
>
>  root@blackpanther:~# ceph osd tree
>  ID WEIGHT   TYPE NAME               UP/DOWN REWEIGHT PRIMARY-AFFINITY
>  -1 21.83984 root default
>  -2  5.45996     host capitanamerica
>   0  1.81999         osd.0                up  1.00000          1.00000
>   1  1.81999         osd.1                up  1.00000          1.00000
>  10  0.90999         osd.10               up  1.00000          1.00000
>  11  0.90999         osd.11               up  1.00000          1.00000
>  -3  5.45996     host vedovanera
>   2  1.81999         osd.2                up  1.00000          1.00000
>   3  1.81999         osd.3                up  1.00000          1.00000
>   4  0.90999         osd.4                up  1.00000          1.00000
>   5  0.90999         osd.5                up  1.00000          1.00000
>  -4  5.45996     host deadpool
>   6  1.81999         osd.6                up  1.00000          1.00000
>   7  1.81999         osd.7                up  1.00000          1.00000
>   8  0.90999         osd.8                up  1.00000          1.00000
>   9  0.90999         osd.9                up  1.00000          1.00000
>  -5  5.45996     host blackpanther
>  12  1.81999         osd.12               up  0.04999          1.00000
>  13  1.81999         osd.13               up  0.04999          1.00000
>  14  0.90999         osd.14               up  0.04999          1.00000
>  15  0.90999         osd.15               up  0.04999          1.00000
>
> OSDs 12-15 are the new OSDs; after creating them with 'noin' I
> reweighted them to '0.05' (as a test).
>
>
> Crush map attached. Thanks.

When an OSD is added, even with the noin flag, the weight of the host is
changed, which triggers a rebalance.
Instead of the noin flag, set 'osd crush initial weight = 0' in
ceph.conf.

--
Pozdrawiam
Jarosław Mociak - Nettelekom GK Sp. z o.o.


[ceph-users] Should ceph build against libcurl4 for Ubuntu 18.04 and later?

2018-11-22 Thread Matthew Vernon

Hi,

The ceph.com Ceph Luminous packages for Ubuntu Bionic still depend on
libcurl3 (specifically, ceph-common, radosgw and librgw2 all depend on
libcurl3 (>= 7.28.0)).


This means that anything that depends on libcurl4 (which is the default 
libcurl in bionic) isn't co-installable with ceph. That includes the 
"curl" binary itself, which we've been using in a number of our scripts 
/ tests / etc. I would expect this to make ceph-test uninstallable on 
Bionic also...


...so shouldn't ceph packages for Bionic and later releases be compiled 
against libcurl4 (and thus Depend upon it)? The same will apply to the 
next Debian release, I expect.


The curl authors claim the API doesn't have any incompatible changes.

Regards,

Matthew
[the two packages libcurl3 and libcurl4 are not co-installable because 
libcurl3 includes a libcurl.so.4 for historical reasons :-( ]



--
The Wellcome Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 


Re: [ceph-users] Should ceph build against libcurl4 for Ubuntu 18.04 and later?

2018-11-22 Thread Richard Hesketh
Bionic's mimic packages do seem to depend on libcurl4 already, for what
that's worth:

root@vm-gw-1:/# apt-cache depends ceph-common
ceph-common
...
  Depends: libcurl4


On 22/11/2018 12:40, Matthew Vernon wrote:
> Hi,
> 
> The ceph.com ceph luminous packages for Ubuntu Bionic still depend on
> libcurl3 (specifically ceph-common, radosgw. librgw2 all depend on
> libcurl3 (>= 7.28.0)).
> 
> This means that anything that depends on libcurl4 (which is the default
> libcurl in bionic) isn't co-installable with ceph. That includes the
> "curl" binary itself, which we've been using in a number of our scripts
> / tests / etc. I would expect this to make ceph-test uninstallable on
> Bionic also...
> 
> ...so shouldn't ceph packages for Bionic and later releases be compiled
> against libcurl4 (and thus Depend upon it)? The same will apply to the
> next Debian release, I expect.
> 
> The curl authors claim the API doesn't have any incompatible changes.
> 
> Regards,
> 
> Matthew
> [the two packages libcurl3 and libcurl4 are not co-installable because
> libcurl3 includes a libcurl.so.4 for historical reasons :-( ]
> 
> 






Re: [ceph-users] New OSD with weight 0, rebalance still happen...

2018-11-22 Thread Marco Gaiarin
Mandi! Zongyou Yao
  In chel di` si favelave...

> The reason for the rebalance is that you are using the straw algorithm.
> If you switch to straw2, no data will be moved.

I'm still on hammer, so:

http://docs.ceph.com/docs/hammer/rados/operations/crush-map/

it seems there's no 'straw2'...



Re: [ceph-users] Should ceph build against libcurl4 for Ubuntu 18.04 and later?

2018-11-22 Thread Paul Emmerich
We've encountered the same problem on Debian Buster.

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

Am Do., 22. Nov. 2018 um 13:58 Uhr schrieb Richard Hesketh
:
>
> Bionic's mimic packages do seem to depend on libcurl4 already, for what
> that's worth:
>
> root@vm-gw-1:/# apt-cache depends ceph-common
> ceph-common
> ...
>   Depends: libcurl4
>
>
> On 22/11/2018 12:40, Matthew Vernon wrote:
> > Hi,
> >
> > The ceph.com ceph luminous packages for Ubuntu Bionic still depend on
> > libcurl3 (specifically ceph-common, radosgw. librgw2 all depend on
> > libcurl3 (>= 7.28.0)).
> >
> > This means that anything that depends on libcurl4 (which is the default
> > libcurl in bionic) isn't co-installable with ceph. That includes the
> > "curl" binary itself, which we've been using in a number of our scripts
> > / tests / etc. I would expect this to make ceph-test uninstallable on
> > Bionic also...
> >
> > ...so shouldn't ceph packages for Bionic and later releases be compiled
> > against libcurl4 (and thus Depend upon it)? The same will apply to the
> > next Debian release, I expect.
> >
> > The curl authors claim the API doesn't have any incompatible changes.
> >
> > Regards,
> >
> > Matthew
> > [the two packages libcurl3 and libcurl4 are not co-installable because
> > libcurl3 includes a libcurl.so.4 for historical reasons :-( ]
> >
> >
>
>


Re: [ceph-users] radosgw, Keystone integration, and the S3 API

2018-11-22 Thread Florian Haas
On 19/11/2018 16:23, Florian Haas wrote:
> Hi everyone,
> 
> I've recently started a documentation patch to better explain Swift
> compatibility and OpenStack integration for radosgw; a WIP PR is at
> https://github.com/ceph/ceph/pull/25056/. I have, however, run into an
> issue that I would really *like* to document, except I don't know
> whether what I'm seeing is how things are supposed to work. :)
> 
> This is about multi-tenancy in radosgw, in combination with S3
> authentication via Keystone (and EC2-compatible credentials generated
> from OpenStack, as explained in my doc patch). Now, when I enable
> rgw_s3_use_keystone_auth and rgw_keystone_implicit_tenants, then, if I
> create an S3 bucket in radosgw for the first time, naming that bucket
> "foo", the following things happen:
> 
> * I see a user that has been created, and that I can query with
>   "radosgw-admin user info", that is named
>   ff569d377ecb4f77875fa1b3f89eb16f$ff569d377ecb4f77875fa1b3f89eb16f
>   (that is, the Keystone tenant/project UUID twice[1], separated by a $
>   character). Its display_name is the name of my tenant.
> 
> * With "radosgw-admin bucket list
> --uid='ff569d377ecb4f77875fa1b3f89eb16f$ff569d377ecb4f77875fa1b3f89eb16f'",
>   I see a bucket that has been created, and that has been named "foo".
> 
> So far, all is well. If I do this, then I can see a bucket named
> "foo" if I use an S3 client, and I can see a container named "foo",
> with identical content, if I use the Swift API.
> 
> Now, if I enable rgw_swift_account_in_url, and update my Keystone
> object store endpoint to include AUTH_%(tenant_id)s, then using the
> Swift API I can also use public ACLs and temp URLs.
> 
> However, I am stumped trying to understand how exactly this is meant
> to work with the S3 API.
> 
> So I have two questions:
> 
> (1) What do I have to do to get publicly-readable buckets to work in
> the Keystone-authenticated scenario? Moreover, what is the correct
> path to use, for a non-S3 client like curl or a browser, to access
> an object? It seems that using
> http://host:port/ff569d377ecb4f77875fa1b3f89eb16f:foo/bar works
> for S3 objects with a public ACL set, but if I try to use the same
> approach with a signed object, I get a 403 with
> SignatureDoesNotMatch. It seems like what I have to use for a
> signed object is, instead,
> 
> http://host:port/foo/bar?AWSAccessKeyId=something&Expires=something&Signature=something.
> However, if I do *ask* for a signed object that includes the
> tenant name, as in "s3cmd signurl
> s3://5ed51981f4a8468292bf2c578806ebf:foo/bar +120", then I *can*
> use the same URL format as for public ACL objects. Is this the
> intended behavior? If so, does that mean that an application
> using the S3 API, and access/secret keys from OpenStack-backed
> EC2, should always configure itself to use the ":"
> prefix to precede the bucket name?
> 
> (2) Do I understand the documentation
> (http://docs.ceph.com/docs/mimic/radosgw/multitenancy/#s3)
> correctly in that whenever one uses multitenancy of any kind in
> radosgw, S3 bucket hostnames can't ever be used? Thus, is it correct
> to say that if a radosgw instance is meant to *only* ever
> authenticate its users against Keystone, where there is always a
> radosgw tenant that is being created, then it's pointless to set
> rgw_dns_name?
> 
> 
> If anyone could shed a light on the above, I can write up the answer and
> amend the doc patch. Thanks!

OK I *think* I've got this fairly well figured out and I've dropped the
WIP prefix from my doc patch:

https://github.com/ceph/ceph/pull/25056

As this is a documentation patch, you really don't need to be a radosgw
developer to review it — if there's anything you find unclear or plain
wrong by your experience, please do let me know; I'd much appreciate that.

> [1] This would be an additional question: why is the project UUID in
> there *twice*? Surely there's a good cause for that, but it presently
> escapes me. http://docs.ceph.com/docs/master/radosgw/multitenancy/ says
> "TBD – don’t forget to explain the function of rgw keystone implicit
> tenants = true" here, which isn't very helpful. :)

Although I've covered that TBD in my patch, the question of why the
tenant name is duplicated in the radosgw user name is something I still
haven't been able to suss out. So if anyone can enlighten me there,
that'd be excellent too. :)

Cheers,
Florian






Re: [ceph-users] New OSD with weight 0, rebalance still happen...

2018-11-22 Thread Paweł Sadowsk
On 11/22/18 12:22 PM, Jarek wrote:
> On Thu, 22 Nov 2018 12:05:12 +0100
> Marco Gaiarin  wrote:
> 
>> Mandi! Paweł Sadowsk
>>   In chel di` si favelave...
>>
>>> We have made similar changes many times and it always behaved as
>>> expected.
>>
>> Ok. Good.
>>
>>> Can you show your crushmap/ceph osd tree?
>>
>> Sure!
>>
>>  root@blackpanther:~# ceph osd tree
>>  ID WEIGHT   TYPE NAME               UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>  -1 21.83984 root default
>>  -2  5.45996     host capitanamerica
>>   0  1.81999         osd.0                up  1.00000          1.00000
>>   1  1.81999         osd.1                up  1.00000          1.00000
>>  10  0.90999         osd.10               up  1.00000          1.00000
>>  11  0.90999         osd.11               up  1.00000          1.00000
>>  -3  5.45996     host vedovanera
>>   2  1.81999         osd.2                up  1.00000          1.00000
>>   3  1.81999         osd.3                up  1.00000          1.00000
>>   4  0.90999         osd.4                up  1.00000          1.00000
>>   5  0.90999         osd.5                up  1.00000          1.00000
>>  -4  5.45996     host deadpool
>>   6  1.81999         osd.6                up  1.00000          1.00000
>>   7  1.81999         osd.7                up  1.00000          1.00000
>>   8  0.90999         osd.8                up  1.00000          1.00000
>>   9  0.90999         osd.9                up  1.00000          1.00000
>>  -5  5.45996     host blackpanther
>>  12  1.81999         osd.12               up  0.04999          1.00000
>>  13  1.81999         osd.13               up  0.04999          1.00000
>>  14  0.90999         osd.14               up  0.04999          1.00000
>>  15  0.90999         osd.15               up  0.04999          1.00000
>>
>> OSDs 12-15 are the new OSDs; after creating them with 'noin' I
>> reweighted them to '0.05' (as a test).
>>
>>
>> Crush map attached. Thanks.
> 
> When an OSD is added, even with the noin flag, the weight of the host is
> changed, which triggers a rebalance.
> Instead of the noin flag, set 'osd crush initial weight = 0' in
> ceph.conf.

Exactly: your 'new' OSDs have weight 1.81999 (osd.12, osd.13) and 0.90999
(osd.14, osd.15). As Jarek pointed out, you should add them using

  'osd crush initial weight = 0'

and then use

  'ceph osd crush reweight osd.x 0.05'

to slowly increase their weight.

From your osd tree it looks like you used 'ceph osd reweight' instead.
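The distinction Paweł is drawing, as a sketch (osd.12 used as an illustrative id):

```shell
# 'ceph osd reweight' sets a temporary 0..1 override (the REWEIGHT column
# in 'ceph osd tree'); the OSD's CRUSH weight, and thus the host bucket's
# weight, is NOT changed:
ceph osd reweight 12 0.05

# 'ceph osd crush reweight' changes the weight stored in the CRUSH map
# itself (the WEIGHT column), which is what actually controls how much
# data the OSD and its host attract:
ceph osd crush reweight osd.12 0.05
```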

-- 
PS


[ceph-users] Ceph Bluestore : Deep Scrubbing vs Checksums

2018-11-22 Thread Eddy Castillon
Hello dear Ceph users,

We are running a Ceph cluster on Luminous (BlueStore). As you may know,
this Ceph version has a feature called "checksums". I would like to ask
whether this feature replaces deep scrubbing. In our cluster we run a deep
scrub every month, but the performance impact is high.

Source:  ceph's documentation:

Checksums

BlueStore calculates, stores, and verifies checksums for all data and
metadata it stores. Any time data is read off of disk, a checksum is used
to verify the data is correct before it is exposed to any other part of the
system (or the user).
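Note that checksums catch corruption only at read time; deep scrub still adds value by proactively reading (and thus verifying) replicas that clients never touch. A toy sketch of per-block verify-on-read, with zlib's crc32 standing in for BlueStore's default crc32c:

```python
import zlib

class ChecksummedStore:
    """Toy store: keep a CRC per block and verify it on every read."""
    def __init__(self):
        self._blocks = {}   # block_id -> (data, crc)

    def write(self, block_id, data: bytes):
        self._blocks[block_id] = (bytearray(data), zlib.crc32(data))

    def read(self, block_id) -> bytes:
        data, crc = self._blocks[block_id]
        if zlib.crc32(bytes(data)) != crc:
            # BlueStore would retry from another replica / report an error
            raise IOError(f"checksum mismatch on block {block_id}")
        return bytes(data)

    def corrupt(self, block_id, offset):
        # Simulate silent on-disk corruption (a flipped byte).
        self._blocks[block_id][0][offset] ^= 0xFF
```

The point of the sketch: corruption is detected the moment the block is read, but a block nobody reads stays silently corrupt until something (like a scrub) reads it.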



Sincerely,

Eddy Castillon
+51 934121504
eddy.castil...@qualifacts.com

Qualifacts, Inc. 


Re: [ceph-users] New OSD with weight 0, rebalance still happen...

2018-11-22 Thread Marco Gaiarin
Mandi! Paweł Sadowsk
  In chel di` si favelave...

> From your osd tree it looks like you used 'ceph osd reweight'.

Yes, and I thought I was doing the right thing!

Now I've tried to lower the weight of the OSDs to be decommissioned, using:
ceph osd reweight 2 0.95

leading to an osd tree like:

 root@blackpanther:~# ceph osd tree
 ID WEIGHT   TYPE NAME               UP/DOWN REWEIGHT PRIMARY-AFFINITY
 -1 21.83984 root default
 -2  5.45996     host capitanamerica
  0  1.81999         osd.0                up  1.00000          1.00000
  1  1.81999         osd.1                up  1.00000          1.00000
 10  0.90999         osd.10               up  1.00000          1.00000
 11  0.90999         osd.11               up  1.00000          1.00000
 -3  5.45996     host vedovanera
  2  1.81999         osd.2                up  0.95000          1.00000
  3  1.81999         osd.3                up  1.00000          1.00000
  4  0.90999         osd.4                up  1.00000          1.00000
  5  0.90999         osd.5                up  1.00000          1.00000
 -4  5.45996     host deadpool
  6  1.81999         osd.6                up  1.00000          1.00000
  7  1.81999         osd.7                up  1.00000          1.00000
  8  0.90999         osd.8                up  1.00000          1.00000
  9  0.90999         osd.9                up  1.00000          1.00000
 -5  5.45996     host blackpanther
 12  1.81999         osd.12               up  0.04999          1.00000
 13  1.81999         osd.13               up  0.04999          1.00000
 14  0.90999         osd.14               up  0.04999          1.00000
 15  0.90999         osd.15               up  0.04999          1.00000

and, after rebalancing, to:

 root@blackpanther:~# ceph -s
cluster 8794c124-c2ec-4e81-8631-742992159bd6
 health HEALTH_WARN
6 pgs stuck unclean
recovery 4/2550363 objects degraded (0.000%)
recovery 11282/2550363 objects misplaced (0.442%)
 monmap e6: 6 mons at 
{0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0,4=10.27.251.9:6789/0,blackpanther=10.27.251.2:6789/0}
election epoch 2750, quorum 0,1,2,3,4,5 blackpanther,0,1,4,2,3
 osdmap e7300: 16 osds: 16 up, 16 in; 6 remapped pgs
  pgmap v54737590: 768 pgs, 3 pools, 3299 GB data, 830 kobjects
9870 GB used, 12474 GB / 22344 GB avail
4/2550363 objects degraded (0.000%)
11282/2550363 objects misplaced (0.442%)
 761 active+clean
   6 active+remapped
   1 active+clean+scrubbing
  client io 13476 B/s rd, 654 kB/s wr, 95 op/s

Why are there PGs stuck in an 'unclean' state?




Re: [ceph-users] Should ceph build against libcurl4 for Ubuntu 18.04 and later?

2018-11-22 Thread Matthew Vernon

On 22/11/2018 13:40, Paul Emmerich wrote:

We've encountered the same problem on Debian Buster


It looks to me like this could be fixed simply by building the Bionic
packages in a Bionic chroot (ditto Buster); maybe that could be done in
future? Especially given that the packaging process is being reviewed at
the moment anyway (hopefully 12.2.10 will be along at some point...).


Regards,

Matthew




[ceph-users] Full L3 Ceph

2018-11-22 Thread Lazuardi Nasution
Hi,

I'm looking for an example Ceph configuration and topology for a full
layer-3 network deployment. Maybe all daemons can use a loopback alias
address in this case, but how should the cluster network and public
network then be configured - using a supernet? I think using loopback
alias addresses can prevent the daemons from going down when a physical
interface is disconnected, and can load-balance traffic across the
physical interfaces without interface bonding, using ECMP instead.
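As a hedged sketch of the ceph.conf side only (all addresses are illustrative assumptions; the routing/ECMP part lives entirely in your fabric, not in Ceph): with each daemon bound to a /32 loopback alias advertised into the routed network, the public/cluster networks can simply be the supernets those loopbacks are allocated from:

```ini
; ceph.conf - illustrative supernets for a routed/ECMP fabric
[global]
public network  = 10.10.0.0/16   ; supernet holding all public loopback /32s
cluster network = 10.20.0.0/16   ; supernet holding all cluster loopback /32s

; each daemon binds a loopback alias inside those ranges, e.g.:
[osd.0]
public addr  = 10.10.0.1
cluster addr = 10.20.0.1
```

The physical links then carry only routed traffic, and multipath between them is the routing protocol's job (ECMP), not Ceph's.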

Best regards,


Re: [ceph-users] New OSD with weight 0, rebalance still happen...

2018-11-22 Thread Paweł Sadowski

On 11/22/18 6:12 PM, Marco Gaiarin wrote:

Mandi! Paweł Sadowsk
   In chel di` si favelave...


 From your osd tree it looks like you used 'ceph osd reweight'.

Yes, and I thought I was doing the right thing!

Now I've tried to lower the weight of the OSDs to be decommissioned, using:
ceph osd reweight 2 0.95

leading to an osd map tree like:

  root@blackpanther:~# ceph osd tree
  ID WEIGHT   TYPE NAME               UP/DOWN REWEIGHT PRIMARY-AFFINITY
  -1 21.83984 root default
  -2  5.45996     host capitanamerica
   0  1.81999         osd.0                up  1.00000          1.00000
   1  1.81999         osd.1                up  1.00000          1.00000
  10  0.90999         osd.10               up  1.00000          1.00000
  11  0.90999         osd.11               up  1.00000          1.00000
  -3  5.45996     host vedovanera
   2  1.81999         osd.2                up  0.95000          1.00000
   3  1.81999         osd.3                up  1.00000          1.00000
   4  0.90999         osd.4                up  1.00000          1.00000
   5  0.90999         osd.5                up  1.00000          1.00000
  -4  5.45996     host deadpool
   6  1.81999         osd.6                up  1.00000          1.00000
   7  1.81999         osd.7                up  1.00000          1.00000
   8  0.90999         osd.8                up  1.00000          1.00000
   9  0.90999         osd.9                up  1.00000          1.00000
  -5  5.45996     host blackpanther
  12  1.81999         osd.12               up  0.04999          1.00000
  13  1.81999         osd.13               up  0.04999          1.00000
  14  0.90999         osd.14               up  0.04999          1.00000
  15  0.90999         osd.15               up  0.04999          1.00000

and, after rebalancing, to:

  root@blackpanther:~# ceph -s
 cluster 8794c124-c2ec-4e81-8631-742992159bd6
  health HEALTH_WARN
 6 pgs stuck unclean
 recovery 4/2550363 objects degraded (0.000%)
 recovery 11282/2550363 objects misplaced (0.442%)
  monmap e6: 6 mons at 
{0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0,4=10.27.251.9:6789/0,blackpanther=10.27.251.2:6789/0}
 election epoch 2750, quorum 0,1,2,3,4,5 blackpanther,0,1,4,2,3
  osdmap e7300: 16 osds: 16 up, 16 in; 6 remapped pgs
   pgmap v54737590: 768 pgs, 3 pools, 3299 GB data, 830 kobjects
 9870 GB used, 12474 GB / 22344 GB avail
 4/2550363 objects degraded (0.000%)
 11282/2550363 objects misplaced (0.442%)
  761 active+clean
6 active+remapped
1 active+clean+scrubbing
   client io 13476 B/s rd, 654 kB/s wr, 95 op/s

Why are there PGs stuck in the 'unclean' state?

This is most probably due to the big difference in weights between your 
hosts (the new one has a 20x lower weight than the old ones), which in 
combination with the straw algorithm is a 'known' issue. You could try to 
increase *choose_total_tries* in your CRUSH map from 50 to some bigger 
number. The best option IMO would be to switch to straw2 (which will cause 
some rebalancing) and then use 'ceph osd crush reweight' (instead of 'ceph 
osd reweight') in small steps to slowly rebalance data onto the new OSDs.
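
A rough sketch of those two steps (verify on a test cluster first; on a 
Hammer cluster straw2 needs the hammer tunable profile and clients new 
enough to understand it, and the target weight below is taken from the 
osd tree above):

```shell
# Enable the hammer tunable profile, which adds straw2 support
# (this and the bucket conversion below trigger some rebalance):
ceph osd crush tunables hammer

# Convert existing straw buckets to straw2 by editing the decompiled map:
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
sed -i 's/alg straw$/alg straw2/' crushmap.txt
crushtool -c crushmap.txt -o crushmap-new.bin
ceph osd setcrushmap -i crushmap-new.bin

# Then raise the new OSDs' CRUSH weights in small steps, waiting for
# the cluster to return to HEALTH_OK between steps:
for w in 0.2 0.5 1.0 1.5 1.81999; do
    ceph osd crush reweight osd.12 "$w"
    # ...wait for rebalance to finish before the next step
done
```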


--
PS



Re: [ceph-users] How you handle failing/slow disks?

2018-11-22 Thread Alex Litvak

Sorry for hijacking a thread, but do you have an idea of what to watch for?

I monitor the admin sockets of my OSDs, and occasionally I see a burst of 
both op_w_process_latency and op_w_latency to near 150 - 200 ms on 7200 RPM 
enterprise SAS drives. For example, the load average on the node jumps up 
while the CPU is 97% idle, and out of 12 OSDs some have an op_w_latency of 
170 - 180 ms, 3 more are at ~120 - 130 ms, and the rest are at 100 ms or 
below. Does this say anything about a possible drive failure (I am running 
the drives inside a Dell PowerVault MD3400, and the storage unit shows them 
all green/OK)? Unfortunately, smartmontools outside the box tells me 
nothing other than that health is OK.


The high load usually corresponds to moments when the op_w_latency spike 
affects multiple OSDs (4 or more) at the same time.

On 11/21/2018 10:26 AM, Paul Emmerich wrote:

Yeah, we also observed problems with HP raid controllers misbehaving
when a single disk starts to fail. We would never recommend building a
Ceph cluster on HP raid controllers until they can fix that issue.

There are several features in Ceph which detect dead disks: there are
timeouts for OSDs checking each other and there's a timeout for OSDs
checking in with the mons. But that's usually not enough in this
scenario. The good news is that recent Ceph versions will show which
OSDs are implicated in slow requests (check ceph health detail) which
at least gives you some way to figure out which OSDs are becoming
slow.

We have found it to be useful to monitor the op_*_latency values of
all OSDs (especially subop latencies) from the admin daemon to detect
such failures earlier.
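
Post-processing those admin-socket counters is straightforward; here is a 
minimal Python sketch (the osd -> op_w_latency -> {avgcount, sum} counter 
layout, with sum in seconds, is the common one, but verify it against your 
Ceph version; the 150 ms threshold is only an illustrative assumption):

```python
# Minimal sketch of post-processing `ceph daemon osd.N perf dump` output.
WARN_MS = 150.0  # illustrative alert threshold, not a Ceph default

def avg_latency_ms(perf_dump: dict, counter: str = "op_w_latency") -> float:
    """Return the average latency of `counter` in milliseconds."""
    c = perf_dump["osd"][counter]
    if c["avgcount"] == 0:
        return 0.0
    return c["sum"] / c["avgcount"] * 1000.0

if __name__ == "__main__":
    # In practice: json.loads(subprocess.check_output(
    #     ["ceph", "daemon", "osd.0", "perf", "dump"]))
    sample = {"osd": {"op_w_latency": {"avgcount": 2000, "sum": 340.0}}}
    ms = avg_latency_ms(sample)
    # prints "op_w_latency avg: 170.0 ms (over threshold: True)"
    print("op_w_latency avg: %.1f ms (over threshold: %s)" % (ms, ms > WARN_MS))
```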


Paul






Re: [ceph-users] Full L3 Ceph

2018-11-22 Thread Robin H. Johnson
On Fri, Nov 23, 2018 at 04:03:25AM +0700, Lazuardi Nasution wrote:
> I'm looking for an example Ceph configuration and topology for a full
> layer-3 networking deployment. Maybe all daemons can use loopback alias
> addresses in this case, but how should the cluster network and public
> network be configured, e.g. using a supernet? I think loopback alias
> addresses can keep the daemons from going down when a physical interface
> is disconnected, and can load-balance traffic across the physical
> interfaces without bonding, using ECMP instead.
I can say I've done something similar**, but I don't have access to that
environment or most*** of the configuration anymore.

One of the parts I do recall, was explicitly setting cluster_network
and public_network to empty strings, AND using public_addr+cluster_addr
instead, with routable addressing on dummy interfaces (NOT loopback).
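
A minimal sketch of that arrangement (all addresses and interface names 
here are invented for illustration):

```
# Per host: a routable /128 on a dummy interface (NOT loopback):
#   ip link add ceph0 type dummy
#   ip link set ceph0 up
#   ip -6 addr add 2001:db8:100::11/128 dev ceph0

# ceph.conf: empty *_network options, per-daemon addresses instead:
[global]
public_network  =
cluster_network =

[osd.0]
public_addr  = 2001:db8:100::11
cluster_addr = 2001:db8:200::11
```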

**:For values of similar:
- 99.9% IPv6 environment
- BGP everywhere
- The only IPv4 was on the outside of HAProxy for legacy IPv4 clients.
- Quanta switchgear running Cumulus Linux, 10Gbit ports
- Hosts running Cumulus quagga fork (REQUIRED)
- Host to 2xToR using IPv6 link-local addressing only
  https://blog.ipspace.net/2015/02/bgp-configuration-made-simple-with.html
- Reliable ~19Gbit aggregate (2x10GBit)
- watch out for NIC overheating: no warning, just thermal throttle down
  to ~2.5Gbit/port.
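
The host side of that host-to-ToR peering could look roughly like this in 
BGP-unnumbered style (interface names and ASN invented; exact keywords 
vary between the Cumulus quagga fork and FRR releases):

```
router bgp 65101
 neighbor TOR peer-group
 neighbor TOR remote-as external
 neighbor eth0 interface peer-group TOR
 neighbor eth1 interface peer-group TOR
 address-family ipv6 unicast
  network 2001:db8:100::11/128
  neighbor TOR activate
```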

***:Some parts of the configuration ARE public:
https://github.com/dreamhost/ceph-chef/tree/dokken

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136

