Re: [ceph-users] ceph osd pg-upmap-items not working

2019-04-08 Thread Iain Buclaw
On Thu, 4 Apr 2019 at 13:32, Dan van der Ster  wrote:
>
> There are several more fixes queued up for v12.2.12:
>
> 16b7cc1bf9 osd/OSDMap: add log for better debugging
> 3d2945dd6e osd/OSDMap: calc_pg_upmaps - restrict optimization to
> origin pools only
> ab2dbc2089 osd/OSDMap: drop local pool filter in calc_pg_upmaps
> 119d8cb2a1 crush: fix upmap overkill
> 0729a78877 osd/OSDMap: using std::vector::reserve to reduce memory 
> reallocation
> f4f66e4f0a osd/OSDMap: more improvements to upmap
> 7bebc4cd28 osd/OSDMap: be more aggressive when trying to balance
> 1763a879e3 osd/OSDMap: potential access violation fix
> 8b3114ea62 osd/OSDMap: don't mapping all pgs each time in calc_pg_upmaps
>
> I haven't personally tried the newest of those yet because the
> balancer is working pretty well in our environment.
> Though one thing we definitely need to improve is the osd failure /
> upmap interplay. We currently lose all related upmaps when an osd is
> out -- this means that even though we re-use an osd-id we still need
> the balancer to work for awhile to restore the perfect balancing.
>
> If you have simple reproducers for your issues, please do create a tracker.
>

Upgraded to v13.2.x, and it's still the same.

https://tracker.ceph.com/issues/39136

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


Re: [ceph-users] RGW: Reshard index of non-master zones in multi-site

2019-04-08 Thread Iain Buclaw
On Mon, 8 Apr 2019 at 05:01, Matt Benjamin  wrote:
>
> Hi Christian,
>
> Dynamic bucket-index sharding for multi-site setups is being worked
> on, and will land in the N release cycle.
>

What about removing orphaned shards on the master?  Are the existing
tools able to work with that?

On the secondaries, it is no problem to proxy_pass all requests to the
master whilst all rgw pools are destroyed and recreated.

I would have thought, however, that manually removing the known orphaned
indexes would be safe, side-stepping the annoying job of having to
force-degrade the working service.
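
The rough approach I had in mind for spotting them (the bucket name below is just
a placeholder, and this is only a sketch, not a vetted cleanup procedure):

    # list every bucket index instance known to the metadata backend
    radosgw-admin metadata list bucket.instance
    # compare against the instance id the bucket currently points at
    radosgw-admin bucket stats --bucket=mybucket | grep '"id"'

Anything listed as an instance that no bucket currently points at would be a
candidate orphan.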

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


Re: [ceph-users] Latency spikes in OSD's based on bluestore

2019-04-08 Thread Patrik Martinsson
Hi Anthony,

Thanks for answering.

>> Which SSD model and firmware are you using?  Which HBA?
Well, from what I can see it's happening on basically all our SSDs, which
unfortunately vary a bit.
But in the example I posted, the particular disk was:

SSD SATA 6.0 Gb/s/0/100/1/0/0.8.0 /dev/sdgdisk   800GB 
INTEL SSDSC2BX80
PERC H730P Mini (25.5.3.0005)

All our SSDs are configured as pass-through, so I wouldn't think that the
controller would be involved too much.


>> Compaction may well be a factor as well, but I’ve experienced 
>> hardware/firmware issues as well so I had to ask.
Well, my guess is that it is the compaction, and that there may be ways of 
tuning this. I'm just curious about "how to do it", and if those "spikes" we 
see are "normal operation and nothing to worry about", or if one actually 
should take some sort of action.
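
Is watching the OSD perf counters around the spikes the right place to start?
Something along these lines (the osd id is just an example, and I assume the
counter names vary between versions, so treat it as a sketch):

    ceph daemon osd.12 perf dump | grep -E 'commit_lat|kv_sync_lat|compact'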

Thanks for answering!

Best Regards,
Patrik Martinsson,
Sweden



Re: [ceph-users] osd_memory_target exceeding on Luminous OSD BlueStore

2019-04-08 Thread Dan van der Ster
Which OS are you using?
With CentOS we find that the heap is not always automatically
released. (You can check the heap freelist with `ceph tell osd.0 heap
stats`).
As a workaround we run this hourly:

ceph tell mon.* heap release
ceph tell osd.* heap release
ceph tell mds.* heap release
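
A minimal cron sketch of that workaround (file path and schedule are just
examples; it needs to run somewhere with an admin keyring):

    # /etc/cron.d/ceph-heap-release  (hypothetical file name)
    0 * * * * root ceph tell 'mon.*' heap release; ceph tell 'osd.*' heap release; ceph tell 'mds.*' heap release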

-- Dan

On Sat, Apr 6, 2019 at 1:30 PM Olivier Bonvalet  wrote:
>
> Hi,
>
> on a Luminous 12.2.11 deploiement, my bluestore OSD exceed the
> osd_memory_target :
>
> daevel-ob@ssdr712h:~$ ps auxw | grep ceph-osd
> ceph3646 17.1 12.0 6828916 5893136 ? Ssl  mars29 1903:42 
> /usr/bin/ceph-osd -f --cluster ceph --id 143 --setuser ceph --setgroup ceph
> ceph3991 12.9 11.2 6342812 5485356 ? Ssl  mars29 1443:41 
> /usr/bin/ceph-osd -f --cluster ceph --id 144 --setuser ceph --setgroup ceph
> ceph4361 16.9 11.8 6718432 5783584 ? Ssl  mars29 1889:41 
> /usr/bin/ceph-osd -f --cluster ceph --id 145 --setuser ceph --setgroup ceph
> ceph4731 19.7 12.2 6949584 5982040 ? Ssl  mars29 2198:47 
> /usr/bin/ceph-osd -f --cluster ceph --id 146 --setuser ceph --setgroup ceph
> ceph5073 16.7 11.6 6639568 5701368 ? Ssl  mars29 1866:05 
> /usr/bin/ceph-osd -f --cluster ceph --id 147 --setuser ceph --setgroup ceph
> ceph5417 14.6 11.2 6386764 5519944 ? Ssl  mars29 1634:30 
> /usr/bin/ceph-osd -f --cluster ceph --id 148 --setuser ceph --setgroup ceph
> ceph5760 16.9 12.0 6806448 5879624 ? Ssl  mars29 1882:42 
> /usr/bin/ceph-osd -f --cluster ceph --id 149 --setuser ceph --setgroup ceph
> ceph6105 16.0 11.6 6576336 5694556 ? Ssl  mars29 1782:52 
> /usr/bin/ceph-osd -f --cluster ceph --id 150 --setuser ceph --setgroup ceph
>
> daevel-ob@ssdr712h:~$ free -m
>   totalusedfree  shared  buff/cache   
> available
> Mem:  47771   452101643  17 917   
> 43556
> Swap: 0   0   0
>
> # ceph daemon osd.147 config show | grep memory_target
> "osd_memory_target": "4294967296",
>
>
> And there is no recovery / backfilling, the cluster is fine :
>
>$ ceph status
>  cluster:
>id: de035250-323d-4cf6-8c4b-cf0faf6296b1
>health: HEALTH_OK
>
>  services:
>mon: 5 daemons, quorum tolriq,tsyne,olkas,lorunde,amphel
>mgr: tsyne(active), standbys: olkas, tolriq, lorunde, amphel
>osd: 120 osds: 116 up, 116 in
>
>  data:
>pools:   20 pools, 12736 pgs
>objects: 15.29M objects, 31.1TiB
>usage:   101TiB used, 75.3TiB / 177TiB avail
>pgs: 12732 active+clean
> 4 active+clean+scrubbing+deep
>
>  io:
>client:   72.3MiB/s rd, 26.8MiB/s wr, 2.30kop/s rd, 1.29kop/s wr
>
>
>On an other host, in the same pool, I see also high memory usage :
>
>daevel-ob@ssdr712g:~$ ps auxw | grep ceph-osd
>ceph6287  6.6 10.6 6027388 5190032 ? Ssl  mars21 1511:07 
> /usr/bin/ceph-osd -f --cluster ceph --id 131 --setuser ceph --setgroup ceph
>ceph6759  7.3 11.2 6299140 5484412 ? Ssl  mars21 1665:22 
> /usr/bin/ceph-osd -f --cluster ceph --id 132 --setuser ceph --setgroup ceph
>ceph7114  7.0 11.7 6576168 5756236 ? Ssl  mars21 1612:09 
> /usr/bin/ceph-osd -f --cluster ceph --id 133 --setuser ceph --setgroup ceph
>ceph7467  7.4 11.1 6244668 5430512 ? Ssl  mars21 1704:06 
> /usr/bin/ceph-osd -f --cluster ceph --id 134 --setuser ceph --setgroup ceph
>ceph7821  7.7 11.1 6309456 5469376 ? Ssl  mars21 1754:35 
> /usr/bin/ceph-osd -f --cluster ceph --id 135 --setuser ceph --setgroup ceph
>ceph8174  6.9 11.6 6545224 5705412 ? Ssl  mars21 1590:31 
> /usr/bin/ceph-osd -f --cluster ceph --id 136 --setuser ceph --setgroup ceph
>ceph8746  6.6 11.1 6290004 5477204 ? Ssl  mars21 1511:11 
> /usr/bin/ceph-osd -f --cluster ceph --id 137 --setuser ceph --setgroup ceph
>ceph9100  7.7 11.6 6552080 5713560 ? Ssl  mars21 1757:22 
> /usr/bin/ceph-osd -f --cluster ceph --id 138 --setuser ceph --setgroup ceph
>
>But ! On a similar host, in a different pool, the problem is less visible :
>
>daevel-ob@ssdr712i:~$ ps auxw | grep ceph-osd
>ceph3617  2.8  9.9 5660308 4847444 ? Ssl  mars29 313:05 
> /usr/bin/ceph-osd -f --cluster ceph --id 151 --setuser ceph --setgroup ceph
>ceph3958  2.3  9.8 5661936 4834320 ? Ssl  mars29 256:55 
> /usr/bin/ceph-osd -f --cluster ceph --id 152 --setuser ceph --setgroup ceph
>ceph4299  2.3  9.8 5620616 4807248 ? Ssl  mars29 266:26 
> /usr/bin/ceph-osd -f --cluster ceph --id 153 --setuser ceph --setgroup ceph
>ceph4643  2.3  9.6 5527724 4713572 ? Ssl  mars29 262:50 
> /usr/bin/ceph-osd -f --cluster ceph --id 154 --setuser ceph --setgroup ceph
>ceph5016  2.2  9.7 5597504 4783412 ? Ssl  mars29 248:37 
> /usr/bin/ceph-osd -f --cluster ceph --id 155 --setuser ceph --setgroup

[ceph-users] radosgw cloud sync aws s3 auth failed

2019-04-08 Thread 黄明友

hi,all

   I have tested the cloud sync module in radosgw.  The ceph version is 13.2.5,
git commit id cbff874f9007f1869bfd3821b7e33b2a6ffd4988.

When syncing to an AWS S3 endpoint I get an HTTP 400 error, so I switched to the
http:// protocol and used the tcpick tool to dump the traffic, which looks like this:

PUT /wuxi01 HTTP/1.1


  
Host: s3.cn-north-1.amazonaws.com.cn
Accept: */*
Authorization: AWS AKIAUQ2G7NKZFVDQ76FZ:7ThaXKa3axR7Egf1tkwZc/YNRm4=
Date: Mon, 08 Apr 2019 10:04:37 +
Content-Length: 0
HTTP/1.1 400 Bad Request
x-amz-request-id: 65803EFC370CF11A
x-amz-id-2: 
py6N1QJw+pd91mvL0XpQhiwIVOiWIUprAX8PwAuSVOx3vrqat/Ka+xIVW3D1zC0+tJSLQyr4qC4=
x-amz-region: cn-north-1
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Mon, 08 Apr 2019 10:04:37 GMT
Connection: close
Server: AmazonS3
144

<Error><Code>InvalidRequest</Code><Message>The authorization mechanism you have
provided is not supported. Please use AWS4-HMAC-SHA256.</Message>
<RequestId>65803EFC370CF11A</RequestId><HostId>py6N1QJw+pd91mvL0XpQhiwIVOiWIUprAX8PwAuSVOx3vrqat/Ka+xIVW3D1zC0+tJSLQyr4qC4=</HostId></Error>
0



It looks like the client is using an old auth method instead of AWS4-HMAC-SHA256.
But how can I enable the AWS4-HMAC-SHA256 auth method?


Re: [ceph-users] NFS-Ganesha Mounts as a Read-Only Filesystem

2019-04-08 Thread junk
Possibly the client doesn't like the server returning SecType = "none";

Maybe try SecType = "sys"?
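
i.e. in the EXPORT block of your ganesha config, just change the SecType line:

    SecType = "sys";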

Leon L. Robinson

> On 6 Apr 2019, at 12:06,   
> wrote:
> 
> Hi all,
>  
> I have recently setup a Ceph cluster and on request using CephFS (MDS 
> version: ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic 
> (stable)) as a backend for NFS-Ganesha. I have successfully tested a direct 
> mount with CephFS to read/write files, however I’m perplexed as to NFS 
> mounting as read-only despite setting the RW flags.
>  
> [root@mon02 mnt]# touch cephfs/test.txt
> touch: cannot touch 'cephfs/test.txt': Read-only file system
>  
> Configuration of Ganesha is below:
>  
> NFS_CORE_PARAM
> {
>   Enable_NLM = false;
>   Enable_RQUOTA = false;
>   Protocols = 4;
> }
>  
> NFSv4
> {
>   Delegations = true;
>   RecoveryBackend = rados_ng;
>   Minor_Versions =  1,2;
> }
>  
> CACHEINODE {
>   Dir_Chunk = 0;
>   NParts = 1;
>   Cache_Size = 1;
> }
>  
> EXPORT
> {
> Export_ID = 15;
> Path = "/";
> Pseudo = "/cephfs/";
> Access_Type = RW;
> NFS_Protocols = "4";
> Squash = No_Root_Squash;
> Transport_Protocols = TCP;
> SecType = "none";
> Attr_Expiration_Time = 0;
> Delegations = R;
>  
> FSAL {
> Name = CEPH;
>  User_Id = "ganesha";
>  Filesystem = "cephfs";
>  Secret_Access_Key = "";
> }
> }
>  
>  
> Provided mount parameters:
>  
> mount -t nfs -o nfsvers=4.1,proto=tcp,rw,noatime,sync 172.16.32.15:/ 
> /mnt/cephfs
>  
> I have tried stripping much of the config and altering mount options, but so 
> far completely unable to decipher the cause. Also seems I’m not the only one 
> who has been caught on this:
>  
> https://www.spinics.net/lists/ceph-devel/msg41201.html
>  
> Thanks in advance,
>  
> Thomas


[ceph-users] how to judge the results? - rados bench comparison

2019-04-08 Thread Lars Täuber
Hi there,

I'm new to Ceph and just got my first cluster running.
Now I'd like to know whether the performance we get is what one would expect.

Is there a website with benchmark results somewhere that I could look at
to compare against our hardware and our results?

These are the results:
rados bench single threaded:
# rados bench 10 write --rbd-cache=false -t 1

Object size:4194304
Bandwidth (MB/sec): 53.7186
Stddev Bandwidth:   3.86437
Max bandwidth (MB/sec): 60
Min bandwidth (MB/sec): 48
Average IOPS:   13
Stddev IOPS:0.966092
Average Latency(s): 0.0744599
Stddev Latency(s):  0.00911778

nearly maxing out one (idle) client with 28 threads
# rados bench 10 write --rbd-cache=false -t 28

Bandwidth (MB/sec): 850.451
Stddev Bandwidth:   40.6699
Max bandwidth (MB/sec): 904
Min bandwidth (MB/sec): 748
Average IOPS:   212
Stddev IOPS:10.1675
Average Latency(s): 0.131309
Stddev Latency(s):  0.0318489

four concurrent benchmarks on four clients each with 24 threads:
Bandwidth (MB/sec): 396 376 381 389
Stddev Bandwidth:   30  25  22  22
Max bandwidth (MB/sec): 440 420 416 428
Min bandwidth (MB/sec): 352 348 344 364
Average IOPS:   99  94  95  97
Stddev IOPS:7.5 6.3 5.6 5.6
Average Latency(s): 0.240.250.250.24
Stddev Latency(s):  0.120.150.150.14

summing up: write mode
~1500 MB/sec Bandwidth
~385 IOPS
~0.25s Latency

rand mode:
~3500 MB/sec
~920 IOPS
~0.154s Latency
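
In case someone wants to reproduce the rand numbers, the commands look roughly
like this (the pool name is a placeholder; the write phase has to keep its
objects with --no-cleanup so the rand test has something to read):

    rados bench -p testpool 60 write --no-cleanup -t 28
    rados bench -p testpool 60 rand -t 28
    rados -p testpool cleanup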



Maybe someone could judge our numbers. I am actually very satisfied with the 
values.

The (mostly idle) cluster is built from these components:
* 10GB frontend network, bonding two connections to mon-, mds- and osd-nodes
** no bonding to clients
* 25GB backend network, bonding two connections to osd-nodes


cluster:
* 3x mon, 2x Intel(R) Xeon(R) Bronze 3104 CPU @ 1.70GHz, 64GB RAM
* 3x mds, 1x Intel(R) Xeon(R) Gold 5115 CPU @ 2.40GHz, 128MB RAM
* 7x OSD-nodes, 2x Intel(R) Xeon(R) Silver 4112 CPU @ 2.60GHz, 96GB RAM
** 4x 6TB SAS HDD HGST HUS726T6TAL5204 (5x on two nodes, max. 6x per chassis 
for later growth)
** 2x 800GB SAS SSD WDC WUSTM3280ASS200 => SW-RAID1 => LVM ~116 GiB per OSD for 
DB and WAL

erasure encoded pool: (made for CephFS)
* plugin=clay k=5 m=2 d=6 crush-failure-domain=host

Thanks and best regards
Lars


Re: [ceph-users] Ceph Replication not working

2019-04-08 Thread Jason Dillaman
The log appears to be missing all the librbd log messages. The process
seems to stop at attempting to open the image from the remote cluster:

2019-04-05 12:07:29.992323 7f0f3bfff700 20
rbd::mirror::image_replayer::OpenImageRequest: 0x7f0f28018a20
send_open_image

Assuming you are using the default log file naming settings, the log
should be located at "/var/log/ceph/ceph-client.mirrorprod.log". Of
course, looking at your cluster naming makes me think that since your
primary cluster is named "ceph" on the DR-site side, have you changed
your "/etc/default/ceph" file to rename the local cluster from "ceph"
to "cephdr" so that the "rbd-mirror" daemon connects to the correct
local cluster?
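
For reference, a sketch of what I mean on the DR-site host (the CLUSTER variable
comes from the Debian/Ubuntu packaging, so double-check the equivalent on your
distribution):

    # /etc/default/ceph on the DR-site host
    CLUSTER=cephdr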


On Fri, Apr 5, 2019 at 3:28 PM Vikas Rana  wrote:
>
> Hi Jason,
>
> 12.2.11 is the version.
>
> Attached is the complete log file.
>
> We removed the pool to make sure there's no image left on DR site and 
> recreated an empty pool.
>
> Thanks,
> -Vikas
>
> -Original Message-
> From: Jason Dillaman 
> Sent: Friday, April 5, 2019 2:24 PM
> To: Vikas Rana 
> Cc: ceph-users 
> Subject: Re: [ceph-users] Ceph Replication not working
>
> What is the version of rbd-mirror daemon and your OSDs? It looks it found two 
> replicated images and got stuck on the "wait_for_deletion"
> step. Since I suspect those images haven't been deleted, it should have 
> immediately proceeded to the next step of the image replay state machine. Are 
> there any additional log messages after 2019-04-05 12:07:29.981203?
>
> On Fri, Apr 5, 2019 at 1:56 PM Vikas Rana  wrote:
> >
> > Hi there,
> >
> > We are trying to setup a rbd-mirror replication and after the setup, 
> > everything looks good but images are not replicating.
> >
> >
> >
> > Can some please please help?
> >
> >
> >
> > Thanks,
> >
> > -Vikas
> >
> >
> >
> > root@remote:/var/log/ceph# rbd --cluster cephdr mirror pool info nfs
> >
> > Mode: pool
> >
> > Peers:
> >
> >   UUID NAME CLIENT
> >
> >   bcd54bc5-cd08-435f-a79a-357bce55011d ceph client.mirrorprod
> >
> >
> >
> > root@local:/etc/ceph# rbd  mirror pool info nfs
> >
> > Mode: pool
> >
> > Peers:
> >
> >   UUID NAME   CLIENT
> >
> >   612151cf-f70d-49d0-94e2-a7b850a53e4f cephdr client.mirrordr
> >
> >
> >
> >
> >
> > root@local:/etc/ceph# rbd info nfs/test01
> >
> > rbd image 'test01':
> >
> > size 102400 kB in 25 objects
> >
> > order 22 (4096 kB objects)
> >
> > block_name_prefix: rbd_data.11cd3c238e1f29
> >
> > format: 2
> >
> > features: layering, exclusive-lock, object-map, fast-diff,
> > deep-flatten, journaling
> >
> > flags:
> >
> > journal: 11cd3c238e1f29
> >
> > mirroring state: enabled
> >
> > mirroring global id: 06fbfe68-b7e4-4d3a-93b2-cd18c569f7f7
> >
> > mirroring primary: true
> >
> >
> >
> >
> >
> > root@remote:/var/log/ceph# rbd --cluster cephdr mirror pool status nfs
> > --verbose
> >
> > health: OK
> >
> > images: 0 total
> >
> >
> >
> > root@remote:/var/log/ceph# rbd info nfs/test01
> >
> > rbd: error opening image test01: (2) No such file or directory
> >
> >
> >
> >
> >
> > root@remote:/var/log/ceph# ceph -s --cluster cephdr
> >
> >   cluster:
> >
> > id: ade49174-1f84-4c3c-a93c-b293c3655c93
> >
> > health: HEALTH_WARN
> >
> > noout,nodeep-scrub flag(s) set
> >
> >
> >
> >   services:
> >
> > mon:3 daemons, quorum nidcdvtier1a,nidcdvtier2a,nidcdvtier3a
> >
> > mgr:nidcdvtier1a(active), standbys: nidcdvtier2a
> >
> > osd:12 osds: 12 up, 12 in
> >
> > flags noout,nodeep-scrub
> >
> > rbd-mirror: 1 daemon active
> >
> >
> >
> >   data:
> >
> > pools:   5 pools, 640 pgs
> >
> > objects: 1.32M objects, 5.03TiB
> >
> > usage:   10.1TiB used, 262TiB / 272TiB avail
> >
> > pgs: 640 active+clean
> >
> >
> >
> >   io:
> >
> > client:   170B/s rd, 0B/s wr, 0op/s rd, 0op/s wr
> >
> >
> >
> >
> >
> > 2019-04-05 12:07:29.720742 7f0fa5e284c0  0 ceph version 12.2.11
> > (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable), process
> > rbd-mirror, pid 3921391
> >
> > 2019-04-05 12:07:29.721752 7f0fa5e284c0  0 pidfile_write: ignore empty
> > --pid-file
> >
> > 2019-04-05 12:07:29.726580 7f0fa5e284c0 20 rbd::mirror::ServiceDaemon: 
> > 0x560200d29bb0 ServiceDaemon:
> >
> > 2019-04-05 12:07:29.732654 7f0fa5e284c0 20 rbd::mirror::ServiceDaemon: 
> > 0x560200d29bb0 init:
> >
> > 2019-04-05 12:07:29.734920 7f0fa5e284c0  1 mgrc
> > service_daemon_register rbd-mirror.admin metadata
> > {arch=x86_64,ceph_version=ceph version 12.2.11
> > (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous
> > (stable),cpu=Intel(R) Xeon(R) CPU E5-2690 v2 @
> > 3.00GHz,distro=ubuntu,distro_description=Ubuntu 14.04.5
> > LTS,distro_version=14.04,hostname=nidcdvtier3a,instance_id=464360,kern
> > el_description=#93 SMP Sat Jun 17 04:01:23 EDT
> > 2017,kernel_version=3.19.0-85-vtier,mem_swap_kb=6710578

Re: [ceph-users] Ceph Replication not working

2019-04-08 Thread Vikas Rana
Hi Jason,

On the Prod side we have the cluster "ceph", and on the DR side we renamed it to "cephdr".

Accordingly, we renamed ceph.conf to cephdr.conf on the DR side.

This setup used to work; then one day we tried to promote the DR side to verify
the replication, and since then it's been a nightmare.
The resync didn't work, so we eventually gave up and deleted the pool on the
DR side to start afresh.

We deleted and recreated the peer relationship also.

Is there any debugging we can do on the Prod or DR side to see where it's stopping
or waiting during "send_open_image"?

Rbd-mirror is running as "rbd-mirror --cluster=cephdr"


Thanks,
-Vikas

-Original Message-
From: Jason Dillaman  
Sent: Monday, April 8, 2019 9:30 AM
To: Vikas Rana 
Cc: ceph-users 
Subject: Re: [ceph-users] Ceph Replication not working

The log appears to be missing all the librbd log messages. The process seems to 
stop at attempting to open the image from the remote cluster:

2019-04-05 12:07:29.992323 7f0f3bfff700 20
rbd::mirror::image_replayer::OpenImageRequest: 0x7f0f28018a20 send_open_image

Assuming you are using the default log file naming settings, the log should be 
located at "/var/log/ceph/ceph-client.mirrorprod.log". Of course, looking at 
your cluster naming makes me think that since your primary cluster is named 
"ceph" on the DR-site side, have you changed your "/etc/default/ceph" file to 
rename the local cluster from "ceph"
to "cephdr" so that the "rbd-mirror" daemon connects to the correct local 
cluster?


On Fri, Apr 5, 2019 at 3:28 PM Vikas Rana  wrote:
>
> Hi Jason,
>
> 12.2.11 is the version.
>
> Attached is the complete log file.
>
> We removed the pool to make sure there's no image left on DR site and 
> recreated an empty pool.
>
> Thanks,
> -Vikas
>
> -Original Message-
> From: Jason Dillaman 
> Sent: Friday, April 5, 2019 2:24 PM
> To: Vikas Rana 
> Cc: ceph-users 
> Subject: Re: [ceph-users] Ceph Replication not working
>
> What is the version of rbd-mirror daemon and your OSDs? It looks it found two 
> replicated images and got stuck on the "wait_for_deletion"
> step. Since I suspect those images haven't been deleted, it should have 
> immediately proceeded to the next step of the image replay state machine. Are 
> there any additional log messages after 2019-04-05 12:07:29.981203?
>
> On Fri, Apr 5, 2019 at 1:56 PM Vikas Rana  wrote:
> >
> > Hi there,
> >
> > We are trying to setup a rbd-mirror replication and after the setup, 
> > everything looks good but images are not replicating.
> >
> >
> >
> > Can some please please help?
> >
> >
> >
> > Thanks,
> >
> > -Vikas
> >
> >
> >
> > root@remote:/var/log/ceph# rbd --cluster cephdr mirror pool info nfs
> >
> > Mode: pool
> >
> > Peers:
> >
> >   UUID NAME CLIENT
> >
> >   bcd54bc5-cd08-435f-a79a-357bce55011d ceph client.mirrorprod
> >
> >
> >
> > root@local:/etc/ceph# rbd  mirror pool info nfs
> >
> > Mode: pool
> >
> > Peers:
> >
> >   UUID NAME   CLIENT
> >
> >   612151cf-f70d-49d0-94e2-a7b850a53e4f cephdr client.mirrordr
> >
> >
> >
> >
> >
> > root@local:/etc/ceph# rbd info nfs/test01
> >
> > rbd image 'test01':
> >
> > size 102400 kB in 25 objects
> >
> > order 22 (4096 kB objects)
> >
> > block_name_prefix: rbd_data.11cd3c238e1f29
> >
> > format: 2
> >
> > features: layering, exclusive-lock, object-map, fast-diff, 
> > deep-flatten, journaling
> >
> > flags:
> >
> > journal: 11cd3c238e1f29
> >
> > mirroring state: enabled
> >
> > mirroring global id: 06fbfe68-b7e4-4d3a-93b2-cd18c569f7f7
> >
> > mirroring primary: true
> >
> >
> >
> >
> >
> > root@remote:/var/log/ceph# rbd --cluster cephdr mirror pool status 
> > nfs --verbose
> >
> > health: OK
> >
> > images: 0 total
> >
> >
> >
> > root@remote:/var/log/ceph# rbd info nfs/test01
> >
> > rbd: error opening image test01: (2) No such file or directory
> >
> >
> >
> >
> >
> > root@remote:/var/log/ceph# ceph -s --cluster cephdr
> >
> >   cluster:
> >
> > id: ade49174-1f84-4c3c-a93c-b293c3655c93
> >
> > health: HEALTH_WARN
> >
> > noout,nodeep-scrub flag(s) set
> >
> >
> >
> >   services:
> >
> > mon:3 daemons, quorum nidcdvtier1a,nidcdvtier2a,nidcdvtier3a
> >
> > mgr:nidcdvtier1a(active), standbys: nidcdvtier2a
> >
> > osd:12 osds: 12 up, 12 in
> >
> > flags noout,nodeep-scrub
> >
> > rbd-mirror: 1 daemon active
> >
> >
> >
> >   data:
> >
> > pools:   5 pools, 640 pgs
> >
> > objects: 1.32M objects, 5.03TiB
> >
> > usage:   10.1TiB used, 262TiB / 272TiB avail
> >
> > pgs: 640 active+clean
> >
> >
> >
> >   io:
> >
> > client:   170B/s rd, 0B/s wr, 0op/s rd, 0op/s wr
> >
> >
> >
> >
> >
> > 2019-04-05 12:07:29.720742 7f0fa5e284c0  0 ceph version 12.2.11
> > (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable), 
> > process rbd-mirror, pid 3921391
> >
> > 2019-04-05

Re: [ceph-users] Ceph Replication not working

2019-04-08 Thread Jason Dillaman
On Mon, Apr 8, 2019 at 9:47 AM Vikas Rana  wrote:
>
> Hi Jason,
>
> On Prod side, we have cluster ceph and on DR side we renamed to cephdr
>
> Accordingly, we renamed the ceph.conf to cephdr.conf on DR side.
>
> This setup used to work and one day we tried to promote the DR to verify the 
> replication and since then it's been a nightmare.
> The resync didn’t work and then we eventually gave up and deleted the pool on 
> DR side to start afresh.
>
> We deleted and recreated the peer relationship also.
>
> Is there any debugging we can do on Prod or DR side to see where its stopping 
> or waiting while "send_open_image"?

You need to add "debug rbd = 20" to both your ceph.conf and
cephdr.conf (if you haven't already) and you would need to provide the
log associated w/ the production cluster connection (see below). Also,
please use pastebin or similar service to avoid mailing the logs to
the list.
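
Something like this in both ceph.conf and cephdr.conf should do it (a sketch;
the [client] section is usually enough for the rbd-mirror daemon):

    [client]
        debug rbd = 20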

> Rbd-mirror is running as "rbd-mirror --cluster=cephdr"
>
>
> Thanks,
> -Vikas
>
> -Original Message-
> From: Jason Dillaman 
> Sent: Monday, April 8, 2019 9:30 AM
> To: Vikas Rana 
> Cc: ceph-users 
> Subject: Re: [ceph-users] Ceph Replication not working
>
> The log appears to be missing all the librbd log messages. The process seems 
> to stop at attempting to open the image from the remote cluster:
>
> 2019-04-05 12:07:29.992323 7f0f3bfff700 20
> rbd::mirror::image_replayer::OpenImageRequest: 0x7f0f28018a20 send_open_image
>
> Assuming you are using the default log file naming settings, the log should 
> be located at "/var/log/ceph/ceph-client.mirrorprod.log". Of course, looking 
> at your cluster naming makes me think that since your primary cluster is 
> named "ceph" on the DR-site side, have you changed your "/etc/default/ceph" 
> file to rename the local cluster from "ceph"
> to "cephdr" so that the "rbd-mirror" daemon connects to the correct local 
> cluster?
>
>
> On Fri, Apr 5, 2019 at 3:28 PM Vikas Rana  wrote:
> >
> > Hi Jason,
> >
> > 12.2.11 is the version.
> >
> > Attached is the complete log file.
> >
> > We removed the pool to make sure there's no image left on DR site and 
> > recreated an empty pool.
> >
> > Thanks,
> > -Vikas
> >
> > -Original Message-
> > From: Jason Dillaman 
> > Sent: Friday, April 5, 2019 2:24 PM
> > To: Vikas Rana 
> > Cc: ceph-users 
> > Subject: Re: [ceph-users] Ceph Replication not working
> >
> > What is the version of rbd-mirror daemon and your OSDs? It looks it found 
> > two replicated images and got stuck on the "wait_for_deletion"
> > step. Since I suspect those images haven't been deleted, it should have 
> > immediately proceeded to the next step of the image replay state machine. 
> > Are there any additional log messages after 2019-04-05 12:07:29.981203?
> >
> > On Fri, Apr 5, 2019 at 1:56 PM Vikas Rana  wrote:
> > >
> > > Hi there,
> > >
> > > We are trying to setup a rbd-mirror replication and after the setup, 
> > > everything looks good but images are not replicating.
> > >
> > >
> > >
> > > Can some please please help?
> > >
> > >
> > >
> > > Thanks,
> > >
> > > -Vikas
> > >
> > >
> > >
> > > root@remote:/var/log/ceph# rbd --cluster cephdr mirror pool info nfs
> > >
> > > Mode: pool
> > >
> > > Peers:
> > >
> > >   UUID NAME CLIENT
> > >
> > >   bcd54bc5-cd08-435f-a79a-357bce55011d ceph client.mirrorprod
> > >
> > >
> > >
> > > root@local:/etc/ceph# rbd  mirror pool info nfs
> > >
> > > Mode: pool
> > >
> > > Peers:
> > >
> > >   UUID NAME   CLIENT
> > >
> > >   612151cf-f70d-49d0-94e2-a7b850a53e4f cephdr client.mirrordr
> > >
> > >
> > >
> > >
> > >
> > > root@local:/etc/ceph# rbd info nfs/test01
> > >
> > > rbd image 'test01':
> > >
> > > size 102400 kB in 25 objects
> > >
> > > order 22 (4096 kB objects)
> > >
> > > block_name_prefix: rbd_data.11cd3c238e1f29
> > >
> > > format: 2
> > >
> > > features: layering, exclusive-lock, object-map, fast-diff,
> > > deep-flatten, journaling
> > >
> > > flags:
> > >
> > > journal: 11cd3c238e1f29
> > >
> > > mirroring state: enabled
> > >
> > > mirroring global id: 06fbfe68-b7e4-4d3a-93b2-cd18c569f7f7
> > >
> > > mirroring primary: true
> > >
> > >
> > >
> > >
> > >
> > > root@remote:/var/log/ceph# rbd --cluster cephdr mirror pool status
> > > nfs --verbose
> > >
> > > health: OK
> > >
> > > images: 0 total
> > >
> > >
> > >
> > > root@remote:/var/log/ceph# rbd info nfs/test01
> > >
> > > rbd: error opening image test01: (2) No such file or directory
> > >
> > >
> > >
> > >
> > >
> > > root@remote:/var/log/ceph# ceph -s --cluster cephdr
> > >
> > >   cluster:
> > >
> > > id: ade49174-1f84-4c3c-a93c-b293c3655c93
> > >
> > > health: HEALTH_WARN
> > >
> > > noout,nodeep-scrub flag(s) set
> > >
> > >
> > >
> > >   services:
> > >
> > > mon:3 daemons, quorum nidcdvtier1a,nidcdvtier2a,nidcdvtier3a
> > >
> > > mgr:

Re: [ceph-users] Unable to list rbd block > images in nautilus dashboard

2019-04-08 Thread Wes Cilldhaire
It's definitely ceph-mgr that is struggling here. It uses 100% of a CPU for
several tens of seconds and reports the following in its log a few times before
anything gets displayed:

Traceback (most recent call last): 
File "/usr/local/share/ceph/mgr/dashboard/services/exception.py", line 88, in 
dashboard_exception_handler 
return handler(*args, **kwargs) 
File "/usr/lib64/python2.7/site-packages/cherrypy/_cpdispatch.py", line 54, in 
__call__ 
return self.callable(*self.args, **self.kwargs) 
File "/usr/local/share/ceph/mgr/dashboard/controllers/__init__.py", line 649, 
in inner 
ret = func(*args, **kwargs) 
File "/usr/local/share/ceph/mgr/dashboard/controllers/__init__.py", line 842, 
in wrapper 
return func(*vpath, **params) 
File "/usr/local/share/ceph/mgr/dashboard/services/exception.py", line 44, in 
wrapper 
return f(*args, **kwargs) 
File "/usr/local/share/ceph/mgr/dashboard/services/exception.py", line 44, in 
wrapper 
return f(*args, **kwargs) 
File "/usr/local/share/ceph/mgr/dashboard/controllers/rbd.py", line 270, in 
list 
return self._rbd_list(pool_name) 
File "/usr/local/share/ceph/mgr/dashboard/controllers/rbd.py", line 261, in 
_rbd_list 
status, value = self._rbd_pool_list(pool) 
File "/usr/local/share/ceph/mgr/dashboard/tools.py", line 244, in wrapper 
return rvc.run(fn, args, kwargs) 
File "/usr/local/share/ceph/mgr/dashboard/tools.py", line 232, in run 
raise ViewCacheNoDataException() 
ViewCacheNoDataException: ViewCache: unable to retrieve data

- On 5 Apr, 2019, at 5:06 PM, Wes Cilldhaire w...@sol1.com.au wrote:

> Hi Lenz,
> 
> Thanks for responding. I suspected that the number of rbd images might have 
> had
> something to do with it so I cleaned up old disposable VM images I am no 
> longer
> using, taking the list down from ~30 to 16, 2 in the EC pool on hdds and the
> rest on the replicated ssd pool. They vary in size from 50GB to 200GB, I don't
> have the # of objects per rbd on hand right now but maybe this is a factor as
> well, particularly with 'du'. This doesn't appear to have made a difference in
> the time and number of attempts required to list them in the dashboard.
> 
> I suspect it might be a case of 'du on all images is always going to take 
> longer
> than the current dashboard timeout', in which case the behaviour of the
> dashboard might possibly need to change to account for this, maybe fetch and
> listt the images in parallel and asynchronously or something. As it stand it
> means the dashboard isn't really usable for managing existing images, which is
> a shame because having that ability makes ceph accessible to our clients who
> are considering it and begins affording some level of self-service for them -
> one of the reasons we've been really excited for Mimic's release actually. I
> really hope I've just done something wrong :)
> 
> I'll try to isolate which process the delay is coming from tonight as well as
> collecting other useful metrics when I'm back on that network tonight.
> 
> Thanks,
> Wes
> 
> 


Re: [ceph-users] Unable to list rbd block > images in nautilus dashboard

2019-04-08 Thread Ricardo Dias
Hi Wes,

I just filed a bug ticket in the Ceph tracker about this:

http://tracker.ceph.com/issues/39140

Will work on a solution ASAP.

Thanks,
Ricardo Dias

On 08/04/19 15:41, Wes Cilldhaire wrote:
> It's definitely ceph-mgr that is struggling here. It uses 100% of a cpu for 
> for several tens of seconds and reports the followinf in its log a few times 
> before anything gets displayed
> 
> Traceback (most recent call last): 
> File "/usr/local/share/ceph/mgr/dashboard/services/exception.py", line 88, in 
> dashboard_exception_handler 
> return handler(*args, **kwargs) 
> File "/usr/lib64/python2.7/site-packages/cherrypy/_cpdispatch.py", line 54, 
> in __call__ 
> return self.callable(*self.args, **self.kwargs) 
> File "/usr/local/share/ceph/mgr/dashboard/controllers/__init__.py", line 649, 
> in inner 
> ret = func(*args, **kwargs) 
> File "/usr/local/share/ceph/mgr/dashboard/controllers/__init__.py", line 842, 
> in wrapper 
> return func(*vpath, **params) 
> File "/usr/local/share/ceph/mgr/dashboard/services/exception.py", line 44, in 
> wrapper 
> return f(*args, **kwargs) 
> File "/usr/local/share/ceph/mgr/dashboard/services/exception.py", line 44, in 
> wrapper 
> return f(*args, **kwargs) 
> File "/usr/local/share/ceph/mgr/dashboard/controllers/rbd.py", line 270, in 
> list 
> return self._rbd_list(pool_name) 
> File "/usr/local/share/ceph/mgr/dashboard/controllers/rbd.py", line 261, in 
> _rbd_list 
> status, value = self._rbd_pool_list(pool) 
> File "/usr/local/share/ceph/mgr/dashboard/tools.py", line 244, in wrapper 
> return rvc.run(fn, args, kwargs) 
> File "/usr/local/share/ceph/mgr/dashboard/tools.py", line 232, in run 
> raise ViewCacheNoDataException() 
> ViewCacheNoDataException: ViewCache: unable to retrieve data
> 
> - On 5 Apr, 2019, at 5:06 PM, Wes Cilldhaire w...@sol1.com.au wrote:
> 
>> Hi Lenz,
>>
>> Thanks for responding. I suspected that the number of rbd images might have 
>> had
>> something to do with it so I cleaned up old disposable VM images I am no 
>> longer
>> using, taking the list down from ~30 to 16, 2 in the EC pool on hdds and the
>> rest on the replicated ssd pool. They vary in size from 50GB to 200GB, I 
>> don't
>> have the # of objects per rbd on hand right now but maybe this is a factor as
>> well, particularly with 'du'. This doesn't appear to have made a difference 
>> in
>> the time and number of attempts required to list them in the dashboard.
>>
>> I suspect it might be a case of 'du on all images is always going to take 
>> longer
>> than the current dashboard timeout', in which case the behaviour of the
>> dashboard might possibly need to change to account for this, maybe fetch and
>> listt the images in parallel and asynchronously or something. As it stand it
>> means the dashboard isn't really usable for managing existing images, which 
>> is
>> a shame because having that ability makes ceph accessible to our clients who
>> are considering it and begins affording some level of self-service for them -
>> one of the reasons we've been really excited for Mimic's release actually. I
>> really hope I've just done something wrong :)
>>
>> I'll try to isolate which process the delay is coming from tonight as well as
>> collecting other useful metrics when I'm back on that network tonight.
>>
>> Thanks,
>> Wes
>>
>>
> 

-- 
Ricardo Dias
Senior Software Engineer - Storage Team
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284
(AG Nürnberg)





Re: [ceph-users] Unable to list rbd block > images in nautilus dashboard

2019-04-08 Thread Wes Cilldhaire
Thank you

- On 9 Apr, 2019, at 12:50 AM, Ricardo Dias rd...@suse.com wrote:

> Hi Wes,
> 
> I just filed a bug ticket in the Ceph tracker about this:
> 
> http://tracker.ceph.com/issues/39140
> 
> Will work on a solution ASAP.
> 
> Thanks,
> Ricardo Dias
> 
> On 08/04/19 15:41, Wes Cilldhaire wrote:
>> It's definitely ceph-mgr that is struggling here. It uses 100% of a cpu for 
>> for
>> several tens of seconds and reports the followinf in its log a few times 
>> before
>> anything gets displayed
>> 
>> Traceback (most recent call last):
>> File "/usr/local/share/ceph/mgr/dashboard/services/exception.py", line 88, in
>> dashboard_exception_handler
>> return handler(*args, **kwargs)
>> File "/usr/lib64/python2.7/site-packages/cherrypy/_cpdispatch.py", line 54, 
>> in
>> __call__
>> return self.callable(*self.args, **self.kwargs)
>> File "/usr/local/share/ceph/mgr/dashboard/controllers/__init__.py", line 
>> 649, in
>> inner
>> ret = func(*args, **kwargs)
>> File "/usr/local/share/ceph/mgr/dashboard/controllers/__init__.py", line 
>> 842, in
>> wrapper
>> return func(*vpath, **params)
>> File "/usr/local/share/ceph/mgr/dashboard/services/exception.py", line 44, in
>> wrapper
>> return f(*args, **kwargs)
>> File "/usr/local/share/ceph/mgr/dashboard/services/exception.py", line 44, in
>> wrapper
>> return f(*args, **kwargs)
>> File "/usr/local/share/ceph/mgr/dashboard/controllers/rbd.py", line 270, in 
>> list
>> return self._rbd_list(pool_name)
>> File "/usr/local/share/ceph/mgr/dashboard/controllers/rbd.py", line 261, in
>> _rbd_list
>> status, value = self._rbd_pool_list(pool)
>> File "/usr/local/share/ceph/mgr/dashboard/tools.py", line 244, in wrapper
>> return rvc.run(fn, args, kwargs)
>> File "/usr/local/share/ceph/mgr/dashboard/tools.py", line 232, in run
>> raise ViewCacheNoDataException()
>> ViewCacheNoDataException: ViewCache: unable to retrieve data
>> 
>> - On 5 Apr, 2019, at 5:06 PM, Wes Cilldhaire w...@sol1.com.au wrote:
>> 
>>> Hi Lenz,
>>> 
>>> Thanks for responding. I suspected that the number of rbd images might have 
>>> had
>>> something to do with it so I cleaned up old disposable VM images I am no 
>>> longer
>>> using, taking the list down from ~30 to 16, 2 in the EC pool on hdds and the
>>> rest on the replicated ssd pool. They vary in size from 50GB to 200GB, I 
>>> don't
>>> have the # of objects per rbd on hand right now but maybe this is a factor 
>>> as
>>> well, particularly with 'du'. This doesn't appear to have made a difference 
>>> in
>>> the time and number of attempts required to list them in the dashboard.
>>> 
>>> I suspect it might be a case of 'du on all images is always going to take 
>>> longer
>>> than the current dashboard timeout', in which case the behaviour of the
>>> dashboard might possibly need to change to account for this, maybe fetch and
>>> listt the images in parallel and asynchronously or something. As it stand it
>>> means the dashboard isn't really usable for managing existing images, which 
>>> is
>>> a shame because having that ability makes ceph accessible to our clients who
>>> are considering it and begins affording some level of self-service for them 
>>> -
>>> one of the reasons we've been really excited for Mimic's release actually. I
>>> really hope I've just done something wrong :)
>>> 
>>> I'll try to isolate which process the delay is coming from tonight as well 
>>> as
>>> collecting other useful metrics when I'm back on that network tonight.
>>> 
>>> Thanks,
>>> Wes
>>> 
>>> 
>> 
> 
> --
> Ricardo Dias
> Senior Software Engineer - Storage Team
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
> HRB 21284
> (AG Nürnberg)
> 
> 


[ceph-users] DevConf US CFP Ends Today + Planning

2019-04-08 Thread Mike Perez
Hey everyone,

The CFP for DevConf US [1] ends today! I have submitted for us to have
a Ceph Foundation booth, BOF space and two presentations myself which
you can find on our CFP coordination pad [2]. I'll update here if our
booth is accepted and a call for help.

If you're planning on attending and want to help with Ceph's presence,
please email me directly so I can make sure you're part of any
communication.

Looking forward to potentially meeting more people in the community!

[1] - https://devconf.info/us/2019
[2] - https://pad.ceph.com/p/cfp-coordination

--
Mike Perez (thingee)


Re: [ceph-users] radosgw cloud sync aws s3 auth failed

2019-04-08 Thread Robin H. Johnson
On Mon, Apr 08, 2019 at 06:38:59PM +0800, 黄明友 wrote:
> 
> hi,all
> 
>I had test the cloud sync module in radosgw.  ceph verion is
>13.2.5  , git commit id is
>cbff874f9007f1869bfd3821b7e33b2a6ffd4988;
Reading src/rgw/rgw_rest_client.cc
shows that it only generates v2 signatures for the sync module :-(

AWS China regions are some of the v4-only regions.

I don't know of any current work to tackle this, but there is v4
signature generation code already in the codebase, would just need to be
wired up in src/rgw/rgw_rest_client.cc.

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136




Re: [ceph-users] bluefs-bdev-expand experience

2019-04-08 Thread Igor Fedotov

Hi Yuri,

both issues from Round 2 relate to the unsupported expansion of the main device.

In fact it doesn't work and silently bypasses the operation in your case.

Please try with a different device...
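
E.g. growing the separate DB volume should be picked up (a sketch reusing the
volume names from your own example; the size is arbitrary):

    lvextend -L+20G /dev/vg0/osd2db
    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-2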


Also, I've just submitted a PR for mimic to indicate the bypass; it will be
backported to Luminous once the mimic patch is approved.


See https://github.com/ceph/ceph/pull/27447


Thanks,

Igor

On 4/5/2019 4:07 PM, Yury Shevchuk wrote:

On Fri, Apr 05, 2019 at 02:42:53PM +0300, Igor Fedotov wrote:

wrt Round 1 - an ability to expand block(main) device has been added to
Nautilus,

see: https://github.com/ceph/ceph/pull/25308

Oh, that's good.  But still separate wal&db may be good for studying
load on each volume (blktrace) or moving db/wal to another physical
disk by means of LVM transparently to ceph.


wrt Round 2:

- not setting 'size' label looks like a bug although I recall I fixed it...
Will double check.

- wrong stats output is probably related to the lack of monitor restart -
could you please try that and report back if it helps? Or even restart the
whole cluster.. (well I understand that's a bad approach for production but
just to verify my hypothesis)

Mon restart didn't help:

node0:~# systemctl restart ceph-mon@0
node1:~# systemctl restart ceph-mon@1
node2:~# systemctl restart ceph-mon@2
node2:~# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZEUSE AVAIL  %USE  VAR  PGS
  0   hdd 0.22739  1.0  233GiB 9.44GiB 223GiB  4.06 0.12 128
  1   hdd 0.22739  1.0  233GiB 9.44GiB 223GiB  4.06 0.12 128
  3   hdd 0.227390  0B  0B 0B 00   0
  2   hdd 0.22739  1.0  800GiB  409GiB 391GiB 51.18 1.51 128
 TOTAL 1.24TiB  428GiB 837GiB 33.84
MIN/MAX VAR: 0.12/1.51  STDDEV: 26.30

Restarting mgrs and then all ceph daemons on all three nodes didn't
help either:

node2:~# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZEUSE AVAIL  %USE  VAR  PGS
  0   hdd 0.22739  1.0  233GiB 9.43GiB 223GiB  4.05 0.12 128
  1   hdd 0.22739  1.0  233GiB 9.43GiB 223GiB  4.05 0.12 128
  3   hdd 0.227390  0B  0B 0B 00   0
  2   hdd 0.22739  1.0  800GiB  409GiB 391GiB 51.18 1.51 128
 TOTAL 1.24TiB  428GiB 837GiB 33.84
MIN/MAX VAR: 0.12/1.51  STDDEV: 26.30

Maybe we should upgrade to v14.2.0 Nautilus instead of studying old
bugs... after all, this is a toy cluster for now.

Thank you for responding,


-- Yury


On 4/5/2019 2:06 PM, Yury Shevchuk wrote:

Hello all!

We have a toy 3-node Ceph cluster running Luminous 12.2.11 with one
bluestore osd per node.  We started with pretty small OSDs and would
like to be able to expand OSDs whenever needed.  We had two issues
with the expansion: one turned out user-serviceable while the other
probably needs developers' look.  I will describe both shortly.

Round 1
~~~
Trying to expand osd.2 by 1TB:

# lvextend -L+1T /dev/vg0/osd2
  Size of logical volume vg0/osd2 changed from 232.88 GiB (59618 extents) 
to 1.23 TiB (321762 extents).
  Logical volume vg0/osd2 successfully resized.

# ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-2
inferring bluefs devices from bluestore path
 slot 1 /var/lib/ceph/osd/ceph-2//block
1 : size 0x13a3880 : own 0x[1bf220~25430]
Expanding...
1 : can't be expanded. Bypassing...
#

It didn't work.  The explaination can be found in
ceph/src/os/bluestore/BlueFS.cc at line 310:

// returns true if specified device is under full bluefs control
// and hence can be expanded
bool BlueFS::is_device_expandable(unsigned id)
{
  if (id >= MAX_BDEV || bdev[id] == nullptr) {
return false;
  }
  switch(id) {
  case BDEV_WAL:
return true;

  case BDEV_DB:
// true if DB volume is non-shared
return bdev[BDEV_SLOW] != nullptr;
  }
  return false;
}

So we have to use separate block.db and block.wal for OSD to be
expandable.  Indeed, our OSDs were created without separate block.db
and block.wal, like this:

ceph-volume lvm create --bluestore --data /dev/vg0/osd2

Recreating osd.2 with separate block.db and block.wal:

# ceph-volume lvm zap --destroy --osd-id 2
# lvcreate -L1G -n osd2wal vg0
  Logical volume "osd2wal" created.
# lvcreate -L40G -n osd2db vg0
  Logical volume "osd2db" created.
# lvcreate -L400G -n osd2 vg0
  Logical volume "osd2" created.
# ceph-volume lvm create --osd-id 2 --bluestore --data vg0/osd2 --block.db 
vg0/osd2db --block.wal vg0/osd2wal

Resync takes some time, and then we have expandable osd.2.


Round 2
~~~
Trying to expand osd.2 from 400G to 700G:

# lvextend -L+300G /dev/vg0/osd2
  Size of logical volume vg0/osd2 changed from 400.00 GiB (102400 extents) 
to 700.00 GiB (179200 extents).
  Logical volume vg0/osd2 successfully resized.

# ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-2/
inferring bluefs devices from bluestore 

Re: [ceph-users] osd_memory_target exceeding on Luminous OSD BlueStore

2019-04-08 Thread Mark Nelson
One of the difficulties with the osd_memory_target work is that we can't 
tune based on the RSS memory usage of the process. Ultimately it's up to 
the kernel to decide to reclaim memory and especially with transparent 
huge pages it's tough to judge what the kernel is going to do even if 
memory has been unmapped by the process.  Instead the autotuner looks at 
how much memory has been mapped and tries to balance the caches based on 
that.



In addition to Dan's advice, you might also want to enable debug
bluestore at level 5 and look for lines containing "target:" and
"cache_size:".  These will tell you the current target, the mapped
memory, unmapped memory, heap size, previous aggregate cache size, and
new aggregate cache size.  The other line will give you a breakdown of
how much memory was assigned to each of the bluestore caches and how
much each cache is using.  If there is a memory leak, the autotuner can
only do so much.  At some point it will reduce the caches to fit within
cache_min and leave it there.
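
Something along these lines, as a sketch (the osd id and log path are just
examples; adjust to your setup):

    ceph tell osd.147 injectargs '--debug_bluestore 5/5'
    grep -E 'target:|cache_size:' /var/log/ceph/ceph-osd.147.log | tail -n 20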



Mark


On 4/8/19 5:18 AM, Dan van der Ster wrote:

Which OS are you using?
With CentOS we find that the heap is not always automatically
released. (You can check the heap freelist with `ceph tell osd.0 heap
stats`).
As a workaround we run this hourly:

ceph tell mon.* heap release
ceph tell osd.* heap release
ceph tell mds.* heap release

-- Dan

On Sat, Apr 6, 2019 at 1:30 PM Olivier Bonvalet  wrote:

Hi,

on a Luminous 12.2.11 deploiement, my bluestore OSD exceed the
osd_memory_target :

daevel-ob@ssdr712h:~$ ps auxw | grep ceph-osd
ceph3646 17.1 12.0 6828916 5893136 ? Ssl  mars29 1903:42 
/usr/bin/ceph-osd -f --cluster ceph --id 143 --setuser ceph --setgroup ceph
ceph3991 12.9 11.2 6342812 5485356 ? Ssl  mars29 1443:41 
/usr/bin/ceph-osd -f --cluster ceph --id 144 --setuser ceph --setgroup ceph
ceph4361 16.9 11.8 6718432 5783584 ? Ssl  mars29 1889:41 
/usr/bin/ceph-osd -f --cluster ceph --id 145 --setuser ceph --setgroup ceph
ceph4731 19.7 12.2 6949584 5982040 ? Ssl  mars29 2198:47 
/usr/bin/ceph-osd -f --cluster ceph --id 146 --setuser ceph --setgroup ceph
ceph5073 16.7 11.6 6639568 5701368 ? Ssl  mars29 1866:05 
/usr/bin/ceph-osd -f --cluster ceph --id 147 --setuser ceph --setgroup ceph
ceph5417 14.6 11.2 6386764 5519944 ? Ssl  mars29 1634:30 
/usr/bin/ceph-osd -f --cluster ceph --id 148 --setuser ceph --setgroup ceph
ceph5760 16.9 12.0 6806448 5879624 ? Ssl  mars29 1882:42 
/usr/bin/ceph-osd -f --cluster ceph --id 149 --setuser ceph --setgroup ceph
ceph6105 16.0 11.6 6576336 5694556 ? Ssl  mars29 1782:52 
/usr/bin/ceph-osd -f --cluster ceph --id 150 --setuser ceph --setgroup ceph

daevel-ob@ssdr712h:~$ free -m
   totalusedfree  shared  buff/cache   available
Mem:  47771   452101643  17 917   43556
Swap: 0   0   0

# ceph daemon osd.147 config show | grep memory_target
 "osd_memory_target": "4294967296",


And there is no recovery / backfilling, the cluster is fine :

$ ceph status
  cluster:
id: de035250-323d-4cf6-8c4b-cf0faf6296b1
health: HEALTH_OK

  services:
mon: 5 daemons, quorum tolriq,tsyne,olkas,lorunde,amphel
mgr: tsyne(active), standbys: olkas, tolriq, lorunde, amphel
osd: 120 osds: 116 up, 116 in

  data:
pools:   20 pools, 12736 pgs
objects: 15.29M objects, 31.1TiB
usage:   101TiB used, 75.3TiB / 177TiB avail
pgs: 12732 active+clean
 4 active+clean+scrubbing+deep

  io:
client:   72.3MiB/s rd, 26.8MiB/s wr, 2.30kop/s rd, 1.29kop/s wr


On an other host, in the same pool, I see also high memory usage :

daevel-ob@ssdr712g:~$ ps auxw | grep ceph-osd
ceph6287  6.6 10.6 6027388 5190032 ? Ssl  mars21 1511:07 
/usr/bin/ceph-osd -f --cluster ceph --id 131 --setuser ceph --setgroup ceph
ceph6759  7.3 11.2 6299140 5484412 ? Ssl  mars21 1665:22 
/usr/bin/ceph-osd -f --cluster ceph --id 132 --setuser ceph --setgroup ceph
ceph7114  7.0 11.7 6576168 5756236 ? Ssl  mars21 1612:09 
/usr/bin/ceph-osd -f --cluster ceph --id 133 --setuser ceph --setgroup ceph
ceph7467  7.4 11.1 6244668 5430512 ? Ssl  mars21 1704:06 
/usr/bin/ceph-osd -f --cluster ceph --id 134 --setuser ceph --setgroup ceph
ceph7821  7.7 11.1 6309456 5469376 ? Ssl  mars21 1754:35 
/usr/bin/ceph-osd -f --cluster ceph --id 135 --setuser ceph --setgroup ceph
ceph8174  6.9 11.6 6545224 5705412 ? Ssl  mars21 1590:31 
/usr/bin/ceph-osd -f --cluster ceph --id 136 --setuser ceph --setgroup ceph
ceph8746  6.6 11.1 6290004 5477204 ? Ssl  mars21 1511:11 
/usr/bin/ceph-osd -f --cluster ceph --id 137 --setuser ceph --setgroup ceph
ceph9100  7.7 11.6 6552080 5713560 ? Ssl  mars

Re: [ceph-users] PGs stuck in created state

2019-04-08 Thread ceph
Hello Simon,

Another idea is to increase choose_total_tries.
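
A sketch of how to change it (the value 100 is only an example; the default is 50):

    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt
    # edit crush.txt:  tunable choose_total_tries 100
    crushtool -c crush.txt -o crush.new
    ceph osd setcrushmap -i crush.new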

Hth
Mehmet

Am 7. März 2019 09:56:17 MEZ schrieb Martin Verges :
>Hello,
>
>try restarting every osd if possible.
>Upgrade to a recent ceph version.
>
>--
>Martin Verges
>Managing director
>
>Mobile: +49 174 9335695
>E-Mail: martin.ver...@croit.io
>Chat: https://t.me/MartinVerges
>
>croit GmbH, Freseniusstr. 31h, 81247 Munich
>CEO: Martin Verges - VAT-ID: DE310638492
>Com. register: Amtsgericht Munich HRB 231263
>
>Web: https://croit.io
>YouTube: https://goo.gl/PGE1Bx
>
>
>Am Do., 7. März 2019 um 08:39 Uhr schrieb simon falicon <
>simonfali...@gmail.com>:
>
>> Hello Ceph Users,
>>
>> I have an issue with my ceph cluster, after one serious fail in four
>SSD
>> (electricaly dead) I have lost PGs (and replicats) and I have 14 Pgs
>stuck.
>>
>> So for correct it I have try to force create this PGs (with same IDs)
>but
>> now the Pgs stuck in creating state -_-" :
>>
>> ~# ceph -s
>>  health HEALTH_ERR
>> 14 pgs are stuck inactive for more than 300 seconds
>> 
>>
>> ceph pg dump | grep creating
>>
>> dumped all in format plain
>> 9.300000000creating2019-02-25
>09:32:12.3339790'00:0[20,26]20[20,11]200'0   
>2019-02-25 09:32:12.3339790'02019-02-25 09:32:12.333979
>> 3.900000000creating2019-02-25
>09:32:11.2954510'00:0[16,39]16[17,6]170'0   
>2019-02-25 09:32:11.2954510'02019-02-25 09:32:11.295451
>> ...
>>
>> I have try to create new PG dosent existe before and it work, but for
>this
>> PG stuck in creating state.
>>
>> In my monitor logs I have this message:
>>
>> 2019-02-25 11:02:46.904897 7f5a371ed700  0 mon.controller1@1(peon) e7
>handle_command mon_command({"prefix": "pg force_create_pg", "pgid":
>"4.20e"} v 0) v1
>> 2019-02-25 11:02:46.904938 7f5a371ed700  0 log_channel(audit) log
>[INF] : from='client.? 172.31.101.107:0/3101034432'
>entity='client.admin' cmd=[{"prefix": "pg force_create_pg", "pgid":
>"4.20e"}]: dispatch
>>
>> When I check map I have:
>>
>> ~# ceph pg map 4.20e
>> osdmap e428069 pg 4.20e (4.20e) -> up [27,37,36] acting [13,17]
>>
>> I have restart OSD 27,37,36,13 and 17 but no effect. (one by one)
>>
>> I have see this issue http://tracker.ceph.com/issues/18298 but I run
>on
>> ceph 10.2.11.
>>
>> So could you help me please ?
>>
>> Many thanks by advance,
>> Sfalicon.
>>


[ceph-users] Inconsistent PGs caused by omap_digest mismatch

2019-04-08 Thread Bryan Stillwell
We have two separate RGW clusters running Luminous (12.2.8) that have started
seeing an increase in PGs going active+clean+inconsistent, with the cause being
an omap_digest mismatch.  Both clusters are using FileStore, and the
inconsistent PGs are happening on the .rgw.buckets.index pool, which was moved
from HDDs to SSDs within the last few months.

We've been repairing them by first making sure the odd omap_digest is not the 
primary by setting the primary-affinity to 0 if needed, doing the repair, and 
then setting the primary-affinity back to 1.

For example PG 7.3 went inconsistent earlier today:

# rados list-inconsistent-obj 7.3 -f json-pretty | jq -r '.inconsistents[] | 
.errors, .shards'
[
  "omap_digest_mismatch"
]
[
  {
"osd": 504,
"primary": true,
"errors": [],
"size": 0,
"omap_digest": "0x4c10ee76",
"data_digest": "0x"
  },
  {
"osd": 525,
"primary": false,
"errors": [],
"size": 0,
"omap_digest": "0x26a1241b",
"data_digest": "0x"
  },
  {
"osd": 556,
"primary": false,
"errors": [],
"size": 0,
"omap_digest": "0x26a1241b",
"data_digest": "0x"
  }
]

Since the odd omap_digest is on osd.504 and osd.504 is the primary, we would 
set the primary-affinity to 0 with:

# ceph osd primary-affinity osd.504 0

Do the repair:

# ceph pg repair 7.3

And then once the repair is complete we would set the primary-affinity back to 
1 on osd.504:

# ceph osd primary-affinity osd.504 1
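
(For what it's worth, a rough script of that procedure -- only a sketch, and it
assumes you've already identified the primary OSD with the odd omap_digest:)

    #!/bin/bash
    # usage: ./repair-inconsistent-pg.sh <pgid> <primary-osd-id-with-odd-omap-digest>
    pg="$1"
    osd="$2"
    # make sure the odd copy is not the primary before repairing
    ceph osd primary-affinity "osd.${osd}" 0
    ceph pg repair "${pg}"
    read -r -p "Press enter once 'ceph -s' shows the repair of ${pg} has finished... " _
    # restore the original primary-affinity
    ceph osd primary-affinity "osd.${osd}" 1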

There doesn't appear to be any correlation between the OSDs which would point 
to a hardware issue, and since it's happening on two different clusters I'm 
wondering if there's a race condition that has been fixed in a later version?

Also, what exactly is the omap digest?  From what I can tell it appears to be 
some kind of checksum for the omap data.  Is that correct?

Thanks,
Bryan



Re: [ceph-users] Inconsistent PGs caused by omap_digest mismatch

2019-04-08 Thread Gregory Farnum
On Mon, Apr 8, 2019 at 3:19 PM Bryan Stillwell  wrote:
>
> We have two separate RGW clusters running Luminous (12.2.8) that have started 
> seeing an increase in PGs going active+clean+inconsistent with the reason 
> being caused by an omap_digest mismatch.  Both clusters are using FileStore 
> and the inconsistent PGs are happening on the .rgw.buckets.index pool which 
> was moved from HDDs to SSDs within the last few months.
>
> We've been repairing them by first making sure the odd omap_digest is not the 
> primary by setting the primary-affinity to 0 if needed, doing the repair, and 
> then setting the primary-affinity back to 1.
>
> For example PG 7.3 went inconsistent earlier today:
>
> # rados list-inconsistent-obj 7.3 -f json-pretty | jq -r '.inconsistents[] | 
> .errors, .shards'
> [
>   "omap_digest_mismatch"
> ]
> [
>   {
> "osd": 504,
> "primary": true,
> "errors": [],
> "size": 0,
> "omap_digest": "0x4c10ee76",
> "data_digest": "0x"
>   },
>   {
> "osd": 525,
> "primary": false,
> "errors": [],
> "size": 0,
> "omap_digest": "0x26a1241b",
> "data_digest": "0x"
>   },
>   {
> "osd": 556,
> "primary": false,
> "errors": [],
> "size": 0,
> "omap_digest": "0x26a1241b",
> "data_digest": "0x"
>   }
> ]
>
> Since the odd omap_digest is on osd.504 and osd.504 is the primary, we would 
> set the primary-affinity to 0 with:
>
> # ceph osd primary-affinity osd.504 0
>
> Do the repair:
>
> # ceph pg repair 7.3
>
> And then once the repair is complete we would set the primary-affinity back 
> to 1 on osd.504:
>
> # ceph osd primary-affinity osd.504 1
>
> There doesn't appear to be any correlation between the OSDs which would point 
> to a hardware issue, and since it's happening on two different clusters I'm 
> wondering if there's a race condition that has been fixed in a later version?
>
> Also, what exactly is the omap digest?  From what I can tell it appears to be 
> some kind of checksum for the omap data.  Is that correct?

Yeah; it's just a crc over the omap key-value data that's checked
during deep scrub. Same as the data digest.

I've not noticed any issues around this in Luminous but I probably
wouldn't have, so will have to leave it up to others if there are
fixes in since 12.2.8.
-Greg


Re: [ceph-users] Inconsistent PGs caused by omap_digest mismatch

2019-04-08 Thread Bryan Stillwell

> On Apr 8, 2019, at 4:38 PM, Gregory Farnum  wrote:
> 
> On Mon, Apr 8, 2019 at 3:19 PM Bryan Stillwell  wrote:
>> 
>> There doesn't appear to be any correlation between the OSDs which would 
>> point to a hardware issue, and since it's happening on two different 
>> clusters I'm wondering if there's a race condition that has been fixed in a 
>> later version?
>> 
>> Also, what exactly is the omap digest?  From what I can tell it appears to 
>> be some kind of checksum for the omap data.  Is that correct?
> 
> Yeah; it's just a crc over the omap key-value data that's checked
> during deep scrub. Same as the data digest.
> 
> I've not noticed any issues around this in Luminous but I probably
> wouldn't have, so will have to leave it up to others if there are
> fixes in since 12.2.8.

Thanks for adding some clarity to that Greg!

For some added information, this is what the logs reported earlier today:

2019-04-08 11:46:15.610169 osd.504 osd.504 10.16.10.30:6804/8874 33 : cluster 
[ERR] 7.3 : soid 7:c09d46a1:::.dir.default.22333615.1861352:head omap_digest 
0x26a1241b != omap_digest 0x4c10ee76 from shard 504
2019-04-08 11:46:15.610190 osd.504 osd.504 10.16.10.30:6804/8874 34 : cluster 
[ERR] 7.3 : soid 7:c09d46a1:::.dir.default.22333615.1861352:head omap_digest 
0x26a1241b != omap_digest 0x4c10ee76 from shard 504

I then tried deep scrubbing it again to see whether the data was actually fine and
the digest calculation was just having problems.  It came back with the same
mismatch, but with new digest values:

2019-04-08 15:56:21.186291 osd.504 osd.504 10.16.10.30:6804/8874 49 : cluster 
[ERR] 7.3 : soid 7:c09d46a1:::.dir.default.22333615.1861352:head omap_digest 
0x93bac8f != omap_digest 0xab1b9c6f from shard 504
2019-04-08 15:56:21.186313 osd.504 osd.504 10.16.10.30:6804/8874 50 : cluster 
[ERR] 7.3 : soid 7:c09d46a1:::.dir.default.22333615.1861352:head omap_digest 
0x93bac8f != omap_digest 0xab1b9c6f from shard 504

Which makes sense, but doesn’t explain why the omap data is getting out of sync 
across multiple OSDs and clusters…

I’ll see what I can figure out tomorrow, but if anyone else has some hints I 
would love to hear them.

Thanks,
Bryan


[ceph-users] Try to log the IP in the header X-Forwarded-For with radosgw behind haproxy

2019-04-08 Thread Francois Lafont

Hi @all,

I'm using the Ceph rados gateway installed via ceph-ansible with the Nautilus
version. The radosgw daemons are behind a haproxy which adds these headers (checked
via tcpdump):

X-Forwarded-Proto: http
X-Forwarded-For: 10.111.222.55

where 10.111.222.55 is the IP address of the client. The radosgw daemons use the
civetweb http frontend. Currently it is the IP address of the haproxy
itself that appears in the logs. I would like the logs to show the IP address
from the X-Forwarded-For HTTP header instead. How can I do that?

I have tried this option in ceph.conf:

rgw_remote_addr_param = X-Forwarded-For

It doesn't work but maybe I have read the doc wrongly.

Thx in advance for your help.

PS: I have also tried the "beast" http frontend, but in that case no HTTP
requests seem to be logged.

--
François


Re: [ceph-users] Try to log the IP in the header X-Forwarded-For with radosgw behind haproxy

2019-04-08 Thread Pavan Rallabhandi
Refer "rgw log http headers" under 
http://docs.ceph.com/docs/nautilus/radosgw/config-ref/

Or even better in the code https://github.com/ceph/ceph/pull/7639
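
A hedged ceph.conf sketch based on those references (the option names are from the
docs above; the section name and the exact header tokens are assumptions worth
verifying against your version):

    [client.rgw.myhost]                      # hypothetical section name
    rgw remote addr param = HTTP_X_FORWARDED_FOR
    rgw log http headers = http_x_forwarded_for

Restart the radosgw afterwards and the forwarded client address should show up in
the logs.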

Thanks,
-Pavan.

On 4/8/19, 8:32 PM, "ceph-users on behalf of Francois Lafont" 
 
wrote:

Hi @all,

I'm using Ceph rados gateway installed via ceph-ansible with the Nautilus
version. The radosgw are behind a haproxy which add these headers (checked
via tcpdump):

 X-Forwarded-Proto: http
 X-Forwarded-For: 10.111.222.55

where 10.111.222.55 is the IP address of the client. The radosgw use the
civetweb http frontend. Currently, this is the IP address of the haproxy
itself which is mentioned in logs. I would like to mention the IP address
from the X-Forwarded-For HTTP header. How to do that?

I have tried this option in ceph.conf:

 rgw_remote_addr_param = X-Forwarded-For

It doesn't work but maybe I have read the doc wrongly.

Thx in advance for your help.

PS: I have tried too the http frontend "beast" but, in this case, no HTTP
request seems to be logged.

-- 
François

