Re: [ceph-users] CephFS deletion performance
On Sat, Sep 14, 2019 at 8:57 PM Hector Martin wrote:
>
> On 13/09/2019 16.25, Hector Martin wrote:
> > Is this expected for CephFS? I know data deletions are asynchronous,
> > but not being able to delete metadata/directories without an undue
> > impact on the whole filesystem performance is somewhat problematic.
>
> I think I'm getting a feeling for who the culprit is here. I just
> noticed that listing directories in a snapshot that were subsequently
> deleted *also* performs horribly, and kills cluster performance too.
>
> We just had a partial outage due to this; a snapshot+rsync triggered
> while a round of deletions was happening, and as far as I can tell,
> when it caught up to newly deleted files, MDS performance tanked as it
> repeatedly had to open stray dirs under the hood. In fact, the
> inode/dentry metrics (opened/closed) skyrocketed during that period,
> from the normal ~1Kops from multiple parallel rsyncs to ~15Kops.
>
> As I mentioned in a prior message to the list, we have ~570k stray
> files due to snapshots. It makes sense that deleting a directory/file
> means moving it to a stray directory (each holding ~57k files
> already), and accessing a deleted file via a snapshot means accessing
> the stray directory. Am I right in thinking that these operations are
> at least O(n) in the number of strays, and may in fact iterate over or
> otherwise touch every single file in the stray directories? (This
> would explain the sudden 15Kops spike in inode/dentry activity.) It
> seems that with such bloated stray dirs, anything that touches them
> behind the scenes makes the MDS completely hiccup and grind away,
> affecting performance for all other clients.
>
> I guess at this point we'll have to drastically cut down the time span
> for which we keep CephFS snapshots. Maybe I'll move the snapshot
> history keeping to the backup target; at least then it won't affect
> production data. But since we plan on using the other cluster for
> production too eventually, that would mean we need to use multi-FS in
> order to isolate the workloads...

When a snapshotted directory is deleted, the MDS moves the directory into
a stray directory. You have ~57k entries per stray directory; each time
the MDS has a cache miss for a stray, it needs to load a stray dirfrag.
This is very inefficient, because a stray dirfrag contains lots of items,
most of which are useless for the lookup at hand.

> --
> Hector Martin (hec...@marcansoft.com)
> Public Key: https://mrcn.st/pub
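For anyone hitting the same thing: the stray build-up can be watched from the active MDS's perf counters. A minimal sketch, assuming shell access to the MDS host; substitute your daemon name for mds.<name>, and note that counter names can differ slightly between releases:

  ceph daemon mds.<name> perf dump | grep -i strays    # num_strays, num_strays_delayed, ...
  ceph daemon mds.<name> perf dump | grep -i pq_       # purge_queue activity, if present

Watching these while deletions or snapshot rsyncs run should show whether stray handling correlates with the performance dips described above.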
Re: [ceph-users] cephfs: apache locks up after parallel reloads on multiple nodes
Quoting Paul Emmerich (paul.emmer...@croit.io):

> Yeah, CephFS is much closer to POSIX semantics for a filesystem than
> NFS. There's an experimental relaxed mode called LazyIO but I'm not
> sure if it's applicable here.

Out of curiosity, how would CephFS being more POSIX compliant cause this
much delay in this situation? I'd understand if it took up to a second or
maybe two, but almost fifteen minutes, and then suddenly /all/ servers
recover at the same time?

Would this situation exist because we have so many open file handles per
server? Or could it also appear in a simpler "two servers share a CephFS"
setup?

I'm so curious to find out what /causes/ this. "Closer to POSIX
semantics" doesn't cut it for me in this case. Not with the symptoms
we're seeing.

> You can debug this by dumping slow requests from the MDS servers via
> the admin socket

As far as I understood, there's not much to see on the MDS servers when
this issue pops up. E.g. no slow ops logged during this event.

Regards,
-Sndr.
--
| I think i want a job cleaning mirrors...
| It's just something i can really see myself doing...
| 4096R/20CC6CD2 - 6D40 1A20 B9AA 87D4 84C7 FBD6 F3A9 9442 20CC 6CD2
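For reference, the admin-socket dump Paul refers to would look roughly like this; a sketch assuming a local admin socket on the active MDS host, with mds.<name> standing in for the actual daemon name:

  ceph daemon mds.<name> ops                  # requests currently being processed by the MDS
  ceph daemon mds.<name> dump_historic_ops    # recent requests, including their per-stage timing

Even when no "slow requests" warning is logged, the historic ops often show which request type and which client the time is being spent on.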
Re: [ceph-users] OSDs keep crashing after cluster reboot
Hi together,

it seems the issue described by Ansgar was reported and closed here as being fixed for newly created pools in post-Luminous releases: https://tracker.ceph.com/issues/41336

However, it is unclear to me:
- How to find out whether an EC CephFS created in Luminous is actually affected, before actually testing the "shutdown all" procedure and thus ending up with dying OSDs.
- If affected, how to fix it without purging the pool completely (which is not so easily done if there is 0.5 PB inside that can't be restored without a long downtime).

If this is an acknowledged issue, it should probably also be mentioned in the upgrade notes from pre-Mimic to Mimic and newer before more people lose data.

In our case, we have such a CephFS on an EC pool created with Luminous, and are right now running Mimic 13.2.6, but have never tried a full shutdown. We need to try that on Friday, though... (cooling system maintenance).

"ceph osd dump" contains:

pool 1 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 40903 flags hashpspool stripe_width 0 compression_algorithm snappy compression_mode aggressive application cephfs
pool 2 'cephfs_data' erasure size 6 min_size 5 crush_rule 2 object_hash rjenkins pg_num 4096 pgp_num 4096 last_change 40953 flags hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 16384 compression_algorithm snappy compression_mode aggressive application cephfs

and the EC profile is:

# ceph osd erasure-code-profile get cephfs_data
crush-device-class=hdd
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=4
m=2
plugin=jerasure
technique=reed_sol_van
w=8

Neither contains the stripe_unit explicitly, so I wonder how to find out whether it is (in)valid. Checking the xattr ceph.file.layout.stripe_unit of some "old" files on the FS reveals 4194304 in my case.

Any help appreciated.

Cheers and all the best,
Oliver

On 09.08.19 at 08:54, Ansgar Jazdzewski wrote:

We got our OSDs back. Since we removed the EC pool (cephfs.data), we had to figure out how to remove its PGs from the offline OSDs, and here is how we did it.

Remove CephFS, remove cache layer, remove pools:

#ceph mds fail 0
#ceph fs rm cephfs --yes-i-really-mean-it
#ceph osd tier remove-overlay cephfs.data
there is now (or already was) no overlay for 'cephfs.data'
#ceph osd tier remove cephfs.data cephfs.cache
pool 'cephfs.cache' is now (or already was) not a tier of 'cephfs.data'
#ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'
#ceph osd pool delete cephfs.cache cephfs.cache --yes-i-really-really-mean-it
pool 'cephfs.cache' removed
#ceph osd pool delete cephfs.data cephfs.data --yes-i-really-really-mean-it
pool 'cephfs.data' removed
#ceph osd pool delete cephfs.metadata cephfs.metadata --yes-i-really-really-mean-it
pool 'cephfs.metadata' removed

Remove placement groups of pool 23 (cephfs.data) from all offline OSDs:

DATAPATH=/var/lib/ceph/osd/ceph-${OSD}
a=`ceph-objectstore-tool --data-path ${DATAPATH} --op list-pgs | grep "^23\."`
for i in $a; do
  echo "INFO: removing ${i} from OSD ${OSD}"
  ceph-objectstore-tool --data-path ${DATAPATH} --pgid ${i} --op remove --force
done

Since we have now removed our CephFS, we still do not know whether we could have solved it without data loss by upgrading to Nautilus.

Have a nice weekend,
Ansgar
On Wed, 7 Aug 2019 at 17:03, Ansgar Jazdzewski wrote:

Another update: we took the more destructive route and removed the CephFS pools (luckily we only had test data in the filesystem). Our hope was that during the startup process the OSDs would delete the no-longer-needed PGs, but this is NOT the case. So we still have the same issue; the only difference is that the PGs no longer belong to a pool.

-360> 2019-08-07 14:52:32.655 7fb14db8de00 5 osd.44 pg_epoch: 196586 pg[23.f8s0(unlocked)] enter Initial
-360> 2019-08-07 14:52:32.659 7fb14db8de00 -1 /build/ceph-13.2.6/src/osd/ECUtil.h: In function 'ECUtil::stripe_info_t::stripe_info_t(uint64_t, uint64_t)' thread 7fb14db8de00 time 2019-08-07 14:52:32.660169
/build/ceph-13.2.6/src/osd/ECUtil.h: 34: FAILED assert(stripe_width % stripe_size == 0)

We can now take one route and try to delete the PGs by hand in the OSD (BlueStore); how can this be done? Or we try to upgrade to Nautilus and hope for the best.

Any help or hints are welcome. Have a nice one,
Ansgar

On Wed, 7 Aug 2019 at 11:32, Ansgar Jazdzewski wrote:

Hi,

as a follow-up:

* a full log of one OSD failing to start: https://pastebin.com/T8UQ2rZ6
* our EC pool creation in the first place: https://pastebin.com/20cC06Jn
* ceph osd dump and ceph osd erasure-code-profile get cephfs: https://pastebin.com/TRLPaWcH

As we try to dig more into it, it looks like a bug.
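For anyone wanting to inspect an existing Luminous-created EC CephFS before risking a full shutdown (Oliver's question above): the commands below are only a rough sanity check built from the values already quoted in this thread, not the exact computation behind the FAILED assert in ECUtil.h; the mount path and pool name are placeholders to adapt.

  getfattr -n ceph.file.layout /mnt/cephfs/some/old/file   # shows stripe_unit, stripe_count, object_size, pool
  getfattr -n ceph.dir.layout  /mnt/cephfs/some/dir        # only present if a layout is set on that directory
  ceph osd dump | grep "'cephfs_data'"                     # note the pool's stripe_width
  ceph osd erasure-code-profile get cephfs_data            # note k (number of data chunks)

With the numbers from this thread, stripe_width 16384 over k=4 gives 4096 bytes per chunk, and the observed ceph.file.layout.stripe_unit of 4194304 is a whole multiple of that, so nothing obviously inconsistent stands out; whether that is enough to rule the bug out is exactly what remains unclear.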
Re: [ceph-users] cephfs: apache locks up after parallel reloads on multiple nodes
On Tue, Sep 17, 2019 at 8:12 AM Sander Smeenk wrote:
>
> Quoting Paul Emmerich (paul.emmer...@croit.io):
>
> > Yeah, CephFS is much closer to POSIX semantics for a filesystem than
> > NFS. There's an experimental relaxed mode called LazyIO but I'm not
> > sure if it's applicable here.
>
> Out of curiosity, how would CephFS being more POSIX compliant cause
> this much delay in this situation? I'd understand if it took up to a
> second or maybe two, but almost fifteen minutes, and then suddenly
> /all/ servers recover at the same time?
>
> Would this situation exist because we have so many open file handles
> per server? Or could it also appear in a simpler "two servers share a
> CephFS" setup?
>
> I'm so curious to find out what /causes/ this.
> "Closer to POSIX semantics" doesn't cut it for me in this case.
> Not with the symptoms we're seeing.

Yeah, this sounds weird. 15 minutes is one or two timers, but I can't
think of anything that should be related here.

I'd look and see what syscalls the apache daemons are making and how
long they're taking; in particular, what's different between the first
server and the rest. If they're doing a lot of the same syscalls but
just much slower on the follow-on servers, that probably indicates
they're all hammering the CephFS cluster with conflicting updates
(especially if they're writes!) that NFS simply ignored and collapsed.
If it's just one syscall that takes minutes to complete, check the MDS
admin socket for ops_in_flight.
-Greg

> > You can debug this by dumping slow requests from the MDS servers via
> > the admin socket
>
> As far as I understood, there's not much to see on the MDS servers
> when this issue pops up. E.g. no slow ops logged during this event.
>
> Regards,
> -Sndr.
> --
> | I think i want a job cleaning mirrors...
> | It's just something i can really see myself doing...
> | 4096R/20CC6CD2 - 6D40 1A20 B9AA 87D4 84C7 FBD6 F3A9 9442 20CC 6CD2
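A minimal sketch of how to gather what Greg suggests; the PID and daemon name are placeholders, and the strace invocation is just one way to time syscalls per worker:

  # on a web server that is hanging: attach to one apache worker and time its syscalls
  strace -f -T -p <apache-worker-pid>

  # on the active MDS host, while the hang is ongoing
  ceph daemon mds.<name> dump_ops_in_flight

Comparing the strace timings between the first server (which reloads fine) and the follow-on servers should show whether it is many slow syscalls or a single one that blocks for minutes.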
Re: [ceph-users] Nautilus : ceph dashboard ssl not working
Hi Muthu,

On 16.09.19 11:30, nokia ceph wrote:

Hi Team,

In Ceph 14.2.2, the ceph dashboard does not have set-ssl-certificate. We are trying to enable the ceph dashboard, and while using the SSL certificate and key, it is not working.

cn5.chn5au1c1.cdn ~# ceph dashboard set-ssl-certificate -i dashboard.crt
no valid command found; 10 closest matches:
dashboard set-grafana-update-dashboards
dashboard reset-prometheus-api-host
dashboard reset-ganesha-clusters-rados-pool-namespace
dashboard set-grafana-api-username
dashboard get-audit-api-log-payload
dashboard get-grafana-api-password
dashboard get-grafana-api-username
dashboard set-rgw-api-access-key
dashboard reset-rgw-api-host
dashboard set-prometheus-api-host
Error EINVAL: invalid command

cn5.chn5au1c1.cdn ~# ceph -v
ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)

How do we set the crt and key in this case?

ceph config-key dump | grep dashboard/[crt,key]

Try this:

ceph config-key set mgr mgr/dashboard/crt -i ssl.crt
ceph config-key set mgr mgr/dashboard/key -i ssl.key

Regards,
Michel
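Whichever form of the config-key command your release accepts, it usually helps to verify that the certificate and key blobs actually landed and then bounce the dashboard module so it reloads them; a small sketch:

  ceph config-key dump | grep dashboard    # the crt/key entries should show up here
  ceph mgr module disable dashboard
  ceph mgr module enable dashboard
  ceph mgr services                        # prints the dashboard URL once it is serving again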
[ceph-users] eu.ceph.com mirror out of sync?
Dear Cephalopodians,

I realized just now that https://eu.ceph.com/rpm-nautilus/el7/x86_64/ still only holds releases up to 14.2.2, and nothing is to be seen of 14.2.3 or 14.2.4, while the main repository at https://download.ceph.com/rpm-nautilus/el7/x86_64/ looks as expected.

Is this issue with the eu.ceph.com mirror already known?

Cheers,
Oliver
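A quick way to compare what the two repositories currently serve; just a sketch against the directory listings, and the package-name pattern is an assumption about the listing format:

  for repo in https://download.ceph.com https://eu.ceph.com; do
    echo "== $repo"
    curl -s "$repo/rpm-nautilus/el7/x86_64/" | grep -o 'ceph-14\.2\.[0-9]*-[0-9]*' | sort -uV | tail -3
  done

If the eu.ceph.com output stops at 14.2.2 while download.ceph.com shows newer builds, the mirror really is lagging rather than the local yum cache being stale.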