I can always remount and see them.   

But I wanted to preserve the "broken" state and see if I could figure out why 
it was happening.   (strace isn't particularly revealing.)
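
One thing I may try next time (just a sketch, assuming the problem is stale client-side 
dentry caching and that I have root on the stuck client) is dropping the VFS dentry/inode 
caches before remounting, to see whether the listing comes back on its own:

    # on the affected client, as root
    sync
    echo 2 > /proc/sys/vm/drop_caches      # 2 = drop dentries and inodes only
    ls -lh /mounts/ceph1/pubdata/tcga/raw  # see if the files reappear without a remount

If they do reappear after that, it would point at the client's cache rather than anything 
on the MDS side.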

Some other things I noted were that

- if I reboot the metadata server nobody seems to "fail over" to the hot spare 
(everything locks up until it's back online).   I'm guessing you have to 
manually make the spare primary, and then switch back?
- if I reboot the mon that a client is mounted to, that client's mount locks up (even if I 
list 4 monitors in the fstab; a rough sketch of my fstab entry is below), but other clients 
still work.
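
Here's roughly what that fstab entry looks like (a sketch; the mon IPs are the four from 
"ceph mon dump" below, and the secretfile path is just a placeholder for ours):

    10.18.176.179:6789,10.18.176.180:6789,10.18.176.181:6789,10.18.176.182:6789:/ /mounts/ceph1 ceph name=admin,secretfile=/etc/ceph/admin.secret,noatime 0 2

I had assumed listing all four mons would let the kernel client ride out a single mon 
reboot.  I'll also double-check "ceph mds stat" to confirm the standby MDS is actually 
registered as a standby before I test the failover again.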

-----Original Message-----
From: Aronesty, Erik 
Sent: Friday, May 09, 2014 11:51 AM
To: 'Lincoln Bryant'
Cc: ceph-users
Subject: RE: [ceph-users] issues with ceph

If I stat on that box, I get nothing:

q782657@usadc-seaxd01:/mounts/ceph1/pubdata/tcga/raw$ cd BRCA
-bash: cd: BRCA: No such file or directory

perl -e 'print stat("BRCA")'
<no result>

If I access a mount on another machine, I can see the files:

q782657@usadc-nasea05:/mounts/ceph1/pubdata/tcga$ ls -l raw
total 0
drwxrwxr-x 1 q783775 pipeline 366462246414 May  8 12:00 BRCA
drwxrwxr-x 1 q783775 pipeline 161578200377 May  8 12:00 COAD
drwxrwxr-x 1 q783775 pipeline 367320207221 May  8 11:35 HNSC
drwxrwxr-x 1 q783775 pipeline 333587505256 May  8 13:27 LAML
drwxrwxr-x 1 q783775 pipeline 380346443564 May  8 13:27 LUSC
drwxrwxr-x 1 q783775 pipeline 357340261602 May  8 13:33 PAAD
drwxrwxr-x 1 q783775 pipeline 389882082560 May  8 13:33 PRAD
drwxrwxr-x 1 q783775 pipeline 634089122305 May  8 13:33 STAD
drwxrwxr-x 1 q783775 pipeline 430754940032 May  8 13:33 THCA

I will try updating the kernel, and rerunning some tests.   Thanks.      


-----Original Message-----
From: Lincoln Bryant [mailto:linco...@uchicago.edu] 
Sent: Friday, May 09, 2014 10:39 AM
To: Aronesty, Erik
Cc: ceph-users
Subject: Re: [ceph-users] issues with ceph

Hi Erik,

What happens if you try to stat one of the "missing" files (assuming you know 
the name of the file before you remount raw)?

I had a problem where files would disappear and reappear in CephFS, which I 
believe was fixed in kernel 3.12.

Cheers,
Lincoln

On May 9, 2014, at 9:30 AM, Aronesty, Erik wrote:

> So we were attempting to stress test a cephfs installation, and last night, 
> after copying 500GB of files, we got this:
> 
> 570G in the "raw" directory
> 
> q782657@usadc-seaxd01:/mounts/ceph1/pubdata/tcga$ ls -lh
> total 32M
> -rw-rw-r-- 1 q783775 pipeline  32M May  8 10:39 2014-02-25T12:00:01-0800_data_manifest.tsv
> -rw-rw-r-- 1 q783775 pipeline  144 May  8 10:42 cghub.key
> drwxrwxr-x 1 q783775 pipeline 234G May  8 11:31 fastqs
> drwxrwxr-x 1 q783775 pipeline 570G May  8 13:33 raw
> -rw-rw-r-- 1 q783775 pipeline   86 May  8 11:19 readme.txt
> 
> But when I ls into the "raw" folder, I get zero files:
> 
> q782657@usadc-seaxd01:/mounts/ceph1/pubdata/tcga$ ls -lh raw
> total 0
> 
> If I mount that folder again... all the files "re-appear".
> 
> Is this a bug that's been solved in a newer release?
> 
> KERNEL:
> Linux usadc-nasea05 3.11.0-20-generic #34~precise1-Ubuntu SMP Thu Apr 3 
> 17:25:07 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> 
> CEPH:
> ii  ceph                              0.72.2-1precise                    distributed storage and file system
> 
> 
> ------ No errors that I could see on the client machine:
> 
> q782657@usadc-seaxd01:/mounts/ceph1/pubdata/tcga$ dmesg | grep ceph
> [588560.047193] Key type ceph registered
> [588560.047334] libceph: loaded (mon/osd proto 15/24)
> [588560.102874] ceph: loaded (mds proto 32)
> [588560.117392] libceph: client6005 fsid f067539c-7426-47ee-afb0-7d2c6dfcbcd0
> [588560.126477] libceph: mon1 10.18.176.180:6789 session established
> 
> 
> ------ Ceph itself looks fine.
> 
> root@usadc-nasea05:~# ceph health
> HEALTH_OK
> 
> root@usadc-nasea05:~# ceph quorum_status
> {"election_epoch":668,"quorum":[0,1,2,3],"quorum_names":["usadc-nasea05","usadc-nasea06","usadc-nasea07","usadc-nasea08"],"quorum_leader_name":"usadc-nasea05","monmap":{"epoch":1,"fsid":"f067539c-7426-47ee-afb0-7d2c6dfcbcd0","modified":"0.000000","created":"0.000000","mons":[{"rank":0,"name":"usadc-nasea05","addr":"10.18.176.179:6789\/0"},{"rank":1,"name":"usadc-nasea06","addr":"10.18.176.180:6789\/0"},{"rank":2,"name":"usadc-nasea07","addr":"10.18.176.181:6789\/0"},{"rank":3,"name":"usadc-nasea08","addr":"10.18.176.182:6789\/0"}]}}
> 
> root@usadc-nasea05:~# ceph mon dump
> dumped monmap epoch 1
> epoch 1
> fsid f067539c-7426-47ee-afb0-7d2c6dfcbcd0
> last_changed 0.000000
> created 0.000000
> 0: 10.18.176.179:6789/0 mon.usadc-nasea05
> 1: 10.18.176.180:6789/0 mon.usadc-nasea06
> 2: 10.18.176.181:6789/0 mon.usadc-nasea07
> 3: 10.18.176.182:6789/0 mon.usadc-nasea08
> 
> 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com