Hi, and thanks for the response!

In the last few days we've hit the identical situation with these error messages ...
sometimes they start to appear on all machines at the same time ... network traffic is not extremely high ...


The filesystem of the AFS cache is ext4, 8 GB in size.

The afsd options are: /usr/vice/etc/afsd -afsdb -dynroot -fakestat

The cacheinfo file (/usr/vice/etc/cacheinfo): /afs:/var/cache/afs:5552000
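
For reference, a quick sanity check of that cacheinfo value against the
partition size (just a sketch; the 85% headroom figure is only a rule of
thumb to allow for ext4 overhead and the CacheItems file, not an official
OpenAFS limit):

  #!/bin/sh
  # Compare the cache size configured in cacheinfo (third field, 1K blocks)
  # with the total size of the cache partition.
  CACHEINFO=/usr/vice/etc/cacheinfo
  CACHEDIR=$(cut -d: -f2 "$CACHEINFO")   # /var/cache/afs
  CACHEKB=$(cut -d: -f3 "$CACHEINFO")    # 5552000
  FSKB=$(df -Pk "$CACHEDIR" | awk 'NR==2 { print $2 }')
  echo "configured: ${CACHEKB} KB   partition: ${FSKB} KB"
  [ "$CACHEKB" -le $((FSKB * 85 / 100)) ] \
      || echo "WARNING: cache size is more than 85% of the partition"

With the numbers below (5552000 KB on a 7.6G partition) that's roughly 70%,
so the configured size itself looks sane.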

[root@bird070 ~]# fs getcacheparms -excessive
AFS using    88% of cache blocks (4908415 of 5552000 1k blocks)
             29% of the cache files (49470 of 173500 files)
        afs_cacheFiles:     173500
        IFFree:             124030
        IFEverUsed:           9551
        IFDataMod:               3
        IFDirtyPages:            0
        IFAnyPages:              0
        IFDiscarded:             0
        DCentries:        9997
          0k-   4K:        267
          4k-  16k:        229
         16k-  64k:       9061
         64k- 256k:        212
        256k-   1M:         10
              >=1M:        218
[root@bird070 ~]# df -i|grep cache |grep afs
/dev/sda3        512064  173599  338465   34% /var/cache/afs
[root@bird070 ~]# df -h|grep cache |grep afs
/dev/sda3        7.6G  4.7G  2.5G  66% /var/cache/afs

[root@bird058 ~]# fs getcacheparms -excessive
AFS using    86% of cache blocks (4768364 of 5552000 1k blocks)
             25% of the cache files (43806 of 173500 files)
        afs_cacheFiles:     173500
        IFFree:             129694
        IFEverUsed:           9929
        IFDataMod:               2
        IFDirtyPages:            0
        IFAnyPages:              0
        IFDiscarded:             0
        DCentries:        9998
          0k-   4K:       5074
          4k-  16k:       1639
         16k-  64k:       1728
         64k- 256k:        440
        256k-   1M:        115
              >=1M:       1002

[root@bird652 ~]# fs getcacheparms -excessive
AFS using    89% of cache blocks (4917473 of 5552000 1k blocks)
             34% of the cache files (58678 of 173500 files)
        afs_cacheFiles:     173500
        IFFree:             114822
        IFEverUsed:           9913
        IFDataMod:               0
        IFDirtyPages:            0
        IFAnyPages:              0
        IFDiscarded:             0
        DCentries:        9999
          0k-   4K:       2372
          4k-  16k:       4863
         16k-  64k:       2047
         64k- 256k:        154
        256k-   1M:         78
              >=1M:        485
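
(The outputs above were collected by hand; a small loop like the following
would gather the same from several clients at once. It assumes root ssh
access, and the host names are just the ones from this mail:)

  for h in bird070 bird058 bird652; do
      echo "== $h =="
      ssh "root@$h" 'fs getcacheparms -excessive;
                     df -h /var/cache/afs; df -i /var/cache/afs'
  done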

thanks & cheers,

            martin
On Tue, 23 Oct 2018, Benjamin Kaduk wrote:

On Tue, Oct 23, 2018 at 02:14:38PM +0200, Stephan Wiesand wrote:

On 23. Oct 2018, at 12:16, Andreas Ladanyi <[email protected]> wrote:

In the last few days we've observed an increasing number of nodes
which can no longer be reached and have to be rebooted.

In the /var/log/messages we see a lot of lines with e.g.

Oct 22 18:48:26 bird858 kernel: afs: disk cache read error in
CacheItems slot 25254 off 2020340/13880020 code -5/80
Oct 22 18:48:26 bird858 kernel: afs: disk cache read error in
CacheItems slot 25253 off 2020260/13880020 code -5/80
Oct 22 18:48:26 bird858 kernel: afs: disk cache read error in
CacheItems slot 25252 off 2020180/13880020 code -5/80
Oct 22 18:48:26 bird858 kernel: afs: disk cache read error in
CacheItems slot 25251 off 2020100/13880020 code -5/80

until nothing happens anymore ...

The clients are CentOS 7.5, 3.10.0-862.14.4.el7.x86_64, OpenAFS
1.6.23 built 2018-09-12 ([email protected]).

Any hints for the possible reason ?

I have the same setup with the AFS 1.6.23 client from the jsbilling repo.

I can't see these messages in /var/log/messages yet.

We're running the same kernel version and the same client build (it's the SL 
one) on a fair number of SL 7.4 systems, and don't see these issues either.

-5 is EIO, meaning an actual I/O error is reported.

What's the size and type of the cache filesystems? What does "fs getcacheparms"
report? What are the afsd parameters? Could these nodes be out of space or
inodes for the cache?

It's also possible that the actual disk is having trouble, and/or got
remounted RO.  dmesg and/or syslog might have some clues.
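
A quick look could be something like this (the mount point and device are
just examples; adjust them for the actual cache partition):

  # Look for I/O errors, ext4 errors, or a read-only remount in the kernel log:
  dmesg | egrep -i 'i/o error|ext4-fs error|read-only' | tail -n 20
  # Print the mount options of the cache filesystem ("ro" means remounted RO):
  awk '$2 == "/var/cache/afs" { print $4 }' /proc/mounts
  # SMART health of the underlying disk (needs smartmontools):
  smartctl -H /dev/sda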

(Interestingly enough, we had some changes go by recently on master to make
the error handling for certain cases in this same class more graceful (i.e.,
fail requests but not panic), though those changes are not in 1.6.23.)

-Ben


Regards

       Martin Flemming


______________________________________________________
Martin Flemming
DESY / IT          office : Building 2b / 008a
Notkestr. 85       phone  : 040 - 8998 - 4667
22603 Hamburg      mail   : [email protected]
______________________________________________________
