Dear OpenAFS community,

We administer an OpenAFS environment that will eventually serve about 400 
users, and we are running into some performance issues. We hope you can offer 
some advice.

1. Are there any resources we could consult for tuning our configuration to 
improve performance? We have read the `dafileserver` man page and experimented 
extensively with its arguments, raising several of them above the values set by 
-L (the "large" preset). However, our testing has not shown much improvement. 
Our current `dafileserver` configuration, which is the same on all of our 
OpenAFS file servers, is listed below.

2. What read/write speeds should we expect from an environment and 
configuration of this size? It would help us to know what level of performance 
is realistic for our environment.

===========================
Our performance test
===========================
Here are the results of our tests with a binary file (7103053824 bytes, or 
about 6.6 GiB), copied from a single client into AFS, with an `scp` to the 
server's local disk as a baseline:
  client1: openSUSE 15.1
  server: AFS file server that hosts the AFS volumes used for our testing

  `scp`: client1 (local) -> server (local): 102.2MB/s (66s)
  `cp`: client1 (local) -> client1 (AFS file space): 19.2MB/s (352s)
  `cp`: client1 (AFS file space) -> client1 (AFS file space): 19.46MB/s (348s)
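To make the units explicit, here is a short Python sketch that recomputes these rates from the byte count and the elapsed times. It assumes that "MB" in the figures means MiB (2^20 bytes), because that is the interpretation under which the `cp` numbers match. The variable and function names are ours, for illustration only.

```python
# Sketch: recompute the single-client throughput figures from the file size
# and elapsed times. Assumption: "MB" in the results means MiB (2**20 bytes).
FILE_BYTES = 7_103_053_824
MIB = 2**20
GIB = 2**30


def mib_per_s(nbytes: int, seconds: float) -> float:
    """Average throughput in MiB/s for nbytes transferred in `seconds`."""
    return nbytes / seconds / MIB


# Elapsed seconds for each single-client run, as reported.
runs = {
    "scp client1 -> server (local disk)": 66,
    "cp client1 local -> AFS": 352,
    "cp client1 AFS -> AFS": 348,
}

print(f"file size: {FILE_BYTES / GIB:.2f} GiB ({FILE_BYTES / 1e9:.2f} GB)")
for name, secs in runs.items():
    print(f"{name}: {mib_per_s(FILE_BYTES, secs):.2f} MiB/s")
```

The file is about 6.6 GiB, or about 7.1 GB in decimal units.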

Here are the results with the same binary file (7103053824 bytes, or about 
6.6 GiB), copied in parallel from two clients to the same AFS volume:
  client1 (local) -> server (AFS file space): 10.22MB/s (663s)
  client2 (local) -> server (AFS file space): 9.69MB/s (699s)

  client1 (AFS file space) -> client1 (AFS file space): 5.38MB/s (1258s)
  client2 (AFS file space) -> client2 (AFS file space): 7MB/s (965s)

  client1 (AFS file space) -> client1 (local): 13.15MB/s (515s)
  client2 (AFS file space) -> client2 (local): 15.57MB/s (435s)

  client1 total time taken: 2436s
  client2 total time taken: 2099s
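The Python sketch below sums the per-phase times for each client and gives a rough estimate of the combined write rate during the first phase, when both clients were writing to the volume at the same time. The estimate assumes the two writes overlapped for most of their duration (663 s and 699 s), so treat it as approximate. As before, "MB" is taken to mean MiB, and the names are illustrative.

```python
# Sketch: per-client totals and an approximate aggregate write rate for the
# parallel run. Assumption: the two local->AFS copies mostly overlapped.
FILE_BYTES = 7_103_053_824
MIB = 2**20

# Elapsed seconds per phase, copied from the parallel-run results.
phases = {
    "client1": {"local->AFS": 663, "AFS->AFS": 1258, "AFS->local": 515},
    "client2": {"local->AFS": 699, "AFS->AFS": 965, "AFS->local": 435},
}

for client, times in phases.items():
    print(f"{client}: total {sum(times.values())} s")

# Both clients wrote the same-sized file concurrently, so the combined rate
# is roughly the sum of the two per-client rates.
aggregate = sum(FILE_BYTES / t["local->AFS"] / MIB for t in phases.values())
print(f"approximate aggregate local->AFS rate: {aggregate:.1f} MiB/s")
```

The estimate comes to about 19.9 MiB/s, close to the 19.2 MiB/s that client1 achieved on its own.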



Here is a snapshot of what `top` looks like from the AFS file server while the 
copy is taking place:

  top - 16:14:14 up 5 days,  7:29,  2 users,  load average: 1.06, 0.37, 0.26
  Tasks: 297 total,   2 running, 294 sleeping,   1 stopped,   0 zombie
  %Cpu0  : 17.3 us,  6.5 sy,  0.0 ni, 69.4 id,  1.7 wa,  1.0 hi,  4.1 si,  0.0 st
  %Cpu1  : 16.2 us,  4.1 sy,  0.0 ni, 65.5 id, 13.2 wa,  0.7 hi,  0.3 si,  0.0 st
  %Cpu2  :  5.0 us,  6.7 sy,  0.3 ni, 12.4 id, 63.2 wa,  1.0 hi, 11.4 si,  0.0 st
  %Cpu3  :  7.5 us,  5.1 sy,  9.2 ni, 44.2 id, 31.5 wa,  1.4 hi,  1.0 si,  0.0 st
  %Cpu4  : 13.3 us,  6.5 sy,  2.0 ni, 67.6 id,  9.9 wa,  0.7 hi,  0.0 si,  0.0 st
  %Cpu5  : 37.4 us, 14.6 sy,  0.0 ni, 41.1 id,  6.0 wa,  0.7 hi,  0.3 si,  0.0 st
  MiB Mem :  24080.5 total,  14283.7 free,    526.5 used,   9270.3 buff/cache
  MiB Swap:   4060.0 total,   4060.0 free,      0.0 used.  23105.9 avail Mem

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    22409 root      15  -5 4282356  65240   2808 S 118.3   0.3  75:55.61 dafileserver
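To summarize the per-CPU figures in the snapshot, here is a small Python sketch that extracts the iowait ("wa") column from the `%Cpu` lines and computes its mean and maximum. The parsing assumes the `value label` layout shown in this particular `top` output. The helper name `field` is ours.

```python
# Sketch: average the per-CPU iowait ("wa") column from the top snapshot.
# Assumption: each %Cpu line has the "value label, value label, ..." layout.
TOP_CPU_LINES = """\
%Cpu0  : 17.3 us,  6.5 sy,  0.0 ni, 69.4 id,  1.7 wa,  1.0 hi,  4.1 si,  0.0 st
%Cpu1  : 16.2 us,  4.1 sy,  0.0 ni, 65.5 id, 13.2 wa,  0.7 hi,  0.3 si,  0.0 st
%Cpu2  :  5.0 us,  6.7 sy,  0.3 ni, 12.4 id, 63.2 wa,  1.0 hi, 11.4 si,  0.0 st
%Cpu3  :  7.5 us,  5.1 sy,  9.2 ni, 44.2 id, 31.5 wa,  1.4 hi,  1.0 si,  0.0 st
%Cpu4  : 13.3 us,  6.5 sy,  2.0 ni, 67.6 id,  9.9 wa,  0.7 hi,  0.0 si,  0.0 st
%Cpu5  : 37.4 us, 14.6 sy,  0.0 ni, 41.1 id,  6.0 wa,  0.7 hi,  0.3 si,  0.0 st
"""


def field(line: str, name: str) -> float:
    """Return the numeric value labelled `name` in a top %Cpu line."""
    _, stats = line.split(":", 1)
    for item in stats.split(","):
        value, label = item.split()
        if label == name:
            return float(value)
    raise ValueError(f"{name!r} not found in {line!r}")


cpu_lines = [line for line in TOP_CPU_LINES.splitlines() if line.strip()]
wa = [field(line, "wa") for line in cpu_lines]
print(f"iowait per CPU: {wa}")
print(f"mean iowait: {sum(wa) / len(wa):.1f}%  max: {max(wa):.1f}%")
```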

Here is the output of `fs getcacheparms` while both clients were copying the 
file to AFS:

  client1: AFS using 781060 of the cache's available 891289 1K byte blocks.
  client2: AFS using 0 of the cache's available 891289 1K byte blocks.
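Because `fs getcacheparms` reports sizes in 1K blocks, the Python sketch below converts these readings to MiB and compares the cache size with the size of the test file. The constants come from the output above. The function name is illustrative.

```python
# Sketch: interpret `fs getcacheparms` output, which counts 1 KiB blocks.
CACHE_BLOCKS = 891_289       # "available" 1K blocks on both clients
FILE_BYTES = 7_103_053_824   # size of the test file


def cache_usage(used_blocks: int, total_blocks: int = CACHE_BLOCKS) -> tuple:
    """Return (MiB in use, percent of the cache in use)."""
    return used_blocks / 1024, 100 * used_blocks / total_blocks


for client, used in {"client1": 781_060, "client2": 0}.items():
    mib, pct = cache_usage(used)
    print(f"{client}: {mib:.1f} MiB used ({pct:.1f}% of the cache)")

cache_bytes = CACHE_BLOCKS * 1024
print(f"cache size: {cache_bytes / 2**20:.1f} MiB")
print(f"test file is {FILE_BYTES / cache_bytes:.1f}x the cache size")
```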


***************************
Our environment
***************************
Our environment configuration is documented below. We hope you can point out 
what might be causing a performance bottleneck.

  Our testing environment:
    - OpenAFS Servers
      - OpenAFS 1.8.9
      - DB servers (total of 3)
        - 1 master
          - Rocky Linux 8.8
          - 2 CPU
          - 4GB RAM
        - 2 replicas, with each having:
          - Rocky Linux 8.8
          - 2 CPU
          - 4GB RAM
      - FS servers (total of 3)
        - 3 fileservers, with each having:
          - Rocky Linux 8.8
          - 6 CPU
          - 24GB RAM
          - /usr/afs/local/BosConfig:
              restrictmode 0
              restarttime 16 0 0 0 0
              checkbintime 3 0 5 0 0
              bnode dafs dafs 1
              parm /usr/afs/bin/dafileserver -L -cb 640000 -abortthreshold 0 -vc 1000
              parm /usr/afs/bin/davolserver -p 64 -log
              parm /usr/afs/bin/salvageserver
              parm /usr/afs/bin/dasalvager -parallel all32
              end
              bnode simple upclientetc 1
              parm /usr/afs/bin/upclient db1 /usr/afs/etc
              end
              bnode simple upclientbin 1
              parm /usr/afs/bin/upclient db1 /usr/afs/bin
              end
    - OpenAFS Clients
      - client1
        - openSUSE 15.1
        - OpenAFS 1.8.7
        - 6 CPUs
        - 16GB RAM
        - `fs getcacheparms`
            AFS using 12 of the cache's available 891289 1K byte blocks.
        - /etc/sysconfig/openafs-client:
            AFSD_ARGS="-fakestat -stat 6000 -dcache 6000 -daemons 6 -volumes 256 -files 50000 -chunksize 17"
      - client2
        - openSUSE 13.2
        - OpenAFS 1.8.7
        - 2 CPUs
        - 2GB RAM
        - `fs getcacheparms`
            AFS using 0 of the cache's available 891289 1K byte blocks.
        - /etc/sysconfig/afs
            OPTIONS=$XXLARGE
              (and XXLARGE="-fakestat -stat 4000 -dcache 4000 -daemons 6 -volumes 256 -afsdb")
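For reference, here is a Python sketch of what client1's `-chunksize 17` means in bytes. It is based on our understanding of the afsd man page, which describes the option's value as a base-2 exponent, so chunk size = 2^N bytes. The sketch also shows how many chunks fill the cache and how many chunks the test file spans, so those numbers can be compared with the `-dcache` and `-files` values. It says nothing about client2, which does not set `-chunksize`.

```python
# Sketch: what client1's `-chunksize 17` implies for its disk cache.
# Assumption: -chunksize is a base-2 exponent (chunk size = 2**N bytes).
CHUNKSIZE_EXP = 17           # from client1's AFSD_ARGS
CACHE_KIB = 891_289          # cache size in 1K blocks (fs getcacheparms)
FILE_BYTES = 7_103_053_824   # size of the test file
DCACHE = 6000                # -dcache value from client1's AFSD_ARGS
FILES = 50000                # -files value from client1's AFSD_ARGS

chunk_bytes = 2 ** CHUNKSIZE_EXP
chunks_in_cache = CACHE_KIB * 1024 // chunk_bytes
chunks_in_file = -(-FILE_BYTES // chunk_bytes)  # ceiling division

print(f"chunk size: {chunk_bytes // 1024} KiB")
print(f"chunks needed to fill the cache: {chunks_in_cache}")
print(f"chunks spanned by the test file: {chunks_in_file}")
print(f"-dcache: {DCACHE}, -files: {FILES}")
```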



Thanks for your help!


Regards,

Collin

Collin Gros
Staff Software Engineer
RICOH Graphic Communications - DSBC

Ricoh USA, Inc
Phone: +1 720-663-3225
Email: [email protected]