To Eric van Loon:
Hi Eric,
I think I owe you an update on our backup troubles; perhaps these interventions can help others as well.
For some time, backups would fail with messages such as:
07-07-2025 18:56:39 ANR2012W Error encountered for storage pool directory:
\\medbackup.med.ad.bgu.ac.il\tsmc26
in storage pool: CPOOL.
You directed me to check the dsmffdc log:
[07-07-2025 18:56:24.948][ FFDC_GENERAL_SERVER_ERROR ]:
[127](psfile.c:2909)(SESSION: 8063) Error (platform specific) 59 writing to
file
\\medbackup.med.ad.bgu.ac.il\tsmc15\55\000000000000555c.dcf,
amtWritten 0, size 774144 loc 2413883392.
You told me these were indeed likely network problems.
These issues occurred roughly daily.
Then our server went down for maintenance, which was an opportunity to try to
fix things.
I am not the administrator of the server, nor of the network, firewall,
security, etc., so I had to coordinate the changes.
1. The server and the container storage are not on the same VLAN. I talked to
the firewall manager, and he found some packets being dropped by the Checkpoint
firewall. He relaxed the rules, and no drops were seen afterwards.
2. The McAfee system on the main client (a Windows system with 64 TB of disk
and 14 million files) was extremely busy checking these files. The system is
managed centrally, and it took some discussion to get the files on the
non-system disks excluded from McAfee. These files are used on the client
systems, so the checks still happen there. Once this was implemented, the file
server immediately got its life back. (McAfee is also known as Trellix,
Endpoint Security, McShield, ePO.)
3. We replaced the old Intel e1000e driver with a VMXNET3 driver on the server
holding the containers.
4. On the client, I packed more old directories into zip files, reducing the
file system from 14 million to fewer than 9 million files (a sketch of the
idea follows after this list).
5. Following advice from Google AI, I raised the CommTimeOut option from 60
(one minute) to 3600 (one hour); see the option snippet after this list.
6. We restarted the server.
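For step 4, any packing method will do; just as an illustration, here is a
minimal Python sketch of the idea. The root path and the one-year cutoff are
hypothetical examples, and in practice you would verify each archive before
deleting anything:

    # pack_old_dirs.py - minimal sketch: zip subdirectories that have not
    # changed for about a year, then remove the originals.
    import shutil
    import time
    from pathlib import Path

    ROOT = Path(r"F:\medgroups9")        # hypothetical data root
    CUTOFF = time.time() - 365 * 86400   # roughly one year ago, in seconds

    for d in ROOT.iterdir():
        if d.is_dir() and d.stat().st_mtime < CUTOFF:
            shutil.make_archive(str(d), "zip", root_dir=str(d))  # writes d.zip next to d
            shutil.rmtree(d)  # remove the tree only after verifying the zip!

Each zipped directory turns millions of small files into one large object,
which is exactly the reduction that helped the incremental scan here.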
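For step 5, COMMTIMEOUT is an ordinary server option, so the change goes into
dsmserv.opt (picked up at the restart in step 6); as far as I know it can also
be changed on a running server with SETOPT from an administrative session:

    * dsmserv.opt: wait up to an hour for a client message,
    * instead of the 60-second default
    COMMTIMEOUT 3600

or, without a restart:

    setopt commtimeout 3600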
Part of the client schedule log:
[...]
28-07-2025 23:38:39 Normal File--> 36.512.051.027
\\medfs2\f$\medgroups9\virology\havag\NOAM\Disk
2022\MITSY_job33065\הדאטא מסודר כפי שקיבלתי אותו\EEG DATA 2018.zip [Sent]
28-07-2025 23:46:18 Normal File--> 36.173.304.575
\\medfs2\f$\medgroups9\virology\havag\USV
SHAKED 2022\ALL USV Recordings\Adult\USV from Ohad and Efrat 2014\לחוה
relevant\final project recordings relevant\Night recording 23.06.2014
relevant.zip [Sent]
28-07-2025 23:46:20 Normal File--> 39.387.080.701
\\medfs2\f$\medgroups9\virology\havag\USV
SHAKED 2022\ALL USV Recordings\Adult\USV from Ohad and Efrat 2014\לחוה
relevant\final project recordings relevant\night to run 2.4 relevant.zip [Sent]
28-07-2025 23:48:10 Normal File--> 39.551.746.574
\\medfs2\f$\medgroups9\virology\havag\USV
SHAKED 2022\ALL USV Recordings\Adult\USV from Ohad and Efrat 2014\לחוה
relevant\final project recordings relevant\night long 02.04.2014 relevant.zip
[Sent]
28-07-2025 23:48:12 Successful incremental backup of '\\medfs2\f$'
28-07-2025 23:48:22 --- SCHEDULEREC STATUS BEGIN
28-07-2025 23:48:22 Total number of objects inspected: 17.731.714
28-07-2025 23:48:22 Total number of objects backed up: 238.097
28-07-2025 23:48:22 Total number of objects updated: 0
28-07-2025 23:48:22 Total number of objects rebound: 0
28-07-2025 23:48:22 Total number of objects deleted: 0
28-07-2025 23:48:22 Total number of objects expired: 1.156.781
28-07-2025 23:48:22 Total number of objects failed: 37
28-07-2025 23:48:22 Total number of objects encrypted: 0
28-07-2025 23:48:22 Total number of subfile objects: 0
28-07-2025 23:48:22 Total number of objects grew: 0
28-07-2025 23:48:22 Total number of retries: 136
28-07-2025 23:48:22 Total number of bytes inspected: 55,34 TB
28-07-2025 23:48:22 Total number of bytes transferred: 664,81 GB
28-07-2025 23:48:22 Data transfer time: 37.962,50 sec
28-07-2025 23:48:22 Network data transfer rate: 18.362,92 KB/sec
28-07-2025 23:48:22 Aggregate data transfer rate: 45.657,21 KB/sec
28-07-2025 23:48:22 Objects compressed by: 6%
28-07-2025 23:48:22 Total data reduction ratio: 98.83%
28-07-2025 23:48:22 Subfile objects reduced by: 0%
28-07-2025 23:48:22 Elapsed processing time: 04:14:28
28-07-2025 23:48:22 --- SCHEDULEREC STATUS END
28-07-2025 23:48:22 --- SCHEDULEREC OBJECT END DAILY 28-07-2025 16:50:00
28-07-2025 23:48:22 Scheduled event 'DAILY' completed successfully.
Notice that files of nearly 40 GB were backed up without problems.
Note also that the elapsed time is about 4 hours, while the reported data
transfer time is over 10 hours; presumably the transfer time is summed across
the parallel client sessions.
If I were an IBM engineer, I might have split the changes to find out which
one actually solved the problem.
In recent days, the backup failed on some days and succeeded on others.
In summary, the most important points are to remove the barriers to speed
(virus scans, millions of files to process) and, of course, to prevent any
hiccups or timeouts.
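In case it helps anyone verifying a similar fix: the state of the container
pool directories and any damaged extents can be checked from an administrative
session with something like:

    query stgpooldir CPOOL
    query damaged CPOOL

All directories should show read/write access, and the damaged query should
come back empty.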
Thanks for your invaluable advice!
David
-----Original Message-----
From: ADSM: Dist Stor Manager <[email protected]> On Behalf Of Loon, Eric
van (ITOP DI) - KLM
Sent: Wednesday, July 9, 2025 10:02 AM
To: [email protected]
Subject: Re: [ADSM-L] incr backups fail repeatedly because of link to container
storage
Hi David
The error 59 (An unexpected network error occurred, ERROR_UNEXP_NET_ERR) indeed
indicates some kind of network-related issue.
Good luck with fixing the issue.
Kind regards,
Eric van Loon
Air France/KLM Core Infra
-----Original Message-----
From: ADSM: Dist Stor Manager
<[email protected]> On Behalf Of David L.A. De
Leeuw
Sent: Tuesday, July 8, 2025 10:02 AM
To: [email protected]
Subject: Re: incr backups fail repeatedly because of link to container storage
Hi Eric,
I was happy too soon...
The first (large) backups ran great, but next day the problem returned.
It definitely seems to be a disturbance on the network. Our other server (on
the same physical system, but with different storage) has errors like these as
well. It looks like we need some network sniffing.
Thanks
David
N.B. Has anyone else had problems like this? Could you solve them? Is there
any "network timeout parameter" we can set?
Server log:
07-07-2025 18:56:39 ANR2012W Error encountered for storage pool directory:
\\medbackup.med.ad.bgu.ac.il\tsmc26
in storage pool:
CPOOL.
07-07-2025 18:56:39 ANR2012W Error encountered for storage pool directory:
\\medbackup.med.ad.bgu.ac.il\tsmc1 in
storage pool:
CPOOL. (SESSION: 8067)
07-07-2025 18:56:39 ANR0530W Transaction failed for session 8067 for node
MEDFS2 (WinNT) - internal server error detected.
(SESSION: 8067)
07-07-2025 18:56:41 ANR0403I Session 8063 ended for node MEDFS2 (WinNT).
(SESSION: 8063)
07-07-2025 18:56:44 ANR0403I Session 7975 ended for node MEDFS2 (WinNT).
(SESSION: 7975)
Dsmffdc log:
[07-07-2025 18:56:24.948][ FFDC_GENERAL_SERVER_ERROR ]:
[127](psfile.c:2909)(SESSION: 8063) Error (platform specific) 59 writing to
file
\\medbackup.med.ad.bgu.ac.il\tsmc15\55\000000000000555c.dcf,
amtWritten 0, size 774144 loc 2413883392.
[07-07-2025 18:56:25.519][ FFDC_GENERAL_SERVER_ERROR ]:
[121](sdcreate.c:2172)(SESSION: 8063) Exiting with bad rc 2528 for bfId
1608710764.
[07-07-2025 18:56:32.057][ FFDC_GENERAL_SERVER_ERROR ]: [423](psfile.c:2909)
Error (platform specific) 59 writing to file
\\medbackup.med.ad.bgu.ac.il\tsmc1\6a\0000000000006a44.dcf,
amtWritten 0, size 1048576 loc 577073152.
[07-07-2025 18:56:40.247][ FFDC_GENERAL_SERVER_ERROR ]:
[172](psfile.c:2909)(SESSION: 8065) Error (platform specific) 59 writing to
file
\\medbackup.med.ad.bgu.ac.il\tsmc26\6a\0000000000006a4a.dcf,
amtWritten 0, size 692352 loc 6642896896.
[07-07-2025 18:56:40.299][ FFDC_GENERAL_SERVER_ERROR ]:
[172](sdprodcon.c:5009)(SESSION: 8065) Flushing write control: rc=2528.
-----Original Message-----
From: David de Leeuw
Sent: Saturday, July 5, 2025 12:58 PM
To: ADSM: Dist Stor Manager <[email protected]>
Subject: RE: incr backups fail repeatedly because of link to container storage
Hi Eric,
We upgraded the server from 8.1.25 to 8.1.27.
That seems to solve the problem.
I ran a massive backup; it took 10 hours. No errors.
Thanks
David
-----Original Message-----
From: ADSM: Dist Stor Manager
<[email protected]> On Behalf Of Loon, Eric van
(ITOP DI) - KLM
Sent: Friday, July 4, 2025 10:27 AM
To: [email protected]
Subject: Re: [ADSM-L] incr backups fail repeatedly because of link to container
storage
Hi David,
Have a look at the dsmffdc.log in the instance directory on your server. What
error does it show at 03-07-2025 08:06:40?
Kind regards,
Eric van Loon
Air France/KLM Core Infra
-----Original Message-----
From: ADSM: Dist Stor Manager
<[email protected]> On Behalf Of David L.A. De
Leeuw
Sent: Thursday, July 3, 2025 9:55 AM
To: [email protected]
Subject: incr backups fail repeatedly because of link to container storage
Hi TSMers,
We have been running Storage Protect (version 8.1) for quite some time, with
"container" storage on a Windows server. The pool is on a remote system with a
10 Gb network connection.
This worked fine for a number of years. We recently upgraded to server 8.1.25.
Now, every few hours, the backups fail with messages such as:
03-07-2025 08:06:39 ANR0950I Session 65175 for node MEDFS2 is using inline
server data deduplication or inline compression.
(SESSION: 65175)
03-07-2025 08:06:40 ANR2012W Error encountered for storage pool directory:
XXX in storage pool:
CPOOL. (SESSION: 65174)
03-07-2025 08:06:40 ANR1181E sdtxn.c(1409): Data storage transaction
0:35162311 was aborted. (SESSION: 65174)
Other backups continue at the same time.
There are no damaged or unavailable containers.
When the INCR is restarted, all works fine.
It seems to be a timeout or something similar.
Any ideas where to look?
David de Leeuw
Ben Gurion University of the Negev