To Eric van Loon:
Hi Eric,
I think I owe you an update on our backup troubles; perhaps these interventions can help others as well.
For some time, backups would fail with messages such as:
07-07-2025 18:56:39 ANR2012W Error encountered for storage pool directory:
\\medbackup.med.ad.bgu.ac.il\tsmc26
in storage pool: CPOOL.
You directed me to check the dsmffdc log:
[07-07-2025 18:56:24.948][ FFDC_GENERAL_SERVER_ERROR ]:
[127](psfile.c:2909)(SESSION: 8063) Error (platform specific) 59 writing to
file
\\medbackup.med.ad.bgu.ac.il\tsmc15\55\000000000000555c.dcf,
amtWritten 0, size 774144 loc 2413883392.
You told me these were indeed likely network problems.
These issues occurred roughly daily.
Then our server went down for maintenance, which was an opportunity to try to
fix things.
I am not the administrator of the server, nor of the network, firewall,
security, etc., so I had to coordinate the changes.
1. The server and the container storage are not on the same VLAN. I talked to
the firewall manager, and he found some packets being dropped by the Checkpoint
firewall. He relaxed the rules, and no drops were seen afterwards.
2. The McAfee system on the main client (a Windows system with 64 TB of disk
and 14 million files) was extremely busy checking these files. The system is
managed centrally, and it took some discussion to get the files on the
non-system disks excluded from McAfee. These files are used on the client
systems, so the checks still happen there. Once this was implemented, the file
server immediately got its life back. (McAfee is also known as Trellix,
Endpoint Security, McShield, ePO.)
3. We replaced the old Intel e1000e driver with a VMXNET3 driver on the server
holding the containers.
4. On the client, I packed more old directories into zip files, reducing the
file system from 14 million to fewer than 9 million files (a sketch of the
idea follows after this list).
5. Following advice from Google AI, I raised the CommTimeOut option from 60
(one minute) to 3600 (one hour); see the option snippet after this list.
6. We restarted the server.
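For step 4, any packing method will do; just as an illustration, here is a
minimal Python sketch of the idea. The root path and the one-year cutoff are
hypothetical examples, and in practice you would verify each archive before
deleting anything:

    # pack_old_dirs.py - minimal sketch: zip subdirectories that have not
    # changed for about a year, then remove the originals.
    import shutil
    import time
    from pathlib import Path

    ROOT = Path(r"F:\medgroups9")        # hypothetical data root
    CUTOFF = time.time() - 365 * 86400   # roughly one year ago, in seconds

    for d in ROOT.iterdir():
        if d.is_dir() and d.stat().st_mtime < CUTOFF:
            shutil.make_archive(str(d), "zip", root_dir=str(d))  # writes d.zip next to d
            shutil.rmtree(d)  # remove the tree only after verifying the zip!

Each zipped directory turns millions of small files into one large object,
which is exactly the reduction that helped the incremental scan here.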
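For step 5, COMMTIMEOUT is an ordinary server option, so the change goes into
dsmserv.opt (picked up at the restart in step 6); as far as I know it can also
be changed on a running server with SETOPT from an administrative session:

    * dsmserv.opt: wait up to an hour for a client message,
    * instead of the 60-second default
    COMMTIMEOUT 3600

or, without a restart:

    setopt commtimeout 3600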
Part of the client schedule log:
[...]
28-07-2025 23:38:39 Normal File--> 36.512.051.027
\\medfs2\f$\medgroups9\virology\havag\NOAM\Disk
2022\MITSY_job33065\הדאטא מסודר כפי שקיבלתי אותו\EEG DATA 2018.zip [Sent]
28-07-2025 23:46:18 Normal File--> 36.173.304.575
\\medfs2\f$\medgroups9\virology\havag\USV
SHAKED 2022\ALL USV Recordings\Adult\USV from Ohad and Efrat 2014\לחוה
relevant\final project recordings relevant\Night recording 23.06.2014
relevant.zip [Sent]
28-07-2025 23:46:20 Normal File--> 39.387.080.701
\\medfs2\f$\medgroups9\virology\havag\USV
SHAKED 2022\ALL USV Recordings\Adult\USV from Ohad and Efrat 2014\לחוה
relevant\final project recordings relevant\night to run 2.4 relevant.zip [Sent]
28-07-2025 23:48:10 Normal File--> 39.551.746.574
\\medfs2\f$\medgroups9\virology\havag\USV
SHAKED 2022\ALL USV Recordings\Adult\USV from Ohad and Efrat 2014\לחוה
relevant\final project recordings relevant\night long 02.04.2014 relevant.zip
[Sent]
28-07-2025 23:48:12 Successful incremental backup of '\\medfs2\f$'
28-07-2025 23:48:22 --- SCHEDULEREC STATUS BEGIN
28-07-2025 23:48:22 Total number of objects inspected: 17.731.714
28-07-2025 23:48:22 Total number of objects backed up: 238.097
28-07-2025 23:48:22 Total number of objects updated: 0
28-07-2025 23:48:22 Total number of objects rebound: 0
28-07-2025 23:48:22 Total number of objects deleted: 0
28-07-2025 23:48:22 Total number of objects expired: 1.156.781
28-07-2025 23:48:22 Total number of objects failed: 37
28-07-2025 23:48:22 Total number of objects encrypted: 0
28-07-2025 23:48:22 Total number of subfile objects: 0
28-07-2025 23:48:22 Total number of objects grew: 0
28-07-2025 23:48:22 Total number of retries: 136
28-07-2025 23:48:22 Total number of bytes inspected: 55,34 TB
28-07-2025 23:48:22 Total number of bytes transferred: 664,81 GB
28-07-2025 23:48:22 Data transfer time: 37.962,50 sec
28-07-2025 23:48:22 Network data transfer rate: 18.362,92 KB/sec
28-07-2025 23:48:22 Aggregate data transfer rate: 45.657,21 KB/sec
28-07-2025 23:48:22 Objects compressed by: 6%
28-07-2025 23:48:22 Total data reduction ratio: 98.83%
28-07-2025 23:48:22 Subfile objects reduced by: 0%
28-07-2025 23:48:22 Elapsed processing time: 04:14:28
28-07-2025 23:48:22 --- SCHEDULEREC STATUS END
28-07-2025 23:48:22 --- SCHEDULEREC OBJECT END DAILY 28-07-2025 16:50:00
28-07-2025 23:48:22 Scheduled event 'DAILY' completed successfully.
Notice that files of nearly 40 GB were backed up without problems.
Note also that the elapsed time is about 4 hours, while the reported data
transfer time is over 10 hours; presumably the transfer time is summed across
the parallel client sessions.
If I were an IBM engineer, I might have split the changes to find out which
one actually solved the problem.
In recent days, the backup failed on some days and succeeded on others.
In summary, the most important points are to remove the barriers to speed
(virus scans, millions of files to process) and, of course, to prevent any
hiccups or timeouts.
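In case it helps anyone verifying a similar fix: the state of the container
pool directories and any damaged extents can be checked from an administrative
session with something like:

    query stgpooldir CPOOL
    query damaged CPOOL

All directories should show read/write access, and the damaged query should
come back empty.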
Thanks for your invaluable advice!
David
-----Original Message-----
From: ADSM: Dist Stor Manager <[email protected]> On Behalf Of Loon, Eric
van (ITOP DI) - KLM
Sent: Wednesday, July 9, 2025 10:02 AM
To: [email protected]
Subject: Re: [ADSM-L] incr backups fail repeatedly because of link to container
storage
Hi David
The error 59 (An unexpected network error occurred, ERROR_UNEXP_NET_ERR) indeed
indicates some kind of network-related issue.
Good luck with fixing the issue.
Kind regards,
Eric van Loon
Air France/KLM Core Infra
-----Original Message-----
From: ADSM: Dist Stor Manager
<[email protected]> On Behalf Of David L.A. De
Leeuw
Sent: Tuesday, July 8, 2025 10:02 AM
To: [email protected]
Subject: Re: incr backups fail repeatedly because of link to container storage
Hi Eric,
I was happy too soon...
The first (large) backups ran great, but next day the problem returned.
It definitely seems to be a disturbance on the network. Our other server (on
the same physical system, but with different storage) has errors like these as
well. It looks like we need some network sniffing.
Thanks
David
N.B. Has anyone else had problems like this? Could you solve them? Is there
any "network timeout parameter" we can set?
Server log:
07-07-2025 18:56:39 ANR2012W Error encountered for storage pool directory:
\\medbackup.med.ad.bgu.ac.il\tsmc26
in storage pool:
CPOOL.
07-07-2025 18:56:39 ANR2012W Error encountered for storage pool directory:
\\medbackup.med.ad.bgu.ac.il\tsmc1 in
storage pool:
CPOOL. (SESSION: 8067)
07-07-2025 18:56:39 ANR0530W Transaction failed for session 8067 for node
MEDFS2 (WinNT) - internal server error detected.
(SESSION: 8067)
07-07-2025 18:56:41 ANR0403I Session 8063 ended for node MEDFS2 (WinNT).
(SESSION: 8063)
07-07-2025 18:56:44 ANR0403I Session 7975 ended for node MEDFS2 (WinNT).
(SESSION: 7975)
Dsmffdc log:
[07-07-2025 18:56:24.948][ FFDC_GENERAL_SERVER_ERROR ]:
[127](psfile.c:2909)(SESSION: 8063) Error (platform specific) 59 writing to
file
\\medbackup.med.ad.bgu.ac.il\tsmc15\55\000000000000555c.dcf,
amtWritten 0, size 774144 loc 2413883392.
[07-07-2025 18:56:25.519][ FFDC_GENERAL_SERVER_ERROR ]:
[121](sdcreate.c:2172)(SESSION: 8063) Exiting with bad rc 2528 for bfId
1608710764.
[07-07-2025 18:56:32.057][ FFDC_GENERAL_SERVER_ERROR ]: [423](psfile.c:2909)
Error (platform specific) 59 writing to file
\\medbackup.med.ad.bgu.ac.il\tsmc1\6a\0000000000006a44.dcf,
amtWritten 0, size 1048576 loc 577073152.
[07-07-2025 18:56:40.247][ FFDC_GENERAL_SERVER_ERROR ]:
[172](psfile.c:2909)(SESSION: 8065) Error (platform specific) 59 writing to
file
\\medbackup.med.ad.bgu.ac.il\tsmc26\6a\0000000000006a4a.dcf,
amtWritten 0, size 692352 loc 6642896896.
[07-07-2025 18:56:40.299][ FFDC_GENERAL_SERVER_ERROR ]:
[172](sdprodcon.c:5009)(SESSION: 8065) Flushing write control: rc=2528.
-----Original Message-----
From: David de Leeuw
Sent: Saturday, July 5, 2025 12:58 PM
To: ADSM: Dist Stor Manager <[email protected]>
Subject: RE: incr backups fail repeatedly because of link to container storage
Hi Eric,
We upgraded the server from 8.1.25 to 8.1.27.
That seems to solve the problem.
I ran a massive backup; it took 10 hours. No errors.
Thanks
David
-----Original Message-----
From: ADSM: Dist Stor Manager
<[email protected]> On Behalf Of Loon, Eric van
(ITOP DI) - KLM
Sent: Friday, July 4, 2025 10:27 AM
To: [email protected]
Subject: Re: [ADSM-L] incr backups fail repeatedly because of link to container
storage
Hi David,
Have a look at the dsmffdc.log in the instance directory on your server. What
error does it show at 03-07-2025 08:06:40?
Kind regards,
Eric van Loon
Air France/KLM Core Infra
-----Original Message-----
From: ADSM: Dist Stor Manager
<[email protected]> On Behalf Of David L.A. De
Leeuw
Sent: Thursday, July 3, 2025 9:55 AM
To: [email protected]
Subject: incr backups fail repeatedly because of link to container storage
Hi TSMers,
We have been running Storage Protect (version 8.1) for quite some time, with
"container" storage on a Windows server. The pool is on a remote system with a
10 Gb network connection.
This worked fine for a number of years. We recently upgraded to server 8.1.25.
Now, every few hours, the backups fail with messages such as:
03-07-2025 08:06:39 ANR0950I Session 65175 for node MEDFS2 is using inline
server data deduplication or inline compression.
(SESSION: 65175)
03-07-2025 08:06:40 ANR2012W Error encountered for storage pool directory:
XXX in storage pool:
CPOOL. (SESSION: 65174)
03-07-2025 08:06:40 ANR1181E sdtxn.c(1409): Data storage transaction
0:35162311 was aborted. (SESSION: 65174)
Other backups continue at the same time.
There are no damaged or unavailable containers.
When the INCR is restarted, all works fine.
It seems to be a timeout or something similar.
Any ideas where to look?
David de Leeuw
Ben Gurion University of the Negev