Hi Zoltan,

I'd like to come back to the approach Jonas mentioned. I'm the author of that text, so thanks to Jonas for pointing to it ;-)
The text is in German, of course, but the script has some comments in English and should be understandable, I hope :-)

The text first describes the problem everybody on this list knows: the tree walk takes more time than we have. TSM/ISP offers some options to speed it up, such as "-incrbydate", but they do not work properly. So for me the only solution is to parallelize the tree walk and do partial incremental backups.

I first tried to write it with Bash commands, but multithreading was not easy to implement, and it would not run on Windows. Our largest filers (500 TB to 1.2 PB), however, need to be accessed via CIFS to store the ACL information. My first steps with PowerShell on Windows cost lots of time and were disappointing. Using Perl made everything really easy: it runs on Windows with Strawberry Perl, and the script needs only a few if-conditions to distinguish between Linux and Windows.

I did some tests on how many levels of the file tree to dive into. As the subfolders are of unequal size, diving just below the mount point and parallelizing over the folders of this "first level" mostly does not work well; there is (nearly) always one folder that takes all the time. On the other hand, diving into all levels takes a certain amount of additional time.

I see the best performance with 3 to 4 levels and 4 to 6 parallel threads per node. To separate users and for accounting, I have several nodes on such large file systems, so in total there are about 20 to 40 streams in parallel. Rudi Wüst, who is mentioned in my text, figured out that a p520 server running AIX 6 will support up to 2,000 parallel streams, but as Grant mentioned, with an Isilon system the filer will be the bottleneck.
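To illustrate the idea for those who do not read German, here is a minimal Python sketch of the approach described above. It is not the dsmci Perl script, only a rough outline under some assumptions: the helper names (split_tree, build_jobs, run_dsmc, run_parallel) are made up for this example, and I assume each folder can be handled by a plain "dsmc incremental <folder> -subdir=yes|no" call. Node and option-file handling, Windows UNC paths, logging and error handling are all left out.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def split_tree(root, depth):
    """Walk `depth` levels below `root`.

    Returns (shallow, deep): `shallow` holds the folders above the cut
    level, `deep` holds the folders exactly at the cut level.
    """
    if depth <= 0:
        return [], [root]
    shallow, deep, frontier = [root], [], [root]
    for level in range(1, depth + 1):
        children = []
        for folder in frontier:
            with os.scandir(folder) as entries:
                children.extend(
                    e.path for e in entries if e.is_dir(follow_symlinks=False)
                )
        (shallow if level < depth else deep).extend(children)
        frontier = children
    return shallow, deep


def build_jobs(root, depth):
    """One job per folder: (path, subdir option).

    Folders above the cut level are backed up without recursion, so only
    the files directly inside them are covered; folders at the cut level
    are backed up recursively.
    """
    shallow, deep = split_tree(root, depth)
    return [(p, "no") for p in shallow] + [(p, "yes") for p in deep]


def run_dsmc(path, subdir):
    """Run one partial incremental backup (assumed command line)."""
    # os.path.join(path, "") appends a trailing separator to the folder.
    cmd = ["dsmc", "incremental", os.path.join(path, ""), f"-subdir={subdir}"]
    return subprocess.run(cmd).returncode


def run_parallel(jobs, threads=4, runner=run_dsmc):
    """Process all jobs with a fixed number of parallel threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda job: runner(*job), jobs))
```

A call could look like run_parallel(build_jobs("/mnt/filer", 3), threads=6), where the mount point is just a placeholder. Because an idle thread simply picks up the next folder, cutting a few levels deep produces enough small jobs that one huge folder no longer dominates the run time.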
As Del mentioned, you may also test the commercial software "MAGS" by General Storage; it can address multiple Isilon nodes in parallel.

If there are any questions, just ask or have a look at the script:
https://gitlab.gwdg.de/bnachtw/dsmci
Even though the last commit is about 4 months old, the project is still in development ;-)

==> Maybe I should update the text from the "GWDG news" and translate it to English? Any interest?

Best
Bjørn

p.s.
A result from the wild (weekly backup of a node from a 343 TB Quantum StorNext File System):

>>
Process ID             : 12988
Path processed         : <removed>
-------------------------------------------------
Start time             : 2018-07-14 12:00
End time               : 2018-07-15 06:07
total processing time  : 3d 15h 59m 23s
total wallclock time   : 18h 7m 30s
effective speedup      : 4.855 using 6 parallel threads
datatransfertime ratio : 3.575 %
-------------------------------------------------
Objects inspected      : 92061596
Objects backed up      : 9774876
Objects updated        : 0
Objects deleted        : 0
Objects expired        : 7696
Objects failed         : 0
Bytes inspected        : 52818.242 (GB)
Bytes transferred      : 5063.620 (GB)
-------------------------------------------------
Number of Errors       : 0
Number of Warnings     : 43
# of severe Errors     : 0
# Out-of-Space Errors  : 0
<<

--------------------------------------------------------------------------------------------------
Bjørn Nachtwey

Arbeitsgruppe "IT-Infrastruktur"
Tel.: +49 551 201-2181, E-Mail: bjoern.nacht...@gwdg.de
--------------------------------------------------------------------------------------------------
Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG)
Am Faßberg 11, 37077 Göttingen, URL: http://www.gwdg.de
Tel.: +49 551 201-1510, Fax: +49 551 201-2150, E-Mail: g...@gwdg.de
Service hotline: Tel.: +49 551 201-1523, E-Mail: supp...@gwdg.de
Managing Director: Prof. Dr. Ramin Yahyapour
Chairman of the Supervisory Board: Prof. Dr. Norbert Lossau
Registered office: Göttingen
Court of registration: Göttingen, commercial register no.
B 598
--------------------------------------------------------------------------------------------------
Certified according to ISO 9001
--------------------------------------------------------------------------------------------------

-----Original Message-----
From: ADSM: Dist Stor Manager <ADSM-L@VM.MARIST.EDU> On Behalf Of Zoltan Forray
Sent: Wednesday, 11 July 2018 13:50
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] Looking for suggestions to deal with large backups not completing in 24-hours

I will need to translate it to English, but I gather it is talking about the RESOURCEUTILIZATION / MAXNUMMP values. While we have increased MAXNUMMP to 5 on the server (will try going higher), I am not sure how much good it would do, since the backup schedule uses OBJECTS to point to a specific/single mountpoint/filesystem (see below), but it is worth trying to bump the RESOURCEUTILIZATION value on the client even higher...

We have checked the dsminstr.log file and it is spending 92% of the time in PROCESS DIRS (no surprise).

7:46:25 AM SUN : q schedule * ISILON-SOM-SOMADFS1 f=d

            Policy Domain Name: DFS
                 Schedule Name: ISILON-SOM-SOMADFS1
                   Description: ISILON-SOM-SOMADFS1
                        Action: Incremental
                     Subaction:
                       Options: -subdir=yes
                       Objects: \\rams.adp.vcu.edu\SOM\TSM\SOMADFS1\*
                      Priority: 5
               Start Date/Time: 12/05/2017 08:30:00
                      Duration: 1 Hour(s)
    Maximum Run Time (Minutes): 0
                Schedule Style: Enhanced
                        Period:
                   Day of Week: Any
                         Month: Any
                  Day of Month: Any
                 Week of Month: Any
                    Expiration:
Last Update by (administrator): ZFORRAY
         Last Update Date/Time: 01/12/2018 10:30:48
              Managing profile:

On Tue, Jul 10, 2018 at 4:06 AM Jansen, Jonas <jan...@itc.rwth-aachen.de> wrote:

> It is possible to do a parallel backup of file system parts.
> https://www.gwdg.de/documents/20182/27257/GN_11-2016_www.pdf (German)
> Have a look at page 10.
>
> ---
> Jonas Jansen
>
> IT Center
> Gruppe: Server & Storage
> Abteilung: Systeme & Betrieb
> RWTH Aachen University
> Seffenter Weg 23
> 52074 Aachen
> Tel: +49 241 80-28784
> Fax: +49 241 80-22134
> jan...@itc.rwth-aachen.de
> www.itc.rwth-aachen.de
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager <ADSM-L@VM.MARIST.EDU> On Behalf Of Del Hoobler
> Sent: Monday, July 9, 2018 3:29 PM
> To: ADSM-L@VM.MARIST.EDU
> Subject: Re: [ADSM-L] Looking for suggestions to deal with large backups not completing in 24-hours
>
> They are a 3rd-party partner that offers an integrated Spectrum Protect solution for large filer backups.
>
> Del
>
> ----------------------------------------------------
>
> "ADSM: Dist Stor Manager" <ADSM-L@VM.MARIST.EDU> wrote on 07/09/2018 09:17:06 AM:
>
> > From: Zoltan Forray <zfor...@vcu.edu>
> > To: ADSM-L@VM.MARIST.EDU
> > Date: 07/09/2018 09:17 AM
> > Subject: Re: Looking for suggestions to deal with large backups not completing in 24-hours
> > Sent by: "ADSM: Dist Stor Manager" <ADSM-L@VM.MARIST.EDU>
> >
> > Thanks Del. Very interesting. Are they a VAR for IBM?
> >
> > Not sure if it would work in the current configuration we are using to back up ISILON. I have passed the info on.
> >
> > BTW, FWIW, when I copied/pasted the info, Chrome spell-checker red-flagged on "The easy way to incrementally backup billons of objects" (billions). So if you know anybody at the company, please pass it on to them.
> >
> > On Mon, Jul 9, 2018 at 6:51 AM Del Hoobler <hoob...@us.ibm.com> wrote:
> >
> > > Another possible idea is to look at General Storage dsmISI MAGS:
> > >
> > > http://www.general-storage.com/PRODUCTS/products.html
> > >
> > > Del
> > >
> > > "ADSM: Dist Stor Manager" <ADSM-L@VM.MARIST.EDU> wrote on 07/05/2018 02:52:27 PM:
> > >
> > > > From: Zoltan Forray <zfor...@vcu.edu>
> > > > To: ADSM-L@VM.MARIST.EDU
> > > > Date: 07/05/2018 02:53 PM
> > > > Subject: Looking for suggestions to deal with large backups not completing in 24-hours
> > > > Sent by: "ADSM: Dist Stor Manager" <ADSM-L@VM.MARIST.EDU>
> > > >
> > > > As I have mentioned in the past, we have gone through large migrations to DFS based storage on EMC ISILON hardware. As you may recall, we back up these DFS mounts (about 90 at last count) using multiple Windows servers that run multiple ISP nodes (about 30 each), and they access each DFS mount/filesystem via -object=\\rams.adp.vcu.edu\departmentname.
> > > >
> > > > This has led to lots of performance issues with backups, and some departments are now complaining that their backups are running into multiple days in some cases.
> > > >
> > > > One such case is a department with 2 nodes with over 30 million objects for each node. In the past, their backups were able to finish quicker since they were accessed via dedicated servers and were able to use Journaling to reduce the scan times. Unless things have changed, I believe Journaling is not an option due to how the files are accessed.
> > > >
> > > > FWIW, average backups are usually <50k files and <200GB once scanning has finished.....
> > > >
> > > > Also, the idea of HSM/SPACEMANAGEMENT has reared its ugly head since many of these objects haven't been accessed in many years. But as I understand it, that won't work either given our current configuration.
> > > >
> > > > Given the current DFS configuration (previously CIFS), what can we do to improve backup performance?
> > > >
> > > > So, any-and-all ideas are up for discussion. There is even discussion on replacing ISP/TSM due to these issues/limitations.
> > > >
> > > > --
> > > > *Zoltan Forray*
> > > > Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
> > > > Xymon Monitor Administrator
> > > > VMware Administrator
> > > > Virginia Commonwealth University
> > > > UCC/Office of Technology Services
> > > > www.ucc.vcu.edu
> > > > zfor...@vcu.edu - 804-828-4807
> > > > Don't be a phishing victim - VCU and other reputable organizations will never use email to request that you reply with your password, social security number or confidential personal information. For more details visit http://phishing.vcu.edu/
> >
> > --
> > *Zoltan Forray*

--
*Zoltan Forray*