On Friday, May 1, 2015 at 5:58:50 PM UTC+5:30, subhabrat...@gmail.com wrote:
> Dear Group,
>
> I have several million documents in several folders and subfolders on my
> machine. I tried to write the following script to find all the .doc files
> and convert them to text, but it seems to be taking too much time.
>
> import os
> from fnmatch import fnmatch
> import win32com.client
>
> def listallfiles2(n):
>     root = r'C:\Cand_Res'
>     pattern = "*.doc"
>     list1 = []
>     for path, subdirs, files in os.walk(root):
>         for name in files:
>             if fnmatch(name, pattern):
>                 file_name1 = os.path.join(path, name)
>                 # extract only .doc files, skipping .docx
>                 if file_name1.endswith(".doc"):
>                     try:
>                         doc = win32com.client.GetObject(file_name1)
>                         text = doc.Range().Text
>                         text1 = text.encode('ascii', 'ignore')
>                         text_word = text1.split()
>                         list1.append(text_word)
>                         print "It is a Doc file"
>                     except Exception:
>                         print "DOC ISSUE"
>     return list1
>
> But it seems to be taking too much time to convert each file to text and
> append it to the list. Is there any way I can do this faster? I am using
> Python 2.7 on Windows 7 Professional Edition. Apologies for any
> indentation errors.
>
> If anyone can kindly suggest a solution.
>
> Regards,
> Subhabrata Banerjee.
Thanks. You are right, it is the conversions that are taking the time; I will certainly look into that. The rest of it works fine.

Regards,
Subhabrata Banerjee.
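For reference, here is a rough, untested sketch of the change I plan to try: start a single Word instance through win32com.client.Dispatch and open each document in it, rather than creating a new COM object per file with GetObject. The folder path, the *.doc filter and the ASCII-only handling are carried over from the script above; the rest is an assumption about where the time is going (Word and pywin32 must of course be installed).

import os
from fnmatch import fnmatch
import win32com.client

def extract_doc_texts(root=r'C:\Cand_Res', pattern="*.doc"):
    texts = []
    # Start Word once and reuse it for every document.
    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        for path, subdirs, files in os.walk(root):
            for name in files:
                if fnmatch(name, pattern) and not name.endswith(".docx"):
                    file_name = os.path.join(path, name)
                    try:
                        doc = word.Documents.Open(file_name)
                        text = doc.Range().Text.encode('ascii', 'ignore')
                        texts.append(text.split())
                        doc.Close(False)   # close without saving
                    except Exception:
                        print "DOC ISSUE:", file_name
    finally:
        word.Quit()   # shut Word down once, at the end
    return texts

If that is still too slow, splitting the folder list across a few worker processes would be the next thing to try, since the conversion itself appears to be the bottleneck rather than the directory walk.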