I am new to Python. I am trying to extract text from the bookmarks in a PDF 
file to provide the data for a Word template merge. I have gotten down 
to a string of text pulled out of the list object returned by the PyPDF2 
module.  I am stuck on how to get the data I need out of that string.  I am 
calling it a string, but Python recognizes it as a dictionary object.  

Here is the string: 

{'/Title': '1F:  Progress Notes  Src.:  MILANI, JOHN C Tmt. Dt.:  05/12/2014 - 
05/28/2014 (9 pages)', '/Page': IndirectObject(465, 0), '/Type': '/FitB'}

What I want is for the following to end up as fields in my Word template merge:
MedSourceFirstName: "John"
MedSourceLastName: "Milani"
MedSourceLastTreatment: "05/28/2014"

If I use keys() on the dictionary I get this:
['/Title', '/Page', '/Type']

I was hoping "Src." and "Tmt. Dt." would be treated as 
keys.  It seems like the key/value pairs of a dictionary would translate nicely to 
field names and field data for a Word document merge.  Here is my code so far. 

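[python]import PyPDF2

# Open the PDF in binary mode and read its outline (bookmarks).
pdfFileObj = open('x.pdf', 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
MyList = pdfReader.getOutlines()

# The last top-level entry is a nested list; take its first bookmark.
MyDict = MyList[-1][0]
print(isinstance(MyDict, dict))
print(MyDict)
print(list(MyDict.keys()))[/python] 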

I get this output in Sublime Text:
True
{'/Title': '1F:  Progress Notes  Src.:  MILANI, JOHN C Tmt. Dt.:  05/12/2014 - 
05/28/2014 (9 pages)', '/Page': IndirectObject(465, 0), '/Type': '/FitB'}
['/Title', '/Page', '/Type']
[Finished in 0.4s]
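
Since "Src." and "Tmt. Dt." live inside the '/Title' value rather than being keys of their own, one possible way to get at them is a regular expression over that string. The following is only a rough sketch based on the single bookmark shown above: it assumes every title follows the layout "Src.: LAST, FIRST [MIDDLE] Tmt. Dt.: MM/DD/YYYY - MM/DD/YYYY", and the names TITLE_PATTERN and parse_title are made up for illustration. In the real script the input would be MyDict['/Title'].

```python
import re

# Sample title copied from the bookmark shown above.
title = ('1F:  Progress Notes  Src.:  MILANI, JOHN C Tmt. Dt.:  '
         '05/12/2014 - 05/28/2014 (9 pages)')

# Assumed layout (inferred from one sample, not from any PDF standard):
#   "Src.: LAST, FIRST [MIDDLE] Tmt. Dt.: MM/DD/YYYY - MM/DD/YYYY"
TITLE_PATTERN = re.compile(
    r'Src\.:\s*(?P<last>[^,]+),\s*(?P<first>\S+)'   # e.g. "MILANI, JOHN"
    r'.*?Tmt\.\s+Dt\.:\s*'                          # skip any middle initial
    r'(?P<start>\d{2}/\d{2}/\d{4})\s*-\s*(?P<end>\d{2}/\d{2}/\d{4})'
)


def parse_title(title):
    """Return a dict of merge fields, or None if the title does not match."""
    # Collapse runs of spaces and line breaks to single spaces first.
    match = TITLE_PATTERN.search(' '.join(title.split()))
    if match is None:
        return None
    return {
        'MedSourceFirstName': match.group('first').title(),
        'MedSourceLastName': match.group('last').strip().title(),
        'MedSourceLastTreatment': match.group('end'),
    }


fields = parse_title(title)
print(fields)
```

The result is an ordinary dictionary whose keys are the merge field names, so it could be handed to whatever performs the Word merge. Titles that do not fit the assumed layout return None and would need separate handling.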

Thank you in advance for any suggestions.