Stanley Denman <dallasdisabilityattor...@gmail.com> writes:
> I am new to Python. I am trying to extract text from the bookmarks in a PDF
> file that would provide the data for a Word template merge. I have gotten
> down to a string of text pulled out of the list object that I got from using
> the PyPDF2 module. I am stuck on how to get the data I need out of the
> string. I am calling it a string, but Python recognizes it as a dictionary
> object.
>
> Here is the string:
>
> {'/Title': '1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: 05/12/2014 - 05/28/2014 (9 pages)', '/Page': IndirectObject(465, 0), '/Type': '/FitB'}
>
> What I want is the following to end up as fields on my Word template merge:
> MedSourceFirstName: "John"
> MedSourceLastName: "Milani"
> MedSourceLastTreatment: "05/28/2014"
>
> If I use keys() on the dictionary I get this:
> ['/Title', '/Page', '/Type']
> I was hoping "Src" and "Tmt Dt." would be treated as keys. Seems like the
> key/value pair of a dictionary would translate nicely to fieldname and
> fielddata for a Word document merge. Here is my code so far.

A Python "dict" is a mapping of keys to values. Its "keys" method gives you the keys (as you have used above). The subscription syntax ("<some_dict>[<some_key>]"; e.g. "pdf_info['/Title']") gives you the value associated with "<some_key>".

In your case, the relevant information is encoded inside the value of '/Title' itself: "Src." and "Tmt. Dt." are just text within that one string, not keys of the dictionary. You will need to extract this information yourself. Python's "re" (regular expression) module might be of help (see the "library reference" for details).
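
As a rough illustration of the "re" approach, here is a minimal sketch. It assumes every bookmark title follows the same "Src.: LAST, FIRST ... Tmt. Dt.: start - end" layout as the single example in the question; the names "pdf_info", "TITLE_RE" and "extract_merge_fields" are made up for this sketch and are not part of PyPDF2.

```python
import re

# Sample dictionary mirroring the bookmark from the question. The '/Page'
# entry is left out because it is not needed for the text extraction.
pdf_info = {
    '/Title': '1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: '
              '05/12/2014 - 05/28/2014 (9 pages)',
    '/Type': '/FitB',
}

# Assumed title layout: "Src.: LAST, FIRST ... Tmt. Dt.: [start - ]end".
TITLE_RE = re.compile(
    r"Src\.:\s*(?P<last>[^,]+),\s*(?P<first>\S+)"  # e.g. "MILANI, JOHN"
    r".*?Tmt\. Dt\.:\s*"                           # skip ahead to the dates
    r"(?:\d{2}/\d{2}/\d{4}\s*-\s*)?"               # optional start date
    r"(?P<last_treatment>\d{2}/\d{2}/\d{4})"       # final (or only) date
)


def extract_merge_fields(title):
    """Return the merge fields found in a title, or None if it does not match."""
    match = TITLE_RE.search(title)
    if match is None:
        return None
    return {
        "MedSourceFirstName": match.group("first").title(),
        "MedSourceLastName": match.group("last").strip().title(),
        "MedSourceLastTreatment": match.group("last_treatment"),
    }


fields = extract_merge_fields(pdf_info['/Title'])
print(fields)
```

The result is an ordinary dictionary whose keys are the merge field names, so it maps directly onto fieldname/fielddata pairs. Titles that do not fit the assumed layout return None and would need their own handling.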