On 09/02/18 18:35, Stanley Denman wrote:
On Friday, February 9, 2018 at 1:08:27 AM UTC-6, dieter wrote:
Stanley Denman <dallasdisabilityattor...@gmail.com> writes:
I am new to Python. I am trying to extract text from the bookmarks in a PDF
file that would provide the data for a Word template merge. I have gotten down
to a string of text pulled out of the list object that I got from using PyPDF2
module. I am stuck on how to get the data I need out of the string. I am
calling it a string, but Python recognizes it as a dictionary object.
Here is the string:
{'/Title': '1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: 05/12/2014 -
05/28/2014 (9 pages)', '/Page': IndirectObject(465, 0), '/Type': '/FitB'}
What I want is the following to end up as fields on my Word template merge:
MedSourceFirstName: "John"
MedSourceLastName: "Milani"
MedSourceLastTreatment: "05/28/2014"
If I use keys() on the dictionary I get this:
['/Title', '/Page', '/Type']
I was hoping "Src." and "Tmt. Dt." would be treated as keys. It seems like the
key/value pairs of a dictionary would translate nicely to field names and
field data for a Word document merge. Here is my code so far.
A Python "dict" is a mapping of keys to values. Its "keys" method
gives you the keys (as you have used above).
The subscription syntax ("<some_dict>[<some_key>]"; e.g.
"pdf_info['/Title']") allows you to access the value associated with
"<some_key>".
In your case, relevant information is coded inside the values themselves.
You will need to extract this information yourself. Python's "re" module
might be of help (see the "library reference", for details).
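As a rough sketch of that suggestion (not the only way to do it), the following
pulls the three merge fields out of the '/Title' value with one regular
expression. The dict below is a stand-in for the PyPDF2 bookmark with the
'/Page' entry left out, and the names pdf_info, TITLE_RE and parse_title are
made up for illustration. The pattern assumes every title looks like
"Src.: LAST, FIRST ... Tmt. Dt.: MM/DD/YYYY - MM/DD/YYYY"; titles with a
different layout would need a different pattern.

```python
import re

# Stand-in for the bookmark dictionary from the question ('/Page' omitted,
# since IndirectObject comes from PyPDF2 and is not needed here).
pdf_info = {
    '/Title': ('1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: '
               '05/12/2014 - 05/28/2014 (9 pages)'),
    '/Type': '/FitB',
}

# Assumed layout: "Src.: LAST, FIRST [MIDDLE] Tmt. Dt.: START - END".
TITLE_RE = re.compile(
    r'Src\.:\s*(?P<last>[^,]+),\s*(?P<first>\S+).*?'
    r'Tmt\. Dt\.:\s*(?P<start>\d{2}/\d{2}/\d{4})\s*-\s*'
    r'(?P<end>\d{2}/\d{2}/\d{4})'
)


def parse_title(title):
    """Return the merge fields as a dict, or None if the title does not match."""
    match = TITLE_RE.search(title)
    if match is None:
        return None
    return {
        'MedSourceFirstName': match.group('first').title(),
        'MedSourceLastName': match.group('last').strip().title(),
        'MedSourceLastTreatment': match.group('end'),
    }


# Subscription gives the value stored under the '/Title' key.
fields = parse_title(pdf_info['/Title'])
print(fields)
```

Returning None for a non-matching title lets the caller skip bookmarks that do
not describe a medical source instead of raising an exception.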
Thanks for your response. Nice to know I am at least on the right path.
Sounds like I am going to have to dig into regex to get at the text I want.
Maybe using string methods is simpler than a regex.
>>> data = '1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: 05/12/2014 - 05/28/2014 (9 pages)'
>>> bits = data.split(':')
>>> bits
['1F', ' Progress Notes Src.', ' MILANI, JOHN C Tmt. Dt.', ' 05/12/2014 - 05/28/2014 (9 pages)']
>>> namebits = bits[2].split()
>>> namebits
['MILANI,', 'JOHN', 'C', 'Tmt.', 'Dt.']
# I'll leave you to grab the names, and strip the comma from the last name.
>>> start = bits[3].find('- ')
>>> stop = bits[3].find('(')
>>> date = bits[3][start + 2: stop].strip()
>>> date
'05/28/2014'
Apologies for the variable names used, I'm sure that you can think of
something better :)
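For completeness, here is one way the string-method approach might be finished
off as a plain script, assuming the name always appears as "LAST, FIRST" right
after "Src.:". The variable names last_name, first_name and merge_fields are
only illustrative.

```python
data = ('1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: '
        '05/12/2014 - 05/28/2014 (9 pages)')

bits = data.split(':')

# bits[2] is ' MILANI, JOHN C Tmt. Dt.'; the first two words are the name.
namebits = bits[2].split()
last_name = namebits[0].rstrip(',').capitalize()
first_name = namebits[1].capitalize()

# bits[3] holds the date range; take the text between '- ' and '('.
start = bits[3].find('- ')
stop = bits[3].find('(')
last_treatment = bits[3][start + 2:stop].strip()

merge_fields = {
    'MedSourceFirstName': first_name,
    'MedSourceLastName': last_name,
    'MedSourceLastTreatment': last_treatment,
}
print(merge_fields)
```

Note that capitalize() lowercases everything after the first letter, so a
surname such as "McDONALD" would come out as "Mcdonald" and may need
special handling.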
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.
Mark Lawrence
--
https://mail.python.org/mailman/listinfo/python-list