Re: regex pattern to extract repeating groups

Malcolm Mon, 27 Aug 2018 17:02:13 -0700

On 28/08/2018 7:09 AM, John Pote wrote:

On 26/08/2018 00:55, Malcolm wrote:
I am trying to understand why regex is not extracting all of the characters between two delimiters.
The complete string is the XMP IFD data extracted from a .CR2 image file.
I do have a work around, but it's messy and possibly not future proof.
Do you mean future proofing your workaround, or that Canon's .CR2 raw image files might change? I guess .CR2s won't change, but Canon have brought out the new .CR3 raw image file, for which I needed to upgrade my photo editing suite (or rather, I didn't: I used their tool to convert .CR3s from the camera to the digital negative format, which many photo editors can handle). I can send you a sample .CR3 if you want to compare.
Regards,
John

John

Thank you.

Some background

The application is for personal use. While I'm familiar with Python generally (and thanks to all who post code and answer questions), this is the first time I have used struct to read a binary file, XML parsers to parse some of the RDF contents, and re.


First

I have now discovered that when I print the return value of re.search, the match='...' part of the output truncates the matched characters. To see all the matched characters I used the span as indexes into the original string (x.group() returns the same complete text). I'm not sure whether this is mentioned in the re documentation, but all the samples I've seen on the web use only short strings. This was the cause of my question.


For example:
import re

data = '''
   <dc:creator>
    <rdf:Seq>
     <rdf:li>abcdef zxcvb</rdf:li>
    </rdf:Seq>
   </dc:creator>
   '''

re_pattern = r'( *<dc:.*</dc:)'
x = re.search(re_pattern, data, re.DOTALL)
print(x)  # the repr shortens the matched text
print(data[x.span()[0]:x.span()[1]])  # the full match, same as x.group()


prints
<_sre.SRE_Match object; span=(1, 89), match='   <dc:creator>\n    <rdf:Seq>\n     <rdf:li>abcd>
   <dc:creator>
    <rdf:Seq>
     <rdf:li>abcdef zxcvb</rdf:li>
    </rdf:Seq>
   </dc:
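For anyone hitting the same thing: the shortening happens only in the match object's repr, which CPython limits to roughly the first 50 characters of the matched text; the match itself keeps everything. A small sketch, reusing the sample string above, comparing the repr with group() and with the span slice (m and full are just illustrative names):

```python
import re

data = '''
   <dc:creator>
    <rdf:Seq>
     <rdf:li>abcdef zxcvb</rdf:li>
    </rdf:Seq>
   </dc:creator>
   '''

m = re.search(r'( *<dc:.*</dc:)', data, re.DOTALL)

# The repr of a match object only shows a shortened copy of the matched text.
print(repr(m))

# group() (same as group(0)) returns the complete matched text ...
full = m.group()
# ... which is identical to slicing the original string with the span.
assert full == data[m.start():m.end()]
print(full)
```

So print(x.group()) gives the whole match without the span arithmetic.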



Second

By future proofing I mean: at the moment I'm testing my code against one .CR2 image. My wish is that my code will work on all of my .CR2 images from different cameras. When I upgrade my camera(s) to one(s) that produce .CR3 images I will, no doubt, need to retest my code.

All I'm really trying to do is extract some metadata and a thumbnail/preview JPEG using Python instead of relying on subprocess and exiftool/exiv2, to speed things up. Oh, and I got sidetracked into learning something new.
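Since the XMP packet is RDF/XML, the standard library's xml.etree.ElementTree can pull the same fields without a regex or an exiftool subprocess. A minimal sketch, assuming the packet has already been extracted from the .CR2 as a string (that step is not shown) and using a trimmed copy of the packet quoted below; the NS dict and the variable names are illustrative only:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XMP packet quoted in this message.
xmp = '''<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:exif="http://ns.adobe.com/exif/1.0/"
    xmlns:tiff="http://ns.adobe.com/tiff/1.0/"
   exif:DateTimeOriginal="2018-07-30T09:18:24+10:00"
   tiff:Artist="Malcolm Blake">
   <dc:creator>
    <rdf:Seq>
     <rdf:li>Malcolm Blake</rdf:li>
    </rdf:Seq>
   </dc:creator>
   <dc:subject>
    <rdf:Bag>
     <rdf:li>AUS, Arthur Grooms Cottage, Australia, Binna Burra, Lamington National Park, Qld</rdf:li>
    </rdf:Bag>
   </dc:subject>
  </rdf:Description>
</rdf:RDF>'''

# Prefix -> namespace URI map, copied from the xmlns declarations above.
NS = {
    'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'dc': 'http://purl.org/dc/elements/1.1/',
    'exif': 'http://ns.adobe.com/exif/1.0/',
    'tiff': 'http://ns.adobe.com/tiff/1.0/',
}

root = ET.fromstring(xmp)
desc = root.find('rdf:Description', NS)

# Simple properties are attributes; ElementTree keys them as '{uri}name'.
taken = desc.get('{%s}DateTimeOriginal' % NS['exif'])
artist = desc.get('{%s}Artist' % NS['tiff'])

# Array properties (Seq/Bag/Alt) are child elements: collect every rdf:li.
creators = [li.text for li in desc.findall('dc:creator/rdf:Seq/rdf:li', NS)]
subjects = [li.text for li in desc.findall('dc:subject/rdf:Bag/rdf:li', NS)]

print(taken, artist, creators, subjects)
```

The parser handles nesting and attribute order for you, which is the part a regex tends to get wrong when the packet layout changes.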


Malcolm

The full RDF-XMP as extracted (truncated):

xml_data = '''<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

  <rdf:Description rdf:about=""
    xmlns:Iptc4xmpCore="http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:exif="http://ns.adobe.com/exif/1.0/"
    xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/"
    xmlns:tiff="http://ns.adobe.com/tiff/1.0/"
    xmlns:xmp="http://ns.adobe.com/xap/1.0/"
   Iptc4xmpCore:CountryCode="AUS"
   Iptc4xmpCore:Location="Binna Burra"
   exif:DateTimeDigitized="2018-07-30T09:18:24+10:00"
   exif:DateTimeOriginal="2018-07-30T09:18:24+10:00"
   exif:GPSAltitude="4052/5"
   exif:GPSAltitudeRef="0"
   exif:GPSLatitude="28,11.734230S"
   exif:GPSLongitude="153,11.218140E"
   exif:GPSMapDatum="WGS-84"
   exif:GPSSpeed="28033/5697"
   exif:GPSSpeedRef="K"
   exif:GPSTimeStamp="2018-07-29T23:18:24Z"
   exif:GPSVersionID="2.2.0.0"
   photoshop:City="Lamington National Park"
   photoshop:Country="Australia"
   photoshop:DateCreated="2018-07-30T09:18:24+10:00"
   photoshop:State="Qld"
   tiff:Artist="Malcolm Blake"
   xmp:ModifyDate="2018-07-30T09:18:24+10:00">
   <dc:creator>
    <rdf:Seq>
     <rdf:li>Malcolm Blake</rdf:li>
    </rdf:Seq>
   </dc:creator>
   <dc:rights>
    <rdf:Alt>
     <rdf:li xml:lang="x-default">Malcolm Blake</rdf:li>
    </rdf:Alt>
   </dc:rights>
   <dc:subject>
    <rdf:Bag>
     <rdf:li>AUS, Arthur Grooms Cottage, Australia, Binna Burra, Lamington National Park, Qld</rdf:li>
    </rdf:Bag>
   </dc:subject>
  </rdf:Description>
 </rdf:RDF>'''
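If a regex is still preferred, the catch with a greedy .* under re.DOTALL is that it runs to the last </dc: in the string, so on the full packet above one match would swallow creator, rights and subject together. A minimal sketch using a non-greedy .*? plus a backreference to the element name, so each dc: element comes back as its own (name, body) pair; it assumes dc: elements are not nested inside each other and their opening tags carry no attributes, which holds for this packet but is not guaranteed by XMP in general:

```python
import re

# Trimmed copy of the dc: part of the XMP packet above.
xml_data = '''
   <dc:creator>
    <rdf:Seq>
     <rdf:li>Malcolm Blake</rdf:li>
    </rdf:Seq>
   </dc:creator>
   <dc:rights>
    <rdf:Alt>
     <rdf:li xml:lang="x-default">Malcolm Blake</rdf:li>
    </rdf:Alt>
   </dc:rights>
   <dc:subject>
    <rdf:Bag>
     <rdf:li>AUS, Arthur Grooms Cottage, Australia, Binna Burra, Lamington National Park, Qld</rdf:li>
    </rdf:Bag>
   </dc:subject>
'''

# Greedy: a single match running from the first <dc: to the LAST </dc:.
greedy = re.findall(r'<dc:.*</dc:', xml_data, re.DOTALL)

# Non-greedy, with \1 forcing the closing tag to match the opening name:
# findall returns one (name, body) tuple per dc: element.
blocks = re.findall(r'<dc:(\w+)>(.*?)</dc:\1>', xml_data, re.DOTALL)

# Pull the rdf:li values out of each block ([^>]* allows attributes on rdf:li).
values = {name: re.findall(r'<rdf:li[^>]*>(.*?)</rdf:li>', body, re.DOTALL)
          for name, body in blocks}

print(len(greedy))  # 1
for name, items in values.items():
    print(name, items)
```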








--
https://mail.python.org/mailman/listinfo/python-list
