[issue31677] email.header uses re.IGNORECASE without re.ASCII

INADA Naoki Tue, 03 Oct 2017 05:58:22 -0700

New submission from INADA Naoki <[email protected]>:

email.header has this pattern:


https://github.com/python/cpython/blob/85c0b8941f0c8ef3ed787c9d504712c6ad3eb5d3/Lib/email/header.py#L34-L43

# Match encoded-word strings in the form =?charset?q?Hello_World?=
ecre = re.compile(r'''
  =\?                   # literal =?
  (?P<charset>[^?]*?)   # non-greedy up to the next ? is the charset
  \?                    # literal ?
  (?P<encoding>[qb])    # either a "q" or a "b", case insensitive
  \?                    # literal ?
  (?P<encoded>.*?)      # non-greedy up to the next ?= is the encoded string
  \?=                   # literal ?=
  ''', re.VERBOSE | re.IGNORECASE | re.MULTILINE)
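To illustrate what the pattern captures, here is a minimal sketch that copies the quoted ecre verbatim (same flags) and runs it over a header value containing two encoded-words. The sample header value and the name `parts` are made up for this example only.

```python
import re

# Copy of the ecre pattern quoted above from Lib/email/header.py (flags unchanged).
ecre = re.compile(r'''
  =\?                   # literal =?
  (?P<charset>[^?]*?)   # non-greedy up to the next ? is the charset
  \?                    # literal ?
  (?P<encoding>[qb])    # either a "q" or a "b", case insensitive
  \?                    # literal ?
  (?P<encoded>.*?)      # non-greedy up to the next ?= is the encoded string
  \?=                   # literal ?=
  ''', re.VERBOSE | re.IGNORECASE | re.MULTILINE)

# Made-up header value containing two encoded-words (one "q", one "B").
value = "=?utf-8?q?Hello_World?= and =?ISO-8859-1?B?SGk=?="

# Each match yields a (charset, encoding, encoded text) tuple.
parts = [m.group("charset", "encoding", "encoded") for m in ecre.finditer(value)]
print(parts)
# [('utf-8', 'q', 'Hello_World'), ('ISO-8859-1', 'B', 'SGk=')]
```

The uppercase "B" in the second word only matches because of re.IGNORECASE, which is why the flag is there at all.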


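Since the only letters in this pattern are 'q' and 'b', and among ASCII letters only 'i', 's' and 'k' match non-ASCII characters under Unicode case-insensitive matching, this is not a real bug.
Still, adding re.ASCII would be safer.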
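A quick way to check this is to scan every non-ASCII code point and list which ones a one-letter pattern matches, with and without re.ASCII. The sketch below does that; `NON_ASCII` and `non_ascii_matches` are hypothetical names made up for this illustration. The `re` documentation names U+0130, U+0131, U+017F and U+212A as the extra letters that ASCII ranges match under re.IGNORECASE, so 'q' and 'b' are expected to have no non-ASCII matches at all.

```python
import re

# Hypothetical helper data: every non-ASCII code point (U+0080..U+10FFFF) in one
# string, so each scan below runs inside the regex engine instead of a Python loop.
NON_ASCII = "".join(map(chr, range(0x80, 0x110000)))


def non_ascii_matches(letter, flags):
    """Hypothetical helper: non-ASCII code points matched by a one-letter pattern."""
    return [f"U+{ord(c):04X}" for c in re.findall(letter, NON_ASCII, flags)]


# 'q' and 'b' are the letters used by ecre; 'i', 's' and 'k' are shown for contrast.
for letter in "qbisk":
    unicode_ic = non_ascii_matches(letter, re.IGNORECASE)
    ascii_ic = non_ascii_matches(letter, re.IGNORECASE | re.ASCII)
    print(f"{letter}: IGNORECASE -> {unicode_ic}, IGNORECASE|ASCII -> {ascii_ic}")
```

With re.ASCII added, case folding is limited to ASCII, so every list in the second column should be empty.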

Additionally, email.utils has contained the same pattern for 10 years, and it is not used anywhere.
It should be removed.

----------
components: Regular Expressions
messages: 303612
nosy: ezio.melotti, inada.naoki, mrabarnett
priority: normal
severity: normal
status: open
title: email.header uses re.IGNORECASE without re.ASCII
versions: Python 3.7

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue31677>
_______________________________________
_______________________________________________
Python-bugs-list mailing list