New submission from Evan <ep58...@student.american.edu>:
Relevant base Python library: C:\Users\User\AppData\Local\Programs\Python\Python38\lib\_markupbase.py

The issue: after parsing over 900 SEC filings using beautifulsoup4, I get this user warning:

UserWarning: unknown status keyword 'ERF' in marked section
  warnings.warn(msg)

Followed by a traceback:

....
  File "C:\Users\XXXX\AppData\Local\Programs\Python\Python38\lib\site-packages\bs4\__init__.py", line 325, in __init__
    self._feed()
....
  File "C:\Users\XXXX\AppData\Local\Programs\Python\Python38\lib\_markupbase.py", line 160, in parse_marked_section
    if not match:
UnboundLocalError: local variable 'match' referenced before assignment

It's probably due to malformed input from one of the docs.

144 lines into the _markupbase lib we have:

    # Internal -- parse a marked section
    # Override this to handle MS-word extension syntax <![if word]>content<![endif]>
    def parse_marked_section(self, i, report=1):
        rawdata= self.rawdata
        assert rawdata[i:i+3] == '<![', "unexpected call to parse_marked_section()"
        sectName, j = self._scan_name( i+3, i )
        if j < 0:
            return j
        if sectName in {"temp", "cdata", "ignore", "include", "rcdata"}:
            # look for standard ]]> ending
            match= _markedsectionclose.search(rawdata, i+3)
        elif sectName in {"if", "else", "endif"}:
            # look for MS Office ]> ending
            match= _msmarkedsectionclose.search(rawdata, i+3)
        else:
            self.error('unknown status keyword %r in marked section' % rawdata[i+3:j])
        if not match:
            return -1
        if report:
            j = match.start(0)
            self.unknown_decl(rawdata[i+3: j])
        return match.end(0)

When the else branch runs and self.error() returns instead of raising (here it only emits a warning), `match` is never assigned. `match` should be set to None in the fall-through else branch, right before `if not match`.
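To illustrate the proposed fix outside the library, here is a minimal standalone sketch. It is not the actual _markupbase code: the function name `find_marked_section_end` is hypothetical, the three regexes are stand-ins assumed to resemble the module's private patterns, and `warnings.warn` stands in for an error handler that warns rather than raises, as the UserWarning in the report suggests.

```python
import re
import warnings

# Stand-ins for the private patterns in _markupbase (assumed shapes, not imported).
_markedsectionclose = re.compile(r']\s*]\s*>')   # standard "]]>" ending
_msmarkedsectionclose = re.compile(r']\s*>')     # MS Office "]>" ending
_name = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*')   # section keyword


def find_marked_section_end(rawdata, i):
    """Return the index just past the marked section starting at i, or -1."""
    assert rawdata[i:i + 3] == '<![', "not the start of a marked section"
    name_match = _name.match(rawdata, i + 3)
    if name_match is None:
        return -1
    keyword = name_match.group()
    sect_name = keyword.lower()
    if sect_name in {"temp", "cdata", "ignore", "include", "rcdata"}:
        match = _markedsectionclose.search(rawdata, i + 3)
    elif sect_name in {"if", "else", "endif"}:
        match = _msmarkedsectionclose.search(rawdata, i + 3)
    else:
        # The handler only warns, so execution continues past this branch.
        warnings.warn('unknown status keyword %r in marked section' % keyword)
        match = None  # the proposed fix: always bind `match`
    if not match:
        return -1
    return match.end(0)
```

With `match = None` in place, an unknown keyword such as 'ERF' produces the warning and a -1 return value instead of an UnboundLocalError. Raising an exception in the else branch would be another way to avoid the unbound variable; this sketch only covers the change suggested above.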
----------
components: Library (Lib)
messages: 363234
nosy: SanJacintoJoe
priority: normal
severity: normal
status: open
title: Bug in html parsing module triggered by malformed input
type: compile error
versions: Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue39833>
_______________________________________