On Sun, Dec 6, 2020 at 2:37 PM Barry <ba...@barrys-emacs.org> wrote:
> > On 5 Dec 2020, at 23:44, Peter J. Holzer <hjp-pyt...@hjp.at> wrote:
> >
> > On 2020-12-05 23:42:11 +0100, sjeik_ap...@hotmail.com wrote:
> >> Timeout: no idea. But check out re.compile and re.finditer as they
> >> might speed things up.
> >
> > I doubt that compiling regular expressions helps the OP much. Compiled
> > regular expressions are cached, but more importantly, if a match takes
> > long enough that specifying a timeout is useful, the time is almost
> > certainly not spent compiling, but matching - most likely backtracking
> > from lots of promising but ultimately unsuccessful partial matches.
> >
> >> regex = r'data-stid="section-room-list"[\s\S]*?>\s*([\s\S]*?)\s*' \
> >>         r'(?:class\s*=\s*"\s*sticky-book-now\s*"|</ul>\s*</section>|id\s*=\s*"Location")'
> >> rooms_blocks_to_be_replace = re.findall(regex, html_template)
> >
> > This part:
> >
> >     \s*([\s\S]*?)\s*'
> >
> > looks dangerous from a performance point of view. If that can be
> > rewritten with less potential for backtracking, it might help.
> >
> > Generally, it should be possible to implement a timeout for any
> > operation by either scheduling an alarm with signal.alarm or by
> > executing the operation in a separate thread and killing the thread if
> > it takes too long.
>
> I think that Python ignores signals until the eval loop is entered.
> And since the re.match call will block, that is not going to happen.
>
> Killing threads is not safe, and if your OS allows it then you end up
> with the internal state of Python messed up.
>
> I think implementing this requires the re code itself to implement the
> timeout.
>
> Better is for the OP to fix the re so that it does not backtrack so much,
> or to work on the input string in chunks.

If the regex is expensive enough to warrant it, you could use a
subprocess - they are killable.
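
Here is a rough sketch of what I mean, not a tested recipe for the OP's page. The names findall_with_timeout and _CHILD_CODE are made up for illustration. It assumes the pattern and the text can be sent to a child interpreter as JSON over stdin, and it relies on subprocess.run killing the child when its timeout expires.

```python
import json
import subprocess
import sys

# Code run by the child interpreter: read a JSON job from stdin, run
# re.findall on it, and write the matches back as JSON on stdout.
_CHILD_CODE = """
import json, re, sys
job = json.load(sys.stdin)
json.dump(re.findall(job["pattern"], job["text"]), sys.stdout)
"""


def findall_with_timeout(pattern, text, timeout):
    """Run re.findall(pattern, text) in a child process.

    Returns the list of matches, or None if the child did not finish
    within `timeout` seconds (subprocess.run kills it in that case).
    """
    job = json.dumps({"pattern": pattern, "text": text})
    try:
        done = subprocess.run(
            [sys.executable, "-c", _CHILD_CODE],
            input=job,            # sent to the child's stdin
            capture_output=True,  # collect the child's stdout/stderr
            text=True,
            timeout=timeout,      # child is killed once this expires
            check=True,           # raise if the child itself failed
        )
    except subprocess.TimeoutExpired:
        return None
    return json.loads(done.stdout)
```

With the OP's names that would be something like findall_with_timeout(regex, html_template, timeout=5). Every call pays for starting a new interpreter, so it only makes sense when the match itself is the expensive part. Because the result goes through JSON, a pattern with several groups comes back as lists rather than tuples.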