[issue7132] Regexp: capturing groups in repetitions

2014-03-22 Thread Mark Lawrence
Mark Lawrence added the comment: Can this be closed, as has happened with numerous other issues, as a result of the work done on the new regex module via #2636? -- nosy: +BreamoreBoy

[issue7132] Regexp: capturing groups in repetitions

2010-11-18 Thread Matthew Barnett
Matthew Barnett added the comment: Earlier this week I discovered that .Net supports repeated capture and its API suggested a much cleaner approach than what Perl offered, so I'll be adding it to the regex module at: http://pypi.python.org/pypi/regex The new methods will follow the examp

[issue7132] Regexp: capturing groups in repetitions

2010-04-08 Thread Ezio Melotti
Changes by Ezio Melotti: status: open -> languishing

[issue7132] Regexp: capturing groups in repetitions

2010-04-01 Thread David Chambers
David Chambers added the comment: I would find this functionality very useful. While I agree that it's often simpler to extract the relevant information in several steps, there are situations in which I'd prefer to do it all in one go. The application I'm writing at the moment needs to extrac

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: Umm, I said that the attribution to Thompson was wrong; in fact it was correct. Thompson designed and documented the algorithm in 1968, long before the Aho/Sethi/Ullman green book... so the algorithm is more than 40 years old, and still not in Python, Perl

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: Anyway, there are ways to speed up regexps, even without instructing the regexps with anti-backtracking syntaxes. See http://swtch.com/~rsc/regexp/regexp1.html (article dated January 2007), which discusses how Perl, PCRE (and PHP), Python, Java, Ruby, .NET libr

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: You said that this extension was not implemented anywhere, and you were wrong. I've found that it IS implemented in Perl 6! Look at this discussion: http://www.perlmonks.org/?node_id=602361 Look at how the matches in quantified capture groups are returned as

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Matthew Barnett
Matthew Barnett added the comment: Instead of a new flag, a '*' could be put after the quantifier, eg: (\d+)(?:\.(\d+)){3}* MatchObject.group(1) would be a string and MatchObject.group(2) would be a list of strings. The group references could be \g<1>, \g<2:0>, \g<2:1>, \g<2:2>. However,

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment:
>>> re.match('^(\d{1,3})(?:\.(\d{1,3})){3}$', '192.168.0.1').groups()
('192', '1')
> If I understood correctly what you are proposing, you would like it to return (['192'], ['168', '0', '1']) instead.
In fact it can be assembled in a single array directly in th

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment:
>> And anyway, my suggestion is certainly much more useful than atomic
>> groups and possessive groups that have much lower use [...]
> Then why no one implemented it yet? :)
That's because they had to use something other than regexps to do their parsing. All t

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread R. David Murray
Changes by R. David Murray: nosy: -r.david.murray

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: > a "general" regex (e.g. for an ipv6 address) I know this problem, and I have already written about this. It is not possible to parse it in a single regexp if it is written without using repetitions. But in that case, the regexp becomes really HUGE, and the

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: > Even with your solution, in most of the cases you will need additional steps to assemble the results (at least in the cases with some kind of separator, where you have to join the first element with the followings). Yes, but this step is trivial and fully pr

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: > That's why I wrote 'without checking if they are in range(256)'; the fact that this regex matches invalid digits was not relevant in my example (and it's usually easier to convert the digits to int and check if 0 <= digits <= 255). :) NO ! You have to check a
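The range check quoted in the comment above (convert the digits to int and test 0 <= digits <= 255) could be sketched roughly as follows. The helper name `octets_in_range` is hypothetical, and the sketch assumes its input is already a list of non-empty decimal strings. It does not address leading zeros such as "000", which are disputed elsewhere in this thread.

```python
def octets_in_range(octets):
    """Return True if every decimal string converts to an integer in 0..255.

    Hypothetical helper for illustration; assumes each item is a
    non-empty string of decimal digits.
    """
    return all(0 <= int(octet) <= 255 for octet in octets)


ok = octets_in_range(['192', '168', '0', '1'])
too_big = octets_in_range(['256', '0', '0', '1'])
```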

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: Ezio said:
>>> re.match('^(\d{1,3})(?:\.(\d{1,3})){3}$', '192.168.0.1').groups()
('192', '1')
> If I understood correctly what you are proposing, you would like it to return (['192'], ['168', '0', '1']) instead.
Yes, exactly! That's the correct answer that sho
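For reference, the behaviour being discussed can be reproduced with the standard `re` module. This is a minimal sketch using the pattern and input quoted above; the only change is a raw string literal, which avoids escape-sequence warnings on current Python versions.

```python
import re

# Pattern quoted in the thread: group 2 sits inside a {3} repetition.
pattern = re.compile(r'^(\d{1,3})(?:\.(\d{1,3})){3}$')
match = pattern.match('192.168.0.1')

# Only the last iteration of the repeated group is retained;
# the earlier captures ('168' and '0') are not available.
last_only = match.groups()
print(last_only)  # ('192', '1')
```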

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Ezio Melotti
Ezio Melotti added the comment: > You're wrong, it WILL be compatible, because it is only conditioned > by a FLAG. Sorry, I missed that you mentioned the flag already in the first message, but what I said in 1), 3) and 4) is still valid. > There are plenty of other more complex cases for which

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: I had carefully read ALL that Ezio said; this is clear from the fact that I have summarized my responses to ALL 4 points given by Ezio. Capturing groups is a VERY useful feature of regular expressions, but they currently DON'T work as expected (in a useful

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread R. David Murray
R. David Murray added the comment: Just to clarify, when I said "in most cases such an issue would need to include a proposed patch", I mean that even if everyone agrees it is a good idea it isn't likely to happen unless there is a proposed patch :)

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread R. David Murray
R. David Murray added the comment: If you read what Ezio wrote carefully you will see that he addressed both of your points: he acknowledged that a flag would solve (2) (but disagreed that it was worth it), and he said you could use the first expression to validate the string before using the sp
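The two-step approach mentioned above (validate with the first expression, then extract the parts separately) might look roughly like this. It is only a sketch: the helper name `dotted_parts` is hypothetical, and using `str.split` for the second step is an assumption, since the comment is cut off before it names the method. `fullmatch` is used so that a trailing newline is rejected rather than silently accepted by `$`.

```python
import re

# The shape check quoted in the thread, without anchors because
# fullmatch() already requires the whole string to match.
SHAPE = re.compile(r'(\d{1,3})(?:\.(\d{1,3})){3}')


def dotted_parts(text):
    """Return the four dotted fields as strings, or None if the shape is wrong.

    Hypothetical helper: step 1 validates with the regexp,
    step 2 extracts every field with a plain split.
    """
    if SHAPE.fullmatch(text) is None:
        return None
    return text.split('.')


parts = dotted_parts('192.168.0.1')
print(parts)  # ['192', '168', '0', '1']
```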

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: And anyway, my suggestion is certainly much more useful than atomic groups and possessive groups that have much lower use, and which are already being tested in Perl but that Python (or PCRE, PHP, and most implementations of 'vi'/'ed', or 'sed') still does no

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: Summary of your points with my responses:
> 1) it doesn't exist in any other implementation that I know;
That's exactly why I proposed to discuss it with the developers of other implementations (I cited PCRE, Perl and PHP developers, there are others).
>

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: You're wrong, it WILL be compatible, because it is only conditioned by a FLAG. The flag is there specifically for instructing the parser to generate lists of values rather than single values. Without the regular compilation flag set, as I said, there will be

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: In addition, your suggested regexp for IPv4: '^(\d{1,3})(?:\.(\d{1,3})){3}$' is completely WRONG ! It will match INVALID IPv4 address formats like "000.000.000.000". Reread the RFCs... because "000.000.000.000" is CERTAINLY NOT an IPv4 address (if it is foun

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: Note that I used the IPv4 address format only as an example. There are plenty of other more complex cases for which we really need to capture the multiple occurrences of a capturing group within a repetition. I'm NOT asking you how to parse it using MULTIPLE r

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Ezio Melotti
Ezio Melotti added the comment: I'm skeptical about what you are proposing for the following reasons:
1) it doesn't exist in any other implementation that I know;
2) if implemented as default behavior:
   * it won't be backward-compatible;
   * it will increase the complexity;
3) it will be a pr

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Ezio Melotti
Changes by Ezio Melotti: nosy: +ezio.melotti priority: -> low

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: Implementation details: Currently, the capturing groups behave quite randomly in the values returned by MatchObject, when backtracking occurs in a repetition. This proposal will help fix the behavior, because it will also be much easier to backtrack cleanly

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: Rationale for the compilation flag: You could think that the compilation flag should not be needed. However, not using it would mean that a LOT of existing regular expressions that already contain capturing groups in repetitions, and for which the capturing

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
Philippe Verdy added the comment: I'd like to add that the same behavior should also affect the span(index) method of MatchObject: instead of returning a single (start, end) pair, it should in this case return a list of pairs, one for each occurrence, when the "R" compilation
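To illustrate the difference between the single pair that `span()` returns today and a list of per-occurrence pairs, here is a rough emulation with the standard `re` module. It is not the proposed flag: the helper name `octet_spans` is hypothetical, it reuses the simple IPv4 shape pattern from elsewhere in this thread, and it collects the spans with a second `finditer` pass, so the result also includes the first field (group 1).

```python
import re

SHAPE = re.compile(r'(\d{1,3})(?:\.(\d{1,3})){3}')
OCTET = re.compile(r'\d{1,3}')


def octet_spans(text):
    """Return a (start, end) pair for every dotted field, or None on mismatch.

    Hypothetical helper: validates the overall shape first, then scans
    the same string again to recover the position of each field.
    """
    if SHAPE.fullmatch(text) is None:
        return None
    return [m.span() for m in OCTET.finditer(text)]


# Current behaviour: span(2) reports only the last repetition.
single = SHAPE.fullmatch('192.168.0.1').span(2)
print(single)  # (10, 11)

# Emulated per-occurrence spans.
spans = octet_spans('192.168.0.1')
print(spans)  # [(0, 3), (4, 7), (8, 9), (10, 11)]
```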

[issue7132] Regexp: capturing groups in repetitions

2009-10-14 Thread Philippe Verdy
New submission from Philippe Verdy: For now, when capturing groups are used within repetitions, it is impossible to capture what they match individually within the list of matched repetitions. E.g. the following regular expression: (0|1[0-9]{0,2}|2(?:[0-4][0-9]?|5[0-5]?)?)(?:\.(0|1[0-9]{0,2}|