I have been thinking a little about this. Niphlod's suggestion solves the 
problem for me for now, but isn't there a much bigger problem here? It 
seems that any web2py installation can be taken down, accidentally or 
maliciously, by anyone who requests a URL whose argument string has the 
form 'xxxxxxX', where the 'x's are valid characters, there are enough of 
them, and the 'X' is invalid. There must be a lot of vulnerable sites out 
there.
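
To make the failure mode concrete, here is a minimal sketch that times the 
pattern from Niphlod's session (quoted below) against inputs of the 
'xxxxxxX' form. The names ARGS_PATTERN and time_rejection are mine, purely 
for illustration, and the lengths are kept small on purpose so the script 
finishes quickly.

```python
import re
import time

# Pattern taken from the REPL session quoted later in this thread.
ARGS_PATTERN = re.compile(r'([\w@ -]+[=.]?)*$')


def time_rejection(n):
    """Time the pattern rejecting n valid characters followed by one invalid one."""
    bad_arg = 'x' * n + '('  # '(' is not in the allowed character class
    start = time.time()
    result = ARGS_PATTERN.match(bad_arg)
    return result, time.time() - start


# Small lengths only: the number of ways to split the run of 'x's, and so
# the backtracking work, grows very quickly as n increases.
for n in (12, 15, 18):
    result, elapsed = time_rejection(n)
    print('n=%d matched=%s seconds=%.4f' % (n, result is not None, elapsed))
```

If my reading of the pattern is right, the elapsed time should climb steeply 
with each step in n, which is why a file name of forty-odd characters never 
seems to return.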

It seems to me there is one easy fix, which is to strip out invalid 
characters before the regex match. You will get collisions, but since the 
URL is invalid anyway, who cares? Alternatively, the string could be 
URL-encoded first so that the invalid characters become %-encoded.
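
A rough sketch of the strip-first idea, combined with the lenient pattern 
and the separate repeated-separator check from Jonathan's workaround quoted 
below. The helpers strip_invalid and is_valid_arg are hypothetical names, 
not web2py functions, and the character class is my assumption of what the 
lenient pattern should accept (hyphen placed last so it stays a literal). 
As far as I can tell, stripping alone is not quite enough, because '.' and 
'=' are valid characters and a doubled separator after a long run can still 
make the original pattern fail slowly; hence the separate check.

```python
import re

# Assumed set of allowed characters: \w, @, space, '=', '.', and '-'.
INVALID_CHARS = re.compile(r'[^\w@ =.-]')
# Lenient check with no nested quantifier, so a failed match backtracks
# at most once per character.
LENIENT_ARG = re.compile(r'[\w@ =.-]*$')
# Separate check for two separators in a row ('..', '==', '.=', '=.').
REPEATED_SEP = re.compile(r'[=.]{2}')


def strip_invalid(arg):
    """Drop every character outside the allowed set (hypothetical helper)."""
    return INVALID_CHARS.sub('', arg)


def is_valid_arg(arg):
    """Allowed characters only, no leading separator, no doubled separator."""
    if not LENIENT_ARG.match(arg):
        return False
    if arg[:1] in ('=', '.'):
        return False
    return REPEATED_SEP.search(arg) is None


name = 'Abbbbbbbb Lccc - Pddddddd GA Deeeeee (ffff ffff A).pdf'
print(is_valid_arg(name))                 # rejected because of the parentheses
print(strip_invalid(name))                # same name with '(' and ')' removed
print(is_valid_arg(strip_invalid(name)))  # accepted after stripping
```

Whether to strip and accept the request, or just use the fast check to 
reject it, is a design choice; the sketch only shows that both can be done 
without the nested quantifier.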


On Tuesday, November 13, 2012 7:33:26 PM UTC, Jonathan Lundell wrote:
>
> On 13 Nov 2012, at 11:20 AM, Niphlod <nip...@gmail.com> wrote:
>
> I'm definitely not a regex master, but what's the *[=.]?* part required 
> for ?
>
>
> The idea (not mine, fwiw) is that you can have multiple strings of [\w@ 
> -]+ separated or ended (but not begun) with a single . or = (but not 
> multiple ones). My workaround would allow leading or multiple . or =. I 
> think we probably should anyway, since we shouldn't be assuming that args 
> are necessarily a file path, which seems to be what's going on there.
>
> It's trying to prevent stuff like foo/../../../bar.
>
>
> On Tuesday, November 13, 2012 7:00:32 PM UTC+1, Jonathan Lundell wrote:
>>
>> On 13 Nov 2012, at 9:04 AM, Niphlod <nip...@gmail.com> wrote:
>>
>> seems a problem with the default regex checking for args.... Let's wait 
>> for Jonathan
>>
>> >>> import re
>> >>> mymatch = re.compile(r'([\w@ -]+[=.]?)*$')
>> >>> mymatch.match('a')
>> <_sre.SRE_Match object at 0x02A61020>
>> >>> mymatch.match('Abbbbbbbb Lccc - Pddddddd GA Deeeeee (ffff ffff A).pdf')
>>
>> endless loop of regex backtracking
>>
>>
>> I don't have a quick fix. The easy solutions involve re elements not 
>> available in Python re (or at least not until 3.1).
>>
>> A workaround would be to make the pattern a little more lenient: 
>> [\w@ =.-]+
>>
>> If we really want to exclude successive dots or equals, we could make a 
>> separate check for that.
>>
>
>
>
>
>
