Re: URI Basics

2006-04-25 Thread Ramprasad
> There is definitely a VERY significant performance penalty to using > rawbody over URI, for any rule. > > Consider the size of input. A rawbody regex must be run against the > entire text of the body after QP decoding. A uri regex must be run > against all the text of the URIs that SA found. Th
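
A minimal sketch of the size-of-input point above, written as SpamAssassin rule definitions. The rule names, scores, and the example.com pattern are hypothetical and are not taken from the thread; only the `uri` and `rawbody` rule types come from the discussion.

```
# Hypothetical rule: the regex is run only against the URIs that
# SpamAssassin extracted from the message, so the input is small.
uri       LOCAL_EXAMPLE_URI      m{^https?://[^/]*\.example\.com/}i
describe  LOCAL_EXAMPLE_URI      Link to example.com (illustration only)
score     LOCAL_EXAMPLE_URI      0.1

# Hypothetical rule: the same idea as a rawbody rule. The regex is run
# against the body text after QP decoding, so the input is much larger.
rawbody   LOCAL_EXAMPLE_RAWBODY  m{https?://[^/\s]*\.example\.com/}i
describe  LOCAL_EXAMPLE_RAWBODY  Link to example.com (illustration only)
score     LOCAL_EXAMPLE_RAWBODY  0.1
```

The uri version can anchor with ^ because each URI is matched as its own string; the rawbody version cannot, since the link may appear anywhere in the body text.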

Re: URI Basics

2006-04-24 Thread Dan
Gentlemen, Thank you for all the great input. Specifically, you're learning perl regular expressions, and perl is a language that gives you a million different ways to skin a cat, so to speak. As the quote goes "all things are permissible, but not all things are beneficial". It's a

Re: URI Basics

2006-04-24 Thread Theo Van Dinter
On Mon, Apr 24, 2006 at 09:27:47PM -0400, Matt Kettler wrote: > > Is URI the way to go when tracking obfuscation, as in: > > uri __LINKAGE_A284 [EMAIL PROTECTED] Yes. The uri rules run over both the raw version and the decoded versions. > Neither of the above will work.. Both uri and rawbody rul

Re: URI Basics

2006-04-24 Thread Matt Kettler
Dan wrote: > Follow up question: > > Is URI the way to go when tracking obfuscation, as in: > uri __LINKAGE_A284 [EMAIL PROTECTED] > > ...or will URI's translation get in the way, requiring something more > like?: > rawbody __LINKAGE_A284 [EMAIL PROTECTED] > Neither of the above will work.. Both ur

Re: URI Basics

2006-04-24 Thread Dan
Follow up question: Is URI the way to go when tracking obfuscation, as in: uri __LINKAGE_A284 [EMAIL PROTECTED] ...or will URI's translation get in the way, requiring something more like?: rawbody __LINKAGE_A284 [EMAIL PROTECTED] Thanks, Dan

Re: URI Basics

2006-04-24 Thread Matt Kettler
Dan wrote: >> In 3 ^ is the first character of the regex, just as it is in 1 and 2. It >> is also inside the delimiters, just like 1 and 2. In example 3 @ is >> being used as a delimiter, and ^ is the first character after it. > > Are you saying that in URIs, any character (@ in this case) can ser

Re: URI Basics

2006-04-24 Thread John Rudd
On Apr 24, 2006, at 5:18 PM, Dan wrote: I'm beginning to realize how many of my learning curve issues are attempts to understand the very structure of a system created with a bare minimum of structure. Specifically, you're learning perl regular expressions, and perl is a language that gives

Re: URI Basics

2006-04-24 Thread Theo Van Dinter
On Mon, Apr 24, 2006 at 05:18:23PM -0700, Dan wrote: > Are you saying that in URIs, any character (@ in this case) can serve > as the delimiter, so long as it displays after the m and again at the > end of the entry? Yes. Take a look at the perlre and perlop (specifically the m// operator) do
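
The delimiter point above can be sketched as three rules that should be equivalent, assuming the usual behaviour of Perl's m// operator. The LOCAL_ rule names and the dotted-quad pattern are hypothetical; only the delimiter styles (/.../, m{...}, and m@...@) come from the thread.

```
# Default / delimiters: every / inside the pattern must be escaped.
uri  LOCAL_HTTP_TO_IP_SLASH  /^https?:\/\/\d+\.\d+\.\d+\.\d+/

# m{...}: braces are the delimiters, so / needs no escaping.
uri  LOCAL_HTTP_TO_IP_BRACE  m{^https?://\d+\.\d+\.\d+\.\d+}

# m@...@: @ is the delimiter, and ^ is still the first character
# of the regex itself.
uri  LOCAL_HTTP_TO_IP_AT     m@^https?://\d+\.\d+\.\d+\.\d+@
```

Picking a delimiter that does not occur in the pattern avoids the backslash clutter seen in the first form.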

Re: URI Basics

2006-04-24 Thread Dan
In 3 ^ is the first character of the regex, just as it is in 1 and 2. It is also inside the delimiters, just like 1 and 2. In example 3 @ is being used as a delimiter, and ^ is the first character after it. Are you saying that in URIs, any character (@ in this case) can serve as the delimit

Re: URI Basics

2006-04-24 Thread Matt Kettler
Dan Patnode wrote: > Another Newbie question here, > > So URIs find links in the body. I'm trying to get a handle on URI > syntax and have found several disparate examples: > > > 1) uri HTTP_CTRL_CHARS_HOST > /^https?\:\/\/[^\/\s]*[\x00-\x08\x0b\x0c\x0e-\x1f]/ > > 2) uri NORMAL_HTTP_TO_IP

URI Basics

2006-04-24 Thread Dan Patnode
Another Newbie question here, So URIs find links in the body. I'm trying to get a handle on URI syntax and have found several disparate examples: 1) uri HTTP_CTRL_CHARS_HOST /^https?\:\/\/[^\/\s]*[\x00-\x08\x0b\x0c\x0e-\x1f]/ 2) uri NORMAL_HTTP_TO_IP m{^https?://\d+\.\d+