Re: Do chained comparisons short-circuit?
On Thursday 19 January 2006 04:25, Luke Palmer wrote:
> On 1/19/06, Joe Gottman <[EMAIL PROTECTED]> wrote:
> > Suppose I have code that looks like this:
> >
> >     my ($x, $y, $z) = (1, 2, 3);
> >
> >     say "sorted backward" if ++$x > ++$y > ++$z;
> >
> > Will $z be incremented even though the chained comparison is known to be
> > false after ++$x and ++$y are compared?
>
> I don't see a reason for chained comparisons not to short-circuit,
> besides the surprise factor. But anyone who knows about &&, and
> understands chained comparisons as expanding to &&, should understand
> short-circuiting behavior.

Although that may lead to _longer_ code, which (when extended) is likely to be broken:

    $x++; $y++; $z++;
    say "sorted backward" if $x > $y > $z;

To be honest, in this example it mostly doesn't matter; if $x > $y, then ($x+1) > ($y+1). But in many quickly written scripts I did some numeric operation to force the value to numeric, even if I got a parameter like "string" (which becomes 0 when numified).

How about some flag saying "don't short-circuit this"?

Regards,

Phil
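For comparison, Python's chained comparisons already behave the way Luke describes: they expand to an "and" of pairwise comparisons and stop at the first false one. The following is a minimal sketch in Python, not Perl 6; the helper name bump is made up for this illustration.

```python
calls = []

def bump(name, value):
    """Record that an operand was evaluated, then return value + 1 (like ++$x)."""
    calls.append(name)
    return value + 1

x, y, z = 1, 2, 3

# Roughly: bump(x) > bump(y) and bump(y) > bump(z), with the middle operand
# evaluated only once. Since 2 > 3 is false, the third operand is never evaluated.
sorted_backward = bump("x", x) > bump("y", y) > bump("z", z)

print(sorted_backward, calls)  # False ['x', 'y']
```

If every side effect must happen regardless of the result, the operands have to be evaluated into variables first, which is the longer form shown in the message above.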
Re: multi method dispatching of optional arguments (further refined)
On Sunday 03 September 2006 14:25, Mark Stosberg wrote:
> Luke Palmer wrote:
> > On 9/3/06, Mark Stosberg <[EMAIL PROTECTED]> wrote:
> >> Note that the variant /with/ the parameter can be considered an exact
> >> match, but the variant /without/ it cannot be considered an exact
> >> match.

Excuse me for getting into this thread with only minor knowledge about perl6, but will there be MMD based on the *value* of parameters? Like Haskell has.

I don't know about a possible syntax, but sometimes it's a very nice way to dispatch to different parts. (I know that that's possible with if statements, but they have a disadvantage: they're not so visually "dispatching", if you know what I mean.)

Regards,

Phil
Re: multi method dispatching of optional arguments (further refined)
On Monday 04 September 2006 16:21, Audrey Tang wrote:
> 2006/9/4, Ph. Marek <[EMAIL PROTECTED]>:
> > Excuse me for getting into this thread with only minor knowledge about
> > perl6, but will there be MMD based on the *value* of parameters? Like
> > Haskell has.
>
> Why, yes, see the various Unpacking sections in S06, as well as "where"
> type constraints. We're st^H^Hadapting as much as we can. :-)

Hello Audrey!

I now had a look at http://dev.perl.org/perl6/doc/design/syn/S06.html but didn't find what I meant. Sorry if I'm just dumb and don't understand you (or S06); I'll try to explain what I mean.

In Haskell you can e.g. write:

    someThing :: Int -> Int -> Int
    someThing a b
      | a == 4    = b + 2
      | b == 3    = a + 1
      | otherwise = a * b

or

    anotherThing :: Int -> Int -> Int
    anotherThing 4 b = b + 2
    anotherThing a 3 = a + 1
    anotherThing a b = a * b

In Perl5 this looks like

    sub SomeThing {
        my ($a, $b) = @_;
        return $b + 2 if ($a == 4);
        return $a + 1 if ($b == 3);
        return $a * $b;
    }

Which is a bit wrong IMO, because the condition should be first. But

    sub SomeThing {
        my ($a, $b) = @_;
        if ($a == 4) { return $b + 2 }
        if ($b == 3) { return $a + 1 }
        return $a * $b;
    }

is a bit of a hassle with the {} and repeated if()s.

What I am asking is whether there will be some multimethod dispatch depending on the *value*, not the *type*, of parameters. Perl6 could possibly do something with "given"; but matching on multiple variables seems to be verbose, too.

I'm looking for something in the way of

    sub SomeThing(Num $a, Num $b) where $a==4 is $b+2;
    sub SomeThing(Num $a, Num $b) where $b==3 is $a+1;
    sub SomeThing(Num $a, Num $b) { return $a * $b }

but without specifying the signature multiple times (or maybe we should, since it's MMD).

Now

    sub SomeThing(Num $a, Num $b) {
        if $a==4 { return $b+2; }
        if $b==3 { return $a+1; }
        return $a * $b;
    }

would almost do what I want, but I don't know if the compiler would optimize that in the way it could for direct MMD depending on types.

Regards,

Phil
Re: multi method dispatching of optional arguments (further refined)
On Tuesday 05 September 2006 07:52, Trey Harris wrote:
> I don't think you're dumb; the Synopses just require that you intuit
> certain things from each other, from examples in other Synopses, and so on
> in a Perlish sort of way; what you're looking for is not spelled out
> explicitly. It can be found by noticing how you specify subtypes, along
> with noticing that subtypes can be specified as parameter types. There's
> also an example showing explicitly what you want in S12.

Ok, I'll try to dive through the documentation before asking questions.

> It's just
>
>     multi sub SomeThing(Num $a where {$^a == 4}, Num $b) { $b + 2 }
>     multi sub SomeThing(Num $a, Num $b where {$^b == 3}) { $a + 1 }
>     multi sub SomeThing(Num $a, Num $b) { $a * $b }
>
> Yes, the signatures are different--the first two multis specify subtypes
> as their signatures, the last specifies a canonical type.

Thank you *very* much! That clears it up.

Regards,

Phil
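The idea of dispatching on argument values can be sketched outside Perl 6 as a list of (guard, body) pairs that is tried in order. This is only an illustration in Python of the three SomeThing variants from this thread; it says nothing about how a Perl 6 compiler would order or optimize multi candidates, and the names VARIANTS and some_thing are made up here.

```python
# Each variant is (guard, body). The first guard that accepts the arguments wins,
# mirroring the top-to-bottom reading of the Haskell version.
VARIANTS = [
    (lambda a, b: a == 4, lambda a, b: b + 2),
    (lambda a, b: b == 3, lambda a, b: a + 1),
    (lambda a, b: True,   lambda a, b: a * b),   # fallback, like the unconstrained multi
]

def some_thing(a, b):
    for guard, body in VARIANTS:
        if guard(a, b):
            return body(a, b)
    raise TypeError("no variant matches")

print(some_thing(4, 10), some_thing(5, 3), some_thing(2, 5))
```

Keeping the guards as data separates the condition from the body, which is the "visually dispatching" property asked for earlier in the thread.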
Re: Decrement of Numbers in Strings (Was: [svn:perl6-synopsis] r14460 - doc/trunk/design/syn)
On Wednesday, 23 April 2008, Larry Wall wrote:
> On Wed, Apr 23, 2008 at 04:03:01PM +0100, Smylers wrote:
> : The algorithm for increment and decrement on strings sounds really good,
> : however I'm concerned that dealing with all that has made the common
> : case of integer decrement a little less intuitive where the integer
> : happens to be stored in a string, for example in this case:
> :
> :   perl -wle '$a = 10; $b = shift; $a--; $b--; print "$a $b"' 10
> :
> : Perl 5 prints "9 9", but Perl 6 will print "9 09".
>
> On the other hand, "09" has the advantage of still having the numeric
> value 9. But the converse is not true if the user was expecting a
> string decrement, since decrementing "10" in Perl 5 to get 9 is also
> counterintuitive if you were expecting "09". So Perl 5 just punts,
> which is the third option. In any case, there's always something
> to explain to a beginner. But I think in Perl 6 we're leaning more
> toward preserving information than Perl 5 did.

But that doesn't really work for loops. Imagine (excuse my perl5)

    $a = "100";
    $a-- for (1 .. 40);

So ($a eq "060")? Then you'll have the problem that this gets (or might get) interpreted as octal somewhere; if not in perl6 directly (because of different base specifications), you're likely to get problems when passing that to other programs, e.g. via system().

I think that's a can of worms, and I'd be +1 on TSa:

> If the programmer really wants to decrement "10" to "09" she has
> to cast that to Str: ("10" as Str)--. So we have "10".HOW === Str
> but "10".WHAT === Num Str.

Regards,

Phil
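The octal concern can be made concrete with a small sketch. It is written in Python and assumes a width-preserving decrement for positive decimal strings, which is one reading of the behaviour discussed above; the function name decrement_keep_width is made up for this illustration.

```python
def decrement_keep_width(s):
    """Decrement a positive decimal string, zero-padding to the original width."""
    return str(int(s) - 1).zfill(len(s))

a = "100"
for _ in range(40):
    a = decrement_keep_width(a)

print(a)          # 060
print(int(a))     # 60  (read as decimal)
print(int(a, 8))  # 48  (what a consumer sees if it treats the leading zero as octal)
```

The string keeps its width, but any downstream program that honours a leading zero as an octal marker silently reads a different number.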
Re: Next Apocalypse
> Because there are some assertions that can lead the optimizer to make some
> fundamental assumptions, and if those assumptions get violated or
> redefined while you're in the middle of executing a function that makes
> use of those assumptions, well...
>
> Changing a function from pure to impure, adding an overloaded operator, or
> changing the core structure of a class can all result in code that needs
> regeneration. That's no big deal for code you haven't executed yet, but if
> you have:
>
>     a = 1;
>     b = 12;
>     foo();
>     c = a + b;
>
> and a and b are both passive classes, that can get transformed to
>
>     a = 1;
>     b = 12;
>     foo();
>     c = 13;
>
> but if foo changes the rules of the game (adding an overloaded + to a or
> b's class) then the code in that sub could be incorrect.
>
> You can, of course, stop even potential optimization once the first "I can
> change the rules" operation is found, but since even assignment can change
> the rules that's where we are right now. We'd like to get better by
> optimizing based on what we can see at compile time, but that's a very,
> very difficult thing to do.

How about retaining some "debug" info (line numbers come to mind), but only at expression level?

So in your example, if foo() changed the + operator, it would return into the calling_sub() at expression 4 (numbered from 1 here :-), notice that something has changed, recompile the sub, and continue processing at expression 4.

Phil
calling functions/class methods
Hello everybody,

first of all please forgive me if I'm using the wrong words - I'm not up to date about the (current) meanings of methods, functions, etc.

I read the article http://www.cuj.com/documents/s=8042/cuj0002meyers/

It states (short version - read the article for details): In C++ there are member functions, which are called via object.member(parameter), and non-member (possibly friend) functions, which are called via function(object,parameter).

I wondered whether perl6 could do both:
- When called via object.member, look for a member function; if it is not found, look for a function with this name which takes an object as first parameter.
- When called the other way, look first for the function, then for a member.

So both ways are possible, and in the normal (non-interfering) situation (only one of member/function defined) it would support encapsulation, in that a caller does not need to know if this function was a member or not.

I fear that I'm on a completely wrong track, or that this has been decided - but I didn't find anything about this.

Regards,

Phil
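The first lookup rule proposed above (member first, then a free function taking the object as first argument) can be sketched in Python. This is only an illustration of the proposal, not anything Perl 6 specifies; the helper call_member_first, the class Point and the function scaled are made up here.

```python
def call_member_first(obj, name, free_functions, *args):
    """Try obj.name(*args); if there is no such member, try free_functions[name](obj, *args)."""
    member = getattr(obj, name, None)
    if callable(member):
        return member(*args)
    func = free_functions.get(name)
    if callable(func):
        return func(obj, *args)
    raise AttributeError("neither member nor function named %r" % name)

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def norm1(self):            # a member function
        return abs(self.x) + abs(self.y)

def scaled(point, k):           # a non-member function taking the object first
    return Point(point.x * k, point.y * k)

functions = {"scaled": scaled}
p = Point(1, -2)
print(call_member_first(p, "norm1", functions))              # found as a member
print(call_member_first(p, "scaled", functions, 3).norm1())  # found as a function
```

The caller uses one spelling in both cases, which is the encapsulation benefit the message describes.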
Re: The Sort Problem
> ...
> so here is a (very rough and probably broken) syntax idea building on
> that:
>
>     sort :key { :descend :string .foo('bar').substr( 10, 3) }
>          :key { :int .foo('baz') }
>          :key { :float .foo('amount') } @unsorted ;

I see a kind of problem here: If the parts of the key are not fixed length but can vary, you can put them in strings *only* after processing all of them and determining the needed length.

Example:

    sort :key { :descend :string .foo('bar') }
         :key { :int .foo('baz') }
         :key { :float .foo('amount') } @unsorted ;

Now .foo('bar') isn't bounded by any length - so you don't know how much space to reserve.

And I believe that
- generating keys on every list element,
- storing them into an array (array of arrays),
- after having processed all, checking the length,
- then generating the to-be-sorted strings, and
- sorting
isn't the optimal way.

BTW: this requires that *all* keys are generated. In cases like
- by name,
- by age,
- by height,
- by number of toes left,
- and finally sort by the social security number
most of the extractions (and possibly database queries or calculations or whatever) will not be done - at least in the current form of

    sort {
        $a->{"name"} cmp $b->{"name"} ||
        $a->{"age"} <=> $b->{"age"} ||
        ...

That is to say, I very much like the syntax you propose, but I'm not sure if pre-generating *every* key part is necessarily a speed-up. If there are expensive calculations you can always cut them short by pre-calculating them into a hash by object, and just querying this in sort.

Also I fear that the amount of memory necessary to sort an array of length N is not N*2 (unsorted, sorted), but more like N*3 (unsorted, keys, sorted), which could cause trouble on bigger arrays.

Regards,

Phil
Re: The Sort Problem
On Friday, 13 February 2004 01:40, Larry Wall wrote:
> On Thu, Feb 12, 2004 at 04:29:58PM -0500, Uri Guttman wrote:
> : again, confusing. why should the order of a binary operator mean so
> : much? the order of a sort key is either ascending or descending. that is
> : what coders want to specify. translating that to the correct operator
> : (cmp or <=>) and the correct binary order is not the same as specifying
> : the key sort order and key type (int, string, float).
>
> Uri is dead on with this one, guys.

As I listen to these mails, I get the feeling that something like this is wanted:

Key generation:

    @unsorted_temp = map {
        $k1 = $_.func1('a');    # ASC
        $k2 = $_.func2('we');   # DESC
        [ $_, $k1, $k2 ];
    } @unsorted;

Now we've got an array with keys and the objects. Sorting:

    @sorted = sort {
        $a->[1] cmp $b->[1] ||
        $b->[2] <=> $a->[2]
    } @unsorted_temp;

These things would have to be said in P6. So approx.:

    @sorted = @unsorted.sort(
        keys  => [ { $_.func1('a'); },
                   { $_.func2('we'); } ],
        cmp   => [ cmp, <=> ],
        order => [ "asc", "desc" ],
        key_generation => "lazy",
    );

That would explain what I want. Maybe we could turn the parts around:

    @sorted = @unsorted.sort(
        1 => [ { $_.func1('a'); }, cmp, "asc" ],
        2 => [ { $_.func2('we'); }, <=>, "desc" ],
    );

or maybe use a hash instead of an array:

    @sorted = @unsorted.sort(
        1 => [ key => { $_.func1('a'); }, op => cmp, order => "asc" ],
        2 => [ key => { $_.func2('we'); }, op => <=>, order => "desc" ],
    );

Is that too verbose? I don't think so; I've stumbled often enough on $a <=> $b vs. $b <=> $a and similar, and the above just tells what should be done.

Regards,

Phil
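The two strategies discussed in this thread (pre-generating every key versus extracting keys lazily during comparison) can be sketched side by side. This is a Python illustration under made-up data; the names SPEC, sort_pregenerated and sort_lazy are hypothetical, and the specification format only loosely mirrors the "key, direction" pairs proposed above.

```python
from functools import cmp_to_key

records = [
    {"name": "bob", "age": 30},
    {"name": "alice", "age": 25},
    {"name": "bob", "age": 41},
]

# (key extractor, direction): name ascending, then age descending.
SPEC = [(lambda r: r["name"], +1), (lambda r: r["age"], -1)]

def three_way(a, b):
    """Return -1, 0 or 1, like Perl's cmp and <=>."""
    return (a > b) - (a < b)

def sort_pregenerated(items):
    """Extract all keys once per element up front, then compare the stored keys."""
    decorated = [([key(r) for key, _ in SPEC], r) for r in items]
    def compare(x, y):
        for (_, direction), kx, ky in zip(SPEC, x[0], y[0]):
            c = direction * three_way(kx, ky)
            if c:
                return c
        return 0
    return [r for _, r in sorted(decorated, key=cmp_to_key(compare))]

def sort_lazy(items):
    """Extract a later key only when all earlier keys compare equal."""
    def compare(x, y):
        for key, direction in SPEC:
            c = direction * three_way(key(x), key(y))
            if c:
                return c
        return 0
    return sorted(items, key=cmp_to_key(compare))

print(sort_pregenerated(records))
print(sort_lazy(records))
```

The pre-generated variant calls each extractor exactly once per element but stores an extra key list per element; the lazy variant stores nothing extra but may call an extractor many times for the same element.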
question regarding rules and bytes vs characters
Hello everybody, I'm about to learn myself perl6 (after using perl5 for some time). One of my first questions deals with regexes. I'd like to parse data of the form Len: 15\n (15 bytes data)\n Len: 5\n (5 bytes data)\n \n OtherTag: some value here\n and so on, where the data can (and will) be binary. I'd try for something like my $data_tag= rule { Len\: $len:=(\d) \n $data:=([:u0 .]<$len>)\n # these are bytes }; Is that correct? And furthermore is perl6 said to be unicode-ready. So I put the :u0-modifier in the data-regex; will that DWIM if I try to match a unicode-string with that rule? Is anything known about the internals of pattern matching whether the hypothetical variables will consume (double) space? I'm asking because I imagine getting a tag like "Len: 2" and then having problems with 256MB RAM. Matching shouldn't be a problem according to apo 5 (see the chapter "RFC 093: Regex: Support for incremental pattern matching") but I'll maybe have troubles using the matched data? Thank you for all answers! Regards, Phil
Re: question regarding rules and bytes vs characters
> : Hello everybody, > : > : I'm about to learn myself perl6 (after using perl5 for some time). > > I'm also trying to learn perl6 after using perl5 for some time. :-) I wouldn't even try to compare you and me :-) > Pretty close. The way it's set up currently, $len is a reference > to a variable external to the rule, so $len is likely to fail under > stricture unless you've declared "my $len" somewhere. To make the > variable automatically scope to the rule, you have to use $?len > these days. ok. > : And furthermore is perl6 said to be unicode-ready. > : So I put the :u0-modifier in the data-regex; will that DWIM if I try to > : match a unicode-string with that rule? > > It should. However (and this is a really big however), you'll have > to be very careful that something earlier hasn't converted one form > of Unicode to another on you. For instance, if your string came in > as UTF-8, and your I/O layer translated it internally to UTF-32 or > some such, you're just completely hosed. When you're working at the > bytes level, you must know the encoding of your string. > > So the natural reaction is to open your I/O handle :raw to get binary > data into your string. Then you try to match Unicode graphemes with [ > :u2 . ] and discover that *that* doesn't work. Which is obvious when > you consider that Perl has no way of knowing which Unicode encoding > the binary data is in, so it's gonna consider it to be something like > Latin-1 unless you tell it otherwise. So you'll probably have to > cast the binary string to whatever its actual encoding is (potentially > lying about the binary parts, which we may or may not get away with, > depending on who validates the string when), or maybe we just need > to define rules like and for use > under the :u0 regime. Of course the file must be opened in binary mode - else the line-endings etc. can be destroyed in the binary data, which is bad. So Perl/Parrot can't autodetect the kind of encoding. 
But maybe it should be possible to do something like [:utf16be_codepoint]? Len: $?len:=(\d+) \n $?data:=([:raw .]<$len>) \n ie. say that the conversion to unicode is optional?? > : Is anything known about the internals of pattern matching whether the > : hypothetical variables will consume (double) space? > : I'm asking because I imagine getting a tag like "Len: 2" and then > : having problems with 256MB RAM. Matching shouldn't be a problem according > : to apo 5 (see the chapter "RFC 093: Regex: Support for incremental > : pattern matching") but I'll maybe have troubles using the matched data? > > My understanding is that Parrot implements copy-on-write, so you should > be okay there. ok, thank you. > Even the late ones? :-) even them - this is the *only* answer I received. Again: > : Thank you for all answers! > Larry Phil
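Independently of how Perl 6 rules end up handling byte-level matching, the record format from the original question ("Len: N", newline, N raw bytes, newline) can be parsed by treating the buffer strictly as bytes and never decoding the payload. A minimal sketch in Python, assuming the whole input is already in memory; the function name parse_len_records is made up here.

```python
def parse_len_records(buf):
    """Parse consecutive 'Len: N\\n<N bytes>\\n' records from the start of buf.

    Returns (payloads, offset), where offset is the position after the last record.
    """
    prefix = b"Len: "
    payloads = []
    i = 0
    while buf.startswith(prefix, i):
        eol = buf.index(b"\n", i)                 # end of the 'Len: N' line
        length = int(buf[i + len(prefix):eol])    # N is a count of bytes, not characters
        start = eol + 1
        end = start + length
        if buf[end:end + 1] != b"\n":
            raise ValueError("record at offset %d is not newline-terminated" % i)
        payloads.append(buf[start:end])           # payload may contain any byte, including \n
        i = end + 1
    return payloads, i

data = b"Len: 3\nabc\nLen: 2\n\n\x00\n\nOtherTag: some value here\n"
payloads, offset = parse_len_records(data)
print(payloads, data[offset:])
```

Because the payload is sliced by length instead of being matched character by character, binary data containing newlines or invalid UTF-8 is passed through untouched.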
Re: push with lazy lists
On Thursday 08 July 2004 05:25, Larry Wall wrote:
> : say @x[rand]; # how about now?
>
> Well, that's always going to ask for @x[0], which isn't a problem.
> However, if you say rand(@x), it has to calculate the number of
> elements in @x, which could take a little while...

I'd expect it to be

    rand(@x) = rand(1)*@x = rand(1)*Inf = Inf or NaN.

Case 1 (Inf) would give Inf (which can be argued, since infinitely many more elements are bigger than any given finite number), and case 2 could give an exception ...

Regards,

Phil
Re: push with lazy lists
> >--- Larry Wall <[EMAIL PROTECTED]> wrote: > >> The hard part being to pick a random number in [0,Inf) uniformly. :-) > > > >Half of all numbers in [0, Inf) are in the range [Inf/2, Inf). Which > >collapses to the range [Inf, Inf). Returning Inf seems to satisfy the > >uniform distribution requirement: if you have a number you're waiting > >to see returned, just wait a bit longer... > > I like the 1/n trick used in the Perl Cookbook (Picking a Random Line from > a File). We could apply the same idea here: > > rand($_)<1 && ($chosen=$_) for 1...Inf; I don't believe that that could give you an value ... > All right, it would take a bit longer for your program to run, but that's > a performance issue for them to sort out on *-internals. Like, it would take a bit longer than your lifetime :-)? >-David "sure Moore's Law will deal with it in a year or two" Green 'And my new '986 does the infinite loop in under 3.5 seconds' :-) To repeat Dave and myself - if @x = 1 .. Inf; then rand(@x) should be Inf, and so print $x[rand(@x)]; should give Inf, as the infinite element of @x is Inf. But maybe we could get an index of Inf working like -1 (ie. the last value): @x = 1 .. Inf; push @x, "a"; print $x[Inf]; would print an "a" ... although, on this line of reasoning, print $x[rand(@x)]; would always print "a" I believe that an array should get an .rand-Method, which could do the right thing. @x= (1 .. Inf, "b", -Inf .. -1, "c", 1 .. Inf); print $x[rand(@x)],"\n" while (1); could give Inf Inf -Inf b c Inf -Inf and so on - an "random" element of a random part of the array, and an infinite list gives Inf (or -Inf) as a random element (as explained above in this thread). So an array would have to know of how many "pieces" it is constructed, and then choose an element among the pieces ... I'd think that's reasonable, isn't it? Regards, Phil
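The 1/n trick quoted above from the Perl Cookbook works on any finite stream without knowing its length in advance: the n-th item replaces the current choice with probability 1/n. A minimal sketch in Python; the function name pick_one is made up here, and on an infinite stream the loop simply never finishes, which is the point made in the reply.

```python
import random

def pick_one(stream, rng=random):
    """Return a uniformly random item from a finite iterable, reading it only once."""
    chosen = None
    for n, item in enumerate(stream, start=1):
        # rng.random() * n is uniform in [0, n), so this is true with probability 1/n,
        # the same test as rand($_) < 1 in the quoted one-liner.
        if rng.random() * n < 1:
            chosen = item
    return chosen

print(pick_one(range(1, 11)))
```

Each item ends up chosen with probability 1/N for a stream of N items, because item n survives all later replacements with probability n/N.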
Re: push with lazy lists
On Wednesday 14 July 2004 08:39, David Storrs wrote:
> > To repeat Dave and myself - if
> >     @x = 1 .. Inf;
> > then
> >     rand(@x)
> > should be Inf, and so
> >     print $x[rand(@x)];
> > should give Inf, as the infinite element of @x is Inf.

Please take my words as my understanding, ie. with no connection to mathematics or number theory or whatever. I'll just say what I believe is practical.

> Does it even make sense to take the Infiniteth element of an
> array?...after all, array indices are integers, and Inf is not an
> integer.

I'd believe that infinity can be integer, ie. has no numbers after the comma; and infinity is in the natural numbers (?), which are a subset of integers.

> If we allow it, should we also allow people to take the
> NaNth element of an array?

NaN is already a "number" (internal representation), so it doesn't get converted. As there is no NaNth element, it would return either undef (as in (0,1,2)[8]) or an exception, as it is no numeric index.

> How about the 'foobar'th element?

'foobar' is converted to a number, so the 0th element is taken.

> What happens if I take the Infiniteth element of a finite list?

undef, as in 8th element of (1,2,3).

> I think I would prefer if using Inf as an array index resulted in a
> trappable error.

That's a possibility. It could raise an exception as with NaN.

To summarize:

    @x = ('a', 5 .. Inf, 'b');

    $x[0]     is 'a'
    $x['foo'] is 'a'
    $x[-1]    is 'b'
    $x[2]     is 6
    $x[2002]  is 2006

I believe these are clear and understandable.

    $x[Inf]   is 'b'
    $x[-2]    is Inf
    $x[-10]   is Inf
    $x[-2]    is Inf

These would result from simply interpolating the indices.

    $x[NaN]

gets an exception because NaN is already of numeric type (as in $x=tan(pi/2)), but cannot be associated with any index.

So I'd propose to solve this argument based on "can be used as an index". An infinite array (and even a finite one) can be asked for an infinite index - which has a value for infinite arrays.

This is just so there's no special coding necessary for some indices - imagine a lookup like

    @x = (10,9,9,8,8,8,6,3,2,1,1,1,1,0);
    $number = scalar(<STDIN>)+0;
    print $x[10/$number];

which would work for *any* input, and just give undef for most of them.

Regards,

Phil

BTW: is it possible to define a look-up table as in

    @x = (1, 2, 3, 4, 5, Inf .. Inf)

to get everything from [5] on to be Inf?
Re: push with lazy lists
On Friday 16 July 2004 18:23, Jonadab the Unsightly One wrote: > > Please take my words as my understanding, ie. with no connection to > > mathmatics or number theory or whatever. I'll just say what I > > believe is practical. > > [...] > > > I'd believe that infinity can be integer, ie. has no numbers after > > the comma; and infinity is in the natural numbers (?), which are a > > subset of integers. > > If that were the case, 0/Inf would == 0. Isn't that so? 0/+Inf == 0 0/-Inf == 0 (or -0, if you wish :-) > Also, if that were the case, 0..Inf would be a finite list. (It is > trivial to prove that 0..N is a finite list with finite cardinality > for all natural numbers N. So if you set N equal to Inf, 0..Inf would > have finite cardinality, if Inf is a natural number.) > > This is obviously some new definition of Inf of which I was not > previously aware. Well, after reading my sentence one more, I see what may have caused some troubles. Inf is not in N; but *in my understanding* it fits naturally as an extension to N, that is, Inf is (or can be) integer as is "after" N... This won't be written in math books, I know. > Also, if that were the case, 0..Inf would be a finite list. (It is > trivial to prove that 0..N is a finite list with finite cardinality > for all natural numbers N. So if you set N equal to Inf, 0..Inf would > have finite cardinality, if Inf is a natural number.) If I extend the natural numbers N with Inf to a new set NI (N with Inf), then 0 .. n (for n in NI) need not be finite ... Sorry for my (very possibly wrong) opinion ... Regards, Phil
S5 and overlap
> # With the new :ov (:overlap) modifier, the current rule will match at all
> possible character positions (including overlapping) and return all matches
> in a list context, or a disjunction of matches in a scalar context. The
> first match at any position is returned.
>
>     $str = "abracadabra";
>
>     @substrings = $str ~~ m:overlap/ a (.*) a /;
>
>     # bracadabr cadabr dabr br

Maybe I'm wrong here, but for

    $str = "abracadabra";

I'd get

    bracadabr cadabr dabr br

(so far identical), but then I'd also expect

    bracad cad d
    brac c
    br

which gets me to the question whether there'll be some elements multiple times in the array (there should be), and in which order they appear (first match to (nth .. 1st) match, 2nd to (nth .. 2nd)) and so on ...

BTW: will

    $str = "abracadabra";
    @substrings = $str ~~ m:overlap/ a (.*) (b|d) /;

get some empty strings as well (I believe it should)?

Regards,

Phil
Re: S5 and overlap
> > # With the new :ov (:overlap) modifier, the current rule will match at > > all possible character positions (including overlapping) and return all > > matches in a list context, or a disjunction of matches in a scalar > > context. The first match at any position is returned. > > > > $str = "abracadabra"; > > > > @substrings = $str ~~ m:overlap/ a (.*) a /; > > > > # bracadabr cadabr dabr br > > Maybe I'm wrong here, but I'd get Just found the answer, sorry. But that gets me to the next question, ie I don't understand the difference between exhaustive and overlap. Is it that overlap fixes the first point of the pattern match and does further scanning for all possibilities, and exhaustive then *after* this processing searches for another first point? Regards, Phil
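The difference asked about here can be illustrated on the same string. The sketch below is in Python and encodes one reading of the two modifiers, namely that :overlap keeps only the first match at each starting position while :exhaustive keeps every possible match; that reading is an assumption, not a statement of what S05 finally specifies.

```python
import re

text = "abracadabra"

# "Overlap": one match per starting position. The zero-width lookahead lets
# finditer try every position even though the captured text overlaps.
overlap = [m.group(1) for m in re.finditer(r"(?=a(.*)a)", text)]

# "Exhaustive": every way the pattern a(.*)a can match, i.e. every
# (start, end) pair of 'a' characters with end after start.
a_positions = [i for i, ch in enumerate(text) if ch == "a"]
exhaustive = [text[i + 1:j] for i in a_positions for j in a_positions if j > i]

print(overlap)     # ['bracadabr', 'cadabr', 'dabr', 'br']
print(exhaustive)
```

Under this reading the exhaustive list has ten entries, including the six extra substrings listed in the previous message, and "br" appears twice because it is matched at two different positions.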
Re: Zero-day rules implementation status in Pugs
On Monday 09 May 2005 19:36, Autrijus Tang wrote: > On Mon, May 09, 2005 at 10:51:53PM +1000, Damian Conway wrote: > > Autrijus wrote: > > >/me eagerly awaits new revelation from Damian... > > > > Be careful what you wish for. Here's draft zero. ;-) > > ...and here is my status report of the Zero-Day exploit, err, > implementation, in Pugs. :-) That's great. I'm just waiting for the next time, when you announce the implementation before the draft. I'm really looking forward to meet you in Vienna next month. Regards, Phil
Re: reduce metaoperator on an empty list
On Tuesday 07 June 2005 23:41, Luke Palmer wrote:
> On 6/7/05, Larry Wall <[EMAIL PROTECTED]> wrote:
> > Okay, I've made up my mind. The "err" option is not tenable because
> > it can cloak real exceptions, and having multiple versions of reduce is
> > simply multiplying entities without adding much power. So let's allow
> > an optional "identvalue" trait on operators. If it's there, reduce
> > can use it. If it's not, reduce returns failure on 0 args. Built-in
> > addition will have an identity value of 0, while multiplication will
> > have an identity value of 1. String concatenation will have "".
> > We can go as far as having -Inf on [<] and +Inf on [>]
>
> < and > still don't make sense as reduce operators. Observe the table:
>
>     # of args | Return (type)
>     0         | -Inf
>     1         | Num (the argument)
>     2         | bool
>     ...       | bool

How about using initvalue twice for an empty array, ie. always pad to at least two values?

So

    $bool = [<] @empty_array;   # is false (-Inf < -Inf)
    $bool = [<=] @empty_array;  # is true (-Inf <= -Inf)

Which would make some sort of sense - in an empty array there's no right element that's bigger than its left neighbour ...

And if the case [<] @empty_array should return true it's easy to use ?? ::.

Just my €0.02.

Regards,

Phil
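Both ideas in this message can be tried out in Python: reduce with an identity value for arithmetic operators, and the "pad an empty list with the initial value twice" rule for comparison operators. This is a sketch only; the function name padded_chain is made up, and the padding rule is applied to the empty list only, as in the proposal above.

```python
from functools import reduce
import operator

# With an identity value, reducing an empty list is well defined.
print(reduce(operator.add, [], 0))       # 0
print(reduce(operator.mul, [], 1))       # 1
print(reduce(operator.concat, [], ""))   # empty string

def padded_chain(op, items, init=float("-inf")):
    """Chained comparison over items; an empty list is treated as [init, init]."""
    items = list(items) or [init, init]
    return all(op(a, b) for a, b in zip(items, items[1:]))

print(padded_chain(operator.lt, []))   # False, since -Inf < -Inf is false
print(padded_chain(operator.le, []))   # True, since -Inf <= -Inf is true
```

Without the padding, the pairwise test over an empty list is vacuously true for every comparison operator, so the padding is what makes [<] and [<=] differ on empty input.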
hyper/vector operation operator
Hello everyone!

First of all - I do not closely follow perl6/parrot development. I read "this week on perl6" on www.perl.com but that's it - so if I'm completely off the track, let me know.

Regarding the discussions about the hyper operator (e.g. adding elements of 2 arrays into another array) I've had the following idea: use "=>".
- In perl5 there is an operator "=>" which is used in associative array assignment. In perl6 this means "pairs" IIRC, which could get interpreted as "add pairs of numbers".
- It has a nice visual feeling: it combines 2 elements (two lines) into 1 (one end).

So a usage could be

    @a = @b =>+ @b;
    @a = @b =+> @b;
    @a = @b +=> @b;

where the 2nd form would be the most intuitive (from reading this source).

Hmm, that would leave us with

    @a =+>= @b;

which ain't as pretty.

What do you think?

Regards,

Phil
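Whatever spelling the operator gets, the operation under discussion is an elementwise combination of two arrays. A minimal sketch in Python; the function name hyper is made up here, and rejecting arrays of different length is an assumption of this sketch, not something the message specifies.

```python
import operator

def hyper(op, left, right):
    """Apply a binary operator pairwise to two equally long sequences."""
    if len(left) != len(right):
        raise ValueError("operands must have the same length")
    return [op(a, b) for a, b in zip(left, right)]

b = [1, 2, 3]
c = [10, 20, 30]
print(hyper(operator.add, b, c))  # [11, 22, 33]
```

The in-place form mentioned at the end of the message corresponds to assigning the result back to the left operand, e.g. `b = hyper(operator.add, b, c)`.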
regex matching from a position ?
Hello everybody,

I sometimes have the task to analyse a string starting from a given position, where this position changes after each iteration (like index() does).

As this is perl there are MTOWTDIIP, but I'd like to know the fastest. So I used Benchmark.pm to find that out (script attached).

Excerpt from script:

    "from_start" => sub { m/\S*\s+(\S+)/; },
    "re_dyn"     => sub { m/^[\x00-\xff]{$pos}\S*\s+(\S+)/; },
    "re_once"    => sub { m/^[\x00-\xff]{$pos}\S*\s+(\S+)/o; },
    "substr"     => sub { substr($_,$pos) =~ m/\S*\s+(\S+)/; },
    "substr_set" => sub { $tmp=substr($_,$pos); $tmp =~ m/\S*\s+(\S+)/; },

from_start is for comparison only (it shows how fast it should be). re_once is for comparison too, as the index can't be adjusted afterwards (and dynamically recompiling via eval() for changing indexes can't be fast enough).

Results:

    2505792 bytes to do ...
    Benchmark: timing 1000000 iterations of from_start, re_dyn, re_once, substr, substr_set...
    from_start:  1 wallclock secs ( 1.26 usr + -0.01 sys =  1.25 CPU) @ 800000.00/s (n=1000000)
        re_dyn:  9 wallclock secs ( 6.52 usr +  0.00 sys =  6.52 CPU) @ 153374.23/s (n=1000000)
       re_once:  1 wallclock secs ( 1.26 usr +  0.01 sys =  1.27 CPU) @ 787401.57/s (n=1000000)
        substr:  4 wallclock secs ( 2.36 usr +  0.02 sys =  2.38 CPU) @ 420168.07/s (n=1000000)
    substr_set:  5 wallclock secs ( 3.23 usr +  0.00 sys =  3.23 CPU) @ 309597.52/s (n=1000000)

                   Rate re_dyn substr_set substr re_once from_start
    re_dyn     153374/s     --       -50%   -63%    -81%       -81%
    substr_set 309598/s   102%         --   -26%    -61%       -61%
    substr     420168/s   174%        36%     --    -47%       -47%
    re_once    787402/s   413%       154%    87%      --        -2%
    from_start 800000/s   422%       158%    90%      2%         --

So: every possibility is *much* slower than necessary!

So I propose (I know that I'm a bit late, but who cares ... :-) a new option for regexes (like each, case-insensitive, and match-multiple-times) which allows specifying a position to start matching. That should be *no* overhead!

eg:

    $text.m:from500:i /\s*(\S+)/;

Currently the substr() is the fastest available option - unless somebody has more imagination than me (which I take as given).

So, is there a faster possibility, is that no problem for perl6, or will something like this be implemented?

Regards,

Phil

    #!/usr/bin/perl
    use Benchmark qw(cmpthese);

    $pos=500;
    $runs=1000000;

    # test data: everything readable under /etc, as one big string
    $_=`cat /etc/* 2> /dev/null`;
    study $_;
    print length($_), " bytes to do ...\n";

    cmpthese($runs, {
        "from_start" => sub { m/\S*\s+(\S+)/; },
        "re_dyn"     => sub { m/^[\x00-\xff]{$pos}\S*\s+(\S+)/; },
        "re_once"    => sub { m/^[\x00-\xff]{$pos}\S*\s+(\S+)/o; },
        "substr"     => sub { substr($_,$pos) =~ m/\S*\s+(\S+)/; },
        "substr_set" => sub { $tmp=substr($_,$pos); $tmp =~ m/\S*\s+(\S+)/; },
    } );
Re: regex matching from a position ?
> Phil, please see the perlfunc entry for "pos" and the perlre section
> on \G. This is what you need.

Thanks a lot! I know about pos but thought it was read-only. And \G is relatively new, isn't it? It certainly didn't exist in '97 when I learned perl :-) And the "basics" are seldom read again in the docs...

Thank you very much, although it's still 32% slower:

    2505792 bytes to do ...
    Benchmark: timing 1000000 iterations of from_start, pos, re_dyn, re_once, substr, substr_set...
    from_start:  2 wallclock secs ( 1.06 usr +  0.00 sys =  1.06 CPU) @ 943396.23/s (n=1000000)
           pos:  0 wallclock secs ( 1.55 usr +  0.01 sys =  1.56 CPU) @ 641025.64/s (n=1000000)
        re_dyn:  7 wallclock secs ( 6.13 usr +  0.00 sys =  6.13 CPU) @ 163132.14/s (n=1000000)
       re_once:  2 wallclock secs ( 1.22 usr +  0.00 sys =  1.22 CPU) @ 819672.13/s (n=1000000)
        substr:  2 wallclock secs ( 2.39 usr +  0.01 sys =  2.40 CPU) @ 416666.67/s (n=1000000)
    substr_set:  3 wallclock secs ( 3.10 usr +  0.00 sys =  3.10 CPU) @ 322580.65/s (n=1000000)

                   Rate re_dyn substr_set substr   pos re_once from_start
    re_dyn     163132/s     --       -49%   -61%  -75%    -80%       -83%
    substr_set 322581/s    98%         --   -23%  -50%    -61%       -66%
    substr     416667/s   155%        29%     --  -35%    -49%       -56%
    pos        641026/s   293%        99%    54%    --    -22%       -32%
    re_once    819672/s   402%       154%    97%   28%      --       -13%
    from_start 943396/s   478%       192%   126%   47%     15%         --

Regards,

Phil

    #!/usr/bin/perl
    use Benchmark qw(cmpthese);

    $pos=500;
    $runs=1000000;

    # test data: everything readable under /etc, as one big string
    $_=`cat /etc/* 2> /dev/null`;
    study $_;
    print length($_), " bytes to do ...\n";

    cmpthese($runs, {
        "from_start" => sub { m/\S*\s+(\S+)/; },
        "re_dyn"     => sub { m/^[\x00-\xff]{$pos}\S*\s+(\S+)/; },
        "re_once"    => sub { m/^[\x00-\xff]{$pos}\S*\s+(\S+)/o; },
        "substr"     => sub { substr($_,$pos) =~ m/\S*\s+(\S+)/; },
        "substr_set" => sub { $tmp=substr($_,$pos); $tmp =~ m/\S*\s+(\S+)/; },
        # set the start position on $_ so that \G anchors the match there
        "pos"        => sub { pos($_) = $pos; m/\G\S*\s+(\S+)/; },
    } );
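For comparison with pos and \G, Python's compiled patterns take the start position as an argument, so no substring has to be copied. The sketch below uses the same pattern as the benchmark on a short made-up string; the names WORD_AFTER_GAP, anchored and sliced are hypothetical, and no timing claims are made.

```python
import re

WORD_AFTER_GAP = re.compile(r"\S*\s+(\S+)")

text = "alpha beta gamma delta"
pos = 6  # index of the "b" in "beta"

# Match anchored at pos in the original string (the analogue of pos() plus \G).
anchored = WORD_AFTER_GAP.match(text, pos)

# Match against a copy of the tail (the analogue of the substr() variants).
sliced = WORD_AFTER_GAP.match(text[pos:])

print(anchored.group(1), anchored.start(1))  # gamma 11
print(sliced.group(1), sliced.start(1))      # gamma 5
```

Both variants capture the same word; the anchored form reports offsets relative to the original string, which is convenient when the position is advanced after each iteration.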