On 2015-05-22 21:01:01 -0500, David Wright wrote:
> However, in https://lists.debian.org/debian-user/2015/04/msg01265.html
> I was perhaps less ambiguous (point 2):
>
> "In which case, if you want to know how come mutt is so fast, take a
> look at the source. Just to mention one optimisation I would consider:
> slurp the directory and sort the entries by inode. Open the files in
> inode order.
> And another: it's probably faster to slurp bigger chunks of each file
> (with an intelligent guess of the best buffer size) and use a fast
> search for \nMessage-ID rather than reading and checking line by line.
> "
This may be interesting with mmap. Otherwise, one may do unnecessary
copies.

> > Then I don't think that in the particular case of header validation,
> > there is much gain applying regexp's on the full header at once; the
> > reason is that my regexp's use the end of line as a separator (things
> > like /\n[^:\s]+\s/ and /^Message-ID:.../im). So, when I read the file
> > line by line, I already do a part of the job of regexp matching.
>
> But I would assume that regexp in languages like Perl/Python has code
> far more optimised than reading files line by line.

This is not clear. All my regexp's are anchored on a newline. Reading
files line by line allows one to do some factoring.

> So you would search for \nmessage-id:.*?\n (where .*? is
> non-greedy).

One can do better. The code I used in the second test was:

  $header =~ /^\S+:/ || $header =~ /^From / or die;
  $header =~ /\n[^:\s]+\s/ and die;
  $header =~ /^Message-ID:.*^Message-ID:/ims and die;
  $header =~ /^Message-ID:\s+(<\S+>)( \(added by .*\))?$/im or die;

where $header is the full header.

> > And finally, for each test, the header has to be read several times.
>
> I'm not sure why, without knowing the tests to apply (or did I miss
> seeing them?).

See above.

> > In my case, I don't need to deal with folded headers, except validating
> > the format, which is very easy with a line-by-line parsing.
>
> You did mention validating message-id and other headers and checking
> for missing ones, but do your scripts throw all this work away and,
> if so, why? For example, if you add your own distinctive Message-ID
> header to any file that doesn't have one, then that's one test you
> never have to repeat.

I don't understand.
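To illustrate the kind of factoring that line-by-line reading allows, here is a minimal sketch of the same four checks applied in a single pass over the header lines. It is written in Python rather than Perl, purely for illustration, and it is not the script used for the timings discussed in this thread. The function name check_header and the sample data in the usage note are hypothetical. It assumes LF line endings and a header terminated by an empty line. Corner cases, such as a Message-ID folded onto the next line, may behave differently from the full-header regexps.

```python
import re

# Per-line counterparts of the four full-header regexps (assumed equivalents).
FIELD_START = re.compile(r"\S+:")          # first line: "Name:" ...
BAD_LINE = re.compile(r"[^:\s]+\s")        # later line: no colon before first whitespace
MSGID_NAME = re.compile(r"Message-ID:", re.IGNORECASE)
MSGID_FULL = re.compile(
    r"Message-ID:\s+(<\S+>)( \(added by .*\))?$", re.IGNORECASE
)


def check_header(lines):
    """Validate a header given as an iterable of lines (newlines kept).

    Returns the Message-ID, or raises ValueError on the first problem found.
    """
    msgid = None
    seen_msgid = False
    for n, line in enumerate(lines):
        if line in ("\n", ""):
            break  # an empty line ends the header
        if n == 0:
            # Check 1: a header field, or an mbox "From " separator line.
            if not (FIELD_START.match(line) or line.startswith("From ")):
                raise ValueError("bad first line")
        elif BAD_LINE.match(line):
            # Check 2: neither a field nor a folded continuation line.
            raise ValueError("malformed header line")
        if MSGID_NAME.match(line):
            # Check 3: at most one Message-ID field.
            if seen_msgid:
                raise ValueError("duplicate Message-ID")
            seen_msgid = True
            # Check 4: the Message-ID must have the expected format.
            m = MSGID_FULL.match(line)
            if m:
                msgid = m.group(1)
    if msgid is None:
        raise ValueError("missing or malformed Message-ID")
    return msgid
```

With a hypothetical header such as ["From: a@example.org\n", "Message-ID: <x@example.org>\n", "\n"], check_header returns "<x@example.org>". Each line is inspected once, and the scan stops at the first error or at the end of the header. This is the factoring meant above. Whether it is faster in practice than running the regexps over the full header would have to be measured.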
-- 
Vincent Lefèvre <vinc...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)