I have installed this and am closing the bug report.
On Wed, Apr 20, 2016 at 11:21 PM, Paul Eggert wrote:
> I'm attaching a revised patch, relative to the latest grep, to implement the
> idea of the Bug#18777 patch. This revision calls the new array "never_trail"
> instead of "always_character_boundary" to nail down the concept a bit more
> precisely.
I'm attaching a revised patch, relative to the latest grep, to implement the
idea of the Bug#18777 patch. This revision calls the new array "never_trail"
instead of "always_character_boundary" to nail down the concept a bit more
precisely. It also removes what appears to be an unnecessary p < mb
On Fri, 19 Dec 2014 00:54:58 +0900
Norihiro Tanaka wrote:
> On Thu, 18 Dec 2014 01:40:18 -0800
> Thanks, I understand what you mean. You are right. I changed the patch
> so that always_character_boundary is not pruned even if WCP != NULL, and
> fixed the API documentation.
I fixed a mismatch with t
On Thu, 18 Dec 2014 01:40:18 -0800
Paul Eggert wrote:
> Why? The (only) caller with WCP != NULL doesn't use *WCP when
> skip_remains_mb (D, P, ..., WCP) returns P. So it's OK to not set *WCP
> in that case.
Thanks, I understand what you mean. You are right. I changed the patch
so that always_character_boundary is not pruned even if WCP != NULL, and
fixed the API documentation.
Norihiro Tanaka wrote:
> If WCP != NULL, we must store the wide character for 0x95 0x5c in *WCP before
> returning P.
Why? The (only) caller with WCP != NULL doesn't use *WCP when skip_remains_mb
(D, P, ..., WCP) returns P. So it's OK to not set *WCP in that case.
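To make this contract concrete, here is a hypothetical sketch for Shift_JIS, where 0x95 0x5c is a two-byte character whose second byte happens to be the ASCII backslash, so a pointer to 0x5c is not necessarily at a character boundary. The names sjis_lead, sjis_never_trail and sjis_skip_to_boundary are invented for illustration and are not grep's dfa.c code; the byte ranges follow the common CP932-style layout, and *wcp receives the raw two-byte code rather than a converted wide character. The fast path returns P without touching *wcp, and *wcp is written only when the result differs from P.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers for Shift_JIS (CP932-style byte ranges).  */

/* True if B can start a two-byte Shift_JIS character.  */
static bool
sjis_lead (unsigned char b)
{
  return (0x81 <= b && b <= 0x9f) || (0xe0 <= b && b <= 0xfc);
}

/* True if B can never be the second byte of a two-byte character.
   Second bytes range over 0x40..0x7e and 0x80..0xfc, so '\\' (0x5c)
   and all ASCII letters CAN be second bytes.  */
static bool
sjis_never_trail (unsigned char b)
{
  return b < 0x40 || b == 0x7f || 0xfc < b;
}

/* Return the first character boundary at or after P.  MBP must be a
   known boundary with MBP <= P < END.  *WCP receives the raw two-byte
   code of the character that straddles P, and is written only when the
   result differs from P, the only case in which a caller may read it.  */
static unsigned char const *
sjis_skip_to_boundary (unsigned char const *p, unsigned char const *mbp,
                       unsigned char const *end, unsigned int *wcp)
{
  /* Fast path: P cannot be in the middle of a character, so skip the
     potentially long forward scan from MBP.  */
  if (sjis_never_trail (*p))
    return p;

  unsigned int code = 0;
  while (mbp < p)
    {
      size_t n = sjis_lead (*mbp) && end - mbp >= 2 ? 2 : 1;
      code = n == 2 ? (unsigned int) (mbp[0] << 8 | mbp[1]) : mbp[0];
      mbp += n;
    }
  if (mbp != p)
    *wcp = code;
  return mbp;
}
```

For the bytes { 'a', 0x95, 0x5c, 'b' }, a call with P at the 0x5c byte scans from the start and returns the position of 'b', setting *wcp to 0x955c; with P at a '/' byte the call returns P at once, since 0x2f can never be a second byte.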
On Wed, 17 Dec 2014 09:46:09 -0800
Paul Eggert wrote:
> Yes, and that's the point: we don't want this if-statement to be pruned
> if WCP != NULL. We want the code to return P right away in the typical
> case where P is at a character boundary. If MBP is way less than P,
> this will save the wor
On 12/17/2014 09:21 AM, Norihiro Tanaka wrote:
> If WCP != NULL, all of the following code will be pruned, although I think
> the effect on performance is negligible.
>
> if (wcp == NULL && always_character_boundary[*p])
>   return p;
Yes, and that's the point: we don't want this if-statement to be p
On Tue, 16 Dec 2014 16:06:54 -0800
Paul Eggert wrote:
> did you mean "robust in the presence of future changes"?
Yes. However, I may have made too big a deal of "portable".
> True, but I wasn't worried so much about that. I was worried about the
> case where WCP != NULL: the
Norihiro Tanaka wrote:
> However, first, it is no longer portable after
> removing it.
"portable"? This issue is independent of platform, surely. By "portable" did
you mean "robust in the presence of future changes"?
> Second, if it is compiled with GCC 4.3 or later, the function
> is inlined by and "
On Tue, 16 Dec 2014 09:12:21 -0800
Paul Eggert wrote:
>
> This part of the patch does too much work, as the caller inspects *WCP
> only when skip_remains_mb returns a value not equal to p. So there's
> no need for the "wcp == NULL &&" test in the patch. Instead, the
> documented API can change,
On 12/16/2014 04:42 AM, Norihiro Tanaka wrote:
> Thanks for the review and suggestion. If using_utf8 () is true, we can
> set always_character_boundary to true for every byte except 0x80-0xbf.
Even better, thanks.
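The two UTF-8 rules discussed here can be compared side by side. In the hypothetical sketch below (function names invented, not taken from dfa.c), utf8_never_in_multibyte follows Paul's earlier list: ASCII plus 0xc0, 0xc1 and 0xf5..0xff, none of which can appear in valid UTF-8 at all. utf8_never_trail follows Norihiro's rule: in well-formed text, any byte outside 0x80..0xbf begins a character, so a position holding it is always a character boundary. The sketch assumes 8-bit bytes.

```c
#include <stdbool.h>

/* Hypothetical predicates over single bytes, valid for UTF-8 only.  */

/* True if B can never be part of a valid multibyte UTF-8 sequence:
   ASCII bytes, plus 0xc0, 0xc1 and 0xf5..0xff, which never occur in
   valid UTF-8.  */
static bool
utf8_never_in_multibyte (unsigned char b)
{
  return b < 0x80 || b == 0xc0 || b == 0xc1 || 0xf5 <= b;
}

/* True if B can never be a continuation byte, so a position holding B
   is a character boundary.  Only 0x80..0xbf are continuation bytes.  */
static bool
utf8_never_trail (unsigned char b)
{
  return ! (0x80 <= b && b <= 0xbf);
}

/* Count the byte values accepted by PRED, assuming 8-bit bytes.  */
static int
count_bytes (bool (*pred) (unsigned char))
{
  int n = 0;
  for (int b = 0; b < 256; b++)
    n += pred ((unsigned char) b);
  return n;
}
```

Every byte accepted by the first predicate is also accepted by the second, which holds for 141 versus 192 of the 256 byte values. The difference is the lead bytes such as 0xc3: they do occur inside multibyte characters, but only at the start, which is all the boundary check needs.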
> > This won't assign anything to *WCP, contrary to the documented API for
> > skip_remains_mb. T
On Mon, 15 Dec 2014 09:43:54 -0800
Paul Eggert wrote:
> Can't we improve this when using_utf8 () is true? In that case, every
> ASCII character is always single byte. Also, the bytes 0xc0, 0xc1,
> and 0xf5 through 0xff can be added to the table: they are not
> single-byte characters but they ar
On 12/15/2014 06:59 AM, Norihiro Tanaka wrote:
> +/* True if each byte can not occur inside a multibyte character */
> +static bool always_single_byte[NOTCHAR];
> +
> +static void
> +dfaalwayssb (void)
> +{
> + size_t i;
> + unsigned char const uc[] = { '\0', '\n', '\r', '.', '/' };
> + for (i = 0; i < sizeof
On Mon, 20 Oct 2014 10:07:20 -0600
Eric Blake wrote:
> POSIX requires that NUL, slash, dot, newline, and carriage return all be
> single bytes that cannot occur inside a multibyte character (because
> they have special meaning to file name resolution and/or terminal
> interaction); it added this
arn...@skeeve.com wrote:
> Gawk does not remove CR in advance, unless someone specifically
> set RS = "\r\n", in which case the full regex matcher is used
> to first find \r\n in the raw input buffer.
Thanks, I also confirmed this in the Gawk source code.
> So for gawk, adding a check for (c == eolb
Hi.
Norihiro Tanaka wrote:
> arn...@skeeve.com wrote:
> > I would think adding a check for '\r' would be safe and would help
> > too; given that on Windows systems '\r' generally occurs just as
> > frequently as '\n', it should give a nice speedup for gawk on those
> > systems.
>
> As I recogniz
arn...@skeeve.com wrote:
> I would think adding a check for '\r' would be safe and would help
> too; given that on Windows systems '\r' generally occurs just as
> frequently as '\n', it should give a nice speedup for gawk on those
> systems.
As far as I know, DFA and regex don't support multipl
Norihiro Tanaka wrote:
> Eric Blake wrote:
> > Is it worth extending your optimization to all five of the
> > POSIX-guaranteed single byte characters?
>
> Thanks, but I'd rather not do it right away. DFA already treats
> newline as a single-byte character, but not the others yet. S
Eric Blake wrote:
> Is it worth extending your optimization to all five of the
> POSIX-guaranteed single byte characters?
Thanks, but I'd rather not do it right away. DFA already treats newline
as a single-byte character, but not the others yet. So, we may need to
make many changes
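As a rough illustration of Eric's suggestion, a hypothetical table (names invented, not grep's code) could be seeded with the five bytes that, per Eric's message, POSIX guarantees never occur inside a multibyte character; these are the same five bytes listed in the quoted patch. One way such a table could pay off is resynchronization: if the byte just before P is one of them, P must be a character boundary, so a matcher could restart there instead of rescanning from the start of the buffer.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical table: true for bytes that never occur inside a
   multibyte character (static storage, so all other entries are false).  */
static bool never_inside_mb[UCHAR_MAX + 1];

static void
init_never_inside_mb (void)
{
  static unsigned char const posix_sb[] = { '\0', '\n', '\r', '.', '/' };
  for (size_t i = 0; i < sizeof posix_sb; i++)
    never_inside_mb[posix_sb[i]] = true;
}

/* Return the nearest position at or before P that directly follows one
   of the table's bytes, or BEG (assumed to be a boundary) if there is
   none.  Either way the result is a known character boundary.  */
static unsigned char const *
resync_point (unsigned char const *beg, unsigned char const *p)
{
  while (beg < p && ! never_inside_mb[p[-1]])
    p--;
  return p;
}
```

For example, in the EUC-JP bytes { 'x', 0xa4, 0xa2, '/', 0xa4, 0xa2, 'y' } (0xa4 0xa2 encodes one two-byte character), resynchronizing from the position of 'y' stops right after the '/', without looking at anything earlier.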
On 10/20/2014 09:04 AM, Norihiro Tanaka wrote:
> This patch improves performance for input strings that don't match
> even the first part of a pattern. Although it has little effect on grep,
> since grep uses a superset of DFA, gawk speeds up by about 40%.
>
>
> When a newline is found, we can skip
Norihiro Tanaka wrote:
> $ time -p env LC_ALL=ja_JP.eucJP ./gawk '/k/ { print }' ../k
The file `k' is below.
$ yes "$(printf '%040d' 0)" | head -n 1000 >../k
This patch improves performance for input strings that don't match
even the first part of a pattern. Although it has little effect on grep,
since grep uses a superset of DFA, gawk speeds up by about 40%.
$ time -p env LC_ALL=ja_JP.eucJP ./gawk '/k/ { print }' ../k
(before)
real 2.85
user 2.79