2016-11-18 15:37:16 -0800, Paul Eggert:
[...]
> >That might have been the case a long time ago, as I remember
> >some discussion about it as it explained some wrong information
> >in the documentation, but as far as I and gdb can tell, grep
> >2.26 at least call pcre_exec for every line of the input with
> >grep -P.
> >
>
> Although that was true starting with commit
> a14685c2833f7c28a427fecfaf146e0a861d94ba (2010-03-04), it became
> false starting with commit 9fa500407137f49f6edc3c6b4ee6c7096f0190c5
> (2014-09-16).
[...]

OK, it looks like I don't have the full story, and my multiple
calls to pcre_exec() seem to point to something else:

$ seq 10 | ltrace -e '*pcre*' ./src/grep -P .
grep->pcre_maketables(0x221e2f0, 0x221e240, 1, 2) = 0x221e310
grep->pcre_compile(0x221e2f0, 2050, 0x7ffe943ec6f8, 0x7ffe943ec6f4) = 0x221e760
grep->pcre_study(0x221e760, 1, 0x7ffe943ec6f8, 0x7ffe943eb490) = 0x221e7b0
grep->pcre_fullinfo(0x221e760, 0x221e7b0, 16, 0x7ffe943ec6f4) = 0
grep->pcre_exec(0x221e760, 0x221e7b0, "", 0, 0, 128, 0x7ffe943ec700, 300) = -1
grep->pcre_exec(0x221e760, 0x221e7b0, "", 0, 0, 0, 0x7ffe943ec700, 300) = -1
grep->pcre_exec(0x221e760, 0x221e7b0, "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n", 20, 0, 8192, 0x7ffe943ec4e0, 300) = 1
1
grep->pcre_exec(0x221e760, 0x221e7b0, "2\n3\n4\n5\n6\n7\n8\n9\n10\n", 18, 0, 8192, 0x7ffe943ec4e0, 300) = 1
2
grep->pcre_exec(0x221e760, 0x221e7b0, "3\n4\n5\n6\n7\n8\n9\n10\n", 16, 0, 8192, 0x7ffe943ec4e0, 300) = 1
3
grep->pcre_exec(0x221e760, 0x221e7b0, "4\n5\n6\n7\n8\n9\n10\n", 14, 0, 8192, 0x7ffe943ec4e0, 300) = 1
4
grep->pcre_exec(0x221e760, 0x221e7b0, "5\n6\n7\n8\n9\n10\n", 12, 0, 8192, 0x7ffe943ec4e0, 300) = 1
5
grep->pcre_exec(0x221e760, 0x221e7b0, "6\n7\n8\n9\n10\n", 10, 0, 8192, 0x7ffe943ec4e0, 300) = 1
6
grep->pcre_exec(0x221e760, 0x221e7b0, "7\n8\n9\n10\n", 8, 0, 8192, 0x7ffe943ec4e0, 300) = 1
7
grep->pcre_exec(0x221e760, 0x221e7b0, "8\n9\n10\n", 6, 0, 8192, 0x7ffe943ec4e0, 300) = 1
8
grep->pcre_exec(0x221e760, 0x221e7b0, "9\n10\n", 4, 0, 8192, 0x7ffe943ec4e0, 300) = 1
9
grep->pcre_exec(0x221e760, 0x221e7b0, "10\n", 2, 0, 8192, 0x7ffe943ec4e0, 300) = 1
10
+++ exited (status 0) +++

I don't know the details of why it's done that way, but I'm not
sure I can see how calling pcre_exec that way can be quicker
than calling it on each individual line/record.
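
For illustration only, here is a rough Python sketch of the call pattern the
trace above seems to show: search the remaining buffer, print the line that
contains the match, then restart just after that line. This is my reading of
the trace, not grep's actual code. Python's re module stands in for PCRE, and
scan_buffer and its call counter are hypothetical names. It assumes a match
never spans a newline.

```python
import re

def scan_buffer(pattern, buf):
    """Return (matching lines, number of search calls) for a whole buffer.

    Hypothetical stand-in for the loop suggested by the ltrace output:
    one search per matching line, each starting after the previous hit.
    """
    rx = re.compile(pattern, re.MULTILINE)
    lines, calls, pos = [], 0, 0
    while pos < len(buf):
        calls += 1
        m = rx.search(buf, pos)
        if m is None:
            break
        # Widen the match to the line that contains it.
        start = buf.rfind("\n", 0, m.start()) + 1
        end = buf.find("\n", m.end())
        if end == -1:
            end = len(buf)
        lines.append(buf[start:end])
        pos = end + 1  # resume after the line terminator
    return lines, calls

buf = "".join(f"{i}\n" for i in range(1, 11))  # same data as `seq 10`
print(scan_buffer(".", buf))  # every line matches: one call per line
print(scan_buffer("7", buf))  # one match: far fewer calls than lines
```

In this sketch the number of searches tracks the number of matching lines
rather than the number of input lines, so it only saves calls when matches are
sparse. With a pattern like `.` that matches every line, it makes as many
calls as a per-line loop would.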

Note that this is still wrong:

$ printf 'a\nb\0' | ./src/grep -zxP a
a
b

Removing PCRE_MULTILINE (and going back to calling pcre_exec on
every record separately) would help except in the cases where
the user does:

grep -xzP '(?m)a'

You'd want to change:

static char const xprefix[] = "^(?:";
static char const xsuffix[] = ")$";

To:

static char const xprefix[] = "\\A(?:";
static char const xsuffix[] = ")\\z";

-- 
Stephane
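
The difference between the two pairs of anchors can be checked outside grep.
The sketch below is only an approximation: it uses Python's re module rather
than PCRE, so the end-of-subject anchor is spelled \Z where PCRE uses \z.
re.MULTILINE plays the role of PCRE_MULTILINE (or a user-supplied (?m)), and
x_match is a hypothetical helper, not a grep function.

```python
import re

def x_match(prefix, suffix, pattern, record):
    """Wrap pattern as -x would and test it against one record in multiline mode."""
    rx = re.compile(prefix + pattern + suffix, re.MULTILINE)
    return rx.search(record) is not None

record = "a\nb"  # the single NUL-terminated record from printf 'a\nb\0'

# Line anchors: in multiline mode, ^ and $ match around the first line only,
# so the two-line record is wrongly accepted as a whole-record match.
print(x_match("^(?:", ")$", "a", record))

# Subject anchors: unaffected by multiline mode, so the record is rejected,
# while a record that is exactly "a" is still accepted.
print(x_match(r"\A(?:", r")\Z", "a", record))
print(x_match(r"\A(?:", r")\Z", "a", "a"))
```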