wacko LC_ALL=zh_TW.utf8 egrep -i

2007-08-29 Thread jidanni
$ printf Me\\nji\\n|LC_ALL=zh_TW.utf8 egrep Me\|ji
Me
ji
$ printf Me\\nji\\n|LC_ALL=zh_TW.utf8 egrep -i Me\|ji
ji
$ printf Me\\nji\\n|LC_ALL=zh_TW.utf8 egrep -i Me
Me
$ printf Me\\nji\\n|LC_ALL=zh_TW.utf8 egrep -i me\|ji
ji

GNU grep 2.5.3


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: wacko LC_ALL=zh_TW.utf8 egrep -i

2007-08-29 Thread Eric Blake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

According to [EMAIL PROTECTED] on 8/29/2007 6:13 AM:
> $ printf Me\\nji\\n|LC_ALL=zh_TW.utf8 egrep -i me\|ji
> ji
> 
> GNU grep 2.5.3

Wrong list.  Coreutils does not provide grep.

- --
Don't work too hard, make some time for fun as well!

Eric Blake [EMAIL PROTECTED]
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFG1Wh+84KuGfSFAYARAqnTAJ48RQLzdI8r0jQ7tp2dzD6OWXWjTQCfbURC
zqOXwJeZRlhH2/gBkSk/vus=
=TNtL
-END PGP SIGNATURE-




Re: [PATCH] Command line parsing of ls with genparse

2007-08-29 Thread Michael Geng
On Tue, Aug 28, 2007 at 09:08:51PM -0600, Eric Blake wrote:
> According to Michael Geng on 8/28/2007 12:33 PM:
> > 
> > In the present version of genparse new strings are always printed
> > in new lines. For example (also from the ls command):
> > 
> > d / directory   flag    "list directory entries instead of contents,"
> >                         "  and do not dereference symbolic links"
> 
> Why not make genparse a bit smarter, and let the user supply free-form
> text as the option description.  Genparse should then wrap it to fit an
> 80-column screen before generating the resulting usage() in the .c file.
> Then the above example would simply be:
> 
> d / directory flag \
> "list directory entries instead of contents, and do not dereference
> symbolic links"
> 
> with the __GNU_GLOSSARY__(29) being the formatting hint of where the
> auto-wrapping should occur in the output English text.

I think that's a good idea. How about adding a --linebreak[=width] command 
line switch to genparse which enables breaking lines on the help screen 
automatically to the specified width or 80 columns if --linebreak is given
without argument?
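
A minimal sketch of the kind of wrapping such a switch could perform, assuming a simple greedy algorithm. The function name wrap_text and its interface are hypothetical and are not part of genparse; the sketch only shows that free-form description text can be broken to a given width without the user placing line breaks by hand.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of genparse: greedily wrap TEXT so that
   no output line exceeds WIDTH columns (a single word longer than WIDTH
   is left unbroken).  OUT must hold at least strlen (TEXT) + 1 bytes.
   Runs of spaces and newlines in TEXT collapse to one separator.  */
void
wrap_text (const char *text, size_t width, char *out)
{
  size_t col = 0;               /* characters on the current output line */
  const char *p = text;

  while (*p)
    {
      /* Skip whitespace between words.  */
      while (*p == ' ' || *p == '\n')
        p++;
      if (!*p)
        break;

      /* Measure the next word.  */
      size_t len = strcspn (p, " \n");

      if (col > 0)
        {
          if (col + 1 + len > width)
            {
              /* The word does not fit: start a new line.  */
              *out++ = '\n';
              col = 0;
            }
          else
            {
              *out++ = ' ';
              col++;
            }
        }
      memcpy (out, p, len);
      out += len;
      col += len;
      p += len;
    }
  *out = '\0';
}
```

Called with the description from the ls example and a width of 80, this would leave the text on a single line; a narrower width breaks it at word boundaries. A real implementation would also have to account for the indentation of the option column, which this sketch ignores.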

Michael




Re: [PATCH] Command line parsing of ls with genparse

2007-08-29 Thread Michael Geng
On Tue, Aug 28, 2007 at 01:21:46PM -0700, Eric Blake-1 wrote:
> 
> > 2. ls.c depends on ls-clp.h (the generated parser)
> >    ls-clp.h depends on ls.gp (the genparse file)
> >    ls.gp depends on ls.c because ls.gp is embedded as a comment in ls.c
> >    -> There is a circular dependency!
> 
> That seems wrong to me.  Isn't it really:
> 
> ls$(EXEEXT) directly depends on ls.o and ls-clp.o
> ls.o directly depends on ls.c and ls-clp.h
> ls-clp.o directly depends on ls-clp.c and ls-clp.h
> ls-clp.c directly depends on ls.gp
> ls-clp.h directly depends on ls.gp
> ls.gp directly depends on ls.c
> 
> No cycle there, even though ls.c is an indirect
> dependency of ls$(EXEEXT) through more than one
> leg of the transitive closure.
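
The claim that the list above contains no cycle can be checked mechanically. The following is a sketch only: the enum labels and function names are invented for illustration, and the edges are copied from the six dependency lines quoted above. It runs a depth-first search and reports a cycle if it meets a node that is still on the search stack.

```c
#include <stdbool.h>
#include <string.h>

/* Node labels chosen for this sketch; LS_EXE stands for ls$(EXEEXT).  */
enum { LS_EXE, LS_O, LS_CLP_O, LS_C, LS_CLP_C, LS_CLP_H, LS_GP, N_NODES };

enum color { WHITE, GREY, BLACK };  /* unvisited, on the DFS stack, done */

/* Depth-first search from NODE.  deps[t][p] is true when target t
   directly depends on p.  Returns true if a cycle is reachable.  */
static bool
visit (bool deps[N_NODES][N_NODES], int node, enum color *state)
{
  state[node] = GREY;
  for (int p = 0; p < N_NODES; p++)
    {
      if (!deps[node][p])
        continue;
      if (state[p] == GREY)     /* back edge: p is still on the stack */
        return true;
      if (state[p] == WHITE && visit (deps, p, state))
        return true;
    }
  state[node] = BLACK;
  return false;
}

/* Returns true if the dependency graph contains any cycle.  */
bool
has_cycle (bool deps[N_NODES][N_NODES])
{
  enum color state[N_NODES] = { WHITE };
  for (int n = 0; n < N_NODES; n++)
    if (state[n] == WHITE && visit (deps, n, state))
      return true;
  return false;
}

/* Fill DEPS with the direct dependencies listed in the message.  */
void
fill_ls_deps (bool deps[N_NODES][N_NODES])
{
  memset (deps, 0, sizeof (bool) * N_NODES * N_NODES);
  deps[LS_EXE][LS_O] = deps[LS_EXE][LS_CLP_O] = true;
  deps[LS_O][LS_C] = deps[LS_O][LS_CLP_H] = true;
  deps[LS_CLP_O][LS_CLP_C] = deps[LS_CLP_O][LS_CLP_H] = true;
  deps[LS_CLP_C][LS_GP] = true;
  deps[LS_CLP_H][LS_GP] = true;
  deps[LS_GP][LS_C] = true;
}
```

With these edges has_cycle returns false. Adding an edge from ls.c itself back to ls-clp.h, which is how the dependency was first described, would close the loop ls-clp.h -> ls.gp -> ls.c -> ls-clp.h; the point of the correction is that it is ls.o, not ls.c, that depends on the generated header.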

You are right. I verified this and it builds properly with the 
following modifications on src/Makefile.am:

--- coreutils-6.9.orig/src/Makefile.am  2007-03-20 08:24:27.0 +0100
+++ coreutils-6.9/src/Makefile.am   2007-08-29 21:14:29.0 +0200
@@ -48,7 +48,7 @@
 EXTRA_DIST = dcgen dircolors.hin tac-pipe.c \
   groups.sh wheel-gen.pl extract-magic c99-to-c89.diff
 BUILT_SOURCES =
-CLEANFILES = $(SCRIPTS) su
+CLEANFILES = $(SCRIPTS) su *.gp *-clp.c *-clp.h
 
 AM_CPPFLAGS = -I$(top_srcdir)/lib
 
@@ -185,14 +185,16 @@
 __SOURCES = lbracket.c
 
 cp_SOURCES = cp.c copy.c cp-hash.c
-dir_SOURCES = ls.c ls-dir.c
-vdir_SOURCES = ls.c ls-vdir.c
-ls_SOURCES = ls.c ls-ls.c
+dir_SOURCES = ls.c ls-dir.c ls-clp.c ls-clp.h
+vdir_SOURCES = ls.c ls-vdir.c ls-clp.c ls-clp.h
+ls_SOURCES = ls.c ls-ls.c ls-clp.c ls-clp.h
 chown_SOURCES = chown.c chown-core.c
 chgrp_SOURCES = chgrp.c chown-core.c
 
 mv_SOURCES = mv.c copy.c cp-hash.c remove.c
 rm_SOURCES = rm.c remove.c
+tail_SOURCES = tail.c tail-clp.c
+wc_SOURCES = wc.c wc-clp.c
 
 md5sum_SOURCES = md5sum.c
 md5sum_CPPFLAGS = -DHASH_ALGO_MD5=1 $(AM_CPPFLAGS)
@@ -363,3 +365,14 @@
| grep -Ev -f $$t &&\
  { echo 'the above variables should have static scope' 1>&2;   \
exit 1; } || :
+
+ls.$(OBJEXT): ls-clp.c ls-clp.h
+tail.$(OBJEXT): tail-clp.c tail-clp.h
+wc.$(OBJEXT): wc-clp.c wc-clp.h
+
+%-clp.c %-clp.h: %.gp
+   genparse --longmembers --internationalize -o $(*F)-clp $<
+
+%.gp: %.c
+   sed -n -e '/genparse file starts here/,/genparse file ends here/p' < $(*F).c | \
+   sed -e '/genparse file ends here/d' -n -e '2,$$p' > $@

Michael




Re: [PATCH] Command line parsing of ls with genparse

2007-08-29 Thread Michael Geng
On Tue, Aug 28, 2007 at 01:21:46PM -0700, Eric Blake-1 wrote:
> > +++ coreutils-6.9/src/ls.c  2007-08-26 19:58:20.0 +0200
> > @@ -76,7 +76,6 @@
> >  # define SA_RESTART 0
> >  #endif
> >  
> > -#include "system.h"
> >  #include 
> 
> Why are you deleting this include?  Without it, how do you ensure
> that <config.h> is pulled in before anything else?  If you intend for
> ls-clp.h to fill this role, then it must be included before any
> system files.  Also, are you sure you are not falling foul of
> any 'make distcheck' rules in Makefile.maint?

I need the following definitions in ls-clp.c:

1. the i18n macro _()
2. the definition of PACKAGE_BUGREPORT
3. the definition of true and false

I got everything by including system.h in ls-clp.c. Unfortunately
I then had to exclude it from ls.c because there were duplicate
definitions. I introduced this when I wrote the patch for the wc command,
and at that time I was happy to get it compiled, that's all.

I'm sure there is a better solution. Maybe I have to include other
files. I agree that this has to be fixed.

> > +  Cmdline(&cmdline, argc, argv);
> 
> GNU coding standards want a space between the function
> name and open (.

Right, thanks.

> > +/* Extract the following section and process it with genparse
> > +   (see http://genparse.sourceforge.net) in order to generate a parser
> > +   for the command line arguments and a usage function for printing a help
> > +   screen. */
> > +
> > +/* genparse file starts here
> > +#include 
> > +#include "system.h"
> > +#include "ls.h"
> > +
> > +#exit_value LS_FAILURE
> 
> I know the C standard requires this, but in practice, are all
> C preprocessors tolerant of comments that contain lines
> that look like preprocessing directives but which are not?

That's potentially another drawback of embedding the genparse files in the
C sources.
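
For reference, this is what the C standard guarantees: comments are replaced before preprocessing directives are recognized, so directive-like lines inside a comment are inert. The sketch below uses an invented stand-in for a genparse section (the header name and the function name are hypothetical). It says nothing about older, non-conforming preprocessors, which is the practical concern raised above.

```c
/* genparse file starts here (illustrative stand-in, not real genparse input)
#include "no-such-header.h"
#exit_value LS_FAILURE
#error a conforming preprocessor never sees this line as a directive
genparse file ends here */

/* If this translation unit compiles at all, the directive-like lines in
   the comment above were ignored.  Always returns 1.  */
int
comment_directives_ignored (void)
{
  return 1;
}
```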

> > +NONE / help    flag    "display this help and exit"
> > +NONE / version flag    "output version information and exit"
> 
> It looks like one drawback of using genparse is that you lose
> the system.h magic that ensures consistency between all
> the apps with --help and --version, since you can't really
> use the preprocessor macros *_HELP_OPTION_* here.

I could imagine that this can be solved by adding the capability 
to include parameter definitions in a genparse file, i.e. include 
genparse files in other genparse files. There could be a shared 
genparse file with the parameter definitions for help and version 
which could be included by all other genparse files.

> > +Report bugs to <__STRING__(PACKAGE_BUGREPORT)>.
> 
> What happened to the TRANSLATOR comment that reminds
> them to add a second line, including the address to report
> translation bugs to?  Also, it isn't very obvious how this
> will affect xgettext extraction of strings that need
> translation.  Are you sure you haven't broken things
> for other locales?  Would the generated ls-clp.c need
> to be added to POTFILES.in, or is your intent still to
> have all translatable strings reside in ls.c?

If I understood the i18n mechanism right then the C preprocessor
is needed for the _() macros to take effect. So the genparse files
can't be translated directly, even if they are embedded in C files
because they are still inside of a comment. So I think ls-clp.c
would have to be added to POTFILES.in. 

I haven't investigated how a genparse based solution affects
i18n and I generally have very little experience with i18n. I 
would expect that problems are caused by different partitioning 
of text. In the present version of ls the usage() function calls 
fputs() several times. The genparse version prints everything in 
1 single call to printf(). So the usage() text in the present ls.c
is split into multiple _() macros, whereas ls-clp.c uses 1 single _() 
macro for the whole help screen. Do you agree that this is the main
source of trouble? Do you see other problems? I haven't fully 
thought this through but I think I could change genparse such that 
the user can control when a new print command should start thus 
giving control of partitioning translatable text to the user.

Michael




Re: [PATCH] Command line parsing of ls with genparse

2007-08-29 Thread Jim Meyering
[EMAIL PROTECTED] (Michael Geng) wrote:
...
> of text. In the present version of ls the usage() function calls
> fputs() several times. The genparse version prints everything in
> 1 single call to printf(). So the usage() text in the present ls.c
> is split into multiple _() macros, whereas ls-clp.c uses 1 single _()
> macro for the whole help screen.

Consider separating it into strings no longer than 509 bytes each
and printing them separately.  That's a portability limitation imposed
by some c89 compilers.  (see gcc's -Woverlength-strings)
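
A sketch of what this could look like in generated code, under stated assumptions: the chunk layout and the names usage_chunks and print_usage_chunks are invented for illustration, and N_() and _() are local stand-ins for the usual gettext marker and lookup macros so that the example compiles on its own. Each chunk is printed with its own fputs call, and the function counts any chunk that exceeds the 509-byte limit.

```c
#include <stdio.h>
#include <string.h>

/* Stand-ins for the gettext conventions, so the sketch is self-contained:
   N_() only marks a string, _() would look up the translation.  */
#define N_(msgid) msgid
#define _(msgid) (msgid)

/* Minimum string literal length a C89 compiler must accept.  */
enum { C89_STRING_LIMIT = 509 };

/* Help text split into separately translatable chunks (layout is
   illustrative only).  */
static const char *const usage_chunks[] = {
  N_("  -d, --directory   list directory entries instead of contents,\n"
     "                    and do not dereference symbolic links\n"),
  N_("      --help        display this help and exit\n"),
  N_("      --version     output version information and exit\n"),
};

/* Print every chunk with its own fputs call.  Returns the number of
   chunks longer than C89_STRING_LIMIT, so 0 means all are portable.  */
int
print_usage_chunks (FILE *out)
{
  int too_long = 0;
  size_t n = sizeof usage_chunks / sizeof usage_chunks[0];

  for (size_t i = 0; i < n; i++)
    {
      if (strlen (usage_chunks[i]) > C89_STRING_LIMIT)
        too_long++;
      fputs (_(usage_chunks[i]), out);
    }
  return too_long;
}
```

A run-time count is used here only to keep the sketch short; in a real build, gcc's -Woverlength-strings reports the same problem at compile time.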

If you were to run "make distcheck", this and some other problems
would be exposed.  For example, you added at least one function
that was not declared static.

With your changes does "make check" still pass?




Re: Minutes of the August 28th 2007 teleconference

2007-08-29 Thread Eric Blake

According to Andrew Josey on 8/29/2007 12:16 AM:
> Austin Group Minutes of the 28th August 2007 Teleconference Austin-379
> XCU ERN 165 mv Accept
> 
> Send down the interpretations track.
> The standard is clear, the standard is wrong , concerns
> are being forwarded to the sponsor.
> 
> XCU ERN 166 mv Accept as marked below
> 
> Send down the interpretations track.
> The standard is ambiguous, no conformance distinctions can be 
> made about different implementations , concerns
> are being forwarded to the sponsor.

We should consider editing both of these interpretations to also apply to
ln.  The coreutils list noted, just this month, that
 'ln a/f b/f c && rm -Rf a b'
risks losing user data; and 'ln a /' should not attempt to create '//a'.

- --
Don't work too hard, make some time for fun as well!

Eric Blake [EMAIL PROTECTED]




Re: Minutes of the August 28th 2007 teleconference

2007-08-29 Thread Eric Blake

According to Eric Blake on 8/29/2007 5:47 PM:
> We should consider editing both of these interpretations to also apply to
> ln.  The coreutils list noted, just this month, that
>  'ln a/f b/f c && rm -Rf a b'
> risks losing user data;

That example should have read:
 'ln -f a/f b/f c && rm -Rf a b'

- --
Don't work too hard, make some time for fun as well!

Eric Blake [EMAIL PROTECTED]




Re: [PATCH] Command line parsing of ls with genparse

2007-08-29 Thread Eric Blake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

According to Michael Geng on 8/29/2007 2:29 PM:
> On Tue, Aug 28, 2007 at 01:21:46PM -0700, Eric Blake-1 wrote:
>>> +++ coreutils-6.9/src/ls.c  2007-08-26 19:58:20.0 +0200
>>> @@ -76,7 +76,6 @@
>>>  # define SA_RESTART 0
>>>  #endif
>>>  
>>> -#include "system.h"
>>>  #include 
>> Why are you deleting this include?  Without it, how do you ensure
>> that <config.h> is pulled in before anything else?  If you intend for
>> ls-clp.h to fill this role, then it must be included before any
>> system files.  Also, are you sure you are not falling foul of
>> any 'make distcheck' rules in Makefile.maint?
> 
> I need the following definitions in ls-clp.c:

My complaint was not that you moved #include "system.h" to ls-clp.h (via
the genparse chunk), but that you forgot to put #include "ls-clp.h" first,
prior to .  Remember, in gnulib-based projects, <config.h>
absolutely has to be included prior to any system headers, because we
provide replacement system headers (such as a replacement ), and
our replacements sometimes depend on the contents of <config.h> (although
we are trying to fix those cases where we can).

> 
> 1. the i18n macro _()
> 2. the definition of PACKAGE_BUGREPORT
> 3. the definition of true and false

Shouldn't true and false just come from C99?  Or even the gnulib
<stdbool.h> module?  Here's a case where providing your own definition in
ls-clp.h is liable to break if you first include "system.h" (which picks
up the C99 or gnulib <stdbool.h>).

>> What happened to the TRANSLATOR comment that reminds
>> them to add a second line, including the address to report
>> translation bugs to?  Also, it isn't very obvious how this
>> will affect xgettext extraction of strings that need
>> translation.  Are you sure you haven't broken things
>> for other locales?  Would the generated ls-clp.c need
>> to be added to POTFILES.in, or is your intent still to
>> have all translatable strings reside in ls.c?
> 
> If I understood the i18n mechanism right then the C preprocessor
> is needed for the _() macros to take effect.

Not quite - xgettext is not a C preprocessor, rather it is a regular
expression matcher.  It recognizes C comments, and normally will not
extract any language comments inside them, but on the other hand, the
gettext manual recommends teaching xgettext the patterns to look for so
that translations can be grabbed from original source files rather than
from generated byproducts (ie. POTFILES.in would list getdate.y, not the
bison-generated getdate.c, if getdate has translatable strings).  Really,
the role of the C preprocessor here is to make typing gettext () shorter,
as in _(); provided that xgettext is told that _ marks a translatable string.
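
To make the division of labour concrete, here is a minimal sketch; the function name help_line is invented for illustration. The macro is all the preprocessor contributes. Extraction is a separate step in which xgettext scans the source text, and it has to be told about the shorthand, for example with its --keyword=_ option.

```c
#include <libintl.h>

/* The conventional shorthand: the preprocessor rewrites _("...") into
   gettext ("...").  xgettext never runs the preprocessor; it finds the
   string by matching the _ keyword in the source text.  */
#define _(msgid) gettext (msgid)

/* Returns the translation of the help string for the current locale,
   or the original English string when no message catalog is bound.  */
const char *
help_line (void)
{
  return _("display this help and exit");
}
```

This sketch assumes a system where gettext is available from libintl.h without extra link flags, as on glibc; elsewhere it may need to be linked against libintl.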

> So the genparse files
> can't be translated directly, even if they are embedded in C files
> because they are still inside of a comment. So I think ls-clp.c
> would have to be added to POTFILES.in. 

I think that is true, unless you can teach xgettext to look inside comments.

> 
> I haven't investigated how a genparse based solution affects
> i18n and I generally have very little experience with i18n. I 
> would expect that problems are caused by different partitioning 
> of text. In the present version of ls the usage() function calls 
> fputs() several times.

Not only because of the 509-character string literal limit that Jim
mentioned, but also because gettext recommends providing no more than
about 6 or 7 lines of text to the translator at a time.  The more lines
there are to translate in one go, the harder it is for a translator to
spot the minor change embedded in those lines when all you do is edit one
word in the string.  The gettext manual talks more about this.

> The genparse version prints everything in 
> 1 single call to printf(). So the usage() text in the present ls.c
> is split into multiple _() macros, whereas ls-clp.c uses 1 single _() 
> macro for the whole help screen. Do you agree that this is the main
> source of trouble?

Yes, that's definitely part of the problem.  The other part is that the
_() macro only works if xgettext was able to extract the string to begin with.

> Do you see other problems? I haven't fully 
> thought this through but I think I could change genparse such that 
> the user can control when a new print command should start thus 
> giving control of partitioning translatable text to the user.

Sounds like it would definitely be needed before coreutils could consider
switching to genparse.

By the way, thanks for your efforts in trying to improve all of this.
Even if Jim doesn't accept your code, it is making genparse better, and it
is finding areas in coreutils that could use improvement regardless of how
option-parsing code is generated.

- --
Don't work too hard, make some time for fun as well!

Eric Blake [EMAIL PROTECTED]
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.m