At first I was somewhat confused about the current behaviour, given this
thread, the documentation, and my experience with the file-matching
algorithms used by other programs such as the various shells.

Judging from your example below I believe you may be confused about the
current behaviour as well.

The current code allows "*" to match any character INCLUDING the "/".

This means, given the pattern "*kern", and using WildFile, the following
files would match:

/xxx/kern
/xxx/abc_kern
/xxx/yyy/kern

But the following would not match unless an asterisk were added to the end
of the pattern, or WildDir or Wild were used instead:

/xxx/kern/yyy
/xxx/abc_kern/yyy

If you wanted "*kern" to match either as part of the path or as a file, you
would have to use Wild rather than WildFile.  A directory is then matched
when it is processed as the tree is descended.

Perhaps most worrisome of all is that the following would match the pattern
"/test*.dat":

/test/archive/important/file1.dat

Or, given the pattern "/xxx/yyy/*.c", both of the following would match:

/xxx/yyy/abc.c
/xxx/yyy/backup_copy/abc.c

This is how the code works today.  I think these last two cases are
counter-intuitive and likely to cause someone problems at some point.  Off
the top of my head, I can't think of a case where it would be desirable
behaviour.
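The cases above can be reproduced with a minimal sketch.  It assumes that the
C library's fnmatch(3), called with no flags, treats "/" the same way the
current matching code does for these patterns; the helper name
matches_like_today() is made up for the illustration and is not part of
Bacula.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: match with no flags, so "*" is free to cross "/".
 * Assumption: this mirrors what the current WildFile matching does. */
bool matches_like_today(const char *pattern, const char *path)
{
    assert(pattern != NULL && path != NULL);
    return fnmatch(pattern, path, 0) == 0;
}
```

Under that assumption, matches_like_today("/test*.dat",
"/test/archive/important/file1.dat") is true, because nothing stops the "*"
from swallowing "/archive/important/file1".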

With this understanding of the current behaviour, I believe it is important
to make FNM_FILE_NAME (the GNU name for POSIX's FNM_PATHNAME) work properly
and to set it.  This would just require adding one missing check for a "/"
in fnmatch() when FNM_FILE_NAME is set.  As it stands, setting the flag makes
little difference in the current code, and it certainly doesn't behave the
way GNU or POSIX documents it.

However, fixing this "bug" would mean that the pattern "*.tmp" would no
longer match anything, since absolute paths are always supplied and "*"
could no longer cross a "/".  Hence my original suggestion.  While you could
add a new keyword for matching against just the base name, you would either
have to leave the above behaviour unfixed or break existing configurations.
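For comparison, here is a rough sketch of the documented behaviour, using the
C library's fnmatch(3) with the POSIX flag FNM_PATHNAME rather than Bacula's
own fnmatch().  The helper name matches_with_pathname() is hypothetical.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: match with FNM_PATHNAME, so "*", "?" and bracket
 * expressions never match a "/" in the path. */
bool matches_with_pathname(const char *pattern, const char *path)
{
    assert(pattern != NULL && path != NULL);
    return fnmatch(pattern, path, FNM_PATHNAME) == 0;
}
```

With the flag honoured, "/xxx/yyy/*.c" still matches "/xxx/yyy/abc.c" but no
longer matches "/xxx/yyy/backup_copy/abc.c".  The cost is that "*.tmp" stops
matching an absolute path such as "/xxx/yyy/scratch.tmp", although it would
still match the bare name "scratch.tmp".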

As far as the code example is concerned, I wouldn't implement it that way in
production code either.  I just included it to illustrate what I was
suggesting using the minimum changes required.  In production code I would
scan the filename and the patterns only once.
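One way the single-scan idea might look is sketched below.  This is not
Bacula code: struct wild_pattern, wild_pattern_init(), base_name() and
any_pattern_matches() are all hypothetical names, and the sketch assumes each
pattern can be examined once when the configuration is loaded.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pre-processed pattern: has_slash is computed once, when the
 * pattern is loaded, instead of once per file. */
struct wild_pattern {
    const char *text;
    bool has_slash;
};

struct wild_pattern wild_pattern_init(const char *text)
{
    assert(text != NULL);
    struct wild_pattern p = { text, strchr(text, '/') != NULL };
    return p;
}

/* Return the final path component, found with a single strrchr(). */
const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Patterns containing "/" see the full path; the others see the basename. */
bool any_pattern_matches(const struct wild_pattern *pats, size_t n,
                         const char *path, int flags)
{
    const char *base = base_name(path);   /* scanned once per file */
    for (size_t i = 0; i < n; i++) {
        const char *subject = pats[i].has_slash ? path : base;
        if (fnmatch(pats[i].text, subject, flags) == 0)
            return true;
    }
    return false;
}
```

The per-file cost is then one strrchr() plus the fnmatch() calls themselves,
rather than a strchr() and a strrchr() for every pattern on every file.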

While you could use regex, the patterns would be much more complicated and
harder to understand for the average user, who is familiar with the shell.
There is a reason why programs that deal with just filenames use the glob(7)
and fnmatch(3) patterns.

I think the reason that only one user has mentioned it in the last 5 years
comes down to a couple of factors:

        the code mostly works as expected, and

        I believe that Wild patterns are primarily used for exclusions, and
        most users don't examine which files are being excluded until the
        files aren't there when they try to restore.

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Kern Sibbald
Sent: Sunday, April 16, 2006 10:19 AM
To: Robert Nelson
Cc: 'Martin Simmons'; [EMAIL PROTECTED];
bacula-users@lists.sourceforge.net
Subject: Re: [Bacula-devel] Surprise bug + Scratch pool algorithm

On Sunday 16 April 2006 11:52, Robert Nelson wrote:
> But I think the behaviour would be very intuitive.  If you look at how 
> it is used below in the example from the Windows FileSet I think it is 
> fairly obvious.  I also think it is much clearer and easier to 
> maintain than the corresponding regex would be.  The code change was 
> minimal and didn't require any modification of the buffers themselves.  
> Here is the diff for
> WildFile:

Well, I am sorry, but I don't agree with you on the point of it being very
intuitive.  With your change, I would no longer be able to do something
based on a part of the path -- for example, suppose I want to compress all
files where any part of the path has kern in the name.  I can do it with

   WildFile = "*kern"

With your suggestion, that would only match the filename part and would
never match against something in the path.

In addition, if I did implement it, I wouldn't do it as in the code below
because that code has a huge performance penalty especially if you are
dealing with 3 million files.  I leave it to you to work out why.

After a good deal of thought, IMO the correct way to solve the problem is
with regex, or possibly if it is really necessary with another directive
that explicitly lets the user match against only the filename part rather
than the full path, but I don't think a new directive will really be
necessary since no one has asked for it in the 5 years it has been
programmed.

>
>        } else  {
>           for (k=0; k<fo->wildfile.size(); k++) {
> -            if (fnmatch((char *)fo->wildfile.get(k), ff->fname, fnmode|ic) == 0) {
> +            const char *pattern = (const char *)fo->wildfile.get(k);
> +            const char *fname;
> +
> +            if (strchr(pattern, '/') != NULL || (fname = strrchr(ff->fname, '/')) == NULL)
> +               fname = ff->fname;
> +            else
> +               fname++;
> +
> +            if (fnmatch(pattern, fname, fnmode|ic) == 0) {
>                 if (ff->flags & FO_EXCLUDE) {
>
> -----Original Message-----
> From: Kern Sibbald [mailto:[EMAIL PROTECTED]
> Sent: Sunday, April 16, 2006 1:18 AM
> To: Robert Nelson
> Cc: 'Martin Simmons'; [EMAIL PROTECTED];
> bacula-users@lists.sourceforge.net
> Subject: Re: [Bacula-devel] Surprise bug + Scratch pool algorithm
>
> On Sunday 16 April 2006 00:21, Robert Nelson wrote:
> > Couldn't you handle both cases transparently.  If the pattern has a 
> > "/" in it then pass the full name, otherwise just pass the basename 
> > to fnmatch().
> > That way you get both behaviours without breaking existing examples 
> > and configs.
> >
> > Ironically the Windows example FileSet in the manual expects the 
> > above behaviour since it has both
> >
> >     WildFile = "[A-Z]:/WINNT/system32/dhcp/tmp.edb"
> > And
> >     WildFile = "*.tmp"
>
> That is an interesting idea, but probably not something I would do, 
> because it makes matching more complicated by altering the input data 
> (filenames) depending on the pattern.
>
> Tar has a similar feature, and I doubt that many on this list know 
> about it or that anyone on this list can explain exactly how it works.
>
> Since wild-cards are terribly incomplete, the solution to the 
> limitations users will have with wild-cards is to use Bacula's regular 
> expressions, which are now implemented (experimentally) in Win32 in 
> version 1.38.8.  The only problem with the Win32 regex is that it is 
> untested and it does not have an "ignore case", which I will probably add
> in a future version.
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Kern 
> > Sibbald
> > Sent: Monday, April 10, 2006 5:09 AM
> > To: Martin Simmons
> > Cc: [EMAIL PROTECTED];
> > bacula-users@lists.sourceforge.net
> > Subject: Re: [Bacula-devel] Surprise bug + Scratch pool algorithm
> >
> > On Monday 10 April 2006 13:15, Martin Simmons wrote:
> > > >>>>> On Mon, 10 Apr 2006 12:22:59 +0200, Kern Sibbald 
> > > >>>>> <[EMAIL PROTECTED]>
> > > >>>>> said:
> > > >
> > > > Hello,
> > > >
> > > > It seems that it is becoming more frequent (probably because of 
> > > > the increasing number of Bacula users) that users submit support 
> > > > questions to the bugs database.  This morning a user submitted a 
> > > > bug stating that the WildFile option was broken. Normally, I 
> > > > would have dismissed this as a support problem because most of 
> > > > us realize that wild-cards and regexes are awfully tricky.
> > > >
> > > > However, this user presented a *really* simple case with debug 
> > > > output, so I took a look at it, and surprise both WildFile and 
> > > > RegexFile are broken because they match against the full path 
> > > > and filename rather than just the filename.
> > > >
> > > > I wonder how many users have torn out their hair trying to 
> > > > figure out why WildFile or RegexFile didn't work :-(
> > >
> > > Are you really sure that is a bug?  I think the word "filename" in 
> > > the documentation is ambiguous, but when it says "No directories 
> > > will be matched by this directive" it does not mean that the 
> > > matching is performed only on the basename part.
> > >
> > > The examples in "A Windows Example FileSet" are also written to 
> > > assume that WildFile compares the whole name.
> > >
> > > The current behaviour is very useful because it allows files in 
> > > selected directories to be matched, without accidentally matching 
> > > subdirectories (as Wild will do).
> >
> > After a little more thought about this, I'm not so sure I should 
> > change the behavior. It is not what I had originally intended (I 
> > didn't program it), but to change it now, given all the examples in 
> > the doc would create a number of problems.
> >
> > I think the best solution is to ensure that the documentation is 
> > extremely clear, then if there is really a demand, implement a new 
> > option such as WildFilename that matches against only the filename
> > (basename).
> >
> > --
> > Best regards,
> >
> > Kern
> >
> >   (">
> >   /\
> >   V_V
> >
> >
> > -------------------------------------------------------
> > This SF.Net email is sponsored by xPML, a groundbreaking scripting 
> > language that extends applications into web and mobile media. Attend 
> > the live webcast and join the prime developer group breaking into 
> > this new coding territory!
> > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
> > _______________________________________________
> > Bacula-devel mailing list
> > [EMAIL PROTECTED]
> > https://lists.sourceforge.net/lists/listinfo/bacula-devel
>
> --
> Best regards,
>
> Kern
>
>   (">
>   /\
>   V_V

--
Best regards,

Kern

  (">
  /\
  V_V


-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
Bacula-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/bacula-devel





_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
