> On 2019/05/17, at 6:17, Matthew DeVore <matv...@comcast.net> wrote:
>
>
>
>> On May 16, 2019, at 8:25 PM, Junio C Hamano <gits...@pobox.com> wrote:
>>>
>>> $ git rev-list --filter=tree:2 --filter:blob:limit=32k
>>
>> Shouldn't the second one say "--filter=blob:limit=32k" (i.e. the
>> first colon should be an equal sign)?
>
> That's right. Fixed locally.
>
>>
>>> Such usage is currently an error, so giving it a meaning is backwards-
>>> compatible.
>>
>> Two minor comments.
>>
>> If combine means "must satisfy all of these", '+' is probably a poor
>> choice (perhaps we want '&' instead). Also, it seems to me that
>
> I think I agree. & is more intuitive.
After I tried this in code, I noticed two problems with & which make
me prefer + again:
a. the "&" char must be quoted or escaped in the shell, even if it is
hugged by alphanumeric characters on either side:
$ echo a&b
[1] 17083
a
-bash: b: command not found
[1]+ Done echo a
$
b. visually speaking, "&" doesn't stand out very well unless it's
surrounded by whitespace, and currently it must *not* be surrounded
by whitespace:
--filter=combine:blob:none&tree:3&sparse:../foo
vs.
--filter=combine:blob:none+tree:3+sparse:../foo
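
For illustration, here is a minimal sketch of how a spec like the one
above could be split into its sub-filters. It is Python rather than the
C in the patch, the name split_combined_filter is hypothetical, and it
deliberately ignores the URL-decoding step discussed further down:

```python
def split_combined_filter(spec, separator="+"):
    """Split 'combine:<f1>+<f2>+...' into its sub-filter strings.

    Hypothetical helper, not the code in the patch. A spec without the
    'combine:' prefix is treated as a single filter.
    """
    prefix = "combine:"
    if not spec.startswith(prefix):
        return [spec]
    parts = spec[len(prefix):].split(separator)
    # Reject things like 'combine:blob:none++tree:3' or a trailing '+'.
    if any(part == "" for part in parts):
        raise ValueError("empty sub-filter in %r" % spec)
    return parts


print(split_combined_filter("combine:blob:none+tree:3+sparse:../foo"))
```

The separator is a parameter only so the same sketch works for either
"+" or "&".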
>
>> having to worry about url encoding and parsing encoded data
>> correctly and securely would be far more work than simply taking
>> multiple command line parameters, accumulating them in a string
>> list, and then at the end of command line parsing, building a
>> combined filter out of all of them at once (a degenerate case may
>> end up attempting to build a combined filter that combines a single
>> filter), iow just biting the bullet and do the "potentially be
>> improved" step from the beginning.
>
> My intention actually is to support the repeated flag pretty soon, but I only
> want to write the code if there's agreement on my current approach.
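
As a rough sketch of the repeated-flag approach described above
(accumulate every --filter value in a list, then build one combined
filter after parsing): this is Python with argparse rather than Git's
own option parser, build_filter_spec is a hypothetical name, and it
assumes the sub-filters contain no characters that would need encoding.

```python
import argparse


def build_filter_spec(argv):
    """Collect repeated --filter flags and join them into one spec.

    Hypothetical sketch; assumes no sub-filter contains '+' or '%'.
    """
    parser = argparse.ArgumentParser()
    # action="append" accumulates each occurrence into a list.
    parser.add_argument("--filter", action="append", default=[], dest="filters")
    args = parser.parse_args(argv)
    if not args.filters:
        return None
    if len(args.filters) == 1:
        # Degenerate case: a single filter is passed through unchanged
        # here instead of being wrapped in a one-element combine.
        return args.filters[0]
    return "combine:" + "+".join(args.filters)


print(build_filter_spec(["--filter=tree:2", "--filter=blob:limit=32k"]))
```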
>
> My justification for the URL-encoding scheme is:
>
> 1. The combined filters will eventually have to travel over the wire.
>
> 2. The Git protocol will either have repeated "filter" lines or it will
> continue to use a single filter line with an encoding scheme.
>
> 3. Continuing to use a single filter line seemed the least disruptive
> considering both this codebase and Git clones like JGit. Other clones will
> likely fail saying "unknown filter combine:" or something like that until it
> gets implemented. A paranoid consideration is that clones and proprietary
> server implementations may currently allow the "filter" line to be silently
> overridden if it is repeated.
>
> 4. Assuming we *do* use a single filter line over the wire, it makes sense to
> allow the user to specify the raw filter line as well as have the more
> friendly UI of repeating --filter flags.
>
> 5. If we use repeated "filter" lines over the wire, and later start
> implementing a more complete DSL for specifying filters (see Mercurial's
> "revsets") the repeated-filter-line feature in the protocol may end up
> becoming deprecated and we will end up back-pedaling to allow integration of
> the "&" operator with whatever new operators we need.
>
> (I very much doubt I will be the one implementing such a DSL for filters or
> revsets, but I think it's a possibility)
>
>> So why are we allowing %3A there that does not even have to be
>> encoded? Shouldn't it be an error?
>
> We do have to require the combine operator (& or +) and % be encoded. For
> other characters, there are four options:
>
> 1. Allow anything to be encoded. I chose this because it's how I usually
> think of URL encoding working. For instance, if I go to
> https://public-inbox.org/git/?q=cod%65+coverage in Chrome, the browser
> automatically decodes the %65 to an e in the address bar. Safari does not
> automatically decode, but the server apparently interprets the %65 as an e. I
> am not really attached to this choice.
>
> 2. Do not allow or require anything else to be encoded.
>
> 3. Require encoding of a couple of "reserved" characters that don't appear in
> filters now, and don't typically appear in UNIX path names. This would allow
> for expansion later. For instance, "~&%*+|(){}!\" plus the ASCII range [0,
> 0x20] and single and double quotes - do not allow encoding of anything else.
>
> 4. Same requirements as 3, but permit encoding of other arbitrary characters.
>
> I kind of like 3 now that I've thought it out more.
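
To make option 3 concrete, here is a minimal sketch of an encoder and a
strict decoder. It is Python, not the C in the patch; the names
RESERVED, encode_subfilter and decode_subfilter are hypothetical, and
the reserved set is just the one listed in option 3 above. The decoder
rejects escapes of non-reserved characters, so something like %3A for
":" is an error.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"

# Reserved set from option 3: ~&%*+|(){}!\ , both quote characters,
# and the ASCII range [0, 0x20].
RESERVED = set("~&%*+|(){}!\\'\"") | {chr(c) for c in range(0x21)}


def encode_subfilter(sub):
    """Percent-encode only the reserved characters."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in RESERVED else ch for ch in sub
    )


def decode_subfilter(encoded):
    """Decode, rejecting raw reserved characters and needless escapes."""
    out = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if ch == "%":
            digits = encoded[i + 1:i + 3]
            if len(digits) != 2 or any(d not in HEX_DIGITS for d in digits):
                raise ValueError("malformed escape at offset %d" % i)
            decoded = chr(int(digits, 16))
            if decoded not in RESERVED:
                # Option 3: encoding anything else is not allowed.
                raise ValueError("unnecessary escape %%%s" % digits)
            out.append(decoded)
            i += 3
        elif ch in RESERVED:
            raise ValueError("unescaped reserved character %r" % ch)
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(encode_subfilter("sparse:a+b c"))
```

Option 4 would be the same sketch with the "unnecessary escape" check
removed.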
>
>>
>> In any case, I am not quite convinced that we need to complicate the
>> parameters with URLencoding, so I'd skip reviewing the large part of
>> this patch that is about "decoding".
>
> It's fine if we drop the encoding scheme. I intentionally tried to limit the
> amount of work I stacked on top of it until I got agreement. Please let me
> know if anything I've said changes your perspective.
>
>>
>> Once the combined filter definition is built in-core, the code that
>> evaluates the intersection of all conditions seems to be written
>> sanely to me.
>
> Great! I actually did simplify it a bit since I sent the first roll-up.
>
> Thanks.
>