Hi,

Sorry for not getting to this earlier.

I had initially said on IRC that I found the newsyslog changes great, but
now, having read Mike's arguments and proposals, I have serious doubts about
the current approach.

> Sorry not to have noticed this in the review; it was only when I saw this
> message that it sunk in that we now have *three* ways to specify compression,
> and I'm not even sure what the precedence is.  I would have thought that
> <compress> would replace -c.  It's a mess if the config file has entries
> that specify J and X flags as well as none, the config file has
> <compress> zstd, and the -c option is given as well.  We now have a knob
> to override the knob to override a knob. The only reason to keep -c that
> I can think of is to specify a different compression in a single invocation,
> but as noted, changing compression requires manual operations that make
> it unreasonable to change it invocation by invocation.

I agree.  Two possibilities that I can think of from here: Remove '-c', or make
it enable compression regardless of the log files' individual settings.
 
> I still think it would be much better to add an option letter to select
> the default compression as specified by <compress>.  This would eliminate
> the need for "legacy", and it would add the ability to have both a global
> default and an exception.  I think the redefinition of the existing flags
> to have different meanings if <compress> is given is messy.

I didn't think about that at first.  I agree.

If people want to be able to override compression settings globally, which I
find useful, one could introduce another directive such as <compress_override>
taking a boolean that requests applying the <compress> setting regardless of
the individual compression letters.

Another possibility is just to rename "<compress>" to "<compress_override>" 
(so, this time, not a boolean) and keep its current behavior.  This would match 
one of the suggestions above about '-c', but then there's the question of which 
one takes precedence, and I think that the command-line specification should 
prevail (for practical purposes and POLA).
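
To make the precedence I have in mind concrete, here is a minimal Python
sketch.  It assumes the rename to <compress_override> and that '-c' also acts
as an override.  The function and parameter names are purely illustrative and
do not correspond to anything in newsyslog's source.

```python
from typing import Optional


def effective_compression(entry_method: str,
                          config_override: Optional[str] = None,
                          cli_override: Optional[str] = None) -> str:
    """Return the compression method to use for one log file.

    entry_method    -- method implied by the entry's own flag letter
                       (hypothetical mapping, e.g. 'J' -> "bzip2")
    config_override -- value of the proposed <compress_override>, if any
    cli_override    -- value passed to '-c' on the command line, if any
    """
    # The command line wins over everything else.
    if cli_override is not None:
        return cli_override
    # Then the configuration-wide override, if present.
    if config_override is not None:
        return config_override
    # Otherwise the entry's own flag decides.
    return entry_method
```

With this ordering, an administrator reading the command line never has to
open the configuration files to know which method will be applied.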
 
> The entry for -c says that we plan to change the default to "none" in 15.0.
> Hopefully that would be done via <compress> and not -c.  However, there
> was significant pushback on "none" being the default.

I think the default should be "no <compress_override>", i.e., no directive.
This may argue for having "none" mean "don't change anything" (as if the
directive wasn't there) and having something else to deactivate compression,
such as "no_compression" (which is really an override).  If "none" is
confusing, then just forgo it completely, and have 'newsyslog' simply fail on
it (but keep "no_compression" as just described).

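As a rough illustration of these semantics, here is a Python sketch of how the
directive's value could be interpreted.  Everything in it is hypothetical: the
function name, the 'accept_none' switch (which selects between the two
treatments of "none" described above) and the list of method names, which I
only give as examples.

```python
from typing import Optional

# Example method names only; not taken from newsyslog's source.
KNOWN_METHODS = {"gzip", "bzip2", "xz", "zstd"}


def parse_compress_override(value: Optional[str],
                            accept_none: bool = True) -> Optional[str]:
    """Map the directive's value to an override, or None for "no override"."""
    if value is None:
        # Directive absent: per-entry flags apply unchanged.
        return None
    if value == "none":
        if accept_none:
            # Same as if the directive wasn't there.
            return None
        # Alternative: reject "none" outright because it is confusing.
        raise ValueError('"none" is not accepted; use "no_compression"')
    if value == "no_compression":
        # A real override: force all entries to stay uncompressed.
        return "no_compression"
    if value in KNOWN_METHODS:
        return value
    raise ValueError(f"unknown compression method: {value!r}")
```
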
If there is consensus, I'd then change the 'J' flag currently used for all log
files to the newly chosen flag for generic compression, and have
<compress_override> set to "bzip2" as a first step (for POLA).  Then, it could
be changed to something else, e.g., 'zstd'.

Setting it to 'none' seems to me the worst solution (but far from being the end 
of the world).

More fundamentally, I remember having seen at least two claims that using the
filesystem's compression is better, without supporting arguments.  I don't
agree with that in practice.  The only advantage of in-filesystem
compression, besides the administrative
simplification that you can also get with the override above, is to get O(1) 
random access to big log files, and I don't see any compelling and common use 
case for it.  You certainly want to get to the end of the current log quickly, 
but that one precisely is not handled by 'newsyslog' and stays uncompressed (at 
the application level).  When you want to search for strings or patterns, you 
have to grep the whole file anyway.  You may want to immediately reach the end 
of some historical log file, e.g., when manually going back in time from the 
current log, but this should have negligible latency, and if it doesn't, then
just use more and smaller log archives.  Same thing if you have a more
sophisticated setup with an index of log text: Jumping to a particular location 
in the log file should have negligible latency, else apply the same recipe.  If 
your setup with index requires a single, never rotated, log file, then you're 
not even using 'newsyslog' in the first place (or should not).  Although I 
agree that in this case using a compressed filesystem (or a randomly accessible 
archive) can make sense (if your index doesn't already cover the results 
expected from your searches), I very much doubt this is a common setup.
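
To illustrate the point about searching, here is a small Python sketch of what
grepping a rotated, bzip2-compressed log amounts to.  The function name is
made up for the example.  The scan is sequential from start to end, exactly as
it would be on an uncompressed file, so random access brings nothing here.

```python
import bz2
import re
from typing import List, Tuple


def grep_bz2(path: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (line number, line) for each line of 'path' matching 'pattern'."""
    regex = re.compile(pattern)
    matches = []
    # The archive is decompressed as a stream; every line is visited once,
    # whatever the compression method or the underlying filesystem.
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as log:
        for lineno, line in enumerate(log, start=1):
            if regex.search(line):
                matches.append((lineno, line.rstrip("\n")))
    return matches
```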

Moreover, using in-filesystem compression can degrade the compression ratio,
since the compression method on ZFS is chosen per dataset, and a dataset
typically holds many other files and use cases, which prevents the
administrator from choosing the best, and slowest, compression methods.  To
avoid this problem,
one can use a separate dataset for /var/log (anyone?), but changing this on 
already running systems is a greater burden than just changing the compression 
settings in the 'newsyslog' configuration files.

I'd like people who disagree with this to present arguments for their case, if
for nothing else than to share their experience and best practices on log
management.

Thanks and regards.

-- 
Olivier Certner
