Yes, commitlog_archiving.properties is used  both for archiving and
restoration for commitlog, but we don't have a tool or interface to do
commitlog restore online which I have created the commitlog tool issue
CASSANDRA-15156 <https://issues.apache.org/jira/browse/CASSANDRA-15156> .
>
> By default we would provide "cp $from $to" implemented by Cassandra itself
> without any process invocation.
>

What about providing a hard link for commitlog archive by default, if we
need other features, such as log compression, we may need to specify the
copy to a specific location (with sufficient capacity).

Štefan Miklošovič <smikloso...@apache.org> 于2024年9月3日周二 19:15写道:

> Scott is right that this is also coming from us having a MBean method
> which allows commands to be changed in runtime. The solution to that was
> that we can prevent it from changing dynamically by having a configuration
> property, which is actually by default set to false so FQL archiving is
> ever possible only in case an operator explicitly enables that.
>
> However, even if commands were not modifiable in runtime via JMX and even
> an operator has a chance to enable command execution explicitly, that still
> does not make it 100% secure because an attacker does not need to change /
> modify cassandra.yaml where the script to execute is configure, just the
> content of such a script which is executed.
>
> So, introducing a similar property as it was done for FQL would in this
> context mean that it would be used for disabling commitlog archiving /
> restoring altogether while for FQL it would still do its thing, it would
> just not archive it. Whole commitlog archiving / restoring is now based on
> some commands to be executed so disabling commands being executed
> practically means we disabled this whole feature as such.
>
> We could indeed make it flat out impossible to execute anything but these
> scripts might contain some custom logic, like uploading to various cloud
> storages (AWS, Azure, GCP or something completely custom), people have
> their own "storage solutions" like remove the old logs when new come in
> etc. so by disabling this altogether we would make it impossible and users
> would need to accommodate that which would break their existing solutions.
>
> What I find confusing is that commitlog_archiving.properties is used both
> for restoration AS WELL AS for archiving. If we're ever going to change how
> this works, I think that it should be somehow logically split into
> archiving and restoring parts.
>
> So, we might introduce a property in cassandra.yaml to disable
> commitlog_archiving.properties altogether and we might deprecate
> commitlog_archiving.properties way of doing this (still keep it there for
> legacy reasons), add a new cassandra.yaml configuration section for that
> and there make the archiving and the restoration pluggable. By default we
> would provide "cp $from $to" implemented by Cassandra itself without any
> process invocation. Then we might eventually drop
> commitlog_archiving.properties but if the maintenance of that is cheap I
> would just keep it, we would just flip the switch so a new way of doing
> that would be preferable and the old way of doing it (via properties) would
> need to be explicitly enabled.
>
> On Tue, Sep 3, 2024 at 11:55 AM guo Maxwell <cclive1...@gmail.com> wrote:
>
>> Thank you very much for everyone's replies, they are all very valuable
>> feedback to me.
>>
>> I don't really understand what benefit adding restrictions would serve.
>>> Would it be hard coded in C* itself, or configurable?  If it's
>>> configurable, then are we just making users enter their commands twice?
>>> This is meant to be used by an operator, so who's actually protected by an
>>> allow-list?
>>>
>>
>> I agree with you too, so I may prefer to idea 2 with implement commitlog
>> archiving in c* (not archiving by user defined shell), and deprecate the
>> commitlog_archiving.properties
>> <https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties#L28>
>>  configuration
>> through which  we  can set the properties of commitlog archiving. This view
>> may be similar to that of Scott.
>>
>> If I want to use rclone or aws-cli to archive my commit logs that's my
>>> prerogative.
>>>
>>
>> Yes, it may be very flexible if we set aws-cli in shell. But as I know
>> cassandra-medusa can also do this , and for me letting other tools to do
>> this work may be better , for example we can upload more than one log (if
>> log size is not big ) in a rpc to improve write throughput.
>>
>> I think we can divide this big task into several subtasks:
>>
>>    - Add this feature that Stefan mentioned before for commitlog archive
>>    CASSANDRA-18550
>>    <https://issues.apache.org/jira/browse/CASSANDRA-18550> in 5.x  and
>>    may the original commitlog_archiving.properties  deprecate.
>>    - Add the feature of archiving for cassandra (commitlog/query log/or
>>    event sstable) in the long run such as 6.0.
>>
>> I can prepare a cep if necessary. Looking forward to your feedback.
>>
>>
>> We can divide this task into several subtasks and complete them step by
>> step
>>
>>
>>
>> Jordan West <jorda...@gmail.com> 于2024年9月3日周二 00:55写道:
>>
>>> +1 to Scott’s comments. Once you expose those YAML config params outside
>>> of a single node which many of us do, this becomes an RCE attack vector.
>>> Something more structured as Scott proposes, similar to snapshots, would be
>>> preferred. Would recommend a CEP.
>>>
>>> Jordan
>>>
>>> On Fri, Aug 30, 2024 at 20:58 C. Scott Andreas <csco...@icloud.com>
>>> wrote:
>>>
>>>> I appreciate this report and would love to work toward the direction
>>>> it recommends.
>>>>
>>>> I’m also familiar with past concerns raised by others with our FQL
>>>> configuration parameters that allow passing shell commands for FQL segment
>>>> archival.
>>>>
>>>> We bias toward ensuring an MBean exists for dynamic modification of
>>>> yaml parameters. When we couple dynamic configuration updates and arbitrary
>>>> shell command execution, we introduce vectors for arbitrary code execution,
>>>> data exfiltration, and data compromise that have a lower bar to achieve
>>>> than local file write.
>>>>
>>>> I agree that we should work toward removing operator-provided shell
>>>> commands in yaml.
>>>>
>>>> For concerns like archival, these seem like areas that Cassandra could
>>>> easily accomplish itself without shelling out to gzip/zstd/lz4-compress a
>>>> file. Introducing a new config structure that declares an archival format,
>>>> accompanying implementations for compression/decompression, and deprecation
>>>> of the prior approach sounds both reasonable and desirable to me.
>>>>
>>>> – Scott
>>>>
>>>> —
>>>> Mobile
>>>>
>>>> On Aug 30, 2024, at 10:25 PM, Bowen Song via dev <
>>>> dev@cassandra.apache.org> wrote:
>>>>
>>>> 
>>>>
>>>> I'm not sure what is the concern here. Is it a malicious user
>>>> exploiting this? Or human error with unintended consequences?
>>>>
>>>> For malicious user, in order to exploit this, an attacker needs to be
>>>> able to write to the config file. The config file on Linux by default is
>>>> owned by the root user and has the -rw-r--r-- permission, that means the
>>>> attacker must either gain root access to the system or has the ability to
>>>> write arbitrary file on the filesystem. With either of these permission,
>>>> they can already do almost anything they want (e.g. modify a SUID
>>>> executable file). They wouldn't even need to exploit this to run a script
>>>> or dangerous command. So this sounds like a non-issue to me, at least on
>>>> Linux-based OSes.
>>>>
>>>> For human error, if the operator puts "rm -rf" in it, the software
>>>> should treat it as the operator actually wants to do that. I personally
>>>> don't like software attempting to outsmart human, which often ends up
>>>> interfering with legitimate use cases. The best thing a software can do is
>>>> log it, so there's some traceability if and when things go wrong.
>>>>
>>>> So, IMO, there's nothing wrong with the implementation in Cassandra.
>>>>
>>>>
>>>> On 30/08/2024 17:13, guo Maxwell wrote:
>>>>
>>>>     Commitlog has the ability of archive  log file, see
>>>> CommitLogArchiver.java
>>>> <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java>,
>>>> we can achieve the purpose of archive and restore commitlog by
>>>> configuring archive_command and restore_command in
>>>> commitlog_archiving.properties
>>>> <https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties#L28>
>>>> .The archive_command and restore_command can be some linux/unix shell
>>>> command.  However, I found that the shell command can actually be
>>>> filled with any script, even if "*rm -rf"* .I have tested this
>>>> situation and it finally succeeded with my test file being deleted.
>>>>
>>>>     Personally, I think it is a dangerous behavior, because if there
>>>> are no system-level restrictions and users are allowed to do anything in
>>>> these shell commands. So here I want to discuss with you whether it is
>>>> necessary to impose any restrictions on use, or do we need a new way of
>>>> archiving/restoring commitlog?
>>>>
>>>> Of course, before that, I would also like to ask, how many people are
>>>> using archive and restore of commitlog? It seems that the commitlog archive
>>>> code has not been updated for a long time.
>>>>
>>>> I have two ideas.
>>>> One is to make some restrictions on the command context based on the
>>>> existing usage methods, such as strictly only allowing the current cp/mv/ln
>>>> %path to %name.Other redundant strings in the command are not allowed.
>>>> Another one , As I roughly investigated the archive of mysql and pg.
>>>> They do not give users too much space (I am talking about letting users
>>>> define their own archiving command ), and archive directly to a designated
>>>> location. For us, I feel that we can refer to c * Incremental backup of
>>>> sstable,  add a hardlink to the commitlog to the specified location, but
>>>> this place may modify the original configuration method, such as setting
>>>> the archive location and restoring location of the node through nodetool
>>>> and deprecate the  commitlog_archiving.properties
>>>> <https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties#L28>
>>>>  configuration.
>>>>
>>>> I am just putting forward some views  here, and looking forward to your
>>>> feedback. 😀
>>>>
>>>>

Reply via email to