Yes, commitlog_archiving.properties is used for both archiving and restoring the commitlog, but we don't have a tool or interface for restoring the commitlog online. I have created an issue for such a commitlog tool: CASSANDRA-15156 <https://issues.apache.org/jira/browse/CASSANDRA-15156>.

> By default we would provide "cp $from $to" implemented by Cassandra itself without any process invocation.

What about providing a hard link for commitlog archiving by default? If we need other features, such as log compression, we may need to copy the log to a specified location (one with sufficient capacity) instead.

On Tue, Sep 3, 2024 at 19:15, Štefan Miklošovič <smikloso...@apache.org> wrote:

> Scott is right that this also comes from us having an MBean method which allows commands to be changed at runtime. The solution to that was to prevent them from being changed dynamically by means of a configuration property, which is set to false by default, so FQL archiving is only ever possible if an operator explicitly enables it.
>
> However, even if commands were not modifiable at runtime via JMX, and even though an operator has to enable command execution explicitly, that still does not make it 100% secure, because an attacker does not need to modify cassandra.yaml, where the script to execute is configured, just the content of the script that is executed.
>
> So, introducing a property similar to the one added for FQL would in this context mean that it would disable commitlog archiving / restoring altogether, whereas FQL would still do its thing and just not archive its logs. The whole of commitlog archiving / restoring is currently based on executing commands, so disabling command execution practically means disabling the whole feature.
>
> We could indeed make it flat out impossible to execute anything, but these scripts might contain custom logic, like uploading to various cloud storages (AWS, Azure, GCP or something completely custom); people have their own "storage solutions", like removing old logs when new ones come in, etc. So by disabling this altogether we would make that impossible, and users would need to accommodate the change, which would break their existing solutions.
>
> What I find confusing is that commitlog_archiving.properties is used both for restoration AS WELL AS for archiving.
> If we're ever going to change how this works, I think that it should be somehow logically split into archiving and restoring parts.
>
> So, we might introduce a property in cassandra.yaml to disable commitlog_archiving.properties altogether, and we might deprecate the commitlog_archiving.properties way of doing this (still keeping it there for legacy reasons), add a new cassandra.yaml configuration section for it, and make the archiving and the restoration pluggable there. By default we would provide "cp $from $to" implemented by Cassandra itself without any process invocation. Then we might eventually drop commitlog_archiving.properties, but if maintaining it is cheap I would just keep it; we would just flip the switch so that the new way would be preferred and the old way (via properties) would need to be explicitly enabled.
>
> On Tue, Sep 3, 2024 at 11:55 AM guo Maxwell <cclive1...@gmail.com> wrote:
>
>> Thank you very much for everyone's replies; they are all very valuable feedback to me.
>>
>>> I don't really understand what benefit adding restrictions would serve. Would it be hard coded in C* itself, or configurable? If it's configurable, then are we just making users enter their commands twice? This is meant to be used by an operator, so who's actually protected by an allow-list?
>>
>> I agree with you too, so I would prefer idea 2: implement commitlog archiving in C* itself (rather than archiving via a user-defined shell command), and deprecate the commitlog_archiving.properties <https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties#L28> configuration through which the commitlog archiving properties are currently set. This view may be similar to Scott's.
>>
>>> If I want to use rclone or aws-cli to archive my commit logs that's my prerogative.
>>
>> Yes, it may be very flexible if we set aws-cli in shell.
>> But as far as I know, cassandra-medusa can also do this, and to me letting other tools do this work may be better; for example, we can upload more than one log (if the logs are not big) in a single RPC to improve write throughput.
>>
>> I think we can divide this big task into several subtasks:
>>
>> - Add the feature that Stefan mentioned before for commitlog archiving, CASSANDRA-18550 <https://issues.apache.org/jira/browse/CASSANDRA-18550>, in 5.x, and possibly deprecate the original commitlog_archiving.properties.
>> - Add an archiving feature for Cassandra (commitlog/query log/or event sstable) in the long run, such as in 6.0.
>>
>> I can prepare a CEP if necessary. Looking forward to your feedback.
>>
>> We can divide this task into several subtasks and complete them step by step.
>>
>> On Tue, Sep 3, 2024 at 00:55, Jordan West <jorda...@gmail.com> wrote:
>>
>>> +1 to Scott’s comments. Once you expose those YAML config params outside of a single node, which many of us do, this becomes an RCE attack vector. Something more structured as Scott proposes, similar to snapshots, would be preferred. Would recommend a CEP.
>>>
>>> Jordan
>>>
>>> On Fri, Aug 30, 2024 at 20:58 C. Scott Andreas <csco...@icloud.com> wrote:
>>>
>>>> I appreciate this report and would love to work toward the direction it recommends.
>>>>
>>>> I’m also familiar with past concerns raised by others with our FQL configuration parameters that allow passing shell commands for FQL segment archival.
>>>>
>>>> We bias toward ensuring an MBean exists for dynamic modification of yaml parameters. When we couple dynamic configuration updates and arbitrary shell command execution, we introduce vectors for arbitrary code execution, data exfiltration, and data compromise that have a lower bar to achieve than local file write.
>>>>
>>>> I agree that we should work toward removing operator-provided shell commands in yaml.
>>>>
>>>> For concerns like archival, these seem like areas that Cassandra could easily accomplish itself without shelling out to gzip/zstd/lz4-compress a file. Introducing a new config structure that declares an archival format, accompanying implementations for compression/decompression, and deprecation of the prior approach sounds both reasonable and desirable to me.
>>>>
>>>> – Scott
>>>>
>>>> On Aug 30, 2024, at 10:25 PM, Bowen Song via dev <dev@cassandra.apache.org> wrote:
>>>>
>>>> I'm not sure what the concern is here. Is it a malicious user exploiting this? Or human error with unintended consequences?
>>>>
>>>> For a malicious user, in order to exploit this, an attacker needs to be able to write to the config file. The config file on Linux is by default owned by the root user and has the -rw-r--r-- permission, which means the attacker must either gain root access to the system or have the ability to write arbitrary files on the filesystem. With either of these permissions, they can already do almost anything they want (e.g. modify a SUID executable file). They wouldn't even need to exploit this to run a script or dangerous command. So this sounds like a non-issue to me, at least on Linux-based OSes.
>>>>
>>>> For human error, if the operator puts "rm -rf" in it, the software should treat it as something the operator actually wants to do. I personally don't like software attempting to outsmart humans, which often ends up interfering with legitimate use cases. The best thing software can do is log it, so there's some traceability if and when things go wrong.
>>>>
>>>> So, IMO, there's nothing wrong with the implementation in Cassandra.
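
To make the permission argument above concrete, here is a minimal sketch (not Cassandra code) of checking whether a file can be written by anyone other than its owner. The function name `is_writable_by_non_owner` is hypothetical, and a temporary file stands in for the real config file; it only looks at POSIX mode bits and ignores ACLs and root access.

```python
import os
import stat
import tempfile


def is_writable_by_non_owner(path: str) -> bool:
    """Return True if the group or other users have the write bit set on path."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


if __name__ == "__main__":
    # A temporary file stands in for the config file (an assumption for the demo).
    with tempfile.NamedTemporaryFile() as f:
        for perms in (0o644, 0o666):
            os.chmod(f.name, perms)
            mode = os.stat(f.name).st_mode
            # Print the symbolic mode string next to the check result.
            print(stat.filemode(mode), is_writable_by_non_owner(f.name))
```

With the default -rw-r--r-- mode the check reports that only the owner can write, which is the situation Bowen describes: modifying the file requires owner (root) access in the first place.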
>>>>
>>>> On 30/08/2024 17:13, guo Maxwell wrote:
>>>>
>>>> The commitlog has the ability to archive log files, see CommitLogArchiver.java <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java>. We can archive and restore the commitlog by configuring archive_command and restore_command in commitlog_archiving.properties <https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties#L28>. The archive_command and restore_command can be any Linux/Unix shell command. However, I found that the shell command can actually be filled with any script, even "rm -rf". I have tested this situation, and it succeeded, with my test file being deleted.
>>>>
>>>> Personally, I think this is dangerous behavior, because, if there are no system-level restrictions, users are allowed to do anything in these shell commands. So here I want to discuss with you whether it is necessary to impose any restrictions on use, or whether we need a new way of archiving/restoring the commitlog.
>>>>
>>>> Of course, before that, I would also like to ask: how many people are using commitlog archiving and restore? It seems that the commitlog archive code has not been updated for a long time.
>>>>
>>>> I have two ideas.
>>>> One is to place some restrictions on the command content based on the existing usage, such as strictly allowing only the current cp/mv/ln %path to %name. Other redundant strings in the command are not allowed.
>>>> The other: I roughly investigated the archiving of MySQL and PG. They do not give users too much freedom (I am talking about letting users define their own archiving command), and archive directly to a designated location.
>>>> For us, I feel that we can follow C*'s incremental backup of sstables and add a hard link to the commitlog in a specified location. But this may change the original configuration method, for example by setting the node's archive location and restore location through nodetool and deprecating the commitlog_archiving.properties <https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties#L28> configuration.
>>>>
>>>> I am just putting forward some views here, and looking forward to your feedback. 😀
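
For illustration only, the hard-link idea discussed in this thread could look roughly like the sketch below: archive a segment in-process, without invoking any shell command, by hard-linking it into an archive directory and falling back to a plain copy when linking is not possible (for example when the archive is on a different filesystem). This is not how Cassandra implements anything today; `archive_segment`, the directory layout, and the segment file name are all hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path


def archive_segment(segment: Path, archive_dir: Path) -> Path:
    """Archive one commitlog segment without spawning a process.

    Tries a hard link first (cheap, same filesystem only) and falls back
    to a regular copy if the link cannot be created.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / segment.name
    if target.exists():
        # Never overwrite a segment that was already archived.
        raise FileExistsError(f"refusing to overwrite {target}")
    try:
        os.link(segment, target)
    except FileExistsError:
        raise
    except OSError:
        # e.g. source and archive are on different filesystems
        shutil.copy2(segment, target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout and file name, used only for the demo.
        segment = Path(root) / "commitlog" / "CommitLog-example.log"
        segment.parent.mkdir()
        segment.write_bytes(b"mutation bytes")
        archived = archive_segment(segment, Path(root) / "archive")
        print(archived.name, archived.read_bytes() == segment.read_bytes())
```

A hard link costs no extra disk space while the original segment exists, which is why it only makes sense as the default; anything like compression or uploading elsewhere would need the copy path or a pluggable implementation, as suggested earlier in the thread.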