As an initial step, could we introduce some sort of log warning, metric or other indicator for operators to determine if they're running with a non-UTF-8 encoding?
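
A minimal sketch of what such a startup check could look like. The class name `DefaultCharsetCheck` and the message wording are hypothetical, not existing Cassandra code, and a real version would presumably go through the project's logger or a metric rather than stderr:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper for illustration only; not an existing Cassandra class.
public final class DefaultCharsetCheck
{
    private DefaultCharsetCheck() {}

    // Returns a warning message when the given charset is not UTF-8,
    // or an empty Optional when it is UTF-8.
    public static Optional<String> warningFor(Charset charset)
    {
        if (StandardCharsets.UTF_8.equals(charset))
            return Optional.empty();

        return Optional.of("JVM default charset is " + charset.name()
                           + ", not UTF-8; code that relies on the default"
                           + " encoding may read or write files differently");
    }

    public static void main(String[] args)
    {
        // Charset.defaultCharset() reflects -Dfile.encoding and the locale
        // environment (e.g. LANG) the JVM was started with.
        warningFor(Charset.defaultCharset()).ifPresent(System.err::println);
    }
}
```

Taking the charset as a parameter keeps the check testable without restarting the JVM under a different `-Dfile.encoding`.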

On Mon, Nov 28, 2022 at 1:21 PM David Capwell <dcapw...@apple.com> wrote:
>
> > It probably has to be done on a case-by-case basis
>
> Yeah, this is what I feel as well…
>
> > Does the linter provide more detail than just the list?
>
> Not really, it shows how to fix but can’t really say if the fix will cause
> issues… If you are not running with UTF-8 we do the right thing most of the
> time, but some files “may” break… this would also be true if you
> backup/restore these files on a different environment...
>
> On Nov 10, 2022, at 12:44 PM, Derek Chen-Becker <de...@chen-becker.org> wrote:
> >
> > This seems fraught with peril. I think that it should be fixed, but I
> > also wonder what the testing requirements would be to validate no
> > regression. It probably has to be done on a case-by-case basis. Is it
> > as simple as auditing places where we're calling getBytes or
> > PrintReader/PrintWriter without an explicit encoding? Some of them,
> > like
> > https://github.com/apache/cassandra/blob/30ad754d7e95501ffa916bf986e4cfda1aa5e441/src/java/org/apache/cassandra/tools/HashPassword.java#L128,
> > look like that would be easy to address, but others seem like they
> > could be complicated.
> >
> > Does the linter provide more detail than just the list?
> >
> > Cheers,
> >
> > Derek
> >
> > On Fri, Nov 4, 2022 at 2:09 PM David Capwell <dcapw...@apple.com> wrote:
> > >
> > > Testing out linter trying to see if it can solve a case for Simulator and see
> > > we have 25 cases where we don’t add the encoding and rely on default, which
> > > is based off the system…
> > >
> > > If we attempt to fix these cases, I am wondering if this is a regression… it
> > > “might” be the case someone set -Dfile.encoding=ascii or updated env LANG to
> > > something non-UTF based…
> > >
> > > Here is the list reported
> > >
> > > org.apache.cassandra.cql3.functions.JavaBasedUDFunction since first historized release
> > > org.apache.cassandra.db.ColumnFamilyStore since first historized release
> > > org.apache.cassandra.db.compaction.CompactionLogger$CompactionLogSerializer since first historized release
> > > org.apache.cassandra.db.filter.RowFilter$CustomExpression since first historized release
> > > org.apache.cassandra.db.lifecycle.LogTransaction since first historized release
> > > org.apache.cassandra.gms.FailureDetector since first historized release
> > > org.apache.cassandra.index.sasi.analyzer.StandardTokenizerImpl since first historized release
> > > org.apache.cassandra.io.sstable.SSTable since first historized release
> > > org.apache.cassandra.io.util.FileReader since first historized release
> > > org.apache.cassandra.io.util.FileReader since first historized release
> > > org.apache.cassandra.io.util.FileWriter since first historized release
> > > org.apache.cassandra.io.util.FileWriter since first historized release
> > > org.apache.cassandra.metrics.SamplingManager since first historized release
> > > org.apache.cassandra.metrics.SamplingManager since first historized release
> > > org.apache.cassandra.schema.IndexMetadata since first historized release
> > > org.apache.cassandra.security.PEMBasedSslContextFactory since first historized release
> > > org.apache.cassandra.tools.HashPassword since first historized release
> > > org.apache.cassandra.tools.JMXTool$Dump$Format$3 since first historized release
> > > org.apache.cassandra.tools.NodeTool$NodeToolCmd since first historized release
> > > org.apache.cassandra.tools.SSTableMetadataViewer since first historized release
> > > org.apache.cassandra.transport.Client since first historized release
> > > org.apache.cassandra.utils.ByteArrayUtil since first historized release
> > > org.apache.cassandra.utils.FBUtilities since first historized release
> > > org.apache.cassandra.utils.GuidGenerator since first historized release
> > > org.apache.cassandra.utils.HeapUtils since first historized release
> >
> > --
> > +---------------------------------------------------------------+
> > | Derek Chen-Becker                                             |
> > | GPG Key available at https://keybase.io/dchenbecker and       |
> > | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> > | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7 7F42 AFC5 AFEE 96E4 6ACC   |
> > +---------------------------------------------------------------+

--
+---------------------------------------------------------------+
| Derek Chen-Becker                                             |
| GPG Key available at https://keybase.io/dchenbecker and       |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7 7F42 AFC5 AFEE 96E4 6ACC   |
+---------------------------------------------------------------+