Hi,
On 03.12.23 at 12:59, Stephan Bergmann wrote:
On 12/2/23 16:38, Mike Kaganski wrote:
On 02.12.2023 17:46, Rene Engelhard wrote:
In any case this is bad. My filesystem (I think from 2020 or so)
apparently shows it (ls -l does), but I wouldn't be sure about other,
older ones (like Debian's build machines). The locale this fails under
is definitely UTF-8, though.
Before
<https://git.libreoffice.org/core/+/fbf025b4903bfcb93c3d4bbf1ebbf860cf11618d%5E%21>
"Make testHybridPDFFile Windows-only, and filenames in repo
ASCII-only", I can reproduce the failure on Linux when not using a
UTF-8 locale but explicitly specifying, e.g., an ASCII locale (and thus
an osl_getThreadTextEncoding value of RTL_TEXTENCODING_ASCII_US) with
`LC_CTYPE=C make -O CppunitTest_filter_textfilterdetect
CPPUNIT_TEST_NAME=testHybridPDFFile::TestBody`.

But in my case this fails with

    t=`mktemp -q -d`; \
    cd $(SOURCE_TREE) && \
    export PATH=$(BUILD_PATH); \
    export TMPDIR=$$t; \
    export HOME=$$t; \
    export LOCPATH=$(CURDIR)/debian/locales; \
    export LANG=en_US.UTF-8; \
    export TZ=UTC; \
    unset DISPLAY; \
    unset CONNECTIVITY_TEST_MYSQL_DRIVER; \
    export PARALLELISM=1; \
    if [ -x /usr/bin/gdb ]; then ulimit -c unlimited || true; fi && \
    $(TEST_TIMEOUT) $(MAKE) -k check || $(TEST_TIMEOUT) $(MAKE) check && \
    rm -rf $$t

so with a UTF-8 locale (which is generated before that rule runs).
For better or worse, the payload of LO "internal" file URLs is always
considered to be a UTF-8 encoding of the actual system pathname. It is
*not* a byte-for-byte representation of the bytes that make up the
Unix system pathname.
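To make the distinction concrete, here is a minimal Python sketch. It is not LibreOffice code, and the function names are hypothetical. It shows two ways a percent-encoded file URL payload could be mapped to a Unix pathname:

* LO's convention: the percent-decoded bytes are UTF-8 text, which is then re-encoded into the system (thread) text encoding.
* A byte-for-byte reading: the decoded bytes simply are the pathname.

The `system_encoding` parameter is an assumption standing in for osl_getThreadTextEncoding().

```python
from urllib.parse import quote, unquote_to_bytes


def lo_payload_to_pathname(payload: str, system_encoding: str) -> bytes:
    """LO-style (sketch): the payload is a UTF-8 encoding of the pathname text.

    Percent-decode to bytes, decode those bytes as UTF-8 to get Unicode
    characters, then encode the characters in the system encoding.
    Raises UnicodeEncodeError if the system encoding cannot represent them.
    """
    text = unquote_to_bytes(payload).decode("utf-8")
    return text.encode(system_encoding)


def bytewise_payload_to_pathname(payload: str) -> bytes:
    """Byte-for-byte reading: the decoded bytes *are* the pathname (not what LO does)."""
    return unquote_to_bytes(payload)


payload = quote("café.pdf")  # 'caf%C3%A9.pdf': percent-encoded UTF-8
print(lo_payload_to_pathname(payload, "utf-8"))    # b'caf\xc3\xa9.pdf'
print(lo_payload_to_pathname(payload, "latin-1"))  # b'caf\xe9.pdf', e.g. an ISO-8859-1 locale
print(bytewise_payload_to_pathname(payload))       # b'caf\xc3\xa9.pdf' regardless of locale
```

Under the LO-style mapping, the same URL yields different pathname bytes depending on the locale's encoding. If that encoding cannot represent a character at all, the conversion fails instead of falling back to the raw bytes.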
What thus happens here is that the file UCP's TaskManager::getv ->
osl::DirectoryItem::get -> osl_getDirectoryItem ->
osl::detail::convertUrlToPathname -> getSystemPathFromFileUrl ->
decodeFromUtf8 -> convert -> UnicodeToTextConverter_Impl::convert ->
rtl_convertUnicodeToText tries to translate the Unicode characters of
"hybrid_writer_абв_αβγ.pdf" to osl_getThreadTextEncoding() ==
RTL_TEXTENCODING_ASCII_US, which fails because ASCII has no
representation of the Cyrillic and Greek letters.
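The failing last step of that chain can be imitated in a few lines of Python. This is only an analogy, not rtl_convertUnicodeToText itself. `first_unconvertible` is a hypothetical helper that walks the Unicode characters of a name and reports the first one the target encoding cannot represent. For an ASCII target, that is the first Cyrillic letter of the test file name.

```python
from typing import Optional, Tuple


def first_unconvertible(name: str, encoding: str) -> Optional[Tuple[int, str]]:
    """Return (index, char) of the first character `encoding` cannot represent,
    or None if the whole name converts.

    Hypothetical stand-in for the per-character check a Unicode-to-text
    converter performs; not LibreOffice's implementation.
    """
    for index, char in enumerate(name):
        try:
            char.encode(encoding)
        except UnicodeEncodeError:
            return index, char
    return None


name = "hybrid_writer_абв_αβγ.pdf"
print(first_unconvertible(name, "utf-8"))      # None: UTF-8 can encode everything
print(first_unconvertible(name, "ascii"))      # (14, 'а'): Cyrillic U+0430
print(first_unconvertible(name, "iso8859-5"))  # (18, 'α'): Cyrillic OK, Greek not
```

The ISO-8859-5 line illustrates that even a Cyrillic single-byte encoding fails on the Greek part of the name. Only a Unicode-capable encoding such as UTF-8 round-trips the whole name.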
I did some more tests.
In my standard local build environment (cowbuilder[1] --login chroot), it
fails.
If I chroot() into exactly that same chroot (as it is on disk), it works.
If I use a pbuilder --login chroot, it succeeds.
I remember some sal (tmpfile?) tests that once exhibited the very same mix,
too (which I never reported, and I think even pbuilder --login
failed there), but not with recent LO versions.
Regards,
Rene