The issue was finally resolved and the package is back on CRAN! Extensive try-and-fail is not the best with my advanced arthritis, but progress is progress
I added a small section with a minimal example here: https://cpp4r.org/08-debugging.html#testing-with-docker This is a "nautical diary " I keep with a long collection of errors I found. My research is International Relations and Public Policy, so it is good to have a resource with all my Stackoverflow searches and questions in 1 place. Best, ���� Mauricio "Pach��" Vargas Sep��lveda PhD Student, Political Science University of Toronto ________________________________ From: Mauricio Vargas Sepulveda <m.sepulv...@mail.utoronto.ca> Sent: November 15, 2024 7:16 AM To: Ivan Krylov <ikry...@disroot.org> Cc: r-package-devel@r-project.org <r-package-devel@r-project.org> Subject: Re: [R-pkg-devel] Possible false negative for compiled C++ code in CRAN checks Dear R Developers, I implemented Dr. Krylov's fix, and now redatam is back to CRAN (Point 1). Dr. Eddelbuettel's suggestion about not fully mimicking CRAN gcc-san setup was also correct (Point 2). Point 1: - When the big endian issue was described here, I tried the s390x image from R-Hub, which does not work on my laptop. Then I went to a big endian server that we have at UofT and that reproduced the error! Point 2: - I was testing with docker.io/rocker/r-devel-san:latest, and I could not replicate the error. - I ended up using the R-Hub image with https://github.com/r-hub/containers/pull/81#issuecomment-2478009166, and that replicated the error line by line. Besides it, I implemented the same fix for the command line tool and the Python package. I also changed the example data by aggregating Uruguay's census to show some numbers. Just cutting the levels and show region/city is not informative. Being Redatam a closed format with no standard or specification, I ended up reading the census with my pkg, aggregating with dplyr, saving to CSV, and then converting back to Redatam with a point-and-click tool following an unofficial tutorial from You Tube. There is no easy way to aggregate and save in the same format as we do with SPSS. I added Dr. Krylov to ctb for this very useful fix! Thanks a lot to Dr. Krylov and Dr. Eddelbuettel for the suggestions, I was going in circles being unable to replicate the issue and I now asked Dr. Ligges for the possibility to know more about the CRAN specific configuration to add it to R-Hub. Best, ���� Mauricio "Pach��" Vargas Sep��lveda PhD Student, Political Science University of Toronto ________________________________________ From: Ivan Krylov <ikry...@disroot.org> Sent: November 14, 2024 4:50 PM To: Mauricio Vargas Sepulveda <m.sepulv...@mail.utoronto.ca> Cc: r-package-devel@r-project.org <r-package-devel@r-project.org> Subject: Re: [R-pkg-devel] Possible false negative for compiled C++ code in CRAN checks �� Thu, 14 Nov 2024 16:24:16 +0000 Mauricio Vargas Sepulveda <m.sepulv...@mail.utoronto.ca> ��ڧ�֧�: > After enabling the SAN flags, I cannot reproduce the gcc-san error > [2]. Can you use the rocker/r-devel-san container? It reproduces the problem for me. When reading galapagos/cg15.dic, FuzzyEntityParser::ParseEntities() keeps advancing over the file and failing to parse a single entity until it eventually calls stop() because it didn't find any entities. In a non-sanitized build, it first succeeds at 0-based offset 1095. In a sanitized build, it fails for all offsets. I think this is due to the ordering of the byte reads: https://can01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fpachadotdev%2Fopen-redatam%2Fblob%2Fbbb65242f1af5f601def1c0b971ed601d459b4f3%2Fsrc%2Freaders%2FByteArrayReader.cpp%23L176-L192&data=05%7C02%7Cm.sepulveda%40mail.utoronto.ca%7C762907f4ee8f46c7f59b08dd04f667a8%7C78aac2262f034b4d9037b46d56c55210%7C0%7C0%7C638672178560868627%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=O8Cjgp1m%2FqQc7CAKWOl%2F6IMcAi0cIhCFnFUY7x%2FYdhU%3D&reserved=0<https://github.com/pachadotdev/open-redatam/blob/bbb65242f1af5f601def1c0b971ed601d459b4f3/src/readers/ByteArrayReader.cpp#L176-L192> In C++, an operation like the following: static_cast<uint16_t>(ReadByte()) << 8 | static_cast<uint16_t>(ReadByte()); ...depends on the order in which the compiler will choose to evaluate the calls to static_cast<uint16_t>(ReadByte()), and this order is not guaranteed to be left-to-right: https://can01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fen.cppreference.com%2Fw%2Fcpp%2Flanguage%2Feval_order&data=05%7C02%7Cm.sepulveda%40mail.utoronto.ca%7C762907f4ee8f46c7f59b08dd04f667a8%7C78aac2262f034b4d9037b46d56c55210%7C0%7C0%7C638672178560885711%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=8k0RVuOCfXwzh72IjVkzKy0yq1uM1bHZc7OBgAhKe9w%3D&reserved=0<https://en.cppreference.com/w/cpp/language/eval_order> I edited all four byte-reading functions and replaced the one-statement operations with separate statements for each of the byte reads: --- redatam.orig/src/redatamlib/readers/ByteArrayReader.cpp 2024-11-09 02:12:17.000000000 +0000 +++ redatam.new/src/redatamlib/readers/ByteArrayReader.cpp 2024-11-14 21:25:54.000000000 +0000 @@ -175,23 +175,27 @@ } uint16_t ByteArrayReader::ReadInt16LE() { - return static_cast<uint16_t>(ReadByte()) | - (static_cast<uint16_t>(ReadByte()) << 8); + uint16_t a = static_cast<uint16_t>(ReadByte()); + uint16_t b = static_cast<uint16_t>(ReadByte()) << 8; + return a | b; } uint32_t ByteArrayReader::ReadInt32LE() { - return static_cast<uint32_t>(ReadInt16LE()) | - static_cast<uint32_t>(ReadInt16LE()) << 16; + uint32_t a = static_cast<uint32_t>(ReadInt16LE()); + uint32_t b = static_cast<uint32_t>(ReadInt16LE()) << 16; + return a | b; } uint16_t ByteArrayReader::ReadInt16BE() { - return (static_cast<uint16_t>(ReadByte()) << 8) | - static_cast<uint16_t>(ReadByte()); + uint16_t a= (static_cast<uint16_t>(ReadByte()) << 8); + uint16_t b= static_cast<uint16_t>(ReadByte()); + return a| b; } uint32_t ByteArrayReader::ReadInt32BE() { - return (static_cast<uint32_t>(ReadInt16BE()) << 16) | - static_cast<uint32_t>(ReadInt16BE()); + uint32_t b = static_cast<uint32_t>(ReadInt16LE()) << 16; + uint32_t a = static_cast<uint32_t>(ReadInt16LE()); + return b | a; } } // namespace RedatamLib ...and this seems to make the error vanish. I think I see the misordering too. In the output of objdump -d redatam.Rcheck/redatam/libs/redatam.so, I see: 0000000000267010 <_ZN10RedatamLib15ByteArrayReader11ReadInt16LEEv>: 267028: e8 93 6f f5 ff call 1bdfc0 <_ZN10RedatamLib15ByteArrayReader8ReadByteEv@plt> first_byte <- ReadByte() 26702d: 41 89 c4 mov %eax,%r12d save first byte in r12 26703d: 41 c1 e4 08 shl $0x8,%r12d left-shift the first byte! 267041: e8 7a 6f f5 ff call 1bdfc0 <_ZN10RedatamLib15ByteArrayReader8ReadByteEv@plt> second byte in eax 26704a: 44 09 e0 or %r12d,%eax OR them together ...which is how you read big-endian numbers, not little-endian ones. -- Best regards, Ivan [[alternative HTML version deleted]]
______________________________________________ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel