Dear R Core Team and R-devel Community,

I hope this message finds you well. I am writing to propose an enhancement to the `as.character()` function in R's base package to address an inconsistency with `as.numeric()` when handling high-precision floating-point numbers. The issue affects code reliability, especially in scientific computing and data analysis, and I believe a small adjustment could bring the behavior closer to what users expect today.

Problem Description

The current behavior of `as.character()` and `as.numeric()` produces a logical inconsistency when converting high-precision decimal strings. For example, consider the string `"7.999999999999999111822"` (22 significant digits):

- `as.numeric("7.999999999999999111822")` converts it to a double-precision floating-point number (per IEEE 754), stored as approximately `7.9999999999999991118` (verifiable with `print(x, digits = 20)`). The difference from 8 (`8 - x ≈ 8.88178e-16`) equals `4 * .Machine$double.eps`, which is the spacing between adjacent doubles just below 8. So `x` is the largest double less than 8 and is not rounded to `8.0`.
- However, `as.character(as.numeric("7.999999999999999111822"))` returns `"8"`, which discards that small difference.

This leads to a mismatch: `x < 8` is `TRUE`, but `as.numeric(as.character(x)) == 8` is also `TRUE`.

The inconsistency arises because `as.numeric()` preserves the precision of the IEEE 754 double (roughly 15 to 17 significant decimal digits), while `as.character()`, as described in `?as.character`, represents doubles with 15 significant digits. A value that differs from 8 only beyond the 15th digit is therefore rendered as `"8"`.

Proposed Solution

I suggest either of the following enhancements to improve consistency:

1. Swap the Functionality of `format()` and `as.character()`:
   - Redefine `as.character(x)` to inherit `format()`'s behavior, providing a default precision (e.g., `digits = 17`) to match the effective decimal precision of double-precision floats.
     This would output `"7.9999999999999991"` (17 significant digits) for the example above.
   - Redefine `format(x)` to inherit `as.character()`'s current behavior, serving as a utility for concise, human-readable output (e.g., `"8"`).
   - Naming would then align with intent: `as.character()` for type conversion with precision, `format()` for formatting adjustments.

2. Add a `digits` Parameter to `as.character()`:
   - Extend `as.character()` to accept a `digits` argument (defaulting to `NULL` for the current behavior, with e.g. `digits = 17` for precision matching). Example:

         x <- as.numeric("7.999999999999999111822")
         as.character(x, digits = 17)  # "7.9999999999999991" (proposed)
         as.character(x)               # "8" (current default)

   - This would allow users to opt in to precise conversion while preserving backward compatibility.

Rationale

- Consistency: `as.numeric()` and `as.character()` are similarly named base functions, suggesting they should follow analogous precision rules. The current discrepancy violates the expectation of round-trip consistency (string → numeric → string).
- Modern Use Cases: With R's growing use in scientific computing and data science, high-precision handling is increasingly critical. The proposed change aligns R with tools like Python (where `str(float(s))` returns a string that converts back to the same double) and NumPy.
- User Experience: Explicit control via `digits` or a redefined `as.character()` would reduce confusion, especially for users who rely on type conversion in logical comparisons.

Use Case

Consider a data validation script:

    s1 <- "7.999999999999999111822"
    x <- as.numeric(s1)
    if (x < 8) print("Less than 8")                            # TRUE, correct
    if (as.numeric(as.character(x)) == 8) print("Equal to 8")  # TRUE, inconsistent

The second condition is also `TRUE` because `as.character(x)` simplifies the value to `"8"`, so the script reports both "Less than 8" and "Equal to 8" for the same number. With the proposed change (e.g., `as.character(x, digits = 17)`), both checks would agree with the stored value (`< 8`).

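For comparison, a precision-preserving conversion can already be approximated with `sprintf("%.17g", x)`, since 17 significant digits are enough to identify an IEEE 754 double uniquely. The sketch below wraps that call in a helper named `as_character_exact`, which is a hypothetical name and not part of base R. It assumes IEEE 754 doubles and a string-to-double conversion that is accurate to the last bit, which I have not checked on every platform.

```r
# Hypothetical helper (NOT part of base R): format doubles with
# 17 significant digits so that as.numeric() can recover the value.
as_character_exact <- function(x) sprintf("%.17g", x)

x <- as.numeric("7.999999999999999111822")

# String form that keeps the 16th and 17th significant digits.
s_exact <- as_character_exact(x)
print(s_exact)

# Convert the string back to a double.
x_back <- as.numeric(s_exact)

# Checks whether the stored value is below 8, as described above.
print(x < 8)

# Checks whether the round trip numeric -> string -> numeric is lossless.
print(identical(x_back, x))

# Checks whether the comparison against 8 stays the same after the round trip.
print((x_back < 8) == (x < 8))
```

This is roughly what option 2 would do internally for `digits = 17`; the difference is that the proposal would make it available through `as.character()` itself rather than through a separate formatting call.
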
Implementation Considerations

- Backward Compatibility: Option 2 (adding `digits`) is less disruptive, allowing existing code to use the default `as.character()` behavior. Option 1 requires a transition period or deprecation notice.
- Performance: High-precision formatting may have minor overhead, but this is negligible for modern hardware.
- Documentation: Clear guidance on the new `digits` parameter or redefined roles would be essential.

Next Steps

I would be happy to assist with testing or drafting a patch if this proposal gains traction. Please let me know your thoughts or any additional considerations.

This issue was identified with the help of Grok (xAI), and I believe community feedback could refine the approach.

Thank you for your time and the incredible work on R!

Best regards
龙华
longhua...@foxmail.com

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel