Dear R Core Team and R-devel Community,

I hope this message finds you well. I am writing to propose an enhancement to 
the `as.character()` function in R's base package to address an inconsistency 
with `as.numeric()` when handling high-precision floating-point numbers. This 
issue has practical implications for code reliability, especially in scientific 
computing and data analysis, and I believe a small adjustment could align the 
behavior more closely with modern user expectations and R's evolving use cases.


Problem Description
The current behavior of `as.character()` and `as.numeric()` leads to logical 
inconsistencies when converting high-precision decimal strings. For example, 
consider the string `"7.999999999999999111822"` (22 significant digits):


- `as.numeric("7.999999999999999111822")` converts this to a double-precision 
floating-point number (per IEEE 754), which is stored as approximately 
`7.9999999999999991118` (verifiable with `print(x, digits = 20)`). The 
difference from 8 (`8 - x ≈ 8.88178e-16`) equals `2^-50`, that is 
`4 * .Machine$double.eps`, which is one unit in the last place for doubles 
between 4 and 8. The stored value is therefore the largest double below 8, and 
it is not rounded to `8.0`.
- However, `as.character(as.numeric("7.999999999999999111822"))` returns `"8"`, 
simplifying the value and losing the small difference. This leads to a 
mismatch: `x < 8` is `TRUE`, but `as.numeric(as.character(x)) == 8` is also 
`TRUE`.
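
The observations above can be checked with a few lines of R. This is a minimal sketch that assumes a platform with IEEE 754 doubles; the variable name `gap` is only illustrative, and the output of `as.character(x)` is left uncommented because it may depend on the R version in use.

```r
# Parse the example string; on a typical IEEE 754 platform this is expected
# to give the largest double below 8.
x <- as.numeric("7.999999999999999111822")

print(x, digits = 20)       # stored value shown to 20 significant digits
gap <- 8 - x                # distance from the stored value to 8
gap / .Machine$double.eps   # the same distance in units of machine epsilon
x < 8                       # comparison on the stored double
as.character(x)             # result may depend on the R version
```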


This inconsistency arises because `as.numeric()` keeps the full precision of 
the IEEE 754 double (15 to 17 significant decimal digits), while 
`as.character()` produces a shorter, human-readable representation 
(historically 15 significant digits). A value that differs from 8 only beyond 
the 15th significant digit is therefore rendered as `"8"`.
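
The gap between 15 and 17 significant digits can be seen directly with `sprintf()`. This is only an illustrative sketch: it builds the value as `8 - 2^-50` instead of parsing the string, and it uses `sprintf()` as a stand-in for the rounding step, not as a description of how `as.character()` is implemented internally.

```r
# Largest double below 8, constructed directly (assumption: this is the
# value the example string parses to).
x <- 8 - 2^-50

s15 <- sprintf("%.15g", x)   # 15 significant digits
s17 <- sprintf("%.17g", x)   # 17 significant digits

s15                          # "8"
s17                          # "7.9999999999999991"
as.numeric(s15) == 8         # TRUE: the 15-digit string parses to exactly 8
```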


Proposed Solution
I suggest either of the following enhancements to improve consistency:


1. Swap the Functionality of `format()` and `as.character()`:
   - Redefine `as.character(x)` to inherit `format()`'s behavior, 
     providing a default precision (e.g., `digits = 17`) to match the effective 
     decimal precision of double-precision floats. This would output 
     `"7.9999999999999991"` for the example above.
   - Redefine `format(x)` to inherit `as.character()`'s current 
     behavior, serving as a utility for concise, human-readable output 
     (e.g., `"8"`).
   - Naming would then align with intent: `as.character()` for type 
     conversion with precision, `format()` for formatting adjustments.


2. Add a `digits` Parameter to `as.character()`:
   - Extend `as.character()` to accept a `digits` argument 
     (defaulting to `NULL` for current behavior, or e.g., `17` for precision 
     matching). Example:


     x <- as.numeric("7.999999999999999111822")
     as.character(x, digits = 17)  # "7.9999999999999991"
     as.character(x)               # "8" (current default)


   - This would allow users to opt for precise conversion while 
     preserving backward compatibility.
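
To make Option 2 concrete, here is a rough user-level sketch of the intended semantics. The function name `as_character_digits()` is hypothetical and not part of base R, and building the result with `sprintf()` is an assumption made for illustration; an actual patch to `as.character()` would live in R's C sources and might look quite different.

```r
# Hypothetical helper (not part of base R): behaves like as.character()
# unless `digits` is supplied, in which case that many significant digits
# are kept.
as_character_digits <- function(x, digits = NULL) {
  if (is.null(digits)) {
    return(as.character(x))            # unchanged default behavior
  }
  stopifnot(is.numeric(x), length(digits) == 1, digits >= 1, digits <= 22)
  fmt <- paste0("%.", as.integer(digits), "g")
  out <- sprintf(fmt, as.double(x))
  out[is.na(x) & !is.nan(x)] <- NA_character_   # keep NA as a missing string
  out
}

x <- 8 - 2^-50                         # largest double below 8
as_character_digits(x, digits = 17)    # "7.9999999999999991"
```

Keeping `digits = NULL` as the default is what preserves backward compatibility in this sketch: callers that never pass `digits` get exactly what `as.character()` returns today.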


Rationale
- Consistency: `as.numeric()` and `as.character()` are similarly named base 
functions, suggesting they should follow analogous precision rules. The current 
discrepancy violates the expectation of round-trip consistency (string → 
numeric → string).
- Modern Use Cases: With R's growing use in scientific computing and data 
science, high-precision handling is increasingly critical. The proposed change 
aligns R with tools like Python (`str(float(x))` retains more precision) and 
NumPy.
- User Experience: Explicit control via `digits` or a redefined 
`as.character()` would reduce confusion, especially for users relying on type 
conversion for logical operations.


Use Case
Consider a data validation script:


s1 <- "7.999999999999999111822"
x <- as.numeric(s1)
if (x < 8) print("Less than 8")  # TRUE, correct
if (as.numeric(as.character(x)) == 8) print("Equal to 8")  # TRUE, inconsistent


The second condition is unexpectedly `TRUE` because `as.character(x)` 
simplifies the value to `"8"`. With the proposed change (e.g., 
`as.character(x, digits = 17)`), the round-tripped value would stay below 8, 
so both checks would agree with the stored value (`< 8`).
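
For comparison, a validation script can keep more digits today by going through `format()` with an explicit `digits` argument. This is a sketch of that workaround rather than of the proposed change; the name `s` is illustrative, and the value is again built as `8 - 2^-50` on the assumption that the example string parses to it.

```r
x <- 8 - 2^-50                 # largest double below 8

# Ask for 17 significant digits explicitly instead of relying on the
# default conversion.
s <- format(x, digits = 17)
s                              # a string that still starts with "7.99..."
s != "8"                       # the difference from 8 is kept in the text
```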


Implementation Considerations
- Backward Compatibility: Option 2 (adding `digits`) is less disruptive, 
allowing existing code to use the default `as.character()` behavior. Option 1 
requires a transition period or deprecation notice.
- Performance: High-precision formatting may have minor overhead, but this is 
negligible for modern hardware.
- Documentation: Clear guidance on the new `digits` parameter or redefined 
roles would be essential.


Next Steps
I would be happy to assist with testing or drafting a patch if this proposal 
gains traction. Please let me know your thoughts or any additional 
considerations. This issue was identified with the help of Grok (xAI), and I 
believe community feedback could refine the approach.


Thank you for your time and the incredible work on R!


Best regards


龙华
longhua...@foxmail.com

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
