Dear Rui,

Thank you for your reply.

I actually do have access to the chemical symbols: I have started to refactor and enhance the Rpdb package; see Rpdb::elements:
https://github.com/discoleo/Rpdb

However, the regex that you have constructed is quite heavy, as it needs to iterate through all chemical symbols (in decreasing nchar). Elements like C, and especially O, P or S, appear late in the regex, although they are very common in chemistry.

The alternative regex is (in this respect) simpler. It actually works (once you know about the workaround).

Q: My question was whether there is anything like is.numeric that tests each element of a character vector separately, rather than the vector as a whole.
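
To make the question concrete, here is a minimal sketch of the kind of element-wise test I have in mind. It assumes that only unsigned integer counts need to be recognised (no signs, decimals or exponents, which as.numeric would also accept); the name is.digits is hypothetical and not an existing base R function.

```r
# Hypothetical helper (not part of base R): TRUE for each element
# that consists only of ASCII digits; no warnings are generated.
is.digits <- function(x) grepl("^[0-9]+$", x)

x <- c("Li", "Na", "K", "2", "Rb", "Ca", "3")
is.digits(x)
# [1] FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE

# Keep only the non-numeric tokens
x[!is.digits(x)]
# [1] "Li" "Na" "K"  "Rb" "Ca"
```

This avoids suppressWarnings, but it is narrower than as.numeric: a string such as "2.5" or "-1" would be reported as not numeric.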

Sincerely,


Leonard


On 10/18/2023 6:53 PM, Rui Barradas wrote:
At 15:59 on 18/10/2023, Leonard Mada via R-help wrote:
Dear List members,

What is the best way to test for numeric digits?

suppressWarnings(as.double(c("Li", "Na", "K",  "2", "Rb", "Ca", "3")))
# [1] NA NA NA  2 NA NA  3
The above requires the use of the suppressWarnings function. Are there
any better ways?

I was working to extract chemical elements from a formula, something
like this:
split.symbol.character = function(x, rm.digits = TRUE) {
    # Perl is partly broken in R 4.3, but this works:
    regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
    # stringi::stri_split(x, regex = regex);
    s = strsplit(x, regex, perl = TRUE);
    if(rm.digits) {
        # Drop the tokens that parse as numbers (the atom counts)
        s = lapply(s, function(s) {
            isNotD = is.na(suppressWarnings(as.numeric(s)));
            s[isNotD];
        });
    }
    return(s);
}

split.symbol.character(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"))


Sincerely,


Leonard


Note:
# works:
regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = TRUE)


# broken in R 4.3.1
# only slightly "erroneous" with stringi::stri_split
regex = "(?<=[A-Z])(?![a-z]|$)|(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = TRUE)

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hello,

If you want to extract chemical element symbols, the following might work.
It uses the periodic table from the GitHub package chemr and a function
from the stringr package.


devtools::install_github("paleolimbot/chemr")



split_chem_elements <- function(x) {
    data(pt, package = "chemr", envir = environment())
    el <- pt$symbol[order(nchar(pt$symbol), decreasing = TRUE)]
    pat <- paste(el, collapse = "|")
    stringr::str_extract_all(x, pat)
}

mol <- c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl")
split_chem_elements(mol)
#> [[1]]
#> [1] "C"  "Cl" "F"
#>
#> [[2]]
#> [1] "Li" "Al" "H"
#>
#> [[3]]
#>  [1] "C"  "Cl" "C"  "O"  "Al" "P"  "O"  "Si" "O"  "Cl"


It is also possible to rewrite the function without calls to non-base
packages, but that will take some more work.
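
A minimal base R sketch of such a rewrite, using gregexpr and regmatches in place of stringr::str_extract_all. The short symbols vector is a hypothetical stand-in for a full periodic table (for example pt$symbol from chemr), and the function name split_chem_elements_base is only illustrative.

```r
# Hypothetical stand-in for a full table of element symbols
symbols <- c("H", "Li", "C", "O", "F", "Al", "Si", "P", "Cl")

split_chem_elements_base <- function(x, symbols) {
    # Longest symbols first, so that "Cl" is tried before "C"
    el <- symbols[order(nchar(symbols), decreasing = TRUE)]
    pat <- paste(el, collapse = "|")
    # gregexpr() locates all matches; regmatches() extracts them
    regmatches(x, gregexpr(pat, x))
}

mol <- c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl")
res <- split_chem_elements_base(mol, symbols)
res[[1]]
# [1] "C"  "Cl" "F"
```

The result is a list with one character vector per formula, as with the stringr version. Digits are simply never matched, so no numeric test is needed.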

Hope this helps,

Rui Barradas



