Dear all, probably this is for Hervé Pagès:
I tried the following code, which should according to ?AAString not work, since ÜÖÄ are not part of any AA code. > AAString("ÜÄÖ") 3-letter "AAString" instance seq: ÜÄÖ > sessionInfo() R version 3.4.4 (2018-03-15) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows >= 8 x64 (build 9200) Matrix products: default locale: [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252 LC_NUMERIC=C [5] LC_TIME=German_Germany.1252 attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base other attached packages: [1] Biostrings_2.46.0 XVector_0.18.0 IRanges_2.12.0 S4Vectors_0.16.0 BiocGenerics_0.24.0 loaded via a namespace (and not attached): [1] zlibbioc_1.24.0 compiler_3.4.4 tools_3.4.4 yaml_2.1.18 I don’t have access right now to the devel version of Biostrings, bit I checked out the current Code in the github repo and its recent changes. I am pretty sure, that this behavior is also in the current devel branch. Can someone confirm this? My current interest is in using the XString classes and methods for an additional biological string representation. The initial question was, how can I restrict this to a certain character set, if the characters are not saved byte encoded? The latter option is not available to me, since characters like ‚«‘ or ‚=‘ result in a two byte code using the charToRaw function. This trips up the build of the internal lookup table, which are passed down to the C backend. Therefore I looked into, how this is done for an AAString differing from a BString. I discovered, that it currently doesn‘t. I also looked into the current 2.47.12 repo, which as far as I can tell does not use the AMINO_ACID_CODE constant in the creation of an AAString object. So my questions are: - What is the best practice for extending a class from XString with a restricted character set, which is not byte encoded? - Is there a way to use byte encoding for chars with two ore more bytes? Thanks in advance for any help and suggestions. Best regards, Felix PS: regarding the second question: One could change „as.integer(charToRaw(paste(letters, collapse="")))“ to „lapply(lapply(letters,charToRaw),as.integer)“ in .letterAsByteVal, but in any case it will not be atomic anymore, which I think is required to be excepted by the C backend. I didn’t test it. [[alternative HTML version deleted]] _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel