On Thu, 4 May 2023 23:59:33 +0300 Leonard Mada via R-help <r-help@r-project.org> wrote:

> strsplit("a bc,def, adef ,,gh", " |(?=,)|(?<=,)(?![ ])", perl=T)
> # "a" "bc" "," "def" "," "" "adef" "," "," "gh"
>
> strsplit("a bc,def, adef ,,gh", " |(?<! )(?=,)|(?<=,)(?![ ])", perl=T)
> # "a" "bc" "," "def" "," "" "adef" "," "," "gh"
>
> strsplit("a bc,def, adef ,,gh", " |(?<! )(?=,)|(?<=,)(?=[^ ])",
> perl=T)
> # "a" "bc" "," "def" "," "" "adef" "," "," "gh"
>
>
> Is this correct?

Perl seems to return the results you expect:

$ perl -E '
say("$_:\n ", join " ", map qq["$_"], split $_, q[a bc,def, adef ,,gh])
for (
 qr[ |(?=,)|(?<=,)(?![ ])],
 qr[ |(?<! )(?=,)|(?<=,)(?![ ])],
 qr[ |(?<! )(?=,)|(?<=,)(?=[^ ])]
)'
(?^u: |(?=,)|(?<=,)(?![ ])):
 "a" "bc" "," "def" "," "adef" "," "," "gh"
(?^u: |(?<! )(?=,)|(?<=,)(?![ ])):
 "a" "bc" "," "def" "," "adef" "," "," "gh"
(?^u: |(?<! )(?=,)|(?<=,)(?=[^ ])):
 "a" "bc" "," "def" "," "adef" "," "," "gh"

The same thing happens when I ask R to replace the separators instead
of splitting by them:

sapply(setNames(nm = c(
 " |(?=,)|(?<=,)(?![ ])",
 " |(?<! )(?=,)|(?<=,)(?![ ])",
 " |(?<! )(?=,)|(?<=,)(?=[^ ])")
), gsub, '[]', "a bc,def, adef ,,gh", perl = TRUE)
#              |(?=,)|(?<=,)(?![ ])        |(?<! )(?=,)|(?<=,)(?![ ])
# "a[]bc[],[]def[],[]adef[],[],[]gh" "a[]bc[],[]def[],[]adef[],[],[]gh"
#       |(?<! )(?=,)|(?<=,)(?=[^ ])
# "a[]bc[],[]def[],[]adef[],[],[]gh"

I think that something strange happens when the delimiter pattern
matches more than once in the same place:

gsub(
 '(?=<--)|(?<=-->)', '[]', 'split here --><-- split here',
 perl = TRUE
)
# [1] "split here -->[]<-- split here"

(Both Perl's split() and s///g agree with R's gsub() here, although I
would have accepted "split here -->[][]<-- split here" too.)

On the other hand, the following doesn't look right:

strsplit(
 'split here --><-- split here', '(?=<--)|(?<=-->)',
 perl = TRUE
)
# [[1]]
# [1] "split here -->" "<" "-- split here"

The "<" is definitely not followed by "<--", and the rightmost "--" is
definitely not preceded by "-->".
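
To make the expectation concrete, here is a minimal sketch that cuts
the string at the positions gregexpr() reports instead of relying on
strsplit(). split_at_matches() is a hypothetical helper written only
for this illustration, not part of base R, and it assumes that
gregexpr() finds the same single zero-length match that gsub()
replaced above:

```r
# Hypothetical helper (not part of base R): cut x at every match that
# gregexpr() reports and drop the matched text itself.
split_at_matches <- function(x, pattern) {
  m <- gregexpr(pattern, x, perl = TRUE)[[1]]
  if (m[1] == -1) return(x)          # no match: nothing to split
  len    <- attr(m, "match.length")  # 0 for lookaround-only matches
  starts <- c(1, m + len)            # each piece starts right after a match
  ends   <- c(m - 1, nchar(x))       # and ends just before the next match
  substring(x, starts, ends)
}

split_at_matches('split here --><-- split here', '(?=<--)|(?<=-->)')
```

If that assumption holds, the call gives the two pieces
"split here -->" and "<-- split here", which is what Perl's split()
behaviour would suggest. It is only a comparison aid, not a
replacement for strsplit().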

Perhaps strsplit() incorrectly advances the match position after one
match?

-- 
Best regards,
Ivan