I always find regex puzzles amusing, so after changing the Unicode typographic quotes and dashes to ASCII, the following simple prescription, similar to those proffered by others, seems to produce what you requested with your example:

x <- c("leucocyten + gramnegatieve staven +++ grampositieve staven ++",
       "leucocyten - grampositieve coccen +")
strsplit(gsub("([^[:alnum:]]) ", "\\1>>", x), ">>")

(You can use unlist on this if you wish.)

My slight variant uses a character class and a backreference to identify where you want to split. The '>>' substitute used as the split string is, of course, purely arbitrary. Instead of [^[:alnum:]] you could probably use [+-], but I only wished to assume some sort of non-alphanumeric character.

I mention in passing that the above also seemed to work when I kept the en dash instead of a minus sign, but I make no claim for superiority -- or even noninferiority -- to the solutions proposed by others.

Cheers,
Bert

On Wed, Apr 12, 2023 at 2:52 PM David Winsemius <dwinsem...@comcast.net> wrote:
>
> I thought replacing the spaces following instances of +++, ++, +, - with "\n"
> and then reading with scan should succeed. Like Ivan Krylov, I was fairly sure
> that you meant the minus sign to be "-" rather than "–", but perhaps you
> were using MS Word as an editor, which is inconsistent with effective use of
> R. If so, learn to use a proper programming editor, and in any case learn to
> post to rhelp in plain text.
>
> --
> David
>
> scan(text=gsub("([-+]){1}\\s", "\\1\n", dat), what="", sep="\n")
>
>
> > On Apr 12, 2023, at 2:29 AM, Emily Bakker <emilybak...@outlook.com> wrote:
> >
> > Hello List,
> >
> > I have a dataset consisting of strings that I want to split while saving
> > the delimiter.
> >
> > Some example data:
> > “leucocyten + gramnegatieve staven +++ grampositieve staven ++”
> > “leucocyten – grampositieve coccen +”
> >
> > I want to split the strings such that I get the following result:
> > c(“leucocyten +”, “gramnegatieve staven +++”, “grampositieve staven ++”)
> > c(“leucocyten –“, “grampositieve coccen +”)
> >
> > I have tried strsplit with a regular expression with a positive lookahead,
> > but I am not able to achieve the results that I want.
> >
> > I have tried:
> > as.list(strsplit(x, split = “(?=[\\+-]{1,3}\\s)+, perl=TRUE)
> >
> > Which results in:
> > c(“leucocyten “, “+”, “gramnegatieve staven “, “+”, “+”, “+”,
> > “grampositieve staven ++”)
> > c(“leucocyten “, “–“, “grampositieve coccen +”)
> >
> >
> > Is there a function or regular expression that will make this possible?
> >
> > Kind regards,
> > Emily
> >
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
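
For comparison, the lookaround idea from the original question can be made to work by turning the lookahead into a lookbehind. This is a sketch, not taken from any of the replies. It assumes, as Bert did, that the en dash has been replaced by an ASCII minus, and it splits on a single whitespace character immediately preceded by "+" or "-". On this example data only, it also checks that the result matches Bert's gsub()/strsplit() version.

```r
# Example data from the question; assumes the en dash was replaced by an ASCII minus
x <- c("leucocyten + gramnegatieve staven +++ grampositieve staven ++",
       "leucocyten - grampositieve coccen +")

# Split on one whitespace character that directly follows "+" or "-".
# (?<=[+-]) is a zero-width lookbehind, so the signs stay attached to the
# piece on the left and only the space itself is consumed by the match.
# Lookbehind is a PCRE feature, hence perl = TRUE.
res <- strsplit(x, "(?<=[+-])\\s", perl = TRUE)

# Bert's gsub()/strsplit() approach from the reply, for comparison
res_bert <- strsplit(gsub("([^[:alnum:]]) ", "\\1>>", x), ">>")

stopifnot(identical(res, res_bert))
print(res)
```

One caveat: strsplit() hands only the not-yet-split remainder of each string back to the regex engine, so a lookbehind cannot see characters before the previous split point. That is harmless here, because each match consumes the space after the signs. If the data keep the en dash, it would presumably have to be added to the character class, whereas Bert's [^[:alnum:]] already matches it.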