SOLVED. Thanks to a reply off-list, it appears that the 'space' in "published 11" is actually some kind of multibyte character rather than an ordinary space. If I delete that 'space' and retype it with the spacebar on my keyboard, strsplit() behaves as expected.
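A minimal sketch of how this diagnosis could be checked, and worked around in code instead of by retyping, is below. It assumes the pasted character is a non-breaking space (U+00A0), which is common in text copied from web pages; the thread only says "some kind of multibyte character", so the pattern may need changing. It uses only base R functions (`utf8ToInt()`, `enc2utf8()`, `gsub()`, `strsplit()`) and assumes a reasonably recent R, newer than the 2.9.x releases in this thread.

```r
# Hypothetical reproduction: the second string contains a non-breaking space
# (U+00A0) after "published", standing in for the character pasted from the
# web page. The exact character in the original data is an assumption.
txt <- c("sales to 23 August 2008 published 29 August",
         "sales to 6 September 2008 published\u00a011 September")

# With the odd character present, the fixed split only matches the first string.
before <- strsplit(txt, "published ", fixed = TRUE)

# Find the positions of any non-ASCII characters (code point > 127) in each string.
odd_chars <- lapply(txt, function(s) {
  cp <- utf8ToInt(enc2utf8(s))
  which(cp > 127L)
})

# Replace non-breaking spaces with ordinary spaces, then split again.
clean <- gsub("\u00a0", " ", txt, fixed = TRUE)
after <- strsplit(clean, "published ", fixed = TRUE)

print(before)
print(odd_chars)
print(after)
```

Here `odd_chars` flags the character just after "published" in the second string, and after the substitution both strings split into two pieces. If the offending character turns out to be something other than U+00A0, put its code point in the `gsub()` pattern instead.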
I had copied the text from a hyperlink and pasted it into R, and it did not occur to me that the 'spaces' might be something else. However, I am surprised that it worked for both of the kind posters above. Perhaps I'm just unlucky with the locale settings on my Vista PC :S

Cheers everyone, much appreciated!
Tony

On 8 Sep, 11:57, Tony Breyal <tony.bre...@googlemail.com> wrote:
> UPDATE:
>
> I'm not sure why, but on my Windows XP 64bit machine I ran the same
> code again, and this time it is not working even though it worked
> previously. This was done using the Rgui --vanilla command.
>
> > x <- c("Weekly sales figures to 30 August 2008 published 5 September",
> > "Weekly sales figures to 6 September 2008 published 11 September")
> > strsplit(x, 'published ', fixed=TRUE)
> [[1]]
> [1] "Weekly sales figures to 30 August 2008 "
> [2] "5 September"
>
> [[2]]
> [1] "Weekly sales figures to 6 September 2008 published 11 September"
>
> O/S: Windows XP 64bit Pro; Service Pack 2
> > sessionInfo()
> R version 2.9.2 (2009-08-24)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> On 8 Sep, 09:47, Tony Breyal <tony.bre...@googlemail.com> wrote:
> > After further investigation it appears that the problem is specific to
> > my Vista PC. I am able to get the correct results using R 2.9.2 on a
> > Windows XP 64bit machine. However, I do not know why this does not work
> > on my Vista PC. The following was done after rebooting Vista.
> >
> > From CMD.exe I ran the following line:
> >
> > C:\Program Files\R\R-2.9.2\bin>Rgui --vanilla
> >
> > This opened up R.
> >
> > ### R 2.9.2 START ###
> > > txt <- c("sales to 23 August 2008 published 29 August",
> > + "sales to 6 September 2008 published 11 September")
> > > strsplit(txt, 'published', fixed=TRUE)
> > [[1]]
> > [1] "sales to 23 August 2008 " " 29 August"
> >
> > [[2]]
> > [1] "sales to 6 September 2008 " " 11 September"
> >
> > > strsplit(txt, 'published ', fixed=TRUE)
> > [[1]]
> > [1] "sales to 23 August 2008 " "29 August"
> >
> > [[2]]
> > [1] "sales to 6 September 2008 published 11 September"
> >
> > > sessionInfo()
> > R version 2.9.2 (2009-08-24)
> > i386-pc-mingw32
> >
> > locale:
> > LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
> >
> > attached base packages:
> > [1] stats graphics grDevices utils datasets methods base
> >
> > ### R 2.9.2 END ###
> >
> > The exact same thing happened when I used R 2.9.0 and R 2.8.1 on this
> > same Vista computer.
> >
> > ### R 2.9.0 ###
> > > sessionInfo()
> > R version 2.9.0 (2009-04-17)
> > i386-pc-mingw32
> >
> > locale:
> > LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
> >
> > attached base packages:
> > [1] stats graphics grDevices datasets utils methods base
> >
> > other attached packages:
> > [1] rcom_2.1-3 rscproxy_1.3-1
> >
> > loaded via a namespace (and not attached):
> > [1] tools_2.9.0
> >
> > ### R 2.8.1 ###
> > > sessionInfo()
> > R version 2.8.1 (2008-12-22)
> > i386-pc-mingw32
> >
> > locale:
> > LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
> >
> > attached base packages:
> > [1] stats graphics grDevices utils datasets methods base
> >
> > My computer details are:
> > Windows Vista Ultimate
> > Service Pack 1
> > Manufacturer: Dell
> > Rating: 3.4
> > Processor: Intel Core 2 Duo CPU E6750 @ 2.66 GHz
> > Memory (RAM): 4.00 GB
> > System type: 32-bit Operating System
> >
> > 2009/9/8 Gabor Grothendieck <ggrothendi...@gmail.com>:
> > > I am using the exact same version of R as you, also on Vista,
> > > but can't reproduce your result. For me it splits properly.
> > >
> > > Try starting R like this (modify path if needed) from the
> > > Windows cmd line:
> > >
> > > \Program Files\R\R-2.9.2\bin\Rgui --vanilla
> > >
> > > and then try it.
> > >
> > > On Mon, Sep 7, 2009 at 11:40 AM, Tony Breyal <tony.bre...@googlemail.com> wrote:
> > >> Dear all,
> > >>
> > >> I'm having a problem understanding why a split does not occur within
> > >> the 2nd use of the function strsplit below:
> > >>
> > >> # text strings
> > >>> txt <- c("sales to 23 August 2008 published 29 August",
> > >> + "sales to 6 September 2008 published 11 September")
> > >>
> > >> # first use
> > >>> strsplit(txt, 'published', fixed=TRUE)
> > >> [[1]]
> > >> [1] "sales to 23 August 2008 " " 29 August"
> > >>
> > >> [[2]]
> > >> [1] "sales to 6 September 2008 " " 11 September"
> > >>
> > >> # second use, but with a space ' ' in the split
> > >>> strsplit(txt, 'published ', fixed=TRUE)
> > >> [[1]]
> > >> [1] "sales to 23 August 2008 " "29 August"
> > >>
> > >> [[2]]
> > >> [1] "sales to 6 September 2008 published 11 September"
> > >>
> > >> Thank you kindly for any help in advance.
> > >> Tony
> > >>
> > >> O/S: Win Vista Ultimate
> > >>> sessionInfo()
> > >> R version 2.9.2 (2009-08-24)
> > >> i386-pc-mingw32
> > >>
> > >> locale:
> > >> LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
> > >>
> > >> attached base packages:
> > >> [1] stats graphics grDevices utils datasets methods base
> > >>
> > >> other attached packages:
> > >> [1] RODBC_1.3-0
> > >>
> > >> ______________________________________________
> > >> r-h...@r-project.org mailing list
> > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > >> and provide commented, minimal, self-contained, reproducible code.
> >
> > --
> > Tony Breyal