I just wanted to confirm that Milan's suggestion of adding (*UCP), as in
the example below:
gsub(sprintf("(*UCP)\\b(%s)\\b", "който"), "", "който", perl=TRUE)
solved all problems (under openSUSE Linux 12.3 64-bit, R 2.15.2). I re-encoded
the input files and the stop word list in UTF-8, and now stop
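A minimal sketch of how the (*UCP) fix extends from one word to a whole stop word list. The helper name `remove_words_ucp()` is hypothetical and this is not tm's own code; it joins the list into one alternation wrapped in `\\b` boundaries. It assumes the text and the stop words are already UTF-8 and that R runs in a UTF-8 locale.

```r
# Hypothetical helper: remove whole words from a character vector.
# (*UCP) makes PCRE use Unicode properties for \w and \b, so Cyrillic
# letters count as word characters and \b matches at Bulgarian word edges.
remove_words_ucp <- function(x, words) {
  # Put longer words first so overlapping alternatives prefer the longest one
  words <- words[order(nchar(words), decreasing = TRUE)]
  pattern <- sprintf("(*UCP)\\b(%s)\\b", paste(words, collapse = "|"))
  gsub(pattern, "", x, perl = TRUE)
}

bg <- c("Днес е хубав и слънчев ден, в който всички искат да бъдат навън.",
        "Утре ще бъде още по-хубав ден.")
stop_words <- c("е", "и", "в", "който", "ще", "още")

cleaned <- remove_words_ucp(bg, stop_words)
# Collapse the runs of spaces left behind by the removed words
cleaned <- trimws(gsub("\\s+", " ", cleaned, perl = TRUE))
print(cleaned)
```

Because of the `\\b` boundaries, "ще" is not stripped out of the middle of "още", and single letters such as "в" are removed only when they stand alone.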
On Wednesday 10 April 2013 at 13:17 +0200, Ingo Feinerer wrote:
> On Wed, Apr 10, 2013 at 10:29:27AM +0200, Milan Bouchet-Valat wrote:
> > Thanks for the reproducible example. Indeed, it does not work here
> > either (Linux with UTF-8 locale). The problem seems to be in the call to
> > gsub() in r
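To look at the gsub() problem in isolation, here is a small check (an illustration, not code from the thread) that extracts `\\w+` runs from a Bulgarian string with and without the (*UCP) prefix. PCRE by default treats only ASCII characters as word characters for `\\w` and `\\b`; (*UCP) switches them to Unicode properties. R does not appear to enable that option by default, but since this may vary with the R/PCRE build, only the (*UCP) result is relied on here.

```r
# Compare what PCRE considers "words" with and without (*UCP).
txt <- "Утре ще бъде още по-хубав ден."

# Without (*UCP): \w is ASCII-only in PCRE by default, so the Cyrillic
# words may not be found at all (character(0) in that case).
plain <- regmatches(txt, gregexpr("\\w+", txt, perl = TRUE))[[1]]

# With (*UCP): \w follows Unicode properties, so each Cyrillic word matches;
# the hyphen and the final period are not word characters.
ucp <- regmatches(txt, gregexpr("(*UCP)\\w+", txt, perl = TRUE))[[1]]

print(plain)
print(ucp)
```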
Thank you so much! You made it look (almost) so easy. I greatly
appreciate it!
On 10.4.2013, 11:29, Milan Bouchet-Valat wrote:
On Wednesday 10 April 2013 at 10:50 +0300, Ventseslav Kozarev, MPP
wrote:
Hi,
Thanks for taking the time. Here is a more reproducible example of the
entire proc
On Wednesday 10 April 2013 at 10:50 +0300, Ventseslav Kozarev, MPP
wrote:
> Hi,
>
> Thanks for taking the time. Here is a more reproducible example of the
> entire process:
>
> # Creating a vector source - stupid text in the Bulgarian language
> bg<-c('Днес е хубав и слънчев ден, в който всички
Hi,
Thanks for taking the time. Here is a more reproducible example of the
entire process:
# Creating a vector source - stupid text in the Bulgarian language
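bg <- c('Днес е хубав и слънчев ден, в който всички искат да бъдат навън.',
        'Утре ще бъде още по-хубав ден.')
# ("Today is a nice and sunny day on which everyone wants to be outside.",
#  "Tomorrow will be an even nicer day.")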
# Converting strings from
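The conversion step in this example is cut off in the excerpt, so the exact call is not known. Since the confirmed fix involved re-encoding the input files and the stop word list in UTF-8, here is a separate sketch of that step with base R's iconv(). The helper name `to_utf8()` and the source encoding CP1251 (Windows Cyrillic) are assumptions for illustration only; use whatever encoding the files were actually saved in.

```r
# Hypothetical helper: convert a character vector to UTF-8 and stop if any
# element could not be converted (iconv() returns NA for such elements).
to_utf8 <- function(x, from) {
  out <- iconv(x, from = from, to = "UTF-8")
  bad <- is.na(out) & !is.na(x)
  if (any(bad)) stop("could not convert ", sum(bad), " element(s) from ", from)
  out
}

# Simulate a stop word list stored in CP1251 (an assumed legacy encoding)
stop_words_utf8 <- c("който", "ще", "още")
stop_words_cp1251 <- iconv(stop_words_utf8, from = "UTF-8", to = "CP1251")

restored <- to_utf8(stop_words_cp1251, from = "CP1251")
print(Encoding(restored))            # inspect the declared encoding
stopifnot(all(validUTF8(restored)))  # every element is valid UTF-8
```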
On Tuesday 9 April 2013 at 10:10 +0300, Ventseslav Kozarev, MPP wrote:
> Hi,
>
> I bumped into a serious issue while trying to analyse some texts in
> Bulgarian (with the tm package). I import a tab-separated CSV
> file, which holds a total of 22 variables, most of which are text cells
Hi,
I bumped into a serious issue while trying to analyse some texts in
Bulgarian (with the tm package). I import a tab-separated CSV
file, which holds a total of 22 variables, most of which are text cells
(not factors), using the read.delim function:
data<-read.delim("bigcompanies_
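The read.delim() call is truncated here, so its actual arguments are unknown. Below is a minimal sketch of reading a tab-separated, UTF-8 encoded file so that the text columns stay character vectors and are decoded correctly. The file contents and column names are made up for illustration, and a UTF-8 locale is assumed.

```r
# Write a small tab-separated UTF-8 file to a temporary location
# (made-up contents standing in for the real input file)
tmp <- tempfile(fileext = ".tsv")
writeLines(c("id\ttext",
             "1\tДнес е хубав и слънчев ден.",
             "2\tУтре ще бъде още по-хубав ден."),
           tmp, useBytes = TRUE)

# fileEncoding tells R how the file is encoded; stringsAsFactors = FALSE
# keeps the text cells as character vectors rather than factors
data <- read.delim(tmp, fileEncoding = "UTF-8", stringsAsFactors = FALSE)
str(data)
```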