Thanks, David. Matevz, maybe I can help explain by taking a very simple, brute-force approach, as opposed to the way David did it. But you should learn his methods.
I will just do a subset of your problem. If you understand how it works, you should be able to get something done and then make it more elegant.

First, I simplify the problem by separating out the "sentence" column. You can do this with your data frame like so:

MySentence <- data.frame(sentence=as.character(yourbigDF$Opis), stringsAsFactors=FALSE)

So I take your original data.frame (yourbigDF) and create a copy of just that one column, $Opis. Your str() output shows that Opis is a factor, and strsplit() only accepts character vectors, hence the as.character(); stringsAsFactors=FALSE alone does not convert an existing factor. Later we can merge the two back together after I add 10 columns for the words.

Let's make some dummy data with just 10 rows:

sentence <- "this is a sentence with ten words or maybe more than ten words"
sentV <- rep(sentence, 10)
# now I have made 10 rows of the same sentence

# NEXT, because I am going to create 10 new columns of 10 rows, I create
# 10 vectors. Each is named and each has 10 elements, one per row.
# They hold no real data yet (vector(length=10) fills them with FALSE).
first=second=third=fourth=fifth=sixth=seventh=eighth=ninth=tenth <- vector(length=10)

# Next I create a data frame with Sentence in the first column and 10 blank columns.
# NOTE I use stringsAsFactors=FALSE
DF <- data.frame(Sentence=sentV, first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, tenth, stringsAsFactors=FALSE)

# This is what it looks like (the first row)
DF[1,]
                                                        Sentence first second third fourth fifth sixth seventh eighth ninth tenth
1 this is a sentence with ten words or maybe more than ten words FALSE  FALSE FALSE  FALSE FALSE FALSE   FALSE  FALSE FALSE FALSE

Next, I will show you how to assign the first ten words to the 10 blank columns:

DF[1,2:11] <- strsplit(DF[1,1], " ")[[1]][1:10]
# DF[1,2:11] selects columns 2-11 of the first row
# strsplit(...)[[1]][1:10] returns the first 10 words, which are placed in columns 2-11

If you want to do this the slow way, you can just loop through your data frame row by row, or you can probably use apply. Make more sense?
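The row-by-row loop mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions: `opis` is a made-up stand-in for yourbigDF$Opis, and the names `words`, `n_words` and `col_names` are invented for the example. Sentences with fewer than ten words are padded with NA.

```r
# Hypothetical stand-in for yourbigDF$Opis. The real column is a factor,
# so it is converted with as.character() before splitting.
opis <- factor(c("this is a sentence with ten words or maybe more than ten words",
                 "a short one"))
MySentence <- data.frame(sentence = as.character(opis), stringsAsFactors = FALSE)

n_words   <- 10
col_names <- c("first", "second", "third", "fourth", "fifth",
               "sixth", "seventh", "eighth", "ninth", "tenth")

# Pre-allocate a character matrix filled with NA: one row per sentence,
# one column per word position.
words <- matrix(NA_character_, nrow = nrow(MySentence), ncol = n_words,
                dimnames = list(NULL, col_names))

# The slow way: go row by row and copy in at most the first ten words.
for (i in seq_len(nrow(MySentence))) {
  w <- strsplit(MySentence$sentence[i], " ")[[1]]
  k <- min(length(w), n_words)
  if (k > 0) words[i, seq_len(k)] <- w[seq_len(k)]
}

# Put the sentence column and the ten word columns side by side.
DF <- cbind(MySentence, as.data.frame(words, stringsAsFactors = FALSE))
```

The same cbind() step is one way to attach the word columns to the full data frame, e.g. cbind(yourbigDF, as.data.frame(words, stringsAsFactors = FALSE)), since the rows stay in their original order. Note that splitting on a single space yields empty "words" wherever the text contains consecutive spaces.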
> DF[1,2:11]<-strsplit(DF[1,1]," ")[[1]][1:10]
> DF[1,]
                                                        Sentence first second third   fourth fifth sixth seventh eighth ninth tenth
1 this is a sentence with ten words or maybe more than ten words  this     is     a sentence  with   ten   words     or maybe  more
> DF[1,"first"]
[1] "this"

On Tue, Nov 2, 2010 at 12:22 PM, David Winsemius <dwinsem...@comcast.net> wrote:

>
> On Nov 2, 2010, at 3:01 PM, Matevž Pavlič wrote:
>
>> Hi all,
>>
>> Thanks for all the help. I managed to do it with what Gaj suggested
>> (Excel :().
>>
>> The last solution from David is also great, I just don't understand why R
>> put the words in 14 columns and three rows?
>
> Because the maximum number of words was 14 and the fill argument was TRUE.
> There were three rows because there were three items in the supplied
> character vector.
>
>> I would like it to put just the first 10 words in the source field into 10
>> different destination fields, but in the same row. And so on... is that
>> possible?
>
> I don't know what a destination field might be. Those are not R data types.
>
> This would trim the extra columns (in this example set to those greater
> than 8) by adding a lot of "NULL"'s to the end of a colClasses specification
> .... at the expense of a warning message which can be ignored:
>
> > read.table(textConnection(words), fill=T, colClasses = c(rep("character",
> 8), rep("NULL", 30) ) , stringsAsFactors=FALSE )
>
>    V1    V2    V3      V4    V5    V6    V7      V8
> 1   I  have     a columnn  with  text  that     has
> 2   I would  like      to split these words      in
> 3 but  just first     ten words    in   the string.
> Warning message:
> In read.table(textConnection(words), fill = T, colClasses =
> c(rep("character", :
>   cols = 14 != length(data) = 38
>
> If you want to assign the first column to a variable then just:
>
> > first8 <- read.table(textConnection(words), fill=T, colClasses =
> c(rep("character", 8), rep("NULL", 30) ) , stringsAsFactors=FALSE)
> > var1 <- first8[[1]]
> > var1
> [1] "I"   "I"   "but"
>
> --
> David.
>
>> Thank you, m
>> -----Original Message-----
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>> On Behalf Of David Winsemius
>> Sent: Tuesday, November 02, 2010 3:47 PM
>> To: Gaj Vidmar
>> Cc: r-h...@stat.math.ethz.ch
>> Subject: Re: [R] spliting first 10 words in a string
>>
>>
>> On Nov 2, 2010, at 6:24 AM, Gaj Vidmar wrote:
>>
>>> Though <forbidden> in this list, in Excel it's just (literally!)
>>> five clicks away!
>>> (with the column in question selected)
>>> Data -> Text to Columns -> Delimited -> tick Space -> Finish
>>> Pa je! (~Voila in Slovenian)
>>> (then import back to R, keeping only the first 10 columns if so
>>> desired)
>>
>> You could do the same thing without needing to leave R. Just
>> read.table( textConnection(..), header=FALSE, fill=TRUE)
>>
>> > read.table(textConnection(words), fill=T)
>>    V1    V2    V3      V4    V5    V6    V7      V8       V9     V10      V11   V12 V13 V14
>> 1   I  have     a columnn  with  text  that     has    quite       a      few words  in it.
>> 2   I would  like      to split these words      in separate columns
>> 3 but  just first     ten words    in   the string.       Is    that possible    in  R?
>>
>>> Regards,
>>> Assist. Prof. Gaj Vidmar, PhD
>>> University Rehabilitation Institute, Republic of Slovenia
>>>
>>> Irrelevant P.S. Long ago, before embarking on what eventually ended
>>> mainly in statistics,
>>> I did two years of geology, so (and also because of knowing what the
>>> poster's institute does)
>>> I even kinda imagine what these data are.
>>>
>>> "Matevž Pavlič" <matevz.pav...@gi-zrmk.si> wrote in message
>>> news:ad5ca6183570b54f92aa45ce2619f9b9d96...@gi-zrmk.si...
>>>
>>>> Hi,
>>>>
>>>> I am sorry, will try to be more exact from now on...
>>>>
>>>> I have a data.frame with a field called Opis. It contains
>>>> sentences that I would like to split into words or fields in the
>>>> data.frame... when I say columns I mean as in an Excel table. I would
>>>> like to split "Opis" into ten fields from the first ten words in the
>>>> Opis field.
>>>> Here is an example of my data.frame.
>>>>
>>>> 'data.frame': 22928 obs. of 12 variables:
>>>> $ VrtinaID        : int 1 1 1 1 2 2 2 2 2 2 ...
>>>> $ ZapStev         : int 1 2 3 4 1 2 3 4 5 6 ...
>>>> $ GlobinaOd       : num 0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
>>>> $ GlobinaDo       : num 0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
>>>> $ Opis            : Factor w/ 12754 levels "","(MIVKA) DROBEN MELJAST PESEK, GOST, SIVORJAV",..: 2060 11588 2477 11660 7539 3182 7884 9123 2500 4756 ...
>>>> $ ACklasifikacija : Factor w/ 290 levels "","(CL)","(CL)/(SC)",..: 154 125 101 101 NA 106 125 80 106 101 ...
>>>> $ GeolNastOd      : num 0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
>>>> $ GeolNastDo      : num 0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
>>>> $ GeolNastOpis    : Factor w/ 113 levels "","B. M. S.",..: 56 53 53 53 56 53 53 53 53 53 ...
>>>> $ NacinVrtanjaOd  : num 0e+00 1e+09 1e+09 1e+09 0e+00 ...
>>>> $ NacinVrtanjaDo  : num 1.1e+01 1.0e+09 1.0e+09 1.0e+09 1.0e+01 ...
>>>> $ NacinVrtanjaOpis: Factor w/ 43 levels "","H. N.","IZKOP",..: 26 1 1 1 26 1 1 1 1 1 ...
>>>>
>>>> Hope that explains better...
>>>> Thank you, m
>>>>
>>>> -----Original Message-----
>>>> From: David Winsemius [mailto:dwinsem...@comcast.net]
>>>> Sent: Monday, November 01, 2010 10:13 PM
>>>> To: Matevž Pavlič
>>>> Cc: r-help@r-project.org
>>>> Subject: Re: [R] spliting first 10 words in a string
>>>>
>>>>
>>>> On Nov 1, 2010, at 4:39 PM, Matevž Pavlič wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I have a columnn with text that has quite a few words in it. I would
>>>>> like to split these words in separate columns, but just first ten
>>>>> words in the string. Is that possible in R?
>>>>
>>>> Not sure what a column means to you. It's not a precisely defined R
>>>> type or class. (And you are requested to offer a concrete example
>>>> rather than making us guess.)
>>>>
>>>> > words <-"I have a columnn with text that has quite a few words in it.
>>>> I would like to split these words in separate columns, but just
>>>> first ten words in the string. Is that possible in R?"
>>>>
>>>> > strsplit(words, " ")[[1]][1:10]
>>>> [1] "I"       "have"    "a"       "columnn" "with"    "text"
>>>> [7] "that"    "has"     "quite"   "a"
>>>>
>>>>
>>>> Or if in a dataframe:
>>>>
>>>> > words <-c("I have a columnn with text that has quite a few words in
>>>> it.", "I would like to split these words in separate columns", "but
>>>> just first ten words in the string. Is that possible in R?")
>>>>
>>>> > worddf <- data.frame(words=words)
>>>>
>>>> > t(sapply(strsplit(worddf$words, " "), "[", 1:10) )
>>>>      [,1]  [,2]    [,3]    [,4]      [,5]    [,6]    [,7]    [,8]      [,9]       [,10]
>>>> [1,] "I"   "have"  "a"     "columnn" "with"  "text"  "that"  "has"     "quite"    "a"
>>>> [2,] "I"   "would" "like"  "to"      "split" "these" "words" "in"      "separate" "columns"
>>>> [3,] "but" "just"  "first" "ten"     "words" "in"    "the"   "string." "Is"       "that"
>>>>
>>>>
>>>> --
>>>> David Winsemius, MD
>>>> West Hartford, CT
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> West Hartford, CT
>
> David Winsemius, MD
> West Hartford, CT