Re: [R] splitting first 10 words in a string

steven mosher Tue, 02 Nov 2010 15:47:47 -0700

Just merge the data frames back together.

Use merge() or cbind().

cbind() will be easier:

DF1 <- data.frame(x, y, z)
DF2 <- data.frame(DF1$x)  # copy a column

Then you add columns to DF2. Just put them back together:

DF3 <- cbind(DF2, DF1$y, DF1$z)
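A minimal sketch of the same idea applied to this thread's problem, assuming a data frame with a character column Opis (the column name comes from the original poster's str() output; the sample rows and the Word1 ... Word10 names are hypothetical). The first ten words are split out with the strsplit()/sapply() approach David shows further down, and cbind() attaches them to the original data frame so everything stays in one place:

```r
# Hypothetical sample data: only the column name Opis comes from the thread.
df <- data.frame(
  VrtinaID = c(1L, 2L),
  Opis = c("HUMUS IN HUMUSNA GLINA Z MALO PODA, LAHKO GNETNA, RJAVA",
           "SLABO GRANULIRAN PROD"),
  stringsAsFactors = FALSE  # strsplit() needs character input, not a factor
)

# First ten words of each description; "[" pads shorter rows with NA.
first10 <- t(sapply(strsplit(df$Opis, " "), "[", 1:10))
colnames(first10) <- paste0("Word", 1:10)  # hypothetical column names

# Bind the word columns back onto the original data frame.
df2 <- cbind(df, first10, stringsAsFactors = FALSE)
str(df2)
```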

If you spend more time with R, you will be able to do things like this
elegantly, but for now this way will work and you will learn a bit about R.

As for counting instances of a string, I might suggest looking at the
table() function:

> k <- c("all", "but", "all")
> table(k)
k
all but
  2   1

So you can make a table for each column in your data frame.
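A minimal sketch of that for every column at once, assuming the split words sit in a data frame of character columns named V1, V2, ... like the one in the str() output quoted below (only the first four values of V1 and V2 are reused here). lapply() calls table() on each column, sort(decreasing = TRUE) puts the most frequent word first, and the empty strings that fill=TRUE leaves behind are dropped before counting:

```r
# Stand-in for the poster's data frame of split words; values are the first
# four entries of V1 and V2 from the str() output in this thread.
words_df <- data.frame(
  V1 = c("HUMUS", "SLABO", "MALO", "SLABO"),
  V2 = c("IN", "GRANULIRAN", "PREPEREL", "VEZAN"),
  stringsAsFactors = FALSE
)

# One frequency table per column, most frequent word first; blanks dropped.
freq_tables <- lapply(words_df, function(col) {
  sort(table(col[col != ""]), decreasing = TRUE)
})

freq_tables$V1            # SLABO (2 occurrences) comes first
names(freq_tables$V1)[2]  # the word in second place
```

If a data frame per column is handier, as.data.frame(freq_tables$V1) returns the words and their counts in two columns (Var1 and Freq).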

On Tue, Nov 2, 2010 at 12:53 PM, Matevž Pavlič <matevz.pav...@gi-zrmk.si> wrote:

> Hi,
>
> OK, I got this now. At least I think so. I got a data.frame with 15 fields;
> all other words have been truncated, which is what I want. But I have that
> in a separate data.frame from the one it was in before (it would be nice if
> it were in the same one ...)
>
> 'data.frame':   22801 obs. of  15 variables:
>  $ V1 : chr  "HUMUS" "SLABO" "MALO" "SLABO" ...
>  $ V2 : chr  "IN" "GRANULIRAN" "PREPEREL" "VEZAN" ...
>  $ V3 : chr  "HUMUSNA" "PEŠČEN" "MELJAST" ",KONGLOMERAT," ...
>  $ V4 : chr  "GLINA" "PROD" "PROD" "P0ROZEN," ...
>  $ V5 : chr  "Z" "DO" "DO" "S" ...
>  $ V6 : chr  "MALO" "r" "r" "PLASTMI" ...
>  $ V7 : chr  "PODA," "=" "=" "GFs," ...
>  $ V8 : chr  "LAHKO" "8Q" "60mm," "SIVORJAV" ...
>  $ V9 : chr  "GNETNA," "mm," "S" "" ...
>  $ V10: chr  "RJAVA" "S" "PRODNIKI," "" ...
>  $ V11: chr  "" "PRODNIKI" "MALO" "" ...
>  $ V12: chr  "" "DO" "PEŠČEN" "" ...
>  $ V13: chr  "" "R" "S" "" ...
>  $ V14: chr  "" "=" "TANKIMI" "" ...
>
> Now, I have another problem. Is it possible to count which word occurs
> most often in each field (V1, V2, V3, ...), which one is the second, and so
> on? Ideally I would create a table for each field (V1, V2, V3, ...) with the
> words and the number of occurrences in that field (column).
> I suppose it could be done in SQL, but since I saw what R can do, I
> guess this can be done here too?
>
> Thanks, m
>
> -----Original Message-----
> From: David Winsemius [mailto:dwinsem...@comcast.net]
> Sent: Tuesday, November 02, 2010 8:23 PM
> To: Matevž Pavlič
> Cc: Gaj Vidmar; r-h...@stat.math.ethz.ch
> Subject: Re: [R] splitting first 10 words in a string
>
>
> On Nov 2, 2010, at 3:01 PM, Matevž Pavlič wrote:
>
> > Hi all,
> >
> > Thanks for all the help. I managed to do it with what Gaj suggested
> > (Excel :().
> >
> > The last solution from David is also great; I just don't understand why
> > R put the words in 14 columns and three rows?
>
> Because the maximum number of words was 14 and the fill argument was TRUE.
> There were three rows because there were three items in the supplied
> character vector.
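The arithmetic behind that answer can be checked directly. A minimal sketch, reusing the three-element words vector from David's earlier example: each element becomes one row, and the longest element sets the number of columns:

```r
words <- c("I have a columnn with text that has quite a few words in it.",
           "I would like to split these words in separate columns",
           "but just first ten words in the string. Is that possible in R?")

# Count the space-separated words in each element.
n_words <- sapply(strsplit(words, " "), length)
n_words        # 14 10 13
length(words)  # 3 elements -> 3 rows
max(n_words)   # 14 -> 14 columns once fill = TRUE pads the shorter rows
```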
>
> > I would like it to put just the first 10 words in the source field into 10
> > different destination fields, but in the same row. And so on... Is that
> > possible?
>
> I don't know what a destination field might be. Those are not R data types.
>
> This would trim the extra columns (in this example, those beyond the 8th)
> by adding a lot of "NULL"s to the end of a colClasses specification,
> at the expense of a warning message that can be ignored:
>
>  > read.table(textConnection(words), fill=T, colClasses = c(rep("character", 8), rep("NULL", 30)), stringsAsFactors=FALSE)
>    V1    V2    V3      V4    V5    V6    V7      V8
> 1   I  have     a columnn  with  text  that     has
> 2   I would  like      to split these words      in
> 3 but  just first     ten words    in   the string.
> Warning message:
> In read.table(textConnection(words), fill = T, colClasses = c(rep("character",  :
>   cols = 14 != length(data) = 38
>
>
> If you want to assign the first column to a variable then just:
>  > first8 <- read.table(textConnection(words), fill=T, colClasses = c(rep("character", 8), rep("NULL", 30)), stringsAsFactors=FALSE)
>  > var1 <- first8[[1]]
>  > var1
> [1] "I"   "I"   "but"
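One caveat when applying this to the full 22,000-row data: read.table() decides how many columns there are from the first five lines of input unless col.names is longer, and ?read.table notes this can go wrong when fill=TRUE, recommending that col.names be specified. A minimal sketch that does so, assuming the text is first written to a temporary file so it can be scanned twice (the file and the all_words/first10 names are illustrative only); because colClasses now matches the column count, it should also avoid the warning above:

```r
words <- c("I have a columnn with text that has quite a few words in it.",
           "I would like to split these words in separate columns",
           "but just first ten words in the string. Is that possible in R?")

# Write the text to a temporary file so it can be read twice.
tmp <- tempfile(fileext = ".txt")
writeLines(words, tmp)

# Length of the longest line, in fields; quote and comment handling are
# switched off because real descriptions may contain apostrophes or '#'.
n_max <- max(count.fields(tmp, quote = "", comment.char = ""))

# Supplying col.names for every field means read.table() does not have to
# guess the column count from the first five lines.
all_words <- read.table(tmp, fill = TRUE, quote = "", comment.char = "",
                        col.names = paste0("V", seq_len(n_max)),
                        colClasses = "character")
first10 <- all_words[, 1:10]  # keep only the first ten words
unlink(tmp)
first10
```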
>
> --
> David.
>
> >
> > Thank you, m
> > -----Original Message-----
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius
> > Sent: Tuesday, November 02, 2010 3:47 PM
> > To: Gaj Vidmar
> > Cc: r-h...@stat.math.ethz.ch
> > Subject: Re: [R] splitting first 10 words in a string
> >
> >
> > On Nov 2, 2010, at 6:24 AM, Gaj Vidmar wrote:
> >
> >> Though <forbidden> in this list, in Excel it's just (literally!) five
> >> clicks away!
> >> (With the column in question selected:) Data -> Text to Columns ->
> >> Delimited -> tick Space -> Finish. Pa je! (~Voilà in Slovenian) (Then
> >> import back to R, keeping only the first 10 columns if so desired.)
> >
> > You could do the same thing without needing to leave R. Just use
> > read.table(textConnection(..), header=FALSE, fill=TRUE):
> >
> >  > read.table(textConnection(words), fill=T)
> >    V1    V2    V3      V4    V5    V6    V7      V8       V9     V10      V11   V12 V13 V14
> > 1   I  have     a columnn  with  text  that     has    quite       a      few words  in it.
> > 2   I would  like      to split these words      in separate columns
> > 3 but  just first     ten words    in   the string.       Is    that possible    in  R?
> >
> >>
> >> Regards,
> >> Assist. Prof. Gaj Vidmar, PhD
> >> University Rehabilitation Institute, Republic of Slovenia
> >>
> >> Irrelevant P.S. Long ago, before embarking on what eventually ended
> >> mainly in statistics, I did two years of geology, so (and also
> >> because of knowing what the poster's institute does) I even kinda
> >> imagine what these data are.
> >>
> >> "Matevž Pavlič" <matevz.pav...@gi-zrmk.si> wrote in message
> >> news:ad5ca6183570b54f92aa45ce2619f9b9d96...@gi-zrmk.si...
> >>> Hi,
> >>>
> >>> I am sorry; I will try to be more exact from now on...
> >>>
> >>> I have a data.frame with a field called Opis. It contains sentences
> >>> that I would like to split into words, i.e. into fields in the data.frame.
> >>> When I say columns, I mean as in an Excel table. I would like to split
> >>> "Opis" into ten fields from the first ten words in the Opis field.
> >>> Here is an example of my data.frame.
> >>>
> >>> 'data.frame':   22928 obs. of  12 variables:
> >>>  $ VrtinaID        : int  1 1 1 1 2 2 2 2 2 2 ...
> >>>  $ ZapStev         : int  1 2 3 4 1 2 3 4 5 6 ...
> >>>  $ GlobinaOd       : num  0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
> >>>  $ GlobinaDo       : num  0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
> >>>  $ Opis            : Factor w/ 12754 levels "","(MIVKA) DROBEN MELJAST PESEK, GOST, SIVORJAV",..: 2060 11588 2477 11660 7539 3182 7884 9123 2500 4756 ...
> >>>  $ ACklasifikacija : Factor w/ 290 levels "","(CL)","(CL)/(SC)",..: 154 125 101 101 NA 106 125 80 106 101 ...
> >>>  $ GeolNastOd      : num  0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
> >>>  $ GeolNastDo      : num  0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
> >>>  $ GeolNastOpis    : Factor w/ 113 levels "","B. M. S.",..: 56 53 53 53 56 53 53 53 53 53 ...
> >>>  $ NacinVrtanjaOd  : num  0e+00 1e+09 1e+09 1e+09 0e+00 ...
> >>>  $ NacinVrtanjaDo  : num  1.1e+01 1.0e+09 1.0e+09 1.0e+09 1.0e+01 ...
> >>>  $ NacinVrtanjaOpis: Factor w/ 43 levels "","H. N.","IZKOP",..: 26 1 1 1 26 1 1 1 1 1 ...
> >>>
> >>> Hope that explains better...
> >>> Thank you, m
> >>>
> >>> -----Original Message-----
> >>> From: David Winsemius [mailto:dwinsem...@comcast.net]
> >>> Sent: Monday, November 01, 2010 10:13 PM
> >>> To: Matevž Pavlič
> >>> Cc: r-help@r-project.org
> >>> Subject: Re: [R] splitting first 10 words in a string
> >>>
> >>>
> >>> On Nov 1, 2010, at 4:39 PM, Matevž Pavlič wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>>
> >>>>
> >>>> I have a column with text that has quite a few words in it. I
> >>>> would like to split these words into separate columns, but just the
> >>>> first ten words in the string. Is that possible in R?
> >>>>
> >>>>
> >>>
> >>> Not sure what a column means to you. It's not a precisely defined R
> >>> type or class. (And you are requested to offer a concrete example
> >>> rather than making us guess.)
> >>>
> >>>  > words <- "I have a columnn with text that has quite a few words in it. I would like to split these words in separate columns, but just first ten words in the string. Is that possible in R?"
> >>>
> >>>  > strsplit(words, " ")[[1]][1:10]
> >>>  [1] "I"       "have"    "a"       "columnn" "with"    "text"    "that"
> >>>  [8] "has"     "quite"   "a"
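Splitting on a single space works for the example text, but real descriptions may contain doubled or leading spaces, and each extra space then yields an empty string that counts as a "word". A minimal sketch of a more tolerant split, assuming words are separated by arbitrary whitespace (the sample string is hypothetical; trimws() needs R 3.2.0 or later). head() returns at most ten words, so short strings are not padded with NA:

```r
s <- "  HUMUS IN  HUMUSNA GLINA"  # hypothetical string with extra spaces

naive <- strsplit(s, " ")[[1]]
naive  # contains empty strings where spaces repeat

# Trim both ends, then split on runs of one or more whitespace characters.
tokens <- strsplit(trimws(s), "[[:space:]]+")[[1]]
head(tokens, 10)  # first ten words at most
```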
> >>>
> >>>
> >>> Or, if the text is in a data frame:
> >>>
> >>>  > words <- c("I have a columnn with text that has quite a few words in it.",
> >>>  +            "I would like to split these words in separate columns",
> >>>  +            "but just first ten words in the string. Is that possible in R?")
> >>>  > worddf <- data.frame(words=words, stringsAsFactors=FALSE)
> >>>
> >>>  > t(sapply(strsplit(worddf$words, " "), "[", 1:10))
> >>>      [,1]  [,2]    [,3]    [,4]      [,5]    [,6]    [,7]    [,8]      [,9]       [,10]
> >>> [1,] "I"   "have"  "a"     "columnn" "with"  "text"  "that"  "has"     "quite"    "a"
> >>> [2,] "I"   "would" "like"  "to"      "split" "these" "words" "in"      "separate" "columns"
> >>> [3,] "but" "just"  "first" "ten"     "words" "in"    "the"   "string." "Is"       "that"
> >>>
> >>>
> >>> --
> >>> David Winsemius, MD
> >>> West Hartford, CT
> >>>
> >>> ______________________________________________
> >>> R-help@r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>
> >
> > David Winsemius, MD
> > West Hartford, CT
> >
>
> David Winsemius, MD
> West Hartford, CT
>
>
