Re: [R] spliting first 10 words in a string

Matevž Pavlič Wed, 03 Nov 2010 11:28:23 -0700

Hi all,


Thanks for all the help. I realize I have a lot to learn in R, but I love it.

 

m

 

From: steven mosher [mailto:mosherste...@gmail.com] 
Sent: Tuesday, November 02, 2010 11:45 PM
To: Matevž Pavlič
Cc: David Winsemius; Gaj Vidmar; r-h...@stat.math.ethz.ch
Subject: Re: [R] spliting first 10 words in a string

 

just merge the data.frames back together.

 

use merge or cbind()

 

cbind will be easier

 

DF1 <- data.frame(x,y,z)

DF2 <-data.frame(DF1$x) # copy a column

then you added columns to DF2

 

just put them back together

 

DF3 <- cbind(DF2, DF1$y, DF1$z)

 

If you spend more time with R you will be able to do things like this
elegantly, but for now this way will work and you will learn a bit about R.
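
A minimal sketch of the cbind() route described above, on made-up data. Only the column name Opis comes from the original question; the id column, the values, and the choice to keep just the first three words are assumptions for illustration.

```r
# Made-up stand-in for the real data; only the column name Opis is from the thread.
DF1 <- data.frame(id = 1:2,
                  Opis = c("HUMUS IN HUMUSNA GLINA", "SLABO GRANULIRAN"),
                  stringsAsFactors = TRUE)

# as.character() matters when Opis is a factor: strsplit() needs character input.
pieces <- strsplit(as.character(DF1$Opis), " ")

# Take the first n words of each row; rows with fewer words are padded with NA.
n <- 3
first_words <- as.data.frame(t(sapply(pieces, "[", seq_len(n))),
                             stringsAsFactors = FALSE)

# The rows are still in their original order, so cbind() lines them up with DF1.
DF3 <- cbind(DF1, first_words)
DF3
```

cbind() only works here because neither data frame was reordered or filtered in between; if rows may have moved, merge() on a key column is the safer choice.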

 

As for counting instances of a string, I might suggest looking at the table 
command

 

k <- c( "all", "but","all")

> table(k)

k

all but 

  2   1 

 

So you can do a table for each column in your dataframe
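
One way to build a table for every column at once might look like the sketch below. The sample values are copied from two columns of the str() output quoted further down; the names DF and counts are hypothetical, and dropping the empty strings is an assumption about what is wanted.

```r
# Sample values taken from the V1 and V9 columns shown in the thread.
DF <- data.frame(V1 = c("HUMUS", "SLABO", "MALO", "SLABO"),
                 V9 = c("GNETNA,", "mm,", "S", ""),
                 stringsAsFactors = FALSE)

# One frequency table per column, most frequent word first.
# Empty strings are only padding from fill = TRUE, so they are dropped.
counts <- lapply(DF, function(col) sort(table(col[col != ""]), decreasing = TRUE))

counts$V1            # SLABO appears twice, HUMUS and MALO once each
names(counts$V1)[1]  # the most frequent word in V1
```

The result is a named list, so counts$V2, counts$V3 and so on give the ranking for each field.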

 

On Tue, Nov 2, 2010 at 12:53 PM, Matevž Pavlič <matevz.pav...@gi-zrmk.si> wrote:

Hi,

Ok, I got this now. At least I think so. I got a data.frame with 15 fields; all
other words have been truncated, which is what I want. But I have that in a
separate data.frame from the one it was in before (it would be nice if it were
in the same one ...)

'data.frame':   22801 obs. of  15 variables:
 $ V1 : chr  "HUMUS" "SLABO" "MALO" "SLABO" ...
 $ V2 : chr  "IN" "GRANULIRAN" "PREPEREL" "VEZAN" ...
 $ V3 : chr  "HUMUSNA" "PEŠČEN" "MELJAST" ",KONGLOMERAT," ...
 $ V4 : chr  "GLINA" "PROD" "PROD" "P0ROZEN," ...
 $ V5 : chr  "Z" "DO" "DO" "S" ...
 $ V6 : chr  "MALO" "r" "r" "PLASTMI" ...
 $ V7 : chr  "PODA," "=" "=" "GFs," ...
 $ V8 : chr  "LAHKO" "8Q" "60mm," "SIVORJAV" ...
 $ V9 : chr  "GNETNA," "mm," "S" "" ...
 $ V10: chr  "RJAVA" "S" "PRODNIKI," "" ...
 $ V11: chr  "" "PRODNIKI" "MALO" "" ...
 $ V12: chr  "" "DO" "PEŠČEN" "" ...
 $ V13: chr  "" "R" "S" "" ...
 $ V14: chr  "" "=" "TANKIMI" "" ...

Now, I have another problem. Is it possible to count which word occurs most
often in each field (V1, V2, V3, ...), which one is second, and so on?
Ideally I would create a table for each field (V1, V2, V3, ...) with the word
and the number of occurrences in that field (column).
I suppose it could be done in SQL, but since I saw what R can do, I guess
this can be done here too?

Thanks, m


-----Original Message-----
From: David Winsemius [mailto:dwinsem...@comcast.net]

Sent: Tuesday, November 02, 2010 8:23 PM
To: Matevž Pavlič

Cc: Gaj Vidmar; r-h...@stat.math.ethz.ch
Subject: Re: [R] spliting first 10 words in a string


On Nov 2, 2010, at 3:01 PM, Matevž Pavlič wrote:

> Hi all,
>
> Thanks for all the help. I managed to do it with what Gaj suggested
> (Excel :().
>
> The last solution from David is also great, I just don't understand why
> R put the words in 14 columns and three rows?

Because the maximum number of words was 14 and the fill argument was TRUE. 
There were three rows because there were three items in the supplied character 
vector.
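
The explanation above can be checked directly. This is a small sketch using the three example strings from earlier in the thread; the names n_words and wide are hypothetical.

```r
words <- c("I have a columnn with text that has quite a few words in it.",
           "I would like to split these words in separate columns",
           "but just first ten words in the string. Is that possible in R?")

# Words per element: the longest element decides how many columns are needed.
n_words <- sapply(strsplit(words, " "), length)
n_words
max(n_words)

# fill = TRUE pads the shorter rows with empty strings up to that width,
# and each element of the character vector becomes one row.
wide <- read.table(textConnection(words), fill = TRUE, stringsAsFactors = FALSE)
dim(wide)
```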

> I would like it to put just the first 10 words in the source field into 10
> different destination fields, but in the same row. And so on... is that
> possible?

I don't know what a destination field might be. Those are not R data types.

This would trim the extra columns (in this example set to those greater than 8) 
by adding a lot of "NULL"'s to the end of a colClasses specification .... at 
the expense of a warning message which can be
ignored:

 > read.table(textConnection(words), fill=T, colClasses = c(rep("character", 8), rep("NULL", 30) ) , stringsAsFactors=FALSE )
   V1    V2    V3      V4    V5    V6    V7      V8
1   I  have     a columnn  with  text  that     has
2   I would  like      to split these words      in
3 but  just first     ten words    in   the string.
Warning message:
In read.table(textConnection(words), fill = T, colClasses = c(rep("character",  :
  cols = 14 != length(data) = 38


If you want to assign the first column to a variable then just:
 > first8 <- read.table(textConnection(words), fill=T, colClasses = c(rep("character", 8), rep("NULL", 30) ) , stringsAsFactors=FALSE)
 > var1 <- first8[[1]]
 > var1
[1] "I"   "I"   "but"

--
David.

>
> Thank you, m
> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org
> ] On Behalf Of David Winsemius
> Sent: Tuesday, November 02, 2010 3:47 PM
> To: Gaj Vidmar
> Cc: r-h...@stat.math.ethz.ch
> Subject: Re: [R] spliting first 10 words in a string
>
>
> On Nov 2, 2010, at 6:24 AM, Gaj Vidmar wrote:
>
>> Though <forbidden> in this list, in Excel it's just (literally!) five
>> clicks away!
>> (with the column in question selected) Data -> Text to Columns ->
>> Delimited -> tick Space -> Finish Pa je! (~Voila in Slovenian) (then
>> import back to R, keeping only the first 10 columns if so
>> desired)
>
> You could do the same thing without needing to leave R. Just
> read.table( textConnection(..), header=FALSE, fill=TRUE)
>
>> read.table(textConnection(words), fill=T)
>    V1    V2    V3      V4    V5    V6    V7      V8       V9      V10      V11   V12 V13 V14
> 1   I  have     a columnn  with  text  that     has    quite        a      few words  in it.
> 2   I would  like      to split these words      in separate columns
> 3 but  just first     ten words    in   the string.       Is    that possible    in  R?
>
>>
>> Regards,
>> Assist. Prof. Gaj Vidmar, PhD
>> University Rehabilitation Institute, Republic of Slovenia
>>
>> Irrelevant P.S. Long ago, before embarking on what eventually ended
>> mainly in statistics, I did two years of geology, so (and also
>> because of knowing what the poster's institute does) I even kinda
>> imagine what these data are.
>>
>> "MatevÂ¾ PavliÃ¨" <matevz.pav...@gi-zrmk.si> wrote in message
>> news:ad5ca6183570b54f92aa45ce2619f9b9d96...@gi-zrmk.si...
>>> Hi,
>>>
>>> I am sorry, will try to be more exact from now on...
>>>
>>> I have a data.frame with a field called Opis. It contains sentences
>>> that I would like to split into words or fields in the data.frame... when I
>>> say columns I mean as in an Excel table. I would like to split "Opis"
>>> into ten fields from the first ten words in the Opis field.
>>> Here is an example of my data.frame.
>>>
>>> 'data.frame':   22928 obs. of  12 variables:
>>> $ VrtinaID        : int  1 1 1 1 2 2 2 2 2 2 ...
>>> $ ZapStev         : int  1 2 3 4 1 2 3 4 5 6 ...
>>> $ GlobinaOd       : num  0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
>>> $ GlobinaDo       : num  0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
>>> $ Opis            : Factor w/ 12754 levels "","(MIVKA) DROBEN MELJAST PESEK, GOST, SIVORJAV",..: 2060 11588 2477 11660 7539 3182 7884 9123 2500 4756 ...
>>> $ ACklasifikacija : Factor w/ 290 levels "","(CL)","(CL)/(SC)",..: 154 125 101 101 NA 106 125 80 106 101 ...
>>> $ GeolNastOd      : num  0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
>>> $ GeolNastDo      : num  0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
>>> $ GeolNastOpis    : Factor w/ 113 levels "","B. M. S.",..: 56 53 53 53 56 53 53 53 53 53 ...
>>> $ NacinVrtanjaOd  : num  0e+00 1e+09 1e+09 1e+09 0e+00 ...
>>> $ NacinVrtanjaDo  : num  1.1e+01 1.0e+09 1.0e+09 1.0e+09 1.0e+01 ...
>>> $ NacinVrtanjaOpis: Factor w/ 43 levels "","H. N.","IZKOP",..: 26 1 1 1 26 1 1 1 1 1 ...
>>>
>>> Hope that explains better...
>>> Thank you, m
>>>
>>> -----Original Message-----
>>> From: David Winsemius [mailto:dwinsem...@comcast.net]
>>> Sent: Monday, November 01, 2010 10:13 PM
>>> To: Matevž Pavlič
>>> Cc: r-help@r-project.org
>>> Subject: Re: [R] spliting first 10 words in a string
>>>
>>>
>>> On Nov 1, 2010, at 4:39 PM, Matevž Pavlič wrote:
>>>
>>>> Hi all,
>>>>
>>>>
>>>>
>>>> I have a columnn with text that has quite a few words in it. I
>>>> would like to split these words in separate columns, but just first
>>>> ten words in the string. Is that possible in R?
>>>>
>>>>
>>>
>>> Not sure what a column means to you. It's not a precisely defined R
>>> type or class. (And you are requested to offer a concrete example
>>> rather than making us guess.)
>>>
>>>> words <-"I have a columnn with text that has quite a few words in
>>> it. I would like to split these words in separate columns, but just
>>> first ten words in the string. Is that possible in R?"
>>>
>>>> strsplit(words, " ")[[1]][1:10]
>>> [1] "I"       "have"    "a"       "columnn" "with"    "text"
>>> "that"    "has"     "quite"   "a"
>>>
>>>
>>> Or if in a dataframe:
>>>
>>>> words <-c("I have a columnn with text that has quite a few words in
>>> it.",   "I would like to split these words in separate columns",
>>> "but
>>> just first ten words in the string. Is that possible in R?")
>>>> worddf <- data.frame(words=words)
>>>
>>>> t(sapply(strsplit(as.character(worddf$words), " "), "[", 1:10) )
>>>      [,1]  [,2]    [,3]    [,4]      [,5]    [,6]    [,7]    [,8]      [,9]       [,10]
>>> [1,] "I"   "have"  "a"     "columnn" "with"  "text"  "that"  "has"     "quite"    "a"
>>> [2,] "I"   "would" "like"  "to"      "split" "these" "words" "in"      "separate" "columns"
>>> [3,] "but" "just"  "first" "ten"     "words" "in"    "the"   "string." "Is"       "that"
>>>
>>>
>>> --
>>> David Winsemius, MD
>>> West Hartford, CT
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
> David Winsemius, MD
> West Hartford, CT
>

David Winsemius, MD
West Hartford, CT


 

