Re: [R] Transform a data.frame with "; " sep column and another one in a new one with the same two column but with repetitions

João Azevedo Patrício Fri, 04 Jul 2014 23:43:23 -0700

On 04-07-2014 15:15, arun wrote:


Hi,
Try:
dat1 <- read.table(text="'1 > TC' 'WC'
'2 > 0'  'Instruments & Instrumentation; Nuclear Science & Technology;Physics, Particles & Fields; Spectroscopy'
'3 > 0' 'Nanoscience & Nanotechnology; Materials Science,Multidisciplinary; Physics, Applied'
'4 > 2'    'Physics, Nuclear; Physics, Particles & Fields'
'5 > 0'    'Chemistry, Inorganic & Nuclear'
'6 > 2'    'Chemistry, Physical; Materials Science, Multidisciplinary;Metallurgy & Metallurgical Engineering'",sep="",header=FALSE, stringsAsFactors=FALSE)

library(data.table)
Using `cSplit()` from
https://gist.github.com/mrdwab/11380733

cSplit(dat1, "V2", ";", "long")
         V1                                     V2
  1: 1 > TC                                     WC
  2:  2 > 0          Instruments & Instrumentation
  3:  2 > 0           Nuclear Science & Technology
  4:  2 > 0            Physics, Particles & Fields
  5:  2 > 0                           Spectroscopy
  6:  3 > 0           Nanoscience & Nanotechnology
  7:  3 > 0    Materials Science,Multidisciplinary
  8:  3 > 0                       Physics, Applied
  9:  4 > 2                       Physics, Nuclear
 10:  4 > 2            Physics, Particles & Fields
 11:  5 > 0         Chemistry, Inorganic & Nuclear
 12:  6 > 2                    Chemistry, Physical
 13:  6 > 2   Materials Science, Multidisciplinary
 14:  6 > 2 Metallurgy & Metallurgical Engineering
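
For readers without the gist at hand, the same reshaping can be sketched in base R. This is only an illustration, not the `cSplit()` implementation: the toy data frame `dat` and its values are made up, and the sketch assumes `TC` is numeric and that `";"` never occurs inside a category name.

```r
# Toy data mirroring the thread's two columns (values are illustrative only).
dat <- data.frame(
  TC = c(0, 2),
  WC = c("Physics, Applied; Spectroscopy", "Physics, Nuclear"),
  stringsAsFactors = FALSE
)

# Split each WC string on ";" and trim the surrounding whitespace.
parts <- lapply(strsplit(dat$WC, ";", fixed = TRUE), trimws)

# Repeat each TC once per category so every category keeps its TC value.
long <- data.frame(
  TC = rep(dat$TC, times = lengths(parts)),
  WC = unlist(parts),
  stringsAsFactors = FALSE
)
print(long)
```

Repeating `TC` by `lengths(parts)` is what keeps track of which `TC` belongs to which category after `strsplit()`.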



A.K.


On Friday, July 4, 2014 9:53 AM, João Azevedo Patrício <joao.patri...@gmx.pt> wrote:
Hi,

I've been trying to solve this issue but with no success.

I have some data like this:

1 > TC    WC
2 > 0    Instruments & Instrumentation; Nuclear Science & Technology; Physics, Particles & Fields; Spectroscopy
3 > 0    Nanoscience & Nanotechnology; Materials Science, Multidisciplinary; Physics, Applied
4 > 2    Physics, Nuclear; Physics, Particles & Fields
5 > 0    Chemistry, Inorganic & Nuclear
6 > 2    Chemistry, Physical; Materials Science, Multidisciplinary; Metallurgy & Metallurgical Engineering

And I need to have this:

1 > TC    WC
2 > 0    Instruments & Instrumentation
2 > 0    Nuclear Science & Technology
2 > 0    Physics, Particles & Fields
2 > 0    Spectroscopy
3 > 0    Nanoscience & Nanotechnology
3 > 0    Materials Science, Multidisciplinary
3 > 0    Physics, Applied
4 > 2    Physics, Nuclear
4 > 2    Physics, Particles & Fields
5 > 0    Chemistry, Inorganic & Nuclear
6 > 2    Chemistry, Physical
6 > 2    Materials Science, Multidisciplinary
6 > 2    Metallurgy & Metallurgical Engineering

This means repeating the row for each element in WC while keeping the same
value in TC. The goal is to compute the total TC (sum) for each WC,
when a row lists multiple WC values.

I've tried to separate the column using strsplit, but then I cannot keep
track of TC.

thanks in advance.

Thanks, this is simply fantastic!

After that I just have to do an aggregate by WC and it gives me the n of TC by WC.


thanks!

my code looks like this:

isi <- read.table("filename", header = TRUE, sep=";") ## get citations and Web of Science categories file

cSplit(isi, "WC", ";", "long") ## split by WC
isisplit <- cSplit(isi, "WC", ";", "long") ## create file with split WC info

wccitations <- aggregate(isisplit$TC, by=list(Category=isisplit$WC), FUN = sum) ## creates a table with the list of WCategories and the specific citations sum for each
wcproduction <- table(isisplit$WC) ## creates a table with the number of pubs by WCategories
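
To make the two summary steps concrete, here is a small self-contained sketch on invented data. The data frame below only stands in for what `cSplit()` returns in long format, and its values are hypothetical; `TC` is assumed to be numeric.

```r
# Hypothetical long-format data: one row per (publication, category) pair.
isisplit <- data.frame(
  TC = c(1, 1, 2, 3, 3),
  WC = c("Spectroscopy", "Physics, Applied", "Physics, Nuclear",
         "Spectroscopy", "Physics, Applied"),
  stringsAsFactors = FALSE
)

# Sum of citations (TC) for each category (WC).
wccitations <- aggregate(TC ~ WC, data = isisplit, FUN = sum)

# Number of publications listed under each category.
wcproduction <- table(isisplit$WC)

print(wccitations)
print(wcproduction)
```

The formula interface `TC ~ WC` is an alternative to the `by = list(...)` form above; it keeps the original column names in the result.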


--
João Azevedo Patrício
Tel.: +31 91 400 53 63
Portugal
@ http://tripaforra.bl.ee

"Take 2 seconds to think before you act"

