On 04-07-2014 15:15, arun wrote:

Hi,
Try:
dat1 <- read.table(text="'1 > TC' 'WC'
'2 > 0'  'Instruments & Instrumentation; Nuclear Science & Technology;Physics, Particles & Fields; Spectroscopy'
'3 > 0' 'Nanoscience & Nanotechnology; Materials Science,Multidisciplinary; Physics, Applied'
'4 > 2'    'Physics, Nuclear; Physics, Particles & Fields'
'5 > 0'    'Chemistry, Inorganic & Nuclear'
'6 > 2'    'Chemistry, Physical; Materials Science, Multidisciplinary;Metallurgy & Metallurgical Engineering'",sep="",header=FALSE, stringsAsFactors=FALSE)

library(data.table)
Using `cSplit()` from
https://gist.github.com/mrdwab/11380733

cSplit(dat1, "V2", ";", "long")
         V1                                     V2
  1: 1 > TC                                     WC
  2:  2 > 0          Instruments & Instrumentation
  3:  2 > 0           Nuclear Science & Technology
  4:  2 > 0            Physics, Particles & Fields
  5:  2 > 0                           Spectroscopy
  6:  3 > 0           Nanoscience & Nanotechnology
  7:  3 > 0    Materials Science,Multidisciplinary
  8:  3 > 0                       Physics, Applied
  9:  4 > 2                       Physics, Nuclear
 10:  4 > 2            Physics, Particles & Fields
 11:  5 > 0         Chemistry, Inorganic & Nuclear
 12:  6 > 2                    Chemistry, Physical
 13:  6 > 2   Materials Science, Multidisciplinary
 14:  6 > 2 Metallurgy & Metallurgical Engineering
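
If installing `cSplit()` is not an option, the same "long" reshaping can probably be done in base R. The following is only a sketch on a made-up two-row sample (the data frame `dat` and its values are assumptions, not the poster's file); it assumes `;` never occurs inside a category name:

```r
# Hypothetical two-row sample in the same shape as the data above
dat <- data.frame(
  TC = c(0, 2),
  WC = c("Nanoscience & Nanotechnology; Physics, Applied",
         "Physics, Nuclear; Physics, Particles & Fields"),
  stringsAsFactors = FALSE
)

# Split each WC string on ";" and trim the surrounding whitespace
parts <- lapply(strsplit(dat$WC, ";", fixed = TRUE), trimws)

# Repeat each TC once per category in its row, so TC stays paired with WC
long <- data.frame(
  TC = rep(dat$TC, lengths(parts)),
  WC = unlist(parts),
  stringsAsFactors = FALSE
)
long
```

The key step is `rep(dat$TC, lengths(parts))`: `strsplit()` on its own returns only the pieces, and repeating TC by the number of pieces per row is what keeps track of which TC each category came from.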



A.K.


On Friday, July 4, 2014 9:53 AM, João Azevedo Patrício <joao.patri...@gmx.pt> 
wrote:
Hi,

I've been trying to solve this issue but with no success.

I have some data like this:

1 > TC    WC
2 > 0    Instruments & Instrumentation; Nuclear Science & Technology;
Physics, Particles & Fields; Spectroscopy
3 > 0    Nanoscience & Nanotechnology; Materials Science,
Multidisciplinary; Physics, Applied
4 > 2    Physics, Nuclear; Physics, Particles & Fields
5 > 0    Chemistry, Inorganic & Nuclear
6 > 2    Chemistry, Physical; Materials Science, Multidisciplinary;
Metallurgy & Metallurgical Engineering

And I need to have this:

1 > TC    WC
2 > 0    Instruments & Instrumentation
2 > 0    Nuclear Science & Technology
2 > 0    Physics, Particles & Fields
2 > 0    Spectroscopy
3 > 0    Nanoscience & Nanotechnology
3 > 0    Materials Science, Multidisciplinary
3 > 0    Physics, Applied
4 > 2    Physics, Nuclear
4 > 2    Physics, Particles & Fields
5 > 0    Chemistry, Inorganic & Nuclear
6 > 2    Chemistry, Physical
6 > 2    Materials Science, Multidisciplinary
6 > 2    Metallurgy & Metallurgical Engineering

This means repeating the row for each element in WC while keeping the same
value in TC. The goal is to get the sum of TC for each WC when a row
lists multiple WC values.

I've tried to separate the column using strsplit, but then I cannot keep
track of TC.

thanks in advance.
Thanks, this is simply fantastic!
After that I just have to aggregate by WC, and it gives me the total TC for each WC.

thanks!

my code looks like this:

isi <- read.table("filename", header = TRUE, sep = ";") ## read the citations and Web of Science categories file
isisplit <- cSplit(isi, "WC", ";", "long") ## split WC into one row per category and store the result
wccitations <- aggregate(isisplit$TC, by = list(Category = isisplit$WC), FUN = sum) ## sum of citations for each WC category
wcproduction <- table(isisplit$WC) ## number of publications for each WC category
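
To illustrate the aggregation step on its own, here is a small self-contained sketch. The data frame below is invented for the example (three hypothetical publications already in long format, as the split step would return them), so the numbers mean nothing beyond showing the mechanics:

```r
# Hypothetical long-format data: one row per (publication, category) pair
isisplit <- data.frame(
  TC = c(1, 1, 2, 2, 3),
  WC = c("Physics, Applied", "Spectroscopy",
         "Physics, Nuclear", "Physics, Applied", "Spectroscopy"),
  stringsAsFactors = FALSE
)

# Sum of citations (TC) for each category
wccitations <- aggregate(TC ~ WC, data = isisplit, FUN = sum)

# Number of publications listed under each category
wcproduction <- table(isisplit$WC)

wccitations
wcproduction
```

With this sample, "Physics, Applied" gets 1 + 2 = 3 citations from 2 publications. Note that a publication with several categories contributes its full TC to each of them, so the per-category sums add up to more than the overall citation total.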

--
João Azevedo Patrício
Tel.: +31 91 400 53 63
Portugal
@ http://tripaforra.bl.ee

"Take 2 seconds to think before you act"

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
