On 05-07-2014 00:43, John McKown wrote:
I messed up my original response by not including r-help in the
distribution. And now I won't look as bad because, after a short nap,
I have a new, much shorter (but, for me, more difficult to understand)
answer.

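#
# The original data is in the data frame "x".
# as.character() is needed if WC was read in as a factor, because
# strsplit() only accepts character input.
z=data.frame(TC=x$TC,
             WC=I(mapply(strsplit,as.character(x$WC),MoreArgs=list(';'),USE.NAMES=FALSE)));
# Repeat each TC once per piece of its WC, flatten the pieces, and
# trim the space left after each ';'.
result=data.frame(TC=rep(x$TC,sapply(z$WC,length)),WC=trimws(unlist(z$WC)));
#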

There may be a way to eliminate the temporary variable "z". Maybe I
need another nap!
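On eliminating "z": strsplit() is already vectorised over its input, so one way (a sketch, not the only one) is to split the whole column in a single call and use lengths() (R >= 3.2.0) for the number of pieces per row. The data frame "x" here is a made-up two-row sample typed in from rows 4 and 5 of the data quoted below, and trimws() is added to drop the space that follows each ';'.

```r
# Hypothetical sample data: rows 4 and 5 of the example in this thread.
x <- data.frame(TC = c(2, 0),
                WC = c("Physics, Nuclear; Physics, Particles & Fields",
                       "Chemistry, Inorganic & Nuclear"),
                stringsAsFactors = FALSE)

# Split every WC string on ';' in one vectorised call (no mapply, no "z").
pieces <- strsplit(as.character(x$WC), ";", fixed = TRUE)

# Repeat each TC once per piece and strip the spaces left after each ';'.
result <- data.frame(TC = rep(x$TC, lengths(pieces)),
                     WC = trimws(unlist(pieces)),
                     stringsAsFactors = FALSE)
print(result)
```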

The heart of this is the mapply(), which returns a list with one entry
per row of "x". Each entry holds the pieces that strsplit() produced
from that row's WC value.
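To see what mapply() hands back, here is a two-row toy input (made-up values). Each strsplit() call returns a list of length one, and mapply()'s default simplification (SIMPLIFY = TRUE) unwraps that one level, so the result is a plain list holding one character vector per row:

```r
# Toy WC values, one string per row.
wc <- c("a;b;c", "d")

# Same call as in the answer above, with ';' passed through MoreArgs.
parts <- mapply(strsplit, wc, MoreArgs = list(";"), USE.NAMES = FALSE)

str(parts)               # shows a list with one character vector per row
sapply(parts, length)    # number of pieces per row: 3 1
```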

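If this needs to be a function, then:

splitUp <- function(x) {
     # as.character() guards against WC having been read in as a factor.
     z=data.frame(TC=x$TC,
                  WC=I(mapply(strsplit,as.character(x$WC),MoreArgs=list(';'),USE.NAMES=FALSE)));
     # One row per piece of WC, with its TC repeated and spaces trimmed.
     result=data.frame(TC=rep(x$TC,sapply(z$WC,length)),WC=trimws(unlist(z$WC)));
     return(result);
}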

Then invoke it with:

flattened.result <- splitUp(original.data.frame);

On Fri, Jul 4, 2014 at 7:50 AM, João Azevedo Patrício
<joao.patri...@gmx.pt> wrote:
Hi,

I've been trying to solve this issue but with no success.

I have some data like this:

1 > TC  WC
2 > 0   Instruments & Instrumentation; Nuclear Science & Technology; Physics, Particles & Fields; Spectroscopy
3 > 0   Nanoscience & Nanotechnology; Materials Science, Multidisciplinary; Physics, Applied
4 > 2   Physics, Nuclear; Physics, Particles & Fields
5 > 0   Chemistry, Inorganic & Nuclear
6 > 2   Chemistry, Physical; Materials Science, Multidisciplinary; Metallurgy & Metallurgical Engineering

And I need to have this:

1 > TC  WC
2 > 0   Instruments & Instrumentation
2 > 0   Nuclear Science & Technology
2 > 0   Physics, Particles & Fields
2 > 0   Spectroscopy
3 > 0   Nanoscience & Nanotechnology
3 > 0   Materials Science, Multidisciplinary
3 > 0   Physics, Applied
4 > 2   Physics, Nuclear
4 > 2   Physics, Particles & Fields
5 > 0   Chemistry, Inorganic & Nuclear
6 > 2   Chemistry, Physical
6 > 2   Materials Science, Multidisciplinary
6 > 2   Metallurgy & Metallurgical Engineering

That is, repeat the row once for each element in WC, keeping the same
TC value. The goal is to compute the total TC (sum) per WC category when
a row lists several categories.
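Once the rows are flattened, the total TC per WC is a single grouped sum. A minimal sketch with base R's aggregate(), using a hypothetical data frame "flat" typed in from rows 2 and 4 of the desired output above (both rows list "Physics, Particles & Fields"):

```r
# Hypothetical flattened data: rows 2 and 4 of the desired output.
flat <- data.frame(
  TC = c(0, 0, 0, 0, 2, 2),
  WC = c("Instruments & Instrumentation", "Nuclear Science & Technology",
         "Physics, Particles & Fields", "Spectroscopy",
         "Physics, Nuclear", "Physics, Particles & Fields"),
  stringsAsFactors = FALSE)

# Sum TC within each WC category; the result has columns WC and TC.
tc.sum <- aggregate(TC ~ WC, data = flat, FUN = sum)
print(tc.sum)
```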

I've tried splitting the column with strsplit(), but then I lose track
of TC.

Thanks in advance.
--
João Azevedo Patrício

I've been testing it and the results are coming out nicely.

It reads a CSV exported from ISI Web of Science, processes it, and produces a table organized by WC (Web of Science category) with the number of papers, total citations and impact factor for each category.

My code currently looks like this:

library(stringr) ## for str_trim()
isi <- read.table("file.csv", header = TRUE, sep = ";", stringsAsFactors = FALSE) ## get citations and Web of Science categories file
isisplit = data.frame(TC = isi$TC,
    WC = I(mapply(strsplit, as.character(isi$WC), MoreArgs = list(';'), USE.NAMES = FALSE)))
## one row per (paper, category) pair; the steps below must use this, not isisplit
result = data.frame(TC = rep(isi$TC, sapply(isisplit$WC, length)), WC = unlist(isisplit$WC), stringsAsFactors = FALSE)
result$WC <- str_trim(result$WC) ## remove the space left after each ';'
wccitations <- aggregate(result$TC, by = list(Category = result$WC), FUN = sum) ## table of WCategories with their total citations
colnames(wccitations) <- c("WC", "TC")
wcproduction <- table(result$WC) ## number of pubs by WCategory
wcproduction <- as.data.frame(wcproduction, stringsAsFactors = FALSE) ## base R instead of data.table's as.data.table()
colnames(wcproduction) <- c("WC", "PUB")
## impact = citations per paper; PUB is always >= 1, so there are no Inf values to clean up
wc <- data.frame(WC = wccitations$WC, PUB = wcproduction$PUB, TC = wccitations$TC, IMP = round(wccitations$TC / wcproduction$PUB, digits = 2))
write.table(wc, file = "file.csv", sep = ";", dec = ",") ## note: this overwrites the input file.csv
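A variant of the summary step (a sketch with made-up data; "flat" stands for the flattened data frame, called result in the code above) that looks the paper counts up by category name instead of relying on two separate results coming back in the same order. IMP is taken here as citations per paper:

```r
# Hypothetical flattened data: one row per (paper, category) pair.
flat <- data.frame(TC = c(0, 0, 2, 2, 5),
                   WC = c("Physics, Applied", "Spectroscopy", "Physics, Applied",
                          "Physics, Nuclear", "Physics, Applied"),
                   stringsAsFactors = FALSE)

tc  <- tapply(flat$TC, flat$WC, sum)   # citations per category, named by category
pub <- table(flat$WC)                  # papers per category, named by category

# Index PUB by name so each row's PUB, TC and IMP refer to the same category.
wc <- data.frame(WC  = names(tc),
                 PUB = as.vector(pub[names(tc)]),
                 TC  = as.vector(tc),
                 stringsAsFactors = FALSE)
wc$IMP <- round(wc$TC / wc$PUB, digits = 2)  # citations per paper
print(wc)
```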


--
João Azevedo Patrício
Tel.: +31 91 400 53 63
Portugal
@ http://tripaforra.bl.ee

"Take 2 seconds to think before you act"

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
