On 05-07-2014 03:35, John McKown wrote:
On Fri, Jul 4, 2014 at 7:50 AM, João Azevedo Patrício
<joao.patri...@gmx.pt> wrote:
Hi,

I've been trying to solve this issue but with no success.

I have some data like this:

1 > TC  WC
2 > 0   Instruments & Instrumentation; Nuclear Science & Technology;
Physics, Particles & Fields; Spectroscopy
3 > 0   Nanoscience & Nanotechnology; Materials Science, Multidisciplinary;
Physics, Applied
4 > 2   Physics, Nuclear; Physics, Particles & Fields
5 > 0   Chemistry, Inorganic & Nuclear
6 > 2   Chemistry, Physical; Materials Science, Multidisciplinary;
Metallurgy & Metallurgical Engineering

And I need to have this:

1 > TC  WC
2 > 0   Instruments & Instrumentation
2 > 0   Nuclear Science & Technology
2 > 0   Physics, Particles & Fields
2 > 0   Spectroscopy
3 > 0   Nanoscience & Nanotechnology
3 > 0   Materials Science, Multidisciplinary
3 > 0   Physics, Applied
4 > 2   Physics, Nuclear
4 > 2   Physics, Particles & Fields
5 > 0   Chemistry, Inorganic & Nuclear
6 > 2   Chemistry, Physical
6 > 2   Materials Science, Multidisciplinary
6 > 2   Metallurgy & Metallurgical Engineering

This means repeating the row for each element in WC while keeping the same
value in TC. The goal is to compute the sum of TC for each WC category when
a row lists multiple categories.

I've tried to separate the column using strsplit, but then I cannot keep
track of TC.

thanks in advance.
--
João Azevedo Patrício
Best that I've come up with, which seems to give the result desired
from the example data given.

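# Repeat each TC once per ';'-separated entry in WC (one output row per entry).
splitAtSemiColon <- function(input) {
     # as.character() guards against WC having been read in as a factor
     z <- strsplit(as.character(input$WC), ';')
     # data.frame(), as in the transcript; data.table() would need library(data.table)
     result <- data.frame(TC=rep(input$TC, sapply(z, length)), WC=unlist(z))
     return(result)
}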

flatted.data <- splitAtSemiColon(original.data);

<transcript>
print(original.data,right=FALSE)
   TC
1 0
2 0
3 2
4 0
5 2
   WC
1 Instruments & Instrumentation; Nuclear Science & Technology;
Physics, Particles & Fields; Spectroscopy
2 Nanoscience & Nanotechnology; Materials Science, Multidisciplinary;
Physics, Applied
3 Physics, Nuclear; Physics, Particles & Fields
4 Chemistry, Inorganic & Nuclear
5 Chemistry, Physical; Materials Science, Multidisciplinary;
Metallurgy & Metallurgical Engineering
print(splitAtSemiColon,right=FALSE);
function(x) {
     z=strsplit(x$WC,';');
     result3=data.frame(TC=rep(x$TC,sapply(z,length)),WC=unlist(z));
     return(result3);
}
print(splitAtSemiColon(original.data),right=FALSE);
    TC WC
1  0  Instruments & Instrumentation
2  0   Nuclear Science & Technology
3  0   Physics, Particles & Fields
4  0   Spectroscopy
5  0  Nanoscience & Nanotechnology
6  0   Materials Science, Multidisciplinary
7  0   Physics, Applied
8  2  Physics, Nuclear
9  2   Physics, Particles & Fields
10 0  Chemistry, Inorganic & Nuclear
11 2  Chemistry, Physical
12 2   Materials Science, Multidisciplinary
13 2   Metallurgy & Metallurgical Engineering
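
With the data in this long form, the stated goal (the sum of TC for each WC) can be sketched with base R's aggregate(). This is only an illustration under some assumptions: the example data is retyped from the post, the names flat and tc.by.wc are made up here, and the split pattern is widened to swallow blanks around each semicolon so that identical categories group together.

```r
# Example data retyped from the post (columns TC and WC as given there).
original.data <- data.frame(
  TC = c(0, 0, 2, 0, 2),
  WC = c(
    "Instruments & Instrumentation; Nuclear Science & Technology; Physics, Particles & Fields; Spectroscopy",
    "Nanoscience & Nanotechnology; Materials Science, Multidisciplinary; Physics, Applied",
    "Physics, Nuclear; Physics, Particles & Fields",
    "Chemistry, Inorganic & Nuclear",
    "Chemistry, Physical; Materials Science, Multidisciplinary; Metallurgy & Metallurgical Engineering"
  ),
  stringsAsFactors = FALSE
)

# Split on ';' together with any surrounding blanks (an assumption of this sketch).
parts <- strsplit(original.data$WC, "\\s*;\\s*")

# One row per category, repeating TC as often as its row has categories.
flat <- data.frame(
  TC = rep(original.data$TC, sapply(parts, length)),
  WC = unlist(parts),
  stringsAsFactors = FALSE
)

# Sum TC within each category.
tc.by.wc <- aggregate(TC ~ WC, data = flat, FUN = sum)
print(tc.by.wc, right = FALSE)
```

Categories that occur in several rows, such as "Physics, Particles & Fields", end up as a single row holding the total of their TC values.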

Note that I still have a problem in that the WC data can have leading
and/or trailing blanks due to the way that strsplit works. The easiest
way to fix this is to use the str_trim() function from the stringr
package.
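
For anyone without stringr installed, the same clean-up can be sketched in base R with a regular expression. The data frame and the helper name trim below are hypothetical and stand in for the output of splitAtSemiColon(). With stringr available, stringr::str_trim(flat$WC) should give the same result, and R 3.2.0 and later also ship trimws().

```r
# Hypothetical flattened data showing the padding that splitting on ';' leaves behind.
flat <- data.frame(
  TC = c(0, 0, 2),
  WC = c("Instruments & Instrumentation",
         " Nuclear Science & Technology ",
         " Physics, Particles & Fields"),
  stringsAsFactors = FALSE
)

# Strip whitespace at the start and at the end; blanks inside the text are kept.
trim <- function(x) gsub("^\\s+|\\s+$", "", x)
flat$WC <- trim(flat$WC)

print(flat, right = FALSE)
```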


Yes, I also have that problem. I tried to work it out using "sub", but it didn't work at all.
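
The post does not show the sub() call that was tried, so the following is only a guess at one common pitfall: sub() replaces just the first match, whereas gsub() replaces every match. With a pattern that covers both ends of the string, sub() therefore leaves the trailing blanks in place. The value x is a made-up example.

```r
x <- "  Physics, Applied  "   # hypothetical value padded on both sides

# sub() stops after the first match, so only the leading blanks go away.
first.only <- sub("^\\s+|\\s+$", "", x)

# gsub() replaces every match, so both ends are cleaned.
all.matches <- gsub("^\\s+|\\s+$", "", x)

# Compare the lengths of the original and the two results.
print(nchar(c(x, first.only, all.matches)))
```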

--
João Azevedo Patrício
Tel.: +31 91 400 53 63
Portugal
@ http://tripaforra.bl.ee

"Take 2 seconds to think before you act"

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
