On Fri, Jul 4, 2014 at 7:50 AM, João Azevedo Patrício
<joao.patri...@gmx.pt> wrote:
Hi,
I've been trying to solve this issue but with no success.
I have some data like this:
1 > TC WC
2 > 0 Instruments & Instrumentation; Nuclear Science & Technology;
Physics, Particles & Fields; Spectroscopy
3 > 0 Nanoscience & Nanotechnology; Materials Science, Multidisciplinary;
Physics, Applied
4 > 2 Physics, Nuclear; Physics, Particles & Fields
5 > 0 Chemistry, Inorganic & Nuclear
6 > 2 Chemistry, Physical; Materials Science, Multidisciplinary;
Metallurgy & Metallurgical Engineering
And I need to have this:
1 > TC WC
2 > 0 Instruments & Instrumentation
2 > 0 Nuclear Science & Technology
2 > 0 Physics, Particles & Fields
2 > 0 Spectroscopy
3 > 0 Nanoscience & Nanotechnology
3 > 0 Materials Science, Multidisciplinary
3 > 0 Physics, Applied
4 > 2 Physics, Nuclear
4 > 2 Physics, Particles & Fields
5 > 0 Chemistry, Inorganic & Nuclear
6 > 2 Chemistry, Physical
6 > 2 Materials Science, Multidisciplinary
6 > 2 Metallurgy & Metallurgical Engineering
This means repeating the row for each element in WC while keeping the same
value in TC. The goal is to sum TC by WC when a row lists multiple WC
values.
I've tried to separate the column using strsplit, but then I cannot keep
track of TC.
Thanks in advance.
--
João Azevedo Patrício
Best that I've come up with, which seems to give the result desired
from the example data given.
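# Split each WC entry at ';' and repeat TC once per resulting piece.
splitAtSemiColon <- function(input) {
  # as.character() guards against WC having been read in as a factor
  z <- strsplit(as.character(input$WC),';');
  # data.frame (base R) matches the transcript below; no extra package needed
  result <- data.frame(TC=rep(input$TC,sapply(z,length)), WC=unlist(z));
  return(result);
}
flatted.data <- splitAtSemiColon(original.data);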
<transcript>
print(original.data,right=FALSE)
TC
1 0
2 0
3 2
4 0
5 2
WC
1 Instruments & Instrumentation; Nuclear Science & Technology;
Physics, Particles & Fields; Spectroscopy
2 Nanoscience & Nanotechnology; Materials Science, Multidisciplinary;
Physics, Applied
3 Physics, Nuclear; Physics, Particles & Fields
4 Chemistry, Inorganic & Nuclear
5 Chemistry, Physical; Materials Science, Multidisciplinary;
Metallurgy & Metallurgical Engineering
print(splitAtSemiColon,right=FALSE);
function(x) {
z=strsplit(x$WC,';');
result3=data.frame(TC=rep(x$TC,sapply(z,length)),WC=unlist(z));
return(result3);
}
print(splitAtSemiColon(original.data),right=FALSE);
TC WC
1 0 Instruments & Instrumentation
2 0 Nuclear Science & Technology
3 0 Physics, Particles & Fields
4 0 Spectroscopy
5 0 Nanoscience & Nanotechnology
6 0 Materials Science, Multidisciplinary
7 0 Physics, Applied
8 2 Physics, Nuclear
9 2 Physics, Particles & Fields
10 0 Chemistry, Inorganic & Nuclear
11 2 Chemistry, Physical
12 2 Materials Science, Multidisciplinary
13 2 Metallurgy & Metallurgical Engineering
Note that I still have a problem in that the WC data can have leading
and/or trailing blanks due to the way that strsplit works. The easiest
way to fix this is to use the str_trim() function from the stringr
package.
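
For what it's worth, here is a rough sketch of how the trimming and the per-WC sum of TC asked for in the original question might be put together. It is only an illustration under some assumptions: the data frame is my own reconstruction of the example data, splitAndTrim is a hypothetical name, and the blanks are stripped with a base R regular expression rather than stringr so that no package has to be installed.

```r
# Hypothetical reconstruction of the example data from the post.
original.data <- data.frame(
  TC = c(0, 0, 2, 0, 2),
  WC = c(
    "Instruments & Instrumentation; Nuclear Science & Technology; Physics, Particles & Fields; Spectroscopy",
    "Nanoscience & Nanotechnology; Materials Science, Multidisciplinary; Physics, Applied",
    "Physics, Nuclear; Physics, Particles & Fields",
    "Chemistry, Inorganic & Nuclear",
    "Chemistry, Physical; Materials Science, Multidisciplinary; Metallurgy & Metallurgical Engineering"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split WC at ';', strip leading/trailing whitespace
# from every piece, and repeat TC once per piece.
splitAndTrim <- function(input) {
  z <- strsplit(as.character(input$WC), ";", fixed = TRUE)
  wc <- gsub("^\\s+|\\s+$", "", unlist(z))
  data.frame(TC = rep(input$TC, sapply(z, length)), WC = wc,
             stringsAsFactors = FALSE)
}

flat <- splitAndTrim(original.data)

# Sum TC within each distinct WC value.
totals <- aggregate(TC ~ WC, data = flat, FUN = sum)
print(totals, right = FALSE)
```

Trimming before aggregating matters: otherwise "Physics, Applied" and " Physics, Applied" would be counted as two different categories.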