Dear Mustafa,

I'd recommend the packages readxls to import the data, tidyr to transform
the data into long format and dplyr to select the data.

1. read the data into R with read_excel()
2. transform sheet 1 into a long format with gather(). The result is one
row for each patient / drug combination
3. select the relevant drugs in sheet 2 with filter()
4. join long sheet 1 and filtered sheet2 with inner_join()
5. summarise() the drug codes and names after group_by(patient_id)
6. count() the number of drug codes.

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2017-04-24 7:07 GMT+02:00 abo dalash <abo_d...@hotmail.com>:

> Hi there
>
> I have data set with 500,000 patients (rows) and the first column is
> Patient I'D Number, columns from 2 to 20 are Drug1,Drug 2,...,Drug 20 so
> each row starts with the patient ID and the remaining of cells in the row
> are codes for names of the treatments taken by the patient. Number of
> treatments differ between patients. For example, there are patients with 3
> treatments only and patients with 20 drugs. The unique number of treatments
> in the entire data set is about 6700 drugs. However, I'm interested in
> studying only 128 drugs, these drugs are listed in a second sheet as code
> numbers associated with their meanings (names of drugs representing the
> code). I'm interested in identifying the most frequently used DRUG
> COMBINATIONS between only the 128 drugs among the 6700 drugs. The structure
> of the Excell file to be used in analysis is like this:
>
> -Sheet 1( the entire data set):-
> 1   Patient ID    Drug1    Drug2 ....    Drug 20.
> 2  1125            45            46                55
> 3  1126              60           55                 45
> .
> .
> 500,000
>
> -Sheet 2 (list of codes meanings for only the drugs of interest):
>
> 1    Drug code             meaning
> 2    45                            Simvastatin
> 3    55                            Aspirin
> 4    60                            Paracetamol
> .
> 128
>
> The desired output I'm looking for :
>
>  Drug codes      Meaning          Frequency
> 45+55               Simvastatin              2
>                          +Aspirin
> 60+55             Aspirin+                       1
>                         Paracetamol
> 60+45             Simvastatin+                1
>                         Paracetamol
>
> Please note the the final output does not include any combination
> containing drug 46 as this is not in the list of drugs preferred to be
> studied which are mentioned in sheet 2.
>
> Could you please help me which R codes and packages should be used to run
> this analyisis?
>
> Regards
> Mustafa
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to