On Mon, 21 Jan 2008, Boks, M.P.M. wrote:

> 
> Dear R-experts,
> 
> My problem is how to handle a 10GB data file containing genotype data. The
> file is in a particular format (Illumina final report) and needs to be
> altered and merged with phenotype data for further analysis.
> 

If the file holds all the SNPs for one individual, then all the SNPs for the 
next individual, and so on, you can read in 305000 lines at a time (one 
individual's SNPs), look up that individual's phenotype, and write out one 
line of output, e.g. with cat().
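
A rough sketch of that loop, not a tested recipe for the real file. It assumes 
the report header has already been stripped, that each line is tab-delimited as 
sample ID, SNP name, genotype, and that phenotypes sit in a named vector keyed 
by sample ID. The function name, the column order and the toy data are all 
made up for illustration.

```r
# Hypothetical helper: read one individual's block of SNP lines at a time
# and write a single output line per individual.
# Assumed layout (not from the original post): tab-delimited lines of
#   sample_id <TAB> snp_name <TAB> genotype
collapse_by_individual <- function(infile, outfile, pheno, n_snps) {
  con <- file(infile, open = "r")
  on.exit(close(con), add = TRUE)
  out <- file(outfile, open = "w")
  on.exit(close(out), add = TRUE)

  n_done <- 0
  repeat {
    lines <- readLines(con, n = n_snps)   # next block of n_snps lines
    if (length(lines) == 0) break         # end of file
    fields <- strsplit(lines, "\t", fixed = TRUE)
    id   <- fields[[1]][1]                # sample ID from the first line
    geno <- vapply(fields, function(f) f[3], character(1))
    # pheno[id] is NA if the ID has no phenotype record
    cat(id, pheno[id], geno, file = out, sep = "\t")
    cat("\n", file = out)
    n_done <- n_done + 1
  }
  n_done
}

# Toy data: two individuals, three SNPs each.
tmp_in  <- tempfile()
tmp_out <- tempfile()
writeLines(c("s1\trs1\tAA", "s1\trs2\tAB", "s1\trs3\tBB",
             "s2\trs1\tAB", "s2\trs2\tAB", "s2\trs3\tAA"), tmp_in)
pheno <- c(s1 = "case", s2 = "control")

n <- collapse_by_individual(tmp_in, tmp_out, pheno, n_snps = 3)
result <- readLines(tmp_out)
```

For the real file n_snps would be 305000. Only one individual's block is in 
memory at any time, so the 10GB file never has to be loaded whole.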

As another approach, I've been using the ncdf package for handling Illumina 
genotype data (slightly larger datasets, and multiple phenotypes).  This has 
been faster and more compact than SQLite (because it doesn't need indexes to do 
random access by person and by SNP). It is then easy to write analyses by SNP 
(association tests) or analyses by person (allele sharing, population 
structure), and even analyses by genomic region (all SNPs in chr9q21.3).
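
A minimal sketch of the netCDF layout, under some assumptions. It uses the 
ncdf4 package (the successor to ncdf, with renamed functions) and assumes it is 
installed. Genotypes are coded 0/1/2 in a SNP-by-person matrix; the variable 
and dimension names and the toy data are invented for illustration.

```r
library(ncdf4)  # assumption: ncdf4 is installed; the post itself used ncdf

set.seed(1)
n_snp    <- 5
n_person <- 4
# Toy genotype matrix: rows are SNPs, columns are people, values 0/1/2.
geno <- matrix(sample(0:2, n_snp * n_person, replace = TRUE), nrow = n_snp)

# Define the two dimensions and one genotype variable (names are hypothetical).
snp_dim    <- ncdim_def("snp",    units = "index", vals = seq_len(n_snp))
person_dim <- ncdim_def("person", units = "index", vals = seq_len(n_person))
geno_var   <- ncvar_def("genotype", units = "count",
                        dim = list(snp_dim, person_dim),
                        missval = -1, prec = "short")

path <- tempfile(fileext = ".nc")
nc <- nc_create(path, geno_var)
ncvar_put(nc, geno_var, geno)
nc_close(nc)

# Random access without any index: one SNP across all people,
# or one person across all SNPs. count = -1 means "the whole dimension".
nc <- nc_open(path)
by_snp    <- ncvar_get(nc, "genotype", start = c(3, 1), count = c(1, -1))
by_person <- ncvar_get(nc, "genotype", start = c(1, 2), count = c(-1, 1))
nc_close(nc)
```

A genomic region is the same call with a contiguous range of SNPs, e.g. 
start = c(2, 1) and count = c(3, -1), provided the SNPs are stored in map 
order.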

     -thomas

Thomas Lumley                   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]       University of Washington, Seattle

