Hi All,

I need some help to get started on a script.

I have these huge data files, each with 16K rows and several columns. I
need to extract a subset of those 16K rows. Each row has an identifier made
up of 2 letters and 6 digits, and the ones I want have a specific first
letter: they start with either C or D. I know I can use a regex for that,
but I have been trying to figure out the rest and I don't know where to
start. This is the first time I am trying to write something from scratch,
so any suggestions would be appreciated. I am not asking for the script,
just some help on how to go about it.
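
To make the identifier rule concrete, here is a minimal sketch of the
pattern, written in Python only as an assumption since no language has been
chosen yet. It assumes an identifier is exactly 2 letters followed by 6
digits and that "starts with C or D" refers to the first letter; the names
WANTED_ID and is_wanted are made up for illustration.

```python
import re

# Assumption: identifiers are exactly 2 letters followed by 6 digits,
# and the wanted ones have C or D as their first letter.
WANTED_ID = re.compile(r"[CD][A-Za-z][0-9]{6}")


def is_wanted(gene_id):
    """Return True if gene_id looks like a wanted identifier."""
    # fullmatch requires the whole string to fit the pattern,
    # so longer or shorter strings are rejected.
    return WANTED_ID.fullmatch(gene_id.strip()) is not None
```

If the identifiers turn out to be upper case only, the second character
class could be narrowed to [A-Z].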

So, what I want to be able to do is retrieve all the rows that have
identifiers starting with C or D. Should I use arrays? Can I store each row
(a set of values separated by tabs) as one item in an array?

Here is an example of what the file looks like. I would like to use the
Gene ID field to select the rows.

Field | Meta Row | Meta Column | Row | Column | Gene ID | Annotation 1 | Flag | Signal Mean | Background Mean | Signal Median | Background Median | Signal Mode | Background Mode | Signal Area | Background Area | Signal Total
A | 2 | 1 | 9 | 9 | AA067532 | Arabidopsis Negative Control | 2 | 352.9428 | 203.4924 | 77 | 1 | 168.1093 | 55.8592 | 70 | 329 | 24706
A | 2 | 1 | 9 | 10 | AA067532 | Arabidopsis Negative Control | 2 | 352.4057 | 213.3951 | 99 | 1 | 44.659 | 48.423 | 69 | 329 | 24316
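
One possible way to approach the row selection, sketched in Python as an
assumption (no language has been settled on). It assumes the file is
tab-separated with a single header line containing a "Gene ID" column, and
it keeps each row as a list of fields inside a larger list. The function
name select_rows and its parameters are hypothetical.

```python
import csv


def select_rows(lines, id_column="Gene ID", prefixes=("C", "D")):
    """Return (header, rows) keeping only rows whose ID starts with a prefix.

    `lines` is any iterable of text lines, for example an open file.
    Assumes tab-separated fields and one header line.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = [name.strip() for name in next(reader)]
    id_index = header.index(id_column)  # position of the Gene ID column

    kept = []
    for row in reader:
        # Skip short or blank lines that have no Gene ID field at all.
        if len(row) <= id_index:
            continue
        # str.startswith accepts a tuple of alternatives.
        if row[id_index].strip().startswith(prefixes):
            kept.append(row)  # each row is stored as one list of fields
    return header, kept
```

With a real file this could be called as
`with open("data.txt", newline="") as handle: header, rows = select_rows(handle)`,
where "data.txt" is a placeholder name. Since only 16K rows are involved,
holding the kept rows in memory should not be a problem.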

Thanks,

Tiago

Tiago S. F. Hori
PhD Candidate - Ocean Science Center-Memorial University of Newfoundland
