Subsetting a dataset
I have a huge dataset, containing millions of rows and several dozen columns, in a tab-delimited text file. I need to extract a small subset of the rows and only three of the columns. One of the three columns holds a two-word string under the header “Scientific Name”. The other two columns hold numbers for longitude and latitude, as below:

    Sci Name    Longitude   Latitude   Column4
    Gen sp1     82.5        28.4       …
    Gen sp2     45.9        29.7       …
    Gen sp1     57.9        32.9       …
    …           …           …          …

Of the many species listed under the “Sci Name” column, I am interested in only one species, which will have multiple records interspersed among the millions of rows, and I will probably have to use filename.readline() to read the rows one at a time. How would I search for a particular species in the dataset and create a new dataset for that species with only the three columns?

Next, I have to create such datasets for hundreds of species, all of which are listed in another text file. There must be a way to define an iterative function that looks at one species at a time in that list and creates a separate dataset for each species. The huge dataset contains more species than those in my list of interest.

I would very much appreciate any help. I am a beginner in Python, so complete code would be most helpful.

- Kumar

--
Section of Integrative Biology
University of Texas at Austin
Austin, Texas 78712, USA

--
http://mail.python.org/mailman/listinfo/python-list
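One possible approach, sketched with the standard `csv` module under some assumptions: the big file is tab-delimited with a header row whose columns are literally named `Sci Name`, `Longitude` and `Latitude` (change the hypothetical `SPECIES_COL`, `LON_COL` and `LAT_COL` constants if the real header says `Scientific Name` or similar), and the species list has one name per line. The functions `load_species_list` and `subset_species` are hypothetical names. Instead of calling `readline()` by hand or rescanning the huge file once per species, it reads the big file once, row by row, and appends each matching row to an output file for its species.

```python
import csv
import os

# Hypothetical column names; change these to match the real header row.
SPECIES_COL = "Sci Name"
LON_COL = "Longitude"
LAT_COL = "Latitude"


def load_species_list(path):
    """Read the species of interest, one name per line, skipping blank lines."""
    with open(path, newline="") as f:
        return {line.strip() for line in f if line.strip()}


def subset_species(data_path, species_wanted, out_dir):
    """Stream data_path once and write one tab-delimited file per wanted
    species, keeping only the name, longitude and latitude columns.
    Returns the sorted list of species that were actually found."""
    os.makedirs(out_dir, exist_ok=True)
    handles = {}  # species name -> (open output file, csv writer)
    try:
        with open(data_path, newline="") as src:
            # QUOTE_NONE: treat the file as plain tab-separated text.
            reader = csv.DictReader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
            for row in reader:  # one row at a time; the file is never fully loaded
                name = (row[SPECIES_COL] or "").strip()
                if name not in species_wanted:
                    continue
                if name not in handles:
                    # Open each output lazily; spaces in the name become underscores.
                    out_path = os.path.join(out_dir, name.replace(" ", "_") + ".txt")
                    f = open(out_path, "w", newline="")
                    writer = csv.writer(f, delimiter="\t")
                    writer.writerow([SPECIES_COL, LON_COL, LAT_COL])
                    handles[name] = (f, writer)
                handles[name][1].writerow([name, row[LON_COL], row[LAT_COL]])
    finally:
        for f, _ in handles.values():
            f.close()
    return sorted(handles)
```

Usage might look like `subset_species("huge_dataset.txt", load_species_list("species_list.txt"), "by_species")`, with hypothetical file names. A single species is just a set with one name in it. Because every species found keeps its output file open until the end, hundreds of species means hundreds of open files; if the operating system's open-file limit is reached, one workaround is to call the function on smaller batches of the species list.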
Selecting unique values
Greetings,

I have a dataset with occurrence records of multiple species. I need to get rid of multiple listings of the same occurrence point for a species (for example, the last four Accipiter cooperi rows below are two duplicated pairs). How do I create a dataset with only the unique longitude/latitude pairs for each species? Thanks in advance.

    Species_name        Longitude   Latitude
    Abies concolor      -106.601    35.868
    Abies concolor      -106.493    35.9682
    Abies concolor      -106.489    35.892
    Abies concolor      -106.496    35.8542
    Accipiter cooperi   -119.688    34.4339
    Accipiter cooperi   -119.792    34.5069
    Accipiter cooperi   -118.797    34.2581
    Accipiter cooperi   -77.38333   39.68333
    Accipiter cooperi   -77.38333   39.68333
    Accipiter cooperi   -75.99153   40.65
    Accipiter cooperi   -75.99153   40.65

- Kumar
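A minimal sketch of one way to drop the repeats, assuming a tab-delimited file whose first three columns are species name, longitude and latitude, with a header row as in the sample above. The function name `unique_occurrences` is hypothetical. It keeps the first row for each (species, longitude, latitude) combination and preserves the original order, so the same coordinates recorded for two different species are both kept.

```python
import csv


def unique_occurrences(in_path, out_path):
    """Copy a tab-delimited file, keeping only the first row for each
    (species, longitude, latitude) combination.
    Returns the number of data rows written."""
    seen = set()
    written = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
        writer = csv.writer(dst, delimiter="\t")
        header = next(reader, None)
        if header is None:
            return 0  # empty input file
        writer.writerow(header)  # copy the header row unchanged
        for row in reader:
            if not row:
                continue  # skip blank lines
            # Assumed layout: species, longitude, latitude in the first three columns.
            key = tuple(field.strip() for field in row[:3])
            if key in seen:
                continue  # this point was already recorded for this species
            seen.add(key)
            writer.writerow(row)
            written += 1
    return written
```

The coordinates are compared as text, so `40.65` and `40.650` would count as different points; building the key with `float()` on the two coordinate fields would treat them as equal instead.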
Re: Selecting unique values
Thank you everybody. I can extract unique values now.

- Kumar

On Tue, Jul 26, 2011 at 2:38 PM, Dan Stromberg wrote:
>
> Some good stuff has already been suggested. Another possibility is using a
> treap (not a duptreap but a treap):
>
> http://stromberg.dnsalias.org/~dstromberg/treap/
>
> If you just need things unique'd once, the set + yield is an excellent
> option. If you need to keep things in order, but also need to make changes
> now and then, the treap is very good.
>
> On Mon, Jul 25, 2011 at 3:03 PM, Kumar Mainali wrote:
>> [...]
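The "set + yield" approach mentioned above is a generator that remembers what it has already produced in a `set` and yields only new items. A minimal general-purpose sketch, similar in spirit to the `unique_everseen` recipe in the `itertools` documentation; the name `unique_items` and its `key` parameter are choices made for this example:

```python
def unique_items(iterable, key=None):
    """Yield items from iterable in their original order, skipping any item
    whose key (the item itself when key is None) has already been seen."""
    seen = set()
    for item in iterable:
        k = item if key is None else key(item)
        if k not in seen:
            seen.add(k)
            yield item


# Example: drop repeated (species, longitude, latitude) records.
records = [
    ("Accipiter cooperi", -77.38333, 39.68333),
    ("Accipiter cooperi", -77.38333, 39.68333),
    ("Accipiter cooperi", -75.99153, 40.65),
    ("Accipiter cooperi", -75.99153, 40.65),
]
for rec in unique_items(records):
    print(rec)
```

Because it is a generator, it can wrap a `csv.reader` over a large file so that only the distinct keys, not the whole file, are held in memory. Rows from `csv.reader` are lists, which are not hashable, so pass `key=tuple` (or a key function that picks out the first three fields).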