John Filben wrote:
I am new to Python but have used many other (mostly dead) languages in
the past. I want to be able to process *.txt and *.csv files. I can
now read them and then change them as needed – mostly just take a column
and do some if-then to create a new variable. My problem is sorting
these files:
1.) How do I sort file1.txt by position and write out
file1_sorted.txt; for example, if all the records are 100 bytes long and
there is a three digit id in the position 0-2; here would be some sample
data:
a. 001JohnFilben……
b. 002Joe Smith…..
Use a dictionary:
    linedict = {}
    for line in f:  # f is the already-open input file
        key = line[:3]
        # store 'line' instead if you want to keep the key in the output
        linedict[key] = line[3:]
    sortedlines = []
    # sorted() returns a new list; linedict.keys().sort() would give None
    for key in sorted(linedict):
        sortedlines.append(linedict[key])
(untested)
This is the simplest approach, and probably not the most efficient, but it
should work as long as the ids are unique (a repeated id overwrites the
earlier line in the dict).
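If the ids might repeat, a plain list sorted with a key function avoids the dict altogether. A minimal sketch, assuming the whole file fits in memory; the function name sort_fixed_width and its start/end parameters are made up for illustration:

```python
def sort_fixed_width(src_path, dst_path, start=0, end=3):
    """Sort the lines of src_path by the characters in [start:end]."""
    with open(src_path) as src:
        lines = src.readlines()
    # Make sure the last record also ends with a newline, otherwise it
    # would run into the next record once the order changes.
    lines = [l if l.endswith("\n") else l + "\n" for l in lines]
    # sorted() is stable: records with equal keys keep their input order.
    lines = sorted(lines, key=lambda line: line[start:end])
    with open(dst_path, "w") as dst:
        dst.writelines(lines)
```

Something like sort_fixed_width('file1.txt', 'file1_sorted.txt') would then write the sorted copy. The key is compared as a string, which works for zero-padded ids like 001 and 002.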
2.) How do I sort file1.csv by column name; for example, if all the
records have three column headings, “id”, “first_name”, “last_name”;
here would be some sample data:
a. Id, first_name,last_name
b. 001,John,Filben
c. 002,Joe, Smith
This is more complicated: I would make a list of lines, where each line
is a list split according to columns (like ['001', 'John', 'Filben']),
and then I would sort this list using operator.itemgetter, like this:
    import operator
    # num is the column index, starting at 0
    lines.sort(key=operator.itemgetter(num))
Read up on operator.*, it's very useful.
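To sort by column name rather than by number, the header row can be used to look up the index first. A rough sketch using the standard csv module; the function name sort_csv_by_column is hypothetical, and it assumes the first row is the header and that the file fits in memory:

```python
import csv
from operator import itemgetter

def sort_csv_by_column(src_path, dst_path, column):
    """Sort a CSV file by the named column, keeping the header on top."""
    with open(src_path, newline="") as src:
        # skipinitialspace drops the blank after a comma, as in "Joe, Smith"
        reader = csv.reader(src, skipinitialspace=True)
        header = [name.strip() for name in next(reader)]
        rows = [row for row in reader if row]  # skip empty lines
    num = header.index(column)  # raises ValueError for an unknown name
    rows.sort(key=itemgetter(num))
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(rows)
```

The name lookup is exact, so with the sample header above you would have to ask for "Id", not "id". Values are compared as strings; convert them in the key function if you need numeric order.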
3.) What about if I have millions of records and I am processing on a
laptop with a large external drive – basically, are there space
considerations? What are the workarounds?
The simplest is to use something like SQLite: define a table, fill it up, and
then do SELECT with ORDER BY.
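Roughly, with the sqlite3 module from the standard library. The table name people and the function name sort_with_sqlite are invented for this sketch, and the columns just mirror the sample CSV:

```python
import sqlite3

def sort_with_sqlite(records, db_path=":memory:"):
    """Yield (id, first_name, last_name) tuples ordered by id.

    Pass a file name as db_path to keep the data on disk instead of in RAM.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE people (id TEXT, first_name TEXT, last_name TEXT)"
        )
        # executemany accepts any iterable, so records can be a generator
        conn.executemany("INSERT INTO people VALUES (?, ?, ?)", records)
        conn.commit()
        query = "SELECT id, first_name, last_name FROM people ORDER BY id"
        for row in conn.execute(query):
            yield row
    finally:
        conn.close()
```

With a file-backed database the sorting is left to SQLite, so the Python side only ever holds one row at a time.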
But with a million records I wouldn't worry about it, it should fit in
RAM. Observe:
>>> import sys
>>> a = {}
>>> for i in range(1000000):
...     a[i] = 'spam'*10
...
>>> sys.getsizeof(a)
25165960
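So that's what, 25 MB? (Strictly, sys.getsizeof counts only the dict's own
hash table, not the keys and values it points to.)
Although I have to note that temporary RAM usage in the Python process on my
machine did go up to 113 MB.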
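For a somewhat fuller estimate you can add up the sizes of the keys and values as well. A rough sketch; approx_dict_size is a made-up helper, it only looks one level deep (no nested containers), and the numbers will differ between Python versions and platforms:

```python
import sys

def approx_dict_size(d):
    """Rough size in bytes of a dict plus its distinct keys and values."""
    seen = set()
    total = sys.getsizeof(d)
    for key, value in d.items():
        for obj in (key, value):
            if id(obj) not in seen:  # count shared objects only once
                seen.add(id(obj))
                total += sys.getsizeof(obj)
    return total

# Smaller than the million-entry example above so it runs quickly.
a = {i: "spam" * 10 for i in range(100000)}
print(sys.getsizeof(a), approx_dict_size(a))
```

The second number is always at least as large as the first, since it includes the int keys and the string values on top of the table itself.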
Regards,
mk