grahamdic...@gmail.com wrote:
Hi

I have an Excel file that is read into Python (8000 rows):

from csv import reader, writer
incsv = reader(open(MY_FILE), dialect='excel')
keys = incsv.next()   # the first row holds the column names

The columns hold mixed data types.

The last column contains a cumulative frequency, increasing from
0.0000 to 1.0000 over the 8000 rows.

In a loop of 100,000 iterations, I want to draw a new random number each
time and find the first row whose cumulative frequency is bigger than it.

Here's my current (pseudo)code:

import random

rows = list(incsv)    # cache the rows; the csv reader is exhausted after one pass

for _ in range(100000):
    myRand = random.random()
    for line in rows:
        if float(line[-1]) > myRand:
            # Convert each field: try int first, then float, else keep the string.
            resline = []
            for item in line:
                try:
                    i = int(item)
                except ValueError:
                    try:
                        i = float(item)
                    except ValueError:
                        i = item
                resline.append(i)
            # Here we construct a dict of pair values: {'ID': 18, ...}
            res = dict(zip(keys, resline))
            break

    # do some stuff with res




I'm scanning over every row of the csv to decide which one to select,
100k times over; that's up to 100,000 * 8,000 = 800 million comparisons,
which is just not very efficient.

How can I improve this code? The hot loop is:
for line in rows:
    if float(line[-1]) > myRand:

I can use numpy etc., whatever works best.

Here's a suggestion:

Construct the dicts for all the rows, stored in a list.

Construct a list of just the cumulative frequencies.

For each random value, use the bisect module to search for it in the
cumulative frequencies list; bisect_right returns the index of the first
entry greater than the value, which you can then use to index your list
of dicts.
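
Here's a minimal sketch of that approach (untested; the setup mirrors
the original post, and the helper name "convert" is just illustrative):

import random
from bisect import bisect_right
from csv import reader

incsv = reader(open(MY_FILE), dialect='excel')
keys = next(incsv)    # header row with the column names

def convert(item):
    # Same int -> float -> string fallback as in the original loop.
    for cast in (int, float):
        try:
            return cast(item)
        except ValueError:
            pass
    return item

# One pass over the file: a dict per row, plus the cumulative
# frequencies in their own list (they are the last column).
rows = [dict(zip(keys, map(convert, line))) for line in incsv]
cum_freqs = [row[keys[-1]] for row in rows]

for _ in range(100000):
    # Index of the first cumulative frequency strictly greater than
    # the draw; random.random() < 1.0, so the index stays in range.
    i = bisect_right(cum_freqs, random.random())
    res = rows[i]
    # do some stuff with res

Each lookup is O(log n), about 13 comparisons for 8000 rows, instead of
a scan of up to 8000 rows per draw. Since you mentioned numpy:
numpy.searchsorted(cum_freqs, draws, side='right') does the same binary
search for a whole array of draws in one call.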
