On 16/09/2006 2:55 AM, Tempo wrote: > John Machin thanks for all of your help, and I take responsibility for > the way I worded my sentences in my last reply to this topic. So in an > effort to say sorry, I want to make it clear to everybody that it seems > as though errors in my code and use of external programs (Excel in > particular) are making "range(sh.nrows)" have faulty results. I am > trying to pinpoint the spot in my code or use of Excel, before > "range(sh.nrows) is executed, that is bugged. John Machin, I am > thrilled that the package xlrd exists at all because it simplifies a > daunting task for a beginner programer--me. Its uses are not bound to > beginners either. So thanks for the package and your help to this point. >
I'm sorry, too: I should have wrapped my post in <humour> ... </humour> tags> :-) Of course it's up to you to decide the criteria for filtering out accidental non-data from your spreadsheet. Note that this phenomenon is not restricted to spreadsheets; one often sees text data files with blank or empty lines on the end -- one's app just has to cope with that. Here's an example of a function that will classify a bunch of cells for you: def usefulness_of_cells(cells): """Score each cell: as 0 if empty, as 1 if zero-length text, as 2 if text and value.isspace() is true, otherwise as 3. Return the highest score found. """ score = 0 for cell in cells: if cell.ctype == xlrd.XL_CELL_EMPTY: continue if cell.ctype == xlrd.XL_CELL_TEXT: if not cell.value: if not score: score = 1 continue if cell.value.isspace(): score = 2 continue return 3 return score and here's an example of using it: def number_of_good_rows(sheet): """Return 1 + the index of the last row with meaningful data in it.""" for rowx in xrange(sheet.nrows - 1, -1, -1): score = usefulness_of_cells(sheet.row(rowx)) if score == 3: return rowx+1 return 0 A note on using the isspace() method: ensure that you use it on cell.value (which is Unicode), not on an 8-bit encoding (especially if your locale is set to the default ("C")). | >>> '\xA0'.isspace() | False | >>> u'\xA0'.isspace() | True | >>> import unicodedata as ucd | >>> ucd.name(u'\xA0') | 'NO-BREAK SPACE' You can get these in spreadsheets when folk paste in stuff off a web page that uses as padding (because HTML trims out leading/trailing/multiple instances of SPACE). Puzzled the heck out of me the first time I encountered it until I did: print repr(data_that_the_users_were_shrieking_about) Here's a tip: repr() in Python and "View > Page Source" in Firefox come in very handy when you have "what you see is not what you've got" problems. Anyway, I'll add something like the above functions in an examples directory in the next release of xlrd (which is at alpha stage right now). I'll also add in a Q&A section in the docs, starting with "Why does xlrd report more rows than I see on the screen?" -- so do let us know what you find down the end of your spreadsheet, in case it's a strange beast that hasn't been seen before. HTH, John -- http://mail.python.org/mailman/listinfo/python-list