losing-end-of-row values when manipulating CSV input

Neil Berg Wed, 13 Jul 2011 13:24:30 -0700

Hello all,

I am having an issue with my attempts to accurately filter some data from a CSV 
file I am importing.  I have attached both a sample of the CSV data and my 
script.


The attached CSV file contains two rows and 27 columns of data.  The first 
column is the station ID "BLS", the second column is the sensor number "4", the 
third column is the date, and the remaining 24 columns are hourly temperature 
readings. 

In my attached script, I read in row[3:] to extract just the temperatures, do a 
sanity check to make sure there are 24 values, remove any missing or "m" 
values, and then append the non-missing values into the "hour_list".  

Strangely, the first seven rows appear to be empty after reading in the 
CSV file, so that's why I had to incorporate the if len(temps) == 24 
statement.  

But the real issue is that for days with no missing values, for example the 
second row of data, the length of the hour_list should be 24.  My script, 
however, is returning 23.  I think this is because the end-of-row values have a 
trailing "\".  This must mark these numbers as non-digits, so they are lost in 
my "isdig" filter line.  I've tried several ways to remove this trailing "\", 
but without success. 

Do you have any suggestions on how to fix this issue?

Many thanks in advance,

Neil Berg

# Purpose: read in a CSV file containing hourly temps. at each station,
# then append non-missing hourly data into a list and find the maximum
# value of that list
#---------------------------------------------------------------------
import csv
from numpy import *

f = csv.reader(open('csv_sample.csv','rb'))

for row in f:
	temps= row[3:] #extract hourly temps, neglect station ID,sensor ID, and date 
	#print temps # you see here that the first seven rows are empty 
	if len(temps) == 24: #only keep rows with 24 temps in them  
		hour_list = [] #empty list of all integer hourly temps, i.e. exclude missing "m" values
		for val in temps:	
			#print val #here you can see that the end-of-row values have a trailing "\"
			#--------------------------------------------------------------------
			# This is where I want to strip the trailing "\" before removing any
			# missing or "m" values
			#--------------------------------------------------------------------
			isdig = str.isdigit(val) 
			if isdig is True: 
				hour_list.append(val) 
		print len(hour_list) #should be 24 for rows with no missing values, but it's 23 as is
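
A minimal sketch of the stripping step marked in the comments above, not a tested fix for the real file. It assumes the trailing "\" is a literal character at the end of the last field in each row. The helper name clean_temps and the inline sample row are made up for illustration, the values are converted to int so that a maximum can be taken, and the code is written so that it also runs on Python 3.

```python
def clean_temps(temps):
    """Return the hourly temps as ints, skipping missing ("m") values.

    Hypothetical helper: strips surrounding whitespace and any trailing
    backslash from each field before the isdigit() check.
    """
    hour_list = []
    for val in temps:
        val = val.strip().rstrip("\\")  # drop a trailing "\" if present
        if val.isdigit():               # "m" fails this test and is skipped
            hour_list.append(int(val))
    return hour_list


# Second data row of the sample, split the way row[3:] would look
sample = ("34,32,33,32,34,32,33,32,34,38,40,41,"
          "44,47,43,42,39,36,35,35,36,36,35,33\\").split(",")
hours = clean_temps(sample)
print(len(hours), max(hours))
```

In the script above, the same two lines (strip, then isdigit) would go inside the "for val in temps" loop in place of the isdig check.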

{\rtf1\ansi\ansicpg1252\cocoartf1038\cocoasubrtf350
{\fonttbl\f0\fmodern\fcharset0 Courier;}
{\colortbl;\red255\green255\blue255;}
\margl1440\margr1440\vieww14540\viewh7680\viewkind0
\deftab720
\pard\pardeftab720\ql\qnatural

\f0\fs24 \cf0 BLS,4,19981101,37,m,36,34,36,35,34,34,35,36,38,39,43,42,42,42,38,36,34,32,33,33,35,34\
BLS,4,19981102,34,32,33,32,34,32,33,32,34,38,40,41,44,47,43,42,39,36,35,35,36,36,35,33\
}
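
A small sketch for checking where the empty rows and the trailing "\" might come from. It feeds the attachment text shown above to csv.reader and counts the fields on each line. The text is pasted inline, on the assumption that the file on disk contains exactly these lines; the names sample_lines, field_counts and temp_counts are only for illustration.

```python
import csv

# Attachment text as shown above, one string per line. Raw strings cannot
# end in a backslash, so the trailing "\" is added separately.
sample_lines = [
    r"{\rtf1\ansi\ansicpg1252\cocoartf1038\cocoasubrtf350",
    r"{\fonttbl\f0\fmodern\fcharset0 Courier;}",
    r"{\colortbl;\red255\green255\blue255;}",
    r"\margl1440\margr1440\vieww14540\viewh7680\viewkind0",
    r"\deftab720",
    r"\pard\pardeftab720\ql\qnatural",
    "",
    r"\f0\fs24 \cf0 BLS,4,19981101,37,m,36,34,36,35,34,34,35,36,38,39,"
    r"43,42,42,42,38,36,34,32,33,33,35,34" + "\\",
    r"BLS,4,19981102,34,32,33,32,34,32,33,32,34,38,40,41,44,47,43,42,"
    r"39,36,35,35,36,36,35,33" + "\\",
    "}",
]

# csv.reader accepts any iterable of strings, so no file is needed here.
rows = list(csv.reader(sample_lines))
field_counts = [len(row) for row in rows]      # fields per line
temp_counts = [len(row[3:]) for row in rows]   # what the script calls temps

print(field_counts)
print(temp_counts)
print(repr(rows[8][-1]))  # last field of the second data row
```

With this input, the seven lines before the first data row contain no commas (or are blank), so row[3:] is empty for each of them, and the last field of each data row keeps its trailing "\". Both the leading lines and the "\" look like RTF markup rather than CSV data, which suggests the sample was saved as rich text instead of plain text.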

-- 
http://mail.python.org/mailman/listinfo/python-list
