A similar discussion has already occurred, over 4 years ago: http://groups.google.com/group/comp.lang.python/browse_thread/thread/b806ada0732643d/5dff55826a199928?lnk=gst&q=list+in+place#5dff55826a199928
Nevertheless, I have a use-case where such a discussion comes up. For my data mining class I'm writing an implementation of the bisecting KMeans clustering algorithm (if you're not familiar with clustering and are interested, this gives a decent example-based overview: http://rakaposhi.eas.asu.edu/cse494/notes/f02-clustering.ppt).

Given a CSV dataset of n records, we are to cluster them accordingly. The dataset is general enough that any column of a record can hold any kind of data type (strings, floats, booleans, etc). For example, here are a couple of records from the famous iris dataset:

5.1,3.5,1.4,0.2,Iris-setosa
6.4,3.2,4.5,1.5,Iris-versicolor

Now we can't calculate a meaningful Euclidean distance for something like "Iris-setosa" and "Iris-versicolor" unless we use string-edit distance or something overly complicated, so instead we'll use a simple quantization scheme: enumerate the set of values within the column's domain and replace the strings with numbers (e.g. Iris-setosa = 1, Iris-versicolor = 2).

So I'm reading in values from a file, and for each column I need to dynamically discover the range of possible values it can take and quantize if necessary.
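To make the quantization scheme above concrete, here is a minimal sketch of it for a single column. The name build_codebook is made up for illustration, and numbering values from 1 in order of first appearance is an assumption chosen to match the Iris-setosa = 1 example; any consistent numbering would do.

```python
def build_codebook(values):
    """Map each distinct value to a number, in order of first appearance."""
    codebook = {}
    for value in values:
        if value not in codebook:
            # The next free code is one more than the number of values seen so far.
            codebook[value] = float(len(codebook) + 1)
    return codebook


# One string column taken from a few hypothetical records.
species = ["Iris-setosa", "Iris-versicolor", "Iris-setosa"]
codebook = build_codebook(species)

# Replace each string with its numeric code.
quantized = [codebook[s] for s in species]
```

Keeping the mapping in a dict means each lookup is a constant-time operation, and the same string always gets the same number within a run.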
This is the solution I've come up with:

<code>
def createInitialCluster(fileName):
    #get the data from the file
    points = []
    with open(fileName, 'r') as f:
        for line in f:
            points.append(line.rstrip('\n'))
    #clean up the data
    fixedPoints = []
    for point in points:
        dimensions = [quantize(i, points, point.split(",").index(i)) for i in point.split(",")]
        print dimensions
        fixedPoints.append(Point(dimensions))
    #return an initial cluster of all the points
    return Cluster(fixedPoints)

def quantize(stringToQuantize, pointList, columnIndex):
    #if it's numeric, no need to quantize
    if(isNumeric(stringToQuantize)):
        return float(stringToQuantize)
    #first we need to discover all the possible values of this column
    domain = []
    for point in pointList:
        domain.append(point.split(",")[columnIndex])
    #make it a set to remove duplicates
    domain = list(Set(domain))
    #use the index into the domain as the number representing this value
    return float(domain.index(stringToQuantize))

#harvested from http://www.rosettacode.org/wiki/IsNumeric#Python
def isNumeric(string):
    try:
        i = float(string)
    except ValueError:
        return False
    return True
</code>

It works, but it feels a little ugly, and not exactly Pythonic. On top of the original points list that holds the raw data, I need the dimensions list to hold each processed point and a fixedPoints list to make objects out of the processed data. If my dataset is on the order of millions of records, this'll nuke the memory. I tried something like:

<code>
for point in points:
    point = Point([quantize(i, points, point.split(",").index(i)) for i in point.split(",")])
</code>

but when I print out the points afterward, it doesn't keep the changes.

What's a more efficient way of doing this?
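For what it's worth, the loop above doesn't keep its changes because assigning to the loop variable only rebinds the name point; the list slot still refers to the original string. Below is a rough sketch of that behaviour, followed by one possible single-pass alternative. The names load_points and codebooks are invented for illustration, plain lists of floats stand in for the Point and Cluster classes (which aren't shown in this post), and numbering codes from 1 is an assumption. Using enumerate supplies the column index directly, which also sidesteps index(i) returning the first match when a row repeats a value.

```python
import csv


def is_numeric(text):
    """Return True if text can be parsed as a float."""
    try:
        float(text)
    except ValueError:
        return False
    return True


# Rebinding vs. assigning into the list.
values = ["1", "2", "3"]
for v in values:
    v = float(v)              # rebinds the name v only; the list is untouched
unchanged = list(values)      # still the original strings

for i, v in enumerate(values):
    values[i] = float(v)      # assigns into the list slot, so the change sticks


def load_points(lines):
    """Parse CSV lines into lists of floats in one pass.

    Non-numeric values are replaced by a per-column code, assigned in order
    of first appearance (hypothetical scheme, starting at 1).
    """
    codebooks = {}            # column index -> {string value: numeric code}
    points = []
    for row in csv.reader(lines):
        if not row:
            continue          # skip blank lines
        dimensions = []
        for column, value in enumerate(row):
            if is_numeric(value):
                dimensions.append(float(value))
            else:
                codes = codebooks.setdefault(column, {})
                # An unseen value gets the next free code; a seen one keeps its code.
                dimensions.append(codes.setdefault(value, float(len(codes) + 1)))
        points.append(dimensions)
    return points, codebooks


sample = ["5.1,3.5,1.4,0.2,Iris-setosa", "6.4,3.2,4.5,1.5,Iris-versicolor"]
points, codebooks = load_points(sample)
```

An open file object can be passed in place of sample, since csv.reader accepts any iterable of lines. This keeps only one list of parsed rows in memory and avoids rescanning every record each time a string value is quantized, though all points are still held at once.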