On Tue, Feb 4, 2014 at 3:07 PM, Thomas <t.tchorzew...@gmail.com> wrote: > I've written a script to log data from my Arduino to a csv file. The script > works well enough but it's very, very slow. I'm quite new to Python and I > just wanted to put this out there to see if any Python experts could help > optimise my code. Here it is: >
The most important question is: Have you profiled your code? That is, do you know which part(s) are slow? And the first part of that question is: Define "slow". How long does your program actually take? How much data are you accumulating in that time? By my reading, you're getting 60 lines from the serial port; how long is each line? For a basic back-of-the-envelope timing estimate, work out how long each line is (roughly), and multiply by 60 (number of lines), then divide by 960 (your baud rate, divided by 10, which gives a rough bytes-per-second rate). That'll give you a ball-park figure for how many seconds this loop will take: > # Collecting the data from the serial port > while True: > data_log.append(connection.readline()) > if len(data_log) > max_length - 1: > break If your lines are 80 characters long, give or take, then that works out to 80*60/960 = five seconds just to fetch the data from the serial port. So if the script is taking anywhere from 3 to 8 seconds to run, then you can assume that it's probably spending most of its time right here. Yes, that might feel like it's really slow, but it's all down to your baud rate. You can easily confirm this by surrounding that loop with: import time start_time = time.time() while ... ... append ... print("Time to fetch from serial port:",time.time()-start_time) (If this is Python 2, this will produce slightly ugly output as it'll display it as a tuple. It'll work though.) Put that in, and then run the program with some kind of external timing. On Linux, that would be: $ time python scriptname.py If the total script execution is barely more than the time spent in that loop, don't bother optimizing any of the rest of the code - it's a waste of time improving something that's not materially affecting your total run time. But while I'm here looking at your code, I'll take the liberty of making a few stylistic suggestions :) Feel free to ignore these, but this is the more Pythonic way of writing the code. Starting with the serial port fetch loop: > # Collecting the data from the serial port > while True: > data_log.append(connection.readline()) > if len(data_log) > max_length - 1: > break An infinite loop with a conditional break exactly at one end. This would be clearer written as a straight-forward conditional loop, with the condition inverted: while len(data_log) < max_length: data_log.append(connection.readline()) This isn't strictly identical to your previous version, but since len(data_log) will always be an integer, it is - I believe - functionally equivalent. But make sure I haven't introduced any bugs. :) (There is another difference, in that my version checks the condition on entry where yours would check it only after doing the first append - your version would guarantee a minimum of one line in the log, mine won't. I'm guessing that this won't be significant in normal usage; if it is, go back to your version of the code.) > def map(x, in_min, in_max, out_min, out_max): > return (((x - in_min) * (out_max - out_min))/(in_max - in_min)) + > out_min This shadows the built-in map() function. It works, but it's usually safer to not do this, in case you subsequently want the default map(). > line_data = re.findall('\d*\.\d*',line) # Find all digits > line_data = filter(None,line_data) # Filter out empty strings > line_data = [float(x) for x in line_data] # Convert Strings to > float You can combine these. line_data = [float(x) for x in re.findall('\d*\.\d*',line) if x] Optionally break out the re into a separate line, but definitely I'd go with the conditional comprehension above the separate filter(): line_data = re.findall('\d*\.\d*',line) line_data = [float(x) for x in line_data if x] > for i in range(1,len(line_data)): > line_data[i]=map(line_data[i],0,1023,0,5) Is it deliberate that you start from 1 here? Are you consciously skipping the first element, leaving it unchanged? If not, I would go for another comprehension: line_data = [map(x,0,1023,0,5) for x in line_data] but if so, this warrants a code comment explaining why the first one is special. > plt.clf() > plt.close() > plt.plotfile('data.csv',(0,1,2),names=['time (s)','voltage2 > (V)','voltage1 (V)'],newfig=True) > plt.show() This, btw, is the other place that I'd look for a potentially large slab of time. Bracket this with time.time() calls as above, and see whether it's slow enough to mean nothing else is a concern. But I'd first look at the serial port read loop; depending on how long your lines are, that could easily be quite a few seconds of time just on its own. ChrisA -- https://mail.python.org/mailman/listinfo/python-list