Parsing a serial stream too slowly
Hello, I am having some trouble with a serial stream on a project I am
working on. I have an external board that is attached to a set of
sensors. The board polls the sensors, filters them, formats the
values, and sends the formatted values over a serial bus. The serial
stream comes out like $A1234$$B-10$$C987$, where "$A.*$" is a sensor
value, "$B.*$" is a sensor value, "$C.*$" is a sensor value, etc.

When one sensor is running my Python script grabs the data just fine,
removes the formatting, and throws it into a text control box. However,
when 3 or more sensors are running, I get output like the following:

Sensor 1: 373
Sensor 2: 112$$M-160$G373
Sensor 3: 763$$A892$

I am fairly certain this means that my code is running too slowly to
catch all the '$' markers. Below is the snippet of code I believe is
the cause of this problem...

    def OnSerialRead(self, event):
        text = event.data
        self.sensorabuffer = self.sensorabuffer + text
        self.sensorbbuffer = self.sensorbbuffer + text
        self.sensorcbuffer = self.sensorcbuffer + text

        if sensoraenable:
            sensorresult = re.search(r'\$A.*\$.*', self.sensorabuffer )
            if sensorresult:
                s = sensorresult.group(0)
                s = s[2:-1]
                if self.sensor_enable_chkbox.GetValue():
                    self.SensorAValue = s
                self.sensorabuffer = ''

        if sensorbenable:
            sensorresult = re.search(r'\$A.*\$.*', self.sensorbenable)
            if sensorresult:
                s = sensorresult.group(0)
                s = s[2:-1]
                if self.sensor_enable_chkbox.GetValue():
                    self.SensorBValue = s
                self.sensorbenable= ''

        if sensorcenable:
            sensorresult = re.search(r'\$A.*\$.*', self.sensorcenable)
            if sensorresult:
                s = sensorresult.group(0)
                s = s[2:-1]
                if self.sensor_enable_chkbox.GetValue():
                    self.SensorCValue = s
                self.sensorcenable= ''

        self.DisplaySensorReadings()

I think that regex is too slow for this operation, but I'm uncertain
of another method in Python that could be faster. A little help would
be appreciated.

-- 
http://mail.python.org/mailman/listinfo/python-list
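
For reference, the framing described in the post can be illustrated with
a small standalone sketch. It assumes every frame is exactly '$', one
sensor letter, the value, and a closing '$', and that the chunk holds
only complete frames. The name split_frames is a hypothetical helper
made up for this illustration; it is not part of the posted code.

```python
def split_frames(chunk):
    """Split a chunk like '$A1234$$B-10$$C987$' into (sensor, value) pairs.

    Assumption: clean framing only, i.e. '$', one letter, the value, '$',
    with no partial frames and no noise between frames.
    """
    readings = []
    # Splitting on '$' leaves empty strings at the frame boundaries; skip them.
    for field in chunk.split('$'):
        if field:
            readings.append((field[0], field[1:]))
    return readings


print(split_frames('$A1234$$B-10$$C987$'))
```

A real serial read can end in the middle of a frame, so this only shows
the format itself, not a complete reader.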
Re: Parsing a serial stream too slowly
On Jan 23, 5:00 pm, Jon Clements wrote:
> On Jan 23, 9:48 pm, "M.Pekala" wrote:
>
> > Hello, I am having some trouble with a serial stream on a project I am
> > working on. I have an external board that is attached to a set of
> > sensors. The board polls the sensors, filters them, formats the
> > values, and sends the formatted values over a serial bus. The serial
> > stream comes out like $A1234$$B-10$$C987$, where "$A.*$" is a sensor
> > value, "$B.*$" is a sensor value, "$C.*$" is a sensor value, etc.
> >
> > When one sensor is running my Python script grabs the data just fine,
> > removes the formatting, and throws it into a text control box. However,
> > when 3 or more sensors are running, I get output like the following:
> >
> > Sensor 1: 373
> > Sensor 2: 112$$M-160$G373
> > Sensor 3: 763$$A892$
> >
> > I am fairly certain this means that my code is running too slowly to
> > catch all the '$' markers. Below is the snippet of code I believe is
> > the cause of this problem...
> >
> > def OnSerialRead(self, event):
> >     text = event.data
> >     self.sensorabuffer = self.sensorabuffer + text
> >     self.sensorbbuffer = self.sensorbbuffer + text
> >     self.sensorcbuffer = self.sensorcbuffer + text
> >
> >     if sensoraenable:
> >         sensorresult = re.search(r'\$A.*\$.*', self.sensorabuffer )
> >         if sensorresult:
> >             s = sensorresult.group(0)
> >             s = s[2:-1]
> >             if self.sensor_enable_chkbox.GetValue():
> >                 self.SensorAValue = s
> >             self.sensorabuffer = ''
> >
> >     if sensorbenable:
> >         sensorresult = re.search(r'\$A.*\$.*', self.sensorbenable)
> >         if sensorresult:
> >             s = sensorresult.group(0)
> >             s = s[2:-1]
> >             if self.sensor_enable_chkbox.GetValue():
> >                 self.SensorBValue = s
> >             self.sensorbenable= ''
> >
> >     if sensorcenable:
> >         sensorresult = re.search(r'\$A.*\$.*', self.sensorcenable)
> >         if sensorresult:
> >             s = sensorresult.group(0)
> >             s = s[2:-1]
> >             if self.sensor_enable_chkbox.GetValue():
> >                 self.SensorCValue = s
> >             self.sensorcenable= ''
> >
> >     self.DisplaySensorReadings()
> >
> > I think that regex is too slow for this operation, but I'm uncertain
> > of another method in Python that could be faster. A little help would
> > be appreciated.
>
> You sure that's your code? Your re.search()'s are all the same.

Whoops, you are right. The search for the second should be
re.search(r'\$B.*\$.*', self.sensorbbuffer ), and for the third
re.search(r'\$C.*\$.*', self.sensorcbuffer ).
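
A quick way to check what the corrected pattern actually captures is to
run it against the sample stream from the original post. This is only a
sketch: the variable names are made up for the illustration, and the
second pattern, which uses '[^$]*' in place of '.*', is an alternative
offered here for comparison, not something taken from the posted code.

```python
import re

# Sample stream copied from the original post.
stream = '$A1234$$B-10$$C987$'

# Corrected pattern for sensor B as posted; '.*' is greedy.
greedy = re.search(r'\$B.*\$.*', stream)
greedy_value = greedy.group(0)[2:-1]   # same slicing as the posted code

# Alternative for comparison: '[^$]*' cannot run past the closing '$'.
bounded = re.search(r'\$B([^$]*)\$', stream)
bounded_value = bounded.group(1)

print(repr(greedy_value))    # '-10$$C987'
print(repr(bounded_value))   # '-10'
```

The greedy version swallows the following frame as well, which has the
same shape as the "112$$M-160$G373" output reported above.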
Re: Parsing a serial stream too slowly
On Jan 23, 6:49 pm, Cameron Simpson wrote:
> On 23Jan2012 13:48, M.Pekala wrote:
> | Hello, I am having some trouble with a serial stream on a project I am
> | working on. I have an external board that is attached to a set of
> | sensors. The board polls the sensors, filters them, formats the
> | values, and sends the formatted values over a serial bus. The serial
> | stream comes out like $A1234$$B-10$$C987$, where "$A.*$" is a sensor
> | value, "$B.*$" is a sensor value, "$C.*$" is a sensor value, etc.
> |
> | When one sensor is running my Python script grabs the data just fine,
> | removes the formatting, and throws it into a text control box. However,
> | when 3 or more sensors are running, I get output like the following:
> |
> | Sensor 1: 373
> | Sensor 2: 112$$M-160$G373
> | Sensor 3: 763$$A892$
> |
> | I am fairly certain this means that my code is running too slowly to
> | catch all the '$' markers. Below is the snippet of code I believe is
> | the cause of this problem...
>
> Your code _is_ slow, but as you can see above you're not missing data,
> you're gathering too much data.
>
> Some point by point remarks below. The actual _bug_ is your use of ".*"
> in your regexps. Some change suggestions below the code.
>
> | def OnSerialRead(self, event):
> |     text = event.data
> |     self.sensorabuffer = self.sensorabuffer + text
> |     self.sensorbbuffer = self.sensorbbuffer + text
> |     self.sensorcbuffer = self.sensorcbuffer + text
>
> Slow and memory wasteful. Supposing a sensor never reports? You will
> accumulate an ever growing buffer string. And extending a string gets
> expensive as it grows.
>
> |     if sensoraenable:
> |         sensorresult = re.search(r'\$A.*\$.*', self.sensorabuffer )
>
> Slow and buggy.
>
> The slow: You're compiling the regular expression _every_ time you come
> here (unless the re module caches things, which I seem to recall it may.
> But that efficiency is only luck).
>
> The bug: supposing you get multiple sensor reports, like this:
>
>   $A1$$B2$$C3$
>
> Your regexp matches the whole thing! Because ".*" is greedy.
> You want "[^$]*" - characters that are not a "$".
>
> |         if sensorresult:
> |             s = sensorresult.group(0)
> |             s = s[2:-1]
> |             if self.sensor_enable_chkbox.GetValue():
> |                 self.SensorAValue = s
> |             self.sensorabuffer = ''
>
> What if there are multiple values in the buffer? After fixing your
> regexp you will now be throwing them away. Better to go:
>
>   self.sensorabuffer = self.sensorabuffer[sensorresult.end():]
>
> [...]
> | I think that regex is too slow for this operation, but I'm uncertain
> | of another method in Python that could be faster. A little help would
> | be appreciated.
>
> Regex _is_ slow. It is good for flexible lexing, but generally Not
> Fast. It can be faster than in-Python lexing because the inner
> interpretation of the regex is C code, but is often overkill when speed
> matters. (Which you may find it does not for your app - fix the bugs
> first and see how it behaves).
>
> I would be making the following changes if it were me:
>
>   - keep only one buffer, and parse it into sensor "tokens"
>     pass each token to the right sensor as needed
>
>   - don't use regexps
>     this is a speed thing; if your code is more readable with regexps and
>     still fast enough you may not do this
>
> To these ends, untested attempt 1 (one buffer, lex into tokens, still
> using regexps):
>
>   import re
>
>   re_token = re.compile( r'\$([A-Z])([^$]*)\$' )
>
>   def OnSerialRead(self, event):
>       # accessing a local var is quicker and more readable
>       buffer = self.buffer
>
>       text = event.data
>       buffer += text
>
>       m = re_token.search(buffer)
>       while m:
>           sensor, value = m.group(1), m.group(2)
>           # drop everything up to the end of this token
>           buffer = buffer[m.end():]
>           if sensor == 'A':
>               pass  # ...
>           elif sensor == 'B':
>               pass  # ...
>           else:
>               warning("unsupported sensor: %s", sensor)
>           # look for the next token, otherwise the loop never ends
>           m = re_token.search(buffer)
>
>       # stash the updated buffer for later
>       self.buffer = buffer
>
> I'm assuming here that you can get noise in the serial stream. If you
> are certain to get only clean "$Ax$" sequences and nothing else you can
> make the co