Parsing a serial stream too slowly

2012-01-23 Thread M.Pekala
Hello, I am having some trouble with a serial stream on a project I am
working on. I have an external board that is attached to a set of
sensors. The board polls the sensors, filters them, formats the
values, and sends the formatted values over a serial bus. The serial
stream comes out like $A1234$$B-10$$C987$, where "$A.*$" is a sensor
value, "$B.*$" is a sensor value, "$C.*$" is a sensor value, etc.

When one sensor is running my python script grabs the data just fine,
removes the formatting, and throws it into a text control box. However
when 3 or more sensors are running, I get output like the following:

Sensor 1: 373
Sensor 2: 112$$M-160$G373
Sensor 3: 763$$A892$

I am fairly certain this means that my code is running too slow to
catch all the '$' markers. Below is the snippet of code I believe is
the cause of this problem...

def OnSerialRead(self, event):
    text = event.data
    self.sensorabuffer = self.sensorabuffer + text
    self.sensorbbuffer = self.sensorbbuffer + text
    self.sensorcbuffer = self.sensorcbuffer + text

    if sensoraenable:
        sensorresult = re.search(r'\$A.*\$.*', self.sensorabuffer )
        if sensorresult:
            s = sensorresult.group(0)
            s = s[2:-1]
            if self.sensor_enable_chkbox.GetValue():
                self.SensorAValue = s
            self.sensorabuffer = ''

    if sensorbenable:
        sensorresult = re.search(r'\$A.*\$.*', self.sensorbenable)
        if sensorresult:
            s = sensorresult.group(0)
            s = s[2:-1]
            if self.sensor_enable_chkbox.GetValue():
                self.SensorBValue = s
            self.sensorbenable= ''

    if sensorcenable:
        sensorresult = re.search(r'\$A.*\$.*', self.sensorcenable)
        if sensorresult:
            s = sensorresult.group(0)
            s = s[2:-1]
            if self.sensor_enable_chkbox.GetValue():
                self.SensorCValue = s
            self.sensorcenable= ''

    self.DisplaySensorReadings()

I think that regex is too slow for this operation, but I'm uncertain
of another method in python that could be faster. A little help would
be appreciated.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing a serial stream too slowly

2012-01-23 Thread M.Pekala
On Jan 23, 5:00 pm, Jon Clements wrote:
> On Jan 23, 9:48 pm, "M.Pekala" wrote:
>
> > Hello, I am having some trouble with a serial stream on a project I am
> > working on. I have an external board that is attached to a set of
> > sensors. The board polls the sensors, filters them, formats the
> > values, and sends the formatted values over a serial bus. The serial
> > stream comes out like $A1234$$B-10$$C987$, where "$A.*$" is a sensor
> > value, "$B.*$" is a sensor value, "$C.*$" is a sensor value, etc.
>
> > When one sensor is running my python script grabs the data just fine,
> > removes the formatting, and throws it into a text control box. However
> > when 3 or more sensors are running, I get output like the following:
>
> > Sensor 1: 373
> > Sensor 2: 112$$M-160$G373
> > Sensor 3: 763$$A892$
>
> > I am fairly certain this means that my code is running too slow to
> > catch all the '$' markers. Below is the snippet of code I believe is
> > the cause of this problem...
>
> > def OnSerialRead(self, event):
> >     text = event.data
> >     self.sensorabuffer = self.sensorabuffer + text
> >     self.sensorbbuffer = self.sensorbbuffer + text
> >     self.sensorcbuffer = self.sensorcbuffer + text
>
> >     if sensoraenable:
> >         sensorresult = re.search(r'\$A.*\$.*', self.sensorabuffer )
> >         if sensorresult:
> >             s = sensorresult.group(0)
> >             s = s[2:-1]
> >             if self.sensor_enable_chkbox.GetValue():
> >                 self.SensorAValue = s
> >             self.sensorabuffer = ''
>
> >     if sensorbenable:
> >         sensorresult = re.search(r'\$A.*\$.*', self.sensorbenable)
> >         if sensorresult:
> >             s = sensorresult.group(0)
> >             s = s[2:-1]
> >             if self.sensor_enable_chkbox.GetValue():
> >                 self.SensorBValue = s
> >             self.sensorbenable= ''
>
> >     if sensorcenable:
> >         sensorresult = re.search(r'\$A.*\$.*', self.sensorcenable)
> >         if sensorresult:
> >             s = sensorresult.group(0)
> >             s = s[2:-1]
> >             if self.sensor_enable_chkbox.GetValue():
> >                 self.SensorCValue = s
> >             self.sensorcenable= ''
>
> >     self.DisplaySensorReadings()
>
> > I think that regex is too slow for this operation, but I'm uncertain
> > of another method in python that could be faster. A little help would
> > be appreciated.
>
> You sure that's your code? Your re.search()'s are all the same.

Whoops, you are right. The search for the second sensor should be
re.search(r'\$B.*\$.*', self.sensorbbuffer), and for the third
re.search(r'\$C.*\$.*', self.sensorcbuffer).



Re: Parsing a serial stream too slowly

2012-01-23 Thread M.Pekala
On Jan 23, 6:49 pm, Cameron Simpson wrote:
> On 23Jan2012 13:48, M.Pekala wrote:
> | Hello, I am having some trouble with a serial stream on a project I am
> | working on. I have an external board that is attached to a set of
> | sensors. The board polls the sensors, filters them, formats the
> | values, and sends the formatted values over a serial bus. The serial
> | stream comes out like $A1234$$B-10$$C987$, where "$A.*$" is a sensor
> | value, "$B.*$" is a sensor value, "$C.*$" is a sensor value, etc.
> |
> | When one sensor is running my python script grabs the data just fine,
> | removes the formatting, and throws it into a text control box. However
> | when 3 or more sensors are running, I get output like the following:
> |
> | Sensor 1: 373
> | Sensor 2: 112$$M-160$G373
> | Sensor 3: 763$$A892$
> |
> | I am fairly certain this means that my code is running too slow to
> | catch all the '$' markers. Below is the snippet of code I believe is
> | the cause of this problem...
>
> Your code _is_ slow, but as you can see above you're not missing data,
> you're gathering too much data.
>
> Some point by point remarks below. The actual _bug_ is your use of ".*"
> in your regexps. Some change suggestions below the code.
>
> | def OnSerialRead(self, event):
> |       text = event.data
> |       self.sensorabuffer = self.sensorabuffer + text
> |       self.sensorbbuffer = self.sensorbbuffer + text
> |       self.sensorcbuffer = self.sensorcbuffer + text
>
> Slow and memory wasteful. Supposing a sensor never reports? You will
> accumulate an ever growing buffer string. And extending a string gets
> expensive as it grows.
>
> |       if sensoraenable:
> |               sensorresult = re.search(r'\$A.*\$.*', self.sensorabuffer )
>
> Slow and buggy.
>
> The slow: You're compiling the regular expression _every_ time you come
> here (unless the re module caches things, which I seem to recall it may).
> But that efficiency is only luck.
>
> The bug: supposing you get multiple sensor reports, like this:
>
>   $A1$$B2$$C3$
>
> Your regexp matches the whole thing! Because ".*" is greedy.
> You want "[^$]*" - characters that are not a "$".
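An illustrative sketch, not part of the original message, of the greedy-match problem described above. It assumes the three-token sample stream from the quoted text. It compares the original pattern, a pattern ending at a "$", and the suggested "[^$]*" form. The variable names are made up for the example.

```python
import re

stream = "$A1$$B2$$C3$"

# Original pattern: ".*" is greedy, so the match runs to the end of the buffer.
original = re.search(r'\$A.*\$.*', stream)
s = original.group(0)[2:-1]   # the same slicing the posted code applies

# Still greedy: backtracks only as far as the LAST "$" in the buffer.
greedy = re.search(r'\$A.*\$', stream)

# Negated class: the value cannot contain "$", so the match stops at the
# first closing marker.
bounded = re.search(r'\$A([^$]*)\$', stream)

print(s)                 # 1$$B2$$C3
print(greedy.group(0))   # $A1$$B2$$C3$
print(bounded.group(0))  # $A1$
print(bounded.group(1))  # 1
```

The first printed value has the same shape as the garbled "Sensor 2: 112$$M-160$G373" readings: one real value followed by pieces of the neighbouring tokens.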
>
> |                       if sensorresult:
> |                               s = sensorresult.group(0)
> |                               s = s[2:-1]
> |                               if self.sensor_enable_chkbox.GetValue():
> |                                       self.SensorAValue = s
> |                               self.sensorabuffer = ''
>
> What if there are multiple values in the buffer? After fixing your
> regexp you will now be throwing them away. Better to go:
>
>   self.sensorabuffer = self.sensorabuffer[sensorresult.end():]
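A small sketch of the slicing idea above, not from the original message. It assumes a buffer that holds two complete sensor-A readings plus the start of a third. The names `pattern`, `buffer` and `values` are illustrative only.

```python
import re

pattern = re.compile(r'\$A([^$]*)\$')
buffer = "$A1$$A2$$A"   # two complete readings and one partial reading

values = []
m = pattern.search(buffer)
while m:
    values.append(m.group(1))
    # Drop only the text up to the end of this match; later data survives.
    buffer = buffer[m.end():]
    m = pattern.search(buffer)

print(values)  # ['1', '2']
print(buffer)  # $A
```

Setting the buffer to '' after the first match would have discarded both the second reading and the partial third one.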
>
> [...]
> | I think that regex is too slow for this operation, but I'm uncertain
> | of another method in python that could be faster. A little help would
> | be appreciated.
>
> Regex _is_ slow. It is good for flexible lexing, but generally Not
> Fast. It can be faster than in-Python lexing because the inner
> interpretation of the regex is C code, but is often overkill when speed
> matters. (Which you may find it does not for your app - fix the bugs
> first and see how it behaves).
>
> I would be making the following changes if it were me:
>
>   - keep only one buffer, and parse it into sensor "tokens"
>     pass each token to the right sensor as needed
>
>   - don't use regexps
>     this is a speed thing; if your code is more readable with regexps and
>     still fast enough, you may not need to do this
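One possible reading of the two suggestions above, sketched without regexps. This is not the poster's code: `split_tokens` is a hypothetical helper. It assumes every frame looks like "$" + one sensor letter + value + "$" and that values never contain "$".

```python
def split_tokens(buffer):
    """Return (tokens, remainder) for a buffer of "$<letter><value>$" frames."""
    tokens = []
    while True:
        start = buffer.find('$')
        if start == -1:
            # No frame start anywhere: treat the whole buffer as noise.
            return tokens, ''
        end = buffer.find('$', start + 1)
        if end == -1:
            # Incomplete frame: keep it for the next read.
            return tokens, buffer[start:]
        body = buffer[start + 1:end]
        if body:
            tokens.append((body[0], body[1:]))
            buffer = buffer[end + 1:]
        else:
            # "$$": the first "$" closed an earlier frame; resync on the second.
            buffer = buffer[end:]


tokens, rest = split_tokens("$A1234$$B-10$$C98")
print(tokens)  # [('A', '1234'), ('B', '-10')]
print(rest)    # $C98
```

The caller would store `rest` as the new buffer and pass each `(sensor, value)` pair to whichever handler owns that sensor letter.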
>
> To these ends, untested attempt 1 (one buffer, lex into tokens, still
> using regexps):
>
>     re_token = re.compile( r'\$([A-Z])([^$]*)\$' )
>
>     def OnSerialRead(self, event):
>         # accessing a local var is quicker and more readable
>         buffer = self.buffer
>
>         text = event.data
>         buffer += text
>
>         m = re_token.search(buffer)
>         while m:
>             sensor, value = m.group(1), m.group(2)
>             buffer = buffer[m.end():]
>             if sensor == 'A':
>                 pass  # ...
>             elif sensor == 'B':
>                 pass  # ...
>             else:
>                 warning("unsupported sensor: %s", sensor)
>             # look for the next token in what is left of the buffer
>             m = re_token.search(buffer)
>
>         # stash the updated buffer for later
>         self.buffer = buffer
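A standalone sketch of the same one-buffer idea that can be run outside the GUI, to show tokens surviving when they are split across reads. `SensorStream`, `feed` and `latest` are hypothetical names, not part of the original code. The chunk boundaries are invented for the example.

```python
import re

TOKEN = re.compile(r'\$([A-Z])([^$]*)\$')


class SensorStream:
    """Hypothetical helper: accumulates serial chunks, keeps latest readings."""

    def __init__(self):
        self.buffer = ''
        self.latest = {}   # sensor letter -> most recent value string

    def feed(self, text):
        self.buffer += text
        consumed = 0
        for m in TOKEN.finditer(self.buffer):
            self.latest[m.group(1)] = m.group(2)
            consumed = m.end()
        # Keep only the unparsed tail, which may hold a partial token.
        self.buffer = self.buffer[consumed:]


stream = SensorStream()
for chunk in ("$A12", "34$$B-1", "0$$C987$"):
    stream.feed(chunk)

print(stream.latest)  # {'A': '1234', 'B': '-10', 'C': '987'}
```

As written, the buffer is never trimmed when no token matches, so a real handler would probably want to cap its length in case the stream carries nothing but noise.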
>
> I'm assuming here that you can get noise in the serial stream. If you
> are certain to get only clean "$Ax$" sequences and nothing else you can
> make the co