Re: using the SHELL function to GREP a body of text

jameshale Sat, 19 Mar 2016 06:15:32 -0700

Peter TB Brett wrote
> Unfortunately XHTML and HTML are not regular languages, which means that 
> they cannot be processed correctly with regular expressions.
> 
> Indeed, "Implement an HTML parser using regular expressions" is a 
> well-known prank project to suggest for inexperienced developers to 
> waste their time on...
> 
> So, your approach is sadly not workable.
> 
> If you're processing XHTML, I recommend using revXML.
> 
> If you need to process arbitrary HTML, then unfortunately the only 
> sensible option is to use a browser...


Bummer.
Not only are XHTML and HTML not regular languages, but their use in ePubs is
even more irregular (if that is possible).
I have some texts which include both forms: 
Others where every tag ('h', 'p', etc.) has an id attribute.

A browser is not an option, as I will need to use LC's chunking and text
selection features.

I am using the htmlText of a field, and given that the htmlText property
ignores most of what I was trying to remove, it probably doesn't matter in
the end. Just a bit untidy.
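
For anyone who does want the markup tidied first, here is a rough sketch of
the parser-based route Peter describes, as opposed to grep. It is written in
Python rather than LiveCode script, purely as an illustration, and uses only
the standard library's html.parser module. The names IdStripper and strip_ids
are made up for this example. It assumes that dropping comments and doctype
declarations is acceptable, and it has not been tried against real ePub
content.

```python
from html import escape
from html.parser import HTMLParser


class IdStripper(HTMLParser):
    """Hypothetical helper: re-emit markup with every id attribute dropped."""

    def __init__(self):
        # Keep entity and character references as written in the source.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def _open_tag(self, tag, attrs, closer):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        kept = [(k, v) for k, v in attrs if k != "id"]
        text = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in kept
        )
        self.parts.append(f"<{tag}{text}{closer}")

    def handle_starttag(self, tag, attrs):
        self._open_tag(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        # Self-closing form, e.g. <br />, as used in XHTML.
        self._open_tag(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def strip_ids(markup):
    """Return markup with id attributes removed; comments and doctypes are dropped."""
    parser = IdStripper()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)
```

Calling strip_ids on the text of a chapter would give back the same tags and
text minus the id attributes. Because the parser tokenizes the markup instead
of pattern-matching it, other attributes on the same tag are left alone.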

Thanks Peter.



--
View this message in context: 
http://runtime-revolution.278305.n4.nabble.com/using-the-SHELL-function-to-GREP-a-body-of-text-tp4702346p4702348.html
Sent from the Revolution - User mailing list archive at Nabble.com.

_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
