Cédrick,

In principle, what you are asking for is to identify 'islands' of structured information in a 'sea' of otherwise unstructured material, which is now a standard pattern in PetitParser. You could imagine a parser spec of the form:

    (sea optional, (email/phone/address/....), sea optional) plus

where email etc. are parsers for the individual structures. As a parser this would probably lead to lots of backtracking and be hideously inefficient, but for a short text like an e-mail it could be usable. This also assumes that the items of interest are really structured; there could be many ways of writing phone numbers, for instance.

HTH

Peter Kenny

-----Original Message-----
From: Pharo-users <pharo-users-boun...@lists.pharo.org> On Behalf Of Cédrick Béler
Sent: 07 March 2019 09:52
To: Any question about pharo is welcome <pharo-users@lists.pharo.org>
Cc: Tudor Girba <tu...@tudorgirba.com>
Subject: [Pharo-users] Parsing text to discover general data of interest (phone, email, address, ...)

Hi all,

I often need to analyse some random unstructured text (in emails, for instance) to discover structured information and extract:
- emails
- telephone numbers
- addresses
- events
- person names (according to a list of known persons)
- etc…

Apple does it in email, for instance (strangely, this is not generalized).

So my questions are:
- do we have something equivalent in Smalltalk/Pharo? (I didn’t find anything)
- if not, what strategy would you use?

=> I do really stupid text analysis (substrings, finding @, …, parsing according to the text structure when there is one… kind of Soup parsing…)
=> I feel this is a job for PetitParser? And it would be a nice feat for the new GToolkit.

All ideas or suggestions are welcome ;-)

TIA,
Cédrick
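To make the control flow of the island/sea spec concrete, here is a minimal sketch. It is written in Python rather than PetitParser and is not PetitParser's API. It assumes each island is a hypothetical regex-based parser anchored at the current position (`regex_island`, `ISLANDS` and `find_islands` are illustrative names only). At each position the islands are tried in order, like an ordered choice. If one matches, it is recorded and skipped; otherwise one character of sea is consumed. The email and phone patterns are deliberately naive. The phone pattern assumes French-style 0X XX XX XX XX numbers, which illustrates the point that phone numbers can be written in many ways.

```python
import re
from typing import Callable, List, Optional, Tuple

# An island parser takes (text, position) and returns the end position of a
# match that starts exactly at that position, or None if it does not match.
IslandParser = Callable[[str, int], Optional[int]]


def regex_island(pattern: str) -> IslandParser:
    """Hypothetical helper: build an island parser from a regular expression."""
    compiled = re.compile(pattern)

    def parse(text: str, pos: int) -> Optional[int]:
        m = compiled.match(text, pos)  # anchored at pos, not a search
        return m.end() if m else None

    return parse


# Hypothetical, deliberately naive island definitions (order matters).
ISLANDS: List[Tuple[str, IslandParser]] = [
    ("email", regex_island(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    # Assumed French-style number: 0X XX XX XX XX, optional space/dot separators.
    ("phone", regex_island(r"0\d(?:[ .]?\d{2}){4}")),
]


def find_islands(text: str) -> List[Tuple[str, str]]:
    """Scan text as (sea optional, (island / island / ...), sea optional) plus."""
    found: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        for kind, parser in ISLANDS:  # ordered choice: first island wins
            end = parser(text, pos)
            if end is not None and end > pos:
                found.append((kind, text[pos:end]))
                pos = end  # jump past the island
                break
        else:
            pos += 1  # no island starts here: consume one character of sea
    return found


if __name__ == "__main__":
    sample = "Hi, write to jane.doe@example.org or call 01 23 45 67 89 before 07 March 2019."
    print(find_islands(sample))
```

For the sample text, this yields the email and the phone number and ignores the date. Every sea character triggers a trial of every island, which is where the cost mentioned above comes from. For a short e-mail that cost is negligible, but it grows with the length of the text and the number of island parsers.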