--- Begin Message ---
Couldn't find anything in Smalltalk, but this should give you ideas, inspire you, or get you started...

https://github.com/search?q=contact+scraping&type=Repositories

I guess we have everything we need in Pharo: parsers (HTML, XML, PetitParser), Soup, and regex!
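
As a rough illustration of the regex route, here is a minimal sketch in Python rather than Pharo, since none of the linked projects are in Smalltalk anyway. The patterns, the function name extract_contacts, and the sample text are all illustrative assumptions; the patterns are deliberately naive (not RFC-complete, no locale-aware phone formats) and only cover emails, phone-like numbers, and a lookup against a list of known persons.

```python
import re

# Deliberately simple patterns: assumptions for illustration, not complete grammars.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
# Optional leading '+', then digits separated by spaces, dots or hyphens,
# starting and ending on a digit (at least 8 characters after the '+').
PHONE_RE = re.compile(r"\+?\d[\d .-]{6,}\d")


def extract_contacts(text, known_persons=()):
    """Return emails, phone-like numbers and known person names found in text."""
    emails = EMAIL_RE.findall(text)
    phones = PHONE_RE.findall(text)
    # Person names: case-insensitive whole-word lookup against a supplied list.
    persons = [
        name
        for name in known_persons
        if re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE)
    ]
    return {"emails": emails, "phones": phones, "persons": persons}


# Hypothetical sample text, made up for this sketch.
sample = (
    "Hi, call Alice Martin on +33 1 23 45 67 89 or write to "
    "alice.martin@example.org before Friday."
)
result = extract_contacts(sample, known_persons=["Alice Martin", "Bob Durand"])
print(result)
```

Addresses and events are much harder to catch with plain patterns, which is probably where a real grammar (PetitParser, for instance) would start to pay off.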

On 2019-03-07 04:52, Cédrick Béler wrote:
Hi all,

I often need to analyse random unstructured text (emails, for instance) to 
discover and extract structured information:
- emails
- telephone numbers
- addresses
- events
- person names (according to a list of known persons),
- etc…

Apple does it in email, for instance (strangely, this is not generalized).


So my questions are:
- do we have something equivalent in Smalltalk/Pharo? (I didn’t find anything)
- if not, what strategy would you use?
=> I currently do really stupid text analysis (substrings, finding @, …, parsing 
according to the text structure when there is one… kind of Soup parsing…)
=> I feel this is a job for PetitParser? It would also be a nice fit for the new 
GToolkit.

All ideas or suggestions are welcome ;-)


TIA,

Cédrick



--
-----------------
Benoît St-Jean
Yahoo! Messenger: bstjean
Twitter: @BenLeChialeux
Pinterest: benoitstjean
Instagram: Chef_Benito
IRC: lamneth
Blogue: endormitoire.wordpress.com
"A standpoint is an intellectual horizon of radius zero".  (A. Einstein)



--- End Message ---
