Bernardo
Being now retired, I do programming just for intellectual stimulation. Your problem looked as though it would provide more interest than cryptic crosswords or Sudoku, and it touches on areas of Pharo use that I have some experience with. So…

The attached file, BernardoDemo.st, shows how to use XMLHTMLParser with XPath and NeoJSON to tackle your problem, or at least a large subset of it. I cobbled it together in a Playground, and the easiest way to use it is to copy it into a Playground and 'Do it and go' on each block of code in turn. There are liberal comments, but if anything is not clear, come back to me.

A few caveats:

1. XPath is a whole other programming language, embedded in Pharo, which takes some learning. I am by no means an expert in it, and I may have used it clumsily. One advantage of embedding it in Pharo is that you can intersperse Pharo and XPath, which I do whenever I can't solve something entirely with XPath. Probably most of the places where I use #collect: followed by more XPath could be done entirely in XPath if I knew how.

2. This is the first time I have tried to use NeoJSON, so do not take my code as an example of how to use it. It all works, as far as I can see; I cannot claim more than that.

3. The easiest way to generate an object (or map) in NeoJSON is to start with a Pharo Dictionary, which I have done everywhere. However, this means you have no control over the order in which the attributes appear in the JSON file. This is of no importance to a computer, since by definition the attributes of a JSON object are unordered, but it makes the output look a little odd to a human reader.

4. In your spec, the desired output has a lot of unquoted strings for attribute names, for example nbd_no. The code produces these names in double quotes, which is required for legal JSON.

5. Note that all numerical values appear in the output as strings. No doubt they could be converted to numbers, but I was too lazy to find out how.
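Regarding point 5, here is a minimal sketch of one way the conversion might be done. It is not part of BernardoDemo.st and has not been tried against the versions mentioned below. It assumes the numeric values arrive as plain strings of digits, and it relies on NeoJSONWriter class>>toString:, String>>isAllDigits and String>>asNumber; the 'title' key and both sample values are hypothetical, only the nbd_no name comes from your spec.

```smalltalk
"Sketch only: turn all-digit string values into integers before
 writing, so that NeoJSON emits them as JSON numbers rather than strings.
 The 'title' key and the sample values are hypothetical."
| record json |
record := Dictionary new.
record at: 'nbd_no' put: '1234'.
record at: 'title' put: 'Some text'.
"keys answers a new collection, so replacing values while looping is safe"
record keys do: [ :key |
	| value |
	value := record at: key.
	(value isString and: [ value notEmpty and: [ value isAllDigits ] ])
		ifTrue: [ record at: key put: value asNumber ] ].
json := NeoJSONWriter toString: record.
json
```

Doing 'Print it' on this should give a string in which both names are double-quoted (point 4) and 1234 appears without quotes, though the order of the two keys is not guaranteed (point 3). Decimal values such as '12.5' fail the isAllDigits test and would stay as strings, so they would need a different check.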
I have done this using Moose 5.1 (Pharo 4.0, build #40613), with versions of XMLHTMLParser and XPath which I downloaded quite a while ago. I have not used anything particularly abstruse, so I hope it will work with more recent versions.

Hope this is helpful.

Best wishes

Peter Kenny

From: Pharo-users [mailto:pharo-users-boun...@lists.pharo.org] On Behalf Of Bernardo Ezequiel Contreras
Sent: 27 June 2016 15:17
To: Any question about pharo is welcome <pharo-users@lists.pharo.org>
Subject: Re: [Pharo-users] If you have to do web data scraping, what tool would you use?

Doru,

See attached file, it's a job posting from upwork.

On Mon, Jun 27, 2016 at 3:58 AM, Tudor Girba <tu...@tudorgirba.com> wrote:

Hi,

Could you provide more details about the use case?

Cheers,
Doru

> On Jun 26, 2016, at 11:14 PM, Bernardo Ezequiel Contreras <vonbecm...@gmail.com> wrote:
>
> Hi,
> Imagine that you have to do some data scraping work, what tool would you use?
> I know about ZnClient, Soup, NeoCSV, NeoJSON, is there something else that i'm not aware of it?
>
> thanks.
>
> --
> Bernardo E.C.
>
> Sent from a cheap desktop computer in South America.

--
www.tudorgirba.com
www.feenk.com

"If you can't say why something is relevant, it probably isn't."

--
Bernardo E.C.

Sent from a cheap desktop computer in South America.
BernardoDemo.st