I agree it makes no sense. I repeated exactly what you describe in a new Playground (in Pharo 6.1 on Windows 10) and everything worked as expected – essentially the same result Torsten reported in his first post. I wonder whether it might be something Mac-related in the way the Playground operates.
As a last-ditch attempt to explain it, please see what happens if you open a Playground containing just your single line

ingredientsXML := XMLHTMLParser parseURL: 'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb&ds=Standard+Reference'.

and then select ‘do it and go’. You should see an inspector pane open to the right of the Playground, showing the result of the parse. If this fails, the standard suggestion is to open a debugger on your error message and work back through the stack to see how execution got there.

Just to discourage you further: when you do manage to read the contents of the URL, you will find that the USDA have changed everything. All the data are now on a separate web site, probably in a new layout. This is one of the perpetual hassles of web scraping – the web site authors have to justify their existence by rewriting everything. I wrote this section of the scraping booklet by working up something I had done as a one-off a year or so earlier, and then found that the USDA had changed the layout in the interim, so much of it needed to be rewritten.

HTH – in part at least.

Peter Kenny

To Torsten – I agree I was slipshod in my drafting; I was in a hurry. Instead of saying ‘can screw things up’ I should have said ‘can produce counter-intuitive results’, as exemplified by the fact that, in your first example, ‘ingredientsXML’ can mean different things depending on whether you execute it all in one go or a line at a time.
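To make the ‘counter-intuitive results’ concrete, here is a minimal sketch. It assumes the Playground treats any name that is not declared as a temporary in the evaluated text as a playground variable; the variable name total is purely illustrative and not from the thread.

```smalltalk
"Select these three lines together and use Do it.
 The declaration | total | makes total a temporary that exists
 only for this single evaluation."
| total |
total := 3 + 4.
total inspect.

"Afterwards, select ONLY the next line and use Do it.
 The selection contains no declaration, so the Playground resolves
 total as a playground variable instead. That variable was never
 assigned, so it is most likely nil here, not the value computed above."
total inspect.
```

The same name therefore refers to two different variables depending on how much text is selected, which is why leaving out the declaration, as Peter suggested, gives more predictable behaviour in a Playground.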
From: Pharo-users <pharo-users-boun...@lists.pharo.org> On Behalf Of LawsonEnglish
Sent: 07 January 2020 20:55
To: Any question about pharo is welcome <pharo-users@lists.pharo.org>
Subject: Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

I deleted the playground and entered the text thus:

ingredientsXML := XMLHTMLParser parseURL: 'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb&ds=Standard+Reference'.

“Do it” has no complaints.
ingredientsXML = nil yields false.
ingredientsXML inspect raises an error: #new was sent to nil.

This makes no sense at all.

L

On Jan 7, 2020, at 1:55 AM, PBKResearch <pe...@pbkresearch.co.uk> wrote:

It may be a quirk of how Pharo Playground works. It doesn't need local variable declarations, which is convenient, but putting them in can screw things up. Try your snippet again without the first line. Compare Torsten's code.

HTH
Peter Kenny

-----Original Message-----
From: Pharo-users <pharo-users-boun...@lists.pharo.org> On Behalf Of Torsten Bergmann
Sent: 07 January 2020 07:47
To: pharo-users@lists.pharo.org
Cc: pharo-users@lists.pharo.org
Subject: Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

Works without a problem (Pharo 8 on Windows), see attached. So it looks like a local problem. Just check the debugger and compare it with the Squeak version to see where you run into trouble. Maybe the document could not be retrieved on your machine.

Bye
T.

Sent: Tuesday, 07 January 2020 at 04:42
From: "LawsonEnglish" <lengli...@cox.net>
To: pharo-users@lists.pharo.org
Subject: Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

Torsten Bergmann wrote:
Hi,
You can load it using

Metacello new
	baseline: 'XMLParserHTML';
	repository: 'github://pharo-contributions/XML-XMLParserHTML/src';
	load.

Bye
T.

Hi,

I'm trying to use the sample code in the Pharo screen scraping booklet (http://books.pharo.org/booklet-Scraping/pdf/2018-09-02-scrapingbook.pdf), but while everything appears to load, I'm getting odd behavior from:

| ingredientsXML |
ingredientsXML := XMLHTMLParser parseURL: 'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb&ds=Standard+Reference'.
ingredientsXML inspect

"#new was sent to nil"

No matter what URL I use, I get the same message. I'm using macOS Catalina, so I thought I might have some strange macOS security issue (such as it quietly refusing to allow Pharo to access the internet), but I tested with Squeak and the old

html := (HtmlParser parse: 'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb&ds=Standard+Reference' asUrl retrieveContents content)

and that returns actual HTML without any problems.

Suggestions?

Thanks.

L
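Following Torsten's suggestion that the document may not have been retrieved, one way to narrow the problem down in Pharo is to fetch the page and parse it in two separate steps, much as the Squeak snippet above does. This is a sketch, not the booklet's code: it assumes Zinc's ZnEasy client (bundled with recent Pharo images) and the class-side parse: that XMLHTMLParser inherits from the XMLParser package. Given Peter's note that the USDA has since moved its data, the request may now fail or return a different page.

```smalltalk
"Select all lines and use Do it, so the temporaries are shared."
| url response html ingredientsXML |
url := 'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb&ds=Standard+Reference'.

"Step 1: fetch the page with Zinc, independently of the XML/HTML parser."
response := ZnEasy get: url.
response isSuccess
	ifFalse: [ Error signal: 'Fetch failed with HTTP status ', response code printString ].

"Step 2: parse the already-retrieved HTML string."
html := response contents.
ingredientsXML := XMLHTMLParser parse: html.

"Step 3: inspect the resulting document."
ingredientsXML inspect.
```

If step 1 signals an error, the problem most likely lies in retrieval (network access, TLS, or the moved site); if step 2 or step 3 fails instead, it points to the parser or the inspector, and the debugger stack from that step is the place to look.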