I agree it makes no sense. I repeated exactly what you describe in a new 
playground (in Pharo 6.1 on Windows 10) and all worked as expected – 
essentially the same result as Torsten reported in his first post. I wonder if 
it might be something Mac related in the operation of Playground.

 

As a desperate try to explain it, please see what happens if you open a 
Playground with just your single line

ingredientsXML := XMLHTMLParser parseURL: 'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb&ds=Standard+Reference'

and then select ‘do it and go’. You should find an inspector pane opening to 
the right in the Playground, with the result of the parse. If this fails, the 
standard suggestion is to open a debugger on your error message and work 
back through the stack to see how execution got there.

 

Just to discourage you further, when you do get to read the contents of the 
URL, you will find that the USDA have changed everything. All the data are now 
on a separate web site, probably in a new layout. This is one of the perpetual 
hassles of web scraping – the web site authors have to justify their existence 
by rewriting everything. I wrote this section of the scraping booklet, working 
up something I had done as a one-off a year or so earlier, and then I found 
that the USDA had changed the layout in the interim and much needed to be 
rewritten.

 

HTH – in part at least.

 

Peter Kenny

 

To Torsten – I agree I was slipshod in my drafting – I was in a hurry. Instead 
of saying ‘can screw things up’ I should have said ‘can produce 
counter-intuitive results’, as exemplified by the fact that, in your first 
example, ‘ingredientsXML’ can mean different things depending on whether you 
execute it all in one go or a line at a time.

 

From: Pharo-users <pharo-users-boun...@lists.pharo.org> On Behalf Of 
LawsonEnglish
Sent: 07 January 2020 20:55
To: Any question about pharo is welcome <pharo-users@lists.pharo.org>
Subject: Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

 

I deleted the playground and entered the text thusly

 

ingredientsXML := XMLHTMLParser parseURL: 
'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb&ds=Standard+Reference'.

 

“do it” has no complaints

 

ingredientsXML = nil 

 

yields “false”

 

ingredientsXML inspect

 

has errors: #new sent to nil

 

 


 

This makes no sense at all.

 

 

L

 





On Jan 7, 2020, at 1:55 AM, PBKResearch <pe...@pbkresearch.co.uk> wrote:

 

It may be a quirk of how Pharo Playground works. It doesn't need local variable 
declarations - which is convenient - but putting them in can screw things up. 
Try your snippet again without the first line. Compare Torsten's code.

HTH

Peter Kenny

-----Original Message-----
From: Pharo-users <pharo-users-boun...@lists.pharo.org> On Behalf Of Torsten Bergmann
Sent: 07 January 2020 07:47
To: pharo-users@lists.pharo.org
Cc: pharo-users@lists.pharo.org
Subject: Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

Works without a problem (Pharo 8 on Windows), see attached. So it looks like a 
local problem.

Just check the debugger and compare with the Squeak version at the point 
where you run into trouble.
Maybe the document could not be retrieved on your machine.

Bye
T.




Sent: Tuesday, 07 January 2020 at 04:42
From: "LawsonEnglish" <lengli...@cox.net>
To: pharo-users@lists.pharo.org
Subject: Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

Torsten Bergmann wrote



Hi,


You can load using

  Metacello new
               baseline: 'XMLParserHTML';
               repository: 'github://pharo-contributions/XML-XMLParserHTML/src';
               load.


Bye
T.


Hi,

I'm trying to use the sample code in the Pharo screen scraping booklet 
(http://books.pharo.org/booklet-Scraping/pdf/2018-09-02-scrapingbook.pdf), but 
while everything appears to load, I'm getting odd behavior from:

| ingredientsXML |
ingredientsXML := XMLHTMLParser parseURL:
'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb&ds=Standard+Reference'.
ingredientsXML inspect

"#new was sent to nil"

No matter what URL I use, I get the same message.

I'm using macOS Catalina, so I thought I might have some strange macOS 
security issue (like it was quietly refusing to allow Pharo to 
access the internet), but I tested with Squeak and the old

html := (HtmlParser parse:
'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb&ds=Standard+Reference'
asUrl retrieveContents content)

and that returns actual html without any problems.


Suggestions?


Thanks.

L




--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html



 

 
