On Thu, 14 Jan 2021 16:10:09 -0500
Jack <ostrof...@users.sourceforge.net> wrote:

> On 2021.01.14 15:49, Walter Dnes wrote:
> >   I'm bored, so I do a regular daily report at the DSL Reports
> > "CanChat" sub-forum, on the Covid-19 case counts for Ontario, using
> > provincial data.  I download 2 files daily as source data.  One of
> > them is a PDF file, which is run through "pdftotext" and then parsed
> > by a bash script (don't ask).  Today, the command...
> >
> >   wget https://files.ontario.ca/moh-covid-19-report-en-2021-01-14.pdf
> >
> > ...returns a zero-byte file.  *BUT*, sticking the URL into the URL bar
> > of Pale Moon and Google Chrome (and I assume Firefox/etc) brings up
> > the PDF file just fine.  Is "wget" being blocked?  I have to do extra
> > steps to get from the browser-invoked PDF to get the PDF file saved to
> > the standard work area where my script expects it to be, so it can
> > work its magic and parse out the daily breakdown by PHU (Public Health
> > Unit).  BTW, today's posts requiring the PDF file are...
> > https://www.dslreports.com/forum/r33002718-
> > https://www.dslreports.com/forum/r33002752-
> >
> >   I've tried setting --user-agent= with my browser's string as shown
> > by https://www.whatismybrowser.com/detect/what-is-my-user-agent  but
> > no luck.  Is there some way to get around this?  I have not updated
> > this past week, so I don't think the problem is at my end.
>
> I just copy/pasted that wget command into my terminal, and it got me a
> 1.7M PDF doc.  I'm in the US, but I have no idea if location/IP is an
> issue or not.
>
> Jack
>

The wget command you posted downloaded the file for me too. If you
still have trouble, you could try curl and have it pose as Firefox:
curl 'https://files.ontario.ca/moh-covid-19-report-en-2021-01-14.pdf' \
  -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0' \
  -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8' \
  -H 'Accept-Language: en,de;q=0.7,en-US;q=0.3' \
  --compressed \
  -H 'DNT: 1' \
  -H 'Connection: keep-alive' \
  -H 'Upgrade-Insecure-Requests: 1' \
  -H 'Pragma: no-cache' \
  -H 'Cache-Control: no-cache' \
  > moh-covid-19-report-en-2021-01-14.pdf
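
Since the download feeds a script, it might be worth having the script
notice the zero-byte case itself and fall back automatically. A rough
sketch, not tested against the Ontario server: the function name
fetch_report is made up for illustration, the User-Agent string is the
one from the curl command above, and whether the fallback actually helps
depends on what the server is doing, which I don't know.

```shell
#!/usr/bin/env bash
# Sketch: try wget first; if the result is missing or zero bytes, retry
# with curl sending a browser-like User-Agent.
# fetch_report is a hypothetical helper name, not part of any tool.

# User-Agent string taken from the curl command above.
UA='Mozilla/5.0 (X11; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0'

fetch_report() {
    local url=$1 out=$2

    # -q: quiet, -O: write the download to the given file name.
    # "|| true" so a wget failure falls through to the size check.
    wget -q -O "$out" "$url" || true

    # [ -s FILE ] is true only if FILE exists and is larger than zero bytes.
    if [ -s "$out" ]; then
        return 0
    fi

    echo "wget produced an empty file, retrying with curl" >&2

    # -f: fail on HTTP errors, -sS: quiet but still print errors,
    # -L: follow redirects, -A: User-Agent header, -o: output file.
    curl -fsSL --compressed -A "$UA" -o "$out" "$url" && [ -s "$out" ]
}

# Example call (URL from the original post):
# fetch_report 'https://files.ontario.ca/moh-covid-19-report-en-2021-01-14.pdf' \
#     moh-covid-19-report-en-2021-01-14.pdf
```

The function returns non-zero if both attempts leave an empty or missing
file, so the calling script can stop before handing anything to
pdftotext.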

Andreas
