Hello,

On Thu, 14 Jan 2021, Walter Dnes wrote:
> I'm bored, so I do a regular daily report at the DSL Reports "CanChat"
> sub-forum, on the Covid-19 case counts for Ontario, using provincial
> data. I download 2 files daily as source data. One of them is a PDF
> file, which is run through "pdftotext" and then parsed by a bash script
> (don't ask). Today, the command...
>
>   wget https://files.ontario.ca/moh-covid-19-report-en-2021-01-14.pdf
>
> ...returns a zero-byte file. *BUT*, sticking the URL into the URL bar
> of Pale Moon and Google Chrome (and I assume Firefox/etc) brings up the
> PDF file just fine. Is "wget" being blocked?
[..]
> I've tried setting --user-agent= with my browser's string as shown by
> https://www.whatismybrowser.com/detect/what-is-my-user-agent but no
> luck. Is there some way to get around this? I have not updated this
> past week, so I don't think the problem is at my end.
I was able to download that file just now[1].

Try running 'wget' with the '-S' option. Oh and:

[..]
WARNING: cannot verify files.ontario.ca's certificate, issued by [..]

If you sent stderr to /dev/null, you would have missed that warning.

So, try:

  wget -S --no-check-certificate -U 'Mozilla/5.0 ...' \
    https://files.ontario.ca/moh-covid-19-report-en-2021-01-14.pdf

BTW: did you know that you can let date(1) format that URL? E.g.:

  wget -S --no-check-certificate -U 'Mozilla/5.0 ...' \
    "$(date '+https://files.ontario.ca/moh-covid-19-report-en-%Y-%m-%d.pdf')"

Just note that no unescaped '%' is allowed apart from the date/time
format specifiers. So if a URL contains one, you need to escape it with
another '%', as in e.g.:

  $(date '+foo%%20bar-%Y-%m-%d.pdf')
              ^^ this fella

In your case, the URL is clean ;)

HTH,
-dnh

[1] $ TZ=America/Toronto date
    Thu Jan 14 16:50:15 EST 2021

-- 
"Airplane travel is nature's way of making you look like your passport
photo."
                -- Al Gore
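A minimal bash sketch of the "let date format the URL" idea, assuming GNU date (for its -d option) and wget. The function names report_url and fetch_report are hypothetical. It also treats a zero-byte download as a failure, since that was the symptom reported here; with -O, wget creates the output file before connecting, so a failed fetch can leave an empty file behind.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU date (-d) and wget are installed.
# report_url and fetch_report are hypothetical helper names.

# Print the report URL for a given day (default: today).
# A literal '%' in the template would have to be written as '%%',
# otherwise date treats it as the start of a format specifier.
report_url() {
    date -d "${1:-today}" '+https://files.ontario.ca/moh-covid-19-report-en-%Y-%m-%d.pdf'
}

# Download the report for a given day (default: today) into the current
# directory. -S prints the server's response headers to stderr, which
# shows what the server actually answered.
fetch_report() {
    local url file
    url=$(report_url "$1")
    file=${url##*/}    # strip everything up to the last '/'
    # --no-check-certificate is the workaround suggested above; it disables
    # TLS verification, so drop it once the certificate problem is fixed.
    wget -S --no-check-certificate -O "$file" "$url" || return 1
    # [ -s FILE ] is true only if FILE exists and is non-empty.
    if [ ! -s "$file" ]; then
        echo "zero-byte download: $file" >&2
        return 1
    fi
}
```

Usage, e.g. from a daily cron job: "fetch_report" for today's file, or "fetch_report 2021-01-14" for a specific day. Keeping the URL logic in its own function means the same date arithmetic can be reused for the second daily file.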