I'm trying to download a file using urllib.request and pipe it straight to an external process. The following test script demonstrates the problem on Linux:
--- cut ---

#!/usr/bin/python3.5

import urllib.request
import subprocess

TEST_URL = 'https://www.irs.gov/pub/irs-prior/f1040--1864.pdf'

with urllib.request.urlopen(TEST_URL) as f:
    data = subprocess.check_output(['file', '-'], stdin=f)
    print(data)

with urllib.request.urlopen(TEST_URL) as f:
    with open('/tmp/x.pdf', 'wb') as g:
        n = g.write(f.read())

with open('/tmp/x.pdf') as g:
    data = subprocess.check_output(['file', '-'], stdin=g)
    print(data)

--- cut ---

Output is:

b'/dev/stdin: data\n'
b'/dev/stdin: PDF document, version 1.6\n'

Expected output is:

b'/dev/stdin: PDF document, version 1.6\n'
b'/dev/stdin: PDF document, version 1.6\n'

If I just read from urllib.request, I get what appears to the naked eye to be the expected data:

py> with urllib.request.urlopen(TEST_URL) as f:
...     file = f.read()
...
py> print(file[:100])
b'%PDF-1.6\r%\xe2\xe3\xcf\xd3\r\n55 0 obj\r<</Linearized 1/L 66721/O 57/E 28286/N 4/T 65574/H [ 856 317]>>\rendobj\r '

Certainly looks like a PDF file. So what's going on?

-- 
Steven
299792.458 km/s - not just a good idea, it’s the law!