Hello hackers,

During work in a separate thread [1], I discovered more cases where a link in the docs wasn't the canonical link [2].

[1] https://postgr.es/m/cakfquwyex9pj9g0zhjewsmsbnquygh+fycw-66ezjfvg4ko...@mail.gmail.com
[2] https://en.wikipedia.org/wiki/Canonical_link_element

The script below doesn't parse SGML, for example, and is broken in some other ways as well, but it is probably good enough to suggest changes that can then be carefully verified by hand.

```
#!/bin/bash

output_file="changes.log"
: > "$output_file"    # start with an empty log

# Print the canonical URL for $1, or $1 itself if none is found.
# Any difference is also logged to $output_file in -old/+new form.
extract_canonical() {
    local url=$1
    local canonical
    canonical=$(curl -s "$url" | sed -n 's/.*<link rel="canonical" href="\([^"]*\)".*/\1/p')
    if [[ -n "$canonical" && "$canonical" != "$url" ]]; then
        echo "-$url" >> "$output_file"
        echo "+$canonical" >> "$output_file"
        echo "$canonical"
    else
        echo "$url"
    fi
}

find . -type f -name '*.sgml' | while read -r file; do
    # Crude URL extraction: at most one https URL per line, no SGML parsing
    urls=$(sed -n 's/.*\(https:\/\/[^"]*\).*/\1/p' "$file")
    for url in $urls; do
        canonical_url=$(extract_canonical "$url")
        if [[ "$canonical_url" != "$url" ]]; then
            # Replace the original URL with the canonical URL in the file
            # (BSD/macOS sed syntax; GNU sed takes -i without the empty argument)
            sed -i '' "s|$url|$canonical_url|g" "$file"
        fi
    done
done
```

Most of what it found was indeed correct, but I had to undo some mistakes it made. All the changes in the attached patch have been manually verified by clicking the original link and observing the URL shown in the browser.

/Joel
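
One of the ways the extraction step in the script above can go wrong is that its sed pattern only matches when `rel="canonical"` comes directly before `href` inside the `<link>` tag. A minimal sketch of a more tolerant variant follows. The function name `canonical_from_html` is hypothetical and not part of the original script, and it still assumes double-quoted attributes and does no real HTML parsing.

```bash
#!/bin/bash

# Hypothetical helper: read HTML on stdin and print the href of the first
# <link ... rel="canonical" ...> tag, regardless of attribute order.
# Prints nothing if no such tag is found.
canonical_from_html() {
    tr '\n' ' ' |                                # join the input into one line
        tr '>' '\n' |                            # then split after every tag
        grep -i '<link[^<]*rel="canonical"' |    # keep only canonical link tags
        sed -n 's/.*href="\([^"]*\)".*/\1/p' |   # extract the href value
        head -n 1                                # first match only
}
```

It could be used in place of the sed call in `extract_canonical`, for example as `curl -s "$url" | canonical_from_html`. The result would still need the same manual verification as before.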
0001-Fix-docs-to-use-canonical-links.patch