[PATCH] Fix docs to use canonical links

Joel Jacobson Thu, 27 Jun 2024 02:29:37 -0700

Hello hackers,

While working on a separate thread [1], I discovered more cases
where a link in the docs wasn't the canonical link [2].


[1] https://postgr.es/m/cakfquwyex9pj9g0zhjewsmsbnquygh+fycw-66ezjfvg4ko...@mail.gmail.com
[2] https://en.wikipedia.org/wiki/Canonical_link_element

The script below doesn't parse SGML, for example, and is broken in some other
ways too, but it is probably good enough to suggest changes that can then be
carefully verified by hand.

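```
#!/bin/bash
# Rough helper: rewrite https URLs in *.sgml files to their canonical form.
output_file="changes.log"
: > "$output_file"   # start with an empty log

# Print the canonical URL advertised by the page at $1, or $1 itself if the
# page has none. Changed URLs are also logged as a -old/+new pair.
extract_canonical() {
  local url=$1
  local canonical
  canonical=$(curl -s "$url" | sed -n 's/.*<link rel="canonical" href="\([^"]*\)".*/\1/p')
  if [[ -n "$canonical" && "$canonical" != "$url" ]]; then
    echo "-$url" >> "$output_file"
    echo "+$canonical" >> "$output_file"
    echo "$canonical"
  else
    echo "$url"
  fi
}

find . -type f -name '*.sgml' | while read -r file; do
  # Greedy match: picks up at most one URL per line (the last one).
  urls=$(sed -n 's/.*\(https:\/\/[^"]*\).*/\1/p' "$file")
  for url in $urls; do
    canonical_url=$(extract_canonical "$url")
    if [[ "$canonical_url" != "$url" ]]; then
      # Replace the original URL with the canonical URL in the file.
      # Note: "sed -i ''" is the BSD/macOS form; GNU sed takes plain "sed -i".
      sed -i '' "s|$url|$canonical_url|g" "$file"
    fi
  done
done
```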
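
As one illustration of the "broken in some other ways" caveat: the sed
expression in the script only matches when rel="canonical" comes directly
before href on a single line. A minimal sketch of a more tolerant extraction
step, working on HTML from stdin so it can be tried without network access,
might look like this. The function name canonical_from_html and the
example.org URL are made up for illustration and are not part of the patch;
the sketch still assumes double-quoted attributes.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script): print the canonical URL
# found in HTML read from stdin, or nothing if there is none.
canonical_from_html() {
  tr '\n' ' ' |                             # join lines so a tag split across lines still matches
    grep -o '<link[^>]*>' |                 # emit one <link ...> tag per line
    grep 'rel="canonical"' |                # keep only the canonical link element
    sed -n 's/.*href="\([^"]*\)".*/\1/p' |  # extract the href value, whatever the attribute order
    head -n 1                               # use the first match if a page has several
}

# Example with inline HTML (illustrative URL):
html='<head><link href="https://example.org/docs/" rel="canonical"></head>'
printf '%s\n' "$html" | canonical_from_html
```

In the script above, this could stand in for the sed call inside
extract_canonical, as in: canonical=$(curl -s "$url" | canonical_from_html).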

Most of what it found was indeed correct, but I had to undo some mistakes it
made.

All the changes in the attached patch have been manually verified, by clicking
the original link, and observing the URL seen in the browser.
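
For anyone repeating that manual pass, the -old/+new pairs that the script
writes to changes.log can be condensed into one line per distinct
replacement. This is only a sketch: list_changes is a hypothetical helper
name, and it assumes the log contains strictly alternating "-" and "+" lines
as written by the script.

```shell
#!/bin/bash
# Hypothetical helper: list each distinct "old -> new" pair from the log
# written by the script (alternating "-old" and "+new" lines).
list_changes() {
  local log=${1:-changes.log}
  paste - - < "$log" |   # join each "-old" line with the "+new" line that follows it
    sort -u |            # the same URL can occur in several files
    awk -F'\t' '{ print substr($1, 2) " -> " substr($2, 2) }'   # drop the -/+ markers
}

# Usage: list_changes changes.log
```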

/Joel

0001-Fix-docs-to-use-canonical-links.patch
Description: Binary data
