Ihor Radchenko <[email protected]> writes:

>> There are a couple of known limitations:
>>     -  Some github gists automatically redirect appropriately (f.e.,
>>     https://gist.github.com/4343164 redirects to
>>     https://gist.github.com/tonyday567/4343164). gothub only supports the
>>     "complete" url (/user/gistid). Those gist urls aren't supported yet.
>>     - github supports two way of forcing the "raw" file, via
>>     raw.github.com/... and via github.com/...?raw=true, I currently only
>>     convert the first one.
>>     - github issues pages are not rewritten, see above.
>
> I am wondering whether such unconvertable links might be captured when
> publishing with --debug flag.

I am not sure what you mean with captured, do you mean "following" the
redirection?

>>  publish.sh | 89 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 89 insertions(+)
>
> This is complex enough to warrant a separate file I think.

Agreed, fixed.

> defvar would be more canonical.

Fixed, thanks. What I've done here:

- I've fixed GET error handling (we simply do not redirect any url if we can't
  get libredirect data).

- When the 'libredirect data' (what should we redirect to what) changes
  between publishing, we now export all org files rather than only updated
  ones, described by Christian and Ihor.

- We redirect reddit.com to old.reddit.com


What I have *not* yet done:

- I have ignored the confusing link descriptions for now - when a link refers
  to github.com, that link still points to a free alternative (currently
  gothub.lunar.icu). A simple regex for the domain would suffice I think.

- I still haven't touched org-web-social.org (the aggregate page), or looked
  at URLs that are broken regardless of this patch.

As before, you can see the results on https://files.rensoliemans.nl/worg/,
though only the reddit links differ from before.

Also, is it customary to change the subject line to one containing PATCH
halfway through the discussion?

Best,
Rens

>From f6aad7cef8b6dc5f5b3d6930c549e33e229923bf Mon Sep 17 00:00:00 2001
From: Rens Oliemans <[email protected]>
Date: Tue, 25 Nov 2025 11:10:46 +0100
Subject: [PATCH] Find alternative links for YouTube, GitHub and Reddit

This commit finds alternative links for the hosts mentioned above
since they rely on non-free JS. See
https://list.orgmode.org/orgmode/87pl9szmy6.fsf@localhost/ for
context.
---
 .gitignore      |   1 +
 publish.sh      |  60 +++++++++--------
 rewrite-urls.el | 171 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 205 insertions(+), 27 deletions(-)
 create mode 100644 rewrite-urls.el

diff --git a/.gitignore b/.gitignore
index c9dc5b62..399e35e9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -4,3 +4,4 @@
 patches/
 /.project
 *patch
+*.sexp
diff --git a/publish.sh b/publish.sh
index b499de3d..6b163187 100755
--- a/publish.sh
+++ b/publish.sh
@@ -14,6 +14,9 @@ This variable can be set when running publish.sh script:
 (require 'htmlize)
 (require 'org-inlinetask)
 
+(push "./" load-path)
+(require 'rewrite-urls)
+
 (setq make-backup-files nil
       debug-on-error t)
 
@@ -58,30 +61,33 @@ This variable can be set when running publish.sh script:
    (R . t)
    (gnuplot . t)))
 
-(dolist (org-file (cl-remove-if
-		   (lambda (n) (string-match-p "worg/archive/" n))
-		   (directory-files-recursively default-directory "\\.org$")))
-  (let ((html-file (concat (file-name-directory org-file)
-			   (file-name-base org-file) ".html")))
-    (if (and (file-exists-p html-file)
-	     (file-newer-than-file-p html-file org-file)
-             ;; If there are include files or code, we need to
-             ;; re-generate the HTML just in case if the included
-             ;; files are changed.
-             (with-temp-buffer
-               (insert-file-contents org-file)
-               (and
-		(save-excursion
-		  (goto-char (point-min))
-		  (not (re-search-forward "#\\+include:" nil t)))
-                (save-excursion
-		  (goto-char (point-min))
-		  (not (re-search-forward "#\\+begin_src" nil t))))))
-	(message " [skipping] unchanged %s" org-file)
-      (message "[exporting] %s" (file-relative-name org-file default-directory))
-      (with-current-buffer (find-file org-file)
-	(if worg-publish-stop-on-error
-            (org-html-export-to-html)
-          (condition-case err
-	      (org-html-export-to-html)
-            (error (message (error-message-string err)))))))))
+(add-hook 'org-export-before-parsing-functions #'add-alternative-links)
+(let ((export-all (libredirect-data-changed-p libredirect-data)))
+  (dolist (org-file (cl-remove-if
+		     (lambda (n) (string-match-p "worg/archive/" n))
+		     (directory-files-recursively default-directory "\\.org$")))
+    (let ((html-file (concat (file-name-directory org-file)
+			     (file-name-base org-file) ".html")))
+      (if (and (not export-all)
+	       (file-exists-p html-file)
+	       (file-newer-than-file-p html-file org-file)
+	       ;; If there are include files or code, we need to
+	       ;; re-generate the HTML just in case if the included
+	       ;; files are changed.
+	       (with-temp-buffer
+		 (insert-file-contents org-file)
+		 (and
+		  (save-excursion
+		    (goto-char (point-min))
+		    (not (re-search-forward "#\\+include:" nil t)))
+                  (save-excursion
+		    (goto-char (point-min))
+		    (not (re-search-forward "#\\+begin_src" nil t))))))
+	  (message " [skipping] unchanged %s" org-file)
+	(message "[exporting] %s" (file-relative-name org-file default-directory))
+	(with-current-buffer (find-file org-file)
+	  (if worg-publish-stop-on-error
+              (org-html-export-to-html)
+            (condition-case err
+		(org-html-export-to-html)
+              (error (message (error-message-string err))))))))))
diff --git a/rewrite-urls.el b/rewrite-urls.el
new file mode 100644
index 00000000..6a833324
--- /dev/null
+++ b/rewrite-urls.el
@@ -0,0 +1,171 @@
+(defvar libredirect-url "https://raw.githubusercontent.com/libredirect/instances/main/data.json";)
+(defvar libredirect-data nil)
+
+(defcustom free-alternatives '((youtube . invidious)
+			       (github . gothub))
+  "Alist of free frontends to use. The CDR corresponds to a symbol that is
+known by libredirect, see https://codeberg.org/LibRedirect/instances.";)
+(defcustom redirect-save-filename "redirect-local-data.sexp"
+  "Filename used to save the relevant output of libredirect's json")
+
+
+(defun libredirect-data-changed-p (&optional data)
+  "Whether or not libredirect data changed since our last export."
+  (setq libredirect-data (or data (-get-libredirect-data)))
+  (if (not (file-exists-p redirect-save-filename))
+      ;; We hadn't saved stuff from our previous export, do so now.
+      (progn
+	(-save-redirect-data libredirect-data)
+	t)
+    ;; Compare results from stored file with data, overwrite stored data if newer
+    (let* ((stored-data (with-temp-buffer
+			  (insert-file-contents redirect-save-filename)
+			  (read (buffer-string))))
+	   (data-equal (equal (-relevant-redirect-data libredirect-data)
+			      stored-data)))
+      (unless data-equal
+	(-save-redirect-data libredirect-data))
+      (not data-equal))))
+
+
+(defun -get-libredirect-data ()
+  "Returns libredirect data from github"
+  (condition-case _
+      (let ((response (with-current-buffer
+			  (url-retrieve-synchronously libredirect-url nil nil 5)
+			(prog2
+			    (re-search-forward "\n\n" nil t) ; skip HTTP headers
+			    (buffer-substring-no-properties (point) (point-max))
+			  (kill-buffer)))))
+	(json-parse-string response :object-type 'alist :array-type 'list))
+    (t
+     (message "Worg couldn't access libredirect data, using original URLs.")
+     nil)))
+
+
+(defun -save-redirect-data (&optional data)
+  "Saves redirect data in a file with filename `redirect-save-filename'."
+  (let ((tosave (or data libredirect-data)))
+    (with-temp-file redirect-save-filename
+      (prin1
+       (-relevant-redirect-data data)
+       (current-buffer)))))
+
+
+(defun -relevant-redirect-data (data)
+  "Extracts relevant redirection data, defined in `free-alternatives'."
+  (seq-map (lambda (association)
+	     (let ((from (car association)))
+	       `(,from . ,(-first-link-of-json data from))))
+	   free-alternatives))
+
+
+(defun add-alternative-links (&optional _)
+  "Add alternative links for websites containing non-free JS.
+For each link that has an alternative (currently YouTube and GitHub), we
+insert a link to the free alternative, and change the link text of the
+original link to to =(original URL)=. We also redirect reddit links to
+old.reddit.com.
+
+See https://list.orgmode.org/orgmode/87pl9szmy6.fsf@localhost/ "
+  (unless libredirect-data
+    (setq libredirect-data (-get-libredirect-data)))
+
+  (let ((links (org-element-map (org-element-parse-buffer) 'link #'identity)))
+    (dolist (link (nreverse links))
+      (when (string-prefix-p "http" (org-element-property :type link))
+	(when-let ((new-url (-find-replacement (org-element-property :path link))))
+	  (let* ((begin (org-element-property :begin link))
+		 (contents-begin (org-element-property :contents-begin link))
+		 (contents-end (org-element-property :contents-end link)))
+	    (save-excursion
+	      (if (and contents-begin contents-end)
+		  ;; Link has description
+		  (let ((description (buffer-substring contents-begin contents-end)))
+		    ;; Change text of original link to (original URL)
+		    (goto-char contents-begin)
+		    (delete-region contents-begin contents-end)
+		    (insert "(original URL)")
+		    ;; Insert new link with free alternative
+		    (goto-char begin)
+		    (insert (format "[[%s][%s]] " new-url description)))
+		;; else, "bare" link without description.  We insert
+		;; the original link as bare, and add the old one
+		;; with (original URL) as description.
+		(let ((end (org-element-property :end link)))
+		  (goto-char end)
+		  (insert (format "][%s]]" "(original URL)"))
+		  (goto-char begin)
+		  (insert (format "[[%s]] [[" new-url)))))))))))
+
+(defun -find-replacement (path)
+  "Takes PATH and returns an alternative link if known and nil otherwise."
+  (and libredirect-data
+       (pcase path
+	 ;; See https://docs.invidious.io/redirector/
+	 ;; This matches both
+	 ;; - youtube.com/watch?v=<video-id>
+	 ;; - youtube.com/playlist?list=<playlist-id>
+	 ;; with optionally ? and = escaped
+	 ((rx "//" (? "www\.")
+	      "youtube.com"
+	      (group
+	       (or (and "/watch"
+			(or "?" "%3F")
+			"v"
+			(or "=" "%3D"))
+		   (and "/playlist"
+			(or "?" "%3F")
+			"list"
+			(or "=" "%3D"))))
+	      (group (+ not-newline)))
+	  (let* ((route (match-string 1 path))
+		 (id (match-string 2 path))
+		 (host (-first-link-of-json libredirect-data 'youtube)))
+	    (concat host route id)))
+	 ;; youtu.be/<video-id>
+	 ((rx  "//" (? "www\.")
+	       "youtu.be"
+	       "/"
+	       (group (+ not-newline)))
+	  (let* ((video-id (match-string 1 path))
+		 (host (-first-link-of-json libredirect-data 'youtube)))
+	    (concat host "/watch?v=" video-id)))
+	 ;; gothub supports all links currently used in Worg, except for
+	 ;; /<user>/<repo>/<issues>, so we exclude those.
+	 ((and
+	   (rx "//" (? "www\.")
+	       "github.com"
+	       "/"
+	       (group (+ not-newline)))
+	   ;; Exclude issues path. This approach currently also would
+	   ;; exclude a supported link if author or repo starts with
+	   ;; "issues"
+	   (guard (not (string-match-p "/issues" path))))
+	  (let* ((url (match-string 1 path))
+		 (host (-first-link-of-json libredirect-data 'github)))
+	    (concat host "/" url)))
+	 ((rx "//" "gist.github.com/"
+	      (group (+ not-newline)))
+	  (let* ((url (match-string 1 path))
+		 (host (-first-link-of-json libredirect-data 'github)))
+	    (concat host "/gist/" url)))
+	 ;; redirect reddit.com to old.reddit.com
+	 ((rx "//" (? "www\.")
+	      "reddit.com"
+	      "/"
+	      (group (+ not-newline)))
+	  (let* ((url (match-string 1 path))
+		 (host "https://old.reddit.com";))
+	    (concat host "/" url))))))
+
+(defun -first-link-of-json (data host)
+  "Gets the first available link of the alternative links for HOST.
+This is based on the json from
+https://codeberg.org/LibRedirect/instances/src/branch/main/data.json.";
+  (let* ((alternative (cdr (assoc host free-alternatives)))
+	 (object (cdr (assoc alternative libredirect-data)))
+	 (links (cdr (assoc 'clearnet object))))
+    (car links)))
+
+(provide 'rewrite-urls)
-- 
2.50.1

Reply via email to