On 4/26/23, David Christensen <dpchr...@holgerdanske.com> wrote:
> I suggest hashing the document content rather than the URL. This would
> work nicely for static documents.

What do you mean by "hashing the document content"? How would that help when what you are trying to do is cleanse and canonicalize texts as best you can, in order to find relationships among their text segments?

lbrtchx
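
For reference, one possible reading of "hashing the document content" is sketched below. This is only an illustration under assumptions, not necessarily what was meant in the quoted message: the helper names (canonicalize, content_hash) are hypothetical, and the choice of NFKC normalization, case folding, whitespace collapsing and SHA-256 is arbitrary.

```python
import hashlib
import re
import unicodedata


def canonicalize(text: str) -> str:
    """Hypothetical cleanup step: normalize Unicode, fold case, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = text.casefold()
    return re.sub(r"\s+", " ", text).strip()


def content_hash(text: str) -> str:
    """Hash the canonicalized text itself, independent of where it was fetched from."""
    return hashlib.sha256(canonicalize(text).encode("utf-8")).hexdigest()


# The same text fetched from two different URLs yields the same key,
# which is the point of hashing content rather than the URL.
doc_a = "Hello   World\n"   # e.g. fetched from one URL
doc_b = "hello world"       # e.g. fetched from a mirror
same_document = content_hash(doc_a) == content_hash(doc_b)
```

A hash like this only answers "are these two texts identical after cleanup?". It says nothing about partial overlap, so by itself it would not find relationships among text segments; at most it could serve as a deduplication key before that analysis.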