Hi,
I wonder whether this material is already available at archive.org. If it
is, hosting it on Wikisource as well would be somewhat redundant.
Thanks,
GerardM
On Sat, 6 Dec 2025 at 12:11, <[email protected]> wrote:
> Hello everyone,
>
> I am Quinn (User:SuperGrey) from Chinese Wikisource (zh.wikisource.org).
> I am writing to request advice and precedent from the wider Wikisource
> community and the Wikimedia Foundation regarding a proposed large-scale
> import of Chinese court judgments from the national database known as China
> Judgments Online (中国裁判文书网, often abbreviated as CJO).
>
> I would like to begin with some background, because many non-Chinese
> Wikimedia contributors may not be aware of how significant CJO has been for
> judicial transparency in China and how sharply access to it has been
> reduced in recent years.
>
> China Judgments Online was launched in 2014 by the Supreme People’s Court
> (SPC) as a major transparency initiative. For nearly a decade, courts
> across the country uploaded tens of millions of decisions, creating what
> was widely regarded as one of the world’s largest publicly accessible
> judicial databases. At its peak, CJO hosted over 140 million documents and
> received tens of billions of page views. Researchers inside and outside
> China used the site extensively to study judicial behavior, local
> governance, criminal justice, and institutional changes.
>
> However, since around 2021, and especially in 2023–2024, the Chinese
> government has significantly reversed this openness. Multiple independent
> investigations and media reports have documented the systematic removal of
> previously public judgments, particularly those that reflect poorly on
> local authorities, expose procedural misconduct, involve politically
> sensitive issues, or contradict preferred political narratives. In late
> 2023, leaked SPC documents revealed instructions to migrate judgments into
> a new internal-only database accessible solely within the court system,
> while sharply reducing what remains publicly visible. Studies have shown
> that vast numbers of cases have already disappeared from public view. Major
> news organizations such as MIT Technology Review, Radio Free Asia, the
> South China Morning Post, and Reuters have all reported on this rollback of
> judicial transparency:
> – https://www.technologyreview.com/2023/12/20/1085741/china-judgements-online-transparency-government/
> – https://www.rfa.org/english/news/china/china-court-records-12142023132626.html
> – https://www.scmp.com/news/china/politics/article/3246067/china-cut-back-access-court-rulings-sparking-concerns-about-judicial-transparency
> – https://www.reuters.com/world/china/china-vows-judicial-disclosure-after-outcry-over-plan-curb-access-rulings-2024-01-22/
>
> For our purposes, the important point is this: CJO has removed or
> restricted access to large portions of its historical archive, including
> documents that were originally public, legally non-copyrightable under
> Chinese law, and crucial for understanding the functioning of China’s legal
> system. Many judgments that were once easily verifiable on the official
> site can no longer be checked against their original source. These
> documents are at risk of disappearing entirely from public access.
>
> An independent archiving project, caseopen.org, has preserved a large
> HTML snapshot of CJO’s judgments spanning 2013 to October 2024. The
> maintainers of caseopen.org have donated this dataset to Chinese
> Wikisource. The files capture the “online version” as it originally
> appeared on CJO, including formatting and errors, and therefore represent a
> unique opportunity to preserve a historical record of China’s legal system
> prior to this wave of censorship and delisting. In practical terms, this
> may be the last comprehensive public snapshot that will ever exist.
>
> On Chinese Wikisource, I have proposed importing this dataset through a
> bot (User:SuperGrey-bot). The local discussion, including technical details
> and code links, is here (in Chinese):
> https://zh.wikisource.org/wiki/Wikisource:机器人#User:SuperGrey-bot
>
> The scale of the corpus is extremely large: tens of millions of judgments,
> potentially more if we include non-judgment document types such as 裁定书
> (ruling document) and 通知书 (notification document). We are planning a staged
> import, beginning with small test batches, then individual months, and only
> later the full corpus, once the community settles questions about
> formatting, titling, metadata, and scope.
>
> Because this project involves politically sensitive material of unusual
> archival value, and because the scale is unprecedented for our
> language Wikisource, I would greatly appreciate advice and precedent from
> the international community. This is not only a technical or organizational
> task; it is also a preservation effort. We are attempting to safeguard
> public domain legal documents that have been systematically removed from
> public access. Wikisource may be one of the last neutral, open, global
> platforms capable of preserving this historical record.
>
> Given the potential size of the import, I would also appreciate input from
> the Wikimedia Foundation on any operational considerations. A
> multi-million-page import may affect storage, dumps, CirrusSearch indexing,
> and overall site performance. Before proceeding beyond small test batches,
> I would like to understand whether such an import is feasible within the
> current technical limits of Chinese Wikisource, and whether coordination
> with SRE or Cloud Services is recommended.
>
> Specifically, I would like to ask for input on the following areas:
>
> 1. Scope and suitability
> Have other Wikisources hosted similarly massive, uniform corpora of
> government or legal documents? How did you determine whether they fit the
> mission of Wikisource? Were there concerns about overwhelming the project
> or changing its character?
>
> 2. Verifiability and provenance
> In our case, the source is an independent mirror of a government website
> that is now selectively removing documents. While Wikimedia projects have
> long preserved public domain government documents after originals were
> taken down or censored, I am unsure how Wikisource communities have handled
> this scenario in practice. Are mirrored datasets acceptable when the
> original public source has been altered or removed? How should we document
> provenance and authenticity for future readers?
>
> 3. Organizational and technical considerations
> If we proceed, how should we structure this corpus so the project remains
> usable? Are there recommended practices for:
> – titling, metadata, and Wikidata integration for legal documents,
> – organizing millions of pages so they do not overwhelm categories and
> search,
> – mitigating strain on job queues, dumps, and indexing,
> – making future partial deletions or corrections feasible if political
> pressure or legal demands (e.g., DMCA takedown notices) ever arise?
>
> 4. Political and archival importance
> Wikisource has historically preserved documents at risk of censorship or
> disappearance, whether due to authoritarian restrictions or institutional
> neglect. Do other communities have experience with politically sensitive
> archival projects where the preservation value itself was a central
> motivation?
>
> At present, Chinese Wikisource is still deliberating basic formatting and
> policy questions. No large imports will be performed until a local
> consensus is clear. Although we are working from the independent
> caseopen.org snapshot rather than relying on ongoing availability of the
> official CJO site, the broader context is that public access to Chinese
> judicial decisions has already been substantially reduced in recent years.
> Because our dataset preserves a historical record that may not remain
> accessible through official channels, we believe this is an appropriate
> moment to seek broader input and learn from other Wikisource communities
> with similar archival experiences.
>
> Thank you very much for your time, advice, and any examples or concerns
> you can share. Even understanding which questions we should be asking would
> be extremely helpful.
>
> Best regards,
> Quinn Gao (User:SuperGrey)
> https://meta.wikimedia.org/wiki/User:SuperGrey
> _______________________________________________
> Wikisource-l mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
>