[
https://issues.apache.org/jira/browse/CAMEL-23781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18089658#comment-18089658
]
ASF GitHub Bot commented on CAMEL-23781:
----------------------------------------
k-krawczyk commented on PR #1666:
URL: https://github.com/apache/camel-website/pull/1666#issuecomment-4730733995
@davsclaus good call — I measured it against the live site rather than
guessing:
- ~5,570 doc pages in the sitemap (already spanning multiple versions:
`next`, `4.18.x`, `4.14.x`, …)
- average `.md` ~40–86 KB (component pages with big option tables are the
large ones)
- real zip ratio measured on a 12-page sample: **~26.5%**
So uncompressed is ~250–480 MB (that's the ~500 MB you expected), and the
**`.zip` lands around ~70–130 MB**. Plain `zip` compresses each file
independently, so the cross-version duplication doesn't shrink it much — a
solid `.tar.gz` would be smaller. If the on-disk build keeps more versions than
the sitemap exposes, it'd scale up proportionally.
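
For anyone who wants to re-run the arithmetic, here is a minimal sketch that uses only the figures quoted above. The function and variable names are illustrative, and decimal units (1 MB = 1000 KB) are an assumption:

```python
# Back-of-the-envelope size estimate from the figures quoted in this comment.
# Names are illustrative; decimal units (1 MB = 1000 KB) are an assumption.
PAGES = 5570              # doc pages counted in the sitemap
AVG_KB_RANGE = (40, 86)   # average .md size range, in KB
ZIP_RATIO = 0.265         # compressed / uncompressed, from the 12-page sample


def estimate_mb(pages, avg_kb, ratio):
    """Return (uncompressed_mb, zipped_mb) for one average page size."""
    uncompressed_mb = pages * avg_kb / 1000
    return uncompressed_mb, uncompressed_mb * ratio


for avg_kb in AVG_KB_RANGE:
    raw, zipped = estimate_mb(PAGES, avg_kb, ZIP_RATIO)
    print(f"avg {avg_kb} KB/page: ~{raw:.0f} MB raw, ~{zipped:.0f} MB zipped")
```

With these exact inputs the low end comes out somewhat below the rounded figures above, so the range should be read as approximate. The ratio also comes from a small sample, which adds further uncertainty.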
So I agree it's too big to commit into `public/` and redeploy on every
change. Options:
1. Build it only on release (or a scheduled job) and publish it as a
**GitHub Release asset**, with `llms.txt` pointing at that URL. The project
already consumes release binaries via the `github-release-binary` yarn plugin,
so this fits the existing distribution model.
2. Ship `.tar.gz` instead of `.zip` to roughly halve the size.
3. Split into smaller per-area bundles.
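
To illustrate option 2 and the point about per-file compression, here is a toy sketch using only the Python standard library. The data is synthetic and hypothetical: three small pages copied unchanged into three version folders, not real site content.

```python
import io
import random
import tarfile
import zipfile

# Hypothetical toy data: 3 small pages, duplicated unchanged across 3 versions.
rng = random.Random(0)
pages = {
    f"page{i}.md": bytes(rng.getrandbits(8) for _ in range(2048)).hex().encode()
    for i in range(3)
}
VERSIONS = ["next", "4.18.x", "4.14.x"]
files = [(f"{v}/{name}", data) for v in VERSIONS for name, data in pages.items()]


def build_zip(entries):
    """Zip archive: every member is deflated on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in entries:
            zf.writestr(name, data)
    return buf.getvalue()


def build_tar_gz(entries):
    """Solid archive: one gzip stream over the whole tar."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in entries:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


zip_size = len(build_zip(files))
tgz_size = len(build_tar_gz(files))
print(f"zip: {zip_size} bytes, tar.gz: {tgz_size} bytes")
```

One caveat: gzip uses DEFLATE, which only looks back 32 KB. The toy pages are deliberately small so that repeated copies fall inside that window. With real pages of 40–86 KB, and depending on how files are ordered in the tar, the saving may be considerably smaller than this toy suggests, so it is worth measuring on the actual build.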
I'm happy to rework this PR towards (1). Which distribution mechanism do you
prefer?
_Reported by Claude Code on behalf of Karol Krawczyk_
> camel-website - Offline zip for offline coding agents
> -----------------------------------------------------
>
> Key: CAMEL-23781
> URL: https://issues.apache.org/jira/browse/CAMEL-23781
> Project: Camel
> Issue Type: New Feature
> Components: camel-ai, website
> Reporter: Claus Ibsen
> Assignee: Karol Krawczyk
> Priority: Major
> Fix For: 4.x
>
>
> [https://github.com/apache/camel/pull/24063]
> Companies may have restricted their AI coding agents from accessing the
> internet, or allow only controlled access. But even with controlled access it may
> take time for a company to add camel.apache.org to its allow list.
> Maybe we can have an offline website .zip for AIs that has the website
> structure and only the .md files that coding agents need. An agent can then source
> the information from there by unzipping this file on the local disk in /tmp.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)