I agree with 'archiving', but what does that mean? Deleting the docs from
the repo and the site?
While I really doubt people are looking for docs for, say, 0.5.0, it'd be a
big jump to remove them entirely.

What if we made a compressed tarball of the old docs, put that in the repo,
linked to it, and removed the docs files for many old releases?
The content would still be in the repo, and in the container when the docs
are built, but compressed it would be much smaller.
That could buy a significant amount of time.
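
A minimal sketch of what I have in mind, in Python. It assumes the docs
sit in one directory per version (as in the listing quoted below) and
borrows the "keep only the latest version of each feature release" rule
from Kent's mail. The names split_versions and archive_docs are
hypothetical, xz is just one choice of compression, and the real layout
and retention policy would need to be agreed on first.

```python
import re
import tarfile
from collections import defaultdict
from pathlib import Path

# Matches plain release directories such as "3.4.1"; previews are skipped.
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def split_versions(names):
    """Return (keep, archive), keeping the newest patch of each feature release."""
    by_feature = defaultdict(list)
    for name in names:
        match = VERSION_RE.match(name)
        if match:
            major, minor, patch = map(int, match.groups())
            by_feature[(major, minor)].append((patch, name))
    keep, archive = [], []
    for feature in sorted(by_feature):
        patches = sorted(by_feature[feature])
        keep.append(patches[-1][1])
        archive.extend(name for _, name in patches[:-1])
    return keep, archive


def archive_docs(docs_root, versions, tarball):
    """Write the given version directories into one xz-compressed tarball."""
    docs_root = Path(docs_root)
    with tarfile.open(str(tarball), "w:xz") as tar:
        for version in versions:
            # arcname keeps paths inside the tarball relative, e.g. "3.4.0/...".
            tar.add(str(docs_root / version), arcname=version)
```

The sketch deliberately does not delete anything. Removing the archived
directories would stay a separate, reviewable step (for example a
git rm in its own PR) once the tarball has been checked.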

On Thu, Aug 8, 2024 at 7:06 AM Kent Yao <y...@apache.org> wrote:

> Hi dev,
>
> The current size of the spark-website repository is approximately 16GB,
> exceeding the storage limit of GitHub-hosted runners [1]. The GitHub
> Actions runs have been failing recently in the actions/checkout step
> with 'No space left on device' errors.
>
> Filesystem      Size  Used Avail Use% Mounted on
> overlay          73G   58G   16G  80% /
> tmpfs            64M     0   64M   0% /dev
> tmpfs           7.9G     0  7.9G   0% /sys/fs/cgroup
> shm              64M     0   64M   0% /dev/shm
> /dev/root        73G   58G   16G  80% /__w
> tmpfs           1.6G  1.2M  1.6G   1% /run/docker.sock
> tmpfs           7.9G     0  7.9G   0% /proc/acpi
> tmpfs           7.9G     0  7.9G   0% /proc/scsi
> tmpfs           7.9G     0  7.9G   0% /sys/firmware
>
>
> The documentation for each version accounts for most of the volume.
> Since version 3.5.0, the documentation has grown to 3-4 times the size
> of 3.4.x, at more than 1GB per version.
>
>
> 9.9M ./0.6.0
>  10M ./0.6.1
>  10M ./0.6.2
>  15M ./0.7.0
>  16M ./0.7.2
>  16M ./0.7.3
>  20M ./0.8.0
>  20M ./0.8.1
>  38M ./0.9.0
>  38M ./0.9.1
>  38M ./0.9.2
>  36M ./1.0.0
>  38M ./1.0.1
>  38M ./1.0.2
>  48M ./1.1.0
>  48M ./1.1.1
>  73M ./1.2.0
>  73M ./1.2.1
>  74M ./1.2.2
>  69M ./1.3.0
>  73M ./1.3.1
>  68M ./1.4.0
>  70M ./1.4.1
>  80M ./1.5.0
>  78M ./1.5.1
>  78M ./1.5.2
>  87M ./1.6.0
>  87M ./1.6.1
>  87M ./1.6.2
>  86M ./1.6.3
> 117M ./2.0.0
> 119M ./2.0.0-preview
> 118M ./2.0.1
> 118M ./2.0.2
> 121M ./2.1.0
> 121M ./2.1.1
> 122M ./2.1.2
> 122M ./2.1.3
> 130M ./2.2.0
> 131M ./2.2.1
> 132M ./2.2.2
> 131M ./2.2.3
> 141M ./2.3.0
> 141M ./2.3.1
> 141M ./2.3.2
> 142M ./2.3.3
> 142M ./2.3.4
> 145M ./2.4.0
> 146M ./2.4.1
> 145M ./2.4.2
> 144M ./2.4.3
> 145M ./2.4.4
> 143M ./2.4.5
> 143M ./2.4.6
> 143M ./2.4.7
> 143M ./2.4.8
> 197M ./3.0.0
> 185M ./3.0.0-preview
> 197M ./3.0.0-preview2
> 198M ./3.0.1
> 198M ./3.0.2
> 205M ./3.0.3
> 239M ./3.1.1
> 239M ./3.1.2
> 239M ./3.1.3
> 840M ./3.2.0
> 842M ./3.2.1
> 282M ./3.2.2
> 244M ./3.2.3
> 282M ./3.2.4
> 295M ./3.3.0
> 297M ./3.3.1
> 297M ./3.3.2
> 297M ./3.3.3
> 297M ./3.3.4
> 314M ./3.4.0
> 314M ./3.4.1
> 328M ./3.4.2
> 324M ./3.4.3
> 1.1G ./3.5.0
> 1.2G ./3.5.1
> 1.1G ./4.0.0-preview1
>
> I'm concerned about publishing the documentation for version 3.5.2
> to the asf-site. So, I have merged PR[2] to eliminate this potential
> blocker.
>
> Considering that the problem still exists, should we temporarily archive
> some of the outdated version documents? For example, only keep
> the latest version for each feature release in the asf-site branch. Or
> do you have any other suggestions?
>
>
> Bests,
> Kent Yao
>
>
> [1]
> https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories
> [2] https://github.com/apache/spark-website/pull/543
