Ok, thanks. Based on the list you provided, I make it a total of 11.88 GB.
cat convert_sum.awk
{
    split($1, a, /[MG]/)
    val = a[1]
    # Take the unit from the last character of the field; split() on a
    # trailing M/G leaves a[2] empty, so reading a[2] would never match "G".
    unit = substr($1, length($1), 1)
    if (unit == "G") { val = val * 1024 }
    sum += val
}
END { printf("%.2f GB\n", sum / 1024) }

awk -f convert_sum.awk size.txt
11.88 GB

Mich Talebzadeh,
Architect | Data Engineer | Data Science | Financial Crime
PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial College London <https://en.wikipedia.org/wiki/Imperial_College_London>
London, United Kingdom

view my LinkedIn profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* The information provided is correct to the best of my knowledge but of course cannot be guaranteed. It is essential to note that, as with any advice, "one test result is worth one-thousand expert opinions" (Wernher von Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>).

On Mon, 12 Aug 2024 at 23:33, Sean Owen <sro...@gmail.com> wrote:

> He did already; see the preceding thread here on dev@.
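As a cross-check of the awk total above, the same human-readable-size summation can be sketched with coreutils `numfmt`, which understands the M/G suffixes natively (the function name and the `size.txt` line format, `<size> <dir>` as produced by `du -sh`, are assumptions):

```shell
# Sum "du -sh"-style sizes read from stdin and print the total in GiB.
# numfmt --from=iec converts suffixed values (9.9M, 1.1G, ...) to bytes.
sum_docs_sizes() {
    awk '{print $1}' \
        | numfmt --from=iec \
        | awk '{sum += $1} END {printf "%.2f GB\n", sum / 1024^3}'
}
```

Usage would be `sum_docs_sizes < size.txt`, avoiding the manual unit-branching logic of the awk script.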
>
> You can figure the size that moves out of the repo from the docs sizes:
>
> 9.9M ./0.6.0
> 10M ./0.6.1
> 10M ./0.6.2
> 15M ./0.7.0
> 16M ./0.7.2
> 16M ./0.7.3
> 20M ./0.8.0
> 20M ./0.8.1
> 38M ./0.9.0
> 38M ./0.9.1
> 38M ./0.9.2
> 36M ./1.0.0
> 38M ./1.0.1
> 38M ./1.0.2
> 48M ./1.1.0
> 48M ./1.1.1
> 73M ./1.2.0
> 73M ./1.2.1
> 74M ./1.2.2
> 69M ./1.3.0
> 73M ./1.3.1
> 68M ./1.4.0
> 70M ./1.4.1
> 80M ./1.5.0
> 78M ./1.5.1
> 78M ./1.5.2
> 87M ./1.6.0
> 87M ./1.6.1
> 87M ./1.6.2
> 86M ./1.6.3
> 117M ./2.0.0
> 119M ./2.0.0-preview
> 118M ./2.0.1
> 118M ./2.0.2
> 121M ./2.1.0
> 121M ./2.1.1
> 122M ./2.1.2
> 122M ./2.1.3
> 130M ./2.2.0
> 131M ./2.2.1
> 132M ./2.2.2
> 131M ./2.2.3
> 141M ./2.3.0
> 141M ./2.3.1
> 141M ./2.3.2
> 142M ./2.3.3
> 142M ./2.3.4
> 145M ./2.4.0
> 146M ./2.4.1
> 145M ./2.4.2
> 144M ./2.4.3
> 145M ./2.4.4
> 143M ./2.4.5
> 143M ./2.4.6
> 143M ./2.4.7
> 143M ./2.4.8
> 197M ./3.0.0
> 185M ./3.0.0-preview
> 197M ./3.0.0-preview2
> 198M ./3.0.1
> 198M ./3.0.2
> 205M ./3.0.3
> 239M ./3.1.1
> 239M ./3.1.2
> 239M ./3.1.3
> 840M ./3.2.0
> 842M ./3.2.1
> 282M ./3.2.2
> 244M ./3.2.3
> 282M ./3.2.4
> 295M ./3.3.0
> 297M ./3.3.1
> 297M ./3.3.2
> 297M ./3.3.3
> 297M ./3.3.4
> 314M ./3.4.0
> 314M ./3.4.1
> 328M ./3.4.2
> 324M ./3.4.3
> 1.1G ./3.5.0
> 1.2G ./3.5.1
> 1.1G ./4.0.0-preview1
>
> On Mon, Aug 12, 2024 at 5:16 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Hi Kent,
>>
>> Can you, if possible, provide a heuristic estimate of the space reduction your proposal is going to achieve?
>>
>> Thanks
>>
>> Mich Talebzadeh
>>
>> On Mon, 12 Aug 2024 at 14:55, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> On the face of it, this email contains many references, making it difficult to follow. Perhaps a simpler explanation could improve voting participation.
>>>
>>> The STAR methodology (Situation, Task, Action, Result) can help in understanding and evaluating this proposal. Let us have a look at it.
>>>
>>> *S*ituation:
>>>
>>> - The Spark website repository is reaching its storage limit on GitHub-hosted runners.
>>>
>>> *T*ask:
>>>
>>> - Reduce storage usage without compromising access to documentation.
>>>
>>> *A*ction (proposed):
>>>
>>> - Move documentation releases from the dev directory to the release directory within the Apache distribution.
>>> - Leverage the Apache Archives service to create permanent links for the documentation.
>>> - Upload older website-hosted documentation manually via SVN.
>>> - Optionally, delete old documentation and update links/use redirection as needed.
>>>
>>> *R*esult:
>>>
>>> - Reduced storage usage on GitHub-hosted runners.
>>> - Permanent, publicly accessible links for Spark documentation via the Apache Archives.
>>> - Potential need for manual upload of older documentation and link updates.
>>>
>>> Consider including an estimate of the storage reduction this approach would achieve.
>>>
>>> Overall, the proposal offers a viable solution for managing Spark documentation while reducing storage concerns. However, addressing the potential complexity of managing older documentation versions is crucial.
>>>
>>> +1 from me
>>>
>>> Mich Talebzadeh
>>>
>>> On Mon, 12 Aug 2024 at 10:09, Kent Yao <y...@apache.org> wrote:
>>>
>>>> Archive Spark Documentations in Apache Archives
>>>>
>>>> Hi dev,
>>>>
>>>> To address the issue of the Spark website repository size reaching the storage limit for GitHub-hosted runners [1], I suggest enhancing step [2] in our release process by relocating the documentation releases from the dev directory [3] to the release directory [4].
>>>> The docs would then be captured by the Apache Archives service [5], which creates permanent links that can serve as alternative endpoints for our documentation, e.g.
>>>>
>>>> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc5-docs/_site/index.html
>>>> for
>>>> https://spark.apache.org/docs/3.5.2/index.html
>>>>
>>>> Note that the previous example still uses the staging repository; the link will become
>>>> https://archive.apache.org/dist/spark/docs/3.5.2/index.html.
>>>>
>>>> For older releases hosted on the Spark website [6], we also need to upload them manually via SVN.
>>>>
>>>> After that, when we reach the threshold again, we can delete some of the old ones on page [6] and update their links on page [7] or use redirection.
>>>>
>>>> JIRA ticket: https://issues.apache.org/jira/browse/SPARK-49209
>>>>
>>>> Please vote on the idea of "Archive Spark Documentations in Apache Archives" for the next 72 hours:
>>>>
>>>> [ ] +1: Accept the proposal
>>>> [ ] +0
>>>> [ ] -1: I don't think this is a good idea because ...
>>>>
>>>> Bests,
>>>> Kent Yao
>>>>
>>>> [1] https://lists.apache.org/thread/o0w4gqoks23xztdmjjj26jkp1yyg2bvq
>>>> [2] https://spark.apache.org/release-process.html#upload-to-apache-release-directory
>>>> [3] https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc5-docs/
>>>> [4] https://dist.apache.org/repos/dist/release/spark/docs/3.5.2
>>>> [5] https://archive.apache.org/dist/spark/
>>>> [6] https://github.com/apache/spark-website/tree/asf-site/site/docs
>>>> [7] https://spark.apache.org/documentation.html
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
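For readers unfamiliar with the SVN-backed dist tree, the relocation step in the proposal might look roughly like the sketch below. The source/destination URLs follow links [3] and [4] above; the function name, the commit message, and the echo-based dry run are illustrative assumptions, not a documented procedure:

```shell
# Hypothetical sketch of moving one release's docs from the dev tree to the
# release tree, where the Apache Archives service [5] would pick them up.
# Dry-run by default: the command is printed, not executed; drop the echo
# (and authenticate to dist.apache.org) to actually perform the move.
archive_docs() {
    ver="$1"   # e.g. 3.5.2
    rc="$2"    # e.g. rc5
    src="https://dist.apache.org/repos/dist/dev/spark/v${ver}-${rc}-docs"
    dst="https://dist.apache.org/repos/dist/release/spark/docs/${ver}"
    echo svn move -m "Archive Spark ${ver} docs" "$src" "$dst"
}
```

A server-side `svn move` like this copies history cheaply and needs no local checkout, which is why it fits the release-process step [2] better than a download-and-re-upload cycle.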