Ok, thanks. Based on the list you provided, I make it a total of 11.88 GB.
cat convert_sum.awk
{
    split($1, a, /[MG]/)
    val = a[1]
    # Take the unit from the last character of the field; split() on a
    # trailing M/G leaves a[2] empty, so reading a[2] would never match "G".
    unit = substr($1, length($1), 1)
    if (unit == "G") { val = val * 1024 }
    sum += val
}
END { printf("%.2f GB\n", sum / 1024) }

awk -f convert_sum.awk size.txt
11.88 GB

Mich Talebzadeh,
Architect | Data Engineer | Data Science | Financial Crime
PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial College London <https://en.wikipedia.org/wiki/Imperial_College_London>
London, United Kingdom

view my LinkedIn profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* The information provided is correct to the best of my knowledge but of course cannot be guaranteed. It is essential to note that, as with any advice, "one test result is worth one-thousand expert opinions" (Wernher von Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>).

On Mon, 12 Aug 2024 at 23:33, Sean Owen <sro...@gmail.com> wrote:

> He did already; see the preceding thread here on dev@.
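As a cross-check of the awk total above, the same human-readable-size summation can be sketched with coreutils `numfmt`, which understands the M/G suffixes natively (the function name and the `size.txt` line format, `<size> <dir>` as produced by `du -sh`, are assumptions):

```shell
# Sum "du -sh"-style sizes read from stdin and print the total in GiB.
# numfmt --from=iec converts suffixed values (9.9M, 1.1G, ...) to bytes.
sum_docs_sizes() {
    awk '{print $1}' \
        | numfmt --from=iec \
        | awk '{sum += $1} END {printf "%.2f GB\n", sum / 1024^3}'
}
```

Usage would be `sum_docs_sizes < size.txt`, avoiding the manual unit-branching logic of the awk script.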
>
> You can figure the size that moves out of the repo from the docs sizes:
>
> 9.9M ./0.6.0
> 10M ./0.6.1
> 10M ./0.6.2
> 15M ./0.7.0
> 16M ./0.7.2
> 16M ./0.7.3
> 20M ./0.8.0
> 20M ./0.8.1
> 38M ./0.9.0
> 38M ./0.9.1
> 38M ./0.9.2
> 36M ./1.0.0
> 38M ./1.0.1
> 38M ./1.0.2
> 48M ./1.1.0
> 48M ./1.1.1
> 73M ./1.2.0
> 73M ./1.2.1
> 74M ./1.2.2
> 69M ./1.3.0
> 73M ./1.3.1
> 68M ./1.4.0
> 70M ./1.4.1
> 80M ./1.5.0
> 78M ./1.5.1
> 78M ./1.5.2
> 87M ./1.6.0
> 87M ./1.6.1
> 87M ./1.6.2
> 86M ./1.6.3
> 117M ./2.0.0
> 119M ./2.0.0-preview
> 118M ./2.0.1
> 118M ./2.0.2
> 121M ./2.1.0
> 121M ./2.1.1
> 122M ./2.1.2
> 122M ./2.1.3
> 130M ./2.2.0
> 131M ./2.2.1
> 132M ./2.2.2
> 131M ./2.2.3
> 141M ./2.3.0
> 141M ./2.3.1
> 141M ./2.3.2
> 142M ./2.3.3
> 142M ./2.3.4
> 145M ./2.4.0
> 146M ./2.4.1
> 145M ./2.4.2
> 144M ./2.4.3
> 145M ./2.4.4
> 143M ./2.4.5
> 143M ./2.4.6
> 143M ./2.4.7
> 143M ./2.4.8
> 197M ./3.0.0
> 185M ./3.0.0-preview
> 197M ./3.0.0-preview2
> 198M ./3.0.1
> 198M ./3.0.2
> 205M ./3.0.3
> 239M ./3.1.1
> 239M ./3.1.2
> 239M ./3.1.3
> 840M ./3.2.0
> 842M ./3.2.1
> 282M ./3.2.2
> 244M ./3.2.3
> 282M ./3.2.4
> 295M ./3.3.0
> 297M ./3.3.1
> 297M ./3.3.2
> 297M ./3.3.3
> 297M ./3.3.4
> 314M ./3.4.0
> 314M ./3.4.1
> 328M ./3.4.2
> 324M ./3.4.3
> 1.1G ./3.5.0
> 1.2G ./3.5.1
> 1.1G ./4.0.0-preview1
>
> On Mon, Aug 12, 2024 at 5:16 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Hi Kent,
>>
>> Can you, if possible, provide a heuristic estimate of the space reduction your proposal is going to achieve?
>>
>> Thanks
>>
>> Mich Talebzadeh
>>
>> On Mon, 12 Aug 2024 at 14:55, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> On the face of it, this email contains many references, making it difficult to follow. Perhaps a simpler explanation could improve voting participation.
>>>
>>> The STAR methodology (Situation, Task, Action, Result) can help in understanding and evaluating this proposal. Let us have a look at it.
>>>
>>> *S*ituation:
>>>
>>> - The Spark website repository is reaching its storage limit on GitHub-hosted runners.
>>>
>>> *T*ask:
>>>
>>> - Reduce storage usage without compromising access to documentation.
>>>
>>> *A*ction (proposed):
>>>
>>> - Move documentation releases from the dev directory to the release directory within the Apache distribution.
>>> - Leverage the Apache Archives service to create permanent links for the documentation.
>>> - Upload older website-hosted documentation manually via SVN.
>>> - Optionally, delete old documentation and update links/use redirection as needed.
>>>
>>> *R*esult:
>>>
>>> - Reduced storage usage on GitHub-hosted runners.
>>> - Permanent, publicly accessible links for Spark documentation via the Apache Archives.
>>> - Potential need for manual upload of older documentation and link updates.
>>>
>>> Consider including an estimate of the storage reduction this approach would achieve.
>>>
>>> Overall, the proposal offers a viable solution for managing Spark documentation while reducing storage concerns. However, addressing the potential complexity of managing older documentation versions is crucial.
>>>
>>> +1 from me
>>>
>>> Mich Talebzadeh
>>>
>>> On Mon, 12 Aug 2024 at 10:09, Kent Yao <y...@apache.org> wrote:
>>>
>>>> Archive Spark Documentations in Apache Archives
>>>>
>>>> Hi dev,
>>>>
>>>> To address the issue of the Spark website repository size reaching the storage limit for GitHub-hosted runners [1], I suggest enhancing step [2] in our release process by relocating the documentation releases from the dev directory [3] to the release directory [4].
>>>> The docs would then be captured by the Apache Archives service [5], which creates permanent links that can serve as alternative endpoints for our documentation, e.g.
>>>>
>>>> https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc5-docs/_site/index.html
>>>> for
>>>> https://spark.apache.org/docs/3.5.2/index.html
>>>>
>>>> Note that the previous example still uses the staging repository; the link will become
>>>> https://archive.apache.org/dist/spark/docs/3.5.2/index.html.
>>>>
>>>> For older releases hosted on the Spark website [6], we also need to upload them manually via SVN.
>>>>
>>>> After that, when we reach the threshold again, we can delete some of the old ones on page [6] and update their links on page [7] or use redirection.
>>>>
>>>> JIRA ticket: https://issues.apache.org/jira/browse/SPARK-49209
>>>>
>>>> Please vote on the idea of "Archive Spark Documentations in Apache Archives" for the next 72 hours:
>>>>
>>>> [ ] +1: Accept the proposal
>>>> [ ] +0
>>>> [ ] -1: I don't think this is a good idea because ...
>>>>
>>>> Bests,
>>>> Kent Yao
>>>>
>>>> [1] https://lists.apache.org/thread/o0w4gqoks23xztdmjjj26jkp1yyg2bvq
>>>> [2] https://spark.apache.org/release-process.html#upload-to-apache-release-directory
>>>> [3] https://dist.apache.org/repos/dist/dev/spark/v3.5.2-rc5-docs/
>>>> [4] https://dist.apache.org/repos/dist/release/spark/docs/3.5.2
>>>> [5] https://archive.apache.org/dist/spark/
>>>> [6] https://github.com/apache/spark-website/tree/asf-site/site/docs
>>>> [7] https://spark.apache.org/documentation.html
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
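For readers unfamiliar with the SVN-backed dist tree, the relocation step in the proposal might look roughly like the sketch below. The source/destination URLs follow links [3] and [4] above; the function name, the commit message, and the echo-based dry run are illustrative assumptions, not a documented procedure:

```shell
# Hypothetical sketch of moving one release's docs from the dev tree to the
# release tree, where the Apache Archives service [5] would pick them up.
# Dry-run by default: the command is printed, not executed; drop the echo
# (and authenticate to dist.apache.org) to actually perform the move.
archive_docs() {
    ver="$1"   # e.g. 3.5.2
    rc="$2"    # e.g. rc5
    src="https://dist.apache.org/repos/dist/dev/spark/v${ver}-${rc}-docs"
    dst="https://dist.apache.org/repos/dist/release/spark/docs/${ver}"
    echo svn move -m "Archive Spark ${ver} docs" "$src" "$dst"
}
```

A server-side `svn move` like this copies history cheaply and needs no local checkout, which is why it fits the release-process step [2] better than a download-and-re-upload cycle.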