[ 
https://issues.apache.org/jira/browse/HIVE-28666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17905158#comment-17905158
 ] 

Stamatis Zampetakis commented on HIVE-28666:
--------------------------------------------

The first step is to put all content that is under the wiki into 
https://github.com/apache/hive-site. Confluence [allows 
exporting|https://cwiki.apache.org/confluence/spaces/exportspacewelcome.action?key=Hive]
 all pages in HTML, XML, or PDF format. There are plugins that allow exporting 
the pages to Markdown, but they are not free (see INFRA-22945), so it is not 
possible to install them.

Although we could store the wiki pages as HTML in our website, the boilerplate 
code that comes with this format would make reviews of new content very noisy. 
For this reason, I created a small Python script ([^html_to_markdown.py]) 
based on markdownify and BeautifulSoup that does a decent job of translating 
HTML to Markdown.
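
The attached script is not reproduced in this message. Purely as an illustration of the idea (drop the page boilerplate, keep the content, emit Markdown), here is a minimal sketch that uses only the Python standard library (html.parser) in place of markdownify and BeautifulSoup. The names MiniMarkdown and html_to_markdown are hypothetical, only a handful of tags are handled, and nothing here is taken from html_to_markdown.py itself.

```python
import re
from html.parser import HTMLParser


class MiniMarkdown(HTMLParser):
    """Convert a small subset of HTML (h1-h6, p, ul/ol, li, a) to Markdown.

    Hypothetical illustration only; not the attached html_to_markdown.py.
    """

    SKIPPED = {"head", "script", "style"}  # boilerplate whose text is dropped
    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.out = []        # pieces of Markdown collected so far
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.href = None     # target of the currently open <a>, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in self.HEADINGS:
            # <h2> becomes "## ", <h3> becomes "### ", and so on.
            self.out.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.out.append("\n\n")
        elif tag in ("ul", "ol"):
            self.out.append("\n")
        elif tag == "li":
            self.out.append("\n* ")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a" and not self.skip_depth:
            self.out.append("](%s)" % (self.href or ""))
            self.href = None

    def handle_data(self, data):
        if not self.skip_depth:
            # Collapse runs of whitespace, as a browser would when rendering.
            self.out.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html):
    """Return a Markdown rendering of the supported tags found in html."""
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.out).split("\n")]
    # Never leave more than one blank line between blocks.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


if __name__ == "__main__":
    sample = (
        "<html><head><title>Ignored</title></head><body>"
        "<h1>Hive</h1>"
        '<p>See the <a href="https://hive.apache.org/">website</a>.</p>'
        "</body></html>"
    )
    print(html_to_markdown(sample))
```

A real conversion of the exported wiki pages needs far more than this (tables, code blocks, images, nested lists), which is why the actual script relies on dedicated libraries rather than a hand-written parser.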

> Migrate documentation from the wiki to the website
> --------------------------------------------------
>
>                 Key: HIVE-28666
>                 URL: https://issues.apache.org/jira/browse/HIVE-28666
>             Project: Hive
>          Issue Type: Task
>      Security Level: Public(Viewable by anyone) 
>          Components: Website
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>         Attachments: html_to_markdown.py
>
>
> Currently all documentation is hosted and maintained in the Confluence wiki 
> (https://cwiki.apache.org/confluence/display/Hive/Home). The wiki has certain 
> drawbacks that are not easy to circumvent. 
> 1. Contributions are cumbersome. New contributors have to request a wiki 
> account from INFRA and then the PMC must give additional karma to the user to 
> be able to modify the space.
> 2. Reviews are difficult. There is no built-in feature in Confluence that 
> allows reviewing changes before updating the content of the space.
> 3. History is hard to track. Although versioning is supported at page level, 
> finding who modified a part of the page, and when, is not straightforward. 
> Moreover, when pages get moved, deleted, etc., it's very hard or impossible 
> to track what happened. 
> 4. Limited access control. Any user with the basic permissions that are 
> usually given on-demand can modify any part of the space without anyone 
> realizing.
> The above shortcomings can be alleviated by putting the documentation under 
> the website (https://hive.apache.org/), which is under version control (git).
> Shortcomings of Confluence have come up several times in discussions on the 
> dev list:
> * https://lists.apache.org/thread/58zhfdklq485c6942fj0lmpzmh8o9fch
> * https://lists.apache.org/thread/jcck8tdod3hyzf5wjzxzn075xn79st4h



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
