[ 
https://issues.apache.org/jira/browse/COMDEV-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijing Lu updated COMDEV-511:
------------------------------
    Description: 
*Apache Doris*
Apache Doris is a real-time analytical database based on MPP architecture. As a 
unified platform that supports multiple data processing scenarios, it ensures 
high performance for low-latency and high-throughput queries, allows for easy 
federated queries on data lakes, and supports various data ingestion methods.
{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]

{*}Github{*}: [https://github.com/apache/doris]
h3. *Background*

In Apache Doris, dictionary encoding is performed during data writing and 
compaction, and it is applied to string data types by default. The dictionary 
of a column within one segment is 1M at most. Dictionary encoding accelerates 
queries on string columns, for example by converting the strings into INT 
values.
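
As a rough illustration of the idea (a minimal Python sketch, not the Apache Doris implementation; the function names {{dict_encode}} and {{filter_equals}} are hypothetical), each distinct string is assigned an integer code, and an equality filter then compares integers row by row instead of strings:

```python
from typing import Dict, List, Tuple


def dict_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Return (dictionary, codes): each distinct string gets one integer code."""
    dictionary: List[str] = []
    index: Dict[str, int] = {}
    codes: List[int] = []
    for v in values:
        code = index.get(v)
        if code is None:
            # First time this string is seen: append it to the dictionary.
            code = len(dictionary)
            index[v] = code
            dictionary.append(v)
        codes.append(code)
    return dictionary, codes


def filter_equals(dictionary: List[str], codes: List[int], needle: str) -> List[int]:
    """Return the row positions where the column equals `needle`.

    The string lookup happens once against the dictionary; the rows are then
    matched by integer comparison only.
    """
    try:
        target = dictionary.index(needle)
    except ValueError:
        return []  # needle is not in the dictionary, so no row can match
    return [row for row, code in enumerate(codes) if code == target]


city = ["Beijing", "Shanghai", "Beijing", "Shenzhen", "Beijing"]
dictionary, codes = dict_encode(city)
matches = filter_equals(dictionary, codes, "Beijing")
```

Here the per-row work is a comparison of small integers, which is the kind of saving the paragraph above refers to; how Doris actually stores and evaluates the codes is part of what Phase One is meant to explore.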
 
h3. *Task*
 * Phase One: Get familiar with the implementation of Apache Doris dictionary 
encoding, and learn how it accelerates queries.
 * Phase Two: Evaluate the effectiveness of full dictionary encoding and 
figure out how to optimize memory usage in that case.
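
One possible starting point for the Phase Two evaluation is a back-of-the-envelope size model. The Python sketch below assumes that "full dictionary encoding" means every value of a column is dictionary-encoded, and it uses a simplified layout (a 4-byte length prefix per stored string and fixed-width integer codes) that is an assumption for illustration, not the Doris storage format. The helper names {{plain_bytes}} and {{dict_bytes}} are hypothetical.

```python
from typing import List


def plain_bytes(values: List[str], length_prefix: int = 4) -> int:
    """Bytes needed to store every value as UTF-8 with a fixed-size length prefix."""
    return sum(len(v.encode("utf-8")) + length_prefix for v in values)


def dict_bytes(values: List[str], length_prefix: int = 4) -> int:
    """Bytes for one copy of each distinct value plus one fixed-width code per row."""
    distinct = set(values)
    dictionary = sum(len(v.encode("utf-8")) + length_prefix for v in distinct)
    # Smallest whole number of bytes that can address every dictionary entry.
    code_width = max(1, ((len(distinct) - 1).bit_length() + 7) // 8)
    return dictionary + code_width * len(values)


# Low cardinality: 3 distinct values repeated over 3000 rows.
low_card = ["error", "warn", "info"] * 1000
# High cardinality: 3000 rows, all distinct.
high_card = [f"user-{i:06d}" for i in range(3000)]

low_plain, low_dict = plain_bytes(low_card), dict_bytes(low_card)
high_plain, high_dict = plain_bytes(high_card), dict_bytes(high_card)
```

Under this model the low-cardinality column shrinks from 25000 to 3025 bytes, while the all-distinct column grows from 45000 to 51000 bytes because the dictionary holds every value and the codes are pure overhead. That gap is one reason the memory cost of encoding every column is worth measuring on real data.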

h3. *Learning Material*

{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
{*}Github{*}: [https://github.com/apache/doris]
h3. Mentor
 * Mentor: Chen Zhang, Apache Doris Committer, [zhangc...@apache.org 
|mailto:yangyongqi...@apache.org]
 * Mentor: Zhijing Lu, Apache Doris Committer, 
[luzhij...@apache.org|mailto:lihaop...@apache.org]  
 * Mailing List: d...@doris.apache.org

  was:
*Apache Doris*
Apache Doris is a real-time analytical database based on MPP architecture. As a 
unified platform that supports multiple data processing scenarios, it ensures 
high performance for low-latency and high-throughput queries, allows for easy 
federated queries on data lakes, and supports various data ingestion methods.
{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]

{*}Github{*}: [https://github.com/apache/doris]
h3. *Background*
In Apache Doris, dictionary encoding is performed during data writing and 
compaction. Dictionary encoding will be implemented on string data types by 
default. The dictionary size of a column for one segment is 1M at most. The 
dictionary encoding technology accelerates strings during queries, converting 
them into INT, for example.
 
h3. *Task*
 * Phase One: Get familiar with the implementation of Apache Doris dictionary 
encoding; learning how Apache Doris dictionary encoding accelerates queries.
 *  Phase Two: Evaluate the effectiveness of full dictionary encoding and 
figure out how to optimize memory in such a case.

h3. *Learning Material*

{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
{*}Github{*}: [https://github.com/apache/doris]
h3. Mentor
 * Mentor: Chen Zhang, Apache Doris Committer, [zhangec...@apache.org 
|mailto:yangyongqi...@apache.org]
 * Mentor: Zhijing Lu, Apache Doris Committer, 
[luzhij...@apache.org|mailto:lihaop...@apache.org]  
 * Mailing List: d...@doris.apache.org


> [GSoC][Doris]Dictionary Encoding Acceleration
> ---------------------------------------------
>
>                 Key: COMDEV-511
>                 URL: https://issues.apache.org/jira/browse/COMDEV-511
>             Project: Community Development
>          Issue Type: Task
>          Components: GSoC/Mentoring ideas
>            Reporter: Zhijing Lu
>            Priority: Major
>              Labels: Doris, Mentor, full-time, gsoc2023



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@community.apache.org
For additional commands, e-mail: dev-h...@community.apache.org
