[ https://issues.apache.org/jira/browse/COMDEV-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhijing Lu updated COMDEV-511:
------------------------------
    Description: 
*Apache Doris*
Apache Doris is a real-time analytical database based on an MPP (massively parallel processing) architecture. As a unified platform that supports multiple data processing scenarios, it delivers high performance for both low-latency and high-throughput queries, allows for easy federated queries on data lakes, and supports various data ingestion methods.
{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
{*}Github{*}: [https://github.com/apache/doris]

h3. *Background*
In Apache Doris, dictionary encoding is performed during data writing and compaction, and it is applied to string data types by default. The dictionary for a column within a single segment is limited to 1M at most. Dictionary encoding accelerates queries on strings, for example by converting the string values into INT codes.

h3. *Task*
* Phase One: Get familiar with the implementation of dictionary encoding in Apache Doris and learn how it accelerates queries.
* Phase Two: Evaluate the effectiveness of full dictionary encoding and work out how to optimize memory usage in that case.

h3. *Learning Material*
{*}Page{*}: [https://doris.apache.org|https://doris.apache.org/]
{*}Github{*}: [https://github.com/apache/doris]

h3. Mentor
* Mentor: Chen Zhang, Apache Doris Committer, [zhangc...@apache.org|mailto:yangyongqi...@apache.org]
* Mentor: Zhijing Lu, Apache Doris Committer, [luzhij...@apache.org|mailto:lihaop...@apache.org]
* Mailing List: d...@doris.apache.org

> [GSoC][Doris]Dictionary Encoding Acceleration
> ---------------------------------------------
>
>                 Key: COMDEV-511
>                 URL: https://issues.apache.org/jira/browse/COMDEV-511
>             Project: Community Development
>          Issue Type: Task
>          Components: GSoC/Mentoring ideas
>            Reporter: Zhijing Lu
>            Priority: Major
>              Labels: Doris, Mentor, full-time, gsoc2023
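To make the Background of this issue concrete, here is a minimal Python sketch of the general technique, not the Doris implementation. It dictionary-encodes one string column of a single segment, falls back to plain storage when the dictionary would exceed a size cap, and evaluates an equality predicate by translating the literal into an integer code once and then comparing integers per row. The names `encode_segment_column`, `filter_eq`, `DictColumn`, `PlainColumn` and `MAX_DICT_BYTES` are hypothetical. The issue does not say what unit the 1M limit uses, so treating it as a byte budget over the UTF-8 size of the distinct strings is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# Hypothetical cap: the issue gives a 1M per-column, per-segment limit
# without a unit; this sketch ASSUMES it is a byte budget.
MAX_DICT_BYTES = 1 << 20


@dataclass
class DictColumn:
    """Hypothetical dictionary-encoded string column for one segment."""
    dictionary: List[str]  # code -> distinct string
    codes: List[int]       # one integer code per row


@dataclass
class PlainColumn:
    """Hypothetical fallback: raw strings, used when the cap is exceeded."""
    values: List[str]


def encode_segment_column(values: List[str],
                          max_dict_bytes: int = MAX_DICT_BYTES
                          ) -> Union[DictColumn, PlainColumn]:
    """Assign each distinct string a dense integer code in first-seen order.

    Falls back to plain storage as soon as the UTF-8 size of the distinct
    strings would exceed max_dict_bytes.
    """
    index: Dict[str, int] = {}  # string -> code
    dictionary: List[str] = []
    codes: List[int] = []
    dict_bytes = 0
    for v in values:
        code = index.get(v)
        if code is None:
            dict_bytes += len(v.encode("utf-8"))
            if dict_bytes > max_dict_bytes:
                return PlainColumn(list(values))
            code = len(dictionary)
            index[v] = code
            dictionary.append(v)
        codes.append(code)
    return DictColumn(dictionary, codes)


def filter_eq(column: Union[DictColumn, PlainColumn], literal: str) -> List[int]:
    """Return the row ids where the column value equals literal."""
    if isinstance(column, PlainColumn):
        # Plain storage: one string comparison per row.
        return [i for i, v in enumerate(column.values) if v == literal]
    try:
        # Dictionary storage: look the literal up once (linear scan here
        # for brevity)...
        target = column.dictionary.index(literal)
    except ValueError:
        return []  # literal never occurs in this segment
    # ...then compare integer codes instead of strings for every row.
    return [i for i, c in enumerate(column.codes) if c == target]


if __name__ == "__main__":
    col = encode_segment_column(["bj", "sh", "bj", "gz", "sh"])
    print(col.codes)             # [0, 1, 0, 2, 1]
    print(filter_eq(col, "sh"))  # [1, 4]
```

The memory trade-off behind Phase Two shows up even in this toy version: each distinct string is stored once and every row costs only one integer code, which pays off when values repeat. When almost every value is distinct, the dictionary plus the codes can take more space than the plain strings, which is one reason a per-segment cap with a plain-storage fallback is useful.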