This is an automated email from the ASF dual-hosted git repository.
stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
The following commit(s) were added to refs/heads/master by this push:
new 2f55f8551 IMPALA-11519: [DOCS] add UTF-8 requirements
2f55f8551 is described below
commit 2f55f85519c0bd0baec83e74104a87fe4859726e
Author: Shajini Thayasingh <[email protected]>
AuthorDate: Tue Aug 23 09:01:12 2022 -0700
IMPALA-11519: [DOCS] add UTF-8 requirements
added a note about Glibc version and en_US.UTF-8 locale
updated the notes in both topics
Change-Id: I4d7a21c787c66868219c7bd64aa31f772de2f850
Reviewed-on: http://gerrit.cloudera.org:8080/18897
Reviewed-by: Quanlong Huang <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
docs/topics/impala_components.xml | 67 ++++++++++++++++++++-------------------
docs/topics/impala_utf_8.xml | 13 ++++++--
2 files changed, 45 insertions(+), 35 deletions(-)
diff --git a/docs/topics/impala_components.xml b/docs/topics/impala_components.xml
index 8f5f7f383..cb70eb84f 100644
--- a/docs/topics/impala_components.xml
+++ b/docs/topics/impala_components.xml
@@ -47,46 +47,47 @@ under the License.
<conbody>
- <p> The core Impala component is the Impala daemon, physically
represented
- by the <codeph>impalad</codeph> process. A few of the key functions
that
- an Impala daemon performs are:<ul>
+ <p> The core Impala component is the Impala daemon, physically
represented by the
+ <codeph>impalad</codeph> process. A few of the key functions that an
Impala daemon
+ performs are:<ul>
<li>Reads and writes to data files.</li>
- <li>Accepts queries transmitted from the
<codeph>impala-shell</codeph>
- command, Hue, JDBC, or ODBC.</li>
- <li>Parallelizes the queries and distributes work across the
- cluster.</li>
- <li>Transmits intermediate query results back to the central
- coordinator. </li>
+ <li>Accepts queries transmitted from the
<codeph>impala-shell</codeph> command, Hue, JDBC,
+ or ODBC.</li>
+ <li>Parallelizes the queries and distributes work across the
cluster.</li>
+ <li>Transmits intermediate query results back to the central
coordinator. </li>
</ul></p>
<p>Impala daemons can be deployed in one of the following ways:<ul>
- <li>HDFS and Impala are co-located, and each Impala daemon runs on
the
- same host as a DataNode.</li>
- <li>Impala is deployed separately in a compute cluster and reads
- remotely from HDFS, S3, ADLS, etc.</li>
+ <li>HDFS and Impala are co-located, and each Impala daemon runs on
the same host as a
+ DataNode.</li>
+ <li>Impala is deployed separately in a compute cluster and reads
remotely from HDFS, S3,
+ ADLS, etc.</li>
</ul></p>
- <p> The Impala daemons are in constant communication with StateStore, to
- confirm which daemons are healthy and can accept new work. </p>
-
- <p rev="1.2"> They also receive broadcast messages from the
- <cmdname>catalogd</cmdname> daemon (introduced in Impala 1.2)
whenever
- any Impala daemon in the cluster creates, alters, or drops any type of
- object, or when an <codeph>INSERT</codeph> or <codeph>LOAD
DATA</codeph>
- statement is processed through Impala. This background communication
- minimizes the need for <codeph>REFRESH</codeph> or <codeph>INVALIDATE
- METADATA</codeph> statements that were needed to coordinate metadata
- across Impala daemons prior to Impala 1.2. </p>
-
- <p rev="2.9.0 IMPALA-3807 IMPALA-5147 IMPALA-5503">
- In <keyword keyref="impala29_full"/> and higher, you can control which
hosts act as query coordinators
- and which act as query executors, to improve scalability for highly
concurrent workloads on large clusters.
- See <xref keyref="scalability_coordinator"/> for details.
- </p>
+ <p> The Impala daemons are in constant communication with StateStore, to
confirm which daemons
+ are healthy and can accept new work. </p>
+
+ <p rev="1.2"> They also receive broadcast messages from the
<cmdname>catalogd</cmdname> daemon
+ (introduced in Impala 1.2) whenever any Impala daemon in the cluster
creates, alters, or
+ drops any type of object, or when an <codeph>INSERT</codeph> or
<codeph>LOAD DATA</codeph>
+ statement is processed through Impala. This background communication
minimizes the need for
+ <codeph>REFRESH</codeph> or <codeph>INVALIDATE METADATA</codeph>
statements that were
+ needed to coordinate metadata across Impala daemons prior to Impala
1.2. </p>
+
+ <p rev="2.9.0 IMPALA-3807 IMPALA-5147 IMPALA-5503"> In <keyword
keyref="impala29_full"/> and
+ higher, you can control which hosts act as query coordinators and
which act as query
+ executors, to improve scalability for highly concurrent workloads on
large clusters. See
+ <xref keyref="scalability_coordinator"/> for details. </p>
+
+ <note>Deploy Impala daemons on nodes that use the same Glibc version, because different
+ Glibc versions support different versions of the Unicode standard. Also ensure that the
+ en_US.UTF-8 locale is installed on every node. Using different Glibc versions might result
+ in inconsistent UTF-8 behavior when UTF8_MODE is set to true.</note>
<p>
- <b>Related information:</b> <xref
href="impala_config_options.xml#config_options"/>,
- <xref href="impala_processes.xml#processes"/>, <xref
href="impala_timeouts.xml#impalad_timeout"/>,
- <xref href="impala_ports.xml#ports"/>, <xref
href="impala_proxy.xml#proxy"/>
+ <b>Related information:</b>
+ <xref href="impala_config_options.xml#config_options"/>, <xref
+ href="impala_processes.xml#processes"/>, <xref
href="impala_timeouts.xml#impalad_timeout"
+ />, <xref href="impala_ports.xml#ports"/>, <xref
href="impala_proxy.xml#proxy"/>
</p>
</conbody>
</concept>
diff --git a/docs/topics/impala_utf_8.xml b/docs/topics/impala_utf_8.xml
index fac6bce88..f6a5b8ed1 100644
--- a/docs/topics/impala_utf_8.xml
+++ b/docs/topics/impala_utf_8.xml
@@ -48,8 +48,17 @@ under the License.
query option can be set globally, or at per session level. Only queries
with UTF8_MODE=true will
have UTF-8 aware behaviors.</p>
<p>
- <note>If the query option UTF8_MODE is turned on globally, existing
queries that depend on the
- original binary behavior need to explicitly set
UTF8_MODE=false.</note></p>
+ <note>
+ <ul id="ul_vs2_qrx_p5b">
+ <li>If the query option UTF8_MODE is turned on globally, existing
queries that depend on
+ the original binary behavior need to explicitly set
UTF8_MODE=false.</li>
+ <li>Deploy Impala daemons on nodes that use the same Glibc version, because
+ different Glibc versions support different versions of the Unicode standard. Also ensure
+ that the en_US.UTF-8 locale is installed on every node. Using different Glibc
+ versions might result in inconsistent UTF-8 behavior when UTF8_MODE is set to
+ true.</li>
+ </ul>
+ </note></p>
</conbody>
</concept>
<concept id="list_string_functions">
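
The notes added in this commit ask for the same Glibc version and an installed en_US.UTF-8 locale on every node. As a rough illustration only, the sketch below shows one way such a check could be scripted. It is not part of this commit or of Impala: the function names `local_report` and `check_cluster`, and the shape of the per-host report, are hypothetical, and collecting the reports from each host is left out.

```python
import locale
import platform


def local_report(required_locale="en_US.UTF-8"):
    """Describe this host: its C library version and whether the locale is usable.

    Hypothetical helper, assumed to be run once on every Impala node.
    """
    # On glibc systems this typically returns ("glibc", "<version>");
    # on other systems both fields may be empty strings.
    lib, version = platform.libc_ver()

    saved = locale.setlocale(locale.LC_CTYPE)  # remember the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, required_locale)
        has_locale = True
    except locale.Error:
        # Raised when the requested locale is not installed on this host.
        has_locale = False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the previous setting

    return {"libc": lib, "version": version, "has_locale": has_locale}


def check_cluster(reports):
    """Compare per-host reports, given as {hostname: report}.

    Returns whether all hosts report one C library version, and which hosts
    lack the required locale.
    """
    versions = {r["version"] for r in reports.values()}
    missing = sorted(h for h, r in reports.items() if not r["has_locale"])
    return {"consistent_glibc": len(versions) <= 1, "missing_locale": missing}
```

Under these assumptions, a cluster where `check_cluster` reports `consistent_glibc` as false or a non-empty `missing_locale` list would be one where the notes above warn of inconsistent UTF-8 behavior with UTF8_MODE=true.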