Author: lidong
Date: Fri Mar 18 14:13:30 2022
New Revision: 1899035
URL: http://svn.apache.org/viewvc?rev=1899035&view=rev
Log:
# add blog: kylin4 now is supporting aws glue
Added:
kylin/site/blog/2022/03/
kylin/site/blog/2022/03/17/
kylin/site/blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/
kylin/site/blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/index.html
kylin/site/cn_blog/2022/03/
kylin/site/cn_blog/2022/03/17/
kylin/site/cn_blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/
kylin/site/cn_blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/index.html
kylin/site/images/blog/kylin4_support_aws_glue/
kylin/site/images/blog/kylin4_support_aws_glue/10_kylin_start_up_script_en.png
(with props)
kylin/site/images/blog/kylin4_support_aws_glue/11_kylin_start_up_script_en.png
(with props)
kylin/site/images/blog/kylin4_support_aws_glue/12_start_kylin_en.png
(with props)
kylin/site/images/blog/kylin4_support_aws_glue/13_start_kylin_en.png
(with props)
kylin/site/images/blog/kylin4_support_aws_glue/14_load_glue_meta_en.png
(with props)
kylin/site/images/blog/kylin4_support_aws_glue/15_load_glue_meta_en.png
(with props)
kylin/site/images/blog/kylin4_support_aws_glue/16_load_glue_meta_en.png
(with props)
kylin/site/images/blog/kylin4_support_aws_glue/17_verify_query_en.png
(with props)
kylin/site/images/blog/kylin4_support_aws_glue/1_prepare_aws_glue_table_en.png
(with props)
kylin/site/images/blog/kylin4_support_aws_glue/2_prepare_aws_glue_table_en.png
(with props)
kylin/site/images/blog/kylin4_support_aws_glue/3_prepare_hadoop_cluster_en.png
(with props)
kylin/site/images/blog/kylin4_support_aws_glue/4_prepare_hadoop_cluster_en.png
(with props)
kylin/site/images/blog/kylin4_support_aws_glue/5_test_sparksql_glue_en.png
(with props)
kylin/site/images/blog/kylin4_support_aws_glue/6_test_sparksql_glue_en.png
(with props)
kylin/site/images/blog/kylin4_support_aws_glue/7_test_sparksql_glue_en.png
(with props)
kylin/site/images/blog/kylin4_support_aws_glue/8_kylin_start_up_script_en.png
(with props)
kylin/site/images/blog/kylin4_support_aws_glue/9_kylin_start_up_script_en.png
(with props)
Modified:
kylin/site/blog/index.html
kylin/site/cn/blog/index.html
kylin/site/feed.xml
Added:
kylin/site/blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/index.html
URL:
http://svn.apache.org/viewvc/kylin/site/blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/index.html?rev=1899035&view=auto
==============================================================================
---
kylin/site/blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/index.html
(added)
+++
kylin/site/blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/index.html
Fri Mar 18 14:13:30 2022
@@ -0,0 +1,638 @@
+<!--
+* Licensed to the Apache Software Foundation (ASF) under one
+* or more contributor license agreements. See the NOTICE file
+* distributed with this work for additional information
+* regarding copyright ownership. The ASF licenses this file
+* to you under the Apache License, Version 2.0 (the
+* "License"); you may not use this file except in compliance
+* with the License. You may obtain a copy of the License at
+*
+* http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+-->
+<!doctype html>
+<html>
+
+<head>
+ <meta charset="utf-8">
+ <meta http-equiv="X-UA-Compatible" content="IE=edge">
+ <meta name="viewport" content="width=device-width, initial-scale=1">
+
+ <title>Apache Kylin | Kylin 4 now is supporting AWS Glue Catalog</title>
+ <meta name="description" content="Why does installing Kylin on EMR need to
support AWS Glue?">
+ <meta name="author" content="Apache Kylin">
+ <link rel="shortcut icon" href="fav.png" type="image/png">
+
+
+
+<link rel="stylesheet" href="/assets/css/animate.css">
+<!-- Bootstrap -->
+<link rel="stylesheet" href="/assets/css/bootstrap.min.css">
+
+<!-- Fonts -->
+<!-- <link rel="stylesheet"
href="http://fonts.googleapis.com/css?family=Alice|Open+Sans:400,300,700"> -->
+
+<!-- Icons -->
+<link rel="stylesheet" href="/assets/css/font-awesome.min.css">
+
+ <!-- Custom styles -->
+ <link rel="stylesheet" href="/assets/css/styles.css">
+ <link rel="stylesheet" href="/assets/css/docs.css">
+ <link rel="stylesheet" href="/assets/css/pygments.css">
+
+ <link rel="canonical"
href="http://kylin.apache.org/blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/">
+ <link rel="alternate" type="application/rss+xml" title="Apache Kylin"
href="http://kylin.apache.org/feed.xml" />
+
+<!--[if lt IE 9]> <script src="assets/js/html5shiv.js"></script> <![endif]-->
+<!-- Global site tag (gtag.js) - Google Analytics -->
+<script async
src="https://www.googletagmanager.com/gtag/js?id=UA-120788561-1"></script>
+<script>
+ window.dataLayer = window.dataLayer || [];
+ function gtag(){dataLayer.push(arguments);}
+ gtag('js', new Date());
+
+ gtag('config', 'UA-120788561-1');
+</script>
+<script type="text/javascript" src="/assets/js/jquery-1.9.1.min.js"></script>
+<script type="text/javascript" src="/assets/js/nside.js"></script>
+<script type="text/javascript" src="/assets/js/nnav.js"></script>
+<script>
+var _hmt = _hmt || [];
+(function() {
+ var hm = document.createElement("script");
+ hm.src = "https://hm.baidu.com/hm.js?bdc5e03add430c0b72cc0eb91eabfa99";
+ var s = document.getElementsByTagName("script")[0];
+ s.parentNode.insertBefore(hm, s);
+})();
+</script>
+
+</head>
+
+ <body>
+
+<header id="header" >
+
+ <!-- Main Menu -->
+ <nav class="navbar navbar-default" role="navigation" id="nav-wrapper">
+ <div class="container-fluid" id="nav">
+ <!--
+ <img class="img-circle" width="40px" height="40px" id="circlelogo"
src="/assets/images/kylin_logo.jpg">
+ -->
+ <!-- Brand and toggle get grouped for better mobile display -->
+ <div class="navbar-header">
+ <img class="navbar-logo" width="46"
src="/assets/images/kylin_logo.png" ></img>
+ <button type="button" class="navbar-toggle collapsed"
data-toggle="collapse" data-target="#bs-example-navbar-collapse-1">
+ <span class="sr-only">Toggle navigation</span>
+ <span class="icon-bar"></span>
+ <span class="icon-bar"></span>
+ <span class="icon-bar"></span>
+ </button>
+ <ul class="nav icon-navbar">
+ <li><a href="https://twitter.com/apachekylin" target="_blank"
class="fa fa-twitter fa-lg" title="Twitter: @ApacheKylin" ></a></li>
+ <li><a href="https://github.com/apache/kylin" target="_blank"
class="fa fa-github-alt fa-lg" title="Github: apache/kylin" ></a></li>
+ <li><a href="https://www.facebook.com/kylinio" target="_blank"
class="fa fa-facebook fa-lg" title="Facebook: kylin.io" ></a></li>
+ </ul>
+ </div>
+
+ <!-- Collect the nav links, forms, and other content for toggling -->
+ <div class="navbar-collapse collapse" id="bs-example-navbar-collapse-1">
+
+ <ul class="nav navbar-nav">
+
+ <li><a href="/">Home</a></li>
+ <li>
+ <a href="/docs" class="dropdown-toggle" data-toggle="dropdown"
role="button" aria-haspopup="true" aria-expanded="false">Docs<span
class="caret"></span></a>
+ <ul class="dropdown-menu">
+ <li><a href="/docs/">Latest Release(Kylin 4.0.1)</a></li>
+ <li><a href="/docs31/">Kylin 3.1.3</a></li>
+ <li><a href="/docs24/">Kylin 2.4.0</a></li>
+ <li><a href="/archive/">Archive</a></li>
+ </ul>
+ </li>
+ <li><a href="/download">Download</a></li>
+ <li><a href="/community" >Community</a></li>
+ <li>
+ <a href="/development" class="dropdown-toggle"
data-toggle="dropdown" role="button" aria-haspopup="true"
aria-expanded="false">Development<span class="caret"></span></a>
+ <ul class="dropdown-menu">
+ <li><a href="/development40/">Kylin 4.x</a></li>
+ <li><a href="/development/">Kylin 3.x And Older Versions</a></li>
+ </ul>
+ </li>
+ <li><a href="/blog">Blog</a></li>
+ <li><a href="/cn" >中文版</a></li>
+ </ul>
+ </div><!-- /.navbar-collapse -->
+ </div><!-- /.container-fluid -->
+ </nav>
+
+ <div id="head" class="parallax normal-header" >
+ <div class="text-center header-apache">
+ <a href="http://apache.org/foundation/contributing.html" title="Support
Apache" style="margin-left: 150px;">
+ <div>
+ <img src="https://www.apache.org/images/SupportApache-small.png" >
+ </div>
+ </a>
+ </div>
+ </div>
+
+ </header>
+
+ <div class="page-content main">
+ <header style=" padding:2em 0 0 ">
+ <div class="container" >
+ <div style=" padding:0 4em">
+ <div class="blog-icon">
+ <img width="30" src="/assets/images/icon_blog_w.png">
+ </div>
+ <h4 class="index-title" style="float:left;"><span>Apache Kylin™ Technical Blog</span></h4>
+ </div>
+ </div>
+ </div>
+
+ <div class="container blog">
+ <div>
+ <article class="post-content" >
+
+<div class="post" style=" padding:2em 4em 4em 4em">
+
+ <header class="post-header">
+ <h1 class="post-title">Kylin 4 now is supporting AWS Glue Catalog</h1>
+ <p class="post-meta" >Mar 17, 2022 • Xiaoxiang Yu</p>
+ </header>
+
+ <article class="post-content" >
+ <h2 id="why-does-installing-kylin-on-emr-need-to-support-aws-glue">Why
does installing Kylin on EMR need to support AWS Glue?</h2>
+
+<h3 id="what-is-aws-glue">What is AWS Glue?</h3>
+
+<p>AWS Glue is a fully managed ETL (Extract, Transform, and Load) service that lets AWS users classify, cleanse, and enrich data easily and cost-effectively, and move it between data stores. AWS Glue consists of a central metadata repository called the AWS Glue Data Catalog, an ETL engine that automatically generates code, and a flexible scheduler that handles dependency resolution, job monitoring, and retries. AWS Glue is serverless, so there is no infrastructure to set up or manage.</p>
+
+<h3 id="why-does-kylin-need-aws-glue-catalog">Why does Kylin need AWS Glue
Catalog?</h3>
+
+<p>Many users in the Kylin community run large-scale distributed data processing jobs on AWS EMR with Hadoop, Spark, Hive, Presto, etc. Without AWS Glue Data Catalog, a table created in one of these components (such as Hive, Spark, or Presto) cannot be used by the others. Because the data warehouse has to serve requirements from many business departments, these users choose AWS Glue Data Catalog as the metadata store when they create their AWS EMR clusters, so that data sources are shared among components and departments. In this way they can build a single data cube from every department's data and respond quickly to different business requirements.<br />
+In modern companies, data is kept in cloud object storage, and big data teams use AWS EMR for data processing, data analysis, and model training. As data volumes explode, however, extracting data becomes difficult and response times grow too long. In other words, EMR with Spark or Hive cannot meet the query speed that data analysts, O&amp;M personnel, and sales staff expect, so some users turn to Apache Kylin as their open-source OLAP solution.<br />
+Recently, users asked us whether Kylin 4 could read table metadata directly from AWS Glue. After some collaboration, Kylin 4 now supports AWS Glue Catalog, so tables and data can be shared among Hive, Presto, Spark, and Kylin. This breaks down the metadata barrier and lets different topics be combined into one big data analysis platform.</p>
+
+<h3 id="does-kylin-support-aws-glue">Does Kylin support AWS Glue?</h3>
+
+<table>
+ <thead>
+ <tr>
+ <th>&nbsp;</th>
+ <th>Kylin version which supports Glue</th>
+ <th>Issue Link</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>Kylin on HBase (Before Kylin 4)</td>
+ <td>2.6.6 or higher<br />3.1.0 or higher</td>
+ <td>https://issues.apache.org/jira/browse/KYLIN-4206<br
/>https://zhuanlan.zhihu.com/p/99481373</td>
+ </tr>
+ <tr>
+ <td>Kylin on Parquet</td>
+ <td>4.0.1 or higher</td>
+ <td>This article.</td>
+ </tr>
+ </tbody>
+</table>
+
+<h2 id="prerequisites-for-deployment">Prerequisites for deployment</h2>
+
+<h3 id="software-version">Software Version</h3>
+
+<table>
+ <thead>
+ <tr>
+ <th><strong>Software</strong></th>
+ <th><strong>Version</strong></th>
+ <th>Reference</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>Apache Kylin</td>
+ <td>4.0.1 or higher</td>
+ <td><a
href="https://cwiki.apache.org/confluence/display/KYLIN/KIP+10+refactor+hive+and+hadoop+dependency">KIP
10 refactor hive and hadoop dependency</a>.</td>
+ </tr>
+ <tr>
+ <td>AWS EMR</td>
+ <td>6.5.0 or higher<br />5.33.1 or higher</td>
+ <td><a
href="https://docs.amazonaws.cn/en_us/emr/latest/ReleaseGuide/emr-650-release.html">Amazon
EMR release 6.5.0 - Amazon EMR</a>.</td>
+ </tr>
+ </tbody>
+</table>
+
+<h3 id="prepare-aws-glue-database-and-tables">Prepare AWS Glue database and
tables</h3>
+
+<p><img
src="/images/blog/kylin4_support_aws_glue/1_prepare_aws_glue_table_en.png"
alt="" /></p>
+
+<p><img
src="/images/blog/kylin4_support_aws_glue/2_prepare_aws_glue_table_en.png"
alt="" /></p>
+
+<ul>
+ <li>Create an EMR cluster.</li>
+</ul>
+
+<p>Note: Parameter hive.metastore.client.factory.class is configured to enable
AWS Glue. For details, you may refer to the commands below.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>aws emr
create-cluster --applications <span class="nv">Name</span><span
class="o">=</span>Hadoop <span class="nv">Name</span><span
class="o">=</span>Hive <span class="nv">Name</span><span
class="o">=</span>Spark <span class="nv">Name</span><span
class="o">=</span>ZooKeeper <span class="nv">Name</span><span
class="o">=</span>Tez <span class="nv">Name</span><span
class="o">=</span>Ganglia <span class="se">\</span>
+ --ec2-attributes <span class="k">${}</span> <span class="se">\</span>
+ --release-label emr-6.5.0 <span class="se">\</span>
+ --log-uri <span class="k">${}</span> <span class="se">\</span>
+ --instance-groups <span class="k">${}</span> <span class="se">\</span>
+ --configurations <span
class="s1">'[{"Classification":"hive-site","Properties":{"hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"}}]'</span>
<span class="se">\</span>
+ --auto-scaling-role EMR_AutoScaling_DefaultRole <span class="se">\</span>
+ --ebs-root-volume-size 100 <span class="se">\</span>
+ --service-role EMR_DefaultRole <span class="se">\</span>
+ --enable-debugging <span class="se">\</span>
+ --name <span class="s1">'Kylin4_on_EMR65_with_Glue'</span> <span
class="se">\</span>
+ --region cn-northwest-1
+</code></pre>
+</div>
+
+<ul>
+ <li>Log in to the Master node. Check the Hadoop version and whether the
Hadoop cluster is successfully started.</li>
+</ul>
+
+<p><img
src="/images/blog/kylin4_support_aws_glue/3_prepare_hadoop_cluster_en.png"
alt="" /></p>
+
+<p><img
src="/images/blog/kylin4_support_aws_glue/4_prepare_hadoop_cluster_en.png"
alt="" /></p>
+
+<h3 id="optionalget-environmental-information">(Optional) Get environment information</h3>
+
+<blockquote>
+ <p>If you are using RDS or other metadata storage, you may skip this
step.</p>
+</blockquote>
+
+<p>An RDBMS is recommended as the metastore for Kylin 4. For testing purposes, this article uses the MariaDB instance that ships with the Master node; its hostname, account, and password can be found in <code class="highlighter-rouge">/etc/hive/conf/hive-site.xml</code>.</p>
+
+<div class="highlighter-rouge"><pre
class="highlight"><code>kylin.metadata.url<span
class="o">=</span>kylin4_on_cloud@jdbc,url<span
class="o">=</span>jdbc:mysql://<span class="k">${</span><span
class="nv">HOSTNAME</span><span class="k">}</span>:3306/hue,username<span
class="o">=</span>hive,password<span class="o">=</span><span
class="k">${</span><span class="nv">PASSWORD</span><span
class="k">}</span>,maxActive<span class="o">=</span>10,maxIdle<span
class="o">=</span>10,driverClassName<span
class="o">=</span>org.mariadb.jdbc.Driver
+kylin.env.zookeeper-connect-string<span class="o">=</span><span
class="k">${</span><span class="nv">HOSTNAME</span><span class="k">}</span>
+</code></pre>
+</div>
+
+<p>Replace the variables with your actual values (for example, replace ${PASSWORD} with the real password) and save the two settings locally; they are needed later to start Kylin.</p>
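+
+<p>As a minimal sketch, the MariaDB connection settings mentioned above can be listed straight from the Hive configuration. The helper name <code class="highlighter-rouge">show_metastore_connection</code> is hypothetical, and the sketch assumes that the standard <code class="highlighter-rouge">javax.jdo.option.Connection*</code> property names are used and that each value sits on the line directly after its name.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code># Hypothetical helper: list the metastore connection settings in a hive-site.xml file.
+# Assumes each property value is on the line right after the property name.
+show_metastore_connection() {
+  grep -A 1 'javax.jdo.option.Connection' "${1:-/etc/hive/conf/hive-site.xml}"
+}
+</code></pre>
+</div>
+
+<p>Called without arguments, <code class="highlighter-rouge">show_metastore_connection</code> reads <code class="highlighter-rouge">/etc/hive/conf/hive-site.xml</code>; pass another path to inspect a different file.</p>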
+
+<h3 id="test-the-connectivity-between-spark-sql-and-aws-glue">Test the
connectivity between Spark SQL and AWS Glue</h3>
+
+<p>Use the Spark SQL CLI to test whether AWS Spark SQL can access database and table metadata through AWS Glue. On the first attempt, the startup fails with an error.</p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/5_test_sparksql_glue_en.png"
alt="" /></p>
+
+<p>Replace <code class="highlighter-rouge">hive-site.xml</code> used by Spark
with the following commands.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span
class="nb">cd</span> /etc/spark/conf
+sudo mv hive-site.xml hive-site.xml.bak
+sudo cp /etc/hive/conf/hive-site.xml .
+</code></pre>
+</div>
+
+<p>Then change the value of <code class="highlighter-rouge">hive.execution.engine</code> in the file <code class="highlighter-rouge">/etc/spark/conf/hive-site.xml</code> to <code class="highlighter-rouge">mr</code>, restart the Spark SQL CLI, and verify that a query against the AWS Glue table data succeeds.</p>
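+
+<p>The edit to <code class="highlighter-rouge">hive.execution.engine</code> can also be scripted. The following is only a sketch: <code class="highlighter-rouge">set_engine_mr</code> is a hypothetical helper, and it assumes GNU sed, a current value of <code class="highlighter-rouge">tez</code>, and a value element on the line immediately after the <code class="highlighter-rouge">hive.execution.engine</code> name element. Check the file by hand if any of these assumptions does not hold.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code># Hypothetical helper: switch hive.execution.engine from tez to mr in the given file.
+# Assumes GNU sed and that the value is on the line right after the property name.
+set_engine_mr() {
+  # On the line that names the property, move to the next line and rewrite the value there
+  sed -i '/hive.execution.engine/{n;s/tez/mr/}' "$1"
+}
+</code></pre>
+</div>
+
+<p>The files under <code class="highlighter-rouge">/etc/spark/conf</code> were moved and copied with sudo above, so on the cluster the same sed expression would likewise need to be run with sudo.</p>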
+
+<p><img src="/images/blog/kylin4_support_aws_glue/6_test_sparksql_glue_en.png"
alt="" /></p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/7_test_sparksql_glue_en.png"
alt="" /></p>
+
+<h3 id="optional-prepare-kylin-spark-enginejar">(Optional) Prepare
kylin-spark-engine.jar</h3>
+
+<blockquote>
+ <p>This issue will be fixed in Apache Kylin 4.0.2. So you can skip this step
after updating to Apache Kylin 4.0.2. For users with Kylin 4.0.1, please refer
to the following steps to replace kylin-spark-engine.jar:</p>
+</blockquote>
+
+<p>Clone the Kylin git repository and run <code class="highlighter-rouge">mvn clean package -DskipTests</code> to build a new <code class="highlighter-rouge">kylin-spark-project/kylin-spark-engine/target/kylin-spark-engine-4.0.0-SNAPSHOT.jar</code>.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>git clone
https://github.com/hit-lacus/kylin.git
+<span class="nb">cd </span>kylin
+git checkout KYLIN-5160
+mvn clean package -DskipTests
+
+<span class="c"># find kylin-spark-project/kylin-spark-engine/target -name kylin-spark-engine-4.0.0-SNAPSHOT.jar</span>
+</code></pre>
+</div>
+
+<p>Patch link: <a
href="https://github.com/apache/kylin/pull/1819">https://github.com/apache/kylin/pull/1819</a></p>
+
+<h2 id="deploy-kylin-and-connect-to-aws-glue">Deploy Kylin and connect to AWS
Glue</h2>
+
+<h3 id="download-kylin">Download Kylin</h3>
+
+<ol>
+ <li>
+    <p>Download and decompress Kylin. Choose the Kylin package that matches your EMR version: the Spark 2 package for EMR 5.X, or the Spark 3 package for EMR 6.X.</p>
+<div class="highlighter-rouge"><pre class="highlight"><code># aws s3 cp s3://${BUCKET}/apache-kylin-4.0.1-bin-spark3.tar.gz .
+# wget apache-kylin-4.0.1-bin-spark3.tar.gz
+tar zxvf apache-kylin-4.0.1-bin-spark3.tar.gz
+cd apache-kylin-4.0.1-bin-spark3
+export KYLIN_HOME=/home/hadoop/apache-kylin-4.0.1-bin-spark3
+</code></pre>
+</div>
+ </li>
+ <li>
+    <p>(Optional) Get the MariaDB driver jar</p>
+
+    <blockquote>
+      <p>If you are using another database for the metastore, please skip this step.</p>
+    </blockquote>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>cd $KYLIN_HOME
+mkdir ext
+cp /usr/lib/hive/lib/mariadb-connector-java.jar $KYLIN_HOME/ext
+</code></pre>
+</div>
+ </li>
+</ol>
+
+<h3 id="prepare-spark">Prepare Spark</h3>
+
+<p>AWS Spark (the Spark distribution bundled with EMR) has built-in support for AWS Glue, so it is used for loading table metadata and running build jobs. Kylin 4.0.1 officially supports Apache Spark, and because Apache Spark and AWS Spark are not fully compatible with each other, Apache Spark is used for cube queries. In short, you need to switch between AWS Spark and Apache Spark depending on the task (build or query).</p>
+
+<ul>
+ <li>Prepare AWS Spark</li>
+</ul>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>
+<span class="c"># Assumption: EMR installs its bundled Spark under /usr/lib/spark</span>
+cp -r /usr/lib/spark spark_aws
+</code></pre>
+</div>
+
+<ul>
+ <li>Download Apache Spark
+ <ul>
+      <li>Download the Spark package that matches your EMR version: Spark 2.4.7 for EMR 5.X, or Spark 3.1.2 for EMR 6.X. The commands below use the Spark 2.4.7 package as an example.
+<div class="highlighter-rouge"><pre class="highlight"><code>cd $KYLIN_HOME
+# or download spark-2.4.7-bin-hadoop2.7.tgz from the official website
+aws s3 cp s3://${BUCKET}/spark-2.4.7-bin-hadoop2.7.tgz $KYLIN_HOME
+tar zxvf spark-2.4.7-bin-hadoop2.7.tgz
+mv spark-2.4.7-bin-hadoop2.7 spark_apache
+</code></pre>
+</div></li>
+ </ul>
+ </li>
+  <li>The first task is to load AWS Glue tables, so point <code class="highlighter-rouge">$KYLIN_HOME/spark</code> at AWS Spark with a soft link. Note: you do not need to set <code class="highlighter-rouge">SPARK_HOME</code>; if <code class="highlighter-rouge">$KYLIN_HOME/spark</code> exists and <code class="highlighter-rouge">SPARK_HOME</code> is not set, Kylin uses <code class="highlighter-rouge">$KYLIN_HOME/spark</code> as <code class="highlighter-rouge">SPARK_HOME</code> by default.</li>
+</ul>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>ln -s spark_aws spark
+</code></pre>
+</div>
+
+<h3 id="modify-kylin-startup-script">Modify Kylin startup script</h3>
+
+<ol>
+ <li>Start Spark SQL CLI and keep it in running status.</li>
+ <li>
+    <p>Get the PID of <code class="highlighter-rouge">SparkSQLCLIDriver</code> with <code class="highlighter-rouge">jps -ml</code>, then read <code class="highlighter-rouge">spark.driver.extraClassPath</code> of the <strong>Driver</strong> with <code class="highlighter-rouge">jinfo ${PID}</code>. Alternatively, you can copy the value from /etc/spark/conf/spark-defaults.conf.</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>jps -ml | grep SparkSubmit
+jinfo ${PID} | grep "spark.driver.extraClassPath"
+</code></pre>
+</div>
+    <p><img src="/images/blog/kylin4_support_aws_glue/8_kylin_start_up_script_en.png" alt="" /></p>
+ </li>
+ <li>Edit <code class="highlighter-rouge">bin/kylin.sh</code>, modify <code
class="highlighter-rouge">KYLIN_TOMCAT_CLASSPATH</code> and add <code
class="highlighter-rouge">kylin_driver_classpath</code>; save bin/kylin.sh,
then exit Spark SQL CLI.</li>
+</ol>
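+
+<p>If you take the <code class="highlighter-rouge">spark-defaults.conf</code> route from step 2, a sketch like the following prints the classpath value. <code class="highlighter-rouge">spark_conf_value</code> is a hypothetical helper; it assumes that the file uses whitespace-separated key and value on one line and that the value itself contains no spaces.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code># Hypothetical helper: print the value of one key from a Spark properties file.
+# Usage: spark_conf_value KEY FILE
+spark_conf_value() {
+  # Field 1 is the key, field 2 is the value on whitespace-separated lines
+  awk -v key="$1" '$1 == key { print $2 }' "$2"
+}
+</code></pre>
+</div>
+
+<p>For example, <code class="highlighter-rouge">spark_conf_value spark.driver.extraClassPath /etc/spark/conf/spark-defaults.conf</code> prints the classpath that goes into <code class="highlighter-rouge">kylin_driver_classpath</code>.</p>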
+
+<ul>
+  <li>kylin.sh before modification</li>
+</ul>
+
+<p><img
src="/images/blog/kylin4_support_aws_glue/9_kylin_start_up_script_en.png"
alt="" /></p>
+
+<ul>
+ <li>For EMR 6.5.0, in the modified <code
class="highlighter-rouge">kylin.sh</code>, <code
class="highlighter-rouge">kylin_driver_classpath</code> is at the end of the
code.</li>
+</ul>
+
+<p><img
src="/images/blog/kylin4_support_aws_glue/10_kylin_start_up_script_en.png"
alt="" /></p>
+
+<ul>
+ <li>For EMR 5.33.1, in the modified <code
class="highlighter-rouge">kylin.sh</code>, <code
class="highlighter-rouge">kylin_driver_classpath</code> is placed before <code
class="highlighter-rouge">$SPARK_HOME/jars</code>.</li>
+</ul>
+
+<p><img
src="/images/blog/kylin4_support_aws_glue/11_kylin_start_up_script_en.png"
alt="" /></p>
+
+<h3 id="configure-kylin">Configure Kylin</h3>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span
class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>
+vim conf/kylin.properties
+</code></pre>
+</div>
+
+<h4 id="minimal-kylin-configuration">Minimal Kylin Configuration</h4>
+
+<table>
+ <thead>
+ <tr>
+ <th>Property Key</th>
+ <th>Property Value(Example)</th>
+ <th>Notes</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>kylin.metadata.url</td>
+
<td>kylin4_on_cloud@jdbc,url=jdbc:mysql://${HOSTNAME}:3306/hue,username=hive,password=${PASSWORD},maxActive=10,maxIdle=10,driverClassName=org.mariadb.jdbc.Driver</td>
+ <td>N/A</td>
+ </tr>
+ <tr>
+ <td>kylin.env.zookeeper-connect-string</td>
+ <td>${HOSTNAME}</td>
+ <td>N/A</td>
+ </tr>
+ <tr>
+ <td>kylin.engine.spark-conf.spark.driver.extraClassPath</td>
+
<td>/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar</td>
+ <td>Copied from spark.driver.extraClassPath in /etc/spark/conf/spark-defaults.conf</td>
+ </tr>
+ </tbody>
+</table>
+
+<h3 id="start-kylin-and-verify-the-building-job">Start Kylin and verify the
building job</h3>
+
+<h4 id="start-kylin">Start Kylin</h4>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span
class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>
+ln -s spark_aws spark <span class="c"># skip this step if soft link 'spark' exists</span>
+bin/kylin.sh restart
+</code></pre>
+</div>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/12_start_kylin_en.png"
alt="" /></p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/13_start_kylin_en.png"
alt="" /></p>
+
+<h4 id="optional-replace-kylin-spark-enginejar">(Optional) Replace
kylin-spark-engine.jar</h4>
+
+<blockquote>
+ <p>This step is only required for Kylin 4.0.1 users.</p>
+</blockquote>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>/tomcat/webapps/kylin/WEB-INF/lib/
+mv kylin-spark-engine-4.0.1.jar kylin-spark-engine-4.0.1.jar.bak <span class="c"># back up the old jar</span>
+cp /path/to/kylin-spark-engine-4.0.0-SNAPSHOT.jar . <span class="c"># /path/to is a placeholder for the directory holding the jar built earlier</span>
+
+<span class="nv">$KYLIN_HOME</span>/bin/kylin.sh restart <span class="c"># restart Kylin so that the new jar is loaded</span>
+</code></pre>
+</div>
+
+<h4 id="load-aws-glue-table-and-build">Load AWS Glue table and build</h4>
+
+<ul>
+ <li>Load AWS Glue table metadata</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/14_load_glue_meta_en.png"
alt="" /></p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/15_load_glue_meta_en.png"
alt="" /></p>
+
+<ul>
+ <li>Create Model and Cube, then trigger a building job.</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/16_load_glue_meta_en.png"
alt="" /></p>
+
+<h3 id="verify-the-query">Verify the query</h3>
+
+<p>Switch the Spark used by Kylin and restart Kylin.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span
class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>
+rm spark <span class="c"># 'spark' is a soft link that points to AWS Spark</span>
+ln -s spark_apache spark <span class="c"># switch from AWS Spark to Apache Spark</span>
+bin/kylin.sh restart
+</code></pre>
+</div>
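+
+<p>Because the soft link has to be switched each time you move between build and query tasks, the two commands can be wrapped in a small function. This is a sketch under assumptions: <code class="highlighter-rouge">use_spark</code> is a hypothetical helper, and it relies on the <code class="highlighter-rouge">-n</code> option of <code class="highlighter-rouge">ln</code> so that the existing link is replaced rather than followed.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code># Hypothetical helper: repoint the $KYLIN_HOME/spark soft link at another Spark directory.
+# Usage: use_spark spark_aws    or    use_spark spark_apache
+use_spark() {
+  # -f overwrites the old link; -n replaces the link itself instead of descending into its target
+  ln -sfn "$1" "$KYLIN_HOME/spark"
+}
+</code></pre>
+</div>
+
+<p>After calling <code class="highlighter-rouge">use_spark spark_apache</code>, Kylin still has to be restarted with <code class="highlighter-rouge">bin/kylin.sh restart</code> for the change to take effect.</p>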
+
+<p>Run a test query; it completes successfully.</p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/17_verify_query_en.png"
alt="" /></p>
+
+<h2 id="discussion-and-qa">Discussion and Q&A</h2>
+
+<h3 id="why-we-must-use-both-aws-spark-and-apache-spark">Why must we use both AWS Spark and Apache Spark?</h3>
+
+<p>AWS Spark has built-in support for AWS Glue, so it is needed for loading table metadata and running build jobs. Kylin 4.0.1, however, officially supports Apache Spark, and the two distributions are not fully compatible, so Apache Spark is used for cube queries. You therefore need to switch between them depending on the task (build or query).</p>
+
+<h3 id="why-do-users-need-to-modify-kylinsh">Why do users need to modify
kylin.sh?</h3>
+
+<p>Kylin acts as the Spark driver and loads table metadata through <code class="highlighter-rouge">aws-glue-datacatalog-spark-client.jar</code>, so you need to modify kylin.sh to add the relevant jars to the classpath of the Kylin process.</p>
+
+<h3 id="if-i-faced-more-questions-where-should-i-asked">If I have more questions, where should I ask?</h3>
+
+<p>If you have any questions about using Kylin on AWS, please contact us via the mailing list (<a href="mailto:user@kylin.apache.org">user@kylin.apache.org</a>). For details, see <a href="https://kylin.apache.org/community/">https://kylin.apache.org/community/</a>.</p>
+
+ </article>
+
+</div>
+
+
+
+
+
+ </article>
+ </div>
+ </div>
+
+<footer id="underfooter">
+ <div>
+ <div class="row">
+ <div class="col-md-12 widget">
+ <div class="widget-body">
+ <div class="footer-img">
+ <a href="http://www.apache.org">
+ <img id="asf-logo" height="78px" alt="Apache
Software Foundation" src="/assets/images/apache_footer.png">
+ </a>
+ </div>
+ <p style="padding-top: 11px;">
+ The contents of this website are © 2015 Apache Software Foundation under the terms of the
+ <a href="http://www.apache.org/licenses/LICENSE-2.0">
Apache License v2 </a>.
+ </p>
+ <p style="margin-bottom: 11px;">
+ Apache Kylin and its logo are trademarks of the Apache
Software Foundation.
+ </div>
+
+ </div>
+ </div>
+ </div>
+ <!-- /row of widgets -->
+
+ </div>
+ <div></div>
+
+</footer>
+
+ <script src="/assets/js/jquery-1.9.1.min.js"></script>
+ <script src="/assets/js/bootstrap.min.js"></script>
+ <script src="/assets/js/main.js"></script>
+ </body>
+</html>
+
+
+
+
Modified: kylin/site/blog/index.html
URL:
http://svn.apache.org/viewvc/kylin/site/blog/index.html?rev=1899035&r1=1899034&r2=1899035&view=diff
==============================================================================
--- kylin/site/blog/index.html (original)
+++ kylin/site/blog/index.html Fri Mar 18 14:13:30 2022
@@ -197,6 +197,16 @@ var _hmt = _hmt || [];
<div class="col-md-6 col-lg-6 col-xs-12">
+ <a class="blog-card"
href="/blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/">
+ <div class="blog-pic">
+ <img width="20" src="../assets/images/icon_blog_w.png" />
+ </div>
+ <p class="blog-title">Kylin 4 now is supporting AWS Glue
Catalog</p>
+ <p align="left" class="post-meta">posted: Mar 17, 2022</p>
+ </a>
+ </div>
+
+ <div class="col-md-6 col-lg-6 col-xs-12">
<a class="blog-card"
href="/blog/2022/01/12/The-Future-Of-Kylin/">
<div class="blog-pic">
<img width="20" src="../assets/images/icon_blog_w.png" />
Modified: kylin/site/cn/blog/index.html
URL:
http://svn.apache.org/viewvc/kylin/site/cn/blog/index.html?rev=1899035&r1=1899034&r2=1899035&view=diff
==============================================================================
--- kylin/site/cn/blog/index.html (original)
+++ kylin/site/cn/blog/index.html Fri Mar 18 14:13:30 2022
@@ -199,6 +199,16 @@ var _hmt = _hmt || [];
<div class="col-md-6 col-lg-6 col-xs-12">
+ <a class="blog-card"
href="/cn_blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/">
+ <div class="blog-pic">
+ <img width="20" src="/assets/images/icon_blog_w.png" />
+ </div>
+ <p class="blog-title">Announcement: Kylin 4 now supports AWS Glue
Catalog</p>
+ <p align="left" class="post-meta">posted: Mar 17, 2022</p>
+ </a>
+ </div>
+
+ <div class="col-md-6 col-lg-6 col-xs-12">
<a class="blog-card"
href="/cn_blog/2022/01/12/The-Future-Of-Kylin/">
<div class="blog-pic">
<img width="20" src="/assets/images/icon_blog_w.png" />
Added:
kylin/site/cn_blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/index.html
URL:
http://svn.apache.org/viewvc/kylin/site/cn_blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/index.html?rev=1899035&view=auto
==============================================================================
---
kylin/site/cn_blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/index.html
(added)
+++
kylin/site/cn_blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/index.html
Fri Mar 18 14:13:30 2022
@@ -0,0 +1,638 @@
+<!--
+* Licensed to the Apache Software Foundation (ASF) under one
+* or more contributor license agreements. See the NOTICE file
+* distributed with this work for additional information
+* regarding copyright ownership. The ASF licenses this file
+* to you under the Apache License, Version 2.0 (the
+* "License"); you may not use this file except in compliance
+* with the License. You may obtain a copy of the License at
+*
+* http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+-->
+<!doctype html>
+<html>
+
+<head>
+ <meta charset="utf-8">
+ <meta http-equiv="X-UA-Compatible" content="IE=edge">
+ <meta name="viewport" content="width=device-width, initial-scale=1">
+
+ <title>Apache Kylin | Announcement: Kylin 4 now supports AWS Glue Catalog</title>
+ <meta name="description" content="Why does deploying Kylin on EMR require
Glue support?">
+ <meta name="author" content="Apache Kylin">
+ <link rel="shortcut icon" href="fav.png" type="image/png">
+
+
+
+<link rel="stylesheet" href="/assets/css/animate.css">
+<!-- Bootstrap -->
+<link rel="stylesheet" href="/assets/css/bootstrap.min.css">
+
+<!-- Fonts -->
+<!-- <link rel="stylesheet"
href="http://fonts.googleapis.com/css?family=Alice|Open+Sans:400,300,700"> -->
+
+<!-- Icons -->
+<link rel="stylesheet" href="/assets/css/font-awesome.min.css">
+
+ <!-- Custom styles -->
+ <link rel="stylesheet" href="/assets/css/styles.css">
+ <link rel="stylesheet" href="/assets/css/docs.css">
+ <link rel="stylesheet" href="/assets/css/pygments.css">
+
+ <link rel="canonical"
href="http://kylin.apache.org/cn_blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/">
+ <link rel="alternate" type="application/rss+xml" title="Apache Kylin"
href="http://kylin.apache.org/feed.xml" />
+
+<!--[if lt IE 9]> <script src="assets/js/html5shiv.js"></script> <![endif]-->
+<!-- Global site tag (gtag.js) - Google Analytics -->
+<script async
src="https://www.googletagmanager.com/gtag/js?id=UA-120788561-1"></script>
+<script>
+ window.dataLayer = window.dataLayer || [];
+ function gtag(){dataLayer.push(arguments);}
+ gtag('js', new Date());
+
+ gtag('config', 'UA-120788561-1');
+</script>
+<script type="text/javascript" src="/assets/js/jquery-1.9.1.min.js"></script>
+<script type="text/javascript" src="/assets/js/nside.js"></script>
+<script type="text/javascript" src="/assets/js/nnav.js"></script>
+<script>
+var _hmt = _hmt || [];
+(function() {
+ var hm = document.createElement("script");
+ hm.src = "https://hm.baidu.com/hm.js?bdc5e03add430c0b72cc0eb91eabfa99";
+ var s = document.getElementsByTagName("script")[0];
+ s.parentNode.insertBefore(hm, s);
+})();
+</script>
+
+</head>
+
+ <body>
+
+<header id="header" >
+
+ <!-- Main Menu -->
+ <nav class="navbar navbar-default" role="navigation" id="nav-wrapper">
+ <div class="container-fluid" id="nav">
+ <!--
+ <img class="img-circle" width="40px" height="40px" id="circlelogo"
src="/assets/images/kylin_logo.jpg">
+ -->
+ <!-- Brand and toggle get grouped for better mobile display -->
+ <div class="navbar-header">
+ <img class="navbar-logo" width="46"
src="/assets/images/kylin_logo.png" ></img>
+ <button type="button" class="navbar-toggle collapsed"
data-toggle="collapse" data-target="#bs-example-navbar-collapse-1">
+ <span class="sr-only">Toggle navigation</span>
+ <span class="icon-bar"></span>
+ <span class="icon-bar"></span>
+ <span class="icon-bar"></span>
+ </button>
+ <ul class="nav icon-navbar">
+ <li><a href="https://twitter.com/apachekylin" target="_blank"
class="fa fa-twitter fa-lg" title="Twitter: @ApacheKylin" ></a></li>
+ <li><a href="https://github.com/apache/kylin" target="_blank"
class="fa fa-github-alt fa-lg" title="Github: apache/kylin" ></a></li>
+ <li><a href="https://www.facebook.com/kylinio" target="_blank"
class="fa fa-facebook fa-lg" title="Facebook: kylin.io" ></a></li>
+ </ul>
+ </div>
+
+ <!-- Collect the nav links, forms, and other content for toggling -->
+ <div class="navbar-collapse collapse" id="bs-example-navbar-collapse-1">
+
+ <ul class="nav navbar-nav">
+
+ <li><a href="/">Home</a></li>
+ <li>
+ <a href="/docs" class="dropdown-toggle" data-toggle="dropdown"
role="button" aria-haspopup="true" aria-expanded="false">Docs<span
class="caret"></span></a>
+ <ul class="dropdown-menu">
+ <li><a href="/docs/">Latest Release(Kylin 4.0.1)</a></li>
+ <li><a href="/docs31/">Kylin 3.1.3</a></li>
+ <li><a href="/docs24/">Kylin 2.4.0</a></li>
+ <li><a href="/archive/">Archive</a></li>
+ </ul>
+ </li>
+ <li><a href="/download">Download</a></li>
+ <li><a href="/community" >Community</a></li>
+ <li>
+ <a href="/development" class="dropdown-toggle"
data-toggle="dropdown" role="button" aria-haspopup="true"
aria-expanded="false">Development<span class="caret"></span></a>
+ <ul class="dropdown-menu">
+ <li><a href="/development40/">Kylin 4.x</a></li>
+ <li><a href="/development/">Kylin 3.x And Older Versions</a></li>
+ </ul>
+ </li>
+ <li><a href="/blog">Blog</a></li>
+ <li><a href="/cn" >中文版</a></li>
+ </ul>
+ </div><!-- /.navbar-collapse -->
+ </div><!-- /.container-fluid -->
+ </nav>
+
+ <div id="head" class="parallax normal-header" >
+ <div class="text-center header-apache">
+ <a href="http://apache.org/foundation/contributing.html" title="Support
Apache" style="margin-left: 150px;">
+ <div>
+ <img src="https://www.apache.org/images/SupportApache-small.png" >
+ </div>
+ </a>
+ </div>
+ </div>
+
+ </header>
+
+ <div class="page-content main">
+ <header style=" padding:2em 0 0 ">
+ <div class="container" >
+ <div style=" padding:0 4em">
+ <div class="blog-icon">
+ <img width="30" src="/assets/images/icon_blog_w.png">
+ </div>
+ <h4 class="index-title" style="
float:left;"><span>Apache Kylin™ Technical Blog</span></h4>
+ </div>
+ </div>
+ </div>
+
+ <div class="container blog">
+ <div>
+ <article class="post-content" >
+
+<div class="post" style=" padding:2em 4em 4em 4em">
+
+ <header class="post-header">
+ <h1 class="post-title">Announcement: Kylin 4 now supports AWS Glue Catalog</h1>
+ <p class="post-meta" >Mar 17, 2022 • Xiaoxiang Yu</p>
+ </header>
+
+ <article class="post-content" >
+ <h2 id="emr--kylin--glue-">Why does deploying Kylin on EMR require Glue
support?</h2>
+
+<h3 id="aws-glue">What is AWS Glue?</h3>
+
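+<p>AWS Glue is a fully managed ETL (extract, transform, and load) service that
lets AWS users easily and cost-effectively categorize, clean and enrich their
data, and move it reliably between various data stores. AWS Glue consists of a
central metadata repository called the AWS Glue Data Catalog, an ETL engine that
automatically generates code, and a flexible scheduler that handles dependency
resolution, job monitoring and retries. AWS Glue is a serverless service, so
there is no infrastructure to set up or manage.</p>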
+
+<h3 id="kylin--aws-glue-catalog">Why does Kylin need to support AWS Glue
Catalog?</h3>
+
+<p>Many Kylin users in the community currently run on AWS EMR, typically with
Hadoop, Spark, Hive, Presto and similar components. Without the AWS Glue Data
Catalog, a table created in one data warehouse component such as Hive, Spark or
Presto cannot be found, and therefore cannot be used, in the other components.
A company's underlying data warehouse is meant to serve every business unit. To
solve this problem, an AWS EMR cluster can be created with the AWS Glue Data
Catalog as its metadata store, so that data sources are shared across components
and across business units, and the data of all business units can be built into
one large data cube that responds quickly to the needs of a fast-growing
business.<br />
+Modern companies build their data on cloud platforms, and big data teams use
AWS EMR for data processing, data analysis and model training. As data volumes
surge, extracting data becomes slow and difficult, and EMR/Spark/Hive struggle
to meet the fast-query needs of data analysts, operations staff and sales, so
some users have chosen Apache Kylin as their open source OLAP solution.<br />
+Recently, community users told us that Kylin 4 did not yet support reading
table metadata from Glue. We worked with them to investigate and eventually
resolve the problems they ran into, so Kylin 4 now supports AWS Glue Catalog.
The benefit is that Hive, Presto, Spark and Kylin can share tables and data,
which ties every subject area together into one large data analysis platform
and removes the metadata barrier.</p>
+
+<h3 id="apache-kylin--aws-glue-">Does Apache Kylin support AWS Glue?</h3>
+
+<table>
+ <thead>
+ <tr>
+ <th>&nbsp;</th>
+ <th>Kylin versions that support Glue</th>
+ <th>Issue Link</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>Kylin on HBase (Before Kylin 4)</td>
+ <td>2.6.6 or higher<br /> 3.1.0 or higher</td>
+ <td>https://issues.apache.org/jira/browse/KYLIN-4206<br
/>https://zhuanlan.zhihu.com/p/99481373</td>
+ </tr>
+ <tr>
+ <td>Kylin on Parquet</td>
+ <td>4.0.1 or higher</td>
+ <td>This article.</td>
+ </tr>
+ </tbody>
+</table>
+
+<h2 id="section">Pre-deployment preparation</h2>
+
+<h3 id="section-1">Software overview</h3>
+
+<table>
+ <thead>
+ <tr>
+ <th><strong>Software</strong></th>
+ <th><strong>Version</strong></th>
+ <th>Reference</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>Apache Kylin</td>
+ <td>4.0.1 or higher</td>
+ <td>Must be 4.0.1 or later. For details, see <a
href="https://cwiki.apache.org/confluence/display/KYLIN/KIP+10+refactor+hive+and+hadoop+dependency">KIP
10 refactor hive and hadoop dependency</a>.</td>
+ </tr>
+ <tr>
+ <td>AWS EMR</td>
+ <td>6.5.0 or higher<br />5.33.1 or higher</td>
+ <td>Covers the newer releases of EMR 6 and EMR 5; see <a
href="https://docs.amazonaws.cn/en_us/emr/latest/ReleaseGuide/emr-650-release.html">Amazon
EMR release 6.5.0 - Amazon EMR</a>.</td>
+ </tr>
+ </tbody>
+</table>
+
+<h3 id="glue-">Prepare the Glue database and tables</h3>
+
+<p><img
src="/images/blog/kylin4_support_aws_glue/1_prepare_aws_glue_table_en.png"
alt="" /></p>
+
+<p><img
src="/images/blog/kylin4_support_aws_glue/2_prepare_aws_glue_table_en.png"
alt="" /></p>
+
+<ul>
+ <li>Create an AWS EMR cluster.</li>
+</ul>
+
+<p>Start an EMR cluster. Note that Glue is enabled as the external metastore
here by setting
<code class="highlighter-rouge">hive.metastore.client.factory.class</code>.
The following command can be used as a reference.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>aws emr
create-cluster --applications <span class="nv">Name</span><span
class="o">=</span>Hadoop <span class="nv">Name</span><span
class="o">=</span>Hive <span class="nv">Name</span><span
class="o">=</span>Spark <span class="nv">Name</span><span
class="o">=</span>ZooKeeper <span class="nv">Name</span><span
class="o">=</span>Tez <span class="nv">Name</span><span
class="o">=</span>Ganglia <span class="se">\</span>
+ --ec2-attributes <span class="k">${}</span> <span class="se">\</span>
+ --release-label emr-6.5.0 <span class="se">\</span>
+ --log-uri <span class="k">${}</span> <span class="se">\</span>
+ --instance-groups <span class="k">${}</span> <span class="se">\</span>
+ --configurations <span
class="s1">'[{"Classification":"hive-site","Properties":{"hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"}}]'</span>
<span class="se">\</span>
+ --auto-scaling-role EMR_AutoScaling_DefaultRole <span class="se">\</span>
+ --ebs-root-volume-size 100 <span class="se">\</span>
+ --service-role EMR_DefaultRole <span class="se">\</span>
+ --enable-debugging <span class="se">\</span>
+ --name <span class="s1">'Kylin4_on_EMR65_with_Glue'</span> <span
class="se">\</span>
+ --region cn-northwest-1
+</code></pre>
+</div>
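+<p>The inline JSON passed to <code>--configurations</code> is easy to mistype.
As an optional sketch, the same classification can be written to a file first;
the function name <code>write_glue_config</code> and the output file name are
hypothetical, while the classification and factory class are the ones used in
the command above.</p>

```shell
# Write the hive-site classification that enables Glue as the metastore.
# $1 = output path (a file name chosen by the caller, hypothetical here)
write_glue_config() {
  cat > "$1" <<'EOF'
[
  {
    "Classification": "hive-site",
    "Properties": {
      "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
    }
  }
]
EOF
}
```

+<p>After <code>write_glue_config glue-config.json</code>, the file can be
referenced as <code>--configurations file://glue-config.json</code> instead of
the quoted JSON string.</p>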
+
+<ul>
+ <li>Log in to the master node, then check the Hadoop version and whether the
Hadoop cluster started successfully.</li>
+</ul>
+
+<p><img
src="/images/blog/kylin4_support_aws_glue/3_prepare_hadoop_cluster_en.png"
alt="" /></p>
+
+<p><img
src="/images/blog/kylin4_support_aws_glue/4_prepare_hadoop_cluster_en.png"
alt="" /></p>
+
+<h3 id="optional">Get environment information (Optional)</h3>
+
+<blockquote>
+ <p>If you use RDS or another metadata store, skip this step as
appropriate.</p>
+</blockquote>
+
+<p>Kylin 4.X recommends an RDBMS as the metadata store. For testing purposes,
the MariaDB instance that ships with the master node is used as the metadata
store here. The MariaDB host name, account, password and related information can
be obtained from <code
class="highlighter-rouge">/etc/hive/conf/hive-site.xml</code>.</p>
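+<p>A minimal sketch of pulling one property out of a Hadoop-style XML file is
shown below. The function name <code>get_hive_prop</code> is hypothetical, and
the sketch assumes the usual layout in which each <code>&lt;name&gt;</code> is
followed by its <code>&lt;value&gt;</code>, on the same line or on the next
one.</p>

```shell
# Print the value of one property from a Hadoop-style XML file.
# $1 = path to hive-site.xml, $2 = property name
# Hypothetical helper; assumes <value> follows its <name>.
get_hive_prop() {
  awk -v key="$2" '
    /<name>/ {
      name = $0
      sub(/.*<name>/, "", name)      # strip everything up to <name>
      sub(/<\/name>.*/, "", name)    # strip </name> and what follows
    }
    /<value>/ && name == key {
      val = $0
      sub(/.*<value>/, "", val)
      sub(/<\/value>.*/, "", val)
      print val
      exit
    }
  ' "$1"
}
```

+<p>On the master node this could be called as <code>get_hive_prop
/etc/hive/conf/hive-site.xml javax.jdo.option.ConnectionURL</code>; the user
name and password are stored under
<code>javax.jdo.option.ConnectionUserName</code> and
<code>javax.jdo.option.ConnectionPassword</code>.</p>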
+
+<div class="highlighter-rouge"><pre
class="highlight"><code>kylin.metadata.url<span
class="o">=</span>kylin4_on_cloud@jdbc,url<span
class="o">=</span>jdbc:mysql://<span class="k">${</span><span
class="nv">HOSTNAME</span><span class="k">}</span>:3306/hue,username<span
class="o">=</span>hive,password<span class="o">=</span><span
class="k">${</span><span class="nv">PASSWORD</span><span
class="k">}</span>,maxActive<span class="o">=</span>10,maxIdle<span
class="o">=</span>10,driverClassName<span
class="o">=</span>org.mariadb.jdbc.Driver
+kylin.env.zookeeper-connect-string<span class="o">=</span><span
class="k">${</span><span class="nv">HOSTNAME</span><span class="k">}</span>
+</code></pre>
+</div>
+
+<p>Once you have this information, replace the variables in the Kylin
configuration entries above, such as <code
class="highlighter-rouge">${PASSWORD}</code>, and save the result locally for
use when starting the Kylin process in a later step.</p>
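+<p>The substitution can be sketched as a small function that prints both
properties with the placeholders filled in. The name
<code>render_kylin_props</code> is hypothetical; the property values are the
ones shown above.</p>

```shell
# Print the two Kylin properties from this section with the placeholders
# filled in. $1 = MariaDB/ZooKeeper host name, $2 = MariaDB password.
# Hypothetical helper for illustration only.
render_kylin_props() {
  printf 'kylin.metadata.url=kylin4_on_cloud@jdbc,url=jdbc:mysql://%s:3306/hue,username=hive,password=%s,maxActive=10,maxIdle=10,driverClassName=org.mariadb.jdbc.Driver\n' "$1" "$2"
  printf 'kylin.env.zookeeper-connect-string=%s\n' "$1"
}
```

+<p>The output can be redirected to a local file and later pasted into
<code>conf/kylin.properties</code>.</p>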
+
+<h3 id="spark-sql--aws-glue-">Test connectivity between Spark SQL and AWS Glue</h3>
+
+<p>Use spark-sql to test whether the Spark SQL shipped with AWS can read
database and table metadata through Glue. On the first attempt, startup fails
with an error.</p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/5_test_sparksql_glue_en.png"
alt="" /></p>
+
+<p>Next, replace the <code
class="highlighter-rouge">hive-site.xml</code> used by Spark with the following
commands.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span
class="nb">cd</span> /etc/spark/conf
+sudo mv hive-site.xml hive-site.xml.bak
+sudo cp /etc/hive/conf/hive-site.xml .
+</code></pre>
+</div>
+
+<p>Then change the value of <code
class="highlighter-rouge">hive.execution.engine</code> in <code
class="highlighter-rouge">/etc/spark/conf/hive-site.xml</code> to <code
class="highlighter-rouge">mr</code>, start the Spark-SQL CLI again, and verify
that a query against the Glue table data succeeds.</p>
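+<p>The edit can also be scripted. The sketch below rewrites the
<code>&lt;value&gt;</code> that follows a given <code>&lt;name&gt;</code>; the
function name <code>set_xml_prop</code> is hypothetical, and it assumes the
value contains no <code>&amp;</code> character and that the caller may write
to the file (on EMR this usually means running it with sudo).</p>

```shell
# Set the value of one property in a Hadoop-style XML file, in place.
# $1 = file, $2 = property name, $3 = new value
# Hypothetical helper; assumes <value> follows its <name>.
set_xml_prop() {
  _tmp="$(mktemp)"
  awk -v key="$2" -v val="$3" '
    /<name>/ { hit = (index($0, "<name>" key "</name>") > 0) }
    hit && /<value>/ {
      sub(/<value>[^<]*<\/value>/, "<value>" val "</value>")
      hit = 0
    }
    { print }
  ' "$1" > "$_tmp" && cat "$_tmp" > "$1"   # cat keeps the file mode and owner
  rm -f "$_tmp"
}
```

+<p>For this step the call would be <code>set_xml_prop
/etc/spark/conf/hive-site.xml hive.execution.engine mr</code>.</p>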
+
+<p><img src="/images/blog/kylin4_support_aws_glue/6_test_sparksql_glue_en.png"
alt="" /></p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/7_test_sparksql_glue_en.png"
alt="" /></p>
+
+<h3 id="kylin-spark-enginejaroptional">Prepare
kylin-spark-engine.jar (Optional)</h3>
+
+<blockquote>
+ <p>If Apache Kylin 4.0.2 has been released, this issue should already be
fixed there and you can skip this step. Otherwise, follow the steps below to
replace
<code class="highlighter-rouge">kylin-spark-engine.jar</code>:</p>
+</blockquote>
+
+<p>Following the commands below, clone the kylin repository and run <code
class="highlighter-rouge">mvn clean package -DskipTests</code> to obtain <code
class="highlighter-rouge">kylin-spark-project/kylin-spark-engine/target/kylin-spark-engine-4.0.0-SNAPSHOT.jar</code>.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>git clone
https://github.com/hit-lacus/kylin.git
+<span class="nb">cd </span>kylin
+git checkout KYLIN-5160
+mvn clean package -DskipTests
+
+<span class="c"># find kylin-spark-project/kylin-spark-engine/target -name
kylin-spark-engine-4.0.0-SNAPSHOT.jar</span>
+</code></pre>
+</div>
+
+<p>Patch link: <a
href="https://github.com/apache/kylin/pull/1819">https://github.com/apache/kylin/pull/1819</a></p>
+
+<h2 id="kylin--glue">Deploy Kylin and connect to Glue</h2>
+
+<h3 id="kylin">Download Kylin</h3>
+
+<ol>
+ <li>
+ <p>Download and extract Kylin. Choose the Kylin package that matches your EMR
version: EMR 5.X uses the spark2 package and EMR 6.X uses the spark3
package.<br />
+ <code class="highlighter-rouge">shell
+ # aws s3 cp s3://${BUCKET}/apache-kylin-4.0.1-bin-spark3.tar.gz .
+ # wget apache-kylin-4.0.1-bin-spark3.tar.gz
+ tar zxvf apache-kylin-4.0.1-bin-spark3.tar.gz
+ cd apache-kylin-4.0.1-bin-spark3
+ export KYLIN_HOME=/home/hadoop/apache-kylin-4.0.1-bin-spark3
+</code></p>
+ </li>
+ <li>
+ <p>Get the RDBMS driver jar (Optional)</p>
+
+ <blockquote>
+ <p>If you use a different RDBMS as the metadata store, skip this step.</p>
+ </blockquote>
+
+ <p><code class="highlighter-rouge">shell
+ cd $KYLIN_HOME
+ mkdir ext
+ cp /usr/lib/hive/lib/mariadb-connector-java.jar $KYLIN_HOME/ext
+</code></p>
+ </li>
+</ol>
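+<p>The package choice described above can be sketched as a lookup on the EMR
release label. The function name <code>kylin_package_for_emr</code> is
hypothetical, and the spark2 file name is assumed by analogy with the spark3
package used in this post.</p>

```shell
# Map an EMR release label to the matching Kylin 4.0.1 package name.
# $1 = release label such as emr-6.5.0 or emr-5.33.1
# Hypothetical helper; the spark2 file name is an assumption.
kylin_package_for_emr() {
  case "$1" in
    emr-5.*) echo "apache-kylin-4.0.1-bin-spark2.tar.gz" ;;
    emr-6.*) echo "apache-kylin-4.0.1-bin-spark3.tar.gz" ;;
    *) echo "unsupported EMR release: $1" >&2; return 1 ;;
  esac
}
```

+<p>For the cluster created earlier, <code>kylin_package_for_emr
emr-6.5.0</code> prints the spark3 package name.</p>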
+
+<h3 id="spark">Prepare Spark</h3>
+
+<p>AWS Spark has built-in support for AWS Glue, so <strong>loading table
metadata and running builds require AWS Spark</strong>. However, Kylin 4.0.1 is
written against Apache Spark, and AWS Spark carries substantial code changes
relative to Apache Spark, so the two are poorly compatible. As a result,
<strong>querying cubes requires Apache Spark</strong>. In short, the Spark in
use must be switched depending on whether Kylin needs to run query jobs or
build jobs.</p>
+
+<ul>
+ <li>Prepare AWS Spark</li>
+</ul>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span
class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>
+mkdir spark_aws
+cp -r /usr/lib/spark/* spark_aws <span class="c"># assumption: EMR installs its own Spark under /usr/lib/spark</span>
+</code></pre>
+</div>
+
+<ul>
+ <li>Prepare Apache Spark
+ <ul>
+ <li>Choose the Spark distribution that matches your EMR version: EMR 5.X uses
the <code class="highlighter-rouge">Spark 2.4.7</code> distribution and EMR 6.X
uses the <code class="highlighter-rouge">Spark
3.1.2</code> distribution.<br />
+<code class="highlighter-rouge">shell
+cd $KYLIN_HOME
+aws s3 cp s3://${BUCKET}/spark-2.4.7-bin-hadoop2.7.tgz $KYLIN_HOME # Or
download spark-2.4.7-bin-hadoop2.7.tgz from the official website
+tar zxvf spark-2.4.7-bin-hadoop2.7.tgz
+mv spark-2.4.7-bin-hadoop2.7 spark_apache
+</code></li>
+ </ul>
+ </li>
+ <li>Because the Glue tables must be loaded first, point <code
class="highlighter-rouge">$KYLIN_HOME/spark</code> at AWS Spark with a symbolic
link. Note that <code
class="highlighter-rouge">SPARK_HOME</code> does not need to be set: when <code
class="highlighter-rouge">$KYLIN_HOME/spark</code> exists and <code
class="highlighter-rouge">SPARK_HOME</code> is unset, Kylin uses <code
class="highlighter-rouge">$KYLIN_HOME/spark</code> by default.</li>
+</ul>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>ln -s spark_aws
spark
+</code></pre>
+</div>
+
+<h3 id="kylin-">Modify the Kylin startup script</h3>
+
+<ol>
+ <li>Start the Spark SQL CLI and leave it running.</li>
+ <li>
+ <p>Use <code class="highlighter-rouge">jps -ml</code> to find the PID of
<code class="highlighter-rouge">SparkSQLCLIDriver</code>, then read the
driver's <code
class="highlighter-rouge">spark.driver.extraClassPath</code>. Alternatively,
take the value from <code
class="highlighter-rouge">/etc/spark/conf/spark-defaults.conf</code>.<br />
+ <code class="highlighter-rouge">shell
+ jps -ml | grep SparkSubmit
+ jinfo ${PID} | grep "spark.driver.extraClassPath"
+</code><br />
+ <img
src="/images/blog/kylin4_support_aws_glue/8_kylin_start_up_script_en.png"
alt="" /></p>
+ </li>
+ <li>Edit <code class="highlighter-rouge">bin/kylin.sh</code> and modify the
<code class="highlighter-rouge">KYLIN_TOMCAT_CLASSPATH</code> variable to append
<code class="highlighter-rouge">kylin_driver_classpath</code>. After saving
<code class="highlighter-rouge">bin/kylin.sh</code>, exit the Spark SQL CLI.</li>
+</ol>
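+<p>Reading the value from <code>spark-defaults.conf</code> can be sketched as
follows. The function name <code>get_spark_conf</code> is hypothetical, and the
sketch assumes the whitespace-separated <code>key value</code> layout rather
than <code>key=value</code>.</p>

```shell
# Print the value of one key from a spark-defaults.conf style file.
# $1 = path to the file, $2 = property name
# Hypothetical helper; assumes "key<whitespace>value" lines.
get_spark_conf() {
  awk -v key="$2" '
    $1 == key {
      sub(/^[ \t]*[^ \t]+[ \t]+/, "")   # drop the key and the separator
      print
      exit
    }
  ' "$1"
}
```

+<p>For example, <code>get_spark_conf /etc/spark/conf/spark-defaults.conf
spark.driver.extraClassPath</code> prints the string to use as
<code>kylin_driver_classpath</code>.</p>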
+
+<ul>
+ <li>kylin.sh before the modification</li>
+</ul>
+
+<p><img
src="/images/blog/kylin4_support_aws_glue/9_kylin_start_up_script_en.png"
alt="" /></p>
+
+<ul>
+ <li>For EMR 6.5.0, the modified kylin.sh: <code
class="highlighter-rouge">kylin_driver_classpath</code> is placed at the end.</li>
+</ul>
+
+<p><img
src="/images/blog/kylin4_support_aws_glue/10_kylin_start_up_script_en.png"
alt="" /></p>
+
+<ul>
+ <li>For EMR 5.33.1, the modified kylin.sh: <code
class="highlighter-rouge">kylin_driver_classpath</code> is positioned relative
to <code
class="highlighter-rouge">$SPARK_HOME/jars</code> as shown in the screenshot
below.</li>
+</ul>
+
+<p><img
src="/images/blog/kylin4_support_aws_glue/11_kylin_start_up_script_en.png"
alt="" /></p>
+
+<h3 id="kylin-1">Configure Kylin</h3>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span
class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>
+vim conf/kylin.properties
+</code></pre>
+</div>
+
+<h4 id="minimal-kylin-configuration">Minimal Kylin Configuration</h4>
+
+<table>
+ <thead>
+ <tr>
+ <th>Property Key</th>
+ <th>Property Value(Example)</th>
+ <th>Notes</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>kylin.metadata.url</td>
+
<td>kylin4_on_cloud@jdbc,url=jdbc:mysql://${HOSTNAME}:3306/hue,username=hive,password=${PASSWORD},maxActive=10,maxIdle=10,driverClassName=org.mariadb.jdbc.Driver</td>
+ <td>N/A</td>
+ </tr>
+ <tr>
+ <td>kylin.env.zookeeper-connect-string</td>
+ <td>${HOSTNAME}</td>
+ <td>N/A</td>
+ </tr>
+ <tr>
+ <td>kylin.engine.spark-conf.spark.driver.extraClassPath</td>
+
<td>/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar</td>
+ <td>Copied from spark.driver.extraClassPath in
/etc/spark/conf/spark-defaults.conf</td>
+ </tr>
+ </tbody>
+</table>
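+<p>Before starting Kylin, it can help to confirm that the three keys from the
table are present. The check below is a minimal sketch; the function name
<code>check_kylin_props</code> is hypothetical, and it only looks for
<code>key=</code> at the start of a line without validating the values.</p>

```shell
# Report which of the minimal keys are missing from a properties file.
# $1 = path to kylin.properties; returns 1 when at least one key is absent.
# Hypothetical helper; values are not validated.
check_kylin_props() {
  missing=0
  for key in \
    kylin.metadata.url \
    kylin.env.zookeeper-connect-string \
    kylin.engine.spark-conf.spark.driver.extraClassPath
  do
    if ! grep -q "^[[:space:]]*${key}=" "$1"; then
      echo "missing: ${key}"
      missing=1
    fi
  done
  return "$missing"
}
```

+<p>A typical call is <code>check_kylin_props
$KYLIN_HOME/conf/kylin.properties</code>.</p>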
+
+<h3 id="kylin--1">Start Kylin and verify the build</h3>
+
+<h4 id="kylin-2">Start Kylin</h4>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span
class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>
+ln -s spark_aws spark <span class="c"># skip this step if soft link 'spark'
exists </span>
+bin/kylin.sh restart
+</code></pre>
+</div>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/12_start_kylin_en.png"
alt="" /></p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/13_start_kylin_en.png"
alt="" /></p>
+
+<h4 id="kylin-spark-enginejar-optional">Replace kylin-spark-engine.jar
(Optional)</h4>
+
+<blockquote>
+ <p>This step is only required for Kylin 4.0.1.</p>
+</blockquote>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span
class="nb">cd</span> <span
class="nv">$KYLIN_HOME</span>/tomcat/webapps/kylin/WEB-INF/lib/
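+mv kylin-spark-engine-4.0.1.jar kylin-spark-engine-4.0.1.jar.bak <span
class="c"># back up the old jar</span>
+cp /path/to/kylin-spark-engine-4.0.0-SNAPSHOT.jar . <span class="c"># /path/to is a placeholder for wherever the jar was built or uploaded</span>
+
+<span class="nv">$KYLIN_HOME</span>/bin/kylin.sh restart <span class="c"># restart Kylin so that the new jar is
loaded</span>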
+</code></pre>
+</div>
+
+<h4 id="glue--1">Load Glue tables and build</h4>
+
+<ul>
+ <li>Load the Glue table metadata</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/14_load_glue_meta_en.png"
alt="" /></p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/15_load_glue_meta_en.png"
alt="" /></p>
+
+<ul>
+ <li>Create a Model and a Cube, then trigger a build</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/16_load_glue_meta_en.png"
alt="" /></p>
+
+<h3 id="section-2">Verify queries</h3>
+
+<p>Switch the Spark used by Kylin and restart Kylin.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span
class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>
+rm spark <span class="c"># 'spark' is a soft link that points to AWS
Spark</span>
+ln -s spark_apache spark <span class="c"># switch from AWS Spark to Apache
Spark</span>
+bin/kylin.sh restart
+</code></pre>
+</div>
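+<p>Since the link is switched every time Kylin moves between build and query
work, the steps can be wrapped in one function. This is a sketch under the
assumption that <code>$KYLIN_HOME/spark_aws</code> and
<code>$KYLIN_HOME/spark_apache</code> both exist and that
<code>$KYLIN_HOME/spark</code> is either absent or already a symbolic link; the
function name <code>use_spark</code> is hypothetical.</p>

```shell
# Point $KYLIN_HOME/spark at either the AWS or the Apache Spark directory.
# $1 = "aws" or "apache"
# Hypothetical helper; expects $KYLIN_HOME/spark_aws and spark_apache.
use_spark() {
  case "$1" in
    aws|apache) ;;
    *) echo "usage: use_spark aws|apache" >&2; return 1 ;;
  esac
  if [ ! -d "$KYLIN_HOME/spark_$1" ]; then
    echo "missing directory: $KYLIN_HOME/spark_$1" >&2
    return 1
  fi
  # -f replaces an existing link; -n treats the old link as a file
  # instead of following it into the directory it points to.
  ln -sfn "spark_$1" "$KYLIN_HOME/spark"
}
```

+<p>After <code>use_spark apache</code>, restart Kylin with
<code>bin/kylin.sh restart</code> as shown above so that the new link takes
effect.</p>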
+
+<p>Run a test query; the query succeeds.</p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/17_verify_query_en.png"
alt="" /></p>
+
+<h2 id="section-3">Discussion and Q&amp;A</h2>
+
+<h3 id="sparkaws-spark--apache-spark">Why are two Sparks required (AWS
Spark &amp; Apache Spark)?</h3>
+
+<p>AWS Spark has built-in support for AWS Glue Catalog, and both table loading
and the build engine need to read tables, so <strong>loading table metadata and
running builds require AWS Spark</strong>. However, Kylin 4.0.1 is written
against Apache Spark, and AWS Spark carries substantial code changes relative
to Apache Spark, which makes the two poorly compatible, so <strong>querying
cubes requires Apache Spark</strong>. In short, the Spark in use must be
switched depending on whether Kylin needs to run query jobs or build
jobs.<br />
+In practice, consider using AWS Spark on the Job Node (build jobs) and Apache
Spark on the Query Node (query jobs).</p>
+
+<h3 id="kylinsh">Why does kylin.sh need to be modified?</h3>
+
+<p>Acting as the Spark driver, the Kylin process loads table metadata through <code
class="highlighter-rouge">aws-glue-datacatalog-spark-client.jar</code>, so
kylin.sh must be modified to add the relevant jars to the classpath of the
Kylin process.</p>
+
+ </article>
+
+</div>
+
+
+
+
+
+ </article>
+ </div>
+ </div>
+
+<footer id="underfooter">
+ <div>
+ <div class="row">
+ <div class="col-md-12 widget">
+ <div class="widget-body">
+ <div class="footer-img">
+ <a href="http://www.apache.org">
+ <img id="asf-logo" height="78px" alt="Apache
Software Foundation" src="/assets/images/apache_footer.png">
+ </a>
+ </div>
+ <p style="padding-top: 11px;">
+ The contents of this website are © 2015 Apache
Software Foundation under the terms of the
+ <a href="http://www.apache.org/licenses/LICENSE-2.0">
Apache License v2 </a>.
+ </p>
+ <p style="margin-bottom: 11px;">
+ Apache Kylin and its logo are trademarks of the Apache
Software Foundation.
+ </div>
+
+ </div>
+ </div>
+ </div>
+ <!-- /row of widgets -->
+
+ </div>
+ <div></div>
+
+</footer>
+
+ <script src="/assets/js/jquery-1.9.1.min.js"></script>
+ <script src="/assets/js/bootstrap.min.js"></script>
+ <script src="/assets/js/main.js"></script>
+ </body>
+</html>
+
+
+
+