Hi Xiaoxiang,
Basically, you can imagine a scenario where there will be 3 teams using Kylin's Cube:
a) Data analyst team (DA), who use PowerBI (via ODBC or the mez connector) as well as Superset to access the Kylin Cube.
b) Data science team (DS), who use PySpark and SparkML, currently accessing HDFS
Hi,
Question 1:
You are almost right.
If the Cube is not ready, Kylin will use SparkSQL to execute the query directly on the original tables.
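
A minimal Python sketch of that flow, assuming the standard Kylin REST query endpoint (/kylin/api/query); the host, credentials, project and table names are placeholders, and the exact endpoint, headers and response fields should be checked against the docs of your Kylin version:

import requests

# Placeholder host/credentials/project -- adjust for your cluster.
resp = requests.post(
    "http://kylin-host:7070/kylin/api/query",
    auth=("ADMIN", "KYLIN"),
    json={
        "sql": "SELECT part_dt, SUM(price) FROM kylin_sales GROUP BY part_dt",
        "project": "learn_kylin",
        "limit": 100,
    },
)
resp.raise_for_status()
# The JSON response normally indicates whether the query was answered by an
# index/cube or pushed down to SparkSQL (e.g. a "pushDown" flag), plus the rows.
print(resp.json())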
Question 2:
It is possible but very hard.
The index data is saved in Parquet format, so it is possible to read it with Spark, but the column names are encoded, so you cannot easily map them back to the original columns.
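
A rough PySpark sketch of what reading those index files could look like, assuming a Kylin 4-style deployment that stores cube/index data as Parquet on HDFS; the path below is a made-up placeholder, because the real layout depends on your working directory and the project/cube/segment identifiers:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-kylin-index").getOrCreate()

# Placeholder path -- look up the actual location under Kylin's HDFS working dir.
df = spark.read.parquet("hdfs:///kylin/working_dir/my_project/parquet/my_cube/segment_0")

# The schema comes back with Kylin's internal encoded column names (not the
# original Hive column names), so cube metadata is still needed to interpret them.
df.printSchema()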
Thank you Xiaoxiang for answering my previous question
1. For my previous question 1: if I can ingest data near-real-time into a Hive table, can that near-real-time data be queried almost instantly via SQL in Kylin's Insight window? If not, how can I reflect near-real-time data in Kylin's Insight window?
1. Querying them instantly is not possible; you need to trigger a build job and wait until it completes, which takes about 5-30 minutes in most cases. So the delay caused by Kylin is 5-30 minutes.
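
A hedged Python sketch of triggering such a build over REST, assuming the classic /kylin/api/cubes/{cube}/rebuild endpoint; the cube name, time range and credentials are placeholders, and the exact API differs between Kylin versions, so verify it against your release's docs:

import requests

resp = requests.put(
    "http://kylin-host:7070/kylin/api/cubes/my_cube/rebuild",
    auth=("ADMIN", "KYLIN"),
    json={
        "startTime": 1699833600000,  # segment start, epoch millis (example value)
        "endTime": 1699920000000,    # segment end, epoch millis (example value)
        "buildType": "BUILD",
    },
)
resp.raise_for_status()
# Returns the submitted job; poll the jobs API until it finishes (typically
# 5-30 minutes) before the new data becomes queryable from the cube.
print(resp.json())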
2. DS/AI can send SQL queries using Python and get the results (if kylinpy works well), just like you do in K
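
A small sketch of that Python path, assuming kylinpy and its SQLAlchemy dialect are installed; the user, password, host and project in the connection string are placeholders:

import pandas as pd
from sqlalchemy import create_engine

# kylinpy registers the "kylin://" dialect; adjust credentials/host/project.
engine = create_engine("kylin://ADMIN:KYLIN@kylin-host:7070/learn_kylin")

df = pd.read_sql(
    "SELECT part_dt, SUM(price) AS total_price FROM kylin_sales GROUP BY part_dt",
    engine,
)
print(df.head())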
Thank you Xiaoxiang,
1. For my question about near-real-time data: this scenario is not about querying the cube (index); I am referring to queries against the Hive table only. Is it possible to instantly query the non-cube data if the data is already in Hive?
Best regards
Yes, you are right.
--
Best wishes to you!
From: Xiaoxiang Yu