1. No, you cannot query them instantly. You need to trigger a build job and 
wait for it to complete, which takes about 5-30 minutes in most cases. So 
the delay introduced by Kylin is roughly 5-30 minutes.
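For reference, triggering a build and waiting for it can be scripted against Kylin's REST API. A minimal sketch in Python (endpoint path and payload fields follow the Kylin 3.x REST docs as I recall them; the host, cube name, and auth header are placeholders, so please verify against your version):

```python
import json
import urllib.request

KYLIN_API = "http://sandbox:7070/kylin/api"  # placeholder host

def build_payload(start_ms, end_ms):
    # Body for PUT /cubes/{cube}/build (Kylin 3.x style segment build).
    return {"startTime": start_ms, "endTime": end_ms, "buildType": "BUILD"}

def trigger_build(cube, start_ms, end_ms, auth_header):
    # Submits the build job; the returned uuid can then be polled via
    # GET /jobs/{uuid} until job_status reaches FINISHED (typically 5-30 min).
    req = urllib.request.Request(
        f"{KYLIN_API}/cubes/{cube}/build",
        data=json.dumps(build_payload(start_ms, end_ms)).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": auth_header},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["uuid"]
```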

2. DS/AI can send SQL queries from Python and get the results (assuming kylinpy 
works well), just as you do in the Kylin Insight window.
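For completeness, the same SQL can also go over plain HTTP to Kylin's query endpoint, without kylinpy. A minimal sketch (endpoint and field names per the Kylin REST docs for POST /kylin/api/query; the sandbox host and the ADMIN/KYLIN demo credentials are placeholders):

```python
import base64
import json
import urllib.request

def query_payload(sql, project, limit=500):
    # Body for POST /kylin/api/query.
    return {"sql": sql, "project": project, "limit": limit}

def basic_auth(user, password):
    # Kylin's REST API uses HTTP basic auth.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"

def run_query(sql, project, user, password, host="http://sandbox:7070"):
    req = urllib.request.Request(
        f"{host}/kylin/api/query",
        data=json.dumps(query_payload(sql, project)).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": basic_auth(user, password)},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"]
```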










--

Best wishes to you!
From: Xiaoxiang Yu




At 2023-11-13 17:09:59, "Nam Đỗ Duy via user" <user@kylin.apache.org> wrote:

Thank you Xiaoxiang for answering my previous question


1. For previous question 1: if I can ingest data near real-time into a Hive 
table, can that near-real-time data be queried almost instantly by SQL in the 
Kylin Insight window? If not, then how can I reflect near-real-time data in 
the Kylin Insight window, as well as in a PowerBI report that connects to Kylin 
via the mez connector?


2. For previous question 2: if the DS/AI team cannot access Kylin's parquet 
files via Java/Python/Scala, then can they:


2.1) access the Hive star-schema tables?
2.2) access the Kylin cube via API?
2.3) access the computed fields of the Kylin cube via API?
2.4) access the Kylin model's measures via API?


Thank you very much


On Mon, Nov 13, 2023 at 3:53 PM Xiaoxiang Yu <x...@apache.org> wrote:

Hi,
Question 1:
You are almost right.
If the Cube is not ready, Kylin will use SparkSQL to execute the query directly 
on the original tables. 


Question 2:
It is possible but very hard.
The index data are saved in Parquet format, and it is possible to read them 
with Spark, but the column names are encoded, so you cannot tell which columns 
are useful to you. The mapping from the parquet files' columns to the model's 
dimensions and measures is stored in Kylin's metastore, so knowledge of the 
Kylin source code is required to make good use of the model/index files when 
reading them directly.


Suppose we have a Python library (like 
https://github.com/Kyligence/kylinpy/tree/master) that provides the ability 
to send SQL to Kylin. Would that be helpful to your data science team? 
The following is an example.




```
>>> import sqlalchemy as sa
>>> import pandas as pd
>>> kylin_engine = sa.create_engine('kylin://ADMIN:KYLIN@sandbox/learn_kylin?timeout=60&is_debug=1')
>>> sql = 'select * from kylin_sales limit 10'
>>> pd.read_sql(sql, kylin_engine)
```










--

Best wishes to you!
From: Xiaoxiang Yu




At 2023-11-13 16:02:20, "Nam Đỗ Duy via user" <user@kylin.apache.org> wrote:

Hi Xiaoxiang,


Basically, you can imagine a scenario in which three teams will be using 
Kylin's cubes: 


a) The data analyst (DA) team, which uses PowerBI (via ODBC or the mez 
connector) and Superset to access the Kylin cube.
b) The data science (DS) team, which uses PySpark and SparkML, currently 
accessing HDFS and parquet directly as raw files.
c) The AI team, which uses various interfaces (Java, Python, Scala) to access 
HDFS and parquet directly as raw files.


I have two questions:


1) For team a) DA: when using the ODBC or mez connector, if the cube is not 
ready then I guess PowerBI falls back to the Hive parquet files, doesn't it?

2) For the DS/AI teams: since they are already accessing raw HDFS/parquet, how 
can Hive/Kylin provide more benefits to these teams? I imagine OLAP speed, 
computed metrics, etc., but I am not sure, so please advise.


Thank you very much








On Mon, Nov 13, 2023 at 2:40 PM Xiaoxiang Yu <x...@apache.org> wrote:

Do you have any specific business scenario? It looks like there is 
no such real use case at the moment. 







--

Best wishes to you!
From: Xiaoxiang Yu




At 2023-11-13 11:36:35, "Nam Đỗ Duy via user" <user@kylin.apache.org> wrote:

Dear Sir/Madam


I am persuading my company to use Kylin as an OLAP platform, so please kindly 
share with me (inbox me if you hesitate to share publicly) your real use cases, 
to help me answer our boss's questions:


1. Which companies are using Kylin now?
2. How do you use Kylin's capabilities in your AI/ML projects?


Thank you very much for your valuable time and support
