Thank you, Xiaoxiang, for answering my previous questions.

1. For previous question 1: if I can ingest data near real-time into a Hive table, can that near-real-time data be queried almost instantly with SQL in Kylin's Insight window? If not, how can I reflect near-real-time data in the Kylin Insight window, as well as in a PowerBI report that connects to Kylin via the mez connector?
2. For previous question 2: if the DS/AI team cannot access Kylin's parquet files via Java/Python/Scala, can they:
2.1) access the Hive star-schema tables?
2.2) access the Kylin cube via API?
2.3) access the computed fields of a Kylin cube via API?
2.4) access a Kylin model's measures via API?

Thank you very much

On Mon, Nov 13, 2023 at 3:53 PM Xiaoxiang Yu <x...@apache.org> wrote:

> Hi,
> Question 1:
> You are almost right.
> If the Cube is not ready, Kylin will use SparkSQL to execute the query
> directly on the original tables.
>
> Question 2:
> It is possible, but very hard.
> The index data are saved in Parquet format, so it is possible to read them
> with Spark, but the column names are encoded, so you cannot tell which
> columns are useful to you. The mapping from the parquet files' columns to
> the model's dimensions and measures is stored in Kylin's metastore, so
> knowledge of the Kylin source code is required to make good use of the
> model/index files when reading them directly.
>
> If we have a Python library (like
> https://github.com/Kyligence/kylinpy/tree/master) which provides the
> ability to send SQL to Kylin, will it be helpful to your data science
> team? The following is an example:
>
> ```
> >>> import sqlalchemy as sa
> >>> import pandas as pd
> >>> kylin_engine = sa.create_engine('kylin://ADMIN:KYLIN@sandbox/learn_kylin?timeout=60&is_debug=1')
> >>> sql = 'select * from kylin_sales limit 10'
> >>> pd.read_sql(sql, kylin_engine)
> ```
>
> --
> *Best wishes to you!*
> *From: Xiaoxiang Yu*
>
> At 2023-11-13 16:02:20, "Nam Đỗ Duy via user" <user@kylin.apache.org>
> wrote:
>
> Hi Xiaoxiang,
>
> Basically, you can imagine a scenario in which three teams will be using
> Kylin's cubes:
>
> a) The data analyst (DA) team, who use PowerBI (via ODBC or the mez
> connector) and Superset to access Kylin cubes.
> b) The data science (DS) team, who use PySpark and SparkML, and currently
> access HDFS and parquet directly as raw files.
> c) The AI team, who use various interfaces (Java, Python, Scala) to
> access HDFS and parquet directly as raw files.
>
> I have two questions:
>
> 1) For team a) (DA): when using the ODBC or mez connector, if the cube is
> not ready, then I guess PowerBI is accessing the Hive parquet files,
> isn't it?
> 2) For the DS/AI teams: since they are already accessing the raw
> HDFS/parquet files, how can Hive/Kylin provide more merit to these teams?
> For this question, I imagine OLAP speed or computed metrics, etc., but I
> am not sure, so please advise.
>
> Thank you very much
>
> On Mon, Nov 13, 2023 at 2:40 PM Xiaoxiang Yu <x...@apache.org> wrote:
>
>> Do you have any specific business scenario? It looks like there is no
>> such real use case at the moment.
>>
>> --
>> *Best wishes to you!*
>> *From: Xiaoxiang Yu*
>>
>> At 2023-11-13 11:36:35, "Nam Đỗ Duy via user" <user@kylin.apache.org>
>> wrote:
>>
>> Dear Sir/Madam,
>>
>> I am persuading my company to use Kylin as our OLAP platform, so please
>> kindly share with me (inbox me if you hesitate to share publicly) your
>> real use cases, to help me answer our boss's questions:
>>
>> 1. Which companies are using Kylin now?
>> 2. How do you use Kylin's capabilities in your AI/ML projects?
>>
>> Thank you very much for your valuable time and support
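[Editor's note: to make question 2.2 above concrete, here is a minimal sketch (not from the thread) of how the DS/AI team could send SQL to Kylin over its REST query endpoint (`POST /kylin/api/query`) instead of reading the index parquet files directly. The host, credentials, and project name are the sandbox defaults used in the kylinpy example quoted above, and are assumptions — adjust them for your cluster.]

```python
# Sketch: query Kylin over its REST API instead of reading index parquet
# files. Host, user, password and project below are sandbox defaults and
# are assumptions -- replace them with your own deployment's values.
import base64
import json


def build_query_request(sql, project, host="http://sandbox:7070",
                        user="ADMIN", password="KYLIN", limit=500):
    """Build URL, headers and JSON body for Kylin's POST /kylin/api/query."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {
        "url": f"{host}/kylin/api/query",
        "headers": {
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"sql": sql, "project": project, "limit": limit}),
    }


# Example: ask for an aggregated measure (questions 2.3/2.4). Kylin answers
# from the cube when a matching index exists, and pushes the query down to
# SparkSQL on the source tables otherwise.
req = build_query_request(
    "select part_dt, sum(price) from kylin_sales group by part_dt",
    project="learn_kylin",
)
# To actually send it:
#   import requests
#   resp = requests.post(req["url"], headers=req["headers"], data=req["body"])
```

This keeps the DS/AI teams on the supported SQL surface of Kylin (with the pushdown fallback described in the reply above), rather than depending on the internal, encoded parquet layout.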