Re: Kylin real usecase on AI/ML (data science) project

Nam Đỗ Duy via user Mon, 13 Nov 2023 01:58:34 -0800

Thank you Xiaoxiang,

1. On my question about near-real-time data: this scenario is not about
querying the cube (index). I mean querying the Hive table only: is it
possible to query the non-cube data instantly if the data is already in
Hive?


Best regards

On Mon, Nov 13, 2023 at 4:23 PM Xiaoxiang Yu <x...@apache.org> wrote:

> 1. Querying them instantly is not possible: you need to trigger a build job
> and wait for it to complete, which takes about 5-30 minutes in most cases.
> So the delay caused by Kylin is 5-30 minutes.
>
> 2. DS/AI can send SQL queries using Python and get the results (if kylinpy
> works well), just as you do in the Kylin Insight window.
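
If kylinpy is not an option, the same SQL could in principle be sent over HTTP. The following is a minimal sketch, assuming the REST query endpoint described in the Kylin documentation (`POST /kylin/api/query` with basic authentication). The host `sandbox`, port `7070`, project `learn_kylin`, and the credentials are placeholders, and the helper names are hypothetical. The response layout differs between Kylin versions, so it is returned as parsed JSON without further interpretation.

```python
import base64
import json
import urllib.request


def build_query_request(base_url, project, sql, user, password, limit=100):
    """Build (but do not send) a POST request for Kylin's query endpoint."""
    payload = json.dumps({"sql": sql, "project": project, "limit": limit})
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/kylin/api/query",
        data=payload.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


def run_query(request, timeout=60):
    """Send the request and return the decoded JSON body (needs a live server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder connection details; replace them with your own deployment's values.
request = build_query_request(
    base_url="http://sandbox:7070",
    project="learn_kylin",
    sql="select * from kylin_sales limit 10",
    user="ADMIN",
    password="KYLIN",
)
```

Calling `run_query(request)` against a running Kylin instance would then return the query result as a Python dictionary.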
>
>
>
>
> --
> *Best wishes to you ! *
> *From ：**Xiaoxiang Yu*
>
>
> At 2023-11-13 17:09:59, "Nam Đỗ Duy via user" <user@kylin.apache.org>
> wrote:
>
> Thank you Xiaoxiang for answering my previous question
>
> 1. For previous question 1, if I can ingest data into a Hive table in near
> real time, can that data be queried with SQL in the Kylin Insight window
> almost instantly? If not, how can I reflect near-real-time data in the
> Kylin Insight window as well as in a PowerBI report that connects to Kylin
> via mez?
>
> 2. For previous question 2, if the DS/AI team cannot access Kylin's Parquet
> files via Java/Python/Scala, can they:
>
> 2.1) access the Hive star schema tables?
> 2.2) access the Kylin cube via API?
> 2.3) access computed fields of the Kylin cube via API?
> 2.4) access the Kylin model's measures via API?
>
> Thank you very much
>
> On Mon, Nov 13, 2023 at 3:53 PM Xiaoxiang Yu <x...@apache.org> wrote:
>
>> Hi,
>> Question 1:
>> You are almost right.
>> If the cube is not ready, Kylin will use SparkSQL to execute the query
>> directly on the original tables.
>>
>> Question 2:
>> It is possible but very hard.
>> The index data is saved in Parquet format, so it is possible to read it
>> with Spark, but the column names are encoded, so you cannot tell which
>> columns are useful to you. The mapping from the Parquet files' columns to
>> the model's dimensions or measures is stored in Kylin's metastore, so
>> knowledge of the Kylin source code is required to make good use of the
>> model/index files when reading them directly.
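
To illustrate why that mapping matters, here is a minimal sketch of the renaming step a reader of the raw index files would need. The encoded column ids and the readable names below are invented for illustration; the real mapping has to be recovered from Kylin's metastore, and how to do that is not shown here.

```python
from typing import Any, Dict, List

# Hypothetical mapping from encoded Parquet column names to model names.
# These ids and names are assumptions, not values read from a real metastore.
COLUMN_MAPPING: Dict[str, str] = {
    "0": "SELLER_ID",
    "1": "PART_DT",
    "100000": "SUM_PRICE",
}


def rename_columns(
    rows: List[Dict[str, Any]], mapping: Dict[str, str]
) -> List[Dict[str, Any]]:
    """Replace encoded keys with readable names; unknown keys are kept as-is."""
    return [{mapping.get(key, key): value for key, value in row.items()} for row in rows]


# One row as it might look after reading an index file with encoded columns.
encoded_rows = [{"0": 10000001, "1": "2012-01-01", "100000": 123.45, "7": "unmapped"}]
readable_rows = rename_columns(encoded_rows, COLUMN_MAPPING)
```

Columns missing from the mapping (such as `"7"` above) stay encoded, which is the situation described in the reply: without the metastore information they cannot be interpreted.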
>>
>> If we have a Python library (like
>> https://github.com/Kyligence/kylinpy/tree/master) that lets you send SQL
>> to Kylin, would it be helpful to your data science team?
>> The following is an example.
>>
>>
>> ```
>>  >>> import sqlalchemy as sa
>>  >>> import pandas as pd
>>  >>> kylin_engine = sa.create_engine('kylin://ADMIN:KYLIN@sandbox/learn_kylin?timeout=60&is_debug=1')
>>  >>> sql = 'select * from kylin_sales limit 10'
>>  >>> pd.read_sql(sql, kylin_engine)
>>
>> ```
>>
>>
>>
>>
>> --
>> *Best wishes to you ! *
>> *From ：**Xiaoxiang Yu*
>>
>>
>> At 2023-11-13 16:02:20, "Nam Đỗ Duy via user" <user@kylin.apache.org>
>> wrote:
>>
>> Hi Xiaoxiang,
>>
>> Basically, you can imagine a scenario in which 3 teams will be using
>> Kylin's cube:
>>
>> a) The data analyst team (DA), who use PowerBI (via ODBC or mez) and
>> Superset to access the Kylin cube.
>> b) The data science team (DS), who use PySpark and SparkML and currently
>> access HDFS and Parquet directly as raw files.
>> c) The AI team, who use various interfaces such as Java, Python and Scala
>> to access HDFS and Parquet directly as raw files.
>>
>> I have two questions:
>>
>> 1) For team a) DA: when using the ODBC or mez connector, if the cube is not
>> ready, then I guess PowerBI is accessing the Hive Parquet files, isn't it?
>> 2) For the DS/AI teams: since they are accessing the raw HDFS/Parquet
>> files, what additional benefits can Hive/Kylin offer them? I am thinking
>> of OLAP speed or computed metrics, but I am not sure, so please advise.
>>
>> Thank you very much
>>
>>
>>
>>
>> On Mon, Nov 13, 2023 at 2:40 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>
>>> Do you have a specific business scenario? It looks like there is
>>> no such real use case at the moment.
>>>
>>>
>>>
>>> --
>>> *Best wishes to you ! *
>>> *From ：**Xiaoxiang Yu*
>>>
>>>
>>> At 2023-11-13 11:36:35, "Nam Đỗ Duy via user" <user@kylin.apache.org>
>>> wrote:
>>>
>>> Dear Sir/Madam
>>>
>>> I am persuading my company to use Kylin as its OLAP platform, so please
>>> kindly share with me (message me privately if you hesitate to share
>>> publicly) your real use cases to help me answer our boss's questions:
>>>
>>> 1. Which companies are using Kylin now?
>>> 2. How do you use Kylin's capabilities in your AI/ML projects?
>>>
>>> Thank you very much for your valuable time and support
>>>
>>>
