1. Querying them instantly is not possible; you need to trigger a build job and wait for it to complete, which takes about 5-30 minutes in most cases. So the delay introduced by Kylin is 5-30 minutes.
2. The DS/AI teams can send SQL queries from Python and get the results (if kylinpy works well), just as you do in Kylin's Insight window.

--
Best wishes to you!
From: Xiaoxiang Yu

At 2023-11-13 17:09:59, "Nam Đỗ Duy via user" <user@kylin.apache.org> wrote:

Thank you Xiaoxiang for answering my previous question 1.

1. For previous question 1: if I can ingest data near real-time into a Hive table, can that near-real-time data be queried in the Kylin Insight window by SQL almost instantly? If not, then how can I reflect near-real-time data in the Kylin Insight window, as well as in a PowerBI report that connects to Kylin via the .mez connector?

2. For previous question 2: if the DS/AI teams cannot access Kylin's Parquet files via Java/Python/Scala, then can they:
2.1) access the Hive star-schema tables?
2.2) access a Kylin cube via API?
2.3) access the computed fields of a Kylin cube via API?
2.4) access a Kylin model's measures via API?

Thank you very much

On Mon, Nov 13, 2023 at 3:53 PM Xiaoxiang Yu <x...@apache.org> wrote:

Hi,

Question 1: You are almost right. If the cube is not ready, Kylin will use SparkSQL to execute the query directly on the original tables.

Question 2: It is possible but very hard. The index data is saved in Parquet format, and it is possible to read it with Spark, but the column names are encoded, so you won't understand which columns are useful to you. The mapping from the Parquet files' columns to a model's dimensions and measures is stored in Kylin's metastore, so knowledge of the Kylin source code is required to make good use of the model/index files when reading them directly.

If we had a Python library (like https://github.com/Kyligence/kylinpy/tree/master) which provides the ability to send SQL to Kylin, would it be helpful to your data science team? Following is an example.
```
>>> import sqlalchemy as sa
>>> import pandas as pd
>>> kylin_engine = sa.create_engine('kylin://ADMIN:KYLIN@sandbox/learn_kylin?timeout=60&is_debug=1')
>>> sql = 'select * from kylin_sales limit 10'
>>> pd.read_sql(sql, kylin_engine)
```

--
Best wishes to you!
From: Xiaoxiang Yu

At 2023-11-13 16:02:20, "Nam Đỗ Duy via user" <user@kylin.apache.org> wrote:

Hi Xiaoxiang,

Basically, you can imagine a scenario in which three teams will be using Kylin's cubes:

a) A data analyst (DA) team that uses PowerBI (via ODBC or the .mez connector) and Superset to access Kylin cubes.
b) A data science (DS) team that uses PySpark and SparkML, currently accessing HDFS and Parquet directly as raw files.
c) An AI team that uses various interfaces (Java, Python, Scala) to access HDFS and Parquet directly as raw files.

I have two questions:

1) For team a) DA: when using the ODBC or .mez connector, if the cube is not ready, then I guess PowerBI accesses the Hive Parquet files, doesn't it?

2) For the DS/AI teams: since they already access the raw HDFS/Parquet files, how can Hive/Kylin provide more benefit to these teams? For this question I imagine OLAP speed or computed metrics, etc., but I am not sure, so please advise.

Thank you very much

On Mon, Nov 13, 2023 at 2:40 PM Xiaoxiang Yu <x...@apache.org> wrote:

Do you have a specific business scenario? It looks like there is no such real use case at the moment.

--
Best wishes to you!
From: Xiaoxiang Yu

At 2023-11-13 11:36:35, "Nam Đỗ Duy via user" <user@kylin.apache.org> wrote:

Dear Sir/Madam,

I am persuading my company to use Kylin as an OLAP platform, so please kindly share with me (inbox me if you hesitate to share publicly) your real use cases to help me answer our boss's questions:

1. Which companies are using Kylin now?
2. How do you use Kylin's capabilities in your AI/ML projects?

Thank you very much for your valuable time and support
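[Editor's note] Regarding question 2.2 above (accessing a Kylin cube via API): besides the kylinpy/SQLAlchemy route shown earlier in the thread, Kylin also exposes a SQL query REST endpoint. The sketch below only builds the HTTP request without sending it; the host, project, and default ADMIN/KYLIN credentials are placeholder assumptions, and the `/kylin/api/query` path should be verified against your Kylin version's API documentation.

```python
# Hedged sketch: preparing a POST request for Kylin's SQL query REST API.
# Host, credentials, and project name are placeholders for illustration.
import base64
import json
import urllib.request

def build_kylin_query(sql, project, host="http://sandbox:7070",
                      user="ADMIN", password="KYLIN"):
    """Prepare (but do not send) a request to Kylin's query endpoint."""
    # Kylin's REST API uses HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    body = json.dumps({"sql": sql, "project": project}).encode()
    return urllib.request.Request(
        url=f"{host}/kylin/api/query",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )

req = build_kylin_query("select count(*) from kylin_sales", "learn_kylin")
print(req.full_url)      # http://sandbox:7070/kylin/api/query
print(req.get_method())  # POST
```

Sending the prepared request with `urllib.request.urlopen(req)` would return a JSON body containing the result rows, which a DS/AI team could load into pandas without touching the encoded Parquet files directly.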