I'm setting up a test case on a portion of our dataset to evaluate Kylin, and I'm not seeing the performance I would expect.
The cube build takes about 5-6 hours with ~69,000,000 records and 10 dimensions. I'm not sure whether that is the expected build time, but the bigger problem is query performance after the cube is built.

All tests used a very simple query (e.g. SELECT SUM(clicks) FROM reporting GROUP BY search_type). Grouping by 1 or 2 dimensions gives very responsive queries (under 2 seconds), but adding more dimensions drastically increases the query time (over 1 minute, at which point it times out through HBase). I would expect all of these queries to take a similar amount of time, since each should be answered from its respective cuboid, so I'm not sure why the performance suffers.

I didn't set up any special rules for the cube, but during the build the process showed all the N-dimension cuboids and the log simply said 'skipped'. Is there something I'm missing in the configuration?

I have an HDP cluster with 3 nodes plus 1 client node on which Kylin is installed. Do I need to adjust the Hadoop configuration? I'm using mostly the default HDP settings. What other information can I provide?
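
To make my expectation concrete, here is a rough sketch of how many cuboids I would expect a full build to produce for 10 dimensions. It assumes a full cube where nothing prunes the combinations (no aggregation groups, mandatory, hierarchy, or joint dimensions, none of which I configured); the function name is only for illustration and is not part of Kylin.

```python
from math import comb


def cuboid_counts(num_dimensions):
    """Map k -> number of cuboids with exactly k dimensions in a full cube.

    Assumption: every combination of dimensions is materialized, so the
    count for k dimensions is simply "num_dimensions choose k".
    """
    return {k: comb(num_dimensions, k) for k in range(num_dimensions + 1)}


counts = cuboid_counts(10)
total = sum(counts.values())  # 2 ** 10 combinations in total

for k, n in counts.items():
    print(f"{k}-dimension cuboids: {n}")
print(f"total cuboids: {total}")
```

If all of these were actually materialized, I would expect a GROUP BY on any 3 dimensions to be served from one of the 120 three-dimension cuboids rather than aggregated at query time from a larger one, which is why the 'skipped' steps in the build log worry me.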
