Kylin Cube Performance

Jason Hale Mon, 01 Aug 2016 16:20:45 -0700

I'm setting up a test case on a portion of our dataset to evaluate Kylin,
and I'm not seeing the performance I would expect.


The cube build takes about 5-6 hours for ~69,000,000 records and 10
dimensions. I'm not sure whether that is the expected build time, but the
bigger problem is query performance after the cube is built.

All tests used very simple queries (e.g. SELECT SUM(clicks)
FROM reporting GROUP BY search_type).

Grouping by 1 or 2 dimensions gives very responsive queries (under 2
seconds), but adding more dimensions drastically increases the query time
(over 1 minute, after which the query times out in HBase). I would expect
all of these queries to have similar query times, since each should be
answered from its respective cuboid, so I'm not sure why performance
suffers. I didn't set up any special rules for the cube, but during the
build process it showed all the N-dimension cuboids and the log simply said
'skipped'.
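
To make the expectation above concrete, here is a rough sketch of the
arithmetic, assuming a full cube in which every combination of the 10
dimensions gets its own cuboid (no pruning rules). The dimension names other
than search_type are placeholders, and the helper functions are hypothetical
illustrations, not part of Kylin.

```python
from itertools import combinations

# Placeholder names: only search_type appears in the query above.
DIMENSIONS = ["search_type"] + [f"dim_{i}" for i in range(2, 11)]


def cuboid_count(n_dims: int) -> int:
    """Cuboids in a full cube: one per subset of the dimensions."""
    return 2 ** n_dims


def cuboids_at_level(dims, k):
    """All cuboids that contain exactly k of the given dimensions."""
    return list(combinations(dims, k))


def matching_cuboid(group_by, dims):
    """Cuboid that exactly covers a GROUP BY, in cube dimension order."""
    unknown = set(group_by) - set(dims)
    if unknown:
        raise ValueError(f"not cube dimensions: {sorted(unknown)}")
    return tuple(d for d in dims if d in group_by)


if __name__ == "__main__":
    print(cuboid_count(len(DIMENSIONS)))
    print(len(cuboids_at_level(DIMENSIONS, 2)))
    print(matching_cuboid(["dim_3", "search_type"], DIMENSIONS))
```

If every one of these cuboids were actually materialized, a GROUP BY on 3 or
4 dimensions should read a precomputed cuboid just as a GROUP BY on 1 or 2
does, which is why the slowdown surprises me.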

Is there something I'm missing in the configuration?

I have an HDP cluster with 3 nodes, plus 1 client node on which Kylin is
installed. Do I need to adjust the Hadoop configuration? I'm using mostly
default HDP settings.

What more information can I provide?
