building a cube for demographic data queries

Will Glass-Husain Mon, 10 Oct 2022 23:03:36 -0700

Hi,

Thanks for the recent help as I set up my first Kylin system. I have a
question regarding the proper design of a cube to run some demographic
queries. I want to make this accessible in a webapp, with reasonable
response time.


I have a CSV file with about 80 columns on sex, race, state, age, internet
access, job, etc.

Can you advise regarding proper cube design?

1) The criteria for filtering (e.g. selecting sex='male') and grouping
(e.g. group by state) should be dimensions - is this correct?

2) Items that I would like to sum should be measures, is that right? Is
there a limit to the number of measures? I want to report up to 300
different measures aggregated by the dimensions.

3) In MySQL, I am querying for different values like this:

select SUM((married=1) * weight) as MARRIED_1,
       SUM((married=2) * weight) as MARRIED_2
from data group by state;

This returns, for each state, the weighted total (the sum of weight) for
records where married is 1 and, separately, for records where married is 2.

Question: is there a way to do this in a Kylin query? Or do I need to
pre-compute my weights and create columns MARRIED_1 and MARRIED_2 in the
source data, then sum them in Kylin?
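
To make the target concrete, here is a minimal sketch of the same per-state
conditional sum written with a standard CASE WHEN inside SUM instead of
MySQL's boolean multiplication. It runs against an in-memory SQLite table,
not Kylin; the sample rows and the weighted_counts helper are made up for
illustration, and whether Kylin accepts this form is an assumption to be
confirmed, not something verified here.

```python
import sqlite3

# Hypothetical sample rows: (state, married, weight). Values are made up.
ROWS = [
    ("CA", 1, 2.0),
    ("CA", 2, 1.5),
    ("CA", 1, 0.5),
    ("NY", 2, 3.0),
    ("NY", 1, 1.0),
]

# Portable equivalent of SUM((married=1) * weight): CASE WHEN inside SUM.
QUERY = """
SELECT state,
       SUM(CASE WHEN married = 1 THEN weight ELSE 0 END) AS MARRIED_1,
       SUM(CASE WHEN married = 2 THEN weight ELSE 0 END) AS MARRIED_2
FROM data
GROUP BY state
ORDER BY state
"""


def weighted_counts(rows):
    """Load rows into an in-memory table and return one tuple per state."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (state TEXT, married INTEGER, weight REAL)")
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in weighted_counts(ROWS):
        print(row)
```

The alternative mentioned above would move the CASE WHEN into the data
preparation step, so that MARRIED_1 and MARRIED_2 exist as plain columns and
the cube only needs a SUM measure on each.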

4) This is a tricky one. Does Kylin support MEDIAN? MySQL has no MEDIAN
function, but we can calculate it by counting all the records, sorting
them, and selecting the record at an offset of half the count. I want to
calculate the median (not the mean) for age and some other variables.
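
For reference, the count-then-offset approach described above can be
sketched as follows. This is an illustration only, run against an in-memory
SQLite table rather than MySQL or Kylin; the median_by_offset helper and the
single-column table are hypothetical, and the sketch computes an unweighted
median (it ignores the weight column).

```python
import sqlite3


def median_by_offset(values):
    """Unweighted median: count the rows, then read the middle of the sort."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (age REAL)")
    conn.executemany("INSERT INTO data VALUES (?)", [(v,) for v in values])

    # Step 1: count all the records.
    (n,) = conn.execute("SELECT COUNT(*) FROM data").fetchone()
    if n == 0:
        conn.close()
        return None

    # Step 2: pick the middle row (odd count) or the two middle rows
    # (even count) from the sorted data using LIMIT/OFFSET.
    offset = (n - 1) // 2
    limit = 1 if n % 2 == 1 else 2
    rows = conn.execute(
        "SELECT age FROM data ORDER BY age LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    conn.close()

    # Average the selected rows; with one row this is just that row's value.
    return sum(r[0] for r in rows) / len(rows)


if __name__ == "__main__":
    print(median_by_offset([30, 22, 41, 35, 28]))
```

This needs two passes over the data (a count, then a sorted read), which is
part of why it is unclear how it would map onto a pre-aggregated cube.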

Thanks for any tips.

Best, WILL

-- 
William Glass-Husain   /forio  |  +1 (415) 440 7500 x802  |  forio.com
<http://www.forio.com/>
