Hi Yuzhang,

Glad to see this discussion. Supporting "schema change" in a friendly way
is something we should work on in the next phase, as this requirement is
stronger than before.

Last week I also experimented with 1) adding a dimension after the cube has
been built, and 2) adding a measure after the cube has been built.

For 1), I have an idea and the first attempt was successful; I would like
to discuss it with the community some day.

2) failed: after a new measure was added, queries failed with a byte
parsing error on the HBase RS side, so I did not continue with it.

Could you elaborate on your idea that "the measures of the analysis system
can be decoupled from the materialized view(cube) and have their own
management system"? Do you have a rough design for it? Thank you!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: [email protected]

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: [email protected]
Join Kylin dev mail group: [email protected]




On Sun, Apr 21, 2019 at 8:08 PM, yuzhang <[email protected]> wrote:

> Hi JiaTao:
>     Maybe an optional auto-complete mechanism among the different
> measure views is necessary, isn't it?
>
>
> yuzhang
>
> On 4/20/2019 11:38, JiaTao Tao <[email protected]> wrote:
> Hi
>
> The idea of letting Kylin add measures dynamically is impressive.
>
> But in my opinion, once you add a measure, the existing segments should
> also calculate the new measure (just add a new measure column). Users can
> have many cubes, and a cube can have many segments; if the measure view
> differs from segment to segment, it will increase the burden on the user.
>
> --
>
>
> Regards!
>
> Aron Tao
>
> On Sat, Apr 20, 2019 at 1:43 AM, yuzhang <[email protected]> wrote:
>
> Hi dear Kylin users and development team,
> There are a few things I would like to discuss with the community.
> As a representative MOLAP engine, Kylin uses pre-aggregation strategies
> to provide high-concurrency analysis with second-level response times,
> but it also loses some flexibility.
> The limitation that existing segments must be purged before an additional
> measure can be added causes a lot of repeated calculation and unnecessary
> disk I/O. Such waste should be avoided, especially in a MOLAP engine.
> For example, there is a cube, cubeA, with one measure m1 and segments
> over time range 1 (tr1). Now the user adds a measure m2 but does not want
> to clear the segments over tr1. The values of m2 will exist in tr2, the
> segments built subsequently. Of course, tr1 does not contain values of m2,
> which will be understood by users who know a little about MOLAP. Querying
> over tr1 and tr2 is valid for both m1 and m2, but the result of m2 over
> tr1 will be null. It would be better to remind the user that the measure
> is missing. Moreover, refreshing will supply m2 to the segments over tr1.
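The cubeA example above can be sketched as a toy model. This is only an illustration, not Kylin code: the segment dictionaries, the `query_sum` helper, and the numeric values are all assumptions made up for the sketch. It shows a query over tr1 and tr2 returning null for a measure that no queried segment contains, together with the list of segments where the measure is missing.

```python
from typing import Dict, List, Optional, Tuple

# Toy model (hypothetical, not a Kylin API): each segment stores
# pre-aggregated SUM values only for the measures that existed when
# the segment was built.
segments: List[Dict] = [
    {"name": "tr1", "measures": {"m1": 100}},           # built before m2 existed
    {"name": "tr2", "measures": {"m1": 40, "m2": 7}},   # built after m2 was added
]


def query_sum(segs: List[Dict], measure: str) -> Tuple[Optional[int], List[str]]:
    """Sum `measure` over the given segments.

    Returns (total, missing): total is None (null) when no segment holds
    the measure; missing lists the segments that lack it, so the caller
    can remind the user.
    """
    total: Optional[int] = None
    missing: List[str] = []
    for seg in segs:
        values = seg["measures"]
        if measure not in values:
            missing.append(seg["name"])
            continue
        total = values[measure] if total is None else total + values[measure]
    return total, missing


if __name__ == "__main__":
    print(query_sum(segments, "m1"))       # m1 is present in every segment
    print(query_sum(segments, "m2"))       # m2 is missing in tr1
    print(query_sum(segments[:1], "m2"))   # only tr1: result is null
```

Returning the missing segment names alongside the result is one possible way to implement the reminder; a real engine might instead attach a warning to the query response.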
> Currently, Kylin's storage engine uses HBase. Measures are aggregated
> values based on combinations of dimension members and are stored in a
> column of a column family in HBase. For the same cube, adding a new
> measure will add a column to the HBase table (mapping) and will take
> effect in the next build. For the existing HTables (segments), the new
> column is allowed to be missing. Refreshing old segments will add a new
> column to their HTables to store the new measure. The value of the new
> measure is aggregated according to the combination of dimension members
> in the rowkey, without recalculating the existing measures.
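The refresh step described above can be sketched as follows. This is a simplified model under stated assumptions, not Kylin's or HBase's actual API: the HTable is modeled as a plain dictionary keyed by rowkey, only a single cuboid is considered, the new measure is assumed to be a SUM, and `refresh_segment` and the column names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

RowKey = Tuple  # combination of dimension members


def refresh_segment(
    htable: Dict[RowKey, Dict[str, int]],
    source_rows: List[Dict],
    dims: List[str],
    new_measure: str,
    source_column: str,
) -> Dict[RowKey, Dict[str, int]]:
    """Add `new_measure` (a SUM over `source_column`) to an existing segment.

    Existing measure cells are left untouched; only the new column is
    computed, grouped by the dimension combination in the rowkey.
    """
    sums: Dict[RowKey, int] = defaultdict(int)
    for row in source_rows:
        rowkey = tuple(row[d] for d in dims)
        sums[rowkey] += row[source_column]
    for rowkey, cells in htable.items():
        cells[new_measure] = sums.get(rowkey, 0)  # write only the new column
    return htable


# Old segment: built when only m1 (sum of price) existed.
old_segment = {("A", 1): {"m1": 15}, ("B", 1): {"m1": 7}}
source = [
    {"city": "A", "day": 1, "price": 10, "qty": 2},
    {"city": "A", "day": 1, "price": 5, "qty": 1},
    {"city": "B", "day": 1, "price": 7, "qty": 4},
]

if __name__ == "__main__":
    print(refresh_segment(old_segment, source, ["city", "day"], "m2", "qty"))
```

The source data still has to be scanned once to aggregate the new measure, but the stored m1 values are reused as they are.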
> Now, for additional measures and even additional dimensions, Kylin's
> current solution is Hybrid, but we found the following shortcomings in
> use:
> 1. Management costs: similar cubes must be maintained repeatedly, and
> most of them share many dimensions and measures. If you want to perform
> optimization operations such as pruning, you need to configure all of
> these cubes.
> 2. A large number of cubes: in the early stage of analysis the business
> is not stable, and analysts often need to add measures. Cubes are added
> continuously to the Hybrid group, which produces a lot of cubes.
> 3. Repeated calculation: if you want to drop the old cube in the Hybrid
> group, you need to build the latest cube by computing over the historical
> data to cover the old cube.
> All of this results in a lot of waste.
> In addition, while using Kylin I felt that the metadata about measures
> is not ideal.
> 1. Measures are one of analysts' most important concerns; if the measures
> of the analysis system can be decoupled from the materialized view(cube)
> and have their own management system, it may be more flexible.
> 2. Once the dimensions have been chosen in cube design, the cube's
> cuboids are determined regardless of the number of measures. It may be
> confusing to maintain cubes with different measures but the same cuboids.
> Cubes with different cuboids should be considered different cubes, which
> is the definition of a cube, isn't it?
> These are just some thoughts about MOLAP from my use of Kylin. What do
> you think about this? I sincerely look forward to your reply.
> There may be some mistakes or misunderstandings here; please feel free to
> correct me or discuss further if you find any.
> Best regards
> yuzhang
>
