Hi Yuzhang,

Please open a JIRA for this enhancement. If it can be implemented in an elegant way, that would be great!
Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org

On Tuesday, April 23, 2019 at 8:56 AM, yuzhang <shifengdefan...@163.com> wrote:

> Hi Shaofeng,
>
> We also experimented with adding a measure after the cube had been built, and ran into the byte error at the very start. By default, the mapping between the HBase store and the measure definitions stores multiple measures in one column of a column family, so adding a measure and inserting it into the original measure sequence may cause the byte error. I think adding a separate column for the new measure may be better.
>
> I only have a preliminary idea about the design of measure management, and it may be impractical for now.
>
> Dimensions and metrics are defined when the model is designed. Measures aggregate the metrics over different dimensions to observe the data entities that the model represents. I think all of this belongs to the design of the "logical view". The cube is a materialized view of this logical model; it is the bridge between the logical view and the physical storage (and the highway is set up). The life cycle of a measure may therefore depend on the model rather than on the cube.
>
> Based on this design, measure management can be set up once the model design is complete. We can define measures on the model. Cubes under the model can reuse those measures and build their segment data. When a SQL query arrives, the Kylin query server needs to find a suitable model with suitable measures, and then find an available cube.
>
> Of course, such a design change would have a very large impact on the existing Kylin architecture, and both the query side and the metadata would change substantially. So it seems that it is still only on paper.
>
> A more realistic, transitional design is to enrich the metadata of the measure.
> Just as CubeDesc defines the schema and a corresponding CubeInstance manages the built segments, a MeasureDesc could also have a MeasureInstance to manage the segments that contain it.
>
> I observed that Kylin's query service generates a GridTable to map between the logical view and the HBase physical storage: Cuboid + Measure -> GridTable <- HBase store. This GridTable is generated from the CubeDesc, and the mapping is performed for each segment. Therefore, at the mapping stage, the metadata can tell which GridTable columns cannot be obtained from the current segment, so the measure data can be read selectively at the RegionServer (RS) backend.
>
> But its life cycle is the same as that of MeasureDesc, which is managed by CubeDesc.
>
> Regarding adding dimensions to the same cube, we also need to consider aggregation groups and Rowkey order. I am curious about how you implemented it.
>
> Best regards,
> yuzhang
> shifengdefan...@163.com
>
> On 4/22/2019 09:05, ShaoFeng Shi <shaofeng...@apache.org> wrote:
>
> Hi Yuzhang,
>
> Glad to see such a discussion. Supporting "schema change" in a friendly way is what we should do in the next phase, as we see that this requirement is stronger than before.
>
> Last week I also tried 1) adding a dimension after the cube had been built, and 2) adding a measure after the cube had been built.
>
> For 1) I have an idea, and the first try was successful; I want to discuss it with the community some day.
>
> 2) failed: after the new measure was added, the query failed, and there was a byte parsing error on the HBase RS side. I did not continue after that.
>
> Could you elaborate on your idea that "the measures of the analysis system can be decoupled from the materialized view(cube) and have their own management system"? Do you have a rough design for it? Thank you!
>
> Best regards,
> Shaofeng Shi 史少锋
>
> On Sunday, April 21, 2019 at 8:08 PM, yuzhang <shifengdefan...@163.com> wrote:
>
> Hi JiaTao,
>
> Maybe an optional auto-complete mechanism among the different views of a measure is necessary, isn't it?
>
> yuzhang
> shifengdefan...@163.com
>
> On 4/20/2019 11:38, JiaTao Tao <taojia...@gmail.com> wrote:
>
> Hi,
>
> The idea of supporting dynamically added measures in Kylin is impressive.
>
> But in my opinion, once you add a measure, the existing segments should also calculate the new measure (just add a new measure column). Users can have many cubes, and a cube can have many segments; if the measure's view differs from segment to segment, it will increase the burden on the user.
>
> --
> Regards!
> Aron Tao
>
> On Saturday, April 20, 2019 at 1:43 AM, yuzhang <shifengdefan...@163.com> wrote:
>
> Hi dear Kylin users and development team,
>
> There are some things I want to discuss with the community.
>
> As a representative MOLAP engine, Kylin uses pre-aggregation to provide high-concurrency analysis with second-level response times, but it also loses some flexibility.
>
> The limitation that existing segments must be purged before an additional measure can be added causes a lot of duplicate computation and unnecessary disk I/O. Such waste should be avoided, especially in a MOLAP engine.
>
> For example, there is a cube, cubeA, with one measure m1 and segments over time range 1 (tr1). Now the user adds a measure m2 but does not want to clear the segments over tr1.
> The value of m2 will exist in tr2, that is, in the segments built subsequently. Of course, tr1 does not contain values of m2, which is understandable even to a user who knows little about MOLAP. Querying over tr1 and tr2 is valid for both m1 and m2, but the result of m2 over tr1 will be null. It would be better to remind the user that the measure is missing. Moreover, refreshing will supply m2 to the segments over tr1.
>
> Currently, Kylin's storage engine is HBase. Measures are values aggregated over combinations of dimension members and are stored in a column of a column family in HBase. For the same cube, adding a new measure will add a column to the HBase table (mapping) and will take effect in the next build. For the existing HTables (segments), the new column is allowed to be missing. Refreshing an old segment will add a new column to its HTable to store the new measure. The value of the new measure is aggregated according to the combination of dimension members in the rowkey, without recalculating the existing measures.
>
> For additional measures, and even additional dimensions, Kylin's current solution is Hybrid, but we found the following shortcomings when using it:
>
> 1. Management costs: similar cubes must be maintained repeatedly, and most of them share many dimensions and indicators. If you want to perform optimizations such as pruning, you need to configure all of these cubes.
> 2. A large number of cubes: in the early stage the business analysis is not stable, and analysts often need to add measures. Cubes are added continuously to the Hybrid group, which produces a lot of cubes.
> 3. Repeated calculation: if you want to drop an old cube in the Hybrid group, you need to build the latest cube over the historical data to cover the old cube.
>
> All of this results in a lot of waste.
>
> In addition, while using Kylin I felt that the metadata about measures was incomplete.
>
> 1. Measures are one of the most important concerns of analysts. If the measures of the analysis system can be decoupled from the materialized view (cube) and have their own management system, it may be more flexible.
> 2. Once the dimensions have been chosen during cube design, the cuboids are determined regardless of the number of measures. Maintaining cubes that have different measures but the same cuboids can be confusing. Cubes with different cuboids should be considered different cubes, which is the definition of a cube, isn't it?
>
> These are just some thoughts about MOLAP from my use of Kylin. What do you think? I sincerely look forward to your reply.
>
> There may be some mistakes or misunderstandings here; please feel free to correct me or discuss further if you find any.
>
> Best regards,
> yuzhang
> shifengdefan...@163.com
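
The cubeA example from the original proposal (m1 present in tr1 and tr2, m2 present only in tr2) can be sketched roughly as below. This is only an illustration and not Kylin code: `Segment` and `query_measure` are hypothetical names, and a plain sum is assumed as the aggregation. It shows the proposed behaviour of returning null when no queried segment stores the measure, and of reporting which segments lack it so the user can be reminded.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for a built segment; not a Kylin class."""
    name: str
    measures: dict  # measure name -> pre-aggregated value stored in this segment


def query_measure(segments, measure):
    """Sum `measure` over the segments that actually store it.

    Returns (total, missing). `total` is None when no queried segment
    stores the measure; `missing` lists the segments without that column.
    """
    total = None
    missing = []
    for seg in segments:
        if measure in seg.measures:
            total = seg.measures[measure] + (total or 0)
        else:
            missing.append(seg.name)
    return total, missing


# tr1 was built before m2 was added; tr2 was built afterwards.
tr1 = Segment("tr1", {"m1": 10})
tr2 = Segment("tr2", {"m1": 5, "m2": 7})

print(query_measure([tr1, tr2], "m1"))  # (15, [])
print(query_measure([tr1, tr2], "m2"))  # (7, ['tr1'])
print(query_measure([tr1], "m2"))       # (None, ['tr1'])
```

In this sketch, refreshing tr1 would correspond to rebuilding its `measures` dict with m2 included, after which the `missing` list becomes empty.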