Re: [DISCUSS] Separating out the metastore as its own TLP

Edward Capriolo Sun, 02 Jul 2017 15:57:06 -0700

"I do not know how this works for TLP proposals, but I also do not think
the TLP process will "open" anything new up for you. I.e., I do not think the
proposal will grant anyone a free ride seat on the committer/PMC list (I
surely would not support that)"


I was unclear. I did not mean "you" or "anyone" as a statement about a
particular person in this thread. I meant that forming a TLP should not
directly add anyone to the committer/PMC list who is not currently on the
Hive PMC/committer list.

On Sun, Jul 2, 2017 at 6:50 PM, Edward Capriolo <edlinuxg...@gmail.com>
wrote:

>
>
> On Fri, Jun 30, 2017 at 2:49 PM, Julian Hyde <jh...@apache.org> wrote:
>
>> +1
>>
>> As a Calcite PMC member, I am very pleased to see this change. Calcite
>> reads metadata from a variety of sources (including JDBC databases, NoSQL
>> databases such as Cassandra and Druid, and streaming systems), and if more
>> of those sources choose to store their metadata in the metastore it will
>> make our lives easier.
>>
>> Hive’s metastore has established a position as the place to go for
>> metadata in the Hadoop ecosystem. Not all metadata is relational, or
>> processed by Hive, so there are other parties using the metastore who
>> justifiably would like to influence its direction. Opening up the metastore
>> will help retain and extend this position.
>>
>> Julian
>>
>>
>> On 2017-06-30 10:00 (-0700), "Dimitris ts...@apache.org> wrote:
>> >
>> >
>> > On 2017-06-30 07:56 (-0700), Alan Gates <al...@gmail.com> wrote:
>> > > A few of us have been talking and have come to the conclusion that it
>> > > would be a good thing to split out the Hive metastore into its own
>> > > Apache project.  Below and in the linked wiki page we explain what we
>> > > see as the advantages to this and how we would go about it.
>> > >
>> > > Hive’s metastore has long been used by other projects in the Hadoop
>> > > ecosystem to store and access metadata.  Apache Impala, Apache Spark,
>> > > Apache Drill, Presto, and other systems all use Hive’s metastore.
>> > > Some, like Impala and Presto, can use it as their own metadata system
>> > > with the rest of Hive not present.
>> > >
>> > > This sharing is excellent for the ecosystem.  Together with HDFS it
>> > > allows users to use the tool of their choice while still accessing the
>> > > same shared data.  But having this shared metadata inside the Hive
>> > > project limits the ability of other projects to contribute to the
>> > > metastore.  It also makes it harder for new systems that have similar
>> > > but not identical metadata requirements (for example, stream
>> > > processing systems on top of Apache Kafka) to use Hive’s metastore.
>> > > This difficulty for other systems comes out in two ways.  First, it is
>> > > hard for non-Hive community members to participate in the project.
>> > > Second, it adds operational cost since users are forced to deploy all
>> > > of the Hive jars just to get the metastore to work.
>> > >
>> > > Therefore we propose to split Hive’s metastore out into a separate
>> > > Apache project.  This new project will continue to support the same
>> > > Thrift API as the current metastore.  It will continue to focus on
>> > > being a high performance, fault tolerant, large scale, operational
>> > > metastore for SQL engines and other systems that want to store schema
>> > > information about their data.
>> > >
>> > > By making it a separate project we will enable other projects to join
>> > > us in innovating on the metastore.  It will simplify operations for
>> > > non-Hive users that want to use the metastore as they will no longer
>> > > need to install Hive just to get the metastore.  And it will attract
>> > > new projects that might otherwise feel the need to solve their
>> > > metadata problems on their own.
>> > >
>> > > Any Hive PMC member or committer will be welcome to join the new
>> > > project at the same level.  We propose this project go straight to a
>> > > top level project.  Given that the initial PMC will be formed from
>> > > experienced Hive PMC members we do not believe incubation will be
>> > > necessary.  (Note that the Apache board will need to approve this.)
>> > >
>> > > Obviously there are many details involved in a proposal like this.
>> > > Rather than make this a ten page email we have filled out many of the
>> > > details in a wiki page:
>> > > https://cwiki.apache.org/confluence/display/Hive/Metastore+TLP+Proposal
>> > >
>> > > Yongzhi Chen
>> > > Vihang Karajgaonkar
>> > > Sergio Pena
>> > > Sahil Takiar
>> > > Aihua Xu
>> > > Gunther Hagleitner
>> > > Thejas Nair
>> > > Alan Gates
>> > >
>> >
>> > +1 (from Apache Impala's (incubating) perspective)
>> >
>> > Dimitris
>> >
>
>
>
> "Hive’s metastore has established a position as the place to go for
> metadata in the Hadoop ecosystem. Not all metadata is relational, or
> processed by Hive, so there are other parties using the metastore who
> justifiably would like to influence its direction. Opening up the metastore
> will help retain and extend this position."
>
> The metastore is open and parties can influence its direction. Meritocracy
> is earned.
>
> For example: I have seen several parties state that they wish the Hive
> metastore were packaged so that it was easier to embed/include. However, no
> one has opened a ticket and started, completed, or seriously scoped out that
> work. I do not see how moving to a TLP and giving the code a new name will
> drive people to take that next step.
>
> I do not know how this works for TLP proposals, but I also do not think
> the TLP process will "open" anything new up for you. I.e., I do not think the
> proposal will grant anyone a free ride seat on the committer/PMC list (I
> surely would not support that).
>
>
>
