Re: Hive Pulsar Integration

Slim Bouguerra Thu, 25 Apr 2019 09:14:05 -0700

Hey, sorry, your image is not showing. Not sure why.

On Wed, Apr 24, 2019 at 6:53 AM PengHui Li <codelipeng...@gmail.com> wrote:


> Sorry for taking so long to reply.
>
> I drew a simple picture; I hope it helps clarify the question.
> The main point is to avoid reading messages from unnecessary topics
> when reading data from a partitioned Hive table.
> [image: image.png]
>
> On Sat, Apr 20, 2019 at 12:16 AM Slim Bouguerra <bs...@apache.org> wrote:
>
>> Hi, I am not sure I am getting the question 100%. Can you share a design doc
>> or outline the big picture you have in mind? FYI, I am not very familiar with
>> Pulsar, so please account for that :D
>> But let me point out that Hive does not have the notion of partitions for
>> tables backed by storage handlers. That is because, by definition, the table
>> is not stored by Hive, so Hive cannot control the layout.
>>
>> I will be happy to look at any POC.
>> Looking forward to hearing from you.
>>
>> On Wed, Apr 17, 2019 at 7:25 PM PengHui Li <codelipeng...@gmail.com>
>> wrote:
>>
>> > @Slim
>> >
>> > I want to use a different Pulsar topic to store the data for each Hive
>> > partition. Is there a way to do this, and does this idea make sense?
>> >
>> > Can you give me some advice?
>> >
>> >
>> > On Mon, Apr 15, 2019 at 6:22 PM 李鹏辉gmail <codelipeng...@gmail.com> wrote:
>> >
>> > > I already have a simple implementation that can write and query data.
>> > > I read the design document and implementation of the Kafka integration.
>> > > Its handling of table partitions differs from what I have in mind.
>> > >
>> > > I want Hive table partition locations to work with Pulsar topics:
>> > > different table partitions correspond to different topics.
>> > > But I can't get the partition where the data will be written.
>> > >
>> > > I know the drawback of doing this is that it loses the ordering of
>> > > the stream data itself,
>> > > but it can reduce unnecessary data reads when querying.
>> > >
>> > > Best Regards
>> > >
>> > > Penghui
>> > > Beijing, China
>> > >
>> > >
>> > >
>> > > > On Apr 13, 2019, at 21:43, Jörn Franke <jornfra...@gmail.com> wrote:
>> > > >
>> > > > I think you need to develop a custom Hive SerDe + custom
>> > > > Hadoop InputFormat + custom HiveOutputFormat
>> > > >
>> > > >> On Apr 12, 2019, at 17:35, 李鹏辉gmail <codelipeng...@gmail.com> wrote:
>> > > >>
>> > > >> Hi guys,
>> > > >>
>> > > >> I've recently been working on integrating Hive and Pulsar, but I have
>> > > >> run into some problems and hope to get help here.
>> > > >>
>> > > >> First of all, let me briefly describe the motivation.
>> > > >>
>> > > >> Pulsar can be used as an infinite stream that keeps both historical
>> > > >> and streaming data, so we want to use Pulsar as a storage extension
>> > > >> for Hive.
>> > > >> This way, Hive can read the data in Pulsar naturally and can also
>> > > >> write data into Pulsar.
>> > > >> We will benefit from having the same data serve both interactive
>> > > >> queries and streaming.
>> > > >>
>> > > >> As an improvement, supporting data partitioning can make queries
>> > > >> more efficient (e.g. partitioning by date or any other field).
>> > > >>
>> > > >> But:
>> > > >>
>> > > >> - How do I get the Hive table's partition definition?
>> > > >> - When a user inserts data into a Hive table, how do I get the
>> > > >>   partition in which the data should be stored?
>> > > >> - When a user selects data from a Hive table, how do I determine
>> > > >>   which partition the data is in?
>> > > >>
>> > > >> If Hive already exposes some mechanism to support this, please show me
>> > > >> how to use it.
>> > > >>
>> > > >> Best regards
>> > > >>
>> > > >> Penghui
>> > > >> Beijing, China
> --

B-Slim
_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______
