Hey, sorry, your image is not showing. Not sure why.

On Wed, Apr 24, 2019 at 6:53 AM PengHui Li <codelipeng...@gmail.com> wrote:
> Sorry for taking so long to reply.
>
> I drew a simple picture; I hope it helps clarify the question.
> The main point is to avoid reading messages from unnecessary topics
> when reading data from a partitioned Hive table.
> [image: image.png]
>
> On Sat, Apr 20, 2019 at 12:16 AM, Slim Bouguerra <bs...@apache.org> wrote:
>
>> Hi, I am not sure I am getting the question 100%. Can you share a design doc
>> or outline the big picture you have in mind? FYI, I am not very familiar with
>> Pulsar, so please account for that :D
>> But let me point out that Hive does not have the notion of partitions for
>> tables backed by storage handlers. That is because, by definition, the table
>> is not stored by Hive, so Hive cannot control the layout.
>>
>> I will be happy to look at any POC.
>> Looking forward to hearing from you.
>>
>> On Wed, Apr 17, 2019 at 7:25 PM PengHui Li <codelipeng...@gmail.com>
>> wrote:
>>
>> > @Slim
>> >
>> > I want to use a different Pulsar topic to store the data for each Hive
>> > partition. Is there a way to do this, and does this idea make sense?
>> >
>> > Can you give me some advice?
>> >
>> >
>> > On Mon, Apr 15, 2019 at 6:22 PM, 李鹏辉gmail <codelipeng...@gmail.com> wrote:
>> >
>> > > I already have a simple implementation that can write and query data.
>> > > I read the design document and the implementation for Kafka.
>> > > Its handling of table partitions differs from what I have in mind.
>> > >
>> > > I want Hive table partition locations to work with Pulsar topics:
>> > > different table partitions correspond to different topics.
>> > > But I can't get the partition to which the data will be written.
>> > >
>> > > I know the drawback of doing this is that it loses the order of
>> > > the stream data itself,
>> > > but it can reduce unnecessary data reading when querying.
>> > >
>> > > Best Regards
>> > >
>> > > Penghui
>> > > Beijing, China
>> > >
>> > >
>> > > > On Apr 13, 2019, at 21:43, Jörn Franke <jornfra...@gmail.com> wrote:
>> > > >
>> > > > I think you need to develop a custom Hive SerDe + custom
>> > > > Hadoop InputFormat + custom HiveOutputFormat.
>> > > >
>> > > >> On Apr 12, 2019, at 17:35, 李鹏辉gmail <codelipeng...@gmail.com> wrote:
>> > > >>
>> > > >> Hi guys,
>> > > >>
>> > > >> I've been working on an integration of Hive and Pulsar recently, but I
>> > > >> have encountered some problems and hope to get help here.
>> > > >>
>> > > >> First of all, let me briefly describe the motivation.
>> > > >>
>> > > >> Pulsar can be used as an infinite stream that keeps both historic
>> > > >> and streaming data, so we want to use Pulsar as a storage extension
>> > > >> for Hive.
>> > > >> In this way, Hive can read the data in Pulsar naturally and can
>> > > >> also write data into Pulsar.
>> > > >> We will benefit from the same data serving both interactive
>> > > >> queries and streaming.
>> > > >>
>> > > >> As an improvement, supporting data partitioning can make queries
>> > > >> more efficient (e.g. partitioning by date or any other field).
>> > > >>
>> > > >> But:
>> > > >>
>> > > >> - How do I get the Hive table's partition definition?
>> > > >> - When a user inserts data into a Hive table, how do I determine
>> > > >>   which partition the data should be stored in?
>> > > >> - When a user selects data from a Hive table, how do I determine
>> > > >>   which partition the data is in?
>> > > >>
>> > > >> If Hive already exposes some mechanism to support this, please show
>> > > >> me how to use it.
>> > > >>
>> > > >> Best regards
>> > > >>
>> > > >> Penghui
>> > > >> Beijing, China
>> > > >>
>> > > >
>> > >
>> >
>>

--
B-Slim
_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______
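
The one-topic-per-partition idea discussed in this thread can be illustrated with a minimal sketch. It is not a Hive or Pulsar API: the function names, the `events` table, the `dt` partition column, and the topic naming scheme under `persistent://public/default/` are all assumptions made for illustration. It only shows how a filter on a partition column could narrow the set of topics to read. Where such pruning would hook into Hive is left open, since, as noted above, Hive has no notion of partitions for tables backed by storage handlers.

```python
from typing import Dict, List

# Hypothetical naming scheme (an assumption, not a Hive or Pulsar convention).
TOPIC_PREFIX = "persistent://public/default/"


def topic_for_partition(table: str, spec: Dict[str, str]) -> str:
    """Map one partition spec (column -> value) to a single topic name."""
    # Sort the columns so the same partition always yields the same topic.
    suffix = "-".join(f"{key}-{spec[key]}" for key in sorted(spec))
    return f"{TOPIC_PREFIX}{table}-{suffix}"


def prune_topics(
    table: str, partitions: List[Dict[str, str]], wanted: Dict[str, str]
) -> List[str]:
    """Return only the topics whose partition values match every filter."""
    return [
        topic_for_partition(table, spec)
        for spec in partitions
        if all(spec.get(col) == val for col, val in wanted.items())
    ]


# Three daily partitions of a hypothetical "events" table.
partitions = [{"dt": "2019-04-12"}, {"dt": "2019-04-13"}, {"dt": "2019-04-14"}]

# A query filtering on dt = '2019-04-13' needs to read just one topic.
topics = prune_topics("events", partitions, {"dt": "2019-04-13"})
print(topics)
```

With equality filters only, the reader touches one topic instead of three. As mentioned in the thread, the trade-off is that ordering across the whole stream is no longer preserved once the data is split over several topics.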