Re: Question on accessing LLAP as data cache from external containers

2018-01-29 Thread Jörn Franke
Are you looking for something like this: https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html To answer your original question: why not implement the whole job in Hive? Or orchestrate with Oozie, running some parts in MR and some in Hive. > On 30. Jan 2018, at
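The HDFS centralized cache management that Jörn points to is driven by the `hdfs cacheadmin` CLI. As a hedged illustration (it assumes a running HDFS cluster; the pool and path names below are made up), pinning a dataset into the DataNodes' off-heap cache looks like:

```shell
# Assumes a running HDFS cluster with caching enabled; pool/path names are
# illustrative only. Create a cache pool, then pin a directory into it:
hdfs cacheadmin -addPool terasort-pool
hdfs cacheadmin -addDirective -path /data/terasort-input -pool terasort-pool

# Verify which cache directives are active:
hdfs cacheadmin -listDirectives
```

Unlike LLAP's cache, this caches raw HDFS blocks, so any YARN container reading the path benefits without a new protocol.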

Question on accessing LLAP as data cache from external containers

2018-01-29 Thread Sungwoo Park
Hello all, I wonder if an external YARN container can send requests to LLAP daemon to read data from its in-memory cache. For example, YARN containers owned by a typical MapReduce job (e.g., TeraSort) could fetch data directly from LLAP instead of contacting HDFS. In this scenario, LLAP daemon ju
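The read path this question envisions, an external container consulting the LLAP daemon's in-memory cache and falling back to HDFS on a miss, can be sketched abstractly. Everything below is hypothetical: LLAP does not expose such an API to arbitrary containers, and `CacheDaemon` and `read_from_hdfs` are stand-ins, not real LLAP or HDFS calls.

```python
# Hypothetical sketch of the proposed read path: an external container asks a
# cache daemon for a block and falls back to HDFS on a miss. CacheDaemon and
# read_from_hdfs are illustrative stand-ins, not real LLAP/HDFS APIs.

class CacheDaemon:
    """Toy stand-in for an LLAP daemon's in-memory block cache."""
    def __init__(self):
        self._blocks = {}

    def put(self, block_id, data):
        self._blocks[block_id] = data

    def get(self, block_id):
        return self._blocks.get(block_id)  # None signals a cache miss

def read_from_hdfs(block_id):
    # Stand-in for a real HDFS read; here it just fabricates bytes.
    return b"hdfs-bytes-for-" + block_id.encode()

def read_block(daemon, block_id):
    """Try the daemon's cache first, fall back to HDFS on a miss."""
    data = daemon.get(block_id)
    if data is not None:
        return data, "cache"
    data = read_from_hdfs(block_id)
    daemon.put(block_id, data)  # warm the cache for the next reader
    return data, "hdfs"

daemon = CacheDaemon()
daemon.put("blk-1", b"cached-bytes")
print(read_block(daemon, "blk-1"))  # hit: served from the daemon's cache
print(read_block(daemon, "blk-2"))  # miss: served from HDFS, then cached
```

In the TeraSort scenario, the map containers would play the role of the callers of `read_block`, so repeated scans of the same input are served from memory.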

Re: Proposal: File based metastore

2018-01-29 Thread Edward Capriolo
On Mon, Jan 29, 2018 at 12:44 PM, Owen O'Malley wrote: > > > On Jan 29, 2018, at 9:29 AM, Edward Capriolo > wrote: > > > > On Mon, Jan 29, 2018 at 12:10 PM, Owen O'Malley > wrote: > >> You should really look at what the Netflix guys are doing on Iceberg. >> >> https://github.com/Netflix/iceberg

Re: Proposal: File based metastore

2018-01-29 Thread Owen O'Malley
> On Jan 29, 2018, at 9:29 AM, Edward Capriolo wrote: > > > > On Mon, Jan 29, 2018 at 12:10 PM, Owen O'Malley > wrote: > You should really look at what the Netflix guys are doing on Iceberg. > > https://github.com/Netflix/iceberg

Re: Proposal: File based metastore

2018-01-29 Thread Edward Capriolo
On Mon, Jan 29, 2018 at 12:10 PM, Owen O'Malley wrote: > You should really look at what the Netflix guys are doing on Iceberg. > > https://github.com/Netflix/iceberg > > They have put a lot of thought into how to efficiently handle tabular data > in S3. They put all of the metadata in S3 except f

Re: Proposal: File based metastore

2018-01-29 Thread Owen O'Malley
You should really look at what the Netflix guys are doing on Iceberg. https://github.com/Netflix/iceberg They have put a lot of thought into how to efficiently handle tabular data in S3. They put all of the metadata in S3 except for a single link to the name of the table's root metadata file. Ot
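The design Owen describes, all table metadata kept as immutable files with only a single pointer naming the current root, can be sketched with local files standing in for S3 objects. This is a minimal illustration of the idea, not Iceberg's actual API; the file names and JSON layout below are assumptions.

```python
# Illustrative sketch of an Iceberg-style layout: each table version is an
# immutable metadata file, and one small pointer file names the current root.
# Local files stand in for S3 objects; names/layout are made up for this sketch.
import json
import os
import tempfile

def write_metadata(table_dir, version, snapshot):
    """Write an immutable root metadata file for one table version."""
    path = os.path.join(table_dir, f"v{version}.metadata.json")
    with open(path, "w") as f:
        json.dump({"version": version, "snapshot": snapshot}, f)
    return path

def commit(table_dir, metadata_path):
    """Atomically swing the single root pointer to a new metadata file."""
    pointer = os.path.join(table_dir, "version-hint.txt")
    tmp = pointer + ".tmp"
    with open(tmp, "w") as f:
        f.write(metadata_path)
    os.replace(tmp, pointer)  # atomic: readers see the old or new root, never a mix

def current_metadata(table_dir):
    """Resolve the pointer, then load the root metadata it names."""
    with open(os.path.join(table_dir, "version-hint.txt")) as f:
        metadata_path = f.read()
    with open(metadata_path) as f:
        return json.load(f)

table_dir = tempfile.mkdtemp()
commit(table_dir, write_metadata(table_dir, 1, {"files": ["a.orc"]}))
commit(table_dir, write_metadata(table_dir, 2, {"files": ["a.orc", "b.orc"]}))
print(current_metadata(table_dir)["version"])  # 2
```

The point of the single pointer is that a commit reduces to one atomic swap, so a reader always resolves a complete, consistent table version without a central metastore service.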