Can you give more details on the schema? Is the 6 TB just airport information, as below?
> On 30. Apr 2017, at 23:08, Zeming Yu <zemin...@gmail.com> wrote:
>
> I thought relational databases with 6 TB of data can be quite expensive?
>
>> On 1 May 2017 12:56 am, "Muthu Jayakumar" <bablo...@gmail.com> wrote:
>> I am not sure Parquet is a good fit for this. This seems more like a filter
>> lookup than an aggregate-like query. I am curious to see what others have to
>> say.
>> Would a relational database with the right index (the code field in the
>> above case) perform more efficiently (with Spark using predicate
>> push-down)?
>> Hope this helps.
>>
>> Thanks,
>> Muthu
>>
>>> On Sun, Apr 30, 2017 at 1:45 AM, Zeming Yu <zemin...@gmail.com> wrote:
>>> Another question: I need to store airport info in a Parquet file and
>>> present it when a user makes a query.
>>>
>>> For example:
>>>
>>> "airport": {
>>> "code": "TPE",
>>> "name": "Taipei (Taoyuan Intl.)",
>>> "longName": "Taipei, Taiwan (TPE-Taoyuan Intl.)",
>>> "city": "Taipei",
>>> "localName": "Taoyuan Intl.",
>>> "airportCityState": "Taipei, Taiwan"
>>>
>>>
>>> Is it best practice to store just the code "TPE" and then look up the name
>>> "Taipei (Taoyuan Intl.)" from a relational database? Any alternatives?
>>>
>>>> On Sun, Apr 30, 2017 at 6:34 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>>> It depends on your queries, the data structure, etc. Generally flat is
>>>> better, but if your query filter is on the highest level, you may get
>>>> better performance with a nested structure. It really depends.
>>>>
>>>> > On 30. Apr 2017, at 10:19, Zeming Yu <zemin...@gmail.com> wrote:
>>>> >
>>>> > Hi,
>>>> >
>>>> > We're building a Parquet-based data lake. I was under the impression
>>>> > that flat files are more efficient than deeply nested files (say 3 or 4
>>>> > levels down). Is that correct?
>>>> >
>>>> > Thanks,
>>>> > Zeming
>>>
>>
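
For what it's worth, the "store just the code and look up the rest" idea from the thread can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the reference table is an in-memory dict rather than a relational database or a Parquet file, and the flight records, the `airport_code` field, and the `enrich` helper are all hypothetical names made up for the example. Only the TPE fields come from the record quoted above.

```python
# Small reference ("dimension") table keyed by airport code.
# Assumption: in practice this could live in a relational table or a small
# Parquet file; a dict is used here only to keep the sketch self-contained.
AIRPORTS = {
    "TPE": {
        "name": "Taipei (Taoyuan Intl.)",
        "longName": "Taipei, Taiwan (TPE-Taoyuan Intl.)",
        "city": "Taipei",
        "localName": "Taoyuan Intl.",
        "airportCityState": "Taipei, Taiwan",
    },
}


def enrich(records, airports=AIRPORTS):
    """Attach the airport details to each record via its airport code.

    Records whose code is missing from the reference table keep only
    the code itself.
    """
    enriched = []
    for rec in records:
        code = rec["airport_code"]
        details = airports.get(code, {})
        enriched.append({**rec, "airport": {"code": code, **details}})
    return enriched


# Hypothetical main data set: it stores only the code, not the names.
flights = [
    {"flight_id": 1, "airport_code": "TPE"},
    {"flight_id": 2, "airport_code": "XXX"},  # unknown code
]

result = enrich(flights)
```

The trade-off is the usual one: the large data set stays flat and narrow, and the descriptive fields are stored once in a table that is small enough to cache or broadcast, at the cost of a lookup at query time.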