Re: clarification please

Ashok Kumar Thu, 29 Oct 2015 13:14:17 -0700

Thank you, sir. Very helpful.


On Thursday, 29 October 2015, 15:22, Alan Gates <alanfga...@gmail.com> wrote:

Ashok Kumar, October 28, 2015 at 22:43:

Hi gurus,
kindly clarify the following please

   - Hive currently does not support indexes, or indexes are not used in the
query

Mostly true.  There is a CREATE INDEX statement, but Hive does not use the
resulting index by default.  Some storage formats (ORC, and Parquet I think)
have their own indices that they use internally to speed up access.
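
For reference, here is a minimal sketch of what creating such an index looks
like in HiveQL on Hive versions of this era. The table `page_views` and the
column `user_id` are hypothetical names used only for illustration, and, as
noted above, the index is not used by queries by default.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE INDEX idx_page_views_user
ON TABLE page_views (user_id)
AS 'COMPACT'
WITH DEFERRED REBUILD;

-- With DEFERRED REBUILD the index stays empty until it is rebuilt explicitly.
ALTER INDEX idx_page_views_user ON page_views REBUILD;
```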

      
   - The lowest granularity for concurrency is partition. If table is 
partitioned, then partition will be lucked in DML operation
  
lucked = locked?  I'm not sure what you intended here.  If you mean locked,
then it depends.  By default Hive doesn't use locking.  You can set it up to
do locking either via ZooKeeper or as part of Hive transactions.  The two
have different locking models.  See
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions and
https://cwiki.apache.org/confluence/display/Hive/Locking for more information.
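
As a rough sketch of the two setups, the settings below are the ones that
usually matter. The property names come from the two wiki pages linked above.
Check them against your Hive version, and note that in practice they normally
live in hive-site.xml rather than in a session. The table name `page_views`
is hypothetical.

```sql
-- Pick ONE of the two options; they are not meant to be combined.

-- Option 1: ZooKeeper-based locking (no transactions).
SET hive.support.concurrency=true;
SET hive.lock.manager=org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager;
-- hive.zookeeper.quorum must also point at your ZooKeeper servers.

-- Option 2: locking as part of Hive transactions.
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- Inspect the locks currently held on a (hypothetical) table.
SHOW LOCKS page_views;
```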

You can sub-partition using buckets, but for most queries the partition is
the lowest level of granularity.  Hive does a lot of work to make sure a
query reads only the partitions relevant to it.
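
To make the partition/bucket distinction concrete, here is a minimal HiveQL
sketch. The table, columns, and bucket count are all hypothetical and chosen
only for illustration.

```sql
-- Hypothetical table: one partition per day, bucketed by user_id within each partition.
CREATE TABLE page_views (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (view_date STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS;

-- Filtering on the partition column lets Hive read only the matching partition.
SELECT url, COUNT(*) AS hits
FROM page_views
WHERE view_date = '2015-10-28'
GROUP BY url;
```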

      
   - What is the best file format for storing a Hive table in HDFS? Is it ORC
or Avro, which allow splitting and support block compression?
  
It depends on what you want to do.  ORC and Parquet do better for traditional
data-warehousing-style queries because they are columnar formats and have
lots of optimizations built in for fast access, such as pushing filters down
to the storage level.  People like Avro and other self-describing formats
when their data brings its own structure.  We very frequently see pipelines
where people dump Avro, text, etc. into Hive and then ETL it into ORC.
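
The pipeline described above could be sketched roughly as follows. All table
and column names are hypothetical, and `STORED AS AVRO` assumes a Hive
version recent enough to support that shorthand.

```sql
-- Hypothetical landing table holding the raw Avro data.
CREATE TABLE page_views_raw (
  user_id BIGINT,
  url     STRING
)
STORED AS AVRO;

-- Hypothetical warehouse table in ORC.
CREATE TABLE page_views_orc (
  user_id BIGINT,
  url     STRING
)
STORED AS ORC;

-- ETL step: rewrite the raw rows into the ORC table.
INSERT OVERWRITE TABLE page_views_orc
SELECT user_id, url
FROM page_views_raw;
```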

      
   - Text/CSV files. If the file type is not specified at creation time, will
Hive default to text file?
  
Out of the box, yes, but you can change that in your Hive installation by
setting hive.default.fileformat in your hive-site.xml.
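
As a small illustration, the same property can also be tried out per session
before changing hive-site.xml. This is a sketch only: the table name is
hypothetical, and the accepted values depend on your Hive version.

```sql
-- Session-level override of the default storage format.
SET hive.default.fileformat=ORC;

-- No STORED AS clause, so the table picks up the default set above.
CREATE TABLE format_check (
  id INT
);

-- The reported InputFormat/OutputFormat show which format was used.
DESCRIBE FORMATTED format_check;
```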

Alan.

  

Thanks
