Thanks Pau, 
I will consider using Hive in the future, then. 
It looks like a very nice option. 




From: "Pau Tallada" <tall...@pic.es> 
To: user@hive.apache.org 
Sent: Wednesday, 30 January 2019 18:49:33 
Subject: Re: Help with a use case 

Hi Antoine, 


Yes, I think you can replace the files backing a table (external or managed); 
as long as they conform to the same schema, it should work. Remember that you 
should regenerate the statistics afterwards. 
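As a minimal sketch, regenerating statistics could look like the following (the database and table names are placeholders):

```sql
-- Recompute table-level statistics after swapping in the new files
ANALYZE TABLE my_db.my_table COMPUTE STATISTICS;

-- Optionally recompute column-level statistics as well
ANALYZE TABLE my_db.my_table COMPUTE STATISTICS FOR COLUMNS;
```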

For partitioned tables, you should run "MSCK REPAIR TABLE" after creating the 
new partition directory and adding the file, or Hive will not see it. 
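For example (again with a placeholder table name):

```sql
-- Make Hive discover partition directories that were added directly on HDFS
MSCK REPAIR TABLE my_db.my_table;
```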

Cheers, 

Pau. 

On Wed, Jan 30, 2019, 10:15 Antoine DUBOIS <antoine.dub...@cc.in2p3.fr> wrote: 



Hello everyone, 
I'd like some advice on a use case. 
I have heterogeneous information in different CSV files, representing more than 
100 GB of text, that I want to consolidate every day. 
My goal is to be able to serve this information from Hive via SQLAlchemy. 
The information will be erased and regenerated every day. 

I want to use Spark to generate a consolidated ORC file that would be 
integrated into Hive. 

My question is: 
If I define an external table in Hive pointing to an ORC file, is it possible 
to simply replace the ORC file with a newer version every day, without having 
to drop the table from Hive and recreate it? 
The ORC file will always have the same schema; partitioning would occur on the 
same column, but the values in that column might evolve. 
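A minimal sketch of what such an external table definition might look like (the table name, columns, partition column, and HDFS location below are all illustrative assumptions, not from the thread):

```sql
-- Hypothetical external table over Spark-generated ORC files
CREATE EXTERNAL TABLE IF NOT EXISTS consolidated_data (
  id BIGINT,        -- illustrative column
  payload STRING    -- illustrative column
)
PARTITIONED BY (batch_date STRING)  -- hypothetical partition column
STORED AS ORC
LOCATION '/data/consolidated';      -- hypothetical HDFS path
```

Spark could then overwrite the ORC files under that location each day, leaving the table definition in place.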

Is this a possible scenario using Hive, or did I get it completely wrong? 

Thank you very much in advance for any advice or help. 

Antoine 



