Hi Marc, While what Dean said is true for different schemas in general, there is a way to do it all in the same table if the schema changes to the TSV file are just additions of new tab-separated columns at the very end of each row and no existing columns are being deleted.
Let's say your TSV file looked like a_val1 b_val1 c_val1 a_val2 b_val2 c_val2 Your table definition could look like: CREATE TABLE t1(a STRING, b STRING, c STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'; Now say you insert more data into this table (externally through HDFS commands) which looks like a_val3 b_val3 c_val3 d_val3 a_val4 b_val4 c_val4 d_val4 Now, if you haven't changed the table definition just yet, it still says 3 columns, so a select * from t1 would still give you 3 columns. In order words, the newly added 4th column doesn't get read and doesn't have any adverse effect on the table if the table definition is still old. a_val1 b_val1 c_val1 a_val2 b_val2 c_val2 a_val3 b_val3 c_val3 a_val4 b_val4 c_val4 Now let's go ahead and change the table definition. You could use one of the ALTER TABLE commands for this. I personally like to create external tables, drop the tables and recreate them on top of the same data. Your new table create statement may look something like: CREATE TABLE t1(a STRING, b STRING, c STRING,d STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'; A select * from t1 in this case would reveal: a_val1 b_val1 c_val1 NULL a_val2 b_val2 c_val2 NULL a_val3 b_val3 c_val3 d_val3 a_val4 b_val4 c_val4 d_val4 New columns show up correctly. Values from old data for which those columns didn't exist show up as NULLs. So, to summarize, a single table may work, if the following 3 conditions are met. 1. New columns get added to the very end of the existing columns 2. No existing columns are deleted 3. You are ok with NULL showing up as value for old records that don't have a given new column. I tested the above with CSV files, TSV should be the same as well. Mark On Fri, Nov 30, 2012 at 5:40 AM, Dean Wampler < dean.wamp...@thinkbiganalytics.com> wrote: > You'll have to define separate tables for the different schemas. You can > "unify" them in a query with the union feature. You should also remove the > header lines in the files, if you still have them, because Hive does not > ignore them, but treats them as "data". > > dean > > > On Fri, Nov 30, 2012 at 2:59 AM, Marc Canaleta <mcanal...@gmail.com>wrote: > >> Hi all! >> >> We want to use hive to analyze our logs. Our logs will be TSV files, one >> per hour, and as it is possible that we add/remove more columns in the >> future, we will include headers (column names) in each file. >> >> So it is possible that two TSV files for different days/hours have >> different headers. >> >> Is it possible to do this with Hive? >> >> Thanks! >> > > > > -- > *Dean Wampler, Ph.D.* > thinkbiganalytics.com > +1-312-339-1330 > > >