Is there an index in the RCFile to avoid a complete pass over the
record "keys" when matching old and new records? Also, wouldn't the
RCFile need to be rebuilt anyway, since the file actually stores
blocks of n rows by m columns to achieve a certain block size? I
haven't carefully read the RCFi
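As far as I can tell, RCFile keeps per-row-group column metadata and sync
markers but no key index, so merging old and new records effectively means
rewriting the file. A rough HiveQL sketch of the usual rewrite pattern, with
the table and column names purely illustrative:

    CREATE TABLE events_rc (id BIGINT, payload STRING)
    STORED AS RCFILE;

    -- There is no in-place update of an RCFile; old and new records are
    -- merged by regenerating every row group from a combined SELECT.
    INSERT OVERWRITE TABLE events_rc
    SELECT id, payload FROM events_staging;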
Hello Shreepadma,
That's definitely very helpful. I doubted that this would be the case,
but I was thinking that maybe there's a way to do it using a merge
task. I will change my data structure to make it a bit like HBase, and
I hope Hive will still be the right choice for me... it can be b
Hello,
I couldn't find any example of how to populate columns that were added
to a table. How would Hive know which row each value of the newly
added columns belongs to? Does it do column name matching?
Sincerely,
Younos
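For what it's worth, Hive does not append values to existing rows by key: a
column added with ALTER TABLE simply reads as NULL for data already in the
table until that data is rewritten. A minimal HiveQL sketch of the usual
backfill pattern, assuming a table users(id, name) and a hypothetical source
table users_with_age that carries the new values:

    ALTER TABLE users ADD COLUMNS (age INT);

    -- Existing rows now return NULL for age; there is no per-row append
    -- or name matching. To populate the column, rewrite the data with
    -- the value supplied by the query.
    INSERT OVERWRITE TABLE users
    SELECT s.id, s.name, s.age
    FROM users_with_age s;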
I solved the problem by using a fully qualified path for
hive.exec.scratchdir, and then the umask trick worked. It turns out
that Hive was creating a different directory (on HDFS) than the one
MapReduce was trying to write into, which is why the umask didn't
work. This remains a nasty work
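For anyone who hits the same thing: the fix amounts to giving
hive.exec.scratchdir a fully qualified URI so that Hive and the MapReduce job
resolve it against the same filesystem. Something along these lines (the
NameNode host/port and the path are placeholders; the equivalent property can
also go in hive-site.xml):

    -- Point the scratch directory at an explicit HDFS URI for the session.
    SET hive.exec.scratchdir=hdfs://namenode:8020/tmp/hive-staging;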
Thanks for the reply, Tim. It is writable to all (permission 777). As
a side note, I have now discovered that the MapReduce task spawned by
the RCFileOutputDriver is setting mapred.output.dir to a folder under
file:// regardless of the fs.default.name. This might be expected
behaviour, but
Hello,
I'm using Cloudera's CDH4 with Hive 0.9 and HiveServer2. I am trying
to load data into Hive using the JDBC driver (the one distributed with
Cloudera CDH4, "org.apache.hive.jdbc.HiveDriver"). I can create the
staging table and LOAD DATA LOCAL into it. However, when I try to
insert data in
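A sketch of the HiveQL such a flow would pass to Statement.execute() over
that driver, with the table layout and local path purely illustrative; note
that Hive 0.9, as far as I know, has no INSERT ... VALUES, so rows reach the
final table through an INSERT ... SELECT from the staging table:

    CREATE TABLE IF NOT EXISTS staging (id INT, name STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

    LOAD DATA LOCAL INPATH '/tmp/data.csv' INTO TABLE staging;

    -- Unlike the CREATE and LOAD above, this statement launches a
    -- MapReduce job on the cluster.
    INSERT INTO TABLE target SELECT id, name FROM staging;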