If you trace the source code, you'll find it's not too hard to change it to
let a user specify a UDF. But that's changing the code...
Ed Capriolo posted a more useful response a while back, on the general Hive
mailing list:
"You have the option now to run HQL by creating a hiverc file
https://issues
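To illustrate the hiverc approach mentioned above: a .hiverc file is a script of HQL statements that Hive runs at session startup, which lets you register a UDF without touching Hive's source. The jar path and class name below are invented for illustration, not from the thread.

```sql
-- Hypothetical contents of ~/.hiverc: run automatically when the Hive CLI starts.
ADD JAR /path/to/my-udfs.jar;
CREATE TEMPORARY FUNCTION my_udf AS 'com.example.hive.udf.MyUdf';
```

After this, `my_udf(...)` is usable in any query for the session, with no code changes to Hive itself.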
I'm wondering if my configuration/stack is wrong, or if I'm trying to do
something that is not supported in Hive.
My goal is to choose a compression scheme for Hadoop/Hive, and while
comparing configurations I'm finding that I can't get BZip2 or Gzip to work
with the RCFile format.
Is that supported?
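For context, a comparison like the one described is usually driven by Hive's standard compression settings; a minimal sketch, with made-up table and column names (the SET properties and codec class names are the standard Hadoop/Hive ones):

```sql
-- Compress query output and pick a codec, then write into an RCFile table.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
-- or: org.apache.hadoop.io.compress.BZip2Codec

CREATE TABLE logs_rc (line STRING) STORED AS RCFILE;
INSERT OVERWRITE TABLE logs_rc SELECT line FROM logs_text;
```

Swapping the codec property between runs is one way to compare schemes on the same data.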
> the file was read and I'm able to query it using Hive.
> Sorry to bother and thanks a bunch for the help! Forcing me to go read
> more about InputFormats is a long term help anyway.
> Pat
> *From:* phil young [mailto:phil.wills.yo..
I found the source code very helpful for this.
There's a custom SerDe in the source, with a test case you can review, which
really speeds up development of your own SerDe.
org.apache.hadoop.hive.contrib.serde2.TestRegexSerDe
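As a concrete example of what that test exercises: the contrib RegexSerDe parses each line with a regex, one capture group per column. The table name and regex below are illustrative placeholders.

```sql
-- Hedged sketch of using the contrib RegexSerDe the test case covers.
CREATE TABLE apache_log (host STRING, request STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) (.*)",
  "output.format.string" = "%1$s %2$s"
)
STORED AS TEXTFILE;
```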
One thing to watch out for though, is that the framework will down-ca
My guess is that the jar you added didn't contain a class with the exact name
of your SerDe.
It would probably help to see the output of your command with logging enabled:
hive -hiveconf hive.root.logger=INFO,console -e "
your command;
"
On Mon, Jan 24, 2011 at 11:41 AM, ankit bhatnagar wrote:
To be clear, you would then create the table with the clause:
STORED AS
INPUTFORMAT 'your.custom.input.format'
If you make an external table, you'll then be able to point to a directory
(or file) that contains gzipped files, or uncompressed files.
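Putting the two points above together, the full statement would look something like this; the table name, location, and input format class are placeholders (`your.custom.input.format` stands in for whatever class was registered via ADD JAR), while the output format is Hive's standard text output format.

```sql
-- Sketch of an external table backed by a custom input format.
CREATE EXTERNAL TABLE mixed_logs (line STRING)
STORED AS
  INPUTFORMAT 'your.custom.input.format'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/data/mixed_logs';
```

Because the table is external, dropping it later removes only the metadata, not the gzipped or uncompressed files under that directory.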
On Fri, Jan 28, 2011 a
This can be accomplished with a custom input format.
Here's a snippet of the relevant code in the custom RecordReader:
compressionCodecs = new CompressionCodecFactory(jobConf);
Path file = split.getPath();
final CompressionCodec codec = compressionCodecs.getCodec(file);
I'm about to investigate the following situation, but I'd appreciate any
insight that can be given.
We have an external table that consists of 3 HDFS files.
We then run an INSERT OVERWRITE which is just a SELECT * from the external
table.
The table being overwritten has N buckets.
The issue i
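The setup described above can be sketched as follows; the table names, schema, and bucket count are invented for illustration, and hive.enforce.bucketing is the standard setting that makes Hive produce one reduce task (and output file) per bucket on insert.

```sql
-- Sketch of the scenario: external source table, INSERT OVERWRITE into a
-- bucketed table via a plain SELECT *.
CREATE EXTERNAL TABLE src_ext (id INT, val STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/src';  -- the 3 HDFS files live here

CREATE TABLE dest_bucketed (id INT, val STRING)
CLUSTERED BY (id) INTO 8 BUCKETS;

SET hive.enforce.bucketing=true;
INSERT OVERWRITE TABLE dest_bucketed SELECT * FROM src_ext;
```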