Hi team,
I am trying to learn the CBO of Hive because I need to do some performance
tuning for my ETL job.
I found a Confluence doc below, but I am not sure if it is the newest version;
can anyone help confirm that?
https://cwiki.apache.org/confluence/display/Hive/Cost-based+optimization+
query on the table, I get an exception:
Failed with exception java.io.IOException:java.io.IOException: Not a file:.
This is in spite of setting mapred.input.dir.recursive=true;
Is this a supported feature in Hive? Any alternatives?
Sam William
sa...@stumbleupon.com
> when trying to do a SELECT on the table:
>
> Failed with exception java.io.IOException:java.io.IOException: Not a file:
> hdfs://path/to/partition/path/to/subdir
>
> Also, it seems to ignore directories prefixed by an underscore (_directory).
>
> I am using hive 0.7.1 on Hadoop 0.20.2.
>
> Is there a way to force Hive to ignore all subdirectories in external tables
> and only look at files?
>
> Thanks in advance,
> -Dave
>
Sam William
sa...@stumbleupon.com
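(For anyone hitting the same thing: a minimal sketch of the session settings
usually involved here, with a hypothetical table name; whether they are honored
depends on the Hive/Hadoop versions in play.)
set mapred.input.dir.recursive=true;
set hive.mapred.supports.subdirectories=true;
select * from my_external_table limit 10;  -- my_external_table is hypothetical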
Dave,
Where do you specify the classpath before starting the Hive shell when you
introduce a custom class like this?
Sam
On Aug 19, 2011, at 1:22 PM, Dave wrote:
> I solved my own problem. For anyone who's curious:
>
> It turns out that subclassing an InputFormat allows o
classes. I get a class not found error. What am I
missing here?
Sam William
sa...@stumbleupon.com
Please ignore my mail. Seems like the site-specific hive-env.sh was
overriding the env variable. It's working now.
Thanks,
Sam
On Aug 29, 2011, at 4:28 PM, Aggarwal, Vaibhav wrote:
> You need to point to the exact jar file location and not just the directory
> location.
>
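A sketch of the difference, with hypothetical paths:
# works: points at the jar file itself
export HIVE_AUX_JARS_PATH=/usr/local/hive/lib/my-udfs.jar
# does not work: points only at the directory
# export HIVE_AUX_JARS_PATH=/usr/local/hive/lib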
now I want to load all files
> under /app in one table. Any ideas?
>
> R
Sam William
sa...@stumbleupon.com
I have all my slave nodes in the PDT timezone. However, when I run this query:
select from_unixtime(unix_timestamp()) from dual; (dual is a one-row table
that I created)
I get the date/time in UTC. What do I do to get the PDT time? (I don't want to
write a UDF for this.)
Sam William
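One UDF-free option, assuming a Hive version (0.8.0 or later) that provides
from_utc_timestamp(), and using the same one-row dual table:
select from_utc_timestamp(from_unixtime(unix_timestamp()), 'America/Los_Angeles') from dual;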
If you go this route, you may want to use nohup. This way your processes will
continue running even if you lose connection to your terminal session.
Other options:
1) You can write your queries to a DB/Queue and have a process running on the
Hive server that reads from the DB/queue and runs the
We recently adopted Hadoop and Hive for doing some significant data processing.
We went the Amazon route.
My own $.02 is as follows:
If you are already incredibly experienced with Hadoop and Hive and have someone
on staff who has previously built a cluster at least as big as the one you are
pr
~Abhishek
>
>
>
>
> On Thu, Dec 1, 2011 at 10:21 AM, sonia gehlot < sonia.geh...@gmail.com >
> wrote:
>
>
> Hi All,
>
> I have a Unix timestamp in my table in UTC format. Is there any inbuilt
> function to convert it into PST or PDT in YYY
the inbuilt functions. What options do I
have other than modifying FunctionRegistry and recompiling?
Sam William
sa...@stumbleupon.com
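The usual alternative to touching FunctionRegistry is a session-scoped
function; a sketch with a hypothetical jar path and class name:
add jar /path/to/my-udfs.jar;
create temporary function to_pst as 'com.example.udf.ToPst';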
Try file:// in front of the property value...
Sent from my iPhone
On Dec 12, 2011, at 12:07 AM, "Periya.Data" wrote:
> Hi,
>I am trying to create Hive tables on an EC2 instance. I get this strange
> error about the URI schema and log4j properties not being found. I do not
> know how to fix this.
>
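A sketch of the suggestion; hive.exec.scratchdir is used only as an example
property, and the point is the explicit file:// scheme on the local path:
set hive.exec.scratchdir=file:///tmp/hive-scratch;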
I recommend trying a daily partitioning scheme over an hourly one. We had a
similar setup and ran into the same problem and ultimately found that daily
works fine for us, even with larger file sizes.
At the very least it is worth evaluating.
Sent from my iPhone
On Jan 5, 2012, at 2:23 PM, Mat
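A sketch of what the daily scheme looks like in DDL (table and column names
hypothetical):
create external table events (line string)
partitioned by (dt string)
location '/data/events';
alter table events add partition (dt='2012-01-05')
location '/data/events/2012-01-05';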
get this error:
Failed with exception java.io.IOException:java.io.IOException: No LZO codec
found, cannot run.
What am I missing? Any help is appreciated.
Thanks,
Sam William
sa...@stumbleupon.com
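The usual cause is that the codec is not registered with Hadoop; a sketch of a
session-level override, assuming the hadoop-lzo jar and native libraries are
already installed:
set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec;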
Google?
Sent from my iPhone
On Jan 25, 2012, at 7:34 PM, Dalia Sobhy wrote:
> Does anyone have any idea about RainStor?
>
> Open source? How to download? How to use? Performance?
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)
The table is pretty simple. It is an external table on HDFS and does
not have any partitions. Any idea why this could be happening?
Thanks,
Sam William
sa
fix made it work
if [ -z "$HIVE_AUX_JARS_PATH" ]; then
  HIVE_AUX_JARS_PATH=$HIVE_HOME/lib/jar1.jar,$HIVE_HOME/lib/jar2.jar,$HIVE_HOME/lib/jar3.jar
else
  HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH,$HIVE_HOME/lib/jar1.jar,$HIVE_HOME/lib/jar2.jar,$HIVE_HOME/lib/jar3.jar
fi
Thanks
3:
error: reference to ‘eventHandler_’ is ambiguous
...
I tried with a couple of versions of Thrift, 0.9.0-dev and 0.5.0. Neither
of them proved to be good. Has it got to do with the Thrift library version?
What's the fix? Any help is appreciated.
Thanks,
Sam W
I was able to get past this.
The solution is to use thrift-0.6.0 with the following patch:
https://issues.apache.org/jira/browse/THRIFT-1060
Sam
On Feb 8, 2012, at 5:42 PM, Sam William wrote:
> Hi,
> I'm trying to build the HiveODBC driver. The hive source code base I'
f option works though. Has anyone else faced this?
Sam William
sa...@stumbleupon.com
Sorry guys, I figured this out. We have a shell script that sets the env
variables and then calls the standard hive-0.8.0/bin/hive "$@". The
problem was that the quotes around $@ were missing, and the hive executable
was just getting the first word.
It was just my bad.
T
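For reference, a sketch of the corrected wrapper (paths hypothetical); the
quotes around "$@" are what keep multi-word arguments intact:
#!/bin/sh
# site-specific environment setup goes here (hypothetical example)
export HIVE_CONF_DIR=/etc/hive/conf
# pass all arguments through unsplit
exec /usr/local/hive-0.8.0/bin/hive "$@"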
We also do #4. Initially we had lots of conversations about all the other
options and whether we should do this or that... Ultimately we focused on just
going live as quickly as possible and getting more involved in the setup
later. Since then the only thing we've needed to do is hack a few of the basel
my options?
Sam William
sa...@stumbleupon.com
> then alias the hive command to hive -i /etc/hiverc
>
> On Fri, Apr 6, 2012 at 1:05 AM, Sam William wrote:
> Hi,
> I have this external jar with UDFs. I do not want everyone in the
> company using these functions to have to run add jar blah.jar; create
> temporary funct
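A sketch of what such a shared /etc/hiverc could contain (blah.jar is the jar
named in the question; the class name is hypothetical):
add jar /usr/local/hive/aux/blah.jar;
create temporary function my_func as 'com.example.udf.MyFunc';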
Are hive jars available on any public maven repos? If not, is there a way to
ask ant to install the built jars to my local ~/.m2/repository?
Sam William
sa...@stumbleupon.com
Oops, sorry... Found multiple repos with hive jars. Thanks.
Sam
On Apr 10, 2012, at 12:37 PM, Edward Capriolo wrote:
> Yes hive is in maven.
> This is a great site with a search form:
> http://mvnrepository.com/artifact/org.apache.hive/hive-common
>
> On Tue, Apr 10, 2012 at 3:34
general purpose functions
on top of the pure JDK API, e.g. string/date/math functions. I'm hoping this
is doable with the HIVE-2655 patch?
Sam William
sa...@stumbleupon.com
est option.
Sam William
sa...@stumbleupon.com
Thanks guys. Adding the 'env:' in my 'add jar' works.
Sam
On May 3, 2012, at 7:35 AM, Edward Capriolo wrote:
> That is generally how you set hiveconf. Env variables can be accessed this
> way.
>
> hive> set x=${env:HOME};
> hive> set x;
> x
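And a sketch of the same env: namespace inside add jar (MY_JARS is a
hypothetical environment variable):
add jar ${env:MY_JARS}/my-udfs.jar;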
terminated by '\u0001' stored as textfile location
'/tmp/myloc';
did not work.
Thanks
Sam William
sa...@stumbleupon.com
ctrl char as the
delimiter.
Mapred Learn,
Yes I did have the word 'external' in the create table statement.
Thanks,
Sam
On Jun 20, 2012, at 6:24 AM, Mark Grover wrote:
> Sam,
> If you can please post a row or two of your data along with any errors you
> are get
Wow... This works, thanks!
Sam
On Jun 20, 2012, at 5:01 PM, Mapred Learn wrote:
> Hi Sam,
> Could you try '\001' instead of '\u0001' ?
>
> Sent from my iPhone
>
> On Jun 20, 2012, at 3:57 PM, Sam William wrote:
>
>>
>>
>> Mark,
>
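Putting the thread together, a sketch of the working DDL (column names
hypothetical, location from the earlier mail); the octal escape '\001' is the
Ctrl-A byte, which was accepted here where the Java-style '\u0001' was not:
create external table my_table (col1 string, col2 string)
row format delimited fields terminated by '\001'
stored as textfile
location '/tmp/myloc';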
any time?
What might happen if a pipeout file is removed that shouldn't be
removed?
2. Is it entirely up to the admin to log-rotate these? Why aren't
they rotated by default when you install the packages?
Thanks,
Sam
use an automatic deletion. That
would certainly be terrible. :-) But I hope you see what I
am getting at... Other ways which might cause a table to be lost,
besides someone typing the "drop table" command in a hive session.
Thanks,
Sam
;t have a default
value
The upgrade script from 0.8 to 0.9 doesn't have anything? What am I missing?
Sam William
sa...@stumbleupon.com
Looks like this column is not even there in the 0.8/0.9 schema files. I have
no idea how I have it in my schema. I just set a default 'false' value and
I'm fine now.
Sam
On Jan 4, 2013, at 2:22 PM, Sam William wrote:
> When I upgraded to 0.9.0, I'm getting an exception
*Part 1: my environment*
I have the following files uploaded to Hadoop:
1. They are plain text.
2. Each line contains JSON like:
{code:[int], customerId:[string], data:{[something more here]}}
1. code values are numbers from 1 to 3000,
2. customerIds total up to 4 million, with up to 0.5 million daily.
I have a simple query with grouping, something similar to the one below:
SELECT col1, col2, col3, min(date), count(*)
FROM tblX
WHERE partitionDate="20141107"
GROUP BY col1, col2, col3;
When I run this query through WebHCat everything works fine. But when I try
to run it from the hive shell I have
ails please? Setting config parameters optimally in
> yarn/mr configs might help you, but please do so wisely, as it may imbalance
> other things if not implemented thoughtfully.
>
> regards
> Devopam
>
> On Fri, Nov 7, 2014 at 7:56 PM, Ja Sam wrote:
>
>> I have a simple
I found the problem. I had a different configuration on the namenode in
yarn-site.xml and on the datanodes in the same file.
I still don't know why, but this is easy to fix.
On Fri, Nov 7, 2014 at 3:41 PM, Ja Sam wrote:
> I don't use any scheduler. Anyway this error happens when we try to run
Hi,
After streaming Twitter data to HDFS using Flume, I'm trying to analyze it
using some Hive queries. The data is in JSON format and not clean, having
double quotes (") in wrong places, which causes the Hive queries to fail. I
am getting the following error:
Failed with exception
java.io.IOException:o
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportInvalidEOF(JsonParserMinimalBase.java:454)
at org.codehaus.jackson.impl.ReaderBasedParser._parseFieldName2(ReaderBasedParser.java:1025)
Hi,
Does *org.apache.hadoop.hive.contrib.serde2.JsonSerde* come with support
for reading nested data?
Also, could you please help me with a location to download the jar for
*org.apache.hadoop.hive.contrib.serde2.JsonSerde*?
Appreciate your help!
Thanks,
Joel
Hi,
Is it possible to use the json_tuple function to extract data from JSON
arrays (nested too)? I am trying to process the JSON data as strings and
avoid using SerDes, since user data may be malformed.
Please see a sample json data given below:
{
"filter_level": "low",
"retweeted": false,
"in_reply_t
I tried using the EXPLODE function on the nested JSON array but it doesn't
work and throws the following error:
FAILED: UDFArgumentException explode() takes an array or a map as a
parameter
Thanks,
Joel
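One way to reach into arrays without a SerDe, sketched with the table and
column names used later in this thread plus a hypothetical JSON path
(get_json_object accepts [n] array subscripts):
select get_json_object(text_col, '$.entities.hashtags[0].text') as first_hashtag
from tweets_raw
limit 10;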
On Tue, Oct 27, 2015 at 3:20 PM, Sam Joe wrote:
> Hi,
>
> Is it possible to
as in these examples:
>
>
> http://mechanics.flite.com/blog/2014/04/16/using-explode-and-lateral-view-in-hive/
>
>
> http://stackoverflow.com/questions/28716165/how-to-query-struct-array-with-hive-get-json-object
>
>
>
>
>
> *From:* Sam Joe [mailto:games2013@gmai
>
> SELECT get_json_object(text_col, '$.id') as id FROM tweets_raw limit 10;
>
>
>
> You should also be able to use json_tuple(), but start simple
>
>
>
> *From:* Sam Joe [mailto:games2013@gmail.com]
> *Sent:* Tuesday, October 27, 2015 1:43 PM
>
e_user_id_str":"16864598","indices":[143,144],"source_status_id_str":"654301626665189376","source_status_id":654301626665189376,"id_str":"654301608994586624"},{"sizes":{"thumb":{"w":150,"resize
working query below
> and then work on getting the lateral view explode to work against the temp
> table.
>
>
>
> FAILED: UDFArgumentException explode() takes an array or a map as a
> parameter
>
> Apparently, hive doesn't think tr3.media is an array or map..s
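For contrast, a minimal self-contained sketch where explode() succeeds
because its argument really is an array (all names hypothetical):
select e.item
from (select array('a', 'b', 'c') as arr) t
lateral view explode(t.arr) e as item;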
Aggarwal
wrote:
> Hello Sam,
> You can easily achieve this by using the elephant-bird jars in Pig. We are
> also capturing tweets via Flume and filtering them using Pig and the
> elephant-bird jars. You can find the related jars on the internet.
>
> Cheers,
> Nishant Aggarwal
> On 28 Oct 2015