CBO doc version & log parse issue

2021-12-13 Thread sam
Hi team, I am trying to learn the CBO of Hive because I need to do some performance tuning for my ETL job. I found the Confluence doc below, but I am not sure if it is the newest version. Can anyone help confirm that? https://cwiki.apache.org/confluence/display/Hive/Cost-based+optimization+

subdirectories in External tables.

2011-08-12 Thread Sam William
query on the table, I get an exception. Failed with exception java.io.IOException:java.io.IOException: Not a file:. This is in spite of setting mapred.input.dir.recursive=true. Is this a supported feature in Hive? Any alternatives? Sam William sa...@stumbleupon.com

Re: Ignore subdirectories when querying external table

2011-08-19 Thread Sam William
> when trying to do a SELECT on the table: > > Failed with exception java.io.IOException:java.io.IOException: Not a file: > hdfs://path/to/partition/path/to/subdir > > Also, it seems to ignore directories prefixed by an underscore (_directory). > > I am using hive 0.7.1 on Hadoop 0.20.2. > > Is there a way to force Hive to ignore all subdirectories in external tables > and only look at files? > > Thanks in advance, > -Dave > Sam William sa...@stumbleupon.com

Re: Ignore subdirectories when querying external table

2011-08-29 Thread Sam William
Dave, where do you specify the classpath before starting the Hive shell when you introduce a custom class like this? Sam On Aug 19, 2011, at 1:22 PM, Dave wrote: > I solved my own problem. For anyone who's curious: > > It turns out that subclassing an InputFormat allows o

HIVE_AUX_JARS_PATH

2011-08-29 Thread Sam William
classes . I get a class not found error . What am I missing here ? Sam William sa...@stumbleupon.com

Re: HIVE_AUX_JARS_PATH

2011-08-29 Thread Sam William
Please ignore my mail. It seems the site-specific hive-env.sh was overriding the env variable. It's working now. Thanks, Sam On Aug 29, 2011, at 4:28 PM, Aggarwal, Vaibhav wrote: > You need to point to the exact jar file location and not just the directory > location. >

Re: can hive support directory recursive in external localtion

2011-08-31 Thread Sam William
now I want to load all files > under /app in one table. Is there any idea? > > R Sam William sa...@stumbleupon.com

Time/date functions - timezone

2011-09-19 Thread Sam William
I have all my slave nodes on the PDT timezone. However, when I run this query, select from_unixtime(unix_timestamp()) from dual; (dual is a one-row table that I created), I get the date/time in UTC. What do I do to get the PDT time (I don't want to write a UDF for this)? Sam William

Re: Asynchronous query exection

2011-11-15 Thread Sam Wilson
If you go this route, you may want to use nohup. This way your processes will continue running even if you lose connection to your terminal session. Other options: 1) You can write your queries to a DB/Queue and have a process running on the Hive server that reads from the DB/queue and runs the
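
A minimal shell sketch of the nohup suggestion in the message above. The `run_detached` helper and the log file are hypothetical names used only for illustration, and a stand-in `sh -c` command replaces the real Hive invocation so the sketch runs on a machine without Hive.

```shell
# Hypothetical helper (not part of Hive): start a command under nohup so it
# keeps running if the terminal session is lost, and capture its output.
run_detached() {
    log_file="$1"
    shift
    nohup "$@" > "$log_file" 2>&1 &
    detached_pid=$!   # PID of the background job, for a later `wait` or `kill`
}

demo_log="$(mktemp)"
# Stand-in for a long-running query. On a Hive host this could instead be
# something like: run_detached query.log hive -f query.hql
run_detached "$demo_log" sh -c 'echo query finished'
```

Redirecting both stdout and stderr to a named log keeps nohup from falling back to its default nohup.out file in the current directory.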

Re: Building out Hive in EC2/S3 versus dedicated servers

2011-11-22 Thread Sam Wilson
We recently adopted Hadoop and Hive for doing some significant data processing. We went the Amazon route. My own $.02 is as follows: If you are already incredibly experienced with Hadoop and Hive and have someone on staff who has previously built a cluster at least as big as the one you are pr

Re: Convert UTC timestamp to PST

2011-12-01 Thread Sam William
> ~Abhishek > > > > > On Thu, Dec 1, 2011 at 10:21 AM, sonia gehlot < sonia.geh...@gmail.com > > wrote: > > > Hi All, > > I have a Unix timestamp in my table in UTC format. Is there any inbuilt > function to convert it into PST or PDT in YYY

Hive UDFs/ FunctionRegistry etc

2011-12-08 Thread Sam William
he inbuilt functions. What options do I have other than modifying FunctionRegistry and recompiling ? Sam William sa...@stumbleupon.com

Re: Hive Metadata URI error

2011-12-11 Thread Sam Wilson
Try file:// in front of the property value... Sent from my iPhone On Dec 12, 2011, at 12:07 AM, "Periya.Data" wrote: > Hi, >I am trying to create Hive tables on an EC2 instance. I get this strange > error about URI schema and log4j properties not found. I do not know how to > fix this. >

Re: drop table -> java.lang.OutOfMemoryError: Java heap space

2012-01-05 Thread Sam Wilson
I recommend trying a daily partitioning scheme over an hourly one. We had a similar setup and ran into the same problem and ultimately found that daily works fine for us, even with larger file sizes. At the very least it is worth evaluating. Sent from my iPhone On Jan 5, 2012, at 2:23 PM, Mat

Reading compressed files (external tables) from hive using DeprecatedLzoTextInputFormat

2012-01-25 Thread Sam William
get this error . Failed with exception java.io.IOException:java.io.IOException: No LZO codec found, cannot run. What am I missing? Any help is appreciated. Thanks, Sam William sa...@stumbleupon.com

Re: rainstor

2012-01-25 Thread Sam Wilson
Google? Sent from my iPhone On Jan 25, 2012, at 7:34 PM, Dalia Sobhy wrote: > Do anyone have any idea about rainstor ??? > > Opensource? How to download ? How to use? PErformance ??

Exception when hive submits M/R jobs

2012-01-31 Thread Sam William
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807) The table is pretty simple . It is an external table on the HDFS and does not have any partitions. Any idea why this could be happening ? Thanks, Sam William sa

Re: Exception when hive submits M/R jobs

2012-02-01 Thread Sam William
fix made it work if [ -z "$HIVE_AUX_JARS_PATH" ]; then HIVE_AUX_JARS_PATH=$HIVE_HOME/lib/jar1.jar,$HIVE_HOME/lib/jar2.jar,$HIVE_HOME/lib/jar3.jar; else HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH,$HIVE_HOME/lib/jar1.jar,$HIVE_HOME/lib/jar2.jar,$HIVE_HOME/lib/jar3.jar; fi Thanks
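
The conditional in the message above can be sketched as a small reusable function. This is an illustration under assumptions: `append_aux_jars` is a hypothetical helper (not part of Hive), `/opt/hive` is an assumed fallback install location, and jar1.jar to jar3.jar are the placeholder names from the message.

```shell
# Hypothetical helper: join an existing comma-separated jar list with new
# entries, without emitting a leading comma when the existing list is empty.
append_aux_jars() {
    # $1 = current value (may be empty), $2 = comma-separated jars to add
    if [ -z "$1" ]; then
        printf '%s\n' "$2"
    else
        printf '%s,%s\n' "$1" "$2"
    fi
}

HIVE_HOME="${HIVE_HOME:-/opt/hive}"   # assumed location, adjust as needed
extra_jars="$HIVE_HOME/lib/jar1.jar,$HIVE_HOME/lib/jar2.jar,$HIVE_HOME/lib/jar3.jar"
HIVE_AUX_JARS_PATH="$(append_aux_jars "${HIVE_AUX_JARS_PATH:-}" "$extra_jars")"
export HIVE_AUX_JARS_PATH
```

Each entry names an exact jar file rather than a directory, which matches the advice given in the earlier HIVE_AUX_JARS_PATH thread.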

Building HiveODBC driver

2012-02-08 Thread Sam William
3: error: reference to ‘eventHandler_’ is ambiguous ... I tried with a couple of versions of Thrift, 0.9.0-dev and 0.5.0. Neither of them worked. Does it have to do with the Thrift library version? What's the fix? Any help is appreciated. Thanks, Sam W

Re: Building HiveODBC driver

2012-02-09 Thread Sam William
I was able to get past this. The solution is to use thrift-0.6.0 with the following patch https://issues.apache.org/jira/browse/THRIFT-1060 Sam On Feb 8, 2012, at 5:42 PM, Sam William wrote: > Hi, > I'm trying to build the HiveODBC driver. The hive source code base I'

hive -e '' not working in 0.8.0

2012-02-16 Thread Sam William
f option works though . Has anyone else faced this ? Sam William sa...@stumbleupon.com

Re: hive -e '' not working in 0.8.0

2012-02-16 Thread Sam William
Sorry guys, I figured this out. We have a shell script that sets the env variables and then calls the standard hive-0.8.0/bin/hive "$@". The problem was that the quotes around $@ were missing and the hive executable was just getting the first word. It was just my bad. T
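
The quoting problem described above can be reproduced without Hive. In this sketch, which assumes a POSIX sh or bash (zsh splits words differently by default), `count_args` is a hypothetical stand-in for the real hive executable, and the two wrapper functions are illustrative names.

```shell
# count_args stands in for the real hive executable and reports how many
# arguments it received.
count_args() { echo "$#"; }

# Broken wrapper: unquoted $@ is re-split on whitespace, so a quoted query
# string falls apart into separate words.
wrapper_unquoted() { count_args $@; }

# Fixed wrapper: "$@" passes each original argument through intact.
wrapper_quoted() { count_args "$@"; }

wrapper_unquoted -e 'select 1 from dual'   # prints 5: -e select 1 from dual
wrapper_quoted -e 'select 1 from dual'     # prints 2: -e 'select 1 from dual'
```

With the unquoted form, the `-e` option only sees the first word of the query as its argument, which matches the symptom reported in the thread.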

Re: Amazon EMR Best Practices for Hive metastore

2012-03-06 Thread Sam Wilson
We also do #4. Initially we had lots of conversations about all the other options and we should do this or that... Ultimately we focused on just going live as quickly as possible and getting more involved in the setup later. Since then the only thing we've needed to do is hack a few of the basel

Initializing hive sessions with Jars/temp functions

2012-04-05 Thread Sam William
my options ? Sam William sa...@stumbleupon.com

Re: Initializing hive sessions with Jars/temp functions

2012-04-05 Thread Sam William
> then alias hive command with hive -i /etc/hiverc > > On Fri, Apr 6, 2012 at 1:05 AM, Sam William wrote: > Hi, > I have this external jar with UDFs. I do not want everyone in the > company using these functions to run add jar blah.jar; create temporary > funct
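
A sketch of the init-file approach suggested in the reply above. Everything specific here is an assumption for illustration: the jar path, the function name `my_udf`, and the class `com.example.MyUdf` are hypothetical, and a temporary file stands in for /etc/hiverc so the sketch does not need root access. A shell function is used instead of an alias because aliases are not expanded in non-interactive scripts.

```shell
# Write the session setup statements once, in a shared init file.
hiverc="$(mktemp)"   # stands in for /etc/hiverc
cat > "$hiverc" <<'EOF'
ADD JAR /path/to/blah.jar;
CREATE TEMPORARY FUNCTION my_udf AS 'com.example.MyUdf';
EOF

# Wrapper that always loads the init file before running the user's command.
# Interactive users could get the same effect from an alias such as
# alias hive='hive -i /etc/hiverc'.
hive_with_init() {
    hive -i "$hiverc" "$@"
}
```

Users then call `hive_with_init -e '...'` (or the alias) and get the jar and temporary function without typing the setup statements themselves.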

Install hive-jars to local maven repository

2012-04-10 Thread Sam William
Are hive jars available on any public maven repos ? If not, is there a way to ask ant to install the built jars to my local ~/.m2/repository ? Sam William sa...@stumbleupon.com

Re: Install hive-jars to local maven repository

2012-04-10 Thread Sam William
Oops. sorry .. . Found multiple repos with hive jars. Thanks Sam On Apr 10, 2012, at 12:37 PM, Edward Capriolo wrote: > Yes hive is in maven. > Is a great site with a search form: > http://mvnrepository.com/artifact/org.apache.hive/hive-common > > On Tue, Apr 10, 2012 at 3:34

HQL macro UDFS (HIVE-2655)

2012-04-30 Thread Sam William
general purpose functions on top of pure JDK API. Eg: string/date/math functions. I'm hoping this is doable with the HIVE-2655 patch? Sam William sa...@stumbleupon.com

ENV variables from CLI

2012-05-02 Thread Sam William
est option . Sam William sa...@stumbleupon.com

Re: ENV variables from CLI

2012-05-03 Thread Sam William
Thanks guys, adding the 'env:' in my 'add jar' works. Sam On May 3, 2012, at 7:35 AM, Edward Capriolo wrote: > That is generally how you set hiveconf. Env variables can be accessed this > way. > > hive> set x=${env:HOME}; > hive> set x; > x

Text file with ctrl chat as delimiter

2012-06-19 Thread Sam William
terminated by '\u0001' stored as textfile location '/tmp/myloc'; did not work . Thanks Sam William sa...@stumbleupon.com

Re: Text file with ctrl chat as delimiter

2012-06-20 Thread Sam William
ctrl char as the delimiter. Mapred Learn, Yes I did have the word 'external' in the create table statement. Thanks, Sam On Jun 20, 2012, at 6:24 AM, Mark Grover wrote: > Sam, > If you can please post a row or two of your data along with any errors you > are get

Re: Text file with ctrl chat as delimiter

2012-06-21 Thread Sam William
Wow .. This works thanks.. Sam On Jun 20, 2012, at 5:01 PM, Mapred Learn wrote: > Hi Sam, > Could you try '\001' instead of '\u0001' ? > > Sent from my iPhone > > On Jun 20, 2012, at 3:57 PM, Sam William wrote: > >> >> >> Mark, >
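
A small sketch of the fix confirmed above, under stated assumptions: the table name `mytable`, its two columns, and the sample values are hypothetical, while the location '/tmp/myloc' and the '\001' delimiter come from the thread. The first part generates Ctrl-A delimited sample rows and checks them with `cut`; the second writes the DDL to a file that could be passed to `hive -f` on a host with Hive.

```shell
# Build two sample rows whose fields are separated by Ctrl-A (octal 001).
sample_file="$(mktemp)"
printf 'alice\001admin\nbob\001guest\n' > "$sample_file"

# Confirm the delimiter by extracting the second field of each row.
ctrl_a="$(printf '\001')"
cut -d "$ctrl_a" -f 2 "$sample_file"

# DDL using the octal escape '\001' rather than '\u0001'.
# The quoted 'EOF' keeps the backslash literal in the generated file.
ddl_file="$(mktemp)"
cat > "$ddl_file" <<'EOF'
CREATE EXTERNAL TABLE mytable (name STRING, role STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
STORED AS TEXTFILE
LOCATION '/tmp/myloc';
EOF
# On a Hive host: hive -f "$ddl_file"
```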

pipeout files

2012-09-07 Thread Sam Darwin
any time? What might happen if a pipeout file is removed that shouldn't be removed? 2. Is it entirely up to the admin to log-rotate these? Why aren't they rotated by default when you install the packages? Thanks, Sam

hive table missing

2012-09-09 Thread Sam Darwin
use an automatic deletion. That would certainly be terrible. :-) But I hope you see what I am getting at: other ways a table might be lost, besides someone typing the "drop table" command in a Hive session. Thanks, Sam

0.8.0 -> 0.9.0 mysql schema upgrade

2013-01-04 Thread Sam William
't have a default value. The upgrade script from 0.8 to 0.9 doesn't have anything? What am I missing? Sam William sa...@stumbleupon.com

Re: 0.8.0 -> 0.9.0 mysql schema upgrade

2013-01-04 Thread Sam William
Looks like this column is not even there in the 0.8/0.9 schema files. I have no idea how I have it in my schema. I just set a default 'false' value and I'm fine now. Sam On Jan 4, 2013, at 2:22 PM, Sam William wrote: > When I upgraded to 0.9.0, I'm getting an exception

Optimize hive external tables with serde

2014-10-21 Thread Ja Sam
*Part 1: my environment* I have the following files uploaded to Hadoop: 1. They are plain text 2. Each line contains JSON like: {code:[int], customerId:[string], data:{[something more here]}} 1. code are numbers from 1 to 3000, 2. customerId are total up to 4 million, daily up to 0.5 mil

Setting job diagnostics to REDUCE capability required - error in hive

2014-11-07 Thread Ja Sam
I have a simple query with grouping. Something similar to the one below: SELECT col1, col2, col3, min(date), count(*) FROM tblX WHERE partitionDate="20141107" GROUP BY col1, col2, col3; When I run this query through WebHCat everything works fine. But when I try to run it from hive shell I have

Re: Setting job diagnostics to REDUCE capability required - error in hive

2014-11-07 Thread Ja Sam
ails please ? setting config parameters optimally in > yarn/mr configs might help you but please do so wisely as it may imbalance > other things if not implemented thoughtfully. > > regards > Devopam > > On Fri, Nov 7, 2014 at 7:56 PM, Ja Sam wrote: > >> I have a simple

Re: Setting job diagnostics to REDUCE capability required - error in hive

2014-11-07 Thread Ja Sam
I found the problem. I had a different configuration in yarn-site.xml on the namenode than in the same file on the datanodes. I still don't know why, but this is easy to fix. On Fri, Nov 7, 2014 at 3:41 PM, Ja Sam wrote: > I don't use any scheduler. Anyway this error happens when we try to run

Need suggestions on processing JSON junk (e.g., invalid double quotes) data using HIVE

2015-10-22 Thread Sam Joe
Hi, After streaming twitter data to HDFS using Flume, I'm trying to analyze it using some HIVE queries. The data is in JSON format and not clean having double quotes (") in wrong places causing the HIVE queries to fail. I am getting the following error: Failed with exception java.io.IOException:o

Re: Need suggestions on processing JSON junk (e.g., invalid double quotes) data using HIVE

2015-10-22 Thread Sam Joe
aus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521) at org.codehaus.jackson.impl.JsonParserMinimalBase._reportInvalidEOF(JsonParserMinimalBase.java:454) at org.codehaus.jackson.impl.ReaderBasedParser._parseFieldName2(ReaderBasedParser.java:1025)

Reading JSON data & org.apache.hadoop.hive.contrib.serde2.JsonSerde

2015-10-23 Thread Sam Joe
Hi, Does *org.apache.hadoop.hive.contrib.serde2.JsonSerde* come with features of reading nested data? Also, could you please help me with a location to download the jar for: *org.apache.hadoop.hive.contrib.serde2.JsonSerde*? Appreciate your help! Thanks, Joel

Using json_tuple for Nested json Arrays

2015-10-27 Thread Sam Joe
Hi, Is it possible to use json_tuple function to extract data from json arrays (nested too). I am trying to process json data as string and avoid using serdes since user data may be malformed. Please see a sample json data given below: { "filter_level": "low", "retweeted": false, "in_reply_t

Re: Using json_tuple for Nested json Arrays

2015-10-27 Thread Sam Joe
I tried using EXPLODE function on the nested json array but it doesn't work and throws following error: FAILED: UDFArgumentException explode() takes an array or a map as a parameter Thanks, Joel On Tue, Oct 27, 2015 at 3:20 PM, Sam Joe wrote: > Hi, > > Is it possible to

Re: Using json_tuple for Nested json Arrays

2015-10-27 Thread Sam Joe
as in these examples: > > > http://mechanics.flite.com/blog/2014/04/16/using-explode-and-lateral-view-in-hive/ > > > http://stackoverflow.com/questions/28716165/how-to-query-struct-array-with-hive-get-json-object > > > > > > *From:* Sam Joe [mailto:games2013@gmai

Re: Using json_tuple for Nested json Arrays

2015-10-27 Thread Sam Joe
> > SELECT get_json_object(text_col, '$.id') as id FROM tweets_raw limit 10; > > > > You should also be able to use json_tuple(), but start simple > > > > *From:* Sam Joe [mailto:games2013@gmail.com] > *Sent:* Tuesday, October 27, 2015 1:43 PM >

Re: Using json_tuple for Nested json Arrays

2015-10-27 Thread Sam Joe
e_user_id_str":"16864598","indices":[143,144],"source_status_id_str":"654301626665189376","source_status_id":654301626665189376,"id_str":"654301608994586624"},{"sizes":{"thumb":{"w":150,"resize

Re: Using json_tuple for Nested json Arrays

2015-10-27 Thread Sam Joe
working query below > and then work on getting the lateral view explode to work against the temp > table. > > > > FAILED: UDFArgumentException explode() takes an array or a map as a > parameter > > Apparently, hive doesn't think tr3.media is an array or map..s

Re: Using json_tuple for Nested json Arrays

2015-10-27 Thread Sam Joe
Aggarwal wrote: > Hello Sam, > You can easily achieve this by using elephant-bird.jars in pig. We are > also capturing tweets via flume and filter them using pig and elephant-jars. > You can find the related jars over internet. > > Cheers, > Nishant Aggarwal > On 28 Oct 2015