Re: Serde moved? version 2.3.0

2017-10-25 Thread Stephen Sprague
ractSerDe, unless the API has changed such that >> such a mapping cannot be done. >> >> Regards, >> Matt >> >> >> >> On Oct 25, 2017, at 7:31 PM, Owen O'Malley >> wrote: >> >> >> On Wed, Oct 25, 2017 at 3:20 PM, Stephen S

Fwd: Serde moved? version 2.3.0

2017-10-25 Thread Stephen Sprague
hould use AbstractSerDe instead. > > .. Owen > > On Oct 25, 2017, at 2:18 PM, Stephen Sprague wrote: > > hey guys, > > could be a dumb question but not being a java type of guy i'm not quite > sure about it. I'm upgrading from 2.1.0 to 2.3.0 and encountering this

Serde moved? version 2.3.0

2017-10-25 Thread Stephen Sprague
hey guys, could be a dumb question but not being a java type of guy i'm not quite sure about it. I'm upgrading from 2.1.0 to 2.3.0 and encountering this error: class not found: org/apache/hadoop/hive/serde2/SerDe so in hive 2.1.0 i see it in this jar: * hive-serde-2.1.0.jar org/apache/hadoop/hi

Re: hive on spark - why is it so hard?

2017-10-01 Thread Stephen Sprague
Now its a matter of comparing the performance with Tez. Cheers, Stephen. On Wed, Sep 27, 2017 at 9:37 PM, Stephen Sprague wrote: > ok.. getting further. seems now i have to deploy hive to all nodes in the > cluster - don't think i had to do that before but not a big deal to do it > now.

Re: hive on spark - why is it so hard?

2017-09-27 Thread Stephen Sprague
bly a compatibility issue. i know. i know. no surprise here. so i guess i just got to the point where everybody else is... build spark w/o hive. lemme see what happens next. On Wed, Sep 27, 2017 at 7:41 PM, Stephen Sprague wrote: > thanks. I haven't had a chance to dig into this agai

Re: hive on spark - why is it so hard?

2017-09-27 Thread Stephen Sprague
look at the HoS Remote Driver logs. The driver > gets launched in a YARN container (assuming you are running Spark in > yarn-client mode), so you just have to find the logs for that container. > > --Sahil > > On Tue, Sep 26, 2017 at 9:17 PM, Stephen Sprague > wrote: > >>

Re: hive on spark - why is it so hard?

2017-09-26 Thread Stephen Sprague
lelism.getSparkMemoryAndCores(SetSparkReducerParallelism.java:236) [hive-exec-2.3.0.jar:2.3.0] i'll dig some more tomorrow. On Tue, Sep 26, 2017 at 8:23 PM, Stephen Sprague wrote: > oh. i missed Gopal's reply. oy... that sounds forebodin

Re: hive on spark - why is it so hard?

2017-09-26 Thread Stephen Sprague
oh. i missed Gopal's reply. oy... that sounds foreboding. I'll keep you posted on my progress. On Tue, Sep 26, 2017 at 4:40 PM, Gopal Vijayaraghavan wrote: > Hi, > > > org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get a > spark session: org.apache.hadoop.hive.ql.metadata.HiveExc

Re: hive on spark - why is it so hard?

2017-09-26 Thread Stephen Sprague
u > have older versions of Spark installed locally? > > --Sahil > > On Tue, Sep 26, 2017 at 3:33 PM, Stephen Sprague > wrote: > >> thanks Sahil. here it is. >> >> Exception in thread "main" java.lang.NoClassDefFoundError: >> org/apache/spark/sc

Re: hive on spark - why is it so hard?

2017-09-26 Thread Stephen Sprague
h more recent versions > of Spark, but we only test with Spark 2.0.0. > > --Sahil > > On Tue, Sep 26, 2017 at 2:35 PM, Stephen Sprague > wrote: > >> * i've installed hive 2.3 and spark 2.2 >> >> * i've read this doc plenty of times -> https://cwi

hive on spark - why is it so hard?

2017-09-26 Thread Stephen Sprague
* i've installed hive 2.3 and spark 2.2 * i've read this doc plenty of times -> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started * i run this query: hive --hiveconf hive.root.logger=DEBUG,console -e 'set hive.execution.engine=spark; select date_key, count(*) f

group by + two nulls in a row = bug?

2017-06-27 Thread Stephen Sprague
i'm running hive version 2.1.0 and found this interesting. i've broken it down into a trivial test case below. i run this: select a.date_key, a.property_id, cast(NULL as bigint) as malone_id, cast(NULL as bigint) as zpid, su

Re: any hive release imminent?

2017-06-20 Thread Stephen Sprague
e quickly followed up by Hive 2.3, which will > be more aggressive with features, but less stable. > > .. Owen > > On Mon, Jun 19, 2017 at 7:53 PM, Stephen Sprague > wrote: > >> Hey guys, >> Is there any word out on the street about a timeframe for the next

any hive release imminent?

2017-06-19 Thread Stephen Sprague
Hey guys, Is there any word out on the street about a timeframe for the next 2.x hive release? Looks like Dec 2016 was the last one. The natives are getting restless i think. :) thanks, Stephen.

Re: How to setup the max memory for my big Hive SQL which is on MapReduce of Yarn

2017-06-06 Thread Stephen Sprague
have you researched the yarn schedulers? namely the capacity and fair schedulers? those are the places where resource limits can be easily defined. On Mon, Jun 5, 2017 at 9:25 PM, Chang.Wu <583424...@qq.com> wrote: > My Hive engine is MapReduce and Yarn. What my urgent need is to limit the > mem

Re: drop table - external - aws

2017-05-17 Thread Stephen Sprague
jgaonkar > wrote: > >> This is interesting and possibly a bug. Did you try changing them to >> managed tables and then dropping or truncating them? How do we reproduce >> this on our setup? >> >> On Tue, May 16, 2017 at 6:38 PM, Stephen Sprague >> wrote: >

Re: drop table - external - aws

2017-05-17 Thread Stephen Sprague
nd then dropping or truncating them? How do we reproduce > this on our setup? > > On Tue, May 16, 2017 at 6:38 PM, Stephen Sprague > wrote: > >> fwiw. i ended up re-creating the ec2 cluster with that same host name >> just so i could drop those tables from the metastore

Re: drop table - external - aws

2017-05-16 Thread Stephen Sprague
at 6:38 AM, Stephen Sprague wrote: > hey guys, > here's something bizarre. i created about 200 external tables with a > location something like this 'hdfs:///path'. this was three > months ago and now i'm revisiting and want to drop these tables. > > ha! no can

drop table - external - aws

2017-05-16 Thread Stephen Sprague
hey guys, here's something bizarre. i created about 200 external tables with a location something like this 'hdfs:///path'. this was three months ago and now i'm revisiting and want to drop these tables. ha! no can do! that is long gone. Upon issuing the drop table command i get this: Error

Re: hive on spark - version question

2017-03-17 Thread Stephen Sprague
ge developers. If you make more $$s it makes sense > learning this stuff is supposed to be harder. > > Conclusion, don't try it. Or try using Tez/Hive instead of Spark/Hive if > you are querying large files. > > > > On Friday, March 17, 2017 11:33 AM, Stephen Sprague

Re: hive on spark - version question

2017-03-17 Thread Stephen Sprague
it based on previous >> experiences) >> >> But in hindsight, people who work on this kinds of things typically make >> more money that the average developers. If you make more $$s it makes sense >> learning this stuff is supposed to be harder. >> >> Conclu

Re: hive on spark - version question

2017-03-17 Thread Stephen Sprague
:( gettin' no love on this one. any SME's know if Spark 2.1.0 will work with Hive 2.1.0 ? That JavaSparkListener class looks like a deal breaker to me, alas. thanks in advance. Cheers, Stephen. On Mon, Mar 13, 2017 at 10:32 PM, Stephen Sprague wrote: > hi guys, > wonderin

hive on spark - version question

2017-03-13 Thread Stephen Sprague
hi guys, wondering where we stand with Hive On Spark these days? i'm trying to run Spark 2.1.0 with Hive 2.1.0 (purely coincidental versions) and running up against this class not found: java.lang.NoClassDefFoundError: org/apache/spark/JavaSparkListener searching the Cyber i find this: 1. h

random KILL's in YARN

2017-01-18 Thread Stephen Sprague
hey guys, I have a question on why Hiveserver2 would issue a "killjob" signal. We run Yarn on Hadoop 5.6 with the HiveServer2 process. It uses the fair-scheduler. Pre-emption is turned off. At least twice a day we have jobs that are randomly killed. they can be big jobs, they can be small ones. t

Re: tez + union stmt

2016-12-25 Thread Stephen Sprague
s you have not specified the >> location). >> >> Adding user@hive.apache.org as this is hive related. >> >> >> ~Rajesh.B >> >> On Sun, Dec 25, 2016 at 12:08 AM, Stephen Sprague >> wrote: >> >> all, >> >> i'm running tez

Re: Maintaining big and complex Hive queries

2016-12-21 Thread Stephen Sprague
my 2 cents. :) as soon as you say "complex query" i would submit you've lost the upperhand and you're behind the eight-ball right off the bat. And you know this too otherwise you wouldn't have posted here. ha! i use cascading CTAS statements so that i can examine the intermediate tables. Anothe

Re: [ANNOUNCE] Apache Hive 2.1.1 Released

2016-12-08 Thread Stephen Sprague
Ahh. thank you. On Thu, Dec 8, 2016 at 3:19 PM, Alan Gates wrote: > Apache keeps just the latest version of each release on the mirrors. You > can find all Hive releases at https://archive.apache.org/dist/hive/ if > you need 2.1.0. > > Alan. > > > On Dec 8, 2016, a

Re: [ANNOUNCE] Apache Hive 2.1.1 Released

2016-12-08 Thread Stephen Sprague
out of curiosity any reason why release 2.1.0 disappeared from apache.claz.org/hive ? apologies if i missed the conversation about it. thanks. [image: Inline image 1] On Thu, Dec 8, 2016 at 9:58 AM, Jesus Camacho Rodriguez wrote: > The Apache Hive team is proud to announce the release of Apa

Re: s3a and hive

2016-11-15 Thread Stephen Sprague
. Anyway, I reset that back to hdfs and was inserting into an external table located in s3 and *still* got that error above much to my consternation. however, by playing with "hive.exec.stagingdir" (and reading that stackoverflow) i was able to overcome the error. YMMV. Cheers, Stephen.

Re: s3a and hive

2016-11-15 Thread Stephen Sprague
a/browse/HADOOP-13345> >- Use Hive on EMR with Amazon's S3 filesystem implementation and >EMRFS. Note that this confusingly requires and overloads the 's3://' > scheme. > > Hope this helps, and please report back with any findings as we are doing > quite a bit

Re: s3a and hive

2016-11-15 Thread Stephen Sprague
th success? seems to me hive 2.2.0 and perhaps hadoop 2.7 or 2.8 are the only chances of success but i'm happy to be told i'm wrong. thanks, Stephen. On Mon, Nov 14, 2016 at 10:25 PM, Jörn Franke wrote: > Is it a permission issue on the folder? > > On 15 Nov 2016, at 06:28,

s3a and hive

2016-11-14 Thread Stephen Sprague
so i figured i try and set hive.metastore.warehouse.dir=s3a://bucket/hive and see what would happen. running this query: insert overwrite table omniture.hit_data_aws partition (date_key=20161113) select * from staging.hit_data_aws_ext_20161113 limit 1; yields this error: Failed with exce

Re: a GROUP BY that is not fully grouping

2016-11-03 Thread Stephen Sprague
ha! kinda shows how the tech stack boundaries now are getting blurred, eh? well at least for us amateurs! :o On Thu, Nov 3, 2016 at 5:00 AM, Donald Matthews wrote: > |Spark calls its SQL part HiveContext, but it is not related to this > list > > Oof, I didn't realize that. Thanks for letti

Re: hiveserver2 GC overhead limit exceeded

2016-10-23 Thread Stephen Sprague
ok. i'll bite. lets see the output of this command where Hiveserver2 is running. $ ps -ef | grep -i hiveserver2 this'll show us all the command line parameters HS2 was (ultimately) invoked with. Cheers, Stephen On Sun, Oct 23, 2016 at 6:46 AM, patcharee wrote: > Hi, > > I use beeline to conn

hiveserver2 and KILLJOB

2016-10-05 Thread Stephen Sprague
hey guys, this is a long shot but i'll ask anyway. We're running YARN and HiveServer2 (v2.1.0) and noticing "random" kills - what looks to me - being issued by HiveServer2. we've turned DEBUG log level on for the Application Master container and see the following in the logs: 2016-10-05 02:06:1

Re: How do I determine a library mismatch between jdbc client and server?

2016-09-28 Thread Stephen Sprague
you might just end up using your own heuristics. if the port is "alive" (ie. you can list it via netstat or telnet to it) but you can't connect... then you got yourself a problem. kinda like a bootstrapping problem, eh? you need to connect to get the version but you can't connect if you don't hav
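
A minimal sketch of the "is the port alive" half of that heuristic, in Python. The helper name port_is_open is hypothetical, and the host and port in the comment are placeholders (10000 is the usual HiveServer2 default). A True result only says something is listening; it says nothing about whether client and server libraries match.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection performs the same handshake a telnet probe would.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out, or the name did not resolve.
        return False


# Example call (placeholder host): port_is_open("hs2.example.com", 10000)
```

If this returns True but the JDBC connection still fails, the listener is up and the problem lies further along, which is the situation the reply describes.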

Re: Hive queries rejected under heavy load

2016-09-28 Thread Stephen Sprague
gotta start by looking at the logs and run the local client to eliminate HS2. perhaps running hive as such: $ hive -hiveconf hive.root.logger=DEBUG,console do you see any smoking gun? On Wed, Sep 28, 2016 at 7:34 AM, Jose Rozanec wrote: > Hi, > > We have a Hive cluster (Hive 2.1.0+Tez 0.8.4)

Re: Hive 2.x usage

2016-09-14 Thread Stephen Sprague
> * Are you using Hive-2.x at your org and at what scale? yes. we're using 2.1.0. 1.5PB. 30 node cluster. ~1000 jobs a day.And yeah hive 2.1.0 has some issues and can require some finesse wrt the hive-site.xml settings. > * Is the release stable enough? Did you notice any correctness issue

Re: Re: load data Failed with exception java.lang.IndexOutOfBoundsException

2016-09-08 Thread Stephen Sprague
>at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat. validateInput(OrcInputFormat.java:508) would it be safe to assume that you are trying to load a text file into an table stored as ORC? your create table doesn't specify that explicitly so that means you have a setting in your configs that says

Re: hive.root.logger influencing query plan?? so it's not so

2016-09-04 Thread Stephen Sprague
for the query to hang. * so empty result expected. as Gopal mentioned previously this does indeed fix it: * set hive.fetch.task.conversion=none; but not sure its the right thing to set globally just yet. Anyhoo users beware. Regards, Stephen On Wed, Aug 31, 2016 at 7:01 AM, Stephen Spragu

Re: Beeline throws OOM on large input query

2016-09-02 Thread Stephen Sprague
hmmm. so beeline blew up *before* the query was even submitted to the execution engine? one would think 16G would be plenty for an 8M row sql statement. some suggestions if you feel like going further down the rabbit hole. 1. confirm your beeline java process is indeed running with expanded memory (

Re: Beeline throws OOM on large input query

2016-09-01 Thread Stephen Sprague
lemme guess. your query contains an 'in' clause with 1 million static values? :) * brute force solution is to set: HADOOP_CLIENT_OPTS=-Xmx8G (or whatever) before you run beeline to force a larger memory size (i'm pretty sure beeline uses that env var though i didn't actually check the script)

Re: Quota for rogue ad-hoc queries

2016-09-01 Thread Stephen Sprague
> rogue queries so this really isn't limited to just hive is it? any dbms system perhaps has to contend with this. even malicious rogue queries as a matter of fact. timeouts are a cheap way systems handle this - assuming time is related to resource. i'm sure beeline or whatever client you use has

Re: hive.root.logger influencing query plan?? so it's not so

2016-08-31 Thread Stephen Sprague
ons. > > Cheers, > Vlad > > --- > From: Stephen Sprague > To: "user@hive.apache.org" > Cc: > Date: Tue, 30 Aug 2016 20:28:50 -0700 > Subject: hive.root.logger influencing query plan?? so it's not so > Hi guys, > I've banged my head

hive.root.logger influencing query plan?? so it's not so

2016-08-30 Thread Stephen Sprague
Hi guys, I've banged my head on this one all day and i need to surrender. I have a query that hangs (never returns). However, when i turn on logging to DEBUG level it works. I'm stumped. I include here the query, the different query plans (with the only thing different being the log level) and

Re: hive 2.1.0 + drop view

2016-08-26 Thread Stephen Sprague
> >> http://talebzadehmich.wordpress.com >> >> >> *Disclaimer:* Use it at your own risk. Any and all responsibility for >> any loss, damage or destruction of data or any other property which may >> arise from relying on this email's technical content is expl

Re: hive 2.1.0 + drop view

2016-08-26 Thread Stephen Sprague
ich may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > On 26 August 2016 at 20:32, Stephen Sprague wrote

Re: hive 2.1.0 + drop view

2016-08-26 Thread Stephen Sprague
arying VIEW_EXPANDED_TEXT | text | VIEW_ORIGINAL_TEXT | text | {quote} wonder if i can perform some surgery here. :o do i feel lucky? On Fri, Aug 26, 2016 at 12:28 PM, Stephen Sprague wrote: > well that doesn't bode well. :( > > we definitely

Re: hive 2.1.0 + drop view

2016-08-26 Thread Stephen Sprague
ion of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > On 26 August 2016 at 16:43, Stephe

Re: hive 2.1.0 + drop view

2016-08-26 Thread Stephen Sprague
thanks Gopal. you're right our metastore is using Postgres. very interesting you were able to intuit that! lemme give your suggestions a try and i'll post back. thanks! Stephen On Fri, Aug 26, 2016 at 8:32 AM, Gopal Vijayaraghavan wrote: > > NULL::character%20varying) > ... > > i want to say

hive 2.1.0 + drop view

2016-08-26 Thread Stephen Sprague
hey guys, this one's a little more strange. hive> create view foo_vw as select * from foo; OK Time taken: 0.376 seconds hive> drop view foo_vw; FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.IllegalArgumentException: java.net.URI

Re: hive 2.1.0 and "NOT IN ( list )" and column is a partition_key

2016-08-25 Thread Stephen Sprague
s on master. I’ll file a bug... > > From: Stephen Sprague > Reply-To: "user@hive.apache.org" > Date: Thursday, August 25, 2016 at 13:34 > To: "user@hive.apache.org" > Subject: Re: hive 2.1.0 and "NOT IN ( list )" and column is a > partition_key &

Re: hive 2.1.0 and "NOT IN ( list )" and column is a partition_key

2016-08-25 Thread Stephen Sprague
Hi Gopal, Thank you for this insight. good stuff. The thing is there is no 'foo' for etl_database_source so that filter if anything should be short-circuited to 'true'. ie. double nots. 1. not in 2. and foo not present. it doesn't matter what i put in that "not in" clause the filter al

hive 2.1.0 and "NOT IN ( list )" and column is a partition_key

2016-08-25 Thread Stephen Sprague
anybody run up against this one? hive 2.1.0 + using a "not in" on a list + the column is a partition key participant. * using not query: explain SELECT count(*) FROM bi.fact_email_funnel WHERE event_date_key = 20160824 AND etl_source_database *not* in ('foo') output frag: Map Opera

Re: hive throws ConcurrentModificationException when executing insert overwrite table

2016-08-17 Thread Stephen Sprague
indeed +1 to Gopal on that explanation! That was huge. On Wed, Aug 17, 2016 at 12:58 AM, 明浩 冯 wrote: > Hi Gopal, > > > It works when I disabled the dfs.namenode.acls. > > For the data loss, it doesn't affect me too much currently. But I will > track the issue in Kylin. > > Thank you very much fo

Re: JsonSerDe and mapping tweet's user structure error

2016-08-16 Thread Stephen Sprague
stackoverflow is your friend. that said have a peek at the doc even :) cf. https://cwiki.apache.org/ confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Keywords,Non- reservedKeywordsandReservedKeywords paying close attention to this paragraph: {quote} Reserved keywords are permitted a

Re: hiver errors

2016-08-10 Thread Stephen Sprague
this error message says everything you need to know: >Likely cause: new client talking to old server. Continuing without it. when you upgrade hive you also need to upgrade the metastore schema. failing to do that can trigger the message you're getting. On Wed, Aug 10, 2016 at 6:41 AM, Mich Tale

Re: beeline/hiveserver2 + logging

2016-08-10 Thread Stephen Sprague
Hi Gopal, Aha! thank you for background behind this. that makes things much more understandable. and ~3000 queries across 10 HS2 servers. sweet. now that's what i call pushing the edge. I like it! Thanks again, Stephen. On Tue, Aug 9, 2016 at 10:29 PM, Gopal Vijayaraghavan wrote: > > not get

Re: beeline/hiveserver2 + logging

2016-08-09 Thread Stephen Sprague
g seek out that operation_log dir & associated file. Thanks, Stephen. On Tue, Aug 9, 2016 at 6:44 PM, Stephen Sprague wrote: > well, well. i just found this: https://issues.apache.org/ > jira/browse/HIVE-14183 seems something changed between 1.2.1 and > 2.1.0. > > i'll see

Re: beeline/hiveserver2 + logging

2016-08-09 Thread Stephen Sprague
well, well. i just found this: https://issues.apache.org/jira/browse/HIVE-14183 seems something changed between 1.2.1 and 2.1.0. i'll see if the Rx as prescribed in that ticket does indeed work for me. Thanks, Stephen. On Tue, Aug 9, 2016 at 5:12 PM, Stephen Sprague wrote: > hey guy

beeline/hiveserver2 + logging

2016-08-09 Thread Stephen Sprague
hey guys, try as i might i cannot seem to get beeline (via jdbc) to log information back from hiveserver2 like job_id, progress and that kind of information (similar to what the local beeline or hive clients do.) i see this ticket that is closed: https://issues.apache.org/jira/browse/HIVE-7615 wh

Re: msck repair table and hive v2.1.0

2016-07-14 Thread Stephen Sprague
ad=0"? You mentioned that your tables are in s3, > but the external table created was pointing to HDFS. Was that intentional? > > ~Rajesh.B > > On Fri, Jul 15, 2016 at 6:58 AM, Stephen Sprague > wrote: > >> in the meantime given my tables are in s3 i've

Re: msck repair table and hive v2.1.0

2016-07-14 Thread Stephen Sprague
t msck repair tables does but in a non-portable way. oh well. gotta do what ya gotta do. On Wed, Jul 13, 2016 at 9:29 PM, Stephen Sprague wrote: > hey guys, > i'm using hive version 2.1.0 and i can't seem to get msck repair table to > work. no matter what i try i get the '

msck repair table and hive v2.1.0

2016-07-13 Thread Stephen Sprague
hey guys, i'm using hive version 2.1.0 and i can't seem to get msck repair table to work. no matter what i try i get the 'ol NPE. I've set the log level to 'DEBUG' but yet i still am not seeing any smoking gun. would anyone here have any pointers or suggestions to figure out what's going wrong?

Tez issues with beeline via HS2

2016-02-17 Thread Stephen Sprague
Hi guys, it was suggested i post to the user@hive group rather than the user@tez group for this one. Here's my issue. My query hangs when using beeline via HS2 (but works with the local beeline client). I'd like to overcome that. This is my query: beeline -u 'jdbc:hive2:// dwrdevnn1.sv2.trui

Re: Hive on Spark Engine versus Spark using Hive metastore

2016-02-03 Thread Stephen Sprague
i refuse to take anybody seriously who has a sig file longer than one line and that there is just plain repugnant. On Wed, Feb 3, 2016 at 1:47 PM, Mich Talebzadeh wrote: > I just did some further tests joining a 5 million rows FACT tables with 2 > DIMENSION tables. > > > > SELECT t.calendar_mon

Re: Hive job name

2015-04-07 Thread Stephen Sprague
" Sure setting mapreduce.job.name explicitly is a workaround but... that's a boat load of code changes! Would not there be a "fix" to roll this back to how it got the job.name before? Thanks, Stephen Sprague On Wed, Mar 11, 2015 at 1:38 PM, Viral Bajaria wrote: > I haven't used

Re: bug in hive

2014-09-20 Thread Stephen Sprague
great policy. install open source software that's not even version 1.0 into production and then not allow the ability to improve it (but of course reap all the rewards of its benefits.) so instead of actually fixing the problem the right way introduce a super-hack work-around cuz, you know, that's

Re: Mysql - Hive Sync

2014-09-06 Thread Stephen Sprague
og and save it under hive warehouse > as table and query from there. > > > > *RegardsMuthupandi.K* > > [image: Picture (Device Independent Bitmap)] > > > > On Sat, Sep 6, 2014 at 4:47 AM, Stephen Sprague > wrote: > >> great find, Muthu. I would be interes

Re: Mysql - Hive Sync

2014-09-05 Thread Stephen Sprague
great find, Muthu. I would be interested in hearing about any success or failures using this adapter. almost sounds too good to be true. After reading the blog ( http://innovating-technology.blogspot.com/2013/04/mysql-hadoop-applier-part-2.html) about it i see it comes with caveats and it loo

Re: ODBC Calls Extremely Slow

2014-08-15 Thread Stephen Sprague
what container are you using for your metastore? Derby, mysql or postgres? for a large set of tables don't use Derby. So you've confirmed its the ODBC driver and not the metastore itself? On Fri, Aug 15, 2014 at 8:54 AM, Bradley Wright wrote: > Try an eval of our commercial ODBC driver for Hiv

Re: Altering the Metastore on EC2

2014-08-14 Thread Stephen Sprague
i'll take a stab at this. - probably no reason. - if you can. is there a derby client s/t you can issue the command: "alter table COLUMNS_V2 modify TYPE_NAME varchar(32672)". otherwise maybe use the mysql or postgres metastores (instead of derby) and run that alter command after the install. - t

Re: Hive Server 2 memory leak

2014-06-11 Thread Stephen Sprague
searching this list will in fact show you're not alone. what is being done about it is another matter. On Wed, Jun 11, 2014 at 10:42 AM, Benjamin Bowman wrote: > All, > > I am running Hadoop 2.4 and Hive 0.13. I consistently run out of Hive > heap space when running for a long period of time

Re: Predicate pushdown optimisation not working for ORC

2014-04-03 Thread Stephen Sprague
wow. good find. i hope these config settings are well documented and that you didn't have to spend a lot of time searching for that. Interesting that the default isn't true for this one. On Wed, Apr 2, 2014 at 11:00 PM, Abhay Bansal wrote: > I was able to resolve the issue by setting "hive.optimize

Re: MSCK REPAIR TABLE

2014-03-27 Thread Stephen Sprague
fwiw. i would not have the repair table statement as part of a production job stream. That's kinda a poor man's way to employ dynamic partitioning off the back end. Why not either use hive's dynamic partitioning features or pre-declare your partitions? that way you are explicitly coding for your

Re: Partitioned table to partitioned table

2014-03-26 Thread Stephen Sprague
the error message is correct. remember the partition columns are not stored with the data and by doing a "select *" that's what you're doing. And this has nothing to do with ORC either; it's a Hive thing. :) so your second approach was close. just omit the partition columns yr, mo, day. On Wed, Mar 26

Re: Improving self join time

2014-03-20 Thread Stephen Sprague
g > > select key from (query result that doesn't contain the key field) ... > > > On Thu, Mar 20, 2014 at 1:28 PM, Stephen Sprague wrote: > >> I agree with your assessment of the inner query. why stop there though? >> Doesn't the outer query fetch the ids of the tag

Re: Improving self join time

2014-03-20 Thread Stephen Sprague
e a > list of duplicate elements and their counts, but it loses the information > as to what id had these elements. > > I'm trying to find which pairs of ids have any duplicate tags. > > > On Thu, Mar 20, 2014 at 11:57 AM, Stephen Sprague wrote: > >> hmm.

Re: Improving self join time

2014-03-20 Thread Stephen Sprague
hmm. would this not fall under the general problem of identifying duplicates? Would something like this meet your needs? (untested) select -- outer query finds the ids for the duplicates key from ( -- inner query lists duplicate values select count(*) as cnt, value
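
The query in this snippet is cut off, so the following is not the original query, only a sketch of the same inner/outer pattern. It runs against an in-memory SQLite database as a stand-in for Hive; the table name tags, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (id INTEGER, value TEXT)")  # hypothetical schema
conn.executemany(
    "INSERT INTO tags VALUES (?, ?)",
    [(1, "red"), (2, "red"), (2, "blue"), (3, "green")],
)

rows = conn.execute(
    """
    SELECT t.id, t.value          -- outer query: ids that carry a duplicated value
    FROM tags t
    JOIN (
        SELECT value              -- inner query: values that occur more than once
        FROM tags
        GROUP BY value
        HAVING COUNT(*) > 1
    ) d ON t.value = d.value
    ORDER BY t.id
    """
).fetchall()

print(rows)
```

Joining the duplicated values back to the base table is what recovers the ids, which a bare GROUP BY on the value alone would lose.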

Re: computing median and percentiles

2014-03-20 Thread Stephen Sprague
> maintains the count, how can Hive be used to derive the percentile? > Value Count > 100 2 > 200 4 > 300 1 > > Thanks, > Seema > > From: Stephen Sprague > Reply-To: "user@hive.apache.org" > Date: Thursday
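
The arithmetic behind the question can be sketched outside Hive. The Python below assumes a nearest-rank definition of percentile, which is one common convention and may not match what Hive's own percentile functions return; the function name is hypothetical and the data is the Value/Count table quoted above.

```python
import math


def percentile_from_counts(pairs, p):
    """Nearest-rank percentile (0 < p <= 100) from (value, count) pairs."""
    pairs = sorted(pairs)
    total = sum(count for _, count in pairs)
    # 1-based rank of the element to pick among all `total` observations.
    rank = max(math.ceil(p / 100 * total), 1)
    running = 0
    for value, count in pairs:
        running += count          # cumulative count up to and including value
        if running >= rank:
            return value
    raise ValueError("no data")


data = [(100, 2), (200, 4), (300, 1)]
median = percentile_from_counts(data, 50)
```

With 7 observations in total, the median is the 4th one in sorted order, and the cumulative counts 2, 6, 7 place it in the 200 bucket. The same cumulative-count idea carries over to SQL as a running sum over the counts.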

Re: computing median and percentiles

2014-03-19 Thread Stephen Sprague
not a hive question is it? its more like a math question. On Wed, Mar 19, 2014 at 1:30 PM, Seema Datar wrote: > > > I understand the percentile function is supported in Hive in the latest > versions. However, how does one calculate percentiles when the data is > across two columns. So say

Re: Trouble with transform and dynamic partitions into table with custom field delimiters.

2014-03-18 Thread Stephen Sprague
but why go through all this and make it so long-winded, verbose and non-standard? That's a pain to maintain! just use tabs as your transform in/out separator and go easy on the next guy who has to maintain your code. :) On Tue, Mar 18, 2014 at 4:59 PM, Nurdin Premji < nurdin.pre...@casalemedia.

Re: Writing data to LOCAL with Hive Server2

2014-03-14 Thread Stephen Sprague
ver2 is running on. Basically, there is no way to reach those files > from our boxes. That's why I was asking about writing it locally. > I'll check this list for import/export like you mentioned. > > Thanks. > > > On Friday, March 14, 2014 12:23 PM, Stephen Spra

Re: Writing data to LOCAL with Hive Server2

2014-03-14 Thread Stephen Sprague
re: HiveServer2 this is not natively possible (this falls under the export rubric.) similarly, you can't load a file directly from your client using native syntax (import.) Believe me, you're not the only one who'd like both of these functions. :) I'd search this list for import or export ut

Re: Hive - Sorting on the Partition Column data type Int . Output is Alphabetic Sort

2014-03-14 Thread Stephen Sprague
luck! On Fri, Mar 14, 2014 at 4:21 AM, Nitin Pawar wrote: > Can you first try updating hive to atleast 0.11 if you can not move to > 0.12 ? > > > On Fri, Mar 14, 2014 at 4:49 PM, Arafat, Moiz wrote: > >> My comments inline >> >> >> >> *From:* Ste

Re: Hive - Sorting on the Partition Column data type Int . Output is Alphabetic Sort

2014-03-13 Thread Stephen Sprague
/partition_hr=1 > > $ hadoop fs -copyFromLocal test.dat > /user/moiztcs/moiz_partition_test/partition_hr=10 > > $ hadoop fs -copyFromLocal test.dat > /user/moiztcs/moiz_partition_test/partition_hr=2 > > > > 5) hive> select distinct partition_hr from moiz_partition_test ord

additional hive functions

2014-03-12 Thread Stephen Sprague
just a public service announcement. I had a case where i had a nested json array in a string and i needed that to act like a first class array in hive. natively, you can pull it out but it'll just be a string. woe is me. I searched around the web and found this: http://stackoverflow.com/questions/1

Re: full outer join result

2014-03-12 Thread Stephen Sprague
2, 2014 at 9:36 AM, Stephen Sprague wrote: > interesting.don't know the answer but could you change the UNION in > the Postgres to UNION ALL? I'd be curious if the default is UNION DISTINCT > on that platform. That would at least partially explain postgres behaviour > lea

Re: full outer join result

2014-03-12 Thread Stephen Sprague
interesting.don't know the answer but could you change the UNION in the Postgres to UNION ALL? I'd be curious if the default is UNION DISTINCT on that platform. That would at least partially explain postgres behaviour leaving hive the odd man out. On Wed, Mar 12, 2014 at 6:47 AM, Martin Kud
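
The difference the reply is asking about can be seen in a small sketch. It uses an in-memory SQLite database purely as a stand-in (not Postgres or Hive), with hypothetical tables a and b; in standard SQL a plain UNION removes duplicate rows while UNION ALL keeps them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (x INTEGER)")  # hypothetical tables
conn.execute("CREATE TABLE b (x INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO b VALUES (?)", [(2,), (3,)])

# Plain UNION: the row 2 appears in both inputs but only once in the result.
union_rows = conn.execute(
    "SELECT x FROM a UNION SELECT x FROM b ORDER BY x"
).fetchall()

# UNION ALL: every input row is kept, so 2 appears twice.
union_all_rows = conn.execute(
    "SELECT x FROM a UNION ALL SELECT x FROM b ORDER BY x"
).fetchall()

print(union_rows, union_all_rows)
```

If two engines disagree on a row count for what looks like the same query, checking which of the two forms each one actually runs is a cheap first step.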

Re: Hive - Sorting on the Partition Column data type Int . Output is Alphabetic Sort

2014-03-12 Thread Stephen Sprague
est.dat /user/moiztcs/moiz_partition_test/02 > > hadoop fs -copyFromLocal test.dat /user/moiztcs/moiz_partition_test/10 > > > > 4) Ran the sql > > hive> select distinct partition_hr from moiz_partition_test order by > partition_hr; > > Ended Job > > OK

Re: Hive - Sorting on the Partition Column data type Int . Output is Alphabetic Sort

2014-03-11 Thread Stephen Sprague
that makes no sense. if the column is an int it isn't going to sort like a string. I smell a user error somewhere. On Tue, Mar 11, 2014 at 6:21 AM, Arafat, Moiz wrote: > Hi , > > I have a table that has a partition column partition_hr . Data Type is int > (partition_hrint) . When i run
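
For reference, the two orderings being contrasted look like this. The sketch is plain Python, not Hive, and uses the partition_hr values 1, 10 and 2 that appear elsewhere in this thread.

```python
hours = ["1", "10", "2"]

# Compared as strings, "10" sorts before "2" because "1" < "2" character-wise.
as_strings = sorted(hours)

# Compared as integers, the order is numeric.
as_ints = sorted(int(h) for h in hours)

print(as_strings, as_ints)
```

Seeing 1, 10, 2 in query output is therefore a sign the values are being compared as strings somewhere along the way.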

Re: bucketed table problems

2014-03-07 Thread Stephen Sprague
short answer: its by position.

Re: bucketed table problems

2014-03-07 Thread Stephen Sprague
yeah. that's not right. 1. lets see the output of "show create table foo" 2. what version of hive are you using. On Fri, Mar 7, 2014 at 11:46 AM, Keith Wiley wrote: > I want to convert a table to a bucketed table, so I made a new table with > the same schema as the old table and specified a c

Re: HIVE QUERY HELP:: HOW TO IMPLEMENT THIS CASE

2014-03-04 Thread Stephen Sprague
0') ) -- no where clause needed on 'AGE' since its part of the where clause in the -- derived table. {code} i switched your ON clause and WHERE clause so be sure to take that under consideration. And finally its not tested. Best of luck. Cheers, Stephen On Tue, Mar 4, 2014 a

Re: HIVE QUERY HELP:: HOW TO IMPLEMENT THIS CASE

2014-03-04 Thread Stephen Sprague
Let's just say this. Coercing hive into doing something it's not meant to do is kinda a waste of time. Sure you can rewrite any update as a delete/insert but that's not the point of Hive. Seems like you're going down a path here that's not optimal for your situation. You know, I could buy a Tesla a

Re: move hive tables from one cluster to another cluster

2014-02-28 Thread Stephen Sprague
that advice is way over complicating something that is very easy. instead, please take this approach. 1. run the ddl to create the table on the new cluster 2. distcp the hdfs data into the appropriate hdfs directory. 3. run "msck repair table " in hive to discover the partitions and populate the m

Re: Hive + Flume

2014-02-28 Thread Stephen Sprague
if you can configure flume to create temporary files that start with an underscore (_) i believe hive will safely ignore them. otherwise you have to write a script to move them out. On Fri, Feb 28, 2014 at 11:09 AM, P lva wrote: > Hi, > > I have a flume stream that stores data in a directory whi
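
A sketch of the filtering rule described above, written in Python for illustration only. It assumes files whose names start with an underscore are skipped, as the reply believes; the leading-dot case is an additional assumption, and the function name and sample file names are hypothetical.

```python
def visible_files(names):
    """Keep only file names that do not start with '_' or '.'."""
    return [n for n in names if not n.startswith(("_", "."))]


listing = ["_tmp.1393614000", "events.1393610400", ".hidden"]
print(visible_files(listing))
```

Under that assumption, having flume write in-progress files with an underscore prefix and rename them on close would keep half-written data out of query results.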

Re: move hive tables from one cluster to another cluster

2014-02-28 Thread Stephen Sprague
this is a FAQ. see doc on: msck repair table this will scan hdfs and create the corresponding partitions in the metastore. On Fri, Feb 28, 2014 at 12:59 AM, shashwat shriparv < dwivedishash...@gmail.com> wrote: > Where was your meta data in derby or MySql? > > > *Warm Regards_**∞_* > * Shashw

Re: Metastore performance on HDFS-backed table with 15000+ partitions

2014-02-22 Thread Stephen Sprague
yeah. That traceback pretty much spells it out - its metastore related and that's where the partitions are stored. I'm with the others on this. HiveServer2 is still a little jankey on memory management. I bounce mine once a day at midnight just to play it safe (and because i can.) Again, for me,

Re: Slow performance on queries with aggregation function

2014-02-21 Thread Stephen Sprague
Hi Jone, um. i can say for sure something is wrong. :) i would _start_ by going to the tasktracker. this is your friend. find your job and look for failed reducers. That's the starting point anyway, IMHO. On Fri, Feb 21, 2014 at 11:35 AM, Jone Lura wrote: > Hi, > > I have tried some variat
