Next gen metastore

2022-04-02 Thread Edward Capriolo
While not active in the development community as much, I have been using hive in the field, as well as spark and impala, for some time. My anecdotal opinion is that the current metastore needs a significant rewrite to deal with "next generation" workloads. By next generation I actually mean last g

Re: How can I know use execute or executeQuery

2021-09-15 Thread Edward Capriolo
In most SQL drivers, you can always use executeQuery even if the query has no result set. On Wednesday, September 15, 2021, Alessandro Solimando < alessandro.solima...@gmail.com> wrote: > Hi Igyu, > sending different SQL statements is exactly what beeline has to handle, > I'd have a look at how

Re: [EXTERNAL] Re: Any plan for new hive 3 or 4 release?

2021-03-11 Thread Edward Capriolo
of updating a dependency broke one > project, downgrading it broke a different project. > > https://github.com/apache/druid/pull/10683 > > HDFS-15790 > > On Sat, Feb 27, 2021, 7:31 PM Matt McCline > wrote: > >> Yes to Hive 4 release. Plenty of changes (1,500+). >>

Re: Any plan for new hive 3 or 4 release?

2021-02-27 Thread Edward Capriolo
t; It will be amazing if the community could produce a release every > quarter/6months. :-) > > Le ven. 26 févr. 2021 à 14:30, Edward Capriolo a > écrit : > >> Hive was releasable trunk for the longest time. Facebook days. Then the >> big data vendors got more involved. Then it

Re: Any plan for new hive 3 or 4 release?

2021-02-26 Thread Edward Capriolo
Hive was a releasable trunk for the longest time. Facebook days. Then the big data vendors got more involved. Then it became a pissing match about features. This vendor likes tez, this vendor doesn't; this vendor likes hive on spark, this one doesn't. Then this vendor wants to tell everyone hive stinks use

Avro tables with 5k columns any tips?

2021-02-24 Thread Edward Capriolo
Hello all, It has been a long time. I have been forced to use avro and create a table with over 5k columns. It's helluva slow. I warned folks that all the best practices say "don't make a table more than 1k or 2k columns" (impala hive cloudera). No one listened to me, so now the table is a mess. Im

Re: Article on the correctness of Hive on MR3, Presto, and Impala

2019-06-26 Thread Edward Capriolo
I like the approach of applying an arbitrary limit. Hive's q files tend to add an ordering to everything. Would it make sense to simply order by multiple columns in the result set and conduct a large diff on them? On Wednesday, June 26, 2019, Sungwoo Park wrote: > I have published a new article

Re: Hive on Tez vs Impala

2019-04-16 Thread Edward Capriolo
I have changed jobs 3 times since tez was introduced. It is a true waste of compute resources and time that it was never patched in. So I either have to waste my time patching it in, waste my time running a side deployment, or not install it and waste money having queries run longer on mr/spark

Re: Hive on Tez vs Impala

2019-04-15 Thread Edward Capriolo
n source hadoop, which will not work if you've > kerberized the cluster. You'll have to build a version of ATS against CDH > libraries that provides the classes needed to run the engine. We have done > this work as well and it runs pretty smoothly. > > > > On Mon, Apr

Re: Hive on Tez vs Impala

2019-04-15 Thread Edward Capriolo
Out of band question. Given: https://hortonworks.com/blog/welcome-brand-new-cloudera/ Does CDH finally ship with a Tez you don't have to manually patch in? On Monday, April 15, 2019, Sungwoo Park wrote: > I tested the performance of Impala 2.12.0+cdh5.15.2+0 in Cloudera CDH > 5.15.2 a while ago.

Re: Creating temp tables in select statements

2019-03-28 Thread Edward Capriolo
I made a UDTF a while back that lets you specify lists of tuples; from there you can explode them into rows. On Thursday, March 28, 2019, Jesus Camacho Rodriguez < jcamachorodrig...@hortonworks.com> wrote: > Depending on the version you are using, table + values syntax is supported. > > https://

Re: Announce: MR3 0.6 released

2019-03-23 Thread Edward Capriolo
Thanks! Very cool. On Sat, Mar 23, 2019 at 1:33 PM Sungwoo Park wrote: > I am pleased to announce the release of MR3 0.6. New key features are: > > - In Hive on Kubernetes, DAGAppMaster can run in its own Pod. > - MR3-UI requires only Timeline Server. > - Hive on MR3 is much more stable because

Re: just released: Docker image of a minimal Hive server

2019-02-21 Thread Edward Capriolo
9 11:59 AM >> *To:* user@hive.apache.org >> *Subject:* Re: just released: Docker image of a minimal Hive server >> >> Hello! >> >> If that might help, I did this repo a while ago: >> https://github.com/FurcyPin/docker-hive-spark >> It provides a pre-i

Re: just released: Docker image of a minimal Hive server

2019-02-21 Thread Edward Capriolo
Good deal and great name! On Thu, Feb 21, 2019 at 11:31 AM Aidan L Feldman (CENSUS/ADEP FED) < aidan.l.feld...@census.gov> wrote: > Hi there- > > I am a new Hive user, working at the US Census Bureau. I was interested in > getting Hive running locally, but wanted to keep the dependencies isolated

Re: Is 'application' a reserved word?

2018-05-30 Thread Edward Capriolo
We got bit pretty hard when "exchange partitions" was added. How many people in ad-tech work with exchanges? Everyone! On Wed, May 30, 2018 at 1:38 PM, Alan Gates wrote: > It is. You can see the definitive list of keywords at > https://github.com/apache/hive/blob/master/ql/src/java/ > org/apac

Re: Hive, Tez, clustering, buckets, and Presto

2018-04-03 Thread Edward Capriolo
True. The spec does not mandate the bucket files have to be there if they are empty. (missing directories are 0 row tables). Thanks, Edward On Tue, Apr 3, 2018 at 4:42 PM, Richard A. Bross wrote: > Gopal, > > The Presto devs say they are willing to make the changes to adhere to the > Hive bucke

Re: Proposal: File based metastore

2018-01-30 Thread Edward Capriolo
ta in S3. They put all of the metadata in S3 except for a single link to >> the name of the table's root metadata file. >> >> Other advantages of their design: >> >>- Efficient atomic addition and removal of files in S3. >> - Consistent schema evolution

Re: Proposal: File based metastore

2018-01-29 Thread Edward Capriolo
On Mon, Jan 29, 2018 at 12:44 PM, Owen O'Malley wrote: > > > On Jan 29, 2018, at 9:29 AM, Edward Capriolo > wrote: > > > > On Mon, Jan 29, 2018 at 12:10 PM, Owen O'Malley > wrote: > >> You should really look at what the Netflix guys are doin

Re: Proposal: File based metastore

2018-01-29 Thread Edward Capriolo
g and bucketing. > > > .. Owen > > On Sun, Jan 28, 2018 at 12:02 PM, Edward Capriolo > wrote: > >> All, >> >> I have been bouncing around the earth for a while and have had the >> privilege of working at 4-5 places. On arrival each place was in a var

Proposal: File based metastore

2018-01-28 Thread Edward Capriolo
All, I have been bouncing around the earth for a while and have had the privilege of working at 4-5 places. On arrival each place was in a variety of states in their hadoop journey. One large company that I was at had a ~200 TB hadoop cluster. They actually ran PIG and their ops group REFUSED to

Re: Format dillema

2017-06-23 Thread Edward Capriolo
"You're off by a couple of orders of magnitude - in fact, that was my last year's Hadoop Summit demo, 10 terabytes of Text on S3, converted to ORC + LLAP." "We've got sub-second SQL execution, sub-second compiles, sub-second submissions … with all of it adding up to a single or double digit second

Re: Format dillema

2017-06-23 Thread Edward Capriolo
"Yes, it's a tautology - if you cared about performance, you'd use ORC, because ORC is the fastest format." It is not that simple. The average Hadoop user has 6-7 years of data. They do not have a "magic" convert everything button. They also have legacy processes that don't/can't be converted. The

Re: Format dillema

2017-06-20 Thread Edward Capriolo
"Hive 3.x branch has text vectorization and LLAP cache support for it, so hopefully the only relevant concern about Text will be the storage costs due to poor compression (& the lack of updates)." I kept hearing about vectorization, but later found out it was only going to work if I used ORC. Litterall

Re: Format dillema

2017-06-20 Thread Edward Capriolo
ause it is the ONLY format all tools support 2) makes two outputs for each query using 2x space (Can someone please make a competitor for Oozie? *grin*) https://github.com/apache/incubator-airflow , mrjobs, luigi, azkaban :) On Tue, Jun 20, 2017 at 1:45 PM, Owen O'Malley wrote: > > &

Re: Format dillema

2017-06-20 Thread Edward Capriolo
It is whack that two optimized row columnar formats exist and each respective project (hive/impala) has good support for one and lame/no support for the other. Impala is now an Apache project. Also 'whack' and 'lame' are technical terms often used by the people in the real world that have to use

Re: Pro and Cons of using HBase table as an external table in HIVE

2017-06-09 Thread Edward Capriolo
Think about it like this: one system is scanning a local ORC file, the other is using an hbase scanner (over the network) and scanning the data in sstable format. On Fri, Jun 9, 2017 at 5:50 AM, Amey Barve wrote: > Hi Michael, > > "If there is predicate pushdown, then you will be faster, assuming that > the

Re: FYI: Backports of Hive UDFs

2017-06-06 Thread Edward Capriolo
oad-maven- > artifact-via-classloader > https://github.com/treasure-data/digdag/tree/master/ > digdag-core/src/main/java/io/digdag/core/plugin > > Thanks, > Makoto > > 2017-06-03 3:26 GMT+09:00 Edward Capriolo >: > > Don't we currently support features that load fu

Re: FYI: Backports of Hive UDFs

2017-06-02 Thread Edward Capriolo
Don't we currently support features that load functions from external places like maven, http server, etc.? I wonder if it would be easier to backport that than to backport a handful of functions? On Fri, Jun 2, 2017 at 2:22 PM, Alan Gates wrote: > Rather than put that code in hive/contrib I was thinkin

Re: Migrating Variable Length Files to Hive

2017-06-02 Thread Edward Capriolo
On Fri, Jun 2, 2017 at 12:07 PM, Nishanth S wrote: > Hello hive users, > > We are looking at migrating files(less than 5 Mb of data in total) with > variable record lengths from a mainframe system to hive.You could think of > this as metadata.Each of these records can have columns ranging from

Re: What is the corresponding java/scala code to 'lateral view explode' in hql ?

2017-05-31 Thread Edward Capriolo
For each row, for each element in the exploded list, emit 1 row. Row r = { col1 }; Set x = [a, b, c]; for (Object y : x) { emit(r, y); } If the size of set x is 3, three rows are output. On Wed, May 31, 2017 at 8:31 AM, 张明磊 <18717838...@163.com> wrote: > Hello experts, > I was wondering to know
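The loop described above can be sketched as plain Java. This is a minimal illustration of the row semantics only, not Hive's explode UDTF; the class ExplodeSketch, its Row type, and the outer flag are hypothetical names chosen for the example. The outer flag mimics LATERAL VIEW OUTER, which still emits one row (with a NULL element) when the array is empty.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of LATERAL VIEW explode row semantics (hypothetical class, not Hive's UDTF code). */
public class ExplodeSketch {

    /** Hypothetical input row: one scalar column and one array column. */
    public static final class Row {
        final String col1;
        final List<String> items;

        public Row(String col1, List<String> items) {
            this.col1 = col1;
            this.items = items;
        }
    }

    /**
     * Emits one (col1, element) pair per array element of each input row.
     * With outer == true, a row whose array is null or empty still yields one
     * pair with a null element, like LATERAL VIEW OUTER; otherwise it yields none.
     */
    public static List<String[]> explode(List<Row> rows, boolean outer) {
        List<String[]> out = new ArrayList<>();
        for (Row r : rows) {
            if (r.items == null || r.items.isEmpty()) {
                if (outer) {
                    out.add(new String[] {r.col1, null});
                }
                continue;
            }
            for (String y : r.items) {
                out.add(new String[] {r.col1, y});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("r1", List.of("a", "b", "c")),
                new Row("r2", List.of()));
        System.out.println(explode(rows, false).size()); // 3
        System.out.println(explode(rows, true).size());  // 4
    }
}
```

The three-element array produces three output rows; the empty array produces nothing unless the outer variant is used.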

Re: drop table - external - aws

2017-05-17 Thread Edward Capriolo
I'm pretty sure schema tool does this for people who convert to an HA NameNode. On Wednesday, May 17, 2017, Neil Jonkers wrote: > Hi, > > Inspecting the Hive Metastore tables. > Table SDS has a location field. > > If for reason this does not work: > "ALTER TABLE ... SET LOCATION ... ?" > > Manually

Re: How can i merge multiple rows to one row in sparksql or hivesql?

2017-05-15 Thread Edward Capriolo
Here is how I did something similar, though not exactly the same, to what you did. I had two data files in different formats, and the different columns needed to be different features. I wanted to feed them into spark's: https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Frequent_Pattern_Mining/The_FP-

Re: Hive LLAP with Parquet format

2017-05-04 Thread Edward Capriolo
The parquet orc thing has to be the biggest detractor. You're forced to choose between a format good for impala or good for hive. On May 4, 2017 3:57 PM, "Gopal Vijayaraghavan" wrote: > Hi, > > > > Does Hive LLAP work with Parquet format as well? > > > > LLAP does work with the Parquet format, but

Re: Error with Hive 2.1.1 and Spark 2.1

2017-04-18 Thread Edward Capriolo
On Tue, Apr 18, 2017 at 3:32 PM, hernan saab wrote: > The effort of configuring an apache big data system by hand for your > particular needs is equivalent to herding rattlesnakes and cats into one > small room. > The documentation is poor and most of the time the community developers > don't rea

Re: [ANNOUNCE] Apache Hive 1.2.2 Released

2017-04-08 Thread Edward Capriolo
Nice job On Saturday, April 8, 2017, Vaibhav Gumashta wrote: > The Apache Hive team is proud to announce the release of Apache Hive version > 1.2.2. > > The Apache Hive (TM) data warehouse software facilitates querying and > managing large datasets residing in distributed storage. Built on top

Re: Hive SerDe maven dependency

2017-03-29 Thread Edward Capriolo
You should match your hive versions as closely as possible. It makes sense that both hive and hadoop dependencies use a PROVIDED scope; this way, if you are building an assembly/fat/shaded jar, the jar is as thin as possible. On Wed, Mar 29, 2017 at 3:01 PM, srinu reddy wrote: > > > Hi > > I want to

Re: hive on spark - version question

2017-03-17 Thread Edward Capriolo
On Fri, Mar 17, 2017 at 2:56 PM, hernan saab wrote: > I have been in a similar world of pain. Basically, I tried to use an > external Hive to have user access controls with a spark engine. > At the end, I realized that it was a better idea to use apache tez instead > of a spark engine for my part

Data brinks loves showing charts saying they are faster then hive

2017-02-28 Thread Edward Capriolo
https://databricks.com/blog/2017/02/28/voice-facebook-using-apache-spark-large-scale-language-model-training.html?utm_campaign=Open%20Source&utm_content=47640295&utm_medium=social&utm_source=twitter They always neglect to include the fact that spark has a complete copy of hive inside of it!

[Discuss] tez jars ship with hive in indivisable fashion

2017-02-24 Thread Edward Capriolo
There are a few hadoop vendors that make it an unnecessary burden on users to get tez running. This forces users to compile and patch in tez support. Imho this is shameful. These same vendors include all types of extra add-ins like, say, hbase or even mongo support. This 'creative packaging' only serve

Only External tables can have an explicit location

2017-01-25 Thread Edward Capriolo
Error 40003]: Only External tables can have an explicit location. Using hive 1.2, I got this error. This was definitely not a requirement before. Why was this added? EXTERNAL used to mean only that dropping the table will not drop the physical files.

Re: Maintaining big and complex Hive queries

2016-12-21 Thread Edward Capriolo
I have been contemplating attaching meta data for the query lineage to each table such that I can know where the data came from and have a 1 click regenerate button. On Wed, Dec 21, 2016 at 3:02 PM, Stephen Sprague wrote: > my 2 cents. :) > > as soon as you say "complex query" i would submit you

Re: Hive Serialization issues

2016-11-23 Thread Edward Capriolo
I believe json itself has encoding rules. What I suggest you do is build your own input format or serde and escape those fields, possibly by converting them to hex. On Wednesday, November 23, 2016, Dana Ram Meghwal wrote: > Hey, > Any leads? > > On Tue, Nov 22, 2016 at 5:35 PM, Dana Ram Meghwal >
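The hex-escaping idea can be sketched as a pair of helper functions that a custom serde or input format might call. This is a minimal sketch under assumptions: HexFieldEscape, toHex, and fromHex are hypothetical names, not a Hive API. Hex output contains only [0-9a-f], so quotes, newlines, and delimiters inside the original value can no longer break the JSON or row framing.

```java
import java.nio.charset.StandardCharsets;

/** Sketch of hex-escaping a problematic field (hypothetical helper, not a Hive API). */
public class HexFieldEscape {

    /** Encodes the UTF-8 bytes of a value as lowercase hex, two characters per byte. */
    public static String toHex(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff)); // mask to treat the byte as unsigned
        }
        return sb.toString();
    }

    /** Reverses toHex; assumes well-formed, even-length hex input. */
    public static String fromHex(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "say \"hi\"\n";
        String escaped = toHex(raw);
        System.out.println(escaped);
        System.out.println(fromHex(escaped).equals(raw)); // true
    }
}
```

The trade-off is that the escaped field doubles in size and must be decoded before it is readable, so this suits a few troublesome columns rather than whole rows.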

Re: Adding a New Primitive Type in Hive

2016-11-19 Thread Edward Capriolo
Technically very doable; timestamp and decimal types have been added over the years. It actually turns out to be a fair amount of work, mostly due to the proliferation of serdes that need to be able to read/write that type. On Sat, Nov 19, 2016 at 4:11 PM, Juan Delard de Rigoulières < j...@da

Re: hive transactional table compaction fails

2016-11-07 Thread Edward Capriolo
On Mon, Nov 7, 2016 at 1:46 PM, Eugene Koifman wrote: > can you check if the user that the metastore is running as has right to > write to the table dir? > > On 10/26/16, 12:23 PM, "aft" wrote: > > >On Thu, Oct 27, 2016 at 12:00 AM, Eugene Koifman > > wrote: > >> could you provide output of ³SHO

Re: Big Data Event London, 3-4th November 2016 from Tomorrow

2016-11-02 Thread Edward Capriolo
Mich, Looking through the event, a few talks seem to be about hadoop and none mention hive. I understand how hive and this conference relate, but I believe this is off topic for the hive mailing list. Thank you, Edward On Wednesday, November 2, 2016, Mich Talebzadeh wrote: > Hi, > > For tho

Re: Quota for rogue ad-hoc queries

2016-09-01 Thread Edward Capriolo
I have written nagios scripts that watch the job tracker UI and report when things take too long. On Thu, Sep 1, 2016 at 11:08 AM, Loïc Chanel wrote: > On the topic of timeout, if I may say, they are a dangerous way to deal > with requests as a "good" request may last longer than an "evil" one.

Re: does Hive implement any Combiner by default?

2016-08-08 Thread Edward Capriolo
Hive uses map side aggregation instead of combiners http://dev.bizo.com/2013/02/map-side-aggregations-in-apache-hive.html On Mon, Aug 8, 2016 at 2:59 PM, Edson Ramiro wrote: > hi all, > > I'm executing TPC-H on Hive 2.0.1, using Yarn 2.7, and I'm wondering if > Hive implements any Combiner by d
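Map-side aggregation (enabled in Hive by the hive.map.aggr setting) can be sketched for a COUNT(*) ... GROUP BY key. Instead of emitting (key, 1) per record and relying on a combiner, the mapper keeps partial counts in a hash map and emits them when the map fills up or the input ends. This is a simplified sketch, not Hive's GroupByOperator: MapSideAggSketch is a hypothetical class, and its maxEntries parameter is a stand-in for Hive's memory-based flush threshold.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of map-side hash aggregation for a per-key count (hypothetical class). */
public class MapSideAggSketch {

    /** Mapper side: returns the partial (key, count) pairs it would emit. */
    public static List<Map.Entry<String, Long>> mapSide(List<String> keys, int maxEntries) {
        List<Map.Entry<String, Long>> emitted = new ArrayList<>();
        Map<String, Long> partials = new HashMap<>();
        for (String key : keys) {
            partials.merge(key, 1L, Long::sum);
            if (partials.size() >= maxEntries) {
                flush(partials, emitted); // hash table "full": emit partials early
            }
        }
        flush(partials, emitted); // end of input
        return emitted;
    }

    private static void flush(Map<String, Long> partials, List<Map.Entry<String, Long>> emitted) {
        partials.forEach((k, v) -> emitted.add(Map.entry(k, v)));
        partials.clear();
    }

    /** Reducer side: sums partial counts per key. */
    public static Map<String, Long> reduce(List<Map.Entry<String, Long>> partials) {
        Map<String, Long> totals = new HashMap<>();
        for (Map.Entry<String, Long> e : partials) {
            totals.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "a", "c", "a");
        System.out.println(mapSide(keys, 100).size()); // 3 partial records instead of 5
        System.out.println(reduce(mapSide(keys, 2)));  // totals are the same either way
    }
}
```

Flushing early only increases the number of partial records shipped to the reducers; the final totals do not change, which is why the flush threshold can be tuned purely for memory.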

Re: Re: hive will die or not?

2016-08-07 Thread Edward Capriolo
A few entities going to "kill/take out/better than hive": I seem to remember HadoopDb, Impala, RedShift, voltdb... But apparently hive is still around and probably faster http://www.slideshare.net/hortonworks/hive-on-spark-is-blazing-fast-or-is-it-final On Sun, Aug 7, 2016 at 9:49 PM, 理 wrote:

Re: hue / hive issue with sqlite

2016-08-07 Thread Edward Capriolo
e format in here is > parquet, and am able to view sample sets for each column and raw select > queries are working just fine, but none of min / max / distinct / where 'd > work. > > Thanks, > > On Sun, Aug 7, 2016 at 6:38 PM, Edward Capriolo > wrote: > >> You ne

Re: hue / hive issue with sqlite

2016-08-07 Thread Edward Capriolo
You need to take this up with the appropriate hue/cloudera user group. One issue is that SQLite is an embedded, single-user database and does not work well with more than one user. We switched to postgres in our deployment and would still hit this issue. I never got it resolved, On Sun, Aug 7, 2016

Re: A dedicated Web UI interface for Hive

2016-07-15 Thread Edward Capriolo
I built one a long time ago; it is still in tree but very defunct. Best bet is working with hue or whatever Horton is pushing, so as not to fragment 4 ways. On Jul 15, 2016 12:56 PM, "Mich Talebzadeh" wrote: > Hi Marcin, > > For Hive on Spark I can use Spark 1.3.1 UI which does not have DAG diagram > (later vers

Re: Analyzing Bitcoin blockchain data with Hive

2016-05-01 Thread Edward Capriolo
Good stuff! On Fri, Apr 29, 2016 at 1:30 PM, Jörn Franke wrote: > Dear all, > > I prepared a small Serde to analyze Bitcoin blockchain data with Hive: > > https://snippetessay.wordpress.com/2016/04/28/hive-bitcoin-analytics-on-blockchain-data-with-sql/ > > There are some example queries, but I w

Re: [VOTE] Bylaws change to allow some commits without review

2016-04-22 Thread Edward Capriolo
+1 On Friday, April 22, 2016, Lars Francke wrote: > Yet another update. I went through the PMC list. > > These seven have not been active (still the same list as Vikram posted > during the last vote): > Ashish Thusoo > Kevin Wilfong > He Yongqiang > Namit Jain > Joydeep Sensarma > Ning Zhang > R

Re: Using Hive SerDe dependent on Protobuf 2.6

2016-03-22 Thread Edward Capriolo
Both thrift and protobuf are wire compatible but NOT classpath compatible; you need to make sure that you are using one version (even down to the minor version) across all your codebase. On Tue, Mar 22, 2016 at 12:05 PM, kalai selvi wrote: > Hi, > > I am using Hive 0.13 in Amazon EMR. I am stuck

Re: Column type conversion in Hive

2016-03-21 Thread Edward Capriolo
Explicit conversion is done using cast (x as bigint). You said: As a matter of interest what is the underlying storage for Integer? On disk this is dictated by the input format; the "temporal in-memory format" is dictated by the serde. An integer could be stored as "1", "1" , as dictated by the Inp

Re: Simple UDFS and IN Operator

2016-03-08 Thread Edward Capriolo
The IN UDF is a special one in that unlike many others there is support in the ANTLR language and parsers for it. The rough answer is it can be done but it is not as direct as making other UDFs. On Tue, Mar 8, 2016 at 2:32 PM, Lavelle, Shawn wrote: > Hello All, > >I hope that this question

Re: Hive and Impala

2016-03-01 Thread Edward Capriolo
My knocks on impala. (not intended to be a post knocking impala) Impala really has not delivered on the complex types that hive has (after promising it for quite a while); also it only works with the 'blessed' input formats, parquet, avro, text. It is very annoying to work with impala. In my versi

Re: TBLPROPERTIES K/V Comprehensive List

2016-02-19 Thread Edward Capriolo
There is no comprehensive list; each serde could use the parameters for whatever it desires, while other serdes use none at all. On Fri, Feb 19, 2016 at 3:23 PM, mahender bigdata < mahender.bigd...@outlook.com> wrote: > +1, Any information available ? > > On 2/10/2016 1:26 AM, Mathan Rajendran wr

Re: Hive on Spark Engine versus Spark using Hive metastore

2016-02-04 Thread Edward Capriolo
metastore >> >> >> >> yeah but have you ever seen somewhat write a real analytical program in >> hive? how? where are the basic abstractions to wrap up a large amount of >> operations (joins, groupby's) into a single function call? where are the >>

Re: Hive on Spark Engine versus Spark using Hive metastore

2016-02-04 Thread Edward Capriolo
Lol, very offbeat convo for the hive list. Let's not drag ourselves too far down here. On Wednesday, February 3, 2016, Stephen Sprague wrote: > i refuse to take anybody seriously who has a sig file longer than one line > and that there is just plain repugnant. > > On Wed, Feb 3, 2016 at 1:47 PM,

Re: Hive on Spark Engine versus Spark using Hive metastore

2016-02-03 Thread Edward Capriolo
t for that? >> >> for example in spark i can write a DataFrame => DataFrame that internally >> does many joins, groupBys and complex operations. all unit tested and >> perfectly re-usable. and in hive? copy paste round sql queries? thats just >> dangerous. >> >> On

Re: Hive on Spark Engine versus Spark using Hive metastore

2016-02-02 Thread Edward Capriolo
Hive has numerous extension points, you are not boxed in by a long shot. On Tuesday, February 2, 2016, Koert Kuipers wrote: > uuuhm with spark using Hive metastore you actually have a real > programming environment and you can write real functions, versus just being > boxed into some version of

Re: Convert string to map

2016-01-20 Thread Edward Capriolo
create table ().. [COLLECTION ITEMS TERMINATED BY char] [MAP KEYS TERMINATED BY char] collection items terminated by ',' map keys terminated by ':' works in many cases On Wed, Jan 20, 2016 at 9:07 PM, Buntu Dev wrote: > I found the brickhouse Hive udf `json_map' that seems to conver
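With COLLECTION ITEMS TERMINATED BY ',' and MAP KEYS TERMINATED BY ':', a stored value such as k1:v1,k2:v2 is read back as a map. The parsing can be sketched as below; DelimitedMapSketch and toMap are hypothetical names, and the handling of an item with no key delimiter is an assumption of this sketch, not documented Hive behavior. For a string column in an existing table, Hive's built-in str_to_map(text, ',', ':') performs a similar conversion at query time.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Sketch of turning "k1:v1,k2:v2" into a map using item and key delimiters (hypothetical helper). */
public class DelimitedMapSketch {

    public static Map<String, String> toMap(String text, String itemDelim, String keyDelim) {
        Map<String, String> out = new LinkedHashMap<>(); // keeps the on-disk item order
        if (text == null || text.isEmpty()) {
            return out;
        }
        for (String item : text.split(Pattern.quote(itemDelim))) {
            int idx = item.indexOf(keyDelim); // split each item on the first key delimiter only
            if (idx < 0) {
                out.put(item, null); // assumption: an item without a key delimiter maps to null
            } else {
                out.put(item.substring(0, idx), item.substring(idx + keyDelim.length()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(toMap("k1:v1,k2:v2", ",", ":")); // {k1=v1, k2=v2}
    }
}
```

Splitting each item on the first key delimiter only means a value such as 12:30 survives intact, but a key containing the delimiter cannot be represented; that is the usual limitation of delimiter-based formats.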

Re: adding jars - hive on spark cdh 5.4.3

2016-01-08 Thread Edward Capriolo
d ticket and come up with a more elegant solution. On Fri, Jan 8, 2016 at 12:26 PM, Ophir Etzion wrote: > Thanks! > In certain use cases you could but forgot about the aux thing, thats > probably it. > > On Fri, Jan 8, 2016 at 12:24 PM, Edward Capriolo > wrote: > >>

Re: adding jars - hive on spark cdh 5.4.3

2016-01-08 Thread Edward Capriolo
You cannot 'add jar' input formats and serdes. They need to be part of your auxlib. On Fri, Jan 8, 2016 at 12:19 PM, Ophir Etzion wrote: > I tried now. still getting > > 16/01/08 16:37:34 ERROR exec.Utilities: Failed to load plan: > hdfs://hadoop-alidoro-nn-vip/tmp/hive/hive/c2af9882-38a9-42b

Re: Seeing strange limit

2015-12-30 Thread Edward Capriolo
TS" > > > > # The following applies to multiple commands (fs, dfs, fsck, distcp etc) > > export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS" > > > > and HADOOP_HEAPSIZE=4096 > > > > I’m assuming just raising the above would work. > > >

Re: Seeing strange limit

2015-12-30 Thread Edward Capriolo
This message means the garbage collector runs but is unable to free memory after trying for a while. This can happen for a lot of reasons. With hive it usually happens when a query has a lot of intermediate data. For example, imagine a few months ago count (distinct(ip)) returned 20k. Everything w

Re: hacking the hive ql parser?

2015-12-29 Thread Edward Capriolo
hive --service lineage 'hql' exists, I believe. On Tue, Dec 29, 2015 at 3:05 PM, Yang wrote: > I'm trying to create a utility to parse out the data lineage (i.e. DAG > dependency graph) among all my hive scripts. > > to do this I need to parse out the input and output tables from a query. > does

Re: Null Representation in Hive tables

2015-12-27 Thread Edward Capriolo
are csv and dat. Any possibility to > include 2 serialization.null format in table property > > On 12/23/2015 9:16 AM, Edward Capriolo wrote: > > In text formats the null is accepted as \N. > > On Wed, Dec 23, 2015 at 12:00 PM, mahender bigdata < > > mahender.bigd...@o

Re: Null Representation in Hive tables

2015-12-23 Thread Edward Capriolo
In text formats the null is accepted as \N. On Wed, Dec 23, 2015 at 12:00 PM, mahender bigdata < mahender.bigd...@outlook.com> wrote: > Hi, > > Is there any possibility of mentioning both* > "serialization.null.format"="" and **"serialization.null.format"="\000" > *has table properties, current
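In a text table, a field consisting exactly of the table's null marker (the serialization.null.format property, "\N" by default) is read as NULL. A minimal sketch of that rule for one row follows; TextNullSketch and parseRow are hypothetical names, not Hive's SerDe code, and the sketch treats only an exact match of the marker as NULL, leaving empty fields as empty strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch of reading one text row with a null marker (hypothetical helper, not Hive's SerDe). */
public class TextNullSketch {

    public static List<String> parseRow(String line, String fieldDelim, String nullFormat) {
        List<String> fields = new ArrayList<>();
        // limit -1 keeps trailing empty fields instead of dropping them
        for (String f : line.split(Pattern.quote(fieldDelim), -1)) {
            fields.add(f.equals(nullFormat) ? null : f); // only an exact match becomes null
        }
        return fields;
    }

    public static void main(String[] args) {
        // Java "\\N" is the two characters backslash and N
        System.out.println(parseRow("1\t\\N\t\tx", "\t", "\\N")); // [1, null, , x]
    }
}
```

Because the comparison is against a single marker string, a sketch like this recognizes one null representation per table.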

Re: how to search the archive

2015-12-04 Thread Edward Capriolo
2) Sometimes I find that managed tables are not removed from HDFS even after I drop them from the Hive shell. After a "drop table foo", foo does not show up in a "show tables" listing however that table is present in HDFS. These are not external tables. I have noticed this as well. Sometimes this

Strict mode and joins

2015-10-15 Thread Edward Capriolo
So I have strict mode on and I like to keep it that way. I am trying to do this query. INSERT OVERWRITE TABLE vertical_stats_recent PARTITION (dt=2015101517) SELECT ... FROM entry_hourly_v3 INNER JOIN article_meta ON entry_hourly_v3.entry_id = article_meta.entry_id INNER JOIN channel_meta ON cha

Re: Better way to do UDF's for Hive

2015-10-01 Thread Edward Capriolo
You can define them in groovy from inside the CLI... https://gist.github.com/mwinkle/ac9dbb152a1e10e06c16 On Thu, Oct 1, 2015 at 12:57 PM, Ryan Harris wrote: > If you want to use python... > > The python script should expect tab-separated input on stdin and it should > return tab-separated deli

Re: Bucketing is not leveraged in filter push down ?

2015-09-24 Thread Edward Capriolo
Right. The big place the bucketing is leveraged is on bucket based joins. On Thu, Sep 24, 2015 at 3:29 AM, Jeff Zhang wrote: > I have one table which is bucketed on column name. Then I have the > following sql: > > - select count(1) from student_bucketed_2 where name = 'calvin > nixon';
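Why bucketing pays off in joins can be sketched as follows: each row goes to bucket hash(key) mod numBuckets, so equal keys from two tables bucketed the same way land in the same bucket number, and bucket i only has to be joined with bucket i. BucketJoinSketch is a hypothetical class, and String.hashCode() is a stand-in; Hive's own bucketing hash function differs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of bucket assignment and a bucket-by-bucket join count (hypothetical class). */
public class BucketJoinSketch {

    public static int bucketFor(String key, int numBuckets) {
        return (key.hashCode() & Integer.MAX_VALUE) % numBuckets; // non-negative bucket id
    }

    public static List<List<String>> bucketize(List<String> keys, int numBuckets) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String k : keys) {
            buckets.get(bucketFor(k, numBuckets)).add(k);
        }
        return buckets;
    }

    /** Counts matching (left, right) key pairs, comparing bucket i only with bucket i. */
    public static long bucketJoinCount(List<String> left, List<String> right, int numBuckets) {
        List<List<String>> lb = bucketize(left, numBuckets);
        List<List<String>> rb = bucketize(right, numBuckets);
        long matches = 0;
        for (int i = 0; i < numBuckets; i++) {
            Map<String, Long> rightCounts = new HashMap<>(); // small per-bucket hash table
            for (String k : rb.get(i)) {
                rightCounts.merge(k, 1L, Long::sum);
            }
            for (String k : lb.get(i)) {
                matches += rightCounts.getOrDefault(k, 0L);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> left = List.of("a", "b", "c", "a");
        List<String> right = List.of("a", "c", "d");
        System.out.println(bucketJoinCount(left, right, 4)); // 3
    }
}
```

The result equals a full join on the key, but each step only holds one bucket of the right side in memory, which is the property bucket-based joins rely on. This holds only when both tables are bucketed on the join key with compatible bucket counts.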

Re: Hive Macros roadmap

2015-09-11 Thread Edward Capriolo
Macros are in and tested. No one will remove them. The unit tests ensure they keep working. On Fri, Sep 11, 2015 at 3:38 PM, Elliot West wrote: > Hi, > > I noticed some time ago the Hive Macro feature. To me at least this seemed > like an excellent addition to HQL, allowing the user to encapsul

Re: Is it possible to set the data schema on a per-partition basis?

2015-08-31 Thread Edward Capriolo
Yes. Specifically, serdes like the avro serde support "evolving schema". On Mon, Aug 31, 2015 at 5:15 PM, Dominik Choma wrote: > I have external hcat structures over lzo-compressed datafiles , data is > partitioned by date string > Is it possible to handle schema changes by setting diffrent schema

Hive 1.1 arg!

2015-07-07 Thread Edward Capriolo
Hey all. I am using cloudera 5.4.something which uses hive 1.1 almost. I am getting bit by this error: https://issues.apache.org/jira/browse/HIVE-10437 So I am trying to update my test setup to 1.1 so I can include the annotation. @SerDeSpec(schemaProps = {serdeConstants.LIST_COLUMNS,

Re: hive locate from s3 - query

2015-07-03 Thread Edward Capriolo
You probably need to make your own serde/input format that trims the line. On Fri, Jul 3, 2015 at 8:15 AM, ram kumar wrote: > when i map the hive table to locate the s3 path, > it throws exception for the* new line at the beginning of line*. > Is there a solution to trim the new line at the begi

Re: join 2 tables located on different clusters

2015-06-24 Thread Edward Capriolo
I do not know what your exact problem is. Turn your debug logging on. This can be done, however, assuming both clusters have network access to each other. On Wed, Jun 24, 2015 at 4:33 PM, Alexander Pivovarov wrote: > Hello Everyone > > Can I define external table on cluster_1 pointing to hdfs locatio

Re: Merging small files in partitions

2015-06-16 Thread Edward Capriolo
https://github.com/edwardcapriolo/filecrush On Tue, Jun 16, 2015 at 5:05 PM, Chagarlamudi, Prasanth < prasanth.chagarlam...@epsilon.com> wrote: > Hello, > > I am looking for an optimized way to merge small files in hive partitions > into one big file. > > I came across *Alter Table/Partition Con
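Merging small files starts with deciding which files go into each output file. A minimal planning sketch follows: walk the files in order and start a new group whenever adding the next file would exceed a target size. This is a generic greedy grouping, not filecrush's actual algorithm; SmallFileMergePlan and plan are hypothetical names, and sizes are plain byte counts.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of grouping small files into merge groups of about a target size (hypothetical class). */
public class SmallFileMergePlan {

    public static List<List<Long>> plan(List<Long> fileSizes, long targetBytes) {
        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            if (!current.isEmpty() && currentBytes + size > targetBytes) {
                groups.add(current); // close the group before it overflows
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size); // a file larger than the target gets a group of its own
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(plan(List.of(10L, 20L, 30L, 40L, 50L), 60L));
        // [[10, 20, 30], [40], [50]]
    }
}
```

Each group would then be rewritten as one file. A sorted or bin-packing variant could produce fuller groups, at the cost of losing the original file order within a partition.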

Re: Hive-1.2.0 does not work with stock hadoop 2.6.0

2015-06-07 Thread Edward Capriolo
Should we add HADOOP_USER_CLASSPATH_FIRST=true to the hive scripts? On Sun, Jun 7, 2015 at 11:06 AM, Edward Capriolo wrote: > [edward@jackintosh apache-hive-1.2.0-bin]$ export > HADOOP_HOME=/home/edward/Downloads/hadoop-2.6.0 > [edward@jackintosh apache-hive-1.2.0-bin]$ bin/hive &g

Hive-1.2.0 does not work with stock hadoop 2.6.0

2015-06-07 Thread Edward Capriolo
[edward@jackintosh apache-hive-1.2.0-bin]$ export HADOOP_HOME=/home/edward/Downloads/hadoop-2.6.0 [edward@jackintosh apache-hive-1.2.0-bin]$ bin/hive Logging initialized using configuration in jar:file:/home/edward/Downloads/apache-hive-1.2.0-bin/lib/hive-common-1.2.0.jar!/hive-log4j.properties [E

Re: Keys in Hive

2015-06-02 Thread Edward Capriolo
Hive does not support primary key or other types of index constraints. On Tue, Jun 2, 2015 at 4:37 AM, Ravisankar Mani < ravisankarm...@syncfusion.com> wrote: > Hi everyone, > > > > I am unable to create an table in hive with primary key > > Example : > > > > create table Hivetable((name string)

Re: Hive on Spark VS Spark SQL

2015-05-20 Thread Edward Capriolo
What about outer lateral view? On Wed, May 20, 2015 at 11:28 AM, matshyeq wrote: > From my experience SparkSQL is still way faster than tez. > Also, SparkSQL (even 1.2.1 which I'm on) supports *lateral view* > > On Wed, May 20, 2015 at 3:41 PM, Edward Capriolo > wrote

Re: Hive on Spark VS Spark SQL

2015-05-20 Thread Edward Capriolo
Beyond window queries, hive still has concepts like cube or lateral view that many "better than hive" systems don't have. Also, many people went around broadcasting that SparkSQL/SparkSQL was/is better/faster than hive, but now that tez has "whooped" them in a benchmark they are very quiet. http://w

Re: Hive documentation update for isNull, isNotNull etc.

2015-04-18 Thread Edward Capriolo
"show functions" returns = < etc. I believe I added NVL ( https://issues.apache.org/jira/browse/HIVE-2288) and hive also has coalesce. Even if you can access isNull as a function, I think it might be clearer to just write the query as 'column IS NULL', which would be a more portable query. On Sat

Re: [Hive] Slow Loading Data Process with Parquet over 30k Partitions

2015-04-14 Thread Edward Capriolo
That is too many partitions. Way too much overhead in anything that has that many partitions. On Tue, Apr 14, 2015 at 12:53 PM, Tianqi Tong wrote: > Hi Slava and Ferdinand, > > Thanks for the reply! Later when I was looking at the hive.log, I found > Hive was indeed calculating the partition sta

hive-jdbc do set commands work on the connection or statement level

2015-04-07 Thread Edward Capriolo
I am setting compression variables in multiple statements conn.createStatement().execute("set compression.type=5=snappy"); conn.createStatement().execute("select into X ..."); Does the set statement set a connection level variable or a statement level variable? Or are things set in other ways? TX

Re: Is it possible to do a LEFT JOIN LATERAL in Hive?

2015-04-05 Thread Edward Capriolo
Lateral view does support outer if that helps. On Sunday, April 5, 2015, @Sanjiv Singh wrote: > Hi Jeremy, > > Adding to my response > > 1. Hive doesn't support named insertion , so need to use other ways of > insertion data in hive table .. > > 2. As you know , hive doesn't support LEFT J
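The difference OUTER makes can be modelled outside Hive: with a plain LATERAL VIEW explode(...), a source row whose array is empty (or NULL) generates no output rows, so that row disappears from the result; with LATERAL VIEW OUTER the row is kept, with NULL in the generated column. A minimal Java sketch of that behaviour on in-memory rows; the class, field and method names are hypothetical, and this is not how Hive actually executes the query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory model of:
//   SELECT id, tag FROM t LATERAL VIEW [OUTER] explode(tags) x AS tag
// Illustrates semantics only; not Hive's execution code.
final class LateralViewSketch {

    // A source row: an id plus an array column (may be empty or null).
    static final class Row {
        final int id;
        final List<String> tags;
        Row(int id, List<String> tags) { this.id = id; this.tags = tags; }
    }

    // One output row: the id joined with a single exploded element.
    static final class Out {
        final int id;
        final String tag;
        Out(int id, String tag) { this.id = id; this.tag = tag; }
        @Override public String toString() { return "(" + id + ", " + tag + ")"; }
    }

    static List<Out> explode(List<Row> rows, boolean outer) {
        List<Out> result = new ArrayList<>();
        for (Row r : rows) {
            if (r.tags == null || r.tags.isEmpty()) {
                // explode() generates nothing for this row; only OUTER keeps
                // the row, filling the generated column with NULL.
                if (outer) {
                    result.add(new Out(r.id, null));
                }
                continue;
            }
            for (String tag : r.tags) {
                result.add(new Out(r.id, tag));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row(1, Arrays.asList("a", "b")),
            new Row(2, Collections.emptyList()));
        System.out.println(explode(rows, false)); // [(1, a), (1, b)]
        System.out.println(explode(rows, true));  // [(1, a), (1, b), (2, null)]
    }
}
```

Row 2 is the case a LEFT JOIN LATERAL would preserve; OUTER is what keeps it here.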

Re: How to read Protobuffers in Hive

2015-03-25 Thread Edward Capriolo
You may be able to use https://github.com/edwardcapriolo/hive-protobuf (use the branch, not master). This code is based on the Avro support and works well even with nested objects. On Wed, Mar 25, 2015 at 12:28 PM, Lukas Nalezenec <lukas.naleze...@firma.seznam.cz> wrote: > Hi, > I am trying

The curious case of the hive-server-2 empty partitions.

2015-03-24 Thread Edward Capriolo
Hey all, I have Cloudera 5.3 and an issue involving HiveServer2 and Hive. We have a process that launches Hive JDBC queries hourly. This process selects from one table and builds another. It looks something like this (slightly obfuscated query): FROM beacon INSERT OVERWRITE TABLE author_arti

Call for case studies for Programming Hive, 2nd edition

2015-03-22 Thread Edward Capriolo
Hello all, Work is getting underway for Programming Hive 2nd Edition! One of the parts I enjoyed most is the case studies. They showed hive used in a number of enterprises and for different purposes. Since the 2nd edition is on the way I want to make another call for case studies and use cases of

Re: Why hive 0.13 will initialize derby database if the metastore parameters are not set in hive-site.xml?

2015-03-06 Thread Edward Capriolo
Make sure hive.stats.autogather is false, or set up the stats DB. On Friday, March 6, 2015, Jim Green wrote: > Hi Team, > > Starting from hive 0.13, if the metastore parameters are not set in > hive-site.xml, but we set in .hiverc, hive will try to initialize derby > database in current working d

Re: Which [open-souce] SQL engine atop Hadoop?

2015-01-31 Thread Edward Capriolo
assume? >>> >>> contrast all of this with an avro file on hadoop with metadata baked in, >>> and i think its safe to say hive metadata is not easily accessible. >>> >>> i will take a look at your book. i hope it has an example of using >>> thrift on

Re: Which [open-souce] SQL engine atop Hadoop?

2015-01-31 Thread Edward Capriolo
metadata is not easily accessible. > > i will take a look at your book. i hope it has an example of using thrift > on a secure cluster to contact hive metastore (without using the > HiveMetaStoreClient), that would be awesome. > > > > > On Sat, Jan 31, 2015 at 1:32 PM, Edward Capriolo > w

Re: Which [open-souce] SQL engine atop Hadoop?

2015-01-31 Thread Edward Capriolo
"with the metadata in a special metadata store (not on hdfs), and its not as easy for all systems to access hive metadata." I disagree. Hive's metadata is accessible not only through SQL constructs like "describe table"; the entire metastore is also a Thrift service, so you have pr

Re: Hive JSON Serde question

2015-01-25 Thread Edward Capriolo
Nested lists require nested lateral views. On Sun, Jan 25, 2015 at 11:02 AM, Sanjay Subramanian < sanjaysubraman...@yahoo.com> wrote: > hey guys > > This is the Hive table definition I have created based on the JSON > I am using this version of hive json serde > https://github.com/rcongiu/Hive-JS
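Why nested lists need nested lateral views: each explode() removes exactly one level of nesting, so an array of arrays needs one lateral view to reach the inner arrays and a second, chained one to reach their elements. A minimal Java sketch of that flattening on in-memory lists; the class and method names are hypothetical, and the HiveQL in the comment is only an illustration of the chained pattern, not the poster's actual table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory model of two chained lateral views over a nested array:
//   ... LATERAL VIEW explode(groups) g AS grp
//       LATERAL VIEW explode(grp) i AS item
// Each explode() flattens one level of nesting; semantics only.
final class NestedLateralViewSketch {

    // One explode: emits each element of the array (nothing for null).
    static <T> List<T> explode(List<T> array) {
        if (array == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(array);
    }

    // Chaining: explode the outer list, then explode each inner list it produced.
    static List<String> explodeTwice(List<List<String>> groups) {
        List<String> items = new ArrayList<>();
        for (List<String> grp : explode(groups)) { // first LATERAL VIEW
            items.addAll(explode(grp));            // second LATERAL VIEW
        }
        return items;
    }

    public static void main(String[] args) {
        List<List<String>> groups = Arrays.asList(
            Arrays.asList("a", "b"),
            Arrays.asList("c"));
        // A single explode yields the inner lists, not the strings.
        System.out.println(explode(groups));      // [[a, b], [c]]
        System.out.println(explodeTwice(groups)); // [a, b, c]
    }
}
```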

Re: Getting Tez working against cdh 5.3

2015-01-20 Thread Edward Capriolo
bs. My goal is to have a quick recipe for getting tez to work with cdh 5.3 with minimal hacking of the install. Edward On Tue, Jan 20, 2015 at 6:39 PM, Gopal V wrote: > On 1/20/15, 12:34 PM, Edward Capriolo wrote: > >> Actually more likely something like this:

Re: Getting Tez working against cdh 5.3

2015-01-20 Thread Edward Capriolo
the > container.. try creating a symbolic link in /bin/ to point to java.. > > On Tue, Jan 20, 2015 at 7:22 AM, Edward Capriolo > wrote: > >> It seems that CDH does not ship with enough jars to run tez out of the >> box. >> >> I have found the related cloudera

Re: Getting Tez working against cdh 5.3

2015-01-20 Thread Edward Capriolo
wrote: > My guess is.. > "java" binary is not in PATH of the shell script that launches the > container.. try creating a symbolic link in /bin/ to point to java.. > > On Tue, Jan 20, 2015 at 7:22 AM, Edward Capriolo > wrote: > >> It seems that CDH does not
