While not as active in the development community as I once was, I have been
using Hive in the field, along with Spark and Impala, for some time.
My anecdotal opinion is that the current metastore needs a significant
rewrite to deal with "next generation" workloads. By next generation I
actually mean last g
In most SQL drivers you can always use executeQuery, even if the query has
no result set.
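For illustration, a minimal sketch of the generic dispatch pattern a tool
like beeline can use, assuming a HiveServer2 at localhost:10000 (class name
and URL are made up):

import java.sql.*;

public class RunAnyStatement {
  public static void main(String[] args) throws SQLException {
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement()) {
      // execute() accepts any statement type and reports whether a
      // ResultSet was produced, so no per-statement-type branching is needed.
      boolean hasResults = stmt.execute(args[0]);
      if (hasResults) {
        try (ResultSet rs = stmt.getResultSet()) {
          while (rs.next()) {
            System.out.println(rs.getString(1));
          }
        }
      } else {
        System.out.println("Updated rows: " + stmt.getUpdateCount());
      }
    }
  }
}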
On Wednesday, September 15, 2021, Alessandro Solimando <
alessandro.solima...@gmail.com> wrote:
> Hi Igyu,
> sending different SQL statements is exactly what beeline has to handle,
> I'd have a look at how
of updating a dependency broke one
> project, downgrading it broke a different project.
>
> https://github.com/apache/druid/pull/10683
>
> HDFS-15790
>
> On Sat, Feb 27, 2021, 7:31 PM Matt McCline
> wrote:
>
>> Yes to Hive 4 release. Plenty of changes (1,500+).
>>
> It would be amazing if the community could produce a release every
> quarter/6 months. :-)
>
> On Fri, Feb 26, 2021 at 14:30, Edward Capriolo wrote:
>
>> Hive was releasable trunk for the longest time. Facebook days. Then the
>> big data vendors got more involved. Then it
Hive had a releasable trunk for the longest time, back in the Facebook days.
Then the big data vendors got more involved, and it became a pissing match
about features. This vendor likes Tez, that vendor doesn't; this vendor likes
Hive on Spark, that one doesn't.
Then this vendor wants to tell everyone Hive stinks, use
Hello all,
It has been a long time. I have been forced to use Avro and create a table
with over 5k columns. It's helluva slow. I warned folks that all the best
practices say "don't make a table with more than 1k or 2k columns" (Impala,
Hive, Cloudera). No one listened to me, so now the table is a mess. Im
I like the approach of applying an arbitrary limit. Hive's q files tend to
add an ordering to everything. Would it make sense to simply order by
multiple columns in the result set and conduct a large diff on them?
On Wednesday, June 26, 2019, Sungwoo Park wrote:
> I have published a new article
I have changed jobs 3 times since Tez was introduced. It is a true waste of
compute resources and time that it was never patched in. So I either have
to waste my time patching it in, waste my time running a side deployment,
or not install it and waste money having queries run longer on MR/Spark
n source hadoop, which will not work if you've
> kerberized the cluster. You'll have to build a version of ATS against CDH
> libraries that provides the classes needed to run the engine. We have done
> this work as well and it runs pretty smoothly.
>
>
>
> On Mon, Apr
Out of band question. Given:
https://hortonworks.com/blog/welcome-brand-new-cloudera/
Does CDH finally ship with a Tez you don't have to manually patch in?
On Monday, April 15, 2019, Sungwoo Park wrote:
> I tested the performance of Impala 2.12.0+cdh5.15.2+0 in Cloudera CDH
> 5.15.2 a while ago.
I made a UDTF a while back that lets you specify lists of tuples; from
there you can explode them into rows.
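For comparison, Hive's built-in stack() UDTF does something in the same
spirit, expanding an inline list of tuples into rows. A minimal JDBC sketch
(connection URL and class name are illustrative):

import java.sql.*;

public class StackExample {
  public static void main(String[] args) throws SQLException {
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement();
         // stack(n, v1, ..., vk) emits n rows of k/n columns each.
         ResultSet rs = stmt.executeQuery(
             "SELECT stack(2, 'a', 1, 'b', 2) AS (letter, num)")) {
      while (rs.next()) {
        System.out.println(rs.getString("letter") + "\t" + rs.getInt("num"));
      }
    }
  }
}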
On Thursday, March 28, 2019, Jesus Camacho Rodriguez <
jcamachorodrig...@hortonworks.com> wrote:
> Depending on the version you are using, table + values syntax is supported.
>
> https://
Thanks! Very cool.
On Sat, Mar 23, 2019 at 1:33 PM Sungwoo Park wrote:
> I am pleased to announce the release of MR3 0.6. New key features are:
>
> - In Hive on Kubernetes, DAGAppMaster can run in its own Pod.
> - MR3-UI requires only Timeline Server.
> - Hive on MR3 is much more stable because
9 11:59 AM
>> *To:* user@hive.apache.org
>> *Subject:* Re: just released: Docker image of a minimal Hive server
>>
>> Hello!
>>
>> If that might help, I did this repo a while ago:
>> https://github.com/FurcyPin/docker-hive-spark
>> It provides a pre-i
Good deal and great name!
On Thu, Feb 21, 2019 at 11:31 AM Aidan L Feldman (CENSUS/ADEP FED) <
aidan.l.feld...@census.gov> wrote:
> Hi there-
>
> I am a new Hive user, working at the US Census Bureau. I was interested in
> getting Hive running locally, but wanted to keep the dependencies isolated
We got bit pretty hard when "exchange partitions" was added. How many
people in ad-tech work with exchanges? Everyone!
On Wed, May 30, 2018 at 1:38 PM, Alan Gates wrote:
> It is. You can see the definitive list of keywords at
> https://github.com/apache/hive/blob/master/ql/src/java/
> org/apac
True. The spec does not mandate that the bucket files be there if they
are empty (missing directories are 0-row tables).
Thanks,
Edward
On Tue, Apr 3, 2018 at 4:42 PM, Richard A. Bross wrote:
> Gopal,
>
> The Presto devs say they are willing to make the changes to adhere to the
> Hive bucke
ta in S3. They put all of the metadata in S3 except for a single link to
>> the name of the table's root metadata file.
>>
>> Other advantages of their design:
>>
>> - Efficient atomic addition and removal of files in S3.
>> - Consistent schema evolution
On Mon, Jan 29, 2018 at 12:44 PM, Owen O'Malley
wrote:
>
>
> On Jan 29, 2018, at 9:29 AM, Edward Capriolo
> wrote:
>
>
>
> On Mon, Jan 29, 2018 at 12:10 PM, Owen O'Malley
> wrote:
>
>> You should really look at what the Netflix guys are doin
g and bucketing.
>
>
> .. Owen
>
> On Sun, Jan 28, 2018 at 12:02 PM, Edward Capriolo
> wrote:
>
>> All,
>>
>> I have been bouncing around the earth for a while and have had the
>> privilege of working at 4-5 places. On arrival each place was in a var
All,
I have been bouncing around the earth for a while and have had the
privilege of working at 4-5 places. On arrival each place was in a variety
of states in their hadoop journey.
One large company that I was at had a ~200 TB Hadoop cluster. They actually
ran Pig, and their ops group REFUSED to
"You're off by a couple of orders of magnitude - in fact, that was my last
year's Hadoop Summit demo, 10 terabytes of Text on S3, converted to ORC +
LLAP."
"We've got sub-second SQL execution, sub-second compiles, sub-second
submissions … with all of it adding up to a single or double digit second
"Yes, it's a tautology - if you cared about performance, you'd use ORC,
because ORC is the fastest format."
It is not that simple. The average Hadoop user has 6-7 years of data. They
do not have a "magic" convert-everything button. They also have legacy
processes that don't/can't be converted. The
"Hive 3.x branch has text vectorization and LLAP cache support for it, so
hopefully the only relevant concern about Text will be the storage costs
due to poor compression (& the lack of updates)."
I kept hearing about vectorization, but later found out it was only going to
work if I used ORC. Literall
ause it is the ONLY format all tools support
2) makes two outputs for each query using 2x space
(Can someone please make a competitor for Oozie? *grin*)
https://github.com/apache/incubator-airflow , mrjobs, luigi, azkaban :)
On Tue, Jun 20, 2017 at 1:45 PM, Owen O'Malley
wrote:
>
>
>
It is whack that two optimized row columnar formats exist and each
respective project (Hive/Impala) has good support for one and lame/no
support for the other.
Impala is now an Apache project. Also, 'whack' and 'lame' are technical
terms often used by the people in the real world who have to use
Think about it like this: one system is scanning a local file (ORC); the
other is using an HBase scanner (over the network), scanning the data in
SSTable format?
On Fri, Jun 9, 2017 at 5:50 AM, Amey Barve wrote:
> Hi Michael,
>
> "If there is predicate pushdown, then you will be faster, assuming that
> the
oad-maven-
> artifact-via-classloader
> https://github.com/treasure-data/digdag/tree/master/
> digdag-core/src/main/java/io/digdag/core/plugin
>
> Thanks,
> Makoto
>
> 2017-06-03 3:26 GMT+09:00 Edward Capriolo >:
> > Don't we currently support features that load fu
Don't we currently support features that load functions from external
places like Maven, an HTTP server, etc.? I wonder if it would be easier to
back port that than to back port a handful of functions?
On Fri, Jun 2, 2017 at 2:22 PM, Alan Gates wrote:
> Rather than put that code in hive/contrib I was thinkin
On Fri, Jun 2, 2017 at 12:07 PM, Nishanth S wrote:
> Hello hive users,
>
> We are looking at migrating files(less than 5 Mb of data in total) with
> variable record lengths from a mainframe system to hive.You could think of
> this as metadata.Each of these records can have columns ranging from
For each row, for each element in the exploded list, emit one row:
Row r = { col1 };
Set x = [ a, b, c ];
for (Object y : x) {
  emit(r, y);
}
If the set x has size 3, three rows are output.
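A minimal GenericUDTF sketch of that loop (class and column names are made
up; this mirrors, rather than reproduces, Hive's built-in explode):

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;

public class SimpleExplode extends GenericUDTF {
  private ListObjectInspector listOI;

  @Override
  public StructObjectInspector initialize(ObjectInspector[] args)
      throws UDFArgumentException {
    listOI = (ListObjectInspector) args[0];
    // One output column whose type is the list's element type.
    List<String> names = new ArrayList<>();
    List<ObjectInspector> ois = new ArrayList<>();
    names.add("col");
    ois.add(listOI.getListElementObjectInspector());
    return ObjectInspectorFactory.getStandardStructObjectInspector(names, ois);
  }

  @Override
  public void process(Object[] record) throws HiveException {
    // For each element of the list argument, forward one row.
    for (Object element : listOI.getList(record[0])) {
      forward(new Object[] { element });
    }
  }

  @Override
  public void close() throws HiveException {
    // Nothing buffered, nothing to flush.
  }
}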
On Wed, May 31, 2017 at 8:31 AM, 张明磊 <18717838...@163.com> wrote:
> Hello experts,
> I was wondering to know
I'm pretty sure schematool does this for people who convert to an HA NameNode.
On Wednesday, May 17, 2017, Neil Jonkers wrote:
> Hi,
>
> Inspecting the Hive Metastore tables.
> Table SDS has a location field.
>
> If for reason this does not work:
> "ALTER TABLE ... SET LOCATION ... ?"
>
> Manually
Here is a similar, though not exact, way I did something like what you did.
I had two data files in different formats; the different columns needed
to be different features. I wanted to feed them into Spark's:
https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Frequent_Pattern_Mining/The_FP-
The Parquet/ORC thing has to be the biggest detractor. You're forced to
choose between a format good for Impala or one good for Hive.
On May 4, 2017 3:57 PM, "Gopal Vijayaraghavan" wrote:
> Hi,
>
>
> > Does Hive LLAP work with Parquet format as well?
>
>
>
> LLAP does work with the Parquet format, but
On Tue, Apr 18, 2017 at 3:32 PM, hernan saab
wrote:
> The effort of configuring an apache big data system by hand for your
> particular needs is equivalent to herding rattlesnakes and cats into one
> small room.
> The documentation is poor and most of the time the community developers
> don't rea
Nice job
On Saturday, April 8, 2017, Vaibhav Gumashta
wrote:
> The Apache Hive team is proud to announce the release of Apache Hive version
> 1.2.2.
>
> The Apache Hive (TM) data warehouse software facilitates querying and
> managing large datasets residing in distributed storage. Built on top
You should match your Hive versions as closely as possible. It makes sense
that both Hive and Hadoop dependencies use a PROVIDED scope; this way, if
you are building an assembly/fat/shaded jar, the jar is as thin as possible.
On Wed, Mar 29, 2017 at 3:01 PM, srinu reddy wrote:
>
>
> Hi
>
> I want to
On Fri, Mar 17, 2017 at 2:56 PM, hernan saab
wrote:
> I have been in a similar world of pain. Basically, I tried to use an
> external Hive to have user access controls with a spark engine.
> At the end, I realized that it was a better idea to use apache tez instead
> of a spark engine for my part
https://databricks.com/blog/2017/02/28/voice-facebook-using-apache-spark-large-scale-language-model-training.html?utm_campaign=Open%20Source&utm_content=47640295&utm_medium=social&utm_source=twitter
They always neglect to include the fact that Spark has a complete copy of
Hive inside of it!
There are a few Hadoop vendors that make it an unnecessary burden on users
to get Tez running.
This forces users to compile and patch in Tez support.
Imho this is shameful. These same vendors include all types of extra
add-ons, like, say, HBase or even Mongo support.
This 'creative packaging' only serve
[Error 40003]: Only External tables can have an explicit location
Using Hive 1.2, I got this error. This was definitely not a requirement
before.
Why was this added? "External table" ONLY used to mean that dropping the
table will not drop the physical files.
I have been contemplating attaching metadata for the query lineage to each
table, such that I can know where the data came from and have a one-click
regenerate button.
On Wed, Dec 21, 2016 at 3:02 PM, Stephen Sprague wrote:
> my 2 cents. :)
>
> as soon as you say "complex query" i would submit you
I believe JSON itself has encoding rules. What I suggest you do is build
your own input format or serde and escape those fields, possibly by
converting them to hex.
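A sketch of the hex-escaping idea (a hypothetical helper, not an existing
Hive API; you would call something like it from your serde's serialize path):

import java.nio.charset.StandardCharsets;

public class HexEscape {
  // Hex-encode a field so embedded delimiters/newlines survive a text format.
  static String toHex(String field) {
    StringBuilder sb = new StringBuilder();
    for (byte b : field.getBytes(StandardCharsets.UTF_8)) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(toHex("{\"k\":\"line1\nline2\"}"));
  }
}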
On Wednesday, November 23, 2016, Dana Ram Meghwal wrote:
> Hey,
> Any leads?
>
> On Tue, Nov 22, 2016 at 5:35 PM, Dana Ram Meghwal >
Technically very doable; timestamps and other decimal types have been added
over the years. It actually turns out to be a fair amount of work, mostly
due to the proliferation of serdes that need to be able to read/write the
type.
On Sat, Nov 19, 2016 at 4:11 PM, Juan Delard de Rigoulières <
j...@da
On Mon, Nov 7, 2016 at 1:46 PM, Eugene Koifman
wrote:
> can you check if the user that the metastore is running as has right to
> write to the table dir?
>
> On 10/26/16, 12:23 PM, "aft" wrote:
>
> >On Thu, Oct 27, 2016 at 12:00 AM, Eugene Koifman
> > wrote:
> >> could you provide output of "SHO
Mich,
Looking through the event, a few talks seem to be about Hadoop and none
mention Hive.
I understand how hive and this conference relate but I believe this is off
topic for the hive mailing list.
Thank you,
Edward
On Wednesday, November 2, 2016, Mich Talebzadeh
wrote:
> Hi,
>
> For tho
I have written nagios scripts that watch the job tracker UI and report when
things take too long.
On Thu, Sep 1, 2016 at 11:08 AM, Loïc Chanel
wrote:
> On the topic of timeout, if I may say, they are a dangerous way to deal
> with requests as a "good" request may last longer than an "evil" one.
Hive uses map-side aggregation instead of combiners:
http://dev.bizo.com/2013/02/map-side-aggregations-in-apache-hive.html
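The relevant knobs, shown over JDBC as a sketch (the property names are the
standard Hive ones; the values shown are the usual defaults):

import java.sql.*;

public class MapAggSettings {
  public static void main(String[] args) throws SQLException {
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement()) {
      // Enable hash-based map-side aggregation (Hive's combiner analogue).
      stmt.execute("SET hive.map.aggr=true");
      // Fraction of mapper memory the aggregation hash table may use.
      stmt.execute("SET hive.map.aggr.hash.percentmemory=0.5");
    }
  }
}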
On Mon, Aug 8, 2016 at 2:59 PM, Edson Ramiro wrote:
> hi all,
>
> I'm executing TPC-H on Hive 2.0.1, using Yarn 2.7, and I'm wondering if
> Hive implements any Combiner by d
A few entities were going to "kill/take out/be better than" Hive.
I seem to remember HadoopDB, Impala, Redshift, VoltDB...
But apparently Hive is still around and probably faster.
http://www.slideshare.net/hortonworks/hive-on-spark-is-blazing-fast-or-is-it-final
On Sun, Aug 7, 2016 at 9:49 PM, 理 wrote:
e format in here is
> parquet, and am able to view sample sets for each column and raw select
> queries are working just fine, but none of min / max / distinct / where 'd
> work.
>
> Thanks,
>
> On Sun, Aug 7, 2016 at 6:38 PM, Edward Capriolo
> wrote:
>
>> You ne
You need to take this up with the appropriate Hue/Cloudera user group. One
issue is that SQLite is an embedded single-user database and does not work
well with more than one user. We switched to Postgres in our deployment and
would still hit this issue. I never got it resolved,
On Sun, Aug 7, 2016
I built one a long time ago; it's still in the tree but very defunct. Best
bet is working with Hue or whatever Horton is pushing, so as not to fragment
4 ways.
On Jul 15, 2016 12:56 PM, "Mich Talebzadeh"
wrote:
> Hi Marcin,
>
> For Hive on Spark I can use Spark 1.3.1 UI which does not have DAG diagram
> (later vers
Good stuff!
On Fri, Apr 29, 2016 at 1:30 PM, Jörn Franke wrote:
> Dear all,
>
> I prepared a small Serde to analyze Bitcoin blockchain data with Hive:
>
> https://snippetessay.wordpress.com/2016/04/28/hive-bitcoin-analytics-on-blockchain-data-with-sql/
>
> There are some example queries, but I w
+1
On Friday, April 22, 2016, Lars Francke wrote:
> Yet another update. I went through the PMC list.
>
> These seven have not been active (still the same list as Vikram posted
> during the last vote):
> Ashish Thusoo
> Kevin Wilfong
> He Yongqiang
> Namit Jain
> Joydeep Sensarma
> Ning Zhang
> R
Both Thrift and protobuf are wire-compatible but NOT classpath-compatible;
you need to make sure that you are using one version (even down to the
minor version) across your whole codebase.
On Tue, Mar 22, 2016 at 12:05 PM, kalai selvi wrote:
> Hi,
>
> I am using Hive 0.13 in Amazon EMR. I am stuck
Explicit conversion is done using cast(x as bigint).
You said: As a matter of interest, what is the underlying storage for
Integer?
This is dictated on disk by the input format; the "temporal in-memory
format" is dictated by the serde. An integer could be stored as "1",
"1" , as dictated by the Inp
The IN UDF is a special one in that, unlike many others, there is support
for it in the ANTLR grammar and parser. The rough answer is it can be done,
but it is not as direct as making other UDFs.
On Tue, Mar 8, 2016 at 2:32 PM, Lavelle, Shawn
wrote:
> Hello All,
>
>I hope that this question
My knocks on Impala (not intended to be a post knocking Impala):
Impala really has not delivered on the complex types that Hive has (after
promising them for quite a while); also, it only works with the 'blessed'
input formats: Parquet, Avro, text.
It is very annoying to work with Impala. In my versi
There is no comprehensive list; each serde could use the parameters for
whatever it desires, while other serdes use none at all.
On Fri, Feb 19, 2016 at 3:23 PM, mahender bigdata <
mahender.bigd...@outlook.com> wrote:
> +1, Any information available ?
>
> On 2/10/2016 1:26 AM, Mathan Rajendran wr
metastore
>>
>>
>>
>> yeah but have you ever seen someone write a real analytical program in
>> hive? how? where are the basic abstractions to wrap up a large amount of
>> operations (joins, groupby's) into a single function call? where are the
>>
Lol, very offbeat convo for the Hive list. Let's not drag ourselves too far
down here.
On Wednesday, February 3, 2016, Stephen Sprague wrote:
> i refuse to take anybody seriously who has a sig file longer than one line
> and that there is just plain repugnant.
>
> On Wed, Feb 3, 2016 at 1:47 PM,
t for that?
>>
>> for example in spark i can write a DataFrame => DataFrame that internally
>> does many joins, groupBys and complex operations. all unit tested and
>> perfectly re-usable. and in hive? copy paste round sql queries? thats just
>> dangerous.
>>
>> On
Hive has numerous extension points, you are not boxed in by a long shot.
On Tuesday, February 2, 2016, Koert Kuipers wrote:
> uuuhm with spark using Hive metastore you actually have a real
> programming environment and you can write real functions, versus just being
> boxed into some version of
create table (...)
[COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char]
collection items terminated by ',' map keys terminated by ':' works in many
cases.
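A concrete sketch of that DDL over JDBC (table and column names are made up):

import java.sql.*;

public class DelimitedComplexTypes {
  public static void main(String[] args) throws SQLException {
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement()) {
      stmt.execute(
          "CREATE TABLE demo_complex (" +
          "  tags ARRAY<STRING>," +
          "  attrs MAP<STRING,STRING>)" +
          " ROW FORMAT DELIMITED" +
          " FIELDS TERMINATED BY '\\t'" +
          " COLLECTION ITEMS TERMINATED BY ','" +
          " MAP KEYS TERMINATED BY ':'");
    }
  }
}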
On Wed, Jan 20, 2016 at 9:07 PM, Buntu Dev wrote:
> I found the brickhouse Hive udf `json_map' that seems to conver
d ticket
and come up with a more elegant solution.
On Fri, Jan 8, 2016 at 12:26 PM, Ophir Etzion wrote:
> Thanks!
> In certain use cases you could but forgot about the aux thing, thats
> probably it.
>
> On Fri, Jan 8, 2016 at 12:24 PM, Edward Capriolo
> wrote:
>
>>
You cannot 'add jar' input formats and serdes. They need to be part of
your auxlib.
On Fri, Jan 8, 2016 at 12:19 PM, Ophir Etzion wrote:
> I tried now. still getting
>
> 16/01/08 16:37:34 ERROR exec.Utilities: Failed to load plan:
> hdfs://hadoop-alidoro-nn-vip/tmp/hive/hive/c2af9882-38a9-42b
TS"
>
>
>
> # The following applies to multiple commands (fs, dfs, fsck, distcp etc)
>
> export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
>
>
>
> and HADOOP_HEAPSIZE=4096
>
>
>
> I’m assuming just raising the above would work.
>
>
>
This message means the garbage collector runs but is unable to free memory
after trying for a while.
This can happen for a lot of reasons. With hive it usually happens when a
query has a lot of intermediate data.
For example, imagine a few months ago count(distinct(ip)) returned 20k.
Everything w
hive --service lineage 'hql' exists, I believe.
On Tue, Dec 29, 2015 at 3:05 PM, Yang wrote:
> I'm trying to create a utility to parse out the data lineage (i.e. DAG
> dependency graph) among all my hive scripts.
>
> to do this I need to parse out the input and output tables from a query.
> does
are csv and dat. Any possibility to
> include 2 serialization.null format in table property
>
> On 12/23/2015 9:16 AM, Edward Capriolo wrote:
>
> In text formats the null is accepted as \N.
>
> On Wed, Dec 23, 2015 at 12:00 PM, mahender bigdata <
>
> mahender.bigd...@o
In text formats the null is accepted as \N.
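For example (a sketch; the table name is made up), the marker is a per-table
serde property, so an empty string can also be treated as NULL:

import java.sql.*;

public class NullFormat {
  public static void main(String[] args) throws SQLException {
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement()) {
      // Tell the text serde which literal marks SQL NULL (the default is \N).
      stmt.execute("ALTER TABLE demo_text SET SERDEPROPERTIES " +
          "('serialization.null.format'='')");
    }
  }
}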
On Wed, Dec 23, 2015 at 12:00 PM, mahender bigdata <
mahender.bigd...@outlook.com> wrote:
> Hi,
>
> Is there any possibility of mentioning both
> "serialization.null.format"="" and "serialization.null.format"="\000"
> as table properties, current
2) Sometimes I find that managed tables are not removed from HDFS even
after I drop them from the Hive shell. After a "drop table foo", foo does
not show up in a "show tables" listing; however, that table is present in
HDFS. These are not external tables.
I have noticed this as well. Sometimes this
So I have strict mode on and I like to keep it that way.
I am trying to do this query.
INSERT OVERWRITE TABLE vertical_stats_recent PARTITION (dt=2015101517)
SELECT ...
FROM entry_hourly_v3 INNER JOIN article_meta ON
entry_hourly_v3.entry_id = article_meta.entry_id
INNER JOIN channel_meta ON
cha
You can define them in groovy from inside the CLI...
https://gist.github.com/mwinkle/ac9dbb152a1e10e06c16
On Thu, Oct 1, 2015 at 12:57 PM, Ryan Harris
wrote:
> If you want to use python...
>
> The python script should expect tab-separated input on stdin and it should
> return tab-separated deli
Right. The big place bucketing is leveraged is on bucket-based joins.
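The settings that let the planner exploit bucketing for joins, as a sketch
over JDBC (these are the standard Hive property names):

import java.sql.*;

public class BucketJoinSettings {
  public static void main(String[] args) throws SQLException {
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement()) {
      // Join matching buckets directly on the map side.
      stmt.execute("SET hive.optimize.bucketmapjoin=true");
      // If both sides are also sorted on the join key, allow
      // sort-merge-bucket joins.
      stmt.execute("SET hive.optimize.bucketmapjoin.sortedmerge=true");
    }
  }
}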
On Thu, Sep 24, 2015 at 3:29 AM, Jeff Zhang wrote:
> I have one table which is bucketed on column name. Then I have the
> following sql:
>
> - select count(1) from student_bucketed_2 where name = 'calvin
> nixon';
Macros are in and tested. No one will remove them. The unit tests ensure
they keep working.
On Fri, Sep 11, 2015 at 3:38 PM, Elliot West wrote:
> Hi,
>
> I noticed some time ago the Hive Macro feature. To me at least this seemed
> like an excellent addition to HQL, allowing the user to encapsul
Yes. Specifically, the Avro serde, like Avro, supports an "evolving schema".
On Mon, Aug 31, 2015 at 5:15 PM, Dominik Choma
wrote:
> I have external hcat structures over lzo-compressed datafiles , data is
> partitioned by date string
> Is it possible to handle schema changes by setting diffrent schema
Hey all. I am using cloudera 5.4.something which uses hive 1.1 almost.
I am getting bit by this error:
https://issues.apache.org/jira/browse/HIVE-10437
So I am trying to update my test setup to 1.1 so I can include the
annotation.
@SerDeSpec(schemaProps = {serdeConstants.LIST_COLUMNS,
You probably need to make your own serde/input format that trims the line.
On Fri, Jul 3, 2015 at 8:15 AM, ram kumar wrote:
> when i map the hive table to locate the s3 path,
> it throws an exception for the new line at the beginning of the line.
> Is there a solution to trim the new line at the begi
I do not know what your exact problem is. Set your debug logging on. This
can be done, however, assuming both clusters have network access to each
other.
On Wed, Jun 24, 2015 at 4:33 PM, Alexander Pivovarov
wrote:
> Hello Everyone
>
> Can I define external table on cluster_1 pointing to hdfs locatio
https://github.com/edwardcapriolo/filecrush
On Tue, Jun 16, 2015 at 5:05 PM, Chagarlamudi, Prasanth <
prasanth.chagarlam...@epsilon.com> wrote:
> Hello,
>
> I am looking for an optimized way to merge small files in hive partitions
> into one big file.
>
> I came across *Alter Table/Partition Con
Should we add
HADOOP_USER_CLASSPATH_FIRST=true
to the hive scripts?
On Sun, Jun 7, 2015 at 11:06 AM, Edward Capriolo
wrote:
> [edward@jackintosh apache-hive-1.2.0-bin]$ export
> HADOOP_HOME=/home/edward/Downloads/hadoop-2.6.0
> [edward@jackintosh apache-hive-1.2.0-bin]$ bin/hive
>
[edward@jackintosh apache-hive-1.2.0-bin]$ export
HADOOP_HOME=/home/edward/Downloads/hadoop-2.6.0
[edward@jackintosh apache-hive-1.2.0-bin]$ bin/hive
Logging initialized using configuration in
jar:file:/home/edward/Downloads/apache-hive-1.2.0-bin/lib/hive-common-1.2.0.jar!/hive-log4j.properties
[E
Hive does not support primary keys or other types of index constraints.
On Tue, Jun 2, 2015 at 4:37 AM, Ravisankar Mani <
ravisankarm...@syncfusion.com> wrote:
> Hi everyone,
>
>
>
> I am unable to create an table in hive with primary key
>
> Example :
>
>
>
> create table Hivetable((name string)
What about outer lateral view?
On Wed, May 20, 2015 at 11:28 AM, matshyeq wrote:
> From my experience SparkSQL is still way faster than tez.
> Also, SparkSQL (even 1.2.1 which I'm on) supports *lateral view*
>
> On Wed, May 20, 2015 at 3:41 PM, Edward Capriolo
> wrote
Beyond window queries, Hive still has concepts like cube and lateral view
that many "better than Hive" systems don't have.
Also, many people went around broadcasting that Spark SQL was/is
better/faster than Hive, but now that Tez has "whooped" them in a benchmark
they are very quiet.
http://w
"show functions" returns = < etc. I believe I added NVL (
https://issues.apache.org/jira/browse/HIVE-2288) and hive also has
coalesce. Even if you can access isNull as a function, I think it would be
clearer to just write the query as 'column IS NULL'; that would be a more
portable query.
On Sat
That is too many partitions. Way too much overhead in anything that has
that many partitions.
On Tue, Apr 14, 2015 at 12:53 PM, Tianqi Tong wrote:
> Hi Slava and Ferdinand,
>
> Thanks for the reply! Later when I was looking at the hive.log, I found
> Hive was indeed calculating the partition sta
I am setting compression variables in multiple statements:
conn.createStatement().execute("set compression.type=5=snappy");
conn.createStatement().execute("select into X ...");
Does the set statement set a connection-level variable or a statement-level
variable? Or are things set in other ways?
TX
Lateral view does support outer if that helps.
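A sketch of the OUTER form (table and columns are hypothetical:
demo_table(id STRING, items ARRAY<STRING>)):

import java.sql.*;

public class OuterLateralView {
  public static void main(String[] args) throws SQLException {
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement();
         // OUTER keeps the source row (emitting NULL) when the array is
         // empty or NULL, much like a left join against the exploded rows.
         ResultSet rs = stmt.executeQuery(
             "SELECT t.id, e.item FROM demo_table t" +
             " LATERAL VIEW OUTER explode(t.items) e AS item")) {
      while (rs.next()) {
        System.out.println(rs.getString(1) + "\t" + rs.getString(2));
      }
    }
  }
}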
On Sunday, April 5, 2015, @Sanjiv Singh wrote:
> Hi Jeremy,
>
> Adding to my response
>
> 1. Hive doesn't support named insertion, so you need to use other ways of
> inserting data into a hive table.
>
> 2. As you know, hive doesn't support LEFT J
You may be able to use:
https://github.com/edwardcapriolo/hive-protobuf
(Use the branch not master)
This code is based on the avro support. It works well even with nested
objects.
On Wed, Mar 25, 2015 at 12:28 PM, Lukas Nalezenec <
lukas.naleze...@firma.seznam.cz> wrote:
> Hi,
> I am trying
Hey all,
I have cloudera 5.3, and an issue involving HiveServer2, Hive.
We have a process that launches Hive JDBC queries, hourly. This process
selects from one table and builds another.
It looks something like this (slightly obfuscated query)
FROM beacon INSERT OVERWRITE TABLE author_arti
Hello all,
Work is getting underway for Programming Hive 2nd Edition! One of the parts
I enjoyed most is the case studies. They showed hive used in a number of
enterprises and for different purposes.
Since the 2nd edition is on the way I want to make another call for case
studies and use cases of
Make sure hive autogather stats is false, or set up the stats db.
On Friday, March 6, 2015, Jim Green wrote:
> Hi Team,
>
> Starting from hive 0.13, if the metastore parameters are not set in
> hive-site.xml, but we set in .hiverc, hive will try to initialize derby
> database in current working d
assume?
>>>
>>> contrast all of this with an avro file on hadoop with metadata baked in,
>>> and i think its safe to say hive metadata is not easily accessible.
>>>
>>> i will take a look at your book. i hope it has an example of using
>>> thrift on
metadata is not easily accessible.
>
> i will take a look at your book. i hope it has an example of using thrift
> on a secure cluster to contact hive metastore (without using the
> HiveMetaStoreClient), that would be awesome.
>
>
>
>
> On Sat, Jan 31, 2015 at 1:32 PM, Edward Capriolo
> w
"with the metadata in a special metadata store (not on hdfs), and its not
as easy for all systems to access hive metadata." I disagree.
Hives metadata is not only accessible through the SQL constructs like
"describe table". But the entire meta-store also is actually a thrift
service so you have pr
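As a minimal sketch of that programmatic access (this uses the standard
HiveMetaStoreClient for brevity, even though the thread asks about going to
the thrift service without it; assumes hive-site.xml is on the classpath):

import java.util.List;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;

public class MetastoreList {
  public static void main(String[] args) throws Exception {
    // HiveConf picks up the metastore thrift URI from hive-site.xml.
    HiveMetaStoreClient client = new HiveMetaStoreClient(new HiveConf());
    // The same metadata "describe table" surfaces, over thrift.
    List<String> tables = client.getAllTables("default");
    for (String t : tables) {
      System.out.println(t);
    }
    client.close();
  }
}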
Nested lists require nested lateral views.
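A sketch with a hypothetical table demo_nested(outer_list
ARRAY<ARRAY<STRING>>), chaining one lateral view off another:

import java.sql.*;

public class NestedLateralViews {
  public static void main(String[] args) throws SQLException {
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement();
         // The first view explodes the outer list; the second explodes
         // each inner list the first one produced.
         ResultSet rs = stmt.executeQuery(
             "SELECT inner_item FROM demo_nested" +
             " LATERAL VIEW explode(outer_list) o AS inner_list" +
             " LATERAL VIEW explode(inner_list) i AS inner_item")) {
      while (rs.next()) {
        System.out.println(rs.getString(1));
      }
    }
  }
}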
On Sun, Jan 25, 2015 at 11:02 AM, Sanjay Subramanian <
sanjaysubraman...@yahoo.com> wrote:
> hey guys
>
> This is the Hive table definition I have created based on the JSON
> I am using this version of hive json serde
> https://github.com/rcongiu/Hive-JS
bs.
My goal is to have a quick recipe for getting tez to work with cdh 5.3 with
minimal hacking of the install.
Edward
On Tue, Jan 20, 2015 at 6:39 PM, Gopal V wrote:
> On 1/20/15, 12:34 PM, Edward Capriolo wrote:
>
>> Actually more likely something like this:
>>
>&g
the
> container.. try creating a symbolic link in /bin/ to point to java..
>
> On Tue, Jan 20, 2015 at 7:22 AM, Edward Capriolo
> wrote:
>
>> It seems that CDH does not ship with enough jars to run tez out of the
>> box.
>>
>> I have found the related cloudera
wrote:
> My guess is..
> "java" binary is not in PATH of the shell script that launches the
> container.. try creating a symbolic link in /bin/ to point to java..
>
> On Tue, Jan 20, 2015 at 7:22 AM, Edward Capriolo
> wrote:
>
>> It seems that CDH does not