For HDP-specific questions, you should use the Hortonworks lists:
http://hortonworks.com/community/forums/forum/hive/
Your question is about the difference between Hive 0.9 and Hive 0.11.
The big additions are:
- Decimal type
- ORC files
- Analytics functions (cube, rollup)
- Windowing functions
Adding to that:
- Multiple files can be concatenated from the directory, for example:
  cat 0-0 00-1 0-2 > final
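A minimal sketch of the concatenation idea above, run against a throwaway directory. The part-file names (000000_0 and so on) and their contents are assumptions made for the demo, not taken from the thread:

```shell
#!/bin/sh
# Hypothetical stand-in for a local Hive output directory.
outdir=$(mktemp -d)
printf 'a,1\n' > "$outdir/000000_0"
printf 'b,2\n' > "$outdir/000001_0"
printf 'c,3\n' > "$outdir/000002_0"

# The shell expands the glob in sorted order, so the parts are joined in sequence.
# The merged file is written outside the directory so it cannot match the glob.
merged="$outdir.merged"
cat "$outdir"/0* > "$merged"
```

Using a glob rather than listing each file avoids missing a part when the number of output files changes between runs.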
From: Raj Hadoop
To: "user@hive.apache.org" ; "matouk.iftis...@ysance.com"
Sent: Friday, July 5, 2013 12:17 AM
Subject: Re: How Can
hive> set hive.io.output.fileformat=CSVTextFile;
hive> insert overwrite local directory '/usr/home/hadoop/da1/' select * from customers;
(customers is a Hive table)
From: Edward Capriolo
To: "user@hive.apache.org"
Sent: Friday, July 5, 2013 12:10 AM
Normally if you use set mapred.reduce.tasks=1 you get one output file. You can
also look at hive.merge.mapfiles, mapred.reduce.tasks, and
hive.merge.mapredfiles. You can also use a separate tool:
https://github.com/edwardcapriolo/filecrush
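A rough sketch of how these settings might be combined in a query file. The output directory is hypothetical, the customers table is borrowed from elsewhere in this thread, and hive.merge.mapredfiles is the merge property for map-reduce jobs as named in Hive's configuration. Whether this ends up as exactly one file depends on the query and the Hive version, which is not verified here:

```shell
#!/bin/sh
# Write a query file that asks Hive for one reducer and for small-file merging.
qfile=$(mktemp)
cat > "$qfile" <<'EOF'
set mapred.reduce.tasks=1;
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/hive_single_out'
SELECT * FROM customers;
EOF

# To run it on a machine with the Hive CLI installed:
#   hive -f "$qfile"
echo "query file written to $qfile"
```

Keeping the settings in the same file as the query means they apply only to that session and do not change cluster-wide defaults.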
On Thu, Jul 4, 2013 at 6:38 AM, Nitin Pawar wrote:
> will h
Since you are launching locally, you have to account for the following:
1) If multiple jobs are running, they become a burden on the local memory of
the system.
2) Your local parameters, such as the Java heap size (-Xmx) or
mapred.child.java.opts, may be getting applied locally. If you are doing
distinct queries they may
The one I suggested does not work on HDFS files. It's just one way to write
stdout to a file.
I am not sure whether Hive allows named files for output, and the above
settings will make your query run really slowly if you have a large dataset.
If you really need a specific filename then for now I
Thanks for your responses.
Effectively, Bertrand's answer makes this possible: the set of Hive
properties below forces the job to write the Hive result to one file
without specifying the name (_0):
set hive.exec.reducers.max = 1;
set mapred.reduce.tasks = 1;
for Nitin, I want to store t
I have found that for output larger than a few GB, redirecting stdout results
in an incomplete file. For very large output, I do CREATE TABLE MYTABLE AS
SELECT ... and then copy the resulting HDFS files directly out of
/user/hive/warehouse.
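One possible way to script that approach is sketched below. The table name, the SELECT, and the local destination are placeholders; hadoop fs -getmerge is used here as one way to do the copy, not necessarily what the poster ran. The helper only prints the commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Dry-run helper: prints each command instead of executing it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

table=mytable                    # hypothetical table name
warehouse=/user/hive/warehouse   # warehouse path mentioned above
local_file=./mytable.txt         # hypothetical local destination

# 1) Materialize the result as a table (the SELECT is a placeholder).
run hive -e "CREATE TABLE $table AS SELECT * FROM customers"
# 2) Concatenate the table's HDFS files into a single local file.
run hadoop fs -getmerge "$warehouse/$table" "$local_file"
```

Because the files are copied by Hadoop rather than streamed through the Hive CLI, this path does not rely on stdout redirection.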
From: Bertrand De
Hi Hive Team,
I am currently developing and testing Hive queries in HDP 1.1 with Hadoop
1.0.3 and Hive 0.9.0.
However, it seems that my production environment is going to be upgraded to
HDP 1.3 in the near future. Will this have an impact with respect to design
and optimization?
Please suggest.
Regards, Kumar Chin
Hi.
My guess is that you can try to look it up in their docs or mailing lists
(Amazon EMR). IIRC, CDH had the patch for Avro+Hive before it was included
in Hive itself, so Amazon EMR may have similar patches...
Ruslan
On Thu, Jul 4, 2013 at 12:20 PM, Dan Filimon wrote:
> Hi!
>
> I'm working on
The question is what the volume of your output is. There is one file per
output task (map or reduce) because that way each task can write its output
independently and in parallel. That's how MapReduce works. And except by
forcing the number of tasks to 1, there is no certain way to have one
output file.
But ind
Will hive -e "query" > filename or hive -f query.q > filename do?
Or do you specifically want it written to a named file on HDFS only?
On Thu, Jul 4, 2013 at 3:12 PM, Matouk IFTISSEN
wrote:
> Hello Hive users,
> Is there a way to store the Hive query result (SELECT *.) in a
> specific
> Local mode really helps with those little delays.
It definitely helps for small data sets. But my concerns are about consistency
of results with distributed modes and some requests that fail only when it is
triggered (see my description below).
From: Edward
One setting was missing:
hive.metastore.authorization.storage.checks true
This solves the problem
-Original Message-
From: Shunichi Otsuka [mailto:sots...@yahoo-corp.jp]
Sent: Thursday, July 04, 2013 2:28 PM
To: user@hive.apache.org
Subject: metastore security issue
I am trying to s
Hello Hive users,
Is there a way to store the Hive query result (SELECT *.) in a
specific, single file (with a given file name), as with (INSERT OVERWRITE LOCAL
DIRECTORY '/directory_path_name/')?
Thanks for your answers
Sorry, just caught up with the last couple of days' email and I feel that this
question has already been answered fairly comprehensively. Apologies.
Z
From: Peter Marron [mailto:peter.mar...@trilliumsoftware.com]
Sent: 04 July 2013 08:37
To: user@hive.apache.org
Subject: RE: Partition performanc
Hi!
I'm working on a few Avro MapReduce jobs whose output will end up on S3 to
be processed by Hive.
Amazon's latest Hive version [1] is 0.8.1 but Avro support was added in
0.9.1.
I can only find the haivvreo project [2] that supports 0.7.
Is this my only option?
Thanks!
[1] http://aws.amazon.c
Hi,
Just to check that I understand this problem, my reading suggests that the
overhead of many partitions is currently unavoidable. Specifically this means
that any query on a table that has, let's say, 10,000 partitions will be
significantly slower (than on an un-partitioned table with the "same"