Queries on Files - concise syntax for running SQL
queries over files of any supported format without registering a
table.
https://issues.apache.org/jira/browse/SPARK-11197
I think it's now clearer why so many companies are moving to Spark to do ETL.
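For example, a minimal sketch (path and file name are hypothetical):

spark-sql> SELECT * FROM parquet.`hdfs:///data/events.parquet` LIMIT 10;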
On Fri, Jan 15, 2016 at 3:06 PM, Alexander Pivovarov
Time to use Spark and Spark-Sql in addition to Hive?
It's probably going to happen sooner or later anyway.
I sent you a Spark solution yesterday. (You just need to write an
unbzip2AndCsvToListOfArrays(file: String): List[Array[String]] function
using BZip2CompressorInputStream and the Super CSV API.)
you can give it a
> different line delimiter, but Hive 1.2.1 does not support it: "FAILED:
> SemanticException 3:20 LINES TERMINATED BY only supports newline '\n' right
> now."
>
>
>
> *From:* Alexander Pivovarov [mailto:apivova...@gmail.com]
> *Sent:* Tuesday,
Try the CSV serde. It should correctly parse a quoted field value that has a
newline inside it:
https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
Hadoop should automatically read .bz2 files.
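A minimal sketch using the built-in OpenCSVSerde documented on that page
(Hive 0.14+; table, columns, and location are made up):

CREATE EXTERNAL TABLE raw_csv (id string, note string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION '/data/raw_csv';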
On Tue, Jan 12, 2016 at 9:40 AM, Gerber, Bryan W
wrote:
> We are attempting to load CSV text files (compressed
at table, so I assume you only care about reading it. Is that
> right?
>
> .. Owen
>
> On Wed, Dec 2, 2015 at 9:53 PM, Alexander Pivovarov
> wrote:
>
>> Hi Everyone
>>
>> Is it possible to create a Hive table from an ORC or Parquet file without
>> specifying field names and their types? ORC and Parquet files contain the
>> field name and type information inside.
>>
>> Alex
>>
>
>
Hi Everyone
Is it possible to create a Hive table from an ORC or Parquet file without
specifying field names and their types? ORC and Parquet files contain the
field name and type information inside.
Alex
issue
https://issues.apache.org/jira/browse/HIVE-6
On Wed, Jun 24, 2015 at 4:08 PM, Alexander Pivovarov
wrote:
> I tried on local hadoop/hive instance (hive is the latest from master
> branch)
>
> mydev is an HA alias to the remote HA NameNode.
>
> $ hadoop fs -ls hdfs://mydev/tmp/et1
> Found
> This can be done, however, assuming both clusters have network access to each other.
>
> On Wed, Jun 24, 2015 at 4:33 PM, Alexander Pivovarov wrote:
>
>> Hello Everyone
>>
>> Can I define an external table on cluster_1 pointing to an HDFS location on
>> cl
Hello Everyone
Can I define an external table on cluster_1 pointing to an HDFS location on
cluster_2?
I tried and got a strange exception in Hive:
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask.
MetaException(message:java.lang.reflect.InvocationTargetException)
I w
Thank you Xuefu!
Excellent explanation and comparison!
We should put it on the Hive on Spark wiki:
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark
On Wed, May 20, 2015 at 10:45 AM, Xuefu Zhang wrote:
> I have been working on Hive on Spark, and know a little about SparkSQL.
> Here a
Hi Everyone
Let's say I have a Hive table in 2 datacenters. The table format can be
textfile or ORC.
There is a Sqoop job running every day which adds data to the table.
Each datacenter has its own instance of the Sqoop job.
In the ideal scenario the data in these two tables should be the same.
The same mean
Maybe the user which runs the Hive CLI does not have write permissions on
hdfs://zhangj05-a:8020/user/hive/warehouse/reporting.db.
Who is the owner of hdfs://zhangj05-a:8020/user/hive/warehouse/reporting.db?
Which user runs the Hive CLI?
On Sat, Apr 11, 2015 at 11:07 AM, Jie Zhang wrote:
> Hi,
>
> I hit the foll
Ashish, Read The Friendly Manual below
https://hive.apache.org/mailing_lists.html
On Tue, Apr 7, 2015 at 2:15 PM, Ashish Garg
wrote:
> Hello Admin,
>
> Please unsubscribe me.
>
> Regards,
> Ashish Garg
>
Vivek,
You can see the version in two places
1.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions
initcap(string A) -> string: Returns string, with the first letter of
each word in uppercase, all other letters in lowercase. Words are delimited by whitespace.
I can suggest 3 options:
1. you can use a JUnit test to test your UDF (e.g. TestGenericUDFLastDay)
2. you can create a q file and test your UDF via mvn (look at udf_last_day.q)
mvn clean install -DskipTests -Phadoop-2
cd itests/qtest
mvn test -Dtest=TestCliDriver -Dqfile=udf_last_day.q
-Dtest.output.overwrite=true
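For reference, a q file is just a script of Hive statements; a minimal
sketch, assuming a hypothetical my_func (real q files live under
ql/src/test/queries/clientpositive):

-- udf_my_func.q
DESCRIBE FUNCTION EXTENDED my_func;
SELECT my_func('2015-01-14') FROM src LIMIT 1;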
Congrats to Matt, Jimmy and Sergio!
On Mon, Mar 23, 2015 at 11:30 AM, Chaoyu Tang wrote:
> Congratulations to Jimmy and Sergio!
>
> On Mon, Mar 23, 2015 at 2:08 PM, Carl Steinbach wrote:
>
>> The Apache Hive PMC has voted to make Jimmy Xiang, Matt McCline, and
>> Sergio Pena committers on the A
A sort by query produces multiple independent files; order by produces just
one file.
Usually SORT BY is used with DISTRIBUTE BY.
In older Hive versions (0.7) they might be used to implement a local sort
within a partition, similar to RANK() OVER (PARTITION BY A ORDER BY B).
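For example (table and column names hypothetical):

SELECT * FROM t DISTRIBUTE BY a SORT BY a, b; -- sorted within each reducer's output file
SELECT * FROM t ORDER BY a, b;                -- one globally sorted output file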
On Sat, Mar 7, 2015 at 3:02 PM, ma
Several useful common UDF methods were added to GenericUDF recently:
https://issues.apache.org/jira/browse/HIVE-9744
you can look at the following UDFs as an example:
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLevenshtein.java
https://githu
hive> create table test1 (c1 array<int>) row format delimited collection
items terminated by ',';
OK
hive> insert into test1 select array(1,2,3) from dual;
OK
hive> select * from test1;
OK
[1,2,3]
hive> select c1[0] from test1;
OK
1
$ hadoop fs -cat /apps/hive/warehouse/test1/00_0
1,2,3
On Su
yes, we even have a ticket for that
https://issues.apache.org/jira/browse/HIVE-9600
btw can anyone test jdbc driver with kerberos enabled?
https://issues.apache.org/jira/browse/HIVE-9599
On Mon, Mar 2, 2015 at 10:01 AM, Nick Dimiduk wrote:
> Heya,
>
> I'd like to use jmeter against HS2/JDBC a
Congrats!
On Wed, Feb 25, 2015 at 12:33 PM, Vaibhav Gumashta <
vgumas...@hortonworks.com> wrote:
> Congrats Sergey!
>
> On 2/25/15, 9:06 AM, "Vikram Dixit" wrote:
>
> >Congrats Sergey!
> >
> >On 2/25/15, 8:43 AM, "Carl Steinbach" wrote:
> >
> >>I am pleased to announce that Sergey Shelukhin has
Hi Everyone
Let's say I have a table partitioned by period (string).
How do I select the max period?
If I run
select max(period) from invoice;
hive 0.13.1 runs MR, which is slow.
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-1
Tez
Edges:
A Hive CSV serde is available for all Hive versions:
https://github.com/ogrodnek/csv-serde
Defaults: escape character \ , quote character " , separator ,
add jar path/to/csv-serde.jar; (or put it on the hive/hadoop/mr
classpath on all boxes in the cluster)
-- you can use custom separator, quote, and escape characters (see the sketch below)
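A sketch of custom characters, following that project's README (table name
made up):

CREATE TABLE my_table (a string, b string)
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = "\t", "quoteChar" = "'", "escapeChar" = "\\")
STORED AS TEXTFILE;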
Congrats!
On Mon, Feb 9, 2015 at 12:31 PM, Carl Steinbach wrote:
> The Apache Hive PMC has voted to make Chao Sun, Chengxiang Li, and Rui Li
> committers on the Apache Hive Project.
>
> Please join me in congratulating Chao, Chengxiang, and Rui!
>
> Thanks.
>
> - Carl
>
>
https://issues.apache.org/jira/browse/HIVE-7353.
>
> Thanks,
> —Vaibhav
>
> From: Alexander Pivovarov
> Reply-To: "user@hive.apache.org"
> Date: Wednesday, February 4, 2015 at 6:03 PM
> To: "user@hive.apache.org"
> Subject: Hiveserver2 memory / thread leak v 0.13.1 (hdp-2.1.5)
ROW_NUMBER doc
http://docs.oracle.com/cd/B28359_01/server.111/b28286/functions144.htm#SQLRF06100
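Hive (0.11+) has ROW_NUMBER as a windowing function as well; a sketch against
the u_data table quoted below:

SELECT userid, movieid, rating,
  ROW_NUMBER() OVER (PARTITION BY userid ORDER BY rating DESC) AS rn
FROM u_data;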
On Thu, Feb 5, 2015 at 4:48 PM, r7raul1...@163.com
wrote:
> *Table structure :*
> CREATE TABLE `u_data`(
> `userid` int,
> `movieid` int,
> `rating` int,
> `unixtime` string)
> ROW FORMAT DELIMITED
, Alexander Pivovarov
wrote:
> I like the Tez engine for Hive (aka the Stinger initiative)
>
> - faster than the MR engine, especially for complex queries with lots of
> nested sub-queries
> - stable
> - min latency is 5-7 sec (0 sec for select count(*) ...)
> - capable of processing huge
I like the Tez engine for Hive (aka the Stinger initiative)
- faster than the MR engine, especially for complex queries with lots of
nested sub-queries
- stable
- min latency is 5-7 sec (0 sec for select count(*) ...)
- capable of processing huge datasets (not limited by RAM the way Spark is)
On Mon, Feb 2, 2015 at 6
Thank you, Lefty!
On Mon, Feb 2, 2015 at 3:59 PM, Lefty Leverenz
wrote:
> Done. Welcome to the Hive wiki team, Alexander!
>
> -- Lefty
>
> On Mon, Feb 2, 2015 at 2:14 PM, Alexander Pivovarov
> wrote:
>
>> Hi Everyone
>>
>> Can I get write access to hive
Hi Everyone
Can I get write access to hive wiki?
I need to put descriptions for several UDFs added recently (init_cap,
add_months, last_day, greatest, least)
Confluence username: apivovarov
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
Basically:
1. if you join tables, try to filter out as much as possible in WHERE (to
reduce the amount of data sent from the map to the reduce step)
2. if you join a big table with a small table (< 500 MB), use the SELECT /*+
MAPJOIN(small_table) */ hint to avoid the reduce step (see the sketch below)
3. if you join a big table with a big table make
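A sketch of the MAPJOIN hint from point 2 (table and column names made up):

SELECT /*+ MAPJOIN(d) */ f.id, d.name
FROM big_fact f JOIN small_dim d ON (f.dim_id = d.id);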
Sachin, it works
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
SET mapred.output.compression.type=BLOCK;
create table data1_seq STORED AS SEQUENCEFILE as select * from date1;
hadoop fs -cat /user/hive/warehouse/data1_seq/00_0
http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/
On Mon, Apr 1, 2013 at 3:45 PM, Keith Wiley wrote:
> I need rank() in Hive. I haven't had much luck with Edward Capriolo's on
> git and it comes with no documentation. It depends on hive-test (also by
>
https://cwiki.apache.org/Hive/hiveplugins.html
Creating Custom UDFs
First, you need to create a new class that extends UDF, with one or more
methods named evaluate.
package com.example.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class Lower extends UDF {
  // lowercase the input; null-safe
  public Text evaluate(final Text s) {
    return s == null ? null : new Text(s.toString().toLowerCase());
  }
}
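Then build a jar and register the function in Hive (jar path and query are
hypothetical):

add jar /path/to/my-udf.jar;
create temporary function my_lower as 'com.example.hive.udf.Lower';
select my_lower(name) from users;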
Hive supports only equi-joins.
I recommend reading the Hive manual before using it, e.g.
http://hive.apache.org/docs/r0.9.0/language_manual/joins.html
https://cwiki.apache.org/Hive/languagemanual-joins.html
The first sentence there says: "Only equality joins, outer joins, and left
semi joins are supported in Hive." See the example below.
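For example (tables hypothetical):

-- supported, equality condition:
SELECT a.val, b.val FROM a JOIN b ON (a.key = b.key);
-- NOT supported, non-equality condition:
-- SELECT a.val, b.val FROM a JOIN b ON (a.key < b.key);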
Options:
1. create a table and put the files under the table dir
2. create an external table and point it to the files dir (see the sketch below)
3. if the files are small then I recommend creating a new set of files using a
simple MR program and specifying the number of reduce tasks. The goal is to
make the file size > the HDFS block size (it saves NN memory).
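A sketch of option 2 (location and column are hypothetical):

CREATE EXTERNAL TABLE logs (line string)
LOCATION 'hdfs:///data/logs/';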