Re: Adding a virtual column for a custom input format

2020-05-06 Thread Gopal V
Hi, > I'm hoping someone can help me shed some light on how Hive deals with virtual columns. The virtual column impl is not extensible in Hive, it is a fixed set of enums. https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/VirtualColumn.java#L64 Howev

Re: Query rerun with global limitation

2015-01-26 Thread Gopal V
On 1/26/15, 9:18 AM, Philippe Kernévez wrote: This degradation is due to this bug (requests are replayed with a full scan): https://issues.apache.org/jira/browse/HIVE-9382 I doubt that is the issue you are hitting, if you're moving from 0.13 to 0.14. You are possibly hitting HIVE-9401. To

Re: Getting Tez working against cdh 5.3

2015-01-23 Thread Gopal V
ty up easily without being an administrator or overwriting any of the system installed JARs. HTH. Cheers, Gopal On Tue, Jan 20, 2015 at 6:39 PM, Gopal V wrote: On 1/20/15, 12:34 PM, Edward Capriolo wrote: Actually more likely something like this: https://issues.apache.org/jira/browse/TEZ-1621

Re: Spark performance for small queries

2015-01-22 Thread Gopal V
is almost split size. Thanks, Chandra On Fri, Jan 23, 2015 at 5:01 AM, Gopal V wrote: On 1/22/15, 3:03 AM, Saumitra Shahapure (Vizury) wrote: We were comparing performance of some of our production hive queries between Hive and Spark. We compared Hive(0.13)+hadoop (1.2.1) against both Spark 0.9

Re: Spark performance for small queries

2015-01-22 Thread Gopal V
On 1/22/15, 3:03 AM, Saumitra Shahapure (Vizury) wrote: We were comparing performance of some of our production hive queries between Hive and Spark. We compared Hive(0.13)+hadoop (1.2.1) against both Spark 0.9 and 1.1. We could see that the performance gains have been good in Spark. Is there an

Re: Getting Tez working against cdh 5.3

2015-01-20 Thread Gopal V
On 1/20/15, 12:34 PM, Edward Capriolo wrote: Actually more likely something like this: https://issues.apache.org/jira/browse/TEZ-1621 I have a working Hive-13 + Tez install on CDH-5.2.0-1.cdh5.2.0.p0.36. Most of the work needed to get that to work was to build all of Hive+Tez against the CDH

Re: Trying to improve compression ratio for an ORC table

2015-01-18 Thread Gopal V
On 1/18/15, 10:11 PM, Daniel Haviv wrote: I have an ORC table with the "orc.compress"="SNAPPY" property that weighs 4.9 GB and is composed of 253 files. I then do a CTAS into a new table where I added this property "orc.compress.size"="2485760" to improve the compression ratio. The new table w

Re: Tez session after closing CLI

2014-12-08 Thread Gopal V
On 12/8/14, 10:09 PM, Fabio wrote: Hi everyone, when running Hive on Tez, a Tez session is alive within the Hive CLI until I leave the CLI. So if I run on the terminal something like "hive -f query.sql", once the query is completed the Tez session is closed. Is there a way to run a query in this

Re: Insert into dynamic partitions performance

2014-12-06 Thread Gopal V
ets insert speed. Cheers, Gopal On 7 Dec 2014, at 06:06, Gopal V wrote: On 12/6/14, 6:27 AM, Daniel Haviv wrote: Hi, I'm executing an insert statement that goes over 1TB of data. The map phase goes well but the reduce stage uses only one reducer, which becomes a great bottleneck. Ar

Re: Insert into dynamic partitions performance

2014-12-06 Thread Gopal V
On 12/6/14, 6:27 AM, Daniel Haviv wrote: Hi, I'm executing an insert statement that goes over 1TB of data. The map phase goes well but the reduce stage uses only one reducer, which becomes a great bottleneck. Are you inserting into a bucketed or sorted table? If the destination table is bucket

Re: Enabling Tez sessions on HiveServer2

2014-12-04 Thread Gopal V
On 12/3/14, 3:34 PM, Pala M Muthaia wrote: I didn't know doAs needs to be turned off. But I don't think that is something to give up - users create tables, manage data, query etc, and we need the queries/jobs to run as the user who submitted them for various purposes including authorization, audi

Re: Hive on Tez Error

2014-11-21 Thread Gopal V
On 11/21/14, 10:11 AM, peterm_second wrote: Caused by: java.io.IOException: Previous writer likely failed to write hdfs://hadoop-nn.mo-data.com:9000/tmp/hive/root/_tez_session_dir/a0087fb2-1430-43fa-b3e1-06644ab4961d/*. Failing because I am unlikely to write too. at org.apache.hadoop.hiv

Re: basic, dumb getting started question (single-node)

2014-11-12 Thread Gopal V
On 11/12/14, 1:27 PM, Nicholas Murphy wrote: Hadoop version 2.5.1, Hive version 0.13.1, Oracle JDK (1.6, I believe), Debian 7.7. I notice the default conf/ directory has a bunch of template files, but only that. Can someone point me to a resource, or to an example of what configuration I nee

Re: row_number() over(Partition by) Throw Error with Null Input.

2014-11-09 Thread Gopal V
On 11/9/14, 10:16 PM, karthik Srivasthava wrote: select row_number() over (PARTITION BY country,state,department,branch_name) from Employee_details; select count(*) over (PARTITION BY country,state,department,branch_name) from Employee_details; You haven't posted the entire back trace, so I'm

Re: Tez Vertex failure

2014-10-03 Thread Gopal V
On 10/3/14, 5:20 PM, Echo Li wrote: thanks for the reply! the query is: *select count(customerid) from tableName where ymd=20140930 ;* That is simple enough that it should work anyway. There is a strong possibility that the rest of that RuntimeException gives a clue to the problem - if anything is

Re: Hive Index and ORC

2014-09-09 Thread Gopal V
On 9/6/14, 9:36 AM, Alain Petrus wrote: I am wondering whether it is possible to use Hive index and ORC format? Does it make sense? ORC maintains its own indexes within the file - one index record every 10,000 rows (orc.row.index.stride / orc.create.index). You can take advantage of it du
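
To illustrate the idea of one index record per 10,000 rows mentioned above, here is a minimal Python sketch of how per-stride min/max statistics let a reader skip row groups that cannot match a filter. This is not ORC's actual implementation or file layout; the names IndexEntry, build_index and groups_to_read are hypothetical, and the sample column is invented for the example.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Rows covered by one index entry; mirrors the 10,000-row stride quoted above.
STRIDE = 10_000


@dataclass
class IndexEntry:
    """Hypothetical index record: statistics for one group of rows."""
    first_row: int   # position of the first row in this group
    row_count: int   # number of rows in this group
    min_value: int   # smallest column value in the group
    max_value: int   # largest column value in the group


def build_index(values: Sequence[int], stride: int = STRIDE) -> List[IndexEntry]:
    """Record min/max for every `stride` rows of a single column."""
    entries = []
    for start in range(0, len(values), stride):
        chunk = values[start:start + stride]
        entries.append(IndexEntry(start, len(chunk), min(chunk), max(chunk)))
    return entries


def groups_to_read(index: List[IndexEntry], target: int) -> List[IndexEntry]:
    """Keep only the row groups whose [min, max] range could contain target."""
    return [e for e in index if e.min_value <= target <= e.max_value]


# Assumed sample data: a sorted column holding 0, 1, ..., 34999.
column = list(range(35_000))
index = build_index(column)
hits = groups_to_read(index, 12_345)

print(len(index), "row groups in total")
print([(e.first_row, e.row_count) for e in hits], "must be read for value 12345")
```

The sketch also shows why this kind of index works best on sorted or clustered data: if the values were shuffled, every group's min/max range would likely cover the target and nothing could be skipped.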

Re: New to TEZ

2014-08-13 Thread Gopal V
On 8/11/14, 1:48 PM, karthik Srivasthava wrote: Hi, Below was my log. I couldn't find where the error is. Can you please point out what caused my error... The correct items to post back would be hive.tez.container.size, hive.tez.java.opts and the value of io.sort.mb. It does look like you h

Re: Tuning Triangle Joins on Hive

2014-08-06 Thread Gopal V
On 7/31/14, 12:28 PM, Firas Abuzaid wrote: We're running various "triangle" join queries on Hive 0.9.0, and we're wondering if we can get any better performance. Here's the query we're running: SELECT count(*) FROM table r1 JOIN table r2 ON (r1.dst = r2.src) JOIN table r3 ON (r2.dst = r3.src AN