Rare ORC bug when used with preemption

2014-01-29 Thread Steven Wong
We had a Hive MR job writing to a table using the ORC file format. The cluster had fair share scheduler with task preemption enabled. For one of the job's reduce tasks, task attempt 0 finished putting the output file to the final location and, before it completed, was killed by preemption. All subs

Re: Re: hive 0.11 auto convert join bug report

2013-09-25 Thread Steven Wong
(Operator.java:832) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:652) ... 9 more On Wed, Sep 25, 2013 at 2:16 PM, Steven Wong wrote: > For me, the bug exhibits itself in Hive 0.11 as the following stack trace. > I'm putting it here so that people sea

Re: Re: hive 0.11 auto convert join bug report

2013-09-25 Thread Steven Wong
For me, the bug exhibits itself in Hive 0.11 as the following stack trace. I'm putting it here so that people searching on a similar problem can find this discussion thread in a web search. The discussion thread contains a workaround and a patch. java.lang.RuntimeException: org.apache.hadoop.hive.

RE: Field delimited by chr(28)

2012-04-04 Thread Steven Wong
FIELDS TERMINATED BY '\001' is for chr(1). I'm not sure if it's decimal or octal. But you can test it out. From: Chen, Stefanie (GBI) [mailto:stefanie.kim.c...@hp.com] Sent: Wednesday, April 04, 2012 1:10 PM To: user@hive.apache.org Subject: Field delimited by chr(28) I have a file which has fi
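
The reply above is unsure whether the escape is decimal or octal. As a small sketch of the arithmetic only: if the escape is read as octal (an assumption, not something the thread confirms), chr(1) is '\001' and chr(28) would be '\034'. The helper name octal_escape is made up for illustration.

```python
def octal_escape(code_point: int) -> str:
    """Return a three-digit backslash-octal escape for a single-byte value."""
    if not 0 <= code_point <= 0o377:
        raise ValueError("expected a single-byte value (0-255)")
    # {:03o} formats the number in base 8, zero-padded to three digits.
    return "\\{:03o}".format(code_point)


ctrl_a = octal_escape(1)    # chr(1), the delimiter mentioned in the reply
fs_char = octal_escape(28)  # chr(28), the delimiter asked about
```

Testing the result on a small sample file, as the reply suggests, is still the safest check.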

RE: Hive server concurrency question

2012-03-28 Thread Steven Wong
There are jiras on Hive Server concurrency-related issues, some open and some closed, including: https://issues.apache.org/jira/browse/HIVE-80 https://issues.apache.org/jira/browse/HIVE-1019 https://issues.apache.org/jira/browse/HIVE-1884 https://issues.apache.org/jira/browse/HIVE-2022 https://is

RE: How to get job names and stages of a query?

2012-03-20 Thread Steven Wong
The Hive history file contains the job id and other job run-time info. Not sure if there’s an API on top of it or not. From: Felix.徐 [mailto:ygnhz...@gmail.com] Sent: Tuesday, March 20, 2012 12:14 AM To: user@hive.apache.org; manishbh...@rocketmail.com Subject: Re: How to get job names and stages of

RE: Same tablename in DESTINATION and FROM clause

2012-03-16 Thread Steven Wong
What I've observed is: "yes" if the table is HDFS-backed; "it depends" if otherwise. From: Mohit Gupta [mailto:success.mohit.gu...@gmail.com] Sent: Thursday, March 15, 2012 9:23 PM To: user@hive.apache.org; Ramkumar Subject: Re: Same tablename in DESTINATION and FROM clause No. First write the

RE: Basic statement problems

2012-03-09 Thread Steven Wong
The LOCATION clause has to specify the directory that contains (only) your data files. -Original Message- From: Keith Wiley [mailto:kwi...@keithwiley.com] Sent: Friday, March 09, 2012 3:44 PM To: user@hive.apache.org Subject: Basic statement problems I successfully installed and used Hi

RE: Amazon EMR Best Practices for Hive metastore

2012-03-06 Thread Steven Wong
We run a multi-AZ RDS instance hosting our metastore, which is shared by multiple EMR clusters. We utilize RDS's backup/snapshot feature, although we haven't encountered a need to restore from backup for real yet (knock on wood). -Original Message- From: Sam Wilson [mailto:swil...@moneta

FW: Post-gres Hive Metastore

2012-02-09 Thread Steven Wong
-Original Message- From: Kevin Wilfong [mailto:kevinwilf...@fb.com] Sent: Thursday, February 09, 2012 10:59 AM To: d...@hive.apache.org Subject: Post-gres Hive Metastore Hello, Does anyone still use postgres as the Hive metastore SQL backend? I'm in the process of writing scripts to upg

RE: Last value for a column

2012-01-27 Thread Steven Wong
Other than writing a custom UDAF or TRANSFORM script, a somewhat ugly way is something like: SELECT user_id, split(max(concat(time, '_', colour)), '_')[1] FROM T GROUP BY user_id From: mdefoinplatel@orange.com [mailto:mdefoinplatel@orange.com] Sent: Thursday, January 26, 2012 3:24 AM To
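
To see why the query above returns the last colour per user, here is a minimal Python sketch of the same idea. It assumes time is a string that sorts chronologically when compared as text and that neither time nor colour contains an underscore; the function name and sample rows are hypothetical.

```python
def last_value_per_user(rows):
    """rows: iterable of (user_id, time, colour), all strings."""
    best = {}
    for user_id, time, colour in rows:
        key = time + "_" + colour              # concat(time, '_', colour)
        if user_id not in best or key > best[user_id]:
            best[user_id] = key                # max(...) within GROUP BY user_id
    # split(..., '_')[1]: the piece after the first underscore
    return {user: key.split("_")[1] for user, key in best.items()}


sample = [
    ("u1", "2012-01-25 10:00", "red"),
    ("u1", "2012-01-26 09:00", "blue"),
    ("u2", "2012-01-20 08:00", "green"),
]
result = last_value_per_user(sample)
```

Because the time comes first in the concatenated string, the string maximum is the row with the latest time, which is what makes the trick work.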

RE: dropping an "external" table without deleting the data

2012-01-24 Thread Steven Wong
You can change your table to external first and then drop it: ALTER TABLE my_table SET TBLPROPERTIES ('EXTERNAL'='TRUE'); Please test it on an unimportant table first. From: Igor Tatarinov [mailto:i...@decide.com] Sent: Tuesday, January 24, 2012 3:55 PM To: user@hive.apache.org Subject: droppi

RE:

2012-01-11 Thread Steven Wong
Try fetchOne or fetchN. From: Lu, Wei [mailto:w...@microstrategy.com] Sent: Tuesday, January 10, 2012 11:14 PM To: Lu, Wei; user@hive.apache.org Subject: RE: BTW, I am using hive 0.7 From: Lu, Wei Sent: Wednesday, January 11, 2012 3:13 PM To: 'user@hive.apache.org' Subject: Hi, I am using Thri

RE: Jira is down?

2012-01-02 Thread Steven Wong
http://monitoring.apache.org/status/ From: Aniket Mokashi [mailto:aniket...@gmail.com] Sent: Monday, January 02, 2012 4:04 PM To: user@hive.apache.org Subject: Jira is down? Looks like asf jira is down. Is this a scheduled downtime? Where should I subscribe to get updates about it? https://iss

RE: What is best way to load data into hive tables/hadoop file system

2011-11-01 Thread Steven Wong
Run multiple concurrent LOAD DATAs, one per file. Alternatively, if your TT nodes have access to the source file system, use a map-only Hadoop job, such as distcp. From: Shantian Purkad [mailto:shantian_pur...@yahoo.com] Sent: Monday, October 31, 2011 4:34 PM To: common-u...@hadoop.apache.org;

RE: pass entire row as parameter in hive UDF

2011-11-01 Thread Steven Wong
Would https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-REGEXColumnSpecification work for you in the meantime? From: Chen Song [mailto:chens_alb...@yahoo.com] Sent: Monday, October 31, 2011 9:15 AM To: hive dev list; hive user list Subject: pass entire

RE: High number of input files problems

2011-11-01 Thread Steven Wong
I suspect very few people are still using Hive 0.6 or older. Try upgrading. From: Florin Diaconeasa [mailto:florin.diacone...@gmail.com] Sent: Monday, October 31, 2011 6:37 AM To: user@hive.apache.org Subject: High number of input files problems Hello, Lately our user base has increased so the

hive.map.aggr

2011-11-01 Thread Steven Wong
I have a query doing JOIN and GROUP BY: SELECT ... FROM x JOIN y ON (...) GROUP BY ...; In the first MR job, the reduce phase performs hash aggregation when hive.map.aggr=true, but the reduce phase doesn't perform hash aggregation when hive.map.aggr=false. Why does hive.map.aggr affect the redu

RE: External table over a SequenceFile

2011-10-26 Thread Steven Wong
By default, Hive tables use LazySimpleSerDe, which requires that all the columns be stuffed into the sequence file's value, separated by delimiters. From: Laurent Vaills [mailto:laurent.vai...@gmail.com] Sent: Wednesday, October 26, 2011 1:14 PM To: user@hive.apache.org Subject: External table o

RE: Problem With HDFS USERS using JDBC

2011-10-26 Thread Steven Wong
It is unclear to me if Hive JDBC and Hive Server support changing the user to a user not used by the Hive Server. Can someone familiar with Hive authentication please comment? From: Gabriel Eisbruch [mailto:gabrieleisbr...@gmail.com] Sent: Tuesday, October 25, 2011 12:07 PM To: user@hive.apache

RE: Running hive on large number of files in S3

2011-10-20 Thread Steven Wong
If you are using Amazon EMR, you can set hive.optimize.s3.query=true to speed up part b. See https://forums.aws.amazon.com/ann.jspa?annID=1105 for more info. From: Ashutosh Chauhan [mailto:hashut...@apache.org] Sent: Thursday, October 20, 2011 1:21 PM To: user@hive.apache.org Subject: Re: Runnin

RE: Hive Transform functionality using map datatype

2011-10-17 Thread Steven Wong
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform#LanguageManualTransform-TypingtheoutputofTRANSFORM -Original Message- From: Mark Grover [mailto:mgro...@oanda.com] Sent: Monday, October 17, 2011 3:02 PM To: user@hive.apache.org Cc: Baiju Devani; Bob Tiernay; Deny

RE: Tables not accessible after restarting Amazon EC2 instances

2011-10-12 Thread Steven Wong
Why not change the old hostnames in the metadata? You can't do that via Hive DDL, you'd have to do that to the metadata store directly. It'll be interesting to know if that fixes the rest of the problem. From: Agarwal, Ravindra (ASG) [mailto:ravindra_agar...@syntelinc.com] Sent: Tuesday, Octobe

RE: Copying a Hive metastore

2011-10-10 Thread Steven Wong
ago. I restored one metastore to HQL and run this hql file to finish the task. Just as Edward says, I interact with HQL directly. Hope this helps. On 10/1/11, Steven Wong wrote: > I think going to MySQL directly will have the problem of colliding internal > ids (e.g. TBLS.TBL_ID). I don

RE: HiveDerbyServerMode

2011-10-10 Thread Steven Wong
I don't have experience with this setup, but apparently the page you're looking for has moved to https://cwiki.apache.org/confluence/display/Hive/HiveDerbyServerMode. -Original Message- From: Matt Kennedy [mailto:stinkym...@gmail.com] Sent: Monday, October 10, 2011 9:17 AM To: user@hiv

RE: Best way to import complex data into Hive?

2011-10-05 Thread Steven Wong
http://mail-archives.apache.org/mod_mbox/hadoop-hive-user/201009.mbox/%3c4f6b25afffcafe44b6259a412d5f9b101c07a...@exchmbx104.netflix.com%3E is a similar situation. Basically you have to use the default delimiters. From: Mark Kerzner [mailto:mark.kerz...@shmsoft.com] Sent: Wednesday, October 05,

RE: How does COLLECTION ITEMS work in Hive?

2011-10-04 Thread Steven Wong
COLLECTION ITEMS refers to the ARRAY column type. For more info on arrays, see: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF From: Mark Kerzner [mailto:mark.kerz...@shmsoft.com] Sent: Tuesday, October 04,

RE: Copying a Hive metastore

2011-09-30 Thread Steven Wong
l.com] Sent: Friday, September 30, 2011 3:16 PM To: user@hive.apache.org Subject: Re: Copying a Hive metastore On Fri, Sep 30, 2011 at 5:05 PM, Steven Wong <sw...@netflix.com> wrote: Hi, What is a good way to copy the entire content of a Hive metastore and insert it into a

Copying a Hive metastore

2011-09-30 Thread Steven Wong
Hi, What is a good way to copy the entire content of a Hive metastore and insert it into another Hive metastore? The second metastore contains existing metadata that needs to be preserved. Both metastores are in MySQL, not fronted by any Hive metastore server. My guess is Hive has some metasto

RE: Best practices for storing data on Hive

2011-09-12 Thread Steven Wong
ns do. -Original Message- From: Mark Grover [mailto:mgro...@oanda.com] Sent: Monday, September 12, 2011 10:09 AM To: user@hive.apache.org Cc: Steven Wong; Travis Powell; Baiju Devani; Bob Tiernay Subject: Re: Best practices for storing data on Hive Thanks, Steven. So, am I correct in understa

RE: Best practices for storing data on Hive

2011-09-09 Thread Steven Wong
ni" , "Bob Tiernay" Sent: Thursday, September 8, 2011 9:26:10 PM Subject: Re: Best practices for storing data on Hive On Thu, Sep 8, 2011 at 8:30 PM, Steven Wong < sw...@netflix.com > wrote: I think this statement is not true: "By distributing by (and preferably orderi

RE: Hive in Read Only Mode

2011-09-08 Thread Steven Wong
Sounds like you're trying to create/modify the metastore over a read-only database connection. From: Eric Hernandez [mailto:eric.hernan...@sellingsource.com] Sent: Thursday, September 08, 2011 4:31 PM To: user@hive.apache.org Subject: Hive in Read Only Mode Hi , I am getting this error when I d

RE: Best practices for storing data on Hive

2011-09-08 Thread Steven Wong
I think this statement is not true: "By distributing by (and preferably ordering by) user_id, we can minimize seek time in the table because Hive knows where all entries pertaining to a specific user are stored." I think it is not true whether the table is bucketed on user_id or not (assuming th

RE: Hive in EC2

2011-08-31 Thread Steven Wong
When you launch an EMR cluster (or "job flow" in EMR terminology), it launches new EC2 instances, optionally with an Elastic IP assigned to the cluster's master host. One does not install EMR on existing EC2 (non-EMR) instances. -Original Message- From: MIS [mailto:misapa...@gmail.com]

RE: Hive in EC2

2011-08-31 Thread Steven Wong
EMR Hive and Apache Hive are versioned the same. From: Igor Tatarinov [mailto:i...@decide.com] Sent: Tuesday, August 30, 2011 8:27 PM To: user@hive.apache.org; jiang licht Subject: Re: Hive in EC2 The only caveat is that you are at Amazon's mercy in terms of the latest version of Hive. Also, th

RE: Re:RE: Why a sql only use one map task?

2011-08-24 Thread Steven Wong
I think mapred.max.split.size is not set by default. The max split size is not the same as the HDFS block size. From: Daniel,Wu [mailto:hadoop...@163.com] Sent: Tuesday, August 23, 2011 11:44 PM To: user@hive.apache.org Subject: Re:RE: Why a sql only use one map task? I checked my setting, all

RE: Single Map task for Hive queries

2011-08-16 Thread Steven Wong
The TERMINATED clauses don't affect how files are split among mappers. Is your hive.input.format set to org...CombineHiveInputFormat? If so, is your mapred.max.split.size set low enough? If not, there is another config to control, but I don't remember the name offhand. They are all Hadoop config

RE: how to make the data in one table available to multiple tables?

2011-08-12 Thread Steven Wong
One way is to create ly_sales as an external table and use ADD PARTITION ... LOCATION to point to the sales partitions. Unlike a non-external ("managed") table, an external table does not own its data, meaning when you DROP an external table or one of its partitions, the Hive metadata is deleted

RE: CDH3 U1 Hive Job-commit very slow

2011-08-09 Thread Steven Wong
You can tail the Hive log and see what it is doing at the time. From: air [mailto:cnwe...@gmail.com] Sent: Tuesday, August 09, 2011 1:19 AM To: user@hive.apache.org Subject: Fwd: CDH3 U1 Hive Job-commit very slow -- Forwarded message -- From: air <cnwe...@gmail.com> Date:

RE: number of maptasks of hive

2011-07-18 Thread Steven Wong
What input format are you using in Hive? (It should be the one specified by the hive.input.format setting.) -Original Message- From: 牛兆捷 [mailto:nzjem...@gmail.com] Sent: Sunday, July 17, 2011 10:26 PM To: d...@hive.apache.org Subject: number of maptasks of hive Does hive split input t

RE: Failures with DELETEME tables

2011-07-11 Thread Steven Wong
From: Edward Capriolo [mailto:edlinuxg...@gmail.com] Sent: Saturday, July 09, 2011 7:42 PM To: user@hive.apache.org Subject: Re: Failures with DELETEME tables On Sat, Jul 9, 2011 at 10:24 PM, Steven Wong <sw...@netflix.com> wrote: Has anyone encountered the following excepti

RE: How to write a UDAF ?

2011-07-09 Thread Steven Wong
Have you tried https://cwiki.apache.org/confluence/display/Hive/GenericUDAFCaseStudy yet? -Original Message- From: Mapred Learn [mailto:mapred.le...@gmail.com] Sent: Saturday, July 09, 2011 6:54 PM To: user@hive.apache.org Subject: How to write a UDAF ? Hi, Could somebody point me to s

Failures with DELETEME tables

2011-07-09 Thread Steven Wong
Has anyone encountered the following exception? It is causing our SELECT queries to return incorrect results infrequently. 2011-07-06 13:46:40,225 WARN DataNucleus.Query (Log4JLogger.java:warn(106)) - Query for candidates of org.apache.hadoop.hive.metastore.model.MPartition and subclasses resu

RE: Lzo compression on Hive table

2011-07-07 Thread Steven Wong
When writing, set hive.exec.compress.output=true also. When reading, nothing special needs to be done. -Original Message- From: jonathan.hw...@accenture.com [mailto:jonathan.hw...@accenture.com] Sent: Thursday, July 07, 2011 6:07 PM To: user@hive.apache.org Subject: Lzo compression on H

RE: how to disable mapred.reduce.tasks

2011-07-01 Thread Steven Wong
Try -1, judging from this: mapred.reduce.tasks -1 The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when mapred.job.tracker is "local". Hadoop set this to 1 by default, whereas hive uses -1 as its default value. B

RE: Resend -> how to load sequence file with decimal data

2011-06-24 Thread Steven Wong
Not sure if this is what you’re asking for: Hive has a LOAD DATA command. There is no decimal data type. From: Mapred Learn [mailto:mapred.le...@gmail.com] Sent: Thursday, June 23, 2011 7:25 AM To: user@hive.apache.org; mapreduce-u...@hadoop.apache.org; cdh-u...@cloudera.org Subject: Resend ->

RE: Hive running out of memory

2011-06-21 Thread Steven Wong
Is the OOM in the Hive client? If so, you should try increasing its max heap size by setting the env var HADOOP_HEAPSIZE. One place to set it in is hive-env.sh; see /home/hadoop/.versions/hive-0.7/conf/hive-env.sh.template for more info. From: Igor Tatarinov [mailto:i...@decide.com] Sent: Tues

RE: Issue on using hive Dynamic Partitions on larger tables

2011-06-17 Thread Steven Wong
The name of the parameter is actually hive.exec.max.created.files. The wiki has a typo, which I'll fix. From: Bejoy Ks [mailto:bejoy...@yahoo.com] Sent: Thursday, June 16, 2011 9:35 AM To: hive user group Subject: Issue on using hive Dynamic Partitions on larger tables Hi Hive Experts I'm f

RE: left outer join on same table

2011-06-11 Thread Steven Wong
I think you can also move the condition T2.field6='yyy' into the ON clause. From: Igor Tatarinov [mailto:i...@decide.com] Sent: Friday, June 10, 2011 9:31 PM To: user@hive.apache.org Subject: Re: left outer join on same table The condition T2.field6='yyy;' is tested after the outer join.
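
The difference between testing the condition in WHERE and in ON can be sketched with SQLite standing in for Hive. The table t, its columns id and ref_id, the join key, and the rows are all hypothetical (the original query is not shown in full); only the T1/T2 aliases and field6='yyy' come from the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, ref_id INTEGER, field6 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 2, "aaa"), (2, None, "yyy"), (3, 1, "bbb")],
)

# Condition in WHERE: applied after the outer join, so T1 rows whose
# T2 side is NULL or has another field6 value are filtered out.
where_rows = conn.execute(
    "SELECT T1.id, T2.id FROM t T1 LEFT OUTER JOIN t T2 "
    "ON T1.ref_id = T2.id WHERE T2.field6 = 'yyy' ORDER BY T1.id"
).fetchall()

# Condition in ON: part of the match itself, so every T1 row is kept
# and non-matching T2 sides come back as NULL.
on_rows = conn.execute(
    "SELECT T1.id, T2.id FROM t T1 LEFT OUTER JOIN t T2 "
    "ON T1.ref_id = T2.id AND T2.field6 = 'yyy' ORDER BY T1.id"
).fetchall()
```

With these rows, the WHERE form keeps one row and the ON form keeps all three. Which form is wanted depends on whether unmatched T1 rows should survive.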

Hive error with hive.exec.parallel=true

2011-06-08 Thread Steven Wong
I get the following ClosedByInterruptException often - but not always - when running a query with hive.exec.parallel=true. It seems to happen only when 2 MR jobs are being launched in parallel. I doubt I'm the first person to have seen this error in this scenario, but googling didn't help me. An

RE: Hive logging concurrency

2011-06-02 Thread Steven Wong
using RollingFileAppender to avoid "synchronization issues and data loss." On the third hand, I really should look into using separate log files for separate Hive clients. Interleaved log lines from concurrent Hive clients make debugging difficult. From: Steven Wong [mailto:sw...@netflix.com] Sent:

RE: Does block size determine the number of map task

2011-06-02 Thread Steven Wong
I always set it, so am not sure what the behavior is if it is not set. You should probably always set it. See the comments/code in CombineFileInputFormat.java for detail. From: Junxian Yan [mailto:junxian@gmail.com] Sent: Wednesday, June 01, 2011 7:54 PM To: Steven Wong; user

RE: Logging MySQL queries

2011-06-02 Thread Steven Wong
It's my bad. I was editing hive/conf/hive-log4j.properties, but turns out hive is actually a symlink to the 0.5 tree, not 0.7. So, the properties have an effect in 0.7 for me now. From: Steven Wong [mailto:sw...@netflix.com] Sent: Tuesday, May 24, 2011 5:01 PM To: user@hive.apache.org Su

RE: Does block size determine the number of map task

2011-06-01 Thread Steven Wong
When using CombineHiveInputFormat, parameters such as mapred.max.split.size (and others) help determine how the input is split across mappers. Other factors include whether your input files' format is a splittable format or not. Hope this helps. From: Junxian Yan [mailto:junxian@gmail.com]

Hive logging concurrency

2011-06-01 Thread Steven Wong
By default, all Hive clients log to the same file called hive.log via DRFA. What I'm seeing is that many log lines are "lost" after hive.log is rolled over to hive.log.YYYY-MM-DD. Is this an issue with DRFA? What do folks do to avoid this problem when using concurrent Hive clients? Thanks. Stev

RE: Multiple Hive queries at once

2011-05-26 Thread Steven Wong
There are jiras saying that Hive Server has issues with concurrent queries, such as HIVE-80 and HIVE-1019. (Hive Server, not Hive CLI.) I don't use Hive Server heavily, so I cannot confirm or deny. From: jonathan.hw...@accenture.com [mailto:jonathan.hw...@accenture.com] Sent: Wednesday, May 25,

RE: Logging MySQL queries

2011-05-24 Thread Steven Wong
: Steven Wong [mailto:sw...@netflix.com] Sent: Monday, May 23, 2011 4:13 PM To: user@hive.apache.org Subject: RE: Logging MySQL queries After posting my question, I did some digging and also found the log4j categories. Unfortunately, setting them to DEBUG in hive-log4j.properties has no effect

RE: hive storing a byte array

2011-05-24 Thread Steven Wong
ading each byte into a byte array, before I can use it. Given both approaches, which one do you think has the least performance overhead? Thanks, Luke On 5/23/11 6:59 PM, "Steven Wong" wrote: >Hive does not support the blob data type. An option is to store your >binary data encode

RE: hive storing a byte array

2011-05-23 Thread Steven Wong
Hive does not support the blob data type. An option is to store your binary data encoded as string (such as using base64) and define them in Hive as string. -Original Message- From: Luke Forehand [mailto:luke.foreh...@networkedinsights.com] Sent: Monday, May 23, 2011 1:21 PM To: user@hi
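
A minimal sketch of the base64 suggestion, using Python's standard library; the function names and the sample payload are hypothetical. The encoded text uses only printable ASCII, so it cannot collide with tab, newline or Ctrl-A delimiters in a text-format table.

```python
import base64


def encode_for_string_column(raw: bytes) -> str:
    """Encode arbitrary bytes as base64 text suitable for a string column."""
    return base64.b64encode(raw).decode("ascii")


def decode_from_string_column(text: str) -> bytes:
    """Recover the original bytes from the base64 text."""
    return base64.b64decode(text)


payload = bytes([0, 1, 255, 10, 9])   # includes Ctrl-A, newline and tab bytes
encoded = encode_for_string_column(payload)
decoded = decode_from_string_column(encoded)
```

The cost is size: base64 output is roughly one third larger than the input.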

RE: Logging MySQL queries

2011-05-23 Thread Steven Wong
.* categories are the ones you are interested in. Another option which may work better is to use the log4jdbc proxy driver: http://code.google.com/p/log4jdbc/ Hope this helps. Carl On Mon, May 23, 2011 at 2:45 PM, Steven Wong <sw...@netflix.com> wrote: My Hive metastore uses MySQ

Logging MySQL queries

2011-05-23 Thread Steven Wong
My Hive metastore uses MySQL. I'd like to see Hive CLI log all SQL queries that are issued to MySQL. What config/property should I set to accomplish this? Thanks. Steven

RE: Maximum Number of Hive Partitions = 256?

2011-05-03 Thread Steven Wong
I have way more than 256 partitions per table. AFAIK, there is no partition limit. From your stack trace, you have some host name issue somewhere. From: Time Less [mailto:timelessn...@gmail.com] Sent: Tuesday, May 03, 2011 6:52 PM To: user@hive.apache.org Subject: Maximum Number of Hive Partit

RE: Export data with column names

2011-05-03 Thread Steven Wong
there a way that I can export data out to the local file system with the header information? Thank you, Ranjith N. Raghunath -Original Message- From: Steven Wong [mailto:sw...@netflix.com] Sent: Tuesday, May 03, 2011 3:20 PM To: user@hive.apache.org Subject: RE: Export data with column

RE: Export data with column names

2011-05-03 Thread Steven Wong
@hive.apache.org Subject: Re: Export data with column names Look out for NPEs if you happen to set this option to true before doing add/drop table statements. On May 3, 2011, at 11:10, Steven Wong wrote: > set hive.cli.print.header=true; > > > -Original Message- > From: Rag

RE: Export data with column names

2011-05-03 Thread Steven Wong
set hive.cli.print.header=true; -Original Message- From: Raghunath, Ranjith [mailto:ranjith.raghuna...@usaa.com] Sent: Tuesday, May 03, 2011 11:03 AM To: 'user@hive.apache.org' Subject: Re: Export data with column names Thanks. Is there something I can do with cli? Thanks, Ranjith

RE: Selecting an entire map, not just one element with Squirrel?

2011-04-28 Thread Steven Wong
It should work starting with 0.7 (both client and server need to be 0.7). As for the keys, see HIVE-1734. -Original Message- From: Sunderlin, Mark [mailto:mark.sunder...@teamaol.com] Sent: Thursday, April 28, 2011 10:57 AM To: 'user@hive.apache.org' Subject: Selecting an entire map, not

RE: Mapper OOMs disappear after disabling JVM reuse

2011-04-08 Thread Steven Wong
] Sent: Friday, April 08, 2011 9:50 AM To: user@hive.apache.org Cc: Steven Wong Subject: Re: Mapper OOMs disappear after disabling JVM reuse I had a similar problem until I set this parameter to 1 (although 3 seems to work fine too). There is an explanation somewhere on the web. Basically, if you run

Mapper OOMs disappear after disabling JVM reuse

2011-04-07 Thread Steven Wong
When the following query was run with mapred.job.reuse.jvm.num.tasks=20, some of the map tasks failed with "Error: Java heap space", causing the job to fail. After changing to mapred.job.reuse.jvm.num.tasks=1, the job succeeded. FROM ( FROM intable1 SELECT acct_id, esn) b JOIN ( FROM

RE: Hadoop error 2 while joining two large tables

2011-03-16 Thread Steven Wong
In addition, put the smaller table on the left-hand side of a JOIN: SELECT ... FROM small_table JOIN large_table ON ... From: Bejoy Ks [mailto:bejoy...@yahoo.com] Sent: Wednesday, March 16, 2011 11:43 AM To: user@hive.apache.org Subject: Re: Hadoop error 2 while joining two large tables Hey had

RE: UDAF documentation

2011-03-10 Thread Steven Wong
Take a look at http://wiki.apache.org/hadoop/Hive/GenericUDAFCaseStudy, in case you haven't found it already. -Original Message- From: Edward Capriolo [mailto:edlinuxg...@gmail.com] Sent: Thursday, March 10, 2011 6:18 PM To: user@hive.apache.org Cc: Christopher, Pat Subject: Re: UDAF do

RE: cannot start the transform script. reason : "argument list too long"

2011-03-01 Thread Steven Wong
Looks like this is the command line it was executing: 2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Executing [/usr/bin/python2.6, user_id_output.py, hbase] From: Irfan Mohammed [mailto:irfan...@gmail.com] Sent: Tuesday, March 01, 2011 1:39 PM To: user@hive.apache.

RE: On compressed storage : why are sequence files bigger than text files?

2011-01-19 Thread Steven Wong
Here's a simple check -- look inside one of your sequence files: hadoop fs -cat /your/seq/file | head If it is compressed, the header will contain the compression codec's name and the data will look gibberish. Otherwise, it is not compressed. -Original Message- From: Ajo Fod [mailto:aj
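
The same header check can be sketched in Python. This assumes the version 6 SequenceFile header layout (the bytes "SEQ", a version byte, key and value class names, two boolean flags, then the codec class name when compressed) and class names shorter than 128 bytes; the function names and the hand-built sample header are hypothetical.

```python
import io


def read_text(stream):
    """Read a length-prefixed string, assuming the length fits in one byte."""
    length = stream.read(1)[0]
    if length > 127:
        raise ValueError("multi-byte lengths are not handled in this sketch")
    return stream.read(length).decode("utf-8")


def describe_header(data: bytes) -> dict:
    """Report whether a SequenceFile header says the file is compressed."""
    stream = io.BytesIO(data)
    if stream.read(3) != b"SEQ":
        raise ValueError("not a SequenceFile")
    version = stream.read(1)[0]
    key_class = read_text(stream)
    value_class = read_text(stream)
    compressed = stream.read(1)[0] != 0
    block_compressed = stream.read(1)[0] != 0
    # The codec class name is only present when the compressed flag is set.
    codec = read_text(stream) if compressed else None
    return {"version": version, "key_class": key_class,
            "value_class": value_class, "compressed": compressed,
            "block_compressed": block_compressed, "codec": codec}


def text(s: str) -> bytes:
    """Build a one-byte-length-prefixed string for the sample header."""
    return bytes([len(s)]) + s.encode("utf-8")


sample = (b"SEQ\x06" + text("org.apache.hadoop.io.Text") * 2 + b"\x01\x00"
          + text("org.apache.hadoop.io.compress.GzipCodec"))
info = describe_header(sample)
```

To use it on a real file, pass the first few hundred bytes of the file instead of the sample.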

RE: How to implement JOIN ON (a.key <> b.key) or ...NOT IN... semantics

2010-12-17 Thread Steven Wong
This is the way to go. -Original Message- From: Leo Alekseyev [mailto:dnqu...@gmail.com] Sent: Wednesday, December 15, 2010 6:10 PM To: Subject: How to implement JOIN ON (a.key <> b.key) or ...NOT IN... semantics I need to get rows from table A where the key is not present in table B.

Metastore compatibility

2010-12-10 Thread Steven Wong
Is it safe to share a 0.7 metastore between 0.7 clients/servers and 0.5 clients/servers? Thanks.

RE: Query output formatting

2010-12-06 Thread Steven Wong
, bitrate,0))/SUM(IF(cdn=8, 1, 0)) avgBitrateCdn8, SUM(IF(cdn=9, bitrate,0))/SUM(IF(cdn=9, 1, 0)) avgBitrateCdn9 -- You will need more IFs to handle 0 denominators. FROM fact_table GROUP BY hour Dilip On Mon, Dec 6, 2010 at 1:01 PM, Steven Wong <sw...@netflix.com> wr

RE: Query output formatting

2010-12-06 Thread Steven Wong
set() udaf. (And use lateral view join and explode if you want operate on the set data.) On Mon, Dec 6, 2010 at 1:01 PM, Steven Wong wrote: > I have this query to calculate some averages: > > > > select hour, cdn, avg(bitrate) from fact_table group by hour, cdn > > 1 

Query output formatting

2010-12-06 Thread Steven Wong
I have this query to calculate some averages: select hour, cdn, avg(bitrate) from fact_table group by hour, cdn 1 8 a 1 9 b 2 8 c 3 8 d 3 9 e But I want the output to

RE: I define one UDF function, the UDf retunr List ,but When I use ResultSet to receive result hive throw exception

2010-10-19 Thread Steven Wong
Your Hive version is not latest trunk, right? I suspect the error is fixed in HIVE-1378 in trunk. From: lei liu [mailto:liulei...@gmail.com] Sent: Tuesday, October 19, 2010 1:41 AM To: hive-u...@hadoop.apache.org Subject: I define one UDF function, the UDf retunr List ,but When I use ResultSet

RE: Exception in hive startup

2010-10-13 Thread Steven Wong
s should be documented in README.txt > > On Wed, Oct 13, 2010 at 6:14 PM, Steven Wong wrote: >> >> You need to run hive_root/build/dist/bin/hive, not hive_root/bin/hive. >> >> >> >> >> >> From: hdev ml [mailto:hde...@gmail.com] >> Sent: We

RE: Exception in hive startup

2010-10-13 Thread Steven Wong
You need to run hive_root/build/dist/bin/hive, not hive_root/bin/hive. From: hdev ml [mailto:hde...@gmail.com] Sent: Wednesday, October 13, 2010 2:18 PM To: hive-u...@hadoop.apache.org Subject: Exception in hive startup Hi all, I installed Hadoop 0.20.2 and installed hive 0.5.0. I followed all

RE: RE: hive query doesn't seem to limit itself to partitions based on the WHERE clause

2010-10-08 Thread Steven Wong
07'), so I would think that the comparison would work. '08' is greater than '07'. I'll try your suggestions, maybe the cast or changing the data type will work. Thanks, Marc On Wed, Oct 6, 2010 at 7:56 PM, Edward Capriolo <edlinuxg...@gmail.com> wro