RE: Load data throws exception, cant figure it out

2012-07-02 Thread Ruben de Vries
Figured it out myself by running with ' -hiveconf hive.root.logger=INFO,console' that it was binlog --- Hey Guys, I've been playing with Hive for a while now but somehow I run into this error all of a sudden when setting up my product

Load data throws exception, cant figure it out

2012-07-02 Thread Ruben de Vries
Hey Guys, I've been playing with Hive for a while now but somehow I run into this error all of a sudden when setting up my production cluster. $ hive -e 'LOAD DATA INPATH "/tmp/members_map_2012-06-30.map" OVERWRITE INTO TABLE members_map_full;' Loading data to table hyves_goldmine.members_map_

RE: loading data in an array within a map

2012-06-28 Thread Ruben de Vries
Sorry, can't help you with your specific problem, but in case you're really stuck: I used the JSON SerDe (https://github.com/rcongiu/Hive-JSON-Serde, this one is better than the default one) and it converts nested arrays into maps perfectly. From: Bhaskar, Snehalata [mailto:snehalata_bhas...@syn

RE: Hive-0.8.1 PHP Thrift client broken?

2012-06-18 Thread Ruben de Vries
Going to bump this one since I hope to be able to contribute some (worth a bump :P) -Original Message- From: Ruben de Vries [mailto:ruben.devr...@hyves.nl] Sent: Friday, June 15, 2012 11:59 AM To: user@hive.apache.org Subject: Hive-0.8.1 PHP Thrift client broken? Hey Guys, I've

Hive-0.8.1 PHP Thrift client broken?

2012-06-15 Thread Ruben de Vries
Hey Guys, I've been slamming my head into a wall before on this issue, but now that I'm a bit more familiar with Hive and Thrift (I got the python version working) I figured I should try fixing the problem or find out more about it to contribute some to the project too :) The php thriftclient

RE: Hive scratch dir not cleaning up

2012-06-01 Thread Ruben de Vries
on completion of the job. Though failed / killed jobs leave data there, which needs to be removed manually. Thanks, Vinod http://blog.vinodsingh.com/ On Fri, Jun 1, 2012 at 1:58 PM, Ruben de Vries wrote: Hey Hivers, I'm almost ready to replace our old hadoop implementation with a

Hive scratch dir not cleaning up

2012-06-01 Thread Ruben de Vries
Hey Hivers, I'm almost ready to replace our old Hadoop implementation with an implementation using Hive. Now I've run into (hopefully) my last problem: my /tmp/hive-hduser dir is getting kinda big! It doesn't seem to clean up these tmp files. Googling for it, I run into some tickets about a cleanu

RE: table design and performance questions

2012-05-29 Thread Ruben de Vries
Partitioning can greatly increase performance for WHERE clauses since Hive can skip reading the data in the partitions which do not meet the condition. For example if you partition by date (I do it by INT dateint, in which case I set dateint to be YYYYMMDD) and you do WHERE dateint >= 20120101 t
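The partition filter described above can be sketched in Python. This is a minimal sketch, not the poster's code: it assumes dateint is an INT of the form YYYYMMDD (inferred from the 20120101 example), and the helper names to_dateint and build_query, as well as the SELECT COUNT(*) projection, are hypothetical. The table name visit_stats is borrowed from another message in this listing.

```python
from datetime import date


def to_dateint(d: date) -> int:
    """Encode a date as an INT partition key (assumed YYYYMMDD layout)."""
    return d.year * 10000 + d.month * 100 + d.day


def build_query(table: str, start: date) -> str:
    """Build a HiveQL query string that filters on the partition column.

    Only the WHERE clause on dateint comes from the message above;
    the projection is a placeholder for illustration.
    """
    return f"SELECT COUNT(*) FROM {table} WHERE dateint >= {to_dateint(start)}"


print(build_query("visit_stats", date(2012, 1, 1)))
```

Because the filter is on the partition column itself, Hive only needs to read the partitions whose dateint satisfies the condition.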

RE: Job Scheduling in Hadoop-Hive

2012-05-29 Thread Ruben de Vries
Hey, We use hadoop/hive for processing our access logs and we run a daily cronjob (python script) which does the parsing jobs and some partitioning etc. The results from those jobs are then queried on by other jobs which generate the results the management team wants to see :-) From: Ronak Bh
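A daily cron script of the kind described above might look roughly like the following. This is a hedged sketch under stated assumptions, not the poster's script: the table name access_log, the dateint partition column, and the helper names hive_command and daily_job are all hypothetical. Only the `hive -e` and `-hiveconf hive.root.logger=INFO,console` invocations appear elsewhere in this listing.

```python
import shlex
from datetime import date, timedelta
from typing import List


def hive_command(query: str, debug: bool = False) -> List[str]:
    """Build the argv for a non-interactive Hive run via `hive -e`."""
    argv = ["hive"]
    if debug:
        # Send Hive's log output to the console, useful when a job fails.
        argv += ["-hiveconf", "hive.root.logger=INFO,console"]
    argv += ["-e", query]
    return argv


def daily_job(today: date) -> List[str]:
    """Build the command that processes yesterday's partition."""
    yesterday = today - timedelta(days=1)
    dateint = yesterday.strftime("%Y%m%d")
    # access_log and dateint are hypothetical names for illustration.
    query = f"SELECT COUNT(*) FROM access_log WHERE dateint = {dateint}"
    return hive_command(query)


# Print the command instead of running it, so the sketch works without Hive.
print(shlex.join(daily_job(date(2012, 5, 29))))
```

In a real cron job the returned list would be passed to something like subprocess.run(argv, check=True), so that a failing Hive job makes the script exit non-zero.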

JOIN + LATERAL VIEW works, but + MAPJOIN and no longer get any results

2012-05-22 Thread Ruben de Vries
.com/2499658 I've also created a ticket in JIRA but it doesn't seem to get any attention at all: https://issues.apache.org/jira/browse/HIVE-2992 Greetz, Ruben de Vries

RE: JOIN + LATERAL VIEW + MAPJOIN = no output?!

2012-05-01 Thread Ruben de Vries
I really do feel like this isn't as intended, should I make a ticket in JIRA? -Original Message- From: Ruben de Vries [mailto:ruben.devr...@hyves.nl] Sent: Thursday, April 26, 2012 3:37 PM To: user@hive.apache.org Subject: RE: JOIN + LATERAL VIEW + MAPJOIN = no output?!

RE: JOIN + LATERAL VIEW + MAPJOIN = no output?!

2012-04-26 Thread Ruben de Vries
https://gist.github.com/2499658 and this is the plan.xml its using -Original Message- From: Ruben de Vries [mailto:ruben.devr...@hyves.nl] Sent: Thursday, April 26, 2012 3:17 PM To: user@hive.apache.org Subject: JOIN + LATERAL VIEW + MAPJOIN = no output?! Okay first off; so JOIN

JOIN + LATERAL VIEW + MAPJOIN = no output?!

2012-04-26 Thread Ruben de Vries
Okay, first off: JOIN + LATERAL VIEW together isn't working, so I moved my JOIN into a subquery and that makes the query work properly. However, when I add a MAPJOIN hint for the JOIN in the subquery, it also stops running the reducer for the main query! This only happens when there's a LATERA

RE: When/how to use partitions and buckets usefully?

2012-04-26 Thread Ruben de Vries
rom 350sec to 110sec when being able to MAPJOIN(), gotta love that speed if it works! -Original Message- From: Ruben de Vries [mailto:ruben.devr...@hyves.nl] Sent: Thursday, April 26, 2012 9:16 AM To: user@hive.apache.org; gemini5201...@gmail.com; mgro...@oanda.com Subject: RE: When/how t

RE: When/how to use partitions and buckets usefully?

2012-04-26 Thread Ruben de Vries
NDA Corporation www: oanda.com www: fxtrade.com e: mgro...@oanda.com "Best Trading Platform" - World Finance's Forex Awards 2009. "The One to Watch" - Treasury Today's Adam Smith Awards 2009. - Original Message - From: "Ruben de Vries" To: use

RE: When/how to use partitions and buckets usefully?

2012-04-25 Thread Ruben de Vries
y 25M for small table, copy your hive-default.xml to hive-site.xml and set hive.mapjoin.smalltable.filesize=3 On April 25, 2012 at 12:09 AM, Ruben de Vries wrote: I got the (rather big) log here in a github gist: https://gist.github.com/2480893 And I also attached the plan.xml it was using to the gist.

RE: subquery + lateral view fails without count

2012-04-25 Thread Ruben de Vries
mgro...@oanda.com "Best Trading Platform" - World Finance's Forex Awards 2009. "The One to Watch" - Treasury Today's Adam Smith Awards 2009. - Original Message - From: "Ruben de Vries" To: user@hive.apache.org Sent: Monday, April 23, 2012 9:08:16 AM S

RE: When/how to use partitions and buckets usefully?

2012-04-24 Thread Ruben de Vries
ogging and post in the console to get a better picture why it consumes this much memory. Start your hive shell as: hive -hiveconf hive.root.logger=ALL,console; Regards Bejoy KS From: Ruben de Vries To: "user@hive.apache.org" Sent: Tuesday, April 24,

FW: When/how to use partitions and buckets usefully?

2012-04-24 Thread Ruben de Vries
splitting to mappers is out of question. Can you do a dfs count for the members_map table hdfs location and tell us the result? On Tue, Apr 24, 2012 at 2:06 PM, Ruben de Vries wrote: Hmm I must be doing something wrong, the members_map table is 300ish MB. When I execute the following query: S

RE: When/how to use partitions and buckets usefully?

2012-04-24 Thread Ruben de Vries
46 PM Subject: Re: When/how to use partitions and buckets usefully? If you are doing a map side join make sure the table members_map is small enough to hold in memory On 4/24/12, Ruben de Vries wrote: > Wow thanks everyone for the nice feedback! > > I can force a mapside join by do

RE: When/how to use partitions and buckets usefully?

2012-04-23 Thread Ruben de Vries
Wow thanks everyone for the nice feedback! I can force a mapside join by doing /*+ STREAMTABLE(members_map) */ right? Cheers, Ruben de Vries -Original Message- From: Mark Grover [mailto:mgro...@oanda.com] Sent: Tuesday, April 24, 2012 3:17 AM To: user@hive.apache.org; bejoy ks Cc

RE: When/how to use partitions and buckets usefully?

2012-04-23 Thread Ruben de Vries
joins would offer you much performance improvement. Regards Bejoy KS Sent from handheld, please excuse typos. From: Ruben de Vries <ruben.devr...@hyves.nl> Date: Mon, 23 Apr 2012 17:38:20 +0200 To: user@hive.apache.org

RE: When/how to use partitions and buckets usefully?

2012-04-23 Thread Ruben de Vries
avoid having as many rows from visit_stats compared to each member_id for joins. Matt Tucker From: Ruben de Vries [mailto:ruben.devr...@hyves.nl] Sent: Monday, April 23, 2012 11:19 AM To: user@hive.apache.org Subject:

When/how to use partitions and buckets usefully?

2012-04-23 Thread Ruben de Vries
It seems there's enough information to be found on how to set up and use partitions and buckets. But I'm more interested in how to figure out when, and on which columns, you should be partitioning and bucketing to increase performance. In my case I've got 2 tables, 1 visit_stats (member_id, date and some

subquery + lateral view fails without count

2012-04-23 Thread Ruben de Vries
It's a bit of a weird case but I thought I might share it and hopefully find someone who can confirm this to be a bug or tell me I should do things differently! Here you can find a pastie with the full create and select queries: http://pastie.org/3838924 I've got two tables: `visit_stats` with

RE: using the key from a SequenceFile

2012-04-19 Thread Ruben de Vries
ORMAT 'com.mycompany.SequenceFileKeyInputFormat' Dilip On Thu, Apr 19, 2012 at 6:09 AM, Owen O'Malley <omal...@apache.org> wrote: On Thu, Apr 19, 2012 at 3:07 AM, Ruben de Vries <ruben.devr...@hyves.nl> wrote: > I'm trying to migrate a part of our current hadoop j

RE: using the key from a SequenceFile

2012-04-19 Thread Ruben de Vries
d there's already code for working with Avro in MR as input.) On Apr 19, 2012, at 6:15 AM, madhu phatak wrote: Serde will allow you to create custom data from your sequence File https://cwiki.apache.org/confluence/display/Hive/SerDe On Thu, Apr 19, 2012 at 3:37 PM, Ruben de Vries mai

RE: using the key from a SequenceFile

2012-04-19 Thread Ruben de Vries
sequence File https://cwiki.apache.org/confluence/display/Hive/SerDe On Thu, Apr 19, 2012 at 3:37 PM, Ruben de Vries <ruben.devr...@hyves.nl> wrote: I'm trying to migrate a part of our current Hadoop jobs from normal MapReduce jobs to Hive. Previously the data was stored in se

using the key from a SequenceFile

2012-04-19 Thread Ruben de Vries
omSeqRecordReader extends SequenceFileRecordReader implements RecordReader { Hope someone has a snippet or can help me out, would really love to be able to switch part of our jobs to Hive. Ruben de Vries