Re: data transfer from rdbms to hive

2014-05-02 Thread Matt Tucker
It sounds like you might need to export via Sqoop using a query or view, as the date granularity in your MySQL table is different from the desired Hive table. The overall performance may be lower as MySQL must do more than just read rows from disk, but you may still find ways to get the data in pa

Re: Setting limits for hive users

2014-04-23 Thread Matt Tucker
I would recommend looking at Hive Strict Mode first. It helps users think about their queries by throwing errors for certain operations that may be unexpectedly bad, like a full table scan over a partitioned fact table, when only a certain subset may be needed. http://my.safaribooksonline.com/book

Re: Dynamic columns in Hive Table - Best Design for the problem

2013-12-28 Thread Matt Tucker
It looks like you're essentially doing a pivot function. Your best bet is to write a custom UDAF or look at the windowing functions available in recent releases. Matt On Dec 28, 2013 12:57 PM, "Raj Hadoop" wrote: > Dear All Hive Group Members, > > I have the following requirement. > > Input: > >

Re: HQL *_utc_timestamp functions' timezone parameter?

2013-06-07 Thread Matt Tucker
Hi Mark, I ran into the same issue last week, and ended up writing a small app to print the results of getAvailableIDs() in java.util.TimeZone. Having to convert from GMT to Pacific Time (accounting for Daylight Sa
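
A minimal sketch of the kind of small app described above, not the original program: it prints every zone ID that java.util.TimeZone reports for the running JVM. The class name ListTimeZoneIds and the sorting step are assumptions added for illustration. The IDs it prints (for example America/Los_Angeles) are the sort of value one would try as the timezone parameter of the *_utc_timestamp functions.

```java
import java.util.Arrays;
import java.util.TimeZone;

// Hypothetical helper class; the name is not from the original thread.
class ListTimeZoneIds {

    // Returns every zone ID known to this JVM, sorted for easier scanning.
    static String[] sortedIds() {
        String[] ids = TimeZone.getAvailableIDs();
        Arrays.sort(ids);
        return ids;
    }

    public static void main(String[] args) {
        // Print one ID per line so the output can be piped through grep.
        for (String id : sortedIds()) {
            System.out.println(id);
        }
    }
}
```

The list depends on the JVM's time zone data, so the output can differ between machines and Java versions.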

Re: .sql vs. .hql

2013-05-31 Thread Matt Tucker
Hi Keith, With regard to filename conventions, there isn't a hard requirement to use a specific extension. Aside from the two extensions already mentioned ('.hql','.sql'), there is a third option commonly used in the Hive test queries

Re: Variable Substitution

2013-03-06 Thread Matt Tucker
ng the literal text in place makes it easier to notice. >> Although, Unix shells would insert an empty string, so never mind ;) >> >> On Wed, Mar 6, 2013 at 3:13 PM, Matt Tucker wrote: >>> Using CDH3u3 (Hive 0.7.1), it appears that variable substitution becomes >>

HIVE-2915: Partitioned Tables in Hive Metastore

2013-01-17 Thread Matt Tucker
Hi, I'm currently using CDH3u3 and Hive 0.7.1, and I'm looking into how the metadata is stored for partitioned tables within the RDBMS. The issue that I see is that for tables with multiple partitioning columns, there's no good way to determine which PARTITION_KEY_VALS record maps to its logical

Re: hive regular expression

2012-12-26 Thread Matt Tucker
Parse_url and str_to_map may offer an alternative worth looking into. On Dec 26, 2012, at 1:50 PM, Dean Wampler wrote: > Hive uses Java's regex API. This tutorial provides an excellent introduction. > > http://docs.oracle.com/javase/tutorial/essential/regex/ > > dean > > On Wed, Dec 26, 2012

Re: Any existing UDTF to flatten map

2012-11-30 Thread Matt Tucker
I ended up getting an error (Hive 0.7.1), but I would have thought something like the following would work: SELECT user_id, obj_key, obj[obj_key] AS obj_item FROM ( SELECT "user1" user_id, MAP("k1", "v1", "k2", "v2") obj FROM calendar LIMIT 1 ) tmp LATER

Re: need help on writing hive query

2012-10-31 Thread Matt Tucker
asier way, please let me know. Matt Tucker On Oct 31, 2012, at 5:37 PM, Tom Brown wrote: > It wouldn't retrieve the user's path in a single string, but you could > simply select the user id and current page, ordered by the timestamp. > > It would require a second step to tur

Re: HIVE NOT EXISTS

2012-10-01 Thread Matt Tucker
Hi Mohit, Hive doesn't support correlated subqueries. In this instance, you can do a left outer join to find values that are not in a table. SELECT "", a.pagename, a.pagedetail, "", a.pagetitle, a.page_id, a.pagetype FROM page_temp_ext a LEFT OUTER JO

Re: Load XML file into HIVE

2012-08-30 Thread Matt Tucker
Hi, I was working on this several months ago, and ended up having to flatten each XML document to one root node per line. I believe that the other option would be to write a custom InputFormat. Matt On Aug 30, 2012, at 3:57 PM, Sadananda Hegde wrote: > Hi, > > I would like to load an XML

Re: Passing date as command line arguments

2012-08-03 Thread Matt Tucker
Hi, In the command line, you want to wrap 20120709 in double-quotes, as they get stripped when being passed into the hiveconf variable. Matt On Aug 3, 2012, at 6:48 PM, Techy Teck wrote: > I have my below query in test1.hql file. I am trying to pass the date (dt) as > the command line argu

Re: Dates in Hive

2012-06-25 Thread Matt Tucker
I don't have the language manual in front of me, but I would suggest casting dateint to string, then using the unix_timestamp function. from_unixtime should get it back into a date-time string. Matt On Jun 25, 2012, at 8:05 PM, sonia gehlot wrote: > Hi All, > > A simple question on dates, ho
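
A rough Java sketch of the round trip suggested above, offered only as an illustration of the logic and not as Hive code. Hive's unix_timestamp(string, pattern) takes Java SimpleDateFormat patterns, so the same steps can be tried outside Hive. The class name DateIntRoundTrip, the yyyyMMdd layout of dateint, and the explicit time zone argument are all assumptions, since the original question is cut off.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper class; names and the yyyyMMdd layout are assumptions.
class DateIntRoundTrip {

    // Step 1 and 2: cast the integer to a string, then parse it to epoch seconds
    // (the role unix_timestamp plays in the suggestion above).
    static long toEpochSeconds(int dateInt, TimeZone tz) {
        SimpleDateFormat in = new SimpleDateFormat("yyyyMMdd");
        in.setTimeZone(tz);
        in.setLenient(false); // reject values such as 20121340
        try {
            return in.parse(String.valueOf(dateInt)).getTime() / 1000L;
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a yyyyMMdd date: " + dateInt, e);
        }
    }

    // Step 3: format epoch seconds as a date-time string
    // (the role from_unixtime plays in the suggestion above).
    static String toDateTimeString(long epochSeconds, TimeZone tz) {
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        out.setTimeZone(tz);
        return out.format(new Date(epochSeconds * 1000L));
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        long seconds = toEpochSeconds(20120625, utc);
        System.out.println(toDateTimeString(seconds, utc));
    }
}
```

The sketch fixes the zone to UTC for a repeatable result; Hive itself uses the time zone of the machine running the query.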

Re: Partition deletion w/out using a literal partition value

2012-02-05 Thread Matt Tucker
Hive 0.8.0 has metadata optimizations ([HIVE-1003]), but your best bet is to write a shell script that executes 'show partitions ;', and then loop through the results and drop any partitions that meet your criteria. You can then create a cron job to

Re: move tables into different database

2012-01-30 Thread Matt Tucker
Someone recently posted a command to convert a managed table to an external table. If you also use that command, there's no expensive copy operation. You'd probably want to move the data into a different HDFS directory though, to keep your namespace as consistent as possible. Matt On Jan 30,

Re: split into less files

2011-11-08 Thread Matt Tucker
It sounds like you want to look at setting hive.merge.mapredfiles to true in your hive-site.xml. Just be aware that it will likely add another map step to your jobs to consolidate the files. Matt Tucker On Nov 8, 2011, at 6:19 PM, Shouguo Li wrote: > i think that has to do with y