Re: hive client OutOfMemoryError

2014-09-28 Thread Oliver Keyes
Bah. *similar error.

On 28 September 2014 12:39, Oliver Keyes wrote:
> I run into that error, or a simpler error, rather a lot; I consider it
> pretty normal (although experiences may differ).
>
> A more direct way of increasing the heapsize would be export
> HADOOP_HEAPSIZ

Re: hive client OutOfMemoryError

2014-09-28 Thread Oliver Keyes
ent
> heapsize, then the sql is ok.
> I saw CommonJoinResolver line 589:
> InputStream in = new ByteArrayInputStream(xml.getBytes("UTF-8"));
>
> hive client need so many heapsize, is this normal
> And where Can I find this large xml
>
>
> Thanks
>

--
Oliver Keyes
Research Analyst
Wikimedia Foundation

Sampling from a single column

2014-02-11 Thread Oliver Keyes
Hey all,

So, what I'm looking to do is get N randomly-sampled distinct values from a column in a table. I'm kind of flummoxed by how to do this without using TABLESAMPLE, which would require me to add Yet Another Subquery (it'd be 'select these values, from this sample, from these distinct values')
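
For readers following the question, here is one shape such a query might take. It is a sketch, not an answer from the thread: it deduplicates the column first, then orders the distinct values randomly and keeps N of them. It still needs one subquery, only not a TABLESAMPLE layer on top. The table `my_table` and column `my_col` are hypothetical, and the example runs against Python's built-in sqlite3 purely so it is self-contained; in Hive the random function is spelled rand() rather than RANDOM().

```python
import sqlite3


def sample_distinct(conn, n):
    """Return up to n randomly chosen distinct values of my_col."""
    query = """
        SELECT my_col
        FROM (SELECT DISTINCT my_col FROM my_table) AS d
        ORDER BY RANDOM()   -- SQLite spelling; Hive uses rand()
        LIMIT ?
    """
    return [row[0] for row in conn.execute(query, (n,))]


# Hypothetical data with duplicates, standing in for a real Hive table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (my_col TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?)",
    [(v,) for v in ["a", "a", "b", "c", "c", "d"]],
)

sample = sample_distinct(conn, 3)
print(sample)  # three distinct values, different on each run
```

Randomly ordering the whole distinct set means sorting every distinct value, so this approach is likely to be costly on high-cardinality columns.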

Re: Joins between databases

2014-02-06 Thread Oliver Keyes
Noted for future queries, and thanks for the help! Works like a charm :).

Best,

--
Oliver Keyes
Product Analyst
Wikimedia Foundation

On 6 February 2014 17:09, Stephen Sprague wrote:
> ahh. we got ourselves a bona-fide head banger. Welcome to the club!! :)
>
> select
> a

Joins between databases

2014-02-06 Thread Oliver Keyes
Hey all,

So, I'm new to hive (I come to it from MySQL/MariaDB) and I've spent the last couple of days banging my head against the problem of trying to retrieve data from a join of two tables in different databases. I understand that the db.table.column syntax is not supported in hive, and that ins