Hi guys,
I spent today investigating this issue. It seems that the
differences occur when there are many killed tasks.
We are using the fair scheduler. I ran the queries on large data and
with low priority, which caused the tasks of this job to be
preempted (killed) many times.
Once I began suspecting this, I gave the query the highest
priority. That reduced the number of killed tasks, and it
seemed to solve the problem.
It is not that any killed task produces differences; the
differences appear when many tasks are killed because of
preemption.
What do you say?
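One way to check whether two runs differ in content or only in row order is to compare them as multisets of rows. Below is a minimal Python sketch, assuming the output of each run has been exported as text with one row per line; the function names are hypothetical and nothing here is specific to Hive.

```python
from collections import Counter


def load_rows(lines):
    """Count each distinct output row, ignoring order and blank lines."""
    return Counter(line.rstrip("\n") for line in lines if line.strip())


def diff_runs(run_a, run_b):
    """Return the rows (with multiplicity) that appear in only one run."""
    a, b = load_rows(run_a), load_rows(run_b)
    # Counter subtraction keeps only positive counts.
    return a - b, b - a


def summarize(run_a, run_b):
    """Report whether the runs match as multisets and how many rows differ."""
    only_a, only_b = diff_runs(run_a, run_b)
    total_a = max(sum(load_rows(run_a).values()), 1)
    return {
        "same_multiset": not only_a and not only_b,
        "rows_only_in_a": sum(only_a.values()),
        "rows_only_in_b": sum(only_b.values()),
        "fraction_differing": sum(only_a.values()) / total_a,
    }


if __name__ == "__main__":
    # Hypothetical sample rows standing in for two exported result sets.
    first = ["k1\t10\n", "k2\t20\n", "k3\t30\n"]
    second = ["k3\t30\n", "k2\t21\n", "k1\t10\n"]
    print(summarize(first, second))
```

If `same_multiset` is true, only the ordering changed between runs; otherwise the rows returned by `diff_runs` show which keys to look at when narrowing down the data.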
On Tue 10 Jan 2012 11:49:35 AM IST, Guy Doulberg wrote:
Hi,
Sorry for the late answer,
I ran the query on small data, but couldn't reproduce it.
At the moment I can reproduce it only on data that takes about
1.5 hours to process,
I am trying to narrow the amount of data as much as I can, and still
reproduce it...
But I think it is clear that the scale of the data is the reason for
the differences,
What do you think?
On Mon 09 Jan 2012 08:14:10 PM IST, Edward Capriolo wrote:
The create table statements, the query, and a small data set that reproduces it.
On Monday, January 9, 2012, Guy Doulberg <guy.doulb...@conduit.com> wrote:
Thanks, I am trying to reproduce it again,
But what should I send to the ML?
On Mon 09 Jan 2012 07:54:24 PM IST, Edward Capriolo wrote:
Can you reproduce the issue, possibly with smaller tables, and
send that to the ML?
Edward
On Mon, Jan 9, 2012 at 12:46 PM, Guy Doulberg
<guy.doulb...@conduit.com> wrote:
Hey Dave,
I didn't quite understand your question.
The results are slightly different between runs, by about 2%.
Thanks
Guy
On 01/09/2012 07:05 PM, David Houston wrote:
Hi Guy,
Inconsistent in the sense that the results are totally off, or that
the order is different?
Thanks
Dave
On Jan 9, 2012 5:03 PM, "Guy Doulberg"
<guy.doulb...@conduit.com> wrote:
Hi guys,
We have been using Hive for a while now, and recently we
encountered an issue we just can't understand.
We are selecting (the select includes count(*)) over a join of
two big tables.
We ran the same query twice consecutively over the same two
tables, and each time the results were slightly different.
We don't know how we should debug this issue or where we should
look. Any ideas?
Thanks
Guy Doulberg,
Data infrastructure engineer,
Conduit