Hi All,

I have a query that attempts to compute percentiles on some datasets that are
well in excess of 100,000,000 rows. Because we were routinely running out of
memory, I opted to use percentile_approx. I'm having trouble choosing the
threshold at which to begin sampling. Before this dataset grew so large, the
maximum number of rows I needed to include in a percentile was about 1,000,000.
I've tried 1,000,000, 100,000, and even the default 10,000 as the sampling
threshold. For some reason, this query, which previously took about 20 minutes
to run, now takes around 13 hours to complete (with 100,000 as the sampling
rate). Are there any Hive settings I should investigate to get this query to
complete in a reasonable time?
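
Hive's documentation describes the optional third argument B of
percentile_approx(col, p, B) as controlling approximation accuracy at the cost
of memory, with a default of 10,000. The thresholds tried above are presumably
this B argument. The Python sketch below is a hypothetical
StreamingHistogram class modelled on the Ben-Haim/Tom-Tov streaming histogram,
a general bounded-bin technique and not Hive's actual code. It illustrates why
a larger B can make every row more expensive to process: once the histogram is
full, each new distinct value triggers an O(B) insert and an O(B) search for
the closest pair of bins to merge. Whether Hive's implementation has the same
per-row cost profile is an assumption worth checking against its source.

```python
import bisect


class StreamingHistogram:
    """Histogram holding at most max_bins (centroid, count) pairs.

    Hypothetical sketch of a Ben-Haim/Tom-Tov style streaming histogram;
    not Hive's implementation.
    """

    def __init__(self, max_bins):
        if max_bins < 1:
            raise ValueError("max_bins must be at least 1")
        self.max_bins = max_bins
        self.centers = []  # sorted bin centroids
        self.counts = []   # number of points represented by each bin

    def add(self, value):
        i = bisect.bisect_left(self.centers, value)
        if i < len(self.centers) and self.centers[i] == value:
            # Exact duplicate of an existing centroid: just bump its count.
            self.counts[i] += 1
            return
        # Inserting into a sorted Python list shifts up to O(B) elements.
        self.centers.insert(i, value)
        self.counts.insert(i, 1)
        if len(self.centers) > self.max_bins:
            self._merge_closest()

    def _merge_closest(self):
        # Linear scan for the two adjacent centroids that are closest: O(B).
        j = min(range(len(self.centers) - 1),
                key=lambda k: self.centers[k + 1] - self.centers[k])
        c1, c2 = self.counts[j], self.counts[j + 1]
        # Replace the pair with their count-weighted mean, which keeps the
        # centroids sorted because it lies between the two originals.
        merged = (self.centers[j] * c1 + self.centers[j + 1] * c2) / (c1 + c2)
        self.centers[j:j + 2] = [merged]
        self.counts[j:j + 2] = [c1 + c2]

    def quantile(self, p):
        # Coarse estimate: the centroid of the bin where the cumulative count
        # first reaches p * total (no interpolation between bins).
        if not self.centers:
            raise ValueError("histogram is empty")
        target = p * sum(self.counts)
        running = 0
        for center, count in zip(self.centers, self.counts):
            running += count
            if running >= target:
                return center
        return self.centers[-1]


if __name__ == "__main__":
    import random

    rng = random.Random(42)
    hist = StreamingHistogram(max_bins=100)
    for _ in range(10_000):
        hist.add(rng.random())
    print(hist.quantile(0.5))
```

When the number of distinct values is no larger than max_bins, nothing is ever
merged and the result is exact. This matches Hive's documented behaviour for
percentile_approx when the column has fewer than B distinct values. Raising
max_bins improves accuracy but, in this sketch, makes the per-row work grow
linearly with it.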

--
Kevin Weiler
IT
IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL 60606 | http://imc-chicago.com/
Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: kevin.wei...@imc-chicago.com

