Hi all,

I have a query that computes percentiles over datasets well in excess of 100,000,000 rows. Because we were routinely running out of memory, I have switched to percentile_approx. I'm having trouble choosing the threshold at which sampling should begin. Before the dataset grew this large, the maximum number of rows I needed to include in a percentile was about 1,000,000. I've tried 1,000,000, 100,000, and the default 10,000 as the sampling threshold. The query previously took about 20 minutes to run; it now takes around 13 hours to complete (with the threshold set to 100,000). Are there Hive settings I should investigate to get this query to complete in a reasonable time?
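For reference, the calls in question have roughly the shape below. This is a sketch and not the actual query: the table name `trades` and the columns `symbol` and `latency` are hypothetical stand-ins. The third argument to percentile_approx is the optional accuracy parameter (B in the Hive documentation), which is the value being varied above.

```sql
-- Hypothetical table and column names, for illustration only.
-- percentile_approx(col, p, B): B is optional and defaults to 10,000;
-- larger values give a better approximation at the cost of more memory.
SELECT
  symbol,
  percentile_approx(CAST(latency AS DOUBLE), 0.95, 100000) AS p95_latency
FROM trades
GROUP BY symbol;
```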
--
Kevin Weiler
IT, IMC Financial Markets
http://imc-chicago.com/