Re: Dataset for Hive

2015-04-01 Thread Chao Sun
Hi Xiaohe, You can try TPC-DS from https://github.com/hortonworks/hive-testbench. It contains a large number of queries with complex joins. Chao On Wed, Apr 1, 2015 at 9:30 PM, xiaohe lan wrote: > Hi All, > > I am new to Hive. Just set up a 5 node Hadoop environment and want to have > a try on H

Re: Dataset for hive

2015-04-01 Thread xiaohe lan
Hi Vivek Veeramani, Actually, I already have that. But with the wiki dataset, I can only do "select *" queries. Thanks, Xiaohe On Thu, Apr 2, 2015 at 1:44 PM, vivek veeramani wrote: > Hi Xiaohe, > > If it's a data set that you're looking for, you can find wikipedia data > dumps @ http://dumps.wi

Re: Dataset for hive

2015-04-01 Thread vivek veeramani
Hi Xiaohe, If it's a data set that you're looking for, you can find Wikipedia data dumps at http://dumps.wikimedia.org/enwiki/. There is also documentation on the dumps at http://meta.wikimedia.org/wiki/Data_dumps. Hope this helps. On Thu, Apr 2, 2015 at 10:56 AM, xiaohe lan wrote: > Hi All, > > I am new

Dataset for hive

2015-04-01 Thread xiaohe lan
Hi All, I am new to Hive. I just set up a 5-node Hadoop environment and want to try HiveQL. Is there any dataset I can download to play with HiveQL? The dataset should have several tables so I can write some complex joins. About 100 GB should be fine. Thanks, Xiaohe

Re: CamelCase using InitCap Function in Hive 0.13

2015-04-01 Thread Alexander Pivovarov
Vivek, You can see the version in two places: 1. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions string initcap(string A): Returns string, with the first letter of each word in uppercase, all other letters in lowercase. Words are delimited
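A rough Python sketch of the behavior the quoted manual entry describes: the first letter of each word in uppercase, all other letters in lowercase. The quoted description is cut off before it names the word delimiter, so splitting on whitespace is an assumption here, and `initcap` below is a hypothetical helper for illustration, not Hive code.

```python
import re


def initcap(text):
    """Uppercase the first letter of each word, lowercase the rest.

    Assumption: a "word" is a maximal run of non-whitespace characters.
    """
    def fix_word(match):
        word = match.group(0)
        return word[0].upper() + word[1:].lower()

    # Rewrite only the words, so the original spacing is preserved.
    return re.sub(r"\S+", fix_word, text)


print(initcap("HELLO wORLD"))
```

This only illustrates the expected output shape for an uppercase string column; it says nothing about how Hive implements the function.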

Re: adding a local jar for UDF test

2015-04-01 Thread Alexander Pivovarov
I can suggest 3 options: 1. You can use a JUnit test to test your UDF (e.g. TestGenericUDFLastDay). 2. You can create a q file and test your UDF via mvn (look at udf_last_day.q) mvn clean install -DskipTests -Phadoop-2 cd itest/qtest mvn test -Dtest=TestCliDriver -Dqfile=udf_last_day.q -Dtest.output.ov

Re: CamelCase using InitCap Function in Hive 0.13

2015-04-01 Thread vivek veeramani
Thank you Sanjiv. I've reached out to our Infrastructure vendor to check what build we're on currently and if we can get the 1.1.0 version. But was just curious to know, is there a way we can see the build version? Thanks, Vivek On Wed, Apr 1, 2015 at 7:14 PM, @Sanjiv Singh wrote: > > Available

adding a local jar for UDF test

2015-04-01 Thread Alex Bohr
Hi, I'm developing a new UDF. I want to be able to test the new jar as I develop without having to copy the Jar up to HDFS every time I change code and recompile the Jar. I'm using Hive CLI for testing, and adding this command: add jar 'file:///home/abohr/test/hive-udf-1.0-SNAPSHOT.jar'; I've als

Re: CamelCase using InitCap Function in Hive 0.13

2015-04-01 Thread @Sanjiv Singh
Available in build 1.1.0 JIRA : https://issues.apache.org/jira/browse/HIVE-3405 Regards Sanjiv Singh Mob : +091 9990-447-339 On Wed, Apr 1, 2015 at 6:24 PM, vivek veeramani wrote: > Hi, > > I'm a relatively new user to

CamelCase using InitCap Function in Hive 0.13

2015-04-01 Thread vivek veeramani
Hi, I'm a relatively new user to Hive and was trying to format a column of String datatype from Uppercase to Camel-case. I could see the INITCAP() function in the language manual, and also could find related notes on JIRA stating it is available. But for some reason when I run my query it shows an

ORC HiveChar, HiveVarchar & HiveDecimal

2015-04-01 Thread Dave Maughan
Hi, I'm attempting to write a RecordReader and RecordWriter for ORC that support reading and writing String for HiveChar and HiveVarchar and BigDecimal for HiveDecimal to simplify usage a little - I need Serializable types. To do this I've gone the ObjectInspector route, e.g.: ... extends JavaHi

Re: change Reduce_Number Of Order_By

2015-04-01 Thread r7raul1...@163.com
Try set mapreduce.job.reduces r7raul1...@163.com From: 郭帅 Sent: 2015-03-27 18:17 To: hiveMailing Subject: change Reduce_Number Of Order_By I have some problems when I insert data with order by. The data is non-uniform. I changed the configuration below: 1. set hive.optimize.sampling.orderb

Re: Adding update/delete to the hive-hcatalog-streaming API

2015-04-01 Thread Elliot West
Hi Alan, Regarding the streaming changes, I've raised an issue and submitted patches here: https://issues.apache.org/jira/browse/HIVE-10165 Thanks - Elliot. On 26 March 2015 at 23:20, Alan Gates wrote: > > > Elliot West > March 26, 2015 at 15:58 > Hi Alan, > > Yes, this is precisely our si

RE: Standard deviation (STDDEV) function calculation in Hive

2015-04-01 Thread Mich Talebzadeh
Hi Lefty, The Hive aggregate function documentation you provided just states: DOUBLE stddev_pop(col): Returns the standard deviation of a numeric column in the group. DOUBLE stddev_samp(col): Returns the unbiased sample standard deviation of a numeric column in the group. There is no mention

Re: Standard deviation (STDDEV) function calculation in Hive

2015-04-01 Thread Gopal Vijayaraghavan
Hi Lefty, ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java: system.registerGenericUDAF("stddev", new GenericUDAFStd()); ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java: system.registerGenericUDAF("stddev_pop", new GenericUDAFStd()); ql/src/java/org/apache/hadoop/hi
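The two definitions discussed in this thread differ only in the divisor, which a small Python sketch outside Hive can show. The registry lines above register both "stddev" and "stddev_pop" with GenericUDAFStd, which suggests plain stddev is the population form. The helper names below mirror the Hive function names but are hypothetical, and the sample values are invented for illustration.

```python
import math
import statistics


def stddev_pop(values):
    """Population standard deviation: divide the squared deviations by n."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def stddev_samp(values):
    """Sample standard deviation: divide the squared deviations by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


# Invented sample data: mean is 5, squared deviations sum to 32.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print("stddev_pop :", stddev_pop(sample))
print("stddev_samp:", stddev_samp(sample))
```

For the same data the sample form is always the larger of the two, and the gap shrinks as the number of rows grows.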