awesome. Can the result of query ( count value) be saved in hive variable(tmp variable)?. I will be running this query using "hive -f <query file name>".
On Thu, Jan 23, 2014 at 3:59 PM, Dima Machlin <[email protected]>wrote: > Counting in your case isn’t practical as the expected count can be > greater than the amount of rows which isn’t possible using the count > function. > > > > What you can do is multiply the amount of rows by the amount of columns. > > This can be done using a different table (like dual if you’d create it) > > Let assume you have the following table : > > Create table temp (num int); > > And it contains values from 1 to N (N rows). > > (You can simply create this table) > > > > Now you can do the following : > > > > select count(distinct case when temp.n=1 then t.colA else t.colB end) > > from table t join temp on (temp.n<=2) > > > > This way your table with be joined with two rows containing the numbers 1 > and 2 > > So in your example this : > > Column A > > COLUMN B > > 1 > > 2 > > 2 > > 3 > > 5 > > 6 > > 4 > > 7 > > 1 > > 2 > > 4 > > 2 > > > > After the join will become : > > Column A > > COLUMN B > > Temp.n > > 1 > > 2 > > 1 > > 2 > > 3 > > 1 > > 5 > > 6 > > 1 > > 4 > > 7 > > 1 > > 1 > > 2 > > 1 > > 4 > > 2 > > 1 > > 1 > > 2 > > 2 > > 2 > > 3 > > 2 > > 5 > > 6 > > 2 > > 4 > > 7 > > 2 > > 1 > > 2 > > 2 > > 4 > > 2 > > 2 > > > > > > > > > > And after the Case you will have : > > Column A > > COLUMN B > > Temp.n > > 1 > > > > 1 > > 2 > > > > 1 > > 5 > > > > 1 > > 4 > > > > 1 > > 1 > > > > 1 > > 4 > > > > 1 > > > > 2 > > 2 > > > > 3 > > 2 > > > > 6 > > 2 > > > > 7 > > 2 > > > > 2 > > 2 > > > > 2 > > 2 > > > > > > > > > > And these columns will be merged into one and count distinct will be > applied. > > > > *From:* Bogala, Chandra Reddy [mailto:[email protected]] > *Sent:* Thursday, January 09, 2014 7:17 AM > *To:* '[email protected]' > *Subject:* RE: merge columns and count no of records > > > > Or Is it good idea to get data into shell variable/file and doing > processing. Or using a pig script to do? > > > > hive -e 'select distinct(columnA), distinct(columnB) from blah' | sed > 's/[\t]/,/g' >/tmp/test > > > > > > Thanks, > > Chandra > > > > > > *From:* Bogala, Chandra Reddy [Tech] > *Sent:* Wednesday, January 08, 2014 5:49 PM > *To:* '[email protected]' > *Subject:* merge columns and count no of records > > > > Hi, > > My requirement is to merge ( not concat ) two columns and count number of > distinct records. I can use self-join on column A and column B and can > count number of records. > > But looks not optimal way of doing. Is there any better way to do. > > > > Ex: Original table > > Column A > > COLUMN B > > 1 > > 2 > > 2 > > 3 > > 5 > > 6 > > 4 > > 7 > > 1 > > 2 > > 4 > > 2 > > > > Logic something like this: Count(Distinct(Merge (distinct(A),distinct(B)))) > > Query OUTPUT should be :7 > > Values {1,2,3,4,5,6,7} > > > > Thanks, > > Chandra > > > > > > > ************************************************************************************ > This footnote confirms that this email message has been scanned by > PineApp Mail-SeCure for the presence of malicious code, vandals & computer > viruses. > > ************************************************************************************ > > = >
