Re: LIKE Statement

2012-05-14 Thread John Omernik
Well, the link provided isn't really about what I originally asked. I have not come across a SQL implementation (Postgres, MySQL, and MSSQL are the ones I have experience with) where LIKE was "by default" case-sensitive with wildcards. That being said, I'm not the type to base my assertions on

Re: LIKE Statement

2012-05-14 Thread Keith Wiley
Thanks for the followup. Keith Wiley kwi...@keithwiley.com keithwiley.com music.keithwiley.com "I used to be with it, but then they changed what it was. Now, what I'm with isn't it, and what's it seems we

Re: LIKE Statement

2012-05-14 Thread Edward Capriolo
By saying "OTHER DATABASES" you are implying that all other databases agree on implementation. I could not find the official SQL spec, but anecdotally it seems this is not the case. http://stackoverflow.com/questions/153944/is-sql-syntax-case-sensitive With Hive you have like and rlike to choose f
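
The difference under discussion, case-sensitive versus case-insensitive wildcard matching, can be sketched in Python. This is only an illustration of the two semantics, not Hive's implementation: `like_to_regex` and `like` are hypothetical helper names, and escape characters in the pattern are deliberately not handled.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern into a regular expression.

    '%' matches any run of characters, '_' matches exactly one character,
    and everything else is matched literally. Escape sequences are not
    supported in this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def like(value: str, pattern: str, case_sensitive: bool = True) -> bool:
    """Return True if the whole value matches the LIKE pattern."""
    flags = re.DOTALL
    if not case_sensitive:
        flags |= re.IGNORECASE
    return re.fullmatch(like_to_regex(pattern), value, flags) is not None


if __name__ == "__main__":
    print(like("Hadoop", "had%"))                        # case-sensitive
    print(like("Hadoop", "had%", case_sensitive=False))  # case-insensitive
```

With a case-sensitive LIKE, a common workaround is to normalize both sides first, for example by comparing `lower(col)` against a lowercase pattern.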

Re: LIKE Statement

2012-05-14 Thread Keith Wiley
On Apr 4, 2012, at 06:40 , John Omernik wrote: > I think the like statement should be changed to be case-insensitive to match > its function in other DBMS Thoughts? Out of curiosity, was there any activity on this issue? I see John's original post in the archives (~5wks ago) with no followup

Re: Is my Use Case possible with Hive?

2012-05-14 Thread Nanda Vijaydev
Hadoop in general does better with fewer large data files than with many small ones. RDBMS-style indexing and run-time optimization are not exactly available in Hadoop/Hive yet. So one suggestion is to combine some of this data, if you can, into fewer tables as you do the Sqoop import. Even if

NaNs and Infinity support in HIVE?

2012-05-14 Thread Sukhendu Chakraborty
Are NaNs and/or Infinity supported in Hive? If so, how should NaN and Infinity values be represented in HDFS files so that Hive interprets them correctly? When I do 'select 1/0 from tab', I get a text value, "Infinity". However, when I enter "Infinity" v in my HDFS file represented by th

Re: Managed vs external tables in hive

2012-05-14 Thread Mark Grover
Ranjith, If the schema of the data changes, when using external tables, you can drop the table and re-create it on the same dataset taking care of the schema changes (hopefully, maintaining backwards compatibility). I think you can still achieve that using alter table commands with managed tabl

Re: Order by Sort by partitioned columns

2012-05-14 Thread Mark Grover
Hi Shin, If you could list the query that failed and the query used to create the tables in question, that would be very helpful. Mark - Original Message - From: "Shin Chan" To: "HIVE User" Sent: Monday, May 14, 2012 2:28:06 AM Subject: Order by Sort by partitioned columns Hi All Ju

Re: Is my Use Case possible with Hive?

2012-05-14 Thread Bhavesh Shah
Thanks, Nitin, for your continuous support. *Here is my data layout; change the queries as needed*: 1) After importing the tables from MS SQL Server, the first basic task I do is *PIVOTING*, as SQL stores the data as name-value pairs. 2) Pivoting results in subset of data, Using th

Re: Is my Use Case possible with Hive?

2012-05-14 Thread Nitin Pawar
Partitioning is mainly used when you want to access the table based on the value of a particular column and don't want to go through the entire table for that operation. This means that if there are a few columns whose values are repeated across the records, you can consider partitioning on them. Oth
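
The idea described above, reading only the slice of a table that matches a column value instead of the whole table, can be sketched in Python. This is an in-memory illustration under simplifying assumptions, not Hive's storage layout; `partition_rows` and `read_partition` are hypothetical names.

```python
from collections import defaultdict


def partition_rows(rows, column):
    """Group rows by the value of one column, one bucket per distinct value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[column]].append(row)
    return dict(partitions)


def read_partition(partitions, wanted):
    """Return the rows for one partition value and how many rows were read.

    Only the matching bucket is touched; rows in other buckets are never
    visited, which is the saving partitioning is meant to give.
    """
    rows = partitions.get(wanted, [])
    return rows, len(rows)


if __name__ == "__main__":
    table = [
        {"country": "IN", "amount": 10},
        {"country": "US", "amount": 20},
        {"country": "IN", "amount": 30},
        {"country": "FR", "amount": 40},
    ]
    by_country = partition_rows(table, "country")
    matched, rows_read = read_partition(by_country, "IN")
    print(matched, rows_read)
```

The benefit depends on the column having relatively few distinct values that queries actually filter on; a column that is nearly unique per row would produce a very large number of tiny partitions.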

Re: Is my Use Case possible with Hive?

2012-05-14 Thread Bhavesh Shah
Hello Nitin, Thanks for the suggestion about partitioning. One thing I forgot to mention before: *I am using indexes on all tables which are used again and again.* But the problem is that after execution I didn't see any difference in performance (before app

Re: Is my Use Case possible with Hive?

2012-05-14 Thread Justin Coffey
You can also have a reduce-side bottleneck if, for example, you are doing distinct counts or aggregating over skewed group sizes (i.e., one aggregation group is much larger than the others). But to know this you really need to look at the stats of your jobs via the JobTracker and even the progress counter output of
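
The skew effect mentioned above can be illustrated with a small Python simulation. It is a sketch under stated assumptions: `reducer_loads` is a hypothetical helper, and `zlib.crc32` stands in for whatever hash the job's partitioner actually uses. The point is only that every record with the same key lands on the same reducer, so one oversized group overloads one reducer.

```python
import zlib
from collections import Counter


def reducer_loads(keys, num_reducers):
    """Count how many records each reducer receives.

    Each key is assigned to a reducer by hashing it and taking the result
    modulo the number of reducers (crc32 is an assumption of this sketch).
    """
    loads = Counter()
    for key in keys:
        reducer = zlib.crc32(key.encode("utf-8")) % num_reducers
        loads[reducer] += 1
    return [loads[i] for i in range(num_reducers)]


if __name__ == "__main__":
    # One "hot" group with 900 records and 100 groups with a single record.
    keys = ["hot"] * 900 + [f"k{i}" for i in range(100)]
    print(reducer_loads(keys, 4))
```

Adding reducers does not help in this situation, because the hot key still maps to a single reducer, which is why the per-task statistics are worth checking.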

Re: Is my Use Case possible with Hive?

2012-05-14 Thread Nitin Pawar
It is definitely possible to increase your performance. I have run queries where more than 10 billion records were involved. If you are doing joins in your queries, you may want to look at the different kinds of joins supported by Hive. If one of your tables is very small in size compared to another tabl
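
One join strategy that suits a small table joined to a much larger one is a map-side (broadcast) hash join, which Hive offers as a map join. The Python sketch below illustrates the general technique only, not Hive's implementation or necessarily the approach this message goes on to recommend; `map_side_join` and the sample column names are hypothetical.

```python
def map_side_join(small_rows, large_rows, key):
    """Inner-join two row collections on `key` without sorting or shuffling.

    The small side is loaded fully into an in-memory lookup table; the large
    side is then streamed once and each row is matched against the lookup.
    """
    lookup = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)

    for row in large_rows:
        for match in lookup.get(row[key], []):
            # Merge the two rows; on a column name clash the large side wins.
            yield {**match, **row}


if __name__ == "__main__":
    small = [{"id": 1, "name": "a"}, {"id": 3, "name": "c"}]
    large = [{"id": 1, "v": 10}, {"id": 2, "v": 20}, {"id": 1, "v": 30}]
    for joined in map_side_join(small, large, "id"):
        print(joined)
```

The approach only pays off when the small side fits comfortably in memory on every task that processes a slice of the large table.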

Re: Is my Use Case possible with Hive?

2012-05-14 Thread Bhavesh Shah
I don't know how many maps and reducers there were, because for some reason my instance got terminated :( One thing I want to know: if we use multiple nodes, what should the count of maps and reducers be? I am confused about that. How do I decide it? Also I want to try

Re: Is my Use Case possible with Hive?

2012-05-14 Thread Nitin Pawar
With a 10-node cluster the performance should improve. How many maps and reducers are being launched? On Mon, May 14, 2012 at 1:18 PM, Bhavesh Shah wrote: > I have near about 1 billion records in my relational database. > Currently locally I am using just one cluster. But I also tried this on >

Re: Is my Use Case possible with Hive?

2012-05-14 Thread Bhavesh Shah
I have about 1 billion records in my relational database. Currently, locally, I am using just one cluster. But I also tried this on Amazon Elastic MapReduce with 10 nodes, and the time taken to execute the complete program is the same as on my single local machine. On Mon, May 14, 2012 at 1:1

Re: Is my Use Case possible with Hive?

2012-05-14 Thread Nitin Pawar
How many records? What is your Hadoop cluster setup? How many nodes? If you are running Hadoop on a single-node setup with a normal desktop, I doubt it will be of any help. You need a stronger cluster setup for better query runtimes, and of course query optimization, which I guess you would have alr

Is my Use Case possible with Hive?

2012-05-14 Thread Bhavesh Shah
Hello all, My Use Case is: 1) I have a relational database which holds a very large amount of data. (MS SQL Server) 2) I want to do analysis on this huge data and generate reports on it after analysis. In this way I have to generate various reports based on different analyses. I tried to implement thi