Re: [VOTE] Hive 2.0 release plan

2016-01-19 Thread Hanish Bansal
Thanks Sergey for the quick response. Please update this group when you cut the RC for 2.0. Regards, Hanish Bansal On 20-Jan-2016 12:21 am, "Sergey Shelukhin" wrote: > Hi. > There are 2 blockers for Hive 2.0 currently. One is about to be committed, > and another is in progress, o

Re: sqoop import --hive-import failing for row_version (datatype: timestamp) column for MS SQL Server

2016-01-19 Thread 董亚军
Hive does not support the data type of the column row_version. You may skip this column or map row_version to a different data type; please refer to: https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_controlling_type_mapping On Wed, Jan 20, 2016 at 1:03 PM, sudeep mishra wrote: > Hi, > > I am

Fwd: sqoop import --hive-import failing for row_version (datatype: timestamp) column for MS SQL Server

2016-01-19 Thread sudeep mishra
Hi, I am trying to import MS SQL Server data into Hive using Sqoop import --hive-import option but it is failing for a column of datatype 'timestamp'. 16/01/20 04:50:53 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Hive does not support the SQL type for c

Re: the `use database` command will change the scheme of target table?

2016-01-19 Thread 董亚军
Thanks Marcin. t1 is created within the temp database, right? That database points to HDFS, so the output directory of the M/R job should be in HDFS? My problem is why the output directory was hosted on the S3 filesystem after I *use prd* database. On Wed, Jan 20, 2016 at 11:52 AM, Marcin Tustin wrote: > That

the `use database` command will change the scheme of target table?

2016-01-19 Thread 董亚军
Hi list, we use HDFS and S3 as Hive filesystems at the same time, and we have run into an issue: *scenario* 1: hive command: use default; create table temp.t1 // the temp database points to HDFS as select c1 from prd.t2; // the database of prd and the table t2 are all points t

Re: the `use database` command will change the scheme of target table?

2016-01-19 Thread Marcin Tustin
That is the expected behaviour. Managed tables are created within the directory of their host database. On Tuesday, 19 January 2016, 董亚军 wrote: > hi list, > > we use the HDFS and S3 as the Hive Filesystem at the same time. here has > an issue: > > > *scenario* 1: > > hive command: > > use defa
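
The rule stated in this reply, that a managed table lives under the directory of its host database, can be illustrated with a minimal Python sketch. The database locations and the helper name `managed_table_location` are hypothetical and made up for illustration; they are not Hive APIs. The sketch covers only the final table location, not any intermediate job output directories.

```python
# Hypothetical database locations; these URIs are invented for illustration only.
DATABASE_LOCATIONS = {
    "default": "hdfs://namenode/user/hive/warehouse",
    "temp": "hdfs://namenode/user/hive/warehouse/temp.db",
    "prd": "s3://example-bucket/warehouse/prd.db",
}


def managed_table_location(table_name: str, current_db: str = "default") -> str:
    """Resolve where a managed table would live (illustrative helper, not a Hive API).

    A qualified name such as "temp.t1" uses its own database; an unqualified
    name falls back to the database selected with `use`.
    """
    db, _, table = table_name.rpartition(".")
    db = db or current_db
    return f"{DATABASE_LOCATIONS[db]}/{table}"


# A qualified table name keeps its host database's directory,
# regardless of which database is currently selected.
qualified = managed_table_location("temp.t1", current_db="prd")

# An unqualified name follows the currently selected database.
unqualified = managed_table_location("t1", current_db="prd")
```

Under these assumptions, `qualified` resolves under the temp database's HDFS directory, while `unqualified` resolves under the prd database's S3 directory.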

Best practices for using Parquet

2016-01-19 Thread Buntu Dev
I'm looking to convert an existing Avro dataset into Parquet and wanted to know if there are any other performance-related properties that I can set, such as compression, block size, etc., to take advantage of Parquet. I could only find the `parquet.compression` property but would be good to know i

Re: ORC files and statistics

2016-01-19 Thread Jörn Franke
Actually it is not that different from traditional relational databases such as Oracle Exadata, which supports storage indexes and recommends avoiding traditional indexes in DWH scenarios, since those are more suitable for OLTP scenarios. > On 19 Jan 2016, at 21:35, Ashok Kumar wrote: > > Thank

RE: ORC files and statistics

2016-01-19 Thread Mich Talebzadeh
Sure, you have to move your mindset to Hive. In general, in an RDBMS an index is a different construct from the statistics kept in histograms for the table and index columns themselves. For example, in Oracle, statistics for a column include: • Minimum value for the column • Maximum

Re: ORC files and statistics

2016-01-19 Thread Ashok Kumar
Thanks Owen, I got a bit confused comparing ORC with what I know about indexes in relational databases. Still need to understand it a bit better. Regards From: Owen O'Malley [mailto:omal...@apache.org] Sent: 19 January 2016 17:57 To: user@hive.apache.org; Ashok Kumar Cc: Jörn Franke Subject: R

Re: [VOTE] Hive 2.0 release plan

2016-01-19 Thread Sergey Shelukhin
Hi. There are 2 blockers for Hive 2.0 currently. One is about to be committed, and another is in progress, or may be pushed out soon. I am planning to cut an RC for Hive 2.0 this week. From: Hanish Bansal <hanish.bansal.agar...@gmail.com> Reply-To: "user@hive.apache.org

Re: [VOTE] Hive 2.0 release plan

2016-01-19 Thread Hanish Bansal
Hi, I would like to know if there is any update on the release plan for Hive 1.3.0 or 2.0.0. On Tue, Dec 1, 2015 at 12:56 AM, Alan Gates wrote: > Hive 2.0 will not be 100% backwards compatible with 1.x. The following > JIRA link shows JIRAs already committed to 2.0 that break compatibility: > > https://i

Re: ORC files and statistics

2016-01-19 Thread Owen O'Malley
On Tue, Jan 19, 2016 at 9:45 AM, Ashok Kumar wrote: > Thank you both. > > So if I have a Hive table of ORC type and it contains 100K rows, there > will be 10 row groups of 10K row each. > Yes > > within each row group there will be min, max, count(distint_value) and sum > for each column withi

Re: ORC files and statistics

2016-01-19 Thread Ashok Kumar
Thank you both. So if I have a Hive table of ORC type and it contains 100K rows, there will be 10 row groups of 10K rows each. Within each row group there will be min, max, count(distinct_value) and sum for each column within that row group. Does count mean the count of distinct values including null oc

Re: ORC files and statistics

2016-01-19 Thread Jörn Franke
Just be aware that you should insert the data sorted at least on the most discriminating column of your where clause > On 19 Jan 2016, at 17:27, Owen O'Malley wrote: > > It has both. Each index has statistics of min, max, count, and sum for each > column in the row group of 10,000 rows. It also

Re: ORC files and statistics

2016-01-19 Thread Owen O'Malley
It has both. Each index has statistics of min, max, count, and sum for each column in the row group of 10,000 rows. It also has the location of the start of each row group, so that the reader can jump straight to the beginning of the row group. The reader takes a SearchArgument (eg. age > 100) tha
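
The row-group statistics described above can be sketched in Python to show why min/max values let a reader skip data. This is an illustrative model only, not the ORC reader's implementation: the names `RowGroupStats`, `build_index` and `groups_to_read` are hypothetical, and the sample data is invented. The sketch assumes `count` is the number of values in the group and handles only a "greater than" predicate such as age > 100.

```python
from dataclasses import dataclass
from typing import List

ROW_GROUP_SIZE = 10_000  # rows per row group, as stated in the thread


@dataclass
class RowGroupStats:
    start_row: int  # where the row group begins, so a reader can jump to it
    minimum: int
    maximum: int
    count: int      # assumption: number of values in the group
    total: int      # sum of the values


def build_index(values: List[int], group_size: int = ROW_GROUP_SIZE) -> List[RowGroupStats]:
    """Compute per-row-group statistics for a single integer column."""
    index = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        index.append(RowGroupStats(start, min(group), max(group), len(group), sum(group)))
    return index


def groups_to_read(index: List[RowGroupStats], lower_bound: int) -> List[int]:
    """Return start rows of groups that may hold a value > lower_bound.

    A group whose maximum is <= lower_bound cannot match and is skipped.
    """
    return [stats.start_row for stats in index if stats.maximum > lower_bound]


# 100K rows of a sorted column with values 0..99, giving 10 row groups.
ages = [row // 1000 for row in range(100_000)]
age_index = build_index(ages)

# Only row groups that can contain ages above 94 need to be read.
candidates = groups_to_read(age_index, 94)
```

Because the sample column is sorted, each row group covers a narrow value range and most groups can be skipped. With unsorted data the min/max ranges of the groups overlap, and far fewer groups can be eliminated.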

how to set a job name for hive queries

2016-01-19 Thread Frank Luo
We are in a multi-tenant environment and want to add a client’s name to each job name so that they can be informed/involved when a job fails. We can easily do that with M/R jobs, but I haven’t figured out a way to do so for Hive jobs. I googled and found the answer below, but I couldn’t get it to wor

ORC files and statistics

2016-01-19 Thread Ashok Kumar
Hi, I have read some notes on ORC files in Hive and indexes. The document describes the indexes but makes reference to statistics. Indexes: ORC provides three levels of indexes within each file: file level - statistics about the values in each c