NPE error during file sink stage when inserting into bucketed table

2015-04-27 Thread Jie Zhang
Hi, I have created a simple bucketed table and would like to insert some values into the table. However, I hit the following NPE during the file sink stage. Any clue what the problem can be? I am using Hive 0.14.0, with hive.enforce.bucketing set to true. Thanks very much for any help! create table test4 (i

RE: How to compare data in two tables?

2015-04-27 Thread Mich Talebzadeh
OK so we have an Oracle sequence as the PK. That sequence is a monotonically increasing number, so each record will have its own sequence value. If you do sum(sequence_col) for each Hive table then the sums should agree. That means no row is missing. Now with regard to the rows to be the same the has
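
The sum check described in this message can be illustrated with a short Python sketch. The helper names and the in-memory key lists are assumptions made for illustration; in practice the keys would come from a query against each Hive table. Equal counts and sums are a quick signal rather than a proof, so the sketch also lists any keys present on one side only.

```python
def key_summary(keys):
    """Return (row_count, sum_of_keys) for one table's primary keys."""
    keys = list(keys)
    return len(keys), sum(keys)


def missing_keys(source_keys, target_keys):
    """Return the sorted keys present in source but absent from target."""
    return sorted(set(source_keys) - set(target_keys))


# Hypothetical key lists standing in for the two Hive tables.
dc1_keys = [1, 2, 3, 4]
dc2_keys = [1, 2, 4]

if key_summary(dc1_keys) != key_summary(dc2_keys):
    print("Tables differ; keys missing in DC2:", missing_keys(dc1_keys, dc2_keys))
```

Comparing the count alongside the sum guards against the case where different sets of keys happen to add up to the same total.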

RE: How to compare data in two tables?

2015-04-27 Thread Alexander Pivovarov
Golden source is Oracle DB. I have two cases: 1. Tables are overwritten completely every day. 2. Tables are incrementally loaded. PK is an auto-incremented number in Oracle. What do you think if I concat all cells of a row to a string. Get int hashcode from the string. And then sum hashcodes to get a
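
The row-hash idea in this message could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names, the \x01 field separator, the NULL placeholder, and the use of CRC32 (rather than a Java-style hashCode) are choices made here, not details from the thread.

```python
import zlib

FIELD_SEP = "\x01"   # assumed separator, so ("ab", "c") and ("a", "bc") hash differently
NULL_TOKEN = "\\N"   # assumed placeholder for NULL cells


def row_hash(row):
    """Concatenate all cells of a row into one string and return an integer hash."""
    joined = FIELD_SEP.join(NULL_TOKEN if v is None else str(v) for v in row)
    return zlib.crc32(joined.encode("utf-8"))


def table_checksum(rows):
    """Return (row_count, sum_of_row_hashes); the sum does not depend on row order."""
    count = 0
    total = 0
    for row in rows:
        count += 1
        total += row_hash(row)
    return count, total


# Hypothetical rows standing in for the same table in two datacenters.
dc1 = [(1, "alice", None), (2, "bob", 3.5)]
dc2 = [(2, "bob", 3.5), (1, "alice", None)]
print(table_checksum(dc1) == table_checksum(dc2))
```

Because addition is commutative, the two tables can be scanned in any order. Matching checksums suggest the tables agree but cannot prove it, since different rows can in principle hash to the same value.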

RE: How to compare data in two tables?

2015-04-27 Thread Mich Talebzadeh
Hi Alex, Am I correct that the source of data resides in a relational table and that table has all the data already (the golden source) sent to both instances of Hive? Is the data in Hive added incrementally daily with “operation timestamp” for each record? Also do you have a unique identif

How to compare data in two tables?

2015-04-27 Thread Alexander Pivovarov
Hi Everyone, Let's say I have a Hive table in 2 datacenters. The table format can be textfile or ORC. There is a Sqoop job running every day which adds data to the table. Each datacenter has its own instance of the Sqoop job. In the ideal case the data in these two tables should be the same. The same mean

Re: ORC file across multiple HDFS blocks

2015-04-27 Thread Alan Gates
No, you don't want to be designing ORC files to not cross block boundaries. Engines in Hadoop (MapReduce, Tez, etc.) are all built to handle the fact that files tend to cross blocks and hence nodes. There is value in lining up stripe size and HDFS block size so that your stripes don't straddl

RE: Hive and Impala

2015-04-27 Thread Mich Talebzadeh
Hi, I agree with Douglas's sentiment here. The main attraction of Hive in general is in the data ingestion process. Hive is great at getting raw data in, in rough and ready format (disregard schema on write and get data stored as is) and making sense of the data later (schema on read, turn raw dat

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Exception while processing

2015-04-27 Thread Mich Talebzadeh
Hi, Has anyone seen this error in the process of loading 1 million rows into Hive table t (no partition or bucket) in batches of 10K rows at a time through a temp table? Loading data to table asehadoop.rs_temp__0x150dd470_t Table asehadoop.rs_temp__0x150dd470_t stats: [numFiles=1, numRows=0,

Re: Hive and Impala

2015-04-27 Thread Moore, Douglas
Hive is great for massive transformations needed in ETL-type processing and full data set analytics. Impala is better suited for fast analytical queries returning a tiny subset of the original data set. Both are improving in terms of concurrency and latency; however, they have a long way to go to

Re: Hive and Impala

2015-04-27 Thread Anilkumar Kalshetti
Hi Ashok, Also, you can now use Spark as the execution engine for Hive. Please check the Hive on Spark (HoS) project. Ref Link. Thanks On 27 April 2015 at 15:22, Fabio C. wrote: > If the comparison mentions just MR, then

Re: Hive and Impala

2015-04-27 Thread Fabio C.
If the comparison mentions just MR, then it is probably outdated. Hive can now run on Tez with a great improvement in performance. However, I don't know about Hive+Tez vs Impala. On Mon, Apr 27, 2015 at 10:50 AM, Nitin Pawar wrote: > What use case are you trying to solve? > > On Mon, Apr 27, 2015 at

Re: Hive and Impala

2015-04-27 Thread Nitin Pawar
What use case are you trying to solve? On Mon, Apr 27, 2015 at 2:16 PM, Ashok Kumar wrote: > Hi gurus, > > Kindly help me understand the advantage that Impala has over Hive. > > I read a note that Impala does not use MapReduce engine and is therefore > very fast for queries compared to Hive. How

Hive and Impala

2015-04-27 Thread Ashok Kumar
Hi gurus, Kindly help me understand the advantage that Impala has over Hive. I read a note that Impala does not use MapReduce engine and is therefore very fast for queries compared to Hive. However, Hive, as I understand, is widely used everywhere! Thank you

Re: Get HIVE INFO & FAQ

2015-04-27 Thread @Sanjiv Singh
??? Regards Sanjiv Singh Mob : +091 9990-447-339 On Mon, Apr 27, 2015 at 11:30 AM, Fun <401299...@qq.com> wrote: > Get HIVE INFO & FAQ