RE: Indexes in Hive

2016-01-05 Thread Mich Talebzadeh
I believe so, Jörn. I am not sure how much it differs from ORC file storage. Cheers, Dr Mich Talebzadeh

Re: Indexes in Hive

2016-01-05 Thread Jörn Franke
If I understand you correctly this could be just another Hive storage format. > On 06 Jan 2016, at 07:24, Mich Talebzadeh wrote: > > Hi, > > Thinking loudly. > > Ideally we should consider a totally columnar storage offering in which each > column of table is stored as compressed value (I disr

Indexes in Hive

2016-01-05 Thread Mich Talebzadeh
Hi, Thinking loudly. Ideally we should consider a totally columnar storage offering in which each column of a table is stored as a compressed value (I disregard for now how ORC actually does this, but obviously it is not exactly columnar storage). So each table can be considered as a loose federati

Re: Is Hive Index officially not recommended?

2016-01-05 Thread Lefty Leverenz
I'd like to revise the Indexing and IndexDev docs in the wiki to include this information (as well as information from a previous thread, if I can find it) so peopl

Re: Is Hive Index officially not recommended?

2016-01-05 Thread Gopal Vijayaraghavan
>So in a nutshell in Hive if "external" indexes are not used for improving >query response, what value they add and can we forget them for now? The built-in indexes, those that write data as smaller tables, are only useful in a pre-columnar world, where the indexes offer a huge reduction in IO. P

RE: Hive on TEZ fails starting

2016-01-05 Thread Artem Ervits
Check if you have conflicting java versions On Jan 5, 2016 5:27 PM, "Mich Talebzadeh" wrote: > Hi Rajesh, > > > > This is what I have under :$HADOOP_COMMON_HOME/lib/native > > > > cd $HADOOP_COMMON_HOME/lib/native > > hduser@rhes564::/home/hduser/hadoop-2.6.0/lib/native> ls -ltr > > total 4936 >

RE: Hive on TEZ fails starting

2016-01-05 Thread Mich Talebzadeh
Hi Rajesh, This is what I have under :$HADOOP_COMMON_HOME/lib/native cd $HADOOP_COMMON_HOME/lib/native hduser@rhes564::/home/hduser/hadoop-2.6.0/lib/native> ls -ltr total 4936 -rwxr-xr-x 1 hduser hadoop 278622 Nov 13 2014 libhdfs.so.0.0.0 -rw-r-xr-x 1 hduser hadoop 440498 Nov 13 201

RE: Is Hive Index officially not recommended?

2016-01-05 Thread Mich Talebzadeh
Thanks, Gopal, for a very valuable insight. So in a nutshell, in Hive, if "external" indexes are not used for improving query response, what value do they add, and can we forget them for now? Regards, Dr Mich Talebzadeh

Re: Is Hive Index officially not recommended?

2016-01-05 Thread Gopal Vijayaraghavan
> I am going to run the same query in Hive. However, I only see a table >scan below and no mention of that index. May be I am missing something >here? Hive Indexes are an incomplete feature, because they are not maintained over an ACID storage & demand FileSystem access to check for validity.

RE: Is Hive Index officially not recommended?

2016-01-05 Thread Mich Talebzadeh
Hi, Your point below: The "traditional" indexes can still make sense for data not in Orc or parquet format. Kindly consider the below, please. A traditional index in an RDBMS is normally a B-tree index with a value for that column and a pointer (Row ID) to the row in the data block that kee

Re: Is Hive Index officially not recommended?

2016-01-05 Thread Ting(Goden) Yao
Yes, we tried mr and it works fine, so it's more likely a Tez issue. Thanks for your comments. On Tue, Jan 5, 2016 at 11:58 AM Jörn Franke wrote: > You can still use execution Engine mr for maintaining the index. Indeed > with the ORC or parquet format there are min/max indexes and bloom filters

Re: Is Hive Index officially not recommended?

2016-01-05 Thread Jörn Franke
By the way, this is not Hive-specific; it also applies to other relational database systems, such as Oracle Exadata. > On 05 Jan 2016, at 20:57, Jörn Franke wrote: > > You can still use execution Engine mr for maintaining the index. Indeed with > the ORC or parquet format there are min/max indexes and bloom

Re: Is Hive Index officially not recommended?

2016-01-05 Thread Jörn Franke
You can still use the mr execution engine for maintaining the index. Indeed, with the ORC or Parquet format there are min/max indexes and bloom filters, but you need to sort your data appropriately to benefit in performance. Alternatively, you can create redundant tables sorted in a different order. T
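
The point about sorting can be illustrated with a small Python sketch. It is not ORC or Parquet code; it only models the idea of per-chunk min/max statistics, and the names `build_minmax_index` and `stripes_to_read` as well as the chunk size are assumptions made for the illustration. With sorted data, the value ranges of the chunks do not overlap, so a point lookup can skip almost every chunk. With shuffled data, most ranges cover the key and little can be skipped.

```python
import random

def build_minmax_index(values, stripe_size):
    """Split values into fixed-size chunks and record (min, max) per chunk."""
    stripes = [values[i:i + stripe_size] for i in range(0, len(values), stripe_size)]
    return [(min(s), max(s)) for s in stripes]

def stripes_to_read(index, key):
    """Count the chunks whose [min, max] range could contain the key."""
    return sum(1 for lo, hi in index if lo <= key <= hi)

random.seed(0)
data = list(range(10_000))          # already sorted
shuffled = data[:]
random.shuffle(shuffled)            # same values, random order

sorted_index = build_minmax_index(data, 1_000)
unsorted_index = build_minmax_index(shuffled, 1_000)

# Sorted input: only one chunk range contains the key.
print("sorted:  ", stripes_to_read(sorted_index, 4_242))
# Shuffled input: usually most chunk ranges contain the key.
print("shuffled:", stripes_to_read(unsorted_index, 4_242))
```

The same reasoning is why a redundant copy of the table sorted on a different column can help queries that filter on that column.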

RE: Is Hive Index officially not recommended?

2016-01-05 Thread Mich Talebzadeh
I don’t think an index on Hive (as a separate entity) adds any value, although you can create one. You can create an ORC table which will have characteristics that can simulate index-like behaviour: CLUSTERED BY (object_id) INTO 256 BUCKETS STORED AS ORC TBLPROPERTIES ( "orc.compress"="SNAP
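
As a rough sketch of why bucketing can behave a little like an index, the Python below assigns rows to 256 buckets by `object_id` and then looks up one id by scanning only its bucket. The modulo function is a simplified stand-in and is not claimed to be Hive's actual hash; the row count and the target id are made up for the illustration.

```python
from collections import defaultdict

NUM_BUCKETS = 256  # mirrors "INTO 256 BUCKETS" in the message above

def bucket_for(object_id, num_buckets=NUM_BUCKETS):
    """Simplified stand-in for the engine's bucketing hash (an assumption)."""
    return object_id % num_buckets

# Distribute hypothetical object ids across the buckets.
buckets = defaultdict(list)
for object_id in range(100_000):
    buckets[bucket_for(object_id)].append(object_id)

# A lookup on object_id only needs to scan the one bucket it maps to.
target = 70_001
candidates = buckets[bucket_for(target)]
print("rows scanned:", len(candidates), "of 100000; found:", target in candidates)
```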

Is Hive Index officially not recommended?

2016-01-05 Thread Ting(Goden) Yao
Hi, We hit an issue when doing Hive testing to rebuild index on Tez. We were told by our Hadoop distro vendor that it's not recommended (or should avoid) using index with Hive. But I don't see an official message on Hive wiki or document

Re: NPE when reading Parquet using Hive on Tez

2016-01-05 Thread Adam Hunt
Hi Gopal, Spark does offer dynamic allocation, but it doesn't always work as advertised. My experience with Tez has been more in line with my expectations. I'll bring up my issues with Spark on that list. I tried your example and got the same NPE. It might be a mapr-hive issue. Thanks for your he

RE: Deleting empty rows from hive table through java

2016-01-05 Thread Mich Talebzadeh
Agreed. Empty rows in any database have no intrinsic value. If we think of ELT, then in theory we need to get the Web data into a Hive table, including empty rows, and then do the clean-up to get rid of them. This is time-consuming, and whatever engine we use it is not going to be efficient.

Re: Hive on TEZ fails starting

2016-01-05 Thread Rajesh Balamohan
Try ' beeline --hiveconf tez.task.launch.env="LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_COMMON_HOME/lib/native" --hiveconf tez.am.launch.env="LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_COMMON_HOME/lib/native" '. Please check if you have the lib*.so available in the native folder (or point it to th

Re: Deleting empty rows from hive table through java

2016-01-05 Thread Vikas Parashar
Well said, Mich. I went through the same scenario, in which we did the ETL outside Hive. Once the transformation was done, we loaded all the data into the Hive warehouse. I think that's the best practice and we should follow it. Regards, Vikas Parashar On Tue, Jan 5, 2016 at 5:02 PM, Mich

RE: Deleting empty rows from hive table through java

2016-01-05 Thread Mich Talebzadeh
It would be interesting to do ETL outside of Hive by getting the data from the webpage into an intermediate file, pruning the empty rows and loading the final CSV file into the Hive destination table. I am pretty sure this clean-up outside of Hive would be faster than doing the same thing in Hive. Dr Mich
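
A minimal sketch of the pruning step described above, assuming the intermediate file is plain CSV and that an "empty row" means a blank line or a row whose fields are all blank. The function name `drop_empty_rows` is hypothetical, and in-memory streams stand in for the real intermediate and final files.

```python
import csv
import io

def drop_empty_rows(src, dst):
    """Copy CSV rows from src to dst, skipping rows whose fields are all blank."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    kept = 0
    for row in reader:
        # A blank line parses as [], and ",," parses as ['', '', ''];
        # both are treated as empty here (an assumption about the data).
        if any(field.strip() for field in row):
            writer.writerow(row)
            kept += 1
    return kept

# In-memory stand-ins for the intermediate file and the final CSV.
raw = io.StringIO("1,a\n\n,,\n2,b\n")
cleaned = io.StringIO()
kept = drop_empty_rows(raw, cleaned)
print(kept)
print(cleaned.getvalue())
```

The cleaned file would then be loaded into the destination table in the usual way, so Hive never sees the empty rows.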

RE: Hive on TEZ fails starting

2016-01-05 Thread Mich Talebzadeh
Hi, I have added the following to the LD_LIBRARY_PATH and JAVA_LIBRARY_PATH export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_COMMON_HOME/lib/native export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:$HADOOP_COMMON_HOME/lib/native Trying to use TEZ, I still get the same error 0: jdbc:hive2:

RE: Deleting empty rows from hive table through java

2016-01-05 Thread Mich Talebzadeh
Hi Sateesh, You can do the clean-up in Hive by creating a staging table in Hive, feeding your CSV data there and then inserting data into main table where COL1 is NOT NULL. Alternatively you can create your Hive table as transactional. Although I would say the staging table is better as
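
The staging-table pattern can be sketched as follows. To keep the example self-contained it uses Python's built-in `sqlite3` module as a stand-in for Hive, so this is not HiveQL and not a Hive connection; only the shape of the statements (load everything into staging, then insert into the main table where `col1` IS NOT NULL) carries over. The table and column names are hypothetical.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for Hive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (col1 TEXT, col2 TEXT)")
conn.execute("CREATE TABLE main_table (col1 TEXT, col2 TEXT)")

# Load the raw feed, empty rows included, into the staging table.
conn.executemany(
    "INSERT INTO staging VALUES (?, ?)",
    [("a", "1"), (None, None), ("b", "2")],
)

# Copy only the non-empty rows into the main table.
conn.execute(
    "INSERT INTO main_table SELECT col1, col2 FROM staging WHERE col1 IS NOT NULL"
)

rows = conn.execute("SELECT col1, col2 FROM main_table ORDER BY col1").fetchall()
print(rows)
```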

Re: Deleting empty rows from hive table through java

2016-01-05 Thread Vikas Parashar
If the data is not huge, then please export it to CSV. You have to do all the transformation on the CSV and point your table at it. Would you mind telling me how you are loading your data into Hive? Regards, Vikas Parashar On Tue, Jan 5, 2016 at 1:46 PM, Sateesh Karuturi < sateesh.karutu...@gmail.com> w

Re: Deleting empty rows from hive table through java

2016-01-05 Thread Sateesh Karuturi
Thank you for your quick response... Directly loading the data from webpage to hive On Tue, Jan 5, 2016 at 1:44 PM, Vikas Parashar wrote: > What is the backend of your table? > Is it csv, orc or anything else! > > > Regards, > Vikas Parashar > > > On Tue, Jan 5, 2016 at 12:28 PM, Sateesh Karutur

Re: Deleting empty rows from hive table through java

2016-01-05 Thread Vikas Parashar
What is the backend of your table? Is it CSV, ORC or anything else? Regards, Vikas Parashar On Tue, Jan 5, 2016 at 12:28 PM, Sateesh Karuturi < sateesh.karutu...@gmail.com> wrote: > Hello... > Anyone please help me how to delete empty rows from hive table through > java? > Thanks in advance >