Re: hive will die or not?

2016-08-07 Thread Marcin Tustin
I think that's right. My testing (not very scientific) puts it on par with Redshift for the datasets I use. On Sunday, August 7, 2016, Edward Capriolo wrote: > A few entities going to "kill/take out/better than hive" > I seem to remember HadoopDb, Impala, RedShift , voltdb... > > But apparent hiv

Re: Crate Non-partitioned table from partitioned table using CREATE TABLE .. LIKE

2016-08-07 Thread Marcin Tustin

Re: Crate Non-partitioned table from partitioned table using CREATE TABLE .. LIKE

2016-08-07 Thread Marcin Tustin

Re: Crate Non-partitioned table from partitioned table using CREATE TABLE .. LIKE

2016-08-07 Thread Marcin Tustin
Will CREATE TABLE sales5 AS SELECT * FROM SALES; not work for you? On Thu, Aug 4, 2016 at 5:05 PM, Nagabhushanam Bheemisetty < nbheemise...@gmail.com> wrote: > Hi I've a scenario where I need to create a table from partitioned table > but my destination table should not be partitioned. I won't be
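The CTAS suggestion above can be sketched as follows (a hedged example; the table names come from the thread, but note that in Hive of this era CTAS always produces a non-partitioned table, which is exactly what the asker wants):

```sql
-- CTAS copies the data but not the partitioning scheme of the source,
-- so sales5 comes out as a plain, non-partitioned managed table.
CREATE TABLE sales5 AS SELECT * FROM sales;
```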

Re: Create table from orc file

2016-08-03 Thread Marcin Tustin
) to reader type > struct (1) (state=,code=0) > > > > So what is wrong with the above? > > > I should mention, that I created the orc files having used using the latest > orc-core lib (1.1.2). That seems not to be the same implementation for orc > files access as being use

Re: Create table from orc file

2016-08-03 Thread Marcin Tustin
Yes. Create an external table whose location contains only the orc file(s) you want to include in the table. On Wed, Aug 3, 2016 at 7:53 AM, Johannes Stamminger < johannes.stammin...@airbus.com> wrote: > Hi, > > > is it possible to write data to an orc file(s) using the hive-orc api and > to > us
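A minimal sketch of the external-table approach described above (the path and columns are illustrative assumptions, not from the original thread):

```sql
-- Point an external table at a directory that contains only the ORC
-- file(s) you want to expose. Hive reads the files in place; dropping
-- the table later leaves the files untouched.
CREATE EXTERNAL TABLE orc_data (
  id BIGINT,
  payload STRING
)
STORED AS ORC
LOCATION '/data/orc_output/';
```

The column types must match the schema the ORC files were written with, or reads will fail with a schema-mismatch error.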

Re: A dedicated Web UI interface for Hive

2016-07-15 Thread Marcin Tustin

Re: A dedicated Web UI interface for Hive

2016-07-14 Thread Marcin Tustin
What do you want it to do? There are at least two web interfaces I can think of. On Thu, Jul 14, 2016 at 6:04 PM, Mich Talebzadeh wrote: > Hi Gopal, > > If I recall you were working on a UI support for Hive. Currently the one > available is the standard Hadoop one on port 8088. > > Do you have a

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Marcin Tustin

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Marcin Tustin
Quick note - my experience (no benchmarks) is that Tez without LLAP (we're still not on hive 2) is faster than MR by some way. I haven't dug into why that might be. On Tue, Jul 12, 2016 at 9:19 AM, Mich Talebzadeh wrote: > sorry I completely miss your points > > I was NOT talking about Exadata.

Re: loading in ORC from big compressed file

2016-06-21 Thread Marcin Tustin
This is because a GZ file is not splittable at all. Basically, try creating this from an uncompressed file, or even better split up the file and put the files in a directory in hdfs/s3/whatever. On Tue, Jun 21, 2016 at 7:45 PM, @Sanjiv Singh wrote: > Hi , > > I have big compressed data file *my_
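The workaround above can be sketched in HiveQL (paths and columns are assumptions): stage the split, uncompressed files in a directory, expose them as an external text table, then rewrite into ORC so downstream reads are splittable and parallel.

```sql
-- External table over the directory of split, uncompressed files.
CREATE EXTERNAL TABLE staging_raw (line STRING)
LOCATION '/data/staging/';

-- Rewrite into ORC; the load now parallelises across the input files
-- instead of being serialised through a single unsplittable .gz reader.
CREATE TABLE my_data STORED AS ORC AS
SELECT line FROM staging_raw;
```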

Where are jars stored for permanent functions

2016-06-08 Thread Marcin Tustin
Hi All, I just added local jars to my hive session, created permanent functions, and find that they are available across sessions and machines. This is of course excellent, but I'm wondering where those jars are being stored? What setting or default directory would I find them in? My session
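One way to sidestep the ambiguity the question raises is to register the function against a jar kept at an explicit HDFS path, so the metastore records exactly where every session fetches it from (a hedged sketch; the function name, class, and path are illustrative assumptions):

```sql
-- The metastore stores this jar URI alongside the function definition,
-- and sessions on any machine download the jar from that HDFS path.
CREATE FUNCTION my_udf AS 'com.example.MyUDF'
USING JAR 'hdfs:///user/hive/udfs/my-udf.jar';
```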

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-30 Thread Marcin Tustin
Mich - it sounds like maybe you should try these benchmarks with alluxio abstracting the storage layer, and see how much it makes a difference. Alluxio should (if I understand it right) provide a lot of the optimisation you're looking for with in memory work. I've never used it, but I would love t

NullPointerException when dropping database backed by S3

2016-05-06 Thread Marcin Tustin
Hi All, I have a database backed by an s3 bucket. When I try to drop that database, I get a NullPointerException: hive> drop database services_csvs cascade; FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.NullPointerException)

Re: Making sqoop import use Spark engine as opposed to MapReduce for Hive

2016-04-30 Thread Marcin Tustin

Re: Making sqoop import use Spark engine as opposed to MapReduce for Hive

2016-04-30 Thread Marcin Tustin
They're not simply interchangeable. sqoop is written to use mapreduce. I actually implemented my own replacement for sqoop-export in spark, which was extremely simple. It wasn't any faster, because the bottleneck was the receiving database. Is your motivation here speed? Or correctness? On Sat,

Re: Hive footprint

2016-04-20 Thread Marcin Tustin

Re: Hive footprint

2016-04-18 Thread Marcin Tustin

Re: Hive footprint

2016-04-18 Thread Marcin Tustin
HBase has a different use case - it's for low-latency querying of big tables. If you combined it with Hive, you might have something nice for certain queries, but I wouldn't think of them as direct competitors. On Mon, Apr 18, 2016 at 6:34 PM, Mich Talebzadeh wrote: > Hi, > > I notice that Impal

Re: De-identification_in Hive

2016-03-19 Thread Marcin Tustin
This is a classic transform-load problem. You'll want to anonymise it once before making it available for analysis. On Thursday, March 17, 2016, Ajay Chander wrote: > Hi Everyone, > > I have a csv.file which has some sensitive data in a particular column > in it. Now I have to create a table in
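The transform-load step described above can be sketched as a one-time anonymising CTAS (a hedged example; the table and column names are illustrative, and it assumes a Hive version that ships the sha2() UDF):

```sql
-- Hash the sensitive column once, on the way in; analysts then query
-- customers_clean and never see the raw values.
CREATE TABLE customers_clean AS
SELECT id,
       sha2(ssn, 256) AS ssn_hash,  -- irreversibly masks the raw value
       city
FROM customers_raw;
```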

Re: Hive alter table concatenate loses data - can parquet help?

2016-03-14 Thread Marcin Tustin
issue. I > can verify it and provide a fix in case of bug. > > Thanks > Prasanth > > On Mar 8, 2016, at 5:52 AM, Marcin Tustin > wrote: > > Hi Mich, > > ddl as below. > > Hi Prasanth, > > Hive version as reported by Hortonworks is 1.2.1.2.3. > >

Re: How to rename a hive table without changing location?

2016-03-12 Thread Marcin Tustin
If you wish to keep it in its current location, consider creating an external table. On Saturday, March 12, 2016, Rex X wrote: > Hi Mich, > > I am doing this, because I need to update an existing big hive table, > which can be stored in any arbitrary customized location on hdfs. But when > we do A
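The external-table suggestion can be sketched like this (the path and columns are assumptions): because Hive never owns an external table's files, the table definition can be dropped and recreated under a new name without the data moving.

```sql
-- A new name over the same files; DROP TABLE on an external table
-- removes only the metadata, never the underlying data.
CREATE EXTERNAL TABLE new_name (id BIGINT, value STRING)
STORED AS ORC
LOCATION '/data/existing/table/location/';
```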

Re: Hive alter table concatenate loses data - can parquet help?

2016-03-08 Thread Marcin Tustin
: > Hi > > can you please provide DDL for this table "show create table " > > Dr Mich Talebzadeh

Re: Hive 2 insert error

2016-03-07 Thread Marcin Tustin
I believe updates and deletes have always had this constraint. It's at least hinted at by: https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-ConfigurationValuestoSetforINSERT,UPDATE,DELETE On Mon, Mar 7, 2016 at 7:46 PM, Mich Talebzadeh wrote: > Hi, > > I notice
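For reference, the configuration values that wiki page describes look like this (a sketch; exact values depend on your deployment, and `hive.enforce.bucketing` applies to pre-2.0 releases):

```sql
-- Client/session settings required before INSERT/UPDATE/DELETE work
-- on transactional (ACID) tables.
SET hive.support.concurrency = true;
SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.enforce.bucketing = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
```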

Hive alter table concatenate loses data - can parquet help?

2016-03-07 Thread Marcin Tustin
Hi All, Following on from our parquet vs orc discussion, today I observed hive's alter table ... concatenate command remove rows from an ORC formatted table. 1. Has anyone else observed this (fuller description below)? And 2. How do parquet users handle the file fragmentation issue? Desc

Re: Updating column in table throws error

2016-03-06 Thread Marcin Tustin
Don't bucket on columns you expect to update. Potentially you could delete the whole row and reinsert it. On Sunday, March 6, 2016, Ashok Kumar wrote: > Hi gurus, > > I have an ORC table bucketed on invoicenumber with "transactional"="true" > > I am trying to update invoicenumber column used fo
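The delete-and-reinsert workaround can be sketched as follows (a hedged example; the table, column, and values are illustrative, and both statements require a transactional table):

```sql
-- Hive forbids UPDATE on a bucketing column, so replace the row instead:
-- remove the old row, then insert the corrected one.
DELETE FROM invoices WHERE invoicenumber = 1001;
INSERT INTO invoices VALUES (1002, 'corrected row data');
```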

Re: Parquet versus ORC

2016-03-06 Thread Marcin Tustin
If you google, you'll find benchmarks showing each to be faster than the other. In so far as there's any reality to which is faster in any given comparison, it seems to be a result of each incorporating ideas from the other, or at least going through development cycles to beat each other. ORC is v

Data corruption/loss in hive

2016-01-22 Thread Marcin Tustin
Hi All, I'm seeing some data loss/corruption in hive. This isn't HDFS-level corruption - hdfs reports that the files and blocks are healthy. I'm using managed ORC tables. Normally we write once an hour to each table, with occasional concatenations through hive. We perform the writing using spark

Re: the `use database` command will change the scheme of target table?

2016-01-19 Thread Marcin Tustin
That is the expected behaviour. Managed tables are created within the directory of their host database. On Tuesday, 19 January 2016, 董亚军 wrote: > hi list, > > we use the HDFS and S3 as the Hive Filesystem at the same time. here has > an issue: > > > *scenario* 1: > > hive command: > > use defa
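The behaviour described above can be checked with a small sketch (the database name and bucket are illustrative assumptions): a managed table lands under its database's directory, whichever filesystem that directory lives on.

```sql
-- A database rooted on S3; managed tables created while it is the
-- current database inherit that location.
CREATE DATABASE s3_db LOCATION 's3a://my-bucket/warehouse/s3_db';
USE s3_db;
CREATE TABLE t (id INT);   -- stored under the s3_db location, not HDFS
DESCRIBE FORMATTED t;      -- the Location: field shows the s3a path
```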

Re: eiquivalent to identity column in Hive

2016-01-16 Thread Marcin Tustin
See this: http://stackoverflow.com/questions/23082763/need-to-add-auto-increment-column-in-a-table-using-hive On Sat, Jan 16, 2016 at 11:52 AM, Ashok Kumar wrote: > Hi, > > Is there an equivalent to Microsoft IDENTITY column in Hive please. > > Thanks and regards > -- Want to work at Handy? C
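The usual workaround from that answer, sketched with illustrative table names: Hive has no true IDENTITY column, but `row_number()` can synthesise a sequential id at insert time.

```sql
-- Assign ids 1..N as rows are written; note the numbering is per-query,
-- not a persistent sequence, so appending later needs an offset.
INSERT INTO target
SELECT row_number() OVER () AS id, name
FROM source;
```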

Re: Loading data containing newlines

2016-01-15 Thread Marcin Tustin

Re: Loading data containing newlines

2016-01-15 Thread Marcin Tustin
I second this. I've generally found anything else to be disappointing when working with data which is at all funky. On Wed, Jan 13, 2016 at 8:13 PM, Alexander Pivovarov wrote: > Time to use Spark and Spark-Sql in addition to Hive? > It's probably going to happen sooner or later anyway. > > I sen

Re: foreign keys in Hive

2016-01-10 Thread Marcin Tustin
You can join on any equality criterion, just like in any other relational database. Foreign keys in "standard" relational databases are primarily an integrity constraint. Hive in general lacks integrity constraints. On Sun, Jan 10, 2016 at 9:45 AM, Ashok Kumar wrote: > hi, > > what is the equiva
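An illustrative join (the table names are assumptions): equality joins work exactly as in other SQL databases; it is only the *enforcement* of referential integrity that Hive lacks, so orphaned rows simply drop out of (or survive in) the result depending on the join type.

```sql
-- A plain equality join; nothing stops orders.customer_id from
-- referencing a customer that does not exist.
SELECT o.id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;
```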

Re: Running the same query on 1 billion rows fact table in Hive on Spark compared to Sybase IQ columnar database

2015-12-30 Thread Marcin Tustin
Yes, that's why I haven't had to compile anything. On Wed, Dec 30, 2015 at 4:16 PM, Jörn Franke wrote: > Hdp Should have TEZ already on-Board bye default. > > On 30 Dec 2015, at 21:42, Marcin Tustin wrote: > > I'm afraid I use the HDP distribution so I haven't

Re: Running the same query on 1 billion rows fact table in Hive on Spark compared to Sybase IQ columnar database

2015-12-30 Thread Marcin Tustin

Re: Running the same query on 1 billion rows fact table in Hive on Spark compared to Sybase IQ columnar database

2015-12-30 Thread Marcin Tustin
I'm using TEZ 0.7.0.2.3 with hive 1.2.1.2.3. I can confirm that TEZ is much faster than MR in pretty much all cases. Also, with hive, you'll want to make sure you've performed optimizations like aligning ORC stripe sizes with HDFS block sizes, and concatenating your tables (not so much an optimization as a
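The tuning mentioned above can be sketched like this (a hedged example; the table, columns, and sizes are illustrative):

```sql
-- Create ORC tables with a stripe size matching the HDFS block size
-- (here 256 MB), so a stripe never straddles a block boundary.
CREATE TABLE events (id BIGINT, payload STRING)
STORED AS ORC
TBLPROPERTIES ('orc.stripe.size' = '268435456');

-- Periodically merge the small files that hourly writes accumulate.
ALTER TABLE events CONCATENATE;
```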

Importing into a hive database with minimal unavailability or renaming a database

2015-12-18 Thread Marcin Tustin
Hi All, We import our production database into hive on a schedule using sqoop. Unfortunately, sqoop won't update the table schema in hive when the table schema has changed in the source database. Accordingly, to get updates to the table schema we drop the hive table first. Unfortunately, this ca