Re: Question on Apache Hive + AWS Glue Data Catalog

2025-04-28 Thread David Novogrodsky
Unsubscribe David Novogrodsky david.novogrod...@gmail.com http://www.linkedin.com/in/davidnovogrodsky On Fri, Apr 25, 2025 at 9:34 AM Sungwoo Park wrote: > Hello, > > I am wondering if anyone uses Apache Hive 3 or 4 with AWS Glue Data > Catalog. There is a git repository for

Re: Hive 4.0.0 on MR3 released

2024-07-30 Thread David Engel
Congratulations, Sungwoo. I look forward to trying out Hive 4.0.0 on MR3 when my time allows. David On Tue, Jul 30, 2024 at 11:29:02PM +0900, Sungwoo Park wrote: > Hi all, > > We would like to announce the release of Hive 4.0.0 on MR3. It is based on > Hive 4.0.0 (together with 17

Re: Hive 3 has big performance improvement from my test

2023-01-07 Thread David
I spent some time over the past couple of years making micro optimizations within Avro, Parquet, ORC. Curious to know if there's a way for you all to get timings at different levels of the stack to compare and not just look at the top-line numbers. A further breakdown could also help identify area

Re: Does Hive support data encryption?

2021-03-02 Thread David
Not directly. It relies on the underlying storage layer. For example: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html On Tue, Mar 2, 2021 at 6:34 AM qq <987626...@qq.com> wrote: > Hello: > > Does Hive support data encryption? > > Thank

Re: [EXTERNAL] Re: Any plan for new hive 3 or 4 release?

2021-02-27 Thread David
Hello, My hope has been that Hive 4.x would be built on Java 11. However, I've hit many stumbling blocks over the past year towards this goal. I've been able to make some progress, but several things are still stuck. It mostly stems from the fact that hive has many big-ticket dependencies like

Re: Hive Avro: Directly use of embedded Avro Scheme

2020-10-31 Thread David
> you can easily create a new version. > Is this the idea ? > > Br, > Dennis > -- > *From:* David > *Sent:* Saturday, 31 October 2020 14:52:04 > *To:* user@hive.apache.org > *Subject:* Re: Hive Avro: Directly use of embedded Avro Scheme

Re: Hive Avro: Directly use of embedded Avro Scheme

2020-10-31 Thread David
What would your expectation be? That Hive reads the first file it finds and uses that schema in the table definition? What if the table is empty and a user attempts an INSERT? What should be the behavior? The real power of Avro is not so much that the schema can exist (optionally) in the file i

Re: Removing Hive-on-Spark

2020-07-27 Thread David
Hello Stephen, Thanks for your interest. Can you please elaborate a bit more on your question? Thanks. On Mon, Jul 27, 2020 at 4:11 PM Stephen Boesch wrote: > Why would it be this way instead of the other way around? > > On Mon, 27 Jul 2020 at 12:27, David wrote: > >>

Removing Hive-on-Spark

2020-07-27 Thread David
Hello Hive Users. I am interested in gathering some feedback on the adoption of Hive-on-Spark. Does anyone care to volunteer their usage information and would you be open to removing it in favor of Hive-on-Tez in subsequent releases of Hive? If you are on MapReduce still, would you be open to mi

Re: compaction issue: Compaction cannot compact above this txnid

2020-06-02 Thread David Morin
Yes Peter, we're working on it. We are trying to make compaction work automatically. Otherwise, we'll use crontab. Thanks for your help David On Tue, 2 Jun 2020 at 14:48, Peter Vary wrote: > Hi David, > > Maybe this can help: > https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.5/bk_da

Re: compaction issue: Compaction cannot compact above this txnid

2020-06-02 Thread David Morin
. 2 Jun 2020 at 12:57, Peter Vary wrote: > Hi David, > > You do not really need to run compaction every time. > Is it possible to wait for the compaction to start automatically next time? > > Thanks, > Peter > > On Jun 2, 2020, at 12:51, David Morin wrote: > > Th

Re: compaction issue: Compaction cannot compact above this txnid

2020-06-02 Thread David Morin
Thanks Peter, Is there any workaround on HDP 2.6.x with Hive 2? Otherwise, the only way is to reduce the time these "merge" queries take in order to cancel locks and related transactions. Am I right? On Tue, 2 Jun 2020 at 11:52, Peter Vary wrote: > Hi David, > > I think this

Re: compaction issue: Compaction cannot compact above this txnid

2020-06-02 Thread David Morin
paction for the current database/table On 2020/06/01 20:13:08, David Morin wrote: > Hi, > > I have a compaction issue on my cluster. When I force a compaction (major) on > one table I get this error in Metastore logs: > > 2020-06-01 19:49:35,512 ERROR [-78]: compactor.Com

compaction issue: Compaction cannot compact above this txnid

2020-06-01 Thread David Morin
Hi, I have a compaction issue on my cluster. When I force a compaction (major) on one table I get this error in Metastore logs: 2020-06-01 19:49:35,512 ERROR [-78]: compactor.CompactorMR (CompactorMR.java:run(264)) - No delta files or original files found to compact in hdfs://...hive/wareh

Re: Jira Doc Access

2020-04-14 Thread David Mollitor
Thanks Ashutosh! On Mon, Apr 13, 2020 at 12:27 PM Ashutosh Chauhan wrote: > Hi David, > Added you to Hive wiki. > Thanks, > Ashutosh > > On Mon, Apr 13, 2020 at 6:39 AM David Mollitor wrote: > >> Hello Team, >> >> Is anyone able to grant me ac

Jira Doc Access

2020-04-13 Thread David Mollitor
Hello Team, Is anyone able to grant me access to the Apache Hive Wiki (dmollitor) ? Also, is there any discussion/interest in moving docs into the git repo? Thanks!

Re: Hive Config for ignoring Glacier Object while querying

2020-03-03 Thread David Lavati
Hi Anup, I'm not that familiar yet with Hive's S3/Glacier-related capabilities, but a quick search in both the code base and our jira project returned with nothing in relation to glacier. Regards, David On Tue, Mar 3, 2020 at 7:48 AM Anup Tiwari wrote: > Hi Team, > > It wil

Re: ORC: duplicate record - rowid meaning ?

2020-02-25 Thread David Morin
Hi Peter, Just an update on my issue: the problem is fixed. It was caused by my application resetting the rowid. The default batch size of my VectorizedRowBatch (ORC) is 1024, and whenever the batch was reset, the rowid was reset as well. It now works as expected. Thanks David
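
[Editor's sketch] The failure mode described in the message above can be illustrated with a small simulation. This is plain Python, not the ORC Java API, and it is not the poster's code: the function name `assign_row_ids` and the flag `reset_counter_with_batch` are hypothetical, and the only figure taken from the message is the batch size of 1024. It assumes the writer hands out one row id per row and starts a new batch once the current one is full.

```python
BATCH_SIZE = 1024  # default VectorizedRowBatch size quoted in the message


def assign_row_ids(num_rows, reset_counter_with_batch):
    """Return the row ids handed out while filling fixed-size batches.

    Hypothetical helper: it only simulates the counter logic and
    does not touch any real ORC writer.
    """
    row_ids = []
    next_row_id = 0
    rows_in_batch = 0
    for _ in range(num_rows):
        if rows_in_batch == BATCH_SIZE:
            # The batch is full: flush it and start a new one.
            rows_in_batch = 0
            if reset_counter_with_batch:
                # The bug: the row id counter restarts with the batch.
                next_row_id = 0
        row_ids.append(next_row_id)
        next_row_id += 1
        rows_in_batch += 1
    return row_ids


buggy = assign_row_ids(2050, reset_counter_with_batch=True)
fixed = assign_row_ids(2050, reset_counter_with_batch=False)

# Count how many row ids were handed out more than once in each mode.
print("duplicate ids with reset:", len(buggy) - len(set(buggy)))
print("duplicate ids without reset:", len(fixed) - len(set(fixed)))
```

If the counter lives outside the batch lifecycle, every row written within one transaction and bucket gets a distinct row id, which appears to be what the fix in the message amounts to.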

Re: Query Failures

2020-02-14 Thread David Mollitor
https://community.cloudera.com/t5/Support-Questions/Map-and-Reduce-Error-Java-heap-space/td-p/45874 On Fri, Feb 14, 2020, 6:58 PM David Mollitor wrote: > Hive has many optimizations. One is that it will load the data directly > from storage (HDFS) if it's a trivial query.

Re: Query Failures

2020-02-14 Thread David Mollitor
Hive has many optimizations. One is that it will load the data directly from storage (HDFS) if it's a trivial query. For example: Select * from table limit 10; In natural language it says "give me any ten rows (if available) from the table." You don't need the overhead of launching a full mapr

Re: ORC: duplicate record - rowid meaning ?

2020-02-06 Thread David Morin
ok, Peter No problem. Thx I'll keep you in touch On 2020/02/06 09:42:39, Peter Vary wrote: > Hi David, > > I more familiar with ACID v2 :( > What I would do is to run an update operation with your version of Hive and > try to see how it handles this case. > > Would

Re: ORC: duplicate record - rowid meaning ?

2020-02-05 Thread David Morin
seems to be good. Probably a problem in the sort but I follow the rule that data are ordered by originalTransaction,bucketId,rowId ascendingly and currentTransaction descendingly. It works pretty well except for some tables with lot of updates. The only thing I can see at the moment it is the fact t

Re: ORC: duplicate record - rowid meaning ?

2020-02-05 Thread David Morin
73_0199073_ hdfs:///delta_0199073_0199073_0002 And the first one contains updates (operation:1) and the second one, inserts (operation:0) Thanks for your help David On 2019/12/01 16:57:08, David Morin wrote: > Hi Peter, > > At the moment I have a pipeline based on Flink to wri

Re: Why Hive uses MetaStore?

2020-01-15 Thread David Mollitor
In the beginning, hive was a command line tool. All the heavy lifting happened on the user's local box. If a user wanted to execute hive from their laptop, or a server, it always needs access to the list of available tables (and their schemas and their locations), otherwise every SQL script would

Re: Alternatives to Streaming Mutation API in Hive 3.x

2020-01-13 Thread David Mollitor
Hello, Streaming? NiFi Upserts? HBase, Kudu, Hive 3.x Doing upserts on Hive can be cumbersome, depending on the use case. If Upserts are being submitted continuously and quickly, it can overwhelm the system because it will require a scan across the data set (for all intents and purposes) for ea

compaction and stripes size

2019-12-08 Thread David Morin
example with Orc files composed on small stripes, if I perform a major compaction, can I expect to get new Orc files with bigger Stripes size ? Thanks in advance David

Re: ORC: duplicate record - rowid meaning ?

2019-12-01 Thread David Morin
tate to store Hive metadata (originalTransaction, bucket, rowId, ..) Thanks for your reply because yes, when files are ordered by originalTransacion, bucket, rowId it works ! I just have to use 1 transaction instead of 2 at the moment and it will be ok. Thanks David On 2019/11/29 11:18:05, Pe

Re: ORC: duplicate record - rowid meaning ?

2019-11-19 Thread David Morin
tid":3,"rowid":0} | *5218* | | {"transactionid":11365,"bucketid":3,"rowid":1} | *5216* | | {"transactionid":11369,"bucketid":3,"rowid":1} | *5216* | | {"transactionid":11369,"bucketid":

ORC: duplicate record - rowid meaning ?

2019-11-18 Thread David Morin
he new transaction during the second INSERT but that seems to generate duplicate records. Regards, David

Re: What is the Hive HA processing mechanism?

2019-11-15 Thread David Mollitor
Hello, Not sure if this answers your question, but please note the following: Processing occurs via MapReduce, Spark, or Tez. The processing engines run on top of YARN. Each processing engine derives much of its HA from YARN. There are some quirks there, but these engines running on YARN is

Re: INSERT OVERWRITE Failure Saftey

2019-11-06 Thread David M
previous data is deleted and the new data is renamed/moved. Something to watch out for: if the query returns no rows, then the old data isn’t removed. Thanks Shawn From: David M Reply-To: "user@hive.apache.org" Date: Wednesday, November 6, 2019 at 3:27 PM To: "user@hi

INSERT OVERWRITE Failure Saftey

2019-11-06 Thread David M
me a definitive answer on this? Pointers to the source code or documentation that explains this would be even better. Thanks! David McGinnis

Re: Locks with ACID: need some clarifications

2019-09-09 Thread David Morin
> > Alan. > > On Mon, Sep 9, 2019 at 10:55 AM David Morin > wrote: > >> Thanks Alan, >> >> When you say "you just can't have two simultaneous deletes in the same >> partition", simultaneous means for the same transaction ? >> If a create 2 &q

Re: Locks with ACID: need some clarifications

2019-09-09 Thread David Morin
is changes in Hive 3, where update and delete also take shared locks and > a first committer wins strategy is employed instead. > > Alan. > > On Mon, Sep 9, 2019 at 8:29 AM David Morin > wrote: > >> Hello, >> >> I use in production HDP 2.6.5 with Hive 2.1.0 &g

Locks with ACID: need some clarifications

2019-09-09 Thread David Morin
and 2 for Insert (except original transaction = current) 4. commit transactions Can we use Shared lock here ? Thus select queries can still be used Thanks David

Re: Hive Major Compaction fails (cleaning step)

2019-08-26 Thread David Morin
elong to hive. Weird, isn't it? So this is a workaround, but a rather crude one. I'm open to any more suitable solution. On Mon, 26 Aug 2019 at 08:51, David Morin wrote: > Sorry, the same link in English: > http://www.adaltas.com/en/2019/07/25/hive-3-features-tips-tricks/ >

Re: Hive Major Compaction fails (cleaning step)

2019-08-25 Thread David Morin
Sorry, the same link in English: http://www.adaltas.com/en/2019/07/25/hive-3-features-tips-tricks/ On Mon, 26 Aug 2019 at 08:35, David Morin wrote: > Here is a link related to Hive 3: > http://www.adaltas.com/fr/2019/07/25/hive-3-fonctionnalites-conseils-astuces/ > The author sug

Re: Hive Major Compaction fails (cleaning step)

2019-08-25 Thread David Morin
Aug 2019 at 07:51, David Morin wrote: > Hello, > I've been trying "ALTER TABLE (table_name) COMPACT 'MAJOR'" on my Hive 2 > environment, but it always fails (HDP 2.6.5 precisely). It seems that the > merged base file is created but the delta is not delet

Hive Major Compaction fails (cleaning step)

2019-08-25 Thread David Morin
Hello, I've been trying "ALTER TABLE (table_name) COMPACT 'MAJOR'" on my Hive 2 environment, but it always fails (HDP 2.6.5 precisely). It seems that the merged base file is created but the delta is not deleted. I found that it was because the HiveMetastore Client can't connect to the metastore bec

S3 with Tez Performance Issues?

2019-07-01 Thread David M
based on the number of files, but only if the files are located in S3. Can someone confirm this? If this is the case, is there a JIRA tracking a fix, or documentation on why this has to be this way? If not, how can I make sure we use more mappers in cases like above? Thanks! David McGinnis

Re: Creating temp tables in select statements

2019-03-28 Thread David Lavati
ble insertion you can use a syntax somewhat similar to VALUES https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingvaluesintotablesfromSQL Kind Regards, David On Wed, Mar 27, 2019 at 12:40 AM Mainak Ghosh wrote: > Hello, > > We want to create temp

Re: How to update Hive ACID tables in Flink

2019-03-12 Thread David Morin
; > Alan. > > On Tue, Mar 12, 2019 at 12:24 PM David Morin > wrote: > >> Thanks Alan. >> Yes, the problem is fact was that this streaming API does not handle >> update and delete. >> I've used native Orc files and the next step I've planned to do

Re: How to update Hive ACID tables in Flink

2019-03-12 Thread David Morin
s designed for this case, though it only handles insert (not update), > so if you need updates you'd have to do the merge as you are currently > doing. > > Alan. > > On Mon, Mar 11, 2019 at 2:09 PM David Morin > wrote: > >> Hello, >> >> I've just

How to update Hive ACID tables in Flink

2019-03-11 Thread David Morin
irectories that contain these delta Orc files. Then, MERGE INTO queries are executed periodically to merge data into the Hive target table. It works pretty well but we want to avoid the use of these Merge queries. How can I update Orc files directly from my Flink job ? Thanks, David

Re: Read Hive ACID tables in Spark or Pig

2019-03-11 Thread David Morin
these queries, I have to get the valid transaction for each table from Hive Metastore and, then, read all related files. Is that correct? Thanks, David On Sun, 10 Mar 2019 at 01:45, Nicolas Paris wrote: > Thanks Alan for the clarifications. > > Hive has made such improvements

Orc files in hdf: NullPointerException (RunLengthIntegerReaderV2)

2019-02-11 Thread David Morin
Hello, I am facing an error when I try to read my ORC files from Hive (external table), from Pig, or with hive --orcfiledump .. These files are generated with Flink using the ORC Java API with vectorized columns. If I create these files locally (/tmp/...) and push them to HDFS, then I can read the content of

RE: Wiki Write Access

2019-02-10 Thread David M
I realized I mistyped my username My confluence username is mcginnisda. Please give me write access to the Hive confluence wiki, or tell me where I need to request it. Thanks! From: David M Sent: Thursday, February 7, 2019 10:38 AM To: user@hive.apache.org Subject: Wiki Write Access All

Wiki Write Access

2019-02-07 Thread David M
All, I'd like to get wiki write access for the Apache Hive wiki, so I can update some documentation based on a recent patch. My confluence name is mcginnda. Thanks! David McGinnis

Re: Roaring Bitmap UDFs

2017-12-08 Thread David Capwell
Think bloom filter that's more dynamic. It works well when cardinality is low, but grows quickly to out cost bloom filter as cardinality grows. This data structure supports existence queries, but your email sounds like you want count. If so not really the best fit. On Dec 8, 2017 5:00 PM, "Niti

ORC tables failing after upgrading from 0.14 to 2.1.1

2017-05-05 Thread David Capwell
Our schema is nested with top level having 5 struct types. When we try to query these structs we get the following back *ORC does not support type conversion from file type string (1) to reader type array (1)* Walking through hive in a debugger I see that schema evolution sees the correct file t

Re: Network throughput from HiveServer2 to JDBC client too low

2016-06-21 Thread David Nies
its own should work. Is this an ORC table? > > What version of Hive are you using? Kindly find the answer to these questions in my first eMail :) > > HTH -David > > > > > > Dr Mich Talebzadeh > > LinkedIn > https://www.linkedin.com/profile/view?

Re: Network throughput from HiveServer2 to JDBC client too low

2016-06-21 Thread David Nies
se codepaths as part of > the joint effort with the ODBC driver teams. I’ll see what I can do. I can’t restart the server at will though, since other teams are using it as well. > > Cheers, > Gopal > Thank you :) -David

Re: Network throughput from HiveServer2 to JDBC client too low

2016-06-20 Thread David Nies
In my test case below, I’m using `beeline` as the Java application receiving the JDBC stream. As I understand, this is the reference command line interface to Hive. Are you saying that the reference command line interface is not efficiently implemented? :) -David Nies > On 20.06.2016 at 17

Network throughput from HiveServer2 to JDBC client too low

2016-06-20 Thread David Nies
rease network throughput? Thank you in advance! Yours David Nies Entwickler Business Intelligence ADITION technologies AG Oststraße 55, D-40211 Düsseldorf Schwarzwaldstraße 78b, D-79117 Freiburg im Breisgau T +49 211 987400 30 F +49 211 987400 33 E david.n...@adition.com <mailto:david.

RE: Creating a Hive table through Spark and potential locking issue (a bug)

2016-06-08 Thread David Newberger
Could you be looking at 2 jobs trying to use the same file and one getting to it before the other and finally removing it? David Newberger From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com] Sent: Wednesday, June 8, 2016 1:33 PM To: user; user @spark Subject: Creating a Hive table through

Re: Hive Metadata tables of a schema

2016-04-05 Thread David Morel
Better use HCatalog for this. David On 5 Apr 2016 10:14, "Mich Talebzadeh" wrote: > So you want to interrogate Hive metastore and get information about > objects for a given schema/database in Hive. > > These info are kept in Hive metastore database running on an RDBV

Re: read-only mode for hive

2016-03-09 Thread David Capwell
Could always set the tables output format to be the null output format On Mar 8, 2016 11:01 PM, "Jörn Franke" wrote: > What is the use case? You can try security solutions such as Ranger or > Sentry. > > As already mentioned another alternative could be a view. > > > On 08 Mar 2016, at 21:09, PG

Re: ORC NPE while writing stats

2015-09-03 Thread David Capwell
Thanks, that should help moving forward On Sep 3, 2015 10:38 AM, "Prasanth Jayachandran" < pjayachand...@hortonworks.com> wrote: > > > On Sep 2, 2015, at 10:57 PM, David Capwell wrote: > > > > So, very quickly looked at the JIRA and I had the following questi

Re: ORC NPE while writing stats

2015-09-02 Thread David Capwell
ue is that estimateStripeSize won't always give the correct value since my thread is the one calling it... With everything ThreadLocal, the only writers would be the ones in the same thread, so should be better. On Wed, Sep 2, 2015 at 9:47 PM, David Capwell wrote: > Walking the MemoryMan

Re: ORC NPE while writing stats

2015-09-02 Thread David Capwell
anything for me, so no issue sharding and not configuring? Thanks for your time reading this email! On Wed, Sep 2, 2015 at 8:57 PM, David Capwell wrote: > So, very quickly looked at the JIRA and I had the following question; > if you have a pool per thread rather than global, then assum

Re: ORC NPE while writing stats

2015-09-02 Thread David Capwell
n Wed, Sep 2, 2015 at 7:34 PM, David Capwell wrote: > Thanks for the jira, will see if that works for us. > > On Sep 2, 2015 7:11 PM, "Prasanth Jayachandran" > wrote: >> >> Memory manager is made thread local >> https://issues.apache.org/jira/browse/HIVE-1019

Re: ORC NPE while writing stats

2015-09-02 Thread David Capwell
-10191 and see if that helps? > > On Sep 2, 2015, at 8:58 PM, David Capwell wrote: > > I'll try that out and see if it goes away (not seen this in the past 24 > hours, no code change). > > Doing this now means that I can't share the memory, so will prob go with a > th

Re: ORC NPE while writing stats

2015-09-02 Thread David Capwell
a synchronization on the > MemoryManager somewhere and thus be getting a race condition. > > Thanks, >Owen > > On Wed, Sep 2, 2015 at 12:57 PM, David Capwell wrote: > >> We have multiple threads writing, but each thread works on one file, so >> orc writer i

Re: ORC NPE while writing stats

2015-09-02 Thread David Capwell
Also, the data put in are primitives, structs (list), and arrays (list); we don't use any of the boxed writables (like text). On Sep 2, 2015 12:57 PM, "David Capwell" wrote: > We have multiple threads writing, but each thread works on one file, so > orc writer is only

Re: ORC NPE while writing stats

2015-09-02 Thread David Capwell
o setMinimum unless it had at least some > non-null values in the column. > > Do you have multiple threads working? There isn't anything that should be > introducing non-determinism so for the same input it would fail at the same > point. > > .. Owen > > >

ORC NPE while writing stats

2015-09-01 Thread David Capwell
We are writing ORC files in our application for Hive to consume. Given enough time, we have noticed that writing causes an NPE when working with a string column's stats. Not sure what's causing it on our side yet since replaying the same data is just fine, it seems more like this just happens over t

Re: Perl-Hive connection

2015-08-06 Thread David Morel
You probably forgot to load (use) the module before calling new() On 6 Aug 2015 8:49 AM, "siva kumar" wrote: > Hi David , > I have tried the link you have posted. But im stuck > with this error message below > > Can't locate object method

RE: External sorted tables

2015-08-03 Thread David Capwell
eting the data along with sorting it, or try it > without 'sorted by' and see if you can execute a mapjoin. > > > > > > *From:* David Capwell [mailto:dcapw...@gmail.com] > *Sent:* Monday, August 03, 2015 11:59 AM > *To:* user@hive.apache.org > *Subject:* RE: External

RE: External sorted tables

2015-08-03 Thread David Capwell
at the data **is** in fact sorted... > > > > If there is something specific you are trying to accomplish by specifying > the sort order of that column, perhaps you can elaborate on that. > Otherwise, leave out the 'sorted by' statement and you should be fine. > &g

Re: External sorted tables

2015-08-03 Thread David Capwell
is read. This means that users must > be careful to insert data correctly by specifying the number of reducers to > be equal to the number of buckets, and using CLUSTER BY and SORT BY > commands in their query." > > On Thu, Jul 30, 2015 at 7:22 PM, David Capwell wrote: > >

Re: Hive Data into a Html Page

2015-07-31 Thread David Morel
Hive is not really meant to serve data as fast as a web page needs. You'll have to use some intermediate store (could even be a db file, or template toolkit generated static pages). David On 28 Jul 2015 8:53 AM, "siva kumar" wrote: > Hi Lohith, > We u

External sorted tables

2015-07-30 Thread David Capwell
We are trying to create an external table in Hive. This data is sorted, so we wanted to tell Hive about this. When I do, it complains about parsing the create. > CREATE EXTERNAL TABLE IF NOT EXISTS store.testing ( ... . . . . . . . . . . . . . . . . . . .> timestamp bigint, ...) . . . . . . . . . .

Re: Perl-Hive connection

2015-07-30 Thread David Morel
/lib/Thrift/API/HiveClient2.pm David

connection pooling for hive JDBC client

2015-06-03 Thread McWhorter, David
interact with and query Hive through the JDBC api from an application. Thank you, David McWhorter — David McWhorter Senior Developer, Foundations Informatics and Technology Services Office: 434.260.5232 | Mobile: 434.227.2551 david_mcwhor...@premierinc.com<mailto:david_mcwhor...@premierinc.

Hive Transactions fail

2015-03-05 Thread David Simoes
I've had some trouble enabling transactions in Hive 1.0.0 and I've made a post at http://stackoverflow.com/questions/28867368/hive-transactions-are-crashing Could anyone check it out and give me some pointers on why things are crashing? Tyvm, Dave

Using xPATH and Hive SQL to access XML data, but xPath a problem

2014-12-08 Thread David Novogrodsky
t[@xmlns=" http://schemas.microsoft.com/win/2004/08/events/event "]/System/EventID[@Qualifiers=""]/text()') FROM xml_event_table; I get this result(empty rows): 0 1 2 David Novogrodsky david.novogrod...@gmail.com http://www.linkedin.com/in/davidnovogrodsky

using Hive to create tables from unstructured data.

2014-11-12 Thread David Novogrodsky
STRING, timeOfCall STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' WITH SERDEPROPERTIES ( "input.regex" = "([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\n", "output.format.string" = "%1$s %2$s %3$s %4$s %5$s" ) LOCA

Re: ODBC Calls Extremely Slow

2014-08-15 Thread David Morel
the parser -at the driver level- does, or did, a rather poor job at parsing the query to transform it to hive semantics. David On Fri, Aug 15, 2014 at 8:54 AM, Bradley Wright wrote: Try an eval of our commercial ODBC driver for Hive: http://www.progress.com/products/datadirect-connect/od

Altering the Metastore on EC2

2014-08-11 Thread David Beveridge
We are creating a Hive schema for reading massive JSON files. Our JSON schema is rather large, and we have found that the default metastore schema for Hive cannot work for us as-is. To be specific, one field in our schema has about 17KB of nested structs within it. Unfortunately, it appears th

Re: Problem adding jar using pyhs2

2014-04-29 Thread David Engel
Hi Brad, Your test, after editing for local host/file names, etc., worked. It must be something else I'm doing wrong in my development stuff. At least I know it should work. I'll figure it out eventually. Thanks again. David On Mon, Apr 28, 2014 at 10:22:57AM -0700, Brad Ruderman w

Cannot Upgrade a Hive UDF without cluster restart. UDF is possibly cached.

2014-04-28 Thread David Zaebst
Hi all, We have a few Hive UDFs where I work. These are deployed by a bootstrap script so that the JAR files are in Hive's CLASSPATH before the server starts. This works to load the UDF whenever a cluster is started and then the UDF can be loaded with the ADD JAR and CREATE TEMPORARY FUNCTION co

Re: Problem adding jar using pyhs2

2014-04-28 Thread David Engel
essor.run() is written such that it only expects "jar file.jar" to get passed to it. That's how it appears to work when "add jar file.jar" is run from a stand-alone Hive CLI and from beeline. David On Sat, Apr 26, 2014 at 12:14:53AM -0700, Brad Ruderman wrote: > An ea

Problem adding jar using pyhs2

2014-04-25 Thread David Engel
command when used from the CLI and Beeline. It seems the "add" part of any "add file|jar|archive ..." command needs to get stripped off somewhere before it gets passed to AddResourceProcessor.run(). Unfortunately, I can't find that location when the command is received f

Re: What is the minimal required version of Hadoop for Hive 0.13.0?

2014-04-23 Thread David Gayou
1 Is it now the minimal required version ? If not, will there be a Hive 0.13.1 for older hadoop? Regards, David On Wed, Apr 23, 2014 at 4:00 PM, Dmitry Vasilenko wrote: > > Hive 0.12.0 (and previous versions) worked with Hadoop 0.20.x, 0.23.x.y, > 1.x.y, 2.x.y. > > Hive 0.13

Re: get_json_object for nested field returning a String instead of an Array

2014-04-08 Thread David Quigley
Hi Narayanan, We have had some success with a similar use case using a custom input format / record reader to recursively split arbitrary JSON into a set of discrete records at runtime. No schema is needed. Doing something similar might give you the functionality you are looking for. https://githu

Re: Deserializing into multiple records

2014-04-08 Thread David Quigley
I am glad that I could help. > > > > Br, > > Petter > > > > > > 2014-04-04 6:02 GMT+02:00 David Quigley : > >> > >> Thanks again Petter, the custom input format was exactly what I needed. > >> Here is example of my code in case anyone is

Re: Deserializing into multiple records

2014-04-03 Thread David Quigley
but nothing I saw actually decomposes nested JSON into a set of discrete records. It's super useful for us. On Wed, Apr 2, 2014 at 2:15 AM, Petter von Dolwitz (Hem) < petter.von.dolw...@gmail.com> wrote: > Hi David, > > you can implement a custom Input

Re: Deserializing into multiple records

2014-04-02 Thread David Quigley
Makes perfect sense, thanks Petter! On Wed, Apr 2, 2014 at 2:15 AM, Petter von Dolwitz (Hem) < petter.von.dolw...@gmail.com> wrote: > Hi David, > > you can implement a custom InputFormat (extends > org.apache.hadoop.mapred.FileInputFormat) accompanied by a custom > Rec

Deserializing into multiple records

2014-04-01 Thread David Quigley
We are currently streaming complex documents to HDFS with the hope of being able to query them. Each single document logically breaks down into a set of individual records. In order to use Hive, we preprocess each input document into a set of discrete records, which we save on HDFS and create an externa

Re: Exception

2014-02-28 Thread David Gayou
fluence/display/Hive/LanguageManual+DML And I have no clue how to actually insert the data using only the JDBC client. Regards David On Fri, Feb 28, 2014 at 10:22 AM, Jone Lura wrote: > Hi all, > > I am trying to understand why I receive the following exception; > >

Re: Issue with Hive and table with lots of column

2014-02-18 Thread David Gayou
Sorry, I reported it incorrectly. It's 8192M Thanks, David. On 18 Feb 2014 18:37, "Stephen Sprague" wrote: > oh. i just noticed the -Xmx value you reported. > > there's no M or G after that number?? I'd like to see -Xmx8192M or > -Xmx8G. That *is*

Re: Issue with Hive and table with lots of column

2014-02-18 Thread David Gayou
1. I have no process with hiveserver2 ... "ps -ef | grep -i hive" returns a pretty long command with a -Xmx8192 and that's the value set in hive-env.sh 2. The "select * from table limit 1" or even 100 is working correctly. David. On Tue, Feb 18, 2014 at 4:16 P

Re: Issue with Hive and table with lots of column

2014-02-18 Thread David Gayou
It works in CLI and hiveserver1. It fails with hiveserver 2. Regards David Gayou On Thu, Feb 13, 2014 at 3:11 AM, Navis류승우 wrote: > With HIVE-3746, which will be included in hive-0.13, HiveServer2 takes > less memory than before. > > Could you try it with the version in trunk?

Re: Issue with Hive and table with lots of column

2014-01-31 Thread David Gayou
for my client (haven't found how to change the heap size) My usecase is really to have the most possible columns. Thanks a lot for your help Regards David On Fri, Jan 31, 2014 at 1:12 AM, Edward Capriolo wrote: > Ok here are the problem(s). Thrift has frame size limits, thrift has to

Re: Issue with Hive and table with lots of column

2014-01-30 Thread David Gayou
y row basis on those dataset, so basically the more column we have the better it is. We are coming from the SQL world, and Hive is the closest to SQL syntax. We'd like to keep some SQL manipulation on the data. Thanks for the Help, Regards, David Gayou On Tue, Jan 28, 2014 at 8:35 PM, Steph

MIN/MAX issue with timestamps and RCFILE/ORC tables

2013-12-06 Thread David Engel
09-07 00:12:49.449 | | spreadsheets0.google.com | 8 | 8 | 2013-09-06 03:19:42.726 | 2013-09-06 21:01:07.743 | | spreadsheets2.google.com | 7 | 9 | 2013-09-06 03:19:42.726 | 2013-09-06 13:13:19.84 | +++-+--+--+ David -- David Engel da...@istwok.net

Re: Hive query taking a lot of time just to launch map-reduce jobs

2013-11-26 Thread David Morel
On 26 Nov 2013, at 7:02, Sreenath wrote: Hey David, Thanks for the swift reply. Each id will have exactly one file. and regarding the volume on an average each file would be 100MB of compressed data with the maximum going upto around 200MB compressed data. And how will RC files be an

Re: Difference in number of row observstions from distinct and group by

2013-11-25 Thread David Morel
and do a left outer join of table 1 on table 2. you'd be able to identify quickly what went wrong. Sort the result so you get unlikely dupes, and all. Just trial and error until you nail it. David

Re: java.lang.OutOfMemoryError: Java heap space

2013-11-25 Thread David Morel
On 22 Nov 2013, at 9:35, Rok Kralj wrote: If anybody has any clue what is the cause of this, I'd be happy to hear it. On Nov 21, 2013 9:59 PM, "Rok Kralj" wrote: what does echo $HADOOP_HEAPSIZE return in the environment you're trying to launch hive from? David

Re: Hive query taking a lot of time just to launch map-reduce jobs

2013-11-25 Thread David Morel
e of the biggest ID for a day, and the average? David
