Re: Roaring Bitmap UDFs

2017-12-08 Thread David Capwell
Think of it as a bloom filter that's more dynamic. It works well when cardinality is low, but its size grows quickly as cardinality grows, to the point of out-costing a bloom filter. This data structure supports existence queries, but your email sounds like you want a count. If so, it's not really the best fit. On Dec 8, 2017 5:00 PM, "Niti
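To make the "cost grows with cardinality" point concrete, here is a toy sketch of a single Roaring-style container. It stores 16-bit values as a sorted array while sparse and switches to a fixed 8 KiB bitmap once it holds more than 4096 values. `ToyRoaringContainer` is a hypothetical class, not the RoaringBitmap library's API. Real Roaring bitmaps split 32-bit values by their high 16 bits across many such containers (and also have run containers), which this sketch omits.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical toy, not the RoaringBitmap API: one container holding the low
// 16 bits of values, sparse (sorted array) or dense (65536-bit bitmap).
final class ToyRoaringContainer {
    static final int ARRAY_LIMIT = 4096;   // Roaring's array/bitmap switch-over point
    private int[] array = new int[0];      // sorted values while sparse
    private int size = 0;                  // number of distinct values stored
    private BitSet bitmap = null;          // non-null once the container is dense

    void add(int low16) {
        if (low16 < 0 || low16 > 0xFFFF) {
            throw new IllegalArgumentException("value must fit in 16 bits: " + low16);
        }
        if (bitmap != null) {
            if (!bitmap.get(low16)) {
                bitmap.set(low16);
                size++;
            }
            return;
        }
        int pos = Arrays.binarySearch(array, 0, size, low16);
        if (pos >= 0) {
            return;                        // already present
        }
        if (size == ARRAY_LIMIT) {         // too many values: convert to a bitmap
            bitmap = new BitSet(1 << 16);
            for (int i = 0; i < size; i++) {
                bitmap.set(array[i]);
            }
            bitmap.set(low16);
            size++;
            array = null;
            return;
        }
        int ins = -pos - 1;                // insertion point that keeps the array sorted
        if (size == array.length) {
            array = Arrays.copyOf(array, Math.max(4, size * 2));
        }
        System.arraycopy(array, ins, array, ins + 1, size - ins);
        array[ins] = low16;
        size++;
    }

    // Existence query: binary search while sparse, a bit test once dense.
    boolean contains(int low16) {
        if (low16 < 0 || low16 > 0xFFFF) return false;
        if (bitmap != null) return bitmap.get(low16);
        return Arrays.binarySearch(array, 0, size, low16) >= 0;
    }

    // Size a real container would need (16-bit array entries or an 8 KiB
    // bitmap), not this toy's int[] footprint.
    long sizeInBytes() {
        return bitmap != null ? 8192L : 2L * size;
    }

    boolean isBitmap() {
        return bitmap != null;
    }
}
```

With 50 values, the container reports 100 bytes. After 5000 distinct values, it has become a bitmap and reports 8192 bytes. The memory cost tracks how many values are present, unlike a bloom filter sized up front.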

ORC tables failing after upgrading from 0.14 to 2.1.1

2017-05-05 Thread David Capwell
Our schema is nested, with five struct types at the top level. When we try to query these structs, we get the following error: *ORC does not support type conversion from file type string (1) to reader type array (1)* Walking through Hive in a debugger, I see that schema evolution sees the correct file t

Re: read-only mode for hive

2016-03-09 Thread David Capwell
You could always set the table's output format to the null output format. On Mar 8, 2016 11:01 PM, "Jörn Franke" wrote: > What is the use case? You can try security solutions such as Ranger or > Sentry. > > As already mentioned another alternative could be a view. > > > On 08 Mar 2016, at 21:09, PG

Re: ORC NPE while writing stats

2015-09-03 Thread David Capwell
Thanks, that should help us move forward. On Sep 3, 2015 10:38 AM, "Prasanth Jayachandran" < pjayachand...@hortonworks.com> wrote: > > > On Sep 2, 2015, at 10:57 PM, David Capwell wrote: > > > > So, very quickly looked at the JIRA and I had the following questi

Re: ORC NPE while writing stats

2015-09-02 Thread David Capwell
ue is that estimateStripeSize won't always give the correct value, since my thread is the one calling it... With everything ThreadLocal, the only writers would be the ones in the same thread, so it should be better. On Wed, Sep 2, 2015 at 9:47 PM, David Capwell wrote: > Walking the MemoryMan

Re: ORC NPE while writing stats

2015-09-02 Thread David Capwell
anything for me, so no issue sharding and not configuring? Thanks for your time reading this email! On Wed, Sep 2, 2015 at 8:57 PM, David Capwell wrote: > So, very quickly looked at the JIRA and I had the following question; > if you have a pool per thread rather than global, then assum

Re: ORC NPE while writing stats

2015-09-02 Thread David Capwell
n Wed, Sep 2, 2015 at 7:34 PM, David Capwell wrote: > Thanks for the jira, will see if that works for us. > > On Sep 2, 2015 7:11 PM, "Prasanth Jayachandran" > wrote: >> >> Memory manager is made thread local >> https://issues.apache.org/jira/browse/HIVE-1019
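The quoted JIRA makes the ORC memory manager thread local. Below is a minimal sketch of that idea, using a hypothetical `ToyMemoryManager` (not Hive's ORC `MemoryManager`). It records each open writer's requested buffer size against a pool and scales every writer down when the pool is oversubscribed. Wrapping it in a `ThreadLocal` means writers opened on the same thread share one pool, and no locking is needed. The 256 MiB per-thread budget is an assumed value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy, not Hive's ORC MemoryManager: tracks requested buffer
// sizes per writer path and derives a scale factor when over budget.
final class ToyMemoryManager {
    private final long poolBytes;
    private final Map<String, Long> allocations = new HashMap<>();
    private long totalAllocated = 0;

    ToyMemoryManager(long poolBytes) {
        this.poolBytes = poolBytes;
    }

    void addWriter(String path, long requestedBytes) {
        Long old = allocations.put(path, requestedBytes);
        totalAllocated += requestedBytes - (old == null ? 0 : old);
    }

    void removeWriter(String path) {
        Long old = allocations.remove(path);
        if (old != null) {
            totalAllocated -= old;
        }
    }

    // Fraction of its request each writer may use; 1.0 when not oversubscribed.
    double allocationScale() {
        return totalAllocated <= poolBytes ? 1.0 : (double) poolBytes / totalAllocated;
    }
}

// One manager per thread: only writers on the same thread ever touch a given
// instance, so the unsynchronized HashMap above is safe here.
final class PerThreadMemoryManagers {
    private static final long POOL_BYTES_PER_THREAD = 256L * 1024 * 1024; // assumed budget
    private static final ThreadLocal<ToyMemoryManager> MANAGER =
            ThreadLocal.withInitial(() -> new ToyMemoryManager(POOL_BYTES_PER_THREAD));

    static ToyMemoryManager get() {
        return MANAGER.get();
    }
}
```

As the thread notes, the trade-off is that memory can no longer be shared. Each thread gets its own pool, so the worst-case total is the number of writer threads times the per-thread budget.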

Re: ORC NPE while writing stats

2015-09-02 Thread David Capwell
-10191 and see if that helps? > > On Sep 2, 2015, at 8:58 PM, David Capwell wrote: > > I'll try that out and see if it goes away (not seen this in the past 24 > hours, no code change). > > Doing this now means that I can't share the memory, so will prob go with a > th

Re: ORC NPE while writing stats

2015-09-02 Thread David Capwell
a synchronization on the > MemoryManager somewhere and thus be getting a race condition. > > Thanks, >Owen > > On Wed, Sep 2, 2015 at 12:57 PM, David Capwell wrote: > >> We have multiple threads writing, but each thread works on one file, so >> orc writer i
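Owen's hypothesis is that a missing synchronization on a shared MemoryManager causes a race. The alternative to per-thread managers is to keep one shared pool and guard all of its state with a lock. The sketch below uses a hypothetical `SharedMemoryManager` (not Hive's class) whose methods are all `synchronized`. Concurrent writers therefore never see the map and the running total out of step.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shared manager, not Hive's MemoryManager: one pool for every
// thread. Each method that reads or mutates shared state holds the same lock.
final class SharedMemoryManager {
    private final long poolBytes;
    private final Map<String, Long> allocations = new HashMap<>();
    private long totalAllocated = 0;

    SharedMemoryManager(long poolBytes) {
        this.poolBytes = poolBytes;
    }

    synchronized void addWriter(String path, long requestedBytes) {
        Long old = allocations.put(path, requestedBytes);
        totalAllocated += requestedBytes - (old == null ? 0 : old);
    }

    synchronized void removeWriter(String path) {
        Long old = allocations.remove(path);
        if (old != null) {
            totalAllocated -= old;
        }
    }

    synchronized long totalAllocated() {
        return totalAllocated;
    }

    // Fraction of its request each writer may use; 1.0 when not oversubscribed.
    synchronized double allocationScale() {
        return totalAllocated <= poolBytes ? 1.0 : (double) poolBytes / totalAllocated;
    }
}
```

Without the `synchronized` keywords, concurrent `put` calls on the `HashMap` and unsynchronized updates to `totalAllocated` could lose writes. That kind of bug only shows up under some interleavings, which fits a failure that replaying the same data does not reproduce.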

Re: ORC NPE while writing stats

2015-09-02 Thread David Capwell
Also, the data we put in consists of primitives, structs (list), and arrays (list); we don't use any of the boxed writables (like Text). On Sep 2, 2015 12:57 PM, "David Capwell" wrote: > We have multiple threads writing, but each thread works on one file, so > orc writer is only

Re: ORC NPE while writing stats

2015-09-02 Thread David Capwell
o setMinimum unless it had at least some > non-null values in the column. > > Do you have multiple threads working? There isn't anything that should be > introducing non-determinism so for the same input it would fail at the same > point. > > .. Owen > > >
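The quoted reply says the minimum is only set once a column has seen at least some non-null values. This sketch illustrates that invariant with a hypothetical `ToyStringStats` class (not ORC's statistics implementation). Min and max start out unset, `update` skips nulls, and `merge` ignores inputs that never saw a non-null value. Code that reads the minimum of an all-null column without checking `hasMinMax()` would hit exactly this kind of null.

```java
// Hypothetical sketch, not ORC's string statistics class. Java String order
// is used for simplicity; ORC's own comparison of serialized values may differ
// for non-ASCII text.
final class ToyStringStats {
    private String minimum = null;   // stays null until a non-null value arrives
    private String maximum = null;
    private long nonNullCount = 0;
    private boolean hasNull = false;

    void update(String value) {
        if (value == null) {
            hasNull = true;          // nulls never touch min/max
            return;
        }
        if (minimum == null) {
            minimum = value;
            maximum = value;
        } else {
            if (value.compareTo(minimum) < 0) minimum = value;
            if (value.compareTo(maximum) > 0) maximum = value;
        }
        nonNullCount++;
    }

    // Combine stats, e.g. rolling row-group stats up into stripe stats.
    void merge(ToyStringStats other) {
        hasNull |= other.hasNull;
        if (other.minimum != null) { // skip inputs with no non-null values
            if (minimum == null || other.minimum.compareTo(minimum) < 0) minimum = other.minimum;
            if (maximum == null || other.maximum.compareTo(maximum) > 0) maximum = other.maximum;
        }
        nonNullCount += other.nonNullCount;
    }

    boolean hasMinMax() { return minimum != null; }
    String getMinimum() { return minimum; }
    String getMaximum() { return maximum; }
    long getNonNullCount() { return nonNullCount; }
    boolean hasNull() { return hasNull; }
}
```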

ORC NPE while writing stats

2015-09-01 Thread David Capwell
We are writing ORC files in our application for Hive to consume. Given enough time, we have noticed that writing causes an NPE when working with a string column's stats. We're not sure what's causing it on our side yet, since replaying the same data works fine; it seems more like this just happens over t

RE: External sorted tables

2015-08-03 Thread David Capwell
eting the data along with sorting it, or try it > without 'sorted by' and see if you can execute a mapjoin. > > > > > > *From:* David Capwell [mailto:dcapw...@gmail.com] > *Sent:* Monday, August 03, 2015 11:59 AM > *To:* user@hive.apache.org > *Subject:* RE: External

RE: External sorted tables

2015-08-03 Thread David Capwell
at the data **is** in fact sorted... > > > > If there is something specific you are trying to accomplish by specifying > the sort order of that column, perhaps you can elaborate on that. > Otherwise, leave out the 'sorted by' statement and you should be fine.

Re: External sorted tables

2015-08-03 Thread David Capwell
is read. This means that users must > be careful to insert data correctly by specifying the number of reducers to > be equal to the number of buckets, and using CLUSTER BY and SORT BY > commands in their query." > > On Thu, Jul 30, 2015 at 7:22 PM, David Capwell wrote: > >
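The quoted documentation says Hive trusts that the data was written correctly: the number of reducers should equal the number of buckets, and the query should use CLUSTER BY and SORT BY. The toy model below shows the layout that contract implies. Each row lands in bucket `hash(key) mod numBuckets`, and each bucket is then sorted independently. `ToyBucketer` is hypothetical. The hash is an assumption that mirrors Hive's legacy bucketing of a bigint key (`Long.hashCode`, masked non-negative); check it against your Hive version before relying on it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of a bucketed, sorted table layout: one output file per
// bucket (one reducer per bucket), rows sorted within each file.
final class ToyBucketer {
    // Assumed hash: Long.hashCode masked to non-negative, modulo bucket count.
    static int bucketFor(long key, int numBuckets) {
        return (Long.hashCode(key) & Integer.MAX_VALUE) % numBuckets;
    }

    static List<List<Long>> bucketAndSort(List<Long> keys, int numBuckets) {
        List<List<Long>> buckets = new ArrayList<>();
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new ArrayList<>());
        }
        for (long k : keys) {
            buckets.get(bucketFor(k, numBuckets)).add(k);   // CLUSTER BY: route by hash
        }
        for (List<Long> b : buckets) {
            Collections.sort(b);                            // SORT BY: order within a bucket
        }
        return buckets;
    }
}
```

Data that is only globally sorted, without this per-bucket routing, does not match what Hive assumes for a bucketed, sorted table.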

External sorted tables

2015-07-30 Thread David Capwell
We are trying to create an external table in Hive. This data is sorted, so we wanted to tell Hive about it. When I do, it complains about parsing the CREATE statement. CREATE EXTERNAL TABLE IF NOT EXISTS store.testing ( ... timestamp bigint, ...) ...