Have you tried nohup?
On 5 Dec 2014 15:25, "peterm_second" wrote:
> Hi Guys,
> How can I launch the Hiveserver2 as a daemon.
> I am launching the hiveserver2 using sshpass and I can't detach hiveserver2
> from my terminal. Is there a way to daemonise the hiveserver2?
>
> I've also tried usi
Hello,
I think you have to think first about your functional and non-functional
requirements. You can scale "normal" SQL databases as well (cf. CERN or
Facebook). There are different types of databases for different purposes -
there is no one-size-fits-all. At the moment, we are a few years away from
It might be an access rights problem with the hive server user.
On Thu, 4 Jun 2015 at 11:53, Chinna Rao Lalam
wrote:
> Hi,
>
> If your table name is orc_table, in the exception I can see the table
> name as "test"
>
> Moving data to: hdfs://:8020/apps/hive/warehouse/test
>
> Failed with exception
Hi,
Is there any official way to verify that a query leveraged orc bloom
filters or orc indexes? For example, number of bytes (rows) not processed
thanks to bloom filters or storage indexes? Some indicators in the explain
output?
Thank you.
Best regards
Always use the newest version of Hive. You should use ORC or Parquet
wherever possible. If you use ORC then you should explicitly enable storage
indexes and insert your table sorted (e.g. for the query below you would sort
on x). Additionally you should enable statistics.
Compression may bring addit
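As a rough sketch of what that could look like (table and column names are
made up for illustration; check the ORC table properties against your Hive
version):

CREATE TABLE fact_orc (x INT, y STRING)
STORED AS ORC
TBLPROPERTIES ('orc.create.index'='true',
               'orc.bloom.filter.columns'='x');

INSERT OVERWRITE TABLE fact_orc
SELECT x, y FROM fact_staging
SORT BY x;   -- sorted on the filter column so the min/max indexes stay selective

ANALYZE TABLE fact_orc COMPUTE STATISTICS;
ANALYZE TABLE fact_orc COMPUTE STATISTICS FOR COLUMNS;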
I have no problems using JDBC with HiveServer2. I think you need the
hive*jdbc*standalone.jar and I think hadoop-commons*.jar
On Fri, 7 Aug 2015 at 5:23, Stephen Bly wrote:
> What library should I use if I want to make persistent connections from
> within Scala/Java? I’m working on a web ser
Maybe there is another older log4j library in the classpath?
On Fri, 14 Aug 2015 at 5:34, Praveen Sripati
wrote:
> Hi,
>
> I installed Java 1.8.0_51, Hadoop 1.2.1 and Hive 1.2.1 on Ubuntu 14.04 64
> bit, I do get the below exception when I start the hive shell or the
> beeline. How do I get a
Additionally, although it is a PoC, you should have a realistic data model.
Furthermore, good data modeling practices should be followed. Joining on a
double is not one of them; it should be an int. Furthermore, double is a type
that is rarely used in most scenarios. In the
business
What about using the HCatalog APIs?
On Wed, 26 Aug 2015 at 8:27, Jerrick Hoang
wrote:
> Hi all,
>
> I want to interact with HiveMetaStore table from code and was looking at
> http://hive.apache.org/javadocs/r0.13.1/api/metastore/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.html
> , wa
Why not use the HCatalog web service API?
On Wed, 26 Aug 2015 at 18:44, Jerrick Hoang
wrote:
> Ok, I'm super confused now. The hive metastore is a RDBMS database. I
> totally agree that I shouldn't access it directly via jdbc. So what about
> using this class
> http://hive.apache.org/javadocs/r0.
What do you mean by "it is not working"?
You may also check the logs of your LDAP server...
Maybe there is also a limitation on the number of logins on your LDAP server...
Maybe the account is temporarily blocked because you entered the password
wrongly too many times...
On Fri, 18 Sep 2015 at 10:34, Lo
Why not use the Tez UI?
On Thu, 1 Oct 2015 at 2:29, James Pirz wrote:
> I am using Tez 0.7.0 on Hadoop 2.6 to run Hive queries.
> I am interested in checking DAGs for my queries visually, and I realized
> that I can do that by graphviz once I can get "dot" files of my DAGs. My
> issue is I can no
You could edit the beeline script and add the driver to the classpath there.
On Thu, 8 Oct 2015 at 16:02, Timothy Garza
wrote:
> I’ve installed Hive 1.2.1 on Amazon Linux AMI release 2015.03, master-node
> of Hadoop cluster.
>
>
>
> I can successfully access the Beeline client but when I try t
> On 29 Oct 2015, at 06:43, Ashok Kumar wrote:
>
> hi gurus,
>
> kindly clarify the following please
>
> Hive currently does not support indexes or indexes are not used in the query
Not correct. See https://snippetessay.wordpress.com
> The lowest granularity for concurrency is partition. If ta
You clearly need to escape those characters, as for any other tool. You may want
to use Avro instead of CSV, XML or JSON, etc.
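A hedged example of the escaping route, using the CSV SerDe that ships with
recent Hive versions (column names are placeholders):

CREATE TABLE raw_csv (col1 STRING, col2 STRING, col3 STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\"",
  "escapeChar"    = "\\"
)
STORED AS TEXTFILE;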
> On 30 Oct 2015, at 19:16, Vijaya Narayana Reddy Bhoomi Reddy
> wrote:
>
> Hi,
>
> I have a CSV file which contains hundred thousand rows and about 200+
> columns. So
What is the create table statement? You may want to insert everything into the
ORC table (sorted on x and/or y) and then apply the where statement in your
queries on the ORC table.
> On 02 Nov 2015, at 13:36, Kashif Hussain wrote:
>
> Hi,
> I am trying to insert data into orc table from a tex
Bloom filters only work for =, and min/max indexes for <, >, =; however, the latter
only work for numeric values while the bloom filter works on nearly all types.
Additionally, the bloom filter is a probabilistic data structure.
For both it makes sense that the data is sorted on the column which is most
select
Probably you started the new Hive version before upgrading the schema. This
means manual fixing.
> On 03 Nov 2015, at 11:56, Sanjeev Verma wrote:
>
> Hi
>
> I am trying to update the metastore using schematool but getting error
>
> schematool -dbType derby -upgradeSchemaFrom 0.12
>
> Upg
First it depends on what you want to do exactly. Second, Hive > 1.2, Tez as an
execution engine (I recommend >= 0.8) and ORC as storage format can be pretty
quick depending on your use case. Additionally you may want to employ
compression which is a performance boost once you understand how stor
Probably it is outdated.
Hive can access HBase tables via external tables. The execution engine in Hive
can be MR, Tez or Spark. HiveQL is nowadays very similar to SQL. In fact,
Hortonworks plans to make it SQL:2011 analytics compatible.
HBase can be accessed independently of Hive via SQL using P
I recommend using a Hadoop distribution containing these technologies. I think
you also get other useful tools for your scenario, such as auditing using
Sentry or Ranger.
> On 20 Nov 2015, at 10:48, Mich Talebzadeh wrote:
>
> Well
>
> “I'm planning to deploy Hive on Spark but I can't find t
I think the most recent versions of Cloudera or Hortonworks should include all
these components - try their Sandboxes.
> On 20 Nov 2015, at 12:54, Dasun Hegoda wrote:
>
> Where can I get a Hadoop distribution containing these technologies? Link?
>
>> On Fri, Nov 20, 201
Why not implement a Hive UDF in Java?
> On 28 Nov 2015, at 21:26, Mahender Sarangam
> wrote:
>
> Hi team,
>
> We need expert input to discuss how to implement Rule engine in hive. Do you
> have any references available to implement rule in hive/pig.
>
>
> We are migrating our Stored Proced
How did you create the tables? Do you have automated statistics gathering
activated in Hive?
Btw, MR is outdated as a Hive execution engine. Use Tez (maybe wait for 0.8 for
sub-second queries) or use Spark as an execution engine in Hive.
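For reference, the settings I have in mind look roughly like this (your_table
is a placeholder; adjust to your distribution):

SET hive.execution.engine=tez;    -- instead of the default mr
SET hive.stats.autogather=true;   -- gather basic statistics automatically on insert
ANALYZE TABLE your_table COMPUTE STATISTICS FOR COLUMNS;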
> On 01 Dec 2015, at 17:40, Mich Talebzadeh wrote:
>
> What if we
I am not sure if I understand, but why should this not be possible using SQL
in Hive?
> On 02 Dec 2015, at 21:26, Frank Luo wrote:
>
> Didn’t get any response, so trying one more time. I cannot believe I am the
> only one facing the problem.
>
> From: Frank Luo
> Sent: Tuesday, December 0
How many nodes, cores and memory do you have?
What Hive version?
Do you have the opportunity to use Tez as an execution engine?
Usually I use external tables only for reading them and inserting them into a
table in ORC or Parquet format for doing analytics.
This is much more performant than jso
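A minimal sketch of that pattern (external table only for reading, managed ORC
table for analytics), assuming JSON input and the JSON SerDe from
hive-hcatalog-core on the classpath; all names are illustrative:

CREATE EXTERNAL TABLE events_raw (id BIGINT, payload STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/data/raw/events';

CREATE TABLE events_orc STORED AS ORC
AS SELECT * FROM events_raw;   -- run the analytics against events_orc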
ORC or PARQUET, requires us to load 5 years of LZO data in ORC or
> PARQUET format. Though it might be performance efficient, it increases data
> redundancy.
> But we will explore that option.
>
> Currently I want to understand when I am unable to scale up mappers.
>
> Tha
for analytics the
ORC or parquet format.
> On 03 Dec 2015, at 15:28, Jörn Franke wrote:
>
> Your Hive version is too old. You may want to use also another execution
> engine. I think your problem might then be related to external tables for
> which the parameter you set probably
What operating system are you using?
> On 04 Dec 2015, at 01:25, mahender bigdata
> wrote:
>
> Hi Team,
>
> Does hive supports Hive Unicode like UTF-8,UTF-16 and UTF-32. I would like to
> see different language supported in hive table. Is there any serde which can
> show exactly japanese, ch
You forgot to tell Hive that the file is comma-separated. You may want to use
the CSV serde.
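For example, something along these lines (the column names are guesses based
on the sample row below, so adjust them):

CREATE TABLE managers (player STRING, team STRING, g INT, w INT, l INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;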
> On 16 Dec 2015, at 07:15, zml张明磊 wrote:
>
> I am confusing about the following result. Why the hive table has so many
> NULL value ?
>
> hive> select * from managers;
> OK
> fergubo01m,BS1,31,20,10
Do you have the create table statement? The sqoop command?
> On 17 Dec 2015, at 07:13, Trainee Bingo wrote:
>
> Hi All,
>
> I have a sqoop script which brings data from oracle and dumps it to HDFS.
> Then that data is exposed to hive external table. But when I do :
> hive> select * from ;
>
Hive has the EXPORT/IMPORT commands; alternatively Falcon + Oozie.
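Roughly (table name, partition and paths are made up; the exported directory
still has to be copied to the target cluster, e.g. with distcp):

EXPORT TABLE sales PARTITION (ds='2015-12-01') TO '/staging/sales_export';
-- ... copy /staging/sales_export to the other cluster ...
IMPORT TABLE sales PARTITION (ds='2015-12-01') FROM '/staging/sales_export';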
> On 17 Dec 2015, at 17:21, Elliot West wrote:
>
> Hello,
>
> I'm thinking about the steps required to repeatedly push Hive datasets out
> from a traditional Hadoop cluster into a parallel cloud based cluster. This
> is not a one
I think you should draw more attention to the fact that Hive is just one component in
the ecosystem. You can have many more components, such as ELT, integrating
unstructured data, machine learning, streaming data etc. However, usually
analysts are not aware of the technologies and IT staff is not much
Have you checked what the issue is with the log file causing trouble? Enough
space available? Access rights (what is the user of the Spark worker)? Does the
directory exist?
Can you provide more details on how the table is created?
Does the query work with MR or Tez as an execution engine?
Does a n
Have you tried it with Hive on Tez? It contains (currently) more optimizations
than Hive on Spark.
I assume you use the latest Hive version.
Additionally you may want to think about calculating statistics (depending on
your configuration you need to trigger it) - I am not sure if Spark can use
t
You are using an old version of Spark and it cannot leverage all optimizations
of Hive, so I think that your conclusion cannot be as easy as you might think.
> On 31 Dec 2015, at 19:34, Mich Talebzadeh wrote:
>
> Ok guys.
>
> I have not succeeded in installing TEZ. Yet so I can try the query
You can still use the MR execution engine for maintaining the index. Indeed, with
the ORC or Parquet format there are min/max indexes and bloom filters, but you
need to sort your data appropriately to benefit from the performance. Alternatively
you can create redundant tables sorted in different orders.
T
Btw, this is not Hive specific but also applies to other relational database systems,
such as Oracle Exadata.
> On 05 Jan 2016, at 20:57, Jörn Franke wrote:
>
> You can still use execution Engine mr for maintaining the index. Indeed with
> the ORC or parquet format there are min/max
If I understand you correctly this could be just another Hive storage format.
> On 06 Jan 2016, at 07:24, Mich Talebzadeh wrote:
>
> Hi,
>
> Thinking loudly.
>
> Ideally we should consider a totally columnar storage offering in which each
> column of table is stored as compressed value (I disr
I am not sure how much performance one could gain in comparison to ORC or
Parquet. They work pretty well once you know how to use them. However,
there are still ways to optimize them. For instance, sorting of data is a
key factor for these formats to be efficient. Nevertheless, if you have a
lot of
This observation is correct and it is the same behavior as you see in other
databases supporting partitions. Usually you should avoid many small partitions.
> On 07 Jan 2016, at 23:53, Mich Talebzadeh wrote:
>
> Ok we hope that partitioning improves performance where the predicate is on
>
Try EXPLAIN DEPENDENCY.
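That is, prefix the query with it, for example (the query itself is only an
illustration):

EXPLAIN DEPENDENCY
SELECT t1.key, count(*)
FROM t1 JOIN t2 ON t1.key = t2.key
GROUP BY t1.key;
-- returns the input tables and partitions the query would read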
> On 08 Jan 2016, at 10:47, Mich Talebzadeh wrote:
>
> Thanks Gopal.
>
> Basically the following is true:
>
> 1.The storage layer is HDFS
> 2.The execution engine is MR, Tez, Spark etc
> 3.The access layer is Hive
>
> When we say the access layer is Hive,
Do you have some data model?
Basically, modern technologies such as Hive, but also relational databases,
suggest prejoining tables and working on big flat tables. The reason is that
they are distributed systems and you should avoid transferring a lot of data
between nodes for each query.
Hence,
Just be aware that you should insert the data sorted at least on the most
discriminating column of your where clause.
> On 19 Jan 2016, at 17:27, Owen O'Malley wrote:
>
> It has both. Each index has statistics of min, max, count, and sum for each
> column in the row group of 10,000 rows. It also
ote:
>
> Thanks Owen,
>
> I got a bit confused comparing ORC with what I know about indexes in
> relational databases. Still need to understand it a bit better.
>
> Regards
>
> From: Owen O'Malley [mailto:omal...@apache.org]
> Sent: 19 January 2016 17:57
> To: user
Well, you can create an empty Hive table in ORC format and use --hive-override
in sqoop.
Alternatively you can use --hive-import and set hive.default.format.
I recommend defining the schema properly on the command line, because sqoop's
detection of formats is based on JDBC (Java) types, which is no
Check HiveMall
> On 03 Feb 2016, at 05:49, Koert Kuipers wrote:
>
> yeah but have you ever seen somewhat write a real analytical program in hive?
> how? where are the basic abstractions to wrap up a large amount of operations
> (joins, groupby's) into a single function call? where are the tool
How many disk drives do you have per node?
Generally one node should have 12 drives, not configured as RAID and not
configured as LVM.
Files could be a little bit larger (4 or better 40 GB - your NameNode will
thank you) or use Hadoop Archive (HAR).
I am not sure about the latest status of Phoeni
Why should it not be OK if you do not miss any functionality? You can use Oozie
+ Hive queries to have more sophisticated logging and scheduling. Do not forget
to do proper capacity/queue management.
> On 16 Feb 2016, at 07:19, Ramasubramanian
> wrote:
>
> Hi,
>
> Is it ok to build an entire
I am not sure what you are looking for. Performance has many influence
factors...
> On 24 Feb 2016, at 18:23, Mich Talebzadeh
> wrote:
>
> Hi,
>
>
>
> Has anyone got some performance matrix for Hive 2 from user perspective?
>
> It looks very impressive on ORC tables.
>
> thanks
>
> --
how fast it returns the results in this case compare to 1.2.1 etc
>
> thanks
>
>> On 24/02/2016 17:25, Jörn Franke wrote:
>>
>> I am not sure what you are looking for. Performance has many influence
>> factors...
>>
>>> On 24 Feb 2016, at 18:23, Mich
I think you can always make a benchmark that has this or that result. You
always have to see what is evaluated, and generally I recommend to always try
it yourself with your data and your queries.
There is also a lot of change within the projects. Impala may have Kudu, but
Hive has ORC, Tez and Spa
It always depends on what you want to do and thus from experience I cannot
agree with your comment. Do you have any reasoning for this statement?
> On 02 Mar 2016, at 19:14, Dayong wrote:
>
> Tez is kind of outdated and Orc is so dedicated on hive. In addition, hive
> metadata store can be de
What is the use case? You can try security solutions such as Ranger or Sentry.
As already mentioned another alternative could be a view.
> On 08 Mar 2016, at 21:09, PG User wrote:
>
> Hi All,
> I have one question about putting hive in read-only mode.
>
> What are the ways of putting hive in r
Apache Knox for authentication makes sense. For Hive authorization there are
tools such as Apache Ranger or Sentry, which themselves can connect via LDAP.
> On 09 Mar 2016, at 16:58, Alan Gates wrote:
>
> One way people have gotten around the lack of LDAP connectivity in HS2 has
> been to use
Why don't you load all the data and use just two columns for querying?
Alternatively, use regular expressions.
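For the regular-expression route, a hedged sketch with the RegexSerDe (this
particular pattern splits each line into two string columns at the last comma;
adapt it to your file):

CREATE TABLE two_cols (first_part STRING, last_part STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "^(.*),([^,]*)$")
STORED AS TEXTFILE;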
> On 09 Mar 2016, at 18:43, Ajay Chander wrote:
>
> Hi Everyone,
>
> I am looking for a way, to ignore the first occurrence of the delimiter while
> loading the data from csv file to h
The data is already in the CSV, so it does not matter for querying. It is
recommended to convert it to ORC or Parquet for querying.
> On 09 Mar 2016, at 19:09, Ajay Chander wrote:
>
> Daniel, thanks for your time. Is it like creating two tables, one is to get
> all the data and the another one is t
Just out of curiosity: what is the code base for the ODBC drivers by
Hortonworks, Cloudera & co? Did they develop them on their own?
If yes, maybe one should think about an open source one, which is reliable and
supports a richer set of ODBC functionality.
Especially in the light of ORC, Parque
Honestly, 0.12 is a no-go - you miss a lot of performance improvements. Probably
your query would execute in less than a minute. If your Hadoop vendor does not
support smooth upgrades then change it. Hive 1.2.1 is the absolute minimum,
including using ORC or Parquet as a table format and Tez (pref
What are your requirements? Do you need to omit a column? Transform it? Make
the anonymized version joinable, etc.? There is not simply one function.
> On 17 Mar 2016, at 14:58, Ajay Chander wrote:
>
> Hi Everyone,
>
> I have a csv.file which has some sensitive data in a particular column in it.
How much data are you querying? What is the query? How selective is it supposed
to be? What is the block size?
> On 16 Mar 2016, at 11:23, Joseph wrote:
>
> Hi all,
>
> I have known that ORC provides three level of indexes within each file, file
> level, stripe level, and row level.
> The fi
minal_type = 25080;
> select * from gprs where terminal_type = 25080;
>
> In the gprs table, the "terminal_type" column's value is in [0, 25066]
>
> Joseph
>
> From: Jörn Franke
> Date: 2016-03-16 19:26
> To: Joseph
> CC: user; user
> Subject: Re
Joining so many external tables is always an issue with any component. Your
problem is not Hive specific, but your data model seems to be messed up. First
of all you should have them in an appropriate format, such as ORC or Parquet,
and the tables should not be external. Then you should use the r
If you check the newest Hortonworks distribution then you see that it generally
works. Maybe you can borrow some of their packages. Alternatively it should be
also available in other distributions.
> On 26 Mar 2016, at 22:47, Mich Talebzadeh wrote:
>
> Hi,
>
> I am running Hive 2 and now Spar
Is the MySQL database virtualized? Are there bottlenecks in the storage of the
MySQL database? Could the network be a bottleneck? Are firewalls blocking new
connections in case of a sudden connection increase?
> On 30 Mar 2016, at 23:28, Udit Mehta wrote:
>
> Hi all,
>
> We are currently running Hive in productio
Please provide exact log messages, create table statements and insert statements.
> On 06 Apr 2016, at 12:05, Ashim Sinha wrote:
>
> Hi Team
> Need help for the issue
> Steps followed
> table created
> Loaded the data of length 38 in decimal type
> Analyse table - for columns gives error like zero
Just out of curiosity, what is the use case behind this?
How do you call the shell script?
> On 16 Apr 2016, at 00:24, Shirish Tatikonda
> wrote:
>
> Hello,
>
> I am trying to run multiple hive queries in parallel by submitting them
> through a map-reduce job.
> More specifically, I have a
You could also explore the in-memory database of 12c. However, I am not sure
how beneficial it is for OLTP scenarios.
I am excited to see how the performance will be with HBase as a Hive metastore.
Nevertheless, your results on Oracle/SSD will be beneficial for the community.
> On 17 Apr 2016,
It really depends on what you want to do. Hive is more for queries involving a lot of
data, whereas HBase+Phoenix is more for OLTP scenarios or sensor ingestion.
I think the reason is that Hive has been the entry point for many engines and
formats. Additionally there are a lot of tuning capabilities fr
Hive has working indexes. However many people overlook that a block is usually
much larger than in a relational database and thus do not use them right.
> On 19 Apr 2016, at 09:31, Mich Talebzadeh wrote:
>
> The issue is that Hive has indexes (not index store) but they don't work so
> there we
I am still not sure why you think they are not used. The main issue is that the
block size is usually very large (eg 256 MB compared to kilobytes / sometimes
few megabytes in traditional databases) and the indexes refer to blocks. This
makes it less likely that you can leverage it for small data
You could try it as binary. Is it just for storing the blobs or for doing analysis
on them? In the first case you may think about storing them as files in HDFS
and including in Hive just a string containing the file name (to make analysis
on the other data faster). In the latter case you should thin
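A small sketch of the two options (all names made up):

-- option 1: keep the blob outside the table and store only a reference
CREATE TABLE docs (doc_id BIGINT, author STRING, hdfs_path STRING)
STORED AS ORC;

-- option 2: store the blob in the table itself as binary
CREATE TABLE docs_with_blob (doc_id BIGINT, author STRING, content BINARY)
STORED AS ORC;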
Dear all,
I prepared a small Serde to analyze Bitcoin blockchain data with Hive:
https://snippetessay.wordpress.com/2016/04/28/hive-bitcoin-analytics-on-blockchain-data-with-sql/
There are some example queries, but I will add some in the future.
Additionally, more unit tests will be added.
Let m
I would still need some time to dig deeper in this. Are you using a specific
distribution? Would it be possible to upgrade to a more recent Hive version?
However, having so many small partitions is a bad practice which seriously
affects performance. Each partition should at least contain several
I do not think you make it faster by setting the execution engine to Spark,
especially with such an old Spark version.
For such simple things as "dump" bulk imports and exports, it matters
much less, if at all, which execution engine you use.
There was recently a discussion on that on the
Why don't you export the data from HBase to Hive, e.g. in ORC format? You should
not use MR with Hive, but Tez. Also use a recent Hive version (at least 1.2).
You can then do queries there. For large log file processing in real time, one
alternative depending on your needs could be Solr on Hadoop
I do not remember exactly, but I think it worked simply by adding a new
partition to the old table with the additional columns.
> On 17 May 2016, at 15:00, Mich Talebzadeh wrote:
>
> Hi Mahendar,
>
> That version 1.2 is reasonable.
>
> One alternative is to create a new table (new_table) in H
Use a distribution, such as Hortonworks
> On 18 May 2016, at 19:09, Me To wrote:
>
> Hello,
>
> I want to install hive on my windows machine but I am unable to find any
> resource out there. I am trying to set up it from one month but unable to
> accomplish that. I have successfully set up
XML is generally slow in any software. It is not recommended for large data
volumes.
> On 22 May 2016, at 10:15, Maciek wrote:
>
> Have you had to load XML data into Hive? Did you run into any problems or
> experienced any pain points, e.g. complex schemas or performance?
>
> I have done a lo
Or use Falcon ...
I would try to avoid Spark JDBC. JDBC is not designed for these big data
bulk operations, e.g. data has to be transferred uncompressed and there is the
serialization/deserialization issue: query result -> protocol -> Java objects ->
writing to specific storage format etc.
This
Both have outdated versions; usually one can support you better if you upgrade
to the newest.
A firewall could be an issue here.
> On 26 May 2016, at 10:11, Nikolay Voronchikhin
> wrote:
>
> Hi PySpark users,
>
> We need to be able to run large Hive queries in PySpark 1.2.1. Users are
> runni
Thanks, very interesting explanation. Looking forward to testing it.
> On 31 May 2016, at 07:51, Gopal Vijayaraghavan wrote:
>
>
>> That being said all systems are evolving. Hive supports tez+llap which
>> is basically the in-memory support.
>
> There is a big difference between where LLAP & Spark
This can be configured on the Hadoop level.
> On 03 Jun 2016, at 10:59, Nick Corbett wrote:
>
> Hi
>
>
> I am deploying Hive in a regulated environment - all data needs to be
> encrypted when transferred and at rest.
>
>
> If I run a 'select' statement, using HiveServer2, then a map reduce
Never use string when you can use int: the performance will be much better,
especially for tables in ORC/Parquet format.
> On 04 Jun 2016, at 22:31, Igor Kravzov wrote:
>
> Thanks Dudu.
> So if I need actual date I will use view.
> Regarding partition column: I can create 2 external table
This is not the recommended way to load large data volumes into Hive. Check the
external table feature, Sqoop, and the ORC/Parquet formats.
> On 08 Jun 2016, at 14:03, raj hive wrote:
>
> Hi Friends,
>
> I have to insert the data into hive table from Java program. Insert query
> will work in
The indexes are based on the HDFS block size, which is usually around 128 MB. This
means that for hitting a single row you must always load the full block. In
traditional databases the block size is much smaller, which makes this much
faster. If the optimizer does
not pick up the index then you can query the index directly (it is ju
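For reference, a hedged sketch of creating such an index and finding the
underlying index table (names are illustrative; CREATE INDEX exists up to
Hive 2.x):

CREATE INDEX ix_customer ON TABLE trades (customer_id)
AS 'COMPACT' WITH DEFERRED REBUILD;

ALTER INDEX ix_customer ON trades REBUILD;   -- e.g. with mr, after loading data

SHOW FORMATTED INDEX ON trades;   -- shows the generated index table, which can
                                  -- be queried like any other table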
Hello,
For no database (including traditional ones) is it advisable to fetch this
amount of data through JDBC. JDBC is not designed for this (neither for import
nor for export of large data volumes). It is a highly questionable approach
from a reliability point of view.
Export it as a file to HDFS and
Aside from this, the low network performance could also stem from the Java
application receiving the JDBC stream (not threaded / not efficiently
implemented etc.). However, that being said, do not use JDBC for this.
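As a sketch of the file-export route mentioned above (path, columns and filter
are placeholders):

INSERT OVERWRITE DIRECTORY '/tmp/extract/myquery'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
SELECT col_a, col_b
FROM big_table
WHERE ds = '2016-06-01';
-- then pull the files from HDFS (e.g. hdfs dfs -get) instead of streaming
-- rows over JDBC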
> On 20 Jun 2016, at 17:28, Jörn Franke wrote:
>
> Hello,
>
> F
you saying that the reference command line interface
> is not efficiently implemented? :)
>
> -David Nies
>
>> On 20.06.2016 at 17:46, Jörn Franke wrote:
>>
>> Aside from this the low network performance could also stem from the Java
>> application receiv
I recommend rethinking it as part of a bulk transfer, potentially even using
separate partitions. It will be much faster.
> On 21 Jun 2016, at 13:22, raj hive wrote:
>
> Hi friends,
>
> INSERT,UPDATE,DELETE commands are working fine in my Hive environment after
> changing the configuration an
Marcin is correct: either split up the gzip files into smaller files of at least
one HDFS block or use bzip2 with block compression.
What is the original format of the table?
> On 22 Jun 2016, at 01:50, Marcin Tustin wrote:
>
> This is because a GZ file is not splittable at all. Basically, try