Hello,
Is this not the forum for this type of question? Is there another
forum someone recommends?
Thanks,
Ajo.
On Tue, Aug 2, 2011 at 9:35 AM, Ajo Fod wrote:
> Hello Hive Gurus,
>
> I am not sure if my system is using the sorting feature.
>
> In summary:
> - I expected to save time on the sorting step because I was using
> pre-sorted data, but the query plan seems to indicate an intermediate
> sorting step.
Hello Hive Gurus,
I am not sure if my system is using the sorting feature.
In summary:
- I expected to save time on the sorting step because I was using
pre-sorted data, but the query plan seems to indicate an intermediate
sorting step.
=== The Setup ===
> *To:* user@hive.apache.org
> *Sent:* Tue, 8 March, 2011 12:04:22 PM
>
> *Subject:* Re: Hive too slow?
>
> Thank you all for the tips. I'll dig into all these and let you people know
> :)
>
> --
> *From:* Igor Tatarinov
> *To:* u
tes is indeed not normal behaviour. Could you point me at some places
> where I can get some info on how to tune this up?
>
> Regards,
> Abhishek
>
> ------
> *From:* Ajo Fod
> *To:* user@hive.apache.org
> *Sent:* Mon, 7 March, 2011 9:21:51 PM
> *Subject
In my experience, Hive is not instantaneous like other DBs, but 4 minutes to
count 2200 rows seems unreasonable.
For comparison, my query of 169k rows on one computer with 4 cores running
at 1 GHz (approx) took 20 seconds.
Cheers,
Ajo.
On Mon, Mar 7, 2011 at 1:19 AM, abhishek pathak <
forever_yours_
The good news is that this is a simple XML section .. and this looks like an
XML read error.
Try copying one of the existing property sections and pasting over
just the name and value strings from the message.
Cheers,
Ajo
On Fri, Mar 4, 2011 at 6:40 AM, Anja Gruenheid wrote:
> Hi!
>
> Wh
lly this is caused by not having the MySQL JDBC driver on the
> classpath (it's not included in Hive by default).
> Just put the mysql jdbc driver in the hive folder under "lib/"
>
> On 03/02/2011 03:15 PM, Ajo Fod wrote:
>
> I've checked the mysql connectio
I found the following errors in hive.log. I even tried adding the underlying
eclipse jars to the classpath:
My classpath variable -
export
CLASSPATH=/usr/share/java/mysql.jar:/usr/lib/eclipse/plugins/org.eclipse.jdt.core_3.5.2.v_981_R35x.jar:/usr/lib/eclipse/plugins
instead of
using 'python2.6 user_id_output.py hbase'
try something like this:
using 'user_id_output.py'
... and a #! line with the location of the python binary.
I think you can include a parameter in the call too, like:
using 'user_id_output.py hbase'
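As a rough sketch of the idea (the column handling is made up for illustration), the script named in the USING clause just reads tab-separated rows on stdin and writes tab-separated rows on stdout, with a #! line so it can be executed directly:

```python
#!/usr/bin/env python
# Hypothetical sketch of a Hive TRANSFORM script: Hive feeds rows as
# tab-separated lines on stdin and reads tab-separated lines from stdout.
import sys

def transform(line):
    # Example transformation only: upper-case the first column, keep the rest.
    fields = line.rstrip("\n").split("\t")
    fields[0] = fields[0].upper()
    return "\t".join(fields)

if __name__ == "__main__":
    for line in sys.stdin:
        print(transform(line))
```

Remember to make the script executable (chmod +x) and add it to the job with ADD FILE so the cluster nodes can find it.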
Cheers,
Ajo.
On Tue, Mar 1, 2011 at 8:22
I know this type of call would give you a subset of the table ... also
I think you can use a GROUP BY clause to get it for groups of data.
> SELECT PERCENTILE(val, 0.5) FROM pct_test WHERE val > 100;
Couldn't you use this call a few times to get the value at each
percentile?
I think
What I normally do is use python in a map phase for this sort of stuff ...
if a UDF is not available. Any scripting language would do the trick as
well.
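For instance, a percentile helper inside such a Python script might look like this (nearest-rank convention, which is only one of several percentile definitions):

```python
def percentile(values, p):
    # Nearest-rank percentile for p in [0, 1]; assumes values is non-empty.
    vs = sorted(values)
    k = int(round(p * (len(vs) - 1)))
    return vs[max(0, min(len(vs) - 1, k))]
```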
-Ajo
On Thu, Feb 24, 2011 at 6:27 AM, Bejoy Ks wrote:
> Hi Experts
> Could some one please help me out with this? Any similar situations
elements say 5, won't
> multiple '=' be better?
>
> Regards
> Bejoy KS
>
> ----------
> *From:* Ajo Fod
>
> *To:* user@hive.apache.org
> *Sent:* Mon, February 21, 2011 10:04:41 PM
>
> *Subject:* Re: Database/Schema , INTERVAL and S
wrote:
> Hello,
>
> I did not understand this:
>
> when I do a:
>
> select item_sid, count(*) from item_raw group by item_sid
>
> i get hits per item.
>
> how do we join this to the master table?
>
> best regards,
> -c.b.
>
> On Mon, Feb 21, 2011
On using SQL IN ... what would happen if you created a short table with the
entries in the IN clause and used an "inner join"?
-Ajo
On Mon, Feb 21, 2011 at 7:57 AM, Bejoy Ks wrote:
> Thanks Jov for the quick response
>
> Could you please let me know which is the latest stable version of hive.
You can group by item_sid (drop session_id and ip_number from group by
clause) and then join with the parent table to get session_id and
ip_number.
-Ajo
On Mon, Feb 21, 2011 at 3:07 AM, Cam Bazz wrote:
> Hello,
>
> So I have table of item views with item_sid, ip_number, session_id
>
> I know i
Here is the relevant documentation:
http://wiki.apache.org/hadoop/Hive/LanguageManual
... see the Union section.
Cheers,
Ajo.
On Thu, Feb 17, 2011 at 11:12 PM, sangeetha s wrote:
> Hi,
>
> I am trying to perform union of two tables which are having identical
> schemas and distinct data.There ar
> 2011/2/15 hadoop n00b
>
> Or try the ascii value like "*DELIMITED FIELDS TERMINATED BY '124'*"
>>
>> See if that helps.
>>
>> Cheers!
>>
>> On Mon, Feb 14, 2011 at 9:44 PM, Ajo Fod wrote:
>>
>>> use delimited by "
By the sound of the error ... it sounds like you don't have HiveDriver on
your classpath.
Can you locate the jar that supposedly contains the HiveDriver class?
Cheers,
Ajo
On Wed, Feb 16, 2011 at 2:03 PM, Stuart Scott wrote:
> Hi,
>
>
>
> Does anyone know how to get a Windows client to Connect to Hive
>
similar issue because the hive
>> thrift server as far as i know is single threaded up until hive 0.5.0 (the
>> version that i use). Not too sure if that's been changed in 0.6.0 or higher.
>>
>> -Viral
>>
>> On Fri, Feb 11, 2011 at 7:06 AM, Ajo Fod wrote:
Yes, I've often wondered about asymmetric configurations. Is there a
mechanism for map/reduce jobs to be aware of differences
between the speeds of processors and to allocate less work to the slower
processors?
To try to answer the question here: I have not had much experience with
mul
use delimited by "|" ... are you using this syntax:
Are you saying that the syntax here does not work for you?
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Create_Table
... if you tried this ... could the error be caused by
something else?
Cheers,
-Ajo
On Mon, Feb 14, 20
I'd be surprised if this were not enough.
-Ajo
On Fri, Feb 11, 2011 at 2:51 PM, Cam Bazz wrote:
> Hello,
>
> I sometimes need to delete everything in hdfs and recreate the tables.
>
> The question is: how do I clear everything in the hdfs and hive?
>
> I delete everything in /tmp, hadoop/logs
Are you using Hive 0.6? ... this may be fixed in the latest version.
Also I wonder why these thrift libraries are being used ... is this normal
hive operation, or can you do something to avoid using thrift?
-Ajo
On Fri, Feb 11, 2011 at 12:05 AM, vaibhav negi wrote:
>
> Hi all,
>
> I am loading data
not a problem because eventually the job would complete (super-slow)
>> but it would be nice to know the reason behind this behavior and how I could
>> optimize it so that I can take full advantage of having multiple reducers
>> running.
>>
>> -Viral
>>
I've had similar experiences ... usually with bucketing.
Is this your experience too?
-Ajo
On Thu, Feb 10, 2011 at 1:57 PM, Viral Bajaria wrote:
> Hello,
>
> In my Hive cluster, I have setup the mapred.reduce.tasks to be -1 i.e. I am
> allowing HIVE to figure out the # of reducers that it would
Have you tried constructing the table as a text file?
Use the following at the end of the "CREATE TABLE" statement:
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
It might just be that the sequence file format writes some header
information even if there is no data.
Cheers,
Ajo.
On Wed, Feb
ata is in HDFS by means of
>
> INSERT OVERWRITE TABLE tablename_new SELECT * FROM tablename ... (kind of)
>
> So those LOCAL tables are kind of temporary.
>
> Amlan
>
>
> On Tue, Feb 1, 2011 at 6:51 PM, Ajo Fod wrote:
> >
> > Look up for local :
> > http
Look up for local :
http://wiki.apache.org/hadoop/Hive/GettingStarted
-Ajo.
On Tue, Feb 1, 2011 at 3:15 AM, Amlan Mandal wrote:
> Hi All,
> I am a hive newbie.
>
> LOAD DATA *LOCAL* INPATH 'filepath' [OVERWRITE] INTO TABLE tablename
>
> When I use LOCAL keyword does hive create a hdfs file for
I am new to hive and hadoop and I got the packaged version from Cloudera.
So, personally, I'd be happy if the new package is mutually consistent.
-Ajo
On Mon, Jan 31, 2011 at 5:14 PM, Carl Steinbach wrote:
> Hi,
>
> I'm trying to get an idea of how many people plan on running Hive
> 0.7.0 on to
I think there is a developer mailing list ... that is probably the best
place for this question.
Also, I think there is a cost-based query optimizer in the works somewhere.
-Ajo
On Mon, Jan 31, 2011 at 2:04 PM, Anja Gruenheid
wrote:
> Hi!
>
> I'm a graduate student from Georgia Tech and I'm wor
I've noticed that it takes a while for each map task to be set up in Hive ...
and the way I set up the job, I noticed that there were as many maps as
files/buckets.
I read a recommendation somewhere to design jobs such that they take at
least a minute.
Cheers,
-Ajo.
On Mon, Jan 31, 2011 at 8:08 AM
Seems like you are using a MySQL metadata store ... do you have write
permissions on the store? ... can you create another table?
If not, perhaps you can try with the plain vanilla metastore and see if the
problem persists.
-Ajo.
On Fri, Jan 28, 2011 at 2:31 AM, lei liu wrote:
> When I execu
Any chance you can convert the data to a tab separated text file and try the
same query?
It may not be the SerDe, but it may be good to isolate that away as a
potential source of the problem.
-Ajo.
On Wed, Jan 26, 2011 at 5:47 PM, Christopher, Pat <
patrick.christop...@hp.com> wrote:
> Hi,
>
>
Have you tried using external tables?
BTW, hive tables can be defined as text tables, so you can run mapreduce on
them too. Just locate the tables under directory:
/user/hive/warehouse/
Cheers,
-Ajo.
2011/1/26 母延年YNM
> When I use load data into table like this ,
>
> LOAD DATA INPATH '/user/myn
an Coveney wrote:
> Yes, I tried that, it looks like it forces it to 1 if there are no groups.
>
> 2011/1/24 Ajo Fod
>
> oh ... sorry you say you already tried that.
>>
>>
>>
>> On Mon, Jan 24, 2011 at 1:54 PM, Ajo Fod wrote:
>> > you could try to set
ied that, it looks like it forces it to 1 if there are no groups.
>
> 2011/1/24 Ajo Fod
>>
>> oh ... sorry you say you already tried that.
>>
>>
>>
>> On Mon, Jan 24, 2011 at 1:54 PM, Ajo Fod wrote:
>> > you could try to set the number of reduc
oh ... sorry you say you already tried that.
On Mon, Jan 24, 2011 at 1:54 PM, Ajo Fod wrote:
> you could try to set the number of reducers e.g:
> set mapred.reduce.tasks=4;
>
> set this before doing the select.
>
> -Ajo
>
> On Mon, Jan 24, 2011 at 1:13 PM, Jonathan Cov
you could try to set the number of reducers e.g:
set mapred.reduce.tasks=4;
set this before doing the select.
-Ajo
On Mon, Jan 24, 2011 at 1:13 PM, Jonathan Coveney wrote:
> I have a 10 node server or so, and have been mainly using pig on it, but
> would like to try out Hive.
> I am running thi
consider a table that looks like:
t1 12
t2 10
t3 -20
with cumsum, I'd like an output that looks like
t1 12
t2 22
t3 2
with diff, I'd like something that looks like
t1 12
t2 -2
t3 -30
Any comments on how one would go about these problems best in the hive
framework?
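For what it's worth, here is a minimal Python sketch of the two operations over the sample rows, assuming the rows arrive already sorted by key (note that the second diff value works out to 10 - 12 = -2):

```python
def cumsum(rows):
    # rows: list of (key, value) pairs in order; returns running totals.
    total, out = 0, []
    for key, value in rows:
        total += value
        out.append((key, total))
    return out

def diff(rows):
    # Keep the first row; each later value is the change from the previous row.
    out = [rows[0]]
    for (_, prev), (key, value) in zip(rows, rows[1:]):
        out.append((key, value - prev))
    return out

rows = [("t1", 12), ("t2", 10), ("t3", -20)]
```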
-Ajo
For a tab-separated file, I think it is the null string ... i.e. no
characters. So, for example
12\ta\t\t2
1\tb\ta\t1
read as
12  a  <empty>  2
1   b  a        1
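A quick way to see the empty field, sketched in Python (illustrative only):

```python
# Splitting the two example lines on tabs shows the third field of the
# first row is the empty string -- the "null string" referred to above.
rows = ["12\ta\t\t2", "1\tb\ta\t1"]
fields = [r.split("\t") for r in rows]
print(fields)  # [['12', 'a', '', '2'], ['1', 'b', 'a', '1']]
```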
On Fri, Jan 21, 2011 at 1:09 AM, lei liu wrote:
> I generate HDFS file , then I load the file to one hive table. There
It probably depends on how big the big table is ... I mean if it can
be held in memory.
-Ajo
On Wed, Jan 19, 2011 at 11:23 PM, hadoop n00b wrote:
> Thanks Leo,
>
> Does the smaller table go into the mapjoin hint? Actually, when I ran a test
> query with the bigger table in the hint, it performed
is not compressed.
>
>
> -Original Message-
> From: Ajo Fod [mailto:ajo@gmail.com]
> Sent: Tuesday, January 18, 2011 8:46 AM
> To: user@hive.apache.org
> Subject: Re: On compressed storage : why are sequence files bigger than text
> files?
>
> I tried 10
Ah! ok.
Thanks.
-Ajo.
On Wed, Jan 19, 2011 at 9:03 AM, Ping Zhu wrote:
> I think only Hive 0.7 or later accepts the syntax drop table if exists.
> http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Drop_Table
>
>
> On Wed, Jan 19, 2011 at 8:54 AM, Ajo Fod wrote:
>>
Wed, Jan 19, 2011 at 8:04 AM, Edward Capriolo wrote:
> On Wed, Jan 19, 2011 at 10:46 AM, Ajo Fod wrote:
>> I've 2 questions:
>> 1) how to raise the number of reducers?
>> 2) why are there only 2 bucket files per partition even though I
>> specified 32 buckets?
>
I don't think this works.
>> drop table if exists ;
... it seems to fail on the if exists part.
Is anyone's experience different? ... I'm using CDH3 ... Hive 0.5.0.
-Ajo
I've 2 questions:
1) how to raise the number of reducers?
2) why are there only 2 bucket files per partition even though I
specified 32 buckets?
I've set the following and don't see an increase in the number of reducers.
>>set hive.exec.reducers.max=32;
>>set mapred.reduce.tasks=32;
Could this b
Can you try this with a dummy table with very few rows ... to see if
the reason the script doesn't finish is a computational issue?
One other thing is to try with a combined partition, to see if it is a
problem with the partitioning.
Also, take a look at the results of an EXPLAIN statement, see
:
> On Tue, Jan 18, 2011 at 10:25 AM, Ajo Fod wrote:
>> I tried with the gzip compression codec. BTW, what do you think of
>> bz2, I've read that it is possible to split as input to different
>> mappers ... is there a catch?
>>
>> Here are my flags now ... o
..
as earlier ... BTW, it takes 32 seconds to complete.
The sequence files are now stored in 2 files totaling 244 MB ... that takes
about 84 seconds.
... mind you, the original was one file of 132 MB.
Cheers,
-Ajo
On Tue, Jan 18, 2011 at 6:36 AM, Edward Capriolo wrote:
> On Tue, Jan 18, 2011 at 9:07 AM, Ajo Fo
Hello,
My questions in short are:
- why are sequence files bigger than text files (considering that they
are binary)?
- It looks like compression does not make for a smaller sequence file
than the original text file.
-- here is sample data that is transferred into the tables below with
an INSERT O
Hello,
In the documentation I read that as many files are created in each
partition as there are buckets. In the following sample script, I
created 32 buckets, but only find 2 files in each partition directory.
Am I missing something?
In this sample script, I'm trying to load a tab separated fil