Re: Question about sorted tables

2011-08-03 Thread Ajo Fod
Hello, Is this not the forum for this type of question? Is there another forum someone recommends? Thanks, Ajo. On Tue, Aug 2, 2011 at 9:35 AM, Ajo Fod wrote: > Hello Hive Gurus, > > I am not sure if my system is using the sorting feature. > > In summary: > - I expected t

Question about sorted tables

2011-08-02 Thread Ajo Fod
Hello Hive Gurus, I am not sure if my system is using the sorting feature. In summary: - I expected to save time on the sorting step because I was using pre-sorted data, but the query plan seems to indicate an intermediate sorting step. === The Setup ===

Re: Hive too slow?

2011-03-11 Thread Ajo Fod
> > *To:* user@hive.apache.org > *Sent:* Tue, 8 March, 2011 12:04:22 PM > > *Subject:* Re: Hive too slow? > > Thank you all for the tips. I'll dig into all these and let you people know > :) > > -- > *From:* Igor Tatarinov > *To:* u

Re: Hive too slow?

2011-03-07 Thread Ajo Fod
tes is indeed not normal behaviour. Could you point me at some places > where I can get some info on how to tune this up? > > Regards, > Abhishek > > ------ > *From:* Ajo Fod > *To:* user@hive.apache.org > *Sent:* Mon, 7 March, 2011 9:21:51 PM > *Subject

Re: Hive too slow?

2011-03-07 Thread Ajo Fod
In my experience, Hive is not instantaneous like other DBs, but 4 minutes to count 2200 rows seems unreasonable. For comparison, my query of 169k rows on one computer with 4 cores running at 1 GHz (approx.) took 20 seconds. Cheers, Ajo. On Mon, Mar 7, 2011 at 1:19 AM, abhishek pathak < forever_yours_

Re: Stats Gathering Problems

2011-03-04 Thread Ajo Fod
The good news is that this is a simple XML section ... and this looks like an XML read error. Try copying one of the existing property sections and pasting in just the name and value strings from the message. Cheers, Ajo On Fri, Mar 4, 2011 at 6:40 AM, Anja Gruenheid wrote: > Hi! > > Wh

Re: Trouble using mysql metastore

2011-03-02 Thread Ajo Fod
lly this is caused by not having the mysql jdbc driver on the > classpath (it's not default included in hive). > Just put the mysql jdbc driver in the hive folder under "lib/" > > On 03/02/2011 03:15 PM, Ajo Fod wrote: > > I've checked the mysql connectio

eclipse related problems with hive

2011-03-01 Thread Ajo Fod
I found the following errors in hive.log. I even tried adding the underlying eclipse jars to the classpath: My classpath variable - export CLASSPATH=/usr/share/java/mysql.jar:/usr/lib/eclipse/plugins/org.eclipse.jdt.core_3.5.2.v_981_R35x.jar:/usr/lib/eclipse/plugins

Re: cannot start the transform script. reason : "argument list too long"

2011-03-01 Thread Ajo Fod
Instead of using 'python2.6 user_id_output.py hbase', try something like this: using 'user_id_output.py' ... and a #! line with the location of the python binary. I think you can include a parameter in the call too, like: using 'user_id_output.py hbase' Cheers, Ajo. On Tue, Mar 1, 2011 at 8:22
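
The suggestion above (call the script directly and let a #! line pick the interpreter) can be sketched roughly as follows. This is a hypothetical stand-in for user_id_output.py, not the poster's script: the interpreter path, the tab-separated row format, and what the script does with the 'hbase' parameter are all assumptions.

```python
#!/usr/bin/env python
# Hypothetical stand-in for user_id_output.py. The #! line above lets the
# query call the script directly instead of naming the python binary.
# The interpreter location is an assumption; adjust it for your nodes.
import sys


def transform(lines, mode):
    """Yield one tab-separated output row per input row, tagged with `mode`."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        yield "\t".join(fields + [mode])


# Only stream when input is piped in, as it is when a query runs the script.
if __name__ == "__main__" and not sys.stdin.isatty():
    # The parameter from the call (e.g. 'hbase') arrives as sys.argv[1].
    mode = sys.argv[1] if len(sys.argv) > 1 else "default"
    for row in transform(sys.stdin, mode):
        print(row)
```

The script file would also need to be executable (chmod +x) for the direct call to work.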

Re: Percent Rank Calculation

2011-03-01 Thread Ajo Fod
I know this type of call would give you the percentile for a subset of the table ... also I think you can use a GROUP BY clause to get it for groups of data. > SELECT PERCENTILE(val, 0.5) FROM pct_test WHERE val > 100; Couldn't you use this call a few times to get the value for each percentile? I think

Re: How to add hours/minutes to a timestamp column in Hive Query

2011-02-24 Thread Ajo Fod
What I normally do is use python in a map phase for this sort of stuff ... if a UDF is not available. Any scripting language would do the trick as well. -Ajo On Thu, Feb 24, 2011 at 6:27 AM, Bejoy Ks wrote: > Hi Experts > Could some one please help me out with this? Any similar situations
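
A minimal sketch of the map-phase approach mentioned above, assuming tab-separated rows and timestamps laid out as YYYY-MM-DD HH:MM:SS. The function names, the column position, and the timestamp format are illustrative assumptions, not taken from the thread.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout; change it to match the actual column.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def shift_timestamp(ts, hours=0, minutes=0):
    """Return `ts` moved forward by the given hours and minutes."""
    parsed = datetime.strptime(ts, TS_FORMAT)
    shifted = parsed + timedelta(hours=hours, minutes=minutes)
    return shifted.strftime(TS_FORMAT)


def map_rows(lines, hours, minutes, column=0):
    """Shift the timestamp in `column` of each tab-separated row."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        fields[column] = shift_timestamp(fields[column], hours, minutes)
        yield "\t".join(fields)
```

In a streaming script, map_rows would be fed sys.stdin and its rows printed to standard output.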

Re: Database/Schema , INTERVAL and SQL IN usages in Hive

2011-02-23 Thread Ajo Fod
elements say 5 won't > multiple '=' be better? > > Regards > Bejoy KS > > ---------- > *From:* Ajo Fod > > *To:* user@hive.apache.org > *Sent:* Mon, February 21, 2011 10:04:41 PM > > *Subject:* Re: Database/Schema , INTERVAL and S

Re: calculating unique views based on ip, session_id

2011-02-21 Thread Ajo Fod
wrote: > Hello, > > I did not understand this: > > when I do a: > > select item_sid, count(*) from item_raw group by item_sid > > i get hits per item. > > how do we join this to the master table? > > best regards, > -c.b. > > On Mon, Feb 21, 2011

Re: Database/Schema , INTERVAL and SQL IN usages in Hive

2011-02-21 Thread Ajo Fod
On using SQL IN ... what would happen if you created a short table with the entries in the IN clause and used an "inner join"? -Ajo On Mon, Feb 21, 2011 at 7:57 AM, Bejoy Ks wrote: > Thanks Jov for the quick response > > Could you please let me know which is the latest stable version of hive.
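
The idea of replacing IN with a join against a short lookup table can be tried out on a small scale. The sketch below uses Python's built-in sqlite3 module rather than Hive, purely to show that the two queries select the same rows; the table and column names (events, wanted, category) are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical data table.
conn.execute("CREATE TABLE events (id INTEGER, category TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (4, "a")],
)

# Short table holding the entries that would have gone in the IN clause.
conn.execute("CREATE TABLE wanted (category TEXT)")
conn.executemany("INSERT INTO wanted VALUES (?)", [("a",), ("c",)])

with_in = conn.execute(
    "SELECT id FROM events WHERE category IN ('a', 'c') ORDER BY id"
).fetchall()

with_join = conn.execute(
    "SELECT e.id FROM events e "
    "INNER JOIN wanted w ON e.category = w.category "
    "ORDER BY e.id"
).fetchall()

print(with_in == with_join)
```

The equivalence holds only while the short table has no duplicate entries; a repeated value would repeat the matching rows in the join result.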

Re: calculating unique views based on ip, session_id

2011-02-21 Thread Ajo Fod
You can group by item_sid (drop session_id and ip_number from group by clause) and then join with the parent table to get session_id and ip_number. -Ajo On Mon, Feb 21, 2011 at 3:07 AM, Cam Bazz wrote: > Hello, > > So I have table of item views with item_sid, ip_number, session_id > > I know i
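
The group-then-join suggestion above can be sketched on toy data. This uses Python's built-in sqlite3 module instead of Hive, so it only illustrates the query shape; the table name item_raw and the columns come from the thread, while the sample rows are invented.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE item_raw (item_sid INTEGER, ip_number TEXT, session_id TEXT)")
db.executemany(
    "INSERT INTO item_raw VALUES (?, ?, ?)",
    [(10, "1.1.1.1", "s1"), (10, "1.1.1.1", "s2"), (20, "2.2.2.2", "s3")],
)

# Count hits per item in a subquery, then join back to the parent table
# to recover session_id and ip_number alongside each count.
hits_with_details = db.execute(
    """
    SELECT r.item_sid, r.ip_number, r.session_id, h.hits
    FROM item_raw r
    JOIN (SELECT item_sid, COUNT(*) AS hits
          FROM item_raw
          GROUP BY item_sid) h
      ON r.item_sid = h.item_sid
    ORDER BY r.item_sid, r.session_id
    """
).fetchall()
```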

Re: problem while performing union on twotables

2011-02-18 Thread Ajo Fod
Here is the relevant documentation: http://wiki.apache.org/hadoop/Hive/LanguageManual ... see the Union section. Cheers, Ajo. On Thu, Feb 17, 2011 at 11:12 PM, sangeetha s wrote: > Hi, > > I am trying to perform union of two tables which are having identical > schemas and distinct data. There ar

Re: Importing a file wich includes delimiter like into HIVE

2011-02-17 Thread Ajo Fod
> 2011/2/15 hadoop n00b > > Or try the ASCII value like "*DELIMITED FIELDS TERMINATED BY '124'*" >> >> See if that helps. >> >> Cheers! >> >> On Mon, Feb 14, 2011 at 9:44 PM, Ajo Fod wrote: >> >>> use delimited by "

Re: FW: Connecting to hive

2011-02-16 Thread Ajo Fod
By the sound of the error, it seems you don't have HiveDriver in your path. Can you locate the class that supposedly has the HiveDriver class? Cheers, Ajo On Wed, Feb 16, 2011 at 2:03 PM, Stuart Scott wrote: > Hi, > > > > Does anyone know how to get a Windows client to Connect to Hive >

Re: Hive Server - Transport error occurred during acceptance of message

2011-02-15 Thread Ajo Fod
similar issue because the hive >> thrift server as far as i know is single threaded up until hive 0.5.0 (the >> version that i use). Not too sure if that's been changed in 0.6.0 or higher. >> >> -Viral >> >> On Fri, Feb 11, 2011 at 7:06 AM, Ajo Fod wrote:

Re: how far can I go with a 1 node cluster

2011-02-14 Thread Ajo Fod
Yes, I've often wondered about asymmetric configurations. Is there a mechanism that lets map/reduce jobs be aware of differences in processor speeds and allocate less work to the slower processors? To try to answer the question here: I have not had much experience with mul

Re: Importing a file wich includes delimiter like into HIVE

2011-02-14 Thread Ajo Fod
Use delimited by "|" ... are you using this syntax? Are you saying that the syntax here does not work for you? http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Create_Table ... if you tried this ... could it be that the error is caused by something else? Cheers, -Ajo On Mon, Feb 14, 20

Re: reset hive and hadoop

2011-02-11 Thread Ajo Fod
I'd be surprised if this were not enough. -Ajo On Fri, Feb 11, 2011 at 2:51 PM, Cam Bazz wrote: > Hello, > > I sometimes need to delete everything in hdfs and recreate the tables. > > The question is: how do I clear everything in the hdfs and hive? > > I delete everything in /tmp, hadoop/logs

Re: Hive Server - Transport error occurred during acceptance of message

2011-02-11 Thread Ajo Fod
Are you using hive 0.6? ... may be fixed in the latest version. Also I wonder why these thrift libraries are being used ... is this normal hive operation, or can you do something to avoid using thrift? -Ajo On Fri, Feb 11, 2011 at 12:05 AM, vaibhav negi wrote: > > Hi all, > > I am loading data

Re: hive : question about reducers

2011-02-10 Thread Ajo Fod
not a problem because eventually the job would complete (super-slow) >> but it would be nice to know the reason behind this behavior and how I could >> optimize it so that I can take full advantage of having multiple reducers >> running. >> >> -Viral >> >>

Re: hive : question about reducers

2011-02-10 Thread Ajo Fod
I've had similar experiences ... usually with bucketing. Is this your experience too? -Ajo On Thu, Feb 10, 2011 at 1:57 PM, Viral Bajaria wrote: > Hello, > > In my Hive cluster, I have setup the mapred.reduce.tasks to be -1 i.e. I am > allowing HIVE to figure out the # of reducers that it would

Re: query returns sometext instead of none

2011-02-10 Thread Ajo Fod
Have you tried constructing the table as a text file? Use the following at the end of the "CREATE TABLE" statement: ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE; It might just be that a sequencefile puts in some information even if there is no data. Cheers, Ajo. On Wed, Feb

Re: Loading files into tables

2011-02-01 Thread Ajo Fod
ata is in HDFS by means of > > INSERT OVERWRITE TABLE tablename_new SELECT * FROM tablename ... (kind of) > > So those LOCAL tables are kind of temporary. > > Amlan > > > On Tue, Feb 1, 2011 at 6:51 PM, Ajo Fod wrote: > > > > Look up for local : > > http

Re: Loading files into tables

2011-02-01 Thread Ajo Fod
Look up for local : http://wiki.apache.org/hadoop/Hive/GettingStarted -Ajo. On Tue, Feb 1, 2011 at 3:15 AM, Amlan Mandal wrote: > Hi All, > I am a hive newbie. > > LOAD DATA *LOCAL* INPATH 'filepath' [OVERWRITE] INTO TABLE tablename > > When I use LOCAL keyword does hive create a hdfs file for

Re: Please read if you plan to use Hive 0.7.0 on Hadoop 0.20.0

2011-01-31 Thread Ajo Fod
I am new to hive and hadoop and I got the packaged version from Cloudera. So, personally, I'd be happy if the new package is mutually consistent. -Ajo On Mon, Jan 31, 2011 at 5:14 PM, Carl Steinbach wrote: > Hi, > > I'm trying to get an idea of how many people plan on running Hive > 0.7.0 on to

Re: Query Optimization in Hive

2011-01-31 Thread Ajo Fod
I think there is a developer mailing list ... that is probably the best place for this question. Also, I think there is a cost-based query optimizer in the works somewhere. -Ajo On Mon, Jan 31, 2011 at 2:04 PM, Anja Gruenheid wrote: > Hi! > > I'm a graduate student from Georgia Tech and I'm wor

Re: small files with hive and hadoop

2011-01-31 Thread Ajo Fod
I've noticed that it takes a while for each map job to be set up in hive ... and the way I set up the job I noticed that there were as many maps as files/buckets. I read a recommendation somewhere to design jobs such that they take at least a minute. Cheers, -Ajo. On Mon, Jan 31, 2011 at 8:08 AM

Re: create table error

2011-01-28 Thread Ajo Fod
Seems like you are using a MySQL metadata store ... do you have write permissions on the store? ... can you create another table? If not, perhaps you can try with the plain vanilla metastore and see if the problem persists. -Ajo. On Fri, Jan 28, 2011 at 2:31 AM, lei liu wrote: > When I execu

Re: Hive Error on medium sized dataset

2011-01-26 Thread Ajo Fod
Any chance you can convert the data to a tab separated text file and try the same query? It may not be the SerDe, but it may be good to isolate that away as a potential source of the problem. -Ajo. On Wed, Jan 26, 2011 at 5:47 PM, Christopher, Pat < patrick.christop...@hp.com> wrote: > Hi, > >

Re: Is there any method can not move array the raw data.

2011-01-26 Thread Ajo Fod
Have you tried using external tables? BTW, hive tables can be defined as text tables, so you can run mapreduce on them too. Just locate the tables under directory: /user/hive/warehouse/ Cheers, -Ajo. 2011/1/26 母延年YNM > When I use load data into table like this , > > LOAD DATA INPATH '/user/myn

Re: Is there a reason why this simple query would take a very long time?

2011-01-25 Thread Ajo Fod
an Coveney wrote: > Yes, I tried that, it looks like it forces it to 1 if there are no groups. > > 2011/1/24 Ajo Fod > > oh ... sorry you say you already tried that. >> >> >> >> On Mon, Jan 24, 2011 at 1:54 PM, Ajo Fod wrote: >> > you could try to set

Re: Is there a reason why this simple query would take a very long time?

2011-01-24 Thread Ajo Fod
ied that, it looks like it forces it to 1 if there are no groups. > > 2011/1/24 Ajo Fod >> >> oh ... sorry  you say you already tried that. >> >> >> >> On Mon, Jan 24, 2011 at 1:54 PM, Ajo Fod wrote: >> > you could try to set the number of reduc

Re: Is there a reason why this simple query would take a very long time?

2011-01-24 Thread Ajo Fod
oh ... sorry you say you already tried that. On Mon, Jan 24, 2011 at 1:54 PM, Ajo Fod wrote: > you could try to set the number of reducers e.g: > set mapred.reduce.tasks=4; > > set this before doing the select. > > -Ajo > > On Mon, Jan 24, 2011 at 1:13 PM, Jonathan Cov

Re: Is there a reason why this simple query would take a very long time?

2011-01-24 Thread Ajo Fod
You could try to set the number of reducers, e.g.: set mapred.reduce.tasks=4; Set this before doing the select. -Ajo On Mon, Jan 24, 2011 at 1:13 PM, Jonathan Coveney wrote: > I have a 10 node server or so, and have been mainly using pig on it, but > would like to try out Hive. > I am running thi

how to get cumulative sum and diff of a sequence of numbers

2011-01-21 Thread Ajo Fod
consider a table that looks like: t1 12 t2 10 t3 -20 with cumsum, I'd like an output that looks like t1 12 t2 22 t3 2 with diff, I'd like something that looks like t1 12 t2 2 t3 -30 Any comments on how one would go about these problems best in the hive framework? -Ajo
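
One way to pin down what is being asked is to write the two operations in plain Python, which could then run as a streaming script over rows that arrive in order. This is only a sketch of the arithmetic, not a Hive answer, and it assumes all rows pass through a single process in sequence. Note that with the usual definition of diff (each value minus the previous one), t2 comes out as -2, whereas the post lists 2.

```python
def cumsum(values):
    """Running total: element i is the sum of values[0..i]."""
    total = 0
    out = []
    for v in values:
        total += v
        out.append(total)
    return out


def diff(values):
    """First difference: the first value is kept, then values[i] - values[i-1]."""
    out = []
    previous = None
    for v in values:
        out.append(v if previous is None else v - previous)
        previous = v
    return out


sample = [12, 10, -20]   # the t1, t2, t3 values from the post
print(cumsum(sample))    # [12, 22, 2]
print(diff(sample))      # [12, -2, -30]
```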

Re: what char represents NULL value in hive?

2011-01-21 Thread Ajo Fod
For a tab separated file, I think it is the null string ... i.e. no characters. So, for example 12\ta\t\t2 1\tb\ta\t1 reads as the rows: 12 | a | (empty) | 2 and 1 | b | a | 1 On Fri, Jan 21, 2011 at 1:09 AM, lei liu wrote: > I generate HDFS file , then I load the file to one hive table. There
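
The reading described above (an empty field between two tabs stands for a missing value) can be illustrated with a few lines of Python. This only mirrors the post's description; how Hive itself treats an empty field may depend on the column type and SerDe settings, so take the mapping to None as an assumption.

```python
def parse_row(line):
    """Split a tab-separated line, mapping empty fields to None."""
    fields = line.rstrip("\n").split("\t")
    return [field if field != "" else None for field in fields]


print(parse_row("12\ta\t\t2"))   # ['12', 'a', None, '2']
print(parse_row("1\tb\ta\t1"))   # ['1', 'b', 'a', '1']
```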

Re: Mapjoin Usage Question

2011-01-20 Thread Ajo Fod
It probably depends on how big the big table is ... I mean if it can be held in memory. -Ajo On Wed, Jan 19, 2011 at 11:23 PM, hadoop n00b wrote: > Thanks Leo, > > Does the smaller table go into the mapjoin hint? Actually, when I ran a test > query with the bigger table in the hint, it performed

Re: On compressed storage : why are sequence files bigger than text files?

2011-01-19 Thread Ajo Fod
is not compressed. > > > -Original Message- > From: Ajo Fod [mailto:ajo@gmail.com] > Sent: Tuesday, January 18, 2011 8:46 AM > To: user@hive.apache.org > Subject: Re: On compressed storage : why are sequence files bigger than text > files? > > I tried 10

Re: Drop table if exists

2011-01-19 Thread Ajo Fod
Ah! ok. Thanks. -Ajo. On Wed, Jan 19, 2011 at 9:03 AM, Ping Zhu wrote: > I think only Hive 0.7 or later accepts the syntax drop table if exists. > http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Drop_Table > > > On Wed, Jan 19, 2011 at 8:54 AM, Ajo Fod wrote: >> >>

Re: how do I use multiple reducers in hive?

2011-01-19 Thread Ajo Fod
Wed, Jan 19, 2011 at 8:04 AM, Edward Capriolo wrote: > On Wed, Jan 19, 2011 at 10:46 AM, Ajo Fod wrote: >> I've 2 questions: >> 1) how to raise the number of reducers? >> 2) why are there only 2 bucket files per partition even though I >> specified 32 buckets? >

Drop table if exists

2011-01-19 Thread Ajo Fod
I don't think this works. >> drop table if exists ; ... it seems to fail on the if exists part. Is anyone's experience different ?... I'm using CDH3 ... Hive 0.5.0. -Ajo

how do I use multiple reducers in hive?

2011-01-19 Thread Ajo Fod
I've 2 questions: 1) how to raise the number of reducers? 2) why are there only 2 bucket files per partition even though I specified 32 buckets? I've set the following and don't see an increase in the number of reducers. >>set hive.exec.reducers.max=32; >>set mapred.reduce.tasks=32; Could this b

Re: partitioned column join does not work as expected

2011-01-18 Thread Ajo Fod
Can you try this with a dummy table with very few rows ... to see if the reason the script doesn't finish is a computational issue? One other thing is to try with a combined partition, to see if it is a problem with the partitioning. Also, take a look at the results of an EXPLAIN statement, see

Re: On compressed storage : why are sequence files bigger than text files?

2011-01-18 Thread Ajo Fod
: > On Tue, Jan 18, 2011 at 10:25 AM, Ajo Fod wrote: >> I tried with the gzip compression codec. BTW, what do you think of >> bz2, I've read that it is possible to split as input to different >> mappers ... is there a catch? >> >> Here are my flags now ... o

Re: On compressed storage : why are sequence files bigger than text files?

2011-01-18 Thread Ajo Fod
.. as earlier ... BTW, takes 32sec to complete. sequence files are now stored in (2 files) totaling 244MB ... takes about 84 seconds. .. mind you the original was one file with 132MB. Cheers, -Ajo On Tue, Jan 18, 2011 at 6:36 AM, Edward Capriolo wrote: > On Tue, Jan 18, 2011 at 9:07 AM, Ajo Fo

On compressed storage : why are sequence files bigger than text files?

2011-01-18 Thread Ajo Fod
Hello, My questions in short are: - why are sequencefiles bigger than textfiles (considering that they are binary)? - It looks like compression does not make for a smaller sequence file than the original text file. -- here is sample data that is transferred into the tables below with an INSERT O

On bucketing : fewer files than buckets.

2011-01-17 Thread Ajo Fod
Hello, In the documentation I read that as many files are created in each partition as there are buckets. In the following sample script, I created 32 buckets, but only find 2 files in each partition directory. Am I missing something? In this sample script, I'm trying to load a tab separated fil