I want to load data into a table from Nutch segments data but....

2011-03-17 Thread 徐厚道
Good morning, everybody. I want to load data into a table from Nutch segments data, but I don't understand the Python script in the wiki GettingStarted page. What does "for line in sys.stdin" mean? Does it mean the cell value? Or just a
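For context on the question above: in a Hive streaming/TRANSFORM script, "for line in sys.stdin" iterates over whole input rows, not single cell values — Hive writes each row to the script's standard input as one tab-delimited line. A minimal sketch of such a script (the pass-through behavior here is illustrative, not the script from the wiki):

```python
#!/usr/bin/env python
# Minimal Hive TRANSFORM script sketch. Hive feeds each input row to the
# script as one tab-delimited line on stdin, so `for line in sys.stdin`
# iterates over rows, not individual cell values.
import sys

def parse_row(line):
    # One stdin line -> list of column values for that row.
    return line.rstrip("\n").split("\t")

if __name__ == "__main__":
    for line in sys.stdin:
        columns = parse_row(line)
        # Emit an output row, again tab-delimited, one row per line.
        print("\t".join(columns))
```

Each line the script prints becomes one output row back in Hive, split on tabs into columns.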

Re: Building Custom RCFiles

2011-03-17 Thread yongqiang he
Yes. It is the same as with normal Hive tables. Thanks, Yongqiang. On Thu, Mar 17, 2011 at 4:54 PM, Severance, Steve wrote: > Thanks Yongqiang. So for more complex types like map, do I just set up a ROW FORMAT DELIMITED KEYS TERMINATED BY '|' etc.? > Thanks, Steve > -Original Messag
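As a sketch of what those delimiters mean at the data level: with ROW FORMAT DELIMITED, a serialized map field is a list of entries separated by the collection-items delimiter, with each entry's key and value separated by the map-keys delimiter ('|' in the DDL quoted above). A hypothetical decode of one such field (delimiters here are assumptions for illustration, not necessarily the table's actual ones):

```python
# Hedged sketch of how a ROW FORMAT DELIMITED map field decodes.
# Assumed delimiters (illustrative only):
#   ',' between map entries   (COLLECTION ITEMS TERMINATED BY)
#   '|' between key and value (KEYS TERMINATED BY, as in the quoted DDL)
def parse_map_field(field, item_sep=",", kv_sep="|"):
    result = {}
    if not field:
        return result
    for item in field.split(item_sep):
        # partition() keeps a value-less key usable instead of raising
        key, _, value = item.partition(kv_sep)
        result[key] = value
    return result
```

For example, the field "a|1,b|2" decodes to a two-entry map with keys "a" and "b".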

RE: Building Custom RCFiles

2011-03-17 Thread Severance, Steve
Thanks Yongqiang. So for more complex types like map, do I just set up a ROW FORMAT DELIMITED KEYS TERMINATED BY '|' etc.? Thanks, Steve. -Original Message- From: yongqiang he [mailto:heyongqiang...@gmail.com] Sent: Thursday, March 17, 2011 4:35 PM To: user@hive.apache.org Subject: Re:

Re: Building Custom RCFiles

2011-03-17 Thread yongqiang he
A side note: in Hive, we save all columns as Text internally (even if the column's type is int or double, etc.). In some experiments, string is more friendly to compression, but it needs CPU to decode back to its original type. Thanks, Yongqiang. On Thu, Mar 17, 2011 at 4:04 PM, yongqiang he wrot
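The compression claim above is easy to probe with a quick experiment. The sketch below (an illustration, with zlib standing in for whatever codec the table actually uses, and a synthetic integer column) compares compression ratios for the same values stored as delimited text versus fixed-width 8-byte binary:

```python
# Rough experiment: compressed size of one integer column stored as
# newline-delimited text vs. fixed-width 8-byte big-endian binary.
# zlib is a stand-in codec; the column is synthetic, so the exact
# ratios are illustrative only.
import struct
import zlib

values = list(range(0, 100000, 7))  # a synthetic integer column

as_text = "\n".join(str(v) for v in values).encode("ascii")
as_binary = b"".join(struct.pack(">q", v) for v in values)

text_ratio = len(zlib.compress(as_text)) / len(as_text)
binary_ratio = len(zlib.compress(as_binary)) / len(as_binary)
print("text ratio:", text_ratio, "binary ratio:", binary_ratio)
```

Which representation wins depends on the data and the codec; the CPU cost mentioned above comes from parsing each text value back to its numeric type at read time.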

Re: Building Custom RCFiles

2011-03-17 Thread yongqiang he
You need to customize Hive's ColumnarSerde's serialize and deserialize functions (maybe also functions in LazySerde), depending on whether you want to read or write. The main thing is that you need to use your own type definitions (not LazyInt/LazyLong). If your type is int or long (not double/float), casting it to string

Building Custom RCFiles

2011-03-17 Thread Severance, Steve
Hi, I am working on building an MR job that generates RCFiles that will become partitions of a Hive table. I have most of it working; however, only strings (Text) are being deserialized inside Hive. The Hive table is specified to use a ColumnarSerde, which I thought should allow the Writable typ

Re: Hadoop error 2 while joining two large tables

2011-03-17 Thread Edward Capriolo
On Thu, Mar 17, 2011 at 10:34 AM, wrote: > Try out CDH3b4; it has Hive 0.7 and the latest of the other Hadoop tools. When you > work with open source, it is definitely good practice to upgrade to the > latest versions. With newer versions, bugs would be fewer, performance > would be better a

Re: Hadoop error 2 while joining two large tables

2011-03-17 Thread bejoy_ks
Try out CDH3b4; it has Hive 0.7 and the latest of the other Hadoop tools. When you work with open source, it is definitely good practice to upgrade to the latest versions. With newer versions, bugs would be fewer, performance would be better, and you get more functionality. Your query looks f

RE: We had this weird behavior

2011-03-17 Thread Guy Doulberg
Okay, thanks. -Original Message- From: Edward Capriolo [mailto:edlinuxg...@gmail.com] Sent: Thursday, March 17, 2011 4:08 PM To: user@hive.apache.org Cc: Guy Doulberg Subject: Re: We had this weird behavior 2011/3/17 Guy Doulberg : > Strings. I actually simplified the scenario so I could

Re: We had this weird behavior

2011-03-17 Thread Edward Capriolo
2011/3/17 Guy Doulberg : > Strings. I actually simplified the scenario so I could ask the question. > Our partitions are actually strings of dates with hour, > so the query was actually > Partition >= '20110301_20' and Partition <= '2011030223'. > Would using single quotes still not be advised? > >

RE: We had this weird behavior

2011-03-17 Thread Guy Doulberg
Strings. I actually simplified the scenario so I could ask the question. Our partitions are actually strings of dates with hour, so the query was actually Partition >= '20110301_20' and Partition <= '2011030223'. Would using single quotes still not be advised? Thanks, -Original Message- From:
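For what it's worth, the underlying gotcha with string-typed partition comparisons is that they are lexicographic, character by character. Fixed-width values in one consistent format compare in chronological order, but mixed widths (like the bare '10' and '20' in the simplified query) do not, and the two bounds quoted above are themselves in different formats (one has the underscore, one does not). A quick illustration in Python, whose string comparison is likewise lexicographic:

```python
# String comparison is lexicographic, character by character.

# Fixed-width date-hour values in one consistent format stay in
# chronological order:
assert "20110301_20" < "20110302_23"

# But mixed-width numeric strings do not compare numerically:
assert "9" > "10"                 # decided by '9' > '1', not by 9 vs 10
assert not ("10" < "9" < "20")    # as a string, '9' falls outside '10'..'20'

# So a range predicate like partition > '10' and partition < '20'
# silently drops single-digit partition values.
```

This is one plausible source of the differing counts: rows whose partition value does not sort between the two string bounds are simply excluded.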

Re: We had this weird behavior

2011-03-17 Thread Edward Capriolo
On Thursday, March 17, 2011, Guy Doulberg wrote: > Hey guys, I have a Hive partitioned table. First I ran a query that looks like > this: Select count(*) From table Where field like '%bla%' and (partition > '10' > and partition < '20'). For this query I got some records, let's say 640. When I > ran this

Re: Hadoop error 2 while joining two large tables

2011-03-17 Thread Edward Capriolo
I am pretty sure the Cloudera distro has an upgrade path to a more recent Hive. On Thursday, March 17, 2011, hadoop n00b wrote: > Hello All, > > Thanks a lot for your response. To clarify a few points - > > I am on CDH2 with Hive 0.4 (I think). We cannot move to a higher version of > Hive as we

Re: Hadoop error 2 while joining two large tables

2011-03-17 Thread hadoop n00b
Hello All, Thanks a lot for your response. To clarify a few points: I am on CDH2 with Hive 0.4 (I think). We cannot move to a higher version of Hive, as we have to use the Cloudera distro only. All records in the smaller table have at least one record in the larger table (of course, a few exceptions

We had this weird behavior

2011-03-17 Thread Guy Doulberg
Hey guys, I have a Hive partitioned table. First I ran a query that looks like this: Select count(*) From table Where field like '%bla%' and (partition > '10' and partition < '20'). For this query I got some records, let's say 640. When I ran this query: Select count(*) From table Where field like '