Re: how to convert single line into multiple lines in a serde (txt in txt out)?

2011-03-30 Thread Michael Jiang
Thanks Edward. That'll work. But that also means 2 tables will be created. What if we want only one table, using some SerDe such that it reads the Apache web log and generates multiple rows for each line of entry in the log, which then get loaded into the target table? Is it doable by customizing Reg

Re: INSERT OVERWRITE LOCAL DIRECTORY -- Why it creates multiple files

2011-03-30 Thread Edward Capriolo
On Wed, Mar 30, 2011 at 3:31 PM, V.Senthil Kumar wrote: > Thanks for the suggestion. The query created just one result file. > > Also, before trying this query, I have found out another way of making this > work. I have added the following properties in hive-site.xml and it worked as > well. It cr

Re: how to convert single line into multiple lines in a serde (txt in txt out)?

2011-03-30 Thread Edward Capriolo
On Wed, Mar 30, 2011 at 3:46 PM, Michael Jiang wrote: > Also what if I want just one step to load each log entry line from log file > and for each generate multiple lines? That is, just one table created. I > don't want to have one table and then call explode() to get multiple lines. > Otherwise,

Re: how to convert single line into multiple lines in a serde (txt in txt out)?

2011-03-30 Thread Michael Jiang
Also, what if I want just one step to load each log entry line from the log file and generate multiple lines for each? That is, just one table created. I don't want to have one table and then call explode() to get multiple lines. Otherwise, an alternative way is to use streaming on the loaded table to turn it

Re: INSERT OVERWRITE LOCAL DIRECTORY -- Why it creates multiple files

2011-03-30 Thread V.Senthil Kumar
Thanks for the suggestion. The query created just one result file. Also, before trying this query, I have found out another way of making this work. I have added the following properties in hive-site.xml and it worked as well. It created just one result file. hive.merge.mapredfiles tru
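
For reference, the merge settings mentioned above live in hive-site.xml; a sketch of the snippet (these are real Hive properties, shown with the values this thread uses):

```xml
<!-- hive-site.xml: merge small map/reduce output files into one at job end -->
<property>
  <name>hive.merge.mapfiles</name>
  <value>true</value>
</property>
<property>
  <name>hive.merge.mapredfiles</name>
  <value>true</value>
</property>
```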

Re: how to convert single line into multiple lines in a serde (txt in txt out)?

2011-03-30 Thread Michael Jiang
Thanks Edward. You mean implement "deserialize" to return a list? What is explode()? Sorry for basic questions. Could you please elaborate this a bit more or give me a link to some reference? Thanks! On Wed, Mar 30, 2011 at 12:03 PM, Edward Capriolo wrote: > On Wed, Mar 30, 2011 at 2:55 PM, Micha
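
For readers landing here: explode() is a Hive built-in table-generating function (UDTF) that emits one output row per element of an array column, usually combined with LATERAL VIEW. A minimal sketch, with a hypothetical table raw_logs(entries ARRAY&lt;STRING&gt;):

```sql
-- one output row per array element; table and column names are hypothetical
SELECT entry
FROM raw_logs
LATERAL VIEW explode(entries) exploded AS entry;
```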

Re: insert - Hadoop vs. Hive

2011-03-30 Thread Ashish Thusoo
If the data is already in the right format you should use the LOAD syntax in Hive. This basically moves files into HDFS (so it should be no less performant than writing to HDFS directly). If the data is not in the correct format or it needs to be transformed, then the insert statement needs to be used. Ashish On Mar 3
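
The two paths Ashish describes can be sketched as follows (table and path names are hypothetical):

```sql
-- data already in the table's format: a pure file move into the warehouse
LOAD DATA INPATH '/staging/weblogs/2011-03-30'
INTO TABLE web_logs PARTITION (dt = '2011-03-30');

-- data needing transformation: read, transform, and write with a query
INSERT OVERWRITE TABLE web_logs PARTITION (dt = '2011-03-30')
SELECT host, request_time, url
FROM raw_staging;
```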

Re: how to convert single line into multiple lines in a serde (txt in txt out)?

2011-03-30 Thread Edward Capriolo
On Wed, Mar 30, 2011 at 2:55 PM, Michael Jiang wrote: > Want to extend RegexSerDe to parse apache web log: for each log entry, need > to convert it into multiple entries. This is easy in streaming. But new to > serde, wondering if it is doable and how? Thanks! > You can have your serde produce li

how to convert single line into multiple lines in a serde (txt in txt out)?

2011-03-30 Thread Michael Jiang
Want to extend RegexSerDe to parse apache web log: for each log entry, need to convert it into multiple entries. This is easy in streaming. But new to serde, wondering if it is doable and how? Thanks!
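
Since the question mentions streaming: in Hive this is done with TRANSFORM, and the streamed script may emit any number of output rows per input row. A sketch, assuming a single-column staging table and a hypothetical script explode_log.py:

```sql
ADD FILE explode_log.py;

-- the script reads one raw log line per input row and may print
-- several tab-separated output rows for it
INSERT OVERWRITE TABLE log_entries
SELECT TRANSFORM (line)
  USING 'python explode_log.py'
  AS (host STRING, ts STRING, request STRING)
FROM raw_log_lines;
```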

Re: figuring out the right setting for dfs.datanode.max.xcievers

2011-03-30 Thread Edward Capriolo
On Wed, Mar 30, 2011 at 1:38 PM, Igor Tatarinov wrote: > I haven't found a good description on this setting and the costs in setting > it too high. Hope somebody can explain. > I have about a year's worth of data partitioned by date. Using 10 nodes and > setting xcievers to 5000, I can only save i

figuring out the right setting for dfs.datanode.max.xcievers

2011-03-30 Thread Igor Tatarinov
I haven't found a good description of this setting and the cost of setting it too high. Hope somebody can explain. I have about a year's worth of data partitioned by date. Using 10 nodes and setting xcievers to 5000, I can only save into 100 or so partitions. As a result, I have to do 4 rounds of
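
For context, dfs.datanode.max.xcievers (the misspelling is historical and part of the property name) caps the number of threads a datanode uses to serve block read/write streams; writing into many partitions at once keeps many block writers open, which is why it bounds the partition count here. It is set in hdfs-site.xml on each datanode:

```xml
<!-- hdfs-site.xml; requires a datanode restart to take effect -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```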

Re: HBase-Hive integration production ready?

2011-03-30 Thread Jean-Daniel Cryans
It's definitely usable, but since we prefer to store data in its binary format we had to patch in HIVE-1634, with one fix or two. One day when I have some free time I might even fix that patch. I also haxored in support for composite row keys, but it's so ugly and tailored to our specific need th
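
For anyone following along, a Hive-over-HBase table is declared with the HBase storage handler; with the HIVE-1634 patch applied, a '#b' suffix in the column mapping marks a column as stored in binary form. A sketch with hypothetical table and column-family names:

```sql
CREATE EXTERNAL TABLE hbase_events (rowkey STRING, views BIGINT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  -- '#b' = binary storage, per the HIVE-1634 patch
  'hbase.columns.mapping' = ':key,stats:views#b'
)
TBLPROPERTIES ('hbase.table.name' = 'events');
```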

Re: Hive in a readonly mode

2011-03-30 Thread Edward Capriolo
On Wed, Mar 30, 2011 at 9:29 AM, Guy Doulberg wrote: > Hey all, > > I bet someone has already asked this question before, but I couldn't a > thread with an answer to it, > > > > I want to give analysts in my organization access to hive in a readonly way, > > I.E, I don't want them to be able to cr

insert - Hadoop vs. Hive

2011-03-30 Thread David Zonsheine
Hi, I'm trying to compare adding files to hdfs for Hive usage using Hive inserts vs. adding to the hdfs directly then using Hive. Any comments, blogging about this? Thanks a lot, David Zonsheine

Hive in a readonly mode

2011-03-30 Thread Guy Doulberg
Hey all, I bet someone has already asked this question before, but I couldn't find a thread with an answer to it. I want to give analysts in my organization access to Hive in a read-only way, i.e., I don't want them to be able to create or drop tables, alter tables, insert, or load. How can I do that?
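
Older Hive releases have little built-in authorization, so a common workaround is to enforce read-only access at the HDFS layer: make the warehouse directory readable but not writable by the analysts' accounts. A sketch (the path shown is the default warehouse location; adjust to your setup):

```shell
# remove write permission for group/other on the warehouse tree;
# queries still read, but INSERT/LOAD/DROP against the data will fail
hadoop fs -chmod -R go-w /user/hive/warehouse
```

Note that this does not block metastore-only operations such as CREATE TABLE against a shared metastore.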

Re: Hive & MS SQL Server

2011-03-30 Thread shared mailinglists
Thanks Viral, I'll pass this info on to our DBAs; however, I don't think the problem is creating the tables. When looking at the logs, Hive checks to see if a table called COLUMNS exists, finds a View called COLUMNS instead, and therefore does not try to create the table, after which any alter statements

Re: Hive & MS SQL Server

2011-03-30 Thread shared mailinglists
Thanks Appan, the goal was just to have the metadata backing Hive in SQL Server, not the Hadoop data itself. Our DBAs monitored the SQL generated by DataNucleus against SQL Server and were typically none too happy :-) Hive & SQL Server therefore is a no-go for us at the moment, so we're looking at a