Not all files are splittable. Sequence Files are; raw gzip files are not.
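Why a raw gzip file is not splittable can be seen without Hadoop at all: a gzip stream can only be decoded from its start, so a mapper handed a split beginning mid-file has nothing it can decode. A minimal Python sketch (synthetic data, not Hive internals):

```python
import gzip
import zlib

data = b"some repetitive log line\n" * 4000
blob = gzip.compress(data)

# From byte 0 the stream decodes fine -- this is the whole-file reader.
assert gzip.decompress(blob) == data

# A reader dropped at an arbitrary split boundary finds no gzip header
# and cannot resynchronize, so the split is useless on its own.
try:
    gzip.decompress(blob[len(blob) // 2:])
    mid_stream_readable = True
except (OSError, zlib.error):
    mid_stream_readable = False

print(mid_stream_readable)  # False: raw gzip is not splittable
```

Sequence Files avoid this by writing sync markers between compressed blocks, which is what lets a mapper start at a split boundary.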
On Fri, Jan 25, 2013 at 1:47 AM, Nitin Pawar wrote:
> set mapred.min.split.size=1024000;
> set mapred.max.split.size=4096000;
> set hive.merge.mapfiles=false;
>
> I had set above value and setting max split size to a lower
In most cases you want bigger splits, because having lots of small tasks
plays havoc on the JobTracker. I have found that jobs with thousands of
short-lived map tasks tend to monopolize the slots. In other versions of
Hive the default was not CombineHiveInputFormat. I think in most cases you
want to
Hi David,
What file format and compression type are you using?
Mathieu
On 25 Jan 2013, at 07:16, David Morel wrote:
> Hello,
>
> I have seen many posts on various sites and MLs, but didn't find a firm
> answer anywhere: is it possible, yes or no, to force a smaller split size
> than a block on the mappers, from the client side?
set mapred.min.split.size=1024000;
set mapred.max.split.size=4096000;
set hive.merge.mapfiles=false;
I had set the above values, and setting max split size to a lower value did
increase my number of maps. My block size was 128MB.
The only thing was my files on HDFS were not heavily compressed and I was us
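The map count follows directly from the split size for a splittable file. A rough sketch of the arithmetic, assuming the standard FileInputFormat rule `splitSize = max(minSize, min(maxSize, blockSize))` (the 1GB file here is hypothetical; the 128MB block matches the message above):

```python
import math

def split_size(min_split, max_split, block_size):
    # FileInputFormat.computeSplitSize(): max(minSize, min(maxSize, blockSize))
    return max(min_split, min(max_split, block_size))

def num_maps(file_size, min_split, max_split, block_size):
    return math.ceil(file_size / split_size(min_split, max_split, block_size))

block = 128 * 1024 * 1024   # 128MB block size
one_gb = 1024 ** 3          # a hypothetical 1GB splittable input file

# Defaults: one map per block.
print(num_maps(one_gb, 1, block, block))          # 8

# With mapred.max.split.size lowered to ~4MB, many more maps appear.
print(num_maps(one_gb, 1024000, 4096000, block))  # 263
```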
On 24 Jan 2013, at 20:39, bejoy...@yahoo.com wrote:
> Hi David,
>
> The default partitioner used in map reduce is the hash partitioner. So
> based on your keys they are sent to a particular reducer.
>
> Maybe in your current data set, the keys that have no values in the table
> are all falling in the
See if any of the drivers below help you:
https://bitbucket.org/vadim/hive-sharp/downloads/hive-sharp-lib.dll
http://nuget.org/List/Packages/Hive.Sharp.Lib
On Fri, Jan 25, 2013 at 9:47 AM, Chhaya Vishwakarma <
chhaya.vishwaka...@lntinfotech.com> wrote:
> Hi,
>
> I want to call hive in c#
Hello,
I have seen many posts on various sites and MLs, but didn't find a firm
answer anywhere: is it possible, yes or no, to force a smaller split size
than a block on the mappers, from the client side? I'm not after
pointers to the docs (unless you're very, very sure :-) but after
real-life experience.
Hive has a feature for data sampling where you don't actually read the
entire table but only a sample of it.
I suppose these parameters belong to those queries.
You can read more at
https://cwiki.apache.org/Hive/languagemanual-sampling.html
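For reference, sampling looks like this in HiveQL (table name hypothetical; bucket-sampling syntax per the wiki page above):

```sql
-- Read roughly 1/32 of the table, bucketing on rand() so it works
-- even if the table is not bucketed on a useful column.
SELECT *
FROM my_table TABLESAMPLE(BUCKET 3 OUT OF 32 ON rand()) s;
```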
On Fri, Jan 25, 2013 at 4:42 AM, Wu, James C. wrote:
Hi Yu Yang
have a look at this issue: https://issues.apache.org/jira/browse/HIVE-2722
2013/1/25 Yu Yang
> Hi All,
>
> I'm working on Hive 0.8.1 and hit the following problem.
> I use the function substr(item,-4,1) to process one item in a Hive table, and
> there is one row in which the content of the
Hi,
I want to call Hive from C#. How can I do that? I found the Hive ODBC driver
but am not getting any downloadables. Can anyone give a proper link to
download the Hive ODBC driver?
Regards,
Chhaya Vishwakarma
Hi All,
I'm working on Hive 0.8.1 and hit the following problem.
I use the function substr(item,-4,1) to process one item in a Hive table, and
there is one row in which the content of the item is
"ba_s0一朝忽觉京梦醒,半世浮沉雨打萍--衣俊卿小n实录010", then the job failed.
I checked the task log; it showed a
java.lang.StringIndexOutOfBoundsException
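The failure mode here is byte offsets being applied to multibyte text. A Python sketch of the mismatch, using the row's actual value (this illustrates the class of bug, not Hive's internals):

```python
item = "ba_s0一朝忽觉京梦醒,半世浮沉雨打萍--衣俊卿小n实录010"

# Character-based substr(item, -4, 1): the 4th character from the end.
print(item[-4])  # 录

# Byte-based offsets break: byte -4 lands inside the 3-byte UTF-8
# encoding of that character, so the slice is not a decodable string.
raw = item.encode("utf-8")
try:
    raw[-4:-3].decode("utf-8")
    valid = True
except UnicodeDecodeError:
    valid = False
print(valid)  # False
```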
Hi James.
Basically, if we have a table A mapped to the directory /data/a in Hive, and
n is the number of files under /data/a, each with row size s, then for
hive -e "select * from a limit 10"
to show the result very fast,
hive.limit.optimize.limit.file < n
in this case will
Hi,
Does anyone know the meaning of these Hive settings? Their descriptions are
not clear to me. If someone could give me an example of how they should be
used, that would be great!
hive.limit.row.max.size
10
When trying a smaller subset of data for simple LIMIT, how much
size we need
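As far as I understand them, these settings all belong to the simple-LIMIT sampling optimization. Something like the following (values are the usual defaults, quoted from memory, so verify against your version's hive-default.xml):

```sql
-- Answer a bare LIMIT from a sample of the input files
-- instead of scanning the whole table.
set hive.limit.optimize.enable=true;
-- Byte-size estimate used per row when sizing the sample.
set hive.limit.row.max.size=100000;
-- At most this many input files are pulled into the sample.
set hive.limit.optimize.limit.file=10;
-- Cap on rows fetched by the optimization.
set hive.limit.optimize.fetch.max=50000;
```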
Hi David,
The default partitioner used in map reduce is the hash partitioner. So based on
your keys they are sent to a particular reducer.
Maybe in your current data set, the keys that have no values in the table are
all falling in the same hash bucket and hence being processed by the same
reducer.
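The mechanism can be sketched in a few lines (deterministic crc32 standing in for Java's `hashCode()`; key names made up):

```python
import zlib
from collections import Counter

def partition(key, num_reducers):
    # Hadoop's default HashPartitioner: hash the key, mod reducer count.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

# A skewed data set: one hot key plus a long tail of rare keys.
rows = ["hot_key"] * 10000 + ["k%d" % i for i in range(100)]
load = Counter(partition(k, 4) for k in rows)

# Every occurrence of the hot key hashes to the same bucket, so one
# reducer receives all 10000 of those rows -- the classic
# "last reducer" problem.
hot = partition("hot_key", 4)
print(load[hot] >= 10000)  # True
```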
On 24 Jan 2013, at 18:16, bejoy...@yahoo.com wrote:
> Hi David
>
> An explain extended would give you the exact pointer.
>
> From my understanding, this is how it could work.
>
> If you have two tables, then two different map reduce jobs would be
> processing those. Based on the join keys, a combination
Hi David
An explain extended would give you the exact pointer.
From my understanding, this is how it could work.
If you have two tables, then two different map reduce jobs would be processing
those. Based on the join keys, a combination of corresponding columns would be
chosen as the key from mapper1 a
Hi!
After hitting the "curse of the last reducer" many times on LEFT OUTER
JOIN queries, and trying to think about it, I came to the conclusion
there's something I am missing regarding how keys are handled in mapred
jobs.
The problem shows when I have table A containing billions of rows with
dist
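One knob that sometimes helps with exactly this "last reducer" pattern on joins is Hive's skew-join handling (settings quoted from memory; check your version):

```sql
-- Divert keys with very many rows into a follow-up map-side join
-- instead of sending them all to a single reducer.
set hive.optimize.skewjoin=true;
-- A join key is treated as skewed once it exceeds this many rows.
set hive.skewjoin.key=100000;
```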
Hello all:
I'm working on adding Hadoop as a data source to our query tool (written in
.NET), over an ODBC connection against a Cloudera virtual machine. Hive 0.9
is installed.
I've got a question about views from the Hive documentation.
https://cwiki.apache.org/confluence/pages/viewpage.actio
You'll face all the usual concurrency synchronization risks if you're
updating the same "place" concurrently. One thing to keep in mind: it's all
just HDFS under the hood. That pretty much tells you everything you need to
know. Yes, there's also the metadata. So, one way to update a partition
direc
Hi Edward, All,
Thanks for the quick reply!
We are using dynamic partitions, so we are unable to say which partition each
record goes to. We don't have much control here.
Are there any properties that can be set?
I'm a bit doubtful here - is it because of the lock acquired on the table ?
Regards,
Kris
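For reference, a dynamic-partition load usually looks like this (table and column names hypothetical):

```sql
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Hive routes each row to a partition based on the trailing
-- select column(s) that match the PARTITION clause.
INSERT OVERWRITE TABLE target PARTITION (dt)
SELECT col1, col2, dt FROM staging;
```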
The benefit of using the partitioned approach is really nicely described in the
O'Reilly book "Programming Hive". (Thanks for writing it, Edward!)
For me the ability to drop a single partition if there's any doubt about the
quality of the data of just one job is a large benefit.
From: Edward Caprio
Partition the table and load the data into different partitions. That, or
build the data outside the table and then use scripting to move the data in,
using LOAD DATA INPATH or copying.
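A sketch of both approaches, with made-up table and path names: load pre-built data into its own partition, and drop a suspect partition wholesale if a job's output is in doubt:

```sql
-- Load pre-built data for one day into its own partition.
LOAD DATA INPATH '/staging/2013-01-24'
INTO TABLE logs PARTITION (dt='2013-01-24');

-- If that one job's data turns out bad, drop just that partition.
ALTER TABLE logs DROP PARTITION (dt='2013-01-24');
```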
On Thu, Jan 24, 2013 at 9:44 AM, Krishnan K wrote:
> Hi All,
>
> Could you please let me know what would happen i