Not all files are splittable. Sequence Files are; raw gzip files are not.
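Why a raw gzip file is not splittable can be seen without Hadoop at all: a gzip stream can only be decoded from its start, so a mapper handed a split beginning mid-file has nothing it can decode. A minimal Python sketch (synthetic data, not Hive internals):

```python
import gzip
import zlib

data = b"some repetitive log line\n" * 4000
blob = gzip.compress(data)

# From byte 0 the stream decodes fine -- this is the whole-file reader.
assert gzip.decompress(blob) == data

# A reader dropped at an arbitrary split boundary finds no gzip header
# and cannot resynchronize, so the split is useless on its own.
try:
    gzip.decompress(blob[len(blob) // 2:])
    mid_stream_readable = True
except (OSError, zlib.error):
    mid_stream_readable = False

print(mid_stream_readable)  # False: raw gzip is not splittable
```

Sequence Files avoid this by writing sync markers between compressed blocks, which is what lets a mapper start at a split boundary.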
On Fri, Jan 25, 2013 at 1:47 AM, Nitin Pawar wrote:
> set mapred.min.split.size=1024000;
> set mapred.max.split.size=4096000;
> set hive.merge.mapfiles=false;
>
> I had set above value and setting max split size to a lower
In most cases you want bigger splits, because having lots of small tasks
plays havoc on the JobTracker. I have found that jobs with thousands of
short-lived map tasks tend to monopolize the slots. In other versions of
Hive the default was not CombineHiveInputFormat. I think in most cases you
want to
Hi David,
What file format and compression type are you using?
Mathieu
On 25 Jan 2013, at 07:16, David Morel wrote:
> Hello,
>
> I have seen many posts on various sites and MLs, but didn't find a firm
> answer anywhere: is it possible, yes or no, to force a smaller split size
> than a block on the mappers, from the client side?
set mapred.min.split.size=1024000;
set mapred.max.split.size=4096000;
set hive.merge.mapfiles=false;
I had set the above values, and setting max split size to a lower value did
increase my number of maps. My block size was 128MB.
The only thing was my files on HDFS were not heavily compressed and I was us
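The map count follows directly from the split size for a splittable file. A rough sketch of the arithmetic, assuming the standard FileInputFormat rule `splitSize = max(minSize, min(maxSize, blockSize))` (the 1GB file here is hypothetical; the 128MB block matches the message above):

```python
import math

def split_size(min_split, max_split, block_size):
    # FileInputFormat.computeSplitSize(): max(minSize, min(maxSize, blockSize))
    return max(min_split, min(max_split, block_size))

def num_maps(file_size, min_split, max_split, block_size):
    return math.ceil(file_size / split_size(min_split, max_split, block_size))

block = 128 * 1024 * 1024   # 128MB block size
one_gb = 1024 ** 3          # a hypothetical 1GB splittable input file

# Defaults: one map per block.
print(num_maps(one_gb, 1, block, block))          # 8

# With mapred.max.split.size lowered to ~4MB, many more maps appear.
print(num_maps(one_gb, 1024000, 4096000, block))  # 263
```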
On 24 Jan 2013, at 20:39, bejoy...@yahoo.com wrote:
> Hi David,
>
> The default partitioner used in map reduce is the hash partitioner. So
> based on your keys they are sent to a particular reducer.
>
> Maybe in your current data set, the keys that have no values in the table
> are all falling in the
See if any of the drivers below help you:
https://bitbucket.org/vadim/hive-sharp/downloads/hive-sharp-lib.dll
http://nuget.org/List/Packages/Hive.Sharp.Lib
On Fri, Jan 25, 2013 at 9:47 AM, Chhaya Vishwakarma <
chhaya.vishwaka...@lntinfotech.com> wrote:
> Hi,
>
> I want to call hive in c#
Hello,
I have seen many posts on various sites and MLs, but didn't find a firm
answer anywhere: is it possible, yes or no, to force a smaller split size
than a block on the mappers, from the client side? I'm not after
pointers to the docs (unless you're very, very sure :-) but after
real-life experience.
Hive has a feature for data sampling where you don't actually read the
entire table but only a sample of it.
I suppose these parameters belong to those queries.
You can read more at
https://cwiki.apache.org/Hive/languagemanual-sampling.html
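For reference, sampling looks like this in HiveQL (table name hypothetical; bucket-sampling syntax per the wiki page above):

```sql
-- Read roughly 1/32 of the table, bucketing on rand() so it works
-- even if the table is not bucketed on a useful column.
SELECT *
FROM my_table TABLESAMPLE(BUCKET 3 OUT OF 32 ON rand()) s;
```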
On Fri, Jan 25, 2013 at 4:42 AM, Wu, James C. wrote:
Hi Yu Yang
have a look at this issue: https://issues.apache.org/jira/browse/HIVE-2722
2013/1/25 Yu Yang
> Hi All,
>
> I'm working on Hive 0.8.1 and hit the following problem.
> I use the function substr(item,-4,1) to process one item in a Hive table, and
> there is one row in which the content of the
Hi,
I want to call Hive from C#. How can I do that? I found the Hive ODBC driver
but am not getting any downloadables. Can anyone give a proper link to
download the Hive ODBC driver?
Regards,
Chhaya Vishwakarma
Hi All,
I'm working on Hive 0.8.1 and hit the following problem.
I use the function substr(item,-4,1) to process one item in a Hive table, and
there is one row in which the content of the item is
"ba_s0一朝忽觉京梦醒,半世浮沉雨打萍--衣俊卿小n实录010", then the job failed.
I checked the task log; it showed a
java.lang.StringIndexOutOfBoundsException
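The failure mode here is byte offsets being applied to multibyte text. A Python sketch of the mismatch, using the row's actual value (this illustrates the class of bug, not Hive's internals):

```python
item = "ba_s0一朝忽觉京梦醒,半世浮沉雨打萍--衣俊卿小n实录010"

# Character-based substr(item, -4, 1): the 4th character from the end.
print(item[-4])  # 录

# Byte-based offsets break: byte -4 lands inside the 3-byte UTF-8
# encoding of that character, so the slice is not a decodable string.
raw = item.encode("utf-8")
try:
    raw[-4:-3].decode("utf-8")
    valid = True
except UnicodeDecodeError:
    valid = False
print(valid)  # False
```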
Hi James.
Basically, if we have a table A mapped to the directory /data/a in Hive, and
n is the number of files under /data/a, each with row size s, then for
hive -e "select * from a limit 10"
to show the result very fast,
hive.limit.optimize.limit.file < n
in this case will
Hi,
Does anyone know the meaning of these Hive settings? Their descriptions are
not clear to me. If someone could give me an example of how they should be
used, that would be great!
hive.limit.row.max.size
10
When trying a smaller subset of data for simple LIMIT, how much
size we need
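As far as I understand them, these settings all belong to the simple-LIMIT sampling optimization. Something like the following (values are the usual defaults, quoted from memory, so verify against your version's hive-default.xml):

```sql
-- Answer a bare LIMIT from a sample of the input files
-- instead of scanning the whole table.
set hive.limit.optimize.enable=true;
-- Byte-size estimate used per row when sizing the sample.
set hive.limit.row.max.size=100000;
-- At most this many input files are pulled into the sample.
set hive.limit.optimize.limit.file=10;
-- Cap on rows fetched by the optimization.
set hive.limit.optimize.fetch.max=50000;
```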
Hi David,
The default partitioner used in map reduce is the hash partitioner. So based on
your keys they are sent to a particular reducer.
Maybe in your current data set, the keys that have no values in the table are
all falling in the same hash bucket and hence being processed by the same
reducer.
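The mechanism can be sketched in a few lines (deterministic crc32 standing in for Java's `hashCode()`; key names made up):

```python
import zlib
from collections import Counter

def partition(key, num_reducers):
    # Hadoop's default HashPartitioner: hash the key, mod reducer count.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

# A skewed data set: one hot key plus a long tail of rare keys.
rows = ["hot_key"] * 10000 + ["k%d" % i for i in range(100)]
load = Counter(partition(k, 4) for k in rows)

# Every occurrence of the hot key hashes to the same bucket, so one
# reducer receives all 10000 of those rows -- the classic
# "last reducer" problem.
hot = partition("hot_key", 4)
print(load[hot] >= 10000)  # True
```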
On 24 Jan 2013, at 18:16, bejoy...@yahoo.com wrote:
> Hi David
>
> An explain extended would give you the exact pointer.
>
> From my understanding, this is how it could work.
>
> If you have two tables, then two different map reduce jobs would be
> processing those. Based on the join keys, a combination
Hi David
An explain extended would give you the exact pointer.
From my understanding, this is how it could work.
If you have two tables, then two different map reduce jobs would be processing
those. Based on the join keys, a combination of corresponding columns would be
chosen as the key from mapper1 a
Hi!
After hitting the "curse of the last reducer" many times on LEFT OUTER
JOIN queries, and trying to think about it, I came to the conclusion
there's something I am missing regarding how keys are handled in mapred
jobs.
The problem shows when I have table A containing billions of rows with
dist
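One knob that sometimes helps with exactly this "last reducer" pattern on joins is Hive's skew-join handling (settings quoted from memory; check your version):

```sql
-- Divert keys with very many rows into a follow-up map-side join
-- instead of sending them all to a single reducer.
set hive.optimize.skewjoin=true;
-- A join key is treated as skewed once it exceeds this many rows.
set hive.skewjoin.key=100000;
```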
Hello all:
I'm working on adding Hadoop as a data source to our query tool (written in
.NET), over an ODBC connection against a Cloudera virtual machine. Hive 0.9
is installed.
I've got a question about views from the Hive documentation.
https://cwiki.apache.org/confluence/pages/viewpage.actio
You'll face all the usual concurrency synchronization risks if you're
updating the same "place" concurrently. One thing to keep in mind: it's all
just HDFS under the hood. That pretty much tells you everything you need to
know. Yes, there's also the metadata. So, one way to update a partition
direc
Hi Edward, All,
Thanks for the quick reply!
We are using dynamic partitions, so we are unable to say which partition each
record goes to. We don't have much control here.
Are there any properties that can be set?
I'm a bit doubtful here - is it because of the lock acquired on the table ?
Regards,
Kris
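For reference, a dynamic-partition load usually looks like this (table and column names hypothetical):

```sql
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Hive routes each row to a partition based on the trailing
-- select column(s) that match the PARTITION clause.
INSERT OVERWRITE TABLE target PARTITION (dt)
SELECT col1, col2, dt FROM staging;
```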
The benefit of using the partitioned approach is really nicely described in the
O'Reilly book "Programming Hive". (Thanks for writing it, Edward!)
For me the ability to drop a single partition if there's any doubt about the
quality of the data of just one job is a large benefit.
From: Edward Caprio
Partition the table and load the data into different partitions. That, or
build the data outside the table and then use scripting to move the data in,
using LOAD DATA INPATH or copying.
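A sketch of both approaches, with made-up table and path names: load pre-built data into its own partition, and drop a suspect partition wholesale if a job's output is in doubt:

```sql
-- Load pre-built data for one day into its own partition.
LOAD DATA INPATH '/staging/2013-01-24'
INTO TABLE logs PARTITION (dt='2013-01-24');

-- If that one job's data turns out bad, drop just that partition.
ALTER TABLE logs DROP PARTITION (dt='2013-01-24');
```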
On Thu, Jan 24, 2013 at 9:44 AM, Krishnan K wrote:
> Hi All,
>
> Could you please let me know what would happen i