I did figure out how to compress data from an uncompressed Hive table.
I also created a table in sequence file format.
Is there a way to know whether a Hive table (the HDFS file underneath) is in
sequence file format? DESCRIBE EXTENDED on the table does not give the file format.
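One option, assuming a Hive version that supports it: DESCRIBE FORMATTED prints the table's InputFormat/OutputFormat, which reveals the storage format (the table name below is a placeholder):

```sql
DESCRIBE FORMATTED my_table;
-- For a sequence file table, the output includes lines like:
--   InputFormat:  org.apache.hadoop.mapred.SequenceFileInputFormat
--   OutputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
```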
Thanks,
Chalcy
All,
We have a small table that we use the map-join technique to join to several
large tables in separate hive query scripts.
As I understand it, the map-join will do some preparatory work to get the small
table into the distributed cache for the map-join. These steps are (from my
understandi
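For reference, the kind of map-join being described can be sketched as follows (table names are hypothetical; on Hive versions of this era the MAPJOIN hint forces the small table into the distributed cache so the join runs entirely in the mappers):

```sql
SELECT /*+ MAPJOIN(small_dim) */ f.id, s.label
FROM big_fact f
JOIN small_dim s ON (f.dim_id = s.id);
```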
Hi Ranjan,
Looks like the NPE is getting generated here:
if (0 == getCols().size()) {
  throw new HiveException(
      "at least one column must be specified for the table");
}
Which would seem to indicate that the table record which was fetched
doesn't have any columns.
Did you
Hi,
I've built a datastore using Hive 0.7.1 backed by S3 using persistent metadata.
Now that Hive 0.8.1 is available, I'd like to migrate to the new version.
However, I'm having trouble reading tables with the persistent schema. Looking
in the logs, I'm getting stack traces like the following:
20
Snappy with sequence file works well for us. We'll have to decide which one
suits our needs.
Is there a way to convert existing HDFS files in text format to sequence
files?
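One common approach, sketched under assumptions (table and column names are hypothetical, and the codec settings assume Snappy is deployed): create a sequence file table and rewrite the data through Hive.

```sql
-- Target table stored as sequence files.
CREATE TABLE logs_seq (line STRING) STORED AS SEQUENCEFILE;

-- Enable compressed, block-level output for the rewrite job.
SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- Rewrite the existing text-format data into the new table.
INSERT OVERWRITE TABLE logs_seq SELECT line FROM logs_text;
```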
Thanks for all your input,
Chalcy
-Original Message-
From: Chalcy Raja [mailto:chalcy.r...@careerbuilder
It is there. I have io.compression.codecs in core-site.xml. There is no
error or warning in the Sqoop-to-Hive import that indicates anything.
The only reason we want to go to LZO is that Snappy is not splittable.
Thanks,
Chalcy
-Original Message-
From: Bejoy KS [mailto:bejoy...@
Hi,
Slides from the talks which were presented at the Hadoop Summit Hive meetup
are now available:
https://cwiki.apache.org/confluence/display/Hive/Presentations
I'd like to thank all of the speakers for making this such a great event,
as well as the Hadoop Summit organizers for giving us a plac
Hi Chalcy
If LZO indexing is not working, is the LZO codec class available in the
'io.compression.codecs' property in core-site.xml?
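For illustration, a sketch of the core-site.xml entries being asked about (the exact codec list is an assumption and depends on which codecs are actually deployed on the cluster):

```xml
<!-- Registered compression codecs; hadoop-lzo classes must be on the
     classpath of every node for the LZO entries to work. -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,
         org.apache.hadoop.io.compress.DefaultCodec,
         com.hadoop.compression.lzo.LzoCodec,
         com.hadoop.compression.lzo.LzopCodec,
         org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```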
Snappy is not splittable on its own. But sequence files are splittable so when
used together snappy gains the advantage of splittability.
Regards
Bejoy KS
Sent from handheld,
Hi Bejoy,
The weird thing is I did not get any errors. The Sqoop import will not go to
the second phase, where it creates the LZO index.
We did deploy the native libraries, except the hadoop-lzo lib, which we copied
over after building it on another machine. We did the same thing on the test
machine as well.
I
Hi Chalcy
Did you notice any warnings related to lzo codec on your mapreduce task logs or
on sqoop logs?
It could be because LZO libs are not available on the TaskTracker nodes. These
are native libs and are tied to OS, so if you have done an OS upgrade then you
need to rebuild and deploy the
Have you considered switching to sequence files using Snappy compression
(or LZO)? IIRC the process of generating LZO files and then generating an
index on top of them is cumbersome, whereas sequence files are directly
splittable.
On Mon, Jun 18, 2012 at 9:16 AM, Chalcy Raja
wrote:
> I am posting i
I am posting it here first, and then maybe on the Sqoop user group as well.
I am trying to use LZO compression.
I tested on a standalone machine by installing CDH3u3 and did a Sqoop-to-Hive
import with LZO compression, and everything works great. The data is sqooped
into HDFS and the LZO index file got created and
Bejoy,
Again, I do understand those two steps, and I do understand that I
have a lot of options of making them run in sequence, but from the
very beginning my point was to avoid having two steps. I want to have
a dataset in the hive warehouse that I could query at any time with
just a hive query w
Going to bump this one since I hope to be able to contribute some (worth a bump
:P)
-Original Message-
From: Ruben de Vries [mailto:ruben.devr...@hyves.nl]
Sent: Friday, June 15, 2012 11:59 AM
To: user@hive.apache.org
Subject: Hive-0.8.1 PHP Thrift client broken?
Hey Guys,
I've been sl
Hive also has something called UNIQUEJOIN. Maybe that is what you are looking
for. I cannot find documentation to reference, but you can do a JIRA search.
It allows you to join multiple sources with the same key, map-side
(all sources should have the same key).
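From memory, the grammar looks roughly like the sketch below (table names are placeholders; since documentation is thin, verify against the Hive grammar or its uniquejoin query tests before relying on it):

```sql
-- Hedged sketch of UNIQUEJOIN: each source is listed with its join key, and
-- PRESERVE keeps that source's rows even when the key has no match elsewhere.
FROM UNIQUEJOIN
  PRESERVE src1 a (a.key),
  PRESERVE src2 b (b.key),
  PRESERVE src3 c (c.key)
SELECT a.key, b.key, c.key;
```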
~Aniket
On Wed, Jun 13, 2012 a
Can you share the stack trace?
On Tue, Jun 12, 2012 at 2:18 AM, Marcin Cylke wrote:
> Hi,
>
> I'm having problems running current releases of Apache Hive, I get an
> error:
>
> java.lang.NoSuchMethodError:
>
> org.apache.thrift.server.TThreadPoolServer.(Lorg/apache/thrift/server/TThreadPoolServe
This would need changes to Hive.
On Wed, Jun 13, 2012 at 8:34 AM, wrote:
> Not sure… but maybe through Jython…
>
>
>
> *From:* 王锋 [mailto:wfeng1...@163.com]
> *Sent:* Wednesday, June 6, 2012 7:36
> *To:* user@hive.apache.org
> *Subject:* Re:Custom UDF in Python?
>
>
>
> udfs need extend
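The truncated point above is that Hive UDFs must extend Java's UDF classes; a common workaround for Python logic is Hive's TRANSFORM streaming interface rather than a true UDF. A minimal sketch (script and table names are hypothetical):

```sql
-- Ship a Python script to the cluster and stream rows through it.
ADD FILE my_udf.py;

SELECT TRANSFORM (col1, col2)
USING 'python my_udf.py'   -- script reads tab-separated rows on stdin
AS (out1, out2)            -- and writes tab-separated rows to stdout
FROM my_table;
```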
Hi Mark,
COLLECTION ITEMS TERMINATED BY applies to both maps and arrays. In your
case, you can play with Hive's nested complex data structures (so that you
can introduce another separator) to deserialize your data, but that would
require some experimentation (digging into the code). This would be non-t
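As an illustration of the nesting idea above, a sketch of DDL with an extra level of collection types (names are hypothetical; Hive falls back to default control-character separators for the innermost level, which is where the experimentation comes in):

```sql
CREATE TABLE nested_demo (
  id    INT,
  attrs MAP<STRING, ARRAY<STRING>>    -- map values are themselves arrays
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY ','  -- outer collection separator
  MAP KEYS TERMINATED BY ':';
```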