Re: custom binary format

2014-12-18 Thread Andrew Mains
So in Hive you can actually do that via the SET command (documented here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli) as follows: hive> SET fixedlengthinputformat.record.length = This value will be passed through to the JobConf, and the input format ought to pick it u

Re: custom binary format

2014-12-18 Thread Ingo Thon
Hello Andrew, this one does indeed look like a good idea. However, there is already another problem here. This InputFormat expects that conf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, recordLength); is set. I haven’t found any way to specify a parameter for an InputFormat. I couldn’t fin

Re: custom binary format

2014-12-18 Thread Andrew Mains
Hi Ingo, Take a look at https://hadoop.apache.org/docs/r2.3.0/api/org/apache/hadoop/mapred/FixedLengthInputFormat.html. It seems to be designed for use cases very similar to yours. You may need to subclass it to make things work precisely the way you need (in particular, to deal with the head

Re: custom binary format

2014-12-18 Thread Ingo Thon
Hi, thanks for the answer so far. However, I still think there must be an easy way. The file format I’m looking at is pretty simple. First there is a header of n bytes, which can be ignored. After that there is the data. The data consists of rows, where each row has 9 bytes. First there is a byt
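
For illustration only, the layout described above (a header of n bytes to skip, then rows of 9 bytes each) can be sketched in plain Java without any Hadoop or Hive classes. The class name FixedRecordReader, the method readRecords, and the headerLength parameter are hypothetical names chosen for this sketch, and the fields inside each 9-byte row are left undecoded because the message does not spell them out here.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop or Hive: splits a stream into
// raw 9-byte rows after skipping a header of known length.
class FixedRecordReader {
    // Row size taken from the description in the message above.
    static final int RECORD_LENGTH = 9;

    static List<byte[]> readRecords(InputStream in, int headerLength) throws IOException {
        DataInputStream data = new DataInputStream(in);

        // Skip the header; readFully throws EOFException if the stream is shorter.
        data.readFully(new byte[headerLength]);

        List<byte[]> records = new ArrayList<>();
        byte[] buf = new byte[RECORD_LENGTH];
        while (true) {
            // Read one byte first so a clean end of stream can be detected.
            int first = data.read();
            if (first < 0) {
                break;
            }
            buf[0] = (byte) first;
            // A trailing partial row makes readFully throw EOFException.
            data.readFully(buf, 1, RECORD_LENGTH - 1);
            records.add(buf.clone());
        }
        return records;
    }
}
```

A reader of this kind only shows the byte-level split; inside Hive the same logic would still have to live in an InputFormat or SerDe, as discussed elsewhere in this thread.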

Re: custom binary format

2014-12-12 Thread Moore, Douglas
You want to look into ADD JAR and CREATE FUNCTION (for UDFs) and STORED AS 'full.class.name' for serde. For tutorials, google for "adding custom serde"; I found one from Cloudera: http://blog.cloudera.com/blog/2012/12/how-to-use-a-serde-in-apache-hive/ Depending on your numbers (rows / file, byt

custom binary format

2014-12-11 Thread Ingo Thon
Dear List, I want to set up a DW based on Hive. However, my data does not come as handy CSV files but as binary files in a proprietary format. The binary file consists of - 1 header of a dynamic number of bytes, which can be read from the contents of the header. The header tells me how to