Thanks for the answer, but this seems to apply to files that have a
key-value structure, which mine currently doesn't. My file is a generic
binary file encoding sensor data over time. I am just looking at
recreating some objects by assigning splits (i.e. contiguous chunks of
bytes) to each object.
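Independent of Spark, the per-chunk decoding described above can be sketched in plain Python. The record layout here is purely hypothetical (the thread never states one): a fixed 16-byte record holding an int64 timestamp and two float32 sensor values.

```python
import struct

RECORD_SIZE = 16    # hypothetical: 8-byte timestamp + two 4-byte floats
RECORD_FMT = "<qff" # little-endian: int64 timestamp, float32 x, float32 y

def decode_chunk(chunk: bytes):
    """Decode one contiguous chunk of bytes into sensor-reading tuples."""
    if len(chunk) % RECORD_SIZE != 0:
        raise ValueError("chunk is not aligned to the record size")
    return [struct.unpack_from(RECORD_FMT, chunk, off)
            for off in range(0, len(chunk), RECORD_SIZE)]

# Build a small chunk of two fake records and decode it back.
raw = struct.pack(RECORD_FMT, 1, 1.5, 2.5) + struct.pack(RECORD_FMT, 2, 3.5, 4.5)
print(decode_chunk(raw))  # [(1, 1.5, 2.5), (2, 3.5, 4.5)]
```

The same function would work as the map applied to each split, as long as split boundaries fall on record boundaries.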
Spark uses whatever InputFormat you specify, and the number of input
splits equals the number of RDD partitions. You may want to take a deeper
look at SparkContext.newAPIHadoopRDD to load your data.
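For fixed-length binary records, Hadoop's stock FixedLengthInputFormat can be passed to newAPIHadoopRDD, and it takes care of aligning split boundaries to record boundaries. The arithmetic it performs is roughly the following (a plain-Python illustration, not Hadoop's actual code; record and target split sizes are made-up):

```python
RECORD_SIZE = 16            # hypothetical fixed record length in bytes
TARGET_SPLIT = 64 * 2**20   # hypothetical target split size (64 MiB)

def split_offsets(file_size: int,
                  record_size: int = RECORD_SIZE,
                  target: int = TARGET_SPLIT):
    """Return (start, length) splits aligned to whole records,
    roughly what a fixed-length-record input format computes."""
    # Round the target split size down to a whole number of records,
    # but never below one record.
    aligned = max(record_size, target - target % record_size)
    splits = []
    start = 0
    while start < file_size:
        length = min(aligned, file_size - start)
        splits.append((start, length))
        start += length
    return splits

# Toy numbers: a 100-byte file, 10-byte records, 35-byte target splits.
print(split_offsets(100, record_size=10, target=35))
# [(0, 30), (30, 30), (60, 30), (90, 10)]
```

Each such split then becomes one RDD partition, so every partition decodes only whole records.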
On Sat, May 9, 2015 at 4:48 PM, tog wrote:
Hi

I have an application that currently runs using MR. It starts by
extracting information from a proprietary binary file that is copied to
HDFS. The application first creates business objects from the information
extracted from the binary files; later those objects are used for further
processing.