Re: spark and binary files

2015-05-11 Thread tog
Thanks for the answer, but this seems to apply to files that have a key-value structure, which mine currently doesn't. My file is a generic binary file encoding data from sensors over time. I am just looking at recreating some objects by assigning splits (i.e. contiguous chunks of bytes) to each
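To make the idea of assigning contiguous byte chunks concrete, here is a minimal sketch in plain Python (no Spark involved). It assumes fixed-length records, which the post does not confirm; the record layout (a uint32 timestamp plus a float32 value) and the helper names `plan_splits` and `decode_split` are hypothetical and only there for illustration.

```python
import struct

# Hypothetical record layout (NOT from the original post):
# little-endian uint32 timestamp followed by a float32 sensor value.
RECORD = struct.Struct("<If")


def plan_splits(total_bytes, num_splits, record_size=RECORD.size):
    """Return (offset, length) pairs that each cover whole records only."""
    total_records = total_bytes // record_size
    base, extra = divmod(total_records, num_splits)
    splits, offset = [], 0
    for i in range(num_splits):
        # Spread the remainder over the first `extra` splits.
        count = base + (1 if i < extra else 0)
        length = count * record_size
        if length:
            splits.append((offset, length))
        offset += length
    return splits


def decode_split(data, offset, length):
    """Rebuild (timestamp, value) tuples from one contiguous byte range."""
    chunk = data[offset:offset + length]
    return [RECORD.unpack_from(chunk, pos)
            for pos in range(0, len(chunk), RECORD.size)]


# Build a small in-memory "file" of 5 records and decode it split by split.
readings = [(t, float(t) * 0.5) for t in range(5)]
blob = b"".join(RECORD.pack(t, v) for t, v in readings)
splits = plan_splits(len(blob), 2)
decoded = [rec for off, length in splits
           for rec in decode_split(blob, off, length)]
```

Because every split boundary falls on a record boundary, each chunk can be decoded independently of the others. A format with variable-length records would need some other way to find record starts inside a split.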

Re: spark and binary files

2015-05-09 Thread ayan guha
Spark uses whatever InputFormat you specify, and the number of splits equals the number of RDD partitions. You may want to take a deeper look at SparkContext.newAPIHadoopRDD to load your data. On Sat, May 9, 2015 at 4:48 PM, tog wrote: > Hi > > I have an application that currently runs using MR. It currently starts

spark and binary files

2015-05-08 Thread tog
Hi, I have an application that currently runs using MR. It currently starts by extracting information from a proprietary binary file that is copied to HDFS. The application starts by creating business objects from information extracted from the binary files. Later those objects are used for further pro