Hi,
This might be a naive question, but I hope somebody can help me with it.
I have a text file in which every 4 lines represent one record. Since the
SparkContext.textFile() API treats each line as a separate record, it does not
fit my case. I know that SparkContext.hadoopFile or newAPIHadoopFile
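One plain-RDD workaround (a minimal sketch, not the hadoopFile route mentioned
above): tag each line with its index via zipWithIndex and regroup every four
consecutive lines into one record. The input path below is hypothetical.

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.SparkContext._  // pair-RDD functions on pre-1.3 Spark

  object FourLineRecords {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("FourLineRecords"))

      val lines = sc.textFile("hdfs:///input/records.txt")  // hypothetical path
      val records = lines
        .zipWithIndex()                                      // (line, global line index)
        .map { case (line, idx) => (idx / 4, (idx, line)) }  // key = record index
        .groupByKey()
        .map { case (_, group) =>
          group.toSeq.sortBy(_._1).map(_._2).mkString("\n")  // reassemble the 4 lines
        }

      records.take(2).foreach(println)
    }
  }

This costs a shuffle, so it is only a fallback when a custom InputFormat is not
an option.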
Hi,
I have a Spark MapReduce task which requires me to write the final RDD to an
existing local file (appending to this file). I tried two ways, but neither
works well:
1. Using the saveAsTextFile() API. Spark 1.1.0 claims that this API can write
to a local path, but I have never been able to make it work. Moreover, the result
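If the final RDD is small enough to collect on the driver, one workaround
sketch is to bypass saveAsTextFile entirely and append with a plain Java
writer; the helper name here is an assumption.

  import java.io.{BufferedWriter, FileWriter}
  import org.apache.spark.rdd.RDD

  // Appends every element of the RDD to an existing local file on the driver.
  // Only safe when the final RDD fits in driver memory.
  def appendToLocalFile(rdd: RDD[String], path: String): Unit = {
    val writer = new BufferedWriter(new FileWriter(path, true))  // true => append mode
    try {
      rdd.collect().foreach { line =>
        writer.write(line)
        writer.newLine()
      }
    } finally {
      writer.close()
    }
  }

saveAsTextFile always writes a fresh directory of part files, so appending to a
single existing file has to happen outside that API.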
Hi there,
I have several large files (500GB per file) to transform into Parquet format
and write to HDFS. The problems I encountered can be described as follows:
1) At first, I tried to load all the records in a file and then used
"sc.parallelize(data)" to generate an RDD, and finally used "saveAsNew
Hi there,
I was wondering if somebody could tell me how to create an object from a given
ClassTag so as to make the function below work. The only thing needed is one
line that creates an object of class T. I tried "new T", but it does not work.
Would it be possible to give me one Scala line to f
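Assuming T has a public no-argument constructor, one line that usually works is
to go through the ClassTag's runtime class instead of new T (which cannot
compile, since T is erased):

  import scala.reflect.ClassTag

  // Instantiate T via reflection from the implicit ClassTag.
  def createInstance[T: ClassTag](): T =
    implicitly[ClassTag[T]].runtimeClass.newInstance().asInstanceOf[T]

If T lacks a no-arg constructor, the arguments would have to be supplied some
other way, for example through a factory function passed alongside the ClassTag.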
Hi there,
I was wondering if anybody could help me find an efficient way to write a
MapReduce program like this:
1) Each map function needs access to some huge files, which are around 6GB
2) These files are READ-ONLY. They are essentially a huge look-up table, which
will not change during
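One common sketch for this pattern is a broadcast variable: load the look-up
table once on the driver and broadcast it, so each executor keeps a single
in-memory copy instead of every task re-reading the 6GB files. The file paths,
the tab-separated format, and the assumption that the table fits in executor
memory are all hypothetical here.

  import org.apache.spark.{SparkConf, SparkContext}

  object LookupJoin {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("LookupJoin"))

      // Build the read-only look-up table on the driver (hypothetical path/format).
      val lookup: Map[String, String] =
        scala.io.Source.fromFile("/data/lookup.tsv").getLines()
          .map { line => val Array(k, v) = line.split('\t'); (k, v) }
          .toMap
      val lookupBC = sc.broadcast(lookup)

      val keys = sc.textFile("hdfs:///input/keys.txt")
      val resolved = keys.map(k => lookupBC.value.getOrElse(k, "unknown"))
      resolved.saveAsTextFile("hdfs:///output/resolved")
    }
  }

If the table really cannot fit in memory, an alternative is to load it into an
external store and query it from mapPartitions, but that is a different
trade-off.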
Hi there,
I have a bunch of data in an RDD, which I previously processed one element at a
time. For example, there was an RDD denoted by "data: RDD[Object]", and I
processed it using "data.map(...)". However, I have a new requirement to
process the data in a batched way. It means that I need to convert
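Reading "batched" as processing a fixed-size group of elements per call, one
minimal sketch is mapPartitions plus Iterator.grouped, which converts
data: RDD[Object] into an RDD of batches without an extra shuffle (the helper
name and batch size are assumptions):

  import org.apache.spark.rdd.RDD

  // Each partition's stream of elements is chopped into Seq batches of at most
  // batchSize elements; the last batch in a partition may be smaller.
  def toBatches[T](data: RDD[T], batchSize: Int): RDD[Seq[T]] =
    data.mapPartitions(_.grouped(batchSize))

A batch-oriented routine can then be applied with toBatches(data, 100).map(...).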