Hi,
I saw the posting about storing NumPy values in sequence files:
http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3cCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3e
I’ve had a go at implementing this and opened a pull request at
https://github.com/apach
> dd and
> then save it through your own checkpoint mechanism.
>
> If not, please share your use case.
> On 11 May 2015 00:38, "Peter Aberline" wrote:
>
>> Hi
>>
>> I have many thousands of small DataFrames that I would like to save to
>> one Parquet file to avoid the HDFS 'small files' problem.
Hi
I have many thousands of small DataFrames that I would like to save to one
Parquet file to avoid the HDFS 'small files' problem. My understanding
is that there is a 1:1 relationship between DataFrames and Parquet files if
a single partition is used.
Is it possible to have multiple DataFrames in a single Parquet file?
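One way to approach this (a sketch only, using the Spark 2.x+ `SparkSession` API; the object name, the `frames` placeholder, and the output path are illustrative, and the DataFrames are assumed to share a schema) is to union the small DataFrames and coalesce to a single partition before writing, so the writer emits one Parquet part file:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CombineSmallFrames {
  // Union many small, same-schema DataFrames and write a single Parquet
  // part file. coalesce(1) collapses everything to one partition, so the
  // writer produces one file instead of one file per partition.
  def writeAsSingleParquet(frames: Seq[DataFrame], path: String): Unit =
    frames.reduce(_ union _)
      .coalesce(1)
      .write
      .mode("overwrite")
      .parquet(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("combine-small-frames")
      .getOrCreate()
    import spark.implicits._

    // Stand-ins for the "many thousands of small DataFrames".
    val frames = Seq(
      Seq((1, "a")).toDF("id", "value"),
      Seq((2, "b")).toDF("id", "value")
    )
    writeAsSingleParquet(frames, "/tmp/combined.parquet")
    spark.stop()
  }
}
```

Note that `coalesce(1)` funnels all the data through one task, which is fine for many small frames but can become a bottleneck at scale; and with thousands of inputs, `reduce(_ union _)` builds a deep plan, so batching the unions may be kinder to the query planner.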
Hi,
I'm having problems with a ClassNotFoundException using this simple example:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import java.net.URLClassLoader
import scala.util.Marshal
class ClassToRoundTrip(val id: Int) extends scala.Serializable
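The example is cut off above, so here is a minimal self-contained stand-in (an assumption, not the original code: it swaps plain Java serialization in for `scala.util.Marshal`, which was removed in newer Scala versions, and invents the `RoundTrip` helper) showing the kind of round trip that surfaces a ClassNotFoundException:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Stand-in for the truncated class definition above.
class ClassToRoundTrip(val id: Int) extends Serializable

object RoundTrip {
  // Serialize and immediately deserialize an instance. Deserialization is
  // where ClassNotFoundException typically appears: ObjectInputStream
  // resolves class names through the current classloader, and classes
  // defined in the spark-shell REPL live in a classloader the default
  // resolver cannot always see.
  def roundTrip(obj: ClassToRoundTrip): ClassToRoundTrip = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(obj)
    oos.close()
    val ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray))
    ois.readObject().asInstanceOf[ClassToRoundTrip]
  }

  def main(args: Array[String]): Unit =
    println(roundTrip(new ClassToRoundTrip(42)).id) // prints 42
}
```

This runs cleanly as a standalone program; the exception shows up when the deserializing side (for example, an executor or a REPL session) loads classes through a different classloader than the one that defined `ClassToRoundTrip`.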