Hi,
I have a domain-specific schema (RDF data with vertical partitioning, i.e.
one table per property) and I want to instruct SparkSQL to keep semantically
closer property tables physically closer together, that is, to group the
DataFrames onto the same nodes (or at least encourage this somehow) so that
tables that are frequently joined end up co-located.
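The closest I can get right now is co-partitioning on the join key, which
aligns partitions but doesn't pin them to specific nodes. Roughly this
(table and column names are made up, and repartition-by-column assumes a
Spark version that supports it, 1.6+):

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // Two hypothetical property tables from the vertical partitioning,
    // each with columns (subject, object).
    val hasAuthor = sqlContext.table("hasAuthor")
    val hasTitle  = sqlContext.table("hasTitle")

    // Hash-partition both tables the same way on the join key, so rows
    // with the same subject land in matching partition numbers, then
    // cache the partitioned data on the executors.
    val n = 64
    val a = hasAuthor.repartition(n, hasAuthor("subject")).cache()
    val t = hasTitle.repartition(n, hasTitle("subject")).cache()

    // A subject = subject join between co-partitioned inputs can then
    // avoid re-shuffling both sides.
    val joined = a.join(t, a("subject") === t("subject"))

Is there anything beyond this, i.e. actual placement control?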
Hello!
Spark Streaming supports HDFS as an input source, as well as Akka actor
receivers and TCP socket receivers.
For my use case I think it's probably more convenient to read the data
directly from actors, because I already need to set up a multi-node Akka
cluster (on the same nodes that Spark runs on).
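Something like this is what I have in mind - a sketch against the Spark 1.x
actor receiver API (ActorHelper and actorStream, which later versions moved
out of core); the actor and stream names are placeholders:

    import akka.actor.{Actor, Props}
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.receiver.ActorHelper

    // Any message this actor receives is handed to Spark Streaming
    // through store(), becoming part of the input DStream.
    class LineReceiver extends Actor with ActorHelper {
      def receive = {
        case line: String => store(line)
      }
    }

    val ssc = new StreamingContext(sc, Seconds(10))
    val lines = ssc.actorStream[String](Props[LineReceiver], "line-receiver")
    lines.count().print()
    ssc.start()
    ssc.awaitTermination()

Does that sound like a reasonable setup, or is there a catch?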
Hey Michael,
Thanks for the great reply! That clears things up a lot. The idea about
Apache Kafka sounds very interesting; I'll look into it. The multiple
consumers and fault tolerance sound awesome. That's probably what I need.
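From a quick look at the docs, I'm guessing the starting point is the
receiver-based helper in the spark-streaming-kafka artifact - something like
the sketch below (the ZooKeeper quorum, group id, and topic are placeholder
values on my end):

    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(sc, Seconds(10))

    // Consumers sharing a group id split the topic's partitions between
    // them - that's the "multiple consumers" part. The map value is the
    // number of receiver threads for the topic.
    val stream = KafkaUtils.createStream(ssc,
      "zk1:2181,zk2:2181",   // ZooKeeper quorum (placeholder)
      "nilesh-consumers",    // consumer group id (placeholder)
      Map("events" -> 2))

    stream.map(_._2).count().print()  // values only; keys may be null
    ssc.start()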
Cheers,
Nilesh
Hey all!
I've got an iterative problem. I'm trying to find something similar to
Hadoop's MultipleOutputs [1] in Spark 1.0. I need to build up a couple of
large dense vectors (which may contain billions of elements - 2 billion
doubles => at least 16GB) by adding partial vector chunks to them. This can be
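The closest pattern I've come up with so far (just a sketch with toy sizes;
the chunkId layout is my own invention) is to keep each big vector
distributed as (chunkId, partial array) pairs and sum matching chunks with
reduceByKey, so the full 16GB never sits on a single machine:

    // Toy partial contributions: (chunkId, partial values for that
    // chunk). In practice each chunk would hold millions of doubles.
    val partials = sc.parallelize(Seq(
      (0, Array(1.0, 0.0, 2.0, 0.0)),
      (0, Array(0.0, 3.0, 0.0, 1.0)),
      (1, Array(5.0, 5.0, 0.0, 0.0))
    ))

    // Element-wise sum of all partial arrays that target the same
    // chunk; the result is still an RDD, i.e. still distributed.
    val bigVector = partials.reduceByKey { (a, b) =>
      val out = new Array[Double](a.length)
      var i = 0
      while (i < out.length) { out(i) = a(i) + b(i); i += 1 }
      out
    }

    bigVector.collect().foreach { case (id, arr) =>
      println(id + ": " + arr.mkString(","))
    }

But that doesn't give me MultipleOutputs-style writing - any ideas?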
Hello,
In a thread about "java.lang.StackOverflowError when calling count()" [1] I
saw Tathagata Das share an interesting approach for truncating RDD lineage -
this helps prevent StackOverflowErrors in high-iteration jobs while avoiding
the disk-writing performance penalty. Here's an excerpt from
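For illustration, here's roughly how I'd apply lineage truncation in an
iterative loop - my own sketch, not TD's excerpt: localCheckpoint() (added
in Spark 1.5) keeps the checkpoint data in executor storage rather than
writing it out to reliable storage, which avoids the disk penalty:

    var rdd = sc.parallelize(1L to 1000000L)

    for (i <- 1 to 200) {
      rdd = rdd.map(_ + 1)
      // Every 20 iterations, cut the lineage so the DAG (and the
      // serialized task closures) stops growing without bound.
      if (i % 20 == 0) {
        rdd.localCheckpoint()  // checkpoint into executor storage, no HDFS write
        rdd.count()            // force materialization so truncation happens now
      }
    }

The tradeoff is that the job can't recompute those blocks if an executor is
lost, since the lineage behind them is gone.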