Hi Spark Experts,
We are trying to streamline the development lifecycle of our data
scientists taking algorithms from the lab into production. Currently the
tool of choice for our data scientists is R. Historically, our engineers
have had to manually convert the R-based algorithms to Java or Scala.

Hi Yan,
That is a good suggestion. I believe non-ZooKeeper offset management will
be a feature in the upcoming Kafka 0.8.2 release, tentatively scheduled for
September.
https://cwiki.apache.org/confluence/display/KAFKA/Inbuilt+Consumer+Offset+Management
That should make this fairly easy to implement.
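For anyone who wants to try this once 0.8.2 lands, my understanding (an assumption until the release documentation is final) is that the high-level consumer will expose the offset storage backend through consumer properties, roughly along these lines:

```
# Sketch of consumer properties for Kafka-based offset management (names
# assumed from the 0.8.2 proposal linked above, not a released API).
# Store consumer offsets in an internal Kafka topic instead of ZooKeeper:
offsets.storage=kafka
# During migration, commit to both Kafka and ZooKeeper so existing
# ZooKeeper-based monitoring tools keep working:
dual.commit.enabled=true
```

Once offsets live in Kafka itself, the checkpoint/commit path no longer depends on a ZooKeeper ensemble, which is what makes the Spark integration simpler.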

Hi Spark Experts,
I am curious what people are using to benchmark their Spark clusters. We
are about to start a build (bare metal) vs. buy (AWS/Google Cloud/Qubole)
project to determine our Hadoop and Spark deployment selection. On the
Hadoop side we will test live workloads as well as simulated workloads.
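For the application-level numbers, one low-tech option is to wrap each candidate job in a small timing harness with warm-up runs and report the median rather than a single measurement. A minimal sketch in plain Java (the workload below is a stand-in; in practice it would submit a Spark job and block on the result):

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class MicroBench {
    /**
     * Runs the workload a few times to warm up, then returns the
     * median wall-clock time in milliseconds over the timed runs.
     */
    static long medianMillis(Supplier<?> workload, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            workload.get();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            workload.get();
            samples[i] = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        // Stand-in workload: replace with a call that submits the real
        // Spark job and waits for completion.
        Supplier<Long> workload = () -> {
            long sum = 0;
            for (long i = 0; i < 10_000_000L; i++) sum += i;
            return sum;
        };
        System.out.println("median ms: " + medianMillis(workload, 2, 5));
    }
}
```

The warm-up runs matter more on the JVM than elsewhere (JIT compilation, cache population), and the median is less sensitive than the mean to a single slow run caused by GC or a noisy neighbor on shared cloud hardware.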

'this 2-node replication is mainly for failover in case the receiver dies
while data is in flight. there's still chance for data loss as there's no
write ahead log on the hot path, but this is being addressed.'
Can you comment a little on how this will be addressed? Will there be a
durable WAL?
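If the fix does take the shape of a receiver-side write-ahead log, I would expect it to be opt-in via a streaming configuration property, with the log files written under the existing checkpoint directory. A sketch of what that might look like (the property name is an assumption, not a released API):

```
# Hypothetical: opt in to a durable receiver write-ahead log.
spark.streaming.receiver.writeAheadLog.enable=true
# Log segments would presumably live under the streaming checkpoint
# directory, so that directory should be on reliable storage (e.g. HDFS).
```

With a durable WAL on the hot path, received-but-unprocessed data could be replayed after a driver or receiver failure instead of being lost.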