Hi,
I'm considering using Apache Spark for the development of an application.
It would replace a legacy program which reads CSV files and performs many
(tens to hundreds of) aggregations on them. The aggregations are fairly
simple: counts, sums, etc., while applying filtering conditions to some of
the columns.
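To make the idea concrete, here is a rough sketch of what such a job might
look like in Spark (the "amount" and "status" columns are made up); the point
is that many filtered counts and sums can be expressed as column expressions
and computed in a single pass over the file:

import static org.apache.spark.sql.functions.*;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CsvAggregations {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("csv-aggregations")
                .master("local[*]")
                .getOrCreate();

        // Read the CSV once; each count/sum then becomes one column expression.
        Dataset<Row> df = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("/path/to/input.csv");

        df.agg(
                count(lit(1)).alias("total_rows"),
                sum("amount").alias("total_amount"),
                // Filtered aggregation: only sum rows matching a condition.
                sum(when(col("status").equalTo("OK"), col("amount"))).alias("ok_amount"),
                count(when(col("status").equalTo("FAILED"), lit(1))).alias("failed_rows")
        ).show();

        spark.stop();
    }
}

All of the aggregations run in a single job, so the file is only scanned once.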
Hi,
I don't have any code for the foreachBatch approach; I mentioned it because
of this answer to my question on Stack Overflow:
https://stackoverflow.com/a/65803718/1017130
I have added some very simple code below that I think shows what I'm trying
to do:
import org.apache.spark.sql.types._
val schema = StructType(
  Array(
    StructField("value", StringType, nullable = true) // placeholder column
  )
)
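For context, here is a rough sketch of what I understand the foreachBatch
approach from that answer to look like; the built-in rate source is just a
stand-in for my real input stream:

import org.apache.spark.api.java.function.VoidFunction2;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class ForeachBatchSketch {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("foreach-batch-sketch")
                .master("local[*]")
                .getOrCreate();

        Dataset<Row> stream = spark.readStream()
                .format("rate")
                .option("rowsPerSecond", "10")
                .load();

        // foreachBatch hands over each micro-batch as a plain Dataset, so the
        // usual batch-style aggregations can be run on it.
        StreamingQuery query = stream.writeStream()
                .foreachBatch(new VoidFunction2<Dataset<Row>, Long>() {
                    @Override
                    public void call(Dataset<Row> batch, Long batchId) {
                        batch.groupBy().count().show();
                    }
                })
                .start();

        query.awaitTermination();
    }
}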
Hey guys, the problem I'm trying to tackle is the following:
- I need a data source that emits messages at a certain frequency
- There are N neural nets that need to process each message individually
- The outputs from all neural nets are aggregated, and only when all N
outputs for a message have been collected can that message be considered
fully processed (a rough sketch of this shape is below)
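Here is a rough, batch-mode sketch of the shape I have in mind. The NeuralNet
class is just a stand-in so the example is self-contained, and I broadcast
the list of nets to the executors rather than parallelizing it into an RDD:

import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;

import scala.Tuple2;

public class FanOutSketch {

    // Stand-in for the real model, only here so the sketch compiles.
    static class NeuralNet implements Serializable {
        float score(String msg) { return msg.length(); }
    }

    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local[*]", "fan-out-sketch");
        int numberOfNets = 3;

        List<NeuralNet> nets = new ArrayList<>();
        for (int i = 0; i < numberOfNets; i++) {
            nets.add(new NeuralNet());
        }
        // Ship all N nets to the executors once instead of nesting RDDs.
        Broadcast<List<NeuralNet>> bNets = sc.broadcast(nets);

        JavaRDD<String> messages = sc.parallelize(Arrays.asList("a", "bb", "ccc"));

        // Fan out: every message is scored by every net.
        JavaPairRDD<String, Float> scores = messages.flatMapToPair(msg -> {
            List<Tuple2<String, Float>> out = new ArrayList<>();
            for (NeuralNet nn : bNets.value()) {
                out.add(new Tuple2<>(msg, nn.score(msg)));
            }
            return out.iterator();
        });

        // Aggregate: combine the N outputs for each message.
        scores.reduceByKey(Float::sum).collect().forEach(t ->
                System.out.println(t._1() + " -> " + t._2()));

        sc.stop();
    }
}

The aggregation here is a plain sum just to keep the sketch short; the real
barrier logic would go in the reduce step.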
Say you have a Spark Streaming setup such as:

JavaReceiverInputDStream<...> rndLists =
    jssc.receiverStream(new JavaRandomReceiver(...));

rndLists.map(new NeuralNetMapper(...))
        .foreach(new JavaSyncBarrier(...));
Is there any way of ensuring that, say, a JavaRandomReceiver and a
JavaSyncBarrier end up on the same node?
Here I wrote a simpler version of the code to get an understanding of how it
works:

final List<NeuralNet> nns = new ArrayList<>();
for (int i = 0; i < numberOfNets; i++) {
    nns.add(NeuralNet.createFrom(...));
}
final JavaRDD<NeuralNet> nnRdd = sc.parallelize(nns);
JavaDStream<Float> results = rndLists.flatMap(new FlatMapFunction<...>(...));
Hey, I don't think that's the issue; foreach is called on 'results', which is
a DStream of floats, so naturally it passes RDDs to its function.
And either way, changing the code in the first mapper to comment out the
map-reduce process on the RDD:

Float f = 1.0f; //nnRdd.map(new Function() {