I am processing a log file, and from each line I want to extract the
fields at index 0 and 4 (plus an integer 1 for counting) into a tuple. I had
hoped to index the Array with both 0 and 4 at once, but Arrays appear
not to support vector indexing. I'm not finding a way to extract and
combine the elements properly, perhaps because I am a Spark Streaming/Scala
newbie.
My code so far looks like:
1] var lines = ssc.textFileStream(dirArg)
2] var linesArray = lines.map( line => (line.split("\t")))
3] var respH = linesArray.map( lineArray => lineArray(4) )
4a] var time = linesArray.map( lineArray => lineArray(0) )
4b] var time = linesArray.map( lineArray => (lineArray(0), 1))
5] var newState = respH.union(time)
If I use line 4a and not 4b, it compiles properly. (I still have issues
getting my update function for updateStateByKey to work, so I don't know
whether it _works_ properly.)
If I use line 4b and not 4a, it fails at compile time with
[error] foo.scala:82: type mismatch;
[error] found : org.apache.spark.streaming.dstream.DStream[(String, Int)]
[error] required: org.apache.spark.streaming.dstream.DStream[String]
[error] var newState = respH.union(time)
This implies that the DStreams being union()ed have to be of identical
per-element type. Can anyone confirm that's true?
If so, is there a way to extract the needed elements and build the new
DStream?
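
One possible approach, offered as a minimal sketch rather than a confirmed
answer: build the tuple inside a single map/flatMap over each line instead of
creating two DStreams and union()ing them. The helper name `extract`, the
object name `ExtractFields`, and the sample lines below are hypothetical. The
sketch runs on a plain Scala Seq so it can be tried without a cluster; it
assumes tab-separated lines and skips lines with fewer than five fields.

```scala
object ExtractFields {
  // Hypothetical helper: turn one tab-separated log line into
  // (field 0, field 4, 1). Returns None when the line has fewer than 5 fields.
  def extract(line: String): Option[(String, String, Int)] = {
    val fields = line.split("\t", -1) // limit -1 keeps trailing empty fields
    if (fields.length > 4) Some((fields(0), fields(4), 1)) else None
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample lines standing in for the contents of the stream.
    val sample = Seq(
      "t0\ta\tb\tc\trespA",
      "t1\ta\tb\tc\trespB",
      "too\tshort"
    )

    // Keep only the lines that parsed, each as a (time, respH, 1) tuple.
    val tuples = sample.flatMap(line => extract(line))
    tuples.foreach(println)

    // On the DStream from line 1] the same function would be applied as:
    //   val tuples = lines.flatMap(line => extract(line))
  }
}
```

Since updateStateByKey is only available on a DStream of key/value pairs, the
result would probably need to be shaped as a pair, for example
((fields(0), fields(4)), 1), before the state update is applied.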