Fw: how to reset streaming state regularly
Hi all: In Spark Streaming, I want to count some metrics by day, but in method "mapWithState", there is no API for this. Of course, I can achieve this by adding some time information to the record. However, I still want to use the spark API implementation . So, is there any direct or indirect API for this in spark? Or is there any better solution for this? Thanks! shicheng31...@gmail.com
returning type of function that needs to be passed to method 'mapWithState'
Hi all: In the `mapWithState`method in spark streaming, you need to pass in an anonymous function. This function maintains a state and should return a result. It can be said that the final stateful result can be obtained from the state object. So, what is the significance of returning result? I looked up the official API, and it did not specifically say that the result is used for,just give a simple explanation, as follows: // A mapping function that maintains an integer state and return a String def mappingFunction(key: String, value: Option[Int], state: State[Int]): Option[String] = { // Use state.exists(), state.get(), state.update() and state.remove() // to manage state, and return the necessary string } val spec = StateSpec.function(mappingFunction).numPartitions(10) val mapWithStateDStream = keyValueDStream.mapWithState[StateType, MappedType](spec) Can anyone help me with this problem?Thanks! shicheng31...@gmail.com
Structured Streaming initialized with cached data or others
Hi ,all: As we all known, structured streaming is used to handle incremental problems. However, if I need to make an increment based on an initial value, I need to get a previous state value when the program is initialized. Is there any way to assign an initial value to the'state'? Or other solutions? Thanks! shicheng31...@gmail.com
ThriftServer gc over exceed and memory problem
Hi all: My spark's version is 2.3.2. I start thriftserver with default spark config. On another hand, I use java-application to query result via JDBC . The query application has plenty of statement to execute. The previous statement executes very quickly, and the latter statement executes slower and slower. I try to observe actions of appliction 'Thrit JDBC Server' on web ui. I keep refreshing the page, but the page response is getting slower and slower. Finally, it shows gc over exceed. Then , I try to config the memory of executor in config spark-env.sh . And the executor's memory does increase. But the problem still exists. What puzzles me is The JDBC Server application serves as driver, only handle some code distribution and rpc connection works.Does it need so much meormy? If so , how to increase it's memory? shicheng31...@gmail.com
Operators supported by Spark Structured Streaming
Hi: Spark Structured Streaming uses the DataFrame API. When programming, there are no compilation errors, but when running, it will report various unsupported conditions. The official website does not seem to have a document to list the unsupported operators. This will Inconvenient when developing. How did you solve this problem? shicheng31...@gmail.com