Thanks Gordon for your explanation.  
Mans 

    On Wednesday, December 20, 2017 2:16 PM, Tzu-Li (Gordon) Tai 
<tzuli...@apache.org> wrote:
 

Hi Mans,
 
What's the difference between an operator and a function?

An operator in Flink needs to handle the processing of watermarks and records,
as well as checkpointing of the operator state. To implement one, you need to
extend the AbstractStreamOperator base class. It is considered a very low-level
API that normal users would not use unless they have very specific needs. To
add an operator to your pipeline, you would use DataStream::transform(…).

Functions are UDFs such as a FlatMapFunction, MapFunction, WindowFunction,
etc., and are the typical way Flink users would define transformations on
DataStreams / DataSets. They can be added to your pipeline using specific
transform methods for each kind of function, e.g. DataStream::flatMap(…)
corresponds to the FlatMapFunction. User functions are executed by an
underlying operator (specifically, the AbstractUdfStreamOperator). UDFs only
expose the abstraction of per-record processing and producing outputs, so you
don’t have to worry about other complications, for example handling watermarks
and checkpointing state. Any state registered in a UDF is managed state, and
will be checkpointed by the underlying operator.
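
To make the split between the two concrete, here is a toy model in plain Java.
It is only a sketch and does not use Flink's actual classes: ToyFlatMapFunction,
ToyUdfOperator and the recordsSeen counter are hypothetical names invented for
illustration. The user function sees one record at a time, while the wrapping
operator deals with watermarks and with snapshotting state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class OperatorSketch {

    // Hypothetical stand-in for a user function (UDF): per-record logic only.
    interface ToyFlatMapFunction<IN, OUT> {
        void flatMap(IN value, Consumer<OUT> out);
    }

    // Hypothetical stand-in for the operator that hosts a user function.
    // It owns everything the UDF does not see: watermarks and state snapshots.
    static class ToyUdfOperator<IN, OUT> {
        private final ToyFlatMapFunction<IN, OUT> userFunction;
        private final List<OUT> output = new ArrayList<>();
        private long currentWatermark = Long.MIN_VALUE;
        private long recordsSeen = 0; // toy "state" kept by the operator

        ToyUdfOperator(ToyFlatMapFunction<IN, OUT> userFunction) {
            this.userFunction = userFunction;
        }

        // Records are forwarded to the user function one at a time.
        void processElement(IN record) {
            recordsSeen++;
            userFunction.flatMap(record, output::add);
        }

        // Watermarks are handled by the operator, not by the user function.
        void processWatermark(long timestamp) {
            currentWatermark = Math.max(currentWatermark, timestamp);
        }

        // The operator, not the user function, produces the state snapshot.
        long snapshotState() {
            return recordsSeen;
        }

        List<OUT> getOutput() {
            return output;
        }

        long getCurrentWatermark() {
            return currentWatermark;
        }
    }

    // Runs a small pipeline: split lines into words, with a watermark in between.
    static List<String> runExample() {
        ToyFlatMapFunction<String, String> splitter = (line, out) -> {
            for (String word : line.split(" ")) {
                out.accept(word);
            }
        };
        ToyUdfOperator<String, String> operator = new ToyUdfOperator<>(splitter);
        operator.processElement("hello flink");
        operator.processWatermark(42L);
        operator.processElement("state");
        return operator.getOutput();
    }

    public static void main(String[] args) {
        System.out.println(runExample());
    }
}
```

Note that the splitter lambda never touches the watermark or the snapshot; in
this sketch those concerns live entirely in the wrapping operator.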

What are the raw state interfaces? Are they checkpoint-related interfaces?

The raw state interfaces refer to StateInitializationContext and
StateSnapshotContext, which are only visible when you directly extend
AbstractStreamOperator. Through those interfaces, you have additional access to
raw operator and keyed state input / output streams in the initializeState and
snapshotState methods, which let you read / write state as a stream of raw
bytes.
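
As a rough illustration of what "state as a stream of raw bytes" means, here is
a plain-Java sketch. It does not use Flink's StateInitializationContext or
StateSnapshotContext; ToyCountingOperator and its method signatures are
hypothetical, and in-memory byte arrays stand in for the raw state streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class RawStateSketch {

    // Hypothetical operator that serializes its own state to raw bytes.
    static class ToyCountingOperator {
        private long count;

        void processElement(String record) {
            count++;
        }

        // Similar in spirit to snapshotState: the operator decides the byte layout.
        void snapshotState(OutputStream rawOut) throws IOException {
            DataOutputStream out = new DataOutputStream(rawOut);
            out.writeLong(count);
            out.flush();
        }

        // Similar in spirit to initializeState: the operator parses its own bytes.
        void initializeState(InputStream rawIn) throws IOException {
            count = new DataInputStream(rawIn).readLong();
        }

        long getCount() {
            return count;
        }
    }

    // Processes three records, snapshots, and restores into a fresh operator.
    static long snapshotAndRestore() {
        try {
            ToyCountingOperator original = new ToyCountingOperator();
            original.processElement("a");
            original.processElement("b");
            original.processElement("c");

            ByteArrayOutputStream checkpoint = new ByteArrayOutputStream();
            original.snapshotState(checkpoint);

            ToyCountingOperator restored = new ToyCountingOperator();
            restored.initializeState(new ByteArrayInputStream(checkpoint.toByteArray()));
            return restored.getCount();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(snapshotAndRestore());
    }
}
```

Because the byte layout is private to the operator in this sketch, nothing
outside it can interpret or split the state, which is in line with the
documentation's point that managed state is easier for Flink to redistribute.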

Hope this helps!

Cheers,
Gordon

On 20 December 2017 at 10:06:34 AM, M Singh (mans2si...@yahoo.com) wrote: 

Hi:
I am reading the documentation on working with state
(https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/state/state.html)
and it states that:

All datastream functions can use managed state, but the raw state interfaces
can only be used when implementing operators. Using managed state (rather than
raw state) is recommended, since with managed state Flink is able to
automatically redistribute state when the parallelism is changed, and also do
better memory management.

I wanted to find out:
   - What's the difference between an operator and a function?
   - What are the raw state interfaces? Are they checkpoint-related interfaces?
  


Thanks
Mans

   
