Re: Question about state processor data outputs

2021-05-05 Thread Chen-Che Huang
Hi Robert, Because of the performance issues with the state processor, I will probably give it up and am now trying StreamingFileSink in a streaming job. However, I need to store the files on GCS, and I encountered the error below. It looks like Flink doesn't support GCS for

Re: Question about state processor data outputs

2021-04-16 Thread Chen-Che Huang
Hi Robert, Due to some concerns, we had planned to use the state processor to achieve our goal. We will now reevaluate using DataStream for the job while exploring the possibility of implementing a custom FileOutputFormat. Thanks for your comments! Best wishes, Chen-Che Huang On 2021/0

Re: Question about state processor data outputs

2021-04-15 Thread Robert Metzger
Hi, I assumed you are using the DataStream API, because you mentioned the streaming sink. But you also mentioned the state processor API (which I ignored a bit). I wonder why you are using the state processor API. Can't you use the streaming job that created the state also for writing it to files

Re: Question about state processor data outputs

2021-04-15 Thread Chen-Che Huang
Hi Robert, Thanks for your code. It's really helpful! However, with the readKeyedState API of the state processor, we get a DataSet for our data instead of a DataStream, and it seems the DataSet doesn't support StreamingFileSink (there is no addSink method as on DataStream). If not, I need to transform the DataSet

Re: Question about state processor data outputs

2021-04-15 Thread Robert Metzger
Hey Chen-Che Huang, I guess the StreamingFileSink is what you are looking for. It is documented here: https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html I drafted a short example (that is not production ready), which does roughly what you are asking for: htt

Question about state processor data outputs

2021-04-15 Thread Chen-Che Huang
Hi all, We're going to use the state processor to write our keyed state data to different files based on the keys. More specifically, we want our data to be written to files key1.txt, key2.txt, ..., and keyn.txt, where values with the same key are stored in the same file. In each file,
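
For illustration only, the per-key layout described in the question (one file per key, all values for that key in the same file) can be sketched in plain Java without any Flink dependency. This is not the approach from the thread and not Flink API: the class name KeyedFileWriter, the method writeByKey, and the one-value-per-line format are all assumptions made for this sketch, and keys are assumed to be safe to use as file names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Flink: groups records by key and
// writes each group to its own <key>.txt file.
public class KeyedFileWriter {

    /**
     * Writes the values of each key to outputDir/<key>.txt, one value per line
     * (the line format is an assumption). Returns the file written for each key.
     */
    public static Map<String, Path> writeByKey(Path outputDir,
                                               List<Map.Entry<String, String>> records)
            throws IOException {
        // Collect values per key, preserving first-seen key order and value order.
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, String> record : records) {
            grouped.computeIfAbsent(record.getKey(), k -> new ArrayList<>())
                   .add(record.getValue());
        }

        Files.createDirectories(outputDir);

        // One file per key; assumes keys contain no path separators.
        Map<String, Path> written = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> group : grouped.entrySet()) {
            Path file = outputDir.resolve(group.getKey() + ".txt");
            Files.write(file, group.getValue(), StandardCharsets.UTF_8);
            written.put(group.getKey(), file);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("keyed-output");
        List<Map.Entry<String, String>> records = List.of(
                Map.entry("key1", "a"),
                Map.entry("key2", "b"),
                Map.entry("key1", "c"));
        writeByKey(dir, records)
                .forEach((key, path) -> System.out.println(key + " -> " + path));
    }
}
```

The sketch buffers all values in memory before writing, which would not scale to large state. A streaming sink with a per-key bucket assignment, as suggested in the replies, avoids that buffering.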