Duplicates in self join

2018-10-07 Thread Eric L Goodman
What is the best way to avoid or remove duplicates when joining a stream with itself? I'm performing a streaming temporal triangle computation and the first part is to find triads of two edges of the form vertexA->vertexB and vertexB->vertexC (and there are temporal constraints where the first edg

Using several Kerberos keytabs in standalone cluster

2018-10-07 Thread Olga Luganska
Hello, According to the documentation, the security setup is shared by all the jobs on the same cluster, and if users need to use a different keytab, this is easily achievable in a YARN cluster setup by starting a new cluster with a different flink-conf.yaml. Is it possible to set up a standalone clus

Re: Streaming to Parquet Files in HDFS

2018-10-07 Thread Kostas Kloudas
Hi, Yes, please enable DEBUG for streaming to also see all the logs from the StreamTask. A checkpoint is “valid” as soon as it gets acknowledged. As the documentation says, the job will restart from “the last **successful** checkpoint”, which is the most recent acknowledged one. Cheers, Kostas

Re: Streaming to Parquet Files in HDFS

2018-10-07 Thread Averell
Hi Kostas, Yes, I set the level to DEBUG, but for /org.apache.flink.streaming.api.functions.sink.filesystem.bucket/ only. I will try to enable it for /org.apache.flink.streaming/. I just found one possible issue with my build: I had not used the latest master branch when merging with your

Re: Streaming to Parquet Files in HDFS

2018-10-07 Thread Kostas Kloudas
Hi, I just saw that you have already set the level to DEBUG. Are these all the DEBUG logs of the TM when running on YARN? Also, did you try waiting a bit longer to see whether the checkpoint acknowledgments arrive a bit later? Checkpoints and acknowledgments are not necessarily aligned. Kost

Re: Streaming to Parquet Files in HDFS

2018-10-07 Thread Kostas Kloudas
Hi Averell, Could you set your logging to DEBUG? This may shed some light on what is happening as it will contain more logs. Kostas > On Oct 7, 2018, at 11:03 AM, Averell wrote: > > Hi Kostas, > > I'm using a build with your PR. However, it seemed the issue is not with S3, > as when I tried t

Re: Streaming to Parquet Files in HDFS

2018-10-07 Thread Averell
Hi Kostas, I'm using a build with your PR. However, it seemed the issue is not with S3, as when I tried to write to the local file system (file:///, not HDFS), I got the same problem: only the first part was published. All remaining parts were in-progress and had names prefixed with "." From Fli

Re: Streaming to Parquet Files in HDFS

2018-10-07 Thread Kostas Kloudas
Hi Averell, From the logs, only checkpoint 2 was acknowledged (search for “eceived completion notification for checkpoint with id=”), and this is why no more files are finalized. So only checkpoint 2 was successfully completed. BTW, are you using the PR you mentioned before or Flink 1.6? I am as
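
The log search suggested in the message above could be scripted roughly as follows. This is only a sketch: it assumes the checkpoint id appears as digits directly after the quoted substring, and the sample log lines (including the name SomeTask) are made up for illustration, not taken from real Flink output.

```python
import re

# Substring quoted in the thread; the leading letter is left off so that it
# matches both "Received" and "received".
MARKER = "eceived completion notification for checkpoint with id="

# Assumption: the checkpoint id is a run of digits directly after the marker.
_ID_PATTERN = re.compile(re.escape(MARKER) + r"(\d+)")


def acknowledged_checkpoints(log_lines):
    """Return the sorted, de-duplicated checkpoint ids found in the log lines."""
    ids = set()
    for line in log_lines:
        match = _ID_PATTERN.search(line)
        if match:
            ids.add(int(match.group(1)))
    return sorted(ids)


def main():
    # Hypothetical log lines, invented for this example only.
    sample = [
        "DEBUG SomeTask - Received completion notification for checkpoint with id=2.",
        "DEBUG SomeTask - Triggering checkpoint 3",
        "DEBUG SomeTask - Triggering checkpoint 4",
    ]
    print(acknowledged_checkpoints(sample))


if __name__ == "__main__":
    main()
```

If the returned list stops at an early id while later checkpoints were triggered, that matches the situation described in the thread, where only one checkpoint was acknowledged and later part files were never finalized.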