Can Connected Components run on a streaming dataset using iterate delta?

2020-02-17 Thread kant kodali
Hi All, I am wondering if connected components can run on a streaming data? or say incremental batch? I see that with delta iteration not all vertices need to participate at every iteration which

Re: Flink 'Job Cluster' mode Ui Access

2020-02-17 Thread Jatin Banger
Hi, Recently i upgraded flink version to 1.8.3 For Session cluster it shows the version correctly. But for job cluster. I get this in the logs *Starting StandaloneJobClusterEntryPoint (Version: , Rev:6322618, Date:04.09.2019 @ 22:07:41 CST)* And my Classpath has these jars: *Classpath: /opt/fli

Re: [Flink 1.10] Classpath doesn't include custom files in lib/

2020-02-17 Thread Yang Wang
Hi Maxim, Both Yarn per-job and session cluster should work. Since before the JobManager and TaskManager launcher, Yarn NodeManager could guarantee that all the local resources have been localized and accessible. If do not want to use the getResource to read the file and use File interface instead

Re: job history server

2020-02-17 Thread Richard Moorhead
I did not know that. I have since wiped the directory. I will post when I see this error again. On Mon, Feb 17, 2020 at 8:03 PM Benchao Li wrote: > `df -H` only gives the sizes, not inodes information. Could you also show > us the result of `df -iH`? > > Richard Moorhead 于2020年2月18日周二 上午9:40写道

Re: job history server

2020-02-17 Thread Benchao Li
`df -H` only gives the sizes, not inodes information. Could you also show us the result of `df -iH`? Richard Moorhead 于2020年2月18日周二 上午9:40写道: > Yes, I did. I mentioned it last but I should have been clearer: > > 22526:~/ $ df -H > > >[18:15:20] > Filesystem

Re: job history server

2020-02-17 Thread Richard Moorhead
Yes, I did. I mentioned it last but I should have been clearer: 22526:~/ $ df -H [18:15:20] FilesystemSize Used Avail Use% Mounted on /dev/mapper/vg00-rootlv00 2.1G 777M 1.2G 41% / tmpfs 2.1G 753M 1.4G 37% /d

Re: job history server

2020-02-17 Thread Benchao Li
Hi Richard, Have you checked that inodes of the disk partition were full or not? Richard Moorhead 于2020年2月18日周二 上午8:16写道: > I see the following exception often: > > 2020-02-17 18:13:26,796 ERROR > org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher - > Failure while fetchin

job history server

2020-02-17 Thread Richard Moorhead
I see the following exception often: 2020-02-17 18:13:26,796 ERROR org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher - Failure while fetching/processing job archive for job eaf0639027aca1624adaa100bdf1332e. java.nio.file.FileSystemException: /dev/shm/flink-history-server/job

AW: Process stream multiple time with different KeyBy

2020-02-17 Thread theo.diefent...@scoop-software.de
Hi Sebastian, I'd also highly recommend a recent Flink blog post to you where exactly this question was answered in quote some detail : https://flink.apache.org/news/2020/01/15/demo-fraud-detection.html Best regardsTheo Ursprüngliche Nachricht Von: Eduardo Winpenny Tejedor Datum

Re: Process stream multiple time with different KeyBy

2020-02-17 Thread Eduardo Winpenny Tejedor
Hi Sebastien, Without being entirely sure of what's your use case/end goal I'll tell you (some of) the options Flink provides you for defining a flow. If your use case is to apply the same rule to each of your "swimlanes" of data (one with category=foo AND subcategory=bar, another with category=f

Process stream multiple time with different KeyBy

2020-02-17 Thread Lehuede sebastien
Hi all, I'm currently working on a Flink Application where I match events against a set of rules. At the beginning I wanted to dynamically create streams following the category of events (Event are JSON formatted and I've a field like "category":"foo" in each event) but I'm stuck by the impossibil

Re: [Flink 1.10] Classpath doesn't include custom files in lib/

2020-02-17 Thread Maxim Parkachov
Hi Yang, I've just tried your suggestions, but, unfortunately, in yarn per job mode it doesn't work, both commands return null. I double checked that file is shipped to yarn container, but I feel that it happens later in process. At the moment I'm reading file with File interface, instead of getti

Re: Batch reading from Cassandra. How to?

2020-02-17 Thread Till Rohrmann
Hi Lasse, as far as I know, the best way to read from Cassandra is to use the CassandraInputFormat [1]. Unfortunately, there is no such optimized way to read a large amount of data as Spark offers it at the moment. But if you want to contribute this feature to Flink, then the community would highl

Re: Test sink behaviour

2020-02-17 Thread Till Rohrmann
Hi David, if you want to test the behavior together with S3, then you could check that S3 contains a file after the job has completed. If you want to test the failure and retry behaviour, then I would suggest to introduce an own abstraction for the S3 access which you can control. That way you ca

Re: [Flink 1.10] Classpath doesn't include custom files in lib/

2020-02-17 Thread Yang Wang
Hi Maxim, I have verified that the following two ways could both work. getClass().getClassLoader().getResource("lib/job.properties") getClass().getClassLoader().getResource("job.properties") Best, Yang Maxim Parkachov 于2020年2月17日周一 下午6:47写道: > Hi Yang, > > thanks, this explains why classpath

Parallelize Kafka Deserialization of a single partition?

2020-02-17 Thread Theo Diefenthal
Hi, As for most pipelines, our flink pipeline start with parsing source kafka events into POJOs. We perform this step within a KafkaDeserizationSchema so that we properly extract the event itme timestamp for the downstream Timestamp-Assigner. Now it turned out that parsing is currently the m

Re: [Flink 1.10] Classpath doesn't include custom files in lib/

2020-02-17 Thread Maxim Parkachov
Hi Yang, thanks, this explains why classpath behavior changed, but now I struggle to understand how I could overwrite resource, which is already shipped in job jar. Before I had job.properties files in JAR in under resources/lib/job.properties for local development and deploying on cluster it was

Flink's Either type information

2020-02-17 Thread jacopo.gobbi
Hi all, How can an Either value be returned by a KeyedBroadcastProcessFunction? We keep getting "InvalidTypesException: Type extraction is not possible on Either type as it does not contain information about the 'left' type." when doing: out.collect(Either.Right(myObject)); Thanks, Jacopo Gobb

Re: Flink job fails with org.apache.kafka.common.KafkaException: Failed to construct kafka consumer

2020-02-17 Thread Piotr Nowojski
Hey, sorry but I know very little about the KafkaConsumer. I hope that someone else might know more. However, did you try to google this issue? It doesn’t sound like Flink specific problem, but like a general Kafka issue. Also a solution might be just as simple as bumping the limit of opened fi

Re: FlinkCEP questions - architecture

2020-02-17 Thread Kostas Kloudas
Hi Juergen, I will reply to your questions inline. As a general comment I would suggest to also have a look at [3] so that you have an idea of some of the alternatives. With that said, here come the answers :) 1) We receive files every day, which are exports from some database tables, containing