Re: Example flink run with security options? Running on k8s in my case

2020-08-26 Thread Nico Kruber
Hi Adam, the flink binary will pick up any configuration from the flink-conf.yaml of its directory. If that is the same as in the cluster, you wouldn't have to pass most of your parameters manually. However, if you prefer not having a flink-conf.yaml in place, you could remove the security.ssl.i

Re: Example flink run with security options? Running on k8s in my case

2020-08-27 Thread Nico Kruber
ed security back > on and did the curl, using this: > > openssl pkcs12 -passin pass:OhQYGhmtYLxWhnMC -in > /etc/flink-secrets/flink-tls-keystore.key -out rest.pem -nodes > > curl --cacert rest.pem tls-flink-cluster-1-11-jobmanager:8081 > > curl --cacert rest.pem --cert res

Re: Flink 1.8.3 GC issues

2020-09-10 Thread Nico Kruber
What looks a bit strange to me is that with a running job, the SystemProcessingTimeService should actually not be collected (since it is still in use)! My guess is that something is indeed happening during that time frame (maybe job restarts?) and I would propose to check your logs for anything

Re: Password usage in ssl configuration

2020-11-12 Thread Nico Kruber
file will contain the cleartext > passwords for keystore and truststore files. Suppose if any attacker gains > access to this configuration file, using these passwords keystore and > truststore files can be read. What is the community approach to protect > these passwords ? > > Re

Re: [1.9.2] Flink SSL on YARN - NoSuchFileException

2021-04-21 Thread Nico Kruber
> > Your Personal Data: We may collect and process information about you that > may be subject to data protection laws. For more information about how we > use and disclose your personal data, how we protect your information, our > legal basis to use your information, your rights

Re: Connecting to MINIO Operator/Tenant via SSL

2021-05-31 Thread Nico Kruber
dshakeContext.java:422) > ~[?:1.8.0_292] > at sun.security.ssl.TransportContext.dispatch(TransportContext.java:182) > ~[?:1.8.0_292] > at sun.security.ssl.SSLTransport.decode(SSLTransport.java:152) > ~[?:1.8.0_292] > at sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:

Re: Flink job hangs using rocksDb as backend

2018-07-11 Thread Nico Kruber
/runtime.io>.StreamInputProcessor$ForwardingValveOutputHandler.handleWatermark(StreamInputProcessor.java:288) > > org.apache.flink.streaming.runtime.streamstatus.StatusWatermarkValve.findAndOutputNewMinWatermarkAcrossAlignedChannels(StatusWatermarkValve.java:189) > > org.apa

Re: Multi-tenancy environment with mutual auth

2018-07-16 Thread Nico Kruber
; like Flink inter-communications doesnt support SSL mutual auth. Any > plans/ways to support it? We are going to have to create DMZ for each > tenant without that, not preferable of course. > > > - Ashish -- Nico Kruber | Software Engineer data Artisans Follow us @dataArtisans

Re: Filescheme GS not found sometimes - inconsistent exceptions for reading from GCS

2018-09-13 Thread Nico Kruber
; org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ReadingThread.go(UnilateralSortMerger.java:1066) > at > org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:827) > > > Any ideas why the behaviour is not determinis

Re: Filescheme GS not found sometimes - inconsistent exceptions for reading from GCS

2018-09-13 Thread Nico Kruber
; I will try setting the temporary directory to something other than the > default, can't hurt. > > Thanks for the help, > Encho > > On Thu, Sep 13, 2018 at 3:41 PM Nico Kruber <mailto:n...@data-artisans.com>> wrote: > > Hi Encho, > the SpillingAdapt

Re: Failed to fetch BLOB - IO Exception

2018-10-23 Thread Nico Kruber
>> org.apache.flink.runtime.blob.BlobServer.getFileInternal(BlobServer.java:521) >> at >> org.apache.flink.runtime.blob.BlobServerConnection.get(BlobServerConnection.java:231) >> at >> org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnecti

Re: [ANNOUNCE] RocksDB Version Upgrade and Performance

2021-08-04 Thread Nico Kruber
en one of the > > committers can find the relevant patch from RocksDB master and backport > > it. > > That isn't the greatest user experience. > > > > Because of those disadvantages, we would prefer to do the upgrade to the > > newer RocksDB version

Re: build.gradle troubles with IntelliJ

2022-01-20 Thread Nico Kruber
.gradle.api.internal.artifacts.dsl.dependencies.DefaultDependencyHandler. > > I tried scala 2.11 and 1.14 SNAPSHOT but nothing works. > > When I look at > https://repository.apache.org/content/repositories/snapshots/org/apache/flin > k/flink-streaming-java.. everything is available

Re: Queryable State

2017-01-13 Thread Nico Kruber
Hi Dawid, I'll try to reproduce the error in the next couple of days. Can you also share the value deserializer you use? Also, have you tried even smaller examples in the meantime? Did they work? As a side-note in general regarding the queryable state "sink" using ListState (".asQueryableState(

Re: Queryable State

2017-01-16 Thread Nico Kruber
ving it would resolve > also my problem :) > > Regards > Dawid Wysakowicz > > 2017-01-13 18:50 GMT+01:00 Nico Kruber : > > Hi Dawid, > > I'll try to reproduce the error in the next couple of days. Can you also > > share > > the value deserializ

Re: Queryable State

2017-01-25 Thread Nico Kruber
e list I agree, using it seems > pointless. Moreover while removing it I would take a second look at those > > functions: > > KvStateRequestSerializer::deserializeList > > KvStateRequestSerializer.serializeList > > > As I think they are not used at all even right now. Thanks for your time. >

Re: Inconsistent Results in Event Time based Sliding Window processing

2017-02-14 Thread Nico Kruber
Hi Sujit, this does indeed sound strange and we are not aware of any data loss issues. Are there any exceptions or other errors in the job/taskmanager logs? Do you have a minimal working example? Is it that whole windows are not processed or just single items inside a window? Nico On Tuesday, 1

Re: Missing test dependency in Eclipse with Flink 1.2.0

2017-02-14 Thread Nico Kruber
You do not require a plugin, but most probably this dependency was not fetched by Eclipse. Please try a "mvn clean package" in your project and see whether this helps Eclipse. Also, you may create a clean test project with mvn archetype:generate \ -Darchetype

Re: Inconsistent Results in Event Time based Sliding Window processing

2017-02-14 Thread Nico Kruber
logies – A Northgate Public Services Company > <https://www.google.co.in/maps/place/Rave+Technologies/@19.0058078,72.823516 > ,17z/data=!3m1!4b1!4m5!3m4!1s0x3bae17fcde71c3b9:0x1e2a8c0c4a075145!8m2!3d19. > 0058078!4d72.8257047> > > > > Please consider the environment before print

Re: Missing test dependency in Eclipse with Flink 1.2.0

2017-02-14 Thread Nico Kruber
wing dependency that causes the problem: > > > org.apache.flink > flink-test-utils_2.10 > 1.2.0 > test-jar > test > > > Best, > Flavio > > On Tue, Feb 14, 2017 at 2:51 PM, Nico Kruber

Re: Unable to use Scala's BeanProperty with classes

2017-02-14 Thread Nico Kruber
Hi Adarsh, thanks for reporting this. It should be fixed eventually. @Timo: do you have an idea for a work-around or quick-fix? Regards Nico On Tuesday, 14 February 2017 21:11:21 CET Adarsh Jain wrote: > I am getting the same problem when trying to do FlatMap operation on my > POJO class. > >

Re: Negative values using latency marker

2017-11-03 Thread Nico Kruber
Hi Tovi, if I see this correctly, the LatencyMarker gets its initial timstamp during creation at the source and the latency is reported as a metric at a sink by comparing the initial timestamp with the current time. If the clocks between the two machines involved diverge, e.g. the sinks clock fa

Re: Making external calls from a FlinkKafkaPartitioner

2017-11-03 Thread Nico Kruber
Hi Ron, imho your code should be fine (except for a potential visibility problem on the changes of the non-volatile partitionMap member, depending on your needs). The #open() method should be called (once) for each sink initialization (according to the javadoc) and then you should be fine with t

Re: Incremental checkpointing documentation

2017-11-03 Thread Nico Kruber
Hi Elias, let me answer the questions to the best of my knowledge, but in general I think this is as expected. (Let me give a link to the docs explaining the activation [1] for other readers first.) On Friday, 3 November 2017 01:11:52 CET Elias Levy wrote: > What is the interaction of incrementa

Re: Negative values using latency marker

2017-11-06 Thread Nico Kruber
> > Thanks and regards, > Tovi > -----Original Message- > From: Nico Kruber [mailto:n...@data-artisans.com] > Sent: יום ו 03 נובמבר 2017 15:22 > To: user@flink.apache.org > Cc: Sofer, Tovi [ICG-IT] > Subject: Re: Negative values using latency marker > > Hi To

Re: ResultPartitionMetrics

2017-11-08 Thread Nico Kruber
Hi Aitozi, the difference is the scope: the normal metrics (without taskmanager.net.detailed-metrics) reflect _all_ buffers of a task while the detailed statistics are more fine-grained and give you statistics per input (or output) gate - the "total" there reflects the fact that each gate has mu

Re: Getting java.lang.ClassNotFoundException: for protobuf generated class

2017-11-13 Thread Nico Kruber
Hi Shankara, can you give us some more details, e.g. - how do you run the job? - how do you add/include the jar with the missing class? - is that jar file part of your program's jar or separate? - is the missing class, i.e. "com.huawei.ccn.intelliom.ims.MeasurementTable $measurementTable" (an inner

Re: AvroParquetWriter may cause task managers to get lost

2017-11-13 Thread Nico Kruber
Hi Ivan, sure, the more work you do per record, the slower the sink will be. However, this should not influence (much) the liveness checks inside flink. Do you get some meaningful entries in the TaskManagers' logs indicating the problem? I'm no expert on Avro and don't know how much actual work

Re: Flink HA Zookeeper Connection Timeout

2017-11-13 Thread Nico Kruber
Hi Sathya, have you checked this yet? https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/ jobmanager_high_availability.html I'm no expert on the HA setup, have you also tried Flink 1.3 just in case? Nico On Wednesday, 8 November 2017 04:02:47 CET Sathya Hariesh Prakash (sathypra)

Re: Streaming : a way to "key by partition id" without redispatching data

2017-11-13 Thread Nico Kruber
Hi Gwenhaël, several functions in Flink require keyed streams because they manage their internal state by key. These keys, however, should be independent of the current execution and its parallelism so that checkpoints may be restored to different levels of parallelism (for re-scaling, see [1]).

Re: JobManager shows TaskManager was lost/killed while TaskManger Process is still running and the network is OK.

2017-11-13 Thread Nico Kruber
>From what I read in [1], simply add JVM options to env.java.opts as you would when you start a Java program yourself, so setting "-XX:+UseG1GC" should enable G1. Nico [1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/ config.html#common-options On Friday, 15 September 2017

Re: Flink takes too much memory in record serializer.

2017-11-14 Thread Nico Kruber
We're actually also trying to have the serializer stateless in future and may be able to remove the intermediate serialization buffer which is currently growing on heap before we copy the data into the actual target buffer. This intermediate buffer grows and is pruned after serialization if it i

Re: Correlation between data streams/operators and threads

2017-11-17 Thread Nico Kruber
regarding 3. a) The taskmanager logs are missing, are there any? b) Also, the JobManager logs say you have 4 slots available in total - is this enough for your 5 devices scenario? c) The JobManager log, however, does not really reveal what it is currently doing, can you set the log level to DEBUG

Re: Getting java.lang.ClassNotFoundException: for protobuf generated class

2017-11-21 Thread Nico Kruber
Hi Shankara, sorry for the late response, but honestly, I cannot think of a reason that some of your program's classes (using only a single jar file) are found some others are not, except for the class not being in the jar. Or there's some class loader issue in the Flink Beam runner (which I fin

Re: Getting java.lang.ClassNotFoundException: for protobuf generated class

2017-11-22 Thread Nico Kruber
gt; > If jar -tf target/program.jar | grep MeasurementTable shows the class is > present, are there other dependencies missing? You may need to add runtime > dependencies into your pom or gradle.build file. > > On Tue, Nov 21, 2017 at 2:28 AM, Nico Kruber wrote: > > Hi S

Re: Correlation between data streams/operators and threads

2017-11-22 Thread Nico Kruber
t is used, and 3 are free. > > c) Attached > > d) Attached > > e) I'll try the debug mode in Eclipse. > > Thanks, > Shailesh > > On Fri, Nov 17, 2017 at 1:52 PM, Nico Kruber wrote: > > regarding 3. > > a) The taskmanager logs are missing, are th

Re: Flink 1.2.0->1.3.2 TaskManager reporting to JobManager

2017-11-28 Thread Nico Kruber
Hi Regina, can you explain a bit more on what you are trying to do and how this is set up? I quickly tried to reproduce locally by starting a cluster and could not see this behaviour. Also, can you try to increase the loglevel to INFO and see whether you see anything suspicious in the logs? Nico

Re: Checkpoint expired before completing

2017-12-01 Thread Nico Kruber
Hi Steven, by default, checkpoints time out after 10 minutes if you haven't used CheckpointConfig#setCheckpointTimeout() to change this timeout. Depending on your checkpoint interval, and your number of concurrent checkpoints, there may already be some other checkpoint processes running while you

Re: Blob server not working with 1.4.0.RC2

2017-12-04 Thread Nico Kruber
Hi Bernd, thanks for the report. I tried to reproduce it locally but both a telnet connection to the BlobServer as well as the BLOB download by the TaskManagers work for me. Can you share your configuration that is causing the problem? You could also try increasing the log level to DEBUG and see if

Re: Checkpoint expired before completing

2017-12-04 Thread Nico Kruber
> Stephan > > > On Fri, Dec 1, 2017 at 10:10 PM, Steven Wu <mailto:stevenz...@gmail.com>> wrote: > > Here is the checkpoint config. no concurrent checkpoints > with 2 minute checkpoint interval and timeout. > >

Re: AW: Blob server not working with 1.4.0.RC2

2017-12-06 Thread Nico Kruber
exposing 6124 > for the JobManager BLOB server. Thanks > > Bernd > > -Ursprüngliche Nachricht- > Von: Nico Kruber [mailto:n...@data-artisans.com] > Gesendet: Montag, 4. Dezember 2017 14:17 > An: Winterstein, Bernd; user@flink.apache.org > Betreff: Re: Blob s

Re: ProgramInvocationException: Could not upload the jar files to the job manager / No space left on device

2017-12-14 Thread Nico Kruber
Hi Regina, judging from the exception you posted, this is not about storing the file in HDFS, but a step before that where the BlobServer first puts the incoming file into its local file system in the directory given by the `blob.storage.directory` configuration property. If this property is not se

Re: Flink 1.4.0 can not override JAVA_HOME for single-job deployment on YARN

2017-12-14 Thread Nico Kruber
Hi, are you running Flink in an JRE >= 8? We dropped Java 7 support for Flink 1.4. Nico On 14/12/17 12:35, 杨光 wrote: > Hi, > I am usring flink single-job mode on YARN. After i upgrade flink > verson from 1.3.2 to 1.4.0, the parameter > "yarn.taskmanager.env.JAVA_HOME" doesn’t work as before. >

Re: Flink 1.4 with cassandra-connector: Shading error

2017-12-19 Thread Nico Kruber
Hi Dominik, nice assessment of the issue: in the version of the cassandra-driver we use there is even a comment about why: try { // prevent this string from being shaded Class.forName(String.format("%s.%s.channel.Channel", "io", "netty")); shaded = false; } catch (ClassNotFoundException e)

Re: S3 Access in eu-central-1

2018-01-02 Thread Nico Kruber
Sorry for the late response, but I finally got around adding this workaround to our "common issues" section with PR https://github.com/apache/flink/pull/5231 Nico On 29/11/17 09:31, Ufuk Celebi wrote: > Hey Dominik, > > yes, we should definitely add this to the docs. > > @Nico: You recently upd

Re: BackPressure handling

2018-01-02 Thread Nico Kruber
Hi Vishal, let me already point you towards the JIRA issue for the credit-based flow control: https://issues.apache.org/jira/browse/FLINK-7282 I'll have a look at the rest of this email thread tomorrow... Regards, Nico On 02/01/18 17:52, Vishal Santoshi wrote: > Could you please point me to any

Re: Testing Flink 1.4 unable to write to s3 locally

2018-01-03 Thread Nico Kruber
Hi Kyle, except for putting the jar into the lib/ folder and setting up credentials, nothing else should be required [1]. The S3ErrorResponseHandler class itself is in the jar, as you can see with jar tf flink-s3-fs-presto-1.4.0.jar | grep org.apache.flink.fs.s3presto.shaded.com.amazonaws.services

Re: ElasticSearch Connector for version 6.x and scala 2.11

2018-01-04 Thread Nico Kruber
Actually, Flink's netty dependency (4.0.27) is shaded away into the "org.apache.flink.shaded.netty4.io.netty" package now (since version 1.4) and should thus not clash anymore. However, other netty versions may come into play from the job itself or from the integration of Hadoop's classpath (if ava

Re: Exception on running an Elasticpipe flink connector

2018-01-04 Thread Nico Kruber
Hi Vipul, Yes, this looks like a problem with a different netty version being picked up. First of all, let me advertise Flink 1.4 for this since there we properly shade away our netty dependency (on version 4.0.27 atm) so you (or in this case Elasticsearch) can rely on your required version. Since

Re: Failed to serialize remote message [class org.apache.flink.runtime.messages.JobManagerMessages$JobFound

2018-01-16 Thread Nico Kruber
IMHO, this looks like a bug and it makes sense that you only see this with an HA setup: The JobFound message contains the ExecutionGraph which, however, does not implement the Serializable interface. Without HA, when browsing the web interface, this message is (probably) not serialized since it is

Re: How to get automatic fail over working in Flink

2018-01-16 Thread Nico Kruber
Hi James, In this scenario, with the restart strategy set, the job should restart (without YARN/Mesos) as long as you have enough slots available. Can you check with the web interface on http://:8081/ that enough slots are available after killing one TaskManager? Can you provide JobManager and Ta

Re: logging question

2018-01-16 Thread Nico Kruber
Just a guess, but probably our logging initialisation changes the global log level (see conf/log4j.properties). DataStream.collect() executes the program along with creating a local Flink "cluster" (if you are testing locally / in an IDE) and initializing logging, among other things. Please commen

Re: Parallel stream consumption

2018-01-16 Thread Nico Kruber
Hi Jason, I'd suggest to start with [1] and [2] for getting the basics of a Flink program. The DataStream API basically wires operators together with streams so that whatever stream gets out of one operator is the input of the next. By connecting both functions to the same Kafka stream source, your

Re: Parallel stream consumption

2018-01-16 Thread Nico Kruber
Just found a nice (but old) blog post that explains Flink's integration with Kafka: https://data-artisans.com/blog/kafka-flink-a-practical-how-to I guess, the basics are still valid Nico On 16/01/18 11:17, Nico Kruber wrote: > Hi Jason, > I'd suggest to start with [1] and [2]

Re: Unrecoverable job failure after Json parse error?

2018-01-16 Thread Nico Kruber
Hi Adrian, couldn't you solve this by providing your own DeserializationSchema [1], possibly extending from JSONKeyValueDeserializationSchema and catching the error there? Nico [1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/connectors/kafka.html#the-deserializationschema On

Re: Low throughput when trying to send data with Sockets

2018-01-16 Thread Nico Kruber
Hi George, I suspect issuing a read operation for every 68 bytes incurs too much overhead to perform as you would like it to. Instead, create a bigger buffer (64k?) and extract single events from sub-regions of this buffer instead. Please note, however, that then the first buffer will only be proce

Re: Low throughput when trying to send data with Sockets

2018-01-16 Thread Nico Kruber
w is that my source is not parallel, so when I am > increasing parallelism, system's throughput falls. > > Is opening multiple sockets a quick solution to make the source parallel? > > G. > > 2018-01-16 10:51 GMT+00:00 Nico Kruber <mailto:n...@data-artisans.com>&g

Re: Unrecoverable job failure after Json parse error?

2018-01-16 Thread Nico Kruber
o have a solution in hands. > Thanks again. > Adrian >   > > - Original message - > From: Nico Kruber > To: Adrian Vasiliu , user@flink.apache.org > Cc: > Subject: Re: Unrecoverable job failure after Json parse error? > Date: Tue, Jan 16,

Re: Caused by: java.lang.ClassNotFoundException: org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase

2018-02-26 Thread Nico Kruber
Judging from the code, you should separate different jars with a colon ":", i.e. "—addclasspath jar1:jar2" Nico On 26/02/18 10:36, kant kodali wrote: > Hi Gordon, > > Thanks for the response!! How do I add multiple jars to the classpaths? > Are they separated by a semicolon and still using one

Re: TaskManager crashes with PageRank algorithm in Gelly

2018-02-26 Thread Nico Kruber
Hi, without knowing Gelly here, maybe it has to do something with cleaning up the allocated memory as mentioned in [1]: taskmanager.memory.preallocate: Can be either of true or false. Specifies whether task managers should allocate all managed memory when starting up. (DEFAULT: false). When taskma

Re: Which test cluster to use for checkpointing tests?

2018-02-26 Thread Nico Kruber
Hi Ken, LocalFlinkMiniCluster should run checkpoints just fine. It looks like it was attempting to even create one but could not finish. Maybe your program was not fully running yet? Can you tell us a little bit more about your set up and how you configured the LocalFlinkMiniCluster? Nico On 23/

Re: Which test cluster to use for checkpointing tests?

2018-02-28 Thread Nico Kruber
Nico [1] https://lists.apache.org/thread.html/3121ad01f5adf4246aa035dfb886af534b063963dee0f86d63b675a1@1447086324@%3Cuser.flink.apache.org%3E On 26/02/18 22:55, Ken Krugler wrote: > Hi Nico, > >> On Feb 26, 2018, at 9:41 AM, Nico Kruber > <mailto:n...@data-artisa

Re: logging question

2018-02-28 Thread Nico Kruber
apache.log4j.ConsoleAppender > log4j.appender.console.Target=System.out > log4j.appender.console.layout=org.apache.log4j.PatternLayout > log4j.appender.console.layout.ConversionPattern=%d{-MM-dd > HH:mm:ss,SSS} %-5p %-60c %x - %m%n >   > # Suppress the irrelevant (wrong) warnings > log4j.logger.org.jboss.netty.chan

Re: Which test cluster to use for checkpointing tests?

2018-03-06 Thread Nico Kruber
Hi Ken, sorry, I was mislead by the fact that you are using iterations and those were only documented for the DataSet API. Running checkpoints with closed sources sounds like a more general thing than being part of the iterations rework of FLIP-15. I couldn't dig up anything on jira regarding this

Re: Which test cluster to use for checkpointing tests?

2018-03-06 Thread Nico Kruber
issue, though I prefer to wait > for ongoing changes in the network model and FLIP-6 to be finalised to apply > this change properly (are they?). > > Paris > >> On 6 Mar 2018, at 10:51, Nico Kruber wrote: >> >> Hi Ken, >> sorry, I was mislead by the fac

Re: Flink is looking for Kafka topic "n/a"

2018-03-06 Thread Nico Kruber
Hi Mu, which version of flink are you using? I checked the latest branches for 1.2 - 1.5 to look for findLeaderForPartitions at line 205 in Kafka08Fetcher but they did not match. From what I can see in the code, there is a MARKER partition state with topic "n/a" but that is explicitly removed from

Re: akka.remote.ReliableDeliverySupervisor Temporary failure in name resolution

2018-03-06 Thread Nico Kruber
Hi Miki, I'm no expert on the Kubernetes part, but could that be related to https://github.com/kubernetes/kubernetes/issues/6667? I'm not sure this is an Akka issue: if it cannot communicate with some address it basically blocks it from further connection attempts for a given time (here 5 seconds)

Re: Kafka offset auto-commit stops after timeout

2018-03-06 Thread Nico Kruber
Hi Edward, looking through the Kafka code, I do see a path where they deliberately do not want recursive retries, i.e. if the coordinator is unknown. It seems like you are getting into this scenario. I'm no expert on Kafka and therefore I'm not sure on the implications or ways to circumvent/fix th

Re: bin/start-cluster.sh won't start jobmanager on master machine

2018-03-06 Thread Nico Kruber
Hi Yesheng, `nohup /bin/bash -l bin/jobmanager.sh start cluster ...` looks a bit strange since it should (imho) be an absolute path towards flink. What you could do to diagnose further, is to try to run the ssh command manually, i.e. figure out what is being executed by calling bash -x ./bin/start

Re: Akka wants to connect with username "flink"

2018-03-06 Thread Nico Kruber
Hi Lukas, those are akka-internal names that you don't have to worry about. It looks like your TaskManager cannot reach the JobManager. Is 'jobmanager.rpc.address' configured correctly on the TaskManager? And is it reachable by this name? Is port 6123 allowed through the firewall? Are you sure th

Re: Flink is looking for Kafka topic "n/a"

2018-03-08 Thread Nico Kruber
I think, I found a code path (race between threads) that may lead to two markers being in the list. I created https://issues.apache.org/jira/browse/FLINK-8896 to track this and will have a pull request ready (probably) today. Nico On 07/03/18 10:09, Mu Kong wrote: > Hi Gordon, > > Thanks for y

Re: Incremental checkpointing performance

2018-03-19 Thread Nico Kruber
Hi Miyuru, Indeed, the behaviour you observed sounds strange and kind of go against the results Stefan presented in [1]. To see what is going on, can you also share your changes to Flink's configuration, i.e. flink-conf.yaml? Let's first make sure you're really comparing RocksDBStateBackend with v

Re: flink on mesos

2018-03-19 Thread Nico Kruber
Can you elaborate a bit more on what is not working? (please provide a log file or the standard output/error). Also, can you try a newer flink checkount? The start scripts have been merged into a single one for 'flip6' and 'old' - I guess, mesos-appmaster.sh should be the right script for you now.

Re: CsvSink

2018-03-19 Thread Nico Kruber
Hi Karim, when I was trying to reproduce your code, I got an exception with the name 'table' being used - by replacing it and completing the job with some input, I did see the csv file popping up. Also, the job was crashing when the file 1.txt already existed. The code I used (running Flink 1.5-SN

Re: Calling close() on Failure

2018-03-19 Thread Nico Kruber
Hi Gregory, I tried to reproduce the behaviour you described but in my case (Flink 1.5-SNAPSHOT, using the SocketWindowWordCount adapted to let the first flatmap be a RichFlatMapFunction with a close() method), the close() method was actually called on the task manager I did not kill. Since the clo

Re: Submiting jobs via UI/Rest API

2018-03-19 Thread Nico Kruber
Thanks for reporting these issues, 1. This behaviour is actually intended since we do not spawn any thread that is waiting for the job completion (which may or may not occur eventually). Therefore, the web UI always submits jobs in detached mode and you could not wait for job completion anyway. An

Re: Flink kafka connector with JAAS configurations crashed

2018-03-19 Thread Nico Kruber
Hi, I'm no expert on Kafka here, but as the tasks are run on the worker nodes (where the TaskManagers are run), please double-check whether the file under /data/apps/spark/kafka_client_jaas.conf on these nodes also contains the same configuration as on the node running the JobManager, i.e. an appro

Re: Flink web UI authentication

2018-03-19 Thread Nico Kruber
Hi Sampath, aside from allowing only certain origins via the configuration parameter "web.access-control-allow-origin", I am not aware of anything like username/password authentication. Chesnay (cc'd) may know more about future plans. You can, however, wrap a proxy like squid around the web UI if y

Re: Incremental checkpointing performance

2018-03-23 Thread Nico Kruber
l checkpoint must copy all current data. > If, between two checkpoints, you write more data than the contents of > the database, e.g. by updating a key multiple times, you may indeed have > more data to store. Judging from the state sizes you gave, this is > probably not the case. >

Re: How does setMaxParallelism work

2018-03-28 Thread Nico Kruber
am not able to figure out how Flink decides the >> parallelism value. >> For instance, if I setMaxParallelism to 3, I see that for my job, >> there is only 1 subtask that is created. How did Flink decide that >> 1 subtask was enough? >> >>

Re: timeWindow emits records before window ends?

2018-03-28 Thread Nico Kruber
t; aggFunction=nextgen.McdrAggregator@7d7758be}, EventTimeTrigger(), > WindowedStream.aggregate(WindowedStream.java:752)) -> Map > >   > > With “duration” 42s and “records sent” 689516. > >   > > I expected no records would be sent out until 18 ms elapse. > >   > &g

Re: How does setMaxParallelism work

2018-03-28 Thread Nico Kruber
28/03/18 13:21, Data Engineer wrote: > Agreed. But how did Flink decide that it should allot 1 subtask? Why not > 2 or 3? > I am trying to understand the implications of using setMaxParallelism vs > setParallelism > > On Wed, Mar 28, 2018 at 2:58 PM, Nico Kruber <mailto:n

Re: Master and Slave files

2018-03-28 Thread Nico Kruber
ey only used by bash scripts? > > Alex -- Nico Kruber | Software Engineer data Artisans Follow us @dataArtisans -- Join Flink Forward - The Apache Flink Conference Stream Processing | Event Driven | Real Time -- Data Artisans GmbH | Stresemannstr. 121A,10963 Berlin, Germany data Artisans, I

Re: All sub-tasks stuck in CREATED state after "BlobServerConnection: GET operation failed"

2018-03-29 Thread Nico Kruber
doesn't either > restart or make the YARN app exit as failed? > > > > My launch command is basically: > > flink-${FLINK_VERSION}/bin/flink run -m yarn-cluster -yn > ${NODE_COUNT} -ys ${SLOT_COUNT} -yjm >

Re: How does setMaxParallelism work

2018-03-30 Thread Nico Kruber
ink, where I start with parallelism of > (for example) 1, but Flink notices I have high volume of data to process, and > automatically increases parallelism of a running job? > > Thanks, > Alex > > -Original Message- > From: Nico Kruber [mailto:n...@data-artis

Re: Reg. the checkpointing mechanism

2018-04-06 Thread Nico Kruber
e JobManager logs, I see multiple entries saying "Checkpoint > triggered". > I would like to clarify whether the JobManager is triggering the > checkpoint every 200 ms? Or does the JobManager only initiate the > checkpointing and the TaskManager do it on its own? > > R

Re: Unsure how to further debug - operator threads stuck on java.lang.Thread.State: WAITING

2018-04-24 Thread Nico Kruber
runtime.operators.hash.MutableHashTable.pro >>>> >>>> <http://tors.hash.MutableHashTable.pro>cessProbeIter(MutableHashTable.java:505) >>>>   at >>>> >>>> org.apache.flink.runtime.operators.hash.Mut

Re: Batch job stuck in Canceled state in Flink 1.5

2018-05-03 Thread Nico Kruber
runtime.rpc.akka.FencedAkkaRpcActor.handleMessage(FencedAkkaRpcActor.java:69) > >> >> > at > >> >> > > >> >> > > >> >> > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$onReceive$1(AkkaRpcActor.java

Re: Batch job stuck in Canceled state in Flink 1.5

2018-05-22 Thread Nico Kruber
JM and impacted TM. > > Job ID 390a96eaae733f8e2f12fc6c49b26b8b > > -- > Thanks, > Amit > > On Thu, May 3, 2018 at 8:31 PM, Nico Kruber wrote: >> Also, please have a look at the other TaskManagers' logs, in particular >> the one that is running the operat

Re: PartitionNotFoundException after deployment

2018-06-05 Thread Nico Kruber
link.runtime.io > > <http://org.apache.flink.runtime.io>.network.partition.consumer.LocalInputChannel$1.run(LocalInputChannel.java:159) > >>    at java.util.TimerThread.mainLoop(Timer.java:555) > >>    at java.util.TimerThread.run(Timer.java:505) > >

Re: Running streaming job on every node of cluster

2017-02-27 Thread Nico Kruber
Hi Evgeny, I tried to reproduce your example with the following code, having another console listening with "nc -l 12345" env.setParallelism(2); env.addSource(new SocketTextStreamFunction("localhost", 12345, " ", 3)) .map(new MapFunction() { @Override

Re: Sliding Windows Processing problem with Kafka Queue Event Time and BoundedOutOfOrdernessGenerator

2017-02-27 Thread Nico Kruber
Hi Sujit, actually, according to https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/ windows.html#allowed-lateness the sliding window should fire each time for each element arriving late. Did you set the following for your window operator? .window() .allowedLateness() The exp

Re: Running streaming job on every node of cluster

2017-02-27 Thread Nico Kruber
oyment modes allow it. > > BR, Evgeny. > > От: Nico Kruber<mailto:n...@data-artisans.com> > Отправлено: 27 февраля 2017 г. в 20:07 > Кому: user@flink.apache.org<mailto:user@flink.apache.org> > Копия: Evgeny Kincharov<mailto:evgeny_kincha...@epam.com> > Те

Re: Running streaming job on every node of cluster

2017-02-27 Thread Nico Kruber
this may also be a good read: https://ci.apache.org/projects/flink/flink-docs-release-1.2/concepts/ runtime.html#task-slots-and-resources On Monday, 27 February 2017 18:40:48 CET Nico Kruber wrote: > What about setting the parallelism[1] to the total number of slots in your > cluster

Re: Running streaming job on every node of cluster

2017-02-28 Thread Nico Kruber
continue this discussion in dev mail list? Nico, what do you > think? > > BR, Evgeny. > > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/config.ht > ml#configuring-taskmanager-processing-slots > От: Nico Kruber<mailto:n...@data-arti

Re: OutOfMemory error (Direct buffer memory) while allocating the TaskManager off-heap memory

2017-03-06 Thread Nico Kruber
Hi Yassine, Thanks for reporting this. The problem you run into is due to start-local.sh which we discourage in favour of start-cluster.sh that resembles real use case better. In your case, start-local.sh starts a job manager with an embedded task manager but does not parse the task manager con

Re: questions on custom state with flink window

2017-03-10 Thread Nico Kruber
Hi Sai, 3) If you want to make "Managed Keyed State" queryable, you have to set it as queryable through the API, e.g.: final ValueStateDescriptor query1State = new ValueStateDescriptor<>("stateName", Long.class); query1State.setQueryable("queryName"); The

Re: Queryable State

2017-03-13 Thread Nico Kruber
Hi Chet, the following thins may create the error you mentioned: * the job ID of the query must match the ID of the running job * the job is not running anymore * the queryableStateName does not match the string given to setQueryable("query-name") * the queried key does not exist (note that you ne

Re: has insufficient permissions to access it - Error

2017-04-11 Thread Nico Kruber
Hi Marc, the file path doesn't look quite right, unless you really have such an (absolute!) file path. Nico On Saturday, 8 April 2017 17:41:28 CEST Kaepke, Marc wrote: > Hi, > > if I run my small Gelly application on IntelliJ (macOS and Ubuntu as well) I > have this error: > > Caused by: java.

Re: Docker PID 1

2017-04-11 Thread Nico Kruber
Hi Kat, yes, this looks like it may be an issue, please create the Jira ticket. Some background: Although docker-entrypoint.sh uses "exec" to run succeeding bash scripts for jobmanager.sh and taskmanager.sh, respectively, and thus replaces itself with these scripts, they do not seem to use exec

  1   2   >