from:"Averell"

JSON to Parquet

2020-08-20 Thread Averell

JSON -> AVRO (GenericRecord) -> Parquet. Is that an option? I would want to be able to quickly/dynamically (as less code change as possible) change the JSON schema. Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

SAX2 driver class org.apache.xerces.parsers.SAXParser not found

2020-08-27 Thread Averell

Hello, I have a Flink 1.10 job which runs in AWS EMR, checkpointing to S3a as well as writing output to S3a using StreamingFileSink. The job runs well until I add the Java Hadoop properties: /-Dfs.s3a.acl.default= BucketOwnerFullControl/. Since after that, the checkpoint process fails to complete

Re: JSON to Parquet

2020-08-27 Thread Averell

stion now. Thanks for your help. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: SAX2 driver class org.apache.xerces.parsers.SAXParser not found

2020-08-31 Thread Averell

. However, intermittenly my checkpoints still fail (about 10%). And whenever it fails, there are non-completed files left in S3 (attached screenshot below). I'm not sure whether those uncompleted files are expected, or is that a bug? Thanks and regards, Averell <http://apache-flink-user-mail

Re: SAX2 driver class org.apache.xerces.parsers.SAXParser not found

2020-09-12 Thread Averell

nd regards Averell <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1586/FlinkFileSink.png> -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

HA on AWS EMR

2020-10-18 Thread Averell

ashed. Do I miss something here? Thanks for your help. Regards, Averell [1] https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/yarn_setup.html -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: HA on AWS EMR

2020-10-19 Thread Averell

27;s dataDir, which is by default stored in the local storage of the EMR's master node. I'm trying to move this one to an EFS, in hope that it would help. Not sure whether this is a right approach. Thanks for your help. Regards, Averell [1] http://apache-flink-user-mailing-list-a

Re: HA on AWS EMR

2020-10-21 Thread Averell

some error messages attached below. Not sure that's a bug or expected behaviour. Thanks and best regards, Averell /07:39:33.906 [main-EventThread] ERROR org.apache.flink.shaded.curator4.org.apache.curator.ConnectionState - Authentication failed 07:40:11.585 [flink-akka.actor.defau

KryoException UnsupportedOperationException when writing Avro GenericRecords to Parquet

2020-10-21 Thread Averell

ing. I tried with both Flink 1.10.0 and 1.11.0, and currently stuck at this. Could you please help? Thanks and regards, Averell /com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException Serialization trace: reserved (org.apache.avro.Schema$Field) fieldMap (org.apa

Re: KryoException UnsupportedOperationException when writing Avro GenericRecords to Parquet

2020-10-26 Thread Averell

Hello Till, Adding GenericRecordAvroTypeInfo(schema) does help. Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: HA on AWS EMR

2020-10-27 Thread Averell

Hello Robert, Thanks for the info. That makes sense. I will save and cancel my jobs with 1.10, upgrade to 1.11, and restore the jobs from the savepoints. Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Job is still in ACTIVE state after /jobs/:jobid/stop

2020-11-12 Thread Averell

, "now": 1605182656731, "timestamps": { "CANCELLING": 0, "FAILING": 0, "CANCELED": 0, "FINISHED": 0, "RUNNING": 1604016319495, "FAILED": 0, "RESTARTING": 0, "CREATED":

Re: Job is still in ACTIVE state after /jobs/:jobid/stop

2020-11-12 Thread Averell

I have some updates. Some weird behaviours were found. Please refer to the attached photo. All requests were sent via REST API The status of the savepoint triggered by that stop request (ID 11018) is "COMPLETED [Savepoint]", however, no checkpoint data has been persisted (in S3). The folder /`sav

Change to StreamingFileSink in Flink 1.10

2020-04-20 Thread Averell

tBuilder[_]] .withRollingPolicy(...) returns a RowFormatBuilder[_] .withBucketAssigner(...) returns Any/ I'm using Maven 3.6.0, Java 1.8.0_242, and Scala 2.11.12. Tried with/without IntelliJ, no difference. Not sure/understand what's wrong Thanks! Averell -- Sent from: http://apache-flink

Re: Change to StreamingFileSink in Flink 1.10

2020-04-20 Thread Averell

nd regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Change to StreamingFileSink in Flink 1.10

2020-04-20 Thread Averell

igner)*.asInstanceOf[RowFormatBuilder[IN, String, _]]* .build()/ Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Change to StreamingFileSink in Flink 1.10

2020-04-21 Thread Averell

Hello Leonard, Sivaprasanna, But my code was working fine with Flink v1.8. I also tried with a simple String DataStream, and got the same error. /StreamingFileSink .forRowFormat(new Path(path), new SimpleStringEncoder[String]()) .withRollingPolicy(DefaultRollingPolicy.b

Re: Change to StreamingFileSink in Flink 1.10

2020-04-22 Thread Averell

Thanks @Seth Wiesman and all. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

K8s native - checkpointing to S3 with RockDBStateBackend

2020-04-23 Thread Averell

but it didn't help (is it already too late? should that be there before the JM is started?) Thanks for your help. Averell / Caused by: org.apache.flink.util.FlinkRuntimeException: Failed to create checkpoint storage at checkpoint coordinator side. at org.apache.flink.runtime.ch

Re: Question about Scala Case Class and List in Flink

2020-04-24 Thread Averell

lds, and must be processed as GenericType. Please read the Flink documentation on "Data Types & Serialization" for details of the effect on performance./" I imported /org.apache.flink.streaming.api.scala._/ << is this enough to tell that I am using Scala API? Thanks and regar

Re: K8s native - checkpointing to S3 with RockDBStateBackend

2020-04-24 Thread Averell

Thank you Yun Tang. Building my own docker image as suggested solved my problem. However, I don't understand why I need that while I already had that s3-hadoop jar included in my uber jar? Thanks. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.23360

Re: K8s native - checkpointing to S3 with RockDBStateBackend

2020-04-26 Thread Averell

Hi David, Yang, Thanks. But I just tried to submit the same job on a YARN cluster using that same uberjar, and it was successful. I don't have flink-s3-fs-hadoop.jar anywhere in the lib or plugin folder. Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-ar

Flink Metrics in kubernetes

2020-05-12 Thread Averell

gs that I need to care for when running in Kubernetes? Thanks a lot. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Flink Metrics in kubernetes

2020-05-12 Thread Averell

ve.2336050.n4.nabble.com/file/t1586/k8xDump.txt> . Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Flink Metrics in kubernetes

2020-05-13 Thread Averell

Hi Gary, Sorry for the false alarm. It's caused by a bug in my deployment - no metrics were added into the registry. Sorry for wasting your time. Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Automatically resuming failed jobs in K8s

2020-06-10 Thread Averell

toring the job, I need to provide the full path of the last checkpoint (/s3:chk-2345//). Is there any option to just provide the base_path? 3. Store the info to restore the jobs in the K8s deployment config Thanks a lot. Regards, Averell -- Sent from: http://apache-flink-user-mailing-li

Re: Automatically resuming failed jobs in K8s

2020-06-12 Thread Averell

Thank you very much, Yang. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Preserving (best effort) messages order between operators

2019-10-30 Thread Averell

uld throttle the 1st operator when back-pressure is high, then I could mitigate the mentioned problem. But I could not find any guide on doing that. Could you please help? Thanks. Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: How to stream intermediate data that is stored in external storage?

2019-10-31 Thread Averell

Hi Kant, I wonder why you need to "source" your intermediate state from files? Why not "source" it from the previous operator? I.e. instead of (A join B) -> State -> files -> (C), why not do (A join B) -> State -> (files + C)? -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.

Re: How to stream intermediate data that is stored in external storage?

2019-10-31 Thread Averell

ready-to-use options. Writing a custom sink function to write to your own SQL server is also a not-so-difficult solution. Best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Preserving (best effort) messages order between operators

2019-11-01 Thread Averell

hanks. Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

S3 file source - continuous monitoring - many files missed

2018-07-23 Thread Averell

Good day everyone, I have a Flink job that has an S3 folder as a source, and we keep putting thousands of small (around 1KB each) gzip files into that folder, with the rate of about 5000 files per minute. Here is how I created that source in Scala: / val my_input_format = new TextInputFormat(n

Re: S3 file source - continuous monitoring - many files missed

2018-07-24 Thread Averell

Hi Jörn, Thanks. I had missed that EMRFS strong consistency configuration. Will try that now. We also had a backup solution - using Kinesis instead of S3 (I don't see Kinesis in your suggestion, but hope that it would be alright). "/The small size and high rate is not suitable for S3 or HDFS/" <<<

Re: S3 file source - continuous monitoring - many files missed

2018-07-24 Thread Averell

Just some update: I tried to enable "EMRFS Consistent View" option, but it didn't help. Not sure whether that's what you recommended, or something else. Thanks! -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: S3 file source - continuous monitoring - many files missed

2018-07-24 Thread Averell

Could you please help explain more details on "/try read after write consistency (assuming the files are not modified) /"? I guess that the problem I got comes from the inconsistency in S3 files listing. Otherwise, I would have got exceptions on file not found. My use case is to read output files

Re: S3 file source - continuous monitoring - many files missed

2018-07-24 Thread Averell

Hello Jörn. Thanks for your help. "/Probably the system is putting them to the folder and Flink is triggered before they are consistent./" <<< yes, I also guess so. However, if Flink is triggered before they are consistent, either (a) there should be some error messages, or (b) Flink should be abl

Re: S3 file source - continuous monitoring - many files missed

2018-07-24 Thread Averell

Thank you Fabian. I tried to implement a quick test basing on what you suggested: having an offset from system time, and I did get improvement: with offset = 500ms - the problem has completely gone. With offset = 50ms, I still got around 3-5 files missed out of 10,000. This number might come from

Re: S3 file source - continuous monitoring - many files missed

2018-07-24 Thread Averell

Hello Fabian, I created the JIRA bug https://issues.apache.org/jira/browse/FLINK-9940 BTW, I have one more question: Is it worth to checkpoint that list of processed files? Does the current implementation of file-source guarantee exactly-once? Thanks for your support. -- Sent from: http://ap

Re: S3 file source - continuous monitoring - many files missed

2018-07-25 Thread Averell

Thank you Fabian for the guide to implement the fix. I'm not quite clear about the best practice of creating JIRA ticket. I modified its priority to Major as you said that it is important. What would happen next with that issue then? Someone (anyone) will pick it and create a fix, then include tha

Re: S3 file source - continuous monitoring - many files missed

2018-07-30 Thread Averell

Here is my https://github.com/lvhuyen/flink implementation of the change. 3 files were updated: StreamExecutionEnvironment.java, StreamExecutionEnvironment.scala, and ContinuousFileMonitoringFunction.java. All the thanks to Fabian. -- Sent from: http://apac

Small-files source - partitioning based on prefix of file

2018-07-30 Thread Averell

und-robin. However, I could not find the call of "rebalancing" method, but "transform" is called. Not much information about that "transform" method though. Would it possible for me to ask for some guideline on this? Thanks for your help. Averell -- Sent from: http

Re: Small-files source - partitioning based on prefix of file

2018-07-30 Thread Averell

Thank you Vino. Yes, I went thru that official guide before posting this question. The problem was that I could not see any call to one of those mentioned partitioning methods (partitionCustom, shuffle, rebalance, rescale, or broadcast) in the original readFile function. I'm still trying to look i

Re: Small-files source - partitioning based on prefix of file

2018-07-30 Thread Averell

Thanks Vino. Yes, I can do that after the source function. But that means data would be shuffled - sending from every source to the right partition. I think that by doing the partition from within the file source would help to save that shuffling. Thanks. Averell. -- Sent from: http://apache

Re: Small-files source - partitioning based on prefix of file

2018-07-30 Thread Averell

Oh, Thank you Vino. I was not aware of that reshuffling after every custom partitioning. Why would that needed though? Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Small-files source - partitioning based on prefix of file

2018-07-30 Thread Averell

Hi Vino, I'm a little bit confused. If I can do the partitioning from within the source function, using the same hash function on the key to identify the partition, would that be sufficient to avoid shuffling in the next byKey call? Thanks. Averell -- Sent from: http://apache-flink

Re: Detect late data in processing time

2018-07-30 Thread Averell

Hi Soheil, Why don't you just use the processing time as event time? Simply overriding extractTimestamp to return your processing time. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Small-files source - partitioning based on prefix of file

2018-07-31 Thread Averell

doing that yet. Thank you very much for your support. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Small-files source - partitioning based on prefix of file

2018-08-07 Thread Averell

Thank you Fabian. "/In either case, some record will be read twice but if reading position can be reset, you can still have exactly-once state consistency because the state is reset as well./" I do not quite understand this statement. If I have read 30 lines from the checkpoint and sent those 30 r

Re: Small-files source - partitioning based on prefix of file

2018-08-09 Thread Averell

Thank you Vino and Fabien for your help in answering my questions. As my files are small, I think there would not be much benefit in checkpointing file offset state. Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Small-files source - partitioning based on prefix of file

2018-08-09 Thread Averell

. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Small-files source - partitioning based on prefix of file

2018-08-10 Thread Averell

stable/dev/stream/state/state.html> . However, I have not been able to find the information I am looking for. Please help point me to the right place. Thanks and best regards, Averell. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Small-files source - partitioning based on prefix of file

2018-08-12 Thread Averell

Thank you Fabian. It is clear to me now. Thanks a lot for your help. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

CoFlatMapFunction with more than two input streams

2018-08-14 Thread Averell

licating the CoFlatMapFunction (*). With option 2, there's additional cost coming from unioning, splitting/selecting, and type-casting at the final streams. Is there any better option for me? Thank you very much for your support. Regards, Averell (*) I am using Scala, and I tried to c

Re: CoFlatMapFunction with more than two input streams

2018-08-15 Thread Averell

Regarding using Either wrapper, as my understanding, I would need to use that both in my sources (stream_A and B) and in the CoProcess/CoFlatMapFunction. Then using a super class Animal would be more convenient, wouldn't it? Thanks and regards, Averell -- Sent from: http://apache-flink-use

Re: CoFlatMapFunction with more than two input streams

2018-08-15 Thread Averell

quot;).map(_ match { case Right(d) => d case _ => NON_EXIST_DOG }) val transformed_cat = transformed.select("cat").map(_ match { case Left(c) => c case _ => NON_EXIST_CAT }) Thanks! Avere

Initial value of ValueStates of primitive types

2018-08-17 Thread Averell

Hi, In Flink's documents, I couldn't find any example that uses primitive type when working with States. What would be the initial value of a ValueState of type Int/Boolean/...? The same question apply for MapValueState like [String, Int] Thanks and regards, Averell -- Sent

Re: Initial value of ValueStates of primitive types

2018-08-17 Thread Averell

licit conversion: val y:Integer = getRuntimeContext.getState(new ValueStateDescriptor[Int]("Triggered", classOf[Int])).value() >> y = {Integer@5795} "0" Thanks for your time. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Raising a bug in Flink's unit test scripts

2018-08-24 Thread Averell

FLINK-9940 bug fix, or I have to raise a separated bug? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-24 Thread Averell

of: (1) flink application killed (2) cluster crashed (3) one of the servers in the cluster crashed (4) unhandled exception raised when abnormal data received ... Could you please help explain? Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-26 Thread Averell

Thank you Vino. I sometimes got the error message like the one below. It looks like my executors got overloaded. Here I have another question: is there any existing solution that allows me to have the job restored automatically? Thanks and best regards, Averell -- Sent from: http://apache

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-27 Thread Averell

tored automatically. I am trying to test my scenarios now. Found some issues, and I think it would be better to ask in a separate thread. Thanks and regards, Averell = org.apache.flink.client.program.ProgramInvocationException: Job failed. (JobID: 457d8f370ef8a50bb462946e1f12b80e)

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-27 Thread Averell

Hi Vino, Could you please tell where I should find the JM and TM logs? I'm running on an AWS EMR using yarn. Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Small-files source - partitioning based on prefix of file

2018-08-27 Thread Averell

this is the expected behaviour, as my current checkpoint config is to checkpoint every 10s, and it took only a second or two for the listing of those 20K files. Am I correct here? And do we have a solution for this? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-lis

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-27 Thread Averell

Thank you, Vino. I found it, http://:8088/ Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Small-files source - partitioning based on prefix of file

2018-08-28 Thread Averell

files per minute) * I set the checkpointing interval = 1 minute. In this example, /will the 1st barrier be injected into the stream of file-splits 50 seconds after the 10,000th split, or after the 2,000th one?/ Sorry for being confusing. Thanks and best regards, Averell -- Sent from:

Re: Small-files source - partitioning based on prefix of file

2018-08-28 Thread Averell

Thank you Fabian. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

ElasticSearch 6 - error with UpdateRequest

2018-08-30 Thread Averell

Indexer: RequestIndexer): Unit = { requestIndexer.add(upsertRequest(element)) } What could be the issue here? Thanks for your help. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: ElasticSearch 6 - error with UpdateRequest

2018-08-31 Thread Averell

t vs in my project? If you have some spare time, please help explain. I also would like to know the other way to fix that issue (that you implemented in your branch). Thanks a lot for your help. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Promethus - custom metrics at job level

2018-09-02 Thread Averell

-flink-user-mailing-list-archive.2336050.n4.nabble.com/Service-discovery-for-Prometheus-on-YARN-td21988.html, which looks like a "no" answer to my question. However, I still hope for some easy to use solution. Thanks and best regards, Averell -- Sent from: http://apache-flink-user-ma

Re: Promethus - custom metrics at job level

2018-09-03 Thread Averell

Thank you Reza. I will try your repo first :) Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Logging metrics from within Elasticsearch ActionRequestFailureHandler

2018-09-12 Thread Averell

had some better solution. Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Logging metrics from within Elasticsearch ActionRequestFailureHandler

2018-09-13 Thread Averell

hanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Utilising EMR's master node

2018-09-16 Thread Averell

or the JM only? If that is not possible, should I have different hardware configurations between the master node and core nodes (smaller server for the master)? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Utilising EMR's master node

2018-09-17 Thread Averell

that be an expected behaviour? Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Utilising EMR's master node

2018-09-19 Thread Averell

Hi Gary, Thanks for your help. Regarding TM configurations, in term of performance, when my 2 servers have 16 vcores each, should I have 2 TMs with 16GB mem, 16 task slots each, or 8 TMs with 4GB mem and 4 task slots each? Thanks and regards, Averell -- Sent from: http://apache-flink-user

Strange behaviour with checkpointing and custom FilePathFilter

2018-09-20 Thread Averell

monitor generates 0 records during that 15-20 minutes period) Could someone please help with this? Thank you very much for your time. Best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-20 Thread Averell

the global level would be too heavy. Thanks and regards, Averell == import java.util.Date import org.apache.flink.api.common.io.FilePathFilter import org.apache.flink.core.fs.Path import org.slf4j.LoggerFactory object SdcFilePathFilter { private val TIME_FORMAT = new

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-20 Thread Averell

Please refer to this version: === import java.util.Date import org.apache.flink.api.common.io.FilePathFilter import org.apache.flink.core.fs.Path import org.slf4j.LoggerFactory object SdcFilePathFilter { private val TIME_FORMAT = new java.text.SimpleDateFormat("MMdd hhmm

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-23 Thread Averell

help. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-24 Thread Averell

, checkpointing process is not triggered though. Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Scheduling sources

2018-09-24 Thread Averell

help give some help? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-24 Thread Averell

to ongoing checkpoint. Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-25 Thread Averell

this 30s timer? Thanks and regards, Averell 2018-09-25 12:01:13.222 [Canceler/Interrupts for Source: Custom File Source (1/1) (a5f5434070044510eafc9103bc24af43).] WARN org.apache.flink.runtime.taskmanager.Task - Task 'Source: Custom File Source (1/1)' did not react to cancelling sig

Re: Scheduling sources

2018-09-25 Thread Averell

) gets enriched properly, I want to have (2a) loaded properly into memory before starting to process (1). Is there any walkaround solution for me in this case? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Utilising EMR's master node

2018-09-26 Thread Averell

configurations then still 4 TMs created, but the occupied ones are always on two different servers. I'm not sure whether that's EMR's issue, or YARN's or Flink's. Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Scheduling sources

2018-09-26 Thread Averell

Hi Kostas, So that means my 2a will be broadcasted to all TMs? Is that possible to partition that? As I'm using CoProcessFunction to join 1 with 2. Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Scheduling sources

2018-09-26 Thread Averell

Hi Tison, "/setting a timer on schedule start and trigger (1) after a fixed delay/" would be quite sufficient for me. Looking forward to the change of that Jira ticket's status. Thanks for your help. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-arc

Metrics: (1) Histogram in prometheus and (2) meter by event_time

2018-09-26 Thread Averell

ogram.png> I'm using Prometheus with Grafana. Is that possible to do what I mentioned? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Metrics: (1) Histogram in prometheus and (2) meter by event_time

2018-09-27 Thread Averell

tp://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1586/Histo2.png> Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Metrics: (1) Histogram in prometheus and (2) meter by event_time

2018-09-28 Thread Averell

oring one) will be separated? Is that possible to disable checkpointing for that 2nd branch? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Difference between BucketingSink and StreamingFileSink

2018-10-04 Thread Averell

to parquet? Is that possible to partition my output basing on event-time? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Difference between BucketingSink and StreamingFileSink

2018-10-04 Thread Averell

Hi, https://issues.apache.org/jira/browse/FLINK-9749 <<< as per this ticket, StreamingFileSink is a newer option, which is better than BucketingSink for Parquet. Would love to see some example one using that. Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mai

Re: Difference between BucketingSink and StreamingFileSink

2018-10-04 Thread Averell

rmat(new java.util.Date(in.getTimestamp))}" } override def getSerializer = SimpleVersionedStringSerializer.INSTANCE } Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Identifying missing events in keyed streams

2018-10-04 Thread Averell

not. Haven't found a guide for defining patterns of missing events. Could you please give some advices? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Streaming to Parquet Files in HDFS

2018-10-04 Thread Averell

ects/flink/flink-docs-master/dev/connectors/streamfile_sink.html> , buck-enconding can only combined with OnCheckpointRollingPolicy, which rolls on every checkpoint. So setting that CheckInterval makes no difference. So why should we expose that withBucketCheckInterval method? Thanks a

Re: Streaming to Parquet Files in HDFS

2018-10-05 Thread Averell

ady-to-use solution that supports copying/moving file from HDFS to S3 (something like a trigger from Flink after it has finished writing to HDFS). Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Streaming to Parquet Files in HDFS

2018-10-05 Thread Averell

What a great news. Thanks for that, Kostas. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Streaming to Parquet Files in HDFS

2018-10-05 Thread Averell

HADOOP configurations? Thanks and best regards, Averell java.lang.Exception: unable to establish the security context at org.apache.flink.runtime.security.SecurityUtils.install(SecurityUtils.java:73) at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1118) Caused by

"unable to establish the security context" with shaded Hadoop S3

2018-10-05 Thread Averell

stacktrace below. The shading of hadoop jars started from this ticket FLINK-10366 <https://issues.apache.org/jira/browse/FLINK-10366> . Googling the error didn't help. Could someone please help me? Thanks and best regards, Averell /Setting HADOOP_CONF_DIR=/etc/hadoop/conf be

Re: Utilising EMR's master node

2018-10-06 Thread Averell

Hi Gary, Thanks for the information. I didn't know that -yn is obsolete :( I am using Flink 1.6. Not sure whether that's a bug when I tried to set -yn explicitly, but I started only 1 cluster. Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.

1 2 >

1 - 100 of 166 matches

Mail list logo