Re: Streaming to Parquet Files in HDFS

2018-10-06 Thread Averell
ask 0 closing in-progress part file for bucket id=dt=2018-09-22 on checkpoint. 2018-10-06 14:40:18.254 [Sink: Unnamed (1/1)] DEBUG o.a.flink.streaming.api.functions.sink.filesystem.Buckets - Subtask 0 checkpointing: BucketState for bucketId=dt=2018-09-22 and bucketPath=s3a://assn-averell/Test/output/d

Re: Streaming to Parquet Files in HDFS

2018-10-07 Thread Averell
;." >From Flink GUI, all checkpoints were shown as completed successfully. How could I debug further? Thanks a lot for your help. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Streaming to Parquet Files in HDFS

2018-10-07 Thread Averell
is not valid? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Error restoring from savepoint while there's no modification to the job

2018-10-10 Thread Averell
Could you please help give a look? Thanks and best regards, Averell taskmanager.gz <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1586/taskmanager.gz> org.apache.flink.client.program.ProgramInvocationException: Job failed. (JobID: 606ad5239f5e23cedb85d3e75bf7

Re: Error restoring from savepoint while there's no modification to the job

2018-10-10 Thread Averell
r HDFS (I tried multiple times), and had not been moved. Is there any kind of improper user code can cause such error? Thanks for your support. Best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Error restoring from savepoint while there's no modification to the job

2018-10-10 Thread Averell
using as keys are of types either String or Long. For this, I don't have to define equals and hashcode method, do I? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Error restoring from savepoint while there's no modification to the job

2018-10-10 Thread Averell
Hi Kostas, No, the same code was used. I (1) started the job, (2) created a savepoint, (3) cancelled the job, (4) restored the job with the same command as in (1) with the addition "-s ". Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Streaming to Parquet Files in HDFS

2018-10-11 Thread Averell
te MultiPartUpload on Test/output/dt=2018-09-20/part-7-5: org.apache.flink.fs.s3base.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool/ 4. Where is the temporary folder that you store the parquet file before uploading to S3? Tha

Re: Identifying missing events in keyed streams

2018-10-11 Thread Averell
minimum level. Could you please explain why (2) is better? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Identifying missing events in keyed streams

2018-10-13 Thread Averell
Thank you Fabian. Tried (2), and it's working well. I found one more benefit of (2) over (3) is that it allow me to easily raise multiple levels of alarms for each keyed stream (i.e: minor: missed 2 cycles, major: missed 5 cycles,...) Thanks for your help. Regards, Averell -- Sent from:

Re: When does Trigger.clear() get called?

2018-10-13 Thread Averell
. And one related question: for keyed streams, if I know that some keys would never have new events anymore, should/could I remove those streams corresponding to those keys so that I can save some memory allocated to the metadata? Thanks and best regards, Averell -- Sent from: http://apache-flink

Re: When does Trigger.clear() get called?

2018-10-13 Thread Averell
r, so how can I clear those streams? Thank you very much for your help. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Error restoring from savepoint while there's no modification to the job

2018-10-14 Thread Averell
typeInfo, reader);/ Does this create two different operators? If yes, then it seems impossible to assign a UID to the 1st operator. And might it be the cause for my problem? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: When does Trigger.clear() get called?

2018-10-15 Thread Averell
Thank you Fabian. All my doubts are cleared now. Best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Error restoring from savepoint while there's no modification to the job

2018-10-15 Thread Averell
point that I notice is the error doesn't stay on one single operator but changes from time to time (even within the same build). For example, the previous exception I quoted was with a Window operator, while the one below is with CoStreamFlatMap. Thanks and best regards, Averell

Re: Error restoring from savepoint while there's no modification to the job

2018-10-15 Thread Averell
rators. However, as I mentioned from the 1st email, I got errors when restoring savepoint created by the same version of my application. Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

No resource available error while testing HA

2019-01-22 Thread Averell
ed, or a new JobManager will try to connect to the running TMs to resume the job? Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: No resource available error while testing HA

2019-01-23 Thread Averell
on to zookeeper and the problem was solved. Then I have another question: when JM cannot start/connect to the JM on .88, why didn't it try on .82 where resource are still available? Thanks and regards, Averell Here is the JM log (from /mnt/var/log/hadoop-yarn/.../jobmanager.log on .82)

Re: No resource available error while testing HA

2019-01-25 Thread Averell
ot in jobmanager.log or in taskmanager.log in EMR's hadoop-yarn logs folder). <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1586/Screen_Shot_2019-01-25_at_22.png> Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Checkpoints failed with NullPointerException

2019-01-29 Thread Averell
tractStreamOperator.snapshotState(AbstractStreamOperator.java:407) ... 13 more Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Checkpoints failed with NullPointerException

2019-01-29 Thread Averell
I tried to create a savepoint on HDFS, and got the same exception: The program finished with the following exception: org.apache.flink.util.FlinkException: Triggering a savepoint for the job 028e392d02bd229ed08f50a2da5227e2 failed.

Re: No resource available error while testing HA

2019-01-29 Thread Averell
[...] switched from state" followed by a > stacktrace. If you cannot find the exception, the problem might be rooted > in > your log4j or logback configuration. Thanks. I got the point. I am using logback. Tried to configure rolling logs, but not yet success yet. Will ne

Re: No resource available error while testing HA

2019-01-31 Thread Averell
g the logs collected via "yarn logs --applicationId here. But it seems I still missed something. I am using Flink 1.7.1, with yarn-site configuration yarn.resourcemanager.am.max-attempts=5. Flink configurations are all of the default values. Thanks and best regards, Averell flink.log <http:

How to use ExceptionUtils.findThrowable() in Scala

2019-02-07 Thread Averell
flict_engine_exception, reason=[_doc][...]: version conflict, document already exists (current version [1])] / Thanks and best regards, Averell [1] handling-failing-elasticsearch-requests <https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/elasticsearch.html#handli

Re: No resource available error while testing HA

2019-02-07 Thread Averell
Hi Gary, I am trying to reproduce that problem. BTW, is that possible to change log level (I'm using logback) for a running job? Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: How to use ExceptionUtils.findThrowable() in Scala

2019-02-07 Thread Averell
P/S: This is the full stack trace 2019-02-07 01:53:12.790 [I/O dispatcher 16] ERROR o.a.f.s.connectors.elasticsearch.ElasticsearchSinkBase - Failed Elasticsearch item request: [...][[...][1]] ElasticsearchException[Elasticsearch exception [type=version_conflict_engine_exception, reason=[_doc][...

ElasticSearchSink - retrying doesn't work in ActionRequestFailureHandler

2019-02-08 Thread Averell
7c4ef9a59ff9da6dafa1 expired before completing. 2019-02-09 04:39:30.961 [Checkpoint Timer] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 26 @ 1549687170776 for job 1a1438ca23387c4ef9a59ff9da6dafa1./ Thanks and best regards, Averell [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/e

Re: ElasticSearchSink - retrying doesn't work in ActionRequestFailureHandler

2019-02-13 Thread Averell
Thank you Gordon. That's my exact problem. Will try the fix in 1.7.2 now. Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: No resource available error while testing HA

2019-02-13 Thread Averell
regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: ElasticSearchSink - retrying doesn't work in ActionRequestFailureHandler

2019-02-13 Thread Averell
Hi Gordon, Sorry for a noob question: How can I get the RC 1.7.2 build / code to build? I could not find any branch like that in Github. Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: ElasticSearchSink - retrying doesn't work in ActionRequestFailureHandler

2019-02-13 Thread Averell
also doesn't have that new class. Maybe Gordon meant 1.7.2 rc2? Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: ElasticSearchSink - retrying doesn't work in ActionRequestFailureHandler

2019-02-14 Thread Averell
solution is to use the version field of each ER request - increase it for every time I retried putting the request into the queue. This works for me until now, but it doesn't look right. Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Stream enrichment with static data, side inputs for DataStream

2019-02-21 Thread Averell
for every record from the main stream, you read the enrichment data from the saved ValueState to enrich that mainstream record. If no, then I am having the same issue :D Looking at Broadcast State, but there is still something that doesn't look right for me. Regards, Averell -- Sent from

Re: Broadcast state before events stream consumption

2019-02-21 Thread Averell
is kept in-memory at runtime and memory provisioning should be done accordingly. This holds for all operator states."/ I am using RocksDB state backend, and is confused by that statement and yours. Could you please help clarify? Thanks and regards, Averell -- Sent from: http://apache-fl

S3 parquet sink - failed with S3 connection exception

2019-03-04 Thread Averell
checkpoint, but it could not make any further checkpoint - all subsequent checkpoints failed with the same reason. Searching on Internet I could only find one explanation: S3Object has not been closed properly. Could someone please help? Thanks and regards, Averell /The program finished with

Re: S3 parquet sink - failed with S3 connection exception

2019-03-05 Thread Averell
broadcast stream (as mentioned in the document, it doesn't use RocksDB). But not quite sure. Thanks and regards, Averell logs.gz <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1586/logs.gz> -- Sent from: http://apache-flink-user-mailing-list-archive

Re: S3 parquet sink - failed with S3 connection exception

2019-03-10 Thread Averell
by Flink yet. * reduced the parallelism for my S3 continuous files reader. However, the problem still randomly occurred (random by job executions. When it occurred, the only solution is to cancel the job and restart from the last successful checkpoint). Thanks and regards, Averell [1] Hadoop-AWS

Re: S3 parquet sink - failed with S3 connection exception

2019-03-14 Thread Averell
that Flink GUI's Exception tab is reading from? Thanks and regards, Averell java.lang.ArrayIndexOutOfBoundsException: 122626 at org.apache.parquet.column.values.dictionary.DictionaryValuesWriter$PlainLongDictionaryValuesWriter.fallBackDictionaryEncodedData(DictionaryValuesWriter.jav

Where does the logs in Flink GUI's Exception tab come from?

2019-03-14 Thread Averell
Hi everyone, I am running Flink in EMR YARN cluster, and when the job failed and restarted, I could see some logs in the Exception tab of Flink GUI. I could not find this piece of l

Re: Where does the logs in Flink GUI's Exception tab come from?

2019-03-15 Thread Averell
Hi Gary, Thanks a lot for the explanation, and sorry for missing your earlier message. I am clear now. Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: No resource available error while testing HA

2019-03-15 Thread Averell
Hi Gary, Thanks for the answer. I missed your most recent answer in this thread too. However, my last question Averell wrote > How about changing the configuration of the Flink job itself during > runtime? > What I have to do now is to take a savepoint, stop the job, change the > c

Identify orphan records after joining two streams

2019-04-15 Thread Averell
make my cluster out-of-memory. Would back-pressure kicks in for this case? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Identify orphan records after joining two streams

2019-04-18 Thread Averell
apply to the low-level-API functions as well? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Serialising null value in case-class

2019-04-26 Thread Averell
hold a value./ I am confused. Why there's the difference between a null String and a null Integer? Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Identify orphan records after joining two streams

2019-04-26 Thread Averell
use the out of the box function sideOutputLateData() Not sure whether I would really be benefited from that. Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Serialising null value in case-class

2019-04-26 Thread Averell
seen many places in Flink documents that Java primitive types are recommended. But how are Scala types? Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Write batch function for apache flink

2019-04-27 Thread Averell
Hi Anurag, Something like this one: https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/batch/ Is it what you are looking for? Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Serialising null value in case-class

2019-04-27 Thread Averell
Thank you Timo. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: FileInputFormat that processes files in chronological order

2019-04-27 Thread Averell
on't think FileInputFormat has anything to do here. Use that when your files are in a format not currently supported by Flink. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Sending FileInputSplit to the next operator

2019-04-28 Thread Averell
checkpoint. Could you please help tell me the wrong in that 2nd implementation? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Timestamp and key preservation over operators

2019-04-29 Thread Averell
nt classes), and I have been trying to use DataStreamUtils.reinterpretAsKeyedStream. Is it safe to assume that as long as I dont do transformation on key, I could use that reinterpretAsKeyedStream function? Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archi

Re: Timestamp and key preservation over operators

2019-04-30 Thread Averell
e sources. Could you please recommend a solution/good-practice here? I have one more question about the recommendation [2] to emit timestamp and watermark from within the source function. Is there any way to do that with the file sources? Thanks and best regards, Averell [1] https://ci.apache.org/

Re: Timestamp and key preservation over operators

2019-04-30 Thread Averell
now, is to use MAX_WATERMARK for those idle sources. Not sure whether doing that is recommended? Thanks for your help. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Timestamp and key preservation over operators

2019-05-02 Thread Averell
9, but the watermark stays at 10:00) Thus, my question: what is the easiest way to check the timestamp of a message? Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Timestamp and key preservation over operators

2019-05-03 Thread Averell
any way to accomplish that? Currently, I have an assignTimestampsAndWatermarks after my window function, but, as you said, it is against the best practice. Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Timestamp and key preservation over operators

2019-05-03 Thread Averell
Thank you Fabian. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

CoProcessFunction vs Temporal Table to enrich multiple streams with a slow stream

2019-05-03 Thread Averell
this way of implementation (using Flink Table) be better than the option no.1 mentioned in my other thread: creating two different (though similar) CoProcessFunction's, maintaining two state tables (for the enrichment stream, one in each function)? Thanks and best regards, Averell

Re: CoProcessFunction vs Temporal Table to enrich multiple streams with a slow stream

2019-05-03 Thread Averell
you mentioned, is there any summary page in Flink docs for that? Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

IllegalArgumentException with CEP & reinterpretAsKeyedStream

2019-05-05 Thread Averell
w to debug this error, and not sure whether I should use keyed streams with CEP? Thanks and best regards, Averell My code: / val cepInput = streamA.keyBy(r => (r.id1, r.id2)) .connect(streamB.keyBy(r => (r.id1, r.id2))) .flatMap(new MyCa

Re: I want to use MapState on an unkeyed stream

2019-05-06 Thread Averell
>From my understanding, having a fake keyBy (stream.keyBy(r => "dummyString")) means there would be only one slot handling the data. Would a broadcast function [1] work for your case? Regards, Averell [1] https://ci.apache.org/projects/flink/flink-docs-stable/

How to export all not-null keyed ValueState

2019-05-07 Thread Averell
regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: How to export all not-null keyed ValueState

2019-05-09 Thread Averell
RichCoFlatMapFunction with a new KeyedBroadcastProcessFunction, which has both functionalities: filter and export? Doing this would require unioning Toggle and Data into one single keyed stream. Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

How to generate a sequential watermark which increases by one unit each time

2019-05-21 Thread Averell
for each file. I thought of extending the AssignerWithPeriodicWatermarks interface with a member variable holding that sequence value. However, it seems to me that it is not possible to persist that value during checkpoints. Are there any options for me? Thanks and best regards, Averell -- Sent

Re: How to export all not-null keyed ValueState

2019-05-09 Thread Averell Huyen Levan
collector: Collector[MyEvent]): Unit = { context.applyToKeyedState(toggleStateDescriptor, (k: Key, s: ValueState[Boolean]) => if (s != null) context.output(outputTag, (k, s.value( } } Thanks for your help. Regards, Averell On Thu, May 9, 2019 at 7:31 PM Fab

Re: How to export all not-null keyed ValueState

2019-05-10 Thread Averell Huyen Levan
. There's no use of applyToKeyedState in this implementation. And the problem I got is I received duplicated output (one from each parallelism-instance). Is there any option to modify the keyed state from within the processBroadcastElement() method? Thanks a lot for your help. Regards, Averell

Re: How to export all not-null keyed ValueState

2019-05-10 Thread Averell Huyen Levan
Thank you very much, Fabian. Regards, Averell On Fri, May 10, 2019 at 9:46 PM Fabian Hueske wrote: > Hi Averell, > > I'd go with your approach any state access (given that you use RocksDB > keyed state) or deduplication of messages is going to be more expensive > than a s

<    1   2