Hi,
I’m running Elasticsearch as a sink for a batch job processing a CSV file of
6.2 million rows, with each row being 181 numeric values. It quite happily
processes a small example of around 2,000 rows, running each column through a
single parallel pipeline, keyed by column number.
However,
Hi
Do you specify the operator id for all the operators? [1][2] I'm asking
because, judging from the exception you gave, if you only add new operators
and all the old operators have a specified operator id, such an exception
should not be thrown.
[1]
https://ci.apache.org/projects/flink/flink-docs-stable/
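For reference, a minimal sketch of what assigning explicit operator ids looks
like in the DataStream API (an illustration only, not your actual job; the
operator names are made up):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UidExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("a", "b", "c")
           .uid("example-source")   // stable id used to match operator state on restore
           .map(String::toUpperCase)
           .uid("uppercase-map")    // without an explicit uid, Flink derives one from the
                                    // graph topology, so adding/removing operators can
                                    // shift ids and break a restore from a savepoint
           .print()
           .uid("print-sink");

        env.execute("uid example");
    }
}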
Usually the database connection pool is thread safe. When you say task, do
you mean a single deployed Flink job?
I still think a sink is only initialized once. You can prove it by putting
logging in the open() and close() methods.
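Something along these lines would show it (a minimal sketch assuming a
RichSinkFunction; the class name is made up). You should see one open/close
log line per parallel subtask:

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LoggingSink extends RichSinkFunction<String> {

    private static final Logger LOG = LoggerFactory.getLogger(LoggingSink.class);

    @Override
    public void open(Configuration parameters) {
        // logged once per parallel subtask when the sink is deployed
        LOG.info("Sink opened on subtask {}", getRuntimeContext().getIndexOfThisSubtask());
    }

    @Override
    public void invoke(String value, Context context) {
        // write value to the external system here
    }

    @Override
    public void close() {
        LOG.info("Sink closed on subtask {}", getRuntimeContext().getIndexOfThisSubtask());
    }
}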
On Thu., Oct. 17, 2019, 1:39 a.m. Xin Ma, wrote:
> Thanks, John. If I don't static my
It is throwing the error below.
The class I am adding variables to has other variables that are objects of a
class which is also in state.
Caused by: org.apache.flink.util.StateMigrationException: The new state
typeSerializer for operator state must not be incompatible.
at
org.apache.flink.runtime.st
If by task you mean job, then yes, global static variables initialized in the
main of the job do not get serialized/transferred to the nodes where that
job may get assigned.
The other thing is also, since it is a sink, the sink will be serialized to
that node and then initialized, so that static variab
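To make the pitfall concrete, here is a small sketch (all names made up): the
static field is set in main() on the client JVM and is simply absent on the
task managers, while an instance field set in the constructor travels with
the serialized sink.

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class StaticPitfallSink extends RichSinkFunction<String> {

    // set in main() on the client JVM; NOT shipped with the serialized sink
    static String configuredInMain;

    // instance fields are serialized with the function instance
    private final String shippedWithSink;

    public StaticPitfallSink(String value) {
        this.shippedWithSink = value;
    }

    @Override
    public void open(Configuration parameters) {
        // on a remote task manager the static field prints as null
        System.out.println("static=" + configuredInMain + ", instance=" + shippedWithSink);
    }

    @Override
    public void invoke(String value, Context context) {
        // no-op for this illustration
    }
}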
Hi there,
I'm using the Flink SQL client to run the jobs for me. My stream is JSON with
nested objects. I couldn't find much documentation on querying nested JSON, so
I had to flatten the JSON and use it as:
SELECT `source.ip`, `destination.ip`, `dns.query`, `organization.id`,
`dns.answers.data` FROM sour
Hi John,
You are right. IMO the batch interval setting is there to improve JDBC
execution performance.
The reason the exception doesn't happen for your INSERT INTO statement with a
`non_existing_table` is that the JDBCAppendableSink does not check
table existence beforehand. That
Hi,
*Event Time Window: 15s*
My currentWatermark for event-time processing is not increasing fast enough
to go past the window maxTimestamp.
I have reduced the *bound* used for watermark calculation to just *10 ms*.
I have increased the input parallelism to 2 slots to process input from
Kinesis in parallel.
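For reference, a sketch of a 10 ms bound using
BoundedOutOfOrdernessTimestampExtractor (assuming events arrive as
Tuple2<timestampMillis, payload>, which is a made-up shape here):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

public class TenMsBoundExtractor
        extends BoundedOutOfOrdernessTimestampExtractor<Tuple2<Long, String>> {

    public TenMsBoundExtractor() {
        super(Time.milliseconds(10)); // the "bound": watermark = max timestamp seen - 10 ms
    }

    @Override
    public long extractTimestamp(Tuple2<Long, String> event) {
        return event.f0; // event time in epoch millis
    }
}

Note that the watermark an operator sees is the minimum across all its input
subtasks, so with parallelism 2 an idle Kinesis shard or subtask can hold the
watermark back no matter how small the bound is.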
Yes correct, I set it to batch interval = 1 and it works fine. Anyway, I
think the JDBC sink could use some improvements, like batchInterval + a time
interval execution: if the batch doesn't fill up, then execute whatever is
left at that time interval.
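For anyone landing here later, the workaround looks roughly like this sketch
(connection details and table/column names made up, using the Flink 1.9
flink-jdbc API): a batch size of 1 flushes every row immediately, at the cost
of one JDBC round trip per row.

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.io.jdbc.JDBCAppendTableSink;

public class JdbcSinkExample {
    public static JDBCAppendTableSink buildSink() {
        return JDBCAppendTableSink.builder()
            .setDrivername("org.postgresql.Driver")
            .setDBUrl("jdbc:postgresql://localhost:5432/mydb")
            .setQuery("INSERT INTO events (id, name) VALUES (?, ?)")
            .setParameterTypes(Types.INT, Types.STRING)
            .setBatchSize(1) // flush every row so nothing waits for the batch to fill
            .build();
    }
}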
On Thu, 17 Oct 2019 at 12:22, Rong Rong wro
Hi,
It sounds like setting up Mirror Maker has very little to do with Flink.
Shouldn’t you try asking this question on the Kafka mailing list?
Piotrek
> On 16 Oct 2019, at 16:06, Vishal Santoshi wrote:
>
> 1. still no clue, apart from the fact that ConsumerConfig gets it from
> somewhere (
Hi,
Take a look at the documentation. This [1] describes an example where a running
query can produce updated results (and thus retract the previous results).
[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/streaming/dynamic_tables.html#table-to-stream-conversion
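In code, the conversion described in [1] looks something like this sketch
(the `clicks` table and its columns are hypothetical); each stream element
carries a Boolean flag that is true for an insert/update and false for a
retraction:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class RetractStreamExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
        // assume a "clicks" table has been registered elsewhere

        Table counts = tableEnv.sqlQuery(
            "SELECT user_name, COUNT(*) AS cnt FROM clicks GROUP BY user_name");

        // (true, row) = add/update, (false, row) = retraction of an earlier row
        DataStream<Tuple2<Boolean, Row>> retractStream =
            tableEnv.toRetractStream(counts, Row.class);
    }
}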
Yes, I think having a time-interval execution (for the AppendableSink)
would be a good idea.
Can you please open a Jira issue [1] for further discussion?
--
Rong
[1] https://issues.apache.org/jira/projects/FLINK/issues
On Thu, Oct 17, 2019 at 9:48 AM John Smith wrote:
> Yes correct, I set it t
oh shit.. sorry, wrong forum :)
On Thu, Oct 17, 2019 at 1:41 PM Piotr Nowojski wrote:
> Hi,
>
> It sounds like setting up Mirror Maker has very little to do with Flink.
> Shouldn’t you try asking this question on the Kafka mailing list?
>
> Piotrek
>
> On 16 Oct 2019, at 16:06, Vishal Santo
I recorded two:
Time interval: https://issues.apache.org/jira/browse/FLINK-14442
Checkpointing: https://issues.apache.org/jira/browse/FLINK-14443
On Thu, 17 Oct 2019 at 14:00, Rong Rong wrote:
> Yes, I think having a time-interval execution (for the AppendableSink)
> would be a good idea.
> Ca
How are folks here managing deployments in production?
We are deploying Flink jobs on EMR manually at the moment but would like to
move towards some form of automation before anything goes into production.
Adding additional EMR Steps to a long running cluster to deploy or update
jobs seems like th
Also, the pool should be transient because it holds connections, which
shouldn't/cannot be serialized.
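A sketch of what that looks like (HikariCP is just an example pool here; the
class name and config are made up): the pool field is transient and built in
open() on the task manager, so no live connections ever go through
serialization.

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PooledJdbcSink extends RichSinkFunction<String> {

    private final String jdbcUrl;             // plain config, serialized with the sink
    private transient HikariDataSource pool;  // transient: rebuilt per task manager in open()

    public PooledJdbcSink(String jdbcUrl) {
        this.jdbcUrl = jdbcUrl;
    }

    @Override
    public void open(Configuration parameters) {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl(jdbcUrl);
        pool = new HikariDataSource(config);
    }

    @Override
    public void invoke(String value, Context context) throws Exception {
        try (java.sql.Connection conn = pool.getConnection()) {
            // run the insert/update with the borrowed connection
        }
    }

    @Override
    public void close() {
        if (pool != null) {
            pool.close();
        }
    }
}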
On Thu., Oct. 17, 2019, 9:39 a.m. John Smith,
wrote:
> If by task you mean job, then yes, global static variables initialized in
> the main of the job do not get serialized/transferred to the nodes w
Ya, there will not be a problem of duplicates. But what I'm trying to
achieve is this: if there is a large static state which needs to be present
just once per node, rather than storing it per slot, that would be ideal. The
reason being that the state is quite large, around 100GB of mostly static data,
and it
Hi All,
I'm currently using a tumbling window of 5 seconds using TumblingTimeWindow,
but due to a change in requirements I no longer have to window every incoming
record. With that said, I'm planning to use a process function to achieve this
selective windowing.
I looked at the example provided in the do
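One way to sketch the selective-windowing idea with a KeyedProcessFunction
(all names and the shouldWindow() predicate are made up, and event time with
assigned timestamps is assumed): records that don't need windowing pass
straight through; the rest are accumulated and flushed by a timer registered
5 seconds ahead.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class SelectiveWindowFunction extends KeyedProcessFunction<String, Long, Long> {

    private transient ValueState<Long> sum;

    @Override
    public void open(Configuration parameters) {
        sum = getRuntimeContext().getState(new ValueStateDescriptor<>("sum", Long.class));
    }

    @Override
    public void processElement(Long value, Context ctx, Collector<Long> out) throws Exception {
        if (!shouldWindow(value)) {
            out.collect(value); // pass through without windowing
            return;
        }
        Long current = sum.value();
        if (current == null) {
            current = 0L;
            // first buffered element for this key: schedule a flush 5s ahead
            ctx.timerService().registerEventTimeTimer(ctx.timestamp() + 5000);
        }
        sum.update(current + value);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Long> out) throws Exception {
        out.collect(sum.value()); // emit the accumulated result
        sum.clear();
    }

    private boolean shouldWindow(Long value) {
        return value % 2 == 0; // placeholder predicate
    }
}

Note this fires 5 seconds after the first buffered element rather than on
aligned boundaries, so it mimics rather than exactly replicates the old
tumbling window.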
Hi,
I am trying to process data stored on HDFS using Flink batch jobs.
Our data is split across 16 data nodes.
I am curious to know how data will be pulled from the data nodes when the
parallelism is set to the same number as the data splits on HDFS, i.e. 16.
Is the Flink task being executed locally on
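A minimal sketch of that setup (the path is hypothetical): with source
parallelism 16, each of the 16 input splits is read by one parallel subtask.
Flink queries HDFS for the block locations and tries to assign each split to
a subtask close to the data, but subtasks are not guaranteed to be scheduled
on the data nodes themselves.

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class HdfsBatchJob {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<String> lines = env
            .readTextFile("hdfs:///data/input")
            .setParallelism(16); // one subtask per HDFS split

        System.out.println(lines.count());
    }
}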
Hi,
Could you confirm that you’re using POJOSerializer before and after migration?
Best,
Paul Lam
> On 17 Oct 2019, at 21:34, ApoorvK wrote:
>
> It is throwing the error below.
> The class I am adding variables to has other variables that are objects of a
> class which is also in state.
>
> Caused by: org.apach
Splendid. Thanks for following up and moving the discussion forward :-)
--
Rong
On Thu, Oct 17, 2019 at 11:38 AM John Smith wrote:
> I recorded two:
> Time interval: https://issues.apache.org/jira/browse/FLINK-14442
> Checkpointing: https://issues.apache.org/jira/browse/FLINK-14443
>
>
> On Thu
Hello all,
I am trying to provision a Flink cluster on k8s. Some of the jobs in our
existing cluster use the RocksDB state backend. I wanted to take a look at a
Flink Helm chart or deployment manifests that provision task managers with
dynamic PVs, and how they manage them. We are running on kops manage
You may want to look at using instances with local SSD drives. You don’t really
need to keep the state data between instance stops and starts, because Flink
will have to restore from a checkpoint or savepoint, so using ephemeral storage
isn’t a problem.
Sent from my iPhone
> On Oct 17, 2019, at 11:31