Hi All,
I am wondering if Flink can do streaming from data sources other than
Kafka. For example, can Flink stream from a database like Cassandra,
HBase, or MongoDB to sinks like, say, Elasticsearch or Kafka?
Also, for out-of-core stateful streaming, is RocksDB the only option? Can I
use some ot
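(For context, the RocksDB option referred to above is enabled roughly like
this; the checkpoint URI is a placeholder and the constructor may need a
try/catch for IOException:)

    // Sketch: RocksDB as the state backend so keyed state can grow beyond the JVM heap.
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));  // placeholder URI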
Hello,
I will describe my use case shortly with steps for easier understanding:
1) Currently my job is loading data from Parquet files using
HadoopInputFormat along with AvroParquetInputFormat, with the current approach:

    AvroParquetInputFormat inputFormat = new AvroParquetInputFormat();
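A rough, self-contained sketch of that approach (assuming the
org.apache.parquet.avro input format and reading Avro GenericRecords; the
input path is a placeholder):

    import org.apache.avro.generic.GenericRecord;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.parquet.avro.AvroParquetInputFormat;

    public class ParquetReadJob {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            Job job = Job.getInstance();
            AvroParquetInputFormat<GenericRecord> inputFormat = new AvroParquetInputFormat<>();
            HadoopInputFormat<Void, GenericRecord> hadoopIF =
                new HadoopInputFormat<>(inputFormat, Void.class, GenericRecord.class, job);
            FileInputFormat.addInputPath(job, new Path("hdfs:///path/to/parquet"));  // placeholder

            // Each record arrives as a (Void key, GenericRecord value) tuple.
            DataSet<Tuple2<Void, GenericRecord>> records = env.createInput(hadoopIF);
            records.first(10).print();
        }
    }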
Hi Tony,
Yes, exactly. I am assuming the lambda emits a value only after it has been
published to the control topic (t1) and at least one value arrives in the
data topic for each of its arguments. This will happen at a time t2 > t1.
So yes, there is uncertainty with regards to when t2 will happen. Id
I am trying to use the keyBy operator as follows:

    Pattern myEventsCEPPattern =
        Pattern.begin("FirstEvent")
               .subtype(MyEvent.class)
               .next("SecondEvent")
               .subtype(MyEvent.class)
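For context, wiring such a pattern to a keyed stream could look roughly like
this (the surrounding stream and MyEvent's key field are placeholders):

    // Sketch: key the stream first, then hand it to CEP so matching happens per key.
    KeyedStream<MyEvent, String> keyedEvents = events.keyBy(new KeySelector<MyEvent, String>() {
        @Override
        public String getKey(MyEvent event) {
            return event.getUserId();   // hypothetical key field
        }
    });

    Pattern<MyEvent, MyEvent> pattern = Pattern.<MyEvent>begin("FirstEvent")
            .subtype(MyEvent.class)
            .next("SecondEvent")
            .subtype(MyEvent.class);

    PatternStream<MyEvent> patternStream = CEP.pattern(keyedEvents, pattern);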
Hi Martin,
The performance is an issue, but in your case, yes, it might not be a
problem if X << N.
However, the other problem is where the data should go in the beginning if
no lambda has been received yet. This problem is not about performance but
about correctness. If you want t
Nico, thank you for your reply.
I looked at the commit you cherry-picked and nothing in there explains the
error you got.
==>
The commit I cherry-picked makes setting of 'zookeeper.sasl.disable' work
correctly.
I changed flink-dist_2.11-1.2.0.jar according to it.
So now the zookeeper.sasl problem is gone.
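(For the record, the setting in question goes into flink-conf.yaml,
presumably as

    zookeeper.sasl.disable: true

though the exact default behaviour depends on the Flink version/patch.)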
Hi,
Is there a reason behind removing the default value option in
MapStateDescriptor? I was using it in the earlier version to initialize a
Guava cache with a loader, etc., and in the new version an empty map is
returned by default.
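For illustration, the kind of null-check fallback this forces now (a sketch
with placeholder names, assuming keyed state inside a RichFunction):

    private transient MapState<String, Long> counts;

    @Override
    public void open(Configuration parameters) {
        MapStateDescriptor<String, Long> descriptor =
            new MapStateDescriptor<>("counts", String.class, Long.class);
        counts = getRuntimeContext().getMapState(descriptor);
    }

    // Emulates the old "default value": get() returns null for absent keys.
    private long getOrDefault(String key) throws Exception {
        Long value = counts.get(key);
        if (value == null) {
            value = 0L;               // the default the descriptor used to provide
            counts.put(key, value);
        }
        return value;
    }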
Thanks
Hi, All.
I am new to Flink.
I just installed Flink on a cluster and started reading the documentation to
understand Flink internals.
After reading some documents, I have some questions.
I have some prior experience with Storm and Heron, so I am relating their
mechanisms to my questions to better understand Flink
Thanks, that helped to see how we could implement this!
On Wed, Sep 6, 2017 at 12:01 PM, Timo Walther wrote:
> Hi Johannes,
>
> you can find the implementation for the state clean up here:
> https://github.com/apache/flink/blob/master/flink-
> libraries/flink-table/src/main/scala/org/apache/flin
Hey all,
Just wanted to report this for posterity in case someone else sees something
similar. We were running Flink 1.2.1 in Kubernetes. We use an HA setup with an
external Zookeeper and S3 for checkpointing. We recently noticed a job that
appears to have deadlocked on JobManager Leader electi
I looked at the commit you cherry-picked and nothing in there explains the
error you got. This rather sounds like something might be mixed up between
(remaining artefacts of) flink 1.3 and 1.2.
Can you verify that nothing of your flink 1.3 tests remains, e.g. running
JobManager or TaskManager i
Hi Aljoscha, Tony,
We actually do not need all the keys to be on all nodes where lambdas are.
We just need the keys that represent the data for the lambda arguments to
be routed to the same node as the lambda, whichever one it might be.
Essentially in the solution we emit the data multiple times
Maybe this should also be well documented... is there a dedicated page for
Flink and JDBC connectors?
On Wed, Sep 6, 2017 at 4:12 PM, Fabian Hueske wrote:
> Great!
>
> If you want to, you can open a PR that adds
>
> if (!conn.getAutoCommit()) {
> conn.setAutoCommit(true);
> }
>
> to JdbcOutput
Great!
If you want to, you can open a PR that adds
if (!conn.getAutoCommit()) {
    conn.setAutoCommit(true);
}
to JdbcOutputFormat.open().
Cheers, Fabian
2017-09-06 15:55 GMT+02:00 Flavio Pompermaier :
> Hi Fabian,
> thanks for the detailed answer. Obviously you are right :)
> As stated by h
Change the type of the main stream from DataStream to
SingleOutputStreamOperator.
The getSideOutput() method is not part of the base class DataStream but of
the extended class SingleOutputStreamOperator.
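For example (the tag name and the lateness check are placeholders):

    // getSideOutput() is defined on SingleOutputStreamOperator, so keep that type for the main stream.
    final OutputTag<String> lateTag = new OutputTag<String>("late-data") {};

    SingleOutputStreamOperator<String> mainStream = input
        .process(new ProcessFunction<String, String>() {
            @Override
            public void processElement(String value, Context ctx, Collector<String> out) {
                if (isLate(value)) {             // hypothetical check
                    ctx.output(lateTag, value);  // routed to the side output
                } else {
                    out.collect(value);          // routed to the main output
                }
            }
        });

    DataStream<String> lateStream = mainStream.getSideOutput(lateTag);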
Stefan and I discussed this, and we think that this should actually work.
Do you have the log output from restoring the Kafka Consumer? It would be
interesting to see whether any of those print:
-
https://github.com/apache/flink/blob/f1a173addd99e5df00921b924352a39810d8d180/flink-co
Hi Fabian,
thanks for the detailed answer. Obviously you are right :)
As stated at https://phoenix.apache.org/tuning.html, auto-commit is disabled
by default in Phoenix, but it can easily be enabled by appending
AutoCommit=true to the connection URL or, equivalently, setting the proper
property in
Hi,
According to the JavaDocs of java.sql.Connection, commit() will throw an
exception if the connection is in auto-commit mode, which should be the
default.
So adding this change to the JdbcOutputFormat seems a bit risky.
Maybe the Phoenix JDBC connector does not enable auto-commit by default
(o
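In other words, roughly:

    // commit() is only legal when auto-commit is off; with auto-commit on it throws SQLException.
    if (!conn.getAutoCommit()) {
        conn.commit();
    }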
Hi Billy,
a program that is defined as
Dataset -> Map -> Filter -> Map -> Output
should not spill at all.
There is an unnecessary serialization/deserialization step between the last
map and the sink, but there shouldn't be any spilling to disk.
As I said in my response to Urs, spilling should o
Yes, and that's essentially what's happening in the 1.4-SNAPSHOT consumer, which
also has discovery of new partitions. Starting from 1.4-SNAPSHOT we store state
in a union state, i.e. all sources get all partitions on restore, and if they
didn't get any they know that they are new. There is no spec
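A rough sketch of the union-state mechanism (made-up names, not the actual
consumer code):

    public class UnionStateSource extends RichParallelSourceFunction<String>
            implements CheckpointedFunction {

        private transient ListState<String> unionState;

        @Override
        public void initializeState(FunctionInitializationContext context) throws Exception {
            ListStateDescriptor<String> descriptor =
                new ListStateDescriptor<>("assigned-partitions", String.class);
            unionState = context.getOperatorStateStore().getUnionListState(descriptor);

            if (context.isRestored()) {
                // Every parallel subtask sees the union of all subtasks' entries here,
                // so a subtask that finds no entry of its own knows it is new.
                for (String partition : unionState.get()) {
                    // reassign partitions ...
                }
            }
        }

        @Override
        public void snapshotState(FunctionSnapshotContext context) throws Exception {
            unionState.clear();
            // re-add this subtask's currently assigned partitions
        }

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            // emit records ...
        }

        @Override
        public void cancel() {}
    }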
Btw, not sure if you know that you can visualize the JSON plan returned by
ExecutionEnvironment.getExecutionPlan() on the website [1].
Best, Fabian
[1] http://flink.apache.org/visualizer/
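That is, something along these lines (assuming a DataSet program has been
defined on env):

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    // ... define the DataSet program ...
    System.out.println(env.getExecutionPlan());  // paste the JSON into the visualizer page above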
2017-09-06 14:39 GMT+02:00 Fabian Hueske :
> Hi Urs,
>
> a hash-partition operator should not spill. In ge
Hi Urs,
a hash-partition operator should not spill. In general, DataSet plans aim
to be as pipelined as possible.
There are a few cases when spilling happens:
- full sort with not sufficient memory
- hash-tables that need to spill (only in join operators)
- range partitioning to compute a hi
Wouldn't it be enough for the Kafka sources to store some empty container for
their state if it is empty, as opposed to null when it should be bootstrapped
again?
Gyula
On Wed, Sep 6, 2017 at 14:31, Aljoscha Krettek wrote:
> The problem here is that context.isRestored() is a global flag and
Hi,
Could you please give a bit more context around that exception? Maybe a log or
a full stack trace.
Best,
Aljoscha
> On 5. Sep 2017, at 23:52, ant burton wrote:
>
> Hello,
>
> Has anybody experienced the following error on AWS EMR 5.8.0 with Flink 1.3.1
>
> java.lang.ClassCastException:
The problem here is that context.isRestored() is a global flag and not local to
each operator. It says "yes this job was restored" but the source would need to
know that it is actually brand new and never had any state. This is quite
tricky to do, since there is currently no way (if I'm correct)
Does jobmanager.web.ssl.enabled support client SSL authentication?
Thanks for the report, I will take a look.
> On 06.09.2017 at 11:48, Gyula Fóra wrote:
>
> Hi all,
>
> We are running into some problems with the kafka source after changing the
> uid and restoring from the savepoint.
> What we are expecting is to clear the partition state, and set it up all
Hi Johannes,
you can find the implementation for the state clean up here:
https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/ProcessFunctionWithCleanupState.scala
and an example usage here:
https://github.com/apache/flin
Hi all,
We are running into some problems with the kafka source after changing the
uid and restoring from the savepoint.
What we are expecting is to clear the partition state, and set it up all
over again, but what seems to happen is that the consumer thinks that it
doesn't have any partitions assi
Hi to all,
I'm writing a job that uses Apache Phoenix.
At first I used the PhoenixOutputFormat as a (Hadoop) OutputFormat, but it's
not well suited to work with the Table API because it cannot handle generic
objects like Rows (it needs a DBWritable object that should already be
present at compile time). S
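For reference, the generic JDBC route discussed elsewhere in this thread
works directly on Rows; the driver, URL, and upsert query below are
placeholders:

    // Sketch: Flink's JDBCOutputFormat (flink-jdbc) writes org.apache.flink.types.Row records.
    JDBCOutputFormat jdbcOutput = JDBCOutputFormat.buildJDBCOutputFormat()
        .setDrivername("org.apache.phoenix.jdbc.PhoenixDriver")
        .setDBUrl("jdbc:phoenix:zk-host:2181")                        // placeholder URL
        .setQuery("UPSERT INTO my_table (id, name) VALUES (?, ?)")    // placeholder query
        .finish();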
Hi Marchant,
I'm afraid that the serde cost still exists even if both operators run in
the same TaskManager.
Best,
Kurt
On Tue, Sep 5, 2017 at 9:26 PM, Marchant, Hayden
wrote:
> I have a streaming application that has a keyBy operator followed by an
> operator working on the keyed values (a custom
Hi,
I'm actually not very familiar with the current Table API implementations but
Fabian or Timo (cc'ed) should know more. I suspect very much that this is
implemented like this, yes.
Best,
Aljoscha
> On 5. Sep 2017, at 21:14, Johannes Schulte wrote:
>
> Hi,
>
> one short question I had tha
Hi,
the following code should do what you want.
I included an implementation of an IdMapper.
At the end, I print the execution plan which is generated after the
optimization (so the pipeline is working until then).
Best, Fabian
val data: Seq[Seq[Int]] = (1 until 315).map(i => Seq(1, 2, 3))
val