Can Flink do streaming from data sources other than Kafka?

2017-09-06 Thread kant kodali
Hi All, I am wondering if Flink can do streaming from data sources other than Kafka. For example, can Flink stream from a database like Cassandra, HBase, or MongoDB to sinks such as, say, Elasticsearch or Kafka? Also, for out-of-core stateful streaming, is RocksDB the only option? Can I use some…
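
A minimal sketch of switching to the RocksDB state backend for out-of-core state (assuming Flink 1.3, the flink-statebackend-rocksdb dependency, and a placeholder checkpoint URI):

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // Working state lives in local RocksDB files; checkpoints go to the given URI.
    env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));
    env.enableCheckpointing(60000); // checkpoint every 60s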

Additional data read inside dataset transformations

2017-09-06 Thread eSKa
Hello, I will describe my use case briefly, with steps for easier understanding: 1) Currently my job loads data from Parquet files using HadoopInputFormat along with AvroParquetInputFormat, the current approach being: AvroParquetInputFormat inputFormat = new AvroParquetInputFormat();…
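
A sketch of how that wiring typically continues, using the flink-hadoop-compatibility and parquet-avro APIs (the input path is a placeholder; imports from org.apache.flink.api.java.hadoop.mapreduce and org.apache.hadoop.mapreduce.lib.input omitted for brevity):

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    Job job = Job.getInstance();
    FileInputFormat.addInputPath(job, new Path("hdfs:///data/events.parquet"));
    // Wrap the Hadoop mapreduce InputFormat so Flink can consume it.
    HadoopInputFormat<Void, GenericRecord> hadoopIF = new HadoopInputFormat<>(
            new AvroParquetInputFormat<GenericRecord>(), Void.class, GenericRecord.class, job);
    DataSet<Tuple2<Void, GenericRecord>> records = env.createInput(hadoopIF);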

Re: dynamically partitioned stream

2017-09-06 Thread Martin Eden
Hi Tony, Yes, exactly. I am assuming the lambda emits a value only after it has been published to the control topic (t1) and at least 1 value arrives in the data topic for each of its arguments. This will happen at a time t2 > t1. So yes, there is uncertainty with regard to when t2 will happen…

Exception when using keyby operator

2017-09-06 Thread Sridhar Chellappa
I am trying to use the KeyBy operator as follows: Pattern myEventsCEPPattern = Pattern.begin("FirstEvent").subtype(MyEvent.class).next("SecondEvent").subtype(MyEvent.class)…
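
For context, a minimal sketch of how keyBy and CEP are usually combined (MyEvent, its Event supertype, and the key field are assumptions; the exception itself is not shown in the preview):

    Pattern<Event, ?> pattern = Pattern.<Event>begin("FirstEvent")
            .subtype(MyEvent.class)
            .next("SecondEvent")
            .subtype(MyEvent.class);

    PatternStream<Event> patternStream = CEP.pattern(
            events.keyBy(new KeySelector<Event, String>() {
                @Override
                public String getKey(Event e) { return e.getId(); }
            }),
            pattern);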

Re: dynamically partitioned stream

2017-09-06 Thread Tony Wei
Hi Martin, The performance is an issue, but in your case, yes, it might not be a problem if X << N. However, the other problem is where data should go in the beginning if no lambda has been received yet. This problem is not about performance but about correctness. If you want…

Re: FLINK-6117 issue work around

2017-09-06 Thread sunny yun
Nico, thank you for your reply. > I looked at the commit you cherry-picked and nothing in there explains the error you got. ==> The commit I cherry-picked makes the setting of 'zookeeper.sasl.disable' work correctly. I changed flink-dist_2.11-1.2.0.jar according to it. So now the zookeeper.sasl problem…
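
For anyone else hitting this, the setting in question goes into flink-conf.yaml:

    # Disable SASL authentication towards ZooKeeper
    zookeeper.sasl.disable: true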

MapState Default Value

2017-09-06 Thread Navneeth Krishnan
Hi, Is there a reason behind removing the default value option in MapStateDescriptor? I was using it in the earlier version to initialize a Guava cache with a loader, etc., and in the new version an empty map is returned by default. Thanks
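
A small sketch of the usual replacement pattern, materializing the default on first access (names are illustrative):

    MapStateDescriptor<String, Long> desc =
            new MapStateDescriptor<>("counts", String.class, Long.class);
    MapState<String, Long> counts = getRuntimeContext().getMapState(desc);

    // Emulate a default value: an absent key returns null.
    Long current = counts.get(key);
    if (current == null) {
        current = 0L; // the default the descriptor used to provide
    }
    counts.put(key, current + 1);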

Question about Flink internals

2017-09-06 Thread Junguk Cho
Hi, All. I am new to Flink. I just installed Flink on clusters and started reading documents to understand Flink internals. After reading some documents, I have some questions. I have prior experience with Storm and Heron, so I am relating their mechanisms to my questions to better understand Flink…

Re: Process Function

2017-09-06 Thread Johannes Schulte
Thanks, that helped to see how we could implement this! On Wed, Sep 6, 2017 at 12:01 PM, Timo Walther wrote: > Hi Johannes, > > you can find the implementation for the state clean up here: > https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flin…

Flink 1.2.1 JobManager Election Deadlock

2017-09-06 Thread James Bucher
Hey all, Just wanted to report this for posterity in case someone else sees something similar. We were running Flink 1.2.1 in Kubernetes. We use an HA setup with an external ZooKeeper and S3 for checkpointing. We recently noticed a job that appears to have deadlocked on JobManager leader election…
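
For posterity, the setup described maps to flink-conf.yaml entries roughly like these (a sketch for Flink 1.2; the quorum addresses and S3 paths are placeholders):

    high-availability: zookeeper
    high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181,zk-3:2181
    high-availability.zookeeper.storageDir: s3://my-bucket/flink/ha
    state.backend: filesystem
    state.backend.fs.checkpointdir: s3://my-bucket/flink/checkpoints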

Re: FLINK-6117 issue work around

2017-09-06 Thread Nico Kruber
I looked at the commit you cherry-picked and nothing in there explains the error you got. This rather sounds like something might be mixed up between (remaining artifacts of) Flink 1.3 and 1.2. Can you verify that nothing of your Flink 1.3 tests remains, e.g. a running JobManager or TaskManager…

Re: dynamically partitioned stream

2017-09-06 Thread Martin Eden
Hi Aljoscha, Tony, We actually do not need all the keys to be on all nodes where lambdas are. We just need the keys that represent the data for the lambda arguments to be routed to the same node as the lambda, whichever one it might be. Essentially, in the solution we emit the data multiple times…
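
A sketch of the routing idea under discussion: key both streams by the argument key so a lambda and the data for its arguments meet on the same parallel instance (the Tuple2 layouts and the in-memory map are assumptions; keyed state would be the fault-tolerant choice):

    // controlStream: (argKey, lambdaId), dataStream: (argKey, value)
    controlStream.keyBy(0)
            .connect(dataStream.keyBy(0))
            .flatMap(new CoFlatMapFunction<Tuple2<String, String>, Tuple2<String, Double>, String>() {
                private final Map<String, String> lambdasByArg = new HashMap<>();

                @Override
                public void flatMap1(Tuple2<String, String> control, Collector<String> out) {
                    lambdasByArg.put(control.f0, control.f1); // register the lambda
                }

                @Override
                public void flatMap2(Tuple2<String, Double> data, Collector<String> out) {
                    String lambda = lambdasByArg.get(data.f0);
                    if (lambda != null) {
                        out.collect(lambda + " applied to " + data.f1);
                    }
                }
            });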

Re: Apache Phoenix integration

2017-09-06 Thread Flavio Pompermaier
Maybe this should be well documented too... is there any dedicated page for Flink and JDBC connectors? On Wed, Sep 6, 2017 at 4:12 PM, Fabian Hueske wrote: > Great! > > If you want to, you can open a PR that adds > > if (!conn.getAutoCommit()) { > conn.setAutoCommit(true); > } > > to JdbcOutputFormat.open()…

Re: Apache Phoenix integration

2017-09-06 Thread Fabian Hueske
Great! If you want to, you can open a PR that adds if (!conn.getAutoCommit()) { conn.setAutoCommit(true); } to JdbcOutputFormat.open(). Cheers, Fabian 2017-09-06 15:55 GMT+02:00 Flavio Pompermaier : > Hi Fabian, > thanks for the detailed answer. Obviously you are right :) > As stated by…

Re: Fwd: some question about side output

2017-09-06 Thread Biplob Biswas
Change the type of the main stream from DataStream to SingleOutputStreamOperator. The getSideOutput() function is not part of the base class DataStream but rather of the extended class SingleOutputStreamOperator.
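
For readers finding this thread, a condensed sketch of the side-output wiring (tag name and types are illustrative):

    final OutputTag<String> lateTag = new OutputTag<String>("late-events") {};

    SingleOutputStreamOperator<String> main = input.process(
            new ProcessFunction<String, String>() {
                @Override
                public void processElement(String value, Context ctx, Collector<String> out) {
                    if (value.startsWith("late")) {
                        ctx.output(lateTag, value); // routed to the side output
                    } else {
                        out.collect(value);         // routed to the main output
                    }
                }
            });

    DataStream<String> sideOutput = main.getSideOutput(lateTag);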

Re: Empty state restore seems to be broken for Kafka source (1.3.2)

2017-09-06 Thread Aljoscha Krettek
After discussing this between Stefan and me, we think that this should actually work. Do you have the log output from restoring the Kafka consumer? It would be interesting to see whether any of those print: - https://github.com/apache/flink/blob/f1a173addd99e5df00921b924352a39810d8d180/flink-co…

Re: Apache Phoenix integration

2017-09-06 Thread Flavio Pompermaier
Hi Fabian, thanks for the detailed answer. Obviously you are right :) As stated by https://phoenix.apache.org/tuning.html, auto-commit is disabled by default in Phoenix, but it can easily be enabled just by appending AutoCommit=true to the connection URL or, equivalently, by setting the proper property in…
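
For reference, the two equivalent options described (host and port are placeholders):

    // 1) Append the property to the Phoenix JDBC URL:
    Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181;AutoCommit=true");

    // 2) Or enable it explicitly after connecting:
    Connection conn2 = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
    conn2.setAutoCommit(true);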

Re: Apache Phoenix integration

2017-09-06 Thread Fabian Hueske
Hi, According to the JavaDocs of java.sql.Connection, commit() will throw an exception if the connection is in auto-commit mode, which should be the default. So adding this change to the JdbcOutputFormat seems a bit risky. Maybe the Phoenix JDBC connector does not enable auto-commit by default…

Re: DataSet: partitionByHash without materializing/spilling the entire partition?

2017-09-06 Thread Fabian Hueske
Hi Billy, a program that is defined as DataSet -> Map -> Filter -> Map -> Output should not spill at all. There is an unnecessary serialization/deserialization step between the last map and the sink, but there shouldn't be any spilling to disk. As I said in my response to Urs, spilling should…

Re: Empty state restore seems to be broken for Kafka source (1.3.2)

2017-09-06 Thread Aljoscha Krettek
Yes, and that's essentially what's happening in the 1.4-SNAPSHOT consumer, which also has discovery of new partitions. Starting from 1.4-SNAPSHOT we store state in a union state, i.e. all sources get all partitions on restore, and if they didn't get any they know that they are new. There is no spec…

Re: DataSet: partitionByHash without materializing/spilling the entire partition?

2017-09-06 Thread Fabian Hueske
Btw, not sure if you know that you can visualize the JSON plan returned by ExecutionEnvironment.getExecutionPlan() on the website [1]. Best, Fabian [1] http://flink.apache.org/visualizer/ 2017-09-06 14:39 GMT+02:00 Fabian Hueske : > Hi Urs, > > a hash-partition operator should not spill. In general…
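
Getting the JSON plan for the visualizer is a one-liner (note that getExecutionPlan() throws Exception):

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    // ... define the DataSet program, then:
    System.out.println(env.getExecutionPlan()); // paste the JSON into the visualizer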

Re: DataSet: partitionByHash without materializing/spilling the entire partition?

2017-09-06 Thread Fabian Hueske
Hi Urs, a hash-partition operator should not spill. In general, DataSet plans aim to be as pipelined as possible. There are a few cases where spilling happens: - a full sort with insufficient memory - hash tables that need to spill (only in join operators) - range partitioning to compute a…
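
For context, a minimal use of the operator in question (types and field index are illustrative):

    DataSet<Tuple2<String, Integer>> data = ...; // some input
    data.partitionByHash(0) // hash-partition on the first field; pipelined, no full sort
        .mapPartition(new MapPartitionFunction<Tuple2<String, Integer>, String>() {
            @Override
            public void mapPartition(Iterable<Tuple2<String, Integer>> values, Collector<String> out) {
                // process all records of one hash partition
            }
        });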

Re: Empty state restore seems to be broken for Kafka source (1.3.2)

2017-09-06 Thread Gyula Fóra
Wouldn't it be enough for the Kafka sources to store some empty container for their state if it is empty, as opposed to null when it should be bootstrapped again? Gyula Aljoscha Krettek wrote (on Wed, Sep 6, 2017, 14:31): > The problem here is that context.isRestored() is a global flag and…

Re: Flink on AWS EMR Protobuf

2017-09-06 Thread Aljoscha Krettek
Hi, Could you please give a bit more context around that exception? Maybe a log or a full stack trace. Best, Aljoscha > On 5. Sep 2017, at 23:52, ant burton wrote: > > Hello, > > Has anybody experienced the following error on AWS EMR 5.8.0 with Flink 1.3.1? > > java.lang.ClassCastException: …

Re: Empty state restore seems to be broken for Kafka source (1.3.2)

2017-09-06 Thread Aljoscha Krettek
The problem here is that context.isRestored() is a global flag and not local to each operator. It says "yes, this job was restored", but the source would need to know that it is actually brand new and never had any state. This is quite tricky to do, since there is currently no way (if I'm correct)…
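
For context, the flag under discussion is what a CheckpointedFunction sees during initialization; a minimal sketch:

    public class MySource implements CheckpointedFunction {
        @Override
        public void initializeState(FunctionInitializationContext context) throws Exception {
            if (context.isRestored()) {
                // Global flag: true for the whole restored job, even for an
                // operator that is brand new and has no state of its own --
                // the ambiguity discussed in this thread.
            } else {
                // Fresh start: bootstrap, e.g. discover partitions.
            }
        }

        @Override
        public void snapshotState(FunctionSnapshotContext context) throws Exception { }
    }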

Re: Securing Flink Monitoring REST API

2017-09-06 Thread avivros
Does jobmanager.web.ssl.enabled support client SSL authentication?

Re: Empty state restore seems to be broken for Kafka source (1.3.2)

2017-09-06 Thread Stefan Richter
Thanks for the report, I will take a look. > On 06.09.2017 at 11:48, Gyula Fóra wrote: > > Hi all, > > We are running into some problems with the Kafka source after changing the > uid and restoring from the savepoint. > What we are expecting is to clear the partition state and set it up all…

Re: Process Function

2017-09-06 Thread Timo Walther
Hi Johannes, you can find the implementation for the state clean-up here: https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/ProcessFunctionWithCleanupState.scala and an example usage here: https://github.com/apache/flin…
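
The pattern behind that clean-up code, stripped down to a sketch (applies to a keyed stream; the one-hour retention interval is a placeholder):

    public class CleanupExample extends ProcessFunction<Event, Event> {
        private static final long RETENTION = 60 * 60 * 1000; // 1h, placeholder
        private transient ValueState<Long> lastSeen;

        @Override
        public void open(Configuration parameters) {
            lastSeen = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("lastSeen", Long.class));
        }

        @Override
        public void processElement(Event value, Context ctx, Collector<Event> out) throws Exception {
            long now = ctx.timerService().currentProcessingTime();
            lastSeen.update(now);
            ctx.timerService().registerProcessingTimeTimer(now + RETENTION);
            out.collect(value);
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<Event> out) throws Exception {
            Long seen = lastSeen.value();
            if (seen != null && timestamp >= seen + RETENTION) {
                lastSeen.clear(); // no activity within the retention interval
            }
        }
    }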

Empty state restore seems to be broken for Kafka source (1.3.2)

2017-09-06 Thread Gyula Fóra
Hi all, We are running into some problems with the Kafka source after changing the uid and restoring from the savepoint. What we are expecting is to clear the partition state and set it up all over again, but what seems to happen is that the consumer thinks that it doesn't have any partitions…

Apache Phoenix integration

2017-09-06 Thread Flavio Pompermaier
Hi to all, I'm writing a job that uses Apache Phoenix. At first I used the PhoenixOutputFormat as (Hadoop) OutputFormat, but it's not well suited to working with the Table API because it cannot handle generic objects like Rows (it needs a DBWritable object that must already be present at compile time)…

Re: Shuffling between map and keyBy operator

2017-09-06 Thread Kurt Young
Hi Marchant, I'm afraid that the serde cost still exists even if both operators run in the same TaskManager. Best, Kurt On Tue, Sep 5, 2017 at 9:26 PM, Marchant, Hayden wrote: > I have a streaming application that has a keyBy operator followed by an > operator working on the keyed values (a custom…

Re: Process Function

2017-09-06 Thread Aljoscha Krettek
Hi, I'm actually not very familiar with the current Table API implementations, but Fabian or Timo (cc'ed) should know more. I suspect very much that this is implemented like this, yes. Best, Aljoscha > On 5. Sep 2017, at 21:14, Johannes Schulte wrote: > > Hi, > > one short question I had…

Re: Union limit

2017-09-06 Thread Fabian Hueske
Hi, the following code should do what you want. I included an implementation of an IdMapper. At the end, I print the execution plan, which is generated after the optimization (so the pipeline is working until then). Best, Fabian val data: Seq[Seq[Int]] = (1 until 315).map(i => Seq(1, 2, 3)) val…
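
A Java rendering of that workaround: union the inputs in a loop and insert an identity mapper every so often so that no single operator accumulates too many inputs (the batch size of 64 is an arbitrary choice for this sketch):

    DataSet<Integer> result = null;
    int unioned = 0;
    for (DataSet<Integer> ds : inputs) {
        result = (result == null) ? ds : result.union(ds);
        if (++unioned % 64 == 0) {
            result = result.map(new MapFunction<Integer, Integer>() {
                @Override
                public Integer map(Integer value) {
                    return value; // identity "IdMapper"
                }
            });
        }
    }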