Read JSON file as input

2016-04-25 Thread Punit Naik
Hi, I am new to Flink. I was experimenting with the DataSet API and found that there is no explicit method for loading a JSON file as input. Can anyone please suggest a workaround? -- Thank You Regards Punit Naik

Re: [DISCUSS] Graph algorithms for vertex and edge degree

2016-04-25 Thread Greg Hogan
Hi Fabian, I don't know if this has been looked at. There is discussion of BipartiteGraph in FLINK-2254. If Gelly had DirectedGraph and UndirectedGraph then the API could stay lean while methods could be tuned for the specific graph type. I do like having simple APIs such as DataSet and Graph for

Re: Parallelizing ExecutionConfig.fromCollection

2016-04-25 Thread Greg Hogan
Hi Till, I appreciate the detailed explanation. My specific case has been with the graph generators. I think it is possible to implement some random sources using SplittableIterator rather than building a Collection, so it might be best to rework the graph generator API to better fit the Flink mod

[jira] [Created] (FLINK-3811) Refactor ExecutionEnvironment in TableEnvironment

2016-04-25 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-3811: Summary: Refactor ExecutionEnvironment in TableEnvironment Key: FLINK-3811 URL: https://issues.apache.org/jira/browse/FLINK-3811 Project: Flink Issue Type: I

Re: Pull request failed Travis...what's next?

2016-04-25 Thread Fabian Hueske
Hi Sourigna, usually a PR is picked up and reviewed by somebody from the community and eventually merged by a committer. Sometimes it takes a few days, but if nobody reacts it helps to ping just like you did. I'll have a look at your PR tomorrow. Thanks, Fabian 2016-04-25 18:01 GMT+02:00 Sourign

Re: Partition problem

2016-04-25 Thread Andrew Palumbo
sorry - just noticed below should read:
    val rowsA = (0 until inCoreA.nrow).map(i => (i, inCoreA(i, ::)))
    drmA = env.fromCollection(rowsA).partitionByRange(0)
    val rowsB = (0 until inCoreB.nrow).map(i => (i, inCoreB(i, ::)))
    drmB = env.fromCollection(rowsB).partitionByRange(0)

Re: Partition problem

2016-04-25 Thread Andrew Palumbo
Thank you Fabian and Till for answering. I think that my explanation of the problem was a bit oversimplified (I am trying to implement an operator that will pass our tests, and didn't want to throw too much code at you). I realize that this is an odd case, a 2x2 matrix in a distributed conte

[jira] [Created] (FLINK-3810) Missing break in ZooKeeperLeaderElectionService#handleStateChange()

2016-04-25 Thread Ted Yu (JIRA)
Ted Yu created FLINK-3810: - Summary: Missing break in ZooKeeperLeaderElectionService#handleStateChange() Key: FLINK-3810 URL: https://issues.apache.org/jira/browse/FLINK-3810 Project: Flink Issue Ty

[jira] [Created] (FLINK-3809) Missing break in ZooKeeperLeaderRetrievalService#handleStateChange()

2016-04-25 Thread Ted Yu (JIRA)
Ted Yu created FLINK-3809: - Summary: Missing break in ZooKeeperLeaderRetrievalService#handleStateChange() Key: FLINK-3809 URL: https://issues.apache.org/jira/browse/FLINK-3809 Project: Flink Issue T
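
To illustrate the class of bug named in FLINK-3809 and FLINK-3810, here is a minimal, self-contained Java sketch of switch fall-through caused by a missing break. It is not the Flink code: the class FallThroughDemo, the State enum and both handler methods are hypothetical names invented for this example, and the states are only meant to resemble a connection state change.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; this is NOT the Flink ZooKeeper service code.
class FallThroughDemo {
    // Assumed states, chosen only for illustration.
    enum State { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    // Buggy variant: SUSPENDED has no break, so control falls into RECONNECTED.
    static List<String> handleWithoutBreak(State state) {
        List<String> log = new ArrayList<>();
        switch (state) {
            case CONNECTED:
                log.add("connected");
                break;
            case SUSPENDED:
                log.add("suspended");
                // missing break: execution continues with the next case
            case RECONNECTED:
                log.add("reconnected");
                break;
            case LOST:
                log.add("lost");
                break;
        }
        return log;
    }

    // Fixed variant: every case ends with a break.
    static List<String> handleWithBreak(State state) {
        List<String> log = new ArrayList<>();
        switch (state) {
            case CONNECTED:
                log.add("connected");
                break;
            case SUSPENDED:
                log.add("suspended");
                break;
            case RECONNECTED:
                log.add("reconnected");
                break;
            case LOST:
                log.add("lost");
                break;
        }
        return log;
    }

    public static void main(String[] args) {
        // The buggy handler records two messages for one state change.
        System.out.println(handleWithoutBreak(State.SUSPENDED));
        System.out.println(handleWithBreak(State.SUSPENDED));
    }
}
```

A missing break after the last case of a switch has no effect on behavior today, but it becomes a fall-through as soon as another case is appended, which is why such omissions are usually reported as bugs anyway.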

Re: Pull request failed Travis...what's next?

2016-04-25 Thread Sourigna Phetsarath
Who's responsible for merging the PRs? What's the usual timeline for feedback and/or merging? Thank you. On Thu, Apr 21, 2016 at 6:09 PM, Flavio Pompermaier wrote: > We just issued a PR about FLINK-1827 ( > https://github.com/apache/flink/pull/1915) that improves test stability > except for th

[jira] [Created] (FLINK-3808) Refactor the whole file monitoring source to take a fileInputFormat as an argument.

2016-04-25 Thread Kostas Kloudas (JIRA)
Kostas Kloudas created FLINK-3808: - Summary: Refactor the whole file monitoring source to take a fileInputFormat as an argument. Key: FLINK-3808 URL: https://issues.apache.org/jira/browse/FLINK-3808 P

Re: Parallelizing ExecutionConfig.fromCollection

2016-04-25 Thread Till Rohrmann
Hi Greg, I think we haven't yet discussed the opportunity for a parallelized collection input format. Thanks for bringing this up. I think it should be possible to implement a generic parallel collection input format. However, I have two questions here: 1. Is it really a problem for users that

Parallelizing ExecutionConfig.fromCollection

2016-04-25 Thread Greg Hogan
Hi, CollectionInputFormat currently enforces a parallelism of 1 by implementing NonParallelInput and serializing the entire Collection. If my understanding is correct this serialized InputFormat is often the cause of a new job exceeding the akka message size limit. As an alternative the Collectio
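
The message above is cut off before the proposed alternative, so the following is only a general sketch of what reading a collection in parallel could involve, not the proposal made in this thread. It is plain Java with no Flink dependency; the class CollectionSplitter and its split method are hypothetical names. The idea shown is dividing one collection into contiguous pieces so that each parallel reader would only need its own piece rather than the whole serialized collection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of Flink's API.
class CollectionSplitter {

    // Divide items into numSplits contiguous pieces whose sizes differ by at most one.
    static <T> List<List<T>> split(List<T> items, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be at least 1");
        }
        List<List<T>> splits = new ArrayList<>(numSplits);
        int size = items.size();
        for (int i = 0; i < numSplits; i++) {
            // long arithmetic avoids int overflow for large collections
            int from = (int) ((long) size * i / numSplits);
            int to = (int) ((long) size * (i + 1) / numSplits);
            // copy the view so that each piece is independent of the source list
            splits.add(new ArrayList<>(items.subList(from, to)));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        // With three readers, each one would receive one of these pieces.
        System.out.println(split(data, 3));
    }
}
```

Contiguous ranges keep the original element order within each piece; when there are more pieces than elements, some pieces are simply empty.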

Re: [DISCUSS] Release Flink 1.0.3

2016-04-25 Thread Ufuk Celebi
I think FLINK-3803 can be included as well for 1.0.3 (checkpoint stats tracker configuration was not picked up). The PR is here: https://github.com/apache/flink/pull/1927 On Mon, Apr 25, 2016 at 12:06 PM, Robert Metzger wrote: > Okay, I'll merge it > > On Mon, Apr 25, 2016 at 12:01 PM, Ufuk Cele

Re: [DISCUSS] Graph algorithms for vertex and edge degree

2016-04-25 Thread Fabian Hueske
Hi Greg and Vasia, thanks for starting this discussion. I think it is a good idea to update the Gelly roadmap. As Vasia said, many items on the list have been implemented and others have been more or less dropped. Also new people who want to improve Gelly have joined the community while others have

Re: [DISCUSS] Methods for translating Graphs

2016-04-25 Thread Fabian Hueske
Hi Greg, sorry for the late reply. I am not super familiar with Gelly, but the use cases you describe sound quite common to me. I had a (very) brief look at the PR and the changes seem to be rather lightweight. So, in my opinion this looks like a valuable addition. Thanks, Fabian 2016-04-21 18:

Re: Sqoop-like module in Flink

2016-04-25 Thread Fabian Hueske
Hi Flavio, sorry for not replying earlier. I think there is definitely a need to improve the JdbcInputFormat. All your points wrt the current JdbcInputFormat are valid and fixing them would be a big improvement and a highly welcome contribution, IMO. I am not so sure about adding a flink-sqoop mod

[jira] [Created] (FLINK-3807) FastFailuresITCase deadlocks on Travis

2016-04-25 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-3807: Summary: FastFailuresITCase deadlocks on Travis Key: FLINK-3807 URL: https://issues.apache.org/jira/browse/FLINK-3807 Project: Flink Issue Type: Bug Affe

Re: Eclipse Problems

2016-04-25 Thread Robert Metzger
Cool, thank you for working on this! On Mon, Apr 25, 2016 at 1:37 PM, Matthias J. Sax wrote: > I can confirm that the SO answer works. > > I will add a note to the Eclipse setup guide at the web site. > > -Matthias > > > On 04/25/2016 11:33 AM, Robert Metzger wrote: > > It seems that the user re

Re: Eclipse Problems

2016-04-25 Thread Matthias J. Sax
I can confirm that the SO answer works. I will add a note to the Eclipse setup guide at the web site. -Matthias On 04/25/2016 11:33 AM, Robert Metzger wrote: > It seems that the user resolved the issue on SO, right? > > On Mon, Apr 25, 2016 at 11:31 AM, Ufuk Celebi wrote: > >> On Mon, Apr 25

Re: Partition problem

2016-04-25 Thread Fabian Hueske
Hi Andrew, I might be wrong, but I think this problem is caused by an assumption about how Flink reads input data. In Flink, each InputSplit is not read by a new task and a split does not correspond to a partition. This is different from how Hadoop MR and Spark handle InputSplits. Instead, Flink cre

Re: [DISCUSS] Release Flink 1.0.3

2016-04-25 Thread Robert Metzger
Okay, I'll merge it On Mon, Apr 25, 2016 at 12:01 PM, Ufuk Celebi wrote: > +1 from my side, Robert. It's not changing anything for people who > don't configure it. We also did it for 1.0.2 with the DataSetUtils. > > On Mon, Apr 25, 2016 at 11:06 AM, Robert Metzger > wrote: > > I'm currently wor

Re: [DISCUSS] Release Flink 1.0.3

2016-04-25 Thread Ufuk Celebi
+1 from my side, Robert. It's not changing anything for people who don't configure it. We also did it for 1.0.2 with the DataSetUtils. On Mon, Apr 25, 2016 at 11:06 AM, Robert Metzger wrote: > I'm currently working on the Flink+Bigtop integration and I found this > issue quite annoying: https://i

Re: Eclipse Problems

2016-04-25 Thread Robert Metzger
It seems that the user resolved the issue on SO, right? On Mon, Apr 25, 2016 at 11:31 AM, Ufuk Celebi wrote: > On Mon, Apr 25, 2016 at 12:14 AM, Matthias J. Sax > wrote: > > What do you think about this? > > Hey Matthias! > > Thanks for bringing this up. > > I think it is very desirable to keep

Re: Eclipse Problems

2016-04-25 Thread Ufuk Celebi
On Mon, Apr 25, 2016 at 12:14 AM, Matthias J. Sax wrote: > What do you think about this? Hey Matthias! Thanks for bringing this up. I think it is very desirable to keep support for Eclipse. It's quite a high barrier for new contributors to enforce a specific IDE (although IntelliJ is gaining qu

Re: Partition problem

2016-04-25 Thread Till Rohrmann
Hi Andrew, I think the problem is that you assume that both matrices have the same partitioning. If you guarantee that this is the case, then you can use the subtask index as the block index. But in the general case this is not true, and then you have to calculate the blocks by first assigning a b
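
The message above is truncated, so the sketch below is only one possible reading of the point that a block index should not depend on which subtask happens to hold a row. It assumes, purely for illustration, that rows carry an integer row index and that blocks have a fixed number of rows; the class BlockIndexDemo and its methods are hypothetical and use no Flink API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; illustrates deriving a block index from the row index.
class BlockIndexDemo {

    // Block index computed from the row index alone (assumed fixed block height).
    static int blockOf(int rowIndex, int rowsPerBlock) {
        if (rowIndex < 0 || rowsPerBlock < 1) {
            throw new IllegalArgumentException("rowIndex >= 0 and rowsPerBlock >= 1 required");
        }
        return rowIndex / rowsPerBlock;
    }

    // Group row indices by block index, regardless of the order in which they arrive.
    static Map<Integer, List<Integer>> groupByBlock(int[] rowIndices, int rowsPerBlock) {
        Map<Integer, List<Integer>> blocks = new TreeMap<>();
        for (int row : rowIndices) {
            blocks.computeIfAbsent(blockOf(row, rowsPerBlock), k -> new ArrayList<>()).add(row);
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Rows arrive in arbitrary order, as they might from differently partitioned inputs.
        System.out.println(groupByBlock(new int[] {3, 0, 2, 1}, 2));
    }
}
```

Because the block index is a function of the row index only, two matrices keyed this way get matching blocks even if their rows were distributed differently across subtasks.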

Re: [DISCUSS] Release Flink 1.0.3

2016-04-25 Thread Robert Metzger
I'm currently working on the Flink+Bigtop integration and I found this issue quite annoying: https://issues.apache.org/jira/browse/FLINK-3678 In my opinion it's a bugfix we can include in the release. Any objections? On Mon, Apr 25, 2016 at 10:48 AM, Ufuk Celebi wrote: > Hey all, > > thanks to

Re: [DISCUSS] Release Flink 1.0.3

2016-04-25 Thread Ufuk Celebi
Hey all, thanks to Robert for starting the discussion. - FLINK-3790 is merged - FLINK-3800 looks good to merge after a test issue is resolved - FLINK-3701 needs some feedback on the latest changes After we have resolved these, I can kick off the first RC for 1.0.3. Are there any other fixes we