Hi Fabian, I figured you might be away - and sorry for interrupting your vacation.
I've got ahead and opened issues for these two items. Regards, -- Ken > From: Fabian Hueske > Sent: March 31, 2016 3:44:07pm PDT > To: dev@flink.apache.org > Subject: Re: cascading-flink 1.0 results > > Hi Ken, > > I'm currently on vacation and will be back in a week. > Would you like to open an issue at the cascading-flink Github project a > describe the Scheme.setNumSinkParts() problem? > I'll try to fix it when I'm back. > > Thanks for checking with Chris the ComparePlatformsTest issue. I'll exclude > that test case. > > Thanks, Fabian > > 2016-03-30 21:46 GMT+02:00 Ken Krugler <kkrugler_li...@transpac.com>: > >> Hi Fabian, >> >> I've been trying out the cascading-flink 1.0 branch (updated to >> cascading-3.1-wip-56) with our cascading.utils project. >> >> I ran into one initial challenge, where older Kryo versions don't work >> with Flink - it seems like it has to be 2.24.0, otherwise you get a >> no-such-method error (2.19) or an odd hang while Kryo is trying to read >> (2.21). So there was a bit of version management required. I noticed that >> Flink has a dependency on Chill 0.7.4, which depends on Kryo 2.21. >> >> After that change, our tests run, but it looks like the Flink planner is >> ignore the Scheme.setNumSinkParts() call. >> >> E.g. Scheme.setNumSinkParts(1) should result in a single part-00000 file, >> and thus the upstream grouping should implicitly have a parallelism of 1. >> >> This is described as a suggestion (e.g. if your Flow only has maps then no >> such parallelism can be guaranteed) but it does wind up being relied upon >> by many workflows, when generating a small output file that has to be >> globally sorted. >> >> Thanks, >> >> -- Ken >> >> PS - Chris Wensel responded to the >> cascading.ComparePlatformsTest$CompareTestCase issue, and said: >> >>> make sure you ‘exclude’ *TestCase from your unit test pattern. >>> >>> all Cascading tests are *PlatformTest and *Test. there are no tests in >> *TestCase >> >> >> >> >>> From: Fabian Hueske >>> Sent: March 30, 2016 2:04:15am PDT >>> To: dev@flink.apache.org >>> Subject: Re: Expected duration for cascading-flink tests? >>> >>> Hi Ken, >>> >>> regarding the failed tests: >>> - cascading.JoinFieldedPipesPlatformTest$testJoinMergeGroupBy is expected >>> to fail due to restrictions in the MR/Tez engines. If I remember >> correctly, >>> this is about deadlocks that need to be resolved by splitting a job. >>> Flink's optimizer detects such situations and places a dam breaker to >>> resolve such a situation within a single job and is hence able to execute >>> the job correctly. >>> - cascading.ComparePlatformsTest$CompareTestCase I think you are right on >>> this one. When I implemented the runner, I did not find a way to make >> this >>> tests pass. It looked like an issue with the test itself as you assumed >> as >>> well. >>> >>> Btw. I ported the runner to Flink 1.0 and bumped the Cascading 3.1 >>> WIP version already, but haven't done an "official" release yet. You find >>> the code in the flink-1.0 branch [1]. With Flink 1.0, we also extended >> the >>> support for outer joins. It might be possible to get rid of some of the >>> HashJoin restrictions, but I have to take a closer look at how outer hash >>> joins are done with Cascading MR/Tez. >>> Anyway, I can do a Cascading-Flink release for Flink 1.0 soon and extend >>> HashJoin support later. >>> >>> Best, Fabian >>> >>> [1] https://github.com/dataartisans/cascading-flink/tree/flink-1.0 >>> >>> 2016-03-30 6:08 GMT+02:00 Ken Krugler <kkrugler_li...@transpac.com>: >>> >>>> Hi Fabian, >>>> >>>>> From: Fabian Hueske >>>>> Sent: March 29, 2016 3:51:08pm PDT >>>>> To: dev@flink.apache.org >>>>> Subject: Re: Expected duration for cascading-flink tests? >>>>> >>>>> Hi Ken, >>>>> >>>>> no, this is definitely not expected. The tests complete in about 30 >> mins >>>> on >>>>> my machine. >>>>> Is it possible that you have another Flink process running on your >>>> machine >>>>> (maybe a debug thread in your IDE)? That could explain the "Address >>>> already >>>>> in use" exceptions. >>>> >>>> Good call - I'd run "bin/stop-local.sh" previously, but I see that >> there's >>>> still the Flink process running. >>>> >>>> Re-running bin/stop-local.sh displays "No jobmanager daemon to stop on >>>> host Kens-MacBook-Air.local.", but still doesn't kill off the Flink >> process. >>>> >>>> What might cause that situation? >>>> >>>> In any case, I manually killed the process and started the build again, >>>> and it finished in about 20 minutes, which is great. >>>> >>>> I see the expected errors, e.g. >>>> >>>> HashJoin does only support InnerJoin and LeftJoin but is >>>> cascading.pipe.joiner.OuterJoin >>>> >>>> though this one seems odd: >>>> >>>>> testJoinMergeGroupBy(cascading.JoinFieldedPipesPlatformTest) Time >>>> elapsed: 0.048 sec <<< FAILURE! >>>>> junit.framework.AssertionFailedError: planner should throw error on >> plan >>>> >>>> FlinkTestPlatform needs to return true from supportsGroupByAfterMerge() >> - >>>> assuming that this is actually the case (seems reasonable for Flink) >>>> >>>> Though making that change requires cascading-wip-56 to avoid a >> compilation >>>> error on the @Override. >>>> >>>> There's also this one: >>>> >>>>> Running cascading.ComparePlatformsTest$CompareTestCase >>>>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.053 >>>> sec <<< FAILURE! - in cascading.ComparePlatformsTest$CompareTestCase >>>>> warning(junit.framework.TestSuite$1) Time elapsed: 0.009 sec <<< >>>> FAILURE! >>>>> junit.framework.AssertionFailedError: Class >>>> cascading.ComparePlatformsTest$CompareTestCase has no public constructor >>>> TestCase(String name) or TestCase() >>>>> at junit.framework.Assert.fail(Assert.java:57) >>>>> at junit.framework.TestCase.fail(TestCase.java:227) >>>>> at junit.framework.TestSuite$1.runTest(TestSuite.java:100) >>>> >>>> >>>> But that seems like an issue with the Cascading test code. I'll check >>>> w/Chris and see what he says. >>>> >>>> Anyway, the build worked with the update to cascading-wip-56. >>>> >>>> I also tried updating to Flink 1.0.0 (from 0.10.0), but so far I've run >>>> into some compilation errors, e.g. in FlinkFlowStep.java it can't find >> the >>>> JavaPlan class. >>>> >>>> Thanks again for the help, >>>> >>>> -- Ken >>>> >>>> >>>> >>>>> " >>>>> Best, Fabian >>>>> >>>>> 2016-03-29 20:36 GMT+02:00 Ken Krugler <kkrugler_li...@transpac.com>: >>>>> >>>>>> An update (and a nudge)… >>>>>> >>>>>> So far it's been more than 20 hours, and the tests are still running. >>>>>> >>>>>> Most tests seem to fail with one of two different errors… >>>>>> >>>>>> 1. Address already in use >>>>>> >>>>>> cascading.flow.FlowException: [test] unhandled exception >>>>>> at cascading.flow.BaseFlow.complete(BaseFlow.java:977) >>>>>> at >>>>>> >>>> >> cascading.flow.FlowStrategiesPlatformTest.testSkipStrategiesReplace(FlowStrategiesPlatformTest.java:67) >>>>>> Caused by: org.jboss.netty.channel.ChannelException: Failed to bind >> to: >>>> / >>>>>> 127.0.0.1:6123 >>>>>> at >>>>>> >> org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272) >>>>>> … >>>>>> Caused by: java.net.BindException: Address already in use >>>>>> … >>>>>> >>>>>> 2. FlowStepJob.blockOnJob throws a cascading.flow.FlowException >>>>>> >>>>>> All caused by a 100 second timeout >>>>>> >>>>>> Is the above expected? >>>>>> >>>>>> Thanks, >>>>>> >>>>>> -- Ken >>>>>> >>>>>>> From: Ken Krugler >>>>>>> Sent: March 28, 2016 3:39:12pm PDT >>>>>>> To: dev@flink.apache.org >>>>>>> Subject: Expected duration for cascading-flink tests? >>>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> I'm curious how long the tests are expected to take for >>>> cascading-flink. >>>>>>> >>>>>>> I know that https://github.com/dataArtisans/cascading-flink >> recommends >>>>>> running mvn clean install with -DskipTests, but I was going to try >>>> updating >>>>>> to flink 1.0.0 (currently using 0.10.0) and cascading 3.1.0-wip-56 >>>>>> (currently on wip-39), so I wanted to first verify that all tests >> passed >>>>>> before updating and then running the tests again. >>>>>>> >>>>>>> In any case, the tests have been running for about 2.5 hours now. >> From >>>>>> what I can tell, it's legit - most of the time is tied to >>>>>> cascading.flow.planner.rul.RuleSetExec's call() method. >>>>>>> >>>>>>> Maybe this is a sign that it's time for a new Mac :) >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> -- Ken -------------------------- Ken Krugler +1 530-210-6378 http://www.scaleunlimited.com custom big data solutions & training Hadoop, Cascading, Cassandra & Solr