Sounds you are damn right! thanks for the insight, dumb on us for not checking this before.
saluti, Stefano 2016-04-13 11:05 GMT+02:00 Stephan Ewen <se...@apache.org>: > Sounds actually not like a Flink issue. I would look into the commons pool > docs. > Maybe they size their pools by default with the number of cores, so the > pool has only 8 threads, and other requests are queues? > > On Wed, Apr 13, 2016 at 10:29 AM, Flavio Pompermaier <pomperma...@okkam.it > > wrote: > >> Any feedback about our JDBC InputFormat issue..? >> >> On Thu, Apr 7, 2016 at 12:37 PM, Flavio Pompermaier <pomperma...@okkam.it >> > wrote: >> >>> We've finally created a running example (For Flink 0.10.2) of our >>> improved JDBC imputformat that you can run from an IDE (it creates an >>> in-memory derby database with 1000 rows and batch of 10) at >>> https://gist.github.com/fpompermaier/bcd704abc93b25b6744ac76ac17ed351. >>> The first time you run the program you have to comment the following >>> line: >>> >>> stmt.executeUpdate("Drop Table users "); >>> >>> In your pom declare the following dependencies: >>> >>> <dependency> >>> <groupId>org.apache.derby</groupId> >>> <artifactId>derby</artifactId> >>> <version>10.10.1.1</version> >>> </dependency> >>> <dependency> >>> <groupId>org.apache.commons</groupId> >>> <artifactId>commons-pool2</artifactId> >>> <version>2.4.2</version> >>> </dependency> >>> >>> In my laptop I have 8 cores and if I put parallelism to 16 I expect to >>> see 16 calls to the connection pool (i.e. '==================== CREATING >>> NEW CONNECTION!') while I see only 8 (up to my maximum number of cores). >>> The number of created task instead is correct (16). >>> >>> I hope this could help in understanding where the problem is! >>> >>> Best and thank in advance, >>> Flavio >>> >>> On Wed, Mar 30, 2016 at 11:01 AM, Stefano Bortoli <s.bort...@gmail.com> >>> wrote: >>> >>>> Hi Ufuk, >>>> >>>> here is our preliminary input formar implementation: >>>> https://gist.github.com/anonymous/dbf05cad2a6cc07b8aa88e74a2c23119 >>>> >>>> if you need a running project, I will have to create a test one cause I >>>> cannot share the current configuration. >>>> >>>> thanks a lot in advance! >>>> >>>> >>>> >>>> 2016-03-30 10:13 GMT+02:00 Ufuk Celebi <u...@apache.org>: >>>> >>>>> Do you have the code somewhere online? Maybe someone can have a quick >>>>> look over it later. I'm pretty sure that is indeed a problem with the >>>>> custom input format. >>>>> >>>>> – Ufuk >>>>> >>>>> On Tue, Mar 29, 2016 at 3:50 PM, Stefano Bortoli <s.bort...@gmail.com> >>>>> wrote: >>>>> > Perhaps there is a misunderstanding on my side over the parallelism >>>>> and >>>>> > split management given a data source. >>>>> > >>>>> > We started from the current JDBCInputFormat to make it multi-thread. >>>>> Then, >>>>> > given a space of keys, we create the splits based on a fetchsize set >>>>> as a >>>>> > parameter. In the open, we get a connection from the pool, and >>>>> execute a >>>>> > query using the split interval. This sets the 'resultSet', and then >>>>> the >>>>> > DatasourceTask iterates between reachedEnd, next and close. On >>>>> close, the >>>>> > connection is returned to the pool. We set parallelism to 32, and we >>>>> would >>>>> > expect 32 connection opened but the connections opened are just 8. >>>>> > >>>>> > We tried to make an example with the textinputformat, but being a >>>>> > delimitedinpurformat, the open is called sequentially when >>>>> statistics are >>>>> > built, and then the processing is executed in parallel just after >>>>> all the >>>>> > open are executed. This is not feasible in our case, because there >>>>> would be >>>>> > millions of queries before the statistics are collected. >>>>> > >>>>> > Perhaps we are doing something wrong, still to figure out what. :-/ >>>>> > >>>>> > thanks a lot for your help. >>>>> > >>>>> > saluti, >>>>> > Stefano >>>>> > >>>>> > >>>>> > 2016-03-29 13:30 GMT+02:00 Stefano Bortoli <s.bort...@gmail.com>: >>>>> >> >>>>> >> That is exactly my point. I should have 32 threads running, but I >>>>> have >>>>> >> only 8. 32 Task are created, but only only 8 are run concurrently. >>>>> Flavio >>>>> >> and I will try to make a simple program to produce the problem. If >>>>> we solve >>>>> >> our issues on the way, we'll let you know. >>>>> >> >>>>> >> thanks a lot anyway. >>>>> >> >>>>> >> saluti, >>>>> >> Stefano >>>>> >> >>>>> >> 2016-03-29 12:44 GMT+02:00 Till Rohrmann <trohrm...@apache.org>: >>>>> >>> >>>>> >>> Then it shouldn’t be a problem. The ExeuctionContetxt is used to >>>>> run >>>>> >>> futures and their callbacks. But as Ufuk said, each task will >>>>> spawn it’s own >>>>> >>> thread and if you set the parallelism to 32 then you should have >>>>> 32 threads >>>>> >>> running. >>>>> >>> >>>>> >>> >>>>> >>> On Tue, Mar 29, 2016 at 12:29 PM, Stefano Bortoli < >>>>> s.bort...@gmail.com> >>>>> >>> wrote: >>>>> >>>> >>>>> >>>> In fact, I don't use it. I just had to crawl back the runtime >>>>> >>>> implementation to get to the point where parallelism was >>>>> switching from 32 >>>>> >>>> to 8. >>>>> >>>> >>>>> >>>> saluti, >>>>> >>>> Stefano >>>>> >>>> >>>>> >>>> 2016-03-29 12:24 GMT+02:00 Till Rohrmann <till.rohrm...@gmail.com >>>>> >: >>>>> >>>>> >>>>> >>>>> Hi, >>>>> >>>>> >>>>> >>>>> for what do you use the ExecutionContext? That should actually be >>>>> >>>>> something which you shouldn’t be concerned with since it is only >>>>> used >>>>> >>>>> internally by the runtime. >>>>> >>>>> >>>>> >>>>> Cheers, >>>>> >>>>> Till >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Tue, Mar 29, 2016 at 12:09 PM, Stefano Bortoli < >>>>> s.bort...@gmail.com> >>>>> >>>>> wrote: >>>>> >>>>>> >>>>> >>>>>> Well, in theory yes. Each task has a thread, but only a number >>>>> is run >>>>> >>>>>> in parallel (the job of the scheduler). Parallelism is set in >>>>> the >>>>> >>>>>> environment. However, whereas the parallelism parameter is set >>>>> and read >>>>> >>>>>> correctly, when it comes to actual starting of the threads, the >>>>> number is >>>>> >>>>>> fix to 8. We run a debugger to get to the point where the >>>>> thread was >>>>> >>>>>> started. As Flavio mentioned, the ExecutionContext has the >>>>> parallelims set >>>>> >>>>>> to 8. We have a pool of connections to a RDBS and il logs the >>>>> creation of >>>>> >>>>>> just 8 connections although parallelism is much higher. >>>>> >>>>>> >>>>> >>>>>> My question is whether this is a bug (or a feature) of the >>>>> >>>>>> LocalMiniCluster. :-) I am not scala expert, but I see some >>>>> variable >>>>> >>>>>> assignment in setting up of the MiniCluster, involving >>>>> parallelism and >>>>> >>>>>> 'default values'. Default values in terms of parallelism are >>>>> based on the >>>>> >>>>>> number of cores. >>>>> >>>>>> >>>>> >>>>>> thanks a lot for the support! >>>>> >>>>>> >>>>> >>>>>> saluti, >>>>> >>>>>> Stefano >>>>> >>>>>> >>>>> >>>>>> 2016-03-29 11:51 GMT+02:00 Ufuk Celebi <u...@apache.org>: >>>>> >>>>>>> >>>>> >>>>>>> Hey Stefano, >>>>> >>>>>>> >>>>> >>>>>>> this should work by setting the parallelism on the >>>>> environment, e.g. >>>>> >>>>>>> >>>>> >>>>>>> env.setParallelism(32) >>>>> >>>>>>> >>>>> >>>>>>> Is this what you are doing? >>>>> >>>>>>> >>>>> >>>>>>> The task threads are not part of a pool, but each submitted >>>>> task >>>>> >>>>>>> creates its own Thread. >>>>> >>>>>>> >>>>> >>>>>>> – Ufuk >>>>> >>>>>>> >>>>> >>>>>>> >>>>> >>>>>>> On Fri, Mar 25, 2016 at 9:10 PM, Flavio Pompermaier >>>>> >>>>>>> <pomperma...@okkam.it> wrote: >>>>> >>>>>>> > Any help here? I think that the problem is that the >>>>> JobManager >>>>> >>>>>>> > creates the >>>>> >>>>>>> > executionContext of the scheduler with >>>>> >>>>>>> > >>>>> >>>>>>> > val executionContext = >>>>> ExecutionContext.fromExecutor(new >>>>> >>>>>>> > ForkJoinPool()) >>>>> >>>>>>> > >>>>> >>>>>>> > and thus the number of concurrently running threads is >>>>> limited to >>>>> >>>>>>> > the number >>>>> >>>>>>> > of cores (using the default constructor of the ForkJoinPool). >>>>> >>>>>>> > What do you think? >>>>> >>>>>>> > >>>>> >>>>>>> > >>>>> >>>>>>> > On Wed, Mar 23, 2016 at 6:55 PM, Stefano Bortoli >>>>> >>>>>>> > <s.bort...@gmail.com> >>>>> >>>>>>> > wrote: >>>>> >>>>>>> >> >>>>> >>>>>>> >> Hi guys, >>>>> >>>>>>> >> >>>>> >>>>>>> >> I am trying to test a job that should run a number of tasks >>>>> to >>>>> >>>>>>> >> read from a >>>>> >>>>>>> >> RDBMS using an improved JDBC connector. The connection and >>>>> the >>>>> >>>>>>> >> reading run >>>>> >>>>>>> >> smoothly, but I cannot seem to be able to move above the >>>>> limit of >>>>> >>>>>>> >> 8 >>>>> >>>>>>> >> concurrent threads running. 8 is of course the number of >>>>> cores of >>>>> >>>>>>> >> my >>>>> >>>>>>> >> machine. >>>>> >>>>>>> >> >>>>> >>>>>>> >> I have tried working around configurations and settings, >>>>> but the >>>>> >>>>>>> >> Executor >>>>> >>>>>>> >> within the ExecutionContext keeps on having a parallelism >>>>> of 8. >>>>> >>>>>>> >> Although, of >>>>> >>>>>>> >> course, the parallelism of the execution environment is much >>>>> >>>>>>> >> higher (in fact >>>>> >>>>>>> >> I have many more tasks to be allocated). >>>>> >>>>>>> >> >>>>> >>>>>>> >> I feel it may be an issue of the LocalMiniCluster >>>>> configuration >>>>> >>>>>>> >> that may >>>>> >>>>>>> >> just override/neglect my wish for higher degree of >>>>> parallelism. Is >>>>> >>>>>>> >> there a >>>>> >>>>>>> >> way for me to work around this issue? >>>>> >>>>>>> >> >>>>> >>>>>>> >> please let me know. Thanks a lot for you help! :-) >>>>> >>>>>>> >> >>>>> >>>>>>> >> saluti, >>>>> >>>>>>> >> Stefano >>>>> >>>>>>> > >>>>> >>>>>>> > >>>>> >>>>>>> > >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>> >>>>> >>>> >>>>> >>> >>>>> >> >>>>> > >>>>> >>>> >>>> >>> >> >