No problem ;-) On Wed, Apr 13, 2016 at 11:38 AM, Stefano Bortoli <s.bort...@gmail.com> wrote:
> Sounds you are damn right! thanks for the insight, dumb on us for not > checking this before. > > saluti, > Stefano > > 2016-04-13 11:05 GMT+02:00 Stephan Ewen <se...@apache.org>: > >> Sounds actually not like a Flink issue. I would look into the commons >> pool docs. >> Maybe they size their pools by default with the number of cores, so the >> pool has only 8 threads, and other requests are queues? >> >> On Wed, Apr 13, 2016 at 10:29 AM, Flavio Pompermaier < >> pomperma...@okkam.it> wrote: >> >>> Any feedback about our JDBC InputFormat issue..? >>> >>> On Thu, Apr 7, 2016 at 12:37 PM, Flavio Pompermaier < >>> pomperma...@okkam.it> wrote: >>> >>>> We've finally created a running example (For Flink 0.10.2) of our >>>> improved JDBC imputformat that you can run from an IDE (it creates an >>>> in-memory derby database with 1000 rows and batch of 10) at >>>> https://gist.github.com/fpompermaier/bcd704abc93b25b6744ac76ac17ed351. >>>> The first time you run the program you have to comment the following >>>> line: >>>> >>>> stmt.executeUpdate("Drop Table users "); >>>> >>>> In your pom declare the following dependencies: >>>> >>>> <dependency> >>>> <groupId>org.apache.derby</groupId> >>>> <artifactId>derby</artifactId> >>>> <version>10.10.1.1</version> >>>> </dependency> >>>> <dependency> >>>> <groupId>org.apache.commons</groupId> >>>> <artifactId>commons-pool2</artifactId> >>>> <version>2.4.2</version> >>>> </dependency> >>>> >>>> In my laptop I have 8 cores and if I put parallelism to 16 I expect to >>>> see 16 calls to the connection pool (i.e. '==================== CREATING >>>> NEW CONNECTION!') while I see only 8 (up to my maximum number of cores). >>>> The number of created task instead is correct (16). >>>> >>>> I hope this could help in understanding where the problem is! >>>> >>>> Best and thank in advance, >>>> Flavio >>>> >>>> On Wed, Mar 30, 2016 at 11:01 AM, Stefano Bortoli <s.bort...@gmail.com> >>>> wrote: >>>> >>>>> Hi Ufuk, >>>>> >>>>> here is our preliminary input formar implementation: >>>>> https://gist.github.com/anonymous/dbf05cad2a6cc07b8aa88e74a2c23119 >>>>> >>>>> if you need a running project, I will have to create a test one cause >>>>> I cannot share the current configuration. >>>>> >>>>> thanks a lot in advance! >>>>> >>>>> >>>>> >>>>> 2016-03-30 10:13 GMT+02:00 Ufuk Celebi <u...@apache.org>: >>>>> >>>>>> Do you have the code somewhere online? Maybe someone can have a quick >>>>>> look over it later. I'm pretty sure that is indeed a problem with the >>>>>> custom input format. >>>>>> >>>>>> – Ufuk >>>>>> >>>>>> On Tue, Mar 29, 2016 at 3:50 PM, Stefano Bortoli <s.bort...@gmail.com> >>>>>> wrote: >>>>>> > Perhaps there is a misunderstanding on my side over the parallelism >>>>>> and >>>>>> > split management given a data source. >>>>>> > >>>>>> > We started from the current JDBCInputFormat to make it >>>>>> multi-thread. Then, >>>>>> > given a space of keys, we create the splits based on a fetchsize >>>>>> set as a >>>>>> > parameter. In the open, we get a connection from the pool, and >>>>>> execute a >>>>>> > query using the split interval. This sets the 'resultSet', and then >>>>>> the >>>>>> > DatasourceTask iterates between reachedEnd, next and close. On >>>>>> close, the >>>>>> > connection is returned to the pool. We set parallelism to 32, and >>>>>> we would >>>>>> > expect 32 connection opened but the connections opened are just 8. >>>>>> > >>>>>> > We tried to make an example with the textinputformat, but being a >>>>>> > delimitedinpurformat, the open is called sequentially when >>>>>> statistics are >>>>>> > built, and then the processing is executed in parallel just after >>>>>> all the >>>>>> > open are executed. This is not feasible in our case, because there >>>>>> would be >>>>>> > millions of queries before the statistics are collected. >>>>>> > >>>>>> > Perhaps we are doing something wrong, still to figure out what. :-/ >>>>>> > >>>>>> > thanks a lot for your help. >>>>>> > >>>>>> > saluti, >>>>>> > Stefano >>>>>> > >>>>>> > >>>>>> > 2016-03-29 13:30 GMT+02:00 Stefano Bortoli <s.bort...@gmail.com>: >>>>>> >> >>>>>> >> That is exactly my point. I should have 32 threads running, but I >>>>>> have >>>>>> >> only 8. 32 Task are created, but only only 8 are run concurrently. >>>>>> Flavio >>>>>> >> and I will try to make a simple program to produce the problem. If >>>>>> we solve >>>>>> >> our issues on the way, we'll let you know. >>>>>> >> >>>>>> >> thanks a lot anyway. >>>>>> >> >>>>>> >> saluti, >>>>>> >> Stefano >>>>>> >> >>>>>> >> 2016-03-29 12:44 GMT+02:00 Till Rohrmann <trohrm...@apache.org>: >>>>>> >>> >>>>>> >>> Then it shouldn’t be a problem. The ExeuctionContetxt is used to >>>>>> run >>>>>> >>> futures and their callbacks. But as Ufuk said, each task will >>>>>> spawn it’s own >>>>>> >>> thread and if you set the parallelism to 32 then you should have >>>>>> 32 threads >>>>>> >>> running. >>>>>> >>> >>>>>> >>> >>>>>> >>> On Tue, Mar 29, 2016 at 12:29 PM, Stefano Bortoli < >>>>>> s.bort...@gmail.com> >>>>>> >>> wrote: >>>>>> >>>> >>>>>> >>>> In fact, I don't use it. I just had to crawl back the runtime >>>>>> >>>> implementation to get to the point where parallelism was >>>>>> switching from 32 >>>>>> >>>> to 8. >>>>>> >>>> >>>>>> >>>> saluti, >>>>>> >>>> Stefano >>>>>> >>>> >>>>>> >>>> 2016-03-29 12:24 GMT+02:00 Till Rohrmann < >>>>>> till.rohrm...@gmail.com>: >>>>>> >>>>> >>>>>> >>>>> Hi, >>>>>> >>>>> >>>>>> >>>>> for what do you use the ExecutionContext? That should actually >>>>>> be >>>>>> >>>>> something which you shouldn’t be concerned with since it is >>>>>> only used >>>>>> >>>>> internally by the runtime. >>>>>> >>>>> >>>>>> >>>>> Cheers, >>>>>> >>>>> Till >>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> On Tue, Mar 29, 2016 at 12:09 PM, Stefano Bortoli < >>>>>> s.bort...@gmail.com> >>>>>> >>>>> wrote: >>>>>> >>>>>> >>>>>> >>>>>> Well, in theory yes. Each task has a thread, but only a number >>>>>> is run >>>>>> >>>>>> in parallel (the job of the scheduler). Parallelism is set in >>>>>> the >>>>>> >>>>>> environment. However, whereas the parallelism parameter is set >>>>>> and read >>>>>> >>>>>> correctly, when it comes to actual starting of the threads, >>>>>> the number is >>>>>> >>>>>> fix to 8. We run a debugger to get to the point where the >>>>>> thread was >>>>>> >>>>>> started. As Flavio mentioned, the ExecutionContext has the >>>>>> parallelims set >>>>>> >>>>>> to 8. We have a pool of connections to a RDBS and il logs the >>>>>> creation of >>>>>> >>>>>> just 8 connections although parallelism is much higher. >>>>>> >>>>>> >>>>>> >>>>>> My question is whether this is a bug (or a feature) of the >>>>>> >>>>>> LocalMiniCluster. :-) I am not scala expert, but I see some >>>>>> variable >>>>>> >>>>>> assignment in setting up of the MiniCluster, involving >>>>>> parallelism and >>>>>> >>>>>> 'default values'. Default values in terms of parallelism are >>>>>> based on the >>>>>> >>>>>> number of cores. >>>>>> >>>>>> >>>>>> >>>>>> thanks a lot for the support! >>>>>> >>>>>> >>>>>> >>>>>> saluti, >>>>>> >>>>>> Stefano >>>>>> >>>>>> >>>>>> >>>>>> 2016-03-29 11:51 GMT+02:00 Ufuk Celebi <u...@apache.org>: >>>>>> >>>>>>> >>>>>> >>>>>>> Hey Stefano, >>>>>> >>>>>>> >>>>>> >>>>>>> this should work by setting the parallelism on the >>>>>> environment, e.g. >>>>>> >>>>>>> >>>>>> >>>>>>> env.setParallelism(32) >>>>>> >>>>>>> >>>>>> >>>>>>> Is this what you are doing? >>>>>> >>>>>>> >>>>>> >>>>>>> The task threads are not part of a pool, but each submitted >>>>>> task >>>>>> >>>>>>> creates its own Thread. >>>>>> >>>>>>> >>>>>> >>>>>>> – Ufuk >>>>>> >>>>>>> >>>>>> >>>>>>> >>>>>> >>>>>>> On Fri, Mar 25, 2016 at 9:10 PM, Flavio Pompermaier >>>>>> >>>>>>> <pomperma...@okkam.it> wrote: >>>>>> >>>>>>> > Any help here? I think that the problem is that the >>>>>> JobManager >>>>>> >>>>>>> > creates the >>>>>> >>>>>>> > executionContext of the scheduler with >>>>>> >>>>>>> > >>>>>> >>>>>>> > val executionContext = >>>>>> ExecutionContext.fromExecutor(new >>>>>> >>>>>>> > ForkJoinPool()) >>>>>> >>>>>>> > >>>>>> >>>>>>> > and thus the number of concurrently running threads is >>>>>> limited to >>>>>> >>>>>>> > the number >>>>>> >>>>>>> > of cores (using the default constructor of the >>>>>> ForkJoinPool). >>>>>> >>>>>>> > What do you think? >>>>>> >>>>>>> > >>>>>> >>>>>>> > >>>>>> >>>>>>> > On Wed, Mar 23, 2016 at 6:55 PM, Stefano Bortoli >>>>>> >>>>>>> > <s.bort...@gmail.com> >>>>>> >>>>>>> > wrote: >>>>>> >>>>>>> >> >>>>>> >>>>>>> >> Hi guys, >>>>>> >>>>>>> >> >>>>>> >>>>>>> >> I am trying to test a job that should run a number of >>>>>> tasks to >>>>>> >>>>>>> >> read from a >>>>>> >>>>>>> >> RDBMS using an improved JDBC connector. The connection and >>>>>> the >>>>>> >>>>>>> >> reading run >>>>>> >>>>>>> >> smoothly, but I cannot seem to be able to move above the >>>>>> limit of >>>>>> >>>>>>> >> 8 >>>>>> >>>>>>> >> concurrent threads running. 8 is of course the number of >>>>>> cores of >>>>>> >>>>>>> >> my >>>>>> >>>>>>> >> machine. >>>>>> >>>>>>> >> >>>>>> >>>>>>> >> I have tried working around configurations and settings, >>>>>> but the >>>>>> >>>>>>> >> Executor >>>>>> >>>>>>> >> within the ExecutionContext keeps on having a parallelism >>>>>> of 8. >>>>>> >>>>>>> >> Although, of >>>>>> >>>>>>> >> course, the parallelism of the execution environment is >>>>>> much >>>>>> >>>>>>> >> higher (in fact >>>>>> >>>>>>> >> I have many more tasks to be allocated). >>>>>> >>>>>>> >> >>>>>> >>>>>>> >> I feel it may be an issue of the LocalMiniCluster >>>>>> configuration >>>>>> >>>>>>> >> that may >>>>>> >>>>>>> >> just override/neglect my wish for higher degree of >>>>>> parallelism. Is >>>>>> >>>>>>> >> there a >>>>>> >>>>>>> >> way for me to work around this issue? >>>>>> >>>>>>> >> >>>>>> >>>>>>> >> please let me know. Thanks a lot for you help! :-) >>>>>> >>>>>>> >> >>>>>> >>>>>>> >> saluti, >>>>>> >>>>>>> >> Stefano >>>>>> >>>>>>> > >>>>>> >>>>>>> > >>>>>> >>>>>>> > >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>>> >>>> >>>>>> >>> >>>>>> >> >>>>>> > >>>>>> >>>>> >>>>> >>>> >>> >> >