Thanks for taking care of this!

On Sun, Oct 9, 2016 at 4:05 PM DuyHai Doan <doanduy...@gmail.com> wrote:

> I have created a JIRA epic to track all the tasks:
> https://issues.apache.org/jira/browse/ZEPPELIN-1525
>
> I think I would start with the synchronized blocks and then move on to
> Eric's PR for Guice DI.
>
> After we have a DI mechanism, it will be much easier to inject thread pools
> for thread management and also to create JMX monitoring.
>
> Any objections before I start coding?
>
> On Sat, Oct 8, 2016 at 2:05 PM, Eric Charles <e...@apache.org> wrote:
>
> > On 04/10/16 12:54, Anthony Corbacho wrote:
> >
> >> You made my day, this is the kind of email I really like!
> >>
> >> I think it's a great idea and I am willing to spend some time on it.
> >>
> >> I also want to move to a DI (Guice) architecture; let me know what you
> >> think about it.
> >
> > A PR is open for Guice DI. If someone volunteers to review it, I can
> > rebase.
> >
> > https://github.com/apache/zeppelin/pull/1361
> >
> >> On Tuesday, 4 October 2016, DuyHai Doan <doanduy...@gmail.com> wrote:
> >>
> >>> Hello devs
> >>>
> >>> The code base of Zeppelin has grown very fast in the last 12 months,
> >>> and that's great. It means that we have more and more contributors.
> >>>
> >>> However, to keep the project maintainable in the long term, we need
> >>> regular code refactoring.
> >>>
> >>> I have some ideas to share with you.
> >>>
> >>> 1) Use Java 8 to benefit from lambdas & streams
> >>>
> >>> Now that Java 8 is well established, it is a good time to upgrade the
> >>> project. I believe some interpreters also need Java 8. The Cassandra
> >>> interpreter currently has no unit tests for its latest features
> >>> because the embedded Cassandra server used for testing requires
> >>> Java 8.
> >>>
> >>> It would also be a good opportunity to go through the code base and
> >>> replace boilerplate for() loops that filter manually with the stream
> >>> shortcut: list.stream().filter(..).map(). It would greatly improve
> >>> code readability.
> >>>
> >>> 2) Multi-threading
> >>>
> >>> I've seen synchronized blocks used in a few places in the code base.
> >>> Although perfectly valid, they have a cost at runtime, and since more
> >>> and more people are asking for multi-tenancy or using a single
> >>> Zeppelin instance to serve multiple users, I guess the synchronized
> >>> blocks have a huge cost.
> >>>
> >>> There are some solid alternatives:
> >>>
> >>> - ConcurrentHashMap if we synchronized on a map
> >>> - CopyOnWriteArrayList if we synchronized on a list
> >>>
> >>> Of course, each synchronized block should be handled carefully so as
> >>> not to introduce regressions.
> >>>
> >>> 3) Thread management
> >>>
> >>> I've seen some usage of new Thread() {...}.run(); it may be a good
> >>> time to introduce thread pools and pass them along (inside context
> >>> objects, for example) to have more centralized thread management.
> >>>
> >>> The advantage of thread pools is that we can manage them in a single
> >>> place, monitor them and expose the info through JMX, and also control
> >>> system resources by defining the max thread number and the thread
> >>> pool queue.
> >>>
> >>> 4) Server monitoring
> >>>
> >>> I hear many users in the field complain that they have to restart the
> >>> Zeppelin server regularly because it "hangs" after running for a long
> >>> time.
> >>>
> >>> If we can expose some system metrics through JMX, it would help
> >>> people monitor the state of the Zeppelin server and take appropriate
> >>> action.
> >>>
> >>> For now we could focus on monitoring the server itself, not the
> >>> interpreter JVM processes. That can be done in a second step.
> >>>
> >>> What do you think about the ideas?
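
For illustration, here is a minimal Java 8 sketch of what ideas 1) to 3) in the original mail could look like. Everything in it is an assumption made for the example: the class RefactoringSketch, its method names, and the "run count per note" map are hypothetical and are not taken from the Zeppelin code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

// Hypothetical class for illustration only; not part of Zeppelin.
final class RefactoringSketch {

  // Idea 2: a ConcurrentHashMap instead of synchronized (map) { ... }.
  private static final Map<String, Integer> RUN_COUNTS = new ConcurrentHashMap<>();

  // Idea 1: a stream pipeline instead of a for() loop with manual filtering.
  static List<String> upperCaseNonEmpty(List<String> names) {
    return names.stream()
        .filter(name -> !name.isEmpty())
        .map(String::toUpperCase)
        .collect(Collectors.toList());
  }

  // merge() is atomic on ConcurrentHashMap, so no explicit lock is needed.
  static int recordRun(String noteId) {
    return RUN_COUNTS.merge(noteId, 1, Integer::sum);
  }

  static int runCount(String noteId) {
    return RUN_COUNTS.getOrDefault(noteId, 0);
  }

  // Idea 3: submit work to a pool with a fixed max thread number
  // instead of creating a new Thread per task.
  static void runAll(List<Runnable> tasks, int maxThreads) {
    ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (Runnable task : tasks) {
        futures.add(pool.submit(task));
      }
      for (Future<?> future : futures) {
        future.get(); // wait for completion and surface task failures
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }
}
```

In a real refactoring the pool would presumably be created once and injected rather than built per call as it is here. Note also that Executors.newFixedThreadPool uses an unbounded queue, so limiting the queue size as well would require constructing a ThreadPoolExecutor with a bounded queue.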