ezhuravlev wrote
> Also, maybe it's better to compare your current solution with Ignite on
> some real tasks? Or at least more approximate to the real use case
>
> Evgenii
Hi @ezhuravlev,

Thank you for your reply! I'm preparing a fairer comparison with our custom-made solution, but it can't be done in a simple way because the two technical solutions differ significantly in the end (I'll try to explain below). It requires migrating some basic functionality to get a relatively objective comparison (I hope to do this soon).

ezhuravlev wrote
> I don't really understand, what you've tried to measure here?
> .....
> Maybe you could describe your case in detail, so we could suggest you a
> better solution?

I'll try to explain the goal of this simple measurement, but it requires some introduction to our current situation: why and how we got here. It won't be short, but I'll try to be concise.

[Goal of simple measurement]

Try to evaluate the overhead of job management in a distributed setting; literally, how much time the Ignite compute grid framework spends on managing jobs in a cluster. The idea was:

1. Having a 16-core machine (hyper-threaded, 8 physical cores), I assume it can execute at least 8x2 jobs in parallel (not sure, just an assumption).
2. Create 1 task with empty jobs and count on the fact that the vast majority of time will be spent on task/job management itself.

As I understood from your feedback, it's better to try it on different physical hosts; I'll check that tomorrow.

[Why we need it]

Here is a short introduction to why I'm doing this evaluation (probably not relevant in the end, but let's see). Skipping everything that is not relevant (yet): a few years ago a decision was made that we needed an in-memory compute engine, a distributed one, so we could speed up... yes, SQL calculation. So we got this:

1. Create an abstraction for a distributed compute grid.
2. Hazelcast was chosen as the basis.
3. Since the computation was, by its nature, a kind of map-reduce (calculation of reduced values on different levels: store --> warehouse --> country), we chose Hazelcast's MapReduce API.
4.
We had only a few Tasks with a few Jobs.
5. Jobs were running for a few minutes each due to slow data loading.

Everything was OK until we got a new type of task. Jobs in these tasks loaded little from the DB, were more CPU-intensive, and their count grew up to 1000. We discovered that the management overhead for a single job in the cluster was > 2-3 sec, which was unacceptable (Hazelcast admitted that their MapReduce framework was buggy and not performant at that point in time).

As a rescue, we decided to write a custom solution:

1. Introduce a Task abstraction.
2. Introduce a Job abstraction as the unit of sub-task parallelism.
3. Keep Task management in a distributed map (transactional, state [fail, done, executing]). The distributed map is backed by persistent storage in a relational DB.
4. Keep Job management in a distributed map (transactional, state [fail, done, executing], collocated execution, job stealing and more). Jobs are kept in memory only.
5. A Job has no return type (like Runnable).
6. More may come here... left out for the sake of simplification.

After some time I realized that our custom solution was becoming really heavy to maintain and support, and that we seem to be reinventing the wheel. I found that Apache Ignite does quite similar things and much more (persistence is exactly what we wanted to implement 1.5 years ago). So my goal, before switching the project over (and it is quite big: 24 countries covered), is to see whether I would face the same job management overhead with Ignite's compute APIs. And if it works out, we would be glad to use other very handy features like off-heap memory, Ignite Persistence, the SQL engine and Data Streaming.

I hope this helps to explain what I'm trying to achieve here.

Thx

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
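To make the "empty jobs" measurement idea concrete: here is a minimal single-JVM sketch (plain `java.util.concurrent`, not Ignite's API) that fans out N empty jobs from one "task" and times them. The class and method names are illustrative, not from our code base; since the job bodies do nothing, nearly all measured time is scheduling/management overhead, which is the baseline the distributed numbers can be compared against.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class EmptyJobOverhead {
    /** Runs jobCount empty jobs on a fixed-size pool and returns elapsed wall time in ms. */
    static long measure(int jobCount, int parallelism) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        List<Callable<Void>> emptyJobs = new ArrayList<>();
        for (int i = 0; i < jobCount; i++) {
            emptyJobs.add(() -> null); // job body intentionally does nothing
        }
        long start = System.nanoTime();
        pool.invokeAll(emptyJobs); // one "task" fanning out all jobs; waits for completion
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        pool.shutdown();
        return elapsedMs;
    }

    public static void main(String[] args) throws InterruptedException {
        int parallelism = Runtime.getRuntime().availableProcessors(); // e.g. 16 on the machine above
        long elapsedMs = measure(1000, parallelism);
        // With empty job bodies, almost all of this is management overhead.
        System.out.println("1000 empty jobs took " + elapsedMs + " ms ("
                + elapsedMs / 1000.0 + " ms per job)");
    }
}
```

Running the same shape of workload through Ignite's compute API (and over several physical hosts, per your suggestion) should then reveal how much the distributed management layer adds on top of this local baseline.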
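For readers unfamiliar with the custom solution's job management described in the list above, the per-job state machine ([executing, done, fail] in a map, with Runnable-style jobs that return nothing) can be sketched like this. This is a local, single-JVM analogue using a ConcurrentHashMap in place of the actual distributed map, and all names here (JobStateTracker, markExecuting, etc.) are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class JobStateTracker {
    /** The job states mentioned above: executing, done, fail. */
    enum JobState { EXECUTING, DONE, FAIL }

    // Local stand-in for the distributed map holding job state.
    private final Map<String, JobState> states = new ConcurrentHashMap<>();

    /** Runs a no-result job (like our Runnable-style Job) and records its outcome. */
    void run(String jobId, Runnable job) {
        states.put(jobId, JobState.EXECUTING);
        try {
            job.run();
            states.put(jobId, JobState.DONE);
        } catch (RuntimeException e) {
            states.put(jobId, JobState.FAIL);
        }
    }

    JobState stateOf(String jobId) {
        return states.get(jobId);
    }
}
```

The real system adds transactions, collocated execution and job stealing on top of this, which is exactly the machinery Ignite's compute grid already provides out of the box.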
