The following is taking most of the time:

    @Nullable private ServiceInfo lookupInRegisteredServices(String name) {
        // Linear scan over every registered service: O(n) per lookup,
        // so starting n services becomes O(n^2) overall.
        for (ServiceInfo desc : registeredServices.values()) {
            if (desc.name().equals(name))
                return desc;
        }
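        return null;
    }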
After changing that to use a Map lookup:

- 50,000 service startup in *8s* (down from around 70s)
- 100,000 service startup in *14s* (right around 2x the 50K timing,
  i.e. linear now)

Here's the change I tested (note it's shortened - it's not 100% complete,
but fine for my test case, I believe):

    private final ConcurrentMap<String, ServiceInfo> registeredServicesByName =
        new ConcurrentHashMap<>();

    @Nullable private ServiceInfo lookupInRegisteredServices(String name) {
        return registeredServicesByName.get(name);
    }

    private void registerService(ServiceInfo desc) {
        desc.context(ctx);

        // CONCURRENCY NOTE: these two maps need to be kept in sync.
        registeredServices.put(desc.serviceId(), desc);
        registeredServicesByName.put(desc.name(), desc);
    }

That's in IgniteServiceProcessor.java. Any thoughts? I'll gladly clean this
up and make a PR - I would appreciate feedback to help address possible
questions with this change (e.g. is desc.name() unique?).

One thing I'd want to pin down for that PR, per the CONCURRENCY NOTE above:
every write to registeredServices needs a matching write to the name map,
including removal. A rough sketch of what I have in mind - the
unregisterService method and the duplicate-name check are my own
naming/ideas, not existing IgniteServiceProcessor code:

    private void unregisterService(ServiceInfo desc) {
        registeredServices.remove(desc.serviceId());

        // Two-arg remove: only drop the entry if it still maps to this
        // exact descriptor, so a concurrent re-registration under the
        // same name isn't clobbered.
        registeredServicesByName.remove(desc.name(), desc);
    }

    // A stricter registerService variant that surfaces name collisions
    // loudly instead of silently overwriting - which would answer the
    // uniqueness question above at runtime.
    private void registerService(ServiceInfo desc) {
        desc.context(ctx);

        ServiceInfo prev = registeredServicesByName.putIfAbsent(desc.name(), desc);

        if (prev != null && !prev.serviceId().equals(desc.serviceId()))
            throw new IllegalStateException("Duplicate service name: " + desc.name());

        registeredServices.put(desc.serviceId(), desc);
    }

Art

On Tue, Jun 28, 2022 at 12:27 PM Arthur Naseef <artnas...@apache.org> wrote:

> Yes. The "services" in our case will be schedules that periodically
> perform fast operations.
>
> For example, a service could be: "ping this device every <x> seconds".
>
> Art
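To make that quoted use case concrete, such a service could look roughly
like the sketch below. This is illustrative only - the class name, config
fields, and the ping stub are assumptions, not code from the POC repo:

    import org.apache.ignite.services.Service;
    import org.apache.ignite.services.ServiceContext;

    // One "schedule": ping a single device every pingIntervalMs until the
    // service is cancelled or redeployed to another node.
    public class DevicePingService implements Service {
        private final String deviceAddress;
        private final long pingIntervalMs;

        public DevicePingService(String deviceAddress, long pingIntervalMs) {
            this.deviceAddress = deviceAddress;
            this.pingIntervalMs = pingIntervalMs;
        }

        @Override public void init(ServiceContext ctx) {
            // No setup needed for this sketch.
        }

        @Override public void execute(ServiceContext ctx) throws Exception {
            // Runs on whichever node Ignite deployed the service to; if
            // that node leaves, Ignite redeploys the service elsewhere,
            // which covers the rebalancing requirement.
            while (!ctx.isCancelled()) {
                pingDevice(deviceAddress);
                Thread.sleep(pingIntervalMs);
            }
        }

        @Override public void cancel(ServiceContext ctx) {
            // Ignite interrupts the execute() thread on cancel; the
            // isCancelled() check above ends the loop.
        }

        private void pingDevice(String addr) {
            // Fast, local operation (ICMP/SNMP/etc.) - out of scope here.
        }
    }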
> On Tue, Jun 28, 2022 at 12:20 PM Pavel Tupitsyn <ptupit...@apache.org>
> wrote:
>
>> > we do not plan to make cross-cluster calls into the services
>>
>> If you are making local calls, I think there is no point in using Ignite
>> services.
>> Can you describe the use case - what are you trying to achieve?
>>
>> On Tue, Jun 28, 2022 at 8:55 PM Arthur Naseef <artnas...@apache.org>
>> wrote:
>>
>>> Hello - I'm getting started with Ignite and looking seriously at using
>>> it for a specific use-case.
>>>
>>> While working on a proof-of-concept (POC), I have run into a
>>> performance question, and I am wondering whether the solution - using
>>> Ignite Services - is a good fit for the use-case.
>>>
>>> In my testing, I am getting the following timings:
>>>
>>> - Startup of 20,000 Ignite services takes 30 seconds
>>> - Startup of 50,000 Ignite services takes 250 seconds
>>> - The 2.5x increase from 20,000 to 50,000 yielded an over-8x increase
>>> in startup time (the growth appears to be superlinear, roughly
>>> quadratic)
>>>
>>> Watching the JVM during this time, I see the following:
>>>
>>> - Heap usage is not significant (no signs of GC pressure)
>>> - CPU usage is only slightly increased - on the order of 20% total
>>> (the system has 12 cores / 24 threads)
>>> - Network utilization is reasonable
>>> - The futex system call (measured with "strace -r") appears to be
>>> taking the most time by far
>>>
>>> The use-case involves the following:
>>>
>>> - Startup of up to hundreds of thousands of services at cluster spin-up
>>> - Frequent, small adjustments to the running services over time
>>> - A need to rebalance when a new node joins the cluster, or an old one
>>> leaves it
>>> - Once the services are deployed, we do not plan to make cross-cluster
>>> calls into them (i.e. we do *not* plan to use Ignite's
>>> services().serviceProxy() on these)
>>> - Jobs don't look like a fit because they (1) are "long-running"
>>> (actually periodically scheduled tasks) and (2) need to redistribute
>>> even after they start running
>>>
>>> This is starting to get long, and I have more details to share. Here is
>>> the repo with the code being used to test, and a link to a wiki page
>>> with some of the details:
>>>
>>> https://github.com/opennms-forge/distributed-scheduling-poc/
>>>
>>> https://github.com/opennms-forge/distributed-scheduling-poc/wiki/Ignite-Startup-Performance
>>>
>>> Questions I have in mind:
>>>
>>> - Are services a good fit here? We expect to reach upwards of 500,000
>>> services in a cluster with multiple nodes.
>>> - Any thoughts on tracking down the bottleneck and alleviating it?
>>> (I have started taking timing measurements in the Ignite code.)
>>>
>>> Stopping here - please ask questions and I'll gladly fill in details.
>>> Any tips are welcome, including ideas for tracking down just where the
>>> bottleneck exists.
>>>
>>> Art
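P.S. For anyone who wants to reproduce timings like the ones quoted above,
the startup test is shaped roughly like the sketch below - simplified, not
the exact code from the POC repo (see the GitHub links for the real thing).
It reuses the hypothetical DevicePingService from earlier in the thread:

    import java.util.ArrayList;
    import java.util.Collection;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.services.ServiceConfiguration;

    public class StartupTimingTest {
        public static void main(String[] args) {
            int serviceCount = 50_000;

            try (Ignite ignite = Ignition.start()) {
                Collection<ServiceConfiguration> cfgs = new ArrayList<>(serviceCount);

                for (int i = 0; i < serviceCount; i++) {
                    ServiceConfiguration cfg = new ServiceConfiguration();
                    cfg.setName("ping-service-" + i); // unique name per service
                    cfg.setService(new DevicePingService("10.0.0." + i, 30_000));
                    cfg.setTotalCount(1); // one instance, anywhere in the cluster
                    cfgs.add(cfg);
                }

                long start = System.nanoTime();

                // Batch deployment; one deployAll() call instead of 50,000
                // individual deploy() calls.
                ignite.services().deployAll(cfgs);

                System.out.printf("Deployed %d services in %d ms%n",
                    serviceCount, (System.nanoTime() - start) / 1_000_000);
            }
        }
    }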