When server-side monitoring falls short
August 08, 2019 by Evgeny Lukianchikov & Denis Galeev
While load-testing Rocket.Chat, we stumbled upon an interesting case in which server-side monitoring missed important performance information that our client-side analytics captured.
We picked Rocket.Chat, one of the best-known open-source Meteor applications, to showcase how our load-testing tool (Astraload SAAS) works with the underlying framework. To make sure our results were accurate, we decided to double-check the performance data with the available server-side performance monitoring tools: Meteor APM (formerly Kadira) and meteor-elastic-apm.
Under a load of 60 virtual users, both tools signaled that the server was fine and that we could keep raising the load. In the screenshots below, we can see that Meteor method response times are in an acceptable range (up to 150 ms).
Imagine our surprise when we saw that the Astraload client-side statistics diverged drastically from their server-side counterparts!
The Astraload SAAS load-testing tool runs multiple (potentially tens of thousands of) virtual browsers and gathers unique Meteor and GraphQL performance statistics, which are aggregated and presented in readable form on the test results page.
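To make the aggregation step concrete, here is a minimal Python sketch of how per-client response times could be reduced to a few summary numbers. It is not Astraload's actual implementation: the summarize function, the sample values, and the choice of median, 95th percentile, and maximum are all assumptions made for illustration.

```python
import statistics

def summarize(latencies_ms):
    """Reduce raw response times (in ms) to a few summary statistics.

    Hypothetical helper: the set of statistics is an assumption,
    not what Astraload actually reports.
    """
    ordered = sorted(latencies_ms)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "count": len(ordered),
        "median": statistics.median(ordered),
        "p95": cuts[94],
        "max": ordered[-1],
    }

# Made-up samples, as if collected from five virtual browsers.
samples_ms = [100, 200, 300, 400, 500]
summary = summarize(samples_ms)
print(summary)
```

The median and a high percentile are shown together because a healthy median can hide a slow tail that only some clients experience.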
Most of the virtual clients were receiving Meteor method responses only after waiting for more than 10 seconds. That is a striking difference of about two orders of magnitude from the server-side monitoring results!
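The size of that gap can be checked with a few lines of Python. The sketch below uses the two figures quoted in this post (150 ms on the server side, 10 seconds on the client side) as round inputs; the divergence helper and its threshold of 10 are hypothetical and only illustrate how such a mismatch might be flagged automatically.

```python
import math

SERVER_SIDE_MS = 150      # upper bound reported by server-side monitoring
CLIENT_SIDE_MS = 10_000   # lower bound observed by most virtual clients

def divergence(client_ms, server_ms):
    """Return (ratio, orders of magnitude) between client and server timings."""
    ratio = client_ms / server_ms
    return ratio, math.log10(ratio)

def is_suspicious(client_ms, server_ms, threshold=10):
    """Hypothetical rule: flag a test run when clients wait much longer
    than the server reports. The threshold is an arbitrary assumption."""
    return client_ms / server_ms >= threshold

ratio, magnitude = divergence(CLIENT_SIDE_MS, SERVER_SIDE_MS)
print(f"clients wait about {ratio:.0f}x longer (~{magnitude:.1f} orders of magnitude)")
print("suspicious:", is_suspicious(CLIENT_SIDE_MS, SERVER_SIDE_MS))
```

Since 10 seconds is a lower bound and 150 ms an upper bound, the real gap is at least this large.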
After double-checking all the performance data we had, we dug deeper by profiling the web app under load [*] and found what caused the performance issue and why the server-side monitoring tools could not detect it.
* Profiling with load generated by just one user didn’t give us any new information.
Surprisingly, the slow logic was located in the meteor-elastic-apm monitoring tool! The way it wraps Meteor internals to capture statistics is suboptimal and slows down the whole application. Interestingly, Meteor APM (Kadira) hooks into Meteor a couple of steps deeper in the execution stack, so it doesn’t see the performance issues introduced by the other monitoring tool!
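The blind spot is easier to see in a small model. The Python sketch below is not Meteor, Kadira, or meteor-elastic-apm code. It only assumes two wrappers around one handler: an inner one that times whatever runs inside it, and an outer one that adds overhead before calling inward. All names and the 100 ms and 400 ms figures are hypothetical, and a fake clock keeps the result deterministic.

```python
class FakeClock:
    """Deterministic clock in integer milliseconds, so no real sleeps are needed."""
    def __init__(self):
        self.now_ms = 0

    def advance(self, ms):
        self.now_ms += ms

clock = FakeClock()

def method_handler():
    """Stand-in for the real method body: 100 ms of work (made-up figure)."""
    clock.advance(100)
    return "ok"

def inner_monitor(fn, records):
    """Monitor attached deep in the stack: it times only what runs inside it."""
    def wrapped():
        start = clock.now_ms
        result = fn()
        records.append(clock.now_ms - start)
        return result
    return wrapped

def slow_outer_wrapper(fn, overhead_ms):
    """Instrumentation attached higher in the stack: its own cost is spent
    before the inner monitor starts its timer."""
    def wrapped():
        clock.advance(overhead_ms)
        return fn()
    return wrapped

inner_records = []
method = slow_outer_wrapper(inner_monitor(method_handler, inner_records), overhead_ms=400)

caller_start = clock.now_ms
method()
caller_elapsed = clock.now_ms - caller_start

print("inner monitor saw:", inner_records[0], "ms")
print("caller waited:", caller_elapsed, "ms")
```

In this model the inner monitor reports only the handler's own time, while the caller also pays for everything the outer wrapper does. Only a measurement taken outside both wrappers, such as one on the client, sees the full cost.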
Removing meteor-elastic-apm from the Rocket.Chat project improved its performance by up to 5 times for some methods, according to Astraload client-side stats:
Removing Kadira as well improved performance further, although the difference was less significant (up to 2 times for the median):
This short investigation confirmed that everything has its price: server-side monitoring slows down your web application. Some of these tools can not only have a huge negative performance impact but also provide misleading statistics and embed themselves too deeply to be easily discovered. In such cases, client-side performance data may be invaluable.
The Astraload perf team will help you build confidence in your website’s ability to handle any number of users.