In this article we will look at basic performance analysis approaches: top-down and bottom-up. I will not compare them, since both are valid ways to analyze a system; I will only explain the basic steps and when to choose each one. The article mostly applies to Java and .NET applications, but you may use similar approaches on other platforms too.
Top-Down Analysis :
This is the most popular method. The idea is simple: monitor application performance from the top down. That means client monitoring -> server monitoring at the OS and resource level -> app server -> runtime environment.
When we see unusual behavior (unexpected or off target), we profile or instrument the application. At this point we have found the problems, so we experiment with possible solutions and choose the best one.
Then we tune the system: tune the code with the best solution, tune the environment, and tune the runtime.
Finally, we test for impact. Typically, before starting an analysis, we should have some measurement data (or benchmarks) from performance test results. We rerun those tests and compare with the previous results. If the improvement is significant, we are done; if not, we go back to profiling and tuning the application.
Here is a flow chart to show at a glance : Google drive link. Open with draw.io.
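The retest-and-compare step can be sketched in Java as below. This is only a minimal illustration, not a prescribed method: the class name `BaselineComparison`, the choice of the 90th percentile, and the `minGain` threshold are all assumptions made for the example. Pick whatever statistic and threshold match your own performance goal.

```java
import java.util.Arrays;

/** Hypothetical helper: compares retest response times against a stored baseline. */
final class BaselineComparison {

    /** Nearest-rank percentile (p in 0..100) of the samples, in the same unit as the input. */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Relative improvement of the retest over the baseline; positive means faster. */
    static double improvement(double baseline, double retest) {
        return (baseline - retest) / baseline;
    }

    /**
     * True when the 90th percentile of the retest improved by at least minGain
     * (0.10 means 10%). The percentile and threshold are assumptions for this sketch.
     */
    static boolean isSignificant(double[] baselineMs, double[] retestMs, double minGain) {
        double before = percentile(baselineMs, 90);
        double after = percentile(retestMs, 90);
        return improvement(before, after) >= minGain;
    }
}
```

If `isSignificant` returns false, the flow goes back to profiling and tuning, as described above.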
When do we use it?
-> The application is causing issues and we need to optimize the whole application or a part of it.
-> We need to optimize the application for given resources (CPU/disk/memory/IO/network).
-> We need to tune the system and the application for best performance.
-> We have access to change the code and need to tune the application for a specific goal (throughput, response time, longer service, etc.).
-> We need root cause analysis for unexpected performance issues (OOM, slowness, crashes at different levels or sub-systems, unwanted application behavior primarily suspected to be performance related, etc.).
Bottom-Up Analysis :
This is another popular approach, used when we need to tune resources or the platform (or hardware) for a specific application. Let's say you have a deployed Java application. Bottom-up analysis lets you analyze the deployed system, hardware and resources and find scope for optimization. It is a very common approach for application capacity planning and for benchmarking changed environments (migration). The key idea is to monitor the application in a specific environment and then tune that environment (software + hardware resources) so that the target application runs at top performance.
Here is a flow chart to show at a glance : Google drive link. Open with draw.io.
When do we use it?
-> You need to optimize resources and the environment for a specific application (the deployed environment).
-> You need a benchmark, want to know the resource usage, and want to find possible areas for tuning.
-> You need to optimize the runtime (JVM/CLR) for your application. You can see the resource usage and tune as your app needs.
-> You need capacity planning for the hardware that must run the application in an optimal way.
-> You have already optimized your application's code and there is no visible area left to tune; this technique can help you achieve some more.
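For the JVM side of the points above, resource usage can be read from inside the application with the standard `java.lang.management` MXBeans. The sketch below is only an illustration: the class `RuntimeSnapshot` and its field names are hypothetical, and it covers the JVM only (the CLR exposes its own counters).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical value object: one reading of basic JVM and host resource usage. */
final class RuntimeSnapshot {
    final long heapUsedBytes;
    final long heapCommittedBytes;
    final long heapMaxBytes;        // -1 when the JVM reports no maximum
    final int liveThreads;
    final int availableProcessors;
    final double systemLoadAverage; // negative when the platform cannot report it

    private RuntimeSnapshot(long used, long committed, long max,
                            int threads, int processors, double load) {
        this.heapUsedBytes = used;
        this.heapCommittedBytes = committed;
        this.heapMaxBytes = max;
        this.liveThreads = threads;
        this.availableProcessors = processors;
        this.systemLoadAverage = load;
    }

    /** Reads the current values from the platform MXBeans. */
    static RuntimeSnapshot capture() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        return new RuntimeSnapshot(
                heap.getUsed(), heap.getCommitted(), heap.getMax(),
                ManagementFactory.getThreadMXBean().getThreadCount(),
                os.getAvailableProcessors(), os.getSystemLoadAverage());
    }

    /** Used heap as a percentage of committed heap; 0 when nothing is committed. */
    double heapUsedPercent() {
        return heapCommittedBytes <= 0 ? 0.0 : 100.0 * heapUsedBytes / heapCommittedBytes;
    }
}
```

Capturing a snapshot at regular intervals during a benchmark gives a usage trend that can then guide heap sizing or hardware planning.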
Please comment if you have any questions.
Thanks.. :)
Thanks, very informative. Could you please post the typical counters that we specifically need to watch for Java / .NET? I know you have mentioned CPU and memory; it would be nice if you could drill down and highlight the important ones. Thanks.
For Java counters, I will post gradually; for .NET, see this page: http://shantonusarker.blogspot.com/p/dotnet.html
Usually, counters depend on the target, that is, the goal of your analysis. For example, if you are looking at response time improvement, then you need counters for CPU time, wait time, GC time, change rate of the different heap memory areas, CPU queue length, thread wait time, thread wait ratio or %, etc.
Again, there are generic additional counters like network latency.
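A few of the response-time counters named above (thread wait ratio, GC time, CPU time versus wall time) can be read on the JVM through the standard management beans. The sketch below is a rough illustration only: the class `ResponseTimeCounters` and its method names are hypothetical, and the definitions used (for example, counting BLOCKED, WAITING and TIMED_WAITING threads as "waiting") are assumptions for this example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: samples a few JVM counters related to response time. */
final class ResponseTimeCounters {

    /** Fraction of live threads that are BLOCKED, WAITING or TIMED_WAITING right now. */
    static double threadWaitRatio() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        ThreadInfo[] infos = threads.getThreadInfo(threads.getAllThreadIds());
        int live = 0;
        int waiting = 0;
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            live++;
            switch (info.getThreadState()) {
                case BLOCKED:
                case WAITING:
                case TIMED_WAITING:
                    waiting++;
                    break;
                default:
                    break;
            }
        }
        return live == 0 ? 0.0 : (double) waiting / live;
    }

    /** Accumulated GC time as a percentage of JVM uptime so far. */
    static double gcTimePercent() {
        long gcMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcMs += Math.max(gc.getCollectionTime(), 0); // -1 means "undefined"
        }
        long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMs <= 0 ? 0.0 : 100.0 * gcMs / uptimeMs;
    }

    /** Runs the task and returns {cpuNanos, wallNanos}; cpuNanos is -1 when it cannot be measured. */
    static long[] cpuAndWallNanos(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean supported = threads.isCurrentThreadCpuTimeSupported()
                && threads.isThreadCpuTimeEnabled();
        long cpuStart = supported ? threads.getCurrentThreadCpuTime() : -1;
        long wallStart = System.nanoTime();
        task.run();
        long wall = System.nanoTime() - wallStart;
        long cpu = supported ? threads.getCurrentThreadCpuTime() - cpuStart : -1;
        return new long[] {cpu, wall};
    }
}
```

When wall time is much larger than CPU time for the same task, the difference was spent waiting (on locks, I/O, or a remote call), which is usually the first thing to check for a response time goal.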
Thanks for your response. Yes, we are looking at response time improvement for end-user business transactions. The app is mostly .NET, so I was mainly looking for .NET counters on the app servers. May I also know the counters to look at for the DB (Oracle)?
This is a very broad area and very specific to your Oracle implementation design. For example, if your DB is cluster based, then you need different types of counters than in the general case. Some general counters are:
Active Session
Request Queue length size
Request Queue Rate
Process Rate
Error Rate
Error type Rate
i/o
RPS
Slow Query Rate(APM tool, you need add threshold)
latency
Max wait
Log (rate)
Event viewer (raise, processing, error, exception, alert )
BTW, you need to consider which driver is accessing Oracle when choosing counters; for example, JDBC and ODBC connection pooling have different counters.
Oracle has Assistance with a built-in default counter set. Also, Toad and PL/SQL Developer have some predefined query-based monitoring tools built in.
And, as with banks, depending on the data, the monitoring may vary based on how it is implemented inside Oracle.
Say my Java application talks to web services deployed on another server and I realise that the response from the web services is slow. How do I establish whether the actual web service provider is slow or the time was spent on the network? I need a specific approach to handle this. What network-related tool should I use to see if the network is playing foul here, and what network metrics should I look at, using which tools? Tools I have my hands on in the current project: lr, JMeter, AppDynamics, VisualVM
AppDynamics has the facility to monitor the application under test as well as the dependent web service.
In your case, you need to monitor your application, your dependent web service and the machine hosting that web service too. Get a clear execution picture and measurements before blaming any part of it.
Main things to monitor for the network: latency (break this down into DNS, traffic, auth proxy, etc.).
Based on how your system serves requests, you need to specify monitoring metrics such as CPU (thread usage, queue length), memory (private, committed, virtual bytes), IO (RPS, fault rate, drop rate), network (bandwidth, latency), etc.
I can specify those when I have a top-level component diagram of the whole system architecture.
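As a rough illustration of breaking network latency into parts, the Java sketch below times name resolution and the TCP connect separately, using only standard `java.net` classes. The class `NetworkProbe` and its method names are hypothetical, and it covers only two of the parts mentioned above (DNS and the connection itself), not proxies or authentication.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: times two pieces of the network path to a remote service. */
final class NetworkProbe {

    /** Milliseconds taken to resolve the host name (may be answered from the JVM's DNS cache). */
    static double dnsMillis(String host) throws IOException {
        long start = System.nanoTime();
        InetAddress.getByName(host);
        return (System.nanoTime() - start) / 1e6;
    }

    /** Milliseconds taken to complete a TCP connection to address:port. */
    static double connectMillis(InetAddress address, int port, int timeoutMs) throws IOException {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(address, port), timeoutMs);
            return (System.nanoTime() - start) / 1e6;
        }
    }
}
```

Run the probe from the machine that hosts your application against the web service host. If both numbers are small while the full request is slow, the time is more likely spent inside the service than on the network.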
What network metrics do we look at, and using which tool? How do I establish that the network between two servers is okay and is not the reason for the slowness? I need specific metrics to monitor. My Java application talks to web services hosted on other servers and they are terribly slow; now I want to establish whether that is due to the network or the web service provider itself is slow.
-> What network metrics do we look at and using which tool? -> It depends on the system; usually every OS has network monitoring built in.
-> How to establish that network between two servers is okay and is not the reason for slowness -> It depends on your target. If your performance goal covers those network dependencies, make a script in your test tool and run a load/stress test between those servers. If it is critical, you can use network stress testing tools.
-> I need specific metrics to be monitored? -> Again, that is based on your architecture and target; I can't suggest any without knowing them.
If your Java app has a back-end web service (WS), divide your testing into different parts:
1. Test on Java app + WS
2. Test on WS
3. Virtualize/mock the WS and test on the Java app
So, you have 3 different tests with 3 different goals. Now, compare the latencies and then decide what the root cause is.
After finding the root cause, retest for validation.
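The comparison of the three tests can be sketched as simple arithmetic in Java. This is a simplification under stated assumptions, not a definitive method: it assumes the three measurements are comparable (same load, same percentile), that the WS-only test is run close to the web service so it contains little network time, and that whatever the other two tests do not explain is network or integration overhead. The class `LatencyAttribution` and the enum names are hypothetical.

```java
/** Hypothetical helper: attributes end-to-end latency using the three tests above. */
final class LatencyAttribution {

    enum Suspect { WEB_SERVICE, JAVA_APP, NETWORK_OR_INTEGRATION }

    /** Latency not explained by the WS-only test or the app-with-mock test (never negative). */
    static double unexplainedMs(double endToEndMs, double wsAloneMs, double appWithMockMs) {
        return Math.max(endToEndMs - wsAloneMs - appWithMockMs, 0.0);
    }

    /**
     * Returns the largest contributor to end-to-end latency.
     * endToEndMs:    test 1, Java app + WS
     * wsAloneMs:     test 2, WS only
     * appWithMockMs: test 3, Java app against a mocked WS
     */
    static Suspect attribute(double endToEndMs, double wsAloneMs, double appWithMockMs) {
        double rest = unexplainedMs(endToEndMs, wsAloneMs, appWithMockMs);
        if (wsAloneMs >= appWithMockMs && wsAloneMs >= rest) {
            return Suspect.WEB_SERVICE;
        }
        if (appWithMockMs >= rest) {
            return Suspect.JAVA_APP;
        }
        return Suspect.NETWORK_OR_INTEGRATION;
    }
}
```

For example, with 1000 ms end to end, 800 ms for the WS alone and 100 ms for the app with a mocked WS, the web service is the main suspect. The result only points at where to look first; it still needs the validation retest.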
We are testing a .NET application which is Azure based. Can you please suggest the performance counters that I need to monitor for this application?
It depends on the performance goal and tracking requirements. Usually, CPU, memory, disk IO and network traffic are monitored by default. If you need, you can add IIS monitoring and tracing as well. To be more specific, let me know why you want this monitoring: what you are taking care of and what the purpose of the site is. Based on that you may need to add more (like DB request counters).