How to formulate Performance SLA or Baseline requirements?

In this article we are going to see how to formulate or introduce performance requirements in a team where no such requirements are in place. Mainly, we will see how to introduce a Performance SLA or Baseline to create awareness.

To learn about performance requirements in detail, you may visit my previous post on performance requirements.

SLA (Service Level Agreement):
I am not going into detail on the definition, but in short, an SLA is the agreement at the end point of a service/solution/software, in our case a product. We will look at the different SLAs in detail using a web app example.

SLA can also be described as

Baseline: When we select a value as the ideal/standard, or as the basis of all comparisons, it is called a baseline. You can google for more detailed information, but this is enough to understand the idea. Every baseline value comes with its preconditions specified as part of the standard.

From the previous post, we know that performance is, in summary, all about how fast and how many users. So, let's assume we have a web application with the architecture below (2-tier architecture).

1. The user interacts with the web server through a browser.
2. The web server has an app server that talks to the database.
3. The database is shared with multiple downstream systems & publishers (the full back end).


Both SLA & Baseline can be expressed as response time and/or throughput. I will give examples of response time SLAs only, which can easily be reused as baselines where no SLA is in place.

(To know more about response time & throughput, read this post)

So, let's divide this module by module:
1. Browser
2. Web UI
3. Database
4. Back-end systems (downstream & publisher)

So, if we have a simple SLA for response time only, it will at least make clear:
1. Exactly what we are delivering or expecting from the application.
2. The worst the application is allowed to perform, no matter how badly we develop it.

If we divide the SLA across all of those items, we will get something like this.


For each type of SLA, we need the worst-case scenario in place, that is, the maximum time allowed for the purpose/event/function.

1. End User SLA:
It is the user time for each interaction, which involves
a. The action (click, type, drag, right click, select, etc.)
b. The response time associated with that particular action
c. Complete rendering of all responses in the browser, so the user sees the full result of that action.

Example: Let's consider a LinkedIn page. When a user goes to www.linkedin.com, the browser sends GET requests for the URL. The requests go to the web server, which responds with the page & resources, and these are rendered and shown to the user. Now the user enters a user name & password and presses sign in; he gets logged in and sees his home page. Here we can see that a user performs different actions and gets different responses from the web server.
So, the End User SLA for the application will specify these response times: how long clicking log in should take, how long entering the URL and getting all items loaded (page load) should take, and how long typing into the text boxes (user name & password) should take.
These are mainly the full-page events that a user is allowed to perform. If a page is divided into multiple small page items, the same applies to each individual item.

How to measure?
1. We can always measure manually with a stopwatch (this should be the last option, as it involves human error).
2. The browser's built-in developer tools
3. Browser plugins (performance analyzers) that measure this time inside the browser
4. Proxy tools like Fiddler
5. Browser-based automation (Selenium or any other tool)
6. JMeter/LoadRunner driven client performance testing (see details on the JMeter page)

2. Application SLA:
Application SLA refers to the request serving time up to the client end, in our example the browser. Every page and event that the application serves to the user needs time to process. The difference between the End User SLA & the Application SLA is that the Application SLA involves only the request processing time up to the client. So, no client activity (for example rendering or client-side validation) is involved here.

It involves only
a. The time to process a request from the user (request processing; in our example, HTTP requests)
b. The time to respond to the user after data processing (in our example, preparing the responses and sending them to the client)

For example: If a page of our web application has an HTML response plus resources, the time to receive a particular GET request, process it, and send the responses to the client is the Application SLA time. If the client browser needs to render or process those responses, that processing time is not part of the Application SLA.

Another example: let's assume a game built as a web application, which gets/sends all data from the server via web services but runs JavaScript-based game play in the browser. The web service requests/responses count towards the Application SLA, whereas the full game play user experience counts towards the End User SLA.

How to measure?
Usually we can follow all the processes described for the End User SLA and deduct the rendering time (as well as the client activity time). We can also use load testing tools, like JMeter, LoadRunner, etc., to measure request and response time.

3. DB SLA:
This is the contract between the database and the application. If we consider the application connected to the DB, all UI calls result in DB server requests (via procedures, SQL queries, etc.), and each request takes time to process. The DB should have a maximum time to respond to an application request, no matter how many parameters are sent for processing the request (those should also be specified).
So, the request processing time inside the DB for requests from the application UI is defined as the DB SLA.
It can be response time as well as throughput. In this context, throughput is more useful for the team: they can increase the throughput capacity while the response time stays within the SLA.

How to measure?
1. The easy way is a unit test/timer for individual DB calls inside the application code.
2. DB queries for monitoring
3. Load test tools requesting the DB directly with a set of queries, exactly the same way the application sends requests to the DB.
4. Tools like Toad, PL/SQL tools or any other monitoring tool can get these timings from the DB.
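
The first option above, a timer around an individual call, could be sketched in Java roughly as follows. This is only a minimal sketch: the class name SlaTimer, its methods and any budget value you pass in are illustrative assumptions, and the Supplier stands in for a real DB call.

```java
import java.util.function.Supplier;

// Hypothetical helper (not from any library): times one call and
// compares the elapsed time with an SLA budget given in milliseconds.
final class SlaTimer {

    /** The value returned by the call together with its elapsed time. */
    static final class Timed<T> {
        final T value;
        final long elapsedMillis;

        Timed(T value, long elapsedMillis) {
            this.value = value;
            this.elapsedMillis = elapsedMillis;
        }

        /** True when the call finished within the given SLA budget. */
        boolean withinSla(long slaMillis) {
            return elapsedMillis <= slaMillis;
        }
    }

    /** Runs the call once and measures it with the monotonic nanoTime clock. */
    static <T> Timed<T> time(Supplier<T> call) {
        long start = System.nanoTime();
        T value = call.get();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new Timed<>(value, elapsedMillis);
    }
}
```

In a unit test, the DB call would be wrapped in SlaTimer.time(...) and the test would fail when withinSla(...) returns false for the agreed DB SLA value. A single run is noisy, so repeating the call and looking at the worst case fits the worst-case idea described earlier.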

Other system SLA: As we have mentioned, the DB can be connected to back-end or downstream systems, and each tier should be specified with an SLA. Each handshake should have a timing of its own, so that we can define inter-component request processing time & throughput as SLAs.

Note:
-> We should consider latency when calculating the SLA for each application tier.
-> If you have a network SLA time specification, keep it separate from the latency time.
-> For network timing, try to be rational and measure the full network time distribution.
-> For the detailed time distribution, see https://developer.chrome.com/devtools/docs/network#resource-network-timing
-> When defining an SLA/baseline, try to avoid ambiguity. Be specific about the measurement values.
-> Try to define the worst-case scenario. If you need to, you may also give the average-case scenario.


How to define SLA at the Requirement Phase? Or in the initial phase of an Agile development team?

Usually BAs don't know much about the technical details, especially when the project is in the requirement analysis phase or in the initial phase in agile. So, how can they know about the system? This is very crucial. Most of the projects with performance problems that I have seen in my career had them because performance requirements were not in place. If we can define the application's performance standard from the beginning, that is more effective. So, let's introduce a Business Level SLA as follows.

Business SLA: What is it?
When the SLA is defined according to the business transactions of an application, that timing is referred to as the Business SLA. So, the Business SLA is initially not related to the product; it is related to the business. Every piece of work/functionality that the application will perform from the business perspective is defined as a business transaction.
Usually a Business Level SLA is measured by time only, and later on also specified with system throughput.
Example: Let's consider LinkedIn. When we log in (as in the End User SLA example), only one business transaction is happening, and that is LOGIN. So, if a Business SLA of 2 sec for Login had been defined before LinkedIn was developed, the full team could easily define how much time should be taken for loading and rendering the page, for processing the login request, etc.

So, a Business Transaction is a set or group of interrelated user actions and expected responses. And the Business SLA time is the total time to perform the full transaction.

So, the benefits of a Business SLA:
1. No need for full application details to define performance requirements at the beginning of a project.
2. Usually the business won't care about mini steps; they care about the whole transaction/function that they put in as a requirement.
3. You can measure the real cost of the application from the beginning.
4. You can give the client/user/stakeholder a realistic view of application performance before development starts.
5. You can define business and revenue goals based on performance requirements.
6. If you are developing the application with 3rd party resource providers, you can define these standards to get better control over application functionality and performance.
7. The flow/steps of a particular transaction or business function may change in the longer term, but the Business SLA remains the same, which keeps the requirement valid even when changes occur.
8. And if we attach this SLA to tasks from the beginning or from the requirement phase, the whole team will be aware of it. This awareness might save you from business risk.

My Personal Opinion: As a business person, think of time as cost. For however many business functionalities you put into the product, define the time for each one and make the team aware of that time as the cost of the functionality.


So, when we are gathering requirements and have an SLA for each business transaction or new functional requirement, developers or QAs can break down the Business SLA into End User SLA values, and before each release it is easy to measure whether they are meeting the requirements or not.

Summary:
1. A Business SLA is a set of End User SLAs.
2. The End User SLA is the total time of the Application SLA & client-side activity.
3. The Application SLA is the total time of processing the end user's requests, the DB SLA & all back-end system SLAs.
4. The DB SLA is the time the DB takes to process requests from its dependent systems (the application or back-end systems).
5. The Back-end System SLA is the time to process requests sent to the system by its caller systems.

Note: This example is for a typical web application. For web services, mobile apps or even enterprise solutions, you can follow the same breakdown technique to get better clarity on SLAs.
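
The breakdown in this summary is simple arithmetic: the budgets of the parts must add up to no more than the parent SLA. A minimal Java sketch of that check could look like this; the class name SlaBudget, the step names and the millisecond values used with it are made up for illustration.

```java
import java.util.Map;

// Hypothetical helper: checks that per-step budgets fit inside a parent SLA.
final class SlaBudget {

    /** Sums the per-step budgets, all given in milliseconds. */
    static long total(Map<String, Long> stepBudgetsMillis) {
        long sum = 0;
        for (long budget : stepBudgetsMillis.values()) {
            sum += budget;
        }
        return sum;
    }

    /** True when the steps together do not exceed the parent SLA. */
    static boolean fits(Map<String, Long> stepBudgetsMillis, long parentSlaMillis) {
        return total(stepBudgetsMillis) <= parentSlaMillis;
    }
}
```

For the 2 sec Login example, the team might assume 800 ms for loading the login page, 900 ms for submitting the credentials and 300 ms for rendering the home page; the total is 2000 ms, so the End User SLAs fit the Business SLA. The same check can be repeated one level down, with the Application SLA and client-side time as the parts of one End User SLA.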

Please comment if you have any question.
Thanks.. :)

Performance analysis : Top-Down and Bottom-Up Approach

In this article we are going to see the basic Performance Analysis approaches: Top-Down & Bottom-Up. I will not compare them, as both are used for analysis; I will only try to explain the basic steps and when to choose which type. This article mostly applies to Java & DotNet applications. You may use similar approaches for other platforms too.

Top-Down Analysis:

This is the most popular method. The idea is simple: monitor the application's performance from the top view. That means client monitoring -> server monitoring at OS & resource level -> app server -> run time environment.
When we get unusual behavior (unexpected or off target), we profile or instrument the application. At this point we have found the problems, so we experiment with possible solutions and choose the best one.
Then we have to tune the system: tune the code with the best solution, tune the environment and tune the run time.
Finally, we need to test for the impact. Typically, before starting an analysis, we should have some measurement data (or benchmarks) from performance test results. We need to rerun those tests and compare with the previous results. If the improvement is significant, we are done; if not, we need to go back to profiling and tuning the application.
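
The last step, comparing a retest with the earlier measurement, can be sketched as a percentage change. This is only an illustration: the class name BaselineCompare and the idea of a fixed "minimum improvement" threshold are assumptions, and what counts as significant is for the team to decide.

```java
// Hypothetical helper: compares a retest result with a baseline measurement.
final class BaselineCompare {

    /**
     * Percentage change of the retest relative to the baseline.
     * For response times, a negative value means the retest was faster.
     */
    static double percentChange(double baselineMillis, double retestMillis) {
        if (baselineMillis <= 0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        return (retestMillis - baselineMillis) / baselineMillis * 100.0;
    }

    /** True when the response time dropped by at least the given percentage. */
    static boolean improvedBy(double baselineMillis, double retestMillis,
                              double minImprovementPercent) {
        return percentChange(baselineMillis, retestMillis) <= -minImprovementPercent;
    }
}
```

With a baseline of 800 ms and a retest of 600 ms the change is -25%, which passes a 10% threshold; a retest of 780 ms (-2.5%) does not, so we would go back to profiling and tuning.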

Here is a flow chart to show at a glance : Google drive link. Open with draw.io. 

image


When do we use it?
-> The application is causing issues and we need to optimize the whole or a part of it.
-> Optimize the application for given resources (CPU/Disk/Memory/IO/network).
-> Tune the system & application for best performance.
-> We have access to change the code and need to tune the application for a specific goal (throughput, response time, longer service, etc.).
-> We need Root Cause Analysis for unexpected performance issues (OOM, slowness, crashes at different levels or sub-systems, unwanted application behavior primarily suspected to be performance related, etc.).

Bottom-Up Analysis:
This is another popular approach, used when we need to tune resources or the platform (or hardware) for a specific application. Let's say you have a deployed Java application. Bottom-up analysis allows you to analyze and find optimization scope in the deployed system, hardware and resources. This is a very common approach for application capacity planning and for benchmarking changed environments (migration). The key idea is to monitor the application in a specific environment and then tune the environment (software + hardware resources) so that the target application runs at top performance.

Here is a flow chart to show at a glance : Google drive link. Open with draw.io. 

image


When do we use it?
-> When you need to increase performance but you can't change the source code.
-> You need to optimize resources & the environment for a specific application (deployed environment).
-> You need a benchmark, want to know resource usage, and want to find possible areas for tuning.
-> You need to optimize the run time (JVM/CLR) for your application. You can see resource usage and tune as your app needs.
-> When you need capacity planning for the hardware that must run the application in an optimal way.
-> When you have optimized your application at the code level and there is no visible area left to tune, you can use this technique to gain some more.


Java Thread & Synchronization : How to use thread?


This article continues the previous post. In this article, we are going to see how to implement or use threads in Java.

We can apply threading in two ways.
a. Extending the Thread class: which is self-executable (can run directly)
b. Implementing the Runnable interface: which is not self-executable; we need to pass it to a thread to execute it.
Basically both work by the same technique: if you look at the Thread class, it implements the Runnable interface, whose run() method we will override. Let's see those two.
A. Extending Thread class:
We first need to extend the Thread class. I am creating a MyThread class which extends Thread.

public class MyThread extends Thread {
    // Falls back to a default name when none is given.
    public MyThread() {
        this("No Name Provided");
    }

    // Creates a named thread; the state printed here is NEW.
    public MyThread(String name) {
        super(name);
        System.out.println("Extended Thread Created: " + name + " : Status ->" + getState());
    }

    // Creates a named thread around an existing Runnable.
    public MyThread(Runnable arunnable, String name) {
        super(arunnable, name);
        System.out.println("Extended Thread Created with runnable : " + name + " : Status ->" + getState());
    }

    @Override
    public void start() { // Changed start: start the thread first, then log its state
        super.start();
        System.out.println("Thread started " + getName() + " : Status ->" + getState());
    }

    @Override
    public void run() { // Changed run: this body executes on the new thread
        System.out.println("Thread is running " + getName() + " : Status ->" + getState());
    }
}
In each step I print a message so that we can understand how it is working. Now, let's write a method, called from main, that creates and runs MyThread.
public static void runThreadByExtendingThread() {
    MyThread t;
    for (int i = 0; i <= 5; i++) { // creates six threads, numbered 0 to 5
        t = new MyThread("T Name : " + i);
        System.out.println("Thread Created, Serial = " + i + ", Name :" + t.getName() + " ID : " + t.getId() + " Status : " + t.getState());
        t.start();
        System.out.println("Thread started, Serial = " + i + ", Name :" + t.getName() + " ID : " + t.getId() + " Status : " + t.getState());
    }
}

And add this to main:

public static void main(String[] args) {
    try {
        runThreadByExtendingThread();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

If we run this, we can see the following in the console.
image

So, from here we can see how the status changes for a particular thread. I have pointed out the first thread created (the 0th).
Benefits:
1. We can implement our own type of execution engine by overriding inherited methods.
2. The full execution unit is in our control, so we can design a full workflow-type environment inside the JVM in a controlled way. This is very useful when designing your own framework, custom UI component or even custom test runner.

Weak points:
As we have to inherit from Thread, our custom thread class can't inherit from any other class.

B. Implement Runnable interface:
This is simple. We need to implement Runnable and override the run() method, which is our main execution engine. So, we make our own class runnable and then initiate a thread with it; the thread then runs it. I am making my own Runnable class, MyRunnable.

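import java.util.concurrent.atomic.AtomicInteger;

public class MyRunnable implements Runnable {
    // Shared by all instances, to see how many times run() is called.
    // AtomicInteger keeps the count correct when several threads increment it.
    private static final AtomicInteger counter = new AtomicInteger();

    public MyRunnable() {
    }

    @Override
    public void run() {
        System.out.println("Runnable is running at -> runnable counter " + counter.getAndIncrement());
    }
}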

And, as with the previous approach, make a method to call this.
public static void runThreadByImplementRunnable() {
    MyRunnable runnable;
    Thread t2;
    for (int i = 0; i <= 5; i++) {
        runnable = new MyRunnable();
        t2 = new Thread(runnable, "T Name : " + i);
        System.out.println("Thread Created, Serial = " + i + ", Name :" + t2.getName() + " ID : " + t2.getId() + " Status : " + t2.getState());
        t2.start();
        System.out.println("Name :" + t2.getName() + " ID : " + t2.getId() + " Status : " + t2.getState());
    }
}

And call this method from main:

public static void main(String[] args) {
    try {
        runThreadByImplementRunnable();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

If we run this, we can see the following on the command line.
image

From here, we can see the initial thread (0th) created and running our run() method. We can see the step by step execution.
Thread 0 is created -> becomes runnable and then starts our run() method. In between, another thread is created.

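Note: Thread uses start() to call run(); it is better not to call run() ourselves. Why? Calling run() directly just executes it as a normal method on the current thread, while start() lets the JVM create the new thread and do some steps before calling run() to make it safe, as you can see in the Thread class.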
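
The difference between start() and run() can be seen by recording which thread actually executes the task. The following is a small sketch; the class name StartVsRun and the thread name "worker" are just illustrative choices.

```java
// Sketch: shows on which thread the task runs for run() versus start().
final class StartVsRun {

    /** Calls run() directly and returns the name of the executing thread. */
    static String nameWhenCallingRun() {
        final String[] seen = new String[1];
        Thread t = new Thread(() -> seen[0] = Thread.currentThread().getName(), "worker");
        t.run(); // plain method call: no new thread is created
        return seen[0];
    }

    /** Calls start() and returns the name of the executing thread. */
    static String nameWhenCallingStart() {
        final String[] seen = new String[1];
        Thread t = new Thread(() -> seen[0] = Thread.currentThread().getName(), "worker");
        t.start(); // the JVM creates a new thread, which then invokes run()
        try {
            t.join(); // wait for the worker, so its write to seen[0] is visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }
}
```

nameWhenCallingRun() returns the name of the calling thread, while nameWhenCallingStart() returns "worker", which is why calling run() directly gives no concurrency at all.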

Benefits:
1. This is a faster way to implement.
2. There is no barrier to extending another class, so just implementing Runnable is enough.

Weak points:
We can't have control over the execution environment.

Here you can see the source code from github. I will add more examples under this multithreading source code.

In the next article, we are going to see multi-threading & communication among threads.


Java Thread & Synchronization : Basic introduction to thread

In this article we are going to learn the basics of Java Thread & Synchronization: the theory that is necessary for applying threading & synchronization. I will try to cover the minimum information needed for understanding; details you may find in Oracle's blogs or Java Code Geeks blogs.

What is a Thread?
A thread is a mini/lightweight processing unit used by the run time environment (JVM) to execute work; each thread has its own execution path, isolated from the others. Why is it lightweight? Because a thread is not fully independent: it is part of a process. So, technically, a thread is a single unit of execution within a whole task/process.

To have a clear idea, let's introduce some related terms.

a. Multitasking: when the operating system runs multiple processes (tasks) concurrently by sharing CPU time among them.
b. Multiprocessing: when a system uses two or more CPUs (or cores) to run processes in parallel.
c. Multithreading: when multiple threads run under a single process and share its CPU time.

So, by architecture, threads will have

1. A single process ID and name (or identifier from the operating system), shared by all threads of the process
2. A single resource unit shared among all threads under a process. The resource can be memory or CPU time.

image

To know the memory management details, you can visit my JVM architecture post.

Note: As you may know from the CPU world, there is a core-sharing technology called Hyper-Threading, which enables two hardware threads to run on the same CPU core. Please don't confuse it with this. Hyper-Threading presents two logical processing units for a single core, and the OS actually recognizes them as two processing units. Multiple (2 or more) threads of an application, on the other hand, belong to a single process as seen by the OS, and are handled by their own mechanism.

How does a thread work?
A thread is an implementation of the Runnable interface with a run() method. When a thread is created and started, it initiates an execution unit in the JVM which runs the statements declared in run(). Usually, when we create a thread, we need to call start() to start it; start() makes the JVM call run() safely on the new thread.
A thread can wait for, join, or stop for other threads.
A group of threads can work as
Preemptive multithreading: Here CPU time is shared, and each thread gets time to run on the CPU (queued, randomly chosen, or highest priority first). This is controlled by the JVM (together with the OS scheduler), not by our code.
Cooperative multithreading: Here the highest priority thread always gets CPU time first. Unless we explicitly make it unselfish, it behaves as selfish multithreading. [yield() is used among threads of the same priority to make them unselfish]
Note: In any Java profiler, we can actually see the running threads. For Visual VM, see the Threads tab.
A thread dump contains information about the currently running threads.

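Life cycle: A thread generally has 4 states in its life cycle. (Java's own Thread.State enum is more fine-grained: NEW, RUNNABLE, BLOCKED, WAITING, TIMED_WAITING and TERMINATED.)
a. Ready
b. Running
c. Hold/Sleep
d. Stop/Death of a thread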
image

Here is how a thread changes its states:

From ready to running: done by calling the start() method, which makes the run() method execute.

From running to hold: done by wait() or sleep(ms), and eventually back to running either by time out or by notify()/notifyAll() [for wait() calls only].

Note: Before JDK 1.2, suspend() and resume() were used, but they are deprecated because they are not thread safe (they are deadlock-prone).

From running to stop or death: by calling stop(), which is deprecated. Normally this is handled by the JVM itself when run() finishes.

Why deprecated? You can find a very good explanation on Stack Overflow.
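
The transitions described above can be observed with Thread.getState(). Below is a minimal sketch (the class name ThreadStates and the 500 ms sleep are arbitrary choices) that records the state before start(), while the thread sleeps, and after it has finished.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: records the states one thread goes through during its life cycle.
final class ThreadStates {

    static List<Thread.State> observe() {
        List<Thread.State> seen = new ArrayList<>();
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(500); // the "hold/sleep" phase
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        seen.add(t.getState()); // NEW: created but not yet started
        t.start();

        // Poll until the worker has reached its sleep (or has already ended).
        while (t.isAlive() && t.getState() != Thread.State.TIMED_WAITING) {
            Thread.yield();
        }
        seen.add(t.getState()); // normally TIMED_WAITING, because of sleep(ms)

        try {
            t.join(); // wait until run() has returned
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        seen.add(t.getState()); // TERMINATED: the thread has ended
        return seen;
    }
}
```

In terms of the four states above, NEW and RUNNABLE cover "ready" and "running", the waiting states correspond to "hold/sleep", and TERMINATED is the death of the thread.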

Types of thread:
User thread: A thread that is used for user tasks and managed by the user. Stopping such a thread is done by the user.
Daemon thread: A thread that is managed by the JVM itself and acts like a service (example: GC). It tends to be low in priority due to its nature. Stopping such a thread is done by the JVM.
We can change the thread type with setDaemon(), but we can't change the type once the thread has started.
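
The rule about changing the thread type can be checked with a few lines. In this sketch (the class name DaemonDemo and the 200 ms sleep are only for illustration), setDaemon() works before start() and is rejected afterwards.

```java
// Sketch: the daemon flag can only be changed before the thread is started.
final class DaemonDemo {

    /** Returns true if the type could still be changed after start(). */
    static boolean canChangeAfterStart() {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(200); // keep the thread alive while we try to change it
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true); // allowed: the thread has not been started yet
        t.start();
        try {
            t.setDaemon(false); // too late: the thread is already alive
            return true;
        } catch (IllegalThreadStateException e) {
            return false;
        }
    }
}
```

Because the thread is marked as a daemon here, the JVM would not wait for it on exit; a user thread would keep the JVM alive until its run() method finishes.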

Important properties of a thread:
a. priority (int, 1-10): represents the priority of a particular thread to the JVM. The default value is 5 (normal). If the thread belongs to a thread group, its priority is capped by the group's maximum priority.
b. name (char[]): represents the name of the thread.
c. ID (long): the ID of the thread.
d. threadStatus (int): shows whether the thread has started or not.
e. group (ThreadGroup): represents a set of threads with a common purpose, which makes it easy to manage them all together.
All are private, so we access them through getters.
In the next post, we will see how threads can be applied in an application.


Offline Java Memory Analysis : Introduction to tools for Heap and Thread Dumps, GC log Analysis

In this article we are going to see different tools which can be used for offline Java memory analysis. I will try to include all kinds of offline analysis tools. OS process memory monitoring and its tools are left out as they are outside this context.

What is offline analysis?
When we do analysis on recorded data, not live data, it is referred to as offline analysis.

For Java memory analysis, then, we will analyze recorded data rather than live data. We mainly need three types of information from the JVM to get to the bottom of a problem.
1. Java Heap Dump: This represents JVM heap memory information.
2. Java Core/Thread Dump: This shows running thread states and conditions. The core contains more detailed information. For IBM, core and thread dump are the same. Different JVMs may also include trace files here.
3. GC logs: These show the garbage collection history.

I will not go into detail on each one; let's see the tools only.

For all IBM JVMs: You need IBM Support Assistant. This is a web distribution; you run it with Java and your PC hosts all the tools together. Here is the download link. How to set up IBM ISA? Unzip and run start_isa.bat.
Then, if you go to http://localhost:10911/isa5 in the browser, you will be redirected here (all of these links & ports are configurable).

image

You will get 3 types of tools: JNLP web start, Eclipse plugin links, and web based. If you download a JNLP and open it in Notepad, you can actually edit the JVM configuration. I used to make a separate bat file to run the JNLP with an IBM JVM (not the run time environment set in the environment variables).

Make sure your PC doesn't have a global log file set as an environment variable (if you install an HP testing product you will have one). Just delete that environment variable; ISA will create a log path variable for itself and you can run it.

For all Oracle JVMs:

1. Java Mission Control (JMC): comes free with the JDK, but you need to install these tools as plugins.
a. JOverflow / Heap Analysis
b. DTrace Recorder
c. Flight Recorder Plugins
d. Console Plugins

image 

image

2. Visual VM: comes free with the JDK. Download the useful plugins to get the best out of it.

image

Architecturally, JMC is based on Eclipse & Visual VM is based on NetBeans, so plugin installation follows the process of those IDEs (network config + tool installation).

Java Heap Analysis :

1. Eclipse MAT (including IBM IDDE): supports IBM (phd) & Oracle JVM heap dumps.

2. Visual VM: Oracle hprof format heap dump reader.

3. JOverflow/Dump Analysis as plugins of Java Mission Control: Oracle hprof format heap dump reader; provides more detailed analysis than Visual VM.

image

4. Heap Analyzer(IBM Only) : IBM Phd format heap analyzer. Download link.



Thread Analysis :

For IBM JVM (Java core file and .trc file):

1. JCA (Java Core Analyzer): This is from IBM support. Download it and load text format Java core files.

2. GCMV (Garbage Collector Memory Visualizer) standalone or GCMV Eclipse plugin, comes with ISA

3. Class Loader Analyzer (IBM only): used for class loader analysis from Java core and Snap<>.trc files. It comes with ISA.

4. Trace & Request Analyzer (IBM only): used for reading Snap<>.trc files. Download link

5. Thread & Monitor Dump Analyzer (TMDA, IBM only)

Note: .trc is the trace file; you need to enable and configure tracing to get proper information.

For Oracle JVM:

1. .tdump format: Visual VM

2. hs_err_pid.log format: We need to use the command line tools that come free with the JDK.
a. jstack -> prints stack traces
b. jmap -> heap memory details
Most of the time we read these manually, as they are in text format. We can use filtering. There are also small parsers, mainly shell scripts, for different filters. Here is one example.
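
For a quick look without external tools, a text dump similar in spirit to what jstack prints can also be produced from inside the JVM through the standard java.lang.management API. The sketch below is only an approximation: the class name InProcessThreadDump and the output layout are my own, not the jstack format.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Sketch: builds a plain-text thread dump of the current JVM.
final class InProcessThreadDump {

    static String dump() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip locked monitor/synchronizer details to keep it cheap
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append('"')
              .append(" state=").append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

The returned text can be written to a file at interesting moments (for example when a request is slow) and filtered later, the same way as the text output of the command line tools.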

GC Log Analysis :

As there are different JVMs implemented from OpenJDK, GC logs also come in different formats. The most commonly used are the IBM & Oracle formats. Each JVM has different types of GC logs, but they mostly follow a general text format. Sometimes an old IBM verbose GC log might not be compatible with GC logs generated by the latest OpenJDK or Oracle JVMs. How to generate the logs, I will cover in detail in a separate blog.

For reading Oracle (default: Solaris JVM format) & IBM verbose GC logs:

1. GCMV (Garbage Collector Memory Visualizer) standalone or GCMV Eclipse plugin, comes with ISA

2. PMAT (IBM Pattern Modeling Tool): reads multiple GC logs together, comes with ISA

3. One good 3rd party log viewer: GCViewer. Supported formats:
Sun JDK 1.4/1.5 ( -Xloggc:<file> [-XX:+PrintGCDetails] )
Sun JDK 1.2.2/1.3.1/1.4 (-verbose:gc )
IBM JDK 1.3.1/1.3.0/1.2.2 ( -verbose:gc)
IBM iSeries Classic JVM 1.4.2 (-verbose:gc)
HP-UX JDK 1.2/1.3/1.4.x ( -Xverbosegc)
BEA JRockit 1.4.2/1.5 (-verbose:memory)

You may find some other GC log reader tools at this link.

Note:
1. For both VMs, we can use JProfiler, which is a paid tool. If we use YourKit, different versions of YourKit support different JVM versions; see the archive page of YourKit for more details.
2. For IBM tools, it is better to use an IBM JVM. You can either download the JDK from IBM or download their development package with Eclipse, where the JDK is included.
3. For IBM JVMs, an OOM will usually create a phd, a javacore & a trc file. If you are storing GC logs, the verbosegc log will be there too.


How to find memory leaks in application? General Ideas

In this article we are going to see generic ideas about memory leak detection techniques. We will use a Top-Down approach, so that we can see the symptoms from outside the application to inside.

As examples, I will add references to Java (J2EE) and DotNet (ASP.NET/Web Forms) applications. I will try to avoid theoretical definitions and rather express things in my own way.

What is a memory leak?
From the name we can understand that it is about high memory consumption. Actually, it shows up in several ways. Generically, when an application's required memory keeps increasing while the application is running, it might indicate a memory leak. A memory increase is not necessarily a definite leak; the logical relation between functionality and required memory defines whether it is a leak or not. Based on symptoms, I put memory leaks in two categories.

Symptom A: Increase of memory usage: While running the application, if we see the system memory & application memory usage (heap + non-heap) increase, that is a clear indication that the required memory is getting higher.

We need to relate this to the actions that we perform on the application. If our actions work with a large volume of data, a memory increase is logical. But we need to find out how much. And we need to observe that the new objects created for those actions are removed from the run time environment soon after the functionality ends.

(For Java/DotNet, a forced GC from a profiler is useful to judge whether the objects are collected once they are no longer needed.)

Sometimes we can see memory increasing more and more, like climbing steps: even after GC, memory consumption is still growing, and we can see generational changes.
DotNet: Gen 0 moves to Gen 1 & Gen 1 moves to Gen 2.
Java: Nursery -> Survivor (S0), S0 -> S1 and S1 -> Tenured (old gen).

This can lead to a critical stage and cause an Out of Memory exception.

Symptom B: Memory usage is not dropping: While running the application with a large number of users or a large amount of data (a typical stress test scenario), we see that memory is not released, and over a longer observation the memory is still occupied. This may cause a memory increase for each new action performed on the application.
To confirm this symptom, we may force a GC: if memory usage goes down, it is okay; if not, the memory is constantly occupied. And we can see the same generational changes of memory as described in Symptom A.

It is very logical: when we perform some task on the application, it will load new classes, create objects and do the task. But as soon as the task completes, we should expect those objects to be destroyed/cleaned in the next GC cycle.
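
For Java, the "force GC and check whether memory drops" idea can be sketched with the standard Runtime API. Treat it as a rough illustration only: the class name HeapCheck and the "every sample is higher than the previous one" rule are my assumptions, and System.gc() is only a request that the JVM may ignore.

```java
// Sketch: samples used heap after a GC request and looks for steady growth.
final class HeapCheck {

    /** Heap currently in use by the JVM, in bytes. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Requests a GC (the JVM is free to ignore it) and then samples the used heap. */
    static long usedAfterGcHint() {
        System.gc();
        return usedHeapBytes();
    }

    /** True when each sample taken after GC is higher than the one before it. */
    static boolean keepsGrowing(long[] samplesAfterGc) {
        if (samplesAfterGc.length < 2) {
            return false; // not enough data to talk about a trend
        }
        for (int i = 1; i < samplesAfterGc.length; i++) {
            if (samplesAfterGc[i] <= samplesAfterGc[i - 1]) {
                return false;
            }
        }
        return true;
    }
}
```

One way to use it is to call usedAfterGcHint() after each repetition of the suspected scenario and pass the collected values to keepsGrowing(). A true result is only a hint that objects are being retained; a heap snapshot comparison is still needed to find out which objects.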

Tools :
1. OS memory monitoring tool (perfmon or similar)

2. Application memory monitoring tool
.NET : VMMap, or any profiler such as ANTS Memory Profiler or dotMemory
Java : JConsole, VisualVM, or Java Mission Control (JMC)
IBM Java : Health Center, under IBM Support Assistant
Or a commercial profiler such as YourKit or JProfiler

You may also choose an APM tool (Dynatrace, New Relic, or AppDynamics) to monitor either technology.

3. Tools to collect and compare memory snapshots
.NET : PerfView, or commercial ANTS Memory Profiler or dotMemory
Java : JConsole/VisualVM, JMC, or commercial YourKit or JProfiler

4. Analysis tool : Either analyze manually, looking at what you need, or follow the commercial tool's suggestions.

So, how do we find the memory leak?

I will follow top-down analysis. I will write a separate blog post on what top-down analysis is; as a quick recap, it is an approach to performance analysis that starts at the top-level system and dives deep into the code at run time.

Step A : Choose Scenario :
First you need to know which area of your application might have a memory leak. Usually, leak analysis comes after a performance test or profiling activity, or, in the worst case, after an out-of-memory exception has happened.
In all cases, the first step must be to know which actions, events, or UI interactions are causing the suspicion. This ensures your effort goes in the right direction.

For example : Let's consider a banking application. When a user logs in, he gets his account balance and all other banking functions. If he does a credit, debit, or any other transaction, the application takes a large amount of memory due to different third-party dependencies, and leaks are suspected.

Step B : Monitor System & Process Memory Usage :
Monitor Environment/OS Memory Usage : After identifying the scenario, run the same process repeatedly for a longer period, either by automation (a load test tool or functional test tool) or manually, and monitor the memory usage of
1. The host PC, using an OS monitoring tool
2. The application process (each OS has process monitoring)
3. The application runtime (Java/.NET)
(This part depends on the architecture of the runtime; for more detail you can visit my posts below.)

This is a small example of monitoring an IIS process with Task Manager and perfmon.



Now we have clear visibility over the environment's memory, and we can see the memory usage trend over time. Let's go to the next stage.

You can also do internal application monitoring with profilers.
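To put a number on the "trend over time", one simple option is a least-squares slope over periodic samples. The sketch below is an assumption of mine, not something a specific tool does; MemoryTrend, the 500 ms interval, and the 10 samples are all illustrative choices.

```java
// Hypothetical helper class, for illustration only.
public class MemoryTrend {

    /** Least-squares slope of the samples, in bytes per sampling interval. */
    public static double slope(long[] samples) {
        int n = samples.length;
        if (n < 2) {
            return 0.0; // a trend needs at least two points
        }
        double meanX = (n - 1) / 2.0;
        double meanY = 0;
        for (long s : samples) {
            meanY += s;
        }
        meanY /= n;

        double num = 0;
        double den = 0;
        for (int i = 0; i < n; i++) {
            num += (i - meanX) * (samples[i] - meanY);
            den += (i - meanX) * (i - meanX);
        }
        return num / den;
    }

    public static void main(String[] args) throws InterruptedException {
        Runtime rt = Runtime.getRuntime();
        long[] samples = new long[10];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = rt.totalMemory() - rt.freeMemory(); // used heap right now
            Thread.sleep(500); // illustrative sampling interval
        }
        System.out.printf("Heap trend: %.0f bytes per sample%n", slope(samples));
    }
}
```

A slope that stays clearly positive across many repetitions of the same scenario is a reason to continue with the next steps; a slope near zero suggests the memory is being reclaimed.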


Step C : Break Down Your Scenario :
After getting memory usage over time, break the areas you chose in Step A down separately and narrow your scope. You will end up with small, well-defined steps.
From the Step A example, we choose one of the transactions for memory leak analysis, say the debit transaction. Let's assume that for a debit transaction we need to
1. Log in and see the home screen
2. Go to Transactions
3. Select "Transfer money to another account"
4. Give all inputs and click Transfer
5. Wait for the mobile/ID verification step and complete it
6. See the success message

Step D : Prepare Your Memory Profiler :
We now have a fixed, small group of steps as part of a big scenario. Now prepare your memory profiler. I recommend the following:

.NET (free) : PerfView

.NET (paid) :
1. dotMemory (I prefer)
2. ANTS Memory Profiler

Java (free) :
1. VisualVM
2. JConsole
3. JMC
For IBM Java, Health Center if JMC/VisualVM/JConsole are not supported

Java (paid) :
1. JProfiler (I prefer)
2. YourKit

Note : Paid tools make it easier to find memory leaks because of how they present data.

So, attach your memory profiler and take a memory snapshot at the beginning of the steps we chose in Step C (in the example, just after login).

Perform each step in the UI and take the related heap snapshot in the memory profiler.
(Note : If you are testing a JS/Ajax application, use the browser tools to make sure your UI requests are actually reaching the server; otherwise you may see no changes on the server from your actions in the UI.)

So, after doing all the steps, you will have a heap snapshot for each step.
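If no profiler can be attached, a HotSpot JVM can also write a heap snapshot of itself through the HotSpotDiagnosticMXBean. This is a minimal sketch that assumes a HotSpot-based JVM; the class name HeapSnapshot and the file name step1-after-login.hprof are hypothetical. Passing true for the second argument dumps only live objects, which means a GC runs before the dump.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, for illustration only (HotSpot JVMs).
public class HeapSnapshot {

    /** Writes a heap dump of the current JVM to the given .hprof path and returns the file. */
    public static File dump(String path) {
        File file = new File(path);
        // dumpHeap does not overwrite an existing file, so remove any old one first
        if (file.exists() && !file.delete()) {
            throw new IllegalStateException("Cannot replace " + path);
        }
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            bean.dumpHeap(file.getPath(), true); // true = live (reachable) objects only
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file;
    }

    public static void main(String[] args) {
        // Illustrative naming: one dump per UI step
        File f = dump(args.length > 0 ? args[0] : "step1-after-login.hprof");
        System.out.println("Wrote " + f.getAbsolutePath() + " (" + f.length() + " bytes)");
    }
}
```

The resulting .hprof files can be opened and compared in VisualVM or a commercial profiler, one file per step.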

Step E : Comparing Heap Snapshots :
Consider the initial heap snapshot (in the example, just after login) as the baseline or zero snapshot. Let's say we have taken 5 snapshots in total and No. 1 is the base snapshot. We can now compare in two ways.
1. Comparing each snapshot with the baseline : This shows the differences from the initial state.

In the example, that means comparing snapshots 2, 3, 4, and 5 with snapshot 1.

2. Comparing each snapshot with its previous snapshot : This shows gradual growth.

In the example, that means comparing snapshot 5 with 4, 4 with 3, 3 with 2, and 2 with 1.
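The two ways of comparing can be sketched on simplified data. Here I assume each snapshot has been reduced to a map of class name to instance count (for example, exported from a class histogram); SnapshotDiff and the class name TransferRequest are made up for the illustration, and real profilers compare far more than counts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, for illustration only.
public class SnapshotDiff {

    /** Per-class instance-count change from 'older' to 'newer'; unchanged classes are omitted. */
    public static Map<String, Long> diff(Map<String, Long> older, Map<String, Long> newer) {
        Map<String, Long> delta = new TreeMap<>();
        for (Map.Entry<String, Long> e : newer.entrySet()) {
            long change = e.getValue() - older.getOrDefault(e.getKey(), 0L);
            if (change != 0) {
                delta.put(e.getKey(), change);
            }
        }
        // Classes that disappeared completely show up as negative counts
        for (Map.Entry<String, Long> e : older.entrySet()) {
            if (!newer.containsKey(e.getKey()) && e.getValue() != 0) {
                delta.put(e.getKey(), -e.getValue());
            }
        }
        return delta;
    }

    /** Way 1: every later snapshot compared with the first (baseline) snapshot. */
    public static List<Map<String, Long>> againstBaseline(List<Map<String, Long>> snapshots) {
        List<Map<String, Long>> result = new ArrayList<>();
        for (int i = 1; i < snapshots.size(); i++) {
            result.add(diff(snapshots.get(0), snapshots.get(i)));
        }
        return result;
    }

    /** Way 2: every snapshot compared with the one taken just before it. */
    public static List<Map<String, Long>> againstPrevious(List<Map<String, Long>> snapshots) {
        List<Map<String, Long>> result = new ArrayList<>();
        for (int i = 1; i < snapshots.size(); i++) {
            result.add(diff(snapshots.get(i - 1), snapshots.get(i)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up counts for one class across three snapshots
        List<Map<String, Long>> snaps = new ArrayList<>();
        for (long count : new long[] {0L, 5L, 9L}) {
            Map<String, Long> s = new TreeMap<>();
            s.put("TransferRequest", count);
            snaps.add(s);
        }
        System.out.println("vs baseline: " + againstBaseline(snaps));
        System.out.println("vs previous: " + againstPrevious(snaps));
    }
}
```

A class that keeps a positive delta in both views, step after step, is a good candidate for the reference analysis later on.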

Step F : Analyze Snapshots :
When you compare, a paid tool gives you the differences by default. With free tools, you might need to get them manually by sorting, grouping, and folding. So, let's see which items we need to look at.

1. Summary Analysis : A summary view of memory growth across all snapshots. In the summary, we need to see

a. How many times GC happened (to calibrate our expectations)

b. Heap memory growth, broken down into generations (Gen 0/1/2 in .NET) or Nursery/Survivor/Old generation (Java)

c. Based on old generation growth and GC events, decide whether the growth was necessary. In fact, if your application calls GC manually, you might see extra growth in the old generation (Gen 2 in .NET).
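For item (a) on the Java side, the GC count and accumulated GC time can be read from the standard GarbageCollectorMXBean. This is a small sketch with a made-up class name (GcSummary); the collector names it prints depend on the GC algorithm in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, for illustration only.
public class GcSummary {

    /** Total number of collections reported so far by all collectors of this JVM. */
    public static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%-25s collections=%d time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("Total collections: " + totalCollections());
    }
}
```

Reading the total before and after each UI step gives the number of GC events that step caused, which can be set against the heap growth seen in the snapshots.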

I am giving a small example from ANTS Memory Profiler and dotMemory (some names are blurred).


This happened when we solved a small part of the problem (only search).


In dotMemory


2. Heap Snapshot Comparison : Here we need to look at the following differences
a. New objects -> to determine which new objects were created for our particular step
b. Surviving objects -> which objects are still needed from the previous stage (like session objects, previous data)
c. New classes -> classes newly present at run time, related to the step. We might look at class loaders also
d. Big objects (85K+, .NET only) -> in the CLR these are handled separately (in the Large Object Heap); you can filter based on object size
e. Promoted generations -> promoted objects indicate long-living objects.

3. Reference Analysis : After the heap snapshot comparison, we need to go target by target.

Find the types with large differences in size or in instance count.

Note : In Java, size has two types: shallow size, which refers to the size of the object itself, and retained size, which refers to the object's size plus the size of everything it alone keeps alive (effectively the memory that would be freed if it were cleaned up).

For each candidate, try to analyze
- Whether there is any circular reference (exclude same interface implementation)
- How far the object is from GC (that is, after how many GC events it will be collected)
[if it is 1, it is nearest, so it will be collected on the next GC and you can easily ignore it]
- Size of a single instance of the object. More often than not, God classes cause serious problems.
- Outgoing references and their size
- Incoming references and who initiates them
- Instance list, to know exactly how many instances there are and who created them
- Instances which are associated with external resources (DB, file writer, etc.)





In all paid tools, you also get the benefit of a retention graph or call tree. This lets you backtrack from an object to its initiator and event. This is the main reason paid tools find memory problems quickly.

And in summary, know your application's memory usage : this is the key point for going forward.

So now you have your leaking object, and you know who is calling it and how it is called. Next you have to eliminate the leak. For that, there are some generic and some technology-specific workflows. I will write a separate post on how to approach solving memory leaks in Java and .NET.

Note : Most tools trigger a full GC before taking a snapshot, so for analysis you need to take that event into account alongside real-time monitoring data and heap comparison analysis. For this, you may use the free tools shipped with the framework to take manual heap snapshots and compare them.

References :
1. A quick overview on JVM Architecture
2. What are the elements inside DotNet?
3. What is CLR? How it works?

This is a continuing post; I will add more specific information later upon request.

Please comment if you have any questions.

 Thanks.. :)

Java OutOfMemoryError : GC overhead limit exceeded Example

This article continues the out-of-memory error explanation. In this post I will show how to reproduce Java OutOfMemoryError : GC overhead limit exceeded. We will see how to recreate it and what the impact looks like in monitoring tools.

I have used only JVM 1.8 x64 on a Windows 7 x64 laptop with 8 GB RAM and a 2.5 GHz Core i5.

Tools : 
IDE : Eclipse
Profiling/Monitoring tool :
1. Visual VM
2. Jconsole (optional)
3. Yourkit (optional)

I am using some JVM flags to get detailed GC information and to enable monitoring via JMX. Please see step 1 of this post for details.
I am using these flags here (-Xmx limits the heap so the error occurs quickly):
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-Dcom.sun.management.jmxremote=true
-Dcom.sun.management.jmxremote.port=3000
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Djava.rmi.server.hostname=localhost
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=D:\OOM
-Xmx100m

Now, the GC overhead error is easy to understand: the GC is unable to free up the heap despite its best efforts. The main cause is that the GC is taking more than 98% of the total time to clean up the heap while recovering less than 2% of it. In our example, we will see the GC overhead error multiple times, because the heap stays occupied after a certain number of items have been added.
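The 98% / 2% rule can be written out as simple arithmetic. The sketch below is a deliberately simplified illustration of the condition, not the JVM's actual implementation (HotSpot evaluates it over consecutive collections); the class and method names are made up, and the two constants mirror the defaults of the HotSpot flags -XX:GCTimeLimit and -XX:GCHeapFreeLimit.

```java
// Hypothetical helper class, for illustration only.
public class GcOverheadCheck {
    private static final double GC_TIME_LIMIT_PERCENT = 98.0; // default of -XX:GCTimeLimit
    private static final double HEAP_FREE_LIMIT_PERCENT = 2.0; // default of -XX:GCHeapFreeLimit

    /**
     * True when GC used more than 98% of the elapsed time
     * and recovered less than 2% of the heap.
     */
    public static boolean overLimit(long gcMillis, long elapsedMillis,
                                    long freedBytes, long heapBytes) {
        double gcPercent = 100.0 * gcMillis / elapsedMillis;
        double freedPercent = 100.0 * freedBytes / heapBytes;
        return gcPercent > GC_TIME_LIMIT_PERCENT && freedPercent < HEAP_FREE_LIMIT_PERCENT;
    }

    public static void main(String[] args) {
        long heap = 100L * 1024 * 1024; // matches -Xmx100m from the flags in this post
        // Made-up measurement: 990 ms of GC in a 1000 ms window, only 1 MB recovered
        System.out.println(overLimit(990, 1000, 1024 * 1024, heap));
    }
}
```

With -Xmx100m, 2% of the heap is about 2 MB, so once the map holds nearly the whole heap, each collection recovers far less than that and the condition is met.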

Again, to know about the error, you can visit this original post.

Scenario: 

A very simple scenario: I am adding a string to a map in an infinite loop (a map entry is costly; if we use an ArrayList, it will take more time to reach the error).

Code : 
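import java.util.HashMap;
import java.util.Map;

public class GCOverheadOOM {
    // Static map keeps every entry reachable, so GC can never reclaim them
    private static final Map<Integer, String> aMap = new HashMap<>();

    public static void createGCOverheadOOM() {
        int i = 0;
        try {
            while (true) {
                aMap.put(i, "Shantonu adding String");
                System.out.println("Total Items " + i++);
            }
        } catch (Throwable e) {
            // Catch Throwable so the OutOfMemoryError is reported here
            System.err.println("\nError after adding " + aMap.size() + " items");
            e.printStackTrace();
        }
    }
}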

And call this from the main method:
GCOverheadOOM.createGCOverheadOOM();
Note : As this is a GC-related error (overhead), it depends entirely on the GC algorithm. This code generates the error with the default or parallel GCs. When I used a different collector, I got a slightly different error. I tried the following JVM flags to select the GC algorithm, one set at a time:
1. -XX:+UseParallelGC -XX:-UseParallelOldGC
2. -XX:-UseParNewGC -XX:+UseConcMarkSweepGC
3. -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
4. -XX:+UseG1GC
5. -Xincgc
6. -XX:+UseSerialGC
7. -XX:+UseParallelGC
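
To confirm which collectors a given flag set actually selected, the collector names can be read at run time. This is a small sketch with a made-up class name (ActiveCollectors); the exact names are JVM-specific, for example the parallel collector on HotSpot 8 typically reports "PS Scavenge" and "PS MarkSweep".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
public class ActiveCollectors {

    /** Names of the garbage collectors the running JVM reports. */
    public static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Collectors in use: " + names());
    }
}
```

Running it once with each flag set, before starting the OOM test, makes it clear which algorithm produced which error.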

Oracle has clear indication here.

Error analysis in console  : 

The error occurred after 1488049 items were added. We can see multiple OOM messages, one for each attempt by the GC.
image

Dump analysis in VisualVM (dump created at OOM) : I am using one of the errors

Summary : 
 
 

Top contributors : 
image

Visual VM Monitoring : 
GC before ending :
image

Heap :
image


JConsole Monitoring : (overall)
image


Yourkit Monitoring :

CPU usages :
image

Heap : 
image

Non Heap : 
image

Please comment if you have any questions.

Thanks.. :)