Friday, February 20, 2009

Initial Lassen Performance Results

The below are the test results for the 8 tomcat cluster with terracotta server. It is the result form the masterRun.sh script.
Analyzing Jmeter logs ...
==============================
502 Error = 0
503 Error = 0
DEV-1929 = 0
//This includes network latency, etc.
Response Time
==============================
Average JMeter response time [Last Page/e1s31] = 144.109 ms
Average JMeter response time = 98.2516 ms

//This is server latency printed by the apache tomcat servlet engine
Server Latency Stats
=====================
Average = 9.90304 ms
Minimum = 0 ms
Maximum = 5851 ms
Std. Deviation = 40.5956
Total # of Requests = 2704000
Nodes = 8

L2 Configuration
==============================
TC_OPTS=-Xms4g -Xmx4g -XX:+DisableExplicitGC -XX:-TraceClassUnloading -XX:TargetSurvivorRatio=90 -Xss128k -XX:+AggressiveHeap -Dcom.tc.l2.objectmanager.fault.logging.enabled=true -Dcom.tc.l2.cachemanager.logging.enabled=true -Dcom.tc.l2.berkeleydb.je.maxMemoryPercent=15 -XX:SurvivorRatio=12

L1 Configuration
==============================
CATALINA_OPTS=-Duse.async.processing=true -Dasync.concurrency=5 -Dcom.tc.hibernate.useFineGrainedLocking=false -Dcom.tc.l1.cachemanager.enabled=false -Duse.pojoizer=true -Xms1024m -Xmx1024m -server -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc -XX:TargetSurvivorRatio=90 -Xss128k -XX:NewSize=512m -XX:MaxNewSize=512m

// We have async TIM installed which let the saving the results to database asynchronously.
MySQL DB (perf28) Entries in EXAM_RESULT
=========================================
COUNT(*)
4547

MySQL DB (perf28) results completion time
==========================================
min(START_TIME) max(END_TIME) completionsecs
2009-02-20 04:40:12 2009-02-20 06:21:41 6089

Object DB Size
==============================
13G /export1/bench/perfTests/terracotta/server/data/objectdb

L2 VerboseGC log Analysis

GC Report
================
Avg Full GC Duration = 5.50871 secs
Avg Full GC Interval = 247.74 secs
Total Full GC Time = 137.718 secs

No. of Full GC = 25
Avg Young GC Duration = 0.230732 secs
Avg Young GC Interval = 25.8137 secs
No. of Young GC = 246
Total Young GC Time = 56.76 secs

Avg Young GC Occurence b/w Full GC = 9.84
Avg Young GC Time b/w Full GC = 2.2704 secs

The page response time is less than 10 ms.

The above response time includes Servlet Initialization time. All the servlets/application initializes when first request hits the tomcat server which could be bit high.

After 5 mins from the test run completion

mysql -u root -h perf28 -e 'SELECT COUNT(*) FROM exam.EXAM_RESULT'
+----------+
| COUNT(*) |
+----------+
| 9600 |
+----------+

Lassen Performance Tuning

We need to apply the performance tuning parameters on Tomcat and HTTP server.

Tomcat Server Tuning

Jasper Configuration

Tomcat JSP compiler Jasper has some recommended configuration for production environment or rather non-development environment. They are described below and are made to web.xml

http://tomcat.apache.org/tomcat-6.0-doc/jasper-howto.html#Production%20Configuration



<servlet>
<servlet-name>jsp</servlet-name>
<servlet-class>org.apache.jasper.servlet.JspServlet</servlet-class>
<init-param>
<param-name>fork</param-name>
<param-value>false</param-value>
</init-param>
<init-param>
<param-name>xpoweredBy</param-name>
<param-value>false</param-value>
</init-param>

<!-- development Is Jasper used in development mode? If true, ---->
<!-- the frequency at which JSPs are checked for ---->
<!-- modification may be specified via the ---->
<!-- modificationTestInterval parameter. [true] ---->
<!-- -->

<init-param> <param-name>development</param-name> <param-value>false</param-value> </init-param>

<!-- mappedfile Should we generate static content with one ---->
<!-- print statement per input line, to ease ---->
<!-- debugging? [true] ---->
<!-- -->

<init-param>
<param-name>mappedfile</param-name>
<param-value>false</param-value>
</init-param>

<!-- modificationTestInterval ---->
<!-- Causes a JSP (and its dependent files) to not ---->
<!-- be checked for modification during the ---->
<!-- specified time interval (in seconds) from the ---->
<!-- last time the JSP was checked for ---->
<!-- modification. A value of 0 will cause the JSP ---->
<!-- to be checked on every access. ---->
<!-- Used in development mode only. [4] ---->
<!-- -->

<init-param>
<param-name>modificationTestInterval</param-name>
<param-value>3600</param-value>
</init-param>

<!-- genStrAsCharArray Should text strings be generated as char ---->
<!-- arrays, to improve performance in some cases? ---->
<!-- [false] ---->
<!-- -->

<init-param>
<param-name>genStringAsCharArray</param-name>
<param-value>true</param-value>
</init-param>

<load-on-startup>3</load-on-startup> </servlet>


Server Configuration

The following changes were made regarding the tomcat server made to server.xml

Reference: http://tomcat.apache.org/tomcat-6.0-doc/config/http.html

<Connector port="8080" protocol="HTTP/1.1"
enableLookups="false"
maxThreads="500"
minSpareThreads="100"
maxKeepAliveRequests="1"
acceptCount="200"
connectionTimeout="20000"
redirectPort="8443" />

maxThreads

It is the maximum number of request processing threads to be created by this Connector, which therefore determines the maximum number of simultaneous requests that can be handled. The number were set to 500 as there were requests rejected giving connection refused error when load applied on the server was about 400 user. There is thinktime applied in the test which should reduce the concurrency in the server but at certain point of time, there were few requests kept alive if the keep-alive timeout is too high.

maxThreads were increased to 1200, to match the number of users/node. This wasn't necessary but to solve the 40 min problem so that the connections are not refused by tomcat. Since maxThreads were increased, minSpareThreads were also increased to 500.

maxKeepAliveRequests

The maximum number of HTTP requests which can be pipelined until the connection is closed by the server. Setting this attribute to 1 will disable HTTP/1.0 keep-alive, as well as HTTP/1.1 keep-alive and pipelining. Setting this to -1 will allow an unlimited amount of pipelined or keep-alive HTTP requests. If not specified, this attribute is set to 100. There are redirects for each request so its good to have keep alive so that each redirected request is handled by the same thread.

Keep Alive on the tomcat server has been disabled, which probably solves the 502 Bad proxy gateway error

acceptCount

The maximum queue length for incoming connection requests when all possible request processing threads are in use. Any requests received when the queue is full will be refused. The default value is 10. It is set high so that we dont have any requests that is getting refused by the server.

MySQL Server tuning

The limit on connections made to MySQL server is specified by the max_client parameter. The default value is 100, so if we have connection pool size of 100 on the app servers, the limit on mysql server throws "com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException: Too many connections".

The max connections were set to 800 which were eventually increased to 3300 (200 per node and 100 extra)

Load Balancer Tuning

The number of connections were increased from the default to handle the load.

<IfModule worker.c>
ServerLimit 50

#initial number of server processes to start
#http://httpd.apache.org/docs/2.2/mod/mpm_common.html#startservers
StartServers 3

#minimum number of worker threads which are kept spare
#http://httpd.apache.org/docs/2.2/mod/mpm_common.html#minsparethreads
MinSpareThreads 5000

#maximum number of worker threads which are kept spare
#http://httpd.apache.org/docs/2.2/mod/mpm_common.html#maxsparethreads
MaxSpareThreads 5000

#upper limit on the configurable number of threads per child process
#http://httpd.apache.org/docs/2.2/mod/mpm_common.html#threadlimit
ThreadLimit 200

#maximum number of simultaneous client connections
#http://httpd.apache.org/docs/2.2/mod/mpm_common.html#maxclients
MaxClients 10000

#number of worker threads created by each child process
#http://httpd.apache.org/docs/2.2/mod/mpm_common.html#threadsperchild
ThreadsPerChild 200

#maximum number of requests a server process serves
#http://httpd.apache.org/docs/2.2/mod/mpm_common.html#maxrequestsperchild
MaxRequestsPerChild 100000
</IfModule>

Linux tuning

We observed "too many open files" error on the linux machines. The hard limit for the "open files" parameter shown by ulimit -aH was set too low. We increased the ulimit on all the linux machines to the max (65536). This number also determines the number of socket that can be opened on a linux machine.

java.net.SocketException: Too many open files in system
at java.net.Socket.createImpl(Socket.java:388)
at java.net.Socket.(Socket.java:362)
at java.net.Socket.(Socket.java:240)
at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at org.apache.jmeter.protocol.http.sampler.HTTPSampler2.sample(HTTPSampler2.java:838)
at org.apache.jmeter.protocol.http.sampler.HTTPSamplerBase.sample(HTTPSamplerBase.java:1021)
at org.apache.jmeter.protocol.http.sampler.HTTPSamplerBase.sample(HTTPSamplerBase.java:1007)
at org.apache.jmeter.threads.JMeterThread.run(JMeterThread.java:290)
at java.lang.Thread.run(Thread.java:619)

Monitoring Lassen Performance Runs

Here is a brief descriptions of the logs.

1. server.<server_name>.log [ Dir:LassenPerfFramework/logs ]
This contains the startup logs for Terracotta Servers. If there are problems in the starting Terracotta Servers, check these logs first.

2. TCserver_GC.<server_name>.log [ Dir:LassenPerfFramework/logs ]
This contains the verbose GC output for the Terracotta Servers. If you want to work with different JVM GC settings, these logs contains the GC information.

3. log*.jtl [ Dir:LassenPerfFramework/logs ]
This contains the output from the JMeter test run. Each HTTP Response Logging from the Cluster goes into this. This does NOT contains the response body for the successful responses (HTTP response Code:302 or 200). For errors/exceptions (HTTP response Code:500 ,502, 503, 404, etc), JMeter prints out the Response from the server. It helps in diagnosing the problems.

Each line contains the response time for that sample and response code from the server.
<httpSample t="5" lt="5" ts="1184177284608" s="true" lb="/examinator/" rc="200" rm="OK" tn="Thread Group 1-1" dt="text"/>
t="5" : 5 ms is the response time for this sample
lb="/examinator" : The URL hit by the HTTP request.
rc="200" : HTTP Response code from the server
tn="Thread Group 1-1" : Thread Group.

4. GC*.log [ Dir:LassenPerfFramework/logs ]
This contains the output from JMeter that how many threads have started plus the JVM verbose GC output if enabled.

5. responseTime.log [ Dir: ~/perfTests/tomcat/logs ]
This contains the output from the ResponseTimeTrackingValve installed on the tomcat servers. It prints out the average, maximum, minimum server latency time per 1000 requests. This doesnt include the network latency.

6. catalina.out [ Dir: ~/perfTests/tomcat/logs ]
These are the logs from the Tomcat servers. Any tomcat related errors will be logged in these logs.