Apache Tomcat Internals
Apache Tomcat is an open source implementation of Java Servlet, JavaServer Pages, Java Expression Language, and WebSocket technology, providing a "pure Java" HTTP Web server environment in which Java code can run [1].
At the core of Tomcat's role as a web server is a component called a [Servlet Container (also called Engine)](https://en.wikipedia.org/wiki/Web_container#List_of_Servlet_containers). Through simple configuration, users can let Tomcat handle common concerns such as socket processing and routing HTTP requests to the configured servlets, and focus on the business logic implemented in those servlets.
In this article, we will walk through Tomcat's internal structure component by component and look at the related code.
- What is a servlet container?
- Tomcat architecture
What is a servlet container? (Based on Java Servlet Specification 4.0)
A servlet container is an object that executes requests received from clients and returns responses based on those requests [3]. It is part of a web server or application server that provides the network services over which requests and responses are sent, decodes MIME-based requests, and formats MIME-based responses [2]. A servlet container also contains servlets and manages them through their lifecycle.
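The lifecycle management mentioned above can be illustrated with a minimal sketch. MiniServlet, MiniContainer, and EchoServlet are hypothetical, simplified types invented for this example (they are not the specification's Servlet interface, which declares init, service, and destroy with richer signatures); the sketch only shows the order in which a container might drive those calls.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LifecycleDemo {
    public static void main(String[] args) {
        MiniContainer container = new MiniContainer();
        EchoServlet servlet = new EchoServlet();
        container.register("/echo", servlet);
        System.out.println(container.handle("/echo", "hello"));
        container.shutdown();
        System.out.println(servlet.events);
    }
}

// Hypothetical, simplified stand-in for the specification's Servlet interface.
interface MiniServlet {
    void init();
    String service(String request);
    void destroy();
}

// Hypothetical container: holds servlets and drives their lifecycle.
final class MiniContainer {
    private final Map<String, MiniServlet> servlets = new LinkedHashMap<>();
    private final List<MiniServlet> initialized = new ArrayList<>();

    void register(String path, MiniServlet servlet) {
        servlets.put(path, servlet);
    }

    // Dispatches a request, initializing the target servlet on first use.
    String handle(String path, String request) {
        MiniServlet servlet = servlets.get(path);
        if (servlet == null) {
            return "404 Not Found";
        }
        if (!initialized.contains(servlet)) {
            servlet.init();
            initialized.add(servlet);
        }
        return servlet.service(request);
    }

    // Destroys every servlet that was initialized, then forgets them.
    void shutdown() {
        for (MiniServlet servlet : initialized) {
            servlet.destroy();
        }
        initialized.clear();
    }
}

// Example servlet that records the lifecycle calls it receives.
final class EchoServlet implements MiniServlet {
    final List<String> events = new ArrayList<>();

    @Override
    public void init() {
        events.add("init");
    }

    @Override
    public String service(String request) {
        events.add("service");
        return "echo: " + request;
    }

    @Override
    public void destroy() {
        events.add("destroy");
    }
}
```

In this sketch the servlet is initialized lazily on its first request; a real container may also initialize servlets at startup, depending on configuration.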
A servlet container can be built into a host web server or installed as an add-on component through the web server's native extension API. It can also be built into, or installed in, a web-enabled application server.
All servlet containers must support HTTP as the protocol for requests and responses. However, additional request/response-based protocols such as HTTPS can also be supported. The HTTP specification versions that containers must implement are HTTP/1.1 and HTTP/2. When supporting HTTP/2, the servlet container must support 'h2' and 'h2c' protocol identifiers.
Because the container may have a caching mechanism as described in [RFC 7234](https://tools.ietf.org/html/rfc7234), it may modify requests from clients before delivering them to the servlet, modify responses produced by servlets before sending them to clients, or respond to requests without delivering them to the servlet, in compliance with RFC 7234.
A servlet container may also impose security restrictions on the environment in which a servlet runs. For example, some application servers limit the creation of Thread objects so that other container components are not negatively affected.
Now, let's take a look at the structure and see how the above servlet container specification is implemented in Tomcat.
Tomcat Architecture
Tomcat has a hierarchical structure with nested components as shown below. Some of these components are called top-level components because they have a tight relationship with each other and exist at the top of the component hierarchy. A container (a different term from the servlet container above) is a component that contains other components. Components that exist within containers and cannot themselves contain other components are called nested components.
The image above is the complete topology of one server, but certain objects may be omitted without affecting performance. For example, if an external web server (such as Apache) resolves requests to the web application, the engine and host are unnecessary.
In the image above, the following components can exist as multiple instances at runtime: Logger, Valve, Host, and Context. Connectors are drawn separately to highlight their distinct role [4].
The Server
The server is Tomcat itself: an instance of the web application server and the top-level component. The server owns one port that is used to shut it down, and it can be set to debug mode, which starts the JVM with debugging enabled.
You can also set up separate servers on a single machine so that applications are kept apart and can be restarted individually. Additionally, because each server runs on its own JVM, an error in one server does not affect the others, which keep running in isolation.
The Service
A service is a top-level component that connects a container (usually an engine) with the container's connectors. Each service is given a name so that administrators can easily recognize log messages generated from each service.
The Connectors
Connectors connect applications with clients. A connector defines the point at which client requests are received and is assigned a port on the server. The default port for non-secure HTTP applications is 8080, which avoids conflicts with web servers already listening on port 80, but this can easily be changed through configuration. Multiple connectors can be configured on an engine or engine-level component, but each must have a unique port number.
The default connector is Coyote.
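The relationship between a service and its connectors, including the unique-port rule, can be sketched as follows. MiniService and MiniConnector are hypothetical classes written for this illustration and are not Tomcat's Service or Connector; the service name and the port numbers are only example values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ServiceDemo {
    public static void main(String[] args) {
        MiniService service = new MiniService("Catalina");
        service.addConnector(new MiniConnector("HTTP/1.1", 8080));
        service.addConnector(new MiniConnector("AJP/1.3", 8009));
        System.out.println(service.ports());
        try {
            // A second connector on the same port is rejected.
            service.addConnector(new MiniConnector("HTTP/1.1", 8080));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}

// Hypothetical connector: a protocol name bound to a port.
final class MiniConnector {
    final String protocol;
    final int port;

    MiniConnector(String protocol, int port) {
        this.protocol = protocol;
        this.port = port;
    }
}

// Hypothetical service: a named group of connectors that feed one container.
final class MiniService {
    private final String name;
    private final Map<Integer, MiniConnector> connectorsByPort = new LinkedHashMap<>();

    MiniService(String name) {
        this.name = name;
    }

    // Registers a connector, enforcing one connector per port.
    void addConnector(MiniConnector connector) {
        if (connectorsByPort.containsKey(connector.port)) {
            throw new IllegalArgumentException(
                    "port " + connector.port + " is already used in service " + name);
        }
        connectorsByPort.put(connector.port, connector);
    }

    // Returns the configured ports in registration order.
    List<Integer> ports() {
        return new ArrayList<>(connectorsByPort.keySet());
    }
}
```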
As shown below, the Connector constructor obtains its protocol implementation through the static create method of ProtocolHandler. By setting up the appropriate ProtocolHandler, the connector links client requests to the container.
public interface ProtocolHandler {
    // ...
    public static ProtocolHandler create(String protocol)
            throws /* ... */ {
        if (protocol == null || "HTTP/1.1".equals(protocol)
                || org.apache.coyote.http11.Http11NioProtocol.class.getName().equals(protocol)) {
            // Default: the NIO-based HTTP/1.1 handler
            return new org.apache.coyote.http11.Http11NioProtocol();
        } else if ("AJP/1.3".equals(protocol)
                || org.apache.coyote.ajp.AjpNioProtocol.class.getName().equals(protocol)) {
            return new org.apache.coyote.ajp.AjpNioProtocol();
        } else {
            // Instantiate protocol handler
            Class<?> clazz = Class.forName(protocol);
            return (ProtocolHandler) clazz.getConstructor().newInstance();
        }
    }
}
public class Connector extends LifecycleMBeanBase {
    // ...
    public Connector(String protocol) {
        configuredProtocol = protocol;
        ProtocolHandler p = null;
        try {
            p = ProtocolHandler.create(protocol);
        }
        // ...
    }
    // ...
}
Engine
An engine is a top-level container and cannot be contained by other containers (it cannot have a parent container). From this level, objects start to have child components.
The container does not necessarily have to be an engine, but just needs to satisfy the specifications of the servlet container presented above. However, since the container at this level is usually an engine, I will describe it assuming that it is an engine.
The engine is a single container that represents the entire Catalina servlet engine. It checks HTTP headers to determine which virtual host or context a particular request should be passed to.
If Tomcat runs without any specific configuration changes, it uses the default engine, which performs the checks mentioned above. If Tomcat is configured to provide Java Servlet support for a web server, the default class used to serve requests is overridden, because the web server has usually already determined where the request should go.
In Catalina, the standard implementation of the top-level container engine is the StandardEngine class, which has the following inheritance relationship [3]:
// Lifecycle.java
public interface Lifecycle {
    // ...
}
// Container.java
public interface Container extends Lifecycle {
    // ...
}
// Engine.java
public interface Engine extends Container {
    // ...
}
// StandardEngine.java
public class StandardEngine extends ContainerBase implements Engine {
    // ...
}
The Realm
The realm of an engine is responsible for user authentication and authorization. During application setup, administrators define which roles are allowed to access which resources, and realms are used to enforce these policies. Realms can authenticate against text files, database tables, LDAP servers, and so on.
A realm applies to the entire engine or top-level container, so the applications within a container share the same authentication resources.
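The split between authentication and role-based authorization can be sketched with a small in-memory realm. SimpleRealm is a hypothetical class written for this illustration and does not reproduce Tomcat's Realm interface; the user names, passwords, and roles are made up, and plain-text password comparison is used only to keep the example short.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class RealmDemo {
    public static void main(String[] args) {
        SimpleRealm realm = new SimpleRealm();
        realm.addUser("alice", "secret", "admin", "user");
        realm.addUser("bob", "hunter2", "user");

        // Every application in the container would consult the same realm.
        System.out.println(realm.isAllowed("alice", "secret", "admin"));
        System.out.println(realm.isAllowed("bob", "hunter2", "admin"));
        System.out.println(realm.isAllowed("bob", "wrong", "user"));
    }
}

// Hypothetical in-memory realm: stores users, passwords, and roles.
final class SimpleRealm {
    private static final class User {
        final String password;
        final Set<String> roles;

        User(String password, Set<String> roles) {
            this.password = password;
            this.roles = roles;
        }
    }

    private final Map<String, User> users = new HashMap<>();

    void addUser(String name, String password, String... roles) {
        users.put(name, new User(password, new HashSet<>(Arrays.asList(roles))));
    }

    // Authentication: does this user exist with this password?
    boolean authenticate(String name, String password) {
        User user = users.get(name);
        return user != null && user.password.equals(password);
    }

    // Authorization: does this user hold the given role?
    boolean hasRole(String name, String role) {
        User user = users.get(name);
        return user != null && user.roles.contains(role);
    }

    // Access is granted only to an authenticated user holding the required role.
    boolean isAllowed(String name, String password, String requiredRole) {
        return authenticate(name, password) && hasRole(name, requiredRole);
    }
}
```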
The Valves
A valve is a request processing component that is associated with one specific container. Valves are similar to the filter mechanism in the servlet specification, but are unique to Tomcat. Engines, hosts, contexts, and wrappers each have a standard valve: StandardEngineValve, StandardHostValve, StandardContextValve, and StandardWrapperValve.
The Valve interface has a simple basic structure, as shown below (comments removed). The logic of each valve is implemented in its invoke method:
public interface Valve {
    public Valve getNext();
    public void setNext(Valve valve);
    public void backgroundProcess();
    public void invoke(Request request, Response response)
            throws IOException, ServletException;
    public boolean isAsyncSupported();
}
As an example, the following StandardEngineValve retrieves the Host object from the request. If no host is found, it sends an error response and returns; otherwise, it takes the first valve in the host's pipeline and calls invoke(request, response) on it:
final class StandardEngineValve extends ValveBase {
    @Override
    public final void invoke(Request request, Response response)
            throws IOException, ServletException {
        // Select the Host to be used for this Request
        Host host = request.getHost();
        if (host == null) {
            // HTTP 0.9 or HTTP 1.0 request without a host when no default host
            // is defined.
            // Don't overwrite an existing error
            if (!response.isError()) {
                response.sendError(404);
            }
            return;
        }
        if (request.isAsyncSupported()) {
            request.setAsyncSupported(host.getPipeline().isAsyncSupported());
        }
        // Ask this Host to process this request
        host.getPipeline().getFirst().invoke(request, response);
    }
}
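The way valves pass a request along a chain can be shown with a self-contained sketch. MiniValve, TraceValve, and MiniPipeline are hypothetical, reduced types modeled loosely on the getNext/setNext/invoke shape above; they are not Tomcat's classes, and the request is just a string here.

```java
import java.util.ArrayList;
import java.util.List;

final class ValveDemo {
    public static void main(String[] args) {
        MiniPipeline pipeline = new MiniPipeline();
        pipeline.addValve(new TraceValve("engine"));
        pipeline.addValve(new TraceValve("host"));
        pipeline.addValve(new TraceValve("context"));

        List<String> trace = new ArrayList<>();
        pipeline.getFirst().invoke("GET /", trace);
        System.out.println(trace);
    }
}

// Hypothetical, reduced valve contract.
interface MiniValve {
    MiniValve getNext();
    void setNext(MiniValve next);
    void invoke(String request, List<String> trace);
}

// Valve that records when the request enters and leaves it.
final class TraceValve implements MiniValve {
    private final String name;
    private MiniValve next;

    TraceValve(String name) {
        this.name = name;
    }

    @Override
    public MiniValve getNext() {
        return next;
    }

    @Override
    public void setNext(MiniValve next) {
        this.next = next;
    }

    @Override
    public void invoke(String request, List<String> trace) {
        trace.add(name + " in");
        if (next != null) {
            // Hand the request to the next valve in the chain.
            next.invoke(request, trace);
        }
        trace.add(name + " out");
    }
}

// Hypothetical pipeline: keeps valves linked in insertion order.
final class MiniPipeline {
    private MiniValve first;
    private MiniValve last;

    void addValve(MiniValve valve) {
        if (first == null) {
            first = valve;
        } else {
            last.setNext(valve);
        }
        last = valve;
    }

    MiniValve getFirst() {
        return first;
    }
}
```

Each valve decides whether to call the next one, which is what lets a valve such as the engine valve above stop processing early by returning without delegating.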
The Loggers
Loggers report the internal state of a component. Loggers can be set up on components from the top-level container downward. Logging properties are inherited, so a logger set at the engine level also applies to the engine's children unless a child overrides it.
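The inheritance rule for logging properties can be sketched as a lookup that walks up the component hierarchy. LoggingNode is a hypothetical class for this illustration, not a Tomcat type; the level names and the "INFO" fallback are assumptions made for the example.

```java
final class LoggingDemo {
    public static void main(String[] args) {
        LoggingNode engine = new LoggingNode("engine", null);
        LoggingNode host = new LoggingNode("host", engine);
        LoggingNode context = new LoggingNode("context", host);

        engine.setLogLevel("DEBUG");  // set once at the top
        context.setLogLevel("WARN");  // overridden by one child

        for (LoggingNode node : new LoggingNode[] {engine, host, context}) {
            System.out.println(node.name() + " -> " + node.effectiveLogLevel());
        }
    }
}

// Hypothetical component node with an optional logging setting of its own.
final class LoggingNode {
    // Assumed fallback when no component in the chain sets a level.
    static final String DEFAULT_LEVEL = "INFO";

    private final String name;
    private final LoggingNode parent;
    private String logLevel; // null means "inherit from parent"

    LoggingNode(String name, LoggingNode parent) {
        this.name = name;
        this.parent = parent;
    }

    String name() {
        return name;
    }

    void setLogLevel(String logLevel) {
        this.logLevel = logLevel;
    }

    // Returns the nearest level set on this node or one of its ancestors.
    String effectiveLogLevel() {
        for (LoggingNode node = this; node != null; node = node.parent) {
            if (node.logLevel != null) {
                return node.logLevel;
            }
        }
        return DEFAULT_LEVEL;
    }
}
```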
The Host
A host is similar to the well-known Apache virtual host. In Apache, virtual hosts allow multiple servers to run on the same machine and be distinguished by IP address or host name. In Tomcat, virtual hosts are identified by their fully qualified host name. Therefore, www.websitea.com and www.websiteb.com can exist on the same server and have their requests routed to different groups of web applications.
Setting up a host includes setting the host name. Most clients send both the server's IP address and the host name they used to reach that address. Since the host name is provided as an HTTP header, the engine can check the header to determine which host the request should be passed to.
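Routing by host name can be sketched as a lookup keyed on the value of the Host header, with a fallback to a default host. VirtualHostRouter is a hypothetical class for this illustration and is not Tomcat's mapper; the application group names are made up, and IPv6 literals in the header are deliberately not handled.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class HostDemo {
    public static void main(String[] args) {
        VirtualHostRouter router = new VirtualHostRouter("www.websitea.com");
        router.addHost("www.websitea.com", "apps-a");
        router.addHost("www.websiteb.com", "apps-b");

        System.out.println(router.resolve("www.websiteb.com:8080"));
        System.out.println(router.resolve("unknown.example"));
    }
}

// Hypothetical router: maps host names to a group of web applications.
final class VirtualHostRouter {
    private final Map<String, String> appGroupsByHost = new HashMap<>();
    private final String defaultHost;

    VirtualHostRouter(String defaultHost) {
        this.defaultHost = normalize(defaultHost);
    }

    void addHost(String hostName, String appGroup) {
        appGroupsByHost.put(normalize(hostName), appGroup);
    }

    // Resolves a Host header value; falls back to the default host when the
    // header is missing or names an unknown host. Returns null if the default
    // host itself was never registered.
    String resolve(String hostHeader) {
        String name = defaultHost;
        if (hostHeader != null && !hostHeader.trim().isEmpty()) {
            name = normalize(stripPort(hostHeader.trim()));
        }
        String appGroup = appGroupsByHost.get(name);
        return appGroup != null ? appGroup : appGroupsByHost.get(defaultHost);
    }

    // Host names are case-insensitive, so compare them in lower case.
    private static String normalize(String hostName) {
        return hostName.toLowerCase(Locale.ROOT);
    }

    // Removes a trailing ":port" (IPv6 literals are out of scope here).
    private static String stripPort(String hostHeader) {
        int colon = hostHeader.lastIndexOf(':');
        return colon >= 0 ? hostHeader.substring(0, colon) : hostHeader;
    }
}
```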
The Context
A context is also called a web application. Setting up a web application includes telling the engine and host where the application's root folder is located. Dynamic reloading can also be enabled so that any changed classes are loaded back into memory. However, this is not recommended for production deployment environments because it uses a lot of resources.
A context can also include specific error pages set up by the system administrator. Finally, a context can define initialization parameters for the application, for access control, and so on.
Reference
[1] Apache Tomcat
[2] Java Servlet Specification 4.0
[3] github.com/apache/tomcat/blob/10.0.0/java/org/apache/catalina/Container.java#L30
[4] Professional Apache Tomcat 5