Java Backend Interview Questions (4 years)

By Kaden Sungbin Cho

Published on: April 7, 2024

I recently, and quite unexpectedly, took part in a back-end technical interview.

Embarrassingly, I could not answer many of the questions properly.

In this article, I collect the questions I remember and answer them to fill in the gaps:

What problems occur when Java static is overused? [1]

Java 'wants' you to think in an object-oriented way. Every class in Java implicitly or explicitly extends Object, so users perceive a program as a set of objects.

When users write a class, they define how its instances will behave. The program cannot use the instance variables or instance methods of the class until an instance is created with the new keyword. At that point the JVM allocates memory for the instance on the heap, and a reference to it is stored in a variable (on the stack, for a local variable), after which its variables and methods become available.

Marking a member as static means that it belongs to the class itself and is not tied to any specific instance. To call an ordinary non-static method, you must first create an instance. Static methods can be called without an instance, and for that reason they cannot directly access non-static methods or fields.
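A minimal sketch of the difference, using a hypothetical VisitCounter class that is not part of the original discussion. It shows one static field shared by all instances next to a per-instance field, and a static method that can only reach the static one:

```java
// Hypothetical class, used only to illustrate static vs. instance members.
class VisitCounter {
    // One copy for the whole class, shared by every instance.
    static int totalVisits = 0;
    // One copy per instance.
    int visits = 0;

    void visit() {
        visits++;       // instance state
        totalVisits++;  // instance methods may also touch static state
    }

    static int total() {
        // There is no 'this' here, so 'visits' cannot be read directly:
        // return visits;  // would not compile
        return totalVisits;
    }
}

class StaticDemo {
    public static void main(String[] args) {
        VisitCounter a = new VisitCounter();
        VisitCounter b = new VisitCounter();
        a.visit();
        a.visit();
        b.visit();
        System.out.println("a.visits = " + a.visits);
        System.out.println("b.visits = " + b.visits);
        // Called on the class, no instance needed.
        System.out.println("total    = " + VisitCounter.total());
    }
}
```

Each instance keeps its own visits count, while total() reflects every call made through any instance.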

Before Java 8, when a user created a static variable or method, it was stored in PermGen (Permanent Generation). PermGen stores non-instance data such as the static members of classes. Since Java 8, PermGen has been replaced by Metaspace, but static variables are still stored on the heap [2] (static variables -> heap, other static data -> Metaspace). The difference is that Metaspace grows automatically, whereas PermGen has a fixed size. Additionally, Metaspace is allocated from native memory and not from the Java heap.

Static variables are initialized only once when the class is first referenced in code and first loaded into the JVM.

class ParentClass {
    static Car car;
    static {
        car = new Car();
    }
}

In the above code, the newly created Car object is stored on the heap, and the static variable car holds a reference to it. As noted above [2], since Java 8 the static variable itself also lives on the heap, while the metadata of ParentClass lives in Metaspace.

As shown above, static members occupy memory for as long as their class stays loaded, so it is recommended to avoid unnecessary use; for static methods, one option is to use Java 8's functional style instead.

How do you prevent memory leaks? [3]

A memory leak is a situation in which an application holds objects it no longer uses, but the garbage collector cannot remove them from memory because they are still referenced. As a result, the application consumes more and more memory and eventually fails with an OutOfMemoryError (OOM).

Java Heap Leaks

A typical form of memory leak is when objects are continuously created without being released. To easily reproduce such a situation, you can utilize the following JVM options:

-Xms<size>
-Xmx<size>

These options set the initial and maximum heap size; setting them to small values makes a leak show up quickly.

Case1 static field with object reference

The first case is when a static field references a large object.

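private Random random = new Random();
// A static field stays reachable for as long as the class is loaded.
public static final ArrayList<Double> list = new ArrayList<Double>(1000000);

@Test
public void givenStaticField() throws InterruptedException {
    for (int i = 0; i < 1000000; i++) {
        list.add(random.nextDouble());
    }

    // Request a collection; the list is still referenced, so it survives.
    System.gc();
    Thread.sleep(10000); // leave time to observe the heap in a profiler
}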

In the above case, GC is requested but memory consumption does not go down, because the static field still references the list.

To prevent this situation, you need to be careful with your use of static. In particular, statically referencing large objects makes it difficult to collect the entire object graph.

Case2 String.intern() on a long String

The second case is related to String.intern().

@Test
public void givenLengthString() throws IOException, InterruptedException {
    Thread.sleep(15000);

    // "\\A" matches the start of input, so next() returns the whole file.
    String str = new Scanner(new File("large.txt"), "UTF-8")
        .useDelimiter("\\A").next();
    str.intern();

    System.gc();
    Thread.sleep(15000); // leave time to observe memory in a profiler
}

The intern API puts the String str into the JVM's string pool, where (on Java 6 and earlier) it cannot be collected. Therefore, GC cannot free the memory.

To prevent this case, keep in mind that on those versions interned Strings are stored in PermGen space.

Alternatively, if your application handles large interned strings, you can increase the PermGen size with -XX:MaxPermSize=<size>.

On Java 8, PermGen has been replaced by Metaspace and interned Strings live on the regular heap (they were moved there in Java 7), so interning no longer causes a PermGen OOM.

Case3 Unclosed Streams or Connections

Strictly speaking, unclosed streams cause two kinds of leaks: low-level resource leaks and memory leaks.

Low-level resource leaks are leaks of OS-level resources such as file descriptors and open connections.

Because the JVM uses memory to track these low-level resources, they lead to memory leaks as well.

These cases can largely be avoided by using a try-with-resources statement.

Unclosed connections cause memory leaks in the same way. This can also be prevented by always closing the connection after use.
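A minimal sketch of try-with-resources. The TrackedResource class and the method names are hypothetical and exist only to make the close() call observable; any AutoCloseable (streams, JDBC connections) behaves the same way:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class TryWithResourcesDemo {
    // Hypothetical resource that records whether close() was called.
    static class TrackedResource implements AutoCloseable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    // The reader is closed automatically when the try block exits.
    static String firstLine(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // close() runs even when the body throws.
    static TrackedResource useAndFail() {
        TrackedResource resource = new TrackedResource();
        try (TrackedResource r = resource) {
            throw new IllegalStateException("simulated failure");
        } catch (IllegalStateException e) {
            // By the time we get here, r.close() has already run.
        }
        return resource;
    }

    public static void main(String[] args) {
        System.out.println(firstLine("first\nsecond"));
        System.out.println("closed after failure: " + useAndFail().closed);
    }
}
```

The point of the construct is that the close call cannot be forgotten on an error path, which is where hand-written finally blocks tend to go wrong.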

Case4 When adding an object without hashCode() and equals() to HashSet

If you repeatedly add logically equal objects whose class does not override hashCode() and equals() to a Set, each one is treated as a distinct element and the Set keeps growing. Also, once added, such an object cannot be found or removed through an equal copy; only the original reference works.

You can avoid forgetting to implement hashCode() and equals() by using annotations such as Lombok's @EqualsAndHashCode.
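The effect can be reproduced with a short sketch. RawKey and ValueKey are hypothetical classes written for this illustration: the first inherits identity-based equals() and hashCode() from Object, the second overrides both by hand (which is roughly what a generated implementation would provide):

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class HashSetLeakDemo {
    // No equals()/hashCode(): two instances are equal only if they are the same object.
    static class RawKey {
        final String id;
        RawKey(String id) { this.id = id; }
    }

    // equals()/hashCode() based on the id field.
    static class ValueKey {
        final String id;
        ValueKey(String id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            return o instanceof ValueKey && Objects.equals(id, ((ValueKey) o).id);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(id);
        }
    }

    // Adds n logically identical RawKey objects and returns the set size.
    static int fillRaw(int n) {
        Set<RawKey> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            set.add(new RawKey("same"));
        }
        return set.size();
    }

    // Adds n logically identical ValueKey objects and returns the set size.
    static int fillValue(int n) {
        Set<ValueKey> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            set.add(new ValueKey("same"));
        }
        return set.size();
    }

    // Tries to remove an element through an equal-looking copy.
    static boolean removeRawByCopy() {
        Set<RawKey> set = new HashSet<>();
        set.add(new RawKey("same"));
        return set.remove(new RawKey("same"));
    }

    public static void main(String[] args) {
        System.out.println("without equals/hashCode: " + fillRaw(1000));   // 1000
        System.out.println("with equals/hashCode:    " + fillValue(1000)); // 1
        System.out.println("remove by copy works:    " + removeRawByCopy()); // false
    }
}
```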

How is Pinpoint implemented internally?

The main structure is as follows:

Since the question was about how well I understood the details, I think the Agent is the part to look at. For that part, I read the document in [7] and matched it against the code in [4].

Basically, Pinpoint has evolved from a single-node APM to providing distributed tracing capabilities [8]. There are two main ways to implement distributed transaction tracing: manual and automatic. Pinpoint is implemented automatically through Bytecode Instrumentation.

The bytecode approach is simple for users, who only need to add a library to use Pinpoint, but it is technically demanding for those who develop that library. It is said that the automatic method was chosen for the following reasons: 1) the initial target users of Pinpoint (Naver developers) are very numerous, so reducing the effort needed to adopt it saves many users' time; 2) because tracing is performed automatically, users do not need to call an API themselves, so backward compatibility does not have to be considered; and 3) it is simple for users to turn it on and off.

As described earlier, bytecode instrumentation works directly on Java bytecode, so it can raise productivity but also raises development risk. The structure of the tracing code is abstracted into interceptors, and Pinpoint injects the code needed to trace distributed transactions into your application code at class loading time. This method is said to improve performance because the tracing code is injected directly into the application code.

Under the profiler module, interceptor and instrument (https://github.com/pinpoint-apm/pinpoint/tree/v2.3.3/profiler/src/main/java/com/navercorp/pinpoint/profiler/instrument) appear to be the major packages. To understand the actual source code in depth, knowledge of bytecode instrumentation appears to be necessary. Judging from functions like addTransformer [9, 10], the Java Instrumentation API seems to be the starting point.
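As a starting point, here is a minimal sketch of the standard java.lang.instrument API. It is not Pinpoint's code: the class names and the "com/example/" prefix are assumptions made for illustration, and the transformer only records class names instead of rewriting bytecode the way a real tracing agent would:

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class ObservingAgentSketch {
    // Hypothetical transformer: it observes classes as they load and never modifies them.
    static class ObservingTransformer implements ClassFileTransformer {
        private final String prefix;
        final List<String> seen = new CopyOnWriteArrayList<>();

        ObservingTransformer(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public byte[] transform(ClassLoader loader,
                                String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            // Class names arrive in internal form, e.g. "com/example/OrderService".
            if (className != null && className.startsWith(prefix)) {
                seen.add(className);
                // A real agent would rewrite classfileBuffer here (typically with a
                // bytecode library) and return the modified bytes.
            }
            return null; // null tells the JVM to keep the original bytecode
        }
    }

    // The JVM calls premain before main() when started with -javaagent:<agent jar>,
    // provided the jar's manifest names this class in Premain-Class.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new ObservingTransformer("com/example/"));
    }
}
```

Because the hook runs at class loading time, the application itself needs no code changes, which matches the "automatic" approach described above.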

How can the load when creating objects in Pinpoint be optimized?

This was a follow-up question prompted by my incorrect answer to the previous one ("I think it should be implemented in the same form as the proxy pattern").

What is the relationship between @Transactional and ThreadLocal?

First, we should look at the implemented internals [11]. The main explanations are below:

This annotation commonly works with thread-bound transactions managed by a PlatformTransactionManager, exposing a transaction to all data access operations within the current execution thread. Note: This does NOT propagate to newly started threads within the method.

Key point: the annotation mainly applies to thread-bound transactions managed by a PlatformTransactionManager. I was able to check the overall structure, but it was not easy to find where ThreadLocal appears in the code.

Coming at it from a different direction, I checked the article [12], which states that Reactor takes over the role of ThreadLocal when moving from the existing imperative transaction management to reactive transaction management. As guessed, the transactional state bound to one thread is stored and managed in ThreadLocal. If you search the Spring Framework code for ThreadLocal, you will find TransactionSynchronizationManager, which looks related and has a detailed explanation [13].

In line with the class comment 'Central delegate that manages resources and transaction synchronizations per thread.', you can see several ThreadLocals declared:

public abstract class TransactionSynchronizationManager {

	private static final ThreadLocal<Map<Object, Object>> resources =
			new NamedThreadLocal<>("Transactional resources");

	private static final ThreadLocal<Set<TransactionSynchronization>> synchronizations =
			new NamedThreadLocal<>("Transaction synchronizations");

	private static final ThreadLocal<String> currentTransactionName =
			new NamedThreadLocal<>("Current transaction name");

	private static final ThreadLocal<Boolean> currentTransactionReadOnly =
			new NamedThreadLocal<>("Current transaction read-only status");

	private static final ThreadLocal<Integer> currentTransactionIsolationLevel =
			new NamedThreadLocal<>("Current transaction isolation level");

	private static final ThreadLocal<Boolean> actualTransactionActive =
			new NamedThreadLocal<>("Actual transaction active");
...
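The note that a transaction "does NOT propagate to newly started threads" follows directly from how ThreadLocal behaves. A minimal sketch in plain Java, without Spring; the currentTx variable is a hypothetical stand-in for the thread-bound state shown above:

```java
class ThreadBoundDemo {
    // Hypothetical stand-in for a thread-bound transaction name.
    private static final ThreadLocal<String> currentTx = new ThreadLocal<>();

    // Returns what the calling thread sees (index 0) and what a new thread sees (index 1).
    static String[] observe() {
        String[] seen = new String[2];
        currentTx.set("tx-1");
        try {
            seen[0] = currentTx.get();
            // A newly started thread has its own, empty slot for the same ThreadLocal.
            Thread child = new Thread(() -> seen[1] = currentTx.get());
            child.start();
            child.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            // Clear the slot so pooled threads do not leak state to the next task.
            currentTx.remove();
        }
        return seen;
    }

    public static void main(String[] args) {
        String[] seen = observe();
        System.out.println("same thread: " + seen[0]); // tx-1
        System.out.println("new thread:  " + seen[1]); // null
    }
}
```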

What is the Java JIT compiler?

[14] seems to be sufficient.

Why does Tomcat create a thread when it receives a request?

Tomcat has a thread pool. For each request, Tomcat takes a thread from the pool; after the thread has sent the response, it returns to the pool and becomes available again.

Tomcat request process - Image from tomcat docs [15]
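The reuse of pooled threads can be illustrated with java.util.concurrent. This is only an analogy under assumed names, not Tomcat's own executor: a fixed pool handles many "requests", and the set of distinct worker threads never exceeds the pool size:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class WorkerPoolSketch {
    // Submits 'requests' tasks to a pool of 'poolSize' threads and returns
    // the names of the threads that actually ran them.
    static Set<String> handleRequests(int poolSize, int requests) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Set<String> workerNames = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < requests; i++) {
            pool.execute(() -> {
                // "Handle" the request: here we only record which thread served it.
                workerNames.add(Thread.currentThread().getName());
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return workerNames;
    }

    public static void main(String[] args) {
        Set<String> names = handleRequests(4, 100);
        System.out.println("100 requests served by " + names.size() + " thread(s)");
    }
}
```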

What is a concurrent hash map? [18]

HashMap is not thread-safe, but Hashtable provides thread-safety by synchronizing operations.

Hashtable is thread safe, but its performance is poor. If you want high-concurrency and high-throughput, ConcurrentMap may be the answer.

ConcurrentMap is an extension of the Map interface and is designed to solve the corresponding throughput problem in thread-safety situations. By overriding several default methods, ConcurrentMap provides implementation guidelines for providing thread-safe and memory-consistent atomic operations.

ConcurrentHashMap (CHM) is a ConcurrentMap implementation.

For performance, CHM consists of an array of table buckets made up of nodes, and mainly uses CAS operations during updates. Table buckets are lazily initialized. Each bucket can be locked independently by locking its first node. Read operations do not block, and update contention is minimized.

The number of buckets required is relative to the number of threads accessing the table, so that there is usually no more than one update in progress per bucket.

Therefore, in addition to initialCapacity and loadFactor, which HashMap's constructor also accepts, the CHM constructor lets you set concurrencyLevel (however, starting from Java 8 these constructor parameters are kept for backward compatibility and only affect the initial size of the map).
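A minimal sketch of the atomic operations mentioned above. The class and method names are hypothetical; the constructor and merge() are part of the standard ConcurrentHashMap API. Several threads increment the same key, and no update is lost:

```java
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentCountDemo {
    // Starts 'threads' workers that each increment the same key 'incrementsPerThread' times.
    static int countHits(int threads, int incrementsPerThread) {
        // initialCapacity = 16, loadFactor = 0.75f, concurrencyLevel = threads
        ConcurrentHashMap<String, Integer> hits = new ConcurrentHashMap<>(16, 0.75f, threads);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    // merge() performs the read-modify-write atomically for this key.
                    hits.merge("page", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return hits.get("page");
    }

    public static void main(String[] args) {
        System.out.println(countHits(8, 10_000)); // 80000
    }
}
```

Doing the same with a plain HashMap and get/put would be a data race, and the final count would not be reliable.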

How does the hashCode function affect the performance of a hash map?

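HashMap uses the key's hashCode() to choose a bucket and equals() to find the key within that bucket. If many keys have hashCode() values that land in the same bucket, the entries in that bucket form a linked list (since Java 8, a large bucket is converted into a balanced tree), so a lookup that is normally O(1) degrades toward O(n).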
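A rough way to see this is to compare a key type with a well-spread hashCode() against one that always returns the same value. The Key class below is hypothetical and written only for this comparison; the map stays correct in both cases, and only the lookup time differs (timings vary by machine, so none are quoted here):

```java
import java.util.HashMap;
import java.util.Map;

class HashCollisionDemo {
    // Hypothetical key: when 'collide' is true, every instance hashes to the same bucket.
    static final class Key {
        final int id;
        final boolean collide;

        Key(int id, boolean collide) {
            this.id = id;
            this.collide = collide;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }

        @Override
        public int hashCode() {
            return collide ? 42 : Integer.hashCode(id);
        }
    }

    static Map<Key, Integer> build(int n, boolean collide) {
        Map<Key, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new Key(i, collide), i);
        }
        return map;
    }

    // Looks up every key once and returns the elapsed time in nanoseconds.
    static long lookupAllNanos(Map<Key, Integer> map, int n, boolean collide) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            Integer value = map.get(new Key(i, collide));
            if (value == null || value != i) {
                throw new IllegalStateException("wrong value for key " + i);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 5_000;
        long spread = lookupAllNanos(build(n, false), n, false);
        long collided = lookupAllNanos(build(n, true), n, true);
        System.out.println("spread hashCode:   " + spread + " ns");
        System.out.println("constant hashCode: " + collided + " ns");
    }
}
```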

Why was Kotlin created?

What are the advantages of Kotlin?

How is webflux different from mvc?

Does Node also use threadpool? [21, 22]

Node.js is designed around a single thread. Node performs non-blocking operations and achieves concurrency through an event-based model.

Modern operating systems provide an API for issuing I/O requests to disk, usually called asynchronous I/O. It lets an application issue an I/O request and get control back immediately, before the I/O completes.

On macOS, the API for this looks as follows:

struct aiocb {
    int             aio_fildes;   /* file descriptor */
    off_t           aio_offset;   /* file offset */
    volatile void   *aio_buf;     /* location of buffer */
    size_t          aio_nbytes;   /* length of transfer */
};

int aio_read(struct aiocb *aiocbp);

int aio_error(const struct aiocb *aiocbp);

One issue that makes event-based concurrency difficult is state management. Thread-based concurrency keeps state on each thread's stack for free, whereas event-based code has to manage it by hand. This is usually done with a continuation: the state needed to finish handling an event is stored in a data structure under a key that identifies the event, and it is looked up and processed when the event completes.
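The continuation idea can be sketched in a few lines of Java. Everything here is hypothetical (the class, issueRead, complete, runLoop) and no real I/O is performed; completions are simulated so that the bookkeeping is visible. Each pending request is registered under an id together with the code that should run when it finishes:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Consumer;

class ContinuationSketch {
    // A finished request: which one it was and what data it produced.
    private static final class Completion {
        final int id;
        final String data;
        Completion(int id, String data) { this.id = id; this.data = data; }
    }

    // Pending request id -> continuation to run when that request completes.
    private final Map<Integer, Consumer<String>> pending = new HashMap<>();
    private final Queue<Completion> completions = new ArrayDeque<>();
    private int nextId = 0;

    // "Issues" a request and returns immediately with its id.
    int issueRead(Consumer<String> continuation) {
        int id = nextId++;
        pending.put(id, continuation);
        return id;
    }

    // Simulates the OS reporting that request 'id' finished with 'data'.
    void complete(int id, String data) {
        completions.add(new Completion(id, data));
    }

    // Single-threaded event loop: look up the saved continuation and run it.
    void runLoop() {
        Completion event;
        while ((event = completions.poll()) != null) {
            Consumer<String> continuation = pending.remove(event.id);
            if (continuation != null) {
                continuation.accept(event.data);
            }
        }
    }

    static List<String> demo() {
        ContinuationSketch loop = new ContinuationSketch();
        List<String> log = new ArrayList<>();
        // The per-request state ("alice", "bob") is captured by each continuation.
        int a = loop.issueRead(data -> log.add("alice got " + data));
        int b = loop.issueRead(data -> log.add("bob got " + data));
        // Completions may arrive in any order.
        loop.complete(b, "file-2");
        loop.complete(a, "file-1");
        loop.runLoop();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [bob got file-2, alice got file-1]
    }
}
```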

References

[1] https://www.linkedin.com/pulse/static-variables-methods-java-where-jvm-stores-them-kotlin-malisciuc/

[2] https://www.linkedin.com/feed/update/urn:li:article:7549517882646272606?commentUrn=urn%3Ali%3Acomment%3A%28article%3A7549517882646272606%2C6666775262614687744%29&replyUrn=urn%3Ali%3Acomment%3A%28article%3A7549517882646272606%2C6799732294023311360%29

[3] https://stackify.com/memory-leaks-java/

[4] https://github.com/pinpoint-apm/pinpoint

[5] http://research.google.com/pubs/pub36356.html

[6] https://pinpoint-apm.gitbook.io/pinpoint/want-a-quick-tour/overview

[7] https://pinpoint-apm.gitbook.io/pinpoint/want-a-quick-tour/techdetail

[8] https://github.com/pinpoint-apm/pinpoint/releases

[9] https://www.cs.helsinki.fi/u/pohjalai/k05/okk/seminar/Aarniala-instrumenting.pdf

[10] https://www.baeldung.com/java-instrumentation

[11] https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/transaction/annotation/Transactional.html