I've put together questions you'll likely run into when interviewing for backend engineering roles, organized by topic. Every company has its own flavor, but the major branches — Java, Spring, databases, and operations — show up almost everywhere. Here are the items worth reviewing before you apply.
No matter how good your hash function is, collisions are inevitable thanks to the pigeonhole principle. How you handle them determines the data structure's performance.
Open Addressing
Resolves collisions by using empty slots within the table itself. It needs no extra memory and is cache-friendly because data sits in contiguous memory.
Linear Probing: Scans the next slot sequentially on collision. Simplest to implement, but suffers from primary clustering — once data starts piling up in one area, lookups slow down dramatically.
Quadratic Probing: Probes slots 1², 2², 3², … steps away on successive attempts. Reduces primary clustering, but keys that hash to the same slot still trace the same probe path — that's secondary clustering.
Double Hashing: Uses one hash for the position and a second hash for the step size. Produces the most uniform distribution but pays the cost of two computations.
Open addressing performance drops sharply once the load factor exceeds about 0.7, so rehashing at the right time is essential.
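To make the probing concrete, here's a minimal open-addressing set using linear probing. This is a sketch only: fixed capacity, no deletion, and no rehashing (a real table would rehash well before filling up, typically around that 0.7 load factor). The class name and API are illustrative, not from any library.

```java
// Minimal open-addressing set with linear probing (sketch: fixed capacity,
// no deletion — real tables rehash long before they fill up).
class LinearProbingSet {
    private final String[] slots;
    private int size;

    LinearProbingSet(int capacity) {
        slots = new String[capacity];
    }

    void add(String key) {
        if (size == slots.length) {
            throw new IllegalStateException("table full; a real table would rehash here");
        }
        int i = Math.floorMod(key.hashCode(), slots.length);
        while (slots[i] != null) {
            if (slots[i].equals(key)) return;  // already present
            i = (i + 1) % slots.length;        // probe the next slot on collision
        }
        slots[i] = key;
        size++;
    }

    boolean contains(String key) {
        int i = Math.floorMod(key.hashCode(), slots.length);
        for (int probes = 0; probes < slots.length && slots[i] != null; probes++) {
            if (slots[i].equals(key)) return true;
            i = (i + 1) % slots.length;        // follow the same probe sequence
        }
        return false;
    }
}
```

Quadratic probing and double hashing differ only in the step the probe loop takes; the structure stays the same.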
Separate Chaining
Hangs data off each bucket using a linked list or tree. Java's HashMap is the canonical example, with one interesting detail.
Since Java 8, a bucket that reaches 8 nodes is converted from a linked list to a red-black tree (treeify), provided the table has at least 64 buckets — below that, the table resizes instead
If a tree shrinks to 6 or fewer nodes, it converts back to a linked list (untreeify)
The point is to bring worst-case lookup from O(n) down to O(log n)
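A stripped-down version of the chaining idea, before any treeification, looks like this. It's a sketch with hypothetical names, keeping only the list-per-bucket mechanism (Java's real HashMap adds resizing, treeify, and null handling on top):

```java
import java.util.LinkedList;

// Separate chaining in miniature: each bucket holds a linked list of entries.
// Java's HashMap additionally treeifies long chains; this sketch keeps lists only.
class ChainedMap<K, V> {
    private static class Node<K, V> {
        final K key;
        V value;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Node<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedMap(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = new LinkedList<>();
    }

    void put(K key, V value) {
        for (Node<K, V> n : bucketFor(key)) {
            if (n.key.equals(key)) { n.value = value; return; }  // overwrite existing key
        }
        bucketFor(key).add(new Node<>(key, value));              // collision: append to chain
    }

    V get(K key) {
        for (Node<K, V> n : bucketFor(key)) {                    // walk the chain: O(chain length)
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }

    private LinkedList<Node<K, V>> bucketFor(K key) {
        return buckets[Math.floorMod(key.hashCode(), buckets.length)];
    }
}
```

The O(n) worst case lives in that chain walk, which is exactly what treeify attacks.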
Interview tip: Expect a follow-up: "Which would you prefer?" There's no single answer — say it depends on data distribution and memory constraints. For example, open addressing wins for small, cache-sensitive datasets, while separate chaining handles frequent collisions and dynamic sizing better.
In the OS section, LRU is almost guaranteed to come up. The same concept applies to caching policies, so it's worth knowing well.
FIFO: Evicts the oldest page first. Simple, but susceptible to Belady's Anomaly — adding more frames can actually increase page faults.
Optimal (OPT): Evicts the page that will go unused the longest. Theoretically optimal but requires knowing the future, so it's only used as a benchmark.
LRU (Least Recently Used): Evicts the page that hasn't been touched for the longest time. Approaches optimal performance by exploiting temporal locality. Typically implemented with a HashMap + Doubly Linked List for O(1) operations.
LFU (Least Frequently Used): Evicts the page with the fewest accesses. Downside: heavily-accessed pages stick around forever, blocking new data (cache pollution).
MFU (Most Frequently Used): The opposite of LFU, evicting the page with the most accesses. The idea is that frequently-used pages have already served their purpose.
Implementing an LRU cache from scratch is a classic LeetCode problem and shows up in interviews. Know both approaches: using LinkedHashMap, and building it by hand with a doubly linked list plus a hash map.
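The LinkedHashMap version is short enough to memorize — the whole trick is the access-order constructor flag plus one overridden hook:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU cache via LinkedHashMap's access-order mode.
class LRUCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LRUCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the tail
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used entry on overflow
    }
}
```

The hand-rolled version replaces LinkedHashMap's internal doubly linked list with your own: a HashMap from key to list node, move-to-head on every access, evict from the tail. Interviewers often ask for that one precisely because LinkedHashMap hides the interesting part.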
The heap splits into young generation and old generation. The design rests on the weak generational hypothesis — most objects die young.
Young generation layout:
Eden: Where new objects land first
Survivor 0, 1: Where surviving objects move from Eden. The two spaces alternate.
Types of GC:
Minor GC: Runs in the young generation. Fast and frequent.
Major GC (Full GC): Includes the old generation. Slow, with long stop-the-world pauses.
GC algorithms:
Serial Collector: Single-threaded. Suited to apps under ~100MB. -XX:+UseSerialGC
Parallel Collector: Multi-threaded, throughput-oriented. Default in JDK 8. -XX:+UseParallelGC
CMS (Concurrent Mark Sweep): Concurrent marking minimizes STW. Deprecated in JDK 9, removed in JDK 14.
G1 (Garbage First): Manages the heap in regions. Default since JDK 9. Best for large heaps (4GB+).
ZGC: Cuts STW under 10ms. Experimental in JDK 11+, official in JDK 15+. Targets massive heaps (TB scale).
Shenandoah: A low-latency GC similar to ZGC, led by Red Hat.
Interview tip: When asked which GC you've used, don't just name it — explain why you chose it. Example: "We used G1 because response time mattered. It reduced full-GC frequency compared to CMS."
```java
// Stream
List<Integer> result = list.stream()
        .filter(x -> x > 10)
        .map(x -> x * 2)
        .collect(Collectors.toList());

// Parallel Stream
list.parallelStream()
        .filter(x -> x > 10)
        .map(x -> x * 2)
        .collect(Collectors.toList());
```
When does parallel stream pay off?
Good fit:
Dataset is large (typically 10,000+ elements)
CPU-bound work
Independent units of work
Result order doesn't matter
Bad fit:
Small datasets (overhead exceeds gains)
I/O-bound work (threads sit idle)
Mutating shared state (synchronization cost)
Inside a transaction (different thread, no transaction propagation)
Parallel streams use a shared common ForkJoinPool by default. One greedy task can starve unrelated work, so production systems should pin streams to a dedicated pool.
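The dedicated-pool pattern looks like this. Note it relies on a long-standing but undocumented behavior: a parallel stream started from inside a ForkJoinPool task runs its work in that pool rather than the common pool. The class name and sample data are illustrative.

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

public class DedicatedPoolDemo {
    public static void main(String[] args) throws Exception {
        List<Integer> input = List.of(5, 12, 20, 3, 42);

        // A pool isolated from ForkJoinPool.commonPool(), sized explicitly.
        ForkJoinPool pool = new ForkJoinPool(4);
        try {
            // Submitting the stream as a task makes its work run in this pool.
            List<Integer> result = pool.submit(() ->
                    input.parallelStream()
                         .filter(x -> x > 10)
                         .map(x -> x * 2)
                         .collect(Collectors.toList())
            ).get();
            System.out.println(result); // [24, 40, 84]
        } finally {
            pool.shutdown();
        }
    }
}
```

A greedy job in this pool can now only starve itself, not every other parallel stream in the JVM.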
A staple interview question. The more reasons you can give, the better.
Catches circular references at startup — Field/setter injection only fails when the method is invoked at runtime; constructor injection fails immediately during container initialization. Spring Boot 2.6+ blocks circular references by default.
Enables immutability — Mark dependencies final so they can't be reassigned.
Easy to test — You can call new UserService(mockRepo) directly without the container. Field injection is untestable without reflection.
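A sketch of what those three points look like in code, using hypothetical UserService/UserRepository names. The class is exactly what Spring would manage, but nothing about it requires the container:

```java
// The dependency as an interface: one abstract method, so a lambda can stand in.
interface UserRepository {
    String findNameById(long id);
}

class UserService {
    private final UserRepository userRepository; // final: set once, never reassigned

    // Spring calls this constructor; a test can call it just as easily.
    UserService(UserRepository userRepository) {
        this.userRepository = userRepository;
    }

    String greet(long id) {
        return "Hello, " + userRepository.findNameById(id);
    }
}
```

In a test there's no container and no reflection: `new UserService(id -> "Alice")` hands in a stub directly. A circular dependency between two such constructors simply cannot be constructed, which is why the container fails fast at startup.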
Built on B+ trees. All leaves sit at the same depth, so the tree stays balanced.
A million rows usually means just 3–4 levels of traversal.
Leaf nodes link to each other, so range scans are also fast.
B+ tree vs B-tree:
B-tree stores data in every node
B+ tree stores data only at the leaves; internal nodes only index
B+ tree is better for range queries and is more disk-I/O friendly
Indexes aren't always free:
Writes get more expensive (every INSERT/UPDATE/DELETE updates the index too)
Extra disk space
Inefficient on low-cardinality columns (e.g., gender)
Full scans can be faster on small tables
The composite index trap:
Given an index on (A, B, C):
WHERE A = ? → uses the index
WHERE A = ? AND B = ? → uses the index
WHERE A = ? AND C = ? → partial use (only the A prefix hits the index; C is filtered afterward)
WHERE B = ? → does not use it (matching starts from the leftmost column)
Covering indexes:
If every column the query needs lives in the index, the database can answer the query without touching the table — that's a covering index, and it's blazingly fast.
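As a sketch with a hypothetical member table (names invented for illustration):

```sql
-- Index contains both columns the query needs
CREATE INDEX idx_member_team_name ON member (team_id, name);

-- Served entirely from the index: no table lookup
SELECT name FROM member WHERE team_id = 42;
```

In MySQL's EXPLAIN output this shows up as "Using index"; Postgres calls it an index-only scan.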
If you use JPA, this one finds you sooner or later.
The problem:
```java
// Fetch members, then access each member's team
List<Member> members = memberRepository.findAll(); // 1 query
for (Member m : members) {
    System.out.println(m.getTeam().getName());     // N queries
}
```
That's N+1 queries total. A hundred members means 101 database calls.
Solutions:
JOIN FETCH:
```java
@Query("SELECT m FROM Member m JOIN FETCH m.team")
List<Member> findAllWithTeam();
```
Batch Size (hibernate.default_batch_fetch_size): Bundles N lazy loads into IN clauses, reducing N+1 queries to N/batch_size + 1
QueryDSL: Flexible for dynamic queries
Projections (DTO direct fetch): Pull only the columns you need
JOIN FETCH doesn't paginate when fetching collections — it loads everything into memory and paginates there, which is dangerous. For paginated results, use the batch-size approach.
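In Spring Boot the batch size is usually set globally in application.yml (the property is Hibernate's; 100 here is just a common example value, not a recommendation):

```yaml
spring:
  jpa:
    properties:
      hibernate:
        default_batch_fetch_size: 100  # lazy loads are grouped into IN (...) queries of up to 100 ids
```

Because batch fetching leaves the original paginated query untouched, it composes safely with Pageable where collection JOIN FETCH does not.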
Interview questions tend to follow a similar shape across topics. If you prepare answers around three threads — "Why do we use it? What are the downsides? What are the alternatives?" — you'll handle whatever any company throws at you.
For technologies you've used in production, write down a one-line answer to "Why did we choose this?" Interviewers don't want the textbook answer; they want to see your thought process.
One last thing: when you don't know an answer, saying "I don't know" is itself a skill. Follow it with "but I'd approach it like this," and the interviewer often comes away more impressed than if you'd guessed. Nobody knows everything.
Lose an hour in the morning, and you will spend all day looking for it.
— Richard Whately