February 2024
https://github.com/xonixx/gc_less - in this repository I did a series of experiments in GC-less (heap-less) Java using sun.misc.Unsafe and also its newest alternative, java.lang.foreign.MemorySegment.
GC-less (heap-less) means we allocate data structures directly in native memory (outside the JVM heap). Such structures are not visible to the GC, but this means we are responsible for (explicitly) de-allocating them, i.e., manual memory management.
For fun and out of curiosity.
Basically it started when I wanted to check whether the famous sun.misc.Unsafe approach is still viable with the latest Java.
Also, I wanted to try to program in Java as if I were programming in C and see how it goes.
It was a surprise to find that this class is still supported by the latest (at the time of this writing) Java version, 23.
The sun.misc.Unsafe class is unsafe indeed. You can easily crash your JVM with just a couple of lines of code:
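For illustration, a minimal sketch (my own, not from the repo) showing how the Unsafe singleton is typically obtained via reflection; the fatal dereference is left commented out, since uncommenting it normally kills the process with a SIGSEGV:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class CrashDemo {
    public static void main(String[] args) throws Exception {
        // the singleton is kept in the private static field "theUnsafe"
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);
        System.out.println("got Unsafe: " + (unsafe != null));
        // unsafe.getLong(0); // dereferencing address 0 crashes the JVM
    }
}
```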
Yet it still appears to be pretty widely used.
Anyway, of interest to us is the family of methods in this class related to off-heap memory management.
So, as a practical application of the GC-less (heap-less) style of programming, I decided to implement some basic data structures:
I was interested not only in how easy it is, or whether it's feasible at all; I also wondered how the performance would compare to plain Java data structures.
Instead of writing separate implementations for each Java type (int, long, double, etc.) I decided to generate them from templates.
This way, from a single template (for example, TemplateArrayList.java) the set of specialized implementations is generated:
It's interesting to note that the template itself is runnable code. In fact, implementation-wise it corresponds to the long-specialized version.
This is convenient, because by testing the template we make sure all the specialized implementations are correct too.
The generation is implemented in the form of a script, gen.awk (Why AWK?).
We use the annotation @Type and the class Tpl to denote patterns to be replaced by the generator:
To make sure that we indeed do not (accidentally) consume heap, I enabled the Epsilon GC setting.
Epsilon GC is “a GC that handles memory allocation but does not implement any actual memory reclamation mechanism. Once the available Java heap is exhausted, the JVM will shut down.”
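Assuming a standard JDK, Epsilon is enabled with flags like the following (it is experimental, so it has to be unlocked first; the small -Xmx makes accidental heap allocation fail fast):

```shell
java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xmx20m Main
```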
I implemented the data structures in a slightly "strange" OOP-resembling way.
So, for example, take a look at IntArrayList.
All the methods of this class are static, because we don't want to allocate objects on the heap.
Also, all the methods (except for allocate, which plays the role of a constructor) accept long address as the first parameter. It plays the role of this (in a sense, it is somewhat analogous to Python's self parameter).
So the usage looks like:
long address = IntArray.allocate(cleaner,10);
IntArray.set(address, 7, 222);
System.out.println(IntArray.get(address, 7));
So this long address plays the role of a reference. Obviously, with this approach you can't have inheritance / overriding, but we don't need it.
Cleaner and Ref classes
The idea of the Cleaner class is that you pass an instance of it to the allocation procedure of a (GC-less) class, so that the Cleaner becomes responsible for freeing the memory of the allocated instance. This plays best with try-with-resources:
try (Cleaner cleaner = new Cleaner()) {
long array = IntArray.allocate(cleaner,10);
IntArray.set(array,1,1);
long map = IntHashtable.allocate(cleaner,10,.75f);
map = IntHashtable.put(map,1,1);
/*
Why do we re-assign the map?
It's because when the data structure grows,
it can eventually overgrow its initially-allocated memory region.
The algorithm will re-allocate the bigger internal storage which will cause
the address change.
*/
map = IntHashtable.put(map,2,22);
System.out.println(IntArray.getLength(array));
System.out.println(IntHashtable.getSize(map));
}
// both array and map above are de-allocated, memory reclaimed
The Ref class is needed when we want to register some object for cleanup but can't simply register its address, because the object can be relocated when its memory is reallocated on growth. So the GC-less object registers with the Cleaner via a Ref, and the Ref pointer gets updated on each object relocation.
I had an idea to implement memory-leak detection. This turned out to be relatively easy to achieve. The idea: on each memory allocation we remember the place (we instantiate a new Exception() to capture a stack trace); on each corresponding free() we discard it.
Thus, at the end we check what's left.
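The scheme boils down to something like this (a simplified sketch, not the repo's code; addresses are faked here, since the point is only the bookkeeping):

```java
import java.util.HashMap;
import java.util.Map;

public class LeakDetector {
    // maps each live allocation to an Exception capturing its stack trace
    private static final Map<Long, Exception> ALLOCATIONS = new HashMap<>();

    static long allocate(long fakeAddress) {
        ALLOCATIONS.put(fakeAddress, new Exception("allocated here"));
        return fakeAddress;
    }
    static void free(long address) {
        ALLOCATIONS.remove(address);
    }
    static int leakCount() { return ALLOCATIONS.size(); }

    public static void main(String[] args) {
        long a = allocate(100);
        allocate(200); // never freed: a leak
        free(a);
        System.out.println("leaks: " + leakCount()); // prints "leaks: 1"
        // each leftover Exception's stack trace points at the leaking call site
        ALLOCATIONS.values().forEach(Throwable::printStackTrace);
    }
}
```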
I implemented a base test class such that a test that extends it automatically ensures the absence of memory leaks.
So we can say that one of the benefits of manual memory management off-heap is deterministic memory usage.
The class Main4 provides a visual demonstration of allocation and de-allocation in a loop as seen via memory graph of Task Manager:
It appears that very recently this JEP emerged: "Deprecate Memory-Access Methods in sun.misc.Unsafe for Removal".
This means that although using the memory-access methods of sun.misc.Unsafe is fun and sometimes useful, we can no longer rely on this functionality.
The JEP happens to provide the safer alternatives. I decided to take a deeper look at the updated API and understand how it compares to the now-deprecated sun.misc.Unsafe.
As an alternative we now have the java.lang.foreign.MemorySegment class. It's worth mentioning that this class is part of a bigger API dedicated to "invoking foreign functions (i.e., code outside the JVM)" that enables "Java programs to call native libraries and process native data without the brittleness and danger of JNI".
Overall I found that the new API provides a "managed" layer over native memory. That is, where previously we accessed memory via a long representing the raw memory address, we now have a java.lang.foreign.MemorySegment object (which represents both a pointer and a memory region).
This is, obviously, good. Using the dedicated get/set methods of that class gives much stronger safety guarantees: for example, built-in out-of-bounds checks, control of properly aligned access, etc.
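A minimal sketch of this API (finalized in JDK 22; earlier versions need --enable-preview), mirroring the earlier IntArray usage:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class SegmentDemo {
    public static void main(String[] args) {
        // a confined arena frees all its segments deterministically on close()
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(ValueLayout.JAVA_INT, 10); // 10 ints
            seg.setAtIndex(ValueLayout.JAVA_INT, 7, 222);
            System.out.println(seg.getAtIndex(ValueLayout.JAVA_INT, 7)); // prints 222
            try {
                seg.getAtIndex(ValueLayout.JAVA_INT, 10); // one past the end
            } catch (IndexOutOfBoundsException e) {
                System.out.println("out-of-bounds access rejected");
            }
        } // native memory reclaimed here
    }
}
```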
It may seem that this API is much heavier: for every off-heap allocation we need an on-heap handle in the form of a MemorySegment instance. So we might think this renders algorithms with lots of small allocations non-viable.
Surprisingly, this is not the case!
My sun.misc.Unsafe-based hashtable implementation is an example of an algorithm that uses many allocations. It's a well-known hashtable algorithm (analogous to Java's standard HashMap), with an array of buckets filled based on the hash code and linked lists of nodes for the elements.
For the sake of the experiment I converted it to MemorySegment. I was expecting that, because of the many MemorySegment objects allocated, its JVM heap consumption would be proportional to the hashtable size.
Instead, experiment shows that the heap memory consumption is rather O(1).
It appears that:

- MemorySegment is a value-based class, as stated by its Javadoc. Value-based classes represent the first move towards the inline classes that will be introduced by Project Valhalla.
- MemorySegment is implemented with extensive usage of the @jdk.internal.vm.annotation.ForceInline annotation:

My take on this is that during JIT compilation all allocations of MemorySegments are completely removed by inlining.
While doing these experiments I was lucky enough to come across this article: Internals of sets and dicts. It describes the implementation details of sets and maps in the Python programming language.
It appears that the dict (the name for a hashtable in Python) algorithm in Python doesn't use many allocations. Instead, it allocates one piece of memory and lays out the keys, values and hashes inside it. When the collection grows, it re-allocates this contiguous piece of memory and re-distributes the data. This is exactly what we need!
So I came up with this MemorySegment-based, Python-inspired off-heap hashtable implementation.
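To illustrate the single-allocation layout, here is a toy sketch under my own assumptions (fixed capacity, no growth, table never fills up), not the repo's implementation: an open-addressing int-to-int map where every slot stores [used, key, value] as three ints inside one MemorySegment. Growing would re-allocate that single segment and re-insert the entries.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class FlatIntMap {
    static final int SLOT_INTS = 3; // used flag, key, value

    // linear probing; relies on arena memory being zero-initialized
    static int slot(MemorySegment seg, int capacity, int key) {
        int i = Math.floorMod(Integer.hashCode(key), capacity);
        while (seg.getAtIndex(ValueLayout.JAVA_INT, (long) i * SLOT_INTS) != 0
                && seg.getAtIndex(ValueLayout.JAVA_INT, (long) i * SLOT_INTS + 1) != key) {
            i = (i + 1) % capacity;
        }
        return i;
    }

    static void put(MemorySegment seg, int capacity, int key, int value) {
        long base = (long) slot(seg, capacity, key) * SLOT_INTS;
        seg.setAtIndex(ValueLayout.JAVA_INT, base, 1);
        seg.setAtIndex(ValueLayout.JAVA_INT, base + 1, key);
        seg.setAtIndex(ValueLayout.JAVA_INT, base + 2, value);
    }

    static int get(MemorySegment seg, int capacity, int key) {
        long base = (long) slot(seg, capacity, key) * SLOT_INTS;
        return seg.getAtIndex(ValueLayout.JAVA_INT, base + 2);
    }

    public static void main(String[] args) {
        int capacity = 16;
        try (Arena arena = Arena.ofConfined()) {
            // the whole table is ONE native allocation
            MemorySegment seg = arena.allocate(ValueLayout.JAVA_INT, (long) capacity * SLOT_INTS);
            put(seg, capacity, 1, 11);
            put(seg, capacity, 17, 22); // collides with key 1 at this capacity
            System.out.println(get(seg, capacity, 1));  // prints 11
            System.out.println(get(seg, capacity, 17)); // prints 22
        }
    }
}
```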
With Java it’s inevitable, that even if you implement an off-heap algorithm, some heap will be used. So we can only consider an off-heap algorithm practical if it consumes small/fixed amount of heap. Otherwise, if its additional consumption of heap is proportional to the problem size, off-heap doesn’t make much sense.
I’ve created this small experiment - a program that allocates a hashtable and fills it with 50 million of elements for each of 3 implementation:
Unsafe
-based hashtableMemorySegment
-based hashtableThe experiment runs with Epsilon-GC with heap size fixed at 20 MB.
We can confirm that all three implementation pass the test with this heap limit.
[0.002s][info][gc] Using Epsilon
[0.003s][warning][gc,init] Consider enabling -XX:+AlwaysPreTouch to avoid memory commit hiccups
[0.011s][info ][gc ] Heap: 20480K reserved, 20480K (100.00%) committed, 1060K (5.18%) used
===== Unsafe-based hashtable =====
Unsafe-based: 0
Unsafe-based: 1000000
Unsafe-based: 2000000
Unsafe-based: 3000000
...
Unsafe-based: 47000000
Unsafe-based: 48000000
Unsafe-based: 49000000
Freeing...
Freeing local addr 140590889037840...
Freeing local ref 140596292148752...
Freeing locals 140596292147328...
===== MemorySegment-based hashtable =====
[16.869s][info ][gc ] Heap: 20480K reserved, 20480K (100.00%) committed, 2106K (10.28%) used
[16.941s][info ][gc ] Heap: 20480K reserved, 20480K (100.00%) committed, 3159K (15.43%) used
[16.980s][info ][gc ] Heap: 20480K reserved, 20480K (100.00%) committed, 4262K (20.82%) used
[17.019s][info ][gc ] Heap: 20480K reserved, 20480K (100.00%) committed, 5438K (26.55%) used
MemorySegment-based: 0
[17.084s][info ][gc ] Heap: 20480K reserved, 20480K (100.00%) committed, 6665K (32.55%) used
[17.129s][info ][gc ] Heap: 20480K reserved, 20480K (100.00%) committed, 7894K (38.55%) used
[17.165s][info ][gc ] Heap: 20480K reserved, 20480K (100.00%) committed, 9123K (44.55%) used
[17.186s][info ][gc ] Heap: 20480K reserved, 20480K (100.00%) committed, 10352K (50.55%) used
[17.203s][info ][gc ] Heap: 20480K reserved, 20480K (100.00%) committed, 11581K (56.55%) used
MemorySegment-based: 1000000
MemorySegment-based: 2000000
MemorySegment-based: 3000000
...
MemorySegment-based: 47000000
MemorySegment-based: 48000000
MemorySegment-based: 49000000
===== Python-based hashtable =====
Python-based: 0
Python-based: 1000000
Python-based: 2000000
Python-based: 3000000
...
Python-based: 47000000
Python-based: 48000000
Python-based: 49000000
[33.169s][info ][gc ] Heap: 20480K reserved, 20480K (100.00%) committed, 11993K (58.56%) used
Finally, I would like to present a couple of benchmarks I've created.
AccessSpeedTest.java measures read/write speed (lower = better):

| | int[] | Unsafe | MemorySegment |
|---|---|---|---|
| Read test, ms | 1885 | 1883 | 1879 |
| Write test, ms | 2177 | 2213 | 2214 |
Surprisingly, we can hardly see any noticeable difference in speed here, even though int[] and MemorySegment have bounds checking, so I expected them to be slower than Unsafe.
Another speed test is IntHashtableSpeedTest (lower = better):

| Implementation | Duration, ms |
|---|---|
| Java's HashMap | 40 |
| Unsafe-based | 130 |
| MemorySegment-based | 141 |
| Python-like | 5 |
| Python-like (MemorySegment) | 19 |
What can we see here?

- HashMap is 3+ times faster than the Unsafe-based implementation with a similar algorithm.
- For a map<int,int> it's possible to come up with an implementation (Python-like) that is 8 times faster than the built-in implementation (HashMap).
- We can also see some slowdown of java.lang.foreign.MemorySegment over sun.misc.Unsafe.