I received the Java Virtual Machine Specification about a week ago. I had been waiting for it for three weeks, so I started reading it immediately. Reading one chapter after another, I realized that I was starting to understand how the JVM works and why it works that way. To understand it better, I am going to write a few articles about it. I will start by looking at the bytecode generated by javac and how the Java runtime interprets it. In this post, I intend to briefly outline the class file format.
Say hello to bytecode
Getting started is always hard, so let’s start from the beginning: the “Hello, world!” example.
public class HelloWorld {
    public static void main(String[] args) {
        new HelloWorld().sayHello();
    }

    private void sayHello() {
    }
}
To see the bytecode instructions of the class above, compile it with javac and then run javap as follows:
$ javac HelloWorld.java
$ javap -c -p -v HelloWorld.class > HelloWorld.bc
javap prints bytecode instructions to the standard output, so I redirected the output into a file called HelloWorld.bc. You can name it whatever you like, with any extension or none. This naming is just convenient for me: .java is source code, .class is a compiled class, .bc is a bytecode representation.
The result of the above commands is the following:
Classfile /Users/alex-diez/Projects/jvm-internals/HelloWorld.class
Last modified Apr 2, 2017; size 344 bytes
MD5 checksum 25a868af9590095c294cae89fbb9d195
Compiled from "HelloWorld.java"
public class HelloWorld
minor version: 0
major version: 52
flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
#1 = Methodref #5.#15 // java/lang/Object."<init>":()V
#2 = Class #16 // HelloWorld
#3 = Methodref #2.#15 // HelloWorld."<init>":()V
#4 = Methodref #2.#17 // HelloWorld.sayHello:()V
#5 = Class #18 // java/lang/Object
#6 = Utf8 <init>
#7 = Utf8 ()V
#8 = Utf8 Code
#9 = Utf8 LineNumberTable
#10 = Utf8 main
#11 = Utf8 ([Ljava/lang/String;)V
#12 = Utf8 sayHello
#13 = Utf8 SourceFile
#14 = Utf8 HelloWorld.java
#15 = NameAndType #6:#7 // "<init>":()V
#16 = Utf8 HelloWorld
#17 = NameAndType #12:#7 // sayHello:()V
#18 = Utf8 java/lang/Object
{
public HelloWorld();
descriptor: ()V
flags: ACC_PUBLIC
Code:
stack=1, locals=1, args_size=1
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
LineNumberTable:
line 1: 0
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=2, locals=1, args_size=1
0: new #2 // class HelloWorld
3: dup
4: invokespecial #3 // Method "<init>":()V
7: invokespecial #4 // Method sayHello:()V
10: return
LineNumberTable:
line 3: 0
line 4: 10
private void sayHello();
descriptor: ()V
flags: ACC_PRIVATE
Code:
stack=0, locals=1, args_size=1
0: return
LineNumberTable:
line 7: 0
}
SourceFile: "HelloWorld.java"
The first four lines give the path to the .class file on your computer, its last modification date, size and MD5 checksum, and the name of the .java source file. Then you can see the class name, the major and minor versions and the class’s flags. The JVM uses the constant pool to resolve classes, methods, method argument types, primitive and String constants, etc. I will describe it in more detail in future posts.
Let’s start with the HelloWorld constructor.
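The version numbers that javap prints come straight from the first bytes of the class file. Here is a minimal sketch of reading them by hand, assuming the layout the specification describes: a 4-byte magic number 0xCAFEBABE, followed by a 2-byte minor version and a 2-byte major version, all big-endian. ClassFileHeader and parse are names I made up for illustration, they are not a JDK API, and the default path HelloWorld.class is an assumption about where you compiled the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of the JDK: holds the versions from a class file header.
class ClassFileHeader {
    final int minor;
    final int major;

    ClassFileHeader(int minor, int major) {
        this.minor = minor;
        this.major = major;
    }

    // Parses the first 8 bytes of a class file: magic, minor_version, major_version.
    static ClassFileHeader parse(byte[] classFile) {
        ByteBuffer buffer = ByteBuffer.wrap(classFile); // ByteBuffer is big-endian by default
        int magic = buffer.getInt();                    // u4 magic
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        int minor = buffer.getShort() & 0xFFFF;         // u2 minor_version, read as unsigned
        int major = buffer.getShort() & 0xFFFF;         // u2 major_version, read as unsigned
        return new ClassFileHeader(minor, major);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: HelloWorld.class is in the current directory unless a path is passed
        String path = args.length > 0 ? args[0] : "HelloWorld.class";
        ClassFileHeader header = parse(Files.readAllBytes(Paths.get(path)));
        System.out.println("minor version: " + header.minor);
        System.out.println("major version: " + header.major);
    }
}
```

For the class file shown above, this should report the same minor version 0 and major version 52 that javap prints.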
public HelloWorld();
descriptor: ()V
flags: ACC_PUBLIC
Code:
stack=1, locals=1, args_size=1
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
LineNumberTable:
line 1: 0
You may notice that I haven’t written a constructor for the HelloWorld class; javac generated it for me. The descriptor shows that it is a method that accepts no parameters and returns void, which is represented by the V symbol. You may also spot that args_size is equal to 1. Constructors are special instance methods; therefore, their first argument is always this. locals is the size of the local variables array and stack is the maximum depth of the operand stack; I will come back to why both are 1 here.
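If you want to check the generated constructor without reading bytecode, reflection shows it too. This is only a small sketch: DefaultConstructorCheck and NoExplicitConstructor are hypothetical names, and the nested class stands in for HelloWorld, since it also declares no constructor.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;

// Hypothetical demo class: inspects the constructor javac generates for us.
class DefaultConstructorCheck {
    // No constructor is written here, just like in HelloWorld.
    public static class NoExplicitConstructor {
    }

    static Constructor<?>[] constructors() {
        return NoExplicitConstructor.class.getDeclaredConstructors();
    }

    public static void main(String[] args) {
        for (Constructor<?> constructor : constructors()) {
            System.out.println("parameters: " + constructor.getParameterCount()
                    + ", public: " + Modifier.isPublic(constructor.getModifiers()));
        }
    }
}
```

The this argument counted in args_size is implicit, so it does not show up in the parameter count that reflection reports.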
Java-JVM types table
Java type    bytecode signature
void         V
byte         B
short        S
char         C
int          I
float        F
long         J
double       D
boolean      Z
reference    L
array        [
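You don’t have to build descriptors from this table by hand. As a small sketch, the JDK class java.lang.invoke.MethodType can print the descriptor for a given return type and parameter types. The Descriptors class and its descriptor method are hypothetical wrappers I added for illustration.

```java
import java.lang.invoke.MethodType;

// Hypothetical helper: turns Java types into a bytecode method descriptor.
class Descriptors {
    static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // The constructor and sayHello: no parameters, void result
        System.out.println(descriptor(void.class));                           // ()V
        // main: one String[] parameter, void result
        System.out.println(descriptor(void.class, String[].class));           // ([Ljava/lang/String;)V
        // A made-up method taking long and boolean and returning int
        System.out.println(descriptor(int.class, long.class, boolean.class)); // (JZ)I
    }
}
```

Note that a reference type is written as L, followed by the class name with slashes, followed by a semicolon, as in Ljava/lang/String;.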
Each method has a local variables array and an operand stack. Their sizes are computed at compile time. The JVM copies all method arguments into the local variables array. That is why the local variables array of the HelloWorld constructor has size 1: it holds only this. aload_0 loads the element at index 0 of the local variables array, which is this, onto the operand stack, so the stack never holds more than one item. invokespecial pops the top of the operand stack and executes the <init> method of java.lang.Object on it. After that, return completes the method invocation with a void result.
<init> stands for initialization and is the name of a constructor in bytecode. The line java/lang/Object."<init>":()V means invoking the constructor of java.lang.Object, which is what the implicit super() call compiles to.
sayHello is a simple method that I wrote just to show how the JVM invokes private methods in bytecode. Let’s have a look at the main method.
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=2, locals=1, args_size=1
0: new #2 // class HelloWorld
3: dup
4: invokespecial #3 // Method "<init>":()V
7: invokespecial #4 // Method sayHello:()V
10: return
LineNumberTable:
line 3: 0
line 4: 10
The bytecode of the main method is more interesting. The new instruction allocates memory for our HelloWorld object and pushes a reference to it onto the operand stack; dup duplicates the top item of the operand stack. The first invokespecial invokes the <init> method using one reference to the HelloWorld instance, and the second invokes sayHello using the other.
The main method’s operand stack looks like this after each instruction:
new
Stack: L HelloWorld
dup
Stack: L HelloWorld, L HelloWorld
invokespecial "<init>":()V
Stack: L HelloWorld
invokespecial sayHello:()V
Stack: (empty)
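The same trace can be replayed with a toy model of the operand stack. This is only a sketch for following the instructions on paper: OperandStackSketch is a hypothetical class and has nothing to do with how the JVM actually implements its stack. It tracks the maximum depth, which is the number javap reports as stack=2 for main.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of an operand stack; a hypothetical class, not a JVM API.
class OperandStackSketch {
    private final Deque<String> stack = new ArrayDeque<>();
    private int maxDepth = 0;

    void push(String value) {
        stack.push(value);
        maxDepth = Math.max(maxDepth, stack.size());
    }

    String pop() {
        return stack.pop();
    }

    // Duplicates the top item, like the dup instruction.
    void dup() {
        push(stack.peek());
    }

    int depth() {
        return stack.size();
    }

    int maxDepth() {
        return maxDepth;
    }

    // Replays the four stack-changing instructions of main.
    static OperandStackSketch replayMain() {
        OperandStackSketch s = new OperandStackSketch();
        s.push("HelloWorld ref"); // new: pushes a reference to the new object
        s.dup();                  // dup: two copies of the reference
        s.pop();                  // invokespecial <init>: consumes one reference
        s.pop();                  // invokespecial sayHello: consumes the other
        return s;
    }

    public static void main(String[] args) {
        OperandStackSketch s = replayMain();
        System.out.println("final depth = " + s.depth() + ", max depth = " + s.maxDepth());
    }
}
```

Without dup there would be nothing left on the stack for sayHello after the constructor call, because the constructor returns void.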
Wrap it up
In this article, we saw three bytecode instructions: new, dup and invokespecial. new allocates memory for an object. This means that when you write new MyObject(), the JVM has to do several things. It is not an atomic operation: the MyObject constructor does not allocate memory, it only initializes fields. dup duplicates the top of the operand stack; I will cover this instruction in future posts. The invokespecial instruction means that the JVM needs to invoke a method of a concrete class and doesn’t need to look the method up in the virtual table.
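You can observe from plain Java that allocation and initialization are separate steps. In the sketch below (AllocationOrder, Base and Child are made-up names), the superclass constructor calls an overridden method before the subclass field initializer has run. The object already exists at that point, but its field still holds the default value.

```java
// Hypothetical demo: the object is allocated before any constructor initializes its fields.
class AllocationOrder {
    static class Base {
        final int seenByBase;

        Base() {
            // Runs before Child's field initializer, on the already allocated object
            seenByBase = value();
        }

        int value() {
            return -1;
        }
    }

    static class Child extends Base {
        private int field = 42; // assigned inside Child's <init>, after Base's <init> returns

        @Override
        int value() {
            return field;
        }
    }

    public static void main(String[] args) {
        Child child = new Child();
        System.out.println("seen during Base constructor: " + child.seenByBase); // 0
        System.out.println("seen after construction: " + child.value());         // 42
    }
}
```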