Kotlin’s data classes are cool. They are the recommended way of replacing POJOs,
and it is usually recommended to declare their properties with val so the
classes are immutable.
They come with a handy copy() method that “clones” the data class. It also
accepts named arguments, so the new object has all the fields of the original
plus the overridden ones. So far so good.
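As a quick illustration of that behavior, here is a minimal sketch. It assumes a class shaped like the one the bytecode in this post implies (a String, a Boolean, and a Long); the concrete values are made up for the example.

```kotlin
// Assumed shape: three immutable properties with defaults.
data class Shared(
    val st: String? = "some string",
    val b: Boolean = false,
    val l: Long = 0L,
)

fun main() {
    val original = Shared(st = "first", b = true, l = 1L)

    // Only st and l are overridden; b is carried over from the original.
    val changed = original.copy(st = "new-value", l = 10L)

    println(original) // Shared(st=first, b=true, l=1)
    println(changed)  // Shared(st=new-value, b=true, l=10)
}
```

The original instance is untouched, which is what makes copy() the natural way to "modify" an immutable data class.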
How does Kotlin’s copy() work under the hood?
In the bytecode, this is implemented by adding two methods that do all the heavy lifting. For example:
// shared.kt
data class Shared(
    val st: String? = "some string",
    val b: Boolean = false,
    val l: Long = 0L
)
If we inspect the bytecode of this class we see something like:
$ javap -v Shared.class # I'm trimming irrelevant stuff
...
public final Shared copy(java.lang.String, boolean, long);
descriptor: (Ljava/lang/String;ZJ)LShared;
flags: ACC_PUBLIC, ACC_FINAL
Code:
stack=6, locals=5, args_size=4
0: new #2 // class Shared
3: dup
4: aload_1
5: iload_2
6: lload_3
7: invokespecial #35 // Method "<init>":(Ljava/lang/String;ZJ)V
10: areturn
...
public static Shared copy$default(Shared, java.lang.String, boolean, long, int, java.lang.Object);
descriptor: (LShared;Ljava/lang/String;ZJILjava/lang/Object;)LShared;
flags: ACC_PUBLIC, ACC_STATIC, ACC_SYNTHETIC
.... truncated
We see here two different copy methods. One is a member method that receives all parameters necessary to create a copy of a data class. Not sure why this one is necessary since it basically just calls the constructor of the data class.
In this particular example, it loads the String (aload_1 loads the first
parameter), the boolean (iload_2), and the long (lload_3), calls the
constructor (invokespecial ... <init>) of the class with them, and returns
that reference (areturn).
copy$default seems to be the method that calls copy(). It has a more
interesting signature: it receives the instance being copied (the first
param), followed by all the fields of the data class, followed by an int, and
then a weird java.lang.Object.
Let’s first see the bytecode for something using the copy method:
// some.kt
fun doSomething(arg: Shared) {
println(arg.copy(st = "new-value", l = 10L))
}
The bytecode for the code above:
javap -c SomeKt.class
Compiled from "some.kt"
public final class SomeKt {
public static final void doSomething(Shared);
Code:
0: aload_0
1: ldc #9 // String arg
3: invokestatic #15 // Method kotlin/jvm/internal/Intrinsics.checkNotNullParameter:(Ljava/lang/Object;Ljava/lang/String;)V
6: aload_0
7: ldc #17 // String new-value
9: iconst_0
10: ldc2_w #18 // long 10l
13: iconst_2
14: aconst_null
15: invokestatic #25 // Method Shared.copy$default:(LShared;Ljava/lang/String;ZJILjava/lang/Object;)LShared;
18: astore_1
19: iconst_0
20: istore_2
21: getstatic #31 // Field java/lang/System.out:Ljava/io/PrintStream;
24: aload_1
25: invokevirtual #37 // Method java/io/PrintStream.println:(Ljava/lang/Object;)V
28: return
}
The fun part starts at instruction 6, where the arguments for the copy$default method are prepared:
- 6: aload_0 loads the instance of Shared to be copied (first parameter for copy$default())
- 7: ldc loads the “new-value” constant
- 9: iconst_0 loads the boolean
- 10: ldc2_w loads the long param
- 13: iconst_2 loads a mysterious int. More on this later.
- 14: aconst_null passes null as the last parameter.
Now, with this information, the copy$default method is responsible for
figuring out which fields it has to copy from the original object and which ones
it needs to override. My guess here is that it uses the int as a mask to infer
this… let’s see if that’s the case. This is the bytecode for the copy$default
method:
public static Shared copy$default(Shared, java.lang.String, boolean, long, int, java.lang.Object);
descriptor: (LShared;Ljava/lang/String;ZJILjava/lang/Object;)LShared;
flags: ACC_PUBLIC, ACC_STATIC, ACC_SYNTHETIC
Code:
stack=5, locals=7, args_size=6
0: iload 5
2: iconst_1
3: iand
4: ifeq 12
7: aload_0
8: getfield #11 // Field st:Ljava/lang/String;
11: astore_1
12: iload 5
14: iconst_2
15: iand
16: ifeq 24
19: aload_0
20: getfield #19 // Field b:Z
23: istore_2
24: iload 5
26: iconst_4
27: iand
28: ifeq 36
31: aload_0
32: getfield #25 // Field l:J
35: lstore_3
36: aload_0
37: aload_1
38: iload_2
39: lload_3
40: invokevirtual #47 // Method copy:(Ljava/lang/String;ZJ)LShared;
43: areturn
The fun starts right at the beginning:
- 0: iload 5 loads that interesting int onto the operand stack
- 2: iconst_1 then loads the constant 1
- 3: iand then ANDs those two integers. The result is pushed onto the operand stack.
- 4: ifeq 12 means that if the result of the previous step is == 0, we jump to instruction 12. Otherwise, we execute the following instruction.
  - When the ifeq does not jump (the bit is set), we replace the parameter (in this case the first String) with whatever is inside the original object (aload_0 to load the object, getfield to get the String field, astore_1 to replace the method parameter with that value).
  - Otherwise, it means we need to use whatever value was provided as argument to copy$default, i.e. we will override that field.
Same deal for the other two parameters. And then:
- 36: aload_0 and then aload_1, iload_2, lload_3 load the original object and the parameters (or the values we replaced before) onto the operand stack so they can be passed to the copy() method.
If this is not 100% clear, this is what the code would look like in Java:
public static Shared copy$default(Shared original, String s, boolean b, long l, int mask, Object x) {
    // A set bit means "the caller did not pass this argument": take it from the original.
    if ((mask & 1) != 0) s = original.st;
    if ((mask & 2) != 0) b = original.b;
    if ((mask & 4) != 0) l = original.l;
    return original.copy(s, b, l);
}
From this, we can infer that this mask int is hardcoded into the bytecode for
calls of .copy(). In the example above, where we are overriding the first and
third fields of the class, it passed the number 2 (iconst_2). In binary that is
010, which in copy$default means exactly that: override the first and third
parameters, but reuse the middle one.
It’s also interesting that the extra Object parameter has no apparent use.
So what?
Ok, so this was fun, but what’s the point? Why is this bad?
This is pretty nice, but I found it can cause issues sometimes. Imagine the following:
- The data class lives in library A, v1.
- Library B uses A-v1 and also calls the copy method.
- App X uses both libraries A-v1 and B.
- All of them with a different release cadence, and potentially the responsibility of different teams.
So far so good. Now imagine we update the data class in library A and add a new field:
data class Shared(
val st: String? = "some string",
val b: Boolean = false,
val l: Long = 0L,
val extra: String? = null
)
This might seem like a safe change. We added a new field, but it should not interfere with any usage of this class. It is in theory backward compatible.
Say we now release library A v2 with this change, and App X upgrades to this new version. The app compiles and releases fine, except that you get a nasty surprise at runtime:
java.lang.NoSuchMethodError: 'Shared.copy$default(Shared, java.lang.String, boolean, long, java.lang.String, int, java.lang.Object)'
Ooops. Library B’s bytecode still relies on the old signature of library A v1, and so it fails to invoke that method.
This sucks because technically the class should be compatible. Indeed, recompiling library B against library A v2 works fine (the bytecode changes to include the new signature).
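To see the mismatch without juggling three artifacts, one can inspect the v2 class with Java reflection. This is only a rough sketch: copy$default is a compiler implementation detail, the function names below are made up for the example, and a reflective lookup merely approximates what the JVM does when it links library B's call (reflection throws NoSuchMethodException, linkage throws NoSuchMethodError).

```kotlin
// Library A v2, as described above.
data class Shared(
    val st: String? = "some string",
    val b: Boolean = false,
    val l: Long = 0L,
    val extra: String? = null,
)

// Lists the parameter types of the synthetic copy$default in this class.
fun copyDefaultParameterTypes(): List<String> =
    Shared::class.java.declaredMethods
        .first { it.name == "copy\$default" }
        .parameterTypes
        .map { it.simpleName }

// Looks up the v1 signature that library B's bytecode still refers to.
fun hasOldCopyDefault(): Boolean = try {
    Shared::class.java.getDeclaredMethod(
        "copy\$default",
        Shared::class.java, String::class.java,
        Boolean::class.javaPrimitiveType, Long::class.javaPrimitiveType,
        Int::class.javaPrimitiveType, Any::class.java,
    )
    true
} catch (e: NoSuchMethodException) {
    false
}

fun main() {
    println(copyDefaultParameterTypes()) // v2 signature, one String longer than v1
    println(hasOldCopyDefault())         // the v1 signature is gone
}
```

The point of the second function is that the old descriptor is not kept around as an overload: adding a field replaces the synthetic method outright.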
In fact, if this was done in good old Java without anything fancy it would have
just worked, i.e. assuming the copy feature is implemented manually like this:
// some.java
void doSomething(Shared arg) {
    // Manual copy: only touches the fields this code knows about.
    Shared copy = new Shared();
    copy.st = "new-value";
    copy.b = arg.b;
    copy.l = arg.l;
    System.out.println(copy);
}
In other words, the verbose, naive, explicit, ugly Java version would not break production as Kotlin did.
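The same explicit style is available in Kotlin. As a hedged sketch of one possible workaround (not something the compiler or this post's libraries provide), a library could expose a regular class with hand-written withX methods instead of a data class. All the method names here are invented for the example. Each method has a fixed JVM signature, so adding a field later only adds a new method and leaves the existing ones alone.

```kotlin
// Regular class instead of a data class: no generated copy() or copy$default.
class Shared private constructor(
    val st: String?,
    val b: Boolean,
    val l: Long,
    val extra: String?,
) {
    // Public no-arg constructor with the defaults; its signature never changes.
    constructor() : this("some string", false, 0L, null)

    // One explicit "copy with" method per field (hypothetical names).
    fun withSt(st: String?) = Shared(st, b, l, extra)
    fun withB(b: Boolean) = Shared(st, b, l, extra)
    fun withL(l: Long) = Shared(st, b, l, extra)
    fun withExtra(extra: String?) = Shared(st, b, l, extra) // added in "v2"

    override fun toString() = "Shared(st=$st, b=$b, l=$l, extra=$extra)"
}

fun main() {
    val copy = Shared().withSt("new-value").withL(10L)
    println(copy) // Shared(st=new-value, b=false, l=10, extra=null)
}
```

The trade-off is the obvious one: equals, hashCode, and destructuring are no longer generated and would have to be written by hand if needed.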
Takeaways
Just be careful when using .copy(), especially when the data class is going to
be shared in a library.
This is nothing new (breaking prod because of incompatible library versions), except that it sucks that the change is technically backward compatible.
Fancy language features are not magic and can come with nasty surprises.