I merged the Float16 feature into V8 today. (Float16 is the half-precision floating-point format defined in IEEE 754.)

CL: https://chromium-review.googlesource.com/c/v8/v8/+/5082566

image

In other words, I added Float16Array, DataView.getFloat16, DataView.setFloat16, and Math.f16round to the JavaScript engine.

The CL touches 82 files, and the change is quite significant: 1,385 lines added and 409 lines removed.

It took almost six months from picking up the issue to merging the change.

In July 2023, I decided to contribute to JavaScript through V8, shifting my focus from Blink, where I mainly contributed to CSS/HTML.

Since I was already familiar with the Chromium workflow, having Chromium committer privileges and many contributions, I believed I could expand my knowledge with an issue of the right size.

When I found the issue “Implement Float16Array”, I thought I could learn about memory management. I considered it a rare chance to take on an implementation task that had no assignee yet. As a result, I assigned the issue to myself.

Each step, such as “prepare implementation”, “implementation”, and “code review”, had its own complex challenges.

In this post, I might not cover everything, but I will provide an overview of the process.

Journey of implementing

Take a look at the stage 3 specification

https://tc39.es/proposal-float16array/

First of all, I read the spec.

I could implement this by following the spec, because Float16Array is a stage 3 proposal, the stage at which implementation is recommended.

As you can see, the spec adds a branch for storing and loading values, plus utility functions on DataView and Math.

Finding the starting points

I tried to find a starting point by reading the code extensively. V8 makes heavy use of macros to stay DRY, like below:

image

While I was still unfamiliar with the code structure, this sometimes made for a worse developer experience: symbols generated by macros did not show up in Chromium Code Search, and the prefixes and suffixes added by the macros made names hard to search for. Once I became familiar with the code, though, it was quite comfortable.

For the implementation, my approach was to first add Float16 here, then check the compiler errors and fix them one by one.
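To illustrate the macro pattern described above, here is a minimal sketch of a list macro that is expanded several times. Every name in it (SKETCH_TYPED_ARRAYS, SketchKind, SketchElementSize, SketchConstructorName) is hypothetical and chosen for illustration; it is not V8's actual macro, only the general technique.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical list macro (not V8's real one): one row per element type,
// written as V(Name, C type). Float16 reuses uint16_t as its storage type.
#define SKETCH_TYPED_ARRAYS(V) \
  V(Uint8, uint8_t)            \
  V(Uint16, uint16_t)          \
  V(Float16, uint16_t)         \
  V(Float32, float)            \
  V(Float64, double)

// Expansion 1: one enumerator per row, e.g. kFloat16.
enum class SketchKind {
#define SKETCH_DECLARE_KIND(Name, ctype) k##Name,
  SKETCH_TYPED_ARRAYS(SKETCH_DECLARE_KIND)
#undef SKETCH_DECLARE_KIND
};

// Expansion 2: element size in bytes, taken from the C type of each row.
inline std::size_t SketchElementSize(SketchKind kind) {
  switch (kind) {
#define SKETCH_SIZE_CASE(Name, ctype) \
  case SketchKind::k##Name:           \
    return sizeof(ctype);
    SKETCH_TYPED_ARRAYS(SKETCH_SIZE_CASE)
#undef SKETCH_SIZE_CASE
  }
  return 0;  // Not reached for valid kinds.
}

// Expansion 3: constructor name pasted together by the preprocessor.
inline std::string SketchConstructorName(SketchKind kind) {
  switch (kind) {
#define SKETCH_NAME_CASE(Name, ctype) \
  case SketchKind::k##Name:           \
    return #Name "Array";
    SKETCH_TYPED_ARRAYS(SKETCH_NAME_CASE)
#undef SKETCH_NAME_CASE
  }
  return "";  // Not reached for valid kinds.
}

// Usage: the single Float16 row is picked up by every expansion.
inline void SketchMacroUsage() {
  assert(SketchElementSize(SketchKind::kFloat16) == 2);
  assert(SketchConstructorName(SketchKind::kFloat16) == "Float16Array");
}
```

Note that identifiers such as kFloat16 never appear literally in the source; they only exist after the preprocessor pastes k and Name together, which is why a plain text search for them comes up empty.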

About float16

As the macro image above shows, specifying the C type is essential. However, native float16 support cannot be guaranteed across all build targets, and relying on it would also require adjusting compiler options and low-level optimizations. I actually tried this by going through the RISC-V and x86 specifications, but it required too many changes and the scope was too broad. Hence, I asked @syg, a member of Google's V8 team, how to split up the problem. He gave me useful options, which let me settle on a direction. (Thanks!) As a result, I decided to use uint16_t, which has the same 2-byte size, as the storage type and to implement the conversion in software instead of using a native float16 type from the compiler or hardware.

Implement the conversion

I spent a lot of time finding and understanding the conversion algorithm. There are many implementations out there. Initially, I used logic based on bit operations and float multiplication (to adjust the exponent). (During code review, it was replaced with an implementation from Chromium's third-party code.)

A C++ implementation alone is not enough; a CSA (CodeStubAssembler) implementation is needed as well. Writing CSA felt like assembly language development, only a bit more comfortable.

So I added the CSA code, like the following:
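For readers who have not seen such a conversion, here is a minimal sketch of a software float32-to-float16 encoder. It is a plain bit-manipulation version with round-to-nearest, ties-to-even, and is neither my initial implementation nor the third-party code that finally landed; the function name SketchFloat32ToFloat16Bits is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Encodes a float32 as IEEE 754 binary16 bits (1 sign, 5 exponent,
// 10 mantissa bits), rounding to nearest with ties to even.
inline uint16_t SketchFloat32ToFloat16Bits(float value) {
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));  // Read the raw float32 bits.

  const uint32_t sign = (bits >> 16) & 0x8000u;
  const uint32_t abs = bits & 0x7FFFFFFFu;

  // NaN becomes a quiet NaN.
  if (abs > 0x7F800000u) return static_cast<uint16_t>(sign | 0x7E00u);
  // Infinity and everything from 65536 upward becomes infinity.
  if (abs >= 0x47800000u) return static_cast<uint16_t>(sign | 0x7C00u);
  // Everything below 2^-25 rounds to a signed zero.
  if (abs < 0x33000000u) return static_cast<uint16_t>(sign);

  uint32_t half, remainder, halfway;
  if (abs < 0x38800000u) {
    // Below 2^-14 the result is a float16 subnormal: shift the 24-bit
    // significand (with its implicit leading 1) right by 14 to 24 bits.
    const uint32_t shift = 126u - (abs >> 23);
    const uint32_t significand = (abs & 0x007FFFFFu) | 0x00800000u;
    half = significand >> shift;
    remainder = significand & ((1u << shift) - 1u);
    halfway = 1u << (shift - 1u);
  } else {
    // Normal range: rebias the exponent from 127 to 15 and drop the
    // lowest 13 mantissa bits.
    half = (abs - 0x38000000u) >> 13;
    remainder = abs & 0x1FFFu;
    halfway = 0x1000u;
  }

  // Ties to even. A carry out of the mantissa bumps the exponent, which
  // also turns values from 65520 upward into infinity.
  if (remainder > halfway || (remainder == halfway && (half & 1u))) ++half;
  return static_cast<uint16_t>(sign | half);
}

// Usage: 1.0 encodes as 0x3C00, and 65504 is the largest finite float16.
inline void SketchEncodeUsage() {
  assert(SketchFloat32ToFloat16Bits(1.0f) == 0x3C00);
  assert(SketchFloat32ToFloat16Bits(65504.0f) == 0x7BFF);
}
```

One caveat with this sketch: it takes a float. Converting a double by first narrowing it to float and then encoding can round twice, so a real implementation needs to handle double inputs with care.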

image

Apply the conversions

The conversions must be applied both when storing to and when loading from memory. In V8, this is implemented with template specialization. Hence, I implemented specializations that convert between float or double and the float16 value, like:

image
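As a rough illustration of the idea (not the code in the image), the sketch below shows the loading direction with a template specialized per element kind. All names (SketchElementKind, SketchStorage, SketchLoad) are hypothetical. Because Uint16 and Float16 share uint16_t as their storage type, the specialization has to be keyed on the element kind rather than on the C type.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical element kinds for this sketch.
enum class SketchElementKind { kUint16, kFloat16, kFloat32 };

// Maps each kind to its in-memory storage type.
template <SketchElementKind kind> struct SketchStorage;
template <> struct SketchStorage<SketchElementKind::kUint16> { using type = uint16_t; };
template <> struct SketchStorage<SketchElementKind::kFloat16> { using type = uint16_t; };
template <> struct SketchStorage<SketchElementKind::kFloat32> { using type = float; };

// Generic load: copy the raw element and widen it to double.
template <SketchElementKind kind>
double SketchLoad(const void* address) {
  typename SketchStorage<kind>::type raw;
  std::memcpy(&raw, address, sizeof(raw));
  return static_cast<double>(raw);
}

// Float16 specialization: the raw uint16_t holds binary16 bits, so it is
// decoded instead of being widened as an integer.
template <>
inline double SketchLoad<SketchElementKind::kFloat16>(const void* address) {
  uint16_t bits;
  std::memcpy(&bits, address, sizeof(bits));
  const int exponent = (bits >> 10) & 0x1F;
  const int mantissa = bits & 0x3FF;
  double magnitude;
  if (exponent == 0x1F) {
    // All exponent bits set: infinity, or NaN if the mantissa is non-zero.
    magnitude = mantissa != 0 ? std::numeric_limits<double>::quiet_NaN()
                              : std::numeric_limits<double>::infinity();
  } else if (exponent == 0) {
    // Zero or subnormal: mantissa * 2^-24.
    magnitude = std::ldexp(static_cast<double>(mantissa), -24);
  } else {
    // Normal: (1 + mantissa / 1024) * 2^(exponent - 15).
    magnitude = std::ldexp(static_cast<double>(1024 + mantissa), exponent - 25);
  }
  return (bits & 0x8000) != 0 ? -magnitude : magnitude;
}

// Usage: the same two bytes mean different things depending on the kind.
inline void SketchLoadUsage() {
  const uint16_t raw = 0x3C00;
  assert(SketchLoad<SketchElementKind::kFloat16>(&raw) == 1.0);
  assert(SketchLoad<SketchElementKind::kUint16>(&raw) == 15360.0);
}
```

Decoding is exact, since every float16 value is representable as a double; only the storing direction has to round.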

The conversions are also applied when implementing the spec-defined functions DataView.getFloat16, DataView.setFloat16, and Math.f16round.

The following is the implementation of DataView.getFloat16, written in the Torque language. It can use the CSA implementation because Torque compiles to CSA code.

image
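The Torque code itself is in the image above. As a language-neutral illustration of what getFloat16 has to do before the conversion, here is a minimal C++ sketch of the byte-order and bounds handling. The function name SketchGetFloat16Bits and the use of std::vector and std::out_of_range are assumptions made for this example; DataView reads big-endian unless littleEndian is true and throws a RangeError when the read goes out of bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Reads the two bytes at byte_offset and assembles them into float16 bits.
// Like DataView, the default byte order is big-endian.
inline uint16_t SketchGetFloat16Bits(const std::vector<uint8_t>& buffer,
                                     std::size_t byte_offset,
                                     bool little_endian = false) {
  // Stand-in for the RangeError that DataView throws.
  if (byte_offset > buffer.size() || buffer.size() - byte_offset < 2) {
    throw std::out_of_range("float16 read is outside the buffer");
  }
  const uint32_t first = buffer[byte_offset];
  const uint32_t second = buffer[byte_offset + 1];
  return little_endian ? static_cast<uint16_t>((second << 8) | first)
                       : static_cast<uint16_t>((first << 8) | second);
}

// Usage: the bytes 0x3C 0x00 are the float16 value 1.0 in big-endian order.
inline void SketchDataViewUsage() {
  const std::vector<uint8_t> buffer = {0x3C, 0x00};
  assert(SketchGetFloat16Bits(buffer, 0) == 0x3C00);
  assert(SketchGetFloat16Bits(buffer, 0, true) == 0x003C);
}
```

The returned bits then go through the float16-to-double conversion before the value is handed back to JavaScript.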

Prevent JIT: Deoptimize

JIT compilation to machine code starts in Turbofan or Maglev. When the graph is built from the bytecode, a ‘Deoptimize’ node is added if the element type is float16, so that execution falls back to the software path until machine-level float16 support is available.

image

Adding the ‘Deoptimize’ node unconditionally to temporarily prevent JIT compilation causes a deopt loop. This will be resolved in follow-up tasks.

Test code

Many tests for TypedArray already exist in test262 and mjsunit. I added the Float16Array type to them, fixed the resulting failures, and then added test cases specific to Float16.

Code Review

This process took approximately three months, probably because of the many diffs and the asynchronous communication. Nevertheless, the code improved significantly during the review.

Thanks to all the reviewers: syg@, cbruni@, and dmercadier@. I really appreciate it!

End of this article

Throughout the implementation, I lost my way at times and took some detours. Consequently, I invested a substantial amount of my (spare) time. However, this journey broadened my knowledge, particularly my understanding of V8's structure and pipeline. The fun thing is that this was the first CL I started for V8, yet other CLs of mine had already been merged into V8 before it landed!

If you compile d8 yourself, you can find the --js_float16array option! With that option, you can use Float16Array.

However, exposing it to end users will take time, because several issues, such as the Turbofan deoptimization loop and hardware support, need to be resolved first. I’ll try my best to bring this to the world someday.



To Reader

Thank you for reading this post! If you find any errors, misspellings, or awkward wording, please feel free to contribute here!