SPO600 Project Stage 3
We have finally reached the final stage of the project!
In this last stage, I will compare how jpeg-compressor performs on different
architectures. There are three versions: an SSE (Intel) version, an SVE2 (ARM)
version, and the original source without any optimizations. The common system
the three programs will run on is an x86_64 machine (portugal.cdot.systems).
Let’s start with the original one. SSE is enabled by default, so we first need
to disable it in jpgd.cpp by setting JPGD_USE_SSE2 to 0. Then we configure CMake
and build the program. Running the program on an image gives this result:
Next, with SSE enabled:
We can see that the decompression time differs between the two versions: 908 ms
for the non-optimized build versus 830 ms for the SSE build, a reduction of
about 9%. Note that the SSE optimization applies only to the decompression
function; there is no optimization in the encoder file (jpge.cpp). So SSE gives
a measurable gain over the original code without optimization.
Finally, we will run the SVE2 program through the qemu emulator. However, the
program must be compiled on an aarch64 platform, so I switched to
israel.cdot.systems. Because the platform is different, we should re-run the
original, non-optimized code on the new machine. Hopefully, SVE2 will perform
as well as the SSE version. First, this is the result of the non-SVE2 program:
Then with SVE2 in place:
In this case, SVE2 is slower because it runs through the emulator. To make the
comparison fair, we should also run the non-optimized build through the
emulator:
Here we go!
This time, SVE2 is clearly faster than the original. I am excited to see how it
performs once hardware that supports SVE2 becomes available.
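To compare runs like these on a common scale, the timings can be turned into a
speedup factor and a percentage of time saved. Below is a minimal sketch of that
arithmetic. The helper names are made up for this post and are not part of
jpeg-compressor.

```cpp
#include <cassert>

// Hypothetical helpers (not from jpeg-compressor) for comparing two timings.

// Speedup factor: a value above 1.0 means the optimized build is faster.
double speedup(double baseline_ms, double optimized_ms) {
    assert(optimized_ms > 0.0);
    return baseline_ms / optimized_ms;
}

// Share of the baseline time that the optimized build saves, in percent.
double percent_saved(double baseline_ms, double optimized_ms) {
    assert(baseline_ms > 0.0);
    return (baseline_ms - optimized_ms) / baseline_ms * 100.0;
}
```

With the x86_64 decompression times from earlier, speedup(908.0, 830.0) is
roughly 1.09 and percent_saved(908.0, 830.0) is roughly 8.6.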
Now, I will disassemble the program to find the SVE2 optimizations.
For example, take this function:
When it is compiled and disassembled (using objdump to view the assembly code),
it looks like this:
We can spot the SVE instructions by “whilelo”. As its name suggests, the
function resets word data: it accesses the data through a pointer and sets its
high and low bits, which it extracts with the bitwise AND operator and the
shift-right operation.
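As a rough illustration of that description, here is a minimal sketch of a
function that fills a buffer with a 16-bit value by splitting it into two parts
with AND and shift-right. This is my own reconstruction from the description,
not the code from jpgd.cpp: the name fill_words and its signature are
hypothetical, and I am assuming that the "high and low bits" are the two bytes
of a 16-bit word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch, not the actual jpgd.cpp code: write the 16-bit value
// `value` into `count` consecutive words starting at `dst`, byte by byte.
void fill_words(void* dst, std::uint16_t value, std::size_t count) {
    assert(dst != nullptr || count == 0);
    std::uint8_t* out = static_cast<std::uint8_t*>(dst);
    // Low byte: mask with bitwise AND.
    const std::uint8_t low = static_cast<std::uint8_t>(value & 0xFF);
    // High byte: shift right by 8, then mask.
    const std::uint8_t high = static_cast<std::uint8_t>((value >> 8) & 0xFF);
    for (std::size_t i = 0; i < count; ++i) {
        out[2 * i] = low;       // low byte is stored first
        out[2 * i + 1] = high;  // high byte follows
    }
}
```

A simple counted loop like this is the kind of code a compiler can vectorize
for SVE. The whilelo instruction builds the predicate that decides which vector
lanes are still inside the loop bounds, which is why it is a good marker to
look for in the disassembly.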
To sum up, it has been an exciting experience to work with an open-source
project and a new technology. Even though it was really difficult because of
less common technologies such as assembly language and SVE instructions, the
course has expanded my knowledge and helped me understand how enormous the
technology world is. A truly high level of programming means being able to
understand and work with these advanced technologies.