SPO600 Project Stage 3

We have finally reached the last stage of the project!

 

In this last stage, I will compare how jpeg-compressor’s performance differs across architectures. There are three builds: an SSE (Intel) version, an SVE2 (ARM) version, and the original source without any optimizations. The shared system for the three programs is an x86_64 machine (portugal.cdot.systems).

 

Let’s start with the original one. First, we need to disable SSE in jpgd.cpp by setting JPGD_USE_SSE2 to 0, because it is enabled by default. Then we configure CMake and build the program. After running the program on an image, here is the result:



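The JPGD_USE_SSE2 switch is a compile-time flag, so the build contains either the SSE path or the plain path, never both. Below is a minimal sketch of that pattern, assuming the flag is a simple 0/1 macro as described above. The add_bias helper is hypothetical and is not code from jpgd.cpp; it only illustrates how one function can carry both variants.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: the flag is a plain 0/1 macro. It defaults to 0 here so the
// sketch builds on any platform; the post says jpgd.cpp enables it by default.
#ifndef JPGD_USE_SSE2
#define JPGD_USE_SSE2 0
#endif

#if JPGD_USE_SSE2
#include <emmintrin.h>  // SSE2 intrinsics, x86 only
#endif

// Hypothetical helper (not from jpgd.cpp): adds `bias` to every sample,
// clamping the result at 255.
void add_bias(std::uint8_t* data, std::size_t n, std::uint8_t bias) {
    assert(data != nullptr || n == 0);
    std::size_t i = 0;
#if JPGD_USE_SSE2
    // SSE2 path: 16 samples per iteration with a saturating add.
    const __m128i vbias = _mm_set1_epi8(static_cast<char>(bias));
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
        v = _mm_adds_epu8(v, vbias);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(data + i), v);
    }
#endif
    // Scalar path: handles everything when the flag is 0, and the leftover
    // samples (fewer than 16) when the flag is 1.
    for (; i < n; ++i) {
        const unsigned sum = static_cast<unsigned>(data[i]) + bias;
        data[i] = static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
    }
}
```

Building this with -DJPGD_USE_SSE2=1 on an x86_64 machine would select the vector loop; with the flag at 0 only the scalar loop is compiled, which is the situation of the non-optimized run.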
Next, SSE is enabled:



The decompression time differs between the two versions: 908 ms for the non-optimized build versus 830 ms for the SSE build, a saving of 78 ms (about 8.6%). Note that the SSE optimization exists only in the decompression code; there is no optimization in the encoder file (jpge.cpp). So SSE gives a measurable improvement over the original code without optimization.

Finally, we will run the SVE2 program through the qemu emulator. However, the program has to be compiled on an aarch64 platform, so I switch to israel.cdot.systems. Because the platform is different, we should re-run the non-optimized code on the new machine to get a fresh baseline. Hopefully, SVE2 will perform as well as the SSE version did. First, this is the result of the non-SVE2 program:



 

Then we have SVE2 in place:



In this case, SVE2 is slower because it runs through the emulator while the non-optimized build runs natively. To make the comparison fair, we should also run the non-optimized build through the emulator:



Here we go! This time, with both builds under the emulator, the SVE2 version runs faster than the original. I am excited to see what happens when hardware that supports SVE2 becomes available.
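Comparisons like the ones in this post are easier to read as a ratio or a percentage than as two raw times. The sketch below shows one way to compute both, plus a small wall-clock timer. All three function names are my own illustrations and are not part of jpeg-compressor, which reports its own timings.

```cpp
#include <cassert>
#include <chrono>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
template <typename Fn>
double time_ms(Fn&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Speedup factor: how many times faster the optimized run is.
double speedup(double baseline_ms, double optimized_ms) {
    assert(baseline_ms > 0.0 && optimized_ms > 0.0);
    return baseline_ms / optimized_ms;
}

// Share of the baseline time that the optimized run saves, in percent.
double percent_saved(double baseline_ms, double optimized_ms) {
    assert(baseline_ms > 0.0);
    return (baseline_ms - optimized_ms) / baseline_ms * 100.0;
}
```

With the x86_64 figures from earlier in this post, speedup(908, 830) comes to roughly 1.09 and percent_saved(908, 830) to roughly 8.6. A single run is noisy, so averaging several runs of time_ms would give a steadier number.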

Now, I will disassemble the program to find SVE2 optimizations.

For example, this function:



After compiling, the disassembly (viewed with objdump) looks like this:



We can spot the SVE code by instructions such as “whilelo”, which builds the predicate that decides which vector lanes are active in each loop iteration. As its name suggests, the function resets word data: it accesses the data through a pointer and sets the high and low bits using the bitwise AND operator and the shift-right operation.
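To make the role of “whilelo” more concrete, here is a portable scalar model of a predicated loop. It is only a sketch: real SVE does this in hardware, and the vector length is implementation-defined, so the 4 lanes below are an assumption. The per-element work (splitting a 16-bit word into low and high bytes with AND and shift-right) is a hypothetical stand-in and is not the function from the disassembly.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 4 lanes per vector. Real SVE hardware chooses its own length.
constexpr std::size_t kLanes = 4;

// Scalar model of WHILELO: lane k is active while (i + k) < n.
std::array<bool, kLanes> whilelo(std::size_t i, std::size_t n) {
    std::array<bool, kLanes> pred{};
    for (std::size_t k = 0; k < kLanes; ++k) {
        pred[k] = (i + k) < n;
    }
    return pred;
}

// Hypothetical per-element work: split each 16-bit word into its low byte
// (bitwise AND) and its high byte (shift right).
void split_words(const std::uint16_t* src, std::uint8_t* lo,
                 std::uint8_t* hi, std::size_t n) {
    for (std::size_t i = 0; i < n; i += kLanes) {
        const auto pred = whilelo(i, n);  // recomputed every iteration
        for (std::size_t k = 0; k < kLanes; ++k) {
            if (!pred[k]) continue;       // inactive lanes touch no memory
            lo[i + k] = static_cast<std::uint8_t>(src[i + k] & 0xFF);
            hi[i + k] = static_cast<std::uint8_t>(src[i + k] >> 8);
        }
    }
}
```

The point of the predicate is that the loop needs no separate scalar tail: on the last iteration the lanes past the end of the array are simply switched off.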

To sum up, it has been an exciting experience to work with an open-source project and a new technology. It was really difficult because it involved less common, low-level material such as assembly language and SVE instructions, but the course has expanded my knowledge and helped me understand how enormous the technology world is. Real mastery of programming means being able to understand and work with these advanced technologies.

 

 

 

 
