SPO600 Project Stage 2

Let’s start optimizing jpeg-compressor (https://github.com/richgel999/jpeg-compressor).

In Stage 1, I mentioned that I would optimize jpeg-compressor using intrinsics. When I looked at the two main files, jpge.cpp (the encoder) and jpgd.cpp (the decoder), I saw that SVE intrinsics could be applied to many functions, because many of them follow a common pattern: they take arrays as input arguments and process them in for and while loops, which is very similar to the volume-scaling functions in Lab 6.
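To illustrate the pattern, here is a hypothetical loop written in the same spirit as the color-conversion code in jpge.cpp (it is not copied from the source). rgb_to_luma() is a name I made up; it turns packed RGB pixels into 8-bit luma using the usual JFIF weights in 16.16 fixed point. Array inputs, a simple counted loop, and no function calls in the body are what make a loop like this a good target for SIMD.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from jpge.cpp): converts packed RGB pixels to
// 8-bit luma with the JFIF weights 0.299, 0.587, 0.114 in 16.16 fixed point.
void rgb_to_luma(std::uint8_t* dst, const std::uint8_t* rgb, std::size_t num_pixels) {
    constexpr std::uint32_t kR = 19595;  // round(0.299 * 65536)
    constexpr std::uint32_t kG = 38470;  // round(0.587 * 65536)
    constexpr std::uint32_t kB = 7471;   // round(0.114 * 65536)
    for (std::size_t i = 0; i < num_pixels; ++i) {
        const std::uint32_t r = rgb[3 * i + 0];
        const std::uint32_t g = rgb[3 * i + 1];
        const std::uint32_t b = rgb[3 * i + 2];
        // Adding 32768 rounds to nearest before the 16 fraction bits are dropped.
        dst[i] = static_cast<std::uint8_t>((kR * r + kG * g + kB * b + 32768) >> 16);
    }
}
```

The three weights sum to exactly 65536, so a white pixel maps to 255 and the 32-bit intermediate never overflows.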



However, converting functions from regular loops to SVE2 loops by hand is not time-efficient. For a beginner, implementing, debugging, and testing every one of them could take days, or far longer. Auto-vectorization, where the compiler generates the SIMD code itself, is therefore a more practical choice for a project of this size.
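A small sketch shows why. The same hypothetical volume-scaling operation (float samples here, as a simplification of the Lab 6 idea) is written twice: once as a plain loop the compiler can auto-vectorize, and once by hand with SVE intrinsics from arm_sve.h, compiled only when the target defines __ARM_FEATURE_SVE. The function names are my own.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Plain loop: with -O3 and an SVE-capable -march, GCC can vectorize this
// on its own. __restrict promises that dst and src do not overlap.
void scale_auto(float* __restrict dst, const float* __restrict src,
                std::int64_t n, float factor) {
    for (std::int64_t i = 0; i < n; ++i) {
        dst[i] = src[i] * factor;
    }
}

#if defined(__ARM_FEATURE_SVE)
// The same operation written by hand with SVE intrinsics. The predicate from
// svwhilelt switches off lanes past n, so no scalar tail loop is needed.
void scale_sve(float* dst, const float* src, std::int64_t n, float factor) {
    for (std::int64_t i = 0; i < n; i += static_cast<std::int64_t>(svcntw())) {
        svbool_t pg = svwhilelt_b32_s64(i, n);    // lanes i .. n-1 are active
        svfloat32_t v = svld1_f32(pg, src + i);  // predicated load
        v = svmul_n_f32_x(pg, v, factor);        // multiply every lane by factor
        svst1_f32(pg, dst + i, v);               // predicated store
    }
}
#endif
```

Even in this tiny case the intrinsic version needs predicates, vector-length queries, and type-specific calls; repeating that for every loop in jpge.cpp and jpgd.cpp is where the time goes.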

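jpeg-compressor already contains an SSE2-optimized code path. SSE2 is a SIMD instruction set for x86 processors (Intel and AMD), so it plays a role similar to NEON and SVE2 on ARM, but SSE2 code only compiles for x86 targets. So, I think SSE2 should be disabled to avoid any platform conflict.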
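Because SSE2 intrinsics only exist on x86, portable code usually picks a SIMD path at compile time from predefined macros. The sketch below is a minimal, hypothetical example of that idea (simd_family() is not part of jpeg-compressor); it assumes GCC or Clang, since MSVC uses different macros.

```cpp
#include <cassert>
#include <cstring>

// Reports which SIMD family the compiler target exposes, using predefined
// macros: __SSE2__ (GCC/Clang on x86), and __ARM_FEATURE_SVE2 / __ARM_NEON
// from the Arm C Language Extensions.
const char* simd_family() {
#if defined(__SSE2__)
    return "x86 SSE2";
#elif defined(__ARM_FEATURE_SVE2)
    return "Arm SVE2";
#elif defined(__ARM_NEON)
    return "Arm NEON";
#else
    return "scalar only";
#endif
}
```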

According to the jpeg-compressor documentation on GitHub, setting JPGD_USE_SSE2 to 0 completely disables the use of jpgd_idct.h. Searching for the keyword “SSE” in Visual Studio Code, I found this configuration in jpgd.cpp.



I then changed every 1 in that configuration to 0. After disabling SSE2 as instructed, I built the program in Visual Studio and ran it on a test image.
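I have not reproduced the real block from jpgd.cpp here. As a rough idea of how a switch like JPGD_USE_SSE2 gates an SSE2 path, here is a hypothetical sketch; the #ifndef default and idct_path() are assumptions for illustration, not the library's code.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the switch in jpgd.cpp: defaults to 0 unless the
// build defines it (for example with -DJPGD_USE_SSE2=1).
#ifndef JPGD_USE_SSE2
#define JPGD_USE_SSE2 0
#endif

#if JPGD_USE_SSE2
#include <emmintrin.h>  // SSE2 intrinsics; only available on x86 targets
#endif

// Reports which inverse-DCT path this translation unit was compiled with.
const char* idct_path() {
#if JPGD_USE_SSE2
    return "SSE2 IDCT";
#else
    return "scalar IDCT";
#endif
}
```

If the real configuration were guarded with #ifndef like this, passing -DJPGD_USE_SSE2=0 to the compiler would have the same effect as editing the file; I have not verified that jpgd.cpp uses such a guard.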



That run was on x86_64. Next, let’s see what happens when I upload the source code to an ARM machine (israel.cdot.systems). The project includes a CMakeLists.txt, which makes it easy to build a program with a multi-file structure like this one. After building the source, the executable is in the bin folder.


Then I tested it with an image.



The program ran successfully, so there is no platform conflict, and we can start auto-vectorizing the program from this point.
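Exit status aside, a cheap extra check is that the output file at least looks like a JPEG. The hypothetical helpers below check for the SOI marker (0xFF 0xD8) at the start and the EOI marker (0xFF 0xD9) at the end; this only confirms the framing, not that the image decodes correctly.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// True if the buffer starts with SOI (FF D8) and ends with EOI (FF D9).
bool has_jpeg_markers(const std::vector<unsigned char>& bytes) {
    const std::size_t n = bytes.size();
    if (n < 4) return false;
    return bytes[0] == 0xFF && bytes[1] == 0xD8 &&
           bytes[n - 2] == 0xFF && bytes[n - 1] == 0xD9;
}

// Reads a whole file into memory; yields an empty vector if it cannot be opened.
std::vector<unsigned char> read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```

Usage would be has_jpeg_markers(read_file(path)) on whatever file the test run produced.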

CMake makes everything easier. Looking through CMakeLists.txt, I found the compiler command that CMake uses to build the program.



Adding the flags “-O3 -march=armv8-a+sve2” to that command tells the compiler to enable auto-vectorization and to target SVE2 when it generates vector code. After rebuilding, we have a new jpge in the bin folder. Let’s run it.
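One way to confirm that the new flags actually reached the compiler is to test the ACLE feature macro __ARM_FEATURE_SVE2, which GCC and Clang define when the target includes SVE2. built_for_sve2() is a hypothetical helper for illustration.

```cpp
#include <cassert>

// True only when this file was compiled for a target with SVE2,
// e.g. with -march=armv8-a+sve2.
constexpr bool built_for_sve2() {
#if defined(__ARM_FEATURE_SVE2)
    return true;
#else
    return false;
#endif
}
```

Putting static_assert(built_for_sve2(), "SVE2 flags missing"); in a source file makes the build fail if CMake did not pass the flags. GCC's -fopt-info-vec option can also be added to report which loops were actually vectorized.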


It fails with the same error I encountered in Lab 6 when running an SVE2-optimized program on a machine whose CPU does not support SVE2. To run it on israel.cdot.systems, an Armv8 machine without SVE2, we need qemu-aarch64 to emulate the missing instructions.
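The failure comes from the CPU, not the compiler. On AArch64 Linux a program can ask the kernel which features the CPU reports through getauxval(AT_HWCAP2). The sketch below is a hypothetical check along those lines; the HWCAP2_SVE2 fallback value comes from the Linux arm64 hwcap ABI, in case the C library headers do not define it.

```cpp
#include <cassert>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>  // getauxval, AT_HWCAP2
#ifndef HWCAP2_SVE2
#define HWCAP2_SVE2 (1UL << 1)  // bit value from the Linux arm64 hwcap ABI
#endif
#endif

// True if the CPU running this process reports SVE2 support.
// On an Armv8 machine without SVE2 this would be expected to return false.
bool cpu_has_sve2() {
#if defined(__aarch64__) && defined(__linux__)
    return (getauxval(AT_HWCAP2) & HWCAP2_SVE2) != 0;
#else
    return false;  // not AArch64 Linux: treated as no SVE2 in this sketch
#endif
}
```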



Here we go! The program runs successfully, as I expected. In Stage 3, I will analyze how SVE2 affects the program’s performance compared to the original build.
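For Stage 3, a simple way to compare builds is to time the same work several times and keep the best run. best_time_ms() below is a hypothetical helper, not part of jpeg-compressor. One caution: the SVE2 build here runs under qemu-aarch64, so its wall-clock time would measure the emulator rather than real SVE2 hardware.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `work` `reps` times and returns the smallest wall-clock time in
// milliseconds. Taking the minimum reduces noise from other processes.
double best_time_ms(const std::function<void()>& work, int reps) {
    assert(reps > 0);
    double best = 0.0;
    for (int r = 0; r < reps; ++r) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(stop - start).count();
        if (r == 0 || ms < best) best = ms;
    }
    return best;
}
```

Wrapping the same encode call in both builds and comparing the returned times keeps the comparison fair.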
