SPO600 Project Stage 2
Let’s start optimizing jpeg-compressor (https://github.com/richgel999/jpeg-compressor).
In Stage 1, I mentioned that I would optimize jpeg-compressor using intrinsics. Looking at its two main files, jpge.cpp and jpgd.cpp, SVE intrinsics could be applied to many functions, because many of them share a common pattern: arrays passed in as arguments and processed in for and while loops, very similar to the volume-scaling functions in Lab 6.
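As a rough illustration of that pattern, here is a volume-scaling loop in the style of Lab 6. The function name `scale_volume` and its signature are hypothetical; this is not code taken from jpge.cpp or jpgd.cpp. It only shows the shape that makes a loop a good SIMD candidate: arrays in, arrays out, and no dependence between iterations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical example of the loop shape described above (not from jpeg-compressor):
// input and output arrays are passed in as arguments and processed element by element.
void scale_volume(const std::int16_t* in, std::int16_t* out,
                  std::size_t n, float factor) {
    // A factor in [0, 1] keeps every result inside the int16_t range.
    assert(factor >= 0.0f && factor <= 1.0f);
    for (std::size_t i = 0; i < n; ++i) {
        // Each iteration is independent of the others, which is what allows a
        // compiler or a hand-written SVE2 loop to process many elements at once.
        // The cast truncates toward zero.
        out[i] = static_cast<std::int16_t>(in[i] * factor);
    }
}
```

For example, scaling the samples {100, -200, 32767, -32768} by 0.5 gives {50, -100, 16383, -16384}.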
However, converting functions from regular loops to SVE2 loops by hand is not time-efficient. For a beginner, implementing, debugging, and testing every conversion could take days or even years. Thus, auto-vectorization is a better solution for a big project like this one.
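To give a sense of that extra work, here is a sketch of one small loop written twice: once with SVE2 intrinsics from the Arm C Language Extensions (`arm_sve.h`), and once as the scalar fallback that is still needed on machines without SVE2. The function name `scale_q15` and the Q15 fixed-point volume format (volume × 32768, so 16384 means 0.5) are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__ARM_FEATURE_SVE2)
#include <arm_sve.h>
#endif

// Hypothetical helper: scales 16-bit samples by a Q15 fixed-point factor.
void scale_q15(const std::int16_t* in, std::int16_t* out,
               std::int64_t n, std::int16_t factor_q15) {
    assert(n >= 0);
#if defined(__ARM_FEATURE_SVE2)
    // SVE2 path: the vector length is unknown at compile time, so the loop
    // advances by svcnth() (the number of 16-bit lanes) and uses a predicate
    // to handle the final partial vector.
    for (std::int64_t i = 0; i < n; i += static_cast<std::int64_t>(svcnth())) {
        svbool_t pg = svwhilelt_b16_s64(i, n);        // lanes with index < n
        svint16_t v = svld1_s16(pg, in + i);           // predicated load
        svint16_t r = svqrdmulh_n_s16(v, factor_q15);  // SQRDMULH (SVE2)
        svst1_s16(pg, out + i, r);                     // predicated store
    }
#else
    // Scalar fallback with the same rounding as SQRDMULH:
    // saturate((2 * a * b + 2^15) >> 16).
    for (std::int64_t i = 0; i < n; ++i) {
        std::int64_t p =
            (2 * static_cast<std::int64_t>(in[i]) * factor_q15 + (1 << 15)) >> 16;
        if (p > INT16_MAX) p = INT16_MAX;
        if (p < INT16_MIN) p = INT16_MIN;
        out[i] = static_cast<std::int16_t>(p);
    }
#endif
}
```

Even for this tiny loop, the intrinsic version has to be written, checked against the scalar version, and tested separately, and jpeg-compressor has many such functions.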
Jpeg-compressor is already optimized with SSE, Intel’s SIMD instruction set for x86 and a rough counterpart of Arm’s SVE2. Since SSE code only targets x86, I think SSE should be disabled to avoid any platform conflict on Arm.
The jpeg-compressor documentation on GitHub says to set JPGD_USE_SSE2 to 0 to completely disable the use of jpgd_idct.h. By searching for the keyword “SSE” in Visual Studio Code, I found that this configuration is in jpgd.cpp.
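A switch like JPGD_USE_SSE2 generally works through the preprocessor: when it is 0, the SSE2 code is never compiled, so x86-only headers such as `<emmintrin.h>` (which GCC does not provide on AArch64) are never included. The sketch below uses a hypothetical macro `EXAMPLE_USE_SSE2` and a hypothetical `add_arrays` function to show the general pattern; it is not a copy of what jpgd.cpp actually contains.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical compile-time switch, modelled loosely on the documented JPGD_USE_SSE2.
#ifndef EXAMPLE_USE_SSE2
#define EXAMPLE_USE_SSE2 0  // 0 = never compile the SSE2 path
#endif

#if EXAMPLE_USE_SSE2 && defined(__SSE2__)
#include <emmintrin.h>  // x86-only SSE2 intrinsics
#endif

// Adds two arrays of 32-bit integers, using SSE2 only when the switch allows it.
void add_arrays(const std::int32_t* a, const std::int32_t* b,
                std::int32_t* out, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
    std::size_t i = 0;
#if EXAMPLE_USE_SSE2 && defined(__SSE2__)
    // SSE2 path: four 32-bit lanes per 128-bit register.
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), _mm_add_epi32(va, vb));
    }
#endif
    // Portable scalar loop: does all the work when the switch is 0,
    // and only the leftover tail elements when it is 1.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

With the switch at 0, the same source builds unchanged on x86_64 and on Arm, which is the property needed before moving the project to an Arm machine.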
Then I changed all of the (1) values there to (0). After disabling SSE as instructed, I built the program in Visual Studio and ran it on an image.
That run was on the x86_64 platform. Let’s see what happens when I upload the source code to an Arm machine (israel.cdot.systems). To start with, the project includes a CMakeLists.txt, which makes it easy to build a program with a complex structure like this one. After building the source, our program is in the bin folder.
Then I tested it with an image. A successful run means there is no platform conflict, so we can start auto-vectorizing the program from this point.
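Auto-vectorization works best on simple counted loops whose iterations do not depend on each other. Below is a minimal sketch, assuming GCC or Clang: both accept the `__restrict__` extension and define `__ARM_FEATURE_SVE2` when compiling with `-march=armv8-a+sve2`. The function `center_samples` is hypothetical; it applies the standard JPEG level shift (subtracting 128 from 8-bit samples), but it is not taken from jpge.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: level-shifts 8-bit samples to signed values (value - 128).
// __restrict__ promises that src and dst do not overlap, so the compiler
// does not need a runtime aliasing check before vectorizing the loop.
void center_samples(const std::uint8_t* __restrict__ src,
                    std::int16_t* __restrict__ dst, std::size_t n) {
    assert(n == 0 || (src != nullptr && dst != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = static_cast<std::int16_t>(src[i] - 128);
    }
}

// Reports whether this translation unit was compiled with SVE2 enabled.
constexpr bool built_for_sve2() {
#if defined(__ARM_FEATURE_SVE2)
    return true;
#else
    return false;
#endif
}
```

To confirm that GCC actually vectorized a loop, compiling with `-fopt-info-vec` prints a note for each loop it vectorized.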
CMake makes everything easier. Looking at CMakeLists.txt, I found the command CMake uses to build the program.
We can add the flags “-O3 -march=armv8-a+sve2” to that command so that CMake builds the project with auto-vectorization (-O3) targeting SVE2 (-march=armv8-a+sve2). After that, we have a new jpge in the bin folder. Let’s run it.
It fails with the same error I encountered in Lab 6 when running an SVE2-optimized program on a platform that does not support that architecture. Since israel.cdot.systems is an Armv8 machine without SVE2, we need qemu-aarch64 to emulate the newer architecture there.
Here we go! The program runs successfully, as I expected. In Stage 3, I will analyze how SVE2 affects the program’s performance compared with the original version.
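Since the native run failed on israel.cdot.systems while the qemu-aarch64 run worked, it can be useful to check whether a machine supports SVE2 before running the binary natively. A minimal sketch, assuming Linux on AArch64, where the kernel reports SVE2 through the `HWCAP2_SVE2` bit returned by `getauxval(AT_HWCAP2)`; the function name `cpu_has_sve2` is hypothetical.

```cpp
#include <cassert>
#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>   // getauxval, AT_HWCAP2
#include <asm/hwcap.h>  // HWCAP2_SVE2 on arm64 Linux
#endif

// Returns true if the running Linux/AArch64 kernel reports SVE2 support.
// On other platforms, or with headers that lack HWCAP2_SVE2, it returns false.
bool cpu_has_sve2() {
#if defined(__aarch64__) && defined(__linux__) && defined(HWCAP2_SVE2)
    return (getauxval(AT_HWCAP2) & HWCAP2_SVE2) != 0;
#else
    return false;
#endif
}
```

On a machine like israel.cdot.systems this would presumably return false, matching the need for qemu-aarch64. Note that a check like this does not make a whole-program `-march=armv8-a+sve2` build safe on its own, because the compiler may emit SVE2 instructions anywhere in the program, including code that runs before the check.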