Posts

SPO600 Project Stage 3

We have finally reached the last stage of the project! In this stage, I compare how jpeg-compressor performs across different builds: the SSE (Intel) version, the SVE2 (ARM) version, and the original source without any optimizations. All three programs run on the same x86_64 machine (portugal.cdot.systems). Let's start with the original one. First, we need to disable SSE in jpgd.cpp by setting JPGD_USE_SSE2 to 0, because it is enabled by default. Then we configure CMake to build the program and run it on an image. Here is the result: Next, SSE is enabled: The decompress time differs between the two versions: 908 ms non-optimized vs 830 ms SSE-optimized, a reduction of 78 ms (about 8.6%). Note that the SSE optimization is only in the decompress function; there is no optimization in the encoder file (jpge.cpp). SSE has shown us a jump in performance compared to the original code without opti

SPO600 Project Stage 2

Let's start implementing jpeg-compressor (https://github.com/richgel999/jpeg-compressor). In Stage 1, I mentioned that I would implement jpeg-compressor using intrinsics. Looking at the two main files (jpge.cpp and jpgd.cpp), SVE intrinsics could be applied to many functions, because many of them share a common pattern: arrays as input arguments processed in for and while loops, which is very similar to the scale volume functions in Lab 6. For example: However, converting functions from regular loops to SVE2 loops by hand is not time efficient. For beginners, implementation, debugging, and testing may take days or even years. Thus, auto-vectorization should be a better solution for big projects. Jpeg-compressor is already optimized with SSE, an Intel SIMD extension that plays a role comparable to SVE2 on ARM. So, I think SSE should be disabled to avoid any platform conflict. Looking at jpeg-compressor documentation on GitHub

SPO600 Lab 6

#include <stdio.h>
#include <stdbool.h>
#include <arm_sve.h>
#include "vol.h"

int main() {
    int x;
    int ttl = 0;

    // ---- Create in[] and out[] arrays
    int64_t *in;
    int64_t *out;
    in = (int64_t *) calloc(SAMPLES, sizeof(int64_t));
    out = (int64_t *) calloc(SAMPLES, sizeof(int64_t));

    // ---- Create dummy samples in in[]
    vol_createsample(in, SAMPLES);

    // ---- This is the part we're interested in!
    // ---- Scale the samples from in[], placing results in out[]
    /*for (x = 0; x < SAMPLES; x++) {
        out[x]=(int64_t) ((float) (VOLUME/100.0) * (float) in[x]);
    }*/

    svbool_t pg; // [1]
    svint64_t svin; // [1]
    svfloat64_t svv; // [1]
    const float64_t division = (float)(VOLUME / 100.0); // [2]

    for (int i = 0; i < SAMPLES; i += svcntd()) // [3]
    {
        pg = svwhilelt_b64(i, SAMPLES); // [4]
        svin = svld1(p
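For readers without SVE2 hardware, the shape of the loop above can be imitated in plain C. This is only a sketch: `LANES` and `scale_chunked` are hypothetical names, the lane count is fixed here while `svcntd()` reports the real one at run time, and the inner `if` merely stands in for the predicate produced by `svwhilelt_b64`. It uses `double` arithmetic, whereas the commented-out scalar loop casts to `float`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed lane count; on real hardware svcntd() returns
 * the number of 64-bit lanes in one SVE vector at run time. */
#define LANES 4

/* Scale n samples from in[] into out[], one "vector" of LANES at a time. */
static void scale_chunked(const int64_t *in, int64_t *out, size_t n,
                          double factor) {
    for (size_t i = 0; i < n; i += LANES) {
        for (size_t lane = 0; lane < LANES; lane++) {
            /* Stand-in for the predicate: a lane is active only while
             * i + lane < n, so the final partial chunk never reads or
             * writes past the end of the arrays. */
            if (i + lane < n) {
                out[i + lane] = (int64_t)(factor * (double)in[i + lane]);
            }
        }
    }
}
```

Calling `scale_chunked(in, out, SAMPLES, VOLUME / 100.0)` would process the same range as the original scalar loop. The point of the predicate is that no separate tail loop is needed when the sample count is not a multiple of the vector length.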

SPO600 Lab 5

In Lab 5, I carried out an experiment benchmarking C programs that simulate scaling volume using different algorithms. The programs are separate, which would cause inconsistent results because each run creates different random sample data. Thus, I combined them into one program so that all algorithms use the same samples, which gives a more reliable result. Furthermore, I set the number of samples to 300000 and the number of benchmark runs to 10000. For each run, I record the time for each algorithm. In the end, I print the average time each algorithm takes. Here is my repository on GitHub: https://github.com/willvuong168/SPO600-Lab-5 I took advantage of a C++ course I am taking, where I already had a timer program: all I have to do is start the timer before and stop it after each scaling function. Before making the all-in-one program, each algorithm sums the sample's output and prints it out to the screen because each algorithm may produce a differ
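The benchmarking approach described above (one shared sample buffer, a timer around each scaling function, an average over many runs, and a sum of the output as a sanity check) might look roughly like this in C. It is a sketch under assumptions, not the code from the repository: the two scaling functions are hypothetical stand-ins for the Lab 5 algorithms, `VOLUME` is an assumed value, and the standard `clock()` replaces the C++ timer class mentioned in the post.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define SAMPLES 300000   /* sample count stated in the post */
#define VOLUME  50       /* assumed volume percentage, not from the post */

typedef void (*scale_fn)(const int16_t *in, int16_t *out, size_t n);

/* Hypothetical stand-in: scale each sample with floating-point math. */
static void scale_naive(const int16_t *in, int16_t *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = (int16_t)((float)in[i] * (VOLUME / 100.0f));
}

/* Hypothetical stand-in: scale with an 8-bit fixed-point factor. */
static void scale_fixed(const int16_t *in, int16_t *out, size_t n) {
    const int32_t factor = (VOLUME * 256) / 100;
    for (size_t i = 0; i < n; i++)
        out[i] = (int16_t)((in[i] * factor) / 256);
}

/* Fill the shared input buffer once so every algorithm sees the same data. */
static void fill_samples(int16_t *in, size_t n, unsigned seed) {
    srand(seed);
    for (size_t i = 0; i < n; i++)
        in[i] = (int16_t)((rand() % 65536) - 32768);
}

/* Sum of the output, used to compare results between algorithms. */
static int64_t checksum(const int16_t *out, size_t n) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += out[i];
    return sum;
}

/* Average CPU seconds per call of fn over `runs` calls on the same input. */
static double average_seconds(scale_fn fn, const int16_t *in, int16_t *out,
                              size_t n, int runs) {
    clock_t total = 0;
    for (int r = 0; r < runs; r++) {
        clock_t start = clock();      /* timer before the scaling function */
        fn(in, out, n);
        total += clock() - start;     /* timer after the scaling function */
    }
    return (double)total / CLOCKS_PER_SEC / runs;
}
```

A driver would allocate `in` and `out` with `SAMPLES` elements, call `fill_samples` once, then call `average_seconds` for each algorithm with the same `in` buffer and print the averages next to each `checksum`. Note that `clock()` measures CPU time with limited resolution, so a real benchmark may prefer a higher-resolution timer.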

SPO600 Project Stage 1

For the final project in the SPO600 course, I have to find an open-source project and implement the new ARM SVE2 instructions in it. When it comes to open-source projects, the first place that comes to mind is GitHub, so I decided to look for one there. In Lab 6, I implemented a C program from Lab 5 using SVE2 instructions, and this led me to look for similar open-source projects. As recommended, I should look for an open-source project that is at the library level and processes large data sets. To start my research, I went to the trending tab on GitHub, which showed a huge list (100 pages) of C repositories. I sorted the list by popularity, and I thought I should choose a project that does not have too many stars, because a project with many stars leaves less work to do: many people are already involved in it. Thus, I went to the last pages of the list to fi

SPO600 Lab 4

In Lab 4, I have to write a simple program that prints numbers from 0 to 30 in x86_64 and AArch64 assembly. Unfortunately, I forgot the passphrase of my SSH key, so I could not log in to the server to do the lab. Fortunately, I found a package in Ubuntu's APT repositories that emulates the AArch64 architecture: QEMU. The command to install it: sudo apt-get install qemu-user gcc-aarch64-linux-gnu. With it, I was able to compile and run AArch64 assembly without ARM hardware like a Raspberry Pi 4. Back to the main program, this is my code to print from 0 to 9:

.text
.globl     main
min = 0
max = 10

main:
        mov      x19, min

loop:
        adr      x1, msg
        mov      x2, len
        mov      x0, 1
        mov      x8, 64
        svc      0

        adr      x23, msg
        add      x19, x19, 1
        mov      x20, x19
        add      x20, x20, '0'
        strb     w20, [x23,0]
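The general technique in the assembly loop (print a message, then overwrite one byte of it with the ASCII code of the next digit by adding '0') can be sketched in C as follows. This is an illustration rather than a translation: the message text, `DIGIT_POS`, `set_digit`, and `print_loop` are all hypothetical, and the assembly stores its digit at offset 0 of `msg` while this sketch patches a placeholder in the middle of the string.

```c
#include <stdio.h>

#define MIN 0
#define MAX 10
#define DIGIT_POS 6   /* index of the '#' placeholder in msg (assumed layout) */

/* Hypothetical message; the post does not show the contents of msg. */
static char msg[] = "Loop: #\n";

/* Overwrite the placeholder byte with the ASCII digit for value (0..9).
 * Adding '0' converts a number to its character, as in the assembly. */
static void set_digit(char *buf, int value) {
    buf[DIGIT_POS] = (char)('0' + value);
}

/* Print one line per value in [MIN, MAX) and return the number of lines. */
static int print_loop(FILE *stream) {
    int lines = 0;
    for (int i = MIN; i < MAX; i++) {
        set_digit(msg, i);
        fputs(msg, stream);   /* the assembly uses the write syscall (x8 = 64) */
        lines++;
    }
    return lines;
}
```

Calling `print_loop(stdout)` prints ten lines. Going past 9, as the lab's 0 to 30 range requires, needs a second digit obtained by dividing the counter by 10, which this single-digit sketch does not cover.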

SPO600 Lab 3

For Lab 3, my task is to write a simple program in 6502 assembly that uses the output screen and performs a math operation. Even though it is a simple program with basic tasks, it is not easy to write in assembly language, whereas it would be easy in a modern language, especially since we had only just learned a whole new concept over several classes. Luckily, the lab says: I decided to write a subtraction calculator. Inspired by the adding calculator from my professor's GitHub (https://github.com/ctyler/6502js-code/blob/master/adding-calculator.6502), I investigated how the code works so that I could later modify it into a subtraction calculator. With limited time to learn 6502 assembly and few resources on the internet, I could only understand the main part of the program, but it was enough for me to make an incomplete subtraction calculator. My input process is identical to the source code. The only difference is instead of adding tw