
Posts

Inline Assembler (Lab 7)

Part 1

After being given an inline assembler version of the volume program I made in the last lab, I got some results that shocked me. Running it with the same 500,000,000-sample input took only 1.2 seconds of computing time, which beats even the best variant (bit-shifting) of the program I had made by over 50%. I answered some questions below to further my understanding:

1. What is another way of defining variables instead of the (type name register) format? Ordinary typed variables can be used, since the compiler will automatically place their values into registers.

2. For the line vol_int = (int16_t) (0.5 * 32767.0); should 32767 or 32768 be used? 32767 should be used, because the cast to int16_t truncates the value and 32768 is outside the int16_t range (the maximum is 32767).

3. What does __asm__("dup v1.8h,w22"); do? dup copies the low 16 bits of w22 into each of the eight 16-bit lanes of vector register v1. This sets the value up for SIMD instructions.

4. What happens if we remove : "=r"(in_cursor)
Recent posts

Final Project Part 02 - Final Summary

In conclusion, the -O3 flag was the most important discovery in trying to optimize clib. It offered a significant speedup with no interference, and provided the chance to unify a frequently used function, strdup. Overall the function is built extremely well, with very advanced logic. Attempting to alter that logic produced many errors and warnings, and left only simple compiler optimizations such as loop unrolling, which made small differences in speed. Clib as a whole is a great idea, offering many compartmentalized features for the C programming language that programmers could definitely find useful when developing. I hope in the future I can get more involved in writing code for open source projects such as clib, whether that be doing optimization work or building from the ground up. This project gave me insight not only into how these open source projects work but also into real-world coding and improvement opportunities. I can honestly say now that I have had some experience

Final Project Part 02 - Sha1 Function Enhancements

To try to squeeze out a bit more performance I attempted some compiler-style optimizations by hand. Unfortunately, due to the sheer complexity of the algorithm, I was unable to find other logic to simplify. I tried some loop unrolling so the compiler has a little less work to do, some examples are here below: I made a graph to demonstrate the minute differences this makes in the test vectors below: At most a difference of a few milliseconds can be gained, and this comes only from the finalcount[] array, since the digest array, along with the other for loops in the code, produces errors when not written as a loop. To test this I simply altered the sha1.c code and ran the Makefile to see if the vectors passed or failed. As mentioned, this is a compiler optimization; in other words, the compiler already performs it, especially at the -O3 level where the benchmarking was done. I would not recommend this change to be pushed upstream normally due to the insignificant time change

Final Project Part 02 - Compiler Flags

As mentioned in Part 01 of the project, a very noticeable and easy-to-apply optimization was the -O3 compiler flag, as before there was no optimization done at all. It seemed like the -O3 flag was simply forgotten or overlooked; however, as I attempted to apply the flag I came across some issues that had been hidden by the lack of optimization. When I first added the flag to both the STATIC and #else macros in the Makefile I got this issue in the strdup.h file in deps/strdup: fatal error: expected identifier or '(' before '__extension__' After some research and code manipulation I have discovered there are many workarounds to this problem. For one, I could use a different Makefile for the Sha1 function such as in the development version; however, this seems impractical and is more like a temporary band-aid solution. Secondly, I could alter the strdup function in strdup.h so it does not share the same name as the same function present in string.h which I piece

Final Project Part 01 - Final Summary

To end part 1 I will summarize what information I have gathered for part 2: I am optimizing clib, a package manager. After benchmarking, it is clear there is a significant time delay in some advanced calls of the SHA1 function, such as ones that call update many times. To optimize, I am going to add the -O3 flag and remove a loop condition (currently). Some other observations: This project is relatively small with no ./configure for other platforms. The Sha1 code is unique and does not conform to the simple sha1 algorithm such as on Wikipedia. The documentation (i.e. README) is relatively vague at describing the dependencies. It suggests only syntax that implies installation and isn't clear at documenting development vs. published code. I have learned a lot getting to this point in part 1. Firstly, I learned that library files can only be linked by using sudo ldconfig and the files must be in usr/lib. Secondly, I learned how to alter an advanced Makefile's fla

Final Project Part 01 - Benchmarking and how to Optimize the Sha1 Function in clib

To benchmark the sha1test file I copied the flags from the standard Makefile of clib and placed them in the local dev Makefile of sha1. It yielded the following results after I used clock to time the functions in each test: The revised timing code can be found here for personal testing. Now we can see the majority of the time is spent in Test 6, so we are going to apply our optimization and hopefully see a change in the speed of that test. The current cflags of the standard clib Makefile are as follows: -DCURL_STATICLIB -std=c99 -Ideps -Wall -Wno-unused-function -U__STRICT_ANSI__ $(shell deps/curl/bin/curl-config --cflags) I plan on adding the -O3 optimization here. The challenge will be either to ensure that this does not affect any other dependency or to make a special case in the Makefile for only the SHA1 function. I also plan to take out the if condition below in the sha1.c file and put the if case in another loop.    This appears to be the only optimi