SPO Lab 7 (Part A – Part B) Fun with Inline Assembly!

So far in this class, we have worked with C and with assembly separately. Now we get to use them together! That’s right, we get to use inline assembly in our C code! This was a hard lab, but I feel better for having done it.

Part A:

The task: convert one of our C functions into assembly. That seemed kind of daunting and aggravating, because now you have to convert lovely C code into assembly…

So we turned this for loop:

for(p = output; p < output + sizeof(int16_t) * SIZE;){
  idx = (unsigned short) data[i];
  res.sum += output[i] = table[idx];
  p = table + i;
}

Into this:

for(p = output; p < output + sizeof(int16_t) * SIZE;){
 __asm__ ("LD1 {v0.8h}, [%0]; \
 DUP v1.8h, %w1; \
 SQDMULH v0.8h, v0.8h, v1.8h; \
 ST1 {v0.8h}, [%0]"
 : //no output
 : "r"(p),"r"(volint) //register holding pointer (refer as %0), then volint register (refer as %1)
 : "v0", "v1", "memory" //clobbers: both vector registers, plus the memory written by ST1
 );
 p += 16;
}
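
For anyone wondering what SQDMULH actually computes: on each of the eight 16-bit lanes it multiplies the sample by the volume factor, doubles the product, keeps the top 16 bits, and saturates on overflow. Here is a plain C sketch of that per-lane math. The names sqdmulh_lane and scale_samples are made up for illustration (they are not from the lab code), and the right shift assumes the compiler shifts negative numbers arithmetically, which GCC and clang do.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: models one 16-bit lane of SQDMULH,
   i.e. saturate((2 * a * b) >> 16). */
static int16_t sqdmulh_lane(int16_t a, int16_t b) {
    /* The doubled product only overflows when both inputs are -32768,
       and in that case the instruction saturates to the largest value. */
    if (a == INT16_MIN && b == INT16_MIN) {
        return INT16_MAX;
    }
    int32_t product = (int32_t)a * (int32_t)b;
    /* (2 * product) >> 16 is the same as product >> 15.
       Assumes an arithmetic right shift for negative values. */
    return (int16_t)(product >> 15);
}

/* Hypothetical helper: scales n samples in place, one at a time. */
static void scale_samples(int16_t *samples, size_t n, int16_t volint) {
    for (size_t i = 0; i < n; i++) {
        samples[i] = sqdmulh_lane(samples[i], volint);
    }
}
```

With a volint of 24575 (roughly 0.75 in this fixed-point format), a sample of 1000 comes out as 749 and a sample of -1000 as -750, since the result is rounded down rather than toward zero.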

Our code turned out to be a lot faster than the normal version. We tested with 1 million data values, and while the normal version took 15 sec, the inline assembly took 1.

I believe inline assembly is very useful and can get more out of a processor, but the main problem is that you need to write and tune it separately for each architecture, which can take a lot more time than just writing it in C.
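
One way to limit that cost is to keep a plain C fallback next to the assembly and pick between them at compile time. Below is a rough sketch of that idea, not our actual lab code: scale8 is a made-up name, the AArch64 branch reuses the four instructions from Part A, and the fallback assumes an arithmetic right shift on negative numbers (true for GCC and clang).

```c
#include <stdint.h>

/* Hypothetical helper: scales exactly 8 samples in place by volint. */
static void scale8(int16_t *p, int16_t volint) {
#if defined(__GNUC__) && defined(__aarch64__)
    /* AArch64 only: one SIMD multiply covers all 8 lanes. */
    __asm__ volatile(
        "ld1 {v0.8h}, [%0]\n\t"
        "dup v1.8h, %w1\n\t"
        "sqdmulh v0.8h, v0.8h, v1.8h\n\t"
        "st1 {v0.8h}, [%0]"
        : /* no outputs */
        : "r"(p), "r"(volint)
        : "v0", "v1", "memory");
#else
    /* Everything else: same math in portable C, one lane at a time. */
    for (int i = 0; i < 8; i++) {
        if (p[i] == INT16_MIN && volint == INT16_MIN) {
            p[i] = INT16_MAX; /* the one case where SQDMULH saturates */
        } else {
            p[i] = (int16_t)(((int32_t)p[i] * volint) >> 15);
        }
    }
#endif
}
```

The caller only ever sees scale8, so the assembly stays in one place, and a machine that is not AArch64 still gets a working (just slower) build.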

Part B:

For this part of the lab, we had to find an open source project that uses inline assembly. I chose mosh, which only uses inline assembly once in the entire project.

Here is the code:

#if __GNUC__ && !__clang__ && __arm__
 static inline block double_block(block b) {
 __asm__ ("adds %1,%1,%1\n\t"
 "adcs %H1,%H1,%H1\n\t"
 "adcs %0,%0,%0\n\t"
 "adcs %H0,%H0,%H0\n\t"
 "it cs\n\t"
 "eorcs %1,%1,#135"
 : "+r"(b.l), "+r"(b.r) : : "cc");
 return b;
 }
#else
 static inline block double_block(block b) {
 uint64_t t = (uint64_t)((int64_t)b.l >> 63);
 b.l = (b.l + b.l) ^ (b.r >> 63);
 b.r = (b.r + b.r) ^ (t & 135);
 return b;
 }
#endif
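
To see what double_block does with actual numbers, here is a self-contained copy of the portable version. The block struct here is my own stand-in with two 64-bit halves (l as the high half, r as the low half, which is how the C code above treats them); I did not copy mosh's real definition.

```c
#include <stdint.h>

/* Stand-in type (an assumption): l = high 64 bits, r = low 64 bits. */
typedef struct { uint64_t l, r; } block;

static inline block double_block(block b) {
    /* t is all ones if the top bit of the block is set, else zero. */
    uint64_t t = (uint64_t)((int64_t)b.l >> 63);
    /* Shift the high half left, pulling in the top bit of the low half. */
    b.l = (b.l + b.l) ^ (b.r >> 63);
    /* Shift the low half left; fold in 135 if a bit fell off the top. */
    b.r = (b.r + b.r) ^ (t & 135);
    return b;
}
```

So {l = 0, r = 1} doubles to {0, 2}, a set top bit in r carries over into l, and a set top bit in l falls off the end and comes back as 135 in r.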

Questions about this code:

  • How much assembly-language code is present
    • Not a lot: it is only used once in the project, in a single block of six instructions
  • Which platform(s) it is used on
    • The assembly is for 32-bit ARM processors, and it is only used when building with GCC (not clang)
  • Why it is there (what it does)
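    • It is a hand-written ARM version of double_block, which shifts a 128-bit block left by one bit and XORs in 135 if a bit was carried out of the top. The assembly does this with a chain of add-with-carry instructions instead of the shifts in the C version, presumably to make it faster on ARM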
  • What happens on other platforms
    • On other platforms, it just runs the C version of the code in the else branch
  • Your opinion of the value of the assembler code VS the loss of portability/increase in complexity of the code.

I feel like if you write assembly for every processor, you make the project more challenging and more complex for the people developing it.

