Challenge RE #7

Introduction

As you may have noticed from my previous posts, I’ve been poisoning myself with small doses of assembly language and C, the best combination for a fast effect 😁. Soon I will be immune to it. To be honest, I’ve been enjoying the challenges because all of them are accessible, in the sense David Hilbert described:

A mathematical problem should be difficult in order to entice us, yet not completely inaccessible, lest it mock at our efforts. It should be to us a guide post on the mazy paths to hidden truths, and ultimately a reminder of our pleasure in the successful solution.

A challenge should be enjoyable in general, without frustrating you. I wanted to point that out because the author of these challenges, Dennis Yurichev, did a great job on them.

Enough talk. Let’s look at the 7th challenge. The assembly code to understand is the following:

<f>:
   0:                movzx  edx,BYTE PTR [rdi]
   3:                mov    rax,rdi
   6:                mov    rcx,rdi
   9:                test   dl,dl
   b:                je     29 
   d:                nop    DWORD PTR [rax]
  10:                lea    esi,[rdx-0x41]
  13:                cmp    sil,0x19
  17:                ja     1e 
  19:                add    edx,0x20
  1c:                mov    BYTE PTR [rcx],dl
  1e:                add    rcx,0x1
  22:                movzx  edx,BYTE PTR [rcx]
  25:                test   dl,dl
  27:                jne    10 
  29:                repz ret

Analysis

The first four instructions suggest that rdi points to a string: the first byte is loaded, zero-extended, into edx, and if it is zero (the '\0' terminator) we jump straight to the end of the function. The pointer is also copied into rcx, which is used to walk the string, and into rax, which hints that f returns it. Let’s keep describing the program before settling on a complete signature for f.

Next, we see the following instructions:

lea esi,[rdx-0x41]
cmp sil,0x19
ja 1e

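Something interesting here is the lea esi,[rdx-0x41] instruction, which computes the character code minus 0x41 without modifying rdx. Why 0x41? Because 0x41, or 65, is the ASCII code for the character 'A'. The cmp then compares the low byte of that result (sil) with 0x19, or 25, which is exactly 'Z' - 'A', and ja is an unsigned "jump if above". Combining these instructions, what we are checking is whether the character is NOT between 'A' and 'Z': for an uppercase letter the difference falls in the range 0..25, any byte below 'A' wraps around to a large unsigned value, and any byte above 'Z' gives more than 25. So everything that is not an uppercase letter of the English alphabet (lowercase letters, digits, punctuation and so on) is left untouched, and in that case we jump to offset 1e.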
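
The single-comparison trick can be checked in C. The sketch below is only an illustration: the helper names are hypothetical and not part of the challenge, and it assumes an ASCII character set. It compares the usual two-sided test with the subtract-and-compare form over every possible byte value.

```c
#include <assert.h>

/* Hypothetical helper: the range test written the obvious way. */
int is_upper_two_compares(unsigned char c)
{
    return c >= 'A' && c <= 'Z';
}

/* Hypothetical helper: the form seen in the listing. Subtract 0x41 ('A'),
   keep only the low byte (like sil), and compare once, unsigned, with 0x19. */
int is_upper_one_compare(unsigned char c)
{
    return (unsigned char)(c - 0x41) <= 0x19;
}

/* Assert that both helpers agree on all 256 byte values. */
void check_all_bytes(void)
{
    for (int c = 0; c <= 255; c++) {
        assert(is_upper_two_compares((unsigned char)c) ==
               is_upper_one_compare((unsigned char)c));
    }
}
```

The cast to unsigned char is what makes bytes below 'A' fail the test: their negative difference wraps around to a value well above 0x19.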

At that offset we have the following instructions:

add rcx,0x1
movzx edx,BYTE PTR [rcx]
test dl,dl
jne 10

These advance to the next character in the string and continue the loop as long as that character is not the '\0' terminator.

The last instructions to analyze are the following:

add edx,0x20
mov BYTE PTR [rcx],dl

Remember that we only reach this point when the previous check did not jump, that is, when the character is inside the 'A'..'Z' range, so what we have here must be an uppercase letter. When we add 0x20 to an uppercase letter we get its corresponding lowercase letter, and the result is written back into the string in place.

For example:

The ASCII code of 'A' is 65; after adding 0x20 (32) it becomes 97, which is indeed the ASCII code of 'a'.
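
The same arithmetic holds for the whole alphabet, because in ASCII every lowercase letter sits exactly 0x20 above its uppercase counterpart. A minimal sketch, assuming ASCII; the helper names are made up for illustration:

```c
#include <assert.h>

/* Hypothetical helper: what `add edx,0x20` does to one uppercase letter. */
char add_0x20(char upper)
{
    return (char)(upper + 0x20);
}

/* Walk 'A'..'Z' and assert that each result is the matching lowercase letter. */
void check_alphabet(void)
{
    for (int i = 0; i < 26; i++) {
        assert(add_0x20((char)('A' + i)) == (char)('a' + i));
    }
}
```

For uppercase letters the 0x20 bit is always clear, so the addition has the same effect as setting that bit with `| 0x20`.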

With this we already know what the code does: it lowercases the provided string in place. The code in C would look like this:

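char *f(char *str)
{
    char *p = str;

    if (*p == '\0')
        return str;

    while (*p != '\0') {
        // unsigned comparison, like `ja`: skip anything outside 'A'..'Z'
        if ((unsigned char)(*p - 0x41) > 0x19) {
            p++;
            continue;
        }

        // lowercase the character, since it is an uppercase Latin letter
        *p += 0x20;
        p++;
    }

    // rax holds the original pointer, so the function returns it
    return str;
}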

This can be written more compactly as:

char *f(char *str)
{
    for (char *p = str; *p != '\0'; p++) {
        // the cast makes the comparison unsigned, as `ja` is
        if ((unsigned char)(*p - 0x41) <= 0x19) {
            *p += 0x20;
        }
    }
    return str;
}
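
To double-check the reconstruction, the listing can also be transliterated almost instruction by instruction. The following is a sketch, not the original source of the challenge: the names f_like_asm and check_f_like_asm are made up for illustration, the local variables are named after the registers they stand for, and the return value rests on the assumption that rax, which is loaded from rdi and never modified, is what the function returns.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical transliteration of the listing; comments show the instruction
   each statement stands for. */
char *f_like_asm(char *rdi)
{
    unsigned char dl = (unsigned char)*rdi;   /* movzx edx,BYTE PTR [rdi] */
    char *rcx = rdi;                          /* mov   rcx,rdi */

    if (dl == 0)                              /* test dl,dl / je 29 */
        return rdi;

    do {
        unsigned char sil = (unsigned char)(dl - 0x41);  /* lea esi,[rdx-0x41] */
        if (sil <= 0x19) {                    /* cmp sil,0x19 / ja 1e skips this */
            dl = (unsigned char)(dl + 0x20);  /* add edx,0x20 */
            *rcx = (char)dl;                  /* mov BYTE PTR [rcx],dl */
        }
        rcx++;                                /* add rcx,0x1 */
        dl = (unsigned char)*rcx;             /* movzx edx,BYTE PTR [rcx] */
    } while (dl != 0);                        /* test dl,dl / jne 10 */

    return rdi;                               /* mov rax,rdi ... ret */
}

/* Small self-check on a string mixing letters, digits and punctuation. */
void check_f_like_asm(void)
{
    char s[] = "Hello, World! 123";
    assert(f_like_asm(s) == s);
    assert(strcmp(s, "hello, world! 123") == 0);
}
```

The early test followed by a do-while loop also explains why test dl,dl appears twice in the listing: once as a guard for the empty string and once as the loop condition.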

Conclusion

That’s it!! The code lowercases an ASCII string in place.