Challenge RE #11

Oops! I’ve been complaining about the past challenges. Well, this one is way bigger than the others. As always, the description is on the original website of the challenge.

The assembly code to understand is quite large compared to our previous challenges, so it’s better to take a look at it on the official website; here is a link to it.

Analysis

Given that this code contains more than one function and is fairly large compared to our previous challenges, we are not going to reproduce the exact C code this time. That is actually fine: ideally, what you have to do is understand the program. For that reason, this time we are going to solve the challenge by creating a flow chart of the program and providing a one-paragraph description of what it does. Let’s start with the helper function.

Helper function

As always, let’s start by breaking the code into pieces. First we encounter a helper label that contains exactly that: a helper. It’s kind of unconventional, since it performs several operations with specific constants. In the first 4 instructions we have the comparison rdi-48 <= 9. If the condition is met, the helper returns with the value 1 in eax. Otherwise we perform another comparison, this time (edi & -33) - 65 <= 5; if that condition holds, al is set to 1, which makes us return 1 again. The helper returns 0 only when neither of these two conditions holds.

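Now, what the heck does this helper do? Let’s go back to the ASCII table: you will notice that 48 is the code for 0. So in the first comparison we are checking whether we got a digit (assuming the comparison is unsigned, so that codes below 48 wrap around to a large value and fail). The second condition is trickier, especially the edi & -33 part. What’s that for? In two’s complement -33 is 0xFFFFFFDF, so the AND clears bit 0x20, which maps a lowercase ASCII letter onto its uppercase counterpart: 'a' (97) becomes 'A' (65). The following operation subtracts 65, which is A. So this condition checks whether the ASCII code is one of the letters A, B, C, D, E or F, in either case.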

Yes! It basically checks whether the char being passed is a hex digit: 0 to 9 or A to F.

Let’s rename this helper to is_hex_char.
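To double-check the reading of those two comparisons, here is a minimal C sketch of the helper. It is not a transcription of the assembly: the unsigned casts are an assumption (they stand in for an unsigned "below or equal" jump), and only the constants 48, 9, -33, 65 and 5 come from the listing.

```c
/* Sketch of the helper in C. The unsigned casts are an assumption:
   they mirror an unsigned comparison in the assembly. */
static int is_hex_char(int c)
{
    /* '0'..'9': c - 48 lands in 0..9; anything below '0' wraps
       around to a huge unsigned value and fails the test. */
    if ((unsigned)(c - 48) <= 9u)
        return 1;

    /* c & -33 clears bit 0x20, folding 'a'..'z' onto 'A'..'Z';
       subtracting 65 ('A') then leaves 0..5 for A..F. */
    return (unsigned)((c & -33) - 65) <= 5u;
}
```

With this version, characters sitting right next to the accepted ranges, such as '/', ':', '@', 'G' and 'g', are all rejected.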

f signature

The function f starts with the typical save of registers onto the stack, so that their values can be restored before returning and the caller finds them intact. That means these registers are going to be modified inside f.

There seem to be only two arguments, one in rsi and another one in rdi. The argument in rdi is a C string, given the use of the repnz scasb instruction, which compares the value in al against each byte of the string at rdi. The value in al at this point is 0, so it’s basically scanning for the null terminator of the string. After this operation the length of the string can be recovered from the rcx counter. The returned value is just an int. With this we can infer that the signature of f is as follows:

int f(char *str, char *str2);
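The repnz scasb scan mentioned above is the classic inline strlen idiom. As a rough sketch, it can be emulated in C as shown below. Two things here are assumptions rather than quotes from the listing: that rcx starts at -1, and that a not/dec pair follows the scan. The name scasb_strlen is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Emulates the "repnz scasb" strlen idiom step by step.
   Variable names mirror the registers involved. */
static size_t scasb_strlen(const char *str)
{
    uint64_t rcx = UINT64_MAX;  /* counter starts at -1 (assumed)  */
    const char *rdi = str;      /* scan pointer                    */
    const char al = 0;          /* byte we are searching for       */

    /* repnz scasb: compare al with *rdi, advance rdi, decrement
       rcx, and stop once the bytes match or rcx reaches 0. */
    while (rcx != 0) {
        rcx--;
        if (*rdi++ == al)
            break;
    }

    /* rcx is now -(len + 2); "not rcx" then "dec rcx" gives len. */
    return (size_t)(~rcx - 1);
}
```

For "abc" the scan visits four bytes (including the terminator), leaving rcx at -5; inverting gives 4 and the decrement gives 3.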

Main flow

At the start of f we perform what amounts to a strlen call on the first string passed. This gives us the hint that we will iterate over this string and not the second one. To confirm our suspicion we can look for loops in f. This is easily confirmed at label .L12, where we jump to .L6, which starts with a check like this one:

cmp rbx, r14
ja .L24
;; .L24 starts end of the program

We can see that we are indeed iterating over this string, so we have a loop. Inside the loop we have a couple of conditional checks. The first one checks whether the char we are currently reading is equal to '+'; this can be found in the following instructions:

    cmp eax, 43 ;; '+'
    jne .L7
;; ...
.L7:
    cmp eax, 37 ;; '%'
    jne .L8
;; ...
.L8:
    ;; ...

At label .L8 the current char is copied into the second string, and we then move on to the next char, or skip over the next 2 chars, of the first string. With this in mind, we can assume that the second string will contain our answer.

Following all these jumps, I got the following flow

Flow

Now, with a broad view of the program, we can make a guess at what it does.

Formal description

The program scans the string provided in the first argument, replacing occurrences of the %<hex number> format with the hex number supplied. Where a '+' is present, a space is written in its place. The resulting string is stored in the second string passed, which is assumed to have enough space for it.

I would say this behaves a bit like scanf restricted to hexadecimal numbers, although turning '+' into a space and expanding %<hex number> looks even more like URL (percent) decoding.
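To make the description above concrete, here is a rough C sketch of the flow. It is a guess at the behaviour, not a translation of the assembly. Several details are assumptions that the analysis does not confirm: that the escape is exactly two hex digits (suggested by the "skip 2 chars" step), that f returns the number of bytes written, that the output gets a terminating '\0', and that a '%' not followed by two hex digits is copied as is. The hex_value helper is made up for illustration.

```c
#include <stddef.h>

/* Same test as the helper analysed earlier: a digit, or A-F in
   either case. */
static int is_hex_char(int c)
{
    return (unsigned)(c - 48) <= 9u
        || (unsigned)((c & -33) - 65) <= 5u;
}

/* Value of a character already known to be a hex digit
   (hypothetical helper, not present in the listing). */
static int hex_value(int c)
{
    return (c <= '9') ? c - '0' : (c & -33) - 'A' + 10;
}

/* Hypothetical sketch of f. Assumed, not confirmed: two-digit
   escapes, the return value, the final '\0', and the handling
   of a malformed '%'. */
int f(char *str, char *str2)
{
    size_t len = 0, i = 0;
    int out = 0;

    while (str[len] != '\0')    /* the strlen done at the top of f */
        len++;

    while (i < len) {
        unsigned char c = (unsigned char)str[i];

        if (c == '+') {
            str2[out++] = ' ';              /* '+' becomes a space */
            i += 1;
        } else if (c == '%' && i + 2 < len
                   && is_hex_char((unsigned char)str[i + 1])
                   && is_hex_char((unsigned char)str[i + 2])) {
            /* %XY becomes the byte with value 0xXY */
            str2[out++] = (char)(hex_value(str[i + 1]) * 16
                                 + hex_value(str[i + 2]));
            i += 3;                         /* skip the 2 hex chars */
        } else {
            str2[out++] = (char)c;          /* plain copy (.L8)     */
            i += 1;
        }
    }

    str2[out] = '\0';
    return out;
}
```

Under these assumptions, the input "a+b%20c%41" would produce "a b cA" in the second string.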

Conclusion

For large assembly listings it’s highly recommended not to translate the code into C by hand. A better approach is to just try to come up with the flow. Of course, I’m aware of the existence of tools like Ida64 that will do this for you, but for that you’ll need the binary. Again, the goal of these challenges is precisely to get you used to reading and understanding assembly language. Later, when we have an actual binary, we will make use of Ida for sure (the free version; I’m poor and unemployed at this time 😅).