Challenge #18

This challenge has more code, but according to its description it is not a big deal. Here is the short description, which you can always find here:

Now this is easy. Keep in mind, the code is 64-bit, because it uses 64-bit value(s). That is why I’ve omitted code fragments for 32-bit ARM and MIPS. So what does it do?

As always with large listings, I will avoid displaying the whole code in this article, because large chunks of assembly can cause brain damage. I bet there’s a “legit” study about that…academy. You can also find the full code example on the original website here.

Analysis

As always, let’s break this code into blocks in order to understand it better. The first block that I’m going to analyze is this one:

    ;; check if the string in rdi
    ;; has length 36
	sub	rsp, 24
	mov	QWORD PTR [rsp], r8
	mov	QWORD PTR [rsp+8], r9
	call	strlen
	cmp	rax, 36
	mov	edx, -1
	jne	.L28 ;; EXIT PROGRAM

	mov	r12, rbx ;; copy rdi string into r12
	xor	ebp, ebp ;; ebp = 0
	jmp	.L35

I added some annotations to the block. Here we have a call to strlen, the famous C function that returns the length of a string (assuming it’s \0-terminated). If our string’s length is not 36, we jump to tag .L28, which is the exit of the program, returning -1. This counts as the failure case, so on failure we return -1.

Another thing to identify here is the presence of a loop between tags .L33 and .L42. What does this loop iterate over? What is the stop condition? It’s easy to notice that we are iterating over the string whose pointer we just copied into the r12 register. Take a look at this snippet:

;; START OF A LOOP
;; ------------------>
.L33:
    ;; if char is not a hex digit, end program with -1
	movsx	edi, BYTE PTR [r12]
	call	isxdigit
	test	eax, eax
	je	.L37 ;; EXIT PROGRAM WITH -1

;; ... continuation other tags come between these two

.L42:
    ;; check whether the current char is the terminating '\0'
	cmp	BYTE PTR [r12], 0
	jne	.L33
    ;; <<<<<<<<<<<<<<<<<<<
    ;; END OF A LOOP
    ;; <<<<<<<<<<<<<<<<<<<

As you may have noticed, at the end of tag .L42 we have a check for the amazing \0 null terminator. Also, at the start of tag .L33, we load one character from the string pointed to by r12. This gives us the certainty that we are iterating over the string in r12.

Another point to notice here is the call to isxdigit, which checks whether the current character is a hexadecimal digit. If it is not, we finish the program with -1 as well. Given that this happens inside a loop, we can infer that each character in the string should be a valid hexadecimal digit; otherwise we exit the program with failure code -1.
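
In C terms, the check inside the loop might look like the sketch below. The helper name all_hex_digits is made up for illustration and does not come from the original code; it only mirrors the isxdigit / test eax, eax / je pattern, where a zero return value means "not a hex digit".

```c
#include <ctype.h>

/* Hypothetical helper (not in the original code): returns 0 when every
 * character of s is a hex digit, and -1 as soon as one is not. */
int all_hex_digits(const char *s)
{
    for (; *s != '\0'; s++) {
        /* isxdigit returns 0 for non-hex characters; the cast avoids
         * undefined behavior for negative char values. */
        if (!isxdigit((unsigned char)*s))
            return -1;
    }
    return 0;
}
```

This is deliberately simpler than the real loop, which also treats some positions specially, as the rest of the analysis shows.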

We haven’t analyzed everything yet, but we already have an idea: this program is checking whether all the characters in the string are hex digits. Cool, let’s continue.

The loop

The body of the loop gives us some insights as well. For example, tag .L32 contains the increments of the loop: here you can see that we have a counter in the ebp register, and of course we also advance the string pointer in r12.

.L32:
    ;; increase counter and string pointer position
	add	ebp, 1
	add	r12, 1
	cmp	ebp, 37
	je	.L34
.L35:
	cmp	ebp, 8
	jne	.L43
.L29:
    ;; if the char is a hyphen, continue
	cmp	BYTE PTR [r12], 45 ;; '-' in ASCII, the hyphen separator
	je	.L32
.L37:
    ;; if we get here the program will exit with -1
	mov	edx, -1

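Another operation here is the comparison of our counter in ebp with 37. Remember that our string’s length must be 36, and keep in mind that C strings end with a null terminator, so counter values 0 through 36 cover the 36 characters plus the terminator. When the counter reaches 37 we leave the loop and jump to tag .L34.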

We can also see that at tag .L29 we check for the '-' character and skip over it if we find it, literally jumping to the increment steps. If the character is anything else, we fall through to tag .L37 and exit with -1.

Now here’s something interesting as well: at tag .L35 we compare our counter with 8. If it is equal we fall through to the '-' check at tag .L29; if it’s not equal we jump to tag .L43, which performs several more checks on our counter in ebp. The checks can be summarized this way: if the counter is 13, 18 or 23, check for '-' at tag .L29; if the counter is 36, we are at the end of the string. The default case is just to read another character from our string at tag .L33.
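
Put together, the loop described so far could be sketched in C roughly as follows. This is a reconstruction from the fragments shown in this article, not the original source: the function name check_layout is an assumption, and the body of tag .L43 is not listed here, so the handling of position 36 is my reading of the description above.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical reconstruction of the validation loop.
 * Returns 0 if the layout is accepted, -1 otherwise. */
int check_layout(const char *s)
{
    if (strlen(s) != 36)
        return -1;                      /* jne .L28 with edx = -1 */

    const char *p = s;                  /* plays the role of r12 */
    for (int i = 0; i != 37; i++, p++) {    /* i plays the role of ebp */
        if (i == 8 || i == 13 || i == 18 || i == 23) {
            if (*p == '-')
                continue;               /* .L29 jumps to .L32 */
            return -1;                  /* fall through to .L37 */
        }
        if (i == 36 && *p == '\0')
            continue;                   /* terminator reached (.L42) */
        if (!isxdigit((unsigned char)*p))
            return -1;                  /* .L33 jumps to .L37 */
    }
    return 0;
}
```

The counter runs from 0 to 36 inclusive, which is why the assembly compares it with 37 before leaving the loop.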

The first question that comes to my mind is: why 8, 13, 18 and 23? That question is easy to answer when you take a look at tag .L34. I’ll add it here as well, with some comments. In this part of the code, we use strtoul (and strtoull for the last chunk) to convert chunks of the string from their hexadecimal string representation to unsigned integers in C. The string is divided into 5 chunks, starting at offsets 0, 9, 14, 19 and 24, with lengths 8, 4, 4, 4 and 12. The results of these conversions are stored through the pointers held in r15, r14 and r13, and through rcx, which is loaded in turn from the two values saved on the stack at [rsp] and [rsp+8].

;; ANALYZING ON 5 STEPS
.L34:
    ;; rbx from 0 to 36
    ;; ------------------------------------------------------------------------------------------------------------------|
    ;;          8                   4                       4                       4                       12           |
    ;; rbx --- rbx + 8| rbx + 9 ---- rbx + 13 | rbx + 14 ---- rbx + 18 | rbx + 19 ---- rbx + 23 | rbx + 24 ---- rbx + 36 |
    ;; ------------------------------------------------------------------------------------------------------------------|

    ;; convert the initial string (pointer in rbx) into a number
	mov	edx, 16 ;; 3rd argument, base of number to convert
	xor	esi, esi ;; 2nd argument, esi is passed as NULL, we don't need the address of the first invalid char
	mov	rdi, rbx ;; 1st argument
	call	strtoul

	lea	rdi, [rbx+9] ;; 1st arg
	mov	edx, 16 ;; 3rd arg
	xor	esi, esi ;;  2nd arg
	mov	DWORD PTR [r15], eax ;; value of number just converted on previous call
	call	strtoul

	lea	rdi, [rbx+14] ;; 1st arg
	mov	edx, 16 ;; 3rd arg
	xor	esi, esi ;; 2nd arg
	mov	WORD PTR [r14], ax ;; value of previous
	call	strtoul

	lea	rdi, [rbx+19] ;; 1st arg
	mov	edx, 16 ;; 3rd arg
	xor	esi, esi ;; 2nd arg
	mov	WORD PTR [r13+0], ax ;; value of previous call
	call	strtoul

	mov	rcx, QWORD PTR [rsp]
	lea	rdi, [rbx+24] ;; 1st arg
	mov	edx, 16 ;; 3rd arg
	xor	esi, esi ;; 2nd arg
	mov	WORD PTR [rcx], ax ;; value of previous call
	call	strtoull

	mov	rcx, QWORD PTR [rsp+8]
	xor	edx, edx ;; edx = 0, the success value (the failure path sets it to -1)
	mov	QWORD PTR [rcx], rax
	jmp	.L28
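
The five conversions at tag .L34 can be sketched in C as shown below. The struct fields and the function name convert_fields are assumptions made for this illustration: the listing only shows five separate output pointers (r15, r14, r13 and the two saved at [rsp] and [rsp+8]), not how the caller declares them. The field widths follow the DWORD, WORD and QWORD stores in the assembly.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container for the five outputs; the original code
 * writes through five independent pointers instead. */
struct fields {
    uint32_t a;   /* offset 0,  8 hex digits, DWORD store */
    uint16_t b;   /* offset 9,  4 hex digits, WORD store  */
    uint16_t c;   /* offset 14, 4 hex digits, WORD store  */
    uint16_t d;   /* offset 19, 4 hex digits, WORD store  */
    uint64_t e;   /* offset 24, 12 hex digits, QWORD store */
};

/* Assumes s was already validated by the loop analyzed earlier. */
void convert_fields(const char *s, struct fields *out)
{
    /* endptr is NULL in every call, matching xor esi, esi */
    out->a = (uint32_t)strtoul(s, NULL, 16);
    out->b = (uint16_t)strtoul(s + 9, NULL, 16);
    out->c = (uint16_t)strtoul(s + 14, NULL, 16);
    out->d = (uint16_t)strtoul(s + 19, NULL, 16);
    out->e = (uint64_t)strtoull(s + 24, NULL, 16);
}
```

Note that strtoul stops at the first character that is not valid in base 16, so each call consumes only its own chunk and stops at the following '-'. That is why the code can pass pointers into the middle of the same string without copying the chunks out first.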

Flow

Let’s put this into a flow diagram; it will be easier to understand.

Flow Diagram

Looking at the flowchart, what we have is easier to understand. We are iterating over a string with this format: <8 hex digits>-<4 hex digits>-<4 hex digits>-<4 hex digits>-<12 hex digits>. Now we have it!!! This is the format of a UUID, for example 45ab0c9c-873f-422e-963c-27d13c3fdac9.

Formal description

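We are checking whether the provided string has the UUID format and, if it does, converting its five hexadecimal fields into numbers that are stored through the output pointers. The result is 0 on success and -1 on failure.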

Conclusion

Quite interesting how things are implemented in assembly.