
Shilong ZHAO

Understanding Linux Processes, Memories and Pages

2015-06-27 00:00:00 +0200


A Linux process is represented by the structure struct task_struct, defined in <linux/sched.h>. This structure is called the process descriptor and contains all the information about a specific process. Its state field, for example, records the current state of the process.

struct task_struct {
	volatile long state;    /* -1 unrunnable, 0 runnable, >0 stopped */
	/* ... */
	struct mm_struct *mm, *active_mm;
	/* ... */
};
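User-space programs cannot read task_struct directly, but Linux exposes a summary of each task's state in /proc/<pid>/status. The sketch below is a hypothetical helper, current_state, that reads the "State:" line for the calling process. The translation from the kernel's state value to a letter such as R or S is done by the kernel, not by this code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the one-letter state of the calling process
 * as reported by /proc/self/status ('R' running, 'S' sleeping, ...),
 * or '?' if the file cannot be read or parsed. */
char current_state(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return '?';

    char line[256];
    char state = '?';
    while (fgets(line, sizeof line, f) != NULL) {
        /* The line looks like "State:\tR (running)". */
        if (strncmp(line, "State:", 6) == 0) {
            if (sscanf(line + 6, " %c", &state) != 1)
                state = '?';
            break;
        }
    }
    fclose(f);
    return state;
}
```

A process that reads its own status file is, by definition, running at that moment, so current_state() returns 'R' when called from the process itself.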

The field of interest here is struct mm_struct *mm, which points to the memory descriptor. The memory descriptor is defined in <linux/mm_types.h> and is what the kernel uses to represent a process's address space.

Recall that Linux is a virtual memory operating system. The memory descriptor therefore describes the virtual memory that the corresponding process can address and access.

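struct mm_struct {
	struct vm_area_struct *mmap;            /* list of VMAs */
	struct rb_root mm_rb;                   /* red-black tree of the same VMAs */
	/* ... */
	pgd_t *pgd;                             /* top-level page table (page global directory) */
	/* ... */
};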
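Because each process has its own memory descriptor, the same virtual address can hold different data in two processes. The sketch below illustrates this with fork(): the child writes to a global variable and reports back the address and value it saw. The helper fork_and_compare and the variable counter are hypothetical names chosen for this example.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 1;   /* lives in the data region of the address space */

/* Hypothetical helper: fork a child that sets counter to 42 and sends back
 * the address and value it observed. Returns 0 on success, -1 on error. */
int fork_and_compare(uintptr_t *child_addr, int *child_value, int *parent_value)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                       /* child: works on its own address space */
        close(fds[0]);
        counter = 42;                     /* changes only the child's copy */
        uintptr_t addr = (uintptr_t)&counter;
        int value = counter;
        int ok = write(fds[1], &addr, sizeof addr) == (ssize_t)sizeof addr &&
                 write(fds[1], &value, sizeof value) == (ssize_t)sizeof value;
        _exit(ok ? 0 : 1);                /* _exit skips stdio flushing inherited from the parent */
    }

    close(fds[1]);                        /* parent: read what the child reported */
    ssize_t n1 = read(fds[0], child_addr, sizeof *child_addr);
    ssize_t n2 = read(fds[0], child_value, sizeof *child_value);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    if (n1 != (ssize_t)sizeof *child_addr || n2 != (ssize_t)sizeof *child_value)
        return -1;
    *parent_value = counter;              /* still 1: the child's write was private */
    return 0;
}
```

Both processes use the same virtual address for counter, yet the parent still sees 1. After fork() the child gets its own memory descriptor, initially a copy-on-write duplicate of the parent's, so writes in one process do not show up in the other.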

The memory descriptor contains a list of virtual memory areas (VMAs), headed by the field struct vm_area_struct *mmap; struct vm_area_struct is also defined in <linux/mm_types.h>. A list is needed because a process's address space is not one contiguous range: it consists of several separate regions, such as the program code, data, heap, stack and memory-mapped files. Each VMA describes one contiguous interval of addresses.
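The VMAs of a running process can be observed from user space through /proc/self/maps, where each line describes one mapping (roughly one VMA) and begins with its start and end addresses in hexadecimal. The sketch below is a hypothetical helper, find_mapping, that finds the line whose range contains a given address; it is meant to show that code, data and stack live in separate regions, not to mirror any kernel function.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find the mapping in /proc/self/maps whose range
 * [start, end) contains addr. Returns 0 and fills *start/*end on success,
 * -1 if the file cannot be read or no mapping contains addr. */
int find_mapping(uintptr_t addr, unsigned long *start, unsigned long *end)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;

    char line[512];
    int at_line_start = 1;   /* guards against long lines split by fgets */
    int found = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        int is_start = at_line_start;
        at_line_start = strchr(line, '\n') != NULL;
        if (!is_start)
            continue;        /* tail of an overlong line, not a new mapping */

        unsigned long s, e;
        /* Each line begins with "start-end" in hexadecimal. */
        if (sscanf(line, "%lx-%lx", &s, &e) == 2 && addr >= s && addr < e) {
            *start = s;
            *end = e;
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}
```

Looking up the address of a global variable and of a local variable returns two different ranges, one for the data region and one for the stack.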

The mm_rb field is a red-black tree containing the same VMAs. The list is used when every VMA needs to be traversed, and the tree is used to look up a specific VMA, such as the one containing a given address, in logarithmic time.
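The trade-off between the list and the tree can be sketched in user space with a simplified, hypothetical struct vma. The linear walk mirrors traversing the mmap list. For the logarithmic lookup, the sketch swaps in a sorted array with binary search as a stand-in for the kernel's red-black tree: both give O(log n) search over non-overlapping intervals, but this is not the kernel's rbtree code, and find_vma_list and find_vma_sorted are not the kernel's find_vma().

```c
#include <stddef.h>

/* Hypothetical, simplified VMA: the interval plus a link to the next VMA,
 * in address order, like the mmap list. */
struct vma {
    unsigned long start, end;   /* covers [start, end); end is exclusive */
    struct vma *next;
};

/* O(n) walk over the list: natural when every VMA is visited anyway. */
const struct vma *find_vma_list(const struct vma *head, unsigned long addr)
{
    for (const struct vma *v = head; v != NULL; v = v->next)
        if (addr >= v->start && addr < v->end)
            return v;
    return NULL;
}

/* O(log n) lookup over VMAs sorted by start address (stand-in for the
 * red-black tree). Returns the VMA containing addr, or NULL. */
const struct vma *find_vma_sorted(const struct vma *const *sorted, size_t n,
                                  unsigned long addr)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const struct vma *v = sorted[mid];
        if (addr < v->start)
            hi = mid;           /* search the lower half */
        else if (addr >= v->end)
            lo = mid + 1;       /* search the upper half */
        else
            return v;           /* start <= addr < end */
    }
    return NULL;                /* addr falls in a gap between VMAs */
}
```

Both functions return the same VMA for any address; the difference is only how many VMAs they inspect on the way.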

In a VMA, the vm_start field holds the start address of the interval and vm_end holds the first address after its end, so a VMA covers the half-open range [vm_start, vm_end).

struct vm_area_struct {
	/* The first cache line has the info for VMA tree walking. */

	unsigned long vm_start;         /* Our start address within vm_mm. */
	unsigned long vm_end;           /* The first byte after our end address
	                                   within vm_mm. */
	/* ... */
};
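Because vm_end is exclusive, the size of a VMA is a plain subtraction and membership is a half-open range check. The sketch below uses a hypothetical user-space struct vma_range that mirrors only these two fields; it also assumes, as holds for real VMAs, that both boundaries are page aligned, so the length divides evenly into pages.

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical mirror of the two vm_area_struct fields shown above. */
struct vma_range {
    unsigned long vm_start;   /* first address in the area */
    unsigned long vm_end;     /* first address after the area */
};

/* Length in bytes: no "+ 1" because vm_end is exclusive. */
unsigned long vma_length(const struct vma_range *v)
{
    return v->vm_end - v->vm_start;
}

/* Half-open check: vm_start is inside, vm_end is not. */
bool vma_contains(const struct vma_range *v, unsigned long addr)
{
    return addr >= v->vm_start && addr < v->vm_end;
}

/* Number of pages spanned, assuming page-aligned boundaries.
 * sysconf(_SC_PAGESIZE) returns the page size of the running system. */
unsigned long vma_pages(const struct vma_range *v)
{
    return vma_length(v) / (unsigned long)sysconf(_SC_PAGESIZE);
}
```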

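Applications work with virtual addresses, but physical memory is accessed by physical address, so every virtual address has to be translated into a physical one. The kernel represents every physical page with a struct page structure, also defined in <linux/mm_types.h>.

The translation is done by the processor's memory management unit (MMU) using page tables that the kernel sets up, starting from the top-level table pointed to by the pgd field of the memory descriptor. The virtual address is split into several bit fields: each upper field is used as an index into one level of the page table, and the lowest bits give the offset within the page. An entry at an intermediate level points to the next-level table, while an entry at the last level points to the physical page; combining that page's address with the offset yields the physical address.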
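The walk described above can be illustrated with a toy two-level page table using the classic 32-bit x86 split for 4 KiB pages: 10 bits of directory index, 10 bits of table index and 12 bits of offset. x86-64 uses four (or five) levels of 9-bit indices, but the idea is the same. Everything here, including the struct names and translate, is a hypothetical model rather than kernel or hardware code; in particular, real entries carry a "present" bit and permission flags, whereas this sketch simply treats a frame number of 0 as "not mapped".

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12                    /* 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define LEVEL_BITS 10                    /* index bits per level */
#define ENTRIES    (1u << LEVEL_BITS)    /* 1024 entries per table */

/* Last level: each entry holds a physical frame number (0 = not mapped). */
struct page_table {
    uint32_t frame[ENTRIES];
};

/* Top level: each entry points to a page table, or is NULL. */
struct page_dir {
    struct page_table *table[ENTRIES];
};

/* Translate a 32-bit virtual address. Returns 0 and stores the physical
 * address in *paddr on success, or -1 if the address is not mapped. */
int translate(const struct page_dir *dir, uint32_t vaddr, uint32_t *paddr)
{
    uint32_t dir_idx = vaddr >> (PAGE_SHIFT + LEVEL_BITS);     /* top 10 bits */
    uint32_t tbl_idx = (vaddr >> PAGE_SHIFT) & (ENTRIES - 1);  /* middle 10 bits */
    uint32_t offset  = vaddr & (PAGE_SIZE - 1);                /* low 12 bits */

    const struct page_table *t = dir->table[dir_idx];  /* entry points to another table */
    if (t == NULL)
        return -1;

    uint32_t frame = t->frame[tbl_idx];                /* entry points to a physical page */
    if (frame == 0)
        return -1;

    *paddr = (frame << PAGE_SHIFT) | offset;           /* page address plus offset */
    return 0;
}
```

For example, with directory entry 1 pointing to a table whose entry 2 holds frame 0x345, the virtual address (1 << 22) | (2 << 12) | 0xabc translates to the physical address 0x345abc.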