无垠之码 (Boundless Code)

An in-depth look at the craft of code


The Kernel Entropy Pool

1. Definition of entropy


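Entropy generally describes how disordered a system is. In thermodynamics, entropy reflects how vigorous the random motion of molecules is: the more vigorous the motion, the higher the entropy. Information theory uses information entropy (Shannon entropy) to describe the uncertainty of a system: the higher the entropy, the harder the system's state is to predict, and the more information each observation carries on average.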
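The information-theoretic definition can be made concrete with a few lines of Python. This is a minimal sketch that treats a byte string as samples from a source and computes H = sum(p * log2(1/p)) over the byte values that occur; the function name `shannon_entropy` is chosen here for illustration.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # H = sum(p * log2(1/p)) with p = n / total for each byte value seen
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    print(shannon_entropy(b"aaaaaaaa"))        # a single symbol: fully predictable
    print(shannon_entropy(b"abababab"))        # two equally likely symbols
    print(shannon_entropy(bytes(range(256))))  # every byte value exactly once
```

A constant input scores zero and a uniform spread over all 256 byte values scores the maximum of 8 bits per byte. Note that this measures the observed distribution only; it says nothing about whether the data is unpredictable to an attacker.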

2. Kernel implementation


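The Linux kernel maintains an entropy pool that collects noise from device drivers and other sources; this noise serves as the system's entropy. In practice the noise comes from events whose timing is hard to predict, such as interrupts, keyboard and mouse input, and block-device I/O. Many modern processors and mainboards also include a hardware random number generator (for example Intel's RDRAND and RDSEED instructions, or AMD's hardware RNG). These devices produce high-quality random numbers, and the kernel can draw random data from them directly.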
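From user space the pool is consumed through the kernel's random interfaces. The sketch below, assuming Linux and Python 3.6 or later, asks for a few bytes with `os.getrandom` in non-blocking mode, which fails instead of waiting when the kernel RNG has not been seeded yet; the helper name `read_kernel_random` is hypothetical.

```python
import os


def read_kernel_random(n: int = 16) -> bytes:
    """Return n bytes (keep n small) from the kernel RNG without blocking."""
    if hasattr(os, "getrandom"):
        try:
            # GRND_NONBLOCK: raise instead of sleeping if the pool is unseeded
            return os.getrandom(n, os.GRND_NONBLOCK)
        except BlockingIOError:
            # Seen only very early in boot on an entropy-starved machine
            raise RuntimeError("kernel RNG is not seeded yet") from None
    # Platforms without getrandom(2): let Python pick the OS source
    return os.urandom(n)


if __name__ == "__main__":
    print(read_kernel_random(16).hex())
```

On a running system the call returns immediately; the error path matters mainly for services started during early boot.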

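Documentation [2]

The kernel documentation and the manual page (man 4 random) describe the parameters related to randomness. They can be listed with sysctl -a -r 'kernel.random\..*':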

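  • boot_id: generated once at each boot and constant afterwards; it corresponds to the boot ID shown by journalctl --list-boots;
  • uuid: generated anew on every read, so it can serve as a UUID generator;
  • entropy_avail: number of bits of entropy currently available in the pool;
  • urandom_min_reseed_secs: obsolete (it used to set the interval between reseeds of the urandom pool);
  • poolsize: size of the entropy pool. On 2.4 kernels it was writable and accepted [32|64|…|2048] (in bytes); since 2.6 it is read-only and reports the size in bits (4096 on older kernels, 256 on current ones);
  • write_wakeup_threshold: when the available entropy falls below this many bits, processes waiting to write to /dev/random are woken up;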
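The same parameters are exposed as files under /proc/sys/kernel/random, so they can be read without calling sysctl. A minimal sketch, assuming a Linux system with procfs mounted; the function name `read_random_sysctls` is illustrative, and the directory is a parameter so the code can be pointed at a test directory.

```python
from pathlib import Path

# procfs location of the kernel.random.* sysctls on Linux
RANDOM_SYSCTL_DIR = Path("/proc/sys/kernel/random")


def read_random_sysctls(base: Path = RANDOM_SYSCTL_DIR) -> dict:
    """Return {name: value} for every readable file directly under `base`."""
    values = {}
    if not base.is_dir():
        return values  # not Linux, or procfs is not mounted
    for entry in sorted(base.iterdir()):
        try:
            values[entry.name] = entry.read_text().strip()
        except OSError:
            continue  # skip entries that cannot be read
    return values


if __name__ == "__main__":
    for name, value in read_random_sysctls().items():
        print("kernel.random.{} = {}".format(name, value))
```

Running it twice should show the same boot_id and a different uuid each time, which matches the descriptions in the list.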

The mknod command can create the random and urandom devices: mknod /dev/random c 1 8 and mknod /dev/urandom c 1 9.
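To confirm that an existing node carries the expected numbers, the major and minor can be read back with `os.stat`. A small sketch; `char_device_numbers` is a hypothetical helper, and on a typical Linux system it should report major 1 with minor 8 for /dev/random and minor 9 for /dev/urandom.

```python
import os
import stat


def char_device_numbers(path: str):
    """Return (major, minor) if `path` is a character device, else None."""
    try:
        st = os.stat(path)
    except OSError:
        return None  # missing node or no permission
    if not stat.S_ISCHR(st.st_mode):
        return None  # regular file, directory, block device, ...
    return os.major(st.st_rdev), os.minor(st.st_rdev)


if __name__ == "__main__":
    for node in ("/dev/random", "/dev/urandom"):
        print(node, char_device_numbers(node))
```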

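Early in boot the sequence of events is largely fixed, so the pool holds too little real entropy (entropy-starved systems). To compensate, a block of random data is saved at shutdown and loaded back at the next boot to improve the quality of the system's entropy.

  1. SysV init based systems

The following scripts live in /etc/rc.d/init.d/random or /etc/rc.d/rc.local: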

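# Boot: feed the seed saved by the previous shutdown back into the pool,
# then overwrite it at once so the same seed is never loaded twice.
echo "Initializing random number generator..."
random_seed=/var/run/random-seed
if [ -f "$random_seed" ]; then
        cat "$random_seed" >/dev/urandom
else
        touch "$random_seed"
fi
chmod 600 "$random_seed"
dd if=/dev/urandom of="$random_seed" count=1 bs=512

# Shutdown: carry a fresh 512-byte seed over to the next boot.
echo "Saving random seed..."
random_seed=/var/run/random-seed
touch "$random_seed"
chmod 600 "$random_seed"
dd if=/dev/urandom of="$random_seed" count=1 bs=512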
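The same save and load cycle can be sketched in Python. This is an illustration of the idea rather than a replacement for the init scripts: the function names and the `device` and `size` parameters are assumptions made for the example, and writing to the device mixes the bytes into the pool without crediting any entropy, just as the cat command does.

```python
import os


def load_seed(seed_path: str, device: str = "/dev/urandom") -> int:
    """Write a saved seed into the RNG device; return the bytes fed."""
    if not os.path.isfile(seed_path):
        return 0  # first boot: nothing saved yet
    with open(seed_path, "rb") as src:
        seed = src.read()
    with open(device, "wb") as dst:
        dst.write(seed)  # mixed into the pool, entropy count unchanged
    return len(seed)


def save_seed(seed_path: str, device: str = "/dev/urandom",
              size: int = 512) -> None:
    """Store `size` fresh bytes from the RNG device with mode 0600."""
    with open(device, "rb") as src:
        seed = src.read(size)
    # Create with 0600 from the start so the seed is never world-readable
    fd = os.open(seed_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as dst:
        dst.write(seed)
    os.chmod(seed_path, 0o600)  # also tighten a file that already existed
```

At boot a caller would run `load_seed(path)` followed by `save_seed(path)`, and at shutdown `save_seed(path)` alone, mirroring the two script fragments.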
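
  2. systemd based systems

systemd based distributions (CentOS 7+, Ubuntu 18+) use the systemd-random-seed service to save and load the random seed. The systemd-random-seed source handles considerably more cases than the scripts above; see reference [4].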

systemctl cat systemd-random-seed.service 

[Unit]
Description=Load/Save Random Seed
Documentation=man:systemd-random-seed.service(8) man:random(4)
DefaultDependencies=no
RequiresMountsFor=/var/lib/systemd/random-seed
Conflicts=shutdown.target
After=systemd-remount-fs.service
Before=shutdown.target
ConditionVirtualization=!container

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/lib/systemd/systemd-random-seed load
ExecStop=/lib/systemd/systemd-random-seed save
TimeoutSec=10min

Boot and shutdown flow:

  • multi-user.target -> basic.target -> sysinit.target -> systemd-random-seed.service
  • shutdown.target -> basic.target -> sysinit.target -> systemd-random-seed.service
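
  3. FreeBSD [13.2-RELEASE]

As on Linux, FreeBSD saves entropy at shutdown and imports it again at the next boot. At boot, the run_rc_script function invokes the start function of the random script under /etc/rc.d/ to feed the saved entropy; at shutdown the script's stop function is called to save it.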

#!/bin/sh
#
#

# PROVIDE: random
# REQUIRE: FILESYSTEMS
# BEFORE: netif
# KEYWORD: nojail shutdown

. /etc/rc.subr

name="random"
desc="Harvest and save entropy for random device"
start_cmd="random_start"
stop_cmd="random_stop"

extra_commands="saveseed"
saveseed_cmd="${name}_stop"

save_dev_random()
{
	oumask=`umask`
	umask 077
	for f ; do
		debug "saving entropy to $f"
		dd if=/dev/random of="$f" bs=4096 count=1 status=none &&
			( chflags nodump "$f" 2>/dev/null || : ) &&
			chmod 600 "$f" &&
			fsync "$f" "$(dirname "$f")"
	done
	umask ${oumask}
}

feed_dev_random()
{
	for f ; do
		if [ -f "$f" -a -r "$f" -a -s "$f" ] ; then
			if dd if="$f" of=/dev/random bs=4096 2>/dev/null ; then
				debug "entropy read from $f"
				rm -f "$f"
			fi
		fi
	done
}

random_start()
{

	if [ -n "${harvest_mask}" ]; then
		echo -n 'Setting up harvesting: '
		${SYSCTL} kern.random.harvest.mask=${harvest_mask} > /dev/null
		${SYSCTL_N} kern.random.harvest.mask_symbolic
	fi

	echo -n 'Feeding entropy: '

	if [ ! -w /dev/random ] ; then
		warn "/dev/random is not writeable"
		return 1
	fi

	# Reseed /dev/random with previously stored entropy.
	case ${entropy_dir:=/var/db/entropy} in
	[Nn][Oo])
		;;
	*)
		if [ -d "${entropy_dir}" ] ; then
			feed_dev_random "${entropy_dir}"/*
		fi
		;;
	esac

	case ${entropy_file:=/entropy} in
	[Nn][Oo])
		;;
	*)
		feed_dev_random "${entropy_file}" /var/db/entropy-file
		save_dev_random "${entropy_file}"
		;;
	esac

	case ${entropy_boot_file:=/boot/entropy} in
	[Nn][Oo])
		;;
	*)
		save_dev_random "${entropy_boot_file}"
		;;
	esac

	echo '.'
}

random_stop()
{
	# Write some entropy so when the machine reboots /dev/random
	# can be reseeded
	#
	case ${entropy_file:=/entropy} in
	[Nn][Oo])
		;;
	*)
		echo -n 'Writing entropy file: '
		rm -f ${entropy_file} 2> /dev/null
		oumask=`umask`
		umask 077
		if touch ${entropy_file} 2> /dev/null; then
			entropy_file_confirmed="${entropy_file}"
		else
			# Try this as a reasonable alternative for read-only
			# roots, diskless workstations, etc.
			rm -f /var/db/entropy-file 2> /dev/null
			if touch /var/db/entropy-file 2> /dev/null; then
				entropy_file_confirmed=/var/db/entropy-file
			fi
		fi
		case ${entropy_file_confirmed} in
		'')
			warn 'write failed (read-only fs?)'
			;;
		*)
			save_dev_random "${entropy_file_confirmed}"
			echo '.'
			;;
		esac
		umask ${oumask}
		;;
	esac
	case ${entropy_boot_file:=/boot/entropy} in
	[Nn][Oo])
		;;
	*)
		echo -n 'Writing early boot entropy file: '
		rm -f ${entropy_boot_file} 2> /dev/null
		oumask=`umask`
		umask 077
		if touch ${entropy_boot_file} 2> /dev/null; then
			entropy_boot_file_confirmed="${entropy_boot_file}"
		fi
		case ${entropy_boot_file_confirmed} in
		'')
			warn 'write failed (read-only fs?)'
			;;
		*)
			save_dev_random "${entropy_boot_file_confirmed}"
			echo '.'
			;;
		esac
		umask ${oumask}
		;;
	esac
}

load_rc_config $name

# doesn't make sense to run in a svcj: config setting
random_svcj="NO"

run_rc_command "$1"

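Unlike Linux, FreeBSD names its entropy-related kernel parameters differently: harvest.mask plays a similar role, and harvest.mask_symbolic lists the configured entropy sources by name. This article focuses on the Linux kernel entropy pool; the FreeBSD implementation is included for reference only and is not discussed further.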

sysctl -a | grep kern |grep random
kern.randompid: 0
kern.random.fortuna.concurrent_read: 1
kern.random.fortuna.minpoolsize: 64
kern.random.rdrand.rdrand_independent_seed: 0
kern.random.use_chacha20_cipher: 1
kern.random.block_seeded_status: 0
kern.random.random_sources: 'Intel Secure Key RNG'
kern.random.harvest.mask_symbolic: PURE_RDRAND,[UMA],[FS_ATIME],SWI,INTERRUPT,NET_NG,[NET_ETHER],NET_TUN,MOUSE,KEYBOARD,ATTACH,CACHED
kern.random.harvest.mask_bin: 000000001000000111011111
kern.random.harvest.mask: 33247
kern.random.initial_seeding.disable_bypass_warnings: 0
kern.random.initial_seeding.arc4random_bypassed_before_seeding: 0
kern.random.initial_seeding.read_random_bypassed_before_seeding: 0
kern.random.initial_seeding.bypass_before_seeding: 1
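The three harvest.mask values in the listing describe the same bitmask. The sketch below converts the decimal mask into the 24-digit binary form and decodes the low bits. The bit-to-name table is an assumption read off the sample mask_symbolic line from right to left (names in square brackets are the disabled sources); it is not taken from the FreeBSD headers.

```python
# Assumed bit positions 0..10, inferred from the sample output above.
ASSUMED_LOW_BITS = [
    "CACHED", "ATTACH", "KEYBOARD", "MOUSE", "NET_TUN", "NET_ETHER",
    "NET_NG", "INTERRUPT", "SWI", "FS_ATIME", "UMA",
]


def mask_to_bin(mask: int, width: int = 24) -> str:
    """Render the mask the way kern.random.harvest.mask_bin prints it."""
    return format(mask, "0{}b".format(width))


def decode_low_bits(mask: int) -> dict:
    """Map each assumed source name to True (harvested) or False."""
    return {name: bool((mask >> bit) & 1)
            for bit, name in enumerate(ASSUMED_LOW_BITS)}


if __name__ == "__main__":
    mask = 33247  # kern.random.harvest.mask from the listing
    print(mask_to_bin(mask))
    for name, enabled in decode_low_bits(mask).items():
        print("{:10s} {}".format(name, "on" if enabled else "off"))
```

For the mask 33247 the binary string equals the mask_bin value in the listing, and the three sources decoded as off (NET_ETHER, FS_ATIME, UMA) are the ones shown in brackets.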

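Code walkthrough

  1. At kernel boot

start_kernel() initializes the random number generator in two steps: random_init_early() runs before the allocator is set up, and random_init() runs once timekeeping has been initialized.
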
asmlinkage __visible __init __no_sanitize_address __noreturn __no_stack_protector
void start_kernel(void)
{
	char *command_line;
	char *after_dashes;

	set_task_stack_end_magic(&init_task);
	smp_setup_processor_id();
	debug_objects_early_init();
	init_vmlinux_build_id();

	cgroup_init_early();

	local_irq_disable();
	early_boot_irqs_disabled = true;

	/*
	 * Interrupts are still disabled. Do necessary setups, then
	 * enable them.
	 */
	boot_cpu_init();
	page_address_init();
	pr_notice("%s", linux_banner);
	early_security_init();
	setup_arch(&command_line);
	setup_boot_config();
	setup_command_line(command_line);
	setup_nr_cpu_ids();
	setup_per_cpu_areas();
	smp_prepare_boot_cpu();	/* arch-specific boot-cpu hooks */
	early_numa_node_init();
	boot_cpu_hotplug_init();

	pr_notice("Kernel command line: %s\n", saved_command_line);
	/* parameters may set static keys */
	jump_label_init();
	parse_early_param();
	after_dashes = parse_args("Booting kernel",
				  static_command_line, __start___param,
				  __stop___param - __start___param,
				  -1, -1, NULL, &unknown_bootoption);
	print_unknown_bootoptions();
	if (!IS_ERR_OR_NULL(after_dashes))
		parse_args("Setting init args", after_dashes, NULL, 0, -1, -1,
			   NULL, set_init_arg);
	if (extra_init_args)
		parse_args("Setting extra init args", extra_init_args,
			   NULL, 0, -1, -1, NULL, set_init_arg);

	/* Architectural and non-timekeeping rng init, before allocator init */
	random_init_early(command_line);

	/*
	 * These use large bootmem allocations and must precede
	 * initalization of page allocator
	 */
	setup_log_buf(0);
	vfs_caches_init_early();
	sort_main_extable();
	trap_init();
	mm_core_init();
	poking_init();
	ftrace_init();

	/* trace_printk can be enabled here */
	early_trace_init();

	/*
	 * Set up the scheduler prior starting any interrupts (such as the
	 * timer interrupt). Full topology setup happens at smp_init()
	 * time - but meanwhile we still have a functioning scheduler.
	 */
	sched_init();

	if (WARN(!irqs_disabled(),
		 "Interrupts were enabled *very* early, fixing it\n"))
		local_irq_disable();
	radix_tree_init();
	maple_tree_init();

	/*
	 * Set up housekeeping before setting up workqueues to allow the unbound
	 * workqueue to take non-housekeeping into account.
	 */
	housekeeping_init();

	/*
	 * Allow workqueue creation and work item queueing/cancelling
	 * early.  Work item execution depends on kthreads and starts after
	 * workqueue_init().
	 */
	workqueue_init_early();

	rcu_init();

	/* Trace events are available after this */
	trace_init();

	if (initcall_debug)
		initcall_debug_enable();

	context_tracking_init();
	/* init some links before init_ISA_irqs() */
	early_irq_init();
	init_IRQ();
	tick_init();
	rcu_init_nohz();
	init_timers();
	srcu_init();
	hrtimers_init();
	softirq_init();
	timekeeping_init();
	time_init();

	/* This must be after timekeeping is initialized */
	random_init();

	/* These make use of the fully initialized rng */
	kfence_init();
	boot_init_stack_canary();

	perf_event_init();
	profile_init();
	call_function_init();
	WARN(!irqs_disabled(), "Interrupts were enabled early\n");

	early_boot_irqs_disabled = false;
	local_irq_enable();

	kmem_cache_init_late();

	/*
	 * HACK ALERT! This is early. We're enabling the console before
	 * we've done PCI setups etc, and console_init() must be aware of
	 * this. But we do want output early, in case something goes wrong.
	 */
	console_init();
	if (panic_later)
		panic("Too many boot %s vars at `%s'", panic_later,
		      panic_param);

	lockdep_init();

	/*
	 * Need to run this when irqs are enabled, because it wants
	 * to self-test [hard/soft]-irqs on/off lock inversion bugs
	 * too:
	 */
	locking_selftest();

#ifdef CONFIG_BLK_DEV_INITRD
	if (initrd_start && !initrd_below_start_ok &&
	    page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
		pr_crit("initrd overwritten (0x%08lx < 0x%08lx) - disabling it.\n",
		    page_to_pfn(virt_to_page((void *)initrd_start)),
		    min_low_pfn);
		initrd_start = 0;
	}
#endif
	setup_per_cpu_pageset();
	numa_policy_init();
	acpi_early_init();
	if (late_time_init)
		late_time_init();
	sched_clock_init();
	calibrate_delay();

	arch_cpu_finalize_init();

	pid_idr_init();
	anon_vma_init();
#ifdef CONFIG_X86
	if (efi_enabled(EFI_RUNTIME_SERVICES))
		efi_enter_virtual_mode();
#endif
	thread_stack_cache_init();
	cred_init();
	fork_init();
	proc_caches_init();
	uts_ns_init();
	key_init();
	security_init();
	dbg_late_init();
	net_ns_init();
	vfs_caches_init();
	pagecache_init();
	signals_init();
	seq_file_init();
	proc_root_init();
	nsfs_init();
	pidfs_init();
	cpuset_init();
	cgroup_init();
	taskstats_init_early();
	delayacct_init();

	acpi_subsystem_init();
	arch_post_acpi_subsys_init();
	kcsan_init();

	/* Do the rest non-__init'ed, we're now alive */
	rest_init();

	/*
	 * Avoid stack canaries in callers of boot_init_stack_canary for gcc-10
	 * and older.
	 */
#if !__has_attribute(__no_stack_protector__)
	prevent_tail_call_optimization();
#endif
}

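3. Adding entropy with tools


Linux allows external random sources to feed additional random data into the entropy pool, helping the kernel deliver high-quality random numbers.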

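  1. rng-tools: its rngd daemon reads random data from an external hardware device (such as /dev/hwrng) and injects it into the kernel entropy pool (normally through the /dev/random or /dev/urandom interface). rngd supports several kinds of external sources, such as hardware security modules (HSMs) and USB devices, and can effectively improve the randomness of the pool.
  2. haveged
  3. jitterentropy-rngd
  4. randomsound
  5. virtio-rng: in a virtualized environment physical hardware resources are limited and random events are comparatively rare, so a guest's entropy pool may run low. To address this, hypervisors such as KVM, VMware and Xen can provide a random number source to the guest through the virtual machine monitor (VMM). virtio-rng is one such virtual device: it lets the guest obtain random data from the host, and its driver injects that data into the guest kernel's entropy pool, raising the amount of entropy available in the virtual machine.
  6. FIPS-certified entropy solutions
  7. entropy broker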

rng-tools
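Plain writes to /dev/random or /dev/urandom mix data into the pool but do not raise the entropy count; crediting entropy is done with the RNDADDENTROPY ioctl, which takes a struct rand_pool_info (entropy_count, buf_size, then the data). The sketch below shows how such a request can be built. It is an illustration of the interface, not rngd's actual code: the function names are hypothetical, the ioctl number is the x86/ARM encoding, and the call needs CAP_SYS_ADMIN.

```python
import fcntl
import struct

# _IOW('R', 0x03, int[2]) as encoded on x86 and ARM; other
# architectures may use a different value (assumption to verify).
RNDADDENTROPY = 0x40085203


def build_rand_pool_info(data: bytes, entropy_bits: int) -> bytes:
    """Pack struct rand_pool_info: two ints followed by the raw bytes."""
    if not 0 <= entropy_bits <= 8 * len(data):
        raise ValueError("cannot credit more bits than the buffer holds")
    return struct.pack("ii", entropy_bits, len(data)) + data


def add_entropy(data: bytes, entropy_bits: int,
                device: str = "/dev/random") -> None:
    """Mix `data` into the pool and credit `entropy_bits` (root only)."""
    # Keep `data` small: some Python versions cap immutable ioctl
    # arguments at 1024 bytes.
    payload = build_rand_pool_info(data, entropy_bits)
    with open(device, "wb") as dev:
        fcntl.ioctl(dev, RNDADDENTROPY, payload)
```

A daemon would read a block from its source (for example /dev/hwrng), decide conservatively how many bits to credit, and call `add_entropy` with that block. Crediting more entropy than the source really provides weakens the pool, which is why the helper rejects counts larger than the buffer.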

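4. References

  1. Random number generation in the Linux kernel
  2. Kernel parameter documentation
  3. Kernel device number assignments
  4. systemd-random-seed source code
  5. Kernel code snippet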