Terraform: From Getting Started to Giving Up
Note: This article is only a brief introduction to installing Terraform and its basic usage; it can largely be read as a translation of the official Terraform documentation plus some of my own interpretation.
Terraform is designed to automatically provision and manage resources on any cloud or in any data center. It encodes cloud APIs and data center interfaces into declarative configuration files, realizing Infrastructure as Code. (Infrastructure as Code, IaC for short, is an approach to managing and provisioning computing infrastructure in which deployment and management are automated through code.)
The website highlights several advantages of Terraform:
- Write infrastructure as code in Terraform files using HCL (HashiCorp Configuration Language) to provision resources from any infrastructure provider.
- Build infrastructure automation workflows so that IT operations and development teams can write, collaborate on, reuse, and provision infrastructure as code.
- Establish guardrails for security, compliance, and cost management through role-based access control, policy enforcement, and auditing, enabling standardization.
- Extend workflow automation to every team in the organization with self-service infrastructure as code, integrating with VCS, ITSM (IT service management tools such as ServiceNow and Jira), and CI/CD.
Use cases:
- IaC: with Terraform, multi-cloud management can be codified as declarative code. Terraform's plugin ecosystem lets you define resources in a single, easy-to-learn configuration language; the code can be written, shared, versioned, and executed through a consistent workflow across all environments.
- Multi-cloud deployment: Terraform uses one automated workflow to manage multiple infrastructure and SaaS providers and handles cross-cloud dependencies, simplifying lifecycle management and orchestration of multi-cloud infrastructure at any scale.
- Kubernetes: Terraform can provision Kubernetes clusters, their supporting services, and application resources through a single workflow.
- Network Infrastructure Automation (NIA) with Consul-Terraform-Sync keeps network and security infrastructure adapting to change safely. Consul-Terraform-Sync reacts to changes observed by Consul and applies them through Terraform, automating a variety of network tasks and workflows. These tasks can be triggered by events such as the scaling of service instances, changes to service addresses or port numbers, or updates to service tags, metadata, or health status. This shortens delivery time and greatly reduces the chance of misconfiguration.
- HCP Terraform provides a flexible remote execution environment that integrates easily with existing version control systems, CI/CD pipelines, and IT service management interfaces. That deep integration with existing workflows reduces the need to switch tools, simplifies getting up and running, and keeps platform teams and developers working consistently.
- HCP Terraform helps you enforce policies on the infrastructure configurations your teams can deploy. Ticket-based review processes often become a bottleneck and slow development down; instead, you can use HashiCorp Sentinel, a policy-as-code framework, to enforce compliance and governance policies automatically before Terraform changes any infrastructure.
- HashiCorp Vault associates every dynamic secret with a lease and automatically revokes the credentials when the lease expires. Vault supports dynamic secrets for many integrated systems and can easily be extended through plugins.
- HCP Packer tracks cloud images and their versions and exposes that information through an API, so commonly used base images can be standardized, secured, and kept up to date automatically.
0. Installing Terraform
Only the Ubuntu installation is shown here; Terraform also supports Windows, macOS, FreeBSD, and even Solaris. See reference 1 for the other installation methods.
wget -O - https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install terraform
1. Getting Started Example
Putting an elephant into a refrigerator generally takes only three steps, and likewise the core Terraform workflow consists of three stages:
- Write down your dream. Use HCL to write declarative configuration files that define resources, which may span multiple cloud platforms or data centers.
- Plan. Create an execution plan describing the infrastructure to be created, updated, or destroyed based on the existing infrastructure and the configuration you defined.
- Make the dream come true. Terraform performs the operations in the planned order and makes your dream real.

Oh God, grant me a girlfriend as beautiful as Zhu Ying.
What? Define her? Well, her surname is Zhu, she has an oval face… oh, right, she sits diagonally across from me…
HCL can be thought of as a domain-specific language (DSL). It is designed for describing the configuration and management of infrastructure resources, especially when used with HashiCorp tools such as Terraform.
resource "aws_vpc" "main" {
cidr_block = var.base_cidr_block
}
<BLOCK TYPE> "<BLOCK LABEL>" "<BLOCK LABEL>" {
# Block body
<IDENTIFIER> = <EXPRESSION> # Argument
}
A block is a container for other content and usually represents the configuration of some kind of object, such as a resource. A block has a block type, may have zero or more labels, and has a body containing any number of arguments and nested blocks. Most of Terraform's features are controlled by top-level blocks in configuration files. IDENTIFIER is an argument name; arguments appear inside blocks. EXPRESSION represents a value, either a literal or something derived by referencing and combining other values.
terraform {
  required_providers {
    vsphere = {
      source  = "hashicorp/vsphere"
      version = "2.10.0"
    }
  }
}
provider "vsphere" {
  user                 = var.vsphere_user
  password             = var.vsphere_password
  vsphere_server       = var.vsphere_server
  allow_unverified_ssl = true
  api_timeout          = 10
}
data "vsphere_datacenter" "datacenter" {
name = "Datacenter"
}
data "vsphere_datastore" "datastore" {
name = "datastore1"
datacenter_id = data.vsphere_datacenter.datacenter.id
}
data "vsphere_resource_pool" "pool" {
name = "192.168.5.32/Resources"
datacenter_id = data.vsphere_datacenter.datacenter.id
}
data "vsphere_network" "network" {
name = "VM Network"
datacenter_id = data.vsphere_datacenter.datacenter.id
}
resource "vsphere_virtual_machine" "vm" {
name = "foo"
resource_pool_id = data.vsphere_resource_pool.pool.id
datastore_id = data.vsphere_datastore.datastore.id
num_cpus = 1
memory = 1024
guest_id = "otherLinux64Guest"
network_interface {
network_id = data.vsphere_network.network.id
}
disk {
label = "disk0"
size = 20
}
wait_for_guest_net_timeout = 0
wait_for_guest_ip_timeout = 0
}
The snippets above declare the Terraform settings and the provider plugins (and their versions) needed to manage the resources. In this example we want to create a basic virtual machine in a private vCenter, so we must depend on hashicorp/vsphere. If your resources live in AWS you would use hashicorp/aws instead; on GCP (Google Cloud Platform), hashicorp/google; and so on — providers for essentially every cloud resource can be found on the HashiCorp registry site. Providers give Terraform the ability to interact with cloud providers, SaaS providers, and other APIs. The arguments in a provider block generally describe the endpoint, region, or other connection details of the cloud resource; for instance, https://registry.terraform.io/providers/hashicorp/vsphere/latest documents the arguments supported by hashicorp/vsphere.
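As an illustration of what a different provider looks like, here is a minimal sketch of an AWS setup — the version constraint and region below are placeholders, not part of the example above:
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # illustrative version constraint
    }
  }
}
provider "aws" {
  region = "us-east-1" # illustrative region
}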
A data block lets Terraform use information defined outside the configuration files. Essentially every provider defines data sources related to its resources; vsphere, for example, exposes networks, datastores, and datacenters through vsphere_network, vsphere_datastore, and vsphere_datacenter. The statement data "vsphere_datacenter" "datacenter" exports the vsphere_datacenter data source under the local name datacenter; that name is used to refer to the data source elsewhere in the same Terraform module and has no meaning outside the module's scope.
Resources are the most important element in the Terraform language. Each resource block describes one or more infrastructure objects, such as virtual networks, compute instances, or higher-level components such as DNS records. Here it describes the Zhu Ying you want to instantiate: 1 CPU, 1024 MB of memory, a 20 GB disk, a guest_id of otherLinux64Guest ready for some flavor of Linux, and so on. Which arguments a resource accepts depends almost entirely on its type; AWS, GCP, and VMware may use completely different argument names and types. See reference 2 for the detailed syntax.
Please confirm your order
Run terraform init: Terraform downloads the required plugins and sets up the working environment under the local .terraform directory. Creating, updating, and destroying resources is then presumably handled by the provider executable, here terraform-provider-vsphere.
tree -a
.
├── main.tf
├── .terraform
│ └── providers
│ └── registry.terraform.io
│ └── hashicorp
│ └── vsphere
│ └── 2.10.0
│ └── linux_amd64
│ ├── LICENSE.txt
│ └── terraform-provider-vsphere_v2.10.0_x5
└── .terraform.lock.hcl
terraform validate checks whether the current configuration is syntactically valid. Run at this point, it produces the following output:
An input variable with the name "vsphere_user" has not been declared. This variable can be declared with a variable "vsphere_user" {} block.
Following the official documentation, these variables can be declared in a separate variables.tf file. Their default values can then be overridden on the command line, e.g. terraform plan -var 'vsphere_password=test'.
variable "vsphere_user" {
description = "username of the esxi vcenter"
type = string
default = "administrator@vsphere.local"
}
variable "vsphere_password" {
description = "password of the esxi vcenter"
type = string
default = "secret"
sensitive = true
}
variable "vsphere_server" {
description = "endpoint of the esxi vcenter"
type = string
default = "192.168.5.100"
}
terraform plan prints Terraform's execution plan: Terraform will use the VMware provider to create the VM resource exactly as configured in main.tf, arguments that were not specified fall back to their defaults, and some attributes will only be known at apply time.
terraform plan -var 'vsphere_password=secret'
data.vsphere_datacenter.datacenter: Reading...
data.vsphere_datacenter.datacenter: Read complete after 0s [id=datacenter-2]
data.vsphere_network.network: Reading...
data.vsphere_datastore.datastore: Reading...
data.vsphere_resource_pool.pool: Reading...
data.vsphere_network.network: Read complete after 0s [id=network-70]
data.vsphere_datastore.datastore: Read complete after 0s [id=datastore-10]
data.vsphere_resource_pool.pool: Read complete after 0s [id=resgroup-8]
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
# vsphere_virtual_machine.vm will be created
+ resource "vsphere_virtual_machine" "vm" {
+ annotation = (known after apply)
+ boot_retry_delay = 10000
+ change_version = (known after apply)
+ cpu_limit = -1
+ cpu_share_count = (known after apply)
+ cpu_share_level = "normal"
+ datastore_id = "datastore-10"
+ default_ip_address = (known after apply)
+ ept_rvi_mode = (known after apply)
+ extra_config_reboot_required = true
+ firmware = "bios"
+ force_power_off = true
+ guest_id = "otherLinux64Guest"
+ guest_ip_addresses = (known after apply)
+ hardware_version = (known after apply)
+ host_system_id = (known after apply)
+ hv_mode = (known after apply)
+ id = (known after apply)
+ ide_controller_count = 2
+ imported = (known after apply)
+ latency_sensitivity = "normal"
+ memory = 1024
+ memory_limit = -1
+ memory_share_count = (known after apply)
+ memory_share_level = "normal"
+ migrate_wait_timeout = 30
+ moid = (known after apply)
+ name = "foo"
+ num_cores_per_socket = 1
+ num_cpus = 1
+ power_state = (known after apply)
+ poweron_timeout = 300
+ reboot_required = (known after apply)
+ resource_pool_id = "resgroup-8"
+ run_tools_scripts_after_power_on = true
+ run_tools_scripts_after_resume = true
+ run_tools_scripts_before_guest_shutdown = true
+ run_tools_scripts_before_guest_standby = true
+ sata_controller_count = 0
+ scsi_bus_sharing = "noSharing"
+ scsi_controller_count = 1
+ scsi_type = "pvscsi"
+ shutdown_wait_timeout = 3
+ storage_policy_id = (known after apply)
+ swap_placement_policy = "inherit"
+ sync_time_with_host = true
+ tools_upgrade_policy = "manual"
+ uuid = (known after apply)
+ vapp_transport = (known after apply)
+ vmware_tools_status = (known after apply)
+ vmx_path = (known after apply)
+ wait_for_guest_ip_timeout = 0
+ wait_for_guest_net_routable = true
+ wait_for_guest_net_timeout = 5
+ disk {
+ attach = false
+ controller_type = "scsi"
+ datastore_id = "<computed>"
+ device_address = (known after apply)
+ disk_mode = "persistent"
+ disk_sharing = "sharingNone"
+ eagerly_scrub = false
+ io_limit = -1
+ io_reservation = 0
+ io_share_count = 0
+ io_share_level = "normal"
+ keep_on_remove = false
+ key = 0
+ label = "disk0"
+ path = (known after apply)
+ size = 20
+ storage_policy_id = (known after apply)
+ thin_provisioned = true
+ unit_number = 0
+ uuid = (known after apply)
+ write_through = false
}
+ network_interface {
+ adapter_type = "vmxnet3"
+ bandwidth_limit = -1
+ bandwidth_reservation = 0
+ bandwidth_share_count = (known after apply)
+ bandwidth_share_level = "normal"
+ device_address = (known after apply)
+ key = (known after apply)
+ mac_address = (known after apply)
+ network_id = "network-70"
}
}
Plan: 1 to add, 0 to change, 0 to destroy.
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Note: You didn't use the -out option to save this plan, so Terraform can't guarantee to take exactly these actions if you run "terraform apply" now.
Wake up, it's morning
The command terraform apply -var 'vsphere_password=secret' makes the dream come true. Logging in to the vCenter UI confirms that the virtual machine has indeed been created as requested, but since it has no operating system it hangs at the Operating System Not Found stage.
One way around this is to prepare a template VM in advance and reference it when the machine is created; another is to use Packer to run an automated installation after the VM is created (covered in a separate article).
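A minimal sketch of the template approach — the template name ubuntu-template is hypothetical, and the rest of the resource is unchanged from the earlier example:
data "vsphere_virtual_machine" "template" {
  name          = "ubuntu-template" # hypothetical existing template VM
  datacenter_id = data.vsphere_datacenter.datacenter.id
}
resource "vsphere_virtual_machine" "vm" {
  # ... same arguments as the earlier example ...
  clone {
    template_uuid = data.vsphere_virtual_machine.template.id
  }
}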
Note: Terraform's operations on resources are idempotent. Once the foo VM has been created successfully, running apply again with unchanged arguments makes Terraform do nothing.
terraform apply -var 'vsphere_password=secret'
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
vsphere_virtual_machine.vm: Creating...
vsphere_virtual_machine.vm: Creation complete after 2s [id=422c1ddb-571b-015c-79ab-0b335e5991c8]

Of course, if necessary, don't forget to call terraform destroy -var 'vsphere_password=secret' before dawn to destroy your "resource" (Zhu Ying).
One final point: Terraform supports many clouds and data centers. This article only demonstrates the simple vCenter scenario, but you can just as well use it to manage local Docker, private clouds, VMware, VirtualBox, PVE, GCP, AWS, Alibaba Cloud…..
2. Advanced Techniques
The following meta-arguments in the Terraform language can be used with any resource type to change how Terraform handles a resource block (a short sketch follows the list):
- depends_on: declare dependencies that Terraform cannot infer from references, e.g. when main.tf defines resources A and B and A depends on B being created and running
- count: instantiate the specified number of copies of a resource, for automated multi-node deployment
- for_each: create multiple instances from a map or set value, also for automated multi-node deployment
- provider: create the resource with a specific provider configuration instead of the default (implicit) one
- lifecycle: customize the resource's lifecycle
- provisioner: run specified actions after the resource has been created
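A minimal sketch of depends_on and lifecycle, reusing the VM from the earlier example — the vsphere_folder resource is only here for illustration:
resource "vsphere_folder" "folder" {
  path          = "demo" # illustrative folder name
  type          = "vm"
  datacenter_id = data.vsphere_datacenter.datacenter.id
}
resource "vsphere_virtual_machine" "vm" {
  # ... arguments as in the earlier example ...
  depends_on = [vsphere_folder.folder] # wait for the folder even though nothing references it
  lifecycle {
    prevent_destroy = true # refuse to destroy this VM via terraform destroy
  }
}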
Automated multi-node deployment with advanced syntax
- Use count to create multiple nodes dynamically
variable "node_count" {
description = "count of node in vcenter"
type = number
default = 3
}
resource "vsphere_virtual_machine" "vm" {
count = var.node_count
name = "node-${count.index}"
......
clone {
template_uuid = data.vsphere_virtual_machine.template.id
}
}
- Use for_each to instantiate nodes with different roles
resource "vsphere_virtual_machine" "nodes" {
for_each = tomap({
"master" = { cpu = 1, memory = 128 },
"worker1" = { cpu = 2, memory = 256 },
"worker2" = { cpu = 2, memory = 256 }
})
name = each.key
num_cpus = each.value.cpu
memory = each.value.memory
resource_pool_id = data.vsphere_resource_pool.pool.id
clone {
template_uuid = data.vsphere_virtual_machine.template.id
}
}
Provisioning resources in another region
By default, Terraform takes the first word of the resource type name (up to the first underscore) as the resource's provider. For example, the resource type google_compute_instance is automatically associated with the default configuration of the provider named google. In the example below, if google.europe were not specified explicitly, Terraform would create the compute instance in GCP's US region.
provider "google" {
region = "us-central1"
}
provider "google" {
alias = "europe"
region = "europe-west1"
}
resource "google_compute_instance" "geurope" {
provider = google.europe
...
}
Modular code reuse in Terraform
The examples below reuse code in different ways.
- locals variables
variable "def_base_cfg" {
description = "base configuration for vm in vcenter"
type = object({
num_cpus = number
memory = number
guest_id = string
disk = object({})
})
default = {
num_cpus = 1
memory = 512
guest_id = "otherLinux64Guest"
disk = {
label = "disk0"
size = 20
}
}
}
variable "def_net_cfg" {
description = "net configuration for vm in vcenter"
type = object({
adapter_type = string
})
default = {
adapter_type = "vmxnet3"
}
}
locals {
master = merge(var.def_base_cfg, var.def_net_cfg)
node_0 = merge(local.master, { num_cpus = 2, memory = 1024 })
node_1 = merge(local.master, { memory = 4096, adapter_type = null })
}
....
resource "vsphere_virtual_machine" "vm" {
for_each = tomap({
"master" = local.master,
"node_0" = local.node_0,
"node_1" = local.node_1,
})
name = each.key
resource_pool_id = data.vsphere_resource_pool.pool.id
datastore_id = data.vsphere_datastore.datastore.id
num_cpus = each.value.num_cpus
memory = each.value.memory
guest_id = each.value.guest_id
network_interface {
network_id = data.vsphere_network.network.id
adapter_type = each.value.adapter_type != null ? each.value.adapter_type : "e1000"
// dynamic块语法运行,判断是否存在adapter_type字段而设置adapter_type和不设置,以后再介绍该用法
}
disk {
label = "disk0"
size = 20
}
wait_for_guest_net_timeout = 0
wait_for_guest_ip_timeout = 0
}
output "print_vars" {
value = local.master
}
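As a side note, here is a minimal sketch of the dynamic block syntax mentioned in the comment above, emitting network_interface blocks only for the entries that exist — the local.ifaces name is hypothetical:
resource "vsphere_virtual_machine" "vm2" {
  # ... other arguments ...
  dynamic "network_interface" {
    for_each = local.ifaces # e.g. a list of objects, each with a network_id attribute
    content {
      network_id = network_interface.value.network_id
    }
  }
}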
- Terraform module syntax
A Terraform module is a set of .tf or .tf.json files kept together in a directory; the .tf files usually describe one or more resources. Below is the directory layout of a getting-started example that uses a Terraform module.
tree -a
.
├── main.tf
├── modules
│ └── esxi
│ ├── LICENSE
│ ├── main.tf
│ ├── README
│ └── variables.tf
├── .terraform
│ ├── modules
│ │ └── modules.json
│ └── providers
│ └── registry.terraform.io
│ └── hashicorp
│ └── vsphere
│ └── 2.10.0
│ └── linux_amd64
│ ├── LICENSE.txt
│ └── terraform-provider-vsphere_v2.10.0_x5
└── .terraform.lock.hcl
Here the module's variables.tf declares a task variable for user input (of type any), validates the correctness and validity of that input at execution time, and uses the merge function to overlay the user's values onto the defaults. Finally, an output block exports the module's resolved configuration for debugging.
variable "task" {
description = "task user input"
type = any
validation {
condition = alltrue([
can(var.task.vms),
alltrue([for i in var.task.vms : can(i.name)])
])
error_message = "task.vms field require"
}
}
variable "basic_cfg" {
description = "basic configuration for vm"
type = object({
num_cpus = number
memory = number
dc = string
ds = string
rp = string
guest_id = string
disk = object({
size = string
label = string
})
net = object({
network_name = string
adapter_type = string
})
})
default = {
dc = "Datacenter"
ds = "datastore1"
rp = "192.168.5.32/Resources"
num_cpus = 1
memory = 512
guest_id = "otherLinux64Guest"
disk = {
label = "disk0"
size = 20
}
net = {
network_name = "VM Network"
adapter_type = "vmxnet3"
}
}
sensitive = false
}
terraform {
  required_providers {
    vsphere = {
      source  = "hashicorp/vsphere"
      version = "2.10.0"
    }
  }
}
locals {
  conn  = var.task.conn
  tasks = try(tomap({ for vm in var.task.vms : vm["name"] => merge(var.basic_cfg, vm) }))
}
provider "vsphere" {
  user                 = local.conn.user
  password             = local.conn.password
  vsphere_server       = local.conn.endp
  allow_unverified_ssl = true
  api_timeout          = 10
}
data "vsphere_datacenter" "datacenter" {
for_each = tomap({ for k, v in local.tasks : k => v.dc })
name = each.value
}
data "vsphere_datastore" "datastore" {
for_each = tomap({ for k, v in local.tasks : k => v.ds })
name = each.value
datacenter_id = data.vsphere_datacenter.datacenter[each.key].id
}
data "vsphere_resource_pool" "pool" {
for_each = tomap({ for k, v in local.tasks : k => v.rp })
name = each.value
datacenter_id = data.vsphere_datacenter.datacenter[each.key].id
}
data "vsphere_network" "network" {
for_each = tomap({ for k, v in local.tasks : k => v.net })
name = each.value.network_name
datacenter_id = data.vsphere_datacenter.datacenter[each.key].id
}
resource "vsphere_virtual_machine" "vm" {
for_each = local.tasks
name = each.key
resource_pool_id = data.vsphere_resource_pool.pool[each.key].id
datastore_id = data.vsphere_datastore.datastore[each.key].id
num_cpus = each.value.num_cpus
memory = each.value.memory
guest_id = each.value.guest_id
network_interface {
network_id = data.vsphere_network.network[each.key].id
adapter_type = each.value.net.adapter_type != null ? each.value.net.adapter_type : "e1000"
}
disk {
label = "disk0"
size = each.value.disk.size
}
wait_for_guest_net_timeout = 0
wait_for_guest_ip_timeout = 0
}
check "user_args_check" {
assert {
condition = alltrue([
length(local.tasks) >= 1,
alltrue([for k, v in local.tasks : contains(keys(v.net), "network_name")])
])
error_message = "task.vms should be list(object)"
}
}
output "esxi_debug" {
value = local.tasks
}
variable "gcfg" {
description = "configuration of task"
type = any
default = {
conn = {
user = "administrator@vsphere.local"
password = "secret"
endp = "192.168.5.100"
}
vms = [
{ name = "master", disk = { size = 30 }, net = { adapter_type = "e1000", network_name = "VM Network" } },
{ name = "node-0" },
{ name = "node-1" }
]
}
}
module "test" {
source = "./modules/esxi"
task = var.gcfg
}
output "debug" {
value = module.test.esxi_debug
}
Debugging and unit testing
Common Terraform debugging techniques:
- The terraform console subcommand provides an interactive debugging environment where you can conveniently inspect variable values and evaluate expressions
- The environment variable TF_LOG=TRACE|DEBUG|INFO|WARN|ERROR prints log output while terraform commands run
- Use output blocks to print the variables or expressions you care about; terraform plan also shows some variable values
- Record values to a file: a provisioner can append the variables of interest to a file (see the sketch below)
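A minimal sketch of that last approach, using a local-exec provisioner to append a line to a local file — the debug.log path is illustrative:
resource "vsphere_virtual_machine" "vm" {
  # ... arguments as before ...
  provisioner "local-exec" {
    # runs on the machine executing terraform, after the VM is created
    command = "echo 'created ${self.name} (${self.id})' >> debug.log"
  }
}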
And, being a proper language, Terraform also supports common testing techniques such as unit tests and mock tests — small as the sparrow is, it has all the vital organs.
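For example, a minimal sketch of a test file (say tests/vm.tftest.hcl, executed with terraform test; the file name and the assertion are illustrative, and on Terraform 1.7+ a mock_provider block could stub out the vsphere provider entirely):
run "plan_vm" {
  command = plan
  variables {
    vsphere_password = "secret"
  }
  assert {
    condition     = vsphere_virtual_machine.vm.num_cpus == 1
    error_message = "expected the vm to be planned with a single cpu"
  }
}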
3. Pitfalls
- Downloading providers during terraform init can be quite slow
- Every provider defines its own resources and external data sources, and the documentation is extensive and scattered
- The official VMware examples create and destroy VMs in a cluster, whereas in my setup the vCenter ESXi host runs in standalone mode
Terraform also has other advanced capabilities such as state management and unit testing; for reasons of space, they will be covered later in a deeper look at the Terraform source code.