ElasticSearch 备份

Posted on 2019-06-13 Edited on 2023-10-07 In Program

ElasticSearch 有两种备份方法

ES 快照备份
ElasticDump 备份

ES 快照备份

ElasticSearch 自带备份的方法，数据量大的时候，通常推荐这种方法。
集群的环境下，需要依赖共享文件系统 NFS，AWS S3， HDFS 文件系统等，以下用HDFS 为例。

备份到 HDFS 需要 repository-hdfs 插件的支持。下载下来放在了 /tmp 并安装，重启 ES，变可享用了。

1	./elasticsearch-plugin install file:///tmp/repository-hdfs-6.2.1.zip

构建仓库，以下创建一个名为 hdfs_backup 的仓库, 我们计划将数据备份到 hdfs://<hadoop-domain.com>:9000/data/es_backup 下面，如无意外会返回 {"acknowlege":"true"}

curl -XPUT http://<Port:IP>/_snapshot/hdfs_backup -H 'Content-Type:application/json' -d'
{  
  "type": "hdfs",  
  "settings": {
    "uri": "hdfs://<hadoop-domain.com>:9000",
    "path": "/data/es_backup"
  }
}'

选定索引备份，例如选择了 student 索引，如无意外会返回 {"acknowlege":"true"}，如果你希望命令阻塞直到备份完成。可以在 URL 尾部加上 wait_for_completion=true

curl -XPUT http://<Port:IP>/_snapshot/hdfs_backup/snapshot_1 -H 'ContentType:application/json' -d'
{
    "indices": "student"
}'

可使用以下命令查看备份的进度:

curl -XGET "http://<Port:IP>/_snapshot/hdfs_backup/*" '
{
  "snapshots" : [
    {
      "snapshot" : "snapshot_1",
      "uuid" : "kTdQRNs6RbO9LZ0LCfXLYg",
      "version_id" : 6060199,
      "version" : "6.6.1",
      "indices" : [
        "student"
      ],
      "include_global_state" : true,
      "state" : "IN_PROGRESS",
      "start_time" : "2019-06-13T07:41:47.556Z",
      "start_time_in_millis" : 1560411707556,
      "end_time" : "1970-01-01T00:00:00.000Z",
      "end_time_in_millis" : 0,
      "duration_in_millis" : -1560411707556,
      "failures" : [ ],
      "shards" : {
        "total" : 0,
        "failed" : 0,
        "successful" : 0
      }
    }
  ]
}

如何使用快照来恢复

curl -XPOST "http://<Port:IP>/_snapshot/hdfs_backup/snapshot_1/_restore?wait_for_completion=true" -H 'Content-Type: application/json' -d'
{
	"indices": "student","rename_replacement": "student"
}'

使用 ElasticDump 来备份

elasticdump 需要 nodeJs 的环境，适用于小数量的备份。其原理是用了 ES 的 Scroll 接口

将 mapping 和数据备份，并 gzip 到本地盘

1 2	elasticdump --input=http://<Port:IP>/student --type=data --limit=10000 --output=$ \| gzip > student.json.gz elasticdump --input=http://<Port:IP>/student --output=stduent_mapping.json --type=mapping

创建好mapping 之后，恢复数据，如果是 gzip 格式的，需要先解压

1	elasticdump --input=./student.json --output=http://<Port:IP>/student