Elasticsearch API: Index Operations

An index is roughly analogous to a table in a relational database and is the top-level unit of data management in Elasticsearch.


PUT /index Create an index

Elasticsearch Guide [7.17] » REST APIs » Index APIs » Create index API
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html

PUT /<index> creates an index.

Example 1: create the index article with default settings and no mapping
curl -X PUT 'http://localhost:9200/article'
Response:

{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "article"
}

Example 2: create the index my-index-000001 with explicit shard and replica counts

PUT /my-index-000001
{
  "settings": {
    "index": {
      "number_of_shards": 3,  
      "number_of_replicas": 2 
    }
  }
}

The request body can be simplified by leaving out the index block:

PUT /my-index-000001
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  }
}
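
Both request bodies above name the same settings. As a rough illustration of why they are equivalent (this is not Elasticsearch's own code), the sketch below flattens a settings object to dotted keys and adds the index. prefix when it is missing. The function name normalize_settings is hypothetical.

```python
def normalize_settings(settings, prefix=""):
    """Flatten nested settings to dotted keys and ensure an 'index.' prefix.

    Illustrative only: shows that the nested and the short form name the same keys.
    """
    flat = {}
    for key, value in settings.items():
        full_key = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(normalize_settings(value, prefix=f"{full_key}."))
        else:
            flat[full_key] = value
    if prefix:
        # Inner recursion level: the 'index.' prefix is added once, at the top.
        return flat
    return {k if k.startswith("index.") else f"index.{k}": v for k, v in flat.items()}


nested = {"index": {"number_of_shards": 3, "number_of_replicas": 2}}
short = {"number_of_shards": 3, "number_of_replicas": 2}
same = normalize_settings(nested) == normalize_settings(short)
```

Here same evaluates to True, because both forms reduce to index.number_of_shards and index.number_of_replicas.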

Example 3: create the index test with an explicit shard count and a mapping

PUT /test
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "field1": { "type": "text" }
    }
  }
}

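Example 4: create an index with a mapping and ik-based custom analyzers with synonyms and stop words
(Requires the IK analysis plugin. As written, the title and content fields use the built-in ik_max_word and ik_smart analyzers; point them at ik_max_custom and ik_smart_custom to apply the synonym and stop-word filters.)

PUT /test
{
  "mappings": {
    "properties":{
        "title": {
            "type": "text",
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_smart",
            "fields": {
                "keyword":{
                    "type":"keyword"
                }
            }
        },
        "content": {
            "type": "text",
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_smart"
        }
     }
  },
  "settings": {
      "index": {
          "analysis": {
              "filter": {
                "my_synonyms": {
                  "type": "synonym_graph",
                  "synonyms": [
                    "应用,服务"
                  ]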
                },
                "my_stop": {
                  "type": "stop",
                  "ignore_case": true,
                  "stopwords": ["的", "一", "不", "在", "人", "有", "是", "为", "以", "于", "上", "他", "而", "后", "之", "来", "及", "了", "因", "下", "可", "到", "由", "这", "与", "也", "此", "但", "并", "个", "其", "已", "无", "小", "我", "们", "起", "最", "再", "今", "去", "好", "只", "又", "或", "很", "亦", "某", "把", "那", "你", "乃", "它", "吧", "被", "比", "别", "趁", "当", "从", "到", "得", "打", "凡", "儿", "尔", "该", "各", "给", "跟", "和", "何", "还", "即", "几", "既", "看", "据", "距", "靠", "啦", "了", "另", "么", "每", "们", "嘛", "拿", "哪", "那", "您", "凭", "且", "却", "让", "仍", "啥", "如", "若", "使", "谁", "虽", "随", "同", "所", "她", "哇", "嗡", "往", "哪", "些", "向", "沿", "哟", "用", "于", "咱", "则", "怎", "曾", "至", "致", "着", "诸", "自"]
                }
              },
              "analyzer": {
                "ik_max_custom": {
                  "type": "custom",
                  "tokenizer": "ik_max_word",
                  "filter": [
                    "my_synonyms", "my_stop"
                  ]
                },
                "ik_smart_custom": {
                  "type": "custom",
                  "tokenizer": "ik_smart",
                  "filter": [
                    "my_synonyms", "my_stop"
                  ]
                }
              }
          }
      }
  }
}
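
A request body of this size is easy to get subtly wrong (a stray trailing comma, or an analyzer that references a filter nobody defined). The following is a minimal pre-flight check, assuming the body is available as JSON text. The helper undefined_filters and the BUILTIN_FILTERS set are hypothetical and deliberately incomplete; json.loads itself raises an error on invalid JSON such as a trailing comma.

```python
import json

# Deliberately incomplete list of built-in token filters, for illustration only.
BUILTIN_FILTERS = frozenset({"lowercase", "asciifolding", "stop"})


def undefined_filters(create_body, builtin=BUILTIN_FILTERS):
    """Map each custom analyzer to the filter names it uses but nobody defines."""
    settings = create_body.get("settings", {})
    # Accept both {"settings": {"index": {"analysis": ...}}} and the short form.
    analysis = settings.get("index", settings).get("analysis", {})
    defined = set(analysis.get("filter", {})) | set(builtin)
    problems = {}
    for name, analyzer in analysis.get("analyzer", {}).items():
        missing = [f for f in analyzer.get("filter", []) if f not in defined]
        if missing:
            problems[name] = missing
    return problems


# A shortened variant of the body above in which my_stop was forgotten.
body = json.loads("""
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_synonyms": {"type": "synonym_graph", "synonyms": ["应用,服务"]}
        },
        "analyzer": {
          "ik_max_custom": {
            "type": "custom",
            "tokenizer": "ik_max_word",
            "filter": ["my_synonyms", "my_stop"]
          }
        }
      }
    }
  }
}
""")
problems = undefined_filters(body)
```

In this shortened body the check reports my_stop as missing for ik_max_custom; once the filter is defined, the result is empty.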

GET /index Get an index

Elasticsearch Guide [7.16] » REST APIs » Index APIs » Get index API
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-get-index.html

curl -X GET 'http://localhost:9200/article'
Response:

{
    "article": {
        "aliases": {},
        "mappings": {},
        "settings": {
            "index": {
                "routing": {
                    "allocation": {
                        "include": {
                            "_tier_preference": "data_content"
                        }
                    }
                },
                "number_of_shards": "1",
                "provided_name": "article",
                "creation_date": "1643005891480",
                "number_of_replicas": "1",
                "uuid": "phMOfBkAT8yE6lE1WHQniA",
                "version": {
                    "created": "7160299"
                }
            }
        }
    }
}

DELETE /index Delete an index

Elasticsearch Guide [7.16] » REST APIs » Index APIs » Delete index API
https://www.elastic.co/guide/en/elasticsearch/reference/7.6/indices-delete-index.html

curl -X DELETE 'localhost:9200/article'
Response:

{
    "acknowledged": true
}

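ignore_unavailable: idempotent delete (no error if the index does not exist)

By default, deleting an index that does not exist fails with "status": 404 and index_not_found_exception.
With the query parameter ignore_unavailable=true the request succeeds even if the index does not exist, which makes the call idempotent.
curl -X DELETE 'localhost:9200/article?ignore_unavailable=true'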

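In Java:

import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.support.IndicesOptions;
DeleteIndexRequest request = new DeleteIndexRequest("index_name");
request.indicesOptions(IndicesOptions.lenientExpandOpen());  // sets ignoreUnavailable to true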
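
For scripts that talk to the REST endpoint directly, the same idempotent delete can be sketched with the Python standard library. The helper delete_index_request is a hypothetical name, and the base URL is an assumption; the request is only built here, not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def delete_index_request(base_url, index, ignore_unavailable=False):
    """Build (but do not send) a DELETE request for an index."""
    url = f"{base_url.rstrip('/')}/{quote(index, safe='')}"
    if ignore_unavailable:
        url += "?" + urlencode({"ignore_unavailable": "true"})
    return Request(url, method="DELETE")


request = delete_index_request(
    "http://localhost:9200", "article", ignore_unavailable=True
)
```

Passing request to urllib.request.urlopen would send it. Building the query string with urlencode avoids the shell-quoting problems that an unquoted ? can cause on the command line.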

GET /index/_mapping Get the mapping

Elasticsearch Guide [7.17] » REST APIs » Index APIs » Get mapping API
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/indices-get-mapping.html

curl -X GET 'http://localhost:9200/article/_mapping'
For an index with no mapping, the response is:

{
    "article": {
        "mappings": {}
    }
}

PUT /index/_mapping Update the mapping

Elasticsearch Guide [7.17] » REST APIs » Index APIs » Update mapping API
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/indices-put-mapping.html
https://www.elastic.co/guide/en/elasticsearch/reference/7.6/indices-put-mapping.html

Add a new field

PUT /my-index-000001/_mapping
{
  "properties": {
    "email": {
      "type": "keyword"
    }
  }
}

Add a new sub-field to an existing object field

Create the index my_index, whose name object field has a first sub-field:

PUT /my_index
{
  "mappings": {
    "properties": {
      "name": {
        "properties": {
          "first": {
            "type": "text"
          }
        }
      }
    }
  }
}

Use a PUT /my_index/_mapping request to add a last sub-field to the name field:

PUT /my_index/_mapping
{
  "properties": {
    "name": {
      "properties": {
        "last": {
          "type": "text"
        }
      }
    }
  }
}

Add a keyword sub-field to a single text field

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/indices-put-mapping.html#add-multi-fields-existing-field-ex

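index1 already has a text field named content. The request below adds a keyword sub-field to content only, without affecting other fields.
Documents indexed before the change cannot be matched by term queries on content.keyword until they are reindexed, for example with POST index1/_update_by_query?conflicts=proceed.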

PUT /index1/_mapping
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "ik_max_word",
      "search_analyzer": "ik_smart",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    }
  }
}

GET /index/_settings Get index settings

Elasticsearch Guide [7.17] » REST APIs » Index APIs » Get index settings API
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/indices-get-settings.html

GET /<target>/_settings returns all settings of the specified index
GET /<target>/_settings/<setting> returns a single setting of the specified index

Get the index.number_of_shards setting

GET /my_blog_3shards/_settings/index.number_of_shards
Response:
{
    "my_blog_3shards": {
        "settings": {
            "index": {
                "number_of_shards": "3"
            }
        }
    }
}
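
Note that the settings API returns numeric values as JSON strings ("3", not 3). A small sketch of reading one setting out of such a response follows; the helper get_setting is a hypothetical name, and the response text is the one shown above.

```python
import json

RESPONSE = """
{
    "my_blog_3shards": {
        "settings": {
            "index": {
                "number_of_shards": "3"
            }
        }
    }
}
"""


def get_setting(response_text, index, dotted_name):
    """Walk a dotted setting name through a GET _settings response."""
    node = json.loads(response_text)[index]["settings"]
    for part in dotted_name.split("."):
        node = node[part]
    return node


raw = get_setting(RESPONSE, "my_blog_3shards", "index.number_of_shards")
shards = int(raw)  # convert explicitly before doing arithmetic with it
```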

Get all settings of an index

GET /my_blog_3shards/_settings
Response:
{
    "my_blog_3shards": {
        "settings": {
            "index": {
                "routing": {
                    "allocation": {
                        "include": {
                            "_tier_preference": "data_content"
                        }
                    }
                },
                "number_of_shards": "3",
                "provided_name": "my_blog_3shards",
                "creation_date": "1645156600073",
                "sort": {
                    "field": "timestamp",
                    "order": "desc"
                },
                "number_of_replicas": "1",
                "uuid": "6uNlMASdSAGPNjHUNo9JiA",
                "version": {
                    "created": "7160299"
                }
            }
        }
    }
}

PUT /index/_settings Update dynamic index settings

Elasticsearch Guide [7.17] » REST APIs » Index APIs » Update index settings API
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-update-settings.html

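PUT /<target>/_settings changes dynamic index settings in real time.

This API is often used to turn automatic refresh off and back on through index.refresh_interval (-1 disables it) so that large batches of data can be bulk-indexed faster.

For example, change the number of replicas dynamically: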

PUT /my-index-000001/_settings
{
  "index" : {
    "number_of_replicas" : 2
  }
}
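
The refresh_interval use case mentioned above has one pitfall: if the bulk load fails halfway, refresh stays disabled. A minimal sketch of guarding against that with a context manager is shown below. The send(method, path, body) callable is a hypothetical hook supplied by the caller (it is not part of any client library), and restoring to "1s" is an assumption about the previous value.

```python
from contextlib import contextmanager


@contextmanager
def refresh_disabled(send, index, restore_to="1s"):
    """Disable automatic refresh during a bulk load, then always restore it."""
    path = f"/{index}/_settings"
    send("PUT", path, {"index": {"refresh_interval": "-1"}})
    try:
        yield
    finally:
        # Runs even if the bulk load raised, so refresh is never left off.
        send("PUT", path, {"index": {"refresh_interval": restore_to}})


# Record the calls instead of sending them, to show the resulting order.
calls = []


def record(method, path, body):
    calls.append((method, path, body))


with refresh_disabled(record, "my-index-000001"):
    record("POST", "/my-index-000001/_bulk", "bulk payload placeholder")
```

The recorded sequence is: disable refresh, bulk request, restore refresh.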

Update the index analyzers

The index must be closed before its analysis settings can be changed.

Add an analyzer named content_analyzer to the index my-index-000001:

POST /my-index-000001/_close

PUT /my-index-000001/_settings
{
  "analysis" : {
    "analyzer":{
      "content_analyzer":{
        "type":"custom",
        "tokenizer":"whitespace"
      }
    }
  }
}

POST /my-index-000001/_open

POST /index/_close Close an index

Elasticsearch Guide [7.17] » REST APIs » Index APIs » Close index API
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-close.html

POST /<index>/_close closes an index.

A closed index is blocked for read and write operations; only its metadata can still be inspected.


POST /index/_open Open an index

Elasticsearch Guide [7.17] » REST APIs » Index APIs » Open index API
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-open-close.html

POST /<target>/_open opens an index.

Reopening a closed index makes it available for reads and writes again.


POST /index/_refresh Refresh an index

Elasticsearch Guide [7.17] » REST APIs » Index APIs » Refresh API
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-refresh.html

Refresh specific indices:

POST /<target>/_refresh
GET /<target>/_refresh

Refresh all indices:

POST /_refresh
GET /_refresh

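A refresh writes the contents of the in-memory indexing buffer to a new Lucene segment and opens it for search.
It makes recent operations visible to search; a newly indexed document, for example, can only be found after a refresh.
By default Elasticsearch refreshes every second (only for indices that have received a search request in the last 30 seconds); the interval can be changed with index.refresh_interval.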

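A refresh does the following:
1. All documents in the in-memory buffer are written to a new segment. No fsync is performed, so the segment lives in the filesystem cache and could still be lost on a crash.
2. The segment is opened so that its documents become searchable.
3. The in-memory buffer is cleared.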
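
The three steps can be illustrated with a toy in-memory model. This is a teaching sketch only; ToyShard is a hypothetical class and says nothing about how Lucene actually stores segments.

```python
class ToyShard:
    """Toy model: documents become searchable only after refresh()."""

    def __init__(self):
        self.buffer = []    # indexed, but not yet visible to search
        self.segments = []  # each refresh adds one searchable segment

    def index(self, doc):
        self.buffer.append(doc)

    def refresh(self):
        if self.buffer:
            # Steps 1 and 2: write a new segment and open it for search.
            self.segments.append(tuple(self.buffer))
            # Step 3: clear the in-memory buffer.
            self.buffer.clear()

    def search(self, term):
        return [doc for seg in self.segments for doc in seg if term in doc]


shard = ToyShard()
shard.index("hello world")
before = shard.search("hello")  # nothing yet: the document is only in the buffer
shard.refresh()
after = shard.search("hello")   # the document is now in a searchable segment
```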


POST /index/_flush Flush to disk

Elasticsearch Guide [7.17] » REST APIs » Index APIs » Flush API
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-flush.html

Flush specific indices:

POST /<target>/_flush
GET /<target>/_flush

Flush all indices:

POST /_flush
GET /_flush

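A flush performs a Lucene commit so that data held only in the translog is also stored permanently in the Lucene index, after which a new translog generation is started. Elasticsearch flushes automatically as needed, for example when the translog reaches index.translog.flush_threshold_size (512mb by default). The 5s figure often quoted is index.translog.sync_interval, which controls how often the translog is fsynced when index.translog.durability is async; it is not a flush interval.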

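A flush mainly does the following:
1. Writes all documents in the in-memory buffer to a new segment via a refresh.
2. Clears the in-memory buffer.
3. Writes a commit point to disk.
4. Fsyncs the segments held in the filesystem page cache to disk.
5. Deletes the old translog files, because the segments are now safely on disk and the translog is no longer needed to protect that data.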
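
To make the role of the translog concrete, here is a toy model of the flush steps together with a simulated crash. It assumes the translog itself is durable (as with the default request durability); ToyShard and all of its attributes are hypothetical names for illustration.

```python
class ToyShard:
    """Toy model of buffer, page-cache segments, committed segments and translog."""

    def __init__(self):
        self.buffer = []           # in-memory indexing buffer
        self.cached_segments = []  # segments in the page cache, not fsynced
        self.committed = []        # segments covered by a commit point on disk
        self.translog = []         # durable log of operations since the last flush

    def index(self, doc):
        self.translog.append(doc)
        self.buffer.append(doc)

    def refresh(self):
        if self.buffer:
            self.cached_segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        self.refresh()                               # steps 1-2
        self.committed.extend(self.cached_segments)  # steps 3-4: commit + fsync
        self.cached_segments.clear()
        self.translog.clear()                        # step 5

    def crash_and_recover(self):
        """Lose everything that was not on disk, then replay the translog."""
        operations = list(self.translog)
        self.buffer.clear()
        self.cached_segments.clear()
        self.translog.clear()
        for doc in operations:
            self.index(doc)

    def docs(self):
        segments = self.committed + self.cached_segments
        return [doc for seg in segments for doc in seg] + list(self.buffer)


shard = ToyShard()
shard.index("a")
shard.flush()             # "a" is committed, translog is empty
shard.index("b")          # "b" exists only in the buffer and the translog
shard.crash_and_recover()
recovered = shard.docs()
```

After the simulated crash, "a" survives through the commit and "b" is restored by replaying the translog.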


POST /index/_forcemerge Force a segment merge

Elasticsearch Guide [7.17] » REST APIs » Index APIs » Force merge API
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-forcemerge.html

POST /<target>/_forcemerge force-merges the specified indices
POST /_forcemerge force-merges all indices


POST /_reindex Copy data

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-reindex.html

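POST /_reindex copies documents from one index to another.

Query parameters:

  • wait_for_completion: with wait_for_completion=false the request is processed asynchronously. Elasticsearch validates the request and immediately returns a task, which can then be used to cancel the job or check its status.
  • requests_per_second: throttles the request by limiting the number of documents processed per second.

Body parameters:

  • source: the source
    • index: name of the source index, required
  • dest: the destination
    • index: name of the destination index, required
    • pipeline: the ingest pipeline to run documents through
  • script: a script that modifies documents while they are copied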
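script: transform documents with a script

index1 contains two 1024-dimensional vector fields. Copy the data to index2 and truncate both vectors to 512 dimensions:

POST /_reindex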
{
  "source": {
    "index": "index1"
  },
  "dest": {
    "index": "index2"
  },
  "script": {
    "source": "ctx._source.title_vector = ctx._source.title_vector.subList(0,512); ctx._source.content_vector = ctx._source.content_vector.subList(0,512)",
    "lang": "painless"
  }
}
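
To check what the Painless script does to a single document before running the reindex, its effect can be mirrored in plain Python. This is a sketch for local experimentation only; truncate_vectors is a hypothetical helper, and the sample document is made up. One difference to keep in mind: Java's subList(0, 512) throws if a list has fewer than 512 elements, while a Python slice silently returns the shorter list.

```python
def truncate_vectors(source, dims=512, fields=("title_vector", "content_vector")):
    """Return a copy of a document _source with the vector fields cut to `dims`."""
    doc = dict(source)
    for field in fields:
        doc[field] = list(doc[field][:dims])
    return doc


# Made-up sample document with two 1024-dimensional vectors.
sample = {
    "title": "example",
    "title_vector": [0.0] * 1024,
    "content_vector": [1.0] * 1024,
}
truncated = truncate_vectors(sample)
```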

pipeline: transform documents with an ingest pipeline

Set pipeline inside dest to run an existing ingest pipeline on the documents as they are copied:

POST _reindex
{
  "source": {
    "index": "source"
  },
  "dest": {
    "index": "dest",
    "pipeline": "some_ingest_pipeline"
  }
}

POST /index/_split/new_index Split an index

Elasticsearch Guide [7.17] » REST APIs » Index APIs » Split index API
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/indices-split-index.html

POST /<index>/_split/<target-index>
PUT /<index>/_split/<target-index>

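Splits an existing index into a new index with more primary shards. Each primary shard of the source index is split into several primary shards in the target index.

For example, split my_source_index into the new index my_target_index: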

POST /my_source_index/_split/my_target_index
{
  "settings": {
    "index.number_of_shards": 2
  }
}

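The request returns as soon as the target index has been created; it does not wait for the split itself to finish.

Prerequisites for splitting an index

  • The source index must be read-only, so that nothing is written to it during the operation
  • The cluster health status must be green
  • The target index must not exist
  • The source index must have fewer primary shards than the target index
  • The number of primary shards in the target index must be a multiple of the number of primary shards in the source index
  • The node handling the split must have enough free disk space to hold a second copy of the source index

Setting index.blocks.write to true makes the index read-only for data while still allowing metadata operations such as deleting the index: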

PUT /my_source_index/_settings
{
  "settings": {
    "index.blocks.write": true 
  }
}

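Split factors

The factors by which an index can be split are determined by the static setting index.number_of_routing_shards.
For example, an index with 5 primary shards and number_of_routing_shards set to 30 (5 × 2 × 3) can be split by a factor of 2 or 3, which allows the following sequences:
5 -> 10 -> 30 (split by 2, then by 3)
5 -> 15 -> 30 (split by 3, then by 2)
5 -> 30 (split by 6)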
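
The rule behind these sequences can be written down as a short check: a valid target shard count is a multiple of the current count and a divisor of number_of_routing_shards. A minimal sketch (split_targets is a hypothetical helper name):

```python
def split_targets(current_shards, routing_shards):
    """List the shard counts an index can be split into in a single step."""
    return [
        target
        for target in range(current_shards * 2, routing_shards + 1, current_shards)
        if routing_shards % target == 0
    ]


from_5 = split_targets(5, 30)    # [10, 15, 30]
from_10 = split_targets(10, 30)  # [30]
from_15 = split_targets(15, 30)  # [30]
```

Chaining the results reproduces the three sequences listed above.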

index.number_of_routing_shards is a static setting: it must be specified when the index is created, or changed on a closed index.

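The default value of index.number_of_routing_shards depends on the number of primary shards and is chosen so that the index can be split by factors of 2 up to a maximum of 1024 shards. For example, an index with 5 primary shards can be split by 2 one or more times into 10, 20, 40, 80, 160, 320 or 640 shards, so its default index.number_of_routing_shards is 640.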
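
The arithmetic behind that default can be sketched as repeated doubling. This follows the rule as described in the text, not Elasticsearch's source code; it assumes at least one doubling is always allowed, and the function names are hypothetical.

```python
def default_routing_shards(primary_shards, max_shards=1024):
    """Keep doubling the shard count while the result stays within max_shards."""
    routing = primary_shards * 2  # assumption: at least one split by 2 is allowed
    while routing * 2 <= max_shards:
        routing *= 2
    return routing


def doubling_chain(primary_shards):
    """Shard counts reachable by splitting by 2 under the default setting."""
    limit = default_routing_shards(primary_shards)
    chain, shards = [], primary_shards * 2
    while shards <= limit:
        chain.append(shards)
        shards *= 2
    return chain


routing_for_5 = default_routing_shards(5)  # 640
chain_for_5 = doubling_chain(5)            # [10, 20, 40, 80, 160, 320, 640]
```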

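If the source index has a single primary shard (or a multi-shard index has been shrunk to a single primary shard), it can be split into any number of shards greater than 1; the default index.number_of_routing_shards of the new index is then derived from its new shard count.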

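How a split works

1. A new target index is created with the same definition as the source index but more primary shards.
2. Segments of the source index are hard-linked into the target index. On Linux this only changes inode link counts, so it is fast. If the file system does not support hard links, the segments are copied instead, which takes much longer.
3. All documents are hashed again, and documents that belong to a different shard are deleted.
4. The target index is recovered as though it were a closed index that had just been reopened.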
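
Why the rehash step only needs to delete documents, never move them between source shards, can be illustrated with the routing arithmetic. The sketch assumes the shard is chosen as (hash % routing_shards) // (routing_shards // num_shards); zlib.crc32 is only a stand-in for the Murmur3 hash Elasticsearch uses, and all names here are hypothetical.

```python
import zlib


def shard_for(routing_key, num_shards, routing_shards):
    """Pick a shard as (hash % routing_shards) // (routing_shards // num_shards)."""
    # crc32 is a deterministic stand-in, not the hash Elasticsearch uses.
    hashed = zlib.crc32(routing_key.encode("utf-8")) % routing_shards
    return hashed // (routing_shards // num_shards)


ROUTING_SHARDS = 30
SOURCE_SHARDS, TARGET_SHARDS = 5, 10
factor = TARGET_SHARDS // SOURCE_SHARDS

doc_ids = [f"doc-{i}" for i in range(1000)]
# Every document of source shard s lands in a target shard t with t // factor == s.
stays_local = all(
    shard_for(d, TARGET_SHARDS, ROUTING_SHARDS) // factor
    == shard_for(d, SOURCE_SHARDS, ROUTING_SHARDS)
    for d in doc_ids
)
```

Under this assumption each source shard feeds only its own group of target shards, which is why the hard-linked segments can be reused and trimmed in place.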

Why Elasticsearch does not support incremental resharding

Monitoring the split process


POST /index/_shrink/new_index Shrink an index

Elasticsearch Guide [7.17] » REST APIs » Index APIs » Shrink index API
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/indices-shrink-index.html

POST /<index>/_shrink/<target-index>
PUT /<index>/_shrink/<target-index>

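Shrinks an existing index into a new index with fewer primary shards. The number of primary shards in the target index must be a factor of the number of primary shards in the source index.

For example, an index with 8 primary shards can be shrunk to 4, 2 or 1 primary shards. If the number of primary shards in the source index is a prime number, it can only be shrunk to a single primary shard.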
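
The factor rule is easy to check programmatically. A minimal sketch (shrink_targets is a hypothetical helper name):

```python
def shrink_targets(source_shards):
    """List the primary shard counts a source index can be shrunk to."""
    return [
        target
        for target in range(source_shards - 1, 0, -1)
        if source_shards % target == 0
    ]


from_8 = shrink_targets(8)  # [4, 2, 1]
from_7 = shrink_targets(7)  # [1], because 7 is prime
```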

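How a shrink works:
1. A new target index is created with the same definition as the source index but fewer primary shards.
2. Segments of the source index are hard-linked into the target index. If the file system does not support hard links, the segments are copied instead, which takes much longer. A full copy is also needed when multiple data paths (multiple disk partitions) are used and the data sits on different paths, because hard links cannot cross disks.
3. The target index is recovered as though it were a closed index that had just been reopened.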


POST /index/_cache/clear Clear the cache

Elasticsearch Guide [7.17] » REST APIs » Index APIs » Clear cache API
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/indices-clearcache.html

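POST /<target>/_cache/clear clears the caches of the specified indices
POST /_cache/clear clears the caches of all indices

All caches are cleared by default; the query, request or fielddata cache can also be cleared individually:

POST /my-index-000001/_cache/clear?fielddata=true  // clear only the fielddata cache
POST /my-index-000001/_cache/clear?query=true      // clear only the query cache
POST /my-index-000001/_cache/clear?request=true    // clear only the request cache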

GET /index/_stats Get index statistics

Elasticsearch Guide [7.17] » REST APIs » Index APIs » Index stats API
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/indices-stats.html

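GET /<target>/_stats/<index-metric> returns the given metrics for the specified indices
GET /<target>/_stats returns all metrics for the specified indices
GET /_stats returns all metrics for all indices

For example, get the store metric of an index with 3 primary shards and 1 replica. The result shows a total size of about 9.8 GB, of which the primary shards account for about 4.9 GB.

GET /index1/_stats/store
Response: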
{
  "_shards": {
    "total": 6,
    "successful": 6,
    "failed": 0
  },
  "_all": {
    "primaries": {
      "store": {
        "size_in_bytes": 4865706422
      }
    },
    "total": {
      "store": {
        "size_in_bytes": 9827457637
      }
    }
  },
  "indices": {
    "index1": {
      "uuid": "Ox5GfotcSjikFg08MHv-lQ",
      "primaries": {
        "store": {
          "size_in_bytes": 4865706422
        }
      },
      "total": {
        "store": {
          "size_in_bytes": 9827457637
        }
      }
    }
  }
}
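
The sizes are reported in bytes, so a small conversion helps when reading them. The sketch below uses the _all section of the response above and decimal gigabytes (1 GB = 10^9 bytes); store_summary is a hypothetical helper name.

```python
import json

STATS = json.loads("""
{
  "_all": {
    "primaries": {"store": {"size_in_bytes": 4865706422}},
    "total": {"store": {"size_in_bytes": 9827457637}}
  }
}
""")


def store_summary(stats):
    """Summarize primary, total and replica store size from a _stats response."""
    primaries = stats["_all"]["primaries"]["store"]["size_in_bytes"]
    total = stats["_all"]["total"]["store"]["size_in_bytes"]
    return {
        "primaries_gb": round(primaries / 1e9, 1),
        "total_gb": round(total / 1e9, 1),
        "replica_bytes": total - primaries,  # space used by replica copies
    }


summary = store_summary(STATS)
```

With one replica the replica copies take roughly as much space as the primaries, which matches the numbers in the response.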

Created: 2025-04-15
Modified: 2025-04-15