Elasticsearch API: Index Operations
An index is roughly the equivalent of a table in a relational database. It is the top-level unit of data management in Elasticsearch.
PUT /index: Create an index
Elasticsearch Guide [7.17] » REST APIs » Index APIs » Create index API
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html
PUT /<index>
Creates an index.
Example 1: create the index article with default settings and no mapping
curl -X PUT 'http://localhost:9200/article'
Response:
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "article"
}
Example 2: create the index my-index-000001 with explicit shard and replica counts
PUT /my-index-000001
{
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 2
}
}
}
The request body can be simplified by omitting the index block:
PUT /my-index-000001
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 2
}
}
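The same request can be issued from Java without a client library. The sketch below assumes Java 11+ (java.net.http), an unsecured node at http://localhost:9200, and hypothetical class and method names. It only builds the PUT request with the simplified settings body and does not send it:

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds (does not send) PUT /<index> with shard and replica settings.
class CreateIndexSketch {
    static final String BASE_URL = "http://localhost:9200"; // assumption: local node, no auth

    // Simplified body form: no "index" wrapper inside "settings"
    static String settingsBody(int shards, int replicas) {
        return "{\"settings\":{\"number_of_shards\":" + shards
            + ",\"number_of_replicas\":" + replicas + "}}";
    }

    static HttpRequest createIndex(String index, int shards, int replicas) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/" + index))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(settingsBody(shards, replicas)))
            .build();
    }

    public static void main(String[] args) {
        HttpRequest req = createIndex("my-index-000001", 3, 2);
        System.out.println(req.method() + " " + req.uri());
        System.out.println(settingsBody(3, 2));
        // Sending it needs a running node:
        // HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
    }
}
```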
Example 3: create the index test with a shard count and a mapping
PUT /test
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"properties": {
"field1": { "type": "text" }
}
}
}
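Example 4: create an index with a mapping and custom analyzers built on the IK tokenizers (requires the IK analysis plugin), plus synonym and stop-word filters. Note that, as written, the mapping still uses the built-in ik_max_word and ik_smart analyzers; to actually apply the synonym and stop-word filters, set analyzer and search_analyzer to ik_max_custom and ik_smart_custom.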
PUT /test
{
"mappings": {
"properties":{
"title": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart",
"fields": {
"keyword":{
"type":"keyword"
}
}
},
"content": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
}
}
},
"settings": {
"index": {
"analysis": {
"filter": {
"my_synonyms": {
"type": "synonym_graph",
"synonyms": [
"应用,服务"
]
},
"my_stop": {
"type": "stop",
"ignore_case": true,
"stopwords": ["的", "一", "不", "在", "人", "有", "是", "为", "以", "于", "上", "他", "而", "后", "之", "来", "及", "了", "因", "下", "可", "到", "由", "这", "与", "也", "此", "但", "并", "个", "其", "已", "无", "小", "我", "们", "起", "最", "再", "今", "去", "好", "只", "又", "或", "很", "亦", "某", "把", "那", "你", "乃", "它", "吧", "被", "比", "别", "趁", "当", "从", "到", "得", "打", "凡", "儿", "尔", "该", "各", "给", "跟", "和", "何", "还", "即", "几", "既", "看", "据", "距", "靠", "啦", "了", "另", "么", "每", "们", "嘛", "拿", "哪", "那", "您", "凭", "且", "却", "让", "仍", "啥", "如", "若", "使", "谁", "虽", "随", "同", "所", "她", "哇", "嗡", "往", "哪", "些", "向", "沿", "哟", "用", "于", "咱", "则", "怎", "曾", "至", "致", "着", "诸", "自"]
}
},
"analyzer": {
"ik_max_custom": {
"type": "custom",
"tokenizer": "ik_max_word",
"filter": [
"my_synonyms", "my_stop"
]
},
"ik_smart_custom": {
"type": "custom",
"tokenizer": "ik_smart",
"filter": [
"my_synonyms", "my_stop"
]
}
}
}
}
}
}
GET /index: Get index information
Elasticsearch Guide [7.16] » REST APIs » Index APIs » Get index API
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-get-index.html
curl -X GET 'http://localhost:9200/article'
Response:
{
"article": {
"aliases": {},
"mappings": {},
"settings": {
"index": {
"routing": {
"allocation": {
"include": {
"_tier_preference": "data_content"
}
}
},
"number_of_shards": "1",
"provided_name": "article",
"creation_date": "1643005891480",
"number_of_replicas": "1",
"uuid": "phMOfBkAT8yE6lE1WHQniA",
"version": {
"created": "7160299"
}
}
}
}
}
DELETE /index: Delete an index
Elasticsearch Guide [7.16] » REST APIs » Index APIs » Delete index API
https://www.elastic.co/guide/en/elasticsearch/reference/7.6/indices-delete-index.html
curl -X DELETE 'localhost:9200/article'
Response:
{
"acknowledged": true
}
ignore_unavailable: idempotent delete (no error if the index does not exist)
By default, deleting an index that does not exist fails with "status": 404 and index_not_found_exception.
With ignore_unavailable=true, no error is returned even if the index does not exist, so the call becomes idempotent:
curl -X DELETE 'localhost:9200/article?ignore_unavailable=true'
In Java:
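// 7.x High Level REST Client; client is an existing RestHighLevelClient
DeleteIndexRequest request = new DeleteIndexRequest("index_name");
request.indicesOptions(IndicesOptions.lenientExpandOpen()); // sets ignoreUnavailable (and allowNoIndices) to true
client.indices().delete(request, RequestOptions.DEFAULT);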
GET /index/_mapping: Get the mapping
Elasticsearch Guide [7.17] » REST APIs » Index APIs » Get mapping API
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/indices-get-mapping.html
curl -X GET 'http://localhost:9200/article/_mapping'
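Response for an index without a mapping (this sample response was taken from an index named user_1.24.14, which is why that name appears as the top-level key):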
{
"user_1.24.14": {
"mappings": {}
}
}
PUT /index/_mapping: Update the mapping
Elasticsearch Guide [7.17] » REST APIs » Index APIs » Update mapping API
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/indices-put-mapping.html
https://www.elastic.co/guide/en/elasticsearch/reference/7.6/indices-put-mapping.html
Add a new field
PUT /my-index-000001/_mapping
{
"properties": {
"email": {
"type": "keyword"
}
}
}
Add a new property to an existing object field
Create the index my_index, in which the name object field has a first sub-field:
PUT /my_index
{
"mappings": {
"properties": {
"name": {
"properties": {
"first": {
"type": "text"
}
}
}
}
}
}
Use a PUT /_mapping request to add a last sub-field to name:
PUT /my_index/_mapping
{
"properties": {
"name": {
"properties": {
"last": {
"type": "text"
}
}
}
}
}
Add a keyword sub-field to a single text field
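index1 already has a text field named content. The request below adds a keyword sub-field to content only, without affecting any other field.
After the keyword sub-field is added, existing documents still cannot be matched with a term query on content.keyword; they only become matchable after they are reindexed in place with POST index1/_update_by_query?conflicts=proceed.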
PUT /index1/_mapping
{
"properties": {
"content": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
GET /index/_settings: Get index settings
Elasticsearch Guide [7.17] » REST APIs » Index APIs » Get index settings API
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/indices-get-settings.html
GET /<target>/_settings
Returns all settings of the specified indices.
GET /<target>/_settings/<setting>
Returns a specific setting of the specified indices.
Get the index.number_of_shards setting:
GET /my_blog_3shards/_settings/index.number_of_shards
{
"my_blog_3shards": {
"settings": {
"index": {
"number_of_shards": "3"
}
}
}
}
Get all settings of the index:
GET /my_blog_3shards/_settings
{
"my_blog_3shards": {
"settings": {
"index": {
"routing": {
"allocation": {
"include": {
"_tier_preference": "data_content"
}
}
},
"number_of_shards": "3",
"provided_name": "my_blog_3shards",
"creation_date": "1645156600073",
"sort": {
"field": "timestamp",
"order": "desc"
},
"number_of_replicas": "1",
"uuid": "6uNlMASdSAGPNjHUNo9JiA",
"version": {
"created": "7160299"
}
}
}
}
}
PUT /index/_settings: Update dynamic index settings
Elasticsearch Guide [7.17] » REST APIs » Index APIs » Update index settings API
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-update-settings.html
PUT /<target>/_settings
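Changes dynamic index settings in real time.
This API is often used to disable automatic refresh (by setting index.refresh_interval to -1) before bulk indexing a large amount of data, and to re-enable it afterwards.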
For example, to change the number of replicas of an index:
PUT /my-index-000001/_settings
{
"index" : {
"number_of_replicas" : 2
}
}
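The refresh_interval bulk-load pattern mentioned above (disable refresh, bulk index, then restore the default) can be sketched with the JDK request builder. Assumptions: Java 11+, an unsecured node at localhost:9200, hypothetical names, and the _bulk requests themselves left out. Setting a value to null resets it to its default, which for refresh_interval is 1s:

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;

// Hypothetical helper: builds (does not send) the settings updates around a bulk load.
class RefreshToggleSketch {
    static final String BASE_URL = "http://localhost:9200"; // assumption: local node, no auth

    // "-1" disables periodic refresh; JSON null resets the setting to its default
    static String refreshIntervalBody(String value) {
        String json = (value == null) ? "null" : "\"" + value + "\"";
        return "{\"index\":{\"refresh_interval\":" + json + "}}";
    }

    static HttpRequest putSettings(String index, String body) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/" + index + "/_settings"))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    static List<HttpRequest> bulkLoadPlan(String index) {
        return List.of(
            putSettings(index, refreshIntervalBody("-1")),  // 1. before bulk indexing
            // 2. the _bulk requests would be sent here (omitted)
            putSettings(index, refreshIntervalBody(null))); // 3. restore the default afterwards
    }

    public static void main(String[] args) {
        for (HttpRequest r : bulkLoadPlan("my-index-000001")) {
            System.out.println(r.method() + " " + r.uri());
        }
    }
}
```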
Update index analyzers
New analyzers can only be defined on a closed index, so close the index first and reopen it afterwards.
Add an analyzer named content_analyzer to my-index-000001:
POST /my-index-000001/_close
PUT /my-index-000001/_settings
{
"analysis" : {
"analyzer":{
"content_analyzer":{
"type":"custom",
"tokenizer":"whitespace"
}
}
}
}
POST /my-index-000001/_open
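Because the close, update and open steps must run in this order, it can help to build the three requests as one unit. A minimal sketch assuming Java 11+, an unsecured node at localhost:9200 and hypothetical helper names; the requests are built but not sent:

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;

// Hypothetical helper: builds (does not send) the close -> update analysis -> open sequence.
class AnalyzerUpdateSketch {
    static final String BASE_URL = "http://localhost:9200"; // assumption: local node, no auth

    static String analyzerBody(String name, String tokenizer) {
        return "{\"analysis\":{\"analyzer\":{\"" + name
            + "\":{\"type\":\"custom\",\"tokenizer\":\"" + tokenizer + "\"}}}}";
    }

    static HttpRequest post(String path) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + path))
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
    }

    static HttpRequest putJson(String path, String body) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + path))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    // The three steps, in the order they must run
    static List<HttpRequest> addAnalyzer(String index, String name, String tokenizer) {
        return List.of(
            post("/" + index + "/_close"),
            putJson("/" + index + "/_settings", analyzerBody(name, tokenizer)),
            post("/" + index + "/_open"));
    }

    public static void main(String[] args) {
        for (HttpRequest r : addAnalyzer("my-index-000001", "content_analyzer", "whitespace")) {
            System.out.println(r.method() + " " + r.uri().getPath());
        }
    }
}
```

When these are actually sent, running the _open step in a finally block keeps a failed settings update from leaving the index closed.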
POST /index/_close: Close an index
Elasticsearch Guide [7.17] » REST APIs » Index APIs » Close index API
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-close.html
POST /<index>/_close
Closes an index.
A closed index is blocked for read and write operations; only its metadata can be viewed.
POST /index/_open: Open an index
Elasticsearch Guide [7.17] » REST APIs » Index APIs » Open index API
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-open-close.html
POST /<target>/_open
Opens an index.
Reopens a closed index so that data can be read from and written to it again.
POST /index/_refresh: Refresh an index
Elasticsearch Guide [7.17] » REST APIs » Index APIs » Refresh API
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-refresh.html
Refresh specific indices:
POST /<target>/_refresh
GET /<target>/_refresh
Refresh all indices:
POST /_refresh
GET /_refresh
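A refresh writes the documents in the in-memory indexing buffer to a new Lucene segment and makes them searchable.
Refresh is what makes recent operations visible to search: a newly indexed document, for example, can only be found by a search after a refresh.
By default, Elasticsearch refreshes every second; the interval can be changed with the index.refresh_interval setting.
A refresh consists of the following steps:
1. All documents in the in-memory buffer are written to a new segment. fsync is not called, so the new segment lives only in the filesystem cache and could be lost on a crash.
2. The segment is opened so that the documents in it become searchable.
3. The in-memory buffer is cleared.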
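To make the visibility rule concrete, here is a toy in-memory model (plain Java, not Elasticsearch internals, all names hypothetical) in which documents only become searchable after refresh() turns the buffer into a new segment:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Elasticsearch code: buffered documents are invisible until a refresh.
class RefreshModel {
    private final List<String> buffer = new ArrayList<>();         // in-memory indexing buffer
    private final List<List<String>> segments = new ArrayList<>(); // searchable segments

    void index(String doc) {
        buffer.add(doc);
    }

    void refresh() {
        if (buffer.isEmpty()) return;          // nothing to write, no new segment
        segments.add(new ArrayList<>(buffer)); // 1 + 2: new segment, now open for search
        buffer.clear();                        // 3: clear the buffer
    }

    boolean search(String doc) {
        for (List<String> segment : segments) {
            if (segment.contains(doc)) return true;
        }
        return false; // documents still in the buffer are not found
    }

    int segmentCount() {
        return segments.size();
    }

    public static void main(String[] args) {
        RefreshModel m = new RefreshModel();
        m.index("doc-1");
        System.out.println(m.search("doc-1")); // false: not refreshed yet
        m.refresh();
        System.out.println(m.search("doc-1")); // true
    }
}
```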
POST /index/_flush: Flush to disk
Elasticsearch Guide [7.17] » REST APIs » Index APIs » Flush API
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-flush.html
Flush specific indices:
POST /<target>/_flush
GET /<target>/_flush
Flush all indices:
POST /_flush
GET /_flush
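A flush makes sure that data currently stored only in the transaction log (translog) is also permanently stored in the Lucene index, after which the translog can be trimmed. Elasticsearch flushes automatically, for example when the translog grows larger than index.translog.flush_threshold_size; flush does not run on a fixed 5s schedule. The 5s default belongs to index.translog.sync_interval, which controls how often the translog itself is fsynced when index.translog.durability is set to async.
A flush mainly performs the following steps:
1. A refresh writes all documents in the in-memory buffer to a new segment.
2. The in-memory buffer is cleared.
3. A commit point is written to disk.
4. The segments in the filesystem page cache are fsynced to disk.
5. The old translog files are deleted: the segments are now on disk, so the translog is no longer needed to keep that data safe.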
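A companion toy model (again plain Java, not Elasticsearch internals, hypothetical names) shows why the translog can be dropped after a flush: a simulated crash loses the buffer and any segments that were never fsynced, and recovery replays the translog on top of the committed segments:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Elasticsearch code: the translog covers data until a flush commits it.
class FlushModel {
    private final List<String> buffer = new ArrayList<>();            // in-memory indexing buffer
    private final List<String> translog = new ArrayList<>();          // durable operation log
    private final List<List<String>> uncommitted = new ArrayList<>(); // refreshed, not fsynced
    private final List<List<String>> committed = new ArrayList<>();   // fsynced, in the commit point

    void index(String doc) {
        buffer.add(doc);
        translog.add(doc); // every operation is also recorded in the translog
    }

    void refresh() {
        if (buffer.isEmpty()) return;
        uncommitted.add(new ArrayList<>(buffer)); // new segment, only in the page cache
        buffer.clear();
    }

    void flush() {
        refresh();                     // steps 1-2: buffer -> new segment, clear buffer
        committed.addAll(uncommitted); // steps 3-4: commit point written, segments fsynced
        uncommitted.clear();
        translog.clear();              // step 5: old translog no longer needed
    }

    // Simulated crash: buffer and un-fsynced segments are lost; the translog is replayed.
    List<String> docsAfterCrash() {
        List<String> docs = new ArrayList<>();
        for (List<String> segment : committed) docs.addAll(segment);
        docs.addAll(translog);
        return docs;
    }

    int translogSize() {
        return translog.size();
    }

    public static void main(String[] args) {
        FlushModel m = new FlushModel();
        m.index("a");
        m.refresh();
        System.out.println(m.docsAfterCrash()); // [a] (recovered from the translog)
        m.flush();
        System.out.println(m.translogSize());   // 0
        m.index("b");
        System.out.println(m.docsAfterCrash()); // [a, b]
    }
}
```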
POST /index/_forcemerge: Force a segment merge
Elasticsearch Guide [7.17] » REST APIs » Index APIs » Force merge API
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-forcemerge.html
POST /<target>/_forcemerge
Force-merges the specified indices.
POST /_forcemerge
Force-merges all indices.
POST /_reindex: Copy data
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-reindex.html
POST /_reindex
Copies documents from one index to another.
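Query parameters:
wait_for_completion
If wait_for_completion=false is set, the request is processed asynchronously: Elasticsearch validates the request and immediately returns a task, which can later be used to check the status of the task or to cancel it.
requests_per_second
Throttles the request by limiting the number of documents processed per second.
Body parameters:
source
index: name of the source index. Required.
dest
index: name of the destination index. Required.
pipeline: the ingest pipeline used to process the documents.
script
The script used to process the documents.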
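For the asynchronous mode, a sketch of the request follows. Assumptions: Java 11+, an unsecured node at localhost:9200, hypothetical names and a made-up task id; 500 documents per second is an arbitrary throttle. The task id returned in the response can then be passed to the tasks API (GET /_tasks/<task_id> to check progress, POST /_tasks/<task_id>/_cancel to cancel):

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds (does not send) an asynchronous, throttled _reindex request.
class AsyncReindexSketch {
    static final String BASE_URL = "http://localhost:9200"; // assumption: local node, no auth

    static String reindexBody(String sourceIndex, String destIndex) {
        return "{\"source\":{\"index\":\"" + sourceIndex + "\"},"
            + "\"dest\":{\"index\":\"" + destIndex + "\"}}";
    }

    static HttpRequest asyncReindex(String sourceIndex, String destIndex, int requestsPerSecond) {
        URI uri = URI.create(BASE_URL + "/_reindex?wait_for_completion=false&requests_per_second="
            + requestsPerSecond);
        return HttpRequest.newBuilder(uri)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(reindexBody(sourceIndex, destIndex)))
            .build();
    }

    // Paths for checking or cancelling the task whose id comes back in the response
    static String taskStatusPath(String taskId) { return "/_tasks/" + taskId; }
    static String taskCancelPath(String taskId) { return "/_tasks/" + taskId + "/_cancel"; }

    public static void main(String[] args) {
        HttpRequest req = asyncReindex("index1", "index2", 500);
        System.out.println(req.method() + " " + req.uri());
        System.out.println(taskStatusPath("node-id:123")); // hypothetical task id
    }
}
```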
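script: processing data with a script
index1 has two 1024-dimensional vector fields. The request below copies the data to index2 and truncates both vectors to 512 dimensions:
POST _reindex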
{
"source": {
"index": "index1"
},
"dest": {
"index": "index2"
},
"script": {
"source": "ctx._source.title_vector = ctx._source.title_vector.subList(0,512); ctx._source.content_vector = ctx._source.content_vector.subList(0,512)",
"lang": "painless"
}
}
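The Painless script works on ordinary java.util.List values here, so its effect on one document can be reproduced in plain Java. This is only a mirror for illustration, with hypothetical names, and it assumes each vector field holds at least 512 numbers (otherwise subList throws):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java mirror of the Painless script above; not how Elasticsearch executes it.
class VectorTruncation {
    static final int TARGET_DIMS = 512;

    // Same effect as: ctx._source.<field> = ctx._source.<field>.subList(0, 512)
    static void truncate(Map<String, Object> source, String field) {
        @SuppressWarnings("unchecked")
        List<Object> vector = (List<Object>) source.get(field);
        source.put(field, new ArrayList<>(vector.subList(0, TARGET_DIMS)));
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        for (String field : List.of("title_vector", "content_vector")) {
            List<Object> v = new ArrayList<>();
            for (int i = 0; i < 1024; i++) v.add((double) i); // dummy 1024-dim vector
            source.put(field, v);
        }
        truncate(source, "title_vector");
        truncate(source, "content_vector");
        System.out.println(((List<?>) source.get("title_vector")).size());   // 512
        System.out.println(((List<?>) source.get("content_vector")).size()); // 512
    }
}
```

If the vectors should remain dense_vector fields in index2, create index2 beforehand with a mapping that declares dims: 512 for both fields; otherwise dynamic mapping decides their type.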
pipeline: processing data with an ingest pipeline
Set pipeline inside dest to run every document through an existing ingest pipeline during the copy:
POST _reindex
{
"source": {
"index": "source"
},
"dest": {
"index": "dest",
"pipeline": "some_ingest_pipeline"
}
}
POST /index/_split/new_index: Split an index
Elasticsearch Guide [7.17] » REST APIs » Index APIs » Split index API
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/indices-split-index.html
POST /<index>/_split/<target-index>
PUT /<index>/_split/<target-index>
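Splits an existing index into a new index with more primary shards; each primary shard of the source index is split into several new shards in the target index.
For example, to split my_source_index into the new index my_target_index: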
POST /my_source_index/_split/my_target_index
{
"settings": {
"index.number_of_shards": 2
}
}
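The request returns as soon as the target index has been created; it does not wait for the split to finish.
Prerequisites for splitting an index
- The source index must be read-only, so that no new data is written during the operation
- The cluster health must be green
- The target index must not exist
- The source index must have fewer primary shards than the target index
- The number of primary shards in the target index must be a multiple of the number in the source index
- The node handling the split must have enough free disk space to hold a second copy of the source index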
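The checklist above can be expressed as a small pre-flight function. This is a hypothetical helper, not an Elasticsearch API: in practice the inputs would come from GET /<index>/_settings, GET /_cluster/health and a HEAD /<target-index> existence check, and the disk-space prerequisite is not modelled:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check for the split prerequisites; inputs are plain arguments.
class SplitPreflight {
    static List<String> problems(int sourceShards, int targetShards, boolean sourceReadOnly,
                                 String clusterHealth, boolean targetExists) {
        List<String> problems = new ArrayList<>();
        if (!sourceReadOnly) problems.add("source index is not read-only (set index.blocks.write)");
        if (!"green".equals(clusterHealth)) problems.add("cluster health is " + clusterHealth + ", not green");
        if (targetExists) problems.add("target index already exists");
        if (targetShards <= sourceShards) {
            problems.add("target must have more primary shards than the source");
        } else if (targetShards % sourceShards != 0) {
            problems.add("target primary shards must be a multiple of the source primary shards");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(problems(5, 10, true, "green", false)); // []
        System.out.println(problems(5, 12, false, "yellow", false));
    }
}
```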
Setting index.blocks.write to true makes the index read-only for data while still allowing metadata operations such as deleting the index:
PUT /my_source_index/_settings
{
"settings": {
"index.blocks.write": true
}
}
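Split factors
The factors by which an index can be split are determined by the static setting index.number_of_routing_shards.
For example, with 5 primary shards and number_of_routing_shards set to 30 (5 × 2 × 3), 30 is divisible by the factors 2 and 3, so the index can be split as follows:
5 -> 10 -> 30 (split by 2, then by 3)
5 -> 15 -> 30 (split by 3, then by 2)
5 -> 30 (split by 6)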
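The split paths above follow from a simple rule, sketched here with hypothetical names: one split step can go from the current primary count to any larger multiple of it that also divides number_of_routing_shards:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the rule above: valid one-step split targets for a given routing-shard count.
class SplitTargets {
    static List<Integer> oneStepTargets(int shards, int routingShards) {
        List<Integer> targets = new ArrayList<>();
        for (int t = shards * 2; t <= routingShards; t += shards) { // larger multiples of shards
            if (routingShards % t == 0) targets.add(t);               // that divide routingShards
        }
        return targets;
    }

    public static void main(String[] args) {
        System.out.println(oneStepTargets(5, 30));  // [10, 15, 30]
        System.out.println(oneStepTargets(10, 30)); // [30]
        System.out.println(oneStepTargets(15, 30)); // [30]
    }
}
```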
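index.number_of_routing_shards is a static setting: it must be set when the index is created, or changed on a closed index.
The default value of index.number_of_routing_shards depends on the number of primary shards and is chosen so that the index can be split by factors of 2 up to a maximum of 1024 shards. For example, an index with 5 primary shards can be split by a factor of 2, one or more times, into 10, 20, 40, 80, 160, 320 or 640 shards, so the default index.number_of_routing_shards is 640.
If the source index has only one primary shard (or a multi-shard index has been shrunk to a single primary shard), it can be split into any number of shards, and the default index.number_of_routing_shards is then recalculated for the resulting index.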
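A sketch of the default described above, with hypothetical names and not taken from Elasticsearch's source: keep doubling the primary count while the result stays within 1024. The assumption that at least one split by 2 is always allowed only matters for indices with more than 512 primaries, and the single-primary special case (split into any number) is not modelled:

```java
// Sketch of the default number_of_routing_shards rule; not Elasticsearch's implementation.
class DefaultRoutingShards {
    static int defaultRoutingShards(int primaries) {
        int routing = primaries * 2;   // assumption: always allow at least one split by 2
        while (routing * 2 <= 1024) {
            routing *= 2;              // one more possible split by 2 within the 1024 cap
        }
        return routing;
    }

    public static void main(String[] args) {
        System.out.println(defaultRoutingShards(5)); // 640
        System.out.println(defaultRoutingShards(1)); // 1024
    }
}
```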
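How a split works
1. A new target index is created with the same definition as the source index, but with more primary shards.
2. The segments of the source index are hard-linked into the target index. On Linux this only increments the inode link count, so it is fast. If the file system does not support hard links, all segments are copied instead, which can take much longer.
3. All documents are hashed again, and documents that belong to a different shard are deleted.
4. The target index is recovered as though it were a closed index that had just been reopened.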
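Step 2 is what makes a split cheap on most file systems. The sketch below illustrates only that idea with java.nio: it tries Files.createLink and falls back to Files.copy when the file system refuses, for example across devices. It is not Elasticsearch's code, and the segment file names are made up:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystemException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustration of "hard link, else copy"; not Elasticsearch's implementation.
class LinkOrCopy {
    static String linkOrCopy(Path existing, Path target) throws IOException {
        try {
            Files.createLink(target, existing); // new directory entry for the same inode, no data copied
            return "link";
        } catch (UnsupportedOperationException | FileSystemException e) {
            Files.copy(existing, target);       // fallback: every byte is copied
            return "copy";
        }
    }

    // Creates a throwaway "segment" file in a temp directory and links or copies it
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("split-demo");
            Path source = dir.resolve("_0.cfs");        // hypothetical segment file name
            Path target = dir.resolve("target_0.cfs");
            dir.toFile().deleteOnExit();                // deleteOnExit runs in reverse order,
            source.toFile().deleteOnExit();             // so the files are removed before the directory
            target.toFile().deleteOnExit();
            Files.writeString(source, "segment-bytes");
            return linkOrCopy(source, target) + ":" + Files.readString(target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // link:segment-bytes, or copy:segment-bytes without hard-link support
    }
}
```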
Why Elasticsearch does not support incremental resharding
Monitoring the split process
POST /index/_shrink/new_index: Shrink an index
Elasticsearch Guide [7.17] » REST APIs » Index APIs » Shrink index API
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/indices-shrink-index.html
POST /<index>/_shrink/<target-index>
PUT /<index>/_shrink/<target-index>
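Shrinks an existing index into a new index with fewer primary shards. The number of primary shards in the target index must be a factor of the number of primary shards in the source index.
For example, an index with 8 primary shards can be shrunk to 4, 2 or 1 primary shards. If the source index has a prime number of primary shards, it can only be shrunk to a single primary shard.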
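Finding the valid shrink targets amounts to listing the factors of the source primary count that are smaller than it; a sketch with hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: valid target primary-shard counts for a shrink.
class ShrinkTargets {
    static List<Integer> targets(int sourceShards) {
        List<Integer> out = new ArrayList<>();
        for (int t = sourceShards - 1; t >= 1; t--) {
            if (sourceShards % t == 0) out.add(t); // t must divide the source count evenly
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(targets(8)); // [4, 2, 1]
        System.out.println(targets(7)); // [1]  (prime: only a single shard)
    }
}
```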
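How a shrink works:
1. A new target index is created with the same definition as the source index, but with fewer primary shards.
2. The segments of the source index are hard-linked into the target index. If the file system does not support hard links, all segments are copied instead, which can take much longer. Segments must also be copied in full when multiple data paths (disks) are used, because hard links cannot span disks.
3. The target index is recovered as though it were a closed index that had just been reopened.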
POST /index/_cache/clear: Clear caches
Elasticsearch Guide [7.17] » REST APIs » Index APIs » Clear cache API
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/indices-clearcache.html
POST /<target>/_cache/clear
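Clears the caches of the specified indices.
POST /_cache/clear
Clears the caches of all indices.
By default all caches are cleared; a query parameter restricts the call to the query, request or fielddata cache: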
POST /my-index-000001/_cache/clear?fielddata=true // clear only the fielddata cache
POST /my-index-000001/_cache/clear?query=true // clear only the query cache
POST /my-index-000001/_cache/clear?request=true // clear only the request cache
GET /index/_stats: Get index statistics
Elasticsearch Guide [7.17] » REST APIs » Index APIs » Index stats API
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/indices-stats.html
GET /<target>/_stats/<index-metric>
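Returns the specified statistics for the specified indices.
GET /<target>/_stats
Returns all statistics for the specified indices.
GET /_stats
Returns all statistics for all indices.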
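For example, the store statistics of an index with 3 primary shards and 1 replica show about 9.83 GB in total (9,827,457,637 bytes), of which about 4.87 GB (4,865,706,422 bytes) is on the primary shards: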
GET /index1/_stats/store
{
"_shards": {
"total": 6,
"successful": 6,
"failed": 0
},
"_all": {
"primaries": {
"store": {
"size_in_bytes": 4865706422
}
},
"total": {
"store": {
"size_in_bytes": 9827457637
}
}
},
"indices": {
"index1": {
"uuid": "Ox5GfotcSjikFg08MHv-lQ",
"primaries": {
"store": {
"size_in_bytes": 4865706422
}
},
"total": {
"store": {
"size_in_bytes": 9827457637
}
}
}
}
}
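The byte counts in the response can be converted into readable units with a few lines of Java (hypothetical names; decimal GB and binary GiB are both shown because tools differ in which they report). The difference between the total and primary figures is the replica copy, which need not be exactly the same size as the primaries, most likely because each shard copy merges its segments independently:

```java
import java.util.Locale;

// Converts size_in_bytes values from the _stats response into readable units.
class StoreSize {
    static String gb(long bytes)  { return String.format(Locale.ROOT, "%.2f", bytes / 1e9); }          // 1 GB = 10^9 bytes
    static String gib(long bytes) { return String.format(Locale.ROOT, "%.2f", bytes / 1073741824.0); } // 1 GiB = 2^30 bytes

    public static void main(String[] args) {
        long primaries = 4865706422L; // _all.primaries.store.size_in_bytes
        long total = 9827457637L;     // _all.total.store.size_in_bytes
        long replicas = total - primaries;
        System.out.println("total " + gb(total) + " GB / " + gib(total) + " GiB"); // total 9.83 GB / 9.15 GiB
        System.out.println("primaries " + gb(primaries) + " GB");                   // primaries 4.87 GB
        System.out.println("replicas " + gb(replicas) + " GB");                     // replicas 4.96 GB
    }
}
```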