摘要:由于项目中最近用到了,并且用到的聚合功能,就深入研究了一下,中的聚合主要有四种和。
由于项目中最近用到了elasticsearch,并且用到elasticsearch的聚合(Aggregation)功能,就深入研究了一下,elasticsearch中的聚合主要有四种:Bucketing Aggregation、Metric Aggregation、Matrix Aggregation和Pipeline Aggregation。
聚合的基本结构"aggregations" : { "Metric Aggregation Avg Aggregation--计算平均值" : { --用户自己起的名字 " " : { --聚合类型,如avg, sum -- 针对的字段 } [,"meta" : { [ ] } ]? [,"aggregations" : { [ ]+ } ]? --聚合里面可以嵌套聚合 } [," " : { ... } ]* }
请求示例:
GET /endpoint_avg/_search { "size": 0, "aggs": { "avg_value": { "avg": {"field": "value"} } } }
返回结果:
{ "took" : 3, "timed_out" : false, "_shards" : { "total" : 2, "successful" : 2, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 315, "max_score" : 0.0, "hits" : [ ] }, "aggregations" : { "avg_value" : { "value" : 342.84761904761905 } } }
之前看到其他博客上有说search_type=count可以只返回aggregation部分的结果,但我在7.x版本中试了下,好像不行,这边只能通过将size设为0来隐藏掉除了统计数据以外的数据。
Cardinality Aggregation--去重(相当于mysql中的distinct)请求示例:
GET /endpoint_avg/_search { "size": 0, "aggs": { "avg_value": { "cardinality": {"field": "service_id"} } } }
返回结果:
{ "took" : 11, "timed_out" : false, "_shards" : { "total" : 2, "successful" : 2, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 317, "max_score" : 0.0, "hits" : [ ] }, "aggregations" : { "avg_value" : { "value" : 2 } } }Extended Status Aggragation--获取某个字段的所有统计信息(包括平均值,最大/小值....)
请求示例:
GET /endpoint_avg/_search { "size": 0, "aggs": { "avg_status": { "extended_stats": { "field": "value" } } } }
返回结果:
{ "took" : 2, "timed_out" : false, "_shards" : { "total" : 2, "successful" : 2, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 326, "max_score" : 0.0, "hits" : [ ] }, "aggregations" : { "avg_status" : { "count" : 326, // 数量 "min" : 2.0, // 最小值 "max" : 2481.0, // 最大值 "avg" : 347.63803680981596, // 均值 "sum" : 113330.0, // 和 "sum_of_squares" : 1.02303634E8, "variance" : 192962.62358387595, "std_deviation" : 439.275111500613, "std_deviation_bounds" : { "upper" : 1226.188259811042, "lower" : -530.91218619141 } } } }Max Aggregation--求最大值
请求示例:
GET /endpoint_avg/_search { "size": 0, "aggs": { "max_value": { "max": { "field": "value" } } } }
返回结果:
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 2, "successful" : 2, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 352, "max_score" : 0.0, "hits" : [ ] }, "aggregations" : { "max_value" : { "value" : 2481.0 } } }Min Aggreegation--计算最小值
请求示例:
GET /endpoint_avg/_search { "size": 0, "aggs": { "min_value": { "min": { "field": "value" } } } }
返回结果:
{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 2, "successful" : 2, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 352, "max_score" : 0.0, "hits" : [ ] }, "aggregations" : { "min_value" : { "value" : 2.0 } } }Percentiles Aggregation -- 百分比统计,按照[ 1, 5, 25, 50, 75, 95, 99 ]来统计
请求示例:
GET /endpoint_avg/_search { "size": 0, "aggs": { "value_outlier": { "percentiles": { "field": "value" } } } }
返回结果:
{ "took" : 44, "timed_out" : false, "_shards" : { "total" : 2, "successful" : 2, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 334, "max_score" : 0.0, "hits" : [ ] }, "aggregations" : { "value_outlier" : { "values" : { "1.0" : 4.0, "5.0" : 67.2, "25.0" : 91.33333333333333, "50.0" : 151.0, "75.0" : 420.0, "95.0" : 1412.4000000000005, "99.0" : 1906.32 } } } }
从返回结果可以看出来,75%的数据在420ms加载完毕。
当然我们也可以指定自己需要统计的百分比:
GET /endpoint_avg/_search { "size": 0, "aggs": { "value_outlier": { "percentiles": { "field": "value", "percents": [95, 96, 99, 99.5] } } } }
返回结果:
{ "took" : 20, "timed_out" : false, "_shards" : { "total" : 2, "successful" : 2, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 330, "max_score" : 0.0, "hits" : [ ] }, "aggregations" : { "value_outlier" : { "values" : { "95.0" : 1366.0, "96.0" : 1449.8000000000002, "99.0" : 1906.3999999999999, "99.5" : 2064.400000000004 } } } }Percentile Ranks Aggregation -- 统计返回内数据的百分比
GET /endpoint_avg/_search { "size": 0, "aggs": { "value_range": { "percentile_ranks": { "field": "value", "values": [100, 200] } } } }
返回结果:
{ "took" : 10, "timed_out" : false, "_shards" : { "total" : 2, "successful" : 2, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 346, "max_score" : 0.0, "hits" : [ ] }, "aggregations" : { "value_range" : { "values" : { "100.0" : 32.51445086705203, "200.0" : 65.19405450041288 } } } }
从返回结果可以看出,在100ms左右加载完毕的占了32%, 200ms左右加载完毕的占了65%
Status Aggregation -- 状态统计请求示例:
GET /endpoint_avg/_search { "size": 0, "aggs": { "value_status": { "stats": { "field": "value" } } } }
返回结果:
{ "took" : 3, "timed_out" : false, "_shards" : { "total" : 2, "successful" : 2, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 355, "max_score" : 0.0, "hits" : [ ] }, "aggregations" : { "value_status" : { "count" : 355, "min" : 2.0, "max" : 2753.0, "avg" : 339.8112676056338, "sum" : 120633.0 } } }
可以发现跟之前的extended stats aggregation返回数据类似,只是少了一些较复杂的标准差之类的数据。
Sum Aggregation -- 求和函数请求示例:
GET /endpoint_avg/_search { "size": 0, "query": {"term": { "service_id": { "value": 5 } }}, "aggs": { "sum_value": { "sum": { "field": "value" } } } }
返回结果:
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 2, "successful" : 2, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 194, "max_score" : 0.0, "hits" : [ ] }, "aggregations" : { "sum_value" : { "value" : 91322.0 } } }Top Hits Aggregation -- 获取前n条数据, 可以嵌套使用
请求示例:
GET /endpoint_avg/_search { "size": 0, "aggs": { "top_tags": { "terms": { "field": "service_id", "size": 2 }, "aggs": { "top_value": { "top_hits": { "size": 3, "sort": [{ "time_bucket": {"order": "desc"} }] } } } } } }
返回结果:
{ "took" : 2, "timed_out" : false, "_shards" : { "total" : 2, "successful" : 2, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 372, "max_score" : 0.0, "hits" : [ ] }, "aggregations" : { "top_tags" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : 5, "doc_count" : 198, "top_value" : { "hits" : { "total" : 198, "max_score" : null, "hits" : [ { "_index" : "endpoint_avg", "_type" : "type", "_id" : "201906191621_25", "_score" : null, "_source" : { "service_id" : 5, "count" : 2, "time_bucket" : 201906191621, "service_instance_id" : 250, "entity_id" : "25", "value" : 149, "summation" : 299 }, "sort" : [ 201906191621 ] }, { "_index" : "endpoint_avg", "_type" : "type", "_id" : "201906191620_24", "_score" : null, "_source" : { "service_id" : 5, "count" : 1, "time_bucket" : 201906191620, "service_instance_id" : 250, "entity_id" : "24", "value" : 93, "summation" : 93 }, "sort" : [ 201906191620 ] }, { "_index" : "endpoint_avg", "_type" : "type", "_id" : "201906191620_37", "_score" : null, "_source" : { "service_id" : 5, "count" : 1, "time_bucket" : 201906191620, "service_instance_id" : 250, "entity_id" : "37", "value" : 122, "summation" : 122 }, "sort" : [ 201906191620 ] } ] } } }, { "key" : 3, "doc_count" : 174, "top_value" : { "hits" : { "total" : 174, "max_score" : null, "hits" : [ { "_index" : "endpoint_avg", "_type" : "type", "_id" : "201906191621_144", "_score" : null, "_source" : { "service_id" : 3, "count" : 1, "time_bucket" : 201906191621, "service_instance_id" : 238, "entity_id" : "144", "value" : 93, "summation" : 93 }, "sort" : [ 201906191621 ] }, { "_index" : "endpoint_avg", "_type" : "type", "_id" : "201906191620_70", "_score" : null, "_source" : { "service_id" : 3, "count" : 1, "time_bucket" : 201906191620, "service_instance_id" : 238, "entity_id" : "70", "value" : 192, "summation" : 192 }, "sort" : [ 201906191620 ] }, { "_index" : "endpoint_avg", "_type" : "type", "_id" : "201906191620_18", "_score" : null, "_source" : { "service_id" : 3, "count" : 2, "time_bucket" : 201906191620, "service_instance_id" : 238, "entity_id" : "18", "value" : 81, "summation" : 162 }, "sort" : [ 201906191620 ] } ] } } } ] } } }Value Count Aggregation--统计不同值的数量
请求示例:
GET /endpoint_avg/_search { "size": 2, "aggs": { "value_count": { "value_count": { "field": "value" } } } }
返回结果:
{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 2, "successful" : 2, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 357, "max_score" : 1.0, "hits" : [ { "_index" : "endpoint_avg", "_type" : "type", "_id" : "201906191457_16", "_score" : 1.0, "_source" : { "service_id" : 3, "count" : 1, "time_bucket" : 201906191457, "service_instance_id" : 238, "entity_id" : "16", "value" : 129, "summation" : 129 } }, { "_index" : "endpoint_avg", "_type" : "type", "_id" : "201906191503_691", "_score" : 1.0, "_source" : { "service_id" : 5, "count" : 2, "time_bucket" : 201906191503, "service_instance_id" : 250, "entity_id" : "691", "value" : 178, "summation" : 357 } } ] }, "aggregations" : { "value_count" : { "value" : 357 } } }基本metrics中常用的聚合函数就这几种,今天太累了,其他三类的聚合后续再做研究吧!
文章版权归作者所有,未经允许请勿转载,若此文章存在违规行为,您可以联系管理员删除。
转载请注明本文地址:https://www.ucloud.cn/yun/34505.html
摘要:为了防止用户覆盖中指定的索引,将此设置添加到文件中默认值为,但当设置为时,将拒绝在请求体中指定的显式索引的请求。所有都是单索引,参数接受单个索引名,或指向单个索引的别名。查询子句的行为取决于它们是用于查询上下文还是过滤器上下文。 Elasticsearch 参考指南 Elasticsearch是一个高度可扩展的开源全文搜索和分析引擎,它允许你快速,近实时地存储,搜索和分析大量数据,它通...
摘要:聚合介绍聚合框架帮助提供基于搜索查询的聚合数据,它基于称为聚合的简单构建块,可以进行组合以构建复杂的数据摘要。指标对一组文档进行跟踪和计算指标的聚合。管道聚合其他聚合的输出及其相关指标的聚合。 聚合介绍 聚合框架帮助提供基于搜索查询的聚合数据,它基于称为聚合的简单构建块,可以进行组合以构建复杂的数据摘要。 聚合可以看作是在一组文档上构建分析信息的工作单元,执行的上下文定义了这个文档集是...
摘要:例如,获取西二旗每个小区最便宜的房源信息其中,为组内返回的文档个数,表示组内文档的排序规则,指定组内文档返回的字段。 首发于 樊浩柏科学院 文章 Elasticsearch检索实战 已经讲述了 Elasticsearch 基本检索使用,已满足大部分检索场景,但是某些特定项目中会使用到 聚合 和 LBS 这类高级检索,以满足检索需求。这里将讲述 Elasticsearch 的聚合和 L...
摘要:我们会得到查询语句中根据哪个字段进行聚合,另外每个字段基数会由其他服务进行统计,例如根据字段进行聚合,由于基数过大。如果是针对大基数字段进行聚合查询预估消耗内存较大时,就会把这种查询熔断。 showImg(https://segmentfault.com/img/bVblkvK?w=640&h=360); 12月15日,即便天气寒冷,飘着雨。跨星空间座无虚席由袋鼠云、阿里云、elast...
摘要:你运行价格警报平台,允许精通价格的客户指定一条规则,例如我有兴趣购买特定的电子产品,如果小工具的价格在下个月内从任何供应商降至美元以下,我希望收到通知。 介绍 Elasticsearch是一个高度可扩展的开源全文搜索和分析引擎,它允许你快速,近实时地存储,搜索和分析大量数据,它通常用作底层引擎/技术,为具有复杂搜索功能和要求的应用程序提供支持。 以下是Elasticsearch可用于的...
阅读 1536·2021-11-12 10:34
阅读 2397·2021-10-11 10:57
阅读 2169·2021-08-27 16:20
阅读 1188·2019-08-30 13:03
阅读 1404·2019-08-30 12:50
阅读 3195·2019-08-29 14:16
阅读 1422·2019-08-29 11:12
阅读 1483·2019-08-28 17:53