
Data Aggregation in Elasticsearch

Juven / 620 reads


Our project recently started using Elasticsearch, including its aggregation feature, so I took a closer look at it. Elasticsearch aggregations fall into four main families: Bucketing Aggregation, Metric Aggregation, Matrix Aggregation and Pipeline Aggregation.

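Basic structure of an aggregation
"aggregations" : {
    "<aggregation_name>" : {          -- a name you choose
        "<aggregation_type>" : {      -- the aggregation type, e.g. avg, sum
            <aggregation_body>        -- e.g. the field to aggregate on
        }
        [,"meta" : {  [<meta_data_body>] } ]?
        [,"aggregations" : { [<sub_aggregation>]+ } ]?    -- aggregations can be nested
    }
    [,"<aggregation_name_2>" : { ... } ]*
}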
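To make the nesting concrete, here is a minimal Python sketch that assembles a request body of this shape as a plain dict and serializes it to JSON. The helper build_agg and the aggregation names are hypothetical; the sketch only builds the body you would send to _search, it does not talk to a cluster.

```python
import json


def build_agg(agg_type, body, sub_aggs=None, meta=None):
    """Build one named-aggregation entry: {agg_type: body, [meta], [aggs]}.

    Hypothetical helper for illustration; it only builds a dict.
    """
    agg = {agg_type: body}
    if meta is not None:
        agg["meta"] = meta
    if sub_aggs:
        # Sub-aggregations go under "aggs" (short form of "aggregations").
        agg["aggs"] = sub_aggs
    return agg


# One terms bucket per service_id, with an avg metric nested inside each bucket.
request_body = {
    "size": 0,  # return only aggregation results, no hits
    "aggs": {
        "per_service": build_agg(
            "terms",
            {"field": "service_id"},
            sub_aggs={"avg_value": build_agg("avg", {"field": "value"})},
        )
    },
}

print(json.dumps(request_body, indent=2))
```

Keeping each aggregation as a small dict makes it easy to nest one inside another, which mirrors the optional "aggregations" entry in the structure above.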
Metric Aggregation

Avg Aggregation -- computes the average of a numeric field

Example request:

GET /endpoint_avg/_search
{
  "size": 0,
  "aggs": {
    "avg_value": {
      "avg": {"field": "value"}
    }
  }
}

Response:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 315,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "avg_value" : {
      "value" : 342.84761904761905
    }
  }
}

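Some other blog posts say that search_type=count returns only the aggregation results, but that did not work when I tried it on 7.x: the count search type was deprecated back in 2.0 and has since been removed. Setting size to 0, as above, is the way to suppress everything except the statistics.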

Cardinality Aggregation -- counts distinct values (similar to COUNT(DISTINCT ...) in MySQL; note that the count is approximate)

Example request:

GET /endpoint_avg/_search
{
  "size": 0,
  "aggs": {
    "avg_value": {
      "cardinality": {"field": "service_id"}
    }
  }
}

Response:

{
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 317,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "avg_value" : {
      "value" : 2
    }
  }
}
Extended Stats Aggregation -- returns a full set of statistics for a field (average, min/max, variance, standard deviation, ...)

Example request:

GET /endpoint_avg/_search
{
  "size": 0,
  "aggs": {
    "avg_status": {
      "extended_stats": {
        "field": "value"
      }
    }
  }
}

Response:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 326,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "avg_status" : {
      "count" : 326,       // number of values
      "min" : 2.0,         // minimum
      "max" : 2481.0,      // maximum
      "avg" : 347.63803680981596,    // average
      "sum" : 113330.0,              // sum
      "sum_of_squares" : 1.02303634E8,
      "variance" : 192962.62358387595,
      "std_deviation" : 439.275111500613,
      "std_deviation_bounds" : {
        "upper" : 1226.188259811042,
        "lower" : -530.91218619141
      }
    }
  }
}
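The extra fields can all be derived from count, sum and sum_of_squares. The Python sketch below assumes, as the numbers in the response above suggest, that variance is the population variance and that std_deviation_bounds is avg plus/minus 2 standard deviations (2 being the default sigma). It reproduces the figures for illustration; it is not Elasticsearch's own implementation.

```python
import math


def stats_from_moments(count, total, sum_of_squares, sigma=2.0):
    """Derive extended-stats style figures from count, sum and sum of squares."""
    avg = total / count
    # Population variance: E[x^2] - (E[x])^2
    variance = sum_of_squares / count - avg * avg
    std = math.sqrt(variance)
    return {
        "count": count,
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": std,
        "std_deviation_bounds": {"upper": avg + sigma * std, "lower": avg - sigma * std},
    }


def extended_stats(values, sigma=2.0):
    """Compute the same figures directly from raw values, plus min and max."""
    result = stats_from_moments(
        len(values), float(sum(values)), float(sum(v * v for v in values)), sigma
    )
    result["min"] = float(min(values))
    result["max"] = float(max(values))
    return result


# Plug in count, sum and sum_of_squares taken from the response above.
s = stats_from_moments(326, 113330.0, 1.02303634e8)
print(s["variance"], s["std_deviation"], s["std_deviation_bounds"])
```

Run against the response above, this recovers variance, std_deviation and both bounds to within floating-point rounding, which is a quick way to check how the fields relate to each other.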
Max Aggregation -- returns the maximum value

Example request:

GET /endpoint_avg/_search
{
  "size": 0,
  "aggs": {
    "max_value": {
      "max": {
        "field": "value"
      }
    }
  }
}

Response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 352,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "max_value" : {
      "value" : 2481.0
    }
  }
}

Min Aggregation -- returns the minimum value

Example request:

GET /endpoint_avg/_search
{
  "size": 0,
  "aggs": {
    "min_value": {
      "min": {
        "field": "value"
      }
    }
  }
}

Response:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 352,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "min_value" : {
      "value" : 2.0
    }
  }
}

Percentiles Aggregation -- computes percentiles; by default it reports the 1st, 5th, 25th, 50th, 75th, 95th and 99th percentiles

Example request:

GET /endpoint_avg/_search
{
  "size": 0,
  "aggs": {
    "value_outlier": {
      "percentiles": {
        "field": "value"
      }
    }
  }
}

Response:

{
  "took" : 44,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 334,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "value_outlier" : {
      "values" : {
        "1.0" : 4.0,
        "5.0" : 67.2,
        "25.0" : 91.33333333333333,
        "50.0" : 151.0,
        "75.0" : 420.0,
        "95.0" : 1412.4000000000005,
        "99.0" : 1906.32
      }
    }
  }
}

The response shows that 75% of the values are at or below 420, i.e. 75% of the requests finished loading within 420 ms.

We can also choose exactly which percentiles to compute with the percents parameter:

GET /endpoint_avg/_search
{
  "size": 0,
  "aggs": {
    "value_outlier": {
      "percentiles": {
        "field": "value",
        "percents": [95, 96, 99, 99.5]
      }
    }
  }
}

Response:

{
  "took" : 20,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 330,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "value_outlier" : {
      "values" : {
        "95.0" : 1366.0,
        "96.0" : 1449.8000000000002,
        "99.0" : 1906.3999999999999,
        "99.5" : 2064.400000000004
      }
    }
  }
}
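Elasticsearch computes percentiles approximately (with the TDigest algorithm by default), so its numbers can differ slightly from an exact calculation. For reference, this Python sketch computes exact percentiles by linear interpolation between neighbouring ranks, the same convention NumPy uses by default. It is a stand-in to show what a percentile means, not the algorithm Elasticsearch runs, and the sample latencies are made up.

```python
import math


def percentile(values, p):
    """Exact p-th percentile (0-100) using linear interpolation between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of empty data")
    rank = p / 100 * (len(ordered) - 1)  # fractional 0-based position
    lo = math.floor(rank)
    hi = math.ceil(rank)
    frac = rank - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


def percentiles(values, percents=(1, 5, 25, 50, 75, 95, 99)):
    """Mimic the response layout: {"<p>": value} for each requested percent."""
    return {str(float(p)): percentile(values, p) for p in percents}


latencies = [93, 122, 149, 81, 192, 420, 1412]  # made-up sample, in ms
print(percentiles(latencies, percents=[50, 75]))
```

The default percents tuple matches the seven percentiles Elasticsearch reports when percents is omitted, and passing a custom list corresponds to the second request above.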
Percentile Ranks Aggregation -- for each given value, returns the percentage of values at or below it

Example request:

GET /endpoint_avg/_search
{
  "size": 0,
  "aggs": {
    "value_range": {
      "percentile_ranks": {
        "field": "value",
        "values": [100, 200]
      }
    }
  }
}

Response:

{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 346,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "value_range" : {
      "values" : {
        "100.0" : 32.51445086705203,
        "200.0" : 65.19405450041288
      }
    }
  }
}

The response shows that roughly 32.5% of the requests finished within 100 ms and roughly 65% within 200 ms.
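Percentile ranks are the inverse of percentiles: for each given value they report the share of values at or below it. A minimal exact version in Python is sketched below on made-up data. Elasticsearch estimates these ranks with TDigest as well, so its output (such as 32.51 for 100 above) is an approximation and need not match an exact count.

```python
from bisect import bisect_right


def percentile_ranks(values, targets):
    """Percentage of values <= each target, keyed like the ES response."""
    ordered = sorted(values)
    n = len(ordered)
    # bisect_right returns how many sorted values are <= target
    return {str(float(t)): 100.0 * bisect_right(ordered, t) / n for t in targets}


latencies = [50, 80, 100, 150, 200, 250, 400, 900]  # made-up sample, in ms
print(percentile_ranks(latencies, [100, 200]))
```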

Stats Aggregation -- basic statistics (count, min, max, avg, sum)

Example request:

GET /endpoint_avg/_search
{
  "size": 0,
  "aggs": {
    "value_status": {
      "stats": {
        "field": "value"
      }
    }
  }
}

Response:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 355,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "value_status" : {
      "count" : 355,
      "min" : 2.0,
      "max" : 2753.0,
      "avg" : 339.8112676056338,
      "sum" : 120633.0
    }
  }
}

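The response resembles that of the extended stats aggregation above, minus the more advanced figures such as the sum of squares, variance and standard deviation.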

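Sum Aggregation -- sums a numeric field

This example first narrows the documents with a term query on service_id = 5, so the sum covers only the matching documents (hits.total is 194 here).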

Example request:

GET /endpoint_avg/_search
{
  "size": 0,
  "query": {"term": {
    "service_id": {
      "value": 5
    }
  }}, 
  "aggs": {
    "sum_value": {
      "sum": {
        "field": "value"
      }
    }
  }
}

Response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 194,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "sum_value" : {
      "value" : 91322.0
    }
  }
}
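The query runs first and the aggregation only sees the documents it matched. In plain Python terms, the request above behaves roughly like the filter-then-sum sketch below. The helper sum_where and the sample documents are hypothetical, for illustration only.

```python
def sum_where(docs, field, filter_field, filter_value):
    """Sum `field` over docs whose `filter_field` equals `filter_value`.

    Roughly a term query followed by a sum aggregation; docs without
    `field` are skipped.
    """
    return float(
        sum(d[field] for d in docs if d.get(filter_field) == filter_value and field in d)
    )


docs = [
    {"service_id": 5, "value": 149},
    {"service_id": 5, "value": 93},
    {"service_id": 3, "value": 192},
    {"service_id": 5},  # no "value" field: contributes nothing to the sum
]
print(sum_where(docs, "value", "service_id", 5))
```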
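Top Hits Aggregation -- returns the top n documents; it is usually nested inside a bucket aggregation

Below, a terms aggregation buckets documents by service_id (keeping 2 buckets), and the nested top_hits returns the 3 documents with the latest time_bucket in each bucket.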

Example request:

GET /endpoint_avg/_search
{
  "size": 0,
  "aggs": {
    "top_tags": {
      "terms": {
        "field": "service_id",
        "size": 2
      },
      "aggs": {
        "top_value": {
          "top_hits": {
            "size": 3,
            "sort": [{
              "time_bucket": {"order": "desc"}
            }]
          }
        }
      }
    }
  }
}

Response:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 372,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "top_tags" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 5,
          "doc_count" : 198,
          "top_value" : {
            "hits" : {
              "total" : 198,
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "endpoint_avg",
                  "_type" : "type",
                  "_id" : "201906191621_25",
                  "_score" : null,
                  "_source" : {
                    "service_id" : 5,
                    "count" : 2,
                    "time_bucket" : 201906191621,
                    "service_instance_id" : 250,
                    "entity_id" : "25",
                    "value" : 149,
                    "summation" : 299
                  },
                  "sort" : [
                    201906191621
                  ]
                },
                {
                  "_index" : "endpoint_avg",
                  "_type" : "type",
                  "_id" : "201906191620_24",
                  "_score" : null,
                  "_source" : {
                    "service_id" : 5,
                    "count" : 1,
                    "time_bucket" : 201906191620,
                    "service_instance_id" : 250,
                    "entity_id" : "24",
                    "value" : 93,
                    "summation" : 93
                  },
                  "sort" : [
                    201906191620
                  ]
                },
                {
                  "_index" : "endpoint_avg",
                  "_type" : "type",
                  "_id" : "201906191620_37",
                  "_score" : null,
                  "_source" : {
                    "service_id" : 5,
                    "count" : 1,
                    "time_bucket" : 201906191620,
                    "service_instance_id" : 250,
                    "entity_id" : "37",
                    "value" : 122,
                    "summation" : 122
                  },
                  "sort" : [
                    201906191620
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : 3,
          "doc_count" : 174,
          "top_value" : {
            "hits" : {
              "total" : 174,
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "endpoint_avg",
                  "_type" : "type",
                  "_id" : "201906191621_144",
                  "_score" : null,
                  "_source" : {
                    "service_id" : 3,
                    "count" : 1,
                    "time_bucket" : 201906191621,
                    "service_instance_id" : 238,
                    "entity_id" : "144",
                    "value" : 93,
                    "summation" : 93
                  },
                  "sort" : [
                    201906191621
                  ]
                },
                {
                  "_index" : "endpoint_avg",
                  "_type" : "type",
                  "_id" : "201906191620_70",
                  "_score" : null,
                  "_source" : {
                    "service_id" : 3,
                    "count" : 1,
                    "time_bucket" : 201906191620,
                    "service_instance_id" : 238,
                    "entity_id" : "70",
                    "value" : 192,
                    "summation" : 192
                  },
                  "sort" : [
                    201906191620
                  ]
                },
                {
                  "_index" : "endpoint_avg",
                  "_type" : "type",
                  "_id" : "201906191620_18",
                  "_score" : null,
                  "_source" : {
                    "service_id" : 3,
                    "count" : 2,
                    "time_bucket" : 201906191620,
                    "service_instance_id" : 238,
                    "entity_id" : "18",
                    "value" : 81,
                    "summation" : 162
                  },
                  "sort" : [
                    201906191620
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}
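Conceptually, the request groups documents by service_id, keeps the 2 largest buckets by document count (the terms aggregation orders by doc count, descending, by default), and returns the 3 documents with the highest time_bucket inside each bucket. A minimal in-memory Python sketch of that logic follows; the function name and sample documents are hypothetical, and sharding effects on bucket counts are ignored.

```python
from collections import defaultdict


def terms_with_top_hits(docs, group_field, bucket_size, hits_size, sort_field):
    """Group docs by `group_field`; return the largest buckets with their newest docs."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[group_field]].append(doc)
    # Order by doc count descending; ascending key breaks ties.
    ordered = sorted(buckets.items(), key=lambda kv: (-len(kv[1]), kv[0]))[:bucket_size]
    result = []
    for key, group in ordered:
        # Like top_hits sorted by sort_field desc, limited to hits_size docs.
        top = sorted(group, key=lambda d: d[sort_field], reverse=True)[:hits_size]
        result.append({"key": key, "doc_count": len(group), "top_value": top})
    return result


docs = [
    {"service_id": 5, "time_bucket": 201906191621, "value": 149},
    {"service_id": 5, "time_bucket": 201906191620, "value": 93},
    {"service_id": 5, "time_bucket": 201906191619, "value": 122},
    {"service_id": 5, "time_bucket": 201906191618, "value": 80},
    {"service_id": 3, "time_bucket": 201906191621, "value": 93},
    {"service_id": 3, "time_bucket": 201906191620, "value": 192},
    {"service_id": 7, "time_bucket": 201906191621, "value": 10},
]
for bucket in terms_with_top_hits(docs, "service_id", 2, 3, "time_bucket"):
    print(bucket["key"], bucket["doc_count"], [d["value"] for d in bucket["top_value"]])
```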
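Value Count Aggregation -- counts how many values the field has across the matched documents (duplicates included; counting distinct values is what cardinality does)

This example also sets size to 2, so two hits are returned alongside the aggregation result.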

Example request:

GET /endpoint_avg/_search
{
  "size": 2, 
  "aggs": {
    "value_count": {
      "value_count": {
        "field": "value"
      }
    }
  }
}

Response:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 357,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "endpoint_avg",
        "_type" : "type",
        "_id" : "201906191457_16",
        "_score" : 1.0,
        "_source" : {
          "service_id" : 3,
          "count" : 1,
          "time_bucket" : 201906191457,
          "service_instance_id" : 238,
          "entity_id" : "16",
          "value" : 129,
          "summation" : 129
        }
      },
      {
        "_index" : "endpoint_avg",
        "_type" : "type",
        "_id" : "201906191503_691",
        "_score" : 1.0,
        "_source" : {
          "service_id" : 5,
          "count" : 2,
          "time_bucket" : 201906191503,
          "service_instance_id" : 250,
          "entity_id" : "691",
          "value" : 178,
          "summation" : 357
        }
      }
    ]
  },
  "aggregations" : {
    "value_count" : {
      "value" : 357
    }
  }
}
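The difference between value_count and cardinality is easy to miss: value_count counts every value of the field in the matched documents, duplicates included, while cardinality counts distinct values (and in Elasticsearch only approximately). An exact Python sketch of both, on a made-up flat list of field values where None stands for a document without the field:

```python
def value_count(values):
    """Number of non-missing values, duplicates included (like value_count)."""
    return sum(1 for v in values if v is not None)


def distinct_count(values):
    """Exact number of distinct non-missing values (what cardinality estimates)."""
    return len({v for v in values if v is not None})


service_ids = [5, 5, 3, 5, 3, None]  # None: a document without the field
print(value_count(service_ids), distinct_count(service_ids))
```

This matches the responses above: value_count on value returned 357 (one per document), whereas cardinality on service_id returned 2.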
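Those are the commonly used metric aggregations. I'm worn out for today, so I'll dig into the other three families (bucket, matrix and pipeline) later!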

Copyright belongs to the author; do not repost without permission.

When reposting, please cite the original address: https://www.ucloud.cn/yun/34505.html
