VastbaseG100

基于openGauss内核开发的企业级关系型数据库。

Menu

开启自适应排序

本节主要介绍 Vastbase G100 的自适应排序功能。

功能描述

自适应排序功能能够将不同数据类型的数据进行解析,并根据实际情况选择合适的排序算法。

对于不支持的情况与数据类型,Vastbase G100 会切换为默认排序方式。

自适应排序功能分为三个步骤:

  • Put 阶段:对元组进行属性提取(Deform),并进行标准化(Normalize),然后将标准化后的数据装填到排序内存中。

    Vastbase G100 的自适应排序功能会解析所有元组而非仅解析排序行的元组。

  • Sort 阶段:根据具体情况选择排序算法。

    Vastbase G100 支持的排序算法包括: PDQSort、InsertSort、LsdRadixSort 与 MsdRadixSort。

    • PDQSort:Pattern-Defeating Quicksort,一种基于 QuickSort 的综合排序算法。

    • InsertSort:插入排序,将一个数据插入到已经排好序的有序数据的算法。仅在数据量较小,或一些特殊情况下会使用 InsertSort 排序算法。

    • LsdRadixSort:从低位开始的 RadixSort,需要遍历位宽后才能比较出结果。

    • MsdRadixSort:从高位开始的 RadixSort。

  • Get 阶段:从排序内存中根据顺序获取元组。

注意事项

  • 仅Vastbase G100 V2.2 Build 10(Patch No.17)及以后版本支持此功能。

  • 不支持磁盘排序。仅支持内存中排序,排序使用的内存超出 work_mem 限制后则会产生报错。

  • 不支持在大端序(Big-Endian)物理机上开启自适应排序。

  • 不支持上层算子为 RECURSIVE UNION 的排序操作。

  • 不支持 work_mem 小于 512 KB 时开启自适应排序。

  • 不支持 TopN 操作。

  • 在进行多路归并时,可能出现由于不支持归并排序算法,导致性能退化的情况。

  • 目前支持进行自适应排序的数据类型为:

    • 数值类型:INT1、INT2、INT4、INT8、FLOAT4、FLOAT8、NUMERIC

    • 文本类型:TEXT、BPCHAR、VARCHAR

    • 日期和时间类型:DATE、TIMESTAMP

  • 其中 TEXT、VARCHAR 不支持自定义 collate。

  • 其中 NUMERIC 类型不支持精度超过 NUMERIC(38,19)。

操作步骤

1、开启自适应排序,并设置合适的 work_mem

SET new_sort = adaptive_sort;
SET work_mem = '1024MB';

2、(可选)也可以修改 new_sort 强制指定排序算法。

--选择pdq算法 
SET new_sort = pdqsort;
--选择radix算法,并设置排序位宽阈值为 10
SET new_sort = radix;
SET new_sort_msdradix_size_threshold = 10;

示例

前置步骤 开启自适应排序,并设置合适的 work_mem

SET new_sort=adaptive_sort;
SET work_mem='1024MB';

示例 1 查询 LIST 二级分区表排序。

1、创建分区表,并创建索引。

CREATE TABLE par1_1220017(id INT,a1 TEXT,a2 DATE,a3 VARCHAR(30))
PARTITION BY LIST (id)
SUBPARTITION BY HASH(a1)
(
PARTITION p1 VALUES(100)(
SUBPARTITION p1_1,
SUBPARTITION p1_2,
SUBPARTITION p1_3),
PARTITION p2 VALUES(200)(
SUBPARTITION p2_1,
SUBPARTITION p2_2,
SUBPARTITION p2_3),
PARTITION p3 VALUES(300)(
SUBPARTITION p3_1,
SUBPARTITION p3_2,
SUBPARTITION p3_3),
PARTITION p4 VALUES(null)(
SUBPARTITION p4_1,
SUBPARTITION p4_2,
SUBPARTITION p4_3));

CREATE TABLE par2_1220017(id INT,a1 TEXT,a2 DATE,a3 VARCHAR(30))
PARTITION BY LIST (a2)
SUBPARTITION BY LIST(a1)
(
PARTITION p1 VALUES('2022-3-12')(
SUBPARTITION p1_1 VALUES('d'),
SUBPARTITION p1_2 VALUES('k'),
SUBPARTITION p1_3 VALUES('q')),
PARTITION p2 VALUES('2022-6-12')(
SUBPARTITION p2_1 VALUES('d'),
SUBPARTITION p2_2 VALUES('k'),
SUBPARTITION p2_3 VALUES('q')),
PARTITION p3 VALUES('2022-9-12')(
SUBPARTITION p3_1 VALUES('d'),
SUBPARTITION p3_2 VALUES('k'),
SUBPARTITION p3_3 VALUES('q')),
PARTITION p4 VALUES(null)(
SUBPARTITION p4_1 VALUES('d'),
SUBPARTITION p4_2 VALUES('k'),
SUBPARTITION p4_3 VALUES('q')));

--index;
CREATE INDEX ON par1_1220017(id)local;
CREATE INDEX ON par2_1220017(a2)local;

2、插入数据。

INSERT INTO par1_1220017 VALUES(100,'c',generate_series(DATE '2022-01-01', DATE '2022-4-10', '1 day'),chr(65 + (generate_series(1,100)-1)%25));
INSERT INTO par1_1220017 VALUES(200,'n',generate_series(DATE '2022-01-01', DATE '2022-4-10', '1 day'),chr(65 + (generate_series(1,100)-1)%25));
INSERT INTO par1_1220017 VALUES(null,'z',generate_series(DATE '2022-01-01', DATE '2022-4-10', '1 day'),chr(65 + (generate_series(1,100)-1)%25));
INSERT INTO par1_1220017 SELECT * FROM par1_1220017;
INSERT INTO par1_1220017 SELECT * FROM par1_1220017;

INSERT INTO par2_1220017 VALUES(generate_series(1,100),'d','2022-3-12',chr(65 + (generate_series(1,100)-1)%25));
INSERT INTO par2_1220017 VALUES(generate_series(101,200),'k','2022-6-12',chr(65 + (generate_series(1,100)-1)%25));
INSERT INTO par2_1220017 VALUES(generate_series(101,200),'k','2022-9-12',chr(65 + (generate_series(1,100)-1)%25));  
INSERT INTO par2_1220017 VALUES(generate_series(301,400),'q',null,chr(65 + (generate_series(1,100)-1)%25));
INSERT INTO par2_1220017 SELECT * FROM par2_1220017;
INSERT INTO par2_1220017 SELECT * FROM par2_1220017;

3、创建预备语句。

PREPARE i1_1220017(INT,INT) AS SELECT * FROM par1_1220017 WHERE id>$1 AND id<$2 ORDER BY 3;
PREPARE k7_1220017(DATE,DATE) AS SELECT * FROM par2_1220017 WHERE a2 IN($1,$2) ORDER BY 1;

4、使用 EXPLAIN 查看执行计划。

EXPLAIN (analyze,costs OFF) EXECUTE i1_1220017(130,230); 
EXPLAIN (analyze,costs OFF) EXECUTE k7_1220017('2022-06-13','2022-9-12');

返回结果为:

                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Sort (actual time=0.423..0.438 rows=400 loops=1)
   Sort Key: a2
   Sort Method: quicksort  Memory: 62kB
   ->  Partition Iterator (actual time=0.093..0.315 rows=400 loops=1)
         Iterations: PART
         ->  Partitioned Index Scan using par1_1220017_id_idx on par1_1220017 (actual time=0.126..0.241 rows=400 loops=3)
               Index Cond: ((id > $1) AND (id < $2))
               Selected Partitions:  PART
               Selected Subpartitions:  PART
 Total runtime: 0.727 ms
(10 rows)


                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Sort (actual time=0.470..0.494 rows=400 loops=1)
   Sort Key: id
   Sort Method: adaptive sort  Memory: 771kB
   ->  Partition Iterator (actual time=0.069..0.293 rows=400 loops=1)
         Iterations: PART
         ->  Partitioned Seq Scan on par2_1220017 (actual time=0.040..0.201 rows=400 loops=3)
               Filter: (a2 = ANY (ARRAY[$1, $2]))
               Selected Partitions:  PART
               Selected Subpartitions:  PART
 Total runtime: 0.676 ms
(10 rows)

5、清理测试表。

DEALLOCATE i1_1220017;
DEALLOCATE k7_1220017;
DROP TABLE par1_1220017;
DROP TABLE par2_1220017;

示例 2 使用VPD策略,全连接查询后排序。

1、检查数据库兼容性,本示例仅能在Oracle兼容模式下(即数据库实例初始化时指定DBCOMPATIBILITY='A')。

SHOW sql_compatibility;

2、创建测试表,并插入数据。

CREATE TABLE CWP_MASTER1(sub_project VARCHAR2(30), cwp_code VARCHAR2(30),data_owner VARCHAR2(30));
INSERT INTO CWP_MASTER1 VALUES('Z','ZZZ','VASTBASE');
INSERT INTO CWP_MASTER1 VALUES('1','102','VASTBASE');
INSERT INTO CWP_MASTER1 VALUES('C','C16','VASTBASE');
INSERT INTO CWP_MASTER1 VALUES('2','D16','VASTBASE');

CREATE TABLE invoice1( po_no VARCHAR2(30), data_owner VARCHAR2(30), owner_status VARCHAR2(30) );
INSERT INTO invoice1 VALUES('DOC/TEST20240520','VASTBASE',1);
INSERT INTO invoice1 VALUES('DOC/TEST20240520','VASTBASE',1);
INSERT INTO invoice1 VALUES('DOC/TEST20240520','VASTBASE',2);  

CREATE TABLE PURCHASE_ORDER1(PO_NO VARCHAR2(30), sub_project VARCHAR2(30), cwp_code VARCHAR2(30), data_owner VARCHAR2(30));
INSERT INTO PURCHASE_ORDER1 VALUES('DOC/TEST20240520','Z',null,'VASTBASE');
INSERT INTO PURCHASE_ORDER1 VALUES('DOC/TEST20240520','2','D16','VASTBASE');

3、创建测试函数。

CREATE OR REPLACE FUNCTION u_q_limit(p_schema_name text, p_table_name text)
RETURNS text
LANGUAGE plpgsql
NOT FENCED NOT SHIPPABLE
AS $function$
<<U_Q_LIMIT>>
DECLARE
v_where_condition VARCHAR2(1000);
BEGIN
v_where_condition := 'data_owner =''' || 'VASTBASE' || '''';
RETURN v_where_condition;
END U_Q_LIMIT;
$function$;

4、查询数据。

EXPLAIN ANALYZE SELECT invoice1.po_no,invoice1.data_owner,invoice1.owner_status,po.cwp_code,cm.cwp_code
FROM invoice1
FULL JOIN PURCHASE_ORDER1 PO
ON PO.PO_NO = INVOICE1.PO_NO
FULL JOIN CWP_MASTER1 cm
ON po.sub_project = cm.sub_project AND po.cwp_code = cm.cwp_code
WHERE invoice1.po_no = 'DOC/TEST20240520'
AND invoice1.data_owner ='VASTBASE'
AND invoice1.owner_status IN ('1', '2')
ORDER BY 1,2,3,4,5;

返回结果为:

                                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=43.85..43.86 rows=1 width=390) (actual time=0.657..0.657 rows=6 loops=1)
   Sort Key: invoice1.owner_status, po.cwp_code, cm.cwp_code
   Sort Method: adaptive sort  Memory: 1024kB
   ->  Nested Loop Left Join  (cost=13.00..43.84 rows=1 width=390) (actual time=0.320..0.590 rows=6 loops=1)
         Join Filter: ((po.po_no)::text = (invoice1.po_no)::text)
         ->  Seq Scan on invoice1  (cost=0.00..15.41 rows=1 width=234) (actual time=0.053..0.055 rows=3 loops=1)
               Filter: (((owner_status)::text = ANY ('{1,2}'::text[])) AND ((po_no)::text = 'DOC/TEST20240520'::text) AND ((data_owner)::text = 'VASTBASE'::text))
         ->  Hash Right Join  (cost=13.00..28.42 rows=1 width=234) (actual time=0.266..0.423 rows=6 loops=3)
               Hash Cond: (((cm.sub_project)::text = (po.sub_project)::text) AND ((cm.cwp_code)::text = (po.cwp_code)::text))
               ->  Seq Scan on cwp_master1 cm  (cost=0.00..13.09 rows=309 width=156) (actual time=0.022..0.023 rows=12 loops=3)
               ->  Hash  (cost=12.99..12.99 rows=1 width=234) (actual time=0.041..0.041 rows=2 loops=1)
                      Buckets: 32768  Batches: 1  Memory Usage: 257kB
                     ->  Seq Scan on purchase_order1 po  (cost=0.00..12.99 rows=1 width=234) (actual time=0.030..0.031 rows=2 loops=1)
                           Filter: ((po_no)::text = 'DOC/TEST20240520'::text)
 Total runtime: 0.806 ms
(15 rows)

5、清理测试表与函数。

DROP FUNCTION u_q_limit(p_schema_name text, p_table_name text);
DROP TABLE PURCHASE_ORDER1;
DROP TABLE invoice1;
DROP TABLE CWP_MASTER1;