VastbaseG100

An enterprise-grade relational database developed on the openGauss kernel.


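Resource Pooling: Dorado Dual-Cluster Deployment

Overview

Vastbase resource pooling is a new cluster architecture introduced by Vastbase. Using the DMS and DSS components, the nodes of a cluster share the underlying storage data and share memory with one another in real time. This saves underlying storage resources and lets the cluster support one writer and multiple readers with real-time consistent reads.

This document describes how to deploy same-city dual data centers for resource pooling based on unified logs (all xlog of a cluster stored on one volume).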

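  • To build same-city dual data centers with resource pooling, you need disk arrays, servers, and fibre channel switches, and you must deploy the HAS and OM components.

  • The current version supports resource-pooling primary/standby dual-cluster disaster recovery only in a configuration where local xlog and xlog replicated by synchronous remote replication coexist.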

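Example: Manual Deployment of Same-City Dual Data Centers

The OM management tool is used to deploy two standalone resource-pooling clusters, and disaster recovery parameters are then configured on them to build the same-city dual data centers.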

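  • Prerequisites

    • LUN devices from the disk arrays are mounted on both the primary and standby sides, and the UltraPath multipathing software is installed.
    • The disk array devices are available.
  • Constraints

    • Two working Dorado storage systems are required, and two resource-pooling clusters must be built.
    • To support unified logs, all xlog of a cluster must be stored on a single volume.
  • Network topology:

    Site               Role     Node                    IP         Storage
    Production center  Primary  Service compute node 0  10.0.0.10  Primary storage Dorado1
                                Service compute node 1  10.0.0.20
    DR center          Standby  Service compute node 0  20.0.0.10  Standby storage Dorado2
                                Service compute node 1  20.0.0.20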

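Procedure

  • Step 1: On the primary storage, create the LUNs required for resource pooling and the LUN for the xlog volume that uses synchronous remote replication, and map all LUNs to the service compute nodes.

    A given Dorado LUN must appear under the same device name on every host in a cluster. If the names differ, create symbolic links so that Vastbase sees a consistent device name.

    For example, if the disk with WWN 00000000000001 is sda on node 0 of the primary cluster and sdb on node 1, run: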

    ls -l /dev/disk/by-id          # Check the WWN of each LUN to identify its device name
    ln -s /dev/sda /dev/first      # Run on node 0
    ln -s /dev/sdb /dev/first      # Run on node 1
    

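    Then use /dev/first as the device for that LUN. The LUN-to-device mapping used in the rest of this example is:

    LUN             Device
    data            /dev/sda
    xlog            /dev/sdb
    votingDiskPath  /dev/sdc
    shareDiskDir    /dev/sdd
    • Disk array operations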

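      1. Log in to DeviceManager of the primary cluster and choose Services > LUN Groups > Create to create the LUN group for the primary cluster.

      2. Log in to DeviceManager of the primary cluster and choose Data Protection > LUN > Remote Replication Pairs > Create to create a remote replication pair for the xlog volume. When this completes, DeviceManager automatically creates a volume on the remote end that is in synchronous replication with the local xlog volume.

      3. Click the newly created remote replication pair and choose Operation > Split to split it. This allows the primary and standby clusters to be brought up separately first.

      4. Log in to DeviceManager of the standby cluster and create a LUN group in the same way. In the new LUN group, click Member LUNs > Add, select the xlog volume created from the primary cluster and add it as the xlog disk, and then create and map the remaining three LUNs on the standby cluster's storage.

      5. In DeviceManager of the standby cluster, choose Data Protection > LUN > Remote Replication Pairs, search for the remote replication pair created earlier, and choose Operation > Cancel Secondary Resource Protection so that the secondary end becomes readable and writable.

      6. Under Services > LUN Groups > LUN, search for the names of the LUNs just created and look up their local WWNs, which identify the corresponding device names on the servers. As root on each server, run the rescan-scsi-bus.sh script to scan for the new LUNs, then run ls -l /dev/disk/by-id | grep xxx (where xxx is the WWN) to find the corresponding device name.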
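The WWN lookup and the symbolic link from Step 1 can be combined into one shell function. This is a minimal sketch, not a Vastbase or UltraPath tool: `link_lun_by_wwn` is a hypothetical helper, and it assumes the by-id entry of a LUN is named `wwn-0x<WWN>` (multipathing software may use a different naming scheme, so check the ls -l output first). The by-id directory is a parameter so the logic can be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: find the device behind a LUN WWN and expose it
# under a consistent link name such as /dev/first.
# Assumption: the by-id entry is named wwn-0x<WWN>; adjust the pattern
# if your multipathing software names entries differently.
link_lun_by_wwn() {
    byid_dir=$1   # normally /dev/disk/by-id
    wwn=$2        # local WWN shown in DeviceManager
    link=$3       # name to expose to Vastbase, e.g. /dev/first

    entry="$byid_dir/wwn-0x$wwn"
    if [ ! -e "$entry" ]; then
        echo "no by-id entry for WWN $wwn in $byid_dir" >&2
        return 1
    fi
    dev=$(readlink -f "$entry")   # resolve to the real device, e.g. /dev/sdb
    ln -sfn "$dev" "$link"        # create or replace the link
    echo "$link -> $dev"
}
```

Run it as root on each node of a cluster, for example `link_lun_by_wwn /dev/disk/by-id 00000000000001 /dev/first`; every node then exposes the LUN as /dev/first whatever its sd* name is.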

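  • Step 2: Prepare the XML file on the primary storage side.

    Refer to the XML configuration method described in the Installation and Upgrade Guide. The following example uses one primary and one standby node: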

    <?xml version="1.0" encoding="UTF-8"?>
    <ROOT>
        <!-- Overall Vastbase cluster information -->
        <CLUSTER>
            <!-- Cluster name -->
            <PARAM name="clusterName" value="cluster" />
            <!-- Database node names (hostnames) -->
            <PARAM name="nodeNames" value="node1,node2" />
            <!-- Database installation directory -->
            <PARAM name="gaussdbAppPath" value="/opt/vastbase/install/app" />
            <!-- Log directory -->
            <PARAM name="gaussdbLogPath" value="/opt/vastbase/install/log" />
            <!-- Temporary file directory -->
            <PARAM name="tmpMppdbPath" value="/opt/vastbase/tmp"/>
            <!-- Database tool directory -->
            <PARAM name="gaussdbToolPath" value="/opt/vastbase/install/tool" />
            <!-- Database core file directory -->
            <PARAM name="corePath" value="/opt/vastbase/install/corefile"/>
            <!-- Node IPs, one per entry in nodeNames and in the same order -->
            <PARAM name="backIp1s" value="10.0.0.10,10.0.0.20"/>
            <PARAM name="clusterType" value="single-inst"/>
            <PARAM name="GaussVT" value="Fusion"/>
            <PARAM name="enable_dss" value="on"/>
            <PARAM name="dss_home" value="/opt/vastbase/install/dss_home"/>
            <PARAM name="dss_vg_info" value="data:/dev/sda,log0:/dev/sdb"/>
            <PARAM name="votingDiskPath" value="/dev/sdc"/>
            <PARAM name="shareDiskDir" value="/dev/sdd"/>
            <PARAM name="ss_dss_vg_name" value="data"/>
            <PARAM name="dss_ssl_enable" value="on"/>
        </CLUSTER>
        <!-- Deployment information for each server -->
        <DEVICELIST>
            <!-- Deployment information on node 1; set "name" to the hostname -->
            <DEVICE sn="node1">
                <PARAM name="name" value="node1"/>
                <PARAM name="azName" value="AZ1"/>
                <PARAM name="azPriority" value="1"/>
                <PARAM name="backIp1" value="10.0.0.10"/>
                <PARAM name="sshIp1" value="10.0.0.10"/>
    
                <PARAM name="cmDir" value="/opt/vastbase/install/cm"/>
                <PARAM name="cmsNum" value="1"/>
                <PARAM name="cmServerPortBase" value="27000"/>
                <PARAM name="cmServerListenIp1" value="10.0.0.10,10.0.0.20"/>
                <PARAM name="cmServerlevel" value="1"/>
                <PARAM name="cmServerRelation" value="node1,node2"/>
    
                <PARAM name="dataNum" value="1"/>
                <PARAM name="dataPortBase" value="25400"/>
                <PARAM name="dataNode1" value="/opt/vastbase/install/data/dn,node2,/opt/vastbase/install/data/dn"/>
            </DEVICE>
    
            <!-- Deployment information on node 2; set "name" to the hostname -->
            <DEVICE sn="node2">
                <PARAM name="name" value="node2"/>
                <PARAM name="azName" value="AZ1"/>
                <PARAM name="azPriority" value="1"/>
                <PARAM name="backIp1" value="10.0.0.20"/>
                <PARAM name="sshIp1" value="10.0.0.20"/>
                <PARAM name="cmDir" value="/opt/vastbase/install/cm"/>
            </DEVICE>
        </DEVICELIST>
    </ROOT>
    

    Modify the node names, node IPs, directories, device names, and port numbers to match your environment.

    Save the XML file as /opt/software/vastbase/cluster_config.xml.
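Before running gs_preinstall, it can help to sanity-check the edited XML. The sketch below is a hypothetical pre-check, not part of OM: `get_param` and `check_node_ip_pairs` are illustrative names, and the sed-based parsing assumes one `<PARAM .../>` element per line as in the example above. It verifies that nodeNames and backIp1s have the same number of entries, since the two lists must correspond one to one.

```shell
#!/bin/sh
# Hypothetical pre-check for cluster_config.xml. This is not an XML
# parser: it assumes one <PARAM name="..." value="..."/> per line.
get_param() {
    # $1 = XML file, $2 = PARAM name; prints the first matching value
    sed -n "s/.*<PARAM name=\"$2\" value=\"\([^\"]*\)\".*/\1/p" "$1" | head -n 1
}

count_items() {
    # Count the comma-separated items in $1 (0 for an empty string)
    printf '%s\n' "$1" | tr ',' '\n' | grep -c .
}

check_node_ip_pairs() {
    # nodeNames and backIp1s must list the same number of entries
    n=$(count_items "$(get_param "$1" nodeNames)")
    m=$(count_items "$(get_param "$1" backIp1s)")
    if [ "$n" -ne "$m" ]; then
        echo "nodeNames has $n entries but backIp1s has $m" >&2
        return 1
    fi
    echo "ok: $n nodes"
}
```

For example, `check_node_ip_pairs /opt/software/vastbase/cluster_config.xml` prints `ok: 2 nodes` for the file above.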

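  • Step 3: On the primary storage side, perform the following operations to install and deploy the primary cluster. The installation user is vastbase.

    Initialize the installation environment. The simplified steps are as follows: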

    su - root 
    mkdir -p /opt/software/vastbase
    chmod 755 -R /opt/software
    # Place the downloaded installation packages in /opt/software/vastbase
    
    cd /opt/software/vastbase
    tar -zxvf Vastbase-x.x.x-openEuler-64bit-all.tar.gz
    tar -zxvf Vastbase-x.x.x-openEuler-64bit-om.tar.gz
    
    cd /opt/software/vastbase/script
    ./gs_preinstall -U vastbase -G dbgrp -X /opt/software/vastbase/cluster_config.xml --sep-env-file=/home/vastbase/env
    
    su - vastbase
    source /home/vastbase/env
    gs_install -X /opt/software/vastbase/cluster_config.xml --dorado-cluster-mode="primary"
    

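    Parameter description:

    • --sep-env-file: stores the environment variables in a separate file. The value must be a file path that the installation user vastbase can access.
    • --dorado-cluster-mode: whether this is the primary cluster or the standby cluster.
    • vastbase: the OS user.
    • dbgrp: the group of the OS user.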
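  • Step 4: Query the status of the primary cluster.

    Once the disaster recovery relationship is established, this cluster is the primary cluster; before that, it is still a standalone resource-pooling cluster.

    has_ctl query -Cvidp
    

    Cluster status: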

    [  CMServer State   ]
    
    node           node_ip         instance                           state
    -------------------------------------------------------------------------
    1  node1 10.0.0.10   1    /opt/vastbase/install/cm/cm_server Primary
    2  node2 10.0.0.20   2    /opt/vastbase/install/cm/cm_server Standby
    
    
    [ Defined Resource State ]
    
    node           node_ip         res_name instance  state
    ---------------------------------------------------------
    1  node1 10.0.0.10   dms_res  6001      OnLine
    2  node2 10.0.0.20   dms_res  6002      OnLine
    1  node1 10.0.0.10   dss      20001     OnLine
    2  node2 10.0.0.20   dss      20002     OnLine
    
    [   Cluster State   ]
    
    cluster_state   : Normal
    redistributing  : No
    balanced        : Yes
    current_az      : AZ_ALL
    
    [  Datanode State   ]
    
    node           node_ip         instance                             state            | node           node_ip         instance                             state
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    1  node1 10.0.0.10   6001 25400  /opt/vastbase/install/data/dn P Primary Normal | 2  node2 10.0.0.20   6002 25400  /opt/vastbase/install/data/dn S Standby Normal
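
When scripting the deployment, the cluster_state line of `has_ctl query -Cvidp` is a convenient gate before moving to the next step. The sketch assumes the output layout shown above; `cluster_state_of` and `cluster_is_normal` are hypothetical helper names, and they read the query output from standard input so they can be tested against saved output.

```shell
#!/bin/sh
# Hypothetical helpers: read `has_ctl query -Cvidp` output on stdin.
cluster_state_of() {
    # Print the value of the "cluster_state   : ..." line, e.g. Normal
    awk -F':' '/^[ \t]*cluster_state[ \t]*:/ { v = $2; gsub(/[ \t]/, "", v); print v; exit }'
}

cluster_is_normal() {
    # Succeed only if the [ Cluster State ] section reports Normal
    [ "$(cluster_state_of)" = "Normal" ]
}
```

For example: `has_ctl query -Cvidp | cluster_is_normal && echo "cluster is Normal"`.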
    
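  • Step 5: Prepare the LUNs and the XML file on the standby storage side, in the same way as Steps 1 and 2.

    Modify the node names, node IPs, directories, device names, and port numbers to match your environment.

  • Step 6: On the standby storage side, perform the following operations to install and deploy the standby cluster (this cluster becomes the standby cluster once the disaster recovery relationship is established). The installation user is vastbase.

    The simplified steps are as follows: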

    su - root 
    mkdir -p /opt/software/vastbase
    chmod 755 -R /opt/software
    # Place the downloaded installation packages in /opt/software/vastbase
    
    cd /opt/software/vastbase
    tar -zxvf Vastbase-x.x.x-openEuler-64bit-all.tar.gz
    tar -zxvf Vastbase-x.x.x-openEuler-64bit-om.tar.gz
    
    cd /opt/software/vastbase/script
    ./gs_preinstall -U vastbase -G dbgrp -X /opt/software/vastbase/cluster_config.xml --sep-env-file=/home/vastbase/env
    
    su - vastbase
    source /home/vastbase/env
    gs_install -X /opt/software/vastbase/cluster_config.xml --dorado-cluster-mode="standby"
    
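  • Step 7: Query the status of the cluster on the standby storage side.

    Once the disaster recovery relationship is established, this cluster is the standby cluster; before that, it is still a standalone resource-pooling cluster.

    has_ctl query -Cvidp
    

    Cluster status: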

    [  CMServer State   ]
    
    node           node_ip         instance                           state
    -------------------------------------------------------------------------
    1  node1 20.0.0.10   1    /opt/vastbase/install/cm/cm_server Primary
    2  node2 20.0.0.20   2    /opt/vastbase/install/cm/cm_server Standby
    
    
    [ Defined Resource State ]
    
    node           node_ip         res_name instance  state
    ---------------------------------------------------------
    1  node1 20.0.0.10   dms_res  6001      OnLine
    2  node2 20.0.0.20   dms_res  6002      OnLine
    1  node1 20.0.0.10   dss      20001     OnLine
    2  node2 20.0.0.20   dss      20002     OnLine
    
    [   Cluster State   ]
    
    cluster_state   : Normal
    redistributing  : No
    balanced        : Yes
    current_az      : AZ_ALL
    
    [  Datanode State   ]
    
    node           node_ip         instance                             state            | node           node_ip         instance                             state
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    1  node1 20.0.0.10   6001 25400  /opt/vastbase/install/data/dn P Primary Normal | 2  node2 20.0.0.20   6002 25400  /opt/vastbase/install/data/dn S Standby Normal
    
  • Step 8: Stop the primary cluster, configure the disaster recovery parameters, and restart the primary cluster.

    has_ctl stop
    
    • Primary node

      vb_guc set -N node1 -D /opt/vastbase/install/data/dn -c "application_name = 'dn_master_0'"
      vb_guc set -N node1 -D /opt/vastbase/install/data/dn -c "cross_cluster_replconninfo1='localhost=10.0.0.10 localport=25400 remotehost=20.0.0.10 remoteport=25400'"
      vb_guc set -N node1 -D /opt/vastbase/install/data/dn -c "cross_cluster_replconninfo2='localhost=10.0.0.10 localport=25400 remotehost=20.0.0.20 remoteport=25400'"
      vb_guc set -N node1 -D /opt/vastbase/install/data/dn -c "ha_module_debug = off"
          
      vb_guc set -N node1 -D /opt/vastbase/install/data/dn -h "host    all             all             20.0.0.10/32        trust"
      vb_guc set -N node1 -D /opt/vastbase/install/data/dn -h "host    all             all             20.0.0.20/32        trust"
      
    • Standby node

      vb_guc set -N node2 -D /opt/vastbase/install/data/dn -c "application_name = 'dn_master_1'"
      vb_guc set -N node2 -D /opt/vastbase/install/data/dn -c "cross_cluster_replconninfo1='localhost=10.0.0.20 localport=25400 remotehost=20.0.0.10 remoteport=25400'"
      vb_guc set -N node2 -D /opt/vastbase/install/data/dn -c "cross_cluster_replconninfo2='localhost=10.0.0.20 localport=25400 remotehost=20.0.0.20 remoteport=25400'"
      vb_guc set -N node2 -D /opt/vastbase/install/data/dn -c "ha_module_debug = off"
          
      vb_guc set -N node2 -D /opt/vastbase/install/data/dn -h "host    all             all             20.0.0.10/32        trust"
      vb_guc set -N node2 -D /opt/vastbase/install/data/dn -h "host    all             all             20.0.0.20/32        trust"
      
    • Set the HAS parameter of the primary cluster

      has_ctl set --param --agent -k dorado_cluster_mode=1
      
    • Start the primary cluster

      has_ctl start
      

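      Parameter description:

      cross_cluster_replconninfo: connection information between the primary and standby clusters. localport is the database HA port. The example above uses dataPortBase from the XML file; when the thread pool is enabled (enable_thread_pool), set it to the actual HA port.

      vb_guc is the Vastbase tool for modifying configuration files. Alternatively, open postgresql.conf and pg_hba.conf under /opt/vastbase/install/data/dn ($PGDATA) and manually write the content inside the double quotes above into those files.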
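The vb_guc commands in Steps 8 and 9 follow one pattern: on each local node, cross_cluster_replconninfoN points at the N-th node of the remote cluster, and a trust entry is added to pg_hba.conf for every remote node. The hypothetical generator below only prints the commands so they can be reviewed before running them; `gen_dr_guc` and its argument format (`name=ip` pairs) are assumptions for illustration, and the port must be the HA port described above.

```shell
#!/bin/sh
# Hypothetical generator: prints (does not run) the vb_guc commands for
# every node of the local cluster, following the pattern of Steps 8/9.
gen_dr_guc() {
    app_prefix=$1     # dn_master (primary cluster) or dn_standby (DR cluster)
    datadir=$2        # e.g. /opt/vastbase/install/data/dn
    port=$3           # HA port, e.g. 25400
    local_nodes=$4    # space-separated name=ip pairs of this cluster
    remote_ips=$5     # space-separated IPs of the other cluster

    i=0
    for pair in $local_nodes; do
        name=${pair%%=*}
        ip=${pair#*=}
        echo "vb_guc set -N $name -D $datadir -c \"application_name = '${app_prefix}_$i'\""
        # cross_cluster_replconninfoN -> N-th node of the remote cluster
        n=1
        for rip in $remote_ips; do
            echo "vb_guc set -N $name -D $datadir -c \"cross_cluster_replconninfo$n='localhost=$ip localport=$port remotehost=$rip remoteport=$port'\""
            n=$((n + 1))
        done
        echo "vb_guc set -N $name -D $datadir -c \"ha_module_debug = off\""
        # One pg_hba.conf entry per remote node
        for rip in $remote_ips; do
            echo "vb_guc set -N $name -D $datadir -h \"host    all             all             $rip/32        trust\""
        done
        i=$((i + 1))
    done
}
```

`gen_dr_guc dn_master /opt/vastbase/install/data/dn 25400 "node1=10.0.0.10 node2=10.0.0.20" "20.0.0.10 20.0.0.20"` prints the Step 8 commands; passing `dn_standby`, the standby node list, and the primary IPs gives the Step 9 commands.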

  • Step 9: Stop the standalone resource-pooling cluster on the standby storage side (it becomes the standby cluster once the disaster recovery relationship is established), and configure the disaster recovery parameters.

    has_ctl stop
    
    • Primary node

      vb_guc set -N node1 -D /opt/vastbase/install/data/dn -c "application_name = 'dn_standby_0'"
      vb_guc set -N node1 -D /opt/vastbase/install/data/dn -c "cross_cluster_replconninfo1='localhost=20.0.0.10 localport=25400 remotehost=10.0.0.10 remoteport=25400'"
      vb_guc set -N node1 -D /opt/vastbase/install/data/dn -c "cross_cluster_replconninfo2='localhost=20.0.0.10 localport=25400 remotehost=10.0.0.20 remoteport=25400'"
      vb_guc set -N node1 -D /opt/vastbase/install/data/dn -c "ha_module_debug = off"
          
      vb_guc set -N node1 -D /opt/vastbase/install/data/dn -h "host    all             all             10.0.0.10/32        trust"
      vb_guc set -N node1 -D /opt/vastbase/install/data/dn -h "host    all             all             10.0.0.20/32        trust"
      
    • Standby node

      vb_guc set -N node2 -D /opt/vastbase/install/data/dn -c "application_name = 'dn_standby_1'"
      vb_guc set -N node2 -D /opt/vastbase/install/data/dn -c "cross_cluster_replconninfo1='localhost=20.0.0.20 localport=25400 remotehost=10.0.0.10 remoteport=25400'"
      vb_guc set -N node2 -D /opt/vastbase/install/data/dn -c "cross_cluster_replconninfo2='localhost=20.0.0.20 localport=25400 remotehost=10.0.0.20 remoteport=25400'"
      vb_guc set -N node2 -D /opt/vastbase/install/data/dn -c "ha_module_debug = off"
          
      vb_guc set -N node2 -D /opt/vastbase/install/data/dn -h "host    all             all             10.0.0.10/32        trust"
      vb_guc set -N node2 -D /opt/vastbase/install/data/dn -h "host    all             all             10.0.0.20/32        trust"
      
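  • Step 10: On the main standby (node 0 of the standby cluster), start dssserver and run build.

    export DSS_MAINTAIN=TRUE                                  # Enable DSS manual mode
    dssserver -D /opt/vastbase/install/dss_home &             # Start dssserver; -D specifies $DSS_HOME
    vb_ctl build -D /opt/vastbase/install/data/dn -b cross_cluster_full -q
    dsscmd stopdss                                            # Stop the dssserver started in manual mode
    

    build must be run with -q, which means the database is not started after the build succeeds.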

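  • Step 11: Configure the CM parameters of the standby cluster and restart the standby cluster.

    has_ctl set --param --server -k backup_open=1
    has_ctl set --param --agent -k agent_backup_open=1
    has_ctl set --param --agent -k dorado_cluster_mode=2
    # On all nodes of the standby cluster, add the following line to $DSS_HOME/cfg/dss_inst.ini
    echo "CLUSTER_RUN_MODE=cluster_standby" >> "$DSS_HOME/cfg/dss_inst.ini"
    

    In the disk array management system (DeviceManager), switch the xlog remote replication pair from split to synchronous. This step is critical.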

    has_ctl start
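
Appending the CLUSTER_RUN_MODE line by hand can leave duplicates if Step 11 is repeated. A hypothetical helper, `set_dss_standby_mode`, makes the change idempotent under one assumption about intent: it appends the line if no CLUSTER_RUN_MODE entry exists and otherwise rewrites the existing value. It also assumes GNU sed (for `sed -i`), as found on Linux distributions such as openEuler.

```shell
#!/bin/sh
# Hypothetical helper: ensure dss_inst.ini contains exactly one
# CLUSTER_RUN_MODE=cluster_standby line. Safe to run more than once.
# Assumes GNU sed for in-place editing.
set_dss_standby_mode() {
    ini=$1   # normally $DSS_HOME/cfg/dss_inst.ini
    if grep -q '^CLUSTER_RUN_MODE=' "$ini"; then
        # Rewrite the existing entry instead of appending a second one
        sed -i 's/^CLUSTER_RUN_MODE=.*/CLUSTER_RUN_MODE=cluster_standby/' "$ini"
    else
        echo 'CLUSTER_RUN_MODE=cluster_standby' >> "$ini"
    fi
}
```

Run `set_dss_standby_mode "$DSS_HOME/cfg/dss_inst.ini"` on every standby-cluster node before `has_ctl start`.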
    
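  • Step 12: Query the cluster status.

    On the primary cluster, has_ctl query -Cvidp returns the same result as in Step 4.

    The result on the standby cluster is shown below. Node 0 of the standby cluster changes from Primary, before the disaster recovery relationship is established, to Main Standby after it is established.

    has_ctl query -Cvidp
    

    Cluster status: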

    [  CMServer State   ]
    
    node           node_ip         instance                           state
    -------------------------------------------------------------------------
    1  node1 20.0.0.10   1    /opt/vastbase/install/cm/cm_server Primary
    2  node2 20.0.0.20   2    /opt/vastbase/install/cm/cm_server Standby
    
    
    [ Defined Resource State ]
    
    node           node_ip         res_name instance  state
    ---------------------------------------------------------
    1  node1 20.0.0.10   dms_res  6001      OnLine
    2  node2 20.0.0.20   dms_res  6002      OnLine
    1  node1 20.0.0.10   dss      20001     OnLine
    2  node2 20.0.0.20   dss      20002     OnLine
    
    [   Cluster State   ]
    
    cluster_state   : Normal
    redistributing  : No
    balanced        : Yes
    current_az      : AZ_ALL
    
    [  Datanode State   ]
    
    node           node_ip         instance                             state            | node           node_ip         instance                             state
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    1  node1 20.0.0.10   6001 25400  /opt/vastbase/install/data/dn P Main Standby Normal | 2  node2 20.0.0.20   6002 25400  /opt/vastbase/install/data/dn S Standby Normal
    
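  • Step 13: On the primary node of the primary cluster and on the main standby of the standby cluster, run the following query to view the streaming replication information.

    Node 0 (primary) of the primary cluster:

    vb_ctl query -D /opt/vastbase/install/data/dn
    

    Streaming replication status: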

    vb_ctl query ,datadir is /opt/vastbase/install/data/dn
    HA state:
            local_role                     : Primary
            static_connections             : 2
            db_state                       : Normal
            detail_information             : Normal
    
    Senders info:
            sender_pid                     : 1456376
            local_role                     : Primary
            peer_role                      : StandbyCluster_Standby
            peer_state                     : Normal
            state                          : Streaming
            sender_sent_location           : 2/5C8
            sender_write_location          : 2/5C8
            sender_flush_location          : 2/5C8
            sender_replay_location         : 2/5C8
            receiver_received_location     : 2/5C8
            receiver_write_location        : 2/5C8
            receiver_flush_location        : 2/5C8
            receiver_replay_location       : 2/5C8
            sync_percent                   : 100%
            sync_state                     : Async
            sync_priority                  : 0
            sync_most_available            : Off
            channel                        : 10.0.0.10:25400-->20.0.0.10:43350
    
    Receiver info:
    No information
    

    Node 0 (main standby) of the standby cluster:

    vb_ctl query -D /opt/vastbase/install/data/dn
    

    Streaming replication status:

    vb_ctl query ,datadir is /opt/vastbase/install/data/dn
    HA state:
            local_role                     : Main Standby
            static_connections             : 2
            db_state                       : Normal
            detail_information             : Normal
    
    Senders info:
    No information
    Receiver info:
            receiver_pid                   : 1901181
            local_role                     : Standby
            peer_role                      : Primary
            peer_state                     : Normal
            state                          : Normal 
            sender_sent_location           : 2/5C8
            sender_write_location          : 2/5C8
            sender_flush_location          : 2/5C8
            sender_replay_location         : 2/5C8
            receiver_received_location     : 2/5C8
            receiver_write_location        : 2/5C8
            receiver_flush_location        : 2/5C8
            receiver_replay_location       : 2/5C8
            sync_percent                   : 100%
            channel                        : 20.0.0.10:43350<--10.0.0.10:25400
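
To check the result of Step 13 from a script rather than by eye, the fields of `vb_ctl query` can be compared. The sketch treats replication as caught up when sync_percent is 100% and receiver_replay_location equals sender_sent_location for the first channel listed; `repl_caught_up` is a hypothetical name, and it assumes the field layout shown in the output above. LSNs are compared as strings, for equality only.

```shell
#!/bin/sh
# Hypothetical check: read `vb_ctl query` output on stdin; print the key
# values and return 0 only if the first channel is fully caught up.
repl_caught_up() {
    awk '
        {
            # Split "key : value" on the first colon only, since values
            # such as the channel field contain further colons
            key = $0; val = ""
            pos = index($0, ":")
            if (pos > 0) { key = substr($0, 1, pos - 1); val = substr($0, pos + 1) }
            gsub(/[ \t]/, "", key); gsub(/[ \t]/, "", val)
        }
        key == "sender_sent_location"     && sent == ""   { sent = val }
        key == "receiver_replay_location" && replay == "" { replay = val }
        key == "sync_percent"             && pct == ""    { pct = val }
        END {
            printf "sent=%s replay=%s sync=%s\n", sent, replay, pct
            if (sent != "" && sent == replay && pct == "100%") exit 0
            else exit 1
        }'
}
```

`vb_ctl query -D /opt/vastbase/install/data/dn | repl_caught_up` prints the three values and returns 0 only when replication has caught up.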
    

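Example: Automatic Deployment of Same-City Dual Data Centers

The first seven steps of automatic deployment are the same as for manual deployment, so only Step 8 onward is described here. With both the primary cluster and the disaster recovery (DR) cluster in a normal state, run the gs_ddr tool on the primary cluster and on the DR cluster to build the dual-cluster relationship. The commands for the primary cluster and the DR cluster are given in Step 8 and Step 9 respectively.

During this process, after the DR cluster finishes the full build of the Main Standby node, confirm in the management system of the shared disk array that the replication status of the LUN for the shared XLOG volume has been changed from split to synchronous, and that the shared disk configured for the primary cluster has the primary role. When the replicated LUNs of the two clusters have finished synchronizing, enter "yes" on each cluster to continue building the dual-cluster relationship.

Procedure

  • Step 8: On the primary cluster, run the start command with the cluster mode set to primary (-m primary), and follow the prompts.

    JSON file for the primary cluster: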

    [vastbase@node1 dn]$ cat  /opt/software/vastbase/json_file
    {
        "remoteClusterConf": {
            "port": 25400,
            "shards": [[{"ip": "20.0.0.10", "dataIp": "20.0.0.10"}, {"ip": "20.0.0.20", "dataIp": "20.0.0.20"}]]
        },
        "localClusterConf": {
            "port": 25400,
            "shards": [[{"ip": "10.0.0.10", "dataIp": "10.0.0.10"}, {"ip": "10.0.0.20", "dataIp": "10.0.0.20"}]]
        }
    }
    
    [vastbase@node1 dn]$ gs_ddr -t start -m primary -X /opt/software/vastbase/cluster_config.xml --json /opt/software/vastbase/json_file
    --------------------------------------------------------------------------------
    Dorado disaster recovery start c1d7276eb6a711ee857204e8925383b2
    --------------------------------------------------------------------------------
    Start create dorado storage disaster relationship.
    Got the step for action:[start].
    Successfully check cluster status is: Normal.
    Successfully check instance status.
    Start update pg_hba config.
    Starting set application_name param
    Successfully set application_name param.
    Stopping the cluster.
    Successfully stopped the cluster.
    Starting the cluster.
    Successfully started primary instance. Please wait for standby instances.
    Waiting cluster normal.
    Successfully started standby instances.
    Please ensure that the "Remote Replication Pairs" configured correctly between the primary cluster and the disaster recovery cluster, with Replication Mode in "Synchronous" state.
    Ready to move on (yes/no)? yes
    Waiting for the main standby connection.
    Main standby already connected.
    Successfully check cluster status is: Normal.
    Successfully removed step file.
    Successfully do dorado disaster recovery start.
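
The DR cluster's JSON file is the primary cluster's file with localClusterConf and remoteClusterConf swapped. The hypothetical helpers `shard_json` and `gen_ddr_json` below generate either file from IP lists; they assume a single shard, the same port on both clusters, and ip equal to dataIp, as in the example.

```shell
#!/bin/sh
# Hypothetical generator for the gs_ddr --json file (single shard,
# same port on both clusters, ip == dataIp).
shard_json() {
    # $1 = space-separated IPs -> [[{"ip": "...", "dataIp": "..."}, ...]]
    out=""
    for ip in $1; do
        if [ -n "$out" ]; then out="$out, "; fi
        out="$out{\"ip\": \"$ip\", \"dataIp\": \"$ip\"}"
    done
    printf '[[%s]]' "$out"
}

gen_ddr_json() {
    port=$1         # database port used in both clusters, e.g. 25400
    local_ips=$2    # IPs of the cluster the file is written for
    remote_ips=$3   # IPs of the other cluster
    printf '{\n'
    printf '    "remoteClusterConf": {\n'
    printf '        "port": %s,\n' "$port"
    printf '        "shards": %s\n' "$(shard_json "$remote_ips")"
    printf '    },\n'
    printf '    "localClusterConf": {\n'
    printf '        "port": %s,\n' "$port"
    printf '        "shards": %s\n' "$(shard_json "$local_ips")"
    printf '    }\n'
    printf '}\n'
}
```

On the primary cluster, `gen_ddr_json 25400 "10.0.0.10 10.0.0.20" "20.0.0.10 20.0.0.20" > /opt/software/vastbase/json_file` reproduces the file above; on the DR cluster, swap the two IP lists.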
    
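  • Step 9: On the DR cluster, run the start command with the cluster mode set to disaster_standby (-m disaster_standby), and follow the prompts.

    JSON file for the DR cluster:

    [vastbase@node1 dn]$ cat /opt/software/vastbase/json_file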
    {
        "remoteClusterConf":{
            "port": 25400,
            "shards": [[{"ip": "10.0.0.10", "dataIp": "10.0.0.10"}, {"ip": "10.0.0.20", "dataIp": "10.0.0.20"}]]
        },
        "localClusterConf":{
            "port": 25400,
            "shards": [[{"ip": "20.0.0.10", "dataIp": "20.0.0.10"}, {"ip": "20.0.0.20", "dataIp": "20.0.0.20"}]]
        }
    }
    
    [vastbase@node1 dn]$ gs_ddr -t start -m disaster_standby -X /opt/software/vastbase/cluster_config.xml --json /opt/software/vastbase/json_file
    --------------------------------------------------------------------------------
    Dorado disaster recovery start eb7068ceb6a711eeb0f4989449022b00
    --------------------------------------------------------------------------------
    Start create dorado storage disaster relationship.
    Got the step for action:[start].
    Successfully check cluster status is: Normal.
    Successfully check instance status.
    Start update pg_hba config.
    Starting set application_name param
    Successfully set application_name param.
    Stopping the cluster.
    Successfully stopped the cluster.
    Start start dssserver in main standby node.
    Successfully Start dssserver on node [node1]
    Start build main standby datanode in disaster standby cluster.
    Successfully build main standby in disaster standby cluster on node [node1]
    Stop dssserver instance on main standby node.
    Successfully stop dssserver before start cluster on node [node1]
    Start set all dss instance CLUSTER_RUN_MODE.
    Successfully set dss cfg CLUSTER_RUN_MODE to cluster_standby.
    Please ensure that the "Remote Replication Pairs" configured correctly between the primary cluster and the disaster recovery cluster, with Replication Mode in "Synchronous" state.
    Ready to move on (yes/no)? yes
    Starting the cluster.
    Successfully started primary instance. Please wait for standby instances.
    Waiting cluster normal.
    Successfully started standby instances.
    Successfully check cluster status is: Normal.
    Successfully removed step file.
    Successfully do dorado disaster recovery start.
    
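  • Step 10: Query the cluster status.

    On the primary cluster, has_ctl query -Cvidp returns the same result as in Step 4 of the manual deployment. The result on the standby cluster is shown below. Node 0 of the standby cluster changes from Primary, before the disaster recovery relationship is established, to Main Standby after it is established.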

    [mpp@node2 dn_6002]$ has_ctl query -Cvidp
    [  CMServer State   ]
    
    node           node_ip         instance                           state
    -------------------------------------------------------------------------
    1  node1 20.0.0.10   1    /opt/huawei/install/cm/cm_server Primary
    2  node2 20.0.0.20   2    /opt/huawei/install/cm/cm_server Standby
    
    
    [ Defined Resource State ]
    
    node           node_ip         res_name instance  state
    ---------------------------------------------------------
    1  node1 20.0.0.10   dms_res  6001      OnLine
    2  node2 20.0.0.20   dms_res  6002      OnLine
    1  node1 20.0.0.10   dss      20001     OnLine
    2  node2 20.0.0.20   dss      20002     OnLine
    
    [   Cluster State   ]
    
    cluster_state   : Normal
    redistributing  : No
    balanced        : Yes
    current_az      : AZ_ALL
    
    [  Datanode State   ]
    
    node           node_ip         instance                             state            | node           node_ip         instance                             state
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    1  node1 20.0.0.10   6001 25400  /opt/huawei/install/data/dn P Main Standby Normal | 2  node2 20.0.0.20   6002 25400  /opt/huawei/install/data/dn S Standby Normal
    
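  • Step 11: On the primary node of the primary cluster and on the main standby of the standby cluster, run the following query to view the streaming replication information.

    Node 0 (primary) of the primary cluster: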

    [vastbase@node1 dn]$ vb_ctl query -D /opt/huawei/install/data/dn
    [2023-04-18 09:38:34.397][1498175][][vb_ctl]: vb_ctl query ,datadir is /opt/huawei/install/data/dn
    HA state:
            local_role                     : Primary
            static_connections             : 2
            db_state                       : Normal
            detail_information             : Normal
    
    Senders info:
            sender_pid                     : 1456376
            local_role                     : Primary
            peer_role                      : StandbyCluster_Standby
            peer_state                     : Normal
            state                          : Streaming
            sender_sent_location           : 2/5C8
            sender_write_location          : 2/5C8
            sender_flush_location          : 2/5C8
            sender_replay_location         : 2/5C8
            receiver_received_location     : 2/5C8
            receiver_write_location        : 2/5C8
            receiver_flush_location        : 2/5C8
            receiver_replay_location       : 2/5C8
            sync_percent                   : 100%
            sync_state                     : Async
            sync_priority                  : 0
            sync_most_available            : Off
            channel                        : 10.0.0.10:25400-->20.0.0.10:43350
    
    Receiver info:
    No information
    

    Node 0 (main standby) of the standby cluster:

    [vastbase@nodename pg_log]$ vb_ctl query -D /opt/huawei/install/data/dn
    [2023-04-18 11:33:09.288][2760315][][vb_ctl]: vb_ctl query ,datadir is /opt/huawei/install/data/dn
    HA state:
            local_role                     : Main Standby
            static_connections             : 2
            db_state                       : Normal
            detail_information             : Normal
    
    Senders info:
    No information
    Receiver info:
            receiver_pid                   : 1901181
            local_role                     : Standby
            peer_role                      : Primary
            peer_state                     : Normal
            state                          : Normal 
            sender_sent_location           : 2/5C8
            sender_write_location          : 2/5C8
            sender_flush_location          : 2/5C8
            sender_replay_location         : 2/5C8
            receiver_received_location     : 2/5C8
            receiver_write_location        : 2/5C8
            receiver_flush_location        : 2/5C8
            receiver_replay_location       : 2/5C8
            sync_percent                   : 100%
            channel                        : 20.0.0.10:43350<--10.0.0.10:25400
    

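FAQ

  • Question 1: How do I expand storage capacity for the dual clusters?

    Data disks can be expanded directly with dsscmd adv. To expand the xlog disk, perform the following steps:

    1. Stop the standby cluster.

    2. On the primary and standby clusters, prepare a LUN of the same size on each side, establish a synchronous remote replication relationship between them, and make sure the device name exposed to the database is the same on both. For details, see Step 1 of this document.

    3. Split both the original xlog pair and the newly added pair, and cancel secondary resource protection.

    4. On the primary cluster, expand the volume online with dsscmd adv.

    5. On the standby cluster, expand the volume offline with dsscmd adv.

    6. Synchronize both pairs. After both have finished synchronizing, start the standby cluster.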