快捷搜索:

【DataGuard】不能没有你,我的Standby ——Oracle DataGuard最大保护模式 故障实验


不能没有你,我的Standby ——Oracle DataGuard最大保护模式 故障实验

文章摘要:通过实验模拟DataGuard在最大保护模式下,网络故障导致所有物理备库(Physical Standby)不可用时,对主库会产生的影响。

Data Guard 提供三种数据保护模式:最大保护(Maximum Protection),最大可用(Maximum Availability)和 最大性能(Maximum Performance)。
最大保护(Maximum Protection)能够确保绝无数据丢失。但要实现这一步当然是有代价的,它要求所有的事务在提交前其REDO不仅被写入到本地的Online Redologs,还要同时写入到Standby数据库的Standby Redologs,并确认REDO数据至少在一个Standby数据库中可用(如果有多个的话), 然后才会在Primary数据库上提交事物 。为了确保数据不丢失,如果出现故障导致主库不能将redo日志写入至少一个物理备库中时,主库会被关闭。


【Oracle11g官方文档中关于Dataguard 最大保护模式的介绍】

Maximum Protection
This protection mode ensures that no data loss will occur if the primary database fails. To provide this level of protection, the redo data needed to recover a transaction must be written to both the online redo log and to the standby redo log on at least one synchronized standby database before the transaction commits.To ensure that data loss cannot occur, the primary database will shut down, rather than continue processing transactions, if it cannot write its redo stream to at least one synchronized standby database.




【实验环境】
Red Hat Enterprise Linux Server release 5.4
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0

【主库、物理备库结构信息】





【实验过程】

一、查询当前主库、物理备库保护模式

主库Pirmary 

select database_role,open_mode,protection_mode,protection_level from v$database;


物理备库Physical Standby

select database_role,open_mode,protection_mode,protection_level from v$database;




二、模拟故障( 网络故障导致主库与备库通讯中断 或 物理备库宕机 )

首先在最大保护模式下,尝试停止主库的远端归档,是不允许修改的。
alter system set log_archive_dest_state_2=deffer;


因为当前实验环境只有一个物理备库,可以通过down掉物理备库网卡,来模拟主库与物理备库之间网络故障或物理备库宕机等情况。
关闭物理备库网卡: ifconfig eth0 down



三、监测主库状态

关闭物理备库网卡时,跟踪主库告警日志。
tail -f /u01/app/oracle/diag/rdbms/beijing/BJ/trace/alert_BJ.log 
告警日志中开始有报错信息


  1. Thu Jul 17 12:11:04 2014
  2. ORA-16198: LGWR received timedout error from KSR
  3. LGWR: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (16198)
  4. ORA-16198: LGWR received timedout error from KSR
  5. Thu Jul 17 12:11:14 2014
  6. NSS2 started with pid=20, OS id=4477
  7. Thu Jul 17 12:11:29 2014


  8. ***********************************************************************

  9. Fatal NI connect error 12543, connecting to:
  10.  (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=shanghai.lxh.net)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=shanghai)(CID=(PROGRAM=oracle)(HOST=beijing.lxh.net)(USER=oracle))))

  11.   VERSION INFORMATION:
  12.         TNS for Linux: Version 11.2.0.3.0 - Production
  13.         TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
  14.   Time: 17-JUL-2014 12:11:29
  15.   Tracing not turned on.
  16.   Tns error struct:
  17.     ns main err code: 12543
  18.     
  19. TNS-12543: TNS:destination host unreachable
  20.     ns secondary err code: 12560
  21.     nt main err code: 513
  22.     
  23. TNS-00513: Destination host unreachable
  24.     nt secondary err code: 113
  25.     nt OS err code: 0


  26. ***********************************************************************

此时在主库上模拟业务操作 会处于挂起状态无发成功
create table scott.test2 as select * from scott.emp;
跟踪日志中不断更新报错信息,主库无法与物理备库通讯,最终导致主库被关闭。




主库告警日志显示,所有Standby database不可达,需要关闭实例来保护主库。




点击(此处)折叠或打开

  1. Fatal NI connect error 12543, connecting to:
  2.  (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=shanghai.lxh.net)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=shanghai)(CID=(PROGRAM=oracle)(HOST=beijing.lxh.net)(USER=oracle))))

  3.   VERSION INFORMATION:
  4.         TNS for Linux: Version 11.2.0.3.0 - Production
  5.         TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
  6.   Time: 17-JUL-2014 12:16:19
  7.   Tracing not turned on.
  8.   Tns error struct:
  9.     ns main err code: 12543
  10.     
  11. TNS-12543: TNS:destination host unreachable
  12.     ns secondary err code: 12560
  13.     nt main err code: 513
  14.     
  15. TNS-00513: Destination host unreachable
  16.     nt secondary err code: 113
  17.     nt OS err code: 0
  18. Error 12543 received logging on to the standby
  19. Thu Jul 17 12:16:20 2014
  20. LGWR: Error 12543 attaching to RFS for reconnect
  21. Error 16198 for archive log file 1 to \'sh\'
  22. Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED
  23. LGWR: All standby destinations have failed
  24. ******************************************************
  25. WARNING: All standby database destinations have failed
  26. WARNING: Instance shutdown required to protect primary
  27. ******************************************************
  28. LGWR (ospid: 4337): terminating the instance due to error 16098
  29. Thu Jul 17 12:16:20 2014
  30. System state dump requested by (instance=1, osid=4337 (LGWR)), summary=[abnormal instance termination].
  31. System State dumped to trace file /u01/app/oracle/diag/rdbms/beijing/BJ/trace/BJ_diag_4327.trc
  32. Dumping diagnostic data in directory=[cdmp_20140717121620], requested by (instance=1, osid=4337 (LGWR)), summary=[abnormal instance termination].
  33. Instance terminated by LGWR, pid = 4337


【实验总结】
通过上述实验模拟Oracle DataGuard在最大保护模式时,物理备库宕机或者网络故障导致所有物理备库不可用的情况,可以看到备库的不可达最终会导致主库Shutdown。
因此最大保护模式可以绝对保证数据没有丢失,但对主库与备库的网络连通情况、备库的稳定性等也有非常高的要求。

Oracle 11g官方文档中,在对最大保护模式的描述中,也推荐至少使用两个Standby database,来避免单个备库故障导致主库被关闭的情况。

本文源自: AG88环亚

您可能还会对下面的文章感兴趣: