RMAN backup failed due to “not enough space on the disk”:
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 07/23/2018 15:32:17
ORA-19502: write error on file "E:\BACKUPS\TCT8NTGV_1_1", block number 559105 (block size=512)
ORA-27070: async read/write failed
OSD-04016: Error queuing an asynchronous I/O request.
O/S-Error: (OS 112) There is not enough space on the disk.
Really, that disk was 100% taken by RMAN backups. But why the heck hasn’t our deletion policy prevented this situation?!
Analyzing the existing RMAN backups showed way too much (and such too old) backups of archivelogs that normally shouldn’t have been there anymore. Our last full backup was of 14.07.2018 but we still had archivelog backups from 12.07.2018 onwards – although our RETENTION POLICY looked like this:
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 2 DAYS;
Mind the timestamp of the RMAN error above: 23.07.2018. So we in fact hold backups of the past eleven days! How come…?!
I found all existing full backup’s datafiles to have the same checkpoint timestamp of 12.07.2018. So that’s why there are archivelog backups older than the oldest full backup: The oldest full backup of 14.07. actually contains datafiles with checkpoint time 12.07. – and hence all archivelogs as of 12.07. are still needed for recovery to comply with our retention policy.
RMAN> list backup of database summary;
List of Backups
===============
Key TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- ------------------- ------- ------- ---------- ---
49416 B F A DISK 14.07.2018 00:14:43 1 1 YES FULL_BACKUP_PAS_071318100018
49913 B F A DISK 21.07.2018 00:07:21 1 1 YES FULL_BACKUP_PAS_072018100007
49914 B F A DISK 21.07.2018 00:10:53 1 1 YES FULL_BACKUP_PAS_072018100007
49981 B F A DISK 22.07.2018 00:03:14 1 1 YES FULL_BACKUP_PAS_072118100004
49988 B F A DISK 22.07.2018 02:19:43 1 1 YES FULL_BACKUP_PAS_072118100004
RMAN> list backupset 49913,49914,49981,49988;
List of Backup Sets
===================
BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ -------------------
49913 Full 17.23G DISK 02:06:46 21.07.2018 00:07:21
BP Key: 49913 Status: AVAILABLE Compressed: YES Tag: FULL_BACKUP_PAS_072018100007
Piece Name: E:\BACKUPS\O3T8GN83_1_1
List of Datafiles in backup set 49913
File LV Type Ckp SCN Ckp Time Name
---- -- ---- ---------- ------------------- ----
4 Full 14152463508157 12.07.2018 00:15:00 D:\ORADATA\PASP\TS_PASP_AIR_ARCH_01.DBF
5 Full 14152463508199 12.07.2018 00:15:00 D:\ORADATA\PASP\TS_PASP_B00_D_01.DBF
6 Full 14152463508578 12.07.2018 00:15:03 D:\ORADATA\PASP\TS_PASP_B00_I_01.DBF
7 Full 14152463508616 12.07.2018 00:15:03 D:\ORADATA\PASP\TS_PASP_B01_D_01.DBF
9 Full 14152463508720 12.07.2018 00:15:03 D:\ORADATA\PASP\TS_PASP_B04_D_01.DBF
11 Full 14152463508804 12.07.2018 00:15:04 D:\ORADATA\PASP\TS_PASP_B16_D_01.DBF
13 Full 14152463508987 12.07.2018 00:15:05 D:\ORADATA\PASP\TS_PASP_B28_I_01.DBF
14 Full 14152463509069 12.07.2018 00:15:05 D:\ORADATA\PASP\TS_PASP_CUBE_D_01.DBF
19 Full 14152463508804 12.07.2018 00:15:04 D:\ORADATA\PASP\TS_PASP_B16_D_02.DBF
21 Full 14152503333567 20.07.2018 22:00:35 D:\ORADATA\PASP\TS_PASP_B04_I_03.DBF
BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ -------------------
49914 Full 18.85G DISK 02:10:18 21.07.2018 00:10:53
BP Key: 49914 Status: AVAILABLE Compressed: YES Tag: FULL_BACKUP_PAS_072018100007
Piece Name: E:\BACKUPS\O4T8GN83_1_1
List of Datafiles in backup set 49914
File LV Type Ckp SCN Ckp Time Name
---- -- ---- ---------- ------------------- ----
1 Full 14152463508069 12.07.2018 00:14:59 D:\ORADATA\PASP\TS_PASP_SYSTEM_01.DBF
2 Full 14152463508088 12.07.2018 00:14:59 D:\ORADATA\PASP\TS_PASP_SYSAUX_01.DBF
3 Full 14152463508109 12.07.2018 00:14:59 D:\ORADATA\PASP\TS_PASP_UNDOTBS_01.DBF
8 Full 14152463508657 12.07.2018 00:15:03 D:\ORADATA\PASP\TS_PASP_B01_I_01.DBF
10 Full 14152463508757 12.07.2018 00:15:04 D:\ORADATA\PASP\TS_PASP_B04_I_01.DBF
12 Full 14152463508859 12.07.2018 00:15:04 D:\ORADATA\PASP\TS_PASP_B28_D_01.DBF
15 Full 14152463509103 12.07.2018 00:15:05 D:\ORADATA\PASP\TS_PASP_CUBE_I_01.DBF
16 Full 14152463509135 12.07.2018 00:15:06 D:\ORADATA\PASP\PROD_BIPLATFORM.DBF
17 Full 14152463509165 12.07.2018 00:15:06 D:\ORADATA\PASP\PROD_MDS.DBF
18 Full 14152463508757 12.07.2018 00:15:04 D:\ORADATA\PASP\TS_PASP_B04_I_02.DBF
20 Full 14152503333581 20.07.2018 22:00:36 D:\ORADATA\PASP\TS_PASP_B04_D_02.DBF
BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ -------------------
49981 Full 17.70G DISK 02:02:41 22.07.2018 00:03:14
BP Key: 49981 Status: AVAILABLE Compressed: YES Tag: FULL_BACKUP_PAS_072118100004
Piece Name: E:\BACKUPS\Q8T8JBK1_1_1
List of Datafiles in backup set 49981
File LV Type Ckp SCN Ckp Time Name
---- -- ---- ---------- ------------------- ----
1 Full 14152463508069 12.07.2018 00:14:59 D:\ORADATA\PASP\TS_PASP_SYSTEM_01.DBF
2 Full 14152463508088 12.07.2018 00:14:59 D:\ORADATA\PASP\TS_PASP_SYSAUX_01.DBF
3 Full 14152463508109 12.07.2018 00:14:59 D:\ORADATA\PASP\TS_PASP_UNDOTBS_01.DBF
4 Full 14152463508157 12.07.2018 00:15:00 D:\ORADATA\PASP\TS_PASP_AIR_ARCH_01.DBF
5 Full 14152463508199 12.07.2018 00:15:00 D:\ORADATA\PASP\TS_PASP_B00_D_01.DBF
8 Full 14152463508657 12.07.2018 00:15:03 D:\ORADATA\PASP\TS_PASP_B01_I_01.DBF
10 Full 14152463508757 12.07.2018 00:15:04 D:\ORADATA\PASP\TS_PASP_B04_I_01.DBF
12 Full 14152463508859 12.07.2018 00:15:04 D:\ORADATA\PASP\TS_PASP_B28_D_01.DBF
15 Full 14152463509103 12.07.2018 00:15:05 D:\ORADATA\PASP\TS_PASP_CUBE_I_01.DBF
18 Full 14152463508757 12.07.2018 00:15:04 D:\ORADATA\PASP\TS_PASP_B04_I_02.DBF
20 Full 14152506440447 21.07.2018 22:00:33 D:\ORADATA\PASP\TS_PASP_B04_D_02.DBF
BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ -------------------
49988 Full 17.40G DISK 02:14:16 22.07.2018 02:19:43
BP Key: 49988 Status: AVAILABLE Compressed: YES Tag: FULL_BACKUP_PAS_072118100004
Piece Name: E:\BACKUPS\QFT8JIU7_1_1
List of Datafiles in backup set 49988
File LV Type Ckp SCN Ckp Time Name
---- -- ---- ---------- ------------------- ----
6 Full 14152463508578 12.07.2018 00:15:03 D:\ORADATA\PASP\TS_PASP_B00_I_01.DBF
7 Full 14152463508616 12.07.2018 00:15:03 D:\ORADATA\PASP\TS_PASP_B01_D_01.DBF
9 Full 14152463508720 12.07.2018 00:15:03 D:\ORADATA\PASP\TS_PASP_B04_D_01.DBF
11 Full 14152463508804 12.07.2018 00:15:04 D:\ORADATA\PASP\TS_PASP_B16_D_01.DBF
13 Full 14152463508987 12.07.2018 00:15:05 D:\ORADATA\PASP\TS_PASP_B28_I_01.DBF
14 Full 14152463509069 12.07.2018 00:15:05 D:\ORADATA\PASP\TS_PASP_CUBE_D_01.DBF
16 Full 14152463509135 12.07.2018 00:15:06 D:\ORADATA\PASP\PROD_BIPLATFORM.DBF
17 Full 14152463509165 12.07.2018 00:15:06 D:\ORADATA\PASP\PROD_MDS.DBF
19 Full 14152463508804 12.07.2018 00:15:04 D:\ORADATA\PASP\TS_PASP_B16_D_02.DBF
21 Full 14152507027738 22.07.2018 00:05:27 D:\ORADATA\PASP\TS_PASP_B04_I_03.DBF
RMAN>
I dug through the alert.log to find a hint on what had frozen our datafiles to that checkpoint time. For the days prior to 12.07. I noticed a bunch of “ALTER TABLESPACE … BEGIN BACKUP” commands shortly followed by the corresponding “ALTER TABLESPACE … END BACKUP” – every day right after midnight. Up to 12.07.2018: Here the “ALTER TABLESPACE … END BACKUP” was missing. Expectedly querying V$BACKUP showed nearly all datafiles still in ACTIVE backup mode:
SQL> select * from v$backup;
FILE# STATUS CHANGE# TIME
---------- ------------------ ---------- -------------------
1 ACTIVE 1.4152E+13 12.07.2018 00:14:59
2 ACTIVE 1.4152E+13 12.07.2018 00:14:59
3 ACTIVE 1.4152E+13 12.07.2018 00:14:59
4 ACTIVE 1.4152E+13 12.07.2018 00:15:00
5 ACTIVE 1.4152E+13 12.07.2018 00:15:00
6 ACTIVE 1.4152E+13 12.07.2018 00:15:03
7 ACTIVE 1.4152E+13 12.07.2018 00:15:03
8 ACTIVE 1.4152E+13 12.07.2018 00:15:03
9 ACTIVE 1.4152E+13 12.07.2018 00:15:03
10 ACTIVE 1.4152E+13 12.07.2018 00:15:04
11 ACTIVE 1.4152E+13 12.07.2018 00:15:04
12 ACTIVE 1.4152E+13 12.07.2018 00:15:04
13 ACTIVE 1.4152E+13 12.07.2018 00:15:05
14 ACTIVE 1.4152E+13 12.07.2018 00:15:05
15 ACTIVE 1.4152E+13 12.07.2018 00:15:05
16 ACTIVE 1.4152E+13 12.07.2018 00:15:06
17 ACTIVE 1.4152E+13 12.07.2018 00:15:06
18 ACTIVE 1.4152E+13 12.07.2018 00:15:04
19 ACTIVE 1.4152E+13 12.07.2018 00:15:04
20 NOT ACTIVE 0
21 NOT ACTIVE 0
21 rows selected.
I didn’t had a clue where this commands came from, as RMAN doesn’t needed them for to do the backup and we use no other custom scripts for backup.
So I ended this backup mode quickly:
SQL> alter database end backup;
alter database end backup
*
ERROR at line 1:
ORA-01260: warning: END BACKUP succeeded but some files found not to be in
backup mode
SQL> select * from v$backup;
FILE# STATUS CHANGE# TIME
---------- ------------------ ---------- -------------------
1 NOT ACTIVE 1.4152E+13 12.07.2018 00:14:59
2 NOT ACTIVE 1.4152E+13 12.07.2018 00:14:59
3 NOT ACTIVE 1.4152E+13 12.07.2018 00:14:59
4 NOT ACTIVE 1.4152E+13 12.07.2018 00:15:00
5 NOT ACTIVE 1.4152E+13 12.07.2018 00:15:00
6 NOT ACTIVE 1.4152E+13 12.07.2018 00:15:03
7 NOT ACTIVE 1.4152E+13 12.07.2018 00:15:03
8 NOT ACTIVE 1.4152E+13 12.07.2018 00:15:03
9 NOT ACTIVE 1.4152E+13 12.07.2018 00:15:03
10 NOT ACTIVE 1.4152E+13 12.07.2018 00:15:04
11 NOT ACTIVE 1.4152E+13 12.07.2018 00:15:04
12 NOT ACTIVE 1.4152E+13 12.07.2018 00:15:04
13 NOT ACTIVE 1.4152E+13 12.07.2018 00:15:05
14 NOT ACTIVE 1.4152E+13 12.07.2018 00:15:05
15 NOT ACTIVE 1.4152E+13 12.07.2018 00:15:05
16 NOT ACTIVE 1.4152E+13 12.07.2018 00:15:06
17 NOT ACTIVE 1.4152E+13 12.07.2018 00:15:06
18 NOT ACTIVE 1.4152E+13 12.07.2018 00:15:04
19 NOT ACTIVE 1.4152E+13 12.07.2018 00:15:04
20 NOT ACTIVE 0
21 NOT ACTIVE 0
21 rows selected.
To finally get rid of the “obsolete” old backups I had to free up some space to do a fresh full backup with current checkpoint time. As all existing full backups virtually hold the same data (as of the checkpoint’s view) I decided to delete the full backups of the last two days and relied on the archivelog backups until all was back to normal.
Now for the culprit of this mess: It turned out, that our cloud service provider uses Veeam to manage and backup our virtual machines. Although they just do a “simple” backup of the VMs, and there is usually no dedicated Oracle backup via Veeam configured, on this very VM there WAS a Veeam-Oracle-Backup somehow enabled. And Veeam does this backup with putting all tablespaces into backup mode for just a short time to get a consistent snapshot. Hence the incriminated BEGIN/END BACKUP commands in our alert.log, which caused the trouble. I’ll have a word on this with ’em to disable this Veeam thing in respect to our databases. But I wonder why Veeam doesn’t use an “ALTER DATABASE [BEGIN|END] BACKUP” instead of an “ALTER TABLESPACE … [BEGIN|END] BACKUP”. I think this way they won’t get “cross tablespace consistency”.