I really don’t want to turn this blog into something that makes our database look bad. But in this case it is really necessary, as it is VERY UNUSUAL for us to recommend patching the database BEFORE upgrade.
Just for clarification:
The following topic will affect all databases between 11.2.0.3.9 and 11.2.0.4.1 – those (and only those) need to be patched BEFORE upgrade. The topic is fixed in 12.1.0.2, but as it gets introduced with the BEFORE upgrade database version you’ll have to apply the fix before upgrade. The inclusion of the fix in 12.1.0.2 means only that the misbehavior won’t happen there again. But as it is a data dictionary corruption you’ll have to apply the fix beforehand, as otherwise it will break during or after the upgrade.
First of all, thanks to Ehtiram Hasanov (cleverbridge AG) and Oliver Pyka (http://www.pyka.de/) for highlighting this to me. And sorry for hitting this issue …
After upgrading to Oracle Database 12.1.0.2 you’ll get one of the errors below when trying to read data:
- ORA-07445: exception encountered: core dump [qcsIsColInFro()+358] [SIGSEGV] [ADDR:0x4] [PC:0xCDB4A26] [Address not mapped to object] 
- ORA-12899 / ORA-607
- ORA-600 [kdmv_check_row_2:IMCU row has wrong contents]
- ORA-600 [kddummy_blkchk]
- ORA-600 [kdBlkCheckError]
- ORA-600 [klaprs_12]
- ORA-600
Basically this happens when you drop a column that has a DEFAULT value and a NOT NULL definition – it ends up with dropped column data being written to disk, leading to block corruptions. This causes problems for generating undo which cannot be applied; a ROLLBACK fails.
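The triggering DDL itself is completely ordinary. A minimal sketch (table and column names here are made up purely for illustration) on an affected, unpatched database would look like this:

```sql
-- Hypothetical demo table on an affected, unpatched 11.2 database
CREATE TABLE t_demo (id NUMBER);

-- Add a column carrying both a DEFAULT value and a NOT NULL definition
ALTER TABLE t_demo ADD (status VARCHAR2(10) DEFAULT 'NEW' NOT NULL);

-- Dropping such a column is what triggers Bug 17325413:
-- dropped-column data can hit the disk and corrupt blocks, which
-- typically only surfaces (ORA-600 / ORA-7445) after the 12c upgrade
ALTER TABLE t_demo DROP COLUMN status;
```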
If you need more information please look up this MOS Note about Bug 17325413 – Drop column with DEFAULT value and NOT NULL definition ends up with dropped column data hitting disk leading to corruption
Versions being affected:
- These versions require to be patched BEFORE upgrade:
- Oracle Database 11.2.0.3.9 and above (may happen with earlier PSUs as well)
Solution: Apply the fix 17325413 on top – see below
- Oracle Database 11.2.0.4.0 and 11.2.0.4.1
Solution: Apply the most recent PSU
- These versions can get you the issue if you haven’t patched BEFORE upgrade:
- Oracle Database 12.1.0.1
- Oracle Database 12.1.0.2
Workaround and/or Fix:
The MOS Note about Bug 17325413 – Drop column with DEFAULT value and NOT NULL definition ends up with dropped column data hitting disk leading to corruption – explains the workaround for WHEN you hit this issue.
As a precaution you will have to make sure that you have applied one of those fixes BEFORE upgrading to Oracle Database 12c, as the fix for Bug 17325413 is included in all of those mentioned below (the list is taken from the above MOS Note as well).
- 12.1.0.2 (Server Patch Set)
- 12.1.0.1.3 Database Patch Set Update
- 11.2.0.4.2 Database Patch Set Update
- 11.2.0.4 Bundle Patch 4 for Exadata Database
- 11.2.0.3 Bundle Patch 23 for Exadata Database
- 12.1.0.1 Patch 7 on Windows Platforms
- 11.2.0.4 Patch 1 on Windows Platforms
- 11.2.0.3 Patch 29 on Windows Platforms
- Patch 17325413 is available only for 11.2.0.3 PSU 9 and PSU 12, but based on Oracle Support research (and a customer’s experience – see below) it should work also on higher PSU levels for Oracle Database 11.2.0.3.
- There’s a support note out there offering a solution once you hit the issue:
MOS Note: 2017572.1 – ORA-7445 [qcsIsColInFro] Querying After Upgrade to 12c
But based on Ehtiram’s experience this is not a complete solution and requires additional work and testing.
- It seems that the issue exists in Oracle 11.2.0.3 (most likely in 11.2.0.2) and all 11.2.0.x databases below 11.2.0.4.2 as well.
Let me quote Ehtiram:
“Yes, the ‘incomplete’ bug fix (patch #17325413) was only available up to PSU 12 for 11.2.0.3, but we already had 11.2.0.3 + PSU 14 and at that time applied PSU 15.
Oracle Support confirmed that we can use the patch version (patch #17325413) for PSU 12 and that it should work also with the higher PSU versions. That did work actually – the patch applied without a problem.
But as I mentioned, the manual workaround scripts from Oracle (note 17325413) are not correct/complete. I just considered all the columns in the database which are nullable, made them ‘not null’ and immediately back to ‘null’. Anyway, the affected columns will be a subset of them. Although there were some kind of ‘false positives’ (because the NULLABLE column of DBA_TAB_COLS does not always represent the truth). We did fix them back to their original status.”
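Ehtiram’s brute-force approach – toggling every nullable column to NOT NULL and immediately back – can be sketched as generated DDL. This is only an illustration of his description, not an Oracle-provided script; as he notes it produces false positives, `MODIFY … NOT NULL` will simply fail on columns that actually contain NULLs, and anything like this needs thorough testing first:

```sql
-- Generate the two ALTER statements per nullable application column:
-- set it NOT NULL, then immediately back to NULL
SELECT 'ALTER TABLE "' || owner || '"."' || table_name ||
       '" MODIFY ("' || column_name || '" NOT NULL);' AS set_not_null,
       'ALTER TABLE "' || owner || '"."' || table_name ||
       '" MODIFY ("' || column_name || '" NULL);'     AS set_back_null
  FROM dba_tab_cols
 WHERE nullable = 'Y'
   AND hidden_column = 'NO'
   AND owner NOT IN ('SYS', 'SYSTEM');
```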
The best way to avoid this is really to apply the patch (or the PSU/BP including the patch) before upgrading.
The issue has been mentioned in “Oracle 12.1.0.2 – Known Issues and Alerts” (MOS Note:1562139.1) under “Issues Introduced”:
But that doesn’t exactly jump out at you as a thing you need to fix before upgrade. We’ll see if we can get the issue added to the 12c MOS Notes as “Upgrade Issues“.
A few initial thoughts –
(1) As of now, the syntax of the "workaround" in MOS Note 2017572.1 contains at least 1 syntax error
(2) What about databases that have already been upgraded to 12c from 11g?
if you send me the syntax error then I’ll forward it to the note owner for correction.
And if you have upgraded and didn’t see this, all should be fine.
Personally I don’t think that this will hit a huge number of people as it applies to 11.2.0.3.9 and above – and 11.2.0.4.0 and 11.2.0.4.1. But it could also be that many of the "I never patch my database" folks out there are on plain 11.2.0.4.0 now – and they’ll hit this once they upgrade to 12c or 12.2.
this problem should be fixed in "12.1.0.2 Patch Set – List of Bug Fixes by Problem Type (Doc ID 1683802.1)", right?
Please look into the list above – it says:
the fix for Bug 17325413 is included in all those mentioned below (list is taken from above MOS Note as well).
12.1.0.2 (Server Patch Set)
Is this only referring to upgrades from 11g to 12c, or is this an issue with 11g upgrades as well, for example from 11.2.0.3 to 11.2.0.4?
I’ve reread the note a couple of times and am still not clear on it.
Hm, I thought my clarification was obvious by saying:
"The following topic will affect databases 11.2.0.3.9+, 11.2.0.4.0 and 11.2.0.4.1 only – those (and only those) need to be patched BEFORE upgrade. The topic is fixed in 12.1.0.2 but as it gets introduced with the BEFORE upgrade database version you’ll have to apply the fix before upgrade."
So it will affect all databases 11.2.0.3.9+, 11.2.0.4.0 and 11.2.0.4.1.
And the topic is fixed in 12.1.0.2 and in the patch sets 11.2.0.4.2+.
thank you for this crucial information. We still have a question about the sequence of events that leads to the corruption – is it:
A) On an unpatched 11.2.0.4 database you create a table with a column with NOT NULL and a DEFAULT value, then while the database is still on 11.2.0.4 you drop the column, and after upgrading said database to 12c the corruption appears?
B) On an unpatched 11.2.0.4 database you create a table with a column with NOT NULL and a DEFAULT value, then after upgrading said database to 12c you drop the column and then the corruption appears?
If the case is A) it is bad, but after you have upgraded to 12c and your users have used the DB for a while you can be pretty sure everything is OK (or do a SELECT * on all data schemas). If the case is B) it is a disaster because at any time you could ruin your data by dropping a column that was created while the DB was on 11.2.0.4, and it would raise the question of what to do with all the databases that were already upgraded to 12c before knowing about this issue.
For our remaining 11.2.0.4.0 databases, we currently plan to apply patch 17325413 directly before upgrading to 12c.
may I recommend that you apply the PSU from July (or, in two weeks, the Oct 2015 one) to your 11.2.0.4 databases instead. This issue is fixed in 11.2.0.4.2 and onwards.
The corruption will appear after the upgrade – that’s why we strongly recommend patching the database before upgrade, which is unusual but true.
updating to the PSU is not really an option given the troubles we had in the past with individual OPatches (new patches reintroducing old bugs etc.). Our current base system of 11.2.0.4 contains 11 individual OPatches, all for individual errors we found while running 11gR2, about half of which are part of 11.2.0.4.7, half are not.
Given the bad experiences with past OPatches, it is very difficult to get the permission to install what is basically a set of 40-60 additional OPatches and hope that none of them introduces/reintroduces new problems – without the extensive testing that we’d rather invest in getting a stable base system for 12c (that one already has 20 individual OPatches).
I am not even talking about getting ports for all of our OPatches that are not part of PSU 11.2.0.4.7/8. So as there is a single OPatch 17325413 for 11.2.0.4, we would definitely rather use that as it reduces the risk.
I understand that the corruption will appear after the upgrade, so I guess the drop of the column has to happen before upgrading to 12c? That would mean that you recommend applying that OPatch/PSU as soon as possible to all 11.2.0.4 databases to protect them from developing that issue?
yes, I do recommend applying the fix to all 11.2.0.4 databases if you know that you won’t be able to apply a newer PSU than 11.2.0.4.1.
And speaking generally, I can see your points only partially. I know that it’s not always easy to make the right decision but please always keep in mind:
There are roughly (just estimating) 100 security fixes you’ll miss. I wouldn’t want to deal with that risk. But that is just my personal habit 😉
Following is an example: I was able to create the situation that produced bad data on our 11.2.0.4.1 database. I applied the patch and created a second table, and was not able to recreate the bad situation. But the patch did nothing to fix the data in the first table, which was already corrupt.
Is there any way to identify whether the situation already exists on the 11.2.0.4.1 database? And then what to do about it?
the check I can think of is to query for columns with a DEFAULT value and a NOT NULL definition.
Otherwise you may please check with Support for further advice.
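A simple candidate check along those lines (just a sketch – it lists columns that carry both a DEFAULT value and a NOT NULL definition, i.e. the ones the bug can bite; it does not prove that corruption has already occurred):

```sql
SELECT owner, table_name, column_name
  FROM dba_tab_columns
 WHERE nullable = 'N'                  -- NOT NULL definition
   AND default_length IS NOT NULL      -- column has a DEFAULT value
   AND owner NOT IN ('SYS', 'SYSTEM')
 ORDER BY owner, table_name, column_name;
```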
Is this only relevant for "in-situ" upgrades? I.e. if you were to upgrade from an affected 11.2.0.4.0 database to a 12.1.0.2.0 database using export/import, would you then avoid the issue?
when you prefer expdp/impdp for the move to 12.1 you won’t be affected by this issue.
But still I’d highly recommend the most recent PSU for Oracle 11.2.0.4 – and since 11.2.0.4.2 it includes the fix as well.