VBox 5.0.10/12 issues with PERL and Seg Faults – UPDATE

A bit more than two months ago I heard from several people having issues with our Hands-On Lab environment. It became clear that only those using Oracle VirtualBox 5 saw such errors.

VBox 5.0.10 crash issues with our Hands-On-Lab

Oracle VirtualBox 5.0.x – Segmentation Fault in PERL

Then yesterday I read Danny Bryant's blog post (thanks to Deiby Gomez for pointing me to it) about similar issues and a potential solution:

Interestingly, one of my colleagues, our PL/SQL product manager Bryn Llewellyn, started an internal email thread and a test yesterday as well. The issue seems to occur only on newer versions of Apple’s MacBooks.

Potential Root Cause

The PERL issues seem to happen only on specific new Intel CPUs with a so-called 4th level cache.

The current assumption is that Intel CPUs with Iris Pro graphics are affected. Iris Pro means eDRAM (embedded DRAM) which is reported as 4th level cache in CPUID. We have confirmed that Crystal Well and Broadwell CPUs with Iris Pro are affected. It is likely that the Xeon E3-1200 v4 family is also affected.

It seems that there is a bug in the perl binary. It links against ancient code from the Intel compiler suite that performs optimizations according to the CPU features. Very recent Intel CPUs have 4 cache descriptors.

People who encountered this used VirtualBox 5.0.x, which passes this information to the guest. This leads to a problem within the perl code. You won’t see it on VBox 4.3 as this version does not pass the information to the guest.
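To see whether a Linux guest is actually being told about a 4th cache level, one can look at the cache topology the kernel exposes in sysfs. The following is only a minimal sketch, assuming a Linux guest whose kernel fills /sys/devices/system/cpu/cpu0/cache from the CPUID cache information; the function name max_cache_level is made up for this example.

```shell
#!/bin/sh
# Print the highest cache level the Linux kernel reports for CPU 0.
# The optional argument (an alternative directory) exists only so the
# sketch can be tried against a fake directory tree.
max_cache_level() {
    dir=${1:-/sys/devices/system/cpu/cpu0/cache}
    max=0
    for f in "$dir"/index*/level; do
        # Skip the unexpanded glob if no cache entries exist
        [ -r "$f" ] || continue
        level=$(cat "$f")
        if [ "$level" -gt "$max" ]; then
            max=$level
        fi
    done
    echo "$max"
}

max_cache_level
```

A result of 4 would suggest that the guest sees the eDRAM as a 4th level cache; 3 or lower suggests it does not.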

But actually it seems that this issue is independent of Virtual Box or any other virtualization software. It simply shows up in this case because many people use VBox on Macs, and some Macs are equipped with this new CPU model. People who run Oracle in VBox environments therefore see the issue as soon as they upgrade to VBox 5.0.x.

Potential Solutions

If you are using Oracle in VBox there are actually two solutions:

  • Revert to VBox 4.3 as this won’t get you in trouble
    This problem was not triggered on VBox 4.3.x because this version did not pass the full CPUID cache line information to the guest.
  • Run this sequence of commands in VBox 5.0 to tweak the CPUID bits passed to the guest:
    VBoxManage setextradata VM_NAME "VBoxInternal/CPUM/HostCPUID/Cache/Leaf" "0x4"
    VBoxManage setextradata VM_NAME "VBoxInternal/CPUM/HostCPUID/Cache/SubLeaf" "0x4"
    VBoxManage setextradata VM_NAME "VBoxInternal/CPUM/HostCPUID/Cache/eax"  "0"
    VBoxManage setextradata VM_NAME "VBoxInternal/CPUM/HostCPUID/Cache/ebx" "0" 
    VBoxManage setextradata VM_NAME "VBoxInternal/CPUM/HostCPUID/Cache/ecx" "0" 
    VBoxManage setextradata VM_NAME "VBoxInternal/CPUM/HostCPUID/Cache/edx"  "0"
    VBoxManage setextradata VM_NAME "VBoxInternal/CPUM/HostCPUID/Cache/SubLeafMask" "0xffffffff"
    • Of course you’ll need to replace VM_NAME with the name of your VM.
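The seven setextradata calls above can be wrapped in a small helper so that the VM name is typed only once. This is just a sketch: the function name apply_cpuid_workaround and the VBOXMANAGE override variable are conveniences invented here, while the keys and values are the ones listed above.

```shell
#!/bin/sh
# Apply the CPUID cache-leaf workaround listed above to a single VM.
# VBOXMANAGE may be overridden (for example VBOXMANAGE=echo) for a dry run;
# that variable and the function name are assumptions of this sketch.
apply_cpuid_workaround() {
    vm_name=$1
    prefix="VBoxInternal/CPUM/HostCPUID/Cache"
    # key=value pairs exactly as given in the list above
    for pair in Leaf=0x4 SubLeaf=0x4 eax=0 ebx=0 ecx=0 edx=0 SubLeafMask=0xffffffff; do
        key=${pair%%=*}
        value=${pair#*=}
        ${VBOXMANAGE:-VBoxManage} setextradata "$vm_name" "$prefix/$key" "$value" || return 1
    done
}

# Dry run (prints the commands instead of executing them):
#   VBOXMANAGE=echo apply_cpuid_workaround "VM_NAME"
# Real run on the host:
#   apply_cpuid_workaround "VM_NAME"
```

Afterwards, VBoxManage getextradata "VM_NAME" enumerate should list the keys that were set. The settings most likely only take effect the next time the VM is started.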

If the error happens on a bare-metal machine, meaning not inside a virtual image but in a native environment, then the only option you have (to my knowledge right now) is to exchange the PERL before doing any real work, such as running root.sh or rootupgrade.sh in your Grid Infrastructure installation, or using DBCA or the catctl.pl tool to create or upgrade a database.

In this case please refer to the blog post of Laurent Leturgez:

Issues with Oracle PERL causing segmentation faults:
http://laurent-leturgez.com/2015/05/26/oracle-12c-vmware-fusion-and-the-perl-binarys-segmentation-fault
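Before starting root.sh, rootupgrade.sh, DBCA or catctl.pl, it may be worth checking whether the Perl in the Oracle home starts at all. The sketch below assumes that the crash already shows up when the interpreter runs a trivial one-liner, which this post does not confirm; the function name check_perl is hypothetical. It relies on the shell convention that a process killed by SIGSEGV (signal 11) is reported with exit status 139.

```shell
#!/bin/sh
# Run a trivial Perl one-liner with the given binary and classify the result.
# An exit status of 139 (128 + signal 11) means the process died with a
# segmentation fault.
check_perl() {
    perl_bin=$1
    status=0
    "$perl_bin" -e 'exit 0' >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 0 ]; then
        echo "ok"
    elif [ "$status" -eq 139 ]; then
        echo "segfault"
    else
        echo "failed (exit status $status)"
    fi
}

# Example call:
#   check_perl "$ORACLE_HOME/perl/bin/perl"
```

If this prints "ok", the check says nothing about later failures; it only rules out the case where the interpreter cannot start at all.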

Further Information

This issue is currently tracked internally as bug 22539814: ERRORS INSTALLING GRID INFRASTRUCTURE 12.1.0.2 ON INTEL CPUS WITH 4 CACHE LEVEL.

So far we have not seen reports by people encountering this in a native environment but only by people using VBox 5.0.x or Parallels or VMware on a very modern version of Apple hardware.
–Mike

VBox 5.0.10 crash issues with our Hands-On-Lab

Milano - Nov 2015 (c) Mike Dietrich

I ran two Hands-On Workshops with customers and partners in Milano, Italy, last week, where we used our well-known and thousands-of-times proven Hands-On-Lab environment:

But this time some people failed while running the lab with random corruptions: either the entire VM shut down while running, or file corruptions showed up in the spfile, or other issues occurred.

The common thing in all cases: People had VBox 5.0.10 downloaded and installed right before the workshop.

Of course they did – as I have been tempted to for weeks. Every time I start VBox on my PC, Oracle Virtual Box asks me:

Even though the screenshot is German you know what it offers me:
Download and Install Virtual Box 5.0.10.

Actually the current issue reminds me a lot of what I experienced in 2014 in an Upgrade Hands-On Workshop in Vienna, Austria. 20 Oracle partners came together for two days for a Hands-On Upgrade/Migrate/Consolidate training. And 6 or 7 had random issues with their Virtual Box images. Corruptions. Failing upgrades at random phases. No patterns.

Until somebody figured out via a Google search that other people had started reporting similar behavior at the same time with their own VBox images using the brand-new version of Virtual Box. It turned out that this newest version of Oracle Virtual Box 4.3 (I think it was 26) had exactly such issues. Everybody else in our room – including myself – running a version a few weeks older had no issues at all.

When we exchanged the affected installations the next morning, replacing them with an older version (if I remember correctly: 4.3.24), all went fine for the rest of the workshop.

I won’t say that VBox 5.0.10 is bad as I lack evidence, reproducible test cases, and bugs.

But I follow other people’s Twitter and Facebook messages. And it seems that the PERL problem I reported a few days back:

Oracle VirtualBox 5.0.x – Segmentation Fault in PERL

is not the only issue with VBox images built in version 4 – and now running (more or less) on VBox 5.0.10.

Please see also:

–Mike

Oracle VirtualBox 5.0.x – Segmentation Fault in PERL

Please see as well:

VBox 5.0.10 crash issues with our Hands-On-Lab


 

Yesterday and the day before I exchanged several emails with Ana, who downloaded our Hands-On-Lab from here:

after OOW15, encountering a SEGMENTATION FAULT when trying to start the database upgrade with catctl.pl:

$ $ORACLE_HOME/perl/bin/perl catctl.pl catupgrd.sql

Segmentation fault 

Very strange thing … 

The database is in upgrade mode (checked this in the alert.log) and there are no strange things mentioned anywhere. Plus hundreds of people have run and completed our lab so far.

Tue Nov 10 20:39:47 2015
MMON started with pid=21, OS id=9828
Starting background process MMNL
Tue Nov 10 20:39:47 2015
MMNL started with pid=22, OS id=9832
Stopping Emon pool
Tue Nov 10 20:39:47 2015
ALTER SYSTEM enable restricted session;
Tue Nov 10 20:39:47 2015
ALTER SYSTEM SET _system_trig_enabled=FALSE
SCOPE=MEMORY;
Autotune of undo retention is turned off.
Tue Nov 10 20:39:47 2015
ALTER SYSTEM SET _undo_autotune=FALSE
SCOPE=MEMORY;
Tue Nov 10 20:39:47 2015
ALTER SYSTEM SET undo_retention=900 SCOPE=MEMORY;
Tue Nov 10 20:39:47 2015
ALTER SYSTEM SET aq_tm_processes=0 SCOPE=MEMORY;
Tue Nov 10 20:39:47 2015
ALTER SYSTEM SET enable_ddl_logging=FALSE
SCOPE=MEMORY;
Resource Manager disabled during database
migration: plan '' not set
Tue Nov 10 20:39:47 2015
ALTER SYSTEM SET resource_manager_plan=
SCOPE=MEMORY;
Tue Nov 10 20:39:47 2015
ALTER SYSTEM SET recyclebin='OFF' DEFERRED
SCOPE=MEMORY;
Resource Manager disabled during database
migration
replication_dependency_tracking turned off (no
async multimaster replication found)
AQ Processes can not start in restrict mode
Starting background process CJQ0
Tue Nov 10 20:39:47 2015
CJQ0 started with pid=27, OS id=9836
Completed: ALTER DATABASE OPEN MIGRATE 

We checked several other things – and then I came across this tweet by Martin Klier yesterday:

and started to search a bit.

I have no 100% proof of the actual reason, but several people seem to have issues with SEGMENTATION FAULTs in Oracle’s PERL ($ORACLE_HOME/perl/bin/perl) when using Oracle VirtualBox 5.0.x – and according to the VirtualBox Forum that seems to happen with the most recent VBox 5.0.10 as well.

The “funny” thing is that all works perfectly well in VBox 4.3.x

It reminds me a lot of the recurring VBox bug with my German keyboard not allowing me to type the | (pipe) character, which requires pressing the “ALT GR” + “<” keys together.

–Mike

VBox Hands-on-Lab image – build your own :-)

Oh … I know … I promised to post all the details of how I built our pretty straightforward Hands-On-Lab that Roy, Carol, Cindy, Joe and I used at OOW and on some other occasions to let you upgrade, migrate and consolidate databases to Oracle Database 12c and into Oracle Multitenant.

And well, some have emailed me already … and I had this feeling that my schedule will be very tight after OOW. Even right now (Sunday evening) I’m already back at my second home, Lufthansa Senator Lounge at Munich Airport. Waiting for my flight to Rome in an hour or so. Honestly speaking I had really no time in the past weeks to sit down for 2 hours to write down all the steps to guide you through the rebuild. And I didn’t want to throw just a few nuggets – my intention is always to get you detailed steps which really work and don’t miss anything.

But I have very good news for all who are waiting for the HOL Image 🙂
Roy is working hard (and I’m confident that he’ll succeed) to get the image published on OTN within the next weeks. So please stay tuned. Even with the Christmas holidays coming up I’m tied into a schedule to visit Rome, Torino, Milan, Brussels, assist some customers in their final go-live-phase for Oracle Database 12c – and I’m really looking forward to that vacation.

Stay tuned – and thanks again for your patience 🙂

-Mike

Is Oracle certified to run on VMWare?

This question, in various forms, gets asked at least once during every Upgrade Workshop. People would like to know if they can run an Oracle Database or Oracle Real Application Clusters or Oracle Grid Control or Oracle Fusion Middleware or … in a VM environment with VMWare’s virtualisation products.


And the answer is: Yes, you can!!
But … there is some fine print you should be aware of before setting up virtual environments with a solution other than XEN-based Oracle VM.

Please read Note:942852.1 – VMWare Certification for Oracle Products and Note:249212.1 – Support Position for Oracle Products Running on VMWare Virtualized Environments for further details:

Support Status for VMware Virtualized Environments
Oracle has not certified any of its products on VMware virtualized environments. Oracle Support will assist customers running Oracle products on VMware in the following manner: Oracle will only provide support for issues that either are known to occur on the native OS, or can be demonstrated not to be as a result of running on VMware.

If a problem is a known Oracle issue, Oracle support will recommend the appropriate solution on the native OS. If that solution does not work in the VMware virtualized environment, the customer will be referred to VMware for support. When the customer can demonstrate that the Oracle solution does not work when running on the native OS, Oracle will resume support, including logging a bug with Oracle Development for investigation if required.

If the problem is determined not to be a known Oracle issue, we will refer the customer to VMware for support. When the customer can demonstrate that the issue occurs when running on the native OS, Oracle will resume support, including logging a bug with Oracle Development for investigation if required.

NOTE: Oracle has not certified any of its products on VMware. For Oracle RAC, Oracle will only accept Service Requests as described in this note on Oracle RAC 11.2.0.2 and later releases.