VBox 5.0.10/12 issues with PERL and Seg Faults – UPDATE

A bit more than two months ago I heard from several people who had issues with our Hands-On Lab environment. It became clear that only those using Oracle VirtualBox 5 saw such errors.

VBox 5.0.10 crash issues with our Hands-On-Lab

Oracle VirtualBox 5.0.x – Segmentation Fault in PERL

Then yesterday I read Danny Bryant’s blog post (thanks to Deiby Gomez for pointing me to it) about similar issues and a potential solution.

Interestingly, one of my colleagues, our PL/SQL product manager Bryn Llewellyn, started an internal email thread and a test yesterday as well. The issue seems to occur only on newer versions of Apple’s MacBooks.

Potential Root Cause

The PERL issues seem to happen only on specific new Intel CPUs with a so-called 4th-level cache.

The current assumption is that Intel CPUs with Iris Pro graphics are affected. Iris Pro means eDRAM (embedded DRAM) which is reported as 4th level cache in CPUID. We have confirmed that Crystal Well and Broadwell CPUs with Iris Pro are affected. It is likely that the Xeon E3-1200 v4 family is also affected.

There seems to be a bug in the perl binary. It links against ancient code from the Intel compiler suite that performs optimizations according to the CPU features. Very recent Intel CPUs have 4 cache descriptors.

People who encountered this used VirtualBox 5.0.x, which passes this information to the guest. This leads to a problem within the perl code. You won’t see it on VBox 4.3, as that version does not pass the information to the guest.

But actually it seems that this issue is independent of VirtualBox or any other virtualization software. It simply shows up in this context because many people use VBox on Macs, and some Macs are equipped with this new CPU model. People who run Oracle in VBox environments therefore see the issue as soon as they upgrade to VBox 5.0.x.
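To check what a Linux guest actually sees, one option is to read the cache entries the kernel exposes under sysfs. The following is only a minimal sketch: it assumes a Linux guest with the usual /sys/devices/system/cpu/cpu0/cache directory, the function name max_cache_level is my own, and I have not confirmed that the eDRAM always shows up there as level 4.

```python
import glob
import os

def max_cache_level(cache_dir="/sys/devices/system/cpu/cpu0/cache"):
    """Return the highest cache level listed under cache_dir, or None if none is found."""
    levels = []
    # Each cache is described by a directory index0, index1, ... with a "level" file.
    for path in glob.glob(os.path.join(cache_dir, "index*", "level")):
        try:
            with open(path) as f:
                levels.append(int(f.read().strip()))
        except (OSError, ValueError):
            # Skip entries that cannot be read or parsed.
            continue
    return max(levels) if levels else None

level = max_cache_level()
if level is None:
    print("no cache information found")
else:
    print("highest cache level reported:", level)
```

If this reports 4 inside the guest, the guest is most likely being shown the 4th-level cache discussed above.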

Potential Solutions

If you are using Oracle in VBox there are actually two solutions:

  • Revert to VBox 4.3, as this won’t get you in trouble.
    This problem was not triggered on VBox 4.3.x because this version did not pass the full CPUID cache line information to the guest.
  • Run this sequence of commands in VBox 5.0 to tweak the CPUID bits passed to the guest:
    VBoxManage setextradata VM_NAME "VBoxInternal/CPUM/HostCPUID/Cache/Leaf" "0x4"
    VBoxManage setextradata VM_NAME "VBoxInternal/CPUM/HostCPUID/Cache/SubLeaf" "0x4"
    VBoxManage setextradata VM_NAME "VBoxInternal/CPUM/HostCPUID/Cache/eax" "0"
    VBoxManage setextradata VM_NAME "VBoxInternal/CPUM/HostCPUID/Cache/ebx" "0"
    VBoxManage setextradata VM_NAME "VBoxInternal/CPUM/HostCPUID/Cache/ecx" "0"
    VBoxManage setextradata VM_NAME "VBoxInternal/CPUM/HostCPUID/Cache/edx" "0"
    VBoxManage setextradata VM_NAME "VBoxInternal/CPUM/HostCPUID/Cache/SubLeafMask" "0xffffffff"
    Of course you’ll need to replace VM_NAME with the name of your VM.
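If you have several VMs to fix, typing the seven commands for each one gets tedious. Here is a small sketch that generates the same command sequence for any VM name. The keys and values are copied from the list above; the names CPUID_OVERRIDES and build_commands are just my own helpers. It only prints the commands, so nothing is changed until you run them yourself.

```python
import shlex

# Extradata keys and values exactly as in the command list above.
CPUID_OVERRIDES = [
    ("VBoxInternal/CPUM/HostCPUID/Cache/Leaf", "0x4"),
    ("VBoxInternal/CPUM/HostCPUID/Cache/SubLeaf", "0x4"),
    ("VBoxInternal/CPUM/HostCPUID/Cache/eax", "0"),
    ("VBoxInternal/CPUM/HostCPUID/Cache/ebx", "0"),
    ("VBoxInternal/CPUM/HostCPUID/Cache/ecx", "0"),
    ("VBoxInternal/CPUM/HostCPUID/Cache/edx", "0"),
    ("VBoxInternal/CPUM/HostCPUID/Cache/SubLeafMask", "0xffffffff"),
]

def build_commands(vm_name):
    """Return one argument list per VBoxManage call for the given VM."""
    return [["VBoxManage", "setextradata", vm_name, key, value]
            for key, value in CPUID_OVERRIDES]

# Print the commands, shell-quoted, so VM names with spaces are handled.
for cmd in build_commands("VM_NAME"):
    print(shlex.join(cmd))
```

To apply the commands directly instead of printing them, each argument list could be passed to subprocess.run(cmd, check=True), assuming VBoxManage is on your PATH.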

If the error happens on a bare-metal machine, that is, in a native environment rather than inside a virtual image, then the only option you have (to my knowledge right now) is to exchange the PERL before doing anything substantial, such as running root.sh or rootupgrade.sh in your Grid Infrastructure installation, or before using DBCA or the catctl.pl tool to create or upgrade a database.

In this case please refer to the blog post of Laurent Leturgez:

Issues with Oracle PERL causing segmentation faults:
http://laurent-leturgez.com/2015/05/26/oracle-12c-vmware-fusion-and-the-perl-binarys-segmentation-fault
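Before starting a long-running step, it may be worth checking whether the perl binary in question crashes at all on your machine. A rough sketch, assuming a POSIX system where a process killed by a signal reports a negative return code; the helper name crashed_with_segfault is my own, and whether a trivial perl invocation is enough to trigger the crash is an assumption, so substitute the command that fails for you.

```python
import signal
import subprocess

def crashed_with_segfault(cmd):
    """Run cmd and report whether it was terminated by SIGSEGV (POSIX only)."""
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    # On POSIX, a child killed by signal N has return code -N.
    return result.returncode == -signal.SIGSEGV

# Hypothetical usage; the path assumes the perl shipped under ORACLE_HOME:
# import os
# perl = os.path.join(os.environ["ORACLE_HOME"], "perl", "bin", "perl")
# print(crashed_with_segfault([perl, "-e", "1"]))
```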

Further Information

This issue is currently tracked internally as bug 22539814: ERRORS INSTALLING GRID INFRASTRUCTURE 12.1.0.2 ON INTEL CPUS WITH 4 CACHE LEVEL.

So far we have not seen reports by people encountering this in a native environment but only by people using VBox 5.0.x or Parallels or VMware on a very modern version of Apple hardware.
–Mike

VBox 5.0.10 crash issues with our Hands-On-Lab

Milano - Nov 2015 (c) Mike Dietrich

Last week I ran two Hands-On Workshops with customers and partners in Milano, Italy, where we used our well-known and thousands-of-times-proven Hands-On Lab environment.

But this time some people failed while running the lab, hitting random corruptions: the entire VM shutting down while running, file corruptions in the spfile, or other issues.

The common thing in all cases: People had VBox 5.0.10 downloaded and installed right before the workshop.

Of course they did, as I have been tempted to for weeks as well. Every time I start VBox on my PC, Oracle VirtualBox asks me:

Even though the screenshot is in German, you know what it offers me:
Download and Install Virtual Box 5.0.10.

Actually the current issue reminds me a lot of what I experienced in 2014 at an Upgrade Hands-On Workshop in Vienna, Austria. 20 Oracle partners came together for two days for a Hands-On Upgrade/Migrate/Consolidate training. And 6 or 7 had random issues with their VirtualBox images. Corruptions. Failing upgrades at random phases. No patterns.

That was until somebody figured out via a Google search that other people had started reporting similar behavior at the same time with their own VBox images using the brand-new version of VirtualBox. It turned out that this newest version of Oracle VirtualBox 4.3 (I think it was 26) had exactly such issues. Everybody else in our room, including myself, was running a version a few weeks older and had no issues at all.

When we replaced the affected installations the next morning with an older release (if I remember correctly: 4.3.24), all went fine for the rest of the workshop.

I won’t say that VBox 5.0.10 is bad as I lack evidence, reproducible test cases, bugs. 

But I follow other people’s Twitter and Facebook messages. And it seems that the PERL problem I reported a few days back:

Oracle VirtualBox 5.0.x – Segmentation Fault in PERL

is not the only issue with VBox images built in version 4 and now running (more or less) on VBox 5.0.10.

Please see also:

–Mike

VBox Hands-on-Lab image – build your own :-)

Oh … I know … I promised to post all the details on how I built our pretty straightforward Hands-On Lab that Roy, Carol, Cindy, Joe and I used at OOW and on some other occasions to let you upgrade, migrate and consolidate databases to Oracle Database 12c and into Oracle Multitenant.

And well, some have emailed me already … and I had this feeling that my schedule would be very tight after OOW. Even right now (Sunday evening) I’m already back at my second home, the Lufthansa Senator Lounge at Munich Airport, waiting for my flight to Rome in an hour or so. Honestly speaking, I have had no time in the past weeks to sit down for 2 hours and write down all the steps to guide you through the rebuild. And I didn’t want to throw you just a few nuggets – my intention is always to give you detailed steps which really work and don’t miss anything.

But I have very good news for all who are waiting for the HOL Image 🙂
Roy is working hard (and I’m confident that he’ll succeed) to get the image published on OTN within the next few weeks. So please stay tuned. Even with the Christmas holidays coming up, I’m tied into a schedule to visit Rome, Torino, Milan and Brussels and to assist some customers in their final go-live phase for Oracle Database 12c – and I’m really looking forward to that vacation.

Stay tuned – and thanks again for your patience 🙂

-Mike