ovirt_rpm_troubleshooting

Self-hosted management VM health problem

by arthurfayzullin@gmail.com

Occasionally, internal problems in the self-hosted management VM cause the system that monitors its health to shut the VM down. The hardest part is that the monitor shuts the VM down again immediately after it is started, which makes it impossible to fix the problem from inside the VM. To work around this, put the system into global maintenance mode, which disables health tracking for this VM:

sudo hosted-engine --set-maintenance --mode=global

Then start the VM:

sudo hosted-engine --vm-start

Then connect to the VM to find and fix the problem.

Do not forget to turn maintenance mode off afterwards:

sudo hosted-engine --set-maintenance --mode=none

You can check the system state with the following command (run it after each step to be sure the system is in the expected state):

sudo hosted-engine --vm-status
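
The sequence above can be wrapped in two small helpers. This is only a sketch: the function names and the HE_CMD variable are made up for illustration, and only the hosted-engine options already shown in this section are used.

```shell
#!/bin/sh
# Sketch only: wraps the hosted-engine commands from this section.
# HE_CMD is a hypothetical override so the helpers can be dry-run (HE_CMD=echo).
HE_CMD="${HE_CMD:-sudo hosted-engine}"

# Enter global maintenance so the health monitor stops shutting the VM down,
# then start the VM, printing the status after each step.
he_rescue_begin() {
    $HE_CMD --set-maintenance --mode=global || return 1
    $HE_CMD --vm-status
    $HE_CMD --vm-start || return 1
    $HE_CMD --vm-status
}

# Leave maintenance mode once the VM has been repaired.
he_rescue_end() {
    $HE_CMD --set-maintenance --mode=none || return 1
    $HE_CMD --vm-status
}
```

Run he_rescue_begin, fix the VM, then run he_rescue_end. With HE_CMD=echo the helpers only print the options they would pass.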

ovirt-shell not starting

ovirt-shell fails to start with a “No module named kitchen.text.converters” error:

# ovirt-shell 
Traceback (most recent call last):
  File "/usr/bin/ovirt-shell", line 9, in <module>
    load_entry_point('ovirt-shell==3.1.0.7-SNAPSHOT', 'console_scripts', 'ovirt-shell')()
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 299, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 2229, in load_entry_point
    return ep.load()
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 1948, in load
    entry = __import__(self.module_name, globals(),globals(), ['__name__'])
  File "/usr/lib/python2.6/site-packages/ovirtcli/main.py", line 20, in <module>
    from ovirtcli.context import OvirtCliExecutionContext
  File "/usr/lib/python2.6/site-packages/ovirtcli/context.py", line 18, in <module>
    from cli.command import *
  File "/usr/lib/python2.6/site-packages/cli/__init__.py", line 3, in <module>
    from cli.context import ExecutionContext
  File "/usr/lib/python2.6/site-packages/cli/context.py", line 27, in <module>
    from cli.settings import Settings
  File "/usr/lib/python2.6/site-packages/cli/settings.py", line 23, in <module>
    from cli import platform
  File "/usr/lib/python2.6/site-packages/cli/platform/__init__.py", line 5, in <module>
    from cli.platform.posix.terminal import PosixTerminal as Terminal
  File "/usr/lib/python2.6/site-packages/cli/platform/posix/terminal.py", line 24, in <module>
    from cli.terminal import Terminal
  File "/usr/lib/python2.6/site-packages/cli/terminal.py", line 17, in <module>
    from kitchen.text.converters import getwriter
ImportError: No module named kitchen.text.converters

Reason: python-kitchen is not installed.

Solution:

Install python-kitchen from the EPEL repository:

yum install python-kitchen

VM fails to start with sanlock socket error - permission denied

The VM fails to start, and you see an error like this:

VM testVm is down. Exit message: internal error Failed to open socket to sanlock daemon: permission denied.

Possible reason: SELinux configuration problem.

Check the SELinux boolean values:

getsebool -a | grep virt

virt_use_comm --> off
virt_use_fusefs --> off
virt_use_nfs --> on
virt_use_samba --> off
virt_use_sanlock --> on
virt_use_sysfs --> on
virt_use_usb --> on
virt_use_xserver --> off

virt_use_sanlock and virt_use_nfs must be on; if they are not, set them:

setsebool -P virt_use_sanlock=on
setsebool -P virt_use_nfs=on
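
The check can be scripted. A minimal sketch, assuming getsebool prints lines in the "name --> on|off" form shown above; the function name and the REQUIRED_BOOLS variable are illustrative only.

```shell
#!/bin/sh
# Sketch only: reads `getsebool -a` style lines ("name --> on|off") on stdin
# and prints each required boolean that is not reported as "on".
# REQUIRED_BOOLS and the function name are illustrative, not part of any tool.
REQUIRED_BOOLS="virt_use_sanlock virt_use_nfs"

missing_virt_booleans() {
    input=$(cat)
    for name in $REQUIRED_BOOLS; do
        # Match the exact boolean name followed by "--> on".
        if ! printf '%s\n' "$input" | grep -q "^${name} --> on\$"; then
            printf '%s\n' "$name"
        fi
    done
}
```

Usage: getsebool -a | missing_virt_booleans. Empty output means both booleans are already on.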

VM fails to start with sanlock socket error - No such file or directory

The VM fails to start, and you see an error like this:

VM testVm is down. Exit message: internal error Failed to open socket to sanlock daemon: No such file or directory.

Possible reason: the softdog module is not loaded.

Solution:

modprobe softdog
service wdmd start
service sanlock start

To load the softdog module automatically at boot:

echo modprobe softdog >> /etc/rc.modules
chmod +x /etc/rc.modules

Or:

echo -e '#!/bin/sh\nmodprobe softdog\nexit 0' > /etc/sysconfig/modules/softdog.modules
chmod +x /etc/sysconfig/modules/softdog.modules
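
To confirm that the module is actually loaded (for example after a reboot), something like the following can be used. It is a sketch: the function name and its optional second argument are illustrative.

```shell
#!/bin/sh
# Sketch only: checks whether a kernel module appears in /proc/modules.
# The function name and the optional second argument (an alternate file in
# the same format, handy for testing) are illustrative.
module_loaded() {
    mod="$1"
    modules_file="${2:-/proc/modules}"
    # /proc/modules lists one module per line, with the name in the first field.
    awk -v m="$mod" '$1 == m { found = 1 } END { exit found ? 0 : 1 }' "$modules_file"
}
```

Usage: module_loaded softdog || modprobe softdog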

New engine install on remote DB fails

Ian Levesque reported on the users@ovirt.org mailing list:

New engine install on remote DB fails “uuid-ossp extension is not loaded”

Alex Lourie posted this recommendation/solution:

The solution we've come up with is this:

1. Use (or tell remote DB admin to do so) the psql command to load the extension functions to template1 DB
on remote DB server:
   psql -U postgres -d template1 -f /usr/share/pgsql/contrib/uuid-ossp.sql
2. Now, all newly created databases will include extension functions.

template1 is a special DB in postgres. In fact, when you create a new DB, it is actually copied from template1
with a new name.

Storage domain does not exist after yum update

Ricky Schneberger reported on the users@ovirt.org mailing list:

After an normal “yum update” i am unable to get one of the storage domains “UP”.

Maor Lipchuk posted a solution:

1. Go to the metadata of the data storage domain (on the storage server, go to {storage_domain_name}/######..../dom_md/metadata).

2. Delete the checksum line _SHA_CKSUM=################

3. Try to activate the storage domain in the DC again (it should fail again).

4. vdsm.log should print the computed checksum of the storage domain (there should be an error that says "Meta Data seal is broken (checksum mismatch).... computed_cksum = ").

5. Copy the computed checksum into the metadata (_SHA_CKSUM={new checksum number}).

6. Try to activate it again.
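
The two edits of the metadata file (removing the _SHA_CKSUM line, then writing the checksum reported in vdsm.log) can be sketched as below. The function names are illustrative, and the sketch assumes the position of the _SHA_CKSUM line within the file does not matter. The checksum value itself still has to be read from vdsm.log by hand.

```shell
#!/bin/sh
# Sketch only: edits a storage domain metadata file as described above.
# Function names are illustrative. The file is rewritten in place (not
# replaced), so its owner and permissions are left untouched.

# Remove the existing _SHA_CKSUM line, keeping a copy as <file>.bak.
drop_cksum() {
    cp -p "$1" "$1.bak" || return 1
    sed '/^_SHA_CKSUM=/d' "$1.bak" > "$1"
}

# After the failed activation, write the checksum reported in vdsm.log.
set_cksum() {
    file="$1"
    cksum="$2"
    tmp=$(mktemp) || return 1
    sed '/^_SHA_CKSUM=/d' "$file" > "$tmp"
    printf '_SHA_CKSUM=%s\n' "$cksum" >> "$tmp"
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

Usage: drop_cksum metadata, try the activation, then set_cksum metadata {new checksum number}.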

Problems with NFS Storage/ISO/Export domains

Force NFS version 3 in the file /etc/nfsmount.conf:

[ NFSMount_Global_Options ]
Defaultvers=3
Nfsvers=3
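
Once the domains have been remounted, the NFS version each mount is using can be read from the mount options. A sketch, assuming the options in /proc/mounts carry a vers= (or nfsvers=) entry; the function name is illustrative.

```shell
#!/bin/sh
# Sketch only: lists NFS mounts and the protocol version in their mount
# options, read from /proc/mounts (or another file in the same format,
# for testing). The function name is illustrative.
nfs_mount_versions() {
    mounts_file="${1:-/proc/mounts}"
    # Fields: device mountpoint fstype options dump pass
    awk '$3 == "nfs" || $3 == "nfs4" {
        vers = "unknown"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^(nfs)?vers=/) {
                sub(/^(nfs)?vers=/, "", opts[i])
                vers = opts[i]
            }
        print $2, vers
    }' "$mounts_file"
}
```

Each output line is a mount point followed by its version. Any mount that does not show 3 is not using the forced setting.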

Management bridge (ovirtmgmt) not created

If the management bridge was not created during the host setup procedure, remove the host from the engine management console. Then remove vdsm and libvirt from the host machine:

service vdsmd stop
service libvirtd stop
yum -y remove '*vdsm*' '*libvirt*' '*qemu*' '*sanlock*' 'jpackage*'
rm -rf /etc/libvirt/
rm -rf /var/lib/libvirt/
yum clean all
yum makecache

Then try to reinstall the host. If that does not help, you can try to add the oVirt management bridge manually.

First disable NetworkManager, then correct /etc/resolv.conf:

service NetworkManager stop
chkconfig NetworkManager off

Here are example ifcfg files, which reside in /etc/sysconfig/network-scripts:

vim /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE=eth0
BOOTPROTO=none
NM_CONTROLLED=no
ONBOOT=yes
BRIDGE=ovirtmgmt

vim /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt

DEVICE=ovirtmgmt
BOOTPROTO=static
GATEWAY=xxx.xxx.xxx.xxx
IPADDR=xxx.xxx.xxx.xxx
NETMASK=255.255.255.0
NM_CONTROLLED=no
ONBOOT=yes
TYPE=Bridge
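
After the network has been brought up with these files, the result can be checked through sysfs. A sketch: the function names and the optional sysfs-root argument are illustrative. It relies on Linux exposing a bridge directory for bridge devices and a brif entry for each attached port.

```shell
#!/bin/sh
# Sketch only: checks the bridge through sysfs. Function names and the
# optional last argument (alternate sysfs root, for testing) are illustrative.

# True if the interface exists and is a bridge device.
is_bridge() {
    sys_net="${2:-/sys/class/net}"
    [ -d "$sys_net/$1/bridge" ]
}

# True if the given port (e.g. eth0) is attached to the bridge.
bridge_has_port() {
    sys_net="${3:-/sys/class/net}"
    [ -e "$sys_net/$1/brif/$2" ]
}
```

Usage: is_bridge ovirtmgmt && bridge_has_port ovirtmgmt eth0 && echo "bridge OK"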

Possible VM startup failure

How to prevent a possible VM startup failure, i.e. when you see a message like this in the vdsm log:

qemuProcessReadLogOutput:1005 : internal error Process exited while reading console log output: Supported machines are:
pc         RHEL 6.2.0 PC (alias of rhel6.2.0)
rhel6.2.0  RHEL 6.2.0 PC (default)
rhel6.1.0  RHEL 6.1.0 PC
rhel6.0.0  RHEL 6.0.0 PC
rhel5.5.0  RHEL 5.5.0 PC
rhel5.4.4  RHEL 5.4.4 PC
rhel5.4.0  RHEL 5.4.0 PC

Try running this command on the oVirt management node (hack from Jerome Deliege):

psql -U postgres engine -c "update vdc_options set option_value='rhel6.3.0' where option_name LIKE 'EmulatedMachine';"

or this:

psql -U postgres engine -c "update vdc_options set option_value='pc' where option_name LIKE 'EmulatedMachine';"
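
Before changing the value it may be worth recording the current one, so that it can be restored. A sketch: PSQL_CMD and the function name are made up, and the table and column names are the ones already used by the commands in this section.

```shell
#!/bin/sh
# Sketch only: prints the current EmulatedMachine rows before they are changed.
# PSQL_CMD is a hypothetical override (PSQL_CMD=echo gives a dry run); the
# table and column names are taken from the update commands in this section.
PSQL_CMD="${PSQL_CMD:-psql -U postgres engine}"

show_emulated_machine() {
    $PSQL_CMD -c "select option_name, option_value, version from vdc_options where option_name LIKE 'EmulatedMachine';"
}
```

Usage: show_emulated_machine > emulated_machine.before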

How to disable SSL support:

psql -U postgres engine -c "update vdc_options set option_value='false' where option_name='UseSecureConnectionWithServers' and version='general';"
psql -U postgres engine -c "update vdc_options set option_value='' where option_name = 'SpiceSecureChannels';"

Then restart oVirt.

For version 3.0:

service jboss-as stop
service jboss-as start

For version 3.1 and later:

service ovirt-engine stop
service ovirt-engine start

If you disable SSL, you must stop the firewalls on the engine and the hosts:

service iptables stop

Failed to import Vm

After virt-v2v you get the error: Failed to import Vm <vmName> to <storageName>

You can also see the error in /var/log/ovirt-engine/engine.log:

2012-08-16 16:39:30,090 ERROR [org.ovirt.engine.core.bll.ImportVmCommand] (pool-3-thread-50) [2781049c] Command
org.ovirt.engine.core.bll.ImportVmCommand throw exception: org.springframework.dao.DataIntegrityViolationException:
CallableStatementCallback; SQL [{call insertsnapshot(?, ?, ?, ?, ?, ?, ?, ?)}]; ERROR: duplicate key value violates
unique constraint "pk_snapshots"

Where: SQL statement "INSERT INTO snapshots( snapshot_id, status, vm_id, snapshot_type, description, creation_date,
app_list, vm_configuration) VALUES(  $1 ,  $2 ,  $3 ,  $4 ,  $5 ,  $6 ,  $7 ,  $8 )"

Solution:

1. Go to the export domain folder on your NFS mount point.

cd export

2. Find your domain OVF file.

vim $(grep -Rli "<vm name>" .)

3. Find all occurrences of ovf:vm_snapshot_id="00000000-0000-0000-0000-000000000000" and replace each one with a unique id generated by the uuid command.
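
The replacement can also be scripted so that every occurrence gets its own id. A sketch under assumptions: GNU sed is available, and uuidgen is used in place of the uuid command (with the kernel's random UUID file as a fallback). The function names are illustrative.

```shell
#!/bin/sh
# Sketch only: gives every all-zero ovf:vm_snapshot_id in an OVF file its own
# new id. Assumptions: GNU sed (for -i and the 0,/re/ address); uuidgen stands
# in for the `uuid` command, with the kernel's random UUID file as a fallback.
ZERO_ID='ovf:vm_snapshot_id="00000000-0000-0000-0000-000000000000"'

new_uuid() {
    if command -v uuidgen >/dev/null 2>&1; then
        uuidgen
    else
        cat /proc/sys/kernel/random/uuid
    fi
}

fix_snapshot_ids() {
    ovf="$1"
    # Keep a copy of the original file as <file>.bak.
    cp -p "$ovf" "$ovf.bak" || return 1
    # Replace one occurrence per pass so each one receives a different id.
    while grep -qF "$ZERO_ID" "$ovf"; do
        id=$(new_uuid) || return 1
        sed -i "0,/$ZERO_ID/s//ovf:vm_snapshot_id=\"$id\"/" "$ovf" || return 1
    done
}
```

Usage: fix_snapshot_ids path/to/file.ovf, where the path is the file found in step 2.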

Image Locked problem

If a VM or template remains in the Image Locked state longer than is reasonable, check whether the operation (template creation, in my case) is really still running. If it is not, you can reset this state:

1. Get the vm_guid:

psql -U engine -d engine -c "SELECT vm_guid,template_status,vm_name from vm_static where vm_name like '%<vm or template name>%';"

The output looks like:

               vm_guid                | template_status |       vm_name        
--------------------------------------+-----------------+----------------------
 61eedf77-de4e-42c2-8870-420372b44501 |                 | <VmName>
 6b807ca8-3bbb-4339-bafa-f6a67893b3bb |               0 | <TemplateName>

If template_status is not NULL, the row contains the vm_guid of the locked template; otherwise it contains the vm_guid of your locked VM.

2. To “unlock” the VM, use this command (with your real vm_guid):

psql -U engine -d engine -c "update vm_dynamic set status = 0 where vm_guid='61eedf77-de4e-42c2-8870-420372b44501';"

3. To “unlock” the template, use this command (with your real vm_guid):

psql -U engine -d engine -c "update vm_static set template_status=0 where vm_guid='6b807ca8-3bbb-4339-bafa-f6a67893b3bb';"
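
Because these commands write directly to the engine database, it can help to refuse anything that is not a well-formed GUID before running them. A sketch: PSQL_CMD and the function names are made up, and the SQL statements are the ones shown above.

```shell
#!/bin/sh
# Sketch only: refuses anything that is not a well-formed GUID before running
# the update statements from this section. PSQL_CMD is a hypothetical override
# (PSQL_CMD=echo gives a dry run); the function names are illustrative.
PSQL_CMD="${PSQL_CMD:-psql -U engine -d engine}"

# True for 8-4-4-4-12 hexadecimal GUIDs only.
is_guid() {
    [ "${#1}" -eq 36 ] || return 1
    printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

unlock_vm() {
    is_guid "$1" || { echo "not a GUID: $1" >&2; return 1; }
    $PSQL_CMD -c "update vm_dynamic set status = 0 where vm_guid='$1';"
}

unlock_template() {
    is_guid "$1" || { echo "not a GUID: $1" >&2; return 1; }
    $PSQL_CMD -c "update vm_static set template_status=0 where vm_guid='$1';"
}
```

Usage: unlock_vm followed by the VM's vm_guid, or unlock_template followed by the template's vm_guid.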
ovirt_rpm_troubleshooting.txt · Last modified: 2014/10/23 11:04 by dreyou