Gotcha 3 when deploying a vIDM cluster with vRLCM 8.1

Recently, I was trying to deploy vRA 8.1 with vRLCM 8.1 Patch 1. I had already configured the vIDM 3-node cluster, so I was ready to go.

First I deployed a vRA 8.1 standard environment and that went fine without any issues.

So, after that I was confident enough to deploy a vRA 8.1 cluster. Unfortunately my deployment failed. The corresponding error I found in /var/log/deploy.log was the following:

Identity Service health check failed. If load-balancer is deployed, make sure it is properly configured.

Before vRA 8.1, I always used ‘Persistence’ set to Source IP and ‘Type’ set to SSL Passthrough for the Application Profile of the vRA Load Balancer. Also, there was no proper information available on how to configure the LB for vIDM.

Last week I found an updated document on how to configure your Load Balancer for vRA 8.1. Surprisingly, the recommended Load Balancer configuration had changed slightly, and a Load Balancer configuration for vIDM had been added.

With vRA 8.1, ‘Persistence’ has been changed to None and ‘Expires in’ has been changed to None, while the ‘Type’ remains SSL Passthrough for the Application Profile of the vRA Load Balancer.

For the vIDM Load Balancer, ‘Persistence’ should now be set to Source IP, the ‘Type’ should be SSL Passthrough, and the value for ‘Expires in’ should be set to 36000.

https://docs.vmware.com/en/vRealize-Automation/8.1/vrealize-automation-load-balancing-guide.pdf

After I changed the Load Balancer configuration for vIDM and vRA my deployment succeeded. 🥳🤩😎

Finally I could enjoy my new vRA 8.1 cluster running with a vIDM 3.3.2 cluster.

 

Gotcha with the vRA Cloud Infoblox Plugin 1.1 and vRA 8.1

This week I discovered an issue when configuring the vRA 8.1 IPAM integration with the vRA Cloud Infoblox Plugin version 1.1.

https://marketplace.vmware.com/vsx/solutions/cas-infoblox-plugin-for-abx-0-0-1

When I clicked the Validate button, the validation failed with an error.

Unable to validate the provided access credentials: Failed to validate credentials. AdapterReference: http://provisioning-service.prelude.svc.cluster.local:8282/provisioning/adapter/ipam/endpoint-config. Error: Execution of action Infoblox_ValidateEndpoint failed on provider side: Infoblox HTTP request failed with: HTTPSConnectionPool(host='pb0infblx01.flexlab.local', port=443): Max retries exceeded with url: /wapi/v2.7/networkview?_return_fields=name (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)",),)) Cloud account: null Task: /provisioning/endpoint-tasks/820902de-bf34-4c91-8217-e3eedd8ea609

After doing some troubleshooting with my colleagues we discovered the root cause of this error.

This blog reveals how to fix this specific error.

The core of the problem is the way Python itself handles SSL handshakes. Most programming languages, such as Java and C++, allow users to unconditionally trust a particular SSL certificate. Python does not: even if you accept a particular cert as ‘trusted’, Python still attempts to verify that the whole certificate chain is trusted (including the signer, the CA, and so on).
This is why Infoblox (and other third-party providers) using certs that are not self-signed must be configured to return the whole certificate chain, not just the end server cert.
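Because of this, it pays to sanity-check the PEM bundle before uploading it to the Infoblox appliance. The helper below is a minimal sketch: it simply counts the certificates in a bundle, which should be at least two (leaf plus CA) for a full chain. The file name chain.pem is a placeholder.

```shell
#!/bin/sh
# Count the certificates in a PEM bundle.  A full chain uploaded to the
# Infoblox appliance should contain at least two certificates: the leaf
# plus the issuing/root CA certificate(s).
count_pem_certs() {
    grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}

# Example (chain.pem is a placeholder file name):
#   count_pem_certs chain.pem    # expect 2 or more for a full chain
```

To see what the appliance actually serves during the handshake, something like `openssl s_client -connect pb0infblx01.flexlab.local:443 -showcerts` (hostname taken from the error above) prints every certificate returned.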

When I uploaded the newly signed certificate chain to my Infoblox appliance, everything looked fine.

However, the validation of the integration failed.

When we checked the failed Action Run on the extensibility tab, we discovered that only the leaf certificate had been pulled from the Infoblox appliance instead of the full certificate chain, which also includes the root CA.

Apparently, I also needed to upload the root CA to my Infoblox appliance separately. I had not done this, because I used the complete certificate chain when uploading the newly signed certificate to the Infoblox appliance.

So I added the root CA to my Infoblox appliance too.

This time when I pressed the validate button, it succeeded.

Note: You need to re-create the integration, otherwise it does not work.

When looking at the succeeded Action Runs, you now see that the entire certificate chain has been pulled.

Enjoy using the vRA Cloud Infoblox Plugin 😁🧐

 

 

Gotcha 2 when deploying a vIDM cluster with vRLCM 8.1

Last week I released a blog article regarding Gotchas when deploying a vIDM cluster with vRLCM 8.1. This week it’s time to reveal Gotcha 2 and, if time allows, also Gotcha 3.

Gotcha 2 is all about powering off and powering on the vIDM cluster. The preferred way to power a vIDM cluster on or off is by using the Day 2 Operations of the globalenvironment in the vRLCM GUI.

Go to Lifecycle Operations and navigate to Environments.

Next, go to “VIEW DETAILS” of your globalenvironment and click on the 3 dots. This is where the Day 2 Operations for your environment are located. In the list of Day 2 Operations you will find Power On and Power Off.

When the Power On or Power Off Day 2 Operations are not used, there is a risk that the vIDM cluster will not start anymore. This can happen, for example, when a vSphere HA event occurs or when the vIDM virtual machines are powered on or off directly with the vSphere Client.

If this happens, it is good to know about some troubleshooting steps. VMware released the following KB article specifically on this topic: https://kb.vmware.com/s/article/75080

In my situation, most of the time when a vIDM cluster was not powered off via the vRLCM GUI, the DelegateIP was gone from the vIDM virtual appliance running as the primary postgres instance. What also happened was that one or both of the secondary postgres instances ended up in a ‘down’ state.

To find out which vIDM node is configured as the primary postgres instance, run the following command on one of the vIDM nodes in the cluster. (When asked for a password, just press Enter.)

su postgres -c "echo -e 'password'|/opt/vmware/vpostgres/current/bin/psql -h localhost -p 9999 -U pgpool postgres -c \"show pool_nodes\""

In the above screenshot you can see that the vIDM node with IP address 10.1.0.31 is the primary postgres instance. You can also see that the vIDM node with IP address 10.1.0.40 ended up in a ‘down’ state.

To validate whether we are hitting the issue regarding “No DelegateIP assigned to the primary postgres instance”, we can run the following command on the vIDM node running as the primary postgres instance.

ifconfig eth0:0 | grep 'inet addr:' | cut -d: -f2

If the command returns the DelegateIP like the screenshot below, you are not hitting this specific issue. However, if the command returns nothing, you are hitting this specific issue.
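The grep/cut one-liner above can be wrapped in a small helper, which makes it easy to test against sample output: it prints the address when eth0:0 holds one and prints nothing when the DelegateIP is missing.

```shell
#!/bin/sh
# extract_inet_addr mirrors the one-liner above: it pulls the IPv4 address
# out of `ifconfig eth0:0` output and prints nothing when the alias
# interface has no address (i.e. the DelegateIP is gone).
extract_inet_addr() {
    grep 'inet addr:' | cut -d: -f2 | awk '{print $1}'
}

# usage on a vIDM node:
#   ifconfig eth0:0 | extract_inet_addr
```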

Make sure the DelegateIP is not held by any other non-primary instance by running the above ifconfig command on the other instances. If any of the non-primary instances still holds the DelegateIP, run the following command on it first to detach the address.

ifconfig eth0:0 down

Run the below command on the primary instance to re-assign the DelegateIP.

ifconfig eth0:0 inet <DelegateIP> netmask <Netmask>

After you re-assign the DelegateIP, you need to restart the horizon service on all the vIDM nodes by running the command "service horizon-workspace restart".

If you also hit the second issue, where one or more secondary vIDM postgres instances ended up in a ‘down’ state, you can use the following procedure to fix it.

First, stop the postgres service on the impacted vIDM postgres instance(s) by running the command "service vpostgres stop".

Second, run the following command to recover the impacted vIDM postgres instance. (The default password for the pgpool user is "password".)

/usr/local/bin/pcp_recovery_node -h <DelegateIP> -p 9898 -U pgpool -n <node_id>

Finally, validate whether all of the vIDM postgres instances are up again.

su postgres -c "echo -e 'password'|/opt/vmware/vpostgres/current/bin/psql -h localhost -p 9999 -U pgpool postgres -c \"show pool_nodes\""

That’s it for now. Hopefully this info was useful for you.

In my next blog I will continue to reveal even more Gotchas.

Gotcha 1 when deploying a vIDM cluster with vRLCM 8.1

Last week I released a new blog about How to setup a NSX-V LB for vIDM. http://2vsteaks.com/how-to-setup-a-nsx-v-lb-for-vidm/

This week I wanted to deploy a vIDM 3-node cluster with vRLCM 8.1. I used my latest blog as a reference for configuring the NSX-V 6.4.6 LB. During the deployment of my new vIDM cluster I discovered a couple of Gotchas, which I want to share with you in a few separate blogs.

I discovered the first Gotcha during the deployment process of the new vIDM environment. Despite all the pre-requisite checks turning green, my deployment failed. It failed in step 5 of the deployment, at the task “VidmTrustLBCertificate”.

Here is the detailed error message:

java.security.cert.CertificateException: Failed to find valid root certificate
    at com.vmware.vrealize.lcm.util.CertificateUtil.getRootCertificateFromCertificates(CertificateUtil.java:436)
    at com.vmware.vrealize.lcm.vidm.driver.helpers.VidmInstallHelper.trustCertificate(VidmInstallHelper.java:719)
    at com.vmware.vrealize.lcm.vidm.core.task.VidmTrustLBCertificateTask.execute(VidmTrustLBCertificateTask.java:93)
    at com.vmware.vrealize.lcm.automata.core.TaskThread.run(TaskThread.java:45)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

To work around this issue, I created a new NSX-V Application Profile named vIDM-Deploy.

Application Profile Type: SSL Passthrough

Persistence: Source IP

Expires in: 3600

I replaced the existing Application Profile of the type “HTTPS End-To-End” that was assigned to my vIDM virtual server with this new Application Profile of the type “SSL Passthrough”.

 

When I tried my deployment again it succeeded completely without errors.

I validated my deployment by checking the vIDM System Diagnostic page. https://vidm.flexlab.local/SAAS/admin/app/page#!/systemDiagnostic

Why the correct Application Profile of the type “HTTPS End-To-End” did not work is still under investigation. I will let you know the outcome as soon as I know it too 😉

Did you like this info?

There are more vIDM cluster Gotcha’s to come in my next blog(s)

So stay tuned..🧐

How to setup a NSX-V LB for vIDM

Recently I wanted to deploy a VMware Identity Manager 3.3.2 cluster with vRealize Lifecycle Manager 8.1. As a prerequisite I needed to prepare the vIDM Load Balancer. Unfortunately I was not able to find a complete overview on how to configure this with NSX-V 6.4.6. After some research and reaching out to others, I was able to find all the information I needed. This article reveals how I configured the NSX-V Load Balancer for the vIDM 3 node cluster.

Upload the vIDM certificate chain and the corresponding root CA certificates:

[vIDM Certificate Chain]

*In the field “Certificate Contents” add the entire certificate chain. Just like the below example.

-----BEGIN CERTIFICATE-----

[contents leaf certificate]

-----END CERTIFICATE-----

-----BEGIN CERTIFICATE-----

[contents root CA]

-----END CERTIFICATE-----

*In the field “Private Key” add the private key of the certificate in rsa format. Just like the below example.

-----BEGIN RSA PRIVATE KEY-----

[contents rsa private key]

-----END RSA PRIVATE KEY-----

Note: If your private key does not start with -----BEGIN RSA PRIVATE KEY-----, then you have to convert your private key before NSX-V accepts it. You can do this, for example, with openssl. See my example below.

openssl rsa -in vidm-private-key.key -out vidm-private-rsa-key.key
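Before pasting a key into NSX-V, you can check the header first. A minimal sketch (the file names are placeholders):

```shell
#!/bin/sh
# is_rsa_format succeeds when a key file already uses the traditional
# RSA PEM header that NSX-V accepts, and fails for e.g. a PKCS#8 key
# (which starts with -----BEGIN PRIVATE KEY----- instead).
is_rsa_format() {
    head -1 "$1" | grep -q -- '-----BEGIN RSA PRIVATE KEY-----'
}

# usage (placeholder file names):
#   is_rsa_format vidm-private-key.key \
#     || openssl rsa -in vidm-private-key.key -out vidm-private-rsa-key.key
```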

[Corresponding Root CA]

Configure the Application Profile:

Application Profile Type: HTTPS End-To-End

Persistence: Cookie

Cookie Name: JSESSIONID

Mode: App Session

Expires in: 3600

Insert X-Forwarded-For HTTP header: Enable

Client Authentication: Ignore

Server Authentication: Enable

Configure the Service Monitor:

Interval: 5

Timeout: 10

Max Retries: 3

Type: HTTPS

Expected: 200

Method: GET

URL: /SAAS/API/1.0/REST/system/health/heartbeat
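The Service Monitor above boils down to one check: each vIDM node must answer HTTP 200 on the heartbeat URL. A hedged sketch of that check (the node hostname is a placeholder):

```shell
#!/bin/sh
# heartbeat_ok mirrors the Service Monitor: GET the heartbeat URL on a
# node and expect HTTP 200.  -k skips certificate validation, like a
# monitor that does not verify the backend certificate.
heartbeat_ok() {
    code=$(curl -sk -o /dev/null -w '%{http_code}' \
        "https://$1/SAAS/API/1.0/REST/system/health/heartbeat")
    [ "$code" = "200" ]
}

# usage (placeholder hostname):
#   heartbeat_ok vidm-node1.flexlab.local && echo healthy
```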

 

Configure the Pool:

Algorithm: ROUND-ROBIN

Monitor: VMware Identity Manager

Weight: 1

Monitor Port: 443

Port: 443

 

Configure the Virtual Server:

Virtual Server: Enable

Acceleration: Disable

Application Profile: VMware Identity Manager

Protocol: HTTPS

Port: 443

Default Pool: pool_vidm_443

Note: To enforce Layer 7 traffic, you need to disable Acceleration on the Virtual Server level.

 

That’s it… Now you can start deploying your vIDM cluster with vRLCM 8.1. 

 

How to monitor vRA 8.1 via the API

Hello All,

It’s been way too long since I posted a new article. Sorry for that, but family comes first. The good news is that I am starting again 😉

This week I discovered two interesting API calls for checking the health of a vRA 8.1 deployment via the “:8008/health” endpoint. I thought this was worth sharing, so here they come 😎

The first API call is to validate the health of a vRA 8.1 node:

In my example, I am using Postman to explore the API call.

GET http://pb0vra8va01.flexlab.local:8008/api/v1/services/local

 

The second API call is to validate the health of a vRA 8.1 cluster:

GET http://pb0vra8va01.flexlab.local:8008/api/v1/services/cluster

For a detailed overview of the kind of services that are validated, see the example below.
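The two calls can also be turned into a quick scripted probe. This is a sketch based only on the HTTP status code; the JSON body contains the per-service details.

```shell
#!/bin/sh
# health_verdict classifies the HTTP status code returned by the :8008
# health endpoints: only 200 counts as healthy.
health_verdict() {
    [ "$1" = "200" ] && echo "OK" || echo "FAILED"
}

# usage against a node (hostname from the example above):
#   for scope in local cluster; do
#       code=$(curl -s -o /dev/null -w '%{http_code}' \
#           "http://pb0vra8va01.flexlab.local:8008/api/v1/services/${scope}")
#       echo "${scope}: $(health_verdict "$code")"
#   done
```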

Very cool right! I hope you liked it..

How to configure a signed certificate for vLCM 1.2

When I was playing around with the newly released vRealize Suite Lifecycle Manager 1.2, I discovered that during the .ovf deployment there was no option to use a signed certificate.

This article describes how to replace the self-signed certificate with a signed certificate on the vLCM appliance.

1. Deploy the vLCM 1.2 appliance using the ova file. (I used VMware-vLCM-Appliance-1.2.0.10-8234885_OVF10.ova)

2. Check the self-signed certificate in the web browser when accessing the vLCM home page. It should look like the following picture.

3. You can find this self-signed certificate with its private key on the vLCM appliance in the following location: /opt/vmware/vlcm/cert. The files are called server.crt and server.key.

4. Make a backup of these files. (I renamed them to server.crt.bck and server.key.bck)

5. Generate your signed certificate and upload it with its private key to the vLCM appliance, in the same location where you found the self-signed certificate. See the picture below.

6. Now restart the service vlcm-xserver by using the command systemctl restart vlcm-xserver.

7. Next, check the status of the vlcm-xserver service by using the command systemctl status vlcm-xserver. It will tell you whether the service is active or not.

8. Finally, check the certificate again in the web browser when accessing the vLCM home page. It should now use your signed certificate, just like in the picture below.
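Steps 4 and 5 above can be sketched as a small helper; signed.crt and signed.key are placeholder file names, and the restart from step 6 stays a separate command.

```shell
#!/bin/sh
# replace_cert backs up the existing server.crt/server.key in the given
# directory (step 4) and replaces them with the signed pair (step 5).
replace_cert() {
    dir=$1; new_crt=$2; new_key=$3
    cp "${dir}/server.crt" "${dir}/server.crt.bck"
    cp "${dir}/server.key" "${dir}/server.key.bck"
    cp "${new_crt}" "${dir}/server.crt"
    cp "${new_key}" "${dir}/server.key"
}

# usage on the vLCM appliance (placeholder file names):
#   replace_cert /opt/vmware/vlcm/cert signed.crt signed.key
#   systemctl restart vlcm-xserver     # step 6
```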

So that’s all. Pretty easy right 😉

vRA 7.3 High Availability Improvements

With the latest release of vRealize Automation 7.3, the following very nice high availability improvements have been introduced.

vPostgres Database Automatic Failover

IaaS Manager Server Service Automatic Failover

These important improvements significantly help to make vRA even more robust and stable than ever before.

This article introduces these two new cool features.

vPostgres Database Automatic Failover

For the people who didn’t know: with the introduction of vRA 7.0, VMware introduced the feature “Synchronous vPostgres replication” on top of the asynchronous vPostgres replication feature. An important side note is that synchronous vPostgres replication could only be enabled when three vRealize Automation virtual appliances had been configured as a three-node vRA cluster. Unfortunately, both vPostgres replication flavours require a manual failover of the vPostgres database.

Now with the 7.3 release of vRealize Automation, VMware enhances the Synchronous vPostgres replication flavour by introducing automatic failover.

But what configuration tasks need to be addressed to take advantage of this new feature? That’s the best part, it is just a simple click on a button 😉

  1. Configure three vRA virtual appliances as a three node cluster.
  2. Login to the VAMI interface of one of the three vRA virtual appliances and go to vRA Settings –> Database.
  3. Click on the Sync Mode Button.
  4. Done!

IaaS Manager Server Service Automatic Failover

The IaaS Manager Server Service has always been there as an active / passive component with a manual failover mechanism.

With the 7.3 release of vRealize Automation, VMware enhances the IaaS Manager Server Service component by introducing automatic failover.

A big difference between all previous releases of vRealize Automation and the 7.3 release is that the IaaS Manager Server Service is started simultaneously on both Windows servers when using this automatic failover feature. There is also no longer a need to change configuration settings on the load balancer in the case of a failure. This failover process is now fully handled at the application level.

p.s.1: Automatic Manager Server Service failover is disabled by default if you install or upgrade the Manager Service with the standard vRealize Automation Windows installer. You can use the command "python /usr/lib/vcac/tools/vami/commands/manager-service-automatic-failover ENABLE" from one of the vRA virtual appliances to enable this feature manually.

p.s.2: You can use the command "vra-command list-nodes --components" from one of the vRA virtual appliances to discover which Manager Server Service has been configured as the active node and which one as the passive node.

As always, please enjoy using this awesome product!

How-to fix vRealize Business error “Untrusted certificate chain” after an upgrade to vRA 7.3 and vRB 7.3

After I upgraded vRealize Business and vRealize Automation to version 7.3.0, I discovered that vRB was not working anymore from the vRA portal.

The following error message “javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: Untrusted certificate chain” is presented when accessing the vRB sections in vRA.

This article reveals how I managed to solve this unexpected issue.

  1. We can recognize this issue by opening the vRA portal and navigating to the vRB sections.
  2. When the above error appears, we also see that the related vRB services have not been registered on the vRA appliance(s).
  3. A first remediation attempt will be a reboot of the vRB virtual appliance.
  4. If the reboot of the vRB virtual appliance did not solve the issue, you can try the following procedure.
  5. Login to the vRB virtual appliance VAMI interface and go to the Registration section.
  6. Provide the SSO Admin User and Password and click Unregister.
  7. Next, login to the console of the vRB virtual appliance and get the password for the keystore. You can find this password in the file “/shared/catalina.properties”; search for the line bio-ssl.keystore.password. You can also use the command cat /shared/catalina.properties | grep "bio-ssl".
  8. With the keystore password, we can remove the existing pricing-api certificate. Use the following command to remove this certificate: keytool -delete -noprompt -alias pricing-api -keystore /server/conf/ssl.keystore -storepass 7ks5LoxHzf1YYAKykt2pzqSd74uL (use the keystore password you found in step 7).
  9. Now go back to the vRB virtual appliance VAMI interface and navigate again to the Registration section. Provide the vRA portal name, SSO Default Tenant, SSO Admin User and SSO Admin Password and click Register.
  10. Finally, reboot the vRB virtual appliance and login again to the vRA portal. Navigate again to the vRB sections and validate if the issue has been resolved like it did for me.
  11. Enjoy again using vRB 7.3.0 with vRA 7.3.0.
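Steps 7 and 8 above can be chained: first extract the keystore password, then feed it to keytool. A sketch:

```shell
#!/bin/sh
# keystore_password pulls the bio-ssl keystore password out of
# catalina.properties (step 7); the extracted value then feeds the
# keytool delete from step 8.
keystore_password() {
    grep 'bio-ssl.keystore.password' "$1" | cut -d= -f2
}

# usage on the vRB appliance:
#   STOREPASS=$(keystore_password /shared/catalina.properties)
#   keytool -delete -noprompt -alias pricing-api \
#       -keystore /server/conf/ssl.keystore -storepass "$STOREPASS"
```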

Monitoring vRA with the vRA Health Service

One of the cool new features of vRA 7.3 is the Health Service. With this feature you can monitor a variety of vRA-related health checks within the vRA User Interface. These checks, also called tests, can be configured to run on a pre-defined time interval.

You can configure the health service to run the following tests:

  • vRealize Automation virtual appliance System Tests. These tests determine if components, such as the vRealize Automation license, are registered and necessary resources, such as memory, are available on the vRealize Automation virtual appliance.
  • vRealize Automation virtual appliance Tenant Tests. These tests determine if tenant-related components, such as software-service, are registered and necessary resources, such as vSphere virtual machines, are available.
  • vRealize Orchestrator on the vRealize Orchestrator host System Tests. These tests confirm that components, such as the vro-server service, are registered and necessary resources, such as sufficient Java memory heap, are available.

This article describes how to setup and use the new health service tests within the vRA UI.

  1. Login to the vRA portal with a user which has Tenant Administrator and IaaS Administrator privileges.
  2. Go to Administration –> Health and click on “New Configuration”.
  3. First we are going to create the configuration for the vRealize Automation tests. Provide the required information for the Configuration Details section and click next.
  4. Secondly, select the applicable Test Suites and click next.
  5. Now, provide the required information for the Configure Parameters section and click next.
  6. Finally, review the Summary section and click finish.
  7. After the configuration for the vRealize Automation tests has been completed, we are going to create the configuration for the vRealize Orchestrator tests. Provide the required information for the Configuration Details section and click next.
  8. Secondly, select the applicable Test Suites and click next.
  9. Now, provide the required information for the Configure Parameters section and click next.
  10. Finally, review the Summary section and click finish.
  11. Now that we have configured both vRA Health Service configurations, we can check the results of these tests. If the presented results are not accurate enough, we can force new up-to-date results by clicking the run buttons below the configurations.
  12. When analyzing our results, we see that all tests for the vRealize Orchestrator tests have completed successfully. When we click on the results diagram, we can see the details of these tests.
  13. Unfortunately, we see that not all tests for the vRealize Automation tests have completed successfully. Again, when we click on the results diagram, we can see the details of these tests.
  14. Now that we have discovered that four of the vRealize Automation tests have failed, it’s time to find out why. When we click on the Remediation links, we see that the first three failed tests are caused by issues with the configured vSphere Endpoint.
  15. The fourth failed test is caused by configuration issues with the embedded vRealize Orchestrator services.
  16. After we think we have fixed the reported errors, we can trigger new up-to-date test results to validate our remediation work.
  17. Now that all configured tests have the status PASSED, we know that we are back on track with a healthy environment.
  18. Enjoy using this new feature of the cool new vRealize Automation 7.3.0 release.