Gotcha 3 when deploying a vIDM cluster with vRLCM 8.1

Recently, I was trying to deploy vRA 8.1 with vRLCM 8.1 Patch 1. I had already configured the vIDM 3-node cluster, so I was ready to go.

First I deployed a vRA 8.1 standard environment and that went fine without any issues.

So, after that I was confident enough to deploy a vRA 8.1 cluster. Unfortunately my deployment failed. The corresponding error I found in /var/log/deploy.log was the following:

Identity Service health check failed. If load-balancer is deployed, make sure it is properly configured.

Before vRA 8.1, I always used ‘Persistence’ set to Source IP and ‘Type’ set to SSL Passthrough for the Application Profile of the vRA Load Balancer. There was also no proper information available on how to configure the LB for vIDM.

Last week I found an updated document on how to configure your Load Balancer for vRA 8.1. Surprisingly, the Load Balancer configuration had changed slightly, and a Load Balancer configuration for vIDM had been added.

With vRA 8.1, the ‘Persistence’ setting for the Application Profile of the vRA Load Balancer has been changed to None, the ‘Type’ remains SSL Passthrough, and the ‘Expires in’ value has been changed to None.

For the vIDM Load Balancer, the ‘Persistence’ should now be set to Source IP, the ‘Type’ should be SSL Passthrough, and the ‘Expires in’ value should be set to 36000.
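Summarized, the Application Profile settings from the updated guide:

```
vRA 8.1 Application Profile:
  Type:        SSL Passthrough
  Persistence: None
  Expires in:  None

vIDM Application Profile:
  Type:        SSL Passthrough
  Persistence: Source IP
  Expires in:  36000
```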

https://docs.vmware.com/en/vRealize-Automation/8.1/vrealize-automation-load-balancing-guide.pdf

After I changed the Load Balancer configuration for vIDM and vRA my deployment succeeded. 🥳🤩😎

Finally I could enjoy my new vRA 8.1 cluster running with a vIDM 3.3.2 cluster.

 

Gotcha with the VRA CLOUD INFOBLOX PLUGIN 1.1 and vRA 8.1

This week I discovered an issue when configuring the vRA 8.1 IPAM integration with the VRA CLOUD INFOBLOX PLUGIN version 1.1.

https://marketplace.vmware.com/vsx/solutions/cas-infoblox-plugin-for-abx-0-0-1

When I clicked the validate button, it failed with an error.

Unable to validate the provided access credentials: Failed to validate credentials. AdapterReference: http://provisioning-service.prelude.svc.cluster.local:8282/provisioning/adapter/ipam/endpoint-config. Error: Execution of action Infoblox_ValidateEndpoint failed on provider side: Infoblox HTTP request failed with: HTTPSConnectionPool(host='pb0infblx01.flexlab.local', port=443): Max retries exceeded with url: /wapi/v2.7/networkview?_return_fields=name (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)",),)) Cloud account: null Task: /provisioning/endpoint-tasks/820902de-bf34-4c91-8217-e3eedd8ea609

After doing some troubleshooting with my colleagues we discovered the root cause of this error.

This blog reveals how to fix this specific error.

The core of the problem is the way Python itself handles SSL handshakes. Most programming languages, such as Java and C++, allow users to unconditionally trust a particular SSL certificate. Python does not: even if you accept a particular cert as ‘trusted’, Python still attempts to verify that the whole certificate chain is trusted (including the signer, CA, etc.).
This is why Infoblox (and other 3rd-party providers) using certs that are not self-signed must be configured to return the whole certificate chain, not just the end server cert.
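The chain requirement is easy to check up front. Here is a minimal sketch: a small helper that counts how many certificates a PEM bundle contains (the filename is hypothetical):

```shell
# count_certs: print how many certificates a PEM bundle contains.
# A complete chain should have at least 2 entries (leaf + issuing/root CA).
count_certs() {
  grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}

# Example with the bundle you plan to upload (hypothetical filename):
# count_certs vidm-chain.pem
```

You can run the same count against what the appliance actually serves during the handshake with openssl s_client -connect <host>:443 -showcerts.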

When I uploaded the new signed certificate chain to my Infoblox appliance everything looked fine.

However, the validation of the integration still failed.

When we checked the failed Action Run on the extensibility tab, we discovered that only the leaf certificate had been pulled from the Infoblox appliance instead of the certificate chain, which also includes the root CA.

Apparently, I also needed to upload the root CA to my Infoblox appliance. Something I had not done, because I used the complete certificate chain when uploading the new signed certificate to the Infoblox appliance.

So I added the root CA to my Infoblox appliance too.

This time when I pressed the validate button, it succeeded.

Note: You need to re-create the integration, otherwise it does not work.

When looking at the succeeded Action Runs, you now see that the entire certificate chain has been pulled.

Enjoy using the vRA Cloud Infoblox Plugin 😁🧐

 

 

Gotcha 2 when deploying a vIDM cluster with vRLCM 8.1

Last week I released a blog article regarding Gotchas when deploying a vIDM cluster with vRLCM 8.1. This week it’s time to reveal Gotcha 2, and if time allows, also Gotcha 3.

Gotcha 2 is all about Powering Off and Powering On the vIDM cluster. The preferred way to Power On or Power Off a vIDM cluster is by using the Day 2 Operations of the globalenvironment in the vRLCM GUI.

Go to Lifecycle Operations and navigate to Environments.

Next, go to “VIEW DETAILS” of your globalenvironment and click on the 3 dots. This is where the Day 2 Operations for your environment are located. In the list of Day 2 Operations you will find Power On and Power Off.

When the Power On or Power Off Day 2 Operations are not used, there is a risk that the vIDM cluster will not start anymore. This can happen, for example, when a vSphere HA event occurs or when the vIDM virtual machines are Powered On or Powered Off directly with the vSphere Client.

If this happens, it is good to know some troubleshooting steps. VMware released the following KB article specifically on this topic: https://kb.vmware.com/s/article/75080

In my situation, most of the time when the vIDM cluster was not Powered Off via the vRLCM GUI, the DelegateIP was gone from the vIDM virtual appliance running as the primary postgres instance. What also happened was that one or both of the secondary postgres instances ended up in a ‘down’ status.

To find out which vIDM node is configured as the primary postgres instance, run the following command on one of the vIDM nodes in the cluster. (When prompted for a password, just press Enter.)

su postgres -c "echo -e 'password'|/opt/vmware/vpostgres/current/bin/psql -h localhost -p 9999 -U pgpool postgres -c \"show pool_nodes\""

In the above screenshot you can see that the vIDM node with IP address 10.1.0.31 is the primary postgres instance. You can also see that the vIDM node with IP address 10.1.0.40 is in a ‘down’ status.
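In text form, the show pool_nodes output in this situation looks roughly like the following (exact columns vary per pgpool-II version, and the 10.1.0.32 address is hypothetical):

```
 node_id | hostname  | port | status | lb_weight |  role
---------+-----------+------+--------+-----------+---------
 0       | 10.1.0.31 | 5432 | up     | 0.333333  | primary
 1       | 10.1.0.32 | 5432 | up     | 0.333333  | standby
 2       | 10.1.0.40 | 5432 | down   | 0.333333  | standby
```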

To validate whether we are hitting the issue “No DelegateIP assigned to the primary postgres instance”, we can run the following command on the vIDM node running as the primary postgres instance.

ifconfig eth0:0 | grep 'inet addr:' | cut -d: -f2

If the command returns the DelegateIP like the screenshot below, you are not hitting this specific issue. However, if the command returns nothing, you are hitting this specific issue.

Make sure the DelegateIP is not held by any of the non-primary instances by running the above ifconfig command on the other instances. If any of the non-primary instances still hold the DelegateIP, run the following command there first to detach it.

ifconfig eth0:0 down

Run the below command on the primary instance to re-assign the DelegateIP.

ifconfig eth0:0 inet <DelegateIP> netmask <Netmask>

After you re-assign the DelegateIP, you need to restart the horizon service on all the vIDM nodes by running the command “service horizon-workspace restart”.

If you also hit the second issue, where one or more of the secondary vIDM postgres instances are in a ‘down’ status, you can use the following procedure to fix it.

First, shut down the postgres service on the impacted vIDM postgres instance(s) by running the command “service vpostgres stop”.

Second, run the following command to recover the impacted vIDM postgres instance. (The default password for user pgpool is “password”.)

/usr/local/bin/pcp_recovery_node -h delegateIP -p 9898 -U pgpool -n <node_id>

Finally, validate that all of the vIDM postgres instances are up again.

su postgres -c "echo -e 'password'|/opt/vmware/vpostgres/current/bin/psql -h localhost -p 9999 -U pgpool postgres -c \"show pool_nodes\""

That’s it for now. Hopefully this info was useful for you.

In my next blog I will continue to reveal even more Gotchas.

Gotcha 1 when deploying a vIDM cluster with vRLCM 8.1

Last week I released a new blog about how to set up an NSX-V LB for vIDM: http://2vsteaks.com/how-to-setup-a-nsx-v-lb-for-vidm/

This week I wanted to deploy a vIDM 3-node cluster with vRLCM 8.1. I used my latest blog as a reference for configuring the NSX-V 6.4.6 LB. During the deployment of my new vIDM cluster I discovered a couple of Gotchas, which I want to share with you in a few separate blogs.

I discovered the first Gotcha during the deployment process of the new vIDM environment. Despite all the pre-requisite checks turning green, my deployment failed. It failed in step 5 of the deployment, at the point “VidmTrustLBCertificate”.

Here is the detailed error message:

java.security.cert.CertificateException: Failed to find valid root certificate
    at com.vmware.vrealize.lcm.util.CertificateUtil.getRootCertificateFromCertificates(CertificateUtil.java:436)
    at com.vmware.vrealize.lcm.vidm.driver.helpers.VidmInstallHelper.trustCertificate(VidmInstallHelper.java:719)
    at com.vmware.vrealize.lcm.vidm.core.task.VidmTrustLBCertificateTask.execute(VidmTrustLBCertificateTask.java:93)
    at com.vmware.vrealize.lcm.automata.core.TaskThread.run(TaskThread.java:45)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

To work around this issue, I created a new NSX-V Application Profile named vIDM-Deploy.

Application Profile Type: SSL Passthrough

Persistence: Source IP

Expires in: 3600

I replaced the existing Application Profile of type “HTTPS End-To-End”, which was assigned to my vIDM virtual server, with this new Application Profile of type “SSL Passthrough”.

 

When I tried my deployment again it succeeded completely without errors.

I validated my deployment by checking the vIDM System Diagnostic page. https://vidm.flexlab.local/SAAS/admin/app/page#!/systemDiagnostic

Why the Application Profile of type “HTTPS End-To-End” did not work is still under investigation. I will let you know the outcome as soon as I know it too 😉

Did you like this info?

There are more vIDM cluster Gotchas to come in my next blog(s).

So stay tuned..🧐

How to setup a NSX-V LB for vIDM

Recently I wanted to deploy a VMware Identity Manager 3.3.2 cluster with vRealize Lifecycle Manager 8.1. As a prerequisite I needed to prepare the vIDM Load Balancer. Unfortunately, I was not able to find a complete overview of how to configure this with NSX-V 6.4.6. After some research and reaching out to others, I was able to find all the information I needed. This article reveals how I configured the NSX-V Load Balancer for the vIDM 3-node cluster.

Upload the vIDM certificate chain and the corresponding root CA certificates:

[vIDM Certificate Chain]

*In the field “Certificate Contents”, add the entire certificate chain, just like the example below.

-----BEGIN CERTIFICATE-----
[contents leaf certificate]
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
[contents root ca]
-----END CERTIFICATE-----

*In the field “Private Key”, add the private key of the certificate in RSA format, just like the example below.

-----BEGIN RSA PRIVATE KEY-----
[contents rsa private key]
-----END RSA PRIVATE KEY-----

Note: If your private key does not start with -----BEGIN RSA PRIVATE KEY-----, then you have to convert it first before NSX-V accepts it. You can do this, for example, with openssl. See my example below.

openssl rsa -in vidm-private-key.key -out vidm-private-rsa-key.key
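To confirm the conversion worked, you can let openssl validate the result; a small sketch wrapping the check in a helper (filename from the example above):

```shell
# check_rsa_key: verify that a private key file parses as a valid RSA key.
# openssl prints "RSA key ok" on success.
check_rsa_key() {
  openssl rsa -in "$1" -check -noout
}

# Example with the converted key:
# check_rsa_key vidm-private-rsa-key.key
```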

[Corresponding Root CA]

Configure the Application Profile:

Application Profile Type: HTTPS End-To-End

Persistence: Cookie

Cookie Name: JSESSIONID

Mode: App Session

Expires in: 3600

Insert X-Forwarded-For HTTP header: Enable

Client Authentication: Ignore

Server Authentication: Enable

Configure the Service Monitor:

Interval: 5

Timeout: 10

Max Retries: 3

Type: HTTPS

Expected: 200

Method: GET

URL: /SAAS/API/1.0/REST/system/health/heartbeat

 

Configure the Pool:

Algorithm: ROUND-ROBIN

Monitor: VMware Identity Manager

Weight: 1

Monitor Port: 443

Port: 443

 

Configure the Virtual Server:

Virtual Server: Enable

Acceleration: Disable

Application Profile: VMware Identity Manager

Protocol: HTTPS

Port: 443

Default Pool: pool_vidm_443

Note: To enforce Layer 7 traffic, you need to disable Acceleration on the Virtual Server level.

 

That’s it… Now you can start deploying your vIDM cluster with vRLCM 8.1.