Gotcha 3 when deploying a vIDM cluster with vRLCM 8.1

Recently, I was trying to deploy vRA 8.1 with vRLCM 8.1 Patch 1. I had already configured the vIDM 3-node cluster, so I was ready to go.

First I deployed a vRA 8.1 standard environment and that went fine without any issues.

So, after that I was confident enough to deploy a vRA 8.1 cluster. Unfortunately my deployment failed. The corresponding error I found in /var/log/deploy.log was the following:

Identity Service health check failed. If load-balancer is deployed, make sure it is properly configured.

Before vRA 8.1, I always used ‘Persistence’ Source IP and ‘Type’ SSL Passthrough for the Application Profile of the vRA Load Balancer. There was also no proper information available on how to configure the Load Balancer for vIDM.

Last week I found an updated document on how to configure your Load Balancer for vRA 8.1. Surprisingly, the recommended Load Balancer configuration had changed slightly, and a Load Balancer configuration for vIDM had been added.

With vRA 8.1, for the Application Profile of the vRA Load Balancer, the ‘Persistence’ has been changed to None, the ‘Type’ remains SSL Passthrough, and the ‘Expires in’ value has been changed to None.

For the vIDM Load Balancer, the ‘Persistence’ should be set to Source IP, the ‘Type’ should be SSL Passthrough, and the ‘Expires in’ value should be set to 36000.

https://docs.vmware.com/en/vRealize-Automation/8.1/vrealize-automation-load-balancing-guide.pdf

After I changed the Load Balancer configuration for vIDM and vRA, my deployment succeeded. 🥳🤩😎

Finally I could enjoy my new vRA 8.1 cluster running with a vIDM 3.3.2 cluster.

 

Gotcha with the vRA Cloud Infoblox Plugin 1.1 and vRA 8.1

This week I discovered an issue when configuring the vRA 8.1 IPAM integration with the vRA Cloud Infoblox Plugin version 1.1.

https://marketplace.vmware.com/vsx/solutions/cas-infoblox-plugin-for-abx-0-0-1

When I clicked the validate button, it failed with an error.

Unable to validate the provided access credentials: Failed to validate credentials. AdapterReference: http://provisioning-service.prelude.svc.cluster.local:8282/provisioning/adapter/ipam/endpoint-config. Error: Execution of action Infoblox_ValidateEndpoint failed on provider side: Infoblox HTTP request failed with: HTTPSConnectionPool(host='pb0infblx01.flexlab.local', port=443): Max retries exceeded with url: /wapi/v2.7/networkview?_return_fields=name (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)",),)) Cloud account: null Task: /provisioning/endpoint-tasks/820902de-bf34-4c91-8217-e3eedd8ea609

After doing some troubleshooting with my colleagues we discovered the root cause of this error.

This blog reveals how to fix this specific error.

The core of the problem is in the way Python itself handles SSL handshakes. Most programming languages, such as Java and C++, allow users to unconditionally trust a particular SSL certificate. Python does not allow that: even if you accept a particular certificate as ‘trusted’, Python still attempts to verify that the whole certificate chain is trusted (including the signer, the CA, etc.).
This is why Infoblox (and other 3rd party providers) using certificates that are not self-signed must be configured to return the whole certificate chain, not just the end server certificate.

When I uploaded the new signed certificate chain to my Infoblox appliance, everything looked fine.

However the validation of the integration failed.

When we checked the failed Action Run on the Extensibility tab, we discovered that only the leaf certificate had been pulled from the Infoblox appliance, instead of the certificate chain that also includes the root CA.
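If you want to check from the command line which certificates your Infoblox appliance actually presents during the TLS handshake, openssl can show you the presented chain. A quick sketch, using the appliance hostname from my lab; counting the ‘BEGIN CERTIFICATE’ lines tells you whether only the leaf (1) or the full chain (2 or more) is being returned:

openssl s_client -connect pb0infblx01.flexlab.local:443 -showcerts </dev/null

openssl s_client -connect pb0infblx01.flexlab.local:443 -showcerts </dev/null 2>/dev/null | grep -c 'BEGIN CERTIFICATE'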

Apparently, I also needed to upload the root CA to my Infoblox appliance. Something I had not done, because I had used the complete certificate chain when uploading the new signed certificate to the Infoblox appliance.

So I added the root CA to my Infoblox appliance too.

This time when I pressed the validate button, it succeeded.

Note: You need to re-create the integration, otherwise it does not work.

When looking at the succeeded Action Runs, you can now see that the entire certificate chain has been pulled.

Enjoy using the vRA Cloud Infoblox Plugin 😁🧐

 

 

Gotcha 2 when deploying a vIDM cluster with vRLCM 8.1

Last week I released a blog article about Gotchas when deploying a vIDM cluster with vRLCM 8.1. This week it’s time to reveal Gotcha 2 and, if time allows, also Gotcha 3.

Gotcha 2 is all about powering the vIDM cluster off and on. The preferred way to power a vIDM cluster on or off is by using the Day 2 Operations of the globalenvironment in the vRLCM GUI.

Go to Lifecycle Operations and navigate to Environments.

Next, go to “VIEW DETAILS” of your globalenvironment and click on the 3 dots. This is where the Day 2 Operations for your environment are located. In the list of Day 2 Operations you will find Power On and Power Off.

When the Power On or Power Off Day 2 Operations are not used, there is a risk that the vIDM cluster will not start anymore. This can happen, for example, when a vSphere HA event occurs or when the vIDM virtual machines are powered on or off directly with the vSphere Client.

If this happens, it is good to know some troubleshooting steps. VMware released the following KB article specifically on this topic: https://kb.vmware.com/s/article/75080

In my situation, most of the time when a vIDM cluster was not powered off via the vRLCM GUI, the DelegateIP was gone from the vIDM virtual appliance running as the primary postgres instance. In addition, one or both of the secondary postgres instances ended up in a ‘down’ status.

To find out which vIDM node is configured as the primary postgres instance, run the following command on one of the vIDM nodes in the cluster. (When asked for a password, just press Enter.)

su postgres -c "echo -e 'password'|/opt/vmware/vpostgres/current/bin/psql -h localhost -p 9999 -U pgpool postgres -c \"show pool_nodes\""

In the above screenshot you can see that the vIDM node with IP address 10.1.0.31 is the primary postgres instance. You can also see that the vIDM node with IP address 10.1.0.40 is in a ‘down’ status.

To validate whether we are hitting the issue “No DelegateIP assigned to the primary postgres instance”, we can run the following command on the vIDM node running as the primary postgres instance.

ifconfig eth0:0 | grep 'inet addr:' | cut -d: -f2

If the command returns the DelegateIP like the screenshot below, you are not hitting this specific issue. However, if the command returns nothing, you are hitting this specific issue.

Make sure the DelegateIP is not held by any other non-primary instance by running the above ifconfig command on the other instances. If any of the non-primary instances still holds the DelegateIP, run the following command on that instance first to detach it.

 ifconfig eth0:0 down

Run the below command on the primary instance to re-assign the DelegateIP.

ifconfig eth0:0 inet <DelegateIP> netmask <Netmask>

After you re-assign the DelegateIP, you need to restart the horizon service on all the vIDM nodes by running the command “service horizon-workspace restart”.
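To summarize the DelegateIP steps in one place, here is a minimal sketch (the DelegateIP and Netmask values are placeholders for the values from your own environment):

ifconfig eth0:0 down   (on any non-primary node that still holds the DelegateIP)

ifconfig eth0:0 inet <DelegateIP> netmask <Netmask>   (on the primary postgres node)

service horizon-workspace restart   (on all vIDM nodes)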

If you also hit the second issue, where one or more secondary vIDM postgres instances are in a ‘down’ status, you can use the following procedure to fix this.

First, stop the postgres service on the impacted vIDM postgres instance(s) by running the command “service vpostgres stop”.

Secondly, run the following command to recover the impacted vIDM postgres instance. (The default password for the pgpool user is “password”.)

/usr/local/bin/pcp_recovery_node -h delegateIP -p 9898 -U pgpool -n <node_id>
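As far as I know, the <node_id> value corresponds to the node_id column in the “show pool_nodes” output from earlier. For example, recovering the node with id 2 would look like this (the node id here is purely illustrative):

/usr/local/bin/pcp_recovery_node -h <delegateIP> -p 9898 -U pgpool -n 2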

Finally, validate whether all of the vIDM postgres instances are up again.

su postgres -c "echo -e 'password'|/opt/vmware/vpostgres/current/bin/psql -h localhost -p 9999 -U pgpool postgres -c \"show pool_nodes\""

That’s it for now. Hopefully this info was useful for you.

In my next blog I will continue to reveal even more Gotchas.

Gotcha 1 when deploying a vIDM cluster with vRLCM 8.1

Last week I released a new blog about How to setup a NSX-V LB for vIDM. http://2vsteaks.com/how-to-setup-a-nsx-v-lb-for-vidm/

This week I wanted to deploy a vIDM 3-node cluster with vRLCM 8.1. I used my latest blog as a reference for configuring the NSX-V 6.4.6 LB. During the deployment of my new vIDM cluster I discovered a couple of Gotchas, which I wanted to share with you in a few separate blogs.

I discovered the first Gotcha during the deployment process of the new vIDM environment. Despite all the pre-requisite checks turning green, my deployment failed. It failed in step 5 of the deployment, at the point “VidmTrustLBCertificate”.

Here is the detailed error message:

java.security.cert.CertificateException: Failed to find valid root certificate
    at com.vmware.vrealize.lcm.util.CertificateUtil.getRootCertificateFromCertificates(CertificateUtil.java:436)
    at com.vmware.vrealize.lcm.vidm.driver.helpers.VidmInstallHelper.trustCertificate(VidmInstallHelper.java:719)
    at com.vmware.vrealize.lcm.vidm.core.task.VidmTrustLBCertificateTask.execute(VidmTrustLBCertificateTask.java:93)
    at com.vmware.vrealize.lcm.automata.core.TaskThread.run(TaskThread.java:45)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

To workaround this issue, I created a new NSX-V Application Profile named vIDM-Deploy.

Application Profile Type: SSL Passthrough

Persistence: Source IP

Expires in: 3600

I replaced the existing Application Profile of the type “HTTPS End-To-End”, which was assigned to my vIDM virtual server, with this new Application Profile of the type “SSL Passthrough”.

 

When I tried my deployment again it succeeded completely without errors.

I validated my deployment by checking the vIDM System Diagnostic page. https://vidm.flexlab.local/SAAS/admin/app/page#!/systemDiagnostic

Why the original Application Profile of the type “HTTPS End-To-End” did not work is still under investigation. I will let you know the outcome as soon as I know it too 😉

Did you like this info?

There are more vIDM cluster Gotchas to come in my next blog(s).

So stay tuned..🧐

How to setup a NSX-V LB for vIDM

Recently I wanted to deploy a VMware Identity Manager 3.3.2 cluster with vRealize Lifecycle Manager 8.1. As a prerequisite I needed to prepare the vIDM Load Balancer. Unfortunately, I was not able to find a complete overview of how to configure this with NSX-V 6.4.6. After some research and reaching out to others, I was able to find all the information I needed. This article reveals how I configured the NSX-V Load Balancer for the vIDM 3-node cluster.

Upload the vIDM certificate chain and the corresponding root CA certificates:

[vIDM Certificate Chain]

*In the field “Certificate Contents” add the entire certificate chain. Just like the below example.

-----BEGIN CERTIFICATE-----

[contents leaf certificate]

-----END CERTIFICATE-----

-----BEGIN CERTIFICATE-----

[contents root ca]

-----END CERTIFICATE-----

*In the field “Private Key” add the private key of the certificate in rsa format. Just like the below example.

-----BEGIN RSA PRIVATE KEY-----

[contents rsa private key]

-----END RSA PRIVATE KEY-----

Note: If your private key does not start with -----BEGIN RSA PRIVATE KEY-----, then you have to convert your private key first before NSX-V accepts it. You can do this, for example, with openssl. See my example below.

openssl rsa -in vidm-private-key.key -out vidm-private-rsa-key.key

[Corresponding Root CA]

Configure the Application Profile:

Application Profile Type: HTTPS End-To-End

Persistence: Cookie

Cookie Name: JSESSIONID

Mode: App Session

Expires in: 3600

Insert X-Forwarded-For HTTP header: Enable

Client Authentication: Ignore

Server Authentication: Enable

Configure the Service Monitor:

Interval: 5

Timeout: 10

Max Retries: 3

Type: HTTPS

Expected: 200

Method: GET

URL: /SAAS/API/1.0/REST/system/health/heartbeat
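If you want to verify the health URL before wiring it into the Service Monitor, a quick curl check against one of the vIDM nodes (or against the VIP, which is vidm.flexlab.local in my lab) should print 200 when the node is healthy:

curl -k -s -o /dev/null -w '%{http_code}\n' https://vidm.flexlab.local/SAAS/API/1.0/REST/system/health/heartbeat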

 

Configure the Pool:

Algorithm: ROUND-ROBIN

Monitor: VMware Identity Manager

Weight: 1

Monitor Port: 443

Port: 443

 

Configure the Virtual Server:

Virtual Server: Enable

Acceleration: Disable

Application Profile: VMware Identity Manager

Protocol: HTTPS

Port: 443

Default Pool: pool_vidm_443

Note: To enforce Layer 7 traffic, you need to disable Acceleration on the Virtual Server level.

 

That’s it… Now you can start deploying your vIDM cluster with vRLCM 8.1. 

 

How to monitor vRA 8.1 via the api

Hello All,

It’s been way too long since I posted a new article. Sorry for that, but family comes first. The good news is that I am starting again 😉

This week I discovered 2 interesting api calls for checking the health of a vRA 8.1 deployment via the :8008/health endpoint. I thought this was worth sharing, so here they come 😎

The first api call is to validate the health of a vRA 8.1 node:

In my example, I am using Postman to explore the api calls.

GET http://pb0vra8va01.flexlab.local:8008/api/v1/services/local

 

The second api call is to validate the health of a vRA 8.1 cluster:

GET http://pb0vra8va01.flexlab.local:8008/api/v1/services/cluster

For a detailed overview of what kind of services are validated, see the example below.
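If you prefer the command line over Postman, the same health checks can be scripted with curl (the hostname is the vRA node from my lab):

curl -s http://pb0vra8va01.flexlab.local:8008/api/v1/services/local

curl -s http://pb0vra8va01.flexlab.local:8008/api/v1/services/cluster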

Very cool, right? I hope you liked it.

My LBaaS Journey part 1

My upcoming blog posts will be dedicated to my LBaaS journey. With LBaaS, I am referring to “Load Balancer as a Service”. In my journey, I am going to explain how you can create an NSX-based edge device, configured as a vRealize Automation 7.x Infrastructure Load Balancer, which will be offered as a self-service catalog item within vRealize Automation.

The first part of the journey is all about the relevant REST API calls to the NSX manager. On a high level, the following steps will be followed in this part of the journey.

  • Create the NSX Edge device
  • Configure the NSX Edge device as a vRA 7.x infrastructure Load Balancer
  • Capture the NSX Edge device config via a REST API call
  • Modify the NSX Edge device config
  • Deploy a new NSX Edge device via a REST API call

In the next section, I am going to explain these high-level steps in much more detail.

  1. The first task is to create your nsx edge device and configure it as you want. Because my use-case is all about creating a vRA 7.x Infrastructure Load Balancer, I used the following VMware document as my reference (https://docs.vmware.com/en/vRealize-Automation/7.4/vrealize-automation-load-balancing.pdf). The NSX-V version I am using is 6.4.3. Please write down the Id of the nsx edge device; we need this Id later in this blog-post. The Id of my template nsx edge device = edge-60.
  2. Now that we have created the template nsx edge device, we want to capture all of its configuration. To achieve this, we need to execute a REST API command against the NSX Manager. However, before we can execute REST API commands against the NSX Manager, we need to authenticate to the NSX Manager REST API. I am using Postman in this blog-post as my REST Client. In the below screenshots, you can see how I managed to get an Authorization Header for my future REST API calls to the NSX Manager.
  3. Now that we have the Authorization Header, we need to add two additional Headers. With these two additional headers, we make sure that the Response Body of the REST API call will be in json format.
  4. Now that we have all the ingredients, we can execute the following REST API call to capture the configuration of our template nsx edge device (see the curl sketch after this list for an approximation of the calls).

    Make sure you have the 3 Headers configured as explained earlier in this blog-post. As you can see in this REST API call, we also need to provide the Id of the template nsx edge device. The outcome of this REST API call is the configuration of the template nsx edge device in json format.
  5. Now that we have the main configuration of our template nsx edge device in json format, we are able to modify its contents. In this blog I am going to modify the following information:
  6. The contents of the new configuration file used in my example now look like the following:
  7. We can now try to deploy a new nsx edge device based on the modified json file as the REST Request Body. Again, make sure you have the 3 Headers configured. Now execute the following REST API call.

  8. As you can see, the REST API call failed with the following error: “HTTP Status 400 Bad Request”.
  9. The reason for this error is a missing password in the REST Request Body. This missing variable and its value are not part of the GET REST API call response. To fix this, you need to modify the REST Request Body again and include the variable “password” with your password as the value. See the example below.
  10. After we include the password variable in the REST Request Body, we can finally deploy a new nsx edge device based on the json configuration file. Again, make sure you have the 3 Headers configured, then execute the REST API call again. This time the REST API call should succeed and return a status of 201 Created.


  11. After a few moments you will see in your vCenter Client that a new nsx edge device has been deployed.
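For reference, here is a rough curl approximation of the two main REST API calls from this post. The NSX Manager hostname is hypothetical and the URI paths are the NSX-V 6.4 edges API as I recall them; the screenshots above show the exact calls I used:

curl -k -u 'admin:<nsx-password>' -H 'Accept: application/json' https://nsxmanager.flexlab.local/api/4.0/edges/edge-60 -o edge-60.json

curl -k -u 'admin:<nsx-password>' -H 'Accept: application/json' -H 'Content-Type: application/json' -X POST -d @edge-60.json https://nsxmanager.flexlab.local/api/4.0/edges

The -u option produces the same Basic Authorization header that Postman generated. The first call captures the template edge configuration as json, and the second call deploys a new edge from the modified json file (after the password variable has been added to it).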

This is the end of part 1 of My LBaaS Journey blog series. I hope you enjoyed! In the next part of my Journey, I am going to develop a vRealize Orchestrator Workflow based on the REST API calls demonstrated in part 1 of the Journey.

Signed certificate gotcha for VIC 1.3.1

As a follow-up to my previous post, I wanted to discover how to deploy the newly released VMware vSphere Integrated Containers 1.3.1 appliance with a signed certificate for the management portal on port 8282.

The certificate section looked very promising during the .ovf deployment phase, and I could provide the SSL Cert, SSL Cert Key and CA Cert.

However, after the deployment the vic management console was not accessible ;(

What did I miss?

This article describes how I managed to replace the self-signed certificate with a signed certificate on the VIC appliance after the initial deployment.

1. Deploy the VIC 1.3.1 appliance using the ova file. (I used vic-v1.3.1-3409-132fb13d.ova)

2. Try to access the vic management portal on port 8282.

If you are unable to access the vic management portal, there is a big chance that the provided signed certificate is incorrect. You can always choose to skip the certificate part during deployment, in which case it will use self-signed certificates.

Again, in my case, I used a signed certificate and the vic management portal was not accessible. After some research I noticed that the private key of the signed certificate needs to be in PKCS#8 format.

You can also check the log files on the vic appliance on the location /storage/log/admiral for more information.

3. Convert the private key of your signed certificate to PKCS#8 format by using the following command (the input file is the private key you supplied during deployment): openssl pkcs8 -topk8 -inform PEM -outform PEM -nocrypt -in server.key -out key.pkcs8.pem
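A quick way to confirm the conversion worked: the first line of the converted file should now show the PKCS#8 marker -----BEGIN PRIVATE KEY----- instead of -----BEGIN RSA PRIVATE KEY-----. For example:

head -1 key.pkcs8.pem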

4. Replace the private key on the vic appliance with the private key that has been converted to PKCS#8 format. You can find the existing private key and certificate on the vic appliance at /storage/data/admiral/configs.

Rename the incorrect private key to server.key_original

Rename the new converted private key to server.key

5. Restart the admiral service with the following command systemctl restart admiral.service

6. Next, check the status of the admiral service by using the command systemctl status admiral.service. It will tell you whether the service is active or not.

7. Finally, try again to access the vic management portal on port 8282. This time it should be accessible.

Enjoy using vSphere Integrated Containers 🙂

 

How to configure a signed certificate for vLCM 1.2

When I was playing around with the newly released vRealize Suite Lifecycle Manager 1.2, I discovered that during the .ovf deployment there was no option to use a signed certificate.

This article describes how to replace the self-signed certificate with a signed certificate on the vLCM appliance.

1. Deploy the vLCM 1.2 appliance using the ova file. (I used VMware-vLCM-Appliance-1.2.0.10-8234885_OVF10.ova)

2. Check the self signed certificate in the web browser when accessing the vLCM home page. It should look like the following picture.

3. You can find this self-signed certificate with its private key on the vLCM appliance at the following location: /opt/vmware/vlcm/cert. The files are called server.crt and server.key.

4. Make a backup of these files. (I renamed them to server.crt.bck and server.key.bck)

5. Generate your signed certificate and upload the signed certificate with its private key to the vLCM appliance, to the same location where you found the self-signed certificate. See the picture below. (If you still need to generate the private key and CSR first, see the openssl sketch after these steps.)

6. Now restart the service vlcm-xserver by using the command systemctl restart vlcm-xserver.

7. Next, check the status of the vlcm-xserver service by using the command systemctl status vlcm-xserver. It will tell you whether the service is active or not.

8. Finally, check the certificate again in the web browser when accessing the vLCM home page. Now it should use your signed certificate, just like the picture below.
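In case you still need to generate the private key and CSR for your signed certificate, here is a minimal openssl sketch (the hostname in the subject is just an example from my lab, adjust it to your environment and CA requirements):

openssl req -new -newkey rsa:2048 -nodes -keyout server.key -out server.csr -subj "/CN=vlcm.flexlab.local"

Submit server.csr to your CA, then upload the resulting server.crt together with server.key as described in step 5.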

So that’s all. Pretty easy right 😉

vRA 7.3 High Availability Improvements

With the latest release of vRealize Automation 7.3, the following very nice high availability improvements have been introduced.

vPostgres Database Automatic Failover

IaaS Manager Server Service Automatic Failover

These important improvements significantly help to make vRA even more robust and stable than ever before.

This article introduces these two new cool features.

vPostgres Database Automatic Failover

For the people who didn’t know, with the introduction of vRA 7.0, VMware introduced the feature “Synchronous vPostgres replication” on top of the Asynchronous vPostgres replication feature. An important side note is that Synchronous vPostgres replication can only be enabled when three vRealize Automation virtual appliances have been configured as a three-node vRA cluster. Unfortunately, both vPostgres replication flavours required a manual failover of the vPostgres Database.

Now with the 7.3 release of vRealize Automation, VMware enhances the Synchronous vPostgres replication flavour by introducing automatic failover.

But what configuration tasks need to be addressed to take advantage of this new feature? That’s the best part: it is just a simple click of a button 😉

  1. Configure three vRA virtual appliances as a three node cluster.
  2. Login to the VAMI interface of one of the three vRA virtual appliances and go to vRA Settings –> Database.
  3. Click on the Sync Mode Button.
  4. Done!

IaaS Manager Server Service Automatic Failover

The IaaS Manager Server Service has always been there as an active / passive component with a manual failover mechanism.

With the 7.3 release of vRealize Automation, VMware enhances the IaaS Manager Server Service component by introducing automatic failover.

A big difference between all previous releases of vRealize Automation and the 7.3 release is that, when using this automatic failover feature, the IaaS Manager Server Service is started simultaneously on both Windows servers. There is also no longer any need to change configuration settings on the load balancer in the case of a failure. This failover process is now fully handled at the application level.

p.s.1: Automatic Manager Server Service failover is disabled by default if you install or upgrade the Manager Service with the standard vRealize Automation Windows installer. You can use the command “python /usr/lib/vcac/tools/vami/commands/manager-service-automatic-failover ENABLE” from one of the vRA virtual appliances to enable this feature manually.

p.s.2: You can use the command “vra-command list-nodes --components” from one of the vRA virtual appliances to discover which Manager Server Service has been configured as the active node and which one as the passive node.

As always, please enjoy using this awesome product!