RedHat Openshift deployment inside a VMware Cloud Director’s organization (2).

Part 2: Install overview and high level procedure

This post is the second in a series dedicated to deploying RHOS within a multitenant environment such as a VCD org. In the first part, I outlined the architecture of the solution and the various options for storage and backup. In this second part, I move on to the practical side, detailing the steps to install the base platform, OCP, on top of which I will later add the components of the PLUS version.

The cluster is made of a series of VMs created within the VCD org, 7 to be precise: 3 Masters, 2 Workers, and 2 Infra nodes. From a licensing perspective, RedHat requires only the workers to be licensed. The number and attributes of nodes are as reported in the official documentation: a minimum of 3 for the control plane, at least 2 for the workers, and the same for the infra nodes.

Name/Role     vCPU   RAM (GB)   Storage (GB)   IP
Master01      8      17         217            x.x.x.11
Master02      8      17         217            x.x.x.12
Master03      8      17         217            x.x.x.13
Worker01      8      17         217            x.x.x.21
Worker02      8      17         217            x.x.x.22
Infra01       12     24         224            x.x.x.31
Infra02       12     24         224            x.x.x.32
Fileserver1   1      2          118            x.x.x.101
VMs to create

The same applies to resources: a shortage could prevent some pods from starting. At the same time, it is important not to lock up valuable resources unnecessarily, so the org should be created with the PAYG or FLEX model, with the allocation set to 0%. The same goes for storage, which should be thin provisioned.

Master and worker nodes have the same resources, while the infra nodes require considerably more, as they will host the four additional components of the PLUS version.

If resources are scarce, a single-node cluster installation can be considered, strictly not for production use. In that case, however, it will not be possible to add the PLUS components.

The total (“virtual”, and only minimally used) resources come to 61 vCPUs, 135 GB of RAM, and 1.7 TB of storage. For simplicity, we will put everything on the same organization network, separating the nodes by role into vApps (Master, Worker, Infra). We will also need a public IP for the console and API endpoint: one is enough, since they use different ports. In this demo, I will use a public domain whose DNS I control, so I can add the relevant “A” records for the console and the API.
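For reference, in OpenShift terms these records typically map to api.<cluster>.<domain> for the API and the *.apps.<cluster>.<domain> wildcard for the ingress (the web console is published as a route under the latter). A minimal sketch, assuming a hypothetical cluster name “ocp” and domain “example.com”:

# Hypothetical names; <public-IP> is the address NATted on the Edge
#   api.ocp.example.com      A   <public-IP>
#   *.apps.ocp.example.com   A   <public-IP>
# Quick check once the records are published:
dig +short api.ocp.example.com
dig +short console-openshift-console.apps.ocp.example.com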

The file server is a RHEL 9 VM dedicated exclusively to the “file server” role. We will proceed with the following steps:

  • Setup of an organization in VCD
  • Creation of empty VMs
  • Creation of the cluster in the RHOS console
  • Setup of the Edge NSX (for NAT and FW)
  • Creation of DNS records
  • RHOS procedure up to the download of the ISO, to be attached to the empty VMs for booting
  • Connection of the ISO and powering on the VMs
  • Completion of the cluster installation from the RHOS console
  • Installation of RHEL 9 on the file server and definition of the NFS share to be used as persistent storage (a minimal sketch of the export follows this list)
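Anticipating that last step, here is a minimal sketch of what the NFS export on the RHEL 9 file server could look like; the export path is a placeholder, the org network stays masked as in the table above, and the actual options will be settled during the installation:

dnf install -y nfs-utils
mkdir -p /export/ocp-pv                            # hypothetical export path
systemctl enable --now nfs-server
echo "/export/ocp-pv x.x.x.0/24(rw,no_root_squash,no_subtree_check)" >> /etc/exports
exportfs -ra
firewall-cmd --permanent --add-service=nfs && firewall-cmd --reload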

In the next post I’ll show step-by-step the activities above.

RedHat Openshift deployment inside a VMware Cloud Director’s organization (1).

Part 1: Architecture Overview, Persistent Storage and Backup

The use of container-based environments, specifically Kubernetes, has long ceased to be the exclusive domain of developers. Thanks to numerous advantages, which I won’t list here for reasons of space (and of scope), many traditional applications are migrating to this type of technology. VMware has been moving in this direction for several years now, offering a Kubernetes-based platform with Tanzu. However, many users require a specific platform, namely RedHat’s Openshift.

Unlike Tanzu, which is integrated into VCD via the CSE extension, Openshift requires a separate installation and management. Even though it is being installed on the world’s most widely used virtualization platform for on-premises and hybrid environments, vSphere, and on the layer above it that makes the platform multi-tenant, VCD, Openshift is deployed here through RedHat’s console-assisted procedure in its bare-metal flavor.

I’ve designed the architecture of this infrastructure in several stages. The simplest and fastest is an ad hoc, self-contained installation for a tenant within an organization. At the other end is what is called Hypershift, where I envision a cluster with shared masters and a multi-tenant management console hosted in a common “master” organization, from which each client/tenant can deploy the worker nodes (and possibly the infrastructure ones) needed for their workloads inside their own organization. I don’t consider it feasible to share infrastructure nodes, given how much each client can customize their installation. However, this type of architecture is still in its infancy.

In this series of articles, I will therefore describe the architecture and deployment of the simplest mode, the self-contained one with all nodes dedicated to the tenant, starting from the basic version of OCP (as opposed to the “Plus” version, which includes additional components I will discuss later).

A note of caution is necessary about persistent storage. By default, OCP does not require it; when it is needed, however, I have experimented with different approaches, not all of them successful.

  • Local storage: it would have been the simplest option, albeit the most resource-intensive (the storage of each node had to be replicated exactly on the other nodes, a huge waste of space, however redundant). Moreover, according to the Kubernetes documentation, the local storage provisioner does not support dynamic volume creation.
  • Gluster: The provisioner was native, but here too there was a huge waste of resources, as it required at least 2 nodes to build the cluster plus a third for the Heketi balancer.
  • FreeNAS (now TrueNAS): a free appliance, used with the NFS provisioner; some applications didn’t work because the system renamed the owner of some folders.
  • Netapp Ontap Select: A great solution, with a myriad of available features, especially indicated for those who have physical Netapp devices behind the scenes and do not want to expose them publicly. The downside: the license cost, rightly high given the numerous features available.
  • A dedicated VLAN straight from the physical storage, which however conflicts with the concept of abstraction and virtualization, and brings a long list of security considerations.
  • Deploying RH Data Foundation, which is essentially Ceph, a component of the Plus version (but can also be installed separately).
  • A file server (in my case RedHat) exporting an NFS share, plugged into the platform via the NFS provisioner. This was my choice (a minimal sketch of the provisioner setup follows this list).
    Someone might question the lack of redundancy in this choice. Not necessarily: you could build a cluster of file servers or, if we’re not talking about DR/BC but about more lenient RTO/RPO, rely on a simple backup of this server, perhaps at tighter intervals.
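As a reference, one common way to wire such a share into the cluster is the kubernetes-sigs nfs-subdir-external-provisioner Helm chart; a minimal sketch, assuming the file server’s address x.x.x.101 and a hypothetical export path:

helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm install nfs-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  --set nfs.server=x.x.x.101 \
  --set nfs.path=/export/ocp-pv \
  --set storageClass.defaultClass=true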

There’s also the possibility of adopting an S3 protocol, which I didn’t explore in this test.

Finally, since I mentioned backup, I’ll report that for data and infrastructure protection, I deemed it appropriate to work on 3 levels:

  • The basic one, where I back up all virtual machines in a “traditional” way.
  • The intermediate one, where the etcd database is the backup object, to preserve the platform infrastructure (see the sketch after this list).
  • The upper layer, and also the most delicate one, where I save all namespaces and related data, in my case through Kasten K10.
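For the intermediate level, OpenShift 4.x ships a backup script on the control-plane nodes; a minimal sketch of how it can be invoked (node name and target path are only illustrative):

oc debug node/<master-node> -- chroot /host /usr/local/bin/cluster-backup.sh /home/core/assets/backup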

I left out the backup of the actual data, but it’s not really an oversight, because it falls under the first point, where I back up all the VMs in the organization, including the file server.

With this premise, I will continue in the next article with the definition of prerequisites and step-by-step deployment of the solution within a VCD organization.

Infrastructure automation with VMware Cloud Provider Lifecycle Manager: Infrastructure as Code for a Cloud Service Provider

Infrastructure as Code (IaC) is a fundamental concept in modern IT operations, revolutionizing the way infrastructure is provisioned, managed, and scaled. When it comes to VMware products, one of the key tools for managing and provisioning infrastructure through IaC is the VMware Cloud Provider Lifecycle Manager (CPLM). This tool enables service providers to automate and streamline the deployment, operation, and management of VMware cloud solutions.

Understanding Infrastructure as Code (IaC)

IaC is the practice of managing and provisioning computing infrastructure using code and automation, ensuring consistency, repeatability, and efficiency. It involves defining infrastructure, networking, and system configurations using declarative or imperative programming, which can be version-controlled, shared, and deployed automatically.

Introduction to VMware Cloud Provider Lifecycle Manager (CPLM)

VMware Cloud Provider Lifecycle Manager is VMware’s tool for managing the lifecycle of the Cloud Provider stack, namely VMware Cloud Director, vCloud Usage Meter, the vRealize Operations Tenant App, and RabbitMQ. Through its REST API, a provider can describe environments and products as code and have CPLM deploy, configure, upgrade, and decommission them in a repeatable way.

Key Features and Benefits of VMware CPLM

1. Automated Lifecycle Management: CPLM automates the provisioning, upgrading, and management of VMware cloud solutions, reducing manual efforts and the risk of errors.

2. Consistency and Standardization: CPLM enforces consistent configurations and best practices, ensuring a standardized approach across the infrastructure.

3. Version Control: Utilizing version-controlled code, configurations, and templates, CPLM allows for tracking changes and reverting to previous versions if needed.

4. Scalability and Flexibility: The automated and programmable nature of CPLM allows for scaling infrastructure up or down based on demand, providing flexibility in resource allocation.

Leveraging IaC Principles with CPLM

  • Code Templates: Define infrastructure requirements, configurations, and settings as code templates that CPLM can consume (in practice, declarative definitions submitted through its REST API).
  • Declarative Approach: Use a declarative approach to describe the desired state of the infrastructure, allowing CPLM to ensure the actual state matches the defined state.
  • Modularization and Reusability: Modularize code into reusable components, promoting code reuse and easier maintenance of infrastructure configurations.
  • Parameterization: Parameterize configurations to allow for customization based on specific deployment requirements, making the IaC adaptable to various scenarios (an illustrative snippet follows this list).
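To make the parameterization point concrete, here is a purely illustrative definition kept as code; the field names below are placeholders and not CPLM’s documented schema:

# Illustrative only: a parameterized environment definition stored as a versionable file
cat > vcd-env.json <<'EOF'
{
  "environmentName": "prod-vcd",
  "product": "vcd",
  "version": "10.4.1",
  "nodeCount": 3,
  "networkProfile": "mgmt-dmz"
}
EOF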

Sample Workflow using VMware CPLM

Summarizing in a few schematic points (a minimal command sketch follows the list):

a. Create a code template defining the desired infrastructure and configuration.

b. Version control the code template using a suitable version control system (e.g., Git).

c. Use CPLM to deploy the infrastructure, referencing the code template and desired parameters.

d. Monitor and manage the deployed infrastructure through CPLM, making necessary updates or scaling actions through code changes and re-deployment.
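A minimal command-level sketch of steps b and c, reusing the illustrative vcd-env.json from above; the CPLM endpoint and credentials are placeholders to be taken from the CPLM API reference for your version:

# Step b: version control the definition
git init && git add vcd-env.json && git commit -m "Add VCD environment definition"
# Step c: submit it to CPLM's REST API (placeholder endpoint and credentials)
curl -k -u 'admin:<password>' -X POST "https://<cplm-appliance>/<environments-endpoint>" \
     -H "Content-Type: application/json" -d @vcd-env.json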

Conclusion

Incorporating Infrastructure as Code principles through VMware Cloud Provider Lifecycle Manager enhances the efficiency, consistency, and scalability of managing VMware-based cloud solutions. By embracing automated and programmable infrastructure provisioning, organizations can accelerate deployment processes, reduce errors, and optimize resource usage, ultimately leading to a more agile and reliable IT environment.

Data Sovereignty with GenAI in VMware Environments

In my previous post, where I expressed all my enthusiasm about GenAI applied to VMware through Nvidia, I also left the post with some concerns about data sovereignty.

Living as we do in a context where decisions are data-driven, the concepts of data sovereignty and AI are reshaping how businesses operate. GenAI, a revolutionary AI platform, now joins the game together with VMware. When these forces combine, they create a dynamic synergy that not only harnesses the power of AI but also champions the crucial principle of data sovereignty.

Data sovereignty, the idea that individuals and organizations should retain control over their data, has gained prominence in an era marked by concerns about privacy and data breaches. GenAI, with its advanced AI capabilities, integrates perfectly with this notion. When integrated into VMware environments, GenAI not only enhances AI functionalities but also respects the sovereignty of data.

In a VMware environment, GenAI acts as a catalyst for businesses to extract invaluable insights from their data. Whether it’s uncovering trends, predicting customer behavior, or optimizing processes, GenAI’s capabilities are as transformative as they are versatile. But what truly sets this collaboration apart is its commitment to data sovereignty.

GenAI empowers VMware users to maintain control over their data throughout the AI process. This means data remains secure and within the confines of an organization’s infrastructure, while GenAI’s algorithms process and generate insights. This approach aligns with the growing need for ethical and responsible AI, ensuring that sensitive data doesn’t leave the confines of a trusted environment.

As AI becomes an integral part of business operations, data sovereignty becomes a non-negotiable aspect of adopting AI technologies. This is especially true in industries where data privacy regulations are stringent. GenAI’s integration within VMware environments not only augments AI capabilities but also builds a bridge between innovation and data protection.

This collaboration speaks to a future where AI’s potential is harnessed without compromising data security. As companies look to extract value from their data, GenAI within VMware environments represents a milestone in striking a balance between innovation and data sovereignty. In this landscape, the empowered user can embrace AI’s advantages with confidence, knowing that their data remains sovereign and their insights remain transformative.

Empowering Innovation: VMware Integrates GenAI for Advanced AI Capabilities

It’s VMware Explore time! (Do you really mind if I keep calling it VMworld? 🙂 ) Unfortunately I can’t be in Vegas, but hopefully I’ll be in Barcelona. In the meantime, some of my friends attended the event and couldn’t stop talking about Generative AI (as Stephen Foskett and Tom Hollingsworth also noted in their Rundown).

So I felt the hype and started digging deeper into this symbiosis between GenAI and VMware. This is what I have learnt so far.

In the ever-evolving landscape of technology, the collaboration between two industry leaders can often lead to groundbreaking advancements. This is exemplified by the strategic integration of GenAI, a pioneering AI platform, with VMware, a global leader in cloud infrastructure and digital workspace technology. This partnership marks a significant leap forward in the accessibility and power of AI capabilities within VMware’s ecosystem.

GenAI’s integration with VMware introduces a new dimension of AI-driven functionalities to VMware’s extensive suite of products and solutions. This partnership is poised to unlock innovative opportunities for businesses seeking to harness the potential of artificial intelligence to enhance their operations, optimize processes, and drive efficiency.

The core strength of GenAI’s AI capabilities seamlessly integrates with VMware’s infrastructure. By combining GenAI’s natural language processing, image recognition, and other AI-powered features with VMware’s robust cloud infrastructure and digital workspace solutions, users can leverage AI in previously unimaginable ways.

Consider the scenario of an organization utilizing VMware’s cloud services. With the integration of GenAI, the data stored in the cloud can now be analyzed, processed, and transformed using advanced AI algorithms. This enables companies to extract valuable insights, make data-driven decisions, and uncover hidden patterns that might have remained unnoticed otherwise.

Furthermore, GenAI’s prowess in text generation can enhance communication and interaction within VMware’s digital workspace environment. Imagine automated chatbots powered by GenAI, capable of providing instant, contextually relevant responses to user inquiries. This not only improves user experience but also streamlines customer support and increases operational efficiency.

The integration also emphasizes the importance of responsible AI. GenAI’s commitment to transparency and user control aligns perfectly with VMware’s dedication to ethical AI deployment. Users within the VMware ecosystem can fine-tune AI outputs, ensuring that the technology works in harmony with their goals and values.

This collaboration is a testament to the transformative potential of AI technologies when integrated into established tech ecosystems. By harnessing GenAI’s capabilities within VMware’s infrastructure, businesses can embark on a journey of innovation and optimization that was previously unattainable.

In conclusion, the integration of GenAI within VMware’s ecosystem heralds a new era of AI-powered possibilities. The amalgamation of GenAI’s advanced AI capabilities with VMware’s cloud and digital workspace solutions opens avenues for businesses to innovate, streamline operations, and make informed decisions. As this partnership continues to evolve, it exemplifies the power of collaboration in propelling technological advancements and shaping a more efficient and AI-empowered future.

I’m still exploring the matter from a security, confidential computing, and data sovereignty perspective. I’ll be back with more; in the meantime, for all who can… enjoy VMware Explore!

Unable to assign a tag in vSphere

After a long time I’m here again, dusting off my blog.

Today I want to share an issue I ran into while trying to assign a tag to a datastore, so that VMware Cloud Director could use it within a storage policy.

This is what I found when I was trying to assign the tag:

And the related pop-up, blank:

Looking at the “Tags and Custom Attributes” section, all my tags were available to be assigned, but they didn’t show up in the previous pop-up window.

Wandering through all the links and tabs, I found that in the category related to the tag I was trying to assign, the field “Tags per object” was set to “One tag”. This means that only one tag from that category can be assigned to an object.

Changing it to “Many tags”:

made the window available and populated.

And, in the previous view, the field “Multiple Cardinality” changed to “true”:

Tagging is a very powerful tool, but it needs attention and accuracy to take full advantage of it.

VCD Cell status: “Unknown”

I’d like to share an experience that I had recently when I decided to upgrade our VCD from 10.1 to 10.3.
According to the upgrade procedures from VMware, one of the prerequisites was to have the failover mode of all the cells set to “Manual” or “Automatic”.

The cell’s status from appliance console

In my case that status was “Indeterminate”. Usually this happens when one of the cells is in a defined state and the others are in the opposite one.
I had all the cells in Manual mode, but when I sent a “GET” to the API through Postman, that condition came back as “Unknown”.
I tried to set it again with a POST but nothing changed, still unknown.
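For reference, the same check can also be made with curl against the appliance management API on port 5480; the exact path below is my assumption based on the 10.x appliance API, so verify it against your version:

curl -k -u root "https://vcd1a:5480/api/1.0.0/nodes"    # lists the nodes with their failover mode and health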
The environment was a production one, so I didn’t like the idea of redeploying that cell, even though it wasn’t the primary one.
After several VMware GSS support sessions, the last one finally resolved it.
After confirming that the DB was healthy with the commands:

root@vcd1a [ ~ ]# su - postgres
postgres [ ~ ]$ repmgr cluster crosscheck

And that the cell was in unknown state with
/opt/vmware/appliance/bin/api/replicationClusterStatus.py

SSH console

Then I checked the file
/opt/vmware/vpostgres/10/etc/repmgr.conf

on all of the cells: for some reason, part of the expected content of that file was missing on the affected cell. Its content was much shorter than on the other cells. On this cell I only had these lines:

node_id=24618
node_name='vcd1a'
conninfo='host=192.168.100.101 user=repmgr dbname=repmgr'
data_directory='/var/vmware/vpostgres/current/pgdata'
pg_bindir='/opt/vmware/vpostgres/current/bin'

repmgrd_service_start_command='sudo /usr/bin/systemctl start repmgrd'
repmgrd_service_stop_command='sudo /usr/bin/systemctl stop repmgrd'

service_start_command = 'sudo /usr/bin/systemctl start vpostgres'
service_stop_command = 'sudo /usr/bin/systemctl stop vpostgres'
service_restart_command = 'sudo /usr/bin/systemctl restart vpostgres'
service_reload_command = 'sudo /usr/bin/systemctl reload vpostgres'

A lot of other lines were missing; with support’s help I added them:

repmgrd_service_start_command='sudo /usr/bin/systemctl start repmgrd'
repmgrd_service_stop_command='sudo /usr/bin/systemctl stop repmgrd'

monitor_interval_secs=2 #The interval (in seconds, default: 2) to check the availability of the upstream node.
connection_check_type=ping #The option connection_check_type is used to select the method repmgrd uses to determine
                                #whether the upstream node is available.
                                # Possible values are:
                                # ping (default) - uses PQping() to determine server availability
                                # connection - determines server availability by attempting to make a new connection to the upstream node
                                # query - determines server availability by executing an SQL statement on the node via the existing connection
reconnect_attempts=6 #The number of attempts (default: 6) will be made to reconnect to an unreachable upstream node before
                                #initiating a failover.
                                #There will be an interval of reconnect_interval seconds between each reconnection attempt.
reconnect_interval=1 #Interval (in seconds, default: 10) between attempts to reconnect to an unreachable upstream node
                                #The number of reconnection attempts is defined by the parameter reconnect_attempts
degraded_monitoring_timeout=-1 #Interval (in seconds) after which repmgrd will terminate if either of the servers (local node
                                #and/or upstream node) being monitored is no longer available (degraded monitoring mode).
                                #-1 (default) disables this timeout completely.
failover=manual
promote_command='/opt/vmware/vpostgres/current/bin/repmgr standby promote -f /opt/vmware/vpostgres/current/etc/repmgr.conf --log-to-file'
follow_command='/opt/vmware/appliance/bin/standbyFollow.py %n'
promote_check_timeout=30
primary_visibility_consensus=true
#------------------------------------------------------------------------------
# Logging settings
#------------------------------------------------------------------------------
#
# Note that logging facility settings will only apply to `repmgrd` by default;
# `repmgr` will always write to STDERR unless the switch `--log-to-file` is
# supplied, in which case it will log to the same destination as `repmgrd`.
# This is mainly intended for those cases when `repmgr` is executed directly
# by `repmgrd`.
log_level='INFO' # Log level: possible values are DEBUG, INFO, NOTICE,
                                 # WARNING, ERROR, ALERT, CRIT or EMERG

#log_facility='STDERR' # Logging facility: possible values are STDERR, or for
                                 # syslog integration, one of LOCAL0, LOCAL1, ..., LOCAL7, USER
log_file='/var/vmware/vpostgres/current/pgdata/log/repmgr.log' # STDERR can be redirected to an arbitrary file
log_status_interval=300 # interval (in seconds) for repmgrd to log a status message

And yes, it did the trick: this cell, too, was now in Manual mode.

I can’t say why that file was truncated, whether as a result of previous upgrades or simply of some clumsy manual command. The main point is that I was now able to upgrade my VCD.

Pending activities from a VCD tenant to vCenter

Sometimes it can happen that, for several reasons, your VCD installation is unable to process and transmit activities from tenants to the underlying vCenter.
When these requests pile up, your VCD can stop talking to vCenter.
In this case, you need to intervene directly on the VCD database. ATTENTION: we’re talking about an MSSQL-backed VCD installation.


First of all, stop the cells

Second, and I should shout it, SECOND: take a snapshot of your DB server PLUS a backup of your SQL database


Third, open SQL Server Management Studio and run the following queries:
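-- Note: the statements below clear queued tasks/jobs and the cached vCenter inventory (*_inv) tables;
-- VCD repopulates them once the vCenter(s) are reconnected afterwards.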

Delete from task;
update jobs set status = 3 where status = 1;
update last_jobs set status = 3 where status = 1;
delete from busy_object;
delete from ccr_drs_host_group_host_inv;
delete from ccr_drs_host_group_inv;
delete from ccr_drs_rule_inv;
delete from ccr_drs_vm_group_inv;
delete from ccr_drs_vm_group_vm_inv;
delete from ccr_drs_vm_host_rule_inv;
delete from compute_resource_inv;
delete from custom_field_manager_inv;
delete from cluster_compute_resource_inv;
delete from datacenter_inv;
delete from datacenter_network_inv;
delete from datastore_inv;
delete from dv_portgroup_inv;
delete from dv_switch_inv;
delete from folder_inv;
delete from managed_server_inv;
delete from managed_server_datastore_inv;
delete from managed_server_network_inv;
delete from network_inv;
delete from resource_pool_inv;
delete from storage_profile_inv;
delete from storage_pod_inv;
delete from task_inv;
delete from task_activity_queue;
delete from activity;
delete from failed_cells;
delete from lock_handle;
delete from vm_inv;
delete from property_map;

After restarting the cells, you need to reconnect your vCenter(s) to resync your environment.


Be aware that this operation will not solve the underlying issue affecting communication between VCD and vCenter: everything started because of it, so as soon as your VCD is back working, go hunting for the original problem.

VMware Cloud Director – migration to 9.7 appliances. Troubleshooting.

Tonight I decided to migrate my Linux-based, MSSQL-backed VCD cells to an appliance-based installation.
I found a very thorough post here:

https://stevenonofaro.com/migrating-from-vcloud-director-9-5-with-an-sql-database-to-the-vcloud-director-9-7-appliance/

with the support of official documentation

https://docs.vmware.com/en/VMware-Cloud-Director/9.7/com.vmware.vcloud.install.doc/GUID-826F9B56-7A0D-4159-89E4-2BB522D9F603.html.

Everything went smoothly until the deployment of the standby cells.
According to the documentation


https://docs.vmware.com/en/VMware-Cloud-Director/9.7/com.vmware.vcloud.install.doc/GUID-26E41AD2-0268-4B12-9505-9F729C5EF63E.html

the only operations to perform were to follow the online procedure during the OVF deployment, specifying that this was a standby cell (the same size as the primary one).
But I realized that the VCD service didn’t start: the configuration logs showed that a password for the certificates keystore wasn’t provided, and the configuration failed.


So I restarted the configuration and, after asking me which NIC I wanted to assign to http and to consoleproxy (which I didn’t expect, since in the appliance deployment both services are provided by eth0), it asked for the private key password for both certificates, http and consoleproxy.

Don’t be scared by the colour…. it’s a customized version 🙂


So the service started and the cell was shown as active in the VCD GUI, but… its management console on port 5480 couldn’t reach the primary cell (unreachable status). The primary, on the other hand, reported this standby cell as “running”.
Digging deeper, connecting via SSH to the primary cell, the following command:

sudo -i -u postgres /opt/vmware/vpostgres/current/bin/repmgr -f /opt/vmware/vpostgres/current/etc/repmgr.conf cluster matrix

gave me the following output:

INFO: connecting to database
 Name          | Id    | 16711 | 24618 | 26034
---------------+-------+-------+-------+-------
 xx-vcd-cell01 | 24618 |   *   |   *   |   *
 xx-vcd-cell02 | 26034 |   ?   |   ?   |   ?
WARNING: following problems detected:
node 24618 inaccessible via SSH

…seems cell #2 was right.

And again, the output of the following command:
sudo -i -u postgres /opt/vmware/vpostgres/current/bin/repmgr -f /opt/vmware/vpostgres/current/etc/repmgr.conf cluster crosscheck
was:
INFO: connecting to database
 Name          | Id    | 16711 | 24618 | 26034
---------------+-------+-------+-------+-------
 xx-vcd-cell01 | 24618 |   *   |   *   |   *
 xx-vcd-cell02 | 26034 |   ?   |   ?   |   ?
WARNING: following problems detected:
node 26034 inaccessible via SSH

Same result.
From
tail -100 cell-runtime.log
I found that the cell wasn’t listed in pg_hba.conf – I expected this step to have been completed by the configuration process. So I proceeded manually, creating /opt/vmware/appliance/etc/pg_hba.d/cell-file.txt with this cell’s IP:

#TYPE DATABASE USER ADDRESS METHOD

host vcloud vcloud 192.168.3.182/24 md5

(VLAN3 is the VLAN assigned to eth1, the DB one)
but the standby cell’s console didn’t change: the primary was still unreachable.
At least the previous error no longer appeared in the log; instead, the new cell was added to the Broker Network.

In any case, I don’t understand why the system was complaining about the missing entry in pg_hba.conf while it kept periodically reconnecting to the DB: reasonably, it should be either always reachable or always unreachable!

Continuing to watch the logs, another strange behaviour: every minute an old Cell UUID (the previous one) was removed and a new one was added, over and over. Well… this is not normal, of course, or at least I think so…

Then I had a look at a post by Tom Fojta here,

https://fojta.wordpress.com/tag/cell/,

in order to make all the cells connect to each other without a password, by sharing an SSH key. But it only worked one way, from the primary to the standby and not vice versa. Same result when I executed the same commands on the standby cell: I could manually connect to the primary without a password, but the repmgr command (run as postgres) was still reporting no SSH connection. Changing the vcloud user to the postgres one in the chown command from Tom’s post didn’t work either.
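For context, the key exchange boils down to something like the following generic sketch (not the literal commands from Tom’s post; user, paths, and ownership may differ on your appliances), to be repeated from each cell towards the other:

su - postgres
mkdir -p ~/.ssh && chmod 700 ~/.ssh
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
ssh-copy-id postgres@<ip-of-the-other-cell>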

Last: the certificates again, coming back to the original error I received. I remembered that when I renewed them the last time, I had set one password for the two individual certs (http and consoleproxy) and a different one for the keystore. Looking at the configuration file I noticed, as written before, an error when verifying the password: actually, it wasn’t wrong, it was simply different from the keystore password.


The solution wasn’t to run the configure command manually (many other parameters live inside the response file); instead, I set the two private certificate passwords to be the same as the keystore password, removed the two standby cells, and redeployed them from scratch.


This time, as described in the official documentation, I didn’t need to add anything beyond the initial configuration during the OVF deployment (apart from the DB user/password section, which was already present in the response file).

Now I have a perfectly running VCD system.


About load balancing: active/standby applies to the DB role, but not to the application. So the cells can run in an active-active configuration from the application’s point of view.

Thanks to GSS for many of these hints, I learned a hard lesson from this case.

Tenant App for vROPS: Access Denied

When the Tenant App for vROPS was announced, I was really excited: the only thing this product was missing was a multitenant option. Thanks to this feature, not only did it become multitenant, it also got integrated into VCD!


Setup was more or less smooth, with some FW ports and routing to configure (remember, you also need a public IP to NAT the Tenant App to, since it will be reachable through VCD).

Every tenant has to be authorized twice, on the VCD side and on the Tenant App side.


But the first run wasn’t successful. I kept getting an “Access denied” for the enabled tenants.

This is the VCD part…
…and this, on the tenant app side

Eventually I decided to open an SR with GSS because I couldn’t get rid of this situation.
They pointed me immediately to the known issues section https://docs.vmware.com/en/Management-Packs-for-vRealize-Operations-Manager/2.4/rn/Tenant-App-24-Release-Notes.html#knownissues

Since mine was a greenfield installation, the point that solved my issue was the second one: log in to the Tenant App as admin, go to Access Management, disable the organizations that were enabled, and enable them again.
After this, it works like a charm. I only have to remember the process for the next tenants.
Important: this feature doesn’t work if you log in to VCD as a provider, only as a tenant administrator.

It opens up new offering opportunities: besides the primary role of monitoring and analyzing the tenant environment, it also makes it possible to bill resources per use.

Virtual is ethereal