Liferay SaaS Overview
While setting up an installation of Liferay Portal (GA 6.0.6) for the company where I currently work, we had several requirements to fulfill. The company develops a Liferay SaaS solution supporting various business processes, implemented as portlets deployed in the Liferay Portal. MySQL was chosen as the back-end database.
As with any SaaS solution, scalability is one of the key concerns, and the project requirements stated that the solution should scale to support around 15,000 clients. Each client is a different company, which can in turn have up to 100 users. This means a maximum of 1.5 million users could be logged in at a time.
Another requirement was proper data separation. To achieve this, we looked into the possible options (as described in various sources, e.g. Multi-Tenant Data Architecture):
1. Separate database instances
2. Shared database, separate database schemas
3. Shared database and shared schema
As our potential clients are mainly medium to large companies with high expectations regarding data separation, we ruled out option 3: relying on the business-logic layer to separate different clients' data was too risky.
Given the large number of expected clients, option 1 would have meant very high infrastructure costs, pushing the price of the SaaS to unacceptable levels.
Thus, option 2 was the best choice for this scenario: it offers very good data separation, enforced by the database engine's own security, at a reasonable cost. It also has the advantage that backups and restores can be made per client (as opposed to option 3, which would have required extensive work and carried risks).
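As a rough illustration of option 2, each client can be assigned its own MySQL schema and its own database account with privileges only on that schema, so isolation is enforced by the engine rather than by application code. The naming scheme below is our own illustration, not something mandated by Liferay or MySQL:

```java
// Illustrative schema-per-client naming: one MySQL schema and one
// database account per client, so the engine enforces the separation.
public class TenantSchema {

    // JDBC URL pointing at the client's dedicated schema
    public static String jdbcUrlFor(int clientId) {
        return String.format("jdbc:mysql://db-host/client_%05d", clientId);
    }

    // Dedicated database account, to be granted privileges
    // only on its own schema
    public static String dbUserFor(int clientId) {
        return String.format("client_%05d", clientId);
    }
}
```

With a scheme like this, a per-client backup is just a `mysqldump` of one schema, and a restore touches only that client's data.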
Liferay offers the possibility to store each portal instance's data in a different database. These databases are called shards. To set up Liferay to use shards, we followed the instructions offered by the Liferay website here.
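For reference, the shard setup is driven by portal-ext.properties. A minimal sketch with placeholder hosts and credentials follows; the property names are as documented for Liferay 6.0, but verify them against the portal.properties shipped with your version:

```properties
# Declare the shards the portal may use
shard.available.names=default,one,two

# One JDBC data source per shard
jdbc.default.driverClassName=com.mysql.jdbc.Driver
jdbc.default.url=jdbc:mysql://localhost/lportal?useUnicode=true&characterEncoding=UTF-8
jdbc.default.username=lportal
jdbc.default.password=secret

jdbc.one.driverClassName=com.mysql.jdbc.Driver
jdbc.one.url=jdbc:mysql://localhost/lportal_one?useUnicode=true&characterEncoding=UTF-8
jdbc.one.username=lportal_one
jdbc.one.password=secret

# Assign shards to new instances manually (round-robin is the default)
shard.selector=com.liferay.portal.dao.shard.ManualShardSelector
```

Note that sharding also requires the shard data source Spring configuration (META-INF/shard-data-source-spring.xml) to be included in the spring.configs property, as described in the Liferay documentation.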
While setting up shards we gained insight into the inner workings of the Liferay architecture. Liferay uses Spring and Hibernate, and all of its configuration resides in files stored in several places.
As we opted for the shared database, separate schemas solution, we configured Liferay to use sharding and connected each shard to a different schema in our MySQL database. This worked fine for 50 shards, set up exactly as indicated by the Liferay documentation. Everything worked, that is, except the Jackrabbit Document Library, which still used only one schema of the database. In our setup this was unacceptable: we did not want all clients' documents mixed together, for security reasons and also for backup/restore reasons. We solved this by creating a hook, as described below in the section Liferay shards for Jackrabbit.
However, we noticed that on startup the portal connected to each and every database we had configured. This was fine for 50 shards, but for the 15,000 shards we needed it meant that startup would take forever (actually a couple of hours, but that's about the same thing). It also made little sense to keep this behavior, as it consumed a lot of resources even when no users were logged into any of the instances/shards.
So our goal was to make the system aware of all the databases and schemas needed to store the data of the 15,000 clients, but to connect to a shard only when a user accessed the corresponding portal instance. In practice, on first access to a portal instance the system initializes the connection pool allocated to that instance. This was a performance trade-off we accepted: the first connection to a shard takes a bit longer than it would if the pool were already initialized.
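The idea can be sketched as a registry that opens a shard's connection pool only on first access. This is a simplified, self-contained sketch; the names (ShardRegistry, openPool) are ours, not Liferay's, and a stand-in object takes the place of a real JDBC pool:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch: shard connection pools are created lazily, on first
// access, instead of all being opened at portal startup.
public class ShardRegistry {

    // companyId -> already-initialized "pool" (a stand-in object here)
    private final Map<Long, Object> pools = new HashMap<Long, Object>();

    private int poolsOpened = 0;

    public synchronized Object getPool(long companyId) {
        Object pool = pools.get(companyId);
        if (pool == null) {
            // Pay the initialization cost only when this instance is first hit
            pool = openPool(companyId);
            pools.put(companyId, pool);
        }
        return pool;
    }

    public synchronized int poolsOpened() {
        return poolsOpened;
    }

    private Object openPool(long companyId) {
        poolsOpened++;
        // In the real setup this would build the JDBC connection pool for
        // the MySQL schema assigned to this companyId.
        return "pool-for-" + companyId;
    }
}
```

With 15,000 configured shards, only the pools for instances that are actually accessed ever get created, which is what keeps startup time and idle resource usage down.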
Liferay shards for Jackrabbit
I will not detail this in the current post; I will try to cover it in another post if there is interest. We absolutely needed the document repository to be split into one database per instance, for data separation and backup/restore purposes.
To achieve this we created our own JCR hook implementation and registered it in portal-ext.properties via the "dl.hook.impl" property. This hook uses 15,000 different repositories and selects the right repository configuration depending on the "companyId", which is actually the instance identifier.
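A minimal sketch of the selection idea inside such a hook: map each companyId to its own Jackrabbit repository location. The class name, method name, and base path below are illustrative assumptions, not Liferay or Jackrabbit API:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: each portal instance (companyId) gets its own
// Jackrabbit repository home, so its documents are stored separately.
public class PerInstanceRepositorySelector {

    // companyId -> repository home directory
    private final Map<Long, String> homes = new HashMap<Long, String>();

    private final String basePath;

    public PerInstanceRepositorySelector(String basePath) {
        this.basePath = basePath;
    }

    public synchronized String repositoryHomeFor(long companyId) {
        String home = homes.get(companyId);
        if (home == null) {
            // Derive a dedicated repository home per instance; the real
            // hook would also point each repository at its own database.
            home = basePath + "/repo-" + companyId;
            homes.put(companyId, home);
        }
        return home;
    }
}
```

In the real hook, the per-instance repository configuration would also direct Jackrabbit's persistence manager at the database allocated to that instance, so documents follow the same separation and backup/restore boundaries as the rest of the client's data.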
See technical details on how to configure Liferay for multi-tenancy.