
Liferay SaaS solution – handling multi-tenancy

Liferay SaaS Overview

While setting up an installation of Liferay Portal (6.0.6 GA) for the company I currently work for, there were several requirements we had to fulfill. The company develops a Liferay SaaS solution supporting various business processes, implemented as portlets deployed in the Liferay Portal. The back-end database chosen was MySQL.

As with any SaaS solution, one of the key concerns is scalability, and the project requirements stated that the solution should scale to support around 15,000 clients. Each client is a different company, which in turn can have up to 100 users. This means a maximum of 1.5 million users could be logged in at a time.

Another requirement was proper data separation. To achieve this, we looked into the possible options (as described by various sources, e.g. Multi-Tenant Data Architecture):

  1. Separate database instances
  2. Shared database, separated database schemas
  3. Shared database and shared schema

As our potential clients are mainly medium to large companies with high expectations regarding data separation, we ruled out option 3: relying on the business logic layer to separate data belonging to different clients was too risky.

Given the large number of expected clients, option 1 would mean very high infrastructure costs, pushing the price of the SaaS to unacceptable levels.

Thus, option 2 was the best choice for this scenario: it offers very good data separation that relies on the database engine's security, at a reasonable cost. It also has the advantage that backups and restores can be made per client (as opposed to option 3, which would have required extensive work and carried more risk).
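
To picture option 2 at the database level, here is a minimal MySQL sketch (schema, account and file names are placeholders, not the ones we actually used): each client gets its own schema and its own database account, so separation is enforced by the database engine and each schema can be dumped and restored independently.

  -- one schema and one database account per client (names are placeholders)
  CREATE SCHEMA client_0001 CHARACTER SET utf8;
  CREATE USER 'app_client_0001'@'%' IDENTIFIED BY 'secret';
  GRANT ALL PRIVILEGES ON client_0001.* TO 'app_client_0001'@'%';

  -- backups and restores can then be done per client, for example:
  --   mysqldump -u backup_user -p client_0001 > client_0001.sql
  --   mysql -u backup_user -p client_0001 < client_0001.sql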

Liferay sharding

Liferay offers the possibility to store portal instance data in different databases. These databases are called shards. To set up Liferay to use shards, we followed the instructions offered on the Liferay website here.

While setting up shards, we gained insight into the inner workings of the Liferay architecture. Liferay uses Spring and Hibernate, and its configuration resides in files stored in several places.

As we opted for the shared database / separate schemas solution, we decided to configure Liferay to use sharding and to connect each shard to a different schema in our MySQL database. This worked fine for 50 shards with nothing more than the setup indicated by the Liferay documentation. Everything, that is, except the Jackrabbit Document Library, which still stored its data in only one schema of the database. In our setup this was unacceptable: we did not want all the client documents mixed together, for security reasons as well as for backup/restore reasons. We solved this by creating a hook, as described below in the section Liferay shards for Jackrabbit.
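
For reference, the shard setup described in the Liferay documentation boils down to portal-ext.properties entries of roughly this shape (shard names, schema names and credentials below are placeholders, not our real values):

  # shards the portal knows about
  shard.available.names=default,one,two

  # how new portal instances get assigned to shards
  shard.selector=com.liferay.portal.dao.shard.RoundRobinShardSelector

  # one JDBC data source per shard, each pointing to a different MySQL schema
  jdbc.default.driverClassName=com.mysql.jdbc.Driver
  jdbc.default.url=jdbc:mysql://dbhost/lportal?useUnicode=true&characterEncoding=UTF-8
  jdbc.default.username=lportal
  jdbc.default.password=secret

  jdbc.one.driverClassName=com.mysql.jdbc.Driver
  jdbc.one.url=jdbc:mysql://dbhost/lportal_one?useUnicode=true&characterEncoding=UTF-8
  jdbc.one.username=lportal
  jdbc.one.password=secret

  # the documentation also has you enable META-INF/shard-data-source-spring.xml
  # in the spring.configs property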

However, we noticed that upon starting up, the portal connected to each and every database we had configured. This was OK for 50 shards, but for the 15,000 shards we needed it meant that startup would take forever (actually a couple of hours, but that's the same thing). It also didn't make sense to keep that behavior, since it consumed a lot of resources even when no users were logged in on any of the instances/shards.

So our goal became to make the system aware of all the databases and schemas needed to store the data for the 15,000 clients, but to connect to a shard only when a user actually accessed the corresponding portal instance. In practice, when a portal instance is accessed, the system initializes the connection pool allocated for that specific instance. This was a performance trade-off we accepted: creating the first connection to a shard takes a bit more time than having the connection pool already initialized.
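
Conceptually, the lazy behavior looks like the Java sketch below. This is not Liferay's internal code and not our actual implementation; the class, the c3p0 pool and the naming scheme are only illustrative assumptions.

  import java.util.concurrent.ConcurrentHashMap;
  import java.util.concurrent.ConcurrentMap;
  import javax.sql.DataSource;
  import com.mchange.v2.c3p0.ComboPooledDataSource;

  // Conceptual sketch: resolve a shard's connection pool on first access
  // instead of initializing all 15,000 pools at portal startup.
  public class LazyShardDataSourceRegistry {

      private final ConcurrentMap<String, DataSource> pools = new ConcurrentHashMap<>();

      public DataSource getDataSource(long companyId) {
          String shardName = resolveShardName(companyId);
          return pools.computeIfAbsent(shardName, this::createPool);
      }

      private DataSource createPool(String shardName) {
          // Each shard maps to its own MySQL schema; values are placeholders.
          ComboPooledDataSource ds = new ComboPooledDataSource();
          ds.setJdbcUrl("jdbc:mysql://dbhost/" + shardName);
          ds.setUser("lportal");
          ds.setPassword("secret");
          return ds;
      }

      private String resolveShardName(long companyId) {
          // In our setup the portal instance (companyId) determines the shard/schema.
          return "lportal_shard_" + companyId;
      }
  }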

Liferay shards for Jackrabbit

I will not detail this in the current post, and will try to do it in another post if there is interest. We absolutely needed the document repository to be split into one database per instance, for data separation and backup/restore purposes.

To achieve this, we created our own JCR hook implementation, which we registered in portal-ext.properties via the “dl.hook.impl” property. This hook uses 15,000 different repository configurations and selects the right one depending on the “companyId”, which is actually the portal instance identifier.
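
The sketch below is not our actual hook and does not reproduce Liferay's hook interface; it only illustrates the idea of resolving a per-instance Jackrabbit repository configuration from the companyId (class, method and path names are made up for illustration).

  // Illustration only: one Jackrabbit repository configuration per portal
  // instance (companyId), so each client's documents live in their own
  // repository backed by their own schema.
  public class PerCompanyRepositorySettings {

      public static class RepositorySettings {
          public final String configFile; // this instance's repository.xml
          public final String homeDir;    // this instance's repository home directory

          public RepositorySettings(String configFile, String homeDir) {
              this.configFile = configFile;
              this.homeDir = homeDir;
          }
      }

      private final String baseDir;

      public PerCompanyRepositorySettings(String baseDir) {
          this.baseDir = baseDir;
      }

      public RepositorySettings forCompany(long companyId) {
          // Hypothetical layout: one repository.xml and home directory per
          // instance, each pointing Jackrabbit's persistence manager at that
          // instance's database schema (e.g. via RepositoryConfig.create).
          String instanceDir = baseDir + "/" + companyId;
          return new RepositorySettings(instanceDir + "/repository.xml", instanceDir + "/home");
      }
  }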

See technical details on how to configure Liferay for multi-tenancy.

John Negoita

I'm a Java programmer; I've been into programming since 1999 and I'm having tons of fun with it.

16 Comments

  1. Patrick Wolf, August 8, 2013

    Hi John, thank you for sharing your experience of multi-tenancy with Liferay, which is fantastic for me as I am working on a similar project and figuring out how I could achieve it. Searching for a scalable solution, I came across your two blog posts on this topic, and your solution for separating data sounds smart.
    For multi-tenancy, I first thought of a solution like option 3, which you rejected as requiring extensive work and carrying risks. I thought it was the simplest and quickest to implement. What do you mean by extensive work and risky? Could you please spell it out? Thanks again.

    1. John Negoita, August 9, 2013

      Hi Patrick,

      glad you found this useful. Option 3 involves shared database and shared schema. So, only one set of tables with all the data from all clients.

      First of all, our clients would not accept such a setup. It's risky because data is separated only by the business logic implemented in the portal and our portlets. It's absolutely crucial for our clients that the data they put in is visible only to them and not to other clients. The portal implementation is quite safe, but not perfect. I can tell you that we had a security audit and there were some issues found that we had to fix in the portal implementation.

      Also, I mentioned that it's more work with a shared database + schema because by contract we need to be able to back up/restore the database without losing more than 24 hours of data per client. I would rather have separate backups per client than have everything in one big backup file. If one client says they've made a mistake and wants to revert to point X, I would not want to disappoint all the other clients by reverting all of them to point X. Achieving this in a shared database/schema setup would not be impossible, but it would take a lot of work on our side to extract each client's data separately and to restore the database partially when a client asks for it.

      hope this makes sense to you,
      let me know how your project goes and maybe we can help each other by exchanging information,

  2. Patrick Wolf, August 12, 2013

    Hi John,
    Thank you very much for your swift reply. This makes sense and you are actually right. Option 2 is the best alternative in this case.

    The project I am working on has not started yet; I am only experimenting. I have already built a basic infrastructure in the cloud with Apache, mod_jk as the connector between Apache and Tomcat/Liferay, and MySQL. Apache delivers static content. I created only one portal instance to see how things are working. This is quite OK, as you can imagine. Then I wondered how things would go with thousands of portal instances, each one owned by a tenant/client.

    Then came the question you raise: how can I handle it in a smart way? After that, another question which seems trivial came to my mind: “how can I manage tenants’ registration?”. I assume that each tenant would like their own domain name, distinct from other tenants’ domain names. This means creating a virtual host in Apache and in Liferay through the Control Panel, plus adding a shard, doing all the configuration, perhaps modifying the hooks, and then restarting the whole thing.
    Restarting means a service interruption every time a new client comes in. This is unacceptable. So now my problem is, first, implementing sharding, which I think will not be a piece of cake, and then finding a solution to automate clients’ registration without stopping the service, which should be available 24/7. This, I think, can be achieved with a failover mechanism, i.e. two load balancers, one in front of Apache and one in front of Tomcat/Liferay. While one member is stopped, traffic is redirected to the other one, and vice versa. I wonder how you handled this?

    Thank you again for your clear and sensible explanations.

    1. John Negoita, August 13, 2013

      Hi Patrick,

      this all sounds too familiar to me. We already have a smaller setup with 50 shards in production, and we set it up with nginx instead of Apache as the front end, then 2 Liferay nodes on Tomcat and one MySQL server. nginx took care of both load balancing and failover, but we realized that something is probably wrong in the clustering setup, because from time to time our clients reported that they were not able to log in. This was solved by stopping one of the nodes, so we are now reviewing the clustering configuration. I would be very grateful if you could share your clustering setup.
      Regarding registration and shard creation, we have a half manual / half automated process for it, but the good news is that you don’t have to restart the server. One strategy would be to create empty instances upfront and create your own registration portlet that assigns users to the existing instances. We use the same domain for all clients but with different aliases. The aliases have to be registered in the DNS though, and one challenge for us is that our provider does not have an API for managing DNS, so we have to do it manually.

      I can help you with information about how to create the registration portlet, and even with the code we use to programmatically create and initialize instances. I would appreciate any know-how, materials and examples on setting up the clustering.

  3. Patrick Wolf, August 14, 2013

    Hi John,

    Thanks a lot for your clues which will help me.

    Regarding your clients’ problem of being unable to sign in, it is very strange and difficult for me to give you a straightforward solution.
    I set up a Liferay cluster on VMware ESX instances for another project. I actually blindly followed the Liferay documentation on cluster configuration (http://www.liferay.com/fr/documentation/liferay-portal/6.1/user-guide/-/ai/liferay-clusteri-2). First, you have to be sure that multicasting is enabled in your network setup for the two cluster members. The portal uses multicast to send messages to all cluster members.
    Then you have to add some extra properties in your portal-ext.properties file regarding Ehcache replication (if you are using the default Ehcache) and other properties for Lucene index replication. Everything is explained in the Liferay documentation, but I assume you have already had a look at it.
    I don’t know the nginx HTTP server. I am used to Apache and stick to it. For the clustering project, I used the mod_proxy connector because it is already part of the Apache modules. I plan to switch to mod_jk, which is a bit more complicated to configure but much more flexible and faster. So I cannot be of much help on the nginx side.
    There is no way to attach files here, but I can provide you with snippets of the Liferay cluster configuration and a step-by-step setup guide.
    I also wonder whether the fact that your clients sometimes cannot connect when both nodes are enabled is a side effect of the sharding? It is just an assumption. Did you have a look at the catalina.out file when a client cannot connect, and at your nginx and MySQL logs at the same time? Or can you reproduce it in your test or pre-production environment?
    Did you check the host configuration on both nodes (server.xml connector configuration for the cluster)?
    I am sorry that I cannot help much. There may be many different root causes to your problem.

  4. Patrick Wolf, August 18, 2013

    Hi John,

    The way you implemented a SaaS solution with Liferay is very interesting.

    Regarding cluster configuration, I blindly followed the Liferay documentation, which I think you may have already read.

    I don’t know the nginx HTTP server at all. Which connector module does it use:
    mod_jk, or mod_proxy_ajp together with mod_proxy_balancer and mod_proxy?

    Apache comes with mod_proxy out of the box. This is the simplest to configure. However, mod_jk is more flexible, offers more capabilities and is faster, but is a little bit trickier to configure.

    Did you enable multicasting on your network interface? This is a requirement of Liferay for session and cache replication. If this is not possible because you have your platform at a cloud provider, which usually forbids multicasting, you can get it working over unicast by setting up RMI or JGroups.
    I have not had to use this alternative yet. In your web.xml file, did you add the <distributable/> tag?
    Another way is to set the equivalent distributable option on the Context element in the context.xml file, and do the same for the other nodes.

    Sorry, I would like to help you more but it is not easy as your problem may depend on your environment settings. When a user fails to sign in, do you have any errors in the log of nginx or Tomcat?

    If you need examples of how I configured clustering, I can provide them.

    1. John Negoita, August 19, 2013

      Hi Patrick,

      thank you for the information. I believe our configuration is missing the Ehcache configuration altogether. I will have someone review the configuration this week. I will send you an email, if that’s OK with you, so we can communicate more easily and maybe exchange configuration files. Thanks.

  5. Patrick Wolf, September 2, 2013

    Hi John,

    I am very sorry for my late posting. I hope you managed to configure ehcache and solve your problem with login. If you still have some difficulties, don’t hesitate to drop me an email.

    1. John Negoita, September 20, 2013

      Hi Patrick,

      with the holidays and some of the issues we’ve encountered after coming back, we’ve left the clustering setup aside for the moment. We will pick it up again in a couple of weeks hopefully.

  6. […] Liferay usually manages multiple instances. If you have multiple clients hosted on your installation then you probably set up a new instance for each client. Also, Liferay can be configured for using multiple shards. That means that it can separate the data for each instance (client) on a different database. If you would like to find out more about how you can configure a multi sharded environment see my post Liferay Saas solution – handling multi-tenancy. […]

  7. […] If you are interested in specifics for the persistence layer, and especially the way that Liferay handles a multi-tenant architecture here’s a post that discusses Liferay multi-tenancy configuration. […]

  8. Lauri Pietarinen, June 11, 2014

    Here is another alternative that positions itself between alternatives 2 and 3:

    – create a shared schema and shared tables, but separate tenants using views:

    Here is a quick example (using one table):

    create table base.users (tenantid int, userid char(10), username char(50) etc…);

    create view tenant1.users as select * from base.users where tenantid = 1;
    create view tenant2.users as select * from base.users where tenantid = 2;
    etc…
    create view tenant100.users as select * from base.users where tenantid = 100;

    Now use sharding as in alternative 2, but with only one set of “real” tables. The shards will be isolated, but the base table will be the same, which, in my opinion will simplify backups and maintenance. It is not trivial, in my opinion, to manage 15000 x n tables.
    The shards will only see their own stuff.

    Restoring tenant-wise would be simple enough, since all rows can be identified by tenant and surgically removed and restored from temporary backup tables.

    Or…

    have some kind of hybrid combination: probably most of the 15K customers will be small, with some larger ones among them. Create a common base schema for the little guys and a separate set of tables for the big guys.

    Lauri

  9. Pramod Gajre, November 2, 2014

    Hi John,

    Is it possible to achieve schema-level multi-tenancy (shared database, separate database schemas) using a PostgreSQL DB?

    Thanks,
    Pramod

    1. John Negoita, February 16, 2015

      Hi Pramod,

      in theory any JDBC data source can be used with this setup; however, I did not test it with a PostgreSQL DB. Let me know if you do, and whether you run into any issues.

  10. […] name of the associated shard; if you don’t know what they are, you can read more about Liferay shards (Liferay sharding paragraph). Basically Liferay shards are logical or physical database areas […]

  11. […] I encountered the concept of materialized views when working on a small dashboarding project on Liferay. The dashboard was supposed to display statistics about an application’s user data. The challenge was that the application used internally several hundred MySQL databases (schemas). Each schema had the same tables and structure, but different data corresponding to a different client. If you want to know more about the details of this, read my article on handling multi-tenancy in Liferay. […]

