Real Time Collaboration in Alfresco
One of the challenges of working remotely is collaborating without stepping on each other’s toes. When creating and collaborating on documents, you need to ensure that the version you have is up to date, and that any new version you upload is still based on the latest one by the time you save your changes. Alfresco is great when this is a purely serial process: one person locks a document to work on it, then, when finished, notifies the next person, incrementing the version.
But what if, instead, you need multiple people to work on the same document, at the same time? What if you wanted to see those changes, as they were typing? Enter Real-Time Collaborative Editing.
Real-Time Collaborative Editing
At Parashift, we work remotely all the time, which makes collaborative editing a great problem to solve for us and our clients. Our clients have often asked for a real-time collaborative editing capability in Alfresco, and it’s one we have spent a lot of time researching to find the right fit. Being able to work simultaneously on a proposal or a design document is a great addition to any ECM platform. Alfresco’s core value is not in editing content but in managing it, leaving editing to existing tools, which means that there is a void to be filled.
With the challenge identified, and after a few discussions internally, we set out to find a solution to meet the following requirements:
- Must allow more than one person to edit a document at a given time
- Must support office documents or conversion to them
- Must be easy to integrate or have an existing integration into Alfresco
- Should be Open Source Software
- Should ideally run in a web browser
We found three applications that may solve this challenge: Etherpad-Lite, Google Docs and OnlyOffice. We decided to give each of these a test drive, and our findings are below.
A brief history of real-time web editors
In 2009, Google Wave was released to a relatively lukewarm reception and was decommissioned a year later. Google had acquired EtherPad to work on Google Wave, and shortly after released their EtherPad editor as open source. In traditional Open Source fashion, the community gathered to ensure this tech would remain accessible, releasing Etherpad-Lite, built on top of NodeJS. Google Wave also lives on as an Open Source project, as Apache Wave.
Google continued to pursue real-time web editing, releasing Google Docs which, coupled with Google Drive, is a powerful offering.
OnlyOffice, an open source office suite, and a competitor to Google Docs, has also recently added real-time collaborative capabilities.
We tried out each of the three applications as a fit for our requirements and, along the way, created a couple of integrations into Alfresco. We’ve released these two integrations, so rather than taking our word for it, you can try them out for yourself!
Google Docs was our first stop, as there is an existing integration into Alfresco, so the barrier of entry was greatly reduced.
We found the experience pleasant overall, but there were a few quirks:
- Having to link your Google account via OAuth can be cumbersome, and during testing it broke once or twice, forcing us to manually delete some hidden settings folders to reset a couple of users. Versions released since our trial have resolved this, but the initial setup step of linking Alfresco accounts to Google via OAuth is still required.
- Google Docs is a SaaS offering, which means that if you require everything to be on-premise, this is not the right fit for you. If you can accept that some files may live on Google’s servers, then this shouldn’t be a concern, but for the more security conscious it may not be entirely appropriate.
The only real show stopper we found is that Google Docs disables real-time collaboration when embedded into an external application, such as Alfresco. You are free to edit a document, but a second or third person may not collaborate live with you. This means that the only real benefit you get is an in-browser editor.
Etherpad-Lite, or just Etherpad, is a very active community-driven project, with great API documentation and a host of existing modules that enhance the vanilla experience. Built on NodeJS, you can get an instance up and running in no time. Being open source, it’s a pleasure to integrate with.
Etherpad’s main constraint is that it only supports HTML and text documents. Displaying and editing office documents inside a browser is a very hard challenge, so focusing on simpler web-first formats to reduce the development effort seems like a reasonable tradeoff. For our use case, though, this means we need extra steps to convert the HTML and text into a more appropriate format, which adds overhead and can introduce errors during conversion.
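As a rough illustration of the first of those extra steps, here is a minimal Python sketch of pulling a pad's HTML out of Etherpad through its HTTP API before handing it to a converter. It assumes the `getHTML` call from version 1 of the Etherpad HTTP API and its usual `code`/`message`/`data` response envelope; the base URL, API key and pad ID are placeholders, and this is not necessarily how our integration does it.

```python
import json
from urllib.parse import urlencode


def build_get_html_url(base_url: str, api_key: str, pad_id: str) -> str:
    """Build the request URL for Etherpad's getHTML API call."""
    query = urlencode({"apikey": api_key, "padID": pad_id})
    return f"{base_url.rstrip('/')}/api/1/getHTML?{query}"


def extract_html(response_body: str) -> str:
    """Pull the pad HTML out of an Etherpad API response envelope."""
    payload = json.loads(response_body)
    # Etherpad reports success with code 0; anything else is an error.
    if payload.get("code") != 0:
        raise ValueError(f"Etherpad API error: {payload.get('message')}")
    return payload["data"]["html"]
```

The HTML returned by `extract_html` would still need to go through a separate conversion step to become an office document, which is where the overhead and conversion errors mentioned above come in.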
If you’re happy to use simple formatting and use it more as a scratch pad/ideas tool, then Etherpad is a great fit. Maybe you want to collaborate on a blog or come up with an email disclaimer, but still want the ECM benefits such as approval or version control, the best of both worlds. If you need to have more powerful options around the formatting of documents, as we did, then this solution may be a bit lite.
While trialling Etherpad, we created an integration into Alfresco which can be found on Github.
The main challenge of the integration is that we chose to use Alfresco Share as a proxy. This decision allows us to run an Etherpad instance on loopback and serve clients directly from Share. This approach ended up exposing a few rough edges in Spring Surf, which was not really designed for this purpose:
- The CSRF Filter and POST requests: Without completely disabling CSRF protection or completely replacing the share security config with your own, updates to Etherpad are blocked by default. This makes it a bit harder to configure. In a perfect world, we would have a drop-in module, without compromising on security.
- Spring Surf doesn’t handle proxying too well: The Spring Proxy controller makes assumptions about URLs which are not entirely valid. We ended up rolling our own proxy controller to handle URLs in a more sensible way, but solving that bug caused another. So our latest version only enables the Etherpad proxy on certain URLs, to prevent it from causing regressions in the future.
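To make the idea of only enabling the proxy on certain URLs concrete, here is a small Python sketch of an allowlist check. The real module lives inside Share and is written against Spring Surf, so this is only an illustration of the technique; the prefix in `ALLOWED_PREFIXES` and the function name `should_proxy` are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical prefix: only requests under this path are handed to Etherpad.
ALLOWED_PREFIXES = ("/share/proxy/etherpad/",)


def should_proxy(url: str) -> bool:
    """Return True only if the request path falls under an allowed prefix."""
    # Normalise first so "../" segments cannot escape the allowed prefix.
    path = posixpath.normpath(urlsplit(url).path or "/")
    return any(
        path == prefix.rstrip("/") or path.startswith(prefix)
        for prefix in ALLOWED_PREFIXES
    )
```

Everything that fails the check falls through to the stock behaviour, which is what keeps the custom proxy from affecting the rest of Share.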
OnlyOffice was the last of the three we investigated. The benefit of OnlyOffice is that it is very similar in functionality to Google Docs: an online editor that supports office-like documents. With the release of a community version of the server under the AGPL, it is Open Source, which is a great plus. You can deploy it on-premise, and to make that easier, they provide a pre-made Docker image.
Their real-time collaboration is handled slightly differently to Google Docs and Etherpad: A section of the document is locked per user. So while you don’t see anyone actually typing, you still are notified that someone else is editing the document. This has the added benefit that your position doesn’t keep shifting if someone above you is writing lengthy paragraphs.
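The per-user section locking described above can be pictured with a toy Python model. This is not OnlyOffice's implementation, just a sketch of the general idea under the assumption that a document is split into sections identified by an ID; the class and method names are made up for illustration.

```python
from typing import Dict, Optional


class SectionLocks:
    """Toy model of per-section locks: one editor per section at a time."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}  # section id -> user holding the lock

    def acquire(self, section_id: str, user: str) -> bool:
        """Lock a section for a user; fails if someone else already holds it."""
        owner = self._owners.setdefault(section_id, user)
        return owner == user

    def release(self, section_id: str, user: str) -> bool:
        """Release a section; only the current owner may do so."""
        if self._owners.get(section_id) == user:
            del self._owners[section_id]
            return True
        return False

    def owner(self, section_id: str) -> Optional[str]:
        """Who is editing this section, if anyone (used to notify other users)."""
        return self._owners.get(section_id)
```

Two people can work in different sections at once, while a second person trying to edit a locked section is told who holds it rather than seeing characters appear as they are typed.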
OnlyOffice supports most of the common office document formats. The actual editing and display of documents is on par with Google Docs, allowing for even more complex formats in some cases, as it uses LibreOffice to do the conversion. This is a good fit, as ideally we’d never have to leave the browser to make changes or collaborate, which is what OnlyOffice allows us to do. It also means that starting to edit a document takes a single button press. Coupled with Alfresco’s version control and Share interface, this makes quite a powerful solution.
We built the integration into Alfresco during one of our monthly Hackathons. Like the Etherpad integration, it can be found on Github. Due to the interaction with the document server, this turned out to be a much more complex integration than Etherpad:
- Relies on a URL Callback: You need to make sure that the callback URL is accessible from the document server.
- Must live on its own URL at the root directory: This means it is a bit of a harder setup and it can’t be proxied or share the same domain as Alfresco.
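To show what the URL callback involves, here is a minimal Python sketch of the logic a callback handler needs. It assumes the callback format documented for the OnlyOffice Document Server API, where the server POSTs a JSON body with a `status` field (2 when the document is ready for saving, 6 for a forced save while editing continues) and a `url` to download the result from, and expects `{"error": 0}` in reply. The function name is hypothetical, and the actual integration is wired into Alfresco rather than being a standalone function.

```python
import json
from typing import Optional, Tuple

# Statuses at which the document server offers an edited file for download
# (assumed from the OnlyOffice callback documentation).
SAVE_STATUSES = {2, 6}


def handle_callback(body: str) -> Tuple[Optional[str], str]:
    """Return (url of the edited document to fetch, JSON reply to send back)."""
    payload = json.loads(body)
    status = payload.get("status")
    # Only save statuses carry a usable download URL; others are informational.
    download_url = payload.get("url") if status in SAVE_STATUSES else None
    # The document server expects this acknowledgement on every callback.
    return download_url, json.dumps({"error": 0})
```

When a download URL is returned, the caller would fetch the file and store it as a new version of the document. This also shows why the callback URL must be reachable from the document server: the save is initiated from that side, not from the browser.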
Of the three applications we tried, we found that OnlyOffice most closely matched our requirements. This isn’t to say that the others are not viable solutions, but they fell short of what we were after. The next stage is to enhance the module and find more uses for it. One possible expansion is removing the PDFjs preview for office documents and replacing it with an embedded document view from OnlyOffice.
We found that being able to approach this in a modular fashion, with Alfresco acting as an ECM platform, allowed us to find a solution to a pretty common but complicated problem with complex tradeoffs. The fact that we can simplify this down to a one-button click means we are doing our job well, and allows us to increase efficiency in a great way.
Once again, we invite you to give these a shot and see for yourself. You’ll probably find, like we did, that this will change the way you work.