First things first: if all you are interested in is getting Gmvault v1.8-beta, please go here.
On the 21st of March 2013, the second day of Spring 2013, and because I waited for all the planets of the solar system to be aligned, a new version of Gmvault was released. It took me 7 months to release it because Gmvault needed to be refactored to be more manageable and because I wanted to take some time to improve the Gmvault kernel. Many new features and bug fixes have been implemented in version 1.8-beta, such as support for German, French and Japanese in labels and emails, and an export function that lets you view the emails stored by Gmvault in your favourite email client. For today, however, I have decided to focus this post on the performance improvements journey. If you want more information on the new features implemented in v1.8-beta, please refer to the download page.
So, going back to my train of thought, the Gmvault core had 2 weaknesses in my opinion: performance and internationalisation. Following a normal craftsman development cycle, I first implemented my ideas, focusing on the usability and friendliness of the tool and leaving out the performance issues. Then, as users with 300 000 emails and more started to appear, I decided to work on the performance issues.
1) Performance improvements
A Gmvault contributor (thank you, Dave Vasilesky) spotted the performance issues encountered by Gmvault early on and even provided a solution. His solution was to send batched requests to the Gmail IMAP server during a sync operation instead of querying it for every individual email, which was a simple but effective idea. I wanted to go even further and also experimented with a multi-threaded Gmvault version and a multi-process one to circumvent the GIL problem. This took me some time, but it turned out that the gain given by multi-tasking Gmvault was really small because most of the time is spent waiting on I/O from the Gmail server (how surprising). For that reason, and because error management is much more complicated in a multi-threaded environment, I decided to drop multi-tasking for the moment. Experimenting with asynchronous I/O might have been the next step, but the network calls used by IMAPClient and imaplib, the libraries Gmvault uses to talk "IMAP", are blocking calls and I didn't want to rewrite the IMAP layers at this stage.
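To illustrate the batching idea, here is a minimal sketch; it is not the actual Gmvault code. The helper names `chunked` and `fetch_in_batches` and the batch size of 500 are assumptions made for the example. The `fetch` argument stands for any callable that takes a list of message ids and returns a dictionary keyed by id, which is the general shape of IMAPClient's `fetch` call.

```python
from itertools import islice


def chunked(ids, size):
    """Yield successive lists of at most `size` ids (hypothetical helper)."""
    it = iter(ids)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def fetch_in_batches(fetch, ids, batch_size=500):
    """Call `fetch` once per batch instead of once per message.

    `fetch` is assumed to take a list of ids and return a dict keyed by id.
    With IMAPClient it could be something like
    `lambda batch: client.fetch(batch, ['FLAGS'])` (an assumption for
    illustration, not what Gmvault necessarily does).
    """
    results = {}
    for batch in chunked(ids, batch_size):
        # One network round trip covers the whole batch.
        results.update(fetch(batch))
    return results
```

With 1 200 messages and a batch size of 500, this makes 3 round trips to the server instead of 1 200, which is where the gain comes from when most of the time is spent waiting on the network.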
2) Roadmap for the future developments
I now feel that the current version (1.8-beta) of Gmvault is in a stable state and should serve users well. I might even drop the beta in a few weeks if no real problems are reported. Now, where is Gmvault going?
The next steps and features will be the following:
- Build a nice user interface around the Gmvault kernel
Even if Gmvault is quite easy to use, a better user experience is needed to allow average Gmail users to back up their mailbox. Now, should the GUI be built on Python technologies like the rest of Gmvault? I don't think so: it will have to be based on an HTML5/JS stack, which could maybe also be used for building a cloud service in the future. This will be my focus for the next few months.
- Automatic backups and scheduling
Users would like to run Gmvault daily in the background without having to think about it. Gmvault would then only back up new emails and would update the gmvault-db with modifications to already saved emails. The necessary information is now present in Gmvault to implement that feature, and it is only a matter of saving it in a more organised way. Automatic scheduling should also be added to the system somehow (delegated to the system's scheduling capabilities or not).
- View emails from the local email repository and search their content
A nice-to-have would be the ability to view email contents and search them. But wait, this could be integrated into the graphical interface, and what is currently missing in Gmvault is some sort of full-text search capability and a database to organise access to the emails. I think that this feature will appear naturally once the GUI is there. The choice of database technology is not yet definitive, but a simple Sqlite3 DB handling the email metadata (not the complete email, only part of it) might be sufficient, combined with some full-text search engine.
I have many more ideas in mind that might be added to the roadmap, but I think that with these 3 objectives my plate is already pretty full.
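To make the Sqlite3 idea from the roadmap a bit more concrete, here is a minimal sketch of what a metadata store could look like. Everything in it is an assumption made for illustration: the table name `email_meta`, its columns, the file paths and the helper functions are hypothetical and do not describe the gmvault-db layout. It only stores metadata and does a plain substring match on the subject, so a real full-text search engine would still be a separate component.

```python
import sqlite3


def open_metadata_db(path=":memory:"):
    """Open (or create) a hypothetical metadata database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS email_meta (
            gm_id     INTEGER PRIMARY KEY,  -- id of the email (assumed key)
            subject   TEXT,
            sender    TEXT,
            labels    TEXT,                 -- comma-separated label names
            file_path TEXT                  -- where the full email is stored
        )
        """
    )
    return conn


def index_email(conn, gm_id, subject, sender, labels, file_path):
    """Insert or update the metadata of one email."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO email_meta VALUES (?, ?, ?, ?, ?)",
            (gm_id, subject, sender, ",".join(labels), file_path),
        )


def find_by_subject(conn, text):
    """Return (gm_id, file_path) pairs whose subject contains `text`."""
    rows = conn.execute(
        "SELECT gm_id, file_path FROM email_meta "
        "WHERE subject LIKE ? ORDER BY gm_id",
        ("%" + text + "%",),
    )
    return rows.fetchall()
```

A viewer could then look up matching emails in the database and open the full message from `file_path`, keeping the database itself small.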
3) Help me to revamp the Gmvault site
The site is one year old and needs to be revamped, but I am not a website designer, so if you wish to help in that matter, please contact me.