First things first: if all you are interested in is getting Gmvault v1.8-beta, please go here.
On the 21st of March 2013, the second day of spring 2013 (and because I waited for all the planets of the solar system to be aligned), a new version of Gmvault was released. It took me 7 months to release it because Gmvault needed to be refactored to be more manageable and because I wanted to take some time to improve the Gmvault kernel. Many new features and bug fixes have been implemented in version 1.8-beta, such as support for German, French and Japanese in labels and emails, and an export function that lets you view your Gmvault emails in your favourite email client. For today, however, I have decided to focus this post on the performance improvement journey. If you want more information on the new features implemented in v1.8-beta, please refer to the download page.
So, going back to my train of thought, the Gmvault core had 2 weaknesses in my opinion: performance and internationalisation. Following a normal craftsman's development cycle, I first implemented my ideas, focusing on the usability and friendliness of the tool and leaving the performance issues aside. Then, as users with 300,000 emails and more started to appear, I decided to work on the performance issues.
A Gmvault contributor (thank you, Dave Vasilesky) spotted the performance issues encountered by Gmvault early on and even provided a solution. His solution was to send batched requests to the Gmail IMAP server during a sync operation instead of issuing individual requests, which was a simple but smart idea. I wanted to go even further, so I also experimented with a multi-threaded version of Gmvault and a multi-process one to circumvent the GIL problem. This took me some time, but it turned out that the gain from multi-tasking was really small because most of the time is spent waiting on I/O from the Gmail server (how surprising). For that reason, and because error management is much more complicated in a multi-threaded environment, I decided to drop multi-tasking for the moment. Experimenting with asynchronous I/O might have been the next step, but the network calls used by IMAPClient and imaplib, the libraries Gmvault uses to talk "IMAP", are blocking calls and I didn't want to rewrite the IMAP layers at this stage.
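To make the batching idea concrete, here is a minimal sketch, not the actual Gmvault code. The helper names `chunked` and `fetch_in_batches` and the batch size of 500 are assumptions made for illustration; the only thing assumed about the client is that it exposes a `fetch(ids, items)` method returning a dict keyed by message id, as `IMAPClient.fetch` does.

```python
def chunked(ids, size):
    """Yield successive slices of `ids` holding at most `size` items."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_in_batches(client, message_ids, data_items, batch_size=500):
    """Fetch `data_items` for every id, issuing one request per batch.

    `client` is assumed to expose fetch(ids, items) and to return a dict
    keyed by message id (the shape IMAPClient.fetch returns).
    The batch size is a hypothetical value, not one taken from Gmvault.
    """
    results = {}
    for batch in chunked(list(message_ids), batch_size):
        # One round trip to the server covers the whole batch.
        results.update(client.fetch(batch, data_items))
    return results
```

With 1,200 message ids and a batch size of 500, this issues 3 requests instead of 1,200, which is where the gain comes from when most of the time is spent waiting on the server.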
I now feel that the current version (1.8-beta) of Gmvault is in a stable state and should serve users well. I might even drop the beta in a few weeks if no real problems have been reported. Now, where is Gmvault going?
The next steps and features will be the following:
Even if Gmvault is quite easy to use, it is necessary to create a better user experience so that average Gmail users can back up their mailbox. Now, should the GUI be built on Python technologies like the rest of Gmvault? I don't think so: it will have to be based on an HTML5/JS stack, which could perhaps be used to build a cloud service in the future. This will be my focus for the next few months.
Users would like to run Gmvault in the background daily without having to think about it. Gmvault would then only back up new emails and update the gmvault-db with modifications to already saved emails. The necessary information is now present in Gmvault to implement that feature, and it is only a matter of saving more organised information. Automatic scheduling should also be added to the system somehow (delegated to the system scheduling capabilities or not).
I have many more ideas in mind that might be added to the road-map, but I think that with these 3 objectives my plate is already pretty full. A nice-to-have would be the ability to view email contents and search them. But wait, this could be integrated into the graphical interface; what is currently missing in Gmvault is some sort of full-text search capability and a database to organise access to the emails. I think that this feature will appear naturally once the GUI is there. The choice of database technology is not yet definitive, but a simple SQLite3 DB handling the email metadata (not the complete email, only part of it), combined with some full-text search engine, might be sufficient.
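As a rough illustration of what such a metadata database could look like, here is a sketch using Python's standard `sqlite3` module. The table layout, column names and helper functions are all hypothetical; nothing here reflects a decided Gmvault schema. The `LIKE` query is only a stand-in for a real full-text search engine.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) a small metadata index. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS emails (
               gmail_id  INTEGER PRIMARY KEY,
               subject   TEXT,
               sender    TEXT,
               labels    TEXT,
               file_path TEXT
           )"""
    )
    return conn


def add_email(conn, gmail_id, subject, sender, labels, file_path):
    """Store metadata only; the full email stays on disk at `file_path`."""
    conn.execute(
        "INSERT OR REPLACE INTO emails VALUES (?, ?, ?, ?, ?)",
        (gmail_id, subject, sender, ",".join(labels), file_path),
    )
    conn.commit()


def search_subject(conn, term):
    """Substring match on the subject (a placeholder for real full-text search)."""
    rows = conn.execute(
        "SELECT gmail_id, subject FROM emails "
        "WHERE subject LIKE ? ORDER BY gmail_id",
        ("%" + term + "%",),
    )
    return rows.fetchall()
```

Keeping only the metadata and a file path in the database keeps it small, while the emails themselves stay where the backup already put them.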
The site is one year old and needs to be revamped, but I am not a website designer, so if you wish to help in that matter please contact me.
First, if all you are interested in is getting Gmvault 1.7-beta, click here.
Gmvault: Gmail backup simply, v1.0 was released on the 7th of May 2012 on Hacker News, and since then I have been tremendously surprised by its success. Gmvault is a command-line tool built to back up your Gmail inbox to your disk and restore it as it was in any Gmail account you wish. It is full of features such as encryption, compression, automatic syncing and GTalk chat backups (and many more), while being very simple to use. It had to be simple because I wanted everybody to be able to use it: not only geeks like me, but also normal users who have a Gmail account and want to reclaim their emails from Google. Simplicity and usability are not the topic of this post, though, so let's get back to it.
There have been more than 30,000 downloads and more than 80,000 users coming to www.gmvault.org in total since the 7th of May, which is a lot considering that the only advertisement I did was to post it on Hacker News. There have also been many website articles (http://bitly.com/bundles/gaubert/5) and even YouTube videos explaining how to use it (http://bit.ly/OWN4Xc) :-)
I am very happy about this adventure and would like to pursue it by developing v2.0, which will come with a graphical interface in order to be really usable by anyone in the Gmail users' spectrum: from my Grandma to myself. But before going to v2.0, there is a mandatory stop at v1.7-beta.
v1.7-beta is the fourth version, released 3 months after v1.0-beta, and it is the first version I am starting to be satisfied with. In between there were v1.5-beta and v1.6-beta:
v1.0-beta was the proof of concept that established the name and demonstrated that there was a need for such a tool, but it was not fully ready. The backup-restore engine was already working very well, but being an experienced developer I knew I had not yet hit the performance issues you always encounter when you launch new software. I also knew that the deployment and packaging were not ready, but I wanted to release it to see if the idea would get a bit of traction (and it did!). I was in "Fuck it, ship it" mode (http://bit.ly/PQIuFE) when I did it and am very grateful for having done that.
v1.5-beta fixed the main issues of v1.0-beta: it did not work properly on Linux because of some bash script issues, and it did not uninstall cleanly on Windows. These were minor issues, but they could completely stop users from running it, so I had to fix them quickly and I released a new version less than 10 days later. In my hurry, I even broke one feature (the encryption could not be activated anymore) because I had deactivated some of my tests (so bad).
v1.6-beta started to tackle some performance issues reported by some users. In some cases, the restore operation was eating all the memory of the machine and eventually crashing it. I had to do some profiling in Python, which is not the easiest thing, and found that the issue was coming from the Python socket and ssl layer. I also worked a lot on the Mac OS X deployment as there were some issues with 32-bit machines. It is a bit tricky to make a clean Python OS X app and I will probably write a blog post about it. With that version I was closer to my goal, but users had started to request new features that needed to be added and new restore issues started to appear.
So I started to work on v1.7-beta but decided to take my time building this version as I needed to eliminate the performance issues. In special cases, the Python process could start to spin out of control and consume lots of CPU. Again I found a bug in the socket layer and had to monkey patch the socket module to fix it. Note that the bug has been reported to the Python developers, but the fix has not made it into all releases yet. It also turns out that Gmail IMAP doesn't want to ingest some of the emails it originally contained and spits them back out in error, so I created a quarantine area to stage these emails. Most of the time, these are advertising emails that contain bad characters. I also added new features such as chat backups. Chats are now stored like emails and can be restored into a special label, gmvault-chats, as we are not allowed to write into Chats. Gmail IMAP is like its own little world, as it is full of specific behaviours that do not necessarily follow the IMAP philosophy. It is compliant with IMAP, but Gmail engineers had to make some choices, for example on how to map IMAP folders to labels. This will also deserve a blog post as I have some questions for the Gmail team. I hope that v1.7-beta is in the stable state I would have liked to have from the beginning. Note that v1.0 was not at all disastrous, but I could have done a little bit better for a launch.
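The quarantine idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the Gmvault implementation: the function name `restore_with_quarantine`, the `upload` callable and the directory layout are hypothetical, and a real implementation would catch the specific IMAP error rather than a bare `Exception`.

```python
import shutil
from pathlib import Path


def restore_with_quarantine(email_files, upload, quarantine_dir):
    """Try to restore each email; move the rejected ones aside.

    `upload` is a hypothetical callable that sends raw email bytes to the
    server and raises an exception when the server refuses the message.
    """
    quarantine = Path(quarantine_dir)
    quarantine.mkdir(parents=True, exist_ok=True)
    restored, quarantined = [], []
    for email_file in map(Path, email_files):
        try:
            upload(email_file.read_bytes())
        except Exception:
            # Keep the rejected email on disk for later inspection
            # instead of aborting the whole restore.
            shutil.move(str(email_file), str(quarantine / email_file.name))
            quarantined.append(email_file.name)
        else:
            restored.append(email_file.name)
    return restored, quarantined
```

The point of the design is that one rejected email no longer stops the restore of the rest of the mailbox, and nothing is silently lost.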
So what should I do next time I release a new product?
Lessons Learned
That's it. To enjoy Gmvault v1.7-beta, please go there.
Do not hesitate to contact me via GitHub, Twitter, email or the Gmvault-Users forum if you want to report a problem or would like to suggest a nice feature.
We use a simple MoinMoin wiki at work in our team in order to record some best practices, information related to problems and so on.
I wanted to maintain a calendar of events related to the team, and Google Calendar is the right tool for that, but I didn't want to give the team a new URL to track.
So I looked at how to embed it in a MoinMoin page. For that, Google Calendar advises embedding an iframe, which also lets you show multiple calendars at once (see picture).
So I did a little Google search and ended up on the HTML.py macro for MoinMoin.
It is a pretty dumb macro, as all it does is pass the macro argument to the HTML engine that is going to render it.
The issue is that the macro is only compatible with MoinMoin versions prior to 1.6.
So I modified it to be compatible with version 1.6 onwards.
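To give an idea of how little such a macro has to do, here is a minimal pass-through sketch. It is not the HTML.py file itself: it assumes MoinMoin's classic `execute(macro, args)` macro entry point and the formatter's `rawHTML` method, and it does nothing about argument quoting or sanitising.

```python
def execute(macro, text):
    """Macro entry point: hand the macro argument to the formatter unchanged.

    Assumption: `macro.formatter.rawHTML` emits the given markup verbatim.
    """
    if not text:
        return macro.formatter.rawHTML("<!-- HTML macro: nothing to render -->")
    return macro.formatter.rawHTML(text)
```

Because the argument is emitted as raw HTML, a macro like this should only be installed on a wiki whose editors are trusted.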
It is available on github under https://github.com/gaubert/geekomotion/blob/master/HTML.py
Get the file and drop it under [MY_MOIN_MOIN_ROOT]/data/plugin/macro or in [PYTHON_DISTRIB]/lib/pythonx.x/site-packages/MoinMoin/macro.
If it works, you should have a handy page like that.
Just one additional comment: when you copy the iframe tag from the Google Calendar page (see below), remove all quotes from the iframe attributes, otherwise the MoinMoin macro system will not work.
You should insert the following HTML macro:
<<HTML("<iframe src=https://www.google.com/calendar/embed?src=koa1pe0jc7195mb51carkmtpak%40group.calendar.google.com&ctz=Europe/Berlin style=border:0 width=800 height=600 frameborder=0 scrolling=no></iframe>")>>
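If you would rather not remove the quotes by hand, a small helper along these lines can do it. This is only a sketch: the function name `strip_attribute_quotes` is made up for illustration, and it assumes the snippet uses double quotes around attribute values. It also collapses whitespace inside each value, since an unquoted HTML attribute value cannot contain a space.

```python
import re


def strip_attribute_quotes(iframe_snippet):
    """Return the iframe tag with the double quotes around attribute values removed."""
    def unquote(match):
        # "border: 0" must become border:0, otherwise the value
        # would be cut at the space once the quotes are gone.
        return re.sub(r"\s+", "", match.group(1))

    return re.sub(r'"([^"]*)"', unquote, iframe_snippet.strip())
```

For example, calling it on a tag containing `style="border: 0" width="800"` yields `style=border:0 width=800`, which can then be pasted as the macro argument.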
Enjoy !!!