
Pictures and slides from my DrupalCon 2008 talk

Hello and greetings from DrupalCon 2008 in Szeged, Hungary!

We (Thierry Manfé, Scott Mattoon and myself) are having a great time manning our booth and talking about Drupal, MySQL and Open Source@Sun with the friendly crowd of Drupal users and developers here. Sun is a Gold Sponsor of the event, and we're giving a number of sessions as well.

Today I gave my first presentation, "MySQL Backup and Security - Best practices". Unfortunately, I ran a bit short on time at the end... The slides have already been attached to the session page, so you can read up on the last few things I was going to talk about. Feel free to contact me if you have further questions!

Tomorrow I'll be talking about "High availability solutions for MySQL: An Overview and practical demo", which will include a practical demonstration of a two-node Linux cluster, performed by Jakub Suchy. In the afternoon, I will also hold a BoF session, "bzr - The Bazaar source revision control system".

I've also uploaded some pictures from the event (and some impressions of the city) to my gallery (more will follow later). Enjoy!

The key to accessing your data: MySQL Connectors and bindings for various languages

Being able to use an Open Source DBMS to manage your data is nice, but what good would it be if you couldn't easily access it from your applications? One key factor in the popularity of MySQL is probably its wide range of available language bindings, starting with support for C, PHP and Perl early on.

I've tried to gather a list of languages and their respective MySQL drivers/modules below. It's by no means complete or exhaustive, but I think I covered quite a lot of popular as well as exotic programming languages.

There are a number of connectors that are developed by the Sun Database Group (aka MySQL) itself and are ready to use:

  • Connector/ODBC - Standardized database driver for Windows, Linux, Mac OS X, and Unix platforms.
  • Connector/J - Standardized database driver for Java platforms and development.
  • Connector/Net - Standardized database driver for .NET platforms and development.
  • Connector/MXJ - MBean for embedding the MySQL server in Java applications.
  • MySQL native driver for PHP - mysqlnd - The MySQL native driver for PHP is an additional, alternative way to connect from PHP 6 to the MySQL Server 4.1 or newer.
  • libmysql - The original implementation of the MySQL Client/Server protocol (in C). This library is the basis for a large number of client libraries for other languages.

In addition to the above, there are several other connectors developed by Sun/MySQL, which are still under development:

But we're not the only ones developing language bindings for the MySQL server. There is an abundance of drivers that are developed and maintained by the community, independently of Sun/MySQL (but sometimes with support or guidance from MySQL engineers). The list below is not sorted in any particular order other than the order in which I found them over time:

I probably forgot some other drivers/bindings - if you have any more to add, please let me know!

And if you'd like to create your own implementation for your favourite language: the protocol is documented here and here. Jan's additional notes may also be helpful to get you started.
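As a rough illustration of the first thing such an implementation has to handle: every message in the MySQL client/server protocol travels in a packet with a 4-byte header, a 3-byte little-endian payload length followed by a 1-byte sequence id. The Python sketch below only frames and unframes packets; it does not implement the handshake, authentication or result-set parsing, and the helper names are hypothetical, not part of any connector's API.

```python
from typing import Tuple

MAX_PAYLOAD = 0xFFFFFF  # 2**24 - 1 bytes, the largest value a 3-byte length can hold


def build_packet(payload: bytes, sequence_id: int) -> bytes:
    """Prefix a payload with the 4-byte MySQL packet header."""
    if len(payload) >= MAX_PAYLOAD:
        # Payloads this large are split across several packets; not handled here.
        raise ValueError("payload too large for a single packet in this sketch")
    header = len(payload).to_bytes(3, "little") + bytes([sequence_id & 0xFF])
    return header + payload


def parse_packet_header(header: bytes) -> Tuple[int, int]:
    """Split a 4-byte header into (payload_length, sequence_id)."""
    if len(header) != 4:
        raise ValueError("a MySQL packet header is exactly 4 bytes")
    return int.from_bytes(header[:3], "little"), header[3]


# A COM_QUERY command: command byte 0x03 followed by the query text.
packet = build_packet(b"\x03SELECT 1", sequence_id=0)
length, seq = parse_packet_header(packet[:4])
print(length, seq)
```

The 9-byte payload here yields the header bytes 09 00 00 00; the sequence id starts at 0 for each new command and is incremented by each side as packets go back and forth.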

2008 Open Source CMS Award: two more weeks to submit your nomination!

Just a reminder that Packt Publishing is running its Open Source CMS Award again:

The Packt Open Source Content Management System Award is designed to encourage, support, recognize and reward Open Source Content Management Systems (CMS) that have been selected by a panel of judges and visitors to www.PacktPub.com. Now entering its third year, the Award has established itself as an important measure for quality and the popularity of Open Source Content Management Systems.

You have two more weeks to submit your favourite CMS in the following categories:

As in the last two years, I'll be one of the judges who have to choose among the finalists that received the most nominations during the nomination stage.

I look forward to the list of finalists - it's always interesting to find out about new developments in this area and how the established projects in this market have developed over the course of the year!

Recent additions to my openSUSE Build Service repository

I recently added two new packages to my repository on the openSUSE Build Service:

  • Maatkit is a collection of essential command-line utilities for MySQL. Each is completely stand-alone, without dependencies other than core Perl and the DBI drivers needed to connect to MySQL, and doesn't need to be "installed" - you can just execute the scripts. This makes the tools easy to use on systems where you can't install anything extra, such as customer sites or ISPs.
  • protobuf - Protocol Buffers - Google's data interchange format. Protocol Buffers are a way of encoding structured data in an efficient yet extensible format. Google uses Protocol Buffers for almost all of its internal RPC protocols and file formats.

The protobuf package is required if you want to compile Drizzle. Packages are available for openSUSE, Fedora and Mandriva Linux. Feedback is welcome!
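For readers who haven't looked at Protocol Buffers yet: much of the format's compactness comes from base-128 "varint" encoding of integers, where each byte carries 7 bits of the value and the high bit signals that more bytes follow. The hand-written Python sketch below only illustrates that part of the wire format; it is not the protobuf library's API (real code would use classes generated by protoc), and the function names are made up for this example.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint, least significant group first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        group = value & 0x7F  # lowest 7 bits
        value >>= 7
        if value:
            out.append(group | 0x80)  # high bit set: more bytes follow
        else:
            out.append(group)  # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes) -> Tuple[int, int]:
    """Decode a varint from the start of `data`; return (value, bytes_consumed)."""
    result = 0
    shift = 0
    for i, byte in enumerate(data):
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, i + 1
        shift += 7
    raise ValueError("truncated varint")


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode one varint field: key = (field_number << 3) | wire type 0, then the value."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)
```

For example, `encode_varint(300)` gives `b'\xac\x02'`, and a message whose field 1 holds the integer 150 is encoded as the three bytes `08 96 01` by `encode_varint_field(1, 150)`.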

Supporting the Software Freedom Day


Are you a member of a local Linux User Group? Or a MySQL User Group? Or any other group related to open source software? Have you heard of Software Freedom Day yet? This is a good opportunity to spread the word and showcase what OSS is all about to a wider audience. Some quotes from the Software Freedom Day website:

Software Freedom Day is a global, grassroots effort to educate the public about the importance of software freedom and the virtues and availability of Free and Open Source Software. Local teams from all over the world organise events on the third Saturday in September.
Software Freedom Day is a global celebration and education of why transparent and sustainable technologies are now more important than ever. With over 200 teams in 60 countries participating, it is a fantastic event to get your schools and communities involved in.
Go along to your local event or start your own event and meet a wide range of people, all working together to help ensure our freedoms are maintained by the technologies of tomorrow. Forming a Local SFD Team can be a fun, effective community building experience for your local user group and community.
Software Freedom Day is an outreach day where you can inspire newcomers with the values and quality of Free Software and communicate the broader issues of Software Freedom through a variety of activities of your choice. Is there something locally relevant to your country or region that you need to express? Is there some great local success story you want to tell? Software Freedom Day is your chance to stand united with the entire Free Software world with what you care about. Freedom.

Events will take place all over the world, organized by volunteers and local user groups. If you are interested in participating, here are some ideas and instructions:

MySQL European Customer Conferences 2008

As it did last year, MySQL will host three Customer Conferences in Europe, on the following dates and at the following locations:

The content differs slightly per location, but there will be sessions in two parallel tracks on a wide range of topics, including success stories from customers as well as talks on very technical/practical topics. Here are some excerpts from the agenda for the UK Event:

  • Delivering Web 2.0 Applications with MySQL and memcached (Ivan Zoratti, Head of MySQL Sales Engineering, EMEA)
  • Best Practices for Deploying MySQL on Solaris (MC Brown, Technical Writer)
  • High Scalability with Load Balancing & MySQL Proxy (Jan Kneschke, Senior Software Engineer)
  • Choosing the Right HA Solution for MySQL (Anders Karlsson, Sales Engineer)
  • MySQL Performance Under a Microscope (Tobias Asplund, Senior Instructor)
  • Backup Strategies for MySQL (Kai Voigt, Senior Instructor)

If you use MySQL in a production environment, you should seriously consider attending one of these events. The speakers all know their stuff from practical, hands-on experience "in the trenches", with no marketing fluff.

And if you register before August 31st, you will receive an early bird discount (e.g. 159 EUR instead of 199 EUR for the German event). Don't miss the opportunity!

Why is MSNBot ignoring robots.txt?

Today, the root file system on our public svn server nearly ran out of disk space. The reason? The /tmp directory was quickly filling up with temporary files created by websvn, which I had set up alongside the FishEye repository browser for testing purposes. A quick look at the Apache log files revealed the culprit: a crawler from Microsoft had gone haywire and ignored the rules in the robots.txt file, even though it had actually read the file beforehand!

Here is what robots.txt looked like (I have since changed it to disallow everything):

User-agent: *
Disallow: /fisheye/
Disallow: /websvn/

If I am not mistaken, no crawler should actually consider going into the SVN browser directories. Some snippets from the apache log:

$ grep robots.txt /var/log/apache2/access_log | grep msn
65.55.208.178 - - [03/Aug/2008:16:58:35 +0200] "GET /robots.txt HTTP/1.1" 200 53 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.212.64 - - [03/Aug/2008:19:05:55 +0200] "GET /robots.txt HTTP/1.0" 200 53 "-" "msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)"
65.55.235.139 - - [03/Aug/2008:22:14:47 +0200] "GET /robots.txt HTTP/1.0" 200 53 "-" "msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)"
65.55.25.136 - - [04/Aug/2008:00:31:32 +0200] "GET /robots.txt HTTP/1.1" 200 53 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.212.64 - - [04/Aug/2008:00:57:38 +0200] "GET /robots.txt HTTP/1.0" 200 53 "-" "msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)"
65.55.235.139 - - [04/Aug/2008:06:49:33 +0200] "GET /robots.txt HTTP/1.0" 200 53 "-" "msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)"
65.55.212.64 - - [04/Aug/2008:07:16:21 +0200] "GET /robots.txt HTTP/1.0" 200 53 "-" "msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)"
65.55.25.136 - - [04/Aug/2008:09:29:17 +0200] "GET /robots.txt HTTP/1.1" 200 53 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.104.156 - - [04/Aug/2008:11:08:24 +0200] "GET /robots.txt HTTP/1.1" 200 53 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.208.164 - - [04/Aug/2008:11:29:34 +0200] "GET /robots.txt HTTP/1.1" 200 53 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.212.64 - - [05/Aug/2008:13:30:20 +0200] "GET /robots.txt HTTP/1.0" 200 53 "-" "msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)"
65.55.208.178 - - [05/Aug/2008:16:17:59 +0200] "GET /robots.txt HTTP/1.1" 200 53 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"

Good boy, it checks the robots.txt file. But what is this?

$ grep msnbot /var/log/apache2/access_log | tail -20
65.55.208.164 - - [05/Aug/2008:22:48:15 +0200] "GET /websvn/filedetails.php?repname=MySQL+Documentation&path=%2Fworkbench%2Fall-entities.ent&rev=9981&sc=1 HTTP/1.1" 200 6408 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.208.164 - - [05/Aug/2008:22:48:15 +0200] "GET /websvn/dl.php?repname=MySQL+Connector%2FJ&path=%2Fbranches%2Fbranch_5_0%2Fconnector-j%2F&rev=6600&isdir=1 HTTP/1.1" 200 40960 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.208.164 - - [05/Aug/2008:22:48:19 +0200] "GET /websvn/rss.php?repname=MySQL+Documentation&path=%2Fproto-doc%2F&rev=9994&sc=1&isdir=1 HTTP/1.1" 200 36907 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.208.164 - - [05/Aug/2008:22:48:21 +0200] "GET /websvn/rss.php?repname=MySQL+Documentation&path=%2Ffalcon%2F&rev=8323&sc=0&isdir=1 HTTP/1.1" 200 15278 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.208.164 - - [05/Aug/2008:22:48:21 +0200] "GET /websvn/rss.php?repname=MySQL+Proxy&path=%2Ftrunk%2FDoxyfile&rev=365&sc=1&isdir=0 HTTP/1.1" 200 4162 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.208.164 - - [05/Aug/2008:22:48:21 +0200] "GET /websvn/rss.php?repname=Eventum&path=%2Feventum%2Freports%2F&rev=3542&sc=1&isdir=1 HTTP/1.1" 200 90591 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.208.164 - - [05/Aug/2008:22:48:23 +0200] "GET /websvn/log.php?repname=MySQL+Documentation&path=%2Fndbapi%2F&rev=9749&sc=0&isdir=1 HTTP/1.1" 200 21440 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.208.164 - - [05/Aug/2008:22:48:23 +0200] "GET /websvn/log.php?repname=MySQL+Documentation&path=%2Ffalcon%2F&rev=8511&sc=0&isdir=1 HTTP/1.1" 200 18541 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"

As you can see, it is happily crawling everything below /websvn/, which also includes links named "Tarball" - guess what they are good for? Yes, they create tarballs of a given SVN directory, using /tmp to build up the archive file... Within a very short time, it used up more than 6 GB of disk space, as websvn seems to leave these temporary directories behind if the connection is aborted or times out. We do have a cron job that removes files older than a certain number of days from /tmp, but /tmp currently fills up much faster than the cron job cleans it up. I need to investigate whether leaving these temporary directories behind is actually a bug in websvn.
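The actual cron job isn't shown here, so the following is only a hypothetical Python sketch of that kind of age-based cleanup, assuming entries are judged by their modification time; `purge_old_entries` is a made-up name. Keep in mind that a directory's mtime only changes when entries are added or removed directly inside it, so a leftover temporary directory can look older or newer than the files it contains.

```python
import os
import shutil
import time
from typing import List, Optional


def purge_old_entries(directory: str, max_age_days: float,
                      now: Optional[float] = None) -> List[str]:
    """Remove files and subdirectories of `directory` whose mtime is older
    than `max_age_days`; return the sorted names of the removed entries."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    with os.scandir(directory) as it:
        entries = list(it)  # take a snapshot first, so we don't delete while iterating
    removed = []
    for entry in entries:
        try:
            if entry.stat(follow_symlinks=False).st_mtime >= cutoff:
                continue  # still recent enough, keep it
            if entry.is_dir(follow_symlinks=False):
                shutil.rmtree(entry.path)  # e.g. a leftover tarball build directory
            else:
                os.remove(entry.path)  # regular files and symlinks (not their targets)
            removed.append(entry.name)
        except FileNotFoundError:
            continue  # another process removed it in the meantime
    return sorted(removed)
```

Running something like this against a shared /tmp needs care, since sockets and lock files of long-running processes may also be old; it cannot keep up any better than a cron job if the crawler creates new directories faster than they expire.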

Hello Microsoft? Can you please fix your bots so they not only read but also honor robots.txt files, and stop DoSing our site? Thanks :-)
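To double-check that requests like the ones in the log really are covered by the Disallow rules, Python's standard `urllib.robotparser` module can evaluate the robots.txt shown earlier against the request paths in an Apache combined-format log. This is a minimal sketch; the regular expression and the `disallowed_hits` helper are my own assumptions for illustration, not part of any log-analysis tool.

```python
import re
from urllib.robotparser import RobotFileParser

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def disallowed_hits(log_lines, robots_txt):
    """Return (host, path, agent) for each request robots.txt tells that agent not to fetch."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    hits = []
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines in other formats
        if not rules.can_fetch(match["agent"], match["path"]):
            hits.append((match["host"], match["path"], match["agent"]))
    return hits


ROBOTS_TXT = "User-agent: *\nDisallow: /fisheye/\nDisallow: /websvn/\n"
```

Feeding it the msnbot lines quoted above flags every /websvn/ request while leaving the /robots.txt fetches alone, since the `User-agent: *` rules apply to any crawler that doesn't have its own section.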

Speaking at DrupalCon 2008 in Szeged, Hungary

I was just informed that two of my session proposals for DrupalCon 2008 have been accepted. I will be speaking about the following topics:

The second talk will be held in cooperation with Jakub Suchy, who will take over the practical demo. Sun Microsystems is a Gold Sponsor of the event, and I am glad that we can show some support for this truly amazing and vibrant community CMS. DrupalCon 2008 will take place from August 27th to 30th in Szeged, Hungary. The list of proposed talks looks truly impressive! The keynote speakers include Dries Buytaert and Rasmus Lerdorf. I am very much looking forward to this conference. If you have a chance, make sure to attend!

Thoughts about OSS project hosting and the importance of controlling downloads

In a recent article, Matt Asay was musing about the aspects of hosting an Open Source project by yourself vs. using a public project hosting service like SourceForge, GitHub or Launchpad. He concluded that it's important for commercial/sponsored open source projects in particular to do the hosting by themselves, so they can maintain full control and can gain more insight, which hopefully will turn into more revenue at some point.

However, Matt seems to reduce "hosting" to "providing downloads" only:

Control and visibility. Given the importance of customer conversions, it becomes hugely valuable information to know that it takes, say, eight months on average for someone to buy the "Enterprise" version of your code after downloading the software. With Sourceforge et al., you have no way of connecting the dots between download and purchase. But if you host your downloads, you can suddenly link a download to a purchase using marketing automation software like Loopfuse.
[...]
It can tell you many things, but the key is to be able to glean insight from the earliest stage of your interaction with a potential customer, and that means you have to host your own downloads. Otherwise, you have no idea how or when a would-be customer downloads your code, which makes the "why" they download it less interesting, because it becomes less actionable.

I understand and agree with Matt's point in principle: you want to know more about the users who download and use your stuff. Here are some related thoughts on this topic.

Project hosting is not just about downloads

First: project hosting is much more than just providing a download/mirror infrastructure for your product releases. On the one hand, you have the regular users of your product who are primarily interested in having easy and fast access to the latest builds for their platform of choice and a platform to exchange their problems and experiences with other users.

But project hosting facilities also address a completely different audience, with different needs. These are the developers, who want to have easy access to the latest source code, be able to submit bug reports and patches and want a direct communication path to the project's developers.

I think it is important to serve both the developer community and the user community as well as you can, which could of course mean providing the full range of project hosting services all by yourself. But by doing so, you also create an island that makes it difficult to benefit from the "cross-pollination effects" between your project and others. This can partially be remedied if you don't just set up a project hosting infrastructure for your own purposes, but also open it to projects related to yours (which are not maintained by your own team), as SugarForge does. But the cost and effort involved in setting up and maintaining such an infrastructure should not be underestimated.

There is more to distribute than releases

At MySQL, we recently moved the MySQL Server source trees away from the proprietary BitKeeper revision control system to Bazaar. Along with this migration, we also relocated the public repositories from mysql.bkbits.net to Launchpad.net, to make it easier for external developers to access and work with the code. Currently, MySQL only makes use of the source repository hosting capabilities - downloads, bug reports and most other things like mailing lists or forums are all maintained by ourselves and hosted on mysql.com.

Due to the distributed nature of Bazaar, we could of course also provide the source repos from our own servers (similar to how we do it for several of our projects that are still maintained in Subversion). But I think it makes a lot of sense to use Launchpad for that, as it allows a tighter integration and collaboration with contributors and other related projects, and it gives us more visibility within the developer community.

Drizzle has taken this even further: the project uses all of Launchpad's facilities, including blueprints, bug reporting and mailing lists. It's going to be an interesting learning experience to see how this affects and improves community interaction and participation. I'd love to see MySQL move further in this direction as well (the bug database and worklog would be especially good candidates), but this will probably take some more time.

I too recently moved the source tree of my own personal project from a Subversion repository on my private server to Launchpad. Several reasons motivated me to do this, one of them being the opportunity to gain more practical experience with Bazaar and getting away from a central source code repository that makes me the bottleneck in making changes and applying patches. A distributed revision control system makes much more sense from a community contribution point of view, which Ian Clatworthy summarizes quite well in his paper "Distributed Version Control Systems - Why and How". In a way I deliberately give away some of the control over my project. And I must say I like how Launchpad integrates the various available subsystems like blueprints, code branches and bug reports - things are much better connected and they provide useful workflows that make the entire system much more productive to use than e.g. SourceForge.

I still provide downloads of released versions from my own site (as does MySQL), but mostly because until recently I did not know that Launchpad offered this kind of service - I will look into that for the next release. I am more interested in making sure that my users have easy access to properly packaged versions of my project for their operating system of choice. Therefore I work closely with the packagers from various distributions and make sure they integrate new releases quickly. In addition to that, I make use of hosted services like the openSUSE Build Service, which automatically provides package repositories for a number of platforms. I aim for wide distribution on as many channels as possible, instead of trying to be the sole provider of my product. This brings me to another point:

Download stats are overrated

Direct downloads from your project's web site are usually only one part of the distribution system. I believe that being included in the various Linux or other Open Source operating system distributions (e.g. Free/OpenBSD, OpenSolaris, etc.) plays a much bigger role in gaining popularity and reaching more users. Most users simply go with the version that ships with their distribution, as the distributor has usually taken care of tight integration and proper packaging of your project within its own product and also provides updates and fixes.

Unfortunately it's almost impossible to gather any detailed intelligence about the number of users of a project this way, as distributions usually don't keep track of (or don't disclose) their download figures and which packages on their releases are the most popular. Debian's Popularity Contest is probably the only exception to this, but it's unclear how reliable that information is. Here I must agree with Matt again, if we just look at project hosting services acting as download providers only and include distributions in this equation:

As open source becomes more commercial, someone is going to need to step up to offer such visibility into these hosted services, or we're going to find the hosted services proving useful for ever decreasing amounts of time.

I guess we all would love to know more about the users that don't download a package from our site, but go with the one provided by their distribution of choice instead or download it from somewhere else. But so far, this is a blank spot on our radar screen.

 Another caveat that results from these multiple distribution channels: just looking at your own download stats may actually give you a skewed picture of your user base, particularly if you look at the platforms (which will probably be dominated by Windows or Mac OS X, as these OSes usually don't ship your code as part of their own product).

So instead of trying to force downloads through a single instance only, I think it's much more important to ensure widespread distribution and a top-notch first hand experience. If users like your product, they are much more inclined to consider coming back and purchasing something from you than if you annoy them by making your product hard to download and install or require them to register before they can obtain a copy of your product. It's all about lowering the barriers as much as you can, even if you have to give up some control in exchange.
