Lots more work done on autotagger.com

There are lots of days when I'm happy with Drupal, but today was not one of them.

The last couple of days have been an uphill battle to get the data from the old autotagger databases imported into Drupal so that users can edit the city records.

I've spent a day or two moving all the countries under the right continent and cleaning up all the accents and such; now the time for the bulk import has arrived.

The structure is quite simple: planet/continent/country/region/city and on every level you'll be able to attach images and 'event' notes. The tricky bit is that there are rather a large number of cities on planet earth.

The database I have contains about 2.5 million of them, and importing them into Drupal is the biggest headache I've had so far. First of all, Drupal stores its data all over the place, so you can't just go and import your records into the SQL tables and be done with it (well, that's not strictly true, but you risk creating a situation that is less than stable because you overlooked some detail).

So, the advised way is to set up a structure and then call node_save to let Drupal do the dirty work. That works well until you notice that every node_save call you make consumes a whole pile of memory that never comes back. Possibly this is not Drupal's fault but an internal problem with PHP, but either way it stops you from doing this in one sitting.

I've added another case on drupal.org (hey, I'm hopeful, someone might actually get around to answering some of that stuff one day), but I don't think I have time to wait around for an answer, so I've figured out a stupid but effective way to work around it: I've split the original dataset into batches of 10,000 cities and I'm loading them one batch at a time.
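
For anyone wanting to do the same kind of split, here is a rough sketch in Python. It is not the script I actually used, and it makes assumptions the post doesn't cover: that the cities sit in a CSV export with a header row, and that the output files can be named cities_0001.csv and so on. The names batches and split_csv are made up for the illustration.

```python
import csv
from itertools import islice
from pathlib import Path

BATCH_SIZE = 10_000  # the batch size mentioned in the post


def batches(rows, size=BATCH_SIZE):
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def split_csv(source, out_dir, size=BATCH_SIZE):
    """Write `source` out as numbered batch files, repeating the header in each.

    Assumes a CSV input with a header row (hypothetical; the real dataset
    format is not described in the post). Returns the list of files written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(source, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        header = next(reader, None)
        if header is None:  # empty input file, nothing to split
            return written
        for index, chunk in enumerate(batches(reader, size), start=1):
            target = out_dir / f"cities_{index:04d}.csv"
            with open(target, "w", newline="", encoding="utf-8") as out:
                writer = csv.writer(out)
                writer.writerow(header)
                writer.writerows(chunk)
            written.append(target)
    return written
```

With about 2.5 million cities and 10,000 per file, this gives roughly 250 batch files. Each one can then be fed to a separate import run, so the memory eaten by the node_save calls is released when that run's process exits.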

That should work, but it's a complete pain. The speed of the import is nothing to be happy about either. Speed definitely is not Drupal's forte, and I seriously wonder how we are going to deal with major pageviews on these sites; I'm really not sure Drupal will be able to handle it.