chlab very irregular posts about web development

Cache Soap envelope schema for schema validation

I recently had to enable schema validation on incoming requests on a Soap API running on Zend Soap. We were having noticeably worse performance with schema validation than without - and found out that it was because of the import tag pulling in the Soap envelope schema for every request:

<xsd:import namespace="" schemaLocation=""/>

My first thought was to match all the schemaLocations in the schemas and cache them manually, but it seemed to me there must be a better way. There is, and it’s called a catalog.

“What’s a catalog?” I hear you say:

Basically it’s a lookup mechanism used when an entity (a file or a remote resource) references another entity. The catalog lookup is inserted between the moment the reference is recognized by the software (XML parser, stylesheet processing, or even images referenced for inclusion in a rendering) and the time where loading that resource is actually started.

So, it turns out libxml has catalog support. A catalog is basically an XML file that libxml will parse and use to map references. I was doing the schema validation with DomDocument::schemaValidate and since DomDocument uses libxml behind the scenes, this works for PHP as well. By default, libxml2 looks for an XML catalog in /etc/xml/catalog*, and DomDocument is hardwired to that location as well.

You can add XML catalogs with a little tool called xmlcatalog that comes with libxml (I think). The usage is pretty straightforward; call up the man page to get an overview or read it online. Here’s how I added a catalog to map the Soap envelope schema location to a local path:

1 - Create /etc/xml if it doesn’t yet exist.

2 - Copy the Soap envelope schema to a local path, let’s say /etc/xml/soap-envelope-1.1.xsd

3 - Create the catalog file (if it doesn’t exist yet) and add our new rule:

xmlcatalog --create --noout --add "rewriteURI" "" \
"file:///etc/xml/soap-envelope-1.1.xsd" \
/etc/xml/catalog

The file created at /etc/xml/catalog should then look something like:

<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
 <rewriteURI uriStartString="" rewritePrefix="file:///etc/xml/soap-envelope-1.1.xsd"/>
</catalog>

4 - Restart Apache

Your schema validation should be quick as a flash after that.
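To check that the rule is actually picked up, you can ask xmlcatalog to resolve the URI against the catalog file. The following is only a sketch: resolve_uri and the XMLCATALOG variable are names made up for this example, and the exact text xmlcatalog prints may differ between libxml2 versions, so the function simply looks for a line starting with "file:" in whatever comes back.

```shell
#!/bin/sh
# Path to the xmlcatalog tool; overridable in case it is not on PATH.
# (XMLCATALOG is a hypothetical variable name used only in this sketch.)
XMLCATALOG="${XMLCATALOG:-xmlcatalog}"

# resolve_uri CATALOG URI
# Looks URI up in CATALOG, prints what xmlcatalog reports, and succeeds
# only if one of the printed lines is a local file: URL.
resolve_uri() {
    catalog=$1
    uri=$2
    # Capture stdout and stderr; some versions print "No entry for ..."
    # lines before the resolved location.
    output=$("$XMLCATALOG" "$catalog" "$uri" 2>&1)
    printf '%s\n' "$output"
    # The return status of the function is the status of this grep.
    printf '%s\n' "$output" | grep -q '^file:'
}
```

Call it as resolve_uri /etc/xml/catalog "$SOAP_ENVELOPE_URI", where SOAP_ENVELOPE_URI is a placeholder for the schemaLocation used in your import tag. If the mapping works, the local path of the schema copy should show up in the output.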

* You can change this default location by setting the XML_CATALOG_FILES environment variable.
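As a rough illustration of that environment variable, the sketch below points libxml2 at a catalog in a non-default place and runs xmllint with --nonet, so validation should only succeed if the imported schemas can be resolved without going to the network. validate_offline, XMLLINT and the file names in the usage line are assumptions made for the example, not part of the setup described above.

```shell
#!/bin/sh
# Path to xmllint (ships with libxml2); overridable in case it is not on PATH.
# (XMLLINT is a hypothetical variable name used only in this sketch.)
XMLLINT="${XMLLINT:-xmllint}"

# validate_offline CATALOG SCHEMA DOCUMENT
# Validates DOCUMENT against SCHEMA using CATALOG instead of the default
# catalog location, with network access disabled.
validate_offline() {
    catalog=$1
    schema=$2
    document=$3
    # XML_CATALOG_FILES is set for this one command only.
    # --nonet  refuses to fetch anything over the network
    # --noout  suppresses the serialized document, leaving only the verdict
    XML_CATALOG_FILES="$catalog" "$XMLLINT" --nonet --noout --schema "$schema" "$document"
}
```

For example: validate_offline /tmp/my-catalog.xml service.xsd request.xml. Note that XML_CATALOG_FILES is a space-separated list, so catalog paths containing spaces will not work this way.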

Compress PDF files with Quartz filters from command line

What are Quartz filters?

Quartz is, in a nutshell, a graphics library and part of the OS X Core Graphics Framework. A Quartz filter is basically an XML file defining how Quartz should be used. Quartz filters are often used to compress PDF documents by reducing the size of the images in them, but can also be used for a variety of other image manipulations.

Here’s an article about using Quartz filters to reduce the file size of PDFs: learn how to reduce PDF file size with a Quartz filter. As far as I know, you can create your own filters with the ColorSync program that comes with OS X or by writing the XML yourself. The “Reduce File Size” Quartz filter that ships with OS X compresses images aggressively, making it useless for print PDFs and a bit harsh even for “screen” PDFs. A guy named Jerome Colas created a few filters which give you more control over the outcome of the images. Check them out in his article “Reduce PDF file size : free Acrobat replacement for Leopard”.

Using Quartz filters from command line

I wanted to use Quartz filters for exporting PDFs from an application I developed at work, preferably from the command line. It seems not that many people have wanted to do this or write about how to do it. After searching Google on and off for days and trying all kinds of things, I was pointed in the right direction: you can do this with a Quartz printer application stored at /System/Library/Printers/Libraries/quartzfilter. The syntax is:

/System/Library/Printers/Libraries/quartzfilter sourcefile filter destination

So converting big.pdf into small.pdf with the “Reduce File Size” filter would work like this:

/System/Library/Printers/Libraries/quartzfilter big.pdf /System/Library/Filters/Reduce\ File\ Size.qfilter small.pdf

Pretty straightforward, isn’t it?
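If you have a whole folder of PDFs, the same call can be wrapped in a small loop. This is a minimal sketch: compress_pdfs, QUARTZFILTER and QFILTER are names invented here, the default paths are the ones from the example above, and it assumes quartzfilter exits with a non-zero status when something goes wrong.

```shell
#!/bin/sh
# Tool and filter paths as used in the post; both can be overridden.
# (QUARTZFILTER and QFILTER are hypothetical variable names for this sketch.)
QUARTZFILTER="${QUARTZFILTER:-/System/Library/Printers/Libraries/quartzfilter}"
QFILTER="${QFILTER:-/System/Library/Filters/Reduce File Size.qfilter}"

# compress_pdfs SRC_DIR OUT_DIR
# Runs every *.pdf in SRC_DIR through the filter and writes the result,
# under the same file name, into OUT_DIR.
compress_pdfs() {
    src_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for pdf in "$src_dir"/*.pdf; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -f "$pdf" ] || continue
        # Argument order: sourcefile filter destination.
        # Assumption: a non-zero exit status means the conversion failed.
        "$QUARTZFILTER" "$pdf" "$QFILTER" "$out_dir/$(basename "$pdf")" || return 1
    done
}
```

Something like compress_pdfs ~/Documents/big ~/Documents/small would then write a filtered copy of each PDF into the second folder and leave the originals untouched. To use one of the custom filters, set QFILTER to its path before calling the function.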

Other ways of using Quartz filters

While searching, I learned about some other ways to use Quartz filters which may be interesting for some people.

Using Quartz filters in Python

On my Mac I have the file: /Developer/Examples/Quartz/Python/ I’m not sure if it came with Xcode or if it’s OS X native. The script can be used to apply Quartz filters to PDF files, except, well, it doesn’t work. If you have an up-to-date OS X, you should have at least Python 2.5 installed, and it looks like the script was developed for Python ≤ 2.3, as it’s using features from the CoreGraphics library that aren’t supported in newer versions of Python. I don’t know the first thing about Python, but I figure if you’re developing with Python you might be able to fix it.

Using Quartz filters in Automator

Very easy. Create a new Automator project and choose the action “Apply Quartz Filter to PDF Documents” from the “PDFs” section of the actions library and use it in combination with whatever other Automator action you need.

Using Quartz filters with AppleScript

Check this detailed article by Martin Michel over at MacScripter and try the cool droplet he made: thoughts and examples about using the quartzfilter tool.

That’s it, I hope some of you can benefit from this post! Thanks to the guys at macosxhints and macscripter.

Note: I tested these things on OS X 10.5.8; I’m not sure how this will work on other versions of OS X. Please leave a comment if you find out more. Cheers