I recently had to enable schema validation on incoming requests on a Soap API running on Zend Soap. Performance was noticeably worse with schema validation than without, and it turned out the cause was the import tag in our schema, which pulled in the Soap envelope schema from its remote location on every request.
My first thought was to match all the schemaLocations in the schemas and cache them manually, but it seemed to me there had to be a better way.
There is, and it’s called a catalog.
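For illustration, here is a small sketch of what such an import can look like and how to spot it. The schema it writes out is made up (the file name `service.xsd` and the `example.com` target namespace are assumptions, not our real schema); the grep simply lists every remote schemaLocation the parser would have to resolve.

```shell
#!/bin/sh
set -eu

# Write a made-up schema to a scratch directory so the grep has some input.
SCHEMA_DIR="$(mktemp -d)"
cat > "$SCHEMA_DIR/service.xsd" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/service">
  <!-- Without a catalog, this remote location may be fetched on each validation. -->
  <xs:import namespace="http://schemas.xmlsoap.org/soap/envelope/"
             schemaLocation="http://schemas.xmlsoap.org/soap/envelope/"/>
</xs:schema>
EOF

# List every distinct remote schemaLocation referenced by the schemas.
REMOTE_LOCATIONS="$(grep -rhoE 'schemaLocation="https?://[^"]+"' "$SCHEMA_DIR" | sort -u)"
echo "$REMOTE_LOCATIONS"
```

Each location printed this way is a candidate for a local copy.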
“What’s a catalog?” I hear you say:
Basically it’s a lookup mechanism used when an entity (a file or a remote resource) references another entity. The catalog lookup is inserted between the moment the reference is recognized by the software (XML parser, stylesheet processing, or even images referenced for inclusion in a rendering) and the time where loading that resource is actually started.
So, it turns out libxml has catalog support. A catalog is basically an XML file that libxml will parse and use to map references. I was doing the schema validation with DOMDocument::schemaValidate, and since DOMDocument uses libxml behind the scenes, this works for PHP as well. By default, libxml2 looks for an XML catalog in /etc/xml/catalog*, and DOMDocument is hardwired to that location as well.
You can add XML catalogs with a little tool called xmlcatalog that comes with libxml2 (some distributions put it in a separate utilities package). Usage is pretty straightforward: call up the man page for an overview, or read it online. Here’s how I added a catalog to map the Soap envelope schema location to a local path:
1 - Create /etc/xml if it doesn’t yet exist.
2 - Copy the Soap envelope schema from http://schemas.xmlsoap.org/soap/envelope/ to a local path, let’s say /etc/xml/soap-envelope-1.1.xsd
3 - Create the catalog file (if it doesn’t exist yet) and add our new rule:
The file created at /etc/xml/catalog should then look something like:
4 - Restart Apache
Your schema validation should be quick as a flash after that.
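Steps 1 to 3 can be sketched as a short shell script. Treat it as a sketch under a few assumptions: `xmlcatalog` and `curl` are installed, the rule is added as a `system` entry (a `uri` entry may work just as well), and `CATALOG_DIR` is a variable name made up for this example so you can try everything in a scratch directory before touching /etc/xml.

```shell
#!/bin/sh
set -eu

# CATALOG_DIR is a name chosen for this sketch. It defaults to a scratch
# directory; run with CATALOG_DIR=/etc/xml (as root) for the real setup.
# Use an absolute path, since it ends up inside a file:// URI.
CATALOG_DIR="${CATALOG_DIR:-$(mktemp -d)}"
CATALOG="$CATALOG_DIR/catalog"
SCHEMA_URL="http://schemas.xmlsoap.org/soap/envelope/"
LOCAL_XSD="$CATALOG_DIR/soap-envelope-1.1.xsd"

# Step 1: create the directory if it doesn't exist yet.
mkdir -p "$CATALOG_DIR"

# Step 2: keep a local copy of the Soap envelope schema.
if [ ! -f "$LOCAL_XSD" ]; then
    curl -fsSL -o "$LOCAL_XSD" "$SCHEMA_URL" \
        || echo "could not download $SCHEMA_URL, copy it by hand" >&2
fi

# Step 3: create the catalog if needed, then map the remote URL to the copy.
if command -v xmlcatalog >/dev/null 2>&1; then
    [ -f "$CATALOG" ] || xmlcatalog --noout --create "$CATALOG"
    xmlcatalog --noout --add system "$SCHEMA_URL" "file://$LOCAL_XSD" "$CATALOG"
    # Show the resulting catalog file.
    cat "$CATALOG"
else
    echo "xmlcatalog not found; install the libxml2 command-line tools" >&2
fi
```

The `--noout` flag makes xmlcatalog write the result back to the catalog file instead of printing it to stdout.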
* You can change this default location by setting the XML_CATALOG_FILES environment variable.
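If you’d rather keep the catalog somewhere else, the environment variable route could look roughly like this. It’s a sketch: `MY_CATALOG_DIR` is a hypothetical location invented for the example, and the catalog is written by hand with a single `system` rule so you can see the file format.

```shell
#!/bin/sh
set -eu

# MY_CATALOG_DIR is a hypothetical location chosen for this sketch.
MY_CATALOG_DIR="${MY_CATALOG_DIR:-$(mktemp -d)}"

# A hand-written catalog with one rule: remote schema URL -> local copy.
cat > "$MY_CATALOG_DIR/catalog" <<EOF
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <system systemId="http://schemas.xmlsoap.org/soap/envelope/"
          uri="file://$MY_CATALOG_DIR/soap-envelope-1.1.xsd"/>
</catalog>
EOF

# libxml2 reads this variable instead of the default /etc/xml/catalog.
export XML_CATALOG_FILES="$MY_CATALOG_DIR/catalog"
echo "XML_CATALOG_FILES=$XML_CATALOG_FILES"
```

The variable has to be set in the environment of the process that does the validation, so for PHP running under Apache that means the environment Apache is started with, not your login shell.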
Quartz is, in a nutshell, a graphics library and part of the OS X Core Graphics Framework. A Quartz filter is basically an XML file defining how Quartz should be used. Quartz filters are often used to compress PDF documents by reducing the size of the images in them, but they can also be used for a variety of other image manipulations.
Here’s an article about using Quartz filters to reduce the file size of PDFs: “Learn how to reduce PDF file size with a Quartz filter”. As far as I know, you can create your own filters with the ColorSync Utility that comes with OS X, or by writing the XML yourself. The “Reduce File Size” Quartz filter that ships with OS X is rather hard on images, which makes it useless for print PDFs and a bit harsh even for “screen” PDFs. A guy named Jerome Colas created a few filters which give you more control over the outcome of the images. Check them out in his article “Reduce PDF file size : free Acrobat replacement for Leopard”.
Using Quartz filters from the command line
I wanted to use Quartz filters for exporting PDFs from an application I developed at work, preferably from the command line. It seems not many people have wanted to do this, or at least not many have written about how to do it. After searching Google on and off for days and trying all kinds of things, I was pointed in the right direction: you can do this with a Quartz printer application stored in /System/Library/Printers/Libraries/quartzfilter. The syntax is:
So converting big.pdf into small.pdf with the “Reduce File Size” filter would work like this:
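As a sketch, and with two assumptions you should check on your own system: the argument order is taken to be input PDF, filter file, output PDF, and the stock filter is taken to live in /System/Library/Filters. The script only runs the conversion when both the tool and big.pdf are actually there.

```shell
#!/bin/sh
# Assumed argument order: input PDF, .qfilter file, output PDF.
QUARTZFILTER="/System/Library/Printers/Libraries/quartzfilter"
# Assumed location of the stock "Reduce File Size" filter.
FILTER="/System/Library/Filters/Reduce File Size.qfilter"

if [ -x "$QUARTZFILTER" ] && [ -f big.pdf ]; then
    "$QUARTZFILTER" big.pdf "$FILTER" small.pdf
    # Compare the file sizes before and after.
    ls -l big.pdf small.pdf
else
    echo "quartzfilter or big.pdf not found; nothing converted" >&2
fi
```

Note the quotes around the filter path, which are needed because of the spaces in the file name.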
Pretty straightforward, isn’t it?
Other ways of using Quartz filters
While searching, I learned about some other ways to use Quartz filters which may be interesting for some people.
Using Quartz filters in Python
On my Mac I have the file:
I’m not sure if it came with Xcode or if it’s native to OS X. The script is meant to apply Quartz filters to PDF files, except, well, it doesn’t work. If you have an up-to-date OS X, you should have at least Python 2.5 installed, and it looks like the script was written for Python ≤ 2.3, as it uses features of the CoreGraphics library that aren’t supported with newer versions of Python. I don’t know the first thing about Python, but I figure if you’re developing with Python you might be able to fix it.
Using Quartz filters in Automator
This is very easy. Create a new Automator workflow, choose the action “Apply Quartz Filter to PDF Documents” from the “PDFs” section of the actions library, and use it in combination with whatever other Automator actions you need.