Ptarmigan Media Parser for XML
metadata in small bites
Ptarmigan (the 'p' is silent) produces XML content from the metadata (title, artist, album, etc.) found in media files and streams.
It is a SAX event generator that consumes MP3, Ogg Vorbis, FLAC and WMA media, as well as four types of playlists, and produces events that can be used to build an XML document or fed into a SAX pipeline for further processing.
Filters are provided to extract metadata from the following: ID3v1, ID3v2, Ogg/Vorbis, FLAC and WMA headers. The playlist formats supported include M3U, PLS, ASX and the new B4S (WinampXML). Support for other formats is planned.
This is an open source project hosted at SourceForge and licensed under a conventionally modified BSD license.
0.2 Alpha release - having survived the gauntlet
I've reworked the design some, added a few more filters and test files.
It can be used with (or without) a buffered input source. If a buffered source is used, Ptarmigan will limit the number of bytes it consumes for non-playlist sources, permitting you to rewind as necessary to the start of the audio data for one-pass usage.
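The rewind behaviour described above corresponds to the standard mark/reset contract of java.io.BufferedInputStream. The following is a minimal sketch of that idea, not Ptarmigan's own code: the class RewindSketch, the method peek and the 64 KB READ_LIMIT are hypothetical names and values chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

final class RewindSketch {
    // Hypothetical cap on how many bytes a metadata scan may consume.
    static final int READ_LIMIT = 64 * 1024;

    /** Reads up to n leading bytes, then rewinds so the caller sees the stream from its start. */
    static byte[] peek(InputStream in, int n) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        if (n < 0 || n > READ_LIMIT) {
            throw new IllegalArgumentException("n must be between 0 and READ_LIMIT");
        }
        in.mark(READ_LIMIT);                 // remember the current position
        byte[] head = new byte[n];
        int total = 0;
        while (total < n) {
            int r = in.read(head, total, n - total);
            if (r < 0) {
                break;                       // stream shorter than n bytes
            }
            total += r;
        }
        in.reset();                          // rewind to the marked position
        return Arrays.copyOf(head, total);
    }
}
```

Wrapping the source in a BufferedInputStream before calling peek is what makes the rewind possible; an unbuffered network stream would typically be rejected by the markSupported check.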
Ptarmigan passes its own JUnit test suite. It has also been stress-tested extensively using JReceiver on both Linux and Windows.
The new MPEG support includes the determination of the offset of the first frame and the estimation of both average bitrate and duration. VBR support will likely be coming in one of the next couple of releases. There's also a new mpeg.xsd schema describing what will be produced by this parser.
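For a constant-bitrate file, a duration estimate follows from the first-frame offset and the average bitrate by simple arithmetic. The sketch below shows that calculation under stated assumptions; it is not taken from the Ptarmigan source, and the class and method names are hypothetical. It ignores any trailing tag (such as ID3v1) and does not hold for VBR files.

```java
final class MpegEstimate {
    /**
     * Estimates the duration in seconds of a constant-bitrate MPEG audio file.
     * MPEG audio bitrates are quoted in kbit/s, where 1 kbit = 1000 bits.
     */
    static double durationSeconds(long fileLength, long firstFrameOffset, int avgBitrateKbps) {
        if (avgBitrateKbps <= 0) {
            throw new IllegalArgumentException("bitrate must be positive");
        }
        if (firstFrameOffset < 0 || firstFrameOffset > fileLength) {
            throw new IllegalArgumentException("offset must lie within the file");
        }
        long audioBytes = fileLength - firstFrameOffset;   // bytes after the leading metadata
        return (audioBytes * 8.0) / (avgBitrateKbps * 1000.0);
    }
}
```

For example, a 4,802,048-byte file whose first frame starts at byte 2,048 and which averages 128 kbit/s gives 4,800,000 x 8 / 128,000 = 300 seconds.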
Parsing errors are reported as SAX events: ParseExceptions are tunneled and reported via this mechanism (see the ptarmigan.xsd schema for the format). IO and SAX errors are not reported through SAX events; they surface as conventional IOException and SAXException exceptions that you catch when invoking the parser.
You can optionally calculate an MD5 or SHA-1 hash digest of the audio data in audio files, excluding the metadata.
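The conventional exception path looks the same for any SAX event source. As a sketch only, the example below uses the JDK's built-in XML parser as a stand-in for a Ptarmigan generator (whose exact API is not shown here) to illustrate catching SAXException and IOException around the parse call. The class name and the returned status strings are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class ErrorHandlingSketch {
    /** Returns "ok", "sax-error" or "io-error" depending on how the parse ends. */
    static String tryParse(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            DefaultHandler handler = new DefaultHandler();
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler);      // DefaultHandler rethrows fatal errors
            reader.parse(new InputSource(new StringReader(xml)));
            return "ok";
        } catch (SAXException e) {
            return "sax-error";                   // malformed input or a handler failure
        } catch (IOException e) {
            return "io-error";                    // the underlying source could not be read
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no usable SAX parser", e);
        }
    }
}
```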
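Hashing only the audio data means skipping the leading metadata and stopping before any trailing tag. The sketch below shows one way to do that with java.security.MessageDigest. It is an illustration under assumptions rather than Ptarmigan's implementation: the class name, the method name and the audioOffset and audioLength parameters are hypothetical, and the caller is assumed to know where the audio starts and how long it is.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class AudioDigestSketch {
    /** Hex digest ("MD5" or "SHA-1") of audioLength bytes starting at audioOffset. */
    static String hexDigest(InputStream in, long audioOffset, long audioLength, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);

        // Skip the leading metadata; skip() may skip fewer bytes than requested.
        long skipped = 0;
        while (skipped < audioOffset) {
            long s = in.skip(audioOffset - skipped);
            if (s > 0) {
                skipped += s;
            } else if (in.read() >= 0) {
                skipped++;
            } else {
                throw new EOFException("stream ended inside the metadata");
            }
        }

        // Feed only the audio bytes to the digest, leaving trailing tags unread.
        byte[] buf = new byte[8192];
        long remaining = audioLength;
        while (remaining > 0) {
            int r = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (r < 0) {
                break;
            }
            md.update(buf, 0, r);
            remaining -= r;
        }

        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Because the metadata is excluded, two copies of the same recording with different tags produce the same digest.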
Many other minor changes, such as support for the 'old' ID3v2 tags used in iTunes.
The size of the ptarmigan.jar has grown to 87K (or 70K without debug info).
It's technically still in alpha as it's not yet feature-complete and may need a design review. Nevertheless it's pretty solid.
Source code is now in CVS.
0.1 Alpha release - the initial public release of Ptarmigan. Everything is new but should be pretty well tested nevertheless.
Ptarmigan is an offshoot of the JReceiver Audio Server
Small Footprint - deployment requirements are modest, with less than 250K of supporting libraries needed, some of which you may already use in your application. Cut that number in half if you don't need to parse XML-based playlists.
XML-centric - using SAX events for efficiency and backed by XML Schema, namespaces, and an xUnit test suite to enforce consistent and predictable behavior. Though the initial implementation requires Java, the architecture isn't tied to it. A port using C, Python, Perl or C#/.NET can share the same schema and test harness to ensure identical behavior.
Extensible - uses a pluggable interface so that new parsers can be added without recompilation.
xUnit Testing - uses JUnit and XMLUnit test suites against a set of control files to ensure that future development doesn't break existing functionality. In addition, it is used in the JReceiver Server, which is run against a wide variety of music collections.
Open Standards - relies on SAX2, a widely-supported standard for producing XML content. The supporting libraries share the same liberal (BSD-derived) licensing as Ptarmigan. It also uses XML Schema and XML namespaces, two other widely supported XML standards.
More Open Standards - the libraries and build environment upon which the Java implementation depends are open-source projects themselves, from the Apache Jakarta project and others.
Fast - event-based SAX is fast because you can ignore the information you don't need and capture only what you want. No need to wait for the entire document to load before extracting data.
Efficient - unlike a DOM-based solution, SAX2 does not keep the full XML document in memory and instead produces events which require comparatively little memory and which scale well to accommodate large data sources, like playlists with 20,000 entries.
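To make the two points above concrete, here is a small sketch of a SAX handler that keeps only the text of title elements and ignores everything else as it streams past. The element names (playlist, entry, title, artist) are hypothetical and are not taken from the Ptarmigan schemas; the JDK's built-in XML parser stands in for the event source.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class TitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private StringBuilder text;   // non-null only while inside a <title> element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("title".equals(qName)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length);   // may be called several times per element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            titles.add(text.toString());
            text = null;
        }
    }

    /** Parses the XML and returns the titles in document order. */
    static List<String> titlesOf(String xml)
            throws IOException, SAXException, ParserConfigurationException {
        TitleCollector collector = new TitleCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), collector);
        return collector.titles;
    }
}
```

The handler holds one StringBuilder and the list of results, so its memory use grows with the number of titles kept rather than with the size of the document.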
Consistent - consistency is attained through the test harness, where the XML content generated by the individual parsers is validated against a set of Schema documents and against a set of control documents.
Flexible - there are many ways to consume the events produced by Ptarmigan:
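One such way, sketched here with the JDK's own JAXP classes rather than anything Ptarmigan-specific, is a TransformerHandler that serializes incoming SAX events straight to XML text. In the example the events are fired by hand to stand in for a generator, and the title element is a hypothetical name used only for illustration.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

final class SerializeEventsSketch {
    /** Fires a few SAX events at a serializing handler and returns the resulting XML. */
    static String demo() throws SAXException, TransformerConfigurationException {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();   // identity transform
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        // Hand-fired events standing in for those a SAX event generator would emit.
        char[] text = "Example".toCharArray();
        handler.startDocument();
        handler.startElement("", "title", "title", new AttributesImpl());
        handler.characters(text, 0, text.length);
        handler.endElement("", "title", "title");
        handler.endDocument();

        return out.toString();
    }
}
```

Since a TransformerHandler is an ordinary SAX ContentHandler, the same object could be registered with any SAX event source in place of the hand-fired calls.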
More Flexibility - even though there may be multiple blocks of metadata in a document, Ptarmigan can get them all in one pass. For example, consider a FLAC file that contains Vorbis tags, is prefixed by an ID3v2 tag, and has an ID3v1 tag at the end of the file. This is what it looks like in XML.
Yet More Flexibility - most metadata parsers work only with files. Ptarmigan concentrates instead on streams for more efficient parsing of remote sources where you don't have to download the entire file before you can extract the metadata. Exception: ID3v1 tags are only extracted from local file sources at present.
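The ID3v1 exception follows from the tag's position: an ID3v1 tag occupies the last 128 bytes of a file, so reading it requires random access rather than a forward-only stream. The sketch below illustrates this with java.io.RandomAccessFile. It is not Ptarmigan's code; the class and method names are hypothetical, and only the title, artist and album fields are decoded.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

final class Id3v1Sketch {
    static final int TAG_SIZE = 128;   // fixed size of an ID3v1 tag

    /** Returns the last 128 bytes of the file, or null if the file is too short. */
    static byte[] readTail(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            if (raf.length() < TAG_SIZE) {
                return null;
            }
            byte[] tail = new byte[TAG_SIZE];
            raf.seek(raf.length() - TAG_SIZE);   // jump straight to the tag
            raf.readFully(tail);
            return tail;
        }
    }

    /** Returns {title, artist, album}, or null if the block is not an ID3v1 tag. */
    static String[] parse(byte[] tail) {
        if (tail == null || tail.length != TAG_SIZE
                || tail[0] != 'T' || tail[1] != 'A' || tail[2] != 'G') {
            return null;
        }
        // Layout: "TAG" (3), title (30), artist (30), album (30), year (4), comment (30), genre (1).
        return new String[] { field(tail, 3, 30), field(tail, 33, 30), field(tail, 63, 30) };
    }

    // Fields are fixed width, padded with zero bytes or spaces.
    private static String field(byte[] b, int off, int len) {
        int end = off;
        while (end < off + len && b[end] != 0) {
            end++;
        }
        return new String(b, off, end - off, StandardCharsets.ISO_8859_1).trim();
    }
}
```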
Multi-platform - the Java implementation can run on a variety of platforms, including Windows, Linux, Sun, BSD, Mac OS X and others. It'll be tested soon on J2ME (Micro Edition) for use on portable Java-based devices.
Deployment Binaries and Developer Documentation
Everything, including Source and Test Environment
The table below shows the minimal set of libraries needed:
|SAX2 XML Parser**||Reading XML Sources||114K|
** SAX2 parsers include Ælfred2, Crimson, Xerces-J, Oracle, Piccolo (114K) and possibly others. For a current list, see here.
If the parsing of XML-based playlists isn't required, the SAX2 XML Parser may be dropped from the list.
Note: the other libraries included in the deployment distribution (e.g. digester, beanutils, collections) are there to support the demo application in the /sample directory.
TODO: should sax.jar (32K) be included in the list as well?
How to integrate Ptarmigan in your application? Here's the API Documentation. Most application developers need only be concerned with GeneratorFactory and the Generator interface.
If this is your first use of SAX, check out the quick start at the SAX site.
See the demonstration app in the /ptarmigan/sample directory in the distribution.
You might also consider buying David Brownell's SAX2 from O'Reilly. Reportedly it's excellent.
Elliotte Rusty Harold has an online edition of his book Processing XML with Java available.
Please post your questions to the appropriate list rather than contacting the author directly.
Reed Esau is the founder and lead of the project.