Activity 1: Inventory of the PRONOM and GDFR software and data
From GDFR Wiki
Lead: Adrian Brown (TNA), Andrea Goethals (HUL), Rory McLeod (BL)
Additional Participants: Andrew Jackson (BL), Rob Sharpe (Tessella)
Deadline: December 31, 2008
Activity Description: Compile inventories of PRONOM and GDFR software and data. For the software, this should minimally include software version, scheduled completion date, features, and IP claims / license information. For the data, this should minimally include IP claims.
Deliverables: A software and data inventory published to the GDFR wiki.
Software Versions
- GDFR 1.0.0
  - Creator: OCLC under contract to Harvard University
  - License: LGPL
    - A subset of the software (the code in the org.oclc.rfa package) is under the OCLC Public License 2.0.
  - Completion date: Completed, but has known bugs.
  - Ongoing maintenance: The contract with OCLC has expired. Any bug fixing or enhancement of the software would need to be planned as a separate project.
  - Languages: Java, XML, XSLT, Perl
  - Server requirements:
    - Tomcat 5.5.25
    - Apache 2.0.52 with mod_perl, mod_rewrite, mod_jk
    - Berkeley XMLDB 2.3.10
    - Java 1.5 JDK
    - Perl 5.8.5
    - Apache Ant 1.6.2
    - GCC 3.2 (for compiling XMLDB only)
  - Client requirements: Firefox web browser
  - Features
    - Two interfaces: (1) a web browser interface for data users, data editors and administrators, with access to features controlled by user roles (see www.formatregistry.org/registry); (2) a machine interface
      - Search records (all available in both the web and machine interfaces)
        - Simple free-text search of formats
        - Advanced search of formats by name, file extension, GDFR identifier, MIME type, genre, or related agents, hardware, software or IPR (intellectual property rights)
        - Advanced search of format assessments
        - Advanced search of documentation or files related to formats, hardware, software, media or IPR
        - Advanced search of hardware by name, type, version, or related agents, hardware, software or IPR
        - Advanced search of IPR
        - Advanced search of media by type, name, version, or related agents, IPR, hardware or documentation
        - Advanced search of software processes
        - Advanced search of format relationships
        - Advanced search of software by name, type, version, or related agents, documentation, formats, IPR, hardware, software or processes
      - Browse records (web interface)
        - Browse formats by genre, MIME type, name, file extension or GDFR identifier
        - Browse agents by type, personal name, corporation or country code
        - Browse documentation or files by type, intent or title
        - Browse hardware by name, type or version
        - Browse IPR by type or jurisdiction
        - Browse media by type, name or version
        - Browse software by name, type or version
      - Display records (format, agent, assessment, documentation/file, hardware, software, media, software processes, IPR) retrieved through searching or browsing (web or machine interface)
      - Add records (via the web interface or through SRU update)
      - Edit records (via the web interface)
    - Two types of registry nodes: source and mirror
      - All data is entered and edited at the single source node
      - Mirror nodes can be set up and hosted by anyone
      - Data can be synced from the source node to mirror nodes (this is currently buggy)
      - Data can be searched, browsed and displayed on either the source or mirror nodes
- PRONOM 6.2
  - Creator: Tessella under contract to The National Archives of the United Kingdom, England and Wales (TNA)
  - License: Owned by TNA
    - Licensed to other organisations through Tessella's Safety Deposit Box (SDB).
  - Completion date: Completed in summer 2007
  - Ongoing maintenance: Support/warranty arrangements between TNA and Tessella
  - Languages: ASP.NET, Transact-SQL, XML
  - Server requirements:
    - IIS
    - SQL Server 2005
  - Client requirements:
    - None beyond a web browser (the interface is web-based)
    - Administration is via an MS Access application
  - Features
    - See [1]
    - Web services available to retrieve DROID signature files
    - Web services available to support characterisation decisions (used by the PLANETS characterisation service)
    - Web services available to support risk-based preservation planning and migration (used by TNA's Seamless Flow programme; not currently exposed in the public instance on TNA's website)
    - MS Access-based administration interface
- PRONOM 7.0
  - Creator: Tessella under the PLANETS project
  - License: TNA will continue to own the ASP pages and be the public face of PRONOM. TNA also owns the data. The new administration application and database structure are owned by the PLANETS project (licensing under discussion).
    - Will also be licensed to other organisations through Tessella's Safety Deposit Box (SDB).
    - Free use of the Java application for PLANETS partners
  - Completion date: December 2008 (this will be the first release after major changes, so a few bugs should realistically be expected at that point)
  - Ongoing maintenance: PLANETS project (until summer 2010), Tessella, TNA (all still to be discussed)
  - Languages:
    - Public-facing web pages: still ASP.NET, Transact-SQL, XML
    - Admin application: Java, Hibernate (in principle allows any database), XML
  - Server requirements for public-facing web pages:
    - IIS
    - SQL Server 2005
  - Server requirements for admin application:
    - Tomcat (or, in principle, any Java web container)
    - Any database engine with a Hibernate adaptor. Note, however, that this applies only to an empty database; initially, a populated database will be available only in SQL Server 2005.
  - Client requirements:
    - None beyond a web browser (the interface is web-based)
  - Features
    - As PRONOM 6.2, with additional features (e.g., extra information held, GDFR-style faceted classification, more web services)
    - Detailed SRD available on request (currently being updated internally)
See [2]
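The GDFR 1.0.0 entry lists a machine interface that supports searching and SRU updates, but this page does not document its endpoint or query indexes. As a rough illustration only, the following Python sketch builds an SRU (Search/Retrieve via URL) searchRetrieve request of the kind a client might send, for example to search formats by file extension. It assumes the machine interface answers SRU version 1.1 searchRetrieve requests; the base URL `http://registry.example.org/sru` and the CQL index name `gdfr.extension` are hypothetical placeholders, not GDFR's actual values.

```python
"""Hypothetical sketch: building an SRU 1.1 searchRetrieve request URL.

Assumptions (not taken from the GDFR documentation):
  * BASE_URL is a placeholder, not the real GDFR machine-interface endpoint.
  * The CQL index name "gdfr.extension" is invented for illustration.
"""
from urllib.parse import quote, urlencode

# Hypothetical placeholder endpoint; substitute the registry's real SRU base URL.
BASE_URL = "http://registry.example.org/sru"


def cql_term(value: str) -> str:
    """Wrap a search term in double quotes, escaping backslashes and quotes for CQL."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def search_url(index: str, value: str, max_records: int = 10, start: int = 1) -> str:
    """Return an SRU 1.1 searchRetrieve URL for the CQL query `index = "value"`."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": f"{index} = {cql_term(value)}",
        "startRecord": str(start),
        "maximumRecords": str(max_records),
    }
    # quote_via=quote percent-encodes spaces as %20 rather than '+'.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    # Hypothetical index name: search formats by file extension "tif".
    print(search_url("gdfr.extension", "tif"))
```

Under these assumptions, `search_url("gdfr.extension", "tif")` produces a URL whose `query` parameter decodes to `gdfr.extension = "tif"`; paging is controlled with `startRecord` and `maximumRecords`, which are standard SRU parameters.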