BookTools
The BookTools Project is a cooperative, free-software project to create a set of public domain utilities for the image processing of scanned book page images.
These tools, in combination with other public domain and low-cost commercial image manipulation utilities, will form a set of facilities for cleaning up, understanding, beautifying, quality-checking and compressing book page images, furthering the goal of a universal, on-line library based upon these images.
The BookTools are made publicly available under the terms of the GNU General Public License (GPL), which guarantees the availability of source code. This supports the goal of an extensible set of tools which are cooperatively enhanced and maintained for the common good.
Since much attention is already focussed on creating access images for Internet browsing, the focus of the BookTools Project will be more on manipulating the high-quality source images captured during scanning. When these are appropriately cleaned up and organized using BookTools, derivative access images can be created using more conventional utilities.
The following BookTools are now available:
- gather
- "Gather up" individual TIFF/Group 4 and JPEG page images into a multi-page, image-only PDF (Portable Document Format) file.
- Sponsor: Library of Congress Law Library. contact: Nick Kozura, nkoz@loc.gov
- Author: Steve Williams, Picture Elements, Inc., steve@picturel.com
- * This is currently being prepared for release.
The following BookTools are under development:
- find_ht
- Locate the largest rectangular halftone region on a page known to contain a halftone, ignoring any specified rectangular regions. Use iteratively to locate all halftone regions.
- Sponsor: Library of Congress Office of Preservation. contact: Basil Manns, bman@loc.gov
- Collaborator: Cornell University Department of Preservation and Conservation. contact: Anne Kenney, ark3@cornell.edu
- Author: Picture Elements, Inc., info@picturel.com
- un_ht
- Create a grayscale image from a specified rectangular halftone region of a grayscale page image.
- Sponsor: Library of Congress Office of Preservation. contact: Basil Manns, bman@loc.gov
- Collaborator: Cornell University Department of Preservation and Conservation. contact: Anne Kenney, ark3@cornell.edu
- Author: Picture Elements, Inc., info@picturel.com
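The iterative use described for find_ht can be sketched as a small driver loop. This is only an illustration: the actual find_ht interface is not documented here, so the rectangle convention, the function names `find_all_regions` and `make_toy_finder`, and the assumption that the finder returns nothing when no region remains are all hypothetical. The toy finder merely picks from a fixed candidate list and stands in for real halftone detection.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical rectangle convention: (left, top, right, bottom) in pixels.
Rect = Tuple[int, int, int, int]


def overlaps(a: Rect, b: Rect) -> bool:
    """Return True when the two rectangles share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def find_all_regions(
    find_largest: Callable[[List[Rect]], Optional[Rect]],
    max_regions: int = 100,
) -> List[Rect]:
    """Call a 'largest region' finder repeatedly.

    Each region found so far is passed back as a region to ignore,
    so the next call reports the next-largest remaining region.
    """
    found: List[Rect] = []
    while len(found) < max_regions:
        region = find_largest(found)  # regions found so far are ignored
        if region is None:            # assumed signal: nothing left to find
            break
        found.append(region)
    return found


def make_toy_finder(candidates: List[Rect]) -> Callable[[List[Rect]], Optional[Rect]]:
    """Build a stand-in finder that selects from a fixed candidate list."""

    def area(r: Rect) -> int:
        return (r[2] - r[0]) * (r[3] - r[1])

    def finder(ignore: List[Rect]) -> Optional[Rect]:
        remaining = [
            c for c in candidates
            if not any(overlaps(c, i) for i in ignore)
        ]
        return max(remaining, key=area) if remaining else None

    return finder


if __name__ == "__main__":
    toy = make_toy_finder([(0, 0, 10, 10), (20, 20, 50, 60), (100, 0, 105, 5)])
    for rect in find_all_regions(toy):
        print(rect)
```

The `max_regions` cap is a defensive choice so that a finder which never reports "nothing left" cannot loop forever.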
The BookTools Project is actively seeking sponsors to fund the development of the utilities listed below and other new efforts.
Other organizations or graduate students wishing to develop any of the utilities proposed below under the BookTools Project are welcome, and will be assisted in keeping their work compatible with the rest of the BookTools.
Suggested additions to the Wish List are solicited.
Contributions of existing utilities to the public domain under the BookTools Project are welcomed. Guidance and assistance will be provided to allow them to be modified for compatibility with the existing BookTools.
The following utilities are planned or desired:
- gather
- Enhance the existing gather utility to allow PDF pages to be interspersed with TIFF/Group 4 images and JPEG images.
- compound_PDF
- Create a compound, single-page, image-only PDF file from independent image files. One image is specified as the background; the other images are placed in specified rectangular regions, scaled and/or cropped as needed.
- find_txtblk
- Locate the main (largest) text block on a page, ignoring internal paragraph boundaries.
- place_txtblk
- Place the main text block in a desired position on a page, maintaining the relative positions of outlying objects (page number, header, footer).
- set_pgsize
- Regularize the page sizes of a set of pages.
- find_pgnum
- Locate a rectangular region containing the page number.
- find_title
- Locate the page number of the main title page.
- find_title_verso
- Locate the back of the title page (where the copyright notice, publishing date and other cataloging information are found).
- find_toc
- Locate the table of contents pages.
- find_idx
- Locate the page range of the index pages.
- find_header
- Locate a rectangular region of a page containing a header.
- find_footer
- Locate a rectangular region of a page containing a footer.
- find_hdline
- Locate rectangular regions containing the largest point size text on the page. Return a region set. Ignore any rectangular "keep out" regions specified by an input region set. This may be used iteratively to find successively smaller text regions for building navigation aids.
- Dump Utilities
- Utilities to dump image file format parameters in comprehensive, compatible and automatically usable ways.
- Jam Utilities
- Utilities to insert information, parameters, comments and copyright notices into image files of various file formats.
- Quality Checking Utilities
- Utilities for page-to-page consistency checking of image sets, automatic image quality assurance, test chart analysis, etc.
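The scaling and cropping arithmetic mentioned in the compound_PDF entry above might look something like the following. This is a minimal sketch under stated assumptions, not a specification of the planned utility: the `Rect` type, the `place_in_region` function, its `crop` flag, and the choice to center the image in its region are all hypothetical. Without cropping, the image is scaled to fit entirely inside the region; with cropping, it is scaled to cover the region and the overhang would be trimmed.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    """Hypothetical rectangle: top-left corner plus size, in page units."""
    left: float
    top: float
    width: float
    height: float


def place_in_region(
    img_w: float, img_h: float, region: Rect, crop: bool = False
) -> Tuple[float, Rect]:
    """Compute a uniform scale and placement for an image in a region.

    crop=False: scale so the whole image fits inside the region.
    crop=True:  scale so the image covers the region; the returned
                rectangle then extends past the region, and the part
                outside the region is what would be cropped away.
    The image is centered in the region in both cases.
    """
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")
    sx = region.width / img_w
    sy = region.height / img_h
    scale = max(sx, sy) if crop else min(sx, sy)
    out_w = img_w * scale
    out_h = img_h * scale
    left = region.left + (region.width - out_w) / 2
    top = region.top + (region.height - out_h) / 2
    return scale, Rect(left, top, out_w, out_h)


if __name__ == "__main__":
    target = Rect(10, 20, 100, 100)
    print(place_in_region(200, 100, target))             # fit inside
    print(place_in_region(200, 100, target, crop=True))  # cover and crop
```

A single uniform scale factor is used so that the aspect ratio of the placed image is preserved.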
The beginnings of an architecture document exist. Eventually, this will serve as a guide to developing new BookTools.
Sponsors are actively being sought to fund the development of public domain BookTools. This sponsorship may go forward in a variety of ways:
- Underwrite your own programmer's time to write a utility under the BookTools Project.
- Contract with Picture Elements to write a utility under the BookTools Project.
- Contract with a developer of your choosing to develop a utility under the BookTools Project. These could be companies or graduate students seeking funded, interesting development projects.
Sponsoring organizations will receive prominent credit for their philanthropy.
This is a simple way for organizations needing a utility for a conversion project of their own to get it developed, while assuring its wide accessibility to the community and its continuing support by a collaborative group of programmers on the Internet. Just add a line item to your conversion budget for the development of the BookTool you need.
Help make the inexpensive mass conversion of books to images a reality!
For more information, please contact:
Lou Sharpe, BookTools Project coordinator
lsharpe@picturel.com
303-444-6767
info@picturel.com
Copyright © 1997 Picture Elements, Inc.