An index or topic hierarchy of full-text documents can organize a domain and speed information retrieval. Generating such an index automatically is an ideal application for unsupervised learning, in which the learner creates an integrated summary of a domain. Traditional indexes, such as the Library of Congress system or the Dewey Decimal system, are generated by hand, updated infrequently, and applied inconsistently. With machine learning, they can be generated automatically, updated as new documents arrive, and applied consistently. Despite the appeal of automatic indexing, organizing natural language documents requires a difficult balance between what we want to do and what we can do. For optimal performance, the machine learner must know or acquire all that a human library patron knows about natural language. That requirement will remain beyond the capabilities of machine learning for many years to come. For the foreseeable future, we will have to apply approximate solutions to the problem and do whatever data engineering is necessary to yield good performance. This paper describes an application of clustering to full-text databases, presents a new clustering method, and discusses the data engineering necessary to use clustering for this application. In particular, the paper deals with engineering the feature set to permit learning and with otherwise engineering the data to match the assumptions underlying the learning algorithm.