Monday, December 20, 2004

Data Nirvana

This is a post I'll probably continue to edit and work on over time. I want to start brainstorming a Universal Data Storage and Retrieval System (UDSRS) that will enable me to structure and process information more effectively. The idea is to accumulate a set of tools that can hand off to each other whatever arbitrary forms of data I might have saved or come across. Each tool processes a given type of data in a specific way without locking it up in a proprietary data silo where it's unavailable to other tools for other purposes. At the same time, I want one (but not necessarily only one) easy access point for any and all types of data in the system.

  • Avoid proprietary formats where possible - but it won't always be possible. For example, as long as I have clients who send me (sometimes quite complex) Word documents, and given that tools like Wordfast exist that let me work with reasonable efficiency within Word, it'll probably never be worth the effort to create XSLT transforms or some similar mechanism to round-trip documents between Word and some "open" format.
  • That said, just about all data can be represented as text. Either the content itself can be boiled down to a string (including HTML, XML, e-mail, .ics, etc.), or at the very least a network or local URL (i.e. a string) can be created to point to the file holding the binary data, e.g. a graphic, application, or proprietary-format data file.
  • Thus the fundamental unit in the UDSRS is an "item" consisting of a text string. It may or may not contain various types of metadata and/or its own subordinate elements.
  • Within the UDSRS, an "item" may be stored either as a separate file (e.g. a .webloc file) or as a string (e.g. a URL) within a longer text file. Items can be seamlessly interconverted between autonomous files and intra-file text strings.
  • The native object represented by an "item" may or may not be a separate file. In most cases, such as individual e-mail messages, address book entries and calendar items, the native objects will exist as sub-file-level objects.
  • Those native objects that do exist as separate files can easily be incorporated into the UDSRS in the form of file:// URIs.
  • For other kinds of objects, it will be worth considering switching to tools that do in fact store each object as a separate file, but in most cases this will probably remain impossible or impractical.
  • Each tool that represents objects on a sub-file level must provide a storage/retrieval mechanism capable of creating a link in text format, such as a script, to any item. At a minimum, we should be able to automatically generate an AppleScript to "look up" the object (such as a contact record in Address Book), then run that script via Services when we want to retrieve the item. Anything clumsier than that would probably be unusable, and simpler solutions should be sought whenever possible. These storage/retrieval mechanisms for each tool may be regarded as "plugins" to the UDSRS.
  • The UDSRS itself will offer various ways to sort, tag/label, organize and manipulate the items (strings) it contains. For the most part, more specialized tools will be used to read and work with the actual content of the items. For certain item types, it may make sense to use more than one tool to work with them for different purposes - for example, I may want to incorporate the same text into both an e-mail message and a blog post.
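
To make the "item" idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration, not a design decision: the names Item, item_for_file, save_as_file, append_to_collection and load_collection are hypothetical, the metadata is reduced to a set of tags, and I assume an item's text fits on one line with no tab characters so that it can live as a single line inside a longer text file. It shows a native file being pulled in as a file:// URI, and an item moving between a file of its own and a line in a collection file.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Item:
    """Hypothetical UDSRS item: a text string plus optional tags (metadata)."""
    text: str
    tags: set = field(default_factory=set)

    def to_line(self) -> str:
        # Assumed on-disk form: text, a tab, then comma-separated tags.
        return self.text + "\t" + ",".join(sorted(self.tags))

    @classmethod
    def from_line(cls, line: str) -> "Item":
        # Inverse of to_line(); a line without a tab yields an untagged item.
        text, _, tag_part = line.rstrip("\n").partition("\t")
        return cls(text, {t for t in tag_part.split(",") if t})


def item_for_file(path: Path, tags=()) -> Item:
    """Represent a native object that is a separate file as a file:// URI."""
    return Item(path.resolve().as_uri(), set(tags))


def save_as_file(item: Item, path: Path) -> None:
    """Store one item as an autonomous file."""
    path.write_text(item.to_line() + "\n", encoding="utf-8")


def append_to_collection(item: Item, path: Path) -> None:
    """Store one item as a line within a longer text file."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write(item.to_line() + "\n")


def load_collection(path: Path) -> list:
    """Read every non-empty line of a text file back into items."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return [Item.from_line(line) for line in lines if line]
```

Because save_as_file and append_to_collection write the same one-line form, load_collection reads either kind of file, which is one simple way to get the interconversion between autonomous files and intra-file strings described above. The per-tool "plugins" (for example, generating a lookup script for an Address Book record) are deliberately left out of the sketch.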

