getLost

getLost is my triplet-based data management system.

Data System

getLost is my project to integrate all my data into one system. The system is based upon a triplet structure, similar to RDF. I'm assuming a familiarity with RDF (I consider Tim Bray's introduction good, mainly because it does not go into the semantic web).
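
To make the triplet structure concrete, here is a minimal sketch in Python of an in-memory triplet store. This is not getLost's actual storage, which is not described here; the names Triple, TripleStore, add and match are made up for the illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate, object (all plain strings here)."""
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """Hypothetical in-memory store, for illustration only."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add(Triple(subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples, sorted; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


store = TripleStore()
store.add(
    "http://norman.walsh.name/home",
    "http://purl.org/dc/elements/1.1/creator",
    "http://norman.walsh.name/knows/who#norman-walsh",
)

# Everything known about one page:
for triple in store.match(subject="http://norman.walsh.name/home"):
    print(triple.predicate, "-->", triple.obj)
```

Because every statement has the same three-part shape, data from very different sources (browser logs, crawled RDF, manual comments) can all go into the same store.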

Browser integration

getLost has a tight integration with the web browser (Mozilla). Browsing is logged to getLost: for every page visited, the time of the visit and the contents of the page are recorded. This information is not yet used very well, but the idea is to use it to see what has changed on a page, and to use the number of visits to a site as some kind of ranking for advanced searches (I often know that I read something but can't find it anymore).
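
The logging idea can be sketched roughly as follows in Python. This is only an illustration under assumptions: the class VisitLog and its methods are hypothetical names, everything is kept in memory, and the page contents are compared as plain text with the standard difflib module.

```python
import difflib
from collections import defaultdict
from datetime import datetime, timezone


class VisitLog:
    """Hypothetical visit log: url -> list of (time of visit, page contents)."""

    def __init__(self):
        self._visits = defaultdict(list)

    def record(self, url, content, when=None):
        when = when or datetime.now(timezone.utc)
        self._visits[url].append((when, content))

    def visit_count(self, url):
        return len(self._visits.get(url, []))

    def last_visited(self, url):
        visits = self._visits.get(url, [])
        return visits[-1][0] if visits else None

    def diff_last_two(self, url):
        """Line-based diff between the two most recent recorded contents."""
        visits = self._visits.get(url, [])
        if len(visits) < 2:
            return []
        old, new = visits[-2][1], visits[-1][1]
        return list(difflib.unified_diff(
            old.splitlines(), new.splitlines(), lineterm=""))

    def ranked(self):
        """URLs ordered by number of visits, most visited first."""
        return sorted(self._visits, key=lambda u: (-len(self._visits[u]), u))


log = VisitLog()
log.record("http://norman.walsh.name/home", "old text")
log.record("http://norman.walsh.name/home", "new text")
log.record("http://dublincore.org/2003/03/24/dces", "schema")

print(log.ranked()[0])
print(log.diff_last_two("http://norman.walsh.name/home"))
```

Counting visits is only the crudest possible ranking; it is used here because it is the one signal mentioned above.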

Another integration with the web browser is the Mozilla sidebar. The sidebar displays information about the current web page: some automatically generated data, like the time of the last visit and some kind of page rank, and some of my own (manually added) data about the page. Of course, data about the web page can also be added from the sidebar.

Search Engine

As mentioned above, I'd like to do searches that take advantage of the logging information, and possibly of other metadata like comments made about a certain web page (if I took the effort to comment on a page, it is probably good, and if my comment also mentions "A", the page probably has some good information about "A").

But what I consider much more interesting is searching with explicit use of metadata: searches like "creator > walsh", which finds pages for which there is a triplet whose predicate has something to do with "creator" and whose object has something to do with "walsh". So this would find http://norman.walsh.name/home, because there is a triplet at http://norman.walsh.name/home.rdf saying
http://norman.walsh.name/home -- http://purl.org/dc/elements/1.1/creator --> http://norman.walsh.name/knows/who#norman-walsh
and there is the triplet at http://dublincore.org/2003/03/24/dces saying
http://purl.org/dc/elements/1.1/creator -- http://www.w3.org/2000/01/rdf-schema#label --> 'Creator'
And according to http://norman.walsh.name/knows/who.rdf there is
http://norman.walsh.name/knows/who#norman-walsh -- http://nwalsh.com/rdf/palm#surname --> 'Walsh'
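
The "creator > walsh" lookup over these three triplets can be sketched in Python as below. This is a sketch under assumptions, not getLost's real query code: "has something to do with" is interpreted here as "the term occurs, ignoring case, in a literal attached to the resource", and the names Triple, relates_to and search are made up for the example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    obj_is_literal: bool = False  # True when the object is a text value


TRIPLES = [
    Triple("http://norman.walsh.name/home",
           "http://purl.org/dc/elements/1.1/creator",
           "http://norman.walsh.name/knows/who#norman-walsh"),
    Triple("http://purl.org/dc/elements/1.1/creator",
           "http://www.w3.org/2000/01/rdf-schema#label",
           "Creator", obj_is_literal=True),
    Triple("http://norman.walsh.name/knows/who#norman-walsh",
           "http://nwalsh.com/rdf/palm#surname",
           "Walsh", obj_is_literal=True),
]


def literals_about(resource, triples):
    """All text values stated about a resource, whatever the predicate."""
    return [t.obj for t in triples
            if t.subject == resource and t.obj_is_literal]


def relates_to(value, is_literal, term, triples):
    """Assumed meaning of 'has something to do with': a substring match
    on the literal itself, or on any literal attached to the resource."""
    texts = [value] if is_literal else literals_about(value, triples)
    return any(term.lower() in text.lower() for text in texts)


def search(query, triples):
    """Evaluate 'predicate-term > object-term'; return matching subjects."""
    predicate_term, object_term = (p.strip() for p in query.split(">", 1))
    return sorted({
        t.subject for t in triples
        if relates_to(t.predicate, False, predicate_term, triples)
        and relates_to(t.obj, t.obj_is_literal, object_term, triples)
    })


print(search("creator > walsh", TRIPLES))
# prints ['http://norman.walsh.name/home']
```

Note that the code never looks at which predicate links a resource to its text: the label and the surname are both just literals attached to a resource.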

The importance of this kind of search is that it makes use of metadata without knowing what the metadata is really saying. This is the complete opposite of the Semantic Web idea, with its machine-understandable assertions, derived logic, proofs, etc.
This matches how I read Tim Bray's writing: his introduction, for example, doesn't talk about the Semantic Web, just about RDF as a standard way of representing data in triplets, with the property that subject, predicate and object are (or can be) resources.

The major problem at the moment is the crawler, which isn't that good and only supports the RDF format (it scans other XML files only to extract links from them). And there isn't that much metadata available in RDF format. Adding support for other formats (like RSS) will probably reduce this problem somewhat, but according to Tim Bray, "There is no cheap metadata". Actually, the only place with a lot of good real-world metadata I know of is norman.walsh.name (and therefore I consider it the best source of information about RDF/metadata).

The information from the RDF files parsed by the crawler is also used in the Mozilla sidebar. I would like to use this so that if I have comments about a web page and publish these as RDF, then (when getLost is public) others can see my comments in their sidebar when they're visiting the same web page.
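
As an illustration of publishing a comment as RDF, the Python sketch below writes one comment as a single line in N-Triples, a line-based RDF syntax. Two things are assumptions made for the example: that N-Triples would be used rather than RDF/XML, and that the comment is attached with the RDF Schema "comment" property. The function names are hypothetical.

```python
# Assumption: the RDF Schema "comment" property is used for personal comments.
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"


def escape_literal(text):
    """Escape the characters that may not appear raw in an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def comment_as_ntriple(page_url, comment):
    """One triplet, page -- comment --> text, as one N-Triples line."""
    return f'<{page_url}> <{RDFS_COMMENT}> "{escape_literal(comment)}" .'


print(comment_as_ntriple(
    "http://norman.walsh.name/home",
    "Lots of good real-world metadata here.",
))
```

A crawler that understands this format could pick the line up from a published file and show the comment to anyone visiting the same page.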

Web interface

Since I don't know of an easy editor for RDF/metadata, and editing RDF/XML by hand is not very easy (I think the RDF/XML syntax isn't very nice; "It's the Syntax, Stupid!" and "Is RDF/XML Good for Anything?" explain very well why I think this), I'm working on a web-based interface to edit metadata. This web-based interface is also used in the Mozilla sidebar, so metadata about web pages can be added easily. So when I read some interesting web page, I immediately make a comment about it in the sidebar.