Friday, December 24, 2004

Idea of the Introspector: SWSH, the SemanticWebSHell

First of all, Merry Christmas!

In this article, I want to propose a semantic web shell, the SWSH, which will be the first key user interface component of the introspector.

The Name

I was looking at CWM today, and then at Swish, and was wondering about all these semantic web acronyms: Swish, Swig, Swap. The first name that came to my mind was Swash (but later I found out that it is taken), and I wondered: what could that swash be?

What will SWSH be?

Well, like bash, but for the semantic web. This is something that has been going through my mind as of late, and it is related to applying the introspector to bash as well. I have long planned to make an introspector interface to bash, and I need to pass more information into gcc to guide the RDF output, so my plan is to describe all the parameters to gcc in RDF, so that I can relate them to the resulting output.

A semantic web shell will allow you to interface to any command from the shell, while the parameters, return values, environment variables and scripting are all accessible as RDF resources.

Each shell script, each command, each variable, each invocation and each file is defined as an RDF resource, and each of them can be annotated.

The environment of your shell will be an RDF store. The shell will allow you to wrap bash commands as resources as well, and describe them using RDF.
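To make the idea concrete, here is a minimal sketch in Python of what wrapping a command invocation as an RDF resource could look like. The rdflib library stands in for redland here, and the swsh: vocabulary and the introspector.example URIs are purely hypothetical placeholders:

    import subprocess
    from rdflib import Graph, Namespace, Literal, URIRef
    from rdflib.namespace import RDF

    SWSH = Namespace("http://introspector.example/swsh#")  # hypothetical vocabulary
    g = Graph()

    def run(command, *args):
        # Run the command, then describe the invocation as an RDF resource.
        result = subprocess.run([command, *args], capture_output=True, text=True)
        inv = URIRef(f"http://introspector.example/invocation/{command}-{id(result)}")
        g.add((inv, RDF.type, SWSH.Invocation))
        g.add((inv, SWSH.command, Literal(command)))
        for i, arg in enumerate(args):
            g.add((inv, SWSH[f"arg{i}"], Literal(arg)))
        g.add((inv, SWSH.exitStatus, Literal(result.returncode)))
        return result

    run("ls", "-l")
    print(g.serialize(format="n3"))

Every invocation then becomes a node in the shell's graph, ready to be queried or annotated later.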

Networks of Pipes
The piping system is important: pipes themselves will be definable as RDF graphs, and the most interesting part will be the logic that can be applied to the data elements as they flow between the pipes.

You might decide to insert an agent that makes a decision on the data in the pipe as it passes through, and splits it out into multiple pipes depending on the value. This will all be possible.
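Here is a minimal sketch of such an agent in plain Python, with two made-up downstream commands; in a real SWSH the routing test itself would be described as an RDF resource:

    import subprocess
    import sys

    # Two downstream consumers for the two branches of the split pipe.
    suspicious = subprocess.Popen(["grep", "-n", ""], stdin=subprocess.PIPE, text=True)
    normal = subprocess.Popen(["sort"], stdin=subprocess.PIPE, text=True)

    for line in sys.stdin:
        # The decision point: route each data element by its value.
        target = suspicious if "error" in line.lower() else normal
        target.stdin.write(line)

    for proc in (suspicious, normal):
        proc.stdin.close()
        proc.wait()

You would drop this agent into an ordinary pipeline, for example: dmesg | python route.py.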

Adding semantics to existing output
The next innovation will be the conversion of the output of tools like cut and split into RDF.
You will be able to declare that "cut -d: -f 1,4,6" returns three columns, and define a resource to describe them further. This will allow you to mark up plain text files.
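As a sketch of what that markup could produce, here is the cut example lifted into RDF with Python and rdflib. The column names and the swsh: vocabulary are assumptions made for illustration (fields 1, 4 and 6 of /etc/passwd are the user name, group id and home directory):

    import subprocess
    from rdflib import Graph, Namespace, Literal, URIRef

    SWSH = Namespace("http://introspector.example/swsh#")  # hypothetical vocabulary
    columns = ["userName", "groupId", "homeDirectory"]     # describes fields 1,4,6

    out = subprocess.run(["cut", "-d:", "-f", "1,4,6", "/etc/passwd"],
                         capture_output=True, text=True).stdout
    g = Graph()
    for n, line in enumerate(out.splitlines()):
        row = URIRef(f"http://introspector.example/passwd/row{n}")
        for name, value in zip(columns, line.split(":")):
            g.add((row, SWSH[name], Literal(value)))
    print(g.serialize(format="n3"))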

Implementation

Of course you are asking yourself how this may be implemented.

One key component will be replacing the getopt library with redland. All of the options to all of these tools can then be passed via RDF.
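A rough illustration of the difference, again in Python with rdflib standing in for redland and the same hypothetical swsh: vocabulary: the tool no longer parses argv strings, it looks its options up in a model:

    from rdflib import Graph, Namespace, Literal, URIRef

    SWSH = Namespace("http://introspector.example/swsh#")  # hypothetical vocabulary
    g = Graph()
    inv = URIRef("http://introspector.example/invocation/cut1")

    # Options are asserted as RDF statements instead of argv strings.
    g.add((inv, SWSH.delimiter, Literal(":")))
    g.add((inv, SWSH.fields, Literal("1,4,6")))

    # The tool queries the model for its options.
    delimiter = g.value(inv, SWSH.delimiter)
    fields = g.value(inv, SWSH.fields)
    print(f"would run: cut -d{delimiter} -f {fields}")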

Another key component will be replacing printf with the emission of an RDF statement. This statement will include information about the context in the program where it was called, with all the parameters extracted via the gcc introspector.
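Here is a toy version of that idea in Python, where the inspect module stands in for the context information that the gcc introspector would supply for a C program; the vocabulary is again hypothetical:

    import inspect
    from rdflib import Graph, Namespace, Literal, BNode
    from rdflib.namespace import RDF

    SWSH = Namespace("http://introspector.example/swsh#")  # hypothetical vocabulary
    g = Graph()

    def rdf_printf(message, **params):
        # Emit an RDF statement instead of a line of text.
        caller = inspect.stack()[1]
        stmt = BNode()
        g.add((stmt, RDF.type, SWSH.Message))
        g.add((stmt, SWSH.text, Literal(message)))
        g.add((stmt, SWSH.function, Literal(caller.function)))
        g.add((stmt, SWSH.line, Literal(caller.lineno)))
        for name, value in params.items():
            g.add((stmt, SWSH[name], Literal(value)))

    def copy_file(source, target):
        rdf_printf("copying file", source=source, target=target)

    copy_file("a.txt", "b.txt")
    print(g.serialize(format="n3"))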

In order to be able to implement all of this in one lifetime, we will need the gcc::introspector to provide us with all the information needed about the data structures of all the programs, and we will need to translate those data structures into RDF. This can be done semi-automatically, however, as is seen with the serialization routines that are possible in Java and C# when reflection is enabled.
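The reflection analogy can be sketched in a few lines of Python: walk over an object's fields and emit one triple per field, the way reflective serializers do in Java and C#. The FileRecord class and the swsh: terms are made up for the example:

    from rdflib import Graph, Namespace, Literal, BNode
    from rdflib.namespace import RDF

    SWSH = Namespace("http://introspector.example/swsh#")  # hypothetical vocabulary

    class FileRecord:
        def __init__(self, name, size):
            self.name = name
            self.size = size

    def to_rdf(obj, g):
        # One node per object, one triple per field, derived by reflection.
        node = BNode()
        g.add((node, RDF.type, SWSH[type(obj).__name__]))
        for field, value in vars(obj).items():
            g.add((node, SWSH[field], Literal(value)))
        return node

    g = Graph()
    to_rdf(FileRecord("notes.txt", 1024), g)
    print(g.serialize(format="n3"))

For C programs, the field list would come from the gcc::introspector's dump of the struct declarations rather than from runtime reflection.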

As soon as the ability to traverse the ASTs of gcc is stable and efficient, it will be feasible to create meta-programs that generate a base-level RDF interface to almost any program. Then, by marking up the data structures with more advanced semantics via RDF, the binding can be customized and regenerated in an iterative fashion.


Future Music

When the introspector is in full gear, with modules for each command that is executed, you will also be able to extract the metadata out of the scripts themselves. For example, if you have an awk script, the awk::introspector will give you an RDF dump of that script, which can then be processed further.

When you are compiling, you will be able to invoke configure, make and the compiler, all driven by the metadata about your computer. The project data will be extracted out of make, the configuration data out of autoconf, and the source code data out of gcc. By introspecting over the Linux kernel via gcc, and over gcc itself, you will have all the metadata about your machine available.

The Linux kernel will also be fitted with an introspector, so that all the kernel symbols will be accessible via the RDF query interface. Shared libraries and their dependencies will also be RDF resources, as will include files.

At the lowest level, even the file system will be treated as an RDF resource: directories and files will be addressable and annotatable via RDF. The commands ls and file will be able to return RDF objects as well.
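A small sketch of an RDF-returning ls, with illustrative file: URIs and the same hypothetical swsh: terms as above:

    import os
    from rdflib import Graph, Namespace, Literal, URIRef
    from rdflib.namespace import RDF

    SWSH = Namespace("http://introspector.example/swsh#")  # hypothetical vocabulary
    g = Graph()

    for entry in os.scandir("."):
        # Each directory entry becomes an addressable, annotatable resource.
        node = URIRef(f"file://{os.path.abspath(entry.path)}")
        g.add((node, RDF.type, SWSH.Directory if entry.is_dir() else SWSH.File))
        g.add((node, SWSH.size, Literal(entry.stat().st_size)))
    print(g.serialize(format="n3"))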

By using RDF, we can unify all the tools of the Linux system, from the kernel down to the shell, and present a single point of contact for all the information in the system.

Also, for source code and files, we will be able to trace the history of each file: the edits to it, the copying and linking of it, and so on. Via the SWSH, the history file will give you all the information you could ever want. When cvs and svn have been added to this framework, you will be able to trace all changes and associate them with the people who made them.

When the editors and web browsers have been enabled, you will be able to get RDF descriptions of each edit and each history file. It will then be much easier to find out where files and changes came from. The editors can also use this information for highlighting and intelligent editing.

mike