GCC Hacker: Wednesday, February 23, 2005

Script for a short 15-minute lightning presentation on the introspector, to be given on Sunday the 27th of February at FOSDEM.

Because I have had problems timing my presentations in the past, I have decided to write a script for my 15 minutes to make sure that I pack in as much information as possible.

After reviewing my material, I discovered that there is enough for at least an hour's presentation. I have moved the extra material out to my blog, where you can find it at http://rdfintrospector.blogspot.com/2005/02/removed-text-from-introspector.html

Introduction : 1 minute

Hello all, thanks for showing up today to listen to my presentation. I would like to talk to you about something that I have been obsessed with for years : the true nature of programs that process other programs, what I would like to call "meta-programs".

Because of the time limitations on this speech, I will not be able to take any questions, or be able to go into much detail. The purpose of this presentation is to state once and for all the scope and purpose of the introspector project and call out for support from the free software community.

Let me start by stating my personal historical motivations and the core questions to be answered by the introspector project, then get right to the core ideas that I would like to imprint upon you, while providing definitions for the terms I will be using.

Because of the lack of time, I will not be able to present supporting details for my theory, or the full history and current state of the project. I have however included them in this paper for your review, and I look forward to discussing them with you after my lightning talk if you are interested.

I think it is more important to understand the scope and setting of the introspector than to understand how it is currently implemented. The point is that introspection is a mental process; it is a way of thinking more than it is a piece of software.

The Original Motivation : RAD 2 minutes (3rd minute)

When I first started learning computer programming as a teenager in the 80s, I was drawn to the ideas of Turbo Prolog, which I played around with but never really could make use of. What I did make use of, however, and became fascinated with, was the DBASE III system, which was widely used at the time.

The thing that made DBASE so attractive was that it was so simple, practical and interactive. As I followed the evolutionary path of these simple database systems from DBASE III, to DBASE IV, to Clipper, to Borland Paradox, and finally to Microsoft Access, I became convinced of the power of simple database solutions.

RAD (Rapid Application Development) was one of the key ideas of the 80s, and I was imprinted at an early age by this idea. Screen painters, simple interactive development environments, program generators and reporting tools were the key ideas of RAD.

Later, when I started to program seriously in C and SQL, I was disappointed with the amount of work and resources that had to be put into creating the same simple functionality that was available in DBASE III! I longed for a way to iterate over the fields of a record in the simple manner that you could in DBASE. This functionality was key in allowing for the creation of screen painters, report generators and all types of really useful programs. In short, RAD!
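To make the idea of iterating over the fields of a record concrete, here is a minimal sketch in Python, a language that keeps its record meta-data around at run time. It is not DBASE and not the introspector; the Customer record and the render_form helper are invented purely for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class Customer:
    # Hypothetical record layout, invented for this illustration.
    name: str
    city: str
    balance: float


def render_form(record):
    """Build a plain-text 'screen' by iterating over the record's fields."""
    lines = []
    for f in fields(record):
        # The field names come from the record's own meta-data,
        # so this function works for any dataclass, not just Customer.
        value = getattr(record, f.name)
        lines.append(f"{f.name:<10}: {value}")
    return "\n".join(lines)


print(render_form(Customer("Ada", "Brussels", 12.5)))
```

Because render_form never mentions a field by name, the same few lines could serve as a crude screen painter or report generator for any record type. That is the kind of functionality that plain C does not offer without extra tooling.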

I was deeply interested in all types of tools to make this work simpler, and looked into CASE tools and persistence toolkits. In the end I wound up writing my own program generators for C and C++, from the very beginning, that emulated the best parts of what I had with DBASE!

OEW: or why write your own parser ? 1 minute (4th minute)

I worked for Innovative Software, now called IS-teledata, back in 94. I was attracted to the now discontinued program OEW, the Object Engineering Workbench, a C++ round-trip computer-aided software engineering (CASE) tool. It could parse your C++ code, allow it to be edited in a simple self-styled diagram (this was before UML, and the fact is that Booch's clouds at the time were just too complex to draw!), and finally regenerate the new code right back out, producing documentation and reports. It had a lossy C++ parser that could not handle all the C++ code you threw at it. My question at the time was, "Why don't we just use the GCC compiler's parser?" I then got a copy of the source code of GCC and tried to read it! I was LOST in the complexity of the code. There was no way that I could make sense of it; I did not even know where to start!

This, however, was the second key motivational idea behind the introspector project. And now, 10 years later, I have started to answer that question.

Part of the answer to that question is a second question: can the GPL prevent the usage of the parser by another program? The short answer is that there is nothing preventing this from happening! What if the parser were to emit all the data that it contains about the program at hand in a readable format? I will try to answer that question in more detail later, after I define my terms.
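As a rough sketch of what "a parser that emits its data in a readable format" could look like, the example below uses Python's own ast module as a stand-in for a compiler front end. This is an assumption made for illustration only: it is not how GCC or the introspector works, and the record layout (kind, name, line, params) is invented here.

```python
import ast
import json

SOURCE = """
def area(width, height):
    return width * height

def greet(name):
    return "hello " + name
"""


def dump_functions(source):
    """Parse source text and return one plain dict per function definition."""
    tree = ast.parse(source)
    records = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            records.append({
                "kind": "function",
                "name": node.name,
                "line": node.lineno,
                "params": [a.arg for a in node.args.args],
            })
    # Sort by line number so the output follows the order of the source.
    return sorted(records, key=lambda r: r["line"])


# Emit the parser's view of the program as JSON that any other tool can read.
print(json.dumps(dump_functions(SOURCE), indent=2))
```

Once the data is out in a plain format like this, a separate program can consume it without ever linking against the parser itself.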

Why doesn't the Compiler have a public data model and an external representation of its data? 1 minute (5th minute)

The next question that had to be answered was why there is no model of the compiler's internal data! The OEW tool was also lacking this feature. At the time I wanted an API into the OEW CASE tool, a way to get at the data, so that I could create a RAD-like tool for C++ and have the features of DBASE III!

This was the key problem that prevented me from proceeding on many levels. My answer to this question is presented here : The model of the compiler data is really the same as the model of communication itself. This communication is context specific, and takes place between the programmer and a software agent working on behalf of the compiler writer.

The attempts to define standards such as MOF (Meta-Object Facility) and XMI (XML Metadata Interchange) show how complex and impractical it is to define the model of the metadata of software. The semantic web project is the best attempt that I have seen so far at being able to capture and annotate the models of software. That is why I am using RDF and OWL as the basis for the storage of the data of the introspector.
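To give a feel for what storing program meta-data as RDF means, here is a small sketch that describes one function declaration as subject, predicate, object triples and prints them in N-Triples syntax. The namespace http://example.org/introspector# and the terms Function, name and param0, param1 are hypothetical; they are not the vocabulary actually used by the introspector.

```python
BASE = "http://example.org/introspector#"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(local_name):
    """Wrap a local name in the (assumed) namespace as an N-Triples URI."""
    return f"<{BASE}{local_name}>"


def function_to_triples(name, params):
    """Describe one function declaration as (subject, predicate, object) triples."""
    subject = uri(name)
    triples = [
        (subject, f"<{RDF_TYPE}>", uri("Function")),
        (subject, uri("name"), f'"{name}"'),
    ]
    for position, param in enumerate(params):
        # One predicate per parameter position keeps the order explicit.
        triples.append((subject, uri(f"param{position}"), f'"{param}"'))
    return triples


def to_ntriples(triples):
    """Serialize triples, one statement per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(to_ntriples(function_to_triples("area", ["width", "height"])))
```

The appeal of this shape is that every fact is a separate statement, so new kinds of meta-data can be added later without redesigning a fixed schema.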

Core Ideas

Here are the core ideas and definitions of the introspector project. If you leave this presentation today with these imprinted in your mind, then I will have been successful :

The introspector is a pattern for the behaviour of the programmer, a process that is applied to your software with the assistance of the introspector tools.
All software programs, source code and binaries are messages from the author of them to other people and agents that represent them. In the end the processor, the chip, is an agent that represents the chip producer but is acting on behalf of the owner of the chip. The chip is communicated with to be told what to do by the programmer. The compiler is an agent that acts as an intermediary between the author of the program and the chip itself. The programmer produces software that acts as an agent that is told what to do by the user and then translates that message via a network of messages to the chip, while the software is running. (1 minute. 6th minute)
Communication is context specific. The language that the chip understands is deeply tied to the chip itself. Communication with it is context specific; it requires an understanding of the current state of the chip and the computer system to be efficient and effective. In addition, it is dependent on the wiring of the chip and the features and functionality provided by it. Communication with the compiler is also context specific; it provides a simple layer of abstraction above the chip itself, but it is not able to fully distance itself from it. The program itself is also written in one context and executed in another. All of these contexts are different, and communication between parties and agents in separate contexts is inhibited by the accidental complexity that occurs when translating the message between two or more contexts. (1 minute. 7th minute)
Meta-Programs are programs and agents that process these messages from the programmer. These agents communicate with each other, in general via a whole bunch of incompatible file formats and data structures, all very messy.
Meta-Data is the data about software; it is the sum of all the data that is processed by and passes through all of the meta-programs. The source code of a program can be considered in this framework to be meta-data, but on the lowest level, because it is not structured explicitly. Only after it has been processed and split up by the meta-programs does it contain more information and become more useful. This added information is the meta-data that we are really interested in. (1 minute. 8th minute)
Object-Programs are the real instances of the software that is being executed by the user. The binary code of the object-program is itself meta-data that is emitted by the compiler as a message to the chip. The full trace of all the meta-data associated with this object-program is defined to be all the data that is used to produce the binary code. A full trace of all the meta-data, of all the messages that were used to produce the object-program during the entire build process, is what we are interested in collecting and understanding! (1/2 minute. 9 1/2th minute)
Object-Data is the data that is contained and processed by the Object-Program. This object-data can be partially understood by looking at it and cross-referencing it with the meta-data we collected about the object-program during the build. But we also need to follow the trace of the object-data through the program itself! That means we need to know all the data that flows through the final object-program running on the user's computer and capture that! If we have all the meta-data about the object-program's build, and we know the entire flow of the object-data through that object-program, and have a trace of the execution of the object-program, then we can begin to understand the structure and the source of the object-data! To collect these traces, we have to modify the object-program and teach it to intercept and enrich the object-data with meta-data and collect the execution traces, or we need to create a better debugger or even a kernel module that can do so. If this proposed introspector kernel module were able to access the full meta-data about the build of the object-program, then it could automatically collect and start decoding these traces! In addition, we will need to capture data about the execution context of the object-program in order to begin to understand data originating outside of the system. By using documented test cases and benchmarks we can give solid descriptions and meanings to the execution context. (2 minutes. 11 1/2th minute)
Reflection is the process of collecting meta-data and processing it by the programmer or user. It is the basis for writing meta-programs. The programmer needs to be able to query and even update the meta-data about the object-program in order to use reflection to its fullest capacity. Programmer- and user-specific code that is executed at compile time would allow the most powerful form of reflection: the ability to add new processing instructions and patterns into the compiler itself. This would require communication from the user context or programmer context back into the context of the compiler. The introspector aims at providing this ability by opening up a communication channel from the contexts of the users and programmers to that of the compiler developer! Now, more simply, reflective code that is executed at run-time needs to be able to access and maybe even update the meta-data of the program, which can be stored in a file or embedded into a shared object. (1 minute. 13 1/2 th minute)
Introspection is the process of a user or programmer evaluating the results of reflection. It is normally motivated by the need to learn about the object-program, by a concrete problem in the object-program, or by the need for a feature. The full set of meta-data, including traces of the object-data, is evaluated in the context of that concrete problem. Ideally the introspection would be started with an input file that describes the exact nature of the results to be gained.
Resolution is the creation of concrete changes to the object-program. It will normally result in a set of meta-data describing the new things to implement.
Execution is the final committing and implementation of these changes to the object-program. This includes the generation of code, the creation of packages, and the communication of meta-data back into the context of the build environment.
Interception is the process of intercepting and capturing the messages between two meta-programs or between two functions in a meta-program.
Enrichment is the process of adding in more context data and more meta-data to the existing set of data. This feeds understanding.
Visualization is the process of selecting, focusing, filtering and laying out the meta-data and feeding the results to the visual cortex of the user for further pattern matching.
Understanding is the human mental process that involves the visualization of the results of the introspection, and the refocusing of that process on arising open questions until the mind has built an internal mental model of the software. Understanding involves translating between contexts, and the creation of abstract contexts.
The mind of the programmer and user of the final program is what is feeding information back into the meta-program, and therefore the interface to the user and programmer must be as good as possible. The current practice of using many different languages and formats for the meta-data creates accidental costs in the communication. The actions of the programmer can be seen as following some process and program; these actions are then codified in the meta-program.
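Two of the definitions above, interception and enrichment, can be sketched in a few lines. The example below is a toy under stated assumptions, not the introspector's implementation: the intercept decorator, the enrich helper, the TRACE list and the two tiny "meta-program" functions are all invented for illustration.

```python
import functools

TRACE = []  # collected messages, one dict per intercepted call


def intercept(func):
    """Wrap func so that every call and its result are captured in TRACE."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        TRACE.append({
            "function": func.__name__,
            "args": args,
            "kwargs": kwargs,
            "result": result,
        })
        return result
    return wrapper


def enrich(record, **context):
    """Return a copy of a trace record with extra context meta-data added."""
    enriched = dict(record)
    enriched.update(context)
    return enriched


@intercept
def tokenize(text):
    # Stand-in for one stage of a meta-program.
    return text.split()


@intercept
def count_tokens(tokens):
    # Stand-in for a second stage that consumes the first stage's message.
    return len(tokens)


count_tokens(tokenize("int main ( void )"))

# Enrichment: attach context that the functions themselves did not know about.
enriched = [enrich(r, stage="demo-build") for r in TRACE]
for r in enriched:
    print(r["function"], "->", r["result"], "in", r["stage"])
```

The wrapped functions are unchanged in behaviour; only the messages passing between them are captured. The enriched records are the kind of raw material that visualization and understanding would then work on.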

Thank you for listening to my speech,

I hope that I have explained the motivation and the goals of the introspector project to you. If you are interested in hearing more about it, please contact me at mdupont777@yahoo.com, or jabber me at mdupont@nureality.ca, or visit the introspector IRC chat at irc.freenode.net:#introspector

Introspector Lightning talk at FOSDEM 2005
