Monday, February 19, 2007

enterprise introspector restarted

As some of you who have known me for a while will remember, I have long dreamed of introspector functionality for the enterprise.

The challenge is to have enough information about your business that you can reference your data files against it.

I have now converted my older ontologies into SWOOP ones and cleaned up my original work on collecting and understanding business software.

I am using sql-ledger as an example of how the full application stack of an enterprise application can be introspected.

I have opened up a sourceforge feature request for posting the files.

We have a raw database model extracted with a small tool using the DB-Introspector. Here is a mini OWL model for the database, separated out so that it can be updated later.

So far I have a mini OWL model for the business, which I should describe better.

Basically, I split the tables into entities, relationships, resource descriptors, and physical resources.

Transactions are just relationships over time. Projects are just long (recursive) transactions.

An address is just a resource descriptor.
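
To make this concrete, here is a minimal sketch (in Python with rdflib, with hypothetical class and table names, not the actual posted ontology) of how such a classification could be written down as OWL/RDF:

from rdflib import Graph, Namespace, RDF, RDFS, OWL

# Hypothetical namespace for the mini business ontology.
BIZ = Namespace("http://introspector.sourceforge.net/2007/02/business#")

g = Graph()
g.bind("biz", BIZ)

# The four top-level categories described above.
for cls in ("Entity", "Relationship", "ResourceDescriptor", "PhysicalResource"):
    g.add((BIZ[cls], RDF.type, OWL.Class))

# A transaction is modelled as a relationship over time.
g.add((BIZ.Transaction, RDFS.subClassOf, BIZ.Relationship))
# A project is a long (recursive) transaction.
g.add((BIZ.Project, RDFS.subClassOf, BIZ.Transaction))
# An address is a resource descriptor.
g.add((BIZ.Address, RDFS.subClassOf, BIZ.ResourceDescriptor))

# Example classification of two illustrative sql-ledger-style tables.
g.add((BIZ.customer, RDF.type, BIZ.Entity))
g.add((BIZ.ar, RDF.type, BIZ.Transaction))

print(g.serialize(format="turtle"))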

I will continue posting more and more data about sql-ledger as this project continues.

Here are some of my plans:
  • Create a model of the menu system in OWL, and be able to address each menu point in some business workflow.
  • Create a test case for each menu that collects its form data, and capture a trace of the entire application stack.
  • Be able to reference each part of the trace against each part of the ontology.
  • Create an OWL document that describes each form, showing the mapping from the form fields to the database table fields, all the way down to the translation strings (see the sketch after this list).

In the end, we should have an RDF descriptor for the entire sql-ledger system.

With that we can then create an RDF descriptor out of a single instance of the system.
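
Here is a minimal sketch, again in Python with rdflib, of what such a per-form descriptor could look like; the form, field, table, and translation-key names are hypothetical, not taken from sql-ledger:

from rdflib import Graph, Namespace, RDF, Literal

# Hypothetical vocabulary for describing forms and their mappings.
FORM = Namespace("http://introspector.sourceforge.net/2007/02/form#")

g = Graph()
g.bind("form", FORM)

field = FORM["customer_form/name"]            # a field on a hypothetical customer form
g.add((field, RDF.type, FORM.FormField))
g.add((field, FORM.mapsToTable, Literal("customer")))           # database table
g.add((field, FORM.mapsToColumn, Literal("name")))              # database column
g.add((field, FORM.translationKey, Literal("Customer Name")))   # translation string key

print(g.serialize(format="turtle"))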

hope to hear from you,

mike

Sunday, November 12, 2006

Saving "Insane in the Kernel" from chat

you can see a draft of my song on the gentoo site :
http://dev.gentoo.org/~vapier/wtf
so i wrote a little song
to the tune of insane in the membrane
from cypress hill
lyrics : http://www.citay.de/texte/a_d/cyp_ins.html
Insane in the kernel
insane in the shell
a hacker like me is going insane
insane in the kernel, insane in the shell
Like stallman hacked that emacs
I'll code this here script
and release you an rpm
soon i gotta get my cvs update
Microsoft trys and patent my code
these crackers want to root my server
head underground to the next project
they get mad when they come to raid my server
and i take off in my ssh -C connection
Yeah, i am the hacker
the pilot of the this here project
when i dream of the ultraviolet software
and hide from the microsoft cronies
Now do you belive in the unseen?
Look, but dont make your eyes strain
a hacker like me is going insane
insane in the kernel, insane in the shell
repeat
wtf !?

Meme War : Software Industrial Complex vs Free as in Freedom

Originally posted to Groklaw:

Authored by: Anonymous on Monday, November 06 2006 @ 04:22 PM EST Pam, you write :
"I'm very sad about Novell. Whatever they thought they were doing, they are
now Microsoft's FUD puppy, and contractually they will be having to repeat
Microsoft's FUD with every deal, I think. Every time they tell a prospect that
they have a patent peace with Microsoft, they are implying that one needs one,
and the damage to Linux's good name is obvious right there."

Let us take that statement, replace Novell with Mono, and Linux with DotGNU.
If you reread the sentence, you will see this entire pattern has occurred before.


Then we add to the idea chain here :

MS - Ximian (Mono) - Novell - SuSE

Bang, we see that a strong connection between Ximian and Microsoft has taken place,
and how the huge fight between the two camps occurred. The industry backed Mono with
book contracts. You see how contractual agreements now underline what was before
just an underlying disturbance in the force.

How long has this disturbance in the force been there? I think for much longer.
Something is deeply, darkly wrong at Novell and Microsoft. They are part of the
SIC. The software industrial complex.

Pam also continues to question the timing of this thought:
"How could Novell not see that? Is it too late to nix this devilish
deal?"

The devilish idea could very well have happened much earlier, even before the
Midnight Commander's author decided to throw the GPL overboard on the Mono
project's class libraries and break away from working with the GNU DotGNU Portable.NET
project.

It has been said that Ximian has had very good relations with Microsoft for a long time.
It has only been to their benefit, except for the disturbances in the free
software community. Hurt feelings are not something that corporations really need to
care about. People are hurt all the time; it is just business, nothing
personal. What does freedom matter anyway? Much better that we just take it away from
the user, so they cannot leave. That is what businesses need: stable,
non-free workers bound by all sorts of contracts, visible and invisible. I have
worked inside the SIC for years, and can tell you that it loves Microsoft
Office!!!

Microsoft has been creating many good relations with developers, and turning
them away from free software, for years. But that is just business.
I was also a Microsoft junkie, until Linus brought me GNU! It was not the FSF
that got me to GNU.

Some of the researchers at Microsoft use Open Source, GNU, and BSD tools,
because they are the best; you can find references to that on the research pages
at Microsoft. Microsoft has also redistributed Perl for years, among other tools. I
remember even seeing Xenix from Microsoft at Radio Shack in the 80s.

Intel is also trying to get into the Linux software business after downturns on
Windows also affected their sales. It is the nature of the market that they are
all tied together in a massive web of interdependencies that defines the
technology market.

Now what if Intel were finally to find itself needing to develop its own version
of Linux, optimized for its chip? What about a Linux on the chip, burnt
right in and fully optimized? A Linux chip.

Think about what would happen if your chip contained the ability to compile new
highly optimized programs, a compiler itself. Take the gcc and turn it into a
chip!

Imagine being able to create even new computer chips, or rewire them using nano
robots. A fab chip that contains an entire IC fab on a chip, with
nanobots working inside it to produce new chips.

All these new technologies can be implemented with open source tools. Of course, to
produce such chips you need to be the most advanced manufacturer on the planet,
but you will still need software. Why not let the people own the software?

Open Source software is Adaptive software.

It has high viability because it copies itself freely, consuming all available
space. It tends to consume software developers completely, until they turn into
memeoids of a given piece of software, defending it to the death. That is the true greatness of
Free Software: it is the great minds that have been attracted to it. It is the
ability to see them interact and watch how they think, how the software grows. Of
course we see that in the SIC as well. But you don't get to see the sources,
or have time to understand things, most of the time when you work inside the
SIC.

What if the rate of change in the software were reflected in some kind of metric? We
could watch the rate of change of open source versus closed source.

In companies, the rate of change in the source is defined by the contractual
flow of money to the software engineering process that delivers to requirements.
Things don't change for years, and each line of software is so expensive that
there had better be a good reason to change it.

In free software, the rate of change in the source is defined by the meme
strength of the software, its ability to copy itself onto a developer, who then embodies and
carries it. See the egotistical (selfish) meme from Dawkins for more about memes.

Now we can apply some Dawkins-style game theory here:

We will see many creative people developing new creative ideas, so there is a
chance that the meme will mutate into something new and exciting. Let's call them
the doves.

Yet not all play for the gain of all. We look at predators: how many of them
will turn against the meme and go against it? Someone like me, growing up as a
Microsoft memeoid and turning into a Free-as-in-freedom GNU/Linux memeoid. Or someone
who grew up in freedom turning against it, e.g. Ximian.

The population of the software development market is attacked by waves and waves
of memes searching for hosts. Each one hopes to capture a developer working on
it. Each one has some scheme.

The closed system scheme is built around a soft landing. Microsoft software
development tricks you with wizards that hold up a light for you to walk in
the dark, but lead you down the path into complete dependency. It is a warm and
fuzzy place.

The Free Software movement, by contrast, confronts you with someone who is not getting any
good press. In fact, the newspapers seem to go out of their way not to talk about
free as in freedom at all. I almost choked the other day when the FAZ was
talking about Creative Commons and the Wikipedia. The capitalist press just
cannot handle GNU.

They find the idea of Free as in Freedom distasteful, I think. It must have
something to do with the word Manifesto.

Most industrial companies feel the need to control the freedom of their workers.
Maybe they have to as well, and there is the real core of the problem.

Let's view the world from the point of view of an egotistical meme that has an army of
memeoids. Let's call this meme "SIC" (the software industrial complex).
We can define it by a simple set of rules:
1. Those who have must protect it from those who don't.
2. Those who don't must have a problem, so we certainly should not help them; they
might multiply.
3. What better way to protect your own than to simply disable the
competition with FUD.
4. Capture the minds of the mentally weak, fill them with ideas, make them want
to buy our bugs.
5. The stronger ones we will give real benefits, so they control the weaker with
our FUD.
6. Create a hierarchy of FUD that trickles down to the office level and floods
the minds of the workers.

Now, let me tell you the real cost of msoffice to the SIC, it is the cost of
training slow neurons. No one wants to do it. They might start a riot.

What is software all about for the SIC anyway? It needs OFFICE because of the sheer
cost of brainwashing and retraining all those neurons! And to think that the SIC
has been investing in these software memetic brandings for a long time! It is
a lot of energy invested, so it must have some purpose.

Just look at the cost in calories it would take to retrain the nation to use
open office!! What a waste of resources, we should let them have office.

Give the people the ability to learn Linux? That goes against the entire idea of
an empire of SIC.

Seriously folks, let's spend those resources on something worthwhile, like giving
internet connections and computing power to the third world. Let's teach the
world to sing in perfect harmony! Let's set an example for future generations and
share with them what we know. Why not let them see how we developed software?
Why not share with them something we have worked hard on?

How many of us are willing and able to put in the work to become the perfect free
software memeoid? Who is willing to make that sacrifice of time and resources?

Do we not need freedom to have freedom? If we don't have a computer, then we
cannot enjoy GNU. If we never learn to read, we cannot program GNU. GNU needs
young minds to copy itself onto. Fresh neurons. We should invest more into
third world software development. But how can you invest without money? No
money, no calories for neuron imprinting.

Anyway, enough for tonight.

mike

---------------------
Update :
I have found a nice page that gives more information, Softpanorama's Stallman page:

Donations pay for expenses, not ailing kids' dreams) are applicable to FSF. Moreover additional question about possible conflict of interests is perfectly applicable too. It looks like FSF accepted generous donations from Eazel. At the same time outspoken Eazel's co-founder, Miguel de Icaza sits on the board of directors of the Free Software Foundation. At this point RMS words "Go Get 'em, gnomes!" appear to have a quite different, more troubling meaning. As Denis Powell noted in his paper Wanna Invest in a Bridge Okay, How About a Donation :

Here is the LinuxPlanet note about Ximian/FSF:

...Because, you see, it seems as if not all information wants to be free. The financial records of the Free Software Foundation, for instance. I've repeatedly requested them, and those requests have gone unanswered. It is a peculiar irony that I can easily learn far more about the financial dealings of Microsoft Corp., than I can about the Free Software Foundation, where information wants to be free so long as it's other people's information.
I am not alleging impropriety here. It could be that it's all mere coincidence. But it is absolutely undeniable that the FSF has thrown its support behind a desktop controlled by two for-profit companies, one of which has an officer who sits on the FSF's board; the same company has purchased advertising aimed at confounding those who are seeking a desktop that is truly free in every rational sense of the word; and the other company has suggested that users can assist its product in surviving but help it avoid paying its bills by donating to the Free Software Foundation, or else an officer of that company has flung down and danced upon his fiduciary responsibilities by saying, in a communication that is part of his corporate function, that people might want to send money to the FSF instead of the company. And they all do it, evangelists as they are for "free" software, with a holier-than-thou air.

Saved the GNU Choo Choo from MSN/M$/Ximian/Novell/Suse

MSN has taken my BLOG offline for some reason.

My link is broken, I get an "access denied",
but the page still lives in Google:

This was saved from the google cache:

December 08

The GNU Choo Choo

See http://ingeb.org/songs/pardonme.html for the lyrics.

pardon me boy, is that the GNU choo choo ?

can you afford to board the GNU choo choo?

you leave the M****$oft station at quarter to four

when you hear the whistle blowing at eight to the bar, then you know that hurd os cannot be far!

shovel all the code in, got to keep committing

Ohhhhhh GNU Choo choo there you are!

Theres going to be a certain party at the station

RMS is going to cry until I promise to never say open source

GNU Choo choo.... oh there you are..

Tuesday, October 17, 2006

Summit Systems API Wikipedia Node launched

Press Release : call for public participation in documenting the summit systems api.

I call out to all the people who want to know more about the summit systems API to pitch in and help add in new links and web snippets to the article.

here is one part of it :

Summit API Package Names

  • API Toolkit
  • Accounting API
  • Risk API
  • STP API
  • Hedge API
  • Cash Flow
  • Documentation/Document
  • Financial Toolkit
  • Gateway
  • Interface
  • Loader Server/OpenLoader
  • Open DSAPI
  • ValueList
Is this correct? Please update the wiki.

We need to find people who have this precious knowledge to help explain what this whole thing is and how it works.

wikipedia.org/Summit_Systems_API

The wikipedia is a good place because we can combine the terms from finance and computing into the model expressed in the wiki.

The reason for the blog post is to get it into the RSS feeds; Wikipedia content is not indexed that quickly.

mike

Wednesday, October 11, 2006

Need for a spam filter built directly into Firefox, using FaCT++ as a box engine to box in spam.

I would like to ask you to listen to what I think is my new idea :

A new Firefox browser plugin that finds spammers and lets you augment your HTML elements: overwrite the class of spam content to class="spam", and even apply user-defined stylesheets to it, for example to make it smaller or red. Advertising could also be tagged as such, interesting content as well, and JavaScript snippets too.
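
As a rough illustration of the tagging step only (not a browser plugin), here is a minimal Python sketch, assuming BeautifulSoup is available and using a hypothetical blocklist of spammer domains:

from bs4 import BeautifulSoup

# Hypothetical user-maintained blocklist of spammer domains.
SPAM_DOMAINS = {"spammer.example", "ads.example"}

def tag_spam(html: str) -> str:
    # Rewrite the page, marking suspicious links with class="spam".
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a", href=True):
        if any(domain in a["href"] for domain in SPAM_DOMAINS):
            a["class"] = a.get("class", []) + ["spam"]
    # A user stylesheet could then shrink or redden every .spam element.
    return str(soup)

page = '<p><a href="http://spammer.example/win">WIN NOW</a> <a href="http://gnu.org">GNU</a></p>'
print(tag_spam(page))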

The key to making this a semantic web application is the YouTube effect: allowing people to post the best spam rules and earn the most recommendations.
Some people will add the new alias of a Yahoo spammer via a simple XSLT script that generates a Web 2.0 enhanced stylesheet, JavaScript, and a virtual server where the reasoning engine lives, and earn a couple of XP. Others, who create intelligent spam rules that cover whole classes of spam, will rise to the top of the spam-hacker community.

Others will set up servers and rent professional space where these reasoning engines live, providing them with large caches for running efficient lookups. These servers will run the foreground Web 2.0 process for the users, allowing them to filter spam and deliver the web content. It will be sliced into parts and dissected, then served to you in a steady stream of data context pages, each containing logically related data that is cached together. Execution contexts: basically a program that is executed on your computer and that you trust.

I also want to have an SSH server that I connect to, or some other way to prevent my web accounts from being accessed by someone else. For example, if I know that I won't access my webmail except from one computer, then I can add such a rule. It could be changed from an administration interface with more advanced access, if Yahoo agrees to allow me to limit access.

The whole point is that the common man will be willing to pay a small price for a secure, private antispam webhost filter that he could use anywhere. They would pay rent for servers that run Web 2.0 apps for them from anywhere.

Announcing this as just an idea, limiting it to Firefox, and using free/open source software is a strategy that carries no risk and protects community assets.

What is more valuable to mankind than reliable and up to date information?
If we consider free/open source as the best way to create a web of trust and honor among mankind and unite all people, then we must see that it is also creating an incredible capital potential: up to date and reliable information. Each line of source code is a statement about some real or abstract thing that is described.
When source code is published under a free/open source license, it is accessible from around the world and for all time as a free item. It has the potential to solve very many problems and add a positive gain to the economy. Thus the economic impact of free software is great.
It does not have a paying lobby, like the other software giants, which is why most big conservative newspapers don't report on it often or in a positive light. You never see much intelligent analysis of open source by The Economist. Unless it is something that has muscle like Oracle, and is willing to post full-page ads, you don't get much press.

Just look at the Wikipedia. It is the most visited page in Germany, more than YouTube, according to the Frankfurter Allgemeine Zeitung today. That shows you how open ideas are more important than just entertainment. Wikipedia is a platform for people with something to say. This antispam system should be as well.

There are many semantic web ontology engines (cwm from TimBL, written in Python; Pellet, written in Java; FaCT++, written in C++) that could be added to let users process the results of the Bayesian filter themselves as an RDF data stream. Using an open source project also means that you can get hosting at SourceForge for free and run example servers there.

Users would be able to define and share spam ontologies.

Those ontologies could be used to augment the editing of the spam. The existing Bayesian spam filter could be used to view each web page as an email sent from the person whom the antispam software thinks is the originator of the message. We would try to trace each part of a web page to its originator, examine its URL content, and match that against our spam database.
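
A minimal sketch of that attribution step, with a purely hypothetical spam database and scoring, might look like this in Python:

from urllib.parse import urlparse
from bs4 import BeautifulSoup

# Hypothetical spam database: originator domains with a prior spam score.
SPAM_DB = {"spammer.example": 0.95, "ads.example": 0.80}

def attribute_parts(html: str):
    # Treat each linked part of the page like a mail from its originator domain.
    soup = BeautifulSoup(html, "html.parser")
    for element in soup.find_all(["a", "img", "script", "iframe"]):
        url = element.get("href") or element.get("src")
        if not url:
            continue
        originator = urlparse(url).hostname or "unknown"
        score = SPAM_DB.get(originator, 0.0)   # lookup against the spam database
        yield originator, score, element.name

page = '<a href="http://spammer.example/x">buy</a><img src="http://gnu.org/logo.png">'
for originator, score, tag in attribute_parts(page):
    print(f"{tag:<7} from {originator:<20} spam score {score:.2f}")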

So you would allow people to define their own rules in a Web 2.0 environment.

All this information can also be defined in a Web 2.0 environment; imagine a very cool Web 2.0 spam ontology editor app that would allow you to share your spam rules with other people.

Other ontologies would be used to describe the network of interlinked servers and paths the spammers use to hide themselves.

This idea came from my original intent to mention the need for Yahoo antispam software to filter out messages that are dated two years in the future.
I needed to manually filter out messages in my Yahoo mail dated 2008!
I also need to filter out messages in many languages that I don't speak.
In fact, it would be great to build antispam directly into Firefox.


Mike

Monday, September 18, 2006

The Web 2.0 will produce Porn 2.0 but not the Porn:Ontology#FreePorn

reposted from a Submission to http://www.oreillynet.com/xml/blog/2006/06/the_7_flaws_of_the_semantic_we.html

This is a reposting of my earlier thoughts on this thread, which have not been published on oreillynet yet; that is fine, but I would like to get a copy of my post, please. Basically I said that Ajax was sexy and that the semantic web is not viable for selling sex, which is why Web 2.0 will produce Porn 2.0, but not the Porn:Ontology#FreePorn.

Let me restate my point about the advertising without naming the #1 consumer of internet advertising: The semantic web, seen as a pure web of logic, is not viable because it cannot be used for advertising. Otherwise it will be forced to contain opaque data designed to stop logic and appeal to the more primitive forces. Thus you will always have chunks of data that are opaque. If they were only small chunks, they could be filtered out. Therefore the chunks of advertising have to look the same as the rest of the semantic web. But in a closed, secure semantic web of trust there will be no way for such information to be hidden, thus it is excluded.

This is not a problem for Web 2.0. It can be the advertisement and the logical content at the same time. The user can be led to something that they don't even want, and then the search engines will get money for that.

This fuels the industry and that industry is powerful.

see a quote of my previous post here :

The Content Wrangler, Inc. (presumably Scott Abel) writes :

"Nowadays, adult entertainment companies are not just leaders in earning revenue from the Net, they’re also leaders in the technology arena. In many areas, they are the dominate force. The leaders, not the followers. And, they’re doing as much as possible to protect their turf. They file patents to protect their content matching algorithms and online content management and manipulation functionalities. "

Thanks for listening,

Mike

Wednesday, September 13, 2006

Google Blacklisting of my Post on "Why the semantic web cannot work"

I would like to complain about Google blacklisting my post in its search results.
Searching for "No, Google I Don't mean" on Google Blogsearch
does not return my post.

"Why the semantic web cannot work" also nothing.

Searching on Blogger.com returns 3 hits, including one reference to me.

I even made the quote of the day :
08:55
QOTD : pants
The reason why the semantic web cannot work is that it cannot be used to trick people into looking at pay porn sites. - Mike Dupont
from Danny Ayers | Langemarks Cafe

Now, Yahoo does much better!


MSN even finds a related post:

Here http://www.spitting-image.net/archives/2004_05.html

Here Comes the Semantic Web?
Although many skeptics point to the historical failure of Strong Artificial Intelligence and the logical inconsistencies of human consensual reality as reasons why the Semantic Web cannot work, my view is that the Semantic Web is going to be bigger than Google in terms of its ultimate impact on civilization. It will be monstrous. Huge. We cannot even predict what it will be used for...
article w/links
---I lived and worked most of my life with people with cognitive and language disorders. I think I've an idea what interacting with the Semantic Web will be like.
Posted by Cieciel at 02:44 AM

Saturday, August 19, 2006

Configuration tools part 1 - LSC: large scale C++ programming from Lakos

I have started repackaging cdep, adep, and ldep from John Lakos.
I ported them to g++ 4.0.3; there are still some crashes in the cleanup of ldep. I have marked the source code; maybe someone has time to look into this.

prdownloads.sf.net introspector LSC-rpkg-0.1.tgz

Friday, August 04, 2006

Photographs of some of my notes





 Posted by Picasa

Sunday, July 30, 2006

No, Google I Don't mean "Gay Films"!!! Why the semantic web cannot work, the properties free and good need to be porn:OpaqueData and porn:Misleading

Hi all,

I am looking for open source tools that deal with Amazon
and that have book interfaces.

After using Debian's apt-cache search amazon to select a couple of
packages that looked interesting, I then googled for them:
"alexandria cowbell gcfilms".

Although I have moderate search filter turned on,
and personalized google search suggested :
Meinten Sie: alexandria cowbell gay films

I was pretty shocked, and after I turned on strict filtering
it still returned the same thing.
well, I guess google is just catering to its porn clients.

Hopefully this will get slashdotted and google will clean up its act.

Now, on the topic of porn, I would like to reiterate my view of
the semantic web.

The reason why the semantic web cannot work is that
it cannot be used to trick people into looking at pay porn sites.

Let me state a couple of assertions:
  1. the semantic web is a medium
  2. for a medium to be viable, it must be usable to sell porn
  3. all successful media have been used to sell porn
  4. there is no such thing as free porn
  5. there is no free bandwidth
  6. the semantic web is disjoint from Opaque and Misleading Content
  7. the porn industry needs a Medium that can be used to Mislead you into looking and clicking through into their pages
  8. the porn industry needs to create a misleading meaning of free, thus redefining the term free to non-free
  9. the purpose of the semantic web is to eliminate the possibility of creating misleading content
Therefore the global semantic web cannot be used as a viable medium for misleading porn advertising, because it is disjoint with Misleading, a subclass of Content.

Of course the semantic web could be used to create an ontology
of porn and be used in a local semantic web,
and from that web a misleading HTML web could be created.

But the semantic web cannot be used as a misleading medium
for advertising pay porn:
the misleading ads being mixed in with the supposedly free content is what creates
the viability of the medium,
but exactly this mix is what is explicitly excluded from the semantic web.

The result would be a pure porn page that allowed peer-to-peer exchange of
porn based on semantic tags;
that would fulfill one aspect. But as soon as you get the ability to
globally tag all porn,
the issue is that most of the porn is bad. Not only is the attribute
free misleading, but so is the term good.

So in the end, the ISPs and search engines cannot bite the hand of
the low quality porn industry that is feeding them,
and will never fully support the semantic web for the customer.

In fact, this brings me to the conclusion that the customer will
always need to be deceived for advertising to be received,
and that for a medium to be successful it will always need to
contain opaque and misleading data.

I would like to suggest the following:
a namespace porn with the classes porn:OpaqueData and porn:Misleading as subclasses
of porn:FreePorn and porn:GoodPorn.
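
Here is a minimal sketch, in Python with rdflib, of roughly what such an ontology could look like; the exact axioms in the posted ontology may differ, this just illustrates the disjointness idea that Pellet reports below:

from rdflib import Graph, Namespace, RDF, RDFS, OWL

PORN = Namespace("http://introspector.sourceforge.net/2006/07/porn#")
g = Graph()
g.bind("porn", PORN)

# Classes of the proposed namespace.
for cls in ("FreePorn", "GoodPorn", "OpaqueData", "Misleading",
            "Advertising", "Medium", "SemanticWeb", "Content"):
    g.add((PORN[cls], RDF.type, OWL.Class))

# Opaque and misleading data are what "free" and "good" porn actually consist of.
g.add((PORN.OpaqueData, RDFS.subClassOf, PORN.FreePorn))
g.add((PORN.Misleading, RDFS.subClassOf, PORN.GoodPorn))
g.add((PORN.Misleading, RDFS.subClassOf, PORN.Content))

# The semantic web as a medium is disjoint from opaque and misleading advertising.
g.add((PORN.SemanticWeb, OWL.disjointWith, PORN.OpaqueData))
g.add((PORN.SemanticWeb, OWL.disjointWith, PORN.Misleading))
g.add((PORN.SemanticWeb, OWL.disjointWith, PORN.Advertising))

print(g.serialize(format="turtle"))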

You can run the ontology through Pellet and it will prove that the semantic web is not satisfiable.

Here is a nice interactive view of the ontology

basically I state that the semantic web as a medium is disjoint from advertising.

Pellet also says:
B:Disjoint Classes axiom found: DisjointClasses(SemanticWeb Opaque)
Disjoint Classes axiom found: DisjointClasses(SemanticWeb Medium)
Or: unionOf(Misleading Opaque Advertising)

I look forward to some feedback! Please send me your comments.

mike

Monday, June 12, 2006

Project Management and Free Software

I have been reading about project management in this nice book
[http://www.amazon.de/exec/obidos/ASIN/3455094732/028-8254887-6504516] Project Management für Einzelkämpfer.

It describes how to avoid feature bloat and reduce the scope of your project to the most important things.

This is great advice, and I just wanted to cover some of the issues with using free software.

Let's assume for the moment that you have a task to do, and you have decided to use Free (open source) software. You don't really want to spend time working on the software itself if you don't need to, but let's assume that you have the resources in your team to do this.

First of all, just getting the software to work is an exercise in distraction. Configuring, compiling, and testing the software is just one task. But what about selecting the right package from the available ones? Or having to use functions from many incompatible parts?

These tasks are in themselves distractions from the project goal.

Now the real issue is the loss of control. The number of dependencies that a software package has
is not always obvious from the beginning. Just getting the latest version and compiling the software brings many new variables into the equation. How can this be planned and measured?

So, really you get a field full of landmines that have to be defused.

Now look at the number of file formats, and the cost of hooking up the programs to each other.
At last, when you want to publish your results, you will need to produce nice, easy to consume reports with tables.

So, what I propose is a simple introspector framework that collects all the input and output formats of all the software by intercepting the IO calls and the stacks around them. Then we can mark the memory that is the source of the outside data, follow the control graph of the assembly, and mark all the nodes that it travels through. This graph contains test data extracted from profiling the test cases and benchmarks. So we need a real time profiling tool that is capable of memory profiling and of associating the profile paths with the data traces.

This will finally lead to a point where the data is emitted. There we collect the calls to output and note the marked memory, as to where it came from. I want to summarise the metadata with an added integer or long that represents an index into a table of paths.
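
As a toy sketch of the interception idea (in Python rather than at the libc level, and with made-up provenance bookkeeping), one could wrap the IO calls and remember which call site each piece of outside data came from:

import builtins
import traceback

# Table of IO "paths": each distinct call site gets a small integer index.
path_table = []
provenance = {}   # id(data) -> index into path_table

class IntrospectedFile:
    # Proxy around a real file object that marks where read data came from.
    def __init__(self, real):
        self._real = real

    def read(self, *args, **kwargs):
        data = self._real.read(*args, **kwargs)
        site = traceback.extract_stack()[-2]            # the caller of read()
        key = f"{site.filename}:{site.lineno}"
        if key not in path_table:
            path_table.append(key)
        provenance[id(data)] = path_table.index(key)    # mark the memory's origin
        return data

    def __getattr__(self, name):                        # delegate everything else
        return getattr(self._real, name)

_real_open = builtins.open

def introspected_open(*args, **kwargs):
    return IntrospectedFile(_real_open(*args, **kwargs))

# Create a small sample file with the real open(), then install the wrapper.
with _real_open("sample.txt", "w") as f:
    f.write("hello introspector\n")

builtins.open = introspected_open

text = open("sample.txt").read()
idx = provenance[id(text)]
print("data came from IO path", idx, "=", path_table[idx])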


more to come

mike

Friday, April 07, 2006

Human text written in perl mode

Here is something I have been experimenting with: representing my thoughts in Perl syntax.

RDF ->TESTS
generate tests of the rdf model

TESTS -> RDF
extract rdf model out of tests

RDF -> HUMAN
Read the rdf into a human mind
introspect on visual pattern matching
introspection -> INTROSPECTION MENTAL MODEL -> VISUAL MENTAL MODEL
-> UNDERSTAND DATA COLLECTED;

INTROSPECTION MENTAL MODEL ->

HUMAN -> RDF
write rdf
write patterns matched

printf
type string, integer, float, constant string
variable, constant, in string
count 0,1,2,3
sources =>{
"local variables" => "declare the variables in the function body",
"parameters" => "add parameters to the function"
}

TEXT IN PERL MODE
=> gives you useful indentation model
=> represents this document
=> POST TO BLOG => sub {

},
=> {
NAME => PERL,
CONVERT TO TARGET LANGUAGE => {
NAME=> C ,
CONVERT TO TARGET LANGUAGE => {
NAME => asm,
},
method => [compile it, and then check the errors, parse the errors,
look at the types of errors,
extract the variable data in the error message,
fix the problem by inserting the missing data.
repeat
]
}
}
=> sometimes needs a terminating ;

Tuesday, April 04, 2006

Tips and Tricks using the GCC, CPP and Binutils

For the http://www.lug-salem.de/ I am preparing a short presentation showing how to use gcc, cpp, and so on for collecting information.

The version I am using is:
gcc (GCC) 4.0.3 20060304 (prerelease) (Debian 4.0.2-10)

The general idea is to specialize the information more and more, adding in more constants.
By dealing with the output of the preprocessor we can get a concise overview of the source code in one file. By looking at the assembler, we can see all kinds of information that is otherwise hard to find.
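
As a small illustration of that idea (a sketch only, assuming gcc is installed and using the device.c file that appears in the cc1 command lines in the outline below), you could drive the preprocessor and keep the intermediate files from Python:

import pathlib
import subprocess

src = pathlib.Path("device.c")  # example input file, as used in the cc1 command lines below

# Preprocess only (-E): one self-contained view of the whole translation unit.
subprocess.run(["gcc", "-E", str(src), "-o", str(src.with_suffix(".i"))], check=True)

# Stop after assembly (-S), keep every intermediate file (-save-temps), annotate the asm.
subprocess.run(["gcc", "-S", "-save-temps", "-fverbose-asm", str(src)], check=True)

# List what was produced (device.i, device.s, ...).
for produced in sorted(src.parent.glob(src.stem + ".*")):
    print(produced, produced.stat().st_size, "bytes")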

Here is the outline:
  1. preparation
    1. GNU/sourceforge/debian/cpan/google/redhat
    2. documentation
    3. mailing list
    4. unpacking the project
    5. looking through the files available
    6. configuration and debugging: m4, shell, sed, grep, test and friends
    7. aclocal, automake, autoconf
  2. modification of the makefiles
    1. turning on the verbose mode and save-temps in CFLAGS
CFLAGS = --verbose -save-temps
CXXFLAGS
  3. CPP and various options
    1. checking how to run the C preprocessor... gcc -E
    2. macro bodies
    3. macro definitions
    4. non-executed blocks
    5. dependencies
  4. compilation with the gcc: what are the passes
    1. CPP
    2. LEX
    3. PARSE
    4. AST
    5. RTL
    6. BACKEND
  5. what dump options are available
    1. CC1
    2. .i files: /usr/lib/gcc/i486-linux-gnu/4.0.3/cc1 -E -quiet -v -I. -I. -I.. -MD device.d -MF .deps/device.Tpo -MP -MT device.o -MQ device.o -DHAVE_CONFIG_H device.c -mtune=i686 -fworking-directory -O2 -fpch-preprocess -o device.i
    3. .s files: /usr/lib/gcc/i486-linux-gnu/4.0.3/cc1 -fpreprocessed device.i -quiet -dumpbase device.c -mtune=i686 -auxbase-strip device.o -g -O2 -version -o device.s
    4. tree files
    5. RTL
    6. flow graphs
    7. map files
  6. binutils
    1. NM, OBJDUMP, readelf for getting at the results
      1. finding out the sizes of objects
      2. finding names of functions from addresses
      3. demangling names
  7. using and scripting GDB for debugging and data collection
    1. stopping the command immediately with kill -STOP
    2. scripting the gdb
  8. creating and dealing with core dumps
    1. ulimit
    2. debugging without debug information (map files and objdump)
    3. libbacktrace
    4. mapping OBJ files to ASM
  9. Doxygen and co.
  10. GraphViz
  11. profiling: gprof, cachegrind, memory profiles, strace, oprofile


Tuesday, March 21, 2006

introspection as a mental process

Let us look at the human mind as the most expensive processor imaginable.
Its IO is very, very slow and error prone.
It is, however, the best pattern matching server we can afford at this time.
So, the process of pattern matching needs to be augmented.
The introspector will need to collect the data from all types of data sources,
and it will need to do so quickly. Therefore it is important that data samples can be collected and classified. Imaginable is a firm grasp of the gcc toolchain, using its metadata to collect data that way. The metadata is then published for all to use. This would include all byte ranges (Programs(Functions(Blocks(...(Tokens(Chars(Bytes(Bits(Meaning))))))))) of source code, with all the semantic data attached to define the meaning of the source code. Each statement of meaning is a signed declaration from a sender, and only counts once that statement has been evaluated and its contents accepted by a different reader (you, the reader, that is), or even by an indexing system.
Such an indexing system is one of the major goals of mine for the introspector. The idea is simple: given a model that is completely understood, i.e. the source code of the compiler, we can match any data expressed in that language that we find on the internet (via Google et al.) against our model of the language. This will produce a semantic subset of the introspector system: the current set of knowledge that we have about the subject program that we are introspecting.
Thus, a full introspection could be viewed as one mega local/portable/net meta search (over cvs, tgz, google(mail), mbox) that queries each resource in each context and builds pages of data for workers to receive, process, analyse, lay out, present, review, search, graph, and diagram.
All of these applications are available under Linux. If we can introspect them via gcc and gather semantic information about them, then we can parse those pages and align them with the introspector resources presented. Included in the available introspection data will be audited samples of the various output files, traced against the metadata and expanded with metadata in an RDF format. Basically, each bit, byte, or token of data that is of any atomic value is treated as an RDF resource in terms of a gcc data model. This is available from gdb as well.
An introspector interface into the gdb would be of great value.

Listening to Erick Sermon Marvin Gaye - Just Like Music

So the idea is collecting these samples of data that are for the human mind and indexing them via gcc. Each and every relevant resource, or configuration of resources, that is described in a program serves as the source of a query into a gigantic database (Google et al.). The results are used to find metadata about the program. By joining the searches, or the results of them, we look for common pages and relationships between them.

Also, now here is an important point :

The testing of this data, and the statements of predicates about those tests, can be widely automated, but the final driving force is the human mind, and therefore we need to build the best human interface so that the user can drive the introspection process comfortably.

Monday, March 06, 2006

ideas for the introspector

  • Be able to import RDF and annotate database tables with RDF information.
  • Be able to reference a table in the database, a field, etc. by describing the SQL with RDF and mounting it as an RDF data source that is usable in RDF (see the sketch below).
  • Be able to attach an RDF edit control to existing applications.
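
A minimal sketch of the second idea, assuming Python with sqlite3 and rdflib, and purely hypothetical table and column names (not the actual introspector mapping):

import sqlite3
from rdflib import Graph, Literal, Namespace, RDF

DB = Namespace("http://introspector.sourceforge.net/2006/03/db#")  # hypothetical namespace

# A throwaway in-memory database standing in for a real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Acme GmbH')")

g = Graph()
g.bind("db", DB)

# Mount the table as RDF: one resource per row, one triple per column.
for row_id, name in conn.execute("SELECT id, name FROM customer"):
    subject = DB[f"customer/{row_id}"]
    g.add((subject, RDF.type, DB.customer))
    g.add((subject, DB["customer.name"], Literal(name)))

print(g.serialize(format="turtle"))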

Monday, February 20, 2006

updated ontology

I have updated the old introspector ontology
and made it more standard. I will be updating it more.

Raw N3 that will be processed by CWM : introspector.n3
Processed N3: introspecter_gcc_cwm.n3
Process RDF for postprocessing


Object Viewer
DumpOnt

Friday, February 17, 2006

introspector-gcc.0.1

This is the first release of a new gcc introspector implementation; see the blog for docs. It uses a new directory structure as the output, and finally you can use textutils and perl to process the ASTs! I have converters from this directory structure to an HTML page in a tree structure, albeit very simple ones; a sketch of the idea is below.
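
I do not reproduce the exact directory layout here, but as a rough sketch of the kind of converter I mean, here is a small Python script (hypothetical, not the shipped converter) that renders any directory tree, such as the dumped AST directory, as a nested HTML list:

import html
import sys
from pathlib import Path

def tree_to_html(path: Path) -> str:
    # Render a directory tree (e.g. the dumped AST output) as a nested HTML list.
    items = []
    for child in sorted(path.iterdir()):
        label = html.escape(child.name)
        if child.is_dir():
            items.append(f"<li>{label}{tree_to_html(child)}</li>")
        else:
            items.append(f"<li>{label}</li>")
    return "<ul>" + "".join(items) + "</ul>"

root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
print(f"<html><body><h1>{html.escape(str(root))}</h1>{tree_to_html(root)}</body></html>")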

I need to update this and add more information on how to build it.
It will only run on i386 for now. Run make in one of the gcc subdirs; ignore the toplevel makefile.

Download from sf.net :

Downloads from objectweb :

Thursday, July 07, 2005

vcg bary rewrite underway

I have been working on rewriting and decoding the bary routines from vcg.

here is the snapshot : http://introspector.sourceforge.net/2005/07/bary-rewrite-0.1.tgz
56cde10020c7700cbc16f0e7074309fa

unwind introspector

Long time no blog, because I have been offline for months. Now I have a DSL connection and can publish some of my files.
  1. Created a new libintrospector that is part of gcc-4.0.
    1. It is not finished, but a work in progress.
    2. Started with the printf introspection: replaced all the printfs in the gcc with a new printf introspector function. This will use the unwind introspector to create an intelligent stacktrace.
    3. Removed the dependency on raptor and redland.
      1. There is no need for the full redland and raptor functionality in the gcc core for now.
      2. Replaced the implementation with empty stubs.
      3. Will be able to store the RDF data in dwarf2 format and later convert the full dwarf2 data into RDF.
  2. Started on the unwind introspector, a new implementation of libunwind that includes better dwarf2 support.
    1. Extracted the routines from libunwind that are needed only to decode the stack.
    2. Made a simple method for converting the dwarf sections into data sections that are loaded into the image. This simplifies access to the dwarf data and eliminates the need for libelf, moving the dwarf decoding routines into the program.
You can find the first steps here:
03feb06be7c1756d53dd34c5e35a92db http://introspector.sourceforge.net/2005/07/gcc-4.0.0-introspector-0.1.tgz

unwind introspector:
85616411cfa501bf089d0a2744e3c7c0 http://introspector.sourceforge.net/2005/07/unwind-introspector-0.1.tgz

Wednesday, May 25, 2005

gcc 4.0 patch instructions

Dear All,

I have decided to patch the gcc 4.0 and finally produce a clean release of the introspector for popular usage.

The code will be available as a replacement for some files in the gcc-4.0.0 source. I am working on the patches right now, so don't expect it to work yet. Of course you can get the prerequisite packages and test them.

Here are the steps that I needed to do to prepare the introspector:

Install and build the gcc 4.0
  • wget ftp://ftp.cs.tu-berlin.de/pub/gnu/gcc/gcc-4.0.0/gcc-core-4.0.0.tar.bz2
  • mkdir gcc-4.0.0/introspector/
  • cd gcc-4.0.0/introspector
  • ../configure --prefix=/usr/local/introspector --enable-languages=c
    • For now we will only use the c language
  • make
We should have a basic gcc there.

Now we go into the gcc subdir, patch the files from the cvs

  • cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/introspector login
  • cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/introspector co -P gcc-4
Add all those files to gcc-4.0.0/gcc/:
  • cp gcc-4/* gcc-4.0.0/gcc/

Also, we want to get the raptor and redland libs
  • Raptor
    • wget http://download.librdf.org/source/raptor-1.4.5.tar.gz
    • tar -xzf raptor-1.4.5.tar.gz
    • cd raptor-1.4.5
    • ./configure
    • make
    • make install
  • Redland
    • wget http://download.librdf.org/source/redland-1.0.0.tar.gz
    • tar -xzf redland-1.0.0.tar.gz
    • cd redland-1.0.0
    • ./configure
    • make
    • make install
  • Redland Bindings
    • wget http://download.librdf.org/source/redland-bindings-1.0.0.2.tar.gz
    • tar -xzf redland-bindings-1.0.0.2.tar.gz
    • cd redland-bindings-1.0.0.2
    • ./configure
    • make
    • make install

Friday, March 25, 2005

What is readable source?

What can be considered readable source code? What freedom do you have in expressing yourself and calling that source code?

The FSF defines the four basic freedoms of source code

I feel that freedom #1 "to study how the program works, and adapt it" to my needs is a more important basic freedom than freedom #3 for you to "improve the program, and release your improvements to the public, so that the whole community benefits".

Because the "improvement" could be to create derived works that preevent me from reading your improvement, this is what I term as uglified source code.

The GPL allows authors to distribute software that they are the sole author of in uglified form (not in the preferred form for editing) at will, with no punishment. At least the University of Saarland is doing so, with the stated reason of making it hard to read and understand. There is no limit to this, it seems. Or is there?

Your creative expression in uglifying your software, distributing it in the non-preferred form for editing, can prevent me from reading the source code.

This is bad in my opinion, and I would like to prevent that from happening to my software under the GPL, if I cannot prevent the proliferation of uglified code in general.

It also means that you should not be able to use my readable sources to create software that is not readable. This should be preventable by the GPL.

Freedom has its limits and there are some freedoms that are more important to the public interest than others.

The GPL in Section 3 states
"The source code for a work means the preferred form of the work for making modifications to it."
Would it be possible to modify the GNU public license to add definitions for uglified code? It seems to be impossible to prevent uglified code, but it should be possible to lay down some guidelines.

Here are some suggestions for some definitions, and they would not limit the creative expression of an author.

1. This code may not be uglified, except by the copyright holder. Uglified means it is generated by some automatic tool that changes the code that is edited by humans. The results of the uglification process are not the preferred form for editing. The uglification process is done to take away the ability of a user of the software to read and understand the sources. If some tool is used to process the source, then all the inputs to this tool must be included, and the tool must also be included. The uglification process is an automatic process where the original sources are not distributed and the uglifier software is not distributed; the uglified source is a derived object and can be considered to be like a binary file.

2. This code may not be combined with uglified code, unless by the copyright holder of all parts. Users may not create derived works that combine other people's code with uglified code.

Rationale:

Because of problems with the University of Saarland releasing uglified code under the GPL, code that was modified to be unreadable, I want to make sure that the software I write cannot be included in such a tool. The GPL does not prevent people from creating uglified code. But I should be able to prevent someone from combining obfuscated code with my code and creating a derived work.

I see this a conflict between the freedom to expression and the freedom to read and understand.

The original author of the software can distribute the original source in an obfuscated form, converted automatically into something that is really hard to read and modify and not even the preferred form for editing, and there is nothing anyone can do against that.

When it is no longer the preferred form of editable source code, it becomes a difficult issue, because the copyright holder does not sue themselves for violation of the GPL.


Feedback :

Thanks to Alfred M. Szmidt (AMS) for his criticism and advice.

Thanks to S11001001 for pointing out that a new license might not even be needed: I don't even know if a new license is needed, or if the GPL needs to be clarified in this case. But I do still want to tell you my idea; maybe it can be used to create a more watertight definition of the preferred form of source in the GPL.

Thanks MarcusU from DotGNU for Spel Kheking.

References :
Rusty's thoughts on the clauses in the GPL here.
This is also the topic of discussion on the LKML.
The Debian policy makers have voted on the topic of the definition of source code here.
This topic was discussed on debian-legal as well, in great detail.

The GCC supports VCG output, but it is also an issue that it is obfuscated,
as mentioned here:
Look for example at vcg.1.30/src/step1.c for an example of the obfuscator output. This is not source within the meaning of the GPL. A strict view would say that given a GPLed program without full source, we cannot distribute it at all; even with a less strict view that the authors intended this version to be distributed, distributing a program without proper sources from a *.gnu.org site seems dubious.

Loic Dachary mentions that we are not allowed to apply the GPL to VCG at all because it is obfuscated:
I'm having a problem related to the distribution of VCG, as published at http://rw4.cs.uni-sb.de/users/sander/html/gsvcg1.html. Although VCG is published under the GNU GPL, it contains obfuscated source code. As a result, I'm unable to redistribute it because I would violate the GNU GPL that states that the sources are defined as "the preferred form of the work for making modifications to it".

Friday, February 25, 2005

Removed Text from the Introspector Lightning talk at FOSDEM 2005

Here is the material that did not make it into the Original Speech.

Involving the human mind in the process of introspection


One of the major tasks that I see in this process of understanding code is the involvement of the HUMAN MIND, the user.

I think that by feeding information about the software to the visual cortex via the eyes, or by whatever means might be used by disabled persons, the mind's natural pattern matching and model building process will take over. When the mind is then able to pose new questions to the introspector system to gain more information, the viewpoint of the visualization system is focused on the newly selected topic.

The mind will then focus on interesting aspects. The next step is to allow the patterns found to be captured and fed back into the tool. This creates a feedback loop where the meta programming tool is guided actively by the mind exploring the software.

A meta programming tool will then be successful when it allows the programmer to access, directly, naturally, and efficiently, the data collected from both the software and the context of the software.

Operations on the data are needed in the form of structures, lists, trees, graphs, relational databases, vectors in memory, and simple text files. All of these forms of data are needed to allow the programmer to choose the right access method to attack the problem at hand.

Of course GUIs will be of value, and visualization tools that can lay out and filter graphs will be of use. But these tools need to be secondary to the goal of raw access to the data. All of this data needs to be accessed via . I personally think that graph layout algorithms can be applied to data structures to optimize their memory layout.

The conclusion is that the introspector needs to be as slim as possible and as efficient as possible in providing useful information to the programmer. But it also needs to be as open and usable as possible, providing redundant representations of the meta data so that it can be exploited.

The Context of programming

The idea of context is difficult to define in general for meta-programs, because you have a meta-context! The context of a meta-program is related to all the contexts of the object-programs that it operates on.

Because of the idioms and the style of the programmer, the important data about a program can be encoded in a unique and programmer dependent style. This style or character of the code embodies the essence of the coder. Because of the seemingly unlimited expressibility of a programmer, there is no way to dictate how a particular idea will be encoded. Naming and style conventions, coding styles, and documentation contain context specific information that is needed to understand the code.

To make the problem worse, the dreams and visions of the programmer, conversations between programmers over coffee, unwritten assumptions, and cultural background all play a role in the style of the code written.

Programming is Communication

Writing code is a form of formal communication! When you view code as a message, you can open your eyes to the interpersonal and social aspects of code that aid in its understanding.

The act of writing code has at least four aspects:
  1. Communication of instructions to the compiler (and other meta-programs) and finally to the computer for execution. So, in the first step, you write programs for a computer. Communicating the instructions of how the object program is to execute is the real job of a programmer. The programmer communicates with the authors of the meta-programming tool via their agent, the meta-program.
  2. Communication of concepts to one's future self. The second step is to write a program so that you might be able to understand and reuse your mental state at the time of writing; the communication of the concepts to yourself.
  3. Communication of concepts to other programmers and third parties who might use or even further develop your code.
  4. Communication of meta-data back to the programmer in the form of feedback; compiler error messages, for example.
Intercepting Communication is one of the main goals of the introspector

The interception of that communication and its decoding by a third party is the next step when the code is taken out of the context of the original message to the Computer Chip.

The problem is that an outside person will not easily be able to fully understand a captured message exchanged in a closed context with no external reference information.

So we have set the scene for meta programming: people creating tools for their own usage, as messages to themselves and a small user group, and others trying to intercept those messages.

A program is a message. Understanding a program involves decoding that message and recoding it into your context. Usage of contextual information outside of the code itself is often needed to decode the message. The introspector allows you to collect this reference data in a central repository and supports the understanding of the message.

Examples and classes of meta programs

Some examples of what I consider to fall in the class of meta-programs are :
  • compilers, translators and interpreters are programs that process and execute other programs
  • Custom User Defined programs that are written by users to process the software
Programs that affect and control the process of creating the software
  • build tools like Autoconf, Make, Automake, Ant that control the compilation and build process
  • I don't consider tools that are just used in the build process to be meta-programs, even if they can be used to implement meta programs, because they are not dealing with the software directly: Grep, Bash, Sed, and more trivially Tar, Gz and the Linux Kernel. These programs however contain important meta-data related to the program and will need to have interceptors installed to collect that data.
  • Tools that deal with software packages like dpkg, rpm and apt can also be considered meta-programs because they are providers and consumers of meta-data about the software.
  • Linkers, Assemblers
  • optimization routines of the gcc
User Space Run Time Functionality
  • The reflection mechanisms of java and the eval function of perl
  • Dynamic languages such as Lisp, Prolog, Haskell, to some extent Perl, C#, and many other advanced languages that have direct support for meta-programming
Profilers and runtime optimization routines
  • Profilers and Data Collection routines
  • Dynamic Linkers
  • JIT tools and partial specialization routines
  • Process introspection and snapshotting (core dumps included)
  • The GDB debugger
Code Generators
  • Language creation tools such as Yacc, Lex, Antlr and TreeCC
  • program transformation tools like refactoring browser tools, aspect oriented browsers, generic programming tools
Programs that extract information from your code and deliver it to the user
  • code validation and model checking tools such as lint and more advanced model checking tools
  • reverse engineering tools, case tools, program visualization tools
  • intelligent code editors and program browsers that have a limited understanding of the code (emacs falls into this category in the strictest sense)
  • automatic documentation tools like Doxygen
  • Even IDEs can be considered, strictly speaking, meta-programs, or at least containers for them.
  • Of course, I would consider my pet project, the introspector, a meta-program.

Metaprograms are like mushrooms: they sprout out of the dark, damp and dead parts of existing code

The one thing that I have observed is that very many meta-programming projects just spontaneously sprout out of the ground, each with a similar goal: processing programs and making programming easier, i.e. meta-programming. Most such programs are not reusable or reused, and they mostly do not provide any well-defined interface to their meta-data.

In Lisp you have a standard Meta-Object Protocol (MOP), which is well thought out but very Lisp-specific; on the other side, there is a huge amount of meta-data in Lisp that has no standard, well-defined interface into it.

The more context-specific a piece of meta-data or a meta-program is, the more effective it is for the context it was created for; the best example is an assembler or compiler optimized for a specific processor. There is a huge number of research and experimental systems that provide various degrees of freedom to the programmer and user.

For the most part, meta-programming tools can be classified into three classes :

1. So context-specific that they cannot be generally reused and are essentially disposable. They sprout out of some concrete problem and are just like mushrooms that grow on some rotting material. The scope of the coverage of the fungus is limited by the scope of the problems in the object-program.

2. So abstract and complex as to not be easily usable, understandable, or practical. The context is artificial, abstract and mathematical. This is a different form of being context-specific: the context is the mind of the author or his limited slice of research. This is a classic example of a message from the programmer to himself, which I will explain later, lacking any reference to the outside world.

3. The few rare cases are practical tools that find a safe mix between abstraction and context. The C language has a very small set of abstractions, and GCC has been able to define routines that are reusable between various languages. The problem with these practical tools is that they generally lack the advanced meta-programming features that are found in the previous two classes.

Metaprogramming tools normally don't work together, and for the most part they don't work for you

For the average programmer working on an average system, very little is available for their usage. When you sit down to work on a normal programming task, let's say one associated with working on the source of any of the GNU tools, there are basically no standard, integrated and usable meta-programming tools that you can use for all aspects of your work.

There is very little in terms of a standard interface or set of requirements placed on meta-programs in general. This is due to the fact that programming is a form of formal, context-specific communication, as I will explain later.

Metaprogramming tools are disposable

Meta-programs are tools that are for the most part disposable. Their effects result in bugs being found and fixed (in the case of validators), in documentation being produced, or in code being generated. The programs themselves interact with the programmer via configuration files, a GUI, or individual commands. The programmer guides and controls the meta-programming process. So in the end, meta-programming tools are only as good as they are usable by a programmer, and only as good as they are applicable to a given problem.

The set of the meta-data for a given program is very large

The compiler is a meta-program that contains a large amount of data about the software at hand, but there is also a large set of programs that make up the build process. Luckily, for most interesting programs the source for all of these programs is available. So all of the tools that are used to direct the build of the software can be considered meta-programs that affect the final object-software. If we look at all the data contained by all the instances of these meta-programs, we have defined a very large set of meta-data.

All these tools, when considered together, use and process many aspects of the software. So we can say that the total amount of data in memory, at all points during the running of the meta-software, contains a very good picture of the software that is being compiled. Now the question is: how can we get the meta-programs to communicate this data to us?!


Recoding the message into RDF with an explicit context

Now, once a program has been understood, it can be encoded into a context-independent representation, like RDF, with explicit context, relationships and meaning.

RDF means Resource Description Framework. Resources are things of value that are worth identifying and describing. Every single aspect of the software can be represented as a graph. The nodes in the graph are resources or literals. The edges are called predicates; they can represent pointers, containment or basically any binary relationship between nodes. In RDF each type of edge is itself a resource and can be defined in detail.

We can assign a unique resource identifier, in the form of a URI, to each identifier, each variable, each value, and each function call of the software on the static level. By adding in the concepts of a program instance, time and computer, we can also assign resources to dynamic things like values in memory, function stacks and frames.
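
As a rough sketch of this idea (the URI scheme, namespace and helper function below are invented for illustration; they are not the introspector's actual format), a small C routine could mint a URI for each declaration and emit one N-Triples statement per fact known about it :

#include <stdio.h>

/* Hypothetical sketch: mint a URI for a declaration and emit an
   N-Triples statement about it. The namespace and predicate names
   are invented for illustration only. */
static void emit_decl_triple(const char *file, int line,
                             const char *kind, const char *name)
{
    /* one URI per identifier, derived from its location in the source */
    printf("<http://example.org/src/%s#%s_%d> "
           "<http://example.org/meta#kind> \"%s\" .\n",
           file, name, line, kind);
}

int main(void)
{
    /* e.g. for "int count;" declared at main.c line 42 */
    emit_decl_triple("main.c", 42, "variable", "count");
    emit_decl_triple("main.c", 10, "function", "parse_input");
    return 0;
}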

Once this model of the program has started to be built, the communication about the program, in the form of documentation, emails, bug reports, feature requests and specifications, can be decoded, because it will reference symbols in the code, or the code will reference symbols in the communication.

Now, the symbols that occur in the source code could be constant integers, constant strings, identifiers in the code, or even sets and sequences of types without names.

So, the first step to decoding a program would be to index the set of all identifiers. Then the relationship between the identifiers and the concepts needs to be determined; mapping names onto WordNet resources would be a great start. The relationships between the identifiers also need to be discovered.

By transforming the source code into a set of RDF statements that describe it, and also converting the context data into a similar form, a union of the two graphs can be created and relationships between the two can be found.


Application of Meta-Data to the Interceptor Pattern

If the meta-program is changed so that it emits this data in a usable common format, then this data can be put into context and used to piece together a total picture of the context of the software. This is what I call the interception pattern: the message between the programmer and the machine is intercepted and recorded. There needs to be a common API for this interception, and there also need to be tools for automating it. That can be done by using the meta-data collected from the compiler and the build tools in a first pass. By decoding the data structures of the build tools we can semi-automatically create serialization routines. By applying the techniques described here, each program can be trained to communicate its meta-data to the introspector. Each program that is hooked up to this framework increases the knowledge available for the integration task.
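
A minimal sketch of such a serialization hook (the decl_info structure and the output format are invented stand-ins; GCC's real internals look quite different): the routine below is exactly the kind of code that could be generated semi-automatically from a meta-program's own structure definitions :

#include <stdio.h>

/* Invented stand-in for a meta-program's internal record about a declaration. */
struct decl_info {
    const char *name;
    const char *type;
    int line;
};

/* The interception hook: write the record out in a neutral, line-oriented
   form instead of leaving it locked up inside the tool. */
static void intercept_decl(const struct decl_info *d)
{
    printf("decl name=%s type=%s line=%d\n", d->name, d->type, d->line);
}

int main(void)
{
    struct decl_info d = { "count", "int", 42 };
    intercept_decl(&d);  /* the intercepted message, now in a common format */
    return 0;
}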

The idea of the semantic printf function

The next idea would be to replace the printf routines with a general routine to query and extract the data that is available in the context of that printf. Given that we will have access to a list of all the variables available in any given context, and that we will also know every variable that can be directly or indirectly accessed from those variables, it will be possible to invoke user-specified extraction and interception code at the point of the printf. The printf could reference the meta-data at that point, giving each emitted variable a very detailed context.
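
A minimal sketch of the idea, assuming an invented macro name and output format: an ordinary C macro can already attach a variable's name and source location to the value it prints; the full semantic printf would additionally reference the compiler's meta-data for that point :

#include <stdio.h>

/* Sketch of a "semantic printf": besides the value, it records which
   variable was printed and where. The macro name and the output format
   are assumptions for illustration, not an existing API. */
#define SEM_PRINT_INT(var) \
    printf("%s:%d %s = %d\n", __FILE__, __LINE__, #var, (var))

int main(void)
{
    int open_invoices = 7;
    SEM_PRINT_INT(open_invoices);  /* emits file, line, variable name and value */
    return 0;
}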

The data that we need is there, we just need to get at it

As the user of a meta-program, you often feel that you are a second-class citizen. Yes, well, that is the core problem that I am addressing. Most programs are written to solve a problem for some person; the fact that you are using them is secondary. The GCC compiler itself is a good example of a self-serving program. It represents a huge amount of knowledge that is locked up in a representation that is highly inaccessible. The fact is that much of the information that the user of the compiler needs, and has to enter manually, is already available inside the compiler.

Because of the large number of open source tools, and the fact that all the GNU tools are based on a limited core set of tools, all available in source form, they are a perfect target for the collection of meta-data. Not only are all the source histories available, but also the documentation, the mailing lists, and basically all the contextual information. There is a huge amount of publicly available data about the GNU project.

The adding of meta data to C

The history of C and C-like languages can be seen as an evolution of meta-data and meta-programs. Each new addition to the language gives more meta-data to the meta-program, the compiler. Each language breaks with the previous version for some reason, good or bad, and in the end you are forced to rewrite your code to use the new features. The process is just the adding of more meta-data to the existing program and then the interpretation of this richer meta-data by a more advanced meta-program, a better compiler. There is no reason that this meta-information, and the validation of it, cannot be added via other means, with the processing of it decoupled from the monolithic process. Even the addition of meta-data about the persistence and network accessibility of software, as done via DCE IDL and CORBA, can be specified in the same manner on top of the existing software without new syntaxes.
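
As a rough sketch of how such meta-data could ride on top of existing C syntax (the ANNOTATE macro and its key/value strings are an invented convention, not an existing standard), annotations can be stored as ordinary string constants that the compiler ignores but a meta-program can harvest :

/* Sketch: attaching meta-data to existing C code without new syntax.
   The ANNOTATE macro is an invented convention; a meta-program could
   scan for these strings, while the compiler sees only ordinary C.
   GCC's "unused" attribute merely silences warnings here. */
#define ANNOTATE(key, value) \
    static const char annotation_##key[] __attribute__((unused)) = value

ANNOTATE(persistence, "table=invoice;key=id");
ANNOTATE(network, "idl=InvoiceService");

struct invoice {
    int id;
    double amount;
};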

The reading of introspector-augmented meta-data back into the meta-programs

It is reasonable to consider the idea of reading the instances of the data stored in the meta-programs directly out of the introspector. The API that the introspector provides for intercepting the meta-data can then be used to read the updated data back out, or even to read it from another source. In this manner, entire programs could be translated from other languages or generated programmatically. The entire set of intermediate files and file formats can be unified into a common data representation and communication mechanism. This is possible because the programs to be modified are free software and can be modified to provide this interface. The idea of the kernel module would allow this to be done without changing the software.

The monolith and the network

The fact that GCC is linked the way it is, is an organisational, political and sociological decision. It could also be split up into many independent functions. Given a mechanism for intercepting, communicating and introspecting the function frames, any conceivable network of processing can be implemented without using the archaic linking mechanism used by the existing GCC.

The linker and function frame are a data bus that can be intercepted


The linker and the function call frame represent a path of data communication. The compiler produces tight bindings between functions, and the linker copies them into the same executable. Given enough meta-data about a function call, its data can be packed into a neutral format and the function can be implemented in a completely isolated, separate process.
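
A minimal sketch of that idea, with invented function names and a made-up textual wire format: the wrapper packs the call's arguments into a neutral form at the point where the linker would normally bind the call directly, so the real implementation could just as well live in a separate process :

#include <stdio.h>

static double tax_rate_for(int country_code);   /* the original, locally linked function */

/* Interception at the function-frame "data bus": pack the call into a
   neutral textual form before (or instead of) the direct call. The names
   and the wire format are invented for illustration. */
static double tax_rate_for_intercepted(int country_code)
{
    printf("call tax_rate_for country_code=%d\n", country_code);
    /* here the packed call could be shipped over a socket to another
       process instead of falling through to the local implementation */
    return tax_rate_for(country_code);
}

static double tax_rate_for(int country_code)
{
    return country_code == 49 ? 0.19 : 0.0;
}

int main(void)
{
    printf("rate=%.2f\n", tax_rate_for_intercepted(49));
    return 0;
}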

Simplicity and Practicality are the key factors for the success of free software

The great science fiction author Stanislaw Lem writes in his article "Metainformationstheorie" [1] (translated from Polish to German) that the evolution of ideas in computer science is a natural selection function that selects ideas by their commercial success and not by the gain in knowledge. He cites the meme idea of Richard Dawkins, who compared pieces of information to genes: self-replicating individuals competing for resources.

We can treat free software as a meme and analyse its attributes.

For free software, this success is defined in terms of the following factors :
  1. Replication - How often the software is executed (invoked), copied, downloaded; how often its ideas are copied; how often the software is used! We can see that the invocation of a program is the copying of the software into the core of the processor, the moment in which it becomes active. We can measure the success of software as its core share: how often it is copied into the core of the computer, how often it comes alive.
  2. Mutation - How often the software is changed to adapt to its environment. This is a function of how useful the software is and how easily it can be mutated into something more useful. The paradox of free software is that the mutation functions are expensive because of the nature of the protection mechanism: free software needs to protect itself, as a meme, from being mutated into non-free software.
  3. Resources - The amount of work, time and space that is required to use, understand and mutate the software. This is the cost function that is to be minimized. The meme's success, however, is at odds with minimizing this cost, as discussed below.
These factors help explain Richard Gabriel's paradoxical phenomenon of "Worse is Better" [2]
(Being from New Jersey, I naturally identify with the New Jersey "worse is better" attitude.) Simplicity, practicality and interactivity are the most important factors in the success of an idea.

I say that interactivity is important because it is a simple and practical way of reducing the costs of learning and using a piece of software. When people evaluate a piece of software they want to determine, within a very short period, whether these factors are met.

Free software has the paradoxical feature that the source code of successful free software tools is complex, impractical and not interactive. The situation this creates is that the resources that need to be invested in learning the context of a free software project are so high that the programmer becomes bound to that context and identifies with it.

How does the GPL prevent the usage of meta-data ?

This is going to get hairy here; this is a question that I have been thinking about for many years!
The short answer is : there is nothing stopping any program from reading the meta-data of free software.

Reading the meta-data does not create a derived work; copyright covers the copying of derived works. Of course, if the structure of the meta-data is context-specific, then that structure is a derived work of the object-program.

The solution to this entire problem can be stated as follows :

Any meta-data about an object-program that is intercepted from inside a meta-program in a foreign program-context can be translated into a user-context without creating a derived work; only the translation routine is derived from the structure of the foreign context.

Because of the amount of data available about free software, open source and even shared source software, all of them can be translated in this manner.

The conflict between the free software context and open meta-data

The user is interested in practicality, simplicity and interactivity. Free software as a meme is interested in memetic success: replication, mutation and the control of resources. These two are at odds. Free software tries to protect itself by making access to the meta-data impractical, complex and non-interactive. The introspector has the goal of resolving this conflict and making the meta-data accessible to the user.

Conclusion

Source code is in the end just meta-data that flows in a network of meta-programs. The communication between these meta-programs is handled via primitive mechanisms that inhibit the sharing of data.

Via modification of the meta-programs, a man-in-the-middle attack can be implemented to intercept the messages from the programmer to the computer, augment these messages with contextual information and unify them into a global knowledge base. Given a critical mass of meta-data, the messages and data flows of a program can be understood.

This represents an end to the existing concept that using a function creates a derived work, for the very fact that the compiler and linker can semi-automatically create wrappers, interceptors, serializers and introspection code for any source code that is embedded in a critical mass of meta-data.

This represents a shift in power away from the creators of meta-tools towards their users, and it will give more freedom to the users of free software.

[1] Stanislaw Lem : Metainformationstheorie http://www.heise.de/tp/deutsch/kolumnen/lem/5443/1.html
[2] Richard Gabriel : Worse is better http://www.jwz.org/doc/worse-is-better.html

Thursday, February 24, 2005

Lambda:Rule idea

[a lambda:Rule;
lambda:rule :set_homepage;
lambda:args (:nick :uri);
lambda:string "^addturtle [a foaf:Person; foaf:nick :nick; foaf:homepage :uri]."] .


where lambda:rule is the name of the rule, lambda:args is a list of the args and lambda:string is the string in which the args are replaced. That could be used to define the rules in Turtle.

Wednesday, February 23, 2005

Introspector Lightning talk at FOSDEM 2005

Speech for 15 minute short lightning presentation on the introspector on Sunday the 27th of Feb. at the FOSDEM.

Because I have had problems with timing my presentations in the past, I have decided to write a script for my 15 minutes to make sure that we pack in as much information as possible.

After reviewing my material, I have discovered that there is enough material for at least an hour's presentation. I have moved it out to my blog, and you can find it at http://rdfintrospector.blogspot.com/2005/02/removed-text-from-introspector.html

Introduction : 1 minute

Hello all, thanks for showing up today to listen to my presentation. I would like to talk to you about something that I have been obsessed with for years : the true nature of programs that process other programs, what I would like to call "meta-programs".

Because of the time limitations on this speech, I will not be able to take any questions, or be able to go into much detail. The purpose of this presentation is to state once and for all the scope and purpose of the introspector project and call out for support from the free software community.

Let me start with my personal historical motivations and the core questions to be answered by the introspector project, then get right to the core ideas that I would like to imprint upon you, while providing definitions for the terms I will be using.

I will not be able to present supporting details for my theory, or a full history and the current state of the project, because of the lack of time. I have, however, included them in this paper for your review and look forward to discussing them with you after my lightning talk if you are interested.

I think it is more important to understand the scope and setting of the introspector than to understand how it is currently implemented. The point is that introspection is a mental process; it is a way of thinking more than it is a piece of software.

The Original Motivation : RAD 2 minutes (3rd minute)

When I first started learning computer programming as a teenager in the 80s, I was drawn to the ideas of Turbo Prolog, which I played around with but never really could make use of. What I did make use of, however, and became fascinated with, was the DBASE III system, which was widely used at the time.

The thing that made DBASE so attractive was that it is so simple, practical and interactive. As I followed the evolutionary path of these simple database systems from DBASE III, to DBASE IV, to Clipper, to Borland Paradox, and finally to Microsoft Access, I became convinced of the power of simple database solutions.

RAD (Rapid Application Development) was one of the key ideas of the 80s, and I was imprinted with this idea at an early age. The usage of screen painters, simple interactive development environments, program generators and reporting tools were the key ideas of RAD.

Later, when I started to seriously program in C and SQL, I was disappointed with the amount of work and resources that needed to be put into creating the same simple functionality that was available in DBASE III! I longed for a way to iterate over the fields of a record in the simple manner that you could in DBASE. This functionality was key in allowing for the creation of screen painters, report generators and all types of really useful programs. In short, RAD!

I was deeply interested in all types of tools to make this work simpler, and looked into CASE tools and persistence toolkits, and in the end wound up writing my own program generators for C and C++ from scratch that emulated the best parts of what I had with DBASE!

OEW: or why write your own parser ? 1 minute (4th minute)

I worked for Innovative Software back in '94, now called IS-teledata. I was attracted to the now discontinued program OEW, the Object Engineering Workbench, a C++ round-trip computer-aided software engineering (CASE) tool. It could parse your C++ code, allow it to be edited in a simple self-styled diagram (this was before UML, and the fact is that Booch's clouds at the time were just too complex to draw!), and it could finally regenerate the new code right back out, producing documentation and reports. It had a lossy C++ parser; it could not handle all the C++ code you threw at it. My question at the time was, "why don't we just use the GCC compiler's parser?" I then got a copy of the source code of GCC and tried to read it! I was LOST in the complexity of the code. There was no way that I could make sense out of it; I did not even know where to start!

This, however, was the second key motivational idea behind the introspector project. And now, 10 years later, I have started to answer that question.

Part of the answer to that question is a second question: can the GPL prevent the usage of the parser by another program? The short answer is that there is nothing preventing this from happening! What if the parser were to emit all the data that it contains about the program at hand into a readable format? I will try to answer that question in more detail later, after I define my terms.

Why doesn't the Compiler have a public data model and an external representation of its data? 1 minute (fifth minute)

The next question that had to be answered was why there is no model of the compiler's internal data! The OEW tool was also lacking this feature. At the time I wanted an API into the OEW CASE tool, a way to get at the data, so that I could create a RAD-like tool for C++ and have the features of DBASE III!

This was the key problem that prevented me from proceeding on many levels. My answer to this question is presented here : the model of the compiler data is really the same as the model of communication itself; this communication is context-specific, and takes place between the programmer and a software agent working on behalf of the compiler writer.

The attempts to define standards such as MOF (Meta-Object Facility) and XMI (XML Metadata Interchange) show how complex and impractical it is to define the model of the meta-data of software. The semantic web project is the best attempt that I have seen so far at being able to capture and annotate the models of software. That is why I am using RDF and OWL as the basis for the storage of the data of the introspector.

Core Ideas

Here are the core ideas and definitions of the introspector project. If you leave this presentation today with these imprinted in your mind, then I will have been successful :
  1. The introspector is a pattern for the behaviour of the programmer, a process that is applied to your software with the assistance of the introspector tools.
  2. All software programs, source code and binaries are messages from their author to other people and to agents that represent them. In the end the processor, the chip, is an agent that represents the chip producer but acts on behalf of the owner of the chip. The chip is communicated with, to be told what to do, by the programmer. The compiler is an agent that acts as an intermediary between the author of the program and the chip itself. The programmer produces software that acts as an agent that is told what to do by the user and then translates that message, via a network of messages, to the chip while the software is running. (1 minute. 6th minute)
  3. Communication is context-specific. The language that the chip understands is deeply tied to the chip itself. Communication with it is context-specific; it requires an understanding of the current state of the chip and the computer system to be efficient and effective. In addition, it is dependent on the wiring of the chip and the features and functionality provided by it. Communication with the compiler is also context-specific; it provides a simple layer of abstraction above the chip itself, but it is not able to fully distance itself from it. The program itself is also written in one context and executed in another. All of these contexts are different, and communication between parties and agents in separate contexts is inhibited by the accidental complexity that occurs when translating the message between two or more contexts. (1 minute. 7th minute)
  4. Meta-Programs are programs and agents that process these messages from the programmer. These agents communicate with each other, in general via a whole bunch of incompatible file formats and data structures, all very messy.
  5. Meta-Data is the data about software; it is the sum of all the data that is processed by and passes through all of the meta-programs. The source code of a program can be considered meta-data in this framework, but at the lowest level, because it is not structured explicitly. Only after it has been processed and split up by the meta-programs does it contain more information and become more useful. This added information is the meta-data that we are really interested in. (1 minute. 8th minute)
  6. Object-Programs are the real instances of the software that is being executed by the user. The binary code of the object-program is itself meta-data that is emitted by the compiler as a message to the chip. The full trace of all the meta-data associated with this object-program is defined to be all the data that is used to produce the binary code. A full trace of all the meta-data, of all the messages that were used to produce the object-program during the entire build process, is what we are interested in collecting and understanding! (1/2 minute. 9 1/2th minute)
  7. Object-Data is the data that is contained and processed by the Object-Program. This object-data can be partially understood by looking at it and cross-referencing it with the meta-data we collected about the object-program during the build. But we also need to follow the trace of the object-data through the program itself! That means we need to know all the data that flows through the final object-program running on the user's computer and capture that! If we have all the meta-data about the object-program's build, and we know the entire flow of the object-data through that object-program, and have a trace of the execution of the object-program, then we can begin to understand the structure and the source of the object-data! To collect these traces, we have to modify the object-program and teach it to intercept and enrich the object-data with meta-data and to collect the execution traces, or we need to create a better debugger or even a kernel module that can do so. If this proposed introspector kernel module were able to access the full meta-data about the build of the object-program, then it could automatically collect and start decoding these traces! In addition, we will need to capture data about the execution context of the object-program in order to begin to understand data originating outside of the system; by using documented test cases and benchmarks we can give solid descriptions and meanings to the execution context. (2 minutes. 11 1/2th minute)
  8. Reflection is the process of collecting meta-data and processing it by the programmer or user. It is the basis for writing meta-programs. The programmer needs to be able to query and even update the meta-data about the object-program in order to use reflection to its fullest capacity. Programmer- and user-specific code that is executed at compile time would allow the most powerful form of reflection: the ability to add new processing instructions and patterns into the compiler itself. This would require communication from the user context or programmer context back into the context of the compiler. The introspector aims to provide this ability by opening up a communication channel between the users and programmers and the compiler developer! More simply, reflective code that is executed at run time needs to be able to access, and maybe even update, the meta-data of the program, which can be stored in a file or embedded into a shared object. (1 minute. 13 1/2 th minute)
  9. Introspection is the process of a user or programmer evaluating the results of reflection. It is normally motivated by the need to learn about the object-program, by a concrete problem in the object-program, or by the need for a feature. The full set of meta-data, including traces of the object-data, is evaluated in the context of that concrete problem. Ideally the introspection would be started with an input file that describes the exact nature of the results to be gained.
  10. Resolution is the creation of concrete changes to the object-program. It will normally result in a set of meta-data describing the new things to implement.
  11. Execution is the final committing and implementation of these changes to the object-program. This includes the generation of code, the creation of packages, and the communication of meta-data back into the context of the build environment.
  12. Interception is the process of intercepting and capturing the message between two meta-programs or between two functions in a meta-program.
  13. Enrichment is the process of adding in more context data and more meta-data to the existing set of data. This feeds understanding.
  14. Visualization is the process of selecting, focusing, filtering and laying out the meta-data, and feeding the results to the visual cortex of the user for further pattern matching.
  15. Understanding is the human mental process that involves the visualization of the results of the introspection, and the refocusing of that process on arising open questions until the mind has built an internal mental model of the software. Understanding involves translating between contexts and the creation of abstract contexts.
  16. The mind of the programmer and of the user of the final program is what feeds information back into the meta-program, and therefore the interface to the user and programmer must be as good as possible. The current practice of using many different languages and formats for the meta-data creates accidental costs in the communication. The actions of the programmer can be seen as following some process and program; these actions are then codified in the meta-program.
Thank you for listening to my speech,

I hope that I have explained the motivation and the goals of the introspector project to you. If you are interested in hearing more about it, please contact me at mdupont777@yahoo.com, or jabber me at mdupont@nureality.ca, or visit the introspector irc chat at irc.freenode.net:#introspector