GCC Hacker: Tuesday, December 21, 2004

Dear fellow hackers, I would like your advice on this proposed new syntax for cwm and RDF.

There is one thing that has been bothering me about CWM, RDF/XML, N3, Turtle, and N-Triples: the lack of any way to *easily* define and process *nice-looking* structures that are larger than three. When I say *pretty*, of course, this is a relative statement of personal preference.

My goal: find a simple representation that makes complex trees in RDF more pleasing to the eye.

Disclaimer: Maybe this is possible with the current set of tools, maybe not. I have not done all the needed research either; this page collects my limited knowledge about the subject at the moment, and hopefully others can give me pointers in the right direction. Maybe you will find my viewpoint amusing or interesting. RDFPath seems to be going in the right direction; I need to read more about it. Here is an article that takes a shot at RDF and gets blasted. I don't want to have that happen here.

Of course, you can define bags, lists, and alts using RDF. You can also define chains of objects with triples between them. However, none of this has a pleasing syntax!

Let's get back to what these triples are. A triple defines an edge in a graph: the starting point, the path, and the ending point. But here is the real root of the problem: the notation is said to be secondary, and yet the problem lies with the notation and with the idea of triples itself.

RDF Primer: "Sometimes it is not convenient to draw graphs when discussing them, so an alternative way of writing down the statements, called triples, is also used. In the triples notation, each statement in the graph is written as a simple triple of subject, predicate, and object, in that order. [...] Each triple corresponds to a single arc in the graph, complete with the arc's beginning and ending nodes (the subject and object of the statement). [...] However, the triples represent exactly the same information as the drawn graph, and this is a key point: what is fundamental to RDF is the graph model of the statements. The notation used to represent or depict the graph is secondary."

When we question the idea of the triple and try to get out of its limitations, we might end up going in the wrong direction! If you are trapped in a triple, you go to a quad! Then you want a pent (or even a suite)! I don't want to go there.

Why three? I think this number needs some questioning. I am a firm believer in questioning beliefs and assumptions, and I want to take this back to the roots. So I am going to expand on the meaning of three in the human mind.

Three is the lucky number: Mr. Ogbuji writes in "Thinking XML: Introducing N-Triples" that "three is the lucky number." That is a good start, but he does not address why three.

The number 3 has a deep religious history.

According to Wikipedia's Numerology article: "Three relates to expansiveness and learning through life experiences. It is considered to be lucky, and is often associated with money and good fortune. Three generally depicts several people joining together to achieve a common goal, whether through a social or professional affiliation. Although three possesses attributes of wisdom, understanding and knowledge, negatively it can exhibit pessimism, foolhardiness and unnecessary risk taking."

Chunky Gulas?

I say, keep it simple, silly. The mind likes three; it is that simple.

Now, I have heard that the mind is able to remember three chunks of two, but I cannot find a reference for that. It is easy to count to three. The Mnemonic page on Wikipedia also refers to chunking. This article discusses 7 ± 2 on web pages. From EET Templates: "Because STM's capacity is limited to seven items, regardless of the complexity of those items, chunking allows the brain to automatically group certain items together." There is an interesting discussion of chunking in the Natural Language Toolkit. Here is also a nice article, "Chunking and Phrasing and the Design of Human-Computer Dialogues."

The problem:

The concrete problem that I have is quite simple: I want to represent trees more easily in cwm.

Let's say, as an example, that I have a graph of rdf:type relationships.

Given the following Turtle/N3 file:


@prefix : <#> .
:b a :a.
:b2 a :a.
:c a :b.
:c3 a :b2.
:d a :c.
:d2 a :c.
:d3 a :c3.
:d4 a :c3.
:d5 a :c3.

and the following rules file, applied with cwm --filter:

@prefix : <#> .
@prefix log: <http://www.w3.org/2000/10/swap/log#> .

this log:forAll :s,:t,:u,:v.
{:t a :s. :u a :t.} log:implies { :s :t :u.}.

It produces the following *pretty* output:


#Processed by Id: cwm.py,v 1.144 2003/09/14 20:20:20 timbl Exp
#  Notation3 generation by
#       notation3.py,v 1.146 2003/09/14 20:20:24 timbl Exp

   @prefix : <#> .
   @prefix log: <http://www.w3.org/2000/10/swap/log#> .

   :a     :b :c;
          :b2 :c3 .

   :b     :c :d,
             :d2 .

   :b2    :c3 :d3,
              :d4,
              :d5 .

#ENDS

Of course, this is no longer the RDF that you know, because :b and :b2 are not edges as you know them; they are edges implied by the structure of the graph.
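The filter rule is essentially a self-join of the rdf:type pairs on the shared middle node :t. As a minimal sketch (plain Python, not cwm code and not how cwm evaluates rules internally), the same result can be reproduced for this one rule on this data; `TYPE_PAIRS` and `apply_rule` are hypothetical names, and nodes are kept as plain prefixed-name strings.

```python
from collections import defaultdict

# "instance a class" pairs from the example Turtle file above.
TYPE_PAIRS = [
    (":b", ":a"), (":b2", ":a"),
    (":c", ":b"), (":c3", ":b2"),
    (":d", ":c"), (":d2", ":c"),
    (":d3", ":c3"), (":d4", ":c3"), (":d5", ":c3"),
]


def apply_rule(type_pairs):
    """Mimic {:t a :s. :u a :t.} log:implies {:s :t :u.} by joining the
    rdf:type pairs on the shared node :t."""
    instances_of = defaultdict(list)  # t -> every u with "u a t"
    for u, t in type_pairs:
        instances_of[t].append(u)
    derived = []
    for t, s in type_pairs:  # every "t a s"
        for u in instances_of.get(t, []):
            derived.append((s, t, u))  # the new statement ":s :t :u"
    return derived


if __name__ == "__main__":
    for s, t, u in apply_rule(TYPE_PAIRS):
        print(s, t, u, ".")
```

It derives the same seven statements that cwm prints above, such as `:a :b :c .` and `:b2 :c3 :d5 .`.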

I would like to say that this graph has an implied tree structure, where downward movement in the tree is the inverse of the rdf:type relationship.

This is great for simple inheritance hierarchies. But what I would really like is this:


  :a      :b     :c :d,
                    :d2 ;
          :b2     :c3 :d3,
                      :d4,
                      :d5 .

You may ask: what does the ";" mean, then? Well, it means: close off this triple in the tree, going back three steps. So ":d2 ;" means close off {:b :c :d2.}

Now, let's look at a more complex example:

   :a     :b :c;
          :b2 :c3 .

   :b     :c :d,
             :d2 .

   :b2    :c3 :d3,
              :d4,
              :d5 .

   :c     :d :e3,
             :e4,
             :e5 .

How would that look as a tree?


   :a    :b     :c    :d :e3,
                         :e4,
                         :e5;
                      :d2 ;
         :b2    :c3 :d3,
                    :d4,
                    :d5 .

I think that this would be very easy to implement as a parser and a serializer.
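To get a feel for the serializer half, here is a minimal Python sketch. It is not cwm code and uses none of its APIs. It assumes the tree direction is the inverse of rdf:type, that siblings keep their input order, and one possible reading of the layout in the examples: a node's first child shares its line, later children start new lines aligned under the first child, and a leaf ends with "," when a sibling follows, " ;" when it closes a branch, and " ." at the very end of a tree. The author's "going back three steps" rule is not modeled separately. The names `TYPE_PAIRS`, `build_tree` and `serialize` are hypothetical.

```python
from collections import defaultdict

# (instance, class) pairs, i.e. "instance a class". The :e3/:e4/:e5 pairs
# are taken from the larger example (":c :d :e3, :e4, :e5").
TYPE_PAIRS = [
    (":b", ":a"), (":b2", ":a"),
    (":c", ":b"), (":c3", ":b2"),
    (":d", ":c"), (":d2", ":c"),
    (":d3", ":c3"), (":d4", ":c3"), (":d5", ":c3"),
    (":e3", ":d"), (":e4", ":d"), (":e5", ":d"),
]


def build_tree(type_pairs):
    """Invert rdf:type so that each class points down to its instances."""
    children = defaultdict(list)
    instances = set()
    for inst, cls in type_pairs:
        children[cls].append(inst)
        instances.add(inst)
    # Roots are classes that are never an instance of anything.
    roots = [cls for cls in children if cls not in instances]
    return children, roots


def serialize(children, roots):
    """Lay out each tree: the first child continues the parent's line,
    later children start new lines aligned under the first child."""
    lines = []

    def place(text, col, new_line):
        if new_line:
            lines.append(" " * col + text)
        else:
            lines[-1] += " " + text  # lands exactly at column `col`

    def visit(node, col, new_line, end):
        kids = children.get(node, [])
        # Only leaves carry punctuation; inner nodes are followed by a child.
        place(node if kids else node + end, col, new_line)
        child_col = col + len(node) + 1
        for i, kid in enumerate(kids):
            if i < len(kids) - 1:
                kid_end = ","    # more siblings follow
            elif end == " .":
                kid_end = " ."   # last leaf of the whole tree
            else:
                kid_end = " ;"   # close off this branch
            visit(kid, child_col, i > 0, kid_end)

    for root in roots:
        visit(root, 0, True, " .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(serialize(*build_tree(TYPE_PAIRS)))
```

Run as a script, it prints the larger tree in the same shape as the hand-drawn one (only the spacing before ";" differs):

    :a :b :c :d :e3,
                :e4,
                :e5 ;
             :d2 ;
       :b2 :c3 :d3,
               :d4,
               :d5 .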

The key would be to give cwm a hint about the direction of the tree. By declaring the document to be a tree, and naming the predicate that gives the direction, it should be possible both to parse and to generate such a tree document.
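For the parsing direction, a minimal Python sketch can read such a tree back into triples, assuming that column alignment (not the "," / ";" / "." punctuation) decides which node a token hangs under, and that nodes are prefixed names without dots. The `direction` dictionary stands in for the proposed hint about the direction of the tree; its keys, the plain string "rdf:type", and the names `TREE` and `parse_tree` are hypothetical and are not the terms of the proposed tree.owl.

```python
import re

# Hypothetical direction hint: the predicate the tree edges stand for, and
# whether going down the tree follows it backwards (true for rdf:type here).
DIRECTION = {"predicate": "rdf:type", "inverse": True}

# A node is any run of characters other than whitespace and , ; .
TOKEN = re.compile(r"[^\s,;.]+")

# The larger example tree; columns are spelled out so the alignment is exact.
TREE = "\n".join([
    ":a :b :c :d :e3,",
    " " * 12 + ":e4,",
    " " * 12 + ":e5 ;",
    " " * 9 + ":d2 ;",
    " " * 3 + ":b2 :c3 :d3,",
    " " * 11 + ":d4,",
    " " * 11 + ":d5 .",
])


def parse_tree(text, direction=DIRECTION):
    """Turn an aligned tree back into triples, using columns to find parents."""
    triples = []
    stack = []  # (column, node) along the path from the root to the last node
    for line in text.splitlines():
        for match in TOKEN.finditer(line):
            col, node = match.start(), match.group()
            # Nodes starting at or right of this column cannot be its ancestors.
            while stack and stack[-1][0] >= col:
                stack.pop()
            if stack:
                parent = stack[-1][1]
                if direction["inverse"]:
                    triples.append((node, direction["predicate"], parent))
                else:
                    triples.append((parent, direction["predicate"], node))
            stack.append((col, node))
    return triples


if __name__ == "__main__":
    for s, p, o in parse_tree(TREE):
        print(s, p, o, ".")
```

Because it relies only on columns, this reading breaks as soon as the alignment drifts (for example, when the tree is shown in a proportional font). A real parser would probably also need the punctuation.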

I would propose a new set of terms to describe how to parse this new syntax:
tree.owl: the OWL ontology file that defines a tree class and its direction
tree.n3: the N3 equivalent
tree-build.n3: a cwm program to build a tree
tree-test.n3: the test data

I look forward to your comments.

mike


Abusing CWM and n3 to grow bushy trees
