Towards a Rigorous Logic for Spatial Data Representation

logo NCGeo

Towards a Rigorous Logic for Spatial Data Representation

lettergrootte lettergrootte verkleinen Lettergrootte verkleinen
Print
E-mailadres

PoG 65, Rodney James Thompson, Towards a Rigorous Logic for Spatial Data Representation

Rodney James Thompson

Publications on Geodesy 65
Delft, 2007. 336 pagina's. ISBN: 978 90 6132 303 7.

Abstract

The storage and retrieval of spatial data in computer systems has matured greatly over recent years, from the earliest approaches (of simple digitised linework and text) to the representation of features and their attributes, with the semantics of their behaviour associated. This has led to massive cost savings where data, which might have been captured for a specific purpose, can be shared and reused for other purposes.

In this first generation of Geographic Information Systems (GIS), the data is stored locally, with each vendor using different nomenclature and definitions of spatial objects and having very different rules for what is accepted as ''valid''. As a result a scientist using a desktop GIS may need to expend a considerable portion of his/her research effort and funds in translating, cleaning and preparing pre-existing data to convert to the form required for the study.

For some years now, there has been a trend towards spatial data being housed within a database management system, these being considered as a corporate resource, leading to the realisation that the geographic data itself is in fact an infrastructure, in the same way as is, for example, a telephone network. This moves the ownership of the data from the desktop, firstly to the corporation, and ultimately to being a shared resource between public and private organisations - a Geographic Information Infrastructure (GII).

An inhibiting factor in these trends is the lack of standardisation alluded to above. Where every data sharing operation involves manual intervention, it is difficult, if not impossible to create a GII. Thus a strong and consistent set of standards is needed, with the most basic requirement being for consistency in the geometric concepts used. While progress is being made by groups such as the International Standards Organisation Technical Committee 211 (ISO TC211) and the Open Geospatial Consortium (OGC), there is still much to be done.

The success of these standardisation efforts has been compromised by the requirement to be vendor neutral - i.e. to avoid specifying an internal representation to be used for storage. For example, the standards will remain silent on whether coordinate values should be stored in floating point or integer format.

As a result, definitions of spatial objects are expressed in mathematical terms assuming an infinite precision real number system, with the details of how this is to be translated into the computational representation being left to the implementer. Therefore there is no agreed normative meaning of the ''equals'' predicate when applied to geometric objects, and definitions of validity are in general left to the implementers.

If the standardisation effort is to allow spatial data to be interchanged without expensive manual intervention, a well defined logic is needed to underpin the standards and support the definition of validity of that data. This would also ensure that inferences drawn from the digital model remain consistent and do not lead to logical fallacies.

The language of spatial databases is couched in the language of mathematics, with operations being given names such as ''union'' and ''intersection'' and using vector-like representations. This naturally leads to the impression that the representations form a topological and/or vector space. Unfortunately this is not the case. Generally speaking, the rigorous mathematics used in the definition of spatial objects ends outside the database representation, which is only an approximation of the theoretical formalism used to define it.

This thesis documents a number of cases that illustrate the potential breakdown of logic to be found in current technology, for example, cases where the union or intersection operations lead to inconsistent results. Various alternative approaches that have been investigated in search of solutions are discussed, and their advantages and disadvantages indicated.

This current research has been motivated by an attempt to apply the mathematical approach to the actual representation of spatial features within the computer system. In this rigorous approach, the assumptions (or ''axioms'') are clearly identified, and used to develop a chain of argument, leading to a proof of the required proposition. The advantage of this approach in the field of spatial data representation is that, if the computer hardware can be verified to obey the axioms, then the correct results of the algorithms are assured.

In order to facilitate such a chain of proof, a form of representation known as the regular polytope has been defined, based on a small set of axioms and definitions, and shown to possess a consistent and complete logic. That is to say, the computational representation itself expresses the algebraic formalism, rather than being an approximation to an idealized mathematical model.

Thus this representation is capable of providing a potential storage structure for a useful class of features, but this should not be seen as the sole object of the research. Rather the regular polytope should be seen as an exemplar for any approach to spatial data representation and storage.

The fact that this particular representation can be axiomatically defined and implemented demonstrates that such an approach is feasible, and opens the possibility that all computational representations can be similarly analysed. The regular polytope is a particularly tractable construct for this type of analysis, which is the reason for choosing it. By contrast the kind of structure embedded in many current systems is far more complex. In particular, floating point numbers add a significant level of complexity, and only the most basic topological behaviour has been proved where floating point operations are assumed.

Based on integer and domain restricted rational arithmetic, it is shown that the logic of topology, the Boolean connection algebra and the region connection calculus can be expressed directly by the database implementation. Thus a database built on this structure cannot suffer from the kinds of breakdown of logic discussed above. In addition, this raises the prospect of a definition of validity and robustness of representation that is not vendor specific.

A regular polytope representation of spatial objects is defined as the union of a finite set of (possibly overlapping) "convex regular polytopes", which are in turn defined as the intersection of a finite set of half spaces. These half spaces are defined by finite precision number representations. The term ''Regular Polytope'' here does not carry its conventional meaning as the generalisation of a regular polyhedron (one having equal sides, faces and angles etc.). In the form used here, it combines the topological term ''regular'' with the conventional geometric meaning of ''polyhedron''.

The actual definition is given in axiomatic form, structured so as to form a ''boundary free'' representation, valid in any number of dimensions. Although it is explored here principally in 3D, particular reference is made to the mixture of 2D and 3D found in many current application areas such as cadastral property boundaries. Particular attention is paid to the issue of connectivity, both within and between regular polytopes, and the resultant logic is developed in terms of well studied concepts such as the region connection calculus.

The particular representation chosen for the half space is such that adjoining regular polytopes will have no points in common, and no points will exist between them. Thus it is possible to define a complete partition of space where every point that can be represented computationally is defined to exist in one and only one region. In the traditional representations of 2D polygons and 3D polyhedrons, points play a very important role of carrying the metric information. This is in contrast to regular polytopes where points do not play a role in the definition at all. Instead the metric is specified via the half planes using 3 or 4 integers (in 2D and 3D respectively).

This theoretic basis is then applied to actual database schema design, and several alternative models proposed and analysed. As a check on the practicality of the algorithms, ''proof of concept'' classes have been developed in the Java programming language, and tested on a significant set of cadastral parcels (2D and 3D) from the Queensland cadastre.

Finally, further areas of research are identified, including extensions of the approach to wider problem domains.

Introduction 13
Case Studies 31
Related Work and Theory 47
The Regular Polytope Representation 85
Connectivity in the Regular Polytope Representation 113
Algebras of Connectivity 139
The Data Model 167
Implementation Issues 199
Review of Case Studies 221
Conclusions 229

Bibliography 239
Appendices I - VII 249
Summary 325
Nederlandse Samenvatting 329

Samenvatting

De opslag en bevraging van ruimtelijke gegevens in computersystemen heeft de laatste jaren flinke vooruitgang geboekt, vanaf het allereerste begin (digitaliseren van het lijnenwerk en de tekst) tot aan de huidige representatie van objecten en attributen, aangevuld met de semantiek. Dit heeft geleid tot een enorme kostenbesparing daar waar gegevens, die voor een bepaald doel worden ingewonnen en bijgehouden, nu ook gedeeld en hergebruikt kunnen worden voor andere toepassingen.

In de eerste generatie GIS had elke leverancier zijn eigen begrippenkader met bijbehorende definities van ruimtelijke objecten waarbij zeer verschillende regels werden toegepast om te bepalen wanneer een ruimtelijk object als geldig werd gezien. Momenteel kan een wetenschapper die een GIS gebruikt gedwongen zijn een aanzienlijk gedeelte van zijn/haar onderzoeksinspanning en -fondsen te besteden aan het vertalen, opschonen en voorbereiden van reeds bestaande gegevens om deze in de vorm te krijgen die voor het onderzoek nodig is.

Gedurende enige tijd is er een ontwikkeling richting het beheer van ruimtelijke gegevens in een database management systeem, dat als collectief bedrijfsmiddel wordt beschouwd. De volgende fase in deze ontwikkeling is het besef dat geografische gegevens zelf onderdeel van de infrastructuur zijn, vergelijkbaar met bijvoorbeeld een telefoonnetwerk. Hiermee verhuist het beheer van de gegevens van de lokale computer, via de bedrijfsorganisatie, uiteindelijk naar de infrastructuuromgeving als een gedeeld hulpmiddel tussen de openbare diensten en private organisaties - een Geografische Informatie Infrastructuur (GII).

Een belemmerende factor in deze ontwikkelingen is het hierboven geïmpliceerde gebrek aan eenduidigheid (standaarden). Wanneer bij de dagelijkse gegevensuitwisseling steeds handwerk nodig is, zal het moeilijk - zo niet onmogelijk - zijn om een GII te realiseren. Daarom is een degelijke en consistente verzameling standaarden nodig. De meest basale vereiste is standaardisatie van de gebruikte geometrische concepten. Hoewel er al de nodige vooruitgang is geboekt door groepen zoals Technisch Comité 211 van de Internationale Standaardisatie Organisatie (ISO TC211) en het Open Geospatial Consortium (OGC), moet er op dit gebied nog steeds veel gedaan worden.

Het succes van de standaardisatieactiviteiten wordt beperkt door de eis van een zuivere leveranciersneutrale aanpak - zoals het voorkomen om betrokken te raken bij de kwestie hoe ruimtelijke gegevens worden omgezet naar een interne representatie geschikt voor opslag. Zo zwijgen bijvoorbeeld de standaarden of de coördinaatwaarden in drijvendekomma-of gehele-getallenformaat zouden moeten worden opgeslagen. Dientengevolge worden de definities uitgedrukt in wiskundige termen, de oneindige nauwkeurigheid veronderstellend van reële getallen. De details ten aanzien van hoe dit dan in de drijvendekomma-of gehele getallen van de computersystemen moet worden vertaald worden aan de uitvoeder/programmeur overgelaten. Zo is er geen gestandaardiseerde interpretatie van het predikaat ''is gelijk'' wanneer toegepast op geometrische objecten. Verder worden de definities van valide objecten in het algemeen bepaald door de ontwikkelaars van implementaties.

Als standaardisatieactiviteiten moeten leiden tot een situatie waarbij ruimtelijke gegevens zonder handmatig ingrijpen kunnen worden uitgewisseld, dan is er een strenge logica nodig om de standaarden te onderbouwen en om de definitie van valide objecten te specificeren. Dit zou verzekeren dat de gevolgtrekkingen die op basis van een digitale representatie worden getrokken consistent zijn en niet tot logische fouten leiden.

De taal van de ruimtelijke databases is ingebed in de taal van de wiskunde met operatienamen zoals ''vereniging'' en ''doorsnede’’ en het gebruik van vectorachtige representaties. Dit leidt natuurlijk tot de indruk dat de representaties een topologische ruimte vormen (en/of een vectorruimte). Wat helaas niet het geval is. Over het algemeen, is de streng-wiskundige definitie van ruimtelijke objecten niet geldig buiten de databaserepresentatie, omdat het slechts een benadering is van het theoretische formalisme dat gebruikt is bij de definitie.

Dit proefschrift beschrijft een aantal gevallen van falende logica binnen de huidige technologie, zoals bijvoorbeeld situaties waarbij de operatoren ''vereniging'' en ''doorsnede'' tot inconsistente resultaten leiden. Diverse alternatieve benaderingen die zijn onderzocht worden beschreven, waarbij van elk de voor- en nadelen worden aangeduid.

Het huidige onderzoek wordt gekenmerkt door een poging om een wiskundige basis te kiezen voor de feitelijke representatie van ruimtelijke objecten in een computer. In deze rigide aanpak zijn de aannamen (axioma’s) duidelijk gedefinieerd en gebruikt om een keten van argumenten samen te stellen, die tot een bewijs leiden van de gewenste eigenschap van voorspelbaar en correct gedrag. Het voordeel van deze aanpak voor de representatie van ruimtelijke objecten is dat, indien de computerhardware aantoonbaar aan de axioma’s voldoet, de correcte werking van de algoritmen gegarandeerd kan worden.

Om een dergelijke bewijsketen te faciliteren is een representatievorm, bekend onder de naam ''regulier polytoop'' gedefinieerd (op basis van een aantal axioma’s en definities) en onderzocht. Hierbij is aangetoond dat deze een consistente en compleet gedefinieerde logica bezit. Dat betekent dat de opslagstructuur in de computer zelf dit algebraïsche formalisme heeft, in plaats van een benadering te zijn van een geïdealiseerd wiskundig model.

Het regulier polytoop is een representatie die als opslagstructuur voor ruimtelijke objecten kan dienen. Dit moet niet gezien worden als het enige onderzoeksobject, maar eerder als een exemplarisch voorbeeld voor rigide methoden om ruimtelijke gegevens te representeren en op te slaan.

Dat deze specifieke representatie rigide gedefinieerd en geïmplementeerd kan worden demonstreert dat een dergelijke rigiditeit mogelijk is en opent de mogelijkheid dat andere computerrepresentaties soortgelijk geanalyseerd kunnen worden. Het regulier polytoop is een bijzonder handelbaar concept voor dit type van analyse, vandaar de keuze voor dit concept. Dit in tegenstelling tot de structuren in de hedendaagse systemen, die veel complexer zijn op dit vlak. In het bijzonder leiden drijvende-kommagetallen tot een aanzienlijk hoger niveau van complexiteit en alleen de meest basale topologische eigenschappen kunnen bewezen worden wanneer drijvende-kommaoperaties worden gebruikt.

Gebaseerd op gehele of domein-beperkte rationele-getallenrekenkunde wordt aangetoond dat de strenge logica van topologie, de ''Boolean connection algebra'' en de ''region connection calculus'' gerealiseerd kunnen worden in de database-implementatie zelf. Aldus lijdt een op deze structuur gebaseerde database niet aan de eerder besproken falende logica. Bovendien komt er hierbij zicht op een definitie van geldige (valide) objecten en robuuste representaties die niet leveranciersspecifiek zijn.

Een regulier polytoop representatie van ruimtelijke objecten is gedefinieerd als de vereniging van een eindige verzameling van (mogelijk overlappende) convexe reguliere polytopen, welke op hun beurt weer gedefinieerd zijn als de doorsnede van een eindige verzameling van halfruimten. Deze halfruimten zijn gedefinieerd door getalrepresentaties met eindige nauwkeurigheid. De term regulier polytoop zoals hier gebruikt is een combinatie van het topologische begrip ''regulier'' (verzameling die gelijk is aan het binnenste van zijn afsluiting) met de gangbare geometrische betekenis van ''polytoop'' (de n-dimensionale veralgemenisering van een polyhedron).

De feitelijke definitie is gegeven in axiomatische vorm, zodanig gestructureerd dat het een representatie zonder expliciete grenzen vormt, welke geldig is in elke dimensie. Hoewel het hier vooral in 3D gebruikt wordt, wordt er ook een specifieke vermelding gemaakt van het gecombineerde gebruik in 2D en 3D. Dit heeft dan vele toepassingsgebieden, zoals bijvoorbeeld kadastrale eaigendomspercelen. Bijzondere aandacht is besteed aan het onderwerp ''verbondenheid'', zowel binnen een regulier polytoop als tussen meerdere reguliere polytopen. De bijbehorende logica is ontwikkeld in termen van goede eerder bestudeerde concepten, zoals de ''region connection calculus''.

De specifiek gebruikte representatie van halfruimten is zodanig dat naburige reguliere polytopen geen enkel punt gemeenschappelijk hebben en dat er ook geen enkel punt tussen hen invalt. Het is dus mogelijk om een complete partitie (opdeling) van de ruimte te maken, zodanig dat elk computationeel representeerbaar punt precies in één gebied valt. Bijzonder is ook dat het regulier polytoop geen punten gebruikt om de metrische informatie te representeren zoals dit wel gebeurt in de meer traditionele weergaven van polygonen (in 2D) en polyhedra (in 3D). Bij reguliere polytopen wordt deze metrische informatie gespecificeerd via de halfruimten door drie of vier (in respectievelijk 2D en 3D) gehele getallen.

Deze theoretische basis wordt vervolgens toegepast op een daadwerkelijk databaseschemaontwerp, waarbij verschillende alternatieven onderzocht worden. Als controle voor de praktische haalbaarheid van het concept is een verzameling Java-klassen ontwikkeld en getest met een flink aantal kadastrale percelen (2D en 3D) van het kadaster van Queensland.

Tot slot worden de verdere onderzoeksgebieden geïdentificeerd, met inbegrip van uitbreidingen van de gepresenteerde aanpak naar andere probleemdomeinen.

terug naar boven

Towards a Rigorous Logic for Spatial Data Representation

Rodney James Thompson

Abstract

Contents

Samenvatting