\documentclass
[fleqn,runningheads%
%if dvipdfm
,dvipdfm
%endif
]{llncs}
% Babel destroys the runningheads:
%\usepackage[english]{babel}
\usepackage{xspace}
\usepackage{url}
\usepackage{afterpage}
%include lhs2TeX.fmt
%include lhs2TeX.sty
%include polycode.fmt
%include forall.fmt
%include spacing.fmt
%include indices.fmt
%options ghci
\usepackage{stmaryrd}
\usepackage{amssymb}
\usepackage{doubleequals}
\title{``Scrap Your Boilerplate'' Reloaded}
\titlerunning{``Scrap Your Boilerplate'' Reloaded}
\author{Ralf Hinze\inst{1} \and Andres L\"{o}h\inst{1} \and Bruno C.~d.~S.~Oliveira\inst{2}}
\authorrunning{R.~Hinze, A.~L\"{o}h, and B.~Oliveira}
\institute{Institut f\"{u}r Informatik III, Universit\"{a}t Bonn%
\\R\"{o}merstra\ss e 164, 53117 Bonn, Germany%
\\\email{$\{$ralf,loeh$\}$\symbol{64}informatik.uni-bonn.de}
%\\\texttt{http://www.informatik.uni-bonn.de/\homedir ralf/}
\and
Oxford University Computing Laboratory%
\\Wolfson Building, Parks Road, Oxford OX1 3QD, UK%
\\\email{bruno\symbol{64}comlab.ox.ac.uk}%
%\\\texttt{http://www.cs.uu.nl/\homedir$\{$ralf,johanj,andres$\}$/}%
}
%if techreport
\subtitle{Extended version -- \today}
%endif
%if False
> {-# OPTIONS -fglasgow-exts -fallow-undecidable-instances -fallow-overlapping-instances #-}
The latter two options above are only required for the class-based implementation.
> module SYB0 where
> import qualified Prelude
> import Prelude hiding (sum, show)
> import Control.Monad (msum)
> import Data.Maybe (fromJust)
> import Data.Typeable -- only required for the class-based implementation
> dots = undefined
> pair = (,)
%endif
%format ... = "\dots"
%format dots = "\dots"
%format Dots = ...
%format space = " "
%format TDots1 = ... "\gobble{"
%format TDots2 = TDots1
%format TDots3 = TDots1
%format TDots4 = TDots1
%format tdotsend = "}"
%format == = "\doubleequals"
%format pair = "(\mathbin{,})"
\newcommand{\gobble}[1]{}
\newcommand{\fp}{\Xenspace.}
\newcommand{\fc}{\Xenspace,}
\let\Xenspace=\enspace
\let\originalinlinehs=\inlinehs
\def\inlinehs{\let\Xenspace\empty\originalinlinehs}
\newcommand{\technical}[1]{\textsl{#1}}% sl is ugly
\newcommand{\SYB}{SYB\xspace}
\newcommand{\GP}{GP\xspace}
\newcommand{\GADT}{GADT\xspace}
\newcommand{\GADTs}{GADTs\xspace}
\newcommand{\GHC}{GHC\xspace}
\begin{document}
\maketitle
\begin{abstract}
The paper ``Scrap your boilerplate'' (\SYB) introduces a combinator
library for generic programming that offers generic traversals and
queries. Classically, support for generic programming consists of two
essential ingredients: a way to write (type-)overloaded functions, and
independently, a way to access the structure of data types. \SYB seems
to lack the second. As a consequence, it is difficult to compare with
other approaches such as PolyP or Generic Haskell. In this paper we
reveal the structural view that \SYB builds upon. This allows us to
define the combinators as generic functions in the classical sense. We
explain the \SYB approach in this changed setting from ground up, and
use the understanding gained to relate it to other generic
programming approaches. Furthermore, we show that the \SYB view is
applicable to a very large class of data types, including generalized
algebraic data types.
\end{abstract}
\section{Introduction}
The paper ``Scrap your boilerplate'' (\SYB)~\cite{laemmel03scrap}
introduces a combinator library for generic programming that offers
generic traversals and queries. Classically, support for generic
programming consists of two essential ingredients: a way to write
(type-)overloaded functions, and independently, a way to access the
structure of data types. \SYB seems to lacks the second, because it is
entirely based on combinators.
In this paper, we make the following contributions:
\begin{itemize}
\item We explain the \SYB approach from ground up using an explicit
representation of data types, the \technical{spine view}. Many of
the \SYB library functions are more easily defined in the spine view
than using the combinators underlying the original \SYB library.
\item We compare the expressive power and applicability of the spine
view to the original \SYB paper, to PolyP~\cite{polyp}
and to Generic Haskell~\cite{loeh04exploring,ghcoral}.
\item Furthermore, we show that the \SYB view is applicable to a very
large class of data types, including generalized algebraic data
types (\GADTs)~\cite{gadt,pj04wobbly}.
\end{itemize}
We use Haskell~\cite{Haskell} for all our examples. The source code of
this paper~\cite{syb0tr} constitutes a Haskell program that can be compiled by
\GHC~\cite{ghcuser} in order to test and experiment with our implementation.
While our implementation is not directly usable as a separate library, because
it is not extensible (new data types cannot be added in a compositional
way), this deficiency is not tied to the idea of the |Spine| view:
%if not techreport
the technical report version of this paper~\cite{syb0tr} contains a slightly
less elegant implementation that is extensible.
%else
We show in Section~\ref{classbased} how to integrate the |Spine| view
with the third \SYB paper, thereby providing an extensible, albeit
slightly less elegant implementation.
%endif
In this introduction, we explain the ingredients of a system for
generic programming, and argue that the original \SYB presentation
does not clearly qualify as such a system.
In order to better understand the concept of generic programming,
let us first look at plain functional programming.
\subsection{Functional programming and views}
As functional programmers in a statically typed language, we are used
to define functions by case analysis on a data type. In fact, it is
standard practice to define a function on a data type by performing
case analysis on the input. The shape of the data type guides our
function definitions, and affects how easy it is to define certain
functions.
As an example, assume we want to implement a priority queue supporting
among others the operation
%format PQ = PriorityQueue
%format PQ' = PQ
%format insert' = insert
%format splitMinimum' = splitMinimum
> splitMinimum :: PQ -> Maybe (Int,PQ)
to separate the minimum from the remaining queue if the queue is not
empty.
%if False
> splitMinimum = undefined
> insert = undefined
%endif
We can choose a heap-structured tree to implement the priority queue,
and define
> data Tree a = Empty | Node (Tree a) a (Tree a)
> type PQ = Tree Int {-"\fp"-}
The choice of a heap as the underlying data stucture
makes the implementation of
|splitMinimum| slightly tricky, requiring an
auxiliary operation to merge two heaps.
If, on the other hand, we choose a sorted list
to represent the priority queue
> data PQ' = Void | Min Int PQ' {-"\fc"-}
we make our life much easier, because |splitMinimum|
is now trivial to define. The price we pay is that
the implementation on lists is likely
to be less efficient than the one using the tree.
Such different \technical{views} on a data structure
need not be mutually exclusive. Wadler and others have
proposed language support for
views~\cite{wadler87views,haskell13views}.
Many functions on a single data type follow common
traversal and recursion patterns. Instead of defining
each function by case analysis, it is possible to define
combinators that capture these patterns.
For instance, given functions
> foldTree :: r -> (r -> a -> r -> r) -> Tree a -> r
> mapTree :: (a -> b) -> Tree a -> Tree b {-"\fc"-}
%if False
> foldTree empty node Empty = empty
> foldTree empty node (Node l x r) =
> node (foldTree empty node l) x (foldTree empty node r)
> mapTree f Empty = Empty
> mapTree f (Node l x r) = Node (mapTree f l) (f x) (mapTree f r) {-"\fc"-}
%endif
we can write functions to perform an inorder traversal
of the tree or to increase every label in a tree by one
very concisely:
> inorder = foldTree [] (\l x r -> l ++ [x] ++ r)
> incTree = mapTree (+1) {-"\fp"-}
\subsection{Generic programming}
A \technical{generic function} is a function that is defined
once, but works for many data types. It can adapt itself to the
structure of data types. Generic functions are also called
\technical{polytypic} or \technical{structurally polymorphic}.
Genericity is different from \technical{parametric polymorphism},
where the same code works for multiple types, and the structure
of a data type is not available for analysis. It is also more specific
than \technical{ad-hoc polymorphism}, which allows a function
to be defined for different data types, by providing one implementation
for each type.
Typical examples of generic functions are equality or comparison,
parsing and unparsing, serialization,
traversals over large data structures and many others.
Support for generic programming consists of two essential ingredients.
Firstly, support for ad-hoc polymorphism is required. This allows the
programmer to write \technical{overloaded functions}, i.e., functions
that dispatch on a type argument.
Secondly, we need a \technical{generic view} on the structure of data
types. In a nominal type system, types with similar structure are
considered to be completely distinct. To employ generic programming,
we need to lower this barrier and make the structure transparent if
desired.
The two ingredients are orthogonal, and for both, there is a choice.
Overloaded functions
can be expressed in Haskell using the class system, using a type-safe
cast operation, by reflecting the type system on the value level,
or by a combination of the above.
Any of these approaches has certain advantages and disadvantages,
but they are mostly interchangeable and do not dramatically affect
the expressivity of the generic programming system.
The structural view, on the other hand, dictates the flavour
of the whole system: it affects the set of data types we can represent in the
view, the class of functions we can write using case analysis on the
structure, and potentially the efficiency of these functions. The
structural view is used to make an overloaded function truly generic,
working for a data type even if it has no ad-hoc case for that type.
For instance, PolyP views data types as fixed points of regular functors.
Therefore its approach is limited to regular data types, but the view
allows access to the points of recursion and allows the definition of
recursion combinators such as catamorphisms.
Generic Haskell uses a sum-of-products view which is more widely
applicable, but limits the class of functions we can write.
The concept of generic views is explained further in a recent
paper~\cite{holdermans2005views},
and is related to \technical{universes}
in dependently-typed programming~\cite{benke2003universes}.
In summary, it turns out that there is a close analogy between
plain functional and generic programming: the concepts of views,
function definition by case analysis, and combinators occur in
both settings.
\subsection{Scrap your boilerplate}
In analogy with the situation on plain functions, not all
generic functions are defined by case analysis. Just as there
are powerful combinators for ordinary functions, such combinators
also exist for generic programming. In fact,
the very combinators we have used above, |foldTree| and |mapTree|,
are typical candidates for generalization.
The paper ``Scrap your boilerplate'' (\SYB) describes a library for
\technical{strategic programming}~\cite{visser2000language}, i.e., it
offers combinators for generic traversals and queries on terms. Two
central combinators of the \SYB library are |everywhere| to traverse a
data structure and modify it in certain places, and |everything| to
traverse a data structure and collect information in the process.
The \SYB approach builds completely on combinators, and some
fundamental combinators are assumed
to be provided by the implementation. While this is fine in practice,
it makes it difficult to compare \SYB with other
approaches such as PolyP or Generic Haskell. The reason is that
the concept of a generic view seems to be missing. Functions are
never defined by case analysis on the structure of types.
However, the generic view is only hidden
in the original presentation. In this paper we reveal the
structure that \SYB uses behind the scenes and that allows
us to define the \SYB combinators as generic functions by
case analysis on that structure.
We will explain the \SYB approach in this changed setting from ground
up. The focus of the presentation is on conceptual conciseness.
We do not strive to replace the original implementation, but to
complement it by an alternative implementation which may be easier
to understand and relate to other approaches.
\subsection{Organization of this paper}
The rest of this paper is organized as follows:
We first describe
the two orthogonal ingredients required for generic programming
in our presentation of the \SYB approach: overloaded functions
(Section~\ref{functionsontypes}) and the \technical{spine view},
the structure that is the hidden foundation of \SYB
(Section~\ref{spineview}).
We then review the central combinators
of \SYB in Section~\ref{mapqmapt}. Section~\ref{genericshow}
shows how we can access names of constructors.
In Section~\ref{comparison}, we take a step back
and relate \SYB to other generic programming approaches.
Inspired by our analysis
on the expressiveness of the \SYB approach, we demonstrate how
to extend the spine view to generalized algebraic data types
(Section~\ref{ssyb}).
%if techreport
In Section~\ref{classbased}, we take a more detailed look at
the way that extensible generic functions are encoded in the
third of the \SYB papers. We explain how the spine view can
be used with a class-based encoding of overloaded functions.
%endif
Section~\ref{conclusions} discusses related work and concludes.
\section{Overloaded functions}%
\label{functionsontypes}
% Alternative section titles:
% Type-indexed functions
% Generic functions
% Type case
The standard way in Haskell to express an overloaded function is to
use a type class. In fact, this is the way taken by the original \SYB
papers: in the first \SYB paper, type classes are used in conjunction
with a type-safe cast operation, and in the third paper, overloaded
functions are expressed solely based on type classes. However, type
classes leave it to the compiler to find the correct instance, and
thus hide a non-trivial aspect of the program.
In this paper, we prefer to be more explicit
and emphasize the idea that an overloaded function dispatches on a
type argument. Haskell excels at embedded languages, so it seems a
good idea to try to embed the type language in Haskell. The following
way to encode overloaded functions is not new: it is based on Hinze's
``Fun of Programming'' chapter~\cite{hinze03fun} and has been used
widely elsewhere~\cite{oliveira05typecase}.
The whole point of static types is that they can
be used at compile time to distinguish programs, hence we certainly
do not want to use an unparameterized data type |Type| to represent types.
Instead, we add a parameter so that |Type t| comprises only
type representations for the type |t|. We now need ways to construct values
of type |Type t|. For instance, |Int| can be a representation
of the type |Int|, so that we have |Int :: Type Int|.
Similarly, if we have a representation |r| of type |a|, we can
make |List r| a representation of type |[a]|, or formally
|List :: Type a -> Type [a]|.
The notation we use suggests that |Int| and |List| are data
constructors of type |Type|, but this impossible in Haskell~98,
because the result type of a constructor must always be unrestricted,
i.e., |Type a| for some type variable~|a|. Fortunately, \GHC now
supports \technical{generalized algebraic data types}
(\GADTs)~\cite{gadt,pj04wobbly}, which lift exactly this
restriction. Therefore, we can indeed define |Type| in Haskell using
the following \GADT:
> data Type :: * -> * {-" "-} where
> Int :: Type Int
> Char :: Type Char
> List :: Type a -> Type [a]
> Pair :: Type a -> Type b -> Type (a,b)
> Tree :: Type a -> Type (Tree a) {-"\fp"-}
%if False
> Bool :: Type Bool
> Maybe :: Type a -> Type (Maybe a)
> Rose :: (forall a. Type a -> Type (f a)) -> Type a -> Type (Rose f a)
> Type :: Type a -> Type (Type a)
> Dynamic :: Type Dynamic
> Dots :: Type a -- dummy
For the refactoring example:
> HsExp :: Type HsExp
> HsPat :: Type HsPat
> HsName :: Type HsName
> HsLiteral :: Type HsLiteral
> HsAlt :: Type HsAlt
> HsGuardedAlt :: Type HsGuardedAlt
%endif
This type allows us to represent integers, characters,
lists, pairs, and trees -- enough to give an example of
a simple overloaded function that sums up all
integers in a value:
%format sum0 = sum
> sum0 :: Type a -> a -> Int
> sum0 Int n = n
> sum0 Char _ = 0
> sum0 (List a) xs = foldr (+) 0 (map (sum0 a) xs)
> sum0 (Pair a b) (x,y) = sum0 a x + sum0 b y
> sum0 (Tree a) t = sum0 (List a) (inorder t) {-"\fp"-}
The function |sum0| works on all types that can be
constructed from |Int|, |Char|, |[]|, |pair|, and |Tree|, for instance,
on a complex type such as |[(Char,Int)]|: the expression
\begingroup
\invisiblecomments
%{
%format = = " "
%format v0 = " "
> v0=sum0 (List (Pair Char Int)) [('k',6),('s',9),(' ',27)]
evaluates to \eval{v0}.
%}
\endgroup
The function |sum0| is an example of an ad-hoc-polymorphic
function. There are a limited number of cases for different
types, defining potentially unrelated behavior of |sum| for
these types. The function will not work on types such as |Bool| or |Maybe|
or even on a type
> newtype MyPair a b = MyPair (a,b) {-"\fc"-}
because Haskell has a nominal type system, hence |MyPair a b|
is isomorphic to, yet distinct from |(a,b)|.
%if False
Let us emphasize again that the encoding of overloaded functions
is orthogonal to the concept of a generic view, therefore the
remainder of this paper is not intricately tied to the choice
that overloaded functions are represented using an explicit argument
of type |Type|.
%endif
\section{The spine view}%
\label{spineview}
%format :$ = "\mathbin{\raisebox{.15ex}{$\scriptstyle\lozenge$}}"
%format :$: = :$
%format $$ = "\mathbin{\raisebox{.15ex}{$\scriptstyle\blacklozenge$}}"
%format Spine0 = Spine
%format Constr0 = Constr
In this section, we learn how to define a truly generic |sum|,
which works on |Bool| and |Maybe| and |MyType|, among others.
Take a look at any Haskell value. If it is not of some abstract
type, it can always be written as a data constructor applied
to other values. For example, |Node Empty 2 Empty| is the |Node|
data constructor, applied to the three values |Empty|, |2|,
and |Empty|. Even built-in types such as |Int| or |Char|
are not fundamentally different: every literal can be seen
as a nullary constructor.
Let us make the structure of constructed values visible and mark each
constructor using |Constr|, and each function application using |:$|.
The example from above becomes
%{
%format = = " "
%format v1 = " "
%format v2 = " "
%format v3 = " "
%if False
> v1=Constr0 1
> v2=Constr0 (:) :$: 1 :$: [2,3]
%endif
\begingroup
%if not techreport
\inlinehs
%endif
> v3=Constr0 Node :$: Empty :$: 2 :$: Empty {-"\fp"-}
\endgroup
%}
The functions |Constr| and |(:$)|%
\footnote{We use |(:$:)| as a symbol for an infix data constructor.
For our presentation, we ignore the Haskell rule that names
of infix data constructors must start with a colon.}
are themselves constructors
of a new data type |Spine0|:%
\footnote{Note that in contrast to |Type|, the data type |Spine0|
is not necessarily a \emph{generalized} algebraic data type.
The result types of the constructors are not restricted,
|Spine| could therefore be
defined in \GHC as a normal data type with existentials. However,
we prefer the \GADT syntax.}
> data Spine0 :: * -> * {-" "-} where
> Constr0 :: a -> Spine0 a
> (:$:) :: Spine0 (a -> b) -> a -> Spine0 b {-"\fp"-}
%format fromSpine0 = fromSpine
%format toSpine0 = toSpine
Given a value of type |Spine a|, we can recover the original
value of type |a| by undoing the conversion step made
before:
> fromSpine0 :: Spine0 a -> a
> fromSpine0 (Constr0 c) = c
> fromSpine0 (f :$: a) = (fromSpine0 f) a {-"\fp"-}
The function |fromSpine0|
is parametrically polymorphic, i.e., it works independently
of the type in question:
it just replaces |Constr| with the
original constructor and |(:$)| with function application.
Unfortunately, |fromSpine0| is the only interesting function
we can write on a |Spine0|. Reconsider the type of the
|(:$:)| constructor:
< (:$:) :: Spine0 (a -> b) -> a -> Spine0 b {-"\fp"-}
The type |a| is not visible in the final result (it is
existentially quantified in the data type), so the only thing
we can do with the component of type |a| is to combine it
somehow with the component of type |Spine0 (a -> b)|.
%format ::: = "\mathrel{:}"
Since we intend to call overloaded functions on the value
of type |a|, we require a representation
of the type of |a|. Our solution is thus that together with
the value of type |a|, we store a representation of its type.
To this end, we introduce a data type for
typed values\footnote{We use |(:::)| as data constructor
for the type |Typed| in this paper. The \technical{cons}-operator
for lists, written |(:)| in Haskell,
does not occur in this paper.}
> data Typed a = a ::: Type a {-"\fc"-}
and then adapt |(:$:)| to use |Typed a| instead of |a|:
> data Spine :: * -> * {-" "-} where
> Constr :: a -> Spine a
> (:$) :: Spine (a -> b) -> Typed a -> Spine b {-"\fp"-}
%if False
[Alternatively, we can use |Apply| instead of |(:$)| and
add a third argument.]
%endif
Of course, we have to adapt |fromSpine| to ignore
the new type annotations:
> fromSpine :: Spine a -> a
> fromSpine (Constr c) = c
> fromSpine (f :$ (a ::: _)) = (fromSpine f) a {-"\fp"-}
We can define a right inverse to |fromSpine|, as
an overloaded function. For each data type,
the definition follows a trivial pattern. Here
are the cases for the |Int| and |Tree| types:
> toSpine :: Type a -> a -> Spine a
> toSpine Int n = Constr n
> toSpine (Tree a) Empty = Constr Empty
> toSpine (Tree a) (Node l x r) = Constr Node
> :$ (l ::: Tree a) :$ (x ::: a) :$ (r ::: Tree a) {-"\fp"-}
%if False
> toSpine (List a) x = fromList a x
> toSpine Char x = fromChar x
> toSpine Bool x = fromBool x
For the refactoring example:
> toSpine HsExp x = fromHsExp x
> toSpine HsPat x = fromHsPat x
> toSpine HsName x = fromHsName x
> toSpine HsLiteral x = fromHsLiteral x
> toSpine HsAlt x = fromHsAlt x
> toSpine HsGuardedAlt x = fromHsGuardedAlt x
Some of the auxiliary functions:
> fromList :: Type a -> [a] -> Spine [a]
> fromList _ [] = Constr []
> fromList a (x : xs) = Constr (:) :$ (x ::: a) :$ (xs ::: List a)
> fromChar :: Char -> Spine Char
> fromChar c = Constr c
> fromBool :: Bool -> Spine Bool
> fromBool b = Constr b
Old, separated (but too long) variants of |fromInt| and |fromTree|:
> fromInt :: Int -> Spine Int
> fromInt n = Constr n
> fromTree :: Type a -> Tree a -> Spine (Tree a)
> fromTree _ Empty = Constr Empty
> fromTree a (Node l x r) = Constr Node
> :$ (l ::: Tree a) :$ (x ::: a) :$ (r ::: Tree a)
%endif
With all the machinery in place, we can now write the truly
generic |sum|:
%, given in Figure~\ref{genericsum}.
%\begin{figure}[b]
> sum :: Type a -> a -> Int
> sum Int n = n
> sum t x = sum' (toSpine t x)
>
> sum' :: Spine a -> Int
> sum' (Constr c) = 0
> sum' (f :$ (x ::: t)) = sum' f + sum t x {-"\fp"-}
%\caption{Generic |sum|}%
%\label{genericsum}
%\end{figure}
This function requires only a single type-specific case,
namely the one for |Int|. The reason is that we want to
do something specific for integers, which does not follow
the general pattern, whereas the formerly explicit
behavior for the types |Char|, |[]|, |pair|, and |Tree|
is now completely subsumed by the function |sum'|.
Note also that in the last line of |sum'|,
the type information |t| for |x| is indispensable, as we call
the generic function |sum| recursively.
Why are we in a better situation than before? If we
encounter a new data type such as |Maybe|, we still have
to extend the representation type
%if techreport
%format Type4 = Type
%format toSpine4 = toSpine
%format Maybe4 = Maybe
> data Type4 :: * -> * {-" "-} where
> TDots4 :: Type4 tdotsend
> Maybe4 :: Type4 a -> Type4 (Maybe a)
and adapt the |toSpine| function
> fromMaybe :: Type a -> Maybe a -> Spine (Maybe a)
> fromMaybe _ Nothing = Constr Nothing
> fromMaybe a (Just x) = Constr Just :$ (x ::: a)
>
> toSpine4 :: Type a -> a -> Spine a
> toSpine4 Dots = dots
> toSpine4 (Maybe a) = fromMaybe a {-"\fp"-}
%else
with a constructor
|Maybe :: Type a -> Type (Maybe a)| and provide a case
for the constructor |Maybe| in the |toSpine| function.
%endif
However, this has to be done only once per data type, and
it is so simple that it could easily be done automatically.
The code for the generic functions (of which there can be
many) is completely unaffected by the addition of a new
data type.
\section{Generic queries and traversals}%
\label{mapqmapt}
In this section, we implement the two central \SYB combinators
|everything| and |everywhere| that are used to construct generic queries
and traversals.
\subsection{Generic queries}
%format everything0 = everything
%format everything0' = everything0 "''"
A query is an overloaded function that returns a result of a specific type:
> type Query r = forall a. Type a -> a -> r {-"\fp"-}
We have already
seen an example of a query, namely the |sum| function
from
%Figure~\ref{genericsum}.
Section~\ref{spineview}.
There are many more
applications of queries: computation of the size of
a structure, collection of names, collection of free
variables, building a finite map, finding a specific
element etc.
If we look back at the generic |sum| function,
we see that it performs
several tasks at once, leading to a relatively
complex definition: integers are preserved, while
in general, constructors are replaced by |0|; the
subresults are added; finally, a recursive traversal
is performed over the entire data structure.
In the following, we describe how to separate
these different activities into different functions,
and at the same time abstract from the specific
problem of summing up values.
If we already have a query, we can define a derived
query that applies the original query to all immediate
children of a given constructor:
> mapQ :: Query r -> Query [r]
> mapQ q t = mapQ' q . toSpine t
>
> mapQ' :: Query r -> (forall a. Spine a -> [r])
> mapQ' q (Constr c) = []
> mapQ' q (f :$ (x ::: t)) = mapQ' q f ++ [q t x] {-"\fp"-}
The results of the original query |q| are collected in
a list. The combinator |mapQ| does not traverse the input
data structure. The traversal is the job of |everything'|,
which is defined in terms of |mapQ|:
> everything' :: Query r -> Query [r]
> everything' q t x = [q t x] ++ concat (mapQ (everything' q) t x) {-"\fp"-}
Here, we apply the given query |q| to the entire argument |x|,
and then recurse for the immediate children. The \SYB version
of |everything| fuses |everything'| with an application of
|foldl1|, using a binary operator to combine all the elements
of the nonempty list returned by |everything'|:
> everything :: (r -> r -> r) -> Query r -> Query r
> everything op q t x = foldl1 op ([q t x] ++ mapQ (everything op q) t x) {-"\fp"-}
In order to express the query |sum| in terms of |everything|,
we need a simple query |sumQ| expressing that we want to count
integers:
%format sum1 = sum
> sumQ :: Query Int
> sumQ Int n = n
> sumQ t x = 0
>
> sum1 :: Query Int
> sum1 = everything (+) sumQ {-"\fp"-}
\subsection{Generic traversals}%
\label{mapt}%
While a query computes an answer of a fixed type from an
input, a traversal is an overloaded function that
preserves the type of its input:
> type Traversal = forall a. Type a -> a -> a {-"\fp"-}
The counterpart of |mapQ| is |mapT|. It applies a given
traversal to the immediate children of a constructor,
then rebuilds a value of the same constructor from the
results:
> mapT :: Traversal -> Traversal
> mapT h t = fromSpine . mapT' h . toSpine t
>
> mapT' :: Traversal -> (forall a. Spine a -> Spine a)
> mapT' h (Constr c) = Constr c
> mapT' h (f :$ (x ::: t)) = mapT' h f :$ (h t x ::: t) {-"\fp"-}
The function |mapT| not only consumes
a value of the type argument, but also produces one.
Therefore we call not only |toSpine| on the input value,
but also |fromSpine| before returning the result. The
calls to |fromSpine| and |toSpine| are determined by the
type of the generic function that is defined. The general
principle is described elsewhere~\cite[Chapter 11]{loeh04exploring}.
Using |mapT|, we can build bottom-up or
top-down variants of |everywhere|, which
apply the given traversal recursively:
%format everywhere_td = everywhere "_{\textsc{td}}"
%format everywhere_bu = everywhere "_{\textsc{bu}}"
> everywhere_bu :: Traversal -> Traversal
> everywhere_bu f t = f t . mapT (everywhere_bu f) t
>
> everywhere_td :: Traversal -> Traversal
> everywhere_td f t = mapT (everywhere_td f) t . f t {-"\fp"-}
There are many applications of traversals, such as renaming
variables in an abstract syntax tree, annotating a structure
with additional information, optimizing or simplifying a
structure etc. Here is a simplified example of a transformation
performed by the Haskell refactorer HaRe~\cite{li2003refactor}, which
rewrites a Haskell |if| construct into an equivalent |case|
expression according to the rule
%if False
Haskell abstract syntax (from Language.Haskell, simplified):
> data HsExp =
> HsVar HsName -- different from the original
> | HsCon HsName -- different from the original
> | HsLit HsLiteral
> | HsApp HsExp HsExp
> | HsLambda {- SrcLoc -} [HsPat] HsExp
> -- | HsLet [HsDecl] HsExp
> | HsIf HsExp HsExp HsExp
> | HsCase HsExp [HsAlt]
> | HsTuple [HsExp]
> | HsParen HsExp
> | HsWildCard deriving (Eq,Show)
> fromHsExp (HsVar name) = Constr HsVar :$ (name ::: HsName)
> fromHsExp (HsLit lit) = Constr HsLit :$ (lit ::: HsLiteral)
> fromHsExp (HsApp e1 e2) = Constr HsApp :$ (e1 ::: HsExp)
> :$ (e2 ::: HsExp)
> fromHsExp (HsLambda ps e) = Constr HsLambda :$ (ps ::: List HsPat)
> :$ (e ::: HsExp)
> fromHsExp (HsIf e1 e2 e3) = Constr HsIf :$ (e1 ::: HsExp)
> :$ (e2 ::: HsExp)
> :$ (e3 ::: HsExp)
> fromHsExp (HsCase e l) = Constr HsCase :$ (e ::: HsExp)
> :$ (l ::: List HsAlt)
> data HsPat =
> HsPVar HsName
> | HsPLit HsLiteral
> | HsPTuple [HsPat]
> | HsPParen HsPat
> | HsPWildCard deriving (Eq,Show)
>
> fromHsPat (HsPVar name) = Constr HsPVar :$ (name ::: HsName)
> fromHsPat (HsPLit lit) = Constr HsPLit :$ (lit ::: HsLiteral)
> fromHsPat (HsPWildCard) = Constr HsPWildCard
>
> data HsName = HsIdent String deriving (Eq,Show)
>
> fromHsName (HsIdent name) = Constr HsIdent :$ (name ::: List Char)
>
> data HsLiteral =
> HsChar Char
> | HsInt Int
> | HsBool Bool deriving (Eq,Show)
>
> fromHsLiteral (HsChar c) = Constr HsChar :$ (c ::: Char)
> fromHsLiteral (HsInt x) = Constr HsInt :$ (x ::: Int)
> fromHsLiteral (HsBool x) = Constr HsBool :$ (x ::: Bool)
> data HsAlt = HsAltC HsPat HsExp deriving (Eq,Show)
> fromHsAlt (HsAltC pat exp) = Constr HsAltC :$ (pat ::: HsPat)
> :$ (exp ::: HsExp)
> data HsGuardedAlt = HsGuardedAltC HsExp HsExp deriving (Eq,Show)
>
> fromHsGuardedAlt (HsGuardedAltC e1 e2) = Constr HsGuardedAltC
> :$ (e1 ::: HsExp)
> :$ (e2 ::: HsExp)
%endif
%format ==> = "\quad\leadsto\quad"
%format HsAltC = HsAlt
< if e then e1 else e2 ==> case e of True -> e1; False -> e2 {-"\fp"-}
We assume a suitable abstract syntax for Haskell.
The rewrite rule is captured by the traversal
> ifToCaseT :: Traversal
> ifToCaseT HsExp (HsIf e e1 e2) =
> HsCase e [ HsAltC (HsPLit (HsBool True)) e1,
> HsAltC (HsPLit (HsBool False)) e2]
> ifToCaseT _ e = e {-"\fp"-}
The traversal can be applied to a complete Haskell program
using
> ifToCase = everywhere_bu ifToCaseT {-"\fp"-}
%if False
This is the Haskell expression |if x == 0 then 1 else x|:
> testIf = HsIf (HsApp (HsApp (HsVar (HsIdent "=="))
> (HsVar (HsIdent "x")))
> (HsLit (HsInt 0)))
> (HsLit (HsInt 1))
> (HsVar (HsIdent "x"))
> testHaReExample = ifToCase HsExp testIf
%endif
\section{Generically showing values}%
\label{genericshow}
We have seen that we can traverse data types in several
ways, performing potentially complex calculations in the
process. However, we cannot reimplement Haskell's
|show| function, even though it looks like a |Query String|.
The reason is that there is no way to access the name
of a constructor. We have a case for constructors, |Constr|,
in our |Spine| data type, but there is really not much we
can do at this point. So far, we have either invented
a constant value (|[]| in the case of |mapQ|), or applied
the constructor itself again (in the case of |mapT|).
But it is easy to provide additional information for each
constructor. When we define |toSpine| for a specific data
type, whether manually or automatically, we have information
about the constructors of the data type available, so why
not use it? Let us therefore modify |Spine|
once more:
%format SpineC = Spine
%format :$$ = :$
> data SpineC :: * -> * {-" "-} where
> As :: a -> ConDescr -> SpineC a
> (:$$) :: SpineC (a -> b) -> Typed a -> SpineC b {-"\fp"-}
We have renamed |Constr| to |As|, as we intend to use it
as a binary operator which takes a constructor
function and information about the constructor.
In this paper, we use only the name to describe a
constructor,
> type ConDescr = String {-"\fc"-}
%format fromTreeC = fromTree
%format toSpineC = toSpine
but we could include additional information such as its arity,
the name of the type, the ``house number'' of the constructor
and so on. Adapting |SpineC| means that the generation of
|toSpineC| has to be modified as well. We show as an
example how to do this for the type |Tree|:
%if False
Again, we had |fromTreeC| as a separate function before:
> fromTreeC :: Type a -> Tree a -> SpineC (Tree a)
> fromTreeC _ Empty = Empty `As` "Empty"
> fromTreeC a (Node l x r) = Node `As` "Node"
> :$$ (l ::: Tree a) :$$ (x ::: a) :$$ (r ::: Tree a)
%endif
%if False
> toSpineC :: Type a -> a -> SpineC a
%endif
> toSpineC (Tree a) Empty = Empty `As` "Empty"
> toSpineC (Tree a) (Node l x r) = Node `As` "Node"
> :$$ (l ::: Tree a) :$$ (x ::: a) :$$ (r ::: Tree a) {-"\fp"-}
%if False
> toSpineC Dots space = dots
%endif
%if False
> toSpineC (List a) x = fromListC a x
> toSpineC Int x = fromIntC x
> toSpineC Char x = fromCharC x
> toSpineC (Pair a b) x = fromPairC a b x
> toSpineC (Type a) x = fromType x
> toSpineC Dynamic x = fromDynamic x
> fromIntC :: Int -> SpineC Int
> fromIntC n = n `As` Prelude.show n
> fromCharC :: Char -> SpineC Char
> fromCharC c = c `As` Prelude.show c
> fromPairC :: Type a -> Type b -> (a,b) -> SpineC (a,b)
> fromPairC a b (x,y) = pair `As` "(,)" :$$ (x ::: a) :$$ (y ::: b)
> fromListC :: Type a -> [a] -> SpineC [a]
> fromListC _ [] = [] `As` "[]"
> fromListC a (x:xs) = (:) `As` "(:)" :$$ (x ::: a) :$$ (xs ::: List a)
%endif
With the new version of |Spine|, the function |show| is straightforward
to write:
> show :: Type a -> a -> String
> show t x = show' (toSpineC t x)
>
> show' :: SpineC a -> String
> show' (_ `As` c) = c
> show' (f :$$ (a ::: t)) = "(" ++ show' f ++ " " ++ show t a ++ ")" {-"\fp"-}
The result of the call
%{
%format = = " "
%format v4 = " "
> v4=show (Tree Int) (Node (Node Empty 1 Empty) 2 (Node Empty 3 Empty))
is
\begingroup
\invisiblecomments
%if not techreport
\inlinehs
%else
the string
%endif
> {- ``\eval{putStr $ v4}''\fp -}
\endgroup
%}
%if techreport
It is also easy to define |toConDescr|, which returns the |ConDescr|
for a value:
> toConDescr :: Type a -> a -> ConDescr
> toConDescr t = toConDescr' . toSpineC t
>
> toConDescr' :: SpineC a -> String
> toConDescr' (_ `As` c) = c
> toConDescr' (f :$$ (a ::: t)) = toConDescr' f {-"\fp"-}
%endif
Even though we have information about constructors, we cannot define a
generic |read| without further extensions. In the next section,
we will discuss this and other questions regarding the expressivity
of the \SYB approach.
\section{\SYB in context}%
\label{comparison}
In the previous sections, we have introduced the \SYB approach on
the basis of the |Spine| data type. Generic functions are
overloaded functions that make use of the |Spine| view by calling
|toSpine| on their type argument.
We have seen that we can define useful and widely applicable
combinators such as |everything| and |everywhere| using some
basic generic functions. As long as we stay within the expressivity
of these combinators, it is possible to perform generic programming
avoiding explicit case analysis on types.
In this section, we want to answer how expressive the |Spine| view
is in comparison to both the original presentation of \SYB, which uses
only a given set of combinators, and in relation to other
views, as they are employed by other approaches to generic programming
such as PolyP and Generic Haskell.
\subsection{The original presentation}
As described in the section of implementing \SYB in
the original paper, it turns out that |mapT| and |mapQ| are both
instances of a function that is called |gfoldl|. We
can define |gfoldl|, too. To do this, let us define the
ordinary fold (or catamorphism, if you like) of the
|Spine| type:
> foldSpine :: (forall a . a -> r a) -> (forall a b . r (a -> b) -> Typed a -> r b) ->
> SpineC a -> r a
> foldSpine constr ($$) (c `As` _) = constr c
> foldSpine constr ($$) (f :$$ (x ::: t)) = (foldSpine constr ($$) f) $$ (x ::: t) {-"\fp"-}
The definition follows the catamorphic principle
of replacing data constructors with functions.
The \SYB |gfoldl| is just |foldSpine| composed with
|toSpine|:
> gfoldl :: Type a -> (forall a . a -> r a) -> (forall a b . r (a -> b) -> Typed a -> r b) ->
> a -> r a
> gfoldl t constr app = foldSpine constr app . toSpineC t {-"\fp"-}
It is therefore clear that our approach via the
|Spine| type and the original \SYB approach via |gfoldl|
are in principle equally expressive, because the
|Spine| type can be recovered from |gfoldl|.
However, we believe that the presence of the explicit data
type |Spine| makes the definitions of some generic functions
easier, especially if they do not directly fall in the range
of any of the simpler combinators.
The original \SYB paper describes only generic functions that either
consume a value based on its type (queries, consumers), or
that consume a value based on its type and build up a similar value
at the same time (traversals).
%format some0 = some
%format some0' = some'
There are also generic functions that construct values based on a type
(producers). Such functions include the already mentioned generic
|read|, used to parse a string into a value of a data type, or |some|,
a function that produces some non-bottom value of a given data type. We
cannot define such functions without further help: The definition of
|some0| would presumably follow the general pattern of overloaded functions
on spines, the shape of the final case dictated by the type of |some0|
(cf.~Section~\ref{mapt}):
> some0 :: Type a -> a
> some0 Dots = dots
> some0 t = fromSpine some0'
%if False
> some0' :: Spine a
> some0' = undefined
%endif
But we cannot define |some0' :: Spine a|,
because that would yield
%{
%format = = " "
%format v5 = " "
\begingroup
\inlinehs
> v5=fromSpine some0' {-"\linebreak[1]"-}:: forall a. a {-"\fc"-}
\endgroup
%}
which has to be |undefined| according to the parametricity
theorem~\cite{wadler89theorems}.
Due to the well-definedness of |fromSpine|, |some'|
would have to be |undefined|, too.
It is nevertheless possible to define
|some :: Type a -> a|,
but only if |Type| is augmented with more information
%if not techreport
about the type it represents and its constructors.
Due to space limitations, the implementation
is deferred to
the technical report~\cite{syb0tr}, where
we give an equivalent of |gunfold| from the second
\SYB paper~\cite{syb2}.
%else
about the type it represents. In particular, it must
be possible to obtain
a full list of constructors in some suitable
data structure from a type representation.
If we pursue this path further, we end
up with a specific data type to hold the constructors
of a single type
%format ::$ = "\mathbin{\raisebox{.15ex}{$\scriptstyle\oblong$}}"
> data TypeSpine :: * -> * {-" "-} where
> TypeAs :: a -> ConDescr -> TypeSpine a
> (::$) :: TypeSpine (a -> b) -> Type a -> TypeSpine b
This data type is almost the same as |SpineC|, only the second
argument of |(::$)| is a |Type| instead of a |Typed|. A |TypeSpine|
therefore contains no value except the constructor function itself. We
can now write a function
> constrs :: Type a -> [TypeSpine a]
for each data type, but this function has to be defined
for each data type separately, much like |toSpine|.
Here is an example for the constructors of |Tree|:
> constrs (Tree a) = [ Empty `TypeAs` "Empty",
> Node `TypeAs` "Node" ::$ Tree a ::$ a ::$ Tree a ] {-"\fp"-}
%if False
> constrs (List a) = [ [] `TypeAs` "[]",
> (:) `TypeAs` "(:)" ::$ a ::$ List a ]
> constrs Int = [ n `TypeAs` Prelude.show n | n <- [0..] ]
> -- a proper implementation would use interleave to get negatives as well
> constrs (Pair a b)= [ pair `TypeAs` "(,)" ::$ a ::$ b ]
%endif
With |constrs|, we now can define |some| or even |read|:
> some :: Type a -> a
> some = some' . head . constrs
>
> some' :: TypeSpine a -> a
> some' (TypeAs c _) = c
> some' (f ::$ a) = some' f (some a) {-"\fp"-}
%if False
%format toSpine2 = toSpine
%format toSpine2' = toSpine'
Can |toSpine| can be written as a true generic function in
terms of |constrs|? I think not.
%endif
Indeed, the |gunfold| function that is added to the set of predefined
combinators in the second \SYB paper~\cite{syb2} is derived from the
catamorphism on |Constr| much like |gfoldl| is derived from the
catamorphism on |SpineC|. Unfortunately though, there is no relation between
|gunfold| and an anamorphism on |SpineC|.
%if False
> foldTypeSpine :: (forall a . a -> r a) -> (forall a b . r (a -> b) -> Type a -> r b) ->
> TypeSpine a -> r a
> foldTypeSpine constr ($$) (TypeAs c _) = constr c
> foldTypeSpine constr ($$) (f ::$ t) = (foldTypeSpine constr ($$) f) $$ t {-"\fp"-}
This is almost the same as |gunfold|, which flips the first two
arguments, and uses an abstract type |Constr| instead of |TypeSpine a|.
%endif
%endif
We \emph{can}, however, define functions
on multiple type arguments without further additions.
The definition of generic equality is very
straightforward using the spine view:
> eq :: Type a -> Type b -> a -> b -> Bool
> eq t1 t2 x y = eq' (toSpineC t1 x) (toSpineC t2 y)
>
> eq' :: SpineC a -> SpineC b -> Bool
> eq' (_ `As` c1) (_ `As` c2) = c1 == c2
> eq' (f1 :$$ (a1 ::: t1)) (f2 :$$ (a2 ::: t2)) = eq' f1 f2 && eq t1 t2 a1 a2
> eq' _ _ = False {-"\fp"-}
The generalized type of |eq| avoids the necessity of
a type-level equality test. In the second \SYB paper,
|eq| is defined in terms of a combinator called |zipWithQ|.
%if techreport
%format eq1 = eq
> type Query2 r = forall a b. Type a -> Type b -> a -> b -> r
>
> zipWithQ :: Query2 r -> Query2 [r]
> zipWithQ q t1 t2 x y = zipWithQ' q (toSpineC t1 x) (toSpineC t2 y)
>
> zipWithQ' :: Query2 r -> (forall a b. SpineC a -> SpineC b -> [r])
> zipWithQ' q (f1 :$$ (a1 ::: t1)) (f2 :$$ (a2 ::: t2)) = zipWithQ' q f1 f2 ++
> [q t1 t2 a1 a2]
> zipWithQ' q s1 s2 = []
>
> eqQ :: Query2 Bool
> eqQ Int Int x y = x == y
> eqQ Char Char x y = x == y
> eqQ t1 t2 x y = toConDescr t1 x == toConDescr t2 y
>
> eq1 :: Query2 Bool
> eq1 t1 t2 x y = and ([eqQ t1 t2 x y] ++ zipWithQ eq1 t1 t2 x y) {-"\fp"-}
%endif
Although we can mirror the definition of |zipWithQ|,
we believe that the direct definition is much clearer.
\subsection{Other views and their strengths and weaknesses}
Let us now look at two other approaches to generic programming,
PolyP and Generic Haskell.
They are also based on overloaded functions, but they do not
represent values using |Spine|.
A different choice of view affects the class of generic functions
that can be written, how easily they can be expressed, and the
data types that can be represented.
\subsubsection{PolyP}
In PolyP~\cite{polyp}, data types of kind |* -> *| are viewed
as fixed points of regular pattern functors. The regular functors
in turn are of kind |* -> * -> *| and represented as lifted
sums of products. The view makes use of the following type
definitions:
> data Fix f = In (f (Fix f))
> type LSum f g a r = Either (f a r) (g a r)
> type LProd f g a r = (f a r, g a r)
> type LUnit a r = ()
> type Par a r = a
> type Rec a r = r {-"\fp"-}
Here, |Fix| is a fixed-point computation on the type level.
The type constructors
|LSum|, |LProd|, and |LUnit| are lifted variants of
the binary sum type |Either|, the binary product type |pair|,
and the unit type |()|. Finally we have |Par| to select
the parameter, and |Rec| to select the recursive call.
%format =~ = "\cong "
As an example, our type |Tree| has pattern functor |TreeF|:
> data TreeF a r = EmptyF | NodeF r a r {-"\fp"-}
We have (modulo |undefined|) that |Tree a =~ Fix (TreeF a)|.
Furthermore, we can view
|TreeF| as a binary sum (it has two constructors), where
the right component is a nested binary product (|NodeF| has three
fields). The recursive argument |r| is represented by |Rec|,
the parameter to |Tree| by |Par|:
> type TreeFS a r = LSum LUnit (LProd Rec (LProd Par Rec)) a r {-"\fp"-}
Again, we have (modulo |undefined|)
an isomorphism |TreeF a r =~ TreeFS a r|.
The view of PolyP has two obvious disadvantages: first, due to its
two-level nature, it is relatively complicated; second, it is
quite limited in its applicability. Only data types of kind~|* -> *|
that are regular can be represented.
On the other hand, many generic functions on data types of
kind~|* -> *| are definable. PolyP can express functions to
parse, compare, unify, or print values generically. Its particular
strength is that recursion patterns such as
cata- or anamorphisms can be expressed generically,
because each data type is viewed as a fixed point, and the
points of recursion are visible.
\subsubsection{Generic Haskell}
In contrast to PolyP, Generic Haskell~\cite{loeh04exploring,ghcoral}
uses a view that is much
more widely applicable and is slightly easier to handle: all data types
are (unlifted) sums of products. The data type |Tree| is viewed
as the isomorphic
> type TreeS a = Either () (Tree a, (a, Tree a)) {-"\fp"-}
The original type |Tree| appears in |TreeS|, there is no special
mechanism to treat recursion differently. This has a clear advantage,
namely that the view is applicable to nested and mutually recursive
data types of arbitrary kinds. In fact, in Generic Haskell all
Haskell~98 data types can be represented. The price is that recursion
patterns such as cata- or anamorphisms cannot be defined directly.
On the other hand, generic functions in Generic Haskell can be defined
such that they work on types of all kinds. It is therefore significantly
more powerful than PolyP. In Generic Haskell we can, for instance,
define a generic |map| that works for generalized rose trees,
a data type of kind~|(* -> *) -> * -> *|:
\begingroup
\setlength{\belowdisplayskip}{-2pt}%
> data Rose f a = Fork a (f (Rose f a)) {-"\fp"-}
\endgroup
\subsubsection{Scrap your boilerplate}
The |Spine| view is not so much based on the structure of types,
but on the structure of values. It emphasizes the structure of
a constructor application. We have already noticed that this limits
the generic functions that can be written. Pure producers
such as |read| or |some| require additional information.
Furthermore, all generic functions work on types of kind~|*|. It
is not possible to define a generic version of |map| for type
constructors, or to define a recursion pattern such as a catamorphism
generically.
But the |Spine| view also has two noticeable advantages over the other
views discussed. Firstly, the view is simple, and the relation between
a value and its spine representation is very direct. As a consequence,
the transformation functions |fromSpine| and |toSpine| are quite
efficient, and it is easy to deforest the |Spine| data structure.
Secondly, as every (non-abstract) Haskell value is a constructor
application, the view is very widely applicable. Not only all
Haskell~98 data types of all kinds can be represented, the |Spine|
view is general enough to represent data types containing existentials
and even \GADTs without any further modifications. This is
particularly remarkable, because at the moment, \GHC does not support
automatic derivation of classes for \GADTs. The methods of the classes
|Eq|, |Ord|, and |Show| can easily be defined using the \SYB approach.
Thus, there is no theoretical problem to allow derivation of these
classes also for \GADTs. We discuss this newly found expressive power
further in the next section.
%if False
Example for all kinds:
Here is an example for generalized rose trees:
%format Type1 = Type
%format Rose1 = Rose
> data Type1 :: * -> * {-" "-} where
> TDots1 :: Type1 tdotsend
> Rose1 :: (forall a. Type1 a -> Type1 (f a)) -> Type1 a -> Type1 (Rose f a)
%format toSpine1 = toSpine
> toSpine1 :: Type a -> a -> Spine a
> toSpine1 (Rose f a) = toSpineRose f a
> toSpineRose :: (forall a . Type a -> Type (f a)) ->
> Type a -> Rose f a -> Spine (Rose f a)
> toSpineRose f a (Fork x ts) = Constr Fork :$ (x ::: a) :$ (ts ::: f (Rose f a))
[Note that we cannot use above approach with type classes,
because we'd need higher-order (quantified) class constraints.]
Possible problem for representation: types which have polymorphic components
require polymorphic type representations, but that's an
orthogonal issue.
> data Monad m = Monad (forall a. a -> m a) (forall a b. m a -> (a -> m b) -> m b)
%endif
\section{Scrap your boilerplate for ``Scrap your boilerplate''}%
\label{ssyb}
There are almost no limits to the data types we can represent
using |Spine|. One very interesting example is the \GADT of
types itself, namely |Type|. This allows us to instantiate
generic functions on type |Type|.
Consequently, we can show types by invoking the generic |show|
function, or compute type equality using the generic equality~|eq|!
Both are useful in the context of dynamically typed values:
> data Dynamic where
> Dyn :: t -> Type t -> Dynamic {-"\fp"-}
The difference between the types |Dynamic| and |Typed| is
that |Dynamic| contains an existential quantification.
Before we can actually use generic functions on |Dynamic|, we require
that |Type| has a constructor for types and dynamic values:
%format Type2 = Type
%format Dynamic2 = Dynamic
> data Type2 :: * -> * {-" "-} where
> TDots2 :: Type2 tdotsend
> Type2 :: Type2 a -> Type2 (Type2 a)
> Dynamic2 :: Type2 Dynamic {-"\fp"-}
The function |toSpine| also requires cases for |Type| and |Dynamic|,
but converting types or dynamics into the |SpineC| view is entirely
straightforward, as the following example cases demonstrate:
%format toSpineC1 = toSpineC
%if False
> toSpineC1 :: Type a -> a -> SpineC a
%endif
> toSpineC1 (Type a') (Type a) = Type `As` "Type" :$$ (a ::: Type a)
> toSpineC1 Dynamic (Dyn x t) = Dyn `As` "Dyn" :$$ (x ::: t) :$$ (t ::: Type t) {-"\fp"-}
In the first line above, |a'| is always equal to |a|, but the Haskell
type system does not know that, so we do not enforce it in the program.
%if False
> fromType :: Type a -> SpineC (Type a)
> fromType (Tree a) = Tree `As` "Tree" :$$ (a ::: Type a)
> fromType (Type a) = Type `As` "Type" :$$ (a ::: Type a)
> fromType Dots = dots
> fromType Int = Int `As` "Int"
> fromType Char = Char `As` "Char"
> fromType (List a) = List `As` "List" :$$ (a ::: Type a)
> fromType (Pair a b) = Pair `As` "Pair" :$$ (a ::: Type a) :$$ (b ::: Type b)
> fromDynamic :: Dynamic -> SpineC Dynamic
> fromDynamic (Dyn x t) = Dyn `As` "Dyn" :$$ (x ::: t) :$$ (t ::: Type t)
> dyn = Dyn (Node Empty 2 Empty) (Tree Int)
%endif
The output of
%{
%format = = " "
%format w0 = " "
\begingroup
%if not techreport
\inlinehs
%endif
> w0=show Dynamic (Dyn (Node Empty 2 Empty) (Tree Int))
is now the string
\invisiblecomments
< {- ``\eval{putStr $ w0}''\fc -}
and comparing the dynamic value to itself using |eq Dynamic Dynamic|
yields indeed \eval{eq Dynamic Dynamic dyn dyn},
incorporating a run-time type equality test.
\endgroup
%}
%if techreport
\section{Class-based implementation}%
\label{classbased}
In this section, we combine the idea of the |Spine| view with an
alternative way to express overloaded functions, using
classes as described in the third \SYB paper~\cite{syb3},
short \SYB 3 in the remainder of this section.
We assume knowledge with the material in that paper, as some
of the techniques described there are quite subtle.
The advantage of this presentation (which comes at the price
of elegance) is that the resulting functions are extensible
in a compositional way. New data types can be added, and
functions adapted, just by providing a few additional instances.
As an introduction, here is how the overloaded |sum0| function from
Section~\ref{functionsontypes} is expressed using type classes:
%format Sum2 = Sum
%format sum2 = sum
> class Sum2 a where
> sum2 :: a -> Int
>
> instance Sum2 Int where
> sum2 n = n
>
> instance Sum2 Char where
> sum2 _ = 0
>
> instance Sum2 a => Sum2 [a] where
> sum2 xs = foldr (+) 0 (map sum2 xs)
>
> instance (Sum2 a, Sum2 b) => Sum2 (a,b) where
> sum2 (x,y) = sum2 x + sum2 y
>
> instance Sum2 a => Sum2 (Tree a) where
> sum2 t = sum2 (inorder t) {-"\fp"-}
If we want to turn this into the generic variant of |sum|, as
given near the end of Section~\ref{spineview}, while keeping
the function extensible in the spirit of \SYB 3, we have to
do a bit of preliminary work.
The whole point of the exercise is to reuse the spine view,
but to do without explicit type representations. It turns out
that the main thing we need to be able to do is to turn values
into their spine representation, so we define a |Data| class
as follows:
%format SpineX = Spine
%format toSpineX = toSpine
%format constrsX = constrs
> class (Sat (ctx a)) => Data ctx a where
> toSpineX :: a -> SpineX ctx a {-"\fc"-}
which plays more or less the same role as the |Data| class
in the \SYB papers.
The intricacies of |Sat| and the |ctx| parameter are explained
in \SYB 3, they have to do with tying the recursive knot for
extensible generic functions. The definition of |Sat|
(read: satisfy) is simply
> class Sat a where
> dict :: a {-"\fp"-}
We can now specify the variant of the |Spine| data type we
use here, where we use the |Data| class instead of the |Type|
data type:
%format AsX = As
%format :$? = :$
> data SpineX :: (* -> *) -> (* -> *) where
> AsX :: a -> ConDescr -> SpineX ctx a
> (:$?) :: (Data ctx a) => SpineX ctx (a -> b) -> a -> SpineX ctx b {-"\fp"-}
Defining instances of the |Data| class is without surprises:
> instance Sat (ctx Int) => Data ctx Int where
> toSpineX n = n `AsX` Prelude.show n
>
> instance (Sat (ctx (Tree a)), Data ctx a) => Data ctx (Tree a) where
> toSpineX Empty = Empty `AsX` "Empty"
> toSpineX (Node l x r) = Node `AsX` "Node" :$? l :$? x :$? r {-"\fp"-}
%format Sum3 = Sum
%format sum3 = sum
%format sum3' = sum'
Now we can start defining the generic |sum3| function.
The main part of the function turns out to
be simpler than in the non-generic case, because we need
only one specific case, for integers:
> class Sum3 a where
> sum3 :: a -> Int
>
> instance Sum3 Int where
> sum3 n = n {-"\fp"-}
The generic case looks as follows:
> instance Data SumD a => Sum3 a where
> sum3 x = sum3' (toSpineX x) {-"\fp"-}
The data type |SumD| is a \technical{dictionary}
for the |sum3| function. We use it for recursive
calls:
> data SumD a = SumD { sumD :: a -> Int }
>
> sum3' :: SpineX SumD a -> Int
> sum3' (c `AsX` _) = 0
> sum3' (f :$? x) = sum3' f + sumD dict x
>
> instance Sum3 a => Sat (SumD a) where
> dict = SumD { sumD = sum3 } {-"\fp"-}
This function works as the original generic |sum|,
only that we do not have to provide a type argument.
The call
\begingroup
\invisiblecomments
%{
%format = = " "
%format v6 = " "
> v6=sum3 (Node Empty (17 :: Int) (Node Empty 25 Empty))
%}
\endgroup
evaluates to \eval{v6}. The function |sum3| is easy
to extend: if we want to handle new data types, all
we have to do is to define appropriate instances for
the |Data| and |Sum3| classes.
When moving to the classic \SYB combinators, we have
to resort to proxies as explained in \SYB 3:
%format mapQX = mapQ
%format mapQX' = mapQ'
%format everythingX = everything
%format everythingX' = everything'
%format QueryX = Query
> data Proxy (ctx :: * -> *) -- empty type
>
> proxy :: Proxy ctx
> proxy = error "proxy"
>
> type QueryX ctx r = forall a. Data ctx a => Proxy ctx -> a -> r
>
> mapQX :: QueryX ctx r -> QueryX ctx [r]
> mapQX q _ = mapQX' q . toSpineX
>
> mapQX' :: QueryX ctx r -> (forall a. SpineX ctx a -> [r])
> mapQX' q (c `AsX` _) = []
> mapQX' q (f :$? x) = mapQX' q f ++ [q proxy x]
>
> everythingX' :: QueryX ctx r -> QueryX ctx [r]
> everythingX' q p x = [q p x] ++ concat (mapQX (everythingX' q) p x)
>
> everythingX :: (r -> r -> r) -> QueryX ctx r -> QueryX ctx r
> everythingX op q p x = foldl1 op ([q p x] ++ mapQX (everythingX op q) p x) {-"\fp"-}
With these preparations in place, we can now write |sum3| as
a query:
%format SumQ2 = SumQ
%format sumQ2 = sumQ
%format sum4 = sum
> class SumQ2 a where
> sumQ2 :: a -> Int
> sumQ2 _ = 0
>
> instance SumQ2 Int where
> sumQ2 n = n
>
> data SumQD a = SumQD { sumQD :: a -> Int }
>
> instance SumQ2 a => Sat (SumQD a) where
> dict = SumQD { sumQD = sumQ2 }
>
> sum4 :: QueryX SumQD Int
> sum4 = everythingX (+) (const (sumQD dict)) {-"\fp"-}
The rest of the code in this paper can be adapted to a class-based
approach using the same techniques as shown above. This demonstrates
that the choice of how to represent overloaded functions is mostly
independent of the usage of the spine view.
%endif
\section{Conclusions}%
\label{conclusions}
The \SYB approach has been developed by Peyton Jones
and L\"ammel in a series of
papers~\cite{laemmel03scrap,syb2,syb3}. Originally,
it was an implementation of \technical{strategic
programming}~\cite{visser2000language} in Haskell,
intended for traversing and querying complex, compound
data such as abstract syntax trees.
The ideas underlying the generic programming extension
PolyP~\cite{polyp}
go back to the categorical notions of functors and
catamorphisms, which are independent of the data type
in question~\cite{backhousegp}.
Generic Haskell~\cite{hinzempc} was motivated by the
desire to overcome the restrictions of PolyP.
Due to the different backgrounds, it is not surprising that \SYB and
generic programming have remained difficult to compare for a long
time. The recent work on \technical{generic
views}~\cite{holdermans2005views,holdermans2005thesis} has been an
attempt to unify different approaches. We believe that we
bridged the gap in this paper for the first time, by presenting the
|Spine| data type which encodes the \SYB approach faithfully.
Our implementation handles the two central ingredients
of generic programming differently from the
original \SYB paper: we use overloaded functions with
explicit type arguments instead of overloaded functions
based on a type-safe cast~\cite{laemmel03scrap} or a
class-based extensible scheme~\cite{syb3}; and we use
the explicit spine view rather than a combinator-based
approach.
%
Both changes are independent of each other, and have been made
with clarity in mind: we think that the
structure of the \SYB approach is more visible in our setting, and that
the relations to PolyP and Generic Haskell become clearer. We have
revealed that while the spine view is
limited in the class of generic functions that can
be written, it is applicable to a very large class of
data types, including \GADTs.
Our approach cannot be used easily as a library,
because the encoding of overloaded functions using explicit type
arguments requires the extensibility of the |Type| data type and of
functions such as |toSpine|. One can, however, incorporate
|Spine| into the \SYB library while still using the techniques
of the \SYB papers to encode overloaded functions%
%if not techreport
\ (see the technical report~\cite{syb0tr} for more details)%
%endif
.
In this paper, we do not use classes at all, and we therefore expect
that it is easier to prove algebraic properties about SYB (such as
|mapT copy = copy| where |copy _ = id| is the identity traversal)
in this setting.
For example, we believe that the work of
Reig~\cite{reig:tfp:2006} could be recast using our approach,
leading to shorter and more concise proofs.
\paragraph{Acknowledgements}
We thank Jeremy Gibbons, Ralf L\"ammel, Pablo Nogueira,
Simon Peyton Jones, Fermin Reig,
and the four anonymous referees
for several helpful remarks.
\bibliographystyle{splncs}
%\clearpage
%\addcontentsline{toc}{chapter}{\bibname}
%if not techreport
\afterpage{\enlargethispage*{\baselineskip}}%
%endif
\bibliography{Syb}
\end{document}