Some ideas about dependency graph generation ============================================ Ok, here's the new version. Each variant is represented by a number of nodes in the dependency graph. These nodes are called the "history" of the variant in the graph. Variants can have different forms of histories, reflecting different processes that happen during the resolution of the dependency graph. For instance, a variant can have an "install" history if it isn't available in the beginning, but will be pulled into the system. It can have an "available" history if it is available and will remain so. It can have a "remove" history if it is available but will be removed. An "upgrade" from one variant to another is, in essence, a combination of a "remove" history for the old variant with an "install" history for the new variant. One design principle is that problems should appear as cycles in the dependency graph. In the end, we can check if the graph is cycle-free and if it is, we can go ahead and perform the resulting actions. Aha, actions. Actions such as "fetching", "building", "merging", "unmerging", are attached to some of the nodes. = Representation = For easy access, we should retain the histories that are stored in the graph. > histories :: Map PS History > > History = Install Version > | Available Version > | Remove Version > | Upgrade Version Version -- can also be used for downgrades We should also have a direct map to look up nodes. > nodes :: Map Node Int > > Node = (Action, PVS) -- or PV? = Actions = In principle, we require the following sorts of nodes: Built: a variant is built Available: a variant is available for use Removed: a variant is no longer on the system Possibly, we require Pre-Available to resolve PDEPEND cycles. In addition, we require the following two meta-nodes: Top: This is the goal. Bottom: This is what's present before we start. = "Install" history = An install history looks as follows /- X.R ---\ | | v v Top ----> X.A ----> X.B ----> Bot Explanation: X.B --> Bot: The package can only be built if we have a consistent status quo. X.A --> X.B: The package cannot be used before it isn't built. Top --> X.A: We *want* X on the system in the end. X.R --> Top: If we have Top --> X.R somehow, this is a desired conflict. X.R --> X.A: We cannot remove X before it isn't available. = "Available" history = /- X.R ---\ | | v v Top ----> X.A <---- Bot X.B <---- Bot Bot --> X.A: The package is available before we start. Bot --> X.B: Dito for building. Top --> X.A: We want X on the system in the end. X.R --> Top: If we have Top --> X.R somehow, this is a desired conflict. X.R --> X.A: We cannot remove X before it isn't available. = "Remove" history = Top ----> X.R ----> X.A <---- Bot X.B <---- Bot Bot --> X.A: The package is available before we start. Bot --> X.B: Dito for building. X.R --> X.A: We cannot remove X before is isn't available. Top --> X.R: We want X removed in the end. = Dependencies = For X DEPENDs on Y, add X.B --> Y.A, and Y.R --> X.B. For X RDEPENDs on Y, add X.A --> Y.A, and Y.R --> X.R. PDEPENDs necessarily create cycles. If X PDEPENDS on Y, but Y DEPENDs on X (which it usually does), then Y cannot assume that X is completely usable before Y is installed. After all, for X to be completely usable Y would have to be installed already. For X PDEPENDs on Y, add X.A --> Y.A, and Y.R --> X.R. So, a typical situation is that of ghc and cabal: ghc PDEPENDs on cabal cabal DEPENDs on ghc cabal RDEPENDs on ghc The idea is that cabal's dependencies should not point to ghc's Available state, but to something intermediate. If a cycle contains a path X.? -(R)DEPEND-> Y.A -PDEPEND-> Z.A we change it to X.? -(R)DEPEND-> Y.P Y.A -PDEPEND-> Z.A Y.A -META-> Y.P Additionally, Y.A -META-> Y.P This will resolve this particular cycle. For RDEPEND/PDEPENDs, there's also a cycle between X.R and Y.R. Since a PDEPEND is installed after the originating package, we want it removed first, too, so we change Y.R -(R)DEPEND-> X.R -PDEPEND-> Y.R to X.R -PDEPEND-> Y.R = Blockers = For X DEPENDs on !Y, add Y.B --> X.B. For X RDEPENDs on !Y, add Y.B --> X.R. Are these sufficient? Yes, I think so: if X is installed, and Y available, and we have X RDEPEND !Y, we get the cycle: Top --> X.A --> X.B --> Bot --> Y.B ==> X.R --> Top Nice, huh? What if X is installed, Y is available, and we have X DEPEND !Y? Top --> X.A --> X.B --> Bot --> Y.B ==> X.B Still a conflict, but we could do better (remove Y, but not without reverse dependency handling). What if X is installed, Y is installed, and we have X DEPEND !Y? Top --> X.A --> X.B --> Bot | ^ ^ | | | \----> Y.A --> Y.B -----/ Ok, but X must be built before Y. = Histories vs. dependencies = If we take dependencies into account, histories can be simplified. Essentially, there are two histories only: ~ "Available" history (for packages that are available initially) X.R ----> X.A <---- Bot = X.B ~ "Install" history X.R ----> X.A ----> X.B ----> Bot [2] TOP There is a special node called TOP. If a package X is to be installed on the system, there should be a direct edge from TOP to X.Available. If we do not want something to happen, we can depend on TOP. For instance, if X is supposed not to be installed on the system, let X.Available depend on TOP. As a result, X.Available cannot be depended on from TOP without introducing a cycle, and thus an inconsistency. [3] For blockers, we have to distinguish between DEPEND blockers, and R/PDEPEND blockers. The latter are far more severe and can never be resolved if encountered. If X DEPENDs on !Y, we have that Y.Built depends on X.Built. That's if Y has an installation history (see [4]). If Y has a remove history, then X.Built depends on Y.Removed. If Y is already installed, there's an unresolvable conflict. One way to represent this is to add a remove history for Y even though an installed history already exists. [5] Cycle detection =============== A really important feature will be that we have reasonably efficient detection of cycles. We could first compute SCCs. We know that each SCC has at least one back-edge, which yields a cycle. We could iterate this process. We require cycles to be given as lists of edges, so that we can check what kind of edges make up a path. Data.Graph.Inductive contains a type Path, which is a list of nodes. We can compute our representation from this list of nodes.