Advanced Topics

After you have become familiar with the core topics, you can move on to these advanced topics:


Building PSL

This page covers building the PSL source code for development purposes. For running a standard PSL program, see Running a Program.

The PSL source code is publicly available and hosted on GitHub.

To get the code, clone the repository:

>> git clone https://github.com/linqs/psl.git

If you are already comfortable using Git, you can skip ahead to the section on compiling PSL.

Getting started with Git

The Git website has information on installing Git, as do the GitHub guides mentioned below. This tutorial is helpful for learning how to use Git, and this tutorial is particularly helpful for SVN users.

Once Git is installed and you're ready to use it, you can run the above command to clone the PSL repository.

Checking out the development branch

Between releases, the develop branch may be significantly ahead of the master branch. To see the latest changes, check out the develop branch:

>> git checkout develop

Contributing Code

To contribute code to PSL, first fork the PSL repository; this creates your own copy of the repository hosted on GitHub.

Then clone your fork to a local machine, make commits, and push some or all of those commits back to the fork on GitHub. When your change is ready to be added to PSL, submit a pull request, which will be reviewed by the PSL maintainers. The maintainers may request additional changes. Once your code is deemed acceptable, it will be merged into the develop branch of PSL.

Compiling PSL

PSL uses the Maven build system. Change to the top-level directory of your working copy and run:

>> mvn compile

You can install PSL to your local Maven repository by running:

>> mvn install

Updating your Projects

Remember to update your project's pom.xml file with the version you just installed, since it may differ from the version you were using before.


Builtin Similarity Functions

PSL comes with several builtin similarity functions. If these functions do not cover your needs, you can also create custom similarity functions.

These similarity functions are shipped with the PSL Utils package.

Text Similarity

Name: Cosine Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.CosineSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://en.wikipedia.org/wiki/Cosine_similarity

Name: Dice Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.DiceSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient

Name: Jaccard Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.JaccardSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://en.wikipedia.org/wiki/Jaccard_index
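
As a concrete example of what these token-based measures compute, here is a minimal, standalone sketch of Jaccard similarity over whitespace-delimited tokens. This is illustrative code only, not the PSL implementation; the actual class's tokenization and normalization may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of Jaccard similarity over whitespace-delimited tokens.
// NOT the actual org.linqs.psl.utils.textsimilarity.JaccardSimilarity source.
public class JaccardSketch {
    public static double jaccard(String a, String b) {
        Set<String> tokensA = new HashSet<>(Arrays.asList(a.toLowerCase().split("\\s+")));
        Set<String> tokensB = new HashSet<>(Arrays.asList(b.toLowerCase().split("\\s+")));

        Set<String> union = new HashSet<>(tokensA);
        union.addAll(tokensB);

        Set<String> intersection = new HashSet<>(tokensA);
        intersection.retainAll(tokensB);

        // |A ∩ B| / |A ∪ B|, a continuous value in [0, 1].
        return (double) intersection.size() / union.size();
    }
}
```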

Name: Jaro Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.JaroSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://www.census.gov/srd/papers/pdf/rr91-9.pdf

Name: Jaro-Winkler Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.JaroWinklerSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance

Name: Level 2 Jaro-Winkler Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.Level2JaroWinklerSimilarity
Arguments: String, String
Return Type: Continuous
Description: A level 2 variation of Jaro-Winkler Similarity. Level 2 means that tokens are broken up before comparison (see http://secondstring.sourceforge.net/javadoc/com/wcohen/ss/Level2.html).

Name: Level 2 Levenshtein Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.Level2LevenshteinSimilarity
Arguments: String, String
Return Type: Continuous
Description: A level 2 variation of Levenshtein Similarity. Level 2 means that tokens are broken up before comparison (see http://secondstring.sourceforge.net/javadoc/com/wcohen/ss/Level2.html).

Name: Level 2 Monge Elkan Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.Level2MongeElkanSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://www.aaai.org/Papers/KDD/1996/KDD96-044.pdf

Name: Levenshtein Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.LevenshteinSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://en.wikipedia.org/wiki/Levenshtein_distance

Name: Same Initials
Qualified Path: org.linqs.psl.utils.textsimilarity.SameInitials
Arguments: String, String
Return Type: Discrete
Description: First splits the input strings on any whitespace and ensures both have the same number of tokens (returns 0 if they do not). Then, the first characters of all the tokens are checked for equality (ignoring case and order of appearance). Note that all characters that are not alphabetic ASCII characters are considered equal (e.g., all digits and non-ASCII characters are treated as the same character).
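
The behavior described above can be sketched in plain Java. This is an illustrative reimplementation derived from the description, not the actual SameInitials source:

```java
import java.util.Arrays;

// Illustrative sketch of the SameInitials behavior described above;
// NOT the actual org.linqs.psl.utils.textsimilarity.SameInitials source.
public class SameInitialsSketch {
    // All characters that are not alphabetic ASCII collapse to one equivalence class.
    private static char normalize(char c) {
        return (c < 128 && Character.isLetter(c)) ? Character.toLowerCase(c) : '#';
    }

    public static double sameInitials(String a, String b) {
        String[] tokensA = a.trim().split("\\s+");
        String[] tokensB = b.trim().split("\\s+");
        if (tokensA.length != tokensB.length) {
            return 0.0;
        }

        char[] initialsA = new char[tokensA.length];
        char[] initialsB = new char[tokensB.length];
        for (int i = 0; i < tokensA.length; i++) {
            initialsA[i] = normalize(tokensA[i].charAt(0));
            initialsB[i] = normalize(tokensB[i].charAt(0));
        }

        // Sorting discards order of appearance before comparing.
        Arrays.sort(initialsA);
        Arrays.sort(initialsB);
        return Arrays.equals(initialsA, initialsB) ? 1.0 : 0.0;
    }
}
```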

Name: Same Number of Tokens
Qualified Path: org.linqs.psl.utils.textsimilarity.SameNumTokens
Arguments: String, String
Return Type: Discrete
Description: Checks whether both strings have the same number of tokens (delimited by any whitespace).

Name: Sub String Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.SubStringSimilarity
Arguments: String, String
Return Type: Continuous
Description: If one input string is a substring of the other, returns the length of the substring divided by the length of the longer string. Returns 0 if neither string is a substring of the other.
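
The substring ratio described above can be sketched as follows. Again, this is an illustration from the description, not the actual SubStringSimilarity source:

```java
// Illustrative sketch of the substring ratio described above;
// NOT the actual org.linqs.psl.utils.textsimilarity.SubStringSimilarity source.
public class SubStringSketch {
    public static double subString(String a, String b) {
        String shorter = (a.length() <= b.length()) ? a : b;
        String longer = (a.length() <= b.length()) ? b : a;

        if (longer.isEmpty() || !longer.contains(shorter)) {
            return 0.0;
        }
        // Length of the contained string over the length of the containing string.
        return (double) shorter.length() / longer.length();
    }
}
```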


Change Log

Version 2.0.0 (https://github.com/linqs/psl/tree/2.0.0)

Version 1.2.1 (https://github.com/linqs/psl/tree/1.2.1)

  • Bug fix for External Function registration

Version 1.2 (https://github.com/linqs/psl/tree/1.2)

Version 1.1.1 (https://github.com/linqs/psl/tree/1.1.1)

  • Improved examples, which demonstrate database population for non-lazy inference and learning
  • Support for learning negative weights (limited to inference methods for discrete MRFs that support negative weights)
  • Bug fixes

Version 1.1 (https://github.com/linqs/psl/tree/1.1)

  • An improved Groovy interface. Try the new examples via https://github.com/linqs/psl/wiki/Installing-examples to learn the new interface.
  • New, improved psl-core architecture
  • Much faster inference based on the alternating direction method of multipliers (ADMM).
  • Improved max-likelihood weight learning
  • New max-pseudolikelihood and large-margin weight learning
  • Many bug fixes and minor improvements.

Version 1.0.2 (https://github.com/linqs/psl/tree/1.0.2)

  • Fixed bugs in HomogeneousIPM and MOSEK add-on caused by a bug in Parallel Colt when using selections from large, sparse matrices.
  • Fixed bug when learning weights of programs which contain set functions.
  • Reduced memory footprint of HomogeneousIPM and matrices produced by ConicProgram.

Version 1.0.1 (https://github.com/linqs/psl/tree/1.0.1)

  • Fixed bug in optimization program when the same atom was used more than once in a ground rule or constraint.
  • Added release profile to parent POM for better packaging.
  • Minor changes to archetypes.

Version 1.0 (https://github.com/linqs/psl/tree/1.0)


Choosing a Version of PSL

Maven allows several ways to specify acceptable versions for dependencies. This page discusses the recommended ways to specify which version of PSL to use.

Exact Version

If you are working on a paper or code that requires exact reproducibility, then you should specify an exact version of PSL.

For example:

<dependencies>
    ...
    <dependency>
        <groupId>org.linqs</groupId>
        <artifactId>psl-groovy</artifactId>
        <version>2.1.3</version>
    </dependency>
    ...
</dependencies>

Major Minor

If you want to get bug fixes without worrying about breaking changes, then you can specify a major and minor version while allowing the incremental (patch) version to grow.

For example:

<dependencies>
    ...
    <dependency>
        <groupId>org.linqs</groupId>
        <artifactId>psl-groovy</artifactId>
        <version>[2.1,2.2)</version>
    </dependency>
    ...
</dependencies>

Major

If you want the latest stable code and can tolerate the occasional breakage, then you can specify just the major version.

For example:

<dependencies>
    ...
    <dependency>
        <groupId>org.linqs</groupId>
        <artifactId>psl-groovy</artifactId>
        <version>[2,3)</version>
    </dependency>
    ...
</dependencies>

Canary

If you are doing development and are willing to accept potential bugs, broken builds, and API breakages, then you can use the canary build. See the working with canary page for details on how best to work with the canary build.

<dependencies>
    ...
    <dependency>
        <groupId>org.linqs</groupId>
        <artifactId>psl-groovy</artifactId>
        <version>CANARY</version>
    </dependency>
    ...
</dependencies>

Configuration Options

Key: pairedduallearner.warmuprounds
Type: int
Default Value: 0
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.em.PairedDualLearner
Description: Key for Integer property that indicates how many rounds of paired-dual learning to run before beginning to update the weights (parameter K in the ICML paper)

Key: pairedduallearner.admmsteps
Type: int
Default Value: 1
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.em.PairedDualLearner
Description: Key for Integer property that indicates how many steps of ADMM to run for each inner objective before each gradient step (parameter N in the ICML paper)

Key: hardem.adagrad
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.em.HardEM
Description: Key for Boolean property that indicates whether to use AdaGrad subgradient scaling, the adaptive subgradient algorithm of John Duchi, Elad Hazan, Yoram Singer (JMLR 2010). If TRUE, will override other step scheduling options (but not scaling).

Key: bernoullimeanfieldem.mpeinit
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.em.BernoulliMeanFieldEM
Description: Key for Boolean property. If true, the mean field will be reinitialized via MPE inference at each round. If false, each mean will be initialized to 0.5 before the first round.

Key: em.iterations
Type: int
Default Value: 10
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.em.ExpectationMaximization
Description: Key for positive int property for the number of iterations of expectation maximization to perform

Key: em.resetschedule
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.em.ExpectationMaximization
Description: Key for Boolean property that indicates whether to reset step-size schedule for each EM round. If TRUE, schedule will be VotedPerceptron#STEP_SIZE_KEY at start of each round. If FALSE, schedule will smoothly decrease across rounds, i.e., the schedule will be 1/ (round number * num steps + step number). This property has no effect if VotedPerceptron#STEP_SCHEDULE_KEY is false.
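
The two step-size schedules described above can be sketched numerically. This is a hedged illustration of the formula in the description; the 1-based indexing of rounds and steps is an assumption, not taken from the PSL source:

```java
// Numeric sketch of the two step-size schedules described above.
// The 1-based indexing of rounds and steps is an assumption.
public class StepScheduleSketch {
    // em.resetschedule = true: each EM round restarts the 1/t decay from the base step size.
    public static double resetStep(double baseStepSize, int step) {
        return baseStepSize / step;
    }

    // em.resetschedule = false: decay continues smoothly across rounds,
    // i.e. base / ((round - 1) * numSteps + step).
    public static double smoothStep(double baseStepSize, int round, int numSteps, int step) {
        return baseStepSize / ((round - 1) * numSteps + step);
    }
}
```

With the smooth schedule, the first step of round 2 continues where round 1 left off instead of jumping back up to the base step size.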

Key: em.storeweights
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.em.ExpectationMaximization
Description: Key for Boolean property that indicates whether to store weights along entire optimization path

Key: em.tolerance
Type: double
Default Value: 1e-3
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.em.ExpectationMaximization
Description: Key for positive double property for the minimum absolute change in weights such that EM is considered converged

Key: votedperceptron.augmentloss
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.VotedPerceptron
Description: Key for boolean property for whether to add loss-augmentation for online large margin

Key: votedperceptron.l2regularization
Type: double
Default Value: 0.0
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.VotedPerceptron
Description: Key for positive double property scaling the L2 regularization (\lambda / 2) * ||w||^2

Key: votedperceptron.l1regularization
Type: double
Default Value: 0.0
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.VotedPerceptron
Description: Key for positive double property scaling the L1 regularization \gamma * |w|

Key: votedperceptron.stepsize
Type: double
Default Value: 1.0
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.VotedPerceptron
Description: Key for positive double property which will be multiplied with the objective gradient to compute a step.

Key: votedperceptron.schedule
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.VotedPerceptron
Description: Key for Boolean property that indicates whether to shrink the stepsize by a 1/t schedule.

Key: votedperceptron.scalegradient
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.VotedPerceptron
Description: Key for Boolean property that indicates whether to scale gradient by number of groundings

Key: votedperceptron.averagesteps
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.VotedPerceptron
Description: Key for Boolean property that indicates whether to average all visited weights together for final output.

Key: votedperceptron.numsteps
Type: int
Default Value: 25
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.VotedPerceptron
Description: Key for positive integer property. VotedPerceptron will take this many steps to learn weights.

Key: votedperceptron.nonnegativeweights
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.VotedPerceptron
Description: Key for boolean property. If true, only non-negative weights will be learned.

Key: maxspeudolikelihood.bool
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.MaxPseudoLikelihood
Description: Boolean property. If true, MaxPseudoLikelihood will treat RandomVariableAtoms as boolean valued. Note that this restricts the types of constraints supported.

Key: maxspeudolikelihood.numsamples
Type: int
Default Value: 10
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.MaxPseudoLikelihood
Description: Key for positive integer property. MaxPseudoLikelihood will sample this many values to approximate the integrals in the marginal computation.

Key: maxspeudolikelihood.constrainttolerance
Type: double
Default Value: 1e-5
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.MaxPseudoLikelihood
Description: Key for constraint violation tolerance

Key: maxspeudolikelihood.minwidth
Type: double
Default Value: 1e-2
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.MaxPseudoLikelihood
Description: Key for positive double property. Used as minimum width for bounds of integration.

Key: maxmargin.tolerance
Type: double
Default Value: 1e-3
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.MaxMargin
Description: Key for double property, cutting plane tolerance

Key: maxmargin.slackpenalty
Type: double
Default Value: 1
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.MaxMargin
Description: Key for double property, slack penalty C, where objective is ||w|| + C (slack)

Key: maxmargin.maxiter
Type: int
Default Value: 500
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.MaxMargin
Description: Key for positive integer, maximum number of constraints to add to quadratic program

Key: maxmargin.nonnegativeweights
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.MaxMargin
Description: Key for boolean property. If true, only non-negative weights will be learned.

Key: maxmargin.scalenorm
Type: NormScalingType
Default Value: NormScalingType.NONE
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.MaxMargin
Description: Key for NormScalingType enum property. Determines type of norm scaling MaxMargin will use in its objective.

Key: maxmargin.squareslack
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.MaxMargin
Description: Key for SquareSlack boolean property. Determines whether to penalize slack linearly or quadratically.

Key: minnormprog.conicprogramsolver
Type: ConicProgramSolverFactory
Default Value: new HomogeneousIPMFactory()
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.MinNormProgram
Description: Key for ConicProgramSolverFactory or String property. Should be set to a ConicProgramSolverFactory (or the binary name of one). MinNormProgram will use this ConicProgramSolverFactory to instantiate a ConicProgramSolver.

Key: frankwolfe.tolerance
Type: double
Default Value: 1e-5
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.FrankWolfe
Description: Key for double property, cutting plane tolerance

Key: frankwolfe.maxiter
Type: int
Default Value: 500
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.FrankWolfe
Description: Key for positive integer, maximum iterations

Key: frankwolfe.averageweights
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.FrankWolfe
Description: Key for boolean property. If true, algorithm will output average weights when learning exceeds maximum number of iterations.

Key: frankwolfe.nonnegativeweights
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.FrankWolfe
Description: Key for boolean property. If true, only non-negative weights will be learned.

Key: frankwolfe.normalize
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.FrankWolfe
Description: Key for boolean property. If true, loss and gradient will be normalized by number of labels.

Key: frankwolfe.regparam
Type: double
Default Value: 1
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.FrankWolfe
Description: Key for double property, regularization parameter \lambda, where objective is \lambda*||w|| + (slack)

Key: l1maxmargin.balanceloss
Type: LossBalancingType
Default Value: LossBalancingType.NONE
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.L1MaxMargin
Description: Key for LossBalancingType enum property. Determines the type of loss balancing MaxMargin will use. @see LossBalancingType

Key: weightlearning.reasoner
Type: ReasonerFactory
Default Value: new ADMMReasonerFactory()
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.WeightLearningApplication
Description: Key for Factory or String property.

Should be set to a ReasonerFactory or the fully qualified name of one. Will be used to instantiate a Reasoner.

This reasoner will be used when constructing ground models for weight learning, unless this behavior is overridden by a subclass.

Key: lazympeinference.reasoner
Type: ReasonerFactory
Default Value: new ADMMReasonerFactory()
Module: psl-core
Defining Class: org.linqs.psl.application.inference.LazyMPEInference
Description: Key for Factory or String property.

Should be set to a ReasonerFactory or the fully qualified name of one. Will be used to instantiate a Reasoner.

Key: lazympeinference.maxrounds
Type: int
Default Value: 100
Module: psl-core
Defining Class: org.linqs.psl.application.inference.LazyMPEInference
Description: Key for int property for the maximum number of rounds of inference.

Key: mpeinference.reasoner
Type: ReasonerFactory
Default Value: new ADMMReasonerFactory()
Module: psl-core
Defining Class: org.linqs.psl.application.inference.MPEInference
Description: Key for Factory or String property.

Should be set to a ReasonerFactory or the fully qualified name of one. Will be used to instantiate a Reasoner.

Key: LTNmaxspeudolikelihood.bool
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetworkMaxPseudoLikelihood
Description: Boolean property. If true, MaxPseudoLikelihood will treat RandomVariableAtoms as boolean valued. Note that this restricts the types of constraints supported.

Key: LTNmaxspeudolikelihood.numsamples
Type: int
Default Value: 10
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetworkMaxPseudoLikelihood
Description: Key for positive integer property. MaxPseudoLikelihood will sample this many values to approximate the integrals in the marginal computation.

Key: LTNmaxspeudolikelihood.constrainttolerance
Type: double
Default Value: 1e-5
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetworkMaxPseudoLikelihood
Description: Key for constraint violation tolerance

Key: LTNmaxspeudolikelihood.minwidth
Type: double
Default Value: 1e-2
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetworkMaxPseudoLikelihood
Description: Key for positive double property. Used as minimum width for bounds of integration.

Key: CONFIG_PREFIX + ".lowerboundepsilon"
Type: double
Default Value: 1e-6
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.reasoner.admm.LatentTopicNetworkADMMReasoner
Description: Key for positive double property. Minimum value that theta and phi parameters are allowed to take, enforced by clipping the consensus variables to this.

Key: latentTopicNetworks.hingeLossTheta
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for Boolean property indicating whether to use a hinge-loss MRF to model theta.

Key: latentTopicNetworks.hingeLossPhi
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for Boolean property indicating whether to use a hinge-loss MRF to model phi.

Key: latentTopicNetworks.numIterations
Type: int
Default Value: 200
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for positive integer property indicating the number of EM iterations to perform.

Key: latentTopicNetworks.numBurnIn
Type: int
Default Value: 0
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for positive integer property indicating the number of vanilla LDA EM iterations to perform before using hinge losses in the M step.

Key: latentTopicNetworks.numTopics
Type: int
Default Value: 20
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for positive integer property indicating the number of topics.

Key: latentTopicNetworks.alpha
Type: double
Default Value: 1.01
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for positive double property, the Dirichlet prior hyperparameter alpha.

Key: latentTopicNetworks.beta
Type: double
Default Value: 1.01
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for positive double property, the Dirichlet prior hyperparameter beta.

Key: latentTopicNetworks.weightLearning
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for Boolean property indicating whether to perform pseudo-likelihood weight learning in the EM loop.

Key: latentTopicNetworks.firstWLearningIter
Type: int
Default Value: 50
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for positive integer property indicating the number of EM iterations to perform before performing weight learning.

Key: latentTopicNetworks.WLearningGap
Type: int
Default Value: 10
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for positive integer property indicating the number of EM iterations between weight learning steps.

Key: latentTopicNetworks.initMStepToLDAtheta
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for Boolean property indicating whether to initialize the ADMM variables to LDA, for theta. The alternative is to initialize at the previous iteration. LDA initialization may be best in high dimensions, while previous iteration initialization may be best with strong weights.

Key: latentTopicNetworks.initMStepToLDAphi
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for Boolean property indicating whether to initialize the ADMM variables to LDA, for phi. The alternative is to initialize at the previous iteration. LDA initialization may be best in high dimensions, while previous iteration initialization may be best with strong weights.

Key: latentTopicNetworks.saveDir
Type: String
Default Value: ""
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for string property indicating the directory to save intermediate topic models (if empty, do not save them).

Key: LTNmaxspeudolikelihood.bool
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetworkMaxPseudoLikelihood_Naive
Description: Boolean property. If true, MaxPseudoLikelihood will treat RandomVariableAtoms as boolean valued. Note that this restricts the types of constraints supported.

Key: LTNmaxspeudolikelihood.numsamples
Type: int
Default Value: 10
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetworkMaxPseudoLikelihood_Naive
Description: Key for positive integer property. MaxPseudoLikelihood will sample this many values to approximate the integrals in the marginal computation.

Key: LTNmaxspeudolikelihood.constrainttolerance
Type: double
Default Value: 1e-5
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetworkMaxPseudoLikelihood_Naive
Description: Key for constraint violation tolerance

Key: LTNmaxspeudolikelihood.minwidth
Type: double
Default Value: 1e-2
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetworkMaxPseudoLikelihood_Naive
Description: Key for positive double property. Used as minimum width for bounds of integration.

Key: uaiformatreasoner.task
Type: Task
Default Value: Task.MPE
Module: psl-core
Defining Class: org.linqs.psl.reasoner.bool.UAIFormatReasoner
Description: Key for Task enum property specifying the reasoner task to perform.

Key: uaiformatreasoner.seed
Type: int
Default Value: 0
Module: psl-core
Defining Class: org.linqs.psl.reasoner.bool.UAIFormatReasoner
Description: Key for integer property specifying the random seed for the reasoner.

Key: booleanmaxwalksat.maxflips
Type: int
Default Value: 50000
Module: psl-core
Defining Class: org.linqs.psl.reasoner.bool.BooleanMaxWalkSat
Description: Key for positive integer property that is the maximum number of flips to try during optimization

Key: booleanmaxwalksat.noise
Type: double
Default Value: (double) 1 / 100
Module: psl-core
Defining Class: org.linqs.psl.reasoner.bool.BooleanMaxWalkSat
Description: Key for double property in [0,1] that is the probability of randomly perturbing an atom in a randomly chosen potential

Key: booleanmcsat.numsamples
Type: int
Default Value: 2500
Module: psl-core
Defining Class: org.linqs.psl.reasoner.bool.BooleanMCSat
Description: Key for positive integer property, the length of the Markov chain (i.e., the number of samples).

Key: booleanmcsat.numburnin
Type: int
Default Value: 500
Module: psl-core
Defining Class: org.linqs.psl.reasoner.bool.BooleanMCSat
Description: Number of burn-in samples

Key: ad3reasoner.algorithm
Type: Algorithm
Default Value: Algorithm.AD3
Module: psl-core
Defining Class: org.linqs.psl.reasoner.bool.AD3Reasoner
Description: Key for Algorithm enum property which is inference algorithm to use.

Key: conicreasoner.conicprogramsolver
Type: ConicProgramSolverFactory
Default Value: new HomogeneousIPMFactory()
Module: psl-core
Defining Class: org.linqs.psl.reasoner.conic.ConicReasoner
Description: Key for org.linqs.psl.config.Factory or String property. Should be set to a org.linqs.psl.optimizer.conic.ConicProgramSolverFactory (or the binary name of one). The ConicReasoner will use this org.linqs.psl.optimizer.conic.ConicProgramSolverFactory to instantiate a org.linqs.psl.optimizer.conic.ConicProgramSolver, which will then be used for inference.

Key: admmreasoner.maxiterations
Type: int
Default Value: 25000
Module: psl-core
Defining Class: org.linqs.psl.reasoner.admm.ADMMReasoner
Description: Key for int property for the maximum number of iterations of ADMM to perform in a round of inference

Key: admmreasoner.stepsize
Type: double
Default Value: 1
Module: psl-core
Defining Class: org.linqs.psl.reasoner.admm.ADMMReasoner
Description: Key for non-negative double property. Controls step size. Higher values result in larger steps.

Key: admmreasoner.epsilonabs
Type: double
Default Value: 1e-5
Module: psl-core
Defining Class: org.linqs.psl.reasoner.admm.ADMMReasoner
Description: Key for positive double property. Absolute error component of stopping criteria.

Key: admmreasoner.epsilonrel
Type: double
Default Value: 1e-3
Module: psl-core
Defining Class: org.linqs.psl.reasoner.admm.ADMMReasoner
Description: Key for positive double property. Relative error component of stopping criteria.

Key: admmreasoner.stopcheck
Type: int
Default Value: 1
Module: psl-core
Defining Class: org.linqs.psl.reasoner.admm.ADMMReasoner
Description: Key for positive integer. The number of ADMM iterations after which the termination criteria will be checked.

Key: admmreasoner.numthreads
Type: int
Default Value: Runtime.getRuntime().availableProcessors()
Module: psl-core
Defining Class: org.linqs.psl.reasoner.admm.ADMMReasoner
Description: Key for positive integer. Number of threads to run the optimization in.

Key: executablereasoner.executable
Type: String
Default Value: (none)
Module: psl-core
Defining Class: org.linqs.psl.reasoner.ExecutableReasoner
Description: Key for String property which is path to reasoner executable. This is the rare PSL property that is mandatory to specify.

Key: atomeventframework.activation
Type: double
Default Value: 0.01
Module: psl-core
Defining Class: org.linqs.psl.model.atom.AtomEventFramework
Description: Key for double property in (0,1]. Activation events will be generated for RandomVariableAtoms when they meet or exceed this threshold.

Key: blocksolver.maxcgiter
Type: int
Default Value: 1000000
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.solver.BlockSolver
Description: Key for integer property. The BlockSolver will throw an exception if the conjugate gradient solver completes this many iterations without solving the normal system.

Key: blocksolver.cgreltol
Type: double
Default Value: 10e-10
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.solver.BlockSolver
Description: Key for double property. The conjugate gradient solver will terminate as converged if the residual is less than this value times the initial residual.

Key: blocksolver.cgabstol
Type: double
Default Value: 10e-50
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.solver.BlockSolver
Description: Key for double property. The conjugate gradient solver will terminate as converged if the residual is less than this value.

Key: blocksolver.cgdivtol
Type: double
Default Value: 10e5
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.solver.BlockSolver
Description: Key for double property. The BlockSolver will throw an exception if the conjugate gradient solver reaches an iterate whose residual is at least this value times the initial residual.

Key: blocksolver.preconditionerterms
Type: int
Default Value: 1
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.solver.BlockSolver
Description: Key for non-negative integer property. The BlockSolver preconditions the Schur's complement matrix by a truncated series summation. Higher values generally result in fewer conjugate gradient iterations, but each iteration is more time consuming.

Key: cgsolver.maxcgiter
Type: int
Default Value: 1000000
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.solver.ConjugateGradient
Description: Key for integer property. The ConjugateGradient solver will throw an exception if the conjugate gradient solver completes this many iterations without solving the normal system.

Key: cgsolver.cgreltol
Type: double
Default Value: 10e-10
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.solver.ConjugateGradient
Description: Key for double property. The ConjugateGradient solver will terminate as converged if the residual is less than this value times the initial residual.

Key: cgsolver.cgabstol
Type: double
Default Value: 10e-50
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.solver.ConjugateGradient
Description: Key for double property. The ConjugateGradient solver will terminate as converged if the residual is less than this value.

Key: cgsolver.cgdivtol
Type: double
Default Value: 10e5
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.solver.ConjugateGradient
Description: Key for double property. The ConjugateGradient solver will throw an exception if the conjugate gradient solver reaches an iterate whose residual is at least this value times the initial residual.

Key: cgsolver.preconditioner
Type: PreconditionerFactory
Default Value: new IdentityPreconditionerFactory()
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.solver.ConjugateGradient
Description: Key for Factory or String property. Should be set to a PreconditionerFactory or the fully qualified name of one. Will be used to instantiate a DoublePreconditioner.

Key: ppipm.threadpoolsize
Type: int
Default Value: 1
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.ParallelPartitionedIPM
Description:

Key: hipm.dualize
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.HomogeneousIPM
Description: Key for boolean property. If true, the IPM will dualize the conic program before solving it. The IPM will substitute the results back into the original problem, so this should only affect the computational cost of #solve(ConicProgram), not the quality of the solution. @see Dualizer

Key: hipm.infeasibilitythreshold
Type: double
Default Value: 10e-8
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.HomogeneousIPM
Description: Key for double property. The IPM will consider the problem primal, dual, or gap feasible if the primal, dual, or gap infeasibility is less than its value, respectively.

Key: hipm.gapthreshold
Type: double
Default Value: 10e-6
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.HomogeneousIPM
Description: Key for double property. The IPM will iterate until the duality gap is less than its value.

Key: hipm.tauthreshold
Type: double
Default Value: 10e-8
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.HomogeneousIPM
Description: Key for double property. The IPM will multiply its value by another value and consider tau small if tau is less than that product.

Key: hipm.muthreshold
Type: double
Default Value: 10e-8
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.HomogeneousIPM
Description: Key for double property. The IPM will consider mu small if mu is less than its value times the initial mu.

Key: hipm.beta
Type: double
Default Value: 10e-8
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.HomogeneousIPM
Description: Key for double property in (0,1). The IPM will stay in a neighborhood of the central path, the size of which is defined by its value. Larger values correspond to smaller neighborhoods.

Key: hipm.delta
Type: double
Default Value: 0.5
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.HomogeneousIPM
Description: Key for double property in [0,1]. The IPM will use its value to determine how aggressively to minimize the objective (versus to follow the central path). Lower values correspond to more aggressive strategies.

Key: hipm.normalsolver
Type: NormalSystemSolverFactory
Default Value: new CholeskyFactory()
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.HomogeneousIPM
Description: Key for Factory or String property. Should be set to a NormalSystemSolverFactory or the fully qualified name of one. Will be used to instantiate a NormalSystemSolver.

Key: ipm.initfeasible
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.IPM
Description: Key for boolean property. If true, the IPM will initialize the conic program to a feasible point before solving it. @see FeasiblePointInitializer

Key: ipm.dualize
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.IPM
Description: Key for boolean property. If true, the IPM will dualize the conic program before solving it. The IPM will substitute the results back into the original problem, so this should only affect the computational cost of #solve(ConicProgram), not the quality of the solution. @see Dualizer

Key: ipm.dualitygapthreshold
Type: double
Default Value: 0.0001
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.IPM
Description: Key for double property. The IPM will iterate until the duality gap is less than its value.

Key: ipm.infeasibilitythreshold
Type: double
Default Value: 10e-8
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.IPM
Description: Key for double property. The IPM will iterate until the primal and dual infeasibilites are each less than its value. @see ConicProgram#getPrimalInfeasibility() @see ConicProgram#getDualInfeasibility()

Key: cgipm.maxcgiter
Type: int
Default Value: 1000000
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.cg.ConjugateGradientIPM
Description: Key for integer property. The ConjugateGradientIPM will throw an exception if the conjugate gradient solver completes this many iterations without solving the normal system.

Key: cgipm.cgreltol
Type: double
Default Value: 1e-10
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.cg.ConjugateGradientIPM
Description: Key for double property. The conjugate gradient solver will terminate as converged if the residual is less than this value times the initial residual.

Key: cgipm.cgabstol
Type: double
Default Value: 1e-50
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.cg.ConjugateGradientIPM
Description: Key for double property. The conjugate gradient solver will terminate as converged if the residual is less than this value.

Key: cgipm.cgdivtol
Type: double
Default Value: 1e5
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.cg.ConjugateGradientIPM
Description: Key for double property. The ConjugateGradientIPM will throw an exception if the conjugate gradient solver reaches an iterate whose residual is at least this value times the initial residual.

Key: rdbmsdatastore.valuecolumn
Type: String
Default Value: truth
Module: psl-core
Defining Class: org.linqs.psl.database.rdbms.RDBMSDataStore
Description: Key for String property for the name of the value column in the database.

Key: rdbmsdatastore.confidencecolumn
Type: String
Default Value: confidence
Module: psl-core
Defining Class: org.linqs.psl.database.rdbms.RDBMSDataStore
Description: Key for String property for the name of the confidence column in the database.

Key: rdbmsdatastore.partitioncolumn
Type: String
Default Value: partition
Module: psl-core
Defining Class: org.linqs.psl.database.rdbms.RDBMSDataStore
Description: Key for String property for the name of the partition column in the database.

Key: rdbmsdatastore.usestringids
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.database.rdbms.RDBMSDataStore
Description: Key for boolean property of whether to use RDBMSUniqueStringID as a UniqueID.


Configuration

See the Configuration Options page for all options that PSL uses.

Many components of the PSL software have modifiable parameters and options, called properties. Every property has a key, which is a string that should uniquely identify it. These keys are organized into a namespace hierarchy, with each level separated by dots, e.g. <namespace>.<option>.

Each PSL class can specify a namespace for the options used by the class and its subclasses. For example, the org.linqs.psl.application.learning.weight.maxlikelihood.VotedPerceptron weight learning class specifies the namespace votedperceptron. Setting the configuration option votedperceptron.stepsize allows you to control the size of the gradient descent update step in the VotedPerceptron weight learning class.

Every property has a type and a default value, which is the value the object will use unless a user overrides it. Every class with properties documents them by declaring their keys as public static final Strings, with Javadoc comments describing the corresponding property's type and semantics. Another public static final member declares the default value for that property.
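Following this convention, a class's options might be declared as in the sketch below. This is a hypothetical class; the key and default value shown are illustrative, not PSL's actual values.

```java
// Hypothetical sketch of the documented pattern: keys declared as
// public static final Strings, with a companion member for the default.
public class VotedPerceptronOptions {
    /** Namespace prefix for all properties of this class. */
    public static final String CONFIG_PREFIX = "votedperceptron";

    /** Key for positive double property: size of the gradient descent update step. */
    public static final String STEP_SIZE_KEY = CONFIG_PREFIX + ".stepsize";

    /** Default value, used unless a user overrides it. (Illustrative value.) */
    public static final double STEP_SIZE_DEFAULT = 1.0;
}
```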

Bundles

Users of the PSL software can specify property values by grouping them into bundles, which are objects that implement the org.linqs.psl.config.ConfigBundle interface. Every bundle has a name and a map from property keys to values. A configurable component takes a ConfigBundle as an argument in its constructor and queries it with a property key and a default value. If the bundle does not map the key to a value, it returns the provided default, e.g.

ConfigBundle cb;
stepsize = cb.getProperty("votedperceptron.stepsize", 100);

PSL components also pass their bundles to components that they create, so a user can group their property values into a single bundle, pass it into a component with which they interact, and the values will be used by the entire stack of components. Any properties that don't belong to a particular component will be ignored by that component.
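The query-with-default behavior can be illustrated with a minimal stand-in. This is a sketch of the lookup semantics only, not PSL's actual ConfigBundle implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in illustrating query-with-default semantics.
// Not PSL's actual ConfigBundle; the class and method names mirror the text above.
public class SimpleBundle {
    private final Map<String, Object> values = new HashMap<>();

    public void setProperty(String key, Object value) {
        values.put(key, value);
    }

    // Returns the stored value for key, or defaultValue if the key is unmapped.
    public Object getProperty(String key, Object defaultValue) {
        return values.getOrDefault(key, defaultValue);
    }
}
```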

The psl.properties file

PSL projects can specify different configuration bundles in a file named psl.properties on the classpath. The standard location for this file is <project root>/src/main/resources/psl.properties. Each key-value pair should be specified on its own line with a <bundle>.<namespace>.<option> = <value> format. The following example sets options for the example and test bundles.

# This is an example properties file for PSL.
# 
# Options are specified in a namespace hierarchy, with levels separated by '.'.
# The top levels are called bundles. Use the ConfigManager class to access them.

# Weight learning parameters
# Parameters for voted perceptron algorithm
# This property adaptively changes the step size of the updates
example.votedperceptron.schedule = true

# This property specifies the number of iterations of voted perceptron updates
example.votedperceptron.numsteps = 700

# This property specifies the initial step size of the voted perceptron updates
example.votedperceptron.stepsize = 0.1

# Parameters for the Hard-EM weight learning algorithm
# This property specifies the number of Hard-EM updates
test.em.iterations = 1000

# This property specifies the tolerance to check for convergence for Hard-EM
test.em.tolerance = 1e-5

# This property specifies the number of iterations of voted perceptron updates
test.votedperceptron.numsteps = 1000
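The key layout above can be illustrated with a small hypothetical parser that groups entries by their bundle prefix. PSL's actual loading is handled by ConfigManager; this sketch only demonstrates the <bundle>.<namespace>.<option> structure.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical parser for the <bundle>.<namespace>.<option> = <value> layout.
// Assumes well-formed lines; '#' lines and blank lines are skipped.
public class PropertiesSketch {
    // Maps bundle name -> (remaining key -> value).
    public static Map<String, Map<String, String>> parse(String text) {
        Map<String, Map<String, String>> bundles = new HashMap<>();
        for (String line : text.split("\n")) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] pair = line.split("=", 2);
            String fullKey = pair[0].trim();
            int dot = fullKey.indexOf('.');
            String bundle = fullKey.substring(0, dot);      // e.g. "example"
            String key = fullKey.substring(dot + 1);        // e.g. "votedperceptron.stepsize"
            bundles.computeIfAbsent(bundle, b -> new HashMap<>()).put(key, pair[1].trim());
        }
        return bundles;
    }
}
```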

The ConfigManager object

The standard way to create bundles is with an instance of the org.linqs.psl.config.ConfigManager class. ConfigManager uses the Singleton pattern. The ConfigManager instance will read psl.properties to generate bundles. Then a bundle can be instantiated with the code

ConfigBundle bundle = ConfigManager.getManager().getBundle("example");

Constraints

Arithmetic rules can be used to enforce modeling constraints. Many different types of constraints can be modeled; here are a few of the common ones:

Let Foo be the binary predicate that we wish to put constraints on. (Constraints are not limited to only binary predicates.)

Functional

A Functional constraint enforces the condition that for each possible constant c, the values of all groundings of Foo(A, c) sum to exactly 1.

Foo(A, +c) = 1 .

Note that the rule is unweighted (as indicated by the period at the end).

Inverse Functional

Summing over the first argument instead of the second is often called Inverse Functional.

Foo(+c, A) = 1 .

Partial Functional

A Partial Functional constraint is like a Functional one, except the values of all groundings of Foo(A, c) sum to 1 or less.

Foo(A, +c) <= 1 .

Partial Inverse Functional

Foo(+c, A) <= 1 .
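The difference between the constraint types above can be sketched numerically. The arrays below hold hypothetical truth values for all groundings Foo(A, c) with A fixed (for the inverse variants, the roles of the arguments are simply swapped):

```java
// Illustrative check of the (partial) functional constraints over a small set of
// hypothetical ground-atom truth values. Not PSL's constraint machinery.
public class ConstraintSketch {
    // Functional: the values of all groundings must sum to exactly 1.
    public static boolean satisfiesFunctional(double[] values) {
        return Math.abs(sum(values) - 1.0) < 1e-9;
    }

    // Partial functional: the values must sum to 1 or less.
    public static boolean satisfiesPartialFunctional(double[] values) {
        return sum(values) <= 1.0 + 1e-9;
    }

    private static double sum(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }
}
```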

Core Topics

This page serves as a starting place for getting familiar with PSL. Entirely new users are recommended to start with the Environment Setup and Configuration sections.


Creating a New Project

Before you set up a new project, ensure that the prerequisites are met.

The easiest way to get a new project started in PSL is to copy an existing project. The examples are kept up-to-date and exhibit the preferred style for PSL programs. It is recommended to start there and change the program as you go.


Database Creation

To read in the truth values of ground atoms from text files, a DataStore object is required.

DataStore data = new RDBMSDataStore(new H2DatabaseDriver(Type.Disk, "database/path", true), configBundle);

In the code snippet above, the RDBMSDataStore constructor takes two arguments:

  • H2DatabaseDriver specifying the database to connect to.
  • ConfigBundle specifying additional configuration.

The H2DatabaseDriver constructor accepts three arguments:

  • Type.Disk for a database stored on disk, or Type.Memory for a database held in RAM.
  • The path where the database is stored (if Type.Disk is used), or a unique identifier for the database (if Type.Memory).
  • The final boolean indicates whether to clear the contents from any existing database found at the same path (true for clear and false to leave the database intact).

After a DataStore is created, we can read in the truth values of ground atoms from text files as follows:

Inserter insert = data.getInserter(<predicateName>, <partition>);
InserterUtils.loadDelimitedData(insert, <filePath>);

  • <predicateName> is the name of the predicate whose ground atoms are to be read.
  • <partition> is the partition to write the data into.
  • <filePath> is the path to the file containing the ground atoms.

All data read in with InserterUtils.loadDelimitedData is assumed to have a truth value of 1. If you also need to read in specific truth values, use InserterUtils.loadDelimitedDataTruth.
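The difference between the two loaders can be sketched as follows, assuming tab-delimited lines with an optional trailing truth-value column. This is a hypothetical illustration, not PSL's implementation.

```java
// Illustrative parse of one delimited data line. Mirrors the difference between
// loadDelimitedData (truth assumed to be 1) and loadDelimitedDataTruth
// (truth value read from the last column). Assumes tab-delimited input.
public class DelimitedLineSketch {
    public static double truthValue(String line, boolean hasTruthColumn) {
        if (!hasTruthColumn) {
            return 1.0; // loadDelimitedData assumes every atom is fully true.
        }
        String[] fields = line.trim().split("\t");
        return Double.parseDouble(fields[fields.length - 1]);
    }
}
```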


Developing PSL

Setting up your environment

Cloning the PSL repository

If you are already comfortable using Git and you don't want or need to push commits to GitHub, then you can just clone the PSL repository using the command below. Otherwise, this short primer on some Git essentials may be useful.

>> git clone https://github.com/linqs/psl.git

Building PSL from source

Change to the top-level directory of your working copy and run

>> mvn compile

You can install PSL to your local Maven repository by running

>> mvn install

Best practices

Git policies

If you're a member of the LINQS group, you may eventually need to release a new version of PSL. There are a number of steps involved in the process, which are detailed in the guide for Releasing a New Stable Version.


Eclipse integration

Eclipse is an extensible, integrated development environment that can be used to develop PSL and PSL projects. The recommended way of using Eclipse with PSL is to use the Eclipse plugin for Maven to generate Eclipse project information for a PSL project and then import that project into Eclipse.

Prerequisites

Ensure that you have version 3.6 (Helios) or higher of Eclipse installed. Then, install the Groovy Eclipse plugin and the optional 1.8 version of the Groovy compiler, which is available when installing the plugin. The version 1.8 compiler is what Maven will use to compile the Groovy scripts, so builds done by either tool should be interchangeable. If you use an older version, Eclipse will probably recompile some files which then won't be compatible with the rest, and it won't run. (Cleaning and rebuilding everything should help.)

You might have to change the Groovy compiler version to 1.8.x in your Groovy compiler preferences (part of the Eclipse preferences).

You need to add a classpath variable in Eclipse to point to your local Maven repository. You can access the variables either from the main options or from the build-path editor for any project. Where you specify additional libs, make a new variable (there should be a button) with the name M2_REPO and the path to your repo (e.g., ~/.m2/repository). This can also be achieved automatically via the following Maven command:

mvn -Declipse.workspace=/path/to/workspace eclipse:configure-workspace

Generating and importing Eclipse metadata

In the top-level directory of your PSL project, run

>> mvn eclipse:eclipse

Then in Eclipse, go to File/Import/General. Select the top-level directory of your project. You probably don't want to copy it into the workspace, so uncheck that option.

Running programs

Be sure to run as a "Java application."

Tips

  • If you want to delete the Eclipse metadata for any reason, run
>> mvn eclipse:clean
  • If you want to generate metadata for a project that depends on another project you're developing with Eclipse (PSL or not), run
>> mvn eclipse:eclipse -Declipse.workspace=<path to Eclipse workspace>

The Eclipse plugin for Maven will look in the provided workspace for any projects that match dependencies declared in your project's POM file. Your project will be configured to depend on any such projects found as opposed to their respective installed jars. This way, changes to the sources of those dependencies will be seen by your project without reinstalling the dependencies. Note that this works even for dependencies that were imported but not copied into the workspace.

  • The m2eclipse Eclipse plugin is another option for developing PSL projects with Eclipse. It differs from the recommended method in that it is an Eclipse plugin designed to support Maven projects, as opposed to a Maven plugin designed to support Eclipse.

Environment Setup

Before working with PSL, you must make sure that your working environment is properly setup.


Example Walkthrough

This page will walk you through the Groovy version of the Easy Link Prediction example.

Setup

First, ensure that your system meets the prerequisites. Then clone the psl-examples repository:

git clone https://bitbucket.org/linqs/psl-examples.git

Running

Then move into the root directory for the easy link prediction example:

cd psl-examples/link_prediction/easy/groovy

Each example comes with a run.sh script to quickly compile and run the example. To compile and run the example:

./run.sh

To see the output of the example, check the output/default/knows_infer.txt file:

cat output/default/knows_infer.txt

You should see some output like:

--- Atoms: 
KNOWS(Steve, Ben) Truth=[0.64]
KNOWS(Alex, Dhanya) Truth=[0.36]
< ... 48 rows omitted for brevity ...>
KNOWS(Dhanya, Ben) Truth=[0.55]
KNOWS(Alex, Sabina) Truth=[0.44]
# Atoms: 52

The exact order of the output may change and some rows were left out for brevity.

Now that we have the example running, let's take a look inside the only source file for the example: src/main/java/edu/ucsc/linqs/psl/example/easylp/EasyLP.groovy.

Configuration

One of the first things you may notice in this file is a private class that holds configuration data. In addition to ConfigBundles, it is sometimes also useful to create configuration classes that let you quickly change settings and run different experiments. This is not required, but you may find it useful.

In the populateConfigBundle() method you can see the ConfigBundle actually getting created:

ConfigBundle cb = ConfigManager.getManager().getBundle("easylp");

Defining Predicates

The definePredicates() method defines the three predicates for our example:

model.add predicate: "Lived", types: [ConstantType.UniqueID, ConstantType.UniqueID];
model.add predicate: "Likes", types: [ConstantType.UniqueID, ConstantType.UniqueID];
model.add predicate: "Knows", types: [ConstantType.UniqueID, ConstantType.UniqueID];

Each predicate here takes two unique identifiers as arguments.

  • Lived indicates that a person has lived in a specific location. For example: Lived(Sammy, SantaCruz) would indicate that Sammy has lived in Santa Cruz.
  • Likes indicates the extent to which a person likes something. For example: Likes(Sammy, Hiking) would indicate the extent that Sammy likes hiking.
  • Knows indicates that a person knows some other person. For example: Knows(Sammy, Jay) would indicate that Sammy and Jay know each other.

Defining Rules

The defineRules() method defines seven rules for the example. There are pages that cover the PSL rule specification and the rule specification in Groovy. We will discuss the following two rules:

model.add(
   rule: ( Lived(P1,L) & Lived(P2,L) & (P1-P2) ) >> Knows(P1,P2),
   squared: config.sqPotentials,
   weight : config.weightMap["Lived"]
);

model.add(
   rule: ~Knows(P1,P2),
   squared:config.sqPotentials,
   weight: config.weightMap["Prior"]
);

The first rule can be read as "If P1 and P2 are different people and have both lived in the same location, L, then they know each other". Some key points to note from this rule are:

  • The variable L was reused in both Lived atoms and therefore must refer to the same location.
  • (P1 - P2) is shorthand for P1 and P2 referring to different people (different unique ids).

The second rule is a special rule that acts as a prior. Notice how this rule is not an implication like all the other rules. Instead, this rule can be read as "By default, people do not know each other". Therefore, the program will start with the belief that no one knows each other and this prior belief will be overcome with evidence.

Loading Data

The loadData() method loads the data from the flat files in the data directory into the data store that PSL is working with. For brevity, we will only be looking at two files:

Inserter inserter = ds.getInserter(Lived, obsPartition);
InserterUtils.loadDelimitedData(inserter, Paths.get(config.dataPath, "lived_obs.txt").toString());

inserter = ds.getInserter(Likes, obsPartition);
InserterUtils.loadDelimitedDataTruth(inserter, Paths.get(config.dataPath, "likes_obs.txt").toString());

Both portions load data using the InserterUtils. The primary difference between the two calls is that the second one is looking for a truth value while the first one assumes that 1 is the truth value.

If we look in the files, we see lines like:

data/lived_obs.txt

Jay    Maryland
Jay    California

data/likes_obs.txt

Jay    Machine Learning  1
Jay    Skeeball 0.8

In lived_obs.txt, there is no need to use a truth value because living somewhere is a discrete act. You have either lived there or you have not. Liking something, however, is more continuous. Jay may like Machine Learning 100%, but he only likes Skeeball 80%.

Partitions

Here we must take a moment to talk about data partitions. In PSL, we use partitions to organize data. A partition is nothing more than a container for data, but we use them to keep specific chunks of data together or separate. For example, if we are running evaluation, we must be sure not to use our test partition in training.

PSL users typically organize their data in at least three different partitions (all of which you can see in this example):

  • observations (called obsPartition in this example): In this partition we put actual observed data. In this example, we put all the observations about who has lived where, who likes what, and who knows who in the observations partition.
  • targets (called targetsPartition in this example): In this partition we put atoms that we want to infer values for. For example, if we want to infer whether Jay and Sammy know each other, then we would put the atom Knows(Jay, Sammy) into the targets partition.
  • truth (called truthPartition in this example): In this partition we put our test set, data that we have actual values for but are not including in our observations for the purpose of evaluation. For example, if we know that Jay and Sammy actually do know each other, we would put Knows(Jay, Sammy) in the truth partition with a truth value of 1.

Running Inference

The runInference() method handles running inference for all the data we have loaded.

Before we run inference, we have to set up a database to use for inference:

Set<StandardPredicate> closed = new HashSet<StandardPredicate>([Lived, Likes]);
Database inferDB = ds.getDatabase(targetsPartition, closed, obsPartition);

The getDatabase() method of DataStore is the proper way to get a database. This method takes a minimum of two parameters:

  • The partition that this database is allowed to write to. In inference, we will be writing the inferred truth values of atoms to the target partition, so we need to have it open for writing.
  • A set of predicates to be closed in this database. Even though we are writing values into the write partition, we may only have a few predicates that we actually want to infer values for. This parameter allows you to close the predicates that you do not want changed.

Lastly, getDatabase() takes any number of read-only partitions that you want to include in this database. In our example, we want to include our observations when we run inference.

Now we are ready to run inference:

MPEInference mpe = new MPEInference(model, inferDB, config.cb);
mpe.mpeInference();
mpe.close();
inferDB.close();

To the MPEInference constructor, we supply our model, the database to infer over, and our ConfigBundle. To see the results, we will need to look inside the target partition.

Output

The method writeOutput() handles printing out the results of the inference. There are two key lines in this method:

Database resultsDB = ds.getDatabase(targetsPartition);
...
Set atomSet = Queries.getAllAtoms(resultsDB, Knows);

The first line gets a fresh database that we can get the atoms from. Notice that we are passing in targetsPartition as a write partition, but we are actually just reading from it.

The second line uses the Queries class to get all the Knows atoms from the database we just created.

Evaluation

Lastly, the evalResults() method handles seeing how well our model did. The DiscretePredictionComparator and ContinuousPredictionComparator classes provide basic tools to compare two partitions. In this example, we are comparing our target partition to our truth partition.


Examples

Example PSL programs are available at https://bitbucket.org/linqs/psl-examples.

Each example contains a script called run.sh which will handle all the building and running.

A detailed walkthrough of an example can be found here.


External Functions

Customized functions can be created by implementing the ExternalFunction interface. The getValue() method should return a value in [0, 1].

public class MyStringSimilarity implements ExternalFunction {
   @Override
   public int getArity() {
      return 2;
   }

   @Override
   public ConstantType[] getArgumentTypes() {
      return new ConstantType[]{ConstantType.String, ConstantType.String};
   }

   @Override
   public double getValue(ReadOnlyDatabase db, Constant... args) {
      return args[0].toString().equals(args[1].toString()) ? 1.0 : 0.0;
   }
}

A function comparing the similarity between two entities or text can then be declared as follows:

model.add function: MyStringSimilarity, implementation: new MyStringSimilarity();

A function can be used in the same manner as a predicate in rules:

Name(P1, N1) & Name(P2, N2) & MyStringSimilarity(N1, N2) -> SamePerson(P1, P2)
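MyStringSimilarity above returns only 0 or 1, but since getValue() may return any value in [0, 1], a graded similarity is also possible. One hypothetical choice is Jaccard similarity over character bigrams; the logic is shown standalone here, and the body would go inside an ExternalFunction's getValue().

```java
import java.util.HashSet;
import java.util.Set;

// A graded alternative to exact matching: Jaccard similarity over character
// bigrams, always in [0, 1]. Hypothetical example; not part of PSL itself.
public class BigramJaccard {
    public static double similarity(String a, String b) {
        Set<String> bigramsA = bigrams(a);
        Set<String> bigramsB = bigrams(b);
        if (bigramsA.isEmpty() && bigramsB.isEmpty()) {
            return 1.0; // Both strings too short to have bigrams: treat as identical.
        }
        Set<String> intersection = new HashSet<>(bigramsA);
        intersection.retainAll(bigramsB);
        Set<String> union = new HashSet<>(bigramsA);
        union.addAll(bigramsB);
        return (double) intersection.size() / union.size();
    }

    private static Set<String> bigrams(String s) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            result.add(s.substring(i, i + 2));
        }
        return result;
    }
}
```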

Glossary

The PSL software uses concepts from the PSL paper and introduces new ones for advanced data management and machine learning. On this page, we define the commonly used terms and point out the corresponding classes in the codebase.

Please note that this page is organized conceptually, not alphabetically.

Preliminaries

Hinge-loss Markov random field: A factor graph defined over continuous variables in the [0,1] interval with (log) factors that are hinge-loss functions. Many classes in PSL work together to implement the functionality of HL-MRFs, but the class for storing collections of hinge-loss potentials, which define HL-MRFs, is GroundRuleStore.java.
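As a rough numeric illustration of the hinge-loss form, each potential is max(0, l(x))^p for a linear function l and p in {1, 2}. For a grounding of a rule A -> B, l is commonly truth(A) - truth(B), the degree to which the implication is violated; the exact linear term depends on the rule.

```java
// Numeric sketch of a hinge-loss potential: phi(x) = max(0, linearValue)^p,
// with p = 2 when the rule is squared and p = 1 otherwise.
public class HingeLoss {
    public static double potential(double linearValue, boolean squared) {
        double hinge = Math.max(0.0, linearValue);
        return squared ? hinge * hinge : hinge;
    }
}
```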

Ground atom: A logical relationship corresponding to a random variable in a HL-MRF. For example, Friends("Steve", "Jay") is an alias for a specific random variable. Implemented in GroundAtom.java.

Random variable atom: A ground atom that is unobserved, i.e., no value is known for it. A HL-MRF assigns probability densities to assignments to random variable atoms. Implemented in RandomVariableAtom.java.

Observed atom: A ground atom that has an observed, immutable value. HL-MRFs are conditioned on observed atoms. Implemented in ObservedAtom.java.

Atom: A generalization of ground atoms that allows logical variables as placeholders for constant arguments. For example, Friends("Steve", A) is a placeholder for all the ground atoms that can be obtained by substituting constants for the logical variable A. Implemented in Atom.java.

Syntax

PSL Program: A set of rules, each of which is a template for hinge-loss potentials or hard linear constraints. When grounded over a base of ground atoms, a PSL program induces a HL-MRF conditioned on any specified observations. Implemented in Model.java.

Rule: See Rule Specification.

Data Management

Data Store: An entire data repository, such as a relational database management system (RDBMS). Implemented in DataStore.java.

Partition: A logical division of ground atoms in a data store. Implemented in Partition.java.

Database: A logical view of a data store, constructed by specifying a write partition and one or more read partitions of a data store. Implemented in Database.java.

Open Predicate: A predicate whose atoms can be random variable atoms, i.e., unobserved. The only time a ground atom will be loaded as a random variable atom is when it is stored in the database's write partition and its predicate is not specified as closed. Otherwise it will be loaded as an observed atom. Whether a predicate is open or closed is specific to each database.

Closed Predicate: A predicate whose atoms are always observed atoms. The only time a ground atom will be loaded as a random variable atom is when it is stored in the database's write partition and its predicate is not specified as closed. Otherwise it will be loaded as an observed atom. Whether a predicate is open or closed is specific to each database.


H2 Web Interface

If you use H2 as the backend database for PSL (as is done in the examples), it can be helpful to open up the resulting database and examine it for debugging purposes.

Prerequisites

You should set up your PSL program to use H2 on disk and note where it is stored. For example, if you create your DataStore using the following code

DataStore data = new RDBMSDataStore(new H2DatabaseDriver(Type.Disk, "/home/steve/psl", true), config);

then PSL will create an H2 database in the file /home/steve/psl/psl.mv.db. Run your program so that the resulting H2 database can be inspected.

Starting the H2 Web Server

You will need the H2 jar on your classpath. This is likely ~/.m2/repository/com/h2database/h2/1.4.192/h2-1.4.192.jar, but you will need to modify the path if, for example, you're using a different version of H2. Start the H2 web server by running the following command:

java -cp ~/.m2/repository/com/h2database/h2/1.4.192/h2-1.4.192.jar org.h2.tools.Server

Using the H2 Web Server

Once you have started the web server, you can access it at http://localhost:8082. To log in, you should change the connection string to point to your H2 database file without .mv.db on the end. The username and password are both empty strings.
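For example, for the database created at /home/steve/psl/psl.mv.db above, the connection string would look like the following (the path is hypothetical, matching the earlier example):

```
jdbc:h2:/home/steve/psl/psl
```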


Home

Welcome to the PSL software Wiki!

Getting Started with Probabilistic Soft Logic

To get started with PSL you can follow one of these guides:

  • Command Line Interface for New Users : If you are new to PSL, we suggest that you start with our Command Line Interface (CLI), which allows you to write a complete model in a simple text file.

  • Groovy for Intermediate Users : If you are comfortable with Java/Groovy and want to get your hands dirty with advanced modeling capabilities, we recommend that you use our Groovy interface.

  • Java for Application Developers : If you plan on integrating PSL into your own applications, and will need direct access to the Java API, refer to this guide.

PSL requires Java, so before you start make sure that you have Java installed.

Before you get started you may want to learn more about PSL.

Learn More About PSL

PSL is a machine learning framework for building probabilistic models, developed by the LINQS statistical relational learning group at the University of Maryland and the University of California, Santa Cruz. PSL has produced state-of-the-art results in many areas spanning natural language processing, social-network analysis, and computer vision. The complete list of publications and projects is available on the PSL homepage. The homepage also has several videos to introduce you to PSL.

Resources

Examples

API Reference

Migration Guide

We are improving PSL all the time, and now have two versions! If you are migrating from PSL 1.0 to 2.0 please refer to our Migration Guide.

Glossary

Developing PSL Guide


Introduction to Probabilistic Soft Logic

Probabilistic soft logic (PSL) is a general purpose language for modeling probabilistic and relational domains. It is applicable to a variety of machine learning problems, such as link prediction and ontology alignment. PSL combines the strengths of two powerful theories -- first-order logic, with its ability to succinctly represent complex phenomena, and probabilistic graphical models, which capture the uncertainty and incompleteness inherent in real-world knowledge. More specifically, PSL uses "soft" logic as its logical component and Markov networks as its statistical model.

In "soft" logic, logical constructs need not be strictly false (0) or true (1), but can take on values between 0 and 1 inclusive. For example, in the logical formula similarNames(X, Y) => sameEntity(X, Y) (which encodes the belief that if two people X and Y have similar names, then they are likely the same person), the truth value of similarNames(X, Y) and that of the entire formula lie in the range [0, 1]. The logical operators and (^), or (v), and not (~) are defined using the Lukasiewicz t-norms, i.e.,

A ^ B = max{A + B - 1, 0}
A v B = min{A + B, 1}
~A = 1 - A

(Note that if the values of A and B are restricted to be false or true, then the logical operators work as they are conventionally defined.) PSL provides an interface in the Groovy programming language for users to declaratively encode their knowledge of a particular domain in soft logic.
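These operators are easy to experiment with outside of PSL. Below is a minimal Python sketch (illustrative only; PSL itself is implemented in Java, and these helper names are not part of its API):

```python
def l_and(a, b):
    # Lukasiewicz t-norm: A ^ B = max{A + B - 1, 0}
    return max(a + b - 1.0, 0.0)

def l_or(a, b):
    # Lukasiewicz t-conorm: A v B = min{A + B, 1}
    return min(a + b, 1.0)

def l_not(a):
    # Lukasiewicz negation: ~A = 1 - A
    return 1.0 - a

# With values restricted to 0 and 1, the operators behave classically.
assert l_and(1, 1) == 1 and l_and(1, 0) == 0
assert l_or(0, 0) == 0 and l_or(1, 0) == 1
assert l_not(1) == 0

# With soft truth values, they interpolate between the extremes.
print(l_and(0.9, 0.8))  # approximately 0.7
print(l_or(0.6, 0.5))   # saturates at 1.0
```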

These logical formulas become the features of a Markov network. Each feature in the network is associated with a weight, which determines its importance in the interactions between features. Weights can be specified manually or learned from evidence data using PSL's suite of learning algorithms. PSL also provides sophisticated inference techniques for finding the most likely answer (i.e., the MAP state) to a user's query. The "softening" of the logical formulas allows us to cast the inference problem as a polynomial-time optimization, rather than a much more difficult (NP-hard) combinatorial one. (See LP relaxation for more details.)

For more details on PSL, please refer to the paper Hinge-Loss Markov Random Fields and Probabilistic Soft Logic.


Logging

PSL uses SLF4J for logging. In the PSL Groovy program template, SLF4J is bound to Log4j 1.2. The Log4j configuration file is located at src/main/resources/log4j.properties. It should look something like this:

# Set root logger level to the designated level and its only appender to A1.
log4j.rootLogger=ERROR, A1

# A1 is set to be a ConsoleAppender.
log4j.appender.A1=org.apache.log4j.ConsoleAppender

# A1 uses PatternLayout.
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

The logging verbosity can be set by changing ERROR in the second line to a different level and recompiling. Options include OFF, WARN, DEBUG, and TRACE.
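For example, to see debug-level output, the root logger line would change to the following (remember to recompile afterwards for the change to take effect):

```
log4j.rootLogger=DEBUG, A1
```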


MOSEK add-on

MOSEK is software for numeric optimization. PSL can use MOSEK as a conic program solver via a PSL add-on. MOSEK support is provided as part of the PSL Experimental package.

Setting up the MOSEK add-on

First, install MOSEK 6. In addition to a commercial version for which a 30-day trial is currently available, the makers of MOSEK also currently offer a free academic license. Users will need the "PTS" base system for using the linear distribution of the ConicReasoner and the "PTON" non-linear and conic extension to use the quadratic distribution. Both of these components are currently covered by the academic license.

After installing MOSEK, install the included mosek.jar file to your local Maven repository. (This file should be in <mosek-root>/6/tools/platform/<your-platform>/bin.)

mvn install:install-file -Dfile=<path-to-mosek.jar> -DgroupId=com.mosek \
    -DartifactId=mosek -Dversion=6.0 -Dpackaging=jar

Next, add the following dependency to your project's pom.xml file:

<dependencies>
    ...
    <dependency>
        <groupId>org.linqs</groupId>
        <artifactId>psl-mosek</artifactId>
        <version>YOUR-PSL-VERSION</version>
    </dependency>
    ...
</dependencies>

where YOUR-PSL-VERSION is replaced with your PSL version.

Finally, it might be necessary to rebuild your project.

Using the MOSEK add-on

After installing the MOSEK add-on, you can use it wherever a ConicProgramSolver is used. To use it for inference with a ConicReasoner, set the conicreasoner.conicprogramsolver configuration property to org.linqs.psl.optimizer.conic.mosek.MOSEKFactory.

Further, MOSEK requires that two environment variables be set when running. The same bin directory where you found mosek.jar needs to be on the path for shared libraries. The environment variable MOSEKLM_LICENSE_FILE needs to be set to the path to your license file (usually <mosek-root>/6/licenses/mosek.lic).

In bash on Linux, this can be done with the commands

export LD_LIBRARY_PATH=<path_to_mosek_installation>/mosek/6/tools/platform/<platform>/bin
export MOSEKLM_LICENSE_FILE=<path_to_mosek_installation>/mosek/6/licenses/mosek.lic

On Mac OS X, instead set DYLD_LIBRARY_PATH to the directory containing the MOSEK binaries.


Migrating to PSL 2

Maven Repository Move

Our Maven repository has moved

  • from: http://maven.linqs.org/maven/repositories/psl-releases/
  • to: https://scm.umiacs.umd.edu/maven/lccd/content/repositories/psl-releases/

The new endpoint will redirect to an https endpoint that may be used directly if necessary: https://linqs-data.soe.ucsc.edu/maven/repositories/psl-releases/

Naming Changes

Package Renames

All packages have been renamed from edu.umd.cs.* to org.linqs.*.

Renames/Moves

  • edu.umd.cs.psl.model.argument.ArgumentType → org.linqs.psl.model.term.ConstantType

Usage Changes

Predicate Arguments

ArgumentType.* → ConstantType.*
The arguments for a predicate are now defined in org.linqs.psl.model.term.ConstantType instead of edu.umd.cs.psl.model.argument.ArgumentType. All the same types are supported, just the containing class has been moved and renamed.

Getting a Partition

new Partition(int) → DataStore.getPartition("stringIdentifier")
If the partition does not exist, it will be created and returned; if it already exists, it will be returned. It is no longer necessary to pass around partitions if you don't want to.

Changes to Rule Syntax

Arithmetic rules are now supported in 2.0. See the Rule Specification for details. Rules in Groovy can now be specified in additional ways. See Rule Specification in Groovy .

Constraints

Constraints are now implemented using unweighted arithmetic rules. See Constraints for more details.

Util and Experimental Breakout

To speed up utility development and reduce bloat, some components have been removed from the primary PSL repository and moved into their own repositories. In the sample POM snippets below, all versions have been set to CANARY; replace CANARY with the release corresponding to your PSL version.

PSL Utils

https://github.com/linqs/psl-utils

Data Loading - psl-dataloading

Maven Dependency Declaration
<dependency>
  <groupId>org.linqs</groupId>
  <artifactId>psl-dataloading</artifactId>
  <version>CANARY</version> 
</dependency>
Included (Old) Packages
  • edu.umd.cs.psl.ui.loading
  • edu.umd.cs.psl.ui.data
New Package
  • org.linqs.psl.utils.dataloading

Evaluation - psl-evaluation

<dependency>
  <groupId>org.linqs</groupId>
  <artifactId>psl-evaluation</artifactId>
  <version>CANARY</version> 
</dependency>
Included (Old) Packages
  • edu.umd.cs.psl.evaluation
New Package
  • org.linqs.psl.utils.evaluation

Text Similarity - psl-textsim

<dependency>
  <groupId>org.linqs</groupId>
  <artifactId>psl-textsim</artifactId>
  <version>CANARY</version> 
</dependency>
Included (Old) Packages
  • edu.umd.cs.psl.ui.functions.textsimilarity
New Package
  • org.linqs.psl.utils.textsimilarity

PSL Experimental

https://github.com/linqs/psl-experimental

Data Splitter - psl-datasplitter

<dependency>
  <groupId>org.linqs</groupId>
  <artifactId>psl-datasplitter</artifactId>
  <version>CANARY</version> 
</dependency>
Included (Old) Packages
  • edu.umd.cs.psl.util.datasplitter
New Package
  • org.linqs.psl.experimental.datasplitter

Experiment - psl-experiment

<dependency>
  <groupId>org.linqs</groupId>
  <artifactId>psl-experiment</artifactId>
  <version>CANARY</version> 
</dependency>
Included (Old) Packages
  • edu.umd.cs.psl.ui.experiment
New Package
  • org.linqs.psl.experimental.experiment

Mosek - psl-mosek

<dependency>
  <groupId>org.linqs</groupId>
  <artifactId>psl-mosek</artifactId>
  <version>CANARY</version> 
</dependency>
Included (Old) Packages
  • edu.umd.cs.psl.optimizer.conic.mosek
New Package
  • org.linqs.psl.optimizer.conic.mosek

Optimize - psl-optimize

<dependency>
  <groupId>org.linqs</groupId>
  <artifactId>psl-optimize</artifactId>
  <version>CANARY</version> 
</dependency>
Included (Old) Packages
  • edu.umd.cs.psl.optimizer
  • edu.umd.cs.psl.reasoner.conic
New Package
  • org.linqs.psl.experimental.optimizer

Sampler - psl-sampler

<dependency>
  <groupId>org.linqs</groupId>
  <artifactId>psl-sampler</artifactId>
  <version>CANARY</version> 
</dependency>
Included (Old) Packages
  • edu.umd.cs.psl.sampler
New Package
  • org.linqs.psl.experimental.sampler

Weight Learning - psl-weightlearning

<dependency>
  <groupId>org.linqs</groupId>
  <artifactId>psl-weightlearning</artifactId>
  <version>CANARY</version> 
</dependency>
Included (Old) Packages
  • edu.umd.cs.psl.application.learning.weight.maxmargin
New Package
  • org.linqs.psl.experimental.learning.weight.maxmargin

PSL Administration


Prerequisites

The following software is required to use PSL:

Java 7 or 8 JDK

Ensure that the Java 7 or 8 development kit is installed. Either OpenJDK or Oracle Java works.

We have had some reports of failing builds using Java prior to 1.7.0_110 or 1.8.0_110. If you are having issues with Maven (especially handshake errors), try updating your version of Java to at least 1.7.0_110 or 1.8.0_110. This is especially relevant for Mac users, where the version of Java is less frequently updated.

Maven 3.x

PSL uses Maven to manage builds and dependencies. Users should install Maven 3.x. PSL is developed with Maven and PSL programs are created as Maven projects. See running Maven for help using Maven to build projects.


Releasing a New Stable Version

This is a HOWTO on releasing a new stable PSL version. All first- and second-level headers are steps in the process and should be followed sequentially.

Preliminaries

Get the Code Ready

A release is a single commit that increments the software's version number to a stable version number and does nothing else. So, before you release a version, make sure all your changes are committed and pushed, and the code is in the state in which you want to release it.

Make sure the copyright notices are up to date.

Test the Code

Remember to test the code and double check it is ready for release. To clean, compile, and test the code, you can do:

mvn clean test

Find the New Version Number

Stable version numbers are of the format x.y.z, where

  • x = major version
  • y = minor version
  • z = patch version

The git branch the code is on (the working branch) should already have a version number in its pom.xml files of the form x.y.z-SNAPSHOT. Whatever x.y.z-SNAPSHOT is, the new version will be x.y.z.
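This version bump is simple enough to sketch in a few lines of Python (a hypothetical helper, not part of the PSL release tooling):

```python
def release_version(snapshot):
    """Strip the -SNAPSHOT suffix to get the stable version number."""
    suffix = "-SNAPSHOT"
    if not snapshot.endswith(suffix):
        raise ValueError("working branch versions should end in -SNAPSHOT")
    return snapshot[: -len(suffix)]

print(release_version("2.0.1-SNAPSHOT"))  # 2.0.1
```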

Create the Stable Release

Change the Version

The first step is to change the version number to the stable version number. Remember to perform the commit at the end of the instructions.

Tag the New Stable Version

Run the following two commands:

git tag -a x.y.z -m 'Version x.y.z'
git push origin x.y.z

Update Git Branches

There are two ways the branch structure of the Git repo can change because of a new stable version:

  1. The master branch might need to be updated
  2. The working branch might need to be deleted

Updating the Master Branch

The master branch should always point to the commit of the highest stable version number, where x, y, and z are treated as separate orders of magnitude.

So, if the master branch points to version 1.2, then releasing 1.1.1 would not update the master branch, but releasing 1.2.1 or 1.3 would.
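The comparison rule can be sketched in Python, treating x, y, and z as a tuple compared lexicographically (hypothetical helpers, not part of PSL):

```python
def as_tuple(version):
    # Pad missing parts with 0 so that "1.2" compares like "1.2.0".
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))

def master_needs_update(master_version, new_version):
    # Master should always point to the highest stable version.
    return as_tuple(new_version) > as_tuple(master_version)

# Matches the example above, with master at version 1.2:
assert not master_needs_update("1.2", "1.1.1")
assert master_needs_update("1.2", "1.2.1")
assert master_needs_update("1.2", "1.3")
```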

If you are updating the master branch, it should already be upstream of the new stable version. Substituting the working branch name for WORKING_BRANCH, simply run the following commands:

git checkout master
git pull origin WORKING_BRANCH
git push

Deleting the working branch

There should now be a working branch pointing to the tag "x.y.z" (and possibly the master branch). If the working branch is not the develop branch, it should probably be deleted (which deletes the branch name, not the commit itself). Don't delete the develop branch! Substituting the working branch name for WORKING_BRANCH, run the following commands:

git branch -d WORKING_BRANCH
git push origin :WORKING_BRANCH

Deploy New Stable Version

With the new stable version checked out, on a machine with file system access to the repository, in the top level directory of the project (the one with the PSL project pom.xml file, not any of the subprojects), run the following commands:

mvn clean
mvn deploy

Last Steps

Update Change Log

Update the change log with a list of the main changes since the most recent upstream stable version. For example, if releasing 1.0.2, list the main changes since 1.0.1, even if there is a more recent 1.1 release.

Announce New Release

Post an announcement on the user group. Remember to select the "make an announcement" option, rather than "start a discussion." Here is a template:

Subject: New Version: x.y.z

A new stable version of PSL, version x.y.z (https://github.com/linqs/psl/tree/x.y.z) is now available.

See the wiki page "Switching the PSL version your program uses" for instructions on changing your PSL projects to the new version.

In version x.y.z:
[A list of the main changes]

Rule Specification in Groovy

Models in Groovy support three different ways to specify rules:

In-Line Syntax

Rules can be specified using the natural Groovy syntax and the add() method for models. The rule weight and squaring must be specified as additional arguments. Both may be left off to specify an unweighted rule.

model.add(
   rule: ( Likes(A, 'Dogs') & Likes(B, 'Dogs') ) >> Friends(A, B),
   weight: 5.0,
   squared: true
);

Because the in-line syntax must be a subset of Groovy syntax, the following operator variants are not supported:

  • &&
  • ||
  • ->
  • <<
  • <-
  • !

Note that there are supported variants for all unsupported operators. Arithmetic rules are not supported with the in-line syntax.

String Syntax

model.add(
   rule: "( Likes(A, 'Dogs') & Likes(B, 'Dogs') ) >> Friends(A, B)",
   weight: 5.0,
   squared: true
);

// Produces the same rule as above.
model.add(
   rule: "5.0: ( Likes(A, 'Dogs') && Likes(B, 'Dogs') ) -> Friends(A, B) ^2"
);

// An unweighted (constraint) variant of the above rule.
model.add(
   rule: "( Likes(A, 'Dogs') && Likes(B, 'Dogs') ) -> Friends(A, B)"
);

// An arithmetic constraint.
model.add(
   rule: "Likes(A, +B) = 1 ."
);

Rules can also be specified directly as a string. Because they are not limited by the Groovy syntax, all operators are available. A rule that specifies a weight and squaring in the string may not also pass "weight" and "squared" arguments.

Bulk String Syntax

// Load multiple rules from a single string.
model.addRules("""
   1: ( Likes(A, 'Dogs') & Likes(B, 'Dogs') ) >> Friends(A, B) ^2
   Likes(A, +B) = 1 .
""");

// Load multiple rules from a file.
model.addRules(new FileReader("myRules.txt"));

The addRules() method may be used to add multiple rules at a time, each rule on its own line. A String or Reader may be passed. Each rule must be fully specified with respect to weights and squaring.

Constraints

Constraints are specified as unweighted arithmetic rules. So all you need to do is make an arithmetic rule and either explicitly specify that it is unweighted (using the period syntax), or not specify a weight.

// An unweighted rule (constraint) explicitly specified with a period.
model.add(
   rule: "Likes(A, +B) = 1 ."
);

// An unweighted rule (constraint) implicitly specified by not adding a weight.
model.add(
   rule: "Likes(A, +B) = 1"
);


Rule Specification

PSL supports two primary types of rules: Logical and Arithmetic. Each of these types of rules support weights and squaring.

Logical Rules

Logical rules in PSL are implications built from atoms joined with logical operators (negative priors are the exception). Since PSL uses soft logic, hard logic operators are replaced with Lukasiewicz operators.

Operators

& (&&) - Logical And
The and operator is binary and functions as a Lukasiewicz t-norm operator: A & B = MAX(0, A + B - 1)

| (||) - Logical Or
The or operator is binary and functions as a Lukasiewicz t-conorm operator: A | B = MIN(1, A + B)

>> (->) / << (<-) - Implication
The implication operator acts similarly to standard logical implication, where the truth of the body implies the truth of the head. Note that the head is always the side that the arrow points at, and both directions are supported. It is most common to see rules where the body is on the left and the head is on the right. The body of an implication must be a conjunctive clause (contain only and operators), while the head must be a disjunctive clause (contain only or operators).

~ (!) - Negation
The negation operator is unary and functions as a Lukasiewicz negation operator: ~A = 1 - A

Examples

// The same rule written in two different ways.
Nice(A) & Nice(B) -> Friends(A, B)
Friends(A, B) << Nice(A) && Nice(B)

// Using a disjunction in the head instead of a conjunction in the body.
// Also written two different ways.
Friends(A, B) >> Nice(A) || Nice(B)
Nice(A) | Nice(B) <- Friends(A, B)

Arithmetic Rules

Arithmetic rules are relations of two linear combinations.

Operators

The following operators are used in arithmetic rules:

  • +
  • -
  • *
  • /

Note that each side of an arithmetic rule must be a linear combination, so + and - are only allowed between terms, and * and / are only allowed for coefficients.

Relational Operator

The following relational operators are allowed between the two linear combinations:

  • =
  • <=
  • >=

Summation

A summation can be used when you want to aggregate over a variable. You turn a variable into a summation variable by prefixing it with a +. Each summation variable can only be used once per expression, but you may have multiple different summation variables.

Filter Clauses

A filter clause appears at the end of a rule and decides what values the summation variable can take. There can be multiple filter clauses per rule, but each summation variable can have at most one filter clause. The filter clause is a logical expression, but uses hard logic rather than Lukasiewicz logic: all non-zero truth values are considered true in a filter expression. If the expression evaluates to zero for a value, then that value is not used in the summation. Valid things that may appear in a filter clause are:

  • Constants
  • Variables appearing in the associated arithmetic expression
  • Closed predicates
  • The summation variable given as the argument of the clause

// Only sum up friends of A that are nice.
Friends(A, +B) <= 1 {B: Nice(B)}

// Only sum up friends of A that are similar to A.
// Note that Similar must be a closed predicate.
Friends(A, +B) <= 1 {B: Similar(A, B)}

// Only sum up friends of A that are not similar to A.
// Note that Similar must be a closed predicate.
Friends(A, +B) <= 1 {B: !Similar(A, B)}

// Only sum up friends of A where both A and B are nice and similar.
// Note that Similar must be a closed predicate.
Friends(A, +B) <= 1 {B: Nice(A) & Nice(B) & Similar(A, B)}

// Only sum up combinations for friends where A is nice and B is not nice.
Friends(+A, +B) <= 1 {A: Nice(A)} {B: !Nice(B)}
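The filter semantics can be sketched outside of PSL. With hypothetical observed truth values (plain Python dictionaries, not PSL data structures), a grounding of Friends(A, +B) <= 1 {B: Nice(B)} sums only over substitutions of B whose filter value is non-zero:

```python
# Hypothetical observed truth values for a fixed A = "alice".
nice = {"alice": 1.0, "bob": 0.0, "eve": 0.4}
friends_alice = {"alice": 0.3, "bob": 0.9, "eve": 0.5}

# {B: Nice(B)} uses hard logic: any non-zero truth value counts as true,
# so "bob" is excluded while "eve" (0.4) is included.
total = sum(value for b, value in friends_alice.items() if nice[b] != 0.0)

print(total)         # approximately 0.8 (0.3 + 0.5)
print(total <= 1.0)  # this grounding of the constraint is satisfied
```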

Coefficients

Each term in an arithmetic rule can take an optional coefficient. The coefficient may be any real number and can either appear before the term and act as a multiplier:

2.5 * Similar(A, B) >= 1

or, appear after the term and act as a divisor:

Similar(A, B) / 2.5 <= 1

Coefficient Operators

Special coefficients (called Coefficient Operators) may be used:

|A| - Cardinality
A cardinality coefficient may only be used on a summation variable. It evaluates to the number of terms substituted for the summation variable.

@Min[A, B] - Min
Returns the minimum of A and B. May be used with summation variables.

@Max[A, B] - Max
Returns the maximum of A and B. May be used with summation variables.

Examples

(Note that these rules are meant to show the semantics of arithmetic rules and may not make logical sense.)

Friends(A, B) = 0.5
Friends(A, +B) <= 1
Friends(A, +B) / |B| <= 1
@Min[2, |B|] * Friends(A, +B) <= 1
Friends(A, +B) <= 1 {B: Nice(B)}
Friends(A, B) <= Nice(A) + Nice(B)
Friends(A, B) >= 3.0 * Nice(A) - 2.0 * Nice(B)
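As an illustration of cardinality, grounding Friends(A, +B) / |B| <= 1 divides the summation by the number of substituted terms, i.e., it constrains the average truth value (hypothetical values, plain Python):

```python
# Hypothetical truth values for Friends("alice", B).
friends_alice = {"bob": 0.9, "carol": 0.4, "dave": 0.2}

total = sum(friends_alice.values())
cardinality = len(friends_alice)       # |B|: number of substituted terms
average = total / cardinality

print(average)         # approximately 0.5
print(average <= 1.0)  # holds whenever each truth value is in [0, 1]
```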

Weights

Every rule must be either weighted or unweighted. Unweighted rules are also called constraints since they are strictly enforced.

Weighted

Weighted rules are prefixed with the weight and a colon:

<weight>: <rule>

For example:

2.5: Nice(A) & Nice(B) & (A != B) -> Friends(A, B)
5.0: Friends(A, +B) <= 1
10.0: Friends(A, +B) <= 1 {B: Nice(B)}

Unweighted

Unweighted rules are suffixed with a period:

<rule> .

For example:

Nice(A) & Nice(B) & (A != B) -> Friends(A, B) .
Friends(A, +B) <= 1 .
Friends(A, +B) <= 1 . {B: Nice(B)}

Squaring

Any weighted rule can square its hinge-loss function. Squaring the hinge-loss (using "squared potentials") may result in better performance. Non-squared potentials tend to encourage a "winner take all" optimization, while squared potentials encourage more trading off. To square a rule, just suffix a ^2 to it:

2.5: Nice(A) & Nice(B) & (A != B) -> Friends(A, B) ^2
5.0: Friends(A, +B) <= 1 ^2
10.0: Friends(A, +B) <= 1 ^2 {B: Nice(B)}
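Numerically, the effect of squaring can be sketched as follows. For an implication under the Lukasiewicz operators, the distance to satisfaction is max(0, body - head), and a weighted rule contributes that distance (or its square) scaled by the weight. This helper is illustrative, not PSL's implementation:

```python
def rule_potential(weight, body, head, squared=False):
    # Distance to satisfaction of body -> head under Lukasiewicz logic.
    distance = max(0.0, body - head)
    return weight * (distance ** 2 if squared else distance)

# For a small violation (body 0.9, head 0.8), squaring shrinks the penalty,
# so squared rules trade off many small violations instead of forcing a
# winner-take-all solution.
print(rule_potential(2.5, 0.9, 0.8))                # approximately 0.25
print(rule_potential(2.5, 0.9, 0.8, squared=True))  # approximately 0.025
```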

Priors

You may specify priors for a predicate in PSL. A prior is attached to a specific predicate and affects all open ground atoms of that predicate. Priors in PSL must be weighted and may be squared. Priors tend to have low weights, since they are supposed to be overpowered by evidence.

Note that priors are not the same as specifying an initial value for your open predicate in a data file. Once optimization starts, the initial value specified in the data file will quickly get changed and have little or no impact on the final optimization. A prior, however, is a ground rule that becomes a full-fledged potential function that actively participates in optimization.

Negative Priors

Negative priors are the most common type of prior in PSL. A negative prior assumes that all ground atoms for the predicate default to zero.

Negative priors may be specified using logical rules:

1.0: ~Friends(A, B) ^2

This prior can be interpreted as "By default, people are not friends".

Arithmetic rules may also be used to specify a negative prior:

1.0: Friends(A, B) = 0 ^2

Positive Priors

Positive priors can be a little more tricky than negative priors.

If you want all the ground atoms to take the same positive prior, then you can just use an arithmetic rule:

1.0: Friends(A, B) = 0.75 ^2

If you want different ground atoms to have different positive priors, then you will need to use a surrogate predicate. First, create a new closed predicate that corresponds 1-to-1 with the open predicate you wish to put the prior on. Then add observations for the surrogate predicate with the truth value being the prior you wish to put on that ground atom. Now just create a rule that directly ties the surrogate predicate to the open predicate. See the example below.

Non-Uniform Prior Example

Consider a PSL program where we are trying to infer friendship (the Friends predicate). We may have a prior belief on the friendship quality between all people in our data. To encode this prior belief, we will first construct a surrogate predicate called FriendsPrior (the name does not matter). Now we will load FriendsPrior with observations where the truth value of the observation is our prior. Our data file for FriendsPrior may look something like:

Alice   Bob     1.0
Alice   Charlie 0.75
Bob     Charlie 0.33

Now we will add this rule that acts as our prior:

1.0: FriendsPrior(A, B) -> Friends(A, B) ^2

Special Operators

Both logical and arithmetic rules support some special operators.

== (=) - Equals
Ensures that the left and right side are equal. Note that this is not the same as a similarity function evaluating to 1. Two variables may be 100% similar, but equals will only evaluate to 1 if they refer to the same value.

!= (~=) - Not Equals
Evaluates to 1 when both sides are not the same. This is a very common operator to use in rules.

For example, consider the following two rules:

Nice(A) && Nice(B) -> Friends(A, B)
Nice(A) && Nice(B) && (A != B) -> Friends(A, B)

If A and B can both take the values "Alice" and "Bob", then the first rule would generate the following groundings:

Nice("Alice") && Nice("Alice") -> Friends("Alice", "Alice")
Nice("Alice") && Nice("Bob") -> Friends("Alice", "Bob")
Nice("Bob") && Nice("Bob") -> Friends("Bob", "Bob")
Nice("Bob") && Nice("Alice") -> Friends("Bob", "Alice")

While the second rule would only generate two groundings:

Nice("Alice") && Nice("Bob") -> Friends("Alice", "Bob")
Nice("Bob") && Nice("Alice") -> Friends("Bob", "Alice")
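The grounding counts above can be reproduced with a short Python sketch (itertools stands in for PSL's grounding machinery):

```python
from itertools import product

people = ["Alice", "Bob"]

# Groundings of: Nice(A) && Nice(B) -> Friends(A, B)
all_groundings = list(product(people, people))

# Groundings of: Nice(A) && Nice(B) && (A != B) -> Friends(A, B)
neq_groundings = [(a, b) for a, b in product(people, people) if a != b]

print(len(all_groundings))  # 4
print(len(neq_groundings))  # 2
```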

% (^) - Non-Symmetric
Ensures that the reverse (or equal) pairing of the two operands is not grounded. For example, consider the following two rules:

SimilarNames(A, B) && (A % B) -> SamePerson(A, B)
SimilarNames(A, B) -> SamePerson(A, B)

If A and B can both take the values "Alice" and "Bob", then the second rule would generate the following groundings:

SimilarNames("Alice", "Alice") -> SamePerson("Alice", "Alice")
SimilarNames("Alice", "Bob") -> SamePerson("Alice", "Bob")
SimilarNames("Bob", "Alice") -> SamePerson("Bob", "Alice")
SimilarNames("Bob", "Bob") -> SamePerson("Bob", "Bob")

While the first rule would only generate one grounding:

SimilarNames("Alice", "Bob") && ("Alice" % "Bob") -> SamePerson("Alice", "Bob")
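The reduction in groundings can be sketched with itertools standing in for PSL's grounding machinery (modeling the non-symmetric operator as an ordering over constants is an assumption for illustration; it keeps one ordering of each distinct pair):

```python
from itertools import product

people = ["Alice", "Bob"]

# A % B: keep only one ordering of each pair; the reverse and equal
# pairings are skipped. The ordering here is by list position.
order = {p: i for i, p in enumerate(people)}
nonsym = [(a, b) for a, b in product(people, people)
          if order[a] < order[b]]

print(nonsym)  # [('Alice', 'Bob')]
```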

Running a PSL program as a Markov Logic program

PSL includes implementations of Markov Logic inference algorithms. You can use them in your inference and learning applications by setting the following configuration options. Note that these implementations do not support all constraints allowed in PSL. If your program's constraint set does not decompose over atoms (i.e., if any atom participates in more than one constraint), then they will throw exceptions.

Inference

MPEInference and LazyMPEInference can use MaxWalkSat (MPE inference) and MC-Sat (marginal inference) with the following configuration options. Marginal probabilities will be set as the atoms' truth values.

# Sets MPEInference to perform Markov Logic MPE inference
<bundle>.mpeinference.reasoner = org.linqs.psl.reasoner.bool.BooleanMaxWalkSatFactory
# Sets MPEInference to perform Markov Logic marginal inference
<bundle>.mpeinference.reasoner = org.linqs.psl.reasoner.bool.BooleanMCSatFactory

# Sets LazyMPEInference to perform Markov Logic MPE inference
<bundle>.lazympeinference.reasoner = org.linqs.psl.reasoner.bool.BooleanMaxWalkSatFactory
# Sets LazyMPEInference to perform Markov Logic marginal inference
<bundle>.lazympeinference.reasoner = org.linqs.psl.reasoner.bool.BooleanMCSatFactory

Weight Learning

Weight learning that uses a reasoner for MPE inference as a subroutine (e.g., MaxLikelihoodMPE, LazyMaxLikelihoodMPE) can also use Markov Logic MPE inference.

<bundle>.weightlearning.reasoner = org.linqs.psl.reasoner.bool.BooleanMaxWalkSatFactory

MaxPseudoLikelihood also supports Markov Logic weight learning.

<bundle>.maxpseudolikelihood.bool = true

Running a Program

To run a PSL program, change to the top-level directory of its project (the directory with the Maven pom.xml file).

Compile your project:

mvn compile

Now use Maven to generate a classpath for your project's dependencies:

mvn dependency:build-classpath -Dmdep.outputFile=classpath.out

You can now run a class with the command:

java -cp ./target/classes:`cat classpath.out` <fully qualified class name>

where <fully qualified class name> is the full name (package and class) of the class you want to run (e.g., org.linqs.psl.example.BasicExample).

Tips and troubleshooting

  • Example programs and experiments are often deployed with a run.sh script that will compile and run the program. Look for this script first.
  • The classpath for the dependencies will need to be regenerated to incorporate any new dependencies or dependencies in new locations (such as when dependency versions have been changed).
  • PSL and PSL projects are configured to use the Groovy-Eclipse compiler for Maven to compile Groovy scripts. (The reference to Eclipse in its name signifies that it is based on the same compiler used in Eclipse, not that Eclipse is required.) This compiler creates regular Java class files from your Groovy scripts. The main methods generated for these class files run the scripts. Hence, the java command is used to run a script.
  • Classes can also be run with the command: mvn exec:java -Dexec.mainClass=<fully qualified class name>. The advantages are that the project does not need to be compiled separately and the classpath does not need to be generated or updated separately. The disadvantages are that the class output is preceded and followed by Maven output, exception stack traces are not printed by default (add the -e switch), and Maven adds some overhead to execution (sometimes a significant amount, especially on less powerful machines).


Switching the PSL version your program uses

To change the version of PSL your project uses, edit your project's pom.xml file. The POM will declare dependencies on one or more PSL artifacts, e.g.,

<dependencies>
    ...
    <dependency>
        <groupId>org.linqs</groupId>
        <artifactId>psl-groovy</artifactId>
        <version>2.0.0</version>
    </dependency>
    ...
</dependencies>

Change the version element of each such dependency to a new version (all the same one) and rebuild. See Choosing a Version of PSL to decide how to specify your version.
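
If the PSL version only appears in these dependency blocks, the bump can also be scripted. A sketch using GNU sed (the version numbers are illustrative; examine the POM first to make sure no unrelated <version> elements match):

```shell
# Replace the old PSL version with the new one in the project's POM,
# then list the remaining <version> elements to verify the change.
sed -i 's#<version>2.0.0</version>#<version>2.1.0</version>#g' pom.xml
grep '<version>' pom.xml
```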


Updating Javadocs

We can use Maven in the project root to create javadocs for all modules and link them together. This applies to the primary psl repository, psl-experimental, and psl-utils.

mvn javadoc:aggregate

Note that psl-experimental and psl-utils depend on psl. In addition, the javadocs for experimental and utils will try to link to psl. So, the psl javadocs should be deployed before the experimental and util docs are built.


Updating the Copyright Notice

Before releasing a new stable version, it is good to make sure that PSL's copyright notices are up to date. This can be simply done from the command line.

For example, if I wanted to update the copyright from 2013-2015 to 2013-2017, I could do the following in the project root:

mvn clean  # We do not want to bother updating any compiled artifacts.
grep -R "2013-2015"  # Examine the results and ensure that you are only updating correct references.
grep -Rl "2013-2015" | xargs sed -i 's/2013-2015/2013-2017/g'  # Do the actual replacement.

Updating the Version Number

This is a HOWTO on changing the version number in the PSL code base. In most, if not all, cases, this HOWTO should be followed as part of a larger one, such as Releasing a New Stable Version, not by itself.

Version Number Policy

A new version number should be applied as a new commit that does nothing else, so make sure you are working on a clean working copy with no uncommitted changes.

Version numbers consist of the following components:

  • x = major version
  • y = minor version
  • z = patch version

Your new version number should be of the form x.y.z (for a stable version) or x.y.z-SNAPSHOT (for an unstable version).

All the occurrences of a PSL version number should be kept in sync, i.e., have the same value for all occurrences in all pom.xml files and other resources across all modules. In addition, only one commit in the entire Git repository should have a particular stable version number.
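
A quick sanity check that every pom.xml agrees on one version might look like this (a sketch; it simply lists distinct <version> strings, so unrelated <version> elements such as third-party dependency versions will also show up):

```shell
# List the distinct <version> strings in every pom.xml under the
# current directory; when the modules are in sync, the PSL version
# appears exactly once.
grep -rho --include=pom.xml '<version>[^<]*</version>' . | sort -u
```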

Edit the code

Each pom.xml has only one instance of the version number that will need to be changed. You can change each instance manually, or use the following commands (replacing 2.0.0-SNAPSHOT and 2.1.0-SNAPSHOT with the actual versions):

find . -name pom.xml | xargs grep '<version>2.0.0-SNAPSHOT</version>' # Examine the results and ensure that you are only updating correct references.
find . -name pom.xml | xargs sed -i 's#<version>2.0.0-SNAPSHOT</version>#<version>2.1.0-SNAPSHOT</version>#g' # Perform the actual replacement.

Commit

Commit the changes with one of the following commit messages.

If you are changing to a stable version, use:

Version x.y.z

If you are changing to a new snapshot version, use:

Started x.y.z-SNAPSHOT

Push your commit when finished.
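
Putting the last two steps together, a stable-version bump might look like the following (a sketch; the version number and branch name are illustrative):

```shell
# Commit the version bump as its own commit, then push it.
git commit -am "Version 2.1.0"
git push origin develop
```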


Using Git

Getting started with Git

The Git website has information on installing Git, as do the GitHub guides mentioned below. This tutorial is helpful for learning how to use Git, and this tutorial is particularly helpful for SVN users.

Checking out branches which track remote branches

To use an existing branch in the remote repo on GitHub, create a local tracking branch for it. It can be kept in sync via git pull. For example, to track the branch 'develop' (assuming the GitHub remote is named 'origin'), run

>> git branch --track develop origin/develop

then

>> git checkout develop

Preparing to push a commit to the PSL repository or a fork on GitHub

Create a free account on GitHub, then follow GitHub's setup guides to install Git and configure your authentication.

You can fork the PSL repository, which means that you create a fork hosted on GitHub. You then clone that repository to a local machine, make commits, and, optionally, push some or all of those commits back to the repository on GitHub. Those commits are then publicly available (unless you have paid GitHub for private hosting).


Using the CLI

PSL provides a Command Line Interface. The CLI is the easiest interface to PSL and handles most situations where you do not need additional customization.

Setup

PSL requires that you have Java installed.

The PSL jar file psl-cli-CANARY.jar already contains all the PSL libraries that you need to be able to run your PSL programs. You can find a current snapshot of this .jar file in our resources directory until we finalize our v2.0 release.

In this page we will be using the CANARY build, but you may use any PSL jar that is at least version 2.0.0.

Running your first program

Let's first download the files for our example program, run it and see what it does!

In this program, we'll use information about where some people have lived, who they already know, and what they like to infer who knows whom. We'll first run the program and see the output. We will be working from the command line, so open up your shell or terminal.

Get the code

As with the other PSL examples, you can find all the code in our psl-examples repository. We will be using the easy link prediction example.

git clone https://bitbucket.org/linqs/psl-examples.git
cd psl-examples/link_prediction/easy/cli

Run your first PSL program

All the required commands are contained in the run.sh script. However, the commands are very simple and can also be run individually. You only need to fetch the jar (done in the setup steps above) and run PSL.

wget https://linqs-data.soe.ucsc.edu/maven/repositories/psl-releases/org/linqs/psl-cli/CANARY/psl-cli-CANARY.jar
java -jar psl-cli-CANARY.jar -infer -model simple_lp.psl -data simple_lp.data

You should now see output that looks like this (note that the order of the output lines may differ):

0    [main] INFO  org.linqs.psl.config.ConfigManager  - PSL configuration psl.properties file not found. Only default values will be used unless additional properties are specified.
6    [main] INFO  org.linqs.psl.config.ConfigManager  - No value found for option cli.dbpath. Returning default of /tmp/cli.
19   [main] INFO  org.linqs.psl.config.ConfigManager  - No value found for option cli.dbpath. Returning default of /tmp/cli.
133  [main] INFO  org.linqs.psl.config.ConfigManager  - No value found for option cli.rdbmsdatastore.valuecolumn. Returning default of truth.
134  [main] INFO  org.linqs.psl.config.ConfigManager  - No value found for option cli.rdbmsdatastore.confidencecolumn. Returning default of confidence.
134  [main] INFO  org.linqs.psl.config.ConfigManager  - No value found for option cli.rdbmsdatastore.partitioncolumn. Returning default of partition.
150  [main] INFO  org.linqs.psl.config.ConfigManager  - Found value true for option cli.rdbmsdatastore.usestringids.
154  [main] INFO  org.linqs.psl.cli.Launcher  - data:: loading:: ::starting
208  [main] WARN  org.linqs.psl.database.rdbms.RDBMSDataStoreMetadata  - Determining max partition, no partitions found null
260  [main] INFO  org.linqs.psl.cli.Launcher  - data:: loading:: ::done
260  [main] INFO  org.linqs.psl.cli.Launcher  - model:: loading:: ::starting
320  [main] INFO  org.linqs.psl.cli.Launcher  - model:: loading:: ::done
335  [main] INFO  org.linqs.psl.cli.Launcher  - operation::infer ::starting
336  [main] INFO  org.linqs.psl.config.ConfigManager  - Found value org.linqs.psl.reasoner.admm.ADMMReasonerFactory@34a0ee3f for option cli.mpeinference.reasoner.
339  [main] INFO  org.linqs.psl.config.ConfigManager  - No value found for option cli.admmreasoner.maxiterations. Returning default of 25000.
339  [main] INFO  org.linqs.psl.config.ConfigManager  - No value found for option cli.admmreasoner.stepsize. Returning default of 1.0.
339  [main] INFO  org.linqs.psl.config.ConfigManager  - No value found for option cli.admmreasoner.epsilonabs. Returning default of 1.0E-5.
339  [main] INFO  org.linqs.psl.config.ConfigManager  - No value found for option cli.admmreasoner.epsilonrel. Returning default of 0.001.
339  [main] INFO  org.linqs.psl.config.ConfigManager  - No value found for option cli.admmreasoner.stopcheck. Returning default of 1.
340  [main] INFO  org.linqs.psl.config.ConfigManager  - No value found for option cli.admmreasoner.numthreads. Returning default of 8.
345  [main] INFO  org.linqs.psl.application.inference.MPEInference  - Grounding out model.
463  [main] INFO  org.linqs.psl.application.inference.MPEInference  - Beginning inference.
647  [main] INFO  org.linqs.psl.reasoner.admm.ADMMReasoner  - Optimization completed in  662 iterations. Primal res.: 0.022740805753995307, Dual res.: 5.499541249718983E-4
647  [main] INFO  org.linqs.psl.application.inference.MPEInference  - Inference complete. Writing results to Database.
669  [main] INFO  org.linqs.psl.cli.Launcher  - operation::infer inference:: ::done
KNOWS('Alex', 'Jay') = 0.6563306840338842
KNOWS('Steve', 'Ben') = 0.44100447478155413
< ... 50 rows omitted for brevity ...>
KNOWS('Sabina', 'Arti') = 0.7194742867561412
KNOWS('Dhanya', 'Elena') = 0.3682973941849134
KNOWS('Elena', 'Sabina') = 0.3287882658219531

What did it do?

Now that we've run our first program that performs link prediction to infer who knows who, let's understand the steps that we went through to infer the unknown values: defining the underlying model, providing data to the model, and running inference to classify the unknown values.

Defining a Model

A model in PSL is a set of logic-like rules.

The model is defined inside a text file with the .psl extension. We describe this model in the file simple_lp.psl. Let's have a look at the rules that make up our model:

20:    Lived(P1,L) & Lived(P2,L) & P1!=P2   -> Knows(P1,P2) ^2
5:     Lived(P1,L1) & Lived(P2,L2) & P1!=P2 & L1!=L2  -> !Knows(P1,P2) ^2
10:    Likes(P1,L) & Likes(P2,L) & P1!=P2  -> Knows(P1,P2) ^2
5:     Knows(P1,P2) & Knows(P2,P3) & P1!=P3 -> Knows(P1,P3) ^2
10000: Knows(P1,P2) -> Knows(P2,P1) ^2
5:     !Knows(P1,P2) ^2

The model is expressing the intuition that people who have lived in the same location or like the same thing may know each other. The values at the beginning of the rules are their weights. Intuitively, a weight tells us the relative importance of satisfying that rule compared to the other rules. The ^2 at the end of the rules indicates that the hinge-loss functions based on groundings of these rules are squared, for a smoother tradeoff. For more details on hinge-loss functions and squared potentials, see the publications on our PSL webpage.
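
For intuition, each grounding of a weighted rule contributes a hinge-loss potential of roughly the following form (a sketch of the standard HL-MRF potential; see the PSL publications for the precise definition):

```latex
% Under Lukasiewicz semantics, with truth values I(.) in [0, 1],
% the distance to satisfaction of a ground rule B -> H is
% max(0, I(B) - I(H)). A rule with weight w then contributes
\phi = w \cdot \max\bigl(0,\; I(B) - I(H)\bigr)^{p},
\qquad p = 2 \text{ when the rule is annotated with } \verb|^2| .
```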

Loading the Data

PSL rules consist of predicates. The names of the predicates used in our model and possible substitutions of these predicates with actual entities from our network are defined inside the file simple_lp.data. Let's have a look:

predicates:
  Knows/2: open
  Likes/2: closed
  Lived/2: closed

observations:
  Knows : ../data/knows_obs.txt
  Lived : ../data/lived_obs.txt
  Likes : ../data/likes_obs.txt

targets:
  Knows : ../data/knows_targets.txt

truth:
  Knows : ../data/knows_truth.txt

In the predicates section, we list all the predicates that will be used in the rules that define the model. The keyword open indicates that we want to infer some substitutions of the predicate, while closed indicates that the predicate is fully observed, i.e., all substitutions of the predicate have known values and behave as evidence for inference.

For our simple example, we fully observe where people have lived and what things they like (or dislike). Thus, Likes and Lived are both closed predicates. We are aware of some instances of people knowing each other, but wish to infer the other instances, so Knows is an open predicate.

In the observations section, for each predicate for which we have observations, we specify the name of the .txt file containing the observations. For example, knows_obs.txt and lived_obs.txt specify which people know each other and where some of these people have lived, respectively.

The targets section specifies, for each open predicate, a .txt file listing all substitutions of that predicate that we wish to infer. In knows_targets.txt, we specify the pairs of people for whom we wish to infer Knows values.

The truth section specifies a .txt file that provides a set of ground truth observations for each open predicate. Here, we give the actual values of the Knows predicate for all the people in the network as training labels. We describe the general data loading scheme in more detail in the sections below.

Inferring the Missing Values

When we run the java -jar psl-cli-CANARY.jar -infer -model simple_lp.psl -data simple_lp.data command with the -infer flag, PSL's inference engine substitutes values from the data files into the rules of the model and runs inference on the targets.

Writing PSL Rules

To create a PSL model, you should define a set of rules in a .psl file. Let's go over the basic syntax to write rules. Consider this very general rule form:

w: P(A,B) & Q(B,C) -> R(A,C) ^2

The first part of the rule, w, is a non-negative value that specifies the weight of the rule. In this example, P, Q, and R are predicates. Logical rules consist of a rule "body" and a rule "head." The body of the rule appears before the ->, which denotes logical implication. The body can have one or more predicates joined by &, which denotes logical conjunction. The head of the rule should be a single predicate. The predicates that appear in the body and head can be any combination of open and closed predicates.

The Rule Specification page contains the full syntax for PSL rules.

Organizing your Data

In a .data file, you should first define your predicates in the predicates: section, as shown in the above example. Use the open and closed keywords to characterize each predicate.

A closed predicate is a predicate whose values are always observed. For example, the Lived predicate from the simple example is closed because we fully observe where everyone has lived. On the other hand, an open predicate is a predicate where some values may be observed, but other values are missing and thus need to be inferred.

As shown above, then create the observations:, targets:, and truth: sections, which list the names of the .txt files specifying the observed values for predicates, the values you want to infer for open predicates, and the observed ground truth values for open predicates, respectively.

For all predicates, all possible substitutions should be specified either in the target files or in the observation files. The observations files should contain the known values for all closed predicates and can contain some of the known values for the open predicates. The target files tell PSL which substitutions of the open predicates it needs to infer. Target files cannot be specified for closed predicates as they are fully observed.
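
For reference, the .txt files themselves are plain tab-separated lists of constants, one ground atom per line; observation and truth files may append a truth value in [0, 1] as a final column (values default to fully true when omitted). A hypothetical knows_obs.txt fragment might look like:

```
Alex	Jay
Jay	Alex
Steve	Ben	0.8
```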

The truth files provide training labels in order to learn the weights of the rules directly from data. This is similar to learning the coefficients of a logistic regression model from training data. Weight learning is described below in greater detail.
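
With the CLI, weight learning is typically invoked with the -learn flag in place of -infer. This is a sketch; flag names can differ between builds, so check the help output (java -jar psl-cli-CANARY.jar -h) for your jar:

```shell
# Learn rule weights from the truth labels declared in the data file.
java -jar psl-cli-CANARY.jar -learn -model simple_lp.psl -data simple_lp.data
```

The learned weights replace the hand-set weights in the model, after which inference can be re-run as before.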

Running Inference

Run inference with the general command:

java -jar psl-cli-CANARY.jar -infer -model [name of model file].psl -data [name of data file].data

When we run inference, the inferred values are output to the screen, as shown for our example above. If you want to write the outputs to a file and use the inferred values in various ways downstream, you can use:

java -jar psl-cli-CANARY.jar -infer -model [name of model file].psl -data [name of data file].data -output [directory to write output files]

Values for all predicates will be output as .csv files in the specified output directory.
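
Since the output files are plain text with the inferred value in the last column, downstream processing is ordinary text wrangling. For example, to list the ten highest-scoring inferred atoms (a sketch; the file name KNOWS.csv and its whitespace-separated column layout are assumptions, so check your output directory):

```shell
# Sort the predicate's output file by its numeric value column
# (descending) and keep the top ten rows.
sort -k 3 -g -r KNOWS.csv | head -n 10
```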

With the inferred values, some downstream tasks that you can perform are:

  • if you have a gold standard set of labels, you can evaluate your model by computing standard metrics like accuracy, AUC, F1, etc.
  • you may want to use the predicted outputs of PSL as inputs for another model.
  • you may want to visualize the predicted values and use the outputs of PSL as inputs to a data visualization program.

Using the Groovy Interface

Setup

Ensure that you have the prerequisites installed.

Examples

Looking at the examples is a great way to get familiar with the Groovy interface. We also have an in depth walkthrough of one of our examples.

Project Setup

After you have looked at the examples, you should set up a new PSL project. You can run a PSL Groovy program in the same way that you run an example program.

Groovy Specifics

Here are some more detailed topics that you may need:


Using the Java Interface

Application builders and advanced users can integrate PSL into their code as a library. Since the PSL codebase is organized as a Maven project, it is easiest to include PSL as a dependency via Maven.

Integrating PSL via Maven

The PSL codebase is organized as a Maven project with several subprojects. The subproject most likely of interest is psl-core, but stable versions of all the subprojects are published to the PSL Maven repository. Including a PSL subproject in your Maven project is easy and requires two steps.

First, add psl-core (and any other subprojects) as dependencies to your pom.xml file:

<dependencies>
    ...
    <dependency>
        <groupId>org.linqs</groupId>
        <artifactId>psl-core</artifactId>
        <version>2.0.0</version>
    </dependency>
    ...
</dependencies>

Second, specify the location of the PSL Maven repository in your pom.xml file, anywhere within the <project> </project> tags:

<repositories>
    <repository>
        <releases>
            <enabled>true</enabled>
            <updatePolicy>daily</updatePolicy>
            <checksumPolicy>fail</checksumPolicy>
        </releases>
        <id>psl-releases</id>
        <name>PSL Releases</name>
        <url>http://maven.linqs.org/maven/repositories/psl-releases/</url>
        <layout>default</layout>
    </repository>
</repositories>

Maven will now make the required PSL libraries and their dependencies available when compiling and running your project.


Version Cheatsheet

Release Date   PSL                                  PSL Utils               PSL Experimental
2015-10-11     1.2.1 (Code, API Doc, Static Wiki)   --                      --
2017-07-04     2.0.0 (Code, API Doc, Static Wiki)   1.0.0 (Code, API Doc)   1.0.0 (Code, API Doc)

Weight Learning

The job of a Weight Learning Application is to use data to learn the weights of each rule in a PSL model.

Syntax

In weight learning we follow the structure below:

<WeightLearningApplication> weightLearner =
      new <WeightLearningApplication>(<model>, <targetDatabase>, <groundTruthDatabase>, <config>);

  • <model> is the model specified by your PSL program.
  • <targetDatabase> is a database that contains all of the atoms for which you would like to infer values. When you create this database, the target predicate will be open.
  • <groundTruthDatabase> is a database that contains the known values of the atoms for which you are inferring values in the targetDatabase. When you create this database, the predicates should be closed.
  • <config> is your config bundle.

Weight Learning Applications include:

  • LazyMaxLikelihoodMPE
  • L1MaxMargin
  • MaxLikelihoodMPE
  • MaxPseudoLikelihood

Working With Canary

What is Canary?

Canary is a published build of PSL that is based on the development branch. The name "Canary" comes from the iconic use of a canary in a coal mine to detect toxic gas. The build sits somewhere near the tip of the development tree and is updated whenever the PSL developers feel a significant change has been made in development.

Using Canary

To use Canary, simply change the PSL version in your pom.xml to CANARY. There is only one official Canary build (no version numbers), so you can always simply pull CANARY and be sure that you have the latest version.

<dependencies>
    ...
    <dependency>
        <groupId>org.linqs</groupId>
        <artifactId>psl-groovy</artifactId>
        <version>CANARY</version>
    </dependency>
    ...
</dependencies>

How do I upgrade my Canary?

Since we do not use version numbers with CANARY, Maven will not fetch a new version unprompted. To upgrade your Canary build, delete the old Canary from your Maven cache. On Linux/Mac, this is: ~/.m2/repository/org/linqs/psl*/CANARY
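
For example (a sketch; adjust the path if your local Maven repository lives elsewhere):

```shell
# Remove any cached CANARY artifacts so that Maven re-downloads them
# on the next build.
rm -rf ~/.m2/repository/org/linqs/psl*/CANARY
```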