After you have become familiar with the core topics, you can move on to these advanced topics:
This page covers building PSL from source for development purposes. To run a standard PSL program, see Running a Program.
The PSL source code is publicly available and hosted on GitHub.
To get the code, simply clone the repository:
>> git clone https://github.com/linqs/psl.git
If you are already comfortable using Git, then you can just skip ahead to the section on compiling PSL.
The Git website has information on installing Git, as do the GitHub guides mentioned below. This tutorial is helpful for learning how to use Git, and this tutorial is particularly helpful for SVN users.
Once Git is installed and you're ready to use it, you can run the above command to clone the PSL repository.
Between releases, the develop branch may be significantly ahead of the master branch. To see the latest changes, check out the develop branch:
>> git checkout develop
To contribute code to PSL, first fork the PSL repository, which creates your own copy of it hosted on GitHub.
Then clone your fork to a local machine, make commits, and push some or all of those commits back to your fork on GitHub. When your change is ready to be added to PSL, submit a pull request, which will be reviewed by the PSL maintainers. The maintainers may request that you make additional changes. Once your code is accepted, it will be merged into the develop branch of PSL.
PSL uses the Maven build system. To compile it, change to the top-level directory of your working copy and run:
>> mvn compile
You can install PSL to your local Maven repository by running:
>> mvn install
Remember to update your project's pom.xml file with the (possibly new) version you installed.
PSL comes with several built-in similarity functions. If your needs are not covered by these functions, you can also create custom similarity functions.
These similarity functions are shipped with the PSL Utils package.
Name: Cosine Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.CosineSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://en.wikipedia.org/wiki/Cosine_similarity
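As a rough illustration of the idea, the sketch below treats each string as a vector of whitespace-separated token counts and returns the cosine of the angle between the two vectors. The tokenization and raw-count weighting are assumptions, and CosineSketch is a hypothetical name; PSL's own CosineSimilarity may tokenize or weight terms differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of token-count cosine similarity; not PSL's CosineSimilarity.
class CosineSketch {
    // Count how often each whitespace-separated token occurs.
    static Map<String, Integer> termCounts(String s) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : s.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Cosine of the angle between the two count vectors; 0.0 if either string has no tokens.
    static double similarity(String a, String b) {
        Map<String, Integer> va = termCounts(a);
        Map<String, Integer> vb = termCounts(b);
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : va.entrySet()) {
            dot += e.getValue() * vb.getOrDefault(e.getKey(), 0);
        }
        double normA = 0.0;
        for (int c : va.values()) normA += c * c;
        double normB = 0.0;
        for (int c : vb.values()) normB += c * c;
        if (normA == 0.0 || normB == 0.0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        System.out.println(similarity("new york city", "new york"));
    }
}
```

For "new york city" against "new york" the dot product is 2 and the vector lengths are sqrt(3) and sqrt(2), giving 2 / sqrt(6), about 0.816.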
Name: Dice Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.DiceSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient
Name: Jaccard Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.JaccardSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://en.wikipedia.org/wiki/Jaccard_index
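Dice and Jaccard are both set-overlap measures. A minimal sketch, assuming each string is reduced to its set of distinct whitespace-separated tokens (PSL's implementations may tokenize differently); SetSimilaritySketch and its methods are hypothetical names:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of two set-overlap measures; not PSL's DiceSimilarity or JaccardSimilarity.
class SetSimilaritySketch {
    // Distinct whitespace-separated tokens of s.
    static Set<String> tokens(String s) {
        Set<String> set = new HashSet<>(Arrays.asList(s.trim().split("\\s+")));
        set.remove("");  // splitting an empty string yields a single empty token
        return set;
    }

    // Number of tokens the two sets share.
    static int overlap(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    // Dice coefficient: 2 * |A intersect B| / (|A| + |B|).
    static double dice(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        int total = ta.size() + tb.size();
        return total == 0 ? 0.0 : 2.0 * overlap(ta, tb) / total;
    }

    // Jaccard index: |A intersect B| / |A union B|.
    static double jaccard(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        int common = overlap(ta, tb);
        int union = ta.size() + tb.size() - common;
        return union == 0 ? 0.0 : (double) common / union;
    }

    public static void main(String[] args) {
        System.out.println(dice("a b c", "b c d") + " " + jaccard("a b c", "b c d"));
    }
}
```

For the same pair of strings, Dice is never smaller than Jaccard: it counts the overlap twice against the summed set sizes rather than once against the union (here 2/3 versus 1/2).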
Name: Jaro Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.JaroSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://www.census.gov/srd/papers/pdf/rr91-9.pdf
Name: Jaro-Winkler Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.JaroWinklerSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
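Jaro similarity counts characters that match within a sliding window and penalizes matched characters that appear out of order; Jaro-Winkler then rewards a shared prefix. The sketch below follows the textbook definitions, assuming a match window of max(|s|, |t|) / 2 - 1, a prefix of at most four characters, a scaling factor of 0.1, and no boost threshold. JaroWinklerSketch is a hypothetical name, not PSL's class.

```java
// Hypothetical sketch of Jaro and Jaro-Winkler similarity; not PSL's JaroSimilarity classes.
class JaroWinklerSketch {
    static double jaro(String s, String t) {
        if (s.isEmpty() && t.isEmpty()) return 1.0;
        if (s.isEmpty() || t.isEmpty()) return 0.0;
        int window = Math.max(0, Math.max(s.length(), t.length()) / 2 - 1);
        boolean[] sMatched = new boolean[s.length()];
        boolean[] tMatched = new boolean[t.length()];
        int matches = 0;
        // A character matches an unused equal character of t within the window.
        for (int i = 0; i < s.length(); i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(t.length() - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!tMatched[j] && s.charAt(i) == t.charAt(j)) {
                    sMatched[i] = true;
                    tMatched[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Walk both matched sequences in order; each mismatch is half a transposition.
        int halfTranspositions = 0;
        int k = 0;
        for (int i = 0; i < s.length(); i++) {
            if (!sMatched[i]) continue;
            while (!tMatched[k]) k++;
            if (s.charAt(i) != t.charAt(k)) halfTranspositions++;
            k++;
        }
        double m = matches;
        double transpositions = halfTranspositions / 2.0;
        return (m / s.length() + m / t.length() + (m - transpositions) / m) / 3.0;
    }

    // Boost the Jaro score by 0.1 per shared leading character, up to four characters.
    static double jaroWinkler(String s, String t) {
        double j = jaro(s, t);
        int limit = Math.min(4, Math.min(s.length(), t.length()));
        int prefix = 0;
        while (prefix < limit && s.charAt(prefix) == t.charAt(prefix)) prefix++;
        return j + prefix * 0.1 * (1.0 - j);
    }

    public static void main(String[] args) {
        System.out.println(jaro("MARTHA", "MARHTA") + " " + jaroWinkler("MARTHA", "MARHTA"));
    }
}
```

For the classic pair MARTHA / MARHTA this gives a Jaro score of 17/18 (about 0.944) and a Jaro-Winkler score of about 0.961.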
Name: Level 2 Jaro-Winkler Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.Level2JaroWinklerSimilarity
Arguments: String, String
Return Type: Continuous
Description: A level 2 variation of Jaro-Winkler Similarity. Level 2 means that the strings are broken up into tokens before comparison (see http://secondstring.sourceforge.net/javadoc/com/wcohen/ss/Level2.html).
Name: Level 2 Levenshtein Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.Level2LevenshteinSimilarity
Arguments: String, String
Return Type: Continuous
Description: A level 2 variation of Levenshtein Similarity. Level 2 means that the strings are broken up into tokens before comparison (see http://secondstring.sourceforge.net/javadoc/com/wcohen/ss/Level2.html).
Name: Level 2 Monge Elkan Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.Level2MongeElkanSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://www.aaai.org/Papers/KDD/1996/KDD96-044.pdf
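The Level 2 variants above share one scheme. Assuming the Monge-Elkan form described in the SecondString documentation linked above (an assumption about PSL's exact behavior), each token of the first string is paired with its best-scoring token in the second string under some inner similarity, and those best scores are averaged. A sketch with the inner measure passed in as a function; Level2Sketch is a hypothetical name:

```java
import java.util.function.BiFunction;

// Hypothetical sketch of a Monge-Elkan style "level 2" score; not PSL's Level2* classes.
class Level2Sketch {
    // For each token of a, keep its best inner score against any token of b, then average.
    static double similarity(String a, String b, BiFunction<String, String, Double> inner) {
        if (a.trim().isEmpty() || b.trim().isEmpty()) return 0.0;
        String[] ta = a.trim().split("\\s+");
        String[] tb = b.trim().split("\\s+");
        double total = 0.0;
        for (String x : ta) {
            double best = 0.0;
            for (String y : tb) {
                best = Math.max(best, inner.apply(x, y));
            }
            total += best;
        }
        return total / ta.length;
    }

    public static void main(String[] args) {
        // Exact token equality as a stand-in inner similarity.
        BiFunction<String, String, Double> exact = (x, y) -> x.equals(y) ? 1.0 : 0.0;
        System.out.println(similarity("john smith", "smith john", exact));
    }
}
```

Plugging in a Jaro-Winkler or normalized Levenshtein function as the inner measure gives the flavor of the corresponding Level 2 similarity. Because the average runs over the first string's tokens only, the score is not symmetric in its arguments.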
Name: Levenshtein Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.LevenshteinSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://en.wikipedia.org/wiki/Levenshtein_distance
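Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. To get a similarity in [0, 1], the sketch below uses the common normalization 1 - distance / max(|s|, |t|); that normalization is an assumption and may not match PSL's LevenshteinSimilarity exactly. LevenshteinSketch is a hypothetical name.

```java
// Hypothetical sketch of a normalized Levenshtein similarity; not PSL's LevenshteinSimilarity.
class LevenshteinSketch {
    // Classic dynamic-programming edit distance, keeping only two rows of the table.
    static int distance(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) prev[j] = j;
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                // Deletion, insertion, or substitution (free when characters agree).
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    // 1.0 for identical strings, decreasing as more edits are needed.
    static double similarity(String s, String t) {
        int maxLen = Math.max(s.length(), t.length());
        if (maxLen == 0) return 1.0;
        return 1.0 - (double) distance(s, t) / maxLen;
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting") + " " + similarity("kitten", "sitting"));
    }
}
```

For example, kitten and sitting are three edits apart, giving a similarity of 1 - 3/7, about 0.571.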
Name: Same Initials
Qualified Path: org.linqs.psl.utils.textsimilarity.SameInitials
Arguments: String, String
Return Type: Discrete
Description: First splits the input strings on any whitespace and ensures both have the same number of tokens (returns 0 if they do not). Then the first characters of all the tokens are checked for equality (ignoring case and order of appearance). Note that all characters that are not alphabetic ASCII characters are considered equal (e.g., all digits and all non-ASCII Unicode characters are treated as the same character).
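The Same Initials rules can be sketched directly. In this hypothetical SameInitialsSketch, every character that is not an ASCII letter is mapped to one shared placeholder, and sorting the initials makes the comparison ignore token order; returning 1.0 for a match is an assumption about how the discrete result is encoded.

```java
import java.util.Arrays;

// Hypothetical sketch of the Same Initials rule; not PSL's SameInitials.
class SameInitialsSketch {
    // ASCII letters compare case-insensitively; every other character
    // (digits, punctuation, non-ASCII) collapses to one shared placeholder.
    static char normalize(char c) {
        boolean asciiLetter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        return asciiLetter ? Character.toLowerCase(c) : '*';
    }

    // Normalized first characters of the whitespace-separated tokens, sorted.
    static char[] initials(String s) {
        String[] tokens = s.trim().split("\\s+");
        char[] out = new char[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // An empty string splits into one empty token; treat it as a placeholder.
            out[i] = tokens[i].isEmpty() ? '*' : normalize(tokens[i].charAt(0));
        }
        Arrays.sort(out);  // sorting makes the comparison ignore token order
        return out;
    }

    static double similarity(String a, String b) {
        char[] ia = initials(a);
        char[] ib = initials(b);
        if (ia.length != ib.length) return 0.0;  // different numbers of tokens
        return Arrays.equals(ia, ib) ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        System.out.println(similarity("John Smith", "smith j."));
    }
}
```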
Name: Same Number of Tokens
Qualified Path: org.linqs.psl.utils.textsimilarity.SameNumTokens
Arguments: String, String
Return Type: Discrete
Description: Checks whether both strings have the same number of tokens (delimited by any whitespace).
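A one-line comparison suffices. This hypothetical sketch splits on runs of whitespace, as the description states, and encodes the discrete result as 1.0 or 0.0 (an assumption about the encoding):

```java
// Hypothetical sketch of the Same Number of Tokens check; not PSL's SameNumTokens.
class SameNumTokensSketch {
    static int countTokens(String s) {
        String trimmed = s.trim();
        // Tokens are separated by runs of whitespace; a blank string has none.
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    static double similarity(String a, String b) {
        return countTokens(a) == countTokens(b) ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        System.out.println(similarity("new york", "san\tfrancisco"));
    }
}
```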
Name: Sub String Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.SubStringSimilarity
Arguments: String, String
Return Type: Continuous
Description: If one input string is a substring of the other, returns the length of the substring (the shorter string) divided by the length of the containing string. Returns 0 if neither string is a substring of the other.
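A minimal sketch of this rule, assuming a case-sensitive substring test; SubStringSketch is a hypothetical name:

```java
// Hypothetical sketch of the Sub String Similarity rule; not PSL's SubStringSimilarity.
class SubStringSketch {
    static double similarity(String a, String b) {
        String shorter = a.length() <= b.length() ? a : b;
        String longer = a.length() <= b.length() ? b : a;
        // Case-sensitive containment test; the emptiness guard avoids 0 / 0 for two empty strings.
        if (longer.isEmpty() || !longer.contains(shorter)) return 0.0;
        return (double) shorter.length() / longer.length();
    }

    public static void main(String[] args) {
        System.out.println(similarity("york", "new york"));
    }
}
```

For example, york is contained in new york, so the score is 4/8 = 0.5 regardless of argument order.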
Version 2.0.0 (https://github.com/linqs/psl/tree/2.0.0): package names changed from edu.umd.cs to org.linqs.
Version 1.2.1 (https://github.com/linqs/psl/tree/1.2.1)
Version 1.2 (https://github.com/linqs/psl/tree/1.2)
Version 1.1.1 (https://github.com/linqs/psl/tree/1.1.1)
Version 1.1 (https://github.com/linqs/psl/tree/1.1)
Version 1.0.2 (https://github.com/linqs/psl/tree/1.0.2)
Version 1.0.1 (https://github.com/linqs/psl/tree/1.0.1)
Version 1.0 (https://github.com/linqs/psl/tree/1.0)
Maven allows several ways to specify acceptable versions for dependencies. This page discusses the recommended options for specifying which PSL version to use.
If you are working on a paper or code that requires exact reproducibility, then you should specify an exact version of PSL.
For example:
<dependencies>
    ...
    <dependency>
        <groupId>org.linqs</groupId>
        <artifactId>psl-groovy</artifactId>
        <version>2.1.3</version>
    </dependency>
    ...
</dependencies>
If you want to get bug fixes without worrying about breaking changes, then you can specify a major and minor version while allowing the incremental (patch) version to grow.
For example:
<dependencies>
    ...
    <dependency>
        <groupId>org.linqs</groupId>
        <artifactId>psl-groovy</artifactId>
        <version>[2.1,2.2)</version>
    </dependency>
    ...
</dependencies>
If you want the latest stable code and can tolerate the occasional breakage, then you can specify just the major version.
For example:
<dependencies>
    ...
    <dependency>
        <groupId>org.linqs</groupId>
        <artifactId>psl-groovy</artifactId>
        <version>[2,3)</version>
    </dependency>
    ...
</dependencies>
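In Maven's range syntax, [ and ] mark inclusive bounds, ( and ) mark exclusive bounds, and an empty bound is unbounded. So [2.1,) admits 2.1 and every later version, including 3.x, while [2.1,2.2) stays within the 2.1 line. The simplified sketch below evaluates two-bound ranges of purely numeric versions; it ignores qualifiers such as -SNAPSHOT and Maven's other comparison rules, and VersionRangeSketch is a hypothetical name rather than part of the Maven API.

```java
// Hypothetical, simplified evaluation of Maven-style version ranges (numeric versions only).
class VersionRangeSketch {
    // Compare dotted numeric versions component by component; missing components count as 0.
    static int compare(String v1, String v2) {
        String[] a = v1.split("\\.");
        String[] b = v2.split("\\.");
        for (int i = 0; i < Math.max(a.length, b.length); i++) {
            int x = i < a.length ? Integer.parseInt(a[i]) : 0;
            int y = i < b.length ? Integer.parseInt(b[i]) : 0;
            if (x != y) return Integer.compare(x, y);
        }
        return 0;
    }

    // Test whether version lies in a two-bound range such as "[2.1,2.2)" or "[2,)".
    static boolean contains(String range, String version) {
        boolean lowerInclusive = range.charAt(0) == '[';
        boolean upperInclusive = range.charAt(range.length() - 1) == ']';
        String[] bounds = range.substring(1, range.length() - 1).split(",", -1);
        String lower = bounds[0].trim();
        String upper = bounds[1].trim();
        if (!lower.isEmpty()) {
            int c = compare(version, lower);
            if (c < 0 || (c == 0 && !lowerInclusive)) return false;
        }
        if (!upper.isEmpty()) {
            int c = compare(version, upper);
            if (c > 0 || (c == 0 && !upperInclusive)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(contains("[2.1,)", "3.0") + " " + contains("[2.1,2.2)", "3.0"));
    }
}
```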
If you are doing development and are willing to accept potential bugs, broken builds, and API breakages, then you can use the canary build. See the working with canary page for details on how best to work with the canary build.
<dependencies>
    ...
    <dependency>
        <groupId>org.linqs</groupId>
        <artifactId>psl-groovy</artifactId>
        <version>CANARY</version>
    </dependency>
    ...
</dependencies>
Key: pairedduallearner.warmuprounds
Type: int
Default Value: 0
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.em.PairedDualLearner
Description: Key for Integer property that indicates how many rounds of paired-dual learning to run before beginning to update the weights (parameter K in the ICML paper)
Key: pairedduallearner.admmsteps
Type: int
Default Value: 1
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.em.PairedDualLearner
Description: Key for Integer property that indicates how many steps of ADMM to run for each inner objective before each gradient step (parameter N in the ICML paper)
Key: hardem.adagrad
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.em.HardEM
Description: Key for Boolean property that indicates whether to use AdaGrad subgradient scaling, the adaptive subgradient algorithm of John Duchi, Elad Hazan, Yoram Singer (JMLR 2010). If TRUE, will override other step scheduling options (but not scaling).
Key: bernoullimeanfieldem.mpeinit
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.em.BernoulliMeanFieldEM
Description: Key for Boolean property. If true, the mean field will be reinitialized via MPE inference at each round. If false, each mean will be initialized to 0.5 before the first round.
Key: em.iterations
Type: int
Default Value: 10
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.em.ExpectationMaximization
Description: Key for positive int property for the number of iterations of expectation maximization to perform
Key: em.resetschedule
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.em.ExpectationMaximization
Description: Key for Boolean property that indicates whether to reset the step-size schedule for each EM round. If TRUE, the schedule will be reset to VotedPerceptron#STEP_SIZE_KEY at the start of each round. If FALSE, the schedule will decrease smoothly across rounds, i.e., the schedule will be 1/(round number * num steps + step number). This property has no effect if VotedPerceptron#STEP_SCHEDULE_KEY is false.
Key: em.storeweights
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.em.ExpectationMaximization
Description: Key for Boolean property that indicates whether to store weights along entire optimization path
Key: em.tolerance
Type: double
Default Value: 1e-3
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.em.ExpectationMaximization
Description: Key for positive double property for the minimum absolute change in weights such that EM is considered converged
Key: votedperceptron.augmentloss
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.VotedPerceptron
Description: Key for boolean property indicating whether to add loss augmentation for online large-margin learning
Key: votedperceptron.l2regularization
Type: double
Default Value: 0.0
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.VotedPerceptron
Description: Key for positive double property scaling the L2 regularization (\lambda / 2) * ||w||^2
Key: votedperceptron.l1regularization
Type: double
Default Value: 0.0
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.VotedPerceptron
Description: Key for positive double property scaling the L1 regularization \gamma * |w|
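Together, votedperceptron.l2regularization (lambda) and votedperceptron.l1regularization (gamma) describe the penalty (lambda / 2) * ||w||^2 + gamma * |w|. A small sketch computing that penalty and its (sub)gradient, assuming |w| denotes the L1 norm; RegularizationSketch is hypothetical and not PSL code:

```java
// Hypothetical sketch of the combined L2 + L1 penalty; not PSL's VotedPerceptron code.
class RegularizationSketch {
    // (lambda / 2) * ||w||^2 + gamma * |w|_1
    static double penalty(double[] w, double lambda, double gamma) {
        double sumSquares = 0.0;
        double sumAbs = 0.0;
        for (double wi : w) {
            sumSquares += wi * wi;
            sumAbs += Math.abs(wi);
        }
        return 0.5 * lambda * sumSquares + gamma * sumAbs;
    }

    // Gradient of the L2 term plus a subgradient of the L1 term (sign(w_i), taken as 0 at w_i = 0).
    static double[] gradient(double[] w, double lambda, double gamma) {
        double[] g = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            g[i] = lambda * w[i] + gamma * Math.signum(w[i]);
        }
        return g;
    }

    public static void main(String[] args) {
        System.out.println(penalty(new double[] {1.0, -2.0}, 0.5, 0.1));
    }
}
```

With both keys at their default of 0.0 the penalty and its gradient vanish, so learning is unregularized.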
Key: votedperceptron.stepsize
Type: double
Default Value: 1.0
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.VotedPerceptron
Description: Key for positive double property which will be multiplied with the objective gradient to compute a step.
Key: votedperceptron.schedule
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.VotedPerceptron
Description: Key for Boolean property that indicates whether to shrink the stepsize by a 1/t schedule.
Key: votedperceptron.scalegradient
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.VotedPerceptron
Description: Key for Boolean property that indicates whether to scale gradient by number of groundings
Key: votedperceptron.averagesteps
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.VotedPerceptron
Description: Key for Boolean property that indicates whether to average all visited weights together for final output.
Key: votedperceptron.numsteps
Type: int
Default Value: 25
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.VotedPerceptron
Description: Key for positive integer property. VotedPerceptron will take this many steps to learn weights.
Key: votedperceptron.nonnegativeweights
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.VotedPerceptron
Description: Key for boolean property. If true, only non-negative weights will be learned.
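The votedperceptron.stepsize, .schedule, .numsteps, .averagesteps, and .nonnegativeweights keys all act on one update loop. A minimal sketch of how they could interact, assuming plain gradient descent on a loss whose gradient the caller supplies and averaging the weights reached after each step. PSL's actual gradient comes from rule groundings, and regularization, loss augmentation, and gradient scaling are omitted here; VotedPerceptronSketch is a hypothetical name.

```java
import java.util.function.UnaryOperator;

// Hypothetical sketch of how the votedperceptron.* options could shape an update loop;
// not PSL's VotedPerceptron. The caller supplies the gradient of the loss to minimize.
class VotedPerceptronSketch {
    static double[] learn(double[] initial, UnaryOperator<double[]> gradient, double stepSize,
                          int numSteps, boolean schedule, boolean averageSteps,
                          boolean nonNegative) {
        if (numSteps < 1) throw new IllegalArgumentException("numSteps must be positive");
        double[] w = initial.clone();
        double[] sum = new double[w.length];
        for (int t = 1; t <= numSteps; t++) {
            double[] g = gradient.apply(w.clone());  // pass a copy so the caller cannot alias w
            double eta = schedule ? stepSize / t : stepSize;  // 1/t shrinking schedule
            for (int i = 0; i < w.length; i++) {
                w[i] -= eta * g[i];
                if (nonNegative && w[i] < 0.0) {
                    w[i] = 0.0;  // project back onto non-negative weights
                }
                sum[i] += w[i];
            }
        }
        if (!averageSteps) return w;
        for (int i = 0; i < sum.length; i++) {
            sum[i] /= numSteps;  // average of the weights after every step
        }
        return sum;
    }

    public static void main(String[] args) {
        // Minimize (w - 3)^2 / 2, whose gradient is w - 3.
        double[] w = learn(new double[] {0.0}, v -> new double[] {v[0] - 3.0}, 1.0, 25, true, true, true);
        System.out.println(w[0]);
    }
}
```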
Key: maxspeudolikelihood.bool
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.MaxPseudoLikelihood
Description: Boolean property. If true, MaxPseudoLikelihood will treat RandomVariableAtoms as boolean valued. Note that this restricts the types of constraints supported.
Key: maxspeudolikelihood.numsamples
Type: int
Default Value: 10
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.MaxPseudoLikelihood
Description: Key for positive integer property. MaxPseudoLikelihood will sample this many values to approximate the integrals in the marginal computation.
Key: maxspeudolikelihood.constrainttolerance
Type: double
Default Value: 1e-5
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.MaxPseudoLikelihood
Description: Key for constraint violation tolerance
Key: maxspeudolikelihood.minwidth
Type: double
Default Value: 1e-2
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.MaxPseudoLikelihood
Description: Key for positive double property. Used as minimum width for bounds of integration.
Key: maxmargin.tolerance
Type: double
Default Value: 1e-3
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.MaxMargin
Description: Key for double property, cutting plane tolerance
Key: maxmargin.slackpenalty
Type: double
Default Value: 1
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.MaxMargin
Description: Key for double property, slack penalty C, where objective is ||w|| + C (slack)
Key: maxmargin.maxiter
Type: int
Default Value: 500
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.MaxMargin
Description: Key for positive integer, maximum number of constraints to add to quadratic program
Key: maxmargin.nonnegativeweights
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.MaxMargin
Description: Key for boolean property. If true, only non-negative weights will be learned.
Key: maxmargin.scalenorm
Type: NormScalingType
Default Value: NormScalingType.NONE
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.MaxMargin
Description: Key for NormScalingType enum property. Determines type of norm scaling MaxMargin will use in its objective.
Key: maxmargin.squareslack
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.MaxMargin
Description: Key for SquareSlack boolean property. Determines whether to penalize slack linearly or quadratically.
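Taking the maxmargin.slackpenalty description literally, the objective is ||w|| + C * slack, and maxmargin.squareslack switches the slack term from linear to quadratic. A sketch that only evaluates that objective, assuming ||w|| is the Euclidean norm; MaxMarginObjectiveSketch is hypothetical and does not build or solve the quadratic program.

```java
// Hypothetical sketch of the objective described for maxmargin.slackpenalty and
// maxmargin.squareslack; it only evaluates the objective, it does not solve the QP.
class MaxMarginObjectiveSketch {
    static double objective(double[] w, double slack, double slackPenalty, boolean squareSlack) {
        double sumSquares = 0.0;
        for (double wi : w) sumSquares += wi * wi;
        double norm = Math.sqrt(sumSquares);  // assumption: ||w|| is the Euclidean norm
        double slackTerm = squareSlack ? slack * slack : slack;  // quadratic vs linear penalty
        return norm + slackPenalty * slackTerm;
    }

    public static void main(String[] args) {
        double[] w = {3.0, 4.0};
        System.out.println(objective(w, 2.0, 1.0, false) + " " + objective(w, 2.0, 1.0, true));
    }
}
```

Squaring the slack penalizes large violations more heavily but small ones (below 1) more lightly than the linear form.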
Key: minnormprog.conicprogramsolver
Type: ConicProgramSolverFactory
Default Value: new HomogeneousIPMFactory()
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.MinNormProgram
Description: Key for ConicProgramSolverFactory or String property. Should be set to a ConicProgramSolverFactory (or the binary name of one). The ConicReasoner will use this ConicProgramSolverFactory to instantiate a ConicProgramSolver.
Key: frankwolfe.tolerance
Type: double
Default Value: 1e-5
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.FrankWolfe
Description: Key for double property, cutting plane tolerance
Key: frankwolfe.maxiter
Type: int
Default Value: 500
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.FrankWolfe
Description: Key for positive integer, maximum iterations
Key: frankwolfe.averageweights
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.FrankWolfe
Description: Key for boolean property. If true, algorithm will output average weights when learning exceeds maximum number of iterations.
Key: frankwolfe.nonnegativeweights
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.FrankWolfe
Description: Key for boolean property. If true, only non-negative weights will be learned.
Key: frankwolfe.normalize
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.FrankWolfe
Description: Key for boolean property. If true, loss and gradient will be normalized by number of labels.
Key: frankwolfe.regparam
Type: double
Default Value: 1
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.FrankWolfe
Description: Key for double property, regularization parameter \lambda, where objective is \lambda*||w|| + (slack)
Key: l1maxmargin.balanceloss
Type: LossBalancingType
Default Value: LossBalancingType.NONE
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxmargin.L1MaxMargin
Description: Key for LossBalancingType enum property. Determines the type of loss balancing MaxMargin will use. @see LossBalancingType
Key: weightlearning.reasoner
Type: ReasonerFactory
Default Value: new ADMMReasonerFactory()
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.WeightLearningApplication
Description: Key for Factory or String property. Should be set to a ReasonerFactory or the fully qualified name of one. Will be used to instantiate a Reasoner. This reasoner will be used when constructing ground models for weight learning, unless this behavior is overridden by a subclass.
Key: lazympeinference.reasoner
Type: ReasonerFactory
Default Value: new ADMMReasonerFactory()
Module: psl-core
Defining Class: org.linqs.psl.application.inference.LazyMPEInference
Description: Key for Factory or String property. Should be set to a ReasonerFactory or the fully qualified name of one. Will be used to instantiate a Reasoner.
Key: lazympeinference.maxrounds
Type: int
Default Value: 100
Module: psl-core
Defining Class: org.linqs.psl.application.inference.LazyMPEInference
Description: Key for int property for the maximum number of rounds of inference.
Key: mpeinference.reasoner
Type: ReasonerFactory
Default Value: new ADMMReasonerFactory()
Module: psl-core
Defining Class: org.linqs.psl.application.inference.MPEInference
Description: Key for Factory or String property. Should be set to a ReasonerFactory or the fully qualified name of one. Will be used to instantiate a Reasoner.
Key: LTNmaxspeudolikelihood.bool
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetworkMaxPseudoLikelihood
Description: Boolean property. If true, MaxPseudoLikelihood will treat RandomVariableAtoms as boolean valued. Note that this restricts the types of constraints supported.
Key: LTNmaxspeudolikelihood.numsamples
Type: int
Default Value: 10
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetworkMaxPseudoLikelihood
Description: Key for positive integer property. MaxPseudoLikelihood will sample this many values to approximate the integrals in the marginal computation.
Key: LTNmaxspeudolikelihood.constrainttolerance
Type: double
Default Value: 1e-5
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetworkMaxPseudoLikelihood
Description: Key for constraint violation tolerance
Key: LTNmaxspeudolikelihood.minwidth
Type: double
Default Value: 1e-2
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetworkMaxPseudoLikelihood
Description: Key for positive double property. Used as minimum width for bounds of integration.
Key: CONFIG_PREFIX + ".lowerboundepsilon"
Type: double
Default Value: 1e-6
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.reasoner.admm.LatentTopicNetworkADMMReasoner
Description: Key for positive double property. Minimum value that theta and phi parameters are allowed to take, enforced by clipping the consensus variables to this.
Key: latentTopicNetworks.hingeLossTheta
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for Boolean property indicating whether to use a hinge-loss MRF to model theta.
Key: latentTopicNetworks.hingeLossPhi
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for Boolean property indicating whether to use a hinge-loss MRF to model phi.
Key: latentTopicNetworks.numIterations
Type: int
Default Value: 200
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for positive integer property indicating the number of EM iterations to perform.
Key: latentTopicNetworks.numBurnIn
Type: int
Default Value: 0
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for non-negative integer property indicating the number of vanilla LDA EM iterations to perform before using hinge losses in the M step.
Key: latentTopicNetworks.numTopics
Type: int
Default Value: 20
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for positive integer property indicating the number of topics.
Key: latentTopicNetworks.alpha
Type: double
Default Value: 1.01
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for positive double property, the Dirichlet prior hyperparameter alpha.
Key: latentTopicNetworks.beta
Type: double
Default Value: 1.01
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for positive double property, the Dirichlet prior hyperparameter beta.
Key: latentTopicNetworks.weightLearning
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for Boolean property indicating whether to perform pseudo-likelihood weight learning in the EM loop.
Key: latentTopicNetworks.firstWLearningIter
Type: int
Default Value: 50
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for positive integer property indicating the number of EM iterations to perform before performing weight learning.
Key: latentTopicNetworks.WLearningGap
Type: int
Default Value: 10
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for positive integer property indicating the number of EM iterations to perform between weight learning steps.
Key: latentTopicNetworks.initMStepToLDAtheta
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for Boolean property indicating whether to initialize the ADMM variables to LDA, for theta. The alternative is to initialize at the previous iteration. LDA initialization may be best in high dimensions, while previous iteration initialization may be best with strong weights.
Key: latentTopicNetworks.initMStepToLDAphi
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for Boolean property indicating whether to initialize the ADMM variables to LDA, for phi. The alternative is to initialize at the previous iteration. LDA initialization may be best in high dimensions, while previous iteration initialization may be best with strong weights.
Key: latentTopicNetworks.saveDir
Type: String
Default Value: ""
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetwork
Description: Key for string property indicating the directory to save intermediate topic models (if empty, do not save them).
Key: LTNmaxspeudolikelihood.bool
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetworkMaxPseudoLikelihood_Naive
Description: Boolean property. If true, MaxPseudoLikelihood will treat RandomVariableAtoms as boolean valued. Note that this restricts the types of constraints supported.
Key: LTNmaxspeudolikelihood.numsamples
Type: int
Default Value: 10
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetworkMaxPseudoLikelihood_Naive
Description: Key for positive integer property. MaxPseudoLikelihood will sample this many values to approximate the integrals in the marginal computation.
Key: LTNmaxspeudolikelihood.constrainttolerance
Type: double
Default Value: 1e-5
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetworkMaxPseudoLikelihood_Naive
Description: Key for constraint violation tolerance
Key: LTNmaxspeudolikelihood.minwidth
Type: double
Default Value: 1e-2
Module: psl-core
Defining Class: org.linqs.psl.application.topicmodel.LatentTopicNetworkMaxPseudoLikelihood_Naive
Description: Key for positive double property. Used as minimum width for bounds of integration.
Key: uaiformatreasoner.task
Type: Task
Default Value: Task.MPE
Module: psl-core
Defining Class: org.linqs.psl.reasoner.bool.UAIFormatReasoner
Description: Key for Task enum property specifying the reasoner task to perform.
Key: uaiformatreasoner.seed
Type: int
Default Value: 0
Module: psl-core
Defining Class: org.linqs.psl.reasoner.bool.UAIFormatReasoner
Description: Key for integer property specifying the random seed for the reasoner.
Key: booleanmaxwalksat.maxflips
Type: int
Default Value: 50000
Module: psl-core
Defining Class: org.linqs.psl.reasoner.bool.BooleanMaxWalkSat
Description: Key for positive integer property that is the maximum number of flips to try during optimization
Key: booleanmaxwalksat.noise
Type: double
Default Value: (double) 1 / 100
Module: psl-core
Defining Class: org.linqs.psl.reasoner.bool.BooleanMaxWalkSat
Description: Key for double property in [0,1] that is the probability of randomly perturbing an atom in a randomly chosen potential
Key: booleanmcsat.numsamples
Type: int
Default Value: 2500
Module: psl-core
Defining Class: org.linqs.psl.reasoner.bool.BooleanMCSat
Description: Key for length of Markov chain
Key: booleanmcsat.numburnin
Type: int
Default Value: 500
Module: psl-core
Defining Class: org.linqs.psl.reasoner.bool.BooleanMCSat
Description: Number of burn-in samples
Key: ad3reasoner.algorithm
Type: Algorithm
Default Value: Algorithm.AD3
Module: psl-core
Defining Class: org.linqs.psl.reasoner.bool.AD3Reasoner
Description: Key for Algorithm enum property specifying the inference algorithm to use.
Key: conicreasoner.conicprogramsolver
Type: ConicProgramSolverFactory
Default Value: new HomogeneousIPMFactory()
Module: psl-core
Defining Class: org.linqs.psl.reasoner.conic.ConicReasoner
Description: Key for org.linqs.psl.config.Factory or String property. Should be set to a org.linqs.psl.optimizer.conic.ConicProgramSolverFactory (or the binary name of one). The ConicReasoner will use this org.linqs.psl.optimizer.conic.ConicProgramSolverFactory to instantiate a org.linqs.psl.optimizer.conic.ConicProgramSolver, which will then be used for inference.
Key: admmreasoner.maxiterations
Type: int
Default Value: 25000
Module: psl-core
Defining Class: org.linqs.psl.reasoner.admm.ADMMReasoner
Description: Key for int property for the maximum number of iterations of ADMM to perform in a round of inference
Key: admmreasoner.stepsize
Type: double
Default Value: 1
Module: psl-core
Defining Class: org.linqs.psl.reasoner.admm.ADMMReasoner
Description: Key for non-negative double property. Controls step size. Higher values result in larger steps.
Key: admmreasoner.epsilonabs
Type: double
Default Value: 1e-5
Module: psl-core
Defining Class: org.linqs.psl.reasoner.admm.ADMMReasoner
Description: Key for positive double property. Absolute error component of stopping criteria.
Key: admmreasoner.epsilonrel
Type: double
Default Value: 1e-3
Module: psl-core
Defining Class: org.linqs.psl.reasoner.admm.ADMMReasoner
Description: Key for positive double property. Relative error component of stopping criteria.
Key: admmreasoner.stopcheck
Type: int
Default Value: 1
Module: psl-core
Defining Class: org.linqs.psl.reasoner.admm.ADMMReasoner
Description: Key for positive integer. The number of ADMM iterations after which the termination criteria will be checked.
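The admmreasoner.epsilonabs, admmreasoner.epsilonrel, and admmreasoner.stopcheck keys combine into a stopping test. The sketch below uses the standard ADMM criterion from Boyd et al., in which each residual is compared against sqrt(n) * epsilonAbs + epsilonRel * scale, where the scale stands for the norms of the relevant primal or dual variables. Whether PSL's ADMMReasoner uses exactly these scale terms, and a single dimension n for both tests, is an assumption; AdmmStopSketch is hypothetical.

```java
// Hypothetical sketch of an ADMM stopping test in the style of Boyd et al.; not PSL's ADMMReasoner.
class AdmmStopSketch {
    static boolean shouldStop(int iteration, int stopCheck, int numVariables,
                              double primalResidual, double dualResidual,
                              double primalScale, double dualScale,
                              double epsilonAbs, double epsilonRel) {
        // Only run the test every stopCheck iterations (admmreasoner.stopcheck).
        if (iteration % stopCheck != 0) return false;
        double sqrtN = Math.sqrt(numVariables);
        // Each tolerance mixes an absolute part (epsilonabs) with a relative part (epsilonrel).
        double epsPrimal = sqrtN * epsilonAbs + epsilonRel * primalScale;
        double epsDual = sqrtN * epsilonAbs + epsilonRel * dualScale;
        return primalResidual <= epsPrimal && dualResidual <= epsDual;
    }

    public static void main(String[] args) {
        System.out.println(shouldStop(10, 1, 100, 1e-5, 1e-5, 1.0, 1.0, 1e-5, 1e-3));
    }
}
```

Raising stopcheck trades a few possibly wasted iterations for fewer residual computations.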
Key: admmreasoner.numthreads
Type: int
Default Value: Runtime.getRuntime().availableProcessors()
Module: psl-core
Defining Class: org.linqs.psl.reasoner.admm.ADMMReasoner
Description: Key for positive integer. Number of threads used to run the optimization.
Key: executablereasoner.executable
Type: String
Default Value: none (must be specified)
Module: psl-core
Defining Class: org.linqs.psl.reasoner.ExecutableReasoner
Description: Key for String property which is the path to the reasoner executable. This is one of the few PSL properties that must be specified.
Key: atomeventframework.activation
Type: double
Default Value: 0.01
Module: psl-core
Defining Class: org.linqs.psl.model.atom.AtomEventFramework
Description: Key for double property in (0,1]. Activation events will be generated for RandomVariableAtoms when they meet or exceed this threshold.
Key: blocksolver.maxcgiter
Type: int
Default Value: 1000000
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.solver.BlockSolver
Description: Key for integer property. The BlockSolver will throw an exception if the conjugate gradient solver completes this many iterations without solving the normal system.
Key: blocksolver.cgreltol
Type: double
Default Value: 10e-10
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.solver.BlockSolver
Description: Key for double property. The conjugate gradient solver will terminate as converged if the residual is less than this value times the initial residual.
Key: blocksolver.cgabstol
Type: double
Default Value: 10e-50
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.solver.BlockSolver
Description: Key for double property. The conjugate gradient solver will terminate as converged if the residual is less than this value.
Key: blocksolver.cgdivtol
Type: double
Default Value: 10e5
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.solver.BlockSolver
Description: Key for double property. The BlockSolver will throw an exception if the conjugate gradient solver reaches an iterate whose residual is at least this value times the initial residual.
Key: blocksolver.preconditionerterms
Type: int
Default Value: 1
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.solver.BlockSolver
Description: Key for non-negative integer property. The BlockSolver preconditions the Schur complement matrix by a truncated series summation. Higher values generally result in fewer conjugate gradient iterations, but each iteration is more time consuming.
Key: cgsolver.maxcgiter
Type: int
Default Value: 1000000
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.solver.ConjugateGradient
Description: Key for integer property. The ConjugateGradient solver will throw an exception if the conjugate gradient solver completes this many iterations without solving the normal system.
Key: cgsolver.cgreltol
Type: double
Default Value: 10e-10
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.solver.ConjugateGradient
Description: Key for double property. The ConjugateGradient solver will terminate as converged if the residual is less than this value times the initial residual.
Key: cgsolver.cgabstol
Type: double
Default Value: 10e-50
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.solver.ConjugateGradient
Description: Key for double property. The ConjugateGradient solver will terminate as converged if the residual is less than this value.
Key: cgsolver.cgdivtol
Type: double
Default Value: 10e5
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.solver.ConjugateGradient
Description: Key for double property. The ConjugateGradient solver will throw an exception if the conjugate gradient solver reaches an iterate whose residual is at least this value times the initial residual.
Key: cgsolver.preconditioner
Type: PreconditionerFactory
Default Value: new IdentityPreconditionerFactory()
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.solver.ConjugateGradient
Description: Key for Factory or String property. Should be set to a PreconditionerFactory or the fully qualified name of one. Will be used to instantiate a DoublePreconditioner.
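The four cgsolver.* stopping rules above (maxcgiter, cgreltol, cgabstol, cgdivtol) can be easier to reason about in code. Below is a minimal sketch, not PSL's ConjugateGradient class, of plain conjugate gradient with no preconditioning (the analogue of the default identity preconditioner) on a small dense symmetric positive-definite matrix. The class name CgSketch and the use of IllegalStateException are hypothetical choices for illustration.

```java
import java.util.Arrays;

/** Hypothetical CG sketch illustrating the cgsolver.* stopping rules; not PSL's implementation. */
class CgSketch {
    /** Solves a x = b for a symmetric positive-definite matrix a. */
    static double[] solve(double[][] a, double[] b,
                          int maxIter, double relTol, double absTol, double divTol) {
        int n = b.length;
        double[] x = new double[n];             // initial guess x = 0, so the residual r = b
        double[] r = Arrays.copyOf(b, n);
        double[] p = Arrays.copyOf(r, n);       // first search direction
        double rr = dot(r, r);
        double initialNorm = Math.sqrt(rr);
        for (int iter = 0; ; iter++) {
            double norm = Math.sqrt(rr);
            if (norm < relTol * initialNorm || norm < absTol) {
                return x;                       // converged (cgreltol / cgabstol)
            }
            if (norm >= divTol * initialNorm) { // diverged (cgdivtol)
                throw new IllegalStateException("diverged at iteration " + iter);
            }
            if (iter == maxIter) {              // iteration budget exhausted (maxcgiter)
                throw new IllegalStateException("no convergence after " + maxIter + " iterations");
            }
            double[] ap = multiply(a, p);
            double alpha = rr / dot(p, ap);     // step length along p
            for (int i = 0; i < n; i++) {
                x[i] += alpha * p[i];
                r[i] -= alpha * ap[i];
            }
            double rrNew = dot(r, r);
            double beta = rrNew / rr;           // keeps the next direction A-conjugate
            for (int i = 0; i < n; i++) {
                p[i] = r[i] + beta * p[i];
            }
            rr = rrNew;
        }
    }

    private static double dot(double[] u, double[] v) {
        double s = 0.0;
        for (int i = 0; i < u.length; i++) {
            s += u[i] * v[i];
        }
        return s;
    }

    private static double[] multiply(double[][] a, double[] v) {
        double[] out = new double[v.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = dot(a[i], v);
        }
        return out;
    }
}
```

For example, solving the 2x2 system {{4, 1}, {1, 3}} x = {1, 2} with relTol 1e-10, absTol 1e-50, and divTol 1e5 returns approximately {1/11, 7/11}; setting maxIter to 0 makes the sketch throw before taking a single step.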
Key: ppipm.threadpoolsize
Type: int
Default Value: 1
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.ParallelPartitionedIPM
Description:
Key: hipm.dualize
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.HomogeneousIPM
Description: Key for boolean property. If true, the IPM will dualize the conic program before solving it. The IPM will substitute the results back into the original problem, so this should only affect the computational cost of #solve(ConicProgram), not the quality of the solution. @see Dualizer
Key: hipm.infeasibilitythreshold
Type: double
Default Value: 10e-8
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.HomogeneousIPM
Description: Key for double property. The IPM will consider the problem primal, dual, or gap feasible if the primal, dual, or gap infeasibility is less than its value, respectively.
Key: hipm.gapthreshold
Type: double
Default Value: 10e-6
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.HomogeneousIPM
Description: Key for double property. The IPM will iterate until the duality gap is less than its value.
Key: hipm.tauthreshold
Type: double
Default Value: 10e-8
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.HomogeneousIPM
Description: Key for double property. The IPM will multiply its value by another value and consider tau small if tau is less than that product.
Key: hipm.muthreshold
Type: double
Default Value: 10e-8
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.HomogeneousIPM
Description: Key for double property. The IPM will consider mu small if mu is less than its value times the initial mu.
Key: hipm.beta
Type: double
Default Value: 10e-8
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.HomogeneousIPM
Description: Key for double property in (0,1). The IPM will stay in a neighborhood of the central path, the size of which is defined by its value. Larger values correspond to smaller neighborhoods.
Key: hipm.delta
Type: double
Default Value: 0.5
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.HomogeneousIPM
Description: Key for double property in [0,1]. The IPM will use its value to determine how aggressively to minimize the objective (versus to follow the central path). Lower values correspond to more aggressive strategies.
Key: hipm.normalsolver
Type: NormalSystemSolverFactory
Default Value: new CholeskyFactory()
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.HomogeneousIPM
Description: Key for Factory or String property. Should be set to a NormalSystemSolverFactory or the fully qualified name of one. Will be used to instantiate a NormalSystemSolver.
Key: ipm.initfeasible
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.IPM
Description: Key for boolean property. If true, the IPM will initialize the conic program to a feasible point before solving it. @see FeasiblePointInitializer
Key: ipm.dualize
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.IPM
Description: Key for boolean property. If true, the IPM will dualize the conic program before solving it. The IPM will substitute the results back into the original problem, so this should only affect the computational cost of #solve(ConicProgram), not the quality of the solution. @see Dualizer
Key: ipm.dualitygapthreshold
Type: double
Default Value: 0.0001
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.IPM
Description: Key for double property. The IPM will iterate until the duality gap is less than its value.
Key: ipm.infeasibilitythreshold
Type: double
Default Value: 10e-8
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.IPM
Description: Key for double property. The IPM will iterate until the primal and dual infeasibilities are each less than its value. @see ConicProgram#getPrimalInfeasibility() @see ConicProgram#getDualInfeasibility()
Key: cgipm.maxcgiter
Type: int
Default Value: 1000000
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.cg.ConjugateGradientIPM
Description: Key for integer property. The ConjugateGradientIPM will throw an exception if the conjugate gradient solver completes this many iterations without solving the normal system.
Key: cgipm.cgreltol
Type: double
Default Value: 1e-10
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.cg.ConjugateGradientIPM
Description: Key for double property. The conjugate gradient solver will terminate as converged if the residual is less than this value times the initial residual.
Key: cgipm.cgabstol
Type: double
Default Value: 1e-50
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.cg.ConjugateGradientIPM
Description: Key for double property. The conjugate gradient solver will terminate as converged if the residual is less than this value.
Key: cgipm.cgdivtol
Type: double
Default Value: 1e5
Module: psl-core
Defining Class: org.linqs.psl.optimizer.conic.ipm.cg.ConjugateGradientIPM
Description: Key for double property. The ConjugateGradientIPM will throw an exception if the conjugate gradient solver reaches an iterate whose residual is at least this value times the initial residual.
Key: rdbmsdatastore.valuecolumn
Type: String
Default Value: truth
Module: psl-core
Defining Class: org.linqs.psl.database.rdbms.RDBMSDataStore
Description: Key for String property for the name of the value column in the database.
Key: rdbmsdatastore.confidencecolumn
Type: String
Default Value: confidence
Module: psl-core
Defining Class: org.linqs.psl.database.rdbms.RDBMSDataStore
Description: Key for String property for the name of the confidence column in the database.
Key: rdbmsdatastore.partitioncolumn
Type: String
Default Value: partition
Module: psl-core
Defining Class: org.linqs.psl.database.rdbms.RDBMSDataStore
Description: Key for String property for the name of the partition column in the database.
Key: rdbmsdatastore.usestringids
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.database.rdbms.RDBMSDataStore
Description: Key for boolean property of whether to use RDBMSUniqueStringID as a UniqueID.
See the Configuration Options page for all options that PSL uses.
Many components of the PSL software have modifiable parameters and options, called properties. Every property has a key, which is a string that should uniquely identify it.
These keys are organized into a namespace hierarchy, with each level separated by dots, e.g. <namespace>.<option>.
Each PSL class can specify a namespace for the options used by the class and its subclasses. For example, the org.linqs.psl.application.learning.weight.maxlikelihood.VotedPerceptron weight learning class specifies the namespace votedperceptron. Setting the configuration option votedperceptron.stepsize allows you to control the size of the gradient descent update step in the VotedPerceptron weight learning class.
Every property has a type and a default value, which is the value the object will use unless a user overrides it. Every class with properties documents them by declaring their keys as public static final Strings, with Javadoc comments describing the corresponding property's type and semantics. Another public static final member declares the default value for that property.
Users of the PSL software can specify property values by grouping them into bundles, which are objects that implement the org.linqs.psl.config.ConfigBundle interface. Every bundle has a name and a map from property keys to values. A configurable component takes a ConfigBundle as an argument in its constructor and queries it with a property key and a default value. If the bundle does not map the key to a value, it returns the provided default, e.g.
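ConfigBundle cb = ConfigManager.getManager().getBundle("example");
// Returns 100 if the bundle does not map votedperceptron.stepsize to a value.
def stepsize = cb.getProperty("votedperceptron.stepsize", 100);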
PSL components also pass their bundles to components that they create, so a user can group their property values into a single bundle, pass it into a component with which they interact, and the values will be used by the entire stack of components. Any properties that don't belong to a particular component will be ignored by that component.
PSL projects can specify different configuration bundles in a file named psl.properties on the classpath. The standard location for this file is <project root>/src/main/resources/psl.properties. Each key-value pair should be specified on its own line in the format <bundle>.<namespace>.<option> = <value>. The following example sets options for the example and test bundles.
# This is an example properties file for PSL.
#
# Options are specified in a namespace hierarchy, with levels separated by '.'.
# The top levels are called bundles. Use the ConfigManager class to access them.
# Weight learning parameters
# Parameters for voted perceptron algorithm
# This property adaptively changes the step size of the updates
example.votedperceptron.schedule = true
# This property specifies the number of iterations of voted perceptron updates
example.votedperceptron.numsteps = 700
# This property specifies the initial step size of the voted perceptron updates
example.votedperceptron.stepsize = 0.1
# Parameters for the Hard-EM weight learning algorithm
# This property specifies the number of Hard-EM updates
test.em.iterations = 1000
# This property specifies the tolerance to check for convergence for Hard-EM
test.em.tolerance = 1e-5
# This property specifies the number of iterations of voted perceptron updates
test.votedperceptron.numsteps = 1000
The standard way to create bundles is with an instance of the org.linqs.psl.config.ConfigManager class. ConfigManager uses the Singleton pattern. The ConfigManager instance will read psl.properties to generate bundles. Then a bundle can be instantiated with the code
ConfigBundle bundle = ConfigManager.getManager().getBundle("example");
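To make the <bundle>.<namespace>.<option> convention concrete, here is a minimal sketch that groups the entries of a psl.properties-style file into bundles using java.util.Properties, then looks up a key with a fallback default, mirroring the getProperty(key, default) behavior described above. The class BundleSketch is hypothetical and is not how ConfigManager is implemented.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch of grouping properties into bundles; not PSL's ConfigManager. */
class BundleSketch {
    /** Splits "<bundle>.<namespace>.<option> = value" entries into one map per bundle name. */
    static Map<String, Map<String, String>> parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));   // skips '#' comments, trims around '='
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, Map<String, String>> bundles = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            int dot = key.indexOf('.');
            if (dot < 0) {
                continue;                          // no bundle prefix; ignored in this sketch
            }
            String bundle = key.substring(0, dot);
            String option = key.substring(dot + 1); // e.g. "votedperceptron.stepsize"
            bundles.computeIfAbsent(bundle, b -> new HashMap<>()).put(option, props.getProperty(key));
        }
        return bundles;
    }

    /** Returns the bundle's value for key, or defaultValue if the bundle does not map it. */
    static String getProperty(Map<String, String> bundle, String key, String defaultValue) {
        return bundle.getOrDefault(key, defaultValue);
    }
}
```

Values stay strings in this sketch; a real component would convert each value to the type its property documents.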
Arithmetic rules can be used to enforce modeling constraints. Many different types of constraints can be modeled; here are a few of the common types:
Let Foo be the binary predicate that we wish to put constraints on. (Constraints are not limited to only binary predicates.)
A Functional constraint enforces the condition that for each possible constant c, the values of all groundings of Foo(A, c) sum to exactly 1.
Foo(A, +c) = 1 .
Note that the rule is unweighted (as indicated by the period at the end).
Summing over the first argument instead of the second one is often called Inverse Functional.
Foo(+c, A) = 1 .
A Partial Functional constraint is like a Functional one, except the values of all groundings of Foo(A, c) sum to 1 or less.
Foo(A, +c) <= 1 .
Foo(+c, A) <= 1 .
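The Functional and Partial Functional conditions can also be checked directly on a set of truth values. The sketch below, with hypothetical names, stores the truth value of Foo(a, c) as values.get(c).get(a), sums over the first argument for each c, and tests the = 1 or <= 1 condition with a small tolerance for floating-point error. It only sees the groundings present in the map it is given.

```java
import java.util.Map;

/** Hypothetical checker for Foo(A, +c) = 1 and Foo(A, +c) <= 1; not part of PSL. */
class FunctionalCheck {
    private static final double EPS = 1e-9;

    /** True if, for every c, the values of Foo(a, c) over all a sum to 1. */
    static boolean isFunctional(Map<String, Map<String, Double>> values) {
        for (Map<String, Double> column : values.values()) {
            if (Math.abs(sum(column) - 1.0) > EPS) {
                return false;
            }
        }
        return true;
    }

    /** True if, for every c, the values of Foo(a, c) over all a sum to at most 1. */
    static boolean isPartialFunctional(Map<String, Map<String, Double>> values) {
        for (Map<String, Double> column : values.values()) {
            if (sum(column) > 1.0 + EPS) {
                return false;
            }
        }
        return true;
    }

    private static double sum(Map<String, Double> column) {
        double total = 0.0;
        for (double v : column.values()) {
            total += v;
        }
        return total;
    }
}
```

Checking the inverse variants only requires keying the outer map by the first argument instead of the second.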
This page serves as a starting place for getting familiar with PSL. We recommend that entirely new users start with the Environment Setup and Configuration sections.
Before you set up a new project, ensure that the prerequisites are met.
The easiest way to get a new project started in PSL is to copy an existing project. The examples are kept up-to-date and exhibit the preferred style for PSL programs. It is recommended to start there and change the program as you go.
To read in the truth values of ground atoms from text files, a DataStore object is required.
DataStore data = new RDBMSDataStore(new H2DatabaseDriver(Type.Disk, "database/path", true), configBundle);
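In the code snippet above, the RDBMSDataStore constructor takes two arguments: the database driver (here an H2DatabaseDriver) and a ConfigBundle.
The H2DatabaseDriver constructor accepts three arguments: the storage type (Type.Disk stores the database on disk), the path of the database, and a boolean indicating whether to clear any existing database at that path.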
After a DataStore is created, we can read in the truth values of ground atoms from text files as follows:
Inserter insert = data.getInserter(<predicateName>, <partition>);
InserterUtils.loadDelimitedData(insert, <filePath>);
<predicateName> is the predicate whose ground atoms are to be read.
<partition> is the partition to write the data into.
<filePath> is the path of the file containing the ground atoms.
All data read in with InserterUtils.loadDelimitedData is assumed to have a truth value of 1.
If you also need to read in specific truth values, use InserterUtils.loadDelimitedDataTruth.
If you are already comfortable using Git and you don't want or need to push commits to GitHub, then you can just clone the PSL repository using the command below. Otherwise, this short primer on some Git essentials may be useful.
>> git clone https://github.com/linqs/psl.git
Change to the top-level directory of your working copy and run
>> mvn compile
You can install PSL to your local Maven repository by running
>> mvn install
If you're a member of the LINQS group, you may eventually need to release a new version of PSL. There are a number of steps involved in the process, which are detailed in the guide for Releasing a New Stable Version.
Eclipse is an extensible, integrated development environment that can be used to develop PSL and PSL projects. The recommended way of using Eclipse with PSL is to use the Eclipse plugin for Maven to generate Eclipse project information for a PSL project and then import that project into Eclipse.
Ensure that you have version 3.6 (Helios) or higher of Eclipse installed. Then, install the Groovy Eclipse plugin and the optional 1.8 version of the Groovy compiler, which is available when installing the plugin. The version 1.8 compiler is what Maven will use to compile the Groovy scripts, so builds done by either tool should be interchangeable. If you use an older compiler version, Eclipse will probably recompile some files so that they are no longer compatible with the rest, and the program won't run. (Cleaning and rebuilding everything should help.)
You might have to change the Groovy compiler version to 1.8.x in your Groovy compiler preferences (part of the Eclipse preferences).
You need to add a classpath variable in Eclipse to point to your local Maven repository. You can access the variables either from the main options or from the build-path editor for any project. Where you specify additional libs, make a new variable (there should be a button) with the name M2_REPO and the path to your repo (e.g., ~/.m2/repository). This can also be achieved automatically via the following Maven command:
mvn -Declipse.workspace=/path/to/workspace eclipse:configure-workspace
In the top-level directory of your PSL project, run
>> mvn eclipse:eclipse
Then in Eclipse, go to File/Import/General/\
Be sure to run as a "Java application."
Tips
>> mvn eclipse:clean
>> mvn eclipse:eclipse -Declipse.workspace=<path to Eclipse workspace>
The Eclipse plugin for Maven will look in the provided workspace for any projects that match dependencies declared in your project's POM file. Your project will be configured to depend on any such projects found as opposed to their respective installed jars. This way, changes to the sources of those dependencies will be seen by your project without reinstalling the dependencies. Note that this works even for dependencies that were imported but not copied into the workspace.
Before working with PSL, you must make sure that your working environment is properly setup.
This page will walk you through the Groovy version of the Easy Link Prediction example.
First, ensure that your system meets the prerequisites.
Then clone the psl-examples repository:
git clone https://bitbucket.org/linqs/psl-examples.git
Then move into the root directory for the easy link prediction example:
cd psl-examples/link_prediction/easy/groovy
Each example comes with a run.sh
script to quickly compile and run the example.
To compile and run the example:
./run.sh
To see the output of the example, check the output/default/knows_infer.txt file:
cat output/default/knows_infer.txt
You should see some output like:
--- Atoms:
KNOWS(Steve, Ben) Truth=[0.64]
KNOWS(Alex, Dhanya) Truth=[0.36]
< ... 48 rows omitted for brevity ...>
KNOWS(Dhanya, Ben) Truth=[0.55]
KNOWS(Alex, Sabina) Truth=[0.44]
# Atoms: 52
The exact order of the output may change and some rows were left out for brevity.
Now that we have the example running, let's take a look inside the only source file for the example: src/main/java/edu/ucsc/linqs/psl/example/easylp/EasyLP.groovy.
One of the first things you may notice in this file is a private class that holds configuration data. In addition to ConfigBundles, it is sometimes useful to create configuration classes that you can pass around, which makes it quick to change settings and run different experiments. This is not required, but you may find it useful.
In the populateConfigBundle() method you can see the ConfigBundle actually getting created:
ConfigBundle cb = ConfigManager.getManager().getBundle("easylp");
The definePredicates() method defines the three predicates for our example:
model.add predicate: "Lived", types: [ConstantType.UniqueID, ConstantType.UniqueID];
model.add predicate: "Likes", types: [ConstantType.UniqueID, ConstantType.UniqueID];
model.add predicate: "Knows", types: [ConstantType.UniqueID, ConstantType.UniqueID];
Each predicate here takes two unique identifiers as arguments.
The defineRules() method defines seven rules for the example.
There are pages that cover the PSL rule specification and the rule specification in Groovy.
We will discuss the following two rules:
model.add(
rule: ( Lived(P1,L) & Lived(P2,L) & (P1-P2) ) >> Knows(P1,P2),
squared: config.sqPotentials,
weight : config.weightMap["Lived"]
);
model.add(
rule: ~Knows(P1,P2),
squared:config.sqPotentials,
weight: config.weightMap["Prior"]
);
The first rule can be read as "If P1 and P2 are different people and have both lived in the same location, L, then they know each other". Some key points to note from this rule are:
L was reused in both Lived atoms and therefore must refer to the same location.
(P1 - P2) is shorthand for P1 and P2 referring to different people (different unique IDs).
The second rule is a special rule that acts as a prior. Notice how this rule is not an implication like all the other rules. Instead, this rule can be read as "By default, people do not know each other". Therefore, the program starts with the belief that no one knows each other, and this prior belief can be overcome by evidence.
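To see what the first rule expands into, the sketch below grounds it by hand: for each location and each ordered pair of distinct people who lived there, it produces one ground rule. It also scores a single grounding under Lukasiewicz logic: the body Lived(P1,L) & Lived(P2,L) has truth max(0, a + b - 1), and the rule's distance to satisfaction is max(0, body - Knows(P1,P2)), which simplifies to max(0, a + b - 1 - k). A weighted rule contributes weight times that distance, squared when squared is true. The class and method names are hypothetical; PSL grounds rules through database queries, not like this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration of grounding one rule; not PSL's grounding code. */
class LivedRuleSketch {
    /**
     * Grounds Lived(P1,L) & Lived(P2,L) & (P1-P2) >> Knows(P1,P2).
     * lived maps each location to the people observed to have lived there (truth value 1).
     */
    static List<String> ground(Map<String, Set<String>> lived) {
        List<String> groundings = new ArrayList<>();
        for (String location : new TreeSet<>(lived.keySet())) {   // sorted for stable output
            Set<String> people = new TreeSet<>(lived.get(location));
            for (String p1 : people) {
                for (String p2 : people) {
                    if (p1.equals(p2)) {
                        continue;                                 // the (P1 - P2) condition
                    }
                    groundings.add("Lived(" + p1 + "," + location + ") & Lived(" + p2 + ","
                            + location + ") >> Knows(" + p1 + "," + p2 + ")");
                }
            }
        }
        return groundings;
    }

    /** Weighted hinge for one grounding: weight * max(0, lived1 + lived2 - 1 - knows), squared if requested. */
    static double potential(double lived1, double lived2, double knows, double weight, boolean squared) {
        double distance = Math.max(0.0, lived1 + lived2 - 1.0 - knows);
        return weight * (squared ? distance * distance : distance);
    }
}
```

With Jay and Ben both in Maryland and only Jay in California, this yields two groundings (Knows(Ben,Jay) and Knows(Jay,Ben)); California contributes none because it has only one resident.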
The loadData() method loads the data from the flat files in the data directory into the data store that PSL is working with.
For brevity, we will only be looking at two files:
Inserter inserter = ds.getInserter(Lived, obsPartition);
InserterUtils.loadDelimitedData(inserter, Paths.get(config.dataPath, "lived_obs.txt").toString());
inserter = ds.getInserter(Likes, obsPartition);
InserterUtils.loadDelimitedDataTruth(inserter, Paths.get(config.dataPath, "likes_obs.txt").toString());
Both calls load data using the InserterUtils class.
The primary difference between the two calls is that the second one is looking for a truth value while the first one assumes that 1 is the truth value.
If we look in the files, we see lines like:
data/lived_obs.txt
Jay Maryland
Jay California
data/likes_obs.txt
Jay Machine Learning 1
Jay Skeeball 0.8
In lived_obs.txt, there is no need to use a truth value because living somewhere is a discrete act.
You have either lived there or you have not.
Liking something, however, is more continuous.
Jay may like Machine Learning 100%, but he only likes Skeeball 80%.
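As a rough illustration of the two file formats, the sketch below parses one line into the atom's arguments and a truth value: without truth values every atom gets 1.0, and with truth values the last column is read as the truth value. It assumes the columns are tab-separated (which is what lets "Machine Learning" contain a space); the class name is hypothetical and this is not PSL's InserterUtils.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical line parser for the two data formats; not PSL's InserterUtils. */
class DelimitedSketch {
    /** One parsed line: the atom's arguments plus its truth value. */
    static final class Row {
        final List<String> args;
        final double truth;

        Row(List<String> args, double truth) {
            this.args = args;
            this.truth = truth;
        }
    }

    static Row parse(String line, boolean hasTruth) {
        String[] cols = line.split("\t");                  // assumed tab delimiter
        if (!hasTruth) {
            return new Row(Arrays.asList(cols), 1.0);      // like loadDelimitedData: truth is 1
        }
        double truth = Double.parseDouble(cols[cols.length - 1]);  // last column is the truth value
        return new Row(Arrays.asList(cols).subList(0, cols.length - 1), truth);
    }
}
```

For instance, "Jay\tSkeeball\t0.8" parsed with hasTruth set to true yields the arguments [Jay, Skeeball] with truth 0.8, while "Jay\tMaryland" parsed without truth values yields [Jay, Maryland] with truth 1.0.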
Here we must take a moment to talk about data partitions. In PSL, we use partitions to organize data. A partition is nothing more than a container for data, but we use them to keep specific chunks of data together or separate. For example, when running evaluation, we must be sure not to use our test partition in training.
PSL users typically organize their data in at least three different partitions (all of which you can see in this example):
Observations (obsPartition in this example): In this partition we put actual observed data. In this example, we put all the observations about who has lived where, who likes what, and who knows whom in the observations partition.
Targets (targetsPartition in this example): In this partition we put atoms that we want to infer values for. For example, if we want to infer whether Jay and Sammy know each other, then we would put the atom Knows(Jay, Sammy) into the targets partition.
Truth (truthPartition in this example): In this partition we put our test set, data that we have actual values for but are not including in our observations for the purpose of evaluation. For example, if we know that Jay and Sammy actually do know each other, we would put Knows(Jay, Sammy) in the truth partition with a truth value of 1.
The runInference() method handles running inference for all the data we have loaded.
Before we run inference, we have to set up a database to use for inference:
HashSet closed = new HashSet<StandardPredicate>([Lived, Likes]);
Database inferDB = ds.getDatabase(targetsPartition, closed, obsPartition);
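The getDatabase() method of DataStore is the proper way to get a database.
In the call above, its first two arguments are:
the write partition (here targetsPartition), which holds the atoms we want to infer, and
the set of closed predicates (here Lived and Likes), whose atoms are always loaded as observed atoms.
After these, getDatabase() takes any number of read-only partitions that you want to include in this database.
In our example, we want to include our observations when we run inference.
Now we are ready to run inference: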
MPEInference mpe = new MPEInference(model, inferDB, config.cb);
mpe.mpeInference();
mpe.close();
inferDB.close();
To the MPEInference constructor, we supply our model, the database to infer over, and our ConfigBundle.
To see the results, we need to look inside the target partition.
The writeOutput() method handles printing out the results of the inference.
There are two key lines in this method:
Database resultsDB = ds.getDatabase(targetsPartition);
...
Set atomSet = Queries.getAllAtoms(resultsDB, Knows);
The first line gets a fresh database that we can get the atoms from.
Notice that we are passing in targetsPartition as a write partition, but we are actually just reading from it.
The second line uses the Queries class to get all the Knows atoms from the database we just created.
Lastly, the evalResults() method handles seeing how well our model did.
The DiscretePredictionComparator and ContinuousPredictionComparator classes provide basic tools to compare two partitions.
In this example, we are comparing our target partition to our truth partition.
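As a hedged illustration of what comparing a target partition against a truth partition involves, the sketch below takes predicted and true truth values keyed by atom and computes mean squared error (a continuous measure) and accuracy after thresholding both sides (a discrete measure). These particular metrics, the threshold, and treating a missing prediction as 0 are assumptions for illustration; check the DiscretePredictionComparator and ContinuousPredictionComparator Javadoc for what PSL actually reports.

```java
import java.util.Map;

/** Hypothetical evaluation helpers; not PSL's comparator classes. */
class CompareSketch {
    /** Mean squared error over atoms in the truth map; a missing prediction counts as 0. */
    static double meanSquaredError(Map<String, Double> predicted, Map<String, Double> truth) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : truth.entrySet()) {
            double diff = predicted.getOrDefault(e.getKey(), 0.0) - e.getValue();
            total += diff * diff;
        }
        return total / truth.size();
    }

    /** Fraction of atoms whose thresholded prediction matches the thresholded truth value. */
    static double accuracy(Map<String, Double> predicted, Map<String, Double> truth, double threshold) {
        int correct = 0;
        for (Map.Entry<String, Double> e : truth.entrySet()) {
            boolean p = predicted.getOrDefault(e.getKey(), 0.0) >= threshold;
            boolean t = e.getValue() >= threshold;
            if (p == t) {
                correct++;
            }
        }
        return (double) correct / truth.size();
    }
}
```

Both helpers divide by the number of truth atoms, so an empty truth map would produce NaN; a real evaluation would guard against that.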
Example PSL programs are available at https://bitbucket.org/linqs/psl-examples.
Each example contains a script called run.sh which will handle all the building and running.
A detailed walkthrough of an example can be found here.
Customized functions can be created by implementing the ExternalFunction interface.
The getValue() method should return a value in [0, 1].
public class MyStringSimilarity implements ExternalFunction {
@Override
public int getArity() {
return 2;
}
@Override
public ConstantType[] getArgumentTypes() {
        return [ConstantType.String, ConstantType.String] as ConstantType[];
}
@Override
public double getValue(ReadOnlyDatabase db, Constant... args) {
return args[0].toString().equals(args[1].toString()) ? 1.0 : 0.0;
}
}
A function comparing the similarity between two entities or text can then be declared as follows:
model.add function: "MyStringSimilarity", implementation: new MyStringSimilarity();
A function can be used in the same manner as a predicate in rules:
Name(P1, N1) & Name(P2, N2) & MyStringSimilarity(N1, N2) -> SamePerson(P1, P2)
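The exact-match function above only ever returns 0 or 1. For a graded similarity that still stays in [0, 1], as getValue() requires, one common option is normalized Levenshtein similarity: 1 minus the edit distance divided by the length of the longer string. The sketch below is plain Java with a hypothetical class name; in PSL you could call similarity() from getValue() in place of the equality test.

```java
/** Hypothetical normalized Levenshtein similarity; not a PSL built-in. */
class EditSimilarity {
    /** Dynamic-programming Levenshtein distance (insert, delete, substitute each cost 1). */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;                                  // distance from "" to b[0..j)
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;                                  // distance from a[0..i) to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;                             // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]: 1.0 for identical strings, lower as more edits are needed. */
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0;                                   // two empty strings are identical
        }
        return 1.0 - (double) distance(a, b) / longest;
    }
}
```

For example, "kitten" and "sitting" are 3 edits apart, so their similarity is 1 - 3/7, roughly 0.571.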
The PSL software uses concepts from the PSL paper and introduces new ones for advanced data management and machine learning. On this page, we define the commonly used terms and point out the corresponding classes in the codebase.
Please note that this page is organized conceptually, not alphabetically.
Hinge-loss Markov random field: A factor graph defined over continuous variables in the [0,1] interval with (log) factors that are hinge-loss functions. Many classes in PSL work together to implement the functionality of HL-MRFs, but the class for storing collections of hinge-loss potentials, which define HL-MRFs, is GroundRuleStore.java.
Ground atom: A logical relationship corresponding to a random variable in an HL-MRF. For example, Friends("Steve", "Jay") is an alias for a specific random variable. Implemented in GroundAtom.java.
Random variable atom: A ground atom that is unobserved, i.e., no value is known for it. An HL-MRF assigns probability densities to assignments to random variable atoms. Implemented in RandomVariableAtom.java.
Observed atom: A ground atom that has an observed, immutable value. HL-MRFs are conditioned on observed atoms. Implemented in ObservedAtom.java.
Atom: A generalization of ground atoms that allows logical variables as placeholders for constant arguments. For example, Friends("Steve", A) is a placeholder for all the ground atoms that can be obtained by substituting constants for the logical variable A. Implemented in Atom.java.
PSL Program: A set of rules, each of which is a template for hinge-loss potentials or hard linear constraints. When grounded over a base of ground atoms, a PSL program induces an HL-MRF conditioned on any specified observations. Implemented in Model.java.
Rule: See Rule Specification.
Data Store: An entire data repository, such as a relational database management system (RDBMS). Implemented in DataStore.java.
Partition: A logical division of ground atoms in a data store. Implemented in Partition.java.
Database: A logical view of a data store, constructed by specifying a write partition and one or more read partitions of a data store. Implemented in Database.java.
Open Predicate: A predicate whose atoms can be random variable atoms, i.e., unobserved. The only time a ground atom will be loaded as a random variable atom is when it is stored in the database's write partition and its predicate is not specified as closed. Otherwise it will be loaded as an observed atom. Whether a predicate is open or closed is specific to each database.
Closed Predicate: A predicate whose atoms are always observed atoms. The only time a ground atom will be loaded as a random variable atom is when it is stored in the database's write partition and its predicate is not specified as closed. Otherwise it will be loaded as an observed atom. Whether a predicate is open or closed is specific to each database.
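The loading rule stated in both definitions reduces to a single condition. The sketch below, with hypothetical names, decides whether a ground atom would be loaded as a random variable atom or as an observed atom, given the partition it is stored in, the database's write partition, and the database's set of closed predicates.

```java
import java.util.Set;

/** Hypothetical encoding of the open/closed loading rule; not PSL's Database code. */
class AtomKindSketch {
    enum Kind { RANDOM_VARIABLE, OBSERVED }

    static Kind kindOf(String predicate, String atomPartition,
                       String writePartition, Set<String> closedPredicates) {
        boolean inWritePartition = atomPartition.equals(writePartition);
        boolean open = !closedPredicates.contains(predicate);
        // Random variable only if stored in the write partition AND the predicate is open.
        return (inWritePartition && open) ? Kind.RANDOM_VARIABLE : Kind.OBSERVED;
    }
}
```

In the Easy Link Prediction setup (write partition targetsPartition, closed predicates Lived and Likes), a Knows atom in the targets partition is a random variable, while a Knows atom in the observations partition, or a Lived atom anywhere, is observed.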
If you use H2 as the backend database for PSL (as is done in the examples), it can be helpful to open up the resulting database and examine it for debugging purposes.
You should set up your PSL program to use H2 on disk and note where it is stored. For example, if you create your DataStore using the following code
DataStore data = new RDBMSDataStore(new H2DatabaseDriver(Type.Disk, "/home/steve/psl", true), config);
then PSL will create an H2 database in the file /home/steve/psl/psl.mv.db. Then, run your program so the resulting H2 database can be inspected.
You will need to use the H2 jar for your classpath. This is likely ~/.m2/repository/com/h2database/h2/1.4.192/h2-1.4.192.jar, but you will need to modify it if, for example, you're using a different version of H2. You start the H2 web server by running the following command:
java -cp ~/.m2/repository/com/h2database/h2/1.4.192/h2-1.4.192.jar org.h2.tools.Server
Once you have started the web server, you can access it at http://localhost:8082. To log in, you should change the connection string to point to your H2 database file without .mv.db on the end. The username and password are both empty strings.
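If you prefer to inspect the file from code rather than the web console, standard JDBC works too. The sketch below forms the connection URL by stripping .mv.db from the file name, as described above, and, when run with a path argument, opens the database with empty credentials and lists its tables through java.sql.DatabaseMetaData. It assumes the same H2 jar is on the classpath at runtime; the class name H2Inspect is hypothetical.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical JDBC helper for inspecting a PSL H2 file; requires the H2 jar at runtime. */
class H2Inspect {
    /** "/home/steve/psl/psl.mv.db" becomes "jdbc:h2:/home/steve/psl/psl". */
    static String jdbcUrl(String dbFile) {
        String suffix = ".mv.db";
        String base = dbFile.endsWith(suffix)
                ? dbFile.substring(0, dbFile.length() - suffix.length())
                : dbFile;
        return "jdbc:h2:" + base;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length == 0) {
            System.err.println("usage: java H2Inspect /path/to/psl.mv.db");
            return;
        }
        // Empty user name and password, as with the web console login.
        try (Connection conn = DriverManager.getConnection(jdbcUrl(args[0]), "", "")) {
            DatabaseMetaData meta = conn.getMetaData();
            try (ResultSet tables = meta.getTables(null, null, "%", new String[] {"TABLE"})) {
                while (tables.next()) {
                    System.out.println(tables.getString("TABLE_NAME"));
                }
            }
        }
    }
}
```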
Welcome to the PSL software Wiki!
To get started with PSL you can follow one of these guides:
Command Line Interface for New Users : If you are new to PSL we suggest that you start with our Command Line Interface (CLI), which allows you to write a complete model in a simple text file.
Groovy for Intermediate Users : If you are comfortable with Java/Groovy, and want to get your hands dirty with advanced modeling capabilities we recommend that you use our Groovy interface.
Java for Application Developers : If you plan on integrating PSL into your own applications, and will need direct access to the Java API, refer to this guide.
PSL requires Java, so before you start make sure that you have Java installed.
Before you get started you may want to learn more about PSL.
PSL is a machine learning framework for building probabilistic models developed by the Statistical Relational Learning Group LINQS at the University of Maryland and the University of California Santa Cruz. PSL has produced state-of-the-art results in many areas spanning natural language processing, social-network analysis, and computer vision. The complete list of publications and projects is available on the PSL homepage. The homepage also has several videos to introduce you to PSL.
We are improving PSL all the time, and now have two versions! If you are migrating from PSL 1.0 to 2.0 please refer to our Migration Guide.
Probabilistic soft logic (PSL) is a general purpose language for modeling probabilistic and relational domains. It is applicable to a variety of machine learning problems, such as link prediction and ontology alignment. PSL combines the strengths of two powerful theories -- first-order logic, with its ability to succinctly represent complex phenomena, and probabilistic graphical models, which capture the uncertainty and incompleteness inherent in real-world knowledge. More specifically, PSL uses "soft" logic as its logical component and Markov networks as its statistical model.
In "soft" logic, logical constructs need not be strictly false (0) or true (1), but can take on values between 0 and 1 inclusive. For example, in the logical formula similarNames(X, Y) => sameEntity(X, Y) (which encodes the belief that if two people X and Y have similar names, then they are likely the same person), the truth value of similarNames(X, Y) and that of the entire formula lie in the range [0, 1]. The logical operators and (^), or (v) and not (~) are defined using the Lukasiewicz t-norms, i.e.,
A ^ B = max{A + B - 1, 0}
A v B = min{A + B, 1}
~A = 1 - A
(Note that if the values of A and B are restricted to be false or true, then the logical operators work as they are conventionally defined.) PSL provides an interface in the Groovy programming language for users to declaratively encode their knowledge of a particular domain in soft logic.
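The three operators translate directly into code. The sketch below implements them on doubles in [0, 1], plus implication A => B, derived from the definitions as ~A v B = min(1 - A + B, 1), which is how a formula like similarNames(X, Y) => sameEntity(X, Y) would be scored. The class name Lukasiewicz is hypothetical.

```java
/** Hypothetical Lukasiewicz soft-logic operators on truth values in [0, 1]. */
class Lukasiewicz {
    static double and(double a, double b) {
        return Math.max(a + b - 1.0, 0.0);   // A ^ B
    }

    static double or(double a, double b) {
        return Math.min(a + b, 1.0);         // A v B
    }

    static double not(double a) {
        return 1.0 - a;                      // ~A
    }

    /** A => B, written as ~A v B = min(1 - A + B, 1). */
    static double implies(double a, double b) {
        return or(not(a), b);
    }
}
```

With inputs restricted to 0 and 1 these reproduce the Boolean truth tables; for example, implies(1, 0) is 0 while implies(0.3, 0.8) is 1, and implies(0.9, 0.6) is about 0.7, reflecting a partially violated rule.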
These logical formulas become the features of a Markov network. Each feature in the network is associated with a weight, which determines its importance in the interactions between features. Weights can be specified manually or learned from evidence data using PSL's suite of learning algorithms. PSL also provides sophisticated inference techniques for finding the most likely answer (i.e. the MAP state) to a user's query. The "softening" of the logical formulas allows us to cast the inference problem as a polynomial-time optimization, rather than a (much more difficult NP-hard) combinatorial one. (See LP relaxation for more details.)
For more details on PSL, please refer to the paper Hinge-Loss Markov Random Fields and Probabilistic Soft Logic.
PSL uses SLF4J for logging. In the PSL Groovy program template, SLF4J is bound to Log4j 1.2. The Log4j configuration file is located at src/main/resources/log4j.properties. It should look something like this:
# Set root logger level to the designated level and its only appender to A1.
log4j.rootLogger=ERROR, A1
# A1 is set to be a ConsoleAppender.
log4j.appender.A1=org.apache.log4j.ConsoleAppender
# A1 uses PatternLayout.
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
The logging verbosity can be set by changing ERROR in the second line to a different level and recompiling. Options include OFF, WARN, INFO, DEBUG, and TRACE.
MOSEK is software for numerical optimization. PSL can use MOSEK as a conic program solver via a PSL add-on. MOSEK support is provided as part of the PSL Experimental package.
First, install MOSEK 6. In addition to a commercial version, for which a 30-day trial is currently available, the makers of MOSEK currently offer a free academic license. Users will need the "PTS" base system to use the linear distribution of the ConicReasoner and the "PTON" non-linear and conic extension to use the quadratic distribution. Both of these components are currently covered by the academic license.
After installing MOSEK, install the included mosek.jar file to your local Maven repository. (This file should be in <mosek-root>/6/tools/platform/<your-platform>/bin.)
mvn install:install-file -Dfile=<path-to-mosek.jar> -DgroupId=com.mosek \
-DartifactId=mosek -Dversion=6.0 -Dpackaging=jar
Next, add the following dependency to your project's pom.xml file:
<dependencies>
...
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-mosek</artifactId>
<version>YOUR-PSL-VERSION</version>
</dependency>
...
</dependencies>
where YOUR-PSL-VERSION is replaced with your PSL version.
Finally, it might be necessary to rebuild your project.
After installing the MOSEK add-on, you can use it wherever a ConicProgramSolver is used. To use it for inference with a ConicReasoner, set the conicreasoner.conicprogramsolver configuration property to org.linqs.psl.optimizer.conic.mosek.MOSEKFactory.
Further, MOSEK requires two environment variables to be set at run time. The same bin directory where you found mosek.jar needs to be on the path for shared libraries, and the environment variable MOSEKLM_LICENSE_FILE needs to be set to the path of your license file (usually <mosek-root>/6/licenses/mosek.lic).
In bash on Linux, this can be done with the following commands:
export LD_LIBRARY_PATH=<path_to_mosek_installation>/mosek/6/tools/platform/<platform>/bin
export MOSEKLM_LICENSE_FILE=<path_to_mosek_installation>/mosek/6/licenses/mosek.lic
On Mac OS X, instead set DYLD_LIBRARY_PATH to the directory containing the MOSEK binaries.
Our Maven repository has moved.
New: http://maven.linqs.org/maven/repositories/psl-releases/
Old: https://scm.umiacs.umd.edu/maven/lccd/content/repositories/psl-releases/
The new endpoint redirects to an HTTPS endpoint, which may be used directly if necessary:
https://linqs-data.soe.ucsc.edu/maven/repositories/psl-releases/
All packages have been renamed from edu.umd.cs.* to org.linqs.*.
edu.umd.cs.psl.model.argument.ArgumentType
→ org.linqs.psl.model.term.ConstantType
ArgumentType.*
→ ConstantType.*
Argument types for predicates are now defined in org.linqs.psl.model.term.ConstantType instead of edu.umd.cs.psl.model.argument.ArgumentType.
All the same types are supported; only the containing class has been moved and renamed.
new Partition(int)
→ DataStore.getPartition("stringIdentifier")
If the partition does not exist, it will be created and returned.
If it exists, it will be returned.
It is no longer necessary to pass partitions around if you don't want to.
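As a rough illustration of the get-or-create semantics described above, the following Java sketch uses a hypothetical PartitionRegistry class and a simplified Partition type. Neither is PSL's DataStore or Partition, which have their own APIs; the sketch only mirrors the described behavior.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry mirroring the get-or-create behavior of DataStore.getPartition (not PSL's API). */
final class PartitionRegistry {
    /** Simplified, hypothetical partition handle. */
    static final class Partition {
        final String name;
        final int id;

        Partition(String name, int id) {
            this.name = name;
            this.id = id;
        }
    }

    private final Map<String, Partition> partitions = new HashMap<>();
    private int nextId = 0;

    /** Returns the partition with this name, creating and storing it first if it does not exist yet. */
    Partition getPartition(String name) {
        return partitions.computeIfAbsent(name, n -> new Partition(n, nextId++));
    }

    public static void main(String[] args) {
        PartitionRegistry registry = new PartitionRegistry();
        Partition first = registry.getPartition("observations");
        Partition again = registry.getPartition("observations");
        Partition targets = registry.getPartition("targets");
        System.out.println(first == again);              // true: the existing partition is returned
        System.out.println(first.id + " " + targets.id); // 0 1
    }
}
```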
Arithmetic rules are now supported in 2.0. See the Rule Specification for details. Rules in Groovy can now be specified in additional ways. See Rule Specification in Groovy.
Constraints are now implemented using unweighted arithmetic rules. See Constraints for more details.
To speed up utility development and reduce bloat, some components have been removed from this primary PSL repository and brought into their own repositories.
In these sample POM snippets, all versions have been set to CANARY; however, you may choose your corresponding release.
https://github.com/linqs/psl-utils
psl-dataloading
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-dataloading</artifactId>
<version>CANARY</version>
</dependency>
edu.umd.cs.psl.ui.loading
edu.umd.cs.psl.ui.data
org.linqs.psl.utils.dataloading
psl-evaluation
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-evaluation</artifactId>
<version>CANARY</version>
</dependency>
edu.umd.cs.psl.evaluation
org.linqs.psl.utils.evaluation
psl-textsim
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-textsim</artifactId>
<version>CANARY</version>
</dependency>
edu.umd.cs.psl.ui.functions.textsimilarity
org.linqs.psl.utils.textsimilarity
https://github.com/linqs/psl-experimental
psl-datasplitter
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-datasplitter</artifactId>
<version>CANARY</version>
</dependency>
edu.umd.cs.psl.util.datasplitter
org.linqs.psl.experimental.datasplitter
psl-experiment
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-experiment</artifactId>
<version>CANARY</version>
</dependency>
edu.umd.cs.psl.ui.experiment
org.linqs.psl.experimental.experiment
psl-mosek
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-mosek</artifactId>
<version>CANARY</version>
</dependency>
edu.umd.cs.psl.optimizer.conic.mosek
org.linqs.psl.optimizer.conic.mosek
psl-optimize
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-optimize</artifactId>
<version>CANARY</version>
</dependency>
edu.umd.cs.psl.optimizer
edu.umd.cs.psl.reasoner.conic
org.linqs.psl.experimental.optimizer
psl-sampler
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-sampler</artifactId>
<version>CANARY</version>
</dependency>
edu.umd.cs.psl.sampler
org.linqs.psl.experimental.sampler
psl-weightlearning
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-weightlearning</artifactId>
<version>CANARY</version>
</dependency>
edu.umd.cs.psl.application.learning.weight.maxmargin
org.linqs.psl.experimental.learning.weight.maxmargin
The following software is required to use PSL:
Ensure that the Java 7 or 8 development kit is installed. Either OpenJDK or Oracle Java works.
We have had some reports of failing builds using Java versions prior to 1.7.0_110 or 1.8.0_110.
If you are having issues with Maven (especially handshake errors), try updating your version of Java to at least 1.7.0_110 or 1.8.0_110.
This is especially relevant for Mac users, where the version of Java is updated less frequently.
PSL uses Maven to manage builds and dependencies. Users should install Maven 3.x. PSL is developed with Maven and PSL programs are created as Maven projects. See running Maven for help using Maven to build projects.
This is a HOWTO on releasing a new stable PSL version. All first- and second-level headers are steps in the process and should be followed sequentially.
A release is a single commit that increments the software's version number to a stable version number and does nothing else. So, before you release a version, make sure all your changes are committed and pushed, and the code is in the state in which you want to release it.
Make sure the copyright notices are up to date.
Remember to test the code and double check it is ready for release. To clean, compile, and test the code, you can do:
mvn clean test
Stable version numbers are of the format x.y.z, where
The git branch the code is on (the working branch) should already have a version number in its pom.xml files of the form x.y.z-SNAPSHOT. Whatever x.y.z-SNAPSHOT is, the new version will be x.y.z.
The first step is to change the version number to the stable version number. Remember to perform the commit at the end of the instructions.
Run the following two commands:
git tag -a x.y.z -m 'Version x.y.z'
git push origin x.y.z
There are two ways the branch structure of the Git repo can change because of a new stable version:
The master branch should always point to the commit with the highest stable version number, where x, y, and z are compared as separate orders of magnitude.
So, if the master branch points to version 1.2, then releasing 1.1.1 would not update the master branch, but releasing 1.2.1 or 1.3 would.
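The numbering rules above can be sketched in Java: stripping the -SNAPSHOT suffix gives the stable version, and x, y, and z are compared one component at a time to decide whether master should move. The ReleaseVersions class is hypothetical, and treating a missing component (as in "1.2") as 0 is an assumption of this sketch.

```java
import java.util.Arrays;

/** Illustrative helpers for the x.y.z release rules (hypothetical, not part of PSL). */
final class ReleaseVersions {
    /** The stable version for a working-branch version: x.y.z-SNAPSHOT becomes x.y.z. */
    static String stableFromSnapshot(String version) {
        String suffix = "-SNAPSHOT";
        return version.endsWith(suffix)
                ? version.substring(0, version.length() - suffix.length())
                : version;
    }

    /** Parses "x.y.z" into {x, y, z}; assumption: missing components (as in "1.2") count as 0. */
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        int[] result = new int[3];
        for (int i = 0; i < parts.length && i < 3; i++) {
            result[i] = Integer.parseInt(parts[i]);
        }
        return result;
    }

    /** Compares x, then y, then z, each as a separate order of magnitude (so 1.10 > 1.9). */
    static int compare(String a, String b) {
        return Arrays.compare(parse(a), parse(b));
    }

    /** Master moves only if the new stable version is higher than the one master points to. */
    static boolean shouldUpdateMaster(String masterVersion, String newVersion) {
        return compare(newVersion, masterVersion) > 0;
    }

    public static void main(String[] args) {
        System.out.println(stableFromSnapshot("1.2.1-SNAPSHOT")); // 1.2.1
        System.out.println(shouldUpdateMaster("1.2", "1.1.1"));   // false
        System.out.println(shouldUpdateMaster("1.2", "1.2.1"));   // true
        System.out.println(shouldUpdateMaster("1.2", "1.3"));     // true
    }
}
```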
If you are updating the master branch, it should already be upstream of the new stable version. Substituting the working branch name for WORKING_BRANCH, simply run the following commands:
git checkout master
git pull origin WORKING_BRANCH
git push
There should now be a working branch pointing to the tag "x.y.z" (and possibly the master branch). If the working branch is not the develop branch, it should probably be deleted (which deletes the branch name, not the commit itself). Don't delete the develop branch! Substituting the working branch name for WORKING_BRANCH, run the following commands:
git branch -d WORKING_BRANCH
git push origin :WORKING_BRANCH
With the new stable version checked out, on a machine with file system access to the repository, in the top level directory of the project (the one with the PSL project pom.xml file, not any of the subprojects), run the following commands:
mvn clean
mvn deploy
Update the change log with a list of the main changes since the most recent upstream stable version. For example, if releasing 1.0.2, list the main changes since 1.0.1, even if there is a more recent 1.1 release.
Post an announcement on the user group. Remember to select the "make an announcement" option rather than "start a discussion." Here is a template:
Subject: New Version: x.y.z
A new stable version of PSL, version x.y.z (https://github.com/linqs/psl/tree/x.y.z) is now available.
See [switching the PSL version your program uses](switching the PSL version your program uses) for instructions on changing your PSL projects to the new version.
In version x.y.z:
[A list of the main changes]
Models in Groovy support three different ways to specify rules:
Rules can be specified using the natural Groovy syntax and the add() method for models. The rule weight and squaring must be specified as additional arguments. Both may be left off to specify an unweighted rule.
model.add(
rule: ( Likes(A, 'Dogs') & Likes(B, 'Dogs') ) >> Friends(A, B),
weight: 5.0,
squared: true
);
Because the in-line syntax must be a subset of Groovy syntax, the following operator variants are not supported:
&&
||
->
<<
<-
!
Rules can also be specified directly as a string. Because they are not limited by the Groovy syntax, all operators are available. A rule that specifies a weight and squaring in the string may not also pass "weight" and "squared" arguments.
model.add(
rule: "( Likes(A, 'Dogs') & Likes(B, 'Dogs') ) >> Friends(A, B)",
weight: 5.0,
squared: true
);
// Produces the same rule as above.
model.add(
rule: "5.0: ( Likes(A, 'Dogs') && Likes(B, 'Dogs') ) -> Friends(A, B) ^2"
);
// An unweighted (constraint) variant of the above rule.
model.add(
rule: "( Likes(A, 'Dogs') && Likes(B, 'Dogs') ) -> Friends(A, B)"
);
// An arithmetic constraint.
model.add(
rule: "Likes(A, +B) = 1 ."
);
// Load multiple rules from a single string.
model.addRules("""
1: ( Likes(A, 'Dogs') & Likes(B, 'Dogs') ) >> Friends(A, B) ^2
Likes(A, +B) = 1 .
""");
// Load multiple rules from a file.
model.addRules(new FileReader("myRules.txt"));
The addRules() method may be used to add multiple rules at a time, each rule on its own line.
A String or Reader may be passed.
Each rule must be fully specified with respect to weights and squaring.
Constraints are specified as unweighted arithmetic rules. So all you need to do is make an arithmetic rule and either explicitly specify that it is unweighted (using the period syntax), or not specify a weight.
```groovy
// An unweighted rule (constraint) explicitly specified with a period.
model.add(
rule: "Likes(A, +B) = 1 ."
);

// An unweighted rule (constraint) implicitly specified by not adding a weight.
model.add(
rule: "Likes(A, +B) = 1"
);
```
PSL supports two primary types of rules: Logical and Arithmetic. Each of these types of rules supports weights and squaring.
Logical rules in PSL are implications joined with logical operators (with the exception of negative priors). Since PSL uses soft logic, hard logic operators are replaced with Lukasiewicz operators.
& (&&)
- Logical And
The and operator is binary and functions as a Lukasiewicz t-norm operator:
A & B = MAX(0, A + B - 1)
| (||)
- Logical Or
The or operator is binary and functions as a Lukasiewicz t-conorm operator:
A | B = MIN(1, A + B)
>> (->) / << (<-)
- Implication
The implication operator acts similarly to the standard logical implication, where the truth of the body implies the truth of the head.
Note that the head is always the side that the arrow points to, and both directions are supported.
It is most common to see rules where the body is on the left and the head is on the right.
The body of an implication must be a conjunctive clause (contain only and operators), while the head must be a disjunctive clause (contain only or operators).
~ (!)
- Negation
The negation operator is unary and functions as a Lukasiewicz negation operator:
~A = 1 - A
// The same rule written in two different ways.
Nice(A) & Nice(B) -> Friends(A, B)
Friends(A, B) << Nice(A) && Nice(B)
// Using a disjunction in the head instead of a conjunction in the body.
// Also written two different ways.
Friends(A, B) >> Nice(A) || Nice(B)
Nice(A) | Nice(B) <- Friends(A, B)
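Under the Lukasiewicz semantics, a rule body -> head has the same truth value as ~body | head, i.e., min(1, 1 - body + head), so its distance to satisfaction is max(0, body - head). The Java sketch below evaluates a hypothetical grounding of Nice(A) & Nice(B) -> Friends(A, B) this way; the class name and truth values are illustrative assumptions, not PSL internals.

```java
/** Illustrative evaluation of a grounded logical rule body -> head (not PSL's API). */
final class SoftImplication {
    /** Lukasiewicz conjunction of all body atoms; folding pairwise equals max(0, sum - (n - 1)). */
    static double conjunction(double... atoms) {
        double sum = 0.0;
        for (double a : atoms) {
            sum += a;
        }
        return Math.max(0.0, sum - (atoms.length - 1));
    }

    /** Lukasiewicz disjunction of all head atoms: min(1, sum). */
    static double disjunction(double... atoms) {
        double sum = 0.0;
        for (double a : atoms) {
            sum += a;
        }
        return Math.min(1.0, sum);
    }

    /** Truth value of body -> head, read as ~body | head = min(1, 1 - body + head). */
    static double implication(double body, double head) {
        return Math.min(1.0, 1.0 - body + head);
    }

    /** Distance to satisfaction, 1 minus the truth value: max(0, body - head). */
    static double distanceToSatisfaction(double body, double head) {
        return Math.max(0.0, body - head);
    }

    public static void main(String[] args) {
        // Hypothetical values: Nice(A) = 0.9, Nice(B) = 0.8, Friends(A, B) = 0.4.
        double body = conjunction(0.9, 0.8); // about 0.7
        double head = disjunction(0.4);      // 0.4
        System.out.println(implication(body, head));            // about 0.7
        System.out.println(distanceToSatisfaction(body, head)); // about 0.3
    }
}
```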
Arithmetic rules are relations of two linear combinations.
The following operators are used in arithmetic rules:
+
-
*
/
Note that each side of an arithmetic rule must be a linear combination, so + and - are only allowed between terms, and * and / are only allowed with coefficients.
The following relational operators are allowed between the two linear combinations:
=
<=
>=
A summation can be used when you want to aggregate over a variable.
You turn a variable into a summation variable by prefixing it with a +.
Each sum variable can only be used once per expression, but you may have multiple different summation variables.
A filter clause appears at the end of a rule and decides what values the summation variable can take. There can be multiple filter clauses for each rule, but each summation variable can have at most one filter clause. The filter clause is a logical expression, but uses hard logic rather than Lukasiewicz. All non-zero truth values are considered true in a filter expression. If this expression evaluates to zero for a value, then that value is not used in the summation. Valid things that may appear in the filter clause are:
// Only sum up friends of A that are nice.
Friends(A, +B) <= 1 {B: Nice(B)}
// Only sum up friends of A that are similar to A.
// Note that Similar must be a closed predicate.
Friends(A, +B) <= 1 {B: Similar(A, B)}
// Only sum up friends of A that are not similar to A.
// Note that Similar must be a closed predicate.
Friends(A, +B) <= 1 {B: !Similar(A, B)}
// Only sum up friends of A where both A and B are nice and similar.
// Note that Similar must be a closed predicate.
Friends(A, +B) <= 1 {B: Nice(A) & Nice(B) & Similar(A, B)}
// Only sum up combinations for friends where A is nice and B is not nice.
Friends(+A, +B) <= 1 {A: Nice(A)} {B: !Nice(B)}
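To see what a summation with a filter clause expands to, the Java sketch below evaluates Friends(A, +B) <= 1 {B: Nice(B)} for a single value of A using made-up truth values. It is a simplified illustration, not PSL's grounding code. As described above, the filter uses hard logic: any non-zero Nice value lets that B into the sum.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative expansion of Friends(A, +B) <= 1 {B: Nice(B)} for one value of A (not PSL's code). */
final class SummationFilter {
    /** Sums Friends(A, b) over every candidate b whose Nice(b) is non-zero (hard-logic filter). */
    static double filteredSum(Map<String, Double> friendsOfA, Map<String, Double> nice) {
        double sum = 0.0;
        for (Map.Entry<String, Double> entry : friendsOfA.entrySet()) {
            double niceValue = nice.getOrDefault(entry.getKey(), 0.0);
            if (niceValue != 0.0) {
                sum += entry.getValue();
            }
        }
        return sum;
    }

    /** Amount by which the grounded inequality sum <= bound is violated (0 when it holds). */
    static double violation(double sum, double bound) {
        return Math.max(0.0, sum - bound);
    }

    public static void main(String[] args) {
        // Hypothetical truth values of Friends("Alice", B) and Nice(B).
        Map<String, Double> friendsOfAlice = new LinkedHashMap<>();
        friendsOfAlice.put("Bob", 0.6);
        friendsOfAlice.put("Charlie", 0.7);
        friendsOfAlice.put("Dana", 0.9);
        Map<String, Double> nice = Map.of("Bob", 1.0, "Charlie", 0.2, "Dana", 0.0);

        // Dana is filtered out because Nice("Dana") is zero; Bob and Charlie are summed.
        double sum = filteredSum(friendsOfAlice, nice);
        System.out.println(sum);                 // about 1.3
        System.out.println(violation(sum, 1.0)); // about 0.3
    }
}
```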
Each term in an arithmetic rule can take an optional coefficient. The coefficient may be any real number and can either appear before the term and act as a multiplier:
2.5 * Similar(A, B) >= 1
or, appear after the term and act as a divisor:
Similar(A, B) / 2.5 <= 1
Special coefficients (called Coefficient Operators) may be used:
|A|
- Cardinality
A cardinality coefficient may only be used on a summation variable.
It evaluates to the number of terms substituted for that summation variable.
@Min[A, B]
- Min
Returns the minimum of A and B.
May be used with summation variables.
@Max[A, B]
- Max
Returns the maximum of A and B.
May be used with summation variables.
(Note that these rules are meant to show the semantics of arithmetic rules and may not make logical sense.)
Friends(A, B) = 0.5
Friends(A, +B) <= 1
Friends(A, +B) / |B| <= 1
@Min[2, |B|] * Friends(A, +B) <= 1
Friends(A, +B) <= 1 {B: Nice(B)}
Friends(A, B) <= Nice(A) + Nice(B)
Friends(A, B) >= 3.0 * Nice(A) - 2.0 * Nice(B)
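The Java sketch below evaluates the left-hand sides of Friends(A, +B) / |B| <= 1 and @Min[2, |B|] * Friends(A, +B) <= 1 for one hypothetical value of A, to show how |B| and @Min act as ordinary numeric coefficients once the summation is expanded. The names and truth values are illustrative only, and returning 0 for an empty summation is an assumption of the sketch.

```java
/** Illustrative evaluation of coefficient operators on one expanded summation (not PSL's API). */
final class CoefficientOperators {
    static double sum(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    /** Left-hand side of Friends(A, +B) / |B| <= 1: the sum divided by the cardinality |B|. */
    static double averaged(double[] friendsOfA) {
        int cardinality = friendsOfA.length; // |B|: number of terms substituted for +B
        if (cardinality == 0) {
            return 0.0; // assumption of this sketch: an empty summation contributes nothing
        }
        return sum(friendsOfA) / cardinality;
    }

    /** Left-hand side of @Min[2, |B|] * Friends(A, +B) <= 1. */
    static double minScaled(double[] friendsOfA) {
        double coefficient = Math.min(2, friendsOfA.length); // @Min[2, |B|]
        return coefficient * sum(friendsOfA);
    }

    public static void main(String[] args) {
        // Hypothetical truth values of Friends("Alice", B) for three substitutions of B.
        double[] friendsOfAlice = {0.3, 0.6, 0.0};
        System.out.println(averaged(friendsOfAlice));  // about 0.3, so the first rule holds
        System.out.println(minScaled(friendsOfAlice)); // about 1.8, so the second rule is violated
    }
}
```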
Every rule must be either weighted or unweighted. Unweighted rules are also called constraints since they are strictly enforced.
Weighted rules are prefixed with the weight and a colon:
<weight>: <rule>
For example:
2.5: Nice(A) & Nice(B) & (A != B) -> Friends(A, B)
5.0: Friends(A, +B) <= 1
10.0: Friends(A, +B) <= 1 {B: Nice(B)}
Unweighted rules are suffixed with a period:
<rule> .
For example:
Nice(A) & Nice(B) & (A != B) -> Friends(A, B) .
Friends(A, +B) <= 1 .
Friends(A, +B) <= 1 . {B: Nice(B)}
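As a rough check of the syntactic cases above (a leading weight and colon, a trailing period, and the trailing ^2 described next), the following Java sketch classifies a rule string with regular expressions. It is a hypothetical helper, not PSL's rule parser, and it assumes that filter clauses appear only at the end of a rule and contain no nested braces.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Rough, hypothetical classifier for the weight / period / ^2 rule syntax (not PSL's parser). */
final class RuleSyntax {
    // A leading "<weight>:" such as "2.5:" or "10:".
    private static final Pattern WEIGHT = Pattern.compile("^\\s*(\\d+(?:\\.\\d+)?)\\s*:");
    // One or more trailing filter clauses such as " {B: Nice(B)}" (assumes no nested braces).
    private static final Pattern FILTERS = Pattern.compile("(\\s*\\{[^}]*\\})+\\s*$");

    static String describe(String rule) {
        String core = FILTERS.matcher(rule).replaceFirst("").trim();
        Matcher weight = WEIGHT.matcher(core);
        if (weight.find()) {
            return "weighted " + weight.group(1) + (core.endsWith("^2") ? ", squared" : ", linear");
        }
        if (core.endsWith(".")) {
            return "unweighted";
        }
        // Neither a weight nor a period: not fully specified in the sense addRules() requires.
        return "unspecified";
    }

    public static void main(String[] args) {
        String[] rules = {
            "2.5: Nice(A) & Nice(B) & (A != B) -> Friends(A, B) ^2",
            "10.0: Friends(A, +B) <= 1 {B: Nice(B)}",
            "Friends(A, +B) <= 1 . {B: Nice(B)}",
            "Friends(A, B) = 0.5",
        };
        for (String rule : rules) {
            System.out.println(describe(rule) + "  <=  " + rule);
        }
    }
}
```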
Any weighted rule may square its hinge-loss function.
Squaring the hinge-loss (i.e., using "squared potentials") may result in better performance.
Non-squared potentials tend to encourage a "winner take all" optimization, while squared potentials encourage more trading off.
To square a rule, just append ^2 to it:
2.5: Nice(A) & Nice(B) & (A != B) -> Friends(A, B) ^2
5.0: Friends(A, +B) <= 1 ^2
10.0: Friends(A, +B) <= 1 ^2 {B: Nice(B)}
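The "winner take all" versus "trading off" behavior can be seen on a toy problem with a single atom x, pulled toward 1 by one rule (distance 1 - x) and toward 0 by another (distance x). The Java sketch below minimizes the weighted sum of the two hinge losses by brute-force grid search; the weights, the class name, and the grid search itself are illustrative assumptions and are not how PSL's reasoners optimize.

```java
/** Illustrative comparison of linear and squared hinge-loss potentials on one atom (not PSL's reasoner). */
final class SquaredPotentials {
    /** Weighted hinge-loss potential: weight * max(0, distance)^p, with p = 1 (linear) or p = 2 (squared). */
    static double potential(double weight, double distance, boolean squared) {
        double hinge = Math.max(0.0, distance);
        return weight * (squared ? hinge * hinge : hinge);
    }

    /** Total loss for atom x: one rule pulls it up (distance 1 - x), another pulls it down (distance x). */
    static double totalLoss(double x, double upWeight, double downWeight, boolean squared) {
        return potential(upWeight, 1.0 - x, squared) + potential(downWeight, x, squared);
    }

    /** Brute-force grid search over x in [0, 1] in steps of 0.001. */
    static double argmin(double upWeight, double downWeight, boolean squared) {
        double bestX = 0.0;
        double bestLoss = Double.POSITIVE_INFINITY;
        for (int i = 0; i <= 1000; i++) {
            double x = i / 1000.0;
            double loss = totalLoss(x, upWeight, downWeight, squared);
            if (loss < bestLoss) {
                bestLoss = loss;
                bestX = x;
            }
        }
        return bestX;
    }

    public static void main(String[] args) {
        // Weights 2.0 (pull up) and 1.0 (pull down).
        System.out.println(argmin(2.0, 1.0, false)); // 1.0: the heavier rule wins outright
        System.out.println(argmin(2.0, 1.0, true));  // 0.667: the rules trade off
    }
}
```

With linear potentials the total loss 2(1 - x) + x = 2 - x is minimized at the endpoint x = 1, so the heavier rule wins outright; with squared potentials the minimum of 2(1 - x)^2 + x^2 is at x = 2/3, a compromise weighted toward the heavier rule.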
You may specify priors for a predicate in PSL. A prior is specified on a specific predicate and affects all open ground atoms of that predicate. Priors in PSL must be weighted and may be squared. Priors tend to have low weights, since they are supposed to be overpowered by evidence.
Note that priors are not the same as specifying an initial value for your open predicate in a data file. Once optimization starts, the initial value specified in the data file will quickly be changed and have little or no impact on the final result. A prior, however, is a rule whose groundings become full-fledged potential functions that actively participate in optimization.
Negative priors are the most common type of prior in PSL. A negative prior pushes all ground atoms of the predicate toward zero.
Negative priors may be specified using logical rules:
1.0: ~Friends(A, B) ^2
This prior can be interpreted as "By default, people are not friends".
Arithmetic rules may also be used to specify a negative prior:
1.0: Friends(A, B) = 0 ^2
Positive priors can be a little more tricky than negative priors.
If you want all the ground atoms to take the same positive prior, then you can just use an arithmetic rule:
1.0: Friends(A, B) = 0.75 ^2
If you want different ground atoms to have different positive priors, then you will need to use a surrogate predicate. First, create a new closed predicate that corresponds 1-to-1 with the open predicate you wish to put the prior on. Then add observations for the surrogate predicate, with the truth value of each observation being the prior you wish to put on the corresponding ground atom. Finally, create a rule that directly ties the surrogate predicate to the open predicate. See the example below.
Consider a PSL program where we are trying to infer friendship (the Friends predicate).
We may have a prior belief on the friendship quality between all people in our data.
To encode this prior belief, we will first construct a surrogate predicate called FriendsPrior (the name does not matter).
Now we will load FriendsPrior with observations where the truth value of the observation is our prior.
Our data file for FriendsPrior may look something like:
Alice Bob 1.0
Alice Charlie 0.75
Bob Charlie 0.33
Now we will add this rule that acts as our prior:
1.0: FriendsPrior(A, B) -> Friends(A, B) ^2
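The following Java sketch reads the three example observations above and computes the squared hinge loss of the prior rule for each grounding, assuming every Friends atom currently has the hypothetical value 0.5. Whitespace-separated columns and the class name are assumptions of the sketch; it does not use PSL's data loaders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative evaluation of a surrogate-predicate prior (hypothetical, not PSL's data loading). */
final class SurrogatePrior {
    /** Parses lines like "Alice Bob 1.0" into a map from "Alice,Bob" to the prior truth value. */
    static Map<String, Double> parse(String data) {
        Map<String, Double> priors = new LinkedHashMap<>();
        for (String line : data.split("\\R")) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length != 3) {
                continue; // skip blank or malformed lines
            }
            priors.put(cols[0] + "," + cols[1], Double.parseDouble(cols[2]));
        }
        return priors;
    }

    /** Loss of FriendsPrior(A, B) -> Friends(A, B) ^2: weight * max(0, prior - friends)^2. */
    static double priorLoss(double weight, double prior, double friends) {
        double distance = Math.max(0.0, prior - friends);
        return weight * distance * distance;
    }

    public static void main(String[] args) {
        String data = "Alice Bob 1.0\nAlice Charlie 0.75\nBob Charlie 0.33\n";
        double friends = 0.5; // hypothetical current value of every Friends atom
        for (Map.Entry<String, Double> e : parse(data).entrySet()) {
            System.out.println(e.getKey() + " -> " + priorLoss(1.0, e.getValue(), friends));
        }
    }
}
```

Because the rule is an implication, it only penalizes Friends values that fall below the corresponding prior; the grounding for Bob and Charlie contributes nothing here, since 0.5 already exceeds 0.33.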
Both logical and arithmetic rules support some special operators.
== (=)
- Equals
Ensures that the left and right sides are equal.
Note that this is not the same as a similarity function evaluating to 1.
Two variables may be 100% similar, but equals will only evaluate to 1 if they refer to the same value.
!= (~=)
- Not Equals
Evaluates to 1 when the two sides are not the same.
This is a very common operator to use in most rules.
For example, consider the following two rules:
Nice(A) && Nice(B) -> Friends(A, B)
Nice(A) && Nice(B) && (A != B) -> Friends(A, B)
If A and B can both take the values "Alice" and "Bob", then the first rule would generate the following groundings:
Nice("Alice") && Nice("Alice") -> Friends("Alice", "Alice")
Nice("Alice") && Nice("Bob") -> Friends("Alice", "Bob")
Nice("Bob") && Nice("Bob") -> Friends("Bob", "Bob")
Nice("Bob") && Nice("Alice") -> Friends("Bob", "Alice")
While the second rule would only generate two groundings:
Nice("Alice") && Nice("Bob") -> Friends("Alice", "Bob")
Nice("Bob") && Nice("Alice") -> Friends("Bob", "Alice")
% (^)
- Non-Symmetric
Ensures that the reverse (or equal) pairing of the two operands is not grounded.
For example, consider the following two rules:
SimilarNames(A, B) -> SamePerson(A, B)
SimilarNames(A, B) && (A % B) -> SamePerson(A, B)
If A and B can both take the values "Alice" and "Bob", then the first rule would generate the following groundings:
SimilarNames("Alice", "Alice") -> SamePerson("Alice", "Alice")
SimilarNames("Alice", "Bob") -> SamePerson("Alice", "Bob")
SimilarNames("Bob", "Alice") -> SamePerson("Bob", "Alice")
SimilarNames("Bob", "Bob") -> SamePerson("Bob", "Bob")
While the second rule would only generate one grounding:
SimilarNames("Alice", "Bob") && ("Alice" % "Bob") -> SamePerson("Alice", "Bob")
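The grounding behavior of != and % can be reproduced with a small Java sketch that enumerates every (A, B) substitution over "Alice" and "Bob" and applies each comparison. Modeling % as keeping only pairs where A sorts before B is an assumption chosen to match the example groundings above; the class is illustrative, not PSL's grounding code.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative grounding of a two-argument rule under != and % (not PSL's grounding code). */
final class ComparisonGrounding {
    enum Filter { NONE, NOT_EQUAL, NON_SYMMETRIC }

    /** Enumerates (A, B) substitutions over the constants, keeping those the comparison allows. */
    static List<String> groundings(List<String> constants, Filter filter) {
        List<String> result = new ArrayList<>();
        for (String a : constants) {
            for (String b : constants) {
                boolean keep;
                switch (filter) {
                    case NOT_EQUAL:
                        keep = !a.equals(b); // A != B drops the pairs where both sides are equal
                        break;
                    case NON_SYMMETRIC:
                        keep = a.compareTo(b) < 0; // assumption: A % B keeps one ordering per pair
                        break;
                    default:
                        keep = true;
                }
                if (keep) {
                    result.add("(" + a + ", " + b + ")");
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> people = List.of("Alice", "Bob");
        System.out.println(groundings(people, Filter.NONE));          // [(Alice, Alice), (Alice, Bob), (Bob, Alice), (Bob, Bob)]
        System.out.println(groundings(people, Filter.NOT_EQUAL));     // [(Alice, Bob), (Bob, Alice)]
        System.out.println(groundings(people, Filter.NON_SYMMETRIC)); // [(Alice, Bob)]
    }
}
```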
PSL includes implementations of Markov Logic inference algorithms. You can use them in your inference and learning applications by setting the following configuration options. Note that these implementations do not support all constraints allowed in PSL. If your program's constraint set does not decompose over atoms (i.e., each atom participates in at most one constraint), then they will throw exceptions.
MPEInference and LazyMPEInference can use MaxWalkSat (MPE inference) and MC-Sat (marginal inference) with the following configuration options. Marginal probabilities will be set as the atoms' truth values.
# Sets MPEInference to perform Markov Logic MPE inference
<bundle>.mpeinference.reasoner = org.linqs.psl.reasoner.bool.BooleanMaxWalkSatFactory
# Sets MPEInference to perform Markov Logic marginal inference
<bundle>.mpeinference.reasoner = org.linqs.psl.reasoner.bool.BooleanMCSatFactory
# Sets LazyMPEInference to perform Markov Logic MPE inference
<bundle>.lazympeinference.reasoner = org.linqs.psl.reasoner.bool.BooleanMaxWalkSatFactory
# Sets LazyMPEInference to perform Markov Logic marginal inference
<bundle>.lazympeinference.reasoner = org.linqs.psl.reasoner.bool.BooleanMCSatFactory
Weight learning that uses a reasoner for MPE inference as a subroutine (e.g., MaxLikelihoodMPE, LazyMaxLikelihoodMPE) can also use Markov Logic MPE inference.
<bundle>.weightlearning.reasoner = org.linqs.psl.reasoner.bool.BooleanMaxWalkSatFactory
MaxPseudoLikelihood also supports Markov Logic weight learning.
<bundle>.maxpseudolikelihood.bool = true
To run a PSL program, change to the top-level directory of its project (the directory with the Maven pom.xml file).
Compile your project:
mvn compile
Now use Maven to generate a classpath for your project's dependencies:
mvn dependency:build-classpath -Dmdep.outputFile=classpath.out
You can now run a class with the command:
java -cp ./target/classes:`cat classpath.out` <fully qualified class name>
where \
Tips and troubleshooting
Some projects include a run.sh script that will compile and run the program. Look for this script first.
Alternatively, instead of using the java command, a class can be run through Maven:
mvn exec:java -Dexec.mainClass=<fully qualified class name>
The advantages are that the project does not need to be compiled separately and the classpath does not need to be generated or updated separately. The disadvantages are that the class output is preceded and followed by Maven output, exception stack traces are not printed by default (add the -e switch), and Maven adds some overhead to execution (sometimes a significant amount, especially on less powerful machines).
To change the version of PSL your project uses, edit your project's pom.xml file. The POM will declare dependencies on one or more PSL artifacts, e.g.,
<dependencies>
...
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-groovy</artifactId>
<version>2.0.0</version>
</dependency>
...
</dependencies>
Change the version element of each such dependency to a new version (all the same one) and rebuild. See Choosing a Version of PSL to decide how to specify your version.
We can use Maven in the project root to create javadocs for all modules and link them together.
This applies to the primary psl repository, psl-experimental, and psl-utils.
mvn javadoc:aggregate
Note that psl-experimental and psl-utils depend on psl.
In addition, the javadocs for experimental and utils will try to link to psl.
So, the psl javadocs should be deployed before the experimental and utils docs are built.
Before releasing a new stable version, it is good to make sure that PSL's copyright notices are up to date. This can be simply done from the command line.
For example, if I wanted to update the copyright from 2013-2015 to 2013-2017, I could do the following in the project root:
mvn clean # We do not want to bother updating any compiled artifacts.
grep -R --exclude-dir=.git "2013-2015" # Examine the results and ensure that you are only updating correct references.
grep -Rl --exclude-dir=.git "2013-2015" | xargs sed -i 's/2013-2015/2013-2017/g' # Do the actual replacement (skipping Git's internal files).
This is a HOWTO on changing the version number in the PSL code base. In most, if not all, cases, this HOWTO should be followed as part of a larger one, such as Releasing a New Stable Version, not by itself.
A new version number should be applied as a new commit that does nothing else, so make sure you are working on a clean working copy with no uncommitted changes.
Version numbers consist of the following components:
Your new version number should be of the form x.y.z (for a stable version) or x.y.z-SNAPSHOT (for an unstable version).
All the occurrences of a PSL version number should be kept in sync, i.e., have the same value for all occurrences in all pom.xml files and other resources across all modules. In addition, only one commit in the entire Git repository should have a particular stable version number.
Each pom.xml has only one instance of the version number that will need to be changed.
You can change each instance manually, or use the following commands (replacing 2.0.0-SNAPSHOT and 2.1.0-SNAPSHOT with the actual versions):
find . -name pom.xml | xargs grep '<version>2.0.0-SNAPSHOT</version>' # Examine the results and ensure that you are only updating correct references.
find . -name pom.xml | xargs sed -i 's#<version>2.0.0-SNAPSHOT</version>#<version>2.1.0-SNAPSHOT</version>#g' # Perform the actual replacement.
Commit the changes with one of the following commit messages.
If you are changing to a stable version, use:
Version x.y.z
If you are changing to a new snapshot version, use:
Started x.y.z-SNAPSHOT
Push your commit when finished.
The Git website has information on installing Git, as do the GitHub guides mentioned below. This tutorial is helpful for learning how to use Git, and this tutorial is particularly helpful for SVN users.
To use an existing branch in the remote repo on GitHub, create a tracking branch to track it. It can be kept in sync via git pull. For example, to track the branch 'develop' (assuming the GitHub remote is named 'origin'), run
>> git branch --track develop origin/develop
then
>> git checkout develop
Create a free account on GitHub. Then follow one of the following sets of instructions to set up Git and GitHub:
You can fork the PSL repository, which means that you create a fork hosted on GitHub. You then clone that repository to a local machine, make commits, and, optionally, push some or all of those commits back to the repository on GitHub. Those commits are then publicly available (unless you have paid GitHub for private hosting).
PSL provides a Command Line Interface. The CLI is the easiest interface to PSL and handles most situations where you do not need additional customization.
PSL requires that you have Java installed.
The PSL jar file psl-cli-CANARY.jar already contains all the PSL libraries you need to run your PSL programs. You can find a current snapshot of this .jar file in our resources directory until we finalize our v2.0 release.
In this page we will be using the CANARY build, but you may use any PSL jar that is at least version 2.0.0.
Let's first download the files for our example program, run it and see what it does!
In this program, we'll use information about the known locations of some people, which people are already known to know each other, and what people like, to infer who knows each other. We'll first run the program and see the output. We will be working from the command line, so open up your shell or terminal.
As with the other PSL examples, you can find all the code in our psl-examples repository.
We will be using the easy link prediction example.
git clone https://bitbucket.org/linqs/psl-examples.git
cd psl-examples/link_prediction/easy/cli
All the required commands are contained in the run.sh script.
However, the commands are very simple and can also be run individually.
You only need to fetch the jar (done in the setup steps above) and run PSL.
wget https://linqs-data.soe.ucsc.edu/maven/repositories/psl-releases/org/linqs/psl-cli/CANARY/psl-cli-CANARY.jar
java -jar psl-cli-CANARY.jar -infer -model simple_lp.psl -data simple_lp.data
You should now see output that looks like this (note that the order of the output lines may differ):
0 [main] INFO org.linqs.psl.config.ConfigManager - PSL configuration psl.properties file not found. Only default values will be used unless additional properties are specified.
6 [main] INFO org.linqs.psl.config.ConfigManager - No value found for option cli.dbpath. Returning default of /tmp/cli.
19 [main] INFO org.linqs.psl.config.ConfigManager - No value found for option cli.dbpath. Returning default of /tmp/cli.
133 [main] INFO org.linqs.psl.config.ConfigManager - No value found for option cli.rdbmsdatastore.valuecolumn. Returning default of truth.
134 [main] INFO org.linqs.psl.config.ConfigManager - No value found for option cli.rdbmsdatastore.confidencecolumn. Returning default of confidence.
134 [main] INFO org.linqs.psl.config.ConfigManager - No value found for option cli.rdbmsdatastore.partitioncolumn. Returning default of partition.
150 [main] INFO org.linqs.psl.config.ConfigManager - Found value true for option cli.rdbmsdatastore.usestringids.
154 [main] INFO org.linqs.psl.cli.Launcher - data:: loading:: ::starting
208 [main] WARN org.linqs.psl.database.rdbms.RDBMSDataStoreMetadata - Determining max partition, no partitions found null
260 [main] INFO org.linqs.psl.cli.Launcher - data:: loading:: ::done
260 [main] INFO org.linqs.psl.cli.Launcher - model:: loading:: ::starting
320 [main] INFO org.linqs.psl.cli.Launcher - model:: loading:: ::done
335 [main] INFO org.linqs.psl.cli.Launcher - operation::infer ::starting
336 [main] INFO org.linqs.psl.config.ConfigManager - Found value org.linqs.psl.reasoner.admm.ADMMReasonerFactory@34a0ee3f for option cli.mpeinference.reasoner.
339 [main] INFO org.linqs.psl.config.ConfigManager - No value found for option cli.admmreasoner.maxiterations. Returning default of 25000.
339 [main] INFO org.linqs.psl.config.ConfigManager - No value found for option cli.admmreasoner.stepsize. Returning default of 1.0.
339 [main] INFO org.linqs.psl.config.ConfigManager - No value found for option cli.admmreasoner.epsilonabs. Returning default of 1.0E-5.
339 [main] INFO org.linqs.psl.config.ConfigManager - No value found for option cli.admmreasoner.epsilonrel. Returning default of 0.001.
339 [main] INFO org.linqs.psl.config.ConfigManager - No value found for option cli.admmreasoner.stopcheck. Returning default of 1.
340 [main] INFO org.linqs.psl.config.ConfigManager - No value found for option cli.admmreasoner.numthreads. Returning default of 8.
345 [main] INFO org.linqs.psl.application.inference.MPEInference - Grounding out model.
463 [main] INFO org.linqs.psl.application.inference.MPEInference - Beginning inference.
647 [main] INFO org.linqs.psl.reasoner.admm.ADMMReasoner - Optimization completed in 662 iterations. Primal res.: 0.022740805753995307, Dual res.: 5.499541249718983E-4
647 [main] INFO org.linqs.psl.application.inference.MPEInference - Inference complete. Writing results to Database.
669 [main] INFO org.linqs.psl.cli.Launcher - operation::infer inference:: ::done
KNOWS('Alex', 'Jay') = 0.6563306840338842
KNOWS('Steve', 'Ben') = 0.44100447478155413
< ... 50 rows omitted for brevity ...>
KNOWS('Sabina', 'Arti') = 0.7194742867561412
KNOWS('Dhanya', 'Elena') = 0.3682973941849134
KNOWS('Elena', 'Sabina') = 0.3287882658219531
Now that we've run our first program, which performs link prediction to infer who knows whom, let's walk through the steps we took to infer the unknown values: defining the underlying model, providing data to the model, and running inference to predict the unknown values.
A model in PSL is a set of logic-like rules.
The model is defined inside a text file with the .psl extension.
We describe this model in the file simple_lp.psl.
Let's have a look at the rules that make up our model:
20: Lived(P1,L) & Lived(P2,L) & P1!=P2 -> Knows(P1,P2) ^2
5: Lived(P1,L1) & Lived(P2,L2) & P1!=P2 & L1!=L2 -> !Knows(P1,P2) ^2
10: Likes(P1,L) & Likes(P2,L) & P1!=P2 -> Knows(P1,P2) ^2
5: Knows(P1,P2) & Knows(P2,P3) & P1!=P3 -> Knows(P1,P3) ^2
10000: Knows(P1,P2) -> Knows(P2,P1) ^2
5: !Knows(P1,P2) ^2
The model is expressing the intuition that people who have lived in the same location or like the same thing may know each other.
The integer values at the beginning of rules indicate the weight of the rule.
Intuitively, this tells us the relative importance of satisfying this rule compared to the other rules.
The ^2 at the end of the rules indicates that the hinge-loss functions based on groundings of these rules are squared, for a smoother tradeoff.
For more details on hinge-loss functions and squared potentials, see the publications on our PSL webpage.
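To make the role of weights and the ^2 suffix concrete, here is a minimal Java sketch of how one ground rule can be scored under the Łukasiewicz relaxation described in the PSL publications. The body's soft truth value is max(0, b1 + ... + bn - (n - 1)), the distance to satisfaction is max(0, body - head), and the potential is the weight times that distance, squared when the rule ends in ^2. The class and method names are hypothetical and not part of PSL's API. The sketch also ignores constraints such as P1!=P2, which only decide which groundings exist.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of PSL's API) that scores one ground rule
 *   w: B1 & ... & Bn -> H ^p
 * under the Lukasiewicz relaxation. All truth values are soft, in [0, 1].
 */
final class HingeLoss {

    private HingeLoss() {}

    /** Lukasiewicz conjunction of the body atoms: max(0, b1 + ... + bn - (n - 1)). */
    static double conjunction(List<Double> body) {
        double sum = 0.0;
        for (double b : body) {
            sum += b;
        }
        return Math.max(0.0, sum - (body.size() - 1));
    }

    /** How far the ground rule is from being satisfied: max(0, body - head). */
    static double distanceToSatisfaction(List<Double> body, double head) {
        return Math.max(0.0, conjunction(body) - head);
    }

    /** Weighted hinge-loss potential w * d^p: p = 2 for rules ending in ^2, p = 1 otherwise. */
    static double potential(double weight, List<Double> body, double head, int p) {
        return weight * Math.pow(distanceToSatisfaction(body, head), p);
    }
}
```

Consider a grounding of the first rule with both Lived atoms at 1.0 and the Knows atom at 0.6. The distance is 0.4, so the squared potential is 20 * 0.16 = 3.2, compared with 20 * 0.4 = 8.0 for a linear hinge. Squaring shrinks small violations relative to large ones.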
PSL rules consist of predicates. The names of the predicates used in our model, and the possible substitutions of these predicates with actual entities from our network, are defined in the file simple_lp.data.
Let's have a look:
predicates:
Knows/2: open
Likes/2: closed
Lived/2: closed
observations:
Knows : ../data/knows_obs.txt
Lived : ../data/lived_obs.txt
Likes : ../data/likes_obs.txt
targets:
Knows : ../data/knows_targets.txt
truth:
Knows : ../data/knows_truth.txt
In the predicates section, we list all the predicates that will be used in the rules that define the model.
The keyword open indicates that we want to infer some substitutions of this predicate, while closed indicates that this predicate is fully observed.
That is, all substitutions of this predicate have known values and will behave as evidence for inference.
For our simple example, we fully observe where people have lived and what things they like (or dislike).
Thus, Likes and Lived are both closed predicates.
We know of some instances of people knowing each other, but we wish to infer the other instances, which makes Knows an open predicate.
In the observations section, for each predicate for which we have observations, we specify the name of the .txt file containing the observations.
For example, knows_obs.txt and lived_obs.txt specify which people know each other and where these people have lived, respectively.
The targets section specifies a .txt file that, for each open predicate, lists all substitutions of that predicate that we wish to infer.
In knows_targets.txt, we specify the pairs of people for whom we wish to infer the Knows relationship.
The truth section specifies a .txt file that provides a set of ground truth observations for each open predicate.
Here, we give the actual values of the Knows predicate for all the people in the network as training labels. We describe the general data loading scheme in more detail in the sections below.
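The PSL CLI has its own loader for this file. To show how the four sections relate, here is a toy Java reader for exactly the layout shown in simple_lp.data. It is a hypothetical sketch and not the CLI's parser. It handles only "Name/arity: open|closed" entries under predicates: and "Name : path" entries under the other sections, and it may reject forms the real format accepts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reader for the simple_lp.data layout above; not the PSL CLI's loader. */
final class DataFileSketch {
    /** Predicate name -> true for open, false for closed. */
    final Map<String, Boolean> isOpen = new LinkedHashMap<>();
    /** Section (observations, targets, truth) -> predicate name -> listed files. */
    final Map<String, Map<String, List<String>>> files = new LinkedHashMap<>();

    static DataFileSketch parse(List<String> lines) {
        DataFileSketch data = new DataFileSketch();
        String section = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            // Section headers are a single word followed by a colon, e.g. "targets:".
            if (line.endsWith(":") && !line.contains(" ")) {
                section = line.substring(0, line.length() - 1);
                continue;
            }
            int colon = line.indexOf(':');
            if (section == null || colon < 0) {
                throw new IllegalArgumentException("Unexpected line: " + raw);
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            if (section.equals("predicates")) {
                // "Knows/2: open" -> name "Knows"; the arity after '/' is ignored here.
                if (!value.equals("open") && !value.equals("closed")) {
                    throw new IllegalArgumentException("Expected open or closed: " + raw);
                }
                data.isOpen.put(key.split("/")[0], value.equals("open"));
            } else {
                // "Knows : ../data/knows_obs.txt" -> files[section]["Knows"] gets the path.
                data.files.computeIfAbsent(section, s -> new LinkedHashMap<>())
                        .computeIfAbsent(key, k -> new ArrayList<>())
                        .add(value);
            }
        }
        return data;
    }
}
```

Keeping open/closed status separate from the file lists mirrors the file itself. The predicates section says how each predicate behaves, and the other sections only say where its values come from.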
When we run the command
java -jar psl-cli-CANARY.jar -infer -model simple_lp.psl -data simple_lp.data
with the -infer flag, PSL's inference engine substitutes values from the data files into the rules of the model and runs inference on the targets.
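Substituting values into the rules is called grounding. The toy Java sketch below grounds the first rule of simple_lp.psl naively over a handful of hypothetical Lived observations. The people A, B, C and locations X, Y are made up for illustration. PSL itself grounds rules by querying its database, so this only illustrates the idea.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, naive grounding of the first rule in simple_lp.psl:
 *   Lived(P1,L) & Lived(P2,L) & P1!=P2 -> Knows(P1,P2)
 * PSL itself grounds rules through database queries; this only shows the idea.
 */
final class GroundingSketch {

    /** An observed Lived(person, location) atom. */
    static final class Lived {
        final String person;
        final String location;

        Lived(String person, String location) {
            this.person = person;
            this.location = location;
        }
    }

    /** Returns one ground rule (as text) per substitution that satisfies the rule's constraints. */
    static List<String> ground(List<Lived> lived) {
        List<String> groundRules = new ArrayList<>();
        for (Lived a : lived) {          // binds P1 and L
            for (Lived b : lived) {      // binds P2 and must reuse the same L
                boolean sameLocation = a.location.equals(b.location);
                boolean differentPeople = !a.person.equals(b.person);  // P1!=P2
                if (sameLocation && differentPeople) {
                    groundRules.add("Lived(" + a.person + ", " + a.location + ") & Lived("
                            + b.person + ", " + b.location + ") -> Knows("
                            + a.person + ", " + b.person + ")");
                }
            }
        }
        return groundRules;
    }
}
```

Suppose A and B lived at X and C lived at Y. This yields exactly two groundings, with heads Knows(A, B) and Knows(B, A). C produces none because nobody else shares Y.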
To create a PSL model, define a set of rules in a .psl file.
Let's go over the basic syntax to write rules. Consider this very general rule form:
w: P(A,B) & Q(B,C) -> R(A,C) ^2
The first part of the rule, w, is an integer value that specifies the weight of the rule.
In this example, P, Q, and R are predicates.
Logical rules consist of a rule "body" and a rule "head."
The body of the rule appears before the ->, which denotes logical implication.
The body can have one or more predicates joined by &, which denotes logical conjunction.
The head of the rule should be a single predicate.
The predicates that appear in the body and head can be any combination of open and closed predicate types.
The Rule Specification page contains the full syntax for PSL rules.
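To make the parts of the general form concrete, here is a hypothetical Java parser that splits a rule of the shape w: B1 & ... & Bn -> H ^2 into its weight, body, head, and squared flag. It is only a sketch of the simple form described above, not PSL's rule parser. It rejects anything without an implication, such as the prior 5: !Knows(P1,P2) ^2, so the Rule Specification page remains the authority on the full grammar.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical parser for the simple weighted implication form
 *   w: B1 & ... & Bn -> H ^2
 * It is not PSL's rule parser; see the Rule Specification page for the real grammar.
 */
final class RuleSketch {
    final double weight;
    final List<String> body;
    final String head;
    final boolean squared;

    private RuleSketch(double weight, List<String> body, String head, boolean squared) {
        this.weight = weight;
        this.body = body;
        this.head = head;
        this.squared = squared;
    }

    static RuleSketch parse(String text) {
        String rule = text.trim();
        boolean squared = rule.endsWith("^2");
        if (squared) {
            rule = rule.substring(0, rule.length() - 2).trim();
        }
        int colon = rule.indexOf(':');
        int arrow = rule.indexOf("->");
        if (colon < 0 || arrow < colon) {
            throw new IllegalArgumentException("Expected 'w: body -> head': " + text);
        }
        // The weight precedes the colon; parseDouble also accepts integers such as "20".
        double weight = Double.parseDouble(rule.substring(0, colon).trim());
        // Body atoms (and constraints such as P1!=P2) are joined by '&'.
        List<String> body = new ArrayList<>();
        for (String atom : rule.substring(colon + 1, arrow).split("&")) {
            body.add(atom.trim());
        }
        // The head should be a single predicate.
        String head = rule.substring(arrow + 2).trim();
        if (head.contains("&")) {
            throw new IllegalArgumentException("The head should be a single predicate: " + text);
        }
        return new RuleSketch(weight, body, head, squared);
    }
}
```

Applied to the first rule of simple_lp.psl, this yields weight 20.0, the body atoms Lived(P1,L), Lived(P2,L) and the constraint P1!=P2, the head Knows(P1,P2), and squared set to true.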
In a .data file, first define your predicates in the predicates section, as shown in the example above.
Use the open and closed keywords to characterize each predicate.
A closed predicate is a predicate whose values are always observed.
For example, the Lived predicate from the simple example is closed because we fully observe where people have lived.
On the other hand, an open predicate is a predicate for which some values may be observed, but others are missing and thus need to be inferred.
Then, as shown above, create the observations, targets, and truth sections. These list the .txt files that specify, respectively, the observed values of predicates, the values you want to infer for open predicates, and the observed ground truth values for open predicates.
For all predicates, all possible substitutions should be specified either in the target files or in the observation files. The observations files should contain the known values for all closed predicates and can contain some of the known values for the open predicates. The target files tell PSL which substitutions of the open predicates it needs to infer. Target files cannot be specified for closed predicates as they are fully observed.
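The two rules in the paragraph above can be checked mechanically before running PSL. Every substitution must appear in an observation file or a target file, and targets are allowed only for open predicates. Below is a hypothetical Java checker. It assumes atoms are written as strings like "Knows(A,B)", and the caller supplies the list of substitutions it expects to exist. How you enumerate those depends on your data, and PSL does not provide this list for you in this form.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical checker for the two data rules above; atoms are strings like "Knows(A,B)". */
final class CoverageCheck {

    private CoverageCheck() {}

    static String predicateOf(String atom) {
        int paren = atom.indexOf('(');
        return paren < 0 ? atom : atom.substring(0, paren);
    }

    /**
     * Returns human-readable problems; an empty list means both rules hold.
     * allSubstitutions is supplied by the caller: every atom it expects to exist.
     */
    static List<String> check(Map<String, Boolean> isOpen, Collection<String> allSubstitutions,
                              Set<String> observed, Set<String> targets) {
        List<String> problems = new ArrayList<>();
        // Rule 1: target files cannot be specified for closed predicates.
        for (String atom : targets) {
            if (!Boolean.TRUE.equals(isOpen.get(predicateOf(atom)))) {
                problems.add("target for a closed or undeclared predicate: " + atom);
            }
        }
        // Rule 2: every substitution appears in an observation file or a target file.
        for (String atom : allSubstitutions) {
            if (!observed.contains(atom) && !targets.contains(atom)) {
                problems.add("neither observed nor targeted: " + atom);
            }
        }
        return problems;
    }
}
```

Returning a list of problems instead of throwing on the first one lets you fix every mistake in a data file in a single pass.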
The truth files provide training labels in order to learn the weights of the rules directly from data. This is similar to learning the coefficients of a logistic regression model from training data. Weight learning is described below in greater detail.
Run inference with the general command:
java -jar psl-cli-CANARY.jar -infer -model [name of model file].psl -data [name of data file].data
When we run inference, the inferred values are printed to the screen, as shown for our example above. If you want to write the outputs to files and use the inferred values downstream, you can use:
java -jar psl-cli-CANARY.jar -infer -model [name of model file].psl -data [name of data file].data -output [directory to write output files]
Values for all predicates will be output as .csv files in the specified output directory.
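One simple downstream use is turning the soft Knows values into yes/no link predictions. The hypothetical Java sketch below reads lines in the screen format shown in the example output above, such as KNOWS('Alex', 'Jay') = 0.6563306840338842. It skips log lines and keeps pairs whose value meets a threshold. The 0.5 threshold in the usage note is an arbitrary choice for illustration, and files written with -output may use a different layout, so check them before adapting this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical post-processing of the screen output printed by -infer. */
final class InferredLinks {
    // PREDICATE('arg1', 'arg2') = value, as in the example output above.
    private static final Pattern ATOM = Pattern.compile(
            "^(\\w+)\\((.*)\\)\\s*=\\s*([-+]?\\d*\\.?\\d+(?:[eE][-+]?\\d+)?)$");

    private InferredLinks() {}

    /** Keeps KNOWS atoms whose inferred value is at least the threshold, as "P1->P2" strings. */
    static List<String> predictedLinks(List<String> outputLines, double threshold) {
        List<String> links = new ArrayList<>();
        for (String line : outputLines) {
            Matcher m = ATOM.matcher(line.trim());
            if (!m.matches() || !m.group(1).equals("KNOWS")) {
                continue; // log lines, omitted-row markers, other predicates
            }
            String[] args = m.group(2).split(",");
            if (args.length != 2) {
                continue;
            }
            if (Double.parseDouble(m.group(3)) >= threshold) {
                links.add(unquote(args[0]) + "->" + unquote(args[1]));
            }
        }
        return links;
    }

    /** Strips surrounding whitespace and single quotes from an argument such as " 'Jay'". */
    private static String unquote(String arg) {
        String s = arg.trim();
        if (s.length() >= 2 && s.startsWith("'") && s.endsWith("'")) {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }
}
```

On the five example rows shown above, a threshold of 0.5 keeps Alex->Jay and Sabina->Arti and drops the other three pairs.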
With the inferred values, some downstream tasks that you can perform are:
Ensure that you have the prerequisites installed.
Looking at the examples is a great way to get familiar with the Groovy interface. We also have an in-depth walkthrough of one of our examples.
After you have looked at the examples, you should set up a new PSL project. You can run a PSL Groovy program in the same way that you run an example program.
Here are some more detailed topics that you may need:
Application builders and advanced users can integrate PSL into their code as a library. Since the PSL codebase is organized as a Maven project, it is easiest to include PSL as a dependency via Maven.
The PSL codebase is organized as a Maven project with several subprojects. The subproject most likely to be of interest is psl-core, but stable versions of all the subprojects are published to the PSL Maven repository. Including a PSL subproject in your Maven project is easy and requires two steps.
First, add psl-core (and any other subprojects) as dependencies in your pom.xml file:
<dependencies>
...
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-core</artifactId>
<version>2.0.0</version>
</dependency>
...
</dependencies>
Second, specify the location of the PSL Maven repository in your pom.xml file, anywhere within the <project> </project> tags:
<repositories>
<repository>
<releases>
<enabled>true</enabled>
<updatePolicy>daily</updatePolicy>
<checksumPolicy>fail</checksumPolicy>
</releases>
<id>psl-releases</id>
<name>PSL Releases</name>
<url>http://maven.linqs.org/maven/repositories/psl-releases/</url>
<layout>default</layout>
</repository>
</repositories>
Maven will now make the required PSL libraries and their dependencies available when compiling and running your project.
Release Date | PSL | PSL Code | PSL API Doc | PSL Static Wiki | PSL Utils | PSL Utils Code | PSL Utils API Doc | PSL Experimental | PSL Experimental Code | PSL Experimental API Doc
---|---|---|---|---|---|---|---|---|---|---
2015-10-11 | 1.2.1 | Code | API Doc | Static Wiki | | | | | | |
2017-07-04 | 2.0.0 | Code | API Doc | Static Wiki | 1.0.0 | Code | API Doc | 1.0.0 | Code | API Doc
The job of a Weight Learning Application is to use data to learn the weights of each rule in a PSL model.
## Syntax
Weight learning follows the structure below:
<WeightLearningApplication> weightLearner =
new <WeightLearningApplication>(<model>, <targetDatabase>, <groundTruthDatabase>, <config>);
<model> is the model specified by your PSL program.
<targetDatabase> is a database that contains all of the atoms whose values you would like to infer. When you create this database, the target predicate should be open.
<groundTruthDatabase> is a database that contains the known values of the atoms whose values you are inferring in <targetDatabase>. When you create this database, the predicates should be closed.
<config> is your config bundle.
Weight Learning Applications include:
LazyMaxLikelihoodMPE
L1MaxMargin
MaxLikelihoodMPE
MaxPseudoLikelihood
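As we understand the idea behind MaxLikelihoodMPE, the expected incompatibility of each rule (the summed potentials of its groundings) is approximated by its value at the MPE state. The truth database supplies the other term, which gives a perceptron-style update. Below is a minimal sketch of one such step under that reading. The class and parameter names are hypothetical, and PSL's implementation adds further details that this sketch omits.

```java
/** Hypothetical single weight-update step in the style of MPE-approximated maximum likelihood. */
final class WeightStepSketch {

    private WeightStepSketch() {}

    /**
     * @param weights       current rule weights w_i
     * @param truthIncompat Phi_i(truth): summed potentials of rule i's groundings at the ground-truth values
     * @param mpeIncompat   Phi_i(MPE): the same quantity at the currently inferred (MPE) values
     * @param stepSize      learning rate
     * @return w_i + stepSize * (Phi_i(MPE) - Phi_i(truth)), clamped so weights stay non-negative
     */
    static double[] step(double[] weights, double[] truthIncompat, double[] mpeIncompat, double stepSize) {
        if (weights.length != truthIncompat.length || weights.length != mpeIncompat.length) {
            throw new IllegalArgumentException("All arrays need one entry per rule");
        }
        double[] next = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            // Gradient of the log-likelihood with the expectation replaced by the MPE value.
            double gradient = mpeIncompat[i] - truthIncompat[i];
            next[i] = Math.max(0.0, weights[i] + stepSize * gradient);
        }
        return next;
    }
}
```

Intuitively, a rule that the ground truth violates more than the current inferred state does gets a lower weight, and a rule that the ground truth satisfies better gets a higher one.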
The Canary is a published build of PSL that is based on the development branch. The name "Canary" comes from the iconic use of a canary in a coal mine to detect toxic gas. The build is somewhere near the tip of the development tree. It is updated whenever the PSL developers feel a significant change has been made in development.
To use Canary, simply change the PSL version in your pom.xml to CANARY.
There is only one official Canary build, and it carries no version number, so depending on CANARY always refers to the latest published Canary build.
<dependencies>
...
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-groovy</artifactId>
<version>CANARY</version>
</dependency>
...
</dependencies>
Since we do not use version numbers with CANARY, Maven will not fetch a new version unprompted.
To upgrade your canary build, delete the old Canary from your Maven cache.
On Linux/Mac, this is: ~/.m2/repository/org/linqs/psl*/CANARY
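If you refresh Canary often, you can script this cleanup. The hypothetical Java utility below deletes every org/linqs/psl*/CANARY directory under a given local repository root and leaves released versions alone. It assumes the standard local-repository layout that the path above follows. Maven should then download a fresh Canary the next time it resolves the dependency.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical helper that removes cached Canary builds from a local Maven repository. */
final class CanaryCacheCleaner {

    private CanaryCacheCleaner() {}

    /** Deletes every <m2Repository>/org/linqs/psl*/CANARY directory and returns the ones removed. */
    static List<Path> deleteCanary(Path m2Repository) throws IOException {
        List<Path> removed = new ArrayList<>();
        Path linqs = m2Repository.resolve("org").resolve("linqs");
        if (!Files.isDirectory(linqs)) {
            return removed;
        }
        // Glob match on artifact directories such as psl-core and psl-groovy.
        try (DirectoryStream<Path> artifacts = Files.newDirectoryStream(linqs, "psl*")) {
            for (Path artifact : artifacts) {
                Path canary = artifact.resolve("CANARY");
                if (Files.isDirectory(canary)) {
                    deleteRecursively(canary);
                    removed.add(canary);
                }
            }
        }
        return removed;
    }

    private static void deleteRecursively(Path root) throws IOException {
        List<Path> ordered = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order visits children before their parent directories.
            paths.sorted(Comparator.reverseOrder()).forEach(ordered::add);
        }
        for (Path p : ordered) {
            Files.delete(p);
        }
    }
}
```

For the default location on Linux/Mac, pass Paths.get(System.getProperty("user.home"), ".m2", "repository") as the root, assuming you have not relocated your local repository.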