This page is for building the PSL source code for the purposes of development. For running a standard PSL program, see Running a Program.
To get the code, simply clone the repository:
git clone https://github.com/linqs/psl.git
If you are already comfortable using Git, then you can just skip ahead to the section on compiling PSL.
The Git website has information on installing Git, as do the GitHub guides mentioned below. This tutorial is helpful for learning how to use Git, and this tutorial is particularly helpful for SVN users.
Once Git is installed and you're ready to use it, you can run the above command to clone the PSL repository.
Between releases, the develop branch may be significantly ahead of the master branch. To see the latest changes, check out the develop branch.
git checkout develop
To contribute code to PSL, first fork the PSL repository on GitHub.
Then you clone that repository to a local machine, make commits, and push some or all of those commits back to the repository on GitHub. When your change is ready to be added to PSL, you can submit a pull request which will be reviewed by the PSL maintainers. The maintainers may request that you make additional changes. After your code is deemed acceptable, it will get merged into the develop branch of PSL.
PSL uses the Maven build system. Move to the top-level directory of your working copy and run:
mvn compile
You can install PSL to your local Maven repository by running:
mvn install
Remember to update your project's pom.xml file with the (possibly) new version you installed.
PSL comes with several built-in similarity functions. If you have a need not captured by these functions, you can also create customized similarity functions.
These similarity functions are shipped with the PSL Utils repository.
Name: Cosine Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.CosineSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://en.wikipedia.org/wiki/Cosine_similarity
Name: Dice Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.DiceSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient
Name: Jaccard Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.JaccardSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://en.wikipedia.org/wiki/Jaccard_index
Name: Jaro Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.JaroSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://www.census.gov/srd/papers/pdf/rr91-9.pdf
Name: Jaro-Winkler Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.JaroWinklerSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
Name: Level 2 Jaro-Winkler Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.Level2JaroWinklerSimilarity
Arguments: String, String
Return Type: Continuous
Description: A level 2 variation of Jaro-Winkler Similarity. Level 2 means that tokens are broken up before comparison (see http://secondstring.sourceforge.net/javadoc/com/wcohen/ss/Level2.html).
Name: Level 2 Levenshtein Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.Level2LevenshteinSimilarity
Arguments: String, String
Return Type: Continuous
Description: A level 2 variation of Levenshtein Similarity. Level 2 means that tokens are broken up before comparison (see http://secondstring.sourceforge.net/javadoc/com/wcohen/ss/Level2.html).
Name: Level 2 Monge Elkan Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.Level2MongeElkanSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://www.aaai.org/Papers/KDD/1996/KDD96-044.pdf
Name: Levenshtein Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.LevenshteinSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://en.wikipedia.org/wiki/Levenshtein_distance
Name: Same Initials
Qualified Path: org.linqs.psl.utils.textsimilarity.SameInitials
Arguments: String, String
Return Type: Discrete
Description: First splits the input strings on any whitespace and ensures both have the same number of tokens (returns 0 if they do not). Then, the first characters of all the tokens are checked for equality (ignoring case and order of appearance). Note that all characters that are not alphabetic ASCII characters are considered equal (e.g. all numbers and Unicode characters are considered the same character).
Name: Same Number of Tokens
Qualified Path: org.linqs.psl.utils.textsimilarity.SameNumTokens
Arguments: String, String
Return Type: Discrete
Description: Checks same number of tokens (delimited by any whitespace).
Name: Sub String Similarity
Qualified Path: org.linqs.psl.utils.textsimilarity.SubStringSimilarity
Arguments: String, String
Return Type: Continuous
Description: If one input string is a substring of another, then the length of the substring divided by the length of the text is returned. 0 is returned if neither string is a substring of the other.
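To make the behavior of the simpler functions concrete, here is a minimal sketch in plain Java that follows the descriptions of Same Number of Tokens, Same Initials, and Sub String Similarity above. It is not the PSL Utils implementation: the class name TextSimilaritySketch, the method names, and the handling of empty strings are assumptions made for illustration.

```java
import java.util.Arrays;

// Illustrative only: follows the prose descriptions above, not the PSL Utils source.
final class TextSimilaritySketch {
    // Split on any whitespace; a blank string yields zero tokens (an assumption).
    static String[] tokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Discrete: 1.0 when both strings have the same number of tokens, else 0.0.
    static double sameNumTokens(String a, String b) {
        return tokens(a).length == tokens(b).length ? 1.0 : 0.0;
    }

    // Map a token's first character to a lowercase ASCII letter,
    // or to a single placeholder for every other character.
    private static char initial(String token) {
        char c = token.charAt(0);
        if (c >= 'A' && c <= 'Z') {
            return (char) (c - 'A' + 'a');
        }
        return (c >= 'a' && c <= 'z') ? c : '#';
    }

    // Discrete: 1.0 when the token counts match and the initials agree,
    // ignoring case and order of appearance.
    static double sameInitials(String a, String b) {
        String[] tokensA = tokens(a);
        String[] tokensB = tokens(b);
        if (tokensA.length != tokensB.length) {
            return 0.0;
        }
        char[] initialsA = new char[tokensA.length];
        char[] initialsB = new char[tokensB.length];
        for (int i = 0; i < tokensA.length; i++) {
            initialsA[i] = initial(tokensA[i]);
            initialsB[i] = initial(tokensB[i]);
        }
        Arrays.sort(initialsA);
        Arrays.sort(initialsB);
        return Arrays.equals(initialsA, initialsB) ? 1.0 : 0.0;
    }

    // Continuous: length of the shorter string over the longer one when the
    // shorter is contained in the longer, else 0.0.
    static double subStringSimilarity(String a, String b) {
        String shorter = a.length() <= b.length() ? a : b;
        String longer = a.length() <= b.length() ? b : a;
        if (longer.isEmpty() || !longer.contains(shorter)) {
            return 0.0; // Returning 0.0 for two empty strings is an assumption.
        }
        return (double) shorter.length() / longer.length();
    }

    public static void main(String[] args) {
        System.out.println(sameNumTokens("John Smith", "J Smith"));
        System.out.println(sameInitials("John Smith", "smith j"));
        System.out.println(subStringSimilarity("abc", "abcdef"));
    }
}
```

The initials are sorted before comparison so that the order of appearance does not matter, and every character outside the ASCII letters collapses to one placeholder, matching the note in the Same Initials description.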
The .data files for the PSL CLI are YAML files with some additional constraints.
The accepted top-level keys for data files are:
predicates
observations
targets
truth
This section defines the allowed predicates for a PSL program. The minimum amount of information that must be supplied for a predicate is its name, its arity, and whether it is open or closed.
A simple predicates section may look like:
predicates:
  Foo/2: closed
  Bar/2: open
In addition to the base information, you can also specify the types of the predicate arguments and whether or not the predicate is a block.
These are specified with the types and block keys respectively.
If the types of a predicate are not explicitly stated, then UniqueStringID is used (or UniqueIntID if --int-ids is specified as a command line option).
If an arity is not specified, then the number of supplied types is inferred to be the arity.
If both the arity and specific types are supplied, then both sizes must match.
Allowed types are specified by the ConstantType enum.
The blocking nature of a predicate is used in some advanced and experimental PSL features.
A more advanced predicates section may look like:
predicates:
  Foo/2:
    - closed
    - types:
      - UniqueIntID
      - String
  Bar:
    - open
    - types:
      - UniqueStringID
      - Integer
    - block
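The arity and type rules above can be summarized in a short sketch. This is a hypothetical helper written in plain Java, not the PSL CLI's parser; the class and method names are invented for illustration. It takes a predicate key such as Foo/2 or Bar together with the list of declared types and applies the two rules stated above: a missing arity is inferred from the number of types, and a declared arity must match the number of types.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; mirrors the rules in the text, not the PSL CLI source.
final class PredicateKeySketch {
    // Name portion of a key such as "Foo/2" or "Bar".
    static String name(String key) {
        int slash = key.indexOf('/');
        return (slash < 0 ? key : key.substring(0, slash)).trim();
    }

    // Arity from the key when present, otherwise inferred from the types.
    static int arity(String key, List<String> types) {
        int slash = key.indexOf('/');
        if (slash < 0) {
            if (types.isEmpty()) {
                throw new IllegalArgumentException("No arity or types for predicate: " + key);
            }
            return types.size();
        }
        int declared = Integer.parseInt(key.substring(slash + 1).trim());
        if (!types.isEmpty() && types.size() != declared) {
            throw new IllegalArgumentException(
                    "Arity " + declared + " does not match " + types.size()
                    + " types for predicate: " + key);
        }
        return declared;
    }

    public static void main(String[] args) {
        System.out.println(arity("Foo/2", Arrays.asList("UniqueIntID", "String")));
        System.out.println(arity("Bar", Arrays.asList("UniqueStringID", "Integer")));
        System.out.println(arity("Baz/3", Collections.<String>emptyList()));
    }
}
```

Rejecting a key that has neither an arity nor types is a choice made for this sketch; the text above does not say how the CLI treats that case.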
The remaining top-level keys are all partitions that data should be loaded into:
observations
targets
truth
In these sections, you specify the files that should be loaded into each partition for each predicate. One or more tab-separated files may be specified for each partition and predicate. If you do not have data for a predicate/partition pairing, just leave it out.
Below is a sample of these sections:
observations:
  Foo: foo_obs.txt
  Bar: bar_obs.txt
targets:
  Foo:
    - foo_targets_1.txt
    - foo_targets_2.txt
truth:
  Foo: foo_truth.txt
Version 2.1.0 (https://github.com/linqs/psl/tree/2.1.0)
Full Change Information
Version 2.0.0 (https://github.com/linqs/psl/tree/2.0.0)
Package names changed from edu.umd.cs to org.linqs.
Version 1.2.1 (https://github.com/linqs/psl/tree/1.2.1)
Version 1.2 (https://github.com/linqs/psl/tree/1.2)
Version 1.1.1 (https://github.com/linqs/psl/tree/1.1.1)
Version 1.1 (https://github.com/linqs/psl/tree/1.1)
Version 1.0.2 (https://github.com/linqs/psl/tree/1.0.2)
Version 1.0.1 (https://github.com/linqs/psl/tree/1.0.1)
Version 1.0 (https://github.com/linqs/psl/tree/1.0)
Maven allows several ways to specify acceptable versions for dependencies. This page discusses the recommended options for specifying the PSL version to use.
If you are working on a paper or code that requires exact reproducibility, then you should specify an exact version of PSL.
For example:
<dependencies>
    ...
    <dependency>
        <groupId>org.linqs</groupId>
        <artifactId>psl-groovy</artifactId>
        <version>2.1.3</version>
    </dependency>
    ...
</dependencies>
If you want to get bug fixes without worrying about breaking changes, then you can specify a major and minor version while allowing the incremental (patch) version to grow.
For example:
<dependencies>
    ...
    <dependency>
        <groupId>org.linqs</groupId>
        <artifactId>psl-groovy</artifactId>
        <version>[2.1,)</version>
    </dependency>
    ...
</dependencies>
If you want the latest stable code and can tolerate the occasional breakage, then you can specify just the major version.
For example:
<dependencies>
    ...
    <dependency>
        <groupId>org.linqs</groupId>
        <artifactId>psl-groovy</artifactId>
        <version>[2,)</version>
    </dependency>
    ...
</dependencies>
If you are doing development and are willing to accept potential bugs, broken builds, and API breakages, then you can use the canary build. See the working with canary page for details on how best to work with the canary build.
<dependencies>
    ...
    <dependency>
        <groupId>org.linqs</groupId>
        <artifactId>psl-groovy</artifactId>
        <version>CANARY</version>
    </dependency>
    ...
</dependencies>
Key: admmmemorytermstore.internalstore
Type: String
Default Value: org.linqs.psl.reasoner.term.MemoryTermStore
Module: psl-core
Defining Class: org.linqs.psl.reasoner.admm.term.ADMMTermStore
Description: Initial size for the memory store.
Key: admmreasoner.epsilonabs
Type: float
Default Value: 1e-5f
Module: psl-core
Defining Class: org.linqs.psl.reasoner.admm.ADMMReasoner
Description: Absolute error component of stopping criteria. Should be positive.
Key: admmreasoner.epsilonrel
Type: float
Default Value: 1e-3f
Module: psl-core
Defining Class: org.linqs.psl.reasoner.admm.ADMMReasoner
Description: Relative error component of stopping criteria. Should be positive.
Key: admmreasoner.initialconsensusvalue
Type: String
Default Value: InitialValue.RANDOM.toString()
Module: psl-core
Defining Class: org.linqs.psl.reasoner.admm.ADMMReasoner
Description: The starting value for consensus variables. Values should come from the InitialValue enum.
Key: admmreasoner.initiallocalvalue
Type: String
Default Value: InitialValue.RANDOM.toString()
Module: psl-core
Defining Class: org.linqs.psl.reasoner.admm.ADMMReasoner
Description: The starting value for local variables. Values should come from the InitialValue enum.
Key: admmreasoner.maxiterations
Type: int
Default Value: 25000
Module: psl-core
Defining Class: org.linqs.psl.reasoner.admm.ADMMReasoner
Description: The maximum number of iterations of ADMM to perform in a round of inference.
Key: admmreasoner.objectivebreak
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.reasoner.admm.ADMMReasoner
Description: Stop if the objective has not changed since the last logging period (see LOG_PERIOD).
Key: admmreasoner.stepsize
Type: float
Default Value: 1.0f
Module: psl-core
Defining Class: org.linqs.psl.reasoner.admm.ADMMReasoner
Description: Step size. Higher values result in larger steps. Should be positive.
Key: admmtermgenerator.invertnegativeweights
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.reasoner.admm.term.ADMMTermGenerator
Description: If true, then invert negative weight rules into their positive weight counterparts (negate the weight and expression).
Key: arithmeticrule.delim
Type: String
Default Value: ;
Module: psl-core
Defining Class: org.linqs.psl.model.rule.arithmetic.AbstractArithmeticRule
Description: The delimiter to use when building summation substitutions. Make sure the value for this key does not appear in ground atoms that use a summation.
Key: booleanmaxwalksat.maxflips
Type: int
Default Value: 50000
Module: psl-core
Defining Class: org.linqs.psl.reasoner.bool.BooleanMaxWalkSat
Description: Key for positive integer property that is the maximum number of flips to try during optimization
Key: booleanmaxwalksat.noise
Type: double
Default Value: 0.01
Module: psl-core
Defining Class: org.linqs.psl.reasoner.bool.BooleanMaxWalkSat
Description: Key for double property in [0,1] that is the probability of randomly perturbing an atom in a randomly chosen potential
Key: booleanmcsat.numburnin
Type: int
Default Value: 500
Module: psl-core
Defining Class: org.linqs.psl.reasoner.bool.BooleanMCSat
Description: Number of burn-in samples
Key: booleanmcsat.numsamples
Type: int
Default Value: 2500
Module: psl-core
Defining Class: org.linqs.psl.reasoner.bool.BooleanMCSat
Description: Key for length of Markov chain
Key: categoricalevaluator.categoryindexes
Type:
Default Value:
Module: psl-core
Defining Class: org.linqs.psl.evaluation.statistics.CategoricalEvaluator
Description: The index of the arguments in the predicate (delimited by colons).
Key: categoricalevaluator.defaultpredicate
Type:
Default Value:
Module: psl-core
Defining Class: org.linqs.psl.evaluation.statistics.CategoricalEvaluator
Description: The default predicate to use when none are supplied.
Key: categoricalevaluator.representative
Type:
Default Value:
Module: psl-core
Defining Class: org.linqs.psl.evaluation.statistics.CategoricalEvaluator
Description: The representative metric. Default to accuracy. Must match a string from the RepresentativeMetric enum.
Key: continuousevaluator.representative
Type:
Default Value:
Module: psl-core
Defining Class: org.linqs.psl.evaluation.statistics.ContinuousEvaluator
Description: The representative metric. Default to MSE. Must match a string from the RepresentativeMetric enum.
Key: continuousrandomgridsearch.maxlocations
Type: int
Default Value: 250
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.search.grid.ContinuousRandomGridSearch
Description: The max number of locations to search.
Key: discreteevaluator.representative
Type:
Default Value:
Module: psl-core
Defining Class: org.linqs.psl.evaluation.statistics.DiscreteEvaluator
Description: The representative metric. Default to F1. Must match a string from the RepresentativeMetric enum.
Key: discreteevaluator.threshold
Type:
Default Value:
Module: psl-core
Defining Class: org.linqs.psl.evaluation.statistics.DiscreteEvaluator
Description: The truth threshold.
Key: em.iterations
Type: int
Default Value: 10
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.em.ExpectationMaximization
Description: Key for positive int property for the number of iterations of expectation maximization to perform
Key: em.tolerance
Type: double
Default Value: 1e-3
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.em.ExpectationMaximization
Description: Key for positive double property for the minimum absolute change in weights such that EM is considered converged
Key: executablereasoner.cleanupinput
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.reasoner.ExecutableReasoner
Description: Key for boolean property for whether to delete the input file for the external reasoner on close.
Key: executablereasoner.cleanupoutput
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.reasoner.ExecutableReasoner
Description: Key for boolean property for whether to delete the output file from the external reasoner on close.
Key: executablereasoner.executablepath
Type:
Default Value:
Module: psl-core
Defining Class: org.linqs.psl.reasoner.ExecutableReasoner
Description: Key for String property for the path of the executable.
Key: gridsearch.weights
Type: String
Default Value: 0.001:0.01:0.1:1:10
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.search.grid.GridSearch
Description: A colon-separated list of possible weights. These weights should be in some sorted order.
Key: guidedrandomgridsearch.explorelocations
Type: int
Default Value: 10
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.search.grid.GuidedRandomGridSearch
Description: The number of initial seed locations to explore based off of whichever ones score the best.
Key: guidedrandomgridsearch.seedlocations
Type: int
Default Value: 25
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.search.grid.GuidedRandomGridSearch
Description: The number of locations to initially search.
Key: hardem.adagrad
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.em.HardEM
Description: Key for Boolean property that indicates whether to use AdaGrad subgradient scaling, the adaptive subgradient algorithm of John Duchi, Elad Hazan, Yoram Singer (JMLR 2010). If TRUE, will override other step scheduling options (but not scaling).
Key: hyperband.basebracketsize
Type: int
Default Value: 10
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.search.Hyperband
Description: The base number of weight configurations for each bracket.
Key: hyperband.numbrackets
Type: int
Default Value: 4
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.search.Hyperband
Description: The number of brackets to consider. This is computed in vanilla Hyperband.
Key: hyperband.survival
Type: int
Default Value: 4
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.search.Hyperband
Description: The proportion of configs that survive each round in a bracket.
Key: inference.groundrulestore
Type: String
Default Value: org.linqs.psl.application.groundrulestore.MemoryGroundRuleStore
Module: psl-core
Defining Class: org.linqs.psl.application.inference.InferenceApplication
Description: The class to use for ground rule storage.
Key: inference.reasoner
Type: String
Default Value: org.linqs.psl.reasoner.admm.ADMMReasoner
Module: psl-core
Defining Class: org.linqs.psl.application.inference.InferenceApplication
Description: The class to use for a reasoner.
Key: inference.termgenerator
Type: String
Default Value: org.linqs.psl.reasoner.admm.term.ADMMTermGenerator
Module: psl-core
Defining Class: org.linqs.psl.application.inference.InferenceApplication
Description: The class to use for term generator. Should be compatible with REASONER_KEY and TERM_STORE_KEY.
Key: inference.termstore
Type: String
Default Value: org.linqs.psl.reasoner.admm.term.ADMMTermStore
Module: psl-core
Defining Class: org.linqs.psl.application.inference.InferenceApplication
Description: The class to use for term storage. Should be compatible with REASONER_KEY.
Key: initialweighthyperband.internalwla
Type: String
Default Value: MaxLikelihoodMPE.class.getName()
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.search.InitialWeightHyperband
Description: The internal weight learning application (WLA) to use. Should actually be a VotedPerceptron.
Key: lazyatommanager.activation
Type: double
Default Value: 0.01
Module: psl-core
Defining Class: org.linqs.psl.database.atom.LazyAtomManager
Description: The minimum value an atom must take for it to be activated. Must be a float in (0,1].
Key: lazymaxlikelihoodmpe.maxgrowrounds
Type: int
Default Value: 100
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.LazyMaxLikelihoodMPE
Description: Key for int property for the maximum number of rounds of lazy growing.
Key: lazympeinference.maxrounds
Type: int
Default Value: 100
Module: psl-core
Defining Class: org.linqs.psl.application.inference.LazyMPEInference
Description: Key for int property for the maximum number of rounds of inference.
Key: maxpiecewisepseudolikelihood.numsamples
Type: int
Default Value: 100
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.MaxPiecewisePseudoLikelihood
Description: Key for positive integer property. MaxPiecewisePseudoLikelihood will sample this many values to approximate the expectations.
Key: maxspeudolikelihood.bool
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.MaxPseudoLikelihood
Description: Boolean property. If true, MaxPseudoLikelihood will treat RandomVariableAtoms as boolean valued. Note that this restricts the types of constraints supported.
Key: maxspeudolikelihood.minwidth
Type: double
Default Value: 1e-2
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.MaxPseudoLikelihood
Description: Key for positive double property. Used as minimum width for bounds of integration.
Key: maxspeudolikelihood.numsamples
Type: int
Default Value: 10
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.maxlikelihood.MaxPseudoLikelihood
Description: Key for positive integer property. MaxPseudoLikelihood will sample this many values to approximate the integrals in the marginal computation.
Key: memorytermstore.initialsize
Type: int
Default Value: 5000
Module: psl-core
Defining Class: org.linqs.psl.reasoner.term.MemoryTermStore
Description: Initial size for the memory store.
Key: optimalcover.blockadvantage
Type: double
Default Value: 100.0
Module: psl-core
Defining Class: org.linqs.psl.database.rdbms.OptimalCover
Description: The cost for a blocking predicate is divided by this.
Key: optimalcover.joinadvantage
Type: double
Default Value: 2.0
Module: psl-core
Defining Class: org.linqs.psl.database.rdbms.OptimalCover
Description: The cost for a JOIN.
Key: pairedduallearner.admmsteps
Type: int
Default Value: 1
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.em.PairedDualLearner
Description: Key for Integer property that indicates how many steps of ADMM to run for each inner objective before each gradient iteration (parameter N in the ICML paper)
Key: pairedduallearner.warmuprounds
Type: int
Default Value: 0
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.em.PairedDualLearner
Description: Key for Integer property that indicates how many rounds of paired-dual learning to run before beginning to update the weights (parameter K in the ICML paper)
Key: parallel.numthreads
Type: int
Default Value: Runtime.getRuntime().availableProcessors()
Module: psl-core
Defining Class: org.linqs.psl.util.Parallel
Description:
Key: persistedatommanager.throwaccessexception
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.database.atom.PersistedAtomManager
Description: Whether or not to throw an exception on illegal access. Note that in most cases, this indicates incorrectly formed data. This should only be set to false when the user understands why these exceptions are thrown in the first place and the grounding implications of not having the atom initially in the database.
Key: random.seed
Type: int
Default Value: 4
Module: psl-core
Defining Class: org.linqs.psl.util.RandUtils
Description:
Key: randomgridsearch.maxlocations
Type: int
Default Value: 150
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.search.grid.RandomGridSearch
Description: The max number of locations to search.
Key: rankingevaluator.representative
Type:
Default Value:
Module: psl-core
Defining Class: org.linqs.psl.evaluation.statistics.RankingEvaluator
Description: The representative metric. Default to F1. Must match a string from the RepresentativeMetric enum.
Key: rankingevaluator.threshold
Type:
Default Value:
Module: psl-core
Defining Class: org.linqs.psl.evaluation.statistics.RankingEvaluator
Description: The truth threshold.
Key: ranksearch.scalingfactors
Type: String
Default Value: 1:2:10:100
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.search.grid.RankSearch
Description: A colon-separated list of scaling factors.
Key: rdbmsdatabase.optimalcover
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.database.rdbms.RDBMSDatabase
Description: Use optimal cover grounding.
Key: votedperceptron.averagesteps
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.VotedPerceptron
Description: Key for Boolean property that indicates whether to average all visited weights together for final output.
Key: votedperceptron.clipnegativeweights
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.VotedPerceptron
Description: If true, then weight will not be allowed to go negative (clipped at zero).
Key: votedperceptron.cutobjective
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.VotedPerceptron
Description: If true, then cut the step size in half whenever the objective increases.
Key: votedperceptron.inertia
Type: double
Default Value: 0.00
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.VotedPerceptron
Description: The inertia that is used for adaptive step sizes. Should be in [0, 1).
Key: votedperceptron.l1regularization
Type: double
Default Value: 0.0
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.VotedPerceptron
Description: Key for positive double property scaling the L1 regularization \gamma * |w|
Key: votedperceptron.l2regularization
Type: double
Default Value: 0.0
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.VotedPerceptron
Description: Key for positive double property scaling the L2 regularization (\lambda / 2) * ||w||^2
Key: votedperceptron.numsteps
Type: int
Default Value: 25
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.VotedPerceptron
Description: Key for positive integer property. VotedPerceptron will take this many steps to learn weights.
Key: votedperceptron.scalegradient
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.VotedPerceptron
Description: Key for Boolean property that indicates whether to scale gradient by number of groundings
Key: votedperceptron.scalestepsize
Type: boolean
Default Value: true
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.VotedPerceptron
Description: If true, then scale the step size down by the iteration.
Key: votedperceptron.stepsize
Type: double
Default Value: 0.2
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.VotedPerceptron
Description: Key for positive double property which will be multiplied with the objective gradient to compute a step.
Key: votedperceptron.zeroinitialweights
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.VotedPerceptron
Description: If true, then start all weights at zero for learning.
Key: weightlearning.evaluator
Type: String
Default Value: ContinuousEvaluator.class.getName()
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.WeightLearningApplication
Description: An evaluator capable of producing a score for the current weight configuration. Child methods may use this at their own discretion. This is only used for logging/information, and not for gradients.
Key: weightlearning.groundrulestore
Type: String
Default Value: MemoryGroundRuleStore.class.getName()
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.WeightLearningApplication
Description: The class to use for ground rule storage.
Key: weightlearning.randomweights
Type: boolean
Default Value: false
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.WeightLearningApplication
Description: Randomize weights before running. The randomization will happen during ground model initialization.
Key: weightlearning.reasoner
Type: String
Default Value: ADMMReasoner.class.getName()
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.WeightLearningApplication
Description: The class to use for inference.
Key: weightlearning.termgenerator
Type: String
Default Value: ADMMTermGenerator.class.getName()
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.WeightLearningApplication
Description: The class to use for term generator. Should be compatible with REASONER_KEY and TERM_STORE_KEY.
Key: weightlearning.termstore
Type: String
Default Value: ADMMTermStore.class.getName()
Module: psl-core
Defining Class: org.linqs.psl.application.learning.weight.WeightLearningApplication
Description: The class to use for term storage. Should be compatible with REASONER_KEY.
See the Configuration Options page for all options that PSL uses.
Many components of the PSL software have modifiable parameters and options, called properties.
Every property has a key, which is a string that uniquely identifies it.
These keys are organized into a namespace hierarchy, with each level separated by dots, e.g. <namespace>.<option>.
Each PSL class can specify a namespace for the options used by the class and its subclasses.
For example, the org.linqs.psl.application.learning.weight.VotedPerceptron weight learning class uses the namespace votedperceptron.
Setting the configuration option votedperceptron.stepsize allows you to control the size of the gradient descent update step in the VotedPerceptron weight learning class.
Every property has a type and a default value, which is the value the object will use unless a user overrides it. Every class with properties documents them by declaring their keys as public static final Strings, with Javadoc comments describing the corresponding property's type and semantics. Another public static final member declares the default value for that property.
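As a sketch of this convention, a class might declare its keys and defaults as follows. The class ExampleLearner, its examplelearner namespace, and the Map-based lookup are hypothetical; they stand in for a real PSL class and for PSL's own configuration lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class; shows the key/default declaration convention only.
final class ExampleLearner {
    /** Prefix (namespace) of the property keys used by this class. */
    public static final String CONFIG_PREFIX = "examplelearner";

    /** Key for positive double property: the step size of each update. */
    public static final String STEP_SIZE_KEY = CONFIG_PREFIX + ".stepsize";

    /** Default value for STEP_SIZE_KEY. */
    public static final double STEP_SIZE_DEFAULT = 0.2;

    private final double stepSize;

    // The Map stands in for PSL's configuration store (an assumption of this sketch).
    ExampleLearner(Map<String, String> options) {
        String raw = options.get(STEP_SIZE_KEY);
        this.stepSize = (raw == null) ? STEP_SIZE_DEFAULT : Double.parseDouble(raw);
    }

    double getStepSize() {
        return stepSize;
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        // No override: the default is used.
        System.out.println(new ExampleLearner(options).getStepSize());
        // User override: the supplied value wins.
        options.put(STEP_SIZE_KEY, "0.1");
        System.out.println(new ExampleLearner(options).getStepSize());
    }
}
```

Keeping the key and its default side by side as public constants lets both users and documentation tools find every option a class accepts.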
Setting properties for PSL programs differs depending on whether you are using the CLI or the Java/Groovy interface.
CLI users can pass any PSL configuration property on the CLI command line.
Just use the -D option and specify the key-value pair.
For example, you can run PSL with debug logging like this:
java -jar psl.jar --infer --data example.data --model example.psl -D log4j.threshold=DEBUG
PSL projects can specify different configuration bundles in a file named psl.properties on the classpath.
The standard location for this file is <project root>/src/main/resources/psl.properties.
Each key-value pair should be specified on its own line with a <namespace>.<option> = <value> format.
Here is an example psl.properties:
# This is an example properties file for PSL.
#
# Options are specified in a namespace hierarchy, with levels separated by '.'.

# Weight learning parameters

# This property specifies the number of iterations of voted perceptron updates
votedperceptron.numsteps = 700

# This property specifies the initial step size of the voted perceptron updates
votedperceptron.stepsize = 0.1
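The example above is also readable as a standard Java properties file, so its contents can be inspected with java.util.Properties. The sketch below parses the same key-value pairs from a string. It illustrates the file format only and makes no claim about how PSL itself loads psl.properties; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustrative only: reads key = value lines with the standard library.
final class PropertiesSketch {
    // Parse properties text; lines starting with '#' are comments.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Integer lookup with a fallback for keys that are not set.
    static int getInt(Properties props, String key, int defaultValue) {
        String raw = props.getProperty(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "# Weight learning parameters",
                "votedperceptron.numsteps = 700",
                "votedperceptron.stepsize = 0.1");
        Properties props = parse(text);
        System.out.println(getInt(props, "votedperceptron.numsteps", 25));
        System.out.println(props.getProperty("votedperceptron.stepsize"));
    }
}
```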
Properties can only be fetched from a running program in the Java/Groovy interface.
They are accessed statically via the org.linqs.psl.config.Config class.
For example:
import org.linqs.psl.config.Config;

public class Example {
    public static void main(String[] args) {
        // Print the value stored under "key", or "default" if it is not set.
        System.out.println(Config.getString("key", "default"));
    }
}
Arithmetic rules can be used to enforce modeling constraints. Many different types of constraints can be modeled; this page shows a few of the common types.
For these examples, let Foo be the binary predicate that we wish to put constraints on.
(Constraints are not limited to only binary predicates.)
A Functional constraint enforces the condition that for each constant substituted for A, the values of all groundings of Foo(A, c), taken over every possible constant c, sum to exactly 1.
Foo(A, +c) = 1 .
Note that the rule is unweighted (as indicated by the period at the end).
Summing the first argument instead of the second one is often called Inverse Functional. There are no semantic differences between functional and inverse functional constraints.
Foo(+c, A) = 1 .
A Partial Functional constraint is like a functional one, except the values of all groundings of Foo(A, c) sum to 1 or less.
Foo(A, +c) <= 1 .
Foo(+c, A) <= 1 .
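To see what these constraints require of the data, the following sketch checks them against a table of truth values. It is plain Java and independent of PSL: the class name, the nested-map layout (first argument, then second argument, then truth value), and the numeric tolerance are assumptions. For Foo(A, +c), the values are summed over the second argument separately for each value of the first.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative check, independent of PSL.
// values.get(a).get(c) holds the truth value of Foo(a, c).
final class FunctionalConstraintSketch {
    private static final double TOLERANCE = 1e-9; // Assumed numeric tolerance.

    // Sum of Foo(a, c) over every c for one fixed a.
    static double rowSum(Map<String, Double> row) {
        double sum = 0.0;
        for (double value : row.values()) {
            sum += value;
        }
        return sum;
    }

    // Foo(A, +c) = 1 .
    static boolean isFunctional(Map<String, Map<String, Double>> values) {
        for (Map<String, Double> row : values.values()) {
            if (Math.abs(rowSum(row) - 1.0) > TOLERANCE) {
                return false;
            }
        }
        return true;
    }

    // Foo(A, +c) <= 1 .
    static boolean isPartialFunctional(Map<String, Map<String, Double>> values) {
        for (Map<String, Double> row : values.values()) {
            if (rowSum(row) > 1.0 + TOLERANCE) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> values = new HashMap<>();
        Map<String, Double> alice = new HashMap<>();
        alice.put("x", 0.75);
        alice.put("y", 0.25);
        Map<String, Double> bob = new HashMap<>();
        bob.put("x", 0.5);
        bob.put("y", 0.25);
        values.put("alice", alice);
        values.put("bob", bob);
        // bob sums to 0.75, so only the partial functional check passes.
        System.out.println(isFunctional(values));
        System.out.println(isPartialFunctional(values));
    }
}
```

Checking the inverse functional forms, Foo(+c, A), amounts to swapping the roles of the two map levels.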
Before you set up a new project, ensure that the prerequisites are met.
The easiest way to get a new project started in PSL is to copy an existing project. The examples are kept up-to-date and exhibit the preferred style for PSL programs. It is recommended to start there and change the program as you go.
There are several levels of data abstraction in PSL to help manage and isolate data:
The DataStore represents the physical place that all the data is stored. It matches one-to-one with an actual RDBMS database instance (either H2 or PostgreSQL).
All data is stored in tables organized by predicate (one predicate to a table).
Databases are created using their constructor.
In this diagram, you can see how the data resides in the DataStore:
The Database is like a view onto a DataStore where subsets of the data are assigned to be read/write, read-only, or inaccessible. This makes it easy to do things like have observations and truth in the same database without worrying about one leaking into the other.
To get a database, you call getDatabase() on a DataStore.
getDatabase() takes two required arguments and one variadic argument:
In this diagram, you can see what a Database set up for inference looks like:
In this diagram, you can see what a Database set up as a truth for weight learning or evaluation looks like:
A Partition is the most fine-grained collection of data in PSL. Every ground atom (piece of data) belongs to exactly one partition. Within a partition, all data must be unique (an exception will be thrown during data loading if this is broken).
In most cases, you will want two or three partitions for inference:
observations
- for observed data that has a fixed value.
targets
- for the data you want to infer.
truth
- optional truth values for the targets that you can use for evaluation.
Follow the Google style guide with the following exceptions:
/* */
) inside methods.
@author
tags in javadoc comments.
Since PSL is built with Java and Maven, many IDEs are supported. (Although many PSL developers prefer using vim/emacs and a terminal.) One of the popular supported IDEs is Eclipse.
Eclipse is an extensible, integrated development environment that can be used to develop PSL and PSL projects. The recommended way of using Eclipse with PSL is to use the Eclipse plugin for Maven to generate Eclipse project information for a PSL project and then import that project into Eclipse.
Ensure that you have version 3.6 (Helios) or higher of Eclipse installed. Then, install the Groovy Eclipse plugin and the optional 1.8 version of the Groovy compiler, which is available when installing the plugin. The version 1.8 compiler is what Maven will use to compile the Groovy scripts, so builds done by either tool should be interchangeable. If you use an older version, Eclipse will probably recompile some files which then won't be compatible with the rest, and it won't run. (Cleaning and rebuilding everything should help.)
You might have to change the Groovy compiler version to 1.8.x in your Groovy compiler preferences (part of the Eclipse preferences).
You need to add a classpath variable in Eclipse to point to your local Maven repository.
You can access the variables either from the main options or from the build-path editor for any project.
Where you specify additional libs, make a new variable (there should be a button) with the name M2_REPO
and the path to your repo (e.g., ~/.m2/repository
).
This can also be achieved automatically via the following Maven command:
mvn -Declipse.workspace=/path/to/workspace eclipse:configure-workspace
In the top-level directory of your PSL project, run:
mvn eclipse:eclipse
Then in Eclipse, go to File/Import/General/\<something like 'Existing Project'\>
.
Select the top-level directory of your project.
You probably don't want to copy it into the workspace, so uncheck that option.
Be sure to run as a "Java application."
Tips
If you want to delete the Eclipse metadata for any reason, run:
mvn eclipse:clean
If you want to generate metadata for a project that depends on another project you're developing with Eclipse (PSL or not), run:
mvn eclipse:eclipse -Declipse.workspace=<path to Eclipse workspace>
The Eclipse plugin for Maven will look in the provided workspace for any projects that match dependencies declared in your project's POM file. Your project will be configured to depend on any such projects found as opposed to their respective installed jars. This way, changes to the sources of those dependencies will be seen by your project without reinstalling the dependencies. Note that this works even for dependencies that were imported but not copied into the workspace.
The m2eclipse Eclipse plugin is another option for developing PSL projects with Eclipse. It differs from the recommended method in that it is an Eclipse plugin designed to support Maven projects, as opposed to a Maven plugin designed to support Eclipse.
This page will walk you through the Groovy version of the Simple Acquaintances example.
First, ensure that your system meets the prerequisites .
Then clone the psl-examples
repository:
git clone https://github.com/linqs/psl-examples.git
Then move into the root directory for the simple acquaintances groovy example:
cd psl-examples/simple-acquaintances/groovy
Each example comes with a run.sh
script to quickly compile and run the example.
To compile and run the example:
./run.sh
To see the output of the example, check the inferred-predicates/KNOWS.txt
file:
cat inferred-predicates/KNOWS.txt
You should see some output like:
'Arti' 'Ben' 0.48425865173339844
'Arti' 'Steve' 0.5642937421798706
< ... 48 rows omitted for brevity ...>
'Jay' 'Dhanya' 0.4534565508365631
'Alex' 'Dhanya' 0.48786869645118713
The exact order of the output may change and some rows were left out for brevity.
Now that we have the example running, let's take a look inside the only source file for the example:
src/main/java/org/linqs/psl/examples/simpleacquaintances/Run.groovy
.
All configuration in PSL is handled through the Config object.
By default, PSL will look for two configuration files: psl.properties
and log4j.properties
.
You can find these files in the src/main/resources
directory.
The Config class will automatically load these files (if they exist) and all the options in them.
Configuration options can still be set using the addProperty()
and setProperty()
methods of the Config class.
The definePredicates()
method defines the three predicates for our example:
model.add predicate: "Lived", types: [ConstantType.UniqueStringID, ConstantType.UniqueStringID];
model.add predicate: "Likes", types: [ConstantType.UniqueStringID, ConstantType.UniqueStringID];
model.add predicate: "Knows", types: [ConstantType.UniqueStringID, ConstantType.UniqueStringID];
Each predicate here takes two unique string identifiers as arguments.
Note that for unique identifiers, ConstantType.UniqueStringID
and ConstantType.UniqueIntID
are available.
Having integer identifiers usually requires more pre-processing on the user's side, but gains better performance.
The defineRules()
method defines six rules for the example.
There are pages that cover the PSL rule specification and the rule specification in Groovy .
We will discuss the following two rules:
model.add(
rule: "20: Lived(P1, L) & Lived(P2, L) & (P1 != P2) -> Knows(P1, P2) ^2"
);
model.add(
rule: "5: !Knows(P1, P2) ^2"
);
The first rule can be read as "If P1 and P2 are different people and have both lived in the same location, L, then they know each other". Some key points to note from this rule are:
L
was reused in both Lived
atoms and therefore must refer to the same location.
(P1 != P2)
is shorthand for P1 and P2 referring to different people (different unique ids).
The second rule is a special rule that acts as a prior. Notice how this rule is not an implication like all the other rules. Instead, this rule can be read as "By default, people do not know each other". Therefore, the program will start with the belief that no one knows each other and this prior belief will be overcome with evidence.
The loadData()
method loads the data from the flat files in the data
directory into the data store that PSL is working with.
For brevity, we will only be looking at two files:
Inserter inserter = dataStore.getInserter(Lived, obsPartition);
inserter.loadDelimitedData(Paths.get(DATA_PATH, "lived_obs.txt").toString());
inserter = dataStore.getInserter(Likes, obsPartition);
inserter.loadDelimitedDataTruth(Paths.get(DATA_PATH, "likes_obs.txt").toString());
Both portions load data using an Inserter
.
The primary difference between the two calls is that the second one is looking for a truth value while the first one assumes that 1 is the truth value.
If we look in the files, we see lines like:
../data/lived_obs.txt
Jay Maryland
Jay California
../data/likes_obs.txt
Jay Machine Learning 1
Jay Skeeball 0.8
In lived_obs.txt
, there is no need to use a truth value because living somewhere is a discrete act.
You have either lived there or you have not.
Liking something, however, is more continuous.
Jay may like Machine Learning 100%, but he only likes Skeeball 80%.
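The difference between the two file layouts can be sketched as a small parser: a row either ends with an explicit truth value or is assumed to have a truth value of 1. The sketch below is not PSL's `Inserter` implementation. It assumes tab-delimited columns, and the class `DelimitedLineSketch` and its `parse` method are hypothetical names. It also folds both cases into one method, whereas PSL exposes them as two separate loading calls.

```java
import java.util.Arrays;

// Hypothetical parser for illustration; PSL's Inserter does the real loading.
final class DelimitedLineSketch {
    // One parsed row: the arguments of a ground atom and its truth value.
    static final class Row {
        final String[] arguments;
        final double truth;

        Row(String[] arguments, double truth) {
            this.arguments = arguments;
            this.truth = truth;
        }
    }

    // Parse a tab-delimited line holding `arity` arguments and an optional trailing truth value.
    // When the truth value is missing, it defaults to 1.
    static Row parse(String line, int arity) {
        String[] parts = line.split("\t");
        if (parts.length != arity && parts.length != arity + 1) {
            throw new IllegalArgumentException("Unexpected number of columns: " + parts.length);
        }

        double truth = 1.0;
        if (parts.length == arity + 1) {
            truth = Double.parseDouble(parts[arity]);
        }
        if (truth < 0.0 || truth > 1.0) {
            throw new IllegalArgumentException("Truth value must be in [0, 1]: " + truth);
        }

        return new Row(Arrays.copyOf(parts, arity), truth);
    }

    public static void main(String[] args) {
        Row lived = parse("Jay\tMaryland", 2);
        Row likes = parse("Jay\tSkeeball\t0.8", 2);
        System.out.println(String.join(", ", lived.arguments) + " = " + lived.truth);
        System.out.println(String.join(", ", likes.arguments) + " = " + likes.truth);
    }
}
```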
Here we must take a moment to talk about data partitions. In PSL, we use partitions to organize data. A partition is nothing more than a container for data, but we use them to keep specific chunks of data together or separate. For example, if we are running evaluation, we must be sure not to use our test partition in training. A more complete discussion of partitions and data storage in PSL can be found on this page.
PSL users typically organize their data in at least three different partitions (all of which you can see in this example):
observations (obsPartition in this example): In this partition we put actual observed data. In this example, we put all the observations about who has lived where, who likes what, and who knows who in the observations partition.
targets (targetsPartition in this example): In this partition we put atoms that we want to infer values for. For example, if we want to infer whether Jay and Sammy know each other, then we would put the atom Knows(Jay, Sammy) into the targets partition.
truth (truthPartition in this example): In this partition we put our test set, data that we have actual values for but are not including in our observations for the purpose of evaluation. For example, if we know that Jay and Sammy actually do know each other, we would put Knows(Jay, Sammy) in the truth partition with a truth value of 1.
The runInference()
method handles running inference for all the data we have loaded.
Before we run inference, we have to set up a database to use for inference:
Database inferDB = dataStore.getDatabase(targetsPartition, [Lived, Likes] as Set, obsPartition);
The getDatabase()
method of DataStore
is the proper way to get a database.
This method takes a minimum of two parameters:
getDatabase()
takes any number of read-only partitions that you want to include in this database.
In our example, we want to include our observations when we run inference.
Now we are ready to run inference:
InferenceApplication inference = new MPEInference(model, inferDB);
inference.inference();
inference.close();
inferDB.close();
To the MPEInference
constructor, we supply our model and the database to infer over.
To see the results, we will need to look inside the targets partition.
The method writeOutput()
handles printing out the results of the inference.
There are two key lines in this method:
Database resultsDB = ds.getDatabase(targetsPartition);
...
for (GroundAtom atom : resultsDB.getAllGroundAtoms(Knows)) {
The first line gets a fresh database that we can get the atoms from.
Notice that we are passing in targetsPartition
as a write partition, but we are actually just reading from it.
The second line iterates over all the Knows
atoms from the database we just created.
Lastly, the evalResults()
method handles seeing how well our model did.
The DiscreteEvaluator
class provides basic tools to compare two partitions.
In this example, we are comparing our target partition to our truth partition.
Example PSL programs are available at https://github.com/linqs/psl-examples.
Each example contains a script called run.sh
which will handle all the building and running.
A detailed walkthrough of an example can be found here.
Customized functions can be created by implementing the ExternalFunction
interface.
The getValue() method should return a value in [0, 1].
public class MyStringSimilarity implements ExternalFunction {
@Override
public int getArity() {
return 2;
}
@Override
public ConstantType[] getArgumentTypes() {
return [ConstantType.String, ConstantType.String].toArray();
}
@Override
public double getValue(ReadableDatabase db, Constant... args) {
return args[0].toString().equals(args[1].toString()) ? 1.0 : 0.0;
}
}
A function comparing the similarity between two entities or text can then be declared as follows:
model.add function: "MyStringSimilarity", implementation: new MyStringSimilarity();
A function can be used in the same manner as a predicate in rules:
Name(P1, N1) & Name(P2, N2) & MyStringSimilarity(N1, N2) -> SamePerson(P1, P2)
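The example function above only returns 0 or 1. As a sketch of a graded score that still stays in [0, 1], the code below computes a similarity from the Levenshtein edit distance, normalized by the longer string's length. This is plain Java that could serve as the body of a `getValue()` implementation. It is not one of PSL's built-in similarity functions, and the class `SimilaritySketch` and its methods are hypothetical.

```java
// Hypothetical helper for illustration; not part of PSL.
final class SimilaritySketch {
    // Classic dynamic-programming Levenshtein edit distance using two rows.
    static int levenshtein(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];

        // Turning an empty prefix of a into the first j characters of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            current[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int substitution = previous[j - 1] + cost;
                int deletion = previous[j] + 1;
                int insertion = current[j - 1] + 1;
                current[j] = Math.min(substitution, Math.min(deletion, insertion));
            }

            // Reuse the two rows instead of allocating a full matrix.
            int[] swap = previous;
            previous = current;
            current = swap;
        }

        return previous[b.length()];
    }

    // Map the edit distance into [0, 1]: 1 for identical strings, 0 for entirely different ones.
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0;
        }
        return 1.0 - ((double) levenshtein(a, b)) / longest;
    }

    public static void main(String[] args) {
        System.out.println(similarity("Jay", "Jay"));
        System.out.println(similarity("kitten", "sitting"));
    }
}
```

Because the edit distance never exceeds the length of the longer string, the result cannot leave the [0, 1] range that `getValue()` is expected to return.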
The PSL software uses concepts from the core PSL paper, and introduces new ones for advanced data management and machine learning. On this page, we define the commonly used terms and point out the corresponding classes in the code base.
Please note that this page is organized conceptually, not alphabetically.
Hinge-loss Markov random field: A factor graph defined over continuous variables in the [0,1] interval with (log) factors that are hinge-loss functions. Many classes in PSL work together to implement the functionality of HL-MRFs.
Ground atom:
A logical relationship corresponding to a random variable in a HL-MRF.
For example, Friends("Steve", "Jay")
is an alias for a specific random variable.
Random variable atom: A ground atom that is unobserved, i.e., no value is known for it. A HL-MRF assigns probability densities to assignments to random variable atoms.
Observed atom: A ground atom that has an observed, immutable value. HL-MRFs are conditioned on observed atoms.
Atom:
A generalization of ground atoms that allows logical variables as placeholders for constant arguments.
For example, Friends("Steve", A)
is a placeholder for all the ground atoms that can be obtained by substituting constants for the logical variable A
.
PSL Program: A set of rules, each of which is a template for hinge-loss potentials or hard linear constraints. When grounded over a base of ground atoms, a PSL program induces a HL-MRF conditioned on any specified observations.
Rule: See Rule Specification .
See this page for more details.
Data Store: An entire data repository, such as a relational database management system (RDBMS).
Partition: A logical division of ground atoms in a data store.
Database: A logical view of a data store, constructed by specifying a write partition and one or more read partitions of a data store.
Open Predicate: A predicate whose atoms can be random variable atoms, i.e., unobserved. The only time a ground atom will be loaded as a random variable atom is when it is stored in the database's write partition and its predicate is not specified as closed. Otherwise it will be loaded as an observed atom. Whether a predicate is open or closed is specific to each database.
Closed Predicate: A predicate whose atoms are always observed atoms. The only time a ground atom will be loaded as a random variable atom is when it is stored in the database's write partition and its predicate is not specified as closed. Otherwise it will be loaded as an observed atom. Whether a predicate is open or closed is specific to each database.
If you use H2 as the backend database for PSL (as is done in the examples), it can be helpful to open up the resulting database and examine it for debugging purposes.
You should set up your PSL program to use H2 on disk and note where it is stored. For example, if you create your DataStore using the following code
DataStore data = new RDBMSDataStore(new H2DatabaseDriver(Type.Disk, "/home/bob/psl", true));
then PSL will create an H2 database in the file /home/bob/psl/psl.mv.db
.
Then, run your program so the resulting H2 database can be inspected.
You will need to use the H2 jar for your classpath. This is likely ~/.m2/repository/com/h2database/h2/1.4.192/h2-1.4.192.jar
, but you may need to modify it if, for example, you're using a different version of H2.
You start the H2 web server by running the following command:
java -cp ~/.m2/repository/com/h2database/h2/1.4.192/h2-1.4.192.jar org.h2.tools.Server
Once you have started the web server, you can access it at http://localhost:8082
.
To log in, you should change the connection string to point to your H2 database file without .mv.db on the end. The username and password are both empty strings.
Welcome to the PSL software Wiki!
To get started with PSL you can follow one of these guides:
Command Line Interface for New Users : The Command Line Interface (CLI) will be sufficient for most use cases and we recommend that all users start with it. It is the easiest way to get started with PSL. If you need to take advantage of more advanced / low-level PSL features, then you can move to the Groovy interface.
Groovy for Intermediate Users : If you are comfortable with Java/Groovy and need to take advantage of more advanced / low-level PSL features, then we recommend that you use our Groovy interface.
PSL requires Java, so before you start make sure that you have Java installed.
Probabilistic soft logic (PSL) is a machine learning framework for developing probabilistic models. PSL models are easy to use and fast. You can define models using a straightforward logical syntax and solve them with fast convex optimization. PSL has produced state-of-the-art results in many areas spanning natural language processing, social-network analysis, knowledge graphs, recommender systems, and computational biology.
Probabilistic soft logic (PSL) is a general purpose language for modeling probabilistic and relational domains. It is applicable to a variety of machine learning problems, such as link prediction and ontology alignment. PSL combines the strengths of two powerful theories -- first-order logic, with its ability to succinctly represent complex phenomena, and probabilistic graphical models, which capture the uncertainty and incompleteness inherent in real-world knowledge. More specifically, PSL uses "soft" logic as its logical component and Markov random fields as its statistical model.
In "soft" logic, logical constructs need not be strictly false (0)
or true (1)
,
but can take on values between 0 and 1 inclusively.
For example, in logical formula similarNames(X, Y) => sameEntity(X, Y)
(which encodes the belief that if two people X
and Y
have similar names, then they are likely the same person),
the truth value of similarNames(X, Y)
and that of the entire formula lie in the range [0, 1].
The logical operators and (^)
, or (v)
and not (~)
are defined using the Lukasiewicz t-norms, i.e.,
A ^ B = max{A + B - 1, 0}
A v B = min{A + B, 1}
~A = 1 - A
(Note that if the values of A
and B
are restricted to be false or true, then the logical operators work as they are conventionally defined.)
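The three operator definitions above translate directly into code. The sketch below is a minimal illustration and not PSL's internal implementation; the class name `LukasiewiczSketch` is hypothetical.

```java
// Hypothetical helper for illustration; not part of PSL.
final class LukasiewiczSketch {
    // A ^ B = max{A + B - 1, 0}
    static double and(double a, double b) {
        return Math.max(a + b - 1.0, 0.0);
    }

    // A v B = min{A + B, 1}
    static double or(double a, double b) {
        return Math.min(a + b, 1.0);
    }

    // ~A = 1 - A
    static double not(double a) {
        return 1.0 - a;
    }

    public static void main(String[] args) {
        // Soft truth values strictly between 0 and 1.
        System.out.println(and(0.75, 0.5));
        System.out.println(or(0.75, 0.5));
        System.out.println(not(0.75));

        // With values restricted to 0 and 1, the operators match Boolean logic.
        System.out.println(and(1.0, 0.0));
        System.out.println(or(1.0, 0.0));
    }
}
```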
These logical formulas become the features of a Markov network. Each feature in the network is associated with a weight, which determines its importance in the interactions between features. Weights can be specified manually or learned from evidence data using PSL's suite of learning algorithms. PSL also provides sophisticated inference techniques for finding the most likely answer (i.e. the MAP state) to a user's query. The "softening" of the logical formulas allows us to cast the inference problem as a polynomial-time optimization, rather than a (much more difficult NP-hard) combinatorial one. (See LP relaxation for more details.)
For more details on PSL, please refer to the paper Hinge-Loss Markov Random Fields and Probabilistic Soft Logic.
PSL uses SLF4J for logging.
In the PSL Groovy program template, SLF4J is bound to Log4j 1.2.
The Log4j configuration file is located at src/main/resources/log4j.properties
.
It should look something like this:
# Set root logger level to the designated level and its only appender to A1.
log4j.rootLogger=INFO, A1
# A1 is set to be a ConsoleAppender.
log4j.appender.A1=org.apache.log4j.ConsoleAppender
# A1 uses PatternLayout.
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
# Change our connection pool to only log errors.
# Since we may set our root logger to something more loose, we want to explicitly set this.
org.slf4j.simpleLogger.log.com.zaxxer.hikari=ERROR
log4j.logger.com.zaxxer.hikari=ERROR
The logging verbosity can be set by changing INFO
in the second line to a different level and recompiling. Options include OFF
, WARN
, DEBUG
, and TRACE
.
In the command line interface, the logging level can be set using the same logging levels like this:
java -jar psl.jar --infer --data example.data --model example.psl -D log4j.threshold=DEBUG
MOSEK is software for numeric optimization. PSL can use MOSEK as a conic program solver via a PSL add-on. MOSEK support is provided as part of the PSL Experimental package.
First, install MOSEK 6.
In addition to a commercial version for which a 30-day trial is currently available, the makers of MOSEK also currently offer a free academic license.
Users will need the "PTS" base system for using the linear distribution of the ConicReasoner
and the "PTON" non-linear and conic extension to use the quadratic distribution.
Both of these components are currently covered by the academic license.
After installing MOSEK, install the included mosek.jar
file to your local Maven repository. (This file should be in <mosek-root>/6/tools/platform/<your-platform>/bin
.)
mvn install:install-file -Dfile=<path-to-mosek.jar> -DgroupId=com.mosek \
-DartifactId=mosek -Dversion=6.0 -Dpackaging=jar
Next, add the following dependency to your project's pom.xml
file:
<dependencies>
...
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-mosek</artifactId>
<version>YOUR-PSL-VERSION</version>
</dependency>
...
</dependencies>
where YOUR-PSL-VERSION
is replaced with your PSL version.
Finally, it might be necessary to rebuild your project:
mvn clean compile
After installing the MOSEK add-on, you can use it wherever a ConicProgramSolver
is used.
To use it for inference with a ConicReasoner
set the conicreasoner.conicprogramsolver
configuration property to org.linqs.psl.optimizer.conic.mosek.MOSEKFactory
.
Further, MOSEK requires that two environment variables be set when running.
The same bin
directory where you found mosek.jar
needs to be on the path for shared libraries.
The environment variable MOSEKLM_LICENSE_FILE
needs to be set to the path to your license file (usually <mosek-root>/6/licenses/mosek.lic
).
In bash in Linux, this can be done with the commands
export LD_LIBRARY_PATH=<path_to_mosek_installation>/mosek/6/tools/platform/<platform>/bin
export MOSEKLM_LICENSE_FILE=<path_to_mosek_installation>/mosek/6/licenses/mosek.lic
On Mac OS X, instead set DYLD_LIBRARY_PATH
to the directory containing the MOSEK binaries.
Our Maven repository has moved from
https://scm.umiacs.umd.edu/maven/lccd/content/repositories/psl-releases/
to
http://maven.linqs.org/maven/repositories/psl-releases/
The new endpoint will redirect to a https endpoint that may be used if necessary:
https://linqs-data.soe.ucsc.edu/maven/repositories/psl-releases/
All packages have been renamed from edu.umd.cs.*
to org.linqs.*
.
edu.umd.cs.psl.model.argument.ArgumentType
→ org.linqs.psl.model.term.ConstantType
ArgumentType.*
→ ConstantType.*
The arguments for a predicate are now defined in org.linqs.psl.model.term.ConstantType
instead of edu.umd.cs.psl.model.argument.ArgumentType
.
All the same types are supported, just the containing class has been moved and renamed.
new Partition(int)
→ DataStore.getPartition("stringIdentifier")
If the partition does not exist, it will be created and returned.
If it exists, it will be returned.
It is no longer necessary to pass around partitions if you don't want to.
Arithmetic rules are now supported in 2.0. See the Rule Specification for details. Rules in Groovy can now be specified in additional ways. See Rule Specification in Groovy .
Constraints are now implemented using unweighted arithmetic rules. See Constraints for more details.
To speed up utility development and reduce bloat, some components have been removed from this primary PSL repository and brought into their own repositories.
In these sample POM snippets all versions have been set to CANARY
, however you may choose your corresponding release.
https://github.com/linqs/psl-utils
psl-dataloading
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-dataloading</artifactId>
<version>CANARY</version>
</dependency>
edu.umd.cs.psl.ui.loading
edu.umd.cs.psl.ui.data
org.linqs.psl.utils.dataloading
psl-evaluation
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-evaluation</artifactId>
<version>CANARY</version>
</dependency>
edu.umd.cs.psl.evaluation
org.linqs.psl.utils.evaluation
psl-textsim
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-textsim</artifactId>
<version>CANARY</version>
</dependency>
edu.umd.cs.psl.ui.functions.textsimilarity
org.linqs.psl.utils.textsimilarity
https://github.com/linqs/psl-experimental
psl-datasplitter
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-datasplitter</artifactId>
<version>CANARY</version>
</dependency>
edu.umd.cs.psl.util.datasplitter
org.linqs.psl.experimental.datasplitter
psl-experiment
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-experiment</artifactId>
<version>CANARY</version>
</dependency>
edu.umd.cs.psl.ui.experiment
org.linqs.psl.experimental.experiment
psl-mosek
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-mosek</artifactId>
<version>CANARY</version>
</dependency>
edu.umd.cs.psl.optimizer.conic.mosek
org.linqs.psl.optimizer.conic.mosek
psl-optimize
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-optimize</artifactId>
<version>CANARY</version>
</dependency>
edu.umd.cs.psl.optimizer
edu.umd.cs.psl.reasoner.conic
org.linqs.psl.experimental.optimizer
psl-sampler
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-sampler</artifactId>
<version>CANARY</version>
</dependency>
edu.umd.cs.psl.sampler
org.linqs.psl.experimental.sampler
psl-weightlearning
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-weightlearning</artifactId>
<version>CANARY</version>
</dependency>
edu.umd.cs.psl.application.learning.weight.maxmargin
org.linqs.psl.experimental.learning.weight.maxmargin
The following software is required to use PSL:
Ensure that the Java 7 or 8 development kit is installed. Either OpenJDK or Oracle Java works.
We have had some reports of failing builds using Java prior to 1.7.0_110
or 1.8.0_110
.
If you have issues with Maven (especially handshake errors), try updating your version of Java to at least 1.7.0_110
or 1.8.0_110
.
This is especially relevant for Mac users where the version of Java is less frequently updated.
PSL uses Maven to manage builds and dependencies. Users should install Maven 3.x. PSL is developed with Maven and PSL programs are created as Maven projects. See running Maven for help using Maven to build projects.
Models in Groovy support three different ways to specify rules:
Rules can be specified using the natural Groovy syntax and the add() method for models. The rule weight and squaring must be specified as additional arguments. Both may be left off to specify an unweighted rule.
model.add(
rule: ( Likes(A, 'Dogs') & Likes(B, 'Dogs') ) >> Friends(A, B),
weight: 5.0,
squared: true
);
Because the in-line syntax must be a subset of Groovy syntax, the following operator variants are not supported:
&&
||
->
<<
<-
!
Note that there are supported variants for all unsupported operators. Arithmetic rules are not supported with the in-line syntax.
model.add(
rule: "( Likes(A, 'Dogs') & Likes(B, 'Dogs') ) >> Friends(A, B)",
weight: 5.0,
squared: true
);
// Produces the same rule as above.
model.add(
rule: "5.0: ( Likes(A, 'Dogs') && Likes(B, 'Dogs') ) -> Friends(A, B) ^2"
);
// An unweighted (constraint) variant of the above rule.
model.add(
rule: "( Likes(A, 'Dogs') && Likes(B, 'Dogs') ) -> Friends(A, B)"
);
// An arithmetic constraint.
model.add(
rule: "Likes(A, +B) = 1 ."
);
Rules can also be specified directly as a string. Because they are not limited by the Groovy syntax, all operators are available. A rule that specifies a weight and squaring in the string may not also pass "weight" and "squared" arguments.
// Load multiple rules from a single string.
model.addRules("""
1: ( Likes(A, 'Dogs') & Likes(B, 'Dogs') ) >> Friends(A, B) ^2
Likes(A, +B) = 1 .
""");
// Load multiple rules from a file.
model.addRules(new FileReader("myRules.txt"));
The addRules() method may be used to add multiple rules at a time, each rule on its own line.
A String
or Reader
may be passed.
Each rule must be fully specified with respect to weights and squaring.
Constraints are specified as unweighted arithmetic rules. So all you need to do is make an arithmetic rule and either explicitly specify that it is unweighted (using the period syntax), or not specify a weight.
// An unweighted rule (constraint) explicitly specified with a period.
model.add(
rule: "Likes(A, +B) = 1 ."
);
// An unweighted rule (constraint) implicitly specified by not adding a weight.
model.add(
rule: "Likes(A, +B) = 1"
);
PSL supports two primary types of rules: Logical and Arithmetic. Each of these types of rules supports weights and squaring.
Logical rules in PSL are implications joined with logical operators (with the exception of negative priors). Since PSL uses soft logic, hard logic operators are replaced with Lukasiewicz operators.
& (&&)
- Logical And
The and
operator is binary and functions as a Lukasiewicz t-norm
operator:
A & B = MAX(0, A + B - 1)
| (||)
- Logical Or
The or
operator is binary and functions as a Lukasiewicz t-conorm
operator:
A | B = MIN(1, A + B)
>> (->) / << (<-)
- Implication
The implication
operator acts similarly to the standard logical implication where the truth of the body implies the truth of the head.
Note that the head is always the side that the arrow is pointing at and both directions are supported.
It is most common to see rules where the body is on the left and the head is on the right.
The body of an implication must be a conjunctive clause (contain only and
operators) while the head must be a disjunctive clause (contain only or
operators).
~ (!)
- Negation
The negation
operator is unary and functions as a Lukasiewicz negation
operator:
~A = 1 - A
// The same rule written in two different ways.
Nice(A) & Nice(B) -> Friends(A, B)
Friends(A, B) << Nice(A) && Nice(B)
// Using a disjunction in the head instead of a conjunction in the body.
// Also written two different ways.
Friends(A, B) >> Nice(A) || Nice(B)
Nice(A) | Nice(B) <- Friends(A, B)
Arithmetic rules are relations of two linear combinations.
The following operators are used in arithmetic rules:
+
-
*
/
Note that each side of an arithmetic rule must be a linear combination, so + and - are only allowed between terms, and * and / are only allowed for coefficients.
The following relational operators are allowed between the two linear combinations:
=
<=
>=
A summation can be used when you want to aggregate over a variable.
You turn a variable into a summation variable by prefixing it with a +
.
Each sum variable can only be used once per expression, but you may have multiple different summation variables.
A filter clause appears at the end of a rule and decides what values the summation variable can take. There can be multiple filter clauses for each rule, but each summation variable can have at most one filter clause. The filter clause is a logical expression, but uses hard logic rather than Lukasiewicz. All non-zero truth values are considered true in a filter expression. If this expression evaluates to zero for a value, then that value is not used in the summation. Valid things that may appear in the filter clause are:
// Only sum up friends of A that are nice.
Friends(A, +B) <= 1 {B: Nice(B)}
// Only sum up friends of A that are similar to A.
// Note that Similar must be a closed predicate.
Friends(A, +B) <= 1 {B: Similar(A, B)}
// Only sum up friends of A that are not similar to A.
// Note that Similar must be a closed predicate.
Friends(A, +B) <= 1 {B: !Similar(A, B)}
// Only sum up friends of A where both A and B are nice and similar.
// Note that Similar must be a closed predicate.
Friends(A, +B) <= 1 {B: Nice(A) & Nice(B) & Similar(A, B)}
// Only sum up combinations for friends where A is nice and B is not nice.
Friends(+A, +B) <= 1 {A: Nice(A)} {B: !Nice(B)}
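The effect of a filter clause can be sketched in a few lines. The example below evaluates the left-hand side of `Friends(A, +B) <= 1 {B: Nice(B)}` for a single value of A. All truth values are invented, and the dictionaries stand in for PSL's database; this is an illustration of the semantics described above, not PSL's implementation.

```python
# Hypothetical truth values, made up for this example.
friends = {("Alice", "Bob"): 0.9, ("Alice", "Carol"): 0.6, ("Alice", "Dave"): 0.3}
nice = {"Bob": 1.0, "Carol": 0.0, "Dave": 0.2}


def filtered_sum(a, friends, nice):
    """Left-hand side of Friends(A, +B) <= 1 {B: Nice(B)} for one value of A."""
    total = 0.0
    for (person, b), value in friends.items():
        # Filters use hard logic: any non-zero truth value counts as true.
        if person == a and nice.get(b, 0.0) != 0.0:
            total += value
    return total


lhs = filtered_sum("Alice", friends, nice)
print(lhs, lhs <= 1.0)
```

Carol is excluded because Nice("Carol") is zero, while Dave is included even though his truth value is only 0.2.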
Each term in an arithmetic rule can take an optional coefficient. The coefficient may be any real number and can either appear before the term and act as a multiplier:
2.5 * Similar(A, B) >= 1
or, appear after the term and act as a divisor:
Similar(A, B) / 2.5 <= 1
Special coefficients (called Coefficient Operators) may be used:
|A|
- Cardinality
A cardinality coefficient may only be used on a summation variable.
It becomes the count of the number of terms substituted for a summation variable.
@Max[A, B]
- Max
Returns the maximum of A
and B
.
May be used with summation variables.
@Min[A, B]
- Min
Returns the minimum of A
and B
.
May be used with summation variables.
(Note that these rules are meant to show the semantics of arithmetic rules and may not make logical sense.)
Friends(A, B) = 0.5
Friends(A, +B) <= 1
Friends(A, +B) / |B| <= 1
@Min[2, |B|] * Friends(A, +B) <= 1
Friends(A, +B) <= 1 {B: Nice(B)}
Friends(A, B) <= Nice(A) + Nice(B)
Friends(A, B) >= 3.0 * Nice(A) - 2.0 * Nice(B)
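As a small worked example of the coefficient operators, the sketch below evaluates two of the rules above for one value of A. The truth values are invented, and treating the coefficient as a multiplier on the whole sum is how this sketch reads the rule.

```python
# Hypothetical truth values of Friends("Alice", B) for every B substituted
# for the summation variable.
friends_of_alice = [0.9, 0.6, 0.3]

total = sum(friends_of_alice)        # Friends(A, +B)
cardinality = len(friends_of_alice)  # |B|

# Friends(A, +B) / |B| <= 1
average = total / cardinality

# @Min[2, |B|] * Friends(A, +B) <= 1
scaled = min(2, cardinality) * total

print(average <= 1.0, scaled <= 1.0)  # prints: True False
```

Dividing by the cardinality turns the sum into an average, which can never exceed 1, while the scaled sum can.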
Every rule must be either weighted or unweighted. Unweighted rules are also called constraints since they are strictly enforced.
Weighted rules are prefixed with the weight and a colon:
<weight>: <rule>
For example:
2.5: Nice(A) & Nice(B) & (A != B) -> Friends(A, B)
5.0: Friends(A, +B) <= 1
10.0: Friends(A, +B) <= 1 {B: Nice(B)}
Unweighted rules are suffixed with a period:
<rule> .
For example:
Nice(A) & Nice(B) & (A != B) -> Friends(A, B) .
Friends(A, +B) <= 1 .
Friends(A, +B) <= 1 . {B: Nice(B)}
Any weighted rule can have its hinge-loss function squared.
Squaring the hinge-loss (or "squared potentials") may result in better performance.
Non-squared potentials tend to encourage a "winner take all" optimization, while squared potentials encourage more trading off.
To square a rule, just suffix a ^2
to it:
2.5: Nice(A) & Nice(B) & (A != B) -> Friends(A, B) ^2
5.0: Friends(A, +B) <= 1 ^2
10.0: Friends(A, +B) <= 1 ^2 {B: Nice(B)}
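The difference between the two kinds of potential can be seen numerically. The sketch below assumes the usual hinge-loss form, weight times max(0, distance) raised to the power 1 or 2. The function name and the sample distances are invented for the illustration.

```python
def hinge_penalty(weight, distance, squared=False):
    """Penalty contributed by one ground rule.

    distance is how far the ground rule is from being satisfied
    (zero or negative when it is satisfied).
    """
    hinge = max(0.0, distance)
    return weight * (hinge ** 2 if squared else hinge)


for d in (0.1, 0.5, 0.9):
    print(d, hinge_penalty(2.5, d), hinge_penalty(2.5, d, squared=True))
```

With squaring, small violations cost very little and large ones cost proportionally more, which is why squared potentials tend to spread violations across rules rather than fully satisfying some rules at the expense of others.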
You may specify priors for a predicate in PSL. A prior is specified on a specific predicate and affects all open ground atoms of that predicate. Priors in PSL must be weighted and may be squared. Priors tend to have low weights, since they are supposed to get overpowered by evidence.
Note that priors are not the same as specifying an initial value for your open predicate in a data file. Once optimization starts, the initial value specified in the data file will quickly get changed and have little/no impact on the final optimization. A prior however, is a ground rule that becomes a full fledged potential function that actively participates in optimization.
Negative priors are the most common type of prior in PSL. A negative prior assumes that all ground atoms for the predicate are zero.
Negative priors may be specified using logical rules:
1.0: ~Friends(A, B) ^2
This prior can be interpreted as "By default, people are not friends".
Arithmetic rules may also be used to specify a negative prior:
1.0: Friends(A, B) = 0 ^2
Positive priors can be a little more tricky than negative priors.
If you want all the ground atoms to take the same positive prior, then you can just use an arithmetic rule:
1.0: Friends(A, B) = 0.75 ^2
If you want different ground atoms to have different positive priors, then you will need to use a surrogate predicate. First, create a new closed predicate that corresponds 1-to-1 with your open predicate you wish to put the prior on. Then add observations for the surrogate predicate with the truth value being the prior you wish to put on that ground atom. Now just create a rule that directly ties together the surrogate predicate to the open predicate. See the example below.
Consider a PSL program where we are trying to infer friendship (the Friends
predicate).
We may have a prior belief on the friendship quality between all people in our data.
To encode this prior belief, we will first construct a surrogate predicate called FriendsPrior
(the name does not matter).
Now we will load FriendsPrior
with observations where the truth value of the observation is our prior.
Our data file for FriendsPrior
may look something like:
Alice Bob 1.0
Alice Charlie 0.75
Bob Charlie 0.33
Now we will add this rule that acts as our prior:
1.0: FriendsPrior(A, B) -> Friends(A, B) ^2
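To see what this prior contributes during optimization, the following sketch computes the penalty of each grounding for a candidate Friends value. It assumes the standard Lukasiewicz reading of an implication with one atom on each side (distance = max(0, body - head)); the candidate value 0.5 is hypothetical.

```python
# Observations from the FriendsPrior data file above.
friends_prior = {
    ("Alice", "Bob"): 1.0,
    ("Alice", "Charlie"): 0.75,
    ("Bob", "Charlie"): 0.33,
}


def prior_penalty(prior, inferred, weight=1.0):
    # Distance to satisfaction of FriendsPrior -> Friends,
    # squared because the rule ends in ^2.
    return weight * max(0.0, prior - inferred) ** 2


# Hypothetical value assigned to every Friends atom during optimization.
candidate = 0.5
for pair, prior in friends_prior.items():
    print(pair, round(prior_penalty(prior, candidate), 4))
```

Under this reading the rule only penalizes Friends values that fall below the prior, so the Bob and Charlie grounding incurs no penalty at 0.5.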
Both logical and arithmetic rules support some special operators.
== (=)
- Equals
Ensure that the left and right sides are equal.
Note that this is not the same as a similarity function evaluating to 1.
Two variables may be 100% similar, but equals will only evaluate to 1 if they refer to the same value.
!= (~=)
- Not Equals
Evaluates to 1 when the two sides are not the same.
This is a very common operator to use in most rules.
For example, consider the following two rules:
Nice(A) && Nice(B) -> Friends(A, B)
Nice(A) && Nice(B) && (A != B) -> Friends(A, B)
If A and B can both take the values "Alice" and "Bob", then the first rule would generate the following groundings:
Nice("Alice") && Nice("Alice") -> Friends("Alice", "Alice")
Nice("Alice") && Nice("Bob") -> Friends("Alice", "Bob")
Nice("Bob") && Nice("Bob") -> Friends("Bob", "Bob")
Nice("Bob") && Nice("Alice") -> Friends("Bob", "Alice")
While the second rule would only generate two groundings:
Nice("Alice") && Nice("Bob") -> Friends("Alice", "Bob")
Nice("Bob") && Nice("Alice") -> Friends("Bob", "Alice")
% (^)
- Non-Symmetric
Ensure that the reverse (or equal) pairing of the two operands is not grounded.
For example, consider the following two rules:
SimilarNames(A, B) -> SamePerson(A, B)
SimilarNames(A, B) && (A % B) -> SamePerson(A, B)
If A and B can both take the values "Alice" and "Bob", then the first rule would generate the following groundings:
SimilarNames("Alice", "Alice") -> SamePerson("Alice", "Alice")
SimilarNames("Alice", "Bob") -> SamePerson("Alice", "Bob")
SimilarNames("Bob", "Alice") -> SamePerson("Bob", "Alice")
SimilarNames("Bob", "Bob") -> SamePerson("Bob", "Bob")
While the second rule would only generate one grounding:
SimilarNames("Alice", "Bob") && ("Alice" % "Bob") -> SamePerson("Alice", "Bob")
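The grounding counts for the != and % operators can be checked by enumerating the substitutions. This is a sketch of the semantics only: picking the surviving ordering with `a < b` is an assumption of the example, not a statement about how PSL chooses it.

```python
from itertools import product

people = ["Alice", "Bob"]

# Every substitution for (A, B) when no operator restricts the grounding.
all_pairs = list(product(people, repeat=2))

# (A != B): drop pairs where both variables take the same value.
not_equal = [(a, b) for a, b in all_pairs if a != b]

# (A % B): keep a single ordering of each unequal pair.
# Using a < b to pick that ordering is an assumption of this sketch.
non_symmetric = [(a, b) for a, b in all_pairs if a < b]

print(len(all_pairs), not_equal, non_symmetric)
```

For two people this gives four unrestricted groundings, two with != and one with %.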
To run a PSL program, change to the top-level directory of its project (the directory with the Maven pom.xml
file).
Compile your project:
mvn compile
Now use Maven to generate a classpath for your project's dependencies:
mvn dependency:build-classpath -Dmdep.outputFile=classpath.out
You can now run a class with the command:
java -cp ./target/classes:`cat classpath.out` <fully qualified class name>
where <fully qualified class name> is the fully qualified name of the class you want to run.
Tips and troubleshooting
run.sh
script that will compile and run the program. Look for this script first.
java
command is used to run a script.
To change the version of PSL your project uses, edit your project's pom.xml
file. The POM will declare dependencies on one or more PSL artifacts, e.g.,
<dependencies>
...
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-groovy</artifactId>
<version>2.0.0</version>
</dependency>
...
</dependencies>
Change the version element of each such dependency to a new version (all the same one) and rebuild. See Choosing a Version of PSL to decide how to specify your version.
In addition to H2, PSL supports using PostgreSQL (or "Postgres" for short) as a database backend. Often the slowest operation in PSL (and all SRL systems in general) is grounding. PSL uses "bottom-up" grounding which formulates the grounding for each rule as a SQL query. This means that the choice of database backend can have large performance impacts.
Note that the Postgres backend in PSL is relatively new and may have slight changes made to its interface.
H2 is the default database backend for PSL. H2 is an embedded database and therefore requires no additional installation or configuration; the entire engine ships with the PSL jar. Databases are created as flat files (or fully in-memory). This makes H2 more lightweight and easy to configure. It is ideal for publishing experiments for papers that others may want to reproduce with ease. H2 will also hold information and data in memory (the same memory space that your PSL program is in). This makes certain operations fast:
However, the drawback of H2 is the same as its greatest asset: simplicity. H2 has a simplified query planner, index structure, and data management. As a result, it is often slow when queries get big (lots of ground rules) and/or complicated (rules with many atoms/predicates). To counter these drawbacks, PSL also supports using a Postgres database backend.
Postgres follows the pattern of a more typical database engine where there is a service installed on some server (which may be the local machine) which accepts connections from clients. Postgres is a production-grade, open source database engine and can easily scale into terabytes (but hopefully you are working with much less data). Because Postgres is not embedded, it can be more complicated and robust. The per-query overhead is higher with Postgres (because it needs to talk to another (possibly remote) process), but the queries typically run much faster.
For simple/small queries (ones that typically run in less than 10 seconds), we typically see H2 and Postgres performing similarly (or H2 even faster). However, with larger queries, Postgres often runs 1-3 orders of magnitude faster. The more data, the greater the difference between the two. A general rule of thumb is that if you have more than 1 million groundings, consider switching to Postgres.
We recommend at least Postgres v9.5. The Postgres website has all the directions you should need for installing Postgres. If you still have issues, just try searching the internet for it. Postgres is very popular, so the internet should have a wealth of answers for you.
Once you have Postgres installed, you need to set up a database and user.
In this section, we will assume that your Postgres instance is only accessible to the local machine and not open to the world.
If you want to open up your database please take the time to read up on the security implications for that.
For both performance and security reasons, we strongly recommend connecting to a database that is on the same machine and that only accepts local connections.
To accept all local connections, you can set up trust-based authentication in your Client Authentication Configuration File (usually called pg_hba.conf
).
Adding this line to the bottom will allow all local connections.
local all all trust
Creating a database in Postgres can be done with the createdb
command.
You can call your database whatever you want.
createdb psl
Similar to creating a database, you can create a user with the createuser
command.
It is easiest to create your psl user as a superuser (-s
) so it can properly administrate the database.
However, people more versed in database administration can clamp down the permissions more tightly.
createuser -s jay
To get the most performance out of our database, we typically use some additional configuration options.
These are fully optional and PSL will run fine without them.
The general themes of the options are to increase the memory, decrease durability (since we will generally not need the database after PSL completes), and force some query plans.
These options should be in your postgresql.conf
file.
# shared_buffers = 1/4 of system memory
# effective_cache_size = 1/2 of system memory
# For example, on a 16GB machine:
shared_buffers = 4GB
effective_cache_size = 8GB
# No Durability
fsync = off
full_page_writes = off
synchronous_commit = off
# Query Optimization
# Disable nested loops.
enable_nestloop = off
# Limit the write-ahead log (WAL)
# max_wal_size = 1/2 of shared_buffers
max_wal_size = 2GB
wal_buffers = 32MB
wal_level = minimal
max_wal_senders = 0
checkpoint_timeout = 30
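The memory-related values above follow simple ratios, so they can be derived for a machine of any size. The helper below is a hypothetical convenience, not part of PSL or Postgres; it applies the ratios from the comments in the configuration and assumes the amount of RAM is a whole number of gigabytes divisible by 8.

```python
def postgres_memory_settings(system_memory_gb):
    """Apply the sizing ratios from the configuration comments above."""
    shared_buffers = system_memory_gb // 4        # 1/4 of system memory
    effective_cache_size = system_memory_gb // 2  # 1/2 of system memory
    max_wal_size = shared_buffers // 2            # 1/2 of shared_buffers
    return {
        "shared_buffers": f"{shared_buffers}GB",
        "effective_cache_size": f"{effective_cache_size}GB",
        "max_wal_size": f"{max_wal_size}GB",
    }


# Reproduces the 16GB example from the configuration above.
for name, value in postgres_memory_settings(16).items():
    print(f"{name} = {value}")
```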
To use Postgres in the Groovy interface, just use the PostgreSQLDriver instead of the H2DatabaseDriver. There are various constructors that take in different information ranging from a database name to a full connection string.
DataStore dataStore = new RDBMSDataStore(new H2DatabaseDriver(Type.Disk, "/tmp/psl", true), config);
DataStore dataStore = new RDBMSDataStore(new PostgreSQLDriver("psl", true), config);
To use Postgres in the CLI, just use the --postgres
(-p
) option.
This flag has one required argument, the name of the database to connect to.
This means that currently, you must be able to connect to that database as the current user (the one invoking PSL) without a password.
java -jar psl-cli-CANARY.jar --infer --data friendship.data --model friendship.psl --postgres psl
PSL provides a Command Line Interface. The CLI is the easiest interface to PSL and handles most situations where you do not need additional customization.
PSL requires that you have Java installed.
Let's first download the files for our example program, run it and see what it does!
In this program, we'll use information about the known locations of some people, who some people know, and what people like to infer who knows each other. We'll first run the program and see the output. We will be working from the command line, so open up your shell or terminal.
As with the other PSL examples, you can find all the code in our psl-examples repository.
We will be using the simple-acquaintances
example.
git clone https://github.com/linqs/psl-examples.git
cd psl-examples/simple-acquaintances/cli
All the required commands are contained in the run.sh
script.
However, the commands are very simple and can also be run individually.
The PSL jar will be fetched automatically. You can also select what version of PSL is fetched/used at the top of this script.
You should now see output that looks like this (note that the order of the output lines may differ):
Running PSL Inference
0 [main] INFO org.linqs.psl.cli.Launcher - Loading data
81 [main] INFO org.linqs.psl.cli.Launcher - Data loading complete
81 [main] INFO org.linqs.psl.cli.Launcher - Loading model
159 [main] INFO org.linqs.psl.cli.Launcher - Model loading complete
159 [main] INFO org.linqs.psl.cli.Launcher - Starting inference
224 [main] INFO org.linqs.psl.application.inference.MPEInference - Grounding out model.
320 [main] INFO org.linqs.psl.application.inference.MPEInference - Beginning inference.
420 [main] INFO org.linqs.psl.reasoner.admm.ADMMReasoner - Optimization completed in 404 iterations. Primal res.: 0.022839682, Dual res.: 6.607145E-4
420 [main] INFO org.linqs.psl.application.inference.MPEInference - Inference complete. Writing results to Database.
447 [main] INFO org.linqs.psl.application.inference.MPEInference - Results committed to database.
457 [main] INFO org.linqs.psl.cli.Launcher - Inference Complete
461 [main] INFO org.linqs.psl.cli.Launcher - Starting discrete evaluation
472 [main] INFO org.linqs.psl.cli.Launcher - Discrete evaluation results for KNOWS -- Accuracy: 0.5961538461538461, Error: 21.0, Positive Class Precision: 0.7333333333333333, Positive Class Recall: 0.6285714285714286, Negative Class Precision: 0.4090909090909091, Negative Class Recall: 0.5294117647058824,
472 [main] INFO org.linqs.psl.cli.Launcher - Discrete evaluation complete
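The statistics in the discrete evaluation line are ordinary confusion-matrix ratios. As a sanity check, the sketch below recomputes them. The four counts are not printed by PSL; they were back-calculated from the ratios in the log above, and reading "Error" as the number of misclassified atoms is an assumption.

```python
def discrete_stats(tp, fp, tn, fn):
    """Accuracy, error count, and per-class precision/recall from raw counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "error": fp + fn,
        "positive_precision": tp / (tp + fp),
        "positive_recall": tp / (tp + fn),
        "negative_precision": tn / (tn + fn),
        "negative_recall": tn / (tn + fp),
    }


# Counts back-calculated from the log above (an inference, not PSL output).
stats = discrete_stats(tp=22, fp=8, tn=9, fn=13)
for name, value in stats.items():
    print(name, value)
```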
By default, the PSL examples output the results into the inferred-predicates
directory.
The results for this program will look something like:
$ cat inferred-predicates/KNOWS.txt | sort
'Alex' 'Arti' 0.9966862201690674
'Alex' 'Ben' 0.5923290848731995
'Alex' 'Dhanya' 0.48786869645118713
< ... 50 rows omitted for brevity ...>
'Steve' 'Elena' 0.49548637866973877
'Steve' 'Jay' 0.614285409450531
'Steve' 'Sabina' 0.5133784413337708
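If you want to post-process these results, a file in this shape is easy to load. The sketch below assumes the layout shown above (two single-quoted names and a truth value, separated by whitespace) and uses a 0.5 cut-off that is purely illustrative. Names containing apostrophes would need extra handling.

```python
import shlex


def parse_inferred(lines):
    """Parse lines such as: 'Alex' 'Arti' 0.9966862201690674"""
    results = {}
    for line in lines:
        fields = shlex.split(line)  # splits on whitespace and strips the quotes
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        person1, person2, value = fields
        results[(person1, person2)] = float(value)
    return results


# Rows copied from the sample output above, assuming tab separators.
sample = [
    "'Alex'\t'Arti'\t0.9966862201690674",
    "'Alex'\t'Ben'\t0.5923290848731995",
    "'Alex'\t'Dhanya'\t0.48786869645118713",
]

knows = parse_inferred(sample)
likely = [pair for pair, value in knows.items() if value >= 0.5]
print(likely)
```

To read the real file, pass an open file handle instead of the sample list, for example `parse_inferred(open("inferred-predicates/KNOWS.txt"))`.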
Now that we've run our first program that performs link prediction to infer who knows who, let's understand the steps that we went through to infer the unknown values: defining the underlying model, providing data to the model, and running inference to classify the unknown values.
A model in PSL is a set of logic-like rules.
The model is defined inside a text file with the format .psl
. We describe this model in the file simple-acquaintances.psl
.
Let's have a look at the rules that make up our model:
20: Lived(P1, L) & Lived(P2, L) & (P1 != P2) -> Knows(P1, P2) ^2
5: Lived(P1, L1) & Lived(P2, L2) & (P1 != P2) & (L1 != L2) -> !Knows(P1, P2) ^2
10: Likes(P1, L) & Likes(P2, L) & (P1 != P2) -> Knows(P1, P2) ^2
5: Knows(P1, P2) & Knows(P2, P3) & (P1 != P3) -> Knows(P1, P3) ^2
Knows(P1, P2) = Knows(P2, P1) .
5: !Knows(P1, P2) ^2
The model is expressing the intuition that people who have lived in the same location or like the same thing may know each other.
The integer values at the beginning of rules indicate the weight of the rule.
Intuitively, this tells us the relative importance of satisfying this rule compared to the other rules.
The ^2
at the end of the rules indicates that the hinge-loss functions based on groundings of these rules are squared, for a smoother tradeoff.
For a full description of rule syntax, see the Rule Specification.
For more details on hinge-loss functions and squared potentials, see the publications on our PSL webpage.
PSL rules consist of predicates. The names of the predicates used in our model and possible substitutions of these predicates with actual entities from our network are defined inside the file simple-acquaintances.data
.
Let's have a look:
predicates:
Knows/2: open
Likes/2: closed
Lived/2: closed
observations:
Knows : ../data/knows_obs.txt
Lived : ../data/lived_obs.txt
Likes : ../data/likes_obs.txt
targets:
Knows : ../data/knows_targets.txt
truth:
Knows : ../data/knows_truth.txt
In the predicates
section, we list all the predicates that will be used in rules that define the model.
The keyword open
indicates that we want to infer some substitutions of this predicate while closed
indicates that this predicate is fully observed.
I.e. all substitutions of this predicate have known values and will behave as evidence for inference.
For our simple example, we fully observe where people have lived and what things they like (or dislike).
Thus, Likes
and Lived
are both closed predicates.
We are aware of some instances of people knowing each other, but wish to infer the other instances, making Knows
an open predicate.
In the observations
section, for each predicate for which we have observations, we specify the name of the tab-separated file containing the observations.
For example, knows_obs.txt
and lived_obs.txt
specify which people know each other and where some of these people have lived, respectively.
The targets
section specifies a .txt
file that, for each open predicate, lists all substitutions of that predicate that we wish to infer.
In knows_targets.txt
, we specify the pairs of people for whom we wish to infer whether they know each other.
The truth
section specifies a .txt
file that provides a set of ground truth observations for each open predicate.
Here, we give the actual values for the Knows
predicate for all the people in the network as training labels. We describe the general data loading scheme in more detail in the sections below.
More advanced features of the .data
file can be found on the CLI Data File Format page.
To create a PSL model, you should define a set of rules in a .psl
file.
Let's go over the basic syntax to write rules. Consider this very general rule form:
w: P(A,B) & Q(B,C) -> R(A,C) ^2
The first part of the rule, w
, is a numeric value that specifies the weight of the rule.
In this example, P
, Q
and R
are predicates.
Logical rules consist of the rule "body" and rule "head."
The body of the rule appears before the ->
which denotes logical implication.
The body can have one or more predicates conjuncted together with the &
that denotes logical conjunctions.
The head of the rule should be a single predicate.
The predicates that appear in the body and head can be any combination of open and closed predicate types.
The Rule Specification page contains the full syntax for PSL rules.
In a .data
file, you should first define your predicates:
as shown in the above example.
Use the open
and closed
keywords to characterize each predicate.
A closed
predicate is a predicate whose values are always observed.
For example, the Lived
predicate from the simple example is closed because we fully observe where people have lived.
On the other hand, an open
predicate is a predicate where some values may be observed, but some values are missing and thus, need to be inferred.
As shown above, then create your observations:
, targets:
and truth:
sections that list the names of files that specify the observed values for predicates, values you want to infer for open predicates, and observed ground truth values for open predicates.
For all predicates, all possible substitutions should be specified either in the target files or in the observation files. The observations files should contain the known values for all closed predicates and can contain some of the known values for the open predicates. The target files tell PSL which substitutions of the open predicates it needs to infer. Target files cannot be specified for closed predicates as they are fully observed.
The truth files provide training labels in order to learn the weights of the rules directly from data. This is similar to learning the weights of coefficients in a logistic regression model from training data.
Run inference with the general command:
java -jar psl-cli.jar --infer --model [name of model file].psl --data [name of data file].data
When we run inference, the inferred values are outputted to the screen. If you want to write the outputs to a file and use the inferred values in various ways downstream, you can use:
java -jar psl-cli.jar --infer --model [name of model file].psl --data [name of data file].data --output [directory to write output files]
Values for all predicates will be output as tab-separated files in the specified output directory.
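When inference is launched from a script, it can help to assemble the command programmatically. The helper below is hypothetical: it only uses the flags shown above (`--infer`, `--model`, `--data`, `--output`), and the default jar name is taken from the example command.

```python
def psl_infer_command(model, data, output_dir=None, jar="psl-cli.jar"):
    """Build the argument list for a PSL CLI inference run."""
    command = ["java", "-jar", jar, "--infer", "--model", model, "--data", data]
    if output_dir is not None:
        # Write inferred values to files instead of only printing them.
        command += ["--output", output_dir]
    return command


command = psl_infer_command(
    "simple-acquaintances.psl",
    "simple-acquaintances.data",
    output_dir="inferred-predicates",
)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the jar and the model and data files are in the working directory.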
With the inferred values, some downstream tasks that you can perform are:
Ensure that you have the prerequisites installed.
Looking at the examples is a great way to get familiar with the Groovy interface. We also have an in depth walkthrough of one of our examples.
After you have looked at the examples, you should set up a new PSL project. You can run a PSL Groovy program in the same way that you run an example program.
Here are some more detailed topics that you may need:
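Release Date | PSL | PSL Code | PSL API Doc | PSL Static Wiki | PSL Utils | PSL Utils Code | PSL Utils API Doc | PSL Experimental | PSL Experimental Code | PSL Experimental API Doc |
---|---|---|---|---|---|---|---|---|---|---|
2015-10-11 | 1.2.1 | Code | API Doc | Static Wiki | | | | | | |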
2017-07-04 | 2.0.0 | Code | API Doc | Static Wiki | 1.0.0 | Code | API Doc | 1.0.0 | Code | API Doc |
2018-07-31 | 2.1.0 | Code | API Doc | Static Wiki | 2.1.0 | Code | API Doc | 2.1.0 | Code | API Doc |
The Canary is a published build of PSL that is based on the development branch. The name "Canary" comes from the iconic use of a canary in a coal mine to detect toxic gas. The build is somewhere near the tip of the development tree. It is updated whenever the PSL developers feel a significant change has been made in development.
To make using the canary build easier for PSL users, we release two versions of the canary build in parallel.
One version is always called CANARY
and will replace the previous build.
The other version will be called CANARY-X.Y.Z
where X.Y
matches the major/minor number of the next stable release and Z
just increments by one for each canary build.
Unfortunately, we do not yet have any fancy infrastructure to check what canary versions are available, so the easiest way will be to just look at the maven repository.
If you always want the latest and are willing to have potential incompatibilities between updates, use the unversioned canary. If you want a newer version but don't want it to update or you are collaborating with other people, use the versioned canary.
To use the canary, simply change your PSL version in your pom.xml
to CANARY
or CANARY-X.Y.Z
(with the proper substitution for X.Y.Z
).
<dependencies>
...
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-groovy</artifactId>
<version>CANARY</version>
</dependency>
...
</dependencies>
<dependencies>
...
<dependency>
<groupId>org.linqs</groupId>
<artifactId>psl-groovy</artifactId>
<version>CANARY-2.1.0</version>
</dependency>
...
</dependencies>
Using the canary in the CLI just means grabbing the new jar file from the Maven repository.
If you are using the versioned canary, then just update your version numbers in your pom.
If you are using the unversioned canary, then you will have to delete the old canary from your Maven cache.
On Linux/Mac, this is at: ~/.m2/repository/org/linqs/psl*/CANARY
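Clearing the cached canary can be scripted. The function below is a hypothetical helper, not something shipped with PSL; it deletes every directory matching the path pattern above and assumes the default Maven repository location unless another one is passed in.

```python
import glob
import os
import shutil


def remove_cached_canary(m2_repository=None):
    """Delete cached unversioned canary artifacts and return the removed paths."""
    if m2_repository is None:
        # Default Maven cache location on Linux/Mac.
        m2_repository = os.path.expanduser("~/.m2/repository")
    pattern = os.path.join(m2_repository, "org", "linqs", "psl*", "CANARY")
    removed = []
    for path in glob.glob(pattern):
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(path)
    return sorted(removed)
```

Calling `remove_cached_canary()` before the next build forces Maven to download the current canary again. Versioned canaries (CANARY-X.Y.Z) and stable releases are left untouched.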