Advanced Topics


Basic Example

After ensuring that the prerequisites and examples are installed, change to the directory containing the project of examples. Inside this project you will find the BasicExample.groovy example. First, navigate to the parent directory of the BasicExample.groovy example:

>> cd psl-example/src/main/java/edu/umd/cs/example

Here, you will find BasicExample.groovy. This file provides an example of using the Groovy PSL syntax for defining predicates and rules, loading predicate data, running basic inferences, and learning rule weights.

Model: Creation and Configuration

The first portion of a PSL program creates a model and defines configuration parameters for that model.

We create a ConfigBundle which loads properties from the file: /src/main/resources/psl.properties

ConfigManager cm = ConfigManager.getManager()
ConfigBundle config = cm.getBundle("basic-example")

Now, we create a DataStore to enable database functionality for our PSL program, and provide the specified configuration parameters.

def defaultPath = System.getProperty("java.io.tmpdir")
String dbpath = config.getString("dbpath", defaultPath + File.separator + "basic-example")
DataStore data = new RDBMSDataStore(new H2DatabaseDriver(Type.Disk, dbpath, true), config)

Finally, with our DataStore created, we can create a PSL model:

PSLModel m = new PSLModel(this, data)

Model: Predicates, Functions, Rules, and Constraints

'BasicExample.groovy' defines 4 simple predicates:

Predicates

  • Network
    m.add predicate: "Network", types: [ArgumentType.UniqueID, ArgumentType.UniqueID]
  • Name
    m.add predicate: "Name", types: [ArgumentType.UniqueID, ArgumentType.String]
  • Knows
    m.add predicate: "Knows", types: [ArgumentType.UniqueID, ArgumentType.UniqueID]
  • SamePerson
    m.add predicate: "SamePerson", types: [ArgumentType.UniqueID, ArgumentType.UniqueID]

See defining predicate types for more information on defining predicate types.

Functions

In addition, BasicExample.groovy defines the function:

  • SameName
m.add function: "SameName" , implementation: new LevenshteinSimilarity()

See defining functions for more information on defining functions. In our example, SameName is backed by LevenshteinSimilarity, so it maps a pair of names to a similarity score in [0,1] derived from their edit distance; identical names score 1.
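
As a rough illustration of what an edit-distance-based similarity computes, the following standalone sketch normalizes the Levenshtein distance into [0,1]. The class and method names are hypothetical, and this is not PSL's LevenshteinSimilarity implementation, which may differ in details such as thresholding.

```java
// Illustrative only: a plain normalized edit-distance similarity.
// EditSimilaritySketch is a hypothetical name, not a PSL class.
class EditSimilaritySketch {

    /** Classic dynamic-programming Levenshtein distance between two strings. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Maps the distance into [0,1]; identical strings score 1.0. */
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0; // two empty strings are identical
        }
        return 1.0 - (double) distance(a, b) / maxLen;
    }
}
```

With this sketch, "John Braker" and "Johny Braker" differ by a single insertion, so their similarity is close to, but below, 1.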

Next, we define ground terms (constants) to refer to two distinct social networks:

GroundTerm snA = data.getUniqueID(1)
GroundTerm snB = data.getUniqueID(2)

In addition to the above predicates and function, we define the following rules, each written first in pseudocode and then in the corresponding PSL syntax. For information on writing rules using PSL's syntax, please see writing rules.

Rules

  • IF ( Network(A, snA) AND Network(B, snB) AND Name(A,X) AND Name(B,Y) AND SameName(X,Y) ) THEN SamePerson(A,B)
    m.add rule : ( Network(A, snA) & Network(B, snB) & Name(A,X) & Name(B,Y)
      & SameName(X,Y) ) >> SamePerson(A,B),  weight : 5

Similarly, another rule we might define utilizes the knowledge of the SamePerson and Knows predicates to infer other values of the SamePerson predicate:

  • IF ( Network(A, snA) AND Network(B, snB) AND SamePerson(A,B) AND Knows(A, Friend1) AND Knows(B, Friend2) ) THEN SamePerson(Friend1, Friend2)
    m.add rule : ( Network(A, snA) & Network(B, snB) & SamePerson(A,B) & Knows(A, Friend1)
      & Knows(B, Friend2) ) >> SamePerson(Friend1, Friend2) , weight : 3.2

For more information on defining rules see defining rules.
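
To give a feel for how a weighted rule behaves once its atoms take soft truth values, here is a minimal sketch of the Lukasiewicz relaxation commonly described for PSL: a conjunction is max(0, sum - (n - 1)), and a rule body >> head is penalized by weight * max(0, body - head). The class and method names are hypothetical, and the sketch ignores options such as squared penalties.

```java
// Illustrative only: soft-logic evaluation of one ground rule.
// RuleDistanceSketch is a hypothetical name, not a PSL class.
class RuleDistanceSketch {

    /** Lukasiewicz conjunction of truth values in [0,1]. */
    static double conjunction(double... values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return Math.max(0.0, sum - (values.length - 1));
    }

    /** How far the ground rule "body >> head" is from being satisfied. */
    static double distanceToSatisfaction(double body, double head) {
        return Math.max(0.0, body - head);
    }

    /** Linear penalty contributed by a ground rule with the given weight. */
    static double weightedPenalty(double weight, double body, double head) {
        return weight * distanceToSatisfaction(body, head);
    }
}
```

For the first rule above (weight 5), a body whose atoms are all 1 except SameName(X,Y) = 0.9 evaluates to 0.9; if SamePerson(A,B) is currently 0.4, the ground rule contributes a penalty of about 2.5.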

Constraints

After we define our rules, we then define constraints for our model.

m.add PredicateConstraint.PartialFunctional, on : SamePerson
m.add PredicateConstraint.PartialInverseFunctional, on : SamePerson
m.add PredicateConstraint.Symmetric, on : SamePerson

In this case, we require that each person be aligned to at most one person in the other social network. To do so, we define two partial functional constraints, the second of which is on the inverse. We also require SamePerson to be symmetric, i.e., SamePerson(p1, p2) == SamePerson(p2, p1).
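
The following standalone sketch shows one way to read these constraints over soft truth values, assuming that a partial functional constraint bounds the sum of truth values sharing an argument by 1. The class name, the "a,b" key encoding, and the tolerance are hypothetical choices for illustration, not part of PSL.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: checks the three constraints on a map from "a,b" to a truth value.
// ConstraintSketch is a hypothetical name, not a PSL class.
class ConstraintSketch {
    private static final double EPS = 1e-9;

    /**
     * True if, for every constant at position argIndex, the truth values sum to at most 1.
     * argIndex 0 mimics the partial functional case, argIndex 1 the inverse case.
     */
    static boolean sumsAtMostOne(Map<String, Double> atoms, int argIndex) {
        Map<String, Double> sums = new HashMap<>();
        for (Map.Entry<String, Double> e : atoms.entrySet()) {
            String arg = e.getKey().split(",")[argIndex];
            sums.merge(arg, e.getValue(), Double::sum);
        }
        for (double s : sums.values()) {
            if (s > 1.0 + EPS) {
                return false;
            }
        }
        return true;
    }

    /** True if every atom (a,b) has a mirrored atom (b,a) with the same value. */
    static boolean symmetric(Map<String, Double> atoms) {
        for (Map.Entry<String, Double> e : atoms.entrySet()) {
            String[] p = e.getKey().split(",");
            Double mirror = atoms.get(p[1] + "," + p[0]);
            if (mirror == null || Math.abs(mirror - e.getValue()) > EPS) {
                return false;
            }
        }
        return true;
    }
}
```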

Finally, we can also define a prior, which incorporates our assumption about predicate values into the model. For example, the prior,

m.add rule: ~SamePerson(A,B), weight: 1

suggests that the model should assume that two people are not generally the same person.

Model: Data Loading

In order to load data, we must define a partition. We define the evidencePartition to store all of our knowledge about our predicates into the database.

def evidencePartition = data.getPartition("evidencePartition");

We can insert data into the DataStore either by inserting values manually or by loading them from a file.

To manually insert data, we define an inserter for the specified partition, and insert accordingly.

def insert = data.getInserter(Name, evidencePartition);

/* Social Network A */
insert.insert(1, "John Braker");
insert.insert(2, "Mr. Jack Ressing");
insert.insert(3, "Peter Larry Smith");
insert.insert(4, "Tim Barosso");
insert.insert(5, "Jessica Pannillo");
insert.insert(6, "Peter Smithsonian");
insert.insert(7, "Miranda Parker");

/* Social Network B */
insert.insert(11, "Johny Braker");
insert.insert(12, "Jack Ressing");
insert.insert(13, "PL S.");
insert.insert(14, "Tim Barosso");
insert.insert(15, "J. Panelo");
insert.insert(16, "Gustav Heinrich Gans");
insert.insert(17, "Otto v. Lautern");

To load data from a file, you simply call the InserterUtils class to load delimited data. Here we show how to load data for the Network and Knows predicates from delimited text files, sn_network.txt and sn_knows.txt.

def dir = 'data'+java.io.File.separator+'sn'+java.io.File.separator;

insert = data.getInserter(Network, evidencePartition)
InserterUtils.loadDelimitedData(insert, dir+"sn_network.txt");

insert = data.getInserter(Knows, evidencePartition)
InserterUtils.loadDelimitedData(insert, dir+"sn_knows.txt");

Model: Inference

Now that we have set up our model and data loading, we are ready to enable inference to predict the unknown values of our predicates.

Database Preparation

We start by defining a second partition, targetPartition, which holds the atoms whose values we want to predict. We then set up a database from three arguments, listed here in the order in which the call below passes them:

  • writePartition -- a partition that stores the knowledge you predict
  • toClose -- a set which indicates which predicates you want to close. Closing a predicate treats all of its atoms as observed and prevents prediction of those atom values
  • readPartition -- a partition that stores your ground knowledge

The syntax for this procedure is simple:

def targetPartition = data.getPartition("targetPartition");
Database db = data.getDatabase(targetPartition, [Network, Name, Knows] as Set, evidencePartition);

In order to make predictions, however, we must specify which atoms we want to predict (i.e., we must add such atoms to our targetPartition).

For this example, we add all combinations of user pairs by considering the UniqueIDs used by our data (as assigned in the Data Loading section above).

Set<GroundTerm> usersA = new HashSet<GroundTerm>();
Set<GroundTerm> usersB = new HashSet<GroundTerm>();
for (int i = 1; i < 8; i++)
    usersA.add(data.getUniqueID(i));
for (int i = 11; i < 18; i++)
    usersB.add(data.getUniqueID(i));

Map<Variable, Set<GroundTerm>> popMap = new HashMap<Variable, Set<GroundTerm>>();
popMap.put(new Variable("UserA"), usersA)
popMap.put(new Variable("UserB"), usersB)

DatabasePopulator dbPop = new DatabasePopulator(db);
dbPop.populate((SamePerson(UserA, UserB)).getFormula(), popMap);
dbPop.populate((SamePerson(UserB, UserA)).getFormula(), popMap);
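
To make the effect of the two populate calls concrete, the sketch below enumerates the same set of target atoms as plain strings: every pair across the two ID ranges, in both argument orders. It is a standalone illustration with hypothetical names and does not use DatabasePopulator.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: lists the SamePerson atoms that population would create.
// PopulationSketch is a hypothetical name, not a PSL class.
class PopulationSketch {

    /** Builds all cross-network pairs, in both directions, for two ID ranges of equal size. */
    static List<String> targetAtoms(int startA, int startB, int count) {
        List<String> atoms = new ArrayList<>();
        for (int a = startA; a < startA + count; a++) {
            for (int b = startB; b < startB + count; b++) {
                atoms.add("SamePerson(" + a + ", " + b + ")"); // UserA, UserB
                atoms.add("SamePerson(" + b + ", " + a + ")"); // UserB, UserA
            }
        }
        return atoms;
    }
}
```

For the IDs used here (1 to 7 and 11 to 17), targetAtoms(1, 11, 7) yields 7 × 7 × 2 = 98 atoms. Pairs within the same network, such as SamePerson(1, 2), are never created.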

Running Inference

Now that our database is prepared, we can run inference simply with the following call:

MPEInference inferenceApp = new MPEInference(m, db, config);
inferenceApp.mpeInference();
inferenceApp.close();

To view how our inference app performed, we print the results of our predictions by printing all atomic values of SamePerson in our database:

println "Inference results with hand-defined weights:"
DecimalFormat formatter = new DecimalFormat("#.##");
for (GroundAtom atom : Queries.getAllAtoms(db, SamePerson))
    println atom.toString() + "\t" + formatter.format(atom.getValue());

Model: Weight Learning

When we defined our model, we specified a predefined weight for each rule. It may be the case that we would rather learn an optimal weight for each rule. In order to do so, we must provide evidence data from which we can learn.

In our example, evidence would be the 'true' alignment of our social networks, which we can load into another partition.

Partition trueDataPartition = data.getPartition("trueDataPartition");
insert = data.getInserter(SamePerson, trueDataPartition)
InserterUtils.loadDelimitedDataTruth(insert, dir + "sn_align.txt");

where sn_align.txt stores delimited data with truth values (e.g., 1,11,1.0 which says that the value of the atom SamePerson(1,11)=1.0).
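
As a sketch of the row format just described, the following standalone parser splits a delimited line into its arguments and a trailing truth value, and rejects values outside [0,1]. The class name and the validation behavior are assumptions made for illustration; this is not how InserterUtils is implemented.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Illustrative only: parses one row such as "1,11,1.0".
// TruthLineSketch is a hypothetical name, not a PSL class.
class TruthLineSketch {
    final String[] arguments;
    final double truth;

    private TruthLineSketch(String[] arguments, double truth) {
        this.arguments = arguments;
        this.truth = truth;
    }

    /** Treats the last field as the truth value and all earlier fields as atom arguments. */
    static TruthLineSketch parse(String line, String delimiter) {
        String[] fields = line.trim().split(Pattern.quote(delimiter));
        if (fields.length < 2) {
            throw new IllegalArgumentException("Expected arguments followed by a truth value: " + line);
        }
        double truth = Double.parseDouble(fields[fields.length - 1].trim());
        if (truth < 0.0 || truth > 1.0) {
            throw new IllegalArgumentException("Truth value outside [0,1]: " + truth);
        }
        return new TruthLineSketch(Arrays.copyOf(fields, fields.length - 1), truth);
    }
}
```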

Once we have evidence available, we can run weight learning with a few short calls. First, we open a database with our true data as the readPartition, and specify which predicates possess values that are fully observed (in this case, only SamePerson).

Database trueDataDB = data.getDatabase(trueDataPartition, [SamePerson] as Set);

Then we call weight learning as follows:

MaxLikelihoodMPE weightLearning = new MaxLikelihoodMPE(m, db, trueDataDB, config);
weightLearning.learn();
weightLearning.close();

To see how our learning method did, we can view our weights by printing the model:

println "Learned model:"
println m

Model: Evaluation

To test out our learned weights, we want to follow the process of data loading and populating again to load in a new example.

//Data Loading
Partition evidencePartition2 = data.getPartition("evidencePartition2");

insert = data.getInserter(Network, evidencePartition2)
InserterUtils.loadDelimitedData(insert, dir+"sn2_network.txt");

insert = data.getInserter(Name, evidencePartition2);
InserterUtils.loadDelimitedData(insert, dir+"sn2_names.txt");

insert = data.getInserter(Knows, evidencePartition2);
InserterUtils.loadDelimitedData(insert, dir+"sn2_knows.txt");

//Populating
def targetPartition2 = data.getPartition("targetPartition2");
Database db2 = data.getDatabase(targetPartition2, [Network, Name, Knows] as Set, evidencePartition2);

usersA.clear();
for (int i = 21; i < 28; i++)
    usersA.add(data.getUniqueID(i));
usersB.clear();
for (int i = 31; i < 38; i++)
    usersB.add(data.getUniqueID(i));

dbPop = new DatabasePopulator(db2);
dbPop.populate((SamePerson(UserA, UserB)).getFormula(), popMap);
dbPop.populate((SamePerson(UserB, UserA)).getFormula(), popMap);

And then, we run inference and print our results:

inferenceApp = new MPEInference(m, db2, config);
result = inferenceApp.mpeInference();
inferenceApp.close();

println "Inference results on second social network with learned weights:"
for (GroundAtom atom : Queries.getAllAtoms(db2, SamePerson))
    println atom.toString() + "\t" + formatter.format(atom.getValue());

Builtin Similarity Functions

PSL comes with several builtin similarity functions. If you have a need not captured by these functions, then you can also create customized similarity functions.

Text Similarity

Name: Cosine Similarity
Qualified Path: edu.umd.cs.psl.ui.functions.textsimilarity.CosineSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://en.wikipedia.org/wiki/Cosine_similarity

Name: Dice Similarity
Qualified Path: edu.umd.cs.psl.ui.functions.textsimilarity.DiceSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient

Name: Jaccard Similarity
Qualified Path: edu.umd.cs.psl.ui.functions.textsimilarity.JaccardSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://en.wikipedia.org/wiki/Jaccard_index

Name: Jaro Similarity
Qualified Path: edu.umd.cs.psl.ui.functions.textsimilarity.JaroSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://www.census.gov/srd/papers/pdf/rr91-9.pdf

Name: Jaro-Winkler Similarity
Qualified Path: edu.umd.cs.psl.ui.functions.textsimilarity.JaroWinklerSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance

Name: Level 2 Jaro-Winkler Similarity
Qualified Path: edu.umd.cs.psl.ui.functions.textsimilarity.Level2JaroWinklerSimilarity
Arguments: String, String
Return Type: Continuous
Description: A level 2 variation of Jaro-Winkler Similarity. Level 2 means that tokens are broken up before comparison (see http://secondstring.sourceforge.net/javadoc/com/wcohen/ss/Level2.html).

Name: Level 2 Levenshtein Similarity
Qualified Path: edu.umd.cs.psl.ui.functions.textsimilarity.Level2LevenshteinSimilarity
Arguments: String, String
Return Type: Continuous
Description: A level 2 variation of Levenshtein Similarity. Level 2 means that tokens are broken up before comparison (see http://secondstring.sourceforge.net/javadoc/com/wcohen/ss/Level2.html).

Name: Level 2 Monge Elkan Similarity
Qualified Path: edu.umd.cs.psl.ui.functions.textsimilarity.Level2MongeElkanSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://www.aaai.org/Papers/KDD/1996/KDD96-044.pdf

Name: Levenshtein Similarity
Qualified Path: edu.umd.cs.psl.ui.functions.textsimilarity.LevenshteinSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://en.wikipedia.org/wiki/Levenshtein_distance

Name: Same Initials
Qualified Path: edu.umd.cs.psl.ui.functions.textsimilarity.SameInitials
Arguments: String, String
Return Type: Discrete
Description: First splits the input strings on any whitespace and ensures both have the same number of tokens (returns 0 if they do not). Then, the first characters of all the tokens are checked for equality (ignoring case and order of appearance). Note that all characters that are not alphabetic ASCII characters are considered equal (e.g., all numbers and Unicode characters are considered the same character).

Name: Same Number of Tokens
Qualified Path: edu.umd.cs.psl.ui.functions.textsimilarity.SameNumTokens
Arguments: String, String
Return Type: Discrete
Description: Checks same number of tokens (delimited by any whitespace).

Name: Sub String Similarity
Qualified Path: edu.umd.cs.psl.ui.functions.textsimilarity.SubStringSimilarity
Arguments: String, String
Return Type: Continuous
Description: If one input string is a substring of another, then the length of the substring divided by the length of the text is returned. 0 is returned if neither string is a substring of the other.
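
The two simplest descriptions above can be sketched directly from their wording. The code below is a standalone reading of the Sub String Similarity and Same Number of Tokens descriptions, with hypothetical names; edge cases such as empty strings are handled by assumption and may differ from the PSL classes.

```java
// Illustrative only: re-implements two descriptions from the list above.
// TextSimilaritySketch is a hypothetical name, not a PSL class.
class TextSimilaritySketch {

    /** Length ratio if the shorter string occurs inside the longer one, otherwise 0. */
    static double subStringSimilarity(String a, String b) {
        String shorter = a.length() <= b.length() ? a : b;
        String longer = a.length() <= b.length() ? b : a;
        if (longer.isEmpty() || !longer.contains(shorter)) {
            return 0.0;
        }
        return (double) shorter.length() / longer.length();
    }

    /** 1 if both strings have the same number of whitespace-delimited tokens, otherwise 0. */
    static double sameNumTokens(String a, String b) {
        return tokens(a).length == tokens(b).length ? 1.0 : 0.0;
    }

    private static String[] tokens(String s) {
        String trimmed = s.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }
}
```

For example, "Jack Ressing" (12 characters) is a substring of "Mr. Jack Ressing" (16 characters), so the sketch returns 12 / 16 = 0.75 for that pair.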


Change log

Version 1.2.1 (https://github.com/linqs/psl/tree/1.2.1)

  • Bug fix for External Function registration

Version 1.2 (https://github.com/linqs/psl/tree/1.2)

Version 1.1.1 (https://github.com/linqs/psl/tree/1.1.1)

  • Improved examples, which demonstrate database population for non-lazy inference and learning
  • Support for learning negative weights (limited to inference methods for discrete MRFs that support negative weights)
  • Bug fixes

Version 1.1 (https://github.com/linqs/psl/tree/1.1)

  • An improved Groovy interface. Try the new examples via https://github.com/linqs/psl/wiki/Installing-examples to learn the new interface.
  • New, improved psl-core architecture
  • Much faster inference based on the alternating direction method of multipliers (ADMM).
  • Improved max-likelihood weight learning
  • New max-pseudolikelihood and large-margin weight learning
  • Many bug fixes and minor improvements.

Version 1.0.2 (https://github.com/linqs/psl/tree/1.0.2)

  • Fixed bugs in HomogeneousIPM and MOSEK add-on caused by bug in parallel colt when using selections from large, sparse matrices.
  • Fixed bug when learning weights of programs which contain set functions.
  • Reduced memory footprint of HomogeneousIPM and matrices produced by ConicProgram.

Version 1.0.1 (https://github.com/linqs/psl/tree/1.0.1)

  • Fixed bug in optimization program when the same atom was used more than once in a ground rule or constraint.
  • Added release profile to parent POM for better packaging.
  • Minor changes to archetypes.

Version 1.0 (https://github.com/linqs/psl/tree/1.0)


Configuration

Many components of the PSL software have modifiable parameters and options, called properties. Every property has a key, which is a string that should uniquely identify it. These keys are organized into a namespace hierarchy, with each level separated by dots, e.g. <namespace>.<option>. Each PSL class can specify a namespace for the options used by the class and its subclasses. For example, the edu.umd.cs.psl.application.learning.weight.maxlikelihood.VotedPerceptron weight learning class specifies the namespace votedperceptron. Setting the configuration option votedperceptron.stepsize allows you to control the size of the gradient descent update step in the VotedPerceptron weight learning class.

Every property has a type and a default value, which is the value the object will use unless a user overrides it. Every class with properties documents them by declaring their keys as public static final Strings, with Javadoc describing the corresponding property's type and semantics. Another public static final member declares the default value for that property.

Bundles

Users of the PSL software can specify property values by grouping them into bundles, which are objects that implement the edu.umd.cs.psl.config.ConfigBundle interface. Every bundle has a name and a map from property keys to values. A configurable component takes a ConfigBundle as an argument in its constructor and queries it with a property key and a default value. If the bundle does not map the key to a value, it returns the provided default, e.g.

ConfigBundle cb;
stepsize = cb.getProperty('votedperceptron.stepsize', 0.1);

PSL components also pass their bundles to components that they create, so a user can group their property values into a single bundle, pass it into a component with which they interact, and the values will be used by the entire stack of components. Any properties that don't belong to a particular component will be ignored by that component.

The psl.properties file

PSL projects can specify different configuration bundles in a file named psl.properties on the classpath. The standard location for this file is <project root>/src/main/resources/psl.properties. Each key-value pair should be specified on its own line with a <bundle>.<key> = <value> format. The following example sets options for the example and test bundles.

# This is an example properties file for PSL.
# 
# Options are specified in a namespace hierarchy, with levels separated by '.'.
# The top levels are called bundles. Use the ConfigManager class to access them.

# Weight learning parameters
# Parameters for voted perceptron algorithm
# This property adaptively changes the step size of the updates
example.votedperceptron.schedule = true

# This property specifies the number of iterations of voted perceptron updates
example.votedperceptron.numsteps = 700

# This property specifies the initial step size of the voted perceptron updates
example.votedperceptron.stepsize = 0.1

# Parameters for the Hard-EM weight learning algorithm
# This property specifies the number of Hard-EM updates
test.em.iterations = 1000

# This property specifies the tolerance to check for convergence for Hard-EM
test.em.tolerance = 1e-5

The ConfigManager object

The standard way to create bundles is with an instance of the edu.umd.cs.psl.config.ConfigManager class. ConfigManager uses the Singleton pattern. The ConfigManager instance will read psl.properties to generate bundles. Then a bundle can be instantiated with the code

ConfigBundle bundle = ConfigManager.getManager().getBundle("example");
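
To make the <bundle>.<key> convention and the default-value lookup concrete, here is a standalone sketch built on java.util.Properties. It only mimics the behavior described above; the class name and the getDouble method are hypothetical and are not PSL's ConfigManager or ConfigBundle.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustrative only: looks up "<bundle>.<key>" entries and falls back to a default.
// BundleLookupSketch is a hypothetical name, not a PSL class.
class BundleLookupSketch {
    private final Properties all = new Properties();
    private final String bundle;

    /** Parses properties-file text and remembers which bundle prefix to use. */
    BundleLookupSketch(String bundle, String propertiesText) {
        this.bundle = bundle;
        try {
            all.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the bundle's value for the key, or the default if the bundle does not define it. */
    double getDouble(String key, double defaultValue) {
        String raw = all.getProperty(bundle + "." + key);
        return raw == null ? defaultValue : Double.parseDouble(raw.trim());
    }
}
```

With the example file above, the example bundle would return 0.1 for votedperceptron.stepsize, while a lookup of em.tolerance in the example bundle would fall back to the supplied default, because that key is only set for the test bundle.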

Creating a new project

After ensuring that the prerequisites are installed, execute the following command:

mvn archetype:generate -DarchetypeArtifactId=psl-archetype-groovy \
-DarchetypeRepository=https://scm.umiacs.umd.edu/maven/lccd/content/repositories/psl-releases/ \
-DarchetypeGroupId=edu.umd.cs -DarchetypeVersion=1.2.1

When prompted to accept the default property values, enter 'Y'.

You can replace the version number at the end with the PSL version you want to use. This page shows the different versions that have been released.

The Maven archetype plugin will then create a new project in which you can write PSL programs. The project will be configured to use the Maven project-management tool. You should be prompted for a group ID (a Maven project namespace, just like a Java package), an artifact ID (project name), and a version number for your project, as well as a name for the first Java package to create, which defaults to the specified group ID.

The PSL libraries will be downloaded automatically (if necessary) when you use Maven to compile and run this project.

The project will be set up with configuration files in

<project root>/src/main/resources

You can place Java and Groovy source files in

<project root>/src/main/java

A stub Groovy script will be created at

<project root>/src/main/java/<package path>/App.groovy

which you can run.

Tips and troubleshooting

  • The Windows shell (CMD.EXE) doesn't accept line continuations ('\'), so remove those and enter the command all on one line.

Database creation

To read in the truth values of ground atoms from text files, a DataStore object is required.

DataStore data = new RelationalDataStore(pslModel, entityid : 'string');
data.setup db : DatabaseDriver.H2, type : memory; //in-memory database
//data.setup db : DatabaseDriver.H2; //persistent database

In the code snippet above, the RelationalDataStore constructor takes a PSLModel object as its first argument. entityid : 'string' in the second argument indicates that the arguments of ground atoms can be any text value. If the second argument is omitted, all arguments of ground atoms must be integers (corresponding to IDs of the arguments). To store the database contents in RAM, use the type : memory expression. To store its contents on disk, simply omit the expression.

After a DataStore is created, we can read in the truth values of ground atoms from text files as follows:

insert = data.getInserter(<predicateName>)
insert.loadFromFile(<fileName>)

<predicateName> is the name of the predicate whose ground atoms are to be read, and <fileName> is the name of the file containing its ground atoms' truth values. If <predicateName> is of type PredicateTypes.BooleanTruth, then the file must contain tab-delimited rows, with each row corresponding to the arguments of true ground atoms. (The closed-world assumption is made, i.e., atoms not appearing in the file are assumed false.) If <predicateName> is of type PredicateTypes.SoftTruth, then the insert.loadFromFileWithTruth method must be used instead of insert.loadFromFile, and the last value of each row in the file must be a truth value in the range [0,1]. The default minimum truth value is 0.1. This can be changed by using PSLModel's setDefaultActivationParameter(double) method.
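
The closed-world assumption for BooleanTruth files can be pictured with the small standalone sketch below: rows that were loaded are true, and any atom that was never loaded is false. The class and method names are hypothetical and the sketch does not touch a DataStore.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: closed-world lookup over tab-delimited rows of true atoms.
// ClosedWorldSketch is a hypothetical name, not a PSL class.
class ClosedWorldSketch {
    private final Set<String> trueAtoms = new HashSet<>();

    /** Records one tab-delimited row, i.e. the arguments of a true ground atom. */
    void addRow(String tabDelimitedRow) {
        trueAtoms.add(tabDelimitedRow.trim());
    }

    /** Returns 1.0 for atoms that were loaded and 0.0 for everything else. */
    double truth(String... arguments) {
        return trueAtoms.contains(String.join("\t", arguments)) ? 1.0 : 0.0;
    }
}
```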

By default, the ground atoms and their truth values are read into partition 1 of the DataStore. The query predicates whose values are to be inferred should be read into another partition by specifying a partition ID as an argument: data.getInserter(<predicateName>, <partitionID>)

The following code snippet shows how to read in BooleanTruth and SoftTruth evidence ground atoms and SoftTruth query atoms.

for (Predicate p : [<predicateName1>, <predicateName2>, ...]) //BooleanTruth evidence predicates
{
  insert = data.getInserter(p);
  insert.loadFromFile(p.getName()+".txt");//<predicateName> atoms are stored in <predicateName>.txt
}

for (Predicate p2 : [<predName1>, <predName2>,...]) //SoftTruth evidence predicates
{
  insert = data.getInserter(p2);
  insert.loadFromFileWithTruth(p2.getName()+".txt");//note use of loadFromFileWithTruth
}

for (Predicate q : [<queryPred1>,<queryPred2>,...]) //SoftTruth query predicate
{

  insert = data.getInserter(q,2); //Partition 2 used to store query predicate ground atoms
  insert.loadFromFileWithTruth(q.getName()+".txt"); //note use of loadFromFileWithTruth
}

  • Note: For the latest version (1.2.1), use loadDelimitedData() and loadDelimitedDataTruth() instead of loadFromFile() and loadFromFileWithTruth(), respectively.

Developing PSL

Setting up your environment

Cloning the PSL repository

If you are already comfortable using Git and you don't want or need to push commits to GitHub, then you can just clone the PSL repository using the command below. Otherwise, this short primer on some Git essentials may be useful.

>> git clone https://github.com/linqs/psl.git

Building PSL from source

Change to the top-level directory of your working copy and run

>> mvn compile

You can install PSL to your local Maven repository by running

>> mvn install

Best practices

Git policies

If you're a member of the LINQS group, you may eventually need to release a new version of PSL. There are a number of steps involved in the process, which are detailed in the guide for Releasing a New Stable Version.


ER Example

A more sophisticated example of entity resolution on CiteSeer data is available here: https://github.com/linqs/er-example


Eclipse integration

Eclipse is an extensible, integrated development environment that can be used to develop PSL and PSL projects. The recommended way of using Eclipse with PSL is to use the Eclipse plugin for Maven to generate Eclipse project information for a PSL project and then import that project into Eclipse.

Prerequisites

Ensure that you have version 3.6 (Helios) or higher of Eclipse installed. Then, install the Groovy Eclipse plugin and the optional 1.8 version of the Groovy compiler, which is available when installing the plugin. The version 1.8 compiler is what Maven will use to compile the Groovy scripts, so builds done by either tool should be interchangeable. If you use an older version, Eclipse will probably recompile some files which then won't be compatible with the rest, and it won't run. (Cleaning and rebuilding everything should help.)

You might have to change the Groovy compiler version to 1.8.x in your Groovy compiler preferences (part of the Eclipse preferences).

You need to add a classpath variable in Eclipse to point to your local Maven repository. You can access the variables either from the main options or from the build-path editor for any project. Where you specify additional libs, make a new variable (there should be a button) with the name M2_REPO and the path to your repo (e.g., ~/.m2/repository). This can also be achieved automatically via the following Maven command:

mvn -Declipse.workspace=/path/to/workspace eclipse:configure-workspace

Generating and importing Eclipse metadata

In the top-level directory of your PSL project, run

>> mvn eclipse:eclipse

Then in Eclipse, go to File/Import/General/Existing Projects into Workspace. Select the top-level directory of your project. You probably don't want to copy it into the workspace, so uncheck that option.

Running programs

Be sure to run as a "Java application."

Tips

  • If you want to delete the Eclipse metadata for any reason, run
>> mvn eclipse:clean
  • If you want to generate metadata for a project that depends on another project you're developing with Eclipse (PSL or not), run
>> mvn eclipse:eclipse -Declipse.workspace=<path to Eclipse workspace>

The Eclipse plugin for Maven will look in the provided workspace for any projects that match dependencies declared in your project's POM file. Your project will be configured to depend on any such projects found as opposed to their respective installed jars. This way, changes to the sources of those dependencies will be seen by your project without reinstalling the dependencies. Note that this works even for dependencies that were imported but not copied into the workspace.

  • The m2eclipse Eclipse plugin is another option for developing PSL projects with Eclipse. It differs from the recommended method in that it is an Eclipse plugin designed to support Maven projects, as opposed to a Maven plugin designed to support Eclipse.

Example PSL Programs

Examples written for the Command Line interface

Examples written for the Groovy interface

Installing examples

  1. Basic Example
  2. Ontology Alignment Example
  3. External Functions Example
  4. ER Example

External functions example

After ensuring that the prerequisites and examples are installed, change to the directory containing the project of examples. Inside this project you will find the ExternalFunctionExample.groovy example. First, navigate to the parent directory of the ExternalFunctionExample.groovy example:

>> cd psl-example/src/main/java/edu/umd/cs/psl/example/external

Here, you will find ExternalFunctionExample.groovy. This example provides an instance of calling an external Java function from within the Groovy PSL syntax.


Fixing NegativeArraySizeException in H2

From Pigi Kouki:

I ran MPE inference with a relatively big dataset as input and got the following error during the mpeInference step:

Exception in thread "main" java.lang.RuntimeException: Error executing database query.
    at edu.umd.cs.psl.database.rdbms.RDBMSDatabase.executeQuery(RDBMSDatabase.java:612)
    at edu.umd.cs.psl.model.atom.PersistedAtomManager.executeQuery(PersistedAtomManager.java:108)
    at edu.umd.cs.psl.model.kernel.rule.AbstractRuleKernel.groundAll(AbstractRuleKernel.java:81)
    at edu.umd.cs.psl.application.util.Grounding.groundAll(Grounding.java:59)
    at edu.umd.cs.psl.application.util.Grounding.groundAll(Grounding.java:43)
    at edu.umd.cs.psl.application.inference.MPEInference.mpeInference(MPEInference.java:106)
    at edu.umd.cs.psl.application.inference.MPEInference$mpeInference.call(Unknown Source)
    ...  
Caused by: org.h2.jdbc.JdbcSQLException: General error: "java.lang.NegativeArraySizeException"; SQL statement:
SELECT DISTINCT t1.UniqueID_0 AS U1,t1.UniqueID_1 AS P1,t2.UniqueID_1 AS U2 FROM RATING_predicate t1, SIM_USERS_predicate t2 WHERE ((t1.parti$
    at org.h2.message.Message.getSQLException(Message.java:110)
    at org.h2.message.Message.convert(Message.java:287)
    at org.h2.message.Message.convert(Message.java:248)
    at org.h2.command.Command.executeQuery(Command.java:134)
    at org.h2.jdbc.JdbcStatement.executeQuery(JdbcStatement.java:76)
    at edu.umd.cs.psl.database.rdbms.RDBMSDatabase.executeQuery(RDBMSDatabase.java:575)
    ... 26 more
Caused by: java.lang.NegativeArraySizeException
    at org.h2.util.ValueHashMap.reset(ValueHashMap.java:51)
    at org.h2.util.ValueHashMap.rehash(ValueHashMap.java:58)
    at org.h2.util.HashBase.checkSizePut(HashBase.java:79)
    at org.h2.util.ValueHashMap.put(ValueHashMap.java:78)
    at org.h2.util.ValueHashMap.rehash(ValueHashMap.java:62)
    at org.h2.util.HashBase.checkSizePut(HashBase.java:79)
    at org.h2.util.ValueHashMap.put(ValueHashMap.java:78)
    at org.h2.util.ValueHashMap.rehash(ValueHashMap.java:62)
    at org.h2.util.HashBase.checkSizePut(HashBase.java:79)
    at org.h2.util.ValueHashMap.put(ValueHashMap.java:78)
    at org.h2.result.LocalResult.addRow(LocalResult.java:262)
    at org.h2.command.dml.Select.queryFlat(Select.java:499)
    at org.h2.command.dml.Select.queryWithoutCache(Select.java:558)
    at org.h2.command.dml.Query.query(Query.java:243)
    at org.h2.command.CommandContainer.query(CommandContainer.java:81)
    at org.h2.command.Command.executeQuery(Command.java:132)
    ... 28 more

I searched online and found the following useful link, which helped me solve the problem:

https://groups.google.com/forum/#!topic/h2-database/XeFtWY_vvBQ

So I needed to download the source code for h2-1.2.126-sources.jar (the version of h2 that PSL uses), change the line in HashBase.java file

maxSize = (int) (len * MAX_LOAD / 100L); 

to

maxSize = (int) (((long)len) * MAX_LOAD / 100L); 

and then build a new jar of this library and include it in the project.
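
To see why the extra cast matters, here is a minimal, self-contained Java sketch of the arithmetic. The class and method names are hypothetical, and MAX_LOAD = 90 is an assumed value chosen only for illustration; it is not copied from the H2 source.

```java
// Illustrative sketch only: names are hypothetical and MAX_LOAD = 90 is an assumed value.
final class HashSizeOverflowDemo {
    static final int MAX_LOAD = 90;

    // Mirrors the original expression: len * MAX_LOAD is evaluated in 32-bit int
    // arithmetic and can wrap around before the division by 100L promotes it to long.
    static int buggyMaxSize(int len) {
        return (int) (len * MAX_LOAD / 100L);
    }

    // Mirrors the patched expression: widening len to long first keeps the product exact.
    static int fixedMaxSize(int len) {
        return (int) (((long) len) * MAX_LOAD / 100L);
    }

    public static void main(String[] args) {
        int len = 1 << 25; // 33,554,432 entries
        System.out.println("buggy: " + buggyMaxSize(len));
        System.out.println("fixed: " + fixedMaxSize(len));
    }
}
```

Under these assumptions, a length of 2^25 makes the first method return a negative number, while the second returns 30198988. For small lengths both methods agree, which is why the problem only shows up on large result sets.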


Functions

A customized similarity function can be created by implementing the AttributeSimilarityFunction interface in a Groovy file. It must return a value in [0,1]. For example:

class MyStringSimilarity implements AttributeSimilarityFunction
{
  @Override
  public double similarity(String a, String b) { return a.equals(b)?1.0:0.0; }
}

A function comparing the similarity between two entities or text can then be declared as follows:

m.add function: <functionName> , implementation: new <SimilarityFunction>()

  • <functionName> is the name of the function, e.g., "sameName".
  • <SimilarityFunction> is the name of the class implementing the AttributeSimilarityFunction interface, e.g., MyStringSimilarity.

A function can be used in the same manner as a predicate in rules.
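
As a slightly richer illustration of a similarity that stays within [0,1], the following sketch normalizes the edit distance between two strings. It is written against a small stand-in interface so that it compiles on its own; SimilarityFunctionSketch and NormalizedEditSimilarity are hypothetical names, and a real PSL program would implement AttributeSimilarityFunction instead.

```java
// Stand-in for PSL's AttributeSimilarityFunction so that this sketch compiles on its own.
interface SimilarityFunctionSketch {
    double similarity(String a, String b);
}

// Hypothetical example: similarity = 1 - (edit distance / length of the longer string).
final class NormalizedEditSimilarity implements SimilarityFunctionSketch {
    @Override
    public double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0; // two empty strings are identical
        }
        // The edit distance never exceeds the longer length, so the result stays in [0,1].
        return 1.0 - (double) editDistance(a, b) / longest;
    }

    // Dynamic-programming Levenshtein distance that keeps only two rows in memory.
    private static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

Identical strings score 1.0 and strings with no characters in common at any position score 0.0, so the function can stand in wherever an exact-match similarity is too strict.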


Getting Started with CLI

Setup

PSL requires that you have Java installed.

The PSL jar file psl-cli-2.0-SNAPSHOT.jar already contains all the PSL libraries you need to run your PSL programs. Until we finalize our v2.0 release, you can find a current snapshot of this .jar file in our resources directory.

Running your first program

Let's first download the files for our example program, run it and see what it does!

In this program, we'll use information about known locations of some people and friendship networks between people to collectively infer where some other people live. This form of inference is called collective classification. We'll first run the program and see the output. We will be working from the command line so open up your shell or terminal.

Download the simple example

You can download the files needed for our simple first example program from Simple CLI Example Files. This will create a new PSLCLIFirstExample directory in your current directory.

Run your first PSL program

In your open command line shell, change to the new PSLCLIFirstExample directory that was created in your current directory. From there, run the following command:

java -jar psl-cli-2.0-SNAPSHOT.jar -infer -model simple_cc.psl -data simple_cc.data

You should now see output that looks like this (note that the order of the output lines may differ):

data:: loading:: ::starting
data:: loading:: ::done
model:: loading:: ::starting
Model:
{10.0} ( KNOWS(P1, P2) & LIVES(P1, L) ) >> LIVES(P2, L) {squared}
{10.0} ( KNOWS(P2, P1) & LIVES(P1, L) ) >> LIVES(P2, L) {squared}
{2.0} ~( LIVES(P, L) ) {squared}

model:: loading:: ::done
operation::infer ::starting
operation::infer inference:: ::starting
operation::infer inference:: ::done
LIVES(Alex, Maryland) = 0.9086212203617681
LIVES(Jay, Maryland) = 1.0
LIVES(Ben, Maryland) = 1.0
LIVES(Steve, Maryland) = 0.9086212203617681
PERSON(Steve) = 1.0
PERSON(Ben) = 1.0
PERSON(Jay) = 1.0
PERSON(Alex) = 1.0
KNOWS(Steve, Ben) = 1.0
KNOWS(Alex, Jay) = 1.0
KNOWS(Steve, Jay) = 1.0
KNOWS(Alex, Ben) = 1.0
LOCATION(Maryland) = 1.0
operation::infer ::done

What did it do?

Now that we've run our first program that performs collective classification to infer where some people live based on some known facts about living locations and friendship links, let's look at the steps that we went through to infer the unknown values: we defined the underlying model, provided data to the model, and ran inference to classify the unknown values.

Defining a Model

A model in PSL is a set of weighted logical rules.

The model is defined inside a text file with the .psl extension. We describe the collective location classification model in the file simple_cc.psl. Let's have a look at the rules that make up our model:

10: Knows(P1,P2) & Lives(P1,L) -> Lives(P2,L) ^2
10: Knows(P2,P1) & Lives(P1,L) -> Lives(P2,L) ^2
2: ~Lives(P,L) ^2

The model is expressing the intuition that people that know one another live in the same location. The integer values at the beginning of rules indicate the weight of the rule. Intuitively, this tells us the relative importance of satisfying this rule compared to the other rules. The ^2 at the end of the rules indicates that the hinge-loss functions based on groundings of these rules are squared, for a smoother tradeoff. For more details on hinge-loss functions and squared potentials, see the publications on our PSL webpage.
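
As a rough illustration of what a squared hinge-loss potential computes, the sketch below evaluates one grounding of the second rule using the Lukasiewicz relaxation commonly used to describe PSL. The class and method names are hypothetical, the candidate value 0.5 is made up, and the code is a simplification for intuition rather than PSL's implementation.

```java
// Simplified illustration of a (squared) hinge-loss penalty for one ground rule.
final class HingeLossSketch {
    // Lukasiewicz conjunction of two soft truth values in [0,1].
    static double conjunction(double a, double b) {
        return Math.max(0.0, a + b - 1.0);
    }

    // Distance to satisfaction of body -> head: zero when the head is at least as true as the body.
    static double distanceToSatisfaction(double body, double head) {
        return Math.max(0.0, body - head);
    }

    // Weighted penalty of one grounding; squared == true corresponds to the ^2 modifier.
    static double penalty(double weight, double body, double head, boolean squared) {
        double d = distanceToSatisfaction(body, head);
        return weight * (squared ? d * d : d);
    }

    public static void main(String[] args) {
        // Knows(Steve, Jay) = 1.0 and Lives(Jay, Maryland) = 1.0 are observed;
        // 0.5 is a hypothetical candidate value for Lives(Steve, Maryland).
        double body = conjunction(1.0, 1.0);
        System.out.println(penalty(10.0, body, 0.5, true)); // 10 * 0.5^2
    }
}
```

In this simplified picture, raising the candidate value for Lives(Steve, Maryland) toward 1.0 shrinks the penalty from this rule, while the weight-2 rule ~Lives(P,L) pulls in the opposite direction; inference looks for the values that balance all such penalties.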

Loading the Data

Logical rules consist of predicates. The names of the predicates used in our model and possible substitutions of these predicates with actual entities from our network are defined inside the file simple_cc.data. Let's have a look:

predicates:
  Person/1: closed
  Location/1: closed
  Knows/2: closed
  Lives/2: open

observations:
  Person : 
  - person_obs.txt
  - person_obs2.txt
  Location : location_obs.txt
  Knows : knows_obs.txt
  Lives : lives_obs.txt

targets: 
  Lives : lives_targets.txt

truth: 
  Lives : lives_truth.txt

In the predicates section, we list all the predicates that will be used in the logical rules that define the model. The keyword open indicates that we want to infer some substitutions of this predicate, while closed indicates that this predicate is fully observed, i.e., all substitutions of this predicate have known values and will behave as evidence for inference.

For our simple example, we fully observe the network of people that know each other and thus, knows is a closed predicate. We know living locations for some of the people in the network but wish to infer the others, making lives an open predicate.

In the observations section, for each predicate for which we have observations, we specify the name of the .txt file containing the observations. For example, knows_obs.txt and lives_obs.txt specify which people know each other and where some of these people live, respectively.

The targets section specifies a .txt file that, for each open predicate, lists all substitutions of that predicate that we wish to infer. In lives_targets.txt, we specify the people whose location we want to infer based on the knows network and the known locations of some of the people.

The truth section specifies a .txt file that provides a set of ground truth observations for each open predicate. Here, we give the actual values for the lives predicate for all the people in the network as training labels. We describe the general data loading scheme in more detail in the sections below.

Inferring the Missing Values

When we run the java -jar psl-cli-2.0-SNAPSHOT.jar -infer -model simple_cc.psl -data simple_cc.data command with the -infer flag, PSL's inference engine substitutes values from the data files into the logical rules of the collective location classification model and infers whether entities Steve and Alex live in Maryland.

Writing PSL Rules

To create a PSL model, you should define a set of weighted logical rules in a .psl file. Let's go over the basic logical syntax to write rules. Consider this very general rule form:

w: P(A,B) & Q(B,C) -> R(A,C) ^2

The first part of the rule, w, is an integer value that specifies the weight of the rule. In this example, P, Q and R are predicates. Logical rules consist of the rule "body" and the rule "head." The body of the rule appears before the ->, which denotes logical implication. The body can have one or more predicates joined together with &, which denotes logical conjunction. The head of the rule should be a single predicate. The predicates that appear in the body and head can be any combination of open and closed predicate types.

To see more examples of logically templated models in the command line interface, see the Command Line Interface Examples. For best practices, tips and tricks to design good, semantically meaningful models, see our Modeling Tips and Tricks.

Organizing your Data

In a .data file, you should first define your predicates: as shown in the above example. Use the open and closed keywords to characterize each predicate.

A closed predicate is a predicate whose values are always observed. For example, the knows predicate from the simple example is closed because we fully observe the entire network of people that know one another. On the other hand, an open predicate is a predicate where some values may be observed, but some values are missing and thus need to be inferred.

Then, as shown above, create your observations:, targets: and truth: sections, which list the names of the .txt files that specify the observed values for predicates, the values you want to infer for open predicates, and the observed ground truth values for open predicates.

For all predicates, all possible substitutions should be specified either in the target files or in the observation files. The observations files should contain the known values for all closed predicates and can contain some of the known values for the open predicates. The target files tell PSL which substitutions of the open predicates it needs to infer. Target files cannot be specified for closed predicates as they are fully observed.

The truth files provide training labels in order to learn the weights of the rules directly from data. This is similar to learning the weights of coefficients in a logistic regression model from training data. Weight learning is described below in greater detail.

Running Inference

Run inference with the general command:

java -jar psl-cli-2.0-SNAPSHOT.jar -infer -model [name of model file].psl -data [name of data file].data

When we run inference, the inferred values are printed to the screen as shown for our example above. If you want to write the outputs to a file and use the inferred values in various ways downstream, you can use:

java -jar psl-cli-2.0-SNAPSHOT.jar -infer -model [name of model file].psl -data [name of data file].data -output [directory to write output files]

Values for all predicates will be output as .csv files in the specified output directory.

With the inferred values, some downstream tasks that you can perform are:

  • if you have a gold standard set of labels, you can evaluate your model by computing standard metrics like accuracy, AUC, F1, etc.
  • you may want to use the predicted outputs of PSL as inputs for another model.
  • you may want to visualize the predicted values and use the outputs of PSL as inputs to a data visualization program.
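
As one example of the first downstream task, the sketch below computes accuracy by thresholding inferred truth values and comparing them with gold labels. It is a minimal illustration: the class name, the 0.5 threshold, and the gold labels are assumptions made for this example, and reading the values from the output .csv files is left out.

```java
// Minimal accuracy computation over inferred soft truth values.
final class AccuracySketch {
    // predicted[i] and gold[i] must refer to the same ground atom.
    // Both values are turned into true/false decisions by comparing with the threshold.
    static double accuracy(double[] predicted, double[] gold, double threshold) {
        if (predicted.length != gold.length) {
            throw new IllegalArgumentException("predicted and gold must have the same length");
        }
        if (predicted.length == 0) {
            return 0.0;
        }
        int correct = 0;
        for (int i = 0; i < predicted.length; i++) {
            boolean predictedTrue = predicted[i] >= threshold;
            boolean actuallyTrue = gold[i] >= threshold;
            if (predictedTrue == actuallyTrue) {
                correct++;
            }
        }
        return (double) correct / predicted.length;
    }

    public static void main(String[] args) {
        // Inferred values for LIVES(Alex, Maryland) and LIVES(Steve, Maryland) from the run above.
        double[] predicted = {0.9086212203617681, 0.9086212203617681};
        // Hypothetical gold labels, made up for this illustration.
        double[] gold = {1.0, 0.0};
        System.out.println(accuracy(predicted, gold, 0.5));
    }
}
```

With the made-up labels above, one of the two predictions matches, so the sketch reports an accuracy of 0.5.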

Learning Rule Weights

We see above that in our example, we explicitly stated the weights for each rule. Think of these weights as dictating the relative importance of each rule, just as the weights of logistic regression or SVM features. Instead of explicitly giving the weights, we can also learn the weights from training labels.

To perform weight learning instead of inference, use the command:

java -jar psl-cli-2.0-SNAPSHOT.jar -learn -model [name of model file].psl -data [name of data file].data

Running the weight learning command outputs a .psl model file with the learned weights and logical rules. You can use this produced model file for running inference with the learned model.

PSL provides gradient-descent based weight learning algorithms that treat the files specified in the truth: section of your .data file as the training labels.


Getting Started with Groovy

Setup

PSL requires that you have Java installed. To use the Groovy interface, you must also install Maven.

Looking at a basic example

Creating a new Groovy project

Groovy Syntax

Running a Groovy PSL program

Advanced Topics

Additional examples


Getting Started with Java

Application builders and advanced users can integrate PSL into their code as a library. Since the PSL codebase is organized as a Maven project, it is easiest to include PSL as a dependency via Maven.

Integrating PSL via Maven

The PSL codebase is organized as a Maven project with several subprojects. The subproject most likely of interest is psl-core, but stable versions of all the subprojects are published to the PSL Maven repository. Including a PSL subproject in your Maven project is easy. It requires two steps.

First, add psl-core (and any other subprojects) as dependencies to your pom.xml file:

<dependencies>
    ...
    <dependency>
        <groupId>edu.umd.cs</groupId>
        <artifactId>psl-core</artifactId>
        <version>1.2.1</version>
    </dependency>
    ...
</dependencies>

Second, specify the location of the PSL Maven repository in your pom.xml file, anywhere within the <project> </project> tags:

<repositories>
    <repository>
        <releases>
            <enabled>true</enabled>
            <updatePolicy>daily</updatePolicy>
            <checksumPolicy>fail</checksumPolicy>
        </releases>
        <id>psl-releases</id>
        <name>PSL Releases</name>
        <url>https://scm.umiacs.umd.edu/maven/lccd/content/repositories/psl-releases/</url>
        <layout>default</layout>
    </repository>
    <repository>
        <releases>
            <enabled>true</enabled>
            <updatePolicy>daily</updatePolicy>
            <checksumPolicy>fail</checksumPolicy>
        </releases>
        <id>psl-thirdparty</id>
        <name>PSL Third Party</name>
        <url>https://scm.umiacs.umd.edu/maven/lccd/content/repositories/psl-thirdparty/</url>
        <layout>default</layout>
    </repository>
</repositories>

Maven will now make the required PSL libraries and their dependencies available when compiling and running your project.

The PSL API

Advanced Topics


Glossary

The PSL software uses concepts from the PSL paper, and introduces new ones for advanced data management and machine learning. On this page, we define the commonly used terms and point out the corresponding classes in the codebase.

Please note that this page is organized conceptually, not alphabetically.

Preliminaries

Hinge-loss Markov random field: A factor graph defined over continuous variables in the [0,1] interval with (log) factors that are hinge-loss functions. Many classes in PSL work together to implement the functionality of HL-MRFs, but the class for storing collections of hinge-loss potentials, which define HL-MRFs, is GroundRuleStore.java.

Ground atom: A logical relationship corresponding to a random variable in an HL-MRF. For example, Friends("Steve", "Jay") is an alias for a specific random variable. Implemented in GroundAtom.java.

Random variable atom: A ground atom that is unobserved, i.e., no value is known for it. An HL-MRF assigns probability densities to assignments to random variable atoms. Implemented in RandomVariableAtom.java.

Observed atom: A ground atom that has an observed, immutable value. HL-MRFs are conditioned on observed atoms. Implemented in ObservedAtom.java.

Atom: A generalization of ground atoms that allows logical variables as placeholders for constant arguments. For example, Friends("Steve", A) is a placeholder for all the ground atoms that can be obtained by substituting constants for the logical variable A. Implemented in Atom.java.

Syntax

PSL Program: A set of rules, each of which is a template for hinge-loss potentials or hard linear constraints. When grounded over a base of ground atoms, a PSL program induces an HL-MRF conditioned on any specified observations. Implemented in Model.java.

Rule:

Logical rule:

Arithmetic rule:

Unweighted rule:

Weighted rule:

Data Management

Data Store: An entire data repository, such as a relational database management system (RDBMS). Implemented in DataStore.java.

Partition: A logical division of ground atoms in a data store. Implemented in Partition.java.

Database: A logical view of a data store, constructed by specifying a write partition and one or more read partitions of a data store. Implemented in Database.java.

Open Predicate: A predicate whose atoms can be random variable atoms, i.e., unobserved. The only time a ground atom will be loaded as a random variable atom is when it is stored in the database's write partition and its predicate is not specified as closed. Otherwise it will be loaded as an observed atom. Whether a predicate is open or closed is specific to each database.

Closed Predicate: A predicate whose atoms are always observed atoms. The only time a ground atom will be loaded as a random variable atom is when it is stored in the database's write partition and its predicate is not specified as closed. Otherwise it will be loaded as an observed atom. Whether a predicate is open or closed is specific to each database.
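
The loading rule shared by the two definitions above can be restated as a small decision function. This is only a restatement of the rule in code form; the class, enum, and parameter names are hypothetical and do not correspond to classes in the PSL codebase.

```java
// Restates the rule for how a ground atom is loaded from a database.
final class AtomLoadingSketch {
    enum AtomKind { RANDOM_VARIABLE, OBSERVED }

    // An atom is loaded as a random variable only if it is stored in the write partition
    // and its predicate has not been specified as closed for this database.
    static AtomKind classify(boolean inWritePartition, boolean predicateClosed) {
        return (inWritePartition && !predicateClosed) ? AtomKind.RANDOM_VARIABLE : AtomKind.OBSERVED;
    }

    public static void main(String[] args) {
        System.out.println(classify(true, false));  // RANDOM_VARIABLE
        System.out.println(classify(true, true));   // OBSERVED
        System.out.println(classify(false, false)); // OBSERVED
    }
}
```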

Machine Learning


Header Files Specification

At the top of a Groovy file, you must import all the relevant Java and Groovy packages that you wish to use. The list below is a standard set that may be helpful to import in your program:

import edu.umd.cs.psl.config.*;
import edu.umd.cs.psl.groovy.*;
import edu.umd.cs.psl.database.DataStore;
import edu.umd.cs.psl.database.rdbms.RDBMSDataStore;
import edu.umd.cs.psl.database.rdbms.driver.H2DatabaseDriver;
import edu.umd.cs.psl.database.rdbms.driver.H2DatabaseDriver.Type;
import edu.umd.cs.psl.model.atom.GroundAtom;
import edu.umd.cs.psl.model.atom.QueryAtom;
import edu.umd.cs.psl.model.predicate.Predicate;
import edu.umd.cs.psl.model.term.*;

Home

Welcome to the PSL software Wiki!

Getting Started with Probabilistic Soft Logic

To get started with PSL you can follow one of these guides:

  • Command Line Interface for New Users : If you are new to PSL we suggest that you start with our Command Line Interface (CLI), which allows you to write a complete model in a simple text file.
  • Groovy for Intermediate Users : If you are comfortable with Java/Groovy, and want to get your hands dirty with advanced modeling capabilities we recommend that you use our Groovy interface.
  • Java for Application Developers : If you plan on integrating PSL into your own applications, and will need direct access to the Java API, refer to this guide.

PSL requires Java, so before you start make sure that you have Java installed.

Before you get started you may want to learn more about PSL.

Learn More About PSL

PSL is a machine learning framework for building probabilistic models developed by the Statistical Relational Learning Group LINQS at the University of Maryland and the University of California Santa Cruz. PSL has produced state-of-the-art results in many areas spanning natural language processing, social-network analysis, and computer vision. The complete list of publications and projects is available on the PSL homepage. The homepage also has several videos to introduce you to PSL.

Resources

Examples

Migration Guide

We are improving PSL all the time, and now have two versions! If you are migrating from PSL 1.0 to 2.0 please refer to our Migration Guide.

Glossary

Developing PSL Guide


How to use the H2 Web Interface

If you use H2 as the backend database for PSL (as is done in the examples), it can be helpful to open up the resulting database and examine it for debugging purposes.

Prerequisites

You should set up your PSL program to use H2 on disk and note where it is stored. For example, if you create your DataStore using the following code:

DataStore data = new RDBMSDataStore(new H2DatabaseDriver(Type.Disk, "/home/steve/psl", true), config);

then PSL will create an H2 database in the file /home/steve/psl/psl.h2.db. Then, run your program so the resulting H2 database can be inspected.

Starting the H2 Web Server

You will need to use the H2 jar for your classpath. This is likely ~/.m2/repository/com/h2database/h2/1.2.126/h2-1.2.126.jar, but you will need to modify it if, for example, you're using a different version of H2. You start the H2 web server by running the following command:

>> java -cp ~/.m2/repository/com/h2database/h2/1.2.126/h2-1.2.126.jar org.h2.tools.Server

Using the H2 Web Server

Once you have started the web server, you can access it at http://localhost:8082. To log in, you should change the connection string to point to your H2 database file without .h2.db on the end. The username and password are both empty strings.
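
The connection string can be derived mechanically from the database file path. The helper below is a small sketch of that step, assuming the usual embedded-mode URL form jdbc:h2:<path>; the class and method names are hypothetical.

```java
// Derives the JDBC URL to enter in the H2 web console from the database file path.
final class H2UrlSketch {
    private static final String SUFFIX = ".h2.db";

    // Strips the .h2.db suffix (if present) and prefixes the embedded-mode scheme.
    static String jdbcUrlFor(String databaseFilePath) {
        String base = databaseFilePath.endsWith(SUFFIX)
                ? databaseFilePath.substring(0, databaseFilePath.length() - SUFFIX.length())
                : databaseFilePath;
        return "jdbc:h2:" + base;
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrlFor("/home/steve/psl/psl.h2.db")); // jdbc:h2:/home/steve/psl/psl
    }
}
```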


Install Java

Open up your terminal and type java -version. You should see something like:

java version "1.x.0_y".

If you see

java: command not found

please download and install Java.


Install Maven

PSL uses Maven to manage builds and dependencies. Users should install Maven 3.x. PSL is developed with Maven and PSL programs are created as Maven projects. See running Maven for help using Maven to build projects.


Installing examples

To set up the examples, change to the directory in which you want to create the project of examples.

Then execute the following command:

mvn archetype:generate -DarchetypeArtifactId=psl-archetype-example \
-DarchetypeRepository=https://scm.umiacs.umd.edu/maven/lccd/content/repositories/psl-releases/ \
-DarchetypeGroupId=edu.umd.cs -DarchetypeVersion=1.2.1

When prompted to accept the default property values, enter 'Y'.

You can replace the version number at the end with the PSL version you want to use.

The Maven archetype plugin will then create a new project of PSL examples. The project will be configured to use the Maven project-management tool. The PSL libraries will be downloaded automatically (if necessary) when you use Maven to compile and run this project.

You can now run the example PSL programs.

Tips and troubleshooting

  • The Windows shell (CMD.EXE) doesn't accept line continuations ('\'), so remove those and enter the command all on one line.

Logging

PSL uses SLF4J for logging. In the PSL Groovy program template, SLF4J is bound to Log4j 1.2. The Log4j configuration file is located at src/main/resources/log4j.properties. It should look something like this:

# Set root logger level to the designated level and its only appender to A1.
log4j.rootLogger=ERROR, A1

# A1 is set to be a ConsoleAppender.
log4j.appender.A1=org.apache.log4j.ConsoleAppender

# A1 uses PatternLayout.
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

The logging verbosity can be set by changing ERROR in the second line to a different level and recompiling. Options include OFF, WARN, DEBUG, and TRACE.


MOSEK add on

MOSEK is software for numeric optimization. PSL can use MOSEK as a conic program solver via a PSL add on.

Setting up the MOSEK add on

First, install MOSEK 6. In addition to a commercial version for which a 30-day trial is currently available, the makers of MOSEK also currently offer a free academic license. Users will need the "PTS" base system for using the linear distribution of the ConicReasoner and the "PTON" non-linear and conic extension to use the quadratic distribution. Both of these components are currently covered by the academic license.

After installing MOSEK, install the included mosek.jar file to your local Maven repository. (This file should be in <mosek-root>/6/tools/platform/<your-platform>/bin.)

mvn install:install-file -Dfile=<path-to-mosek.jar> -DgroupId=com.mosek \
    -DartifactId=mosek -Dversion=6.0 -Dpackaging=jar

Next, add the following dependency to your project's pom.xml file:

<dependencies>
    ...
    <dependency>
        <groupId>edu.umd.cs</groupId>
        <artifactId>psl-addon-mosek</artifactId>
        <version>YOUR-PSL-VERSION</version>
    </dependency>
    ...
</dependencies>

where YOUR-PSL-VERSION is replaced with your PSL version.

Finally, it might be necessary to rebuild your project.

Using the MOSEK add on

After installing the MOSEK add on, you can use it wherever a ConicProgramSolver is used. To use it for inference with a ConicReasoner, set the conicreasoner.conicprogramsolver configuration property to edu.umd.cs.psl.optimizer.conic.mosek.MOSEKFactory.

Further, MOSEK requires that two environment variables be set when running. The same bin directory where you found mosek.jar needs to be on the path for shared libraries. The environment variable MOSEKLM_LICENSE_FILE needs to be set to the path to your license file (usually <mosek-root>/6/licenses/mosek.lic).

In bash in Linux, this can be done with the commands

export LD_LIBRARY_PATH=<path_to_mosek_installation>/mosek/6/tools/platform/<platform>/bin
export MOSEKLM_LICENSE_FILE=<path_to_mosek_installation>/mosek/6/licenses/mosek.lic

On Mac OS X, instead set DYLD_LIBRARY_PATH to the directory containing the MOSEK binaries.


Migrating to PSL 2

Changes to data management

Changes to rule syntax

Changes to Java class names

Renames/Moves

  • edu.umd.cs.psl.model.argument.Variable -> edu.umd.cs.psl.model.term.Variable
  • edu.umd.cs.psl.model.argument.ArgumentType -> edu.umd.cs.psl.model.term.ConstantType
  • edu.umd.cs.psl.model.argument.GroundTerm -> edu.umd.cs.psl.model.term.Constant

Removals

  • edu.umd.cs.psl.model.argument

Model Creation

After importing all relevant Java and Groovy files, you create a PSL model to contain predicates and rules, to learn weights, and to perform inference, as follows:

PSLModel model = new PSLModel(this, <DataStore object>);


Old Home Page

Welcome to the wiki for the PSL software from the University of Maryland.

Probabilistic Soft Logic (PSL) is a machine learning framework for developing probabilistic models. PSL models are easy and fast: you can define them using a straightforward logical syntax and solve them with fast convex optimization. PSL has produced state-of-the-art results in many areas spanning natural language processing, social-network analysis, and computer vision.

Visit the getting started guide to use the PSL software.

FAQ

Table of Contents

Getting Started

Advanced Setup

Example Programs

Versioning

Groovy PSL

PSL Groovy syntax

Development


Ontology alignment example

After ensuring that the prerequisites and examples are installed, change to the directory containing the project of examples. Inside this project you will find the OntologyAlignment.groovy example. First, navigate to the parent directory of the OntologyAlignment.groovy example:

>> cd psl-example/src/main/java/edu/umd/cs/psl/example

Here, you will find OntologyAlignment.groovy. This example gives an instance of using PSL for the task of Ontology Alignment.


PSL Groovy Syntax


Predicate Declaration

Predicates are added to a PSLModel model (below) by using its add method.

model.add predicate: <predicateName>, types: [<argumentTypeOne>,...,<argumentTypeX>]

  • <predicateName> is the name of a predicate in quotes, e.g., "authorName"
  • <argumentTypeX> is the type of argument accepted by this predicate. Possible types include:
    • ConstantType.Double
    • ConstantType.Integer
    • ConstantType.Long
    • ConstantType.String
    • ConstantType.Date
    • ConstantType.UniqueID

An example of a declaration of a predicate that represents an author's name is:

model.add predicate: "authorName", types: [ConstantType.String]

An example of a predicate that represents a friendship between two people is:

model.add predicate: "Friends", types: [ConstantType.UniqueID, ConstantType.UniqueID]

To take a look at the relevant code look here.


Prior Declaration

The add method of a PSLModel model (below) is used to specify a rule.

m.add rule: ~<predicateName>, weight: <weight>;

We assume that most of the groundings of <predicateName> are false, and hence ~<predicateName> has positive weight.


Releasing a New Stable Version

This is a HOWTO on releasing a new stable PSL version. All first and second level headers are steps in the process, and should be followed sequentially. To make it easier to understand, a complete example of all steps is given in the new release checklist.

Preliminaries

Get the Code Ready

A release is a single commit that increments the software's version number to a stable version number and does nothing else. So, before you release a version, make sure all your changes are committed and pushed, and the code is in the state in which you want to release it.

Make sure the copyright notices are up to date.

Test the Code

Remember to test the code and double check it is ready for release. To complete a release build, you will need all dependencies used by PSL even if not used by your changes, for example the MOSEK add-on. Make sure there are no errors or bugs.

  1. Install the code. (Run mvn clean -P release and mvn install -P release.)
  2. Install and run the examples.

Find the New Version Number

Stable version numbers are of the format x.y or x.y.z, where

  • x = major version
  • y = minor version
  • z = patch version

The git branch the code is on (the working branch) should already have a version number in its pom.xml files of the form x.y.z-SNAPSHOT. Whatever x.y.z-SNAPSHOT is, the new version will be x.y.z. Note that the patch version is not written if it is 0. For example, version 1.1 is always written as 1.1, not 1.1.0. If the new version is just of the form x.y, ignore the ".z" in the instructions below.
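
The naming rule above can be sketched as a small helper that turns a snapshot version into the stable version to release. The class and method names are hypothetical; the release process itself does not depend on this code.

```java
// Derives the stable version string from a working-branch snapshot version.
final class StableVersionSketch {
    private static final String SNAPSHOT_SUFFIX = "-SNAPSHOT";

    static String stableVersion(String snapshotVersion) {
        if (!snapshotVersion.endsWith(SNAPSHOT_SUFFIX)) {
            throw new IllegalArgumentException("Not a snapshot version: " + snapshotVersion);
        }
        String version = snapshotVersion.substring(0, snapshotVersion.length() - SNAPSHOT_SUFFIX.length());
        String[] parts = version.split("\\.");
        // A patch version of 0 is not written, so 1.1.0 becomes 1.1.
        if (parts.length == 3 && parts[2].equals("0")) {
            return parts[0] + "." + parts[1];
        }
        return version;
    }

    public static void main(String[] args) {
        System.out.println(stableVersion("1.2.1-SNAPSHOT")); // 1.2.1
        System.out.println(stableVersion("1.3-SNAPSHOT"));   // 1.3
    }
}
```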

Create the Stable Release

Change the Version

The first step is to change the version number to the stable version number. Remember to perform the commit at the end of the instructions.

Tag the New Stable Version

Run the following two commands:

git tag -a x.y.z -m 'Version x.y.z'
git push origin x.y.z

Update Git Branches

There are two ways the branch structure of the Git repo can change because of a new stable version:

  1. The master branch might need to be updated
  2. The working branch might need to be deleted

Updating the Master Branch

The Master branch should always point to the commit of the highest stable version number, where x, y, and z are treated as separate orders of magnitude.

So, if the master branch points to version 1.2, then releasing 1.1.1 would not update the master branch, but releasing 1.2.1 or 1.3 would.
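
The comparison described above, with x, y, and z treated as separate orders of magnitude, can be sketched as follows. The class and method names are hypothetical, and a missing patch version is assumed to count as 0.

```java
// Compares stable versions component by component to decide whether master should move.
final class VersionOrderSketch {
    // Returns a negative, zero, or positive number as left is lower than, equal to, or higher than right.
    static int compare(String left, String right) {
        int[] a = parse(left);
        int[] b = parse(right);
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return 0;
    }

    // Parses x.y or x.y.z into three numbers; components that are not written default to 0.
    private static int[] parse(String version) {
        String[] parts = version.split("\\.");
        int[] numbers = new int[3];
        for (int i = 0; i < parts.length && i < 3; i++) {
            numbers[i] = Integer.parseInt(parts[i]);
        }
        return numbers;
    }

    // Master moves only when the new release is higher than the version it currently points to.
    static boolean shouldUpdateMaster(String masterVersion, String releasedVersion) {
        return compare(releasedVersion, masterVersion) > 0;
    }

    public static void main(String[] args) {
        System.out.println(shouldUpdateMaster("1.2", "1.1.1")); // false
        System.out.println(shouldUpdateMaster("1.2", "1.2.1")); // true
        System.out.println(shouldUpdateMaster("1.2", "1.3"));   // true
    }
}
```

Comparing numbers rather than strings matters once a component reaches two digits: 1.10 is higher than 1.9 even though it sorts lower as text.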

TODO: Update this after wiki restructuring. If you are updating the master branch, update the latest stable version number listed on the version page, the version changing page, the example installation page, and the new project page.

If you are updating the master branch, it should already be upstream of the new stable version. Substituting the working branch name for WORKING_BRANCH, simply run the following commands:

git checkout master
git pull origin WORKING_BRANCH
git push

Deleting the working branch

There should now be a working branch pointing to the tag "x.y.z" (and possibly the master branch). If the working branch is not the develop branch, it should probably be deleted (which deletes the branch name, not the commit itself). Don't delete the develop branch! Substituting the working branch name for WORKING_BRANCH, run the following commands:

git branch -d WORKING_BRANCH
git push origin :WORKING_BRANCH

Deploy New Stable Version

With the new stable version checked out, on a machine with file system access to the repository, in the top level directory of the project (the one with the PSL project pom.xml file, not any of the subprojects), run the following commands:

mvn clean -P release
mvn deploy -P release

Last Steps

Update Change Log

Update the change log with a list of the main changes since the most recent upstream stable version. For example, if releasing 1.0.2, list the main changes since 1.0.1, even if there is a more recent 1.1 release.

Announce New Release

Post an announcement on the user group. Remember to select the "make an announcement" option, rather than "start a discussion." Here is a template:

Subject: New Version: x.y.z

A new stable version of PSL, version x.y.z (https://github.com/linqs/psl/tree/x.y.z) is now available.

See "switching the PSL version your program uses" for instructions on changing your PSL projects to the new version.

In version x.y.z:
[A list of the main changes]

Rule Specification in Groovy

The add method of PSLModel model (below) is used to specify a rule.

model.add rule : ( B1(V1,V2) & B2(V3,V4) & ... & B5(V5,V6) ) >> H(V1,V3,V6), weight : <weight>

  • B1, B2, ..., B5, and H are predicate symbols.
  • V1, V2, ..., V6 are arguments of the predicates. Variables are in upper case. To specify constants as arguments, the is operator is used, e.g., ( B1(V1,V2) & V1.is("constant1") ) >> H(V1,V2). A literal can be negated if all its arguments appear in non-negated literals.
  • & is the logical and operator, and >> is the implication operator.
  • <weight> is a real number that is the weight of the rule. If the weight is to be learned from data, then the specified weight is ignored. Otherwise, it is used during inference.
  • To specify an infinite weight (i.e., a hard rule), use constraint : true instead of weight : <weight>.

Running a PSL program as a Markov Logic program

PSL includes implementations of Markov Logic inference algorithms. You can use them in your inference and learning applications by setting the following configuration options. Note that these implementations do not support all constraints allowed in PSL: the constraint set must decompose over atoms (i.e., each atom participates in at most one constraint). If your program's constraint set does not, they will throw exceptions.

Inference

MPEInference and LazyMPEInference can use MaxWalkSat (MPE inference) and MC-Sat (marginal inference) with the following configuration options. Marginal probabilities will be set as the atoms' truth values.

# Sets MPEInference to perform Markov Logic MPE inference
<bundle>.mpeinference.reasoner = edu.umd.cs.psl.reasoner.bool.BooleanMaxWalkSatFactory
# Sets MPEInference to perform Markov Logic marginal inference
<bundle>.mpeinference.reasoner = edu.umd.cs.psl.reasoner.bool.BooleanMCSatFactory

# Sets LazyMPEInference to perform Markov Logic MPE inference
<bundle>.lazympeinference.reasoner = edu.umd.cs.psl.reasoner.bool.BooleanMaxWalkSatFactory
# Sets LazyMPEInference to perform Markov Logic marginal inference
<bundle>.lazympeinference.reasoner = edu.umd.cs.psl.reasoner.bool.BooleanMCSatFactory

Weight Learning

Weight learning that uses a reasoner for MPE inference as a subroutine (e.g., MaxLikelihoodMPE, LazyMaxLikelihoodMPE) can also use Markov Logic MPE inference.

<bundle>.weightlearning.reasoner = edu.umd.cs.psl.reasoner.bool.BooleanMaxWalkSatFactory

MaxPseudoLikelihood also supports Markov Logic weight learning.

<bundle>.maxpseudolikelihood.bool = true

Running a program

To run a PSL program, change to the top-level directory of its project (the directory with the Maven pom.xml file).

Compile your project:

mvn compile

Now use Maven to generate a classpath for your project's dependencies:

mvn dependency:build-classpath -Dmdep.outputFile=classpath.out

You can now run a class with the command

java -cp ./target/classes:`cat classpath.out` <fully qualified class name>

where <fully qualified class name> is the full name (package and class) of the class you want to run (e.g., edu.umd.cs.example.BasicExample).

Tips and troubleshooting

  • The classpath for the dependencies will need to be regenerated to incorporate any new dependencies or dependencies in new locations (such as when dependency versions have been changed).
  • PSL and PSL projects are configured to use the Groovy-Eclipse compiler for Maven to compile Groovy scripts. (The reference to Eclipse in its name signifies that it is based on the same compiler used in Eclipse, not that Eclipse is required.) This compiler creates regular Java class files from your Groovy scripts. The main methods generated for these class files run the scripts. Hence, the java command is used to run a script.
  • Classes can also be run with the command
mvn exec:java -Dexec.mainClass=<fully qualified class name>

The advantages are that the project does not need to be compiled separately and the classpath does not need to be generated or updated separately. The disadvantages are that the class output is preceded and succeeded by Maven output, exception stack traces are not printed by default (add the -e switch), and Maven adds some overhead to execution (sometimes a significant amount, especially on less powerful machines).
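
The first tip above, regenerating the classpath when dependencies change, can be partly automated. The following is a rough sketch: the function name classpath_is_stale is hypothetical, and comparing file timestamps is only a heuristic. It detects edits to pom.xml, but not dependencies that have moved to new locations.

```shell
#!/bin/bash

# Hypothetical helper: succeeds if the classpath file is missing or older
# than the POM, i.e., when it should probably be regenerated.
# Arguments (both optional): path to the POM, path to the classpath file.
classpath_is_stale () {
    pom_file="${1:-pom.xml}"
    cp_file="${2:-classpath.out}"
    [ ! -f "$cp_file" ] || [ "$pom_file" -nt "$cp_file" ]
}
```

It could be used as: classpath_is_stale && mvn dependency:build-classpath -Dmdep.outputFile=classpath.out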


Switching the PSL version your program uses

To change the version of PSL your project uses, edit your project's pom.xml file. The POM will declare dependencies on one or more PSL artifacts, e.g.,

<dependencies>
    ...
    <dependency>
        <groupId>edu.umd.cs</groupId>
        <artifactId>psl-groovy</artifactId>
        <version>1.2.1</version>
    </dependency>
    ...
</dependencies>

Change the version element of each such dependency to a new version (all the same one) and rebuild.
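
To confirm that every PSL dependency ended up on the same version, the declared versions can be listed. The following is a minimal sketch, not a general XML parser: the function name psl_dependency_versions is hypothetical, and it assumes one XML element per line with groupId appearing before version inside each dependency, as in the example above.

```shell
#!/bin/bash

# Hypothetical helper: prints the distinct versions declared for
# dependencies whose groupId is edu.umd.cs in the given POM
# (default: pom.xml). Assumes one XML element per line.
psl_dependency_versions () {
    awk '
        /<dependency>/    { in_dep = 1 }
        /<\/dependency>/  { in_dep = 0; in_psl = 0 }
        in_dep && /<groupId>edu\.umd\.cs<\/groupId>/ { in_psl = 1 }
        in_psl && /<version>/ {
            # Strip everything except the text inside the version element.
            gsub(/.*<version>|<\/version>.*/, "")
            print
        }
    ' "${1:-pom.xml}" | sort -u
}
```

If the function prints more than one line, the PSL dependencies in the POM are not all on the same version.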


Updating the Copyright Notice

Before releasing a new stable version, it is good to make sure that PSL's copyright notices are up to date. A script for doing that is below:

#!/bin/bash

# THIS VERSION ONLY WORKS FOR THE MAC OSX VERSION OF SED

die () {
    echo >&2 "$@"
    exit 1
}

[ "$#" -eq 2 ] || die "Two arguments, old and new end years, required"

export LANG=C

find * -not -path '*/\.*' -type f -exec sed -i "" "s_ \* Copyright 2013-$1 The Regents of the University of California_ \* Copyright 2013-$2 The Regents of the University of California_g" {} \;

find . -not -path '*/\.*' -type f -exec sed -i "" "s_  - Copyright 2013-$1 The Regents of the University of California_  - Copyright 2013-$2 The Regents of the University of California_g" {} \;

sed -i '' "s_Copyright 2013-$1 The Regents of the University of California_Copyright 2013-$2 The Regents of the University of California_g" NOTICE

echo "Remember to check the results of this script before committing!"
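
The script above is tied to Mac OS X because BSD sed requires a separate backup-suffix argument after -i (here the empty string), while GNU sed does not accept one. A possible way to run the same substitutions on either platform is a small wrapper. This is a sketch only: the function name sed_inplace is hypothetical, and it assumes that only GNU-compatible sed implementations accept --version.

```shell
#!/bin/bash

# Hypothetical helper: in-place sed that works with both GNU and BSD sed.
# Assumption: only GNU-compatible sed accepts the --version option.
sed_inplace () {
    if sed --version >/dev/null 2>&1; then
        sed -i "$@"       # GNU sed: -i takes no separate argument
    else
        sed -i '' "$@"    # BSD sed: -i needs a (possibly empty) backup suffix
    fi
}
```

With this wrapper, a call such as sed -i "" "s_old_new_g" NOTICE would become sed_inplace "s_old_new_g" NOTICE.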

Updating the Version Number

This is a HOWTO on changing the version number in the PSL code base. In most, if not all, cases, this HOWTO should be followed as part of a larger one, such as Releasing a New Stable Version, not by itself.

Version Number Policy

A new version number should be applied as a new commit that does nothing else, so make sure you are working on a clean working copy with no uncommitted changes.

Version numbers consist of the following components:

  • x = major version
  • y = minor version
  • z = patch version

Your new version number should be of the form x.y.z (for a stable version) or x.y.z-SNAPSHOT (for an unstable version). Note that the patch version is not written if it is 0. For example, version 1.1 is always written as 1.1, not 1.1.0, and version 1.1-SNAPSHOT is always written as 1.1-SNAPSHOT, not 1.1.0-SNAPSHOT. If the new version is of the form x.y or x.y-SNAPSHOT, ignore the ".z" in the instructions below.
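
The format rules above can be checked mechanically before a version string is used anywhere. The following is a minimal sketch; the function name normalize_version is hypothetical and not part of the PSL code base.

```shell
#!/bin/bash

# Hypothetical helper: validates a version string and prints it in the
# form described above, dropping a patch version of 0.
normalize_version () {
    version="$1"
    suffix=""
    case "$version" in
        *-SNAPSHOT)
            suffix="-SNAPSHOT"
            version="${version%-SNAPSHOT}"
            ;;
    esac
    # Accept x.y or x.y.z, where each component is a non-negative integer.
    if ! echo "$version" | grep -Eq '^[0-9]+\.[0-9]+(\.[0-9]+)?$'; then
        echo >&2 "Not a valid version: $1"
        return 1
    fi
    # Drop a patch component of 0 (x.y.0 becomes x.y).
    case "$version" in
        *.*.0) version="${version%.0}" ;;
    esac
    echo "$version$suffix"
}
```

For example, normalize_version 1.1.0-SNAPSHOT prints 1.1-SNAPSHOT, while normalize_version 1.0.2 prints 1.0.2 unchanged.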

All the occurrences of a PSL version number should be kept in sync, i.e., have the same value for all occurrences in all pom.xml files and other resources across all modules. In addition, only one commit in the entire Git repository should have a particular stable version number.

Edit the code

Version numbers appear as a module's version in its pom.xml file, as well as the version of parents and dependencies. The following list is all the occurrences of the version number in the PSL code (relative to the root directory, in the develop branch):

  1. pom.xml (1x: version)
  2. psl-addon/pom.xml (2x: version and parent version)
  3. psl-addon/psl-addon-mosek/pom.xml (3x: version, parent version, and psl-core dependency version)
  4. psl-archetype/pom.xml (2x: version and parent version)
  5. psl-archetype/psl-archetype-example/pom.xml (2x: version and parent version)
  6. psl-archetype/psl-archetype-example/src/main/resources/META-INF/maven/archetype-metadata.xml (1x: property default value)
  7. psl-archetype/psl-archetype-example/src/main/resources/archetype-resources/pom.xml (1x: psl-groovy dependency version)
  8. psl-archetype/psl-archetype-groovy/pom.xml (2x: version and parent version)
  9. psl-archetype/psl-archetype-groovy/src/main/resources/archetype-resources/pom.xml (1x: psl-groovy dependency version)
  10. psl-cli/pom.xml (3x: version, parent version, and psl-core dependency version)
  11. psl-cli/src/main/scripts/psl.sh (1x: version environment variable)
  12. psl-core/pom.xml (2x: version and parent version)
  13. psl-groovy/pom.xml (3x: version, parent version, and psl-core dependency version)
  14. psl-parser/pom.xml (3x: version, parent version, and psl-core dependency version)

Total line changes: 27

Remember to check the diff statistics before proceeding.
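
The check of the diff statistics can also be made mechanical instead of being read by eye. The following is a sketch under assumptions: the function name shortstat_matches is hypothetical, and it expects the summary line in the usual git diff --shortstat wording ("N insertions(+), N deletions(-)").

```shell
#!/bin/bash

# Hypothetical helper: succeeds if a `git diff --shortstat` summary line
# reports exactly the expected number of insertions and of deletions.
# Arguments: expected count, summary line.
shortstat_matches () {
    expected="$1"
    summary="$2"
    case "$summary" in
        *" $expected insertions(+), $expected deletions(-)"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

It could be used as: shortstat_matches 27 "$(git diff --shortstat)" || echo "Unexpected number of changed lines"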

OSX Script for Changing Version Numbers in the Code

#!/bin/bash

# THIS VERSION ONLY WORKS FOR THE MAC OSX VERSION OF SED

die () {
    echo >&2 "$@"
    exit 1
}

[ "$#" -eq 2 ] || die "Two arguments, old and new versions, required"

sed -i "" "s_<version>$1\</version>_<version>$2</version>_g" pom.xml

sed -i "" "s_<version>$1\</version>_<version>$2</version>_g" psl-addon/pom.xml

sed -i "" "s_<version>$1\</version>_<version>$2</version>_g" psl-addon/psl-addon-mosek/pom.xml

sed -i "" "s_<version>$1\</version>_<version>$2</version>_g" psl-archetype/pom.xml

sed -i "" "s_<version>$1\</version>_<version>$2</version>_g" psl-archetype/psl-archetype-example/pom.xml

sed -i "" "s_<defaultValue>$1\</defaultValue>_<defaultValue>$2</defaultValue>_g" psl-archetype/psl-archetype-example/src/main/resources/META-INF/maven/archetype-metadata.xml

sed -i "" "s_<version>$1\</version>_<version>$2</version>_g" psl-archetype/psl-archetype-example/src/main/resources/archetype-resources/pom.xml

sed -i "" "s_<version>$1\</version>_<version>$2</version>_g" psl-archetype/psl-archetype-groovy/pom.xml

sed -i "" "s_<version>$1\</version>_<version>$2</version>_g" psl-archetype/psl-archetype-groovy/src/main/resources/archetype-resources/pom.xml

sed -i "" "s_<version>$1\</version>_<version>$2</version>_g" psl-cli/pom.xml

sed -i "" "s_export PSL\_VERSION=$1_export PSL\_VERSION=$2_g" psl-cli/src/main/scripts/psl.sh

sed -i "" "s_<version>$1\</version>_<version>$2</version>_g" psl-core/pom.xml

sed -i "" "s_<version>$1\</version>_<version>$2</version>_g" psl-groovy/pom.xml

sed -i "" "s_<version>$1\</version>_<version>$2</version>_g" psl-parser/pom.xml

git diff --shortstat

echo "Does the above say 27 lines added and deleted? IF NOT, SOMETHING WENT WRONG!"

Linux Script for Changing Version Numbers in the Code

#!/bin/bash

# THIS VERSION ONLY WORKS FOR THE LINUX VERSION OF SED

die () {
    echo >&2 "$@"
    exit 1
}

[ "$#" -eq 2 ] || die "Two arguments, old and new versions, required"

sed -i "s_<version>$1</version>_<version>$2</version>_g" pom.xml

sed -i "s_<version>$1</version>_<version>$2</version>_g" psl-addon/pom.xml

sed -i "s_<version>$1</version>_<version>$2</version>_g" psl-addon/psl-addon-mosek/pom.xml

sed -i "s_<version>$1</version>_<version>$2</version>_g" psl-archetype/pom.xml

sed -i "s_<version>$1</version>_<version>$2</version>_g" psl-archetype/psl-archetype-example/pom.xml

sed -i "s_<defaultValue>$1</defaultValue>_<defaultValue>$2</defaultValue>_g" psl-archetype/psl-archetype-example/src/main/resources/META-INF/maven/archetype-metadata.xml

sed -i "s_<version>$1</version>_<version>$2</version>_g" psl-archetype/psl-archetype-example/src/main/resources/archetype-resources/pom.xml

sed -i "s_<version>$1</version>_<version>$2</version>_g" psl-archetype/psl-archetype-groovy/pom.xml

sed -i "s_<version>$1</version>_<version>$2</version>_g" psl-archetype/psl-archetype-groovy/src/main/resources/archetype-resources/pom.xml

sed -i "s_<version>$1</version>_<version>$2</version>_g" psl-cli/pom.xml

sed -i "s_export PSL\_VERSION=$1_export PSL\_VERSION=$2_g" psl-cli/src/main/scripts/psl.sh

sed -i "s_<version>$1</version>_<version>$2</version>_g" psl-core/pom.xml

sed -i "s_<version>$1</version>_<version>$2</version>_g" psl-groovy/pom.xml

sed -i "s_<version>$1</version>_<version>$2</version>_g" psl-parser/pom.xml

git diff --shortstat

echo "Does the above say 27 lines added and deleted? IF NOT, SOMETHING WENT WRONG!"

Commit

Commit the changes with one of the following commit messages.

If you are changing to a stable version, use:

Version x.y.z

If you are changing to a new snapshot version, use:

Started x.y.z-SNAPSHOT

Push your commit when finished.
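
The choice between the two commit messages depends only on whether the version ends in -SNAPSHOT, so it can be derived from the version string. A minimal sketch, with the hypothetical function name commit_message_for:

```shell
#!/bin/bash

# Hypothetical helper: prints the commit message for a version change,
# following the two message forms given above.
commit_message_for () {
    case "$1" in
        *-SNAPSHOT) echo "Started $1" ;;
        *)          echo "Version $1" ;;
    esac
}
```

It could be used as: git commit -a -m "$(commit_message_for x.y.z)"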


Using Git

Getting started with Git

The Git website has information on installing Git, as do the GitHub guides mentioned below. This tutorial is helpful for learning how to use Git, and this tutorial is particularly helpful for SVN users.

Checking out branches which track remote branches

To use an existing branch in the remote repo on GitHub, create a local branch that tracks it. It can then be kept in sync via git pull. For example, to track the branch 'develop' (assuming the GitHub repo is named 'origin'), run

>> git branch --track develop origin/develop

then

>> git checkout develop

Preparing to push a commit to the PSL repository or a fork on GitHub

Create a free account on GitHub. Then follow one of the following sets of instructions to set up Git and GitHub:

You can fork the PSL repository, which means that you create a fork hosted on GitHub. You then clone that repository to a local machine, make commits, and, optionally, push some or all of those commits back to the repository on GitHub. Those commits are then publicly available (unless you have paid GitHub for private hosting).


Weight Learning

The job of a Weight Learning Application is to use data to learn the weights of each rule in a PSL model.

Syntax

In weight learning we follow the structure below:

<WeightLearningApplication> weightLearner = new <WeightLearningApplication>(<model>, <targetDatabase>, <groundTruthDatabase>, <config>)

  • <model> is the model specified by your PSL program.
  • <targetDatabase> is a database which contains all of the atoms for which you would like to infer values. When you create this database, the target predicate will be open.
  • <groundTruthDatabase> is a database which contains the known values of the atoms for which you are inferring values in the targetDatabase. When you create this database the predicates should be closed.
  • <config> is your config bundle.

Weight Learning Applications include:

  • MaxLikelihoodMPE
  • MaxPseudoLikelihood
  • MaxMargin

After weight learning, the learned PSLModel can be printed using println model.

To see the weight learning code, look here.