After ensuring that the prerequisites and examples are installed, change to the directory containing the examples project. Inside this project you will find the BasicExample.groovy example. First, navigate to the directory containing BasicExample.groovy:
>> cd psl-example/src/main/java/edu/umd/cs/example
Here you will find BasicExample.groovy. This file provides an example of using the Groovy PSL syntax for defining predicates and rules, loading predicate data, running basic inference, and learning rule weights.
The first portion of a PSL program creates a model and defines configuration parameters for that model.
We create a ConfigBundle which loads properties from the file: /src/main/resources/psl.properties
ConfigManager cm = ConfigManager.getManager()
ConfigBundle config = cm.getBundle("basic-example")
Now, we create a DataStore
to enable database functionality for our PSL program, and provide the specified configuration parameters.
def defaultPath = System.getProperty("java.io.tmpdir")
String dbpath = config.getString("dbpath", defaultPath + File.separator + "basic-example")
DataStore data = new RDBMSDataStore(new H2DatabaseDriver(Type.Disk, dbpath, true), config)
Finally, with our DataStore
created, we can create a PSL model:
PSLModel m = new PSLModel(this, data)
'BasicExample.groovy' defines four simple predicates:
m.add predicate: "Network", types: [ArgumentType.UniqueID, ArgumentType.UniqueID]
m.add predicate: "Name", types: [ArgumentType.UniqueID, ArgumentType.String]
m.add predicate: "Knows", types: [ArgumentType.UniqueID, ArgumentType.UniqueID]
m.add predicate: "SamePerson", types: [ArgumentType.UniqueID, ArgumentType.UniqueID]
See defining predicate types for more information.
In addition, BasicExample.groovy defines the function:
m.add function: "SameName" , implementation: new LevenshteinSimilarity()
See defining functions for more information. For the purposes of our example, SameName maps a pair of names to a similarity score in [0,1] based on their Levenshtein distance, returning 1 when the names are identical.
Next, we define ground terms (constants) to refer to two distinct social networks:
GroundTerm snA = data.getUniqueID(1)
GroundTerm snB = data.getUniqueID(2)
In addition to the above predicates and function, we also define the following rules which are written both with pseudo code and their corresponding PSL syntax below. For information on writing rules using PSL's syntax please see writing rules.
( Network(A, snA) AND Network(B, snB) AND Name(A, X) AND Name(B, Y) AND SameName(X, Y) ) THEN SamePerson(A, B)
m.add rule : ( Network(A, snA) & Network(B, snB) & Name(A,X) & Name(B,Y)
& SameName(X,Y) ) >> SamePerson(A,B), weight : 5
Similarly, another rule we might define utilizes the knowledge of the SamePerson and Knows predicates to infer other values of the SamePerson predicate:
( Network(A, snA) AND Network(B, snB) AND SamePerson(A, B) AND Knows(A, Friend1) AND Knows(B, Friend2) ) THEN SamePerson(Friend1, Friend2)
m.add rule : ( Network(A, snA) & Network(B, snB) & SamePerson(A,B) & Knows(A, Friend1)
& Knows(B, Friend2) ) >> SamePerson(Friend1, Friend2) , weight : 3.2
For more information on defining rules see defining rules.
After we define our rules, we then define constraints for our model.
m.add PredicateConstraint.PartialFunctional, on : SamePerson
m.add PredicateConstraint.PartialInverseFunctional, on : SamePerson
m.add PredicateConstraint.Symmetric, on : SamePerson
In this case, we require that each person be aligned to at most one person in the other social network. To do so, we define two partial functional constraints, where the latter is on the inverse. We also require SamePerson to be symmetric, i.e., SamePerson(p1, p2) == SamePerson(p2, p1).
Finally, we can also define a prior, which incorporates our assumption about predicate values into the model. For example, the prior,
m.add rule: ~SamePerson(A,B), weight: 1
suggests that the model should assume that two people are not generally the same person.
In order to load data, we must define a partition. We define the evidencePartition to store all of our knowledge about our predicates in the database.
def evidencePartition = data.getPartition("evidencePartition");
We can insert data into the data store by manually inserting values or by loading from a file.
To manually insert data, we define an inserter for the specified partition, and insert accordingly.
def insert = data.getInserter(Name, evidencePartition);
/* Social Network A */
insert.insert(1, "John Braker");
insert.insert(2, "Mr. Jack Ressing");
insert.insert(3, "Peter Larry Smith");
insert.insert(4, "Tim Barosso");
insert.insert(5, "Jessica Pannillo");
insert.insert(6, "Peter Smithsonian");
insert.insert(7, "Miranda Parker");
/* Social Network B */
insert.insert(11, "Johny Braker");
insert.insert(12, "Jack Ressing");
insert.insert(13, "PL S.");
insert.insert(14, "Tim Barosso");
insert.insert(15, "J. Panelo");
insert.insert(16, "Gustav Heinrich Gans");
insert.insert(17, "Otto v. Lautern");
To load data from a file, you simply call the InserterUtils class to load delimited data. Here we show how to load data for the Network and Knows predicates from the delimited text files sn_network.txt and sn_knows.txt.
def dir = 'data'+java.io.File.separator+'sn'+java.io.File.separator;
insert = data.getInserter(Network, evidencePartition)
InserterUtils.loadDelimitedData(insert, dir+"sn_network.txt");
insert = data.getInserter(Knows, evidencePartition)
InserterUtils.loadDelimitedData(insert, dir+"sn_knows.txt");
Now that we have set up our model and data loading, we are ready to enable inference to predict the unknown values of our predicates.
We start by defining a second partition, targetPartition, that holds the target atoms whose values we want to predict. We then set up a database that takes three arguments: the write partition (targetPartition), the set of predicates to close because they are fully observed (Network, Name, and Knows), and the read partition holding our evidence (evidencePartition).
The syntax for this procedure is simple:
def targetPartition = data.getPartition("targetPartition");
Database db = data.getDatabase(targetPartition, [Network, Name, Knows] as Set, evidencePartition);
In order to make predictions, however, we must specify which atoms we want to predict (i.e., we must add such atoms to our targetPartition).
For this example, we add all combinations of user pairs by considering the UniqueIDs used by our data (as assigned in the Data Loading section above).
Set<GroundTerm> usersA = new HashSet<GroundTerm>();
Set<GroundTerm> usersB = new HashSet<GroundTerm>();
for (int i = 1; i < 8; i++)
usersA.add(data.getUniqueID(i));
for (int i = 11; i < 18; i++)
usersB.add(data.getUniqueID(i));
Map<Variable, Set<GroundTerm>> popMap = new HashMap<Variable, Set<GroundTerm>>();
popMap.put(new Variable("UserA"), usersA)
popMap.put(new Variable("UserB"), usersB)
DatabasePopulator dbPop = new DatabasePopulator(db);
dbPop.populate((SamePerson(UserA, UserB)).getFormula(), popMap);
dbPop.populate((SamePerson(UserB, UserA)).getFormula(), popMap);
Now that our database is prepared, we can run inference simply with the following call:
MPEInference inferenceApp = new MPEInference(m, db, config);
inferenceApp.mpeInference();
inferenceApp.close();
To view how our inference app performed, we print the results of our predictions by printing all atomic values of SamePerson
in our database:
println "Inference results with hand-defined weights:"
DecimalFormat formatter = new DecimalFormat("#.##");
for (GroundAtom atom : Queries.getAllAtoms(db, SamePerson))
println atom.toString() + "\t" + formatter.format(atom.getValue());
When we defined our model, we specified a predefined weight for each rule. It may be the case that we would rather learn an optimal weight for each rule. In order to do so, we must provide evidence data from which we can learn. In our example, evidence would be the 'true' alignment of our social networks, which we can load into another partition.
Partition trueDataPartition = data.getPartition("trueDataPartition");
insert = data.getInserter(SamePerson, trueDataPartition)
InserterUtils.loadDelimitedDataTruth(insert, dir + "sn_align.txt");
where sn_align.txt stores delimited data with truth values (e.g., 1,11,1.0, which says that the value of the atom SamePerson(1,11) is 1.0).
Once we have evidence available, we can run weight learning with a few short calls. First, we open a database on our true data partition and specify which predicates possess values that are fully observed (in this case, only SamePerson).
Database trueDataDB = data.getDatabase(trueDataPartition, [SamePerson] as Set);
Then we call weight learning as follows:
MaxLikelihoodMPE weightLearning = new MaxLikelihoodMPE(m, db, trueDataDB, config);
weightLearning.learn();
weightLearning.close();
To see how our learning method did, we can view our weights by printing the model:
println "Learned model:"
println m
To test out our learned weights, we repeat the data loading and population process to load in a new example.
//Data Loading
Partition evidencePartition2 = data.getPartition("evidencePartition2");
insert = data.getInserter(Network, evidencePartition2)
InserterUtils.loadDelimitedData(insert, dir+"sn2_network.txt");
insert = data.getInserter(Name, evidencePartition2);
InserterUtils.loadDelimitedData(insert, dir+"sn2_names.txt");
insert = data.getInserter(Knows, evidencePartition2);
InserterUtils.loadDelimitedData(insert, dir+"sn2_knows.txt");
//Populating
def targetPartition2 = data.getPartition("targetPartition2");
Database db2 = data.getDatabase(targetPartition2, [Network, Name, Knows] as Set, evidencePartition2);
usersA.clear();
for (int i = 21; i < 28; i++)
usersA.add(data.getUniqueID(i));
usersB.clear();
for (int i = 31; i < 38; i++)
usersB.add(data.getUniqueID(i));
dbPop = new DatabasePopulator(db2);
dbPop.populate((SamePerson(UserA, UserB)).getFormula(), popMap);
dbPop.populate((SamePerson(UserB, UserA)).getFormula(), popMap);
And then, we run inference and print our results:
inferenceApp = new MPEInference(m, db2, config);
result = inferenceApp.mpeInference();
inferenceApp.close();
println "Inference results on second social network with learned weights:"
for (GroundAtom atom : Queries.getAllAtoms(db2, SamePerson))
println atom.toString() + "\t" + formatter.format(atom.getValue());
PSL comes with several builtin similarity functions. If you have a need not captured by these functions, then you can also create customized similarity functions.
Name: Cosine Similarity
Qualified Path: edu.umd.cs.psl.ui.functions.textsimilarity.CosineSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://en.wikipedia.org/wiki/Cosine_similarity
Name: Dice Similarity
Qualified Path: edu.umd.cs.psl.ui.functions.textsimilarity.DiceSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient
Name: Jaccard Similarity
Qualified Path: edu.umd.cs.psl.ui.functions.textsimilarity.JaccardSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://en.wikipedia.org/wiki/Jaccard_index
Name: Jaro Similarity
Qualified Path: edu.umd.cs.psl.ui.functions.textsimilarity.JaroSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://www.census.gov/srd/papers/pdf/rr91-9.pdf
Name: Jaro-Winkler Similarity
Qualified Path: edu.umd.cs.psl.ui.functions.textsimilarity.JaroWinklerSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
Name: Level 2 Jaro-Winkler Similarity
Qualified Path: edu.umd.cs.psl.ui.functions.textsimilarity.Level2JaroWinklerSimilarity
Arguments: String, String
Return Type: Continuous
Description: A level 2 variation of Jaro-Winkler Similarity. Level 2 means that tokens are broken up before comparison (see http://secondstring.sourceforge.net/javadoc/com/wcohen/ss/Level2.html).
Name: Level 2 Levenshtein Similarity
Qualified Path: edu.umd.cs.psl.ui.functions.textsimilarity.Level2LevenshteinSimilarity
Arguments: String, String
Return Type: Continuous
Description: A level 2 variation of Levenshtein Similarity. Level 2 means that tokens are broken up before comparison (see http://secondstring.sourceforge.net/javadoc/com/wcohen/ss/Level2.html).
Name: Level 2 Monge Elkan Similarity
Qualified Path: edu.umd.cs.psl.ui.functions.textsimilarity.Level2MongeElkanSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://www.aaai.org/Papers/KDD/1996/KDD96-044.pdf
Name: Levenshtein Similarity
Qualified Path: edu.umd.cs.psl.ui.functions.textsimilarity.LevenshteinSimilarity
Arguments: String, String
Return Type: Continuous
Description: https://en.wikipedia.org/wiki/Levenshtein_distance
Name: Same Initials
Qualified Path: edu.umd.cs.psl.ui.functions.textsimilarity.SameInitials
Arguments: String, String
Return Type: Discrete
Description: First splits the input strings on any whitespace and ensures both have the same number of tokens (returns 0 if they do not). Then the first characters of all the tokens are checked for equality (ignoring case and order of appearance). Note that all characters that are not alphabetic ASCII characters are considered equal (e.g., all numbers and Unicode characters are treated as the same character).
Name: Same Number of Tokens
Qualified Path: edu.umd.cs.psl.ui.functions.textsimilarity.SameNumTokens
Arguments: String, String
Return Type: Discrete
Description: Checks whether both strings have the same number of tokens (delimited by any whitespace).
Name: Sub String Similarity
Qualified Path: edu.umd.cs.psl.ui.functions.textsimilarity.SubStringSimilarity
Arguments: String, String
Return Type: Continuous
Description: If one input string is a substring of another, then the length of the substring divided by the length of the text is returned. 0 is returned if neither string is a substring of the other.
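Any of these built-in functions can be registered in a Groovy model the same way LevenshteinSimilarity is registered in the basic example above; for instance (the function name "SimilarName" is just an illustrative choice, and the corresponding textsimilarity package is assumed to be imported):
m.add function: "SimilarName", implementation: new JaccardSimilarity()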
Version 1.2.1 (https://github.com/linqs/psl/tree/1.2.1)
Version 1.2 (https://github.com/linqs/psl/tree/1.2)
Version 1.1.1 (https://github.com/linqs/psl/tree/1.1.1)
Version 1.1 (https://github.com/linqs/psl/tree/1.1)
Version 1.0.2 (https://github.com/linqs/psl/tree/1.0.2)
Version 1.0.1 (https://github.com/linqs/psl/tree/1.0.1)
Version 1.0 (https://github.com/linqs/psl/tree/1.0)
Many components of the PSL software have modifiable parameters and options, called properties. Every property has a key, which is a string that should uniquely identify it.
These keys are organized into a namespace hierarchy, with each level separated by dots, e.g. <namespace>.<option>
.
Each PSL class can specify a namespace for the options used by the class and its subclasses. For example, the edu.umd.cs.psl.application.learning.weight.maxlikelihood.VotedPerceptron
weight learning class specifies the namespace votedperceptron
. Setting the configuration option votedperceptron.stepsize
allows you to control the size of the gradient descent update step in the VotedPerceptron weight learning class.
Every property has a type and a default value, which is the value the object will use unless a user overrides it. Every class with properties documents them by declaring their keys as public static final Strings, with Javadoc describing the corresponding property's type and semantics. Another public static final member declares the default value for that property.
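As a rough sketch of that convention (the key and default below are illustrative, not copied from an actual PSL class):
/** Key for the step size of gradient descent updates (a positive double). */
public static final String STEP_SIZE_KEY = "votedperceptron.stepsize";
/** Default value used when STEP_SIZE_KEY is not set in the ConfigBundle. */
public static final double STEP_SIZE_DEFAULT = 1.0;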
Users of the PSL software can specify property values by grouping them into bundles, which are objects that implement the edu.umd.cs.psl.config.ConfigBundle
interface. Every bundle has a name and a map from property keys to values. A configurable component takes a ConfigBundle
as an argument in its constructor and queries it with a property key and a default value. If the bundle does not map the key to a value, it returns the provided default, e.g.
ConfigBundle cb;
stepsize = cb.getProperty('votedperceptron.stepsize', 0.1);
PSL components also pass their bundles to components that they create, so a user can group their property values into a single bundle, pass it into a component with which they interact, and the values will be used by the entire stack of components. Any properties that don't belong to a particular component will be ignored by that component.
PSL projects can specify different configuration bundles in a file named psl.properties on the classpath. The standard location for this file is <project root>/src/main/resources/psl.properties. Each key-value pair should be specified on its own line in the format <bundle>.<key> = <value>. The following example sets options for the example and test bundles.
# This is an example properties file for PSL.
#
# Options are specified in a namespace hierarchy, with levels separated by '.'.
# The top levels are called bundles. Use the ConfigManager class to access them.

# Weight learning parameters

# Parameters for voted perceptron algorithm

# This property adaptively changes the step size of the updates
example.votedperceptron.schedule = true
# This property specifies the number of iterations of voted perceptron updates
example.votedperceptron.numsteps = 700
# This property specifies the initial step size of the voted perceptron updates
example.votedperceptron.stepsize = 0.1

# Parameters for the Hard-EM weight learning algorithm

# This property specifies the number of Hard-EM updates
test.em.iterations = 1000
# This property specifies the tolerance to check for convergence for Hard-EM
test.em.tolerance = 1e-5
The standard way to create bundles is with an instance of the edu.umd.cs.psl.config.ConfigManager class. ConfigManager uses the Singleton pattern. The ConfigManager instance will read psl.properties to generate bundles. A bundle can then be instantiated with the code
ConfigBundle bundle = ConfigManager.getManager().getBundle("example");
After ensuring that the prerequisites are installed, execute the following command:
mvn archetype:generate -DarchetypeArtifactId=psl-archetype-groovy \
-DarchetypeRepository=https://scm.umiacs.umd.edu/maven/lccd/content/repositories/psl-releases/ \
-DarchetypeGroupId=edu.umd.cs -DarchetypeVersion=1.2.1
When prompted to accept the default property values, enter 'Y'.
You can replace the version number at the end with the PSL version you want to use. This page shows the different versions that have been released.
The Maven archetype plugin will then create a new project in which you can write PSL programs. The project will be configured to use the Maven project-management tool. You should be prompted for a group ID (a Maven project namespace, just like a Java package), an artifact ID (project name), and a version number for your project, as well as a name for the first Java package to create, which defaults to the specified group ID.
The PSL libraries will be downloaded automatically (if necessary) when you use Maven to compile and run this project.
The project will be set up with configuration files in <project root>/src/main/resources. You can place Java and Groovy source files in <project root>/src/main/java. A stub Groovy script that you can run will be created at <project root>/src/main/java/<package path>/App.groovy.
Tips and troubleshooting
To read in the truth values of ground atoms from text files, a DataStore object is required.
DataStore data = new RelationalDataStore(pslModel, entityid : 'string');
data.setup db : DatabaseDriver.H2, type : memory; //in-memory database
//data.setup db : DatabaseDriver.H2; //persistent database
In the code snippet above, the RelationalDataStore constructor takes a PSLModel object as its first argument. The entityid : 'string' in the second argument indicates that the arguments of ground atoms can be any text value. If the second argument is omitted, all arguments of ground atoms must be integers (corresponding to IDs of the arguments). To store the database contents in RAM, use the type : memory expression. To store its contents on disk, simply omit the expression.
After a DataStore
is created, we can read in the truth values of ground atoms from text files as follows:
insert = data.getInserter(<predicateName>)
insert.loadFromFile(<fileName>)
<predicateName> is the name of the predicate whose ground atoms are to be read, and <fileName> is the name of the file containing its ground atoms' truth values. If <predicateName> is of type PredicateTypes.BooleanTruth, then the file must contain tab-delimited rows, with each row corresponding to the arguments of a true ground atom. (The closed-world assumption is made, i.e., atoms not appearing in the file are assumed false.) If <predicateName> is of type PredicateTypes.SoftTruth, then the insert.loadFromFileWithTruth method must be used instead of insert.loadFromFile, and the last value of each row in the file must be a truth value in the range [0,1]. The default minimum truth value is 0.1. This can be changed by using PSLModel's setDefaultActivationParameter(double) method.
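For example, assuming a PSLModel named pslModel as in the snippet above, the call might look like:
pslModel.setDefaultActivationParameter(0.05); // use a lower minimum truth value than the default of 0.1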
By default, the ground atoms and their truth values are read into partition 1 of the DataStore. The query predicates whose values are to be inferred should be read into another partition by specifying a partition ID as an argument: data.getInserter(<predicateName>, <partitionID>)
The following code snippet shows how to read in BooleanTruth and SoftTruth evidence ground atoms and SoftTruth query atoms.
for (Predicate p : [<predicateName1>, <predicateName2>, ...]) //BooleanTruth evidence predicates
{
insert = data.getInserter(p);
insert.loadFromFile(p.getName()+".txt");//<predicateName> atoms are stored in <predicateName>.txt
}
for (Predicate p2 : [<predName1>, <predName2>,...]) //SoftTruth evidence predicates
{
insert = data.getInserter(p2);
insert.loadFromFileWithTruth(p2.getName()+".txt");//note use of loadFromFileWithTruth
}
for (Predicate q : [<queryPred1>,<queryPred2>,...]) //SoftTruth query predicate
{
insert = data.getInserter(q,2); //Partition 2 used to store query predicate ground atoms
insert.loadFromFileWithTruth(q.getName()+".txt"); //note use of loadFromFileWithTruth
}
Note that newer versions of PSL load delimited data through InserterUtils, using loadDelimitedData() and loadDelimitedDataTruth() instead of loadFromFile() and loadFromFileWithTruth(), respectively.
If you are already comfortable using Git and you don't want or need to push commits to GitHub, then you can just clone the PSL repository using the command below. Otherwise, this short primer on some Git essentials may be useful.
>> git clone https://github.com/linqs/psl.git
Change to the top-level directory of your working copy and run
>> mvn compile
You can install PSL to your local Maven repository by running
>> mvn install
If you're a member of the LINQS group, you may eventually need to release a new version of PSL. There are a number of steps involved in the process, which are detailed in the guide for Releasing a New Stable Version.
A more sophisticated example of entity resolution on CiteSeer data is available here: https://github.com/linqs/er-example
Eclipse is an extensible, integrated development environment that can be used to develop PSL and PSL projects. The recommended way of using Eclipse with PSL is to use the Eclipse plugin for Maven to generate Eclipse project information for a PSL project and then import that project into Eclipse.
Ensure that you have version 3.6 (Helios) or higher of Eclipse installed. Then, install the Groovy Eclipse plugin and the optional 1.8 version of the Groovy compiler, which is available when installing the plugin. The version 1.8 compiler is what Maven will use to compile the Groovy scripts, so builds done by either tool should be interchangeable. If you use an older version, Eclipse will probably recompile some files which then won't be compatible with the rest, and it won't run. (Cleaning and rebuilding everything should help.)
You might have to change the Groovy compiler version to 1.8.x in your Groovy compiler preferences (part of the Eclipse preferences).
You need to add a classpath variable in Eclipse that points to your local Maven repository. You can access the variables either from the main options or from the build-path editor for any project. Where you specify additional libs, make a new variable (there should be a button) with the name M2_REPO and the path to your repo (e.g., ~/.m2/repository). This can also be achieved automatically via the following Maven command:
mvn -Declipse.workspace=/path/to/workspace eclipse:configure-workspace
In the top-level directory of your PSL project, run
>> mvn eclipse:eclipse
Then in Eclipse, go to File/Import/General/Existing Projects into Workspace and import the generated project.
Be sure to run as a "Java application."
Tips
If you change your project's dependencies, you can regenerate the Eclipse project information by running
>> mvn eclipse:clean
>> mvn eclipse:eclipse -Declipse.workspace=<path to Eclipse workspace>
The Eclipse plugin for Maven will look in the provided workspace for any projects that match dependencies declared in your project's POM file. Your project will be configured to depend on any such projects found as opposed to their respective installed jars. This way, changes to the sources of those dependencies will be seen by your project without reinstalling the dependencies. Note that this works even for dependencies that were imported but not copied into the workspace.
After ensuring that the prerequisites and examples are installed, change to the directory containing the examples project. Inside this project you will find the ExternalFunctionExample.groovy example. First, navigate to the directory containing ExternalFunctionExample.groovy:
>> cd psl-example/src/main/java/edu/umd/cs/psl/example/external
Here you will find ExternalFunctionExample.groovy. This example provides an instance of calling an external Java function from within the Groovy PSL syntax.
From Pigi Kouki:
I run the MPE Inference with a relatively big dataset as input and I got the following error during the mpeInference step:
Exception in thread "main" java.lang.RuntimeException: Error executing database query.
at edu.umd.cs.psl.database.rdbms.RDBMSDatabase.executeQuery(RDBMSDatabase.java:612)
at edu.umd.cs.psl.model.atom.PersistedAtomManager.executeQuery(PersistedAtomManager.java:108)
at edu.umd.cs.psl.model.kernel.rule.AbstractRuleKernel.groundAll(AbstractRuleKernel.java:81)
at edu.umd.cs.psl.application.util.Grounding.groundAll(Grounding.java:59)
at edu.umd.cs.psl.application.util.Grounding.groundAll(Grounding.java:43)
at edu.umd.cs.psl.application.inference.MPEInference.mpeInference(MPEInference.java:106)
at edu.umd.cs.psl.application.inference.MPEInference$mpeInference.call(Unknown Source)
...
Caused by: org.h2.jdbc.JdbcSQLException: General error: "java.lang.NegativeArraySizeException"; SQL statement:
SELECT DISTINCT t1.UniqueID_0 AS U1,t1.UniqueID_1 AS P1,t2.UniqueID_1 AS U2 FROM RATING_predicate t1, SIM_USERS_predicate t2 WHERE ((t1.parti$
at org.h2.message.Message.getSQLException(Message.java:110)
at org.h2.message.Message.convert(Message.java:287)
at org.h2.message.Message.convert(Message.java:248)
at org.h2.command.Command.executeQuery(Command.java:134)
at org.h2.jdbc.JdbcStatement.executeQuery(JdbcStatement.java:76)
at edu.umd.cs.psl.database.rdbms.RDBMSDatabase.executeQuery(RDBMSDatabase.java:575)
... 26 more
Caused by: java.lang.NegativeArraySizeException
at org.h2.util.ValueHashMap.reset(ValueHashMap.java:51)
at org.h2.util.ValueHashMap.rehash(ValueHashMap.java:58)
at org.h2.util.HashBase.checkSizePut(HashBase.java:79)
at org.h2.util.ValueHashMap.put(ValueHashMap.java:78)
at org.h2.util.ValueHashMap.rehash(ValueHashMap.java:62)
at org.h2.util.HashBase.checkSizePut(HashBase.java:79)
at org.h2.util.ValueHashMap.put(ValueHashMap.java:78)
at org.h2.util.ValueHashMap.rehash(ValueHashMap.java:62)
at org.h2.util.HashBase.checkSizePut(HashBase.java:79)
at org.h2.util.ValueHashMap.put(ValueHashMap.java:78)
at org.h2.result.LocalResult.addRow(LocalResult.java:262)
at org.h2.command.dml.Select.queryFlat(Select.java:499)
at org.h2.command.dml.Select.queryWithoutCache(Select.java:558)
at org.h2.command.dml.Query.query(Query.java:243)
at org.h2.command.CommandContainer.query(CommandContainer.java:81)
at org.h2.command.Command.executeQuery(Command.java:132)
... 28 more
I searched online and found the following useful link that helped me solve the problem:
https://groups.google.com/forum/#!topic/h2-database/XeFtWY_vvBQ
So I needed to download the source code for h2-1.2.126-sources.jar (the version of H2 that PSL uses), change the line in the HashBase.java file from
maxSize = (int) (len * MAX_LOAD / 100L);
to
maxSize = (int) (((long)len) * MAX_LOAD / 100L);
and then create a new jar of this library and include it in the project.
A customized similarity function can be created by implementing the AttributeSimilarityFunction
interface in a Groovy file. It must return a value in [0,1]. For example:
class MyStringSimilarity implements AttributeSimilarityFunction
{
@Override
public double similarity(String a, String b) { return a.equals(b)?1.0:0.0; }
}
A function comparing the similarity between two entities or strings can then be declared as follows:
m.add function: <functionName> , implementation: new <SimilarityFunction>()
<functionName> is the name of the function, e.g., "sameName". <SimilarityFunction> is the name of the class implementing the AttributeSimilarityFunction interface, e.g., MyStringSimilarity. A function can be used in the same manner as a predicate in rules.
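For instance, reusing predicates from the basic example above, a sketch that registers the custom implementation under the name "SameText" (an illustrative choice) and then uses it like a predicate in a rule might look like this (the weight is arbitrary):
m.add function: "SameText", implementation: new MyStringSimilarity()
m.add rule : ( Name(A,X) & Name(B,Y) & SameText(X,Y) ) >> SamePerson(A,B), weight : 1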
PSL requires that you have Java installed.
The PSL jar file psl-cli-2.0-SNAPSHOT.jar already contains all the PSL libraries you need to run your PSL programs. You can find a current snapshot of this .jar file in our resources directory until we finalize our v2.0 release.
Let's first download the files for our example program, run it and see what it does!
In this program, we'll use information about known locations of some people and friendship networks between people to collectively infer where some other people live. This form of inference is called collective classification. We'll first run the program and see the output. We will be working from the command line so open up your shell or terminal.
You can download the files needed for our simple first example program from Simple CLI Example Files. This will create a new PSLCLIFirstExample directory in your current directory.
In your open command line shell, change directories to the new PSLCLIFirstExample directory. From there, run the following command:
java -jar psl-cli-2.0-SNAPSHOT.jar -infer -model simple_cc.psl -data simple_cc.data
You should now see output that looks like this (note that the order of the output lines may differ):
data:: loading:: ::starting
data:: loading:: ::done
model:: loading:: ::starting
Model:
{10.0} ( KNOWS(P1, P2) & LIVES(P1, L) ) >> LIVES(P2, L) {squared}
{10.0} ( KNOWS(P2, P1) & LIVES(P1, L) ) >> LIVES(P2, L) {squared}
{2.0} ~( LIVES(P, L) ) {squared}
model:: loading:: ::done
operation::infer ::starting
operation::infer inference:: ::starting
operation::infer inference:: ::done
LIVES(Alex, Maryland) = 0.9086212203617681
LIVES(Jay, Maryland) = 1.0
LIVES(Ben, Maryland) = 1.0
LIVES(Steve, Maryland) = 0.9086212203617681
PERSON(Steve) = 1.0
PERSON(Ben) = 1.0
PERSON(Jay) = 1.0
PERSON(Alex) = 1.0
KNOWS(Steve, Ben) = 1.0
KNOWS(Alex, Jay) = 1.0
KNOWS(Steve, Jay) = 1.0
KNOWS(Alex, Ben) = 1.0
LOCATION(Maryland) = 1.0
operation::infer ::done
Now that we've run our first program, which performs collective classification to infer where some people live based on known living locations and friendship links, let's understand the steps we went through to infer the unknown values: defining the underlying model, providing data to the model, and running inference to classify the unknown values.
A model in PSL is a set of weighted logical rules.
The model is defined inside a text file with the extension .psl. We describe the collective location classification model in the file simple_cc.psl. Let's have a look at the rules that make up our model:
10: Knows(P1,P2) & Lives(P1,L) -> Lives(P2,L) ^2
10: Knows(P2,P1) & Lives(P1,L) -> Lives(P2,L) ^2
2: ~Lives(P,L) ^2
The model expresses the intuition that people who know one another tend to live in the same location.
The values at the beginning of the rules indicate the weight of each rule. Intuitively, this tells us the relative importance of satisfying this rule compared to the other rules.
The ^2 at the end of the rules indicates that the hinge-loss functions based on groundings of these rules are squared, for a smoother tradeoff. For more details on hinge-loss functions and squared potentials, see the publications on our PSL webpage.
Logical rules consist of predicates. The names of the predicates used in our model and possible substitutions of these predicates with actual entities from our network are defined inside the file simple_cc.data
. Let's have a look:
predicates:
Person/1: closed
Location/1: closed
Knows/2: closed
Lives/2: open
observations:
Person :
- person_obs.txt
- person_obs2.txt
Location : location_obs.txt
Knows : knows_obs.txt
Lives : lives_obs.txt
targets:
Lives : lives_targets.txt
truth:
Lives : lives_truth.txt
In the predicates section, we list all the predicates that will be used in the logical rules that define the model. The keyword open indicates that we want to infer some substitutions of this predicate, while closed indicates that this predicate is fully observed, i.e., all substitutions of this predicate have known values and behave as evidence for inference.
For our simple example, we fully observe the network of people that know each other, and thus knows is a closed predicate. We know living locations for some of the people in the network but wish to infer the others, making lives an open predicate.
In the observations section, for each predicate for which we have observations, we specify the name of the .txt file containing the observations. For example, knows_obs.txt and lives_obs.txt specify which people know each other and where some of these people live, respectively.
The targets section specifies, for each open predicate, a .txt file that lists all substitutions of that predicate that we wish to infer. In lives_targets.txt, we specify the people whose locations we want to infer based on the knows network and the known locations of some of the people.
The truth section specifies, for each open predicate, a .txt file that provides a set of ground truth observations. Here, we give the actual values of the lives predicate for all the people in the network as training labels. We describe the general data loading scheme in more detail in the sections below.
When we run the java -jar psl-cli-2.0-SNAPSHOT.jar -infer -model simple_cc.psl -data simple_cc.data command with the -infer flag, PSL's inference engine substitutes values from the data files into the logical rules of the collective location classification model and infers whether the entities Steve and Alex live in Maryland.
To create a PSL model, you should define a set of weighted logical rules in a .psl
file. Let's go over the basic logical syntax to write rules. Consider this very general rule form:
w: P(A,B) & Q(B,C) -> R(A,C) ^2
The first part of the rule, w, is a non-negative value that specifies the weight of the rule. In this example, P, Q, and R are predicates. Logical rules consist of a rule "body" and a rule "head." The body of the rule appears before the ->, which denotes logical implication. The body can have one or more predicates conjoined with the & operator, which denotes logical conjunction. The head of the rule must be a single predicate. The predicates that appear in the body and head can be any combination of open and closed predicate types.
To see more examples of logically templated models in the command line interface, see the Command Line Interface Examples . For best practices, tips and tricks to design good, semantically meaningful models, see our Modeling Tips and Tricks .
In a .data file, you should first define your predicates, as shown in the above example. Use the open and closed keywords to characterize each predicate.
A closed predicate is a predicate whose values are always observed. For example, the knows predicate from the simple example is closed because we fully observe the entire network of people that know one another. On the other hand, an open predicate is a predicate where some values may be observed, but some values are missing and thus need to be inferred.
As shown above, you then create observations:, targets:, and truth: sections that list the names of .txt files specifying, respectively, the observed values for predicates, the values you want to infer for open predicates, and the observed ground truth values for open predicates.
For all predicates, all possible substitutions should be specified either in the target files or in the observation files. The observation files should contain the known values for all closed predicates and can contain some of the known values for the open predicates. The target files tell PSL which substitutions of the open predicates it needs to infer. Target files cannot be specified for closed predicates as they are fully observed.
The truth files provide training labels in order to learn the weights of the rules directly from data. This is similar to learning the coefficients of a logistic regression model from training data. Weight learning is described below in greater detail.
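As an illustration only (assuming tab-delimited rows matching the simple example; the exact entities and delimiter depend on your data), lives_truth.txt might contain lines such as:
Jay	Maryland	1.0
Alex	Maryland	1.0
Each row lists the arguments of a ground atom followed by its truth value in [0,1].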
Run inference with the general command:
java -jar psl-cli-2.0-SNAPSHOT.jar -infer -model [name of model file].psl -data [name of data file].data
When we run inference, the inferred values are outputted to the screen as shown for our example above. If you want to write the outputs to a file and use the inferred values in various ways downstream, you can use:
java -jar psl-cli-2.0-SNAPSHOT.jar -infer -model [name of model file].psl -data [name of data file].data -output [directory to write output files]
Values for all predicates will be output as .csv
files in the specified output directory.
With the inferred values, you can perform a variety of downstream tasks.
We see above that in our example, we explicitly stated the weights for each rule. Think of these weights as dictating the relative importance of each rule, just as the weights of logistic regression or SVM features. Instead of explicitly giving the weights, we can also learn the weights from training labels.
To perform weight learning instead of inference, use the command:
java -jar psl-cli-2.0-SNAPSHOT.jar -learn -model [name of model file].psl -data [name of data file].data
Running the weight learning command outputs a .psl model file with the learned weights and logical rules. You can use this produced model file for running inference with the learned model.
PSL provides gradient-descent-based weight learning algorithms that treat the files specified in the truth: section of your .data file as the training labels.
PSL requires that you have Java installed. To use the Groovy interface, you must also install Maven.
Application builders and advanced users can integrate PSL into their code as a library. Since the PSL codebase is organized as a Maven project, it is easiest to include PSL as a dependency via Maven.
The PSL codebase is organized as a Maven project with several subprojects. The subproject most likely of interest is psl-core, but stable versions of all the subprojects are published to the PSL Maven repository. Including a PSL subproject in your Maven project is easy and requires two steps.
First, add psl-core (and any other subprojects) as dependencies to your pom.xml file:
<dependencies>
...
<dependency>
<groupId>edu.umd.cs</groupId>
<artifactId>psl-core</artifactId>
<version>1.2.1</version>
</dependency>
...
</dependencies>
Second, specify the location of the PSL Maven repository in your pom.xml file, anywhere within the <project> </project> tags:
<repositories>
<repository>
<releases>
<enabled>true</enabled>
<updatePolicy>daily</updatePolicy>
<checksumPolicy>fail</checksumPolicy>
</releases>
<id>psl-releases</id>
<name>PSL Releases</name>
<url>https://scm.umiacs.umd.edu/maven/lccd/content/repositories/psl-releases/</url>
<layout>default</layout>
</repository>
<repository>
<releases>
<enabled>true</enabled>
<updatePolicy>daily</updatePolicy>
<checksumPolicy>fail</checksumPolicy>
</releases>
<id>psl-thirdparty</id>
<name>PSL Third Party</name>
<url>https://scm.umiacs.umd.edu/maven/lccd/content/repositories/psl-thirdparty/</url>
<layout>default</layout>
</repository>
</repositories>
Maven will now make the required PSL libraries and their dependencies available when compiling and running your project.
The PSL software uses concepts from the PSL paper , and introduces new ones for advanced data management and machine learning. On this page, we define the commonly used terms and point out the corresponding classes in the codebase.
Please note that this page is organized conceptually, not alphabetically.
Hinge-loss Markov random field: A factor graph defined over continuous variables in the [0,1] interval with (log) factors that are hinge-loss functions. Many classes in PSL work together to implement the functionality of HL-MRFs, but the class for storing collections of hinge-loss potentials, which define HL-MRFs, is GroundRuleStore.java .
Ground atom: A logical relationship corresponding to a random variable in a HL-MRF. For example, Friends("Steve", "Jay")
is an alias for a specific random variable. Implemented in GroundAtom.java .
Random variable atom: A ground atom that is unobserved, i.e., no value is known for it. A HL-MRF assigns probability densities to assignments to random variable atoms. Implemented in RandomVariableAtom.java .
Observed atom: A ground atom that has an observed, immutable value. HL-MRFs are conditioned on observed atoms. Implemented in ObservedAtom.java .
Atom: A generalization of ground atoms that allow logical variables as placeholders for constant arguments. For example, Friends("Steve", A)
is a placeholder for all the ground atoms that can be obtained by substituting constants for the logical variable A
. Implemented in Atom.java .
PSL Program: A set of rules, each of which is a template for hinge-loss potentials or hard linear constraints. When grounded over a base of ground atoms, a PSL program induces a HL-MRF conditioned on any specified observations. Implemented in Model.java .
Rule: A template for hinge-loss potentials or hard linear constraints. Grounding a rule over the atoms in a database produces the potentials and constraints of an HL-MRF.
Logical rule: A rule expressed with logical operators (conjunction, negation, and implication), as in the examples on this page.
Arithmetic rule: A rule expressed as a linear arithmetic relation over atoms.
Unweighted rule: A rule treated as a hard constraint that must be satisfied.
Weighted rule: A rule with a weight that templates hinge-loss potentials; violating it is penalized in proportion to the weight.
Data Store: An entire data repository, such as a relational database management system (RDBMS). Implemented in DataStore.java .
Partition: A logical division of ground atoms in a data store. Implemented in Partition.java .
Database: A logical view of a data store, constructed by specifying a write partition and one or more read partitions of a data store. Implemented in Database.java .
Open Predicate: A predicate whose atoms can be random variable atoms, i.e., unobserved. The only time a ground atom will be loaded as a random variable atom is when it is stored in the database's write partition and its predicate is not specified as closed. Otherwise it will be loaded as an observed atom. Whether a predicate is open or closed is specific to each database.
Closed Predicate: A predicate whose atoms are always observed atoms. The only time a ground atom will be loaded as a random variable atom is when it is stored in the database's write partition and its predicate is not specified as closed. Otherwise it will be loaded as an observed atom. Whether a predicate is open or closed is specific to each database.
At the top of a Groovy file, you must import all the relevant Java and Groovy packages that you wish to use. The list below is a standard set that may be helpful to import in your program
import edu.umd.cs.psl.config.*
import edu.umd.cs.psl.groovy.*;
import edu.umd.cs.psl.database.DataStore;
import edu.umd.cs.psl.database.rdbms.RDBMSDataStore;
import edu.umd.cs.psl.database.rdbms.driver.H2DatabaseDriver;
import edu.umd.cs.psl.database.rdbms.driver.H2DatabaseDriver.Type;
import edu.umd.cs.psl.model.atom.GroundAtom;
import edu.umd.cs.psl.model.atom.QueryAtom;
import edu.umd.cs.psl.model.predicate.Predicate;
import edu.umd.cs.psl.model.term.*;
Welcome to the PSL software Wiki!
To get started with PSL you can follow one of these guides:
PSL requires Java, so before you start make sure that you have Java installed.
Before you get started you may want to learn more about PSL.
PSL is a machine learning framework for building probabilistic models developed by the Statistical Relational Learning Group LINQS at the University of Maryland and the University of California Santa Cruz. PSL has produced state-of-the-art results in many areas spanning natural language processing, social-network analysis, and computer vision. The complete list of publications and projects is available on the PSL homepage . The homepage also has several videos , to introduce you to PSL.
We are improving PSL all the time, and now have two versions! If you are migrating from PSL 1.0 to 2.0 please refer to our Migration Guide.
If you use H2 as the backend database for PSL (as is done in the examples), it can be helpful to open up the resulting database and examine it for debugging purposes.
You should set up your PSL program to use H2 on disk and note where it is stored. For example, if you create your DataStore using the following code
DataStore data = new RDBMSDataStore(new H2DatabaseDriver(Type.Disk, "/home/steve/psl", true), config);
then PSL will create an H2 database in the file /home/steve/psl/psl.h2.db. Then, run your program so the resulting H2 database can be inspected.
You will need the H2 jar on your classpath. This is likely ~/.m2/repository/com/h2database/h2/1.2.126/h2-1.2.126.jar, but you will need to modify it if, for example, you're using a different version of H2. You start the H2 web server by running the following command:
>> java -cp ~/.m2/repository/com/h2database/h2/1.2.126/h2-1.2.126.jar org.h2.tools.Server
Once you have started the web server, you can access it at http://localhost:8082. To log in, you should change the connection string to point to your H2 database file without .h2.db on the end. The username and password are both empty strings.
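For the example path above, a plausible connection string (assuming H2's standard file-URL format) would be:
jdbc:h2:/home/steve/psl/psl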
Open up your terminal and type
java -version
You should see something like:
java version "1.x.0_y"
If you see
java: command not found
please download and install Java.
PSL uses Maven to manage builds and dependencies. Users should install Maven 3.x. PSL is developed with Maven and PSL programs are created as Maven projects. See running Maven for help using Maven to build projects.
To set up the examples, change to the directory in which you want to create the project of examples.
Then execute the following command:
mvn archetype:generate -DarchetypeArtifactId=psl-archetype-example \
-DarchetypeRepository=https://scm.umiacs.umd.edu/maven/lccd/content/repositories/psl-releases/ \
-DarchetypeGroupId=edu.umd.cs -DarchetypeVersion=1.2.1
When prompted to accept the default property values, enter 'Y'.
You can replace the version number at the end with the PSL version you want to use.
The Maven archetype plugin will then create a new project of PSL examples. The project will be configured to use the Maven project-management tool. The PSL libraries will be downloaded automatically (if necessary) when you use Maven to compile and run this project.
You can now run the example PSL programs.
Tips and troubleshooting
PSL uses SLF4J for logging. In the PSL Groovy program template, SLF4J is bound to Log4j 1.2. The Log4j configuration file is located at src/main/resources/log4j.properties
. It should look something like this:
# Set root logger level to the designated level and its only appender to A1.
log4j.rootLogger=ERROR, A1
# A1 is set to be a ConsoleAppender.
log4j.appender.A1=org.apache.log4j.ConsoleAppender
# A1 uses PatternLayout.
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
The logging verbosity can be set by changing ERROR in the second line to a different level and recompiling. Options include OFF, WARN, DEBUG, and TRACE.
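For example, to get much more detailed output, you could change that line to the following and recompile:
log4j.rootLogger=DEBUG, A1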
MOSEK is software for numeric optimization. PSL can use MOSEK as a conic program solver via a PSL add-on.
First, install MOSEK 6. In addition to a commercial version for which a 30-day trial is currently available, the makers of MOSEK also currently offer a free academic license. Users will need the "PTS" base system for using the linear distribution of the ConicReasoner and the "PTON" non-linear and conic extension to use the quadratic distribution. Both of these components are currently covered by the academic license.
After installing MOSEK, install the included mosek.jar file to your local Maven repository. (This file should be in <mosek-root>/6/tools/platform/<your-platform>/bin.)
mvn install:install-file -Dfile=<path-to-mosek.jar> -DgroupId=com.mosek \
-DartifactId=mosek -Dversion=6.0 -Dpackaging=jar
Next, add the following dependency to your project's pom.xml
file:
<dependencies>
...
<dependency>
<groupId>edu.umd.cs</groupId>
<artifactId>psl-addon-mosek</artifactId>
<version>YOUR-PSL-VERSION</version>
</dependency>
...
</dependencies>
where YOUR-PSL-VERSION
is replaced with your PSL version .
Finally, it might be necessary to rebuild your project.
After installing the MOSEK add-on, you can use it wherever a ConicProgramSolver is used. To use it for inference with a ConicReasoner, set the conicreasoner.conicprogramsolver configuration property to edu.umd.cs.psl.optimizer.conic.mosek.MOSEKFactory.
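In a psl.properties file, this would look like the following, where <bundle> is the name of your configuration bundle:
<bundle>.conicreasoner.conicprogramsolver = edu.umd.cs.psl.optimizer.conic.mosek.MOSEKFactory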
Further, MOSEK requires that two environment variables be set when running. The same bin directory where you found mosek.jar needs to be on the path for shared libraries, and the environment variable MOSEKLM_LICENSE_FILE needs to be set to the path to your license file (usually <mosek-root>/6/licenses/mosek.lic).
In bash in Linux, this can be done with the commands
export LD_LIBRARY_PATH=<path_to_mosek_installation>/mosek/6/tools/platform/<platform>/bin
export MOSEKLM_LICENSE_FILE=<path_to_mosek_installation>/mosek/6/licenses/mosek.lic
On Mac OS X, instead set DYLD_LIBRARY_PATH
to the directory containing the MOSEK binaries.
After importing all relevant Java and Groovy files, you create a PSL model to contain predicates and rules, to learn weights, and to perform inference, as follows: PSLModel model = new PSLModel(this, <DataStore object>);
Welcome to the wiki for the PSL software from the University of Maryland.
Probabilistic Soft Logic (PSL) is a machine learning framework for developing probabilistic models. PSL models are easy and fast: you can define them using a straightforward logical syntax and solve them with fast convex optimization. PSL has produced state-of-the-art results in many areas spanning natural language processing, social-network analysis, and computer vision.
Visit the getting started guide to use the PSL software.
After ensuring that the prerequisites and examples are installed, change to the directory containing the examples project. Inside this project you will find the OntologyAlignment.groovy example. First, navigate to the directory containing OntologyAlignment.groovy:
>> cd psl-example/src/main/java/edu/umd/cs/psl/example
Here you will find OntologyAlignment.groovy. This example gives an instance of using PSL for the task of Ontology Alignment.
Predicates are added to a PSLModel model (below) by using its add method.
model.add predicate: <predicateName>, types: [<argumentTypeOne>,...,<argumentTypeX>]
<predicateName> is the name of a predicate in quotes, e.g., "authorName". <argumentTypeX> is the type of argument accepted by this predicate. Possible types include: ConstantType.Double, ConstantType.Integer, ConstantType.Long, ConstantType.String, and ConstantType.Date.
An example of a declaration of a predicate that represents an author's name is:
model.add predicate: "authorName", types: [ConstantType.String]
An example of a predicate that represents a friendship between two people is:
model.add predicate: "Friends", types: [ConstantType.UniqueID, ConstantType.UniqueID]
To take a look at the relevant code look here.
The add method of a PSLModel model (below) is used to specify a rule.
m.add rule: ~<predicateName>, weight: <weight>;
We assume that most of the groundings of <predicateName> are false, and hence ~<predicateName> has a positive weight.
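For example, using the Friends predicate declared earlier, a negative prior on friendships could be written as follows (the weight here is arbitrary):
m.add rule: ~Friends(A,B), weight: 2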
This a HOWTO on releasing a new stable PSL version. All first and second level headers are steps in the process, and should be followed sequentially. To make it easier to understand, a complete example of all steps is given in new release checklist.
A release is a single commit that increments the software's version number to a stable version number and does nothing else. So, before you release a version, make sure all your changes are committed and pushed, and the code is in the state in which you want to release it.
Make sure the copyright notices are up to date.
Remember to test the code and double check it is ready for release. To complete a release build, you will need all dependencies used by PSL even if not used by your changes, for example the MOSEK add-on . Make sure there are no errors or bugs.
(To test the release build, run mvn clean -P release and mvn install -P release.)
Stable version numbers are of the format x.y or x.y.z, where x, y, and z are the major, minor, and patch versions, respectively.
The git branch the code is on (the working branch) should already have a version number in its pom.xml files of the form x.y.z-SNAPSHOT. Whatever x.y.z-SNAPSHOT is, the new version will be x.y.z. Note that the patch version is not written if it is 0. For example, version 1.1 is always written as x.y, not x.y.z. If the new version is just of the form x.y, ignore the ".z" in the instructions below.
The first step is to change the version number to the stable version number. Remember to perform the commit at the end of the instructions.
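Concretely, the version element in each pom.xml changes from the snapshot to the stable number; for example (the numbers here are illustrative):
<version>1.2.1-SNAPSHOT</version>   <!-- before the release commit -->
<version>1.2.1</version>            <!-- after the release commit -->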
Run the following two commands:
git tag -a x.y.z -m 'Version x.y.z'
git push origin x.y.z
There are two ways the branch structure of the Git repo can change because of a new stable version:
The Master branch should always point to the commit of the highest stable version number, where x, y, and z are treated as separate orders of magnitude.
So, if the master branch points to version 1.2, then releasing 1.1.1 would not update the master branch, but releasing 1.2.1 or 1.3 would.
TODO: Update this after wiki restructuring. If you are updating the master branch, update the latest stable version number listed on the version page , the version changing page , the example installation page , and the new project page .
If you are updating the master branch, it should already be upstream of the new stable version. Substituting the working branch name for WORKING_BRANCH, simply run the following commands:
git checkout master
git pull origin WORKING_BRANCH
git push
There should now be a working branch pointing to the tag "x.y.z" (and possibly the master branch). If the working branch is not the develop branch, it should probably be deleted (which deletes the branch name, not the commit itself). Don't delete the develop branch! Substituting the working branch name for WORKING_BRANCH, run the following commands:
git branch -d WORKING_BRANCH
git push origin :WORKING_BRANCH
With the new stable version checked out, on a machine with file system access to the repository, in the top level directory of the project (the one with the PSL project pom.xml file, not any of the subprojects), run the following commands:
mvn clean -P release
mvn deploy -P release
Update the change log with a list of the main changes since the most recent upstream stable version. For example, if releasing 1.0.2, list the main changes since 1.0.1, even if there is a more recent 1.1 release.
Post an announcement on the user group . Remember to select the "make an announcement" option, rather than "start a discussion." Here is a template:
Subject: New Version: x.y.z
A new stable version of PSL, version x.y.z (https://github.com/linqs/psl/tree/x.y.z) is now available.
See [switching the PSL version your program uses](switching the PSL version your program uses) for instructions on changing your PSL projects to the new version.
In version x.y.z:
[A list of the main changes]
The add method of a PSLModel model (below) is used to specify a rule.
model.add rule : ( B1(V1,V2) & B2(V3,V4) & ... & B5(V5,V6) ) >> H(V1,V3,V6), weight : <weight>
B1,B2,...B5,
and H
are predicate symbolsV1,V2,...,V6
are arguments of the predicates. Variables are in upper case. To specify constants as arguments, the is
operator is used, e.g., ( B1(V1,V2) & V1.is("constant1") ) >> H(V1,V2)
. A literal can be negated if all its arguments appear in non-negated literals.&
is the logical and operator, and >>
is the implication operator<weight>
is a real-number that is the weight of the rule. If the weight is to be learned from data, then the specified weight is ignored. Otherwise, it is used during inference.constraint : true
instead of weight : <weight>
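For illustration only (the Smokes and Friends predicates below are made up and do not appear elsewhere in this documentation), the same add call can attach either a weight or constraint : true:
// Hypothetical predicates, for illustration only.
model.add predicate: "Smokes", types: [ArgumentType.UniqueID]
model.add predicate: "Friends", types: [ArgumentType.UniqueID, ArgumentType.UniqueID]
// Weighted rule: friends of smokers tend to smoke (the weight 2.0 is arbitrary).
model.add rule : ( Smokes(A) & Friends(A,B) ) >> Smokes(B), weight : 2.0
// Hard constraint: the same implication, enforced absolutely rather than weighted.
model.add rule : ( Smokes(A) & Friends(A,B) ) >> Smokes(B), constraint : true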
PSL includes implementations of Markov Logic inference algorithms. You can use them in your inference and learning applications by setting the following configuration options. Note that these implementations do not support all constraints allowed in PSL: if your program's constraint set does not decompose over atoms (i.e., each atom participates in at most one constraint), they will throw exceptions.
MPEInference and LazyMPEInference can use MaxWalkSat (MPE inference) and MC-Sat (marginal inference) with the following configuration options. Marginal probabilities will be set as the atoms' truth values.
# Sets MPEInference to perform Markov Logic MPE inference
<bundle>.mpeinference.reasoner = edu.umd.cs.psl.reasoner.bool.BooleanMaxWalkSatFactory
# Sets MPEInference to perform Markov Logic marginal inference
<bundle>.mpeinference.reasoner = edu.umd.cs.psl.reasoner.bool.BooleanMCSatFactory
# Sets LazyMPEInference to perform Markov Logic MPE inference
<bundle>.lazympeinference.reasoner = edu.umd.cs.psl.reasoner.bool.BooleanMaxWalkSatFactory
# Sets LazyMPEInference to perform Markov Logic marginal inference
<bundle>.lazympeinference.reasoner = edu.umd.cs.psl.reasoner.bool.BooleanMCSatFactory
Weight learning that uses a reasoner for MPE inference as a subroutine (e.g., MaxLikelihoodMPE, LazyMaxLikelihoodMPE) can also use Markov Logic MPE inference.
<bundle>.weightlearning.reasoner = edu.umd.cs.psl.reasoner.bool.BooleanMaxWalkSatFactory
MaxPseudoLikelihood also supports Markov Logic weight learning.
<bundle>.maxpseudolikelihood.bool = true
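The same options can also be set programmatically on the ConfigBundle before constructing an inference or weight-learning application. The following is a minimal sketch, assuming the property key is given relative to the bundle (as with the psl.properties entries above), that the relevant classes are imported, and that a ConfigBundle named config, the model, and a Database named inferenceDB already exist:
// Minimal sketch: switch MPEInference to Markov Logic MPE inference (MaxWalkSat).
config.setProperty("mpeinference.reasoner",
    "edu.umd.cs.psl.reasoner.bool.BooleanMaxWalkSatFactory")
MPEInference mpe = new MPEInference(model, inferenceDB, config)
mpe.mpeInference()
mpe.close()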
To run a PSL program, change to the top-level directory of its project (the directory with the Maven pom.xml
file).
Compile your project:
mvn compile
Now use Maven to generate a classpath for your project's dependencies:
mvn dependency:build-classpath -Dmdep.outputFile=classpath.out
You can now run a class with the command
java -cp ./target/classes:`cat classpath.out` <fully qualified class name>
where <fully qualified class name> is the fully qualified name of the class you want to run.
Tips and troubleshooting
As an alternative to invoking the java command directly, Maven can run the class for you:
mvn exec:java -Dexec.mainClass=<fully qualified class name>
The advantages are that the project does not need to be compiled separately and the classpath does not need to be generated or updated separately. The disadvantages are that the class output is preceded and succeeded by Maven output, exception stack traces are not printed by default (add the -e
switch), and Maven adds some overhead to execution (sometimes a significant amount, especially on less powerful machines).
To change the version of PSL your project uses, edit your project's pom.xml
file. The POM will declare dependencies on one or more PSL artifacts, e.g.,
<dependencies>
...
<dependency>
<groupId>edu.umd.cs</groupId>
<artifactId>psl-groovy</artifactId>
<version>1.2.1</version>
</dependency>
...
</dependencies>
Change the version element of each such dependency to the new version (use the same version for all of them) and rebuild.
Before releasing a new stable version, it is good to make sure that PSL's copyright notices are up to date. A script for doing that (using the OS X version of sed) is below:
#!/bin/bash
# Updates the ending year of the copyright notices throughout the code base.
# Usage: <this script> OLD_END_YEAR NEW_END_YEAR
# THIS VERSION ONLY WORKS FOR THE MAC OSX VERSION OF SED
die () {
echo >&2 "$@"
exit 1
}
[ "$#" -eq 2 ] || die "Two arguments, old and new end years, required"
export LANG=C
find * -not -path '*/\.*' -type f -exec sed -i "" "s_ \* Copyright 2013-$1 The Regents of the University of California_ \* Copyright 2013-$2 The Regents of the University of California_g" {} \;
find . -not -path '*/\.*' -type f -exec sed -i "" "s_ - Copyright 2013-$1 The Regents of the University of California_ - Copyright 2013-$2 The Regents of the University of California_g" {} \;
sed -i '' "s_Copyright 2013-$1 The Regents of the University of California_Copyright 2013-$2 The Regents of the University of California_g" NOTICE
echo "Remember to check the results of this script before committing!"
This is a HOWTO on changing the version number in the PSL code base. In most, if not all, cases, this HOWTO should be followed as part of a larger one, such as Releasing a New Stable Version, not by itself.
A new version number should be applied as a new commit that does nothing else, so make sure you are working on a clean working copy with no uncommitted changes.
Version numbers consist of three components: a major version (x), a minor version (y), and a patch version (z).
Your new version number should be of the form x.y.z (for a stable version) or x.y.z-SNAPSHOT (for an unstable version). Note that the patch version is omitted when it is 0; for example, version 1.1 is always written as 1.1, not 1.1.0, and version 1.1-SNAPSHOT is always written as 1.1-SNAPSHOT, not 1.1.0-SNAPSHOT. If the new version is just of the form x.y or x.y-SNAPSHOT, ignore the ".z" in the instructions below.
All the occurrences of a PSL version number should be kept in sync, i.e., have the same value for all occurrences in all pom.xml
files and other resources across all modules. In addition, only one commit in the entire Git repository should have a particular stable version number.
Version numbers appear as a module's version in its pom.xml
file, as well as the version of parents and dependencies.
The following list is all the occurrences of the version number in the PSL code (relative to the root directory, in the develop branch):
- pom.xml (1x: version)
- psl-addon/pom.xml (2x: version and parent version)
- psl-addon/psl-addon-mosek/pom.xml (3x: version, parent version, and psl-core dependency version)
- psl-archetype/pom.xml (2x: version and parent version)
- psl-archetype/psl-archetype-example/pom.xml (2x: version and parent version)
- psl-archetype/psl-archetype-example/src/main/resources/META-INF/maven/archetype-metadata.xml (1x: property default value)
- psl-archetype/psl-archetype-example/src/main/resources/archetype-resources/pom.xml (1x: psl-groovy dependency version)
- psl-archetype/psl-archetype-groovy/pom.xml (2x: version and parent version)
- psl-archetype/psl-archetype-groovy/src/main/resources/archetype-resources/pom.xml (1x: psl-groovy dependency version)
- psl-cli/pom.xml (3x: version, parent version, and psl-core dependency version)
- psl-cli/src/main/scripts/psl.sh (1x: version environment variable)
- psl-core/pom.xml (2x: version and parent version)
- psl-groovy/pom.xml (3x: version, parent version, and psl-core dependency version)
- psl-parser/pom.xml (3x: version, parent version, and psl-core dependency version)

Total line changes: 27
Remember to check the diff statistics before proceeding.
#!/bin/bash
# Replaces the old PSL version number with the new one in every file listed above.
# Usage: <this script> OLD_VERSION NEW_VERSION
# THIS VERSION ONLY WORKS FOR THE MAC OSX VERSION OF SED
die () {
echo >&2 "$@"
exit 1
}
[ "$#" -eq 2 ] || die "Two arguments, old and new versions, required"
sed -i "" "s_<version>$1\</version>_<version>$2</version>_g" pom.xml
sed -i "" "s_<version>$1\</version>_<version>$2</version>_g" psl-addon/pom.xml
sed -i "" "s_<version>$1\</version>_<version>$2</version>_g" psl-addon/psl-addon-mosek/pom.xml
sed -i "" "s_<version>$1\</version>_<version>$2</version>_g" psl-archetype/pom.xml
sed -i "" "s_<version>$1\</version>_<version>$2</version>_g" psl-archetype/psl-archetype-example/pom.xml
sed -i "" "s_<defaultValue>$1\</defaultValue>_<defaultValue>$2</defaultValue>_g" psl-archetype/psl-archetype-example/src/main/resources/META-INF/maven/archetype-metadata.xml
sed -i "" "s_<version>$1\</version>_<version>$2</version>_g" psl-archetype/psl-archetype-example/src/main/resources/archetype-resources/pom.xml
sed -i "" "s_<version>$1\</version>_<version>$2</version>_g" psl-archetype/psl-archetype-groovy/pom.xml
sed -i "" "s_<version>$1\</version>_<version>$2</version>_g" psl-archetype/psl-archetype-groovy/src/main/resources/archetype-resources/pom.xml
sed -i "" "s_<version>$1\</version>_<version>$2</version>_g" psl-cli/pom.xml
sed -i "" "s_export PSL\_VERSION=$1_export PSL\_VERSION=$2_g" psl-cli/src/main/scripts/psl.sh
sed -i "" "s_<version>$1\</version>_<version>$2</version>_g" psl-core/pom.xml
sed -i "" "s_<version>$1\</version>_<version>$2</version>_g" psl-groovy/pom.xml
sed -i "" "s_<version>$1\</version>_<version>$2</version>_g" psl-parser/pom.xml
git diff --shortstat
echo "Does the above say 27 lines added and deleted? IF NOT, SOMETHING WENT WRONG!"
#!/bin/bash
# Replaces the old PSL version number with the new one in every file listed above.
# Usage: <this script> OLD_VERSION NEW_VERSION
# THIS VERSION ONLY WORKS FOR THE LINUX VERSION OF SED
die () {
echo >&2 "$@"
exit 1
}
[ "$#" -eq 2 ] || die "Two arguments, old and new versions, required"
sed -i "s_<version>$1</version>_<version>$2</version>_g" pom.xml
sed -i "s_<version>$1</version>_<version>$2</version>_g" psl-addon/pom.xml
sed -i "s_<version>$1</version>_<version>$2</version>_g" psl-addon/psl-addon-mosek/pom.xml
sed -i "s_<version>$1</version>_<version>$2</version>_g" psl-archetype/pom.xml
sed -i "s_<version>$1</version>_<version>$2</version>_g" psl-archetype/psl-archetype-example/pom.xml
sed -i "s_<defaultValue>$1</defaultValue>_<defaultValue>$2</defaultValue>_g" psl-archetype/psl-archetype-example/src/main/resources/META-INF/maven/archetype-metadata.xml
sed -i "s_<version>$1</version>_<version>$2</version>_g" psl-archetype/psl-archetype-example/src/main/resources/archetype-resources/pom.xml
sed -i "s_<version>$1</version>_<version>$2</version>_g" psl-archetype/psl-archetype-groovy/pom.xml
sed -i "s_<version>$1</version>_<version>$2</version>_g" psl-archetype/psl-archetype-groovy/src/main/resources/archetype-resources/pom.xml
sed -i "s_<version>$1</version>_<version>$2</version>_g" psl-cli/pom.xml
sed -i "s_export PSL\_VERSION=$1_export PSL\_VERSION=$2_g" psl-cli/src/main/scripts/psl.sh
sed -i "s_<version>$1</version>_<version>$2</version>_g" psl-core/pom.xml
sed -i "s_<version>$1</version>_<version>$2</version>_g" psl-groovy/pom.xml
sed -i "s_<version>$1</version>_<version>$2</version>_g" psl-parser/pom.xml
git diff --shortstat
echo "Does the above say 26 lines added and deleted? IF NOT, SOMETHING WENT WRONG!"
Commit the changes with one of the following commit messages.
If you are changing to a stable version, use:
Version x.y.z
If you are changing to a new snapshot version, use:
Started x.y.z-SNAPSHOT
Push your commit when finished.
The Git website has information on installing Git, as do the GitHub guides mentioned below. This tutorial is helpful for learning how to use Git, and this tutorial is particularly helpful for SVN users.
To use an existing branch in the remote repo on GitHub, create a tracking branch to track it. It can be kept in sync via git pull
. For example, to track the branch 'develop' (assuming the GitHub repo is named 'origin'), run
>> git branch --track develop origin/develop
then
>> git checkout develop
Create a free account on GitHub. Then follow one of the following sets of instructions to set up Git and GitHub:
You can fork the PSL repository, which creates a copy of it hosted under your GitHub account. You then clone that repository to a local machine, make commits, and, optionally, push some or all of those commits back to your fork on GitHub. Those commits are then publicly available (unless you have paid GitHub for private hosting).
The job of a Weight Learning Application is to use data to learn the weights of each rule in a PSL model.
## Syntax
In weight learning we follow the structure below:
<WeightLearningApplication> weightLearner = new <WeightLearningApplication>(<model>, <targetDatabase>, <groundTruthDatabase>, <config>)
- <model> is the model specified by your PSL program.
- <targetDatabase> is a database which contains all of the atoms for which you would like to infer values. When you create this database, the target predicate will be open.
- <groundTruthDatabase> is a database which contains the known values of the atoms whose values you are inferring in the targetDatabase. When you create this database, the predicates should be closed.
- <config> is your config bundle.

Weight Learning Applications include:
- MaxLikelihoodMPE
- MaxPseudoLikelihood
- MaxMargin
After weight learning, the learned PSLModel can be printed using println model.
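As a sketch of how these pieces fit together (the partition numbers and the Evidence and Target predicates below are hypothetical, and this database setup is only one possible arrangement), weight learning with MaxLikelihoodMPE might look like this:
// Minimal sketch, assuming model, data (a DataStore), and config already exist,
// the relevant classes are imported, and the partitions below have been populated.
Partition observations = new Partition(0)   // observed atoms (closed predicates)
Partition targets      = new Partition(1)   // atoms whose values will be inferred (open predicate)
Partition truth        = new Partition(2)   // known values of the target atoms

// Target database: the hypothetical Target predicate stays open; Evidence is closed and read from observations.
Database targetDB = data.getDatabase(targets, [Evidence] as Set, observations)
// Ground-truth database: the Target predicate is closed here.
Database truthDB = data.getDatabase(truth, [Target] as Set)

MaxLikelihoodMPE weightLearner = new MaxLikelihoodMPE(model, targetDB, truthDB, config)
weightLearner.learn()
weightLearner.close()

println model   // the rules now show their learned weights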
To see the weight learning code, look here.