## Justin's Graphical Models / Conditional Random Field ToolboxThis is a Matlab/C++ "toolbox" of code for learning and inference with graphical models. It is focused on parameter learning using marginalization in the high-treewidth setting. Though the code is, in principle, domain independent, I've developed it with vision problems in mind, particularly for learning Conditional Random Fields (CRFs). This means that the code is A) efficient (all the inference algorithms are implemented in C++) and B) can handle arbitrary graph structures. There are, at present, a bunch of limitations: - All the inference algorithms are for
*marginal*inference. No MAP inference, at all. - The code handles pairwise graphs only
- All variables must have the same number of possible values.
- For tree-reweighted belief propagation, a
*single*edge appearance probability must be used for all edges
Size: (non-comment lines of code) - Matlab: 4204
- C++: 1322
The code can be downloaded here. Note that it includes a set of binaries for various things for efficiency. I've included binaries for 64 bit OS X and 64 bit linux. If you compile the code for a different platform, please email me the binaries so they could be included for others. - Version 1 (Nov 7 2011): JGMT1.zip Initial Version. (Obsolete!)
- Version 2 (Jan 4 2012): JGMT2.zip Bugfixes, binaries for linux, new generic train_crf inferface, multithreaded back-TRW switch from Eigen 2 to Eigen 3.
Examples The code can be used in a bunch of different ways, depending on if you are looking for a specific tool to use, or a large framework. - Just use the low-level Inference algorithms, namely:
- Tree-Reweighted Belief Propagation + variants (Loopy BP, TRW-S)
- Mean-field
- Use the Differentiation methods (back-TRW or implicit differentiation) to calculate parameter gradients by providing your own loss functions. Do everything else on your own.
- Use the Loss methods (E.M., implicit_loss) to calculate parameter gradients by providing a true vector x and a loss name (univariate likelihood, clique likelihood, etc.) Unlike the above usages, these methods explicitly consider the conditional learning setting where one has an input and an output.
- Use the CRF methods to do almost everything (deal with parameter ties for a specific type of model, etc.) These methods consider specific classes of CRFs and given and input, output, loss function, inference method, etc. give the parameter gradient. Employing this gradient in a learning framework is straightforward.
Paper: - If you want to understand the algorithms being used, they are described in great detail in this draft paper
## Example: A simple CRF for binary denoisingFirst, set up the parameters of the problemN = 5; % number of training images siz = 50; % size of training images rho = .5; % TRW edge appearance probability nvals = 2; % this problem is binaryNext, generate the graph for this CRF. (A simple pairwise grid) model = gridmodel(siz,siz,nvals);Now, generate the data. Basically, we make noisy images, then smooth them to make the true (discrete) output values, and then add noise to make the input. for n=1:N x{n} = round(imfilter(rand(siz),fspecial('gaussian',50,7),'same','symmetric')); t = rand(size(x{n})); noiselevel = 1.25; y{n} = x{n}.*(1-t.^noiselevel) + (1-x{n}).*t.^noiselevel; endIn this case, the data looks like this. First the true labels x, and then the noisy inputs y.
Next, we make the features and the labels. The features consist of simply a constant of one, and the input image y itself. for n=1:N feats{n} = [y{n}(:) 1+0*x{n}(:)]; labels{n} = x{n}+1; end % no edge features here efeats = []; % none Next, a visualization function. This takes a cell array of predicted beliefs as input, and shows them to the screen during training. This is totally optional, but very useful if you want to understand what is happening in your training run. % visualization function function viz(b_i) % here, b_i is a cell array of size nvals X nvars for n=1:N subplot(3,N,n ); imshow(reshape(b_i{n}(2,:),siz,siz)); subplot(3,N,n+ N); imshow(reshape(feats{n}(:,1),siz,siz)); subplot(3,N,n+2*N); imshow(reshape(labels{n}-1,siz,siz)); end xlabel('top: marginals middle: input bottom: labels') drawnow end Next, we pick a string to specify the loss and inference method. In this case, we choose truncated fitting with the clique logistic loss based on TRW with five iterations. ```
loss_spec = 'trunc_cl_trw_5';
```
Other options include 'pert_ul_trw_1e5'
(perturbation, univariate logistic loss, TRW, threshold of 1e-5) 'em_mnf_1e5' (Surrogate Expectation-Maximization
based on mean-field with a threshold of 1e-5 (simplifies
to surrogate likelihood with no hidden variables) or 'trunc_em_trwpll_10' (Truncated surrogate EM based
on multithreaded TRW with 10 iterations).
Next, we set up some parameters for the training optimization. crf_type = 'linear_linear'; options.derivative_check = 'off'; options.viz = @viz; options.rho = rho; options.print_times = 1; options.nvals = nvals;Finally we actually optimize. p = train_crf(feats,efeats,labels,model,loss_spec,crf_type,options)This prints a visualization while running, using the viz function above.
time: 1.490180 / Inf (1.490184 per call) loss: 2.717137 error: 0.586960 time: 2.336790 / Inf (1.168397 per call) loss: 2.495032 error: 0.413040 time: 3.126747 / Inf (1.042250 per call) loss: 2.336501 error: 0.413040 time: 4.396906 / Inf (1.099227 per call) loss: 2.091947 error: 0.413040 time: 5.526749 / Inf (1.105350 per call) loss: 11.609399 error: 0.586960 etc... time: 35.654142 / Inf (0.848908 per call) loss: 0.736988 error:0.103520 p = F: [2x2 double] G: [4x1 double]The result is a structure array p. It contains two matrices. The first, F, determines the univariate potentials. Specifically, the vector of log-potentials for node i is given by multiplying F with the features for node i. Similarly, G determines the log-potentials for the edge interactions. Since there are no features, though, G just multiplies a constant of 1, meaning that the 4 entries of G are themselves the log-potentials for the four possible values of (x_i,x_j). Now that we've trained the image, let's make a new test image, and get example marginals for it. x = round(imfilter(rand(siz),fspecial('gaussian',50,7),'same','symmetric')); t = rand(size(x)); y = x.*(1-t.^noiselevel) + (1-x).*t.^noiselevel; feats = [y(:) 1+0*x(:)]; labels = x+1; [b_i b_ij] = eval_crf(p,feats,efeats,model,loss_spec,crf_type,rho); b_i = reshape(b_i',[siz siz nvals]); [~,label_pred] = max(b_i,[],3); error = mean(label_pred(:)~=labels(:))In this case the error rate on the test image was around 12%. Here, we can see the true labels, input, predicted marginals, and predicted labels. ## Example: Background DatasetThis is a somewhat more involved example, going through the entire process of training CRFs on the Stanford Backgrounds Dataset. Before using matlab, you need to download the dataset. From the shell, do, for example, I go to ~/Datasets and do:>> wget http://dags.stanford.edu/data/iccv09Data.tar.gz Resolving dags.stanford.edu... Connecting to dags.stanford.edu HTTP request sent, awaiting response... 200 OK Length: 14727974 (14M) [application/x-gzip] Saving to: ?iccv09Data.tar.gz? 100%[======================================>] 14,727,974 3.03M/s in 6.7s 2011-12-17 18:23:42 (2.10 MB/s) - ?iccv09Data.tar.gz? saved [14727974/14727974] >> tar -xvf iccv09Data.tar.gz iccv09Data/horizons.txt iccv09Data/images/ iccv09Data/images/0000051.jpg iccv09Data/images/0000059.jpg iccv09Data/images/0000072.jpg [...] iccv09Data/labels/9005273.layers.txt iccv09Data/labels/9005294.regions.txt iccv09Data/labels/9005294.surfaces.txt iccv09Data/labels/9005294.layers.txt iccv09Data/READMEThis puts the data in a directory at ~/Datasets/iccv09Data/. Now, start matlab. We begin with some parameter choices. imsdir = '~/Datasets/iccv09Data/images/'; % Change this to fit your system! labdir = '~/Datasets/iccv09Data/labels/'; % Change this to fit your system! nvals = 8; rez = .2; % how much to reduce resolution rho = .5; % (1 = loopy belief propagation) (.5 = tree-reweighted belief propagation)Next, we need to choose what features will be used. Here, we choose to use the RGB intensities, and position, jointly Fourier expanded, plus a histogram of Gaussians, computed using Piotr Dollar's toolbox. feat_params = {{'patches',0},{'position',1},{'fourier',1},{'hog',8}};Now, we will load the data. In the backgrounds dataset, labels are stored as a text array of integers in the range 0-7, with negative values for unlabelled regions. JGMT uses 0 to represent unlabelled/hidden values, so we make this conversion when loading the data. Additionally, we reduce resolution to 20% after computing the features. This actually increases the accuracy of the final predictions, interpolated
back to the original resolution.
ims_names = dir([imsdir '*.jpg']); lab_names = dir([labdir '*regions.txt']); N = length(ims_names); ims = cell(N,1); labels = cell(N,1); fprintf('loading data and computing feature maps...\n'); parfor n=1:N % load data lab = importdata([labdir lab_names(n).name]); im = double(imread(([imsdir ims_names(n).name])))/255; ims{n} = im; labels0{n} = max(0,lab+1); % compute features feats{n} = featurize_im(ims{n},feat_params); % reduce resolution for speed ims{n} = imresize(ims{n} ,rez,'bilinear'); feats{n} = imresize(feats{n} ,rez,'bilinear'); labels{n} = imresize(labels0{n},rez,'nearest'); % reshape features [ly lx lz] = size(feats{n}); feats{n} = reshape(feats{n},ly*lx,lz); endNext, we will make the graph structure. In this dataset, the images come in slightly different sizes. Rather than making a different graph for each image (which would be fine if slow) we use a "hashing" strategy to make them, then copy into an array with one per image. model_hash = repmat({[]},1000,1000); fprintf('building models...\n') for n=1:N [ly lx lz] = size(ims{n}); if isempty(model_hash{ly,lx}); model_hash{ly,lx} = gridmodel(ly,lx,nvals); end end models = cell(N,1); for n=1:N [ly lx lz] = size(ims{n}); models{n} = model_hash{ly,lx}; endNow, we compute edge features. (This must be done here since it uses the graph structures.) First off, we must specify what features to use. Here, we choose a constant of one, a set of thresholds on the difference of neighboring pixels, and "pairtype" features. In pairtype last ones, the number of features is doubled, with the previous features either put in the first or second half. The effect is that vertical and horizontal edges are parameterized separately. edge_params = {{'const'},{'diffthresh'},{'pairtypes'}}; fprintf('computing edge features...\n') efeats = cell(N,1); parfor n=1:N efeats{n} = edgeify_im(ims{n},edge_params,models{n}.pairs,models{n}.pairtype); endNext up, we split the data into a training set (80%) and a test set (20%). ```
fprintf('splitting data into a training and a test set...\n')
k = 1;
[who_train who_test] = kfold_sets(N,5,k);
ims_train = ims(who_train);
feats_train = feats(who_train);
efeats_train = efeats(who_train);
labels_train = labels(who_train);
labels0_train = labels0(who_train);
models_train = models(who_train);
ims_test = ims(who_test);
feats_test = feats(who_test);
efeats_test = efeats(who_test);
labels_test = labels(who_test);
labels0_test = labels0(who_test);
models_test = models(who_test);
```
Again we make a visualization function. This takes a cell array of predicted
beliefs as input, and shows them to the screen during training. This
is totally optional, but very useful if you want to understand what
is happening in your training run.
% visualization function function viz(b_i) % here, b_i is a cell array of size nvals x nvars M = 5; for n=1:M [ly lx lz] = size(ims_train{n}); subplot(3,M,n ); miximshow(reshape(b_i{n}',ly,lx,nvals),nvals); subplot(3,M,n+ M); imshow(ims_train{n}) subplot(3,M,n+2*M); miximshow(reshape(labels_train{n},ly,lx),nvals); end xlabel('top: marginals middle: input bottom: labels') drawnow endNow, we choose what learning method to use. Here, we choose truncated fitting with the clique logistic loss. We use 5 iterations of TRW inference. Here, we use 'trwpll' to indicate to use the multithreaded TRW code. You will probably have to call 'compile_openmp' to make this work. Otherwise, you could just switch to 'trunc_cl_trw_5', which uses the non-parallel code. ```
loss_spec = 'trunc_cl_trwpll_5';
```
Finally, we actually train the model. This takes about an hour and a
half on an 8-core machine. You should have at least 4-8GB of memory.
matlabpool 8 fprintf('training the model (this is slow!)...\n') crf_type = 'linear_linear'; options.viz = @viz; options.print_times = 0; % since this is so slow, print stuff to screen options.gradual = 1; % use gradual fitting options.maxiter = 1000; options.rho = rho; options.reg = 1e-4; options.opt_display = 0; p = train_crf(feats_train,efeats_train,labels_train,models_train,loss_spec,crf_type,options) ans = F: [8x100 double] G: [64x22 double] The result is a structure array p. It contains two matrices. The first, F, determines the univariate potentials. Specifically, the vector of log-potentials for node i is given by multiplying F with the features for node i. Since there are 100 univariate features, this is a 8x100 matrix. Similarly, G determines the log-potentials for the edge interactions. Since there are 22 edge features and 64=8*8 pairwise values, this is a 64x22 matrix. fprintf('get the marginals for test images...\n'); close all for n=1:length(feats_test) [b_i b_ij] = eval_crf(p,feats_test{n},efeats_test{n},models_test{n},loss_spec,crf_type,rho); [ly lx lz] = size(labels_test{n}); [~,x_pred] = max(b_i,[],1); x_pred = reshape(x_pred,ly,lx); [ly lx lz] = size(labels0_test{n}); x = labels0_test{n}; % upsample predicted images to full resolution x_pred = imresize(x_pred,size(x),'nearest'); E(n) = sum(x_pred(x(:)>0)~=x(x(:)>0)); T(n) = sum(x(:)>0); [ly lx lz] = size(ims_test{n}); subplot(2,3,1) miximshow(reshape(b_i',ly,lx,nvals),nvals); subplot(2,3,2) imshow(ims_test{n}) subplot(2,3,3) miximshow(reshape(labels_test{n},ly,lx),nvals); [ly lx lz] = size(labels0_test{n}); subplot(2,3,4) miximshow(reshape(x_pred,ly,lx),nvals); subplot(2,3,5) imshow(ims_test{n}) subplot(2,3,6) miximshow(reshape(labels0_test{n},ly,lx),nvals); drawnow end fprintf('total pixelwise error on test data: %f \n', sum(E)/sum(T))In this case, the error on test data is around 23%. Certainly not perfect, but seems to be state of the art on this dataset currently. Finally, for fun, I also downloaded a video of someone driving from Arlington into Georgetown and ran the algorithm. This took about .62s per frame to compute the features and .75s per frame to run TRW (though inference has been optimized for learning, not test time performance). Anyway, this gives a performance of "0.73 FPS". Not fast enough to process video in real time, but potentially useful in a robotics application or similar |