MEGRASP -- The Maximum Entropy GR Parser for CHILDES transcripts
Version 0.8a

Kenji Sagae

lastname [at] usc [dot] edu
University of Southern California

July 2009

Based on GRASP, aka childesparser, developed by
Kenji Sagae 
at Carnegie Mellon University

Includes code from SSMaxEnt, written by Yoshimasa Tsuruoka 
at the University of Tokyo (now at the University of 
Manchester).

----------------------------------------------------------------

QUICK START

1. Unpack the tar.gz or .zip archive (see below).

2. Find the executable for your platform.  For example, the
   i386 Linux executable is called megrasp-linux.

   If an executable for your platform is not available,
   compile one from source (see below).

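3. Assuming you have a file named test.cha, you can run
   % ./megrasp test.cha
   to create an output file called test.cha.txt, or
   % ./megrasp -o output.cha test.cha
   to create an output file called output.cha.  (Substitute
   the name of your platform's executable for megrasp.)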
   
   This uses the model provided with the distribution,
   megrasp.mod.  If option -t is used without -m to specify
   a new model name, the provided model will be overwritten.
   To retrain the parser, see the command line options below.

----------------------------------------------------------------

UNPACKING

% gunzip megrasp0.8b.tar.gz
% tar xvf megrasp0.8b.tar

COMPILING

% make megrasp

This produces an executable called megrasp.
Compilation has only been tested on Linux, but it
should also work on Cygwin and Mac OS X.

USAGE

% ./megrasp [-g str] [-G str] [-p str] [-m str] [-o str] [-L int] [-e]
	    [-t [-i int] [-c float]] [inputfile ...]

inputfile may be a filename (file must be in CHAT format, 
with part-of-speech tags produced by MOR and POST), or a 
sequence of filenames, or a pattern using wildcards (for 
example: *.cha).  Use the -o option (see below) only if 
using a single file name.

If inputfile is omitted, the parser reads input from STDIN
and writes output to STDOUT.  If inputfile is specified, 
the parser creates a new output file with the same name as
the input file with the additional extension .txt (unless 
the -o option is used to set the name of the output file).
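
As a rough sketch of the naming rules above, the following shell
function parses several files one at a time and collects the results
in a separate directory by passing -o on each call.  The function name
parse_all, the MEGRASP variable and the directory layout are
assumptions made for this example; only the -o option and the .txt
naming come from this README.

```shell
# Path to the parser executable (assumed to be ./megrasp; override as needed).
MEGRASP="${MEGRASP:-./megrasp}"

# parse_all OUTDIR FILE...  (hypothetical helper, not part of MEGRASP)
# Runs the parser once per input file, because -o names a single output file.
parse_all() {
    outdir="$1"
    shift
    mkdir -p "$outdir" || return 1
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "skipping $f: not a regular file" >&2
            continue
        fi
        # Keep the default naming scheme (input name plus .txt), but
        # place the result in OUTDIR instead of next to the input.
        "$MEGRASP" -o "$outdir/$(basename "$f").txt" "$f" ||
            echo "megrasp failed on $f" >&2
    done
}
```

With this definition loaded, a call such as
parse_all parsed *.cha
would be expected to leave parsed/test.cha.txt for an input file
named test.cha.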

Command line options:

-g str:	prefix for output lines (default: xgra)
	For example, if the option '-g abc' is given to the parser,
	output lines will begin with %abc: followed by a tab character
	(the default is to use the standard %xgra:). 

-G str:	prefix for GR training lines (default: xgrt)
	GR training lines have the same format as output lines, but
	are assumed to be correct.

-p str:	prefix for POS lines, must be three characters (default: mor)
	Other choices that make sense here include trn and pos.
	Ambiguous tags are NOT supported.

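-m str:	model name (default: megrasp.mod)
	In training mode (-t), the trained model is written under this
	name, overwriting any existing model with the same name.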

-o str: output file name.  If this option is not used, the output
	file name is the input file name with a ".txt" appended to
	it (if input is being taken from STDIN, output is written
	to STDOUT).

-L int: number of characters before an output line wraps (default: 80)
	If -L is set to zero, line wrapping is turned off and output
	lines are never broken.

-e:	evaluate accuracy (input file must contain gold-standard GRs)

-t: training mode (parser runs in parse mode by default)
	Use this option only to train the parser using a file that
	contains GR training lines (usually %xgrt:).  When the parser
	is used with the -t option, it writes warnings to STDERR, 
	which can be captured to a file like this:
	./megrasp -t myfile.cha >& log.txt

The next two options are used in training mode only (with the -t option).
If the short descriptions don't make sense, it's best to leave these
options at their defaults, or see Kazama & Tsujii (EMNLP 2003).

-i int:	number of iterations for training the ME model (default: 300)
-c float: inequality parameter for training the ME model with
	inequality constraints (default: 1.0).
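
To tie the training options together, here is a minimal sketch of
retraining into a separate model file and then parsing with it, so
that the provided megrasp.mod is left untouched.  The function names,
the MEGRASP variable and the file names in the usage note are
placeholders chosen for this example; the -i and -c values shown are
simply the documented defaults.

```shell
# Path to the parser executable (assumed to be ./megrasp; override as needed).
MEGRASP="${MEGRASP:-./megrasp}"

# train_model MODEL TRAINFILE LOGFILE  (hypothetical helper)
# Trains into MODEL rather than the default megrasp.mod and captures
# everything the parser prints, including STDERR warnings, in LOGFILE.
# "> file 2>&1" is the sh spelling of the csh-style ">& file".
train_model() {
    "$MEGRASP" -t -m "$1" -i 300 -c 1.0 "$2" > "$3" 2>&1
}

# parse_with_model MODEL INPUT OUTPUT  (hypothetical helper)
# Parses INPUT with the given model and writes the result to OUTPUT.
parse_with_model() {
    "$MEGRASP" -m "$1" -o "$3" "$2"
}
```

For instance, assuming train.cha contains GR training lines:
train_model my.mod train.cha train.log &&
    parse_with_model my.mod test.cha test.out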


----------------------------------------------------------------

CHANGES

10 JUL 2009, v0.8a: changed default tier names (xgrt, xgra, mor)
10 JUL 2009, v0.8a: changed default output file name (add .txt)
17 MAR 2008, v0.7a: added evaluation option (-e)
19 NOV 2007, v0.7: made the "previous action" feature part of the parser state.