SGF - Sekimo Generic Format/XStandoff
The Sekimo Generic Format (SGF) was developed in the Sekimo project at Bielefeld University to allow the structured (i.e. XML-based) storage and analysis of multi-dimensionally annotated linguistic corpora, including annotations that may overlap. It combines elements of both classic stand-off and inline annotation and can be used with both tree- and graph-based annotation models (the formal model of SGF itself is a multi-rooted tree). A detailed discussion of the format and its application can be found in Stührenberg & Goecke, 2008. The use of SGF for storing lexical chains (SGF-LC) is demonstrated in Waltinger et al., 2008. In addition, SGF serves as the import and export format of the current version of the web-based annotation tool Serengeti, which is used in the AnaphoricBank and AnaWiki projects.
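SGF's actual XML vocabulary is not reproduced here. As a rough, non-authoritative illustration of the stand-off idea behind the format, the following Python sketch keeps the primary text separate from its annotations. Character-offset segments are defined once, and each annotation layer only refers to segment IDs, loosely mirroring how several independent hierarchies can sit over the same primary data. All names (`Segment`, `segmentation`, `layers`, and the `syntax` and `prosody` layers) are hypothetical. Because the layers do not have to nest inside one another, a syntactic phrase and a prosodic phrase can overlap, which a single inline XML tree cannot express.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical sketch of stand-off annotation; these names are NOT the
# SGF/XStandoff element or attribute names.


@dataclass(frozen=True)
class Segment:
    id: str
    start: int  # character offset into the primary data (inclusive)
    end: int    # character offset into the primary data (exclusive)


def overlaps(a: Segment, b: Segment) -> bool:
    """True if the spans share characters but neither contains the other."""
    shares = a.start < b.end and b.start < a.end
    nested = (a.start <= b.start and b.end <= a.end) or (
        b.start <= a.start and a.end <= b.end
    )
    return shares and not nested


# The primary data is stored once and never modified by annotations.
primary_data = "Peter sees Mary."

# One shared segmentation over the primary data.
segmentation = {
    "seg1": Segment("seg1", 0, 5),    # "Peter"
    "seg3": Segment("seg3", 11, 15),  # "Mary"
    "seg4": Segment("seg4", 0, 10),   # "Peter sees"
    "seg5": Segment("seg5", 6, 15),   # "sees Mary"
}

# Each (hypothetical) layer lists (label, segment id) pairs.
layers = {
    "syntax": [("NP", "seg1"), ("VP", "seg5")],
    "prosody": [("IP", "seg4"), ("IP", "seg3")],
}


def text_of(seg_id: str) -> str:
    """Resolve a segment id back to the text it covers."""
    seg = segmentation[seg_id]
    return primary_data[seg.start:seg.end]


def cross_layer_overlaps(layer_a: str, layer_b: str) -> list[tuple[str, str]]:
    """List annotation pairs from two layers whose spans properly overlap."""
    result = []
    for label_a, id_a in layers[layer_a]:
        for label_b, id_b in layers[layer_b]:
            if overlaps(segmentation[id_a], segmentation[id_b]):
                result.append(
                    (f"{label_a}:{text_of(id_a)}", f"{label_b}:{text_of(id_b)}")
                )
    return result


if __name__ == "__main__":
    print(cross_layer_overlaps("syntax", "prosody"))
```

Run as a script, this prints `[('VP:sees Mary', 'IP:Peter sees')]`. The verb phrase and the first intonation phrase share the word "sees", but neither contains the other. Both annotations can coexist because neither one is written inline into the text.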
The extended current development version has been renamed XStandoff. Additional information can be found at http://www.xstandoff.net.
The XML schema files defining the Sekimo Generic Format can be found at the following URLs:
Version 1.0 (stable release)
SGF schema files:
N.B.: All SGF XML schema files are available under the GNU Lesser General Public License (LGPL v3).
- Base layer (sgf.xsd; XML namespace: http://www.text-technology.de/sekimo):
http://www.maik-stuehrenberg.de/files/sgf/1.0/sgf_stable.zip
- Logging (especially for use in the web-based annotation tool Serengeti; log.xsd; XML namespace: http://www.text-technology.de/sekimo/log):
http://www.maik-stuehrenberg.de/files/sgf/1.0/log_stable.zip
Additional XML schemas that are used in conjunction with SGF:
- The XML schema used in the Sekimo project for the logical document structure (doc.xsd; XML namespace: http://www.text-technology.de/sekimo/doc):
http://www.maik-stuehrenberg.de/files/sgf/1.0/doc.xsd
- The adapted XML schema used in the Sekimo project for the output of the Machinese Syntax dependency parser by Connexor Oy (cnx.xsd; XML namespace: http://www.text-technology.de/sekimo/cnx):
http://www.maik-stuehrenberg.de/files/sgf/1.0/cnx.xsd
The copyright of the DTD fdg3.dtd, on which this XSD file is based, is held by Connexor Oy.
- The XML schema used in the Sekimo project for storing semantic relations (chs.xsd; XML namespace: http://www.text-technology.de/sekimo/chs):
http://www.maik-stuehrenberg.de/files/sgf/1.0/chs.xsd
- The XML schema used in the AnaphoricBank and AnaWiki projects for the logical document structure (ana_doc.xsd; XML namespace: http://www.text-technology.de/anawiki/ds):
http://www.maik-stuehrenberg.de/files/sgf/1.0/ana_doc.xsd
- The XML schema used in the AnaphoricBank and AnaWiki projects for markable annotation with Serengeti (ana_markables.xsd; XML namespace: http://www.text-technology.de/anawiki/mark):
http://www.maik-stuehrenberg.de/files/sgf/1.0/ana_markables.xsd
- The XML schema used in the AnaphoricBank and AnaWiki projects for storing semantic relations with Serengeti (ana_semrels.xsd; XML namespace: http://www.text-technology.de/anawiki/relation):
http://www.maik-stuehrenberg.de/files/sgf/1.0/ana_semrels.xsd
- The XML schema used in the AnaphoricBank and AnaWiki projects for storing MASXML files based on the GNOME DTD by Massimo Poesio (ana_masxml.xsd; XML namespace: http://www.text-technology.de/anawiki/masxml):
http://www.maik-stuehrenberg.de/files/sgf/1.0/ana_masxml.xsd
- The XML schema used in the Indogram project for storing lexical chains (SGF-LC) as an export format for the Scientific Workplace (lc.xsd; XML namespace: http://www.text-technology.de/sekimo/lc):
http://www.maik-stuehrenberg.de/files/sgf/1.0/lc.xsd
Version 1.1 (development build, XStandoff Format)
Please visit http://www.xstandoff.net for the current release.