Metafacture-core release 5.3.1

January 06, 2022 | Pascal Christoph

Preamble

This post describes the new developments in metafacture-core release 5.3.1 since the release of metafacture-core-5.2.0 in April 2021. Because metafacture-core-5.3.0 comes with some possibly breaking changes, it is recommended to use 5.3.1. The intention is to provide a single-page overview of the improvements; the examples section gives a condensed glimpse based on more real-life examples.

Changes

Bug fixes

  • XML/biblio: Fix creation of Marc XML namespaces #403
  • XML/biblio: Fix Namespace-prefixes of elements and attributes #377
  • XML/biblio: Marc-XML-encoder: record-type written as controlfield not as attribute of record-field #402
  • XML/biblio: Improve handling of XML attributes and element values #394
  • XML/biblio: Encode top-level MARC record leader as proper XML element instead of control field #336
  • XML/biblio: Make simple XML encoder value tag name configurable #379
  • JSON: Fix _elseNested loses array-key in JSON #374
  • Metamorph: Fix _elseNested only outputs two hierarchy levels #378
  • Metamorph: Fix “setreplace” using a FileMap #381
  • Metamorph: Guarantee that tests should verify that no unexpected interactions occurred #339

New Features

  • JSON: Make JSON encoder array marker configurable #393
  • JSON: Add or enhance a function to extract JSON records from a JSON API #382
  • Mangling: Split up event stream into records #385
  • Metamorph: Allow empty values in setreplace map #420
  • Triples: Sort triples numerically #380
  • YAML: Add YAML Encoder/Decoder #399

Others

  • Update release and publish process #311
  • Checkstyle and javadoc #389 and #396
  • Update and apply EditorConfig file #388
  • Add initial CONTRIBUTING.md #382
  • Fix insecure logging configuration #364

… and various smaller fixes and improvements (e.g. #417)

Caveats

This will only rarely be relevant: if you maintain your own copy of metamorph.xsd and make use of FileMap, you also have to update your local metamorph.xsd:

-      <attribute name="separator" type="string" use="optional" default="\t">
+      <attribute name="separator" type="string" use="optional" default="&#09;">
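Why this distinction matters can be checked outside of Metafacture. The following Python sketch (standard library only; the stripped-down attribute element is just an illustration, not the real schema) shows that XML knows no backslash escapes: "\t" in an attribute is read as two literal characters, while the character reference "&#09;" is read as an actual tab. This is presumably the motivation for the change above.

```python
import xml.etree.ElementTree as ET

# Simplified stand-ins for the old and new attribute declarations.
old = ET.fromstring(r'<attribute name="separator" default="\t"/>')
new = ET.fromstring('<attribute name="separator" default="&#09;"/>')

old_default = old.get("default")
new_default = new.get("default")

# The old default is a backslash followed by the letter t.
print(len(old_default), repr(old_default))
# The new default is a single tab character.
print(len(new_default), repr(new_default))
```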

Examples

Here are some examples that describe the former behaviour and the new changes.

Encoding (MARC21) XML

Namespace prefixes of elements and attributes in XML can now be preserved by adding a parameter in the flux: handle-generic-xml(emitnamespace="true").

The MARC21 XML encoding has been improved regarding the record type, which was formerly written as a controlfield and is now written as an attribute of the record element. The output of the leader is also fixed.

Input:

	<record xmlns="http://www.loc.gov/MARC21/slim" type="Bibliographic">
		<leader>00000pam a2200000 c 4500</leader>
		<marc:datafield tag="856" ind1="4" ind2="0">
			<marc:subfield code="u">http://www.video2brain.com/</marc:subfield>
			<marc:subfield code="x">Agentur</marc:subfield>
		</marc:datafield>
	</record>

was:

	<marc:record>
		<marc:controlfield tag="type">Bibliographic</marc:controlfield>
		<marc:leader>00000pam a2200000 c 4500</marc:leader>
		<marc:datafield tag="856" ind1="4" ind2="0">
			<marc:subfield code="u">http://www.video2brain.com/</marc:subfield>
		</marc:datafield>
		<marc:datafield tag="856" ind1="4" ind2="0">
			<marc:subfield code="x">Agentur</marc:subfield>
		</marc:datafield>
	</marc:record>

now:

	<marc:record type="Bibliographic">
		<marc:leader>00000pam a2200000 c 4500</marc:leader>
		<marc:datafield tag="856" ind1="4" ind2="0">
			<marc:subfield code="u">http://www.video2brain.com/</marc:subfield>
			<marc:subfield code="x">Agentur</marc:subfield>
		</marc:datafield>
	</marc:record>

It’s now possible to configure what’s a valuetag (String) and an attributemarker (String) in many XML-related readers and encoders (see the diff of the flux-commands.md).

Encoding JSON

The ability in morph to pass untreated fields through untouched (_elseNested) was limited to a nesting depth of only two levels. Also, the resulting structure was broken, e.g. for JSON.

Using the following FLUX:

"testArray.json"
| open-file
| as-records
| decode-json
| morph("all.xml")
| encode-json(prettyPrinting="true")
| write("stdout");

and the morph:

<?xml version="1.0" encoding="UTF-8"?>
<metamorph xmlns="http://www.culturegraph.org/metamorph" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	version="1">
	<rules>
		<data source="_elseNested"/>
	</rules>
</metamorph>

with this JSON input:

{
    "author": [
        {
            "@type": "Person",
            "name": "Katja Königstein-Lüdersdorff"
        },
        {
            "@type": "Person",
            "name": "Corinna Peters"
        },
        {
            "@type": "Person",
            "name": "Oleg Tjulenev"
        },
        {
            "@type": "Person",
            "name": "Claudia Vogeler"
        }
    ]
}

resulted in:

{
  "1" : {
    "@type" : "Person",
    "name" : "Katja Königstein-Lüdersdorff"
  },
  "2" : {
    "@type" : "Person",
    "name" : "Corinna Peters"
  },
  "3" : {
    "@type" : "Person",
    "name" : "Oleg Tjulenev"
  },
  "4" : {
    "@type" : "Person",
    "name" : "Claudia Vogeler"
  }
}

With the new release the output is:

{
  "author" : [ {
    "@type" : "Person",
    "name" : "Katja Königstein-Lüdersdorff"
  }, {
    "@type" : "Person",
    "name" : "Corinna Peters"
  }, {
    "@type" : "Person",
    "name" : "Oleg Tjulenev"
  }, {
    "@type" : "Person",
    "name" : "Claudia Vogeler"
  } ]
}
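One way to read this fix: with a pure pass-through morph, the output should be structurally equal to the input. The following Python sketch checks exactly that for a shortened version of the example (only the first two authors). It compares the JSON documents shown above and does not run Metafacture itself.

```python
import json

# Shortened input from the example above (first two authors only).
source = json.loads("""
{"author": [
  {"@type": "Person", "name": "Katja Königstein-Lüdersdorff"},
  {"@type": "Person", "name": "Corinna Peters"}
]}
""")

# Former output: the "author" key is lost, array positions become keys.
old_output = json.loads("""
{"1": {"@type": "Person", "name": "Katja Königstein-Lüdersdorff"},
 "2": {"@type": "Person", "name": "Corinna Peters"}}
""")

# Output with the new release: the array and its key are preserved.
new_output = json.loads("""
{"author": [
  {"@type": "Person", "name": "Katja Königstein-Lüdersdorff"},
  {"@type": "Person", "name": "Corinna Peters"}
]}
""")

old_roundtrips = old_output == source
new_roundtrips = new_output == source
print("old output equals input:", old_roundtrips)
print("new output equals input:", new_roundtrips)
```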

Morph function “setreplace” with filemaps

You can now use the morph function setreplace with external file maps. These maps are lists of tab-separated values (TSV): the first value, when matched, is substituted by the second value. You can also leave out the second value, which results in the removal of the matched value. Use it like this:

        ...
        <rules>
          <data source='fieldNameWhereValuesShallBeSubstituded'>
            <setreplace map='mapname' />
          </data>
        </rules>
        <maps>
          <filemap name='mapname' files='org/metafacture/metamorph/maps/file-map-test.txt' />
        </maps>
        ...
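To make the described behaviour more tangible, here is a rough Python model of it. This is not Metafacture's implementation: the function names load_tsv_map and set_replace, the sample map content, and the rule that longer keys win over shorter overlapping ones are all assumptions made for this sketch.

```python
import re

def load_tsv_map(text, separator="\t"):
    """Parse 'key<TAB>value' lines; a missing value maps the key to ''."""
    mapping = {}
    for line in text.splitlines():
        key, _, value = line.partition(separator)
        if not key.strip():
            continue  # skip blank lines and lines without a key
        mapping[key] = value
    return mapping

def set_replace(value, mapping):
    """Replace every occurrence of a map key inside value."""
    if not mapping:
        return value
    # Assumption: try longer keys first so overlapping keys resolve predictably.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], value)

# Hypothetical map content: one substitution, one removal (empty second value).
sample_map = load_tsv_map("Agentur\tagency\nobsolete\t\n")
result = set_replace("Agentur (obsolete)", sample_map)
print(result)
```

Here "Agentur" is substituted by "agency", while "obsolete" has no second value and is therefore removed from the field value.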

Outlook

We are working on a Catmandu-like fix language which can be used instead of the morph script. Also, there will be a playground, realized as a web app, to play around with data and transformation rules and see the outcome immediately. You will be able to load predefined examples. The playground comes with the capability to share examples or whole complex workflows, with the vision of offering this as a web API for processing data without even installing Metafacture.

Watch out for updates and new blog posts!


Metafacture ant

A blog for the ETL toolkit Metafacture. This blog is maintained by the Metafacture community.