edugain-discuss AT lists.geant.org

Subject: An open discussion list for topics related to the eduGAIN interfederation service.

Re: [eduGAIN-discuss] MDS re-publishes schema-invalid metadata

From: Peter Schober <peter.schober AT univie.ac.at>
To: edugain-discuss AT lists.geant.org
Subject: Re: [eduGAIN-discuss] MDS re-publishes schema-invalid metadata
Date: Sat, 21 Sep 2019 11:19:21 +0200
Organization: ACOnet

* Leif Johansson <leifj AT sunet.se> [2019-09-20 23:49]:
> There are a bunch of stuff that the libxml2 schema code doesn't catch.

ACK

> The "funny" thing is that one of my SATOSA proxies (i.e pysaml2)
> choked on this ... and this is iirc the same underlying library
> doing the validation!

From a quick grep through the code it seems the only use of libxml2
(via lxml) pysaml2 makes is within sign_statement() in
src/saml2/sigver.py. I guess that means plenty of code paths left to
choke that do not involve libxml2 at all?

The xmllint command line tool (also using libxml2) did report this
(though, as you say, it does not catch many other things), so it seems
that even knowing the underlying library is not sufficient to determine
whether a given system will break on a given input.

> Its been one of those days. I'm just happy one of the many strange
> things that bit my systems today wasn't actually a heisenbug but had
> a perfectly reasonable explanation!

:)

Do you think it's viable to try to get lxml to detect such errors
during validation -- or whatever would be needed to allow pyff to
detect and avoid ingesting such metadata?

I'm just trying to get eduGAIN to a place where it's less brittle,
i.e. where not a single "unclean" upstream feed has the potential to
break hundreds or thousands of end entities across the globe.

Currently, for my own purposes, I'm running pre- (and post-) processing
around pyff with all the tooling I can get my hands on (xmlwf from
expat, xmllint, xmlstarlet, xmlsec1, XmlSecTool, manual checks for the
expected minimum number of entities in a given feed, etc.; in a
previous iteration I even scripted a Shibboleth SP to see whether it
would successfully load a piece of metadata) in order to catch
anything "bad" before it even gets loaded into my feed configs. (And
in some cases again on the generated outputs, to make sure I'm not
generating something that other tools will choke on.)
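
As an illustration of the "minimum number of entities" check mentioned
above, here is a minimal sketch using only the Python standard library.
The function names and the idea of passing the threshold as a parameter
are assumptions made for this example, not part of pyff or any of the
tools listed. For untrusted feeds a hardened XML parser would be
preferable to ElementTree.

```python
import xml.etree.ElementTree as ET

# Namespace of SAML 2.0 metadata elements (md:EntityDescriptor etc.)
MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"


def count_entities(xml_text):
    """Count md:EntityDescriptor elements in a metadata document.

    Raises xml.etree.ElementTree.ParseError if the input is not
    well-formed XML.
    """
    root = ET.fromstring(xml_text)
    tag = "{%s}EntityDescriptor" % MD_NS
    # iter() also visits the root, so a lone EntityDescriptor counts as 1
    return sum(1 for _ in root.iter(tag))


def check_min_entities(xml_text, minimum):
    """Return True if the feed holds at least `minimum` entities."""
    return count_entities(xml_text) >= minimum
```

A feed that suddenly shrinks below its usual size would then fail
check_min_entities() and could be skipped before it is loaded.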

Clearly it's impossible to make one tool replicate all checks from an
array of different technologies and code bases, though it would
undoubtedly be great if one could run "just pyff" and be reasonably safe
without the need for excessive pre- and/or post-processing.

Would calling out from pyff to external tools as part of a pipeline be
a reasonable idea? Maybe something like:

- when update: # or when batch
    - load:
        - $url via validate
        - $url2 via validate
    - select ...
- when validate:
    - command:
        - /some/script
        - /usr/bin/executable subcmd --with options --file
Where a 'command' would write ('publish') the current metadata set to
a (securely generated) temporary file and pass that file name as the
only argument to each listed command (or possibly with the name of
the input as second arg or as environment variable). Each command
would simply process $1 or sys.argv[1] or whatever and either complete
silently (return 0) -- in which case processing continues -- or fail
(return non-zero and possibly an error msg on STDERR) and cause
processing to stop. The commands wouldn't be able to modify anything.
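
A minimal sketch of what such a 'command' step might do internally,
assuming the semantics described above. The function name run_checks
and its signature are hypothetical; nothing like this exists in pyff as
far as this message is concerned. Only standard library calls are used.

```python
import os
import subprocess
import tempfile


def run_checks(metadata, commands):
    """Run each command against a temporary copy of the metadata.

    metadata: the serialized metadata set, as bytes.
    commands: list of argv lists; the temp file name is appended
              as the final argument of each.
    Raises RuntimeError on the first command that exits non-zero.
    """
    # mkstemp creates the file securely (readable/writable by owner only)
    fd, path = tempfile.mkstemp(suffix=".xml")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(metadata)
        for argv in commands:
            result = subprocess.run(
                list(argv) + [path], capture_output=True, text=True
            )
            if result.returncode != 0:
                # Surface the command's STDERR and stop processing
                raise RuntimeError(
                    "%s failed (exit %d): %s"
                    % (argv[0], result.returncode, result.stderr.strip())
                )
    finally:
        # The commands only ever see this throwaway copy, so whatever
        # they do to it cannot alter the metadata held by the caller.
        os.unlink(path)
```

If every command returns 0 the function returns silently and processing
would continue; the first failure raises and processing would stop.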

That would allow use of arbitrary tools for pre-processing, but now
all orchestrated from within pyff (instead of having to run external
tools before starting pyff and after it completes, which may not be a
usable pattern when running pyffd, anyway).
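
For the other side of that contract, an external check could be as small
as the following sketch: it takes the file name as its first argument,
stays silent and returns 0 on success, and prints a message to STDERR
and returns non-zero on failure. The well-formedness test is just an
example check and the script layout is an assumption.

```python
import sys
import xml.etree.ElementTree as ET


def main(argv):
    """Check that argv[1] names a well-formed XML file.

    Returns 0 on success, 1 on a parse or read error, 2 on misuse.
    """
    if len(argv) < 2:
        print("usage: check_wellformed FILE", file=sys.stderr)
        return 2
    try:
        ET.parse(argv[1])
    except (ET.ParseError, OSError) as exc:
        print("%s: %s" % (argv[1], exc), file=sys.stderr)
        return 1
    return 0

# As a standalone script this would end with: sys.exit(main(sys.argv))
```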

Though ideally such checks could be inserted at arbitrary places of a
processing pipeline (e.g. check signature and schema validity on load,
my own xmldsig signature after signing but before publishing, etc.).

Cheers,
-peter
