
Re: [eduGAIN-discuss] MDS re-publishes schema-invalid metadata


  • From: Leif Johansson <leifj AT sunet.se>
  • To: edugain-discuss AT lists.geant.org
  • Subject: Re: [eduGAIN-discuss] MDS re-publishes schema-invalid metadata
  • Date: Sun, 22 Sep 2019 22:39:15 +0200

On 2019-09-21 11:19, Peter Schober wrote:
> * Leif Johansson <leifj AT sunet.se> [2019-09-20 23:49]:
>> There's a bunch of stuff that the libxml2 schema code doesn't catch.
>
> ACK
>
>> The "funny" thing is that one of my SATOSA proxies (i.e pysaml2)
>> choked on this ... and this is iirc the same underlying library
>> doing the validation!
>
> From a quick grep through the code it seems the only use of libxml2
> (via lxml) pysaml2 makes is within sign_statement() in
> src/saml2/sigver.py. I guess that means plenty of code paths left to
> choke that do not involve libxml2 at all?

Darnit - you're right!

>
> The xmllint command line tool (also using libxml2) did report this
> (but as you say does not catch many other things), so it seems not
> even knowing the underlying library will be sufficient to determine
> whether a given system will break on a given input.
>

Did you feed it schema or did it catch the error anyway?

>> It's been one of those days. I'm just happy one of the many strange
>> things that bit my systems today wasn't actually a heisenbug but had
>> a perfectly reasonable explanation!
>
> :)
>
> Do you think it's viable to try to get lxml to detect such errors
> during validation -- or whatever would be needed to allow pyff to
> detect and avoid ingesting such metadata?

I think the right thing to do is to write plugins for pyFF that do
validation checks other than schema. There are one or two already.

Maybe this is the time to make sure pyFF can handle plugin libraries
in a smooth way.
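
Roughly, off the top of my head (the @pipe decorator and the req.t
attribute below are from memory and may not match the current code
exactly, so take this as a sketch, not a spec):

    # Sketch of a non-schema check: reject EntityDescriptors whose
    # entityID is not an absolute URI -- something schema validation
    # alone won't complain about.
    from urllib.parse import urlparse

    MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"

    def check_entityids(tree):
        """Raise ValueError if any EntityDescriptor has a bogus entityID."""
        bad = []
        for ed in tree.iter("{%s}EntityDescriptor" % MD_NS):
            eid = ed.get("entityID", "")
            if not urlparse(eid).scheme:  # no scheme -> not an absolute URI
                bad.append(eid or "<missing>")
        if bad:
            raise ValueError("invalid entityID(s): %s" % ", ".join(bad))

    # Hypothetical wiring into a pipeline -- names are assumptions:
    #
    # from pyff.pipes import pipe
    #
    # @pipe
    # def validate_entityids(req, *opts):
    #     check_entityids(req.t)
    #     return req.t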

>
> I'm just trying to get eduGAIN to a place where it's less brittle,
> i.e. where not a single "unclean" upstream feed has the potential to
> break hundreds or thousands of end entities across the globe.
>
> Currently for my own purposes I'm running pre-(and post-)processing
> around pyff with all the tooling I can get my hands on (xmlwf from
> expat, xmllint, xmlstarlet, xmlsec1, XmlSecTool, manual checks for the
> expected minimum numbers of entities in a given feed, etc.; in a
> previous iteration I even scripted a Shibboleth SP to see whether it
> would successfully load a piece of metadata) in order to catch
> anything "bad" before it even gets loaded into my feed configs. (And
> in some cases again on the generated outputs, to make sure I'm not
> generating something that other tools will choke on.)
>
> Clearly it's impossible to make one tool replicate all checks from an
> array of different technologies and code bases, though it would
> undoubtedly be great if one could run "just pyff" and be reasonably safe
> without the need for excessive pre- and/or post-processing.

With a bit of effort it would probably be doable to produce a library
of the most useful checks. It would be interesting to get a feeling
for what type of stuff people would like to have, to get the ball
rolling.
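
The "expected minimum number of entities" check from your list is a
good example of how small these can be:

    # Fail hard if a feed suddenly shrinks below an expected entity count.
    MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"

    def check_minimum_entities(tree, minimum):
        count = sum(1 for _ in tree.iter("{%s}EntityDescriptor" % MD_NS))
        if count < minimum:
            raise ValueError("only %d entities, expected at least %d"
                             % (count, minimum))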

>
> Would calling out from pyff to external tools as part of a pipeline be
> a reasonable idea? Maybe something like:
>
> - when update: # or when batch
>   - load:
>     - $url via validate
>     - $url2 via validate
>   - select ...
> - when validate:
>   - command:
>     - /some/script
>     - /usr/bin/executable subcmd --with options --file

Sure, but wouldn't it be easier to just code the check up in Python?

Presumably the checks themselves aren't hard to describe, right?
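
Even the xmllint-style schema pass is only a few lines with lxml; the
schema filename below is a placeholder, and in practice the metadata
schema imports the xmldsig/xmlenc schemas, so you need local copies of
those next to it:

    from lxml import etree

    def validate_against_schema(xml_path, xsd_path="saml-schema-metadata-2.0.xsd"):
        # Parse the schema and the document, collect all validation errors.
        schema = etree.XMLSchema(etree.parse(xsd_path))
        doc = etree.parse(xml_path)
        if not schema.validate(doc):
            raise ValueError("\n".join(str(e) for e in schema.error_log))
        return doc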

Cheers Leif



