
rare-dev - Re: [rare-dev] Capability detection added

Subject: Rare project developers





  • From: mc36 <>
  • To: , Frédéric LOUI <>
  • Subject: Re: [rare-dev] Capability detection added
  • Date: Fri, 4 Feb 2022 11:55:12 +0100

hi,


On 2/4/22 11:18, Frédéric LOUI wrote:
Hi,

It should be reported now :)
https://bitbucket.software.geant.org/projects/RARE/repos/rare/commits/a833430f5870c2e24ffb61e0fbcce8519c4801b6

i'll test it, thanks!

TL;DR:
bf_switchd does not use the same prefix to reference every object in
BfRtInfo.
Table names start with the pipe name, action names do not include the pipe name.
The previous version assumed that P4 objects are all referenced the same way in BfRtInfo.

e.g:
table MPLS: ig_ctl_mpls.tbl_mpls_fib
one of the BIER actions: ig_ctl.ig_ctl_mpls.act_mpls_bier_label
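
To illustrate the mismatch, here is a minimal sketch that compares the two names from the example by their last two components only. The helpers `short_name` and `same_control` are hypothetical and are not taken from bf_forwarder.py; the sketch assumes the control name is always the second-to-last component.

```python
def short_name(fq_name: str) -> str:
    """Keep only the last two dot-separated components: <control>.<object>."""
    return ".".join(fq_name.split(".")[-2:])


def same_control(a: str, b: str) -> bool:
    """True if both names belong to the same control block (assumption:
    the control is the second-to-last component of the name)."""
    return short_name(a).split(".")[0] == short_name(b).split(".")[0]


# The two names quoted in the example above.
table = "ig_ctl_mpls.tbl_mpls_fib"
action = "ig_ctl.ig_ctl_mpls.act_mpls_bier_label"

print(short_name(table))
print(short_name(action))
print(same_control(table, action))
```

Comparing on the shortened form sidesteps the question of which prefix a given object kind carries, at the cost of assuming that object names are unique within a control.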

My bad then :)
I presume this is a good time to test this code with all the profiles :)

But there is an additional catch that I'm thinking of while writing this
email.
Advertising dataplane capability is only a first step.

yeahhh....


But I presume that bf_forwarder needs to refuse to program an entry if
MAX_ENTRY is reported from bf_switchd
(I don't know if BfRuntime is reporting this error).
And this error should be reported to freeRtr.

i disagree on this: bffwd should be kept as simple as it can be....
adding additional logic to keep track
of the already programmed entries obviously falls into the additional
complexity category, and does not pay
off: if you try to add an entry to an already full table, it'll fail and
you'll get a grpc response accordingly.
all these exceptions could be reported to freerouter; we have the
dataplane_say message that bfrt should emit in this case....
(it already does so anyway, but only to the local console...)
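
A rough sketch of that idea, with no entry tracking at all: try the operation, and if it fails, forward the error text to the control plane. Everything here is an assumption for illustration: `TableError` stands in for whatever the grpc client actually raises, `program_entry`, `do_program` and `say` are hypothetical names, and the message format is made up.

```python
class TableError(Exception):
    """Stand-in for the error raised by the grpc client (assumption)."""


def program_entry(op, table, key, do_program, say):
    """Try one add/del/mod; on failure, report it instead of tracking state.

    do_program: callable that performs the real operation (hypothetical).
    say:        callable that sends one line to the control plane (hypothetical).
    """
    try:
        do_program(op, table, key)
        return True
    except TableError as exc:
        # Message format is invented for this sketch.
        say(f"dataplane_say {op} {table} {key} failed: {exc}")
        return False


# Usage: a dataplane that always answers "table full".
sent = []

def always_full(op, table, key):
    raise TableError("table full")

ok = program_entry("add", "tbl_mpls_fib", "label 100", always_full, sent.append)
print(ok, sent)
```

The forwarder keeps no counters or shadow copy of the tables here; the dataplane's own answer is the only source of truth.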

moreover, the lpm and mask matching (tcam) tables do not have an exact capacity,
and adding a lot of entries to the
same partition of a given table could fail when that partition becomes full,
whereas the whole table can still accommodate
a lot of prefixes in other partitions...

so instead of adding the above, what if you auto-fill these
https://bitbucket.software.geant.org/projects/RARE/repos/rare/browse/bfrt_python/bf_forwarder.py#114
from the detected tables and remove these arguments....
these are already used to avoid programming nonexistent tables,
and could also be used to avoid starting counter reporting from nonexistent tables....
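
A minimal sketch of such auto-filling, assuming each capability can be recognised by one marker table. Only `tbl_mpls_fib` comes from this thread; `detect_capabilities`, the `MARKERS` mapping, and the other table names are hypothetical.

```python
def detect_capabilities(detected_tables, markers):
    """Return the capabilities whose marker table is among the detected tables.

    detected_tables: iterable of table names as reported by the dataplane.
    markers:         dict mapping capability name -> marker table name.
    """
    # Compare on the last name component so any prefix is ignored.
    present = {name.split(".")[-1] for name in detected_tables}
    return [cap for cap, table in markers.items() if table in present]


# Marker mapping: the polka entry is a made-up name for illustration.
MARKERS = {"mpls": "tbl_mpls_fib", "polka": "tbl_polka_hypothetical"}

# Detected tables: the second name is hypothetical.
detected = {"ig_ctl_mpls.tbl_mpls_fib", "ig_ctl_bridge.tbl_bridge_hypothetical"}

caps = detect_capabilities(detected, MARKERS)
print(caps)
```

The resulting list could then replace the hand-set arguments, both for deciding which tables to program and for which tables counter reporting is started.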





IIRC, in the case of a VRF for example, IOS just ditched the last routes and
indicated %VRF FULL% when the VRF reached MAX_ROUTES.

the above mentioned (the already caught exception, but sent to the control plane
too) would result in exactly this behavior:
once an add/del/mod fails, freerouter will log to the terminal that this and
that did not go through, and the netadmin
can decide how to recover.... mostly, reducing routes will work: freerouter
will try to del the routes that were programmed,
and those will be removed from the tcam; some non-programmed prefixes
will also be tried to del, which then
will fail again because they were not accepted into the tcam beforehand.... let's
say you have 1 vrf with 100
routes from 1/8 and 100 routes from 2/8, and the tcam somehow can only accept
100 routes in total; then you'll
see 'grpc error table full exception' for the 2/8 prefixes on bffwd, reported
to freerouter, then arriving at the terminal.
then you set the filtering or vrf route limit, and freerouter will send
100*del 2/8, which will fail with
'grpc error no such entry', but the overall result will be that what the tcam
has programmed will equal what freerouter
thinks it should contain...
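
The 1/8 and 2/8 scenario can be replayed with a toy model. This is only a sketch: `ToyTcam` is a plain set with a fixed capacity of 100 (a real tcam has no exact capacity, as noted earlier), and all class and function names are hypothetical.

```python
class TableFull(Exception):
    pass


class NoSuchEntry(Exception):
    pass


class ToyTcam:
    """Toy table: a set of prefixes with a hard entry limit (simplification)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = set()

    def add(self, prefix):
        if len(self.entries) >= self.capacity:
            raise TableFull(prefix)
        self.entries.add(prefix)

    def delete(self, prefix):
        if prefix not in self.entries:
            raise NoSuchEntry(prefix)
        self.entries.remove(prefix)


def push(tcam, op, prefixes):
    """Apply op ('add' or 'delete') to each prefix; collect failure messages."""
    errors = []
    for p in prefixes:
        try:
            getattr(tcam, op)(p)
        except (TableFull, NoSuchEntry) as exc:
            errors.append(f"{op} {p}: {type(exc).__name__}")
    return errors


net1 = [f"1.0.{i}.0/24" for i in range(100)]  # 100 routes from 1/8
net2 = [f"2.0.{i}.0/24" for i in range(100)]  # 100 routes from 2/8

tcam = ToyTcam(capacity=100)
add_errors = push(tcam, "add", net1 + net2)   # the 2/8 adds fail: table full

wanted = set(net1)                            # after the route limit is set
del_errors = push(tcam, "delete", net2)       # the 2/8 dels fail: no such entry

print(len(add_errors), len(del_errors), tcam.entries == wanted)
```

Both batches of errors are noisy but harmless: after the failed deletes, the toy table holds exactly the prefixes the control plane still wants.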

I guess this behaviour has to be implemented also.

Anyone fancy getting their hands dirty on it?
It is a good item for getting into bf_forwarder and TNA development
without getting too deep into the code.
I can of course offer help! :)

If not, I'll add it to the todo list ...

All the best,
Frederic

On 3 Feb 2022 at 19:07, mc36 <> wrote:

hi,

On 2/3/22 16:04, Frédéric LOUI wrote:
Now let's cross fingers so it passes mc36's automated QCs :-)

all the tests pass:
- summary: 2022-02-03 17:58:42, took 01:13:10, with 10 workers, on 331 cases,
0 failed, 6 traces, 1 retries

and this is what i see on my stordis:

core#show p4lang p4
category value
peer 127.0.0.1
closed 0
capability mpls polka
platform tna/bfforwarder
since 2022-02-03 19:02:18
for 00:00:16

core#

i'm pretty sure you're not reporting back the subcapabilities like bier which
i have... see my profile here:
https://bitbucket.software.geant.org/projects/RARE/repos/rare/browse/profiles/profile-nop-mchome.tmpl
https://bitbucket.software.geant.org/projects/RARE/repos/rare/browse/profiles/profile-nop-mchome.p4

regards,
cs



