
Re: [RARE-users] RES: RES: RES: Problem on ports after reboot


  • From: mc36 <>
  • To: Marcos Felipe Schwarz <>, "" <>
  • Cc: Pedro Diniz <>, Alexander Gall <>, Jordi Ortiz <>
  • Subject: Re: [RARE-users] RES: RES: RES: Problem on ports after reboot
  • Date: Thu, 6 Jan 2022 18:53:26 +0100

hi,
okk, now i caught the issue! thanks for the explanation... :)
so as your description suggests, i agree that after the reboot, you may have
software switched packets...
it can be spotted in three easy ways:
1) sho inter ether0 history --- should not indicate any peaks and should say
far below 10kbps or so when you transmit....
2) sho bridge 1 --- the byte/packet counters should not increase as you
transmit, and, the sdn19 and sdn20 must have macs learned....
3) sho ipv4 arp bvi1 --- should have both servers listed with the proper mac
addresses... (this one is needed as here we're emulating a vlan behavior...)
i bet you that some of these do not get satisfied somehow after you reboot
the wedge... (my best guess is the arp table after all...)
but imho you can restore all of these by issuing ping 10.0.0.3 /vrf CORE and
ping 10.0.0.4 /vrf CORE
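putting it together, a consolidated check-and-restore sequence would look roughly like this (interface, bridge and address names taken from above; exact output formats may differ):

  sho inter ether0 history
  sho bridge 1
  sho ipv4 arp bvi1
  ping 10.0.0.3 /vrf CORE
  ping 10.0.0.4 /vrf CORE
  sho ipv4 arp bvi1

after the two pings, the second arp listing should show both servers again...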
could you please give it a try?
thanks,
cs





On 1/6/22 17:53, Marcos Felipe Schwarz wrote:
We should probably have a talk so I can better describe the problem.

My topology right now is DTN3<->Wedge<->DTN4.
Both DTNs are Ubuntu20.04 servers with Mellanox ConnectX-5 NICs.
My initial problem was that every time I reboot my wedge, I can only communicate
between the DTNs with MTU 8190 (ping -M do -s 8162), and I see a lot of
retransmissions between them. If I disconnect the cable between Wedge and
DTN4 and reconnect it, the MTU goes back to normal 9000 (ping -M do -s 8972
works), and I get zero retransmissions.
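(For reference: the -s value is the ICMP payload size, so with a 20-byte IPv4 header and an 8-byte ICMP header, -s 8162 probes an 8190-byte path MTU and -s 8972 probes a 9000-byte one.)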
I have no idea why the problem occurs. But after your initial suggestion I
noticed that the MTU to the GW IP on the switch was always limited to 8190,
even when the connection between the servers was working at 9000. And after
increasing the cpuport MTU to 9002 the problem went away.

My question now is, should I be experiencing this problem in the first
place? Can it be some kind of bug or misconfiguration on my part? Should we
do further testing? I'd like to help, even though my problem has been
mitigated.

Regards,

Marcos Schwarz

-----Original Message-----
From: mc36 <>
Sent: Thursday, January 6, 2022 13:20
To: ; Marcos Felipe Schwarz
<>
Cc: Pedro Diniz <>; Alexander Gall <>; Jordi
Ortiz <>
Subject: Re: [RARE-users] RES: RES: Problem on ports after reboot

hi,

i'm a bit lost now... i found two contradictory statements in your reply:

1-"I can ping between my servers with MTU 9000"
2-"switch bridge IP is limiting the MTU between the ports that are part of
it?"

in my reading, if you can pass mtu 9000 between the servers then the switch
is not limiting the mtu?

thanks,
cs



On 1/6/22 16:58, Marcos Felipe Schwarz wrote:
Thanks mc36.

This seems to be directly related to my issue. Somehow the cpuport is
limiting the MTU of the traffic.
With the original value 8192 in /opt/freertr/bin/hwdet-main.sh, I can ping
the GW IP on the wedge only up to 8162, which is also the value I'm limited to
when the problem occurs (after a reboot).
I changed the value in /opt/freertr/bin/hwdet-main.sh to 9000 and I could
ping the GW on the wedge up to 8970, which is also the value I'm limited to when
the problem occurs (after a reboot).
I have now set it to 9002, and the problem has been mitigated. I can ping
between my servers with MTU 9000 (ping -M do -s 8972) after a reboot.
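(These values line up with a fixed 30-byte overhead on top of the ICMP payload: a 20-byte IPv4 header, an 8-byte ICMP header and the 2-byte port word prepended on the cpuport, as described in the earlier reply quoted below: 8192 - 30 = 8162, 9000 - 30 = 8970, 9002 - 30 = 8972.)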

I believe that this is not intended. Should we do any tests to verify why the
switch bridge IP is limiting the MTU between the ports that are part of it?

-----Original Message-----
From: mc36 <>
Sent: Thursday, January 6, 2022 10:36
To: ; Marcos Felipe Schwarz
<>
Cc: Pedro Diniz <>; Alexander Gall
<>; Jordi Ortiz <>
Subject: Re: [RARE-users] RES: Problem on ports after reboot

hi,
let me answer only the last, mtu part:
so what you've measured here is the mtu that the freerouter can accomplish
when sending/receiving to an sdn interface...
for that communication, the asic has the so called cpuport, and it's
connected to the host os as ens1... the packets get a 16bit word prepended to
multiplex which source/target physical port the packet is intended to go to
or came from...
freerouter restricts itself to 1024 bytes when sending to be able to work
flawlessly over various tunneling topologies, so setting the cpu port mtu to 8k
seemed more than enough... we only increased it to be able to test the jumbo
capabilities of the asic with locally generated packets; for the routing
protocols to work properly, jumbo frames are not needed at all... but yeahhh,
you can increase that, it's set up in /opt/freertr/bin/hwdet-main.sh on the
debian based images, and you should find that file if you're on a nix based
image....
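as a rough illustration only (the exact line in hwdet-main.sh may differ; ens1 is just the cpuport name mentioned above), the host-side change amounts to something like:

  # hypothetical sketch: raise the cpuport (ens1) mtu so a 9000-byte frame
  # plus the 2-byte port word prepended by the asic still fits
  ip link set dev ens1 mtu 9002

the extra 2 bytes cover the 16bit multiplexing word, which is why 9002 rather than 9000 is needed for end-to-end 9000-byte frames...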
regards,
cs


On 1/6/22 13:44, Marcos Felipe Schwarz wrote:
Thanks for the suggestions Frederic,

I tried some of your suggestions already. Setting autoneg to 2 doesn't work,
but it does with 1. I have another server on port 19 that doesn't have this
issue.
I'll try the other tests and post the results

Another thing that I noticed is that when everything is working correctly and
I can ping between both servers with MTU 9000 and no drops, if I ping the GW
IP at the Wedge I can still only get MTU 8190 to it. Is there a way to
increase the MTU of an internal interface on RARE OS?

Regards,

Marcos Schwarz

-----Original Message-----
From:
<> On behalf of Frédéric LOUI
Sent: Wednesday, January 5, 2022 15:16
To:
Cc: Pedro Diniz <>; Alexander Gall
<>; Jordi Ortiz <>
Subject: Re: [RARE-users] Problem on ports after reboot

Did you try to remove the configuration from the P4lang stanza and the sdn
interface and then add it again, instead of plug/unplug?

It sounds like this is a physical BSP problem.

Unfortunately we did not experience such a problem in our case. Can you drop
the QSFP info? Also, this can be related to the Mellanox OFED driver.

We configured some 100GE ports with CERN with the Mellanox ConnectX-5 OFED
driver and it worked flawlessly.

Is it the only port at 100GE ? Do you have additional servers ?

In CHICAGO we have a LEONI DAC cable on a 100GE port and I set AUTONEG to ON
(otherwise the link does not come up). Can you please try the
following line?

export-port sdn20 28 100 0 2 0
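(For context, the line Marcos reported in the original mail below is export-port sdn20 28 100 0 1 0; the only change here is the fifth field, presumably the autoneg setting being toggled on.)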

The problem is that we are also using LEONI DAC cables, but as they are using a
specific P4 switch, the BSP is not working correctly.
(No info at all from bf_platform.)

Maybe @Alex or @Jordi can share their experience, as they have 100GE ports
connected to Mellanox ConnectX-5?
Unfortunately in my case I have usually dealt with 10GE ports.

All in all, please try to enable autoneg and please let me know if this changes
something ...
All the best,
Frederic

On 5 Jan 2022 16:44, Marcos Felipe Schwarz <> wrote:

Dear all,
I'm having problems on my Wedge running RARE-OS.
Every time I reboot, the port sdn20 comes up dropping packets and with a maximum
MTU of 8190 (ping -M do -s 8162). To solve the issue I need to physically
disconnect the cable and reconnect it. If I reboot again, the problem reappears.
Has any of you had a similar problem?
Port sdn20 (export-port sdn20 28 100 0 1 0) is configured with MTU 9000 and connected through a Leoni DAC cable to a Mellanox ConnectX-5. I've tried changing the DAC cable to one from EdgeCore, and the problem persists.
I have limited access to the equipment, so I'd like to get some ideas
and commands to help troubleshoot the issue.
This Friday I'll be able to continue the tests, and I intend to:
  • Change the connection to a different port and verify whether this is a
port issue
  • Set up logging and compare port information before and after the
problem
  • Try any other ideas that you guys can help me with to either
troubleshoot or mitigate the problem
Regards,
Marcos Schwarz
Gerente de P&D | R&D Manager
Gerência de Execução de P&D em Ciberinfraestrutura | Management of R&D Execution in Cyberinfrastructure
Diretoria de Pesquisa e Desenvolvimento | Board of Research and Development
RNP - Rede Nacional de Ensino e Pesquisa | Brazilian National Research and Educational Network
Promovendo o uso inovador de redes avançadas | Promoting the innovative use of advanced networks
http://www.rnp.br | +55 (19) 3787-3386 | Skype ID: marcos.f.sch
Campinas - SP - Brasil
E-mail:
<RIO0001.txt>



