
rare-users - [RARE-users] RES: RES: RES: RES: Problem on ports after reboot

Subject: RARE user and assistance email list




  • From: Marcos Felipe Schwarz <>
  • To: "" <>, "" <>
  • Cc: Pedro Diniz <>, Alexander Gall <>, Jordi Ortiz <>
  • Subject: [RARE-users] RES: RES: RES: RES: Problem on ports after reboot
  • Date: Fri, 14 Jan 2022 20:32:12 +0000

Hi mc36,

Thanks for your help. I can confirm that after a reboot "sho ipv4 arp bvi1"
is empty, and when I do a transfer between 10.0.0.3 (at sdn19) and 10.0.0.4
(at sdn20), the wedge only learns the MAC of 10.0.0.3.
So it doesn't seem to learn the MAC from the reply. Is this the expected
behavior?
I can force the MAC of 10.0.0.4 to be learned either by pinging 10.0.0.4 from
the wedge or by pinging 10.0.0.254 (the wedge's IP) from the server.
So I can confirm that the problem was that my traffic was running on the
"slowpath", through the cpuport.
Is there something I could configure so that the wedge learns MACs from
the replies?

Regards,

Marcos Schwarz

-----Original Message-----
From: mc36 <>
Sent: Thursday, January 6, 2022 14:53
To: Marcos Felipe Schwarz <>;
Cc: Pedro Diniz <>; Alexander Gall <>;
Jordi Ortiz <>
Subject: Re: RES: [RARE-users] RES: RES: Problem on ports after reboot

hi,
ok, now I get the issue! Thanks for the explanation... :) As your
description suggests, I agree that after the reboot your packets may be
switched in software...
It can be spotted in three easy ways:
1) sho inter ether0 history --- should not show any peaks, and should stay
far below 10kbps or so while you transmit...
2) sho bridge 1 --- the byte/packet counters should not increase while you
transmit, and sdn19 and sdn20 must have their MACs learned...
3) sho ipv4 arp bvi1 --- should list both servers with the proper MAC
addresses... (this one is needed because here we're emulating a VLAN behavior...)
I bet that some of these are not satisfied after you reboot the wedge...
(my best guess is the ARP table, after all...), but IMHO you can restore all
of these by issuing ping 10.0.0.3 /vrf CORE and ping 10.0.0.4 /vrf CORE.
Could you please give it a try?
thanks,
cs





On 1/6/22 17:53, Marcos Felipe Schwarz wrote:
> We should probably have a talk so I can better describe the problem.
>
> My topology right now is DTN3<->Wedge<->DTN4.
> Both DTNs are Ubuntu 20.04 servers with Mellanox ConnectX-5 NICs.
> My initial problem was that every time I reboot my wedge, I can only
> communicate between the DTNs with MTU 8190 (ping -M do -s 8162), and I see
> a lot of retransmissions between them. If I disconnect the cable between
> the Wedge and DTN4 and reconnect it, the MTU goes back to the normal 9000
> (ping -M do -s 8972 works), and I get zero retransmissions.
> I have no idea why the problem occurs. But after your initial suggestion I
> noticed that the MTU to the GW IP on the switch was always limited to 8190,
> even when the connection between the servers was working at 9000. And after
> increasing the cpuport MTU to 9002, the problem went away.
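For reference, the `-s` value given to `ping` is only the ICMP payload. Assuming a plain IPv4 header with no options, each echo request adds an 8-byte ICMP header and a 20-byte IPv4 header, so `-s 8162` yields an 8190-byte IP packet and `-s 8972` a 9000-byte one. A small Python sketch of that arithmetic (the function names are illustrative only):

```python
# IPv4 ping arithmetic, assuming a plain 20-byte IPv4 header (no options)
# and the 8-byte ICMP echo header.
IPV4_HEADER = 20
ICMP_HEADER = 8


def ip_packet_size(ping_payload: int) -> int:
    """IPv4 packet size generated by `ping -s <ping_payload>`."""
    return ping_payload + ICMP_HEADER + IPV4_HEADER


def max_ping_payload(ip_mtu: int) -> int:
    """Largest `-s` value that fits an IP MTU when `-M do` forbids fragmentation."""
    return ip_mtu - ICMP_HEADER - IPV4_HEADER


if __name__ == "__main__":
    for payload in (8162, 8972):
        print(f"ping -M do -s {payload} -> {ip_packet_size(payload)}-byte IP packet")
```

Running it prints one line per payload, mapping 8162 to 8190 and 8972 to 9000.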
>
> My question now is: should I be experiencing this problem in the first
> place? Could it be some kind of bug or a misconfiguration on my part? Should
> we do further testing? I'd like to help, even though my problem has been
> mitigated.
>
> Regards,
>
> Marcos Schwarz
>
> -----Original Message-----
> From: mc36 <>
> Sent: Thursday, January 6, 2022 13:20
> To: ; Marcos Felipe Schwarz
> <>
> Cc: Pedro Diniz <>; Alexander Gall
> <>; Jordi Ortiz <>
> Subject: Re: [RARE-users] RES: RES: Problem on ports after reboot
>
> hi,
>
> i'm a bit lost now... I found two contradictory statements in your reply:
>
> 1-"I can ping between my servers with MTU 9000"
> 2-"switch bridge IP is limiting the MTU between the ports that are part of
> it?"
>
> in my reading, if you can pass MTU 9000 between the servers, then the switch
> is not limiting the MTU?
>
> thanks,
> cs
>
>
>
> On 1/6/22 16:58, Marcos Felipe Schwarz wrote:
>> Thanks, mc36.
>>
>> This seems to be directly related to my issue. Somehow the cpuport is
>> limiting the MTU of the traffic.
>> With the original value 8192 in /opt/freertr/bin/hwdet-main.sh, I can ping
>> the GW IP on the wedge only up to 8162, which is also the value I'm limited
>> to when the problem occurs (after a reboot).
>> I changed the value in /opt/freertr/bin/hwdet-main.sh to 9000 and I could
>> ping the GW on the wedge up to 8970, which is also the value I'm limited to
>> when the problem occurs (after a reboot).
>> I have now set it to 9002, and the problem has been mitigated. I can ping
>> between my servers with MTU 9000 (ping -M do -s 8972) after a reboot.
>>
>> I believe that this is not intended. Should we do any tests to verify why
>> the switch bridge IP is limiting the MTU between the ports that are part
>> of it?
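The three values reported above fit one relation: the largest ping payload that reaches the wedge's own IP equals the hwdet-main.sh cpuport value minus 30 bytes, i.e. 28 bytes of ICMP/IPv4 headers plus 2 extra bytes. Attributing those 2 bytes to the 16-bit port word described in mc36's reply quoted below is an inference, not something confirmed in the thread. A Python sketch under that assumption:

```python
# Empirical relation from the values reported in this thread, ASSUMING the
# 2 extra bytes are the 16-bit port word carried on the cpuport.
PORT_WORD = 2           # assumed per-frame overhead on the cpuport
ICMP_IPV4_HEADERS = 28  # 8-byte ICMP header + 20-byte IPv4 header


def max_ping_to_gateway(cpuport_mtu: int) -> int:
    """Largest `ping -M do -s` value expected to reach the wedge's own IP."""
    return cpuport_mtu - PORT_WORD - ICMP_IPV4_HEADERS


def cpuport_mtu_for(ip_mtu: int) -> int:
    """cpuport value needed (under this model) so full ip_mtu packets fit."""
    return ip_mtu + PORT_WORD


if __name__ == "__main__":
    # Values reported above: 8192 -> 8162, 9000 -> 8970, 9002 -> 8972
    for cpuport in (8192, 9000, 9002):
        print(cpuport, max_ping_to_gateway(cpuport))
    print("cpuport value for a 9000-byte IP MTU:", cpuport_mtu_for(9000))
```

Under this model, a 9000-byte IP MTU needs a cpuport value of 9002, which matches the setting that made the problem go away.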
>>
>> -----Original Message-----
>> From: mc36 <>
>> Sent: Thursday, January 6, 2022 10:36
>> To: ; Marcos Felipe Schwarz
>> <>
>> Cc: Pedro Diniz <>; Alexander Gall
>> <>; Jordi Ortiz <>
>> Subject: Re: [RARE-users] RES: Problem on ports after reboot
>>
>> hi,
>> let me answer only the last part, about the MTU:
>> what you've measured here is the MTU that freerouter can achieve when
>> sending to/receiving from an sdn interface...
>> for that communication, the ASIC has the so-called cpuport, and it's
>> connected to the host OS as ens1... each packet is prepended with a 16-bit
>> word that multiplexes which source/target physical port the packet is
>> coming from/going to...
>> freerouter restricts itself to 1024 bytes when sending, to be able to work
>> flawlessly over various tunneling topologies, so setting the cpuport MTU to
>> 8k seemed more than enough... we only increased it to be able to test the
>> jumbo capabilities of the ASIC with locally generated packets; for the
>> routing protocols to work properly, jumbo frames are not needed at all...
>> but yeahhh, you can increase that: it's set up in
>> /opt/freertr/bin/hwdet-main.sh on the debian-based images, and you should
>> find that file if you're on a nix-based image....
>> regards,
>> cs
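To illustrate the multiplexing described in this reply, here is a minimal Python sketch of prepending and stripping a 16-bit port word. The big-endian byte order, the placement at the very front of the frame, and the port number used are assumptions for illustration; the actual RARE/freeRouter cpuport format may differ.

```python
import struct
from typing import Tuple

# Hypothetical framing: a 2-byte big-endian port number in front of each
# frame exchanged between the ASIC cpuport and the host (ens1).
PORT_WORD = struct.Struct("!H")


def encapsulate(port: int, frame: bytes) -> bytes:
    """Prepend the 16-bit port number; struct.error if port is out of range."""
    return PORT_WORD.pack(port) + frame


def decapsulate(data: bytes) -> Tuple[int, bytes]:
    """Split a cpuport frame into (port number, original frame)."""
    if len(data) < PORT_WORD.size:
        raise ValueError("frame shorter than the port word")
    (port,) = PORT_WORD.unpack_from(data)
    return port, data[PORT_WORD.size:]


if __name__ == "__main__":
    frame = bytes(64)             # dummy 64-byte frame
    wire = encapsulate(7, frame)  # 7 is an arbitrary example port
    port, inner = decapsulate(wire)
    print(port, len(wire), inner == frame)  # 7 66 True
```

The example also shows that every frame grows by two bytes on its way across the cpuport.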
>>
>>
>> On 1/6/22 13:44, Marcos Felipe Schwarz wrote:
>>> Thanks for the suggestions, Frederic,
>>>
>>> I already tried some of your suggestions. Setting autoneg to 2 doesn't
>>> work, but it does with 1. I have another server on port 19 that doesn't
>>> have this issue.
>>> I'll try the other tests and post the results.
>>>
>>> Another thing that I noticed is that when everything is working correctly
>>> and I can ping between both servers with MTU 9000 and no drops, if I ping
>>> the GW IP at the Wedge I can still only get MTU 8190 to it. Is there a way
>>> to increase the MTU of an internal interface on RARE OS?
>>>
>>> Regards,
>>>
>>> Marcos Schwarz
>>>
>>> -----Original Message-----
>>> From:
>>> <> On behalf of Frédéric LOUI
>>> Sent: Wednesday, January 5, 2022 15:16
>>> To:
>>> Cc: Pedro Diniz <>; Alexander Gall
>>> <>; Jordi Ortiz <>
>>> Subject: Re: [RARE-users] Problem on ports after reboot
>>>
>>> Did you try removing the configuration from the P4lang stanza and the sdn
>>> interface and adding it again, instead of unplugging/replugging the cable?
>>>
>>> It sounds like this is a physical BSP problem.
>>>
>>> Unfortunately, we have not experienced such a problem in our case. Can you
>>> share the QSFP info? This could also be related to the Mellanox OFED driver.
>>>
>>> We configured some 100GE ports with CERN using Mellanox ConnectX-5 with the
>>> OFED driver, and it worked flawlessly.
>>>
>>> Is it the only port at 100GE? Do you have additional servers?
>>>
>>> In CHICAGO we have a LEONI DAC cable on a 100GE port, and I set AUTONEG to
>>> ON.
>>> (Otherwise the link does not come up.) Can you please try the
>>> following line?
>>>
>>> export-port sdn20 28 100 0 2 0
>>>
>>> The problem is that we are also using LEONI DACs, but as they are on a
>>> specific P4 switch, the BSP is not working correctly.
>>> (No info at all from bf_platform)
>>>
>>> Maybe @Alex or @Jordi can share their experience, as they have 100GE ports
>>> connected to Mellanox ConnectX-5?
>>> Unfortunately, in my case I have usually dealt with 10GE ports.
>>>
>>> All in all, please try to enable autoneg and let me know if this
>>> changes anything...
>>>
>>> All the best,
>>> Frederic
>>>
>>>> On 5 Jan 2022 16:44, Marcos Felipe Schwarz <>
>>>> wrote:
>>>>
>>>> Dear all,
>>>>
>>>> I'm having problems on my Wedge running RARE-OS.
>>>> Every time I reboot, port sdn20 comes up dropping packets and with a
>>>> maximum MTU of 8190 (ping -M do -s 8162). To solve the issue I need to
>>>> physically disconnect the cable and reconnect it. If I reboot again, the
>>>> problem reappears.
>>>> Have any of you had a similar problem?
>>>>
>>>> Port sdn20 (export-port sdn20 28 100 0 1 0) is configured with MTU 9000
>>>> and connected through a Leoni DAC cable to a Mellanox ConnectX-5. I've
>>>> tried swapping in a DAC cable from EdgeCore, and the problem persists.
>>>> I have limited access to the equipment, so I'd like to get some
>>>> ideas and commands to help troubleshoot the issue.
>>>> This Friday I'll be able to continue the tests, and I intend to:
>>>>   - Move the connection to a different port and verify whether this is
>>>>     a port issue
>>>>   - Set up logging and compare port information before and after the
>>>>     problem
>>>>   - Try any other ideas you can suggest to either troubleshoot or
>>>>     mitigate the problem
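For the plan to compare port information before and after the problem, one simple approach is to save the output of the same show commands to text files before and after a reboot and diff them. A minimal sketch using Python's standard `difflib`; the script and file names are hypothetical, and it makes no assumption about the format of the commands' output:

```python
import difflib
import sys
from pathlib import Path
from typing import List


def diff_snapshots(before: str, after: str,
                   label_a: str = "before", label_b: str = "after") -> List[str]:
    """Return a unified diff between two captured command outputs."""
    return list(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile=label_a, tofile=label_b, lineterm=""))


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: diff_snapshots.py before.txt after.txt")
    else:
        a_path, b_path = sys.argv[1], sys.argv[2]
        lines = diff_snapshots(Path(a_path).read_text(),
                               Path(b_path).read_text(), a_path, b_path)
        print("\n".join(lines) if lines else "no differences")
```

Usage: `python diff_snapshots.py before.txt after.txt`; lines prefixed with `-` and `+` show what changed across the reboot.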
>>>>
>>>> Regards,
>>>>
>>>> Marcos Schwarz
>>>>
>>>> R&D Manager
>>>> Management of R&D Execution in Cyberinfrastructure
>>>> Board of Research and Development
>>>> RNP - Rede Nacional de Ensino e Pesquisa | Brazilian National Research and
>>>> Educational Network
>>>> Promoting the innovative use of advanced networks | http://www.rnp.br
>>>> +55 (19) 3787- 3386 | Skype ID: marcos.f.sch | Campinas - SP - Brazil
>>>> E-mail:
>>>>
>>>> <RIO0001.txt>
>>>



