rare-users - [RARE-users] RES: RES: RES: Problem on ports after reboot

Subject: RARE user and assistance email list

  • From: Marcos Felipe Schwarz <>
  • To: "" <>, "" <>
  • Cc: Pedro Diniz <>, Alexander Gall <>, Jordi Ortiz <>
  • Subject: [RARE-users] RES: RES: RES: Problem on ports after reboot
  • Date: Thu, 6 Jan 2022 16:53:25 +0000

We should probably have a talk so I can better describe the problem.

My topology right now is DTN3<->Wedge<->DTN4.
Both DTNs are Ubuntu20.04 servers with Mellanox ConnectX-5 NICs.
My initial problem was that every time I reboot my Wedge, I can only communicate
between the DTNs with MTU 8190 (ping -M do -s 8162), and I see a lot of
retransmissions between them. If I disconnect the cable between the Wedge and
DTN4 and reconnect it, the MTU goes back to the normal 9000 (ping -M do -s 8972
works), and I get zero retransmissions.
I have no idea why the problem occurs. But after your initial suggestion I
noticed that the MTU to the GW IP on the switch was always limited to 8190,
even when the connection between the servers was working at 9000. And after
increasing the cpuport MTU to 9002 the problem went away.
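
For reference, the ping sizes quoted in this thread line up with simple header arithmetic. The sketch below is only an illustration: it assumes an IPv4 header without options (20 bytes) plus the 8-byte ICMP echo header, and it assumes that the 16-bit port word mc36 describes further down the thread is what accounts for the 2-byte gap between the cpuport MTU and the usable IP MTU. The function and constant names are made up for this example.

```python
IPV4_HEADER = 20   # bytes, IPv4 header without options
ICMP_HEADER = 8    # bytes, ICMP echo request header
CPU_PORT_TAG = 2   # bytes; ASSUMPTION: the 16-bit port word counts against the cpuport MTU


def max_ping_payload(ip_mtu: int) -> int:
    """Largest `ping -M do -s` size that fits in one unfragmented IPv4 packet."""
    return ip_mtu - IPV4_HEADER - ICMP_HEADER


def max_payload_via_cpuport(cpuport_mtu: int) -> int:
    """Largest ping size towards the switch's own IP, under the assumption above."""
    return max_ping_payload(cpuport_mtu - CPU_PORT_TAG)


if __name__ == "__main__":
    # cpuport MTU values tried in the thread
    for cpuport_mtu in (8192, 9000, 9002):
        print(cpuport_mtu, max_payload_via_cpuport(cpuport_mtu))
```

Under those assumptions, cpuport MTU values of 8192, 9000 and 9002 give maximum ping sizes of 8162, 8970 and 8972, which matches the values reported in the thread.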

My question now is, should I be experiencing this problem in the first
place? Could it be some kind of bug or a misconfiguration on my part? Should we
do further testing? I'd like to help, even though my problem has been
mitigated.

Regards,

Marcos Schwarz

-----Original Message-----
From: mc36 <>
Sent: Thursday, January 6, 2022 13:20
To: ; Marcos Felipe Schwarz
<>
Cc: Pedro Diniz <>; Alexander Gall <>;
Jordi Ortiz <>
Subject: Re: [RARE-users] RES: RES: Problem on ports after reboot

hi,

i'm a bit lost now... i found two contradictory statements in your reply:

1-"I can ping between my servers with MTU 9000"
2-"switch bridge IP is limiting the MTU between the ports that are part of
it?"

in my reading, if you can pass mtu 9000 between the servers then the switch
is not limiting the mtu?

thanks,
cs



On 1/6/22 16:58, Marcos Felipe Schwarz wrote:
> Thanks mc36.
>
> This seems to be directly related to my issue. Somehow the cpuport is
> limiting the MTU of the traffic.
> With the original value 8192 in /opt/freertr/bin/hwdet-main.sh, I can ping
> the GW IP on the Wedge only up to 8162, which is also the value I'm limited to
> when the problem occurs (after a reboot).
> I changed the value in /opt/freertr/bin/hwdet-main.sh to 9000 and I could
> ping the GW on the Wedge up to 8970, which is also the value I'm limited to
> when the problem occurs (after a reboot).
> I now set it to 9002, and the problem has been mitigated. I can ping
> between my servers with MTU 9000 (ping -M do -s 8972) after a reboot.
>
> I believe that this is not intended. Should we do any tests to verify why
> the switch bridge IP is limiting the MTU between the ports that are part of
> it?
>
> -----Original Message-----
> From: mc36 <>
> Sent: Thursday, January 6, 2022 10:36
> To: ; Marcos Felipe Schwarz
> <>
> Cc: Pedro Diniz <>; Alexander Gall
> <>; Jordi Ortiz <>
> Subject: Re: [RARE-users] RES: Problem on ports after reboot
>
> hi,
> let me only answer to the last, mtu part:
> so what you've measured here is the mtu that freerouter can accomplish
> when sending/receiving to an sdn interface...
> for that communication, the asic has the so-called cpuport, and it's
> connected to the host os as ens1... the packets are prepended with a 16bit
> word to multiplex which source/target physical port the packet is intended
> to/from...
> freerouter restricts itself to 1024 bytes when sending, to be able to work
> flawlessly over various tunneling topologies, so setting the cpu port mtu to 8k
> seemed more than enough... we only increased it to be able to test the
> jumbo capabilities of the asic with locally generated packets; for the
> routing protocols to work properly, jumbo frames are not needed at all...
> but yeahhh, you can increase that, it's set up in
> /opt/freertr/bin/hwdet-main.sh on the debian based images, and you should
> find that file if you're on a nix based image....
> regards,
> cs
>
>
> On 1/6/22 13:44, Marcos Felipe Schwarz wrote:
>> Thanks for the suggestions Frederic,
>>
>> I tried some of your suggestions already. Setting autoneg to 2 doesn't
>> work, but it does with 1. I have another server on port 19 that doesn't
>> have this issue.
>> I'll try the other tests and post the results
>>
>> Another thing I noticed is that when everything is working correctly
>> and I can ping between both servers with MTU 9000 and no drops, if I ping
>> the GW IP at the Wedge I can still only get MTU 8190 to it. Is there a way
>> to increase the MTU of an internal interface on RARE OS?
>>
>> Regards,
>>
>> Marcos Schwarz
>>
>> -----Original Message-----
>> From:
>> <> On behalf of Frédéric LOUI
>> Sent: Wednesday, January 5, 2022 15:16
>> To:
>> Cc: Pedro Diniz <>; Alexander Gall
>> <>; Jordi Ortiz <>
>> Subject: Re: [RARE-users] Problem on ports after reboot
>>
>> Did you try removing the configuration from the P4lang stanza and the sdn
>> interface and adding it again,
>> instead of plugging/unplugging the cable?
>>
>> It sounds like this is a physical BSP problem.
>>
>> Unfortunately we did not experience such a problem in our case. Can you
>> drop the QSFP info? Also, this could be related to the Mellanox OFED driver.
>>
>> We configured some 100GE ports with CERN with the Mellanox ConnectX-5 OFED
>> driver and it worked flawlessly.
>>
>> Is it the only port at 100GE? Do you have additional servers?
>>
>> In CHICAGO we have a LEONI DAC cable on a 100GE port and I set AUTONEG to ON.
>> (Otherwise the link does not come up.) Can you please try the
>> following line?
>>
>> export-port sdn20 28 100 0 2 0
>>
>> The problem is that we are also using a LEONI DAC, but as they are using a
>> specific P4 switch the BSP is not working correctly.
>> (No info at all from bf_platform)
>>
>> Maybe @Alex or @Jordi can share their experience, as they have 100GE ports
>> connected to a Mellanox ConnectX-5?
>> Unfortunately, in my case I have usually dealt with 10GE ports.
>>
>> All in all, please try to enable autoneg and let me know if this
>> changes anything...
>>
>> All the best,
>> Frederic
>>
>>> On 5 Jan 2022, at 16:44, Marcos Felipe Schwarz <> wrote:
>>>
>>> Dear all,
>>>
>>> I'm having problems on my Wedge running RARE-OS.
>>> Every time I reboot, port sdn20 comes up dropping packets and with a maximum
>>> MTU of 8190 (ping -M do -s 8162). To solve the issue I need to physically
>>> disconnect the cable and reconnect it. If I reboot again, the problem
>>> reappears.
>>> Has any of you had a similar problem?
>>>
>>> Port sdn20 (export-port sdn20 28 100 0 1 0) is configured with MTU 9000
>>> and connected through a Leoni DAC cable to a Mellanox ConnectX-5. I've
>>> tried changing the DAC cable from EdgeCore, and the problem persists.
>>> I have limited availability to the equipment, so I'd like to get some
>>> ideas and commands to help troubleshoot the issue.
>>> This Friday I'll be able to continue the tests, and I intend to:
>>> Change the connection to a different port and verify if this is a
>>> port issue
>>> Set up logging and compare port information before and after the
>>> problem
>>> Any other ideas that you guys can help me with to either
>>> troubleshoot or mitigate the problem
>>>
>>> Regards,
>>>
>>> Marcos Schwarz
>>>
>>> Gerente de P&D | R&D Manager
>>> Gerência de Execução de P&D em Ciberinfraestrutura | Management of R&D Execution in Cyberinfrastructure
>>> Diretoria de Pesquisa e Desenvolvimento | Board of Research and Development
>>> RNP - Rede Nacional de Ensino e Pesquisa | Brazilian National Research and Educational Network
>>> Promovendo o uso inovador de redes avançadas | Promoting the innovative use of advanced networks
>>> http://www.rnp.br | +55 (19) 3787-3386 | Skype ID: marcos.f.sch
>>> Campinas - SP - Brasil
>>> E-mail:
>>>
>>> <RIO0001.txt>
>>


