Realtek RTL8168/8111C - Link is disconnected #432
Hello,
Here is a bug report regarding Realtek's gigabit Ethernet network adapter, the RTL8168/8111C.
PCI LIST:
Bug description:
The network card is detected, but remains “disconnected” from the local network indefinitely. No obvious actions (enabling or disabling the card in the LAN control panel, reconnecting the Ethernet cable) have any effect. Entering static network settings in network.ini also has no effect.
Context:
BOARD LOG:
According to research, the PHYStatus register might be buggy or delayed on this particular chip.
The Linux driver writes many magic numbers into the chip, which I'd like to avoid.
Attached is an updated driver that will read the MII status when PHY says link is down, to debug the issue.
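As a rough illustration of the MII fallback described above, here is a minimal C sketch of reading the link state from the standard MII status register (BMSR). The register number and bit masks are the standard IEEE 802.3 clause 22 values; the `latched_phy` structure and its reader are a hypothetical stand-in for real PHY accesses, and nothing here is taken from the attached driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard MII register number and bits (IEEE 802.3 clause 22). */
#define MII_BMSR            0x01    /* Basic Mode Status Register */
#define BMSR_LSTATUS        0x0004  /* link is up (latches low)   */
#define BMSR_ANEGCOMPLETE   0x0020  /* auto-negotiation finished  */

/* Hypothetical accessor type: reads one 16-bit PHY register. */
typedef uint16_t (*mii_read_fn)(void *ctx, uint8_t reg);

struct mii_link {
    bool link_up;
    bool aneg_done;
};

/* The BMSR link bit latches low, so the first read may still report
 * an old link drop; the second read reflects the current state. */
static struct mii_link mii_read_link(mii_read_fn read, void *ctx)
{
    struct mii_link st;
    uint16_t bmsr;

    (void)read(ctx, MII_BMSR);      /* discard the latched value */
    bmsr = read(ctx, MII_BMSR);

    st.link_up   = (bmsr & BMSR_LSTATUS) != 0;
    st.aneg_done = (bmsr & BMSR_ANEGCOMPLETE) != 0;
    return st;
}

/* Simulated PHY (assumption, for demonstration only): returns a stale
 * value on the first read and the current value afterwards. */
struct latched_phy {
    uint16_t latched;
    uint16_t current;
    int      reads;
};

static uint16_t latched_phy_read(void *ctx, uint8_t reg)
{
    struct latched_phy *p = ctx;
    (void)reg;
    return (p->reads++ == 0) ? p->latched : p->current;
}
```

Comparing this result with what PHYStatus reports would show whether the two sources disagree on this chip.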
Thank you for the test driver.
Here is what the log currently shows:
By that, do you mean the undocumented values set by the r8169.c Linux driver in the 8168’s proprietary registers exposed in the PCI BAR? Why are you reluctant to use them in your own driver? Please don’t take this as a criticism on my part; I'm just curious about the right way to go about it when you can't sign the required NDA with the manufacturer.
Hi,
just a bit of extra information.
The XID reported by dmesg for my Ethernet adapter is 0x3C4:
In r8169_main.c, this would correspond to the identifier RTL_GIGA_MAC_VER_22.
Could the problem be caused by the 8168’s power-saving feature, which might prevent it from detecting the modulation on the cable?
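The XID-to-version mapping mentioned above works as a masked table lookup. The sketch below shows the idea in C. The 0x7c8/0x3c0 row for version 22 is the pair quoted later in this thread; the 0x7cf/0x3c3 row for version 21 is recalled from r8169_main.c and should be checked against the real source. The table is a two-row excerpt, and the struct and function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

struct xid_entry {
    uint16_t mask;         /* bits of the XID that are compared */
    uint16_t val;          /* expected value after masking      */
    int      mac_version;  /* RTL_GIGA_MAC_VER_xx number        */
};

/* First match wins, so rows with stricter masks come first.
 * Excerpt only: the 0x3c3 row is an assumption to be verified. */
static const struct xid_entry xid_table[] = {
    { 0x7cf, 0x3c3, 21 },
    { 0x7c8, 0x3c0, 22 },
};

/* Returns the MAC version for an XID, or 0 when nothing matches. */
static int xid_to_mac_version(uint16_t xid)
{
    for (size_t i = 0; i < sizeof(xid_table) / sizeof(xid_table[0]); i++) {
        if ((xid & xid_table[i].mask) == xid_table[i].val)
            return xid_table[i].mac_version;
    }
    return 0;
}
```

With this excerpt, 0x3C4 fails the stricter 0x7cf comparison and then matches 0x3c0 under the 0x7c8 mask, which gives version 22.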
Did you ever check if unplugging and re-plugging the cable causes link detection to try again?
In any case, attached you will find an updated driver with the PHY configuration for your particular chip, ported from the Linux driver.
Thank you for your prompt reply @hidnplayr. Here is the MD5 hash of the driver I have just tested:
Unfortunately, the result isn't the success we'd hoped for, but that's just part and parcel of debugging. The symptoms remain the same as those described in my first bug report. I had already tried unplugging and re-plugging the Ethernet cable, and I’ve now tried again, but there’s still no change. The LEDs on the RJ45 connector remain off.
BOARDLOG isn't giving us any more clues than it did the first time:
Misc
In the LAN control panel, apart from the ‘Start device’ button, which triggers a response from the kernel (see screenshot), the other two buttons appear to be completely non-functional.
PHY proprietary initialisation parameters
In your shared source code (thank you for that), you imported from r8169_phy_config.c the parameters stored in the functions rtl8168c_1_hw_phy_config, rtl8168c_2_hw_phy_config and rtl8168c_3_hw_phy_config:
Which of these three sets of values does your FASM code actually send to my RTL8168c (stepping 3c4)?
Hi @hidnplayr,
To stick strictly to the initialisation parameters contained within the rtl8168c_3_hw_phy_config function of the Linux driver, which apply specifically to RTL8168C steppings 0x3c3 and 0x3C4, I have made a slight modification to your code. The .asm and .diff files are attached.
I would like to assemble RTL8169.asm using FASM integrated into Kolibri. To do this, do I need to manually gather all the dependencies called by include into the root of a single working directory, or would you recommend a different approach?
Thank you!
The easiest approach is to assemble the driver on Windows or Linux and inject it into kolibri.img before booting KolibriOS.
Otherwise, you will need to edit AUTORUN.DAT so that the driver is not loaded on boot, or save the new driver to kolibri.img and reboot KolibriOS, as there is currently no way to unload the driver once it is loaded.
In case you still want to assemble from within KolibriOS, you'll need to make space on the floppy image for all the needed files (remove some of the games/3D demos, for instance), or put them on a hard drive or USB drive accessible from within KolibriOS.
I'm not sure about the stepping you refer to. The last driver I posted was modified by Claude AI to incorporate the supposedly relevant changes for your card (based on PCI ID and revision).
One possible explanation of the issue at hand, as suggested by different AIs, could be that auto-negotiation takes longer on your card than on others, and does not complete before our link detection routine runs.
In your Linux log, it seems to take around 2 s for the link to be detected.
PS: we can increase the debug output by changing the DEBUG_LEVEL value before assembling the driver, to get more information.
Hi @hidnplayr, thank you for your detailed reply.
Here are the driver's debug messages, with its verbosity level set to 1:
I was referring to the revision number, as returned by the kernel: 0x3C4 (aka 0x7c8, 0x3c0, RTL_GIGA_MAC_VER_22, "RTL8168c/8111c") in the case of my 8168C adapter.
The LLM did a fair job. I was surprised that the resulting code (I’m referring to your version dated 23 April) modified the registers with values different from those in r8169_phy_config.c. It also didn’t include the bit set in register 0x16 (whatever the impact might be):
phy_set_bits(phydev, 0x16, BIT(0));
But my latest attempt to adapt RTL8169.asm to match r8169_phy_config.c as closely as possible didn't yield any better results. Incidentally, the binaries produced by FASM on Linux and Kolibri do not generate the same MD5 hash. I am attaching the relevant files here.
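For reference, the phy_set_bits(phydev, 0x16, BIT(0)) call quoted above is a read-modify-write that ORs bit 0 into PHY register 0x16 and leaves the other bits alone. The C sketch below models that behaviour on a simulated register file. The `sim_` names and signatures are simplified stand-ins, not the Linux kernel API.

```c
#include <stdint.h>

/* Simulated PHY register file standing in for real MDIO accesses. */
struct sim_phy {
    uint16_t regs[32];
};

static uint16_t sim_phy_read(const struct sim_phy *phy, uint8_t reg)
{
    return phy->regs[reg & 0x1f];
}

static void sim_phy_write(struct sim_phy *phy, uint8_t reg, uint16_t val)
{
    phy->regs[reg & 0x1f] = val;
}

/* Read-modify-write: OR the given bits in, keep all other bits. */
static void sim_phy_set_bits(struct sim_phy *phy, uint8_t reg, uint16_t bits)
{
    sim_phy_write(phy, reg, (uint16_t)(sim_phy_read(phy, reg) | bits));
}
```

A port that writes a fixed constant to register 0x16 instead would overwrite whatever the chip had in the remaining bits, so the two approaches are not equivalent.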
Would it be possible to add a delay to the FASM driver to test this hypothesis?
Another idea: would you be willing to prepare a version of the driver with the magic numbers for other cards removed, in order to rule out any risk that an error in the detection tests might cause a section other than the one applicable to my card to be executed?
Thank you!
Aha! mcfg=0, which means none of our chip-specific code is being run.
On line ~594, change
mov [ebx + device.mac_version], ecx
to
mov [ebx + device.mcfg], ecx
Hello,
I've made the changes you suggested, and the driver is now generating the debug messages we were expecting:
I think I’ve also figured out the reason for the differing hash values between my builds on Linux and Kolibri: in fact, two consecutive builds on Linux produce two files with different hash values. I suspect FASM adds a timestamp to the executable binary?
So much for the good news...
Unfortunately, in this version (attached below), the driver is still unable to initialise the Realtek adapter so that it detects the signal on the cable. I would have loved to share some good news with you, but... it’s too soon.
Let me reiterate my previous suggestion: in a test version, would you be willing to remove all configuration code for other adapter models and run the configuration code specific to my adapter (lines 859–874) unconditionally? I’d be happy to test such a version and let you know. The idea is to completely rule out the risk of any other part of the MCFG code being executed.
The next step would be to check the Realtek RTL8168 driver to see if there is anything fundamentally different from the Linux RTL8169 ‘umbrella’ driver. We can always put Claude's skills to the test... :-)
See you soon.
Thanks for hanging on!
Without the actual hardware or experience with this specific revision, it's just guessing for me, so I hope you don't mind the ride..
Adding a long delay inside an interrupt handler is a recipe for disaster in production code (as it blocks other interrupts during that period), so some kind of timer would need to be implemented instead.
I have seen and done this on some other hardware many moons ago; we can go down that route once all other roadblocks are cleared.
For debug, you could try using the udelay macro, which actually uses the sleep function that takes an argument in milliseconds.
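The debug-only delay discussed above can be bounded instead of fixed. The C sketch below polls for auto-negotiation completion, sleeping in small steps, and gives up after a timeout. It is a sketch under stated assumptions: the callback types, the `timed_phy` simulation and the function names are hypothetical, and only the BMSR register number and bit come from the MII standard. It is meant for probe or debug context, not for an interrupt handler.

```c
#include <stdbool.h>
#include <stdint.h>

#define MII_BMSR          0x01    /* standard MII status register */
#define BMSR_ANEGCOMPLETE 0x0020  /* auto-negotiation finished    */

/* Hypothetical callbacks supplied by the caller. */
typedef uint16_t (*bmsr_read_fn)(void *ctx, uint8_t reg);
typedef void (*sleep_ms_fn)(void *ctx, unsigned ms);

/* Polls BMSR until auto-negotiation completes or timeout_ms elapses.
 * Returns true on completion, false on timeout. */
static bool wait_for_aneg(bmsr_read_fn read, sleep_ms_fn sleep_ms, void *ctx,
                          unsigned timeout_ms, unsigned step_ms)
{
    unsigned waited = 0;

    if (step_ms == 0)
        step_ms = 1;                 /* avoid a loop that never advances */

    for (;;) {
        if (read(ctx, MII_BMSR) & BMSR_ANEGCOMPLETE)
            return true;
        if (waited >= timeout_ms)
            return false;
        sleep_ms(ctx, step_ms);
        waited += step_ms;
    }
}

/* Simulated PHY with a clock (assumption, for demonstration only):
 * negotiation completes once now_ms reaches ready_at_ms. */
struct timed_phy {
    unsigned now_ms;
    unsigned ready_at_ms;
};

static uint16_t timed_phy_read(void *ctx, uint8_t reg)
{
    struct timed_phy *p = ctx;
    (void)reg;
    return (p->now_ms >= p->ready_at_ms) ? BMSR_ANEGCOMPLETE : 0;
}

static void timed_phy_sleep(void *ctx, unsigned ms)
{
    struct timed_phy *p = ctx;
    p->now_ms += ms;
}
```

A timeout somewhat above the roughly 2 s seen in the Linux log would be enough to test the slow-negotiation hypothesis without blocking forever when the link never comes up.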
Meanwhile, I have fed your new debug output to the machine, and it generated more changes based on the latest Linux source:
(Sorry, I ran out of credits before I could feed it your previously corrected asm file.)
(Code attached.)
However, after taking a brief look myself, the end of the probe function (waiting for auto-negotiation to complete) looks fishy to me.
Here's my updated proposal regarding this part:
Hi!
Well, I have successively built and restarted with:
I have attached both versions here, with descriptive filenames.
Here is the kernel log, generated on the last boot:
As you will have gathered, unfortunately, I still do not have good news. 👨🏻🦯➡️
Are there LEDs on the adapter? Do they light up?
What is the other end of the Ethernet cable connected to? What is the link status there?
If the link does come up, is it before or after the "RTL8169: Link change detected" event?
Hello,
Relying on Claude.ai, I carried on with our work and assembled a new version (please see the attached RTL8169_0x3C4_v5.asm file), based on the following analysis of the FASM code. Here’s an excerpt from what the LLM has to say:
And here is what the logs say about its work
(and yes, all the LEDs remain stubbornly off):
The last debug message, "non-TBI: autonegotiation complete", is missing from the log. Rather than proceeding with a new iteration with Claude, it makes more sense to ask what you think of its hypothesis. I’ve already questioned its analysis, but it stood by it with confidence (as all LLMs are so good at doing). Here is how I phrased my objection:
We could also talk about it on Telegram.
As for the LEDs, none of them light up on the RJ45 connector soldered to the motherboard. I hooked up a mini-hub between the computer and the wall-mounted RJ45 outlet. Under Linux, all the LEDs light up and the traffic LED flashes. Under Kolibri, none of the LEDs light up, including none on the mini-hub.
I think we need more information!
If the link doesn't come up, the first things I think about are:
Energy/power settings. I understood from previous posts that this chip is integrated on a motherboard.
Try disabling LAN-related power settings in the BIOS/UEFI.
You can also try a warm reboot from, e.g., Linux into KolibriOS.
Network adapters are typically split into a MAC (Media Access Controller) and a PHY (physical layer transceiver).
Sometimes they are integrated into a single chip, and they know each other very well and play along nicely.
Sometimes they are two separate chips, and the driver needs to convince the MAC a little to play nicely with the PHY...
Can you perhaps take a picture of the chip(s) in question on your motherboard?
They will be close to the Ethernet port and have Realtek logo.
Meanwhile, I dug through some boxes and found a motherboard with RTL8111B chips that I haven't tested before.
I will test it soon and report back here.
So... the test results are in!
The motherboard I tested with has two identical RTL8111B chips onboard.
I also added one PCI Express card and three regular PCI cards (RTL8169 and similar NICs).
One of the onboard ports was used to boot KolibriOS via PXE (using built-in ROM on motherboard).
This onboard port was working fine in KolibriOS: gigabit link detected, IP address assigned etc.
Second port was seen in KolibriOS and link was detected when cable plugged in.
However, it negotiated only 100mbit instead of gigabit, and send/receive was not working (DHCP failed).
Only one of the other NICs was detected. (The motherboard has 6 PCI slots, so there must be at least one PCI bridge; clearly it was disabled.)
I disabled ACPI in the BIOS settings and tried again.
Now the second RTL8111B port was also working (sometimes), but link was still only seen as 100mbit.
Additional cards were now also visible:
I got a working gigabit link on RTL8168evl/RTL8111evl (the pci-express card)
The three PCI-cards all identify as RTL8169sb/RTL8110sb, link was detected as 100mbit only on all of them, but working.
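One way a link can end up at 100mbit on gigabit hardware is that gigabit was never advertised, so the highest mode common to both ends is 100BASE-TX. The C sketch below resolves the speed from the standard MII advertisement and link-partner registers (4, 5, 9 and 10). The bit values are the standard IEEE 802.3 ones; the function name is illustrative and it ignores duplex and master/slave resolution. Whether this is the cause on the cards tested above is not established here.

```c
#include <stdint.h>

/* Standard MII ability bits (IEEE 802.3). */
#define ADVERTISE_10HALF    0x0020  /* registers 4 and 5            */
#define ADVERTISE_10FULL    0x0040
#define ADVERTISE_100HALF   0x0080
#define ADVERTISE_100FULL   0x0100
#define ADVERTISE_1000HALF  0x0100  /* register 9, 1000BASE-T ctrl  */
#define ADVERTISE_1000FULL  0x0200
#define LPA_1000HALF        0x0400  /* register 10, 1000BASE-T stat */
#define LPA_1000FULL        0x0800

/* Returns the highest common speed in Mbit/s, or 0 if none.
 * adv/ctrl1000 are our advertisement, lpa/stat1000 the partner's. */
static int resolve_speed(uint16_t adv, uint16_t lpa,
                         uint16_t ctrl1000, uint16_t stat1000)
{
    /* The partner's gigabit bits sit two positions above ours. */
    uint16_t common1000 = (uint16_t)((uint16_t)(ctrl1000 << 2) & stat1000
                                     & (LPA_1000FULL | LPA_1000HALF));
    uint16_t common = (uint16_t)(adv & lpa);

    if (common1000)
        return 1000;
    if (common & (ADVERTISE_100FULL | ADVERTISE_100HALF))
        return 100;
    if (common & (ADVERTISE_10FULL | ADVERTISE_10HALF))
        return 10;
    return 0;
}
```

With a gigabit-capable partner and an empty register 9 on our side, this resolves to 100, which matches the symptom of a link that works but never reaches gigabit.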
I corrected information in the previous post, after testing again.
The LLMs were right regarding this point:
"The problem: auto-negotiation is never restarted for the 8168C", but their proposed solution was incomplete.
(They didn't make the effort to understand the reason for the 'set_io' macro and how to use it properly.)
After fixing the issue manually, Gigabit is now working on both onboard ports of my test setup.
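The kind of fix described above (advertise gigabit, then restart auto-negotiation) can be sketched in C against a simulated PHY register file. This is not the code from the patched driver. The register numbers and bits are the standard MII ones, while the `aneg_phy` structure and helper names are hypothetical.

```c
#include <stdint.h>

/* Standard MII registers and bits (IEEE 802.3). */
#define MII_BMCR            0x00
#define MII_ADVERTISE       0x04
#define MII_CTRL1000        0x09

#define BMCR_ANRESTART      0x0200  /* restart auto-negotiation */
#define BMCR_ANENABLE       0x1000  /* enable auto-negotiation  */
#define ADVERTISE_CSMA      0x0001  /* IEEE 802.3 selector      */
#define ADVERTISE_ALL       0x01e0  /* 10/100, half and full    */
#define ADVERTISE_1000HALF  0x0100
#define ADVERTISE_1000FULL  0x0200

/* Simulated PHY register file (assumption, for demonstration). */
struct aneg_phy {
    uint16_t regs[32];
};

static uint16_t aneg_phy_read(const struct aneg_phy *phy, uint8_t reg)
{
    return phy->regs[reg & 0x1f];
}

static void aneg_phy_write(struct aneg_phy *phy, uint8_t reg, uint16_t val)
{
    phy->regs[reg & 0x1f] = val;
}

/* Advertises 10/100/1000 and kicks off a fresh negotiation. */
static void restart_aneg_gigabit(struct aneg_phy *phy)
{
    uint16_t ctrl1000 = aneg_phy_read(phy, MII_CTRL1000);

    aneg_phy_write(phy, MII_ADVERTISE, ADVERTISE_CSMA | ADVERTISE_ALL);
    aneg_phy_write(phy, MII_CTRL1000,
                   (uint16_t)(ctrl1000 | ADVERTISE_1000FULL | ADVERTISE_1000HALF));
    /* On real hardware BMCR_ANRESTART clears itself afterwards. */
    aneg_phy_write(phy, MII_BMCR, BMCR_ANENABLE | BMCR_ANRESTART);
}
```

The order matters: the advertisement registers are written first so that the restarted negotiation uses the new values.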
Hello!
Attached are photos:
Furthermore, this BIOS offers no accessible settings regarding ACPI. It is sometimes possible to open certain BIOS settings in an “extended mode”, where additional choices suddenly become available to the user; I will check if this is the case for this one.
Regarding warm booting from Linux into Kolibri, I will look into how to hot-swap the Linux kernel to memdisk --initrd=kolibri.img. These commands, tried without certainty:
caused the machine to reboot via the BIOS POST phase, which is precisely what they were supposed to avoid. So I’ll have to ask a Linux guru...
Lastly, the datasheet we were discussing this morning on Telegram is completely silent on the point that interests us: the proprietary registers and their initialisation values. Were you referring to that same datasheet, or to a different one?
Cheers!
Our answers overlap. Your latest finding seems very promising...
Attached is the code with gigabit + aneg init patched.
More issues remain, but it's a start.
ACPI setting on my board
The 'warm reboot' test might also be replaced by enabling 'onboard LAN boot ROM' and placing the 'boot from LAN' option in the boot order list before the one that actually boots KolibriOS.
Is this issue fully resolved by #440 and can be closed?
@Burer, I'm awaiting feedback from the OP, as the original issue was related to RTL8111C, which I was unable to test.
Hello, it is not possible to turn off ACPI in my BIOS. The only option is to let the system decide (S1 & S3).
The card is identified well, but has "disconnected" status.
@Golffies: @Leency had an interesting development where his RTL8169SC card started to work only after a hot reboot from Windows. We'll investigate by dumping registers after a hot reboot from Windows versus a cold reboot; perhaps it's something you can also investigate.
@Golffies: Please try the attached driver if you have the time. It will dump the registers at certain strategic points, and it also has fixed PHY revision logic.
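Once two register dumps are available (for example one after a cold boot and one after a warm reboot), the comparison itself is mechanical. The C sketch below lists every offset where two equally sized dumps differ. The function name and the output format are illustrative, and it assumes the dumps have already been loaded into byte buffers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Prints each offset where the two dumps differ and returns the
 * number of differing bytes. Both buffers must hold len bytes. */
static size_t diff_dumps(const uint8_t *cold, const uint8_t *warm, size_t len)
{
    size_t ndiff = 0;

    for (size_t i = 0; i < len; i++) {
        if (cold[i] != warm[i]) {
            printf("offset 0x%02zx: cold=0x%02x warm=0x%02x\n",
                   i, (unsigned)cold[i], (unsigned)warm[i]);
            ndiff++;
        }
    }
    return ndiff;
}
```

Registers that change on their own (counters, timers, interrupt status) will show up as noise, so comparing two dumps taken under the same conditions first helps to filter them out.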
Hi guys,
The BIOS on my motherboard is also from Award. It doesn't have the ability to disable ACPI either.
@hidnplayr
Here is the dump generated by your latest version of the RTL8169.SYS driver.
On the motherboard, the Ethernet adapter LEDs remain off, and no packets are being received or transmitted. However, the static configuration for the LAN subnet now appears in the “Network status” control panel. Interestingly, once this test driver is loaded, the Ethernet adapter no longer appears in the PCI enumeration at all.
I'm here to help you continue testing. Would you like me to mail you this motherboard (I have a spare)?
Very interesting, the driver managed to make the card refuse to talk to the PCI(e) bus any further!
I hope the card wants to talk to us again after a (cold) reboot, so we can try again!
I adjusted the debug output a little and implemented the power state setting more finely. I also skipped the write to the undocumented MAC register 0x82, as it seems it's needed for very old chips only and was likely what upset your chip.
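Two generic PCI checks are relevant to the symptoms above: a configuration read that returns all ones means no device answered, and the power state lives in the low two bits of the PCI power management control/status register (PMCSR). The C sketch below shows both as pure helpers. The function names are hypothetical, and how the driver in this thread actually sets the power state is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_PM_CTRL_STATE_MASK 0x0003u  /* PMCSR bits 1:0, D0..D3hot */

/* id_dword is configuration offset 0 (device ID << 16 | vendor ID).
 * A vendor ID of 0xffff means nothing responded on the bus. */
static bool pci_device_responds(uint32_t id_dword)
{
    return (id_dword & 0xffffu) != 0xffffu;
}

/* Returns the D-state (0..3) encoded in a PMCSR value. */
static unsigned pmcsr_power_state(uint16_t pmcsr)
{
    return pmcsr & PCI_PM_CTRL_STATE_MASK;
}

/* Returns pmcsr with only the power state field replaced. */
static uint16_t pmcsr_with_state(uint16_t pmcsr, unsigned d_state)
{
    return (uint16_t)((pmcsr & ~PCI_PM_CTRL_STATE_MASK) |
                      (d_state & PCI_PM_CTRL_STATE_MASK));
}
```

Logging the ID dword before and after each initialisation step would narrow down which write makes the adapter stop responding.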
Hello,
This 8111C is giving us a hard time. This latest version of the 8169 driver keeps the Ethernet adapter in its usual state: LEDs off, no Ethernet signal detected. Here is the log generated: