mirror of
https://git.missingno.dev/kolibrios-nvme-driver/
synced 2024-11-22 07:33:49 +01:00
NVMe Commands Fail on Virtualbox #1
This is a follow-up to KolibriOS/kolibrios#81.
Summary: Commands currently fail to run on Virtualbox's NVMe controller. I'm investigating a fix for this.
A little addition: I also tried to load the driver (a version from 01.08.2024) in VMWare, and it just hangs the system, even without any debug prints.
Maybe the reason is just an unformatted disk, but it still shouldn't freeze the system so severely.
Huh? That's bizarre. I don't think it's a problem with VMWare's NVMe controller because the driver didn't even start PCI device enumeration yet if it didn't print anything. So it's probably a driver bug.
Can you reproduce this issue when loading another driver?
Calling loaddrv manually on QEMU also produces some weird behavior.
Well, at least for the QEMU issue, it looks like the data is uninitialized for some reason. I'm working on a fix for this, which might help the VMWare situation too.
See: #2
As far as I know - no, I've never seen anything like that. But I have very little experience with KolibriOS, so my opinion is hardly relevant here. Perhaps @Doczom can suggest something.
But if you have ideas for specific drivers, I'd be happy to test them.
So, I tested the latest image, at commit aa053399c1, in both VirtualBox and VMWare with driver loading added in AUTORUN.DAT, and here are the results.
@Burer These are interesting results. So it looks like Virtualbox and VMWare both fail to send a command (at least I think so for VMWare...), except VMWare freezes immediately after that occurs, correct?
This is good though. I was suspecting that perhaps Virtualbox's NVMe controller may not be compliant with the NVMe spec. But if VMWare is also having this issue at that same exact spot, then it's for certain an issue with the driver.
I'll do some more reading on the specification and browse through Linux's NVMe driver tomorrow to see what it does differently. In the meantime, could you try and pull the changes from the latest commit (8fbb0268) and share the screenshot once again on VMWare? I just want to know for certain that it is in fact hanging on a command.
Yes, it just freezes at the moment shown in the last screenshot I posted here.
Of course, and here are the results on commit 8fbb02681d: just an instant freeze after entering the driver loading command.
Ah, that's not what I meant. Sorry.
I forgot to add the updated image, so check the newest commit and try that one instead. If there's a message saying something like: "Writing to SQ0 doorbell register..." at the end, then it's definitely the command problem which Virtualbox has as well.
As for the issue with VMWare just freezing from loading the driver in the command line, I'm not sure why that's happening.. but perhaps I can investigate once this is resolved.
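For readers following along, the "SQ0 doorbell" mentioned above is the admin submission queue's tail doorbell. A minimal sketch of how the doorbell offsets are laid out in the controller's register space according to the NVMe base specification is shown below. The function names are hypothetical and are not taken from the driver; `dstrd` stands for the CAP.DSTRD field read from the controller.

```c
#include <stdint.h>

/* Doorbell registers start at offset 0x1000 in the controller's MMIO space. */
#define NVME_DOORBELL_BASE 0x1000u

/* Offset of the submission queue tail doorbell for queue `qid`.
 * `dstrd` is the doorbell stride field (CAP.DSTRD); the stride in bytes
 * is 4 << dstrd. Hypothetical helper, not the driver's own code. */
static uint32_t nvme_sq_tail_doorbell(uint16_t qid, uint8_t dstrd)
{
    return NVME_DOORBELL_BASE + (2u * qid) * (4u << dstrd);
}

/* Offset of the completion queue head doorbell for queue `qid`. */
static uint32_t nvme_cq_head_doorbell(uint16_t qid, uint8_t dstrd)
{
    return NVME_DOORBELL_BASE + (2u * qid + 1u) * (4u << dstrd);
}
```

With a stride field of 0, the admin queue pair (queue 0) uses offsets 0x1000 and 0x1004.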
Ah, no problem at all, don't worry.
And here we are, looks like you were totally right, it gives this message at the end.
And is there any other setup that also has this problem, not only VirtualBox?
I think VMWare might. Try that.
Got it, I just thought that you wanted to deal with VMWare later.
Just the same result: this message in debug and a dead system.
@Burer, my bad! I meant let's avoid testing with calling loaddrv manually for now. Sorry.
But this is really good. So it looks like Virtualbox and VMWare both fail to complete a command (except VMWare freezes, if I understand correctly). Thank you for testing. I'll let you know if I make any progress on this.
I don't have VMWare, but my hope is that once I fix Virtualbox the problem should go away on VMWare as well. So let's see how that goes. :)
@ramenu, don't worry, no problem at all!
And you are always welcome.
Will be waiting for any results to test them, and yes, will hope that both problems will be solved at once! :)
@Burer, Some good news, I have sent @punk_joker a message but maybe you can answer this as well:
EDIT: Answered by @punk_joker
@ramenu, your clue sounds pretty promising, and I'm glad you and @punk_joker were able to find the answer to that question, because I'm unlikely to be able to help here. I've never even been close to writing drivers in my life, and have done very little development of anything for KolibriOS in general.
So will hope and wait for positive results.
@Burer, do you see this error when looking in VBox.log? Should be in the Logs folder of wherever your KolibriOS VM folder is:
00:00:06.095625 NVMe#0Wrk#0: Processing admin command 0xffff returned with error: VERR_PDM_NOT_PCI_BUS_MASTER
@ramenu, yes, it is here:
@Burer Following up on this,
Unfortunately, I haven't been able to fix this issue. However, I was able to find one useful hint as to what this error may indicate. Perhaps you can ask one of the developers about it and they may be able to provide a reason for it.
/** The PCI device isn't configured as a busmaster, physical memory access
* rejected. */
#define VERR_PDM_NOT_PCI_BUS_MASTER (-2891)
I'm not sure if this is an issue with the driver itself, but judging from what the error says, it seems like the device needs to be configured correctly (as a busmaster? I don't know what this means.)
Does VMWare provide similar logs somewhere? If you can provide a dump of that, it could be helpful. Thanks!
@ramenu, greetings!
First of all, here are some examples of setting PCI master mode from @Doczom. It looks like it really is obligatory to set this in order to use drivers for PCI devices (but I am not an expert in that, so I may be wrong).
And about the VMWare logs: there is no statement as clean as "some PCI error", but there are some NVMe and PCI logs, so I will attach the whole file. Here is an example from line 1119:
@Burer and @Doczom, thank you so much! That did fix the problem indeed. :) I've included a fix for this in the newest commit, so you can grab that whenever it syncs to the Gitea repo.
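For context on what "configured as a busmaster" means: the PCI Command register (offset 0x04 in configuration space) has a Bus Master Enable bit (bit 2), and a device may only initiate DMA while it is set. The sketch below shows the read-modify-write against a simulated configuration space. The `fake_pci_config` type and the helper names are hypothetical stand-ins for illustration; they are not the KolibriOS PCI API and not the code from the actual fix.

```c
#include <stdint.h>

/* Register offset and bit defined by the PCI specification. */
#define PCI_COMMAND_OFFSET     0x04u
#define PCI_COMMAND_BUS_MASTER (1u << 2) /* allows the device to initiate DMA */

/* Hypothetical stand-in for one device's 256-byte configuration space. */
typedef struct {
    uint8_t bytes[256];
} fake_pci_config;

/* Read a little-endian 16-bit register from the simulated config space. */
static uint16_t cfg_read16(const fake_pci_config *cfg, uint8_t off)
{
    return (uint16_t)(cfg->bytes[off] | (cfg->bytes[off + 1] << 8));
}

/* Write a little-endian 16-bit register to the simulated config space. */
static void cfg_write16(fake_pci_config *cfg, uint8_t off, uint16_t value)
{
    cfg->bytes[off]     = (uint8_t)(value & 0xffu);
    cfg->bytes[off + 1] = (uint8_t)(value >> 8);
}

/* Set Bus Master Enable while preserving every other command bit.
 * Returns the new value of the command register. */
static uint16_t pci_enable_bus_master(fake_pci_config *cfg)
{
    uint16_t cmd = cfg_read16(cfg, PCI_COMMAND_OFFSET);
    cmd |= PCI_COMMAND_BUS_MASTER;
    cfg_write16(cfg, PCI_COMMAND_OFFSET, cmd);
    return cmd;
}
```

In a real driver the two accessors would be replaced by the operating system's PCI configuration read and write services. The important part is the read-modify-write, so that bits such as Memory Space Enable are not cleared by accident.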
Now the screen freezes just like VMWare. No logs generated, but I'm very happy about this development. This took a lot longer than it should have. I'll have to debug some more to see what's going on here.
Aha, so the controller is finally receiving commands properly, that's good news! It's just stuck calling the interrupt handler repeatedly, presumably because a command isn't being acknowledged as completed properly (which would be a bug in my code). If I were to guess, this is true in VMWare's case too.
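As background on what "acknowledging a completion" usually involves in NVMe: the host walks the completion queue, consumes every entry whose phase tag matches the phase it expects, and then writes the new head index to the completion queue head doorbell. The sketch below only models the queue walk. The struct layout follows the 16-byte completion entry from the NVMe specification, but the type and function names are hypothetical and this is not the driver's interrupt handler.

```c
#include <stdint.h>

/* 16-byte NVMe completion queue entry. */
typedef struct {
    uint32_t dw0;      /* command specific */
    uint32_t dw1;      /* reserved */
    uint16_t sq_head;  /* current submission queue head pointer */
    uint16_t sq_id;    /* submission queue identifier */
    uint16_t cid;      /* command identifier */
    uint16_t status;   /* bit 0 = phase tag, bits 15:1 = status field */
} nvme_cqe;

/* Hypothetical host-side bookkeeping for one completion queue. */
typedef struct {
    nvme_cqe *entries;
    uint16_t  size;   /* number of entries in the queue */
    uint16_t  head;   /* next entry the host will look at */
    uint8_t   phase;  /* phase tag the host expects; starts at 1 */
} nvme_cq;

/* Consume all newly posted completions and return how many there were.
 * The caller would then write cq->head to the CQ head doorbell so the
 * controller knows the entries were processed. */
static unsigned nvme_cq_drain(nvme_cq *cq)
{
    unsigned consumed = 0;
    while ((cq->entries[cq->head].status & 1u) == cq->phase) {
        /* A real handler would look up the command by entries[head].cid here. */
        consumed++;
        if (++cq->head == cq->size) {
            cq->head = 0;      /* wrap around to the start of the queue */
            cq->phase ^= 1u;   /* expected phase flips on every wrap */
        }
    }
    return consumed;
}
```

If the head doorbell is never updated, the controller keeps treating the entries as unprocessed, which is one way an interrupt can keep firing.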
@Burer, so I got the NVMe driver working on Virtualbox, however I haven't given this an extensive test yet. That specific freeze was caused by the interrupt status bit in the PCI status register not being set.
I haven't pushed these changes to the git repo yet, but it's nice to know that once this bug is sorted out, everything else seems to work properly. Let's hope this is the case for VMWare too! :)
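The "interrupt status bit" referred to here lives in the PCI Status register (offset 0x06 in configuration space, bit 3). A minimal sketch of such a check is below, assuming the handler has already read the 16-bit status value; the function name is hypothetical. Per the PCI specification this bit reflects the state of the legacy INTx signal, so a handler that bails out when the bit is clear can end up never servicing the device.

```c
#include <stdint.h>

/* Register offset and bit defined by the PCI specification. */
#define PCI_STATUS_OFFSET    0x06u
#define PCI_STATUS_INTERRUPT (1u << 3) /* state of the device's INTx signal */

/* Returns 1 if the Interrupt Status bit is set in a PCI status value.
 * Hypothetical helper; a driver would obtain `status` by reading
 * PCI_STATUS_OFFSET from the device's configuration space. */
static int pci_interrupt_pending(uint16_t status)
{
    return (status & PCI_STATUS_INTERRUPT) != 0;
}
```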
@Burer , I've added the commit which removes the interrupt status bit check. This is not ideal but until I can think of a better solution I'll leave it like this for now.
So please test the latest driver on Virtualbox and VMWare, should work as expected now. I did notice a bug when reading and writing from disk enough times (about 255 times to be precise) where the screen will freeze due to a bug in the command wrapping part of the driver, I think. That's probably next on my TODO list.
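The cause of that freeze is not established in this thread. As an illustration only of what wrap-around handling looks like, here is a sketch of advancing a submission queue tail index so that it returns to 0 after the last slot. The struct and function names are hypothetical and the queue size is whatever the driver configured.

```c
#include <stdint.h>

/* Hypothetical host-side state for one submission queue. */
typedef struct {
    uint16_t tail;  /* index of the next free slot */
    uint16_t size;  /* number of slots in the queue */
} nvme_sq_state;

/* Advance the tail by one slot, wrapping to 0 after the last slot.
 * Returns the new tail, which is the value the driver would write
 * to the submission queue tail doorbell. */
static uint16_t nvme_sq_advance_tail(nvme_sq_state *sq)
{
    sq->tail = (uint16_t)((sq->tail + 1u) % sq->size);
    return sq->tail;
}
```

Using a modulo on the configured size keeps the index valid for any queue depth. Relying on the natural overflow of an 8-bit or 16-bit counter only works when the queue size happens to match that width.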
@ramenu, it is just great!
The driver now loads properly on both VB and VMW, and doesn't freeze the OS.
But on VB, where I have an NVMe drive connected, it freezes when trying to shut down the OS (page fault), and resets it to minimal resolution and 8-bit video mode. On VMW, where I don't have any NVMe drives connected, it shuts down normally.
I will test it later with an NVMe drive connected to VMW and will add the results to this post, but for now the results are just great, thank you very much!
UPD. I configured a virtual NVMe drive for VMW and tested KolibriOS with it, but the results are a little more modest: I just got a controller initialization error and nothing more. The system still works and shuts down normally.
P.S. While testing the driver in VB I also found some other problems, and created separate issues for them, in order to not mix it all up.
@Burer, Thank you for providing these extensive tests! It's much appreciated, as always. :) I've opened a separate issue for these topics, I'm going to close this issue since it's been resolved now.