Hardware IOMMU and xhci controllers

You'd think that after the long time that has passed since the first USB3 controllers appeared on the computer market, Linux would have no big issues with USB3. Well, that's as far from reality as one might imagine.
For the last couple of months I had an annoying problem with my USB3 PCIe controller card. One out of four ports was dead and the other ones occasionally refused to work as well. Luckily at least one port always worked, so I just plugged my USB3-hub into that port and continued doing stuff. Fast forward to this week. I replaced my GPU with a more recent model, and in order to fit the new card into the PC-tower I had to shuffle around other PCIe cards. This also required the USB3-controller card to be removed. After I finally got the GPU into a fitting slot, I started to put the other PCIe cards back into the tower when I found a dark spot on my USB3 controller card:
broken_USB3_card.jpg
To my surprise, some electronic part on that card had burnt and left an entirely black spot the size of a human thumbnail. Now I had my explanation why one of the USB3-ports was always dead. Of course I didn't want to put a semi-broken card back into my PC, so I tried to replace it with a spare part. And that's where the real fun began.

Booting Linux with that spare USB3 PCIe-card immediately resulted in the IOMMU shutting down the new USB3-card. What I found out the hard way with three(!) different USB3-controller cards was that no matter what kind of chip is used on the cards, there's always some issue.
Apparently, cards with a "Renesas" chip don't have IOMMU issues, but if you happen to use a card with too many of these chips soldered onto it, Linux just shuts the entire controller card down because Renesas chips seem to be the worst possible USB3 controller chips available, with just way too many quirks.
On the other hand, cards with either "VIA" or "ASMedia" chips get into trouble when a (buggy?) hardware IOMMU is in use. My new card with two ASMedia chips booted nicely when no USB device was connected to it:

xhci_hcd 0000:09:00.0: xHCI Host Controller
xhci_hcd 0000:09:00.0: new USB bus registered, assigned bus number 8
xhci_hcd 0000:09:00.0: hcc params 0x0200ef81 hci version 0x110 quirks 0x0000000000000010
usb usb8: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.04
usb usb8: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb8: Product: xHCI Host Controller
usb usb8: Manufacturer: Linux 5.4.51 xhci-hcd
usb usb8: SerialNumber: 0000:09:00.0
hub 8-0:1.0: USB hub found
hub 8-0:1.0: 2 ports detected
xhci_hcd 0000:09:00.0: xHCI Host Controller
xhci_hcd 0000:09:00.0: new USB bus registered, assigned bus number 9
xhci_hcd 0000:09:00.0: Host supports USB 3.1 Enhanced SuperSpeed
usb usb9: We don't know the algorithms for LPM for this host, disabling LPM.
usb usb9: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 5.04
usb usb9: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb9: Product: xHCI Host Controller
usb usb9: Manufacturer: Linux 5.4.51 xhci-hcd
usb usb9: SerialNumber: 0000:09:00.0
hub 9-0:1.0: USB hub found
hub 9-0:1.0: 2 ports detected
xhci_hcd 0000:08:00.0: xHCI Host Controller
xhci_hcd 0000:08:00.0: new USB bus registered, assigned bus number 10
xhci_hcd 0000:08:00.0: hcc params 0x0200ef81 hci version 0x110 quirks 0x0000000000000010
usb usb10: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.04
usb usb10: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb10: Product: xHCI Host Controller
usb usb10: Manufacturer: Linux 5.4.51 xhci-hcd
usb usb10: SerialNumber: 0000:08:00.0
hub 10-0:1.0: USB hub found
hub 10-0:1.0: 2 ports detected
xhci_hcd 0000:08:00.0: xHCI Host Controller
xhci_hcd 0000:08:00.0: new USB bus registered, assigned bus number 11
xhci_hcd 0000:08:00.0: Host supports USB 3.1 Enhanced SuperSpeed
usb usb11: We don't know the algorithms for LPM for this host, disabling LPM.
usb usb11: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 5.04
usb usb11: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb11: Product: xHCI Host Controller
usb usb11: Manufacturer: Linux 5.4.51 xhci-hcd
usb usb11: SerialNumber: 0000:08:00.0
hub 11-0:1.0: USB hub found
hub 11-0:1.0: 2 ports detected


But as soon as I connected some device to that card:

usb 10-1: new high-speed USB device number 2 using xhci_hcd
xhci_hcd 0000:08:00.0: Abort failed to stop command ring: -110
xhci_hcd 0000:08:00.0: xHCI host controller not responding, assume dead
xhci_hcd 0000:08:00.0: HC died; cleaning up
xhci_hcd 0000:08:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0012 address=0x100000000 flags=0x0000]
xhci_hcd 0000:08:00.0: Timeout while waiting for setup device command
usb 10-1: hub failed to enable device, error -62
usb usb10-port1: couldn't allocate usb_device
usb usb11-port1: couldn't allocate usb_device


Yeah! Well done IOMMU! Searching for "IOMMU xhci" turns up lots of posts from Linux users who suffer from the same problem. Some xhci kernel driver hacker told me that this is an IOMMU issue which usually requires a BIOS update for the mainboard. But my mainboard (Supermicro H8DG6-F) has been EOL for quite a while already, and the manufacturer told me that there will be no more BIOS updates for this board. So the only other solution was to disable usage of the hardware IOMMU in the Linux kernel. I added the following to my kernel command-line: "amd-iommu=off iommu=soft" and now I can finally use my new USB3 PCIe controller card.
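If you want to check quickly whether your own machine is hit by the same thing, a small helper along these lines might do. This is only a sketch: the function name iommu_fault_devices is made up for this post, and it assumes the kernel messages look like the AMD-Vi line in the log above (with or without a leading dmesg timestamp).

```bash
#!/bin/bash
# Sketch: list devices that logged an AMD-Vi IO_PAGE_FAULT.
# Reads kernel log text on stdin, for example:
#   dmesg | iommu_fault_devices
# The function name is hypothetical; the matched message format is the
# one shown in the log excerpt above.
iommu_fault_devices() {
    # Match lines such as
    #   xhci_hcd 0000:08:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT ...]
    # (optionally prefixed by a "[  12.345678] " timestamp) and print
    # "<driver> <pci-address>" once per device.
    sed -n -E 's/^(\[[^]]*\] *)?([^ ]+) ([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): AMD-Vi: Event logged \[IO_PAGE_FAULT.*/\2 \3/p' \
        | sort -u
}
```

Fed with the log from above, this should print the xhci_hcd controller at 0000:08:00.0 and nothing else. An empty result only means no such event was logged, not that the IOMMU is fine.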

Fun with Areca RAID-Controllers

I have been using RAID-cards for about twenty years now and they have served me very well over all these years. I went through different brands, starting with a Mylex AcceleRAID 170 SCSI PCI RAID-controller back in the year 2000, which I then replaced in 2007 with an Adaptec 4805SAS SAS PCIe card.
What I learned the hard way was that all these "old" controllers were not capable of having RAID-volumes bigger than 2TB or handling hard disks bigger than 2TB. So - again - a new RAID-controller card had to be obtained. This time I chose an Areca ARC-1680ix-12, which over the years turned out to be a mostly good decision. The card has some really nice features I didn't have with my old cards. One feature I instantly started to like was that I could perform firmware updates of that card from within Linux. No more DOS boot diskette / USB-memory device. Just unpack the new firmware files, fire up "CLI64", reboot and voilà... new firmware \o/
In 2016 I decided to put my OS on two SSDs configured as RAID-1 on my controller. Booting was significantly faster because of the low seek times, but the transfer rate was awful… hdparm -t measured about 250MB/s, which is even worse than my RAID-10 of 13 ten-year-old spinning-rust disks, which manages about 310MB/s.
So I started reading Areca threads in different hardware forums, which took weeks to finish (some threads dated back to the year 2009 and had no fewer than about 70 pages full of comments), but to no avail. Occasionally some user reported the same issue of low transfer rates with SSDs, but there was not a single hint as to where the problem comes from or whether a fix is available.
Meanwhile I even replaced both SSDs with different (and bigger :-D) ones but transfer speed was still at a disappointing 250MB/s.
I got the first hint of what could be wrong when I checked my SSDs with smartctl:

CODE:
# smartctl -i -d areca,1/2 /dev/arecactl0 | grep '^SATA Version'
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 1.5 Gb/s)

The controller reported the same link speed, though unfortunately not in the "CLI64" tool but only in the RAID-BIOS or in the archttp web configuration interface.
Alright, so now I knew why my SSDs were so slow, but I still had no idea why the controller decided to set the link speed to the lowest possible value. At this point I had already ruled out the SSDs being to blame because of the two different types of SSDs I had tried.
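To avoid squinting at that line for every disk, the check can be wrapped in a tiny helper. The following is just a sketch of how I would do it: the function names are made up, and it assumes the "SATA Version is:" line always has the format shown in the smartctl output above.

```bash
#!/bin/bash
# Sketch: extract the link speeds from `smartctl -i` output on stdin.
# Function names are hypothetical; the parsed format is the
# "SATA Version is: ... (current: ...)" line shown above.

# Print "<max> <current>" in Gb/s, e.g. "6.0 1.5".
sata_link_speeds() {
    sed -n -E 's#^SATA Version is:.*, ([0-9.]+) Gb/s \(current: ([0-9.]+) Gb/s\)$#\1 \2#p'
}

# Compare the negotiated speed with the one you expect (e.g. "3.0").
check_link_speed() {
    local expected=$1 speeds max current
    speeds=$(sata_link_speeds)
    if [ -z "${speeds}" ]; then
        echo "unknown"
        return
    fi
    max=${speeds% *}       # part before the space
    current=${speeds#* }   # part after the space
    if [ "${current}" = "${expected}" ]; then
        echo "ok: ${current} Gb/s (drive max ${max} Gb/s)"
    else
        echo "slow: ${current} Gb/s, expected ${expected} Gb/s"
    fi
}
```

Usage would be something like: smartctl -i -d areca,1/2 /dev/arecactl0 | check_link_speed 3.0. The expected value is whatever your controller can do at best, which is not necessarily the drive's maximum.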
Being stuck here, I forgot about the issue until I accidentally stumbled upon a changelog file on Areca's FTP-server which describes fixes to the built-in SAS-Expander that all Areca RAID-Cards with an "ix" in their model name have:
D)Set the initial Min. speed to 3.0G. for some 6G SATA HDD negotiate as 1.5G.

Jackpot! Exactly my issue. But how to update that SAS-Expander's firmware? Well… unfortunately that is not as easy as updating the RAID-Controller's firmware… now I learned why my Areca RAID-Controller has a built-in RJ11 connector. This is actually an RS232 interface that allows direct communication with the built-in SAS-Expander as long as you have an RS232 to RJ11 converter cable. Fortunately the Areca controller I bought came with such a cable. And even better, Areca provides a downloadable PDF document which describes how to connect and interact with the SAS-Expander... by using Windows + Hyperterminal! :-(
So I had to figure out how to do the firmware upgrade under Linux myself. First I tried different serial terminal programs (cutecom, minicom, screen) but they all failed at uploading the two(!) firmware files even though I used sx for the transfer, as the documentation said that xmodem/1K is required.
At this point I was really nervous because in order to upload the two files to the expander you first have to erase the corresponding blocks in the expander's ROM. So while the erasing was successful, the upload was not. After 90 minutes of trial and error I finally got the files uploaded and the SAS-Expander had its new firmware. Here's a short list of things you need to do:

  • Connect to the SAS-Expander:
    CODE:
    cu -l /dev/ttyS0 -s 115200

  • Go through the process as described in the upgrade manual until you reach the point where you are asked to upload the files. Now type ~$ into cu and then paste the following command (with the real filename of course) into cu:
    CODE:
    sx -b -k filename < /dev/ttyS0 > /dev/ttyS0


  • Log out of the SAS-Expander, type ~. into cu, reboot and you should have the new firmware running on the built-in SAS-Expander.
    Unfortunately this was still not enough for my second Areca RAID-controller. It still reported my SSDs with 1.5G link speed. So I connected to the SAS-Expander once again and manually set the minimum link speed for the affected devices to 3.0G instead. The following is an example of how to do this. Keep in mind that the first hex value is the device, the second hex value is the max speed and the third hex value is the min speed. Since my ARC-1680ix-12 controller only supports a maximum link speed of 3.0G, I set the max and min link speeds to the same value:
    CODE:
    CLI> LI 0x02 0x9 0x9

    Now save the settings and do not switch cables on your SSDs, or you will need to set the link speeds again :-)
    After all this hassle, I finally had some satisfying transfer rates:
    CODE:
    # smartctl -i -d areca,1/2 /dev/arecactl0 | grep '^SATA Version'
    SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
    # hdparm -t /dev/sda
    /dev/sda:
     Timing buffered disk reads: 1416 MB in  3.00 seconds = 471.72 MB/sec

    My Notebook and its fans...

    I have had my Dell Precision M6700 notebook for quite a while already, and the only remaining big issue was insufficient GPU cooling. Every time the GPU got heavily utilized, the temperature easily exceeded 95°C and sometimes even reached the point where my notebook would simply shut off entirely.
    I finally took the time to do some research and it seems like this is a common issue with Dell notebooks. There are numerous reports about Dell notebooks either having their fans running at high speeds all the time, doing a constant spin up / spin down cycle or - like in my case - simply not cooling properly at all. Most of the time the reason for these issues is that the notebook's BIOS takes full control over the fans and doesn't do a good job of it.
    Under Linux there are the i8kutils, but these cannot change the fan speeds permanently because the BIOS keeps control over the fans.
    Luckily there is now a tool called dell-bios-fan-control which can toggle BIOS control of the fans.
    With the help of this tool I can finally set the GPU fan to full speed when necessary and no longer have to fear overheating or even instant shutdowns.
    Now all I need is a tool that can independently monitor CPU and GPU temperatures and set the corresponding fans as necessary to keep my notebook cool.
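The core of such a tool would not have to be complicated. Here is a rough sketch of the decision logic only. Everything in it is an assumption on my side: the thresholds are made-up numbers, the function names are hypothetical, and I assume a temperature source that reports millidegrees Celsius in a file (like the sysfs thermal zone files do) plus a three-step fan level (0 = off, 1 = low, 2 = high). Actually applying the level to a fan, for example with the i8kutils once BIOS control is disabled, is left out on purpose.

```bash
#!/bin/bash
# Sketch of the decision logic for a temperature-driven fan setter.
# Thresholds and function names are hypothetical; tune them for the
# actual hardware before relying on anything like this.
FAN_LOW_TEMP=60    # °C: at or above this, run the fan at low speed
FAN_HIGH_TEMP=80   # °C: at or above this, run the fan at high speed

# Read a temperature file containing millidegrees Celsius
# (e.g. a sysfs thermal zone "temp" file) and print whole degrees.
read_temp_c() {
    local millideg
    millideg=$(cat "$1")
    echo $(( millideg / 1000 ))
}

# Map a temperature in whole °C to a fan level:
# 0 = off, 1 = low, 2 = high.
fan_level_for_temp() {
    local temp=$1
    if [ "${temp}" -ge "${FAN_HIGH_TEMP}" ]; then
        echo 2
    elif [ "${temp}" -ge "${FAN_LOW_TEMP}" ]; then
        echo 1
    else
        echo 0
    fi
}
```

A real tool would call this in a loop, once per sensor and fan, and should add some hysteresis so the fans don't flap between two levels when the temperature hovers around a threshold.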

    FOSDEM 2018

    Finally I made it to a convention again. The last time I was at one was back in 2010. This time it was FOSDEM, and what can I say? It was really great.

    Friday evening

    I was especially excited to meet some of my fellow Gentoo developers, and Friday's "Beer Event" seemed to be the perfect start for this. Although I was quite exhausted from my five-hour ride by car, it was a really nice evening chatting with my latest recruit ntnn as well as Whissi, dilfridge, mrueg, k_f, graaff, bircoph and lu_zero while having (a bit too much) beer. By the way, some of the Belgian beer is really tasty.

    Saturday

    So thanks to the Beer Event I made my first impression at FOSDEM by being an hour late for my shift at the Gentoo stand. Admittedly it was not only the beer but also trying to find my way by car from the hotel to FOSDEM. Traffic in Brussels is really horrible.

    When I arrived at the stand, k_f and chithanh had fortunately already taken over and were helping interested people "compile" their own Gentoo buttons by giving them instructions on how to operate the button machine. The button machine turned out to be a big attraction for many FOSDEM visitors, and combined with the great Gentoo table cloth provided by k_f, the stand looked really great and probably made a good impression on the visitors.

    So I took over the second half of my shift when a very kind woman approached the stand and even shook hands with us Gentoo people. It turned out she is an employee at Intel doing Linux kernel testing with Gentoo Linux. To be honest I was a bit stunned when I heard that, and although I was tempted to talk with her about the recent Meltdown/Spectre disaster, I changed my mind and instead asked her if there will ever be another release of the xf86-video-intel driver, which hasn't seen a release for a couple of years already. She told me that she knows the guy at Intel who maintains the driver and that she would talk to him about it. Ultimately this was a really great start to FOSDEM for me and it kept me quite motivated for the rest of FOSDEM.
    Later, some guy appeared at the stand talking about working on a new dependency resolver for Gentoo packages. If I remember correctly, the resolver is not meant for a specific package manager but rather aims to be as thorough as possible in providing the user with a set of packages that can be installed, instead of bailing out like portage sometimes does when the dependencies seem to be unsolvable. He is interested in getting portage configuration examples from different Gentoo installations to feed into his dependency resolver as test cases. Since I found his approach quite interesting, I referred him to the gentoo-dev mailing list, and once he has sent his request to the list it would be nice if people could provide him with the requested data.

    During the day I also met monsieurp, haubi, haubi's daughter, abailler and Soap. I visited some presentations (one together with monsieurp) but none of them were what I expected, and at the end of the day I was a bit disappointed by them. I could not even attend the last presentation on my list because the crew of the Gentoo stand wanted to shut down the stand and I had left my notebook there while attending the presentations. Nevertheless, my overall impression of Saturday at FOSDEM was great.

    After FOSDEM we had the Gentoo Dinner at a nice restaurant in Brussels (I forgot its name, sorry guys). This was mostly organized by xaviermiller, who unfortunately could not make it to FOSDEM because of private matters but still wanted to meet with the Gentoo people. I had a great meal and a couple of really nice chats, so this was a perfect finish to a perfect day.

    Sunday

    For Sunday I hoped to attend some more interesting presentations and I was not disappointed.
    Starting with the GRUB upstream and distros cooperation presentation, I learned that grub upstream is planning to implement a second - much less complicated - configuration language for grub. It's not intended to replace the current, more complicated configuration language but rather to be an alternative for people who don't need big, fancy setups. Furthermore they want to work more closely with downstreams so that distributions no longer have to apply lots of patches to their grub packages. As Gentoo already has the rule to send patches upstream if possible, I was kind of happy to see our current grub-2.02 package applying only one patch. We should expect grub-2.04 in the second half of 2018 and I'm looking forward to this release.

    The next presentation that didn't disappoint me was Data integrity protection with cryptsetup tools, which I was especially interested in because my notebook uses full disk encryption with dm-crypt + LUKS and I wondered if the new LUKS2 + data integrity is ready for production usage. In my opinion it's not yet ready, as the only cipher combination that can be safely used decreases the read/write performance quite a lot. But new ciphers are around the corner, and once their specs have been finished and the ciphers are fully developed this might change for the better.

    Later I was at haubi's Unix? Windows? Gentoo! presentation, which I visited because it was the only Gentoo-related presentation at FOSDEM this year, but in the end I learned quite a bit about Gentoo prefix in general and the new stuff that haubi is working on for Gentoo prefix. I must admit that I never paid much attention to Gentoo prefix because I never had a use case that required it. But I learned at FOSDEM that Gentoo prefix seems to attract quite a lot of interest, and some visitors I talked to at the Gentoo stand were very interested in that specific part of Gentoo.

    Somewhere around noon, Amynka showed up at the Gentoo stand. So I had the pleasure to finally meet one of our most active Gentoo recruiters as well. Unfortunately I missed a very well done prank she played on monsieurp but even getting told about it was quite hilarious.
    While being in front of the Gentoo stand talking to some interested visitors, a guy walked by handing out Wireguard stickers. It turned out that he was zx2c4 who is not only a Gentoo developer but also the author behind Wireguard.

    During the day I also went to the VideoLAN stand together with lu_zero in the hope of winning a T-Shirt. Unfortunately all I won was a photo with a guy wearing a traffic cone costume, but at least that's better than nothing and the guy was really fun to talk to. I took the opportunity to talk to some VideoLAN guys about the not so good support of DVD menus in current stable VLC. They asked me to give their latest development release a try, as they claimed to have done lots of work on improving DVD menu support in it. I will do so once I find the time to finally watch some DVD again.
    Next to the VideoLAN stand was the stand of 0ad, whose Gentoo package I happen to maintain from time to time. Unfortunately I don't have much time to play that game although it's a really nice looking game. I told them about my issues with their (sorry guys) crappy build system and that I would love to see them overhaul it.
    Finally I approached the mozilla stand and talked to a thunderbird developer(?) about why Gentoo uses the thunderbird sources to provide the seamonkey package to our users. At the end I drifted off a bit, expressing my concerns about the latest firefox releases and why I think they did their users no great service by removing the old extensions API.

    I also met a CAcert guy at FOSDEM and we had a long chat about CAcert's current situation and its future, which seems to be a bit uncertain right now. They are in desperate need of motivated people to help them with software development, support and assurance tasks. So if you are - like me - no friend of letsencrypt and still want some CA where you can get free certificates, please consider supporting CAcert. Not necessarily by donating money but by offering some of your time to help improve CAcert.

    Summary

    I should have done that earlier. Being away from such conventions for nearly eight years was definitely too long. It was a great pleasure meeting all those Gentoo developers (even if it was only for a couple of minutes like with zx2c4).
    I'd love to do this more often, but having new family obligations makes it more complicated and money plays a role in this as well. Time will tell if I can make it to future conventions. At least I had a great weekend and lots of good new memories.

    The Long Dark Story Mode

    Finally, The Long Dark has been released with the long-awaited story mode.
    Unfortunately, since their last pre-release update "Faithful Cartographer", they have also introduced a couple of bugs which I didn't notice at first (and simply thought were intended changes in their UI). Searching for other players having these issues, I found this posting in the Steam Forums. The solution suggested there didn't work for me because it's rather hard to pass command line options to the game via the GOG scripts. So I tried a different approach and wrote a small script which fixes all the issues that have been described in that Steam Forum posting:

    CODE:
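    #!/bin/bash
    # Wrapper: adjust the Unity prefs, then start the GOG build of the game.

    GAME_DIR="${HOME}/GOG Games/The Long Dark"
    PREF_FILE="${HOME}/.config/unity3d/Hinterland/TheLongDark/prefs"

    # Only touch the prefs file if the game has already created it.
    if [[ -f "${PREF_FILE}" ]] ; then
            # On the "Screenmanager Is Fullscreen mode" line,
            # change the first "1" to "0".
            sed '/Screenmanager Is Fullscreen mode/s|1|0|' \
                    -i "${PREF_FILE}"
    fi

    # Launch the game from its own directory via the GOG start script.
    cd "${GAME_DIR}" && ./start.sh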




    Also I used some different commands to find out the game's dependencies on Gentoo Linux and these commands revealed some more needed packages:

    CODE:
    > find ${HOME}/GOG\ Games/The\ Long\ Dark -type f -print0 | xargs --null --no-run-if-empty scanelf -L -n -q -F '%n #F' | tr , '\n' | xargs --no-run-if-empty readlink -f | uniq | xargs --no-run-if-empty qfile -CSq | sort -u | grep -v "^${HOME}"
    dev-libs/glib:2
    dev-libs/wayland:0
    media-libs/alsa-lib:0
    media-libs/libsdl2:0
    media-libs/mesa:0
    media-libs/openal:0
    media-sound/pulseaudio:0
    sys-libs/glibc:2.2
    sys-libs/zlib:0
    x11-libs/gdk-pixbuf:2
    x11-libs/gtk+:2
    x11-libs/libX11:0
    x11-libs/libXScrnSaver:0
    x11-libs/libXcursor:0
    x11-libs/libXext:0
    x11-libs/libXi:0
    x11-libs/libXinerama:0
    x11-libs/libXrandr:0
    x11-libs/libXxf86vm:0
    x11-libs/libxkbcommon:0