Issue: Bisq startup problem on slow networks ("3/4" problem)


#1

This is a follow-on to the issue discussed in this thread:

Summary:
On slow/congested networks, Bisq never progresses past step 3/4 of the startup sequence, due to a timeout occurring in the initial connection to seed nodes.

Details:
One of the compulsory steps in the Bisq client startup sequence is a special handshake with a seed node, consisting of a two-way exchange of large data packets in the 1.5 MB size range (PreliminaryGetDataRequest in the debug log). On slow networks, specifically when the outbound network channel is degraded, this handshake fails consistently, so the client cannot start up (the “3/4” problem).

Bisq versions tested: 0.8.1, 0.9.3, 0.9.4, 0.9.5
Test platform: Tails 3.10, 3.11, 3.12 (Debian 9)

To reproduce the bug, use a working Bisq installation and limit the network output bandwidth to a small value (100 kbps in my example) using the Linux TBF qdisc.
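
To put the 100 kbps figure in perspective, here is a back-of-the-envelope estimate of how long a single payload of that size needs on the throttled link. The function name `transfer_secs` is just an illustration, and the calculation assumes 1 MB = 1,000,000 bytes and ignores Tor and protocol overhead, so the real figure will be higher.

```shell
# Rough transfer-time estimate for one handshake payload (whole seconds, rounded up).
# Assumptions: 1 MB = 1,000,000 bytes, 1 kbit = 1,000 bits, no Tor or protocol overhead.
transfer_secs() {
    bytes=$1        # payload size in bytes
    rate_bps=$2     # link rate in bits per second
    echo $(( (bytes * 8 + rate_bps - 1) / rate_bps ))
}

transfer_secs 1500000 100000   # 1.5 MB at 100 kbit/s, prints 120
```

So the outgoing half of the exchange alone needs at least about two minutes at this rate. I have not checked which timeout value Bisq applies to this request, but it would have to be longer than that for the handshake to complete.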

Steps to reproduce:

  • start with a correctly installed and tested operational copy of Bisq
  • close the application
  • in a terminal, execute (replace eth0 with the name of the default route interface):
    sudo tc qdisc add dev eth0 root tbf rate 100kbit burst 1540 latency 50ms
  • start the application as usual
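
If you are unsure which interface carries the default route, the following sketch may help. The helper name `default_iface` is hypothetical; it only parses the usual `ip route` output format and is not part of any tool. The `ip` and `tc` invocations are left as comments so that pasting the snippet has no side effects.

```shell
# Print the interface named after "dev" on the first default-route line read from stdin.
default_iface() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++)
            if ($i == "dev") { print $(i + 1); exit }
    }'
}

# Typical use (needs iproute2):
#   IFACE=$(ip route show default | default_iface)
#   tc qdisc show dev "$IFACE"    # should list a "tbf" qdisc once the limit is active
```

Checking with `tc qdisc show` before starting Bisq confirms that the rate limit is actually in place on the right interface.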

Expected result:
Bisq should complete the startup sequence and operate (with some lag due to network bandwidth limitation).

Actual result:
Bisq never progresses past step 3/4 of the startup sequence and remains completely unusable.

The two log file fragments in post 22 of the original thread illustrate the “expected” vs. “actual” scenarios well.
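
For anyone wanting to compare their own log against those fragments, a minimal sketch for counting the relevant entries follows. The function name `count_handshake_lines` is made up for this example, and the log file path is passed in as an argument because its location depends on the installation.

```shell
# Count log lines that mention the handshake message.
# $1: path to a Bisq log file (location depends on the installation).
count_handshake_lines() {
    grep -c 'PreliminaryGetDataRequest' "$1"
}

# Typical use:
#   count_handshake_lines /path/to/bisq.log
```

A count that keeps growing while the startup screen stays at 3/4 would suggest that the request is being retried without ever completing, though that is only a guess on my part.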

Happy to assist with any debugging information, just let me know.


#2

Thanks for the info. I forwarded it to our P2P network dev.


#3

No problem @ManfredKarrer, it’s my pleasure to help.

I have done more testing on this issue and have found another interesting quirk. It seems that a network degradation during program startup can leave the program in a dead-end state from which it can never recover on its own, even after full network recovery. The only way to recover is to restart the program.

Here’s the test sequence:

  • like above, start with a working installation
  • close application
  • degrade network: tc qdisc add ...
  • start application
  • wait 3 hours, there should be no progress past step 3/4
  • remove network degradation: tc qdisc del ...
  • wait indefinitely, there will be no progress past step 3/4, even 12 hours later
  • close application
  • restart application without any change to network parameters
  • full startup sequence occurs as expected
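
The degrade and restore steps of the sequence above can be sketched as two small helpers. The names `degrade_net` and `restore_net`, and the `IFACE` and `RUN` variables, are my own for this example. The delete command is the standard way to remove a root qdisc and is my assumption for the abbreviated "tc qdisc del ..." step. By default the helpers only print the commands; set RUN=sudo to actually apply them.

```shell
IFACE="${IFACE:-eth0}"   # assumption: replace with the default route interface
RUN="${RUN:-echo}"       # dry run by default: commands are printed, not executed

# Apply the same TBF rate limit as in the first post.
degrade_net() {
    $RUN tc qdisc add dev "$IFACE" root tbf rate 100kbit burst 1540 latency 50ms
}

# Remove the root qdisc again, restoring normal bandwidth.
restore_net() {
    $RUN tc qdisc del dev "$IFACE" root
}

# Typical use:
#   RUN=sudo; degrade_net    # then start Bisq and wait
#   RUN=sudo; restore_net    # with Bisq still running, then keep waiting
```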

Due to time constraints I was only able to do the test once, but it would be interesting to see if other people can reproduce it. If confirmed, then the existence of a dead-end program state might explain some of the other out-of-sync issues reported recently.


#4

Thanks for the report and testing!
That behaviour is expected: the startup procedure has no recovery mechanism built in. Only once you are connected would the application recover from a lost network connection.
But it would definitely be good to get those issues solved. It is best to get in touch with Florian, as I don’t have time at the moment to focus on that. I don’t think Florian is active here on the forum, so it is better to use GitHub.


#5

Thanks, will do.