Some common characteristics of what appear to be blind data-collecting robots:
IP address starting with 159.138
HTTP headers:
HTTP_REFERER: (none)
HTTP_ORIGIN: (none)
HTTP_ACCEPT_LANGUAGE: "zh-CN,zh;q=0.9"
HTTP_USER_AGENT: many different values (probably all "faked"), but apparently always containing "AppleWebKit"
not using cookies
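The pattern above could be written down as a simple heuristic check. This is only a minimal sketch, assuming the request data is available as a dict of CGI-style header names (as in the list above), the client IP as a string, and a dict of received cookies. The function name `looks_like_probable_bot` is hypothetical, and requiring every trait to match is my own assumption, not a proven bot signature.

```python
from typing import Mapping


def looks_like_probable_bot(ip: str, headers: Mapping[str, str],
                            cookies: Mapping[str, str]) -> bool:
    """Return True only if the request matches every trait observed so far.

    Hypothetical heuristic: the 'all traits must match' rule is an assumption.
    """
    checks = [
        ip.startswith("159.138."),                                 # observed IP prefix
        not headers.get("HTTP_REFERER"),                           # no referer sent
        not headers.get("HTTP_ORIGIN"),                            # no origin sent
        headers.get("HTTP_ACCEPT_LANGUAGE") == "zh-CN,zh;q=0.9",   # exact language string seen
        "AppleWebKit" in headers.get("HTTP_USER_AGENT", ""),       # varying UA, always WebKit
        not cookies,                                               # no cookies sent back
    ]
    return all(checks)


# Example request resembling the pattern above (header values are made up)
sample_headers = {
    "HTTP_ACCEPT_LANGUAGE": "zh-CN,zh;q=0.9",
    "HTTP_USER_AGENT": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
}
print(looks_like_probable_bot("159.138.1.2", sample_headers, {}))  # True
```

Requiring all traits keeps false positives low: a real visitor from the same range who sends a cookie or a referer would not be flagged.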
It seems to me that blocking by IP would actually be the most reasonable option, at least for some days or weeks, in the hope that the robots would then stay away for a while.
Searching around on the internet, I found that others have been reporting the same problem with similar IP ranges for a few months now.
It seems to me that requiring login for the whole forum does not have much lasting effect. The bots probably cannot read and understand that login is required, or recognize that every URL they try simply shows the same message.
As long as there is anything to see, perhaps even small changes (like the content of the recent topics list and the shoutbox), the site is probably "interesting" enough for them to keep collecting data.
The bots are visiting many URLs that currently cannot be reached through any link by visitors who are not logged in. So they must rely on some memory of pages they visited before, or of links they collected earlier.
There are thousands of possible URLs in the forum, so they could keep themselves busy with that for a very long time.
I think maybe only a "404 - Not Found" or "410 - Gone" answer, if repeated to them long enough, could make them "forget" this site and leave it alone for a longer while.
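Answering suspected bots with "410 Gone" could look roughly like this. It is a minimal sketch written as a generic Python WSGI middleware (PEP 3333), assuming the forum can be wrapped that way; the actual forum software may use a different stack, and the names `gone_for_bots`, `is_bot` and `forum_app` are hypothetical. `is_bot` can be any check based on the characteristics listed at the top. Whether bots really "forget" a site after enough 410 answers is only a hope, not something HTTP guarantees; 410 merely states that the resource is intentionally gone.

```python
def gone_for_bots(app, is_bot):
    """Wrap a WSGI app so that requests flagged by is_bot(environ) get '410 Gone'."""
    def wrapper(environ, start_response):
        if is_bot(environ):
            body = b"410 Gone\n"
            start_response("410 Gone", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Normal visitors are passed through to the real application unchanged.
        return app(environ, start_response)
    return wrapper


# Hypothetical stand-in for the forum application
def forum_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"forum page"]


# Example: treat everything from the observed prefix as a bot
app = gone_for_bots(forum_app,
                    lambda env: env.get("REMOTE_ADDR", "").startswith("159.138."))
```

Swapping "410 Gone" for "404 Not Found" is a one-line change; 410 simply says more explicitly that the page will not come back.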
It looks like they are keeping a "slow" pace, so as not to be completely disruptive. They probably want to check every URL at least a few times over a longer period before accepting that it is useless to come back.
So my approach now would be to just collect IP addresses (so far, all starting with 159.138) for a while and check whether they all match the "probable bot" pattern. Then I would block all IPs that seem confirmed as bots for at least a week or two, maybe even better a month.
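The "collect first, then block for a while" approach could be kept in a small in-memory structure like the one below. This is a minimal sketch under several assumptions: the class name `TempBlocklist`, the threshold `min_hits` and the rule that a single normal-looking request clears an IP from automatic blocking are all hypothetical choices, and a real forum would need to persist this data (file or database) rather than keep it in memory.

```python
import time
from typing import Dict, Optional, Set


class TempBlocklist:
    """Hypothetical helper: collect suspicious IPs, block confirmed ones for a fixed time."""

    def __init__(self, min_hits: int = 3, block_seconds: float = 14 * 24 * 3600):
        self.min_hits = min_hits            # assumed: bot-like requests needed to "confirm" an IP
        self.block_seconds = block_seconds  # default two weeks; a month would be 30 * 24 * 3600
        self.hits: Dict[str, int] = {}
        self.cleared: Set[str] = set()      # IPs that ever sent a normal-looking request
        self.blocked_until: Dict[str, float] = {}

    def record(self, ip: str, is_probable_bot: bool, now: Optional[float] = None) -> None:
        """Count a request; block the IP once all of its requests so far look bot-like."""
        now = time.time() if now is None else now
        if not is_probable_bot:
            self.cleared.add(ip)  # did not match the pattern: never block automatically
            return
        self.hits[ip] = self.hits.get(ip, 0) + 1
        if ip not in self.cleared and self.hits[ip] >= self.min_hits:
            # Each further bot-like request pushes the end of the block further out.
            self.blocked_until[ip] = now + self.block_seconds

    def is_blocked(self, ip: str, now: Optional[float] = None) -> bool:
        """Return True while the IP's block is active; expired blocks are removed."""
        now = time.time() if now is None else now
        until = self.blocked_until.get(ip)
        if until is None:
            return False
        if now >= until:
            del self.blocked_until[ip]  # block expired
            self.hits.pop(ip, None)     # start counting afresh
            return False
        return True
```

Passing `now` explicitly makes the behaviour easy to test; in normal use it can be omitted so the current time is used.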
Of course I am not totally sure how far exactly the bots' IP range extends, and such IP ranges could be reassigned one day. But for now, they all seem to come from an IP range that currently (and probably for a longer time) belongs to "Huawei Cloud Services", i.e. a cluster of servers that anyone can rent, not a normal "consumer" internet connection of a smartphone, laptop or desktop computer etc.
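Since the exact extent of the range is uncertain, it may help to express the block as a network range instead of a text prefix, so that it can be widened or narrowed later. This minimal sketch uses Python's standard `ipaddress` module and assumes the range 159.138.0.0/16, i.e. everything matching the observed "159.138" prefix; the real allocation boundaries should be checked with a WHOIS/RDAP lookup, and the names `BLOCKED_NETWORKS` and `in_blocked_range` are hypothetical.

```python
import ipaddress

# Assumed range covering the observed "159.138" prefix.
# Verify the actual allocation (e.g. via WHOIS/RDAP) before relying on it.
BLOCKED_NETWORKS = [ipaddress.ip_network("159.138.0.0/16")]


def in_blocked_range(ip: str) -> bool:
    """Return True if ip lies in a configured network; malformed addresses are not blocked."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False
    # An IPv6 address is simply "not in" an IPv4 network, so no special case is needed.
    return any(addr in net for net in BLOCKED_NETWORKS)
```

More networks can be appended to `BLOCKED_NETWORKS` if the bots turn out to use other ranges, or a narrower prefix such as a /20 could be used if the /16 proves too broad.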