
Software nostalgia discussion

As an old C programmer, I am enjoying some of the new tools like VSCode and PlatformIO for programming my ESP32s. Really fun, but under the covers it’s all the same as it was in the 1980s.

Fly more.
LSGY, Switzerland

From Peter: (by the way, I know a bit about compilers, but can’t figure out how to quote from the forum without cutting and pasting and adding the block quote manually :( )

And nobody is going to put any effort into optimising tools which being GNU are a give-away.

I would disagree slightly: today’s compilers are quite amazing, and free, and companies invest a lot in optimising them and contribute the work back to the community.

I actually attended this talk and I thought it was quite entertaining:



Last Edited by roznet at 25 May 09:54
EGTF, United Kingdom

And nobody is going to put any effort into optimising tools which being GNU are a give-away.

There are only three compilers that matter nowadays: GNU (gcc etc), LLVM (Clang), and Microsoft. The first two are indeed free, but they are very actively developed and supported, funded by all the big names. As the video says, they are truly amazing. I use GCC and I can tell you that it produces far better code than anyone would dare to write in assembler (because it would be unmaintainable).

LFMD, France

for programming my ESP32s

One hopes China never messes about in Taiwan, or anywhere else. That’s the price paid for the ESP32 being cheap and coming with a good pre-cooked software package.

but can’t figure out how to quote from the forum without cut paste and add block quote manually :(

See Posting Tips – I just use the bq. method.

There are only three compilers that matter nowadays, GNU (gcc etc), LLVM (Clang), and Microsoft.

Yes; I use GCC for ARM32 (ST Cube IDE). It does produce very clever code, especially at higher optimisation levels like -O3. But it can also bite you on the bum with stuff like automatic loop-idiom recognition replacing your loops with calls to memcpy() etc., which is fine only if you actually have memcpy() available at runtime; the compiler doesn’t check that, and it isn’t always true for things like loaders. Also your memcpy() might be poorly optimised; there is the whole Newlib v. Newlib Nano debate, with more variations than anybody can understand. That’s before you get onto things like a printf library which uses the heap (!!) and isn’t thread-safe, a) due to that and b) due to the heap API also not being thread-safe. I had to dive into all that and mutex it appropriately. I know C does not officially guarantee thread-safety, but this is not 1990 anymore.

I haven’t watched the whole 1hr video but I get the drift. A lot of people worked on a lot of tricks but in the end the benefits are tiny because in most products 90% of the time is spent in 10% of the code (or some such – probably even more extreme in most cases) and a human can optimise the 10% by knowing what it is supposed to do. And a lot of clever code can make debugging very hard because e.g. conditional instructions don’t often generate code on which you can set a breakpoint, and if you build with say -O0 then you have to do an impossible amount of regression testing when you build the “release” version with -O3. Hence I use -Og the whole time and then Debug = Release.

EDIT: Watching a lot of the video convinces me even more that the esoteric optimisation effort is largely wasted. The example shown switches bytes around

but how often does one need to do that in a time-critical application? I’d say never.

Administrator
Shoreham EGKA, United Kingdom

Peter, don’t hate me for contradicting you again – I love what you do… – but one example of needing to switch bytes in a time-critical application is processing network messages of different protocols in low-latency communication with financial stock exchanges, which is an area Matt in the video worked in.

This kind of code is highly optimised for high-frequency processing and is highly time-critical.
As a little side story, I seem to recall Matt wrote Compiler Explorer precisely because he was trying to optimise this kind of code and to make sure it saved every possible cycle.

Other applications where bitwise operations require optimisation are cryptography and computer graphics, especially real-time game graphics.

I agree with your other statement though: for most applications the problem is bloat, with 90% of the time spent in some silly wrapper on top of wrapper on top of high-level abstractions, and an advanced compiler won’t help with that…

Last Edited by roznet at 25 May 21:18
EGTF, United Kingdom

I thought that application used a lot of FPGAs, which are far faster than software could ever be. Is that the one where all the users in a building are fed via fibres cut to equal length so that all the dealers get the price feed at the same time (within nanoseconds)? There was a guy on here who works/worked in that business, but I think he’s gone.

One challenge in cunning optimisations is that you have to write the C code such that the compiler recognises the pattern. The video presenter came across one such example in GCC – the weird optimisation where you concurrently test for four possible byte values by exploiting a side effect of a 32-bit comparison. In GCC it was dependent on the order in which the four values were listed, which makes it sensitive to coding style. And if you are going to accept a coding-style dependency as the price of getting the highest performance, why not just use good old assembler?

Same with other stuff like counting all the “1” bits in a word. One could write that function in lots of different ways, and probably the fastest would be a lookup table indexed by each byte, adding the results up.

Crypto is another area I have spent time on and in general it is addressed with lookup tables, which are sometimes huge (megabytes). All the “s-box” based ciphers benefit from tables.

Another thing, relevant to management of projects where the product has a long market life but may need a revisit periodically, is that if your code relies on shaving off every last cycle to function, you need to archive not just the source but also the tools – probably in a VM. But almost nobody does that.

IMHO, effort should go into improving the absolutely horrible linker script syntax; I’ve wasted days on that stuff.
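For reference, this is the syntax in question – a minimal GNU ld fragment with a hypothetical ARM memory map (addresses and sizes made up):

```ld
MEMORY
{
  FLASH (rx)  : ORIGIN = 0x08000000, LENGTH = 512K
  RAM   (rwx) : ORIGIN = 0x20000000, LENGTH = 128K
}

SECTIONS
{
  .text : { *(.text*) *(.rodata*) } > FLASH
  .data : { *(.data*) } > RAM AT > FLASH   /* load in flash, run in RAM */
  .bss  : { *(.bss*)  *(COMMON) } > RAM
}
```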

Administrator
Shoreham EGKA, United Kingdom

And probably the fastest would be a lookup table for each byte, and add them all up.

Hard to beat the POPCNT instruction (ARM has one too).

Intel has instructions for the common crypto stuff too.

I’ve spent the last ten years working on super-high-performance network software – we do intensive packet processing at 10 Gbits/sec. It’s not trivial, and a huge amount of work has gone into performance (e.g. ensuring everything is in the L1 cache when needed, and avoiding locks and atomic operations in the data path). There is exactly one place where we use a dozen lines of assembler. Everything else is in C++. The compiler produces code which is way better than anything you would do by hand.

LFMD, France

Hard to beat the POPCNT instruction (ARM has one too).

Sure, but I was talking about doing it in C.

The challenge of writing in C and expecting the compiler to recognise the code as e.g. counting 1s, and drop in POPCNT or whatever, is that it is

  • code style dependent
  • compiler version dependent

and if the code really is critical then a compiler upgrade could break it. In your firm you need to be very careful with compiler upgrades, and have a regression test suite.

The compiler produces code which is way better than anything you would do by hand.

That, however, does not withstand logical scrutiny: the compiler cannot be more clever than the compiler writer.

It’s a hollow argument though, because while asm written by somebody clever will (must) always outperform a compiler, you would never get anything finished these days. I was doing asm for about 30 years, and the only reason I got some (very good) stuff done is that in the old days products didn’t need to be so sophisticated. Today, you might spend 3 months coding the functionality and then 10 man-years coding the connectivity (Ethernet, TCP/IP, TLS, etc). So you need libs for the latter… another debate. Obviously there are other problems with asm too, like documentation (which coders hate doing, also for best job security), so asm is mostly unmaintainable.

Administrator
Shoreham EGKA, United Kingdom

Obviously there are other problems with asm too, like documentation (which coders hate doing, also for best job security, so asm is mostly unmaintainable).

I was at a fun retro event at the National Museum of Computing last weekend (an Econet LAN party – Econet was Acorn’s networking in the 1980s, a low-cost bus network based on the Motorola 68B54 ADLC), trying to reverse engineer some 6502 asm I wrote when I was 15. I think I spent most of the two days saying “This doesn’t make any sense at all!” – mostly because back in the day I wrote extremely bad 6502 asm (lots of self-modifying code for no good reason).

I did figure out what the code did in the end, but decided if I wanted to actually run the networking programs I’d written back then, rewriting them would probably be the best option!

Last Edited by alioth at 26 May 10:25
Andreas IOM

Self-modifying code is quite esoteric.

I never actually did that, but a candidate for it would be the Z80 indexed addressing instructions like

ld d, (ix+23)

where the 23 is stored as a byte inside the instruction, so if the code is in RAM you can modify that byte. It doesn’t really save any time though.

Administrator
Shoreham EGKA, United Kingdom