Let's talk about the test/build environment

mlampert
Veteran
Posts: 1772
Joined: Fri Sep 16, 2016 9:28 pm

Re: Let's talk about the test/build environment

Post by mlampert »

I have verified that the file exists and can be accessed, so it doesn't seem to be a permission or installation issue.

I tried a few things, and none of them had an impact:
* disabling the 2 test cases which use the offending file - all other tests pass
* moving the document loading into the setUp function instead of doing it in a class initializer
* using FreeCAD.openDocument() instead of FreeCAD.open()
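
As an illustration of the second and third attempts, here is a minimal sketch of a test case that loads its document in setUp rather than at class-definition time. The file path and the test body are hypothetical (the thread does not name the offending file), and the test skips itself when the FreeCAD module or the file is not available.

```python
import os
import unittest

try:
    import FreeCAD  # only importable inside a FreeCAD Python environment
except ImportError:
    FreeCAD = None

# Hypothetical path; the thread does not name the offending file.
TEST_DOCUMENT = os.path.join("PathTests", "example.fcstd")


class TestWithDocument(unittest.TestCase):
    def setUp(self):
        if FreeCAD is None:
            self.skipTest("FreeCAD module not available")
        if not os.path.exists(TEST_DOCUMENT):
            self.skipTest("test document not found: " + TEST_DOCUMENT)
        # Load the document for each test instead of once in a class initializer.
        self.doc = FreeCAD.openDocument(TEST_DOCUMENT)

    def tearDown(self):
        # Close the document so that tests do not leak state into each other.
        doc = getattr(self, "doc", None)
        if doc is not None:
            FreeCAD.closeDocument(doc.Name)

    def test_document_has_objects(self):
        # Placeholder assertion; a real test would check Path-specific content.
        self.assertGreater(len(self.doc.Objects), 0)
```

Loading in setUp means a load failure is reported against the individual test rather than as an import-time error for the whole module.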

I found this thread about the same error: https://forum.freecadweb.org/viewtopic.php?f=10&t=54501

After reading through that, I unpacked the .fcstd file and checked the validity of Document.xml. I also compared the Document.xml of the new version against the Document.xml of the version currently in master, and other than a few minor changes I didn't spot any big differences. I couldn't find a DTD file to verify the file against, but it seems to be well formed. I've attached it in case somebody knows what to look for.
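
The check described above can be sketched with the Python standard library alone, since an .fcstd file is a zip archive that contains Document.xml. The function name is made up for this example, and it only tests archive integrity and XML well-formedness; it does not validate against any schema.

```python
import zipfile
import xml.etree.ElementTree as ET


def check_document_xml(fcstd_path):
    """Return (ok, message) for the Document.xml inside an .fcstd archive."""
    try:
        with zipfile.ZipFile(fcstd_path) as archive:
            # testzip() returns the name of the first corrupt member, or None.
            bad_member = archive.testzip()
            if bad_member is not None:
                return False, f"corrupt archive member: {bad_member}"
            if "Document.xml" not in archive.namelist():
                return False, "Document.xml is missing from the archive"
            with archive.open("Document.xml") as xml_file:
                # parse() raises ParseError if the XML is not well formed.
                root = ET.parse(xml_file).getroot()
    except zipfile.BadZipFile as exc:
        return False, f"not a valid zip archive: {exc}"
    except ET.ParseError as exc:
        return False, f"Document.xml is not well formed: {exc}"
    return True, f"well formed, root element <{root.tag}>"
```

Calling check_document_xml() on both the new file and the version in master would give a quick first comparison before diffing the XML by hand.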

There are other .fcstd files in the PathTest directory which are loaded and used by other unit tests - files I also had to change to deal with the new python class structure - and those tests are passing.

Anything else I should or can try?
Attachments
Document.xml (261.09 KiB)
openBrain
Veteran
Posts: 9034
Joined: Fri Nov 09, 2018 5:38 pm

Post by openBrain »

mlampert wrote: Mon Oct 03, 2022 7:08 am Anything else I should or can try?
OK, I tried some things out here.

I checked out your PR on a local branch then pushed it to my fork and finally triggered the CI manually.
And everything went well : https://github.com/0penBrain/FreeCAD/ac ... 3180547170

As I suspected, the problem is certainly due to GH caching: when I ran the CI workflow a second time, it failed with the error already seen: https://github.com/0penBrain/FreeCAD/ac ... 3181812825

I have now disabled the cache and am running the CI once more: https://github.com/0penBrain/FreeCAD/ac ... 3182126354
I expect it to succeed; we'll see in roughly 4 hours.

If it does, I will run some further tries using the "native" GitHub caching action (the workflow currently uses an "enhanced" one, but it isn't at the bleeding edge of development, so it may have some bugs).
I'll keep you updated.
Post by openBrain »

openBrain wrote: Tue Oct 04, 2022 12:38 pm I have now disabled the cache and am running the CI once more: https://github.com/0penBrain/FreeCAD/ac ... 3182126354
I expect it to succeed; we'll see in roughly 4 hours.

If it does, I will run some further tries using the "native" GitHub caching action (the workflow currently uses an "enhanced" one, but it isn't at the bleeding edge of development, so it may have some bugs).
I'll keep you updated.
OK, as expected, the run without cache succeeded.

Now, before trying the other caching action, I have re-enabled the cache and instructed the workflow to export the Path files from source as an artifact in case of failure. Hopefully that will allow some deeper investigation. To be continued...
Post by mlampert »

Oh great, you figured it out. I hadn't seen your message, so, having noticed in the log messages that the CI runs on Ubuntu Jammy, I installed a VM with that and built my branch. And yes, all unit tests pass.

I'm glad the issue is identified, I was about to lose the last of my already rather sparse hair.
Thanks!
Post by openBrain »

mlampert wrote: Tue Oct 04, 2022 11:17 pm I'm glad the issue is identified, I was about to lose the last of my already rather sparse hair.
Thanks!
That's a bit messy to debug. I just changed the GH cache name to generate a new one, and now even subsequent runs don't trigger the error. :?
At the moment, I would suggest offering a mechanism in the CI to force a build without using the cache (something like adding "#nocache" to the commit message).
We could also add a "#debug" tag that would upload the build files and the Ccache cache as artifacts.
It would be good to have some opinions from the mergers before putting in the effort. @uwestoehr @chennes @wmayer
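
To make the proposal concrete, here is a minimal sketch of how a CI helper script could detect such tags in a commit message. The tag names come from the suggestion above; the function name and the rule that a tag must be preceded by whitespace are assumptions of this example, not an existing part of the FreeCAD workflow.

```python
import re

# Tag names taken from the proposal above; nothing like this exists yet.
KNOWN_TAGS = ("nocache", "debug")


def parse_ci_tags(commit_message):
    """Return a dict mapping each known tag to True if '#tag' is in the message."""
    # A tag must start the message or follow whitespace, so "issue#debug" is ignored.
    found = {m.lower() for m in re.findall(r"(?<!\S)#(\w+)", commit_message)}
    return {tag: tag in found for tag in KNOWN_TAGS}
```

A workflow step could call parse_ci_tags() on the head commit message and skip the cache-restore step when "nocache" is set, or upload extra artifacts when "debug" is set.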
wmayer
Founder
Posts: 20243
Joined: Thu Feb 19, 2009 10:32 am

Post by wmayer »

So, you say the problem is caused by the caching. Does it mean that between two commits the system doesn't clean the build directory of the previous commit and it uses the old directory structure of the Path module which then leads to the failure to load the one file correctly?

If yes, then it should be possible to reproduce the issue locally by building this PR without purging the current build directory of the Path module? Or do I get it wrong?
Post by openBrain »

wmayer wrote: Wed Oct 05, 2022 10:59 am So, you say the problem is caused by the caching. Does it mean that between two commits the system doesn't clean the build directory of the previous commit and it uses the old directory structure of the Path module which then leads to the failure to load the one file correctly?

If yes, then it should be possible to reproduce the issue locally by building this PR without purging the current build directory of the Path module? Or do I get it wrong?
That's not it, but the mistake is all mine: my previous posts made it easy to read them that way. :)
The 'build' directory is always empty for each workflow run. Only the Ccache cache is cached from one run to the other.

I haven't found the real root cause yet (and I'm not sure I will). What I saw is that when I disabled the cache, it ran well. Since then I have not been able to reproduce the problem with a new cache name (maybe the GH cache name is the issue; highly improbable, but who knows?).

Maybe we are in a very improbable (yet possible) case where a Ccache hash (used to identify available precompiled files in the cache) matches a former file (a different file, or a different version of the same one), and that messes up the build.
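
For readers unfamiliar with the idea, here is a toy model of a compile cache keyed by a hash of its inputs. It is only a sketch: SHA-256 stands in for whatever hash Ccache actually uses, the real key covers more inputs than source text and flags, and the "compilation" is faked.

```python
import hashlib


class ToyCompileCache:
    """Toy content-addressed cache: results are looked up by a digest of the inputs."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(source, flags):
        # Digest over the source text and the compiler flags (assumed inputs).
        digest = hashlib.sha256()
        digest.update(source.encode("utf-8"))
        digest.update(b"\0")
        digest.update(" ".join(flags).encode("utf-8"))
        return digest.hexdigest()

    def compile(self, source, flags):
        k = self.key(source, flags)
        if k in self._store:
            # Same digest: the stored result is reused without recompiling.
            self.hits += 1
            return self._store[k]
        self.misses += 1
        result = f"object({len(source)} bytes, {' '.join(flags)})"  # fake output
        self._store[k] = result
        return result
```

In this model a stale result is reused only if two different inputs produce the same digest, which is the improbable case described above. A restored cache that is damaged in some other way would be a separate failure mode that this sketch does not cover.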
Post by wmayer »

openBrain wrote: Only the Ccache cache is cached from one run to the other.
But we should keep using it, as otherwise every commit will take a lot of time.
openBrain wrote: I haven't found the real root cause yet (and I'm not sure I will). What I saw is that when I disabled the cache, it ran well. Since then I have not been able to reproduce the problem with a new cache name (maybe the GH cache name is the issue; highly improbable, but who knows?).
But does the issue only occur in the context of this PR when ccache is enabled? Is it possible to clean the cache manually, and what happens if the PR is merged and a new change is committed? Is the error then gone, or will it reappear?
Post by openBrain »

wmayer wrote: Wed Oct 05, 2022 11:38 am But we should keep using it as otherwise every commit will take a lot of time.
I agree. That's why my idea was to have a tag that, when present in the commit message, makes the workflow ignore the cache.
wmayer wrote: But does the issue only occur in the context of this PR when ccache is enabled?
To be precise, when a former Ccache cache is loaded. In my tests, Ccache is still enabled; its cache is just empty at the start.
wmayer wrote: Is it possible to clean the cache manually
Yes, you most probably have the rights to do so: https://docs.github.com/en/rest/actions ... -cache-key
wmayer wrote: and what happens if the PR is merged and a new change is committed? Is the error then gone, or will it reappear?
I don't know exactly how this error may propagate (mainly because I don't know exactly where it comes from).
Whether the PR is merged or not doesn't change anything: the cache is uploaded as soon as the workflow runs. The current naming of the GH cache is based only on the datetime.
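
As a sketch of what cleaning the cache manually could look like, the snippet below builds (but does not send) a request to GitHub's REST endpoint for deleting Actions caches by key. Whether this is exactly the page linked above is an assumption, since the link is truncated; the owner, repository, cache key and token values are placeholders.

```python
import urllib.parse
import urllib.request


def build_cache_delete_request(owner, repo, cache_key, token):
    """Build a DELETE request for the Actions caches that match cache_key."""
    query = urllib.parse.urlencode({"key": cache_key})
    url = f"https://api.github.com/repos/{owner}/{repo}/actions/caches?{query}"
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={
            "Accept": "application/vnd.github+json",
            # The token needs permission to manage Actions caches on the repo.
            "Authorization": f"Bearer {token}",
        },
    )
```

Passing the returned object to urllib.request.urlopen() would perform the deletion, so it is worth double-checking the key first.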
Post by wmayer »

openBrain wrote: Wed Oct 05, 2022 11:58 am I don't know exactly how this error may propagate (mainly because I don't know exactly where it comes from).
Whether the PR is merged or not doesn't change anything: the cache is uploaded as soon as the workflow runs. The current naming of the GH cache is based only on the datetime.
Well, my idea was to merge the PR and then clean the cache manually. Then committing a new change triggers a rebuild from scratch and creates a new and hopefully valid cache.