[Sketcher] (Possible?) infinite loop and degenerate cases in solver diagnosis

jnxd · Post by **jnxd** » Fri Nov 18, 2022 7:13 am

There is a while (1) loop in GCS::System::identifyConflictingRedundantConstraints(). There is a possibility that this does not exit leading to FreeCAD hanging. This thread is to discuss what can be done about it.

Originally reported here: https://forum.freecadweb.org/viewtopic. ... 80#p639280. This particular hang happens with some unmerged features (tangent at B-spline knot), but there is a small possibility this can be triggered otherwise as well.

In the particular case @abdullah mentioned, the problem is happening because of two factors 1) internal alignment constraints are not intended to be removed; 2) the solution converged onto is a degenerate case. In the solution The end points of the line are "almost" coincident (10e-14 difference). This leads to some really large numbers in the Jacobian (in other cases it can also become nan on division by zero). I was only able to print R (a parameter which comes from QR decomposion) and there were some 10e+17 elements on its diagonal. Because of this, a 32 constraint system only has rank of just 2.

(My debug build is not breaking for some reason, possibly address sanitization, so I can't give exact details)

Here's an example attempted with existing code, with B-spline replaced by a parabola. Try to make the line and parabola tangent.

impossible-tangent-parabola.FCStd: (4.24 KiB) Downloaded 20 times

This does not lead to an infinite loop, but the internal constraints are still suggested for deletion (in fact far too many are).

@abdullah did you have the time to look into this since we last discussed it? If yes could you share any ideas? Thanks.

abdullah · Post by **abdullah** » Fri Nov 18, 2022 8:06 am

jnxd wrote: ↑Fri Nov 18, 2022 7:13 am @abdullah did you have the time to look into this since we last discussed it? If yes could you share any ideas? Thanks.

I changed recently the behaviour of the popularity contest algorithm to avoid suggesting internal alignment geometries, when possible.

We know that without this change, this while (1) loop is not a problem. Still, we know that we want to avoid suggesting removal of internal alignment geometries, when possible.

I have not taken the time yet to look at the algorithm, why this is happening with your branch, and what should be the best solution to it.

My stance is as follows:
1) We need to change the algorithm, to ensure that it does not hang, even when constrains are numerically unstable in a region. A hang is a hang and should be prevented.
2) Changing the algorithm needs to be done before release for sure. At the same time, I think it would help not to change it immediately. The current version of the algorithm has been deployed for some time and there is not a single report of a hang (I have warned about the issue to some of those helping in the help forum, to track the issue). Reports would help to better understand the extent of what needs to be fixed. Today, it is only using your branch that it hangs. The solution offered may be different depending on the scope.

jnxd wrote: ↑Fri Nov 18, 2022 7:13 am In the particular case @abdullah mentioned, the problem is happening because of two factors 1) internal alignment constraints are not intended to be removed; 2) the solution converged onto is a degenerate case. In the solution The end points of the line are "almost" coincident (10e-14 difference). This leads to some really large numbers in the Jacobian (in other cases it can also become nan on division by zero). I was only able to print R (a parameter which comes from QR decomposion) and there were some 10e+17 elements on its diagonal. Because of this, a 32 constraint system only has rank of just 2.

Internal alignment constraints should not be intended to be removed. If a situation arises where popularity contest cannot finish (i.e. hangs) while still ignoring internal alignment constraints (be it because we limit the number of runs of the loop, or because we can identify a priori that it won't finish), then we need to abort and convey this finding to the user (new error message). This should be very rare (currently only reproducible in your branch).

I cannot know without looking at the very specific of your degenerate case, which I have not done yet, what the actual problem is. The fact that the line converges to a point suggests that the constraint system is conflicting (which indeed can be due to errors due to numerical unstability). Specially for a tangency line to Curve, if the only solution the solver can get is to make the line a point. You should look into J (the jacobian) to see which numbers are being given by the constraints, to see if some numbers are zero, very small (e-8 or lower) or very high [i.e. checking the input data, following the garbage in-garbage out popular model]. Then, look to R, before and after doing the zero elimination, together with the mapping of which row of R corresponds to which which solver constraint of the original system. Before zero elimination, look for very big values and very small values. After elimination the same, but it will be easier in some cases to identify which constraints are creating the problem. I say both, before and after, because of the same garbage-in garbage-out principle. If the input data is not reasonable, the zero elimination (involving divisions) will only produce more unreasonable results.

One note however, R is not the reason that a 32 constraint system only has rank 2. R can, at best, show that the data input to the Jacobian is not numerically stable. The reason often is that the values provided by the constraint in the specific situation are not reasonable. Please, realise that all this Jacobian-QR decomposition is not about convergence to a solution (DogLeg) in the sense that the line is made into a point, but diagnosis of the equation system.

jnxd wrote: ↑Fri Nov 18, 2022 7:13 am Here's an example attempted with existing code, with B-spline replaced by a parabola. Try to make the line and parabola tangent.
impossible-tangent-parabola.FCStd

This does not lead to an infinite loop, but the internal constraints are still suggested for deletion (in fact far too many are).

Here I do not think there is a case of constraint lack of numerical stability. The situation is indeed impossible. The interesting part would be to investigate what happens inside the popularity contest, that in this case it does not hang, which could help identify what makes it hang with the B-Spline.

jnxd · Post by **jnxd** » Fri Nov 18, 2022 1:27 pm

My stance is as follows:
...
2) Changing the algorithm needs to be done before release for sure. At the same time, I think it would help not to change it immediately. The current version of the algorithm has been deployed for some time and there is not a single report of a hang (I have warned about the issue to some of those helping in the help forum, to track the issue). Reports would help to better understand the extent of what needs to be fixed. Today, it is only using your branch that it hangs. The solution offered may be different depending on the scope.

The problem with this algorithm is that it is not completely robust, or it's assumptions are not sufficiently documented, or at least that recent changes are breaking these assumptions.

Tangent-on-spline isn't the only branch where this hang occurs: certain corner cases of knot constraints also have this issue. Besides, even without the hang, it can suggest internal alignments for removal, as demonstrated with the parabola case.

Internal alignment constraints should not be intended to be removed. If a situation arises where popularity contest cannot finish (i.e. hangs) while still ignoring internal alignment constraints (be it because we limit the number of runs of the loop, or because we can identify a priori that it won't finish), then we need to abort and convey this finding to the user (new error message). This should be very rare (currently only reproducible in your branch).

Indeed the solution is a new error message, which could say 1) there is a degenerate case, and/or 2) the Jacobian matrix is ill-conditioned. Both of these could actually be the same thing, and a little bit more investigation might lead to finding the degenerate cases.

[In the b-spline case] The fact that the line converges to a point suggests that the constraint system is conflicting (which indeed can be due to errors due to numerical unstability). Specially for a tangency line to Curve, if the only solution the solver can get is to make the line a point.

Its not that the "only" solution that the solver can get is to make the line a point, but maybe the "closest" solution (a local optimum, something we should actually be aiming for otherwise). Theoretically, if, say the solver had the intuition of a human, it could take one look at the case and just rotate the line by 90 degrees without defying any other constraint.

I do not believe the numerical instability is causing the degenerate case, but that the degenerate solution is causing the numerical instability.

You should look into J (the jacobian).... Then, look to R, before and after doing the zero elimination, together with the mapping of which row of R corresponds to which which solver constraint of the original system. Before zero elimination, look for very big values and very small values. After elimination the same, but it will be easier in some cases to identify which constraints are creating the problem...

One note however, R is not the reason that a 32 constraint system only has rank 2. R can, at best, show that the data input to the Jacobian is not numerically stable. The reason often is that the values provided by the constraint in the specific situation are not reasonable. Please, realise that all this Jacobian-QR decomposition is not about convergence to a solution (DogLeg) in the sense that the line is made into a point, but diagnosis of the equation system.

Indeed I should look into those things. Also I need to understand this zero elimination a little bit more. However, visualizing Eigen matrices is troublesome, even after adding certain scripts (so far I'm using eigengdb).

I understand that R is not the reason for the rank. Sorry if I somehow implied that. Thanks for pointing out that this is all about diagnosis, not solution. In the B-spline case, a (bad) solution was found, and the diagnosis only happened after. In that case, when you removed the internal alignment issue and added the length constraint, the solver did not converge. Our infinite loop happens only after that convergence happens.

Here [in the parabola case] I do not think there is a case of constraint lack of numerical stability. The situation is indeed impossible. The interesting part would be to investigate what happens inside the popularity contest, that in this case it does not hang, which could help identify what makes it hang with the B-Spline.

The situation, as far as making the error zero, is not impossible, just that the only solution is a degenerate case. That was intentionally done, but somehow this does not hang. My guess is somehow the "conflictGroups" variable (which is essentially a graph) is still connected to the "non-problematic" constraints (which form the estimated basis).

Indeed investigating it will help a lot. As I said earlier address sanitizer I was trying is interfering with gdb. I'll get back here after I've rebuilt and tested.

abdullah · Post by **abdullah** » Sun Nov 20, 2022 7:18 am

Somehow my longer reply got lost while posting.

Short summary. we agree the algorithm needs to be reviewed. We agree we need to understand where the problems are before attempting to fix it, to see what is the best solution.

Diagnosis (QR) happens before Solving (DogLeg) and can prevent Solving altogether if there are issues of redundant/conflicting constraints. Popularity contest is part of diagnosis and happens after QR decomposition and before Solving, when the number of constraints and the rank point towards problems of redundant/conflicting constraints. So the infinite loop happens before calling DogLeg.

I do not know eigengdb. When I had to deal with massive R matrices in the past, I wrote them to a file using the debugging code and loaded it with Jupyter notebook. This is based on Python (NumPy, SymPy, ...). Apart from visualisation offers many options to investigate the matrices.

I do not understand how you can make the error zero in the parabola case with a tangency constraint. The parabola cannot rotate. Neither can the line. The line is normal to the parabola in the initial situation. I think we might be understanding two different things.

jnxd · Post by **jnxd** » Sun Nov 20, 2022 11:01 am

abdullah wrote: ↑Sun Nov 20, 2022 7:18 am I do not understand how you can make the error zero in the parabola case with a tangency constraint. The parabola cannot rotate. Neither can the line. The line is normal to the parabola in the initial situation. I think we might be understanding two different things.

I haven't seen the parabola code itself, so cannot comment on it. How I implemented tangency was requiring the cross product of two vectors should be zero. But another way for the cross product to be zero is making at least one of the vectors zero! I could enforce some sort of normalization, but that would make the gradients more complex.

jnxd · Post by **jnxd** » Sun Nov 20, 2022 11:05 am

abdullah wrote: ↑Sun Nov 20, 2022 7:18 am I do not know eigengdb. When I had to deal with massive R matrices in the past, I wrote them to a file using the debugging code and loaded it with Jupyter notebook. This is based on Python (NumPy, SymPy, ...). Apart from visualisation offers many options to investigate the matrices.

Could you share this code that you used to write the matrix to a file? If that can be done, of course plenty of options open up.

abdullah · Post by **abdullah** » Sat Nov 26, 2022 7:17 am

jnxd wrote: ↑Sun Nov 20, 2022 11:05 am
abdullah wrote: ↑Sun Nov 20, 2022 7:18 am I do not know eigengdb. When I had to deal with massive R matrices in the past, I wrote them to a file using the debugging code and loaded it with Jupyter notebook. This is based on Python (NumPy, SymPy, ...). Apart from visualisation offers many options to investigate the matrices.
Could you share this code that you used to write the matrix to a file? If that can be done, of course plenty of options open up.

The debug code you are looking for is in:
https://github.com/FreeCAD/FreeCAD/blob ... .cpp#L4230

Define these three macros:
https://github.com/FreeCAD/FreeCAD/blob ... CS.cpp#L32

Debug to file will generate this file:
https://github.com/FreeCAD/FreeCAD/blob ... S.cpp#L265

abdullah · Post by **abdullah** » Sat Nov 26, 2022 7:23 am

jnxd wrote: ↑Fri Nov 18, 2022 1:27 pm Tangent-on-spline isn't the only branch where this hang occurs: certain corner cases of knot constraints also have this issue.

Is it possible for you to share a case where the hang can be reproduced with the knot constraint?

jnxd · Post by **jnxd** » Sat Nov 26, 2022 2:56 pm

abdullah wrote: ↑Sat Nov 26, 2022 7:23 am
jnxd wrote: ↑Fri Nov 18, 2022 1:27 pm Tangent-on-spline isn't the only branch where this hang occurs: certain corner cases of knot constraints also have this issue.
Is it possible for you to share a case where the hang can be reproduced with the knot constraint?

Currently it's not possible because I'm without laptop for a week. However let's try. the hang can't be reproduced because the code is now patched differently. Maybe the following will show it: 1) revert last commit from that pr, 2) remove the if( numpoints >2) condition and just let the code within run without check.

[Sketcher] (Possible?) infinite loop and degenerate cases in solver diagnosis

[Sketcher] (Possible?) infinite loop and degenerate cases in solver diagnosis

Re: [Sketcher] (Possible?) infinite loop and degenerate cases in solver diagnosis

Re: [Sketcher] (Possible?) infinite loop and degenerate cases in solver diagnosis

Re: [Sketcher] (Possible?) infinite loop and degenerate cases in solver diagnosis

Re: [Sketcher] (Possible?) infinite loop and degenerate cases in solver diagnosis

Re: [Sketcher] (Possible?) infinite loop and degenerate cases in solver diagnosis

Re: [Sketcher] (Possible?) infinite loop and degenerate cases in solver diagnosis

Re: [Sketcher] (Possible?) infinite loop and degenerate cases in solver diagnosis

Re: [Sketcher] (Possible?) infinite loop and degenerate cases in solver diagnosis