Why is the result of a simulation study not accepted as evidence of a limitation of a method?












I am working on mixture models. I have developed a new method based on the EM algorithm. I simulated data from a mixture model and applied my new method to it; the results were very satisfying. For comparison, I also fitted a non-mixture model, which gave inaccurate results, as expected. I used this as evidence that the non-mixture model (in a specific area) cannot deal with mixture dependency. Someone told me this is not surprising, since the data come from a mixture. I already knew that, but my aim was to make the reader aware of the importance of the mixture model and of how the non-mixture model fails in such cases.

He then asked me to apply both the non-mixture and the mixture model to real data and compare the results. The data I used are generic (I just wanted to test the model on them and have no information about the experiment behind them). I have read that for real data we should understand the data or have strong background knowledge about them; otherwise the comparison is not fair. For example, suppose I fit models to data that I do not know very well. Suppose further that model A (non-mixture) fits different distributions (say, arbitrary Gaussian models) to the data, while model B (mixture) fits only one specific Gaussian mixture model. Then model A may outperform model B. However, if we know the data well and fit the most appropriate mixture model, then model B is likely to fit the data better than model A.



My question is: why is a simulation study not trusted to illustrate the problem when we are not interested in any specific data set, or when we only have data with no knowledge of the experiment behind them? In other words, since I only need to illustrate one point, why are simulated data not enough?



New edit

In other words: is it fair to compare model A with model B when I do not have enough information about, or knowledge of, the data at hand? This lack of knowledge may make model A fit the data better than model B. I think that in this case a fair comparison is only possible if we know the data well and therefore fit the most appropriate model before comparing. That is, to compare two models on real data, I should know enough about the data. Otherwise, if I fit the wrong model, even if it is a mixture model, to real mixture data, the non-mixture model may fit the data better than the mixture model simply because I fitted the wrong mixture model. Is that correct? In that situation, even if the non-mixture model shows a better fit than the mixture model, it still gives me wrong fits (because the data are a mixture). Hence, in this case, my simulated data are a good way to illustrate the limitation of the non-mixture model.
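The comparison described at the start of the question (simulate from a mixture, then fit both a mixture and a non-mixture model) can be sketched roughly as follows. This is only an illustration under assumptions that are not in the question: a univariate two-component Gaussian mixture, a plain textbook EM (not the new method mentioned above), and made-up values for the weights, means and sample size. BIC is used here as one possible fit criterion.

```python
import numpy as np

def normal_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2), evaluated elementwise."""
    return -0.5 * np.log(2 * np.pi) - np.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def fit_single_gaussian(x):
    """Maximum-likelihood fit of one Gaussian; returns the log-likelihood."""
    # np.std uses ddof=0 by default, which is the MLE of the standard deviation.
    return normal_logpdf(x, x.mean(), x.std()).sum()

def fit_two_gaussians_em(x, n_iter=200):
    """Textbook EM for a two-component Gaussian mixture (illustrative only)."""
    # Crude initialisation: split the sample at its median.
    med = np.median(x)
    lo, hi = x[x <= med], x[x > med]
    w = np.array([0.5, 0.5])
    mu = np.array([lo.mean(), hi.mean()])
    sd = np.array([lo.std(), hi.std()])
    for _ in range(n_iter):
        # E-step: responsibilities, computed on the log scale for stability.
        log_joint = np.log(w) + normal_logpdf(x[:, None], mu, sd)
        log_norm = np.logaddexp(log_joint[:, 0], log_joint[:, 1])
        resp = np.exp(log_joint - log_norm[:, None])
        # M-step: weighted maximum-likelihood updates.
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    log_joint = np.log(w) + normal_logpdf(x[:, None], mu, sd)
    loglik = np.logaddexp(log_joint[:, 0], log_joint[:, 1]).sum()
    return (w, mu, sd), loglik

def bic(loglik, n_params, n):
    """Bayesian information criterion (smaller is better)."""
    return n_params * np.log(n) - 2 * loglik

# Simulate from a known mixture: 40% N(-2, 1) and 60% N(3, 1) (assumed values).
rng = np.random.default_rng(0)
n = 1000
from_first = rng.random(n) < 0.4
x = np.where(from_first, rng.normal(-2.0, 1.0, n), rng.normal(3.0, 1.0, n))

ll_single = fit_single_gaussian(x)
(w, mu, sd), ll_mix = fit_two_gaussians_em(x)
bic_single = bic(ll_single, 2, n)   # mean and sd
bic_mix = bic(ll_mix, 5, n)         # one free weight, two means, two sds

print(f"single Gaussian: loglik={ll_single:.1f}, BIC={bic_single:.1f}")
print(f"2-comp mixture : loglik={ll_mix:.1f}, BIC={bic_mix:.1f}")
print("estimated means:", np.sort(mu))
```

Because the data are generated from exactly the model that the mixture fit assumes, the mixture winning here is built into the design. That is the point the person quoted in the question was making.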







Tags: mixed-model, simulation, fitting














asked 1 hour ago by Maryam, edited 35 mins ago






















1 Answer





















Simulation studies showing that a method works great when the data-generating model and the analysis model are the same are very common. What people really want to see is more general:

1. The model performing well when the data-generating mechanism has all the complexity of real life. There is a lot of judgement involved here, but some aspects of the data-generating mechanism may have a much bigger impact than others. Simulations are actually great for exploring that, but are too often poorly done.

2. Don't just knock down a strawman; compare against all the reasonable or frequently used methods. E.g., adjustment for covariates might make omitting a random effect less important.

3. The differences in performance need to be striking enough that they truly matter in practice. A good example can also help here, to illustrate that one can reach strikingly different conclusions.
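As a rough illustration of point 1, a simulation can vary the data-generating mechanism instead of fixing it to the analysis model. The sketch below is hypothetical throughout: the three scenarios, the sample size, the number of replicates and the use of BIC are choices made for this example only. It records how often a two-component Gaussian mixture is preferred over a single Gaussian when the data are (a) truly a mixture, (b) truly a single Gaussian, and (c) skewed but with no subpopulations at all.

```python
import numpy as np

def normal_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2), evaluated elementwise."""
    return -0.5 * np.log(2 * np.pi) - np.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def loglik_single(x):
    """Log-likelihood of the maximum-likelihood single-Gaussian fit."""
    return normal_logpdf(x, x.mean(), x.std()).sum()

def loglik_mixture(x, n_iter=100):
    """Log-likelihood after a fixed number of textbook EM steps (two components)."""
    med = np.median(x)
    lo, hi = x[x <= med], x[x > med]
    sd_floor = 0.05 * x.std()  # assumed guard against collapsing components
    w = np.array([0.5, 0.5])
    mu = np.array([lo.mean(), hi.mean()])
    sd = np.maximum(np.array([lo.std(), hi.std()]), sd_floor)
    for _ in range(n_iter):
        # E-step on the log scale.
        log_joint = np.log(w) + normal_logpdf(x[:, None], mu, sd)
        log_norm = np.logaddexp(log_joint[:, 0], log_joint[:, 1])
        resp = np.exp(log_joint - log_norm[:, None])
        # M-step; the tiny constant keeps weights strictly positive.
        nk = resp.sum(axis=0) + 1e-10
        w = nk / nk.sum()
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        sd = np.maximum(sd, sd_floor)
    log_joint = np.log(w) + normal_logpdf(x[:, None], mu, sd)
    return np.logaddexp(log_joint[:, 0], log_joint[:, 1]).sum()

def simulate(kind, n, rng):
    """Draw one data set from a named (hypothetical) data-generating mechanism."""
    if kind == "two_gaussians":       # the analysis model is correct
        z = rng.random(n) < 0.4
        return np.where(z, rng.normal(-2.0, 1.0, n), rng.normal(3.0, 1.0, n))
    if kind == "single_gaussian":     # the non-mixture model is correct
        return rng.normal(0.0, 1.0, n)
    if kind == "skewed_lognormal":    # neither model is correct
        return rng.lognormal(0.0, 1.0, n)
    raise ValueError(f"unknown scenario: {kind}")

def mixture_selection_rate(kind, n=300, n_rep=100, seed=0):
    """Fraction of replicates in which BIC prefers the mixture."""
    rng = np.random.default_rng(seed)
    wins = 0
    for _ in range(n_rep):
        x = simulate(kind, n, rng)
        bic_single = 2 * np.log(n) - 2 * loglik_single(x)
        bic_mix = 5 * np.log(n) - 2 * loglik_mixture(x)
        wins += int(bic_mix < bic_single)
    return wins / n_rep

scenarios = ("two_gaussians", "single_gaussian", "skewed_lognormal")
rates = {kind: mixture_selection_rate(kind) for kind in scenarios}
for kind, rate in rates.items():
    print(f"{kind:>17}: mixture preferred by BIC in {rate:.0%} of replicates")
```

The skewed scenario is the instructive one: a mixture can be preferred by a fit criterion even though the data contain no subpopulations, so a better fit alone does not show that the mixture structure is real. Adding further competitor models (point 2) and reporting effect sizes rather than only win rates (point 3) would follow the same pattern.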

















answered 46 mins ago by Björn












• Thank you so much for your answer. I appreciate it. I have edited my question. – Maryam, 31 mins ago

• I guess part of the problem is: why should anyone care that omitting a random effect from a model matters when there is one in the data-generating process, if they have no idea whether this can realistically occur in practice? Also, does the form of the random effect matter, etc.? – Björn, 30 mins ago

















