Merge/Intersection between table and dataset - How to achieve?












2















Consider the following table:



Id          Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
2 0xD64A593F3E9ACC972158D522A4289EA0

(Id is an identity column)


In to that table I want to merge the following dataset:



Hash
----------------------------------
0x31F777F0804D301936411E3ECD760859
0x31F777F0804D301936411E3ECD760859
0x0C5A65264F92A543E7AAA06375349C06

(Id is NOT present in the dataset)


The rules for the merge are:




  • If the hash does not exist in the table, insert it to the table;

  • If the hash does not exist in the dataset, delete it from the table;

  • If the hash does exist on both sides, and there are X instances in the table and Y instances in the source, then (Y-X) instances should be inserted in to the table.


The result of the merge should leave the table looking like this:



Id          Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
3 0x31F777F0804D301936411E3ECD760859
4 0x0C5A65264F92A543E7AAA06375349C06


What is the most efficient way to write a query to achieve this operation? FYI, other columns are omitted for brevity.










share|improve this question





























    2















    Consider the following table:



    Id          Hash
    ----------- ----------------------------------
    1 0x31F777F0804D301936411E3ECD760859
    2 0xD64A593F3E9ACC972158D522A4289EA0

    (Id is an identity column)


    In to that table I want to merge the following dataset:



    Hash
    ----------------------------------
    0x31F777F0804D301936411E3ECD760859
    0x31F777F0804D301936411E3ECD760859
    0x0C5A65264F92A543E7AAA06375349C06

    (Id is NOT present in the dataset)


    The rules for the merge are:




    • If the hash does not exist in the table, insert it to the table;

    • If the hash does not exist in the dataset, delete it from the table;

    • If the hash does exist on both sides, and there are X instances in the table and Y instances in the source, then (Y-X) instances should be inserted in to the table.


    The result of the merge should leave the table looking like this:



    Id          Hash
    ----------- ----------------------------------
    1 0x31F777F0804D301936411E3ECD760859
    3 0x31F777F0804D301936411E3ECD760859
    4 0x0C5A65264F92A543E7AAA06375349C06


    What is the most efficient way to write a query to achieve this operation? FYI, other columns are omitted for brevity.










    share|improve this question



























      2












      2








      2








      Consider the following table:



      Id          Hash
      ----------- ----------------------------------
      1 0x31F777F0804D301936411E3ECD760859
      2 0xD64A593F3E9ACC972158D522A4289EA0

      (Id is an identity column)


      In to that table I want to merge the following dataset:



      Hash
      ----------------------------------
      0x31F777F0804D301936411E3ECD760859
      0x31F777F0804D301936411E3ECD760859
      0x0C5A65264F92A543E7AAA06375349C06

      (Id is NOT present in the dataset)


      The rules for the merge are:




      • If the hash does not exist in the table, insert it to the table;

      • If the hash does not exist in the dataset, delete it from the table;

      • If the hash does exist on both sides, and there are X instances in the table and Y instances in the source, then (Y-X) instances should be inserted in to the table.


      The result of the merge should leave the table looking like this:



      Id          Hash
      ----------- ----------------------------------
      1 0x31F777F0804D301936411E3ECD760859
      3 0x31F777F0804D301936411E3ECD760859
      4 0x0C5A65264F92A543E7AAA06375349C06


      What is the most efficient way to write a query to achieve this operation? FYI, other columns are omitted for brevity.










      share|improve this question
















      Consider the following table:



      Id          Hash
      ----------- ----------------------------------
      1 0x31F777F0804D301936411E3ECD760859
      2 0xD64A593F3E9ACC972158D522A4289EA0

      (Id is an identity column)


      In to that table I want to merge the following dataset:



      Hash
      ----------------------------------
      0x31F777F0804D301936411E3ECD760859
      0x31F777F0804D301936411E3ECD760859
      0x0C5A65264F92A543E7AAA06375349C06

      (Id is NOT present in the dataset)


      The rules for the merge are:




      • If the hash does not exist in the table, insert it to the table;

      • If the hash does not exist in the dataset, delete it from the table;

      • If the hash does exist on both sides, and there are X instances in the table and Y instances in the source, then (Y-X) instances should be inserted in to the table.


      The result of the merge should leave the table looking like this:



      Id          Hash
      ----------- ----------------------------------
      1 0x31F777F0804D301936411E3ECD760859
      3 0x31F777F0804D301936411E3ECD760859
      4 0x0C5A65264F92A543E7AAA06375349C06


      What is the most efficient way to write a query to achieve this operation? FYI, other columns are omitted for brevity.







      sql-server insert merge






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited 1 hour ago







      James

















      asked 1 hour ago









      JamesJames

      1165




      1165






















          1 Answer
          1






          active

          oldest

          votes


















          3














          You could write it as a MERGE:



          WITH
          T AS
          (
          SELECT
          T.[Hash],
          rn = ROW_NUMBER() OVER (
          PARTITION BY T.[Hash]
          ORDER BY T.[Hash], T.Id)
          FROM @T AS T
          ),
          S AS
          (
          SELECT DISTINCT
          S.[Hash],
          rn = ROW_NUMBER() OVER (
          PARTITION BY S.[Hash]
          ORDER BY S.[Hash])
          FROM @S AS S
          )
          MERGE T
          USING S
          ON S.[Hash] = T.[Hash]
          AND S.rn = T.rn
          WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
          WHEN NOT MATCHED BY SOURCE THEN DELETE;


          But for performance reasons, I would normally write it as two separate statements:



          WITH ToDelete AS
          (
          SELECT
          T.*
          FROM @T AS T
          WHERE
          NOT EXISTS
          (
          SELECT
          S.*
          FROM @S AS S
          WHERE
          S.[Hash] = T.[Hash]
          )
          )
          DELETE ToDelete;


          WITH ToInsert AS
          (
          SELECT
          S.[Hash],
          rn = ROW_NUMBER() OVER (
          PARTITION BY S.[Hash]
          ORDER BY S.[Hash])
          FROM @S AS S
          EXCEPT
          SELECT
          T.[Hash],
          rn = ROW_NUMBER() OVER (
          PARTITION BY T.[Hash]
          ORDER BY T.[Hash], T.Id)
          FROM @T AS T
          )
          INSERT @T
          ([Hash])
          SELECT
          ToInsert.[Hash]
          FROM ToInsert;


          Where @S is the source table and @T is the target.



          You should have a unique index on the target on ([Hash], [Id]). Chances are you have this already, or equivalently an index on [Hash] and a unique (possibly clustered) index on [Id].



          There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.






          share|improve this answer
























          • There are also bugs reported on merge statements mssqltips.com/sqlservertip/3074/…

            – Randi Vertongen
            2 mins ago













          Your Answer








          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "182"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: false,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f228529%2fmerge-intersection-between-table-and-dataset-how-to-achieve%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          3














          You could write it as a MERGE:



          WITH
          T AS
          (
          SELECT
          T.[Hash],
          rn = ROW_NUMBER() OVER (
          PARTITION BY T.[Hash]
          ORDER BY T.[Hash], T.Id)
          FROM @T AS T
          ),
          S AS
          (
          SELECT DISTINCT
          S.[Hash],
          rn = ROW_NUMBER() OVER (
          PARTITION BY S.[Hash]
          ORDER BY S.[Hash])
          FROM @S AS S
          )
          MERGE T
          USING S
          ON S.[Hash] = T.[Hash]
          AND S.rn = T.rn
          WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
          WHEN NOT MATCHED BY SOURCE THEN DELETE;


          But for performance reasons, I would normally write it as two separate statements:



          WITH ToDelete AS
          (
          SELECT
          T.*
          FROM @T AS T
          WHERE
          NOT EXISTS
          (
          SELECT
          S.*
          FROM @S AS S
          WHERE
          S.[Hash] = T.[Hash]
          )
          )
          DELETE ToDelete;


          WITH ToInsert AS
          (
          SELECT
          S.[Hash],
          rn = ROW_NUMBER() OVER (
          PARTITION BY S.[Hash]
          ORDER BY S.[Hash])
          FROM @S AS S
          EXCEPT
          SELECT
          T.[Hash],
          rn = ROW_NUMBER() OVER (
          PARTITION BY T.[Hash]
          ORDER BY T.[Hash], T.Id)
          FROM @T AS T
          )
          INSERT @T
          ([Hash])
          SELECT
          ToInsert.[Hash]
          FROM ToInsert;


          Where @S is the source table and @T is the target.



          You should have a unique index on the target on ([Hash], [Id]). Chances are you have this already, or equivalently an index on [Hash] and a unique (possibly clustered) index on [Id].



          There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.






          share|improve this answer
























          • There are also bugs reported on merge statements mssqltips.com/sqlservertip/3074/…

            – Randi Vertongen
            2 mins ago


















          3














          You could write it as a MERGE:



          WITH
          T AS
          (
          SELECT
          T.[Hash],
          rn = ROW_NUMBER() OVER (
          PARTITION BY T.[Hash]
          ORDER BY T.[Hash], T.Id)
          FROM @T AS T
          ),
          S AS
          (
          SELECT DISTINCT
          S.[Hash],
          rn = ROW_NUMBER() OVER (
          PARTITION BY S.[Hash]
          ORDER BY S.[Hash])
          FROM @S AS S
          )
          MERGE T
          USING S
          ON S.[Hash] = T.[Hash]
          AND S.rn = T.rn
          WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
          WHEN NOT MATCHED BY SOURCE THEN DELETE;


          But for performance reasons, I would normally write it as two separate statements:



          WITH ToDelete AS
          (
          SELECT
          T.*
          FROM @T AS T
          WHERE
          NOT EXISTS
          (
          SELECT
          S.*
          FROM @S AS S
          WHERE
          S.[Hash] = T.[Hash]
          )
          )
          DELETE ToDelete;


          WITH ToInsert AS
          (
          SELECT
          S.[Hash],
          rn = ROW_NUMBER() OVER (
          PARTITION BY S.[Hash]
          ORDER BY S.[Hash])
          FROM @S AS S
          EXCEPT
          SELECT
          T.[Hash],
          rn = ROW_NUMBER() OVER (
          PARTITION BY T.[Hash]
          ORDER BY T.[Hash], T.Id)
          FROM @T AS T
          )
          INSERT @T
          ([Hash])
          SELECT
          ToInsert.[Hash]
          FROM ToInsert;


          Where @S is the source table and @T is the target.



          You should have a unique index on the target on ([Hash], [Id]). Chances are you have this already, or equivalently an index on [Hash] and a unique (possibly clustered) index on [Id].



          There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.






          share|improve this answer
























          • There are also bugs reported on merge statements mssqltips.com/sqlservertip/3074/…

            – Randi Vertongen
            2 mins ago
















          3












          3








          3







          You could write it as a MERGE:



          WITH
          T AS
          (
          SELECT
          T.[Hash],
          rn = ROW_NUMBER() OVER (
          PARTITION BY T.[Hash]
          ORDER BY T.[Hash], T.Id)
          FROM @T AS T
          ),
          S AS
          (
          SELECT DISTINCT
          S.[Hash],
          rn = ROW_NUMBER() OVER (
          PARTITION BY S.[Hash]
          ORDER BY S.[Hash])
          FROM @S AS S
          )
          MERGE T
          USING S
          ON S.[Hash] = T.[Hash]
          AND S.rn = T.rn
          WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
          WHEN NOT MATCHED BY SOURCE THEN DELETE;


          But for performance reasons, I would normally write it as two separate statements:



          WITH ToDelete AS
          (
          SELECT
          T.*
          FROM @T AS T
          WHERE
          NOT EXISTS
          (
          SELECT
          S.*
          FROM @S AS S
          WHERE
          S.[Hash] = T.[Hash]
          )
          )
          DELETE ToDelete;


          WITH ToInsert AS
          (
          SELECT
          S.[Hash],
          rn = ROW_NUMBER() OVER (
          PARTITION BY S.[Hash]
          ORDER BY S.[Hash])
          FROM @S AS S
          EXCEPT
          SELECT
          T.[Hash],
          rn = ROW_NUMBER() OVER (
          PARTITION BY T.[Hash]
          ORDER BY T.[Hash], T.Id)
          FROM @T AS T
          )
          INSERT @T
          ([Hash])
          SELECT
          ToInsert.[Hash]
          FROM ToInsert;


          Where @S is the source table and @T is the target.



          You should have a unique index on the target on ([Hash], [Id]). Chances are you have this already, or equivalently an index on [Hash] and a unique (possibly clustered) index on [Id].



          There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.






          share|improve this answer













          You could write it as a MERGE:



          WITH
          T AS
          (
          SELECT
          T.[Hash],
          rn = ROW_NUMBER() OVER (
          PARTITION BY T.[Hash]
          ORDER BY T.[Hash], T.Id)
          FROM @T AS T
          ),
          S AS
          (
          SELECT DISTINCT
          S.[Hash],
          rn = ROW_NUMBER() OVER (
          PARTITION BY S.[Hash]
          ORDER BY S.[Hash])
          FROM @S AS S
          )
          MERGE T
          USING S
          ON S.[Hash] = T.[Hash]
          AND S.rn = T.rn
          WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
          WHEN NOT MATCHED BY SOURCE THEN DELETE;


          But for performance reasons, I would normally write it as two separate statements:



          WITH ToDelete AS
          (
          SELECT
          T.*
          FROM @T AS T
          WHERE
          NOT EXISTS
          (
          SELECT
          S.*
          FROM @S AS S
          WHERE
          S.[Hash] = T.[Hash]
          )
          )
          DELETE ToDelete;


          WITH ToInsert AS
          (
          SELECT
          S.[Hash],
          rn = ROW_NUMBER() OVER (
          PARTITION BY S.[Hash]
          ORDER BY S.[Hash])
          FROM @S AS S
          EXCEPT
          SELECT
          T.[Hash],
          rn = ROW_NUMBER() OVER (
          PARTITION BY T.[Hash]
          ORDER BY T.[Hash], T.Id)
          FROM @T AS T
          )
          INSERT @T
          ([Hash])
          SELECT
          ToInsert.[Hash]
          FROM ToInsert;


          Where @S is the source table and @T is the target.



          You should have a unique index on the target on ([Hash], [Id]). Chances are you have this already, or equivalently an index on [Hash] and a unique (possibly clustered) index on [Id].



          There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered 28 mins ago









          Paul WhitePaul White

          50.3k14269435




          50.3k14269435













          • There are also bugs reported on merge statements mssqltips.com/sqlservertip/3074/…

            – Randi Vertongen
            2 mins ago





















          • There are also bugs reported on merge statements mssqltips.com/sqlservertip/3074/…

            – Randi Vertongen
            2 mins ago



















          There are also bugs reported on merge statements mssqltips.com/sqlservertip/3074/…

          – Randi Vertongen
          2 mins ago







          There are also bugs reported on merge statements mssqltips.com/sqlservertip/3074/…

          – Randi Vertongen
          2 mins ago




















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Database Administrators Stack Exchange!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f228529%2fmerge-intersection-between-table-and-dataset-how-to-achieve%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          What are all the squawk codes?

          What are differences between VBoxVGA, VMSVGA and VBoxSVGA in VirtualBox?

          Hudsonelva