21 Comments
Anonymous
May 21, 2023
Thank you Kyle! 🙏
Anonymous
June 8, 2023
Thank you Kyle! 🙏 Could you tell me if the diff always compares between dev and prod or can it be between two 'states' of the same environment? Also, does it execute the model in prod as well?
Kyle McNair
June 8, 2023
> Could you tell me if the diff always compares between dev and prod or can it be between two 'states' of the same environment?
The --dbt integration is specifically intended for the development use case: it compares the same tables across different environments. To compare two different tables in the same environment, you can use plain old `data-diff` (no --dbt) and the joindiff algorithm: https://docs.datafold.com/reference/open_source/cli#joindiff
Mind sharing a little detail on your use case? Some type of migration?
> Also, does it execute the model in prod as well?
Nope! data-diff simply compares your dev tables to their prod counterparts; it doesn't need to run the model in prod.
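For anyone curious what a join-based comparison of two tables in the same database looks like in principle, here is a minimal sketch using Python's built-in `sqlite3`. It is only an illustration of the general idea, not data-diff's actual implementation; the `join_diff` helper, the table names `orders_before` and `orders_after`, and the `amount` column are all hypothetical.

```python
import sqlite3


def join_diff(conn, before, after, key, value):
    """Return (key, before_value, after_value) for rows that differ.

    Hypothetical helper: emulates a full outer join with two LEFT JOINs.
    Table and column names are interpolated directly, so they must be trusted.
    """
    sql = f"""
        -- rows changed or removed: present in `before`, different or absent in `after`
        SELECT b.{key}, b.{value}, a.{value}
        FROM {before} AS b
        LEFT JOIN {after} AS a ON a.{key} = b.{key}
        WHERE a.{key} IS NULL OR a.{value} IS NOT b.{value}
        UNION ALL
        -- rows added: present only in `after`
        SELECT a.{key}, NULL, a.{value}
        FROM {after} AS a
        LEFT JOIN {before} AS b ON a.{key} = b.{key}
        WHERE b.{key} IS NULL
        ORDER BY 1
    """
    return conn.execute(sql).fetchall()


# Two snapshots of the same (made-up) table in one in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_before (id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE orders_after  (id INTEGER PRIMARY KEY, amount INTEGER);
    INSERT INTO orders_before VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO orders_after  VALUES (1, 10), (2, 25), (4, 40);
""")

diff = join_diff(conn, "orders_before", "orders_after", "id", "amount")
for row in diff:
    print(row)
```

In this toy data, row 2 changed, row 3 was removed, and row 4 was added; row 1 is identical in both snapshots and is not reported. Note that this only issues SELECT statements against both tables; nothing is rebuilt.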
Anonymous
June 9, 2023
(edited)
Thank you! I thought the feature would execute the models in both environments and then do a comparison, which is why I asked whether the comparison could be done within the same environment. The other reason for asking is that, in most cases, the data in Dev and Prod can be vastly different, with Dev containing only a subset or even just sample data. In such a scenario, a comparison between the two environments may not be that fruitful.
In our case, we have this same problem: the data in the two environments differ, so a comparison of the same table "before and after" the change would be more useful.
Lastly, I believe, as you mentioned, it only does a select on the Prod data instead of re-executing the model again. But a select "is" executed.
Bjarne Surströmming
Aug 11, 2023
Kyle, not only do you look good, now my data will be good too! 🙏