Hi @Kashyap_Maheshwari
[I'm taking a stab at replying first. I hope this resonates with Amit as well, but I know he cares about this too, so please wait for his reply as well.]
The proposed changes definitely look like they would break a lot of things. In fact, they seem driven more by keeping up with the latest trends in the Python community than by what could be considered stable changes. The exceptions are a couple of things: a) moving the constants and URLs to a separate module for better readability of the code, b) extracting code that can become part of utils, and c) adding tests where they make sense.
These seem to be low-hanging fruit that you could take a stab at without breaking the existing functionality of the code. They can be done over multiple iterations as part of the first tranche of changes here.
Now, the other changes are more about changing the norms of current practice, which may or may not become actual standards (for example, Poetry and pyproject.toml have not really become mainstream Python ecosystem standards yet). So I would advise against proposing them as changes to the community just yet; perhaps in the future, once they are widespread ecosystem standards.
Once the first tranche of changes is taken care of, I would suggest checking whether the code works with Python 3.9 or Python 3.10 (or whichever is the more stable release in the 3.x series, rather than the latest version), as that would help the whole codebase stay in line with a stable Python 3.x release. This could be your second tranche of changes.
Regarding the move from pandas to Polars (I was considering it myself a while ago), I would not really advocate it unless there is a substantial performance improvement that can be measured and proven.
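As a minimal sketch of what "measured and proven" could look like in practice, the harness below uses only the standard library (`timeit`, `statistics`). It first checks that both implementations return the same result, then reports the ratio of their median runtimes. `loop_sum` and `builtin_sum` are hypothetical placeholders standing in for a current pandas code path and a proposed Polars rewrite; they are not from the actual codebase.

```python
import statistics
import timeit
from typing import Callable


def median_runtime(func: Callable[[], object], repeat: int = 5, number: int = 10) -> float:
    """Median wall-clock time, in seconds, of a single call to func."""
    # timeit.repeat returns `repeat` totals, each covering `number` calls.
    totals = timeit.repeat(func, repeat=repeat, number=number)
    return statistics.median(total / number for total in totals)


def compare(baseline: Callable[[], object], candidate: Callable[[], object]) -> float:
    """Speedup of candidate over baseline; values above 1.0 mean the candidate is faster."""
    # A faster rewrite is only worth considering if it produces identical output.
    if baseline() != candidate():
        raise ValueError("candidate output differs from baseline")
    return median_runtime(baseline) / median_runtime(candidate)


# Hypothetical placeholders for "current implementation" vs "proposed rewrite".
def loop_sum() -> int:
    total = 0
    for i in range(100_000):
        total += i
    return total


def builtin_sum() -> int:
    return sum(range(100_000))


print(f"speedup: {compare(loop_sum, builtin_sum):.2f}x")
```

Taking the median of several trials keeps one noisy run from skewing the result. If the real functions return DataFrames, the equality check would need to use something like `pandas.testing.assert_frame_equal` instead of `!=`, since comparing DataFrames with `!=` does not yield a single boolean.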
However, what could be considered or attempted, of course after the first and second tranches as priorities, is moving to the pandas 2.0 library. Recently @dexter made all the changes necessary to make the current codebase compliant with the latest pandas 1.x, so it is now ready for the move to pandas 2.x without much hassle, unless the methods themselves have changed significantly. But again, all aspects of the codebase still need to be thoroughly tested to guarantee backward compatibility.
This is why I would suggest adding the (missing) tests first, to make sure functionality and integrity are checked at all times.
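For the first tranche, a characterization (or "golden") test pins down what the code returns today, so that moving constants into their own module or extracting helpers into utils can be checked against it. A minimal sketch using the standard library's `unittest`; `BASE_URL` and `build_quote_url` are hypothetical names chosen for illustration, not taken from the actual codebase.

```python
import unittest

# Hypothetical stand-ins: after the first-tranche refactor, BASE_URL would live
# in a constants module and build_quote_url in a utils module.
BASE_URL = "https://example.com/api"


def build_quote_url(symbol: str) -> str:
    """Hypothetical helper that builds a request URL from a shared constant."""
    return f"{BASE_URL}/quote?symbol={symbol.strip().upper()}"


class TestBuildQuoteUrl(unittest.TestCase):
    """Pins today's behaviour so a refactor cannot silently change it."""

    def test_url_is_unchanged(self):
        self.assertEqual(
            build_quote_url(" abc "),
            "https://example.com/api/quote?symbol=ABC",
        )

    def test_symbol_is_uppercased(self):
        self.assertTrue(build_quote_url("xyz").endswith("symbol=XYZ"))
```

Run it with `python -m unittest` before the refactor, then move the constant and helper into their own modules. If the same tests still pass unchanged, the refactor has not altered behaviour.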
So, to summarize, this would be the recommended approach, in my view:
- 1st tranche
  - add all the missing functional and integrity tests where they make sense
  - modularize the constants and URLs (this improves readability without breaking the tests)
  - extract code that can be part of utils
- 2nd tranche
  - consider making the codebase work with a stable Python 3.x release
- 3rd tranche
  - attempt to make the codebase work with the stable pandas 2.x series
  - again, make sure all the functional and integrity tests guarantee backward compatibility
Because this module is now deeply embedded in a lot of codebases (thanks to the efforts of @dexter), it is important to make sure every change to the codebase only brings benefits, in the interest of the growing community here.
Hope this helps. Again, I really appreciate your efforts here, @Kashyap_Maheshwari.