Sunday, August 14, 2011

More PEP 402

After discovering that my initial approach to PEP 402-style imports wasn't very robust, I implemented a better one, but it later turned out that the recursive approach employed by importlib.__import__() is not well suited to virtual packages. Because of that, I ended up integrating the iterative _gcd_import() written by P.J. Eby, the author of PEP 402. After a few minor changes it passes all of importlib's unit tests, but some details of virtual package imports still need to be worked out in the abstract before they can be implemented. In particular, it's not entirely clear whether virtual packages should be left in place or removed when the import of a module fails.
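
The difference between the two strategies can be illustrated with a small sketch. This is not Eby's _gcd_import(); it assumes a hypothetical `load_one(fullname, parent_path)` callback that creates a single module, and it walks the prefixes of a dotted name in one loop instead of recursing into the parent first. The `rollback` flag shows one possible answer to the open question: removing everything a failed import added.

```python
import types
from typing import Callable, Dict, List, Optional

# Hypothetical callback: build one module given its full dotted name and the
# __path__ of its parent (None for a top-level name); raise ImportError on failure.
LoadOne = Callable[[str, Optional[List[str]]], types.ModuleType]


def iterative_import(name: str, modules: Dict[str, types.ModuleType],
                     load_one: LoadOne, rollback: bool = True) -> types.ModuleType:
    """Import 'a.b.c' by handling 'a', 'a.b', 'a.b.c' in a single loop."""
    parts = name.split('.')
    added: List[str] = []                    # names this call put into `modules`
    parent_path: Optional[List[str]] = None  # __path__ of the previous prefix
    module = None
    try:
        for i in range(1, len(parts) + 1):
            prefix = '.'.join(parts[:i])
            module = modules.get(prefix)
            if module is None:
                module = load_one(prefix, parent_path)
                modules[prefix] = module
                added.append(prefix)
                if i > 1:
                    # Bind the new submodule as an attribute of its parent.
                    parent = modules['.'.join(parts[:i - 1])]
                    setattr(parent, parts[i - 1], module)
            parent_path = getattr(module, '__path__', None)
    except ImportError:
        if rollback:
            # Undo everything this call added, innermost name first.
            for prefix in reversed(added):
                del modules[prefix]
                parent_name, _, child = prefix.rpartition('.')
                if parent_name in modules:
                    vars(modules[parent_name]).pop(child, None)
        raise
    return module
```

Because each level is handled in the same loop iteration, the caller knows exactly which (possibly virtual) packages were created along the way, which is what makes either policy, keeping them or discarding them, easy to express.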

Tuesday, August 9, 2011

PEP 402

I realised I didn't say much about PEP 402 last time round, so here are some more details:
PEP 402 aims to address the long-standing problem of storing the contents of a Python package in several directories. A previous proposal to fix this issue was PEP 382, which proposed extensions to the existing *.pth mechanism available on the top-level Python path. PEP 402, however, describes a simpler solution in which the requirement for package directories to contain an __init__.py file is lifted in some cases:
  • when importing submodules from a directory that contains Python modules. For example, if a directory Foo containing Bar.py is present on sys.path, one can import Foo.Bar just as if Foo also contained __init__.py, but one cannot import Foo alone.
  • when importing *submodules* from a directory with the same name as an already imported package, e.g. a standard library package.
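
A rough sketch of the lookup behind the first case, assuming a purely filesystem-based path: every directory named after the package on the given path entries contributes to a virtual __path__, and submodules are then searched for along it. The helper names `virtual_package_path` and `find_submodule` are hypothetical and ignore details such as path importers, zip files and extension modules.

```python
import os
from typing import List, Optional, Sequence


def virtual_package_path(name: str, path_entries: Sequence[str]) -> List[str]:
    """Collect every directory named after the package across the path entries.

    For a top-level name the entries would be sys.path; for a nested one,
    the parent's __path__. Only the last component of `name` is used.
    """
    last = name.rpartition('.')[2]
    return [os.path.join(entry, last)
            for entry in path_entries
            if os.path.isdir(os.path.join(entry, last))]


def find_submodule(package_path: Sequence[str], submodule: str) -> Optional[str]:
    """Return the first <dir>/<submodule>.py found along the virtual __path__."""
    for directory in package_path:
        candidate = os.path.join(directory, submodule + '.py')
        if os.path.isfile(candidate):
            return candidate
    return None
```

With two sys.path entries that each contain a Foo directory, Foo.Bar and Foo.Baz can live in different places, while Foo itself never needs an __init__.py.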

I spent the last few days working on a proof-of-concept implementation. The changes outlined in the PEP are quite small, but identifying the right places to inject the new code was tricky. I had to read the importlib code in much more detail to find where the changes belong, but eventually I got the code to support simple use cases. I created a separate repository for this part of the project: https://bitbucket.org/jergosh/pep-402
Here's the commit with the initial implementation: https://bitbucket.org/jergosh/pep-402/changeset/2c60dc2d17f1

Sunday, July 31, 2011

The work on the PEP continues and, following Eric Snow's helpful suggestions, I posted it on the Import-SIG mailing list to get some more feedback.

Apart from that, I will be looking into implementing PEP 402, "Simplified Package Layout and Partitioning", during the remaining part of the program. This PEP describes functionality for dividing packages into separately installed components, aka "namespace packages".

Thursday, July 21, 2011

Post-mid-term

This is just a quick note to say that my midterm evaluation was successful and I'm in fact slightly ahead of schedule. The remaining aims for the end of the program are:
  • Controlling which modules can be loaded by white- and blacklisting
  • Ensuring the ImportEngine class can be conveniently subclassed to extend/modify its behaviour
  • Final testing and documentation (including the PEP)
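
One possible shape for the white-/blacklisting goal, sketched with the meta path finder protocol of present-day importlib (find_spec) rather than the find_module API of the time. `BlacklistFinder` is a hypothetical name. Raising ImportError stops the import outright, whereas returning None lets the next finder on sys.meta_path try; a whitelist would simply invert the membership test.

```python
import importlib.abc


class BlacklistFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder refusing any module whose top-level package is blocked.

    Meant to be placed first on sys.meta_path. Modules already present in
    sys.modules are returned from the cache and never reach this finder.
    """

    def __init__(self, blocked):
        self.blocked = frozenset(blocked)

    def find_spec(self, fullname, path, target=None):
        if fullname.partition('.')[0] in self.blocked:
            raise ImportError(f"import of {fullname!r} is blacklisted")
        return None  # not our concern: let the normal finders handle it
```

Installing it globally affects the whole process, which is exactly why per-engine import state is needed for a proper sandbox.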

Monday, July 11, 2011

More progress

Having finished work on the code for now, I moved on to documentation. I started by drafting a short PEP that describes the proposed changes (I had actually written it a few days earlier, but I wanted to incorporate Nick's comments before making it public): Import Engine PEP XXX.

With the mid-term evaluation imminent, I revisited my proposal and made sure all the deliverables are in place. I ended up doing things in a different order than planned, mainly because when I wrote the proposal I didn't quite understand how the code is organised, but it seems I got everything to work. The PEP draft and some miscellaneous functionality are a nice bonus on top of that. This means that during the second part of GSoC (assuming my evaluation goes smoothly) I should have time to integrate my code properly with importlib so that there is no code duplication, polish the PEP, and implement the remaining features.

Monday, July 4, 2011

Progress!

After a few distracting days I had several very productive ones, and now all the features planned for the mid-term evaluation are in place (yippee!), leaving me enough time to document the work that has been done. This includes drafting an update to PEP 302 that proposes including the import engine functionality in the Python distribution. As I have no experience in writing PEPs, I've spent the past few days reading various PEPs to get a feel for their style, and I have started drafting my own. Unfortunately the process is likely to take a few more days, since I'm much slower at writing English than Python ;)

Saturday, June 25, 2011

In the past week or so I finished implementing the core functionality of my project, importing modules using isolated state. Loaders and importers now accept an optional engine parameter. If it's not supplied, they fall back on a GlobalImportEngine instance which uses global state.
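
The fallback can be sketched as follows, assuming an engine that holds only a modules table (the real class also covers paths and hooks). Everything here, including `load_source` and the `_GLOBAL_ENGINE` singleton, is a hypothetical illustration of the pattern, not the project's code.

```python
import sys
import types


class ImportEngine:
    """Hypothetical engine that keeps its own, isolated module table."""

    def __init__(self):
        self.modules = {}


class GlobalImportEngine(ImportEngine):
    """Hypothetical engine whose state is the interpreter-wide sys.modules."""

    def __init__(self):
        pass  # nothing to allocate: all state lives in the sys module

    @property
    def modules(self):
        return sys.modules


_GLOBAL_ENGINE = GlobalImportEngine()


def load_source(name, source, engine=None):
    """Hypothetical loader: execute `source` as module `name`.

    With no engine supplied it falls back on the global engine, so the
    module lands in sys.modules just as with a normal import.
    """
    if engine is None:
        engine = _GLOBAL_ENGINE
    module = types.ModuleType(name)
    # Register before executing, as import does, so circular imports can find it.
    engine.modules[name] = module
    try:
        exec(compile(source, f"<{name}>", "exec"), module.__dict__)
    except BaseException:
        del engine.modules[name]  # do not leave a half-initialised module behind
        raise
    return module
```

Keeping `engine=None` as the default means existing callers that know nothing about engines keep working unchanged.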

I also discovered that some loaders (namely those for built-in and extension modules) call functions from the C implementation of import (imp.*), and those are hard-coded to insert modules into sys.modules. This is a problem, but thanks to the limitations imposed on these kinds of modules (one copy of the module per process), it should be fine to place them into an ImportEngine instance's modules dictionary after the imp module has imported them.
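
A sketch of that workaround using today's importlib in place of imp: let the standard machinery import the module (which also fills sys.modules), then register the same object in an engine's private dictionary. `import_into_engine` is a hypothetical helper, and the plain dict stands in for an engine's modules table.

```python
import importlib
import importlib.machinery
import importlib.util
import sys


def import_into_engine(name, engine_modules):
    """Import a built-in or extension module and share it with an engine."""
    if name not in sys.builtin_module_names:
        spec = importlib.util.find_spec(name)
        if spec is None:
            raise ImportError(f"No module named {name!r}")
        if not isinstance(spec.loader, importlib.machinery.ExtensionFileLoader):
            raise ValueError(f"{name!r} is neither a built-in nor an extension module")
    module = importlib.import_module(name)  # the C-level import also fills sys.modules
    engine_modules[name] = module           # reuse the single per-process copy
    return module
```

Pure-Python modules are rejected on purpose: those can be loaded fresh into the engine without touching global state.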

Finally, I realised that the test structure is more complex because the global import state needs to be preserved. There were a number of context managers and other tricks that I hadn't adapted before, but I think I've got it right now.
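
A sketch of the kind of context manager involved: snapshot the global import state before a test and put it back afterwards. `preserved_import_state` is a hypothetical name; it restores the objects in place rather than rebinding them, since the interpreter and other code may hold references to the original dict and lists.

```python
import contextlib
import sys


@contextlib.contextmanager
def preserved_import_state():
    """Snapshot sys.modules, sys.path, sys.meta_path, sys.path_hooks and
    sys.path_importer_cache, and restore them in place on exit."""
    saved_modules = dict(sys.modules)
    saved_lists = {attr: list(getattr(sys, attr))
                   for attr in ('path', 'meta_path', 'path_hooks')}
    saved_cache = dict(sys.path_importer_cache)
    try:
        yield
    finally:
        # Drop modules imported inside the block, then restore replaced ones.
        for name in list(sys.modules):
            if name not in saved_modules:
                del sys.modules[name]
        sys.modules.update(saved_modules)
        for attr, items in saved_lists.items():
            getattr(sys, attr)[:] = items
        sys.path_importer_cache.clear()
        sys.path_importer_cache.update(saved_cache)
```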

As a side note, I had some personal trouble in the last few days but things should be back to normal now and I'll try to post more often here.

Wednesday, June 15, 2011

So after two weeks of preparations, I spent last week migrating the core importing functionality to ImportEngine.

Since the hierarchy of classes in importlib involved in the __import__ functionality is fairly deep in order to maximise code reuse, the changes are spread over quite a few places:

  • @module_for_loader
  • BuiltinImporter.find_module
  • BuiltinImporter.load_module
  • FrozenImporter.find_module
  • FrozenImporter.load_module
  • _LoaderBasics._load_module
  • _SourcelessFileLoader.load_module
  • _ExtensionFileLoader.load_module
  • PathFinder._path_hooks
  • PathFinder._path_importer_cache
  • PathFinder.find_module
  • _FileFinder.find_module
  • _DefaultPathFinder._path_hooks

In short, the engine argument has to be passed from the __import__ function down to the loaders. This is particularly tricky in the case of path hooks (which are in fact implemented as a meta hook). All this results in substantial changes to the code; there are still a few places I need to iron out, and afterwards it will need solid testing.
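
To illustrate the path-hook step, here is a sketch of looking up the finder for a single path entry when the hooks and the cache belong to an engine rather than to sys.path_hooks and sys.path_importer_cache. `EngineState` and `path_importer` are hypothetical names; the hook protocol (a callable that takes a path entry and raises ImportError if it cannot handle it) and caching None for entries without a finder follow standard importlib behaviour.

```python
class EngineState:
    """Hypothetical per-engine replacement for sys.path_hooks and its cache."""

    def __init__(self, path_hooks):
        self.path_hooks = list(path_hooks)
        self.path_importer_cache = {}


def path_importer(entry, engine):
    """Return the finder for one path entry using only the engine's state."""
    try:
        return engine.path_importer_cache[entry]
    except KeyError:
        pass
    finder = None
    for hook in engine.path_hooks:
        try:
            finder = hook(entry)
            break
        except ImportError:
            continue  # this hook cannot handle the entry; try the next one
    # Cache the result, including None, so each entry is probed only once.
    engine.path_importer_cache[entry] = finder
    return finder
```

Every function between __import__ and this lookup has to receive the engine, which is why the change touches so many methods in the list above.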

Wednesday, June 8, 2011

Testing, testing, 1-2-3

One problem with using a custom import function is handling recursive imports (i.e. imports occurring in the module that you are importing). To handle this, I implemented a replacement __import__ which can be substituted for the built-in one (commit: https://bitbucket.org/jergosh/gsoc_import_engine/changeset/b189df886193).
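
The substitution can be sketched as a context manager that swaps builtins.__import__ for a custom function, so that nested `import` statements executed by the module being loaded are routed through it as well. `installed_import` and `make_tracing_import` are hypothetical helpers; a real engine-backed replacement would resolve the import against its own state instead of just logging and delegating.

```python
import builtins
import contextlib


@contextlib.contextmanager
def installed_import(hook):
    """Temporarily replace builtins.__import__ with `hook`."""
    original = builtins.__import__
    builtins.__import__ = hook
    try:
        yield
    finally:
        builtins.__import__ = original


def make_tracing_import(original, log):
    """Hypothetical hook: record each requested name, then delegate."""
    def tracing_import(name, globals=None, locals=None, fromlist=(), level=0):
        log.append(name)
        return original(name, globals, locals, fromlist, level)
    return tracing_import
```

Because every `import` statement looks up __import__ in the builtins at run time, imports triggered while executing the imported module go through the same hook.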

I also spent a large part of last week figuring out how to make the unit tests take advantage of the ImportEngine. Resulting commit: https://bitbucket.org/jergosh/gsoc_import_engine/changeset/b39d1c0c3e53

Next step: new style finders and loaders.

Wednesday, June 1, 2011

First week and first commits!

With help from Nick and Brett, I implemented an initial barebones version of the import engine. Following Nick's advice to get the engine logic right first, the current implementation still uses the global import state.

Relevant commits:

Next step: unittests.

Saturday, May 14, 2011

Starting off!

Hello and welcome,

My name is Greg Slodkowicz and this is my blog documenting progress on the Python Import Engine project. Here is a short description of the project:
A large part of the development effort in Python 3 went into making the language definition and standard library cleaner and more consistent. However, the module importing functionality was originally implemented in C, which made extending it from Python code nearly impossible. The situation has since improved with the introduction of import hooks, as well as an implementation of __import__() in Python.

A modularised, self-contained import mechanism would make Python more consistent and powerful; e.g. isolating import state would open avenues for implementing better sandbox environments (through module white/blacklisting).

In the time before the coding period starts, I will focus on how to isolate the state relevant to the import functionality and how to maximise code reuse when implementing the ImportEngine class. A more detailed description of the design will follow on the Python wiki: http://wiki.python.org/moin/SummerOfCode/PythonImportEnginePlanning

The forked Python 3.3 repository is hosted on bitbucket: http://bitbucket.org/jergosh/gsoc_import_engine