I looked through the Flux samples and now have an idea of how to run tasks on the base kernels. I am also trying to get the filesystem running; it seems that the Flux kernel can read a Linux file system, which is good news for us. For tasks, we can start a kernel and create several threads to run them.
Well, I also looked into the checkpointing and injection aspects.
For non-real-time tasks, checkpointing is possible. We just have to do what the name suggests: take regular automatic backups if we want to make it transparent. The open question is how long the intervals should be, and whether we should offer an interface to control them.
A timer interrupt handler routine can accept this user-defined parameter and trigger the backup operation at the right times.
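To make the timer idea concrete, here is a minimal sketch in plain C. It is not Flux code: the names ckpt_timer, ckpt_set_interval and ckpt_on_tick are made up for illustration, and I assume the real handler would call something like ckpt_on_tick once per timer interrupt.

```c
#include <assert.h>

/* All names here are hypothetical; none of them come from the Flux kernel. */

typedef void (*ckpt_fn)(void);

struct ckpt_timer {
    unsigned long interval_ticks; /* user-defined interval, 0 = disabled */
    unsigned long elapsed_ticks;  /* ticks since the last checkpoint */
    ckpt_fn       do_checkpoint;  /* backup routine to trigger, may be NULL */
    unsigned long count;          /* checkpoints triggered so far */
};

/* The interface that lets the user control the interval. */
void ckpt_set_interval(struct ckpt_timer *t, unsigned long ticks)
{
    assert(ticks > 0);
    t->interval_ticks = ticks;
    t->elapsed_ticks = 0;
}

/* Assumed to be called once per timer interrupt. */
void ckpt_on_tick(struct ckpt_timer *t)
{
    if (t->interval_ticks == 0)
        return;                       /* checkpointing is disabled */
    if (++t->elapsed_ticks >= t->interval_ticks) {
        t->elapsed_ticks = 0;
        t->count++;
        if (t->do_checkpoint)
            t->do_checkpoint();       /* trigger the backup operation */
    }
}
```

With the interval set to 3 ticks, ten calls to ckpt_on_tick trigger the backup on ticks 3, 6 and 9. In a real kernel the handler would probably only mark the backup as due and leave the work to a thread, since a backup is too heavy to run inside an interrupt.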
Fault injection can also benefit from a smart TIH (timer interrupt handler). Usually we ask a process to stop so that we can get in and do something to it. To facilitate the FI, how about this: each time there is a context switch, we save the last process's id, and we add a flag to the process structure. The flag indicates whether the process is an injection target, and it is checked on the "TO" side of every context switch, i.e., every time there is a context switch, this flag check is performed. If the previous process is the injection target, the current check does the injection, and the id saved on the stack identifies the address space of the target. (In this way we would need some mlock() or plock() to prevent the target's pages from being swapped or paged out, though that is most probably unnecessary since the process involved in the injection is usually not very large.) If we are allowed to save more on a context switch, we can also save the intended injection address, which the target process receives from the injector; with that assistance, we know exactly which page to lock.
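Here is a rough sketch of the switch-time check, again in plain C with made-up names (struct task, fi_register, fi_request, fi_context_switch). A byte buffer stands in for the target's address space and the injection is a simple bit flip; both are assumptions for illustration, not how the Flux kernel does things.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structures; a real kernel has its own process structure. */
#define MAX_TASKS 8

struct task {
    int            inject_pending; /* flag: this task is an injection target */
    unsigned char *mem;            /* stand-in for the task's address space */
    size_t         mem_size;
    size_t         inject_offset;  /* saved injection address (as an offset) */
    unsigned char  inject_mask;    /* bits to flip at that address */
};

static struct task *task_table[MAX_TASKS];
static int last_task_id = -1;      /* id of the process switched out last */

int fi_register(int id, struct task *t)
{
    if (id < 0 || id >= MAX_TASKS)
        return -1;
    task_table[id] = t;
    return 0;
}

/* What the user-level injector asks for: "inject into that task, there". */
int fi_request(int target_id, size_t offset, unsigned char mask)
{
    struct task *t;

    if (target_id < 0 || target_id >= MAX_TASKS)
        return -1;
    t = task_table[target_id];
    if (t == NULL || offset >= t->mem_size)
        return -1;
    t->inject_offset = offset;
    t->inject_mask = mask;
    t->inject_pending = 1;
    return 0;
}

/* Run on every context switch, on the "TO" side. Returns 1 if it injected. */
int fi_context_switch(int to_id)
{
    int injected = 0;

    assert(to_id >= 0 && to_id < MAX_TASKS);
    if (last_task_id >= 0) {
        struct task *prev = task_table[last_task_id];
        if (prev != NULL && prev->inject_pending) {
            prev->mem[prev->inject_offset] ^= prev->inject_mask;
            prev->inject_pending = 0;
            injected = 1;
        }
    }
    last_task_id = to_id;          /* save the id for the next switch */
    return injected;
}
```

When nothing is pending, the cost per switch is one table lookup and one flag test, which is the overhead I have in mind.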
The whole idea of providing an application-transparent checkpointing scheme is not new, whether we add user-specified information to decide the backup intervals or use some algorithm to figure them out automatically. User-level checkpointing can be realized with two calls similar to setjmp and longjmp, which is also not new.
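For the user-level case, a small sketch with the standard setjmp and longjmp shows the shape of it. The names app_state, checkpoint and run_with_checkpoint are made up, and the "fault" is simulated. Note that setjmp only saves the execution context, so the sketch copies the application data separately, and the rollback only works while the function that called setjmp is still active.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical application state and checkpoint record. */
struct app_state { int step; int value; };

struct checkpoint {
    jmp_buf          ctx;    /* saved execution context */
    struct app_state saved;  /* saved copy of the application data */
};

/*
 * Adds 0 + 1 + ... + (steps - 1), taking a checkpoint every `interval`
 * steps. At step `fault_at` a fault is simulated once and the run rolls
 * back to the last checkpoint. Returns the final sum and stores the
 * number of rollbacks in *rollbacks.
 */
int run_with_checkpoint(int steps, int interval, int fault_at, int *rollbacks)
{
    struct checkpoint ck;
    struct app_state st = {0, 0};
    volatile int faults_left = 1;   /* volatile: must survive longjmp */
    volatile int nroll = 0;

    assert(interval > 0);
    while (st.step < steps) {
        if (st.step % interval == 0) {
            ck.saved = st;                /* save the data ... */
            if (setjmp(ck.ctx) != 0) {    /* ... and the context */
                st = ck.saved;            /* came back via longjmp: restore */
                nroll++;
            }
        }
        if (st.step == fault_at && faults_left > 0) {
            faults_left--;
            st.value = -1;                /* simulated corruption */
            longjmp(ck.ctx, 1);           /* roll back */
        }
        st.value += st.step;
        st.step++;
    }
    *rollbacks = nroll;
    return st.value;
}
```

The result is the same with or without the simulated fault; only the rollback count differs. A real scheme would of course have to save much more than one small struct.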
In short, fault injection becomes convenient if we attach some operation to each switch between the target and the injector, along with some additional saved contents if necessary. Then all a user-level injector needs to do is say, "I want to inject into that task (process), in that area," and the actual injection is carried out when the injection process executes. Note that we just attach a function which we decide to run or not based on the user input. The overhead is there, but it is not too much considering we are doing injection. This idea is borrowed from C++ object functions.
The above may not be right; I hope to see your comments.
The network driver works on my desktop at home (with the tulip.c driver). I'll try the "even newer" 3Com Boomerang (3c59x.c) driver when I'm well enough to get out of the house.