For this big sequencing project, I'm planning on using a 100 bp paired-end (PE) protocol. This means the Illumina machine will read each fragment from both ends, giving me one 100 bp read from each end. The main question I'm working on now is how long the insert size should be.
If the insert is shorter than 200 bp (the combined length of the two reads), the reads will overlap (which is good according to J~, although I'm not clear on exactly why). However, Sara just got some sequencing back that had (we think) too much overlap, and this reduced the total amount of data she has by about 10%. Her insert size was ~150 bp, which means her two 100 bp reads overlapped by about 50 bp, or 50% of each read. I think I should aim for closer to 20% overlap, but I can't find any papers or SEQanswers threads that address the issue.
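The overlap arithmetic can be sketched in a few lines of Python. This is a minimal sketch that assumes every fragment has exactly the nominal insert size and every read is exactly 100 bp; real libraries have a spread of insert sizes, so these are nominal numbers, not a prediction of data loss. All function names here are hypothetical helpers, not part of Trinity or any Illumina tool.

```python
def overlap_bp(insert_size: int, read_len: int = 100) -> int:
    """Number of insert bases covered by both reads of a pair.

    If the insert is shorter than one read, both reads cover the whole
    insert (and run into adapter), so the overlap is the insert itself.
    If the insert is at least twice the read length, the reads never meet.
    """
    return max(0, min(insert_size, 2 * read_len - insert_size))


def overlap_fraction(insert_size: int, read_len: int = 100) -> float:
    """Overlap expressed as a fraction of a single read's length."""
    return overlap_bp(insert_size, read_len) / read_len


def redundant_fraction(insert_size: int, read_len: int = 100) -> float:
    """Fraction of the 2 * read_len sequenced bases that re-read insert bases."""
    return overlap_bp(insert_size, read_len) / (2 * read_len)


def insert_for_overlap(target_fraction: float, read_len: int = 100) -> int:
    """Insert size giving a target overlap (as a fraction of one read)."""
    if not 0.0 <= target_fraction <= 1.0:
        raise ValueError("target_fraction must be between 0 and 1")
    return 2 * read_len - round(target_fraction * read_len)


if __name__ == "__main__":
    for insert in (150, 180, 200, 250):
        print(insert, overlap_bp(insert), overlap_fraction(insert),
              redundant_fraction(insert))
    print("insert for 20% overlap:", insert_for_overlap(0.20))
```

For a nominal 150 bp insert this gives a 50 bp overlap, so 25% of the 200 bases sequenced per pair are bases read twice; a 20% overlap target works out to an insert of about 180 bp.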
I think the reason to have some overlap is that the program I'll be using, Trinity (Grabherr et al. 2011), works better with it, but I'm not really sure.
Hopefully I can get this figured out soon so that I can start preparing my libraries. I have all the samples extracted, so library prep is the final step before they get sent off.
3-10-13 update: The TruSeq protocol fragments the mRNA chemically (divalent cations at elevated temperature) and then uses a bead-based size selection. I could modify this size selection, but at this point it seems prudent to just go ahead with the default.
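Because a real library contains a range of fragment sizes, shifting the size selection changes what share of pairs overlap rather than giving every pair the same overlap. A minimal sketch, assuming (hypothetically) that insert sizes are normally distributed with a made-up standard deviation of 30 bp; actual libraries can be skewed, and the real mean and spread would have to come from the library's measured size profile. The helper names are hypothetical.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """P(X <= x) for a normal distribution, computed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def fraction_overlapping(mean_insert: float, sd_insert: float,
                         read_len: int = 100) -> float:
    """Fraction of pairs whose reads overlap, i.e. insert < 2 * read_len.

    Assumes (hypothetically) normally distributed insert sizes.
    """
    if sd_insert <= 0:
        raise ValueError("sd_insert must be positive")
    return normal_cdf(2 * read_len, mean_insert, sd_insert)


if __name__ == "__main__":
    # Hypothetical library settings; the spread (sd) is an assumed value.
    sd = 30.0
    for mean in (150, 180, 200, 250):
        pct = 100 * fraction_overlapping(mean, sd)
        print(f"mean insert {mean} bp: {pct:.1f}% of pairs overlap")
```

Under this assumption a library centred exactly on 200 bp still has half of its pairs overlapping, which is one reason a nominal insert size alone does not pin down how much data ends up redundant.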
----------------------------------------------------------------------------------------------------------------------------------
Grabherr, M. G., B. J. Haas, M. Yassour, J. Z. Levin, D. A. Thompson, I. Amit, X. Adiconis, L. Fan, R. Raychowdhury, Q. Zeng, Z. Chen, E. Mauceli, N. Hacohen, A. Gnirke, N. Rhind, F. Di Palma, B. W. Birren, C. Nusbaum, K. Lindblad-Toh, N. Friedman and A. Regev. 2011. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology 29:644–652.