Using several analyse techniques for the hierarchical clustering of a SAGE expression dataset of 822 tags from 74 tissue samples (normal and cancer) we show that cleaning the dataset (tags and experiments) is critical and that attribution of a tag to a gene is not easy. Comparison of cancers from various tissues is a difficult task as tissue samples cluster according to tissue origin and not as cancer or normal.