General FAQ about ArachisPheno
What is ArachisPheno?
ArachisPheno is a repository for population-scale phenotype data for Arachis (peanut) species.
It is inspired by and based on AraPheno, the corresponding repository for the model plant Arabidopsis thaliana.
Is the data in ArachisPheno public?
Data on ArachisPheno is public. If you use any data from this database, please cite the phenotype, the original study of the phenotype, and ArachisPheno.
Which information is stored in ArachisPheno?
This database contains public phenotype data from different studies.
Which data formats are supported?
ArachisPheno supports several data formats, including CSV, JSON, PLINK and ISA-TAB.
Is it possible to download the phenotype data?
Yes, you can download phenotype data from the individual phenotype views. You can download either the phenotypic meta-information or the actual phenotype values. Several formats are available, including CSV, JSON and PLINK.
Can I download the full database?
Yes, click the download database link on the home page. This generates a zip file containing a CSV file that lists the studies (and their details), plus one folder per study, named after the study id. Each folder contains information about the study's phenotypes as well as the phenotype values, both in CSV and PLINK format.
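The archive layout described above can be inspected with a few lines of Python. This is only a sketch under stated assumptions: the FAQ does not name the files inside the archive, so the function below makes no assumption about file names and simply groups entries by their top-level folder, which is taken to be the study id.

```python
import zipfile
from collections import defaultdict


def group_by_study(archive_path):
    """Split archive entries into top-level files and per-study file lists.

    Assumption: every top-level folder in the archive is a study id,
    as described in the FAQ; file names inside are not assumed.
    """
    studies = defaultdict(list)
    top_level_files = []
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            folder, _, filename = name.partition("/")
            if filename:
                studies[folder].append(filename)
            else:
                top_level_files.append(name)
    return top_level_files, dict(studies)
```

For example, `top_level_files, studies = group_by_study("arachispheno_database.zip")` would return the study list file(s) and a mapping from study id to the files in that study's folder. The archive name used here is hypothetical.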
Should I upload mean/average values or replicates?
Whenever replicate values are available, you should upload the replicate values and not the averages/means. Both submission formats (ISA-TAB and PLINK) support uploading replicate values.
Is it possible to preserve the replicate information across multiple phenotypes?
Yes, it is possible to preserve the specific value of each replicate across multiple phenotypes. For PLINK or CSV, repeat the FID (accession_id) once per replicate and put an arbitrary number in the IID (replicate_id) column, or leave that column empty (it is not used by ArachisPheno).
For PLINK this should look as follows:
FID IID pheno1 pheno2
6909 1 24.5 100.2
6909 2 23.2 101.5
6909 3 25.2 99.4
6414 4 5.4 10.4
6414 5  11.2
6414 6 4.2 9.8
...
For CSV this should look as follows:
accession_id,replicate_id,pheno1,pheno2
6909, 1, 24.5, 100.2
6909, 2, 23.2, 101.5
6909, 3, 25.2, 99.4
6414, 4, 5.4, 10.4
6414, 5, , 11.2
6414, 6, 4.2, 9.8
...
The main difference between PLINK and CSV is the delimiter: PLINK uses a space and CSV uses a comma. The headers also differ (FID and IID in PLINK, accession_id and replicate_id in CSV).
Empty values are encoded as empty cells in both CSV and PLINK (see pheno1 for accession_id/FID 6414 and replicate_id/IID 5).
ArachisPheno will create separate replicate values for each accession and make sure that, for example, replicate 1 of 6909 has 24.5 for pheno1 and 100.2 for pheno2.
This also works for the ISA-TAB format.
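As an illustration of the CSV and PLINK layouts described above, the following Python sketch writes the same replicate records in both formats. The record list and the helper names (`RECORDS`, `to_csv`, `to_plink`) are hypothetical and only mirror the example values from this FAQ. The CSV output has no space after the commas, and whether ArachisPheno's parser accepts exactly this output is not confirmed here.

```python
import csv
import io

# Hypothetical replicate records: (accession_id, replicate_id, pheno1, pheno2).
# None marks a missing measurement.
RECORDS = [
    ("6909", 1, 24.5, 100.2),
    ("6909", 2, 23.2, 101.5),
    ("6909", 3, 25.2, 99.4),
    ("6414", 4, 5.4, 10.4),
    ("6414", 5, None, 11.2),
    ("6414", 6, 4.2, 9.8),
]


def _cell(value):
    """Render a value as text, using an empty cell for missing data."""
    return "" if value is None else str(value)


def to_csv(records, phenotypes=("pheno1", "pheno2")):
    """Comma-delimited layout with accession_id/replicate_id headers."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["accession_id", "replicate_id", *phenotypes])
    for record in records:
        writer.writerow([_cell(v) for v in record])
    return buffer.getvalue()


def to_plink(records, phenotypes=("pheno1", "pheno2")):
    """Space-delimited layout with FID/IID headers."""
    lines = [" ".join(["FID", "IID", *phenotypes])]
    for record in records:
        # A missing value becomes an empty cell, i.e. two adjacent spaces.
        lines.append(" ".join(_cell(v) for v in record))
    return "\n".join(lines) + "\n"
```

Each accession id is repeated once per replicate, so every row keeps the pheno1 and pheno2 values of one replicate together.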